| 0:00:13 | So Dan gave you an overview of the whole toolkit, and I'm just going to give a brief description of how the acoustic model classes are organised, to give you a flavour of what is meant by "the code is modular": parts that don't need to know about each other don't. |
|---|
| 0:00:34 | So, just to reiterate: the thing we support currently is mainly the standard maximum-likelihood training of acoustic models with GMMs, in roughly the HTK framework. |
|---|
| 0:00:50 | We have the usual linear transforms, like LDA and STC. |
|---|
| 0:00:56 | We also support speaker adaptation. Currently fMLLR is there, and we have tested it in the recipes. |
|---|
| 0:01:04 | The MLLR code is there and it's unit-tested, but it still needs to go into the recipes, so somebody needs to write the scripts and run them. |
|---|
| 0:01:14 | So MLLR is not in the recipes yet; it's almost done. |
|---|
| 0:01:20 | MLLR, obviously, works with regression trees, and fMLLR has two variations: one is just a global transform, and the other uses regression trees. |
|---|
| 0:01:34 | And this is the point which Dan also mentioned: we had some discussion about whether to support things like genone-type systems or tied-mixture models. |
|---|
| 0:01:46 | For now, things are fairly simple. We decided not to do it now; maybe in future, if the need is felt. |
|---|
| 0:01:57 | Also, over the course of this development, it came up a couple of times that it might be good to have a system like that. |
|---|
| 0:02:04 | But currently a GMM is a very specific thing, with means and covariances, and in a few slides you will see how the GMMs are implemented. |
|---|
| 0:02:16 | Besides GMMs, we also have SGMMs, and there is fMLLR adaptation code for SGMMs as well. There are a few results we had previously published which are still not in this new code base, but they are going to be added. |
|---|
| 0:02:34 | This has already been talked about: we have a GMM class, and it really knows about nothing other than what it contains, that is, its parameters. |
|---|
| 0:02:48 | And there is the acoustic model class, which is just a vector of GMMs (for implementation reasons, a vector of pointers, but that's not an interesting detail). |
|---|
| 0:02:56 | The green arrow in these slides signifies a technical term called "knows about". We have avoided inheritance as much as possible, so most of the time things are not inherited. |
|---|
| 0:03:19 | If an object needs to keep track of another object, it does so either by keeping a const reference to it, or otherwise through a specific function that takes just pointers and modifies the other object. |
|---|
| 0:03:36 | So "knows about" is in the sense that, if you have to write the code, you have to include the header for the other thing. |
|---|
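
The "knows about" relation can be sketched in Python as follows (the talk's code is C++). This is a minimal illustration with hypothetical class names `Gmm`, `AcousticModel` and `ModelSummary`; Python has no const references, so read-only access is only a convention here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gmm:
    """Hypothetical GMM: knows about nothing except its own parameters."""
    weights: List[float]
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # diagonal variances per component

    def num_components(self) -> int:
        return len(self.weights)


@dataclass
class AcousticModel:
    """Hypothetical acoustic model: just a vector of GMMs, one per state."""
    gmms: List[Gmm] = field(default_factory=list)

    def num_states(self) -> int:
        return len(self.gmms)

    def get_gmm(self, state: int) -> Gmm:
        return self.gmms[state]


class ModelSummary:
    """'Knows about' AcousticModel: keeps a reference and only reads it,
    standing in for a C++ class that holds a const reference."""

    def __init__(self, am: AcousticModel):
        self._am = am

    def total_components(self) -> int:
        return sum(g.num_components() for g in self._am.gmms)
```

Only `ModelSummary` depends on the acoustic-model definitions. `Gmm` depends on nothing else, which is the dependency structure the green arrows describe.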
| 0:03:48 | So the GMMs are parametrized using the natural parameters, in the sense of the parameters of an exponential-family distribution: if you write out the likelihood, you get these two, the mean times the inverse of the covariance, and the inverse of the covariance. Those are the natural parameters of the GMM. |
|---|
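
The expansion below shows where these two quantities come from. It is written for one diagonal-covariance component $i$ with weight $c_i$, mean $\boldsymbol{\mu}_i$ and variances $\boldsymbol{\sigma}_i^2$ (notation ours, not from the slides). The log-likelihood splits into a constant, a term linear in $\mathbf{x}$ weighted by $\mu_{id}/\sigma_{id}^2$, and a term in $x_d^2$ weighted by $1/\sigma_{id}^2$:

$$
\log\!\big(c_i\,\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_i,\operatorname{diag}\boldsymbol{\sigma}_i^2)\big)
= \underbrace{\log c_i - \tfrac12\sum_{d=1}^{D}\Big(\log(2\pi\sigma_{id}^2) + \frac{\mu_{id}^2}{\sigma_{id}^2}\Big)}_{g_i}
+ \sum_{d=1}^{D}\frac{\mu_{id}}{\sigma_{id}^2}\,x_d
- \tfrac12\sum_{d=1}^{D}\frac{1}{\sigma_{id}^2}\,x_d^2 .
$$

Stacking the $I$ components into matrices with $M_{id} = \mu_{id}/\sigma_{id}^2$ and $V_{id} = 1/\sigma_{id}^2$ gives all component log-likelihoods at once:

$$
\boldsymbol{\ell} = \mathbf{g} + M\mathbf{x} - \tfrac12\, V(\mathbf{x}\odot\mathbf{x}).
$$
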
| 0:04:15 | The reason for doing that is that you can then do the likelihood calculation using just two matrix-vector multiplications. If you have a diagonal-covariance system, you have the mean times the inverse covariance as a vector for each component, and say you have I components; you have your data vector, and you just do this matrix-vector multiplication. |
|---|
| 0:04:41 | There are BLAS routines for doing that, obviously. ATLAS BLAS is not the most optimized thing, but it's still a nice way of doing things. |
|---|
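
Here is a minimal pure-Python sketch of the two-product likelihood computation, assuming diagonal covariances. The function names and the plain-list matrices are our own. A real implementation would pass the two products to a BLAS matrix-vector routine instead of using the list comprehension in `matvec`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def natural_params(weights: List[float], means: Matrix,
                   variances: Matrix) -> Tuple[List[float], Matrix, Matrix]:
    """Convert (weight, mean, diagonal variance) per component into
    (gconst, mean * inverse variance, inverse variance)."""
    gconsts, means_invvars, inv_vars = [], [], []
    for w, mu, var in zip(weights, means, variances):
        iv = [1.0 / v for v in var]
        gconsts.append(math.log(w) - 0.5 * sum(
            math.log(2.0 * math.pi * v) + m * m / v for m, v in zip(mu, var)))
        means_invvars.append([m * i for m, i in zip(mu, iv)])
        inv_vars.append(iv)
    return gconsts, means_invvars, inv_vars


def matvec(mat: Matrix, vec: List[float]) -> List[float]:
    """Plain matrix-vector product (stand-in for a BLAS gemv call)."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def component_loglikes(gconsts: List[float], means_invvars: Matrix,
                       inv_vars: Matrix, x: List[float]) -> List[float]:
    """Per-component log(c_i N(x; mu_i, Sigma_i)) via two matrix-vector products."""
    x_sq = [v * v for v in x]
    linear = matvec(means_invvars, x)   # first product: M x
    quad = matvec(inv_vars, x_sq)       # second product: V (x * x)
    return [g + l - 0.5 * q for g, l, q in zip(gconsts, linear, quad)]


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))), giving the total GMM log-likelihood."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))
```

The conversion to natural parameters is done once when the model is loaded. After that, each frame costs only the two products plus a log-sum-exp.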
| 0:04:56 | So here is a graphical overview of what Dan has already said: we have this acoustic model class, but when it goes into the decoder, it interacts through this Decodable object. |
|---|
| 0:05:09 | The decoder knows only about the Decodable interface, and for each type of acoustic model we need to implement the Decodable interface for that model. |
|---|
| 0:05:22 | The Decodable object is the one which knows about the features and does the likelihood computation. |
|---|
| 0:05:30 | And this is exactly what the Decodable interface looks like. |
|---|
| 0:05:34 | Although we avoid using inheritance, this is the one exception: we have interfaces for features, for Decodable, and for a few other things, and these are pure interfaces. That's the only case where we inherit. |
|---|
| 0:05:58 | As you can see, it's simple. The main function is the likelihood computation, and the decoder can also ask whether there are no more frames and, essentially, how many states you have. |
|---|
| 0:06:17 | So for every new model type, you then inherit from this interface. |
|---|
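
A pure interface of this kind might look as follows in Python. This is a hedged sketch: the method names `log_likelihood`, `is_last_frame` and `num_indices` are our own renderings of the three functions described above. `TableDecodable` is a hypothetical model type with precomputed scores, and `best_per_frame_score` is a toy "decoder" that only picks the best state per frame and ignores the HMM graph a real decoder would search.

```python
from abc import ABC, abstractmethod
from typing import List


class Decodable(ABC):
    """Hypothetical pure interface: the only thing the decoder knows about."""

    @abstractmethod
    def log_likelihood(self, frame: int, index: int) -> float: ...

    @abstractmethod
    def is_last_frame(self, frame: int) -> bool: ...

    @abstractmethod
    def num_indices(self) -> int: ...


class TableDecodable(Decodable):
    """Toy model type: log-likelihoods stored in a frames x states table."""

    def __init__(self, table: List[List[float]]):
        self.table = table

    def log_likelihood(self, frame: int, index: int) -> float:
        return self.table[frame][index]

    def is_last_frame(self, frame: int) -> bool:
        return frame == len(self.table) - 1

    def num_indices(self) -> int:
        return len(self.table[0])


def best_per_frame_score(decodable: Decodable) -> float:
    """Toy 'decoder' that uses only the interface: sums the best score per frame."""
    total, frame = 0.0, 0
    while True:
        total += max(decodable.log_likelihood(frame, i)
                     for i in range(decodable.num_indices()))
        if decodable.is_last_frame(frame):
            return total
        frame += 1
```

Adding a new model type means writing one more subclass. The decoder function does not change.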
| 0:06:23 | So that was decoding. For training, we similarly have an object for training the parameters of the GMMs, and in the same way that the acoustic model is just a vector of GMMs, the acoustic model trainer is just a vector of objects, each of which trains one GMM. |
|---|
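
The "vector of per-GMM trainers" design can be sketched as follows, assuming maximum-likelihood updates from occupancy, first-order and second-order statistics. The class names are hypothetical. The component posteriors are taken as given, and the update assumes every component has nonzero occupancy (no flooring or pruning).

```python
from typing import List, Tuple


class GmmAccumulator:
    """Hypothetical per-GMM statistics: occupancy, sum of x, sum of x^2."""

    def __init__(self, num_comps: int, dim: int):
        self.occ = [0.0] * num_comps
        self.mean_acc = [[0.0] * dim for _ in range(num_comps)]
        self.var_acc = [[0.0] * dim for _ in range(num_comps)]

    def accumulate(self, x: List[float], posteriors: List[float]) -> None:
        for i, p in enumerate(posteriors):
            self.occ[i] += p
            for d, xd in enumerate(x):
                self.mean_acc[i][d] += p * xd
                self.var_acc[i][d] += p * xd * xd

    def ml_update(self) -> Tuple[List[float], List[List[float]], List[List[float]]]:
        """ML weights, means and diagonal variances (assumes every occ > 0)."""
        total = sum(self.occ)
        weights = [o / total for o in self.occ]
        means = [[s / o for s in row] for row, o in zip(self.mean_acc, self.occ)]
        variances = [[s2 / o - m * m for s2, m in zip(row2, mrow)]
                     for row2, mrow, o in zip(self.var_acc, means, self.occ)]
        return weights, means, variances


class AcousticModelAccumulator:
    """The acoustic-model trainer: just a vector of per-GMM accumulators."""

    def __init__(self, shapes: List[Tuple[int, int]]):
        self.accs = [GmmAccumulator(n, d) for n, d in shapes]

    def accumulate(self, state: int, x: List[float], posteriors: List[float]) -> None:
        self.accs[state].accumulate(x, posteriors)

    def update_all(self):
        return [acc.ml_update() for acc in self.accs]
```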
| 0:06:51 | Okay, yes, sure. Yeah, my slides are not compatible. |
|---|
| 0:07:02 | The red arrow means that this class modifies those classes. "Modifies" obviously implies that it also knows about them. Typically, for modification, a class doesn't keep any reference to an object of the other class; instead, it has a method which takes that object and does the modification. |
|---|
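
The "modifies" convention can be illustrated with a small example. The sketch below uses a hypothetical variance floorer: it stores only its configuration, never a reference to the model, and its `apply` method takes the model and changes it in place. Variance flooring is our own choice of example, and `Gmm` is reduced to the one field the example needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gmm:
    """Hypothetical GMM reduced to its variances (other parameters omitted)."""
    variances: List[List[float]]


class VarianceFloorer:
    """Modifies GMMs without keeping any reference to them."""

    def __init__(self, floor: float):
        self.floor = floor  # configuration only; no model is stored

    def apply(self, am: List[Gmm]) -> int:
        """Floor every variance in every GMM in place; return how many changed."""
        changed = 0
        for gmm in am:
            for comp in gmm.variances:
                for d, v in enumerate(comp):
                    if v < self.floor:
                        comp[d] = self.floor
                        changed += 1
        return changed
```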
| 0:07:25 | So how do we do adaptation? Take feature-space MLLR, say. If it's global, it's implemented as a simple matrix. |
|---|
| 0:07:39 | The matrix doesn't need to know what it is; it's only the estimation that makes it an fMLLR transform. So the estimator knows about the acoustic model, and about the regression tree if you're using one. |
|---|
| 0:07:51 | If you're using a regression tree, the fMLLR object just has multiple transforms. The fMLLR object, however, doesn't know about the regression-tree concept; it just has a bunch of transforms. It's the Decodable object which knows how to apply them. |
|---|
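
The split of responsibilities can be sketched as follows, assuming each transform is stored as $W = [A \mid b]$ and applied to the extended vector $[\mathbf{x}; 1]$. A global fMLLR is the one-class special case. The class names and the Gaussian-to-class map are hypothetical. For a correct fMLLR likelihood, the log-determinant term $\log|\det A_c|$ must also be added for each class; this sketch omits it and only shows feature transformation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def apply_affine(W: Matrix, x: List[float]) -> List[float]:
    """y = A x + b, with W = [A | b] stored as rows of length len(x) + 1."""
    ext = list(x) + [1.0]
    return [sum(w * e for w, e in zip(row, ext)) for row in W]


class RegressionFmllr:
    """Hypothetical fMLLR object: just a bunch of transforms indexed by class.
    It knows nothing about regression trees."""

    def __init__(self, transforms: List[Matrix]):
        self.transforms = transforms


class FmllrDecodableFeatures:
    """Hypothetical part of a decodable: it knows which class each Gaussian
    belongs to, and therefore which transform to apply before scoring it."""

    def __init__(self, fmllr: RegressionFmllr, gauss_to_class: Dict[int, int]):
        self.fmllr = fmllr
        self.gauss_to_class = gauss_to_class

    def features_for(self, gauss: int, x: List[float]) -> List[float]:
        W = self.fmllr.transforms[self.gauss_to_class[gauss]]
        return apply_affine(W, x)
```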
| 0:08:14 | Similarly with MLLR. Obviously it has to know about the acoustic model, and MLLR can either take the acoustic model and give you an adapted model, that is, transform all the means and give you a new model, or it can do it lazily. |
|---|
| 0:08:33 | In that case, the decoder asks the Decodable for the likelihood from an adapted model, and the Decodable queries the MLLR object, which checks whether it has already computed this (it caches the means). If not, it gets the mean from the acoustic model, transforms it, caches it, and then returns it. |
|---|
| 0:08:56 | Which is how you would use it in a practical situation. |
|---|
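
The lazy path can be sketched like this, assuming a single global mean transform $\boldsymbol{\mu}' = A\boldsymbol{\mu} + \mathbf{b}$ stored as $W = [A \mid b]$. HTK writes the extended vector with the bias first, so the convention here is only one choice. The nested list `means[state][comp]` stands in for the acoustic model, `LazyMllrMeans` is a hypothetical name, and only means are adapted.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


class LazyMllrMeans:
    """Hypothetical lazy MLLR: transform a mean on first request, then cache it."""

    def __init__(self, means: List[List[List[float]]], W: Matrix):
        self.means = means  # read-only view of the acoustic model's means
        self.W = W
        self.cache: Dict[Tuple[int, int], List[float]] = {}
        self.num_transformed = 0  # how many means were actually transformed

    def adapted_mean(self, state: int, comp: int) -> List[float]:
        key = (state, comp)
        if key not in self.cache:
            ext = list(self.means[state][comp]) + [1.0]
            self.cache[key] = [sum(w * e for w, e in zip(row, ext)) for row in self.W]
            self.num_transformed += 1
        return self.cache[key]
```

Only the Gaussians the decoder actually visits get transformed. This is the practical advantage over adapting the whole model up front.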
| 0:09:05 | SGMMs have a very similar structure. Again, there is the Decodable for the SGMM (oh, that label should say "SGMM") and the GMM class. |
|---|
| 0:09:20 | The SGMM model has a UBM, which is why it needs to know about the GMM classes as well. |
|---|
| 0:09:29 | And just for convenience of coding: for the GMM classes, the accumulator and the updating class are the same, but for SGMMs they are different, because there are many update methods used in SGMMs. |
|---|
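
It may help to see how a basic SGMM (no substates) uses the UBM and expands a state vector into a GMM. The sketch below follows the standard SGMM formulation: means $\boldsymbol{\mu}_{ji} = M_i\mathbf{v}_j$ and weights $w_{ji} \propto \exp(\mathbf{w}_i^\top\mathbf{v}_j)$. A UBM is used to preselect the best Gaussian indices per frame. Function names are hypothetical, and covariances, substates and speaker terms are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def sgmm_state_params(M: List[Matrix], w: List[List[float]], v: List[float]):
    """Expand state vector v into per-Gaussian means M_i v and
    weights softmax_i(w_i . v) for a basic (no-substate) SGMM."""
    means = [[sum(a * b for a, b in zip(row, v)) for row in Mi] for Mi in M]
    logits = [sum(a * b for a, b in zip(wi, v)) for wi in w]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


def gaussian_selection(ubm_loglikes: List[float], k: int) -> List[int]:
    """Indices of the k best-scoring UBM Gaussians for a frame, best first."""
    order = sorted(range(len(ubm_loglikes)),
                   key=lambda i: ubm_loglikes[i], reverse=True)
    return order[:k]
```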
| 0:09:47 | Yeah, and the things that are next. |
|---|
| 0:09:51 | The first bullet point there, fMLLR bases for SGMMs, is already published. It's in the old code base and we need to put it in the new one; that's partially done already. |
|---|
| 0:10:06 | Then, a couple of weeks back, Dan presented the symmetric extension of SGMMs, and people keep asking what "symmetric" means. That's also partially done. |
|---|
| 0:10:24 | And, as Dan mentioned, we are waiting for lattice generation to be finished, and then we can add the discriminative training things. |
|---|
| 0:10:34 | Yes, there are parts still under discussion and debate, and one of them is supporting multiple feature transforms. Currently you only have global transforms, and they're just put into one chain. |
|---|
| 0:10:53 | Regression classes, yes: you can have regression classes for fMLLR, but then you would have to compose that with any other transform that has multiple transforms as well. |
|---|
| 0:11:16 | So that's the thing with feature transforms: there are two kinds of "multiple" here. First, for each type there are multiple transforms, and then multiple types are composed together. |
|---|
| 0:11:29 | I don't know; so far we haven't felt the need for it, but when we do, we'll think about how to do this. It will probably be handled at something like the Decodable object level, because nothing else needs to know about how the transforms compose. |
|---|
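
The "one chain" of global transforms can be folded into a single affine transform. The sketch below assumes each transform is stored as $W = [A \mid b]$; composing $(A_2, b_2)$ after $(A_1, b_1)$ gives $(A_2A_1,\; A_2b_1 + b_2)$. Non-square transforms, such as a dimension-reducing LDA, compose the same way. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def apply_affine(W: Matrix, x: List[float]) -> List[float]:
    """y = A x + b, with W = [A | b] stored as rows of length len(x) + 1."""
    ext = list(x) + [1.0]
    return [sum(w * e for w, e in zip(row, ext)) for row in W]


def compose(W2: Matrix, W1: Matrix) -> Matrix:
    """Affine transform equivalent to applying W1 first, then W2."""
    d0 = len(W1[0]) - 1  # input dimension of W1
    out = []
    for row2 in W2:
        a2, b2 = row2[:-1], row2[-1]
        new_a = [sum(a2[k] * W1[k][j] for k in range(len(W1))) for j in range(d0)]
        new_b = sum(a2[k] * W1[k][d0] for k in range(len(W1))) + b2
        out.append(new_a + [new_b])
    return out


def compose_chain(transforms: List[Matrix]) -> Matrix:
    """Fold a list of global transforms, applied first to last, into one."""
    W = transforms[0]
    for nxt in transforms[1:]:
        W = compose(nxt, W)
    return W
```

This folding works only because each transform is global. With regression classes, the transform used depends on the Gaussian being scored, so the composition can no longer be precomputed as a single matrix.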
| 0:11:45 | So that's the end of the overview of acoustic models. |
|---|