0:00:13 So, to give you an overview of the whole toolkit, I'm just going to give a brief description of how the acoustic model classes are organised. This is just to give you a flavour of what is meant by saying the code is modular, with parts that don't need to know about each other.
0:00:32 So, just to reiterate: what we currently support is mainly the standard maximum-likelihood training of acoustic models with GMMs. We have the usual linear transforms, like LDA and STC.
0:00:56 We also support speaker adaptation. Currently fMLLR is there, and we have tested it in the recipes. The MLLR code is there and it's unit-tested, but somebody still needs to write the script for it, so MLLR is not in the recipes yet; it's almost done. MLLR, obviously, works with regression trees, and fMLLR has two variations: either just a global transform, or multiple transforms with regression trees.
0:01:32 This is the point where Dan can mention that we had some discussion about whether to have a common base class for other kinds of systems and models. For now things are fairly simple, and we decided not to do it; maybe we will if the need is felt in future. A couple of times over the course of this development it seemed it might be good to have a system like that, but currently a GMM is a very specific thing, with means and covariances. In just a few slides you'll also see how the GMMs are implemented.
0:02:16 And the same goes for SGMMs: we also have the fMLLR adaptation code for SGMMs. There are a few results we had previously published which are still not in this new code base, but they're going to be added.
0:02:34 As has already been talked about, we have a GMM class, and it knows about nothing else other than what it contains, that is, its parameters. And there is the acoustic model class, which is just a vector of GMMs (for implementation reasons, a vector of pointers, but that's not an interesting detail).

0:02:56 The green arrow in these slides signifies a technical term we call "knows about". We have avoided inheritance as much as possible, so most of the time things are not inherited. If an object needs to keep track of another object, it does so by keeping a const reference to it; otherwise, a specific function will just take pointers and modify that object. So "knows about" is meant in the sense that, if you have to write the code, you have to include the header for the other thing.
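To make the "knows about" relationship concrete, here is a minimal Python sketch (the toolkit itself is C++, and the class and function names below are illustrative, not its actual API). The acoustic model holds GMMs by composition rather than inheritance, and a free function modifies a GMM it is handed instead of storing a reference to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagGmm:
    """Hypothetical minimal GMM: knows only about its own parameters."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def num_gauss(self) -> int:
        return len(self.weights)

    def dim(self) -> int:
        return len(self.means[0])


@dataclass
class AmDiagGmm:
    """The acoustic model is just a container of GMMs, one per pdf-id.
    It holds them by composition; nothing here inherits from DiagGmm."""
    densities: List[DiagGmm] = field(default_factory=list)

    def add_pdf(self, gmm: DiagGmm) -> int:
        self.densities.append(gmm)
        return len(self.densities) - 1  # pdf-id of the newly added GMM

    def num_pdfs(self) -> int:
        return len(self.densities)

    def get_pdf(self, pdf_id: int) -> DiagGmm:
        return self.densities[pdf_id]


def scale_means(gmm: DiagGmm, factor: float) -> None:
    """A 'modifies' relationship: a free function takes the object and changes it
    in place, without keeping any reference to it afterwards (hypothetical helper)."""
    gmm.means = [[m * factor for m in row] for row in gmm.means]
```

Because the model only stores GMMs, code that needs the model must "know about" (import) the GMM type, but the GMM never needs to know about the model.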
0:03:44um
0:03:47uh so so
0:03:48so
0:03:49the gmms are parametrized
0:03:51um
0:03:52using the natural parameters which is a which
0:03:55a natural parameters in the sense of um the that's of parameters of an mention distribution
0:04:00where uh if you right of the
0:04:02like your got you get
0:04:04um
0:04:05this too
0:04:06i think that the
0:04:08uh them
0:04:08the there is a
0:04:09uh the mean time
0:04:11the inverse of the covariance and the inverse of the covariance of the natural parameters of few M
0:04:15and the reason for doing that is then you can do the like your calculation
0:04:18using just
0:04:20two
0:04:20matrix vector multiplication locations because it or if you have diagonal covariance system
0:04:25you have your and
0:04:26you have the mean times
0:04:28in this covariance is the vector and say
0:04:30you five components are i mean
0:04:32i components
0:04:33and you have your data vector and
0:04:35you just
0:04:36do this to make exact vector
0:04:38but
0:04:40and
0:04:41there are last ratings for doing that obviously
0:04:43yeah a to blast
0:04:45is
0:04:46yeah not the most optimize thing but
0:04:48i mean it's still
0:04:49uh a nice
0:04:50um
0:04:51uh we of doing things
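The two-matrix-vector-product trick can be written out as follows. For a diagonal Gaussian, log N(x; μ, σ²) expands into a constant, a term linear in x weighted by μ/σ², and a term in x² weighted by 1/σ², so stacking those rows over all components gives the two products. This is a pure-Python sketch under that assumption: the plain `mat_vec` loop stands in for a BLAS call, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def natural_params(weights: List[float], means: List[List[float]],
                   variances: List[List[float]]) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Convert (w, mu, sigma^2) of a diagonal GMM into natural-parameter form:
      inv_vars[i][d]      = 1 / sigma^2_{id}
      means_invvars[i][d] = mu_{id} / sigma^2_{id}
      gconsts[i]          = log w_i - 0.5 * sum_d (log 2pi + log sigma^2_{id} + mu_{id}^2 / sigma^2_{id})
    """
    inv_vars = [[1.0 / v for v in row] for row in variances]
    means_invvars = [[m / v for m, v in zip(mr, vr)] for mr, vr in zip(means, variances)]
    gconsts = []
    for w, mr, vr in zip(weights, means, variances):
        c = math.log(w)
        for m, v in zip(mr, vr):
            c -= 0.5 * (math.log(2 * math.pi) + math.log(v) + m * m / v)
        gconsts.append(c)
    return gconsts, means_invvars, inv_vars


def mat_vec(mat: List[List[float]], vec: List[float]) -> List[float]:
    """Plain matrix-vector product; a real implementation would call BLAS gemv."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def component_loglikes(gconsts, means_invvars, inv_vars, x: List[float]) -> List[float]:
    """log(w_i N(x; mu_i, Sigma_i)) for every component i, via two products:
        loglike = gconsts + M x - 0.5 * V (x * x)
    """
    x_sq = [xi * xi for xi in x]
    linear = mat_vec(means_invvars, x)
    quadratic = mat_vec(inv_vars, x_sq)
    return [g + l - 0.5 * q for g, l, q in zip(gconsts, linear, quadratic)]


def log_likelihood(gconsts, means_invvars, inv_vars, x: List[float]) -> float:
    """Total GMM log-likelihood: log-sum-exp over the per-component values."""
    ll = component_loglikes(gconsts, means_invvars, inv_vars, x)
    top = max(ll)
    return top + math.log(sum(math.exp(v - top) for v in ll))
```

The per-component constants are computed once when the parameters change, so each frame costs only the two products plus a log-sum-exp.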
0:04:53 So here is a graphical overview of what Dan has already said. We have this acoustic model class, but when it goes into the decoder, it interacts through this Decodable object. The decoder knows only about the Decodable interface, and for each type of acoustic model we need to implement the Decodable interface for that model. The Decodable object is the one which knows about the features and does the likelihood computation. And this is exactly what the Decodable interface looks like.

0:05:35 So while we avoid using inheritance, the only exception is when we have interfaces, which we have for features, for Decodable and a few other things, and these are pure interfaces. That's the only case where we inherit. As you can see, it's simple: the main function is the log-likelihood computation, and the decoder can also find out whether there are no more frames, and essentially how many states you have. So for every other model type, you then inherit from this interface.
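Below is a Python sketch of the idea that the decoder only sees an abstract Decodable interface with three methods: per-frame log-likelihood, a last-frame check, and the number of states. The method names and the toy `best_path_score` "decoder" are assumptions for illustration, not the toolkit's actual signatures; the concrete class uses one diagonal Gaussian per pdf to stay short.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class Decodable(ABC):
    """Pure interface: the only thing the decoder knows about."""

    @abstractmethod
    def log_likelihood(self, frame: int, index: int) -> float:
        """Log-likelihood of the features at `frame` for state/pdf `index`."""

    @abstractmethod
    def is_last_frame(self, frame: int) -> bool:
        """True if `frame` is the last one, so the decoder knows when to stop."""

    @abstractmethod
    def num_indices(self) -> int:
        """How many states (pdfs) there are."""


class DecodableDiagGauss(Decodable):
    """Hypothetical implementation: one diagonal Gaussian per pdf, features in memory.
    Only this class knows about both the features and the model."""

    def __init__(self, means: List[List[float]], variances: List[List[float]],
                 feats: List[List[float]]):
        self.means = means
        self.variances = variances
        self.feats = feats

    def log_likelihood(self, frame: int, index: int) -> float:
        ll = 0.0
        for xd, md, vd in zip(self.feats[frame], self.means[index], self.variances[index]):
            ll -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        return ll

    def is_last_frame(self, frame: int) -> bool:
        return frame == len(self.feats) - 1

    def num_indices(self) -> int:
        return len(self.means)


def best_path_score(dec: Decodable) -> float:
    """Toy 'decoder' (no HMM topology): sums the best per-frame score,
    using nothing but the interface methods."""
    total, frame = 0.0, 0
    while True:
        total += max(dec.log_likelihood(frame, i) for i in range(dec.num_indices()))
        if dec.is_last_frame(frame):
            return total
        frame += 1
```

A new model type only needs another `Decodable` subclass; the decoder code does not change.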
0:06:23 That was decoding. For training, we similarly have an object for accumulating statistics for the GMMs, and in the same way that the acoustic model is just a vector of GMMs, the acoustic model accumulator is just a vector of per-GMM accumulator objects.
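The accumulator side can be sketched the same way: one accumulator per pdf, mirroring the vector of GMMs. This hypothetical version keeps zeroth-, first- and second-order statistics for a single diagonal Gaussian per pdf (a real GMM accumulator would also weight each frame by per-component posteriors) and shows the maximum-likelihood mean and variance update from those statistics.

```python
from typing import List, Tuple


class AccumDiagGauss:
    """Hypothetical per-pdf accumulator for a single diagonal Gaussian."""

    def __init__(self, dim: int):
        self.occ = 0.0            # zeroth-order stats: total occupancy
        self.x = [0.0] * dim      # first-order stats: sum of weighted features
        self.x2 = [0.0] * dim     # second-order stats: sum of weighted squared features

    def accumulate(self, feat: List[float], weight: float = 1.0) -> None:
        self.occ += weight
        for d, v in enumerate(feat):
            self.x[d] += weight * v
            self.x2[d] += weight * v * v

    def ml_estimate(self, var_floor: float = 1e-3) -> Tuple[List[float], List[float]]:
        """Maximum-likelihood mean and (floored) variance from the stats."""
        mean = [s / self.occ for s in self.x]
        var = [max(s2 / self.occ - m * m, var_floor) for s2, m in zip(self.x2, mean)]
        return mean, var


class AccumAm:
    """Mirrors the acoustic model: just a vector of per-pdf accumulators."""

    def __init__(self, num_pdfs: int, dim: int):
        self.accs = [AccumDiagGauss(dim) for _ in range(num_pdfs)]

    def accumulate_frame(self, pdf_id: int, feat: List[float], weight: float = 1.0) -> None:
        self.accs[pdf_id].accumulate(feat, weight)
```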
0:06:49 Okay, yes, sure. Yeah, my slides are not compatible.
0:07:02 And the red arrow means that this class modifies those classes. "Modifies" obviously implies that it also knows about them, but typically, for modification, it doesn't keep any reference to an object of the other class; it has a method which takes that object and does the modification.
0:07:25 So how do we do adaptation? Take feature-space MLLR: if it's global, it's implemented as a simple matrix, and the matrix doesn't need to know what it is; it's only the estimation which makes it an fMLLR matrix.
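A global fMLLR transform really is just a matrix. The sketch below assumes the usual convention W = [A | b], a D x (D+1) matrix applied to the extended feature [x; 1], since the talk doesn't spell out the layout; note that applying it needs no knowledge of the acoustic model. Function names are hypothetical.

```python
from typing import List


def apply_affine(W: List[List[float]], x: List[float]) -> List[float]:
    """Apply x' = A x + b, with W = [A | b] stored as a D x (D+1) matrix
    acting on the extended vector [x; 1]."""
    ext = list(x) + [1.0]
    assert all(len(row) == len(ext) for row in W), "W must be D x (D+1)"
    return [sum(w * e for w, e in zip(row, ext)) for row in W]


def transform_utterance(W: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """The matrix knows nothing about models: it just maps every frame."""
    return [apply_affine(W, f) for f in feats]
```

Only the estimator, which maximizes the likelihood of the transformed features under the acoustic model, makes W an fMLLR matrix rather than any other affine transform.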
0:07:45 So the estimator knows about the acoustic model, and about the regression tree if you're using one. If you're using regression trees, the transform object just has multiple transforms. That transform object, however, doesn't know about the regression-tree concept; it just has a bunch of transforms, and it's the decodable object that knows how to use them.
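With regression classes, the transform object is just a list of matrices, and the decodable is the piece that decides which one to apply. In this hypothetical sketch, `pdf_to_class` is assumed to come from the regression-tree-aware estimator and one diagonal Gaussian per pdf keeps it short. The log|det A_c| Jacobian term is included because, with a different transform per class, likelihoods are otherwise not comparable across classes.

```python
import math
from typing import List


def log_abs_det(A: List[List[float]]) -> float:
    """log|det A| by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ValueError("singular transform")
        M[c], M[p] = M[p], M[c]
        total += math.log(abs(M[c][c]))
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return total


class DecodableRegtreeFmllr:
    """Hypothetical decodable for regression-class fMLLR. The transforms are just a
    list of D x (D+1) matrices [A_c | b_c]; the only tree-derived information here is
    `pdf_to_class`, assumed to have been computed at estimation time."""

    def __init__(self, means: List[List[float]], variances: List[List[float]],
                 transforms: List[List[List[float]]], pdf_to_class: List[int],
                 feats: List[List[float]]):
        self.means, self.variances = means, variances
        self.transforms = transforms
        self.pdf_to_class = pdf_to_class
        self.feats = feats
        # Jacobian term log|det A_c| for each class, computed once.
        self.logdets = [log_abs_det([row[:-1] for row in W]) for W in transforms]

    def log_likelihood(self, frame: int, pdf_id: int) -> float:
        c = self.pdf_to_class[pdf_id]
        ext = self.feats[frame] + [1.0]
        y = [sum(w * e for w, e in zip(row, ext)) for row in self.transforms[c]]
        ll = self.logdets[c]
        for yd, md, vd in zip(y, self.means[pdf_id], self.variances[pdf_id]):
            ll -= 0.5 * (math.log(2 * math.pi * vd) + (yd - md) ** 2 / vd)
        return ll
```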
0:08:14 Similarly with MLLR: obviously that has to know about the acoustic model, and MLLR can either take the acoustic model and give you an adapted model, that is, transform all the means and give you a new model, or it can do it lazily. In the lazy case, the decoder asks the decodable for the likelihood from an adapted model, and the decodable queries the MLLR object, which checks whether it has already computed that adapted mean (it caches the means). If not, it gets the mean from the acoustic model, transforms it, and caches it. That is how you would use it in a practical situation.
0:09:05 SGMMs have a very similar structure. Again there is the decodable for the SGMM (that should say SGMM on the slide) and the GMM class. The SGMM model has this UBM, which is why it needs to know about the GMM classes as well. And just for convenience of coding, for the GMM classes the accumulating and updating class is the same one; for SGMMs they are different, because there are many big methods used in the SGMM update.
0:09:47 And things that are not there yet: the first bullet point there, fMLLR bases for SGMMs, is already published and is in the old code base; now we need to put it in the new one, and that's partially done. Then, a little while back, Dan presented the symmetric extension of SGMMs (people keep asking what "symmetric" means); that's also partially done. And then, as Dan mentioned, we are waiting for lattice generation to be finished, and then we can add discriminative training.
0:10:34 Yes, there were discussions and debates on this: on supporting multiple feature transforms. Currently you can only have global transforms, and they're just put into one chain. As for regression classes: you can have regression classes for fMLLR and MLLR, but then you can't compose them with any other transform which has multiple transforms as well.
0:11:05 So that's the issue with multiple feature transforms: the multiplicity is at two levels. First, for each type there are multiple transforms, and then multiple types are composed together. So far we haven't felt the need for it, but when we do, we'll think about how to do it, and it will probably be handled at something like the Decodable object level, because nothing else needs to know about how the transforms are composed.
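For the case that is supported, global transforms put into one chain, the chain can be collapsed into a single affine transform. A sketch assuming each transform is stored as [A | b] acting on the extended vector [x; 1], for example an LDA-style projection followed by a global fMLLR; function names are hypothetical. This does not cover the regression-class case discussed above, where the composed transform would depend on the class.

```python
from typing import List

Matrix = List[List[float]]


def compose_affine(W2: Matrix, W1: Matrix) -> Matrix:
    """Return W such that W [x; 1] == W2 [W1 [x; 1]; 1].
    W1 is D1 x (D0+1), W2 is D2 x (D1+1); the result is D2 x (D0+1)."""
    d0 = len(W1[0]) - 1
    # Extend W1 with the row [0 ... 0 1] so that it maps [x; 1] to [y; 1].
    W1_ext = [row[:] for row in W1] + [[0.0] * d0 + [1.0]]
    return [[sum(W2[i][k] * W1_ext[k][j] for k in range(len(W1_ext)))
             for j in range(d0 + 1)] for i in range(len(W2))]


def apply_affine(W: Matrix, x: List[float]) -> List[float]:
    ext = list(x) + [1.0]
    return [sum(w * e for w, e in zip(row, ext)) for row in W]


def compose_chain(transforms: List[Matrix]) -> Matrix:
    """Collapse an ordered list of global transforms (applied first to last) into one."""
    total = transforms[0]
    for W in transforms[1:]:
        total = compose_affine(W, total)
    return total
```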
0:11:45 So that's the end of my overview of the models.