0:00:15 All right. So I'm going to present something that we have been working on during the last workshop at Johns Hopkins: trying to explore whether there is any useful information in the GMM weights, because the i-vectors only try to adapt the means. As you probably all know, the i-vector is related to adapting the means, and it has been very well applied for speaker, language, dialect and many other applications.
0:00:50 The story behind only adapting the means goes back to GMM MAP adaptation with the UBM, the universal background model, as basis, where typically only the means are adapted. So we tried to revisit whether, beyond what the i-vectors capture, there is useful information if we train the weights, or even the variances. Patrick actually already tried the variances for JFA.
0:01:21 So in this work we tried to do something with the weights. A lot of techniques have already been proposed for the weights, and we built a new one called non-negative factor analysis; it was developed with a student from Belgium who was visiting MIT. We first tried it for language ID, where we actually had some success. The reason is that for language ID, if you have a UBM, the Gaussians are supposedly something like phonemes, so if a phoneme does not appear in some language, the counts for the corresponding Gaussians can be close to zero, and the weights of those Gaussians can carry useful information. That's what we found out, and that's what motivated us to check whether there is information in the GMM weights that can also be used for speaker recognition. This is ultimately the topic of this work.
0:02:18 We also compared this non-negative factor analysis, NFA, to an already existing technique, the subspace multinomial model, and basically this presentation is a comparison between the two for the case of GMM weight adaptation. For adapting the GMM means, a lot of techniques have already been applied: maximum a posteriori, maximum likelihood, and eigenvoices, which were the starting point of all the newer technology like JFA and i-vectors. There have also been a lot of weight adaptation techniques, for example maximum likelihood, non-negative matrix factorization and the subspace multinomial model, and then the new one we propose, non-negative factor analysis.
0:03:10 So the idea behind it is the i-vector concept; I don't want to bore you with this. You say that for a given utterance there is a UBM, which is a prior over all the sounds, how the sounds look like, and the i-vector tries to model the shift from this UBM to a given recording, which can be modeled by a low-dimensional matrix representation. The coordinates of a recording in this space are what we call the i-vector.
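As a toy illustration of this mean-shift idea (all sizes and values below are made up, just to show the shape of the model, not the real system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: C Gaussians, F-dimensional features, R-dimensional i-vector.
C, F, R = 4, 3, 2
m = rng.normal(size=C * F)        # UBM mean supervector (stacked component means)
T = rng.normal(size=(C * F, R))   # low-rank total-variability matrix

w = rng.normal(size=R)            # coordinates of one recording: the "i-vector"
M = m + T @ w                     # shifted (adapted) mean supervector
```

The i-vector `w` is all that is kept per recording; the low-rank shift `T @ w` moves every component mean at once.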
0:03:38 So we tried to use the same concept, which was done for the means, for the weights. The only difference we were facing is that the weights should all be positive and they should sum to one.
0:03:52 I can explain that later. So, in order to adapt the weights: when you have a UBM, a universal background model, and a sequence of features, you can compute some counts, which are the posterior probabilities of occupation of each Gaussian given each frame. The objective function in the weights case is based on the Kullback-Leibler divergence between the counts and the weights that you want to model. And if you take these counts and normalize them by the length of the utterance, you get the maximum likelihood estimate of the weights, which is easy to do.
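A minimal sketch of those two steps, the per-frame occupation posteriors summed into zeroth-order counts and the count normalization that gives the ML weights, assuming a toy diagonal-covariance UBM with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy diagonal-covariance UBM: C Gaussians, F-dim features, T frames.
C, F, T = 3, 2, 50
means = rng.normal(size=(C, F))
var = np.ones((C, F))
w_ubm = np.full(C, 1.0 / C)            # UBM weights
X = rng.normal(size=(T, F))            # one "recording"

# Per-frame log-likelihood of each Gaussian, plus log weight.
ll = (-0.5 * (((X[:, None, :] - means) ** 2 / var).sum(-1)
              + np.log(2 * np.pi * var).sum(-1))
      + np.log(w_ubm))
gamma = np.exp(ll - ll.max(axis=1, keepdims=True))
gamma /= gamma.sum(axis=1, keepdims=True)   # occupation posterior per frame

n = gamma.sum(axis=0)     # zeroth-order counts, one per Gaussian
w_ml = n / n.sum()        # ML weight estimate: counts over utterance length
```

Dividing the counts by the utterance length (the total count) is exactly the "normalize by the length" step mentioned above.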
0:04:42 So, for example, the first technique, which unfortunately we could not compare with for this paper, is non-negative matrix factorization. You suppose you have the weights, and you say that this weight matrix can be split into two non-negative matrices: the first one is the basis of your space, and the second one is the coordinates in that space. This decomposition can be found by optimizing the auxiliary function.
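A rough sketch of that factorization idea, using the classic multiplicative updates for non-negative matrix factorization under squared error (the variable names, sizes, and update rule are my illustration, not what the cited work uses):

```python
import numpy as np

rng = np.random.default_rng(2)

# W: one row of ML weights per recording (20 recordings, 8 Gaussians).
W = rng.dirichlet(np.ones(8), size=20)
k = 3                               # rank of the decomposition

A = rng.random((20, k)) + 1e-3      # per-recording coordinates
H = rng.random((k, 8)) + 1e-3       # non-negative basis of the space

# Lee-Seung multiplicative updates: both factors stay non-negative.
for _ in range(200):
    H *= (A.T @ W) / (A.T @ A @ H + 1e-12)
    A *= (W @ H.T) / (A @ H @ H.T + 1e-12)

err = np.linalg.norm(W - A @ H)
```

Here `H` plays the role of the basis and `A` the coordinates, the split described above.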
0:05:20 OK, so that is one we unfortunately did not have time to do a comparison with. What we did do is compare with the subspace multinomial model, so we tried to compare those two.
0:05:33 The idea behind the subspace multinomial model is this: you have the counts, and you try to find a multinomial distribution that fits this distribution. It is defined by a subspace, an i-vector-like space, on top of the UBM weights, and it is normalized so that the weights sum to one. They have several papers on how to do the optimization, and they have a solution for that.
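The parameterization they describe can be sketched like this: log UBM weights plus a low-rank term, pushed through a softmax so the result is a valid multinomial (toy sizes and my own variable names):

```python
import numpy as np

rng = np.random.default_rng(3)

C, R = 8, 3                             # Gaussians, subspace dimension
m = np.log(np.full(C, 1.0 / C))         # log UBM weights: origin of the subspace
T = rng.normal(scale=0.5, size=(C, R))  # subspace matrix

r = rng.normal(size=R)                  # low-dimensional vector for one recording
logits = m + T @ r
w = np.exp(logits - logits.max())
w /= w.sum()                            # softmax: positive and sums to one
```

Because of the softmax, no explicit constraint is needed here, unlike in the NFA model discussed next.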
0:06:06 So, for example, for the SMM: suppose you have two Gaussians, and each point here is the maximum likelihood estimate of the weights for a given recording. For this example, the points were actually generated from a subspace multinomial distribution; we generated them from this model because we believe that in a high-dimensional space the data should be distributed like this. If you take a lot of data and train only two Gaussians, the data would be everywhere, but in a high-dimensional space it would not, so I tried to simulate that: a high-dimensional GMM with two Gaussians. That's what we did, following what other people did, and we generated data from this model.
0:07:07 And we show the difference between this model and the non-negative factor analysis. In non-negative factor analysis we say, the same as for the i-vectors, that we have a UBM, and for each recording the weights can be explained by a shift from the UBM weights in the direction of the data. This is the same as the i-vector: the matrix can be low rank, and the coordinates are the new "i-vector" in this new space.
0:07:34 The only problem we were facing is that the weights for each recording should always be positive and should sum to one. So we developed a kind of EM-like algorithm: we alternate, accumulating some statistics and using gradient ascent to estimate the low-dimensional vectors, and then, once we obtain those, estimating the subspace matrix with projected gradient ascent, where the projection that we use enforces the constraints that the weights always sum to one and are always positive.
0:08:15 That's what we did. If you want more explanation, I don't have time for that here, but you can find it in the paper. Remember, this is the auxiliary function for the GMM weights case, this is our weight model, and we would like to estimate it subject to the constraint that the weights sum to one. So what we did, we just assume that the product of a vector of ones with the weights equals one, so they sum to one, and that they should all be positive.
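A loose sketch of this constrained estimation for a single recording: maximize the auxiliary sum of n_c log w_c over w = b + L r, with a Euclidean projection onto the probability simplex handling the sum-to-one and positivity constraints. The projection routine is the standard simplex projection; the step size, sizes, and the fact that only r is updated are my simplifications, not the exact algorithm of the paper:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

rng = np.random.default_rng(4)
C, R = 8, 3
b = np.full(C, 1.0 / C)                        # UBM weights
L = rng.normal(scale=0.05, size=(C, R))        # weight subspace
n = rng.integers(1, 50, size=C).astype(float)  # toy counts for one recording

r = np.zeros(R)
for _ in range(200):   # projected gradient ascent on sum_c n_c log w_c
    w = project_simplex(b + L @ r)
    r += 1e-5 * (L.T @ (n / np.maximum(w, 1e-6)))

w = project_simplex(b + L @ r)
```

In the talk's algorithm this is alternated with an update of `L` itself; here only `r` is estimated, to show how the projection keeps the weights feasible.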
0:08:47 So these are the two constraints that allow us to keep the weights summing to one and positive. Now, if you compare what the non-negative factor analysis does to what the subspace multinomial model does: for this case, for example, the SMM is definitely fitting the data very well, because the data was generated from it, while the NFA only gives an approximation of the data. Each has a benefit and a disadvantage. The SMM has a tendency to overfit the data, because it models the distribution of the training data really well, but when you go to the LID task it sometimes does not generalize well; so, as BUT did, you use regularization to try to control this overfitting, with a regularization term that you tune. In our case we do not suffer too much from this: we do not fit the training data very well, but we approximate it and sometimes generalize better, sometimes worse than the SMM. It depends on the application, to be honest; we compared them for several applications, and sometimes one is a bit better, sometimes the other. Anyway, the difference is this: the SMM can fit the training data really well but can have an overfitting problem that needs to be controlled with regularization, and the NFA approximates the data and will sometimes generalize better.
0:10:26 So these are the experiments. We first trained an i-vector system on all the data that we have, and we tested it on the telephone condition of NIST 2010. We have a UBM of two thousand forty-eight Gaussians, and these are the more technical details: we have i-vectors of dimension six hundred, and we use the LDA, length normalization, PLDA scheme. Then we take the i-vector for the means and a weight vector from the SMM and from the NFA, and we try fusion, to see how we can combine them. Simple score fusion didn't help a lot, so we went for i-vector-level fusion, which seems to be a little bit better, but not by much for speaker; for language ID it actually helps a lot.
0:11:24 For example, I tried to see how the dimensionality of this new weight adaptation behaves compared to the i-vectors. So I took the non-negative factor analysis and trained it with dimensions five hundred, one thousand, and one thousand five hundred; remember that the starting UBM has two thousand forty-eight Gaussians. And this is the LDA dimensionality reduction applied before length normalization. You can see that the differences are not really big when varying the dimension for LDA, and even if you compare between five hundred and one thousand, the difference is not really big. We were a little bit surprised, especially for the NFA, and we have seen the same behavior for the SMM as well, although sometimes the SMM needs to be lower-dimensional, while the non-negative factor analysis tends to work better at higher dimensions compared to the other one.
0:12:27 So here, for example, we compare the best result that we obtained from the non-negative factor analysis to the one from the subspace multinomial model, for the core condition, male and female, and for eight conversations. We can see that there is actually not too much difference; sometimes the NFA is a little better, sometimes a little worse than the SMM. But you can see that for the eight-conversation condition you can get a very nice result even without using the GMM means, just the weights.
0:13:04 Now, if you compare with the i-vectors: as I said, we take the maximum likelihood estimate of the weights, take the log, and feed that to LDA. Maybe that is not the best way to do it; maybe you can do something cleverer. It seems that the ML weights with the log were worse compared to the SMM and the NFA for all the conditions: eight conversations, male, female, and the core condition. So now we remove the maximum likelihood estimate from the loop and put the i-vectors here, and we can see that the i-vector system is usually about twice as good; the i-vector is definitely much better than the weights.
0:13:59 The gap is not too big, but if you go to the eight-conversation condition it is actually pretty good, because the error rate is very low. So even when you have a lot of recordings from a speaker, the weights can give you almost as much useful information as the i-vector can. That was sort of surprising for us.
0:14:21 So here what we have is the minimum DCF at the old and the new operating points and the EER, with the baseline being the i-vector system, for female and male, and then the i-vectors fused with the weights using i-vector-level fusion. This is the NFA: if we add the NFA we win a little bit here and lose a little bit there, but not by much. For female, when we fuse with the SMM, we get improvement again for the new DCF operating point and even for the EER; so for female the SMM was the best to fuse with. For male, you can see that the NFA was better at most of these operating points, but not really in the new minimum DCF. So this fusion was not really exciting, to be honest; it was a little bit of improvement, really small compared to what was seen for language ID.
0:15:22 Now, since the i-vector dimensionality is related to the dimensionality of the supervector, we cannot simply keep increasing the UBM size; but for the GMM weights, the dimensionality is only related to how many Gaussians you have. So we tried to increase and decrease the UBM size and see what happens; we did that only for the non-negative factor analysis. You can see that if you increase the number of Gaussians in the UBM, you get a very nice improvement for both male and female, especially in the new minimum DCF. So, since the weight dimensionality is not tied to the size of the supervector, you can increase the number of Gaussians in the UBM; you could even think about using a speech recognizer and trying senones if you want.
0:16:26 So what we did here is take the baseline again, the i-vectors, and try to fuse it with the weight vectors from the different UBM sizes. You can see that the conclusions are not really consistent: even though you get better results with two thousand forty-eight Gaussians, the fusion for female didn't help much, to be honest it was actually worse, and for male it was a little bit worse as well. So getting better results with the weights alone doesn't mean that you will get better fusion with the i-vectors as you increase the number of Gaussians.
0:17:23 So, as a conclusion: we tried to use the weights, and to ask whether it is worth finding a better way of using the weights and updating them as well, not only the means, which is what the i-vector is doing. We have seen some slight improvements when combining them; maybe we need to find a better way to combine them, for example something similar to what subspace GMMs are doing for speech recognition, I don't know. We are working on that, and hopefully we will make some progress. I also tried doing it iteratively: you estimate the weights, you update the GMM weights of the UBM, and then you extract the statistics again and extract the i-vectors. It didn't help for speaker, to be honest; I tried it and it gave the same results, no improvement, nothing. I haven't tried it for language ID, only for speaker. Thank you.
0:18:34 [Audience:] I have a comment more than a question, to help you understand where I am coming from. We have worked a lot on the weights in our group, we are also looking at the weights with our own approach, and a colleague also has some results. It has seemed to me since the beginning that the weights are a very interesting, very nice source of information, but in fact it is binary information. Why? If you come back to UBM-GMM and to the top-Gaussian scoring results: when top-Gaussian scoring was proposed, you put one on the best Gaussian and zero on all the others, and the loss of performance was quite small. After that, in later results, the best solution was to use rank-based normalization, and rank-based is very close to putting one on some Gaussians and zero on the others. And in more recent results, using just the zero-and-one information of the weights, it seems that we are able to find a lot. So, according to me, the weights represent binary information, whether a Gaussian is there, yes or no, and not continuous information. And you are trying to model it as continuous.
0:20:23 [Speaker:] So there is a good point here, because when I started working with the non-negative factor analysis, my first question, my first thought, was exactly that: I wanted sparsity in the weights. This model is not able to do that with what we are doing now. And I agree: with top-one or top-five, it is like imposing sparsity on the weights, putting zeros everywhere and keeping only the top five, for example. But for this model, we are not doing that. That was actually my first comment when this came up: how can we make it sparse, based exactly on what you are saying.
0:21:28 [Audience:] You could extract the i-vectors adaptively: you adapt the UBM before you extract, and then for each frame there are very few active Gaussians. That's what happens. I don't know if that is the solution to your problem, but you would get sparsity that way. [Speaker:] Okay, thanks.
0:22:03 [Audience:] So this kind of follows up on Patrick's question. You are doing sequential estimation for the L's and the r's; how many iterations do you go through to get that? [Speaker:] Around ten of the EM-like outer iterations, and inside each one there is a gradient ascent: I think five iterations for r and three for L. [Audience:] I'm asking because to me it's interesting to see the rate of convergence you might actually hit, and I know it's extra work. In your evaluations, I believe you evaluate when you believe you have converged; did you run any earlier system? Let's say that before hitting five iterations you try two, just to see where you actually are; maybe there are certain dimensions of the vector that get active early, and you might actually see something. There might be some insight in it.
0:23:06 [Speaker:] I tried that, but not in this context, not with the constraints enforced. The NFA is a little bit sensitive: if you iterate too much, say fifteen iterations, you see the results start to degrade; after some point the degradation starts to show. Usually, between five and eight iterations, you have already saturated.
0:23:40 [Speaker:] Yes, we need to control that a little bit. If you let it go too far, it hurts you. Actually, the SMM, especially for sparsity, is much better, because it will really fit the data, whatever you give it, while the NFA would not do that, because it is an approximation. That's my issue with the NFA. The SMM would definitely give you some sparsity if you know how to control it, because otherwise you might overfit. But probably others know this model better than me; you were doing this, isn't that right?
0:24:30 [Audience:] Actually, when we did this work, we tried different optimization algorithms. With the approximate Hessian it converged quite well in a few iterations, and also, like the question before, even with a few iterations you already got quite good results. And if you kept on iterating, you got some degradation, so it looked like it starts overfitting the model. So I guess you are seeing something similar.