0:00:39 In this talk I will present some techniques for efficient i-vector extraction. We went looking for some way to address the memory footprint of the i-vector extractor and its extraction time.
0:00:58 State-of-the-art speaker recognition technology nowadays is based on i-vectors, which give very good accuracy, but the computation of an i-vector can be quite demanding in terms of both memory and time.
0:01:15 Some solutions have been proposed for i-vector extraction with low memory requirements, namely the diagonalized approximations proposed in prior work, but these have been shown to cause some degradation in accuracy. So we were looking for a solution which does not incur such degradation but still greatly reduces the amount of memory required to store the extractor.
0:01:54 This is the outline of the talk: first we recall the original Bayesian formulation of i-vector extraction and the variational Bayes approach that we presented in a previous work; then we present our conjugate gradient approach for i-vector extraction; and finally we present some experimental results for these techniques.
0:02:17 I guess everybody here knows what i-vectors are, but here is a brief introduction: i-vectors are low-dimensional, informative representations of each utterance, obtained from a generative latent variable model.
0:02:32 The most widely used i-vector framework assumes that most of the speaker and channel variability lies in a small subspace of the supervector space. We assume a standard Gaussian prior for the latent variable representing this variability; then, approximating the data likelihood by means of the Baum-Welch statistics, we can compute the posterior of this latent variable, and we take the i-vector as the maximum a posteriori estimate of the latent variable. It can be shown that this posterior is Gaussian, so the MAP estimate corresponds to the posterior mean, and that gives us the i-vector.
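For reference, here is a standard form of these equations, reconstructed from the i-vector literature rather than from the slides (notation: C UBM Gaussians with zero-order statistics N_c, covariances Σ_c, eigenvoice sub-matrices T_c, and first-order statistics f_c centered on the UBM means):

$$ s = m + T w, \qquad w \sim \mathcal{N}(0, I) $$

$$ L = I + \sum_{c=1}^{C} N_c\, T_c^{\top} \Sigma_c^{-1} T_c, \qquad \hat{w} = L^{-1} \sum_{c=1}^{C} T_c^{\top} \Sigma_c^{-1} f_c $$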
0:03:19 As you can see, the expensive part is the computation of the posterior precision matrix, which entails, for each Gaussian, a multiplication of the transposed eigenvoice matrix times the covariance-weighted eigenvoice matrix; these products are matrices whose size is quadratic in the i-vector dimensionality. Here C represents the number of Gaussians, F the feature dimensionality, and M the i-vector dimensionality.
0:04:05 If we do not precompute anything, we have a time complexity which is quadratic in the i-vector dimensionality and linear in the number of Gaussians and in the feature dimensionality. We can reduce this complexity by precomputing the per-Gaussian matrix products, but then we have a memory cost which is again quadratic in the i-vector dimensionality and proportional to the number of Gaussians. With typical values, such as the 2048 Gaussians of the UBM used in this work, this is easily the most memory-expensive part of an i-vector extractor.
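As a rough illustration of this memory cost (the i-vector dimensionality M = 400 and single-precision storage are assumptions for this example, not values stated in the talk), storing the per-Gaussian products $T_c^{\top}\Sigma_c^{-1}T_c$ takes

$$ C \cdot M^2 \cdot 4\ \text{bytes} \;=\; 2048 \cdot 400^2 \cdot 4 \;\approx\; 1.3\ \text{GB}, $$

or about half of that if the symmetry of each matrix is exploited.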
0:04:52 To address this, an approximation based on the eigen-decomposition of the per-Gaussian matrices was proposed. I forgot to mention that we can obtain the same formulation just by performing a normalization of the first-order statistics and, correspondingly, of the eigenvoice matrix. If we then assume that the per-Gaussian matrices are simultaneously diagonalized by some orthogonal matrix Q, we can compute an approximation of the posterior covariance which is diagonal, and extraction can be performed in a much faster way with very limited additional memory requirements. However, this approximation can cause a degradation of recognition accuracy, so we wanted to do better in terms of accuracy.
0:05:52 As we said, the problem is the computation of the posterior covariance matrix, and the problem is that this matrix is not diagonal. If it were diagonal, the i-vector components would be uncorrelated and the posterior would factorize. So, even though the exact posterior does not factorize over the different components, we look for an approximation of the posterior which factorizes over subsets of the i-vector components: we partition the i-vector components into disjoint sets, and we assume that the posterior can be approximated by a distribution which factorizes over these sets. The variational Bayes framework provides a way to estimate this approximate posterior by minimizing the KL divergence between the original posterior and its approximation.
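In symbols, with the i-vector w partitioned into K disjoint blocks $w_1, \dots, w_K$, the factorized approximation and its objective are (a standard variational Bayes formulation, assumed here to match the talk):

$$ q(w) = \prod_{i=1}^{K} q_i(w_i), \qquad q^{*} = \arg\min_{q} \mathrm{KL}\big(q(w)\,\big\|\,p(w \mid \mathcal{X})\big) $$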
0:06:55 To write the updates I need to introduce some notation: for each block of i-vector components we denote by T_i the sub-matrix of the eigenvoice matrix whose columns are associated with the i-th block, and by T_{¬i} the columns associated with the complementary subset, so that we can express the supervector decomposition as a sum of the two contributions. If we derive the variational update for each factor of the approximate posterior, its distribution is again Gaussian, with an expression which is very similar to the original i-vector formulation. The difference is that the precision matrix here is computed using only the eigenvoices relative to the subset, and for the mean of the posterior we are essentially centering the statistics around a slightly modified UBM: if we assume that the other components of the i-vector are fixed, we can fold them into the means of this new UBM and center the first-order statistics accordingly.
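A hedged reconstruction of the resulting block update, using the notation just introduced ($T_{c,i}$ are the rows of $T_i$ for Gaussian c, and $\hat{w}_{\neg i}$ collects the current estimates of all other blocks):

$$ L_i = I + \sum_{c} N_c\, T_{c,i}^{\top} \Sigma_c^{-1} T_{c,i}, \qquad \hat{w}_i = L_i^{-1} \sum_{c} T_{c,i}^{\top} \Sigma_c^{-1}\big( f_c - N_c\, T_{c,\neg i}\, \hat{w}_{\neg i} \big) $$

The term in parentheses is exactly the re-centering of the first-order statistics around the modified UBM described above.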
0:08:16 This also lets us see what the complexity of this approach would be. We do not obtain any savings with a naive implementation of this technique: if we just recompute this expression every time, say with blocks of size one, the complexity is again quadratic in the i-vector dimensionality, because every time we would have to re-center the statistics from scratch. What we need instead is to keep a supervector of first-order statistics which is always kept centered around the current i-vector estimate: the new mean for a block is computed by removing from these centered statistics the contribution of the components we are re-estimating, and after we have updated the block we update the vector of first-order statistics so that it is again centered around the current i-vector estimate. This way, if we leave aside the contribution of computing the precision matrices, the complexity of this approach is proportional to the dimensionality of the i-vectors and to the number of iterations that we need to perform to compute the i-vector.
0:09:41 You can also see the similarity of this formulation with the original i-vector one: the posterior covariance matrices here are essentially the diagonal blocks of the original precision matrix. We can adopt two different strategies to compute them: we can redo the computation of these covariance matrices every time, or we can precompute and store the block-diagonal part of the precision matrix. In the latter case we get a faster extraction time but slightly higher memory requirements, and the memory requirements depend on the size we choose for the blocks.
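A minimal runnable sketch of this block variational Bayes extraction, assuming statistics already whitened by the UBM covariances (so Σ_c = I) and with hypothetical variable names; it precomputes the per-block precision matrices and uses the re-centering trick described above:

```python
import numpy as np

def vb_ivector(N, F, T, block_size=20, n_iter=3):
    """Block variational Bayes i-vector extraction (illustrative sketch).

    N : (C,) zero-order statistics, one per UBM Gaussian
    F : (C*feat_dim,) first-order statistics supervector, centered on the
        UBM means and whitened by the UBM covariances
    T : (C*feat_dim, M) whitened eigenvoice matrix
    """
    M = T.shape[1]
    feat_dim = T.shape[0] // len(N)
    Nexp = np.repeat(N, feat_dim)      # Gaussian count for each supervector row
    w = np.zeros(M)
    Fc = F.copy()                      # invariant: Fc = F - Nexp * (T @ w)
    blocks = [np.arange(s, min(s + block_size, M))
              for s in range(0, M, block_size)]
    # Precompute the small per-block precisions (block diagonal of the full L).
    Ls = [np.eye(len(b)) + T[:, b].T @ (Nexp[:, None] * T[:, b])
          for b in blocks]
    for _ in range(n_iter):
        for b, L in zip(blocks, Ls):
            Fc += Nexp * (T[:, b] @ w[b])     # remove this block's contribution
            w[b] = np.linalg.solve(L, T[:, b].T @ Fc)   # Gaussian mean update
            Fc -= Nexp * (T[:, b] @ w[b])     # re-center around the new estimate
    return w
```

With block_size=1 this reduces to a component-by-component update; larger blocks trade the memory for the stored block precisions against fewer, cheaper iterations.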
0:10:19 Essentially, we can show that this variational Bayes approach implements a Gauss-Seidel-like approach to the solution of the linear system that defines the i-vector. We also investigated different techniques for solving this system, namely the Jacobi method and the conjugate gradient method. What we found out is that the Jacobi method is very similar to this approach, but instead of updating the i-vector after each block, the i-vector is updated only after all components have been re-estimated; in our experiments this causes a slightly slower convergence rate.
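In symbols (same reconstructed notation as above, with superscripts denoting iterations): Jacobi computes every block from the previous iterate, whereas the variational, Gauss-Seidel-style update uses the freshest estimates as soon as they are available:

$$ \text{Jacobi:}\quad \hat{w}_i^{(k+1)} = L_i^{-1} \sum_{c} T_{c,i}^{\top} \Sigma_c^{-1}\big( f_c - N_c\, T_{c,\neg i}\, \hat{w}_{\neg i}^{(k)} \big) $$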
0:11:06 Then we analyzed the conjugate gradient method. What is nice about conjugate gradient is that we do not need to keep the precision matrix in memory; in fact, we do not even need to compute it at all, because we just need the product of this matrix times a generic vector, which is all the conjugate gradient algorithm requires. If we write the computation of this product in the appropriate form, we can see that it can be evaluated with a cost which is linear in the number of UBM components, in the number of features, and in the dimensionality of the i-vector. So we have a complexity which is the same as the variational Bayes approach.
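A minimal sketch of this matrix-free formulation, again under the whitened-statistics assumption and with hypothetical names; SciPy's conjugate gradient solver only ever calls the matvec, so the precision matrix never materializes:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def cg_ivector(N, F, T, feat_dim):
    """Conjugate gradient i-vector extraction (illustrative sketch).

    Solves L w = T' F with L = I + sum_c N_c T_c' T_c without ever
    forming L: each CG step only needs the matrix-vector product L @ v.
    Assumes statistics and T already whitened by the UBM covariances.
    """
    M = T.shape[1]
    Nexp = np.repeat(N, feat_dim)      # Gaussian count for each supervector row

    def matvec(v):
        # L @ v = v + T' diag(Nexp) T v : O(C*feat_dim*M) time, no MxM storage
        return v + T.T @ (Nexp * (T @ v))

    L = LinearOperator((M, M), matvec=matvec, dtype=np.float64)
    w, info = cg(L, T.T @ F)           # right-hand side: T' times centered stats
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return w
```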
0:11:57 What is also nice about this technique is that it does not require any additional memory, and, as for the variational Bayes approach, we can use it with a full-covariance UBM if we pre-whiten the statistics and the eigenvoice matrix with the UBM covariances once.
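For completeness, the pre-whitening mentioned here can be written as a one-time per-Gaussian transformation (assuming a Cholesky factorization $\Sigma_c = G_c G_c^{\top}$, which also covers the full-covariance case):

$$ \tilde{T}_c = G_c^{-1} T_c, \qquad \tilde{f}_c = G_c^{-1} f_c, $$

after which all of the formulas above hold with identity covariances.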
0:12:21 Now I will show you some results on the female part of the extended telephone condition. Our setup uses 60-dimensional features and a UBM with 2048 components, and we use a PLDA classifier on length-normalized i-vectors. Due to time limitations I will show only a summary of the results.
0:13:00 Before showing the results, let me point out that there are two directions of interest here. The first is that these iterative techniques converge to the exact i-vector, so if we run them to convergence we can recover exactly the same accuracy as the baseline classifier. The interesting question, then, is whether we can stop earlier and still achieve good results with a faster extraction process.
0:13:42 Here I am showing the results of the baseline system with exact i-vectors, of the approximated (diagonalized) i-vectors, and of the variational Bayes approach with block sizes such as ten and twenty. Two stopping thresholds were chosen; the criterion was evaluated using the two-norm of the difference between two successive variational Bayes i-vector estimates, that is, essentially a threshold on the two-norm of the residual. With the looser threshold the estimation takes between two and three iterations, with the stricter one between three and four.
0:14:37 Essentially, what we see is that most of the systems approach the baseline performance, and this is the reason why we then compared the time required by the different systems, including the time required to compute the statistics. The conjugate gradient system is the one with the lowest memory requirements, and its accuracy is comparable to that of the variational Bayes approach, although it is slower.
0:15:22 You can see that the diagonalized approach is essentially the fastest, since it can always reuse its precomputed, diagonalized eigenvoice matrix; note, however, that its accuracy loss with respect to the baseline is quite high. With the variational Bayes approach, on the other hand, we can obtain accurate results with an extraction time which is just a few percent higher than the time required to collect the zero- and first-order statistics themselves.
0:16:07 Here you can also see how the memory requirements depend on the size of the blocks. Using small blocks the memory requirement is of course very low, but in this case the extraction time grows significantly, and essentially it becomes comparable to that of the conjugate gradient method, while using larger block sizes allows us to improve the extraction time at the price of more memory.
0:16:51 So, to conclude, we have presented some new techniques for efficient and accurate i-vector extraction, which are based on variational Bayes estimation and on the conjugate gradient method. They allow us to trade off memory, extraction time, and accuracy: with small block sizes we obtain very accurate i-vectors with very low memory requirements at the price of some extraction time, while larger block sizes, on the other hand, allow us to reduce the time required to extract the i-vector itself.
0:17:56 Let's thank the speaker. We have a few minutes for questions.
0:18:10 [question and answer not intelligible in the recording]
0:19:16 I would say that the results are [remainder of the exchange unintelligible]
0:20:05 Yes, as well. I would say that the PLDA classifier itself is very fast. [remainder of the answer unintelligible]
0:20:44 Any more questions? Then let me ask one: have you seen the difference between what you did and what happens if you try to rotate the space of eigenvoices so that it is already diagonal, so that you start from the same...?
0:21:01 [answer largely unintelligible]
0:21:30 But then, in effect, compared with what we did: we basically tried to diagonalize the subspace matrix first, while what you need is a block-diagonal structure, right?
0:21:43 [answer largely unintelligible]
0:22:07 Let's thank the speaker again.