0:00:15thanks for the introduction
0:00:17the work that i am going to present is joint work with
0:00:22my students
0:00:24it's wise the are we hi joanne prof young from nineteen you still pose and
0:00:29try to train a
0:00:30so put into the right context we called it to a post present about one
0:00:36way and in central
0:00:38is on the use of i-vectors in the lda
0:00:41so in this paper the intention is to reduce
0:00:46the computations
0:00:47in i-vector extraction so we call it rapid computation of i-vectors
0:00:53okay before going into detail let me spend a couple of slides
0:00:57to present the background as well as the motivations of the work
0:01:02so the i-vector extraction process can be seen as a compression process
0:01:07where we compress
0:01:09across the time
0:01:11and the supervector space
0:01:13the output is a low and fixed dimensional vector which we call the i-vector which
0:01:18contains
0:01:19not only the speaker information but also the characteristics of the recording devices
0:01:25the microphones used
0:01:27the transmission channel characteristics including the encoding method that we use
0:01:32in transmission
0:01:34of the speech signals as well as the acoustic environments
0:01:39putting it in a mathematical form this is the i-vector
0:01:44and the i-vector
0:01:47is the map estimate of the
0:01:50latent variable
0:01:53if you see here we have a single latent variable which is tied across
0:01:58gaussians and tied across frames so this tying across frames and gaussians is the
0:02:03one that gives us the compression process
0:02:06compression across time and the supervector space
0:02:11we assume that we know the alignment of frames to gaussians
0:02:15and in the actual implementation this alignment of frames to gaussians
0:02:20could be given by the gmm posteriors
0:02:24most of is only used a single posteriors i
0:02:28so now if we look at this latent variable
0:02:33there is the assumption that the
0:02:37prior of this latent variable is the standard gaussian distribution with zero mean and
0:02:41unit variance
0:02:43so given the observation sequence
0:02:46we can estimate the posterior which is another gaussian
0:02:50with mean phi and covariance l inverse
0:02:54of course this phi
0:02:56which is the posterior mean of the latent variable x
0:03:02is what we call the i-vector and it is determined by the posterior covariance the total variability
0:03:07matrix t
0:03:08sigma which is the covariance matrix of the ubm
0:03:12and f which is the centralized first order statistics
0:03:16l inverse which is the posterior covariance is in turn determined by the
0:03:21zero order statistics
0:03:23so one point to note is that
0:03:26in order to compute or extract the i-vector
0:03:29we have to compute
0:03:31the posterior covariance
0:03:33because this is part of the equation
0:03:38in this paper we use what we call whitened statistics
0:03:43what we do is to absorb this sigma inverse
0:03:48into t and f as shown here
0:03:51so this gives the simplified equations
0:03:54without having the sigma
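The standard extraction described in this passage can be sketched as follows. This is a minimal NumPy sketch and not the speaker's code. It assumes a diagonal UBM covariance, and the function name `extract_ivector`, the argument layout, and the random demo data are all hypothetical.

```python
import numpy as np

def extract_ivector(T, sigma_inv, n_diag, F):
    """Posterior mean (i-vector) and covariance under a standard Gaussian prior.

    T         : (D, M) total variability matrix, D = supervector dimension
    sigma_inv : (D,)   diagonal of the inverse UBM covariance (assumed diagonal)
    n_diag    : (D,)   zero-order statistics, each count repeated per feature dim
    F         : (D,)   centralized first-order statistics
    """
    M = T.shape[1]
    # Posterior precision: L = I + T' Sigma^-1 N T
    L = np.eye(M) + T.T @ ((sigma_inv * n_diag)[:, None] * T)
    # Posterior mean: phi = L^-1 T' Sigma^-1 F
    phi = np.linalg.solve(L, T.T @ (sigma_inv * F))
    # Posterior covariance: L^-1
    return phi, np.linalg.inv(L)

# Hypothetical toy sizes and random stand-in data
rng = np.random.default_rng(0)
D, M = 20, 4
T = rng.standard_normal((D, M))
sigma_inv = rng.uniform(0.5, 2.0, D)
n_diag = rng.uniform(0.0, 10.0, D)
F = rng.standard_normal(D)
phi, cov = extract_ivector(T, sigma_inv, n_diag, F)
```

Note that the M by M matrix `L` has to be built and solved for every utterance, which is the cost the talk sets out to avoid.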
0:04:04okay so now we have only one
0:04:07objective in this paper that is to reduce the computational complexity of i-vector extraction
0:04:13while keeping the memory cost low
0:04:16and with slight or perhaps no degradation in performance
0:04:21okay so why is it important
0:04:25it is important because an implementation of a very fast
0:04:30extraction of i-vectors could be
0:04:32performed on handheld devices
0:04:34or for large scale cloud based applications where a single server may have to
0:04:41receive requests
0:04:42from hundreds or thousands of clients at the same time
0:04:46okay and
0:04:48also recently we have been increasing
0:04:52the number of gaussians in the system for example in the paper that is
0:04:56going to present coming
0:05:01number one thousand which process ten thousand so direct computation would be
0:05:06something while for these
0:05:10okay and
0:05:11another motivation is that
0:05:13the emphasis is on the rapid computation of i-vectors
0:05:17rather than on the estimation of the t matrix because the t matrix is estimated once and usually
0:05:23we can use a huge amount of computation resources
0:05:26they can use fixed but
0:05:32okay so
0:05:34this is the
0:05:35problem statement
0:05:39the computational bottleneck of i-vector extraction
0:05:43lies in the estimation of the posterior mean
0:05:46which requires us to
0:05:49estimate first the posterior covariance
0:05:52so there are
0:05:54a couple of existing solutions to this problem
0:05:58including the eigen decomposition method also covariance model but we
0:06:04fix compose account by a guy
0:06:07factors subspace
0:06:08by up a little
0:06:09and there is also the use of sparse coding to simplify
0:06:15the posterior covariance estimation
0:06:18so in this paper what we propose is to
0:06:23estimate directly the posterior mean without the need to evaluate the posterior covariance
0:06:29so we do this by using first what we call an informative prior
0:06:35which i am going to show later
0:06:37and the uniform occupancy assumption so with the combination of these two
0:06:41we can do a fast estimation of i-vectors
0:06:44of course without the need to estimate the posterior covariance
0:06:52okay so
0:06:53in the conventional
0:06:57i-vector extraction we assume a standard gaussian prior
0:07:03now if we consider an
0:07:07informative prior with
0:07:09mean given by mu p and covariance given by sigma p then the i-vector extraction
0:07:15is given by this equation where we have two additional terms here
0:07:20both determined by the
0:07:23covariance of the prior
0:07:25and this mu p
0:07:28so now if we consider the case where this mu p is zero and this sigma p
0:07:32is an identity matrix then this term will disappear
0:07:36and this one goes to the identity matrix so we reduce to the
0:07:39standard form
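For reference, one common way to write the extraction formula under a Gaussian prior with mean $\mu_p$ and covariance $\Sigma_p$ is given below. The slide itself is not part of the transcript, so the notation is an assumption based on the symbols named in the talk ($T$, $\Sigma$, $N$, $F$, $\mu_p$, $\Sigma_p$).

```latex
\phi = \left(\Sigma_p^{-1} + T^{\top}\Sigma^{-1} N T\right)^{-1}
       \left(T^{\top}\Sigma^{-1} F + \Sigma_p^{-1}\mu_p\right)

% Setting \mu_p = 0 and \Sigma_p = I removes the second term on the right
% and turns \Sigma_p^{-1} into I, which gives back the standard form:
\phi = \left(I + T^{\top}\Sigma^{-1} N T\right)^{-1} T^{\top}\Sigma^{-1} F
```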
0:07:42so in this paper we propose to use this
0:07:47form of informative prior
0:07:49where the mean is zero but the
0:07:52covariance is given by this
0:07:54t is the total variability matrix so we have the inner product
0:07:59of the total variability matrix and its inverse
0:08:01to be a book file
0:08:03so okay now i've able to reduce i think
0:08:07so what happens is that in the i-vector extraction formula we have additional terms
0:08:12from the prior right so now if you plug this into the i-vector extraction formula
0:08:17then we get this right so we can always be sure that t
0:08:23transpose t has an inverse because it is always full rank
0:08:27given the assumption on the training data
0:08:30then we can take this term out
0:08:34and take the inverse again and then we get this
0:08:41and then we use this matrix inversion identity which
0:08:46i copied from the matrix cookbook
0:08:49okay so the identity says that if you have matrices p and q and
0:08:53p here then we can change the order of
0:08:56p and q by putting this in the front right
0:09:00so if you look at this formula which is the same as
0:09:07this one
0:09:10right so we can say this is the p and this is the q and this
0:09:15is the p again then we can put this
0:09:17forward
0:09:18and then get this right so now if you look at this formula right
0:09:22from linear algebra this is a projection matrix right a projection matrix
0:09:27you know can be written in this form where u one is an ortho
0:09:32normal matrix meaning that
0:09:33each column of this
0:09:35u one
0:09:37is
0:09:39orthogonal to the other columns
0:09:41and has unit norm
0:09:43and u one spans the same subspace as the t matrix
0:09:47okay and this
0:09:49orthonormal property is actually introduced by the prior
0:09:54right and that is why we call
0:09:56the prior we use
0:10:00the subspace orthonormalizing prior
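The two algebraic steps in this passage can be checked numerically. The sketch below is a reconstruction under stated assumptions, not the speaker's slides: it assumes whitened statistics (no sigma term), a prior covariance equal to the inverse of T transpose T, and the identity (I + PQ)^-1 P = P (I + QP)^-1 as the "matrix inversion identity". All sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 12, 3                           # hypothetical supervector and i-vector sizes
T = rng.standard_normal((D, M))        # random stand-in for a whitened T matrix
N = np.diag(rng.uniform(0.5, 5.0, D))  # diagonal occupancy matrix
F = rng.standard_normal(D)             # first-order statistics

# Direct form with the prior covariance (T'T)^-1:
#   phi = (T'T + T'NT)^-1 T'F
phi_direct = np.linalg.solve(T.T @ T + T.T @ N @ T, T.T @ F)

# Factor out T'T and apply the identity with P = (T'T)^-1 T' and Q = N T:
#   phi = P (I + N T (T'T)^-1 T')^-1 F
P = np.linalg.solve(T.T @ T, T.T)      # (T'T)^-1 T', shape (M, D)
proj = T @ P                           # T (T'T)^-1 T', projection onto span(T)
phi_pushed = P @ np.linalg.solve(np.eye(D) + N @ proj, F)

# Orthonormal basis U1 of span(T), taken from the thin SVD of T
U1, _, _ = np.linalg.svd(T, full_matrices=False)

same_phi = np.allclose(phi_direct, phi_pushed)
same_projection = np.allclose(proj, U1 @ U1.T)
orthonormal = np.allclose(U1.T @ U1, np.eye(M))
```

The rewritten form no longer inverts an M by M posterior covariance, but it inverts a D by D matrix instead, which is why the talk goes on to introduce a second assumption.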
0:10:04okay so
0:10:06by doing so
0:10:09we are avoiding the estimation of the posterior covariance
0:10:13and we can directly estimate the posterior mean
0:10:16but the thing is that if you use this formula it is going to incur more
0:10:21computation because we are dealing with t
0:10:26t transpose which is a very big matrix
0:10:30so that is the reason why we have to introduce another assumption which we call the uniform occupancy assumption
0:10:35which speeds up the computation
0:10:39okay so to do so
0:10:40we first of all do a singular value decomposition of t
0:10:45into
0:10:47u s v where u and v are the left singular and right singular
0:10:52vector matrices
0:10:55okay and then you
0:10:57is this
0:10:58side speech is assumed at stft matrix
0:11:11okay so
0:11:13one property is that u one which is the u one in the previous slide
0:11:19spans the same subspace as t
0:11:22and then u two
0:11:23is orthogonal to u one okay then we use this property to simplify this
0:11:31right so we can express this term with t and its inverse in this form because this
0:11:39is equal to this right
0:11:41and then this can be expressed in this form
0:11:45okay because of this property
0:11:48then we can multiply n into this so we have i plus n and this okay
0:11:56let us say i plus n is equal to a
0:12:01and then apply
0:12:02the matrix inversion lemma
0:12:05in this form and this is what we get
0:12:07and we apply again the
0:12:10matrix inversion identity that we used before here we have this
0:12:18q and p right now we can put this p to the front
0:12:26so we have q and p
0:12:28so the idea is that we want to express this thing
0:12:32on the left
0:12:34in terms of
0:12:35a inverse and terms
0:12:39expressed in terms of u two
0:12:41which is orthogonal to u one and orthogonal to t
0:12:49so here is the uniform occupancy assumption
0:12:53because a
0:12:54a is
0:12:56i plus n
0:12:59and n itself is a diagonal matrix
0:13:02so if you look into the individual elements of this
0:13:06matrix here what we get is n c
0:13:11divided by
0:13:14one plus n c
0:13:15right so the uniform occupancy assumption says that
0:13:21for all the gaussian components
0:13:24the occupancy count divided by one plus the occupancy count is the same for all the
0:13:30components right here we do not need to know what is the appropriate
0:13:34value of alpha
0:13:36what we assume is that it is the same alpha
0:13:40for all the components right
0:13:44by doing so we turn this
0:13:46into this form
0:13:48and this is the i-vector extraction formula so
0:13:53if you multiply this t
0:13:56into
0:13:57this u two then these two will cancel
0:14:00so we end up with this formula for i-vector extraction this is very fast because
0:14:05we can pre-compute this term
0:14:09and this is fast
0:14:10this is a diagonal matrix right so taking the inverse is
0:14:14very simple
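A minimal sketch of the resulting fast extractor, assuming the final formula has the form phi = (T'T)^-1 T' (I + N)^-1 F with whitened statistics. That form is reconstructed from the spoken derivation (a pre-computable term times the inverse of a diagonal matrix) and is not shown in the transcript. The function names and toy data are hypothetical.

```python
import numpy as np

def precompute_projection(T):
    """Return (T'T)^-1 T' for a whitened T. Computed once, offline."""
    return np.linalg.solve(T.T @ T, T.T)

def fast_ivector(P, n_diag, F):
    """Approximate i-vector: P (I + N)^-1 F, with N diagonal."""
    return P @ (F / (1.0 + n_diag))

def exact_ivector(T, n_diag, F):
    """Posterior mean under the prior covariance (T'T)^-1, no further assumption."""
    A = 1.0 + n_diag                      # diagonal of A = I + N
    return np.linalg.solve(T.T @ (A[:, None] * T), T.T @ F)

# Hypothetical toy data
rng = np.random.default_rng(0)
D, M = 30, 4
T = rng.standard_normal((D, M))
F = rng.standard_normal(D)
P = precompute_projection(T)

# When every component has the same count the assumption holds exactly,
# so the fast and exact results coincide.
n_uniform = np.full(D, 3.0)
phi_fast_u = fast_ivector(P, n_uniform, F)
phi_exact_u = exact_ivector(T, n_uniform, F)

# With uneven counts the fast result is only an approximation.
n_uneven = rng.uniform(0.1, 20.0, D)
rel_err = (np.linalg.norm(fast_ivector(P, n_uneven, F) - exact_ivector(T, n_uneven, F))
           / np.linalg.norm(exact_ivector(T, n_uneven, F)))
```

Per utterance the fast path costs one elementwise division and one matrix-vector product with the stored matrix `P`, and the value of alpha never has to be known.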
0:14:21okay now let us look at the computational complexity
0:14:25so we have a
0:14:29comparison of four different algorithms so we have the baseline i-vector extraction which is
0:14:34the standard form
0:14:36where we have to do the inner product of
0:14:41these matrices
0:14:43t c transpose t c
0:14:45for all the c components so this is given by c f m square
0:14:53the m cube is due to the matrix inversion
0:14:57also in terms of memory cost we have to store the t matrix
0:15:02so this is c f m
0:15:05okay so now for the fast baseline we can actually pre-compute this t c transpose t c
0:15:13and store it and the memory cost for this is
0:15:17c m square
0:15:18but we actually reduce the computational cost from this to this
0:15:25okay and for the
0:15:27proposed method using the informative prior
0:15:31without the uniform occupancy assumption
0:15:34the computational complexity and memory cost are the same as
0:15:40the fast baseline
0:15:42because we can pre-compute this term and store it
0:15:46whereas for the fast
0:15:48proposed method
0:15:50we have
0:15:51the computational complexity reduced to this term
0:15:56and we can pre-compute this term and store it in memory so in terms of computational complexity
0:16:02the proposed
0:16:03fast method is
0:16:05twelve times faster
0:16:07than the fast baseline
0:16:08and a hundred times faster than the slow baseline
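A back-of-envelope comparison of the leading cost terms can be sketched as below. The C F M squared, M cubed, C F M and C M squared terms are named in the talk. How they combine for each method is an assumption here, the function name is hypothetical, and constants are ignored, so these raw counts are not expected to reproduce the measured speedups quoted in the talk.

```python
def op_counts(C, F, M):
    """Rough per-utterance multiply counts (constants and small terms ignored).

    C: number of Gaussians, F: feature dimension, M: i-vector dimension.
    """
    # Baseline: form T_c' T_c for every component, project F, invert M x M
    slow = C * F * M**2 + C * F * M + M**3
    # Fast baseline: T_c' T_c stored (C * M^2 memory), only weighted sums remain
    fast_baseline = C * M**2 + C * F * M + M**3
    # Proposed fast method: one product with a stored (M x C*F) matrix
    proposed = C * F * M
    return slow, fast_baseline, proposed

# F = 57 and M = 400 are the values mentioned in the talk; C = 512 is an assumption.
slow, fast_baseline, proposed = op_counts(C=512, F=57, M=400)
ratio_vs_fast_baseline = fast_baseline / proposed
ratio_vs_slow_baseline = slow / proposed
```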
0:16:18okay so
0:16:20the earlier presentations talked about
0:16:23uncertainty propagation
0:16:25where we need the posterior covariance
0:16:29so for the application of uncertainty propagation the posterior covariance could actually
0:16:34be computed using the same fast method
0:16:37given by this equation here
0:16:40using the same informative prior
0:16:42as well as the uniform occupancy assumption i mean with this computational complexity
0:16:51we can actually use this informative prior
0:16:53given by t transpose t
0:16:55in the e step
0:16:57of the em estimation of the t matrix
0:17:02okay of course we only use it in the e step in the sense that
0:17:05we actually
0:17:07discard the parameters associated with the prior which
0:17:12allows you i think in the form
0:17:21experiments the experiments were conducted on the nist sre ten extended task
0:17:27common conditions one to nine
0:17:29we use a gender dependent ubm with five hundred and twelve gaussians
0:17:34with fifty seven dimensional mfcc and the ubm is trained on switchboard sre four
0:17:39five and six and we use the same data to
0:17:42train the t matrix
0:17:43with a rank of four hundred
0:17:47we use plda for scoring so before
0:17:52passing to the plda we reduce the dimension of the i-vectors to two hundred using lda
0:17:57followed by length normalization
0:17:58and for the plda we have the speaker factors and then we
0:18:02use a full
0:18:04residual covariance to
0:18:05model the session variability
0:18:10okay so this table shows the
0:18:15results for the baseline
0:18:18the proposed exact method and the proposed fast method
0:18:22so the first row
0:18:23shows the eer and the second row is the mindcf so now if we
0:18:29compare this
0:18:31result with this
0:18:33we can see that the results are not really much different so we can
0:18:38say that
0:18:40using the informative prior that we propose
0:18:43does not seem to degrade performance
0:18:46okay then if we look at common condition five
0:18:54which is the telephone condition
0:18:56for the proposed fast method the degradation is actually
0:19:01about ten percent in eer and four point five percent in mindcf
0:19:06okay and across all the nine common conditions
0:19:10the relative degradation ranges from ten to sixteen percent and
0:19:16where is you can be a source that you with six seven percent
0:19:20up to twenty point four percent mindcf
0:19:27okay so this is
0:19:33this is the system that we use
0:19:37it consists of
0:19:38whitened and centered first order statistics
0:19:41normalized by the occupancy count
0:19:45so we use these as supervectors
0:19:47and we do pca
0:19:50and then we do a projection of all the test and training utterances
0:19:55and enrollment
0:19:56into the low dimensional subspace
0:19:58and use them for the plda
0:20:02as you can see
0:20:04okay why do we do that because
0:20:12if you look at this formula
0:20:15this can be seen as a transformation matrix
0:20:20and this is the input vector
0:20:22and this is the projection of this input vector
0:20:25into a low dimensional vector
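The comparison system described in this passage (normalized first-order statistics as supervectors, then PCA, then projection) could look roughly like the sketch below. It is an illustration under assumptions: the exact normalization, the PCA fitting data, and all function names are hypothetical, and the statistics are random stand-ins.

```python
import numpy as np

def normalize_stats(F, n_diag, floor=1e-8):
    """Divide whitened, centered first-order statistics by the occupancy counts.

    The floor guards against division by zero for unoccupied components.
    """
    return F / np.maximum(n_diag, floor)

def fit_pca(X, M):
    """Return the mean and the top-M principal directions (columns) of the rows of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:M].T

def project(X, mean, W):
    """Project supervectors (rows of X) into the M-dimensional subspace."""
    return (X - mean) @ W

# Hypothetical toy data: 50 utterances, 30-dimensional supervectors
rng = np.random.default_rng(0)
n_utt, D, M = 50, 30, 5
F_all = rng.standard_normal((n_utt, D))
N_all = rng.uniform(1.0, 20.0, (n_utt, D))

X = normalize_stats(F_all, N_all)
mean, W = fit_pca(X, M)
Y = project(X, mean, W)   # low-dimensional vectors that would be passed on to scoring
```

Seen this way, both this system and the fast extractor apply a fixed transformation matrix to a count-normalized input vector, and they differ in how the matrix is obtained.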
0:20:34comparing this result with the fast method shows that
0:20:38using the t matrix trained with em
0:20:41in the conventional form gives a better performance
0:20:46now
0:20:47this result shows the comparison of the t matrix
0:20:50trained without the informative prior that is with the standard gaussian prior
0:20:56and trained with the informative prior
0:21:00comparing these two we can see that the proposed exact method actually gives
0:21:04a slightly better result
0:21:11okay so in conclusion we introduced two new concepts
0:21:16for the rapid computation of i-vectors
0:21:18the first one is what we call the subspace orthonormalizing prior
0:21:22and with
0:21:24the use of the subspace orthonormalizing prior we can avoid the need to compute the posterior covariance
0:21:30before computing the posterior mean
0:21:33and then we use the uniform occupancy assumption which reduces the
0:21:38computational complexity
0:21:41so with the combined use of these two the assumption and the informative prior
0:21:46we speed up the i-vector extraction process
0:21:50but as a trade off with a slight degradation in terms of accuracy
0:21:57is my have
0:22:03we have time for a few questions
0:22:15so it seems useful problem of course
0:22:19i have so that i so i
0:22:31this the performance of to me by saying this that's that we notice the same
0:22:36as baseline is you have access also we what the as that because
0:22:45exact means without the use of the uniform occupancy assumption
0:22:49by just using the subspace orthonormalizing prior
0:22:56because we want to see the difference by introducing them one at a time we first introduce the
0:23:01subspace orthonormalizing prior followed by the uniform occupancy assumption so we want to
0:23:06see individually
0:23:08what is the
0:23:10effect
0:23:12whether when we introduce the subspace orthonormalizing prior
0:23:16we get a better performance or slightly worse performance