0:00:15 Thanks for the introduction. I am going to present joint work with my student and Prof. Yang from Nanjing University of Posts and Telecommunications. To put it into the right context: we are going to present one way of speeding up i-vector extraction, centered on the use of an informative prior. The intention is to reduce the computation in i-vector extraction, so we call it rapid computation of i-vectors.

Before going into the details, let me spend a couple of slides on the background and the motivation of the work. The i-vector extraction process can be seen as a compression process: we compress across time and across the supervector space, and the output is a low- and fixed-dimensional vector which we call the i-vector. It captures not only the speaker information but also the characteristics of the recording devices, the microphones used, the transmission channel, including the encoding methods used for transmitting the speech signals, as well as the acoustic environment.

In mathematical form, this is the i-vector model. The i-vector is the MAP estimate of the latent variable. Notice that we have a single latent variable which is tied across Gaussians and tied across frames, and it is this tying that gives us the compression from the time domain into the low-dimensional space. We assume that the alignment of frames to Gaussians is known; in actual implementations the frame alignment could be given by the GMM posteriors, or alternatively only the single best Gaussian per frame is used.

Now, if we look at this latent variable, there is the assumption that the prior of this latent variable is a standard Gaussian distribution, that is, zero mean and unit variance. Given the observation sequence, we compute the posterior of x, which is again Gaussian, with mean phi and covariance L^-1. This phi is the posterior mean of the latent variable x, and the i-vector is exactly this posterior mean:

phi = L^-1 T^T Sigma^-1 f, with L = I + sum_c N_c T_c^T Sigma_c^-1 T_c,

where T is the total variability matrix, Sigma is the covariance of the UBM, f is the centered first-order statistics, and L^-1, the posterior covariance, is determined by the zero-order statistics N_c. One point to note is that in order to compute, or extract, the i-vector, we have to compute the posterior covariance, because it is part of the equation.

In this paper we use what we call whitened statistics: we absorb Sigma into T and f, which gives the simplified equations

phi = L^-1 T^T f, with L = I + T^T N T,

where we avoid having the Sigma appear.
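To make the baseline concrete, here is a minimal numpy sketch of this standard extraction from whitened statistics; the sizes and variable names are illustrative choices of mine, not taken from the paper.

```python
# Minimal sketch of baseline i-vector extraction from whitened statistics.
# Sizes and names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
C, F, M = 512, 57, 400                     # Gaussians, feature dim, i-vector dim
T = 0.1 * rng.standard_normal((C * F, M))  # (whitened) total variability matrix
N_c = rng.uniform(0.0, 50.0, size=C)       # zero-order stats (occupancy counts)
f = rng.standard_normal(C * F)             # whitened, centered first-order stats

# Posterior precision L = I + T^T N T; N repeats each occupancy count F times.
N_diag = np.repeat(N_c, F)
L = np.eye(M) + T.T @ (N_diag[:, None] * T)

# The i-vector is the posterior mean phi = L^-1 T^T f: note that extraction
# requires forming and solving with the M x M posterior precision.
phi = np.linalg.solve(L, T.T @ f)
```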
So, we have only one objective in this paper, and that is to reduce the computational complexity of i-vector extraction while keeping the memory cost low, and with little or perhaps no degradation of the performance. Why is this important? It is important because implementations with very fast extraction of i-vectors could be performed on handheld devices, or in large-scale cloud-based applications where a single server may have to serve requests from hundreds or thousands of clients at the same time. Also, recently we have seen increasing numbers of Gaussians in DNN-based systems; for example, in a paper that is going to be presented in the coming sessions, the number of senones goes from one thousand up to ten thousand, so direct computation would be formidable in these scenarios. Another point worth mentioning is that our emphasis is on the rapid computation of i-vectors rather than fast estimation of the T matrix, because the T matrix is estimated once, usually offline, where we can use a huge amount of computational resources, and it is then kept fixed.

So here is the problem statement: the computation, or to be precise the extraction, of i-vectors lies in the estimation of the posterior mean, which requires us to first estimate the posterior covariance. There are a couple of existing solutions to this problem, including the eigen-decomposition method, approximating the posterior covariance by a factored subspace, and also sparse coding to simplify the posterior covariance estimation. In this paper, what we propose is to compute the posterior mean directly, avoiding the need to evaluate the posterior covariance. We do this by, first of all, using what we call an informative prior, which I am going to show later, and second, a uniform occupancy assumption. With these two, we can do a fast extraction of i-vectors without the need to estimate the posterior covariance.

In the conventional formulation of i-vector extraction we assume a standard Gaussian prior. Now, if we consider instead a Gaussian prior with mean mu_p and covariance Sigma_p, then i-vector extraction is given by this equation, where we have additional terms determined by the prior covariance and the prior mean. If we consider the case where the mean is zero and the covariance is the identity matrix, these terms disappear and we reduce to the standard form. In this paper we propose to use the following informative prior: the mean is zero, but the covariance is given by the inverse of the inner product of the total variability matrix with itself, Sigma_p = (T^T T)^-1.

Okay, now I will go through the derivation. With the general prior, we have the additional terms in the i-vector extraction formula. If we plug our prior into the formula, the identity term is replaced and the posterior mean becomes phi = (T^T (I + N) T)^-1 T^T f. We can always show that the inverse of T^T T exists, because T is always full rank given sufficient training data. Then we take this term out and apply a matrix inversion identity, which I copied from the Matrix Cookbook: the idea is that if you have matrices P and Q in this arrangement, you can bring P to the front. If you look at this formula, it has the same structure, so we identify the P and the Q, bring P forward, and obtain this. Now, this is just linear algebra: what appears here is a projection matrix. A projection matrix of this form can be factored so that you end up with an orthonormal matrix, meaning each column is orthogonal to the other columns and has unit norm, and it spans the same subspace as the T matrix. This orthonormal property is induced by the prior, and that is why we call it the subspace-orthonormalizing prior.
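As a sanity check on this step, here is a small numpy snippet (my reconstruction, not the authors' code) verifying that with the prior covariance (T^T T)^-1 the posterior mean collapses to (T^T (I + N) T)^-1 T^T f, and that the associated matrix on the supervector side is indeed a projection.

```python
# Numerical check of the informative-prior algebra described above.
# A reconstruction for illustration; all names are made up.
import numpy as np

rng = np.random.default_rng(1)
CF, M = 300, 20                               # deliberately small for the check
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(0.0, 5.0, size=CF))   # zero-order stats on the diagonal
f = rng.standard_normal(CF)

# Posterior precision with prior covariance (T^T T)^-1: the identity term of
# the standard formulation is replaced by T^T T.
L = T.T @ T + T.T @ N @ T
phi = np.linalg.solve(L, T.T @ f)

# Equivalent compact form: phi = (T^T (I + N) T)^-1 T^T f.
A = np.eye(CF) + N
phi2 = np.linalg.solve(T.T @ A @ T, T.T @ f)
assert np.allclose(phi, phi2)

# The matrix T (T^T A T)^-1 T^T A is an (oblique) projection onto span(T):
P = T @ np.linalg.solve(T.T @ A @ T, T.T @ A)
assert np.allclose(P @ P, P)
```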
So with this we are avoiding the estimation of the posterior covariance; we can directly estimate the posterior mean. The thing is, if we used this formula directly, it would incur even more computation, because we would be dealing with T T^T, which is a very big matrix. That is the reason we introduce another assumption, which we call the uniform occupancy assumption, to speed up the computation.

To do so, we first perform a singular value decomposition of T into U, S and V, S being the diagonal singular value matrix, and U of this size, which is a square CF-by-CF orthonormal matrix. One thing to note is that U1, which is the U1 from the previous slide, spans the same subspace as T, and U2 is orthogonal to U1. We use this property to simplify the formula: we can express (T^T (I + N) T)^-1 T^T in this form, because this is equal to this, and then it can be expressed in this further form because of that property. Next, writing A = I + N and applying the matrix inversion lemma, this is what we get, and then we apply again the matrix inversion identity that we used before: here we have the P and the P^T, we bring P to the front, and that lets us express the term on the left in terms of U2, which is orthogonal to U1.

Here is where the uniform occupancy assumption comes in. The matrix (I + N)^-1 N is itself diagonal, and if you look at its individual elements, what you get is N_c divided by one plus N_c. The uniform occupancy assumption says that, for all the Gaussian components, the occupancy count divided by one plus the occupancy count is the same. Note that we do not need to know what the appropriate value is; what we assume is that the same value applies for all components. By doing so, this expression takes this form, and since multiplying T with U2 gives zero, these terms cancel, so we end up with this formula for i-vector extraction. This is very fast, because we can precompute these terms, and S is a diagonal matrix, so taking its inverse is trivial.
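Putting the pieces together, here is my reading of the resulting fast extractor as a numpy sketch: with the thin SVD T = U1 S V^T and a common value gamma = N_c / (1 + N_c), the i-vector reduces to a scaled, precomputable projection of the first-order statistics. The particular choice of gamma below (an average over components) is my own assumption for illustration, not necessarily the authors'.

```python
# Sketch of the fast extraction as I understand the derivation on the slides.
# Offline we precompute B = V S^-1 U1^T; online, each i-vector is one
# matrix-vector product plus a scalar from the uniform occupancy assumption.
import numpy as np

rng = np.random.default_rng(2)
C, F, M = 512, 57, 400
T = 0.1 * rng.standard_normal((C * F, M))

# Offline: thin SVD of T. S is diagonal, so "inverting" it is elementwise.
U1, S, Vt = np.linalg.svd(T, full_matrices=False)
B = Vt.T @ ((1.0 / S)[:, None] * U1.T)     # M x CF, computed once and stored

def fast_ivector(N_c, f):
    """One utterance: no M x M matrix inversion at extraction time."""
    gamma = np.mean(N_c / (1.0 + N_c))     # uniform occupancy assumption
    return (1.0 - gamma) * (B @ f)

phi = fast_ivector(rng.uniform(0.0, 50.0, size=C), rng.standard_normal(C * F))
```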
Now let us look at the computational complexity. We have a comparison of four different algorithms. First, the baseline i-vector extraction in the standard form: we have to take the inner products of the statistics with the matrices T_c^T T_c for all the C components, which is of order C F M-squared, plus order M-cubed due to the matrix inversion. In terms of memory cost, we have to store the T matrix, which is order C F M. For the fast baseline, we can precompute the matrices T_c^T T_c and store them; the memory cost for this is order C M-squared, but we reduce the per-utterance computational cost from this to this. For the proposed method using the informative prior but without the uniform occupancy assumption, the computational complexity and memory cost are about the same as the fast baseline, because we can likewise precompute these terms and store them. As for the proposed fast method, the computational complexity is as shown, and we can precompute this term and store it in memory. In terms of computational complexity, the proposed fast method is twelve times faster than the fast baseline, and a hundred times faster than the slow baseline.

A side remark: earlier in this session there was a presentation about uncertainty propagation, where we need the posterior covariance as well. The posterior covariance could actually be computed with the same fast method, given by this equation here, using the same informative prior as well as the uniform occupancy assumption, and with the same computational complexity. Also, we can actually use the informative prior given by (T^T T)^-1 inside the EM estimation of the T matrix itself; of course, we only use it in the E-step, in the sense that we discard the parameters associated with the prior afterwards.

Okay, the experiments. The experiments were conducted on NIST SRE'10 extended, common conditions one to nine. We use gender-dependent UBMs with fifty-seven-dimensional MFCCs, and the UBM is trained on Switchboard and SRE'04, '05 and '06. We use the same data to train the T matrix, with a rank of four hundred, and PLDA for scoring. Before passing the i-vectors to the PLDA, we reduce their dimension to two hundred using LDA, followed by length normalization; for the PLDA, we use speaker factors plus a full residual covariance to model the session variability.

This table shows the results for the baseline, the proposed exact method, and the proposed fast method; the first row shows the EER, the second row the minDCF. If we compare this result with this one, we can see that the results are not really much different, so we can say that using the informative prior we proposed does not seem to degrade the performance. Then, if we look at common condition five, which is the telephone condition, for the proposed fast method the degradation is actually about ten percent in EER and four point five percent in minDCF. In terms of EER across all the nine common conditions, the relative degradation ranges from ten to sixteen percent, whereas for minDCF it ranges from six point seven percent up to twenty point four percent.

Next, this is a comparison system where we take the whitened and centered first-order statistics, normalized by the occupancy counts, use them as supervectors, and perform PCA; we then project all the training and test utterances into the low-dimensional subspace and use those vectors for the PLDA. Why do we do that? Because if you look at our fast formula, this part can be seen as a transformation matrix and this is the input vector, so the i-vector is the projection of this input vector into a low-dimensional vector. Comparing those results with our proposed fast method, the comparison shows that using the T matrix trained with EM in the conventional fashion gives better performance. Finally, this result shows the comparison of the T matrix trained with the standard Gaussian prior versus the T matrix trained with the informative prior: comparing the two, we can see that the proposed exact method actually gives a slightly better result.
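For concreteness, this is roughly what the PCA comparison system mentioned a moment ago could look like in numpy; the exact normalization and centering are my assumptions, so treat it as a sketch rather than the authors' recipe.

```python
# Rough sketch of the PCA-projection comparison system described above.
# Normalization details are assumptions made for illustration.
import numpy as np

def supervector(N_c, f_c):
    """f_c: (C, F) whitened, centered first-order stats; normalize by counts."""
    return (f_c / (N_c[:, None] + 1e-8)).ravel()

def pca_basis(X, m):
    """Learn an m-dimensional PCA projection from rows of training supervectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:m]                    # (m, C*F) transformation matrix

# Usage: v = supervector(N_c, f_c); y = basis @ v; feed y to the PLDA in
# place of an i-vector, for both training and test utterances.
```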
In conclusion, we introduced two new concepts for the rapid computation of i-vectors. The first is what we call the subspace-orthonormalizing prior; with the use of this subspace-modeling prior, we can avoid the need to compute the posterior covariance before computing the posterior mean. The second is the uniform occupancy assumption, which we use to reduce the computational complexity. With the combined use of these two, the assumption and the informative prior, we speed up the i-vector extraction process, at the price of a slight degradation in terms of accuracy. Maybe we have time for a few questions.

[Question from the audience, partly inaudible, about the performance of the proposed exact method.]

The performance of the proposed exact method is essentially the same as the baseline. There we avoided the use of the uniform occupancy assumption and just used the subspace-orthonormalizing prior, because we wanted to see the effect of introducing each difference separately: we first introduce the subspace-orthonormalizing prior, and then, on top of it, the uniform occupancy assumption. So we wanted to see, at each step, what the effect is, whether we get slightly better or slightly worse performance.