0:00:39 | i will present |

0:00:43 | some techniques for extracting i-vectors |

0:00:46 | efficiently |

0:00:49 | we went looking for some way to address |

0:00:52 | the memory occupation of the i-vector extractor |

0:00:58 | so state-of-the-art technology nowadays is based on i-vectors which give |

0:01:04 | very good accuracy |

0:01:08 | but the computation of i-vectors can be quite demanding in terms of memory and |

0:01:13 | time |

0:01:15 | so while some solutions |

0:01:19 | have been proposed for i-vector extraction with low memory requirements namely |

0:01:24 | the diagonalized approximation to i-vector extraction |

0:01:31 | these were also shown to |

0:01:38 | suffer some degradation of accuracy so we |

0:01:42 | were looking for a solution which does not incur such degradation |

0:01:46 | but still allows us |

0:01:49 | to greatly reduce the amount of memory required to store the extractor |

0:01:54 | so here is an outline of the presentation |

0:01:59 | first we recall the original bayesian derivation of i-vector extraction which |

0:02:04 | you may have seen in the previous talks |

0:02:07 | then we present our variational bayes and conjugate gradient approaches for i-vector extraction and finally present some experimental |

0:02:14 | results for these techniques |

0:02:17 | so |

0:02:20 | i guess everybody here knows what i-vectors are but here is a brief introduction |

0:02:25 | they are low dimensional informative representations of each utterance which are |

0:02:30 | derived from a generative model |

0:02:34 | so the most widely used |

0:02:36 | i-vector framework |

0:02:39 | assumes that |

0:02:41 | most of the speaker and channel variability lies in a small subspace of the supervector space |

0:02:47 | then we assume a gaussian prior for the latent variable representing this variability |

0:02:54 | and |

0:02:55 | approximating the data likelihood by means of the zero and first order statistics we can compute the posterior |

0:03:00 | of this latent variable |

0:03:02 | and then we compute the i-vector as the maximum a posteriori estimate of the latent variable |

0:03:11 | we can show that the posterior |

0:03:13 | is gaussian and these expressions correspond to the posterior covariance |

0:03:17 | and to the i-vector |
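The exact extraction just described can be sketched as follows. This is a minimal NumPy illustration with tiny placeholder dimensions; the names `T`, `Sigma_inv`, `N`, `F` are assumptions for the eigenvoice matrix, inverse UBM covariances, and zero/first-order statistics, not notation taken from the slides:

```python
import numpy as np

def extract_ivector(T, Sigma_inv, N, F):
    """Exact MAP i-vector for one utterance.
    T: (C, F_dim, M) eigenvoice matrix, one block per Gaussian
    Sigma_inv: (C, F_dim, F_dim) inverse UBM covariances
    N: (C,) zero-order statistics
    F: (C, F_dim) centered first-order statistics"""
    C, F_dim, M = T.shape
    L = np.eye(M)                    # posterior precision, starting from the prior
    b = np.zeros(M)
    for c in range(C):
        TtS = T[c].T @ Sigma_inv[c]  # M x F_dim
        L += N[c] * (TtS @ T[c])     # accumulate N_c * T_c' Sigma_c^-1 T_c
        b += TtS @ F[c]              # project the statistics onto the subspace
    w = np.linalg.solve(L, b)        # posterior mean = MAP i-vector
    return w, L
```

Note that the per-utterance precision `L` depends on the zero-order statistics, so it must be rebuilt (or solved implicitly) for every utterance; that is the cost the talk goes on to analyze.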

0:03:19 | so as you can see here |

0:03:22 | computing the i-vector requires computing this precision matrix |

0:03:26 | which entails a multiplication of the inverse of this matrix times the projected statistics |

0:03:32 | that is |

0:03:33 | the inversion of |

0:03:38 | a matrix |

0:03:41 | with a dimensionality which is the i-vector dimensionality |

0:03:47 | so |

0:03:48 | we can see |

0:03:50 | how the different extraction techniques can be compared |

0:03:55 | here C represents the number of gaussians and F the feature dimensionality |

0:04:02 | and M is the i-vector dimensionality |

0:04:05 | so if we don't precompute anything we have a |

0:04:09 | complexity which is |

0:04:11 | quadratic in the i-vector dimensionality |

0:04:15 | and linear in the number of gaussians and in the dimensionality of the features |

0:04:21 | we can reduce the time complexity by precomputing and storing these matrices |

0:04:28 | but in this case we have a severe memory constraint which is again quadratic in |

0:04:33 | the i-vector dimensionality and proportional to the number of gaussians |

0:04:38 | with |

0:04:39 | typical values like two thousand forty eight gaussians in the ubm as used in this work this is |

0:04:46 | easily the most expensive |

0:04:47 | part in terms of memory of an i-vector extractor |
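As a rough sanity check on that claim, the memory for the precomputed per-Gaussian matrices grows as C times M squared. The snippet below assumes the 2048 Gaussians stated in the talk, an illustrative i-vector dimension of 400 (not a value confirmed by the transcript), and 4-byte floats:

```python
# Rough memory footprint of precomputing T_c' Sigma_c^-1 T_c for every Gaussian.
C = 2048                 # UBM components, as stated in the talk
M = 400                  # i-vector dimension: an illustrative assumption
bytes_per_float = 4      # single precision

full = C * M * M * bytes_per_float   # one dense M x M matrix per Gaussian
print(f"{full / 2**30:.2f} GiB")     # about 1.22 GiB; symmetric storage halves it
```

Even with symmetric storage this dwarfs the rest of the extractor, which is the motivation for the approximations that follow.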

0:04:52 | so in one of the latest works an optimization based on an eigen |

0:04:58 | decomposition of the per gaussian matrices was proposed |

0:05:02 | essentially i forgot to mention that we can obtain the same i-vectors |

0:05:10 | from this form just by performing a normalization of the first order statistics and |

0:05:15 | in this case of the eigenvoice matrix |

0:05:18 | then we can assume that these matrices are simultaneously diagonalizable by some matrix |

0:05:23 | Q and then we can compute an approximation of the posterior covariance which is |

0:05:30 | diagonal so that |

0:05:32 | i-vector extraction |

0:05:34 | can be performed in a much faster way with very limited additional memory requirements |

0:05:40 | however this |

0:05:43 | approximation can cause a degradation of recognition accuracy |

0:05:47 | so we wanted to do better in terms of accuracy |

0:05:52 | so |

0:05:53 | as we said the problem is the computation of the covariance matrix |

0:05:59 | the problem is that the covariance matrix is not diagonal |

0:06:03 | if it were |

0:06:05 | this would mean that the i-vector components would be uncorrelated |

0:06:10 | and the posterior would factorize |

0:06:14 | so even though the exact posterior cannot be factorized over the different components we |

0:06:19 | look for an approximation of the posterior which factorizes over subsets of the i-vector |

0:06:25 | components |

0:06:27 | so we partition the i-vector components into disjoint sets |

0:06:32 | and we assume that the |

0:06:33 | posterior can be approximated by |

0:06:36 | a distribution which factorizes over these sets |

0:06:41 | the variational bayes framework provides a |

0:06:44 | way to estimate this approximate posterior |

0:06:48 | by minimizing the kl divergence between the original posterior and this approximation |

0:06:55 | so |

0:06:58 | here i need to introduce some notation |

0:07:00 | namely we |

0:07:03 | denote the |

0:07:05 | subset of the eigenvoices associated to each block |

0:07:09 | of the i-vector components |

0:07:12 | each V i is associated with a block w i of i-vector components |

0:07:18 | and these are the complements of those |

0:07:20 | subsets so that we can express |

0:07:24 | the factorization in this way |

0:07:26 | so if we apply the standard update for each |

0:07:31 | factor of the approximate posterior |

0:07:35 | its distribution is again gaussian with an expression which is very similar to the |

0:07:41 | original i-vector expression |

0:07:43 | the difference is that the precision matrix here is computed using the eigenvoices relative |

0:07:49 | to this subset |

0:07:51 | and for the mean of the posterior we are essentially centering the statistics around a |

0:07:58 | slightly different ubm |

0:08:00 | essentially we |

0:08:02 | can say that |

0:08:04 | if we assume that the other components of the i-vector are fixed then we |

0:08:10 | can absorb their contribution into the statistics of this new ubm |

0:08:18 | this also allows us to see what the complexity of this approach would be |

0:08:24 | but we do have to take |

0:08:27 | care in implementing this technique because |

0:08:32 | if we just recompute these centered statistics every time with a block |

0:08:37 | of size one |

0:08:39 | the complexity is again quadratic in the i-vector dimensionality because every time we would be re |

0:08:44 | centering the full set of statistics |

0:08:47 | so what we need is |

0:08:50 | to keep a supervector of first order statistics which is always kept centered around |

0:08:57 | the current i-vector estimate |

0:09:00 | when we update a block the new mean is computed by removing the contribution |

0:09:07 | of only those components that we are estimating and then after we update the |

0:09:13 | mean we update the vector of first order statistics so that it stays centered |

0:09:19 | around the current i-vector estimate |

0:09:22 | so this way if we set aside the contribution of computing the precision matrices the |

0:09:28 | complexity of this approach is proportional to the dimensionality of the i-vector and to the number of |

0:09:35 | iterations that we need to perform |

0:09:37 | to compute the i-vector |
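At the linear-algebra level, the block-wise variational update just described amounts to a block Gauss-Seidel sweep on the system L w = b, where L is the full posterior precision and b the projected statistics. The sketch below uses an explicit L for clarity rather than the statistics re-centering trick the speaker describes, so it illustrates the iteration, not the memory savings:

```python
import numpy as np

def vb_ivector(L, b, block, n_iter=50):
    """Block Gauss-Seidel sweep on L w = b, mirroring the block-factorized
    variational update (L: (M, M) posterior precision, b: (M,) projected
    statistics, block: size of the disjoint component subsets)."""
    M = len(b)
    w = np.zeros(M)
    for _ in range(n_iter):
        for start in range(0, M, block):
            i = slice(start, min(start + block, M))
            # right-hand side re-centered around the current estimate of the
            # other blocks: remove every contribution except this block's own
            r = b[i] - L[i] @ w + L[i, i] @ w[i]
            w[i] = np.linalg.solve(L[i, i], r)  # solve the small block system
    return w
```

With `block=1` this reduces to scalar Gauss-Seidel; larger blocks solve bigger sub-systems per step, which matches the speed/memory trade-off discussed next.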

0:09:41 | you can see here the similarity of this form with the original i-vector |

0:09:46 | formulation the covariance matrices are essentially the diagonal blocks of the original precision matrix |

0:09:52 | and we can adopt |

0:09:54 | again |

0:09:55 | two different techniques to compute them |

0:09:59 | we can perform the computation of these covariance matrices every time |

0:10:04 | or we can store the diagonal blocks of the precision matrix and in this |

0:10:09 | case we get |

0:10:11 | faster extraction time but slightly higher memory and the memory requirements depend on the |

0:10:16 | size we choose for the blocks |

0:10:19 | so essentially we can show that this variational bayes approach |

0:10:26 | implements a gauss seidel approach to the solution of this linear system |

0:10:32 | and we also investigated different |

0:10:35 | techniques for |

0:10:37 | solving linear systems namely the jacobi method and the conjugate gradient method |

0:10:43 | what we found out is that the jacobi method is very similar to this approach |

0:10:47 | but instead of updating the |

0:10:50 | i-vector after each block the i-vector is updated only after all components have |

0:10:55 | been estimated |

0:10:56 | and in our experience this causes slightly slower |

0:11:02 | convergence rates |

0:11:06 | then we analyzed the conjugate gradient method |

0:11:09 | what is nice about conjugate gradient is that we don't need to store |

0:11:16 | the precision matrix here |

0:11:19 | in fact we don't even need to compute it explicitly because we |

0:11:23 | just need the product of this matrix times a generic vector which is |

0:11:27 | what the conjugate gradient algorithm requires |

0:11:31 | so if we write the computation of this |

0:11:34 | product in this way we can see that it |

0:11:38 | can be evaluated with a cost which is linear |

0:11:41 | in the number of components of the |

0:11:46 | ubm |

0:11:47 | the number of features and the dimensionality of the i-vector |

0:11:50 | so we have a complexity which is the same as the variational bayes approach |
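The matrix-free product the speaker refers to can be sketched like this: conjugate gradient only ever needs products L v, and each product can be accumulated Gaussian by Gaussian without forming the M by M precision matrix. Array names are illustrative placeholders, not notation from the slides:

```python
import numpy as np

def precision_matvec(v, T, Sigma_inv, N):
    """L v = v + sum_c N_c T_c' Sigma_c^-1 (T_c v), without building L."""
    out = v.copy()                       # identity term from the prior
    for c in range(len(N)):
        out += N[c] * (T[c].T @ (Sigma_inv[c] @ (T[c] @ v)))
    return out

def cg_ivector(T, Sigma_inv, N, b, n_iter=50, tol=1e-10):
    """Conjugate gradient solve of L w = b using the implicit product."""
    w = np.zeros_like(b)
    r = b - precision_matvec(w, T, Sigma_inv, N)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Lp = precision_matvec(p, T, Sigma_inv, N)
        alpha = rs / (p @ Lp)
        w += alpha * p
        r -= alpha * Lp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:        # stop on the residual norm, as in the talk
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w
```

Each matrix-vector product costs O(C F M), so beyond the statistics themselves the method needs essentially no extra memory, matching the point made next.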

0:11:57 | so |

0:11:59 | what is also nice about this technique is that it does not require any kind |

0:12:03 | of additional memory |

0:12:05 | and as for the variational bayes approach we can use this technique with a |

0:12:11 | full covariance ubm if we pre whiten the statistics and the eigenvoice matrix with the |

0:12:18 | ubm covariances once |

0:12:21 | so |

0:12:22 | now i will show you some results on the female dataset extended |

0:12:28 | telephone condition |

0:12:33 | our setup uses sixty dimensional features and a ubm with |

0:12:37 | two thousand forty eight components |

0:12:44 | we use |

0:12:44 | a plda classifier on length normalized i-vectors |

0:12:51 | [partially inaudible] so let me show you |

0:12:55 | the results |

0:13:00 | so |

0:13:01 | before seeing the results let me just point out that |

0:13:05 | with enough iterations |

0:13:08 | these approaches converge to |

0:13:13 | the exact i-vector solution |

0:13:17 | so if we iterate long enough we can recover exactly the same |

0:13:21 | accuracy as the original classifier |

0:13:26 | so what is interesting is to |

0:13:28 | see if we can |

0:13:31 | stop earlier and still |

0:13:33 | achieve good results with a |

0:13:35 | faster extraction process of course |

0:13:42 | so here i am showing the results of the baseline system with exact i-vectors |

0:13:49 | the approximated i-vectors |

0:13:52 | and variational bayes with block |

0:13:54 | sizes one ten and twenty |

0:14:09 | convergence was evaluated using the norm of the difference between |

0:14:16 | two successive variational bayes i-vector estimates |

0:14:19 | so essentially this experiment is doing between two and three iterations per estimate and this one |

0:14:26 | between three and four |

0:14:29 | for conjugate gradient the stopping criterion is the two norm of the residual |

0:14:37 | so essentially what we see here is that |

0:14:40 | most of the baseline system performance is recovered |

0:14:45 | [inaudible] |

0:14:55 | so what is also interesting |

0:15:00 | is the extraction time required by these systems |

0:15:07 | and |

0:15:08 | the fastest system is the one which employs the diagonal approximation |

0:15:12 | and its time is comparable to the variational bayes approach with the largest block size |

0:15:22 | [inaudible] |

0:15:36 | however |

0:15:37 | note that |

0:15:38 | the diagonal approximation as we can see |

0:15:43 | loses some accuracy with respect to the quite high baseline |

0:15:45 | while with the variational bayes approach we can obtain accurate results in just |

0:15:52 | a few percent more time |

0:15:55 | compared to |

0:15:57 | the time required to compute the zero and first order statistics which is |

0:16:03 | the dominant cost |

0:16:07 | so in addition |

0:16:11 | here you can also see how the extraction time and memory depend on |

0:16:14 | the size of the blocks |

0:16:17 | and |

0:16:19 | we can see that using |

0:16:19 | bigger blocks of course increases the memory requirements |

0:16:26 | but in this case the extraction time decreases |

0:16:29 | significantly |

0:16:34 | and becomes comparable to that of the conjugate gradient method |

0:16:40 | while using smaller block sizes allows us to |

0:16:45 | keep the memory requirements low |

0:16:51 | so |

0:16:53 | to conclude |

0:16:56 | we have presented some new efficient and accurate i-vector extraction |

0:17:00 | techniques |

0:17:01 | which are based on variational bayes estimation |

0:17:05 | and on the conjugate gradient method |

0:17:11 | they approximate the accuracy of the baseline very closely |

0:17:16 | and they allow us to trade the accuracy of the i-vectors we obtain |

0:17:23 | against the time required to extract the i-vectors themselves |

0:17:31 | [remainder inaudible] |

0:17:56 | so let's thank the speaker |

0:17:59 | we have |

0:18:00 | a few minutes for questions |

0:18:10 | [inaudible question from the audience] |

0:18:17 | [inaudible answer] |

0:19:01 | [inaudible exchange] |

0:20:05 | yes as well |

0:20:08 | with the plda classifier |

0:20:10 | i would say that |

0:20:13 | the classifier |

0:20:16 | is very fast |

0:20:21 | [remainder inaudible] |

0:20:44 | any more questions |

0:20:47 | let me ask |

0:20:49 | have you seen any difference if before applying what you did you |

0:20:52 | try to |

0:20:54 | rotate |

0:20:55 | the space of eigenvoices so that |

0:20:57 | it would be orthogonal or do you start from the same matrix |

0:21:01 | [answer partially inaudible] |

0:21:30 | but have you in fact compared with what we did basically we tried to diagonalize the |

0:21:34 | eigenvoice matrix first and then exploit the diagonal structure |

0:21:43 | [answer inaudible] |

0:22:07 | so let's thank the speaker again |