0:00:15 | so this session

0:00:18 | will have five

0:00:21 | papers

0:00:23 | the first one will be about the incorporation of efr

0:00:29 | "variance-spectra based normalization for i-vector standard and probabilistic linear discriminant analysis"

0:00:36 | the authors are here, ready to start

0:00:49 | so, please present the paper

0:00:54 | yes, thank you

0:00:55 | so, as it has just been mentioned, this is a collaborative work, so actually

0:01:01 | it is also a follow-up, because

0:01:06 | the work had been started earlier

0:01:11 | so i want to start with some analysis of what we did before

0:01:17 | and also try to improve the work that has been done previously

0:01:23 | so all of this comes back to i-vectors

0:01:28 | so i will start with a brief description of the system, which is based

0:01:34 | on classical i-vectors

0:01:40 | i will talk mostly about the post-processing of the i-vectors, between the i-vector extraction and the

0:01:45 | plda

0:01:46 | which is the part of the system where we try to improve the discriminancy

0:01:52 | usually by using lda approaches

0:01:55 | and also to compensate for the session variability; one way to do it is

0:02:00 | to use the length normalization; there are plenty of ways to do this but i

0:02:03 | will focus on these two

0:02:06 | and as the discriminancy is related to the variance of

0:02:10 | the data, we will look at

0:02:14 | the between- and within-class variability

0:02:20 | so we start with the description of the system

0:02:26 | so the system is just a classical ubm

0:02:30 | everything is gender dependent from the beginning to the end

0:02:36 | we extract mfcc features, sixty dimensions, and the voice activity detection is based on a speech recognizer

0:02:44 | and the ubm training is very classical, using a large amount of data

0:02:50 | based on the nist sre 2004, 2005 and 2006 sets

0:02:55 | then for the second part, the i-vector extractor, also gender dependent,

0:03:01 | we used only telephone data from these sre sets plus switchboard

0:03:06 | i think it's quite the state of the art

0:03:09 | so this gives just a rough idea of the number of sessions

0:03:12 | and for the normalization and classification training, which includes both

0:03:18 | the g-plda training

0:03:19 | and the lda training and everything we will see in the following

0:03:24 | we used gender dependent subsets of the various sets of data

0:03:28 | based on sre 2004, 2005, 2006 and switchboard, and we used only these

0:03:32 | because of the number of sessions

0:03:35 | and we restrained the development set to segments for which the nominal duration

0:03:42 | is higher than one hundred eighty seconds

0:03:46 | so now let's look at some tools that can be useful when we talk about variability

0:03:52 | so first i would just remind you of discriminancy and covariances

0:03:59 | we commonly use the covariance matrices: the total covariance, the between-class covariance, the

0:04:05 | within-class covariance

0:04:07 | but it is very common in speaker verification, instead of using

0:04:12 | the between- and within-class covariance matrices, to use the scatter

0:04:16 | matrices

0:04:18 | the definitions are roughly similar, and both are used,

0:04:21 | one or the other, for several applications

0:04:25 | the difference

0:04:28 | is that, unlike the scatter matrices, the covariance matrices do not take into account

0:04:32 | the number of sessions per speaker, so the weight of a speaker does not

0:04:36 | depend on its number of sessions

0:04:39 | so as both are commonly used, we just ran a few experiments

0:04:47 | on our system

0:04:49 | to see which one is more efficient
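
The distinction between the two families of matrices can be made concrete with a small sketch (toy data and numpy only, not from the talk): the between-class covariance counts every speaker mean once, while the between-class scatter weights each speaker mean by its number of sessions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dev set: 3 speakers with very unbalanced session counts.
ivectors = {0: rng.normal(0.0, 1.0, (50, 4)),
            1: rng.normal(2.0, 1.0, (10, 4)),
            2: rng.normal(5.0, 1.0, (2, 4))}

all_w = np.vstack(list(ivectors.values()))
mu = all_w.mean(axis=0)                          # global mean
means = {s: w.mean(axis=0) for s, w in ivectors.items()}

# Between-class covariance: every speaker counts once,
# whatever its number of sessions.
B_cov = sum(np.outer(m - mu, m - mu) for m in means.values()) / len(means)

# Between-class scatter: each speaker mean is weighted by its
# session count n_s, so well-recorded speakers dominate.
B_scat = sum(len(ivectors[s]) * np.outer(m - mu, m - mu)
             for s, m in means.items()) / len(all_w)

print(np.allclose(B_cov, B_scat))  # False: the weighting differs
```

With the unbalanced counts above, the poorly-recorded speaker 2 contributes a third of `B_cov` but only 2/62 of `B_scat`, which is exactly the difference the experiments compare.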

0:04:52 | so when talking about classification, what we are interested in is to

0:04:58 | maximize the between-

0:05:01 | speaker variability and reduce the within-speaker variability

0:05:05 | and one way to look at this is to look at the covariances

0:05:13 | the tool for this is the variance spectrum, and such a tool is very common

0:05:19 | in other domains

0:05:21 | so on this graph we can see three plots, which are

0:05:25 | the total variance,

0:05:28 | the between-class and within-class variance, that is, the speaker and session variability

0:05:35 | so first we compute the between-class covariance matrix

0:05:38 | B

0:05:39 | then we rotate all the data of the development set into the eigenvector basis

0:05:44 | of B

0:05:46 | we then compute the covariances in this basis

0:05:49 | and we just plot the diagonal of each matrix, so you can see that

0:05:52 | the variability

0:05:53 | in the first dimensions is higher for the speaker and also for the session
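
The spectrum computation just described might be sketched as follows (numpy only; `variance_spectra` is an illustrative name, not from the talk): estimate B and W on a labelled dev set, rotate into the eigenbasis of B sorted by decreasing eigenvalue, and read the diagonals.

```python
import numpy as np

def variance_spectra(ivectors, labels):
    """Total, between- and within-class variance per dimension,
    in the eigenvector basis of the between-class covariance B
    (axes sorted by decreasing between-class variance)."""
    mu = ivectors.mean(axis=0)
    speakers = np.unique(labels)
    means = np.stack([ivectors[labels == s].mean(axis=0) for s in speakers])
    B = (means - mu).T @ (means - mu) / len(speakers)
    W = sum((ivectors[labels == s] - m).T @ (ivectors[labels == s] - m)
            for s, m in zip(speakers, means)) / len(ivectors)
    vals, P = np.linalg.eigh(B)
    P = P[:, np.argsort(vals)[::-1]]          # eigenbasis of B, decreasing
    total = ((ivectors - mu) @ P).var(axis=0)
    return total, np.diag(P.T @ B @ P), np.diag(P.T @ W @ P)
```

Plotting the three returned vectors against the dimension index reproduces the kind of graph the talk shows.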

0:06:01 | so now, one way to maximize this ratio is to use the very

0:06:05 | common lda, which is just maximizing the rayleigh coefficient

0:06:10 | the rayleigh coefficient can be defined using the within- and between-class covariance matrices

0:06:17 | or using the scatter matrices

0:06:19 | so in this work the lda is used to reduce the dimension from six

0:06:24 | hundred, and this is kept constant for all the experiments we have
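
A minimal sketch of this step (numpy only; `lda_projection` is an illustrative name): the directions maximizing the Rayleigh coefficient (v' B v)/(v' W v) are the leading eigenvectors of W^{-1} B, whichever pair of matrices is plugged in for B and W.

```python
import numpy as np

def lda_projection(B, W, out_dim):
    """LDA basis: leading eigenvectors of W^{-1} B, i.e. the
    directions maximizing the Rayleigh coefficient."""
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1]       # strongest ratio first
    return vecs[:, order[:out_dim]].real
```

Usage is simply `reduced = ivectors @ lda_projection(B, W, out_dim)`, with B and W either the covariance or the scatter estimates being compared in the talk.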

0:06:29 | and then, to complete the system description

0:06:33 | we define two scorings; the first one is based on the two-covariance model

0:06:39 | that has been used by niko brummer a few years ago

0:06:47 | and the second one

0:06:49 | is based on the plda, using the gaussian assumption

0:06:53 | in the version that we used, we used the eigenchannel matrix

0:07:01 | but full-rank, because the initial version was using a diagonal covariance

0:07:06 | and the number of speaker factors is kept the same as the lda dimension, to

0:07:11 | be consistent with the lda

0:07:14 | and the number of channel factors is six hundred, because it is the way to

0:07:19 | compensate for the diagonal covariance
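
For reference, a hedged sketch of the two-covariance log-likelihood-ratio score as it is usually written for this model (zero-mean i-vectors assumed; function names are illustrative): under the same-speaker hypothesis the pair is jointly Gaussian with cross-covariance B, under the different-speaker hypothesis the two i-vectors are independent with covariance B + W.

```python
import numpy as np

def _log_gauss(x, cov):
    """Log-density of N(0, cov) at x."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + x @ np.linalg.solve(cov, x))

def two_cov_score(w1, w2, B, W):
    """Log-likelihood ratio 'same speaker' vs 'different speakers'
    under the two-covariance model (B between, W within class)."""
    T = B + W                                  # total covariance
    joint = np.block([[T, B], [B, T]])         # same-speaker covariance
    return (_log_gauss(np.concatenate([w1, w2]), joint)
            - _log_gauss(w1, T) - _log_gauss(w2, T))
```

As expected, two aligned i-vectors score higher than two opposed ones, since only the numerator couples the pair through B.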

0:07:24 | so the problem with all these

0:07:27 | scorings, including the two-covariance model and the plda, is that everything is based on the gaussian

0:07:34 | assumption

0:07:36 | and for those of you working with plda

0:07:39 | you know very well what we are talking about here

0:07:49 | it is now commonly admitted in the community that the i-vectors are not following a gaussian distribution but something

0:07:54 | a bit more heavy-tailed, like

0:07:56 | a student's t

0:07:57 | distribution

0:07:59 | so what we do is that we try to take all these i-vectors and

0:08:04 | make their distribution gaussian

0:08:07 | and one way to do this has been proposed by garcia-romero

0:08:13 | and the basic intention

0:08:18 | is to normalize the magnitude of the i-vectors

0:08:21 | so using this formula, just this one:

0:08:25 | we center the i-vectors, and then we just normalize them to unit length

0:08:31 | so using this method the distributions become a bit more gaussian

0:08:37 | and we can see that the effect is

0:08:40 | very efficient
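
The formula reads, as a sketch (numpy only; the development-set mean is passed in, and `length_norm` is an illustrative name):

```python
import numpy as np

def length_norm(ivectors, mu):
    """Magnitude normalization: center on the development-set
    mean, then scale each i-vector to unit length."""
    centered = np.atleast_2d(ivectors) - mu
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)
```

After this step every i-vector lies on the unit sphere, which is the geometric fact the rest of the talk builds on.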

0:08:42 | so just using this with the

0:08:46 | two-covariance model

0:08:48 | we can see that we gain in both equal error rate

0:08:50 | and min dcf on the nist two thousand and eight

0:08:55 | and nist two thousand and ten extended

0:08:58 | evaluation sets

0:09:01 | so everything until now is very common; so, going back to the tool

0:09:06 | introduced previously

0:09:07 | we would like to show the effect of length normalization

0:09:12 | on the variance spectra

0:09:15 | and as you can clearly see

0:09:17 | the curves are exactly the same, except for the range of the values,

0:09:22 | because we are normalizing the magnitude

0:09:24 | we can see that

0:09:26 | the values on the right side are smaller, but the shape does not change much

0:09:35 | fortunately

0:09:36 | in the initial paper the normalization was introduced together with

0:09:41 | whitening, so it has to be done after whitening of the data

0:09:45 | so here is the algorithm: the whitening is just

0:09:50 | using the total covariance matrix of the development i-vectors, and then we apply the

0:09:56 | length normalization

0:09:58 | at the same time, we introduced the eigen factor radial normalization, which is just whitening

0:10:03 | plus length normalization, but done iteratively

0:10:07 | and by applying this iteratively, the interest of this method is that it

0:10:12 | converges very fast

0:10:13 | and we introduce some properties

0:10:17 | that we can use further

0:10:20 | so the properties are that the mean of the development set converges to

0:10:25 | zero very fast

0:10:26 | the total covariance matrix becomes the identity

0:10:34 | and, following from this, all the eigenvectors of the

0:10:39 | between-class covariance matrix

0:10:42 | become also eigenvectors of the within-class covariance matrix

0:10:47 | and thus, using all these properties together

0:10:50 | it happens that the eigenvectors of the

0:10:54 | between- and within-class covariance matrices

0:10:57 | are now solutions of the lda

0:11:00 | optimization

0:11:01 | that means, after all this

0:11:03 | iterative process, the lda cannot yield any improvement
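
The iteration just described might look like this as a sketch (numpy only; `efr_normalize` is an illustrative name): whiten with the total covariance of the dev set, length-normalize, and repeat. One iteration is exactly plain whitening plus length normalization.

```python
import numpy as np

def efr_normalize(dev, n_iter=3):
    """Eigen factor radial normalization (sketch): iteratively
    whiten with the total covariance of the dev set, then
    length-normalize back onto the unit sphere."""
    X = dev.astype(float).copy()
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
        inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening matrix
        X = (X - mu) @ inv_sqrt
        X /= np.linalg.norm(X, axis=1, keepdims=True)      # length norm
    return X
```

After a few iterations the dev-set mean is near zero and the total covariance is near spherical, which is the property that flattens the spectra shown next.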

0:11:08 | so that was one of the conclusions of our

0:11:12 | first paper

0:11:13 | and here we can see the effect of this normalization on

0:11:18 | the variance spectra

0:11:19 | so before any treatment, the i-vectors give the spectra shown

0:11:25 | before

0:11:26 | and after one

0:11:29 | after one iteration, which is exactly what garcia-romero

0:11:33 | proposed

0:11:37 | we can see that the total covariance spectrum becomes flat

0:11:42 | after two iterations

0:11:44 | even better

0:11:45 | and after three

0:11:47 | almost perfect, at least for the human eye

0:11:50 | so you can see that

0:11:52 | the big advantage of this process is that the first dimensions of the

0:11:55 | data do not contain the major portion of

0:12:00 | the variability

0:12:01 | the major portion of the session variability

0:12:06 | so what happens actually is that, after this treatment, the i-vectors become

0:12:09 | optimal for the rayleigh coefficient optimization, that means for the

0:12:15 | optimization of the lda

0:12:19 | so to illustrate this, here are some results using the lda, and then the two-covariance

0:12:27 | model for scoring

0:12:30 | so the baseline is just the length normalization; when i say length normalization

0:12:34 | without any whitening

0:12:36 | it is just the magnitude normalization

0:12:40 | so you can see that using the

0:12:43 | eigen factor radial algorithm does not improve

0:12:46 | the performance after one iteration

0:12:48 | if we use the scatter matrices to compute the lda

0:12:52 | but in the case where we compute

0:12:54 | the lda using the between- and within-class covariances

0:12:57 | we can see that, for the female part at least, it improves the performance

0:13:02 | and after two iterations

0:13:04 | we can see that the conclusion is the same: compared with using

0:13:08 | the between- and within-class covariance matrices

0:13:11 | the scatter matrices seem not optimal, so it is better to use the between-class covariance with

0:13:18 | its initial definition

0:13:21 | so after this result, we tried to apply the same treatment

0:13:26 | before the plda, which is more robust than the two-covariance model

0:13:32 | so this is the baseline using only length normalization, and when we apply two iterations of eigen factor

0:13:37 | radial, which is optimal in the previous case,

0:13:40 | we see that the data is not adapted for the plda

0:13:44 | so the performance, at best,

0:13:47 | stays the same, or gets even worse

0:13:49 | but then

0:13:51 | we extended this work, still looking at the covariances,

0:13:59 | and thinking that after the length normalization everything is on a sphere; that means we

0:14:04 | have a spherical surface, and a gaussian model does not like this

0:14:08 | and it is very difficult to estimate the covariance matrix

0:14:11 | because when you look at each speaker

0:14:13 | from one side of this sphere or another, the within-speaker variability will be

0:14:19 | very different

0:14:21 | and if we just take the average of these

0:14:23 | to estimate the development-set within-class covariance matrix

0:14:27 | then it does not make sense anymore, because the

0:14:31 | matrix is representative for some speakers but obviously not for all

0:14:35 | so what we propose in this paper is to keep the i-vectors on the spherical surface, because

0:14:42 | it is now commonly admitted that it is

0:14:46 | really helpful to use length normalization for the session compensation

0:14:50 | but we want to align the principal nuisance directions with the decision boundaries

0:14:56 | that means

0:14:57 | we want the within-class covariance matrix to become

0:15:01 | diagonal, and even better if it is just the identity times

0:15:06 | a constant

0:15:08 | so we decided to apply exactly the same algorithm as previously

0:15:12 | an iterative process using the same steps, except that we replace

0:15:17 | the total covariance matrix

0:15:19 | by the within-class covariance matrix

0:15:22 | and so, by doing this

0:15:24 | we can see on the spectra of the same development set that one iteration

0:15:29 | makes

0:15:31 | the within-class spectrum flatten very fast; so this is the session variability, and we can see that

0:15:35 | it is almost well spread

0:15:37 | over the dimensions

0:15:41 | and after two iterations

0:15:44 | from the human point of view it is still almost exactly the same, but

0:15:48 | not in terms of the performance

0:15:51 | so for the within-class spectrum we can see that it is completely flat; and what is the effect

0:15:57 | when we use it? that is what i am going to show in a few

0:16:00 | minutes
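
The change described here, as a sketch (numpy only; `sphnorm` is an illustrative name): the loop is the same as before, but the whitening now uses the within-class covariance of the labelled dev set instead of the total covariance, so the nuisance spectrum is the one being flattened while the i-vectors stay on the sphere.

```python
import numpy as np

def sphnorm(dev, labels, n_iter=2):
    """Spherical nuisance normalization (sketch): EFR-style loop
    whose whitening matrix comes from the WITHIN-class covariance
    of the dev set rather than the total covariance."""
    X = dev.astype(float).copy()
    dim = X.shape[1]
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        # Pooled within-class covariance over all speakers.
        Wc = np.zeros((dim, dim))
        for s in np.unique(labels):
            xs = X[labels == s] - X[labels == s].mean(axis=0)
            Wc += xs.T @ xs
        Wc /= len(X)
        vals, vecs = np.linalg.eigh(Wc)
        M = vecs @ np.diag(vals ** -0.5) @ vecs.T   # nuisance whitening
        X = (X - mu) @ M
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # back on the sphere
    return X
```

The test below checks the two advertised effects: unit-norm outputs, and a within-class variance spectrum that is much flatter than the input's.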

0:16:01 | but before that, i just want to point out that this process can also be used

0:16:05 | to initialize the plda matrices

0:16:09 | actually

0:16:11 | most of us are using a pca

0:16:16 | to initialize the plda matrices, because it

0:16:20 | provides the first principal directions

0:16:23 | of the information space

0:16:25 | which is a very good starting point

0:16:29 | but actually what we propose here is to use this process: we rotate

0:16:34 | all the i-vectors into the eigenvector basis of B

0:16:39 | and then we initialize the speaker factor

0:16:43 | matrix

0:16:45 | using the first eigenvectors

0:16:49 | of this basis

0:16:51 | then for the eigenchannel matrix we use the

0:16:56 | cholesky decomposition of

0:16:59 | the within-class covariance matrix

0:17:00 | and actually

0:17:02 | if you do not want to use the eigenchannel matrix

0:17:07 | you can just initialize sigma, the residual covariance, using the same process

0:17:11 | as well
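
A sketch of that initialization (numpy only; F and G are illustrative names for the speaker-factor and eigenchannel matrices, not from the talk): take the leading eigenvectors of the between-class covariance B, scaled by the square root of their eigenvalues, for the speaker factors, and the Cholesky factor of the within-class covariance W for the channel part.

```python
import numpy as np

def plda_init(B, W, n_spk_factors):
    """Covariance-based initialization of (simplified) PLDA
    matrices, instead of PCA or random draws:
    F: leading eigenvectors of B scaled by sqrt(eigenvalue),
    G: Cholesky factor of W, so that G @ G.T == W."""
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_spk_factors]
    F = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
    G = np.linalg.cholesky(W)
    return F, G
```

With a full set of speaker factors, `F @ F.T` reproduces B exactly, so the EM training starts from matrices that already match the dev-set covariances.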

0:17:12 | so here are some results; the setting is as before, just the i-vectors plus

0:17:18 | the normalization process

0:17:21 | and i just want to mention that

0:17:23 | for the random initialization of the plda, the performance can vary

0:17:30 | depending on the initialization point

0:17:32 | so we performed the experiments with different initializations and then we averaged the results

0:17:40 | so you can see the baseline that i previously presented, and also the eigen factor radial method

0:17:45 | which is not efficient in this case

0:17:48 | and you can see that using the spherical normalization

0:17:52 | that is how we call this

0:17:54 | new normalization

0:17:55 | the performance

0:17:56 | does improve in every case

0:18:01 | so

0:18:02 | now

0:18:03 | with the initialization

0:18:06 | process that i just described, we can see that the performance is about the same

0:18:10 | but i just want to stress a fact about the performance here:

0:18:19 | in this case, the performance when using this initialization is just the lower bound

0:18:23 | of what we obtained by using random initialization

0:18:26 | so that means it is maybe not always better, but it guarantees a certain level of

0:18:36 | performance

0:18:38 | so

0:18:40 | to conclude this presentation, i just want to

0:18:44 | emphasize the fact that we used

0:18:49 | this tool, the variance spectra, which is very well known but

0:18:53 | not so widely used

0:18:56 | maybe a few of you

0:18:59 | use it

0:19:00 | but this tool serves to analyze the performance of the system, and actually it can also

0:19:05 | be used

0:19:06 | right after obtaining the i-vectors

0:19:10 | it is a very good indicator of the quality

0:19:13 | of the

0:19:13 | extractor

0:19:14 | because just looking at the spectra you can have a rough idea of the performance

0:19:17 | you will get

0:19:19 | a colleague is doing some experiments on this at the moment, and he will present

0:19:23 | this

0:19:24 | in his thesis, i think, very soon

0:19:29 | so

0:19:30 | this tool happens to be useful for analysis purposes

0:19:34 | so, to sum up:

0:19:38 | coming back to our previous paper, we showed that the iterative process,

0:19:43 | the normalization with whitening,

0:19:45 | improves the performance slightly; so if one pass gives an improvement,

0:19:49 | why not doing it twice or three times

0:19:54 | also, the covariance matrices,

0:19:56 | at least in our case, perform better than the scatter matrices

0:20:00 | then, to end this talk, just remember that the spherical nuisance normalization

0:20:08 | improves the performance in the case of

0:20:11 | plda scoring

0:20:13 | and also

0:20:14 | something i mentioned before: when you use this type of process to initialize

0:20:19 | the plda matrices

0:20:21 | you do not need to perform

0:20:24 | so many em iterations

0:20:26 | so for the case i presented, we obtained the best performance by

0:20:31 | using a hundred iterations of em

0:20:34 | in the case of random initialization;

0:20:36 | using this process, we just need about ten iterations

0:20:41 | so if the plda has to be retrained

0:20:44 | frequently

0:20:45 | it is one way to reduce the time

0:20:49 | so now, if you have any questions

0:20:59 | (inaudible question from the audience)

0:21:51 | yeah, actually, i really do not like the length normalization, because it is

0:21:57 | an ad hoc process which just happens to work, but

0:22:03 | i think we need to find a way to address this issue

0:22:06 | by finding something more

0:22:09 | consistent
