0:00:15 | In this talk I will present the work that we did for

0:00:19 | our submission to the i-vector challenge.

0:00:21 | Actually, there is in these slides some more material that was not

0:00:27 | presented in the paper,

0:00:29 | but that was submitted in the system description, which I think was

0:00:33 | shared with you guys.

0:00:35 | so |

0:00:38 | Here's the outline of my talk. First I will present

0:00:41 | the progress of our system,

0:00:43 | and then I will detail two ideas: the clustering used for training,

0:00:47 | and the score normalization used for computing the scores.

0:00:52 | so |

0:00:54 | So here is

0:00:54 | the timeline of the progress of the minDCF

0:00:58 | for our system.

0:01:00 | Starting from the baseline, which has a

0:01:03 | minDCF of 0.386,

0:01:06 | we end up with a minDCF of 0.247,

0:01:09 | which makes a

0:01:11 | relative improvement of about thirty-six percent.

0:01:14 | So I'm going to present the main ideas in a graphical

0:01:20 | manner. We have the development set, and we have the evaluation set that is

0:01:23 | split into enrollment and test.

0:01:25 | And we have these three steps that were in the baseline:

0:01:31 | the whitening, the length normalization, and the cosine scoring.

0:01:34 | As we can see, only the whitening needs training,

0:01:38 | and so we don't really need the labels of

0:01:43 | the development set for that.
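As a rough illustration of those three baseline steps, here is a minimal sketch in Python; the toy data and function names are my own, not the challenge baseline code.

```python
import numpy as np

def train_whitener(X):
    """Fit whitening on the (unlabeled) dev i-vectors: center, then map
    to identity covariance via the inverse square root of the covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return mu, W

def length_normalize(X):
    """Project each whitened i-vector onto the unit sphere."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def cosine_score(enroll, test):
    """Cosine similarity; after length normalization it is a plain dot product."""
    return float(np.dot(enroll, test))

# toy usage on random vectors standing in for real i-vectors
rng = np.random.default_rng(0)
dev = rng.normal(size=(500, 20))                  # unlabeled development set
mu, W = train_whitener(dev)                       # the only trained step
enroll = length_normalize((rng.normal(size=(1, 20)) - mu) @ W)[0]
test = length_normalize((rng.normal(size=(1, 20)) - mu) @ W)[0]
score = cosine_score(enroll, test)
```

Note that only `train_whitener` consumes the development set, and it never looks at labels, which is the point being made here.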

0:01:45 | So, starting from this baseline, the first thing that can be done is

0:01:50 | to better choose the data for the whitening.

0:01:55 | I mean, if we take only the

0:01:57 | utterances with more than thirty-five seconds, based on our experiments,

0:02:01 | we already get

0:02:03 | some improvement, with a minDCF of 0.372.

0:02:08 | So afterwards I'm going to use these conditioned i-vectors;

0:02:12 | I'm going to use this conditioned dev set in the later experiments,

0:02:16 | for both systems.

0:02:18 | so |

0:02:19 | So the next step that we did is the clustering.

0:02:23 | so |

0:02:25 | Actually we tried different kinds of clustering, and I'm going to come

0:02:29 | back to this just later on, but one of the best clusterings that we are

0:02:33 | getting is what we called the cosine-PLDA clustering.

0:02:36 | And so, actually,

0:02:39 | after this clustering, we take only the

0:02:41 | clusters that have more than two i-vectors in them,

0:02:46 | and now we can apply

0:02:50 | supervised techniques like LDA, PLDA, WCCN and others.

0:02:54 | So here we just added this LDA and the clustering in the loop, and you

0:02:59 | can see that we can already get some improvement, with a minDCF of

0:03:04 | 0.356.

0:03:09 | So what we tried next is

0:03:12 | to replace the cosine scoring by other kinds of scoring. The first of

0:03:16 | them was the SVM.

0:03:17 | So actually, here we trained a linear SVM for every target speaker,

0:03:23 | where we have only one positive sample, the length-normalized

0:03:27 | i-vector of the target speaker, and the negative samples are the length-normalized i-vectors

0:03:32 | of the processed

0:03:35 | development set.

0:03:37 | So here we get some jump in performance, with a minDCF of

0:03:41 | 0.302.
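The per-target SVM just described (a single length-normalized positive i-vector against the whole development set as negatives) could be sketched as below. The Pegasos-style solver, the toy data, and all names are my own choices for a self-contained example; only the one-positive setup and the 0.1/0.9 class weights (mentioned later in the Q&A) come from the talk.

```python
import numpy as np

def train_target_svm(target_iv, cohort_ivs, pos_w=0.1, neg_w=0.9,
                     lam=1e-2, epochs=200, seed=0):
    """One-positive-vs-all linear SVM, trained with a Pegasos-style
    subgradient descent. pos_w/neg_w are per-class sample weights."""
    X = np.vstack([target_iv[None, :], cohort_ivs])
    y = np.concatenate([[1.0], -np.ones(len(cohort_ivs))])
    w = np.concatenate([[pos_w], neg_w * np.ones(len(cohort_ivs))])
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # standard Pegasos step size
            margin = y[i] * np.dot(theta, X[i])
            theta *= (1.0 - eta * lam)            # shrink (L2 regularization)
            if margin < 1.0:                      # hinge-loss subgradient step
                theta += eta * w[i] * y[i] * X[i]
    return theta

def svm_score(theta, test_iv):
    """The verification score is the signed distance to the hyperplane."""
    return float(np.dot(theta, test_iv))

# toy usage: one target direction, clearly separated cohort negatives
rng = np.random.default_rng(1)
dim = 10
cohort = rng.normal(size=(100, dim))
cohort[:, 0] = -np.abs(cohort[:, 0])              # push negatives away from target
cohort /= np.linalg.norm(cohort, axis=1, keepdims=True)
target = np.zeros(dim); target[0] = 1.0           # length-normalized target i-vector
theta = train_target_svm(target, cohort)
score_tgt = svm_score(theta, target)
```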

0:03:44 | So next we added the WCCN in the loop,

0:03:50 | just after the LDA.

0:03:51 | Here, for the SVM, we did not get any improvement,

0:03:55 | but for

0:03:57 | the PLDA, which I will explain in the next slide, the WCCN

0:04:02 | was helpful.

0:04:03 | So here is

0:04:05 | a bit about the PLDA. We use our scalable implementation of the

0:04:08 | standard PLDA,

0:04:09 | and the scores are the likelihood ratio between the average i-vector of the

0:04:14 | target speaker and the test i-vector. Note here that the average i-vectors

0:04:19 | are not length-normalized in this case, which is not the case for the SVM.

0:04:24 | So here also, again, we get additional improvement, with a minDCF of

0:04:28 | 0.282.
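The likelihood-ratio score described here can be illustrated with a simplified two-covariance PLDA model; the covariances below are made up for the example rather than trained, and the actual implementation in the system may differ.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """log N(x; 0, cov) for a zero-mean Gaussian."""
    d = len(x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def plda_llr(enroll_avg, test, B, W):
    """Two-covariance PLDA log-likelihood ratio: same-speaker hypothesis
    (shared latent, between-cov B, within-cov W) vs independent speakers."""
    d = len(test)
    x = np.concatenate([enroll_avg, test])
    same = np.block([[B + W, B], [B, B + W]])     # correlated under H_same
    diff = np.block([[B + W, np.zeros((d, d))],
                     [np.zeros((d, d)), B + W]])  # independent under H_diff
    return gaussian_logpdf(x, same) - gaussian_logpdf(x, diff)

# toy usage with made-up covariances and two well-separated speaker means
rng = np.random.default_rng(0)
d = 5
B, W = np.eye(d), 0.1 * np.eye(d)
spk1, spk2 = np.zeros(d), np.full(d, 2.0)
enroll = spk1 + 0.1 * rng.normal(size=d)          # averaged enrollment i-vector
test_same = spk1 + 0.1 * rng.normal(size=d)
test_diff = spk2 + 0.1 * rng.normal(size=d)
```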

0:04:32 | Afterwards we tried some,

0:04:35 | we tried some score normalization ideas.

0:04:38 | Actually, I tried several:

0:04:42 | s-norm and others, and the one that was working best is the AS-norm,

0:04:47 | but I will also come back to this later.

0:04:50 | So actually the AS-norm usually was used only at the recognition level, but

0:04:54 | I also applied it to the clustering. Here, when we apply it to the clustering,

0:04:59 | we can get an additional improvement, to a minDCF of

0:05:03 | 0.286.

0:05:06 | Then I applied this AS-norm after the PLDA scoring, and we

0:05:13 | get another jump in performance, to a minDCF of

0:05:16 | 0.258, and this was the system that was submitted

0:05:21 | at the deadline of the evaluation.

0:05:25 | Afterwards I tried another idea, which is to replace this cosine-based

0:05:30 | clustering by SVM clustering, which is also done in a hierarchical manner,

0:05:36 | and again we get

0:05:39 | an additional improvement, to a minDCF of

0:05:42 | 0.247, which is very close to the best performing

0:05:45 | system.

0:05:46 | So now, this is more or less, I would say,

0:05:49 | the full description of our system.

0:05:51 | Note that we don't use any quality measure

0:05:53 | functions.

0:05:56 | So that's it for the progress. Afterwards, let me start with the clustering.

0:06:02 | So, for the clustering:

0:06:05 | Clustering was already studied in the literature for i-vectors, in either an unsupervised

0:06:12 | manner or a supervised manner; for example, the work from MIT on cosine-based k-means

0:06:18 | clustering, in which the number of clusters is known a priori, and which

0:06:22 | was applied to conversational telephone speech.

0:06:26 | Then they improved the system by using cosine-based spectral clustering, along with

0:06:30 | a simple heuristic for computing the number of clusters automatically.

0:06:36 | Other work, from CRIM, was using the cosine-based mean shift clustering.

0:06:41 | So these methods, if I'm not wrong, all use cosine-based scoring.

0:06:47 | Other methods used supervised clustering, like the one that used

0:06:52 | integer linear programming.

0:06:55 | But their distance metric, if I'm not wrong,

0:06:59 | requires labeled training data

0:07:01 | in order to compute the within-class covariance matrix.

0:07:04 | Other works were using PLDA-

0:07:09 | based clustering, but of course this PLDA needs labeled external data

0:07:14 | in order

0:07:15 | to compute the PLDA model, and then to compute the similarity

0:07:21 | measure and do the hierarchical clustering.

0:07:27 | So actually we tried different kinds of clustering; I'm not going to go into

0:07:31 | all of them. One of them was the Ward clustering.

0:07:34 | It is a well-known agglomerative clustering whose goal

0:07:38 | is to optimize an overall objective function by minimizing the within-class scatter.

0:07:45 | This clustering is very fast,

0:07:48 | since it uses the Lance-Williams algorithm

0:07:50 | in a recursive manner.

0:07:56 | Actually, the problem with this algorithm

0:08:00 | is that it needs the Euclidean distance to be valid,

0:08:05 | and the problem is that

0:08:06 | it was shown in this work that the Euclidean distance is not as

0:08:10 | good as the cosine distance.
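For reference, Ward's method with the Lance-Williams recurrence can be sketched as follows; this naive O(n^3) version is purely illustrative (libraries such as SciPy's `linkage` implement the same recurrence far more efficiently).

```python
import numpy as np

def ward_clustering(X, n_clusters):
    """Naive agglomerative Ward clustering: repeatedly merge the pair with the
    smallest cost, updating distances with the Lance-Williams recurrence."""
    n = len(X)
    pkey = lambda a, b: (a, b) if a < b else (b, a)
    # initial pairwise squared Euclidean distances
    D = {(a, b): float(np.sum((X[a] - X[b]) ** 2))
         for a in range(n) for b in range(a + 1, n)}
    active = list(range(n))
    sizes = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    nxt = n
    while len(active) > n_clusters:
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: D[pkey(*p)])
        ni, nj = sizes[i], sizes[j]
        new = nxt; nxt += 1
        for k in active:
            if k in (i, j):
                continue
            nk = sizes[k]
            # Lance-Williams update for Ward's method
            D[pkey(new, k)] = ((ni + nk) * D[pkey(i, k)]
                               + (nj + nk) * D[pkey(j, k)]
                               - nk * D[pkey(i, j)]) / (ni + nj + nk)
        active = [a for a in active if a not in (i, j)] + [new]
        sizes[new] = ni + nj
        members[new] = members[i] + members[j]
    labels = np.empty(n, dtype=int)
    for lab, c in enumerate(active):
        labels[members[c]] = lab
    return labels

# toy usage: two tight, well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])
labels = ward_clustering(X, n_clusters=2)
```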

0:08:12 | The other clustering that we tried is what I called the cosine-PLDA

0:08:16 | clustering. It's a two-step clustering,

0:08:18 | where the first step is based on the cosine

0:08:22 | measure.

0:08:23 | so |

0:08:24 | Actually, after each iteration the similarity measure is updated by computing the cosine measure

0:08:30 | between the average i-vectors of the resulting clusters,

0:08:32 | and here we decided to stop early in the clustering process in order

0:08:37 | to ensure high-purity clusters.
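A minimal sketch of that first, cosine-based step, assuming average i-vectors as cluster representatives and an arbitrary similarity threshold as the early-stopping rule:

```python
import numpy as np

def cosine_ahc(X, stop_threshold=0.5):
    """Greedy agglomerative clustering where the similarity is the cosine
    between the *average* i-vectors of two clusters, recomputed after every
    merge, and merging stops early to keep the clusters pure."""
    clusters = [[i] for i in range(len(X))]
    means = [X[i].astype(float) for i in range(len(X))]
    cos = lambda u, v: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    while len(clusters) > 1:
        best, bi, bj = -2.0, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cos(means[a], means[b])
                if s > best:
                    best, bi, bj = s, a, b
        if best < stop_threshold:        # early stop: remaining pairs too dissimilar
            break
        clusters[bi] += clusters[bj]     # merge cluster j into cluster i
        means[bi] = X[clusters[bi]].mean(axis=0)
        del clusters[bj], means[bj]
    return clusters

# toy usage: two bundles of near-orthogonal directions
rng = np.random.default_rng(0)
d = 8
X = np.vstack([np.eye(d)[0] + 0.05 * rng.normal(size=(5, d)),
               np.eye(d)[1] + 0.05 * rng.normal(size=(5, d))])
clusters = cosine_ahc(X, stop_threshold=0.5)
```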

0:08:41 | So once we have this first set of clusters, we can move to

0:08:45 | the second step of the clustering,

0:08:48 | which is based on the PLDA.

0:08:53 | And actually we did it somehow differently from others. So actually,

0:08:58 | after each iteration we could retrain the PLDA model and compute again

0:09:06 | the similarity matrix,

0:09:10 | but since this is somehow costly to do, we retrain it only

0:09:14 | every five hundred

0:09:17 | merges.

0:09:19 | So I'm going to show

0:09:22 | this figure, which shows the evolution of the minDCF in terms of

0:09:26 | the clustering process,

0:09:29 | on the progress set, using as backend the PLDA model scoring.

0:09:35 | So as we see, with both

0:09:38 | Ward clustering, which is in blue, and cosine-PLDA clustering, which is in

0:09:42 | red, we can get better performance than the

0:09:46 | baseline system, and we can also see that cosine-PLDA clustering is much better than the

0:09:50 | Ward clustering.

0:09:52 | And the best results in this experiment were obtained with

0:09:55 | a number of clusters of about sixteen thousand.

0:10:01 | Let me now talk a bit about the score normalization.

0:10:04 | So as I said, we tried different kinds of normalization; one of

0:10:07 | the most successful ones was the AS-norm, introduced by Professor Kenny,

0:10:13 | I think in the heavy-tailed PLDA paper.

0:10:16 | This actually works quite nicely with an unlabeled cohort set, which is the case

0:10:22 | in our scenario.

0:10:24 | So as I said, I used it for both recognition and clustering. So first,

0:10:29 | for recognition,

0:10:30 | the cohort set

0:10:31 | that I used was all the development set,

0:10:33 | so the thirty-six thousand

0:10:37 | i-vectors, and I took the top-k nearest-neighbor

0:10:41 | i-vectors to the target speaker i-vector and to the test i-vector.

0:10:48 | So we use this formula; you see it's a symmetric formula.

0:10:52 | We have mu and sigma in this formula; mu of the

0:10:57 | top-k, for instance, just means that

0:10:59 | we take the top one thousand five hundred scores,

0:11:05 | the scores that are the highest for the

0:11:08 | target speaker, and then we compute the mean, and the

0:11:12 | same for the standard deviation, and so on.
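In code, the symmetric formula amounts to averaging two z-norms whose statistics come from only the top-k cohort scores on each side. A stand-alone sketch (names and toy data are mine; real cohort scores would come from the backend):

```python
import numpy as np

def as_norm(raw, enroll_cohort_scores, test_cohort_scores, k=1500):
    """Adaptive symmetric score normalization (sketch): z-normalize the raw
    score twice, using the mean/std of only the top-k cohort scores of the
    enrollment side and of the test side, then average the two halves.
    k=1500 is the top-k value quoted in the talk."""
    top_e = np.sort(enroll_cohort_scores)[-k:]    # best-matching cohort, enroll side
    top_t = np.sort(test_cohort_scores)[-k:]      # best-matching cohort, test side
    ze = (raw - top_e.mean()) / top_e.std()
    zt = (raw - top_t.mean()) / top_t.std()
    return 0.5 * (ze + zt)

# toy usage with simulated cohort score distributions
rng = np.random.default_rng(0)
enroll_cohort = rng.normal(-1.0, 0.5, size=200)
test_cohort = rng.normal(-1.0, 0.5, size=200)
normalized = as_norm(0.9, enroll_cohort, test_cohort, k=50)
```

Because the two halves are averaged, the result is unchanged if the enrollment and test roles are swapped, which is the symmetry the speaker points out.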

0:11:15 | So we have more or less the same formula that was used

0:11:19 | for the clustering,

0:11:20 | but here of course it's between

0:11:23 | a pair of clusters,

0:11:26 | and the cohort set in this case is actually all

0:11:30 | the average i-vectors of the clusters that are not concerned by this measure,

0:11:36 | so all the other clusters,

0:11:38 | but not these two.

0:11:41 | so |

0:11:44 | With that I'm going to

0:11:46 | conclude.

0:11:47 | Actually, this evaluation was very helpful for us; we learned a lot

0:11:51 | of things,

0:11:53 | and it was, I mean, also quite successful for us.

0:11:59 | And we also learned that clustering is

0:12:02 | very important,

0:12:03 | and also the adaptive symmetric normalization.

0:12:06 | These results can be reproduced with our open-source library;

0:12:11 | you can see this link, and you can also

0:12:16 | refer to our ICASSP paper.

0:12:20 | As future work, and we have actually started working on it:

0:12:24 | how to automatically

0:12:26 | determine the stopping criterion of the clustering process. Actually we have

0:12:30 | some ideas,

0:12:31 | and I hope we can share them with you guys,

0:12:34 | like the variation of the minDCF on the development set, and

0:12:38 | the variation of the number of clusters, or of the newly created clusters,

0:12:42 | and also the possible use of spectral clustering.

0:12:46 | And one good idea, maybe for the next evaluation, that could

0:12:51 | be considered

0:12:52 | because of its potential applications, is semi-supervised clustering.

0:12:58 | Actually, there are many techniques in machine learning that could be used, like

0:13:02 | co-training and others.

0:13:04 | Thank you.

0:13:27 | Congratulations, that was a very good system; without fusion, getting these results is amazing. I have

0:13:34 | the slight impression that you make a distinction between supervised and unsupervised; if you

0:13:40 | can go back, please, a few slides...

0:13:42 | I can easily go back; which slide?

0:13:47 | Well, I think this distinction is a little bit arbitrary. For the unsupervised case

0:13:53 | we used the trick of Mohammed, since, I mean, we tried to use

0:13:59 | labels, and of course what worked better, the best results we demonstrated, were, of course,

0:14:04 | with labels.

0:14:05 | It's always a good idea to use some labels if you have them, and

0:14:08 | my impression is that the only way to get a fully unsupervised clustering, without knowing

0:14:14 | the number of classes, is something more like a model-based Bayesian method. Although, in the

0:14:20 | mean shift there are some tricks: if you check the original paper

0:14:24 | of Comaniciu, you will see that there are some tricks that you can do,

0:14:28 | that are successful in image processing, so you can somehow estimate the number

0:14:33 | of classes. But I think that also the guys

0:14:40 | with the most supervised system,

0:14:44 | they used it also with the standard prewhitening, without even

0:14:49 | caring about the labels, and the system works fine as well, so

0:14:54 | this distinction is a little bit arbitrary for me.

0:14:58 | I think, in my sense,

0:15:01 | supervised and unsupervised are just,

0:15:04 | I mean, in the sense of labeled and unlabeled training data.

0:15:08 | And actually, I think...

0:15:21 | I just have a question about your SVM: you said you used a single

0:15:25 | positive example, the averaged i-vector, instead of

0:15:30 | five positive examples; did you try both?

0:15:33 | Yes, so,

0:15:35 | for the number of positive samples, I tried many options, and actually

0:15:38 | this one was working the best.

0:15:42 | And I forgot to mention that, by the way, I used

0:15:46 | unequal weights, like 0.1 for the positive and

0:15:50 | 0.9 for the negative.

0:15:52 | But I think it's not...

0:15:54 | well, we didn't gain much by doing this.

0:15:57 | It's more or less the same if we use,

0:16:01 | I mean,

0:16:03 | the i-vectors

0:16:05 | provided by NIST,

0:16:07 | one by one.

0:16:17 | You just

0:16:19 | said what I wanted to say about the SVM, so I just have a comment:

0:16:27 | I never had a progress

0:16:30 | slide like the one you showed in,

0:16:33 | I think, your third slide.

0:16:36 | When you were developing the system you had only progress, which is a

0:16:41 | wonderful, really wonderful situation to be in.

0:16:46 | For us it's very interesting to know also your negative trials: what you

0:16:52 | tried and what was not efficient

0:16:56 | during the development of your systems. It's somewhat hidden, but it's interesting for me.

0:17:01 | That's true, and,

0:17:02 | well, if I want to talk about the things that did

0:17:05 | not work, I think it would take too long.

0:17:07 | But anyway...

0:17:20 | So you showed some

0:17:24 | different approaches for clustering, but you used just one of them in the system,

0:17:33 | and in the end the backend was only the PLDA

0:17:38 | in the system.

0:17:39 | Did you

0:17:41 | try a combination of different backends? I did try that;

0:17:47 | it was also

0:17:49 | a bit different from the others, but

0:17:52 | for me it was not...

0:17:53 | maybe

0:17:54 | some small gain, for instance with

0:17:57 | logistic regression, and

0:18:02 | with quality measures.

0:18:10 | I think that the adaptive score normalization was doing the work that

0:18:15 | the

0:18:17 | quality measures were doing for others; I think that is also what

0:18:23 | you would find

0:18:24 | with the...