0:00:15 | In this talk I will present the work that we did for

0:00:19 | our submission to the i-vector challenge.

0:00:21 | Actually, there is in these slides some more material that was not

0:00:27 | presented in the paper,

0:00:29 | but that was submitted in the system description, which I think was

0:00:33 | shared with you guys.

0:00:35 | so |

0:00:38 | Here's the outline of my talk. First I will present

0:00:41 | the progress of our system,

0:00:43 | and then I will detail two ideas: the clustering used for training,

0:00:47 | and the score normalization used for computing the scores.

0:00:52 | so |

0:00:54 | So here is

0:00:54 | the timeline of the progress of the minDCF

0:00:58 | for our system.

0:01:00 | Starting from the baseline, which has a

0:01:03 | minDCF of 0.386,

0:01:06 | we end up with a minDCF of 0.247,

0:01:09 | which makes a

0:01:11 | relative improvement of about thirty-six percent.

0:01:14 | So I'm going to present the main ideas in a graphical

0:01:20 | manner. We have the development set, and we have the evaluation set that is

0:01:23 | split into enrollment and test.

0:01:25 | And we have these three steps that were in the baseline:

0:01:31 | the whitening, the length normalization, and the cosine scoring.

0:01:34 | As we can see, only the whitening needs training,

0:01:38 | and so we don't really need the labels of

0:01:43 | the development set for that.
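As a rough illustration of those three baseline steps, here is a minimal sketch in Python; the toy data and function names are my own, not the challenge baseline code.

```python
import numpy as np

def train_whitener(X):
    """Fit whitening on the (unlabeled) dev i-vectors: center, then map
    to identity covariance via the inverse square root of the covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return mu, W

def length_normalize(X):
    """Project each whitened i-vector onto the unit sphere."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def cosine_score(enroll, test):
    """Cosine similarity; after length normalization it is a plain dot product."""
    return float(np.dot(enroll, test))

# toy usage on random vectors standing in for real i-vectors
rng = np.random.default_rng(0)
dev = rng.normal(size=(500, 20))                  # unlabeled development set
mu, W = train_whitener(dev)                       # the only trained step
enroll = length_normalize((rng.normal(size=(1, 20)) - mu) @ W)[0]
test = length_normalize((rng.normal(size=(1, 20)) - mu) @ W)[0]
score = cosine_score(enroll, test)
```

Note that only `train_whitener` consumes the development set, and it never looks at labels, which is the point being made here.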

0:01:45 | So, starting from this baseline, the first thing that can be done is

0:01:50 | to better choose the data for the whitening.

0:01:55 | I mean, if we take only the

0:01:57 | utterances with more than thirty-five seconds, based on our experiments,

0:02:01 | we already get

0:02:03 | some improvement, with a minDCF of 0.372.

0:02:08 | So afterwards I'm going to use these conditioned i-vectors;

0:02:12 | I'm going to use this conditioned dev set in the later experiments,

0:02:16 | for both systems.

0:02:18 | so |

0:02:19 | So the next step that we did is the clustering.

0:02:23 | so |

0:02:25 | Actually we tried different kinds of clustering, and I'm going to come

0:02:29 | back to this just later on, but one of the best clusterings that we are

0:02:33 | getting is what we called the cosine-PLDA clustering.

0:02:36 | And so, actually,

0:02:39 | after this clustering, we take only the

0:02:41 | clusters that have more than two i-vectors in them,

0:02:46 | and now we can apply

0:02:50 | supervised techniques like LDA, PLDA, WCCN and others.

0:02:54 | So here we just added this LDA and the clustering in the loop, and you

0:02:59 | can see that we can already get some improvement, with a minDCF of

0:03:04 | 0.356.

0:03:09 | So what we tried next is

0:03:12 | to replace the cosine scoring by other kinds of scoring. The first of

0:03:16 | them was the SVM.

0:03:17 | So actually, here we trained a linear SVM for every target speaker,

0:03:23 | where we have only one positive sample, the length-normalized

0:03:27 | i-vector of the target speaker, and the negative samples are the length-normalized i-vectors

0:03:32 | of the processed

0:03:35 | development set.

0:03:37 | So here we get some jump in performance, with a minDCF of

0:03:41 | 0.302.
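The per-target SVM just described (a single length-normalized positive i-vector against the whole development set as negatives) could be sketched as below. The Pegasos-style solver, the toy data, and all names are my own choices for a self-contained example; only the one-positive setup and the 0.1/0.9 class weights (mentioned later in the Q&A) come from the talk.

```python
import numpy as np

def train_target_svm(target_iv, cohort_ivs, pos_w=0.1, neg_w=0.9,
                     lam=1e-2, epochs=200, seed=0):
    """One-positive-vs-all linear SVM, trained with a Pegasos-style
    subgradient descent. pos_w/neg_w are per-class sample weights."""
    X = np.vstack([target_iv[None, :], cohort_ivs])
    y = np.concatenate([[1.0], -np.ones(len(cohort_ivs))])
    w = np.concatenate([[pos_w], neg_w * np.ones(len(cohort_ivs))])
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # standard Pegasos step size
            margin = y[i] * np.dot(theta, X[i])
            theta *= (1.0 - eta * lam)            # shrink (L2 regularization)
            if margin < 1.0:                      # hinge-loss subgradient step
                theta += eta * w[i] * y[i] * X[i]
    return theta

def svm_score(theta, test_iv):
    """The verification score is the signed distance to the hyperplane."""
    return float(np.dot(theta, test_iv))

# toy usage: one target direction, clearly separated cohort negatives
rng = np.random.default_rng(1)
dim = 10
cohort = rng.normal(size=(100, dim))
cohort[:, 0] = -np.abs(cohort[:, 0])              # push negatives away from target
cohort /= np.linalg.norm(cohort, axis=1, keepdims=True)
target = np.zeros(dim); target[0] = 1.0           # length-normalized target i-vector
theta = train_target_svm(target, cohort)
score_tgt = svm_score(theta, target)
```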

0:03:44 | So next we added the WCCN in the loop,

0:03:50 | just after the LDA.

0:03:51 | Here, for the SVM, we did not get any improvement,

0:03:55 | but for

0:03:57 | the PLDA, which I will explain in the next slide, the WCCN

0:04:02 | was helpful.

0:04:03 | So here is

0:04:05 | a bit about the PLDA. We use our scalable implementation of the

0:04:08 | standard PLDA,

0:04:09 | and the scores are the likelihood ratio between the average i-vector of the

0:04:14 | target speaker and the test i-vector. Note here that the average i-vectors

0:04:19 | are not length-normalized in this case, which is not the case for the SVM.

0:04:24 | So here also, again, we get additional improvement, with a minDCF of

0:04:28 | 0.282.
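The likelihood-ratio score described here can be illustrated with a simplified two-covariance PLDA model; the covariances below are made up for the example rather than trained, and the actual implementation in the system may differ.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """log N(x; 0, cov) for a zero-mean Gaussian."""
    d = len(x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def plda_llr(enroll_avg, test, B, W):
    """Two-covariance PLDA log-likelihood ratio: same-speaker hypothesis
    (shared latent, between-cov B, within-cov W) vs independent speakers."""
    d = len(test)
    x = np.concatenate([enroll_avg, test])
    same = np.block([[B + W, B], [B, B + W]])     # correlated under H_same
    diff = np.block([[B + W, np.zeros((d, d))],
                     [np.zeros((d, d)), B + W]])  # independent under H_diff
    return gaussian_logpdf(x, same) - gaussian_logpdf(x, diff)

# toy usage with made-up covariances and two well-separated speaker means
rng = np.random.default_rng(0)
d = 5
B, W = np.eye(d), 0.1 * np.eye(d)
spk1, spk2 = np.zeros(d), np.full(d, 2.0)
enroll = spk1 + 0.1 * rng.normal(size=d)          # averaged enrollment i-vector
test_same = spk1 + 0.1 * rng.normal(size=d)
test_diff = spk2 + 0.1 * rng.normal(size=d)
```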

0:04:32 | Afterwards we tried some,

0:04:35 | we tried some score normalization ideas.

0:04:38 | Actually, I tried several:

0:04:42 | s-norm and others, and the one that was working best is the AS-norm,

0:04:47 | but I will also come back to this later.

0:04:50 | So actually the AS-norm usually was used only at the recognition level, but

0:04:54 | I also applied it to the clustering. Here, when we apply it to the clustering,

0:04:59 | we can get an additional improvement, to a minDCF of

0:05:03 | 0.286.

0:05:06 | Then I applied this AS-norm after the PLDA scoring, and we

0:05:13 | get another jump in performance, to a minDCF of

0:05:16 | 0.258, and this was the system that was submitted

0:05:21 | at the deadline of the evaluation.

0:05:25 | Afterwards I tried another idea, which is to replace this cosine-based

0:05:30 | clustering by SVM clustering, which is also done in a hierarchical manner,

0:05:36 | and again we get

0:05:39 | an additional improvement, to a minDCF of

0:05:42 | 0.247, which is very close to the best performing

0:05:45 | system.

0:05:46 | So now, this is more or less, I would say,

0:05:49 | the full description of our system.

0:05:51 | Note that we don't use any quality measure

0:05:53 | functions.

0:05:56 | So that's it for the progress. Afterwards, let me start with the clustering.

0:06:02 | So, for the clustering:

0:06:05 | Clustering was already studied in the literature for i-vectors, in either an unsupervised

0:06:12 | manner or a supervised manner; for example, the work from MIT on cosine-based k-means

0:06:18 | clustering, in which the number of clusters is known a priori, and which

0:06:22 | was applied to conversational telephone speech.

0:06:26 | Then they improved the system by using cosine-based spectral clustering, along with

0:06:30 | a simple heuristic for computing the number of clusters automatically.

0:06:36 | Other work, from CRIM, was using the cosine-based mean shift clustering.

0:06:41 | So these methods, if I'm not wrong, all use cosine-based scoring.

0:06:47 | Other methods used supervised clustering, like the one that used

0:06:52 | integer linear programming.

0:06:55 | But their distance metric, if I'm not wrong,

0:06:59 | requires labeled training data

0:07:01 | in order to compute the within-class covariance matrix.

0:07:04 | Other works were using PLDA-

0:07:09 | based clustering, but of course this PLDA needs labeled external data

0:07:14 | in order

0:07:15 | to compute the PLDA model, and then to compute the similarity

0:07:21 | measure and do the hierarchical clustering.

0:07:27 | So actually we tried different kinds of clustering; I'm not going to go into

0:07:31 | all of them. One of them was the Ward clustering.

0:07:34 | It is a well-known agglomerative clustering whose goal

0:07:38 | is to optimize an overall objective function by minimizing the within-class scatter.

0:07:45 | This clustering is very fast,

0:07:48 | since it uses the Lance-Williams algorithm

0:07:50 | in a recursive manner.

0:07:56 | Actually, the problem with this algorithm

0:08:00 | is that it needs the Euclidean distance to be valid,

0:08:05 | and the problem is that

0:08:06 | it was shown in this work that the Euclidean distance is not as

0:08:10 | good as the cosine distance.
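For reference, Ward's method with the Lance-Williams recurrence can be sketched as follows; this naive O(n^3) version is purely illustrative (libraries such as SciPy's `linkage` implement the same recurrence far more efficiently).

```python
import numpy as np

def ward_clustering(X, n_clusters):
    """Naive agglomerative Ward clustering: repeatedly merge the pair with the
    smallest cost, updating distances with the Lance-Williams recurrence."""
    n = len(X)
    pkey = lambda a, b: (a, b) if a < b else (b, a)
    # initial pairwise squared Euclidean distances
    D = {(a, b): float(np.sum((X[a] - X[b]) ** 2))
         for a in range(n) for b in range(a + 1, n)}
    active = list(range(n))
    sizes = {i: 1 for i in range(n)}
    members = {i: [i] for i in range(n)}
    nxt = n
    while len(active) > n_clusters:
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda p: D[pkey(*p)])
        ni, nj = sizes[i], sizes[j]
        new = nxt; nxt += 1
        for k in active:
            if k in (i, j):
                continue
            nk = sizes[k]
            # Lance-Williams update for Ward's method
            D[pkey(new, k)] = ((ni + nk) * D[pkey(i, k)]
                               + (nj + nk) * D[pkey(j, k)]
                               - nk * D[pkey(i, j)]) / (ni + nj + nk)
        active = [a for a in active if a not in (i, j)] + [new]
        sizes[new] = ni + nj
        members[new] = members[i] + members[j]
    labels = np.empty(n, dtype=int)
    for lab, c in enumerate(active):
        labels[members[c]] = lab
    return labels

# toy usage: two tight, well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])
labels = ward_clustering(X, n_clusters=2)
```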

0:08:12 | The other clustering that we tried is what I called the cosine-PLDA

0:08:16 | clustering. It's a two-step clustering,

0:08:18 | where the first step is based on the cosine

0:08:22 | measure.

0:08:23 | so |

0:08:24 | Actually, after each iteration the similarity measure is updated by computing the cosine measure

0:08:30 | between the average i-vectors of the resulting clusters,

0:08:32 | and here we decided to stop early in the clustering process in order

0:08:37 | to ensure high-purity clusters.
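A minimal sketch of that first, cosine-based step, assuming average i-vectors as cluster representatives and an arbitrary similarity threshold as the early-stopping rule:

```python
import numpy as np

def cosine_ahc(X, stop_threshold=0.5):
    """Greedy agglomerative clustering where the similarity is the cosine
    between the *average* i-vectors of two clusters, recomputed after every
    merge, and merging stops early to keep the clusters pure."""
    clusters = [[i] for i in range(len(X))]
    means = [X[i].astype(float) for i in range(len(X))]
    cos = lambda u, v: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    while len(clusters) > 1:
        best, bi, bj = -2.0, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cos(means[a], means[b])
                if s > best:
                    best, bi, bj = s, a, b
        if best < stop_threshold:        # early stop: remaining pairs too dissimilar
            break
        clusters[bi] += clusters[bj]     # merge cluster j into cluster i
        means[bi] = X[clusters[bi]].mean(axis=0)
        del clusters[bj], means[bj]
    return clusters

# toy usage: two bundles of near-orthogonal directions
rng = np.random.default_rng(0)
d = 8
X = np.vstack([np.eye(d)[0] + 0.05 * rng.normal(size=(5, d)),
               np.eye(d)[1] + 0.05 * rng.normal(size=(5, d))])
clusters = cosine_ahc(X, stop_threshold=0.5)
```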

0:08:41 | So once we have this first set of clusters, we can move to

0:08:45 | the second step of the clustering,

0:08:48 | which is based on the PLDA.

0:08:53 | And actually we did it somehow differently from others. So actually,

0:08:58 | after each iteration we could retrain the PLDA model and compute again

0:09:06 | the similarity matrix,

0:09:10 | but since this is somehow costly to do, we retrain it only

0:09:14 | every five hundred

0:09:17 | merges.

0:09:19 | So I'm going to show

0:09:22 | this figure, which shows the evolution of the minDCF in terms of

0:09:26 | the clustering process,

0:09:29 | on the progress set, using as backend the PLDA model scoring.

0:09:35 | So as we see, with both

0:09:38 | Ward clustering, which is in blue, and cosine-PLDA clustering, which is in

0:09:42 | red, we can get better performance than the

0:09:46 | baseline system, and we can also see that cosine-PLDA clustering is much better than the

0:09:50 | Ward clustering.

0:09:52 | And the best results in this experiment were obtained with

0:09:55 | a number of clusters of about sixteen thousand.

0:10:01 | Let me now talk a bit about the score normalization.

0:10:04 | So as I said, we tried different kinds of normalization; one of

0:10:07 | the most successful ones was the AS-norm, introduced by Professor Kenny,

0:10:13 | I think in the heavy-tailed PLDA paper.

0:10:16 | This actually works quite nicely with an unlabeled cohort set, which is the case

0:10:22 | in our scenario.

0:10:24 | So as I said, I used it for both recognition and clustering. So first,

0:10:29 | for recognition,

0:10:30 | the cohort set

0:10:31 | that I used was all the development set,

0:10:33 | so the thirty-six thousand

0:10:37 | i-vectors, and I took the top-k nearest-neighbor

0:10:41 | i-vectors to the target speaker i-vector and to the test i-vector.

0:10:48 | So we use this formula; you see it's a symmetric formula.

0:10:52 | We have mu and sigma in this formula; mu of the

0:10:57 | top-k, for instance, just means that

0:10:59 | we take the top one thousand five hundred scores,

0:11:05 | the scores that are the highest for the

0:11:08 | target speaker, and then we compute the mean, and the

0:11:12 | same for the standard deviation, and so on.
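In code, the symmetric formula amounts to averaging two z-norms whose statistics come from only the top-k cohort scores on each side. A stand-alone sketch (names and toy data are mine; real cohort scores would come from the backend):

```python
import numpy as np

def as_norm(raw, enroll_cohort_scores, test_cohort_scores, k=1500):
    """Adaptive symmetric score normalization (sketch): z-normalize the raw
    score twice, using the mean/std of only the top-k cohort scores of the
    enrollment side and of the test side, then average the two halves.
    k=1500 is the top-k value quoted in the talk."""
    top_e = np.sort(enroll_cohort_scores)[-k:]    # best-matching cohort, enroll side
    top_t = np.sort(test_cohort_scores)[-k:]      # best-matching cohort, test side
    ze = (raw - top_e.mean()) / top_e.std()
    zt = (raw - top_t.mean()) / top_t.std()
    return 0.5 * (ze + zt)

# toy usage with simulated cohort score distributions
rng = np.random.default_rng(0)
enroll_cohort = rng.normal(-1.0, 0.5, size=200)
test_cohort = rng.normal(-1.0, 0.5, size=200)
normalized = as_norm(0.9, enroll_cohort, test_cohort, k=50)
```

Because the two halves are averaged, the result is unchanged if the enrollment and test roles are swapped, which is the symmetry the speaker points out.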

0:11:15 | So we have more or less the same formula that was used

0:11:19 | for the clustering,

0:11:20 | but here of course it's between

0:11:23 | a pair of clusters,

0:11:26 | and the cohort set in this case is actually all

0:11:30 | the average i-vectors of the clusters that are not concerned by this measure,

0:11:36 | so all the other clusters,

0:11:38 | but not these two.

0:11:41 | so |

0:11:44 | With that I'm going to

0:11:46 | conclude.

0:11:47 | Actually, this evaluation was very helpful for us; we learned a lot

0:11:51 | of things,

0:11:53 | and it was, I mean, also quite successful for us.

0:11:59 | And we also learned that clustering is

0:12:02 | very important,

0:12:03 | and also the adaptive symmetric normalization.

0:12:06 | These results can be reproduced with our open-source library;

0:12:11 | you can see this link, and you can also

0:12:16 | refer to our ICASSP paper.

0:12:20 | As future work, and we have actually started working on it:

0:12:24 | how to automatically

0:12:26 | determine the stopping criterion of the clustering process. Actually we have

0:12:30 | some ideas,

0:12:31 | and I hope we can share them with you guys,

0:12:34 | like the variation of the minDCF on the development set, and

0:12:38 | the variation of the number of clusters, or of the newly created clusters,

0:12:42 | and also the possible use of spectral clustering.

0:12:46 | And one good idea, maybe for the next evaluation, that could

0:12:51 | be considered

0:12:52 | because of its potential applications, is semi-supervised clustering.

0:12:58 | Actually, there are many techniques in machine learning that could be used, like

0:13:02 | co-training and others.

0:13:04 | Thank you.

0:13:27 | Congratulations, that was a very good system; without fusion, getting these results is amazing. I have

0:13:34 | the slight impression that you make a distinction between supervised and unsupervised; if you

0:13:40 | can go back, please, a few slides...

0:13:42 | I can easily go back; which slide?

0:13:47 | Well, I think this distinction is a little bit arbitrary. For the unsupervised case

0:13:53 | we used the trick of Mohammed, since, I mean, we tried to use

0:13:59 | labels, and of course what worked better, the best results we demonstrated, were, of course,

0:14:04 | with labels.

0:14:05 | It's always a good idea to use some labels if you have them, and

0:14:08 | my impression is that the only way to get a fully unsupervised clustering, without knowing

0:14:14 | the number of classes, is something more like a model-based Bayesian method. Although, in the

0:14:20 | mean shift there are some tricks: if you check the original paper

0:14:24 | of Comaniciu, you will see that there are some tricks that you can do,

0:14:28 | that are successful in image processing, so you can somehow estimate the number

0:14:33 | of classes. But I think that also the guys

0:14:40 | with the most supervised system,

0:14:44 | they used it also with the standard prewhitening, without even

0:14:49 | caring about the labels, and the system works fine as well, so

0:14:54 | this distinction is a little bit arbitrary for me.

0:14:58 | I think, in my sense,

0:15:01 | supervised and unsupervised are just,

0:15:04 | I mean, in the sense of labeled and unlabeled training data.

0:15:08 | And actually, I think...

0:15:21 | I just have a question about your SVM: you said you used a single

0:15:25 | positive example, the averaged i-vector, instead of

0:15:30 | five positive examples; did you try both?

0:15:33 | Yes, so,

0:15:35 | for the number of positive samples, I tried many options, and actually

0:15:38 | this one was working the best.

0:15:42 | And I forgot to mention that, by the way, I used

0:15:46 | unequal weights, like 0.1 for the positive and

0:15:50 | 0.9 for the negative.

0:15:52 | But I think it's not...

0:15:54 | well, we didn't gain much by doing this.

0:15:57 | It's more or less the same if we use,

0:16:01 | I mean,

0:16:03 | the i-vectors

0:16:05 | provided by NIST,

0:16:07 | one by one.

0:16:17 | You just

0:16:19 | said what I wanted to say about the SVM, so I just have a comment:

0:16:27 | I never had a progress

0:16:30 | slide like the one you showed in,

0:16:33 | I think, your third slide.

0:16:36 | When you were developing the system you had only progress, which is a

0:16:41 | wonderful, really wonderful situation to be in.

0:16:46 | For us it's very interesting to know also your negative trials: what you

0:16:52 | tried and what was not efficient

0:16:56 | during the development of your systems. It's somewhat hidden, but it's interesting for me.

0:17:01 | That's true, and,

0:17:02 | well, if I want to talk about the things that did

0:17:05 | not work, I think it would take too long.

0:17:07 | But anyway...

0:17:20 | So you showed some

0:17:24 | different approaches for clustering, but you used just one of them in the system,

0:17:33 | and in the end the backend was only the PLDA

0:17:38 | in the system.

0:17:39 | Did you

0:17:41 | try a combination of different backends? I did try that;

0:17:47 | it was also

0:17:49 | a bit different from the others, but

0:17:52 | for me it was not...

0:17:53 | maybe

0:17:54 | some small gain, for instance with

0:17:57 | logistic regression, and

0:18:02 | with quality measures.

0:18:10 | I think that the adaptive score normalization was doing the work that

0:18:15 | the

0:18:17 | quality measures were doing for others; I think that is also what

0:18:23 | you would find

0:18:24 | with the...