0:00:16 | I would like to tell you about our system for the NIST i-vector challenge. |
0:00:30 | The outline of my talk is as follows. |
0:00:33 | First, I will show you the overall system description; then I will describe our clustering algorithms. |
0:00:48 | Next, I will present our subsystems: the i-vector PLDA subsystem, the b-vector RBM/DBN PLDA subsystem, and the i-vector LDA-SVM subsystem. |
0:01:10 | Next, I will talk about the quality measure function used to incorporate test duration information into scoring. |
0:01:21 | Then I will present the subsystem fusion, and finally I will present our results and draw conclusions. |
0:01:40 | Let me now show you the overall system description. |
0:01:45 | As you can see, we explored different subsystems. The idea was to build both standard state-of-the-art systems for the speaker recognition task and some novel systems. |
0:02:11 | Among others, we used an RBM/DBN b-vector subsystem, which is based on a PLDA model in tandem with the DBN, and the last one is the well-known LDA-SVM subsystem based on i-vectors. |
0:02:37 | We made a fusion of different combinations of our systems, and we also took into account a quality measure function: we incorporated test duration information to get good scoring results. |
0:03:05 | Our system was developed by different authors simultaneously, which led us to apply different clustering algorithms to the different subsystems. |
0:03:20 | As you can see, for the PLDA and RBM PLDA subsystems we used clustering algorithm 1, and for the LDA-SVM subsystem we developed its own clustering algorithm, named algorithm 2. |
0:03:48 | A few words about the clustering problem we were dealing with. |
0:03:59 | First we tried to use standard clustering techniques such as k-means, but we did not succeed with those techniques. |
0:04:14 | There are two empirically established facts from speaker recognition which can help us. The first is that the cosine metric is a convenient comparison metric in the i-vector space, and the second is that the model averaging normalized i-vectors is considered the most efficient multi-session model. |
0:04:39 | So we decided to use the cosine distance only for the initial clustering step. |
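To make the cosine-based initial step concrete, here is a minimal sketch of one greedy variant; the assignment scheme and the function name are illustrative assumptions rather than the talk's exact procedure, and the 0.29 threshold is borrowed from the parameter slide discussed later.

```python
import numpy as np

def cosine_init_clusters(ivectors, tau=0.29):
    """Greedy cosine clustering for the initial step (illustrative variant).

    ivectors: (N, D) array; tau: similarity threshold.
    Returns a list of clusters, each a list of i-vector indices.
    """
    X = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    clusters, centroids = [], []            # member indices and normalized means
    for i, x in enumerate(X):
        if centroids:
            sims = np.array([c @ x for c in centroids])
            best = int(np.argmax(sims))
            if sims[best] >= tau:           # close enough: join the best cluster
                clusters[best].append(i)
                m = X[clusters[best]].mean(axis=0)
                centroids[best] = m / np.linalg.norm(m)
                continue
        clusters.append([i])                # otherwise start a new cluster
        centroids.append(x)
    return clusters
```

Keeping the cluster model as a normalized mean matches the second empirical fact above: the averaged, normalized i-vector serves as the multi-session model.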
0:04:49 | Next, we tried to build an iterative PLDA clustering strategy: after the coarse cosine initial clustering step, it makes sense to use a more efficient PLDA metric, which explicitly takes into account between-speaker and within-speaker variability. |
0:05:11 | You can see the scheme of the iterative PLDA clustering on this slide, but we managed with only one iteration: we obtained good results right after the first iteration of the PLDA re-clustering. |
0:05:31 | So we did the cosine initialization, then the PLDA training, and the PLDA re-clustering. We did this for both algorithm 1 and algorithm 2. |
0:05:52 | Now I should say a few words about the PLDA model, because I will need some parameter names on the next slides. |
0:06:03 | We used the standard model, where the number of eigenvoices in the eigenvoice matrix was N1 and the number of eigenchannels was N2. |
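For reference, the standard PLDA decomposition behind these parameter names can be written as below; the notation is assumed rather than taken from the slides.

```latex
w = m + V y + U x + \varepsilon, \qquad
y \sim \mathcal{N}(0, I_{N_1}), \quad
x \sim \mathcal{N}(0, I_{N_2}), \quad
\varepsilon \sim \mathcal{N}(0, \Sigma)
```

Here V is the eigenvoice matrix with N1 columns and U the eigenchannel matrix with N2 columns; the "simplified" PLDA mentioned on later slides drops U and absorbs channel effects into a full or diagonal covariance Sigma.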
0:06:22 | Well, our first clustering algorithm consists of two stages. |
0:06:29 | The first stage is a heuristic search for the clusters; it is like a mean-shift clustering algorithm, so we find the clusters step by step using the mean-shift algorithm. |
0:06:51 | In the second stage we try to compensate for the errors of assigning one speaker's i-vectors to different clusters. |
0:07:06 | So we used a simple bottom-up stage of agglomerative hierarchical clustering, with a simple repeat-until loop. |
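A compact sketch of how the two stages might fit together, assuming cosine similarity throughout: a mean-shift-style mode search, then a repeat-until merge pass. The flat-kernel window rule and the 0.999 mode-matching tolerance are my assumptions, not details from the talk.

```python
import numpy as np

def normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def mean_shift_cosine(X, tau1=0.29, iters=10):
    """Stage 1: mean-shift-style mode search with a cosine window."""
    X = normalize(X)
    modes = X.copy()
    for _ in range(iters):
        sims = modes @ X.T                 # cosine similarity of each mode to all points
        W = (sims >= tau1).astype(float)   # flat kernel: neighbours inside the window
        np.fill_diagonal(W, 1.0)           # a mode always sees its own point
        modes = normalize(W @ X)           # shift each mode to its window mean
    labels, reps = -np.ones(len(X), dtype=int), []
    for i, m in enumerate(modes):          # points with near-identical modes share a cluster
        for k, r in enumerate(reps):
            if m @ r >= 0.999:
                labels[i] = k
                break
        else:
            labels[i] = len(reps)
            reps.append(m)
    return labels

def merge_pass(X, labels, tau2=0.29):
    """Stage 2: agglomerative pass rejoining a speaker split across clusters."""
    X = normalize(X)
    cents = {k: normalize(X[labels == k].mean(0, keepdims=True))[0]
             for k in np.unique(labels)}
    merged = True
    while merged:                          # the talk's repeat-until loop
        merged = False
        keys = list(cents)
        for a in keys:
            for b in keys:
                if a < b and cents.get(a) is not None and cents.get(b) is not None \
                        and cents[a] @ cents[b] >= tau2:
                    labels[labels == b] = a
                    cents[a] = normalize(X[labels == a].mean(0, keepdims=True))[0]
                    cents[b] = None
                    merged = True
        cents = {k: v for k, v in cents.items() if v is not None}
    return labels
```

Stage 2 consumes stage 1's output, for example `labels = merge_pass(X, mean_shift_cosine(X))`.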
0:07:20 | Here you can also see the reference to mean-shift clustering: our reviewers told us that our algorithm is very similar to the one described in that work. |
0:07:39 | Our second algorithm is just the standard bottom-up stage of an agglomerative hierarchical clustering (AHC) algorithm. It also uses the cosine or PLDA metric. |
0:08:00 | And the threshold tau_3 is used as the stopping criterion. |
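Since algorithm 2 is plain bottom-up AHC with a threshold stop, it maps onto off-the-shelf tooling; a sketch using SciPy, with the tau_3 = 0.43 value taken from the later parameter slide and average linkage assumed (the talk does not name the linkage).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ahc_cosine(ivectors, tau3=0.43):
    """Bottom-up AHC; merging stops once cosine similarity drops below tau3."""
    X = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    d = pdist(X, metric='cosine')          # pairwise 1 - cosine similarity
    Z = linkage(d, method='average')       # average linkage assumed
    # A similarity threshold tau3 is a distance threshold of 1 - tau3.
    return fcluster(Z, t=1.0 - tau3, criterion='distance')
```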
0:08:10 | On the next slide I will show you the same scheme with some parameters and their values. |
0:08:23 | For the initial coarse clustering we used the condition that the thresholds of the first and second stages were equal: tau_1 = tau_2 = 0.29. |
0:08:46 | We used sixteen random clustering initializations, and we also used the rule that no fewer than 2 and no more than 50 i-vectors could be in one cluster. |
0:09:08 | It should also be mentioned that the PLDA clustering was done using a simplified PLDA model: we used 300 eigenvoices and a full-covariance noise model. |
0:09:27 | For this case the threshold tau_1 was equal to -0.2 and tau_2 was 0.229. |
0:09:42 | And for the clustering we used the rule that no fewer than 3 and no more than 50 i-vectors would be chosen. |
0:09:56 | For algorithm 2 we used the threshold value tau_3 = 0.43. We also used a simplified PLDA model, but the difference is that we used only a diagonal-covariance noise matrix. |
0:10:15 | There was another rule: no fewer than 3 and no more than [unintelligible] i-vectors in a cluster. |
0:10:26 | Well, for the i-vector PLDA subsystem in our experiments we used another PLDA model, which takes into account channel factors, and we used only a diagonal covariance matrix. In our case N1 was [unintelligible] and N2 was 55. |
0:10:56 | The model training to build the i-vector PLDA system had to be done using the results of the algorithm 1 clustering. |
0:11:10 | For the initialization of the eigenvoice matrix we used PCA, and it should be mentioned that only one ML (maximum likelihood) iteration is needed; further iterations led to some degradation. |
0:11:37 | A few words about the RBM PLDA system. We can use it to extract b-vectors from our i-vectors. |
0:11:51 | Strictly speaking, it is not an extractor but a non-linear projection of the raw i-vector space into the b-vector space, which incorporates information relevant to the speaker verification task. |
0:12:09 | So we simply used training for the classification task to obtain the joint distribution of the i-vectors and their labels. |
0:12:28 | We also tried to use an additional hidden layer with unsupervised training. In this case the number of neurons of the first layer was 2000 and the number of neurons of the softmax layer was 500, just as in the previous configuration, where each layer had 500 neurons. |
0:13:04 | So what is the b-vector? We used the posteriors of the softmax layer to obtain our b-vectors by using PCA: a PCA projection of the log posteriors into a low-dimensional space. |
0:13:25 | In our case the number of PCA components was equal to the number of neurons of the hidden layer, 500. |
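A sketch of the b-vector extraction as described, assuming the trained network is given and using scikit-learn's PCA; that the projection acts on log posteriors and keeps 500 components is my reading of the slide.

```python
import numpy as np
from sklearn.decomposition import PCA

def extract_bvectors(posteriors, n_components=500):
    """posteriors: (N, K) softmax outputs of the trained RBM/DBN classifier.

    Returns (N, n_components) b-vectors: a PCA projection of the log
    posteriors into a lower-dimensional space (500 here, per the talk).
    """
    logp = np.log(np.clip(posteriors, 1e-10, None))  # guard against log(0)
    return PCA(n_components=n_components).fit_transform(logp)
```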
0:13:41 | For the b-vector space we used another PLDA model, different from the one for the i-vector space: the number of eigenvoices was 400, and in this case we used a simplified PLDA model. |
0:14:05 | The LDA-SVM subsystem, as mentioned before, used clustering algorithm 2 and the s-norm score normalization procedure. |
0:14:21 | A few words about the quality measure function. It is well known that the threshold of the minimum decision cost function depends on the test and enrollment segment durations. |
0:14:39 | In the NIST i-vector challenge we deal with multi-session enrollment models, and the overall duration of an enrollment model is much larger than the duration of the test segments. |
0:15:00 | So we ignored the dependence on the enrollment durations and focused on investigating the dependence on the test duration. |
0:15:17 | We did this using our clustering results: we prepared some five-session enrollment protocols and obtained several points, and we observed a linear dependence of the threshold on the logarithm of the test duration. |
0:15:48 | It should be mentioned that the logarithm function could be replaced by a power function, for example the square root, because of the similar behavior of those functions. |
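A sketch of what such a quality measure function could look like: shift each score by a term linear in the logarithm of the test duration, so that one global threshold works across durations. The coefficients a and b are hypothetical values to be fitted on a development set, not numbers from the talk.

```python
import numpy as np

def qmf_compensate(scores, test_durations, a=1.0, b=0.0):
    """Shift raw scores by the linear-in-log-duration trend of the threshold.

    a and b are hypothetical coefficients fitted on development data;
    after compensation a single global decision threshold can be used.
    """
    return np.asarray(scores) - (a * np.log(np.asarray(test_durations)) + b)
```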
0:16:09 | For the subsystem fusion we used a simple linear combination, a weighted sum of the scores, but we also applied sigma normalization before the fusion. |
0:16:26 | For the LDA-SVM subsystem the weight equals 1, but for the other subsystems it is 0.4. |
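A sketch of the fusion rule as described: per-subsystem sigma normalization of the scores followed by a weighted sum. Whether the mean is also removed is not stated, so this version only scales by the standard deviation.

```python
import numpy as np

def fuse(score_lists, weights):
    """Sigma-normalize each subsystem's scores, then take a weighted sum.

    score_lists: list of (N,) score arrays, one per subsystem.
    weights: e.g. 1.0 for LDA-SVM and 0.4 for the others (per the talk).
    """
    fused = np.zeros(len(score_lists[0]))
    for s, w in zip(score_lists, weights):
        s = np.asarray(s, dtype=float)
        fused += w * (s / s.std())         # sigma (standard deviation) normalization
    return fused
```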
0:16:41 | Now to the results. First I will show you our results with incorporated test duration information. |
0:16:49 | You can see that using the quality measure function allowed us to significantly reduce the minimum decision cost function: it achieves a reduction of about 10% for the LDA-SVM subsystem, and the final fusion with equal weights also achieves good performance, roughly 7% relative. |
0:17:35 | Now about the fusion of the i-vector and b-vector space PLDA models and the scores of these models. |
0:17:49 | We obtained a reduction of the minimum decision cost function; this is due to the fact that the RBM or DBN provides a non-linear transform of the i-vector space, which allowed us to make an effective fusion of such systems. |
0:18:19 | Now, for the fusion of the LDA-SVM and RBM PLDA subsystems we also achieved good results, but the weights are unequal; we optimized them over our submissions, and they have the values 0.24 and 1. |
0:18:45 | And now to our best result, which consists of three subsystems: the LDA-SVM subsystem, the RBM PLDA subsystem, and the DBN PLDA subsystem. |
0:19:00 | In this case the DBN PLDA gave us a little bit more information for the verification, and we managed to achieve a result of 0.239, which is the best one. |
0:19:21 | In conclusion, we have presented our system, which consists of PLDA, LDA-SVM and RBM subsystems, and we presented our agglomerative clustering algorithms. |
0:19:38 | The combination of the PLDA and LDA-SVM subsystems, which use different clustering algorithms, resulted in an effective fusion, and the non-linear transformation of i-vectors into the b-vector space also leads to a successful fusion with classical i-vector systems. |
0:20:06 | So, that's all. |
0:20:32 | Question: Congratulations. I want to ask about the use of the mean shift: you used a modified version of mean-shift clustering; did you compare it, for example, with standard agglomerative clustering, to see how much you gain from using this algorithm? |
0:20:53 | Answer: Yes, we did. As you can see, we used algorithm 2, and we tried to use the algorithm 2 clustering for training the PLDA model. |
0:21:06 | Algorithm 2 is just the bottom-up AHC stage, without the mean-shift stage of algorithm 1, and it led to some degradation; the mean shift was better for this task, especially for PLDA training. |