0:00:16 | Hello, I would like to tell you about our system for the NIST i-vector challenge.

0:00:29 | So, the outline of my talk is as follows. First, I would like to show you the overall system description, and then I will describe our clustering algorithms.

0:00:48 | Next, I will present our subsystems: the i-vector PLDA subsystem, the b-vector RBM (or DBN) PLDA subsystem, and, last, the i-vector LDA-SVM subsystem.

0:01:10 | Next, I will talk about the quality measure function used to incorporate test duration information in scoring.

0:01:24 | Then I will present the subsystem fusion, and finally I will present our results and draw conclusions.

0:01:40 | Let me now show you the overall system description.

0:01:45 | As you can see, we explored different subsystems. The idea was to build standard, state-of-the-art subsystems for the speaker recognition task, as well as some novel ones.

0:02:11 | Among the latter we used the RBM (or DBN) b-vector subsystem, which is based on a PLDA model in tandem with the b-vector model. The last one is the well-known LDA-SVM subsystem based on i-vectors.

0:02:37 | We made a fusion of different combinations of our subsystems, and we also took a quality measure function into account: we incorporated test duration information to get good scoring results.

0:03:05 | Our system was developed by different authors simultaneously, and that led us to apply different clustering algorithms in the different subsystems. As you can see, for the PLDA and RBM-PLDA subsystems we used clustering algorithm 1, and for the LDA-SVM subsystem we developed its own clustering algorithm, named algorithm 2.

0:03:48 | A few words about the clustering problem we were dealing with. First, we tried standard clustering techniques such as k-means, but we did not succeed with those techniques.

0:04:12 | There are two empirically established facts from speaker recognition that can help us here. The first is that the cosine metric is a convenient comparison metric in the i-vector space, and the second is that averaging a speaker's length-normalized i-vectors is considered the most efficient multi-session model.

0:04:38 | So we decided to use the cosine distance only for the initial clustering step.
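
To make those two facts concrete, here is a toy sketch (purely illustrative, not the authors' code) of cosine scoring against a multi-session speaker model built by averaging length-normalized i-vectors:

```python
import numpy as np

def cosine_score(model_ivecs, test_ivec):
    """Score a test i-vector against a multi-session speaker model
    built by averaging the speaker's length-normalized i-vectors."""
    norm = lambda v: v / np.linalg.norm(v)
    model = norm(np.mean([norm(w) for w in model_ivecs], axis=0))
    return float(model @ norm(test_ivec))
```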

0:04:49 | Next, we tried to build an iterative PLDA clustering strategy: after the coarse cosine initial clustering step, it makes sense to use a more efficient PLDA metric, which explicitly takes between-speaker and within-speaker variability into account.

0:05:11 | You can see the scheme of the iterative clustering on this slide, but we managed with only one iteration: we already obtained good results after the first iteration of the PLDA clustering tree. So we did the cosine initialization, then the PLDA training, and then the PLDA tree clustering.

0:05:41 | We did this for both of our algorithms, algorithm 1 and algorithm 2.

0:05:52 | Now I should say a few words about the PLDA model, because I will need some of its parameter names on the next slides. In our model, the rank of the eigenvoice matrix was N1 and the number of eigenchannels was N2.
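
For reference, the factor-analysis form of PLDA implied by these parameter names can be written as follows; this is the generic textbook formulation, not taken from the slides:

```latex
% Generic PLDA factor-analysis model: i-vector w of speaker s, session h
w_{s,h} = \mu + V y_s + U x_{s,h} + \varepsilon_{s,h}
% V: eigenvoice matrix of rank N1; U: eigenchannel matrix of rank N2;
% y_s, x_{s,h}: standard normal latent factors; \varepsilon: residual noise
```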

0:06:22 | Our first clustering algorithm consists of two stages. The first stage is a heuristic search for the clusters; it is like a mean-shift clustering algorithm, so step by step we find the clusters using the mean-shift procedure.

0:06:51 | In the second stage we try to compensate for the errors of the first stage, where the i-vectors of one speaker have been spread over different clusters. So we used a simple bottom-up stage of agglomerative hierarchical clustering, with a simple repeat-until loop.

0:07:20 | Here you can also see the reference for mean-shift clustering: our reviewers told us that our algorithm is very similar to the one described in this work.
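
As an illustration only, not the authors' exact code, a minimal flat-kernel mean-shift pass over length-normalized i-vectors with a cosine-similarity window might look like this; the parameter name `tau`, the kernel choice, and the iteration count are assumptions:

```python
import numpy as np

def mean_shift_cosine(ivecs, tau=0.29, n_iter=10):
    """Toy flat-kernel mean shift over length-normalized i-vectors.

    Each running mode is shifted to the re-normalized mean of the
    points whose cosine similarity to it exceeds tau; modes that
    end up close to each other are merged into one cluster.
    """
    x = ivecs / np.linalg.norm(ivecs, axis=1, keepdims=True)
    modes = x.copy()
    for _ in range(n_iter):
        sims = modes @ x.T                       # cosine similarities
        new_modes = []
        for m, window in zip(modes, sims > tau):
            shifted = x[window].mean(axis=0) if window.any() else m
            new_modes.append(shifted / np.linalg.norm(shifted))
        modes = np.stack(new_modes)
    # Merge converged modes into clusters.
    labels, centers = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if float(m @ c) > tau:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels)
```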

0:07:39 | Our second algorithm is just the standard agglomerative bottom-up stage of an AHC algorithm. It also uses a coarse cosine (or PLDA) metric, and the threshold tau3 is used as the stopping criterion.
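
A minimal sketch of such threshold-stopped bottom-up clustering, here with SciPy's average-linkage AHC over cosine distances; the linkage choice is an illustrative assumption, not stated in the talk:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ahc_cosine(ivecs, tau3=0.43):
    """Bottom-up AHC on cosine distances, stopped by a threshold.

    Merging stops once the closest pair of clusters is farther apart
    than (1 - tau3) in cosine distance, which plays the role of the
    stopping criterion described in the talk.
    """
    d = pdist(ivecs, metric="cosine")   # condensed distance matrix
    z = linkage(d, method="average")    # agglomerative merge tree
    # fcluster cuts the dendrogram at the given distance threshold.
    return fcluster(z, t=1.0 - tau3, criterion="distance")
```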

0:08:10 | On the next slide I will show you the scheme with some parameters and their values. For the initial cosine clustering we used the condition that the thresholds of the first and second stages were equal: tau1 = tau2 = 0.29. We used sixteen random clustering initializations, and we also used the rule that no fewer than two and no more than fifty i-vectors could be in one cluster.

0:09:08 | It should also be mentioned that the PLDA clustering was done using a simplified PLDA model: we used 300 eigenvoices and a full-covariance noise model. For this case the threshold tau1 was equal to -0.2 and tau2 was 0.229, and for the clustering we used the rule that no fewer than three and no more than fifty i-vectors could be chosen per cluster.

0:09:56 | For algorithm 2 we used the threshold tau3, which was equal to 0.43. We also used a simplified PLDA model, but the difference is that we used only a diagonal-covariance noise matrix. And there was another rule: no fewer than three and no more than sixty i-vectors per cluster.

0:10:26 | For the subsystems in our experiments we used another PLDA model, which takes channel factors into account, and we used only diagonal covariance matrices; in this case the eigenvoice rank N1 was as given on the slide, and the number of eigenchannels N2 was 55.

0:10:56 | The model training to build the i-vector PLDA system was performed using the results of the algorithm 1 clustering. For the initialization of the eigenvoice matrix we used PCA, and it should be mentioned that only one ML (maximum likelihood) iteration is needed: further iterations led to some degradation.
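
A sketch of that PCA initialization of the eigenvoice matrix, under the common assumption that the principal axes of the per-cluster mean i-vectors approximate the between-speaker subspace; the subsequent ML update is an EM step not shown here:

```python
import numpy as np

def init_eigenvoices(ivecs, labels, n_voices=300):
    """Initialize the eigenvoice matrix V by PCA of the
    per-cluster mean i-vectors (between-speaker scatter)."""
    means = np.stack([ivecs[labels == s].mean(axis=0)
                      for s in np.unique(labels)])
    means -= means.mean(axis=0, keepdims=True)
    # PCA via SVD: right singular vectors are principal axes.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:n_voices].T                 # columns span V
```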

0:11:37 | A few words about the RBM PLDA system. We used it to extract the b-vectors from our i-vectors. Strictly speaking, it is not an extractor but a non-linear projection of the raw i-vector space into the b-vector space, one that incorporates information relevant to the speaker verification task.

0:12:09 | So we simply trained it for the classification task, to obtain the joint distribution of the i-vectors and their labels.

0:12:28 | We also tried to use an additional hidden layer with unsupervised training. In this case the number of neurons of the first layer was 2000 and the number of neurons of the softmax layer was 500, just as in the previous configuration, where each layer was equal to 500.

0:13:04 | So what is the b-vector? We used the posteriors of the softmax layer to obtain our b-vectors by PCA: a PCA projection of the log posteriors into a low-dimensional space. In our case the number of PCA components was equal to the number of neurons of the hidden layer.

0:13:41 | For the b-vector space we used another PLDA model, different from the one in the i-vector space: the number of eigenvoices was 400, and in this case we used the simplified PLDA model.
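
A rough sketch of that extraction step; the function name and the PCA dimensionality are illustrative assumptions, since the talk only specifies that the log softmax posteriors are projected by PCA:

```python
import numpy as np

def extract_bvectors(posteriors, n_components=500, eps=1e-10):
    """Project log softmax posteriors into a low-dimensional
    b-vector space with PCA."""
    logp = np.log(posteriors + eps)           # log posteriors
    logp -= logp.mean(axis=0, keepdims=True)  # center before PCA
    # PCA via SVD: right singular vectors are principal axes.
    _, _, vt = np.linalg.svd(logp, full_matrices=False)
    return logp @ vt[:n_components].T         # b-vectors
```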

0:14:05 | The LDA-SVM subsystem, as mentioned before, used clustering algorithm 2 and a score normalization procedure, namely s-normalization.

0:14:21 | A few words about the quality measure function. It is well known that the threshold of the minimum decision cost function depends on the test and enrollment segment durations.

0:14:39 | In the NIST i-vector challenge we deal with multi-session enrollment models, and the average duration of an enrollment model is much larger than the duration of the test segments. So we ignored the dependence on the enrollment durations and focused on investigating the dependence on the test duration.

0:15:17 | We did this using our clustering results: we prepared some five-session enrollment protocols, obtained several operating points, and observed a linear dependence of the threshold on the logarithm of the test duration.

0:15:44 | It should be mentioned, though, that the logarithm function could be replaced by a power function, for example the square root, because of the similar behavior of those functions.
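
In equation form, this kind of duration-compensated scoring can be written as follows; the symbols alpha and beta are illustrative, since the talk only states that the threshold shift is linear in the log test duration:

```latex
% Illustrative duration-compensated score; \alpha, \beta fitted on dev data
s_{\mathrm{qmf}} = s - \bigl(\alpha \log d_{\mathrm{test}} + \beta\bigr)
% d_test: test segment duration; \log may be replaced by a power
% function such as the square root, which behaves similarly.
```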

0:16:09 | For the subsystem fusion we used a simple linear combination, a weighted sum of the scores, but we also applied sigma normalization before the fusion. For the LDA-SVM subsystem the weight equals 1, while for the other subsystems it is 0.4.
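
A minimal sketch of that fusion rule; sigma normalization is taken here to mean dividing each subsystem's scores by their standard deviation, which is an assumption about the exact statistics used:

```python
import numpy as np

def fuse_scores(score_lists, weights):
    """Weighted-sum fusion of sigma-normalized subsystem scores."""
    fused = np.zeros_like(score_lists[0], dtype=float)
    for scores, w in zip(score_lists, weights):
        fused += w * (scores / scores.std())  # sigma normalization
    return fused

# e.g. LDA-SVM weight 1, the other subsystems 0.4 each:
# fused = fuse_scores([s_svm, s_plda, s_rbm], [1.0, 0.4, 0.4])
```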

0:16:41 | Now to the results. First, I will show you our results with the incorporated duration information. You can see that using the quality measure function let us reduce the minimum decision cost function significantly: it gives a reduction of the minDCF of about ten percent for the LDA-SVM subsystem, and the final fusion with equal weights also achieves a good gain, about seven percent relative.

0:17:35 | Now about the pure scores of the i-vector-space and b-vector-space PLDA models and the fusion of these models. We obtained a reduction of the minimum decision cost function; this is due to the fact that the RBM or DBN performs a non-linear transform of the i-vector space, which allowed us to make an effective fusion of such systems.

0:18:19 | Now, fusing the PLDA, RBM-PLDA and LDA-SVM subsystems also achieved good results, but with unequal weights: we optimized them over our submissions, and their values were 0.2, 0.4 and 1.

0:18:45 | Our best result comes from a fusion of three subsystems: the LDA-SVM subsystem, the RBM-PLDA subsystem and the DBN-PLDA subsystem. In this case the DBN PLDA gave us a little bit more information for the verification, and we managed to achieve a minDCF of 0.239, which is our best result.

0:19:21 | In conclusion, we have presented our system, which consists of PLDA, LDA-SVM and RBM/DBN-PLDA subsystems, and we have presented its agglomerative clustering algorithms.

0:19:38 | The combination of the PLDA and LDA-SVM subsystems, which use different clustering algorithms, resulted in an effective fusion. And the non-linear transformation of the i-vectors into the b-vector space also leads to a successful fusion with the classical i-vector systems.

0:20:06 | So, that's all.

0:20:32 | [Audience] Congratulations. I just want to ask you about the use of the mean shift, or your modified version of mean shift: did you compare it, for example, with standard agglomerative clustering, to see how much you gain from using this algorithm?

0:20:53 | Yes, we did. As you can see, we used algorithm 2, and we tried to use the algorithm 2 clustering for training the PLDA model. Algorithm 2 is just the bottom-up stage of AHC, and it led to some degradation; the mean shift was better for this task, especially for the PLDA training.