0:00:16 Hello, my name is [inaudible], and I would like to tell you about our system for the NIST i-Vector Challenge.
0:00:30 The outline of my talk is as follows. First, I will give an overall description of our system. Then I will describe our clustering algorithms. Next, I will present our subsystems: the i-vector PLDA subsystem, the b-vector RBM/DBN PLDA subsystem, and the i-vector LDA-SVM subsystem. 0:01:10 After that, I will talk about the quality measure function we use to incorporate test duration information into scoring. Then I will present the subsystem fusion, and finally I will show our results and draw conclusions.
0:01:40 Let me now give the overall system description. As you can see, we explored different subsystems. The idea was to build standard, state-of-the-art systems for the speaker recognition task, as well as some novel ones. Among the novel systems we used the RBM (or DBN) b-vector subsystem, which is based on a PLDA model on top of the b-vectors, and the last one is the well-known LDA-SVM subsystem based on i-vectors. 0:02:37 We made a fusion of different combinations of our subsystems, and we also took into account a quality measure function: we incorporated test duration information to improve the scoring results.
0:03:05 Our subsystems were developed by different authors simultaneously, and that led us to apply different clustering algorithms to different subsystems. As you can see, for the PLDA and RBM-PLDA subsystems we used clustering Algorithm 1, and for the LDA-SVM subsystem we developed its own clustering algorithm, named Algorithm 2.
0:03:48 A few words about the clustering problem we were dealing with. At first we tried standard clustering techniques such as k-means, but we did not succeed with them. However, there are two empirically established facts from speaker recognition which can help us. The first is that the cosine metric is a convenient comparison metric in the i-vector space, and the second is that averaging length-normalized i-vectors is considered the most efficient way to model a multi-session enrollment. 0:04:39 So we decided to use the cosine distance only for the initial clustering step.
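To make these two facts concrete, here is a minimal numpy sketch (an editorial illustration, not code from the talk) of length-normalizing i-vectors, averaging them into a multi-session model, and comparing vectors with the cosine metric; all names and the 600-dimensional size are assumptions.

```python
import numpy as np

def length_normalize(x):
    """Project an i-vector onto the unit sphere."""
    return x / np.linalg.norm(x)

def multi_session_model(ivectors):
    """Fact 2: average the length-normalized i-vectors of one speaker."""
    normalized = np.array([length_normalize(v) for v in ivectors])
    return length_normalize(normalized.mean(axis=0))

def cosine_score(model, test):
    """Fact 1: cosine similarity as the comparison metric."""
    return float(length_normalize(model) @ length_normalize(test))

# Toy usage with random 600-dimensional i-vectors.
rng = np.random.default_rng(0)
enroll = [rng.standard_normal(600) for _ in range(5)]
model = multi_session_model(enroll)
print(cosine_score(model, rng.standard_normal(600)))
```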
0:04:49 Next, we tried to build an iterative clustering strategy. After the initial cosine clustering step it makes sense to use a more efficient PLDA metric, which explicitly takes into account between-speaker and within-speaker variability. You can see the scheme of the iterative clustering on this slide, but we managed with only one iteration: we already obtained good results after the first iteration of the PLDA re-clustering. So we did the cosine initialization, then the PLDA training, and then the PLDA-based re-clustering. 0:05:41 We did this in two variants, using Algorithm 1 and Algorithm 2.
0:05:52 Now I should say a few words about the PLDA model, because I will need some of its parameter names on the next slides. In our model the number of columns of the eigenvoice matrix (the number of eigenvoices) was N1, and the number of eigenchannels was N2.
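For reference, this parameterization corresponds to the standard PLDA decomposition of an i-vector; a sketch in the usual notation (an editorial addition, with D the i-vector dimension):

```latex
% Standard PLDA decomposition (editorial sketch):
%   speaker factor y, channel factor x, residual noise \varepsilon
w = m + V y + U x + \varepsilon, \qquad
V \in \mathbb{R}^{D \times N_1}, \quad
U \in \mathbb{R}^{D \times N_2}, \quad
y, x \sim \mathcal{N}(0, I), \quad
\varepsilon \sim \mathcal{N}(0, \Sigma).
```

The "simplified PLDA model" mentioned below drops the eigenchannel term Ux and keeps only the residual covariance Σ (full or diagonal).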
0:06:22 Our first clustering algorithm consists of two stages. The first stage is a heuristic search for the clusters; it is like a mean-shift clustering algorithm, so step by step we find the clusters using a mean-shift procedure. 0:06:51 At the second stage we try to compensate for the errors of the first stage, where the i-vectors of one speaker end up in different clusters. For this we used a simple bottom-up stage of agglomerative hierarchical clustering, organized as a simple repeat-until loop. 0:07:20 Here you can also see a reference on mean-shift clustering: our reviewers told us that our algorithm is very similar to the one described in this work.
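A minimal sketch of the first, mean-shift-like stage, assuming a flat kernel over cosine similarity with threshold tau; this is an editorial illustration of the idea, not the authors' exact procedure.

```python
import numpy as np

def mean_shift_cosine(X, tau, n_iter=20):
    """Stage 1 sketch: flat-kernel mean shift on length-normalized i-vectors."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    modes = X.copy()
    for _ in range(n_iter):
        sims = modes @ X.T                    # cosine similarity of each mode to all points
        for i in range(len(modes)):
            window = X[sims[i] > tau]         # points inside the flat kernel
            if len(window):
                m = window.mean(axis=0)
                modes[i] = m / np.linalg.norm(m)
    # Points whose modes collapsed together form one cluster.
    labels, reps = np.zeros(len(X), dtype=int), []
    for i, m in enumerate(modes):
        for j, r in enumerate(reps):
            if m @ r > tau:
                labels[i] = j
                break
        else:
            reps.append(m)
            labels[i] = len(reps) - 1
    return labels
```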
0:07:39 Our second algorithm is just the standard agglomerative bottom-up stage of an AHC algorithm. It also uses the cosine or PLDA metric, and a threshold τ3 is used as the stopping criterion.
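Both the second stage of Algorithm 1 and Algorithm 2 reduce to a repeat-until merge loop of this kind. A sketch, assuming cosine similarity between cluster centroids (the PLDA metric would slot in the same way) and a stopping threshold tau:

```python
import numpy as np

def agglomerative_merge(X, labels, tau):
    """Bottom-up stage: repeatedly merge the two most similar clusters
    until the best centroid similarity falls below the threshold tau."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    clusters = {k: list(np.where(labels == k)[0]) for k in set(labels)}

    def centroid(idx):
        c = X[idx].mean(axis=0)
        return c / np.linalg.norm(c)

    while len(clusters) > 1:
        keys = list(clusters)
        cents = [centroid(clusters[k]) for k in keys]
        best, pair = -1.0, None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                s = float(cents[a] @ cents[b])
                if s > best:
                    best, pair = s, (keys[a], keys[b])
        if best < tau:                        # stopping criterion (tau3 in the talk)
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))

    out = np.empty(len(X), dtype=int)
    for new_id, idx in enumerate(clusters.values()):
        out[idx] = new_id
    return out
```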
0:08:10 On the next slide I show the same scheme with some parameters and their values. For the initial cosine clustering we used the condition that the thresholds of the first and the second stage were equal: τ1 = τ2 = 0.29. We used 16 random clustering initializations, and we also used the rule that no fewer than 2 and no more than 50 i-vectors could belong to one cluster.
0:09:08 It should also be mentioned that the PLDA clustering was done using a simplified PLDA model: we used 300 eigenvoices and a full-covariance noise model. For this case the threshold τ1 was equal to -0.2, and τ2 was 0.229. For this clustering we used the rule that no fewer than 3 and no more than 50 i-vectors could be chosen for one cluster.
0:09:56 For Algorithm 2 we used the threshold τ3, whose value was 0.43. We also used a simplified PLDA model here, but the difference is that we used only a diagonal covariance noise matrix. And there was another rule: no fewer than 3 and no more than 60 i-vectors in a cluster.
0:10:26 For the i-vector PLDA subsystem in our experiments we used another PLDA model, which takes channel factors into account, and we used only a diagonal covariance matrix. In our case N1 was equal to [inaudible] and N2 was 55.
0:10:56 The model training for the i-vector PLDA system had to be done using the results of the Algorithm 1 clustering. For the initialization of the eigenvoice matrix we used PCA, and it should be mentioned that only one ML (maximum-likelihood) iteration is needed; further iterations led to some degradation.
0:11:37 A few words about the RBM PLDA system. We use an RBM (or DBN) to extract b-vectors from our i-vectors. Strictly speaking, it is not an extractor but a non-linear projection of the raw i-vector space into the b-vector space, which incorporates information relevant to the speaker verification task. 0:12:09 We simply used RBM training for the classification task to obtain the joint distribution of the i-vectors and their labels. 0:12:28 We also tried to use an additional hidden layer with unsupervised training; in that case the number of neurons of the first layer was 2000 and the number of neurons of the softmax layer was 500, just as in the previous configuration, where each layer had 500 neurons.
0:13:04 So what is the b-vector? We use the posteriors of the softmax layer: we obtain our b-vectors by a PCA projection of the log-posteriors into a low-dimensional space. In our case the dimensionality of the projection was equal to the number of neurons of the hidden layer, i.e. 500.
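A sketch of this extraction chain: feed an i-vector through the network, take the softmax-layer posteriors, and project the log-posteriors into a lower-dimensional space with PCA. The weights, the PCA basis, and all dimensions here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights (the talk mentions a 2000-unit first layer and a 500-unit softmax layer).
W1 = 0.01 * rng.standard_normal((600, 2000))   # i-vector -> hidden layer
W2 = 0.01 * rng.standard_normal((2000, 500))   # hidden layer -> softmax layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def b_vector(ivec, pca_basis):
    """Non-linear projection of an i-vector into the b-vector space."""
    h = 1.0 / (1.0 + np.exp(-(ivec @ W1)))      # sigmoid hidden activations
    p = softmax(h @ W2)                         # softmax posteriors
    log_p = np.log(p + 1e-12)                   # log-posteriors
    return (log_p - log_p.mean()) @ pca_basis   # PCA projection (basis assumed given)

pca_basis = rng.standard_normal((500, 400))     # stand-in for a learned PCA basis
print(b_vector(rng.standard_normal(600), pca_basis).shape)   # -> (400,)
```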
0:13:41 For the b-vector space we used another PLDA model, different from the one in the i-vector space: the number of eigenvoices was 400, and in this case we used a simplified PLDA model.
0:14:05 The LDA-SVM subsystem, as was mentioned before, used clustering Algorithm 2 and the s-norm score normalization procedure.
0:14:21 A few words about the quality measure function. It is well known that the threshold of the minimum decision cost function depends on the test and enrollment segment durations. In the NIST i-Vector Challenge we deal with multi-session enrollment models, and the average duration of an enrollment model is much larger than the duration of the test segments, 0:15:07 so we ignored the dependence on the enrollment durations and focused on investigating the dependence on the test duration. We did this using our clustering results: we prepared some five-session enrollment protocols, obtained several operating points, and observed a linear dependence of the threshold on the logarithm of the test duration. 0:15:48 It should be mentioned that the logarithm could be replaced by a power function, for example the square root, because those functions behave similarly.
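The observed linear dependence of the threshold on the log test duration can be folded directly into the scores as a quality measure function; a sketch, with the coefficients A and B as hypothetical values that would be fitted on development protocols.

```python
import numpy as np

A, B = 0.5, -1.0   # hypothetical QMF coefficients fitted on development protocols

def qmf_score(raw_score, test_duration_sec):
    """Compensate the duration-dependent threshold: subtract the
    (linear in log-duration) threshold model from the raw score."""
    return raw_score - (A * np.log(test_duration_sec) + B)

# The same raw score is judged differently for a 10 s and a 60 s test segment.
print(qmf_score(1.2, 10.0), qmf_score(1.2, 60.0))
```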
0:16:09 For subsystem fusion we used a simple linear combination, a weighted sum of the scores, but we also needed some sigma normalization before the fusion: for the LDA-SVM subsystem the normalization factor equals one, while for the other subsystems it is different.
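A sketch of this fusion scheme: sigma-normalize each subsystem's scores, then take a weighted sum; the weights below are illustrative only.

```python
import numpy as np

def sigma_normalize(scores):
    """Scale a subsystem's scores to unit standard deviation before fusion."""
    return scores / scores.std()

def fuse(score_lists, weights):
    """Weighted-sum fusion of sigma-normalized subsystem scores."""
    return sum(w * sigma_normalize(s) for w, s in zip(weights, score_lists))

rng = np.random.default_rng(0)
subsystem_scores = [rng.standard_normal(1000) for _ in range(3)]   # toy scores
fused = fuse(subsystem_scores, weights=[1.0, 0.24, 0.24])          # illustrative weights
print(fused.shape)
```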
0:16:41 Now to the results. First I will show our results with the incorporated test duration information. You can see that using the quality measure function allowed us to significantly reduce the minimum decision cost function: it achieves a reduction of the minimum decision cost function by 10% for the LDA-SVM subsystem, and for the final fusion with equal weights it also achieves a good improvement, about 7% relative.
0:17:35 Now about the fusion of the i-vector and b-vector space PLDA models and the scores of these models. We obtained a reduction of the minimum decision cost function. This is due to the fact that the RBM (or DBN) provides a non-linear transform of the i-vector space, and this allowed us to make an effective fusion of such systems.
0:18:19 For the PLDA and RBM-PLDA subsystems, the fusion achieved good results, but here the weights are unequal: we optimized them over our submissions, and they took the values 0.24 and 1. 0:18:45 Our best result comes from a fusion of three subsystems: the LDA-SVM subsystem, the RBM-PLDA subsystem, and the DBN-PLDA subsystem. In this case the DBN PLDA gave us a little more information for the verification, and we managed to achieve a result of 0.239, which is our best one.
0:19:21 In conclusion, we have presented our system, which consists of PLDA, LDA-SVM and RBM subsystems, and we have presented its agglomerative clustering algorithms. The combination of the PLDA and LDA-SVM subsystems using different clustering algorithms resulted in an effective fusion, and the non-linear transformation of i-vectors into the b-vector space also leads to a successful fusion with classical i-vector systems. So, that's all.
0:20:32 [Audience] First of all, congratulations. I just want to ask you about the use of the mean shift, your modified version of mean shift: did you compare it, for example, with standard agglomerative clustering, to see how much you gain from using this algorithm? 0:20:53 [Presenter] Yes, we did, and you can see that we used Algorithm 2: we tried to use the Algorithm 2 clustering for training the PLDA model. Algorithm 2 is just the bottom-up stage, as in Algorithm 1, and it led us to some degradation; the mean shift was better for this task, especially for the PLDA training.