I would like to tell you about our system for the NIST i-vector challenge.

The outline of my talk is as follows. First, I would like to show you the overall system description.

Then I will describe our clustering algorithms. Next, we will present our subsystems: the i-vector PLDA subsystem, the b-vector RBM or DBN PLDA subsystem, and, last, the i-vector LDA-SVM subsystem. After that, I will talk about the quality measure function used to incorporate test duration information into the scoring. Then the subsystem fusion will be presented, and finally I will present our results and draw conclusions.

Let me now show you the overall system description. As you can see, we explored different subsystems. The idea was to build standard, state-of-the-art systems for the speaker recognition task, as well as some novel ones. Among the latter is our RBM or DBN b-vector subsystem, which is based on a PLDA model on top of b-vectors. The last one is the well-known LDA-SVM subsystem based on i-vectors.

We made fusions of different combinations of our subsystems, and we also took a quality measure function into account, incorporating test duration information to improve the scoring results.

Our subsystems were developed by different authors simultaneously, and that led us to apply different clustering algorithms to the different subsystems. As you can see, for the PLDA and RBM PLDA subsystems we used clustering algorithm 1, while for the LDA-SVM subsystem we developed its own clustering algorithm, named algorithm 2.

A few words about the clustering problem we were dealing with. First, we tried to use standard clustering techniques such as k-means and bottom-up methods, but we did not succeed with those techniques.

There are two empirically established facts from speaker recognition which can help us. The first is that the cosine metric is a convenient comparison metric in the i-vector space, and the second is that averaging length-normalized i-vectors is considered the most efficient way to build a multi-session model. So we decided to use the cosine distance only for the initial clustering step.
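
As a minimal sketch of those two facts, assuming NumPy and hypothetical function names, the cosine comparison and the multi-session averaging of length-normalized i-vectors might look like this:

```python
import numpy as np

def length_normalize(x):
    """Project an i-vector onto the unit sphere."""
    return x / np.linalg.norm(x)

def cosine_score(a, b):
    """Cosine similarity between two i-vectors."""
    return float(np.dot(length_normalize(a), length_normalize(b)))

def multisession_model(ivectors):
    """Average length-normalized i-vectors into a single session model."""
    normed = np.stack([length_normalize(x) for x in ivectors])
    return length_normalize(normed.mean(axis=0))
```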

Next, we tried to build a PLDA tree-clustering strategy: after the coarse cosine initial clustering step, it makes sense to use a more efficient PLDA metric, which explicitly takes between-speaker and within-speaker variability into account. You can see the scheme of the PLDA clustering on this slide, but we managed with only one iteration: we obtained good results right after the first iteration of the PLDA retraining. So we did the cosine initialization, then the PLDA training, and then built the tree clustering. We did this in two variants, using both algorithm 1 and algorithm 2.

Now I should mention the PLDA model, because I will need some parameter names on the next slides. In our model, the number of columns of the eigenvoice matrix, that is, the number of eigenvoices, was N1, and the number of eigenchannels was N2.
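
For reference, a minimal sketch of this generative PLDA model, with hypothetical names (V holds the N1 eigenvoices, U the N2 eigenchannels, and Sigma is the residual noise covariance, full or diagonal in the simplified variants mentioned later):

```python
import numpy as np

def sample_plda(m, V, U, Sigma, rng):
    """Draw one i-vector from x = m + V y + U z + eps.

    V: (D, N1) eigenvoice matrix (speaker factors).
    U: (D, N2) eigenchannel matrix (channel factors).
    Sigma: (D, D) residual noise covariance.
    """
    y = rng.standard_normal(V.shape[1])   # speaker factor, shared by one speaker's sessions
    z = rng.standard_normal(U.shape[1])   # channel factor, drawn per session
    eps = rng.multivariate_normal(np.zeros(len(m)), Sigma)
    return m + V @ y + U @ z + eps
```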

Our first clustering algorithm consists of two stages. The first stage is an iterative search for the clusters: it is like a mean-shift clustering algorithm, so step by step we find the clusters using mean shift. At the second stage, we try to compensate for the errors of assigning one speaker's i-vectors to different clusters, so we used a simple bottom-up stage of agglomerative hierarchical clustering, a simple repeat-until loop.
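
A minimal sketch of the first, mean-shift-like stage, with hypothetical names and a flat kernel over cosine similarity standing in for whatever kernel the actual system used:

```python
import numpy as np

def mean_shift_mode(X, seed, tau=0.29, n_iter=20):
    """Shift a seed point toward the mean of its cosine neighborhood.

    X: (N, D) matrix of length-normalized i-vectors.
    seed: index of the starting i-vector.
    tau: cosine-similarity threshold defining the neighborhood.
    """
    center = X[seed]
    for _ in range(n_iter):
        neighbors = X[X @ center > tau]            # flat kernel: everything above tau
        new_center = neighbors.mean(axis=0)
        new_center /= np.linalg.norm(new_center)   # stay on the unit sphere
        if np.allclose(new_center, center):
            break
        center = new_center
    return center
```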

Here you can also see the reference for mean-shift clustering: our reviewers told us that our algorithm is very similar to the one described in that work.

Our second algorithm is just a standard agglomerative bottom-up stage of an AHC algorithm, and it also uses the cosine or the PLDA metric. The threshold τ3 is used as the stopping criterion.
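
A minimal sketch of such a bottom-up stage, with hypothetical names; average linkage over a generic similarity function stands in for the cosine or PLDA metric:

```python
import numpy as np

def bottom_up_cluster(X, similarity, tau3=0.43):
    """Greedily merge the two most similar clusters until the
    best pair falls below the stopping threshold tau3."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage between clusters a and b
                s = np.mean([similarity(X[i], X[j])
                             for i in clusters[a] for j in clusters[b]])
                if s > best:
                    best, pair = s, (a, b)
        if best < tau3:                  # stopping criterion
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters
```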

On the next slide I will show you the scheme with some parameters and their values. For the initial cosine clustering we used conditions such that the thresholds of the first and second stages were equal, both set to 0.29. We used sixteen random clustering initializations, and we also used the rule that no fewer than two and no more than fifty i-vectors could be in one cluster.

It should also be mentioned that the PLDA clustering was done using a simplified PLDA model: we used three hundred eigenvoices and a full-covariance noise model. For that case, the threshold τ1 was equal to -0.2 and τ2 was 0.229, and for the clustering we used the rule that no fewer than three and no more than fifty i-vectors could be chosen for a cluster.

For algorithm 2 we used the threshold value τ3, which was equal to 0.43. We also used a simplified PLDA model, but the difference is that here we used only a diagonal covariance noise matrix.

There was also another rule: no fewer than three and no more than sixty i-vectors in a cluster.

For the verification phase of our experiments we used yet another PLDA model, which took channel factors into account, and we used only a diagonal covariance matrix.

In this case, N1 was [unclear] and N2 was fifty-five.

The model training to build the i-vector PLDA system had to be carried out using the results of the algorithm 1 clustering. For the initialization of the eigenvoice matrix we used PCA, and it should be mentioned that only one ML (maximum likelihood) iteration is needed: further iterations led to some degradation.
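
A minimal sketch of such a PCA initialization, under the assumption (mine, not stated in the talk) that the eigenvoice matrix is seeded with the top principal directions of the per-cluster means:

```python
import numpy as np

def pca_init_eigenvoices(X, labels, n1):
    """Initialize the eigenvoice matrix V from clustering results.

    X: (N, D) i-vectors; labels: cluster ids from algorithm 1.
    Returns V of shape (D, n1).
    """
    means = np.stack([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    means -= means.mean(axis=0)                    # center the cluster means
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:n1].T                               # top n1 principal directions
```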

A few words about the RBM PLDA system. We can use it to extract b-vectors from our i-vectors. Strictly speaking, it is not an extractor but a non-linear projection of the raw i-vector space into the b-vector space, one which incorporates label information relevant to the speaker verification task. So we simply used generative training for the classification task to obtain the joint distribution of the i-vectors and their labels.

We also tried to use an additional hidden layer with unsupervised training; in that case the number of neurons of the first layer was two thousand and the number of neurons of the softmax layer was five hundred, just as in the previous configuration, where each layer had five hundred neurons.

So what is the b-vector? We used the posteriors of the softmax layer to obtain our b-vectors by means of PCA, a PCA projection of the log posteriors into a low-dimensional space.

In our case, the dimension was set equal to the number of neurons of the hidden layer, five hundred.
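
A minimal sketch of that extraction, with hypothetical names; a pre-fitted PCA basis W and mean mu over log posteriors are assumed:

```python
import numpy as np

def extract_bvector(posteriors, W, mu, eps=1e-10):
    """Project softmax posteriors into the b-vector space.

    posteriors: (K,) softmax outputs for one i-vector.
    W: (K, d) PCA basis fitted on log posteriors; mu: (K,) their mean.
    """
    log_post = np.log(posteriors + eps)   # log posteriors, kept away from log(0)
    return (log_post - mu) @ W            # d-dimensional b-vector
```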

For the b-vector space we used another PLDA model, different from the i-vector-space one: the number of eigenvoices was four hundred, and in this case we used a simplified PLDA model.

The LDA-SVM subsystem, as mentioned before, used clustering algorithm 2 and the s-norm score normalization procedure.
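
For context, a minimal sketch of symmetric score normalization as it is usually defined, assuming a cohort of impostor scores on each side of the trial (hypothetical names):

```python
import numpy as np

def s_norm(raw, enroll_cohort, test_cohort):
    """Symmetric (s-norm) score normalization.

    enroll_cohort: scores of the enrollment model against a cohort.
    test_cohort: scores of the test segment against the same cohort.
    """
    z = (raw - enroll_cohort.mean()) / enroll_cohort.std()  # z-norm half
    t = (raw - test_cohort.mean()) / test_cohort.std()      # t-norm half
    return 0.5 * (z + t)
```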

A few words about the quality measure function. It is well known that the threshold of the minimum decision cost function depends on the test and enrollment segment durations. In the NIST i-vector challenge we deal with multi-session enrollment models, and the average duration of an enrollment model is much larger than the duration of the test segments. So we ignored the dependence on the enrollment durations and focused on investigating the dependence on the test duration.

We did this using our clustering results: we prepared some protocols, five-session enrollment protocols, obtained several points, and observed a linear dependence of the threshold on the logarithm of the test duration. It should be mentioned, though, that the logarithm in this function could be replaced by a power function, for example the square root, because of the similar behavior of those functions.
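
A minimal sketch of how such a quality measure function can be applied, with hypothetical names and coefficients; shifting the score by a term linear in the log test duration is equivalent to a duration-dependent threshold:

```python
import numpy as np

def apply_qmf(score, test_duration, a, b):
    """Compensate a score for test-segment duration.

    Equivalent to a threshold linear in log(duration):
    score' = score - (a + b * log(duration)).
    """
    return score - (a + b * np.log(test_duration))
```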

For the subsystem fusion we used a simple linear combination, a weighted sum of the scores, but we also applied sigma normalization before the fusion.

For the LDA-SVM subsystem the weight equals one, while for the other subsystems the weights were tuned.
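
A minimal sketch of this fusion, with hypothetical names; each subsystem's scores are divided by their standard deviation (sigma normalization) before the weighted sum:

```python
import numpy as np

def fuse(score_lists, weights):
    """Sigma-normalize each subsystem's scores, then take a weighted sum.

    score_lists: list of (N,) score arrays, one per subsystem.
    weights: one weight per subsystem (e.g. 1 for LDA-SVM).
    """
    fused = np.zeros_like(score_lists[0], dtype=float)
    for scores, w in zip(score_lists, weights):
        fused += w * (scores / scores.std())   # sigma normalization
    return fused
```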

So, to the results. First I will show you our results with the incorporated test duration information. You can see that using the quality measure function let us significantly reduce the minimum decision cost function: it gives a reduction of about ten percent for the LDA-SVM subsystem, and the final fusion with equal weights also achieves a good improvement, about seven percent relative.

Now about the fusion of the i-vector and b-vector space PLDA models and the scores of these models. We obtained a reduction of the minimum decision cost function. This is due to the fact that the RBM or DBN provides a non-linear transform of the i-vector space, which allowed us to make an effective fusion of such systems.

For the fusion of the PLDA and RBM PLDA subsystems we achieved good results, but the weights are unequal: we optimized them through submissions, and they have the values 0.2, 0.4, and 1. Our best result comes from a fusion of three subsystems: the LDA-SVM subsystem, the RBM PLDA subsystem, and the DBN PLDA subsystem. In this case the DBN PLDA gave us a little bit more information for the verification, and we managed to achieve a 0.239 result, which is the best one.

In conclusion: we have presented our system, which consists of PLDA, RBM PLDA, and LDA-SVM subsystems, and we have presented its agglomerative clustering algorithms. The combination of the PLDA and LDA-SVM subsystems using different clustering algorithms resulted in an effective fusion, and the non-linear transformation of i-vectors into the b-vector space also leads to a successful fusion with classical i-vector systems. So, that's all.

Question: Congratulations. I just want to ask you about your modified version of the mean-shift algorithm: did you compare it, for example, with a standard clustering algorithm, to see how much you gain from using this algorithm?

Answer: Yes, we did. As you have seen, we used algorithm 2, and we tried to use the algorithm 2 clustering for training the PLDA model. Algorithm 2 is just the bottom-up stage, the same as in algorithm 1, and it led to some degradation; the mean shift was better for this task, especially for the PLDA training.