This is work from our university in Spain on i-vector speaker recognition using PLDA.

To get the parameters of the PLDA model, the usual approach is to compute point estimates of the parameters by supervised maximum likelihood, and that requires plenty of development data from the target domain.

PLDA considers that the i-vector decomposes into a speaker part and a channel part, where the priors are Gaussian. To train this model well we need a large amount of data.
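The decomposition the speaker refers to is usually written as follows. This is a sketch in standard PLDA notation, which may differ from the symbols on the slides:

```latex
\phi_{ij} = \mu + V y_i + U x_{ij} + \epsilon_{ij},
\qquad y_i \sim \mathcal{N}(0, I), \quad
x_{ij} \sim \mathcal{N}(0, I), \quad
\epsilon_{ij} \sim \mathcal{N}(0, W^{-1}),
```

where $\phi_{ij}$ is the $j$-th i-vector of speaker $i$, $V$ spans the speaker subspace, $U$ spans the channel subspace, and $W$ is the within-class precision matrix.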

If we don't have a large amount of data, we are forced to reduce the dimension of the speaker vector y, whose prior is Gaussian; in that case we need less data. So if we have, for example, twenty speakers, that is a number of speakers too small to reliably estimate a speaker vector of dimension ninety.

In the Bayesian approach, the parameters are assumed to be random variables: we put priors on the model parameters and then we compute their posterior given the i-vectors and the speaker labels.

So the method is: first we compute the posterior of the model parameters given the large out-of-domain dataset; from now on we call this posterior the prior. Then, in this case, we compute the posterior given the target data, and finally we take point estimates of the parameters by computing their expected values given the target posterior.

To get the posterior of the model parameters there is no closed-form solution. What we do is assume the posterior decomposes into factors, one for the model parameters, one for the speaker factors and one for the channel factors; then we compute these factors in a cyclic fashion, and finally we approximate the posterior by their product. The posterior for the speaker factors factorizes over the number of speakers in the database, and the posterior for the channel factors factorizes over the number of segments in the database.
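The cyclic updates described here are hard to show in full for PLDA, but the same mean-field idea can be sketched on a toy model: a one-dimensional Gaussian with a Normal-Gamma prior, where q(mu) and q(tau) are updated in turn and the final point estimates are the posterior expectations. All names here are mine; this is not the paper's actual update.

```python
import numpy as np

def vb_gaussian(x, mu0=0.0, lambda0=1e-3, a0=1e-3, b0=1e-3, n_iter=50):
    """Mean-field VB for x_n ~ N(mu, 1/tau) with a Normal-Gamma prior.

    Cyclically updates q(mu) = N(mu_N, 1/lam_N) and q(tau) = Gamma(a_N, b_N),
    then returns the posterior expectations E[mu] and E[tau] as point estimates.
    """
    x = np.asarray(x, dtype=float)
    N, xbar = len(x), x.mean()
    a_N = a0 + (N + 1) / 2.0                             # fixed in closed form
    E_tau = a0 / b0                                      # initial guess for E[tau]
    mu_N = (lambda0 * mu0 + N * xbar) / (lambda0 + N)    # constant across iterations
    for _ in range(n_iter):
        lam_N = (lambda0 + N) * E_tau                    # update q(mu)
        var_mu = 1.0 / lam_N
        # Update q(tau) using the current moments of q(mu).
        b_N = b0 + 0.5 * (np.sum((x - mu_N) ** 2)
                          + (N + lambda0) * var_mu
                          + lambda0 * (mu_N - mu0) ** 2)
        E_tau = a_N / b_N
    return mu_N, E_tau
```

Each factor is re-estimated with the others held fixed, exactly the "cyclic fashion" the talk mentions, until the approximation stabilizes.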

Then we can compute the adapted model for the target dataset. When we move from the original dataset to the target dataset, we can also control the weight of the prior relative to the target data.

To do that we should modify the prior distribution. The weight of the prior depends on the number of speakers that we had in the original dataset, so we change its parameters: if we want to multiply the weight of the prior by some factor, we need to modify the alpha. These two parameters change but, at the same time, they give the same expectation values for the model parameters.

We can do the same with the prior of W. And finally, instead of the actual number of speakers and the number of segments, we use an effective number of speakers and segments for the prior Gaussians.
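As a minimal sketch of this reweighting trick, assuming a one-dimensional Gamma prior (the priors on the precision matrices in the model work analogously): multiplying both parameters by the same factor scales the prior's effective weight while leaving its expectation untouched. The function name is hypothetical.

```python
def scale_gamma_prior(alpha, beta, scale):
    """Rescale the 'weight' (effective count) of a Gamma(alpha, beta) prior.

    Multiplying both parameters by `scale` changes the prior's strength
    relative to the observed data, while keeping its mean
    E[tau] = alpha / beta unchanged.
    """
    return alpha * scale, beta * scale
```

With `scale < 1` the prior variance grows, so the prior becomes weaker and the target data dominates; with `scale > 1` the opposite happens.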

We are going to compare our method against length normalization, which does centering and whitening and projects the i-vectors onto the unit hypersphere, to make them more Gaussian and to reduce the mismatch between datasets.
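A hedged sketch of that length-normalization recipe, assuming the centering and whitening statistics come from a development set (the function name is mine):

```python
import numpy as np

def length_normalize(ivectors, dev_ivectors):
    """Length normalization: center and whiten with development-data
    statistics, then project each i-vector onto the unit hypersphere
    so the data looks more Gaussian.
    """
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors, rowvar=False)
    # Symmetric whitening transform from the eigendecomposition of the covariance.
    vals, vecs = np.linalg.eigh(cov)
    whiten = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    x = (ivectors - mu) @ whiten
    # Project onto the unit hypersphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```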

Now I explain the datasets. The target dataset we will use has telephone channels and contains 30 male and 30 female speakers; all of this data has similar conditions, with segments of two to three minutes. As the large development dataset we use one that contains more than five hundred males and seven hundred females, and it has a variety of channels.

For the speaker verification system, we extract twenty MFCCs plus deltas. On top of that we build the i-vector PLDA system, we also use length normalization in some configurations, and finally we used s-norm score normalization with cohorts.
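S-norm itself is standard; here is a minimal sketch, assuming the usual symmetric definition with a single cohort (function and argument names are mine):

```python
import numpy as np

def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (s-norm).

    The raw trial score is z-normalized twice -- once with the statistics
    of the enrollment model scored against a cohort, once with those of
    the test segment against the cohort -- and the two are averaged.
    """
    e = np.asarray(enroll_cohort_scores, dtype=float)
    t = np.asarray(test_cohort_scores, dtype=float)
    z_e = (raw_score - e.mean()) / e.std()
    z_t = (raw_score - t.mean()) / t.std()
    return 0.5 * (z_e + z_t)
```

The averaging makes the normalized score symmetric in the roles of enrollment and test, which is why it is preferred over plain z-norm or t-norm in many i-vector systems.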

First, here we compare systems with and without adaptation. We can see an improvement when we use the adapted prior distribution. If we compare, for instance, the first line and the last line, the equal error rate improves by forty percent for males and fourteen percent for females, and the minDCF improves by twelve percent for males and forty-six percent for females.

Here is a table comparing the adaptation of different parameters; we can see where the improvement comes from.

Here we show length normalization with s-norm and without s-norm. When we use it we see an improvement over the raw i-vectors, but not as much as with the Bayesian adaptation. We can also see that, in this dataset, length normalization is not consistently better.

Here we show some more improvements, and the same for females. Finally, we can see that even without normalization the adaptation still helps.

Finally, the conclusions: we have developed a method to adapt a PLDA i-vector classifier from a domain with a large amount of development data to a domain with scarce development data. We have conducted experiments showing that this technique improves the performance of the system, and this improvement mainly comes from the adaptation of the channel matrix W. We have compared this method with length normalization and we get better results; we have also discussed combining it with length normalization. As future work, we propose Bayesian adaptation of the UBM and the i-vector extractor.

No, the i-vector length means the norm of the vector, not the dimension of the i-vector. Maybe we can do the same as we have more data for the normalization.