Given an i-vector, the model assumes that it can be decomposed into parts: a speaker part, with mean mu, plus the matrix V, whose columns constitute a basis of the eigenvoice subspace, multiplied by y, the speaker factor, which is normally distributed, plus a residual term whose covariance matrix is full.
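The generative model just described can be sketched as follows. This is an illustrative sketch, not code from the talk; all dimensions and variable names here are assumptions.

```python
import numpy as np

# Gaussian PLDA generative model: an i-vector w is a mean mu, plus an
# eigenvoice term V @ y with a standard normal speaker factor y, plus a
# full-covariance residual eps. Dimensions below are illustrative.
rng = np.random.default_rng(0)
p, r = 50, 10                          # i-vector dimension, eigenvoice rank
mu = rng.normal(size=p)
V = rng.normal(size=(p, r))            # eigenvoice matrix
L = rng.normal(size=(p, p)) * 0.1
Lambda = L @ L.T + 0.1 * np.eye(p)     # full residual covariance

def sample_ivectors(n_sessions):
    """Draw n_sessions i-vectors of one speaker: w = mu + V y + eps."""
    y = rng.normal(size=r)                             # one speaker factor
    eps = rng.multivariate_normal(np.zeros(p), Lambda, size=n_sessions)
    return mu + (V @ y) + eps

w = sample_ivectors(5)
print(w.shape)   # (5, 50)
```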

This is the most commonly used PLDA system for i-vectors, in which no channel factor is kept. The decision score, proposed by Prince, is a log-likelihood ratio.

We can see that computing the score depends only on the matrix V V-transpose, the speaker variability, and on V V-transpose plus Lambda, which corresponds to the total variability.
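That dependency can be made concrete with a small sketch of the verification log-likelihood ratio, written only in terms of the speaker-variability and total-variability matrices. This is an editorial illustration under the stated Gaussian assumptions, not the speaker's implementation.

```python
import numpy as np

def _log_mvn(x, cov):
    """Log-density of a zero-mean Gaussian N(0, cov) at x."""
    _, logdet = np.linalg.slogdet(cov)
    d = x.shape[0]
    return -0.5 * (d * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def plda_llr(w1, w2, Sac, Stot):
    """Verification LLR for centered i-vectors w1, w2.
    Sac = V V^T (speaker variability), Stot = V V^T + Lambda (total)."""
    joint = np.block([[Stot, Sac], [Sac, Stot]])   # same-speaker joint covariance
    num = _log_mvn(np.concatenate([w1, w2]), joint)
    den = _log_mvn(w1, Stot) + _log_mvn(w2, Stot)  # independent-speaker model
    return num - den

rng = np.random.default_rng(1)
p, r = 20, 5
V = rng.normal(size=(p, r))
Sac, Stot = V @ V.T, V @ V.T + np.eye(p)
w1, w2 = rng.normal(size=p), rng.normal(size=p)
# the score is symmetric in the two i-vectors
print(np.isclose(plda_llr(w1, w2, Sac, Stot), plda_llr(w2, w1, Sac, Stot)))  # True
```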

This generative modeling can provide good performance, but it has been shown that the best performance is achieved only if a conditioning procedure follows the extraction of the i-vectors. This conditioning is most commonly a whitening, that is, a standardization, followed by length normalization. The covariance matrix used for the standardization can be the total covariance matrix or the within-speaker covariance matrix. Eventually, we iterate this process. The parameters are computed on the i-vectors of the training corpus and applied to the test i-vectors.
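The iterated standardization-plus-length-normalization just described can be sketched like this; the sketch uses the total covariance, and the within-speaker variant would simply substitute the within-class covariance matrix. Function and variable names are illustrative.

```python
import numpy as np

def condition(W, n_iter=2):
    """Iterated whitening + length normalization of i-vectors (rows of W).
    Each pass standardizes with the current total covariance, then projects
    the vectors onto the unit sphere."""
    W = W.copy()
    for _ in range(n_iter):
        mu = W.mean(axis=0)
        cov = np.cov(W, rowvar=False)
        # inverse square root of the covariance via eigendecomposition
        vals, vecs = np.linalg.eigh(cov)
        inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
        W = (W - mu) @ inv_sqrt                          # standardization
        W /= np.linalg.norm(W, axis=1, keepdims=True)    # length normalization
    return W

rng = np.random.default_rng(2)
W = condition(rng.normal(size=(200, 10)) * 3 + 1)
print(np.allclose(np.linalg.norm(W, axis=1), 1.0))   # True: all on unit sphere
```

In practice the mean and covariance would be estimated on the training corpus and then applied, frozen, to the test i-vectors, as the talk notes.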

The assumptions of Gaussian PLDA are, first, Gaussianity; second, the linearity of the eigenvoices, which means that the speaker parts are constrained to lie in a linear subspace; third, the homoscedasticity of the residual, which means that the model assumes that, across speaker classes, the channel effects can be modeled in a speaker-independent way, so that the class distributions share a common covariance matrix; fourth, the independence between the residual and the speaker factor; and, last, the equality of covariance, which means that the gaps between the actual i-vectors of a class and the mean parameter computed for this class are assumed to be uncorrelated, normally distributed, and not explained by the speaker: they behave like a random sample of the development corpus and do not vary with the effect being modeled.

The graph on the left shows the simplest configuration of the PLDA model: a speaker factor of one dimension, a one-dimensional subspace, with a standard normal prior for the speaker factor, and some classes sharing the same variability matrix.

Our remark is that, after length normalization, the i-vectors lie on a nonlinear and finite subset, the unit sphere, so that the distribution of the i-vectors is what we refer to as a spherical distribution. We think that, in this case, the assumption that there exists a unique, speaker-independent parameter of within-speaker variability is questionable: on such a surface, the channel effects may not be modeled in a speaker-independent way.

It is difficult to be sure that such an assumption is right or wrong. For example, if we find a significant correlation between the residual and the class parameter, this effect dramatically invalidates the estimation of the random variables.

First, we present the deterministic approach. Why propose a deterministic approach to compute the PLDA parameters? Because the statistical estimation, however sophisticated, may not be optimal for the i-vector spherical distribution. So, can we replace the sophistication of the expectation-maximization, maximum-likelihood estimation of the parameters by a simple and straightforward deterministic approach?

In other words, we want to know if the application of the maximum-likelihood approach to compute the parameters of the PLDA brings a significant improvement of performance.

On our development corpus, a singular value decomposition of the between-speaker covariance matrix gives a matrix whose columns are the eigenvectors of the between-speaker variability, and a diagonal matrix of eigenvalues, sorted in decreasing order.

Using a rank R less than p, we can compute the R principal axes of between-speaker variability and summarize them in the matrix V V-transpose defined by Equation 4, where the first factor is the truncated matrix composed of the first R columns of the eigenvector matrix, and the diagonal matrix comprises only the R highest eigenvalues.
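This deterministic estimate of the speaker-variability matrix can be sketched as follows; names, dimensions and the exact estimator details are illustrative assumptions, not taken from the talk's equations.

```python
import numpy as np

def between_cov_lowrank(W, labels, R):
    """Deterministic estimate of V V^T: eigendecompose the between-speaker
    covariance of the class means and keep the R principal axes (those with
    the largest eigenvalues)."""
    means = np.stack([W[labels == s].mean(axis=0) for s in np.unique(labels)])
    B = np.cov(means, rowvar=False)              # between-speaker covariance
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # sort decreasing
    P_R, D_R = vecs[:, :R], np.diag(vals[:R])    # truncate to rank R
    return P_R @ D_R @ P_R.T                     # rank-R estimate of V V^T

rng = np.random.default_rng(3)
labels = np.repeat(np.arange(30), 4)             # 30 speakers, 4 sessions each
W = rng.normal(size=(120, 8))
VVt = between_cov_lowrank(W, labels, R=3)
print(np.linalg.matrix_rank(VVt))   # 3
```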

So we propose to carry out experiments with only one conditioning, the within-class conditioning, that is, a standardization according to the within-class covariance matrix followed by length normalization, and a direct, deterministic estimation of the parameters of the PLDA, without expectation-maximization, on the development corpus.

In the scoring, the total covariance matrix is replaced by its estimate on the development corpus, and the speaker variability matrix V V-transpose by its truncated, rank-R deterministic estimate.

This proposal can be justified: if we consider only the data of the development corpus, we can express the factors in terms of the parameters, the speaker factors and the residual factors, where the mean vector of speaker s plays the role of its class parameter. We show in the article that the speaker factor is then standardized, with mean zero and identity covariance matrix, as desirable, and that the covariance between the random variables is null; remark that only this nullity of covariance, which is a necessary condition of independence, is verified. And we thus obtain the PLDA scoring.

Length normalization is known to improve Gaussianity, so we compute the densities of the speaker and residual factors of the development corpus, before and after length normalization. The top graphs show the distribution of the squared norms of the standardized factors: on the left, the speaker factors; on the right, the residuals. The dashed line is the chi-squared distribution that the squared norm must follow: a chi-squared with R degrees of freedom for the speaker factors, and a chi-squared with p degrees of freedom for the residuals, where p is the dimension of the i-vector space.
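The chi-squared claim itself is easy to check empirically: if standardized factors really were standard normal in dimension d, their squared norms would follow a chi-squared with d degrees of freedom, whose mean is d. A quick sanity check with illustrative dimensions:

```python
import numpy as np

# Squared norms of ideal standardized factors (N(0, I) in dimension d)
# follow a chi-squared with d degrees of freedom, so their mean is d.
rng = np.random.default_rng(4)
d, n = 20, 5000
Y = rng.normal(size=(n, d))            # ideal standardized factors
sq_norms = (Y ** 2).sum(axis=1)
print(abs(sq_norms.mean() - d) < 1.0)  # True: empirical mean close to d = 20
```

The talk's point is that the empirical squared-norm distributions of real i-vector factors deviate from this theoretical reference.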

We see here that, for both the development and the evaluation datasets, there is a mismatch between the empirical distribution and the theoretical chi-squared distribution. Remark also the dataset shift between the development and the evaluation datasets. After length normalization, here is the result.

We carried out the experiments with the maximum-likelihood parameters and with the deterministic approach. In both cases, we can see that the mismatch is only partially reduced, as is the shift between development and evaluation. Remark that the deterministic approach improves the Gaussianity in a manner similar to the maximum-likelihood technique.

Here are speaker recognition results, in terms of equal error rate and minimum DCF, with three systems. We evaluate on the telephone conditions of the NIST speaker recognition evaluations of 2008, 2010 and 2012, the latter in a noisy environment. The systems are: a baseline with length normalization; and the within-class conditioning in two cases, one with the maximum-likelihood estimate of the parameters, and one with the deterministic estimate of the parameters.

You can see that the results are the same in terms of equal error rate between the last two techniques; in terms of minimum DCF, the probabilistic approach remains superior. And we remark that the within-class conditioning performs a bit better than the baseline length-normalization conditioning, even with the deterministic approach.

So now, we consider that the fact that the maximum-likelihood approach does not bring the expected improvement of performance may be due to the fact that the Gaussian PLDA model is not optimal for the i-vector spherical distribution.

So we compute two series on the development corpus. First, the average log-likelihood of the residuals of the observations of a class given the model, which we consider as a likelihood of the class, or rather a likelihood of the class given its residuals. Then we compare this likelihood to a parameter of position of the class, considered as a proxy of the probabilistic class position: the prior likelihood of the speaker factor of the class.

And we display the two series: on the horizontal axis, the parameter of class position; on the vertical axis, the likelihood of the residuals according to the model.

The first graph shows results without length normalization, with the i-vectors as provided by the extractor, and we remark here that there is no correlation between the position of a class and the likelihood of its residuals. Each time, we display the coefficient of determination, R-squared, which goes from zero to one and indicates how well the data points fit a line. Here, R-squared is equal to 0.04, close to zero.
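For reference, the coefficient of determination of a simple linear fit is just the squared Pearson correlation between the two series; a minimal sketch, with illustrative data:

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x:
    the squared Pearson correlation, between 0 (no linear fit) and 1."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.isclose(r_squared(x, 2 * x + 1), 1.0))   # True: a perfect line gives 1
```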

After length normalization, a significant correlation appears between the likelihoods of the class factors and the likelihoods of the residuals: R-squared values equal to 0.59 and 0.64. So there is a dependency between the actual variability matrix of a class and the probabilistic position of this class, expressed by the likelihood of its factor. We can see here that this challenges the homoscedasticity of the residual.

We computed the previous results with a training set in which the data are not evenly distributed across the speakers, so one can object that the correlations are due to the amount of data differing from speaker to speaker. Therefore, we compute the same graphs as before, but only for the training speaker classes with at least a minimum number of sessions per training speaker. We vary this minimal number of sessions per speaker from two to sixty-two and, each time, keeping only the segments of the speakers that have more than this minimum, we compute the R-squared score.

We see that, before length normalization, there is no problem, because the two series are independent. And after length normalization, we see that, even for the speaker classes with the maximum number of sessions, the same behavior occurs: R-squared values higher than 0.6.

So we remark that Gaussian PLDA is a good model but, if we are obliged to project the data onto the nonlinear surface of a sphere, the problem is to be sure that a homoscedastic model, with equality of covariance, will still fit such data.

A heteroscedastic model would replace the overall within-class parameter by class-dependent parameters, taking into account the local position of each class to fit its actual distribution. But such a modeling is difficult to carry out, because it induces a complex density, with the within-class variability parameter depending on a nonlinear function of the position.

What about giving up length normalization? There are existing approaches which preserve the i-vectors as they are, attempting to find an adequate model, such as heavy-tailed PLDA, or pairwise discriminative classifiers. The question is why we are obliged to ignore the norm: maybe because the norm contains unexpected variabilities that may be related to some acoustic parameters.

Just a remark: the within-class conditioning transforms the within-class variability into the identity matrix, and an identity matrix has no principal components; maybe this alleviates the constraint of homoscedasticity.

Thank you.

Can I ask you something about the experiments in which you replaced the probabilistic approach of estimating the parameters with the deterministic one shown on the screen? I think that, in the limit, if your training set has many speakers, these two approaches are exactly the same; the only difference is that you are putting a prior in the one case. So it depends on the number of speakers, and I guess the difference shows when you go to a small number of speakers when you train the model.

Yes. The deterministic approach is not intended to compete with the maximum-likelihood method, which remains the best way. But I was surprised by the slight gap of performance, and so I assume that it is maybe because our maximum-likelihood estimation cannot be optimal, because there is a problem of sphericity of the data, while the deterministic approach is not affected. And that is exactly the topic: we try to show whether the norms of the speaker factors follow the expected distribution.

Yes, I guess, because you have to treat them as random variables, because they are not simply points: under the PLDA scheme, they have a posterior distribution. So a better way to consider whether they follow the distribution would be, probably, to add the trace of the posterior covariance matrix; it should also be added when you compute the norm, in order to see the overall distribution, rather than just dot products.

Okay, thank you, that is a good remark.
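The questioner's suggestion, under the standard Gaussian PLDA posterior, amounts to the following sketch: the expected squared norm of the speaker factor is the squared norm of its posterior mean plus the trace of its posterior covariance. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

def posterior_sq_norm(w, V, Lam):
    """Expected squared norm of the speaker factor under its posterior,
    assuming the model w = V y + eps with y ~ N(0, I) and eps ~ N(0, Lam):
    E[||y||^2 | w] = ||posterior mean||^2 + trace(posterior covariance)."""
    Li = np.linalg.inv(Lam)
    cov = np.linalg.inv(np.eye(V.shape[1]) + V.T @ Li @ V)  # posterior covariance
    mean = cov @ V.T @ Li @ w                               # posterior mean
    return mean @ mean + np.trace(cov)

rng = np.random.default_rng(5)
p, r = 15, 4
V = rng.normal(size=(p, r))
Lam = np.eye(p)
w = rng.normal(size=p)
e = posterior_sq_norm(w, V, Lam)
print(e > 0)   # True: the trace term keeps the expectation strictly positive
```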

Along the same rationale, we ran the analysis with the evaluation test i-vectors, as we did with the development corpus vectors. The effect is the same, except that, before length normalization, the R-squared score is not close to zero: the score for the test i-vectors, before normalization, as provided by the extractor, is close to 0.3, because the test vectors were not used for training the PLDA factor analysis. So there is a shift not only for the mean, but also for this problem of homoscedasticity.

Just one quick question: I just missed your point when you said, I think, that trying to make the data spherically distributed was, you thought, inconsistent with being Gaussian. Why is that?

It is empirical, but a Gaussian in a high-dimensional space is close to a sphere.

Yes, somehow, but we constrain the speaker factors onto a sphere, and the difficulty is to assume that the within-class variability is not part of the problem: it will be affected by the position.

Right, but the prior distribution of the i-vectors is zero-mean with identity covariance, and in a high-dimensional space that will be approximately a sphere; that is what happens, mathematically, in a high-dimensional space, so why is it inconsistent?

Here we actually have a spherical distribution of the i-vectors, and applying a model with equality of covariance on such a surface is difficult. Maybe I would say that length normalization is a harsh technique: it projects onto the sphere, discarding the information of the norm instead of adjusting for it, I think.

Okay, thank you; let us move on to the discussion.