So, I am going to talk about efficient techniques for i-vector extraction.

We went looking for some way to address the memory occupation of the i-vector extraction engine.

State-of-the-art speaker recognition nowadays is based on i-vectors, which give very good accuracy, but the computation of i-vectors can be quite demanding in terms of both memory and time.

so while some solutions

proposed for a system action with low memory requirements the namely

the diagonal isolate vectors proposed bigram but plus the

that is

we should also shown to have some degradation in accuracy

when

some to show some degradation of accuracy so we

well looking for a solution which does not include such degradation

but still those two

and greatly reduce the amount of memory required to store

This is how the presentation is organized: first I will recall the original formulation of i-vector extraction, which you have already seen in the previous talks; then I will present our variational Bayes approach and our conjugate gradient approach for i-vector extraction, and finally some experimental results for these techniques.

I guess everybody here knows what i-vectors are, but let me give a brief introduction: i-vectors are low-dimensional, informative representations of each utterance, obtained by means of a latent variable model.

In the most widely used i-vector framework we assume that most of the speaker and channel variations lie in a small subspace of the supervector space. We then place a standard Gaussian prior on the latent variable representing these variations, and, approximating the data likelihood by means of the Baum-Welch statistics, we can compute the posterior of this latent variable. The i-vector is the maximum a posteriori estimate of the latent variable; since the posterior is Gaussian, this corresponds to the posterior mean.
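To fix the notation for what follows (these symbols are my choice for this writeup, not taken verbatim from the slides): with per-Gaussian eigenvoice blocks T_c, UBM covariances Sigma_c, and zero- and first-order Baum-Welch statistics N_c and f_c, the posterior of the latent variable w is Gaussian with precision L and mean:

    L = I + \sum_{c=1}^{C} N_c \, T_c^{\top} \Sigma_c^{-1} T_c ,
    \qquad
    \hat{w} = L^{-1} \sum_{c=1}^{C} T_c^{\top} \Sigma_c^{-1} f_c .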

As you can see, the cost of computing the i-vector is dominated by the precision matrix, which requires multiplying the inverse covariances by the eigenvoice matrix for every Gaussian, plus the inversion of a matrix whose size is the i-vector dimensionality.

Let us look at the complexity more precisely; here C represents the number of Gaussians, F the feature dimensionality, and M the i-vector dimensionality.

If we do not precompute anything, we get a complexity which is quadratic in the i-vector dimensionality and linear in the number of Gaussians and in the feature dimensionality. We can reduce the computational cost if we precompute and store the per-Gaussian matrices, but in that case we have a memory cost which is again quadratic in the i-vector dimensionality and proportional to the number of Gaussians.

With, for example, the 2048 Gaussians of the UBM used in this work, these precomputed matrices are easily the most memory-expensive part of an i-vector extractor.
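As a back-of-the-envelope check (only the 2048-component UBM comes from the talk; the 400-dimensional i-vector is an assumed, typical value):

    # Memory needed to precompute and store T_c' Sigma_c^{-1} T_c for every Gaussian.
    C = 2048      # UBM components (from the talk)
    M = 400       # i-vector dimensionality (assumed, typical value)
    bytes_per_float64 = 8
    total = C * M * M * bytes_per_float64   # one M x M matrix per Gaussian
    print(total / 2**30)                    # about 2.4 GiB in double precision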

Last year an approximation based on diagonalization was proposed for this problem. I forgot to mention that we can obtain a simpler form of the precision matrix just by normalizing the Baum-Welch statistics and, correspondingly, the eigenvoice matrix. If we then assume that the normalized per-Gaussian matrices can be simultaneously diagonalized by a single matrix Q, we can compute an approximation of the posterior covariance which is diagonal, so i-vector extraction can be performed in a much faster way with very limited additional memory requirements.

However, this approximation can cause a degradation of recognition accuracy, so we wanted to do better on that front.

As we said, the problem is the computation of the posterior covariance matrix; more precisely, the problem is that this covariance matrix is not diagonal. If it were, the i-vector components would be uncorrelated and the posterior would factorize. So, even though the exact posterior cannot be factorized over the different components, we look for an approximation of the posterior which factorizes over subsets of the i-vector components.

We partition the i-vector components into disjoint sets and assume that the posterior can be approximated by a distribution which factorizes over these sets. The variational Bayes framework provides a way to estimate this approximate posterior by minimizing the KL divergence between it and the original posterior.
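In formulas, the approximation we look for has the standard mean-field form (written out here for clarity; the block notation w^(i) is introduced next):

    P(w \mid \mathcal{X}) \approx q(w) = \prod_i q_i\big(w^{(i)}\big),
    \qquad
    q = \arg\min_{q} \, \mathrm{KL}\big(q(w) \,\big\|\, P(w \mid \mathcal{X})\big) .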

I need to introduce some notation: T^(i) denotes the set of eigenvoice columns associated with the i-th block of i-vector components, w^(i) is the corresponding block of the i-vector, and w^(not i) collects the complementary components, so that we can express the factorized approximation in this way.

If we derive the variational Bayes update for each factor of the approximate posterior, its distribution turns out to be a Gaussian with an expression very similar to the original i-vector expression. The difference is that the precision matrix is computed using only the eigenvoices relative to the subset, and for the mean of the posterior we are essentially centering the statistics on a slightly different UBM: we assume that the other components of the i-vector are fixed, and we center the first-order statistics on this shifted UBM.

This lets us analyze the complexity of the approach. We have to be careful when implementing this technique, because if we naively recompute the centering at every update, with blocks of size one the complexity is again quadratic in the i-vector dimensionality. What we do instead is keep a supervector of first-order statistics which is always kept centered on the current i-vector estimate. The mean of a block is computed after removing, from this centered supervector, the contribution of the components we are re-estimating; after the block mean has been updated, we re-center the supervector of first-order statistics on the new i-vector estimate. In this way, ignoring the contribution of the precision matrices, the complexity of the approach is proportional to the i-vector dimensionality and to the number of iterations we need to perform to compute the i-vector.
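A minimal numpy sketch of this bookkeeping, assuming statistics that have already been whitened with the UBM covariances and centered on the UBM means (the array names and shapes are mine, for illustration only):

    import numpy as np

    def vb_ivector(T, N, f, block=20, n_iter=3):
        # T: (C*D, M) whitened eigenvoice matrix, D = feature dim, C = Gaussians
        # N: (C,)     zero-order statistics
        # f: (C*D,)   whitened, UBM-centered first-order statistics supervector
        CD, M = T.shape
        Nrep = np.repeat(N, CD // len(N))      # occupation weight for each supervector row
        w = np.zeros(M)
        fc = f.copy()                          # stats kept centered on the current estimate (w = 0)
        for _ in range(n_iter):
            for start in range(0, M, block):
                idx = slice(start, min(start + block, M))
                Ti = T[:, idx]
                fi = fc + Nrep * (Ti @ w[idx])                          # un-center the block being re-estimated
                Li = np.eye(Ti.shape[1]) + Ti.T @ (Nrep[:, None] * Ti)  # block precision
                w[idx] = np.linalg.solve(Li, Ti.T @ fi)                 # block posterior mean
                fc = fi - Nrep * (Ti @ w[idx])                          # re-center on the updated estimate
        return w

In a real implementation the block precisions Li would of course be computed once per utterance, or stored, rather than rebuilt inside the loop, as discussed next.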

You can also see the similarity of this form with the original i-vector formulation: the block precision matrices are essentially the diagonal blocks of the original precision matrix. Again we have two options: we can recompute these block matrices every time, or we can precompute and store their block-diagonal parts; in the latter case we get a faster extraction time but slightly higher memory, and the memory requirement depends on the block size we choose.

We can show that, with blocks of size one, this variational Bayes approach implements a Gauss-Seidel iteration for the solution of the underlying linear system. We also investigated different techniques for solving this system, namely the Jacobi method and the conjugate gradient method. We found that the Jacobi method is very similar to our approach, but instead of updating the i-vector after each component update, the i-vector is updated only after all components have been re-estimated; in our experience this leads to slightly slower convergence.
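For reference, with blocks of size one the system to solve is L w = b with b = \sum_c T_c^{\top} \Sigma_c^{-1} f_c, and the two classical iterations differ only in which estimates of the other components they use (standard textbook forms, not taken from the slides):

    \text{Jacobi:}\quad
    w_i^{(k+1)} = \frac{1}{L_{ii}}\Big(b_i - \sum_{j \neq i} L_{ij}\, w_j^{(k)}\Big),
    \qquad
    \text{Gauss-Seidel:}\quad
    w_i^{(k+1)} = \frac{1}{L_{ii}}\Big(b_i - \sum_{j < i} L_{ij}\, w_j^{(k+1)} - \sum_{j > i} L_{ij}\, w_j^{(k)}\Big) .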

Then we analyzed the conjugate gradient method. What is nice about conjugate gradient is that we do not need to invert the precision matrix; in fact, we do not even need to compute it explicitly, because the algorithm only requires the product of this matrix with a generic vector. If we organize this product in the appropriate way, its cost is linear in the number of UBM components, in the number of features, and in the i-vector dimensionality, so the overall complexity is the same as for the variational Bayes approach.
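A sketch with scipy's conjugate gradient solver and a matrix-free operator (same whitened-statistics assumption as before; the names are mine):

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def cg_ivector(T, N, f, maxiter=5):
        # Never forms L = I + sum_c N_c T_c' T_c explicitly: only its product with a vector.
        CD, M = T.shape
        Nrep = np.repeat(N, CD // len(N))

        def matvec(v):
            # L v = v + T' (Nrep * (T v)); cost O(C*D*M), no per-Gaussian M x M storage
            return v + T.T @ (Nrep * (T @ v))

        L = LinearOperator((M, M), matvec=matvec, dtype=T.dtype)
        b = T.T @ f
        w, info = cg(L, b, maxiter=maxiter)
        return w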

What is nice about this technique is that it does not require any additional memory, and, as for the variational Bayes approach, we can use it with a full-covariance UBM if we prewhiten the statistics and the eigenvoice matrix with the UBM covariances once.
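The prewhitening mentioned here is done once for the eigenvoice matrix; a Cholesky-based sketch (one common choice, assumed here; the per-utterance statistics would be whitened the same way when they are collected):

    import numpy as np

    def prewhiten_eigenvoices(T_blocks, Sigma_blocks):
        # Map each per-Gaussian block T_c to Sigma_c^{-1/2} T_c, so that afterwards
        # every formula can be written as if the UBM covariances were the identity.
        whitened = []
        for Tc, Sc in zip(T_blocks, Sigma_blocks):
            Lc = np.linalg.cholesky(Sc)              # Sc = Lc Lc'
            whitened.append(np.linalg.solve(Lc, Tc)) # Lc^{-1} Tc
        return whitened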

Now I will show some results on the female part of the extended telephone condition. Our setup uses 60-dimensional features and a UBM with 2048 components, and the classifier is PLDA on length-normalized i-vectors; for time reasons I will skip the remaining details of the setup and go straight to the results.

Before showing the results, let me just point out one thing: both approaches are iterative, and one option is simply to iterate until we reach the exact i-vector. If we do that, we recover exactly the same accuracy as the original classifier, so what is interesting is to see whether we can stop earlier and still achieve good results, because the earlier we stop, the faster the extraction, of course.

Here I am showing the results of the baseline system, of the diagonalized approximation of the i-vectors, of the variational Bayes approach with different block sizes, and of the conjugate gradient approach; the stopping criteria for the iterative methods were chosen as follows.

For variational Bayes, convergence was evaluated using the norm of the difference between two successive i-vector estimates; in these experiments this amounts to between two and three iterations per i-vector. For conjugate gradient the stopping criterion is based on the two-norm of the residual, which here corresponds to between three and four iterations.

Essentially what we see is that these approaches recover almost all of the baseline performance, which is exactly what we wanted to find out.

Here I am also reporting, for each system, the extraction time and the required memory. The fastest system is the one which precomputes and stores the per-Gaussian matrices; its extraction time is comparable to that of the variational Bayes approach, which is slightly slower essentially because it cannot reuse those precomputed eigenvoice products.

However, note that the memory required by that system, as we have seen, is quite high. With variational Bayes, on the other hand, we can obtain accurate results with an extraction time which is only a few percent higher, and which remains small compared to the time required to collect the zero- and first-order statistics in the first place.

In addition, we also analyzed the effect of the block size. We can see that using smaller blocks of course reduces the memory requirements, in this case quite significantly, and the extraction time becomes essentially comparable to that of the conjugate gradient approach, while using larger block sizes allows us to improve the extraction time at the cost of some additional memory.

To conclude, we have presented new, efficient and accurate i-vector extraction techniques, which are based on a variational Bayes approximation and on the conjugate gradient method.

Compared to the baseline, these techniques have much smaller memory requirements, yet they still produce very accurate i-vectors with a reasonable extraction time, and the choice of the block size allows us to trade memory for extraction time in the direction we prefer.

Let's thank the speaker. We have a few minutes for questions.

[Audience question and answer, largely inaudible.]

[Answer largely inaudible; the speaker comments on the results.]

Yes, as well. As for the classifier, I would say that the classifier itself is very fast.

[Further exchange largely inaudible.]

If there are no more questions, let me ask one: have you seen any difference depending on whether you first rotate the space of eigenvoices, so that it is already diagonalized, or whether you start from the original one?

[Answer largely inaudible.]

But have you in fact compared with what we did? Basically, we try to diagonalize the eigenvoice matrix first and then exploit that diagonal structure. [Answer largely inaudible.]

Okay, let's thank the speaker again.