So, today I am going to talk about techniques for efficient i-vector extraction. We went looking for some way to address the memory footprint of the i-vector extractor, which dominates the cost of extracting i-vectors.
State-of-the-art speaker recognition technology nowadays is based on i-vectors, which give very good accuracy, but the computation of an i-vector can be quite demanding in terms of both memory and time.
Some solutions have already been proposed for i-vector extraction with low memory requirements, namely the diagonalized approximation proposed by Glembek and colleagues, but these have been shown to introduce some degradation in accuracy. So we were looking for a solution which does not introduce such degradation but still greatly reduces the amount of memory required by the extractor.
This is the outline of the talk: first I will recall the standard approach to i-vector extraction, then I will present our variational Bayes approach, which we introduced in a previous work, then our conjugate gradient approach for i-vector extraction, and finally some experimental results for these techniques.
I guess everybody here knows what i-vectors are, but as a brief introduction: they are low-dimensional, informative representations of each utterance, obtained from a latent variable model.
In the most widely used formulation, we assume that most of the speaker and channel variability lies in a small subspace of the supervector space. We then assume a standard Gaussian prior for the latent variable representing this variability and, approximating the data likelihood by means of Baum-Welch statistics, we can compute the posterior of this latent variable. The i-vector is the maximum a posteriori estimate of the latent variable. We can show that the posterior is Gaussian, so the i-vector corresponds to the posterior mean.
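Written out, the model and its posterior look like this (my notation, not taken from the slides: T_c is the block of the eigenvoice matrix associated with Gaussian c, Sigma_c its covariance, and N_c, f_c the zero-order and centered first-order statistics of utterance X):

```latex
s = m + T w, \qquad w \sim \mathcal{N}(0, I)
L_X = I + \sum_{c=1}^{C} N_c^{(X)}\, T_c^{\top} \Sigma_c^{-1} T_c
w_X = L_X^{-1} \sum_{c=1}^{C} T_c^{\top} \Sigma_c^{-1} f_c^{(X)}
```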
As you can see, the main computational cost is the evaluation of the posterior precision matrix, which requires, for each Gaussian, the multiplication of the transposed eigenvoice matrix times the inverse covariance times the eigenvoice matrix, plus the inversion of a matrix whose size is the i-vector dimensionality. Here C denotes the number of Gaussians, F the feature dimensionality, and M the i-vector dimensionality.
If we do not precompute anything, we have a complexity which is quadratic in the i-vector dimensionality and linear in the number of Gaussians and in the feature dimensionality. We can reduce this complexity by precomputing, for each Gaussian, the term T_c^T Sigma_c^-1 T_c, but then we have a memory requirement which is again quadratic in the i-vector dimensionality and proportional to the number of Gaussians. With a typical UBM of 2048 Gaussians, as used in these experiments, this is easily the most memory-expensive part of an i-vector extractor.
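As a minimal sketch of this baseline, and not the authors' actual code: assuming the statistics and eigenvoices have been whitened so that each per-Gaussian covariance is the identity, the standard extraction with precomputed per-Gaussian terms could look like this:

```python
# Minimal sketch of standard i-vector extraction (whitened statistics assumed,
# so every per-Gaussian covariance is the identity). Shapes: C Gaussians,
# F features, M i-vector dimensions.
import numpy as np

def precompute_terms(T):
    """T: (C, F, M) eigenvoice blocks. Returns the (C, M, M) terms T_c^T T_c,
    the memory-hungry part: O(C * M^2) floats."""
    return np.einsum('cfm,cfn->cmn', T, T)

def extract_ivector(N, f, T, TT):
    """N: (C,) zero-order stats, f: (C, F) centered first-order stats."""
    M = T.shape[2]
    L = np.eye(M) + np.einsum('c,cmn->mn', N, TT)  # posterior precision
    b = np.einsum('cfm,cf->m', T, f)               # projected statistics T^T f
    return np.linalg.solve(L, b)                   # posterior mean = i-vector
```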
Last year an approximation based on the simultaneous diagonalization of these matrices was proposed. I forgot to mention that we can obtain the same formulation for a full covariance UBM simply by performing a whitening normalization of the Baum-Welch statistics and of the eigenvoice matrix. The idea of that approach is to assume that the per-Gaussian terms can be approximately diagonalized simultaneously by a single matrix Q, so that an approximation of the posterior covariance can be computed cheaply, and i-vector extraction can be performed in a much faster way with very limited additional memory requirements.
However, this approximation can cause a degradation of recognition accuracy, so we wanted to do better in that respect.
As we said, the problem is the computation of the posterior covariance matrix, and in particular that this matrix is not diagonal. If it were diagonal, the i-vector components would be uncorrelated and the posterior would factorize. So, even though the true posterior does not factorize over the different components, we look for an approximation of the posterior which factorizes over subsets of the i-vector components.
We partition the i-vector components into disjoint subsets, and we assume that the posterior can be approximated by a distribution which factorizes over these subsets. The variational Bayes framework provides a way to estimate this approximate posterior by minimizing the KL divergence between the original posterior and its approximation.
Here I need to introduce some notation: we denote the sub-matrix of the eigenvoices associated with each block of i-vector components, where each block i is associated with a subset of the i-vector components, and the complement notation refers to all the remaining components, so that we can express the supervector decomposition in this way.
If we derive the variational Bayes update for each factor of the approximate posterior, its distribution is again Gaussian, with an expression which is very similar to the original i-vector formulation. The difference is that the precision matrix here is computed using only the eigenvoices relative to the subset, and for the mean of the posterior we are essentially re-centering the statistics over a slightly different UBM: we assume that the other components of the i-vector are fixed, and we center the first-order statistics of this new UBM accordingly.
This also allows us to see the complexity of the approach, but we have to be careful in implementing the technique: if we recomputed the re-centered statistics from scratch at every update, with blocks of size one, the complexity would again be quadratic in the i-vector dimensionality, because of the re-centering. What we do instead is to keep a supervector of first-order statistics which is always kept centered around the current i-vector estimate. When we update a block, the new mean is computed after removing the centering contribution of the components we are estimating, and after we have updated the mean we update the vector of first-order statistics so that it is again centered around the current i-vector estimate. In this way, if we exclude the contribution of the computation of the block precision matrices, the complexity of this approach is proportional to the i-vector dimensionality and to the number of iterations we need to perform to compute the i-vector.
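Again as a sketch rather than the actual implementation, and under the same whitened-statistics assumption as before, the block updates with incrementally re-centered statistics could look like this:

```python
# Minimal sketch of the variational Bayes block updates. The first-order
# statistics s are kept centered around the current i-vector estimate, so the
# re-centering cost per block stays linear rather than quadratic in M.
import numpy as np

def vb_ivector(N, f, T, block_size=20, n_iter=3):
    C, F, M = T.shape
    w = np.zeros(M)
    s = f.copy()                                       # centered stats for w = 0
    for _ in range(n_iter):
        for start in range(0, M, block_size):
            blk = slice(start, min(start + block_size, M))
            Tb = T[:, :, blk]                          # (C, F, B) eigenvoice block
            # put back this block's own contribution to the centered stats
            s += N[:, None] * np.einsum('cfb,b->cf', Tb, w[blk])
            # block posterior precision (recomputed here; it could be pre-stored)
            Lb = np.eye(Tb.shape[2]) + np.einsum('c,cfb,cfd->bd', N, Tb, Tb)
            w[blk] = np.linalg.solve(Lb, np.einsum('cfb,cf->b', Tb, s))
            # re-center the stats around the updated estimate
            s -= N[:, None] * np.einsum('cfb,b->cf', Tb, w[blk])
    return w
```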
You can see the similarity of this form with the original i-vector formulation: these block precision matrices are essentially the diagonal blocks of the full precision matrix. And we have two different options to compute them: we can perform the computation every time, recomputing these matrices for every utterance, or we can pre-store the diagonal blocks of the precomputed terms. In the latter case we get a faster extraction time but slightly higher memory consumption, and the memory requirements depend on the size we choose for the blocks.
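Just to give a rough idea of the scale, with illustrative numbers of my own rather than figures from the talk: with C = 2048 Gaussians and M = 400, pre-storing the full terms costs C x M^2, about 3.3 x 10^8 floats, on the order of 1.3 GB in single precision, while pre-storing only diagonal blocks of size B = 20 costs C x M x B, about 1.6 x 10^7 floats, roughly 65 MB, a reduction by a factor M / B = 20.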
Essentially, we can show that the variational Bayes approach implements a Gauss-Seidel method for the solution of the underlying linear system. We also investigated different techniques for solving linear systems, namely the Jacobi method and the conjugate gradient method. What we found is that the Jacobi method is very similar to this approach, but instead of updating the i-vector after each block update, the i-vector is updated only after all components have been estimated, and in our experience this causes slightly slower convergence rates.
Then we analyzed the conjugate gradient method. What is nice about conjugate gradient is that we do not need to store the precision matrix; in fact, we do not even need to compute it explicitly, because we only need the product of this matrix with a generic vector, which is what the conjugate gradient algorithm requires. If we write the computation of this product in the appropriate order, we can see that it is linear in the number of components of the UBM, in the number of features and in the dimensionality of the i-vector, so we have a complexity which is the same as the variational Bayes approach.
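A minimal sketch of this idea, again under the whitened-statistics assumption and with a hand-rolled conjugate gradient so that the implicit matrix-vector product and the residual-based stopping rule are explicit:

```python
# Conjugate gradient i-vector extraction. The posterior precision L is never
# formed: CG only needs products L @ v, evaluated as v + sum_c N_c T_c^T (T_c v),
# which is linear in C, F and M and needs no extra memory.
import numpy as np

def cg_ivector(N, f, T, tol=1e-4, max_iter=20):
    C, F, M = T.shape

    def matvec(v):
        Tv = np.einsum('cfm,m->cf', T, v)              # T_c v for every Gaussian
        return v + np.einsum('c,cfm,cf->m', N, T, Tv)  # v + sum_c N_c T_c^T T_c v

    b = np.einsum('cfm,cf->m', T, f)                   # right-hand side T^T f
    w = np.zeros(M)
    r = b - matvec(w)                                  # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol * np.linalg.norm(b):      # 2-norm residual stopping rule
            break
        Lp = matvec(p)
        alpha = rs / (p @ Lp)
        w += alpha * p                                 # update i-vector estimate
        r -= alpha * Lp                                # update residual
        rs_new = r @ r
        p = r + (rs_new / rs) * p                      # new conjugate direction
        rs = rs_new
    return w
```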
What is also nice about this technique is that it does not require any additional memory, and, as for the variational Bayes approach, we can use it with a full covariance UBM if we perform the pre-whitening of the statistics and of the eigenvoice matrix once.
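For completeness, the pre-whitening I keep assuming in the sketches above amounts to (my notation):

```latex
\tilde{T}_c = \Sigma_c^{-1/2} T_c, \qquad
\tilde{f}_c^{(X)} = \Sigma_c^{-1/2} f_c^{(X)}
\quad\Rightarrow\quad
L_X = I + \sum_{c} N_c^{(X)} \tilde{T}_c^{\top} \tilde{T}_c
```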
Now I will show you some results on the NIST SRE 2010 female dataset, extended telephone condition. Our setup uses sixty-dimensional features and a UBM with 2048 components, and we use a PLDA classifier on length-normalized i-vectors.
Before showing the results, let me point out one thing: all these iterative techniques converge to the exact i-vector, so if we run them to convergence we recover exactly the same accuracy as the baseline classifier. What is interesting is to see whether we can stop earlier and still achieve good results, with a proper stopping criterion of course.
Here I am showing the results of the baseline system, of the eigen-decomposition approximated i-vectors, and of the variational Bayes approach with block sizes ten, twenty and so on, together with the conjugate gradient results. The stopping criterion for variational Bayes was based on the norm of the difference between two successive i-vector estimates; with the threshold we chose, this amounts to between two and three iterations per estimate, while conjugate gradient, whose stopping criterion is based on the two-norm of the residual, performs between three and four.
Essentially, what we see is that most of the systems match the baseline performance, while the eigen-decomposition approximation shows some degradation, which was the reason why we went looking for an alternative in the first place. The table also reports the time required by the different systems. The fastest configuration is the one which pre-stores the diagonal blocks of the precision terms; the conjugate gradient approach is comparable to the variational Bayes approach without pre-stored blocks, and these are the slowest, since they always have to go through the full eigenvoice matrix.
Note, however, that the degradation of the eigen-decomposition approximation with respect to the baseline is quite noticeable, while with variational Bayes we can obtain accurate results with just a few percent of overhead compared to the time required to compute the zero and first order statistics themselves.
Here I also report the memory requirements as a function of the block size. We can see that with small block sizes the memory requirement drops significantly, and becomes essentially comparable to that of the conjugate gradient approach, while using larger block sizes allows us to improve the extraction speed.
So, to conclude: we have presented some new, efficient and accurate i-vector extraction techniques, based on variational Bayes estimation and on the conjugate gradient method. With small block sizes, the variational Bayes approach greatly reduces the memory requirements while still producing accurate i-vectors, at the price of some increase in the time required to extract an i-vector, while on the other hand larger block sizes allow faster extraction at a higher memory cost.
[Session chair:] Let's thank the speaker. We have a few minutes for questions.
[Audience question and speaker reply, largely inaudible; the reply refers to the baseline classifier being very fast.]
[Audience member:] Let me ask a question: what is the difference between your approach and what we tried, namely rotating the space of the eigenvoices so that the precision matrix would already be diagonal, starting from the same model? [Exchange partly inaudible.] In effect, compared with what we did, we basically tried to diagonalize the subspace matrix first, while you keep a block-diagonal structure.
results
oh
yeah
well as
oh
just
oh
make
that's in fact the speaker again and
i