Hello everybody, I'm happy to be here. This is my first time in the field of speaker recognition. I want to thank the organizers for providing such a challenge and the chance to participate, because I think it is an important contribution: we can improve speaker recognition systems by holding these kinds of challenges.
okay
What I'm going to propose is an idea taken from beamforming, which is a well-known technique in signal processing. Here is what I am going to present: first I will explain what beamforming is and how we apply it to this challenge. Then I will explain how we can solve the problem with adaptive filtering and find an optimal beamformer, first without any constraints, and then with constraints that include the uncertainty about the target, to make it more robust. Our work also includes a modification of the impostor covariance matrix and a score normalization in order to improve the performance.
so
So, what do we know and what do we assume? We start from i-vectors. The i-vector is interesting because it provides a fixed-dimensional representation of speech of any arbitrary length. The problem with i-vectors is that they vary with different environments, channels, and speakers, and this is the challenge in this field. With intersession compensation we try to remove this unwanted variability, but in this challenge, using probabilistic linear discriminant analysis (PLDA) is not a good idea, since we don't have any labels for the data. And if we produce labels by clustering, the quality of that clustering and labeling will affect the performance of PLDA.
okay
One important thing is what we actually have: a lot of unlabeled speech data. For example, in a telephone speech center there is a lot of speech data passing through, so we can take advantage of these data in order to improve speaker recognition, instead of producing artificial labels for them. PLDA and similar approaches need labeled data, so they are not a good choice here, and we take a new approach to solve the problem. If we cannot estimate the within-speaker scatter matrix reliably, why don't we instead estimate the between-speaker variance and increase it?
okay
The first thing I am going to explain is beamforming. It is a signal processing technique from sensor arrays, used to direct signal transmission or reception toward a desired target, and adaptive filtering is used for optimal filtering and interference rejection in order to estimate the signal of interest. The beamforming operation is this: a signal impinges on a set of antennas and is passed through a filter, and the output of that filter passes the desired angles and rejects all the other directions. This is the same as the dot product of a filter and the signal.
To illustrate the idea: with an omnidirectional antenna, the target signal and the interference are treated equally, but the beamformer focuses on the target. So we are going to design a filter like this, w^T x, where x is the i-vector and w is the filter. We want the target speaker to pass through this filter, but we reject all the other, impostor, speakers. All the impostors come from the development set.
If we use the mean squared error criterion to solve this problem, we reach the result you can see here: w is the optimal filter for this solution, R is the autocorrelation matrix, and t is the target, which can be estimated by the mean of the target i-vectors.
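As a rough sketch of this step (my own illustration, not the authors' code), the MSE-optimal filter has the Wiener-style closed form w = R^{-1} t, with R the autocorrelation matrix of the development i-vectors and t the mean of the target's enrollment i-vectors:

```python
import numpy as np

def mse_filter(dev_ivectors, target_ivectors):
    """Wiener-style MSE solution: w = R^{-1} t, with R the autocorrelation
    matrix of the development (impostor) i-vectors and t the mean of the
    target's enrollment i-vectors."""
    R = dev_ivectors.T @ dev_ivectors / len(dev_ivectors)
    t = target_ivectors.mean(axis=0)
    return np.linalg.solve(R, t)

# Toy example with random low-dimensional "i-vectors" (dimensions are made up).
rng = np.random.default_rng(0)
dev = rng.normal(size=(500, 8))          # development set (impostors)
enroll = rng.normal(size=(5, 8)) + 2.0   # one target's enrollment i-vectors
w = mse_filter(dev, enroll)
score = w @ (rng.normal(size=8) + 2.0)   # trial score is simply w^T x
```

The dimensions and data here are placeholders; the point is only the shape of the solution.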
Now let's compare it with the baseline system. The baseline score is computed after whitening the i-vectors and then using cosine similarity. Notice that when we use cosine similarity, we normalize the magnitudes of the i-vectors, but in the adaptive filtering I just explained there is no normalization of the i-vectors.
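For reference, the baseline the speaker describes could be sketched like this (a minimal sketch under my own assumptions about the whitening): estimate a whitening transform on the development set, then score with cosine similarity, which length-normalizes both vectors:

```python
import numpy as np

def whiten(dev_ivectors):
    """Return a whitening transform estimated on the development set."""
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors, rowvar=False)
    # Inverse square root of the covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return lambda x: (x - mu) @ W

def cosine_score(model, test):
    """Cosine similarity: both vectors get length-normalized."""
    return (model @ test) / (np.linalg.norm(model) * np.linalg.norm(test))

rng = np.random.default_rng(1)
dev = rng.normal(size=(500, 8))
transform = whiten(dev)
s = cosine_score(transform(rng.normal(size=8)), transform(rng.normal(size=8)))
```

The contrast with the proposed filter is exactly the `cosine_score` normalization: the adaptive filter scores w^T x without normalizing x.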
Now let's go a little further and change the criterion. In beamforming there is the minimum variance distortionless response (MVDR), whose idea is to maximize the signal-to-interference ratio. We want to maximize this ratio, that is, to maximize the output of the filter when the target passes through it, while rejecting all the impostors, which means minimizing the denominator. To solve the problem, we constrain the numerator to equal one (this is the distortionless constraint), and then minimize the denominator, which is the power of the impostors after passing through the filter. Here R is the impostor covariance matrix, and the optimal solution for this problem can easily be found this way.
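A minimal numpy sketch of this step (my own illustration): minimizing w^T R w subject to the distortionless constraint w^T t = 1 has the standard MVDR closed form w = R^{-1} t / (t^T R^{-1} t):

```python
import numpy as np

def mvdr_filter(impostor_cov, target):
    """MVDR: minimize w^T R w subject to w^T t = 1.
    Closed form: w = R^{-1} t / (t^T R^{-1} t)."""
    Rinv_t = np.linalg.solve(impostor_cov, target)
    return Rinv_t / (target @ Rinv_t)

rng = np.random.default_rng(2)
impostors = rng.normal(size=(500, 8))     # development-set i-vectors
R = np.cov(impostors, rowvar=False)       # impostor covariance matrix
t = rng.normal(size=8)                    # estimated target i-vector
w = mvdr_filter(R, t)
# The distortionless constraint holds: the target passes with gain 1.
```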
So let's compare it with cosine similarity. The baseline system scores like that, and MVDR scores this way. If you look at this, you see that MVDR proposes a new similarity measure that does not include the normalization of the test i-vector but focuses more on the target. The result shows that it provides an improvement of 7.7 percent in the i-vector challenge.
Now let's go one step further and make it more robust. As we saw on the previous slide, we used the mean of all the target i-vectors to estimate the target, since MVDR supposes there is no uncertainty regarding the target. The linearly constrained minimum variance (LCMV) approach instead includes the uncertainty through linear constraints: we align all the i-vectors provided for the target in a matrix C, and we enforce that each of them passes the filter with a value of one, so f is a vector of ones. If we solve this problem, the optimal filter is as you can see here. When we applied it to the challenge, there was a further improvement of 3.7 percent relative to MVDR, and 11.1 percent relative to the baseline system.
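The LCMV generalization can be sketched the same way (again my own illustration): with the target's enrollment i-vectors stacked as columns of C and f a vector of ones, minimizing w^T R w subject to C^T w = f gives the standard closed form w = R^{-1} C (C^T R^{-1} C)^{-1} f:

```python
import numpy as np

def lcmv_filter(impostor_cov, C):
    """LCMV: minimize w^T R w subject to C^T w = f, with f = ones.
    Closed form: w = R^{-1} C (C^T R^{-1} C)^{-1} f."""
    f = np.ones(C.shape[1])
    Rinv_C = np.linalg.solve(impostor_cov, C)
    return Rinv_C @ np.linalg.solve(C.T @ Rinv_C, f)

rng = np.random.default_rng(3)
impostors = rng.normal(size=(500, 8))
R = np.cov(impostors, rowvar=False)
C = rng.normal(size=(8, 3))   # three enrollment i-vectors as columns
w = lcmv_filter(R, C)
# Every enrollment i-vector passes the filter with value 1.
```

With a single enrollment vector (C a single column) this reduces exactly to the MVDR solution above.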
Now we can do an additional job to improve the performance. In signal processing there are many more techniques, such as the robust Capon beamformer, which improves the performance by diagonally loading the covariance matrix. I used a similar approach, but with the top impostor i-vectors, the ones most similar to the target i-vectors. In this way, we passed the impostors through the filter for each target, selected the top (about six thousand) impostors by similarity, and computed the covariance matrix again. This results in a very good improvement of 21.5 percent relative to the baseline system. Looking at the impostor scores against the target, we can see that after applying this covariance matrix modification there is a good reduction in the scores.
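The covariance modification might look roughly like this (a sketch based on my reading of the description; the six-thousand cutoff is from the talk, everything else is my assumption): for each target, score the development impostors against the current filter, keep the highest-scoring ones, and re-estimate R from them:

```python
import numpy as np

def top_impostor_cov(w, impostors, n_top):
    """Re-estimate the impostor covariance from the impostors that score
    highest against the target's current filter w."""
    scores = impostors @ w                      # pass impostors through the filter
    top = impostors[np.argsort(scores)[-n_top:]]  # keep the most target-like ones
    return np.cov(top, rowvar=False)

rng = np.random.default_rng(4)
impostors = rng.normal(size=(500, 8))
w = rng.normal(size=8)                 # filter from a previous step
R_top = top_impostor_cov(w, impostors, n_top=100)
# The filter would then be re-solved with R_top in place of R.
```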
Another factor to improve the speaker recognition performance was score normalization. I found this relation worked best. Contrary to z-norm or t-norm, which use the variance of the scores, we could not use the variance here, and this results in a further improvement.
Now let's go a bit more in a supervised direction: we use a within-class covariance matrix found by a clustering method, but this clustering method is somewhat different. We treat each single i-vector in the development set as a target and find the closest, most similar i-vector to it; this is repeated, adding one more i-vector each time, in order to find more similar i-vectors. After finding those i-vectors, we use this formula to compute the within-class covariance of the i-vectors that are assumed to be from the same speaker. The final model can be found by adding this W, since what we apply then does two things: it compensates the intersession variability as well as rejecting the impostors. We add them together to find this optimal filter. You can see the results: it leads to an improvement of 25 to 27.5 percent relative to the baseline system.
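The unsupervised pairing step could be sketched like this (my own reading of the description, simplified to a single nearest neighbor per i-vector): treat each development i-vector as a target, find its most similar i-vector by cosine similarity, and accumulate a within-class covariance from the resulting pairs:

```python
import numpy as np

def within_class_cov(ivectors):
    """Pair each i-vector with its cosine nearest neighbor (assumed to be
    the same speaker) and compute the within-class covariance of the pairs."""
    X = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    sim = X @ X.T                      # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)     # exclude self-matches
    nn = sim.argmax(axis=1)            # nearest neighbor of each i-vector
    diffs = ivectors - ivectors[nn]    # within-pair deviations
    return diffs.T @ diffs / (2 * len(ivectors))

rng = np.random.default_rng(5)
dev = rng.normal(size=(200, 8))
W = within_class_cov(dev)
```

In the talk this W is then combined with the impostor covariance so that the final filter handles intersession variability and impostor rejection at the same time.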
In conclusion, we have proposed a new idea from signal processing, adaptive filtering, in order to solve the i-vector challenge, and we have shown that a modification of the impostor covariance matrix is possible this way. We think we could also improve speaker recognition by applying this idea to PLDA, but we did not have enough time to do that. Thank you for listening.
Question: If I remember correctly, around 2011 we did language identification with something like cosine scoring, where the target model was length-normalized but the test was not. The backend was able to calibrate the scores, but you get a shift in the scores that depends on the test conditions, so the calibration suffers. The calibration worked, but we found the results were much worse when we did not normalize the test. So that was the case for language identification; for speakers it may be okay.
Okay, thank you.
Good.