and a causes b well i think we're interest and therefore above is the speaker
recognition for telephone number of is one data
usually my these submission form is design a
this is a during war from distances on the human language than only standard estimators
and standard orders
it's and language processing from the two in my feeling
these assigning the income tax
c d is like we telephone speech intonational i mean
and the audio visual like composed of them on the internet deviates from the bus
core
there is one that have a speaker recognition on face recognition only working model of
this
all also used and what database or some formula one and very well why don't
you lda or cosine scoring okay
in the other side the key points for women for still you what do you
really
not is gonna the nn are businessmen
still be lots kind of a place the nn vectors
also
but cannot they still using in the melee but mostly from any estimator fine tuning
to in domain data also
one will be assigned the key points where usage of rain is that are based
bodies
Okay, we use cosine scoring, with several ways to combine the
embeddings.
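A minimal sketch of cosine scoring between an enrollment side and a test embedding. It assumes the simplest way of combining several embeddings, a plain average; the function names and toy vectors are illustrative and not taken from the talk.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def average_embedding(embeddings):
    """Combine several embeddings into one by element-wise averaging (assumed strategy)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

# Toy data: two enrollment embeddings and one test embedding (hypothetical values).
enroll = [[1.0, 0.0], [0.8, 0.2]]
test = [0.9, 0.1]
score = cosine_score(average_embedding(enroll), test)
```

The score is bounded between -1 and 1, so no length normalization of the embeddings is needed beforehand.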
again we use this on the i
what i will face acoustic features that similar well for be overcome based detection
problem that we just a speaker
or face images
and do not isn't based on but when there walking and we kind of in
this course
We start by describing the audio systems.
so we're starving different acoustic features we use and this is used for units vectors
and build a lattice for rest of this vectors
it's be we use community vad or sixty s and you don't and v for
really
in video we constantly system
so what we'll from there is a sin was clustering or a be lda gmm
that a single speaker factors in speaker labels posteriors
we used to estimate of labels
based on similar you know and as the best one double will make me but
is not very sure would be is generally
also on responsiveness might consider are less money is
this and that was one based on god i
we got some improvement will is but i during
i seriously we're finding the for the n and then what we in domain data
just finding the leslie using four letter words in this way embodies becomes a sinus
or
and we call this
besides discriminant percent
so we have seven that is that or architectures
we have
i was gonna be and then but since
three five basis
than better since what we're gonna since the is the same that we use of
and sre
the contains translators from new domain
with a linear size of
one thousand four
alright is an utterance
we unknown
and therefore based on find a we have regulators five miles away two thousand forty
eight
are very agreements
we also several possible ways that five questions
they're having less than wireless the inverse there wasn't one the and that's always been
feeding
this is this one of the datasets used for training or not the inspectors
so it's in serious condition
zero use switchboard was designed for okay
r c of this work
it's
there isn't or is something the in work we use all the data set someone
one their completion
a is evident in a we use the same but with the model
we remove the so systems i one microphone
lincoln labs the still use businesses
microphone
confrontation or this though
we used as i e one
and i'm gonna is this study
and you state
or they are all from being the one d c and we just use the
most of the thing in this
we have a for like principal equations
c l is the only one last use the first configuration that's the line of
the you're
Let's say that we have some out-of-domain data and some in-domain data.
first we and that the out-of-domain in domain using their or a little
and they're all in an out-of-domain data in
then we use a different thing that in for in domain
out-of-domain data.
we use common whitening
then the my face
the other two in domain data
Then the score normalization was done with in-domain data, and then we calibrate.
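The score normalization step with an in-domain cohort could look roughly like the sketch below. It assumes adaptive symmetric normalization (adaptive S-norm), which is a common choice but is not confirmed here as the exact variant used; `top_n` and all names are hypothetical.

```python
import statistics

def adaptive_s_norm(raw, enroll_cohort_scores, test_cohort_scores, top_n=2):
    """Normalize a raw trial score with statistics of the top-N cohort scores.

    enroll_cohort_scores: scores of the enrollment side against an in-domain cohort.
    test_cohort_scores:   scores of the test side against the same cohort.
    """
    def top_stats(scores):
        top = sorted(scores, reverse=True)[:top_n]
        # Population standard deviation; assumes the top scores are not all equal.
        return statistics.mean(top), statistics.pstdev(top)

    mu_e, sd_e = top_stats(enroll_cohort_scores)
    mu_t, sd_t = top_stats(test_cohort_scores)
    # Symmetric: average of the enrollment-side and test-side normalizations.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)

# Toy example with hypothetical scores.
normalized = adaptive_s_norm(3.0, [1.0, 3.0, -5.0], [0.0, 2.0, -4.0], top_n=2)
```

Using only the closest cohort entries is what makes the normalization adaptive; with `top_n` equal to the cohort size it reduces to plain S-norm.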
but for steely and have a three by conventions
something that and use for that all lda
and the use yes everyday the lda for a swear and very nice thing what's
almost instantly
are also in the scoring or
we also the lda for cases where
and then it is then we only the model in salt
or
so this is a this what are the values something the markets
that's a small difference between sites
but as forces us ordinance yuri a on
the use this study for then i x values to some well on this study
in u one
for the dc one
as you use the is something at you by
we also and since the only problem we also use the unlabeled
that it really by doing clustering
or other score normalization we use the only really
i'm use the sre seen that for
or maybe a we just think that can almost the latter
this is a very good speakers in the white honestly demos data
score by bayesian also us
the i have to be also provided us an significant improvement
a value will use this i think bias you one for calibration
that's you know this used the silence
first we analyze the us also that five million and
romana something the we use
where a source false or misleading there
on the on the lower a sliding i b d one all the
the base then system used unsupervised really in a bayesian with this study only
then in the signal were we is that in the u one okay
provides a very nice
then we i we are noise segmentation lately
that improves the convince your in the u
then we have that the a spectrum and also
and the in domain be i get some room and you the by a small
improvement
all in one
i think that if we change that sure or then run your that's where we
made the grade on our way we
getting some
implementing that you well limbaugh an improvement in the
also analysis on this you by also versa before rest
the bayesian network use a risk of a system for based silence mean versus evaluation
will also must present a unique
then we alignments unless something dusty the data
provides a nice improvement in the u and it again
then we a the we got a number of channels in the network and that
provides a small role
not remote really okay and we define the never will always unusable sinus fourteen
so on without use of us more ergonomically baseline but in there about their grace
and they always fits to the or something or thirteen data
and that's was in those identity
these are also all four to all the single system
the based system is your five better results before was one of the database sinus
ability have okay
so we're very close to be easily affected formal system for which channels
a personal one of the
and
for this part of the nn with the
will be the training set
in all cases you was greater than this method was i
or we apply several
medals for the fusion we have there
but it's a you don't use of in it was used in calibration and yes
is for a basis for
an efficient v
once you so in the real assisting calibration a one when you mean and another
is that it is not the union that i mean and
the scores
a quality with a where we can see that is consistent when interviews with a
very high or station
are you sure we got everything we on over and over
so the based system for us your proposal by in address the source for calibration
i
i think five series systems with but like plus three system is not possible
or
usually might need them
we have the fusion of existence
and the basic progress is a thing with fusion be but obviously once she
the best results that they want you can see that are the system also
the present problems phones your feature
no it's either a your problem of your results
was also an analysis of our last for the nn are where lunges it was
also for delay of advanced
or the u s
the first figure analyze this problem i phase you're
so and we can see that score normalization provides more meetings in a savvy the
in domain sre an eighteen
also we can see that i mean by handle this problem i faced is that
why
provide some a similar guy
great
the second year so the was also a v i
right and that we will one between their usage
so the decision rule
the relative improvement in bic studies
log in this i mean idea of illness i in
so systems and it is easier to the utterance
besides the results of the signal system that we used in all submissions
we can see that there is anything about christmas is to have that is that
e d u
these is too small
so you systems for the reestimation by a significant
all by n c l is be part of the nn a waitress
there is no right in assigning from using y for a given in a network
for this
we use a real efficient is the input shows the system for fusion
we just reading writing i
includes your we still is involved in an a small step
so you're right value is yes one system
you'd reminding contrast to estimate ubm
the misuse
have a very similar a million this year use women right i have the base
a once you
Now let's see the face recognition systems.
this is there may be a front end
the bible any something will be different for enrollment and test
but elsewhere well
phase of that still
then enrollment
we use the reference mumbles and you the test phase
but overlap with the telephone calls
in this will yes all the faces with it
then we used the final
modeling more on the original on a small line ungrounded phase and then we use
that are facing varies
we use briefly visited those and invariance
you just be used every now and a snack implementations or within a face on
our face unless you use the one d by the implementation
we examine the task as a c n
the video but since what are based on percent is for
The system doesn't use score normalization. For enrollment, we average the enrollment embeddings,
and the test set the new animated clustering with a twenty one clusters
unless you listen we have several and robustness the
but based methods also indicated in table we have
you mean and variance
averaged and variance the median of a multi clustering so turns you form an alliance
you
maybe also balanced young ones used for in somewhere in the media we go
similar to his twitter that's they will i know fine inventing which is then weighted
average
all the meetings rooms
With the attention, we obtain a single embedding with a weighted average
of all the test embeddings.
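The weighted average of test embeddings can be sketched as follows. How the weights are computed is not recoverable from the talk, so they are passed in as given numbers here; the names and values are illustrative assumptions.

```python
def weighted_average(embeddings, weights):
    """Collapse several embeddings into one, weighting each embedding by weights[i]."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * emb[i] for emb, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]

# Toy example: two face embeddings from one test video, hypothetical weights.
test_embeddings = [[1.0, 0.0], [0.0, 1.0]]
weights = [3.0, 1.0]
single_embedding = weighted_average(test_embeddings, weights)
```

With equal weights this reduces to the plain average used on the enrollment side.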
but also
and enrollment set
no see the this problem model
we have analysis the csp markets for this experiment we used in save face first
one hundred and very
the best figure is without is not understand your is it is not
is not improve the low in the guns you one and it's a need in
the
well rules less in this study night in
you one
and the baseline and in the about is the
made in enrollment bonuses are limited clustering in the that is
well as in the other datasets
the baseline peons overall only once the contents of attention
there are more steam or impostors are statistics
we compare the different and variance improve work of the us you by the question
and now there was as follows we have
the questions all the inside phase
printing models
we use the whole or can we use a already some enrollment and omit the
last three test
area so we can see that the white gaussian is better than a form the
exact reason but is there a lot of in the network a very significant we
can see that doesn't work on my personal
this of the submission process
then used primarily
is a really use general
the only last three assumes be systems on the taste of is a well this
year
this using a system is close to the right
using a system is worse a posteriori because we're and based on we were or
generally but one best so that no one
analysis
against a
based on the equal error rate
well no that's impossible
this was also than one model
in addition
So for the fusion we assume independence between the audio and the video, so
we sum the scores.
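A rough sketch of audio-visual fusion under an independence assumption. It assumes that each modality produces a calibrated log-likelihood ratio, which is not stated at this point in the talk; under that assumption the fused score is simply the sum. Function names and the threshold are hypothetical.

```python
def fuse_scores(audio_llr, video_llr):
    """Sum two log-likelihood ratios; valid only if the modalities are independent."""
    return audio_llr + video_llr

def accept(llr, threshold=0.0):
    """Accept the trial as a target trial if the fused score reaches the threshold."""
    return llr >= threshold

# Toy trial: the audio supports the target hypothesis, the video is mildly against it.
fused = fuse_scores(1.5, -0.5)
decision = accept(fused)
```

If the scores are not calibrated, a trained fusion with per-system weights would be the usual alternative to a plain sum.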
in the figure we have a combination of more than useful single all those used
in
the additional value systems
single videos used in a fisherman previous nist and finally in one more
We can see that
we can get an improvement of eighty percent
when we move from a single-modality system.
who but it would be more efficient
okay
the key will results was using be data
the no more than one the one used
well cts less money loss
probably provide some woman we're got significant improvement of that a spectrum that for some
backends we
small liberal in domain
they can perform better than listening
what a probability of the screen but it was saying performance where
without the need for every
the results difference between as i the n-best and instantly in obvious that we wonder
why is that the is fitting work
so it is also studied in it has led with the transform it is because
the italians or entity that or
i mean doesn't in there is no
so we won't remember a city bus always focus on the same the on the
other side exactly you have already body was also incredibly or we don't want to
solve problem
we're really on all levels
i mean and variance
and organs performing very well
i mean it is obvious what is only obviously modalities are when
in the unimodal this so we will maybe that's came are used
That's all from my side. Thank you.