Okay. So this morning I will first explain what we mean by classifier fusion. Classifier fusion is applicable whenever we have some ensemble of experts and we need to come to some final decision.
Furthermore, in this example we assume that those experts are able to give us their decisions in the form of some confidence. So perhaps the simplest, and also usually well-working, method to fuse those scores would be just to average those confidence values.
But sometimes we have some prior information about the experts, for example how well they performed in the past, and we would like to exploit this information to make a better fusion.
So the task of classifier fusion is to take the outputs of N base classifiers and produce one output score which ideally gives better performance than any single base classifier.
Here we assume so-called linear fusion, which is a very simple method, but one that is also used in state-of-the-art tools like the FoCal toolkit.
Linear fusion is just a weighted sum of the input scores, where the weights are trained on previous trials with known ground truth.
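As a rough sketch of the idea, with made-up scores and weights (in practice the weights and bias would come from training on labeled trials, e.g. by logistic regression as in the FoCal toolkit), linear fusion of one trial's scores might look like:

```python
import numpy as np

def linear_fusion(scores, weights, bias=0.0):
    """Weighted sum of base-classifier scores for one trial.

    `weights` and `bias` are assumed to have been trained beforehand
    on past trials with known ground truth.
    """
    return float(np.dot(scores, weights) + bias)

# Hypothetical scores from three base classifiers for a single trial:
fused = linear_fusion(np.array([1.2, -0.3, 0.8]),
                      np.array([0.5, 0.2, 0.9]), bias=-0.1)
```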
What we mean by subset fusion is that we first select only certain classifiers from the full set, and only those are then fed to the fusion training and fusion.
What could be the motivation for such a thing? First, the traditional approach with the full set: it is the most commonly used method, it is straightforward, and it is computationally efficient, since you don't have to do any subset selection. But when we have a large number of classifiers, we could possibly simply be over-training the fusion. In the subset case, on the other hand, we might possibly do better, although of course this approach relies on a good subset selection. So the question is: can subset fusion give better performance than the full set?
Now for the system overview. On the input we have speech, typically two utterances. Those are classified by several classifiers that we selected from the full set of classifiers, and the scores of the selected classifiers are then fused.
In more detail, how we do it is that we first train the S-cal mapping for each of the base classifiers' scores. The S-cal mapping maps the scores into well-calibrated log-likelihood ratios. In the first formula you see the S-cal mapping, and the second one is the cost function Cllr, which we minimize for the mapped scores.
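A toy sketch of such a calibration step, assuming a simple affine mapping llr = a*score + b trained by gradient descent on Cllr (this is my illustration, not the exact mapping from the talk; real toolkits such as FoCal use a proper convex optimizer):

```python
import numpy as np

def cllr(tar_llrs, non_llrs):
    """Cllr cost of LLRs in bits (the quantity minimized in calibration)."""
    c_tar = np.mean(np.logaddexp(0.0, -tar_llrs)) / np.log(2)
    c_non = np.mean(np.logaddexp(0.0, non_llrs)) / np.log(2)
    return 0.5 * (c_tar + c_non)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_affine_calibration(tar, non, steps=2000, lr=0.1):
    """Fit llr = a*score + b by gradient descent on Cllr (toy stand-in)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        lt, ln = a * tar + b, a * non + b
        gt, gn = -sigmoid(-lt), sigmoid(ln)  # d(cost)/d(llr), up to a constant
        ga = 0.5 * (np.mean(gt * tar) + np.mean(gn * non))
        gb = 0.5 * (np.mean(gt) + np.mean(gn))
        a, b = a - lr * ga, b - lr * gb
    return a, b

# Demo on deliberately over-confident synthetic scores:
rng = np.random.default_rng(0)
tar = 10 * rng.normal(2.0, 1.0, 500)   # target-trial scores
non = 10 * rng.normal(-2.0, 1.0, 500)  # non-target-trial scores
a, b = train_affine_calibration(tar, non)
```

After training, the scaled scores should have a lower Cllr than the raw ones, which is exactly what "well calibrated" means here.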
Then, for each of the subsets in the power set, that is, two to the power of N minus one subsets, we train a linear fusion with the CWLR objective function, the same as the one in the FoCal toolkit; that is the one you see in the first formula. The prior with which the CWLR function is weighted comes from the cost function. For the cost function we use the NIST function, where the cost of a miss is one, the cost of a false alarm is one, and the probability of a target trial is 0.001.
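The subset enumeration and the effective target prior implied by those cost parameters can be sketched as follows (the helper names are mine, not from the talk):

```python
from itertools import chain, combinations

def nonempty_subsets(n):
    """All 2**n - 1 non-empty subsets of classifier indices 0..n-1."""
    return list(chain.from_iterable(combinations(range(n), k)
                                    for k in range(1, n + 1)))

def effective_prior(p_tar, c_miss, c_fa):
    """Effective target prior that weights the training objective.

    With the talk's parameters (C_miss = C_fa = 1, P_tar = 0.001)
    this is simply 0.001.
    """
    return p_tar * c_miss / (p_tar * c_miss + (1 - p_tar) * c_fa)
```

With twelve classifiers this yields 2**12 - 1 = 4095 candidate subsets, each of which gets its own trained fusion.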
Then, after we evaluate all the possible subsets, we select a subset based on the smallest minimum decision cost function. The decision cost function is a function of the threshold and of the cost function parameters, so we pick the subset with the minimum decision cost function over all possible thresholds.
And finally, we still evaluate the actual decision cost function, which is the cost function at the log-likelihood-ratio threshold obtained on the training set, and which therefore also includes the calibration error of our base classifiers.
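The minimum and actual decision cost functions described above might be computed like this, using the talk's cost parameters (C_miss = C_fa = 1, P_target = 0.001); the Bayes threshold used for the actual DCF is the theoretical one for calibrated log-likelihood ratios:

```python
import numpy as np

# Cost parameters stated in the talk:
C_MISS, C_FA, P_TAR = 1.0, 1.0, 0.001

def dcf(tar_llrs, non_llrs, threshold):
    """Decision cost at a given threshold."""
    p_miss = np.mean(tar_llrs < threshold)
    p_fa = np.mean(non_llrs >= threshold)
    return C_MISS * P_TAR * p_miss + C_FA * (1 - P_TAR) * p_fa

def min_dcf(tar_llrs, non_llrs):
    """Minimum DCF over all thresholds (used here to rank subsets)."""
    cand = np.concatenate([tar_llrs, non_llrs, [np.inf]])
    return min(dcf(tar_llrs, non_llrs, t) for t in cand)

def actual_dcf(tar_llrs, non_llrs):
    """Actual DCF at the theoretical Bayes threshold for calibrated LLRs.

    The gap between actual and minimum DCF reflects the calibration error.
    """
    bayes_t = np.log((C_FA * (1 - P_TAR)) / (C_MISS * P_TAR))
    return dcf(tar_llrs, non_llrs, bayes_t)
```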
We had twelve different classifiers, which were used in the system submitted to the NIST 2010 evaluation. We used three different sets of scores: the so-called train set and devel set one, which were taken from the extended NIST SRE 2008 trial set, so they have very similar score distributions; and then, as something different, we also have devel set two, which is the official NIST 2010 evaluation set.
Now for the results. We divided all the possible subsets by size, from one to twelve, since we had twelve classifiers, and studied the different gains we can get by selecting a good subset. The three most important points in this plot are the worst individual subsystem and the best individual subsystem, which are the subsets of size one, that is, only one system and no fusion, and the baseline, which is the full-ensemble fusion, where all twelve classifiers are fused.
First, the blue line: it shows the non-cheating, realistic use case where we predict the best subset from the training set and then evaluate it on devel set one. For this one, unfortunately, we cannot get a better result than the full-set fusion, but sometimes, for example around subset size seven, we can get a very similar result. The best-subset-selection curve shows the performance of the best subset, if we knew how to select it, and the worst-subset-selection curve shows the case where we select the worst possible subset from the power set. Those two are the upper and lower bounds.
This is the same case, only not for the actual DCF but for the minimum DCF and the equal error rate. You can see that we can still get a better minimum DCF or equal error rate by not doing the full-set fusion but selecting a subset.
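For reference, a minimal sketch of how an equal error rate could be computed from target and non-target scores (a naive threshold sweep over the observed scores, not an exact ROC interpolation):

```python
import numpy as np

def eer(tar, non):
    """Equal error rate: the operating point where P_miss == P_fa."""
    thresholds = np.sort(np.concatenate([tar, non]))
    p_miss = np.array([np.mean(tar < t) for t in thresholds])
    p_fa = np.array([np.mean(non >= t) for t in thresholds])
    i = np.argmin(np.abs(p_miss - p_fa))  # closest crossing point
    return (p_miss[i] + p_fa[i]) / 2
```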
And finally, this is the performance on devel set two, that is, on the NIST 2010 evaluation set. We can also see that for most of the conditions, interview-interview, interview-telephone, and telephone-telephone, the best subset gives better performance than the full ensemble. Only in the mic-mic condition is there something wrong: here even the full ensemble gives worse results than the best individual system.
The conclusion of this research is that subset fusion has the potential to outperform the full-set fusion, of course only if we knew how to select the best subset. Therefore, further study should focus on subset selection methods. And I think that's it.
Okay, we have time for questions. A question from the back, please.

Q: I'd like to ask if you use the same subset for all the trials, or different subsets for different trials.
A: You mean in one of the plots, or...?
Q: Generally. So this is the system, and you put a lot of trials into it.
A: Yes.
Q: Do you select a different subset for each trial?
A: No, no, no. Just one selection.
Q: Okay.
Q: Did you compare your solution with a random selection of the subset of classifiers?
A: What do we mean by random?
Q: Could you show that plot again? So you have the two bounds in this plot. Where would a random selection fall, somewhere in between?
A: When you pick randomly, you end up with a performance between those two bounds.
Q: Well, it could be interesting to look at where it lies, maybe.
A: Okay, so for the random selection you would probably like to see a distribution.
Q: Yes, okay.
Chair: Okay, let's thank the speaker.