Okay, so this morning I will talk about classifier fusion. Classifier fusion is applicable whenever we have some ensemble of N experts and we need to come to some final decision. Furthermore, in this example we assume that those experts are able to give us their decisions in the form of some confidence score. Perhaps the simplest, and also a mostly working, method of fusing those scores would be just to average the confidence values. But sometimes we have some prior information about the experts and about how good they were in the past, so we would like to exploit this information to make a better fusion.

The task of classifier fusion is to take the outputs of N base classifiers and produce one output score which ideally gives better performance than any single base classifier. Here we assume so-called linear fusion, which is a very simple method, but one also used in state-of-the-art tools like the FoCal toolkit. A linear fusion is just a weighted sum of the input scores, where the weights are trained on previous trials with known ground truth.

What we mean by subset fusion is that we first select only certain classifiers from the full set, and only those are then fed to the fusion training and to the fusion itself. What could be the motivation for something like that? The traditional approach with the full set is the most commonly used method; it is straightforward and computationally efficient, since you don't have to do any subset selection. But when we have a large number of classifiers, we could possibly be over-training the fusion weights, and in the subset case we might possibly do better. Of course, this method relies on a good subset selection. So the question is: can subset fusion give better performance than full-set fusion?

Now the system overview. On the input we have speech, typically two utterances. Those are classified by several classifiers that we selected from the full set of classifiers, and the scores of the selected classifiers are then fused. In more detail, we first train an s-cal mapping for each of the base classifiers' scores; the s-cal mapping maps the scores into well-calibrated log-likelihood ratios. In the first formula on the slide you see the s-cal mapping, and the second one is the cost function Cllr, which we minimize over the match scores. Then, for each of the subsets in the power set, two to the power of N minus one of them, we train a linear fusion with the CWLLR objective function, the same as in the FoCal toolkit; that is the one you see in the first formula. The prior with which the CWLLR function is weighted comes from the cost function: for the cost function we use the NIST function with a cost of a miss of one, a cost of a false alarm of one, and a probability of a target trial of 0.001. Then, after we fuse all the possible subsets, we select the subset with the smallest minimum decision cost function; the decision cost function is a function of the threshold and of the cost function parameters.
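To make that loop concrete, here is a minimal sketch in Python; it is my own illustration under stated assumptions, not the actual FoCal or evaluation code. Synthetic scores stand in for the calibrated per-classifier log-likelihood ratios, and the helper names cwllr, train_fusion, and min_dcf are hypothetical. It enumerates all 2^N - 1 non-empty subsets, trains each linear fusion by minimizing a prior-weighted Cllr, and keeps the subset with the smallest minimum DCF.

```python
# A minimal sketch of the subset-fusion loop, not the actual FoCal/I4U code.
# Assumptions: scores are already calibrated LLRs; helper names are hypothetical.
import itertools
import numpy as np
from scipy.optimize import minimize

C_MISS, C_FA, P_TAR = 1.0, 1.0, 0.001            # cost parameters from the talk
P_EFF = C_MISS * P_TAR / (C_MISS * P_TAR + C_FA * (1 - P_TAR))  # effective prior
LOGIT_P = np.log(P_EFF / (1.0 - P_EFF))

def cwllr(params, X, labels):
    """Prior-weighted Cllr of the linear fusion s = X @ w + b (FoCal-style objective)."""
    w, b = params[:-1], params[-1]
    s = X @ w + b + LOGIT_P                       # shift scores by logit of the prior
    tar, non = s[labels == 1], s[labels == 0]
    cost = P_EFF * np.mean(np.logaddexp(0.0, -tar)) \
         + (1.0 - P_EFF) * np.mean(np.logaddexp(0.0, non))
    entropy = -P_EFF * np.log(P_EFF) - (1.0 - P_EFF) * np.log(1.0 - P_EFF)
    return cost / entropy                         # 1.0 = useless default system

def train_fusion(X, labels):
    """Fit fusion weights and offset by numerically minimizing CWLLR."""
    res = minimize(cwllr, np.zeros(X.shape[1] + 1), args=(X, labels), method="L-BFGS-B")
    return res.x

def min_dcf(scores, labels):
    """Minimum normalized detection cost over all possible thresholds."""
    tar, non = scores[labels == 1], scores[labels == 0]
    dcfs = [C_MISS * P_TAR * np.mean(tar < t) + C_FA * (1 - P_TAR) * np.mean(non >= t)
            for t in np.sort(scores)]
    return min(dcfs) / min(C_MISS * P_TAR, C_FA * (1 - P_TAR))

# Synthetic stand-in data: trials x classifiers, targets score higher than non-targets.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 2000)
X = rng.normal(labels[:, None] * 2.0, 1.0, (2000, 4))   # 4 classifiers -> 15 subsets

best = (np.inf, None)
for r in range(1, X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), r):  # all 2**N - 1 subsets
        params = train_fusion(X[:, list(subset)], labels)
        fused = X[:, list(subset)] @ params[:-1] + params[-1]
        cost = min_dcf(fused, labels)
        if cost < best[0]:
            best = (cost, subset)
print("selected subset:", best[1], "min DCF:", round(best[0], 4))
```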
So we pick the subset with the minimum decision cost function at the best possible threshold. Finally, we evaluate the actual decision cost function, which is the cost function at the threshold that we trained on the training set; it therefore also includes the calibration error of our base classifiers.

We had twelve different classifiers, the ones used in the I4U consortium submission for the NIST 2010 evaluation. We used three different sets of scores: the so-called train set and devset 1 come from the extended NIST SRE 2008 trial lists, and they have very similar score distributions; and then, for something different, we also have devset 2, which is the official NIST 2010 evaluation set.

For the results, we divided all the possible subsets by size, from one to twelve, since we had twelve classifiers, and studied the different measures we can get by selecting a good subset. The three most important points in this plot are the worst individual subsystem, the best individual subsystem (those are the subsets of size one, a single system only, no fusion) and the baseline, which is the full-ensemble fusion of all twelve classifiers.

First, the blue line. The blue line shows the non-cheating, realistic use case, where we predict the best subset from the training set and then evaluate it on devset 1. For this one, unfortunately, we cannot get a better result than the full-set fusion, but for subset sizes of seven and higher we can get a very similar result. The best-subset selection curve shows the performance of the best subset if we knew how to select it, and the worst-subset selection curve shows the case where we select the worst possible subset from the power set; those two are the lower and upper bounds.

This slide shows the same case, only not for the actual DCF but for the minimum DCF and the equal error rate. You can see that we can still get a better minimum DCF or equal error rate by not doing the full-set fusion but by selecting a subset instead.

And finally, this is the performance on devset 2, the NIST 2010 evaluation set. We can see that for most of the conditions (interview-interview, interview-telephone, and telephone-telephone) the best subset gives better performance than the full ensemble. Only in the mic-mic condition is there something wrong: there, even the full ensemble gives worse results than the best individual system.

The conclusion of this research is that subset fusion has the potential to outperform full-set fusion, if of course we knew how to select the best subset, so further study should focus on subset selection methods.
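As a brief companion to the earlier sketch, hypothetical in the same way, here is the actual-DCF counterpart mentioned above: it applies a threshold fixed in advance rather than the best threshold found on the evaluation scores, so any miscalibration of the fused log-likelihood ratios shows up as a gap between actual and minimum DCF. The default threshold below is the Bayes threshold for the effective prior, which is one common choice; in the talk the threshold was trained on the training set.

```python
# Companion to the sketch above (same hypothetical helpers and constants):
# actual DCF at a threshold fixed before seeing the evaluation scores.
def actual_dcf(scores, labels, threshold=-LOGIT_P):
    """Normalized DCF at a fixed threshold; the default is the Bayes threshold
    -logit(P_EFF), appropriate when the fused scores are calibrated LLRs."""
    tar, non = scores[labels == 1], scores[labels == 0]
    dcf = C_MISS * P_TAR * np.mean(tar < threshold) \
        + C_FA * (1 - P_TAR) * np.mean(non >= threshold)
    return dcf / min(C_MISS * P_TAR, C_FA * (1 - P_TAR))
```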
Thank you. I think we have time for a question.

Question: Thank you, this was a nice talk. I'd like to ask whether you used the same subset for all the trials, or different subsets for different trials.

Answer: You mean in one of the plots?

Question: Generally. In this system overview you put N classifiers into it; do you select a different subset for each trial?

Answer: No, no, there is only one selection.

Question: Okay. And did you compare your solution with a random selection of the subsets?

Answer: What do we mean by random here? Can you show the plot with the two bounds again? You have the two bounds in this plot, and a random selection would fall somewhere in between: when you pick randomly, you end up with a performance between those two bounds.

Question: It could still be interesting to do it, the random selection; you would probably like to see the distribution.

Answer: Okay.

Chair: Okay, but let's not delay the next speaker.
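As a footnote to that last exchange, the random-selection baseline the questioner suggests could be sketched as follows, reusing the hypothetical helpers and synthetic data from the sketch above: sample subsets of a given size uniformly and look at the distribution of their minimum DCFs, which should land between the worst-subset and best-subset bounds.

```python
# Sketch of the random-subset baseline raised in the question, reusing the
# hypothetical train_fusion/min_dcf helpers and the synthetic X, labels above.
def random_subset_dcfs(X, labels, size, n_draws=50, seed=1):
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_draws):
        subset = rng.choice(X.shape[1], size=size, replace=False)  # uniform draw
        params = train_fusion(X[:, subset], labels)
        fused = X[:, subset] @ params[:-1] + params[-1]
        costs.append(min_dcf(fused, labels))
    return np.array(costs)

dcfs = random_subset_dcfs(X, labels, size=2)
print("random size-2 subsets: min DCF mean %.4f, std %.4f" % (dcfs.mean(), dcfs.std()))
```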