Good morning. My name is Raymond, and we are from the Chinese University of Hong Kong and the Institute for Infocomm Research in Singapore. The topic of today's presentation is score fusion and calibration in multiple language detection with large performance variation.

Score fusion and calibration are not clearly defined terms, to the best of our knowledge, but in this paper we define them as a process which combines or adjusts the numerical values of scores from one or multiple detection systems for a detection task. To be more concrete, think of it this way: we have multi-dimensional score vectors from different detection systems, or even from different language detectors, and what we want is to combine these multi-dimensional vectors in some way to obtain a scalar decision score for the detection of a particular language. The questions involved include how to adjust or combine the numerical values of the scores, and whether or not we need some criterion to guide this adjustment.

To name a few common approaches to fusion and calibration: we can take two detection systems and combine their scores linearly with appropriate weights; linear discriminant analysis and the Gaussian backend are another popular approach, which assumes the multi-dimensional score vectors of the different detection classes follow a normal distribution; and another popular approach is the logistic regression backend, which combines the detection scores under a maximum a posteriori probability criterion. Many of these methods can be approximated by an affine or linear transformation.

In this paper we focus on performance variation. It is not a formally defined term, but generally we cover two cases: performance variation among different detection systems, and performance variation among different language detectors. In the following, we use multi-class logistic regression to deal with the
situation of variation among the detection systems, and minimum erroneous deviation calibration to deal with the situation of variation among the different target languages.

We tested with the NIST LRE 2009 data, using one phonotactic system and one prosodic system, and we can see a huge performance gap between these two systems. For the prosodic system, the EER across languages ranges from around six percent to twenty-seven percent. Intuitively, we would hope to rely on the prosodic detectors which are more reliable, that is, those for the languages with low error rates, and put more weight on them. So we want to investigate this problem in the common multi-class logistic regression (MLR) setting, and we are going to demonstrate a reduction of the C_avg score of this kind of system.

This is the setup. We have two language detection systems, the phonotactic one (PH) and the prosodic one (PR), and we have their log-likelihood scores for each trial t and each target language. We take a linear combination of the two systems' scores, and what we want to find are the combination weights alpha and the bias vector beta. In MLR we consider this equation, and we optimize alpha and beta with a maximum a posteriori probability criterion, shown in this equation. Here P_l(t) is the posterior probability of class l for trial t; we use a selection function to choose the in-class data, and finally we combine the posterior probabilities of the different classes into an overall posterior probability to be maximized.

To cope with the large performance variation, we make a very slight change to this algorithm: we make the weights language-specific, which means we use different weights for different target languages, because we believe that some languages perform well in the prosodic system while others do not.
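The MLR fusion just described can be sketched as follows. This is a minimal illustration, not the FoCal implementation: the function name `mlr_fuse`, the plain gradient-descent training loop, and the synthetic data in the usage note are my own assumptions; only the model form (one shared weight per system plus a per-language bias, trained under the maximum-posterior / cross-entropy criterion) follows the talk.

```python
import numpy as np

def mlr_fuse(scores_ph, scores_pr, labels, n_iter=500, lr=0.1):
    """Fuse two detectors' scores by multi-class logistic regression.

    scores_ph, scores_pr: (T, L) arrays of log-likelihood scores
    (T trials, L target languages); labels: (T,) true class indices.
    Learns fusion weights a = (a_ph, a_pr) and a per-language bias
    vector b so that fused = a_ph * s_ph + a_pr * s_pr + b, under
    the maximum-a-posteriori (mean cross-entropy) criterion.
    """
    T, L = scores_ph.shape
    a = np.array([1.0, 1.0])
    b = np.zeros(L)
    onehot = np.eye(L)[labels]
    for _ in range(n_iter):
        fused = a[0] * scores_ph + a[1] * scores_pr + b
        # softmax posteriors P_l(t)
        e = np.exp(fused - fused.max(axis=1, keepdims=True))
        post = e / e.sum(axis=1, keepdims=True)
        g = (post - onehot) / T  # gradient of mean cross-entropy wrt logits
        a -= lr * np.array([(g * scores_ph).sum(), (g * scores_pr).sum()])
        b -= lr * g.sum(axis=0)
    return a, b
```

On synthetic data where one system is cleaner than the other, the learned weight of the cleaner system comes out larger, which is the behaviour the talk is aiming for.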
We also try removing the bias term beta, to see whether there is any effect in the case where we have a poorly performing prosodic system. For the implementation, we follow the FoCal toolkit and make a slight modification to the code.

Next, we move to minimum erroneous deviation calibration, which deals with the problem of variation among the different target languages. We know there are pairs of highly alike languages in LRE 2009, and these pairs of languages become a bottleneck for the detection of the different languages. To be concrete: if we look at the EER, generally it is around four percent, but if we focus on a particular language, say Bosnian, we have an error of twenty percent. This is the error profile of the phonotactic-prosodic fusion system, and the confusion between particular language pairs, say Bosnian and Croatian, can be as high as twenty-four percent. The situation is the same for Hindi and Urdu, where we also find serious confusion.

A calibration algorithm based on minimum erroneous deviation was previously proposed. In this algorithm, we hypothesize that there are pairs of detectors which contain similar and complementary information, because we have seen serious confusion between pairs of languages. We look at the likelihood ratio between one target language and one of the confusable languages, which we call the related language. On top of MLR, that is, starting from the scores already fused by MLR, we do a second-stage calibration and find the optimal alpha parameter. This transformation, like the MLR one, is also an affine
transformation. Since we have been talking about confusion between particular pairs of languages, we confine our calibration to selected data subsets. Of course, it is not possible to know in advance whether a particular trial belongs to these related language pairs or not, so we use a heuristic: we choose the top two scores in the multi-class score vector, and from them we obtain the estimated trial subset for our calibration.

This is the optimization equation for finding the optimal parameter alpha. We start with the difference term: the score minus theta is the deviation of the score from the reference theta, which is actually the detection threshold. The quantity y acts like a sign function: it takes a positive value for impostor (out-of-class) data and a negative value for in-class data. By taking the product of y and the difference term, we get a positive value for an erroneous detection and a negative value for a correct detection. The maximization inside the objective means we are only concerned with the positive values, so we optimize towards a minimum total erroneous deviation, while the correctly detected trials contribute nothing. We have two parameters, epsilon and eta, which are application-dependent: epsilon shifts the detection threshold, and eta scales the importance of detection misses versus false alarms.

Here is a brief comparison between MLR and our proposed calibration algorithm. They are the same in that both algorithms apply an affine transformation to the scores. MLR follows a maximum a posteriori probability criterion, whereas our algorithm optimizes towards minimum erroneous deviation. MLR uses the whole dataset, whereas in our implementation we select a data subset. And MLR is a standalone
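The objective just described can be sketched as follows. This is my reading of the talk, with hypothetical names throughout; the exact placement of epsilon and eta in the published formula may differ, and the simple grid search over alpha stands in for whatever optimizer the authors actually use.

```python
import numpy as np

def erroneous_deviation(alpha, scores, is_target, theta=0.0, eps=0.0, eta=1.0):
    """Total erroneous deviation of the calibrated scores alpha * s.

    y is +1 for impostor (out-of-class) trials and -1 for in-class
    trials, so y * (alpha*s - theta) is positive exactly when a trial
    falls on the wrong side of the detection threshold theta.
    max(0, .) keeps only the erroneous trials, so correct detections
    contribute nothing; eps shifts the operating point, and eta
    reweights misses (in-class errors) against false alarms.
    """
    scores = np.asarray(scores, float)
    is_target = np.asarray(is_target, bool)
    y = np.where(is_target, -1.0, 1.0)
    dev = y * (alpha * scores - theta) + eps
    err = np.maximum(0.0, dev)
    weight = np.where(is_target, eta, 1.0)  # eta scales the cost of misses
    return float((weight * err).sum())

def calibrate(scores, is_target, alphas=np.linspace(0.1, 3.0, 60), **kw):
    """Grid-search the alpha that minimises the total erroneous deviation."""
    costs = [erroneous_deviation(a, scores, is_target, **kw) for a in alphas]
    return float(alphas[int(np.argmin(costs))])
```

With a positive detection threshold, increasing alpha pulls marginal in-class scores above the threshold, which is exactly the "rescue" effect discussed later for the DET curves.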
process, whereas our calibration algorithm operates on top of MLR. Also, MLR is application-independent, while ours has application-specific settings: epsilon, and the relative importance of misses against false alarms.

A shortcoming of our proposed calibration algorithm is that it requires the target languages to be calibrated, which means the target languages and their related languages have to be predetermined in advance. We want to enhance the calibration algorithm by allowing on-the-fly selection of the target languages for calibration, such that the algorithm can work in the general situation.

We go back to our original hypothesis that detectors of related languages, for example Hindi and Urdu, contain similar and complementary information, and we do a post-hoc analysis of the scores of the twenty-three detectors in LRE 2009. We enumerate pairs from the twenty-three detectors, and we plot the likelihood scores of the target class against those of the other classes. We have an interesting finding: for related languages, say Hindi and Urdu, we can see a very strong correlation of scores on the impostor-class data. That matches our original hypothesis that, if we want to find a pair of language detectors to calibrate, they have to contain similar and complementary information, so this is empirical support for the hypothesis. In the case where the two language detectors are not closely related, we do not see such a high correlation on the impostor data.

So we propose two heuristics. First, we impose a minimum correlation of 0.9 between two detectors before the calibration mechanism can be applied. Second, for every target class, we find the language with the highest correlation and treat the two as the pair of detectors for calibration.

Here are the experiments, which we carried out on the NIST Language Recognition Evaluation 2009 data.
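The two heuristics can be sketched as follows. This is a hypothetical implementation: the talk does not specify exactly which trials count as impostor trials for the correlation, so here I exclude the trials of both languages in the candidate pair.

```python
import numpy as np

def related_pairs(scores, labels, min_corr=0.9):
    """For each target class, find its most-correlated partner detector.

    scores: (T, L) array of detector log-likelihood scores;
    labels: (T,) true class indices. For a target class l we correlate
    detector l's scores with every other detector's scores over the
    impostor trials (trials whose true class is neither l nor the
    candidate partner m), and we keep the partner only if the
    correlation exceeds min_corr.
    """
    T, L = scores.shape
    pairs = {}
    for l in range(L):
        best, best_c = None, min_corr
        for m in range(L):
            if m == l:
                continue
            mask = (labels != l) & (labels != m)  # impostor trials only
            c = np.corrcoef(scores[mask, l], scores[mask, m])[0, 1]
            if c > best_c:
                best, best_c = m, c
        if best is not None:
            pairs[l] = best
    return pairs
```

On synthetic scores where two detectors share a common latent component, only that pair survives the 0.9 threshold; unrelated detectors get no partner, mirroring how the method should leave non-confusable languages uncalibrated.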
The task is the closed-set, thirty-second language detection task. We start with the phonotactic baseline system, with a C_avg of 4.69 percent, and we apply the MLR fusion and then our minimum erroneous deviation calibration. We carry out four sets of experiments: first, we try different MLR settings; second, we try the on-the-fly selection of the target language pairs; third, we look at the result of this on-the-fly selection combined with the minimum erroneous deviation calibration; and finally we analyze the calibration results.

Here are the MLR results for different parameter settings. We have the 4.69 percent C_avg score as the reference. We found that the language-dependent weights alone, the second row here, only give a marginal error reduction, which is actually not what we had expected. The best result is the one with the language-dependent weights together with the bias vector, which gives the largest relative reduction of the C_avg.

Then we use our correlation method to find the related language pairs for the calibration. These are the twenty-three pairs we found; the bold entries highlight the pairs of related languages listed by the LRE 2009 specification. We found that the correlation method recovers all the language pairs which are specified as mutually intelligible, except for Russian and Ukrainian. In fact, even if we force the use of the Ukrainian scores to calibrate Russian, we found that it does not help, which means that, in terms of the data, the two language detectors are not closely related. A high correlation on the impostor data is a necessary, though not sufficient, condition for the algorithm to work effectively, as we will see in the following slides.

With these twenty-three pairs of similar languages, we carry
out the minimum erroneous deviation calibration. The C_avg reduces from 4.2 percent to 3.31 percent. As we have said, there may be some language pairs which are not really useful, so we look into the error statistics of each specific language. We decompose the C_avg function into the per-language detection costs, and further into the P_miss and P_FA of the different target languages, and we enumerate the best three languages in the first table and the worst three languages in the table at the bottom. We have a very interesting finding: the positive or negative value of the alpha parameter actually corresponds to a preference towards misses or false alarms. If we have a positive alpha, the minimum erroneous deviation calibration gives us a smaller P_miss, and if we have a negative alpha, we will have fewer false alarms. In the error metric equation, P_miss actually carries a larger weight in the overall C_avg, so we decided to prefer fewer misses, and we imposed an additional constraint that alpha must be positive.

This is the final result. At the bottom, you see that with this additional constraint of forcing alpha to be positive, the final C_avg drops further, to 3.1 percent. The DET curves at the different stages of the calibration are shown here. It is interesting to see that, at the stage just before we carry out the minimum erroneous deviation calibration, the DET curve in the region of high false alarms also shows high misses, which means in this region there are some in-class trials whose log-likelihood scores take very negative values, and the minimum erroneous deviation calibration essentially rescues these very negative scores.
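The C_avg decomposition used in this error analysis can be sketched as follows. The cost weights assume the standard LRE-style closed-set setting (unit miss and false-alarm costs, target prior 0.5); the exact LRE 2009 constants are an assumption here, and `c_avg` is a hypothetical helper name.

```python
import numpy as np

def c_avg(p_miss, p_fa, p_target=0.5):
    """LRE-style average detection cost for a closed set of N languages.

    p_miss: length-N vector, P_miss of each target language's detector.
    p_fa:   N x N matrix, p_fa[l, m] = false-alarm rate of detector l
            on trials of non-target language m (diagonal ignored).
    Each language's cost is p_target * P_miss(l) plus
    (1 - p_target) / (N - 1) times the sum of its false-alarm rates,
    and C_avg is the mean over languages. With p_target = 0.5, one
    language's P_miss carries (N - 1) times the weight of any single
    false-alarm rate, which is why the talk prefers calibrations that
    reduce misses.
    """
    p_miss = np.asarray(p_miss, float)
    off = np.array(p_fa, float)
    np.fill_diagonal(off, 0.0)
    N = len(p_miss)
    per_lang = p_target * p_miss + (1 - p_target) / (N - 1) * off.sum(axis=1)
    return float(per_lang.mean())
```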
The final DET curve also appears more symmetric than the original curves.

Here is the conclusion of today's presentation. We have evaluated different problem settings for multi-class logistic regression under variation among detection systems, and we have enhanced the minimum erroneous deviation calibration algorithm with on-the-fly selection of related language pairs. We have also added an extra optimization constraint to the calibration algorithm to express a detection preference. This work is important in the sense that it brings the calibration algorithm towards a more general applicability. We have also tested the algorithm on the LRE 2007 dataset, where we did not expect a large performance variation among detectors, and it indeed behaves well, in the sense that it does not hurt performance. We are going to extend this algorithm to the situation where we consider multiple related languages; in that scenario, the correlation method, and also the choice of which pairs or which data subsets to calibrate, will become difficult and very different, and we are going to work on that in the future. That is the end of today's presentation. Thank you very much.

[Question inaudible]

No, I mean the performance of the prosodic system varies a lot between different languages, so there are some languages for which we think the prosodic system is more reliable than for other languages.

[Question inaudible]

The starting point is actually the 4.2 percent.

[Question inaudible]

We didn't show that. The reason is that with that setting we have less than a five percent relative improvement from the prosodic system on LRE 2009.

[Question inaudible]

The situation would become very different in the multi-class case, because, as I have shown
here: in the two-class case we can see a very clear correlation between the two detectors, but in the multi-class case you see different tails, because the score is then multi-dimensional and we cannot find the correlation anymore. So the situation is quite different, and more complex, than this one.

[Question inaudible]

We are now at the stage of doing that, and hopefully you will see the results at a later date.

[Question inaudible]

That was the 4.2 percent you have seen.

[Question inaudible]

Yes, that is before we form the pairs.

[Question inaudible]

That is exactly the same as what we said at the beginning: we treat score calibration or fusion as the problem of combining a multi-dimensional score vector into a scalar decision. We do not care whether the multiple dimensions come from scores of different detection systems or scores of different language detectors; we just view it as a generic multi-dimensional score vector and try to find ways to combine them.