Today I'm going to share with you some results for human assisted speaker recognition (HASR) in NIST SRE10. But first I want to point out that this was possible only through the hard work of a lot of different people, so credit should go to Jack Godfrey and George Doddington, as well as Joe Campbell, our friends at the LDC, and [unclear] as well.

So there's a question: how can human experts effectively utilise automatic speaker recognition technology? This was the perspective we were coming from, trying to answer this question, when considering HASR for SRE10. It's important to note that this was a pilot. It's also important to note that, although we hope it is informative and of interest to the forensic community, it was not designed to be a forensic test. We did not use forensic data; we leveraged data that we were already using for the SRE10 evaluation.

The HASR task was: given two different speech segments, determine whether they were both spoken by the same speaker. HASR was a subset of the SRE core test. There were fifteen trials in HASR1 and one hundred and fifty trials in HASR2. HASR allowed human listening to assist in decision making, and system descriptions were to describe exactly how much was involved. Participation was open to all who might be interested, as it is in NIST evaluations, and participants ranged from experts to [unclear].

When considering how much data humans are able to process in a fixed length of time, we determined that it would be necessary to try to select a few really difficult data in order to make the experiment challenging enough to be interesting. So the first question that came up was: are we even able to select such data? We ran a preliminary experiment, and this also allowed us to try out the protocol for the proposed HASR study. So what we did was we identified
confusable speaker pairs from [unclear]. We used an automatic system, I'm sorry, a set of automatic systems from the follow-up, and through some voting scheme selected out forty-seven pairs across multiple systems. We listened to all the interviews for each speaker in each pair, or at least portions of them, and selected the ten pairs that seemed the most difficult to distinguish. From those we selected eight nontarget trials, and then from the same speakers we also chose target trials.

Something I should note now is that there's a little bit of challenging terminology. When we say target trials, or model, or test segment, what we really mean in the case of target trials is same speaker, and for nontarget trials, different speaker. For the model, you might call this the known speaker, or the suspect maybe, and in the case of the test, maybe the questioned speaker.

Let's see. Interview data here was used for both training and test for all of the HASR preliminary experiment. We assembled fourteen human evaluators. The volunteers were people involved in the project, either working for the government or [unclear] as a data provider. People were permitted unrestricted listening [unclear]. Evaluators provided an actual decision and, optionally, a confidence score.

So here are the results from the experiment. The overall miss rate was [unclear] percent and the overall false alarm rate [unclear] percent; you see that on the left. On the right you see the number of errors per trial. You can see that some trials were very challenging; for example, [unclear] had nine, nine, and eight errors respectively. Three trials had [unclear] correct decisions, an additional three had more than one third errors, and one target and one nontarget trial had none. And this slide shows the number of errors by evaluator: the solid red are false alarms, the outlined are misses. You see all evaluators had false alarm errors [unclear].
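The miss rate, false alarm rate, and errors-per-trial counts described above can be tallied along the following lines. This is only a minimal sketch, not the scoring tool used in the experiment: the function name `error_rates`, the tuple layout, and the toy responses are all assumptions made for illustration, and the numbers are made up rather than taken from the results.

```python
from collections import Counter


def error_rates(responses):
    """Tally misses and false alarms.

    `responses` is an iterable of (trial_id, is_target, said_same) tuples,
    one per evaluator decision (hypothetical layout, for illustration only).
    A miss is a target (same-speaker) trial judged "different"; a false
    alarm is a nontarget (different-speaker) trial judged "same".
    """
    misses = false_alarms = targets = nontargets = 0
    errors_per_trial = Counter()
    for trial_id, is_target, said_same in responses:
        if is_target:
            targets += 1
            if not said_same:
                misses += 1
                errors_per_trial[trial_id] += 1
        else:
            nontargets += 1
            if said_same:
                false_alarms += 1
                errors_per_trial[trial_id] += 1
    miss_rate = misses / targets if targets else 0.0
    fa_rate = false_alarms / nontargets if nontargets else 0.0
    return miss_rate, fa_rate, dict(errors_per_trial)


# Made-up toy decisions: two evaluators on three trials.
toy = [
    ("t1", True, True), ("t1", True, False),     # target trial, one miss
    ("t2", False, True), ("t2", False, True),    # nontarget, two false alarms
    ("t3", False, False), ("t3", False, False),  # nontarget, no errors
]
miss_rate, fa_rate, per_trial = error_rates(toy)
print(miss_rate, fa_rate, per_trial)
```

Grouping by evaluator instead of by trial would only require keying the counter on an evaluator identifier, which this toy layout does not carry.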
So the trials did prove to be quite challenging, and so this preliminary experiment supported the idea that it would be possible to have a meaningful HASR evaluation with as few as fifteen trials. When initially discussing the potential for running a HASR experiment, several potential participants were reluctant to do more than fifteen trials. But due to the concern for statistical significance, we decided to have a fifteen-trial HASR1 and a one hundred and fifty trial HASR2 [unclear].

So let's talk about the HASR evaluation itself. Trials consisted of training and test speech segments; as I said, there is different terminology. Trials were to be processed separately and independently. What we did to help accommodate this was create an automated email system, where participants would submit the results for a given trial and then would receive an automated response, hopefully shortly after submitting, with the link to the next trial. An unlimited amount of time and listening was allowed. Human listeners could be, and in fact were in some cases, either one person or a panel. A decision and a likelihood score were required for each trial. Decisions could be made either from a combination of automatic processing and human expertise, or solely based on human listening. We did not define a cost function for this evaluation, and so the scoring consisted solely of counting misses and false alarms.

As I said earlier, we chose difficult trials due to the small number of trials, and we did so from the Mixer 6 corpus [unclear]. The training data was from interviews, including various microphones, and the test data was from phone calls [unclear]. To select the nontarget speaker pairs, we ran a speaker recognition system over full matrices of possible interview-train, interview-test segments. I'd like to take the opportunity now to thank [unclear] for helping us set up
a speaker recognition system for this purpose. After running the system over the full matrices, we identified speaker pairs that had a large number of very high scores, and thus selected thirty [unclear] speaker pairs. From them, as we listened to interview and phone call trials, we selected nine, based on perception, that seemed to be [unclear]. For the target trials we ran a full matrix of interview-train, telephone-test, and selected the actual trials that way: thirty from [unclear] score, and likewise by perception selected six judged to be most [unclear].

As for HASR2, the first fifteen trials were the same as for HASR1. For the remaining one hundred and thirty-five, we changed some thresholds to allow for additional speaker pairs and trials, but otherwise selected the data randomly, although we did listen for anomalies, e.g. [unclear].

So let's play a game: same speaker or different speaker. Here we're going to play two segments, and try to determine whether they are from the same speaker or not.

[audio plays]

Okay, so how many people thought same speaker? Different speaker? Okay. [unclear] And then for trial two.

[audio plays]

Okay. Same speaker? Different speaker? So the first one was same [unclear].

So we had twenty systems from fifteen brave souls willing to participate and complete the fifteen trials in HASR1, and eight systems from six sites that did the additional hundred and thirty-five trials in HASR2. Some of the sites that participated in HASR also participated in the main SRE10 evaluation [unclear]. The sites represented academic and government organisations from six different countries.

So for HASR1, here's the story. Blue are
target trials and red are nontarget trials, and the darker colours represent errors. At the bottom you can see the number of errors per trial, and in the columns you can see the misses and false alarms for each site. This is the corresponding information for HASR2, just the information from the columns at the end [unclear].

One question we were interested in was whether the human listening was worthwhile in selecting trials for the HASR1 part. Our first look at this was that the HASR1 trials with the twenty systems had about a thirty-eight percent miss rate and a forty-seven percent false alarm rate, which wasn't too different from the eight systems on the hundred and thirty-five trials. So at first look it was probably not worth the human labour, which was quite a bit, to select the HASR1 trials [unclear]. More recently, maybe in the last year or so, we broke this out and looked at the same eight systems. Here we see the miss rate is about the same, but the false alarm rate is actually significantly higher. So it seems as though maybe the HASR1 trials were a little more difficult, but hopefully the HASR2 trials were difficult enough.

This is another view of the HASR1 and HASR2 comparison. The blue dots are HASR1, the red circles are HASR2, and the lines connecting them show that it's the same system. In all cases, with the exception of this one system here, HASR2 was less challenging than HASR1. [unclear]

With HASR1 and HASR2 we had some interest in how the HASR systems did relative to the automatic systems. We were actually a little concerned that we had biased this against automatic systems, since we used an automatic system to select difficult trials. What we see here: each red dot represents a site's performance on
HASR1. Something to note, which someone brought up in the last presentation, is that the dots on the axes actually shouldn't be on the axes; they are at zero percent, but they would otherwise not be visible, so we put them there just so you can see them. So the red dots represent the HASR sites, and the lines represent top systems from the SRE10 evaluation. The thing that is notable here is that in all cases the dots are outside the yellow line.

We did something similar with HASR2. Here we have on the same plot automatic systems and HASR2 systems. The thick lines are automatic systems and the thin lines are HASR2 systems. We note a fair amount of separation between the two, though these are the top SRE automatic systems. In this slide we see sites that participated both in HASR2 and in the SRE10 evaluation, and corresponding colours represent the same site. Once again, thick lines are automatic systems and thin lines are HASR systems. The thing to note here is that, within each colour, the thick line is closer to the origin than the thin one.

To summarise: the effort to select challenging data for HASR was successful, we feel, and we were very pleased that we were able to do that. HASR2 trials were only somewhat less challenging than HASR1. We can probably revise this [unclear], though once again perhaps HASR2 was challenging enough for the purpose this time. Performance of the HASR systems did not compare favourably with that of automatic systems on the HASR trials, although I feel obliged to remind everyone that there are so few trials that it's difficult to find any statistical significance in the results. And then there's a question: if HASR evaluations are valuable, how should this be extended? Should the test
protocol be changed, or was it reasonable? Is there a good way to alter the trial selection, or was that also reasonable? And what we keep bumping up against is statistical significance: is there a way to get additional statistical significance without burdening the sites too much? And that is all. We have time for some questions.

Thank you, Craig. Can you briefly explain what protocols [unclear], what people actually did?

Sure. What I can say is that it varied from site to site, and sites submitted a system description that was available to all the HASR participants. [unclear]

The question, I'm sorry: did they produce [unclear], or were they just giving you binary decisions?

Sure. Each HASR participant was required, in addition to giving the decision, to give a score. [unclear] I'm sorry, yes, but that was in the preliminary experiment. [unclear]

[unclear] HASR evaluation [unclear] bias [unclear].

Well, this was our concern. Note that in the case of the HASR1 trials, human perception was also involved in the selection. [unclear]

My suggestion [unclear] to use a speaker ID system to pick [unclear].

Right, sure.

[unclear] You said that one of the reasons for doing this is to see if the combination of human and machine would help. [unclear] People have a system [unclear], and then if they want to answer a hundred and fifty trials they could listen, so you can decide which ones would be helpful to listen to, and then
you can listen to whatever [unclear]. Of course then you wouldn't have [unclear], but you would have a combined decision [unclear].

Yes, I think that would be interesting to think about. Of note, however, is that some sites did do human-machine fusion, so I guess we are able to get some very small glimpse of that. But I like your suggestion also.

[unclear] We are trying to answer two questions: trying to evaluate the human expert, and [unclear] the selection process [unclear].

[unclear] Keep in mind that we're going to be establishing an additional mechanism for communicating about future evaluations along this line [unclear], to help promote discussions and whatnot, just as there was a group for [unclear]. [unclear] suggestions for how to make this an interesting evaluation in the future, and many very interesting suggestions, including [unclear] and related activities. [unclear] contact me or the NIST people.

[unclear] a fusion between human beings and the automatic systems [unclear] the human part [unclear].

Sure, yes. We have not really looked at this, though I find it very interesting. Part of the challenge is that in some cases the human and machine results were fused prior to coming to us, so it would be difficult in those cases to tease out what part of that [unclear].
Thank you.