0:00:14 Mic check.
0:00:16 Um, hello.
0:00:17 Thank you all very much for coming.
0:00:19 I was strongly encouraged to be brief in order to allow time for questions,
0:00:24 but I would like to begin by acknowledging my fellow authors, Jack Godfrey, George Doddington,
0:00:29 and Alvin Martin,
0:00:31 as well as the HASR participants,
0:00:34 many of whom are in this room,
0:00:36 for their
0:00:38 hard work and effort in conducting HASR.
0:00:43 So the question we're trying to address is:
0:00:45 how can human experts effectively
0:00:49 utilize automatic speaker recognition technology?
0:00:52 To our knowledge, this is still an open question,
0:00:55 so we included a small pilot test, HASR (Human Assisted Speaker Recognition), in the 2010 NIST Speaker Recognition Evaluation.
0:01:05 The HASR task was to determine whether two different speech segments were both spoken by the same speaker.
0:01:11 The HASR evaluation included two tests.
0:01:14 The first, HASR1, consisted of fifteen trials, that is, fifteen pairs of speech segments,
0:01:19 and the second, HASR2, consisted of a hundred and fifty trials, the first fifteen
0:01:24 of which
0:01:25 were the HASR1 trials.
0:01:27 HASR systems could use human listeners or machines or both,
0:01:31 and anyone who wished to participate was welcome.
0:01:37 Again, each trial consisted of two speech segments,
0:01:41 and the task was to determine whether they were spoken by the same speaker.
0:01:45 There was no time limit,
0:01:48 but it was required that trials be processed separately and independently, one at a time and in order.
0:01:55 For each trial,
0:01:56 each system provided a same-speaker or different-speaker decision,
0:02:00 as well as a numeric score,
0:02:03 where a higher score indicated greater confidence
0:02:05 in a same-speaker decision.
0:02:09 Because of the limited number of trials, the evaluation metric consisted of simply tallying the number of misses
0:02:15 and false alarms.
0:02:17 Let me note that a miss is deciding the segments were spoken by different speakers when they were spoken
0:02:22 by the same speaker,
0:02:23 and a false alarm is deciding the segments were spoken by the same speaker when in fact they
0:02:28 were spoken by different speakers.
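The HASR metric just described reduces to two counts. A minimal Python sketch, assuming each trial is stored as a hypothetical `Trial` record holding the ground truth and the system's decision (the record layout and field names are illustrative, not taken from the evaluation plan):

```python
from typing import Iterable, NamedTuple, Tuple


class Trial(NamedTuple):
    """One HASR-style trial (hypothetical record layout)."""
    same_speaker: bool   # ground truth: were both segments spoken by one person?
    decided_same: bool   # the system's same-speaker / different-speaker decision


def tally_errors(trials: Iterable[Trial]) -> Tuple[int, int]:
    """Return (misses, false_alarms) over a collection of trials."""
    misses = 0
    false_alarms = 0
    for t in trials:
        if t.same_speaker and not t.decided_same:
            misses += 1          # same speaker, but judged different
        elif not t.same_speaker and t.decided_same:
            false_alarms += 1    # different speakers, but judged same
    return misses, false_alarms


if __name__ == "__main__":
    trials = [
        Trial(same_speaker=True, decided_same=False),   # a miss
        Trial(same_speaker=False, decided_same=True),   # a false alarm
        Trial(same_speaker=True, decided_same=True),    # correct
    ]
    print(tally_errors(trials))  # (1, 1)
```

The numeric score each system also submitted plays no part in this tally; only the hard decisions are counted.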
0:02:35 Due to the limited number of trials, it was necessary to select challenging segment pairs.
0:02:41 In each case, one of the segments was a three-minute recording of an interview, recorded over one
0:02:45 of several different microphones,
0:02:47 and the other
0:02:49 segment was a five-minute call recorded over a telephone channel.
0:02:54 For HASR1, segment-pair similarity was determined using an automatic system, and the most similar different-speaker
0:03:02 pairs were selected for
0:03:04 different-speaker trials, and the least
0:03:06 similar same-speaker segment pairs were chosen for
0:03:09 same-speaker trials.
0:03:12 These pairs were then screened by humans
0:03:14 to select the most difficult trials and to eliminate any content cues.
0:03:18 HASR2 was selected in the same way, the only difference being the screening.
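The automatic part of that selection can be sketched in a few lines: rank candidate pairs by an automatic similarity score, keep the most similar different-speaker pairs and the least similar same-speaker pairs, and hand those to human screeners. This is a minimal illustration only; the `(pair_id, score, same_speaker)` tuple layout, the function name, and the counts are assumptions, and the human screening step is not modeled.

```python
from typing import List, Sequence, Tuple

# (pair_id, similarity_score, same_speaker): a hypothetical layout for
# automatic-system scores over candidate segment pairs.
ScoredPair = Tuple[str, float, bool]


def select_hard_candidates(
    pairs: Sequence[ScoredPair], n_diff: int, n_same: int
) -> Tuple[List[ScoredPair], List[ScoredPair]]:
    """Return (hardest different-speaker pairs, hardest same-speaker pairs).

    Different-speaker pairs are hard when they look similar (high score);
    same-speaker pairs are hard when they look dissimilar (low score).
    """
    different = sorted((p for p in pairs if not p[2]), key=lambda p: p[1], reverse=True)
    same = sorted((p for p in pairs if p[2]), key=lambda p: p[1])
    return different[:n_diff], same[:n_same]


pairs = [("a", 0.9, False), ("b", 0.2, False), ("c", 0.1, True), ("d", 0.8, True)]
print(select_hard_candidates(pairs, 1, 1))
# ([('a', 0.9, False)], [('c', 0.1, True)])
```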
0:03:28 All right.
0:03:29 Now that we know all about the HASR evaluation,
0:03:31 let's play a game.
0:03:33 It's called "Same Speaker, Different Speaker,"
0:03:35 and it's played by listening to a pair of speech segments and voting whether they were spoken by
0:03:41 the same speaker.
0:04:19 [audio playback]
0:04:23 Okay, how many people believe it was the same speaker?
0:04:27 Okay, how many different speakers?
0:04:29 Okay, overwhelmingly same, but some different.
0:04:32 Okay, and the second one.
0:04:57 [audio playback]
0:05:16 All right, how many people think same speaker?
0:05:19 Just a couple.
0:05:21 How many different speaker?
0:05:24 Um, well.
0:05:28 That split a little differently, yeah. But you may be surprised to learn that the first one was
0:05:31 a different-speaker trial
0:05:33 and the second was a same-speaker trial.
0:05:37 Yeah, it's true, it's absolutely true.
0:05:39 And let me note that these were the trials in HASR1 that had the most misses and false alarms.
0:05:54 Okay, so let's see how the HASR1 systems did.
0:05:57 On the top are same-speaker trials, and on the bottom, different-speaker trials.
0:06:02 There were twenty systems that participated, from fifteen sites in six different countries.
0:06:07 The green portion of the bars represents correct decisions,
0:06:11 the blue, misses,
0:06:12 and the red, false alarms.
0:06:14 As we look from left to right, we see the trials in order of increasing difficulty for the systems,
0:06:20 and the ones we just listened to
0:06:23 are this trial
0:06:24 and this one.
0:06:31 Here we see individual system performance on the HASR1 trials.
0:06:36 Each bar represents the total number of errors divided by the total number of trials, that's fifteen in
0:06:42 this case.
0:06:43 Again, blue indicates misses and red, false alarms.
0:06:46 The system with the fewest errors
0:06:49 had two misses and no false alarms,
0:06:52 and the system with the most
0:06:54 had four misses and seven false alarms.
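The per-system bar is a simple ratio. A minimal sketch using exact fractions, applied to HASR1's fifteen trials and to the best and worst error counts as heard in the talk (the function name is illustrative):

```python
from fractions import Fraction


def error_rate(misses: int, false_alarms: int, n_trials: int) -> Fraction:
    """Fraction of trials answered incorrectly (misses plus false alarms)."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return Fraction(misses + false_alarms, n_trials)


# HASR1 had fifteen trials.
best = error_rate(2, 0, 15)    # fewest errors: two misses, no false alarms
worst = error_rate(4, 7, 15)   # most errors: four misses, seven false alarms
print(best)    # 2/15
print(worst)   # 11/15
```

Keeping the result as a `Fraction` avoids implying more precision than fifteen trials can support.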
0:07:09 Okay, here we consider the performance of systems from the sites that participated in both HASR1
0:07:16 and HASR2.
0:07:18 The bar on the left for each system represents HASR1 trials, and the one on
0:07:23 the right represents errors on HASR2 trials.
0:07:27 Sorry:
0:07:28 left, HASR1,
0:07:31 and right,
0:07:32 HASR2.
0:07:34 Again, blue is misses and red is false alarms.
0:07:37 And on average,
0:07:42 the HASR1 trials
0:07:43 proved more challenging than the HASR2 trials.
0:07:47 Now, if you took your time and carefully read the fine print
0:07:51 of the SRE10 evaluation plan,
0:07:54 you would discover that we embedded the HASR trials in the automatic system evaluation
0:08:00 to see how the automatic systems would do.
0:08:05 So when we look at the three leading systems in the main evaluation and look at how they
0:08:10 did on the
0:08:12 HASR trials,
0:08:13 this is what we see here on the right.
0:08:16 One thing we should note on this is that
0:08:19 the actual decisions
0:08:21 are being displayed here for the HASR systems,
0:08:23 but we were not able to do that for the
0:08:26 automatic systems, due to the thousand-to-one different-speaker to same-speaker prior probability given in
0:08:33 the main evaluation.
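Why such a skewed prior pushes automatic decisions toward "different speaker" follows from the standard Bayes decision rule on a log-likelihood-ratio score. A minimal sketch, assuming calibrated natural-log LLR scores and equal miss and false-alarm costs; both are assumptions made here for illustration, and the talk states only the thousand-to-one prior:

```python
import math


def bayes_llr_threshold(p_target: float, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum-expected-cost threshold on a natural-log likelihood ratio.

    Decide 'same speaker' when LLR > threshold.
    """
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must be strictly between 0 and 1")
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))


# A thousand-to-one prior, approximated here as P(same speaker) = 0.001.
threshold = bayes_llr_threshold(0.001)
print(round(threshold, 2))  # 6.91
```

An LLR near 6.9 means the evidence must favor "same speaker" by roughly a thousand to one before a calibrated system says so, which is why its raw decisions say little about a small trial set like HASR's.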
0:08:34 So we
0:08:37 adjusted the decision threshold
0:08:39 of the automatic systems so as to produce equal counts of misses and false alarms.
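That threshold adjustment can be sketched as a search over candidate thresholds. This is a minimal illustration, not NIST's scoring code: the function name and the tie-break rule (prefer fewer total errors) are assumptions, and with few trials an exact tie between misses and false alarms may not exist, so the sketch minimizes the gap instead.

```python
from typing import Sequence, Tuple


def equal_count_threshold(
    scores: Sequence[float], same_speaker: Sequence[bool]
) -> Tuple[float, int, int]:
    """Choose a threshold whose miss and false-alarm counts are as close as possible.

    A trial is decided 'same speaker' when its score >= threshold.
    Ties on |misses - false_alarms| are broken by fewer total errors.
    Returns (threshold, misses, false_alarms).
    """
    if len(scores) != len(same_speaker) or not scores:
        raise ValueError("need equal-length, non-empty inputs")
    # Every observed score is a candidate; +inf means 'always decide different'.
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        misses = sum(1 for s, y in zip(scores, same_speaker) if y and s < t)
        false_alarms = sum(1 for s, y in zip(scores, same_speaker) if not y and s >= t)
        key = (abs(misses - false_alarms), misses + false_alarms)
        if best is None or key < best[0]:
            best = (key, t, misses, false_alarms)
    _, t, misses, false_alarms = best
    return t, misses, false_alarms


scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
labels = [True, True, True, False, False, False]
print(equal_count_threshold(scores, labels))  # (0.7, 1, 1)
```

Because it uses only the ranking of scores, this adjustment sidesteps the prior entirely, which is what allows the automatic systems to be compared with the HASR systems on error counts.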
0:08:48 So we saw that the leading automatic systems had noticeably fewer errors than the HASR systems, and the
0:08:54 tests proved quite challenging.
0:08:58 In fact, half the systems got more trials right than wrong in HASR1.
0:09:04 Yes, thank you.
0:09:08 So we leave you with
0:09:11 a couple of questions.
0:09:13 First, was this data appropriate for supporting HASR research?
0:09:17 And where do we go from here?
0:09:22 We are planning another HASR evaluation, to be held in conjunction with SRE12.
0:09:27 We expect there to be two tests,
0:09:29 the first with twenty trials and the second with two hundred,
0:09:33 and the trial selection process is planned to be similar to that in HASR10,
0:09:37 but hopefully with less human screening.
0:09:40 The data will still be in English only,
0:09:43 and the evaluation period is planned to be four months,
0:09:46 which is much longer than the automatic system evaluation typically is.
0:09:52 We welcome your feedback,
0:09:55 so please email us
0:09:56 or speak with us.
0:10:00 I should note that statistical significance is of great importance
0:10:04 to NIST,
0:10:05 so it is of interest to us,
0:10:07 but with so few trials, none can be assigned
0:10:10 to these results.
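To make "so few trials" concrete, here is a minimal sketch of a Wilson score interval around an error rate measured on fifteen trials. The Wilson interval is one common choice for a binomial proportion, used here only for illustration and not a method stated in the talk; it also assumes independent trials, which the deliberate selection of hard pairs does not guarantee.

```python
import math
from typing import Tuple


def wilson_interval(errors: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for an error rate of errors/n."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need 0 <= errors <= n and n > 0")
    p = errors / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half, center + half


# Two errors on fifteen trials: an observed rate of about 13%.
lo, hi = wilson_interval(2, 15)
print(f"{lo:.2f} to {hi:.2f}")  # 0.04 to 0.38
```

An interval spanning roughly 4% to 38% for a single system leaves little room to separate systems from one another.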
0:10:12 We are also interested in ideas on how to improve the trial selection process, so again, please
0:10:18 share them with us.
0:10:20 For more information or to provide feedback,
0:10:23 see our website or speak with us;
0:10:26 the URL is on the paper.
0:10:28 Thank you very much.
0:10:35 So for questions, please come to the mic.
0:11:07 Okay, I would like to have more explanation, if I can,
0:11:11 of the proximity and difficulty, and how you approximated
0:11:17 this proximity.
0:11:19 Exactly. Sure.
0:11:29 So we ran a full matrix of
0:11:36 interview-train, interview-test non-target trials over all speaker pairs.
0:11:40 The thirty-seven speaker pairs were identified
0:11:43 using a threshold on
0:11:45 the scores, where the idea was that
0:11:48 a score was included if it was in the top one percent of
0:11:53 scores in each direction.
0:11:55 Those thirty-seven pairs were chosen, and then
0:11:58 combinations of segments for each speaker pair
0:12:01 were listened to
0:12:03 to determine which would be used
0:12:06 for non-target trials.
0:12:07 As for
0:12:08 target trials,
0:12:10 or same-speaker trials,
0:12:13 we did a full matrix
0:12:15 of the actual segments
0:12:18 and then listened to the segments
0:12:20 that way.
0:12:20 And that was for HASR1, which was
0:12:23 very carefully
0:12:24 screened. For HASR2, the
0:12:25 process was similar, just larger in scale.
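The pair-identification step in that answer can be sketched as follows. This is a minimal illustration under assumptions: scores are held in a hypothetical dictionary keyed by (model speaker, test speaker), higher means more similar, and an unordered speaker pair is kept if any of its directional scores lands in the top one percent of all non-target scores. How the two directions were actually combined is not stated in the answer.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple


def top_percent_pairs(
    scores: Dict[Tuple[str, str], float], fraction: float = 0.01
) -> Set[FrozenSet[str]]:
    """Return unordered speaker pairs with at least one score in the top `fraction`.

    `scores` maps (model_speaker, test_speaker) -> non-target similarity score.
    """
    if not scores or not 0.0 < fraction <= 1.0:
        raise ValueError("need non-empty scores and 0 < fraction <= 1")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))  # size of the top slice
    # frozenset collapses (A, B) and (B, A) into one speaker pair.
    return {frozenset(pair) for pair, _ in ranked[:k]}


example = {("a", "b"): 5.0, ("b", "a"): 4.0, ("a", "c"): 1.0}
print(top_percent_pairs(example, fraction=0.5))  # {frozenset({'a', 'b'})}
```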
0:12:35 A quick question: what was
0:12:36 the percentage of non-native speakers in the HASR2 data?
0:12:40 Just non-native?
0:12:43 Sorry, what was that?
0:12:46 Percent of non-native speakers in HASR, so people who were not native US English speakers.
0:12:52 Let's see,
0:12:55 something in HASR1...
0:13:04 I'm thinking of HASR2...
0:13:17 Oh, I'm sorry, I misunderstood.
0:13:19 Or maybe... are you asking about the trials or about the...
0:13:24 Oh, I'm sorry, yes.
0:13:30 I do not know that offhand, but that is something we can find out for you. For what it's worth,
0:13:34 I will note that everyone who was recorded was recorded in Philadelphia,
0:13:39 but that's a fairly international city, so...
0:13:49 I believe that's correct, but sometimes...
0:14:26 I think there's another question.
0:14:32 What was the gender breakdown? Did you specifically select for a good divide, or did you
0:14:38 choose based upon
0:14:39 the challenge, as in the past?
0:14:42 Sure. I don't have the gender breakdown handy, but this was, um,
0:14:48 just how it fell out. We did not try to balance this, but all trials,
0:14:52 of course, all trials were
0:14:54 same-sex.
0:14:59 Thank you very much.