0:00:15 Well, thank you for that kind introduction, Joe. You're right that luggage is an issue with me, but when I don't have my luggage it's an even bigger issue for me. I appreciate the introduction, and I thank the organizing committee for inviting me, and especially for naming this town; I don't know, Joe, I've never had this happen before when I've come to give a presentation.
0:00:52 So let me start by asking for a show of hands: who among us has participated in a forensic-style evaluation of speaker recognition technology? That's good. I'm going to try to get more hands up, with interest, by the end of my presentation. Who has processed real forensic case data? That's pretty good, okay, so I'll be preaching to the choir for some of you. And finally, who has actually testified in court? That's good; very good, okay.
0:01:41 So let me talk about some of the interesting challenges in forensic and investigatory speaker recognition. The basic introductory material for my talk is to define the problem. In forensic and investigative speaker comparison, speech utterances are compared, and the process can be carried out either by humans or by machines. In the forensic case, typically this is for use in a court of law. This is very high stakes; it demands the best that science has to offer, and those of you who pay attention to trials on television are probably pretty nauseated by what you see out there, in terms of the expert witnesses that I'll be talking about later and the methods they use. The methods vary quite widely, and there is a very nice survey paper by Gold and French that describes some of the variations in these processes, and that variation is not necessarily for the good. It's important that the methods that are used be grounded in scientific principles and be applied properly.
0:03:23 And just as important is deciding when you should not accept a case, when it would be irresponsible to do so. This idea of when to apply, or not apply, the methods is also very important. So we're going to provide some analysis of the methods, and along the way I'll play some exciting examples that I hope will get you excited about how challenging this domain really can be.
0:03:57 And one of the things I want to do here, and in the broader sense at other conferences with wide diversity, is to improve communications among the research community, this great group here, and legal scholars. We have, for example, in speech, people like Bill Thompson, who wrote about the prosecutor's fallacy and was involved in the O. J. Simpson trial, so we've got a number of very high-profile legal scholars in the US involved, and also internationally; and of course the legal systems are different throughout the world, so you have to address these questions within their contexts. And then finally, I'm going to ask this community for help and present some things that you could actually get involved in to help us make progress.
0:05:07 So I'll start by giving some background, cover some example approaches, talk about some of the activities that are currently going on, request some things for the community to get involved in, mention some future ideas, and conclude.
0:05:29 Okay, so forensics and investigation differ primarily by whether the methods will be presented in a court of law. A lot of people doing investigation will try to use a similar process, one with the rigor necessary should it later be presented in a court of law. But the forensic community and the investigative community work on similar problems in terms of trying to establish facts; it's the actual presentation forum where they differ. Now, here I have a cartoon that shows the most canonical example of a speaker comparison: we have a known speech sample and a questioned speech sample, and you compare them, and there's some summary or analysis by the forensic examiner or analyst, who might write a report. And we're not done; that's the simple view of the world.
0:06:48 I was happy when I asked a number of friends for suggestions: Michael Jessen from the BKA kindly provided this table from his summer school that shows a little more granularity in terms of forensic versus investigative, including large-scale investigation, where you might actually be running automatic systems similar to IAFIS, the FBI's Integrated Automated Fingerprint Identification System, which conducts large-scale searches through databases. You can see here that they vary in terms of whether they will be presented in court, what kind of methods are used, the number of comparisons, and the type and style of work done on the data.
0:07:51 So let me now give just a couple of examples of some forensic situations. First, you might remember the 1996 Olympics and the Centennial Park bombing. There was a thirteen-second phone call that said: "There is a bomb in Centennial Park. You have thirty minutes." That's it. So now you've got this thirteen-second call, and the people at 911 are frantically trying to figure out the address of Centennial Park so that they can dispatch officers to the scene. A lot of time passes; they have a short time to clear the park. By the time the officers get there, two people are dead and a hundred and twenty people are injured. And now they have a suspect in custody who matches the description of someone that was seen at a payphone, and that person's name is Richard Jewell. They went to quite a bit of trouble trying to establish whether this person is the one on the call. It turns out the actual person who made the call escaped the scene and was not caught for seven years.
0:09:29 Another very high-profile and recent case: Trayvon Martin. This one had all sorts of the wrong things happening all at once: extreme mismatches of every type imaginable, these outrageous claims of justified shooting, and then, just to make it more interesting, the Orlando Sentinel newspaper decides to go hire some voice experts. I don't know if they quite appreciated the conditions under which they were working. First of all, it's hardly speaker recognition when the person is crying out for help, right? And I'll show you later some of the issues involved in that. So this was a very turbulent time in the US, with a lot of controversy regarding the kind of data that was involved in this case and how inappropriate the whole situation was. We have people, by the way, like George Doddington, who's here today, to thank for keeping the system on the rails; he was one of the expert witnesses.
0:10:57 So how hard is forensic speaker recognition? Well, a first step in that direction, though not truly forensic speaker recognition, was the NIST HASR evaluation. And actually, before NIST HASR there was an evaluation by NFI-TNO that worked with real forensic case data; I'll talk about that in a moment. But in the HASR evaluation, unlike conventional NIST evaluations, where there are so many trials that it's not really practical for humans to process the data, there was a paring down to make the number of trials manageable by humans. The process for doing that was a two-stage selection: first you use an automatic system to find the most confusable pairs, and then you refine that by using humans to find the most confusable pairs among the automatic system's confusable pairs. So you have very difficult data to work with, and the benefit is that now you can have an evaluation with a mere fifteen trials, which is manageable by humans. This was the beginning of the NIST style of evaluations in this direction.
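The two-stage winnowing just described can be sketched in a few lines. Everything below (pair names, scores, cut-offs) is invented for illustration; this is not the actual HASR selection pipeline, just its shape: an automatic pass keeps the most confusable different-speaker pairs, then a human pass keeps the hardest of those.

```python
def select_confusable_trials(pairs, auto_score, keep_auto, human_score, keep_final):
    """pairs: (sample_a, sample_b) tuples from *different* speakers.
    auto_score / human_score: callables; higher = more confusable."""
    # Stage 1: the automatic system ranks all pairs, keeping the top scorers.
    stage1 = sorted(pairs, key=auto_score, reverse=True)[:keep_auto]
    # Stage 2: human listeners re-rank the survivors, keeping the hardest few.
    return sorted(stage1, key=human_score, reverse=True)[:keep_final]

# Toy data: 6 pairs; keep 4 after the automatic pass, 2 after the human pass.
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"),
         ("a4", "b4"), ("a5", "b5"), ("a6", "b6")]
auto = dict(zip(pairs, [0.9, 0.1, 0.8, 0.7, 0.2, 0.6]))
human = dict(zip(pairs, [0.3, 0.5, 0.9, 0.8, 0.4, 0.1]))
hard = select_confusable_trials(pairs, auto.get, 4, human.get, 2)
print(hard)  # the two pairs both stages rated hardest
```

Run at scale, the first stage is what makes the second affordable: humans only ever audition the automatic system's shortlist.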
0:12:37 So I don't know if you've heard these, but let me just play one here. Here is trial eleven; I'll play the two samples, and the question asked is: are these from the same source? Here's the first one. [audio plays] Here's the second. [audio plays] So, it's pretty impressive to me: that's supposed to be, as you can see by the truth label here, two people. Like I said in Brno, I would love to actually meet these two people, see that they really are two separate people, and have dinner with them; maybe it would be a high price for the meal. Those two people confused the humans and the automatic systems consistently in the first HASR evaluation, and it inspired a lot of people to look into this interesting problem. Unlike the traditional NIST SRE protocol, HASR of course allows human listening, so this is exciting. At the time all the data was in English, which might somewhat limit some of the human approaches, but it sure gave a nice flavor of the challenge, and this is difficult. But you know what? It's not nearly as difficult as the real thing, and I'll play that in a moment.
0:14:34 So, some challenges in speaker recognition for humans and machines; I have a few slides. The NIST evals have made progress on things like channel mismatch and distance to the microphone; by progress I mean progress in evaluating these effects; also in terms of duration and cross-language, although I'm not showing those here. So this is good, but there's a lot more going on in a lot of forensic case data. Typically in these scenarios the talkers are unfamiliar to the examiner; the talkers tend to be familiar with each other, and that affects their conversation style. There can be multiple talkers; there are all sorts of different styles: conversational, read-aloud, crying speech, for example, if you want to call it speech; and then accommodation, when you have familiar talkers adapting to each other. If there's a conversation that's part of the evidence, which is often the case, the talkers might be deceptive, and I have examples of this. Sometimes you're dealing with people who are mentally ill or medicated, and there can be all these situational mismatches to deal with. This goes on and on. But here's the thing: evaluations often isolate factors, so if you have an evaluation where you evaluated a few of these factors, the problem is that in real data they are combined in horrible ways, making it even more challenging to determine the performance of a system, or a human, or a human with a system. So you can have mismatch galore between the samples that are being compared, and also against all the information used to train our automatic systems, the background data and the hyperparameters; it goes on and on.
0:16:54 Then you have additional challenges in how this information should be presented, in terms of scores or decisions. We would in general be pretty strong advocates of reporting, say, log-likelihood ratios or something like that. But a lot of the forensic people I work with, the investigators, don't want to hear a log-likelihood ratio; they want to know whether they should go take action. This gets mathematically ugly in a number of ways, because of asserting prior probabilities to make decisions. This is a very hard and tenuous situation, and an area where this community has made some progress; I'm hoping that Odyssey will actually see more in this direction.
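To see why asserting priors is so treacherous, here is a toy odds-form Bayes calculation; the numbers are invented and have nothing to do with any real case. The same log-likelihood ratio supports opposite "go take action" conclusions depending entirely on which prior odds the decision-maker asserts, which is exactly what reporting only the LLR avoids.

```python
import math

def posterior_odds(llr, prior_odds):
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds.
    llr is assumed to be a natural-log likelihood ratio from the comparison."""
    return math.exp(llr) * prior_odds

llr = 2.0  # evidence favours same-speaker by a factor of e**2, about 7.4
# Weak prior: the suspect is only one of about 100 plausible callers.
weak = posterior_odds(llr, prior_odds=1 / 100)
# Even prior: a 50/50 assumption before hearing the evidence.
even = posterior_odds(llr, prior_odds=1.0)
print(weak > 1.0, even > 1.0)  # False True
```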
0:17:51 Then you have this whole issue of calibration, with system scores moving around and drifting, if you will, and this causes chaos among the analysts. One of the biggest challenges in a lot of this is building trust and confidence in the analysts or examiners: if your system starts misbehaving, they might stop using it, or do something kind of crazy. So there are a lot of issues around establishing trust and having the system be reliable, stable, and calibrated.
0:18:30 Then you have the issue of the court's question. We talked about the canonical example: I've got two speech samples; is the source the same? Well, that's not necessarily the question the court hands you. It might be: the other guy has just been murdered, and we don't have any recordings of his voice. So now what do you do? There is a whole bunch of challenges in figuring out how you deal with questions from the courts that aren't known in advance. One of the things I've been pursuing with some colleagues is to see whether those questions are somewhat negotiable, and whether we can build a pretty good menu of the history of these kinds of questions, to help us as developers build systems and acquire data that address the kinds of questions likely to come up.
0:19:37 Then you have this issue with the automatic systems, where people might think they're fully automatic, but often what happens is that there are models that have been built where a human has segmented the speech and decided which speech utterances are assembled to create the models. So you've got this kind of chicken-and-egg problem, right? I'm trying to recognize speakers, but when I'm training my models I need to do some segmentation. So there's that factor to keep in mind also.
0:20:18 Then there is a whole bunch of other things going on here. I've already talked about "when to punt," in terms of not accepting a case; "when to punt" is an expression from American football, and I'm not sure it translates internationally. And then there are some other issues about noise and degradation that are important to keep in mind, and we'll talk more about those in a moment.
0:20:50 So now let's actually hear some real case data. This is pretty fascinating, I think. I'm going to play some examples. The first one, I'll set it up for you: a triple homicide has just been committed. The suspect runs from the scene with one of the victims' cell phones and their Bluetooth, and he's calling his friend to come and pick him up. He's running as fast as he can, the wind is blowing, and it's a very difficult situation. So let me play this. [audio plays] So that has a lot of characteristics that you probably are not used to working with in, say, the NIST evaluations. This is really challenging stuff, and it gets better, because now we have the suspect in custody, and in his jail call he's kind of converted to sounding like Justin Bieber. So listen to this. [audio plays]
0:22:58 So that's a pretty big mismatch, wouldn't you say? I don't know what you would do with data like that. That's just one example of incredible mismatch, and not only between the samples themselves; well, maybe the last one isn't terribly unlike a lot of the training data our systems are built with, but I'd be surprised if our systems have been trained, and have their hyperparameters and background models, knowledgeable of data like that first sample. So this is extreme mismatch not only between the samples but against our systems. Let me play another example of a very complex situation, where you have some pretty stressed, overlapping talkers. [audio plays] How many talkers are there in that situation? It sounded like about three to me, but I'm not sure. And in a part I didn't play, at the beginning, you've got the operator answering 911, and then you hear the person whispering and then putting the phone into their pocket, where they found it later, unfortunately; he was then the victim. So this is the type of situation that gets into questions like: what question am I trying to answer? How many people were present? Who said what? The area of disputed utterances, as it is known in the forensic community. So these guys, of course, are all rounded up, and they're all claiming, "No, it's the other guy that shot him; I was just visiting," and so on. So those are the kinds of challenges you're dealing with.
0:25:24 Another example is a very interesting threat call, and this one has some timeliness about it as well. So listen to this first recording. [audio plays] The audio system in here is pretty good; I don't know if you could make that out, but the guy is basically giving the address of a house that's going to be attacked by gunmen tomorrow. Wow. Better decide what you're going to do. So they decide to bring in a suspect, and here's his interview. [audio plays] So there are a number of things going on. In that first call it seems like the person was, like in the movies, holding a handkerchief over the phone; it sounded like they had marbles in their mouth. In the second one, I don't know if they were medicated or what was going on there, but there is a lot of mismatch in that situation. And for investigative purposes, even though you're not in a court of law, it still has high stakes when you decide to take somebody into custody; I mean, that's a dramatic experience, right? So you still need to be cautious about how to proceed, but it's very difficult to make a quick decision in situations like this.
0:27:20 And this is just a small part of it. As Reva Schwartz at the US Secret Service says, it's always something, every case. There is a case where somebody had a sex change operation between the first sample and the second sample that were being compared. A lot of our systems are gender-dependent, so what do you do? There are just so many challenging situations that come up when you're dealing with real forensic case data. And I should add: when samples get elevated to the level of a national resource like Reva Schwartz, those are the hardest of the forensic cases; the easier ones can be handled at a lower level. So these are very challenging situations.
0:28:26 And one might ask: how do I figure out whether I should process this data, and whether it can be admitted in court? If I'm in the United States, I have an admissibility standard to deal with: Daubert. So, for example, in US federal court, and in about half of the US state courts, the judge will consider the admissibility of scientific evidence. But judges are often the first to admit that generally they're not scientists, so they have this sort of gatekeeper role pushed onto them. The idea, under Federal Rule of Evidence 702 on testimony by expert witnesses, is that the purpose is to assist the trier of fact, the judge or the jurors; if the evidence is going to be very confusing, then it's not admitted. So this is kind of loose, and the courts in the US have tried to structure it; they formed the so-called Daubert test, from Daubert v. Merrell Dow Pharmaceuticals, and basically four or five different factors, depending on how you read it, are introduced in this Daubert test.
0:30:12 So: has the method been, or can it be, tested? Well, one of the nice things about our community is that we do test a lot; I'm not sure that we test on this kind of data. Another: has it been subjected to peer review and publication? Well, our community is very good at publishing papers, and this Odyssey is just one of those excellent forums. Now we're in trouble: does it have a known error rate? Wow. Well, if you tell me what error rate you want, I can find the corpus that will probably give you that error rate. That's not the answer they want to hear, right? They want something pretty solid, much more certain, like, for example, DNA, which by the way also has variability, but that's a whole other story; at least it's relatively small compared to what we experience in the voice world. Are there existing standards controlling its use, and are they maintained? Well, currently there's very little in that area, but I'll be talking in a moment about some activities in that direction in the US, and about learning what's happening internationally, which is one reason I'm glad to be at this workshop. And then the last one is sort of this friendly thing: is it generally accepted by the scientific community? Then you get into all these problems, like: what's a community? What's the scientific community? This last part is also known as the Frye test, which predated the Daubert test.
0:32:12 So, looking at the basic anatomy of a speaker comparison system, you can form two parallel branches that start with feature extraction and creating models, then go through a comparison of the hypothesis that the samples match versus the hypothesis that they don't, and then produce a calibrated match score output. Now, that's fine. However, there are all these knowledge sources under the hood, and all these areas that are ripe for mismatch. So, for example, let's just take an i-vector system. We have this signal processing chain, and the different stages are shown here where we need all these different kinds of background information, whether it's hyperparameter tuning, the universal background model, the total variability matrix, or the covariance matrices that are needed to make these systems successful.
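As a toy illustration of how those buried knowledge sources bite, the sketch below compares one fixed pair of two-dimensional "embeddings" after normalizing them with two different sets of background statistics. It is not an i-vector extractor; the vectors and statistics are invented. The point is only that the score moves substantially when nothing changes but the background.

```python
import math

def whiten(x, bg_mean, bg_std):
    """Normalize an embedding with background-population statistics,
    standing in here for UBM / total-variability hyperparameters."""
    return [(xi - m) / s for xi, m, s in zip(x, bg_mean, bg_std)]

def cosine_score(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) *
                  math.sqrt(sum(q * q for q in b)))

known, questioned = [1.0, 2.0], [1.2, 1.8]

# Background statistics that roughly match the case data...
s1 = cosine_score(whiten(known, [0.9, 1.9], [0.5, 0.5]),
                  whiten(questioned, [0.9, 1.9], [0.5, 0.5]))
# ...versus statistics estimated from mismatched background data.
s2 = cosine_score(whiten(known, [0.0, 0.0], [2.0, 0.1]),
                  whiten(questioned, [0.0, 0.0], [2.0, 0.1]))
print(round(s1, 3), round(s2, 3))  # same pair, very different scores
```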
0:33:30but there's more
0:33:33what about calibration
0:33:35i need to train that system is well
0:33:39and a system that's not calibrated will drive in one is absolutely crazy
0:33:45and you lose their confidence and they'll stop using your system
0:33:51so this is a very important stage it's great the nico heads the paper here
0:33:56on
0:33:56calibration and weights to address this again
0:34:01one of nicholas favourite topics of mine too
0:34:05so basically you want to try to minimize all these nuisance as a some of
0:34:10which
0:34:12if you're processing single here's of samples at a time you can get a good
0:34:17handle on other nuisances are partly due on single pair comparisons
0:34:22those have to deal with logical consistency with the to use two samples matching
0:34:29and then another pair of samples matching but the others powder samples not match and
0:34:34when i say matching i don't mean that in the binary sense i mean scoring
0:34:39high
0:34:42 So, calibration is a good thing; it makes analysts happy and smiling when it works. Thank you to everyone who works on it.
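A minimal sketch of that calibration stage, assuming the common affine model llr = a*score + b fit by logistic regression on labelled development trials. The trial scores below are invented, and the plain gradient-descent fit is a toy, not any fielded calibration tool.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_affine_calibration(scores, labels, steps=5000, lr=0.1):
    """Fit llr = a*score + b by logistic regression
    (label 1 = same speaker, 0 = different speaker)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # residual for this trial
            grad_a += err * s / n
            grad_b += err / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Drifted raw scores: same-speaker trials sit near 5, different near 3,
# so the raw numbers are useless as log-likelihood ratios.
scores = [5.1, 4.9, 5.2, 3.0, 2.8, 3.1]
labels = [1, 1, 1, 0, 0, 0]
a, b = fit_affine_calibration(scores, labels)
llrs = [a * s + b for s in scores]
print([x > 0 for x in llrs])  # positive LLRs for same-speaker trials only
```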
0:34:54 So now what? What do you do if you want to combine these methods? This also gets quite complicated. Do we weight these processes dynamically, taking into account when they are working in areas they've been developed and trained on, and de-weighting them when they are running a little outside the regions they've been developed for? How do we mitigate observational bias? You certainly don't want a human examiner to know the scores from the automatic system before they finish their evaluation. But it gets even more fine-grained than that: sometimes you hear content in the samples you're working on that can bias you, and you might consider removing that content at the expense of working with less data. You've got all these variabilities to deal with: the subjects of the samples themselves, and the humans who are actually conducting the comparison process; not all analysts are alike, for example, and the machines vary as well. There are issues of consistency and repeatability; I already mentioned the desire for logically consistent scores; and then there's having best practices to establish how to use these processes. Remember, one of the Daubert criteria is the existence of standards, and their maintenance, for invoking these processes.
0:36:45 So, to work on this, there are a number of evaluations that can help us. NFI-TNO, I think in 2003, had the very first one on real forensic data; that was a lot of fun. The agreement required that you destroy the data after you were done; unfortunately, we abided by the agreement, so we no longer have that data, but it was really very nice. The good news is that there might be more of that coming. Then we have the NIST HASR series, which isn't quite forensic, but it's probing some dimensions that I think will help us make progress in the forensic domain. And the next SRE might actually have real forensic samples. So, I think it's important to look at all this in the context of the Daubert factors, especially for application in the United States, but maybe throughout the rest of the world as well; they seem like pretty sound principles to me. But if there are additional factors used internationally, I would love to know about them, to make sure they're being addressed, at least in our work, as well.
0:38:14 So, some activities. In the US we have SWG-Speaker, the Scientific Working Group on Speaker Recognition. A lot of the efforts that motivated starting it came from the 2009 report of the National Research Council of the National Academy of Sciences, "Strengthening Forensic Science in the United States." It basically called all of forensic science on the carpet and said: the practice that's used for DNA is a gold standard; the rest of you should model it. It called into question things like carpet fiber analysis and tool marks, things that scientifically just didn't quite have the background in terms of their development, and that's partly because forensic science didn't grow up being developed by scientists. So one area that we've worked real hard to address with the investigatory and forensic voice working group is making progress on things like the different use cases and collection standards; I've already mentioned best practices, or "best practice," pun intended; standard operating procedures; and this new Type-11 standard. The scientific working group has a number of ad hoc committees, including our DET committee, which a number of you would probably be interested in, and committees on best practices, science and the law, and vocabulary, to get the whole community talking together.
0:40:08 The best practices committee, for example, deals with a number of areas, including the collection of audio recordings and the related data that goes with an audio recording: maybe the phone numbers, the handsets used, a number of things like that. Some of those factors should be passed to the examiner; others might cause bias you have to be concerned about. Then there's the transmission part of the standard, known as the Type-11 record, which you'll probably be hearing a lot about; and then the proper application, and also guidelines for examiners and reporting.
0:40:51 So here, for example, is how you form a standard transaction in this Type-11 framework. Basically, you create a transaction that has the known and questioned recordings, and then you've got the two Type-11 records that go with that, governing how to transmit that data. You have Type-2 information about the situation of each of those recordings; then you have another Type-2 that has all the information about the legal framework and justification; and then an overall Type-1 to enact the transaction. You go through a process where you do speaker recognition, scoring, and reporting, and then deliver the report back to the submitter.
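As a rough picture of such a transaction, here is an invented data-structure sketch; the field names are made up purely to show the shape described above, and the real record layouts are defined by the ANSI/NIST-ITL standard itself.

```python
def build_speaker_comparison_transaction(known_audio, questioned_audio, legal_info):
    """Assemble the pieces described above: one Type-1 to enact the
    transaction, Type-2 metadata, and a Type-11 record per recording."""
    return {
        "type_1": {"transaction": "speaker comparison"},
        "type_2_legal": {"framework": legal_info},  # justification for the request
        "records": [
            {"type_11": {"role": "known", "audio": known_audio},
             "type_2": {"situation": "custodial interview"}},
            {"type_11": {"role": "questioned", "audio": questioned_audio},
             "type_2": {"situation": "intercepted call"}},
        ],
    }

txn = build_speaker_comparison_transaction(b"<wav bytes>", b"<wav bytes>",
                                           {"jurisdiction": "example"})
print(len(txn["records"]))  # one Type-11 record per recording
```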
0:41:48 This is just one of seventeen types of transactions currently defined in this effort; I don't have time to go over all of them.
0:42:00 How does one actually arrive at a best practice? You can go through two branches: survey the community to see what candidate best practices are out there, or, the other branch, look for gaps and develop new best practices. But in all cases these go through a validation process that requires evaluation. Then, finally, when they've been evaluated, they will be proposed and accepted as an actual best practice, and maybe a step further as a proposed standard; this is all within the ANSI/NIST-ITL framework. Sometimes you need multiple best practices, especially for human-based approaches, because there's a lot of variability among analysts and in what their different talents are; if we had one standard that said human recognition should be done by structured listening, you would exclude eighty-five to ninety-five percent of the laboratories in the United States. And whenever you do an evaluation, you need to be very careful about the design and collection of data, and about how you keep it going.
0:43:19so there is some new efforts
0:43:22that all talk about later with this sack
0:43:26let me start with this simple request to the community
0:43:30so if you have candidates for best practices, please submit them to SWG-Speaker and
0:43:37the OSAC for consideration.
0:43:42pursue the downward factors, improve robustness,
0:43:46work with the analysts; there's nothing quite as eye-opening as working
0:43:50with an analyst and understanding the challenges they're dealing with,
0:43:54and participate in forensic-style evaluations.
0:43:58that's what we would really like to see
0:44:01the most.
0:44:03so here I just have a couple of ending slides; I'm almost finished.
0:44:08and the idea here is, as I mentioned, OSAC.
0:44:11okay, so the Organization of Scientific Area Committees: this is a new effort.
0:44:16it's housed at NIST.
0:44:18SWG-Speaker here will be absorbed into OSAC as its speaker recognition subcommittee.
0:44:25I've already mentioned the ANSI/NIST-ITL Type-11 records.
0:44:32The IAFPA has a great set of
0:44:36documents and a journal, and even their code of conduct,
0:44:42that you might be very interested in.
0:44:46there are a lot of other organizations. I basically had a list of
0:44:51about a quarter of this slide, then asked some friends for help. Thank you, everybody who sent
0:44:56me things;
0:44:56now I have too many things to actually talk about all of them,
0:45:00so I'll just highlight two here.
0:45:03and in fact,
0:45:06I mentioned the NFI folks are pursuing some new data that's in the
0:45:12forensic domain. I won't steal the thunder from their paper, which is later in the
0:45:16conference.
0:45:17and there are some big efforts in
0:45:20Europe, in the
0:45:23FP7 programme as well, for
0:45:26integrated voice systems that are multimedia, multi-
0:45:32source systems.
0:45:35okay, so let me conclude.
0:45:39speaker recognition is successfully used today in a variety of applications,
0:45:44but must be applied responsibly, with caution,
0:45:47and this is referencing the paper the chair mentioned at the beginning.
0:45:54we need to work more to address the factors in the forensic domain that
0:46:00degrade performance.
0:46:03real case data, as you heard, can be extremely challenging,
0:46:07and right now, if somebody wanted to ask, okay, that first example with the triple
0:46:12homicide, what kind of error rate could I expect
0:46:16in that situation, with all of those downward factors,
0:46:20nobody can answer that, even close.
0:46:26there are many challenges ahead of us
0:46:28that need to be addressed to answer these questions.
0:46:31please contact me if you have any ideas, and I think he said it
0:46:36best:
0:46:37sauna is a very good Finnish way to a decision,
0:46:41so maybe we can talk more about this in the sauna tonight.
0:46:45thank you so much
0:46:55well, thank you, Joe, very much.
0:47:00we ran a little bit longer, but we'd like to have five or ten minutes
0:47:05for questions. So, yes,
0:47:09who wants to
0:47:10begin?
0:47:13wait, the microphone is coming.
0:47:18my question is about the recordings,
0:47:22the recordings with the mismatch, especially the first low-quality one you played.
0:47:26there is the question of the intelligibility of the speech: if even a human cannot
0:47:30understand, for example, the first one, if you can't understand what they say, how
0:47:35can the machine deal with it?
0:47:37so the intelligibility of the speech is one part of it,
0:47:41and maybe one could say, okay,
0:47:45if the problem for a given bit of speech is that no one, expert or
0:47:49not, can understand it, we can exclude it from the beginning, or something like that, right?
0:47:53has this issue been addressed before?
0:47:56the intelligibility issue is an interesting one, because it comes up in one of the
0:48:00very first courtroom fiascos, with the Michigan State Police,
0:48:05with some voice evidence,
0:48:08where the testimony from one of the police was that the
0:48:16voice on that recording
0:48:18can only be this person, to the exclusion of all others. And then the judge
0:48:23played the recording:
0:48:24he couldn't understand it.
0:48:27so then he's asking, what makes you think that?
0:48:32and quickly this was overturned,
0:48:35or ruled out.
0:48:38then, stepping forward,
0:48:40as you saw with the structured listening,
0:48:43the first step there is to transcribe the speech into words and then look for
0:48:48these
0:48:48variations.
0:48:51you're in trouble if you can't transcribe the speech in the first place. Now,
0:48:57one thing that we need to be cautious about with the automatic systems:
0:49:02as long as they can detect speech, which isn't always the case,
0:49:07they'll process the data and produce a score.
0:49:11well, you shouldn't treat it like a black box;
0:49:15that score might be meaningless.
0:49:17so I don't really know how to directly address your question other than to share those
0:49:22observations, but if you're working on that, it would be good to know.
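The caution about black-box scores above, that an automatic system will happily emit a score even when it barely found any speech, can be sketched as a simple gate in front of the comparison back end. This is a minimal illustration, not a method from the talk; the function names, threshold, and stand-in detectors are all hypothetical:

```python
def guarded_score(audio, detect_speech, score_fn, min_speech_s=3.0):
    """Refuse to score when too little speech was detected.

    `detect_speech` stands in for a speech activity detector that
    returns the seconds of detected speech, and `score_fn` for a
    speaker comparison back end; both are hypothetical placeholders.
    """
    speech_seconds = detect_speech(audio)
    if speech_seconds < min_speech_s:
        # Not enough speech: returning no score at all is better
        # than returning a meaningless one.
        return None
    return score_fn(audio)
```

With a detector reporting one second of speech the gate refuses and returns `None`; with ten seconds it passes the comparison score through unchanged.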
0:49:28okay
0:49:29thank you
0:49:31what else?
0:49:40thanks for the talk. Well, at Interspeech in Lyon in France, I attended the
0:49:45forensic tutorial,
0:49:47and he said that when you have a suspect recording, they ask the suspect
0:49:53to repeat something, like to read it,
0:50:01so that it covers the same phonetic pronunciations as in the actual voice.
0:50:03can you just—
0:50:05can I, can I clarify?
0:50:05sorry, I was kind of listening to your presentation about
0:50:10the phonetic content, where you're actually looking at the underlying phones, right? Is that
0:50:15something you follow, a similar type of thing, where you get the suspect to pronounce the same
0:50:20set of phones?
0:50:22yes, so this gets down to,
0:50:25in one area, the methods being used.
0:50:29so the very old,
0:50:32antiquated method known as spectrographic matching
0:50:37actually requires at least twenty word-like units
0:50:42being spoken
0:50:45that match what's in the evidence.
0:50:48so one way they would deal with this is to give the person something to
0:50:53read to get those twenty word-like units.
0:50:56well as you can imagine read speech is disastrous if you're trying to study things
0:51:02like dialectal variation
0:51:05so
0:51:06what's good for the old
0:51:08spectrographic matching process is a disaster for modern
0:51:13methods like structured listening, which, I should add, are inspired by a lot of the
0:51:17methods used in Europe, in Germany by the BKA.
0:51:23so the
0:51:24recordings they could be talking about are done in the old-
0:51:28style manner.
0:51:30just as a subsequent question, then: if we're able to get some kind of speech recognition
0:51:36into the speaker ID systems,
0:51:38where there is some kind of phonetic alignment, is that not beneficial to the
0:51:45forensic community?
0:51:47well, in fact, some speaker recognition approaches
0:51:51have a layer where they're actually doing speech recognition and phone recognition,
0:51:58and a lot of that work was inspired by George Doddington, actually,
0:52:03and idiolect.
0:52:05and sure, whether it's in the recognition system itself or a by-product of the
0:52:11structured listening approach, speech recognition becomes a very important process. Whether it's automatic is a
0:52:19different question.
0:52:22but if there's a lot of data to analyze, it'll overwhelm the analysts if they have
0:52:27to manually do, say, phonetic transcription, which was the approach being used for
0:52:33quite a while.
0:52:35that is, the system I showed on that one slide helps to automate that,
0:52:40to speed the efficiency, in fact.
0:52:49next question.
0:52:52so in your lecture you mentioned DNA as the sort of benchmark, and
0:52:58of course that's scary for us, too; we're never going to be as accurate as
0:53:02they are. I think that's a problem in speaker recognition.
0:53:05but we do have valuable evidence to introduce; it's softer, it's weaker evidence.
0:53:12do you think the American legal system can understand the concept of weaker evidence and how
0:53:18valuable it can be? And do you think a likelihood ratio
0:53:22can be understood by a jury?
0:53:26okay, so there are multiple questions in that one.
0:53:30the first one:
0:53:31it is what the National Academy of Sciences was calling for, with a framework
0:53:37like DNA's.
0:53:39they weren't demanding, although it would be nice, that the performance be on par
0:53:44with DNA,
0:53:45but the scientific background behind DNA, and the very large
0:53:52studies that have been done there on all the evidence, are very nice,
0:53:57except, by the way, when you're dealing with DNA mixtures; but for the time being,
0:54:02just assume single-source DNA samples, because there's a whole other set of issues in
0:54:08dealing with some of those challenges. So
0:54:11DNA is not perfect, but it's extremely good.
0:54:15the next question, about whether jurors will be able to properly understand likelihood ratios:
0:54:23so Bill and colleagues are conducting a survey with mock jurors
0:54:30to actually see, when they're presented with
0:54:34evidence in different forms, whether it's likelihood ratios or a verbal description of what a
0:54:41log-likelihood ratio might mean, how that's interpreted by jurors. I don't know if
0:54:49he's published that paper, but it should be happening soon.
0:54:53and one thing that happened with Dorothy, who is also involved in
0:54:57this study, is that she came up with a very scary
0:55:02statistic: it was something like a quarter of jurors in the US
0:55:08don't understand fractions.
0:55:11what are we going to do,
0:55:13move to Europe? Well, I don't know what the ratio is
0:55:18in Europe, but wow, that's scary. So
0:55:23it's important that the general public be educated.
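Presenting a likelihood ratio alongside a verbal description, as in the mock-juror study just mentioned, is often done with a banded verbal scale. Here is a minimal sketch; the band edges and wording are illustrative only, loosely in the style of published verbal scales, not taken from the talk:

```python
def lr_to_verbal(lr: float) -> str:
    """Map a likelihood ratio to an illustrative verbal statement.

    Band edges and wording are hypothetical, for demonstration only.
    """
    if lr == 1:
        return "no support for either hypothesis"
    if lr < 1:
        # An LR below 1 supports the defence hypothesis: evaluate the
        # reciprocal and swap the hypothesis named in the label.
        return lr_to_verbal(1.0 / lr).replace("prosecution", "defence")
    bands = [
        (10_000, "very strong support for the prosecution hypothesis"),
        (1_000, "strong support for the prosecution hypothesis"),
        (100, "moderately strong support for the prosecution hypothesis"),
        (10, "moderate support for the prosecution hypothesis"),
        (1, "weak support for the prosecution hypothesis"),
    ]
    for edge, label in bands:
        if lr >= edge:
            return label
    return bands[-1][1]
```

The appeal of such a scale is exactly the point under debate here: a juror who cannot work with the number 100 may still work with "moderately strong support", at the cost of hiding how the number was estimated.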
0:55:29I don't know, but if I could comment on this last question: I'm not sure
0:55:35it's useful to ask the question. In fact, I have the answer: most
0:55:41people will not understand the likelihood ratio, and we know that, because even we are
0:55:46barely able to understand the likelihood ratio ourselves, and how
0:55:49it behaves.
0:55:51the reason is that
0:55:53the legal system in all the countries still expects the expert witness
0:55:58to come and explain.
0:56:01you know, we explain to people what it means, but we still give the results.
0:56:06so the issue is not the jury; it's not
0:56:10the layman.
0:56:11so why do we define a report?
0:56:15the calibration issue is used only,
0:56:17according to me, to give the court the opportunity
0:56:22to
0:56:24be able to estimate the quality of what we did, of the science, in
0:56:31the report.
0:56:32the likelihood ratio is defined so that if one expert
0:56:38in a report is using a global, well-defined likelihood
0:56:43ratio, another expert could
0:56:48review it and then argue for or against the method.
0:56:53then we are in a scientific language, not in the court language. Beyond
0:56:58that, the expert, in front of the people,
0:57:00gives his own opinion, taking his own risk,
0:57:05and this is not
0:57:06like calibration at all
0:57:09sorry, I don't want to cut that short, but I would like to allocate time
0:57:13to discuss this question later, maybe. Okay,
0:57:17last question.
0:57:19so, anyone?
0:57:20yes, you,
0:57:24George.
0:57:26well, likelihood ratios are a wonderful thing.
0:57:32the primary issue with the likelihood ratio is that it
0:57:38happens to be the output of a system that's estimating
0:57:42the likelihood ratio.
0:57:44if you actually know the likelihood ratio,
0:57:47it's perfectly wonderful to use.
0:57:50but the likelihood ratio that's output is supposed to be the true one, and most often
0:57:55it's not.
0:57:58maybe what you were just getting at is that we need to keep in mind
0:58:03we're always estimating likelihood ratios, and it's just another
0:58:09area where there's a cost of mismatch.
0:58:12you know, our systems are producing these estimates
0:58:15and
0:58:16using data that probably doesn't
0:58:18look anything like that first real case I showed.
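George's point, that a system outputs an estimate of the likelihood ratio rather than the true value, can be illustrated with a toy score model. Both pairs of Gaussian score distributions below are hypothetical; the point is only that the same raw score maps to very different LR estimates depending on which (possibly mismatched) data the score model was fitted on:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def estimated_lr(score, same_params, diff_params):
    """LR estimate: p(score | same speaker) / p(score | different speakers),
    with each conditional modelled as a Gaussian fitted on training scores."""
    return gaussian_pdf(score, *same_params) / gaussian_pdf(score, *diff_params)

# Hypothetical (mean, std) pairs fitted on clean training scores ...
clean_same, clean_diff = (2.0, 1.0), (-2.0, 1.0)
# ... and on degraded, forensic-like scores where the classes overlap more.
noisy_same, noisy_diff = (0.5, 1.5), (-0.5, 1.5)

score = 1.0
lr_clean = estimated_lr(score, clean_same, clean_diff)   # about 55
lr_noisy = estimated_lr(score, noisy_same, noisy_diff)   # about 1.6
```

Calibration is the attempt to make such estimates trustworthy, and under the mismatch discussed throughout the talk, the calibration itself is what breaks.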
0:58:23so, with that,
0:58:25I don't—
0:58:27I have to close the session, unfortunately, and I want to thank you
0:58:32for your lecture. Okay.