Speech Transcript - Forensic and investigative speaker recognition

0:00:16	Good morning, everybody. Well, my
0:00:19	contribution here will be mostly focused on giving an
0:00:24	overview of some recent guidelines that have been developed in the last two years
0:00:29	concerning, directly or indirectly, automatic speaker recognition systems, or semi-automatic speaker recognition with human
0:00:39	intervention, mainly in the feature extraction.
0:00:41	And the message for this is that, before
0:00:45	doing something with speaker recognition in court, in Europe at least, we should read these
0:00:50	guidelines, because they have been generated after a process of consensus among the community, so I think
0:00:58	they're relevant to the community. So that's the message: if we want to do something, read them.
0:01:02	If you're not from Europe, I think at least they deserve a read,
0:01:06	in order to know what's going on in our environment.
0:01:12	Well, the first one is
0:01:13	the so-called ENFSI (European Network of Forensic Science Institutes) guideline for evaluative reporting in forensic science, which most of
0:01:18	you probably already know.
0:01:20	It was released in two thousand and fifteen; I'll talk about it later.
0:01:24	The second one is...
0:01:26	Does this work?
0:01:30	Something's wrong.
0:01:35	I don't know what's going on.
0:01:38	The second one is a guideline that we have developed in collaboration with the NFI,
0:01:42	with consensus recommendations, on validation of likelihood ratio methods used
0:01:48	for forensic evidence evaluation.
0:01:50	And the third guideline is a guideline that has been
0:01:53	released...
0:01:55	Something's wrong with the computer.
0:01:58	Right.
0:02:38	That's the best for the Windows system.
0:02:41	Okay.
0:02:43	The third one is a set of recent guidelines on methodology and best
0:02:49	practice for forensic semi-automatic and automatic speaker recognition, also developed by
0:02:53	ENFSI, in particular by the Forensic Speech and Audio Analysis Working Group.
0:02:59	Concerning availability, all three of them
0:03:02	are available: the second one is already published in Forensic Science International, and the first and the third are
0:03:07	in the repository of documents of ENFSI.
0:03:11	Some critical recommendations of this guideline are about expressing conclusions in court in general,
0:03:18	not only in speaker recognition but in forensic science in general; they are recommendations for all
0:03:22	forensic science fields.
0:03:25	And there are some critical recommendations in the guideline that I have especially stressed.
0:03:30	The first one is that the expression of conclusions must be probabilistic.
0:03:36	Somehow, [unclear] is recommended in the guideline:
0:03:40	it is recommended to transform the probabilistic statement in the form of likelihood ratios into
0:03:45	verbal equivalents. And what is absolutely stressed is that absolute statements should
0:03:53	be avoided,
0:03:54	like identification, exclusion, or other categorical statements.
0:03:59	The second one is that when one has to define the hypotheses in the case,
0:04:03	that is, same guy or different guy, or "this voice comes from
0:04:08	this guy" versus "this speech segment comes from another person with these characteristics",
0:04:13	one has to consider at least one alternative.
0:04:17	There can be many of them, but at least one.
0:04:20	And a clear definition of the hypotheses
0:04:23	is also mandatory, because the definition of the hypotheses defines
0:04:28	what data we have to handle in order to compute this weight of
0:04:33	evidence.
0:04:34	The third one is that the findings must be evaluated given each of the hypotheses.
0:04:38	So that leads us to
0:04:41	somehow computing a
0:04:44	likelihood for each hypothesis; in the two-hypothesis case we arrive at
0:04:48	a likelihood ratio.
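For reference, the ratio arrived at here can be written in standard notation (the symbols are ours, not the guideline's), with $E$ the findings and $H_1$, $H_2$ the two competing hypotheses, for example "same speaker" versus "a different speaker from the relevant population":

```latex
\mathrm{LR} = \frac{p(E \mid H_1)}{p(E \mid H_2)}
```

An LR above 1 means the findings are more probable under $H_1$, below 1 more probable under $H_2$, and exactly 1 means the findings do not discriminate between the two hypotheses.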
0:04:51	The fourth one says that conclusions are expressed in terms of support
0:04:55	for the hypotheses instead of probability of the hypotheses. This "support for the hypothesis" is
0:05:01	the way of reading it,
0:05:03	and it is quite an easy way to avoid some fallacies in reasoning.
0:05:09	And so it is stressed to express the support, or the weight of the evidence,
0:05:14	in terms of a likelihood ratio rather than a posterior probability ratio.
0:05:17	So "support" is an important word if you want to avoid these kinds of fallacies.
0:05:22	And the last one is that data-driven approaches should be the
0:05:28	final goal.
0:05:29	But in the meantime, there are many people who cannot move in their laboratories to
0:05:34	data-driven approaches, so
0:05:36	the guideline considers the use of subjective judgement, subjective
0:05:41	probabilities and so on,
0:05:43	but it recommends data-driven approaches as, kind of,
0:05:46	a long-term goal.
0:05:49	There's also an example in speaker recognition. It is not an example of what speaker recognition
0:05:54	should be, because it generated some controversy in the ENFSI
0:05:58	Forensic Speech and Audio Analysis group, since there are many ways of doing speaker
0:06:02	recognition.
0:06:03	This is an example of an automatic case; it was generated by people from
0:06:07	[unclear] who used automatic speaker recognition for doing this, but it's
0:06:10	not
0:06:11	exclusive.
0:06:12	It is just an example of
0:06:15	how to do this in a given particular scenario, with a given particular way of
0:06:20	expressing conclusions in speaker recognition.
0:06:23	Well, the second guideline is the validation guideline we have been developing with people
0:06:27	from the NFI and [unclear].
0:06:30	And this guideline is aimed
0:06:33	at recommending that everybody in forensic science who uses likelihood ratios moves
0:06:39	towards
0:06:40	objective evaluation procedures, which is not the case
0:06:45	typically in many forensic science fields.
0:06:48	Here in speaker recognition we do that, definitely; in this conference everybody
0:06:53	uses an experimental environment to validate their methods.
0:06:58	But there are two questions here. First, if you're not used to that, how to
0:07:01	do it,
0:07:03	which
0:07:04	somehow comes down to performance measures: how performance measures should be interpreted,
0:07:10	and which performance measures are relevant.
0:07:13	And the second one is: okay, I have validated my system with these
0:07:18	performance measures, so
0:07:20	how do I put that into play in order to make a technique
0:07:25	able to go to court? There are some recommendations regarding laboratory accreditation, laboratory
0:07:32	procedures and so on.
0:07:35	The guideline is very
0:07:37	particular, quite concrete, but I'm not going to go into many details. The
0:07:41	point is determining whether a likelihood ratio method is able to be
0:07:45	used in court,
0:07:46	and everything should be documented.
0:07:50	We are in the process of [unclear] this guideline into ISO [unclear]
0:07:54	for biometrics.
0:07:56	[unclear] There are some of the people here collaborating, from [unclear]
0:08:01	and from laboratories related to ISO.
0:08:04	And we proposed [unclear] some relevant characteristics. This table is not intended
0:08:09	for you to read, but you can see Cllr, EER, things that
0:08:15	we are used to here, so we contributed this to the general forensic
0:08:18	field. But these performance measures are not
0:08:22	limited to these ones; it is just a proposal,
0:08:24	so the guideline is supposed to be open in that sense,
0:08:27	so everybody can contribute more performance measures. These are the minimum requirements that we
0:08:31	understand the validation process should contain regarding
0:08:36	performance measures. And there's also a strong emphasis,
0:08:39	and most of my colleagues will talk about it,
0:08:43	on the use of relevant forensic data, so laboratory data. It's okay
0:08:48	using a NIST evaluation; that is nice,
0:08:50	but
0:08:52	in the end the critical thing is measuring performance in forensic
0:08:58	conditions, which is extremely tricky. It can be
0:09:02	an extremely tricky issue, and my colleagues will talk about it later. So,
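Cllr and EER, two performance measures commonly reported in this community, can be sketched in a few lines. This is a minimal illustration, assuming the system already outputs likelihood ratios (for Cllr) or scores (for EER) for same-speaker (target) and different-speaker (non-target) comparisons; the function names are ours, and a real validation would use established tools and far larger, forensically relevant test sets.

```python
import math
from typing import Sequence


def cllr(target_lrs: Sequence[float], nontarget_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits (lower is better; 1.0 = uninformative LRs)."""
    # Same-speaker comparisons are penalised when their LR is small.
    c_tar = sum(math.log2(1.0 + 1.0 / lr) for lr in target_lrs) / len(target_lrs)
    # Different-speaker comparisons are penalised when their LR is large.
    c_non = sum(math.log2(1.0 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (c_tar + c_non)


def eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate equal error rate by trying every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a same-speaker score below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker score at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2
    return best_eer


if __name__ == "__main__":
    # An LR of 1 for every comparison carries no information.
    print(cllr([1.0, 1.0], [1.0, 1.0]))
    # Perfectly separated scores.
    print(eer([2.0, 3.0], [0.0, 1.0]))
```

Cllr penalises both poor discrimination and poor calibration of the LR values, whereas EER only reflects discrimination, which is one reason validation reports tend to include both.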
0:09:09	finally, the ENFSI guideline for forensic semi-automatic and automatic speaker recognition,
0:09:14	which was led by Andrzej Drygajlo within the Forensic Speech
0:09:19	and Audio Analysis Working Group.
0:09:21	And this guideline, I think, is compatible with the ENFSI guideline for reporting,
0:09:26	and it is also compatible with the validation guideline that we have been talking about before,
0:09:31	and it also addresses
0:09:33	many other issues,
0:09:35	like the most used technologies and methods, the state-of-the-art methods that are reliable;
0:09:41	the most used features, the features that are typically used in different
0:09:46	approaches and which are more reliable; audio preprocessing; and how a human might be involved
0:09:51	in the process as well. So it's a best-practice guideline,
0:09:56	and it has been developed within the Forensic Speech and Audio Analysis Working Group, where
0:09:59	many of us here have been contributing. So it's a guideline
0:10:05	that presents a high degree of agreement today, I mean.
0:10:09	Okay, that was my contribution.
0:10:15	Thank you, Daniel. [unclear] So we have one minute for a
0:10:21	small, quick question.
0:10:24	In that case we'll have more time
0:10:26	with all the fast talks.
0:10:29	Any questions
0:10:31	for Daniel
0:10:33	on the guidelines?
0:10:40	Then we continue with [unclear].
0:10:41	Yes.
0:10:43	I...
0:10:44	So,
0:10:45	[unclear]
0:10:47	is going to go on with his presentation.
0:10:51	Okay.
0:10:56	Window...
0:11:01	How do you go full screen?
0:11:18	Good morning, everybody.
0:11:20	I'm [name unclear]
0:11:22	from Sweden.
0:11:24	I work for a company, [unclear],
0:11:27	and also
0:11:29	at the University of [unclear],
0:11:32	currently.
0:11:36	I'm going to talk a little bit about
0:11:38	accreditation of forensic speaker comparison,
0:11:42	which we are [unclear].
0:11:44	So, the company:
0:11:46	we have performed casework for around eleven years,
0:11:49	in Sweden, Norway and the US,
0:11:54	approximately fourteen cases,
0:11:55	almost all of them Swedish cases.
0:11:58	There are three
0:11:59	people employed,
0:12:02	almost all part-time,
0:12:04	all employed by the university as well.
0:12:07	And we are the subcontractor of the Swedish National Forensic Centre (NFC),
0:12:12	so basically we handle more or less all the cases
0:12:15	in Sweden.
0:12:17	A small area.
0:12:19	I'll just give you a short...
0:12:22	just quickly talk about the applied methods, mention them,
0:12:26	and then I'm going to talk about the evaluations for accreditation, which is where Daniel's work comes in,
0:12:33	then very briefly what a forensic conclusion in Sweden looks like, and
0:12:38	quite a few questions
0:12:39	to put up there.
0:12:43	So, before explaining the three parts very briefly: there are of course screening processes. The
0:12:51	NFC screening means that...
0:12:53	and that has developed over the years, of course. These days it basically
0:12:58	screens part of the cases, around fifty percent,
0:13:02	and that happens at the NFC
0:13:05	these days, basically.
0:13:06	Before, there used to be a lot more screening in-house for us,
0:13:10	but now the NFC does it more, because it's cheaper for them.
0:13:14	And then there's always a second screening done
0:13:18	at our place as well. And then
0:13:21	[unclear]; we always keep it open, so
0:13:26	that we can actually [unclear] samples during the analysis, even if we have
0:13:29	taken on the analysis
0:13:32	job.
0:13:34	The first part of the analysis is the linguistic-phonetic perceptual analysis.
0:13:39	I also...
0:13:41	these days, in some cases,
0:13:44	it could also begin with a [unclear], depending on how many people are involved.
0:13:48	The linguistic part, you know, goes through different steps of perceptual evaluation.
0:13:53	We try to keep it in some kind of Bayesian manner, so how do
0:13:57	we treat [unclear]...
0:13:59	To keep it very brief: you go through it once and you bias yourself, actually, towards
0:14:05	the one hypothesis, and then you go through it again and you bias yourself towards
0:14:09	the other one.
0:14:10	And two people are always doing this, and a third person
0:14:14	in most cases does [unclear].
0:14:16	Now there are three, more or less, [unclear]
0:14:20	some level,
0:14:22	also
0:14:23	a matter of cost and how much work is put into the case.
0:14:27	The second part is the acoustic measurements that we still do,
0:14:31	which are part of the standard protocol. One is articulation rate, basically produced syllables per
0:14:36	second;
0:14:38	fundamental frequency measures, a few of them; and then the long-term formant analysis,
0:14:45	which nowadays is basically handled more or less automatically,
0:14:49	and, well,
0:14:50	also put into an i-vector system.
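Articulation rate, mentioned here as part of the standard protocol, is in its simplest form a ratio of syllables to speaking time. A minimal sketch, assuming the syllable count and the total pause duration have already been annotated by the analyst (the inputs and function names are hypothetical); unlike overall speaking rate, articulation rate excludes pauses:

```python
def articulation_rate(n_syllables: int, total_s: float, pause_s: float) -> float:
    """Syllables per second of actual speaking time, with pauses removed."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("no speaking time left after removing pauses")
    return n_syllables / speaking_s


def speaking_rate(n_syllables: int, total_s: float) -> float:
    """Syllables per second over the whole stretch, pauses included (for comparison)."""
    return n_syllables / total_s


if __name__ == "__main__":
    # Invented example: 60 syllables in a 15 s stretch containing 3 s of pauses.
    print(articulation_rate(60, 15.0, 3.0))  # 5.0
    print(speaking_rate(60, 15.0))           # 4.0
```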
0:14:53	And
0:14:54	the third part is the automatic systems. Currently there are two systems
0:14:58	active;
0:15:00	we're
0:15:01	evaluating one more system, and there is one research system: four systems altogether.
0:15:08	Guidelines: when it comes to the evaluation for accreditation,
0:15:12	we've been
0:15:13	fiddling around in the dark, basically, not knowing what to do exactly. And I think
0:15:17	for us, we
0:15:19	very much appreciate the work that's been done by ENFSI, and that's true,
0:15:24	but also, maybe especially, since we're on a tight schedule: our next deadline
0:15:29	for accreditation is, like,
0:15:31	[unclear].
0:15:32	So, when I was at a meeting five months ago,
0:15:36	[unclear],
0:15:37	and [saw] this work with Didier and Rudolf.
0:15:42	That guideline is really important for us, on how to treat the validation of automatic systems.
0:15:48	It doesn't solve everything, of course, and there's a lot you can discuss
0:15:51	there, very much, but at least there are some guidelines now that we can follow, and
0:15:55	we know what to do, basically, for the accreditation at least, and then you [unclear]
0:16:00	people discussing.
0:16:02	So these are just examples of some of the plots that they
0:16:05	suggest in the guidelines. For some of them it all looks fine, and, you know, you
0:16:09	can get the figures for each of those plots.
0:16:14	These are some examples of the problems you can start running into. Well, this
0:16:20	is from [unclear].
0:16:23	You create heat maps of the results; in this case they are Cllr means,
0:16:27	but also for equal error rate and so on, for different tests that you run
0:16:31	on a huge telephone database, so
0:16:34	more or less to see what happens when the training samples are [unclear]
0:16:39	and when the test samples get shorter and shorter,
0:16:43	what happens
0:16:45	in the evaluation process.
0:16:46	But if you consider all those plots
0:16:49	and all those figures in an accreditation process, you can realise that
0:16:54	it's going to be quite many pages. Just very briefly, and you don't have to
0:16:57	read all this:
0:16:58	consider how many validations [unclear].
0:17:01	Very quickly, going through that: during these eleven years we've done over a hundred evaluations,
0:17:07	and consider all those different conditions and so on: different durations, like
0:17:12	microphone, distant microphone, mobile recordings, with and without face cover, inside and outside a car, indoor,
0:17:19	outdoor, different languages, different compression. We've done more or less all those,
0:17:25	with different datasets and some simulations. But
0:17:28	you can imagine what a large document that would be,
0:17:33	documenting all those evaluations for the accreditation process.
0:17:38	The perceptual phonetic analysis also has to be evaluated.
0:17:42	Currently, well, it's been a difficulty for us, because there were only two of us before,
0:17:46	and we know pretty much what we have, to some extent at least,
0:17:51	so we've been trying to evaluate each other
0:17:54	back and forth over the years. Now we have a third person; she goes through, basically,
0:17:59	training and
0:18:00	testing, because even though you have a PhD, in speech pathology in her case,
0:18:04	and you have a great ear, you still have to evaluate everything, and you're not really used
0:18:08	to doing forensic analysis on telephone material.
0:18:12	She had to go through a training phase, a testing phase, and then blind evaluations.
0:18:17	We started at a small scale, of course, because it's extremely time-consuming. The last one,
0:18:22	almost like
0:18:23	twenty-three speakers, took her some three days to perform the analyses.
0:18:30	Just quickly showing you what
0:18:33	the National Forensic Centre verbal scale looks like: a nine-point ordinal scale of conclusions
0:18:39	for two hypotheses,
0:18:42	from level plus four to level minus four, with zero in the middle.
0:18:46	It goes something like this: level plus four is "the results are extremely much more
0:18:51	probable if the main hypothesis is true compared to the alternative";
0:18:55	at minus two, they are more probable under the alternative hypothesis compared to the main one.
0:19:01	Behind each level there is a standard range of likelihood ratios.
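The mapping from a likelihood ratio to a level on a symmetric nine-point scale of this kind can be sketched as below. The talk does not give the NFC's actual likelihood-ratio ranges, so the log10 boundaries here are placeholders chosen only for illustration and should not be read as the real scale.

```python
import math

# HYPOTHETICAL log10(LR) boundaries for levels +1..+4 (placeholders, not the NFC values).
PLACEHOLDER_BOUNDS = (1.0, 2.0, 4.0, 6.0)


def verbal_level(lr: float, bounds=PLACEHOLDER_BOUNDS) -> int:
    """Map a likelihood ratio to an ordinal level from -4 to +4.

    Positive levels favour the main hypothesis, negative levels the alternative,
    and 0 means the results are about equally probable under both.
    """
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    magnitude = abs(math.log10(lr))
    # Count how many boundaries the magnitude reaches: 0..4.
    level = sum(magnitude >= b for b in bounds)
    return level if lr >= 1 else -level


if __name__ == "__main__":
    for lr in (1.0, 50.0, 1e-3, 1e7):
        print(lr, verbal_level(lr))
```

Working on log10(LR) keeps the scale symmetric: an LR and its reciprocal land on levels of equal size and opposite sign.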
0:19:08	It's important to remember:
0:19:10	even if you do all these evaluations and you put this, probably thousand-page, document
0:19:14	together for accreditation, every case is unique. How much can you actually infer from all the evaluations
0:19:20	you've done
0:19:21	to each and every case?
0:19:23	It is not easy at all,
0:19:25	even though it looks really good in the evaluation.
0:19:29	There's still a lot of stuff to think about, even though you go through the accreditation
0:19:33	process and you get it done properly;
0:19:37	it's not like the evaluation stops.
0:19:42	And just in general, [unclear] out there as well:
0:19:45	what is needed to have a transparent report?
0:19:49	We still don't know; that's
0:19:51	something that we need to discuss much more.
0:19:54	And who has to be able to understand this report?
0:19:57	Is it the actual...
0:19:59	the jury or judge, the...
0:20:01	or actually another expert, probably, which is basically how it is.
0:20:08	I think that was pretty quick.
0:20:09	Thank you.
0:20:16	Excellent. I mean, we have time for a couple of questions.
0:20:20	Nico.
0:20:22	[unclear]
0:20:28	Why...
0:20:31	It's just two examples, because if I put up all of them
0:20:36	the slide would look crazy, and
0:20:38	so I just took plus four and minus two to give you an example; I could have
0:20:42	taken minus four.
0:20:53	I suppose this is probably more of a comment that we'll get to later, but
0:20:56	based on
0:20:58	the preview we're seeing so far with the first two talks,
0:21:01	one concern I have, if I'm not wrong, is that
0:21:05	the big problem for forensics is all the data, right? So there's the data, and there's
0:21:10	a lot of talk about guidelines and accreditation. So one thing is,
0:21:14	everybody keeps saying the data is the problem, but it keeps
0:21:19	kind of getting put
0:21:22	off. And if we have guidelines and accreditation now, it's going to look like it's
0:21:28	more official, which is disconcerting. It is not really a question, more a discussion of how we're
0:21:32	actually ever going to get our hands around the data issue.
0:21:37	Who would like to answer? All to me?
0:21:40	Well, what I can tell you is that
0:21:42	there is a lot of data,
0:21:44	but of course we can't cover all these conditions with that amount of data. But
0:21:50	to me, it's also the sensitivity of the data.
0:21:53	So I can tell you there's a lot of data, but I can't really tell you
0:21:57	how it's collected or what data it is, because it's all kept behind
0:22:04	secrecy to a large extent, and
0:22:07	that also poses problems, especially in Sweden, when you publish things.
0:22:11	A lot of the evaluations that we've done over the years we can't publish because of that.
0:22:16	I hope that is actually going to change now, but it's a
0:22:20	huge problem. I can't really... I can't give...
0:22:22	Well, if I publish something, I have to be able to give the data, actually,
0:22:26	to another researcher if he asks for it,
0:22:28	or [unclear]
0:22:31	to his institution, so he can actually use the data, for the falsifiability
0:22:35	thing.
0:22:38	But
0:22:39	if you can't do that, you can't really publish anything, so
0:22:43	that's going to be a difficulty. But now we
0:22:46	probably, maybe, can do that anyway, because the organization has changed, the police as well,
0:22:51	but we'll see.
0:22:59	Thank you. Let's go to our next speaker, [unclear].
0:23:28	I'll talk a little bit about some aspects of our work at the
0:23:34	BKA. We have done speaker recognition since the seventies; in the early days it was done automatically,
0:23:38	but the technology wasn't really
0:23:40	ready, and
0:23:44	the method used was the
0:23:45	auditory and acoustic-phonetic method, starting from the eighties.
0:23:49	And since about two thousand five we have used the auditory-acoustic method in combination
0:23:53	with
0:23:54	automatic speaker recognition as well.
0:23:59	Just a few slides.
0:24:01	So you heard from Daniel about these
0:24:04	guidelines
0:24:05	for
0:24:06	semi-automatic and automatic speaker recognition,
0:24:09	and just to repeat two of the aspects: one is that
0:24:14	the outcome of an automatic or semi-automatic method is
0:24:18	the likelihood ratio, so
0:24:20	it's all about
0:24:22	systems that output likelihood ratios.
0:24:26	And another important aspect that Daniel mentioned as well is that validation
0:24:31	of a likelihood ratio method has to be performed with speech samples
0:24:35	that are typical and representative of the speech material that forensic laboratories are confronted with in their everyday work.
0:24:41	So it's got to be
0:24:42	forensically relevant
0:24:45	information.
0:24:47	These guidelines are accessible; even here on the website you might have noticed
0:24:52	that there is a link which
0:24:54	gets you to, well,
0:24:57	the ENFSI website, and there are four documents on there, so it's one
0:25:02	of the documents
0:25:04	on the ENFSI website.
0:25:07	Now,
0:25:07	since we have these guidelines, we have to sort of
0:25:11	practise what we preach, so we have to
0:25:13	get busy
0:25:15	collecting forensic data, forensically relevant data, and we
0:25:21	started doing this a while ago. One of those activities has been published at
0:25:27	Odyssey two thousand twelve,
0:25:30	and this activity is ongoing.
0:25:33	Another one is our collaboration with the NFI:
0:25:36	they
0:25:37	have
0:25:39	compiled, well, they have compiled
0:25:42	the NFI-FRITS corpus, which was documented at Odyssey two thousand fourteen,
0:25:47	and we have a special license to work with it; we were allowed to look
0:25:51	at it with many restrictions and so forth.
0:25:56	Also, in terms of what kind of data we have, the best coverage is
0:26:00	for matching conditions
0:26:03	involving telephone intercept data.
0:26:07	What's more difficult is other conditions, especially mismatched conditions. One type of condition we frequently
0:26:14	have is
0:26:15	comparing
0:26:17	terrorist videos,
0:26:19	people making announcements to the public, disguising the face, and
0:26:25	encouraging people to come to their
0:26:28	training camps and stuff like that,
0:26:31	as opposed to telephone-intercepted recordings, where these guys call home and then there's interception;
0:26:40	it would be a captured telephone conversation.
0:26:43	So this would be a mismatch in terms of technology but also in speaking style.
0:26:48	These guys read something, for example, or maybe memorised or learned it, so it's all
0:26:52	different from natural telephone conversations.
0:26:56	So we do have some data, but it's more difficult to collect.
0:27:00	Another challenge is language. We have casework in several languages and we want
0:27:05	to cover them,
0:27:07	and we do collect data from different languages, but there is a limit to what
0:27:11	we can do. So, as a parallel strategy,
0:27:15	we also investigate the effects, both the size and the type of the effect,
0:27:21	if there is a
0:27:23	mismatch in terms of the data we have. So one type of situation is if
0:27:28	we have a
0:27:29	test corpus but we don't have the right reference population for it, so we
0:27:34	have to use a reference population from another language. What is the effect in
0:27:38	terms of shifting the likelihood ratios if we use the incorrect
0:27:42	reference population? It's not a big effect,
0:27:44	so these kinds of effects are to some extent predictable.
0:27:47	This is what we also try to capture. So language is a
0:27:52	big issue;
0:27:53	it's a...
0:27:54	we can't just do one
0:27:57	language; it's several languages, and we want to cover that.
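The effect of a mismatched reference population can be illustrated with a toy model. The sketch below assumes Gaussian models of same-speaker and different-speaker scores (one textbook way of turning scores into likelihood ratios, not necessarily the method used by the speaker's lab), and all score values are invented; it only shows the mechanism by which swapping the reference population shifts the log likelihood ratio for the same comparison score.

```python
import math
from statistics import NormalDist, mean, stdev


def log10_lr(score: float, same_scores, reference_scores) -> float:
    """log10 LR of one comparison score under simple Gaussian score models.

    same_scores: scores from known same-speaker pairs.
    reference_scores: different-speaker scores obtained with a reference population.
    """
    same = NormalDist(mean(same_scores), stdev(same_scores))
    different = NormalDist(mean(reference_scores), stdev(reference_scores))
    return math.log10(same.pdf(score) / different.pdf(score))


if __name__ == "__main__":
    # Invented scores, purely for illustration.
    same_speaker = [4.0, 5.0, 6.0]
    reference_matched = [-1.0, 0.0, 1.0]  # hypothetical same-language reference population
    reference_other = [0.0, 1.0, 2.0]     # hypothetical other-language reference population
    score = 3.0
    lr_matched = log10_lr(score, same_speaker, reference_matched)
    lr_other = log10_lr(score, same_speaker, reference_other)
    print(f"shift in log10 LR: {lr_other - lr_matched:+.2f}")
```

Because the shift comes from moving the different-speaker distribution, its direction and size can in principle be estimated in advance, which fits the remark that such effects are to some extent predictable.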
0:28:02	These are more practical problems. Now
0:28:05	to move to more conceptual
0:28:08	problems and issues.
0:28:12	One is combining
0:28:14	different kinds of evidence. There is quantifiable evidence,
0:28:18	like likelihood ratios coming from automatic or semi-automatic systems, which is what the guidelines
0:28:23	are about.
0:28:23	There's also qualitative evidence coming from the auditory-phonetic and acoustic-phonetic methods,
0:28:30	and we use both kinds of evidence. I mean, some practitioners and institutions
0:28:35	just work with quantifiable evidence;
0:28:38	others work with both types of evidence, and the question is how to combine
0:28:43	the two.
0:28:44	And since not everything is quantifiable, if we use both methods, eventually we
0:28:50	have to have something,
0:28:53	some strength-of-evidence statements that are not entirely
0:28:57	quantitative. So in the end,
0:28:59	one component is qualitative, and the entire thing has to be qualitative,
0:29:04	because it doesn't...
0:29:06	you cannot calculate all the way through; there is some qualitative aspect. So that's an outstanding
0:29:11	problem,
0:29:14	not unsolvable, of course,
0:29:16	but it is there for all those institutions that use both likelihood-ratio-producing methods
0:29:22	and qualitative methods. The other one, which is probably the most painful problem, is this one
0:29:29	here, about interfacing with the court. So
0:29:34	you can do your stuff, and do it well or not so well, but once
0:29:39	it comes to casework, you have to go to court and interface with people from the court, and
0:29:43	they have a different mindset, different expectations and so forth. And the
0:29:48	situation that we have in Germany is that
0:29:50	the courts in Germany still expect posterior statements.
0:29:54	So they expect things like what you have in this table:
0:29:59	well, identity or non-identity cannot be assessed, or is probable, highly
0:30:04	probable, very highly probable,
0:30:06	or can be assumed with near certainty. This is the sort of stuff they are used to
0:30:09	and that they still expect.
0:30:12	Now, of course, there are discussions and everything, but there is a sort of psychological
0:30:17	inertia against switching to a Bayesian framework.
0:30:21	The ideal...
0:30:24	the idea of the Bayesian framework is that
0:30:26	the speech expert supplies likelihood ratios, or more generally any forensic expert does the same,
0:30:34	and then the court applies the prior odds and calculates the posterior odds from the prior odds
0:30:39	and the likelihood ratios that come from the expert. So that would be the
0:30:43	ideal scenario.
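The division of labour described here follows the odds form of Bayes' theorem: posterior odds = likelihood ratio × prior odds. A minimal sketch with invented numbers; in this framework the expert supplies only `lr`, while the prior odds, and therefore the posterior, belong to the court:

```python
def posterior_odds(prior_odds: float, lr: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = LR x prior odds."""
    return lr * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    lr = 100.0  # invented expert likelihood ratio
    # The same LR combined with two different (invented) court priors.
    for prior in (1 / 1000, 1 / 10):
        post = posterior_odds(prior, lr)
        print(f"prior odds {prior:.4f} -> posterior probability {odds_to_probability(post):.3f}")
```

The same LR of 100 leaves the hypothesis improbable under the first prior and probable under the second, which is why the LR on its own is not a statement about how probable the hypothesis is.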
0:30:45	There is still resistance against implementing it, and only the Netherlands, Sweden,
0:30:51	[unclear]...
0:30:54	I don't know if you sort of can...
0:30:56	[unclear] what the state of the art is on that one,
0:31:01	but this is a topic for discussion.
0:31:04	Just... these are just interfaces, so
0:31:07	this is not...
0:31:09	there are the expectations coming from the court about the sort of things they
0:31:14	want and so forth.
0:31:17	That's basically it.
0:31:19	[unclear]
0:31:27	Good, thank you very much. We have time for a couple of questions.
0:31:42	Could you just say something about how you actually, at the moment, go about
0:31:46	combining
0:31:47	the quantitative and the qualitative data? Is there some
0:31:52	explicit statement about how you do that, and how you integrate any kind of
0:31:57	relationships between those two types of evidence?
0:32:03	What we do with the automatic system is a thing like here, for
0:32:08	example:
0:32:09	this is a plot coming from the guidelines,
0:32:12	and what we have is that we have...
0:32:17	[inaudible]
0:34:11	On the resistance against the Bayesian
0:34:15	paradigm:
0:34:16	could vocabulary contribute to that? Are there good German words for likelihood
0:34:22	ratio, prior odds, posterior odds?
0:34:28	[inaudible]
0:34:38	I think it's easy; I think using German you have to explain the concepts and
0:34:45	everything.
0:34:46	No, I think it's not
0:34:48	a language problem, or not so much,
0:34:51	but more of a [unclear] process in the courts.
0:34:55	The reason I'm asking: in my home language, Afrikaans, we don't really have words
0:35:00	for... we have a word for probability,
0:35:03	[Afrikaans word], but okay, [unclear]
0:35:07	[unclear], because there's not even... these things with likelihood and probability:
0:35:11	it's just a synonym, or [unclear]. I've got no idea how to
0:35:17	say a posteriori
0:35:19	probability, of course; I don't know how to say it.
0:35:22	[inaudible]
0:35:29	since the guy border vocabulary
0:35:34	my comment on there are two sort of got up again and again
0:35:38	which contribute to
0:35:41	at least at least partially to pull this with
0:35:45	interfacing with the legal profession
0:35:49	one of them is support
0:35:53	and the other one is the use of speaker recognition
0:35:56	no if you keep on talking about speaker recognition is not surprising that the cool
0:36:00	thing should one speaker recognition
0:36:03	right and do not this isn't speaker recognition you giving them elected version the speaker
0:36:08	recognition comes with the posterior
0:36:12	i think it's okay for us we understand that i think but
0:36:17	of course the legal profession is something to the if you keep on talking about
0:36:21	forensic speaker recognition
0:36:24	then
0:36:25	so surprising that i'm the one to the size of the sense we will
0:36:29	and secondly this
0:36:31	the one of the things that really gets by backup is this will support
0:36:37	in the likelihood ratio supports the hypothesis
0:36:42	it doesn't well
0:36:45	the like to the meaning of the likelihood ratio is the
0:36:50	hypothesis merges with the post with you when you take two parts into account mm
0:36:56	it can be reversed you know the last iteration of the thousand be robust
0:37:01	it has the meaning else it has a that has no meaning
0:37:05	apps the problems sight talking about this likelihood ratio of support for the prosecution hypothesis
0:37:14	the trouble thing to support a language
0:37:19	this is i know that's what people use i think it's a very bad choice
0:37:27	i
0:37:30	i
0:37:34	what's [unclear], then?
0:37:35	this is, you're talking about, you're trying to
0:37:40	say something about
0:37:42	no, you're trying to say something about the posterior in the absence of the
0:37:47	prior
0:37:49	and i know that there are plenty of other words, but it's
0:37:53	it seems to have established itself as the standard
0:37:57	expression; and again, that's something we can discuss later, but i think that
0:38:04	the [unclear] implicitly states
0:38:08	there is no consideration of all the information supporting a previous opinion, but you use
0:38:15	it in conjunction with
0:38:17	"support for the hypothesis", not
0:38:21	"the results are more likely"
0:38:25	not that i don't understand the sentiment behind the whole thing, but if you say
0:38:31	my likelihood ratio gives support for the prosecution hypothesis or the defence hypothesis, then
0:38:38	[unclear]; i mean, i can see how that wording came to be used
0:38:42	i understand the problem
0:38:45	i would like to stress: it's not the likelihood ratio that supports,
0:38:49	it's the findings that support, via the
0:38:52	weight of evidence, which is quantified [unclear]; well, the findings of different
0:38:59	[unclear]
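The point being argued here, that a likelihood ratio weighs the findings and only says something about a hypothesis once it is combined with a prior, is the odds form of Bayes' theorem: posterior odds = likelihood ratio × prior odds. The sketch below is illustrative only; the likelihood ratio of 100 and the two prior odds are arbitrary numbers, not values from any case discussed in this session.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    lr = 100.0  # arbitrary illustrative strength of evidence
    for prior in (1 / 1000, 1 / 10):  # arbitrary illustrative prior odds
        post = posterior_odds(lr, prior)
        print(f"prior odds {prior:.4g}: posterior odds {post:.4g}, "
              f"P(H1 | E) = {odds_to_probability(post):.3f}")
```

The same likelihood ratio leaves the first hypothesis improbable under one prior and probable under the other, which is why the likelihood ratio on its own cannot be read as the probability of the prosecution hypothesis.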
0:39:09	okay, so next,
0:39:12	anil
0:39:32	good morning
0:39:34	the title of my talk today
0:39:38	is "opening the black box
0:39:40	for forensic automatic speaker recognition", and this talk was
0:39:44	prepared by finnian kelly and myself
0:39:48	we're from oxford wave research
0:39:52	which is an audio and speech r&d company based out of oxford, and our
0:39:56	area of expertise is developing systems for automatic speaker recognition, speaker diarization and audio
0:40:03	fingerprinting
0:40:04	and we've been working in this field
0:40:06	for quite a while; our products are used by law enforcement in the uk and other agencies
0:40:12	in the uk, the us, europe and the middle east
0:40:16	and include [unclear]
0:40:26	the topic i'd like to address
0:40:30	ties in with some of the common threads that have come up already
0:40:34	and
0:40:36	it is the fact that
0:40:37	automatic speaker recognition
0:40:40	is a black box; this is a comment that one of our colleagues
0:40:44	made at one of our conferences, and it stuck with me
0:40:48	and
0:40:49	i think a lot of this work needs to be directed at addressing the fact
0:40:53	that automatic speaker recognition methodology is a black box
0:40:58	over the last few days we've been treated to a variety of new algorithms and
0:41:03	techniques, and, i mean, variations and modifications of different algorithms
0:41:09	it isn't
0:41:10	any surprise
0:41:12	that these mathematically complex methods
0:41:15	are a black box
0:41:16	to the laypeople, the juries, judges and lawyers, and
0:41:21	to a certain extent even to the forensic experts
0:41:24	who are using them
0:41:27	now
0:41:28	as we've seen, recent advances have come with these
0:41:32	with a large number of variables, and doug's comment earlier about it all being about
0:41:37	the data: training and evaluation data, feature modelling and parameter choices; if you have
0:41:43	an evaluation with fifteen systems, with
0:41:45	variations of algorithms where the parameters have been pushed one way or the other
0:41:49	and hyperparameters tested and included, the focus
0:41:54	has been on getting incremental improvements on these large databases
0:41:58	and what i'd like to note is that
0:42:00	the variability in these databases has been designed or controlled
0:42:06	now
0:42:06	how does this fit within the context of opening up this black box, if you've
0:42:11	got real forensic casework, like some recordings [unclear]
0:42:15	how do you use it, and how do you address
0:42:18	the case
0:42:20	but
0:42:22	let's look at the enfsi guidelines for some support
0:42:26	now, the enfsi guidelines talk about any expert method
0:42:30	addressing
0:42:31	balance,
0:42:33	transparency, robustness and logic; these have already been addressed quite well, so i won't go
0:42:37	into them
0:42:39	the things that stick out: balance, for example, that you have competing hypotheses or
0:42:44	propositions, and evidence is considered with respect to these hypotheses and propositions, given,
0:42:53	of course, the background information
0:42:58	and then there was logic
0:43:01	and the fact that, you know, you don't want to
0:43:04	transpose the conditional,
0:43:06	evaluating the hypothesis instead of evaluating the evidence
0:43:13	and robustness, which is slightly different from the sort of speech engineering robustness we usually talk about:
0:43:19	how well the method holds up to scrutiny, how it would really hold up to
0:43:23	cross-examination of the actual techniques that you use
0:43:27	[unclear]
0:43:28	and i think,
0:43:29	quite importantly, something that you don't get in any black box:
0:43:34	transparency
0:43:38	so
0:43:39	how well would the forensic expert be able to explain the methods
0:43:43	and explain the data that goes into
0:43:47	the system that they're using
0:43:49	now let's take a very simple, straightforward example, [unclear] i-vectors [unclear]:
0:43:54	a straightforward automatic pipeline where you're training the ubm
0:44:02	you've got a whole lot of data that you can put into training the ubm
0:44:05	you choose another
0:44:06	set of data for training the total variability space
0:44:10	and then, if you're using lda and plda, you can use yet another
0:44:14	set of speakers [unclear]
0:44:20	and this is all before you do testing, training and validation, equal error
0:44:24	rates and so on
0:44:25	so before you've even got started
0:44:28	you've got data decisions, multiple data decisions: about the ubm training, about the tv matrix,
0:44:33	about the lda and the plda
0:44:36	and this is before considering things like what the relevant population is, the likelihood
0:44:41	ratio method and so on; all of this is embedded within the system
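One way to make these data decisions explicit, rather than leaving them embedded in the system, is to record which data trained each stage of the pipeline and report that alongside the results. The following is a minimal, hypothetical sketch: the class names, stage labels and dataset descriptions are invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StageData:
    """Which data trained one stage of a speaker recognition pipeline."""
    stage: str          # e.g. "UBM", "T matrix", "LDA/PLDA", "calibration"
    dataset: str        # free-text description of the data (hypothetical labels)
    n_speakers: int
    n_recordings: int


@dataclass
class PipelineProvenance:
    """Ordered record of the data decisions embedded in a pipeline."""
    stages: list = field(default_factory=list)

    def add(self, stage: str, dataset: str, n_speakers: int, n_recordings: int) -> None:
        self.stages.append(StageData(stage, dataset, n_speakers, n_recordings))

    def report(self) -> str:
        """One line per stage, e.g. for an appendix to an expert report."""
        return "\n".join(
            f"{s.stage:<12} {s.dataset:<30} {s.n_speakers:>5} speakers {s.n_recordings:>6} recordings"
            for s in self.stages
        )


if __name__ == "__main__":
    prov = PipelineProvenance()
    # All dataset labels and counts below are placeholders, not real corpus sizes.
    prov.add("UBM", "generic telephone corpus", 1000, 10000)
    prov.add("T matrix", "generic telephone corpus", 1000, 10000)
    prov.add("LDA/PLDA", "case-relevant speakers", 100, 400)
    prov.add("calibration", "case-relevant speakers", 100, 400)
    print(prov.report())
```

A record like this does not make the models themselves interpretable, but it lets a reader see at a glance where generic and case-relevant data entered the pipeline.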
0:44:45	and,
0:44:47	going back to doug's comment about it all being about data,
0:44:51	the systems that are developed
0:44:53	with this kind of background data
0:44:56	have to be explicit
0:44:59	about its effects on
0:45:01	the likelihood ratio; at least there needs to be transparency about the effects that
0:45:06	these data have and about how the system is calibrated
0:45:11	that's one part of the problem, the sort of automatic
0:45:15	black box, if you will
0:45:18	somebody could help
0:45:19	now, in the uk, most
0:45:22	of the forensic speaker recognition casework is performed by forensic phoneticians
0:45:27	and they have a lot of experience and knowledge; they understand the material and
0:45:32	the language, they understand the idiosyncrasies of the speech, and they understand the legal
0:45:35	requirements
0:45:37	and
0:45:39	if they want to
0:45:40	include these automatic methods, all the automatic systems give them is scores
0:45:45	and how do you then
0:45:47	connect
0:45:48	this automatic score that you've got with the knowledge that you have about the fact
0:45:54	that this
0:45:55	speaker says
0:45:57	something that is very particular to a region or place?
0:46:01	how do you put these things together?
0:46:03	okay, assuming you even wanted to make your analysis more objective, using likelihood ratios and
0:46:08	evaluating your system performance,
0:46:11	how do you go about doing this?
0:46:14	what generally happens, or happened, was you had two
0:46:18	approaches,
0:46:19	pitted against each other, sort of:
0:46:21	you had a traditional forensic-phonetics-based approach looking at formants and voice
0:46:25	quality and linguistic
0:46:29	characteristics
0:46:30	and then you had the automatic side
0:46:33	which
0:46:33	looks at the spectrum and,
0:46:37	you know, treats it as a signal processing problem
0:46:39	and they were set against each other
0:46:41	sometimes we don't even sit together at conferences
0:46:44	so
0:46:47	it's nice
0:46:48	that there is this common logical platform, which is
0:46:53	beginning to be accepted, which is the bayesian likelihood ratio, and it's
0:46:57	nice because you can have these multiple methods and approaches and they can be put
0:47:04	together in the same direction
0:47:08	i've been working on this problem for quite some years, and with a
0:47:13	lot of colleagues who work on forensic casework
0:47:16	and i really think the
0:47:19	black box issue is
0:47:20	quite an important problem
0:47:24	you've got situations where the forensic expert has systems that they haven't calibrated themselves,
0:47:28	these systems, for example,
0:47:30	and they aren't able to look inside that automatic system
0:47:35	[unclear]; i go back to the point about every case being
0:47:39	unique
0:47:40	and the expert should be able to
0:47:43	set system parameters, and be able to use
0:47:45	new data at every step of the speaker recognition process
0:47:49	and in some sense,
0:47:50	and this
0:47:51	doesn't just go for commercial systems,
0:47:55	i think
0:47:56	the expert should not be limited to these prepackaged, preprocessed, manufacturer-provided models and
0:48:02	configurations
0:48:03	and they should be able to train the system specifically for the problem domain
0:48:08	and it was in this context, [unclear],
0:48:14	that
0:48:16	we looked at one approach, and this is by no means the only good
0:48:20	or only way of doing things
0:48:22	but,
0:48:24	you know,
0:48:26	[unclear]
0:48:27	putting together an automatic system that was built with an open-box
0:48:33	architecture, if you will, so one with flexibility
0:48:36	in the features that you put in; so you could use automatic spectral features like
0:48:40	mfccs and so on
0:48:42	but, and this is important, you could also use traditional forensic parameters like formants
0:48:47	and then
0:48:48	[unclear], you can use user-provided features, again applying
0:48:54	the strength of these mathematical modelling techniques like i-vector plda, gmm and gmm-ubm
0:49:01	and
0:49:01	you can use them within the context of these [unclear] features
0:49:07	and
0:49:08	the benefit of doing this was that you were able to introduce new data at all stages
0:49:12	in the i-vector pipeline or the gmm-ubm pipeline
0:49:16	and
0:49:17	to a certain extent adapt the system to the conditions of the case; now
0:49:22	you might ask, does this make
0:49:24	this big black box
0:49:27	transparent?
0:49:28	no, it doesn't
0:49:30	it's still as complicated as it is
0:49:32	but what it tries to do is open it up
0:49:36	to what goes into it and what data went into it, and
0:49:42	allow for validation that's more meaningful
0:49:45	in the context of the case
0:49:55	thanks, anil; so there's only time for one quick question
0:49:59	[unclear]
0:50:03	[unclear] the speaker
0:50:04	anyone? very quickly, and then
0:50:10	then the question itself
0:50:12	i'm another speaker, so i'm biased, so i'm sorry for that, but this is
0:50:16	a very interesting topic, the black box thing and so on, and i think
0:50:20	that,
0:50:22	in my opinion of course, you're right, yes, because i think that when a forensic expert is going
0:50:27	to court, [unclear], he needs to understand what's going on and
0:50:31	what type of data he's using, what type of algorithms he's using,
0:50:36	and whether that fits your specific case; yes, it's obvious, that's the main
0:50:42	forensic problem, that every case is different and you need to have some flexibility
0:50:46	but
0:50:47	be careful with that, because
0:50:49	if you create a system where you can tune everything
0:50:52	then you make unsolvable the problem that was mentioned before
0:50:57	because if you want a system that is validated
0:50:59	and at the same time you can change everything every time
0:51:03	that's going to be a problem, because then you are going to need to validate it in
0:51:06	every single case; so for me that creates
0:51:11	a big problem, also with time, because you need to
0:51:15	change data, and sometimes it's not easy to change data in the form of
0:51:20	audio files and so on; if in every single case you need
0:51:25	different parameters, different [unclear], that also makes it more difficult [unclear]; so
0:51:31	i think that
0:51:32	we need to find a place where you balance both things: transparency and openness
0:51:36	of the system, but also [unclear] some sort of specific things
0:51:42	in the system, just to
0:51:45	make the validation of the system
0:51:47	[unclear]
0:52:54	okay, thank you; thank you, anil; in any case we can continue, maybe;
0:52:58	this is an interesting discussion
0:53:00	after the next speaker
0:53:02	[unclear], well, actually, some of these points [unclear]
0:53:06	on the other hand, [unclear] in this challenge, so you can also continue with
0:53:11	him
0:53:15	okay, i'm going to tell you about, as simon introduced, a multi-laboratory evaluation
0:53:21	of forensic voice comparison systems
0:53:23	that is being organised by myself and my former phd student ewald enzinger
0:53:32	so i think we've already talked about the need for evaluation of forensic evidence
0:53:37	this goes across all branches of forensic evidence; there have been calls since the nineteen sixties
0:53:42	for forensic voice comparison to be evaluated under realistic casework conditions, but i think,
0:53:49	just from what everybody here has said, this still goes widely unheeded
0:53:58	so our contribution to this is to run this forensic evaluation, which we're
0:54:02	calling forensic_eval_01
0:54:05	it's designed to be open to operational forensic laboratories; we especially want
0:54:10	them to take part
0:54:13	it's also going to be open to research laboratories
0:54:16	and we're providing training and test data representing the conditions of one forensic case
0:54:23	so we're providing the data, and it's based on a relevant
0:54:28	population for the case, it's based on the speaking styles for this particular case, and
0:54:32	also on the particular recording conditions for this case
0:54:35	and
0:54:37	we are going to have the papers reporting on the evaluation of each system published
0:54:42	in a virtual special issue of speech communication
0:54:46	so the call for papers, the system is not quite set up yet, but i'm hoping it'll
0:54:51	be done maybe later this week or next week
0:54:57	the
0:54:58	information, if you want information that's already available, you can find it
0:55:02	by going to my website
0:55:04	and you can get started if you want to start
0:55:09	so there's an introductory paper which is already available, a draft of it at least is already
0:55:14	available, and it includes a description of the data and it includes the rules for
0:55:19	the evaluation
0:55:22	each paper that's evaluating a system needs to describe the system in sufficient detail that it
0:55:26	could potentially be replicated
0:55:28	and we're thinking about the level at which it could be replicated by forensic practitioners who
0:55:32	have the requisite skills and knowledge and facilities
0:55:37	we're not putting a tight deadline on this; people working in operational forensic laboratories are very busy
0:55:43	and
0:55:45	their priority is to actually do casework, so we're giving a two-year time period
0:55:50	within which people can evaluate systems and submit
0:55:57	so, a disclaimer: casework conditions vary substantially from case to case
0:56:03	basically, i'm of the opinion that at this stage you essentially do have to evaluate
0:56:08	your system on a case-by-case basis, because the conditions are so variable from case
0:56:14	to case
0:56:17	and what that means is: whatever results one gets out of taking
0:56:23	part in this evaluation, one should not assume that those are generalisable to other cases
0:56:27	unless one can make a case that, yes, the conditions of this other case are very
0:56:31	similar to the conditions in the forensic_eval_01
0:56:35	case
0:56:38	so, a little bit about the data: it's based on a real case; as i said, the
0:56:42	offender recording is of a telephone call made to a financial institution's call centre; this is just
0:56:48	something,
0:56:49	this picture is just something i stole off the internet; it's a landline recording at the
0:56:56	call centre, and it has babble and typing background noise; it's saved in a compressed
0:57:01	format because of course they want to reduce the amount of storage that they need; it's
0:57:05	forty-six seconds long, and it is clearly an adult male australian english speaker
0:57:09	the suspect recording: we should be able to get a nice high-quality suspect recording, yes?
0:57:16	okay, right, oh no, i have a pointer over there; right, this is the
0:57:20	actual room the suspect recording was made in; you see it's nice hard
0:57:24	walls, and i think the person taking the picture is, like, in the
0:57:28	opposite corner of the room
0:57:30	right, imagine what the reverberation is like; and you see this here
0:57:35	is [unclear]
0:57:36	and the microphone is in this box
0:57:41	so
0:57:42	there are problems with the suspect recording as well, but that's pretty typical of
0:57:48	the sorts of problems that we experience in real forensic casework
0:57:52	so the data that we're providing come from a database we collected;
0:57:57	the whole database is actually available
0:58:01	but this is extracted from that database: we've got male australian
0:58:04	english speakers, we have multiple non-contemporaneous recordings of each speaker, we have multiple speaking tasks in each
0:58:11	recording session
0:58:13	we've got high-quality audio, so we recorded, we actually had to record
0:58:18	speakers from the relevant population, we had to record the relevant speaking styles
0:58:23	but then what we've done is take the high-quality audio and simulate
0:58:26	the technical recording conditions that i just mentioned [unclear]
0:58:33	so we have training data from a hundred and five speakers; so if you're
0:58:38	used to
0:58:39	nist sres, that sounds ridiculously low, but i think availability
0:58:46	of relevant data is a major problem in forensic voice comparison
0:58:50	and that's
0:58:51	actually quite a lot of data compared to what people
0:58:55	can usually manage to get
0:58:56	and the test data comes from a total of sixty-one speakers
0:59:01	so i think i have time to show you some preliminary results
0:59:06	based on the data from forensic_eval_01
0:59:10	so these are results that ewald and i actually got; this is not
0:59:15	part of the special issue in speech communication, it's something that we did
0:59:21	previously, which has already been submitted, but it's on almost exactly
0:59:27	the same data
0:59:29	so this example is looking at an i-vector system: mfccs, ubm, t
0:59:35	matrix, lda, plda, and then a score-to-likelihood-ratio conversion at the end
0:59:45	using logistic regression
0:59:48	and we trained two different versions of this system; one is using generic data
0:59:55	at the first training level: it's not using the data i
0:59:58	just talked about, it's using a whole bunch of nist sre data, about an
1:00:03	order of magnitude more speakers and two orders of magnitude more recordings
1:00:07	and we use the generic data for everything to get to the score, training
1:00:11	all the models to get to the score, and then we use the case-specific
1:00:15	data for training the model that goes from the score to the likelihood ratio, so that
1:00:19	logistic regression model at the end
1:00:21	that's a fairly typical way of doing things
1:00:25	because you do all the hard training up front
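The final stage described here, converting scores to likelihood ratios with logistic regression trained on case-specific same-speaker and different-speaker comparisons, can be sketched in a few lines. This is a minimal illustration of linear logistic-regression calibration in general, not the presenters' implementation: it uses plain gradient descent so that it needs only the standard library, weights the two classes equally so that the fitted logit can be read as a natural-log likelihood ratio, and the training scores in the test are toy numbers.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_linear_calibration(target_scores, nontarget_scores, steps=2000, step_size=0.5):
    """Fit llr(s) = a + b*s by logistic regression with the two classes weighted equally.

    Equal class weighting corresponds to prior odds of 1, so the fitted logit
    can be read as a natural-log likelihood ratio.
    """
    a, b = 0.0, 1.0
    n_t, n_n = len(target_scores), len(nontarget_scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        # Same-speaker trials: d/df of -log(sigmoid(f)) is sigmoid(f) - 1.
        for s in target_scores:
            r = _sigmoid(a + b * s) - 1.0
            grad_a += 0.5 * r / n_t
            grad_b += 0.5 * r * s / n_t
        # Different-speaker trials: d/df of -log(1 - sigmoid(f)) is sigmoid(f).
        for s in nontarget_scores:
            r = _sigmoid(a + b * s)
            grad_a += 0.5 * r / n_n
            grad_b += 0.5 * r * s / n_n
        a -= step_size * grad_a
        b -= step_size * grad_b
    return a, b


def score_to_llr(score, params):
    """Apply the fitted affine map; returns a natural-log likelihood ratio."""
    a, b = params
    return a + b * score
```

In the "generic data up front" variant described above, only the scores passed to `train_linear_calibration` come from case-specific data; everything that produced those scores was trained on generic data. In practice one would use a tested calibration toolkit rather than this hand-rolled optimiser.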
1:00:28	right, and we did another system where we use case-specific data all the way through:
1:00:33	we train the models that get to the scores using case-specific data, and then
1:00:36	we train the score-to-likelihood-ratio model using case-specific data
1:00:40	and here are some results in terms of cllr, if you know cllr,
1:00:45	as a measure of accuracy
1:00:48	okay, so the case-specific one,
1:00:50	the one using case-specific data all the way through, performed much better
1:00:56	than using generic data to get to the score and then case-specific data
1:01:00	for the score-to-likelihood-ratio conversion
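For readers who have not met it, Cllr (the log-likelihood-ratio cost) averages a logarithmic penalty over same-speaker and different-speaker test trials: same-speaker trials are penalised by log2(1 + 1/LR) and different-speaker trials by log2(1 + LR), with the two class averages weighted equally. Lower is better; a system that always reports LR = 1 scores 1, and strongly misleading likelihood ratios are penalised heavily. The sketch below assumes natural-log likelihood ratios as input.

```python
import math


def _softplus(x: float) -> float:
    """log(1 + exp(x)) computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost in bits; inputs are natural-log likelihood ratios.

    Same-speaker trials contribute log2(1 + 1/LR), different-speaker trials
    log2(1 + LR); each class is averaged separately, then weighted equally.
    """
    target_term = sum(_softplus(-l) for l in target_llrs) / len(target_llrs)
    nontarget_term = sum(_softplus(l) for l in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (target_term + nontarget_term) / math.log(2.0)
```

Because both terms grow without bound as a likelihood ratio points further in the wrong direction, Cllr reflects calibration as well as discrimination, which is why it is a natural summary for comparing the two versions of the system.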
1:01:04	and if you like tippett plots, here are tippett plots: there's the generic
1:01:08	data system, there's the case-specific one
1:01:12	and if you understand tippett plots, that's a huge difference
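A Tippett plot shows, for a set of test trials, cumulative proportions of same-speaker and different-speaker likelihood ratios as a function of a threshold on log10 LR. The sketch below computes the two curves under one common convention (same-speaker proportion at or below the threshold, different-speaker proportion at or above it); conventions differ between authors and tools, so treat the orientation as an assumption. It also returns the rates of misleading evidence, the parts of the curves that cross log10 LR = 0.

```python
def tippett_curves(same_log10_lrs, diff_log10_lrs, thresholds):
    """Empirical curves for a Tippett plot, evaluated at each threshold.

    Convention assumed here (it varies between authors and tools):
    - same-speaker curve: proportion of same-speaker log10 LRs <= threshold
    - different-speaker curve: proportion of different-speaker log10 LRs >= threshold
    """
    n_same, n_diff = len(same_log10_lrs), len(diff_log10_lrs)
    same_curve = [sum(v <= x for v in same_log10_lrs) / n_same for x in thresholds]
    diff_curve = [sum(v >= x for v in diff_log10_lrs) / n_diff for x in thresholds]
    return same_curve, diff_curve


def misleading_rates(same_log10_lrs, diff_log10_lrs):
    """Proportions of trials whose LR points the wrong way (log10 LR of 0 is neutral)."""
    same_wrong = sum(v < 0 for v in same_log10_lrs) / len(same_log10_lrs)
    diff_wrong = sum(v > 0 for v in diff_log10_lrs) / len(diff_log10_lrs)
    return same_wrong, diff_wrong
```

The two curves can then be drawn as step plots with any plotting library; a shift of both curves in the same direction is the kind of bias pointed out in the batvox results that follow.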
1:01:18	david van der vloed, who has already been mentioned, is
1:01:22	doing very well in this presentation for not having been here
1:01:26	so he has already started doing the evaluation, and we've got
1:01:32	some results from him, and he's kindly allowed us to show the results here; he
1:01:37	was testing different user options in batvox; one user
1:01:42	option is the reference population
1:01:44	he put in either data from all the, he put in data from all
1:01:48	hundred and five speakers, or let batvox select a subset of thirty; and he
1:01:53	tried using no impostor data, and tried using impostor data from all hundred and five training
1:01:58	speakers
1:02:00	here are the results; to summarize, if you use
1:02:05	data from all the speakers instead of having batvox select a subset, you get
1:02:08	better performance
1:02:10	if you use impostors versus not using impostors, using impostors gives you better performance
1:02:15	so the combination of the two gets you the best performance
1:02:20	and if you look at the tippett plot, one thing that's clear to
1:02:23	notice is that when you're only using the thirty speakers selected by batvox, there's a
1:02:28	clear bias here; there's maybe a bias there too, but it's less
1:02:32	clear
1:02:35	okay, [unclear]
1:02:43	thank you; so we have just time for one question before we move into the
1:02:50	final phase, for open questions on all the presentations; remember the session ends at
1:02:56	nine forty-five, so there's less than ten minutes
1:03:01	so if we could begin with some questions for geoff, that'd be great
1:03:20	assuming the data was
1:03:22	totally appropriate,
1:03:24	is it viable to do a comparison of the two systems that you put up,
1:03:29	based on your evaluation?
1:03:38	i was prepared for that question
1:03:41	here are the best ones; so this,
1:03:44	the red one, is the best of the batvox systems, and
1:03:49	the blue one is the best of the i-vector systems we did
1:03:53	and
1:03:55	the blue one is better in terms of cllr, and there's the difference
1:04:00	in terms of the tippett plots as well
1:04:05	right, and i think, going back to
1:04:11	just our systems, the two versions of our system, i think
1:04:15	the big difference was using case-relevant data all the way through
1:04:19	versus using a lot of generic data
1:04:23	to get to the score level
1:04:27	and i think batvox works better than our system that used generic data at
1:04:31	the beginning, but i think ours worked better than batvox because we used
1:04:36	case-relevant data all the way through
1:04:52	what's the difference in the likelihood ratios for the data?
1:04:56	that's the crucial thing
1:05:02	sorry?
1:05:05	what was the outcome? so you've compared two systems
1:05:12	but i would like to know what the difference is in the likelihood ratios
1:05:17	that the systems gave you for the actual comparison
1:05:22	for the actual case, yes
1:05:26	well, there is,
1:05:27	we haven't, we haven't tested that; when we did the actual
1:05:30	case, we chose one system, we used one system
1:05:35	so for doing the casework we chose one system, we validated the
1:05:39	performance of that one system, and we didn't
1:05:42	go out and try a whole bunch of other systems on the actual
1:05:46	case
1:05:49	right, because casework is not
1:05:54	a research activity; we're not trying to choose the best one; and also the problem
1:05:59	that comes up is, okay, you might say we chose three or four different systems and
1:06:04	then we picked the one that was the best
1:06:07	we would then be overtraining, so
1:06:11	we'd be overtraining on the test set
1:06:13	we'd have optimized to the test set rather than to the previously unseen actual suspect
1:06:18	and offender recordings
1:06:20	and then there's also the problem of, you know, well, okay, you've presented
1:06:24	three different systems, which one should we believe?
1:06:28	precisely, that's what i'd ask if i were the defence counsel; yes, and
1:06:36	not that i would've expected it, but suppose one of the systems gives you
1:06:41	a log lr of minus five and the other one gives you a log lr of
1:06:48	twenty
1:06:50	right, so certainly; so what we would do, what we do
1:06:54	in our practice, is we
1:06:57	pick the system that we're going to use,
1:07:01	we optimize it to the conditions of the case, we then freeze the system
1:07:06	we then test the system using test data
1:07:10	and after that we don't go back and change the system again; that's it,
1:07:13	that's how well the system works; and then the last thing we do is the actual
1:07:17	suspect and offender recordings
1:07:19	so we don't go "gee, i got an answer; gee, let's see, i got a
1:07:24	relatively low likelihood ratio; who's paying me? the prosecution; they want a high one;
1:07:28	i'll go back and tweak the system, and i can get a better answer"
1:07:31	so we keep a strict chronological order to avoid any of that
1:07:37	not that anybody suggested that we would be doing anything like that
1:07:40	yes, i understand that, but we're talking about different systems; we know all
1:07:45	about the freezing of the system, but the moment we're
1:07:49	comparing systems
1:07:51	that's what i'm talking about; so the results where we're comparing systems are
1:07:55	across a whole bunch of test trials; it's averaged over a whole bunch
1:07:59	of trials
1:08:01	so for the comparison of the two different systems, based on this
1:08:05	you might decide
1:08:07	that you wanted to use one of these; you might decide you wanted to use the
1:08:10	best-performing system, but
1:08:14	in future cases you might decide to choose one of those systems, but
1:08:19	if the conditions of the future case are different, we then test the
1:08:24	performance of the system under the conditions of that new case
1:08:28	i might have decided on the basis of this case, but i'm not taking this
1:08:31	case as the validation for a case whose conditions are very different
1:08:50	[unclear], but i guess my question goes to both michael and
1:08:54	geoff at some point
1:08:56	okay
1:08:57	as you go through your casework,
1:09:00	most judges are not experts in speech or speaker verification, so if you're working with, for
1:09:10	example, tippett plots, do you present those in court proceedings, and if so,
1:09:18	how do the defence and prosecuting attorneys actually
1:09:23	approach you about the support, about tippett plots, or about how you present results?
1:09:33	yes, in casework, in recent years, we did include the tippett plot
1:09:38	together with the case-specific results, as described before; so when we do, we explain
1:09:44	everything and try to make it easy and so forth; we're not shielding the
1:09:49	court from those results; we're giving them the results and trying
1:09:53	to explain them as well as
1:09:56	possible
1:10:01	okay
1:10:03	yes, all of that we put in our reports; of course we include
1:10:08	the validation of the system
1:10:11	and typically i send it to the lawyer
1:10:15	and then they call me and they start asking me questions: what does
1:10:18	this mean, what does this mean; and i say, no, okay,
1:10:22	i'll come to your office, we'll spend a day together, i'll go through the
1:10:25	basics with you so that you've got a sufficient level of understanding
1:10:30	and then the next day you can ask specific questions about this particular case
1:10:35	and this particular report; and so, you know, sometime in the mid-afternoon we
1:10:42	get to that level; so we start with the very basics, what's a likelihood
1:10:45	ratio, and sometime by mid-afternoon we get to the testing level and explaining
1:10:50	what something like a tippett plot means
1:10:53	and then you get to court
1:10:56	and the court seems to be designed to prevent this transfer of information from the
1:11:01	expert to the trier of fact
1:11:04	because, you know, if
1:11:06	you were going to train somebody, what
1:11:09	would you do? you might send them something to read beforehand, you give them
1:11:12	a little lecture, you get them to ask questions, you ask them confirmation questions
1:11:17	to see that they understand; but in court, it's
1:11:20	the lawyer who asks you the questions, and you answer only those questions, and the
1:11:24	jury isn't allowed to ask questions; so
1:11:28	getting the trier of fact to understand this
1:11:32	is a serious problem
1:11:36	[unclear]; we don't have good solutions
1:11:40	to that one
1:11:52	thank you
1:11:53	i just have a suggestion, which is
1:11:56	to stop
1:11:57	seeing the likelihood ratio as a single number
1:12:01	a likelihood ratio is not a number, it's a ratio, and it's very important to be able
1:12:06	to present
1:12:07	both parts of the ratio: the similarity and the typicality
1:12:13	it's really important, [unclear], because when you're
1:12:19	changing the reference population
1:12:21	it could be very interesting for the court to
1:12:24	make the link between the similarity and typicality values and
1:12:28	your decision about the
1:12:31	reference population
1:12:34	[unclear]
1:12:41	[unclear] some newer software [unclear] will give you
1:12:47	a view of where the likelihood ratio is calculated from: where the evidence
1:12:54	intersects
1:12:56	with the distributions,
1:12:58	with the suspect distribution and with the
1:13:04	distribution coming from the reference population; so you see the
1:13:08	two distributions, the case, and then the point, and you can see
1:13:13	how the likelihood ratio is calculated
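The two parts of the ratio mentioned here can be made concrete with a single measured feature: the numerator (similarity) is the density of the evidence under a model of the suspect, and the denominator (typicality) is its density under a model of the chosen reference population. The sketch below assumes univariate Gaussian models and invented numbers (a formant-like value in Hz); real systems use multivariate models, so this is only an illustration of the structure.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def lr_parts(evidence: float, suspect: tuple, population: tuple):
    """Return (similarity, typicality, likelihood ratio) for one measured feature.

    suspect and population are (mean, sd) pairs describing hypothetical models
    of the feature for the suspect and for the chosen reference population.
    """
    similarity = gaussian_pdf(evidence, *suspect)      # numerator: p(E | same speaker)
    typicality = gaussian_pdf(evidence, *population)   # denominator: p(E | different speaker)
    return similarity, typicality, similarity / typicality


if __name__ == "__main__":
    evidence = 1250.0          # hypothetical value measured on the offender recording
    suspect = (1240.0, 40.0)   # hypothetical suspect model
    for name, population in (("population A", (1300.0, 120.0)),
                             ("population B", (1250.0, 60.0))):
        sim, typ, lr = lr_parts(evidence, suspect, population)
        print(f"{name}: similarity={sim:.5f} typicality={typ:.5f} LR={lr:.2f}")
```

Switching from population A to population B leaves the similarity unchanged and moves only the typicality, which is exactly the link between the reference-population decision and the final number that the suggestion asks to make visible.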
1:13:18	the question is then, and i mean this is an important one, whether the court
1:13:22	can request that it be added to the report or not; but you can always
1:13:29	[unclear] how it is calculated
1:13:34	so
1:13:35	i guess there are
1:13:37	two pieces going on here; one thing geoff was actually ending
1:13:42	up presenting was the
1:13:45	underlying
1:13:46	accuracy of the system, right,
1:13:49	the performance of the system; and then we have the whole thing about the likelihood
1:13:53	ratio, that number that comes out that you would present to the trier of fact,
1:13:58	which we all think is,
1:14:00	or which seems to be, the way to go
1:14:02	one issue i have with the likelihood ratio,
1:14:06	when we talk about it being a number, is
1:14:10	there is no real ground-truth likelihood ratio, right
1:14:13	in reality the only ground-truth likelihood ratios that we can even calibrate ourselves to
1:14:18	are infinity and
1:14:20	zero, i mean, one and zero, right; it's
1:14:22	either true or not true; between those two things, we start saying that we
1:14:26	have actually evaluated a likelihood ratio
1:14:29	of six point three
1:14:32	but we never actually evaluate that; we don't
1:14:35	estimate the likelihood ratio relative to any ground-truth likelihood ratio, because the ground-truth
1:14:41	likelihood ratio lives at the poles
1:14:44	i mean, right, we only evaluate it through the posteriors,
1:14:48	as the llr goes through to the posterior
1:14:51	so i guess my question to the people who go to court is
1:14:54	what do you say is the ground truth; how do you say what it means to
1:14:58	be between the two poles, i guess;
1:15:01	what is a ground-truth likelihood ratio,
1:15:04	what is that?
1:15:09	thank you. so,
1:15:10	maybe one thing,
1:15:14	for me, and this,
1:15:16	this is personal opinion,
1:15:18	for me the answer is the calibration of the likelihood ratio. so it is definitely true
1:15:23	that the only ground truth is the final label, which of the two propositions is true.
1:15:27	so
1:15:29	what we have tried to do,
1:15:32	in this validation guideline, which comes from the work that has been done precisely here in
1:15:36	speaker recognition,
1:15:38	is to say okay,
1:15:40	a likelihood ratio will be better if it supports the right decision,
1:15:43	and the decision has a ground truth, the final label.
1:15:48	so
1:15:49	and there's another issue, which is the issue of calibration. calibration helps you
1:15:54	to make better decisions, because if your likelihood ratio is calibrated, when you apply
1:15:59	it in bayesian decision-making, usually
1:16:02	the cost is reduced.
1:16:04	so
1:16:06	that's one issue of calibration, and the other issue is that calibration gives you
1:16:10	some kind of tuning, a way to
1:16:15	generate heavier or lighter weight of evidence depending on your discriminating power.
1:16:21	so systems with very good discrimination should generate higher likelihood ratios in good conditions,
1:16:27	right, and
1:16:29	systems with weaker discrimination should generate weaker likelihood ratios. those are
1:16:33	the two properties of calibration. so
1:16:35	on one hand you improve your decisions, which is the final accuracy measure
1:16:39	you're looking for,
1:16:40	and on the other hand you have a kind of limiting entity that
1:16:47	is telling you okay, you are not discriminating well, so the likelihood ratios you give should be
1:16:51	moderate.
1:16:53	so that's,
1:16:54	that's what the performance measure that we have proposed captures.
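The measure is not named at this point in the discussion. Assuming it is the log-likelihood-ratio cost (Cllr), which is widely used to validate likelihood-ratio systems in speaker recognition, a minimal sketch is below. Cllr averages a logarithmic penalty over same-speaker (target) and different-speaker (non-target) trials, so it rewards good discrimination and punishes likelihood ratios that are stronger than the system's discrimination warrants, the two properties of calibration just described. The function name and example values are hypothetical.

```python
import math


def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost in bits (hypothetical helper).

    target_lrs: LRs from trials where the same speaker is known to be involved.
    nontarget_lrs: LRs from trials where different speakers are known to be involved.
    All LRs must be strictly positive.
    """
    # Penalty grows when a same-speaker trial gets a small LR.
    target_cost = sum(math.log2(1 + 1 / lr) for lr in target_lrs) / len(target_lrs)
    # Penalty grows when a different-speaker trial gets a large LR.
    nontarget_cost = sum(math.log2(1 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (target_cost + nontarget_cost)


# A neutral system (LR = 1 everywhere) costs exactly 1 bit.
print(cllr([1.0, 1.0], [1.0, 1.0]))  # 1.0
# Strong, correct LRs cost much less; strong, misleading LRs cost much more.
print(cllr([1000.0], [0.001]))
print(cllr([0.001], [1000.0]))
```

Because a system that always reports a likelihood ratio of 1 scores exactly 1 bit, a Cllr above 1 means the reported likelihood ratios do worse than giving no evidential weight at all.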
1:17:01	eh,
1:17:05	i mean, i know this feels a bit philosophical, but it also just
1:17:08	seems that with everything we want to say, we're presenting this likelihood ratio and
1:17:13	talking about scales, right, with, you know, bands on it, but at the end of
1:17:17	the day
1:17:18	what we're really talking about is
1:17:21	a decision,
1:17:22	which has priors. i mean, even still, when you calibrate, everything's done through
1:17:27	the priors that are there. you may say you integrated them out, we go through all
1:17:30	this, but the reality at the end of the day is a six point three; you can't say in ground truth
1:17:36	my six point three likelihood ratio estimate was really close to the true likelihood ratio,
1:17:42	except at the poles. when you're going to hard decisions, it's the prior. so i think
1:17:47	in a way we're sort of,
1:17:48	i think that's what
1:17:51	j a set of twenty two
1:17:53	you really just try to tell people, all the time, to use this:
1:17:57	this is how often it was the same when it was the same, you know, the
1:18:01	truth; this is how often it wasn't when it was not, you know, due to
1:18:06	the quality and similarity. i'm just wondering, in a sense,
1:18:10	are we making it more complicated by going into this issue and trying to describe it to the court?
1:18:14	are we getting too complicated by overlaying so many
1:18:17	issues here, entangling ourselves
1:18:20	in trying to get away from the priors, versus just trying to
1:18:23	give
1:18:24	a simple answer? like, at this forensic
1:18:28	thing, one guy set up
1:18:29	and just had a visual way of doing it: you put down the dots, like
1:18:33	here's all the dots when i ran it and it was the same, here's the dots
1:18:36	when it was not the same, here's the dot of when i ran the case data
1:18:41	through, and you can visually see where it sits relative to them.
1:18:45	it's true that's the two distributions, but in some sense it's almost just saying here's,
1:18:51	here's what i got when i ran it when i knew the truth, here's
1:18:54	what i got when they were the same, here's where this one sits, and
1:18:58	you choose, you know, look at it and decide if you think it's closer to the
1:19:01	one or the other, without overlaying so many
1:19:05	issues on putting it down to a single number.
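The questioner's point, that a likelihood ratio such as six point three only turns into a hard decision once priors come in, corresponds to the odds form of Bayes' theorem: posterior odds = likelihood ratio × prior odds. A minimal sketch of the resulting minimum-expected-cost decision follows. The prior odds and the two error costs are assumptions chosen for illustration; in the framework discussed in this session they belong to the fact finder rather than the expert. Function and parameter names are hypothetical.

```python
def posterior_odds(lr, prior_odds):
    """Odds form of Bayes' theorem: posterior odds = LR x prior odds."""
    return lr * prior_odds


def decide_same_speaker(lr, prior_odds, cost_false_id=1.0, cost_missed_id=1.0):
    """Return True if deciding 'same speaker' has the lower expected cost.

    Deciding 'same' costs P(different | E) * cost_false_id on average;
    deciding 'different' costs P(same | E) * cost_missed_id.
    'same' wins when posterior odds exceed cost_false_id / cost_missed_id.
    """
    return posterior_odds(lr, prior_odds) > cost_false_id / cost_missed_id


lr = 6.3  # the example value mentioned in the discussion
print(decide_same_speaker(lr, prior_odds=1.0))                      # True: even prior odds
print(decide_same_speaker(lr, prior_odds=0.01))                     # False: 1-in-100 prior odds
print(decide_same_speaker(lr, prior_odds=1.0, cost_false_id=10.0))  # False: false IDs weighted 10x
```

The same likelihood ratio leads to opposite decisions depending on the prior odds and costs, which is why the likelihood ratio is reported and the decision is left to the person who holds the rest of the information.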
1:19:07	but i mean, it's a good question.
1:19:11	one of the things is that,
1:19:13	in my opinion, there is no true likelihood ratio;
1:19:16	likelihood ratios express a
1:19:18	kind of support. and then there's another issue, which is
1:19:22	the competence issue. the likelihood ratio is there mainly because of competence, so that
1:19:28	the final decision has to be made by someone,
1:19:30	and that person, the fact finder, asks for some information. so how can the
1:19:36	guy who has information that the fact finder does not have communicate his opinion
1:19:41	about that piece of information, which the fact finder tries to integrate with the whole?
1:19:46	that's the main issue behind the likelihood ratio, all the way.
1:19:50	the decision could be made by anyone, but this is a question of competence. so
1:19:55	the formalism, there are problems because
1:19:58	leaving everything without the formalism leads to things that are considered illogical, because the decisions are
1:20:04	made in the reports. that's one issue.
1:20:07	the issue about simplicity and complexity, i fully agree with you.
1:20:11	i think that things
1:20:13	have to be made much simpler, and i was talking about this with joe before the
1:20:19	banquet yesterday: if you've got a chemical analysis,
1:20:23	there's one guy who expresses his opinion about a comparison of two pieces of
1:20:27	glass using
1:20:29	scanning electron microscopy with energy-dispersive x-rays and so on,
1:20:35	and, well, he does not go there trying to be
1:20:41	displaying what's going on inside the microscope, or whatever, with the energies present, right?
1:20:46	so there, you know, there is agreement in the community
1:20:50	that there are standards regulating the use of the procedures, that there is
1:20:54	some kind of measure
1:20:57	of error rate that comes along with the standard, and so on.
1:21:02	so in my opinion, that's the way to go. so
1:21:06	giving a lot of information to judges is something that can be counterproductive, by the
1:21:11	way, so the balance between transparency and not biasing the communication is important. so i think,
1:21:17	i think your argument is,
1:21:19	in some way, talking about this issue. it is a very important issue for me.
1:21:23	keeping things simple is the starting point if you want to put a
1:21:28	new method forward, anyway.
1:21:32	can i, can i just add, i think there's lots of details that we
1:21:36	can talk about later, but i think
1:21:39	we have to present something which we believe is logically correct first, and then,
1:21:44	second, we have to worry about how to communicate that. it's not appropriate to present
1:21:49	something which we believe is logically incorrect
1:21:53	but
1:21:54	which is easy to present.
1:21:55	and the exact example you were giving, i think that's one where, if
1:22:00	the jury looks at that, they will immediately jump to "it was him",
1:22:04	they will jump to an identification,
1:22:08	and so i think that's a problem.
1:22:13	okay, we have to move on, maybe this one.
1:22:16	yes
1:22:17	jason
1:22:23	yes, i just want to comment on the point
1:22:26	raised by doug and by daniel, and so from there,
1:22:30	i agree, and i think we should be honest: when we experts are doing
1:22:37	a written report,
1:22:39	it's not for the judge,
1:22:41	it's only for some other specialists, forensic experts, who will be able, on
1:22:46	the defence side for example, to examine the information and to give some inputs
1:22:52	to the lawyer.
1:22:53	when we are in front of the court,
1:22:57	the only important thing is how you present your opinion.
1:23:03	it's only based on what you are saying there; there is nothing to
1:23:09	do with the likelihood ratio. you could say "my likelihood ratio is ten to the n", you
1:23:14	know, but it comes down to whether they believe me or like me, you know.
1:23:16	so we have to be clear:
1:23:20	a report on a scientific basis should give other people enough information in order
1:23:26	to criticize our work,
1:23:28	and, you know,
1:23:30	the information is given in summary and orally by the expert in court.
1:23:39	i agree with you.
1:23:42	i have discussed this with many forensic scientists, and we always agree that transparency is
1:23:48	important: everything has to be transparently reported and so on.
1:23:52	but talking about explanations in court about issues,
1:23:56	the balance has to be taken into account. for example, in dna analysis,
1:24:01	in the nineties they started to use probabilities for reporting, and it was a huge mess
1:24:07	for ten, twenty years;
1:24:09	there were misunderstandings, and interpretation fallacies were common,
1:24:15	and
1:24:17	that experience tells us that
1:24:20	there has to be a balance in reporting.
1:24:23	transparency is an important one, and when someone comes to court to explain their reports,
1:24:28	it is probably better to keep to the simplest things rather than going into complicated things.
1:24:33	for me, for example, having a performance graph with a lot of details
1:24:39	can be
1:24:40	okay for us, but when you show that to a lawyer,
1:24:44	probably the information that he's taking from that graph is not what you're trying to
1:24:48	express.
1:24:49	so the problem is the level of detail:
1:24:53	we should be transparent, but
1:24:55	probably so much detail
1:24:57	is giving the person who is listening
1:25:01	a different message than the one the person who is speaking is giving.
1:25:06	so the balance has to be there. i'm not saying how to do things,
1:25:09	but the balance has to be there, in my view, and i fully agree with you:
1:25:12	we have to be transparent.
1:25:14	transparency and
1:25:16	the level of detail are things that have to be considered.
1:25:22	can i, do you want me to?
1:25:26	can i add on to that point?
1:25:28	just one minute
1:25:30	should i? okay, now
1:25:34	i think what you're saying is really important, and you can never
1:25:38	sort of leave off the
1:25:40	responsibility for what you're actually expressing to the court.
1:25:44	it's somewhat subjective whatever you do, you know, you give your
1:25:49	weight of the evidence.
1:25:51	there's a danger in that. i don't know if you,
1:25:53	i read something about, like, the theory of science, something called physics envy,
1:25:57	and that very much appears when you move into a different paradigm, which
1:26:03	the justice system actually is: a completely different paradigm, where argumentation is actually the
1:26:10	thing that they're doing. it's not,
1:26:12	it's not
1:26:13	engineering or science in the way we are used to it. so you do a
1:26:17	lot of analyses, but when you end up in court it's a lot of
1:26:21	argumentation, and you express some opinion on
1:26:26	all the analysis you made. so you are actually,
1:26:30	this is the big point with physics envy, that you don't leave
1:26:34	the responsibility to just the number, like "i have this system and
1:26:37	this is the score, and you do whatever you want with it", because
1:26:41	the communication is equally important, and the uncertainties. i think,
1:26:47	like in our system with the nine-point scale, there
1:26:52	are some likelihood ratio bands. it's not really that important, but it's also historical
1:26:56	and everything; they are used to this kind of system. and of course
1:26:59	dna is much
1:27:01	stronger, as a rule;
1:27:02	they much more often have a plus four, and in our cases we're almost
1:27:06	never above plus two, for example, and
1:27:09	you have to express the kind of strength that you can get to in
1:27:12	that.
1:27:13	there's a lot of parts of it, no matter if you use an automatic
1:27:16	system or it is based on phonetic analysis, there's gonna be some subjective part
1:27:20	of it.
1:27:21	i mean, even the things that you produce with an automatic system,
1:27:25	you know, you've chosen the data, so there is some subjectivity to
1:27:29	all of it. so
1:27:30	i think it's really important to remember that. i think someone has written a
1:27:34	really good article on this in the theory of science, on physics envy, and in the end, because
1:27:38	you show all these numbers and all these graphs and nobody
1:27:41	will understand them in court, i promise you,
1:27:43	the
1:27:44	the defence lawyer will say something like "okay, so you actually adjusted your system to
1:27:49	the case, did you?
1:27:52	did you do that?" and then probably in the end, at one point, he forces
1:27:55	you, when you've been in court for twelve hours in the chair, to say
1:27:59	"i did that", and then he's gonna say "okay, so it's subjective",
1:28:03	and then you're done.
1:28:05	so you have to really
1:28:08	think about how you express things in court, try to stick to your opinion and what
1:28:13	you based it on,
1:28:15	and remember physics envy. i think it's really important: they see a number
1:28:19	and a score, and they would just all go
1:28:21	"he's really smart, this guy", you know.
1:28:25	so okay, good, thank you so much, and i think we want to give a round of
1:28:31	applause for all the panelists.

Forensic and investigative speaker recognition

Industry & Forensics Track (Short Talks + Panel Session)

Daniel Ramos (ATVS - Universidad Autónoma de Madrid, Spain), Jonas Lindh (Voxalys, Sweden), Michael Jessen (BKA Germany), Anil Alexander (Oxford Wave Research Ltd, UK), Geoffrey Stewart Morrison (Independent Forensic Consultant)