Thank you for the invitation to be here. It did come as a surprise because, as you will immediately appreciate, I am not a voice or language recognition person. But right from day one I realised that there are lots of issues circulating here that are related to things we've had to struggle with in connection with DNA. I'm not even mainly a DNA evidence person: I work in a medical genetics context, and my main bread-and-butter work is looking for disease genes, causal effects between genes and an interesting phenotype. But I've long had an interest in the interpretation of DNA evidence and have tried to contribute a lot to the developments in the field over the years, and I'm pleased to say that we have made a lot of progress.

It's also clear that people in this community have made a lot of progress in trying to get the field onto what I would regard as a more rigorous footing in terms of interpretation, and I'm thinking of interpretation that is framed in the context of all the evidence and that will be comprehensible and meaningful as a whole. I've done a little bit of background reading of interesting work in the field by [unclear] and several other people here, and so it's clear that I don't have much to say about the basics. What I thought I would do instead is take a slightly contrarian position. There seems to be a kind of sense, which I got from the reading, that the grass is greener in the next field: that everything is solved and works very well for DNA evidence. I'm going to tell you that that's not the case. It's very complicated and messy, and there are some compromises that sort of work. We are saved to a large extent by the fact that DNA evidence is in general very good evidence, and very powerful, and so even if you make a mess of the
interpretation, the ultimate outcome might not be the wrong one. But that's not always the case, and actually the reality in courts today of the presentation of DNA evidence is still pretty dismal. And it does matter when, as I'll get to at the end, we are talking about the latest generation of low-template DNA evidence, with very small amounts of DNA, lots of stochastic effects and lots of complications.

Those of you who are paying careful attention may have recognised that I've changed my title somewhat: I've talked about comparison here instead of recognition. We have the same debate in DNA evidence, that we shouldn't talk about DNA identification because identification is not possible and not the business of the scientific expert. Personally I'm a bit more lax, and I'm happy to use words that make sense to the general public, even if we have to be careful about understanding what they really mean. But anyway, in acknowledgement of that, I put "voice comparison" in my title.

In the end, although my goal was to try to think about relationships between DNA evidence and voice evidence, all the basic work has already been done by people here and I didn't feel I had very much to add. So I'm really just going to restrict myself to talking about the DNA evidence, some of the problems that we've had and some of my views on how to overcome them, and then leave for the discussion the possibility for people to raise parallels. I was advised not to leave any time for discussion because it's a very controversial area, but I'm going to try to take the risk. In fact I've packed quite a lot of stuff into my slides. (That sounds to me a bit loud again; can we just move it down a little bit?) I won't have time to get through it all properly, but you have the luxury of knowing that you don't have to really get to
grips with all of this material. I just want to give you the flavour of things, the problems we worry about, and some historical perspective.

So, to go back right to the beginning of time, so to speak, of this whole weight-of-evidence academic literature: a lot of it springs from this famous case in California in 1968. If I piled up all the papers written about how to interpret the evidence in this case correctly, the pile would go up to the roof, and it still wouldn't reach a conclusion, as the famous saying goes, because it is a very, very interesting case. I think you get the details just from the slide: numbers were made up that might be associated with frequencies for various traits that the defendants possessed and that, it was claimed, the true criminals also possessed. It's lots of fun, and you can give this to students, because there are lots of things wrong with it. Obviously those probabilities are just made up; obviously independence is a problem. But there are more fundamental issues. If I could wave my magic wand and get rid of those problems, so that those really were the true probabilities and they really were independent, what would the number you get by multiplying these probabilities together be, and how does it relate to the juror's, or the finder of fact's, problem of deciding whether they committed the crime?

Well, I'm not going to answer that problem for you here entirely, but it's an interesting and difficult problem. Certainly there is one route to answering it, and one branch of the academic literature (I'm merging things here that developed slightly differently). The canonical problem for DNA evidence is that we've got a sample left at a crime scene. We've already had some discussion of whether the notion of a match is meaningful. For DNA evidence, not always: not for low-template DNA with its stochastic effects, and not for the older form of DNA profiles that was in use in the early 1990s and still occasionally crops up. But let's just idealise and accept the idea that there is a notion of a match, a yes/no answer, and that we've got some frequency information. Again, let's not worry about where this frequency information comes from; just believe it for the moment. So how convinced should you be?

The fallacy that many people fall into, which has already been alluded to here, is to think: well, one in a million is pretty small, so he must be guilty. What is the logic there? That was a bit of a fun discussion in the academic literature that went on for quite a few years. In retrospect the answer seems very easy, and you wonder how we managed to argue about it for quite a number of years, but anyway, that's what academics are there for: finding problems to argue over. A consensus eventually emerged around what, from an orthodox Bayesian position, would be a standard and straightforward response: you should be using Bayes' theorem. I've put one version of it there. To introduce some notation: C is the name of the person who committed the crime (of course, being the source of the DNA is not logically equivalent to committing the crime, but we'll just suppose it is here), and s is the name of the defendant.

There are some subtleties and difficulties built in here that I'm going to spend a little more time talking about, and they revolve around the idea of what the alternative hypothesis is. As I've already mentioned, I think there's a bit of an impression that the grass is greener in the next field and things are easy for DNA evidence, but one of the things that's not easier in some respects is this specifying of
the alternative hypothesis, and that can be quite a difficulty, logically, in a number of ways. In the speech recognition literature, people have been happy to just posit the null hypothesis, the prosecution hypothesis if you like, of same source, that the queried voice and the suspect's voice come from the same individual, against different source. For DNA evidence, at least, it can be that simple, and I've chosen to break down the alternative hypothesis here into a number of hypotheses of the form "X did it" for various X. But for more complex problems there are different ways to break down the alternative hypotheses. For example, if there are multiple DNA samples, which is often the case, there are lots of alternatives around different contributors to the different samples. Often it's just assumed implicitly that there's a single contributor, but for mixed samples that's not at all straightforward. There are different alternative hypotheses around relatedness, and there are different alternative hypotheses around the number of contributors to the sample.

But in this form here I'm just thinking about breaking down the alternative hypothesis into all the individuals who it could have been, and we have to add up over these. Logically, you have to add up over everyone on Earth. This idea actually came up in a court case, and the judge was horrified at the idea that he had to sit there and think about every person on Earth, one at a time. But I want to emphasise this point: logically, you have to. If you want to prove that a particular individual is the source of your DNA, or the source of your voice recording, logically that means that everyone else on Earth is not the source. There are also alternative hypotheses around synthetic voice fabrication and these kinds of things. All of those hypotheses
have to be ruled out in order to establish the one hypothesis that you care about.

With a little bit of manipulation we can write down a formula like this. Again, this is just a classic way of breaking down the calculation in Bayes' theorem. To introduce some notation, here I've put in the likelihood ratio, and again I want to emphasise that there isn't one likelihood ratio; there are many, and we can't avoid the difficult problem of how to combine the likelihood ratios, although we'd like to. I'm thinking in terms of the DNA evidence being interpreted last, and so this other ratio, the prior, I'm thinking of as incorporating all the other evidence. There's no logical reason for doing it that way around, of course; it's a nice coherence property of the Bayesian analysis that you get the same answer whichever order you analyse the evidence in. And in order to get it in this form you need to make various independence assumptions, which we could argue about, but that seems generally reasonable here.

Getting to be able to write the weight of evidence in this form in a forensic setting was a pretty big step forward. It took many years and lots of arguments and so on, but it's pretty much accepted amongst a big enough community nowadays, and it overcame a lot of problems that people struggled with. I've been in the field so long now that it's hard to remember how difficult some of these troubles were. But this basic intuition, "one in a million is really small, he must be guilty", is not true, and people didn't know how to think about that until we were able to formalise the problem in this way. Now it seems pretty easy to think about. Again, this is a simplification, and the general problem is not that simple, but one way to think about it is how many
alternative suspects there are. Under some simplifying assumptions you essentially add up the likelihood ratios of all your alternative suspects, and so a likelihood ratio of one over a million isn't convincing if the number of alternative suspects is large. That's very intuitive, and there's no fundamental logical problem here about this nice distinction between the role of the expert and the role of the finder of fact; I'll come back to that. But this is certainly only true under some simplifying assumptions, and whenever I present this idea in court I have to be careful about wording, like: "If you choose to assume that all the alternative suspects are equally likely, then you come up with a formula like this." Of course, nowadays likelihood ratios are much bigger, or smaller, whichever way around you do them, than one million, and so in typical cases the problem has vanished. But again I want to emphasise that there are lots of cases out there with mixed profiles, small amounts of DNA and complex relatedness, where all these issues still matter.

Then there is the role of relatives. Again, there was much confusion about this in the past, before we had this nice formalisation in terms of Bayes' theorem. I might slip into this language, but another point for discussion is that I don't think what I'm doing is fundamentally Bayesian. I tend to avoid the label "Bayesian": I'm just using Bayes' theorem, and it's a theorem of probability that everyone who uses the mathematical model of probability accepts. In fact I would say my approach is fundamentally non-Bayesian, in ways that I will point out later.

I've just remembered that I forgot to put in a slide on this, though there was a mention of it earlier. There was a big court case in the UK a number of years ago where there was strong DNA evidence implicating a defendant, but there was quite a substantial amount of evidence in his favour. In
particular, the victim of this crime gave a good description of the attacker, and the defendant didn't match it: there was a gross mismatch between the description and what he looked like. She also said in court, "He does not resemble the man that attacked me." And he had an alibi and wasn't near the scene of the crime at the time. So it was quite a complicated case, and it went to several trials. I wasn't involved in that case, but the defence expert actually proposed to walk the jurors through a Bayes' theorem calculation, with likelihood ratios for the description, likelihood ratios for the DNA evidence and likelihood ratios for the alibi evidence, with suggested values, and the jurors were asked to multiply them together. The judge got quite enthusiastic about this and ordered somebody to go out and buy a dozen calculators for the jurors to multiply the numbers together, but the judge kept getting zero when he tried to do the calculation himself.

Anyway, the guy was convicted, but it went to appeal, and the appeal court was absolutely horrified about having in court this complicated mathematical stuff that these wise old judges didn't understand. So the judgement was very severe: Bayesian methods were not to be introduced in UK courts, because these wise old judges don't understand them. It's a sort of power thing; they're worried about losing their power. I just thought it was an amusing idea that any form of reasoning is allowed (there is no other rule, as far as I know): all reasoning is allowed in a British court except the form of reasoning that has been established to be logical and rational and reasonable. That's
the only thing you're not allowed to present in a British court. So that was a bit of an aside, but that's why I avoid the label "Bayesian", and I think it is irrelevant to what we are doing.

Of course, I don't explicitly introduce mathematical formalism whenever I'm giving expert witness evidence, but I do try to talk jurors through this kind of thing and say: imagine how many close relatives there are of the defendant, and what is the match probability for them? And imagine how many unrelated people there are, and what is the match probability for them? You've got to combine them to come up with a total weight, and if that combined weight is non-negligible then you've got reasonable doubt about having the right guy, unless there's other evidence implicating him or ruling out the brothers, for example.

Many of these features that I'm going to be talking about aren't really relevant to voice recognition, of course, but I think some of them will be. I've got to sneak in a label here; I only mean genetic ideas of ethnicity. Relatedness matters for DNA evidence, and this is really the same issue as close relatives; it's just relatedness on a more distant scale. Take the relatedness of people in an isolated geographical or religious group, say. Compared with relatives like cousins and so forth, their relatedness is less, but there are typically more of them, and so the two plausibly balance out. I'll come back to this.

Really important among these issues, and a real fundamental difficulty that I don't think we have really solved, is what to do about lab errors, labelling errors and outright evidence fraud. At least the Bayes' theorem paradigm tells us how to think about the problem and what the relevant issues are, but what it tells us is a little bit worrying. I mean, first
of all, one thing that's not worrying is this. Some critics of DNA evidence were going round saying, "Well, no human activity has an error rate less than about one in a thousand, and therefore numbers like ten to the minus six, ten to the minus seven, ten to the minus eight that come up in connection with DNA evidence are completely meaningless." That reasoning is invalid. It is not the probability of any error that matters, but only of an error that generated the data that we observe. There's a famous story that Phil Dawid pointed me to, from Price, I think probably an eighteenth-century English philosopher, that makes this point. A printing error in the newspaper is more likely than you winning the lottery. But nevertheless, if you see your number printed in the newspaper as the winning lottery number, you don't throw the newspaper out and say, "Ah, probably a printing error", because it's not any printing error that matters. A printing error that generated your number is much less likely than you winning the lottery, and therefore you do take the paper and run down to the lottery office to claim your prize.

But there is fundamentally a problem, raised by some of the more recent critics of DNA evidence, that I don't think we can easily get away from. Evidence tampering is not covered by that argument, because evidence tampering does generate the evidence observed. And typically a reasonable person's view of the probability that the police or somebody else tampered with the evidence in some way is going to be much greater than a match probability or likelihood ratio in connection with DNA. Logically I think it is true that, because of this, it typically will swamp the significance of the match probability for DNA evidence, so that if you do get a good DNA profile match, the actual number connected with it is pretty meaningless. A chance match is virtually impossible, and we're only down now to
thinking about these kinds of alternatives. But what an expert can do about that in court is quite difficult. You can't even consider putting numbers on this kind of thing, of course, but ideally you should be alerting jurors to this possibility, and to the fact that it should be weighed in combination with the match probability for the DNA.

I don't have time to go through this here, and this is also historical stuff, but this was a fun debate. Well, not fun really, because it did get a bit tedious: it just went on and on and on, and it still goes on. It is the argument about the effect of a database search on the evidence. I know some of you have read some of the literature, so you'll be aware of these issues, but some of you won't be. Imagine case number one: I just say he matches, and it's a one-in-a-million probability of a match. In case number two I tell you those two facts, but I also tell you, "Oh, by the way, I found him by looking through a database of DNA profiles, and he was the only match." Now, in which case is the evidence stronger, case one or case two?

The classical statistical viewpoint is that case two is evidence trawling: you're going out fishing for hypotheses, and we all know about multiple testing and Bonferroni corrections and that kind of thing. Evidence is much weaker if you go fishing for hypotheses. So if the defendant has been identified through a search in a database, a DNA intelligence database of known previous offenders, the DNA evidence is weaker than in the standard case. And of course that's completely wrong. I think the standard statistical reasoning is just inappropriate here. There's a classical argument about frequentist and Bayesian views all over the literature, but often the frequentists and the Bayesians get to roughly the same place in the end. This is one example where they get to very different places, and when
they do, because of the strong logical foundations of the Bayesian approach, it's pretty much always the Bayesian view that's right. And it is here. For people who are worried about the evidence trawling idea, and about that weakening the evidence, the problem with their approach is that they are not being critical enough in the first place. Even if there was no evidence trawling, even if there was no database search, logically, to prove that this guy committed the crime I still have to prove that every other person on Earth didn't commit the crime. Those two formulations are equivalent statements of the problem. That's a really tough task, and nobody would have dared to even contemplate it in the past, because it was unthinkable that you could prove that everyone else on Earth didn't do it. But now, with DNA evidence, you can think about it, and arguably do it. So any amount of fishing or trawling for hypotheses doesn't change that fact: you still have to prove that everyone else on Earth didn't commit the crime. In fact, it makes life better, because everyone else in the database didn't match, and that helps your task of proving that everyone else on Earth didn't commit the crime: you've got a whole lot of people whose profiles you've shown not to match.

That latter argument is an argument about hypotheses rather than evidence, so there is this issue about probabilities of evidence versus probabilities of hypotheses, and I'll say a bit more about it in a moment. We like to separate the two, but I insist that fundamentally it's not possible to achieve that ideal in many situations.

So, some of the things I feel. The formulation of the problem, even though it's just standard, well-known Bayes' theorem, wasn't obvious. For about twenty or thirty years after the Collins case, when all this academic literature was piling up, people didn't
get to this position of just writing down Bayes' theorem and seeing its implications in the way I've described. Nowadays the majority of people, even in the field, still don't succeed in understanding the evidence this way, but there's a big enough community of us that do that that isn't the problem.

But there are many, many problems that remain. I have already stressed this one, but it is one of my key points about the difficulty. It's nice to think about a competition between the prosecution hypothesis and the defence hypothesis, and many lawyers have argued with me that that is fundamentally what the whole legal system is based on, the competition between two hypotheses, and they reject the idea that I've presented. But I claim that this is just a straightforward logical situation: in order to establish hypothesis one, the prosecution hypothesis, you must prove that every other competing hypothesis is false, whether or not the defence puts it forward. Of course, in most legal systems the defence doesn't have to put forward any story at all. And even if they do put forward a story, judges usually advise the jurors, in our setting, that even if they disbelieve the defence story it doesn't necessarily mean the defendant is guilty: these are separate questions, to be answered separately.

It's inevitable that the forensic scientist has to make subjective judgements about, for example, implausible hypotheses. In DNA evidence you've always got the "it was my identical twin" story, which makes DNA evidence completely useless. It's actually quite remarkable how rarely that is used; I think people assume that everyone would just laugh it out of court. Actually, identical twins are not rare, and it's very hard to prove that you don't have an identical twin. So if any of you do commit a serious crime and end up in court with DNA evidence against you, I do recommend you try
that story. I think logically it's hard to beat, but nevertheless in practice I think you wouldn't succeed, unfortunately.

But also in DNA evidence we have the number of contributors to the DNA sample. Even if there are no more than two alleles at every locus, it doesn't follow that there's only one contributor; there's no upper bound on the number of contributors. I'm right in the middle of a court case in which I have been giving evidence, and I have to go back and continue my evidence tomorrow. It looks like there was one contributor to the crime sample, and I did some calculations for one contributor and two contributors. I'm advising the prosecution in this case; usually I'm advising the defence. But the defence, of course, jumped up and said, "You haven't done any calculations for three contributors." And of course I said, "Well, there's no sign of even two contributors, so three contributors is ridiculous." And they said, "Ah, but you cannot rule out the possibility of three contributors." And I have to concede that I can't. That's a subjective judgement, and it transgresses this idea of trying to keep a clear logical distinction between the likelihood ratio and the probabilities of hypotheses. In this respect it seems to me completely unavoidable: you can't avoid making judgements about probabilities of hypotheses. But nevertheless we should maintain the goal, which is to try to avoid as far as possible any assumptions about the hypotheses, and to be aware of them and make them explicit as far as we can.

Oh, I didn't mention this one as well: contamination rates are also an issue here. Sometimes it's easy to get confused in discussions of priors, because there is a prior on the hypothesis that he's guilty, for
example, and that's very clearly not the business of the expert; it is a matter for the finder of fact, and we have to be very careful in our wording to avoid any suggestion that we're expressing a view on the probability that he's guilty, either before or after the evidence. But of course we include priors for other quantities all the way along, in particular the rate of contamination. With low-template DNA profiles it's just amazing how difficult it is to get rid of contamination. Our environment is entirely covered with DNA. For four metres around me there is my DNA, scattered everywhere from my breath, and touching things leaves your DNA. It's kind of shocking when you think about it: this room is entirely covered with DNA, in a very thin film, obviously. And we cannot ever exclude that some of the alleles we see in a mixed profile got there not through any of the main contributors that we're thinking about in connection with the crime, but through some environmental contamination. That's a really serious issue with low-template DNA profiles. But in any case, any assessment of a contamination rate is effectively a prior judgement.

OK. As usual I've got too much stuff here. I won't say very much; I haven't really said anything about the technology of DNA profiling. I've put a little bit there for those of you who don't know it. These are short tandem repeat profiles: a little word of DNA is repeated a number of times, and the number of repeats affects the length. The current technology is still not sequence-based, even though this may change in the future; there's so much investment in this technology now that it's hard to think about changing it. We don't actually read the sequence; we just measure the length of a fragment, and the length is measured by
running these fragments through a gel. There's a laser detector at the finish line, and the time taken corresponds to the length of the fragment. Usually we can interpolate the number of repeats, so you might have seven copies of the repeat on one chromosome and nine on the other, and your genotype would be represented as 7,9. But unfortunately for this nice story, partial repeats do occur. So this doesn't mean nine point three as a decimal number; it means nine copies of the four-base-pair repeat and then a three-base-pair fragment of a repeat. But nevertheless it is pretty much possible to say yes or no as to whether the fragment lengths match.

This is an idealised view of the electropherogram: basically a time-series plot as these fragments pass the finish line. There are dyes, coloured dyes you can think of them as, that distinguish the fragments from different loci, and different loci have fragments in different length ranges, so that enables you in one test to analyse ten to twenty genetic loci. In the current technology, well, we'd love to be able to take into account the heights of these peaks, but we don't; we just use the binary yes/no, there is a peak here. I'll come back to that in a little bit if I have time, because with small amounts of DNA that's problematic.

So there are a lot of problems with these issues of where the probabilities come from. People have mentioned to me here, and it's sort of true, that in DNA it's easy because we've got population genetics theory, which generates probabilities. And of course in large part population genetics theory is based on the work of one of Brno's famous sons, Mendel, of course, who was here and did his work on the peas here. But nevertheless, although Mendel's laws as applied here are near enough to being objective fact,
there are lots of elements of the theory that are subjective. The strength of DNA evidence is all about relatedness, and these questions of independence are all questions about relatedness. How you model relatedness is a typical story in complex scientific evidence. Think about all the people in this room: we've got hugely complicated patterns of relatedness. Think of all my lineages: mother, father, four grandparents, great-grandparents; go back five generations and we're up to large numbers of ancestors. Any other individual in this room has got the same, so many lineages, up to sixteen great-great-grandparents. And all those lineages, one of my sixteen great-great-grandparents and one of your sixteen great-great-grandparents, all meet in a common ancestor at some point in the past. Unless you think I'm an alien from another planet, there's pretty substantial evidence that we all have common ancestors.

So the fully detailed model would specify all the patterns of relatedness for every individual on Earth, and of course that's ridiculously complicated, so we have to make simplifying assumptions. Most of the models break relatedness down into three levels: known relatedness, which is usually just one or two generations in the past; relatedness through unknown shared ancestors, but understood to be on a relatively recent time scale (and how you define "recent" is how these theories vary); and then the completely unrelated case, which is an idealised case where the ancestors are so far back that it really doesn't matter and we can just assume independence. They are good enough models, even if too few people really understand how they work, but they are an approximation of reality, and I just want to emphasise the subjectiveness of the underlying models.

There has been a lot of argument over the years about independence, the various independence assumptions involved, and that's
of course important for you guys as well. This is where we do have an advantage, in that the only dependence that matters is due to relatedness. The other important point I want to make, of course, is that it's a kind of meaningless thing to say A is independent of B for general real-world objects. Independence is all about what information you condition on, and if you get the conditioning right, things are typically independent to a good enough approximation. The example I've used is that reading ability and shoe size in children are not independent: the better readers have bigger feet. That's a very well established fact; you can look at the correlation and it's quite strong. Of course they're dependent because of the varying ages: both of those things are correlated with age. If you condition on age, the dependence goes away. Similarly, if you do the right conditioning for DNA evidence, you can make a reasonable assumption of independence. Of course, if you want to take a contrary position, which defences in court sometimes do, you can never rigorously prove anything to be independent. So what matters fundamentally in the match probability is a statement like this, at a single locus: the probability that an unknown individual X has genotype AB, given that the defendant has genotype AB. And so what matters is the effect of this conditioning, and of course the probability that this guy's got AB given that this guy's got AB depends on their relatedness. What doesn't matter, and has been argued about at great length, is the dependence or otherwise of the two alleles within a locus, so-called Hardy-Weinberg equilibrium. Again, there's been much discussion about this; it's relatively unimportant. I've emphasised this conditioning in my writing because that's what I see as the
important one, but of course there's a lot of other stuff in the conditioning as well: all kinds of assumptions and background data that you are relying on, and I'll say more about that in a moment. Now I see I'm going to, though I didn't intend to, follow the advice I was given and use up all my time and not leave any for discussion. This is where I say that what I've been doing is fundamentally non-Bayesian, although based on Bayes' theorem, because all of these theories require parameter estimates, and everybody likes to put in plug-in estimates. It's the simplest thing to do, but also you can think about what the different parameter estimates are and change them. The parameters for us are really the allele frequencies and this population genetics parameter, which is the average relatedness in a community. We've got various estimates of these. We like to use plug-in estimates; it has the advantage that it keeps the evidence specific to the case over here and all your training and background data that feed into the plug-in estimates over there. Of course the ideal, and again the Bayesian position, would be to integrate out the unknowns and in a sense combine the data. Phil Dawid in London wrote papers trying to do this, where the data you condition on is not just the defendant's profile, but the defendant's profile and all the profiles you've ever seen before, which form the background information. So, while recognising this ideal, I tend to think it is just a bit too complicated, and so I have adopted a sort of compromise of using plug-in estimates, but in recognition of all these problems. The expectation of a high power can be much greater than the power of the expectation, so this is why putting in plug-in estimates that are something like maximum likelihood estimates can be really, really misleading, because, you know,
the effect of uncertainty is not symmetric when you've got high powers and products. There's again a vast amount of wasted literature in this field, like there is in any academic field. You have people talking about how to do maximum likelihood estimates of these plug-in parameters, and it's just a complete waste of time, because the maximum likelihood estimate, or any kind of sensible estimate in the middle of the distribution, is hopelessly wrong because of this problem here. But I haven't really got a very good solution. I just say, well, we want something near the top of the plausible range, like ninety-eight or ninety percent or something like that, although of course I haven't really got any formal justification for doing that. Right, a lot more to say; what should I choose to include? I can't resist talking a little bit about this. I talked about the probabilities coming from theories, population genetics theories, which sound very grand, and I can easily get them past a courtroom, who never sort of question me about any of these things. But ultimately, when you look into them, it's all full of subjectivity and judgements: I've chosen this theory and not that theory, and so on. Of course many people are not happy with that kind of subjective element and they want something rigorous, and one way, again the sort of classical statistical position, to get a kind of rigorous probability is to put it in the context of random sampling. So there's a lot of literature out there, and a lot of thinking, based around the idea that the suspect has been chosen randomly in a population. Now, I've already talked about evidence tampering and the relatively high probability that the police could fiddle with the evidence, but the possibility that the police are capable of uniform random sampling is completely ridiculous; I wouldn't accuse them of that. Of course, all the
experts in the world can't do, or find it very difficult to do, uniform random sampling. And so many people think that the uniform random sampling idea puts the field on a rigorous footing because these are objective probabilities, and I would say yes, objective, but clearly nonsense. The police haven't sampled suspects randomly; this is just a completely made-up assumption. I mean, I'm probably overdoing it here; it is the kind of assumption that people make, and for good reason in some settings. I don't think we need to make it here, and it does lead to lots of problems, in particular the sort of endless arguments about in which population the suspect has been randomly chosen. And I say here that, because there is no such sampling and there is no such population, it's like arguing over the number of wings on the tooth fairy: there is no such object, so there's no point arguing about its properties. But there is a real fundamental problem here: the more narrowly you define the population, the better it is for the defendant, and we usually try to lean in the defendant's direction, but the only logical endpoint of this is the population of size one that includes only the defendant and has a hundred percent frequency for the defendant's profile, which of course is a useless position. And of course you lose all these nice advantages of the Bayesian formulation, because with this random sampling hypothesis you just can't do it in a Bayesian way; it's a ridiculous hypothesis that's got nothing to do with the evidence. And all this stuff which works well in the framework I've been telling you about is hard to do in this setting. Right, this is an old topic of mine and I will skip over this one. The US National Research Council did a report maybe fifteen years ago; it
still holds absolute sway in the US. It's all based on this random-man hypothesis and it's all kind of riddled with errors. But it's interesting in the sort of social psychology of the field: we had huge arguments about DNA evidence in the early nineteen nineties, and in nineteen ninety-six the mood was just right to kind of settle on a compromise, and the authority of the National Research Council in the US was such that everyone kind of accepted this, and as some kind of consensus it sort of worked. DNA evidence based on this has convicted a lot of people in the US, and they're probably all guilty. But the fact is that it's completely riddled with misunderstandings and errors, and the evidence is being overstated: in almost every court case in the US involving DNA evidence, the evidence is routinely overstated because of this. The truth is that the evidence was probably pretty strong anyway, and this is why we haven't had the kind of gross miscarriages of justice coming to light that would expose these flaws. So I won't go into that, but all these things I've been talking about they misunderstood. Really the important thing was this population genetics, where what we care about is the conditional match probability but what they cared about was just the marginal probability, and all the population genetics issues are in this conditioning. And so by leaving that out, although they had a whole lot of population genetics experts on this committee and they had big chapters on population genetics, they completely missed the point and gave completely misleading recommendations. Now, I warned you that I tried to have too much material in this talk, and I've whizzed through all these topics, but I just want to bring you up to date with some of the latest, because everything I've been talking about today I could have talked about years ago; it's sort of the arguments of the nineties and, really,
two thousand, but what's really come to a crunch this year in particular is what to do about this low-template DNA. We're down to getting DNA from samples of just two or three cells, and so there's huge stochasticity in the results. Of course many jurisdictions just say this is way too complicated and we don't want to touch this, but more and more, particularly in the UK, people are interested in it. It does mean that just from the slightest touch, rather than collect a fingerprint, it can be strong evidence to collect DNA from, say, a phone. I think that's great, but we get all these kind of stochastic features. I've got some slides, if I have time. These peaks that I showed you about: the top half is with a good amount of DNA and this is with a sort of moderately low amount of DNA, and you get all these features like peak imbalance, but most worryingly complete dropout of one of the alleles: there are two peaks there, but only one showed up here. And that's because the PCR reaction that underlies the whole thing, with so few cells involved, can just completely fail if there's some mutation in the primer or something else goes wrong. And you can get drop-in of contaminant alleles. You would have thought that these high-tech laboratories could keep the lab DNA-free, but it's absolutely impossible. Even just the plasticware that people use is full of DNA, because DNA is everywhere in our environment; it's impossible to keep it out. So, a little bit here about these thresholds that are being used. You can see that the way the evidence is analysed is quite crude: this threshold means anything below this doesn't count, so this peak here could be very strong evidence against an individual, but that peak, not quite so
tall, counts for nothing because of the thresholding effect. But this is the threshold for where there's a single peak: above this we assume there's enough DNA that a partner hasn't dropped out, and there's only one allele, a true homozygote. But a single peak below this, such as that one there, the black one, doesn't have a partner, but because it's below the threshold it's considered that a partner may have dropped out. All of this is sort of arbitrary and very unsatisfactory, but it's about where we're at at the moment. I won't say so much about that case now because I'm running out of time, but this is a zoom-in on what the electropherograms actually look like, and with these low amounts of DNA it's quite noisy. This thirteen allele turned out to be quite important. At that time, in this court case, fifty on this axis was regarded as the threshold, and you can see the thirteen allele reached a peak height of fifty-four on this one run. Of the dozens and dozens of reruns of different samples from the crime scene, that was the only time that it reached above fifty, but it counted as a full allele, and because it's a rare allele it turned out to be the strongest evidence against this guy. So you can see that this peak here, much bigger than that one, is of no evidential value; that's just an experimental artefact called stutter. This one and this one here are assumed to be just background noise. And so you can see that it's quite a sensitive issue whether that's a real peak, but nevertheless it was counted as such. And in that case there were some three alleles that shouldn't be in there if the defendant really was the contributor of the sample, and we had a lot of argument about how to deal with this. The standard way of analysing this problem is a kind of version of the random man thing: you work out the probability that a guy chosen at random in the population would be excluded by the sample, and there's a huge amount of problems with this. You've probably
gathered I'm not a fan at all of this approach, and I've got a list here of things that are wrong, but I'm sort of rushing at the end of my talk, so I won't go into any detail, other than to say that the whole idea of inclusion and exclusion doesn't apply anymore when we've got small amounts of DNA; but that's just one of many problems with this approach. And I was going to talk you through a little bit of how to work through a likelihood ratio in this problem, in the way that I would think is at least somewhat acceptable, but I won't go into that. So there are all these issues about modelling dropout, but I'm going to skip over them. Quite important with the low-level cases is the issue of masking: you often have DNA from a victim which is at a high level, and it could be masking an allele from the true perpetrator, so we need to take that into account, and drop-in too. I've got just some little simulation results here that show that, no matter how often this 2p rule, which is part of the random man idea, is claimed to be conservative, it's not. These are probabilities under various assumptions, sorry, likelihood ratios, which are smaller than the likelihood ratio corresponding to that 2p rule. And I'll skip all of that. Oh, this one is quite interesting. This is about what happens if the crime scene profile is null and the defendant is heterozygous. I'd say that's slight evidence against him. The typical position of almost everyone in the field would be to say that if the crime scene profile is null, there's nothing there and we can ignore it. I say that's slightly incriminating, because if you didn't see anything it's more likely that the offender was heterozygous, and so if the defendant is heterozygous that's slight evidence against him. It's slight evidence in his favour if he's homozygous. But if there's masking it can be dramatic evidence in his favour, and
that's sometimes not appreciated. And I will just dwell a little on this case, because it's sort of remarkable. The idea is sometimes suggested that all the problems are solved in the DNA evidence field, and this is an example of what seems to me the most scandalous miscarriage of justice, as I understood the case. There was a known contributor to the sample, and the case revolved around whether or not the suspected contributor was actually a true contributor. This is what was seen in the crime scene profile; you see several dashes, which means nothing was observed, so both contributors were at very low levels of DNA and we have a substantial amount of dropout. This was the sort of random-man-not-excluded probability reported in court, in total one in ninety-six thousand. It seemed to be convincing enough for the guy to get convicted. But if you start looking closely at this, there are some really scandalous things going on here. Look at this twelve and thirteen that was in the crime scene sample: it's exactly the same as the genotype of the known contributor, so arguably this is no evidence at all; it's just reflecting the known contributor and doesn't tell us anything. But the relevant likelihood ratio used for that locus was six point five, because that's what you get from this random-man-not-excluded formula, which is completely illogical and completely misrepresents the evidence. So I've got some criticisms of the methods here. I could modify the random-man-not-excluded formula to be a bit more reasonable; instead of ninety-six thousand I would have got eight. But when I did a likelihood ratio calculation that allows, for example, evidence to favour the defendant at some loci, with likelihood ratios less than one, I come up, instead of ninety-six thousand, with a likelihood ratio of two. This is, you know,
virtually useless, and the weakest evidence that I've come across. There's really hardly any information in it: there are only three alleles in all of this that are attributable to this person and not to that person. So it's really shockingly weak evidence, completely misunderstood and misrepresented in court, and the guy was found guilty. So, my conclusion. As I said, I had hoped to come back to draw more explicit parallels with voice problems, but I didn't really feel confident to do that. I want to tell you that there's a lot of progress being made with DNA evidence; the situation is much better than it used to be. But as the previous case just shows, there's a lot still wrong. Much remains unsatisfactory, and there are some fundamental problems with the logical approach to which I don't think there's ever going to be a really satisfactory solution, but it nevertheless provides the most useful framework for thinking. So I should stop.
Thank you very much. I think we have a few minutes. Okay.
i work with and consuming no pollution automatic systems usually we select no speaker comp from cool randomly one recuse work oh always yeah so when we do one to me oh cues see solution fig equipment we have from selection different speakers
Well, obviously it's difficult to get it right, and it seems to me that you have to do some version of this calibration on the basis of known speakers. But let me see if I understand your question: the problem is about the limited selection of comparisons, is that what you're saying?
not mutation but usually whatever
Okay, the question is: should we only use different-speaker comparisons in evaluations from speakers who sound similar, because in a real case no one comes to us with two totally different-sounding speakers for
expertise.
I see that. Well, you know, the issues that I have to worry about are quite distinct; there's some overlap but there are fundamental differences. That's obviously somewhat unsatisfactory, but nevertheless I can't see that it's going to bias you in a bad direction, because this is the most challenging situation, to distinguish the similar-sounding voices, and by biasing that way, that should be a good bias, I would have thought: it makes it more difficult for you to establish, or produce evidence for, identity.
But what about all this estimation of accuracy and precision please and yes speaker so maybe we want to during to use sounds maybe oh yeah more easy yes
But if what you're trying to do is distinguish same source from different sources, and the different sources that you use are different but similar, that makes it a hard comparison, not an easy one. So that's what I was suggesting. It's good that you're doing something like what you were doing; it would be nice to have properly well-designed experiments where you have speakers that are similar and speakers that are more different, and you can see the range of differences. I have emphasised a lot the role of relatedness for DNA evidence, but I don't know about relatives, you know, distinguishing brothers speaking, for example, whether that's harder than for unrelated individuals. So ideally you'd like to be able to consider all those. When you say you might be overstating the precision, ultimately you've given yourself a harder task to do by having the different speakers being somewhat similar.
I think it's actually the other way around thing which you should two however false you know action so we can be more sure no
Okay, maybe I've missed something in the
problem, because it does seem to me a harder task. If you had very different sources you could distinguish them quite easily, and so that's an easy task; if you have similar sources, trying to distinguish them is hard. So you have given yourself a harder task, it seems to me, unless I've missed something in the problem. Maybe we can chat a bit more later and get to the bottom of this.
My question is about the convention we should be using. Imagine we are a speech lab which has to present evidence, and the situation is that we have a recording, maybe a long one, and we are able to estimate multiple likelihood ratios. You would imagine that there is some reasonable correlation between them, so you cannot assume they are independent. You would like to multiply them, but you can't. Those multiple likelihood ratios are all of small values, maybe ten, and no more than one thousand. The question is how to present that evidence when you have multiple likelihood ratios and you cannot prove the independence of those.
Well, okay, that is an interesting and difficult question. I feel instinctively, as I was saying earlier, that given the right framework some independence assumption ought to be more or less reasonable, and you can never prove independence. People spent a long time trying to prove independence for different alleles in DNA profiles; it's a futile exercise, ultimately. But if dependence is a really serious problem, well, I'm just trying to think; I need to understand what the dependence structure is to really help. But if the dependence is a real and insurmountable problem, then I think you're stuck, really. I really can't see how to make use of the multiple likelihood ratios,
because obviously if you did have independence you can multiply likelihood ratios and everything is easy. So the analogy with DNA evidence is that relatedness is the right thing to condition on: things can be independent once you've got the right conditioning. But in general this is a kind of model where there's some kind of latent variable; essentially relatedness is a latent variable. Some kind of model where there is a latent variable that encapsulates the common features of the different recordings that generate the dependence: if you can condition on that latent variable and then integrate it out in some way, or deal with it in some appropriate way, I feel as if there should be some modelling approach like that that would work, allowing you to then make an independence assumption. In general, for modelling dependent data, these kind of random-effects-model type of things work well. You have to explore to the extent where you can be reasonably confident about the independence assumption and give some good arguments for it. I mean, I can still never prove any of the independence assumptions I make for the DNA profiles, but I just try to argue from reason that relatedness is the cause, and if we model that we should have solved the problem. I would have thought that some kind of avenue like that is the only option. Can you tell me briefly what is the cause of the dependence?
Yeah, we can be analysing different phones, syllables; they have different characteristics but they all come from the same source.
Yes, but once you've conditioned on it being the same source, right. Anyway, I think there's a modelling answer, but if you really can't tackle the dependence with some kind of modelling, then
I think you do have a real fundamental problem, because ultimately you're going to say, well, if there is dependence, then how big could it be? And if you can't really quantify that in some way, then I don't think you can usefully give multiple likelihood ratios and say the court will figure it out. You've got to do the work.
But people have been saying, for historical reasons, that we should emulate DNA, and you've gone through all the problems with it.
I think you're doing exactly the right thing; I wouldn't disagree with the strategy at all.
That's what I wanted to ask: should we still be saying we should?
I think so, yeah. I mean, I have to say difficulties remain, otherwise I'm out of a job, and that is true, but we're definitely much better off than we were ten years ago. It's a bit like when you're teaching, well, almost anything: in effect you tell the second-year class, forget everything we told you last year, that was an oversimplified version of the problem, here is the real thing. And then in the third year you tell the students, forget everything we told you last year, that's an oversimplified version of the problem, here is the real thing. There are some things where I do actually literally tell the students that. And I think that you have to get acceptance from the community by focusing on the simplified versions, and it is a step forward, and then there will always be problems; you're never going to overcome all of them. But it is actually interesting that, of these various complications that I've talked about, for many of them I don't think you do have an analogue, so in many cases you're better off. I think you suggested to me in conversation that we have these population genetics models, and basically this thing I'm
talking about, that all the dependence comes from relatedness, and once we condition on the right level of relatedness we can get rid of the dependence. I agree that's a good point, but there's a lot of subjectiveness in those models, and we have many other problems that I was describing to you that I don't think you have any analogue for. So in many ways I think the grass is greener on your side of the fence.
Yes, well, I would like to thank you again; we should do this.
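As a rough illustration of the point made in the talk that the expectation of a high power can be much greater than the power of the expectation, here is a minimal Python sketch. It assumes, purely for illustration, that an allele-frequency-like parameter p is uniformly distributed between 0.05 and 0.15 and that a match probability behaves like p raised to the power 10. The function name `plug_in_vs_integrated` and all the numbers are hypothetical and do not come from the talk.

```python
def plug_in_vs_integrated(low, high, n_factors):
    """Compare (E[p])**n with E[p**n] for p uniform on [low, high].

    Illustrative assumption only: the uncertainty about p is uniform,
    which is not a population genetics model from the talk.
    """
    # Plug-in approach: replace p by its mean, then raise to the power n.
    mean_p = (low + high) / 2
    plug_in = mean_p ** n_factors

    # Integrating out p: the average of p**n over [low, high], using the
    # closed form of the integral of p**n divided by the interval width.
    integrated = (high ** (n_factors + 1) - low ** (n_factors + 1)) / (
        (n_factors + 1) * (high - low)
    )
    return plug_in, integrated


if __name__ == "__main__":
    plug_in, integrated = plug_in_vs_integrated(0.05, 0.15, 10)
    print("plug-in value:   ", plug_in)
    print("integrated value:", integrated)
    print("ratio:           ", integrated / plug_in)
```

Under these assumed numbers the integrated value comes out several times larger than the plug-in value, which is the asymmetry described in the talk: an estimate from the middle of the distribution understates the match probability once it is raised to a high power. A real calculation would use the actual model and the actual uncertainty about its parameters.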