Speech transcript - SPEAKER VERIFICATION BY INEXPERIENCED AND EXPERIENCED LISTENERS VS. SPEAKER VERIFICATION SYSTEM

Okay, so my name is Juliette Kahn, and I will present what we have done for the HASR submission, and also what we have done after the submission, because this submission raised a number of questions. This work was done with Nicolas Audibert, Solange Rossato and Jean-François Bonastre, and we come from France.

Okay, so the goal of HASR was to analyse how human expertise can be used together with automatic speaker recognition technologies, and how we can mix the two communities. It was a very great experience. For us it was the first time, just a first experience of trying to do something for this submission.

The task was a classical verification task. We have two and a half minutes of samples for each speaker. They come from SRE10, and the trials were very difficult. These trials were chosen by NIST, and the choice of the trials was based on the results of a particular system. There were two sets, HASR1 and HASR2, and we participated in HASR2, because the trials of HASR2 include the trials of HASR1.

So what we proposed was very simple, because our lab is more of a computer science laboratory [unclear]. We just took three native French listeners, two females and one male, and we allowed them to examine spectrograms and to band-pass filter the signals, so they could do what they wanted. They had to decide whether it was the same speaker speaking or two different speakers, and give a confidence score. A score at the low end of the scale means that they are not confident in their decision, and if they gave five, that means that they are very confident.
For the submission to NIST, we chose to use majority voting: with three people it's easy to have a majority, so we chose to use this to make the decision. For the score, we chose to do a mapping between the human decision and the score that we have with our GMM system, so that we can compare the scores afterwards. If you have questions on the mapping I can answer, but I think it's more interesting to look at the other results and see other things.

The fact is that NIST provided us with very long samples. Because we decided to do a perceptual test, and to listen to a lot of samples, we decided to change things a little bit and not to give the listeners all the two minutes of speech for each speaker. So we decided to cut the signal and to select the parts with the most energy, where we are sure that there is a lot of speech and maybe a lot of information. We selected short segments of around six seconds for each sample, because a lot of perceptual tests, for example in psychology, use this kind of duration. That's why we chose it.

I have some examples so that you can hear what I am talking about. The idea was to have a beep between each sample, so that you know that the sample is changing. So that's my first example.

[audio example plays]

So, same speaker or different speakers? What do you think? Okay, it's not the same. It's always the same thing: the trials were chosen for their difficulty. So yes, it's not the same. You have different samples, and you cannot memorise them [unclear].
With short samples instead of two minutes [unclear], it's something you can compare very quickly in order to take a decision. The consequence of this kind of stimulus is that our listeners take a decision very quickly: in around thirty seconds they take their decision. So that's why we chose this.

Okay, so I come back here. They can use the tools and they can try to listen again, but usually they just take the decision very quickly.

These are their results, and you will have the others as well. What is very striking is that our automatic system is better than the decisions taken by the humans. Our first question with HASR2 concerns human performance, which is the very important thing. A first thing that could give us good information on whether we can have confidence in the human decision is to see whether they agree and, when they agree, whether they take the correct decision or are wrong. You can see here that we cannot know: if they agree, it's not a good indication that we can have confidence in their decision. Here you have the correct answer, and here are the trials. You can see that here it seems that they take the correct decision, but here, when it comes to target, we cannot know whether it's correct or not, because you have exactly the same proportion. And the confidence score they gave is not a good indication either. So you can't trust people when they say "okay, I'm sure that it's the same": that's not a good indication for saying that we can trust them. So that's a problem.
We also have some discussion of the HASR protocol. The first thing is that the listeners had the feeling that it was more an evaluation of whether they can compensate for the two channels than an evaluation of the proximity of the voices of two speakers. That's not what they usually do, so it was difficult. Our submission is more a perceptual test than an acoustic analysis, because they just filter when they have very different channels or something like this, but they don't use specific parts of the signal to take their decision. It's just perceptual.

As for the limitations of the protocol: [unclear passage about the conditions of the test]. And what is very important is that we couldn't randomise the trials. That means that all the listeners hear the set of trials in the same order, and it's clear that you don't have the same attention when you start as when you are listening to the hundredth trial. So that's important, and in psychology we always randomise [unclear].

So after that, we had a lot of questions about this submission. Our first question was: what is the influence of the number of listeners? Because we have only three listeners, so what happens if we increase the number of listeners? Then, what is the difference between experienced and non-experienced listeners [unclear]? And what is the complementarity between the human and the system decisions? Because we just submitted the decisions of the humans.

So we changed the HASR protocol a little bit. We have more listeners: non-experienced listeners [number unclear in the recording] and ten experienced listeners.
Because we have all the trials, we randomised the trials and we balanced the number of non-target and target trials [unclear]. So it's balanced, and I take 0.5 as the a priori for my results. We only allowed them to listen to each trial once, and not to repeat the trials.

So what are the results? We have only [number unclear] non-experienced listeners who are above chance level; for the majority of the listeners it's exactly the same thing as chance. But what is very interesting for us is that you have a very large gap in performance depending on the trial. You have some trials where ninety percent of the listeners give the correct answer, and other trials where only [unclear] percent of the listeners give the correct answer. We don't find a difference between the male and the female trials; it's exactly the same thing. We have different behaviours: some listeners always say "yes, it's the same, it's the same", and others always say "no, it's not the same". And we find a correlation between the performance and the English level of the listeners, because here they are non-native English listeners [partly unclear].

The last question was the complementarity between the humans and the system. What we find is that for non-target trials the system has a lot of correct answers, and they are correct only for the GMM system and not for the humans. But it's the contrary for the [target] trials: we have a big proportion of correct answers only for the humans. So maybe we can find a complementarity.
Then we have the experienced listeners, and we don't find a difference in performance between the non-expert and the experienced listeners. It's exactly the same thing.

So for the future work, it's more questions than anything else. The first question is how the human can help the system. Maybe we have to examine the trials whose scores are near the threshold of the system, because we observed in the complementarity analysis that these are the trials where the human is right and the system is wrong. So maybe that's something we can do.

The second question is: I have some trials which are very easy and others which are very difficult for humans, so what are the differences between those trials? And it's clear that it's important to replicate this kind of experiment [unclear]. I'm sure that [name unclear] or the next paper will answer this question. Thank you.

Question from the audience: You describe performance with experienced and inexperienced listeners. How do you define an experienced listener? [partly unclear]

Answer: An experienced listener is a phonetician, but one who doesn't work on forensics [unclear]. They are not interested in the speaker; they work on the language. So they are not forensic experts, because in France we don't have this kind of people.

Question from the audience: So there are a lot of people who always say the same thing. That makes me wonder whether it would be possible to have the human judges just give a number, as might come from a system, and then do a post facto calibration to find the threshold that performs best. [partly unclear]

Answer: Yes, but there is a problem, which we saw in the first study, the one we did with the three people.
We don't have a good correlation between correct answers and the confidence score, so we can't trust the listeners: it's not because they say "I'm sure of my decision" that it means that they are right. So it's difficult to do a calibration and to use this confidence score.
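
Two ideas from the talk lend themselves to a short illustration: fusing three listeners' same/different decisions by majority vote, and checking whether a listener's confidence score tracks correctness. The Python below is only a minimal sketch under stated assumptions. The function names, the "same"/"different" labels, the integer confidence values and the toy data are all hypothetical and do not come from the talk; in particular, the numbers are invented and are not the authors' results.

```python
from collections import Counter, defaultdict


def majority_vote(decisions):
    """Return the label chosen by most listeners.

    Assumes binary labels and an odd number of votes, so no tie can occur.
    """
    if len(decisions) % 2 == 0:
        raise ValueError("use an odd number of listeners to avoid ties")
    label, _count = Counter(decisions).most_common(1)[0]
    return label


def accuracy_by_confidence(responses):
    """Map each confidence level to the fraction of correct decisions.

    responses: iterable of (decision, confidence, truth) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for decision, confidence, truth in responses:
        total[confidence] += 1
        correct[confidence] += int(decision == truth)
    return {level: correct[level] / total[level] for level in sorted(total)}


# Hypothetical toy data, invented for illustration only.
trial_votes = ["same", "different", "same"]
responses = [
    ("same", 5, "same"),
    ("same", 5, "different"),
    ("different", 1, "different"),
    ("same", 1, "different"),
]

print(majority_vote(trial_votes))
print(accuracy_by_confidence(responses))
```

If accuracy is roughly the same at every confidence level, as in this invented data, the confidence score carries little information about correctness, which is the difficulty with calibration raised in the answer above.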

Human Assisted Speaker Recognition

Presenter: Juliette Kahn, Authors: Juliette Kahn, University of Avignon, France; Nicolas Audibert, Laboratoire de Phonétique et Phonologie, France; Solange Rossato, University of Grenoble, France; Jean-François Bonastre, University of Avignon, France