i haven't unique challenges you i'm it in that case and we can see i'm sad also but that's my reading or and then i was so i have to prevent this instead of fast the colours but to begin which a two some D V C R which mentioned simple code you know it's oh this that is on that oh it could mean human speech so topic so what can we all copyrights good and use it in two yeah i'm fine fig in the future you may be possible meaning i'm one sport that but equation just one yes well that are taken as you both just oh problem is okay well the big one speak search and then yep this problem cool and because then if you're still yeah two so in this talk we evaluate how to secure the speaker verification systems okay synthetic speech yeah cool someone speech voices using just and sent but once and but we can call and a speaker's voice from ten sentences but this is a content my talk i to talk about some yeah now introductions i'm wrong and then we there's a lot recognition ideals the S U N then i we show some of its work which right then i cats this year and then i think panes of a speaker verification systems for by i think the system useful yeah and then i mean streak payment conditions and then i wish also or some quiet to detect synthetic speech speaker verification using i mean so cool oh yeah and is that what it right and they are somewhat item i so do you know about that but it's not a you know no my kids will be because we can assist them how some you know but i know we used to TTS systems that used to be and in the conventional we thought conventional scenarios pdf is then that's true ah it's great unit selection TTS system is right so or what combos on technique unit selection if the just and peso equated with ones and he and then transform so someone's voice to target speaker with using your joint probability of GMM trained on all right right three in any of that six well beatification of course can be things i from only 
you can see but also since oh with a fair fight five can be transformed into basically how good ha ha a voice you think this document but this combination oh problem speaker or something i think it's probably speaker verification system but all the is distance is it this yeah fictional our TTS systems it can be speech synthesis cross speaker adaptation such as embedded it this just a also what's the problem speaker verification because speaker adaptation scan possible speaker independent agent which are cool but this was more distinctly dsp use into the target okay a voice using small amount of data and then that any i don't use for verification can insights from and update more so this justin also probably speaker recognition but we'll be the justice system it's more probably more interest it's combinations i needed i think basically that's and this problem was first reported by my scores and you have a go so why do we need this is used there are several times the positive and its performance it is it and basically thanks the whole month of its fear it can be this way that's right quite different ways in power the quality of a ten base yeah it's no problem with well detection systems and then well it in disobedience of holding elections more specifically in basically field agent based it's same as human under speaker adaptation techniques what speech is a hot what maybe cochlea it well yes we can do speaker adaptation unsupervised but like is that which add up to a much past we got a job which is and also we need be able to use we can use when the us in part clean speech data fig adaptation data so taken together it is now possible automatically create how did it because TTS voices from any at all what about it which i thought that right no which means what do by oh available ones of it can be used oh affecting speaker but it was just so i think should not you yes not you he that's fine speech 
data five a quite well the gas well look at or texture like this you know we can record my speech i think my speech might be like right anyway so we can why a speech one board all the cows well cast because jazz well it then using this well p2p does that yeah about it can be a speech systems right the other ones then you think about what is but yeah right speaker verification useful because it gives it is and then we prepared accept samples which much to the scenarios so we really is terrific speech from yeah does it have a year which but cats and also clean it up they can well i guess i go you know it's got and there synthetic speech is you know i pray couple samples on this in synthetic speech samples oh okay from a genocide that yeah it's put together and how yes so what's up with George Bush yeah so he's adapted with this meeting we keep a T S one George Bush or not and then clean it up fig yeah right can you identify how oh meeting people communicate yeah maybe and of course yeah it's synthetic speech yeah i know so the with this just octaves have also but oh size so is this poses yeah times with fig so yeah let's go back to sorry but okay so i hope you understand that the security issues of this and then we use explain i'm sure that is yeah i guess two thousand and that's so we use it in this databases which ah i agree but speech why when you and john because and then we you really really simple speaker verification system is in place because but we well i yeah i know this then but yeah so what standard GMM-UBM and also you know gaussian but if it's at the end which you know some people this yeah use right now you know so the with score normalisation feature normalisation but when he is there's no significant device this from a point of views because in most cases the speaker verification system a tape green synthetic speech voice you know i think so in the store i it was one indian you'd be in it but what we 
have used but which are basically the same so this is the design previous so it's oh well what distributions one ten German speakers this we do not sure school what human speech target because the human that ha sure the human each well this was just and this is a synthetic speech about impostors it is not with a button you did and then this is a synthetic speech about oh i guess and green one really right these figures sure scene synthetic speech will again that you can see these qualities previews on for human size speech for both postures and also what to do green i need i think yeah that was it can okay it's not and but was yes but the problem is you know pretty payment because number of speakers is yeah too small and then the speech data use was a read speech tagged as but you know oh i think it's not you speech data why be you know it's assumed to be not a clean so in this cool in this new book so we use three hundred speakers included was the channel zero i would say to eight so what right they were this oh much of it that some TTS corpora because yes this is yeah i agree you know it's not perfect three you know and vitamin and stuff is because and p2p what is the point or something else i think snotty and also we therefore happiness to you formation on missile could detect synthetic speech because verification systems what sample it with sup i thought it was a mess synthetic speech in speaker verification wow but again which is you speech becomes much better someone so we have a body dismissal obvious certainly probably more this impostors a lot it's and it's histology about your name UBM guns i think right but you the way you want it we use if the end of the stuff no energy on it data we a bright future one thing right robustness proposed by then we had that is G and then you'd mark adaptation in addition to what janet was we evaluate it yeah but you didn't system you'd be used for the whole process which we 
have a but because and okay right what right okay about right and which is level or more is that it right so probably this she's be and this is the quite well but that over the whole i don't speak about it it's so quite because it's in this piece of this it's the complex it's that's in march possibility i really want in speech right but no speaking so we use this guy same technique it starts training average for some of this which is basically yeah i did UBM or speaker independent agenda so we use because of it i mean yes it is hot it is with some of the yeah we what is you think adidas functional like houdini regulations well you know pulse train and made it off or it's not about see in the data yeah small amount of because okay be then we generate acoustic on that such as but so each duration so some noise for me citations from the side of it and then you mean maximum likelihood on occasion as well i proposed by with a ninety five for this taken out can you it's yeah how much someone says and then you think it is generated acoustic it does we run and i would be with the whole right proposed by colour and then this is about patience so we can create new TTS voice from senior just that's from three minutes of speech data was if speech database a bit of more quickly becomes bit but minimum the meeting if where am i yeah i think i'm leery ha of them with this or that and this small sure at that individual speakers and then they well actually the a female speakers and in this remark sure the male speaker other people will as you can see this paper how about his point and also china and so on you and that's it and sounds which one and that was my question how many voices available in mark can be who the speaker verification systems so again our scenario it's not building TTS system on speaker verification databases it is no money you don't narrow band ooh go to the noise or maybe all five microphones oh what can i do 
is you know most of my nearest acquire speech because you know we why you crises like this they adapt yeah fine so we use okay i think we use also i don't know databases sort of this database yes two hundred eighty four speakers we weeks once because fig can you got even we use and it's a speech and then we buy excited for it speaker but you in to see it if you see the old and that it's for training data source TTS in the set they retrain average voice models or by speaker adaptation individual speakers we use she made it out was trained and data for the patient and that be it training data that's for speaker recognition systems right any buzz about that one what is in a moment we have that yeah that's what but set see it has been as which have these accounts all speech data part but also that's because and this to be if that's true speech data just from useful cations i did for a couple of samples data from this yes trained on this was original data oh come on one this policy yeah so is this too long reverberation you huh this thing is yeah a big car they show yeah right ready to additionally the weight of a oh you must not and of the you know the equal error rate it just the point five this is a false alarm probabilities and season diction so we can see speaker verification for human speech so you don't know yeah but that's why you know we can say our speaker verification systems channel they are can't distinguish because yeah speakers part almost part and the this is that is that human speech but speech if the score distributions similar to create i mean this is the human speech what are you fig because this is the synthetic speech about target because this is a human speech input just well this is synthetic speech but also just and the distribution all this was good this for distribution no i don't know anymore but as you can these they significant or whatever in but in march claimant is 
where lies voice okay you know maybe the extreme about ninety percent speech but it so see much train two hundred sixty was oh fig two hundred six people was actually so someone is of course but despite excellent performance because the case was this thing which one why it out i all right the speaker i didn't speaker his eyes before speaker out of it it is because this is high enough to allow the use right pause to do human right going on see what i keep up well what yeah because they have significant overlap i just meant decision the shooting was one of my vision like the head so of course problem is how can we this yeah we are not all right it was so yeah i yeah i just so we why yeah extra missile yeah the commission on it which nothing if i see them like we do what's your idea i propose but so what and also we use what is that what data rate we can from the us curious you know oh no a base and define right it's pretty both it just define sees the right kind of video thing this is the like right on yeah we of it but this is simple but he was useful do they synthetic speech because p2p anything from a challenge and how or was this project we switch out it's more of a spy i that's and also things expedia the unit selection and have john trajectories change from point which is that data is yeah i yeah i but and tedious and it can be a speeding is included some global time but is from all this by the with kind of for what some of them effect both project for the in fact this is that is that average five year we do right sure human speech i think it a few months the same one well speech and it if angel that's okay speech and you can be they have quite all brought up and therefore this measure no longer robust you know synthetic speech cool they yeah it ended this and because i yes well you know it because in speech patterns to use if the school or synthetic speech maybe okay fictions based in p2p human speech 
so we sort it might be possible to save in p2p you synthetic speech yeah what they like it's all we p2p up to a month yeah marcy it E G okay i'm up for it yeah oh evaluate well be right human speech fig it this is the weather right this is a yeah there are a as you can the we tested it synthetic speech was found to have data where there are a few or both grammar a few months while they're writing about involved in for the first synthetic speech where they just say well in it means that if you go grammar yeah there are huge differences and then this it's too even for the adaptation data is just one me speech today so it is not i you yeah what they write is that fig i to summarise my talk this but the extent almost speaker verification yeah speaker age and it because i didn't speech yeah i got that a channel yeah this something school it's tedious it's high enough of these inside was possible to the human right this thing brought it the speech data available i guess can be you import speaker verification this can in i don't know how many well i guess but or support because oh it is impostors okay fig yeah and then i'll mention a missile you think commissioning but yes i hear it or what they write fig what no moreover robust no but yeah but it is this you know security issues we and we like to do these this voice going speaker adaptation two for free or on the way right provides a base what's going on well but you don't know why so this technique we have about them and from all speakers what and so national in it is not his fantasies you please and i and you like to it's hard that's you can because this technique can cool people's has yeah talking some T Vs cool sample and you want to use we have right be just techniques can because welcome hoping someone that's just because voiced and use the voice we can associate with they are embedded devices that's voice indication eight so yeah that was we need they do you future 
but it is the screen voice since it voice and he must oh that this that's all right presentation we should oh so oh or four sure oh with oh you do oh right which so but oh replica guns working on speech transmission to oh your yes which i see but i ninety percent of the voices box that accent so even speaker verification cranes to start with sure identical people well i think of puzzles we have to that the to speech one ooh mark oh four yeah oh for a moment we or oh or but oh well when we were different circumstance yeah oh sorry sure true oh well if we yeah for the money would be to model i'm not like drawn from yeah that too well right one actually going on we do right perhaps a big challenge yeah i think that's the crystal this synthetic speech maybe they can variables in doing that right okay i am i we have some similar work and i some paper so there we also right yeah then fine yeah see signs oh transform tonight basically to intermediate speech will be back to the speaker identification system so also i don't know what street journal in a nice and then you so we can and you had to type and speaker identities instead right based on like UBM agenda like using that low level acoustic features in the other one is a novel speaker identification system and such as no phonetic that right so what we are used and that and they generated the it is and i think that now and a novel feature based speaker identification small one double well whatever bottleneck and you really to be selected by now generative i but yeah looks like a high level yeah speaker IDs i didn't use instant it's not make it's not robust okay at low levels so like they stand for and just got in speech you yes right and then it looks like and there you can do you like a i mean i yeah this p2p a speech reason or no it's not yeah so 
probably and that's what you have done yeah experiments also you see and at that time speaker verification system using try using a novel and features yeah temporal features mike not only not long range and speed make some characteristics probably and now be more robust against that generated speech so basically and so the speaker I D C that was transformation all three yeah can be too yeah we see each other and then something to do with this D C so yeah we can probably also borrow yeah symphonies speech since it's jenny generation you know yeah try to make speaker and so but yes and probably and now also on how expensive where is it is fine to use for the speech thing is that you probably two okay normally and teachers are that's right sure yeah it is yeah okay so no time i just got my question no you and yeah that should be used them on the same what would happen if you change the yeah so i questions we use you know GMM-UBM systems and SVM with you know caution it's a contest but we haven't ones that in a long time future yeah it's real time values but we have a new features we have one you huge which is we really one so i reassured that is right next i guess next with bonds and you yeah that's not a long time yeah right yeah
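
[Editorial illustration, not part of the talk.] The talk refers to the equal error rate, false alarm probabilities, and overlapping score distributions for human and synthetic speech. The sketch below shows one simple way such numbers can be computed: sweep a decision threshold over the scores, pick the point where miss and false alarm rates are closest, then check how many synthetic trials would be accepted at that threshold. All function names and all score values are hypothetical and made up for illustration; they are not the speakers' code or results.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (miss rate, false alarm rate) when scores >= threshold are accepted."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in impostor_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(impostor_scores)


def equal_error_rate(target_scores, impostor_scores):
    """Try every observed score as a threshold and return (eer, threshold)
    at the point where miss and false alarm rates are closest."""
    best = None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        miss, fa = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            best = (gap, (miss + fa) / 2, threshold)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of trials whose score meets the acceptance threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical verification scores (for example log-likelihood ratios).
human_target = [2.1, 1.8, 2.5, 1.2, 2.9, 1.6]         # true claimants
human_impostor = [-1.0, -0.4, 0.3, -1.7, 1.3, -0.8]   # human impostors
synthetic_target = [1.9, 2.2, 1.4, 2.6, 0.9, 1.7]     # synthetic speech imitating claimants

eer, threshold = equal_error_rate(human_target, human_impostor)
synthetic_accepted = acceptance_rate(synthetic_target, threshold)

print("EER on human speech:", eer)
print("threshold at EER:", threshold)
print("synthetic trials accepted at that threshold:", synthetic_accepted)
```

If the synthetic score distribution overlaps heavily with the true claimant distribution, the accepted fraction stays high even when the system looks accurate on human impostors, which is the kind of vulnerability the talk discusses.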