0:00:15good afternoon
0:00:16everyone
0:00:18the paper
0:00:20i would like to present is entitled
0:00:23text-dependent speaker verification system in the vhf communication channel from the
0:00:30institute for infocomm research singapore
0:00:36here i show you the outline of this presentation first an overview of the
0:00:43paper followed by the vhf communication introduction and i will show you
0:00:51the biometric system for this vhf communication
0:00:56speaker verification system
0:00:58then
0:00:59i will give the performance evaluation followed by
0:01:02conclusions
0:01:06firstly
0:01:09the task of this research project is to build a voice biometric system that
0:01:15uses vhf communication for ship navigation control
0:01:22and
0:01:23vhf here means the very high frequency communication
0:01:30channel
0:01:32so
0:01:34the main device for the usage of this communication channel is the walkie-talkie
0:01:41this walkie-talkie is actually used in marine vhf communication so this project is
0:01:47focused on marine communication
0:01:50then
0:01:50for the navigation control it is for the authentication of the speaker so
0:01:59especially for the shipmaster when the ship
0:02:02goes into
0:02:03a certain seaport
0:02:06in the control pendant this one though full is
0:02:10is in this one and
0:02:11sun
0:02:12some people are registered as
0:02:18authorized persons
0:02:19so
0:02:20only an authorized person speaking
0:02:22can trying to see the two and the
0:02:24the sufficiency part
0:02:27so this is the
0:02:28point of setting up this
0:02:32system for the project
0:02:33but the problem for
0:02:38vhf communication speaker verification is that
0:02:42this is a
0:02:43speaker verification system with very short durations
0:02:48and this short duration
0:02:51may be about one second
0:02:54and up to
0:02:55three seconds
0:02:56so compared to the conventional durations like
0:03:01one minute or
0:03:04ten seconds
0:03:06usually used in the nist sre
0:03:09it is quite short so
0:03:12we may
0:03:13focus on this
0:03:15and propose some solutions
0:03:19and the vhf
0:03:21communication database
0:03:23has many problems
0:03:25and i will show you
0:03:26those problems
0:03:28for this
0:03:30speaker verification
0:03:32and
0:03:32and
0:03:33so we propose some solutions by using
0:03:36pass phrase
0:03:38pass phrases
0:03:40or word strings
0:03:41so we also collect
0:03:44some proper database
0:03:46and use it
0:03:47to
0:03:48improve the pass-phrase-based speaker
0:03:51verification system
0:03:53we also applied
0:03:54multi-system combination
0:03:56to improve
0:03:58the performance of the system
0:04:06now we go to
0:04:08show you
0:04:09the vhf communication
0:04:11part
0:04:13so in this
0:04:14figure
0:04:16we show you
0:04:17the application of the usage of vhf communication so you can see two parts
0:04:25one is
0:04:26the user side
0:04:29like the shipmaster
0:04:33and the other part is the control centre so this is
0:04:38two-party communication
0:04:40with the vhf device
0:04:43like a walkie-talkie device
0:04:45the user first
0:04:50initiates a call to the control centre and the control centre will reply
0:04:56with a request
0:04:57to the shipmaster and at this moment the shipmaster
0:05:03speaks to the
0:05:04walkie-talkie
0:05:06with his name
0:05:08as speech data
0:05:09and this speech is transferred to the control centre and the control side inputs
0:05:16this
0:05:17speech to the speaker verification system for verification
0:05:22so
0:05:24for example at the same time the control centre can also request
0:05:28another
0:05:30piece of speech data
0:05:31like
0:05:32the identity
0:05:37number
0:05:38for
0:05:39verification and we also
0:05:41combine the two the name and the id
0:05:46to
0:05:47improve the verification performance
0:05:55now
0:05:56for
0:05:58for
0:05:59speaker verification
0:06:01purposes
0:06:02the main
0:06:03characteristics of the vhf speech data
0:06:08are
0:06:09as shown here
0:06:11firstly the
0:06:13vhf communication speech
0:06:15is quite noisy
0:06:18because
0:06:19it is recorded in
0:06:21an onboard environment and in this environment there is noise in the ship
0:06:26and the noise is quite strong
0:06:29another problem is that this is an open channel
0:06:34for verification
0:06:35the open channel means
0:06:39that the channel variability cannot be known
0:06:43by
0:06:44the speaker verification system
0:06:47so that is quite
0:06:49tough
0:06:51so this case
0:06:53causes
0:06:53quite big problems for channel compensation
0:06:57so we cannot use the conventional
0:07:00channel compensation
0:07:02techniques
0:07:03to
0:07:04reduce the channel mismatch effects for example we cannot use jfa
0:07:10channel factors or even we cannot use plda
0:07:15channel factors
0:07:16for this purpose so it is a
0:07:21high difficulty
0:07:23for this the project and not know why is that not be friend not those
0:07:28and speech
0:07:29speech is speech
0:07:31that means the
0:07:33during the
0:07:34you don't then
0:07:35and
0:07:36the speech is recorded in an office environment
0:07:39so the office is obviously
0:07:47a
0:07:48quiet
0:07:49environment so
0:07:51for the onboard test
0:07:53environment that is
0:07:55in a ship
0:07:56maybe there is the engine
0:07:59so with the sound of the engine
0:08:02the speaker's speech may be louder than
0:08:06in the office environment
0:08:07so also
0:08:08well that's because speak to now maybe
0:08:12this speak
0:08:14speech is speaking we have be plastic
0:08:17so another
0:08:19problem is the channel frequency limitation
0:08:22with the usage of the vhf channel here i show you
0:08:27the whole spectrum
0:08:29range
0:08:30for comparison
0:08:32the first one
0:08:35is a normal recording without vhf
0:08:38communication
0:08:40and this one is
0:08:41recorded with the vhf channel
0:08:44so you can see
0:08:47the high-frequency part is suppressed very much
0:08:51and we know for speaker verification
0:08:54the major
0:08:57speaker features
0:08:59are in the high-frequency part so if this information is lost
0:09:05very much then maybe
0:09:08the speaker
0:09:09verification performance will drop a lot
0:09:16now this goes to the biometric
0:09:19system introduction
0:09:22in this system
0:09:25here we show you
0:09:27our pass-phrase-based speaker verification
0:09:30system
0:09:31the speech is input to three
0:09:34subsystems
0:09:36which are
0:09:37gmm-ubm and the other two
0:09:42systems jfa and i-vector
0:09:44because they are all
0:09:45gmm-ubm based they have many parameters that can be
0:09:50shared with each other so for example
0:09:53they can share
0:09:56the
0:09:57ubm parameters and they can share the sufficient statistics
0:10:03so
0:10:04so in the proposed system
0:10:07the computation complexity will be reduced
0:10:13so
0:10:15the final system is actually the fusion of the
0:10:22three subsystems
0:10:23the fusion
0:10:25and calibration parameters
0:10:28can be
0:10:29trained by using a set of development database
0:10:34and then finally we
0:10:37will get
0:10:38the score from the combination of the
0:10:41three systems
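The fusion step described in this part of the talk can be illustrated with a minimal sketch. It assumes linear score fusion with weights fitted by logistic regression on development trials, which is a common choice but is not confirmed by the speaker. The function names `train_fusion` and `fuse`, the learning rate, the epoch count, and the toy development scores are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_fusion(dev_scores, dev_labels, lr=0.1, epochs=2000):
    """Fit one bias and one weight per subsystem by logistic regression.

    dev_scores: one [s_gmm_ubm, s_jfa, s_ivector] list per development trial
    dev_labels: 1 for a target trial, 0 for an impostor trial
    """
    n_sys = len(dev_scores[0])
    n = len(dev_scores)
    w = [0.0] * (n_sys + 1)  # w[0] is the bias, acting as a calibration offset
    for _ in range(epochs):
        grad = [0.0] * (n_sys + 1)
        for s, y in zip(dev_scores, dev_labels):
            z = w[0] + sum(wk * sk for wk, sk in zip(w[1:], s))
            err = sigmoid(z) - y
            grad[0] += err
            for k in range(n_sys):
                grad[k + 1] += err * s[k]
        # plain batch gradient descent on the mean cross-entropy
        w = [wk - lr * g / n for wk, g in zip(w, grad)]
    return w


def fuse(w, scores):
    """Combine the subsystem scores of one trial into a single fused score."""
    return w[0] + sum(wk * sk for wk, sk in zip(w[1:], scores))


# Toy development set (made-up numbers): three target and three impostor trials.
dev_scores = [[2.0, 1.5, 0.5], [1.0, 2.0, 1.0], [1.8, 0.9, 1.2],
              [-1.0, -0.5, 0.2], [-2.0, -1.0, -0.3], [-0.5, -1.5, 0.1]]
dev_labels = [1, 1, 1, 0, 0, 0]

weights = train_fusion(dev_scores, dev_labels)
target_like = fuse(weights, [1.5, 1.0, 0.8])
impostor_like = fuse(weights, [-1.2, -0.8, 0.0])
print(target_like > impostor_like)
```

Training the weights on a separate development set, as the speaker describes, keeps the evaluation trials untouched when the fusion and calibration parameters are chosen.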
0:10:46and then here we show
0:10:48you
0:10:51the pass phrase modelling
0:10:53and the verification
0:10:56firstly
0:10:57for pass phrase modelling
0:11:00for each pass phrase
0:11:02of a speaker
0:11:03we build
0:11:07its corresponding model
0:11:09for the modelling so
0:11:13assume that there are k pass phrases for speaker i then
0:11:18we will
0:11:20have k pass phrase
0:11:22models for this speaker
0:11:25so if a speaker
0:11:29say
0:11:30wants to claim
0:11:32to be speaker i
0:11:38with this
0:11:39pass phrase and all so we will
0:11:42if this and autoseek ha
0:11:44i
0:11:45and all up to you compare although with the all these utterances all j at
0:11:53all
0:11:53and finally we get
0:11:55the verification
0:11:57score
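The pass phrase modelling and verification just described (one model per pass phrase per speaker, with a test utterance scored against the model of the claimed speaker and phrase) can be sketched as below. This is only an illustration: the mean-vector model and the negative squared distance score stand in for the real gmm-ubm, jfa and i-vector scoring, and every name, feature value and threshold is hypothetical.

```python
def enroll(utterances):
    """Average the enrollment feature vectors into one toy model."""
    dim = len(utterances[0])
    return [sum(u[d] for u in utterances) / len(utterances) for d in range(dim)]


def score(model, features):
    """Toy similarity: negative squared Euclidean distance to the model."""
    return -sum((m - f) ** 2 for m, f in zip(model, features))


def build_models(enrollment):
    """Build one model per (speaker, pass phrase) pair.

    enrollment maps (speaker, phrase) to a list of feature vectors,
    so a speaker with k pass phrases ends up with k models.
    """
    return {key: enroll(utts) for key, utts in enrollment.items()}


def verify(models, claimed_speaker, phrase, features, threshold):
    """Score a test utterance against the claimed speaker's model for that phrase."""
    s = score(models[(claimed_speaker, phrase)], features)
    return s, s >= threshold


# Hypothetical enrollment data: speaker_i has two pass phrases, so two models.
enrollment = {
    ("speaker_i", "name"): [[1.0, 2.0], [1.2, 1.8]],
    ("speaker_i", "id"): [[-1.0, 0.5], [-0.8, 0.7]],
}
models = build_models(enrollment)

genuine_score, genuine_ok = verify(models, "speaker_i", "name", [1.1, 1.9], threshold=-0.5)
impostor_score, impostor_ok = verify(models, "speaker_i", "name", [-2.0, 0.0], threshold=-0.5)
print(genuine_ok, impostor_ok)
```

Keying the models by speaker and phrase together means a correct voice with the wrong pass phrase is scored against a model it does not match, which is the point of the text-dependent setup.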
0:12:02we show
0:12:04the database
0:12:08collected for this
0:12:11vhf communication speaker verification
0:12:16project
0:12:18this database is used for parameter training
0:12:21for instance
0:12:22it is used
0:12:23for ubm training and used for the total variability matrix
0:12:32in i-vector system training
0:12:35and also used for plda training and it is used for
0:12:42the
0:12:44eigenvoice matrix training
0:12:48so
0:12:51this database comes from different
0:12:55environments and
0:12:56from different recording conditions
0:12:59for instance
0:13:01in office environment and with the vhf channel
0:13:05and also with different distances
0:13:09between the recording
0:13:10and receiving
0:13:13and then we also collect some database
0:13:18by using different settings of recordings
0:13:21in office environments for example
0:13:24is i as are
0:13:26pending for clean and also we
0:13:30record in the onboard environment to simulate the
0:13:33real vhf communication
0:13:39speech
0:13:40the speech database is recorded with the vhf
0:13:44and here are the recording devices
0:13:49like walkie-talkie
0:13:51microphones and also the mobile phone
0:13:58and
0:14:03so we have developed
0:14:06a real time
0:14:09system for this project
0:14:11here we show
0:14:13the hardware components of the voice biometric system the hardware
0:14:20includes the computer
0:14:22and
0:14:23this usb sound card
0:14:26and
0:14:27this vhf
0:14:29handset that is a walkie-talkie here
0:14:32for receiving and also for transmitting
0:14:36and here we show the software user interface and in this software
0:14:42user
0:14:43interface there are three regions
0:14:45the first one is used for administration and registrations
0:14:51and then
0:14:52the second one is for enrollment
0:14:55and the third one is for test and so
0:15:01this has being updating find the
0:15:04by using the idea is to go
0:15:07test inside
0:15:09on what
0:15:14now we go to the performance evaluation
0:15:18so we can see
0:15:22and in this the
0:15:25the pass phrases the
0:15:26for the evaluation purposes
0:15:29you know
0:15:31for this purpose the participants speak
0:15:34one by one the name
0:15:35the id
0:15:37and
0:15:38repeat several times
0:15:40in different sessions and in different environments
0:15:47so
0:15:49so here we show
0:15:52the
0:15:53the evaluation database and the development
0:15:56database
0:15:58the number of what goes the we use the phone this the performance evaluation
0:16:03and the number of training and test
0:16:07utterances used for this evaluation
0:16:09also we show the number of true trials and the
0:16:14number of impostor trials
0:16:16used for the evaluation
0:16:18and
0:16:20there are twenty speakers
0:16:23participating in this database
0:16:26recording and we separate
0:16:29these speakers
0:16:31into ten speakers
0:16:34each for
0:16:35evaluation and for development purposes
0:16:38and here
0:16:40we also give
0:16:42the average durations
0:16:44for the name and for the id
0:16:47for
0:16:48and also for name plus id
0:16:50and you can see the average
0:16:53duration is about one point zero four
0:16:56for the name and it is one point three six
0:17:00for the id
0:17:01and for name plus id it can reach
0:17:04two point four seconds
0:17:09so
0:17:11we show
0:17:12the performance
0:17:13in terms of eer and minimum dcf
0:17:19for each of
0:17:20the single systems
0:17:22and the fusion system
0:17:24and you can see
0:17:29the fusion system is always better than the single systems the gmm
0:17:36and jfa and i-vector
0:17:39while the i-vector performance
0:17:42is not so good
0:17:44as compared to the others
0:17:46so actually
0:17:47because in the i-vector system
0:17:50we only encode to the
0:17:52those the
0:17:54channel information as aforesaid pen
0:17:57in reality a this there's a
0:17:59but these channel compensation
0:18:03"'cause" that is right isn't
0:18:05what is "'cause" consideration is not so single
0:18:09for this short duration so
0:18:11so maybe it will make
0:18:14the plda performance drop
0:18:17so
0:18:20in terms of minimum dcf we also
0:18:23see the best performance is with the fusion
0:18:29and comparing with the single name and single id the name plus id
0:18:35performs
0:18:36better
0:18:37than either
0:18:38in every
0:18:40system
0:18:41single or
0:18:43fusion
0:18:46so here we also
0:18:50can see
0:18:51the
0:18:52the fusion
0:18:53with the name plus id
0:18:55current
0:18:56alright at
0:18:58in a ten point one cheaper same but
0:19:00of eer
0:19:02so it is a
0:19:04quite good result
0:19:06as we expected
0:19:10now for the second
0:19:12part of the performance we show here
0:19:15the
0:19:16det plots for vhf
0:19:19name
0:19:20id and
0:19:21name plus id comparisons
0:19:24so we can see
0:19:27the fusion results are obviously better than the other subsystems
0:19:34for name for id and for name plus id
0:19:37and also can see
0:19:38for name plus
0:19:39id
0:19:40the performance is quite good
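The equal error rate used throughout this evaluation can be computed from trial scores roughly as follows. This is a minimal sketch that assumes a simple threshold sweep over the observed scores; the function name `compute_eer` and the toy scores are hypothetical, and minimum dcf is not computed here because the talk does not state the cost parameters.

```python
def compute_eer(target_scores, impostor_scores):
    """Approximate the equal error rate by sweeping a decision threshold.

    A trial is accepted when its score is >= the threshold. The function
    returns the average of the miss and false alarm rates at the threshold
    where the two rates are closest, together with that threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2.0, t)
    return best[1], best[2]


# Toy scores (made-up numbers): one target scores below one impostor.
target_scores = [2.0, 1.5, 1.2, 0.3]
impostor_scores = [-1.0, -0.5, 0.4, 0.1]

eer, threshold = compute_eer(target_scores, impostor_scores)
print(eer, threshold)  # 0.25 0.4
```

With these toy scores one of four targets is missed and one of four impostors is accepted at the chosen threshold, so both error rates are 25 percent. A det plot such as the ones shown in the talk traces the same two rates over all thresholds rather than at this single point.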
0:19:49now we go to
0:19:50the conclusion in this presentation
0:19:57we have introduced a pass-phrase-based text-dependent speaker verification system
0:20:02under the short
0:20:03duration condition
0:20:05we have
0:20:07developed a fusion system consisting of gmm-ubm jfa and i-vector
0:20:13among them
0:20:14the ubm and the sufficient statistics are shared
0:20:18and according to the different conditions between enrollment and verification we
0:20:25decide the suitable database
0:20:28for parameter training and for
0:20:30system setup
0:20:32experimental results show that the fusion system gives better performance than any single
0:20:38system
0:20:39then
0:20:40two point four second duration or like eer of that's and then chapter seven
0:20:47this is my presentation thank you
0:21:03so
0:21:04for this application i assume
0:21:07and correct me if i'm wrong where is your operating space i assume that for
0:21:11the most part if boats are coming in most of the time it's expected that
0:21:17the right person is gonna be radioing in
0:21:20so you really care about am i correct that you really care about
0:21:24the very low miss rate
0:21:27is that correct basically what region do you care about most an extremely
0:21:32low miss rate
0:21:35we just verify
0:21:38the identity of a person
0:21:40right jet set operating point
0:21:43right so for this scenario laid out boats coming in this in general so that
0:21:48it is sense
0:21:49i guess i'm trying to get a sense like here and i think even in a
0:21:51lot of the text-dependent applications people are talking about we're kind of in the
0:21:56low road
0:21:57we're focusing on a different part of the det curve than we would when we're
0:22:00trying to find a low prior target in the dataset this actually may be in the
0:22:05high prior target region and the costs are changing a lot so
0:22:09do you care what region you gave like equal error rates but do you have
0:22:13an idea where you really care about operating the system where it's gonna set its
0:22:17threshold
0:22:18so
0:22:20we may consider okay
0:22:22we go to this
0:22:25this
0:22:29for this part
0:22:30then
0:22:32we can see
0:22:33for this one
0:22:34we may consider using an automatic speech recognition machine to replace
0:22:41the control centre operator so that means we can improve the total performance of
0:22:46the system
0:22:48by making it totally automatic
0:22:51sue
0:22:51zero in this the
0:22:53automatically by or not
0:22:54system and get
0:22:56the information of the
0:22:59speaker and do the verification for this task
0:23:04then this
0:23:06idea can improve the total
0:23:09performance
0:23:11the
0:23:14sorry
0:23:16in some way
0:23:19i'm interested in the communication part of your system your talk is entitled vhf
0:23:27communication
0:23:29vhf communication is not very specific it simply means that the radio
0:23:35frequency ranges between thirty and three hundred megahertz but there are many ways and many
0:23:41different channel qualities and signal qualities that you can transmit over vhf so
0:23:48i think you implied that you use marine radio which is
0:23:53usually analogue and fm but not necessarily you can transmit the signal digitally
0:24:00in many different modulation channels and then i assume that you are talking just about
0:24:07the marine walkie-talkies but then in your list of databases you also mentioned
0:24:13mobile phone data now mobile phone data is neither transmitted on vhf
0:24:21nor analog so i'm confused how you use that data in analysing your vhf
0:24:29channels
0:24:31so the reason why we choose those mobile phone devices
0:24:37is because we don't have enough
0:24:40database
0:24:42with the vhf
0:24:44channel for the system training so we have tried several times
0:24:51we tried
0:24:53by just using this small database but the performance will drop so
0:25:00we add in
0:25:01some data from this mobile device recording
0:25:06in many
0:25:07cases so this is
0:25:12one consideration
0:25:15we only based it on
0:25:17the experimental results
0:25:21thank you
0:25:29for maritime communication in most cases we only use
0:25:36this vhf
0:25:38like walkie-talkie for communication
0:25:41it is popular so it is suitable
0:25:43for
0:25:44ship communication with the control centre
0:25:52so normally when you look at ship to ship or marine type communications in
0:25:59the modulation demodulation process
0:26:02quite often in the demodulation
0:26:05if the speech bandwidth is not shifted back to the right location there'll be an
0:26:11offset in there
0:26:12and so that distortion will actually introduce a lot of problems so you have to
0:26:16kind of do a normalization or adjustment here
0:26:19are you looking at real data when you're doing your testing and if so what
0:26:24would be the plan to kind of address some of the other
0:26:28problems of analogue
0:26:32vhf
0:26:33communications because i don't see you have listed