Speech Transcript - Text-Dependent Speaker Verification System in VHF Communication Channel

0:00:15	how can open them
0:00:16	everyone
0:00:18	the paper
0:00:20	i would like to present is entitled
0:00:23	text dependent speaker verification system in the vhf communication channel from the
0:00:30	institute for infocomm research singapore
0:00:36	here i show you the outline of this presentation firstly an overview of the
0:00:43	paper followed by the vhf communication introduction and i will show you
0:00:51	the biometric systems for this vhf communication
0:00:56	speaker verification system
0:00:58	and then
0:00:59	i will give the performance evaluation followed by
0:01:02	conclusions
0:01:06	firstly
0:01:09	the task of this research project is to build a voice biometric system
0:01:15	over vhf communication for ship navigation control
0:01:22	and
0:01:23	vhf means very high frequency radio communication
0:01:30	channel
0:01:32	so
0:01:34	the main device for the usage of this communication channel is the walkie talkie
0:01:41	the walkie talkie is what is actually used in vhf communication so this project
0:01:47	is focused on realistic communication
0:01:50	then
0:01:50	the navigation control is for the authentication of the speaker so
0:01:59	especially for the shipmaster when the ship
0:02:02	goes into
0:02:03	a certain seaport people
0:02:06	in the control pendant this one though full is
0:02:10	is in this one and
0:02:11	sun
0:02:12	some people register of this
0:02:18	of the nice the presence of
0:02:19	so
0:02:20	only a tonight the person speaking
0:02:22	can trying to see the two and the
0:02:24	the sufficiency part
0:02:27	so is the
0:02:28	point been enforcing to set up these the
0:02:32	systems of one the project
0:02:33	but the problem for
0:02:38	vhf communication speaker verification is that
0:02:42	this is a
0:02:43	speaker verification system with very short durations
0:02:48	and this short duration is
0:02:51	maybe about one second
0:02:54	and up to
0:02:55	three seconds
0:02:56	so compared to the conventional duration like
0:03:01	one minute or
0:03:04	ten seconds
0:03:06	as usually used in the nist sre
0:03:09	it is quite short so
0:03:12	we may
0:03:13	focus on this
0:03:15	up opens the by sun solutions and
0:03:19	under the age of
0:03:21	communication in all database
0:03:23	hasn't many problems and
0:03:25	and i goes all you
0:03:26	those of problems and these the
0:03:28	of phone this a
0:03:30	speaker verification
0:03:32	and
0:03:32	and
0:03:33	so we seek some solutions by using
0:03:36	pass phrase
0:03:38	a pass phrases
0:03:40	what three screens the
0:03:41	so we also collect
0:03:44	some proper database
0:03:46	and use it
0:03:47	to
0:03:48	improve the pass phrase based speech
0:03:51	verification system
0:03:53	we also applied the
0:03:54	multi system combination
0:03:56	to improve
0:03:58	the performance of the system
0:04:06	now we go to
0:04:08	show you
0:04:09	the vhf communication
0:04:11	part
0:04:13	so in this
0:04:14	figure
0:04:16	we show you
0:04:17	the application of the usage of vhf communication so you can see there are two parts
0:04:25	one is the
0:04:26	the user side
0:04:29	like the shipmaster at sea and
0:04:33	the other part is the control centre so this person is
0:04:38	to pass in communication
0:04:40	so with the vhf device
0:04:43	like a walkie talkie device
0:04:45	and that use the first three seven and
0:04:50	thank initiate quality to the control synthesis and the control centres we applied the
0:04:56	by we present
0:04:57	for the c must and then so at this moment that's it must the so
0:05:03	speak tune the
0:05:04	what we talked is that
0:05:06	with his the name
0:05:08	speech is that
0:05:09	and this speech is transferred to the control centre and the control side inputs
0:05:16	this
0:05:17	speech to the speaker verification system and uses it for verification
0:05:22	so at
0:05:24	for example at the same time the control centre can also be passed
0:05:28	another
0:05:30	speech
0:05:31	like
0:05:32	we present for certificate the identity
0:05:37	number
0:05:38	for
0:05:39	verification and we also
0:05:41	combine the name and the id
0:05:46	to
0:05:47	improve the verification performance
0:05:55	now
0:05:56	for
0:05:58	for
0:05:59	speaker verification
0:06:01	proposal
0:06:02	the nine hundred
0:06:03	correct that is because of the usage of speech you that are
0:06:08	alright
0:06:09	as shown here
0:06:11	firstly the
0:06:13	vhf communication speech
0:06:15	is quite noisy
0:06:18	because
0:06:19	it is recorded in
0:06:21	an outdoor environment and in this environment there is noise maybe at sea
0:06:26	and the noise is quite strong
0:06:29	another problem is the open channel
0:06:34	for verification
0:06:35	open channel means
0:06:39	the channel variability cannot be known
0:06:43	by
0:06:44	the speaker verification system
0:06:47	so this is quite
0:06:49	tough
0:06:51	so for this case
0:06:53	we of course face
0:06:53	quite big problems for channel compensation
0:06:57	so we cannot use the conventional
0:07:00	channel compensation
0:07:02	techniques
0:07:03	to
0:07:04	reduce the channel mismatch effects for example we cannot use jfa
0:07:10	eigenchannel factors or even we cannot use plda
0:07:15	channel factors
0:07:16	for this purpose so it is a
0:07:21	how difficult t
0:07:23	for this the project and not know why is that not be friend not those
0:07:28	and speech
0:07:29	speech is speech
0:07:31	that means the
0:07:33	during the
0:07:34	you don't then
0:07:35	and
0:07:36	the speech is recorded in an office environment
0:07:39	so and is obviously but and
0:07:41	is the
0:07:42	up i
0:07:45	is apply applied
0:07:47	why
0:07:48	quite
0:07:49	that you element so
0:07:51	for the test
0:07:53	environment that is
0:07:55	in a ship
0:07:56	so maybe there is the engine
0:07:59	so with the engine noise
0:08:02	the speaker's speech may be louder than in
0:08:06	the office environment
0:08:07	so also
0:08:08	well that's because speak to now maybe
0:08:12	this speak
0:08:14	speech is speaking we have be plastic
0:08:17	so another
0:08:19	problem is the channel frequency limitation
0:08:22	with the usage of vhf here i show you
0:08:27	the spectrum
0:08:29	range
0:08:30	for comparison
0:08:32	the first one
0:08:35	is a normal recording without vhf
0:08:38	communication
0:08:40	and this one is
0:08:41	recorded with the vhf channel
0:08:44	so you can see
0:08:47	the high-frequency part is suppressed much and
0:08:51	and we know for speaker verification
0:08:54	the major
0:08:57	speaker features
0:08:59	are in the high-frequency part so if this information is lost
0:09:05	much so maybe
0:09:08	the speaker
0:09:09	verification performance will be dropped
0:09:16	now let's go to the biometric
0:09:19	systems introduction
0:09:22	in this system
0:09:25	i will show you the
0:09:27	pass phrase based speaker verification
0:09:30	system
0:09:31	the speech is input to the three
0:09:34	subsystems
0:09:36	with the
0:09:37	gmm-ubm and the other two conventional
0:09:42	systems jfa and i-vector
0:09:44	because they are
0:09:45	gmm based they have many parameters that can be
0:09:50	shared with each other so for example
0:09:53	they can share
0:09:56	the
0:09:57	ubm parameters and they can share the sufficient statistics
0:10:03	so
0:10:04	so for the proposed system
0:10:07	the computation complexity we have be drawn and table two is just reading
0:10:13	so
0:10:15	the entire system is actually the fusion of the
0:10:22	three subsystems
0:10:23	the fusion
0:10:25	and calibration parameters
0:10:28	can be
0:10:29	trained by using a set of development database
0:10:34	and then finally we
0:10:37	get
0:10:38	the scores from the combination of the
0:10:41	three systems
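The score fusion described here, where fusion and calibration parameters are trained on a development set and then applied to the three subsystem scores, can be sketched roughly as below. This is only an illustration under assumptions: the talk does not name the fusion method, so linear logistic regression is used as a stand-in, and the names `train_fusion` and `fuse` as well as the synthetic development scores are hypothetical.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit weights w and offset b by logistic regression (assumed fusion method).

    scores: (n_trials, n_systems) subsystem scores on development trials.
    labels: (n_trials,) array, 1 for target trials and 0 for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    w = np.zeros(scores.shape[1])
    b = 0.0
    for _ in range(n_iter):
        # Predicted probability that each trial is a target trial.
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))
        err = p - labels
        # Plain gradient descent on the mean cross-entropy loss.
        w -= lr * (scores.T @ err) / len(labels)
        b -= lr * err.mean()
    return w, b

def fuse(scores, w, b):
    """Combine subsystem scores into one fused score per trial."""
    return np.asarray(scores, dtype=float) @ w + b

# Synthetic development scores for three subsystems (purely illustrative data).
rng = np.random.default_rng(0)
target = rng.normal(1.0, 1.0, size=(200, 3))
impostor = rng.normal(-1.0, 1.0, size=(200, 3))
dev_scores = np.vstack([target, impostor])
dev_labels = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])

w, b = train_fusion(dev_scores, dev_labels)
fused = fuse(dev_scores, w, b)
```

In use, `w` and `b` would be estimated once on development trials and then applied unchanged to evaluation trials, which matches the split between development and evaluation data mentioned in the talk.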
0:10:46	and then he we so
0:10:48	you
0:10:51	the pass phrase and three screens the
0:10:53	and is the verification
0:10:56	firstly
0:10:57	for pass phrase model training
0:11:00	for each pass phrase
0:11:02	of a speaker
0:11:03	we train the
0:11:07	corresponding model and
0:11:09	for the modelling so
0:11:13	suppose that there are k pass phrases for speaker i and then
0:11:18	we
0:11:20	train k pass phrase
0:11:22	models for this speaker
0:11:25	so if a speaker
0:11:29	say
0:11:30	wants to claim
0:11:32	to be the speaker i and
0:11:38	with this the
0:11:39	pass phrase and all so we will
0:11:42	if this and autoseek ha
0:11:44	i
0:11:45	and all up to you compare although with the all these utterances all j at
0:11:53	all
0:11:53	and finally we get
0:11:55	the verification
0:11:57	scores
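The idea of keeping one model per pass phrase for each speaker, and scoring a claim against the model of the claimed speaker for the spoken pass phrase, might be sketched as follows. Everything here is a simplified assumption: the "model" is just a mean feature vector and the score is a cosine similarity, standing in for the gmm-ubm, jfa and i-vector subsystems of the talk, and the names `models`, `enroll` and `verify` are hypothetical.

```python
import numpy as np

# Hypothetical model store: one entry per (speaker, pass phrase) pair,
# so a speaker with K pass phrases ends up with K models.
models = {}

def enroll(speaker, phrase, utterances):
    """Build a stand-in model (mean feature vector) from enrollment utterances."""
    models[(speaker, phrase)] = np.mean(np.asarray(utterances, dtype=float), axis=0)

def verify(claimed_speaker, phrase, utterance):
    """Score a test utterance against the claimed speaker's model for that pass phrase."""
    model = models[(claimed_speaker, phrase)]
    u = np.asarray(utterance, dtype=float)
    # Cosine similarity as a placeholder for the real subsystem score.
    return float(u @ model / (np.linalg.norm(u) * np.linalg.norm(model)))

# Speaker "spk_i" enrolls two pass phrases: a name and an id (toy feature vectors).
enroll("spk_i", "name", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
enroll("spk_i", "id", [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])

matching = verify("spk_i", "name", [1.0, 0.0, 0.0])
mismatching = verify("spk_i", "name", [0.0, 0.0, 1.0])
```

The lookup by `(speaker, phrase)` is the only point being illustrated; a real system would replace the stand-in model and score with the trained subsystems.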
0:12:02	we show
0:12:04	the database
0:12:08	collected for this
0:12:11	vhf communication speaker verification
0:12:16	project and
0:12:18	this database is used for parameter training
0:12:21	for instance
0:12:22	they are used
0:12:23	for ubm training and also for total variability matrix training
0:12:32	in i-vector system training
0:12:35	and also used for plda training and also used for
0:12:42	eigen
0:12:44	eigenvoice the eigenvoice matrix training
0:12:48	so
0:12:51	this database is from different
0:12:55	environments and
0:12:56	from different recording conditions
0:12:59	for instance they can be
0:13:01	in an office environment and with the vhf channel
0:13:05	and also with different distances
0:13:09	between the recording
0:13:10	and receiving
0:13:13	and then we also collect some database
0:13:18	by using different settings of recordings
0:13:21	in office environments for example
0:13:24	is i as are
0:13:26	pending for clean and also we
0:13:30	because you on what you're element is it to simulate the
0:13:33	no real reason is because a development set up for communication
0:13:39	speech
0:13:40	the second speech database is recorded with the vhf
0:13:44	and here are the recording devices
0:13:49	like walkie talkie
0:13:51	mike and on the microphones and also i pay that is the mobile phone
0:13:58	and
0:14:03	so we have developed
0:14:06	a real time
0:14:09	system for this project
0:14:11	and here we show
0:14:13	the hardware components of the voice biometric system including
0:14:20	the computer
0:14:22	and
0:14:23	the usb sound card
0:14:26	and
0:14:27	the vhf
0:14:29	handset that is a walkie talkie here
0:14:32	for receiving and also for transmitting
0:14:36	and here we show the software user interface and in this software
0:14:42	user interface
0:14:43	there are three regions
0:14:45	the first region is used for registration
0:14:51	and then
0:14:52	the second one is for enrollment
0:14:55	and the third one is for test and so
0:15:01	this has being updating find the
0:15:04	by using the idea is to go
0:15:07	test inside
0:15:09	on what
0:15:14	now we go to the performance evaluation
0:15:18	so we can see
0:15:22	and in this the
0:15:25	the pass phrases
0:15:26	for the evaluation purpose
0:15:29	you know
0:15:31	for this purpose the participants will speak
0:15:34	one by one the name
0:15:35	the id
0:15:37	and
0:15:38	and repeat several times
0:15:40	in different sets when i in different development with samples the in different should
0:15:47	so
0:15:49	so here we show
0:15:52	the
0:15:53	the evaluation database and the development
0:15:56	database
0:15:58	the number of what goes the we use the phone this the performance evaluation
0:16:03	and the number of training and test
0:16:07	utterances used for the evaluation
0:16:09	also we show the number of true trials and the
0:16:14	number of impostor trials
0:16:16	used for the evaluation
0:16:18	and
0:16:20	there are twenty speakers
0:16:23	participating in the database
0:16:26	recordings and we separate
0:16:29	these speakers
0:16:31	into ten speakers and
0:16:34	for
0:16:35	evaluation and for development purposes
0:16:38	and here
0:16:40	we also give
0:16:42	the average durations
0:16:44	for the name and for the id
0:16:47	and
0:16:48	also for name plus id
0:16:50	and you can see the average
0:16:53	duration is about one point zero four seconds
0:16:56	for the name and one point three six
0:17:00	for the id
0:17:01	and for name plus id it can reach
0:17:04	two point four seconds
0:17:09	so
0:17:11	we show
0:17:12	the performance
0:17:13	these are in terms of eer and minimum dcf
0:17:19	for each of
0:17:20	the single systems
0:17:22	and the fusion system
0:17:24	and you can see
0:17:29	the fusion system is always better than the single systems the gmm
0:17:36	and jfa and i-vector
0:17:39	while the i-vector performance
0:17:42	is not so good
0:17:44	as compared to the others
0:17:46	so actually
0:17:47	because the in i characteristics than the
0:17:50	we only encode to the
0:17:52	those the
0:17:54	channel information as aforesaid pen
0:17:57	in reality a this there's a
0:17:59	but these channel compensation
0:18:03	"'cause" that is right isn't
0:18:05	what is "'cause" consideration is not so single
0:18:09	for this is duration so
0:18:11	so maybe i you we all make
0:18:14	then the pierrette the a performance draw time
0:18:17	so
0:18:20	in terms of minimum dcf also
0:18:23	the best performance is the fusion and
0:18:29	comparing with the single name and single id the name plus id
0:18:35	performs
0:18:36	better
0:18:37	in every
0:18:38	in every
0:18:40	system
0:18:41	single or
0:18:43	fusion
0:18:46	so here we also
0:18:50	can
0:18:51	the
0:18:52	the fusion
0:18:53	with the name process id
0:18:55	current
0:18:56	alright at
0:18:58	in a ten point one cheaper same but
0:19:00	of eer
0:19:02	so is the
0:19:04	quite good results and
0:19:06	we expect
0:19:10	you know from the second
0:19:12	for the performance we show here
0:19:15	the
0:19:16	det plots for vhf
0:19:19	name
0:19:20	id and
0:19:21	name plus id comparisons
0:19:24	so we can see
0:19:27	the fusion results are obviously better than the other subsystems
0:19:34	for name for id and for name plus id
0:19:37	and also can see
0:19:38	for name plus
0:19:39	id
0:19:40	the performance is quite good
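The equal error rate used to compare the single systems and the fusion is the operating point where the miss rate equals the false alarm rate. A minimal sketch of how it could be estimated from lists of target and impostor scores follows; the function name `equal_error_rate` and the toy scores are hypothetical, and the simple threshold sweep is an assumption, not necessarily the procedure used for the results in the talk.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping thresholds over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False alarm rate: fraction of impostor trials scoring at or above it.
    fa = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(miss - fa)))
    return (miss[idx] + fa[idx]) / 2.0, thresholds[idx]

# Toy scores with some overlap between target and impostor trials.
eer, threshold = equal_error_rate([1.0, 2.0, 3.0, 4.0], [0.0, 1.5, 2.5, -1.0])
```

The same miss and false alarm rates, plotted against each other on a normal deviate scale over all thresholds, give the det plots shown for name, id and name plus id.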
0:19:49	now we go to
0:19:50	the conclusion in this presentation
0:19:57	we have introduced a pass phrase based text dependent speaker verification system
0:20:02	against the short
0:20:03	duration condition
0:20:05	we have
0:20:07	developed a fusion system consisting of gmm-ubm jfa and i-vector
0:20:13	among them
0:20:14	the ubm and the sufficient statistics are shared
0:20:18	and according to the different conditions between enrollment and verification we
0:20:25	decide the suitable database
0:20:28	for parameter training and the final
0:20:30	system setup
0:20:32	experimental results show that the fusion system outperforms any single
0:20:38	system
0:20:39	then
0:20:40	two point four second duration or like eer of that's and then chapter seven
0:20:47	this is my presentation thanks
0:21:03	so
0:21:04	for this application i assume
0:21:07	and correct me if i'm wrong where is your operating space i assume that for
0:21:11	the most part if boats are coming in most of the time it's expected that
0:21:17	the right person is gonna be radioing in
0:21:20	so you really care about so am i correct in that you really care about the
0:21:24	the very low miss rate
0:21:27	is that correct you basically you care what region are you most such an extremely
0:21:32	low miss rate
0:21:35	we just terrible
0:21:38	the identity of a person's
0:21:40	right jet set operating point
0:21:43	right so for this scenario laid out boats coming in this in general so that
0:21:48	it is sense
0:21:49	i guess i'm trying to get a sense like here and i think even a
0:21:51	lot of the text dependent applications people are talking about where kind of the
0:21:56	low road
0:21:57	we're focusing on a different part of the det curve than we would when it's
0:22:00	trying to find a low prior target in the dataset this actually may be in the
0:22:05	high prior target the costs are changing a lot so
0:22:09	do you care what region you gave like equal error rates and all that do you have
0:22:13	an idea where you really care about operating the system where it's gonna make its
0:22:17	threshold
0:22:18	so
0:22:20	we may consider okay
0:22:22	we go to this
0:22:25	this
0:22:29	for this part
0:22:30	then
0:22:32	we can see
0:22:33	for this one
0:22:34	we may consider to use the automatic speech recognition machine use it to replace in
0:22:41	this control on the so that means that we can improve the total performance of
0:22:46	less is then so that
0:22:48	by total automatically
0:22:51	sue
0:22:51	zero in this the
0:22:53	automatically by or not
0:22:54	systems that and get
0:22:56	then the then the information of the
0:22:59	us because the and to the verification when this task as
0:23:04	then this
0:23:06	idea can improve in the total
0:23:09	performance
0:23:11	the
0:23:14	sorry
0:23:16	in some way
0:23:19	i'm interested in the communication part of your system your talk is entitled vhf
0:23:27	communication
0:23:29	vhf communication is not very specific it simply means that the radio
0:23:35	frequency ranges between thirty and three hundred megahertz but there are many ways and many
0:23:41	different channel qualities and signal qualities that you can transmit over vhf so
0:23:48	i think you implied that you use marine radio which is
0:23:53	usually analogue and fm but not necessarily you can transmit the signal digitally
0:24:00	in many different modulations and channels and then i assume that you're talking just about
0:24:07	the marine walkie talkies but then in your list of databases you also mentioned
0:24:13	mobile phone data now mobile phone data is neither transmitted on vhf
0:24:21	nor analog so i'm confused how you use that data in analysing your vhf
0:24:29	channels
0:24:31	so the reason why we chose those mobile phone devices
0:24:37	is because we don't have enough
0:24:40	database to use
0:24:42	with the vhf
0:24:44	for the system training so we have tried several times
0:24:51	we use
0:24:53	by just using this small database but the performance will be dropped so
0:25:00	we added in some
0:25:01	data from this mobile device recording
0:25:06	i in many
0:25:07	had of course so this is a
0:25:12	one is a consideration
0:25:15	we only based on
0:25:17	the experimental results
0:25:21	thanks
0:25:29	for marine communication in most cases we only use
0:25:36	this vhf
0:25:38	like walkie talkie for communication
0:25:41	it is popular so it is suitable
0:25:43	for universal
0:25:44	ship communication with the control centre
0:25:52	so normally when you look at ship to ship or marine type communications in
0:25:59	the modulation demodulation process
0:26:02	quite often in the demodulation
0:26:05	if the speech bandwidth is not shifted back to the right location there'll be an
0:26:11	offset in there
0:26:12	and so that distortion will actually introduce a lot of problems so you have to
0:26:16	kind of have a carrier normalization or adjustment here
0:26:19	are you looking at real data when you're doing your testing and if so what
0:26:24	is the plan to kind of address some of the other
0:26:28	problems of used to be christian analogue
0:26:32	vhf
0:26:33	communications because i don't see you have listed

Text-Dependent Speaker Verification System in VHF Communication Channel

Text-dependent Speaker Recognition

Changhuai You, Kong Aik Lee, Bin Ma and Haizhou Li