Speech Transcript - Partial AUC Metric Learning Based Speaker Verification Back-End

0:00:26	Hi everyone, my name is Zhongxin Bai.
0:00:29	I come from Northwestern Polytechnical University.
0:00:34	Today I will give a presentation on my paper, accepted by the workshop of Odyssey 2020.
0:00:46	Now let's get started.
0:00:47	The title of this paper is "Partial AUC Metric Learning Based Speaker Verification Back-End".
0:00:54	In other words, this paper proposes a shallow metric learning back-end algorithm for speaker verification.
0:01:20	Okay, I will present it from these four aspects: metric learning and the motivation,
0:01:28	the proposed objective function,
0:01:31	some experimental results,
0:01:34	and, at last, some conclusions.
0:01:38	I will also introduce several follow-up works of this paper and our future plans.
0:01:48	First, metric learning and the motivation.
0:01:55	As illustrated in the title, I think we can answer these two questions to explain the motivation of this paper clearly.
0:02:06	The first one is: what is metric learning, and why do we propose a metric learning based back-end algorithm?
0:02:22	Metric learning aims to learn a distance function that measures the similarity of samples, such as the Mahalanobis distance.
0:02:32	For speaker verification, as displayed in the right figure of this slide, we first extract speaker identity features from utterances with a front-end speaker feature extractor,
0:02:48	such as the i-vector or x-vector extractor,
0:02:52	and then we feed them to the metric learning based back-end to calculate their similarity scores.
0:03:02	For the learning of the metric, we employ a loss function based on the optimization of the partial AUC (pAUC), as displayed in the left figure of this slide.
0:03:22	For metric learning, I think the first advantage is that the training of the distance function is consistent with the evaluation procedure.
0:03:33	Therefore, the back-end can directly optimize the evaluation metrics for speaker verification,
0:03:42	such as the equal error rate (EER), the pAUC, and so on.
0:03:50	Second, it can be easily combined with existing front-ends, for example the i-vector and the x-vector.
0:04:04	Third, this shallow metric learning method can be easily extended to the end-to-end framework.
0:04:18	The second question I need to answer is: what is the partial AUC,
0:04:25	and why does our metric learning back-end aim at optimizing the pAUC?
0:04:38	In the left figure of this slide, the partial AUC is defined as a small part of the area under the ROC curve,
0:04:49	like the highlighted region.
0:04:54	Although metric learning can directly optimize the evaluation metrics, its implementation faces some difficulties.
0:05:05	As we all know, in metric learning we need to construct pairwise or triplet training trials with speaker-level labels to train the distance function.
0:05:17	In this situation, the number of all possible training trials is very large.
0:05:24	Besides, many easily distinguishable trials are unnecessary for the training of the distance function.
0:05:32	In terms of these difficulties, I think the optimization of the pAUC has the following two advantages.
0:05:44	First, it is easy to select the difficult samples by setting α and β to relatively small values.
0:06:00	In this way, we can also reduce the number of selected training trials.
0:06:08	Second, we can optimize the partial AUC of interest according to specific applications.
0:06:16	And obviously, the AUC is a special case of the partial AUC.
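To make the scale of the problem concrete, here is a small Python sketch (not from the paper) that counts all possible pairwise trials for a hypothetical corpus and estimates how many negative trials remain when only the FPR range [α, β] is kept. The corpus size and the rounding rule are illustrative assumptions.

```python
from math import comb


def count_trials(utts_per_speaker):
    """Count every possible pairwise trial for a corpus.

    utts_per_speaker: number of utterances of each speaker.
    Returns (num_positive, num_negative) trials.
    """
    total_utts = sum(utts_per_speaker)
    all_pairs = comb(total_utts, 2)  # every unordered pair of utterances
    positives = sum(comb(n, 2) for n in utts_per_speaker)  # same-speaker pairs
    return positives, all_pairs - positives


def kept_negatives(num_negative, alpha, beta):
    """Approximate number of negative trials whose FPR rank lies in [alpha, beta].

    The rounding rule is an illustrative assumption.
    """
    return round(num_negative * (beta - alpha))


# Hypothetical corpus: 1000 speakers with 10 utterances each.
pos, neg = count_trials([10] * 1000)
print(pos, neg)                              # 45000 49950000
print(kept_negatives(neg, 0.0, 0.01))        # 499500
print(kept_negatives(neg, 0.0, 1.0) == neg)  # True: alpha=0, beta=1 keeps every negative (full AUC)
```

Even this modest corpus yields about fifty million negative trials, while a narrow range such as [0, 0.01] keeps only the hardest one percent of them.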
0:06:27	Next, in the second part, I will explain the objective function of the proposed algorithm.
0:06:44	In this slide, I will introduce how to calculate the partial AUC.
0:06:51	As in other metric learning methods, we need to construct pairwise trials; here we don't discuss how to construct them.
0:07:00	Let T be the already constructed trial set.
0:07:06	Here x and y are the speaker features of two speech segments,
0:07:14	and l is the ground-truth label:
0:07:17	if they come from the same speaker, l equals one;
0:07:21	otherwise, l equals zero.
0:07:26	Besides, the function s is used to calculate the similarity of two speaker features.
0:07:35	Here we use the Mahalanobis distance function.
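The Mahalanobis distance mentioned here can be sketched in Python as follows. This is an illustration, not the authors' code: it uses the squared form (x − y)ᵀM(x − y), and the parameterization M = LᵀL, which keeps M positive semi-definite, is an assumption made for the example.

```python
import numpy as np


def mahalanobis_distance(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) between two speaker features.

    M should be symmetric positive semi-definite; then the distance is never negative.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)


def psd_from_factor(L):
    """Hypothetical parameterization M = L^T L, which is PSD for any real matrix L."""
    L = np.asarray(L, dtype=float)
    return L.T @ L


x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(mahalanobis_distance(x, y, np.eye(2)))  # 25.0, the squared Euclidean distance
M = psd_from_factor(np.diag([2.0, 1.0]))      # M = diag(4, 1)
print(mahalanobis_distance(x, y, M))          # 52.0
```

With M equal to the identity the distance reduces to the squared Euclidean distance; learning M is what lets the back-end reweight directions of the speaker feature space.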
0:07:40	Then the predicted label l̂ can be obtained by comparing the score s with θ, where θ is the decision threshold.
0:07:55	Given a fixed value of θ, we are able to compute the true positive rate, TPR,
0:08:04	and the false positive rate, FPR.
0:08:13	By varying θ, we can get a series of TPR and FPR values,
0:08:19	which form the ROC curve shown in the right figure.
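The following Python sketch (illustrative, not from the paper) shows how the predicted label, TPR, and FPR follow from a threshold, and how sweeping the threshold traces the ROC curve. Because the back-end scores trials with a distance, it assumes a trial is accepted as same-speaker when its distance is at most θ; the trial scores are made up.

```python
import numpy as np


def tpr_fpr(distances, labels, theta):
    """Return (TPR, FPR) when trials with distance <= theta are accepted.

    labels: 1 for same-speaker (positive) trials, 0 for different-speaker (negative) trials.
    """
    d = np.asarray(distances, dtype=float)
    l = np.asarray(labels, dtype=int)
    l_hat = (d <= theta).astype(int)  # predicted labels
    tpr = np.sum((l_hat == 1) & (l == 1)) / np.sum(l == 1)
    fpr = np.sum((l_hat == 1) & (l == 0)) / np.sum(l == 0)
    return float(tpr), float(fpr)


def roc_points(distances, labels):
    """Sweep theta over every observed distance and return the ROC curve as (FPR, TPR) points."""
    points = []
    for theta in np.unique(distances):  # unique values in ascending order
        tpr, fpr = tpr_fpr(distances, labels, theta)
        points.append((fpr, tpr))
    return points


distances = [0.1, 0.4, 0.35, 0.8]
labels = [1, 1, 0, 0]
print(tpr_fpr(distances, labels, 0.4))  # (1.0, 0.5)
print(roc_points(distances, labels))    # [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```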
0:08:27	However, directly optimizing the entire ROC curve, that is, the AUC, as follows,
0:08:39	is not only costly but also unnecessary,
0:08:43	because most practical systems work only in a part of their ROC curves rather than the whole.
0:09:04	For example, a bank security system usually requires a small false positive rate.
0:09:09	In contrast, a terrorist detection system always hopes for a high recall rate.
0:09:21	So optimizing the partial AUC in the working region we are interested in is a better choice.
0:09:32	In this slide, given the constructed pairwise training set T, let P and N be the positive and negative subsets of T.
0:09:45	Then we need to compute a new subset N₀ from N
0:09:57	by constraining the FPR to the range between α and β.
0:10:05	In order to compute N₀, we first need to calculate two rank indices by this formula.
0:10:15	Then all the score values of the negative set N are sorted in ascending order,
0:10:25	and N₀ is selected as the subset of samples ranked between these two positions in the sorted scores.
0:10:39	After obtaining N₀, the normalized pAUC can be calculated, where |P| and |N₀| are the sizes of P and N₀, respectively.
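The N₀ selection and the normalized pAUC can be sketched in Python as below. This is a hedged illustration rather than the paper's exact formula: it works on distances (smaller means more likely the same speaker), so sorting negative distances in ascending order puts the hardest negatives first, and the rank indices j_α = ⌈|N|α⌉ and j_β = ⌊|N|β⌋ are assumptions. Ties are counted as errors for simplicity.

```python
import math

import numpy as np


def select_hard_negatives(neg_distances, alpha, beta):
    """Build N0: negatives whose 1-based rank after an ascending sort lies in (j_alpha, j_beta].

    Assumed index rule: j_alpha = ceil(|N| * alpha), j_beta = floor(|N| * beta).
    """
    d = np.sort(np.asarray(neg_distances, dtype=float))  # hardest negatives first
    j_alpha = math.ceil(len(d) * alpha)
    j_beta = math.floor(len(d) * beta)
    return d[j_alpha:j_beta]


def partial_auc(pos_distances, neg_distances, alpha, beta):
    """Normalized pAUC: fraction of (positive, N0 negative) pairs with d_pos < d_neg."""
    p = np.asarray(pos_distances, dtype=float)
    n0 = select_hard_negatives(neg_distances, alpha, beta)
    if p.size == 0 or n0.size == 0:
        raise ValueError("need at least one positive trial and one selected negative trial")
    correct = p[:, None] < n0[None, :]  # indicator I(d_pos < d_neg) for every pair
    return float(correct.mean())        # divides by |P| * |N0|


pos = [0.1, 0.3]
neg = [0.9, 0.2, 0.6, 0.5]
print(partial_auc(pos, neg, 0.0, 1.0))  # 0.875: alpha=0, beta=1 gives the full AUC
print(partial_auc(pos, neg, 0.0, 0.5))  # 0.75: only the two hardest negatives, 0.2 and 0.5
```

Setting α = 0 and β = 1 keeps every negative, which is why the AUC is a special case of the pAUC.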
0:11:00	The partial AUC is calculated over P and N₀ by the formula on this slide,
0:11:09	where I(·) is an indicator function, so directly optimizing this formula is NP-hard. Therefore, we need to relax it.
0:11:20	Specifically, we relax the pAUC calculation function by replacing the indicator function with a hinge loss function.
0:11:32	Here, the margin is a tunable hyperparameter, and it is larger than zero.
0:11:40	As for the last term of the relaxed loss function:
0:11:48	to prevent this loss function from overfitting the training data, we also add a regularization term,
0:12:01	with a weighting coefficient λ, to the minimization problem.
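A hedged Python sketch of the relaxed objective follows. It assumes the hinge form max(0, δ − (d_neg − d_pos)) averaged over all (positive, N₀-negative) pairs, where δ stands for the margin hyperparameter, and it uses a squared Frobenius norm of M as a stand-in regularizer; the paper's exact regularizer may differ. The negative trials passed in are assumed to be the already selected N₀.

```python
import numpy as np


def pair_distances(diffs, M):
    """Squared Mahalanobis distance for each row of diffs (each row is x - y of one trial)."""
    diffs = np.asarray(diffs, dtype=float)
    return np.einsum("ij,jk,ik->i", diffs, M, diffs)


def relaxed_pauc_loss(pos_distances, neg_distances, margin):
    """Hinge relaxation of 1 - pAUC: mean of max(0, margin - (d_neg - d_pos)) over all pairs."""
    p = np.asarray(pos_distances, dtype=float)[:, None]
    n = np.asarray(neg_distances, dtype=float)[None, :]
    return float(np.maximum(0.0, margin - (n - p)).mean())


def objective(pos_diffs, neg_diffs, M, margin, lam):
    """Relaxed pAUC loss plus a hypothetical regularizer lam * ||M||_F^2."""
    loss = relaxed_pauc_loss(pair_distances(pos_diffs, M), pair_distances(neg_diffs, M), margin)
    return loss + lam * float(np.sum(np.asarray(M, dtype=float) ** 2))


pos_diffs = [[1.0, 0.0]]              # one same-speaker trial, distance 1 under M = I
neg_diffs = [[2.0, 0.0], [1.0, 1.0]]  # two selected negatives, distances 4 and 2 under M = I
print(objective(pos_diffs, neg_diffs, np.eye(2), margin=1.5, lam=0.1))  # about 0.45
```

Pairs already separated by more than the margin contribute zero loss, so, as with the pAUC itself, training concentrates on the hard trials.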
0:12:08	Finally, this green part enlarges the between-class distance,
0:12:13	and this red part tries to minimize the within-class variance.
0:12:19	In other words, our objective function aims at enlarging the margin between the positive and negative trials
0:12:29	while minimizing the variances of the two classes of trials simultaneously.
0:12:42	In the third part, I will give some experimental results.
0:12:47	This slide displays our experimental settings. More details can be found in the paper.
0:12:58	This table lists the comparison results on the NIST SRE dataset.
0:13:07	It is seen that the proposed pAUC metric learning achieves better performance than PLDA, given both the i-vector and the x-vector front-ends.
0:13:19	Specifically, pAUC metric learning achieves over ten percent and twenty percent relative improvements over PLDA
0:13:30	in terms of the pAUC and the AUC, respectively.
0:13:40	Moreover, it achieves more than eleven percent relative EER reduction
0:13:49	and five percent relative DCF reduction over PLDA.
0:13:58	Table 2 lists the results on the core tasks of the SITW dataset.
0:14:05	It is seen that the proposed pAUC metric learning achieves better performance than PLDA.
0:14:13	Specifically, when the x-vector front-end is used, pAUC metric learning achieves about eight percent relative pAUC improvement over PLDA.
0:14:29	It also obtains about ten percent and twenty percent relative AUC improvements on the development and evaluation core tasks, respectively.
0:14:43	Moreover, it achieves ten percent relative EER reduction and three percent relative DCF reduction.
0:14:53	Although the performance improvement with the i-vector front-end is not as significant as that with the x-vector front-end,
0:15:03	the trends with different front-ends are consistent.
0:15:10	This page displays some experimental results
0:15:15	which are used to analyze the effect of the hyperparameters.
0:15:21	We adopt grid search to study the impact of their values on performance
0:15:29	in the i-vector and x-vector systems.
0:15:33	From these two tables, we can see that the stable working region is quite large.
0:15:42	This figure shows the relative performance improvements over PLDA
0:15:47	in terms of different margin values in the objective function.
0:15:55	From this figure, we find that pAUC metric learning is robust to this hyperparameter, and the best value is around 1.25.
0:16:14	Finally, I will give some conclusions and introduce several follow-up works as well as our future plans.
0:16:33	In this paper, a Mahalanobis distance based metric learning back-end is proposed to optimize the partial AUC for speaker verification.
0:16:47	Because directly optimizing the partial AUC is NP-hard, we relax it by a hinge loss function.
0:16:56	Experimental results carried out on the NIST SRE and SITW datasets demonstrate the effectiveness of our proposed algorithm.
0:17:14	After this work, we also made a generalization analysis and a complexity analysis of the pAUC metric learning.
0:17:27	We have published them in this paper.
0:17:37	Besides, we also extended the pAUC metric learning to an end-to-end framework.
0:17:54	More information can be found in this paper.
0:18:00	In the future, we may research more general metric learning based speaker verification algorithms
0:18:08	to optimize evaluation metrics in order to further improve speaker verification performance.
0:18:23	That's all for my presentation.
0:18:26	Thank you for watching.

Partial AUC Metric Learning Based Speaker Verification Back-End

Speaker Recognition 2

Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen