0:00:15 | So hi everyone. The

0:00:17 | presentation is about

0:00:19 | speaker diarization,

0:00:21 | and I will speak about the

0:00:23 | ILP clustering

0:00:25 | that we introduced at the last edition of Odyssey.

0:00:30 | Because we added some improvements, it was necessary.

0:00:34 | Afterwards I will speak about graph

0:00:36 | clustering.

0:00:39 | So

0:00:41 | the plan

0:00:43 | of the presentation will be: first we speak about the context and the diarization

0:00:48 | architecture we are using,

0:00:51 | to show you where the ILP clustering is used;

0:00:55 | then I will show you what's wrong with the original formulation; and

0:01:00 | then I will show you the graph

0:01:03 | clustering.

0:01:05 | So,

0:01:06 | the context is the same challenge the previous speakers spoke about.

0:01:11 | So

0:01:13 | the goal was (I don't know if you are familiar with it), the goal was

0:01:16 | to detect, at any time during the video,

0:01:20 | who is speaking and who is visible on the screen and who is cited,

0:01:25 | and

0:01:26 | speaker diarization was just one of the subtasks of the challenge.

0:01:31 | So, in this paper and this presentation, I present results

0:01:37 | on the test part of this corpus. It has a total duration of

0:01:42 | forty hours; there are twenty-eight TV shows recorded from French TV

0:01:48 | channels.

0:01:49 | So it's broadcast news,

0:01:52 | video broadcast news,

0:01:53 | and it's more or less balanced between prepared and spontaneous speech.

0:01:59 | So, the architecture we used:

0:02:03 | it's a two-stage architecture. There is a first

0:02:07 | segmentation and clustering part,

0:02:10 | which gives us the first segmentation. So

0:02:14 | there is a

0:02:15 | segmentation, followed by a clustering and a Viterbi re-segmentation,

0:02:21 | and then we detect the speech/non-speech areas and the genders.

0:02:25 | So after the first segmentation phase,

0:02:29 | each cluster

0:02:31 | contains the voice of only one speaker,

0:02:34 | but several clusters can be

0:02:36 | related to a same speaker, so we have to

0:02:40 | do another clustering.

0:02:42 | That's where we propose to use the ILP clustering, to replace the

0:02:48 | HAC,

0:02:49 | the traditional clustering used in speaker diarization.

0:02:55 | So I will tell you a bit about those two clusterings, and then I will give

0:02:59 | you some results in order to compare them,

0:03:02 | to see what we can gain in terms of diarization error rate.

0:03:06 | So, from the

0:03:08 | BIC-based segmentation,

0:03:11 | we do a hierarchical clustering with a complete linkage; we used the

0:03:16 | cross-likelihood ratio to estimate the similarities,

0:03:20 | and the speaker clusters are

0:03:23 | modeled with

0:03:24 | Gaussian mixture models. So we used twelve

0:03:27 | MFCCs plus the energy; we removed the channel contribution.

0:03:33 | Training was performed with a MAP adaptation of a 256-component UBM.

0:03:39 | A really basic CLR

0:03:41 | clustering.

0:03:43 | And on the other side, the ILP:

0:03:46 | the clustering is expressed as an ILP problem. The speaker clusters are modeled with

0:03:51 | i-vectors of dimensionality sixty, so not that much.

0:03:57 | We used

0:03:58 | MFCCs, the energy, and the first- and second-order derivatives; we used as

0:04:03 | well a 1024-component UBM.

0:04:06 | The i-vectors are length-normalized.

0:04:10 | The training data we used came from the ESTER 1 French broadcast news dataset; it

0:04:16 | was

0:04:17 | a common evaluation campaign, so this is, sorry, radio data, not video.

0:04:24 | And so we estimate the similarities between the i-vectors with a Mahalanobis distance.

0:04:31 | And so, sorry, the clustering: we express it with

0:04:37 | Integer Linear Programming,

0:04:41 | which consists in

0:04:43 | jointly minimizing the number of clusters

0:04:47 | (the centers) and the dispersion within the clusters.

0:04:52 | As for the constraints:

0:04:55 | the first one, equation (1.2), says that we use binary

0:04:58 | variables, so if

0:05:00 | a cluster g is assigned to a center k,

0:05:04 | the variable is equal to one.

0:05:07 | Equation (1.3) says that a cluster g has

0:05:12 | to be assigned to a single center k.

0:05:16 | The next one enforces that

0:05:20 | the center k is selected if

0:05:22 | a cluster g is assigned to it.

0:05:24 | And the last one is about the distance: the distance between a center k and a

0:05:31 | cluster g assigned to it has to be shorter

0:05:34 | than

0:05:36 | the threshold delta.
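The constraints just described can be sketched in code. This is only a minimal illustration with hypothetical names: a brute-force enumeration of the binary variables stands in for the external ILP solver, so it only scales to a handful of clusters.

```python
from itertools import product

def ilp_clustering_bruteforce(dist, delta):
    """Toy version of the ILP: choose centers and assign each cluster g to
    exactly one center k, minimising
        (number of centers) + (1/delta) * (sum of assignment distances),
    subject to d(g, k) <= delta for every assignment and to the rule that
    a selected center is assigned to itself."""
    n = len(dist)
    best_cost, best_assign = None, None
    # enumerate every possible assignment g -> k (the binary variables x[k][g])
    for assign in product(range(n), repeat=n):
        # distance constraint: assignments longer than delta are forbidden
        if any(dist[g][assign[g]] > delta for g in range(n)):
            continue
        centers = set(assign)
        # consistency constraint: a center in use must be assigned to itself
        if any(assign[k] != k for k in centers):
            continue
        cost = len(centers) + sum(dist[g][assign[g]] for g in range(n)) / delta
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign
```

For four clusters where 0/1 and 2/3 are close to each other and far from the rest, the search selects two centers, one per pair.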

0:05:38 | And about the comparison of the results: well, we cannot strictly compare them, because

0:05:44 | they do not use the same

0:05:46 | technologies and modelings.

0:05:48 | But what we see is that with the HAC on GMMs we obtained 16.22%

0:05:54 | diarization error rate,

0:05:56 | and

0:05:57 | we went down to 14.7% with the ILP clustering.

0:06:02 | This was done on the data I presented first.

0:06:06 | So, what's wrong with the ILP formulation? Actually, nothing is wrong; it's just

0:06:12 | that

0:06:13 | we have to use an external solver

0:06:18 | to obtain the clustering.

0:06:20 | Those solvers,

0:06:22 | most of them, use the branch-and-bound algorithm, which is a general algorithm

0:06:27 | to determine the optimal solution of discrete programs.

0:06:32 | And it's not that the algorithm is bad,

0:06:35 | I mean, but the complexity is not

0:06:37 | that good:

0:06:38 | it may result

0:06:40 | in a systematic enumeration of all the possible solutions

0:06:44 | in order to give you the optimal one,

0:06:46 | and so big problems may lead to unreasonable processing durations.

0:06:53 | So,

0:06:54 | in order to decrease the complexity of the solving, we have to

0:07:00 | minimize the paths the algorithm has to explore. To do that with the

0:07:04 | ILP,

0:07:05 | it means we have to reduce the number of binary variables and constraints which are

0:07:13 | defined in the problem to be solved.

0:07:16 | And because the distances between the cluster i-vectors are computed

0:07:21 | before

0:07:23 | the ILP problem itself is defined,

0:07:26 | we already know which

0:07:29 | pairs of clusters can be associated, given their distance;

0:07:34 | we already know

0:07:37 | the distance between

0:07:39 | the i-vectors, I mean.

0:07:41 | So

0:07:42 | it is useless

0:07:44 | to construct the big ILP

0:07:47 | problem

0:07:48 | with all the variables,

0:07:51 | when we can just use the interesting ones.

0:07:56 | So we reformulate the clustering:

0:08:02 | what

0:08:03 | we use is a subset of the

0:08:06 | whole

0:08:08 | set of clusters,

0:08:10 | which corresponds, for each

0:08:13 | cluster g,

0:08:15 | to all the possible values

0:08:19 | of k for which the distance is shorter than the threshold

0:08:24 | delta.

0:08:26 | So we don't need that distance constraint anymore,

0:08:29 | and

0:08:31 | so the problem

0:08:34 | leads to a reduction in terms of number of binary variables and constraints.
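To give an idea of the effect, here is a small hypothetical helper that counts the binary assignment variables: the original formulation needs one variable per (cluster, center) pair, while the reduced one keeps only the pairs within the threshold, which also makes the distance constraint implicit.

```python
def variable_counts(dist, delta):
    """Count binary assignment variables x[k][g]: the full ILP declares one
    per pair of clusters, the reduced ILP only when d(g, k) <= delta."""
    n = len(dist)
    full = n * n  # original formulation: every (g, k) pair
    reduced = sum(1 for g in range(n) for k in range(n)
                  if dist[g][k] <= delta)  # kept pairs only
    return full, reduced
```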

0:08:40 | So I took,

0:08:43 | we counted,

0:08:45 | in the ILP files which are submitted to the solver,

0:08:51 | the number of binary variables and constraints, for each show of

0:08:55 | the corpus, and I present only

0:08:58 | the statistics.

0:09:00 | So on average, we reduce from one thousand seven hundred to fifty-three binary

0:09:06 | variables,

0:09:08 | and the number of constraints has been reduced from three thousand four hundred

0:09:14 | to fifty-three as well. So

0:09:17 | the diarization error rate didn't

0:09:19 | change;

0:09:20 | it is just a reformulation of the problem, in order to decrease the complexity of

0:09:26 | the solving process.

0:09:30 | And so,

0:09:32 | because we reduced a lot the number of variables

0:09:36 | and the constraints,

0:09:38 | we can now think about

0:09:40 | a graph speaker clustering. So, about that representation:

0:09:46 | the distance matrix, which associates a distance to each pair of clusters,

0:09:52 | can be interpreted as a connected graph, so the clusters are represented by the

0:09:57 | nodes and the distances by the edges.

0:10:00 | That is an easy representation of the original ILP formulation, which is complex,

0:10:07 | with all the

0:10:08 | distances.

0:10:11 | And

0:10:13 | so,

0:10:14 | if we decompose that graph into

0:10:18 | connected components,

0:10:20 | by removing the edges which are longer than the threshold delta,

0:10:25 | we obtain several connected components, which constitute independent subproblems, so we can process

0:10:33 | those components separately.
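The decomposition just described can be sketched as follows (hypothetical function name): edges longer than delta are dropped, and the remaining connected components, each an independent subproblem, are collected with a breadth-first search.

```python
from collections import deque

def connected_components(dist, delta):
    """Split the cluster graph into independent subproblems: keep only the
    edges not longer than delta, then gather connected components by BFS."""
    n = len(dist)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                # neighbours are the clusters within the threshold delta
                if v not in seen and v != u and dist[u][v] <= delta:
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(comp))
    return components
```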

0:10:36 | Instead of doing one big clustering, we just

0:10:39 | perform some

0:10:42 | small clusterings, which are much easier to deal with.

0:10:44 | And as you can see, there are some

0:10:48 | clusters that don't have to be processed,

0:10:50 | because the solution is obvious,

0:10:52 | even that one.

0:10:59 | So,

0:11:00 | instead of

0:11:02 | doing an ILP clustering,

0:11:04 | or whatever the clustering is (we used the ILP, but HAC would be fine as

0:11:08 | well),

0:11:12 | we actually

0:11:14 | look for the obvious centers, which can be formulated as the search for star-graph

0:11:22 | components. So a star graph is just a kind of tree,

0:11:27 | a tree, sorry, which is composed of one central node and

0:11:31 | a set of leaves;

0:11:34 | there is just

0:11:35 | one level.

0:11:38 | It's really easy to find.

0:11:40 | So, I mean, it's fast, and

0:11:42 | so there are

0:11:43 | obvious solutions: all of those don't have to be processed with a clustering algorithm.
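The star-graph test can be sketched like this (hypothetical names): a component is a star when some central node reaches every other node within delta and no two leaves are directly connected; that node is then the obvious center and no clustering algorithm is needed.

```python
def is_star(component, dist, delta, center):
    """True when `center` reaches every other node of the component within
    delta and no two leaves are directly connected."""
    leaves = [g for g in component if g != center]
    if any(dist[center][g] > delta for g in leaves):
        return False
    return all(dist[a][b] > delta
               for i, a in enumerate(leaves) for b in leaves[i + 1:])

def find_star_center(component, dist, delta):
    """Return the obvious center of a star sub-component, or None when the
    sub-component is complex and still needs a clustering algorithm."""
    for k in component:
        if is_star(component, dist, delta, k):
            return k
    return None
```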

0:11:52 | But there are some more complex sub-components, like that one,

0:11:56 | where we still need

0:12:00 | to use a clustering algorithm in order to have the optimal solution.

0:12:06 | So we did it with the ILP, of course, and compared.

0:12:11 | On one side, the result of the previous

0:12:15 | slide, I mean the one with the reduction of the number of variables and

0:12:18 | constraints,

0:12:19 | and

0:12:20 | on the right, the one with the

0:12:23 | star-graph connected-component search, in which the ILP clustering is used only to

0:12:29 | process the complex

0:12:31 | sub-components.

0:12:33 | So it is reduced from fifty-three to almost seven on average, and

0:12:39 | the minimum is zero: it means that some of the shows

0:12:42 | didn't present any

0:12:45 | complex sub-components. So,

0:12:48 | on those data,

0:12:50 | only by finding the star subgraphs, we resolved the whole

0:12:55 | clustering problem.

0:12:58 | And so we questioned the interest of the clustering method to process the

0:13:06 | complex

0:13:08 | components,

0:13:09 | because only eight

0:13:11 | of the twenty-eight shows which compose the corpus

0:13:15 | presented those complex connected components.

0:13:19 | So we tried to do it without any clustering process.

0:13:24 | There were two strategies. The no-clustering strategy, where

0:13:29 | nothing is done with the complex components: we just say, okay, we have a complex

0:13:33 | sub-component, just leave it like that. And the other, the single-cluster strategy, is

0:13:39 | the opposite: we merge all

0:13:41 | the clusters, sorry, all the clusters of a complex component into

0:13:46 | a single cluster.

0:13:49 | And

0:13:50 | it appears that,

0:13:52 | well, the no-clustering strategy, where nothing is done, doesn't present interesting

0:13:57 | results. But

0:13:59 | if we look at

0:14:00 | the

0:14:03 | best results we have for each threshold,

0:14:09 | the star-graph

0:14:10 | search

0:14:11 | with a merging of all the clusters of the complex components gives better

0:14:15 | results

0:14:17 | than

0:14:18 | the one with an ILP clustering, at this threshold.

0:14:21 | But it is still better to use

0:14:24 | a clustering method to have the really optimal values, because of the processing of

0:14:30 | the complex sub-components.

0:14:33 | But what we can say is,

0:14:38 | well, I should have put the diarization error rates here: we had,

0:14:43 | with the HAC approach using GMMs, 16.22 percent, so

0:14:52 | the star-graph approach, with no clustering algorithm to process the complex

0:14:58 | sub-components, gives a better diarization error rate.

0:15:01 | So it works almost without any clustering process

0:15:05 | at all.

0:15:07 | So,

0:15:07 | that's the conclusion. So we

0:15:10 | reformulated the ILP in order to reduce the complexity of the solving process,

0:15:15 | with no interference with the diarization error rate.

0:15:18 | And then we expressed the clustering as a graph exploration, which allows

0:15:24 | the system to split

0:15:27 | the clustering problem into several independent subproblems, and can be used to search

0:15:32 | for star-graph connected components.

0:15:35 | The star-graph approach

0:15:41 | solves almost the entire problem, but it is still preferable to use

0:15:47 | a clustering algorithm in order to process the complex sub-components.

0:15:54 | Some clustering algorithms have already been studied

0:15:58 | to do that

0:15:59 | with a graph approach, I mean,

0:16:01 | but we find that the ILP gives better results than the HAC approach, which was the

0:16:07 | conclusion of those authors.

0:16:10 | And we have some,

0:16:14 | so,

0:16:18 | I performed an experiment on a

0:16:20 | larger corpus. It's not really that large, but one hundred hours. So

0:16:25 | I took the segmentation files from the BIC clustering of several shows, and then I

0:16:31 | did one big ILP clustering on that.

0:16:35 | So it represents a clustering with something like a bit more than four thousand

0:16:40 | speaker clusters,

0:16:41 | and I compared the durations of the ILP: the original one takes two

0:16:46 | hours

0:16:47 | to be done; the reformulation brings it down to minutes,

0:16:52 | and the graph approach to

0:16:55 | only five.

0:16:56 | So

0:16:57 | this clustering duration includes the time required to compute the distances between the clusters,

0:17:04 | as well as the definition of the problem and the solving.

0:17:08 | Well, I think most of the time is spent estimating the similarities between the clusters.

0:17:18 | And

0:17:19 | that

0:17:20 | would be my last slide.

0:17:23 | Thank you.

0:17:37 | Thanks. I have two remarks. First, it's quite

0:17:42 | normal to conclude that your star-graph algorithm is able to

0:17:49 | solve by itself the clustering problem, because

0:17:53 | your hierarchical clustering algorithm is a graph clustering algorithm.

0:17:58 | So

0:17:59 | it's just a different version. It would be interesting to compare,

0:18:04 | in terms of formulation, in terms of graph theory, which

0:18:11 | is directly

0:18:13 | related. The second point, a remark: we could be disappointed, after two years with the

0:18:17 | ILP, to see that

0:18:20 | there is no real improvement in terms of error when using ILP,

0:18:27 | because you have less,

0:18:29 | you have more, you are not taking only local decisions like in a hierarchical

0:18:34 | clustering, so we could expect

0:18:36 | to have also an improvement in performance.

0:18:39 | Yes, okay, I agree with you. And well, the ILP is not the solution for

0:18:43 | the clustering when

0:18:45 | we use it

0:18:47 | to perform clustering on big data. It is

0:18:52 | mostly because, what I want to say is,

0:18:56 | the processing duration is really

0:18:59 | interesting compared to the HAC one.

0:19:03 | Well, I think it would still fail with a huge amount of data, I

0:19:09 | mean thousands of hours. I never tried, but I think there would be some

0:19:13 | limit.

0:19:14 | The

0:19:16 | HAC, I think,

0:19:20 | can do the job, but it will take time.

0:19:25 | But indeed, the improvement with regard to the number of constraints and

0:19:30 | variables really means nothing by itself, but we had

0:19:33 | to add it, because it

0:19:36 | was

0:19:38 | essential, I mean, to process big

0:19:41 | data.

0:19:55 | So, one question, then, about your clustering:

0:19:58 | I was participating in the i-vector challenge, and

0:20:01 | I wanted to

0:20:03 | try to apply it, but the fact is that the Mahalanobis distance needs

0:20:08 | some training data to

0:20:10 | compute the covariance matrix.

0:20:12 | Could it be, I mean,

0:20:14 | I worked also on the i-vector challenge,

0:20:18 | but in the i-vector challenge we don't have the training data, which is not the

0:20:22 | case here, and it is needed to compute the Mahalanobis distance.

0:20:26 | That is not the case for what we actually do now:

0:20:31 | I don't have the slide, and these are not published results, but we switched: we are using

0:20:35 | now i-vectors of dimensionality three hundred,

0:20:38 | and we stopped using the Mahalanobis distance; we use PLDA scoring, and I don't

0:20:43 | have the comparison in mind, but it is

0:20:45 | much better;

0:20:47 | we have better results with that.

0:20:49 | thanks |

0:20:53 | thanks |