0:00:15 So, hi everyone. This presentation is about speaker diarization, and I will speak about the ILP clustering that we introduced at the previous edition of this workshop, because we added some improvements; it was necessary. I will also speak about a graph-based approach.
0:00:41 The outline of the presentation: first we will speak about the context and the diarization architecture we are using, to show you where the ILP clustering is used. Then I will show you what's wrong with the original formulation, and then I will present the graph approach.
0:01:06 The context is the same challenge as the previous speakers: the REPERE challenge. The goal (I don't know if you are all familiar with it) was to detect, at any time during a video, who is speaking and who is visible on the screen, and who is cited. Speaker diarization was just one of the subtasks of the challenge.
0:01:31 In this paper and this presentation we present results on the REPERE test corpus. It has a total duration of about forty hours; there are twenty-eight TV shows recorded from French TV. So it's broadcast news, video broadcast news, more or less balanced between prepared and spontaneous speech.
0:01:59 So, the architecture we actually used: it's a two-stage architecture. There is a first segmentation-and-clustering part, which gives us a first segmentation: a BIC segmentation followed by a BIC clustering and a Viterbi re-segmentation; then we detect the speech/non-speech areas and the genders.
0:02:25 After this first segmentation, each cluster contains the voice of only one speaker, but several clusters can be related to the same speaker, so we have to do another clustering. That's where we propose to use ILP clustering to replace HAC, the traditional clustering used in speaker diarization.
0:02:55 I will say a word about those two clusterings, because we will give you some results in order to compare them in terms of diarization error rate. From the BIC-based segmentation, we do a hierarchical agglomerative clustering with complete linkage; we used the cross-likelihood ratio to estimate the similarities, and the speaker clusters are modeled with Gaussian mixture models. We used twelve MFCCs plus the energy, and we removed the channel contribution. It was performed with a MAP adaptation of a 256-component UBM. A really classic setup.
0:03:43 On the other side, the ILP: the clustering is expressed as an ILP problem. The speaker clusters are modeled with i-vectors of dimensionality sixty, so not that much. We used the MFCCs, the energy, and the first and second order derivatives, as well as a 1024-component UBM. The i-vectors are length-normalized. The training data we used came from the ESTER 1 French broadcast news dataset, a common evaluation campaign; so this is radio data. And we estimate the similarities between the i-vectors with a Mahalanobis distance.
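As an illustration of that last step, here is a minimal Mahalanobis distance sketch. The function name and the diagonal-covariance simplification are mine, not from the talk; in the system described, a full covariance matrix is estimated on the ESTER training data.

```python
import math

def mahalanobis_diag(u, v, variances):
    """Mahalanobis distance between two i-vectors under a diagonal
    covariance assumption (a simplification of the full-covariance
    distance the talk refers to). Each squared difference is scaled
    by the inverse of the per-dimension variance, which would be
    estimated on held-out training data."""
    return math.sqrt(sum((a - b) ** 2 / s
                         for a, b, s in zip(u, v, variances)))
```

With unit variances this reduces to the Euclidean distance; a larger variance on a dimension down-weights differences along that dimension.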
0:04:31 So, for the clustering: we express it with Integer Linear Programming, which consists in jointly minimizing the number of clusters and the dispersion within the clusters. As for the constraints: equation (2) says that we use binary variables, so if a cluster g is assigned to a center k, the variable is equal to one. Equation (3) says that a cluster g has to be assigned to a single center k. The next one enforces that a center k is selected if a cluster g is assigned to it. And the last one is about the distance: the distance between a cluster g and a center k has to be shorter than a threshold.
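Written out, the formulation just described can be sketched as follows. The notation is my reconstruction, not copied from the slides: x_{k,g} is the binary assignment of cluster g to center k, d(k,g) the distance, N the number of clusters, and delta the threshold.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{k} x_{k,k} + \frac{1}{N} \sum_{k}\sum_{g} d(k,g)\, x_{k,g} \\
\text{s.t.} \quad & x_{k,g} \in \{0, 1\} && \forall k, g \\
& \sum_{k} x_{k,g} = 1 && \forall g \\
& x_{k,g} \le x_{k,k} && \forall k, g \\
& d(k,g)\, x_{k,g} \le \delta && \forall k, g
\end{aligned}
```

The diagonal variable x_{k,k} marks center k as selected, so the first term of the objective counts the clusters.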
0:05:38 About the comparison of the results: we cannot really compare them directly, because it's not the same acoustic features and modeling. But what we see is that with the HAC/GMM we obtained a 16.22% diarization error rate, and we went down to 14.7% with the ILP clustering. This was done on the data I presented first.
0:06:06 So, what's wrong with the ILP formulation? Actually nothing is wrong; it's just that we have to use an external solver to obtain the clustering. Most solvers use the branch-and-bound algorithm, which is a general algorithm to determine the optimal solution of discrete programs. I mean, its complexity is not good: it may result in a systematic enumeration of all the possible solutions before giving you the optimal one, and so big problems may lead to unreasonable processing durations.
0:06:53 So, in order to decrease the complexity of the solving, we have to minimize the paths the algorithm has to explore. With the ILP, it means we have to reduce the number of binary variables and constraints defined in the problem to be solved. And because the distances between the cluster i-vectors are computed before defining the ILP problem itself, we already know, from the distances, which pairs of cluster i-vectors can go together. Since we already know the distance between each pair of i-vectors, it is useless to construct the big ILP problem with all the variables, when we can just use the interesting ones.
0:07:56 So we reformulate the clustering: we use, for each cluster g, a subset of the set of clusters, which corresponds to all the possible values of k for which the distance is shorter than the threshold, determined beforehand. With that, we don't need the distance constraint anymore. The reformulated problem leads to a reduction in terms of number of binary variables and constraints.
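The effect of that filtering on the problem size can be sketched like this. It is a toy illustration with my own names; in the paper the variables come from the (center, cluster) assignment pairs described above.

```python
def ilp_sizes(dist, delta):
    """Count binary variables in the original ILP (one per
    (center k, cluster g) pair) versus the reformulated one,
    where only pairs with dist[k][g] <= delta are kept.
    `dist` is a full square distance matrix; the names are
    illustrative, not taken from the paper."""
    n = len(dist)
    full_vars = n * n
    kept_vars = sum(1 for k in range(n) for g in range(n)
                    if dist[k][g] <= delta)
    return full_vars, kept_vars
```

Since the kept pairs already satisfy the distance bound, the explicit distance constraints disappear along with the discarded variables.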
0:08:40 We counted, in the ILP files submitted to the solver, the number of binary variables and constraints for each show of the corpus, and I present only the statistics. On average, the number of binary variables was reduced from about one thousand seven hundred to fifty-three, and the number of constraints was reduced from about three thousand four hundred to fifty-three as well. The diarization error rate didn't change: it's just a reformulation of the problem in order to decrease the complexity of the solving process.
0:09:30 And because we reduced the number of variables and constraints a lot, we can now think about a graph approach to speaker clustering. When using a distance matrix, which associates a distance to each pair of clusters, the problem can be interpreted as a connected graph: the clusters are represented by the nodes and the distances by the edges. Here you can see the representation of the original ILP formulation, which is complex, with all the edges.
0:10:11 If we decompose that graph into connected components, by removing the edges which are longer than the threshold delta, we obtain several connected components which constitute independent subproblems, so we can process those components separately. Instead of doing one big clustering, we perform several small clusterings, which are much more tractable. And as you can see, there are some clusters that don't have to be processed, because the solution is obvious, even that one.
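The decomposition step can be sketched as a breadth-first search over the thresholded distance matrix. The names and the BFS choice are mine; any connected-component routine would do.

```python
from collections import deque

def threshold_components(dist, delta):
    """Split clusters into independent subproblems: treat clusters as
    nodes, keep only edges shorter than delta, and extract connected
    components with a BFS. `dist` is a symmetric distance matrix."""
    n = len(dist)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                if v not in seen and v != u and dist[u][v] <= delta:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps
```

Each returned component can then be clustered on its own, which is what makes the subproblems so much smaller than the original ILP.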
0:11:00 Instead of doing an ILP clustering, or whatever the clustering is (here we used the ILP), we actually look for the obvious centers, which can be formulated as the search for star-graph components. A star graph is just a kind of tree, sorry, a tree which is composed of one central node and a set of leaves, with just one level. It's really easy to find, and I mean it's fast. So there are obvious solutions, and all of those don't have to be processed with a clustering algorithm.
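Detecting those obvious components can be sketched as follows. This is a toy check with my own names; the idea is that a component is a star if one candidate center is within delta of every other node, while the leaves are only within delta of that center.

```python
def star_center(nodes, dist, delta):
    """Return the center node if the component is a star graph
    (one central node, leaves at a single level), else None.
    `nodes` lists the component's members; `dist` is the full
    symmetric distance matrix."""
    edges = {u: {v for v in nodes if v != u and dist[u][v] <= delta}
             for u in nodes}
    for c in nodes:
        leaves = [u for u in nodes if u != c]
        if edges[c] == set(leaves) and all(edges[u] == {c} for u in leaves):
            return c
    return None
```

A star component is solved directly by merging the leaves into their center; only the non-star (complex) components need to go through a clustering algorithm.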
0:11:52 But there are some more complex sub-components, like that one, where we still need to use a clustering algorithm in order to obtain the optimal solution. We did it with the ILP, of course, and compared with the result of the previous slide, I mean the one with the reduction of the number of variables and constraints. On the right is the configuration with the star-graph connected-component search, in which the ILP clustering is used only to process the complex sub-components. The number of variables is reduced from fifty-three to almost seven on average, and the minimum is zero: it means that some of the shows didn't present any complex sub-components, so on those data, only by finding the star subgraphs, we solved the whole clustering problem.
0:12:58 And so we were questioning the interest of a clustering method to process the complex components, because only eight of the twenty-eight shows which compose the corpus presented such complex connected components. So we tried to do it without any clustering process. There were two strategies. In the "no clustering" strategy, nothing is done with the complex components: we just say, okay, we have a complex sub-component, let's leave it like that. The other one, the "single cluster" strategy, is the opposite: we merge all the clusters of a complex component into a single cluster.
0:13:50 It appears that the "no clustering" strategy, where nothing is done, doesn't give interesting results. But if we look at each threshold, the "single cluster" strategy also gives good results, close to the best results we have for each threshold with the star graph: simply merging all the clusters of a complex component gives results close to the ones with an ILP clustering. Still, it is better to use a clustering method to reach the really optimal values, thanks to the processing of the complex sub-components.
0:14:33 What we can say (I should have put the diarization error rates on the slide) is that with the HAC approach using GMMs we had 16.22 percent, so the star-graph approach with a clustering algorithm to process the complex sub-components gives a better diarization error rate, with almost no clustering process at all.
0:15:07 As a conclusion: we reformulated the ILP in order to reduce the complexity of the solving process, with no difference in diarization error rate. Then we expressed the clustering as a graph exploration, which allows the system to split the clustering problem into several independent subproblems, and which can be used to search for star-graph connected components. The star-graph approach alone solves almost the entire problem, but it is still preferable to use a clustering algorithm in order to process the complex sub-components. Some clustering algorithms have already been studied to do that with a graph approach, but we find that the ILP gives better results than the HAC approach, which was the conclusion of the other authors.
0:16:10 We also performed an experiment on a larger corpus. It's not really that large, but around one hundred hours: I took the segmentation files from the BIC clustering of the shows, and then I did one big ILP clustering on all of them. It represents a clustering with a bit more than four thousand speaker clusters. And I compared the durations: the original ILP took about two hours to be done; with the reformulation it went down, and with the graph approach it took only five minutes. This clustering duration includes the time required to compute the distances between the clusters, the definition of the problem, and the solving. Well, I think most of the time is spent estimating the similarities between the clusters.
0:17:20 That would be my last slide.
0:17:37 Question: Thanks. I have two remarks. First, it's quite normal to conclude that your star-graph algorithm is able to solve, by itself, a sort of clustering problem, because hierarchical clustering is also a graph clustering algorithm, just a different version of it. It could be interesting to compare them in terms of formulation, in terms of graph theory, which is more direct. The second point, or remark: we could be disappointed, after two years with the ILP, to see that there is not really an improvement in terms of DER when using ILP. Because you are not taking only local decisions like in hierarchical clustering, we could expect to have also an improvement in performance.
0:18:39 Answer: I agree with you. Well, the ILP is not the solution for the clustering when we use it on big data; what I want to say is mostly that the processing duration is really interesting compared to the HAC one. Well, I think it will still fail with a huge amount of data, I mean thousands of hours. I never tried, but I think there would be some issues. HAC, I think, can do the job, but it will take time. And the reduction of the number of constraints and variables does not change the results by itself, but we had to do it, because it is essential, I mean, to be able to process such data.
0:19:55 Question: I was at the i-vector challenge, and I wanted to try to apply the Mahalanobis distance there. But the fact is that the Mahalanobis distance needs training data to compute the covariance matrix, and in the i-vector challenge we don't have the training data to compute it, which is not the case here.
0:20:26 Answer: That's true. Actually, I don't have a slide about it, and it's not a published result, but we switched: we're now using i-vectors of dimensionality three hundred, and we stopped using the Mahalanobis distance; we use PLDA scoring instead. I don't have the comparison in mind, but we have much better results than with the Mahalanobis distance.