0:00:15 | So hi everyone. The

0:00:17 | presentation is about

0:00:19 | speaker diarization,

0:00:21 | and I will speak about the

0:00:23 | ILP clustering

0:00:25 | that we introduced at the last edition of Odyssey.

0:00:30 | Because we added some improvements, it was necessary.

0:00:34 | Afterwards I will speak about graph

0:00:36 | clustering.

0:00:39 | So

0:00:41 | the plan

0:00:43 | of the presentation will be: first we speak about the context and the diarization

0:00:48 | architecture we are using,

0:00:51 | to show you where the ILP clustering is used;

0:00:55 | then I will show you what's wrong with the original formulation; and

0:01:00 | then I will show you the graph

0:01:03 | clustering.

0:01:05 | So,

0:01:06 | the context is the same challenge the previous speakers spoke about.

0:01:11 | So

0:01:13 | the goal was (I don't know if you are familiar with it), the goal was

0:01:16 | to detect, at any time during the video,

0:01:20 | who is speaking and who is visible on the screen and who is cited,

0:01:25 | and

0:01:26 | speaker diarization was just one of the subtasks of the challenge.

0:01:31 | So, in this paper and this presentation, I present results

0:01:37 | on the test part of this corpus. It has a total duration of

0:01:42 | forty hours; there are twenty-eight TV shows recorded from French TV

0:01:48 | channels.

0:01:49 | So it's broadcast news,

0:01:52 | video broadcast news,

0:01:53 | and it's more or less balanced between prepared and spontaneous speech.

0:01:59 | So, the architecture we used:

0:02:03 | it's a two-stage architecture. There is a first

0:02:07 | segmentation and clustering part,

0:02:10 | which gives us the first segmentation. So

0:02:14 | there is a

0:02:15 | segmentation, followed by a clustering and a Viterbi re-segmentation,

0:02:21 | and then we detect the speech/non-speech areas and the genders.

0:02:25 | So after the first segmentation phase,

0:02:29 | each cluster

0:02:31 | contains the voice of only one speaker,

0:02:34 | but several clusters can be

0:02:36 | related to a same speaker, so we have to

0:02:40 | do another clustering.

0:02:42 | That's where we propose to use the ILP clustering, to replace the

0:02:48 | HAC,

0:02:49 | the traditional clustering used in speaker diarization.

0:02:55 | So I will tell you a bit about those two clusterings, and then I will give

0:02:59 | you some results in order to compare them,

0:03:02 | to see what we can gain in terms of diarization error rate.

0:03:06 | So, from the

0:03:08 | BIC-based segmentation,

0:03:11 | we do a hierarchical clustering with a complete linkage; we used the

0:03:16 | cross-likelihood ratio to estimate the similarities,

0:03:20 | and the speaker clusters are

0:03:23 | modeled with

0:03:24 | Gaussian mixture models. So we used twelve

0:03:27 | MFCCs plus the energy; we removed the channel contribution.

0:03:33 | Training was performed with a MAP adaptation of a 256-component UBM.

0:03:39 | A really basic CLR

0:03:41 | clustering.

0:03:43 | And on the other side, the ILP:

0:03:46 | the clustering is expressed as an ILP problem. The speaker clusters are modeled with

0:03:51 | i-vectors of dimensionality sixty, so not that much.

0:03:57 | We used

0:03:58 | MFCCs, the energy, and the first- and second-order derivatives; we used as

0:04:03 | well a 1024-component UBM.

0:04:06 | The i-vectors are length-normalized.

0:04:10 | The training data we used came from the ESTER 1 French broadcast news dataset; it

0:04:16 | was

0:04:17 | a common evaluation campaign, so this is, sorry, radio data, not video.

0:04:24 | And so we estimate the similarities between the i-vectors with a Mahalanobis distance.

0:04:31 | And so, sorry, the clustering: we express it with

0:04:37 | Integer Linear Programming,

0:04:41 | which consists in

0:04:43 | jointly minimizing the number of clusters

0:04:47 | (the centers) and the dispersion within the clusters.

0:04:52 | As for the constraints:

0:04:55 | the first one, equation (1.2), says that we use binary

0:04:58 | variables, so if

0:05:00 | a cluster g is assigned to a center k,

0:05:04 | the variable is equal to one.

0:05:07 | Equation (1.3) says that a cluster g has

0:05:12 | to be assigned to a single center k.

0:05:16 | The next one enforces that

0:05:20 | the center k is selected if

0:05:22 | a cluster g is assigned to it.

0:05:24 | And the last one is about the distance: the distance between a center k and a

0:05:31 | cluster g assigned to it has to be shorter

0:05:34 | than

0:05:36 | the threshold delta.
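The constraints just described can be sketched in code. This is only a minimal illustration with hypothetical names: a brute-force enumeration of the binary variables stands in for the external ILP solver, so it only scales to a handful of clusters.

```python
from itertools import product

def ilp_clustering_bruteforce(dist, delta):
    """Toy version of the ILP: choose centers and assign each cluster g to
    exactly one center k, minimising
        (number of centers) + (1/delta) * (sum of assignment distances),
    subject to d(g, k) <= delta for every assignment and to the rule that
    a selected center is assigned to itself."""
    n = len(dist)
    best_cost, best_assign = None, None
    # enumerate every possible assignment g -> k (the binary variables x[k][g])
    for assign in product(range(n), repeat=n):
        # distance constraint: assignments longer than delta are forbidden
        if any(dist[g][assign[g]] > delta for g in range(n)):
            continue
        centers = set(assign)
        # consistency constraint: a center in use must be assigned to itself
        if any(assign[k] != k for k in centers):
            continue
        cost = len(centers) + sum(dist[g][assign[g]] for g in range(n)) / delta
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign
```

For four clusters where 0/1 and 2/3 are close to each other and far from the rest, the search selects two centers, one per pair.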

0:05:38 | And about the comparison of the results: well, we cannot strictly compare them, because

0:05:44 | they do not use the same

0:05:46 | technologies and modelings.

0:05:48 | But what we see is that with the HAC on GMMs we obtained 16.22%

0:05:54 | diarization error rate,

0:05:56 | and

0:05:57 | we went down to 14.7% with the ILP clustering.

0:06:02 | This was done on the data I presented first.

0:06:06 | So, what's wrong with the ILP formulation? Actually, nothing is wrong; it's just

0:06:12 | that

0:06:13 | we have to use an external solver

0:06:18 | to obtain the clustering.

0:06:20 | Those solvers,

0:06:22 | most of them, use the branch-and-bound algorithm, which is a general algorithm

0:06:27 | to determine the optimal solution of discrete programs.

0:06:32 | And it's not that the algorithm is bad,

0:06:35 | I mean, but the complexity is not

0:06:37 | that good:

0:06:38 | it may result

0:06:40 | in a systematic enumeration of all the possible solutions

0:06:44 | in order to give you the optimal one,

0:06:46 | and so big problems may lead to unreasonable processing durations.

0:06:53 | So,

0:06:54 | in order to decrease the complexity of the solving, we have to

0:07:00 | minimize the paths the algorithm has to explore. To do that with the

0:07:04 | ILP,

0:07:05 | it means we have to reduce the number of binary variables and constraints which are

0:07:13 | defined in the problem to be solved.

0:07:16 | And because the distances between the cluster i-vectors are computed

0:07:21 | before

0:07:23 | the ILP problem itself is defined,

0:07:26 | we already know which

0:07:29 | pairs of clusters can be associated, given their distance;

0:07:34 | we already know

0:07:37 | the distance between

0:07:39 | the i-vectors, I mean.

0:07:41 | So

0:07:42 | it is useless

0:07:44 | to construct the big ILP

0:07:47 | problem

0:07:48 | with all the variables,

0:07:51 | when we can just use the interesting ones.

0:07:56 | So we reformulate the clustering:

0:08:02 | what

0:08:03 | we use is a subset of the

0:08:06 | whole

0:08:08 | set of clusters,

0:08:10 | which corresponds, for each

0:08:13 | cluster g,

0:08:15 | to all the possible values

0:08:19 | of k for which the distance is shorter than the threshold

0:08:24 | delta.

0:08:26 | So we don't need that distance constraint anymore,

0:08:29 | and

0:08:31 | so the problem

0:08:34 | leads to a reduction in terms of number of binary variables and constraints.
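To give an idea of the effect, here is a small hypothetical helper that counts the binary assignment variables: the original formulation needs one variable per (cluster, center) pair, while the reduced one keeps only the pairs within the threshold, which also makes the distance constraint implicit.

```python
def variable_counts(dist, delta):
    """Count binary assignment variables x[k][g]: the full ILP declares one
    per pair of clusters, the reduced ILP only when d(g, k) <= delta."""
    n = len(dist)
    full = n * n  # original formulation: every (g, k) pair
    reduced = sum(1 for g in range(n) for k in range(n)
                  if dist[g][k] <= delta)  # kept pairs only
    return full, reduced
```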

0:08:40 | So I took,

0:08:43 | we counted,

0:08:45 | in the ILP files which are submitted to the solver,

0:08:51 | the number of binary variables and constraints, for each show of

0:08:55 | the corpus, and I present only

0:08:58 | the statistics.

0:09:00 | So on average, we reduce from one thousand seven hundred to fifty-three binary

0:09:06 | variables,

0:09:08 | and the number of constraints has been reduced from three thousand four hundred

0:09:14 | to fifty-three as well. So

0:09:17 | the diarization error rate didn't

0:09:19 | change;

0:09:20 | it is just a reformulation of the problem, in order to decrease the complexity of

0:09:26 | the solving process.

0:09:30 | And so,

0:09:32 | because we reduced a lot the number of variables

0:09:36 | and the constraints,

0:09:38 | we can now think about

0:09:40 | a graph speaker clustering. So, about that representation:

0:09:46 | the distance matrix, which associates a distance to each pair of clusters,

0:09:52 | can be interpreted as a connected graph, so the clusters are represented by the

0:09:57 | nodes and the distances by the edges.

0:10:00 | That is an easy representation of the original ILP formulation, which is complex,

0:10:07 | with all the

0:10:08 | distances.

0:10:11 | And

0:10:13 | so,

0:10:14 | if we decompose that graph into

0:10:18 | connected components,

0:10:20 | by removing the edges which are longer than the threshold delta,

0:10:25 | we obtain several connected components, which constitute independent subproblems, so we can process

0:10:33 | those components separately.
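The decomposition just described can be sketched as follows (hypothetical function name): edges longer than delta are dropped, and the remaining connected components, each an independent subproblem, are collected with a breadth-first search.

```python
from collections import deque

def connected_components(dist, delta):
    """Split the cluster graph into independent subproblems: keep only the
    edges not longer than delta, then gather connected components by BFS."""
    n = len(dist)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                # neighbours are the clusters within the threshold delta
                if v not in seen and v != u and dist[u][v] <= delta:
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(comp))
    return components
```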

0:10:36 | Instead of doing one big clustering, we just

0:10:39 | perform some

0:10:42 | small clusterings, which are much easier to deal with.

0:10:44 | And as you can see, there are some

0:10:48 | clusters that don't have to be processed,

0:10:50 | because the solution is obvious,

0:10:52 | even that one.

0:10:59 | So,

0:11:00 | instead of

0:11:02 | doing an ILP clustering,

0:11:04 | or whatever the clustering is (we used the ILP, but HAC would be fine as

0:11:08 | well),

0:11:12 | we actually

0:11:14 | look for the obvious centers, which can be formulated as the search for star-graph

0:11:22 | components. So a star graph is just a kind of tree,

0:11:27 | a tree, sorry, which is composed of one central node and

0:11:31 | a set of leaves;

0:11:34 | there is just

0:11:35 | one level.

0:11:38 | It's really easy to find.

0:11:40 | So, I mean, it's fast, and

0:11:42 | so there are

0:11:43 | obvious solutions: all of those don't have to be processed with a clustering algorithm.
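The star-graph test can be sketched like this (hypothetical names): a component is a star when some central node reaches every other node within delta and no two leaves are directly connected; that node is then the obvious center and no clustering algorithm is needed.

```python
def is_star(component, dist, delta, center):
    """True when `center` reaches every other node of the component within
    delta and no two leaves are directly connected."""
    leaves = [g for g in component if g != center]
    if any(dist[center][g] > delta for g in leaves):
        return False
    return all(dist[a][b] > delta
               for i, a in enumerate(leaves) for b in leaves[i + 1:])

def find_star_center(component, dist, delta):
    """Return the obvious center of a star sub-component, or None when the
    sub-component is complex and still needs a clustering algorithm."""
    for k in component:
        if is_star(component, dist, delta, k):
            return k
    return None
```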

0:11:52 | But there are some more complex sub-components, like that one,

0:11:56 | where we still need

0:12:00 | to use a clustering algorithm in order to have the optimal solution.

0:12:06 | So we did it with the ILP, of course, and compared.

0:12:11 | On one side, the result of the previous

0:12:15 | slide, I mean the one with the reduction of the number of variables and

0:12:18 | constraints,

0:12:19 | and

0:12:20 | on the right, the one with the

0:12:23 | star-graph connected-component search, in which the ILP clustering is used only to

0:12:29 | process the complex

0:12:31 | sub-components.

0:12:33 | So it is reduced from fifty-three to almost seven on average, and

0:12:39 | the minimum is zero: it means that some of the shows

0:12:42 | didn't present any

0:12:45 | complex sub-components. So,

0:12:48 | on those data,

0:12:50 | only by finding the star subgraphs, we resolved the whole

0:12:55 | clustering problem.

0:12:58 | And so we questioned the interest of the clustering method to process the

0:13:06 | complex

0:13:08 | components,

0:13:09 | because only eight

0:13:11 | of the twenty-eight shows which compose the corpus

0:13:15 | presented those complex connected components.

0:13:19 | So we tried to do it without any clustering process.

0:13:24 | There were two strategies. The no-clustering strategy, where

0:13:29 | nothing is done with the complex components: we just say, okay, we have a complex

0:13:33 | sub-component, just leave it like that. And the other, the single-cluster strategy, is

0:13:39 | the opposite: we merge all

0:13:41 | the clusters, sorry, all the clusters of a complex component into

0:13:46 | a single cluster.

0:13:49 | And

0:13:50 | it appears that,

0:13:52 | well, the no-clustering strategy, where nothing is done, doesn't present interesting

0:13:57 | results. But

0:13:59 | if we look at

0:14:00 | the

0:14:03 | best results we have for each threshold,

0:14:09 | the star-graph

0:14:10 | search

0:14:11 | with a merging of all the clusters of the complex components gives better

0:14:15 | results

0:14:17 | than

0:14:18 | the one with an ILP clustering, at this threshold.

0:14:21 | But it is still better to use

0:14:24 | a clustering method to have the really optimal values, because of the processing of

0:14:30 | the complex sub-components.

0:14:33 | But what we can say is,

0:14:38 | well, I should have put the diarization error rates here: we had,

0:14:43 | with the HAC approach using GMMs, 16.22 percent, so

0:14:52 | the star-graph approach, with no clustering algorithm to process the complex

0:14:58 | sub-components, gives a better diarization error rate.

0:15:01 | So it works almost without any clustering process

0:15:05 | at all.

0:15:07 | So,

0:15:07 | that's the conclusion. So we

0:15:10 | reformulated the ILP in order to reduce the complexity of the solving process,

0:15:15 | with no interference with the diarization error rate.

0:15:18 | And then we expressed the clustering as a graph exploration, which allows

0:15:24 | the system to split

0:15:27 | the clustering problem into several independent subproblems, and can be used to search

0:15:32 | for star-graph connected components.

0:15:35 | The star-graph approach

0:15:41 | solves almost the entire problem, but it is still preferable to use

0:15:47 | a clustering algorithm in order to process the complex sub-components.

0:15:54 | Some clustering algorithms have already been studied

0:15:58 | to do that

0:15:59 | with a graph approach, I mean,

0:16:01 | but we find that the ILP gives better results than the HAC approach, which was the

0:16:07 | conclusion of those authors.

0:16:10 | And we have some,

0:16:14 | so,

0:16:18 | I performed an experiment on a

0:16:20 | larger corpus. It's not really that large, but one hundred hours. So

0:16:25 | I took the segmentation files from the BIC clustering of several shows, and then I

0:16:31 | did one big ILP clustering on that.

0:16:35 | So it represents a clustering with something like a bit more than four thousand

0:16:40 | speaker clusters,

0:16:41 | and I compared the durations of the ILP: the original one takes two

0:16:46 | hours

0:16:47 | to be done; the reformulation brings it down to minutes,

0:16:52 | and the graph approach to

0:16:55 | only five.

0:16:56 | So

0:16:57 | this clustering duration includes the time required to compute the distances between the clusters,

0:17:04 | as well as the definition of the problem and the solving.

0:17:08 | Well, I think most of the time is spent estimating the similarities between the clusters.

0:17:18 | And

0:17:19 | that

0:17:20 | would be my last slide.

0:17:23 | Thank you.

0:17:37 | Thanks. I have two remarks. First, it's quite

0:17:42 | normal to conclude that your star-graph algorithm is able to

0:17:49 | solve by itself the clustering problem, because

0:17:53 | your hierarchical clustering algorithm is a graph clustering algorithm.

0:17:58 | So

0:17:59 | it's just a different version. It would be interesting to compare,

0:18:04 | in terms of formulation, in terms of graph theory, which

0:18:11 | is directly

0:18:13 | related. The second point, a remark: we could be disappointed, after two years with the

0:18:17 | ILP, to see that

0:18:20 | there is no real improvement in terms of error when using ILP,

0:18:27 | because you have less,

0:18:29 | you have more, you are not taking only local decisions like in a hierarchical

0:18:34 | clustering, so we could expect

0:18:36 | to have also an improvement in performance.

0:18:39 | Yes, okay, I agree with you. And well, the ILP is not the solution for

0:18:43 | the clustering when

0:18:45 | we use it

0:18:47 | to perform clustering on big data. It is

0:18:52 | mostly because, what I want to say is,

0:18:56 | the processing duration is really

0:18:59 | interesting compared to the HAC one.

0:19:03 | Well, I think it would still fail with a huge amount of data, I

0:19:09 | mean thousands of hours. I never tried, but I think there would be some

0:19:13 | limit.

0:19:14 | The

0:19:16 | HAC, I think,

0:19:20 | can do the job, but it will take time.

0:19:25 | But indeed, the improvement with regard to the number of constraints and

0:19:30 | variables really means nothing by itself, but we had

0:19:33 | to add it, because it

0:19:36 | was

0:19:38 | essential, I mean, to process big

0:19:41 | data.

0:19:55 | So, one question, then, about your clustering:

0:19:58 | I was participating in the i-vector challenge, and

0:20:01 | I wanted to

0:20:03 | try to apply it, but the fact is that the Mahalanobis distance needs

0:20:08 | some training data to

0:20:10 | compute the covariance matrix.

0:20:12 | Could it be, I mean,

0:20:14 | I worked also on the i-vector challenge,

0:20:18 | but in the i-vector challenge we don't have the training data, which is not the

0:20:22 | case here, and it is needed to compute the Mahalanobis distance.

0:20:26 | That is not the case for what we actually do now:

0:20:31 | I don't have the slide, and these are not published results, but we switched: we are using

0:20:35 | now i-vectors of dimensionality three hundred,

0:20:38 | and we stopped using the Mahalanobis distance; we use PLDA scoring, and I don't

0:20:43 | have the comparison in mind, but it is

0:20:45 | much better;

0:20:47 | we have better results with that.

0:20:49 | thanks |

0:20:53 | thanks |