0:00:15 | hello everybody, in this presentation i will show you some of my

0:00:21 | work in speaker clustering |

0:00:23 | but before starting i would like to define two things the first one is the |

0:00:28 | speaker clustering problem that we want to solve: we have an audio database in which each

0:00:34 | audio belongs to an unknown speaker, and we also have an unknown number of

0:00:38 | speakers

0:00:39 | and the second one is that we will talk about audio database characteristics in this presentation

0:00:44 | when we refer to this term we mean things such as the number

0:00:49 | of audios or how many audios

0:00:52 | each speaker has

0:00:54 | so |

0:00:56 | first of all i will present you the outline of the presentation

0:01:01 | we will start with the motivation |

0:01:03 | then i will present you the clustering algorithm that we have been using

0:01:09 | then we will see the

0:01:12 | variables that we have studied and we will conclude

0:01:16 | with some experiments studying the stopping criteria

0:01:21 | so |

0:01:23 | if we talk about the motivation, suppose that we

0:01:28 | receive a request from one client that is interested

0:01:32 | in getting a clustering-based solution

0:01:35 | and one common question that we have to deal with is okay |

0:01:40 | how is your system working |

0:01:42 | and for that purpose we will ask them to give us a

0:01:46 | database as similar as possible

0:01:49 | to the one that will be used

0:01:51 | later in the system, and with that database we will

0:01:56 | be able to say okay, we expect

0:01:59 | to have results similar to these ones, but

0:02:02 | based on our experience we have seen that a clustering task

0:02:07 | may output

0:02:08 | very different results depending on the characteristics of the database, so we must be

0:02:13 | careful because if the distribution of audios and speakers in the database is different

0:02:18 | from what we have now |

0:02:20 | you may have |

0:02:21 | very different results |

0:02:22 | and then |

0:02:24 | of course we then ask how can we expect

0:02:27 | those results to change

0:02:29 | and to answer that question we ran several experiments, and some of

0:02:35 | those experiments are the ones i am presenting here

0:02:38 | okay so now that we know what we want to do, first of all i will

0:02:42 | present the clustering algorithm that we are using

0:02:46 | we consider a bottom-up agglomerative clustering, that is,

0:02:50 | a clustering algorithm that starts from a partition in which each audio is

0:02:56 | identified with one single cluster, and at each iteration we merge the two closest

0:03:01 | clusters

0:03:02 | to completely define our algorithm we will have to fix three things. the

0:03:09 | first one is the distance metric, and for this purpose we will consider

0:03:13 | the scores provided by the plda system, so

0:03:17 | before running the clustering algorithm

0:03:19 | we compute all the pairwise scores for the evaluation database and we will use

0:03:24 | those scores to build the similarity matrix

0:03:27 | we will also need to define a linkage method, and we will use minimum

0:03:32 | distance

0:03:33 | and we also have to fix

0:03:36 | the stopping criterion, and we consider a score-based one, particularly

0:03:41 | a maximum distance threshold, that is, if the distance between the two clusters to be merged

0:03:48 | rises above a certain threshold we will stop

0:03:51 | and otherwise we will continue

0:03:54 | merging clusters
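the merging loop just described can be sketched as follows (a minimal python sketch; the names, and the convention that the minimum-distance linkage on similarity scores means merging the pair with the highest score, are my own assumptions, not code from the talk):

```python
def agglomerative_cluster(sim, threshold):
    """Bottom-up clustering: one cluster per audio at the start,
    then repeatedly merge the closest pair of clusters.  With
    similarity scores, merging the 'closest' pair means merging
    the pair with the highest linkage score; we stop when that
    best score falls below `threshold` (the score-based criterion)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage on scores: best pairwise score
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:  # the closest pair is too far apart: stop
            break
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters
```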

0:03:56 | regarding the performance measures, we will use those

0:04:00 | defined by david van leeuwen in one of his works:

0:04:05 | the speaker impurity and the cluster impurity. speaker impurity measures how scattered

0:04:11 | over the clusters the audios of a speaker are,

0:04:14 | while

0:04:15 | cluster impurity measures how corrupt the clusters are, and when we say that one

0:04:20 | cluster is corrupt we refer to the fact that it

0:04:24 | has audios from many different speakers

0:04:27 | if we compute

0:04:29 | those impurity levels at each iteration of the clustering process

0:04:34 | and we plot

0:04:35 | all those points in a graph

0:04:37 | we will get impurity trade-off curves such as the one that

0:04:42 | we have here in this slide

0:04:44 | we will use these graphs

0:04:46 | to measure the performance of our clustering experiments throughout

0:04:52 | the whole presentation

0:04:54 | and as a reference

0:04:57 | point we will use the equal impurity working point, that is, when we have

0:05:00 | the same speaker impurity and cluster impurity
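one common way to compute cluster and speaker impurity from the cluster assignments is sketched below (a hedged approximation of my own; van leeuwen's exact definitions may differ in detail):

```python
from collections import Counter

def impurities(cluster_of, speaker_of):
    """cluster_of[i] and speaker_of[i]: cluster and true speaker of
    audio i.  Cluster impurity: fraction of audios not matching
    their cluster's majority speaker (a corrupt cluster mixes many
    speakers).  Speaker impurity: fraction of a speaker's audios
    lying outside that speaker's majority cluster (scattering)."""
    n = len(cluster_of)
    by_cluster, by_speaker = {}, {}
    for c, s in zip(cluster_of, speaker_of):
        by_cluster.setdefault(c, []).append(s)
        by_speaker.setdefault(s, []).append(c)
    majority = lambda groups: sum(Counter(g).most_common(1)[0][1] for g in groups)
    return 1 - majority(by_cluster.values()) / n, 1 - majority(by_speaker.values()) / n
```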

0:05:05 | before we start with the experiments i will present you the database that we

0:05:10 | have used. we consider

0:05:12 | audios that come from the

0:05:14 | telephone channel

0:05:15 | and with a three hundred second segment duration, and here in the graph you can see

0:05:22 | the audios-per-speaker distribution that we have in this database

0:05:27 | okay |

0:05:28 | let us now pass

0:05:30 | to the experiments conducted over our audio database. first we need to define some

0:05:34 | variables that appear in this part. we consider three of them:

0:05:39 | the first one, the size of the task,

0:05:41 | that is the number of audios we have in the database;

0:05:44 | the second one, the number of speakers, that is the number of speakers that

0:05:49 | we have in the database; and the balance of speakers, which measures

0:05:53 | how many audios each speaker has

0:05:58 | so

0:05:59 | regarding the first variable, we will perform different experiments in which

0:06:05 | we modify the size of the task.

0:06:07 | we will start from the initial set of audios and we will extract

0:06:11 | subsets

0:06:12 | of smaller sizes

0:06:14 | so for example, as you can see in the table,

0:06:19 | we have several subsets of each size, and

0:06:25 | for those cases in which

0:06:27 | we have more than one clustering task we will average the

0:06:31 | results and report them with one single curve,

0:06:34 | so we can compare results between different sizes of the task

0:06:39 | here we have the impurity trade-off curves.

0:06:43 | in the horizontal axis we have cluster impurity and in the vertical axis we

0:06:48 | have speaker impurity

0:06:50 | and as we can see, as we increase

0:06:53 | the size of the task we expect to have worse results in our clustering problem

0:07:00 | the second variable that we have analyzed is the number of speakers

0:07:04 | and to characterize this experiment

0:07:06 | we will use

0:07:08 | the value alpha, that is defined as the number of speakers divided by the

0:07:13 | number of audios

0:07:14 | we can also give another interpretation to this variable:

0:07:18 | it allows us to know the

0:07:21 | iteration at which we should stop, since we want to stop when we have as

0:07:25 | many clusters

0:07:26 | as speakers
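this interpretation of alpha can be made concrete: each agglomerative merge removes exactly one cluster, so the ideal stopping iteration follows directly from alpha (a small sketch under that assumption; the function name is my own):

```python
def ideal_stop_iteration(n_audios, alpha):
    """alpha = n_speakers / n_audios.  Each merge removes one
    cluster, so starting from n_audios singleton clusters we reach
    n_speakers clusters after n_audios * (1 - alpha) merges."""
    n_speakers = round(alpha * n_audios)
    return n_audios - n_speakers
```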

0:07:29 | we consider several groups of clustering tasks in which we modify

0:07:34 | the number of speakers, and all the tasks

0:07:38 | have the same number of audios; and given a task with a concrete number

0:07:43 | of speakers

0:07:44 | we will have the same number of audios per speaker

0:07:50 | so as you can see in the table, for example we will have tasks with

0:07:54 | five speakers and a hundred and twenty audios per speaker

0:08:00 | and

0:08:04 | here we have the obtained results

0:08:04 | and this graph is a little bit different from what we have seen in the previous experiment

0:08:11 | but again we will see exactly the same information on the

0:08:17 | axes:

0:08:17 | in the horizontal axis we have the value alpha that we have just defined

0:08:22 | and in the vertical axis we have the speaker impurity

0:08:26 | and each

0:08:28 | of the lines represents a constant value of cluster impurity

0:08:32 | so for example, suppose we would like to get

0:08:35 | in our experiments

0:08:38 | a cluster impurity of one percent, that is this curve,

0:08:44 | and we want to compare the results

0:08:46 | obtained using alpha values of zero point five and one, and we see that

0:08:52 | with

0:08:54 | zero point five we get a higher speaker impurity value

0:08:59 | this means that

0:09:00 | if our optimal solution

0:09:03 | is found in the middle of the clustering tree we will

0:09:08 | expect somewhat worse results

0:09:12 | the last variable we have studied is the balance of speakers.

0:09:17 | regarding the balance of speakers, we try to study the scenario

0:09:21 | we present in this slide, that is,

0:09:23 | we have one main speaker that has most of the audios in the

0:09:28 | database and we have

0:09:30 | a number of other speakers that have much fewer audios

0:09:34 | we also need to fix alpha,

0:09:37 | that is the number of speakers divided by the number of audios, and in

0:09:41 | our tasks we will always consider a size of six

0:09:46 | hundred and forty, so

0:09:49 | fixing alpha is equivalent to

0:09:51 | fixing the number

0:09:53 | of speakers

0:09:55 | here we have

0:09:57 | four scenarios in which we modify

0:10:00 | the proportion of audios that belong to the main speaker.

0:10:05 | we start

0:10:06 | from this one, in which

0:10:09 | the main speaker has

0:10:10 | more or less the same number of audios as the others, until this one, in

0:10:15 | which

0:10:15 | the main speaker has much more audio than the others

0:10:21 | if we

0:10:22 | again

0:10:23 | take a look at the results, that is,

0:10:27 | the impurity trade-off curves,

0:10:28 | we see that

0:10:31 | the first scenarios give similar results, and as we increase the

0:10:37 | proportion of audios that belong to the main speaker

0:10:40 | we

0:10:41 | get better results

0:10:43 | so |

0:10:43 | we can conclude that if the main speaker

0:10:46 | has enough audios to make it different from the rest of the

0:10:52 | speakers we can expect better clustering results

0:10:57 | okay |

0:10:58 | if you remember, when i presented the clustering algorithm

0:11:03 | i talked about the stopping criterion, but

0:11:07 | so far the computation of the threshold value

0:11:12 | has been avoided

0:11:13 | in this section we will study two different methods

0:11:19 | both methods require a set of labeled audios, a labeled database.

0:11:27 | we will consider experiments with similar training and testing sets and also with a mismatch

0:11:32 | between the training

0:11:34 | and the testing set

0:11:36 | so |

0:11:37 | in the first one, that we have called maximum distance with oracle,

0:11:41 | we will use

0:11:42 | the labeled audio database to run a clustering process and

0:11:48 | as we know

0:11:49 | how many speakers we have, we will be able to stop at the point in

0:11:54 | which the number of speakers is equal to the number of clusters

0:11:57 | if we

0:11:58 | save the distance observed at that last iteration we will be able to use it

0:12:04 | later

0:12:04 | as a threshold value, and that threshold value is the one that is used afterwards

0:12:09 | at test time
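the oracle method can be sketched as follows (my own illustration, assuming the same single-linkage-on-scores convention as before; `sim` is the pairwise score matrix of the labeled database):

```python
def oracle_threshold(sim, n_speakers):
    """Run the same agglomerative process on a labeled database and
    record the linkage score of the merge that brings the cluster
    count down to the known number of speakers; that score is then
    reused as the stopping threshold on unlabeled data."""
    clusters = [[i] for i in range(len(sim))]
    last_score = None
    while len(clusters) > n_speakers:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        last_score = best
        a, b = pair
        clusters[a] += clusters.pop(b)
    return last_score
```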

0:12:11 | in the second method, that is called maximum distance with unsupervised score calibration, what we do

0:12:16 | is, instead of feeding the clustering algorithm

0:12:21 | the distance metric obtained directly from the plda system,

0:12:27 | we will make a calibration process over the output scores, and

0:12:31 | the similarity matrix built with the calibrated scores is the one that will be

0:12:35 | used later in the clustering algorithm

0:12:38 | as the scores are calibrated, we will be able to choose the threshold value that

0:12:44 | we want depending on

0:12:46 | how many errors

0:12:49 | we are willing to let our clustering algorithm make
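as an illustration of choosing a threshold from calibrated scores: if the calibrated scores behave like log-likelihood ratios with equal priors, a target error rate maps to a score threshold through the posterior (a hedged sketch of my own; the talk does not specify this exact mapping):

```python
import math

def llr_threshold(max_error_rate):
    """If calibrated scores are log-likelihood ratios (equal
    priors), accepting a merge only when the posterior probability
    of 'same speaker' exceeds 1 - max_error_rate corresponds to
    this score threshold: logit(1 - max_error_rate)."""
    p = 1.0 - max_error_rate
    return math.log(p / (1.0 - p))
```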

0:12:54 | taking into account that if you allow

0:12:57 | few errors you will stop at very high speaker impurity values and

0:13:03 | we will not get the correct number

0:13:06 | of speakers

0:13:08 | and we consider

0:13:10 | four groups of clustering tasks:

0:13:15 | the first one, in which we will use similar training and testing

0:13:22 | sets, and three other groups in which we will have different audios-per-speaker

0:13:28 | distributions in the training and the testing sets

0:13:32 | as here we are interested in stopping when we have

0:13:37 | as many clusters as speakers,

0:13:39 | we will define a performance measure as the difference between the number

0:13:44 | of speakers and the number of clusters

0:13:46 | relative to the number of speakers
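that performance measure is simply (a one-line sketch, taking the absolute difference):

```python
def stopping_error(n_speakers, n_clusters):
    """|#speakers - #clusters| relative to #speakers."""
    return abs(n_speakers - n_clusters) / n_speakers
```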

0:13:51 | so here we have the obtained results.

0:13:55 | in the vertical axis we see exactly the variable

0:14:00 | that i have just defined,

0:14:02 | and here we have

0:14:05 | in blue

0:14:06 | the differences obtained with the maximum distance with oracle

0:14:10 | and in the other color the solutions found with the calibrated scores

0:14:17 | and

0:14:18 | we see that

0:14:20 | the second method performs similarly no matter

0:14:24 | the mismatch between

0:14:28 | training and testing sets, and

0:14:31 | that

0:14:32 | the first method may only be used

0:14:35 | when we have

0:14:36 | similar databases

0:14:38 | in the training and the testing

0:14:42 | so to conclude my presentation

0:14:46 | i would like to say two things. the first is

0:14:49 | that we see that speaker clustering is

0:14:53 | strongly affected by the characteristics of our audio database

0:14:57 | and also we can use these conclusions to anticipate

0:15:03 | possible changes in results, but also to find possible solutions in the future. for example

0:15:09 | we see

0:15:10 | that if we operate on a big

0:15:13 | audio dataset

0:15:14 | we will get

0:15:15 | much worse results than if the database is smaller, so

0:15:20 | we propose to split our database into smaller ones and

0:15:26 | use those smaller sets to run separate clustering tasks

0:15:32 | and as

0:15:33 | those clustering tasks will

0:15:35 | have better individual results than the big one,

0:15:39 | we will finally have

0:15:40 | better results in

0:15:42 | the global clustering problem

0:15:44 | and |

0:15:56 | that is all, thank you. now it is time for questions, so

0:16:12 | any questions?

0:16:18 | so you mentioned you have studied how the clustering results depend on

0:16:24 | the characteristics of the database in each scenario

0:16:27 | but how dependent are these conclusions on the system that

0:16:34 | you use

0:16:37 | do you think they would be the same with a different system,

0:16:41 | or

0:16:44 | well i would say

0:16:47 | that it is quite affected, especially

0:16:52 | i believe that when you make

0:16:54 | one decision

0:16:56 | at the beginning of the clustering process

0:16:59 | you will

0:17:01 | carry that decision until the end of the process

0:17:05 | so |

0:17:06 | i think the reason behind this conclusion is found in that fact

0:17:14 | for example

0:17:16 | we can think why

0:17:18 | we have

0:17:20 | shown different results when we have different sizes of the task

0:17:25 | and it is because, as the size of the task increases,

0:17:29 | errors that are made at the beginning of the clustering process

0:17:32 | are spread over the rest of the clustering tree

0:17:38 | and

0:17:38 | this

0:17:39 | is

0:17:39 | more harmful the

0:17:42 | more iterations

0:17:44 | we have, so when the task is smaller there are

0:17:50 | fewer iterations and there will be less damage

0:17:54 | and also, for example,

0:17:57 | in the tasks in which we analyzed

0:18:04 | the number of speakers we saw that

0:18:09 | the results were worse when the solution was at the middle of the

0:18:17 | clustering tree

0:18:19 | and

0:18:20 | if the solution was found

0:18:23 | at the beginning of the tree or at the end of the tree we got

0:18:28 | a

0:18:29 | better result

0:18:30 | that is also because

0:18:32 | at

0:18:33 | the beginning there are

0:18:38 | fewer possible

0:18:39 | partitions

0:18:40 | and

0:18:41 | in the middle we have more, but as

0:18:45 | we cannot access all of them

0:18:48 | because of the decisions that we have previously made,

0:18:52 | the optimal one

0:18:53 | may not be available, and

0:18:55 | that

0:18:56 | doesn't happen at the beginning,

0:18:59 | where we just have more

0:19:03 | possible options

0:19:06 | that is of course

0:19:09 | due to the agglomerative clustering algorithm we are using

0:19:12 | so |

0:19:13 | i'd say

0:19:15 | yes, i think

0:19:18 | the clustering

0:19:20 | conclusions

0:19:25 | are very influenced by the algorithm you use

0:19:31 | so

0:19:32 | for example,

0:19:33 | although this is not among the experiments we have

0:19:38 | presented

0:19:39 | here,

0:19:41 | if we change the

0:19:44 | linkage method and we use for example the

0:19:49 | average |

0:19:50 | score |

0:19:51 | we saw that the differences

0:19:54 | due to the size of the task, where we had

0:19:58 | different results, change, because if we use

0:20:03 | the average score instead of the maximum score all the results that we obtain

0:20:09 | are similar, so

0:20:10 | that was an example that if we change the clustering algorithm we may have

0:20:17 | different conclusions

0:20:23 | so most of the conclusions seem rather fundamental to this element; definitely

0:20:28 | clustering is a method for testing a particular scoring, and you gained some insight, so

0:20:35 | what is your insight on the limits, or

0:20:40 | what would you say affects these conclusions the most

0:20:45 | i think it is quite affected by

0:20:50 | the

0:20:52 | clustering algorithm we are using

0:20:57 | thanks |

0:21:03 | sorry |

0:21:05 | (unintelligible)

0:21:11 | (unintelligible)

0:21:12 | one question about the database that you used: you mentioned that you are using

0:21:18 | only cuts of three hundred seconds

0:21:23 | of audio

0:21:25 | okay, there is duration variability inside, and so on;

0:21:29 | did you study the effect of this duration on

0:21:33 | all the conclusions that you drew

0:21:36 | yes, we also ran some experiments in which we tested

0:21:46 | different durations

0:21:48 | and the results changed with the duration

0:21:53 | and the conclusions keep similar, but we have

0:21:56 | higher

0:21:57 | impurity levels with shorter durations

0:22:02 | over all our

0:22:05 | experiments,

0:22:06 | and as the duration got higher the differences between the different databases did not show so much