0:00:15 Hello everybody. In this presentation I will show you some of my work on speaker clustering. Before starting, I would like to define two things. The first one is the speaker clustering problem that we want to solve: we have an audio database in which the audios belong to unknown speakers, and the number of speakers is also unknown. The second one is audio database characteristics: when we use this term in this presentation, we are thinking of things such as the number of audios or how many audios we have for each speaker.

So, first of all, I will present the outline of the presentation. We will start with the motivation, later I will present the clustering algorithm that we have been using, then we will see the database characteristics that we have studied, and we will conclude with some experiments on the stopping criterion.

Regarding the motivation, suppose that we receive a client who is interested in getting a clustering-based solution. One common question that we have to deal with is: how well does your system work? For that purpose, we ask them to give us a database as similar as possible to the one that will be used later in the system, and with that database we run some experiments so that we are able to say: we expect to get results similar to these. But based on our experience, we have seen that a clustering task may give very different results depending on the characteristics of the database. So we must be careful, because if the distribution of audios and speakers in the database is different from what we have now, we may get very different results. And then, of course, how can we expect those results to change? That is what we wanted to find out by running several experiments. So now that we know what we want to do, first of all I will present the clustering algorithm that we are using.
We consider agglomerative hierarchical clustering. These are clustering algorithms that start from a partition in which each audio is identified with one single cluster and, iteratively, merge the two closest clusters. To completely define the algorithm we have to fix three things. The first one is the distance metric, and for this purpose we use the scores provided by the PLDA system: before running the clustering algorithm, we compute all the scores for the evaluation database and use them as the similarity matrix. We also need to define a linkage method, and we use minimum distance, that is, the maximum score between the audios of the two clusters. Finally, we have to fix a stopping criterion, and we use a score-based threshold, in particular a maximum distance criterion: if the score of the two clusters to be merged falls below a certain threshold, we stop; otherwise, we continue merging clusters.

Regarding the performance measures, we use those defined by David van Leeuwen in one of his works: speaker impurity and cluster impurity. Speaker impurity measures how spread each speaker's audios are over the clusters, and cluster impurity measures how corrupt the clusters are. When we say that a cluster is corrupt, we mean that it has audios from many different speakers. If we compute both impurity values at each iteration of the agglomerative clustering process and plot all the points in a graph, we get impurity trade-off curves like the one we have here on this slide. We will use these graphs to measure the performance of our clustering experiments throughout the whole presentation, and as a reference point we will use the equal-impurity working point, where speaker impurity is equal to cluster impurity.
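A minimal sketch of the clustering described above. It assumes the PLDA scores are already available as a symmetric NumPy matrix in which a higher score means the two audios are more likely to come from the same speaker. The function names are hypothetical, and the naive search costs O(N^3), so it is only meant for small examples. The impurity function follows one common reading of speaker and cluster impurity: each cluster is credited with its most frequent speaker, and each speaker with the cluster that holds most of its audios. Treat that as an assumption rather than the exact definition used in this work.

```python
from collections import Counter

import numpy as np


def agglomerative_max_score(scores, threshold=None, n_clusters=None):
    """Agglomerative clustering on a symmetric score (similarity) matrix.

    Linkage: the score between two clusters is the maximum score over all
    pairs of their audios (minimum distance). Merging stops when the best
    available score falls below `threshold`, or when `n_clusters` remain.
    Returns (labels, merge_scores).
    """
    sim = np.array(scores, dtype=float)
    n = sim.shape[0]
    np.fill_diagonal(sim, -np.inf)          # never merge a cluster with itself
    members = {i: [i] for i in range(n)}    # cluster id -> audio indices
    active = list(range(n))                 # ids of clusters still alive
    merge_scores = []
    while len(active) > 1:
        if n_clusters is not None and len(active) <= n_clusters:
            break
        sub = sim[np.ix_(active, active)]
        i, j = divmod(int(np.argmax(sub)), len(active))
        best = float(sub[i, j])
        if threshold is not None and best < threshold:
            break                           # score-based stopping criterion
        a, b = active[i], active[j]
        members[a].extend(members.pop(b))   # merge cluster b into cluster a
        row = np.maximum(sim[a], sim[b])    # max-score linkage update
        sim[a, :] = row
        sim[:, a] = row
        sim[a, a] = -np.inf
        active.remove(b)
        merge_scores.append(best)
    labels = np.empty(n, dtype=int)
    for k, c in enumerate(active):
        labels[members[c]] = k
    return labels, merge_scores


def impurities(cluster_labels, speaker_labels):
    """Return (cluster_impurity, speaker_impurity) as fractions of audios."""
    n = len(speaker_labels)
    counts = Counter(zip(cluster_labels, speaker_labels))
    best_in_cluster, best_for_speaker = Counter(), Counter()
    for (c, s), k in counts.items():
        best_in_cluster[c] = max(best_in_cluster[c], k)
        best_for_speaker[s] = max(best_for_speaker[s], k)
    # audios outside their cluster's majority speaker -> cluster impurity
    cluster_impurity = 1 - sum(best_in_cluster.values()) / n
    # audios outside their speaker's majority cluster -> speaker impurity
    speaker_impurity = 1 - sum(best_for_speaker.values()) / n
    return cluster_impurity, speaker_impurity
```

To trace an impurity trade-off curve, sweep the threshold, or call the function with every value of `n_clusters`, and compute both impurities at each step. The equal-impurity point is where the two values cross.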
Before we start with the experiments, I will show you the database that we have used. It consists of telephone-channel audios with a segment duration of 300 seconds, and in this graph you can see the audio-per-speaker distribution of this database. To analyze the audio database characteristics, we first need to define the variables we will use in this part. The first one is the size of the task, that is, the number of audios we have in the database. The second one is the number of speakers in the database. The third one is the balance of speakers, which measures how evenly the audios are distributed among the speakers.

Regarding the first one, we perform different experiments in which we vary the size of the task: we start from the initial set of audios and split it into smaller subsets. For example, as you can see in the table, we have six subsets of one size and three subsets of another. For those sizes in which we have more than one clustering task, we gather all the results into one single curve so that we can compare results between different task sizes. Here we have the resulting curves: on the horizontal axis we have cluster impurity and on the vertical axis speaker impurity. As we can see, when we reduce the size of the task, we get better results in our clustering problem.

The second characteristic that we have studied is the number of speakers. To characterize this experiment we use the variable alpha, defined as the number of speakers divided by the number of audios. This variable has another interpretation: it tells us the iteration at which we should stop, since we want to stop when we have as many clusters as speakers. We created several groups of clustering tasks in which we vary the number of speakers. All the tasks have the same number of audios and, given a task with a concrete number of speakers, every speaker has the same number of audios.
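The task variables above reduce to simple counts. The sketch below assumes each task is represented by the list of reference speaker labels of its audios. `task_variables` and `split_task` are hypothetical helpers, and the random split is only one way to build smaller tasks, since the talk does not say how the subsets were drawn. The sketch also makes the second interpretation of alpha concrete. Starting from one cluster per audio, each merge removes one cluster, so the target partition is reached after (number of audios minus number of speakers) merges.

```python
import random


def task_variables(speaker_labels):
    """Summarize a clustering task from its reference speaker labels."""
    n_audios = len(speaker_labels)
    n_speakers = len(set(speaker_labels))
    # Starting from n_audios singleton clusters, each merge removes one cluster,
    # so reaching n_speakers clusters takes n_audios - n_speakers merges.
    stop_merge = n_audios - n_speakers
    return {
        "n_audios": n_audios,
        "n_speakers": n_speakers,
        "alpha": n_speakers / n_audios,
        "stop_merge": stop_merge,
        # fraction of the full tree (n_audios - 1 merges) walked before stopping
        "tree_fraction": stop_merge / (n_audios - 1) if n_audios > 1 else 0.0,
    }


def split_task(items, n_subsets, seed=0):
    """Randomly split one task into n_subsets disjoint, near-equal subtasks."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]
```

With alpha close to 0.5, `tree_fraction` is also close to 0.5. In other words, the correct partition sits in the middle of the clustering tree.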
As you can see in the table, for example, we have tasks with five speakers and 120 audios per speaker. Here we have the results. This graph is a little bit different from the ones in the previous experiment, but it shows the same information: on the horizontal axis we have alpha, on the vertical axis we have speaker impurity, and each of the lines represents an operating point of cluster impurity. For example, suppose that we want a cluster impurity of 1% in our experiments, and we want to compare the results for alpha equal to 0.5 with those for other values. We see that with alpha equal to 0.5 we get higher speaker impurity values. This means that if our optimal solution is found in the middle of the clustering tree, we should expect worse results.

The third characteristic that we have studied is the balance of speakers. To study it, we use the scenario presented in this slide: one dominant speaker has most of the audios in the database, and the rest of the speakers have many fewer audios. We also need to fix alpha, the number of speakers divided by the number of audios. In our tasks the size of the task is always the same, so fixing alpha is equivalent to fixing the number of speakers. Here we have four scenarios in which we vary the proportion of audios of the dominant speaker. They range from this one, in which the dominant speaker has more or less the same number of audios as the others, to this one, in which the dominant speaker has many more audios than the others.
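To make the task designs concrete, here is a sketch that generates reference label lists for a balanced task and for a dominant-speaker task. The helper names, the label strings, and the way the non-dominant speakers share the remaining audios are assumptions for illustration. Keeping the number of audios and speakers fixed across scenarios keeps alpha fixed, so only the dominant speaker's share changes.

```python
def balanced_task(n_speakers, audios_per_speaker):
    """Reference labels for a task where every speaker has the same number of audios."""
    return [f"spk{s}" for s in range(n_speakers) for _ in range(audios_per_speaker)]


def dominant_speaker_task(n_audios, n_speakers, dominant_share):
    """Reference labels for a task where one speaker holds `dominant_share` of the
    audios and the remaining audios are spread as evenly as possible over the
    other speakers (an assumed design, for illustration only)."""
    n_dom = round(n_audios * dominant_share)
    n_rest, n_others = n_audios - n_dom, n_speakers - 1
    if n_others < 1 or n_rest < n_others:
        raise ValueError("every non-dominant speaker needs at least one audio")
    labels = ["dominant"] * n_dom
    for k in range(n_others):
        # distribute the remainder one audio at a time to the first speakers
        count = n_rest // n_others + (1 if k < n_rest % n_others else 0)
        labels += [f"spk{k}"] * count
    return labels
```

For instance, `balanced_task(5, 120)` gives a 600-audio task with alpha equal to 5/600. Calling `dominant_speaker_task` with the same totals and an increasing `dominant_share` mimics the idea of the balance scenarios.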
If we again take a look at the results, this time with the impurity trade-off curves, we see that the first scenarios lead to similar results. As we increase the proportion of audios of the dominant speaker, we get better results. So we can conclude that if the dominant speaker has many more audios than the rest of the speakers, we should expect better clustering results.

Now, if you remember, when I presented the clustering algorithm I talked about the stopping criterion, but so far we have avoided computing the threshold value. In this section we study two different methods. Both require a set of labeled audios, a development database, which we will use in experiments with and without a mismatch between the training and testing sets.

The first method is also based on maximum distance. We use the labeled database to run a clustering process and, since we know how many speakers we have, we can stop at the point at which the number of clusters equals the number of speakers. If we save the distance of that last iteration, we can later use it as the threshold value on the test data.

The second method is called maximum distance with unsupervised score calibration. Instead of feeding the clustering algorithm directly with the raw scores from the PLDA system, we run a calibration process over the scores, and the calibrated scores are then used in the clustering algorithm. Because the scores are calibrated, we can choose the threshold value depending on how many errors we allow our clustering algorithm to make. Keep in mind that if we allow only a few errors, we will stop at very high speaker impurity values and we will not get the correct number of speakers.
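The first stopping method can be sketched as follows, assuming a labeled development score matrix and the maximum-score linkage. The function names are hypothetical. The clustering is run to the end, and the score of the merge that brings the partition down to the true number of speakers is kept as the threshold. The talk does not say which unsupervised calibration was used, so `bayes_threshold` only illustrates the general idea behind the second method. If the scores are assumed to be calibrated log-likelihood ratios, a threshold can be set from an assumed target prior and error costs (the standard Bayes decision rule) instead of from labeled data.

```python
import math

import numpy as np


def merge_scores_max_linkage(scores):
    """Run maximum-score agglomerative clustering down to a single cluster and
    return the score of every merge, in order."""
    sim = np.array(scores, dtype=float)
    np.fill_diagonal(sim, -np.inf)
    active = list(range(sim.shape[0]))
    merged = []
    while len(active) > 1:
        sub = sim[np.ix_(active, active)]
        i, j = divmod(int(np.argmax(sub)), len(active))
        a, b = active[i], active[j]
        merged.append(float(sub[i, j]))
        row = np.maximum(sim[a], sim[b])   # max-score linkage update
        sim[a, :] = row
        sim[:, a] = row
        sim[a, a] = -np.inf
        active.remove(b)
    return merged


def learn_threshold(dev_scores, n_speakers):
    """Method 1 (sketch): score of the last merge needed to go from one
    cluster per audio down to `n_speakers` clusters on labeled dev data."""
    merges = merge_scores_max_linkage(dev_scores)
    n_audios = len(merges) + 1
    if not 1 <= n_speakers < n_audios:
        raise ValueError("need 1 <= n_speakers < number of audios")
    # merges[k] takes the partition from n_audios - k to n_audios - k - 1 clusters
    return merges[n_audios - n_speakers - 1]


def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
    """Bayes decision threshold for calibrated log-likelihood-ratio scores."""
    return math.log((c_fa * (1 - p_target)) / (c_miss * p_target))
```

Raising the assumed target prior lowers the threshold, so the algorithm keeps merging longer. Lowering the prior stops it earlier.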
For the experiments we consider four groups of clustering tasks. In the first group we use similar training and testing sets. In the other three groups, the audio-per-speaker distribution differs between the training and testing sets. Since we are interested in stopping when we have as many clusters as speakers, we define a performance measure as the difference between the number of speakers and the number of clusters, relative to the number of speakers. Here we have the results. On the vertical axis we have the variable that I have just defined. In blue we have the results of the first method, maximum distance with the threshold learned on labeled data, and next to them the results obtained with the calibrated scores. We see that the second method performs similarly no matter whether there is a mismatch between the training and testing sets. The first method, however, may only be used when we have similar databases for training and testing.

To conclude my presentation, we have seen that speaker clustering is strongly affected by the characteristics of the audio database. We can use these conclusions to anticipate possible changes and also to find possible solutions in the future. For example, we have seen that if we operate on a big dataset, we get much worse results than if the database is smaller. So we propose to split our database into smaller ones and use those smaller sets to run the clustering. Since those smaller clustering tasks give better results than the big one, we expect to finally get better results on the whole clustering problem. Thank you, I'm ready for questions.
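The relative measure used in the stopping-criterion experiments is a one-liner. The sketch below uses a hypothetical function name and keeps the sign: positive values mean the algorithm stopped with too few clusters, and negative values mean it stopped with too many. The talk does not say how the values were aggregated over each group of tasks, so no aggregation is assumed here.

```python
def relative_cluster_count_error(n_speakers, n_clusters):
    """(n_speakers - n_clusters) / n_speakers.

    Positive: stopped late (too few clusters); negative: stopped early
    (too many clusters); zero: the number of clusters matches the speakers.
    """
    if n_speakers <= 0:
        raise ValueError("n_speakers must be positive")
    return (n_speakers - n_clusters) / n_speakers
```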
You presented several conclusions about the clustering accuracy in these scenarios, but they are based on your system. I mean, how dependent are these conclusions on the system that you use?

I would say they depend on it quite strongly. I believe that when you make a decision at the beginning of the clustering process, you carry it until the end of the process, and I think that is the reason behind these conclusions. For example, we can think about why we obtained different results for different task sizes. As the size of the task increases, the speaker errors made at the beginning of the clustering process propagate through the clustering tree, and they are more harmful the more iterations we have. So when the task is smaller, there are fewer iterations and the errors are less harmful. Also, in the experiment in which we analyzed the number of speakers, we saw that the worst results were obtained in the middle of the clustering tree. If the solution was found at the beginning or at the end of the tree, we got better results. That is also because at the beginning there are fewer possible partitions and in the middle there are more, but we cannot reach all of them: some may no longer be available because of the decisions we have already made. This is due to the agglomerative clustering algorithm we are using. So yes, I believe the conclusions are strongly influenced by the algorithm you use. For example, this is not among the experiments I have shown here, but we changed the linkage method and used, for example, the average score. We saw that the better results we obtained when there is a dominant speaker disappeared: with the average score instead of the maximum score, all the results we obtained were similar.
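The linkage change mentioned in the answer can be illustrated by the cluster-to-cluster score under each rule. This sketch uses a hypothetical function name and computes the score directly from the audio-level score matrix. Maximum-score linkage uses only the best pair of audios, while average linkage averages over every pair. A single high-scoring pair is therefore enough to trigger a merge under the first rule, but not necessarily under the second.

```python
import numpy as np


def cluster_score(scores, members_a, members_b, linkage="max"):
    """Score between two clusters, computed from the audio-level score matrix."""
    cross = np.asarray(scores, dtype=float)[np.ix_(members_a, members_b)]
    if linkage == "max":        # best single pair (minimum distance)
        return float(cross.max())
    if linkage == "average":    # mean over all cross-cluster pairs
        return float(cross.mean())
    raise ValueError(f"unknown linkage: {linkage}")
```

Take the matrix `[[0, 5, 4], [5, 0, -6], [4, -6, 0]]`, the cluster {0, 1}, and audio 2. With a threshold of 0, they would be merged under maximum-score linkage (score 4.0) but not under average linkage (score -1.0).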
So that was an example showing that if we change the clustering algorithm, we may reach different conclusions.

So most of the conclusions would depend on these fundamental elements of the method. What is your insight: what would you say affects these conclusions the most?

I think they are mostly affected by the clustering algorithm.

Thanks. Sorry, I have one question about the database that you used. You mentioned that you used only the 300-second segments, but in practice there is duration variability and so on. Did you study the effect of duration on your conclusions?

Yes, we also ran some experiments in which we tested different durations, and the conclusions remained similar. However, we saw that at higher cluster impurity levels in our experiments, the differences between the different databases were less visible.