0:00:15 | hello everybody, in this presentation i will show you some of my

0:00:21 | work in speaker clustering |

0:00:23 | but before starting i would like to define two things the first one is the |

0:00:28 | speaker clustering problem that we want to solve: we have an audio database in which each

0:00:34 | audio belongs to an unknown speaker, and we also have an unknown number of

0:00:38 | speakers

0:00:39 | and the second one is that we will talk about audio database characteristics in this presentation

0:00:44 | when we refer to this term we mean things such as the number

0:00:49 | of audios or how many audios

0:00:52 | each speaker has

0:00:54 | so |

0:00:56 | first of all i will present you the outline of the presentation

0:01:01 | we will start with the motivation |

0:01:03 | then i will present you the clustering algorithm that we have been using

0:01:09 | then we will see the

0:01:12 | variables that we have studied and we will conclude

0:01:16 | with some experiments studying the stopping criteria

0:01:21 | so |

0:01:23 | if we talk about the motivation, suppose that we

0:01:28 | receive a request from one client that is interested

0:01:32 | in getting a clustering-based solution

0:01:35 | and one common question that we have to deal with is okay |

0:01:40 | how is your system working |

0:01:42 | and for that purpose we will ask them to give us a

0:01:46 | database as similar as possible

0:01:49 | to the one that will be used

0:01:51 | later in the system, and with that database we will

0:01:56 | be able to say okay, we expect

0:01:59 | to have results similar to these ones, but

0:02:02 | based on our experience we have seen that a clustering task

0:02:07 | may output

0:02:08 | very different results depending on the characteristics of the database, so we must be

0:02:13 | careful because if the distribution of audios and speakers in the database is different

0:02:18 | from what we have now |

0:02:20 | you may have |

0:02:21 | very different results |

0:02:22 | and then |

0:02:24 | of course we then ask how can we expect

0:02:27 | those results to change

0:02:29 | and to answer that question we ran several experiments, and some of

0:02:35 | those experiments are the ones i am presenting here

0:02:38 | okay so now that we know what we want to do, first of all i will

0:02:42 | present the clustering algorithm that we are using

0:02:46 | we consider a bottom-up agglomerative clustering, that is,

0:02:50 | a clustering algorithm that starts from a partition in which each audio is

0:02:56 | identified with one single cluster, and at each iteration we merge the two closest

0:03:01 | clusters

0:03:02 | to completely define our algorithm we will have to fix three things. the

0:03:09 | first one is the distance metric, and for this purpose we will consider

0:03:13 | the scores provided by the plda system, so

0:03:17 | before running the clustering algorithm

0:03:19 | we compute all the pairwise scores for the evaluation database and we will use

0:03:24 | those scores to build the similarity matrix

0:03:27 | we will also need to define a linkage method, and we will use minimum

0:03:32 | distance

0:03:33 | and we also have to fix

0:03:36 | the stopping criterion, and we consider a score-based one, particularly

0:03:41 | a maximum distance threshold, that is, if the distance between the two clusters to be merged

0:03:48 | rises above a certain threshold we will stop

0:03:51 | and otherwise we will continue

0:03:54 | merging clusters
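the merging loop just described can be sketched as follows (a minimal python sketch; the names, and the convention that the minimum-distance linkage on similarity scores means merging the pair with the highest score, are my own assumptions, not code from the talk):

```python
def agglomerative_cluster(sim, threshold):
    """Bottom-up clustering: one cluster per audio at the start,
    then repeatedly merge the closest pair of clusters.  With
    similarity scores, merging the 'closest' pair means merging
    the pair with the highest linkage score; we stop when that
    best score falls below `threshold` (the score-based criterion)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage on scores: best pairwise score
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:  # the closest pair is too far apart: stop
            break
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters
```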

0:03:56 | regarding the performance measures, we will use those

0:04:00 | defined by david van leeuwen in one of his works:

0:04:05 | the speaker impurity and the cluster impurity. speaker impurity measures how scattered

0:04:11 | over the clusters the audios of a speaker are,

0:04:14 | while

0:04:15 | cluster impurity measures how corrupt the clusters are, and when we say that one

0:04:20 | cluster is corrupt we refer to the fact that it

0:04:24 | has audios from many different speakers

0:04:27 | if we compute

0:04:29 | those impurity levels at each iteration of the clustering process

0:04:34 | and we plot

0:04:35 | all those points in a graph

0:04:37 | we will get impurity trade-off curves such as the one that

0:04:42 | we have here in this slide

0:04:44 | we will use these graphs

0:04:46 | to measure the performance of our clustering experiments throughout

0:04:52 | the whole presentation

0:04:54 | and as a reference

0:04:57 | point we will use the equal impurity working point, that is, when we have

0:05:00 | the same speaker impurity and cluster impurity
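one common way to compute cluster and speaker impurity from the cluster assignments is sketched below (a hedged approximation of my own; van leeuwen's exact definitions may differ in detail):

```python
from collections import Counter

def impurities(cluster_of, speaker_of):
    """cluster_of[i] and speaker_of[i]: cluster and true speaker of
    audio i.  Cluster impurity: fraction of audios not matching
    their cluster's majority speaker (a corrupt cluster mixes many
    speakers).  Speaker impurity: fraction of a speaker's audios
    lying outside that speaker's majority cluster (scattering)."""
    n = len(cluster_of)
    by_cluster, by_speaker = {}, {}
    for c, s in zip(cluster_of, speaker_of):
        by_cluster.setdefault(c, []).append(s)
        by_speaker.setdefault(s, []).append(c)
    majority = lambda groups: sum(Counter(g).most_common(1)[0][1] for g in groups)
    return 1 - majority(by_cluster.values()) / n, 1 - majority(by_speaker.values()) / n
```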

0:05:05 | before we start with the experiments i will present you the database that we

0:05:10 | have used. we consider

0:05:12 | audios that come from the

0:05:14 | telephone channel

0:05:15 | and with a three hundred second segment duration, and here in the graph you can see

0:05:22 | the audios-per-speaker distribution that we have in this database

0:05:27 | okay |

0:05:28 | let us now pass

0:05:30 | to the experiments conducted over our audio database. first we need to define some

0:05:34 | variables that appear in this part. we consider three of them:

0:05:39 | the first one, the size of the task,

0:05:41 | that is the number of audios we have in the database;

0:05:44 | the second one, the number of speakers, that is the number of speakers that

0:05:49 | we have in the database; and the balance of speakers, which measures

0:05:53 | how many audios each speaker has

0:05:58 | so

0:05:59 | regarding the first variable, we will perform different experiments in which

0:06:05 | we modify the size of the task.

0:06:07 | we will start from the initial set of audios and we will extract

0:06:11 | subsets

0:06:12 | of smaller sizes

0:06:14 | so for example, as you can see in the table,

0:06:19 | we have several subsets of each size, and

0:06:25 | for those cases in which

0:06:27 | we have more than one clustering task we will average the

0:06:31 | results and report them with one single curve,

0:06:34 | so we can compare results between different sizes of the task

0:06:39 | here we have the impurity trade-off curves.

0:06:43 | in the horizontal axis we have cluster impurity and in the vertical axis we

0:06:48 | have speaker impurity

0:06:50 | and as we can see, as we increase

0:06:53 | the size of the task we expect to have worse results in our clustering problem

0:07:00 | the second variable that we have analyzed is the number of speakers

0:07:04 | and to characterize this experiment

0:07:06 | we will use

0:07:08 | the value alpha, that is defined as the number of speakers divided by the

0:07:13 | number of audios

0:07:14 | we can also give another interpretation to this variable:

0:07:18 | it allows us to know the

0:07:21 | iteration at which we should stop, since we want to stop when we have as

0:07:25 | many clusters

0:07:26 | as speakers
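this interpretation of alpha can be made concrete: each agglomerative merge removes exactly one cluster, so the ideal stopping iteration follows directly from alpha (a small sketch under that assumption; the function name is my own):

```python
def ideal_stop_iteration(n_audios, alpha):
    """alpha = n_speakers / n_audios.  Each merge removes one
    cluster, so starting from n_audios singleton clusters we reach
    n_speakers clusters after n_audios * (1 - alpha) merges."""
    n_speakers = round(alpha * n_audios)
    return n_audios - n_speakers
```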

0:07:29 | we consider several groups of clustering tasks in which we modify

0:07:34 | the number of speakers, and all the tasks

0:07:38 | have the same number of audios; and given a task with a concrete number

0:07:43 | of speakers

0:07:44 | we will have the same number of audios per speaker

0:07:50 | so as you can see in the table, for example we will have tasks with

0:07:54 | five speakers and a hundred and twenty audios per speaker

0:08:00 | and

0:08:04 | here we have the obtained results

0:08:04 | and this graph is a little bit different from what we have seen in the previous experiment

0:08:11 | but again we will see exactly the same information on the

0:08:17 | axes:

0:08:17 | in the horizontal axis we have the value alpha that we have just defined

0:08:22 | and in the vertical axis we have the speaker impurity

0:08:26 | and each

0:08:28 | of the lines represents a constant value of cluster impurity

0:08:32 | so for example, suppose we would like to get

0:08:35 | in our experiments

0:08:38 | a cluster impurity of one percent, that is this curve,

0:08:44 | and we want to compare the results

0:08:46 | obtained using alpha values of zero point five and one, and we see that

0:08:52 | with

0:08:54 | zero point five we get a higher speaker impurity value

0:08:59 | this means that

0:09:00 | if our optimal solution

0:09:03 | is found in the middle of the clustering tree we will

0:09:08 | expect somewhat worse results

0:09:12 | the last variable we have studied is the balance of speakers.

0:09:17 | regarding the balance of speakers, we try to study the scenario

0:09:21 | we present in this slide, that is,

0:09:23 | we have one main speaker that has most of the audios in the

0:09:28 | database and we have

0:09:30 | a number of other speakers that have much fewer audios

0:09:34 | we also need to fix alpha,

0:09:37 | that is the number of speakers divided by the number of audios, and in

0:09:41 | our tasks we will always consider a size of six

0:09:46 | hundred and forty, so

0:09:49 | fixing alpha is equivalent to

0:09:51 | fixing the number

0:09:53 | of speakers

0:09:55 | here we have

0:09:57 | four scenarios in which we modify

0:10:00 | the proportion of audios that belong to the main speaker.

0:10:05 | we start

0:10:06 | from this one, in which

0:10:09 | the main speaker has

0:10:10 | more or less the same number of audios as the others, until this one, in

0:10:15 | which

0:10:15 | the main speaker has much more audio than the others

0:10:21 | if we

0:10:22 | again

0:10:23 | take a look at the results, that is,

0:10:27 | the impurity trade-off curves,

0:10:28 | we see that

0:10:31 | the first scenarios give similar results, and as we increase the

0:10:37 | proportion of audios that belong to the main speaker

0:10:40 | we

0:10:41 | get better results

0:10:43 | so |

0:10:43 | we can conclude that if the main speaker

0:10:46 | has enough audios to make it different from the rest of the

0:10:52 | speakers we can expect better clustering results

0:10:57 | okay |

0:10:58 | if you remember, when i presented the clustering algorithm

0:11:03 | i talked about the stopping criterion, but

0:11:07 | so far the computation of the threshold value

0:11:12 | has been avoided

0:11:13 | in this section we will study two different methods

0:11:19 | both methods require a set of labeled audios, a labeled database.

0:11:27 | we will consider experiments with similar training and testing sets and also with a mismatch

0:11:32 | between the training

0:11:34 | and the testing set

0:11:36 | so |

0:11:37 | in the first one, that we have called maximum distance with oracle,

0:11:41 | we will use

0:11:42 | the labeled audio database to run a clustering process and

0:11:48 | as we know

0:11:49 | how many speakers we have, we will be able to stop at the point in

0:11:54 | which the number of speakers is equal to the number of clusters

0:11:57 | if we

0:11:58 | save the distance observed at that last iteration we will be able to use it

0:12:04 | later

0:12:04 | as a threshold value, and that threshold value is the one that is used afterwards

0:12:09 | at test time
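the oracle method can be sketched as follows (my own illustration, assuming the same single-linkage-on-scores convention as before; `sim` is the pairwise score matrix of the labeled database):

```python
def oracle_threshold(sim, n_speakers):
    """Run the same agglomerative process on a labeled database and
    record the linkage score of the merge that brings the cluster
    count down to the known number of speakers; that score is then
    reused as the stopping threshold on unlabeled data."""
    clusters = [[i] for i in range(len(sim))]
    last_score = None
    while len(clusters) > n_speakers:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        last_score = best
        a, b = pair
        clusters[a] += clusters.pop(b)
    return last_score
```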

0:12:11 | in the second method, that is called maximum distance with unsupervised score calibration, what we do

0:12:16 | is, instead of feeding the clustering algorithm

0:12:21 | the distance metric obtained directly from the plda system,

0:12:27 | we will make a calibration process over the output scores, and

0:12:31 | the similarity matrix built with the calibrated scores is the one that will be

0:12:35 | used later in the clustering algorithm

0:12:38 | as the scores are calibrated, we will be able to choose the threshold value that

0:12:44 | we want depending on

0:12:46 | how many errors

0:12:49 | we are willing to let our clustering algorithm make
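as an illustration of choosing a threshold from calibrated scores: if the calibrated scores behave like log-likelihood ratios with equal priors, a target error rate maps to a score threshold through the posterior (a hedged sketch of my own; the talk does not specify this exact mapping):

```python
import math

def llr_threshold(max_error_rate):
    """If calibrated scores are log-likelihood ratios (equal
    priors), accepting a merge only when the posterior probability
    of 'same speaker' exceeds 1 - max_error_rate corresponds to
    this score threshold: logit(1 - max_error_rate)."""
    p = 1.0 - max_error_rate
    return math.log(p / (1.0 - p))
```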

0:12:54 | taking into account that if you allow

0:12:57 | few errors you will stop at very high speaker impurity values and

0:13:03 | we will not get the correct number

0:13:06 | of speakers

0:13:08 | and we consider

0:13:10 | four groups of clustering tasks:

0:13:15 | the first one, in which we will use similar training and testing

0:13:22 | sets, and three other groups in which we will have different audios-per-speaker

0:13:28 | distributions in the training and the testing sets

0:13:32 | as here we are interested in stopping when we have

0:13:37 | as many clusters as speakers,

0:13:39 | we will define a performance measure as the difference between the number

0:13:44 | of speakers and the number of clusters

0:13:46 | relative to the number of speakers
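that performance measure is simply (a one-line sketch, taking the absolute difference):

```python
def stopping_error(n_speakers, n_clusters):
    """|#speakers - #clusters| relative to #speakers."""
    return abs(n_speakers - n_clusters) / n_speakers
```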

0:13:51 | so here we have the obtained results.

0:13:55 | in the vertical axis we see exactly the variable

0:14:00 | that i have just defined,

0:14:02 | and here we have

0:14:05 | in blue

0:14:06 | the differences obtained with the maximum distance with oracle

0:14:10 | and in the other color the solutions found with the calibrated scores

0:14:17 | and

0:14:18 | we see that

0:14:20 | the second method performs similarly no matter

0:14:24 | the mismatch between

0:14:28 | training and testing sets, and

0:14:31 | that

0:14:32 | the first method may only be used

0:14:35 | when we have

0:14:36 | similar databases

0:14:38 | in the training and the testing

0:14:42 | so to conclude my presentation

0:14:46 | i would like to say two things. the first is

0:14:49 | that we see that speaker clustering is

0:14:53 | strongly affected by the characteristics of our audio database

0:14:57 | and also we can use these conclusions to anticipate

0:15:03 | possible changes in results, but also to find possible solutions in the future. for example

0:15:09 | we see

0:15:10 | that if we operate on a big

0:15:13 | audio dataset

0:15:14 | we will get

0:15:15 | much worse results than if the database is smaller, so

0:15:20 | we propose to split our database into smaller ones and

0:15:26 | use those smaller sets to run separate clustering tasks

0:15:32 | and as

0:15:33 | those clustering tasks will

0:15:35 | have better individual results than the big one,

0:15:39 | we will finally have

0:15:40 | better results in

0:15:42 | the global clustering problem

0:15:44 | and |

0:15:56 | that is all, thank you. now it is time for questions, so

0:16:12 | any questions?

0:16:18 | so you mentioned you have studied how the clustering results depend on

0:16:24 | the characteristics of the database in each scenario

0:16:27 | but how dependent are these conclusions on the system that

0:16:34 | you use

0:16:37 | do you think they would be the same with a different system,

0:16:41 | or

0:16:44 | well i would say

0:16:47 | that it is quite affected, especially

0:16:52 | i believe that when you make

0:16:54 | one decision

0:16:56 | at the beginning of the clustering process

0:16:59 | you will

0:17:01 | carry that decision until the end of the process

0:17:05 | so |

0:17:06 | i think the reason behind this conclusion is found in that fact

0:17:14 | for example

0:17:16 | we can think why

0:17:18 | we have

0:17:20 | shown different results when we have different sizes of the task

0:17:25 | and it is because, as the size of the task increases,

0:17:29 | errors that are made at the beginning of the clustering process

0:17:32 | are spread over the rest of the clustering tree

0:17:38 | and

0:17:38 | this

0:17:39 | is

0:17:39 | more harmful the

0:17:42 | more iterations

0:17:44 | we have, so when the task is smaller there are

0:17:50 | fewer iterations and there will be less damage

0:17:54 | and also, for example,

0:17:57 | in the tasks in which we analyzed

0:18:04 | the number of speakers we saw that

0:18:09 | the results were worse when the solution was at the middle of the

0:18:17 | clustering tree

0:18:19 | and

0:18:20 | if the solution was found

0:18:23 | at the beginning of the tree or at the end of the tree we got

0:18:28 | a

0:18:29 | better result

0:18:30 | that is also because

0:18:32 | at

0:18:33 | the beginning there are

0:18:38 | fewer possible

0:18:39 | partitions

0:18:40 | and

0:18:41 | in the middle we have more, but as

0:18:45 | we cannot access all of them

0:18:48 | because of the decisions that we have previously made,

0:18:52 | the optimal one

0:18:53 | may not be available, and

0:18:55 | that

0:18:56 | doesn't happen at the beginning,

0:18:59 | where we just have more

0:19:03 | possible options

0:19:06 | that is of course

0:19:09 | due to the agglomerative clustering algorithm we are using

0:19:12 | so |

0:19:13 | i'd say

0:19:15 | yes, i think

0:19:18 | the clustering

0:19:20 | conclusions

0:19:25 | are very influenced by the algorithm you use

0:19:31 | so

0:19:32 | for example,

0:19:33 | although this is not among the experiments we have

0:19:38 | presented

0:19:39 | here,

0:19:41 | if we change the

0:19:44 | linkage method and we use for example the

0:19:49 | average |

0:19:50 | score |

0:19:51 | we saw that the differences

0:19:54 | due to the size of the task, where we had

0:19:58 | different results, change, because if we use

0:20:03 | the average score instead of the maximum score all the results that we obtain

0:20:09 | are similar, so

0:20:10 | that was an example that if we change the clustering algorithm we may have

0:20:17 | different conclusions

0:20:23 | so most of the conclusions seem rather fundamental to this element; definitely

0:20:28 | clustering is a method for testing a particular scoring, and you gained some insight, so

0:20:35 | what is your insight on the limits, or

0:20:40 | what would you say affects these conclusions the most

0:20:45 | i think it is quite affected by

0:20:50 | the

0:20:52 | clustering algorithm we are using

0:20:57 | thanks |

0:21:03 | sorry |

0:21:05 | (unintelligible)

0:21:11 | (unintelligible)

0:21:12 | one question about the database that you used: you mentioned that you are using

0:21:18 | only cuts of three hundred seconds

0:21:23 | of audio

0:21:25 | okay, there is duration variability inside, and so on;

0:21:29 | did you study the effect of this duration on

0:21:33 | all the conclusions that you drew

0:21:36 | yes, we also ran some experiments in which we tested

0:21:46 | different durations

0:21:48 | and the results changed with the duration

0:21:53 | and the conclusions keep similar, but we have

0:21:56 | higher

0:21:57 | impurity levels with shorter durations

0:22:02 | over all our

0:22:05 | experiments,

0:22:06 | and as the duration got higher the differences between the different databases did not show so much