0:00:06uh so hi everyone um the most common
0:00:10and that's and thank god university of uh
0:00:12where is
0:00:13and i'd like to talk about a new approach
0:00:14to speaker clustering
0:00:17based mainly on some concepts
0:00:21say uh
0:00:23individual ornaments analysis
0:00:25uh so
0:00:26that's what i'm trying to do with this uh transition somehow
0:00:30so um the outline
0:00:44we should make
0:00:47so uh i'm gonna talk a little bit about nonparametric density estimation
0:00:50and then i'm going to present based on this the baseline mean shift
0:00:54and then explain why it requires adaptation
0:00:57uh in order to be compatible with our problem
0:01:00uh and i'm gonna show in a bayesian setting
0:01:03i'm gonna show that mean shift
0:01:04is nothing more than a way to seek
0:01:06the modes of the posterior
0:01:08uh this includes
0:01:10to define
0:01:11the divergences
0:01:12uh the proposed kernel
0:01:14and then i'll talk a little bit about exponential families in order to show that
0:01:18at least for this family of distributions there is
0:01:22i think no heuristic
0:01:23no heuristic involved
0:01:26so um
0:01:29so we have this
0:01:30well basically it is a nonparametric approach to clustering
0:01:34uh the number of clusters is not required to be known a priori
0:01:37which means that it fits well to the problem
0:01:40it's awful mimosa units
0:01:42should be considered rather as an alternative to hierarchical clustering
0:01:46all these approaches
0:01:47other kinds of applications are
0:01:50basically image segmentation
0:01:51stuff like that
0:01:53but also object tracking
0:01:56and my reference is
0:01:58the seminal paper of uh comaniciu and meer
0:02:02uh i recently checked it
0:02:04something like
0:02:05citations are something like
0:02:06three thousand
0:02:07so it's
0:02:08seminal indeed
0:02:10uh some examples from the paper you have an image you want to segment it
0:02:15based on the colours
0:02:16okay and you have this
0:02:18and at one parameter that something so you never mind
0:02:22your target is to define these clusters and you see that
0:02:26hi debbie dorsey
0:02:28that's the reason why we go nonparametric
0:02:31if you wanna
0:02:31fit a parametric model
0:02:35it would be a disaster
0:02:36see this
0:02:38something like this you see the modes
0:02:40there are seven modes
0:02:42if you want to have a gaussian distribution then be considered is that it
0:02:46with not with a if you prevail probably doing
0:02:50so another example of the new position how the original about here
0:02:55uh this is a very the call or a
0:02:59bandwidth salmon explained and this also is
0:03:02this possible
0:03:04how smooth you wanna be and
0:03:06but this way you
0:03:07have different levels of smoothing
0:03:09okay this is another example uh again
0:03:14very similar approaches
0:03:15you want to extract the boundaries
0:03:17so what you do is a colour segmentation first and then you have to see what
0:03:21there is about the same here
0:03:25the limitations now
0:03:26in order to uh adapt it
0:03:29what like
0:03:30directly to our problem
0:03:32well it acts on the space of observations
0:03:34so there is
0:03:35no parameterisation
0:03:37okay whereas uh there are several uh clustering tasks
0:03:41like the one we have here
0:03:43where the natural entities
0:03:45we have can only be described using
0:03:48parametric models
0:03:52can we adapt it
0:03:53in order to be applicable for such problems
0:03:55they're not they're missing the problem so
0:03:57so i have some photographs
0:04:00with different analysis
0:04:01and we want to cluster
0:04:04the same problem
0:04:06you wanna have each described say by um a normal distribution
0:04:09each of the four
0:04:11and you wanna
0:04:12you want to apply the mean shift algorithm in order to cluster them
0:04:17you wanted to do this model the observation space with
0:04:20there's not euclidean geometry which is in euclidean geometry let's say
0:04:25a lot of space
0:04:27the proposed method is constrained to the exponential family uses
0:04:30a bayesian framework
0:04:32and uses some concepts
0:04:33of information geometry
0:04:40standard is
0:04:41nonparametric
0:04:43you have some data say X
0:04:44this X matrix
0:04:46and using a parzen window in order
0:04:49the empirical distribution you convolve it with a kernel
0:04:53let's say see possibly still have is rather close to mitigate no
0:04:57say suppose it's a gaussian
0:04:59the only parameter here is
0:05:00H the bandwidth
0:05:04something okay
0:05:06department impersonal parameter recovered that stuff before
0:05:10what nonparametric basically means is that you let your parameters grow linearly with your data
0:05:15it doesn't mean that you don't have parameters actually
0:05:17but your parameters are actually the data themselves
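The Parzen-window idea described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `gaussian_kde`, the one-dimensional toy data and the bandwidth values are hypothetical and do not come from the talk. The point it shows is that the only tuning knob is the bandwidth, and that the "parameters" of the estimate are the samples themselves.

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Parzen-window density estimate at points x from 1-D samples `data`.

    Every sample contributes one Gaussian bump of width h (the bandwidth).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Scaled distance between every query point and every sample.
    u = (x[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))
    # Average of the bumps = empirical distribution convolved with the kernel.
    return bumps.mean(axis=1)

# Toy data (an assumption for illustration): two well separated groups.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

grid = np.linspace(-6.0, 8.0, 1401)
for h in (0.1, 0.5, 2.0):
    density = gaussian_kde(grid, samples, h)
    # A small h gives a spiky estimate, a large h a heavily smoothed one.
    print(f"h={h}: peak density {density.max():.3f}")
```

Changing `h` is the "how smooth you wanna be" choice mentioned earlier for the image examples.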
0:05:20last someone with some say stuff like that
0:05:23okay and the buttons can be bible too
0:05:23um the basic problems are mainly that you don't have enough data to estimate it
0:05:32okay and as you have more and more
0:05:35dimensions
0:05:37um you require more and more data
0:05:39the curse of dimensionality and all that stuff
0:05:44the point is
0:05:45do we actually need
0:05:46to to
0:05:47to estimate it robustly
0:05:49for each problem
0:05:51these on the like yeah
0:05:52and the answer is no for example
0:05:54clustering
0:05:56consider before the before
0:05:58what we only require is
0:06:00a method
0:06:02to seek the modes
0:06:03say they have said
0:06:04we haven't we should have another but not the the become the mode
0:06:08and a method
0:06:09to assign each observation
0:06:11to the appropriate mode
0:06:12whatever this means
0:06:14should we be required to estimate
0:06:16robustly the density
0:06:17not really
0:06:18if we can easily bypass
0:06:21this procedure
0:06:22and that's
0:06:23what mean shift does
0:06:24so recall the expression
0:06:27to find the modes
0:06:28simply differentiate
0:06:29with respect to X
0:06:30and set this to zero
0:06:34after some algebra because this is a squared distance you have this you know
0:06:40define
0:06:42the derivative of the kernel profile
0:06:44with this G
0:06:45and you have this simple form
0:06:47apart from the constant you have this and this this is
0:06:50you can interpret it as
0:06:52uh say the estimated pdf using
0:06:55the differential profile
0:06:57and the other is the mean shift vector
0:06:58the main the main result
0:07:02let's say is is that
0:07:03it reminded
0:07:05okay when
0:07:07you have this
0:07:08uh say weighted average
0:07:10of all the pixels
0:07:12with respect to the differential kernel
0:07:16and all you need to do is
0:07:18find a way to bypass this
0:07:21this is one of the main means that you are you know more
0:07:25so you don't actually estimate the density
0:07:27you simply do this
0:07:28and that implies the other
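The slide formulas are not in the transcript. The derivation the speaker refers to can be written out as follows, using the standard notation of the mean shift literature as an assumption: k is the kernel profile, g = -k' its negative derivative, h the bandwidth, d the dimension, c a normalising constant, and g_i is shorthand for g evaluated at the i-th scaled squared distance.

```latex
\hat f(x) = \frac{c}{n h^{d}} \sum_{i=1}^{n} k\!\left(\left\lVert \frac{x - x_i}{h} \right\rVert^{2}\right),
\qquad g(u) = -k'(u),
\qquad g_i = g\!\left(\left\lVert \frac{x - x_i}{h} \right\rVert^{2}\right)

\nabla \hat f(x)
  = \frac{2c}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x)\, g_i
  = \underbrace{\frac{2c}{n h^{d+2}} \Bigl[\sum_{i=1}^{n} g_i\Bigr]}_{\text{density estimate with profile } g \text{ (up to a constant)}}
    \;\underbrace{\Bigl[\frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i} - x\Bigr]}_{\text{mean shift vector}}

\nabla \hat f(x) = 0
\;\Longleftrightarrow\;
x = \frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i}
```

The first factor is strictly positive for a Gaussian profile, so the modes are exactly the points where the mean shift vector vanishes, which is why the density itself never has to be evaluated.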
0:07:30so that would look like this
0:07:32very intuitive
0:07:34this is the mean shift vector
0:07:36so for each of the observations
0:07:38and the order does not matter at all
0:07:40begin with X i let's say X zero
0:07:44calculate the mean shift vector
0:07:46like this
0:07:49and simply
0:07:50move to
0:07:50that new position
0:07:52until convergence it they are proven over this is very
0:07:56trivial insulting the common need to set up a
0:07:58so that was it
0:07:59do this iteration for all
0:08:01the observations
0:08:03okay and store for each observation
0:08:06the convergence point
0:08:11those that are on the same mode
0:08:13they belong to the same cluster
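The procedure just described (start from each observation, move to the kernel-weighted average, repeat until convergence, then group observations by convergence point) can be sketched as below. This is a minimal illustration with a Gaussian kernel, not the speaker's implementation; the function names, the tolerances and the toy data are assumptions.

```python
import numpy as np

def mean_shift(points, h, tol=1e-6, max_iter=500):
    """Run a Gaussian-kernel mean shift from every point; return the convergence points."""
    points = np.asarray(points, dtype=float)
    modes = np.empty_like(points)
    for i, x in enumerate(points):
        for _ in range(max_iter):
            # Kernel weight of every observation as seen from the current position.
            w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / h**2)
            x_new = w @ points / w.sum()      # weighted average of the data
            shift = np.linalg.norm(x_new - x)  # length of the mean shift vector
            x = x_new
            if shift < tol:
                break
        modes[i] = x
    return modes

def group_modes(modes, merge_tol):
    """Give the same label to points whose convergence points (nearly) coincide."""
    labels = -np.ones(len(modes), dtype=int)
    centres = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centres):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = j
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)

# Toy data (an assumption): two small 3x3 lattices of points, far apart.
offsets = np.array([[dx, dy] for dx in (-0.4, 0.0, 0.4) for dy in (-0.4, 0.0, 0.4)])
data = np.vstack([offsets, offsets + 5.0])

labels, centres = group_modes(mean_shift(data, h=1.0), merge_tol=1e-3)
print(labels)
print(centres)
```

Note that the number of clusters is never passed in; it falls out of how many distinct convergence points there are for the chosen bandwidth.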
0:08:15does it was that
0:08:16if you wanna
0:08:18suppose here is an observation
0:08:20okay the initial position
0:08:21you see the trajectory
0:08:23those are
0:08:25for the red the red apple
0:08:28thus it was that that's um is it ugly
0:08:30the main idea
0:08:31in observation space
0:08:33and you can find clusters of arbitrary shapes
0:08:38see we're clustering
0:08:41what are made with a gun
0:08:44make this last week
0:08:45be one i mean
0:08:46it's very very good
0:08:55so how can we adapt this idea
0:08:57to be applicable to the space of distributions
0:09:00suppose they could be last on the same have
0:09:02and utterances
0:09:07now forget about speaker clustering let's take more generally a family of distributions
0:09:11parameterised by theta
0:09:13we should define a kernel
0:09:15that means a shape and a distance
0:09:18okay and the pdf can be regarded
0:09:20as a posterior in which sense
0:09:22in the sense that you consider
0:09:24the density of theta
0:09:27given your observations
0:09:28and this is it i would
0:09:29what determines
0:09:31or simply the cluster indicators
0:09:34your initial segmentation
0:09:35so if you consider a
0:09:37speaker clustering task or diarization
0:09:39you apply first
0:09:41a segmentation it might be uniform it might be
0:09:44speakers said
0:09:45based on speaker change detector
0:09:46and this is it
0:09:49so in this sense it's a posterior
0:09:52and here is an example
0:09:54suppose we have six
0:09:55of these segments
0:10:00the common to get the same when this is over all posterior
0:10:02so if one applies the same idea and begins with this one here
0:10:06we will see that this one
0:10:08would be attracted
0:10:09only by itself
0:10:10so we would create
0:10:11a singleton so
0:10:13okay the same haven't so this
0:10:16the other three
0:10:17we're gonna have that would be i don't think that we're gonna have
0:10:22and the other the
0:10:23the the last one again
0:10:25its own class
0:10:27see this again to the remote files but it's it
0:10:30exactly the same idea
0:10:33okay so it's
0:10:35a higher level in the hierarchy that's what
0:10:38a integrated to what if i have the observations
0:10:42and the parameters but now
0:10:43you are in the space of observations
0:10:45and you have a posterior
0:10:47okay and uh
0:10:49in the same way you want to express somehow the uncertainty and smooth the results
0:10:53by using this kernel
0:10:54in the observation domain which might be gaussian
0:10:57in the same way you have to
0:10:58you have to express your uncertainty
0:11:00about uh the estimation
0:11:02and as you see like
0:11:05why we should consider also
0:11:08the the the number
0:11:10the sample size of each cluster well suppose you have the same positions
0:11:14and suppose somehow
0:11:16uh they were
0:11:17all these corresponded to ten times
0:11:19the sample size
0:11:22then probably all all these clusters
0:11:24would be singletons
0:11:26it's saying that would be it would be a single class because we expect
0:11:29that as much as more data right
0:11:32these three
0:11:33we will manage
0:11:34more to what we would have a close
0:11:38okay so
0:11:39there is certainly a dependence on the sample size if it's linear
0:11:42it's not
0:11:43i i
0:11:44it's linear only
0:11:45if the models are correctly specified
0:11:48and in speaker diarization and especially when using
0:11:51single gaussian models
0:11:52there is a huge misspecification
0:11:56you can consider this
0:11:58and that's a problem
0:12:00um so
0:12:01let's define the kernel
0:12:05let's see this delegation this some months
0:12:07but uh probably don't have time line
0:12:10consider this parameterised by delta
0:12:13family of divergences let's assume so
0:12:15the for me to scale endeavours
0:12:18and the others
0:12:19but you have living their argument
0:12:21yeah that was the liquid
0:12:23you okay
0:12:24uh they wanted to was what the hell at least
0:12:28and it is an asymmetric distance
0:12:30but you can also symmetrise all these kinds of divergences by summing them
0:12:33or by taking their harmonic mean
0:12:36we use this approach
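As an illustration of the two symmetrisations mentioned here, the sketch below computes the Kullback-Leibler divergence between two full-covariance Gaussians in both directions and combines them by summing and by a harmonic mean. The function names are hypothetical, and the exact scaling of the harmonic mean used in the talk is not recoverable from the transcript; the usual definition 2ab/(a+b) is assumed.

```python
import numpy as np

def kl_gauss(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for full-covariance Gaussians."""
    d = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d + logdet1 - logdet0)

def symmetrised(m0, S0, m1, S1):
    """Two symmetric distances built from the two directed divergences."""
    a = kl_gauss(m0, S0, m1, S1)
    b = kl_gauss(m1, S1, m0, S0)
    # Harmonic mean assumed to be 2ab/(a+b); guard against identical models.
    harmonic = 2.0 * a * b / (a + b) if a + b > 0 else 0.0
    return {"sum": a + b, "harmonic": harmonic}

# Toy models (an assumption): same mean, different covariance.
m = np.zeros(2)
print(kl_gauss(m, np.eye(2), m, 4.0 * np.eye(2)))   # one direction
print(kl_gauss(m, 4.0 * np.eye(2), m, np.eye(2)))   # the other direction differs
print(symmetrised(m, np.eye(2), m, 4.0 * np.eye(2)))
```

The harmonic mean is dominated by the smaller of the two directed divergences, so it is less sensitive than the sum when one direction blows up.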
0:12:37however many dressing up was um
0:12:39can based on this highly reduced
0:12:41okay and recall that
0:12:43this will happen no matter what the delta is
0:12:46which is a
0:12:47this is the fisher information matrix which is simply
0:12:50the riemannian metric tensor
0:12:52if you consider information geometry
0:12:57so having defined as the shape
0:12:59now consider a some infinite distances let's define the shape
0:13:03we had a gaussian
0:13:04in the observation space
0:13:05should it be gaussian any longer
0:13:07it's another parameterisation now by myself
0:13:10okay if you if you consider only out of that
0:13:13are equal to one
0:13:15then you have an exponential
0:13:16okay and this should be considered
0:13:18as a radon nikodym
0:13:20derivative
0:13:22a density with respect to the information element this information element instead of the lebesgue measure if you want
0:13:28but anyway
0:13:29um well
0:13:30if you don't like
0:13:31uh though i i i
0:13:33simply consider a close one
0:13:35you have a lot of time for mobile phone
0:13:38and isaacs play a video propose again explain you collected yesterday
0:13:41these are varied
0:13:42only two
0:13:43the T distribution
0:13:45if you wanna
0:13:47us a heavy tail
0:13:50okay so it's a very nice interpretation of that
0:13:52the looks like so
0:13:54i'm not gonna
0:13:55analyse all this stuff
0:13:57but all this
0:13:58and we can see that by minimising the cost function
0:14:01okay here
0:14:03this down
0:14:05correspond to how much does you are off in the you are about to measurement
0:14:10it should be somehow linear linear
0:14:12with the sample size
0:14:14whereas this tells you
0:14:16how close
0:14:17you wanna be
0:14:18to a noninformative prior the jeffreys prior
0:14:21which is simply the flat prior if you consider
0:14:24uh you can enjoyment
0:14:26i think does that so its minimisation of cost function
0:14:29and here is
0:14:29just a single the average
0:14:31'kay on them
0:14:31yeah that
0:14:32then the deviation
0:14:35that's great and
0:14:36all this
0:14:37uh right of the conjugate priors all this stuff
0:14:40and we consider our as um
0:14:43parameter parameter additional this family
0:14:46but it's often end up
0:14:50let's go back to our problem having defined all this stuff
0:14:53so here's the posterior
0:14:55we have K segments
0:14:58this is simply the difference but i don't
0:15:00you can be outside
0:15:02and you have these
0:15:02suppose it
0:15:03i would like also one let's consider only the stuff
0:15:06so this should be normally something like away
0:15:09N K
0:15:10are simply the number of examples of the k-th uh segment
0:15:15and then a role
0:15:16these are just like a weight so you may consider this
0:15:19as a as a mixture of uh distributions
0:15:22okay so the point is now
0:15:25recall that we differentiated with respect to X and because of
0:15:29uh using the squared distances
0:15:31we came up with this formula
0:15:32so the question is
0:15:34do we use heuristics by differentiating this stuff
0:15:39or or or not
0:15:41uh and and the point is that
0:15:43if you constrain yourself
0:15:45to the exponential family
0:15:47then there is no heuristic for example in the in the
0:15:52say for the normal distribution
0:15:54you have these uh natural parameters that correspond to this parameterisation
0:15:59the sufficient statistics that
0:16:00interact naturally with the the
0:16:03the natural parameters
0:16:04and the measurements constant
0:16:06and um but i'm not gonna get into that because
0:16:09the wine tasting wait
0:16:15there is another couple
0:16:19or a parameter space
0:16:20and it is the expectation parameters
0:16:22closer to what
0:16:24we usually
0:16:25uh treat
0:16:27in this
0:16:27in in the speaker clustering
0:16:30by exploiting
0:16:31the convexity
0:16:33of the log partition with respect
0:16:34to theta
0:16:35you end up with the expectation parameters
0:16:39okay and
0:16:41by differentiating this stuff again
0:16:43with respect to theta
0:16:45you get
0:16:46the fisher information
0:16:48matrix okay
0:16:50all this stuff
0:16:51from classical statistics is well known
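The relations just stated can be checked numerically on the simplest case. The sketch below uses a univariate normal distribution with sufficient statistics (x, x^2); this concrete choice, the function names and the finite-difference check are assumptions added for illustration. It shows that the first derivative of the log-partition gives the expectation parameters and the second derivative gives the Fisher information matrix.

```python
import numpy as np

def natural_params(mu, var):
    """Natural parameters of N(mu, var) for sufficient statistics (x, x^2)."""
    return np.array([mu / var, -0.5 / var])

def log_partition(theta):
    """Log-partition function (the constant 0.5*log(2*pi) is left in the base measure)."""
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) - 0.5 * np.log(-2.0 * t2)

def expectation_params(theta):
    """Gradient of the log-partition: (E[x], E[x^2])."""
    t1, t2 = theta
    return np.array([-t1 / (2.0 * t2), t1**2 / (4.0 * t2**2) - 1.0 / (2.0 * t2)])

def fisher_information(theta):
    """Hessian of the log-partition, i.e. the covariance of (x, x^2)."""
    t1, t2 = theta
    off = t1 / (2.0 * t2**2)
    return np.array([[-1.0 / (2.0 * t2), off],
                     [off, -t1**2 / (2.0 * t2**3) + 1.0 / (2.0 * t2**2)]])

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, used only to check the closed forms."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

theta = natural_params(mu=1.5, var=2.0)
print("closed-form expectation parameters:", expectation_params(theta))
print("numerical gradient of log-partition:", numerical_gradient(log_partition, theta))
print("fisher information:\n", fisher_information(theta))
```

The Hessian is a covariance matrix and therefore positive definite, which is the convexity of the log-partition that the speaker exploits.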
0:16:55these are the two potential functions
0:16:58which show most clearly why this
0:17:02this pay some couple expectation parameters
0:17:04and the theta parameters
0:17:06so if you want to do what you wanna do what you want to read
0:17:10in the same way you average between points in the
0:17:13euclidean domain
0:17:15you may consider also
0:17:17expectation parameterisation
0:17:19the expectation parameters
0:17:21or the natural parameters
0:17:22and it depends on you
0:17:23what you'd like to do
0:17:25okay which geometry you want to
0:17:27do you probably have
0:17:29which more appropriate
0:17:33they could but lyman let's let's consider only the K and i went to the two extremes
0:17:38the one and zero
0:17:40they'll take one or zero
0:17:41you have these
0:17:42that's that's these are all mad
0:17:45if you differentiate you obtain some cleaner
0:17:47what to do this
0:17:48but it also this
0:17:49there is a final thing i'd like to mention here that's not mentioned in my paper
0:17:54you expect that
0:17:55by differentiating this and
0:17:57and they are like rock and roll
0:18:00you have a parameterisation with theta
0:18:04since you don't use the natural gradient
0:18:07okay that is you don't consider the curvature
0:18:10but this space is curved
0:18:12you differentiate with respect to theta and you switch parameterisation
0:18:17which is somehow weird
0:18:18but if you use the natural gradient
0:18:21which is simply defined as this
0:18:23with this tilde
0:18:26now if you differentiate with respect to theta
0:18:29you remain in the same parameterisation
0:18:31okay there's no switching
0:18:33between these parameterisations
0:18:34so you know recall that in order to to make it a
0:18:38to define a gradient
0:18:40you should
0:18:40take into account the curvature of the space
0:18:42so you should
0:18:43work with the natural gradient
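The "switching" the speaker describes can be seen on a toy case. For an exponential family, the ordinary gradient of KL(p_theta0 || p_theta) with respect to theta equals the difference of expectation parameters, so it lives in the other coordinate system. Multiplying by the inverse Fisher information (the natural gradient) brings it back to theta coordinates, where to first order it equals theta - theta0. The sketch below checks this for a univariate Gaussian; the specific numbers and function names are assumptions for illustration.

```python
import numpy as np

def expectation_params(theta):
    """(E[x], E[x^2]) of a univariate Gaussian: the gradient of its log-partition."""
    t1, t2 = theta
    return np.array([-t1 / (2.0 * t2), t1**2 / (4.0 * t2**2) - 1.0 / (2.0 * t2)])

def fisher(theta):
    """Fisher information in natural coordinates (Hessian of the log-partition)."""
    t1, t2 = theta
    off = t1 / (2.0 * t2**2)
    return np.array([[-1.0 / (2.0 * t2), off],
                     [off, -t1**2 / (2.0 * t2**3) + 1.0 / (2.0 * t2**2)]])

theta0 = np.array([0.0, -0.5])             # N(0, 1) in natural parameters
theta = theta0 + np.array([0.01, 0.005])   # a nearby model

# Ordinary gradient of KL(p_theta0 || p_theta) with respect to theta:
# it comes out as a difference of expectation parameters.
plain_grad = expectation_params(theta) - expectation_params(theta0)

# Natural gradient: rescale by the inverse Fisher information.
natural_grad = np.linalg.solve(fisher(theta), plain_grad)

print("theta - theta0  :", theta - theta0)
print("plain gradient  :", plain_grad)
print("natural gradient:", natural_grad)
```

For two nearby models the natural gradient agrees with the displacement in theta up to second-order terms, while the plain gradient does not.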
0:18:48the final
0:18:50'cause i did what you put in like a month in the model this model called
0:18:54recalling the same again
0:18:57start from a from the first segment or the last the order does not matter
0:19:02but how it doesn't matter
0:19:04and your next estimate will be an average
0:19:07oh for
0:19:09of all the other segments
0:19:11in this parameterisation
0:19:12or in this parameterisation the theta
0:19:15they're not around
0:19:16it depends on you
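A rough sketch of this iteration over segment models is given below. It is not the speaker's exact algorithm: the slide formulas are not in the transcript, so several things are assumed and should be read as hypothetical, namely univariate Gaussian segment models, a kernel weight of the form sample size times exp(-lam * summed KL divergence), and averaging in the expectation parameters (E[x], E[x^2]).

```python
import numpy as np

def sym_kl(m0, v0, m1, v1):
    """Summed (symmetrised) KL divergence between univariate Gaussians."""
    kl01 = 0.5 * (np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
    kl10 = 0.5 * (np.log(v0 / v1) + (v1 + (m0 - m1) ** 2) / v0 - 1.0)
    return kl01 + kl10

def segment_mean_shift(means, variances, counts, lam=1.0, tol=1e-10, max_iter=1000):
    """Mean shift over segment models; returns each segment's convergence point."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Expectation parameters (E[x], E[x^2]) of every segment model.
    eta = np.stack([means, means**2 + variances], axis=1)
    modes = np.empty_like(eta)
    for k in range(len(means)):
        cur = eta[k].copy()
        for _ in range(max_iter):
            m, v = cur[0], cur[1] - cur[0] ** 2
            # Hypothetical kernel: sample size times exp(-lam * divergence).
            w = counts * np.exp(-lam * sym_kl(m, v, means, variances))
            new = w @ eta / w.sum()          # weighted average of all segments
            done = np.max(np.abs(new - cur)) < tol
            cur = new
            if done:
                break
        modes[k] = cur
    return modes

# Toy segments (an assumption): three near 0 and two near 5, unit variance.
modes = segment_mean_shift(means=[0.0, 0.1, -0.1, 5.0, 5.1],
                           variances=[1.0, 1.0, 1.0, 1.0, 1.0],
                           counts=[100, 120, 80, 90, 110])
print(modes[:, 0])   # converged means, one per starting segment
```

Segments whose convergence points coincide would be assigned to the same speaker cluster. Averaging the natural parameters instead of `eta` is the other option the speaker mentions.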
0:19:21suppose we have some segments
0:19:23it's all segments
0:19:25and the method finds sixteen clusters
0:19:27it's blue uh don't
0:19:30actually the mean the mean value
0:19:34okay the mean value of these say the these say to mfcc
0:19:39okay and see how the man
0:19:41you know sixteen plus you also find
0:19:43some single class
0:19:45that simply that right
0:19:47by the bottom so
0:19:52something final before going to the experiments
0:19:56um uh
0:19:58all this i wouldn't feel about the U two specification is no
0:20:02uh you should somehow bias
0:20:04the results
0:20:05um towards continuity because you know
0:20:09that a dialogue doesn't have
0:20:12say utterances
0:20:13of say three or four a
0:20:17you may see this as applying a
0:20:21dirichlet prior
0:20:22over a transition matrix
0:20:24to enforce somehow continuity
0:20:27what we do
0:20:28we multiply
0:20:30by this uh C distribution
0:20:34okay so if you are
0:20:36supporting the in the cave
0:20:39you want to emphasise your neighbours
0:20:42but not in the way that a gaussian does it
0:20:45okay not in such a way because if you do it the gaussian way
0:20:49the first and the last segment would never be from the same cluster
0:20:55you want it more mild and everything
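The contrast between a Gaussian and a milder temporal weighting can be illustrated as follows. The transcript does not make clear which distribution the speaker multiplies by, so the heavy-tailed weight below (a Student-t shape, which is the Cauchy shape when `dof=1`) is an assumption, as are the function names, the scale and the lags.

```python
import numpy as np

def gaussian_weight(lag, scale):
    """Unnormalised Gaussian weight as a function of temporal distance."""
    return np.exp(-0.5 * (lag / scale) ** 2)

def heavy_tailed_weight(lag, scale, dof=1.0):
    """Unnormalised Student-t shaped weight; dof=1 gives the Cauchy shape."""
    return (1.0 + (lag / scale) ** 2 / dof) ** (-(dof + 1.0) / 2.0)

lags = np.array([0.0, 1.0, 5.0, 20.0, 100.0])   # distance in segments
for lag in lags:
    print(lag, gaussian_weight(lag, 5.0), heavy_tailed_weight(lag, 5.0))
```

Both weights favour neighbouring segments, but the Gaussian weight is effectively zero for distant segments, which would prevent the first and the last segment from ever sharing a cluster, whereas the heavy-tailed weight stays small but usable.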
0:20:58yeah i guess
0:20:59i don't the one week
0:21:02is there a database
0:21:07oh well
0:21:08you know this is a very database broke
0:21:12okay and we compare it to the standard uh bic hierarchical approach
0:21:16you won't find any super duper wow results
0:21:21the best
0:21:22a configuration we find with local because this
0:21:25we fixed the lambda parameter using the development set and then we have this
0:21:29result on the test set
0:21:31and M S was put them in C
0:21:34using only a single gaussian with full covariance matrix
0:21:37no gmms at all
0:21:38and we see that the harmonic mean was the best
0:21:42and the others
0:21:43rather close enough though
0:21:46if you use
0:21:47and if you're a clustering
0:21:48seem to care that it is
0:21:50would be a tragedy
0:21:53this is because of the bic
0:21:54right you
0:21:55but you cannot use bic you know
0:21:58and that's a problem
0:21:59i would use this stuff
0:22:02to finish
0:22:03but also
0:22:05an adaptation of an algorithm that acts on the space of observations to the space of distributions
0:22:10okay we use
0:22:12some reasonable i think uh bayesian arguments in order to
0:22:16to make this transition
0:22:19okay and i showed that at least for the exponential families
0:22:22no heuristic is involved
0:22:25uh well these are relevant to be honest um
0:22:27you want to obtain a point estimate about your hidden variables
0:22:31build stuff document clustering but
0:22:34or if you wanna do diarization
0:22:37consider all real time lapse
0:22:39but if you wanna do inference you don't you
0:22:42you don't use hierarchical clustering either
0:22:44okay you use dirichlet processes
0:22:47you use um M C M C
0:22:50you use variational bayes
0:22:53one final
0:22:55certainly you can if you consider the complete data they don't if you consider
0:22:59they as a like the complete data likelihood
0:23:02then the gmms we don't wanna see
0:23:05but you you need to have a correspondence or you need to start
0:23:08with the ubm with a common ubm
0:23:10why because
0:23:11if you consider the the complete data likelihood
0:23:14you only
0:23:15consider
0:23:16uh kl divergences between
0:23:20uh gaussians
0:23:24for them different weights if you
0:23:26trained if you also allow the trains
0:23:29the the weights to be to be
0:23:31also i vectors
0:23:32i'm sure you can use it for
0:23:35even the original message
0:23:37if you use
0:23:38i i picked those
0:23:40note that you have
0:23:41um different segments
0:23:45the variability of the estimate
0:23:46should also be part of the problem
0:23:49so somehow
0:23:52play with the bandwidth allow the bandwidth to be
0:23:54depending on the sample size
0:23:56okay to to to encode this
0:23:58uncertainty
0:24:00in this estimate
0:24:03thank you
0:24:10thanks and sorry for
0:24:11chris you wanted to keep some
0:24:14for some question
0:24:16you should comment
0:24:27just simple
0:24:29it's on
0:24:30it's one
0:24:31you are optimising
0:24:33you time for them
0:24:37doing gradient descent
0:24:39you can
0:24:40you have to compute the gradient
0:24:43do i understand correctly that you're not able to compute the
0:24:48so you during the optimisation
0:24:51i don't evaluate don't it actually you can this is the
0:24:55the house analytical that
0:24:56but is not correct
0:24:58it's based on that
0:24:59you know it's uh
0:25:01it's a simple estimate
0:25:03what what what
0:25:04the idea behind mean shift is
0:25:06that you don't actually need
0:25:07to estimate the overall pdf you can bypass the problem by using the gradient
0:25:12that's the idea
0:25:13is the
0:25:14just a
0:25:15practical thing because my favourite optimisation
0:25:20right objective function
0:25:23if we go
0:25:26of the wine
0:25:31well we should come in
0:25:41general questions about the mean shift
0:25:45if you start off
0:25:46start out by hypothesising time clusters
0:25:50you will always got time clusters is that correct
0:25:53no you don't no or little number one stress come you
0:25:58i was just thinking proper
0:26:00the number of clusters is completely inferred from the from the algorithm
0:26:04you don't you don't uh
0:26:07predefine the number of clusters
0:26:10it depends only on whether the points converge
0:26:13okay to the to the same value
0:26:16you don't need to
0:26:19you said that um
0:26:22not having not being able to incorporate
0:26:24a course
0:26:25first of all you cannot have bic because
0:26:27bic implies
0:26:29a marginalisation with respect to the parameters
0:26:33okay but it isn't it it's a transfers dropout
0:26:37naturally hmmm it's
0:26:41not sure
0:26:43you know you can hold
0:26:45find the correct number of clusters
0:26:47without using bic
0:26:51sure sure
0:26:52i'm not using it because i forgot the comparison
0:26:54probably the correctly last