Hi everyone. I'd like to talk about the mean shift algorithm applied to speaker clustering, based mainly on some concepts from information geometry. Here is the outline. I'm going to talk a little about non-parametric density estimation, then present the baseline mean shift algorithm and explain why it requires adaptation in order to be applicable to our problem. I'm going to cast it in a Bayesian setting and show that mean shift is nothing more than a way to seek the modes of the posterior. This includes defining the divergences and the proposed kernel, and then I'll say a little about exponential families, to show that at least for this family of distributions there is no heuristic involved.

So, mean shift is a non-parametric approach to clustering. The number of clusters is not required to be known a priori, which means it fits our problem well, since the number of speakers should be considered unknown; it can also be seen as a form of hierarchical clustering. Its main applications are image segmentation and related vision tasks, but also object tracking, and my main reference is the seminal paper by Comaniciu and Meer, which I recently checked has something like three thousand citations. Here are some examples from that paper. You have an image you want to segment based on its colours, and there is a single parameter, the bandwidth. Your target is to find these clusters, which, as you can see, have arbitrary shapes; that is the reason to go non-parametric. If you tried to fit a parametric model it would be a disaster: you can see there are several modes of arbitrary shape here.
Fitting, say, a mixture of Gaussians would force you to fix the number of components in advance, which is exactly what we want to avoid. Another example shows the effect of the bandwidth, the only parameter I mentioned: it controls how smooth you want the estimate to be, and with different values you get different levels of smoothing. Here is another example from a paper with a very similar approach: you want to extract object boundaries, so you do a colour segmentation first, and the idea is much the same.

Now the limitations, and why the method does not apply directly to our problem. Mean shift acts on the space of observations; there is no parameterisation. Whereas there are several clustering tasks, like the one we have here, where the natural entities, the utterances, can only be described using parametric models. So can we adapt it in order to make it applicable to such problems? That is the missing link. Suppose we have some utterances from different speakers and we want to cluster them without supervision, with each utterance described by, say, a normal distribution. If you want to derive a mean-shift-like algorithm in order to cluster them, you have to model a space whose geometry is not Euclidean; it is a curved space. The proposed method assumes an exponential family, uses a Bayesian framework, and adopts some concepts from information geometry.

First, standard non-parametric density estimation. You have some data, this matrix X, and you use a Parzen window in order to smooth the empirical distribution. You convolve it with a kernel, say a Gaussian, and the only parameter here is the bandwidth.
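As a concrete illustration of the Parzen-window idea (a minimal sketch, assuming a univariate Gaussian kernel; the function name and toy data are mine, not from the talk):

```python
import numpy as np

def parzen_kde(x, data, h):
    """Parzen-window estimate at x: the empirical distribution of
    `data` smoothed by convolution with a Gaussian kernel whose
    bandwidth h is the only free parameter."""
    d = (x - np.asarray(data, dtype=float)) / h
    return np.mean(np.exp(-0.5 * d ** 2)) / (h * np.sqrt(2.0 * np.pi))

# Bimodal toy data: the estimate is high near a mode, low in between.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 200),
                          rng.normal(3.0, 0.5, 200)])
at_mode = parzen_kde(-2.0, samples, h=0.3)
between = parzen_kde(0.5, samples, h=0.3)
```

Shrinking h makes the estimate spikier; growing it oversmooths, which is exactly the trade-off shown on the bandwidth slide.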
'Non-parametric' basically means that you let your parameters grow linearly with your data. It does not mean that you have no parameters; your parameters are actually the data themselves, plus the bandwidth. The basic problems are mainly that you do not have enough data to estimate the density properly, and as the dimension grows you require more and more data, the curse of dimensionality and so on. But the point is: do we actually need to estimate the density robustly for problems like these? And the answer is no. For the clustering tasks we considered before, what we need is the modes, and a method that assigns each observation to the appropriate mode, whatever that means. So if we do not require a robust estimate of the density, perhaps we can bypass this procedure entirely, and that is exactly what mean shift does.

Recall the kernel density expression. To find the modes, differentiate with respect to x and set the gradient to zero. Because the kernel is a function of a squared distance, after some algebra, if you define g as the negative derivative of the kernel profile, you obtain a simple form: apart from a constant, one factor can be interpreted as the density estimated with the differential profile g, and the other factor is the mean shift vector, which is the main result. It is the weighted mean of all the points, weighted by the differential kernel evaluated at x, minus x itself, and all you need to do is follow it, because it always points towards a mode. So you do not actually estimate the density; you just follow the mean shift vector, and the procedure that results is very intuitive.
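A toy sketch of this idea in observation space, under my own illustrative choices (Gaussian kernel, hand-picked bandwidth and tolerances): run the iteration from every point and group the points whose trajectories converge to the same mode.

```python
import numpy as np

def mean_shift(points, h=1.0, tol=1e-5, merge_tol=1e-2, max_iter=300):
    """Cluster `points` by the mean shift iteration: from each
    observation, repeatedly move to the kernel-weighted mean of all
    points until convergence; observations reaching the same mode
    share a cluster.  No number of clusters is given a priori."""
    points = np.asarray(points, dtype=float)
    modes, labels = [], []
    for x0 in points:
        x = x0.copy()
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / h ** 2)
            x_new = (w[:, None] * points).sum(axis=0) / w.sum()  # mean shift step
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        for i, m in enumerate(modes):          # merge with a known mode?
            if np.linalg.norm(x - m) < merge_tol:
                labels.append(i)
                break
        else:                                  # new mode found
            modes.append(x)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)

# Two well-separated blobs collapse onto two modes:
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([-3.0, 0.0], 0.3, (50, 2)),
                 rng.normal([3.0, 0.0], 0.3, (50, 2))])
labels, modes = mean_shift(pts, h=0.8)
```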
For each observation, and the order does not matter at all, you start from its position, say x0, calculate the mean shift vector, move to the new position, and iterate until convergence; convergence is proven, and the proof is not particularly difficult. You do this for all the observations and store the convergence point of each one. If two or more observations land on the same mode, they belong to the same cluster. That's all there is to it. Here you can see an observation, its initial position, and its trajectory; the red dots are the modes. That is basically the main idea in observation space, and it recovers clusters of arbitrary shapes; here we are clustering data that are clearly not a mixture of Gaussians, and it does very well.

So how can we make this idea applicable to parametric distributions? Suppose the entities to be clustered are utterances, or, forgetting about speaker clustering for a moment, entities described by distributions parameterised by theta. We should define a kernel, which means choosing a shape and a distance appropriate to this space. The pdf can then be regarded as a posterior, in the sense that you consider the density of the parameters given your observations, and what determines it is simply the cluster indicators of your initial segmentation. If you consider the speaker clustering task, you first apply a segmentation, which might be uniform or might be based on a speaker change detector, and that defines the posterior. Here is an example. Suppose we have six segments, each contributing its own posterior, and this curve is the overall posterior. If one applies the same idea and starts from here, we see that this mode is attracted only by itself, so it would form a singleton cluster.
The other three modes would attract one another, so we would get one cluster from them, and the last one again forms its own cluster. You see, it is exactly the same idea as in the observation space; it is like moving one level up in the hierarchy. What has changed is that before you had observations in the observation space, and now you are in the space of parameters and you have a posterior. In the same way that you expressed uncertainty, and obtained a smoother result, by placing a kernel in the observation domain, which might be Gaussian, you now have to express your uncertainty about the parameter estimates. And you can see why we should also take into account the sample size of each segment. Suppose the posteriors sat at the same positions, but each corresponded to ten times the sample size. Then probably all these classes would become singletons: with that much data we expect the estimates to be reliable, so these three would no longer merge into a single cluster. So there is certainly a dependence on the sample size. Is it linear? It is linear only if the models are correctly specified, and in speaker diarization, especially when using a single Gaussian per segment, there is severe model misspecification, so you cannot consider it linear, and that is a problem.

So let us define the kernel, starting with the distance. Consider this family of divergences, parameterised by delta; I probably do not have time for the derivation, but the parameter scales the divergence and swaps the roles of the two arguments. At the two extremes you recover the two directed Kullback-Leibler divergences, and you can obtain a symmetric divergence either by summing them or by taking their harmonic mean; we experiment with both.
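For univariate Gaussians the two directed KL divergences have closed forms, so both symmetrisations just mentioned are easy to write down (a sketch of my own, parameterised by mean and standard deviation):

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ), closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def sym_kl_sum(p, q):
    """Symmetrisation by summing the two directions (Jeffreys divergence)."""
    return kl_gauss(*p, *q) + kl_gauss(*q, *p)

def sym_kl_harmonic(p, q):
    """Symmetrisation by the harmonic mean of the two directions."""
    a, b = kl_gauss(*p, *q), kl_gauss(*q, *p)
    return 2.0 * a * b / (a + b) if a + b > 0.0 else 0.0

p, q = (0.0, 1.0), (1.0, 1.5)
d_sum = sym_kl_sum(p, q)
d_harm = sym_kl_harmonic(p, q)
```

Both are symmetric in their arguments and vanish only when the two Gaussians coincide.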
However, a nice property is that the choice within this family matters less than it might seem: locally, all these divergences behave the same, no matter the value of delta, because their second-order expansion yields the same metric tensor, which is simply the Fisher information matrix; that is the information-geometric view. Having defined the distance, now consider the shape of the kernel. There is no reason the kernel should remain Gaussian, as it was in the observation space, so here is another family, now parameterised by alpha. If you consider alpha equal to one, you obtain an exponential kernel, and this should be read as a density defined with respect to the information measure, a Radon-Nikodym derivative with respect to that measure rather than the Lebesgue measure. Other values of alpha give you, for example, Student's t-like kernels, if you want to consider heavy-tailed priors. So it has a very nice interpretation as a prior. I am not going to analyse all of this, but one can show that the construction amounts to minimising a cost function, where one term expresses how much you trust your measurements, so it should be roughly linear in the sample size, and the other expresses how close you want to stay to a non-informative prior, basically the Jeffreys prior, which is simply the flat prior under the information metric. The minimiser of this cost function is just a weighted average. All of this also connects with the theory of conjugate priors, with the bandwidth acting as an additional parameter of the family, but I do not have time to go into that.

So let us go back to the problem, having defined all this machinery. Here is the posterior: we have K segments.
Each term is simply a kernel centred at the estimate of the corresponding segment, the n_k are simply the numbers of samples in each segment, and their normalised values act as weights, so you may view the whole posterior as a mixture of distributions. Now, the point is this. Recall that in the observation space we differentiated with respect to x, and because the kernel was a function of squared distances we arrived at that simple mean shift formula. So the question is: do we need heuristics in order to differentiate this posterior, or not? And the answer is that if you constrain yourself to the exponential family, then no heuristics are needed. For example, for the normal distribution you have the natural parameters that correspond to this parameterisation, and the sufficient statistics that interact linearly with the natural parameters inside the exponent, but I am not going to get into the details because time is short. There is another, dual parameterisation of the family, the expectation parameters, which are closer to what we usually handle in speaker clustering. By differentiating the log-partition function with respect to theta you obtain the expectation parameters, and by differentiating once more you obtain the Fisher information matrix; all of this is well known from classical statistics. And there is a pair of potential functions which shows most clearly why these two parameterisations, the expectation parameters and the natural parameters, form a coupled, dual pair. So when you want to average between points, in the same way you average in the Euclidean domain, you may use either the expectation parameterisation or the natural parameterisation.
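To make this duality concrete for the univariate Gaussian (the natural parameterisation and log-partition below are the standard textbook forms, and the helper names are mine): the gradient of the log-partition A(theta) with respect to the natural parameters recovers the expectation parameters (E[x], E[x^2]), which we can check numerically.

```python
import numpy as np

# Natural parameters of N(mu, s2): theta = (mu/s2, -1/(2*s2)),
# with log-partition A(theta) = -theta1^2/(4*theta2) - 0.5*log(-2*theta2).

def log_partition(t1, t2):
    return -t1 ** 2 / (4.0 * t2) - 0.5 * np.log(-2.0 * t2)

mu, s2 = 1.5, 0.7
t1, t2 = mu / s2, -1.0 / (2.0 * s2)

# Central-difference gradient of A: should equal (E[x], E[x^2]).
eps = 1e-6
dA_dt1 = (log_partition(t1 + eps, t2) - log_partition(t1 - eps, t2)) / (2 * eps)
dA_dt2 = (log_partition(t1, t2 + eps) - log_partition(t1, t2 - eps)) / (2 * eps)
```

Differentiating once more would give the Fisher information matrix, i.e. the covariance of the sufficient statistics.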
Which geometry you choose is up to you; one of the two will probably be more appropriate for your problem. Let us consider only the two extremes of delta, one and zero. Taking delta equal to one or zero, you obtain these expressions; they look messy, but if you differentiate them you obtain something much cleaner. There is one final point I would like to mention here that is not in the paper. You would expect that by differentiating an expression written in one parameterisation you would remain in that parameterisation, but if you use the plain gradient, that is not what happens: because the space is curved, you differentiate with respect to theta and end up in the other parameterisation, which is somehow weird. If instead you use the natural gradient, which is simply the ordinary gradient premultiplied by the inverse Fisher information matrix, then differentiating with respect to theta keeps you in the same parameterisation; there is no switching. Recall that in order to define a gradient properly you should respect the curvature of the space, so you should work with the natural gradient.

So the final algorithm looks like this. It is the same iteration all over again: start from each per-segment estimate (the order does not matter), and the next estimate is a weighted average over all the segments, either in the expectation parameterisation or in the natural one; that is up to you. Here is a result: we have some segments, and the method converged to sixteen clusters. The dots represent the mean values of the segments, of the MFCC features, and you can see how the method found the sixteen clusters; it also finds some singleton clusters, which are simply discarded. One final practical point before going to the experiments.
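A heavily simplified sketch of one such iteration step for univariate Gaussian segments, under my own assumptions (a symmetrised-KL kernel exp(-d/h), sample-size weights, averaging in the expectation parameters); it illustrates the averaging idea, not the exact update of the paper:

```python
import numpy as np

def kl(p, q):
    """KL between 1-D Gaussians given as (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def mean_shift_step(theta, segments, counts, h):
    """One iteration: weight every per-segment estimate by its sample
    size and a kernel on the symmetrised KL to the current point, then
    average in the expectation parameters (E[x], E[x^2])."""
    w = np.array([n * np.exp(-(kl(theta, s) + kl(s, theta)) / h)
                  for s, n in zip(segments, counts)])
    w = w / w.sum()
    e1 = sum(wi * m for wi, (m, v) in zip(w, segments))
    e2 = sum(wi * (v + m ** 2) for wi, (m, v) in zip(w, segments))
    return e1, e2 - e1 ** 2          # convert back to (mean, variance)

# A point that already sits at a mode of a symmetric configuration stays put:
segs = [(0.0, 1.0), (0.0, 1.0)]
mu, var = mean_shift_step((0.0, 1.0), segs, counts=[10, 10], h=1.0)
```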
Because of the misspecification I mentioned, you should somehow bias the results towards temporal continuity: you know that a dialogue does not alternate speakers every three or four segments, so you must bias the solution, much as one applies a prior over a transition matrix to enforce some continuity. What we do is multiply the weights by this distribution, so that if you are at the k-th segment you emphasise its neighbours, but not in an aggressive way; if you did it too strongly, the first and the last segments of a recording could never end up in the same cluster. You want it milder than that.

Now the experiments. The database is broadcast news, and we compare against the standard BIC-based hierarchical approach. You will not find any spectacular improvement in our results. For the best configuration, we fixed the bandwidth parameter on the development set and then evaluated on the test set; the measure is the diarization error rate, using only single Gaussians with full covariance matrices, no GMMs at all. We see that the harmonic mean was the best of the four variants, and the others were rather close. Note that if you ran hierarchical clustering without BIC the result would be a tragedy; there, the performance comes from BIC, and you cannot use BIC in our setting, which is why we use this weighting scheme instead.

So, to finish: we presented an adaptation of mean shift from the observation space to a parameter space; we used some reasonable, I think, Bayesian arguments in order to make this transition; and I showed that, at least for the exponential families, no heuristics are involved. When is all this relevant, to be honest? It is relevant when you want to obtain a point estimate of your hidden variables, for tasks such as document clustering.
If, on the other hand, you want a fully Bayesian treatment of diarization, where you consider all the uncertainty, then you do not use this kind of clustering at all: you use Dirichlet processes, you use MCMC, you use variational Bayes.

One final point on GMMs: certainly you can use them, if you consider the complete-data likelihood; under the complete-data likelihood the GMM belongs to the exponential family. But you need a correspondence between components, so you need to start from a common UBM, because with the complete-data likelihood you can only consider KL divergences between corresponding Gaussians, although with different weights. If you also allow the weights to be adapted, I am sure you could use it that way, even with the original method. However, if you use i-vectors, note that you have segments of different lengths, so the variability of the estimate should also be part of the problem; somehow you should play with the bandwidth, allow the bandwidth to depend on the sample size, in order to encode this uncertainty in the estimate. Thank you, and sorry for rushing; I wanted to keep some time for questions.

[Question.] A quick, simple one: you are optimising an objective function, you are doing a gradient ascent, so you have to compute the gradient; but do I understand correctly that you are not able to compute the value of the objective during the optimisation? [Answer.] I do not evaluate it, actually. You can, there is an analytical form, but it is not exact; it is based on an estimate. The whole idea behind mean shift is that you do not actually need to estimate the overall density; you can bypass that problem by using only the gradient. [Question.] It was just a practical point, because my favourite optimisers monitor the objective function.
[Question.] A general question about mean shift: if you start out by hypothesising N clusters, will you always get N clusters? [Answer.] No, you don't. The number of clusters is inferred completely from the data; you do not provide a predefined number of clusters. It depends only on whether the points converge to the same value, so you do not need BIC. [Question.] You said that you are not able to incorporate a criterion such as BIC? [Answer.] First of all, you cannot have BIC here, because BIC implies a marginalisation with respect to the parameters; but the number of clusters drops out naturally. You can find the correct number of clusters without using BIC; I am not using it, and the comparison suggests the clusters are still found correctly.