So, this is a talk about smooth Itakura-Saito non-negative matrix factorization. The outline is as follows: I will briefly recall some previous work on IS-NMF and describe the latent statistical model that is specific to this flavour of NMF; then I will present a way of smoothing the activations, which is the main contribution of this work, together with an algorithm, which is going to be a majorization-minimization algorithm; and I will finish with a few results.

First, let me introduce my notations. The non-negative data V is of dimensions F by N, where F is the number of frequency bins and N is the number of frames; W is the dictionary matrix, H is the activation matrix, and K is the number of components. NMF usually involves minimizing, under non-negativity constraints on W and H, a criterion of the form D(V | WH), with a specific cost function which in our case will be the Itakura-Saito divergence, which I will introduce shortly.

The applicative context here is unsupervised music representation, so we will deal with audio spectrograms, and the idea is to learn some representative spectral patterns from the spectrogram. Of course this raises the usual questions: how to choose V, that is, whether you should take the magnitude or the power spectrogram; how to choose the measure of fit used in the decomposition; and then, since what you obtain is a rank-K approximation, where the spectrogram is approximated as a sum of rank-one spectrograms, if I want to retrieve the corresponding components in the time domain, how should I invert these rank-one spectrograms, and how should I deal with the phase?

A generative approach that answers these questions is what we refer to as Itakura-Saito NMF, with a latent model that goes as follows. Let X be the complex-valued STFT of the signal you want to decompose; so X is different from V, X is complex-valued data. You assume that each time-frequency coefficient x_fn is a sum of components c_k,fn, where k is the component index, f the frequency bin and n the frame, and that each component coefficient has a circular complex Gaussian distribution with zero mean, hence a uniformly random phase, and with a particular structure on the variance, namely the rank-one structure w_fk h_kn. Then it is very easy to show, assuming the components are independent, that maximum-likelihood estimation of W and H amounts to minimizing the Itakura-Saito divergence between the power spectrogram of the data, i.e. the squared modulus of X, and WH, where the Itakura-Saito divergence is defined elementwise by d_IS(x | y) = x/y - log(x/y) - 1.
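To make that fit-to-data term concrete, here is a minimal numpy sketch of the Itakura-Saito divergence between a power spectrogram and its rank-K model WH. The array shapes and the toy data are purely illustrative, not the actual experiment from the talk.

```python
import numpy as np

def is_divergence(V, V_hat, eps=1e-12):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all entries.

    Elementwise: d_IS(x | y) = x / y - log(x / y) - 1, which is zero iff x == y.
    """
    R = (V + eps) / (V_hat + eps)
    return np.sum(R - np.log(R) - 1.0)

# Toy example: a power spectrogram V approximated by a rank-K model W @ H.
rng = np.random.default_rng(0)
F, N, K = 64, 100, 5                    # frequency bins, frames, components
W = rng.random((F, K))                  # nonnegative spectral patterns
H = rng.random((K, N))                  # nonnegative activations
V = W @ H + 0.01 * rng.random((F, N))   # stand-in for the power spectrogram |X|**2

print("IS-NMF objective D_IS(V | WH) =", is_divergence(V, W @ H))
```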
This model is quite a nice thing to have, because it is truly a proper generative model of the spectrogram, and in particular, if the quantities of interest are the components, you can reconstruct them in a statistically grounded way, for example by taking the MMSE estimates of the component coefficients c_k,fn. This MMSE estimate is simply a Wiener filter, a time-frequency mask applied to the observation, where the mask is defined by the variance contributed by that component divided by the sum of the variances of all the components. So that is IS-NMF, the basic setting.

Of course, audio exhibits some temporal persistence, some smoothness, and taking this information into account can only improve the estimation of H, reduce ill-posedness and indeterminacy issues, and lead to smoother, more pleasant component reconstructions. So in this work we want to solve a penalized NMF problem, where we add a penalty function which measures the smoothness of the rows of H, and the question is how we should choose this smoothness constraint.

Here again we can take a generative approach, which is what we did in previous work, where we proposed to model the smoothness of the activation coefficients with Markov chains, either Gamma or inverse-Gamma chains, which preserve non-negativity. The idea is simply to assume a prior for h_kn, the activation coefficient of component k at frame n, such that the mode of this distribution is the coefficient at the previous frame. So you can plug in a Gamma or an inverse-Gamma distribution, you obtain expressions of this kind, and you get a shape parameter alpha which controls the peakiness of the prior around the previous value h_k,n-1. If you do MAP estimation using this prior for H, you get an optimization problem of this form: this is the fit-to-data term, and the penalty term is simply the minus log of the prior I have just defined.

In the case of an inverse-Gamma chain you get a function of H and of alpha, the shape parameter used in the prior distribution, and it is something very close to the Itakura-Saito divergence between H and its version shifted by one frame, plus a log term. This log term is quite annoying, because it induces an ill-posed minimization problem. Because of that term, if you look at the objective function for a given W and H, and you rescale the couple (W, H) by a diagonal matrix Lambda, replacing them with W Lambda and Lambda^-1 H, then you can choose the scales sufficiently small so that you decrease the objective function. This pushes the solutions towards degenerate ones, like the one shown here.

So a natural question is: can I just remove this log term? The answer is yes, of course you can, and it is even a rather reasonable thing to do, because if you rewrite the expression of this smoothness measure, you can see that the 1/alpha times log term essentially vanishes when alpha becomes sufficiently large, clearly greater than one. So it is quite reasonable to replace the penalty function by simply the Itakura-Saito divergence between H and its version shifted by one frame.
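Here is a minimal numpy sketch of that penalty and of the resulting penalized objective, under the same illustrative conventions as the snippet above; the name lam for the penalty weight lambda is my own shorthand.

```python
import numpy as np

def is_div(x, y, eps=1e-12):
    """Elementwise Itakura-Saito divergence d_IS(x | y), summed."""
    r = (x + eps) / (y + eps)
    return np.sum(r - np.log(r) - 1.0)

def smoothness_penalty(H):
    """Smoothness of the rows of H: IS divergence between each activation row
    and its one-frame-delayed copy, i.e. the sum over k, n of d_IS(h_kn | h_k,n-1)."""
    return is_div(H[:, 1:], H[:, :-1])

def penalized_objective(V, W, H, lam):
    """Fit-to-data term plus the weighted smoothness penalty on H."""
    return is_div(V, W @ H) + lam * smoothness_penalty(H)
```

Because the IS divergence depends only on ratios, rescaling (W, H) into (W Lambda, Lambda^-1 H) with a diagonal Lambda leaves both the fit term and the penalty unchanged, which is exactly the scale invariance mentioned just after this snippet.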
This gives you a natural measure of scale-invariant smoothness, and there is no need to control the norm of W in the optimization problem, so it is rather convenient.

OK, so now let's talk about the algorithm. I will skip most of the details; you can refer to the paper for more information. One approach to solving this penalized NMF problem would be an EM algorithm, in which you use the latent components I introduced a moment ago as the complete data; we did a similar thing in previous work for another penalty function. The problem is that this algorithm is quite slow, because the augmented data, the missing data, is very large compared to the available data, so it leads to a very slowly converging algorithm. Here we propose a new approach based on majorization-minimization, which does not require augmenting the data, meaning it does not use the latent components c_k in the algorithm.

It works as follows. This is our objective function, and we build an iterative algorithm which first updates W given H; that is the standard IS-NMF update, so I will not dwell on it. Then we update the columns of H sequentially, given the current update of W and the current values of the neighbouring columns of H, at frames n-1 and n+1. This boils down to the subproblem shown here, where you want to minimize a function of the vector h_n, and for this we use a majorization-minimization algorithm, which is a standard optimization procedure. Here is an illustration: in blue we have the function we want to minimize; at the current iterate, we construct a surrogate, an auxiliary function, which lies above the original cost function and is easier to minimize; we minimize this auxiliary function instead of the blue one, obtain a new iterate, and iterate. This yields a descent algorithm which converges to a minimum.

The question, of course, is how to build such an auxiliary function. I am not going to give all the details here, but the principle is the following. Your objective function contains the fit-to-data term and the penalty term. For the fit-to-data term, you can majorize its convex part using Jensen's inequality and majorize its concave part by a first-order Taylor approximation; and, as a matter of fact, you do not need to majorize the penalty term, because you get a tractable update without doing so. In the end you obtain a very simple update equation, really simple to implement, where the contribution of the prior appears in the terms shown in red; if you set lambda to zero, you simply recover the standard Itakura-Saito NMF update.
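Since the lambda = 0 case reduces to the standard IS-NMF multiplicative rules, here is a minimal numpy sketch of that unpenalized baseline. This is a generic textbook-style sketch under my own conventions, not the penalized column-wise update derived in the paper; in particular the 0.5 exponent is the one a majorization-minimization derivation gives for the IS divergence.

```python
import numpy as np

def isnmf_multiplicative(V, W, H, n_iter=200, eps=1e-12):
    """Alternating multiplicative updates for *unpenalized* IS-NMF,
    i.e. the lambda = 0 baseline recovered by the talk's update."""
    W, H = W.copy(), H.copy()
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # update H with W fixed
        H *= (W.T @ (V * V_hat**-2) / (W.T @ V_hat**-1 + eps)) ** 0.5
        V_hat = W @ H + eps
        # update W with H fixed
        W *= ((V * V_hat**-2) @ H.T / (V_hat**-1 @ H.T + eps)) ** 0.5
    return W, H
```

In practice W and H would be initialized with nonnegative random values, as in the first snippet, before running these updates.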
Now we can look at a few results. I applied this penalized, smooth Itakura-Saito NMF algorithm to an old jazz recording; the power spectrogram is shown here, and the signal sounds like this. (audio example)

First, let's compare the convergence, in terms of objective function value, of this MM algorithm with that of the EM algorithm we could have derived using the latent components as complete data. You can see that the improvement brought by the MM algorithm is quite significant; this is a log scale and this axis is the iteration index. It also runs pretty fast, close to real time in terms of CPU time on a standard computer.

This figure shows the effect of the regularization for a few values of the penalty weight lambda: this is the baseline, unpenalized IS-NMF, and these are the penalized decompositions for increasing values of lambda. Unfortunately I do not have a magic recipe for setting this weight; it has to be user-defined and tuned according to the degree of smoothing that the application requires.

To finish, I wanted to show you the structure of the time-frequency masks returned by the decomposition, because I think it is quite interesting to see these structures, and to see that with a limited number of components, here K = 10 for a two-minute piece of music, you can learn some interesting things. Remember that the time-frequency masks are the Wiener filter gains that you apply to the observation to reconstruct each of the components. This is the first time-frequency mask; values close to zero are white and values close to one are black, and you get different kinds of structure: here a rather wideband pattern, and here a more pitched structure. For example, if we listen to the component corresponding to one of these pitched structures, it typically captures sets of notes; it sounds like this. (audio example) Keep in mind that what is displayed is not the component itself but only the mask applied to the observation; so even if the mask has values close to one somewhere, if there is nothing at that time-frequency point in the data, the estimated component spectrum there is essentially zero. Here is another type of time-frequency structure, which is wideband and captures attacks; it sounds like this. (audio example) And so on: out of the ten components learned in this way, this one clearly captures the bass, so it is a low-frequency component, and this one captures the noise which is present on the recording, so it is high-pass, and if we listen to it, it is basically just noise. So you do learn some interesting things, even with a limited number of components.
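For reference, here is a minimal numpy sketch of these Wiener masks and of the masked reconstruction, using the same illustrative shapes as in the earlier snippets; the time-domain components are then obtained by inverse STFT of each masked estimate (not shown).

```python
import numpy as np

def wiener_masks(W, H, eps=1e-12):
    """Time-frequency masks G_k = (w_fk * h_kn) / sum_j (w_fj * h_jn).
    Each mask lies in [0, 1] and the K masks sum to one at every (f, n) bin."""
    V_hat = W @ H + eps                                # total model variance
    return np.stack([np.outer(W[:, k], H[k]) / V_hat   # one F x N mask per component
                     for k in range(W.shape[1])])

def reconstruct_components(X, W, H):
    """MMSE (Wiener) estimates of the latent components: each mask multiplies
    the observed complex STFT X, so the component estimates sum back to X."""
    return wiener_masks(W, H) * X[None, :, :]
```

This also reflects the remark above: even where a mask is close to one, the estimated component stays small wherever the observation X itself is small.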
You can also do some nice things with the sound; this type of decomposition has some nice audio editing applications. You have decomposed the original mono recording into a number of components; with some manual grouping you can reconstruct sources from these components, and in particular you can remove the noise component and do denoising, and then remaster the different components onto two channels to produce a stereo recording from the original mono recording (a small sketch of this panning step is given after the questions below). This is very similar to the 'show and tell' demo presented earlier, so the examples are of the same kind. From the original mono recording you can create denoised and remixed stereo versions, and I will play them for you: first the original mono recording, then the denoised version, and, if you like, just the brass components, the trumpet and the clarinet. (audio examples) The interesting thing is that, even if there are artifacts on some of the estimated sources, because you replay the sources together you do not really hear the artifacts, and you still get a sense of spatialization. That concludes my talk.

[Audience question about the EM algorithm] Yes; well, you can derive an EM algorithm for the estimation of W and H using these latent components as the complete data. It is not shown here, but you can do it quite easily; we can discuss it offline, sure.

[Audience question about the number of components] What if you take fewer or more components? That is a good question. Ten components seemed to be the proper number to use, because adding more components only tended to refine the decomposition of the noise; after ten components you did not obtain more interesting components. To be honest, I do not remember what happens when you take fewer than ten.
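To make the remixing step mentioned before the questions concrete, here is a minimal sketch of the grouping, muting and two-channel panning of the reconstructed components. The grouping, the pan positions and the constant-power panning law are my own illustrative choices, not details given in the talk.

```python
import numpy as np

def remix_stereo(components, groups, pans, gains=None):
    """Pan manually grouped time-domain component estimates onto two channels.

    components : (K, T) array of component waveforms (after inverse STFT)
    groups     : list of lists of component indices, e.g. [[3], [0, 7], ...]
    pans       : one pan value in [0, 1] per group (0 = left, 1 = right)
    gains      : optional per-group gains, e.g. 0.0 to mute the noise group
    """
    if gains is None:
        gains = [1.0] * len(groups)
    stereo = np.zeros((2, components.shape[1]))
    for idx, pan, gain in zip(groups, pans, gains):
        src = gain * components[idx].sum(axis=0)        # sum the group's components
        stereo[0] += np.cos(0.5 * np.pi * pan) * src    # constant-power pan, left
        stereo[1] += np.sin(0.5 * np.pi * pan) * src    # constant-power pan, right
    return stereo
```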