0:00:17So, this is a talk about Itakura-Saito non-negative matrix factorization.
0:00:24The outline is as follows. I will briefly recall some previous work on IS-NMF and describe the latent statistical model underlying IS-NMF. Then I will present a way of smoothing the activation coefficients, which is the main contribution of this work, together with, of course, an algorithm, which is going to be a majorization-minimization algorithm, before giving a few results.
0:01:04So this is only to introduce my notations. The non-negative data here is going to be V, of dimensions F by N, where F is the number of frequencies and N is the number of frames. Then there is the dictionary matrix W, the activation matrix H, and the number of components K.
0:01:27NMF usually involves minimizing a criterion of this form under non-negativity constraints on W and H, with a specific cost function, which in our case will be the Itakura-Saito divergence, which I will introduce shortly.
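For reference, here is the criterion in symbols, a sketch assuming the usual element-wise (separable) form of the cost:

```latex
\min_{\mathbf{W}\ge 0,\;\mathbf{H}\ge 0}\; D(\mathbf{V}\,|\,\mathbf{W}\mathbf{H}),
\qquad
D(\mathbf{V}\,|\,\mathbf{W}\mathbf{H}) = \sum_{f=1}^{F}\sum_{n=1}^{N} d\big(v_{fn}\,\big|\,[\mathbf{W}\mathbf{H}]_{fn}\big),
\qquad
\mathbf{V}\in\mathbb{R}_+^{F\times N},\;
\mathbf{W}\in\mathbb{R}_+^{F\times K},\;
\mathbf{H}\in\mathbb{R}_+^{K\times N}
```

where d is the Itakura-Saito divergence in this talk.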
0:01:49So the applicative context here is unsupervised music representation. We will deal with audio spectrograms, and the idea is to learn some representative spectral patterns from the spectrogram.
0:02:08Of course, this raises a few questions. How should you choose V: should you choose the magnitude or the power spectrogram? How should you choose the measure of fit used in the decomposition? And also, NMF is essentially a rank-K approximation: you approximate the spectrogram as the sum of K rank-one spectrograms. The question is, if I want to retrieve the corresponding components in the time domain, how should I invert these rank-one spectrograms, and how should I deal with the phase?
0:02:44So a generative approach for answering these questions is the one we took with IS-NMF, with a latent model that is as follows.
0:02:59Let X be the complex-valued STFT of the signal you want to decompose. So X is different from V: X is complex-valued data. You assume that each time-frequency coefficient x_fn is a sum of components c_kfn, where k is the component index, f is the frequency and n is the frame. The components are complex-valued, and each one has a circular complex Gaussian distribution, meaning it has a random phase, with a rank-one structure on the variance, w_fk h_kn.
0:03:50Then it is very easy to show, assuming that the components are independent, that the negative log-likelihood is equal, up to a constant, to the Itakura-Saito divergence between the power spectrogram of the data, that is |X|², and WH, where the Itakura-Saito divergence is defined by this equation.
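The claim that the likelihood reduces to an Itakura-Saito divergence can be checked numerically. The sketch below is a minimal illustration with toy random W and H (the sizes F, N, K and the seed are arbitrary choices): it samples independent circular complex Gaussian components c_kfn with variance w_fk h_kn, sums them into X, and checks that the negative log-likelihood minus D_IS(|X|² | WH) does not depend on W and H, using d_IS(x|y) = x/y - log(x/y) - 1.

```python
import numpy as np

def d_is(x, y):
    """Element-wise Itakura-Saito divergence d_IS(x|y) = x/y - log(x/y) - 1."""
    r = x / y
    return r - np.log(r) - 1.0

def neg_log_lik(X, W, H):
    """Negative log-likelihood of X when x_fn ~ circular complex Gaussian N_c(0, [WH]_fn)."""
    var = W @ H
    return np.sum(np.log(np.pi * var) + np.abs(X) ** 2 / var)

rng = np.random.default_rng(0)
F, N, K = 6, 8, 3
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

# Latent components c_kfn ~ N_c(0, w_fk h_kn): real and imaginary parts each get half the variance.
std = np.sqrt(W[:, :, None] * H[None, :, :] / 2.0)          # shape (F, K, N)
C = std * (rng.standard_normal((F, K, N)) + 1j * rng.standard_normal((F, K, N)))
X = C.sum(axis=1)                                           # observed STFT, shape (F, N)
V = np.abs(X) ** 2                                          # power spectrogram

# The gap NLL - D_IS should be identical for any other pair of factors.
W2 = rng.random((F, K)) + 0.1
H2 = rng.random((K, N)) + 0.1
gap1 = neg_log_lik(X, W, H) - np.sum(d_is(V, W @ H))
gap2 = neg_log_lik(X, W2, H2) - np.sum(d_is(V, W2 @ H2))
print(np.isclose(gap1, gap2))
```

The constant works out to the sum over (f, n) of log(π |x_fn|²) + 1, which does not involve W or H. So maximizing the likelihood over W and H is the same as minimizing D_IS(V | WH) with V = |X|².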
0:04:16So this model is quite a nice thing to have, because it is truly a proper generative model of the spectrogram. In particular, if your quantity of interest is the components, you can reconstruct them in a statistically principled way. For example, take the MMSE estimate of component k at frequency f and frame n: it is simply obtained by a Wiener filter, a time-frequency mask applied to the observation, where the mask is defined by the contribution of that component in terms of variance, divided by the variance of all the components.
0:05:13So that's, in a nutshell, the basic IS-NMF model.
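A minimal sketch of this Wiener-filter reconstruction, assuming the variance of component k at (f, n) is w_fk h_kn as in the latent model; the function name `wiener_components` is hypothetical. The masks sum to one over k, so the estimated components add back up exactly to the observed STFT.

```python
import numpy as np

def wiener_components(X, W, H):
    """Return (C_hat, masks), both of shape (K, F, N).

    masks[k] = w_fk h_kn / [WH]_fn is the Wiener gain of component k,
    and C_hat[k] = masks[k] * X is the MMSE estimate of component k.
    """
    V_hat = W @ H                                      # (F, N) total variance
    masks = W.T[:, :, None] * H[:, None, :] / V_hat    # (K, F, N)
    return masks * X[None, :, :], masks

# Toy usage with random non-negative factors and a random complex "STFT".
rng = np.random.default_rng(1)
F, N, K = 5, 7, 3
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
X[2, 3] = 0.0                                          # an empty time-frequency point
C_hat, masks = wiener_components(X, W, H)
```

Wherever X is zero, every estimated component is zero too, whatever the mask values. Time-domain signals would then be obtained with an inverse STFT (not shown).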
0:05:19Of course, audio exhibits some temporal persistence, some regularity, and taking this information into account can lead to a more robust estimation of H, and thus of W, reduce some ambiguities, and eventually lead to more pleasant component reconstructions.
0:05:42So in this work, we want to solve a penalized NMF problem, where we add a penalty function which measures the smoothness of the rows of H.
0:05:59Now, the question is how we should build this smoothness constraint. And again, we can take a generative approach, which is what we did in previous work as well, where we proposed to model the smoothness of the activation coefficients with Markov chains, either inverse-gamma or gamma Markov chains, which preserve non-negativity. The idea is simply to assume a prior for h_kn, the activation coefficient of component k at frame n, such that the mode of this distribution is obtained at the coefficient in the previous frame.
0:06:44So you can basically plug in a gamma or inverse-gamma distribution here, and you obtain this kind of equation, with a shape parameter α here, which controls the peakiness of the prior around the previous value h_k(n-1).
0:07:04So if you do MAP estimation using this prior for H, you get an optimization problem of this form, where this is your fit to data and the penalty is simply the minus log of the prior that I have just defined.
0:07:22In the case of an inverse-gamma Markov chain, you get a function of this form, where α is again the shape parameter used in the prior distribution. So you get something which is very close to the Itakura-Saito divergence between H and its version shifted by one frame, plus a log function here.
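To see where the divergence and the log term come from, here is a small numerical check. It assumes an inverse-gamma transition h_kn | h_k(n-1) ~ IG(α, β) with scale β = (α+1) h_k(n-1), so that the mode β/(α+1) equals the previous value; this parameterization is an assumption chosen to match the description of the prior, not necessarily the one on the slide. Under it, -log p(h_kn | h_k(n-1)) = (α+1) d_IS(h_k(n-1) | h_kn) + log h_k(n-1) + c(α). After dividing by α+1, the log term carries a weight 1/(α+1), which fades as α grows.

```python
import math

def d_is(x, y):
    """Scalar Itakura-Saito divergence d_IS(x|y) = x/y - log(x/y) - 1."""
    return x / y - math.log(x / y) - 1.0

def neg_log_invgamma(h, h_prev, alpha):
    """-log of the inverse-gamma density IG(h; alpha, beta) with beta = (alpha + 1) * h_prev.

    The density is beta**alpha / Gamma(alpha) * h**(-alpha - 1) * exp(-beta / h),
    whose mode beta / (alpha + 1) equals h_prev.
    """
    beta = (alpha + 1.0) * h_prev
    return (-alpha * math.log(beta) + math.lgamma(alpha)
            + (alpha + 1.0) * math.log(h) + beta / h)

def decomposed(h, h_prev, alpha):
    """Same quantity written as (alpha+1) * d_IS(h_prev | h) + log(h_prev) + c(alpha)."""
    c = math.lgamma(alpha) - alpha * math.log(alpha + 1.0) + alpha + 1.0
    return (alpha + 1.0) * d_is(h_prev, h) + math.log(h_prev) + c

for h, h_prev, alpha in [(0.5, 1.2, 3.0), (2.0, 0.7, 10.0), (1.0, 1.0, 50.0)]:
    print(neg_log_invgamma(h, h_prev, alpha), decomposed(h, h_prev, alpha))
```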
0:07:51And this log function is quite annoying, because it is going to induce an ill-posed minimization problem. If you look at your objective function for a given pair (W, H), and you rescale this pair with a diagonal matrix Λ, that is (WΛ⁻¹, ΛH), you can choose the scales in Λ sufficiently small that you decrease the objective function: the fit to data is unchanged, the divergence part of the penalty is scale-invariant, and only the log term decreases. So this will push the solutions towards a degenerate solution like this one.
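The scaling argument can be illustrated numerically. This is hypothetical toy code: it assumes the penalty derived from an inverse-gamma chain, normalized as d_IS(h_k(n-1) | h_kn) + log(h_k(n-1)) / (α+1) per pair of frames (constants dropped), and rescales W to W/s and H to sH with a single scalar s, a special case of the diagonal Λ. Only the log term moves, and it keeps decreasing as s goes to zero.

```python
import numpy as np

def d_is(x, y):
    r = x / y
    return r - np.log(r) - 1.0

def objective(V, W, H, lam, alpha, keep_log=True):
    """IS fit + lam * smoothness penalty; optionally keeps the log term from the prior."""
    fit = np.sum(d_is(V, W @ H))
    prev, curr = H[:, :-1], H[:, 1:]
    pen = np.sum(d_is(prev, curr))
    if keep_log:
        pen += np.sum(np.log(prev)) / (alpha + 1.0)
    return fit + lam * pen

rng = np.random.default_rng(2)
F, N, K = 8, 10, 2
V = rng.random((F, N)) + 0.1
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1
scales = [1.0, 1e-2, 1e-4, 1e-8]
with_log = [objective(V, W / s, s * H, lam=1.0, alpha=5.0) for s in scales]
without_log = [objective(V, W / s, s * H, lam=1.0, alpha=5.0, keep_log=False) for s in scales]
```

With keep_log=False the value is unchanged by the rescaling, since both the fit term and d_IS are scale-invariant.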
0:08:38So a natural question is: can I just remove this log term? The answer is yes, of course you can, and it is even rather reasonable to do, because if you rewrite the expression of your penalty, you can see that the log term is weighted by a factor of order 1/α, so it basically vanishes when α becomes sufficiently large compared to one.
0:09:11So it is quite reasonable to replace your penalty function simply by the Itakura-Saito divergence between H and its version shifted by one frame. That gives you a natural, scale-invariant smoothness measure, and there is no need to control the norm of W in this optimization problem, so it is rather convenient.
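Written out, the resulting criterion takes the following form. The orientation of the divergence arguments, with h_k(n-1) first, is an assumption that follows from the inverse-gamma chain; the scale-invariance identity on the second line is what removes the need to normalize W.

```latex
C(\mathbf{W},\mathbf{H}) = D_{IS}(\mathbf{V}\,|\,\mathbf{W}\mathbf{H})
  + \lambda \sum_{k=1}^{K}\sum_{n=2}^{N} d_{IS}\big(h_{k(n-1)}\,\big|\,h_{kn}\big),
\qquad d_{IS}(x\,|\,y) = \frac{x}{y} - \log\frac{x}{y} - 1,
\\[4pt]
d_{IS}(\gamma x\,|\,\gamma y) = d_{IS}(x\,|\,y)\ \ (\gamma > 0)
\;\Longrightarrow\;
C(\mathbf{W}\boldsymbol{\Lambda}^{-1},\boldsymbol{\Lambda}\mathbf{H}) = C(\mathbf{W},\mathbf{H})
\ \text{ for any positive diagonal } \boldsymbol{\Lambda}.
```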
0:09:38Okay, so now let's talk about the algorithm. I will skip most of the details, and you can refer to the paper for more information.
0:09:50One approach to solve this generalized NMF problem is to use an EM algorithm, where you could use the latent components that I introduced previously as complete data. We did a similar thing in previous work, for another penalty function. The problem is that this algorithm is quite slow, because the augmented data, the missing data, is very large compared to the available data, so it is a very slowly converging algorithm.
0:10:27Here we propose a new approach based on majorization-minimization, which does not require augmenting the data, meaning using the latent components C, in the algorithm.
0:10:42So it works as follows. This is our objective function, and we produce an iterative algorithm which updates W given H, so that's standard IS-NMF, we know how to do that, and then updates the columns of H sequentially, given the current update of W and given the current values of the neighbours of column n, that is, frames n-1 and n+1. So this problem boils down to this subproblem,
0:11:27where you want to minimize this function, which depends on the vector h_n. For this we will use an MM algorithm, which is an iterative optimization procedure: we start from a current iterate h̃.
0:11:53So in blue we have the function that we want to minimize. Locally, at the current update, we simply construct a surrogate, auxiliary function, which is easier to minimize than the original cost function, and then we minimize this function instead of the blue one. Then we get a new update, we iterate, and this leads to a descent algorithm which converges towards a local minimum.
0:12:22So the question is, of course, how to build such an auxiliary function.
0:12:28I'm not going to give the details here, but basically the principle is as follows. In your function you have the fit to data and the penalty. For the fit to data, you can majorize the convex part using Jensen's inequality, and majorize the concave part by its first-order Taylor approximation. And as a matter of fact, you don't need to majorize the penalty, because you get a tractable update without doing so.
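A sketch of this derivation for one column, assuming the penalty Σ d_IS(h_k(n-1) | h_kn) and an interior frame 1 < n < N; the notation p_k, q_k and ṽ is introduced here for illustration and may differ from the paper. With h̃ the current iterate and ṽ = W h̃, Jensen's inequality on the convex term v/[Wh] and a tangent bound on the concave term log[Wh] give the auxiliary function G below. The log h_k contributions of the two penalty terms that involve h_kn cancel, so the penalty can be kept exactly.

```latex
C_n(\mathbf{h}) = \sum_{f} \Big( \frac{v_{fn}}{[\mathbf{W}\mathbf{h}]_f} + \log [\mathbf{W}\mathbf{h}]_f \Big)
 + \lambda \sum_{k} \Big( d_{IS}(h_{k(n-1)}\,|\,h_k) + d_{IS}(h_k\,|\,h_{k(n+1)}) \Big) + \text{cst},
\\[4pt]
C_n(\mathbf{h}) \le G(\mathbf{h}\,|\,\tilde{\mathbf{h}}) = \sum_{k} \Big( p_k \frac{\tilde h_k^{2}}{h_k} + q_k h_k
 + \lambda \frac{h_{k(n-1)}}{h_k} + \lambda \frac{h_k}{h_{k(n+1)}} \Big) + \text{cst}',
\qquad p_k = \sum_f \frac{w_{fk} v_{fn}}{\tilde v_f^{2}},\quad q_k = \sum_f \frac{w_{fk}}{\tilde v_f},
\\[4pt]
\frac{\partial G}{\partial h_k} = 0 \;\Longrightarrow\;
h_k = \sqrt{\frac{p_k \tilde h_k^{2} + \lambda\, h_{k(n-1)}}{q_k + \lambda / h_{k(n+1)}}} .
```

With λ = 0 this reduces to h_k = h̃_k (p_k / q_k)^{1/2}, the square-root multiplicative form of the IS-NMF MM update. The first and last frames have only one neighbour and lead instead to a quadratic equation with a single positive root.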
0:13:04In the end you get a very simple update equation, which is really simple to implement, where the contribution of the prior, the penalty, is in red, so that if you set λ to zero you simply get the standard IS-NMF update.
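The following is a minimal NumPy sketch of such an algorithm, not the author's code; all function names are hypothetical. It assumes the penalty Σ_k Σ_n d_IS(h_k(n-1) | h_kn), updates W with the square-root multiplicative IS-NMF rule (the penalty does not involve W), and updates the columns of H one at a time, each from its already-updated left neighbour and its current right neighbour. For an interior frame it uses the closed form h_k = sqrt((p_k h̃_k² + λ h_k(n-1)) / (q_k + λ / h_k(n+1))), with p_k = Σ_f w_fk v_fn / ṽ_f², q_k = Σ_f w_fk / ṽ_f and ṽ = W h̃; the first and last frames solve a quadratic instead.

```python
import numpy as np

def d_is(x, y):
    r = x / y
    return r - np.log(r) - 1.0

def objective(V, W, H, lam):
    """D_IS(V | WH) + lam * sum_k sum_{n>=2} d_IS(h_k(n-1) | h_kn) (assumed penalty orientation)."""
    return np.sum(d_is(V, W @ H)) + lam * np.sum(d_is(H[:, :-1], H[:, 1:]))

def update_W(V, W, H):
    """MM update for W with H fixed (square-root multiplicative IS-NMF rule)."""
    V_hat = W @ H
    return W * np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))

def update_H(V, W, H, lam):
    """Sequential (column-by-column) MM updates of H, each using the current neighbours."""
    H = H.copy()
    N = H.shape[1]
    for n in range(N):
        v_hat = W @ H[:, n]
        p = (W.T @ (V[:, n] / v_hat**2)) * H[:, n] ** 2    # p_k * h~_k^2
        q = W.T @ (1.0 / v_hat)
        if N == 1 or lam == 0:
            H[:, n] = np.sqrt(p / q)
        elif n == 0:                                       # (q + lam/h_2) h^2 - lam h - p h~^2 = 0
            a = q + lam / H[:, 1]
            H[:, n] = (lam + np.sqrt(lam**2 + 4 * a * p)) / (2 * a)
        elif n == N - 1:                                   # q h^2 + lam h - (p h~^2 + lam h_{N-1}) = 0
            b = p + lam * H[:, n - 1]
            H[:, n] = (-lam + np.sqrt(lam**2 + 4 * q * b)) / (2 * q)
        else:                                              # interior frame: closed form
            H[:, n] = np.sqrt((p + lam * H[:, n - 1]) / (q + lam / H[:, n + 1]))
    return H

def smooth_is_nmf(V, W, H, lam, n_iter=50):
    """Alternate W and H updates; returns the factors and the cost after each iteration."""
    costs = [objective(V, W, H, lam)]
    for _ in range(n_iter):
        W = update_W(V, W, H)
        H = update_H(V, W, H, lam)
        costs.append(objective(V, W, H, lam))
    return W, H, np.array(costs)

rng = np.random.default_rng(0)
F, N, K = 20, 30, 3
V = rng.random((F, N)) + 1e-3           # hypothetical toy "power spectrogram"
W0 = rng.random((F, K)) + 0.1
H0 = rng.random((K, N)) + 0.1
W, H, costs = smooth_is_nmf(V, W0, H0, lam=10.0)
```

Because every step minimizes a function that majorizes the cost, the recorded costs should be non-increasing; with lam = 0 the H update reduces to H * sqrt(p / q).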
0:13:27Okay, so now we can look at a few results. I basically applied this penalized, smooth Itakura-Saito NMF algorithm to a jazz music signal. The power spectrogram is here, and it sounds like this:
0:13:52[audio excerpt plays]
0:14:20And so on.
0:14:22So first, let's compare the convergence, in terms of objective function value, of this MM algorithm versus the EM algorithm that you could derive using the components as latent variables. You can see that the improvement brought by the MM algorithm is quite significant; this is a log scale, and this is the iteration number. And it runs pretty fast: it is close to real time on a standard desktop computer.
0:15:04And this is the effect of the regularization for a few values of the penalty weight, the parameter λ. So this is the baseline, unpenalized NMF, then the penalized NMF with λ = 1, 10 and 100.
0:15:30Unfortunately, I don't have a magic way to automatically set the right value of λ; it has to be user-defined. Typically, in an audio-editing setting, you would have to tune this parameter according to the desired result.
0:15:54Okay, let me see how much time I have left. So, to finish, I wanted to show you the structure of the time-frequency masks that are learned by the decomposition, because I think it is quite interesting to see this structure, and to see that with a limited number of components, here K = 10 for that two-minute piece of music, you can learn some interesting things.
0:16:29The time-frequency masks, remember, are the Wiener filters, or Wiener gains, that you apply to the observation to reconstruct each of the components. So this is the first time-frequency mask; the value zero is white and one is black.
0:16:52You get different structures: here you get a rather wideband structure, and here a more pitched structure. For example, we can listen to one of these components; typically, this one is going to capture some notes. It sounds like this:
0:17:32[audio excerpt plays]
0:17:34Now, note that this is not actually the time-domain component; it is simply the mask that you apply to the observation. This means that even if the mask has values close to one at some place, if there is nothing at this time-frequency point in the data, you get an estimated spectrum which is zero there.
0:18:03And, for example, this is another type of time-frequency structure, which is wideband, and it captures the attacks:
0:18:14[audio excerpt plays]
0:18:26Okay, and so on.
0:18:28So we have ten components like this. This one clearly shows the bass, so it is just a low-pass component, and this one shows the noise which is present in the recording, so it is high-pass. We can listen to it:
0:18:53Basically, it is just noise.
0:18:56So you do learn some things, even with a limited number of components, and this type of decomposition can have some nice audio-editing applications. Basically, you have decomposed your original recording into a number of components; this involves some manual grouping to actually reconstruct the sources from the components. Then you can, for example, remove the noise to do denoising, and remaster these different components on two channels to produce a stereo recording from the mono recording. This is very similar to the show-and-tell demo, if you saw the demos; it is the same kind of idea.
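One way to sketch this grouping-and-remastering step, assuming the estimated components are already available as STFT-domain arrays (for instance from Wiener filtering). The grouping dictionary, the source names and the constant-power panning law are all illustrative choices, not taken from the talk; leaving a component such as the noise out of every group is what performs the denoising. Each channel would then be brought back to the time domain with an inverse STFT (not shown).

```python
import numpy as np

def group_sources(C_hat, groups):
    """Sum estimated components C_hat (K, F, N) into sources; groups maps name -> component indices."""
    return {name: C_hat[idx].sum(axis=0) for name, idx in groups.items()}

def pan_to_stereo(sources, pan):
    """Constant-power panning: pan[name] in [0, 1] (0 = left, 1 = right). Returns (2, F, N)."""
    first = next(iter(sources.values()))
    out = np.zeros((2,) + first.shape, dtype=complex)
    for name, S in sources.items():
        theta = pan[name] * np.pi / 2
        out[0] += np.cos(theta) * S        # left channel gain
        out[1] += np.sin(theta) * S        # right channel gain
    return out

rng = np.random.default_rng(3)
C_hat = rng.standard_normal((4, 3, 5)) + 1j * rng.standard_normal((4, 3, 5))
groups = {"bass": [0], "brass": [1, 2]}    # hypothetical grouping; component 3 ("noise") is dropped
pan = {"bass": 0.5, "brass": 0.0}          # centre the bass, send the brass hard left
sources = group_sources(C_hat, groups)
stereo = pan_to_stereo(sources, pan)
```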
0:19:51So typically, from this original mono recording you can create an upmixed and denoised version. I will play that for you. So first, the original mono:
0:20:06[audio excerpt plays]
0:20:11And now the upmixed, denoised version:
0:20:13[audio excerpt plays]
0:20:25And if you want, you can listen, for example, to the brass components, so the trumpet and clarinet.
0:20:40The interesting thing is that even if there are some artifacts in some of the estimated sources, because you play the sources back together, you actually don't perceive the artifacts, and you can render some spatial impression.
0:20:57And that concludes my talk.
0:21:31Yes.
0:21:33Yes.
0:21:34I don't know. I mean, you can derive an EM algorithm for the estimation of W and H using these latent components as the complete data. It is not shown here, but you can do it quite easily.
0:21:54Offline, sure.
0:22:09Uh-huh.
0:22:18What if you take fewer or more components? That's a good question. Ten components seemed to be the proper number of components to use, because adding more components only tended to refine the decomposition of the noise. So it seemed that after ten components you did not obtain more interesting decompositions. Now, to be honest, I don't remember what happens when you take fewer than ten components. I don't remember.