So, this is a talk about Itakura-Saito nonnegative matrix factorization.
The outline is as follows: I will briefly recall some previous work on IS-NMF and describe the latent statistical model that is specific to this NMF; then I will present a way of smoothing the activation coefficients, which is the major contribution of this work, going along of course with an algorithm, which is going to be a majorization-minimization algorithm, before giving a few results.
This is only to introduce my notations. The nonnegative data here is going to be a matrix V of dimensions F by N, where F is the number of frequency bins and N is the number of frames; W is the dictionary matrix, H the activation matrix, and K the number of components. NMF usually involves minimizing a criterion of the form D(V | WH) under nonnegativity constraints on W and H, with a specific cost function, which in our case will be the Itakura-Saito divergence that I will introduce shortly.
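As a concrete illustration of that criterion, here is a minimal numpy sketch (the toy dimensions and random data are my own, not from the talk) of the Itakura-Saito measure of fit between V and the low-rank product WH:

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 8, 2                       # frequency bins, frames, components
W = rng.random((F, K)) + 0.1            # nonnegative dictionary
H = rng.random((K, N)) + 0.1            # nonnegative activations
V = W @ H + 0.01 * rng.random((F, N))   # nonnegative data built close to WH

def d_is(V, Vhat):
    """Itakura-Saito divergence D_IS(V | Vhat), summed over all entries."""
    R = V / Vhat
    return np.sum(R - np.log(R) - 1.0)

print(d_is(V, W @ H))   # small, since V was built close to WH
print(d_is(V, V))       # exactly 0 at a perfect fit
```

The divergence is nonnegative and vanishes only at a perfect fit, which is what makes it usable as a measure of fit under the nonnegativity constraints.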
The applicative context here is unsupervised music signal representation, so we will deal with audio spectrograms, and the idea is to learn some representative spectral patterns from the spectrogram. Of course, this raises several questions: how to choose V, that is, whether you should choose the magnitude or the power spectrogram; how to choose the measure of fit used in the decomposition; and also, since the estimate you get is a rank-K approximation (you approximate the spectrogram as the sum of rank-one spectrograms), if I want to retrieve the corresponding components in the time domain, how should I invert these rank-one spectrograms, and how should I recover the phase?
Well, a generative approach for answering these questions is what we referred to as Itakura-Saito NMF, with a latent model that is as follows. Let X be the complex-valued STFT of the signal you want to decompose; so X is different from V, X is complex-valued data. You assume that each time-frequency coefficient x_fn, where f is the frequency bin and n is the frame, is a sum of complex-valued latent components c_k,fn, where k is the component index, and that each component coefficient has a circular complex Gaussian distribution: zero mean, hence a random phase, with a rank-one structure on the variance, namely w_fk h_kn.
Then it is very easy to show, assuming that the components are independent, that the negative log-likelihood is equal to the Itakura-Saito divergence between the power spectrogram of the data, that is |X|^2, and WH, where the Itakura-Saito divergence is defined by this equation: d_IS(x | y) = x/y - log(x/y) - 1.
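To make this latent model concrete, here is a small Monte-Carlo sketch (toy sizes and seed are my own choices): each latent component is drawn as a zero-mean circular complex Gaussian with variance w_fk h_kn, and the average power spectrogram of their sum concentrates around WH, as the likelihood result requires:

```python
import numpy as np

rng = np.random.default_rng(1)
F, N, K, draws = 4, 5, 3, 20000
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

# Each latent component c_{k,fn} ~ CN(0, w_fk * h_kn): zero mean, random phase.
var = np.einsum('fk,kn->kfn', W, H)      # per-component variances, shape (K, F, N)
std = np.sqrt(var / 2.0)                 # half the variance in each of Re and Im
noise = rng.standard_normal((draws, K, F, N)) + 1j * rng.standard_normal((draws, K, F, N))
C = std * noise                          # broadcasts over the draws axis
X = C.sum(axis=1)                        # x_fn = sum over components k

# The empirical power spectrogram of X matches the model variance W @ H
emp_power = np.mean(np.abs(X) ** 2, axis=0)
print(np.allclose(emp_power, W @ H, rtol=0.1))   # True up to Monte-Carlo error
```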
What is quite nice about this model is that it is a proper generative model of the spectrogram. In particular, if the quantities of interest are the components themselves, you can reconstruct them in a statistically grounded way: for example, the MMSE estimate of the coefficient of component k at frequency f and frame n is simply a Wiener filter, a time-frequency mask applied to the observation, where the mask is defined by the contribution of that component in terms of variance divided by the variance of all the components: ĉ_k,fn = (w_fk h_kn / Σ_j w_fj h_jn) x_fn.
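The Wiener-mask reconstruction just described can be sketched in a few lines of numpy (the random STFT stand-in and toy sizes are mine; in practice X would be the STFT of the recording):

```python
import numpy as np

rng = np.random.default_rng(2)
F, N, K = 5, 6, 3
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))  # stand-in STFT

V_hat = W @ H   # total model variance at each time-frequency point

# Wiener mask of component k: its variance share, w_fk h_kn / sum_j w_fj h_jn
C_hat = np.empty((K, F, N), dtype=complex)
for k in range(K):
    mask = np.outer(W[:, k], H[k, :]) / V_hat   # entries in (0, 1)
    C_hat[k] = mask * X                          # MMSE estimate of component k

# The K masks sum to one, so the component estimates add back to the observation
print(np.allclose(C_hat.sum(axis=0), X))
```

Because the masks sum to one at every time-frequency point, the reconstruction is conservative: the component estimates add back up exactly to the observed STFT.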
So that is IS-NMF, the basic setting. Of course, audio exhibits some time persistence, some regularity, and taking this information into account can improve the estimation of H, reduce indeterminacy ambiguities, and lead to more pleasant component reconstructions.
So in this work we want to solve a penalized NMF problem, where we add a penalty function which measures the smoothness of the rows of H. The question is then how we should choose, or build, this smoothness constraint.
Again, we can take a generative approach, which is what we did in previous work as well, where we proposed to model the smoothness of the activation coefficients in terms of Markov chains, either inverse-gamma or gamma Markov chains, which preserve nonnegativity. The idea is simply to assume a prior for h_kn, the activation coefficient of component k at frame n, such that the mode of this distribution is at the coefficient in the previous frame. So you can basically plug in a gamma or an inverse-gamma distribution here, and you obtain this kind of equations, with a shape parameter α which controls the peakiness of the mode around the previous value h_k,n-1.
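As a hypothetical illustration (this particular gamma parameterization is my own and is not necessarily the one on the slides), one can place a gamma prior on h_kn whose mode sits exactly at the previous activation, with the shape parameter α controlling how peaked the prior is around that value:

```python
import numpy as np

# Hypothetical gamma chain prior p(h_n | h_prev): shape alpha, with the scale
# chosen so that the mode of the density sits exactly at the previous value.
alpha, h_prev = 5.0, 2.0
theta = h_prev / (alpha - 1.0)   # Gamma(shape, scale) has mode (shape - 1) * scale

h = np.linspace(1e-3, 10.0, 100_000)
log_pdf = (alpha - 1.0) * np.log(h) - h / theta   # gamma log-density up to a constant

print(h[np.argmax(log_pdf)])   # the maximizer is at h_prev = 2.0
```

Larger α concentrates the prior more tightly around the previous value, which is exactly the role of the shape parameter described in the talk.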
If you do MAP estimation using this prior for H, you get an optimization problem of this form: this is your fit to data, and this term R is simply the minus log of the prior that I have just defined. In the case of an inverse-gamma Markov chain, you get a function of this form, where α is the shape parameter used in the prior distribution. So you get something which is very close to the Itakura-Saito measure between h_kn and its version shifted by one frame, plus a log term. And this log term is quite annoying, because it is going to induce an ill-posed minimization problem.
Because of that term, if you look at your objective function for a given W and H, and if you rescale this couple (W, H) by a diagonal matrix, multiplying W by the diagonal matrix and H by its inverse, you can choose the scale sufficiently small so that you decrease the objective function. So this will push the solutions towards degenerate solutions, like this one.
So a natural question is: can I just remove this term? And the answer is yes, of course you can, and it is even something rather reasonable to do, because if you rework the expression of this smoothness measure, you can actually see that this 1/α factor in front of the log term makes it vanish when α becomes sufficiently greater than one. So basically, it is quite reasonable to replace the penalty function by simply the Itakura-Saito divergence between H and its version shifted by one frame. That gives you a naturally scale-invariant smoothness measure, and there is no need to control the norm of W in the optimization program, so it is rather convenient.
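The resulting penalty, the IS divergence between each row of H and its one-frame-shifted version, can be written down and its scale invariance checked directly (a numpy sketch with toy values of my own):

```python
import numpy as np

def smoothness_penalty(H):
    """Sum of IS divergences d_IS(h_kn | h_k,n-1) along each row of H."""
    R = H[:, 1:] / H[:, :-1]
    return np.sum(R - np.log(R) - 1.0)

rng = np.random.default_rng(3)
H = rng.random((3, 10)) + 0.1

# Rescaling the rows of H (with W absorbing the inverse scale) leaves it unchanged
scales = np.diag([0.5, 2.0, 7.0])
print(np.allclose(smoothness_penalty(H), smoothness_penalty(scales @ H)))
```

Since the penalty only depends on ratios of consecutive activations, shrinking the scale of H cannot game it, which is why the norm of W needs no extra control.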
Okay, so now let us talk about the algorithm. I will skip most of the details; you can refer to the paper for more information. One approach to solve this generalized NMF problem is an EM algorithm, where you use the latent components that I introduced before as complete data. We did a similar thing in previous work for another penalty function. The problem is that this algorithm is quite slow, because the augmented data, the missing data, is very large compared to the available data, so it leads to a very slowly converging algorithm. Here we propose a new approach based on majorization-minimization, which does not require augmenting the data, meaning it does not use the latent components C in the algorithm.
So it works as described here. This is our objective function, and we produce an iterative algorithm which updates W given H (that is a standard IS-NMF update, so I will not detail it), and then updates the columns of H sequentially, given the current value of W and the current values of the neighbours of frame n, that is, frames n-1 and n+1. This problem boils down to the following subproblem, where you want to minimize this function of the vector h_n, and for this we use a majorization-minimization (MM) algorithm, which is just an iterative optimization procedure: given the current iterate, in blue we have the function that we want to minimize locally at the current update, and we simply construct a surrogate, an auxiliary function, which is easier to minimize than the original cost function. We then minimize this auxiliary function instead of the blue one, get a new iterate, and iterate; this leads to a descent algorithm which converges to a minimum.
The question is, of course, how to build such an auxiliary function. I am not going to give the details here, but the principle is as follows. Your function has a fit-to-data term and a penalty term. For the fit-to-data term, you can majorize the convex part using Jensen's inequality, and you can majorize the concave part by a first-order Taylor approximation. As a matter of fact, you do not need to majorize the penalty term, because you get a tractable update without doing so. In the end, you get a very simple update equation, really simple to implement, where the contribution of the prior, the penalty term, appears in red; if you set λ to zero, you simply get the standard Itakura-Saito NMF update.
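For the unpenalized case (λ = 0), the standard IS-NMF baseline can be sketched with multiplicative updates; the version below uses the exponent 1/2 that majorization-minimization yields for the IS divergence. This is only the plain baseline, not the paper's penalized update:

```python
import numpy as np

rng = np.random.default_rng(4)
F, N, K = 8, 12, 3
V = rng.random((F, N)) + 0.1
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

def d_is(V, Vhat):
    R = V / Vhat
    return np.sum(R - np.log(R) - 1.0)

losses = []
for _ in range(50):
    Vh = W @ H
    # MM-derived multiplicative updates for plain IS-NMF (the lambda = 0 case);
    # the exponent 1/2 is what majorization-minimization gives for this divergence.
    H *= (W.T @ (V * Vh ** -2) / (W.T @ Vh ** -1)) ** 0.5
    Vh = W @ H
    W *= ((V * Vh ** -2) @ H.T / (Vh ** -1 @ H.T)) ** 0.5
    losses.append(d_is(V, W @ H))

# Each sweep can only decrease the objective
print(all(b <= a + 1e-10 for a, b in zip(losses, losses[1:])))
```

With λ > 0, the H step above would be replaced by the column-wise penalized updates described in the talk.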
Okay, so now we can look at a few results. I applied this penalized, smoothed Itakura-Saito NMF algorithm to a jazz music signal. The power spectrogram is shown here; it sounds like this.
[plays audio]
First, let us compare the convergence, in terms of objective function value, of this MM algorithm against the EM algorithm that we could have used, taking the components as latent variables. You can see that the improvement of the MM algorithm is quite significant; this is a log scale, and this axis is the iteration number. And it runs pretty fast: it is close to CPU real-time on a standard computer.
This is the effect of the regularization for various values of the penalty weight, the parameter λ: this is the baseline, unpenalized IS-NMF, and then the penalized IS-NMF for increasing values of λ. Unfortunately, I do not have a magic formula for setting the right value of λ automatically; it has to be user-defined, so in a given setting you have to tune this parameter according to the task at hand.
Okay, so to finish, I wanted to show you the structure of the time-frequency masks that are learned by the decomposition, because I think it is quite interesting to see these structures, and to see that with a limited number of components, say ten, for a two-minute piece of music, you can learn some interesting things. Remember, the time-frequency masks are the Wiener filter gains that you apply to the observation to reconstruct each of the components.
This is the first time-frequency mask; in the display, the value zero is white and one is black. You get different structures: here you get a rather wideband image, and there more pitched structures. For example, we can listen to one of these pitched components; typically, this one is going to capture sets of notes. It sounds like this. [plays audio]
Note that this is not actually the component itself; it is simply the mask that you apply to the observation. This means that even if the mask has values close to one in some places, if there is nothing at that time-frequency point in the data, the estimated spectrum is still zero there.
And here, for example, is another type of time-frequency structure, which is wideband, and this one captures the attacks of the notes. [plays audio]
So I have ten components like this. This one, for example, clearly shows the bass (it is just a low-pass component), and this one shows the hiss noise which is present in the recording. It is high-pass; we can listen to it, and it is basically just noise. So you do learn some things, even with a limited number of components.
And you can do some nice things with the sound: this type of decomposition can have some nice audio restoration applications, for example. Basically, you have decomposed your original mono recording into a number of components (this involves some manual grouping to actually reconstruct the sources from the components), and in particular you can remove the noise component to do denoising, and also remaster the different components onto two channels to produce a stereo recording from the mono recording. This is very similar to the demo in the show-and-tell session, if you saw it; it is the same kind of ideas.
So typically, from this original mono recording, you can create an unmixed and denoised stereo version. I will play that for you: first the original mono, then the restored version. [plays audio]
And if you want, you can also listen, for example, to the brass components, the trumpet and clarinet. The interesting thing is that even if you have some artifacts on some of the estimated sources, because you replay the sources together, you actually do not perceive the artifacts, and you can render some spatial impression. And that concludes my talk.
[Audience question]
Yes. Well, I do not know; I mean, you can derive an EM algorithm for the estimation of W and H using these latent components as the complete data. It is not shown here, but you can do it quite easily. Offline, sure.
[Audience question] What happens if you take fewer or more components? That is a good question. Ten components seemed to be the proper number to use, because adding more components only tended to refine the decomposition of the noise; so it seemed that after ten components you did not obtain more interesting decompositions. To be honest, I do not remember what happens when you take fewer than ten; I do not remember.