So this is a talk about Itakura-Saito non-negative matrix factorization.

So the outline is as follows. I will briefly recall some previous work about IS-NMF and describe the latent statistical model that is specific to this NMF. Then I will present a way of actually smoothing the activation coefficients, which is the major contribution of this work, going along of course with an algorithm, which is going to be a majorization-minimization algorithm, before giving a few results.

So this is only to introduce my notation. The non-negative data here is going to be a matrix V of dimensions F by N, where F is the number of frequency bins and N is the number of frames, and we have the dictionary matrix W, the activation matrix H, and the number of components K. NMF usually involves minimizing a criterion of this form, under non-negativity constraints on W and H, with a specific cost function, which in our case will be the Itakura-Saito divergence that I will introduce shortly.

So the applicative context here is unsupervised music signal representation. We will deal with audio spectrograms, and the idea is to learn some representative spectral patterns from the spectrogram.

Of course, there are the usual questions about how to choose V (whether you should choose the magnitude or the power spectrogram) and how to choose the measure of fit used in the decomposition. And also, given an estimate, which is a rank-K approximation (you approximate the spectrogram as the sum of rank-one spectrograms), the question is: if I want to retrieve the corresponding components in the time domain, how should I invert these rank-one spectrograms, and how should I deal with the phase?

Well, a generative approach for answering these questions is what we refer to as Itakura-Saito NMF, with a latent model that is as follows. Let X be the complex-valued STFT of the signal you want to decompose; so X is different from V, X is complex-valued data. You assume that each time-frequency coefficient x_fn is a sum of components c_k,fn, where k is the component index, f is the frequency bin and n is the frame, complex-valued, such that this coefficient has a circular complex Gaussian distribution with zero mean, and thus a uniformly random phase, and with such a structure on the variance: basically a rank-one structure on the variance.

Then it is very easy to show, assuming that the components are independent, that the negative log-likelihood function is equal to the Itakura-Saito divergence between the power spectrogram of the data (that is, the absolute value of X squared) and WH, where the Itakura-Saito divergence is defined by this equation.
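As an aside, the divergence just defined and the resulting IS-NMF cost can be written down in a few lines. This is a minimal numpy sketch, not the talk's actual implementation; the names V, W, H follow the notation from the slides.

```python
import numpy as np

def d_is(x, y):
    """Itakura-Saito divergence d_IS(x | y) = x/y - log(x/y) - 1, elementwise."""
    r = x / y
    return r - np.log(r) - 1.0

def is_nmf_cost(V, W, H):
    """Negative log-likelihood of the latent Gaussian model, up to constants:
    the IS divergence between the power spectrogram V = |X|^2 and W @ H."""
    return d_is(V, W @ H).sum()
```

The divergence is zero exactly when the two arguments coincide, and positive otherwise, as a measure of fit should be.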

So that is the quite nice thing about this model: it is truly a proper generative model of the spectrogram. In particular, if the quantities of interest are the components, you can reconstruct these components in a statistically grounded way, for example by taking the MMSE estimate of the coefficient of component k at frequency f and frame n, which is simply a Wiener filter: a time-frequency mask applied to the observation, where the time-frequency mask is defined by the contribution of that component in terms of variance, divided by the variance of all the components. So that is Itakura-Saito NMF, the basic background.
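The Wiener-mask reconstruction just described can be sketched as follows; this is a hypothetical helper assuming the notation above (X the complex STFT, W and H the learned factors), not the talk's code. Since the masks of all K components sum to one, the component estimates add back up to X exactly.

```python
import numpy as np

def wiener_reconstruct(X, W, H, k):
    """MMSE estimate of component k: a time-frequency mask, the variance share
    of component k (outer product of W[:, k] and H[k, :]) over the total
    variance W @ H, applied to the observed STFT X."""
    mask = np.outer(W[:, k], H[k, :]) / (W @ H)
    return mask * X
```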

Of course, audio exhibits some time persistence, some redundancy, and taking this information into account can lead to more robust estimation of H, and thus to reduced ambiguities in the decomposition, and hopefully to more pleasant component reconstructions.

So in this work, we want to consider a penalized NMF problem, where we add a penalty function which measures the smoothness of the rows of H. And the question is: how should we build this smoothness constraint?

Again, we can take a generative approach, which is what we did in previous work as well, where we proposed to model the smoothness of the activation coefficients in terms of Markov chains, either Gamma or inverse-Gamma Markov chains, which preserve non-negativity. So the idea is simply to assume a prior for h_kn, the activation coefficient of component k at frame n, such that the mode of this distribution is obtained at the coefficient in the previous frame. So you can basically plug in here a Gamma or an inverse-Gamma distribution, and you obtain this kind of equation, with a shape parameter alpha here, which controls the peakiness of the mode around the previous value h_k,n-1.
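To make the Markov-chain prior concrete, here is a small sketch of the inverse-Gamma variant; the exact parameterisation is my assumption, not the paper's code. The scale is chosen so that the mode of the conditional sits exactly at the previous activation, and the shape parameter alpha controls how sharply the density peaks around it.

```python
import numpy as np

def inv_gamma_log_density(h, alpha, beta):
    """Log-density of the inverse-Gamma distribution, up to an additive constant."""
    return -(alpha + 1.0) * np.log(h) - beta / h

def chain_log_prior(h, h_prev, alpha):
    """Conditional prior p(h_kn | h_k,n-1): inverse-Gamma with scale
    beta = (alpha + 1) * h_prev, so its mode beta / (alpha + 1) equals h_prev."""
    beta = (alpha + 1.0) * h_prev
    return inv_gamma_log_density(h, alpha, beta)
```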

So if you do MAP estimation using this prior for H, you will get an optimization problem of this form: this is your fit to the data, and this term R is simply the minus log of the prior that we have just defined. In the case of the inverse-Gamma Markov chain, you will get a function of this form, where alpha is the shape parameter that you use in the prior distribution. So you get something which is very close to the Itakura-Saito divergence between H and its version shifted by one frame, plus a log function here. And this log function here is quite annoying, because it is going to induce an ill-posed minimization problem.

Because of that term, if you look at your objective function for a given W and H, and if you rescale this couple (W, H) by a diagonal matrix, W multiplied by the diagonal matrix and H by its inverse, you can choose the scale sufficiently small so that you decrease this objective function. So this will push the solutions towards degenerate solutions, like this one.

So a natural question is: can I just remove this term? And the answer is yes, of course you can, and it is even something rather reasonable to do, because if you rewrite the expression of this smoothness measure, you can actually see that this one-over-alpha times log term essentially vanishes when alpha becomes sufficiently large, greater than one. So basically, it is quite reasonable to replace the penalty function by simply the Itakura-Saito divergence between H and its version shifted by one frame. That gives you a natural scale invariance of the smoothness measure, and there is no need to control the norm of W in this optimization problem, so it is rather convenient.
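The resulting penalty, the IS divergence between each row of H and its one-frame-shifted copy, is easy to write down and to check for scale invariance; a minimal sketch assuming the row-wise form described above.

```python
import numpy as np

def smoothness_penalty(H):
    """IS divergence between each row of H and the same row shifted by one
    frame, summed over components and frames: d_IS(x | y) = x/y - log(x/y) - 1."""
    r = H[:, 1:] / H[:, :-1]
    return np.sum(r - np.log(r) - 1.0)
```

Because d_IS is invariant to a common rescaling of both of its arguments, the penalty is unchanged when H is multiplied by any positive constant, which is exactly the scale invariance just mentioned; and it is zero for constant rows.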

Okay, so now let us talk about the algorithms. I will skip most of the details; you can refer to the paper for more information.

One approach to solve this generalized NMF problem is to derive an EM algorithm, where you would use the latent components that I introduced a moment ago as the complete data. We did a similar thing in previous work for another penalty function. The problem is that this algorithm is quite slow, because the augmented data, the missing data, is very large compared to the available data, so it leads to a very slowly converging algorithm. And here we propose a new approach based on majorization-minimization, which does not require augmenting the data, meaning it does not need to use the latent components c_k in the algorithm.

So it works as described here. This is our objective function, and we produce an iterative algorithm which updates W given H (that is the standard penalty-free IS-NMF update, nothing new to do there), and then updates the columns of H sequentially, given the current value of W and given the current values of the neighbours of h_n, that is, frames n-1 and n+1. So this problem here boils down to this subproblem, where you want to minimize this function which depends on the vector h_n.

For this we will use an MM algorithm, which is a standard optimization procedure. So here is an intuitive picture. Given a current iterate of h_n, in blue we have the function that we want to minimize, and locally, at the current update, we simply construct a surrogate, an auxiliary function, which is easier to minimize than the original cost function. Then we minimize this function instead of the blue one, we get a new iterate, and we iterate; this leads to a descent algorithm which will converge to a minimum.
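The MM principle can be illustrated on a one-variable toy problem with the same convex-plus-concave structure: minimising f(h) = a/h + log(h), whose minimiser is h = a. Majorising the concave log term by its tangent at the current iterate gives a closed-form surrogate minimiser and monotone descent, exactly the picture just described. This toy example is mine, not from the talk.

```python
import numpy as np

def mm_minimize(a, h, iters=50):
    """MM for f(h) = a/h + log(h). The tangent bound
    log(h) <= log(h0) + (h - h0)/h0 gives the surrogate a/h + h/h0 + const,
    minimised in closed form at h = sqrt(a * h0); iterating converges to h = a."""
    for _ in range(iters):
        h = np.sqrt(a * h)
    return h
```

Each surrogate touches f at the current iterate and lies above it everywhere else, so every closed-form step can only decrease f.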

So the question is, of course, how to build such an auxiliary function. I am not going to give the details here, but basically the principles are these. In your objective function you have a fit-to-data term and a penalty term. For the fit-to-data term, you can majorize the convex part using Jensen's inequality, and you can majorize the concave part by a first-order Taylor approximation. And as a matter of fact, you do not need to majorize the penalty term, because you get a tractable update without necessarily doing this. In the end you get a very simple update equation, which is really simple to implement, where the contribution of the prior, the penalty term, is in red. If you set lambda to zero, you simply get the standard Itakura-Saito NMF update.
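For reference, the lambda = 0 special case, the standard multiplicative IS-NMF updates that the penalized update reduces to, can be sketched as follows. This is a common textbook form, not the exact penalized update from the paper, which adds the smoothness contribution; the small eps guard is my addition for numerical safety.

```python
import numpy as np

def is_nmf_update(V, W, H, eps=1e-12):
    """One sweep of the standard multiplicative updates for IS-NMF,
    decreasing sum d_IS(V | W @ H); the smoothness weight is set to zero."""
    Vh = W @ H
    W = W * ((V / Vh**2) @ H.T) / ((1.0 / Vh) @ H.T + eps)
    Vh = W @ H
    H = H * (W.T @ (V / Vh**2)) / (W.T @ (1.0 / Vh) + eps)
    return W, H
```

In practice a few dozen sweeps from a random non-negative initialisation already bring the IS cost down substantially.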

Okay, so now we can look at a few results. I basically applied this penalized, smooth Itakura-Saito NMF algorithm to some audio, a jazz music signal. The power spectrogram is shown here, and it sounds like this. [audio example]

So first, let us compare the convergence, in terms of objective function value, of this MM algorithm against the EM algorithm that we could have used, the one that uses the latent components as the complete data. You can see that the improvement of the MM algorithm is quite significant; this is a log scale, and this axis is the iteration number. And it runs pretty fast: it is close to real time in CPU time on a standard desktop computer.

And this is the effect of the regularization for various values of the penalty weight, the parameter lambda. So this is the baseline, unpenalized IS-NMF, and then penalized IS-NMF with lambda equal to one, ten and one hundred. Unfortunately, I do not have a magic rule for choosing the right lambda; it has to be user defined, and you have to tune this parameter according to the desired smoothness.

Okay, so to finish, I wanted to show you the structure of the time-frequency masks that are returned by the decomposition, because I think it is quite interesting to see their structure, and to see that, with a limited number of components, say K equal to ten for this two-minute piece of music, you can actually learn some interesting things.

So the time-frequency masks, remember, are the Wiener filter gains that you apply to the observation to reconstruct each of the components. This is the first time-frequency mask; the value zero is white and one is black. And you get different structures: here you get rather wideband images, and here you get more pitched structures. For example, if we listen to one of these components, typically this one is going to capture sets of notes, it sounds like this. [audio example]

Now, note that this is not actually the time-domain component; it is simply the mask that you apply to the observation. It means that even if you have some values close to one at some place, if there is nothing at this time-frequency point in the data, you get an estimated component spectrum which is zero. And this, for example, is another type of time-frequency structure, which is wideband, and it captures the attacks of the notes.

[audio example]

Okay, and so on.

So I have ten components like this. This one, for example, clearly shows the bass, so it is just a low-pass component, and this one shows the hiss noise which is present on the recording; it is high-pass, and we can listen to it, it is basically just noise.

So you do learn some things, even with a limited number of components, and you can do some nice things with the sound. This type of decomposition can have some nice audio editing applications, for example. You have decomposed your original mono recording into a number of components; this involves some manual grouping to actually reconstruct the sources from the components. And in particular, you can remove the noise and do denoising, and also remaster these different components onto two channels to produce a stereo recording from the mono recording. This is very similar to the show-and-tell demo, if you went to it; the demo is the same kind of idea.

So typically, from this original mono recording, you can create remixed and denoised stereo versions, and I will play that for you. So first, the original mono. [audio example]

And now the denoised, remixed stereo version. [audio example]

And if you want, you can also solo, for example, the brass components, so the trumpet and the clarinet. The interesting thing is that even if you have some artifacts on some of the estimated sources, because you replay the sources together, you actually do not perceive the artifacts, and you can render some nice spatial impressions.

And that concludes my talk.

[Q&A] Yes. Yes. I do not know. I mean, you can derive an EM algorithm for the estimation of W and H, using these latent components as the complete data. It is not shown here, but you can do it quite easily. Offline, sure.

Uh-huh.

[Question] What happens if you take fewer or more components?

That is a good question. Ten components seemed to be the proper number of components to use, because adding more components only tended to refine the decomposition of the noise. So it seemed like, after ten components, you did not obtain more interesting components. Now, to be honest, I do not remember what happens when you take fewer than ten; I simply do not remember.

Thank you.