Hi. This is joint work with Francis Bach, from the Sierra team at INRIA, and Cédric Févotte, from the statistics team at Télécom ParisTech. I am going to talk about Itakura-Saito nonnegative matrix factorization with group sparsity. There have already been several talks about Itakura-Saito nonnegative matrix factorization; we have been working on adding priors to this framework, so I will go over nonnegative matrix factorization quickly in the next few slides.

Here you can see a simple example of an audio signal: a short piano sequence. At the top you can see that first, four notes are played alone, and then combinations of two notes, which are harmonically related, one being an octave above the other. This is an example of a difficult source separation problem. So you have the data, and nonnegative matrix factorization learns a basis dictionary, with basis spectra, and time activations. Here you can see the dictionary and the time activations, and you can see very clearly that the notes are separated, and that you can read the activations very easily: four notes are played alone, and then combinations of two notes. There are still two components left over: one that explains the noise, and one that you can see here; if we could listen to it, it sounds like the hammer of the piano. So this is an example where Itakura-Saito nonnegative matrix factorization works really well.

Here is the same example with nonnegative matrix factorization using another loss, the Euclidean loss. These are the same type of plots, except that you can see that for the first note, which was the top component before, the top component now gets split up across other components, so the separation is not as good as before. That is explained by the fact that the Itakura-Saito measure of divergence is more sensitive to high-frequency, low-energy content and to transients, so it seems better suited for audio.
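To make that last point concrete, here is a small numerical illustration of my own (not from the talk): the Itakura-Saito divergence is scale-invariant, so a given relative error costs the same in a quiet time-frequency bin as in a loud one, whereas under the Euclidean loss the quiet bin contributes almost nothing.

```python
import numpy as np

def d_euclidean(x, y):
    """Squared Euclidean distance between two nonnegative scalars."""
    return (x - y) ** 2

def d_itakura_saito(x, y):
    """Itakura-Saito divergence d_IS(x | y) = x/y - log(x/y) - 1."""
    r = x / y
    return r - np.log(r) - 1.0

# A quiet bin (e.g. a high-frequency partial) and a loud bin, each
# approximated with the same 10% relative error.
for power in (1e-4, 1e2):
    x, y = power, 1.1 * power
    print(f"power={power:g}  EUC={d_euclidean(x, y):.3e}  "
          f"IS={d_itakura_saito(x, y):.3e}")

# EUC differs by 12 orders of magnitude between the two bins, so the quiet
# one is essentially ignored by the fit; IS is identical for both, since
# d_IS(a*x | a*y) = d_IS(x | y) for any a > 0.
```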
Now, if we move on to more complicated audio signals, a problem appears: even if you have only two sources, each source can emit several different spectra. For example, when I speak, there are several spectra that you can associate with my voice, since I am not always saying the same thing. So there is the problem of grouping the components into sources, of assigning several components to each source.

For instance, you can simply run an NMF and look at the matrix H of activation coefficients. In this very simple example, where you have a bass and another instrument with overlapping spectral regions, you can see that for some components it is already very clear that they should be assigned to the bass, and other components to the other source. So one approach is to look at the dictionary and, guided by heuristics, the engineer can design the best grouping of components into sources. The problem is that as the tracks get longer, as you get more tracks, and also as the dictionary gets larger, this becomes more complicated for the engineer, because there is a lot more work to do. And if you use a heuristic instead, the heuristic will involve considering all permutations of your dictionary: if you have five components, you have factorial-five permutations to consider, which is small; but if you go to ten or twenty components in the dictionary, this becomes way too long. You would run the NMF in thirty seconds and then spend one day considering all the permutations.

So what we want to do is to include the grouping in the learning of the dictionary. One way of thinking about how to group the components is to think about the sound level of each source at a given time. Here, for a given track, I plotted the volume of each source: the bass, the drums, and the voice. You can see that there are cues you can use. For instance, at this time the bass is at a very low level compared to the other sources, so you could say that at some points one source is inactive while the others are active. Another idea is to exploit the fact that the shapes of these volume curves are very different from one source to another.

Now, coming back to the formal setup, let me fix the notation a little. What we have been looking at is V, the power spectrogram. You can consider a generative model in which the observed short-time spectrum is the sum of several components, and each component of the complex spectrum is Gaussian with a diagonal covariance; nonnegative matrix factorization then consists in computing a factorization of the parameters of this model. In the case of the Itakura-Saito divergence, this corresponds to a zero-mean Gaussian model, which means that we have a truly additive model for the power spectrum: even if additivity does not hold for the observed spectrogram, it does hold for the parameters we want to estimate, and it is the only model for which this is true. The Gaussian assumption, when you go down to the power spectrogram, means that the power spectrogram is distributed as an exponential with parameter WH, where W is the basis dictionary and H the matrix of time activations.

In my notation, H has several rows, and you want to find a partition of the rows of H into, say, two groups; this generalizes to an arbitrary number of groups. Here H would be split into two groups with the same number of rows each. Now, coming back to the previous slide, what is the volume, the sound level, of each source in this model? Well, if you assume that each column of W sums to one, then the sound level of one source is the sum of the activation coefficients of group one, which corresponds to source one. So what we want to model are these coefficients.
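Before moving on to the inference, here is a minimal simulation of the generative model just described (toy dimensions and component distributions of my own choosing); it checks numerically that the power spectrogram is exponentially distributed with mean WH.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 64, 200, 4                 # frequency bins, frames, components

W = rng.gamma(1.0, 1.0, (F, K))      # nonnegative basis spectra
W /= W.sum(axis=0)                   # columns sum to one, so H carries the level
H = rng.gamma(0.5, 2.0, (K, N))      # nonnegative time activations

# Each latent component of the complex spectrum is zero-mean Gaussian with
# variance W[f, k] * H[k, n]; the components add up in the complex domain.
sigma2 = np.einsum('fk,kn->fkn', W, H)
C = np.sqrt(sigma2 / 2) * (rng.standard_normal((F, K, N))
                           + 1j * rng.standard_normal((F, K, N)))
X = C.sum(axis=1)                    # observed complex spectrogram

V = np.abs(X) ** 2                   # power spectrogram
print(np.mean(V / (W @ H)))          # ~ 1.0: V[f,n] ~ Exponential(mean (WH)[f,n])
```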
The inference we propose learns the grouping at the same time as the factorization. This corresponds to doing a penalized IS-NMF: we propose adding a prior that is separable across the groups of the different sources. Since you have nonnegative coefficients, the ℓ1 norm here is just the sum of the coefficients of H for one source, that is, one group, at a given time; and about ψ, we only assume that it is a concave function. What this optimization problem says is that you want a good fit to the data, but at the same time you have a prior that, at a given time, only a few sources are active.

Of course, we have to justify the choice of ψ. If you look at the paper, you will see that it comes from a graphical model with two layers; I won't say too much about this, but it corresponds to maximum likelihood inference in a hierarchical model of the data with a prior on H.

About the inference of the parameters: exact Bayesian inference is very hard here, so rather than resorting to such methods for parameter inference, we resort to multiplicative updates, because they go much faster. On the right is an example of running the algorithm with either gradient methods or a multiplicative update method: the multiplicative updates go much faster and actually converge to a better local optimum.

Our algorithm does not change significantly from standard, classical IS-NMF: we just add terms which correspond to our prior, and since ψ is a concave function, its derivative ψ' decreases with the sound level of source one. So what the algorithm does is that, at each step, you update H so as to get a better fit to the data, corresponding to the classical IS-NMF multiplicative update; and the lower the volume of source one at a given time, the more its coefficients are penalized at that time. It means that this algorithm will push low-amplitude sources to zero and keep high-amplitude sources. And I insist on the fact that, even with this prior, the speed of the algorithm does not change: it converges in approximately a thousand iterations, and the time per iteration is essentially the same as for classical IS-NMF.
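To recap the method in one formula, the penalized problem described above can be written as follows (my reconstruction from the talk; the paper's exact notation may differ), with G groups and a hyperparameter λ weighting the prior:

```latex
\min_{W \ge 0,\; H \ge 0} \; D_{\mathrm{IS}}\!\left(V \,\middle|\, WH\right)
  \;+\; \lambda \sum_{n=1}^{N} \sum_{g=1}^{G}
        \psi\!\Bigl(\sum_{k \in g} h_{kn}\Bigr),
\qquad
D_{\mathrm{IS}}(V \mid \hat V) \;=\; \sum_{f,n}
  \Bigl( \frac{v_{fn}}{\hat v_{fn}} - \log\frac{v_{fn}}{\hat v_{fn}} - 1 \Bigr),
```

where ψ is concave and the inner sum, the ℓ1 norm of the coefficients of group g at frame n, is exactly the sound level of source g defined earlier.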
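And here is a heuristic Python sketch of the kind of penalized multiplicative H-update just described, taking ψ(u) = log(ε + u) as one possible concave choice; this illustrates the principle, and the exact update rule used in the paper may differ.

```python
import numpy as np

def penalized_H_update(V, W, H, groups, lam=1.0, eps=1e-9):
    """One multiplicative update of H for the penalized IS-NMF objective
        D_IS(V | WH) + lam * sum_{g,n} psi( sum_{k in g} H[k, n] ),
    with psi(u) = log(eps + u).  psi'(u) = 1 / (eps + u) is decreasing, so
    groups that are already loud at frame n are shrunk less, while quiet
    groups are pushed toward zero -- the behavior described in the talk.
    `groups` is a list of row-index arrays partitioning the rows of H."""
    Vhat = W @ H + eps
    num = W.T @ (V / Vhat ** 2)        # negative part of the IS gradient
    den = W.T @ (1.0 / Vhat)           # positive part of the IS gradient
    pen = np.empty_like(H)
    for g in groups:                   # psi'(group sound level) per frame
        level = H[g].sum(axis=0)
        pen[g] = 1.0 / (eps + level)
    return H * num / (den + lam * pen) # stays nonnegative by construction
```

In a full algorithm this would alternate with the classical IS-NMF update of W, renormalizing the columns of W to sum to one (and rescaling the rows of H accordingly) so that the group sums keep their interpretation as sound levels.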
One complicated aspect of having this prior is that you must do model selection for the hyperparameters: the hyperparameter λ in our model, and also the choice of ψ. Given that we actually have a graphical model that explains the choice of this prior, we could resort to Bayesian tools to estimate these parameters, but instead we devised a statistic that is much simpler to use. The principle of this statistic is that if you have the right parameters, then V given these parameters should be exponentially distributed. So if you divide V by the estimate of WH and look at this random variable, it should be distributed as an Exponential(1); and you have a lot of samples of it, because you have as many samples as there are time-frequency bins. Then computing a Kolmogorov-Smirnov statistic becomes very interesting, because it is very cheap: you can just run a whole grid of experiments and look at the parameter values for which you have the lowest Kolmogorov-Smirnov statistic.
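In code, the check described here is essentially a one-liner around scipy's KS test. A sketch, assuming the factorization has already been learned; `fit` in the usage comment is a hypothetical function returning the learned W and H for a given λ:

```python
import numpy as np
from scipy import stats

def ks_model_check(V, W, H):
    """Kolmogorov-Smirnov check from the talk: if (W, H) are the right
    parameters, every entry of V / (W @ H) should be a draw from an
    Exponential(1) distribution, and each time-frequency bin gives one
    sample.  A lower KS statistic means a better-calibrated fit."""
    residuals = (V / (W @ H)).ravel()
    ks_stat, _ = stats.kstest(residuals, 'expon')  # Exp(1) by default
    return ks_stat

# Hypothetical model selection over a grid of lambda values:
# best_lam = min(grid, key=lambda lam: ks_model_check(V, *fit(V, lam)))
```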
We did that on synthetic data generated from the model. We look at different numbers of training samples: at the top you can see the value of the statistic, in blue, and in red a measure of the distance to the ground truth, because in this setting we generated synthetic data from a known model, so we can actually compute the divergence to the true model. We also report a classification score: the grouping counts as correct only if source one is recovered exactly and source two is recovered exactly. When there are only a hundred observations, you can see that there is good classification accuracy, but it is difficult to find a minimum of the statistic. As you increase the number of points, the statistics get better, you see the minimum of the statistic more clearly, and the divergence to the true model also goes down. So this means that with our prior, the model estimates the parameters as well as possible.

Now on to experimental results. A first test is to try this on a simple segmentation task, where you know that at any given time only one source is active. A good baseline is to compare our algorithm with the simple idea of doing an NMF and then finding the best permutation according to a heuristic: compute an NMF, then find the permutation of the components that minimizes this quantity. Here are the results: the mix first, then the speech, then the other sources. This is the result of an NMF with the heuristic grouping: you can see that there is still a lot of mixing, the sources are not well separated. And this is the result with our algorithm, which learns the grouping at the same time as the NMF: you can see that the separation gets a lot closer to the original.

The second experiment was run on real audio signals. We took tracks from the SiSEC database and evaluated the quality of the separation while varying the degree of overlap between the sources; the more they overlap, the more difficult the separation becomes. And I insist on the fact that we have no prior on the sources, so you cannot hope for perfect separation; but you will see that the less overlap there is, the better the separation. This is the mix, and these are the sources: you can see that at thirty-three percent of overlap you get very good separation in terms of SDR, and as the overlap increases it works less and less well. So what our prior is good for is this: when not all sources are active at the same time, it helps the separation. We can listen to a few examples.

[Audio examples are played: the mixture, the original sources, and the estimated sources.]

We have ten seconds left, so to conclude: we have proposed a simple sparsity prior to group the sources and solve the permutation problem in the single-channel source separation case, and we showed that the algorithm performs better than doing the grouping as a post-processing step. In future work, we will try to incorporate smoothness priors to model the temporal dynamics of H.

[Chair] We have time for only one question.

[Question] In your music example, the sources are mostly pitched components and they mix very much, because they play according to the score, so the signals are closely aligned. I am wondering how much the sampling rate and the FFT size affect the results: if you went to a higher frequency resolution, would the separation be different?

[Answer] We don't talk about this in the article, but from my experiments the algorithm is not very sensitive to the sampling rate. For this experiment I chose a sampling rate of twenty-two kilohertz, just because of computing-time concerns. For this example, I guess it would not have too much effect, since the sources play approximately in the mid and high range of the spectrogram, where the frequency ranges are pretty well separated. But if you have a bass and another source, then having a good resolution will help: since we have no model for the sources, having a good sampling rate helps because you can then afford a longer time window and a better resolution in frequency, which is particularly important in the low-frequency range. But overall I would say that the results are very robust to this.

Okay, thank you.