Yes, thank you, Mister Chairman. The topic of my talk is the relation between independent component analysis and MMSE, and it is joint work with my PhD supervisor, Professor Bin Yang.

A commonly considered case for independent component analysis is the demixing of linear noiseless mixtures, and in that case the ideal demixing matrix W is the inverse of the mixing matrix A. Here, however, we want to consider linear noisy mixtures. The noise changes the ICA solution, so it is no longer the inverse of the mixing matrix, and this can be modelled by the equation shown here, W_ICA = A⁻¹ + ΔW, or, for small noise, approximated as A⁻¹ + σ_V²·W̄. Prior work on noisy ICA mainly consists of methods to compensate the bias ΔW; they modify the cost function or the update equation of ICA, but they require knowledge about the noise. We are interested in the ICA solution for the noisy case without any bias correction, because we have made the observation that it is in fact quite similar to MMSE. Our goal is to find the matrix W̄ in this equation, and by this we want to explore the relation between ICA and MMSE theoretically.

Here is a quick overview of my talk: I will start with the signal model and the assumptions. Then we will look at three different solutions for the demixing task, namely the inverse solution and the MMSE solution, which are two non-blind methods, and after that the ICA solution, which is of course a blind approach. In the results section we will then see that ICA can indeed achieve an MSE close to the MMSE.

The mixing and demixing process can be summarized by these two equations, which are probably well known to all of you: X is the vector of mixture signals, which are linear combinations of the source signals S through the mixing matrix A.
A is a square N-by-N matrix, and we have additive noise V, so X = A·S + V. The demixed signals Y are obtained by a linear transform W applied to the mixture signals X, that is, Y = W·X. The goal of the demixing is of course to get the demixed signals Y as similar as possible to the original signals S.

We make a couple of assumptions. First, the mixing process should be invertible, which means A⁻¹ should exist. The original signals are assumed to be independent with non-Gaussian PDFs q_i, with mean zero and variance one. Furthermore, we assume that each PDF q_i is three times continuously differentiable and that all required expectations exist. For the noise, we assume that it is zero-mean with covariance matrix σ_V²·R_V, where σ_V² denotes the average variance of V and R_V is a normalized covariance matrix. The PDF of the noise can be arbitrary but symmetric, and this means that all odd-order moments of the noise are equal to zero. Last, we assume that the original sources S and the noise V are independent.

Here is the first non-blind solution for the demixing task: the inverse solution, W = A⁻¹. It has the property that it achieves perfect demixing in the noiseless case. However, if there is noise, there is the danger of noise amplification, and this is especially serious if the mixing matrix A is close to singular. Of course, it is only possible if A is known in advance or can somehow be estimated, hence it is a non-blind method.

The second non-blind method is the MMSE solution, which is the matrix W that minimizes the MSE. The solution is given in this equation here, and we can approximate it in terms of σ_V², as in the last line. The properties are again that it is identical to the inverse solution if there is no noise, so we can achieve perfect demixing in the noiseless case. However, we need to know more than in the inverse case.
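As a small numerical illustration of the two non-blind solutions (my own sketch with an assumed 2-by-2 mixing matrix and noise level, not the example from the paper): for unit-variance independent sources and noise covariance σ_V²·I, the MMSE demixer has the closed form W_mmse = Aᵀ(A·Aᵀ + σ_V²·I)⁻¹, which follows from E[S·Xᵀ] = Aᵀ and E[X·Xᵀ] = A·Aᵀ + σ_V²·I. It trades a small bias for much less noise amplification than the inverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 2, 200_000                         # 2 sources, L samples
A = np.array([[1.0, 0.8],
              [0.6, 1.0]])                # assumed example mixing matrix
sigma_v = 0.3                             # assumed noise standard deviation

S = rng.laplace(scale=1/np.sqrt(2), size=(N, L))   # unit-variance non-Gaussian sources
V = sigma_v * rng.standard_normal((N, L))          # white Gaussian noise, R_V = I
X = A @ S + V                                      # noisy linear mixture

W_inv = np.linalg.inv(A)                                          # inverse solution
W_mmse = A.T @ np.linalg.inv(A @ A.T + sigma_v**2 * np.eye(N))    # MMSE solution

def mse(W):
    return np.mean((W @ X - S) ** 2)

print(f"MSE inverse: {mse(W_inv):.4f}   MSE MMSE: {mse(W_mmse):.4f}")
```

With these assumed values the MMSE demixer gives a visibly lower MSE, and the gap grows as A approaches singularity.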
Specifically, we need the mixing matrix A and the properties of the noise, or we need to be able to estimate the second-order moments between S and X, so again it is a non-blind method.

Now we come to the blind approach, the ICA solution. The idea of ICA is of course to make the demixed signals Y statistically independent, since we assume that the original signals are statistically independent. We can define a desired distribution of the demixed signals, Q(Y), as the product of the marginal densities q_i of the original sources, and then we can define a cost function, namely the Kullback-Leibler divergence between the actual PDF of the demixed signals Y and the desired PDF Q. The formula for the Kullback-Leibler divergence is given here; I just want to note that it is equal to zero if the two PDFs P and Q are identical, and larger than zero if they are different. Hence we can solve the demixing task by minimizing this cost function using stochastic gradient descent, and the update equations are given here. The update ΔW depends on W⁻ᵀ and on this correlation matrix, and the function φ_i here is the negative derivative of the log PDF of the original sources, φ_i(y) = −(log q_i(y))′. At convergence, the update of W is of course equal to zero, and this is equivalent to saying that the correlation matrix E[φ(Y)·Yᵀ] is equal to the identity matrix.

The properties of the ICA solution are that it is equal to the inverse solution if there is no noise, but the big difference is that we do not need to know anything about A or S, so it is truly blind demixing. The only thing that we require is that we know the PDFs of the original sources, and the original sources must be non-Gaussian. If all the PDFs are different, then there is no permutation ambiguity, and if you know the PDFs perfectly, then there is also no scaling ambiguity; only a scaling ambiguity remains if the PDFs are estimated.
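To make the convergence condition E[φ(Y)·Yᵀ] = I concrete, here is a small numerical check (my own sketch with an assumed Laplacian source model and an assumed mixing matrix): for a unit-variance Laplacian source, q(y) = (1/√2)·exp(−√2·|y|), so φ(y) = −(log q(y))′ = √2·sign(y), and the condition holds for the independent sources but is clearly violated by their mixtures.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100_000
# unit-variance Laplacian sources: q(y) = (1/sqrt(2)) * exp(-sqrt(2)*|y|)
S = rng.laplace(scale=1/np.sqrt(2), size=(2, L))
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                 # assumed example mixing matrix
X = A @ S                                  # mixed signals

def phi(y):
    """phi(y) = -d/dy log q(y) for the unit-variance Laplacian."""
    return np.sqrt(2) * np.sign(y)

M_src = phi(S) @ S.T / L   # ~ identity: sources are independent, scale matches
M_mix = phi(X) @ X.T / L   # non-identity: mixtures are statistically dependent
print(np.round(M_src, 2))
print(np.round(M_mix, 2))
```

The first matrix is close to I (a correctly scaled, independent Y is a stationary point of the KL cost), while the second has large off-diagonal entries, which is exactly what drives the gradient update.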
So now we come to the main result of the paper. We can show by a Taylor series expansion of the nonlinear functions φ_i that the ICA solution is given by this equation, where R̃_V is a transformed correlation matrix of the noise and M is a scaling matrix which depends on the PDFs of the original sources through the parameters κ_i and ρ_i given here. I just want to note that κ is a measure of non-Gaussianity: it is equal to one if and only if S is Gaussian, and in all other cases it is larger than one. For comparison, we have written down the MMSE solution here, and if you compare this equation with the one at the top, you can see that they are indeed quite similar, except for the scaling matrix M. If M is approximately a matrix with all elements equal to one, then we can conclude that the ICA solution is close to the MMSE solution, and we can also show that in that case the MSEs of the ICA solution and the MMSE solution are quite similar.

The elements of the scaling matrix M are determined by the PDF q_S of the sources. To draw any further conclusions, we assume a certain family of PDFs, namely the generalized Gaussian distribution. The PDF is given here, where Γ denotes the Gamma function and β is the shape parameter which controls the shape of the distribution: for β equal to 2 we obtain the Gaussian distribution, for β equal to 1 the Laplacian distribution, and if we let β go to infinity we get the uniform distribution. If we fix the variance to one, we obtain ρ = β − 1; the parameter κ and the elements of the scaling matrix M are shown in the plot and the table here. The diagonal elements m_ii are exactly equal to κ/2, and the off-diagonal elements m_ij lie between 0.5 and 1.
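The generalized Gaussian family is easy to experiment with. Here is a sketch (my own, using the standard Gamma-transform sampler; the unit-variance normalization follows from Var = α²·Γ(3/β)/Γ(1/β)) that draws unit-variance GGD samples and checks the trend of non-Gaussianity over β via the excess kurtosis:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)

def ggd_samples(beta, size):
    """Unit-variance generalized Gaussian samples via the Gamma transform."""
    alpha = np.sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))   # scale for variance 1
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)    # |y/alpha|^beta ~ Gamma(1/beta)
    return rng.choice([-1.0, 1.0], size=size) * alpha * g ** (1.0 / beta)

stats = {}
for beta in [0.5, 1.0, 2.0, 8.0]:
    y = ggd_samples(beta, 400_000)
    stats[beta] = np.mean(y**4) / np.mean(y**2) ** 2 - 3.0   # excess kurtosis
    print(f"beta={beta}: variance={y.var():.3f}, excess kurtosis={stats[beta]:+.2f}")
```

Small β gives strongly super-Gaussian (heavy-tailed) sources, β = 2 recovers the Gaussian with zero excess kurtosis, and large β approaches the sub-Gaussian uniform limit, matching the three special cases named in the talk.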
But maybe more interesting than these parameters is the question of how good ICA can be and how close it can get to the MMSE estimator. For this we consider an example: two GGD sources with the same shape parameter β, the mixing matrix given here, and Gaussian noise with an identity covariance matrix. We have studied the relative MSE, which means the MSE of the ICA solution divided by the MSE of the MMSE solution. As you can see from the plot on the right-hand side, the relative MSE of the ICA solution is close to one for a large range of the shape parameter β; it is less than 1.06, so only six percent worse than the MMSE estimator. For reference, we have also calculated the relative MSE of the inverse solution for the two SNRs of 10 dB and 20 dB. You can see that the blind approach, ICA, outperforms the inverse solution, which is a non-blind method, for a large range of values of the shape parameter β.

Up to now we have considered only theoretical results, which are valid only for an infinite amount of data, since we have evaluated all the expectations exactly. In practice you never have an infinite amount of data, so now we want to look at an actual Kullback-Leibler-divergence-based ICA algorithm with a finite amount of data. In practice one does not use the standard gradient but instead the natural gradient, because it has better convergence properties; the update equation is shown here. Since we are now using a finite amount of data, not only the bias of the ICA solution with respect to the MMSE solution is important, but the covariance of the estimate also contributes to the MSE. Since we consider two identically distributed sources, ICA suffers from the permutation ambiguity, so we need to resolve this before we can calculate the MSE.
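A minimal batch version of the natural-gradient update, ΔW = μ·(I − E[φ(Y)·Yᵀ])·W, can look as follows (my own sketch with an assumed step size, φ(y) = tanh(y) as a generic super-Gaussian nonlinearity, and a noiseless toy mixture; not the authors' exact implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 2, 20_000
A = np.array([[1.0, 0.7],
              [0.4, 1.0]])                          # assumed mixing matrix
S = rng.laplace(scale=1/np.sqrt(2), size=(N, L))    # unit-variance Laplacian sources
X = A @ S                                           # noiseless for this illustration

phi = np.tanh        # assumed source-model nonlinearity (super-Gaussian)
W = np.eye(N)
mu = 0.05            # assumed step size
for _ in range(2000):
    Y = W @ X
    C = phi(Y) @ Y.T / L                 # sample estimate of E[phi(Y) Y^T]
    W += mu * (np.eye(N) - C) @ W        # natural-gradient update

# at convergence E[phi(Y) Y^T] ~ I, and W @ A is close to a scaled permutation
print(np.round(W @ A, 2))
```

Unlike the standard gradient, this update needs no matrix inversion per step, which is one reason for its better convergence behaviour.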
Last, the scaling of the ICA components is slightly different from the scaling of the MMSE solution, so we also compensate for this before we calculate the MSE value.

Here, on the left plot, we show the MSE for Laplacian-distributed signals for different SNRs and different sample sizes L. The black solid line is the MMSE estimator, and the coloured lines are the actual performance of the ICA algorithm. As you can see, for a large enough sample size we can get quite close to the MMSE estimator, so we can achieve a very good MSE performance. For low SNR we can also see that ICA outperforms the inverse solution; this is shown on the right-hand side, where we plot the relative MSE. The line with the triangle markers corresponds to the inverse solution here: for low SNR its relative MSE increases quite dramatically, whereas the ICA solution still achieves a reasonable MSE. The point where the ICA solution and the inverse solution cross depends on the mixing matrix and the sample size. One further point I want to mention: we have also plotted the theoretical ICA solution, with the downward triangle markers, and you can see that it matches the performance of the actual ICA algorithm quite well, except for the very low SNR of 0 dB. This is because we have made a small-noise assumption in the derivation, since we have only considered terms up to order σ_V² in the Taylor series.

We also wanted to study the influence of the shape parameter β on the performance. Here we plot the relative MSE of the ICA solution for different SNRs of 10 dB, 20 dB and 30 dB. The general trend is that the more non-Gaussian the sources, so for β close to 0.5 or for large values of β, the lower the relative MSE, except for this case here at an SNR of 10 dB. One last point concerns the behaviour of the relative MSE over SNR.
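The permutation and scaling compensation described above can be sketched like this (my own illustration; the paper's exact procedure may differ): match each source to its most correlated demixed component, then apply the MSE-optimal scale a = E[s·y]/E[y²], which also fixes the sign, before computing the MSE.

```python
import numpy as np

def compensated_mse(Y, S):
    """Resolve the permutation by correlation matching, then apply the
    MSE-optimal per-component scaling before computing the MSE."""
    N, _ = S.shape
    corr = np.corrcoef(S, Y)[:N, N:]            # N x N cross-correlations
    used, mse_sum = set(), 0.0
    for i in range(N):                          # greedy permutation matching
        order = np.argsort(-np.abs(corr[i]))
        j = next(k for k in order if k not in used)
        used.add(j)
        a = (S[i] @ Y[j]) / (Y[j] @ Y[j])       # optimal scale, includes sign
        mse_sum += np.mean((a * Y[j] - S[i]) ** 2)
    return mse_sum / N

# toy check with an assumed permuted, flipped and rescaled copy of the sources
rng = np.random.default_rng(3)
S = rng.laplace(scale=1/np.sqrt(2), size=(2, 10_000))
Y = np.array([-2.0 * S[1], 0.5 * S[0]])
print(compensated_mse(Y, S))    # ~0: the compensation undoes all three effects
```

Only after this compensation is the MSE of the ICA output comparable to that of the MMSE estimator, which resolves neither permutation nor sign ambiguities.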
One might wonder why the relative MSE increases for increasing SNR: if you go from 10 dB to 30 dB SNR, why does the relative MSE increase? This can be explained by the fact that the MSE of ICA in the noiseless case is not close to zero but is lower-bounded by the Cramér-Rao bound, which depends on κ, and hence the relative MSE increases for increasing SNR.

To summarize, in this paper we have derived the ICA solution and the MMSE solution for the noisy case. We have seen that there exists a relation between ICA and MMSE which depends on the PDFs of the original sources, and that the ICA solution, which is of course a blind approach, is close to the MMSE solution. We want to stress that the relation also exists when the nonlinearity φ_i does not match the true PDF. We have seen in the simulation results that we can in practice achieve an MSE close to the MMSE estimator with an ICA algorithm based on the Kullback-Leibler divergence, and we have also seen that not only the bias of the ICA solution is important, but that the covariance of the estimate also determines the performance. To sum up everything, I want to state that blind demixing by ICA is in many cases similar to non-blind demixing based on MMSE. Thank you for your attention.

[Question from the audience, inaudible.]

This is assuming that there is no time dependence. And of course it depends: if you assume, for example, the wrong type of distribution, if you assume that it is sub-Gaussian while the source is in fact super-Gaussian, then of course ICA does not work, so one has to assume the correct type. And it depends on the amount of mismatch: if the mismatch is reasonably small, it is still a good approach.

[Second question from the audience, inaudible.]
It could be, yes. I have mentioned in the paper that one could use this derivation to derive a better φ function, which could give a lower MSE by obtaining a matrix M whose elements are close to one. But the problem is that this obviously depends on the SNR, and so again you would need to know it. Yes. OK.