Okay, thank you. So I come to this very last presentation, on model-based speech enhancement. First, the outline of the talk: I will start with a short introduction, after which I will give a brief overview of the model-based noise reduction we use here for speech enhancement. I will then present our SNR-dependent MMSE estimator, where SNR-dependent means that we have several MMSE estimators and the input SNR decides which one we choose. Then I will show some test results, give a short demonstration, and finally close with the summary.

First, let me introduce the notation we use in this presentation. We consider a scenario where we record a noisy microphone signal y, which consists of a clean speech signal s that is additively distorted by a noise signal n. In the frequency domain we use the representation shown here, with capital letters, a frame index, and a frequency bin index. All estimates in the following are denoted by a hat; for example, at the output of our noise suppression we have the enhanced speech signal.

In the literature, so-called statistical noise reduction approaches are often used for the purpose of speech enhancement, among them, for example, the Wiener filter or the weighting rules by Ephraim and Malah. These techniques usually assume a certain distribution for the speech and the noise signal, for example a Gaussian or Laplacian pdf, and apply a mathematical criterion such as MMSE, maximum likelihood, or MAP in order to estimate the speech signal. So the classification here is based on memoryless a priori knowledge. In contrast, the advantage of model-based approaches is that they can additionally consider correlation across time and/or frequency, for example by using a specific model of the speech signal, so we can exploit a priori information of higher order. One example of such a model-based approach is the modified Kalman filter.
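As a point of reference for the memoryless statistical approaches just mentioned, a spectral weighting such as the Wiener filter can be sketched as follows. This is only a minimal illustration, not the system of the talk; the a priori SNR values here are made up:

```python
import numpy as np

def wiener_gain(snr_prio):
    """Classical memoryless Wiener weighting gain from the a priori SNR."""
    return snr_prio / (1.0 + snr_prio)

# Toy data: apply the gain to a noisy DFT frame Y to get the speech estimate.
rng = np.random.default_rng(0)
Y = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # noisy coefficients
snr_prio = np.full(8, 2.0)                                # assumed a priori SNR
S_hat = wiener_gain(snr_prio) * Y                         # gain 2/3 on every bin
```

Each bin is attenuated independently of all other frames and bins, which is exactly the memoryless property the model-based approach improves on.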
In the following I will present this system. It consists of two steps. The first step, called propagation, tries to exploit the temporal correlation of the speech DFT coefficients. This is illustrated here: we are working in the frequency domain, and you see the previously enhanced speech coefficients for one specific frequency bin, which are used to predict the current speech coefficient. For this we use conventional linear prediction techniques, based on an AR model and its AR coefficients, which have to be known or estimated. In the second step, called the update step, we then only have to estimate the prediction error that was made in the first step. This prediction error is denoted in the following as E_S. In order to estimate it, we consider the difference signal D, which is the noisy input coefficient minus the first speech prediction. As we will see later, we estimate E_S by a spectral weighting of the difference signal D with a weighting gain G. Once we have estimated E_S, we can update our first prediction and finally obtain the enhanced speech coefficient. Such a Kalman filter system is applied separately for each frequency bin, and the whole frame is finally transformed back into the time domain.

The system can be extended to correlated noise signals in order to also exploit possible correlation of the noise signal. For this we apply the propagation step to the noise signal as well, where we use the previously enhanced noise estimates from the past in order to predict the current noise coefficient. In the update step we then have to estimate two prediction errors: that of the speech signal, E_S, and that of the noise signal, E_N.

So let us take a closer look at this problem. The objective in the update step, as just mentioned, is to estimate E_S and E_N based on the difference signal D.
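The propagation step described above can be sketched as a plain linear prediction per frequency bin. The function name and the coefficient values below are illustrative only; in the talk the AR coefficients are estimated per frame:

```python
import numpy as np

def propagate(prev_coeffs, ar_coeffs):
    """Propagation step (sketch): predict the current complex DFT coefficient
    of one frequency bin as a linear combination of the previously enhanced
    coefficients of that bin, weighted by the AR coefficients."""
    return np.dot(ar_coeffs, prev_coeffs)

# Illustrative AR model of order 3, as used for the speech signal in the talk
# (the coefficient values themselves are made up).
ar = np.array([0.8, 0.15, 0.05])
prev = np.array([1.0 + 0.5j, 0.9 + 0.4j, 0.7 + 0.3j])  # newest first
s_pred = propagate(prev, ar)
```

The same routine would be applied to the previously enhanced noise coefficients, with its own (lower-order) AR model, to obtain the noise prediction.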
The difference signal D is given as the noisy input coefficient S plus N, minus the first speech prediction, minus the first noise prediction. This expression can also be stated as the sum of the two prediction errors, E_S plus E_N. So we have a classical noise reduction problem in the update step: we have a target signal E_S that we want to estimate, which is distorted by an additive noise signal E_N, and we have access only to the noisy difference signal D. This allows us to use a conventional statistical estimator here, adapted to the statistics of E_S and E_N, and we can perform a spectral weighting of the difference signal with a weighting gain G in order to estimate E_S, or with 1 − G in order to estimate the noise prediction error E_N.

To derive the weighting gain G, the original Kalman filter approach assumes a Gaussian pdf for E_S and E_N and minimizes the mean square error between E_S and its estimate, which leads to the well-known Wiener solution for the weighting gain G, as can be seen here. However, we measured the statistics of the speech prediction error signal E_S, and the distribution of E_S is not Gaussian but super-Gaussian, as we showed at ICASSP 2008. This fact can be exploited in the update step if we do not use the Wiener filter but a statistical estimator that can be adapted to these statistics, for example the MMSE estimator by Erkelens and colleagues, which assumes a generalized gamma distribution for the target signal.

So far we had measured the pdf of E_S over a large SNR range and averaged the results, so in the end we had one single histogram. In this contribution we now performed an SNR-dependent measurement of the statistics. For this we distorted our speech signals by white Gaussian noise at different input SNR values and measured the histograms. The result can be seen here: the normalized pdf of the magnitude of E_S, depending on the input SNR.
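The update step described above can be sketched as follows. The gain value is a placeholder; in the system it would come from the (Wiener or MMSE) estimator, and all the numbers are toy data:

```python
def update(Y, s_pred, n_pred, G):
    """Update step (sketch): the difference signal D = Y - s_pred - n_pred
    equals the sum of the two prediction errors E_S + E_N.  The weighting
    gain G extracts the speech error, 1 - G the noise error, and both
    predictions are corrected accordingly."""
    D = Y - s_pred - n_pred
    e_s = G * D           # estimate of the speech prediction error E_S
    e_n = (1.0 - G) * D   # estimate of the noise prediction error E_N
    return s_pred + e_s, n_pred + e_n

# Toy numbers; G = 0.75 stands in for the estimator's output.
s_hat, n_hat = update(Y=1.0 + 1.0j, s_pred=0.6 + 0.7j, n_pred=0.2 + 0.1j, G=0.75)
```

Note that s_hat + n_hat reproduces the noisy input Y by construction, since the two error estimates together account for the whole difference signal.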
The input SNR varies here from minus 20 to 35 dB, and you can clearly see that it has an influence on the histograms: the higher the input SNR, the higher the probability that only small prediction errors occur. This fact can now also be exploited in our system if we use an SNR-dependent MMSE estimator in the update step. For this we use the MMSE estimator mentioned before, now adapted to each of the histograms we have just seen. For each quantized SNR value, with a step size of 5 dB, we use a different MMSE estimator, so the gain G now also depends on the input SNR. In order to estimate the input SNR in our system, we simply use the enhanced speech and noise coefficients from previous frames.

With such a system we of course increase the computational complexity and the memory requirements compared to a conventional statistical estimator. Compared to the Wiener filter, for example, we increase the complexity by a factor of about six, and additionally we have to store previous frames for the prediction part and a lookup table for each estimator.

Now to the results. Our system uses relatively low model orders: a model order of three for the speech signal and an order of two for the noise signal. The AR coefficients are estimated in each frame using the Levinson-Durbin algorithm, applied to estimates from previous frames, and the minimum statistics approach is used to estimate the noise power. What can we achieve with such a system? First, objective measurements averaged over five different noise signals: you see here the segmental speech SNR plotted over the noise attenuation, with the input SNR varying from minus 10 to 35 dB. The objective is to achieve a high noise attenuation and at the same time a high segmental speech SNR, so the further these curves lie toward the upper right corner,
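The SNR-dependent estimator selection can be sketched as a simple clamp-and-snap to the 5 dB grid followed by a table lookup. The function, the grid limits, and the table contents are illustrative assumptions, not the actual implementation:

```python
def quantize_snr_db(snr_db, lo=-20.0, hi=35.0, step=5.0):
    """Clamp the estimated input SNR to the measured range and snap it to
    the 5 dB grid used to index the precomputed estimator tables."""
    snr_db = min(max(snr_db, lo), hi)
    return lo + step * round((snr_db - lo) / step)

# Hypothetical lookup: one precomputed gain table per quantized SNR value
# (-20, -15, ..., 35 dB), here just labeled by name.
gain_tables = {q: f"table_{q:+.0f}dB" for q in [i * 5.0 - 20.0 for i in range(12)]}
key = quantize_snr_db(12.3)   # snaps to the 10 dB grid point
table = gain_tables[key]
```

The quantization keeps the memory cost bounded: one stored table per grid point instead of a continuum of estimators.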
the better the performance. In blue and red you see the results of two purely statistical estimators: the Wiener filter and the Laplacian MMSE estimator, which assumes a Laplacian distribution for the speech signal. In green and magenta you see the two proposed Kalman filter approaches: in green the one where we use the SNR-independent MMSE estimator in the update step, and in magenta the new approach with the SNR-dependent MMSE estimator. Overall you can see that with the two Kalman filter approaches we outperform the two statistical estimators. Look, for example, at an input SNR of 5 dB: if we keep the segmental speech SNR constant, we achieve a much higher noise attenuation with the two model-based approaches, and gain here 2 to 3 dB of noise attenuation if we compare the Wiener filter and the new SNR-dependent Kalman filter.

I would also like to give you a short demonstration of these four investigated techniques. First I will play the noisy signal, then the enhanced signals with the Wiener filter and the Laplacian MMSE estimator, then the two Kalman filter approaches, and at last, once again, the noisy signal. [Audio examples are played.] So you could hear that with the newly proposed SNR-dependent Kalman filter we achieve the highest noise attenuation while maintaining almost the same speech quality. Additional objective measurements showing similar behavior can be found in the paper. In the meantime we have also conducted an informal listening test, which cannot be found in the paper. On the left side we compared the estimators which were not adapted to the measured statistics: the SNR-independent Kalman filter versus the Wiener filter. On the right side we compared the estimators which are exactly adapted to the measured statistics: the SNR-dependent Kalman filter versus the Laplacian MMSE estimator. We had nineteen test persons who
judged the overall quality of the respective techniques, and in both figures you can see a clear preference for the newly proposed Kalman filter.

To summarize: we presented a modified Kalman filter approach which is able to exploit the temporal correlation of the speech and noise signals. We showed that, in the update step, the input SNR has an influence on the statistics of the speech prediction error signal. This fact can be exploited by using an SNR-dependent MMSE estimator, adapted to the measured histograms, in the update step, and we showed in objective and subjective evaluations that we can improve on the results of the purely statistical estimators.

Question: A first question. It is hard to hear, but I thought I detected an increased amount of musical noise in your last examples that you played. Can you comment, please?

Answer: Yes, that is true. There is a trade-off between noise attenuation, speech distortion, and musical tones. In the first two aspects we are better, but unfortunately we have a slight increase of musical noise. Well, one could use some post-processing techniques to reduce the remaining musical tones.

Question: In the plot that you showed, I think you had four different types of noise that you were looking at, or were you...?

Answer: We used five different types of noise and averaged across all five.

Question: And what you played, was that the white Gaussian noise example?

Answer: That was the factory noise.

Question: Just a quick question. The model order, which you set equal to three for speech and two for noise: did you try varying that with different types of speech, like growls, or...?

Answer: We averaged over a large database of speech.

Question: And the values you set, do they make a difference for the different types of noise?

Answer: Yes, it depends on the type of noise, of course:
less for white Gaussian noise and, of course, more for babble noise.

Moderator: Okay, I do not see further questions, so I would like to thank all the speakers of the session.