Thank you for the introduction. The outline is as follows: I will give a short introduction with a problem statement, then introduce the speech distortion weighted multichannel Wiener filter (SDW-MWF), and then very briefly introduce the speech presence probability, which is the basis for the solution we are going to propose. First, just a short word on the hearing loss problem. Some common causes of hearing loss are age-related degeneration and exposure to noise, for instance listening to loud music for a long time, so it is something that can affect all of us. One consequence of a hearing loss is reduced frequency resolution and reduced temporal resolution: you have difficulty distinguishing between sounds at different frequencies, and you have problems with low-level sounds. A particular problem for hearing aid users is a noisy environment, possibly with multiple speakers or other kinds of noise, and reverberation can also be a problem. For this reason many multi-microphone structures have been proposed in the past, such as directional microphones and various beamformers; our work focuses on the multichannel Wiener filter. The basic idea of the approach is to find a set of filter coefficients that reduce the noise while minimizing the speech distortion, and the overall goal of course is to improve intelligibility. We start by defining the microphone signals as a speech component plus an additive noise contribution, with a frequency index and a frame index; in this case we consider a two-microphone setup. The MSE criterion is then formulated like this: we want to find the set of filter coefficients that minimizes the difference between the desired speech component and the filtered version of the noisy signals.
We choose to estimate the speech component in the first microphone, which would be the front microphone of the hearing aid. As an extension of this, if we assume that the speech and the noise are statistically independent, we can reformulate the MSE criterion so that the first term corresponds to a speech distortion term and the second term corresponds to the residual noise, weighted by a certain factor mu that trades off noise reduction against speech distortion. The solution can then be written in terms of the estimated speech correlation matrix and the noise-only correlation matrix. At this point we can see that the filter is essentially based on correlation matrices, so let me go into some detail on the problems involved in estimating them. In general, the basic task is to estimate the noise-only correlation matrix and the speech-plus-noise correlation matrix; to get a clean-speech correlation matrix you can, for instance, use a voice activity detector: you estimate the speech-plus-noise correlation matrix during speech-plus-noise periods and the noise correlation matrix during noise-only periods, and then subtract. This means each estimate is kept fixed during the other period: when speech is present, the update of the noise-only correlation matrix is frozen while the speech-plus-noise correlation matrix is updated, and vice versa. Of course this limits the tracking of the noise correlation matrix, because you can imagine that if the noise changes during a speech period, while we have stopped adapting the noise correlation matrix, we basically get residual noise or more speech distortion. Furthermore, the estimation of the correlation matrices is typically done with a long averaging time, usually in the order of two to three seconds.
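The procedure described above, estimating the two correlation matrices and solving for the filter, can be sketched for a single frequency bin as follows. This is a minimal sketch under stated assumptions: the function and variable names are mine, and the subtraction-based clean-speech estimate relies on the statistical independence of speech and noise mentioned in the talk.

```python
import numpy as np

def sdw_mwf(R_yy, R_nn, mu=1.0, ref_mic=0):
    """Speech distortion weighted multichannel Wiener filter for one
    frequency bin (illustrative sketch, not the authors' exact code).

    R_yy : speech-plus-noise correlation matrix, shape (M, M)
    R_nn : noise-only correlation matrix, shape (M, M)
    mu   : trade-off factor; larger mu means more noise reduction at
           the cost of more speech distortion (mu = 1 is the MWF)
    """
    # Clean-speech correlation matrix by subtraction, assuming speech
    # and noise are statistically independent.
    R_xx = R_yy - R_nn
    # Selector for the reference (front) microphone whose speech
    # component we want to estimate.
    e_ref = np.zeros(R_yy.shape[0])
    e_ref[ref_mic] = 1.0
    # SDW-MWF solution: w = (R_xx + mu * R_nn)^{-1} R_xx e_ref
    return np.linalg.solve(R_xx + mu * R_nn, R_xx @ e_ref)
```

With mu = 0 the filter reduces to passing the reference microphone through unchanged (no noise reduction, no distortion); increasing mu suppresses more noise.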
This long averaging also limits the tracking capability. Looking at the motivation for our work: since the SDW-MWF depends only on long-term averages, it largely eliminates the short-time artifacts, such as musical noise, that are present in single-channel noise reduction. Another issue we wanted to address is the weighting factor mu, which in general is used as a fixed factor for all frequencies and all frames. The basis of our work is to find a more optimal weighting factor, because in general the speech and the noise will be non-stationary, and when someone is speaking there will be many silence periods that we can exploit in the noise reduction process, while the noise in general can be continuously present. What we propose is to apply a different weight to the speech-dominant segments and to the noise-dominant segments. To do that we took inspiration from single-channel noise reduction, where a lot of work has been done on spectral tracking; in particular, we took inspiration from the speech presence probability. There, one starts from a two-state model: one state where you have noise only (H0) and one state where you have speech plus noise (H1), whereas the standard approach basically assumes that speech is present at all times. By exploiting the two-state model we can improve the noise reduction. Very briefly, the speech presence probability is estimated for each frequency and each frame; it is based on an estimate of the a priori probability of speech absence and on contributions from different signal-to-noise ratio measures. An example is shown here: you can see that in the low-frequency area there is a high probability of speech, and beyond a certain point you have a lower probability.
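The two-state estimator described above can be sketched as follows. This closed form is the standard one from the single-channel literature the talk alludes to (Ephraim-Malah / Cohen-style estimators), so the exact formula and parameter names are my assumption, not necessarily what the speakers used:

```python
import numpy as np

def speech_presence_prob(gamma, xi, q=0.5):
    """Per-bin speech presence probability P(H1 | observation) from the
    two-state model (H0: noise only, H1: speech plus noise). Sketch
    based on the classical Gaussian-model estimator.

    gamma : a posteriori SNR per bin, |Y|^2 / noise power
    xi    : a priori SNR estimate per bin
    q     : a priori probability of speech absence, P(H0)
    """
    v = gamma * xi / (1.0 + xi)
    # Bayes' rule under Gaussian speech and noise models.
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * np.exp(-v))
```

When both SNR measures are high the probability approaches one; when the observation carries no evidence of speech it falls back towards the prior 1 - q.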
So the question was how we can exploit this in the multichannel Wiener filter. We start by modifying the objective function: we have a first term for the H1 state, weighted by the probability P, and a second term for the H0 state, weighted by 1 - P. So basically we take into account that there are also periods with noise only, where we can be more aggressive in terms of noise reduction. When we derive the solution, we end up with a term in 1/P which, unlike a fixed weighting factor mu, changes for each frequency and each frame: if you have a high probability of speech you fall back to preserving the speech, and if you have a low probability you get more aggressive noise reduction. The problem, however, is that, as you saw before, the speech presence probability varies a lot across frequency. When we applied it in this setup we got a lot of distortion and a lot of artifacts, basically the same effects known from single-channel noise reduction. The cause is that this filter does not really distinguish between the H0 and the H1 state at the frame level. So we went a little further and asked what we would gain if we could actually detect the H1 state. We propose a simple method for this: we already have the per-frequency information, so for each frame we take the average of the speech presence probability over frequency, and if the average is higher than a certain threshold the frame is classified as H1, otherwise as H0. Here is an example on a clean speech signal, although of course the probability was estimated on the noisy signal: above certain values the frames are detected as the H1 state, and the others as the H0 state.
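The frame classification just described takes only a few lines; the threshold value below is illustrative, since the talk does not state the one actually used:

```python
import numpy as np

def detect_frame_state(P, threshold=0.5):
    """Hard H1/H0 decision per frame, as proposed in the talk: average
    the per-bin speech presence probabilities over frequency and
    compare with a fixed threshold (value assumed here).

    P : array of shape (n_bins, n_frames) of per-bin probabilities
    Returns a boolean array, True where the frame is classified as H1.
    """
    return P.mean(axis=0) > threshold
```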
The rationale behind this detection is that in the H0 state the noise reduction can be weighted differently: since no speech is present, the noise can be suppressed much more strongly without compromising, that is, without increasing, the speech distortion. In the H1 state we of course also want to reduce noise, but we want to do it a bit more carefully. So the idea is to apply a flexible weighting. What you can see here is that if we have detected an H0 state, we apply a much higher weighting factor, and if it is an H1 state, we still apply a lower but fixed weighting factor; when the speech presence probability gets higher, the factor is weighted accordingly, and in that way you can preserve certain speech cues. To build that into the standard SDW-MWF, we use a combination of soft values and a binary detection: the first term is a function of the H0 state, a certain fixed threshold, and the speech presence probability, and the second term basically uses a fixed weighting factor; when we derive the solution, we again get a term over P in the weighting factor. So we exploit both the soft value and the hard decision. We then ran simulations with a two-microphone hearing aid in an office-like setup with a relatively low reverberation time and multiple babble noise sources, and we used two objective quality measures: the intelligibility-weighted SNR and the signal distortion. If we look at the results, we see that the standard method gives a much higher signal-to-noise ratio improvement when we decrease the weighting factor, but at the same time the distortion also increases. When we use the purely soft 1/P version, the problem was the high distortion: you still get quite a good SNR performance, but the distortion simply becomes very high.
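The flexible weighting described above can be sketched as follows. The specific mapping and the constants are my illustration of the idea (large fixed factor in detected H0 frames, small probability-scaled factor in H1 frames), not the exact rule from the talk:

```python
import numpy as np

def flexible_mu(P, frame_is_h1, mu_h1=1.0, mu_h0=10.0):
    """Per-bin trade-off factor combining the soft probability with the
    hard frame decision (illustrative sketch; constants assumed).

    P           : per-bin speech presence prob., shape (n_bins, n_frames)
    frame_is_h1 : boolean per-frame H1 decision, shape (n_frames,)
    """
    P = np.asarray(P, dtype=float)
    # H0 frames: a large fixed factor gives aggressive noise reduction.
    # H1 frames: a small factor, scaled down further where P is high,
    # preserves the speech cues.
    return np.where(frame_is_h1[None, :], mu_h1 * (1.0 - P) + 1e-2, mu_h0)
```

The resulting mu can then be plugged into the SDW-MWF solution per frequency bin and frame instead of a global constant.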
With the flexible threshold, where we use the different weighting factors, the SNR improvement remains relatively high while the distortion is also kept low. Of course the question is how to choose this weighting factor, and that is still something we are working on. So, to summarize: I presented different extensions of the SDW-MWF algorithm. We started with a fixed weighting factor, then we incorporated the speech presence probability, and in the end we arrived at a combined soft and binary detection. In future work we are aiming at performing perceptual evaluations with hearing aid listeners, and we will keep working on finding a more perceptually motivated weighting factor, for instance one that exploits certain masking properties or even incorporates a hearing model in the weighting process itself.

[Questions]

Q: You apply the multichannel Wiener filter for noise reduction; my question is, is it possible to apply it when, besides the desired speech, there is also interfering speech?

A: Sorry, can you repeat the question?

Q: You apply the multichannel Wiener filter for noise reduction. My question is: in the speech reduction, for example, you have a desired speech signal and you have interfering speech.

A: You mean multiple speakers? I guess it would still be applicable in such a scenario, but of course it is going to be more difficult to estimate the conditional speech presence probability, because the spectrum of the interference is going to be much more similar to that of the desired speech signal.
Of course you would then have to be much more careful when estimating the weighting factor, and I still think that if you applied it in a multi-speaker setup the results would be a little worse.

Q: OK, thank you.

[Moderator] Any other questions or comments?

Q: My question is related to his question. When you apply the algorithm, do you have any constraint on the noise type? For example, if the noise is an impulsive noise, or, as he said, if the noise is speech, can this method deal with it?

A: At this point we don't make any assumption about the noise. As I said, the most difficult scenario would be a competing speaker, but in terms of noise types you can apply it to any of them; there is no assumption that it has to be a certain type of noise.

Q: So you mean this algorithm can be used for any type of noise, given that the noise is different from the speech?

A: Yes. Well, in terms of choosing the values for the thresholds: if you have, say, a multiple-speaker scenario, it depends on how well you can estimate the spectral components, such as the speech presence probability, and how well you can make the binary decision. If you have multiple targets you might have a large error in your estimation, and then you would probably choose a different value, because with a large error you would be subject to a higher speech distortion. If instead you have a relatively easy scenario, say car noise, where the noise is more stationary, then your estimation of the speech presence probability will probably have a higher accuracy, and you can
also apply a more aggressive threshold; but if you have a competing talker in there, you probably have to be much more careful.

Q: I just wanted to ask: have you tested this type of scenario? Do you have any results?

A: You mean on the multiple-speaker scenario? No, we didn't test the multiple-speaker scenario. What we did test was a much higher room reverberation, and there we saw that the estimation degraded a little, and some of the values had to be chosen carefully so as not to increase the distortion; in that case the estimation of the spectral components was much more inaccurate, so we had to choose different values. So it all depends on how well you can estimate these components; here, as a proof of concept, we had a low reverberation time and just a single babble noise source.

[Moderator] Any other questions?

Q: One thing: a hearing aid is of course not used only for speech, right? How does it perform, for example, for music? Because you use a different H state depending on the frequency and on the frame, right? So if someone wants to listen to music, for example, it might be detected wrongly.

A: I don't know exactly how it would work for music, because this is really a speech-oriented noise reduction process.

Q: So then you should switch it off for music?

A: Probably, yes; so far we have only worked with speech signals.

Q: But the framework is of course applicable to anything.

A: Of course, but in that case it would be more about making a trade-off between different settings, and in this particular form it would not work well.

Q: And if you use it for speech, then for example the stops or plosives of the speech might not be detected well, because they are considered as noise.

A: Yes, but one example you can see is that
sometimes, if you have a high-frequency component, for instance an /s/ sound, it can be masked by the noise, and the speech presence probability will in that case give a very low probability of speech; if you are then very aggressive in these areas, you can really miss it. So, as you were saying, sometimes you will not be able to detect it, and it is therefore not always a good idea to be very aggressive in those areas, regardless of the technique. Basically, what we realized is that we could be quite aggressive, but it would come at a cost. So right now we are trying to constrain this weighting factor by some psychoacoustic model, so that we know exactly when and how much noise reduction to apply: if we know that a certain frequency area has a high probability of speech, it will probably mask the noise at the neighbouring frequencies, and then we may not have to remove that noise at all. That is the work coming in the following weeks.

[Moderator] Any other comments or questions? OK, thank you.