0:00:13 Thank you for the introduction.
0:00:16 Let me start right away with the outline. I will give a short introduction with the problem statement, then introduce the speech distortion weighted multichannel Wiener filter, then very briefly introduce the notion of speech presence probability, which is the basis for the solution we are going to propose, and finally move on to the results.
0:00:43 Just to give a short background on the hearing loss problem:
0:00:48 some common causes of hearing loss are age, exposure to noise, or listening to loud music for a long time, so these are factors that can affect all of us.
0:01:02 The consequences of a hearing loss are a reduced frequency resolution and temporal resolution, so you have difficulty distinguishing between sounds at different frequencies, and you also have problems with soft sounds.
0:01:21 This is of course a problem when a hearing aid user is in a noisy environment, possibly with multiple speakers or any other kind of noise, and reverberation can also be a problem.
0:01:37 For this reason, many multi-microphone structures have been proposed in the past, such as directional microphones and various beamformers. In this work we focus on the multichannel Wiener filter.
0:01:52 Basically, the idea of our approach is to find a set of filter coefficients that reduce the noise and minimize the speech distortion, and the overall goal, of course, is to improve the intelligibility.
0:02:09 We start by defining the microphone signals: you have a speech component and an additive noise contribution, Y(k,l) = X(k,l) + N(k,l), where k is the frequency index and l is the frame index. In this case we model a two-microphone setup.
0:02:28 The MSE criterion is formulated like this: we want to find the set of filter coefficients that minimizes the difference between the desired speech component and the filtered version of the noisy signals.
0:02:42 Basically, we choose to estimate the speech component in the first microphone, which would be the front microphone of the hearing aid.
0:02:50 As an extension of this, if we assume that the speech and the noise are statistically independent, we can reformulate the MSE criterion in this way, where the first term corresponds to a speech distortion term and the second term corresponds to the residual noise.
0:03:07 The solution can then be written like this: it is based on the estimated speech correlation matrix and the noise-only correlation matrix, the latter weighted by a certain factor, which corresponds to the trade-off between noise reduction and speech distortion.
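For reference, a minimal sketch of the filter just described, assuming the usual SDW-MWF closed form w = (R_x + mu * R_n)^{-1} R_x e_1 with e_1 selecting the front microphone; the code and symbol names are illustrative, not from the talk:

```python
import numpy as np

def sdw_mwf(R_x, R_n, mu=1.0):
    """Speech distortion weighted multichannel Wiener filter for one
    frequency bin: w = (R_x + mu * R_n)^{-1} R_x e_1.

    R_x : (M, M) estimated speech correlation matrix
    R_n : (M, M) estimated noise-only correlation matrix
    mu  : trade-off factor (larger mu -> more noise reduction,
          smaller mu -> less speech distortion; mu = 1 gives the MWF)
    """
    M = R_x.shape[0]
    e1 = np.zeros(M)
    e1[0] = 1.0                       # estimate the speech at the front mic
    return np.linalg.solve(R_x + mu * R_n, R_x @ e1)

# Per-bin speech estimate from the stacked microphone STFT vector y:
#   x1_hat = w.conj() @ y
```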
0:03:23 At this point we can see that the filter we end up with is based on the correlation matrices, so let me show a couple of details on the problems involved in estimating these quantities.
0:03:38 In general, the basic approach is to estimate the noise-only correlation matrix and the speech-plus-noise correlation matrix.
0:03:51 To get a clean-speech correlation matrix, you can, for instance, use a voice activity detector: you estimate the speech-plus-noise correlation matrix during speech-plus-noise periods and the noise-only correlation matrix during noise-only periods, and then you take the difference, as in the structure shown here.
0:04:10 So basically, each contribution is kept fixed during certain periods: during a speech-plus-noise period, the noise-only correlation matrix is kept fixed while the speech-plus-noise correlation matrix is updated, and vice versa.
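A sketch of this VAD-gated estimation scheme; the exact smoothing rule is not given in the talk, so the recursive form and the constant below are assumptions (alpha close to 1 mimics the two-to-three-second averaging mentioned shortly):

```python
import numpy as np

def update_correlations(R_yy, R_nn, y, speech_active, alpha=0.995):
    """VAD-gated recursive update of the correlation matrices
    for one frequency bin.

    y : (M,) complex microphone STFT vector for the current frame.
    """
    outer = np.outer(y, y.conj())
    if speech_active:                  # speech+noise period: R_nn frozen
        R_yy = alpha * R_yy + (1.0 - alpha) * outer
    else:                              # noise-only period: R_yy frozen
        R_nn = alpha * R_nn + (1.0 - alpha) * outer
    R_xx = R_yy - R_nn                 # clean-speech matrix by subtraction
    return R_yy, R_nn, R_xx
```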
0:04:28 Of course, this also limits the tracking of the noise correlation matrix, because if the noise prior to the speech period differs from the noise during the speech-plus-noise period, and we have stopped adapting the noise correlation matrix, we basically have a reduced tracking ability.
0:04:52 Furthermore, the estimation of the correlation matrices is typically done with heavy averaging, usually on the order of two to three seconds, so this also limits the tracking capability spectrally.
0:05:08 If you look at the motivation for our work, we started from the observation that, since the SDW-MWF depends on long-term averages, the noise reduction to some extent eliminates the short-time effects, such as musical noise and other artifacts, that are present in single-channel noise reduction.
0:05:29 Another issue that we looked at is this weighting factor: in general it is used as a fixed weighting factor for all frequencies and all frames, and the basis of our work is to find an optimal weighting factor instead.
0:05:47 Because, in general, the speech and the noise will be non-stationary, and when someone is speaking there will be a lot of silence periods that we can exploit in the noise reduction process, while the noise in general can be continuously present.
0:06:06 So what we propose is to apply a different weight to the speech-dominant segments and to the noise-dominant segments.
0:06:17 To do that, we took inspiration from single-channel noise reduction approaches, where a lot of work has been done on spectral weighting.
0:06:28 More specifically, we took inspiration from estimators of the speech presence probability. These start from a two-state model: one state where you have noise only, and one state where you have speech plus noise, whereas the standard approaches basically assume that you have noisy speech at all times. By exploiting the two-state model, we can improve the noise reduction.
0:06:55 To introduce the speech presence probability very briefly: it is estimated for each frequency and each frame, and it is based on an estimate of the a priori probability of speech being absent, combined with the contributions of different signal-to-noise ratio measures.
0:07:13 An example is shown here: you can see that in the low-frequency area there is a high probability of speech, and then from a certain point on you get a lower probability.
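The talk does not spell out the estimator; one common per-bin form from the single-channel literature (the soft-decision rule of Ephraim and Malah, also used by Cohen) is sketched below, assuming an a posteriori SNR gamma and an a priori SNR xi are already available:

```python
import numpy as np

def speech_presence_prob(gamma, xi, q=0.5):
    """Soft-decision speech presence probability per bin/frame
    (an illustrative choice; the talk only says the estimate combines
    a speech-absence prior with SNR measures).

    gamma : a posteriori SNR, |Y|^2 / sigma_n^2
    xi    : a priori SNR estimate (e.g. decision-directed)
    q     : a priori probability of speech absence
    """
    v = gamma * xi / (1.0 + xi)
    # Likelihood ratio between H1 (speech+noise) and H0 (noise only):
    lr = (1.0 - q) / q * np.exp(v) / (1.0 + xi)
    return lr / (1.0 + lr)
```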
0:07:24 So the question was: how can we exploit this in the multichannel Wiener filter?
0:07:31 We started by redefining the objective function: we now have a first term, which is the H1 (speech-plus-noise) state weighted by the probability p, and a second term, which is the H0 (noise-only) state weighted by 1 - p. So basically we take into account that there are also periods with noise only, where we can be more aggressive in terms of noise reduction.
0:07:56 When we derive the solution, we end up with a term mu over p, which now changes for each frequency and each frame, instead of the fixed weighting factor mu. So if you have a high probability of speech, you go back to preserving the speech, and if you have a low probability, you move to a more aggressive noise reduction.
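A sketch of the resulting per-bin update, reusing the hypothetical sdw_mwf helper from the earlier block; p_kl denotes the speech presence probability of the current bin, and the eps guard is my addition:

```python
# Given for this bin: R_xx, R_nn (correlation matrices), p_kl (speech
# presence probability), and the fixed trade-off factor mu.
# The fixed mu becomes mu / p(k,l): low-probability bins are suppressed
# aggressively, high-probability bins fall back towards the fixed mu.
eps = 1e-3
mu_kl = mu / max(p_kl, eps)            # eps avoids division by zero
w_kl = sdw_mwf(R_xx, R_nn, mu=mu_kl)
```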
0:08:20 The problem here, however, is that, as you saw before, the speech presence probability varies a lot across frequencies. When we applied it in this setup, we got a lot of distortion and a lot of artifacts, basically the same effects that are associated with single-channel noise reduction. The underlying issue is that this filter does not really distinguish between the H0 and the H1 state.
0:08:48 So we went a little further and asked: what if we could actually detect the H0 or the H1 state? We proposed a simple method to do this. Since we already have the probability information per frequency, we simply take the average over each frame, and if the average is higher than a certain threshold, the frame is classified as the H1 state, and otherwise as H0.
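A direct transcription of this detection rule as a sketch; the threshold value is a placeholder, since the talk does not report the one used:

```python
import numpy as np

def detect_frame_state(p_frame, threshold=0.5):
    """Frame-level H0/H1 detection: average the per-bin speech presence
    probabilities over frequency and compare with a threshold.

    p_frame : (K,) per-bin speech presence probabilities of one frame.
    Returns True for H1 (speech + noise), False for H0 (noise only).
    """
    return bool(np.mean(p_frame) > threshold)
```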
0:09:18 Here is an example. What is shown is the clean speech signal, but of course the detection was done on the noisy signal. You can see that above certain values we detect the H1 state, and everything else is the H0 state.
0:09:33 The rationale behind having this information is that in the H0 state the noise reduction can be weighted differently: since no speech is present, it can be much more aggressive without increasing the speech distortion. In the H1 state, of course, we also want to reduce some noise, but we want to do it a bit more carefully. So the idea is to apply a certain flexible weighting.
0:10:02 We do that in the following way: if we have detected the H0 state, we apply a much higher weighting factor, and if it is the H1 state, up to a certain probability we still apply a lower but fixed weighting factor, and when the probability gets higher the weighting follows it accordingly. In that way you can preserve certain speech cues.
0:10:29 To build that into the standard SDW-MWF, we use a combination of the soft values and the binary detection. The first term is a function of the H1 state, which in turn is a function of a certain fixed threshold and of the speech presence probability, and the second term basically uses a fixed weighting factor. When we derive the solution, it all ends up in the weighting factor mu(k,l) here, which exploits both the soft value and the hard decision.
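The exact combined rule is not recoverable from the talk; the sketch below is one functional form consistent with the behavior described, and all constants are invented for illustration:

```python
def flexible_weight(p, frame_is_speech, mu=1.0, mu_cap=5.0, mu_max=20.0):
    """One way to combine the hard frame decision with the soft per-bin
    probability (a guess at the rule, not the authors' exact formula).

    - H0 frame: a large fixed weight, i.e. aggressive noise reduction,
      since there is no speech to distort.
    - H1 frame: the soft weight mu / p, capped by a fixed value, so
      low-probability bins are still suppressed firmly while
      speech-dominant bins fall back towards the fixed mu.
    """
    if not frame_is_speech:
        return mu_max
    return min(mu / max(p, 1e-3), mu_cap)
```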
0:11:06 We then ran simulations using a two-microphone hearing aid setup with a relatively low reverberation time and two or more babble noise sources, and we used two objective quality measures: the intelligibility-weighted signal-to-noise ratio and the signal distortion.
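The talk names the intelligibility-weighted SNR; a common definition combines per-band SNRs with band-importance weights (in practice taken from a table such as the ANSI SII band-importance function). A sketch with hypothetical inputs:

```python
import numpy as np

def intelligibility_weighted_snr(snr_bands_db, band_importance):
    """Intelligibility-weighted SNR: per-band SNRs (in dB) combined with
    band-importance weights. The normalization is a guard in case the
    supplied weights do not already sum to one.
    """
    w = np.asarray(band_importance, dtype=float)
    return float(np.sum(w / w.sum() * np.asarray(snr_bands_db)))
```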
0:11:32 If we look at the results, you can see that the standard method gives a much higher signal-to-noise ratio, but when we decrease the weighting factor, at the same time the distortion also increases. With the version we used initially, with the mu over p weighting, the problem was the high distortion: you would still get quite good SNR performance, but the distortion simply became very high.
0:12:00 With the flexible threshold, where we use a different weighting factor per state, we can see that the signal-to-noise ratio improvement remains relatively high while the distortion is also low. Of course, the question is how to choose this weighting factor, and that is still something we are working on.
0:12:23 So, to summarize: we presented different extensions of the SDW-MWF algorithm. We started by looking at it with a fixed weighting factor, then we incorporated the speech presence probability, and in the end we arrived at a combined soft-value and binary-detection solution.
0:12:42 As future work, we are aiming at performing perceptual evaluations with hearing-impaired listeners, and we will keep working on finding a more perceptually motivated weighting factor, for instance one that exploits certain masking properties or even incorporates hearing models in the weighting process itself.
0:13:08 Thank you.
0:13:13 Any questions?
0:13:15 Yes please, at the back.
0:13:21 Thank you for the interesting presentation. My question is about applying the multichannel Wiener filtering for speech enhancement: is it possible to apply it when, besides the desired speech, you also have interfering speech? In that case, how do you choose the weighting factor?
0:13:57 Can you repeat the question? I could not hear it.
0:14:01 Yes, okay. You apply the multichannel Wiener filter for noise reduction. My question is about the case where, besides the desired speech, you also have interfering speech.
0:14:19 Oh, you mean something like multiple speakers? Yes. Well, I think you can apply it in that scenario, but of course it is going to be more difficult to estimate the conditional speech presence probability, because now the interference spectrum is going to be much more similar to the desired speech signal, so you would have to be much more careful when estimating the weighting factor. I still think that if you applied it in a multi-speaker setup, the results would probably be a little worse.
0:14:51 Okay, thank you.
0:14:54 Any more questions or comments? Yes.
0:14:58 My question is related to his question: when you apply your algorithm, do you have any constraints or assumptions on the noise type? For example, if the noise is an impulsive noise, or, as he said, if the noise is speech: can this algorithm deal with impulsive noise and other kinds of noise?
0:15:29 Well, at this point we do not make any assumptions about the noise, actually; it can work with any noise. As I said, the most difficult scenario would be multiple speakers, but in terms of noise types, you can apply it to any of them; there is no assumption that it has to be a certain type of noise.
0:16:02 So you mean that this algorithm can be used for any type of noise, even when the noise is itself speech, like interfering speech?
0:16:20 Yeah, okay. Well, I think that in terms of choosing the values for the thresholds, it depends on how well you can estimate the spectral components, like the speech presence probability, and how well you can make the binary decision. If you have multiple talkers, you might have a large error in your estimation, and then you would probably want to choose a different value, because with a large error you would be subject to a higher speech distortion, for example. If you have, let's say, an easier scenario, like car noise, where the noise is more stationary, then your estimation of the speech presence probability will probably have a higher accuracy, and you can also apply a more aggressive threshold. But if you have competing talkers in there, you probably have to be much more careful, and you cannot be as aggressive.
0:17:22 I mean, I just wanted to ask: have you tested these types of scenarios? Do you have any results?
0:17:28 You mean the multiple-speaker scenario? No, we did not test the multiple-speaker scenario. What we did test was a condition with a much higher room reverberation, and there we saw that the estimation needed to be tuned a little, and some of the values had to be carefully chosen so as not to increase the distortion. In that case, the estimation of the spectral components was much more inaccurate, so we had to choose different values. So of course it all depends on how accurately you can estimate these components. Here, as a proof of concept, we had a low-reverberation setup with just a couple of babble sources.
0:18:11 Any more questions?
0:18:12 Yeah, the thing is, as you know, a hearing aid is of course not only used for speech, right? How does it perform, given that you use a different state depending on the frequency and depending on the frame? If, for example, the person then wants to listen to music, how would it work with music, since this is more like a speech-oriented noise reduction process? I guess the answer is no?
0:18:41 Right, in that case you would probably have to switch it off for music; we only worked with speech signals.
0:18:49 But it is of course applicable to any signal?
0:18:53 Yes, of course, but then you would be making more of a trade-off between the different settings and so on, and in this case it would not work well.
0:19:07 And if you have tuned it for speech, then, for example, the start of a plosive in the speech might not be detected well, because it is considered as noise?
0:19:14 Yes. One example you can see is that sometimes, if you have a high-frequency component, like an "s" sound or something, it can be masked by the noise. The speech presence probability will in that case give a very low probability of speech, and if you are then allowed to be very aggressive in those areas, sometimes you will really miss those parts. That is why, as he was saying, you have to choose the threshold carefully; otherwise you will not be able to hear the "s" if you allow the noise reduction to be very aggressive there.
0:19:50 One of the things we are working on now: we know that we can be pretty aggressive, but it comes at a cost. So right now we are trying to constrain this weighting factor by some psychoacoustical properties, so that we know exactly when and how much noise reduction to apply. Basically, if we know that a certain frequency has a high probability of speech, it will probably mask the noise at the neighboring frequencies, and then we may not have to remove the noise at those neighboring frequencies.
0:20:24 Okay.
0:20:25 Any more comments or questions?
0:20:28 Okay, thank you very much.