[Opening remarks and speaker affiliation unintelligible.]
This is joint work with my co-authors [names unintelligible].
The topic is analysis-synthesis based speech enhancement
with improved spectral envelope estimation by tracking speech dynamics.
So first, let's have a look at the outline of my presentation.
At the very beginning I will introduce some background,
such as a spectral view of the effect of noise corruption and of conventional filtering.
Then I will introduce the model-based speech enhancement
which was previously proposed by us.
After that I will introduce a speech dynamics tracking scheme that is used
in conjunction with the model-based speech enhancement,
followed by the performance evaluation and the conclusion.
So let's first have a look at
the effect of noise corruption from a spectral perspective.
Here we use white noise as an example.
We can observe that the harmonic structure of speech is severely damaged,
and the spectral envelope is affected as well,
which results in a lot of spectral distortion.
And here are some conventional statistical model based speech enhancement methods.
The upper figure shows the classical log-spectral amplitude estimator.
From the spectrogram we can see that
the lower portion of the spectrum has been restored,
but the overall noise level cannot be well suppressed.
So as a result there will be
many musical tones and residual noise in the processed speech.
The lower figure shows the optimally modified
log-spectral amplitude estimator.
This method generally has a very good
capability of noise suppression.
However, the formants and the harmonic structures
are further distorted.
So the upper one often gives a better PESQ score,
while the lower one
gives a better segmental SNR score. So there is always a tradeoff
between the noise suppression
and the harmonic distortion,
or, we can say, the naturalness of speech.
This can also be observed
from the spectral envelope:
noise will first severely distort the spectral envelope,
and the conventional statistical method
would partially restore the spectral envelope but also partially distort it further.
So this would potentially account for some
common artifacts,
such as musical tones and low intelligibility problems,
in speech enhancement.
So in our previous work we proposed an
analysis-synthesis approach
based on the harmonic model.
The basic idea is to
extract acoustic cues
from the noisy speech,
and then reconstruct the noise-free speech
by resynthesis,
using only this speech-related information.
So you can see here:
you have the pitch information, so you can track the location of the harmonics;
you have the spectral gain,
so you can have the overall average spectral amplitude level;
and you have the spectral envelope,
so you can track
the magnitude spectrum.
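[Editor's sketch] The three cues named here (pitch, spectral gain, spectral envelope) are enough to drive a simple harmonic synthesizer. The Python below is only an illustration under assumed names and values (`synthesize_harmonic_frame`, `toy_envelope`, an 8 kHz sample rate, a 160-sample frame); it is not the speakers' implementation, and it ignores phase continuity across frames.

```python
import math


def toy_envelope(freq_hz):
    # Hypothetical smooth envelope with one formant-like peak near 500 Hz.
    return 1.0 / (1.0 + ((freq_hz - 500.0) / 300.0) ** 2)


def synthesize_harmonic_frame(f0_hz, gain, envelope,
                              sample_rate=8000, num_samples=160):
    """Sum cosines at integer multiples of f0 below the Nyquist frequency.

    Each harmonic's amplitude is the envelope sampled at that frequency,
    scaled by the overall gain. Only speech-related cues enter the output.
    """
    harmonic_freqs = []
    k = 1
    while k * f0_hz < sample_rate / 2:
        harmonic_freqs.append(k * f0_hz)
        k += 1

    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = 0.0
        for freq in harmonic_freqs:
            sample += gain * envelope(freq) * math.cos(2 * math.pi * freq * t)
        frame.append(sample)
    return frame


frame = synthesize_harmonic_frame(200.0, 1.0, toy_envelope)
```

Because the three inputs are separate arguments, any one of them (for example the envelope) can be replaced by a better estimate without touching the others.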
So why do we choose this approach
to do speech enhancement?
First, this model is capable
of generating
clean harmonics,
and only speech-related information is synthesized,
so background noise is automatically removed.
This model also
retrieves
the damaged harmonic structure,
and it has a smooth
spectral envelope, so there are no isolated spectral peaks
and hence no musical tone problem.
Also, this model allows
independent adjustment of
the different model parameters,
so it allows us
to further enhance the spectral envelope.
So by using this framework,
we do not suffer from the tradeoff between noise suppression
and harmonic distortion.
From some of our previous work:
after applying some
pre-cleaning procedures
using a conventional method,
we can apply the
frequency-domain pitch searching
and the spectral gain estimation.
Some preliminary results
show that the pitch and the spectral gain estimation
already give very
good performance
when applied to the pre-cleaned spectrum.
However,
the spectral envelope estimation is
somewhat inadequate.
So we can see here that some preliminary results
show that the PESQ score for
0 dB noise
would be around 1.5,
and the conventional approach would give around
1.9,
and our previous approach,
taking the envelope from this
pre-cleaning, can give
a further improvement of 0.2.
However,
if we replace
the envelope with the true clean envelope,
it can achieve
3.17.
So there is a
huge gap here,
and we would expect some
improvement in PESQ score if we can
further improve
the spectral envelope.
So the problem can be stated
as follows.
For each frame, or several
frames, of noisy observation,
we want to find a mapping
between the noisy and clean spectral envelopes,
and for consecutive frames
we want to find the
temporal trajectories of the clean spectral envelope.
In other words, we want to estimate the clean speech by
looking at the long-term
speech evolution.
So by assuming, over a
certain period of time,
a linear relationship between consecutive clean spectral envelopes
and a linear relationship between the noisy and clean
spectral envelopes,
we can use the linear
dynamic system model to model
this
state tracking.
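[Editor's note] A linear dynamic system with these two linear relationships is commonly written in state-space form. The symbols below are assumptions chosen for illustration, since the talk does not give notation: x_t is the clean envelope feature vector at frame t, y_t is the noisy one, A and C are the two linear maps, and w_t, v_t are Gaussian noise terms with covariances Q and R.

```latex
\begin{aligned}
x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
y_t &= C\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R).
\end{aligned}
```

The first line encodes the relationship between consecutive clean envelopes, the second the relationship between the clean and noisy envelopes.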
The feature
we use here is the
line spectrum frequencies of the LPC coefficients.
For
each period,
that is, each block of
observations,
we have
a series of LPC coefficients.
So given the Kalman system
parameters,
we can run the Kalman filtering
and
obtain the clean LPC coefficients.
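[Editor's sketch] To make the filtering step concrete, here is a minimal scalar Kalman filter in Python. It assumes, purely for illustration, that each feature dimension is tracked independently with scalar parameters `a`, `c`, `q`, `r` (all hypothetical names); the system described in the talk presumably operates on full feature vectors.

```python
def kalman_filter_1d(observations, a, c, q, r, x0, p0):
    """Scalar Kalman filter for the model
        x_t = a * x_{t-1} + w_t,   var(w_t) = q
        y_t = c * x_t     + v_t,   var(v_t) = r
    Returns the filtered state estimate for every observation.
    x0 and p0 are the prior mean and variance before the first frame.
    """
    estimates = []
    x, p = x0, p0
    for y in observations:
        # Predict the state and its variance one frame ahead.
        x_pred = a * x
        p_pred = a * p * a + q
        # Correct the prediction using the noisy observation.
        gain = p_pred * c / (c * p_pred * c + r)
        x = x_pred + gain * (y - c * x_pred)
        p = (1.0 - gain * c) * p_pred
        estimates.append(x)
    return estimates


# Toy usage: a constant noisy track pulls the estimate up from the prior mean.
smoothed = kalman_filter_1d([1.0] * 20, a=1.0, c=1.0, q=0.0, r=1.0,
                            x0=0.0, p0=1.0)
```

A larger `r` makes the filter trust its prediction more, which is what smooths the trajectory; a larger `q` lets the estimate follow the observations more closely.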
So the next problem is how to obtain
the Kalman system parameters
for
each
block.
So the idea is that for each block of noisy observations
we use the feature and the codebook,
built from the
clustered
parallel LPC coefficients,
and
through some optimization
criterion we can obtain the corresponding Kalman system parameters.
So in the offline training stage we have both
noisy and clean
LPC coefficients,
and
we use
vector quantization (VQ)
to obtain the
codebook entries,
in the sense that blocks with similar features
are grouped into the same cluster.
By saying "similar", we need to define a distortion measure here.
It could be something simple,
such as a Euclidean measure, or you can
use
some more complex measure, such as
a
modified IS (Itakura-Saito) measure.
You also need to define a
feature for each
block of observations.
You can use the average spectrum,
or you can use the whole series of
vectors,
so it would actually be a matrix quantization in this case.
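[Editor's sketch] The two kinds of distortion measure and the "average" block feature can be illustrated as follows. This is an assumption-laden sketch: the function names are hypothetical, the Euclidean measure is taken as squared distance, and the Itakura-Saito measure shown is the standard form on power spectra, not the modified variant mentioned in the talk (whose details are not given).

```python
import math


def euclidean_distortion(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def itakura_saito_distortion(p, q):
    """Standard Itakura-Saito distortion between two power spectra.

    Both inputs must be strictly positive. The measure is asymmetric:
    d(p, q) is generally not equal to d(q, p).
    """
    return sum(pi / qi - math.log(pi / qi) - 1.0
               for pi, qi in zip(p, q)) / len(p)


def block_average(block):
    """One feature per block: the mean of the block's frame vectors."""
    num_frames = len(block)
    dim = len(block[0])
    return [sum(frame[i] for frame in block) / num_frames
            for i in range(dim)]


feature = block_average([[1.0, 2.0], [3.0, 4.0]])
```

Using the whole sequence of frame vectors instead of `block_average` would turn the comparison into one between matrices, which corresponds to the matrix quantization case.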
For each cluster
we have both noisy and clean
features,
so we can minimize the total negative
log-likelihood function
for each cluster,
and we will
obtain the desired
Kalman system parameters in this case.
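[Editor's sketch] When paired clean and noisy features are available in training and the noise terms are assumed Gaussian, minimizing the negative log-likelihood of a linear model reduces to least squares. The Python below illustrates this for a scalar, zero-offset model; the function name, the dictionary keys, and the scalar simplification are all assumptions for illustration, not the estimator actually used by the speakers.

```python
def fit_linear_gaussian_1d(clean_blocks, noisy_blocks):
    """Fit x_t = a*x_{t-1} + w_t and y_t = c*x_t + v_t for one cluster.

    clean_blocks and noisy_blocks are parallel lists of scalar sequences.
    Returns maximum-likelihood estimates of a, c and the noise variances.
    """
    num_lag = den_lag = 0.0   # sums for the transition coefficient a
    num_obs = den_obs = 0.0   # sums for the observation coefficient c
    for clean, noisy in zip(clean_blocks, noisy_blocks):
        for t in range(len(clean)):
            num_obs += noisy[t] * clean[t]
            den_obs += clean[t] ** 2
            if t > 0:
                num_lag += clean[t] * clean[t - 1]
                den_lag += clean[t - 1] ** 2
    a = num_lag / den_lag
    c = num_obs / den_obs

    # Noise variances are the mean squared residuals of the two fits.
    trans_sq = obs_sq = 0.0
    n_trans = n_obs = 0
    for clean, noisy in zip(clean_blocks, noisy_blocks):
        for t in range(len(clean)):
            obs_sq += (noisy[t] - c * clean[t]) ** 2
            n_obs += 1
            if t > 0:
                trans_sq += (clean[t] - a * clean[t - 1]) ** 2
                n_trans += 1
    return {"a": a, "c": c, "q": trans_sq / n_trans, "r": obs_sq / n_obs}


clean = [[1.0, 0.9, 0.81, 0.729]]
noisy = [[2.0 * x for x in clean[0]]]
params = fit_linear_gaussian_1d(clean, noisy)
```

Running this once per cluster gives one parameter set per codebook entry.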
In the online adaptation stage,
we also have the noisy observations
for a block,
and we use the
same
distortion measure to find the codebook entry
and hence the
corresponding Kalman system parameters.
Then we run the Kalman filtering
with this set of parameters and we will get the desired envelope.
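[Editor's sketch] The online lookup step amounts to a nearest-neighbour search over the codebook. The Python below assumes a hypothetical codebook layout (a list of dictionaries with `"centroid"` and `"params"` keys) and a pluggable distortion function; none of these names come from the talk.

```python
def nearest_codebook_entry(feature, codebook, distortion):
    """Return (index, params) of the entry whose centroid is closest."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: distortion(feature, codebook[i]["centroid"]),
    )
    return best_index, codebook[best_index]["params"]


def squared_error(u, v):
    # Simple distortion used for this toy example.
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Toy codebook with two entries; parameter values are made up.
toy_codebook = [
    {"centroid": [0.0, 0.0], "params": {"a": 0.9, "c": 1.0}},
    {"centroid": [1.0, 1.0], "params": {"a": 0.5, "c": 2.0}},
]
index, params = nearest_codebook_entry([0.9, 1.2], toy_codebook, squared_error)
```

The selected `params` would then be passed to the Kalman filter for that block.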
So from the
spectral envelope shown here, the tracking
actually gives very good
performance.
We can also see from the
3D view that
the noisy envelope trajectories are
quite
messy and flattened,
and the
conventional method would restore
some harmonics
but results in some musical tone problems.
But
the dynamics tracking
scheme
used here
gives a very smooth
and
accurate trajectory.
It can also be
observed from this figure how the
formant bandwidth
compares with the conventional method [partly unintelligible].
So the tracking gives a trajectory very close
to the original spectral envelope
trajectory.
So it restores the magnitude spectrum,
and the harmonic structures
as well.
And from the final synthesized
speech we can see that
there is no [unintelligible]
and no musical tones,
and the
harmonic structures are
retrieved.
Actually, we can achieve a PESQ score of around
[figure unclear, possibly 2.41]
for
speaker-dependent training.
The noise we use is from 0 to 10 dB,
using white noise, car noise and
babble noise.
Both speaker-dependent and speaker-independent testing are used.
Finally,
I conclude the presentation.
In this
presentation
we looked at the effect of noise corruption,
and conventional speech enhancement
has been discussed.
An analysis-synthesis approach is presented,
and speech dynamics tracking that incorporates
VQ training and Kalman filtering is proposed.
The improved
spectral envelope estimation is illustrated,
and objective results in terms of
spectral distortion and PESQ score are shown.
So that's all for my
presentation.
Thank you.
Thank you so much.
Yes, this is the first question.
Do you have any audio samples? Have you brought some with you?
Yes, I have some.
Good.
Could you comment on the CPU cost issues?
Actually, if you
use a lot of data for training,
it will be time consuming.
The cost also depends on
the
codebook size [partly unintelligible],
so there is always a tradeoff.
[Remainder of the answer unintelligible.]
Next question, please.
From your presentation I realise that
the analysis-synthesis according to the clean signal is your upper bound, right?
Yes.
The analysis-synthesis result given the clean envelope would be your upper bound,
that is, the optimal case. So my question
is:
have you investigated the effect of
the noisy phase information that you are using in your synthesis?
What would be
the effect of the noisy envelope
and the noisy
phase information that you are using in this case?
In this work we just use the
magnitude spectrum.
We have not looked at the phase.
Actually,
in conventional speech enhancement
the phase is not something that is
improved, but
research [partly unintelligible] suggests it
could
affect the intelligibility.
So
maybe in future work we will consider it.
For your information, there are some papers
talking about the importance of phase information
[remainder unintelligible].
I mean that the gap between the upper bound and your proposed method could also
be because of
the
noisy phase.
So this tracking scheme
basically
works well for voiced speech.
For unvoiced speech
we just use the pre-cleaned data.
So this would be another reason
for
the gap between the optimal case and the
proposed method.
I would be interested to know whether you need
a voice activity detector.
Actually, we have tried to use the voiced/unvoiced
decision to train voiced and unvoiced
as different classes,
but the result
shows that
training
one class on all the data
is
better for
the whole tracking.
Your synthesis model is very
adequate for voiced sounds:
it is a sinusoidal model approach. How do you produce the unvoiced sounds?
So the unvoiced sound basically has
no pitch, and we just
use
a
time-domain approach [partly unintelligible] to
synthesize it.
We have the gain information and the [unintelligible] information, and we just
synthesize in the time domain.
Are there any further questions?
That is not the case.
Thank you once more.