0:00:17Hello everyone, I'm [inaudible].
0:00:19I'd like to present
0:00:21our paper, co-authored with Emmanuel Vincent and Rémi Gribonval from INRIA Rennes,
0:00:27France,
0:00:29titled "An acoustically motivated spatial prior for under-determined reverberant source separation".
0:00:36Earlier in this session, the first presentations mainly focused on
0:00:43the source spectral model, like NMF or GMM, as we saw before.
0:00:48So I'd like to emphasize that our work here focuses on
0:00:54the spatial model,
0:00:56that is, on modeling the spatial position of the sources.
0:01:01So here is the outline of my presentation. First, I'd like to briefly address the
0:01:06problem, followed by the general Gaussian modeling framework for source separation.
0:01:12Then we move to the main contribution of the work, that is, designing a new acoustically motivated spatial
0:01:19prior
0:01:20and deriving a maximum a posteriori algorithm for parameter estimation to enhance the source
0:01:26separation performance.
0:01:28And finally I will show some experimental results and the conclusion.
0:01:33Okay.
0:01:35Here we are considering the source separation problem, where we use an I-channel mixture signal,
0:01:42denoted by x(t),
0:01:44to separate the J sources
0:01:46s_j(t),
0:01:47where I, the number of sensors, is smaller than the number of sources J; it's the
0:01:52under-determined case.
0:01:54Okay, [inaudible].
0:01:57And if
0:01:57we denote
0:01:58by c_j(t) the contribution of a single source s_j to the microphone array, this is
0:02:05called the source image,
0:02:07which is related to the original source by a mixing process
0:02:11characterised by the mixing filters h_j,
0:02:15that is, modeling the acoustic propagation from the source to the microphones.
0:02:21And in the [inaudible], the mixture is the sum of the images of the several sources,
0:02:26so we have x(t)
0:02:27is the sum
0:02:28of the c_j(t).
0:02:30Okay, that's the mixing model.
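The mixing model described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `mix_sources`, the random test signals, and the 16-tap filters are assumptions made for the example, not anything taken from the talk.

```python
import numpy as np

def mix_sources(sources, filters):
    """Return (x, images) for a convolutive mixing model.

    sources: list of J 1-D arrays of equal length T.
    filters: filters[j][i] is the impulse response from source j to microphone i.
    """
    n_src, n_mic, n_samples = len(sources), len(filters[0]), len(sources[0])
    images = np.zeros((n_src, n_mic, n_samples))
    for j in range(n_src):
        for i in range(n_mic):
            # Source image c_j at microphone i: source j convolved with its
            # filter, truncated to the source length.
            images[j, i] = np.convolve(sources[j], filters[j][i])[:n_samples]
    # The mixture x(t) is the sum of the source images c_j(t).
    x = images.sum(axis=0)
    return x, images

# Hypothetical toy data: J = 3 sources, I = 2 microphones (under-determined).
rng = np.random.default_rng(0)
sources = [rng.standard_normal(1000) for _ in range(3)]
filters = [[rng.standard_normal(16) for _ in range(2)] for _ in range(3)]
x, images = mix_sources(sources, filters)
print(x.shape, images.shape)
```

With fewer microphones than sources, the mixture alone does not determine the source images, which is why a model of the sources is needed.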
0:02:33So most state-of-the-art approaches for under-determined source separation operate in
0:02:39the frequency domain,
0:02:40where the convolution in the time domain is approximated by complex-valued multiplication
0:02:47in the frequency domain, which is a simpler form.
0:02:51And they also rely on the sparsity assumption, where only a few sources
0:02:56are assumed to be active at
0:02:58each time-frequency point.
0:03:00For instance, a popular approach proceeds in two steps:
0:03:06the estimation of the mixing filters h_j
0:03:11and then
0:03:12the estimation of
0:03:14the sources s_j.
0:03:16We also have, for instance, binary masking,
0:03:20where only one source is assumed to be active at each time-frequency point.
0:03:25But these techniques remain limited in realistic reverberant environments,
0:03:31since the narrowband approximation does not hold there.
0:03:36So in our work,
0:03:37we rely on a different framework,
0:03:41where the short-time Fourier transform coefficients of the source images
0:03:46are modeled as zero-mean Gaussian random variables.
0:03:49So c_j is modeled as a Gaussian with zero mean
0:03:54and covariance matrix Sigma_j.
0:03:57And we further factorize this matrix Sigma_j
0:04:00into two parameters, v_j and R_j,
0:04:03where v_j is the scalar source variance, which encodes the spectro-temporal power of the sources,
0:04:10so that is more about the source spectral information,
0:04:14and R_j
0:04:16is the spatial covariance matrix, which
0:04:18encodes
0:04:19the spatial position of the source.
0:04:22Okay, and we are focusing more on the modeling of R_j.
0:04:30So most state-of-the-art approaches rely on the narrowband approximation, which results
0:04:37in a rank-1 R_j, that is, R_j
0:04:39is the product of the mixing vector h_j and its conjugate transpose.
0:04:43But in our work,
0:04:44we proposed a full-rank matrix R_j, where the coefficients of R_j
0:04:51are not deterministically related.
0:04:54Okay, so there is no such factorization.
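To make the rank-1 versus full-rank distinction concrete, here is a small sketch. The helper `numerical_rank` and the random matrices are hypothetical and only illustrate the two kinds of spatial covariance; they are not values from the talk.

```python
import numpy as np

def numerical_rank(R, rel_tol=1e-10):
    """Count eigenvalues of a Hermitian matrix above rel_tol times the largest."""
    eigvals = np.linalg.eigvalsh(R)
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

rng = np.random.default_rng(0)
n_mic = 2

# Narrowband approximation: R_j = h_j h_j^H, fully determined by one mixing vector.
h = rng.standard_normal(n_mic) + 1j * rng.standard_normal(n_mic)
R_rank1 = np.outer(h, h.conj())

# Full-rank model: any Hermitian positive definite matrix is allowed, so the
# entries are not tied together by a single vector.
A = rng.standard_normal((n_mic, n_mic)) + 1j * rng.standard_normal((n_mic, n_mic))
R_full = A @ A.conj().T + 0.1 * np.eye(n_mic)

print(numerical_rank(R_rank1), numerical_rank(R_full))
```

The full-rank matrix has more free parameters per source and frequency, which is what lets it account for reverberation that a single mixing vector cannot represent.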
0:04:58So given the general modeling framework and the parameterization, the source separation architecture consists of four
0:05:05steps. First we need the time-frequency
0:05:07transform to bring the mixture signal into the frequency domain,
0:05:11and then the estimation of the model parameters, which here are the source variances and the spatial covariance matrices.
0:05:17Then the source image coefficients are recovered by multichannel Wiener filtering, a kind
0:05:23of soft masking, and then we reconstruct the time-domain signals.
0:05:28So from now on we focus on the estimation of the model parameters,
0:05:33which are
0:05:35defined here.
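The Wiener filtering step can be sketched for a single time-frequency point as follows. This assumes the standard multichannel Wiener filter for a sum of independent zero-mean Gaussians, W_j = Sigma_j (sum_k Sigma_k)^{-1} with Sigma_j = v_j R_j; the function name and the random inputs are hypothetical.

```python
import numpy as np

def wiener_source_images(x, v, R):
    """Estimate the source image coefficients at one time-frequency point.

    x: (I,) complex mixture coefficients.
    v: (J,) nonnegative source variances.
    R: (J, I, I) spatial covariance matrices.
    Returns a (J, I) array of estimated source images.
    """
    sigma = v[:, None, None] * R      # Sigma_j = v_j R_j
    sigma_x = sigma.sum(axis=0)       # mixture covariance: sum of all Sigma_j
    # Apply W_j = Sigma_j Sigma_x^{-1} to x without forming the inverse.
    return sigma @ np.linalg.solve(sigma_x, x)

# Hypothetical parameters: J = 3 sources, I = 2 microphones.
rng = np.random.default_rng(0)
n_src, n_mic = 3, 2
A = rng.standard_normal((n_src, n_mic, n_mic)) + 1j * rng.standard_normal((n_src, n_mic, n_mic))
R = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(n_mic)
v = rng.uniform(0.5, 2.0, size=n_src)
x = rng.standard_normal(n_mic) + 1j * rng.standard_normal(n_mic)

c_hat = wiener_source_images(x, v, R)
print(np.allclose(c_hat.sum(axis=0), x))
```

Because the filters W_j sum to the identity, the estimated source images always add back up to the mixture, which is the sense in which this acts as a soft mask.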
0:05:38Okay, and
0:05:40here is the main contribution of the paper, the
0:05:44acoustically motivated spatial prior.
0:05:46So we have to see the reason for it: in some situations
0:05:50the geometric setting can be known.
0:05:53Such scenarios can arise, for instance, in a car,
0:05:57where the position of the driver is fixed,
0:06:00or in a formal meeting, where the positions of the
0:06:03delegates are fixed,
0:06:05or for instance in broadcasting, where we know exactly
0:06:09the positions of the sources and the room acoustics.
0:06:13So given this known geometric setting, we can exploit this knowledge about the source
0:06:19positions and the room characteristics
0:06:22to enhance the source separation performance.
0:06:25That's the motivation for the work.
0:06:28And here we use one piece of knowledge from the theory of
0:06:31room acoustics.
0:06:34If we assume that the direct path and the reverberant part are
0:06:38uncorrelated,
0:06:39and that the reverberation is diffuse,
0:06:42which means that the echoes can come from all possible directions,
0:06:46then
0:06:48we know that the mean of the spatial covariance matrix
0:06:53includes the contribution
0:06:56of the direct part,
0:06:57which is defined here, and the covariance of the reverberant part,
0:07:01and all these parameters
0:07:03can be computed directly
0:07:06given the geometric setting.
0:07:08So for lack of time I will not present how they can be computed, but
0:07:13you can refer to the paper.
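The talk does not spell out the formulas, so the following is only a sketch of one common way to build such a mean under the stated assumptions: a free-field steering vector for the direct path plus a diffuse-field term whose inter-microphone coherence is sin(kd)/(kd). The function name, the speed of sound, the example geometry, and the reverberant power `sigma_rev2` (passed in rather than derived from the room) are all assumptions of this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def spatial_prior_mean(freq_hz, src_mic_dists, mic_positions, sigma_rev2):
    """Sketch of a prior mean: direct-path term plus diffuse reverberation term."""
    r = np.asarray(src_mic_dists, dtype=float)
    # Direct path: spherical attenuation 1/(sqrt(4 pi) r) and a delay of r/c.
    d = np.exp(-2j * np.pi * freq_hz * r / SPEED_OF_SOUND) / (np.sqrt(4 * np.pi) * r)
    # Diffuse field: coherence sin(2 pi f d / c) / (2 pi f d / c) between microphones.
    pos = np.asarray(mic_positions, dtype=float)
    mic_dists = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # np.sinc(t) is sin(pi t) / (pi t), hence the argument 2 f d / c.
    omega = np.sinc(2 * freq_hz * mic_dists / SPEED_OF_SOUND)
    return np.outer(d, d.conj()) + sigma_rev2 * omega

# Hypothetical geometry: two microphones 5 cm apart, source about 1 m away.
R_mean = spatial_prior_mean(
    freq_hz=1000.0,
    src_mic_dists=[1.00, 1.02],
    mic_positions=[[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]],
    sigma_rev2=0.05,
)
print(R_mean)
```

The first term is rank-1 and depends only on the source-to-microphone distances; the second term is what makes the mean full-rank when reverberation is present.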
0:07:15So okay,
0:07:17again, given the room
0:07:22geometric setting, we can compute the mean of the spatial covariance matrices.
0:07:27And given this mean, we define an inverse-Wishart prior over the spatial
0:07:34covariance matrices:
0:07:35R_j follows the inverse-Wishart distribution
0:07:39with the mean
0:07:41given here, computed from the theory of statistical room acoustics,
0:07:46and its variance is governed by a parameter
0:07:51called the degrees of freedom, which
0:07:53can be learned from training data in the maximum likelihood sense.
0:07:57Okay, I will not present the details of the learning process.
0:08:01The reason we choose the inverse-Wishart here is that it is the conjugate prior for the
0:08:06Gaussian likelihood,
0:08:08so we obtain estimates in closed form later on.
0:08:14Okay, so
0:08:16now our goal is to estimate the parameters,
0:08:21and we use the expectation-maximization (EM) algorithm for this purpose.
0:08:28In the E step,
0:08:29we estimate the empirical covariance matrices
0:08:33of the source images
0:08:36by this equation, where the filter W is simply the multichannel
0:08:41Wiener filter.
0:08:43And in the M step, the model parameters are updated in the MAP
0:08:48sense.
0:08:50So you can see that v_j and R_j can be iteratively updated
0:08:55in this M step.
0:08:58And if you look at the equation for R_j, you can see that
0:09:03this term is the contribution of the likelihood
0:09:05and this part comes from the contribution of the prior
0:09:09that we have,
0:09:10and gamma is the
0:09:13trade-off parameter, which determines the contribution of the prior.
0:09:17And if you want to update the parameters in the maximum likelihood sense, simply
0:09:22set gamma to zero,
0:09:24so we come back
0:09:26to the maximum likelihood estimate.
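The slide equations are not in the transcript, so the sketch below is a reconstruction under stated assumptions rather than the authors' exact update. It assumes a complex inverse-Wishart prior on R_j with m degrees of freedom and scale matrix (m - I) times the prior mean, weighted by gamma against the Gaussian likelihood; maximizing that objective in closed form gives the update used here. The names `m_step`, `C_hat`, `R_mean`, `m`, and `gamma` are chosen for this example.

```python
import numpy as np

def m_step(C_hat, R, R_mean, m, gamma):
    """One M-step for a single source at a single frequency bin.

    C_hat: (N, I, I) empirical source-image covariances from the E step.
    R: (I, I) current spatial covariance; R_mean: (I, I) prior mean.
    m: degrees of freedom; gamma: weight of the prior (0 gives the ML update).
    """
    n_frames, n_mic, _ = C_hat.shape
    # Source variances: v_n = tr(R^{-1} C_hat_n) / I.
    R_inv = np.linalg.inv(R)
    v = np.real(np.trace(R_inv @ C_hat, axis1=1, axis2=2)) / n_mic
    # Likelihood contribution: sum over frames of C_hat_n / v_n.
    likelihood_term = (C_hat / v[:, None, None]).sum(axis=0)
    # Prior contribution: gamma * (m - I) * R_mean, the weighted scale matrix.
    prior_term = gamma * (m - n_mic) * R_mean
    R_new = (prior_term + likelihood_term) / (gamma * (m + n_mic) + n_frames)
    return v, R_new

# Hypothetical inputs: N = 50 frames, I = 2 microphones.
rng = np.random.default_rng(0)
n_frames, n_mic = 50, 2
B = rng.standard_normal((n_frames, n_mic, n_mic)) + 1j * rng.standard_normal((n_frames, n_mic, n_mic))
C_hat = B @ B.conj().transpose(0, 2, 1) + 0.1 * np.eye(n_mic)
R0 = np.eye(n_mic, dtype=complex)
R_mean = np.array([[1.0, 0.3], [0.3, 1.0]], dtype=complex)

v_ml, R_ml = m_step(C_hat, R0, R_mean, m=5.0, gamma=0.0)     # maximum likelihood
v_map, R_map = m_step(C_hat, R0, R_mean, m=5.0, gamma=10.0)  # pulled toward the prior
```

Setting `gamma` to zero removes the prior term and leaves the average of C_hat_n / v_n, which matches the remark that the maximum likelihood update is recovered in that case.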
0:09:29Okay, and now we have everything in hand,
0:09:33so let's see some experimental results.
0:09:37So we compare the source separation performance of the algorithm proposed
0:09:43in the paper, using
0:09:45the MAP parameter estimation, with
0:09:49the maximum likelihood algorithm, in two maximum likelihood settings.
0:09:54The first one is where we don't know anything about the geometry,
0:09:58so the parameters are blindly initialized,
0:10:02and the second one is where the parameters are initialized from the same geometric
0:10:07setting,
0:10:08so we have a fair comparison.
0:10:11We show that if we know something about the geometric setting beforehand,
0:10:15we can improve the source separation.
0:10:18And we also compare the source separation with the baseline binary masking
0:10:22[inaudible], where the mixing filter is fixed
0:10:24[inaudible]
0:10:28and is computed from the geometric setting
0:10:32as in the formula before.
0:10:35And here are some parameters of the data:
0:10:38the speech data, the sampling rate, the number of EM iterations, and so on.
0:10:43And here is the final result, the average result
0:10:47in terms of signal-to-distortion ratio, which measures the overall distortion.
0:10:53And
0:10:55we compare the separation results averaged over mixtures of sources
0:11:00with the setup shown here
0:11:02and a microphone spacing of five centimeters.
0:11:05And we
0:11:07compute the separation results with different reverberation times, ranging from
0:11:12a very low one, an almost anechoic case of fifty milliseconds, to a very reverberant one of five hundred.
0:11:20And
0:11:20in this
0:11:21row
0:11:22are the results given by our proposed algorithm, where we have the prior information.
0:11:29And you can see that
0:11:31the proposed algorithm outperforms all the maximum likelihood algorithms and the baselines
0:11:38in all reverberant
0:11:40settings.
0:11:43Okay, for instance,
0:11:44I guess that we'll
0:12:07okay, or maybe this
0:12:08[inaudible]
0:12:21Okay, all right, I can
0:12:23so
0:12:24you can see that, for example, at a reverberation time of about two hundred and fifty milliseconds, which
0:12:29is a moderate reverberation time,
0:12:30our proposed algorithm, where we have some prior knowledge about
0:12:35the setting, enhances the source separation performance by one dB.
0:12:39That's it.
0:12:40Let me go back to [inaudible].
0:12:44Okay, here is
0:12:45the conclusion.
0:12:46In our work we propose an acoustically motivated spatial prior,
0:12:52which is
0:12:53derived from the theory
0:12:55of statistical room acoustics,
0:12:57and we derive the maximum a posteriori EM algorithm, which
0:13:01is proposed for the estimation of the model parameters
0:13:06and avoids the permutation problem. Okay,
0:13:09I'd like to emphasize this point, because given the known geometric setting,
0:13:13with the MAP algorithm we do not suffer from the well-known permutation problem
0:13:18in frequency-domain source separation.
0:13:21And importantly, we show that the results improve
0:13:24with the help of the prior.
0:13:27But at this point we still need to know many parameters, like the
0:13:33source positions and the reverberation time, to compute the mean of
0:13:38the spatial covariance matrices.
0:13:40So future work can be devoted
0:13:42to a fully blind source separation by estimating all the acoustic parameters.
0:13:50Okay, that's the end of my presentation. Thank you.
0:14:00We have time for
0:14:01some questions.
0:14:13Thank you for the presentation. My name is [inaudible].
0:14:17On the [inaudible], how do you [inaudible]
0:14:19the spatial prior in the [inaudible]?
0:14:23In the one, yeah,
0:14:24the spatial prior, yeah, how do you [inaudible]?
0:14:31So this is the spatial prior, and
0:14:34the distribution is given, so
0:14:37what we need
0:14:38to know is the mean
0:14:40and the variance,
0:14:42[inaudible].
0:14:44So for the mean,
0:14:47we can compute it directly from the geometric setting.
0:14:51For example, if we know the distance from the source to the microphone,
0:14:55we can compute the [inaudible] from the source to the microphone,
0:15:00and then we use the theory of
0:15:03diffuse [inaudible].
0:15:04So
0:15:05yeah, that's
0:15:06[inaudible]
0:15:09[inaudible]
0:15:11So my question is:
0:15:12if the
0:15:14[inaudible]
0:15:15is different from the
0:15:17real one, like
0:15:18the
0:15:20[inaudible]
0:15:23distances [inaudible],
0:15:25how [inaudible]
0:15:27[inaudible]?
0:15:29Yeah, [inaudible],
0:15:32[inaudible] yeah, it's a very good suggestion
0:15:36for future investigation.
0:15:39And actually, in this study we tried to prove
0:15:44that even a known geometric setting can improve the source separation performance.
0:15:51But we
0:15:52[inaudible] source [inaudible],
0:15:54or as the baseline, if you would like to estimate these parameters
0:15:59blindly from the mixture.
0:16:01So at this time we do not have
0:16:03such an evaluation.
0:16:24Okay, yes. Firstly, binary masking, which is a very well-known [inaudible],
0:16:31we can see it as a baseline approach.
0:16:35And actually in our previous work, with the same
0:16:40modeling framework, the EM algorithm with maximum likelihood
0:16:45presented in our previous paper, we also compared the performance
0:16:50with some state-of-the-art [inaudible]
0:16:52[inaudible],
0:16:55[inaudible], and the proposed approach outperformed these algorithms,
0:17:01which were already compared.
0:17:03[inaudible]
0:17:05I would not say it is state of the art, but
0:17:08[inaudible]
0:17:10as the baseline.
0:17:17Any other questions?
0:17:25Then let's thank the speaker.