| 0:00:15 | So, thank you for being here through the session. [unclear] |
|---|
| 0:00:24 | This paper is by Kshitiz Kumar, who is upstairs manning a poster, myself, and [unclear]. Since Kshitiz could not make it, I will be presenting it. |
|---|
| 0:00:36 | The problem we are talking about today is robustness to reverberation. |
|---|
| 0:00:42 | Speech is a natural medium of communication for humans, |
|---|
| 0:00:45 | and we have been applying speech technologies everywhere. They work great in controlled lab conditions, |
|---|
| 0:00:51 | but when we get to real-world conditions, things sort of break down. |
|---|
| 0:00:55 | One of the reasons for this is reverberation. |
|---|
| 0:00:58 | Reverberation is what happens when we have reflections. In getting from me to you, |
|---|
| 0:01:03 | the sound not only takes the direct path; it also bounces off walls, producing reflections, reflections of reflections, and so on. |
|---|
| 0:01:11 | If you look at the plot to the right, it shows a typical impulse response from a source to a listener. |
|---|
| 0:01:18 | You can see the direct path, and each of the spikes represents one of the reflections. |
|---|
| 0:01:23 | The reflections die off, but they continue for some time. |
|---|
| 0:01:28 | So the sound that gets from the source to the listener depends heavily on the room. [unclear] |
|---|
| 0:01:37 | The amount of reverberation is characterized by the RT60 time, which indicates how much time a sound, with all its reflections, takes to die off by 60 dB. |
|---|
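
The RT60 definition above can be made concrete with Schroeder backward integration, a standard way to read RT60 off an impulse response. This is an illustrative sketch, not the authors' code: it assumes a hypothetical, noise-free impulse response whose amplitude decays as 10^(-3t/RT60) (so its energy falls exactly 60 dB in RT60 seconds), and it fits the decay curve between -5 and -25 dB, a commonly used range that is our assumption here.

```python
import math


def synthetic_rir(rt60, fs, duration):
    """Hypothetical, noise-free impulse response: the amplitude falls by 10**-3
    (energy by 60 dB) every rt60 seconds."""
    n_samples = int(duration * fs)
    return [10.0 ** (-3.0 * (i / fs) / rt60) for i in range(n_samples)]


def schroeder_decay_db(h):
    """Energy decay curve: energy remaining from each sample onward, in dB
    relative to the total energy."""
    remaining = []
    total = 0.0
    for sample in reversed(h):
        total += sample * sample
        remaining.append(total)
    remaining.reverse()
    return [10.0 * math.log10(e / remaining[0]) for e in remaining]


def estimate_rt60(h, fs, upper_db=-5.0, lower_db=-25.0):
    """Least-squares line through the decay curve between upper_db and
    lower_db, extrapolated to a 60 dB drop."""
    points = [(i / fs, d) for i, d in enumerate(schroeder_decay_db(h))
              if lower_db <= d <= upper_db]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope


fs = 16000
h = synthetic_rir(rt60=0.3, fs=fs, duration=0.9)
print(round(estimate_rt60(h, fs), 3))  # close to 0.3 for this idealized response
```

Real impulse responses contain random reflections and a noise floor, so in practice the fitting range and the truncation point of the integration matter much more than in this idealized case.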
| 0:01:52 | This is what reverberation does to a speech signal. |
|---|
| 0:01:59 | The top-left panel shows a spectrogram of a signal from the Resource Management database. |
|---|
| 0:02:05 | We have reverberated it using an artificial room response for a room with an RT60 of about three hundred milliseconds, and you can see what happens to the spectrogram. |
|---|
| 0:02:16 | You can actually note that the entire spectrogram is smeared: it looks as if the spectrogram itself has been passed through a linear filter. |
|---|
| 0:02:30 | And this is what happens to recognition accuracy because of reverberation. |
|---|
| 0:02:37 | In this experiment, we trained our models with clean data from the Resource Management database. |
|---|
| 0:02:43 | We simulated room responses for a 5 × 4 × 3 room using the image method, |
|---|
| 0:02:49 | with reverberation times of up to five hundred milliseconds. |
|---|
| 0:02:53 | If we recognize clean speech, we get an error of less than ten percent, which is the leftmost bar. |
|---|
| 0:02:58 | But with a reverberation time of only about three hundred milliseconds, which is fairly standard for a room, we get... |
|---|
| 0:03:07 | Hmm, I don't see the audio, so [unclear]. |
|---|
| 0:03:15 | It's a fairly standard room, and you can see that the error immediately goes up to around fifty percent. |
|---|
| 0:03:20 | When the room response is half a second, it's well over seventy percent, so accuracy degrades very rapidly with reverberation. |
|---|
| 0:03:28 | To deal with this, we begin by modeling the effect of reverberation itself. |
|---|
| 0:03:33 | Consider how we compute features for speech recognition. |
|---|
| 0:03:38 | We have the speech signal; look only at the grey blocks for now. The speech signal goes through a bank of filters, such as mel-frequency filters. |
|---|
| 0:03:46 | Then we compute the power at the output of these filters, compress the power using a log function, |
|---|
| 0:03:53 | and finally compute a DCT, which gives you the features. |
|---|
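
The grey-block pipeline just described (filter bank, power, log compression, DCT) can be sketched for a single frame. This is a simplified illustration under stated assumptions: the input is taken to be an already computed power spectrum, the filter-bank weights are a hypothetical toy matrix rather than real mel filters, and the DCT is an unnormalized type-II transform.

```python
import math


def filterbank_energies(power_spectrum, weights):
    """Weighted sum of spectral power in each channel; weights[k][f] is the
    gain of channel k at frequency bin f (placeholder for mel weights)."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in weights]


def dct_ii(values, n_coeffs):
    """Unnormalized type-II DCT of a list of floats."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def cepstral_frame(power_spectrum, weights, n_coeffs, floor=1e-10):
    """Filter bank -> channel power -> log compression -> DCT, for one frame."""
    channel_power = filterbank_energies(power_spectrum, weights)
    log_power = [math.log(max(p, floor)) for p in channel_power]
    return dct_ii(log_power, n_coeffs)


# Toy example: 6 frequency bins, 3 overlapping channels (hypothetical weights).
toy_weights = [
    [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 1.0, 0.5],
]
frame = [4.0, 2.0, 1.0, 3.0, 5.0, 2.0]
print(cepstral_frame(frame, toy_weights, n_coeffs=3))
```

The `floor` guards the log against empty channels; a flat log spectrum puts all its energy in the first coefficient.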
| 0:03:57 | Now, reverberation affects the signal itself, which is equivalent to affecting the input to each of these filters. So you can actually model reverberation in this manner, with the red blocks. |
|---|
| 0:04:12 | Because of the linearity of the convolution, that is, the filtering that was done over here, you can flip the order of the initial analysis filters (the mel-frequency filters) and the room response. |
|---|
| 0:04:25 | So these two are strictly equivalent in terms of the effect on the features that are computed, |
|---|
| 0:04:34 | for all signals. |
|---|
| 0:04:37 | Now we introduce an approximation. We say that computing the power of the reverberated signal is roughly equivalent to reverberating the sequence of power values that you get in every channel, |
|---|
| 0:04:54 | and these filters, H_1 to H_M, are simply the filters you'd get if you essentially square the room impulse response, sample by sample. |
|---|
| 0:05:09 | Approximating it in this manner is not perfect: it introduces some error, and the error depends on the autocorrelation of the signal. |
|---|
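
The approximation can be written out for one channel to show where the error comes from. In this illustrative derivation (our notation, not the talk's), x_k[n] is the real-valued clean signal at the output of channel k, r[τ] is the room impulse response, and y_k[n] = Σ_τ r[τ] x_k[n−τ] is the reverberant channel output, which holds because the channel filter and the room response commute. Expanding the power gives a convolution of the clean power with the squared impulse response, plus cross terms; dropping the cross terms is the approximation, and since they are products of delayed copies of x_k, the error depends on the signal's autocorrelation.

```latex
\begin{aligned}
y_k[n]^2 &= \Big(\sum_{\tau} r[\tau]\, x_k[n-\tau]\Big)^2 \\
&= \sum_{\tau} r[\tau]^2\, x_k[n-\tau]^2
 \;+\; \sum_{\tau \neq \tau'} r[\tau]\, r[\tau']\, x_k[n-\tau]\, x_k[n-\tau'] \\
&\approx \sum_{\tau} r[\tau]^2\, x_k[n-\tau]^2 .
\end{aligned}
```

Here the sample-by-sample squared response r[τ]² plays the role of the channel filters H_1 to H_M, before any reduction to the frame rate.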
| 0:05:20 | Here we have a plot which shows what kind of error it makes. |
|---|
| 0:05:25 | The red line is the spectrum of a signal; this is actually the output of a mel-frequency filter centered at about five hundred hertz. |
|---|
| 0:05:35 | We then reverberated the signal; in this case, I believe we applied a three-hundred-millisecond RT60 reverberation. |
|---|
| 0:05:45 | The output of the filter is shown by the green line: this is what you should get in this case, and what you actually get out. |
|---|
| 0:05:55 | Using this approximate model, what we get is shown by the blue line. |
|---|
| 0:06:00 | You can see that this approximation, which we get from reverberating the power, doesn't introduce very much error; in fact, we have some more quantitative results on this. |
|---|
| 0:06:13 | It turns out that applying this filter to the power is different from applying it to the magnitude, where you would effectively be taking the square root of these terms. |
|---|
| 0:06:25 | When you apply the filter to the power, you introduce an error, which results in some distortion in the output of the reverberation model, |
|---|
| 0:06:42 | but if you apply it to the magnitude, the error is much smaller. |
|---|
| 0:06:46 | So we actually model reverberation as a filtering that is carried out on the magnitude of the output of your mel filters. |
|---|
| 0:06:59 | So the process can be summed up like so: you have an N-channel mel filter bank or its equivalent, you have a power or magnitude computation, |
|---|
| 0:07:08 | and then you have the spectral-domain filtering, which imposes the effect of reverberation. We've expanded on it here: |
|---|
| 0:07:17 | we have the magnitude or power spectrum going into the room response to get the reverberated magnitude or power, and then, of course, you have the log and the DCT. |
|---|
| 0:07:28 | So what we have effectively done is convert a convolution on the signal, which is the room response, into a convolution on the magnitude or power spectrum. |
|---|
| 0:07:40 | All we observe is the reverberated sequence of power values, |
|---|
| 0:07:46 | and the problem is to determine both the room response and the clean signal spectrum. |
|---|
| 0:07:57 | This is obviously an underconstrained problem, so we have to impose some constraints. |
|---|
| 0:08:03 | The first constraint is that, because we are dealing with magnitudes, all terms are nonnegative. |
|---|
| 0:08:10 | In addition, we don't merely observe the reverberant signal; we actually observe a noise-corrupted version of it. |
|---|
| 0:08:19 | So we try to estimate the signal and the room response such that the error between the output of our model and what we actually observe is minimized, |
|---|
| 0:08:33 | with some sparsity constraints on the spectrum. |
|---|
| 0:08:37 | Also, because a scaling factor can move freely between the two, we impose an additional constraint that the room response terms sum to one. |
|---|
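
For one channel k, the constrained estimation just described can be written as follows. This is a sketch: Y is the observed (reverberant, noisy) magnitude sequence, X the clean magnitude sequence, and H the room response in that channel with L taps; the squared-error criterion and the L1 sparsity term with weight λ are our assumptions, since the talk does not state the exact cost.

```latex
\begin{aligned}
\hat{Y}(k,m) &= \sum_{l=0}^{L-1} H(k,l)\, X(k,m-l), \\
\min_{X,\,H}\;& \sum_{m} \big(Y(k,m) - \hat{Y}(k,m)\big)^2 \;+\; \lambda \sum_{m} X(k,m) \\
\text{s.t.}\;& X(k,m) \ge 0,\qquad H(k,l) \ge 0,\qquad \sum_{l=0}^{L-1} H(k,l) = 1 .
\end{aligned}
```

Because X is nonnegative, the penalty Σ X is its L1 norm, which favors sparse clean spectra.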
| 0:08:46 | It turns out this is simply a standard nonnegative matrix factorization (NMF) problem. |
|---|
| 0:08:52 | I won't go through the derivation here, but if you do it, you'd find that you get update rules: an iterative solution that is very similar to NMF. |
|---|
| 0:09:01 | You start off with an estimate, and at each iteration you apply a multiplicative update to it, which ensures that the terms always stay positive. |
|---|
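
A minimal pure-Python sketch of such multiplicative updates for a single channel. Assumptions (ours, not the talk's): the reverberant magnitude sequence `y` is modeled as the clean sequence `x` convolved along time with an `n_taps`-long room response `h`; the cost is squared error plus an L1 sparsity penalty weighted by `lam`; `x` starts from the observation and `h` from a flat response. The function names and iteration count are hypothetical. Each update multiplies by a ratio of nonnegative quantities, so the estimates stay nonnegative; after each room-response update, `h` is rescaled to sum to one and `x` absorbs the scale, leaving the model output unchanged.

```python
def convolve_truncated(x, h):
    """Model output y_hat[m] = sum_l h[l] * x[m - l], truncated to len(x)."""
    y_hat = []
    for m in range(len(x)):
        acc = 0.0
        for l, h_l in enumerate(h):
            if m - l < 0:
                break
            acc += h_l * x[m - l]
        y_hat.append(acc)
    return y_hat


def squared_error(y, x, h):
    return sum((a - b) ** 2 for a, b in zip(y, convolve_truncated(x, h)))


def nmf_dereverb_channel(y, n_taps, n_iter=200, lam=0.0, eps=1e-12):
    """Estimate a nonnegative clean envelope x and room response h
    (sum(h) == 1) from one channel's reverberant magnitude sequence y."""
    m_len = len(y)
    x = [max(v, eps) for v in y]   # start from the observation
    h = [1.0 / n_taps] * n_taps    # start from a flat response
    for _ in range(n_iter):
        # x <- x * (sum_l h[l] y[n+l]) / (sum_l h[l] y_hat[n+l] + lam / 2)
        y_hat = convolve_truncated(x, h)
        new_x = []
        for n in range(m_len):
            num = den = 0.0
            for l in range(min(n_taps, m_len - n)):
                num += h[l] * y[n + l]
                den += h[l] * y_hat[n + l]
            new_x.append(x[n] * num / (den + lam / 2.0 + eps))
        x = new_x
        # h <- h * (sum_m y[m] x[m-l]) / (sum_m y_hat[m] x[m-l])
        y_hat = convolve_truncated(x, h)
        new_h = []
        for l in range(n_taps):
            num = den = 0.0
            for m in range(l, m_len):
                num += y[m] * x[m - l]
                den += y_hat[m] * x[m - l]
            new_h.append(h[l] * num / (den + eps))
        # Enforce sum(h) == 1 and move the scale into x (model output unchanged).
        total = sum(new_h)
        h = [v / total for v in new_h]
        x = [v * total for v in x]
    return x, h


# Toy usage: a sparse "clean" envelope smeared by a decaying 4-tap response.
true_x = [0, 1, 0, 0, 2, 0, 0, 0, 1.5, 0, 0, 0, 0, 3, 0, 0]
true_h = [0.5, 0.3, 0.15, 0.05]
y = convolve_truncated(true_x, true_h)
x0, h0 = nmf_dereverb_channel(y, n_taps=4, n_iter=0)
x1, h1 = nmf_dereverb_channel(y, n_taps=4, n_iter=300)
print(squared_error(y, x0, h0), squared_error(y, x1, h1))
```

With `lam = 0` each step is a Lee-Seung-style multiplicative update, so the squared error should not increase (up to the small `eps` safeguards). For `lam > 0`, the rescaling step also changes the penalty, a simplification a fuller implementation would treat more carefully. Running the function separately on each channel gives each channel its own room response.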
| 0:09:13 | This formulation is not something we are introducing in this paper; it has been proposed by Kameoka, and we also proposed it separately in a paper, I believe, last year. |
|---|
| 0:09:26 | The basic form of what we proposed is this: you take the standard short-time Fourier transform and compute the power, |
|---|
| 0:09:34 | and the NMF decomposition, which is what we have here, gives you an estimate of the clean spectrum, on which you can perform an overlap-add to estimate the clean signal. |
|---|
| 0:09:47 | The contribution of this paper is that we are not going to work directly on this power. |
|---|
| 0:09:52 | Instead, we actually apply a gammatone filter bank, so the filter bank here is going to be a gammatone filter bank. |
|---|
| 0:10:01 | After having applied the gammatone filter bank, we compute the NMF decomposition, then invert the filter bank and perform the overlap-add. |
|---|
| 0:10:11 | The gammatone filter bank can be thought of as a dimensionality-reducing linear operation on the power or the magnitude, |
|---|
| 0:10:19 | and inverting it is simply equivalent to multiplying the output of NMF by the pseudo-inverse of this filter bank matrix. |
|---|
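
The pseudo-inverse step can be sketched with a toy filter-bank matrix `W` (channels × frequency bins). The weights below are made-up stand-ins, not real gammatone responses, and the right pseudo-inverse formula W^T (W W^T)^-1 assumes `W` has full row rank (fewer channels than bins, linearly independent rows). Because the pseudo-inverse can produce small negative values, the sketch floors reconstructed magnitudes at zero; that floor is our assumption, not something stated in the talk.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def right_pseudo_inverse(w):
    """W^+ = W^T (W W^T)^-1 for a full-row-rank W."""
    wt = transpose(w)
    return matmul(wt, invert(matmul(w, wt)))


def subband_to_spectrum(w_pinv, subband_mags):
    """Map one frame of (dereverberated) sub-band magnitudes back to
    frequency bins, flooring negative values at zero."""
    column = [[v] for v in subband_mags]
    return [max(row[0], 0.0) for row in matmul(w_pinv, column)]


# Toy 3-channel, 6-bin filter bank (hypothetical weights).
W = [
    [1.0, 0.6, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.6, 1.0, 0.6, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.6, 1.0, 0.6],
]
W_pinv = right_pseudo_inverse(W)
spectrum = [2.0, 1.5, 1.0, 0.8, 0.5, 0.3]
subband = [row[0] for row in matmul(W, [[v] for v in spectrum])]
print(subband_to_spectrum(W_pinv, subband))
```

Since `W` reduces dimensionality, the reconstruction is only the least-norm spectrum consistent with the sub-band values, not the original spectrum itself.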
| 0:10:27 | So, as an example of what we get, this is a reverberated signal. I don't have audio, so [unclear]. |
|---|
| 0:10:34 | [Attempts to play audio; largely inaudible.] |
|---|
| 0:10:47 | This is what we heard with the [unclear] signal. |
|---|
| 0:11:04 | [unclear] the dereverberation [unclear]; believe me. |
|---|
| 0:11:11 | So, given that we can actually do this: perceptually, dereverberation can be a great thing, and you can hear all sorts of nice effects, |
|---|
| 0:11:20 | but when you feed the dereverberated signal to a recognizer, those improvements don't show. |
|---|
| 0:11:27 | So here we ran some experiments on the Resource Management database. This is a model trained on clean speech, and this is what you get |
|---|
| 0:11:34 | when the signal is reverberated with a room response of three-hundred-millisecond reverberation time. |
|---|
| 0:11:40 | The error goes down if you actually try to dereverberate using the basic NMF mechanism proposed by Kameoka. |
|---|
| 0:11:49 | If this is applied to the power, it goes down a bit, but if you apply it to the magnitude, it goes down a lot more, |
|---|
| 0:11:55 | which shows that we are better off working on the magnitude. |
|---|
| 0:12:00 | Then here are the gammatone sub-band NMF variants, where we apply the gammatone filter bank and then NMF. |
|---|
| 0:12:13 | In these cases, we assume that the room response is the same in every channel, which is not really true after the filter-bank processing. |
|---|
| 0:12:23 | Because you are observing a filtered version of the signal, it does not make sense to assume that the room response is the same in every channel; that actually gives you a bad estimate. |
|---|
| 0:12:34 | If you estimate a different room response for each channel, you get some improvements, which are shown by these bars. |
|---|
| 0:12:41 | Then, if you actually apply the gammatone filtering, here is what you get when you work on the power. If you work on the magnitude, |
|---|
| 0:12:49 | you can see that assuming the room response is the same in all channels gives you [unclear] performance, |
|---|
| 0:12:55 | and if you allow it to be different for different channels, performance improves again. |
|---|
| 0:12:59 | So the gist of it is that dereverberating the signal after gammatone filtering, then inverse filtering it and performing the overlap-add, |
|---|
| 0:13:10 | results in error rates that are less than half of what you'd get with no enhancement. |
|---|
| 0:13:19 | We also ran this on a few other test sets. What I showed so far had a three-hundred-millisecond reverberation time; |
|---|
| 0:13:26 | this is for three hundred and five hundred. |
|---|
| 0:13:28 | We compared it with a bunch of other techniques, which I won't take time to explain. |
|---|
| 0:13:34 | Again, dereverberation was better. |
|---|
| 0:13:38 | Now, suppose we actually construct the reverberant signal through the model itself, that is, by passing it through the power computation and the room response, |
|---|
| 0:13:48 | and then perform the dereverberation on it. |
|---|
| 0:13:51 | In this procedure, the model is not an approximation but exactly what happened. |
|---|
| 0:14:00 | So in this experiment, we actually sort of faked reverberation by applying it to a sequence of power values, and then dereverberated the signal. |
|---|
| 0:14:11 | Please ignore these spurious bars, which are there because the plot came out of Kshitiz's thesis. |
|---|
| 0:14:19 | You can see that when the model actually holds, the improvements you get can be very, very large. |
|---|
| 0:14:28 | All of the results so far were on fake room responses, where the room response was computed using the image method. |
|---|
| 0:14:36 | So we also applied some real room responses obtained from [unclear]. This is a room response with a reverberation time of [unclear] milliseconds, this one is six hundred milliseconds, and again there are improvements. |
|---|
| 0:14:47 | Now, one of the things everybody knows is that the setup I have shown so far is not realistic: |
|---|
| 0:14:55 | a speech recognition system is never trained on clean data; you actually train it on matched data, the kind of data that you actually expect to recognize. |
|---|
| 0:15:05 | So we tried all of this with matched-condition training. |
|---|
| 0:15:10 | Sure enough, you observe that the improvements you get from dereverberating the signal, if you train the recognizer on clean speech, |
|---|
| 0:15:20 | do not even reach the kind of performance you get if you simply train the recognizer on reverberant speech. That performance is shown by the yellow bars in each case. |
|---|
| 0:15:31 | But even here, when we dereverberate both the training and test data using our technique, we get an additional improvement, which is about |
|---|
| 0:15:38 | twenty to forty percent relative, I believe, if I can find my [unclear]. |
|---|
| 0:15:43 | It helps in every case. |
|---|
| 0:15:46 | The truth of the matter is that you don't merely have reverberation; you also have additive noise. |
|---|
| 0:15:51 | So the reverberant signal gets corrupted by additive noise, as shown here, |
|---|
| 0:15:56 | and we can modify the process that I've just shown with some additional processing to compensate for the noise. |
|---|
| 0:16:04 | In a paper that was presented yesterday [unclear], we presented something called delta-spectral cepstral coefficients, DSCC. |
|---|
| 0:16:15 | In summary, it works as follows. |
|---|
| 0:16:21 | In the DSCC computation, instead of directly computing and compressing magnitude spectra, |
|---|
| 0:16:28 | you compute differences between successive magnitude spectra. What this does is that stationary signals get cancelled, while things that vary even a little bit remain. |
|---|
| 0:16:39 | As it turns out, speech is not as stationary as the noise that corrupts it, |
|---|
| 0:16:44 | so simply performing this difference operation on the magnitude spectra gives robust performance. |
|---|
| 0:16:54 | The differences now have both positive and negative values, so we have to sort of normalize their distribution, |
|---|
| 0:17:00 | and then, without any compression whatsoever, just apply a DCT and perform the additional operations as before. |
|---|
| 0:17:08 | This output is what we use for recognition. |
|---|
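
A simplified sketch of the delta-spectral idea for a sequence of mel magnitude spectra (one list of channel values per frame). Assumptions, all ours: a regression-style delta over ±`span` frames with clamped edges, a per-channel mean/variance normalization standing in for the distribution normalization mentioned in the talk, and hypothetical function names; the published DSCC recipe may differ in these details. The usage lines show the key property: a channel whose value never changes produces all-zero deltas, so stationary components are cancelled.

```python
import math


def temporal_delta(frames, span=2):
    """Regression delta along time for a list of per-frame channel vectors:
    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), edges clamped."""
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, span + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, span + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def normalize_channels(frames, floor=1e-8):
    """Zero-mean, unit-variance per channel (stand-in for distribution normalization)."""
    n_frames = len(frames)
    out = [row[:] for row in frames]
    for k in range(len(frames[0])):
        col = [row[k] for row in frames]
        mean = sum(col) / n_frames
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n_frames)
        for t in range(n_frames):
            out[t][k] = (frames[t][k] - mean) / max(std, floor)
    return out


def dct_ii(values, n_coeffs):
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / n) for i, v in enumerate(values))
            for q in range(n_coeffs)]


def dscc_like_features(mel_spectra, n_coeffs, span=2):
    """Deltas of (uncompressed) mel spectra -> normalization -> DCT, with no log."""
    deltas = normalize_channels(temporal_delta(mel_spectra, span))
    return [dct_ii(row, n_coeffs) for row in deltas]


# Channel 0 is constant (like stationary noise); channel 1 changes over time.
spectra = [[1.0, v] for v in (0.0, 1.0, 3.0, 2.0, 0.5, 0.0)]
print([row[0] for row in temporal_delta(spectra)])  # channel 0: all zeros
```

Because the deltas can be negative, a log would be undefined, which is consistent with the talk's point that no compression is applied before the DCT.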
| 0:17:11 | We showed in that paper that, using these features, you get much more robust recognition than with regular mel-frequency cepstra. |
|---|
| 0:17:21 | It turns out that the dereverberation we just talked about can be combined |
|---|
| 0:17:27 | with this delta-spectral cepstral coefficient computation. |
|---|
| 0:17:31 | So you first dereverberate the signal, |
|---|
| 0:17:34 | then compute the mel spectra (not the log mel spectra), compute differences, and then compute features. |
|---|
| 0:17:41 | You then perform recognition on those features, and sure enough, you can see that, |
|---|
| 0:17:46 | firstly, if all you have is a reverberated signal, it makes things just marginally worse, |
|---|
| 0:17:51 | but the moment we begin adding noise, it gives a lot of improvement. This is all on a room response of [unclear] milliseconds. |
|---|
| 0:18:01 | The blue line shows the performance you get when you dereverberate the signal and then use DSCC features. |
|---|
| 0:18:10 | In each case, we've got noise at two different levels corrupting the signal, |
|---|
| 0:18:15 | and you can see that the best performance by far is what you get when you dereverberate the signal and then perform the DSCC computation. |
|---|
| 0:18:25 | So, in summary: we modeled reverberation on speech spectra, |
|---|
| 0:18:30 | treating it as a phenomenon that affects the sequence of magnitude spectra. |
|---|
| 0:18:36 | We used NMF to perform dereverberation. |
|---|
| 0:18:41 | We also used gammatone sub-band non-negative matrix factorization, so that there is a perceptual weighting on this. |
|---|
| 0:18:49 | We compared the magnitude and power domains, |
|---|
| 0:18:53 | and studied the joint noise and reverberation problem by integrating a noise-robust feature algorithm with dereverberation, and got significant improvements. |
|---|
| 0:19:03 | Thank you. |
|---|
| 0:19:09 | [Chair] Well, this is the last talk, so we do have time for questions. |
|---|
| 0:19:40 | [Question-and-answer exchange largely inaudible.] |
|---|
| 0:22:26 | [Audience] I have one last question, about [unclear] X and H. [unclear] Perhaps the optimization does not have to [unclear] that situation. |
|---|
| 0:22:54 | [Response largely inaudible.] [unclear] experimental [unclear]. |
|---|
| 0:23:36 | [Chair] Well, if there are no more questions, I want to thank the speakers of this session, and in particular [unclear]. Thank you very much. |
|---|