0:00:15 | yeah |
---|---|

0:00:16 | thank you |

0:00:17 | um |

0:00:17 | and at least there is some audience left for the last talk of the day |

0:00:21 | and a |

0:00:23 | the |

0:00:24 | it is uh a bit different from the talks before |

0:00:28 | um for the outline of my talk uh |

0:00:33 | i to the book were that's uh uh i was taught to |

0:00:36 | give a very short introduction to what blind source separation is |

0:00:39 | say it |

0:00:40 | what the problem in the convolutive case is mainly the permutation ambiguity and how i'm solving it using |

0:00:47 | a sparsity based criterion |

0:00:49 | so um |

0:00:51 | and the usual case for blind source separation is when you have a cocktail party problem |

0:00:56 | um we have some sources |

0:00:59 | uh at this point i say we have |

0:01:01 | speech sources two people talking |

0:01:04 | and we would like to get |

0:01:06 | the single components of that |

0:01:08 | uh but what you get are some recordings which are just mixtures |

0:01:13 | of these single components |

0:01:15 | and um |

0:01:18 | in this case |

0:01:19 | what i'm |

0:01:20 | looking at here uh we have the |

0:01:22 | bigger problem of the |

0:01:24 | uh mixture being a convolutive one |

0:01:27 | as we have delays we have reflections and so on |

0:01:30 | so uh |

0:01:32 | the problem becomes more complicated |

0:01:34 | and the mathematical formulation for this um we have |

0:01:38 | some sources |

0:01:39 | some mixing |

0:01:40 | matrix |

0:01:41 | at least for the instantaneous case |

0:01:44 | which gives the measurements and what we want to do is to |

0:01:48 | estimate a matrix uh a separating matrix so we get again the |

0:01:52 | uh original |

0:01:53 | signals |

0:01:54 | uh for this we have methods like ica |

0:01:57 | so nothing new at this point |

0:01:59 | uh what we have to um |

0:02:01 | take into account uh we never know the order of the sources and we never know which energy the sources |

0:02:09 | have |
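
A compact way to write down the instantaneous model and the two ambiguities just mentioned (order and energy of the sources). The symbols $\mathbf{A}$, $\mathbf{W}$, $\mathbf{P}$ and $\mathbf{D}$ are notation assumed for this sketch and are not taken from the slides:

$$
\mathbf{x}(n) = \mathbf{A}\,\mathbf{s}(n), \qquad
\mathbf{y}(n) = \mathbf{W}\,\mathbf{x}(n), \qquad
\mathbf{W}\mathbf{A} = \mathbf{P}\,\mathbf{D}
$$

Here $\mathbf{s}(n)$ holds the sources, $\mathbf{x}(n)$ the measurements and $\mathbf{y}(n)$ the separated outputs. Even a perfect separating matrix only achieves a permutation matrix $\mathbf{P}$ (unknown order) times an invertible diagonal matrix $\mathbf{D}$ (unknown energy).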

0:02:10 | um |

0:02:11 | in my work i used the |

0:02:13 | well known variant of the natural gradient |

0:02:16 | uh which i think a few of you know |

0:02:18 | oh |

0:02:19 | for speech signals we need |

0:02:22 | uh |

0:02:23 | as always we need some |

0:02:25 | a probability density |

0:02:27 | function for speech which we are considering here |

0:02:30 | we can safely assume uh we have a laplacian density |
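
A minimal sketch of a batch natural-gradient ICA update with a Laplacian source model, for real-valued instantaneous mixtures. The step size, iteration count, mixing matrix and the function name `natural_gradient_ica` are assumptions chosen for illustration; the talk does not give these details, and the frequency-domain case would use complex-valued data.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, step=0.1):
    """Batch natural-gradient ICA for real mixtures x of shape (channels, samples)."""
    n_ch, n_samp = x.shape
    w = np.eye(n_ch)
    for _ in range(n_iter):
        y = w @ x
        # For a Laplacian density the score function is sign(y)
        phi_y = np.sign(y)
        # Natural-gradient update: W <- W + step * (I - E[phi(y) y^T]) W
        w = w + step * (np.eye(n_ch) - (phi_y @ y.T) / n_samp) @ w
    return w

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))          # two independent Laplacian toy sources
a = np.array([[1.0, 0.6], [0.4, 1.0]])    # hypothetical instantaneous mixing matrix
w = natural_gradient_ica(a @ s)
g = np.abs(w @ a)                         # global system, ideally a scaled permutation
```

Each row of `g` should end up with one dominant entry, which reflects the remaining scaling and ordering ambiguity of the separated outputs.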

0:02:35 | so |

0:02:36 | as i already said we have |

0:02:39 | uh not the simple case we have the convolutive mixture we have |

0:02:42 | in this case |

0:02:44 | you you |

0:02:45 | different delays you have the reflections and so on |

0:02:48 | so we model this |

0:02:49 | using uh a convolution |

0:02:52 | and uh for |

0:02:54 | real situations we have quite long filters |

0:02:57 | two thousand four thousand taps or whatever |

0:03:00 | um |

0:03:02 | estimating these filters directly in time domain |

0:03:06 | is |

0:03:06 | hard |

0:03:07 | possible but very hard |

0:03:09 | so the usual way is to go to the |

0:03:12 | time-frequency domain using the short time fourier transform |

0:03:15 | and now what we have is |

0:03:18 | just again uh a multiplication in each frequency bin |

0:03:21 | so uh we can just use the |

0:03:25 | uh update equation i have shown in each frequency bin independently |
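
The step from the time domain to the time-frequency domain can be summarised as follows. The notation ($L$ filter taps, frequency bin $k$, frame index $\tau$) is assumed for this sketch:

$$
\mathbf{x}(n) = \sum_{l=0}^{L-1} \mathbf{A}(l)\,\mathbf{s}(n-l)
\quad\longrightarrow\quad
\mathbf{X}(k,\tau) \approx \mathbf{A}(k)\,\mathbf{S}(k,\tau), \qquad
\mathbf{Y}(k,\tau) = \mathbf{W}(k)\,\mathbf{X}(k,\tau)
$$

The approximation holds when the analysis window is long compared to the mixing filters. Each bin $k$ then looks like a separate instantaneous problem with its own $\mathbf{W}(k)$, which also means each bin has its own unknown permutation and scaling.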

0:03:29 | which is again |

0:03:31 | uh |

0:03:32 | not a problem |

0:03:33 | but |

0:03:34 | now |

0:03:35 | we have |

0:03:36 | the problem of |

0:03:37 | uh the different |

0:03:39 | permutations and scaling things uh |

0:03:42 | in the previous example |

0:03:44 | we didn't have to think about that in this case we have to correct |

0:03:48 | um |

0:03:49 | the scaling |

0:03:50 | uh there are some standard ways to solve it |

0:03:53 | uh |

0:03:54 | the typical case is the minimal distortion |

0:03:57 | principle |

0:03:58 | uh in which we |

0:04:00 | multiply the |

0:04:01 | unmixing matrix by |

0:04:03 | the diagonal of its inverse |

0:04:06 | uh what |

0:04:07 | this |

0:04:08 | means is that we |

0:04:10 | uh accept the filtering and scaling done by the mixing system |

0:04:14 | which we do not know |

0:04:15 | but at least we do not |

0:04:16 | add new distortion |

0:04:17 | at this point |
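
A small sketch of the minimal distortion principle for a single frequency bin, using the common formulation W <- diag(W^-1) W. The matrices and the function name `minimal_distortion` are made up for illustration, and real values stand in for the complex values of an actual frequency bin.

```python
import numpy as np

def minimal_distortion(w):
    """Rescale an unmixing matrix of one bin: W <- diag(W^-1) W."""
    w_inv = np.linalg.inv(w)
    return np.diag(np.diag(w_inv)) @ w

a = np.array([[1.0, 0.5], [0.3, 0.8]])          # hypothetical mixing matrix of one bin
w1 = np.diag([2.0, -0.1]) @ np.linalg.inv(a)    # a separating matrix with arbitrary scaling
w2 = np.diag([0.5, 7.0]) @ np.linalg.inv(a)     # same separation, different scaling
g1 = minimal_distortion(w1) @ a                 # global system after rescaling
g2 = minimal_distortion(w2) @ a
```

Both `g1` and `g2` equal the diagonal part of `a`: each output is the source as it was filtered by the mixing system on its way to one sensor, whatever scaling the separation algorithm happened to produce.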

0:04:19 | um |

0:04:19 | some newer methods |

0:04:21 | uh were presented |

0:04:22 | in the last years uh filter shortening filter shaping |

0:04:26 | but for these methods you need to |

0:04:28 | well |

0:04:28 | um |

0:04:29 | solve the permutation problem first |

0:04:32 | uh whereas this one |

0:04:33 | uh you can solve in each frequency bin independently |

0:04:37 | so |

0:04:39 | we were talking about the permutation problem what is it and how can it be |

0:04:44 | uh well |

0:04:45 | uh |

0:04:46 | scrap |

0:04:47 | in this case |

0:04:48 | we have to |

0:04:49 | short time |

0:04:50 | the |

0:04:51 | two spectrograms the short time fourier transforms |

0:04:54 | of two signals |

0:04:55 | where just |

0:04:56 | when you exactly know |

0:04:58 | these parts are swapped between the two |

0:05:01 | uh |

0:05:02 | spectrograms |

0:05:04 | when you transform these signals |

0:05:06 | back |

0:05:06 | to the time domain of course |

0:05:08 | both signals appear in both channels |

0:05:11 | so again you didn't uh |

0:05:14 | separate them so you have to correct |

0:05:17 | for these permutations and these can be |

0:05:19 | uh in every frequency bin different |

0:05:22 | and it usually becomes quite complicated |

0:05:25 | uh usually there are two main approaches |

0:05:29 | uh |

0:05:30 | the |

0:05:31 | a lot of papers at this conference |

0:05:34 | concentrate on directivity patterns and directions of arrival |

0:05:38 | uh the idea is |

0:05:40 | when you have the unmixing matrices |

0:05:42 | uh we can just |

0:05:44 | uh |

0:05:44 | calculate |

0:05:45 | the directions where the sources come from and assume |

0:05:49 | uh that one direction is one source |

0:05:52 | this works |

0:05:53 | well |

0:05:54 | as long as we have low reverberation |

0:05:56 | but with high reverberation uh you can't |

0:05:59 | um um |

0:06:01 | pinpoint the source to a single direction in all frequencies together |

0:06:05 | uh in this case here |

0:06:07 | i used the statistics of the separated signals |

0:06:11 | um one |

0:06:12 | trivial simple case is uh |

0:06:15 | you just |

0:06:16 | look |

0:06:17 | at a line and the neighbouring line and say |

0:06:20 | i |

0:06:20 | they have to look the same |

0:06:22 | so |

0:06:23 | here they are highly correlated |

0:06:26 | um |

0:06:28 | yeah this is true |

0:06:29 | does this |

0:06:31 | at least for |

0:06:32 | when you are looking at very near bins so we have here two direct neighbouring bins |

0:06:37 | in blue and green and yeah okay they are highly correlated |

0:06:41 | if you just |

0:06:42 | go |

0:06:43 | a few bins away |

0:06:45 | yeah i i wouldn't say |

0:06:47 | these bins are correlated |

0:06:49 | so the correlation method |

0:06:50 | is not |

0:06:51 | so robust |

0:06:53 | but uh there have been extensions to make it |

0:06:56 | uh a lot more robust |

0:06:58 | oh okay so |

0:06:59 | yeah |

0:07:01 | at these um |

0:07:02 | the correlation coefficients uh |

0:07:05 | take the |

0:07:06 | um |

0:07:07 | envelope |

0:07:08 | calculate the correlation |

0:07:09 | and decide |

0:07:10 | the permutation |

0:07:12 | depending on all four possible coefficients |

0:07:16 | and then |

0:07:16 | and uh using is uh |

0:07:18 | uh you can just use it this way |
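
A rough sketch of the correlation-based decision between two frequency bins for the two-source case. The envelopes are synthetic and the function name `swap_needed_by_correlation` is hypothetical; published variants differ in how the four coefficients are combined.

```python
import numpy as np

def swap_needed_by_correlation(env_a, env_b):
    """env_a, env_b: magnitude envelopes of both outputs in two bins, shape (2, frames)."""
    # 2x2 block of correlation coefficients between outputs of bin a and outputs of bin b
    rho = np.corrcoef(env_a, env_b)[:2, 2:]
    keep = rho[0, 0] + rho[1, 1]   # outputs stay in the same order
    swap = rho[0, 1] + rho[1, 0]   # outputs of bin b are exchanged
    return swap > keep

rng = np.random.default_rng(1)
e = rng.random((2, 400))                       # toy envelopes of two sources
bin_a = e + 0.1 * rng.random((2, 400))
bin_b_same = e + 0.1 * rng.random((2, 400))    # same output order as bin_a
bin_b_swapped = bin_b_same[::-1]               # outputs exchanged
```

With strongly correlated envelopes, as in neighbouring bins, the decision is clear. For distant bins the coefficients get small and the comparison becomes unreliable.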

0:07:21 | as i already said this isn't very robust you have |

0:07:24 | to make it |

0:07:25 | a because of the |

0:07:27 | yeah when comparing more distant bins |

0:07:29 | a |

0:07:30 | you just got wrong |

0:07:32 | uh and then |

0:07:33 | so um |

0:07:35 | and |

0:07:36 | a few years ago uh the dyadic sorting scheme has been proposed |

0:07:42 | where you don't compare |

0:07:44 | single bins |

0:07:45 | uh yeah |

0:07:46 | but whole blocks of bins |

0:07:48 | so it looks like this |

0:07:50 | you compare |

0:07:51 | in the first stage you compare one bin with another |

0:07:53 | zero with one |

0:07:55 | and calculate the |

0:07:56 | correlation coefficient and you get |

0:07:59 | the permutation and take the next two bins and so on |

0:08:02 | so in this case you have neighbouring bins and you can assume okay the |

0:08:07 | assumption of highly correlated bins |

0:08:09 | is met |

0:08:10 | in the next step |

0:08:12 | you take |

0:08:13 | these two correctly permuted bins |

0:08:15 | take the next two and calculate now |

0:08:18 | uh these correlations so actually what you get |

0:08:21 | here are four coefficients |

0:08:23 | and we have to decide |

0:08:24 | which one to take to decide uh which permutation do we take |

0:08:29 | the biggest one |

0:08:30 | the mean |

0:08:31 | the lowest one or whatever |

0:08:33 | four |

0:08:34 | is not a problem |

0:08:35 | here you get already sixteen and in the next |

0:08:38 | yeah we get sixty four and so on |

0:08:41 | so it becomes even harder |
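
The growth in the number of coefficients can be counted directly. This sketch assumes that at stage j two blocks of 2**(j-1) bins are compared and that every bin of one block is correlated with every bin of the other; the function name `coefficients_per_stage` is hypothetical.

```python
def coefficients_per_stage(n_stages):
    """Bin-to-bin coefficients per block comparison at stages 1..n_stages."""
    return [(2 ** (stage - 1)) ** 2 for stage in range(1, n_stages + 1)]

counts = coefficients_per_stage(4)
print(counts)
```

The counts quadruple from stage to stage, so the later stages have to condense many partly unreliable coefficients into a single decision.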

0:08:43 | um |

0:08:44 | a simple example for this |

0:08:46 | um |

0:08:46 | when we just plot |

0:08:48 | for the the situation but for a frequency |

0:08:51 | bins |

0:08:52 | um |

0:08:52 | the coefficients yeah |

0:08:55 | um |

0:08:56 | for all frequency bins so |

0:08:58 | in the first stage you would just take the correlation coefficients |

0:09:02 | directly |

0:09:03 | uh on the first of their i don't know |

0:09:05 | uh |

0:09:06 | and a |

0:09:07 | uh okay when you look at this |

0:09:10 | it's |

0:09:10 | looks like |

0:09:11 | just go to uh |

0:09:12 | well |

0:09:13 | it just one here and here |

0:09:15 | hardly |

0:09:16 | so when you are going |

0:09:17 | to the next steps |

0:09:19 | so let's say |

0:09:20 | you compare |

0:09:21 | the block |

0:09:23 | five hundred to eight hundred to the block eight hundred to one thousand |

0:09:28 | one hundred or whatever |

0:09:29 | you compare all the coefficients which are in this square |

0:09:34 | so we have a lot of coefficients which are correlated |

0:09:37 | and a lot of coefficients which are |

0:09:38 | not |

0:09:39 | and and so on in this case here |

0:09:42 | K |

0:09:44 | as we work |

0:09:44 | here are not |

0:09:46 | but in the next steps you compare these coefficients |

0:09:49 | a K just me still worked as might a stable |

0:09:52 | but this case here |

0:09:54 | if a lot |

0:09:55 | one computations |

0:09:56 | which is a lot of |

0:09:57 | indicators of our limitations which |

0:10:00 | in a right and |

0:10:01 | one conditions so |

0:10:03 | usually the dyadic sorting scheme |

0:10:06 | is better but still |

0:10:08 | fails |

0:10:10 | but |

0:10:10 | so and signal |

0:10:13 | um |

0:10:15 | now i want to |

0:10:16 | um |

0:10:18 | present a new approach |

0:10:20 | uh the first |

0:10:21 | uh observation you can make is |

0:10:24 | when you just take |

0:10:26 | speech signals |

0:10:27 | speech signals are sparse |

0:10:29 | and um |

0:10:32 | a mixture of two signals which are independent |

0:10:35 | is less sparse |

0:10:37 | and uh |

0:10:38 | you can extend this |

0:10:41 | even if the signals are bandpass signals |

0:10:44 | as long as they are independent |

0:10:47 | the mixture is less sparse |

0:10:50 | and this is exactly what we have in the permutation problem we have two bandpass signals and want to |

0:10:55 | look which permutation do we have |

0:10:57 | so the wrong permutation will be |

0:11:00 | uh |

0:11:01 | less sparse |

0:11:03 | uh you have here an example of this |

0:11:06 | uh just |

0:11:07 | a plain speech signal |

0:11:09 | where nothing |

0:11:10 | happened yeah |

0:11:12 | and in this case |

0:11:13 | i just |

0:11:14 | moved the |

0:11:16 | upper half uh of the signal so the |

0:11:19 | upper |

0:11:19 | half of the signal |

0:11:21 | to the other so we have the permutation |

0:11:23 | at the lower |

0:11:25 | level of the dyadic sorting scheme |

0:11:28 | and when we compare these |

0:11:29 | we have here a lot of |

0:11:31 | values at or near zero |

0:11:33 | and when you look here we have |

0:11:36 | clearly a signal which is less sparse |

0:11:39 | and uh |

0:11:41 | this is exactly what we need to uh |

0:11:45 | formulate |

0:11:45 | a new criterion |

0:11:48 | we want the signal to be as sparse as possible |

0:11:51 | uh the measure of sparsity um |

0:11:54 | for this uh we take the |

0:11:57 | well known measure of the lp norm |

0:11:59 | uh |

0:12:01 | in my case i usually take something like zero point one |

0:12:05 | for p |

0:12:06 | but it's not that |

0:12:08 | important you can vary it |
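
A minimal sketch of an lp-based sparsity measure with p = 0.1, the value mentioned in the talk. It uses the plain sum of |x|**p without the outer 1/p root, which is an assumption of this sketch and does not change which of two candidates is sparser. The toy signals and the name `lp_measure` are illustrative only.

```python
def lp_measure(x, p=0.1):
    """Sum of |x_i|**p; for p < 1 a smaller value means a sparser signal of equal energy."""
    return sum(abs(v) ** p for v in x)

sparse = [0.0, 0.0, 0.0, 2.0]   # energy 4, concentrated in one sample
dense = [1.0, 1.0, 1.0, 1.0]    # energy 4, spread over all samples

print(lp_measure(sparse), lp_measure(dense))
```

For small p the measure behaves almost like a count of the non-zero samples, which is why the sparse signal scores much lower than the dense one.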

0:12:10 | um okay so |

0:12:12 | uh the idea is now |

0:12:14 | as with the correlation coefficient |

0:12:17 | we take |

0:12:18 | our signal |

0:12:20 | calculate |

0:12:22 | now not the correlation between two signals |

0:12:24 | but the sparsity of the sum of two signals |

0:12:28 | and |

0:12:29 | take again |

0:12:30 | the four coefficients |

0:12:31 | every every one against each other |

0:12:34 | and you get one |

0:12:35 | um |

0:12:37 | yeah coefficients |

0:12:38 | coefficient which can decide which permutation |

0:12:41 | the important thing about this |

0:12:42 | is now |

0:12:43 | we don't take the |

0:12:45 | coefficients in the time-frequency domain but we transform |

0:12:49 | these |

0:12:50 | bandpass |

0:12:51 | coefficients |

0:12:52 | to a |

0:12:53 | time domain signal |

0:12:54 | where we can apply |

0:12:56 | the lp norm |

0:12:58 | uh |

0:13:00 | using this |

0:13:02 | even if we take |

0:13:04 | let's say a hundred frequency bins from k to s |

0:13:07 | we still get |

0:13:09 | just one coefficient |

0:13:11 | for the whole sorting step |
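
A sketch of the sparsity-based decision for two sources. It assumes the two frequency blocks have already been transformed back to time-domain bandpass signals; the inverse transform is omitted and the toy signals with disjoint activity are invented for illustration. The names `lp_measure` and `swap_needed_by_sparsity` are hypothetical.

```python
import numpy as np

def lp_measure(x, p=0.1):
    """Sum of |x|**p, used here as the sparsity measure (smaller is sparser)."""
    return np.sum(np.abs(x) ** p)

def swap_needed_by_sparsity(low, high, p=0.1):
    """low, high: time-domain signals of two frequency blocks, shape (2, samples)."""
    # One coefficient per hypothesis: sparsity of the recombined output signals
    keep = lp_measure(low[0] + high[0], p) + lp_measure(low[1] + high[1], p)
    swap = lp_measure(low[0] + high[1], p) + lp_measure(low[1] + high[0], p)
    return swap < keep

rng = np.random.default_rng(2)
n = 1000
active = np.zeros((2, n))
active[0, : n // 2] = 1.0     # toy source 1 is only active in the first half
active[1, n // 2 :] = 1.0     # toy source 2 is only active in the second half
low = active * rng.standard_normal((2, n))     # lower block, correctly ordered
high = active * rng.standard_normal((2, n))    # upper block, correctly ordered
```

Calling `swap_needed_by_sparsity(low, high[::-1])` should report that a swap is needed, because the wrongly paired sums are active over the whole duration and therefore less sparse. However many bins a block contains, the decision rests on a single comparison.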

0:13:14 | so when we now do the |

0:13:16 | dyadic sorting |

0:13:17 | so we have again here one |

0:13:20 | frequency |

0:13:20 | just one single frequency bin transformed to the time domain |

0:13:24 | here again one |

0:13:25 | we apply the lp norm |

0:13:27 | is |

0:13:28 | and here again |

0:13:29 | and |

0:13:29 | at this point that is uh |

0:13:31 | different |

0:13:32 | now we transform |

0:13:34 | two frequency bins to the time domain |

0:13:38 | and calculate again one coefficient and so on and so on |

0:13:41 | so at this point you don't |

0:13:43 | have the problem of |

0:13:44 | which coefficients of these |

0:13:46 | let's say thousands or whatever |

0:13:49 | do you take uh but you have always just one coefficient |

0:13:53 | and |

0:13:54 | due to the |

0:13:55 | different |

0:13:56 | criterion |

0:13:58 | uh a a it's it's much more robust |

0:14:01 | mostly |

0:14:02 | um |

0:14:04 | i have |

0:14:05 | done some |

0:14:06 | simulations |

0:14:08 | um |

0:14:09 | so first set |

0:14:10 | uh uh data set this does a for the set up |

0:14:12 | use |

0:14:12 | go |

0:14:14 | T |

0:14:15 | um |

0:14:16 | so on so about last they can set from five years ago so |

0:14:20 | um |

0:14:21 | we have |

0:14:22 | a separate |

0:14:23 | this this state set that uh is |

0:14:25 | the lot uh somehow |

0:14:27 | it's a reverberant |

0:14:29 | recording of some speech but the reverberation is |

0:14:32 | quite low |

0:14:33 | you can when you hear of to is that has that you can see yeah it's |

0:14:36 | government art |

0:14:38 | with reverberation like |

0:14:39 | in this case |

0:14:41 | the direction of arrival approach |

0:14:43 | is |

0:14:44 | very good |

0:14:45 | um |

0:14:47 | it |

0:14:48 | it works because of the low reverberation |

0:14:51 | the proposed method is |

0:14:53 | not as good |

0:14:54 | almost |

0:14:56 | but uh when you look closely why it |

0:14:59 | is performing |

0:15:00 | not that good it's because |

0:15:02 | at the very low stage where we compare just one single frequency bin |

0:15:07 | i |

0:15:07 | yeah uh |

0:15:08 | some permutations happen to be incorrect |

0:15:11 | and uh |

0:15:12 | so |

0:15:13 | perhaps |

0:15:15 | uh |

0:15:15 | should investigate this a bit more |

0:15:17 | if |

0:15:19 | uh this |

0:15:19 | assumption of |

0:15:21 | sparsity and |

0:15:22 | so on |

0:15:23 | for bandpass signals like this is correct |

0:15:27 | and um |

0:15:29 | but |

0:15:29 | when you are going to a data set which uh has high reverberation |

0:15:34 | uh |

0:15:35 | overall you get |

0:15:37 | less |

0:15:38 | uh suppression performance |

0:15:41 | the doa approach |

0:15:42 | fails |

0:15:43 | because with this setup you don't |

0:15:46 | have the uh |

0:15:48 | the signal coming from one direction because |

0:15:50 | of the reverberation |

0:15:52 | but |

0:15:53 | with the new approach we again get almost the performance of the non-blind algorithm |

0:15:58 | uh because in this case um |

0:16:01 | it doesn't |

0:16:02 | matter from which direction the signal comes as long as we |

0:16:05 | are able to separate it |

0:16:07 | in every frequency bin |

0:16:09 | and um um |

0:16:11 | so it's not always |

0:16:13 | matching the non-blind case |

0:16:15 | but it's |

0:16:15 | more robust |

0:16:16 | compared to the |

0:16:18 | results of the doa approach |

0:16:20 | so to conclude |

0:16:22 | um |

0:16:23 | convolutive blind source separation |

0:16:25 | can be solved in the uh time-frequency domain |

0:16:29 | uh you have to solve the scaling and permutation |

0:16:32 | and |

0:16:33 | now we presented a new algorithm based on sparsity |

0:16:37 | in the time domain |

0:16:38 | not as usual operating in the time-frequency domain |

0:16:43 | and with high reverberation we have usually better |

0:16:46 | separation performance than the |

0:16:48 | direction of arrival approach |

0:17:09 | uh |

0:17:11 | so |

0:17:15 | yeah the whole setup is like seven and a half seconds and for this i used five |

0:17:20 | seconds |

0:17:22 | i am saying |

0:17:23 | if |

0:17:24 | there is enough signal to do ica in each frequency bin |

0:17:28 | then there would be enough signal to do the |

0:17:31 | lp norm |