| 0:00:13 | oh |
|---|
| 0:00:13 | welcome |
|---|
| 0:00:15 | ladies and gentlemen to this |
|---|
| 0:00:17 | experts session on trends in audio and acoustic signal processing |
|---|
| 0:00:23 | and it's great |
|---|
| 0:00:24 | that so many of you came |
|---|
| 0:00:27 | and thank you all in advance for postponing your lunch break a bit |
|---|
| 0:00:31 | um i hope they will make it interesting |
|---|
| 0:00:34 | i was just realising that we could also use this opportunity to do |
|---|
| 0:00:39 | some advertisement for our TC which is the TC on |
|---|
| 0:00:42 | audio and acoustic signal processing |
|---|
| 0:00:44 | as i'm not really prepared for this please take the whole thing as advertisement |
|---|
| 0:00:50 | for our TC and whoever wants to get involved |
|---|
| 0:00:53 | please contact us |
|---|
| 0:00:55 | and |
|---|
| 0:00:55 | there are various ways of getting involved in our activities |
|---|
| 0:00:59 | and of course we first would like to |
|---|
| 0:01:01 | tell you about what this is |
|---|
| 0:01:03 | so um |
|---|
| 0:01:04 | in my role as the chair of this TC |
|---|
| 0:01:09 | i would like to present to you two experts who are also from our TC who represent the |
|---|
| 0:01:14 | acoustic signal processing community and the audio community |
|---|
| 0:01:18 | in a very specific and i think very |
|---|
| 0:01:21 | uh |
|---|
| 0:01:21 | renowned way and i would first like to |
|---|
| 0:01:24 | uh point to Patrick Naylor please come step forward |
|---|
| 0:01:28 | so that you can be seen |
|---|
| 0:01:30 | Patrick Naylor is |
|---|
| 0:01:32 | the |
|---|
| 0:01:33 | from Imperial College London |
|---|
| 0:01:36 | and i think uh the most important thing about him right now is |
|---|
| 0:01:40 | that he just recently coauthored the first book on speech dereverberation |
|---|
| 0:01:45 | and you might look at his slides which also have very nice pictures |
|---|
| 0:01:52 | and uh on the other hand we have Malcolm |
|---|
| 0:01:55 | who is well known in |
|---|
| 0:01:57 | the audio and |
|---|
| 0:01:59 | especially music community and |
|---|
| 0:02:02 | his scope of course goes actually much beyond that |
|---|
| 0:02:04 | he is from Yahoo Research |
|---|
| 0:02:07 | uh i should not forget to mention that actually both have ties |
|---|
| 0:02:11 | to both worlds though |
|---|
| 0:02:13 | Malcolm is |
|---|
| 0:02:14 | also teaching at Stanford and Patrick also has |
|---|
| 0:02:18 | strong industry connections |
|---|
| 0:02:20 | so without further ado i would say uh i should stop |
|---|
| 0:02:28 | well thanks very much for coming along to this uh session hopefully it's gonna be interesting to you |
|---|
| 0:02:33 | um |
|---|
| 0:02:34 | we um |
|---|
| 0:02:36 | we tried to think about what you might expect from this kind of session |
|---|
| 0:02:40 | and i have to say that |
|---|
| 0:02:42 | the idea of trends is a very personal thing |
|---|
| 0:02:45 | so uh we're going to present |
|---|
| 0:02:47 | uh what we personally think are hopefully interesting things |
|---|
| 0:02:51 | but uh obviously given the time constraints |
|---|
| 0:02:54 | we can't cover everything so some of these things are like uh |
|---|
| 0:02:58 | a easy to define like counting papers as a measure of activity |
|---|
| 0:03:02 | or counting achievements maybe in terms of accepted papers rather than submitted papers |
|---|
| 0:03:07 | some of them are much less uh |
|---|
| 0:03:09 | uh easy to pin down |
|---|
| 0:03:12 | they're more uh soft concepts but we try to go around this a little |
|---|
| 0:03:17 | bit |
|---|
| 0:03:18 | and see what we can find |
|---|
| 0:03:21 | so the first thing we did was to look at the distribution of submissions to |
|---|
| 0:03:25 | uh the transactions on uh audio speech and language processing |
|---|
| 0:03:29 | and uh |
|---|
| 0:03:30 | i plotted this out there's a lot of detail on this pie chart here |
|---|
| 0:03:34 | but the thing to note from this |
|---|
| 0:03:36 | is that there is some big |
|---|
| 0:03:38 | uh subjects which are very active within our community in terms of the amount of effort |
|---|
| 0:03:44 | going into them |
|---|
| 0:03:45 | so speech enhancement is a big one and has been for a long time |
|---|
| 0:03:50 | source separation continues to be very active |
|---|
| 0:03:53 | uh we've had ICA sessions here |
|---|
| 0:03:55 | at ICASSP uh |
|---|
| 0:03:58 | microphone array signal processing |
|---|
| 0:04:00 | still very big and uh showing up at something like thirteen percent of submissions |
|---|
| 0:04:05 | content-based music processing let's just call it music processing |
|---|
| 0:04:09 | music is huge for us now and continues to grow |
|---|
| 0:04:15 | if not more |
|---|
| 0:04:17 | and um |
|---|
| 0:04:18 | uh this is a real evolution that we're seeing maybe even a revolution |
|---|
| 0:04:23 | in our uh profile of activities |
|---|
| 0:04:26 | uh also we could look at audio analysis as a |
|---|
| 0:04:29 | as a big topic |
|---|
| 0:04:30 | the ones that i've highlighted there are the ones that we're going to try to focus on in this session |
|---|
| 0:04:34 | as i mentioned we can't possibly focus on |
|---|
| 0:04:37 | everything |
|---|
| 0:04:39 | so that leads us to music |
|---|
| 0:04:41 | so music has um become very big here as Patrick mentioned and this year at ICASSP |
|---|
| 0:04:46 | there |
|---|
| 0:04:47 | are three sessions as you can um see listed there |
|---|
| 0:04:49 | there's a number of reasons i thought worth highlighting just because it's interesting to see how the field |
|---|
| 0:04:53 | developed |
|---|
| 0:04:54 | um so the first reason is that the EDICS which is how people describe the papers they're |
|---|
| 0:04:59 | submitting |
|---|
| 0:05:00 | to a conference |
|---|
| 0:05:01 | um was changed to include music as a subject so |
|---|
| 0:05:05 | it's a rather bureaucratic |
|---|
| 0:05:06 | um reason |
|---|
| 0:05:08 | but it probably has much to do with the fact that there are some music papers now at |
|---|
| 0:05:12 | ICASSP |
|---|
| 0:05:13 | and i think that's a good thing |
|---|
| 0:05:15 | um a second reason is there's a lot more content to work with um |
|---|
| 0:05:18 | music is easy to work with as you know we all own large collections |
|---|
| 0:05:22 | um and the third reason is it's become very commercially relevant in the last few years |
|---|
| 0:05:27 | um so iTunes and Pandora are certainly two examples |
|---|
| 0:05:31 | of companies who are making a large amount of money from |
|---|
| 0:05:34 | from music um ideas |
|---|
| 0:05:36 | um |
|---|
| 0:05:37 | as i mentioned the data is easy um we all have um large um CD collections |
|---|
| 0:05:43 | and and |
|---|
| 0:05:44 | one of the things that |
|---|
| 0:05:45 | is difficult about music is it's all copyrighted all the stuff we wanna work with is copyrighted |
|---|
| 0:05:50 | and one way that the community has dealt with this um i'll talk about in a |
|---|
| 0:05:55 | little bit |
|---|
| 0:05:56 | but another way that the community has um worked with this is to |
|---|
| 0:06:02 | create what's called the Million Song Dataset |
|---|
| 0:06:04 | um and the idea of this is to distribute features of the songs not the actual |
|---|
| 0:06:10 | copyrighted material |
|---|
| 0:06:11 | and so um |
|---|
| 0:06:13 | and forgive me if i err but i think it's a hundred features |
|---|
| 0:06:16 | per song and they're over time too |
|---|
| 0:06:18 | um |
|---|
| 0:06:19 | and Columbia and The Echo Nest uh provide this database |
|---|
| 0:06:22 | um online |
|---|
| 0:06:24 | and there's a lot of data there that people can use and it's freely available and it's a very large |
|---|
| 0:06:29 | database |
|---|
| 0:06:29 | and i expect we'll see more and more papers |
|---|
| 0:06:32 | um that use this database |
|---|
| 0:06:34 | MIREX is and has been the best um thing for the |
|---|
| 0:06:39 | scientific component of music analysis and music processing |
|---|
| 0:06:42 | this is the uh list of tasks |
|---|
| 0:06:44 | that are being uh worked on for the two thousand eleven competition |
|---|
| 0:06:48 | um as i mentioned copyright is a big issue and |
|---|
| 0:06:51 | what the MIREX people do um is |
|---|
| 0:06:53 | provide an environment at the University of Illinois where people can run their algorithms on a large database |
|---|
| 0:06:58 | of songs |
|---|
| 0:07:00 | so the songs never leave the University of Illinois |
|---|
| 0:07:02 | so instead of you know getting data and running your algorithms and sending results back |
|---|
| 0:07:06 | you send your algorithm to the University of Illinois |
|---|
| 0:07:08 | um in a particular environment a java environment |
|---|
| 0:07:11 | and they'll do a bit of debugging for you |
|---|
| 0:07:14 | and then they run the algorithm on their machines and their clusters |
|---|
| 0:07:17 | and give you back results |
|---|
| 0:07:18 | i want to highlight um three uh tasks |
|---|
| 0:07:21 | that are circled right here |
|---|
| 0:07:23 | that are um very um important and very uh popular |
|---|
| 0:07:26 | one is audio tag um classification so how you tag audio with various things |
|---|
| 0:07:30 | um is it happy is it blues |
|---|
| 0:07:33 | um anything you can think of can be a tag |
|---|
| 0:07:36 | and people work on that very hard |
|---|
| 0:07:38 | um multiple fundamental frequency estimation and tracking |
|---|
| 0:07:40 | um has been popular yeah |
|---|
| 0:07:42 | yeah even before MIREX started |
|---|
| 0:07:45 | but MIREX has i think provided a common database and really upped the scientific level now people |
|---|
| 0:07:51 | can compare things on common ground |
|---|
| 0:07:53 | and the other one is audio chord estimation |
|---|
| 0:07:55 | so in that sense a chord is just another tag |
|---|
| 0:07:58 | but a very specialised term |
|---|
| 0:07:59 | and it helps people understand music and people work on it a lot |
|---|
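The "chord as a specialised tag" idea can be sketched as template matching on a chroma vector. This is a generic illustration, not any MIREX entry; the templates, labels, and scoring are my own minimal choices:

```python
import numpy as np

PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_templates():
    """Build 24 binary templates: 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        maj = np.zeros(12)
        maj[[root, (root + 4) % 12, (root + 7) % 12]] = 1.0
        mnr = np.zeros(12)
        mnr[[root, (root + 3) % 12, (root + 7) % 12]] = 1.0
        templates[PITCHES[root] + ':maj'] = maj
        templates[PITCHES[root] + ':min'] = mnr
    return templates

def estimate_chord(chroma):
    """Tag a 12-bin chroma frame with the template of highest cosine similarity."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_score = None, -1.0
    for name, t in chord_templates().items():
        score = chroma @ t / (np.linalg.norm(chroma) * np.linalg.norm(t) + 1e-12)
        if score > best_score:
            best, best_score = name, score
    return best

# a chroma frame with energy on C, E and G should be tagged C:maj
frame = np.zeros(12)
frame[[0, 4, 7]] = [1.0, 0.8, 0.9]
label = estimate_chord(frame)
```

Real systems replace the binary templates with learned models and add temporal smoothing, but the tagging structure is the same.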
| 0:08:03 | um something else that's happened and has been very active this year |
|---|
| 0:08:06 | um is a lot of work in separation and analysis |
|---|
| 0:08:09 | and there are very many different approaches |
|---|
| 0:08:13 | so this particular um graphical model um |
|---|
| 0:08:17 | is from a paper um |
|---|
| 0:08:19 | um |
|---|
| 0:08:21 | by our friends in France all right |
|---|
| 0:08:23 | and it shows um a sequence of notes along the top so in this case they have a score |
|---|
| 0:08:27 | and know what's being played and that's hard information to get |
|---|
| 0:08:30 | and then they're generating um |
|---|
| 0:08:33 | um data about the uh harmonics |
|---|
| 0:08:36 | um from there so you have the amplitude |
|---|
| 0:08:39 | the frequency and the variance of the gaussian in the spectral domain |
|---|
| 0:08:43 | oops sorry that get combined |
|---|
| 0:08:45 | and then you have the observables so these are the spectral slices |
|---|
| 0:08:49 | and what you're trying to do |
|---|
| 0:08:51 | um given the note sequence you have um |
|---|
| 0:08:53 | i'm sorry |
|---|
| 0:08:55 | is build um or find these |
|---|
| 0:08:58 | um emission probabilities |
|---|
| 0:09:00 | that describe the music |
|---|
| 0:09:01 | and from that you can do a lot of um very interesting work |
|---|
| 0:09:05 | um you can do things like um tagging which i mentioned for things like uh emotion and genre |
|---|
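The generative idea described here, notes emitting Gaussian bumps in the spectral domain with an amplitude, a centre frequency, and a variance, can be sketched as follows. The parametrisation is my own toy version, not the paper's model:

```python
import numpy as np

def spectral_slice(f0, n_harmonics, amps, var, freqs):
    """Expected magnitude spectrum for one note: a Gaussian bump per harmonic."""
    slice_ = np.zeros_like(freqs, dtype=float)
    for h in range(1, n_harmonics + 1):
        centre = h * f0                            # harmonic frequency
        slice_ += amps[h - 1] * np.exp(-(freqs - centre) ** 2 / (2.0 * var))
    return slice_

def log_likelihood(observed, expected, noise_var=1e-2):
    """Gaussian emission log-probability of an observed slice given the model."""
    return -0.5 * np.sum((observed - expected) ** 2) / noise_var

freqs = np.linspace(0.0, 2000.0, 2001)             # 1 Hz grid
model = spectral_slice(220.0, 4, [1.0, 0.5, 0.25, 0.125], 50.0, freqs)
peak_bin = int(np.argmax(model))                   # strongest harmonic
ll_self = log_likelihood(model, model)             # perfect match scores 0
```

Fitting the amplitudes and variances from data, per note, is what "finding the emission probabilities" amounts to.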
| 0:09:10 | and uh um something that's kind of dear to my heart but shows uh the kind of work |
|---|
| 0:09:15 | that's being done in this area |
|---|
| 0:09:16 | um is some work on morphing um |
|---|
| 0:09:19 | and the question that um |
|---|
| 0:09:21 | um Caetano and Rodet wanted to ask was |
|---|
| 0:09:24 | what's the right way to think about um audio perception |
|---|
| 0:09:27 | and in morphing |
|---|
| 0:09:29 | and so if you do morphing correctly |
|---|
| 0:09:31 | the |
|---|
| 0:09:33 | the path in feature space should be a line |
|---|
| 0:09:35 | so if you're morphing between one position and another position |
|---|
| 0:09:38 | that feature moves along a line in the physical domain |
|---|
| 0:09:40 | and you want the same sort of thing to happen in the auditory domain |
|---|
| 0:09:44 | so |
|---|
| 0:09:44 | the |
|---|
| 0:09:45 | um |
|---|
| 0:09:46 | the graph that's shown here on the left is of poor quality but just to give you a sense of |
|---|
| 0:09:50 | it |
|---|
| 0:09:50 | there's a range of uh line spectral frequency envelopes |
|---|
| 0:09:56 | and then on the right hand side are |
|---|
| 0:09:58 | all the perceptual measures that have been used there have been calculated based on these |
|---|
| 0:10:03 | on these uh LSFs |
|---|
| 0:10:05 | and what they're doing is looking for one that's a straight line which you can see in the |
|---|
| 0:10:08 | middle there |
|---|
| 0:10:09 | and um some pieces work better than others and i think that research is still being |
|---|
| 0:10:14 | pursued |
|---|
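The "straight line in feature space" idea is easy to illustrate with line spectral frequencies: any convex combination of two valid (strictly ascending) LSF vectors is itself strictly ascending, so every intermediate morph step corresponds to a stable all-pole filter. The vectors below are illustrative, not from the paper:

```python
import numpy as np

def morph_lsf(lsf_a, lsf_b, alpha):
    """Interpolate two LSF vectors; alpha=0 gives lsf_a, alpha=1 gives lsf_b."""
    lsf_a, lsf_b = np.asarray(lsf_a), np.asarray(lsf_b)
    return (1.0 - alpha) * lsf_a + alpha * lsf_b

lsf_a = np.array([0.3, 0.8, 1.4, 2.1, 2.7])   # e.g. one vowel, radians in (0, pi)
lsf_b = np.array([0.5, 1.0, 1.8, 2.4, 2.9])   # e.g. another vowel

# sample the straight-line path between the two sounds
path = [morph_lsf(lsf_a, lsf_b, a) for a in np.linspace(0.0, 1.0, 11)]

# every intermediate vector stays strictly ascending, hence a valid LSF set
all_valid = all(np.all(np.diff(p) > 0) for p in path)
```

The research question in the talk is which feature makes the *perceptual* path equally straight; LSFs are just one candidate with this convenient stability property.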
| 0:10:17 | right so uh |
|---|
| 0:10:18 | the audio and acoustic signal processing TC |
|---|
| 0:10:22 | covers quite a wide range of areas um |
|---|
| 0:10:25 | which are |
|---|
| 0:10:26 | well |
|---|
| 0:10:27 | i have to say that to me they're exciting and i hope you feel also that same excitement about |
|---|
| 0:10:32 | the technologies that are being developed |
|---|
| 0:10:34 | and i think we see trends that a lot of these have been in the laboratory |
|---|
| 0:10:39 | for many years |
|---|
| 0:10:41 | and now starting to come to the point of applications industrial applications |
|---|
| 0:10:44 | and we heard about some of these in the plenary |
|---|
| 0:10:47 | and and in that kind of context |
|---|
| 0:10:50 | if we look at uh the research that we do |
|---|
| 0:10:53 | um i ask the question of how much of it is driven by |
|---|
| 0:10:57 | uh the desires we have for exciting applications |
|---|
| 0:11:00 | and how much of it is fundamental how much of it |
|---|
| 0:11:03 | underpins |
|---|
| 0:11:04 | the |
|---|
| 0:11:04 | technology with good algorithmic research |
|---|
| 0:11:08 | um so i ask you you know is there a happy marriage here |
|---|
| 0:11:14 | and uh i hope the Duke and Duchess of Cambridge will forgive me for using that photograph |
|---|
| 0:11:19 | uh but there is a serious point behind this um but before we come to the serious point |
|---|
| 0:11:28 | um |
|---|
| 0:11:29 | so uh of course Prince William is very very pleased um having uh now found his very fine bride |
|---|
| 0:11:41 | so he's maximised his expectations |
|---|
| 0:11:44 | um and uh had a very uh happy day |
|---|
| 0:11:48 | then coming back to something a little bit more serious i think um things which look good have to |
|---|
| 0:11:54 | be underpinned by |
|---|
| 0:11:56 | excellence |
|---|
| 0:11:57 | in uh algorithmic and fundamental research |
|---|
| 0:12:00 | so if there is a trend perhaps |
|---|
| 0:12:02 | towards things that look great |
|---|
| 0:12:04 | let's just not lose sight of the fact that the power |
|---|
| 0:12:08 | behind them uh is |
|---|
| 0:12:09 | uh the algorithms that we do |
|---|
| 0:12:12 | okay |
|---|
| 0:12:13 | so one of the areas of algorithmic research which is very hot and has been for a long time |
|---|
| 0:12:18 | is in uh array signal processing as applied to |
|---|
| 0:12:21 | microphones maybe also loudspeaker arrays |
|---|
| 0:12:25 | and here we see um a number of applications hearing aids has been very busy for a long time |
|---|
| 0:12:31 | and has a |
|---|
| 0:12:32 | uh many applications as well as excellent underpinning technology |
|---|
| 0:12:36 | i do see now a big branch out into the living room |
|---|
| 0:12:40 | and the living room means TV |
|---|
| 0:12:43 | it means entertainment perhaps it means an Xbox 360 with a Kinect |
|---|
| 0:12:47 | a microphone array uh perhaps it means Sky TV |
|---|
| 0:12:51 | and so these are new applications which are really coming on stream now |
|---|
| 0:12:55 | and uh i think we'll start to shape |
|---|
| 0:12:58 | the way that we do research |
|---|
| 0:13:00 | the tasks haven't changed that much we still want to do localization we still want to do tracking |
|---|
| 0:13:05 | we still want to extract the desired source from any interference |
|---|
| 0:13:08 | uh be that noise or other talkers |
|---|
| 0:13:11 | um and then a new task a new task is to try to learn something about the acoustic |
|---|
| 0:13:16 | environment |
|---|
| 0:13:18 | uh by inferring it from the multichannel signals that we can obtain with the microphone array |
|---|
| 0:13:24 | and this gives us additional prior information on which we can condition estimation |
|---|
| 0:13:30 | um |
|---|
| 0:13:31 | another issue is what kind of microphone array should we use and how can we understand how it's |
|---|
| 0:13:36 | gonna behave |
|---|
| 0:13:38 | people started off perhaps looking at linear arrays |
|---|
| 0:13:41 | um |
|---|
| 0:13:41 | certainly extending into planar and cylindrical and spherical even distributed arrays that don't really have any |
|---|
| 0:13:48 | geometry |
|---|
| 0:13:50 | and uh the design of such arrays including the spacing |
|---|
| 0:13:53 | of microphone elements and the orientation uh is uh an important and expanding topic i think |
|---|
| 0:13:59 | people started off with linear arrays |
|---|
| 0:14:01 | um |
|---|
| 0:14:02 | a bunch of microphones in a line |
|---|
| 0:14:04 | perhaps uh this is the well-known Eigenmike from mh acoustics |
|---|
| 0:14:08 | uh thirty-two sensors on the surface of a rigid sphere uh eight centimetres or so |
|---|
| 0:14:13 | from the little laboratory prototypes |
|---|
| 0:14:17 | they come now into real products you can buy |
|---|
| 0:14:20 | and uh Kinect your TV sets Sky TV |
|---|
| 0:14:23 | have |
|---|
| 0:14:24 | uh the opportunity to include microphone arrays |
|---|
| 0:14:27 | for relatively low cost |
|---|
| 0:14:28 | uh such that you can communicate uh using your living room equipment |
|---|
| 0:14:33 | um |
|---|
| 0:14:34 | for a very low cost |
|---|
| 0:14:35 | addition to |
|---|
| 0:14:37 | the communications hardware as well |
|---|
| 0:14:39 | and the challenge here is that you're probably sitting far away from the microphone |
|---|
| 0:14:44 | so uh this is going to be i think a really hot application for us |
|---|
| 0:14:49 | in the future |
|---|
| 0:14:52 | interestingly uh people are still doing fundamental research so i'm pleased to see that and here's a paper i |
|---|
| 0:14:57 | picked out uh |
|---|
| 0:14:58 | i can't say at random but it caught my eye |
|---|
| 0:15:01 | um here's a problem given N sources and M microphones |
|---|
| 0:15:06 | where should you put the microphones |
|---|
| 0:15:09 | and uh in this work which is some uh work i spotted from another group |
|---|
| 0:15:15 | uh given a planar microphone array |
|---|
| 0:15:17 | they present some analysis which enables one to predict |
|---|
| 0:15:20 | the directivity index obtained for different geometries and therefore obviously then allows optimisation |
|---|
| 0:15:26 | of those geometries |
|---|
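The kind of analysis mentioned, predicting the directivity index for a given geometry, can be sketched for a simple delay-and-sum beamformer under the standard spherically isotropic (diffuse) noise model. The geometry and frequency below are illustrative, not from the cited paper:

```python
import numpy as np

def directivity_index(positions, look_dir, freq, c=343.0):
    """DI in dB for a unit-weight delay-and-sum beamformer steered to look_dir."""
    positions = np.asarray(positions, dtype=float)
    k = 2.0 * np.pi * freq / c
    # far-field steering vector towards look_dir (unit vector)
    delays = positions @ np.asarray(look_dir)
    d = np.exp(1j * k * delays)
    w = d / len(d)                                  # delay-and-sum weights
    # diffuse-field coherence: Gamma_ij = sin(k r_ij) / (k r_ij)
    r = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    gamma = np.sinc(k * r / np.pi)                  # np.sinc(x) = sin(pi x)/(pi x)
    df = np.abs(w.conj() @ d) ** 2 / np.real(w.conj() @ gamma @ w)
    return 10.0 * np.log10(df)

# broadside linear array of four microphones, 5 cm spacing
mics = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
di_array = directivity_index(mics, look_dir=[0.0, 1.0, 0.0], freq=2000.0)
di_single = directivity_index(mics[:1], look_dir=[0.0, 1.0, 0.0], freq=2000.0)
```

Optimising geometry then just means searching over `positions` for the highest DI, which is the spirit of the analysis described above.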
| 0:15:29 | okay so source separation is uh another hot topic and has been for a while |
|---|
| 0:15:34 | i thought i should say that obviously trends |
|---|
| 0:15:37 | start somewhere |
|---|
| 0:15:38 | the trend |
|---|
| 0:15:39 | has to begin with the trend setter |
|---|
| 0:15:42 | and i put this photograph up of uh Colin Cherry |
|---|
| 0:15:45 | um simply because i think he used to have the office which is above my office now so |
|---|
| 0:15:50 | i also feel some kind of uh proximity effect |
|---|
| 0:15:53 | um |
|---|
| 0:15:54 | and uh his definition of the cocktail party problem in his nineteen-fifties book on human communication is often |
|---|
| 0:16:01 | quoted in people's papers |
|---|
| 0:16:03 | um and the early experiments were asking the question as to the behavior of listeners |
|---|
| 0:16:08 | when they were receiving two almost simultaneous signals |
|---|
| 0:16:11 | and uh |
|---|
| 0:16:12 | he called that the cocktail party problem |
|---|
| 0:16:14 | and the picture here i put it up on purpose because i don't think many people would really have a |
|---|
| 0:16:19 | good image of what a cocktail party was in nineteen fifty |
|---|
| 0:16:25 | and so i i guess it looks a bit different now a |
|---|
| 0:16:29 | but anyway |
|---|
| 0:16:30 | uh so |
|---|
| 0:16:31 | progress in this area has led us to be able to handle cases where we have both determined |
|---|
| 0:16:36 | and underdetermined and overdetermined scenarios |
|---|
| 0:16:39 | and clustering has been a very effective technique |
|---|
| 0:16:42 | uh the permutation |
|---|
| 0:16:44 | uh problem |
|---|
| 0:16:46 | has been addressed uh with some great successes as well |
|---|
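One common recipe for the permutation problem (a generic sketch, not any specific paper's method) is to align the per-frequency separated outputs by correlating their magnitude envelopes with a running reference:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

n_src, n_frames, n_bins = 2, 200, 16
true_env = rng.random((n_src, n_frames))          # ground-truth source envelopes

# simulate per-bin ICA outputs: same sources, unknown ordering per frequency bin
perms, bins = [], []
for _ in range(n_bins):
    p = rng.permutation(n_src)
    perms.append(p)
    bins.append(true_env[p] + 0.05 * rng.standard_normal((n_src, n_frames)))

def align(bins):
    """Per bin, pick the permutation maximising envelope correlation with a reference."""
    ref = bins[0].copy()                          # first bin defines the ordering
    fixed = [np.arange(bins[0].shape[0])]
    for b in bins[1:]:
        best_p, best_c = None, -np.inf
        for p in permutations(range(b.shape[0])):
            c = sum(np.corrcoef(ref[i], b[list(p)][i])[0, 1]
                    for i in range(b.shape[0]))
            if c > best_c:
                best_p, best_c = np.array(p), c
        fixed.append(best_p)
        ref = 0.9 * ref + 0.1 * b[best_p]         # smooth running reference
    return fixed

fixed = align(bins)
# the recovered permutations should make every bin consistent with bin 0
consistent = all(np.array_equal(perms[k][fixed[k]], perms[0][fixed[0]])
                 for k in range(n_bins))
```

Real systems use smarter correlation structure (e.g. across neighbouring bins and direction-of-arrival cues), but the envelope-correlation core is the same.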
| 0:16:49 | and now we're starting to see results in the practical context where we have reverberation as well |
|---|
| 0:16:56 | the uh usual effect of reverberation is talked about in the context |
|---|
| 0:17:00 | um of dereverberation algorithms for speech enhancement |
|---|
| 0:17:04 | and uh this is something that i've uh myself tried to address |
|---|
| 0:17:08 | and uh perhaps we're now at the stage where there is a push to take some of the |
|---|
| 0:17:13 | algorithms from the laboratory and start to roll them out into real world applications |
|---|
| 0:17:19 | and we'll then learn whether they work or not |
|---|
| 0:17:22 | and uh we have to address the cases which are both single channel and multichannel |
|---|
| 0:17:27 | uh often by using acoustic channel inversion if we can estimate the acoustic channel |
|---|
| 0:17:33 | and although |
|---|
| 0:17:35 | this slide is |
|---|
| 0:17:35 | uh titled speech enhancement of course reverberation |
|---|
| 0:17:39 | uh is widely relevant |
|---|
| 0:17:41 | both positively and with negative effects also in music so let's not lose sight of that |
|---|
| 0:17:48 | the other factor which i wanted to touch on here was synergy |
|---|
| 0:17:52 | so |
|---|
| 0:17:53 | um interdisciplinary research is often a favoured modality |
|---|
| 0:17:57 | and in our community we can see some benefits coming from |
|---|
| 0:18:01 | cross fertilisation of different topic areas |
|---|
| 0:18:04 | for example |
|---|
| 0:18:06 | uh dereverberation and blind source separation |
|---|
| 0:18:09 | and we start to see papers where |
|---|
| 0:18:11 | these are jointly |
|---|
| 0:18:13 | uh addressed with some uh good leverage from both |
|---|
| 0:18:17 | uh types of techniques |
|---|
| 0:18:19 | equally |
|---|
| 0:18:20 | speech uh dereverberation coupled with speech recognition |
|---|
| 0:18:25 | where |
|---|
| 0:18:26 | a classical speech recognizer is enhanced |
|---|
| 0:18:29 | uh such that it has knowledge of the models of clean speech but also |
|---|
| 0:18:33 | has models for the reverberation |
|---|
| 0:18:36 | and by combining these |
|---|
| 0:18:37 | is able to make a big improvement in word accuracy |
|---|
| 0:18:45 | so i want to talk a bit about a recurring theme that i've been seeing over the last two years |
|---|
| 0:18:49 | um |
|---|
| 0:18:50 | both in this community and elsewhere but i thought i'd mention it here first |
|---|
| 0:18:55 | and that's about sparsity |
|---|
| 0:18:56 | um |
|---|
| 0:18:57 | and no we're not talking about my hair here |
|---|
| 0:19:00 | um |
|---|
| 0:19:03 | the |
|---|
| 0:19:03 | the first place i saw this um |
|---|
| 0:19:05 | was in the matching pursuit work that was presented here in ninety-seven i think it was first done in |
|---|
| 0:19:10 | you know the signal processing |
|---|
| 0:19:12 | uh transactions in ninety-three |
|---|
| 0:19:14 | and um at the time i thought it was interesting but a dumb idea |
|---|
| 0:19:18 | um |
|---|
| 0:19:20 | and so now i'm correcting myself |
|---|
| 0:19:21 | um but it's shown up in a number of interesting places um in the work that has been done |
|---|
| 0:19:27 | um at ICASSP and elsewhere |
|---|
| 0:19:28 | um compressed sensing a few years ago um was probably the best example |
|---|
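Matching pursuit, mentioned above, is easy to sketch: greedily pick the dictionary atom most correlated with the residual and subtract its contribution. The dictionary here is a toy random one rather than Mallat and Zhang's Gabor dictionary:

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_atoms = 64, 256                              # overcomplete: 256 atoms in 64-D
D = rng.standard_normal((n, n_atoms))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms

# signal built from two atoms, so a 2-sparse representation exists
x = 3.0 * D[:, 10] - 2.0 * D[:, 100]

def matching_pursuit(x, D, n_iter):
    """Greedy sparse decomposition of x over dictionary D."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))          # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]             # remove its contribution
    return coeffs, residual

coeffs, residual = matching_pursuit(x, D, n_iter=10)
rel_err = np.linalg.norm(residual) / np.linalg.norm(x)
```

After a handful of iterations the residual is small and the two largest coefficients sit on the atoms the signal was built from.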
| 0:19:33 | um |
|---|
| 0:19:34 | but in this community um |
|---|
| 0:19:36 | and we've seen it in new technologies such as deep belief networks |
|---|
| 0:19:41 | um |
|---|
| 0:19:42 | sparsity has been a big part of the work that's been done on deep belief networks and |
|---|
| 0:19:46 | in machine learning |
|---|
| 0:19:47 | i think that's been um you know interesting |
|---|
| 0:19:50 | and |
|---|
| 0:19:51 | um in a lot of papers that we saw this year um |
|---|
| 0:19:54 | L1 regularization is a way of providing solutions that make sense |
|---|
| 0:20:00 | um |
|---|
| 0:20:01 | when you have a very um overdetermined um very complex um basis set |
|---|
| 0:20:06 | and so i |
|---|
| 0:20:07 | i titled this slide sparsity uh but it's probably better described as sparsity |
|---|
| 0:20:12 | in combination with um overcomplete basis sets |
|---|
| 0:20:16 | and i think that combination is interesting |
|---|
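The L1 story can be made concrete: with an overcomplete dictionary, least squares alone is ill-posed, but adding an L1 penalty (the lasso) selects a sparse, sensible solution. The sketch below uses plain iterative shrinkage-thresholding (ISTA) on toy data:

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_atoms = 32, 128                              # overcomplete: 128 atoms in 32-D
D = rng.standard_normal((n, n_atoms))
D /= np.linalg.norm(D, axis=0)

truth = np.zeros(n_atoms)
truth[[5, 50]] = [1.5, -1.0]                      # sparse ground truth
y = D @ truth

def ista(y, D, lam, n_iter=500):
    """ISTA for min_c ||y - Dc||^2 / 2 + lam * ||c||_1."""
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = c + D.T @ (y - D @ c) / L             # gradient step
        c = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return c

c = ista(y, D, lam=0.05)
support = set(np.flatnonzero(np.abs(c) > 0.1))
```

With no penalty there are infinitely many exact solutions; the L1 term picks the one concentrated on the few atoms that actually generated the signal (up to a small shrinkage bias of order `lam`).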
| 0:20:18 | oh one example of that um was talked about a little bit ago |
|---|
| 0:20:21 | in the session before this |
|---|
| 0:20:22 | um in the work |
|---|
| 0:20:24 | presented there |
|---|
| 0:20:26 | um using a cortical representation to um |
|---|
| 0:20:30 | um |
|---|
| 0:20:31 | model sound |
|---|
| 0:20:32 | and |
|---|
| 0:20:33 | and the cortex is probably the original um |
|---|
| 0:20:36 | uh sparse representation |
|---|
| 0:20:37 | um |
|---|
| 0:20:38 | it predates all of us |
|---|
| 0:20:40 | and the idea is that you wanna represent sound with the least amount of biological energy |
|---|
| 0:20:46 | and what seems to work well there is to use spikes that |
|---|
| 0:20:49 | represent uh very um |
|---|
| 0:20:52 | distinct sound atoms and how they're all put together is still a matter of discussion |
|---|
| 0:20:56 | but uh |
|---|
| 0:20:57 | i think it has been and is gonna be you know interesting |
|---|
| 0:20:59 | and the way uh they have been using that is to |
|---|
| 0:21:03 | take noisy speech and put it through this kind of um this very overcomplete basis set |
|---|
| 0:21:09 | and then |
|---|
| 0:21:10 | um |
|---|
| 0:21:12 | filter it |
|---|
| 0:21:13 | keeping the regions |
|---|
| 0:21:15 | that are |
|---|
| 0:21:17 | likely to contain speech |
|---|
| 0:21:19 | and so |
|---|
| 0:21:20 | in a sense |
|---|
| 0:21:21 | um it's a wiener filter but it's in a very rich environment |
|---|
| 0:21:25 | where it's very easy to separate um speech from noise and things like that |
|---|
| 0:21:28 | and what's on the bottom is noisy speech then the kind of filter that makes sense for speech |
|---|
| 0:21:32 | which for example has a lot of energy at around a four hertz modulation rate |
|---|
| 0:21:36 | and then the clean speech on uh on the top |
|---|
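The "Wiener filter in a rich representation" point can be sketched abstractly: in any transform domain where speech and noise separate well, the per-coefficient Wiener gain is snr / (snr + 1), near one where speech dominates and near zero elsewhere. The 2-D grid below is a stand-in for a cortical or modulation representation, not an implementation of the work described:

```python
import numpy as np

# toy "rich" representation: an 8x8 grid of coefficients where speech
# energy is concentrated in a few cells and noise is spread everywhere
speech_power = np.zeros((8, 8))
speech_power[2:4, 1:5] = 4.0                      # speech occupies few cells
noise_power = 0.5 * np.ones((8, 8))               # diffuse noise floor

snr = speech_power / noise_power
gain = snr / (snr + 1.0)                          # per-coefficient Wiener gain

noisy = speech_power + noise_power                # power of the mixture
enhanced = gain * noisy                           # suppress noise-only cells
```

The richer and sparser the representation, the fewer cells speech and noise share, and the closer this gain gets to an ideal pass/stop mask.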
| 0:21:40 | um |
|---|
| 0:21:40 | the deep belief networks are interesting um i think um for similar reasons this all ties together |
|---|
| 0:21:46 | um |
|---|
| 0:21:46 | what's shown on the left hand side is um |
|---|
| 0:21:49 | um |
|---|
| 0:21:50 | a little bit of a waveform that's been applied to a |
|---|
| 0:21:54 | restricted boltzmann machine |
|---|
| 0:21:56 | which is just a way of saying that they learn a weight matrix |
|---|
| 0:21:59 | that transforms the input |
|---|
| 0:22:01 | on the bottom here |
|---|
| 0:22:03 | to an output |
|---|
| 0:22:04 | uh so on top there |
|---|
| 0:22:05 | through um a weight matrix |
|---|
| 0:22:08 | and there's a little bit of a nonlinearity there |
|---|
| 0:22:11 | and you can learn these things in a way that um |
|---|
| 0:22:14 | um |
|---|
| 0:22:16 | can reconstruct the input so they find |
|---|
| 0:22:18 | the basis vectors um the weight vectors |
|---|
| 0:22:23 | so that given these hidden units they can reconstruct the visible units |
|---|
| 0:22:28 | um |
|---|
| 0:22:28 | and they've been doing this in the image processing domain for a long time |
|---|
| 0:22:32 | and these are some results |
|---|
| 0:22:33 | in the waveform domain that are new this year |
|---|
| 0:22:36 | and there's a bunch of um things that often look like um |
|---|
| 0:22:40 | uh gabors of various sizes |
|---|
| 0:22:42 | but the interesting thing is you start to see some very complex features so this is in the |
|---|
| 0:22:46 | spectral domain |
|---|
| 0:22:47 | and you've got these things that have two frequency peaks |
|---|
| 0:22:49 | which you know might be akin to formants |
|---|
| 0:22:52 | um |
|---|
| 0:22:53 | and so they're applying that to speech recognition and i think that's an interesting direction |
|---|
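The description above, a learned weight matrix plus a nonlinearity that can reconstruct its input, is essentially a restricted Boltzmann machine. A bare-bones Bernoulli RBM trained with one-step contrastive divergence (CD-1) on toy binary patterns looks like this; real waveform models use Gaussian visible units and far more care:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid = 16, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))    # the learned weight matrix
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

# toy dataset: two complementary binary patterns
data = np.array([[1, 1, 1, 1, 0, 0, 0, 0] * 2,
                 [0, 0, 0, 0, 1, 1, 1, 1] * 2], dtype=float)

lr = 0.1
for _ in range(2000):                             # CD-1 training loop
    v0 = data
    ph0 = sigmoid(v0 @ W + b_hid)                 # hidden probabilities (up pass)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_vis)               # reconstruction (down pass)
    ph1 = sigmoid(pv1 @ W + b_hid)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)

# after training, an up-down pass should reproduce the input
recon = sigmoid(sigmoid(data @ W + b_hid) @ W.T + b_vis)
recon_err = np.abs(recon - data).mean()
```

The rows of `W` play the role of the learned basis vectors (the Gabor-like and formant-like features in the talk's figures).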
| 0:22:58 | i'm gonna go out on a limb here because um |
|---|
| 0:23:00 | i think the reason that um |
|---|
| 0:23:02 | sparsity is important |
|---|
| 0:23:04 | is because it gives us a way of representing things that we can't |
|---|
| 0:23:08 | do as well in other domains |
|---|
| 0:23:10 | so we grew up with the fourier transform domain and what's on the left hand side are |
|---|
| 0:23:14 | two basis functions |
|---|
| 0:23:15 | it's a basis with just two frequencies |
|---|
| 0:23:18 | and with those two basis functions you can represent the entire subspace |
|---|
| 0:23:22 | so the point that's shown there can be anywhere in that subspace and you can do all those things |
|---|
| 0:23:26 | and it's a very rich representation as we all know |
|---|
| 0:23:29 | you know as long as you satisfy the nyquist criterion you can do anything |
|---|
| 0:23:33 | but |
|---|
| 0:23:34 | i think that's the problem with |
|---|
| 0:23:35 | with |
|---|
| 0:23:36 | a dense representation like that |
|---|
| 0:23:37 | an alternative is you look at something like an overcomplete basis |
|---|
| 0:23:41 | and just pick out elements that you've seen before |
|---|
| 0:23:44 | so here are some synthetic formants |
|---|
| 0:23:47 | but the way i like to think about these things working is that |
|---|
| 0:23:50 | if you train um if you build a system that exploits um sparseness |
|---|
| 0:23:55 | whether it be a deep belief network whether it be matching pursuit |
|---|
| 0:23:58 | um whatever your favourite implementation technology is |
|---|
| 0:24:01 | you can learn patterns that look like these formants and so what's on the left is one vowel |
|---|
| 0:24:06 | with different vocal tract lengths |
|---|
| 0:24:08 | and uh on the right hand side is a different vowel with different vocal tract lengths |
|---|
| 0:24:13 | and |
|---|
| 0:24:15 | the system on the right with a sparse overcomplete representation is just gonna learn these kinds of things |
|---|
| 0:24:20 | it's gonna learn vowels with different vocal tract lengths |
|---|
| 0:24:22 | it's not gonna learn the entire space |
|---|
| 0:24:24 | and so if you wanna process things |
|---|
| 0:24:26 | if you're working in this space |
|---|
| 0:24:28 | then only things that are valid sounds that you've seen before |
|---|
| 0:24:31 | will be represented by the sparse basis |
|---|
| 0:24:33 | uh basis vectors |
|---|
| 0:24:34 | and it can do |
|---|
| 0:24:35 | useful things and so i think that's why it's gonna be an important trend and an important direction for |
|---|
| 0:24:39 | our community |
|---|
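The sparse-coding idea discussed above can be made concrete with a small sketch. Below is a minimal matching pursuit over a random overcomplete dictionary; the dictionary, sizes, and signal here are invented for illustration and are not from the talk.

```python
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    """Greedily approximate x as a sparse combination of the
    unit-norm columns (atoms) of the overcomplete dictionary D."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual              # correlate every atom with the residual
        k = int(np.argmax(np.abs(corr)))   # pick the best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove that atom's contribution
    return coeffs, residual

# overcomplete dictionary: 32-dim signals, 128 random unit-norm atoms
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 128))
D /= np.linalg.norm(D, axis=0)

# a signal that truly is sparse in D: only 3 active atoms
true_coeffs = np.zeros(128)
true_coeffs[[5, 40, 100]] = [1.0, -0.5, 0.8]
x = D @ true_coeffs

coeffs, residual = matching_pursuit(x, D)
```

Because the signal lives on a tiny subset of atoms, a few greedy steps shrink the residual dramatically, which is the "only valid sounds get represented" property described in the discussion.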
| 0:24:44 | so one of the things we wanted to do is to reach out to different sectors of the topic |
|---|
| 0:24:48 | area and put in some hopefully interesting quotations from |
|---|
| 0:24:53 | experts in those fields |
|---|
| 0:24:55 | and here's one that comes |
|---|
| 0:24:58 | from NTT, so here we have a telecommunications company |
|---|
| 0:25:01 | thank you for the contribution here |
|---|
| 0:25:04 | for this quote: remaining challenges in source separation |
|---|
| 0:25:08 | could include blind source separation for an unknown or dynamic |
|---|
| 0:25:12 | number of sources |
|---|
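For contrast with the quoted challenge (an unknown or dynamic number of sources), the classical fixed-count case can be sketched in a few lines: two-source instantaneous blind source separation via whitening plus a FastICA-style fixed point. The mixing matrix, sizes, and nonlinearity below are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
S = rng.uniform(-1.0, 1.0, size=(2, n))    # two independent sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # "unknown" mixing matrix
X = A @ S                                  # two observed mixtures

# step 1: whiten the mixtures
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# step 2: FastICA fixed point (kurtosis nonlinearity), one unit at a time
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        u = w @ Z
        w_new = (Z * u ** 3).mean(axis=1) - 3.0 * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflate against rows already found
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < 1e-10
        w = w_new
        if done:
            break
    W[i] = w

Y = W @ Z   # recovered sources, up to permutation and sign
```

The quoted challenge is exactly what this sketch cannot handle: the code hard-wires two sources, while real scenes have sources appearing and disappearing.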
| 0:25:14 | and this was illustrated with a photograph on the wall of the lab |
|---|
| 0:25:22 | so if we think about mixed-signal ICs |
|---|
| 0:25:27 | the guys that are working on those |
|---|
| 0:25:31 | functionalities |
|---|
| 0:25:32 | really support what we want to do |
|---|
| 0:25:34 | so i think it's important to listen to the hardware guys as well |
|---|
| 0:25:39 | so from Wolfson Microelectronics |
|---|
| 0:25:41 | Moore's law is driving dsp speed and memory capacity, enabling implementation of sophisticated dsp functions |
|---|
| 0:25:49 | resulting from years of research |
|---|
| 0:25:51 | the end user experience |
|---|
| 0:25:53 | maybe this is a wish rather than the reality of the moment |
|---|
| 0:25:56 | the end user experience is one of natural wideband voice communications devoid |
|---|
| 0:26:01 | of acoustic background noise and unwanted artifacts |
|---|
| 0:26:04 | seems to me like the hardware manufacturers are on our side |
|---|
| 0:26:09 | we heard a little bit this morning about the Xbox Kinect |
|---|
| 0:26:15 | thanks |
|---|
| 0:26:15 | for this contribution here: the applications of sound capture and enhancement and processing technologies shift |
|---|
| 0:26:23 | or rather, it's a paradigm shift |
|---|
| 0:26:24 | gradually from communications |
|---|
| 0:26:29 | which is where they originated |
|---|
| 0:26:31 | mostly towards recognition and building natural human-machine interfaces |
|---|
| 0:26:38 | and he highlights mobile devices |
|---|
| 0:26:41 | cars and living rooms |
|---|
| 0:26:42 | as key application areas |
|---|
| 0:26:45 | Malcolm, you get the last word |
|---|
| 0:26:46 | well, i don't know about the last word, but we have one more slide and we can decide whether |
|---|
| 0:26:50 | the last word comes from |
|---|
| 0:26:51 | Steve Jobs or from Lady Gaga |
|---|
| 0:26:54 | but in either case the message is the same: there are large commercial applications for the work that we're doing |
|---|
| 0:26:59 | it started with MP3, which enabled this market |
|---|
| 0:27:03 | but there's still a lot of things to be done in terms of finding music |
|---|
| 0:27:07 | adding to things, understanding |
|---|
| 0:27:09 | what people's needs are, so we really haven't talked about that very much |
|---|
| 0:27:12 | but |
|---|
| 0:27:14 | this is an information retrieval task, but not only an information retrieval task: people are looking for things that entertain |
|---|
| 0:27:18 | them, whether it be songs or music or whatever |
|---|
| 0:27:22 | these are audio signals, and working with them is an important thing to do |
|---|
| 0:27:25 | and so |
|---|
| 0:27:26 | i think both Lady Gaga and Steve Jobs can have a final word |
|---|
| 0:27:30 | so thank you |
|---|
| 0:27:39 | so |
|---|
| 0:27:40 | thank you |
|---|
| 0:27:41 | Malcolm and |
|---|
| 0:27:42 | Patrick |
|---|
| 0:27:43 | now we have very little time for discussion, but we certainly should not miss this opportunity |
|---|
| 0:27:49 | to hear other voices as well as the ones we mentioned |
|---|
| 0:27:52 | obviously these views are not completely balanced |
|---|
| 0:27:56 | how could they be |
|---|
| 0:27:58 | so maybe somebody in the forum would like to add |
|---|
| 0:28:01 | something, and we can |
|---|
| 0:28:03 | have a little more discussion |
|---|
| 0:28:06 | anybody |
|---|
| 0:28:08 | yeah |
|---|
| 0:28:13 | thank you for that great summary |
|---|
| 0:28:15 | i just want to add one more thing |
|---|
| 0:28:18 | we have two eyes and two ears, and they work together |
|---|
| 0:28:21 | and i think cross-modal issues are |
|---|
| 0:28:24 | likely to be very important |
|---|
| 0:28:27 | the eyes direct the ears and the ears direct the eyes and so on |
|---|
| 0:28:30 | likewise i think audio research and vision research should not |
|---|
| 0:28:35 | proceed separately |
|---|
| 0:28:37 | thanks |
|---|
| 0:28:38 | thank you very much for this comment |
|---|
| 0:28:41 | this is certainly something which we highly appreciate, and we always like to be in touch with the |
|---|
| 0:28:46 | multimedia guys who see audio as one of the media |
|---|
| 0:28:51 | but |
|---|
| 0:28:53 | certainly there are many applications where we are actually closely working |
|---|
| 0:28:58 | with vision people; just think about |
|---|
| 0:29:01 | acoustic source tracking |
|---|
| 0:29:03 | so if you want to track some acoustic sources |
|---|
| 0:29:06 | and the source falls silent, then you'd better use your camera |
|---|
| 0:29:11 | so there are |
|---|
| 0:29:12 | quite a few applications where it is quite natural to join forces |
|---|
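The audio-plus-camera tracking idea just mentioned is a sensor-fusion problem. As a toy illustration, here is a 1-D random-walk Kalman filter fusing two sensors, where the "audio" measurement drops out when the source goes silent; all noise figures and the scenario are invented for this sketch.

```python
import numpy as np

def fuse_track(audio_meas, video_meas, q=0.0025, r_audio=0.0025, r_video=0.04):
    """Toy 1-D random-walk Kalman filter fusing two sensors.
    A measurement of None means that sensor is unavailable
    (e.g. the microphone array while the source is silent)."""
    x, p = 0.0, 1.0                  # state estimate and its variance
    track = []
    for za, zv in zip(audio_meas, video_meas):
        p += q                       # predict step: random-walk process noise
        for z, r in ((za, r_audio), (zv, r_video)):
            if z is None:
                continue             # skip an unavailable sensor
            k = p / (p + r)          # Kalman gain
            x += k * (z - x)
            p *= 1.0 - k
        track.append(x)
    return np.array(track)

rng = np.random.default_rng(2)
n = 200
truth = np.cumsum(rng.normal(0.0, 0.05, n))        # slowly moving source
audio = [truth[i] + rng.normal(0.0, 0.05) if i < 100 else None
         for i in range(n)]                        # source goes silent halfway
video = [truth[i] + rng.normal(0.0, 0.2) for i in range(n)]

est = fuse_track(audio, video)
```

The fused track stays usable after the audio drops out because the (noisier) video channel keeps feeding the update step, which is exactly the "use your camera when the source is silent" point from the discussion.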
| 0:29:19 | i just want |
|---|
| 0:29:21 | to reinforce that; there was a nice paper, i don't remember who did it |
|---|
| 0:29:24 | where they were looking for joint sources |
|---|
| 0:29:26 | joint audiovisual sources, and i think that's |
|---|
| 0:29:29 | important and |
|---|
| 0:29:30 | it can be easier, i mean |
|---|
| 0:29:31 | the signals are no longer a big deal |
|---|
| 0:29:34 | it's easy to get the storage, computing power is pretty easy |
|---|
| 0:29:37 | it would be fun |
|---|
| 0:29:42 | following up on |
|---|
| 0:29:44 | the talks we've heard |
|---|
| 0:29:47 | is there any research |
|---|
| 0:29:49 | well, you spoke about binaural signal processing |
|---|
| 0:29:53 | binaural processing for musical signals |
|---|
| 0:29:59 | i don't know; so the question was whether there is any binaural music research |
|---|
| 0:30:03 | i don't know of any; i mean, people certainly worry about synthesizing |
|---|
| 0:30:08 | high-fidelity sound fields |
|---|
| 0:30:14 | there is, for example, a group working on synthesizing |
|---|
| 0:30:17 | sound fields that sound good no matter where you are |
|---|
| 0:30:20 | and so, you know, there is work with people at Stanford |
|---|
| 0:30:22 | who are very interested in computing, in creating 3D sound fields |
|---|
| 0:30:26 | for musical experiences |
|---|
| 0:30:29 | but i'm not sure exactly where it will go |
|---|
| 0:30:33 | i mean, if you'd asked me ten years ago whether we would have 5.1 speakers in |
|---|
| 0:30:36 | the living room |
|---|
| 0:30:37 | i would have said no |
|---|
| 0:30:38 | but |
|---|
| 0:30:39 | look what's happened |
|---|
| 0:30:40 | so we'd better be careful |
|---|
| 0:30:46 | anything else before lunch |
|---|
| 0:30:52 | okay, you talked about 5.1 speakers in the living room, but |
|---|
| 0:30:56 | we're seeing a lot of new algorithms that let us do microphone array processing |
|---|
| 0:31:01 | will we be seeing devices that let us do it |
|---|
| 0:31:03 | i mean, the Microsoft Kinect has a few microphones, and i've seen a few |
|---|
| 0:31:08 | cell phones that have multiple microphones for noise cancellation; will we have more devices that allow us to |
|---|
| 0:31:14 | run better processing algorithms |
|---|
| 0:31:16 | yeah, so the question was whether we will have devices that will have |
|---|
| 0:31:19 | the ability to allow us to implement these things |
|---|
| 0:31:24 | with APIs |
|---|
| 0:31:25 | and so on and so forth |
|---|
| 0:31:26 | i understand from this morning's talks that software |
|---|
| 0:31:30 | development kits will be available for the Kinect |
|---|
| 0:31:32 | and that could be a lot of fun |
|---|
| 0:31:34 | i think the hardware is there to enable us to do it, and |
|---|
| 0:31:38 | the key point of this, i think, is one of the trends that |
|---|
| 0:31:43 | we have seen, which is a move |
|---|
| 0:31:46 | in audio from single-channel to multichannel |
|---|
| 0:31:48 | that's been happening for a while, and there is no sign of it stopping |
|---|
| 0:31:52 | and so we would expect the facilities |
|---|
| 0:31:54 | the processing power |
|---|
| 0:31:56 | the interoperability and software development kits to come with that as well |
|---|
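The single-channel-to-multichannel trend described in this answer is usually exploited with array processing such as delay-and-sum beamforming. Here is a minimal frequency-domain sketch; the array geometry, sample rate, and test tone are invented for illustration.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Delay-and-sum beamformer: time-align each channel for a far-field
    source in `direction` (unit vector) and average.  Fractional-sample
    delays are applied exactly in the frequency domain."""
    n_mics, n_samples = signals.shape
    delays = mic_positions @ direction / c          # per-mic delay in seconds
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        spec = np.fft.rfft(signals[m])
        spec *= np.exp(2j * np.pi * freqs * delays[m])   # undo the propagation delay
        out += np.fft.irfft(spec, n=n_samples)
    return out / n_mics

# simulate a 440 Hz tone hitting a 4-mic linear array from endfire
fs, f0, n = 16000, 440.0, 2048
t = np.arange(n) / fs
s = np.sin(2 * np.pi * f0 * t)
mics = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
u = np.array([1.0, 0.0, 0.0])
taus = mics @ u / 343.0
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
X = np.stack([np.fft.irfft(np.fft.rfft(s) * np.exp(-2j * np.pi * freqs * tau), n=n)
              for tau in taus])

y = delay_and_sum(X, mics, u, fs)
```

Steering toward the true direction recovers the tone coherently, while sources from other directions add incoherently and are attenuated, which is the point of putting multiple microphones in a phone or a Kinect.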
| 0:32:05 | any other questions |
|---|
| 0:32:07 | comments |
|---|
| 0:32:09 | i have one |
|---|
| 0:32:10 | final remark which came to mind |
|---|
| 0:32:13 | increasingly |
|---|
| 0:32:15 | and i would like to put it as a challenge, because |
|---|
| 0:32:18 | there are sensor networks out there and they are |
|---|
| 0:32:21 | under discussion |
|---|
| 0:32:24 | in many papers, where nice |
|---|
| 0:32:28 | algorithms are provided, always based on the assumption that all the sensors are synchronised |
|---|
| 0:32:36 | this is a |
|---|
| 0:32:37 | tough problem actually, so |
|---|
| 0:32:39 | we feel in the audio community it would help us |
|---|
| 0:32:43 | a lot if somebody could really build devices which make sure that all the audio front ends in a |
|---|
| 0:32:49 | distributed network work |
|---|
| 0:32:51 | synchronously, or can be synchronised |
|---|
| 0:32:53 | the underlying problem is simply that |
|---|
| 0:32:57 | once you |
|---|
| 0:32:58 | correlate signals from different sensors that |
|---|
| 0:33:01 | have |
|---|
| 0:33:03 | not exactly synchronous clocks |
|---|
| 0:33:06 | then this |
|---|
| 0:33:08 | correlation |
|---|
| 0:33:09 | will fall apart |
|---|
| 0:33:11 | and |
|---|
| 0:33:11 | just look at all our optimization and all the adaptive filtering stuff that we have |
|---|
| 0:33:16 | it's always based on correlation, and |
|---|
| 0:33:18 | even higher-order statistics |
|---|
| 0:33:20 | but then |
|---|
| 0:33:22 | this problem has to be solved |
|---|
| 0:33:24 | and so if you want to do something really |
|---|
| 0:33:27 | good for us, then please solve this problem |
|---|
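The synchronisation problem raised here is easy to demonstrate numerically: even a tiny clock skew between two sensors destroys their correlation once the offset accumulates past a sample. The 50 ppm skew, sample rate, and white-noise source below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
n = 4 * fs                       # 4 seconds of signal
t = np.arange(n) / fs
src = rng.standard_normal(n)     # a broadband (white) source

# sensor A samples on a perfect clock; sensor B's clock runs 50 ppm fast,
# so B's sample instants slowly drift relative to A's
ppm = 50e-6
t_b = np.arange(n) / (fs * (1.0 + ppm))
b = np.interp(t_b, t, src)       # what sensor B actually records

def norm_corr(x, y):
    """Normalized correlation between two aligned windows."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

w = int(0.1 * fs)                    # 100 ms analysis windows
early = norm_corr(src[:w], b[:w])    # near t=0: almost no accumulated drift
late = norm_corr(src[-w:], b[-w:])   # after 4 s: about 3.2 samples of drift
```

At the start the two recordings are nearly identical, but after four seconds the accumulated offset of roughly 3.2 samples wipes out the sample-level correlation that adaptive filtering and correlation-based algorithms rely on, which is exactly the challenge posed above.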
| 0:33:32 | perhaps after lunch |
|---|
| 0:33:34 | after lunch, okay |
|---|
| 0:33:36 | thank you very much for attending |
|---|