| 0:00:13 | oh | 
|---|
| 0:00:13 | welcome | 
|---|
| 0:00:15 | ladies and gentlemen to this | 
|---|
| 0:00:17 | expert session on trends in audio and acoustic signal processing | 
|---|
| 0:00:23 | and is | 
|---|
| 0:00:24 | the that so many of you came | 
|---|
| 0:00:27 | and thank you already in advance for postponing your lunch break a bit | 
|---|
| 0:00:31 | um i hope to mount will make it interesting | 
|---|
| 0:00:34 | i was just realizing the fact that we could also use this opportunity to do | 
|---|
| 0:00:39 | some advertisement for our T C which is the T C on | 
|---|
| 0:00:42 | audio and acoustic signal processing | 
|---|
| 0:00:44 | as i'm not really prepared for this please take the whole thing as advertisement | 
|---|
| 0:00:50 | for our T C and whoever wants to get involved | 
|---|
| 0:00:53 | please contact us | 
|---|
| 0:00:55 | and | 
|---|
| 0:00:55 | there are various ways of getting involved in our activities | 
|---|
| 0:00:59 | and of course we first would like to | 
|---|
| 0:01:01 | tell you about what this is | 
|---|
| 0:01:03 | so um | 
|---|
| 0:01:04 | in my role as past chair of this T C | 
|---|
| 0:01:09 | i would like to present to you two experts who are also from our T C who represent the | 
|---|
| 0:01:14 | acoustic signal processing community and the audio community | 
|---|
| 0:01:18 | uh in a very specific and i think a very | 
|---|
| 0:01:21 | uh | 
|---|
| 0:01:21 | well known way i would first like to | 
|---|
| 0:01:24 | uh point to patrick naylor please step forward | 
|---|
| 0:01:28 | so that you can be seen | 
|---|
| 0:01:30 | patrick naylor is | 
|---|
| 0:01:32 | the | 
|---|
| 0:01:33 | from imperial college london | 
|---|
| 0:01:36 | and i think uh the most important thing about him right now is | 
|---|
| 0:01:40 | that he just recently co-edited the first book on speech dereverberation | 
|---|
| 0:01:45 | a for everything and you might look at is that sides which has also very nice pictures like can | 
|---|
| 0:01:52 | and uh on the other hand i have malcolm | 
|---|
| 0:01:55 | who is well known in | 
|---|
| 0:01:57 | the audio and | 
|---|
| 0:01:59 | music community especially and | 
|---|
| 0:02:02 | his scope is of course actually much beyond that | 
|---|
| 0:02:04 | he's from yahoo research | 
|---|
| 0:02:07 | uh i should not forget to mention that actually both have ties | 
|---|
| 0:02:11 | to both worlds so | 
|---|
| 0:02:13 | malcolm is | 
|---|
| 0:02:14 | also teaching at stanford and patrick also has | 
|---|
| 0:02:18 | and that's true nations | 
|---|
| 0:02:20 | so without further ado i would say uh i should stop | 
|---|
| 0:02:28 | well thanks very much for coming along to this uh session hope it's gonna be interesting to you | 
|---|
| 0:02:33 | um | 
|---|
| 0:02:34 | we um | 
|---|
| 0:02:36 | tried to think about what you might expect from this kind of session | 
|---|
| 0:02:40 | and i have to say that | 
|---|
| 0:02:42 | the idea of trends is a very personal thing | 
|---|
| 0:02:45 | so uh we're going to present | 
|---|
| 0:02:47 | uh what we personally think are uh hopefully interesting things | 
|---|
| 0:02:51 | but uh obviously in the time constraints we | 
|---|
| 0:02:54 | we can't cover everything so some of these things are like uh | 
|---|
| 0:02:58 | easy to define like counting papers as a measure of activity | 
|---|
| 0:03:02 | or counting achievements maybe in terms of accepted papers rather than submitted papers | 
|---|
| 0:03:07 | some of them are much less uh | 
|---|
| 0:03:09 | uh uh uh how do you own be list | 
|---|
| 0:03:12 | and uh they're more uh soft concepts but we tried to go around this a little | 
|---|
| 0:03:17 | bit | 
|---|
| 0:03:18 | and see what we can find | 
|---|
| 0:03:21 | so the first thing we did was to look at the distribution of submissions to | 
|---|
| 0:03:25 | uh the transactions on uh audio speech and language processing | 
|---|
| 0:03:29 | and uh | 
|---|
| 0:03:30 | i've plotted this out there's a lot of detail on this pie chart here | 
|---|
| 0:03:34 | but the thing to note from this | 
|---|
| 0:03:36 | is that there are some big | 
|---|
| 0:03:38 | uh subjects which are very active within our community in terms of the amount of effort | 
|---|
| 0:03:44 | going into them | 
|---|
| 0:03:45 | so speech enhancement is a big one and has been for a long time | 
|---|
| 0:03:50 | source separation continues to be very active | 
|---|
| 0:03:53 | uh we've had ica sessions here | 
|---|
| 0:03:55 | uh at icassp uh | 
|---|
| 0:03:58 | microphone array signal processing | 
|---|
| 0:04:00 | still very big and uh showing up something like thirteen percent of submissions | 
|---|
| 0:04:05 | content based music processing let's just call it music processing | 
|---|
| 0:04:09 | music is huge for us now music is huge for us and continues to grow | 
|---|
| 0:04:15 | as race if not | 
|---|
| 0:04:17 | and um | 
|---|
| 0:04:18 | uh this is a uh real evolution that we're seeing maybe even a revolution | 
|---|
| 0:04:23 | in our uh profile of activities | 
|---|
| 0:04:26 | uh also we could look at audio analysis as a | 
|---|
| 0:04:29 | as a big topic | 
|---|
| 0:04:30 | the ones that i've highlighted there are the ones that we're going to try to focus on in this session | 
|---|
| 0:04:34 | as i mentioned we can't possibly focus on | 
|---|
| 0:04:37 | everything | 
|---|
| 0:04:39 | so that leads us to music | 
|---|
| 0:04:41 | so music has um become very big here as patrick mentioned and this year at icassp | 
|---|
| 0:04:46 | there | 
|---|
| 0:04:47 | are three sessions as you can um see listed there | 
|---|
| 0:04:49 | there's a number of reasons i thought were worth highlighting just because it's interesting to see how the community | 
|---|
| 0:04:53 | has developed | 
|---|
| 0:04:54 | um so one of the reasons is that the edics which is how people describe the papers they're | 
|---|
| 0:04:59 | submitting | 
|---|
| 0:05:00 | to the conference | 
|---|
| 0:05:01 | um was changed to include music as an option so | 
|---|
| 0:05:05 | it's a rather bureaucratic | 
|---|
| 0:05:06 | um reason | 
|---|
| 0:05:08 | but it probably has a lot to do with the fact that there are so many music papers now at | 
|---|
| 0:05:12 | icassp | 
|---|
| 0:05:13 | and i think that's a good idea | 
|---|
| 0:05:15 | um a second reason is there's a lot more content to work with um | 
|---|
| 0:05:18 | music is easy to work with as you know we all own large collections | 
|---|
| 0:05:22 | um and the third reason is it's become very commercially relevant in the last few years | 
|---|
| 0:05:27 | um so itunes and pandora are certainly two examples | 
|---|
| 0:05:31 | of companies who are making a large amount of money from | 
|---|
| 0:05:34 | from music um ideas | 
|---|
| 0:05:36 | um | 
|---|
| 0:05:37 | as i mentioned the data is easy um we all have um large um C D collections | 
|---|
| 0:05:43 | and and | 
|---|
| 0:05:44 | one of the the things that | 
|---|
| 0:05:45 | that is difficult about music is it's all copyrighted or all the stuff we wanna work with is copyrighted | 
|---|
| 0:05:50 | and one way that the community deals with this is by um doing something i'll talk about in a | 
|---|
| 0:05:55 | little bit | 
|---|
| 0:05:56 | but another way that the community has um worked with this is to | 
|---|
| 0:06:02 | create what's called the million song database | 
|---|
| 0:06:04 | um and the idea of this is to distribute features of the songs not the actual | 
|---|
| 0:06:10 | copyrighted material | 
|---|
| 0:06:11 | and so um | 
|---|
| 0:06:13 | i actually forget i think it's a few hundred features | 
|---|
| 0:06:16 | per song and they're over time too | 
|---|
| 0:06:18 | um | 
|---|
| 0:06:19 | and columbia and the echo nest uh provide this database | 
|---|
| 0:06:22 | um online | 
|---|
| 0:06:24 | and there's a lot of data there that people can use and it's freely available and it's a very large | 
|---|
| 0:06:29 | database | 
|---|
| 0:06:29 | and i expect we'll see more and more papers | 
|---|
| 0:06:32 | um that use this database | 
|---|
| 0:06:34 | the mirex has been the best um thing for the | 
|---|
| 0:06:39 | scientific component of music analysis music processing | 
|---|
| 0:06:42 | this is the list of tasks | 
|---|
| 0:06:44 | that are being uh worked on for the two thousand eleven competition | 
|---|
| 0:06:48 | um as i mentioned it's a big issue and | 
|---|
| 0:06:51 | what the mirex people do um is | 
|---|
| 0:06:53 | provide an environment at the university of illinois where people can run their algorithms on a large database of | 
|---|
| 0:06:58 | songs | 
|---|
| 0:07:00 | so the songs never leave illinois | 
|---|
| 0:07:02 | so instead of you know getting data and doing your algorithms and sending results back | 
|---|
| 0:07:06 | you send your algorithm to the university of illinois | 
|---|
| 0:07:08 | um in a particular environment a java environment | 
|---|
| 0:07:11 | and they bought it a they could do about get the up to it for you | 
|---|
| 0:07:14 | and then they run the algorithm on their machines and their clusters | 
|---|
| 0:07:17 | and give you back the results | 
|---|
| 0:07:18 | i want to highlight um three uh tasks | 
|---|
| 0:07:21 | that are shown right here | 
|---|
| 0:07:23 | that are um very um important and very uh popular | 
|---|
| 0:07:26 | one is audio tag um classification so how you tag audio with various things | 
|---|
| 0:07:30 | um is it happy is it blues | 
|---|
| 0:07:33 | um anything you think of can be a tag | 
|---|
| 0:07:36 | and people work on that very hard | 
|---|
| 0:07:38 | um multiple fundamental frequency estimation and tracking | 
|---|
| 0:07:40 | um has been popular uh | 
|---|
| 0:07:42 | even before mirex started | 
|---|
| 0:07:45 | but mirex has i think given a common database and really upped the scientific level so now people can | 
|---|
| 0:07:51 | compare things on common ground | 
|---|
| 0:07:53 | and the other one is audio chord estimation | 
|---|
| 0:07:55 | so in a sense a chord is just another tag | 
|---|
| 0:07:58 | but a very specialised tag | 
|---|
| 0:07:59 | and it helps people understand music and people work on it a lot | 
|---|
| 0:08:03 | um something else that's happened and been very heavy this year | 
|---|
| 0:08:06 | is um a lot of work on separation and analysis | 
|---|
| 0:08:09 | and they are all very model driven approaches | 
|---|
| 0:08:13 | so this particular um um graphical model um | 
|---|
| 0:08:17 | is from a paper by um | 
|---|
| 0:08:19 | um | 
|---|
| 0:08:21 | my our open france a right | 
|---|
| 0:08:23 | and it shows um a sequence of notes along the top and so in this case they have a score | 
|---|
| 0:08:27 | and know what's being played and that's hard information to get | 
|---|
| 0:08:30 | and then they're generating um | 
|---|
| 0:08:33 | um data about the uh the harmonics | 
|---|
| 0:08:36 | um from here so you have the amplitude | 
|---|
| 0:08:39 | the frequency and the variance of the gaussian in the spectral domain | 
|---|
| 0:08:43 | oops sorry that are combined | 
|---|
| 0:08:45 | and and then you have similar simple able in so these of the spectral slices | 
|---|
| 0:08:49 | and what you're trying to do what you're trying to | 
|---|
| 0:08:51 | um given the note sequence you have um | 
|---|
| 0:08:53 | i'm sorry | 
|---|
| 0:08:55 | build um or find the these | 
|---|
| 0:08:58 | um emission probabilities | 
|---|
| 0:09:00 | that describe the music | 
|---|
| 0:09:01 | and from that you can do a lot of um very interesting work | 
|---|
| 0:09:05 | um you can do things like um tagging which i mentioned for things like emotion and genre | 
|---|
| 0:09:10 | and uh um something that's kind of dear to my heart but shows the kind of work | 
|---|
| 0:09:15 | that's being done in this area | 
|---|
| 0:09:16 | um is some work on morphing um | 
|---|
| 0:09:19 | and the question that um | 
|---|
| 0:09:21 | um quite a known and what they want to ask was | 
|---|
| 0:09:24 | what's the right way to think about um audio perception | 
|---|
| 0:09:27 | in morphing | 
|---|
| 0:09:29 | and so if you do morphing ideally | 
|---|
| 0:09:31 | the | 
|---|
| 0:09:33 | the path in feature space should be a line | 
|---|
| 0:09:35 | so if you're morphing between one position and another position | 
|---|
| 0:09:38 | that feature moves along a line in the visual domain | 
|---|
| 0:09:40 | and you want the same sort of thing to happen in the auditory domain | 
|---|
| 0:09:44 | so | 
|---|
| 0:09:44 | the | 
|---|
| 0:09:45 | um | 
|---|
| 0:09:46 | the graph that's shown here on the left i'm sorry for the poor quality but just to give you a sense of | 
|---|
| 0:09:50 | it | 
|---|
| 0:09:50 | are a range of line spectral frequency envelopes | 
|---|
| 0:09:56 | and then on the right hand side are | 
|---|
| 0:09:58 | all the perceptual measures that have been used that have been calculated based on these | 
|---|
| 0:10:03 | on these L S Fs | 
|---|
| 0:10:05 | and what they're doing is looking for one that's a straight line which you can see in the | 
|---|
| 0:10:08 | middle there | 
|---|
| 0:10:09 | and um some features work better than others and i think that research is still being | 
|---|
| 0:10:14 | pursued | 
|---|
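The straight-line criterion described for morphing can be made concrete with a small sketch. The measure below is an assumption chosen for illustration and is not taken from the work discussed in the talk: it scores a trajectory of feature vectors by the ratio of its end-to-end distance to its total path length, so a perfectly linear morph scores 1.0 and a bent path scores lower. The function name `path_straightness` and the sample points are hypothetical.

```python
import math

def path_straightness(points):
    """Ratio of end-to-end distance to total path length (1.0 means a straight line)."""
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if path_length == 0:
        return 1.0  # a path that never moves is trivially straight
    return math.dist(points[0], points[-1]) / path_length

# Hypothetical feature trajectories sampled at successive morphing steps.
straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(path_straightness(straight))
print(path_straightness(bent))
```

A perceptual feature whose trajectory scores close to 1.0 across the morph would be a candidate for the kind of straight line the speaker points to on the slide.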
| 0:10:17 | right so uh | 
|---|
| 0:10:18 | uh audio and acoustic signal processing T C | 
|---|
| 0:10:22 | covers quite a wide range of areas um | 
|---|
| 0:10:25 | which are | 
|---|
| 0:10:26 | well | 
|---|
| 0:10:27 | i have to say that to me they're exciting i hope you feel also that same excitement about | 
|---|
| 0:10:32 | the technologies that are being developed | 
|---|
| 0:10:34 | and i think we see trends that a lot of the ideas have been in the laboratory | 
|---|
| 0:10:39 | for many years | 
|---|
| 0:10:41 | and are now starting to come to the point of applications industrial applications | 
|---|
| 0:10:44 | and we heard about some of these in the plenary | 
|---|
| 0:10:47 | and and in that kind of context | 
|---|
| 0:10:50 | if we look at uh the research that we do | 
|---|
| 0:10:53 | um i ask a question of how much of it is driven by | 
|---|
| 0:10:57 | uh the desire for exciting applications | 
|---|
| 0:11:00 | and how much of it is fundamental how much of it | 
|---|
| 0:11:03 | underpins | 
|---|
| 0:11:04 | the | 
|---|
| 0:11:04 | technology with good algorithmic research | 
|---|
| 0:11:08 | um so i ask you know is there a happy marriage here | 
|---|
| 0:11:14 | and uh i hope the uh duke and duchess of cambridge will forgive me for using that photograph | 
|---|
| 0:11:19 | uh but there is a serious point behind this um but before we come to the serious point | 
|---|
| 0:11:28 | um | 
|---|
| 0:11:29 | so uh of course prince william is very very pleased um having uh now found his very fine bride | 
|---|
| 0:11:41 | so he's maximised his expectations | 
|---|
| 0:11:44 | um and uh had a very uh happy day | 
|---|
| 0:11:48 | uh coming back to something a little bit more serious i think um things which look good have to | 
|---|
| 0:11:54 | be underpinned by | 
|---|
| 0:11:56 | excellence | 
|---|
| 0:11:57 | in uh algorithmic and fundamental research | 
|---|
| 0:12:00 | so if there is a trend perhaps | 
|---|
| 0:12:02 | towards things that look great | 
|---|
| 0:12:04 | let's just not lose sight of the fact that the power | 
|---|
| 0:12:08 | behind them uh is | 
|---|
| 0:12:09 | uh the algorithms that we do | 
|---|
| 0:12:12 | okay | 
|---|
| 0:12:13 | so one of the areas of algorithmic research which is very hot and has been for a long time | 
|---|
| 0:12:18 | is in uh array signal processing as applied to | 
|---|
| 0:12:21 | microphones maybe also loudspeaker arrays | 
|---|
| 0:12:25 | and here we see um an evolution of applications hearing aids has been very busy for a long time | 
|---|
| 0:12:31 | and has a | 
|---|
| 0:12:32 | uh many applications as well as excellent underpinning technology | 
|---|
| 0:12:36 | i do see now a big branch out into the living room | 
|---|
| 0:12:40 | and the living room means T V | 
|---|
| 0:12:43 | it means entertainment perhaps it means an X box three sixty with a kinect | 
|---|
| 0:12:47 | a microphone array uh perhaps it means sky T V | 
|---|
| 0:12:51 | and so these are new applications which are really coming on stream now | 
|---|
| 0:12:55 | and uh i think will start to shape | 
|---|
| 0:12:58 | the way that we do research | 
|---|
| 0:13:00 | the tasks haven't changed that much we still want to do localization we still want to do tracking | 
|---|
| 0:13:05 | we still want to extract the desired source from any | 
|---|
| 0:13:08 | uh be that noise or other talkers | 
|---|
| 0:13:11 | um and then perhaps a new task is to try to learn something about the acoustic | 
|---|
| 0:13:16 | environment | 
|---|
| 0:13:18 | from uh by inferring it from the multichannel signals that we can obtain with the microphone array | 
|---|
| 0:13:24 | and this gives us additional prior information on which we can condition estimation | 
|---|
| 0:13:30 | um | 
|---|
| 0:13:31 | another issue is what kind of microphone array should we use and how can we understand how it's | 
|---|
| 0:13:36 | gonna behave | 
|---|
| 0:13:38 | people started off perhaps looking at linear arrays | 
|---|
| 0:13:41 | um | 
|---|
| 0:13:41 | certainly extending it into planar and cylindrical and spherical even distributed arrays that don't really have any geometry | 
|---|
| 0:13:48 | three | 
|---|
| 0:13:50 | and uh the design of such arrays including the spacing | 
|---|
| 0:13:53 | of microphone elements and the orientation uh is uh an important and expanding topic i think | 
|---|
| 0:13:59 | people started off with linear arrays | 
|---|
| 0:14:01 | um | 
|---|
| 0:14:02 | a bunch of microphones in a line | 
|---|
| 0:14:04 | perhaps uh this is a well-known eigenmike from M H acoustics | 
|---|
| 0:14:08 | uh thirty two sensors on the surface of a rigid sphere eight centimetres or so | 
|---|
| 0:14:13 | out of the laboratory prototype products | 
|---|
| 0:14:17 | have come now into real products you can buy | 
|---|
| 0:14:20 | and uh connect to your T V set sky T V | 
|---|
| 0:14:23 | has | 
|---|
| 0:14:24 | uh the opportunity to include microphone arrays | 
|---|
| 0:14:27 | for relatively low cost | 
|---|
| 0:14:28 | uh such that you can communicate uh using your living room equipment | 
|---|
| 0:14:33 | um | 
|---|
| 0:14:34 | for a a very low cost | 
|---|
| 0:14:35 | to | 
|---|
| 0:14:37 | communications and hardware well | 
|---|
| 0:14:39 | and the challenge is here that you're probably sitting four metres away from the microphone | 
|---|
| 0:14:44 | so uh uh uh this is going to be i think a really hot application for us | 
|---|
| 0:14:49 | in the future | 
|---|
| 0:14:52 | interestingly uh people are still doing fundamental research so i'm pleased to see that and here's a paper i | 
|---|
| 0:14:57 | picked out uh | 
|---|
| 0:14:58 | i can't say at random but it caught my eye | 
|---|
| 0:15:01 | um here's a problem given N sources and M microphones | 
|---|
| 0:15:06 | where should you put the microphones | 
|---|
| 0:15:09 | and uh in this work which is some uh work i spotted from uh from the old about group | 
|---|
| 0:15:15 | uh given a planar microphone array | 
|---|
| 0:15:17 | some analysis which enables one to predict | 
|---|
| 0:15:20 | the directivity index obtained for different geometries and therefore obviously then allows optimisation | 
|---|
| 0:15:26 | of those geometries | 
|---|
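To make the idea of predicting a directivity index from a geometry concrete, here is a minimal sketch. It does not reproduce the analysis in the paper mentioned in the talk; it assumes the textbook definition for a delay-and-sum beamformer in a spherically isotropic (diffuse) noise field, where the coherence between two microphones a distance r apart is sin(kr)/(kr). The function name, the speed of sound of 343 m/s and the example geometry are all assumptions for illustration.

```python
import cmath
import math

def directivity_index_db(positions, look, freq_hz, c=343.0):
    """Directivity index (dB) of a delay-and-sum beamformer in diffuse noise.

    positions: list of (x, y, z) microphone coordinates in metres
    look:      unit vector pointing towards the desired source (far field)
    """
    k = 2 * math.pi * freq_hz / c  # wavenumber
    m = len(positions)
    # Far-field steering vector for the look direction.
    d = [cmath.exp(1j * k * sum(p * u for p, u in zip(pos, look))) for pos in positions]
    w = [di / m for di in d]  # delay-and-sum weights, so the look-direction gain is 1
    noise_gain = 0.0
    for i in range(m):
        for j in range(m):
            r = math.dist(positions[i], positions[j])
            coherence = 1.0 if r == 0 else math.sin(k * r) / (k * r)
            noise_gain += (w[i].conjugate() * coherence * w[j]).real
    return 10 * math.log10(1.0 / noise_gain)

# Four microphones on a line, half a wavelength apart at 1 kHz, steered broadside.
line = [(i * 0.1715, 0.0, 0.0) for i in range(4)]
print(directivity_index_db(line, (0.0, 0.0, 1.0), 1000.0))
```

Evaluating this function over candidate microphone positions is one simple way to compare geometries; an optimisation like the one described would search over `positions` for the highest value.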
| 0:15:29 | okay so source separation is uh another hot topic and has been for a while | 
|---|
| 0:15:34 | i thought i should say that obviously trends | 
|---|
| 0:15:37 | start somewhere | 
|---|
| 0:15:38 | the trend | 
|---|
| 0:15:39 | has to begin with the trend setter | 
|---|
| 0:15:42 | and i put this photograph up of uh colin cherry | 
|---|
| 0:15:45 | um simply because i think he used to have the office which is above my office now so | 
|---|
| 0:15:50 | i also feel some kind of uh proximity effect | 
|---|
| 0:15:53 | um | 
|---|
| 0:15:54 | and uh his definition of the cocktail party in his nineteen fifties book on human communication is often | 
|---|
| 0:16:01 | quoted in people's papers | 
|---|
| 0:16:03 | um and the early experiments were asking the question as to the behavior of listeners | 
|---|
| 0:16:08 | when they were receiving two almost simultaneous signals | 
|---|
| 0:16:11 | and uh | 
|---|
| 0:16:12 | called that the cocktail party | 
|---|
| 0:16:14 | uh the picture here i put up on purpose because i don't think many people would really have a | 
|---|
| 0:16:19 | good image of what a cocktail party was in nineteen fifty | 
|---|
| 0:16:25 | and so i guess it looks a bit different now | 
|---|
| 0:16:29 | but anyway | 
|---|
| 0:16:30 | uh so | 
|---|
| 0:16:31 | progress in this area has led us to be able to handle cases where we have both determined | 
|---|
| 0:16:36 | and underdetermined scenarios | 
|---|
| 0:16:39 | um clustering has been a very effective technique | 
|---|
| 0:16:42 | uh the permutation | 
|---|
| 0:16:44 | uh problem | 
|---|
| 0:16:46 | has been addressed uh with some great successes as well | 
|---|
| 0:16:49 | and now we're starting to see results in the practical context where we have reverberation as well | 
|---|
| 0:16:56 | the uh usual effect of reverberation is talked about in the context | 
|---|
| 0:17:00 | um of dereverberation algorithms for speech enhancement | 
|---|
| 0:17:04 | and uh this is something that i've uh myself tried to address | 
|---|
| 0:17:08 | and uh perhaps we're now at the stage where there is a push to take some of the | 
|---|
| 0:17:13 | algorithms from the laboratory and start to roll them out into real world applications | 
|---|
| 0:17:19 | that's where we'll then learn whether they work or not | 
|---|
| 0:17:22 | and uh we have to address the cases which are both single and multichannel cases | 
|---|
| 0:17:27 | uh often by using acoustic channel inversion if we can estimate the acoustic channel | 
|---|
| 0:17:33 | and although | 
|---|
| 0:17:35 | this is all | 
|---|
| 0:17:35 | on a slide titled speech enhancement of course reverberation | 
|---|
| 0:17:39 | uh is widely used | 
|---|
| 0:17:41 | both positively and has negative effects also in music so let's not lose sight of that | 
|---|
| 0:17:48 | the other factor which i wanted to touch on here was seen | 
|---|
| 0:17:52 | so | 
|---|
| 0:17:53 | and interdisciplinary research is often a favoured modality | 
|---|
| 0:17:57 | and in our community we can see some benefits coming from | 
|---|
| 0:18:01 | cross fertilisation of different topic areas | 
|---|
| 0:18:04 | for example | 
|---|
| 0:18:06 | uh dereverberation and blind source separation | 
|---|
| 0:18:09 | and we start to see papers where | 
|---|
| 0:18:11 | these are jointly | 
|---|
| 0:18:13 | uh addressed with some uh good leverage from both | 
|---|
| 0:18:17 | uh types of techniques | 
|---|
| 0:18:19 | equally | 
|---|
| 0:18:20 | speech uh dereverberation coupled with speech recognition | 
|---|
| 0:18:25 | where | 
|---|
| 0:18:26 | a classical speech recognizer is enhanced | 
|---|
| 0:18:29 | uh such that it has knowledge of the models of clean speech but also | 
|---|
| 0:18:33 | has models for the reverberation | 
|---|
| 0:18:36 | and by combining these | 
|---|
| 0:18:37 | is able to make big improvements in word accuracy | 
|---|
| 0:18:45 | so i want to talk a bit about a recurring theme that i've been seeing over the last two years | 
|---|
| 0:18:49 | um | 
|---|
| 0:18:50 | both in this community and elsewhere but i thought i'd mention it here first and | 
|---|
| 0:18:55 | and that's about sparsity | 
|---|
| 0:18:56 | um | 
|---|
| 0:18:57 | and no we're not talking about my hair | 
|---|
| 0:19:00 | um | 
|---|
| 0:19:03 | the | 
|---|
| 0:19:03 | first time i saw this um | 
|---|
| 0:19:05 | was in the matching pursuit work that was presented here in ninety seven i think that was first done in | 
|---|
| 0:19:10 | you know signal processing | 
|---|
| 0:19:12 | transactions in ninety three | 
|---|
| 0:19:14 | and um at the time i thought it was interesting but a dumb idea | 
|---|
| 0:19:18 | um | 
|---|
| 0:19:20 | and so now i'm correcting myself | 
|---|
| 0:19:21 | um but it's shown up in a number of interesting places um in the work that has been done | 
|---|
| 0:19:27 | um at icassp and elsewhere | 
|---|
| 0:19:28 | um compressed sensing a few years ago um was probably the best example | 
|---|
| 0:19:33 | um | 
|---|
| 0:19:34 | but in in this community um | 
|---|
| 0:19:36 | and we seen any can you know to sorry is still low just as deep belief network | 
|---|
| 0:19:41 | um | 
|---|
| 0:19:42 | sparsity has been a big part of the work that's been done on deep belief networks and | 
|---|
| 0:19:46 | in machine learning | 
|---|
| 0:19:47 | i think that's been um interesting | 
|---|
| 0:19:50 | and | 
|---|
| 0:19:51 | um in a lot of papers that we saw this year um | 
|---|
| 0:19:54 | L one regularization is a way of providing solutions that make sense | 
|---|
| 0:20:00 | um | 
|---|
| 0:20:01 | when you have a very um over determined um very complex um basis set | 
|---|
| 0:20:06 | and so i | 
|---|
| 0:20:07 | i titled this slide sparsity uh but it's probably better described as sparsity | 
|---|
| 0:20:12 | in combination with um overcomplete basis sets | 
|---|
| 0:20:16 | and i think that combination's interesting | 
|---|
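The combination of sparsity with an overcomplete basis set can be illustrated with matching pursuit, the technique the speaker mentions earlier. The sketch below is a minimal greedy version written for illustration, not the implementation from any paper cited in the talk; the four-atom dictionary in two dimensions and the test signal are assumptions chosen so the effect is easy to see. An orthonormal basis would need two coefficients for this signal, while the overcomplete dictionary needs only one.

```python
import math

def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy sparse decomposition of `signal` over unit-norm `atoms`."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda idx: abs(scores[idx]))
        if abs(scores[best]) < tol:
            break  # nothing left that any atom can explain
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

s = 1 / math.sqrt(2)
# Overcomplete dictionary: the two standard axes plus the two diagonals.
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s], [s, -s]]
signal = [3 * s, 3 * s]  # lies exactly along the first diagonal atom
coeffs, residual = matching_pursuit(signal, atoms)
print(coeffs)
```

L one regularized solvers reach a similar sparse answer by optimisation rather than by greedy selection; matching pursuit is used here only because it fits in a few lines.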
| 0:20:18 | one example of that um was talked about a little bit ago | 
|---|
| 0:20:21 | in the session before this | 
|---|
| 0:20:22 | um in the work by a um | 
|---|
| 0:20:24 | i i new in cr | 
|---|
| 0:20:26 | um using a cortical representation to um | 
|---|
| 0:20:30 | um | 
|---|
| 0:20:31 | to model sound | 
|---|
| 0:20:32 | and | 
|---|
| 0:20:33 | and cortex is probably the original um | 
|---|
| 0:20:36 | a sparse representation | 
|---|
| 0:20:37 | um | 
|---|
| 0:20:38 | it predates all of us | 
|---|
| 0:20:40 | and and the idea is that you wanna represent sound with the least amount of of biological energy | 
|---|
| 0:20:46 | and what seems to work well there is to use spikes that | 
|---|
| 0:20:49 | represent very um | 
|---|
| 0:20:52 | distinct sound atoms and how they're put together is still a matter of discussion | 
|---|
| 0:20:56 | but uh | 
|---|
| 0:20:57 | i think it's gonna be interesting | 
|---|
| 0:20:59 | and the way a uh a new but and ch has been using that is to | 
|---|
| 0:21:03 | take noisy speech and put it into this kind of um this very overcomplete basis set | 
|---|
| 0:21:09 | and then | 
|---|
| 0:21:10 | um | 
|---|
| 0:21:12 | filter it | 
|---|
| 0:21:13 | uh in the regions | 
|---|
| 0:21:15 | that that are | 
|---|
| 0:21:17 | likely to contain speech | 
|---|
| 0:21:19 | and so | 
|---|
| 0:21:20 | in a sense | 
|---|
| 0:21:21 | um it's a it's a wiener filter but it's in a very rich environment | 
|---|
| 0:21:25 | where it's very easy to separate um speech from noise and things like that | 
|---|
| 0:21:28 | and what's on the bottom is noisy speech the kind of filter that makes sense for speech | 
|---|
| 0:21:32 | which for example has a lot of energy around a four hertz modulation rate | 
|---|
| 0:21:36 | and then the clean speech on uh on the top | 
|---|
| 0:21:40 | um | 
|---|
| 0:21:40 | the deep belief networks are interesting um i think um for similar reasons this all ties together | 
|---|
| 0:21:46 | um | 
|---|
| 0:21:46 | what's shown on the left hand side is um | 
|---|
| 0:21:49 | um | 
|---|
| 0:21:50 | is a little bit of a waveform that's been applied to a | 
|---|
| 0:21:54 | restricted boltzmann machine | 
|---|
| 0:21:56 | which is just a way of saying that they have a uh learned weight matrix | 
|---|
| 0:21:59 | that transforms the input | 
|---|
| 0:22:01 | on the bottom here | 
|---|
| 0:22:03 | to an output | 
|---|
| 0:22:04 | uh so on top there | 
|---|
| 0:22:05 | through um a weight matrix | 
|---|
| 0:22:08 | and there's a little bit of a nonlinearity there | 
|---|
| 0:22:11 | and you can learn these things in a way that um | 
|---|
| 0:22:14 | um | 
|---|
| 0:22:16 | can reconstruct the input so to | 
|---|
| 0:22:18 | find basis vectors um on the side what where is that by the way picks vector X | 
|---|
| 0:22:23 | so that given some of these guys they can reconstruct the visible units sorry | 
|---|
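The description above, a learned weight matrix plus a small nonlinearity that maps the input to hidden units and can map back to reconstruct the visible units, can be sketched as follows. This is a generic illustration and not the model from the work shown on the slide: the logistic sigmoid units, the function names and the tiny weight values are assumptions, and no training step is shown. Work on raw waveforms typically uses real-valued visible units rather than the sigmoid reconstruction used here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(visible, weights, hidden_bias):
    """Bottom-up pass: weights[j][i] connects visible unit i to hidden unit j."""
    return [
        sigmoid(b + sum(w * v for w, v in zip(row, visible)))
        for row, b in zip(weights, hidden_bias)
    ]

def reconstruct(hidden, weights, visible_bias):
    """Top-down pass: the same weights, transposed, rebuild the visible units."""
    return [
        sigmoid(visible_bias[i] + sum(weights[j][i] * hidden[j] for j in range(len(hidden))))
        for i in range(len(visible_bias))
    ]

# Hypothetical two-hidden-unit model over a three-sample input frame.
weights = [[1.0, -1.0, 0.5], [0.0, 2.0, -2.0]]
frame = [0.2, 0.7, 0.1]
h = hidden_probs(frame, weights, [0.0, 0.0])
print(reconstruct(h, weights, [0.0, 0.0, 0.0]))
```

Each row of `weights` plays the role of one learned basis vector, which is what the plots of gabor-like filters in the talk are showing.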
| 0:22:28 | um | 
|---|
| 0:22:28 | and they've been doing this in the image processing domain for a long time | 
|---|
| 0:22:32 | and these are some results | 
|---|
| 0:22:33 | in the waveform domain that are new this year | 
|---|
| 0:22:36 | and there's a bunch of um things that often look like um | 
|---|
| 0:22:40 | uh gabors of various sizes | 
|---|
| 0:22:42 | but the one thing that's interesting is you start to see some very complex features so this is in the | 
|---|
| 0:22:46 | fixed a domain | 
|---|
| 0:22:47 | and you've got these things that have two frequency peaks | 
|---|
| 0:22:49 | which you know might be akin to formants | 
|---|
| 0:22:52 | um | 
|---|
| 0:22:53 | and so they're applying that to speech recognition and i think that's an interesting direction | 
|---|
| 0:22:58 | i'm gonna go out on a limb here because um | 
|---|
| 0:23:00 | i think the reason that um | 
|---|
| 0:23:02 | sparsity's important | 
|---|
| 0:23:04 | is because it gives us a way of representing things that we can't do that we can't | 
|---|
| 0:23:08 | do as well in other domains | 
|---|
| 0:23:10 | so we all grew up with the fourier transform domain and what's on the left hand side are | 
|---|
| 0:23:14 | two basis functions | 
|---|
| 0:23:15 | it's a basis of just two frequencies | 
|---|
| 0:23:18 | and with those two basis functions you can represent the entire subspace | 
|---|
| 0:23:22 | so that point that's shown there can be anywhere in that subspace and you can do all those things | 
|---|
| 0:23:26 | and it's a very rich representation as we all know | 
|---|
| 0:23:29 | you know as long as you satisfy the nyquist criteria you can do anything | 
|---|
| 0:23:33 | but | 
|---|
| 0:23:34 | i think that's the problem with | 
|---|
| 0:23:35 | with | 
|---|
| 0:23:36 | a dense representation like that | 
|---|
| 0:23:37 | an alternative is to look at something like an overcomplete basis | 
|---|
| 0:23:41 | and just pick out elements that you've seen before | 
|---|
| 0:23:44 | so these are just some synthetic formants | 
|---|
| 0:23:47 | but the way i like to think about these things working is that | 
|---|
| 0:23:50 | if you train um if you build a system that exploits um sparseness | 
|---|
| 0:23:55 | whether it be a deep belief network whether it be matching pursuit | 
|---|
| 0:23:58 | um whatever your favourite implementation technology is | 
|---|
| 0:24:01 | you can learn patterns that look like these formants and so what's on the left is one vowel | 
|---|
| 0:24:06 | with different vocal tract lengths | 
|---|
| 0:24:08 | and uh on the right hand side is a different vowel with different vocal tract lengths | 
|---|
| 0:24:13 | and | 
|---|
| 0:24:15 | the system on the right with a sparse overcomplete representation is just gonna learn these kinds of things | 
|---|
| 0:24:20 | it's going to learn vowels with different vocal tract lengths | 
|---|
| 0:24:22 | it's not going to learn the entire space | 
|---|
| 0:24:24 | and so if you wanna process things | 
|---|
| 0:24:26 | if you're working in this space | 
|---|
| 0:24:28 | then only things that are valid sounds that you've seen before | 
|---|
| 0:24:31 | will be represented by the sparse basis functions | 
|---|
| 0:24:33 | but a basis that | 
|---|
| 0:24:34 | and it can do | 
|---|
| 0:24:35 | useful things and so i think that's where it's going to be an important trend and an important direction for | 
|---|
| 0:24:39 | our community | 
|---|
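A minimal sketch of the matching pursuit idea mentioned above, offered as an illustration and not as the speaker's system. It assumes a hypothetical overcomplete dictionary of Gaussian "formant-like" bumps (64 atoms for a 32-bin spectrum); the names `bump` and `matching_pursuit` and all parameter values are illustrative choices.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding: repeatedly pick the unit-norm atom most
    correlated with the residual and subtract its projection."""
    residual = list(signal)
    picks = []  # list of (atom index, coefficient)
    for _ in range(n_atoms):
        corrs = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(corrs[i]))
        picks.append((k, corrs[k]))
        residual = [r - corrs[k] * a for r, a in zip(residual, dictionary[k])]
    return picks, residual

N = 32  # number of spectral bins (assumed)

def bump(center, width):
    """Hypothetical formant-like atom: a unit-norm Gaussian bump."""
    return normalize([math.exp(-0.5 * ((i - center) / width) ** 2)
                      for i in range(N)])

# Overcomplete: 64 atoms in a 32-dimensional space (two widths per center).
dictionary = [bump(c, w) for c in range(N) for w in (1.5, 3.0)]

# A "seen before" sound built from two atoms of the dictionary.
signal = [2.0 * a + 1.0 * b for a, b in zip(dictionary[20], dictionary[41])]

picks, residual = matching_pursuit(signal, dictionary, n_atoms=2)
print([k for k, _ in picks], dot(residual, residual))
```

Two coefficients out of 64 are enough here because the signal lies close to the learned atoms; a dense basis such as the Fourier basis would spread the same signal over many coefficients.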
| 0:24:44 | so one of the things we wanted to do is to get out to different sectors of our topic | 
|---|
| 0:24:48 | area and uh put in some uh hopefully interesting quotations from | 
|---|
| 0:24:53 | uh experts in those fields so | 
|---|
| 0:24:55 | here's one that comes from um | 
|---|
| 0:24:58 | from T T so here we have a telecommunications company | 
|---|
| 0:25:01 | uh thank you for uh to here here not at E | 
|---|
| 0:25:04 | for this quote: remaining challenges in source separation | 
|---|
| 0:25:08 | could include blind source separation for an unknown or dynamic | 
|---|
| 0:25:12 | number of sources | 
|---|
| 0:25:14 | it is that i artificially officially in it's cherry jerry chair uh a photograph on the wall of the large | 
|---|
| 0:25:22 | uh into the hardware areas so if we think about mixed-signal ICs | 
|---|
| 0:25:27 | uh the guys that are working on those uh | 
|---|
| 0:25:31 | functionalities | 
|---|
| 0:25:32 | really support what we want to do | 
|---|
| 0:25:34 | uh so i think that it's important to listen to the hardware guys as well | 
|---|
| 0:25:39 | so from uh wolfson microelectronics | 
|---|
| 0:25:41 | uh moore's law is driving dsp speed and memory capacity and enabling implementation of sophisticated dsp functions | 
|---|
| 0:25:49 | resulting from years of research | 
|---|
| 0:25:51 | the end user experience | 
|---|
| 0:25:53 | uh maybe this is a wish rather than the reality at the moment | 
|---|
| 0:25:56 | the end user experience is one of natural wideband voice communications devoid | 
|---|
| 0:26:01 | of acoustic background noise and unwanted artifacts | 
|---|
| 0:26:04 | seems to me like the hardware manufacturers are on our side | 
|---|
| 0:26:09 | um we heard uh a little bit this morning about the uh xbox kinect | 
|---|
| 0:26:13 | uh you found a have | 
|---|
| 0:26:15 | thanks | 
|---|
| 0:26:15 | for this uh contribution here: the applications of sound capture and enhancement and processing technologies shift | 
|---|
| 0:26:23 | oh he's a paradigm shift | 
|---|
| 0:26:24 | shift gradually from communications | 
|---|
| 0:26:28 | which is where they | 
|---|
| 0:26:29 | where region eight isn't half the home | 
|---|
| 0:26:31 | towards mostly recognition and building natural human-machine interfaces | 
|---|
| 0:26:38 | uh and he highlights mobile devices | 
|---|
| 0:26:41 | cars and living rooms | 
|---|
| 0:26:42 | as key application areas | 
|---|
| 0:26:45 | malcolm you get the last word | 
|---|
| 0:26:46 | well i don't want the last word but we have one more slide and we can decide whether | 
|---|
| 0:26:50 | this is the last word from | 
|---|
| 0:26:51 | um steve jobs or from lady gaga | 
|---|
| 0:26:54 | but in either case the message is the same: there's large commercial applications for the work that we're doing | 
|---|
| 0:26:59 | it started with um mp3 which enabled this market | 
|---|
| 0:27:03 | but there's still a lot of things to be done in terms of finding music | 
|---|
| 0:27:06 | um | 
|---|
| 0:27:07 | adding adding to things um understanding | 
|---|
| 0:27:09 | what people's needs are so we really haven't talked about that very much | 
|---|
| 0:27:12 | but | 
|---|
| 0:27:13 | um | 
|---|
| 0:27:14 | this is an information retrieval task you know people looking for things to entertain | 
|---|
| 0:27:18 | themselves whether it be songs or music or whatever | 
|---|
| 0:27:22 | um music signals and working with them is an important thing to do | 
|---|
| 0:27:25 | and so | 
|---|
| 0:27:26 | um i think both lady gaga and steve jobs can have the final word | 
|---|
| 0:27:30 | so thank you | 
|---|
| 0:27:39 | so | 
|---|
| 0:27:40 | thank you | 
|---|
| 0:27:41 | malcolm and | 
|---|
| 0:27:42 | patrick, great | 
|---|
| 0:27:43 | now we have very little time for discussion but we certainly should not miss this opportunity | 
|---|
| 0:27:49 | to hear other voices as well as we mentioned | 
|---|
| 0:27:52 | obviously these views are not completely balanced | 
|---|
| 0:27:56 | how could they be | 
|---|
| 0:27:58 | so maybe somebody in the forum would like to add some | 
|---|
| 0:28:01 | something and we can | 
|---|
| 0:28:03 | have a little discussion or more | 
|---|
| 0:28:06 | anybody | 
|---|
| 0:28:08 | yeah | 
|---|
| 0:28:13 | a thank you for that great summary | 
|---|
| 0:28:15 | uh i just want to add one more thing i think up | 
|---|
| 0:28:18 | we have two eyes and two ears and they work together | 
|---|
| 0:28:21 | and i think cross-modal issues are | 
|---|
| 0:28:24 | likely to be very important the | 
|---|
| 0:28:27 | eyes direct the ears and the ears direct the eyes and so on and | 
|---|
| 0:28:30 | likewise i think uh audition audio research and vision research should not | 
|---|
| 0:28:35 | proceed separately | 
|---|
| 0:28:37 | thanks | 
|---|
| 0:28:38 | the money for this comment uh | 
|---|
| 0:28:41 | this is certainly something which we highly appreciate and we always like to be in touch with the | 
|---|
| 0:28:46 | multimedia guys would don C uh audio as a media | 
|---|
| 0:28:50 | um | 
|---|
| 0:28:51 | but uh uh | 
|---|
| 0:28:53 | certainly uh there are many applications where we are actually closely working | 
|---|
| 0:28:58 | with video persons just think about | 
|---|
| 0:29:01 | uh celeste tracking | 
|---|
| 0:29:03 | so if you want to track some acoustic sources | 
|---|
| 0:29:06 | and the source is silent then uh you better use your camera | 
|---|
| 0:29:11 | so there are | 
|---|
| 0:29:12 | quite a few applications where it is quite natural to join forces | 
|---|
| 0:29:19 | i i you know just a | 
|---|
| 0:29:21 | to reinforce that there was a nice paper i don't remember who did it | 
|---|
| 0:29:24 | where they're looking for joint source | 
|---|
| 0:29:26 | joint audiovisual sources and i think that's | 
|---|
| 0:29:29 | it's important and | 
|---|
| 0:29:30 | it can be easier i mean | 
|---|
| 0:29:31 | the signals are no longer a big deal | 
|---|
| 0:29:34 | so it's easy to get the space computer power is pretty easy | 
|---|
| 0:29:37 | it would be fun | 
|---|
| 0:29:42 | followed that uh people have to | 
|---|
| 0:29:44 | okay follow that talks about four years | 
|---|
| 0:29:47 | uh is there any research uh | 
|---|
| 0:29:49 | well i use a pen binaural a single person sinful | 
|---|
| 0:29:53 | binaural uh for musical signal processing | 
|---|
| 0:29:59 | i don't i don't hear so the question was whether there is any binaural music research um | 
|---|
| 0:30:03 | i don't know of any i mean people certainly worry about um synthesizing um hi | 
|---|
| 0:30:08 | um high fidelity sound fields | 
|---|
| 0:30:11 | so um | 
|---|
| 0:30:13 | um | 
|---|
| 0:30:14 | the fraunhofer group for example are working on synthesizing | 
|---|
| 0:30:17 | you know sound fields that sound good no matter where you are | 
|---|
| 0:30:20 | and also you know work with people at stanford | 
|---|
| 0:30:22 | where various in in computing in in creating three D sound fields | 
|---|
| 0:30:26 | for musical experiences | 
|---|
| 0:30:28 | um | 
|---|
| 0:30:29 | um but i much or where X i go yeah | 
|---|
| 0:30:33 | i mean if you'd asked me ten years ago whether we'd have five point one speakers in | 
|---|
| 0:30:36 | the living room | 
|---|
| 0:30:37 | i would have said no | 
|---|
| 0:30:38 | but | 
|---|
| 0:30:39 | look what's happened | 
|---|
| 0:30:40 | so we we better | 
|---|
| 0:30:46 | else before lunch | 
|---|
| 0:30:52 | okay you talked about uh five point one speakers in the living room but um | 
|---|
| 0:30:56 | we're seeing a lot of new algorithms that do uh microphone array processing | 
|---|
| 0:31:01 | will we be seeing devices that let us do it | 
|---|
| 0:31:03 | i mean microsoft kinect has a few microphones i've seen a few um | 
|---|
| 0:31:08 | cell phones that have multiple microphones on for noise cancellation will we have more devices that allow us to | 
|---|
| 0:31:14 | do better processing algorithms | 
|---|
| 0:31:16 | yeah so the question was will we have devices that will have uh | 
|---|
| 0:31:19 | uh the ability to allow us to implement | 
|---|
| 0:31:23 | yeah | 
|---|
| 0:31:24 | so APIs | 
|---|
| 0:31:25 | so on so forth | 
|---|
| 0:31:26 | i understand from this morning's talks that APIs and software | 
|---|
| 0:31:30 | development kits will be available for kinect | 
|---|
| 0:31:32 | um and that could be a lot of fun | 
|---|
| 0:31:34 | um i think uh the hardware is there to enable us to do it and | 
|---|
| 0:31:38 | the key point out of this i think is one of the trends that uh | 
|---|
| 0:31:43 | uh we see which is a move | 
|---|
| 0:31:46 | in audio from single to multichannel | 
|---|
| 0:31:48 | that's been happening for a while and there is no sign of it stopping | 
|---|
| 0:31:52 | and so we would expect the facilities | 
|---|
| 0:31:54 | uh the processing power | 
|---|
| 0:31:56 | the uh inter operability and software development kits to come with that as well | 
|---|
| 0:32:05 | near the question | 
|---|
| 0:32:07 | comments | 
|---|
| 0:32:09 | i have one uh | 
|---|
| 0:32:10 | final remark which came mark | 
|---|
| 0:32:13 | increasingly uh | 
|---|
| 0:32:15 | and i would like to put that as a challenge because | 
|---|
| 0:32:18 | uh sensor networks are out there and they are | 
|---|
| 0:32:21 | uh in discussion | 
|---|
| 0:32:24 | in many papers where nice uh | 
|---|
| 0:32:28 | algorithms are provided always based on the assumption that all the sensors are synchronised | 
|---|
| 0:32:35 | um | 
|---|
| 0:32:36 | this is a | 
|---|
| 0:32:37 | tough problem actually so | 
|---|
| 0:32:39 | and we feel in the audio community it would help | 
|---|
| 0:32:43 | a lot if somebody could really build devices which make sure that all the audio front ends in | 
|---|
| 0:32:49 | distributed networks | 
|---|
| 0:32:51 | are synchronised | 
|---|
| 0:32:53 | uh the underlying problem is simply that | 
|---|
| 0:32:57 | once you | 
|---|
| 0:32:58 | correlate signals of different sensors that | 
|---|
| 0:33:01 | um have | 
|---|
| 0:33:03 | not exactly synchronous clocks | 
|---|
| 0:33:06 | the what uh this | 
|---|
| 0:33:08 | correlation | 
|---|
| 0:33:09 | will fall apart | 
|---|
| 0:33:11 | and | 
|---|
| 0:33:11 | just look at all our optimization and all the adaptive filtering stuff that we have | 
|---|
| 0:33:16 | it's always based on correlation and | 
|---|
| 0:33:18 | even higher orders the | 
|---|
| 0:33:20 | but then uh | 
|---|
| 0:33:22 | this problem has to be solved | 
|---|
| 0:33:24 | and so if you want to do something really | 
|---|
| 0:33:27 | uh a good for us then please solve this problem | 
|---|
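A minimal sketch of why unsynchronised clocks break correlation-based processing, as described above. It is an illustration under simplifying assumptions: a single 1 kHz tone, a nominal 8 kHz sampling rate, and a hypothetical 200 ppm clock error on the second sensor. The function names `capture` and `normalized_correlation` are illustrative.

```python
import math

def capture(n, fs, tone_hz, clock_error=0.0):
    """Samples of a unit sine as seen by a sensor whose clock runs fast
    by the relative amount `clock_error` (e.g. 2e-4 means 200 ppm)."""
    actual_fs = fs * (1.0 + clock_error)
    return [math.sin(2.0 * math.pi * tone_hz * i / actual_fs)
            for i in range(n)]

def normalized_correlation(x, y):
    """Zero-lag correlation coefficient of two equal-length sequences."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

FS, TONE, DRIFT = 8000, 1000.0, 2e-4  # assumed values

ref = capture(5 * FS, FS, TONE)              # sensor with the nominal clock
drifting = capture(5 * FS, FS, TONE, DRIFT)  # sensor with the fast clock

# Over 50 ms the phase error is still tiny, so the signals look coherent.
short = normalized_correlation(ref[:400], drifting[:400])
# Over 5 s the phase error sweeps a full cycle and the correlation averages out.
long_ = normalized_correlation(ref, drifting)
print(short, long_)
```

With these values the accumulated phase error is about 2π · 1000 Hz · 5 s · 2e-4 = 2π, one full cycle, so any estimator that averages correlations over seconds sees almost nothing, even though each sensor's signal is individually clean.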
| 0:33:32 | as a have after once | 
|---|
| 0:33:34 | after lunch okay | 
|---|
| 0:33:36 | thank you very much for attending | 
|---|