Welcome, ladies and gentlemen, to this expert session on trends in audio and acoustic signal processing. It's nice that so many of you came, and thank you in advance for postponing your lunch break a bit; I hope Malcolm will make it interesting. I was just thinking that we could also use this opportunity to do some advertisement for our TC, the Technical Committee on Audio and Acoustic Signal Processing. As I'm not really prepared with a slide on this, please take the whole thing as an advertisement for our TC, and whoever wants to get involved, please contact us; there are various ways of getting involved in our activities. So, in my role as chair of this TC, I would like to introduce two experts, both from our TC, who represent the acoustic signal processing community and the audio community in a very specific and, I think, very renowned way. First I would like to point to Patrick; please step forward so that you can be seen. Patrick Naylor is from Imperial College London, and I think the most important thing about him right now is that he just recently edited the first book on speech dereverberation, and you might look at his slides, which also have very nice pictures. On the other hand I have Malcolm Slaney, well known to the audio and especially the music community, and of course his scope actually goes much beyond that; he's from Yahoo! Research. I should not forget to mention that actually both have ties to both worlds: Malcolm is also teaching at Stanford, and Patrick also has industry connections. So, without further ado, I should stop.

Well, thanks very much for coming along to this session; I hope it's going to be interesting for you. We tried to think about what you might expect from this kind of session.
I have to say that the idea of trends is a very personal thing, so we can only present what we personally think are hopefully interesting things; obviously, with the time constraints, we can't cover everything. Some of these things are easy to define, like counting papers as a measure of activity, or counting achievements, maybe in terms of accepted papers rather than submitted papers; some of them are much less obvious, softer concepts. But we tried to work around this a little bit and see what we could find. So the first thing we did was to look at the distribution of submissions to the Transactions on Audio, Speech, and Language Processing. There's a lot of detail on this pie chart, but the thing to note is that there are some big subjects which are very active within our community in terms of the amount of effort going into them. Speech enhancement is a big one and has been for a long time; source separation continues to be very active, and we've had ICA sessions here at ICASSP; microphone array signal processing is still very big, showing up in something like thirteen percent of submissions; and content-based music processing, let's just call it music processing. Music is huge for us now and continues to grow; this is a real evolution, maybe even a revolution, in our profile of activities. We could also look at audio analysis as a big topic. The ones I've highlighted are the ones we're going to try to focus on in this session; as I mentioned, we can't possibly focus on everything.

So that leads us to music. Music has become very big here, as Patrick mentioned, and this year at ICASSP there are three sessions, as you can see listed there. There are a number of reasons for that which I thought were worth highlighting, just because it's interesting to see how the community developed.
The first reason is that the EDICS, the categories people use to describe the papers they submit to the conference, were changed to include music as a subject. That's a rather bureaucratic reason, but it probably has a lot to do with the fact that there are music papers at ICASSP now, and I think that was a good idea. A second reason is that there is a lot more content to work with: music is easy to work with, since we all own large collections. And the third reason is that music has become very commercially relevant in the last few years; iTunes and Pandora are certainly two examples of companies making a lot of money from music. As I mentioned, the data is easy: we all have large CD collections. One of the things that is difficult about music, though, is that it's all copyrighted; all the material you want to work with is proprietary. One way the community has dealt with this is MIREX, which I'll talk about in a little bit; another way has been to create what's called the Million Song Dataset. The idea is to distribute features of the songs, not the actual copyrighted material; correct me if I'm wrong, but I think there are about a hundred features per song, and there are over a million songs. Columbia and The Echo Nest provide this database online; there's a lot of data there that people can use, it's freely available, and it's very large, so I expect we'll see more and more papers that use it. MIREX has been the best thing for the scientific component of music analysis and music processing. This is the list of tasks being worked on for the 2011 competition. As I mentioned, data is a big issue, and what the MIREX people do is provide an environment at the University of Illinois where people can run their algorithms on a large database of songs.
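To illustrate the "distribute features, not audio" idea behind the Million Song Dataset, here is a toy sketch: a few made-up per-frame descriptors computed from a waveform. The descriptors (energy, spectral centroid, zero-crossing rate) and all parameter values are illustrative only, not the dataset's actual feature set.

```python
import numpy as np

def frame_features(signal, rate, frame_len=1024, hop=512):
    """Per-frame descriptors: a stand-in for the kind of precomputed
    features a dataset might distribute instead of copyrighted audio."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Zero-crossing rate on the raw frame: a crude noisiness indicator.
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        # Hann window before the FFT to keep spectral leakage down.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
        energy = float(np.sum(spectrum ** 2))
        # Spectral centroid: energy-weighted mean frequency of the frame.
        centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))
        feats.append((energy, centroid, zcr))
    return np.array(feats)

rate = 8000
t = np.arange(rate) / rate                    # one second of signal
tone = np.sin(2 * np.pi * 440.0 * t)
feats = frame_features(tone, rate)
print(feats.shape)                            # 14 frames x 3 features
```

The point is the compression of rights-encumbered audio into a small, shareable matrix: a few numbers per frame instead of thousands of samples.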
So the songs never leave the University of Illinois. Instead of getting the data, running your algorithms, and sending results back, you send your algorithm to the University of Illinois, in a particular environment, a Java environment, and they've made it about as easy as they can for you: they run the algorithms on their machines and clusters and give you back the results. I want to highlight three of the tasks listed here that are very important and very popular. One is audio tag classification: how you tag audio with various labels. Is it happy, is it blues? Anything you can think of can be a tag, and people are working on that very hard. Multiple fundamental frequency estimation and tracking was popular even before MIREX started, but MIREX, with a common database, has really raised the scientific level, because now people can compare things on common ground. And the other one is audio chord estimation; in that sense a chord is just another tag, but a very specialised tag that helps people understand the music, and people work on it a lot. Something else that has been very active this year is work on separation and analysis, with many different model-based approaches. This particular graphical model is from one of the papers here. It shows the sequence of notes along the top, so in this case you have a score and know what's being played, and that's not hard information to get. From there the model generates data about the harmonics: you have the amplitude, the frequency, and the variance of a Gaussian in the spectral domain, and these are combined into spectral slices. Given the note sequence, what you are trying to do is build, or find, the emission probabilities that describe the music.
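A minimal sketch of the kind of emission model just described: each note contributes Gaussian-shaped bumps at its harmonics in the spectral domain, and an observed spectral slice is scored against candidate-note templates. All parameter values (number of harmonics, bump width, amplitude decay) are illustrative, not those of the paper.

```python
import numpy as np

def harmonic_template(f0, freqs, n_harmonics=5, sigma=10.0, decay=0.7):
    """Spectral template for one note: Gaussian bumps at the harmonics of
    f0, with geometrically decaying amplitudes (parameters illustrative)."""
    template = np.zeros_like(freqs)
    for h in range(1, n_harmonics + 1):
        amp = decay ** (h - 1)
        template += amp * np.exp(-0.5 * ((freqs - h * f0) / sigma) ** 2)
    return template / np.linalg.norm(template)

def best_note(slice_mag, freqs, candidate_f0s):
    """Pick the candidate note whose template best matches a spectral
    slice: a stand-in for the emission probability in such a model."""
    scores = [float(np.dot(slice_mag, harmonic_template(f0, freqs)))
              for f0 in candidate_f0s]
    return candidate_f0s[int(np.argmax(scores))]

freqs = np.linspace(0, 2000, 1024)
observed = (harmonic_template(220.0, freqs)
            + 0.01 * np.random.default_rng(0).standard_normal(1024))
print(best_note(observed, freqs, [196.0, 220.0, 246.9]))   # → 220.0
```

In the full model these per-slice scores would be emission probabilities inside a graphical model conditioned on the score, rather than a simple argmax.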
From that you can do a lot of very interesting work; you can do things like the tagging I mentioned, for things like emotion and genre. And something that's close to my heart, but shows the kind of work being done in this area, is some work on morphing. The question the authors wanted to ask was: what's the right way to think about audio perception in morphing? If you do morphing, the path in feature space should be a line: if you're morphing from one position to another, the feature moves along a line in the signal domain, and you want the same sort of thing to happen in the auditory domain. The graphs shown here on the left are of poor quality, but just to give you a sense of it: there is a range of line spectral frequency envelopes, and on the right-hand side are all the perceptual measures that have been calculated based on these LSFs. What they're doing is looking for the one that gives a straight line, as you can see in the middle there; some of these work better than others, and I think that research is still being pursued.
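A toy version of the straight-line morphing idea: interpolate linearly between two synthetic spectral envelopes and track a simple descriptor along the path. The envelope shapes and the descriptor (spectral centroid) are illustrative stand-ins, not the LSF envelopes or perceptual measures from the work discussed.

```python
import numpy as np

freqs = np.linspace(0, 4000, 256)

def envelope(center, width):
    """A toy spectral envelope: a single Gaussian bump (illustrative)."""
    return np.exp(-0.5 * ((freqs - center) / width) ** 2)

src, dst = envelope(500.0, 200.0), envelope(1500.0, 200.0)

# Morph by moving along a straight line in the envelope (feature) space.
alphas = np.linspace(0.0, 1.0, 11)
centroids = []
for a in alphas:
    morphed = (1 - a) * src + a * dst
    centroids.append(float(np.sum(freqs * morphed) / np.sum(morphed)))

# If perception tracked this descriptor, the perceptual path would also be
# a straight line; deviation from linearity is what such studies measure.
lin = np.linspace(centroids[0], centroids[-1], len(centroids))
print(np.max(np.abs(np.array(centroids) - lin)))
```

With the right feature space the descriptor moves almost perfectly linearly; the research question is which representation makes the *perceptual* trajectory behave the same way.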
So, the Audio and Acoustic Signal Processing TC covers quite a wide range of areas which, I have to say, I find exciting, and I hope you feel that same excitement about the technologies being developed. I think we see a trend that a lot of ideas which have been in the laboratory for many years are now starting to come to the point of applications, industrial applications; we heard about some of these in the plenary. In that context, if we look at the research that we do, I ask the question: how much of it is driven by the desire for exciting applications, and how much of it is fundamental, underpinning the technology with good algorithmic research? In other words, is there a happy marriage here? The Duke and Duchess of Cambridge will forgive me for using that photograph, but there is a serious point behind this. Before we come to the serious point: of course Prince William is very, very pleased, having now found his very fine bride, so he's maximised his expectations, and they had a very happy day there. Coming back to something a little more serious, I think things which look good have to be underpinned by excellence in algorithmic and fundamental research. So if there is a trend towards things that look great, let's not lose sight of the fact that the power behind them is the algorithms that we develop. One of the areas of our research which is very hot, and has been for a long time, is array signal processing as applied to microphones, and maybe also loudspeakers. Here we see an evolution of applications: hearing aids have been a very busy area for a long time, with many applications as well as excellent underpinning technology. I now see a big branch out into the living room, and the living room means TV, it means entertainment; perhaps it means an Xbox 360 with a Kinect microphone array, perhaps it means Sky TV. These are new applications which are really coming on stream now, and I think they'll start to shape the way we do research. The tasks haven't changed that much: we still want to do localization, we still want to do tracking, we still want to extract a desired source from anything else, be it noise or other talkers. And then a new task is to try to learn something about the acoustic environment by inferring it from the multichannel signals we can obtain with the microphone array; this gives us additional prior information on which we can condition our estimation. Another question is: what kind of microphone array should we use?
And how can we understand how it is going to behave? People started off looking at linear arrays, then certainly extended to planar, cylindrical, and spherical arrays, even distributed arrays that don't really have any fixed geometry. The design of such arrays, including the spacing of the microphone elements and their orientation, is an important and expanding topic. People started off with linear arrays, a bunch of microphones in a line. This is the well-known Eigenmike from mh acoustics: thirty-two sensors on the surface of a rigid sphere of eight centimetres or so. From laboratory prototypes we have now come to real products you can buy. Connected to your TV set, Sky TV has the opportunity to include microphone arrays at relatively low cost, such that you can communicate using your living-room equipment with very low-cost communications hardware. The challenge, of course, is that you're probably sitting four metres away from the microphone, so this is going to be, I think, a really hot application area for us in the future. Interestingly, people are still doing fundamental research here, and I'm pleased to see that. Here's a paper I picked out; I can't say at random, but it caught my eye. The problem is: given N sources and M microphones, where should you put the microphones? This work gives, for a planar microphone array, an analysis which enables one to predict the directivity index obtained for different geometries, and therefore allows optimisation of the geometry.
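The directivity index just mentioned can be computed numerically for simple geometries. Here is a minimal 2-D sketch for a delay-and-sum line array with ideal omnidirectional elements in a free field; it is an illustrative figure of merit, not the cited paper's method, and all numbers are assumptions.

```python
import numpy as np

def directivity_index_db(positions, steer_deg=0.0, freq=1000.0, c=343.0):
    """Directivity index (dB) of a delay-and-sum beamformer, averaged
    over the 2-D plane (free field, ideal omni elements)."""
    k = 2 * np.pi * freq / c
    theta = np.linspace(0, 2 * np.pi, 3600, endpoint=False)

    def response(angles):
        # Plane-wave phases at each element for a line array on the x-axis.
        d = np.exp(1j * k * np.outer(np.cos(angles), positions))
        # Delay-and-sum weights steered toward steer_deg.
        w = np.exp(1j * k * positions * np.cos(np.radians(steer_deg))) / len(positions)
        return np.abs(d @ np.conj(w))

    peak = response(np.array([np.radians(steer_deg)]))[0] ** 2
    mean_power = np.mean(response(theta) ** 2)
    return 10 * np.log10(peak / mean_power)

# A 4-element array: half-wavelength spacing vs. a tightly spaced one.
lam = 343.0 / 1000.0
di_wide = directivity_index_db(np.arange(4) * lam / 2)
di_close = directivity_index_db(np.arange(4) * lam / 20)
print(di_wide, di_close)   # wider spacing yields the higher DI here
```

Sweeping element positions in a loop like this is, in spirit, the geometry optimisation such papers formalise analytically.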
Source separation is another hot topic, and has been for a while. Obviously trends start somewhere; a trend has to begin with a trend setter, and I put up this photograph of Colin Cherry simply because I think he used to have the office above my office, so I feel some kind of proximity effect. His definition of the cocktail party problem in his 1950s book on human communication is often quoted in people's papers. The early experiments asked about the behaviour of listeners receiving two almost simultaneous signals, and he called that the cocktail party effect. I put the picture up on purpose, because I don't think many people today would have a good image of what a cocktail party was in 1950; I guess it looks a bit different now. Progress in this area has led us to be able to handle both overdetermined and underdetermined scenarios; clustering has been a very effective technique; the permutation problem has been addressed with some great successes; and now we're starting to see results in practical contexts where we have reverberation as well. The usual effect of reverberation is talked about in the context of dereverberation algorithms for speech enhancement, which is something I've tried to address myself, and perhaps we are now at the stage where there is a push to take some of these algorithms out of the laboratory and roll them out into real-world applications; then we'll learn whether they work or not. We have to address both the single-channel and multichannel cases, often by using acoustic channel inversion if we can estimate the acoustic channel. Although the slide title is speech enhancement, reverberation is of course widely used, both positively and with negative effects, also in music, so let's not lose sight of that.
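The permutation problem mentioned above arises because frequency-domain separation treats each bin independently, so the source ordering can flip from bin to bin. One common family of fixes correlates amplitude envelopes across bins; here is a hedged two-source sketch on synthetic envelopes, not any particular paper's algorithm.

```python
import numpy as np

def align_permutations(envelopes):
    """envelopes: array (bins, sources, frames) of per-bin amplitude
    envelopes from independently separated STFT bins. Greedily flip each
    bin so its envelopes correlate with a slowly updated reference."""
    bins, sources, frames = envelopes.shape
    assert sources == 2, "sketch handles the two-source case only"
    aligned = envelopes.copy()
    ref = aligned[0].copy()
    for b in range(1, bins):
        keep = sum(np.corrcoef(ref[s], aligned[b, s])[0, 1] for s in range(2))
        swap = sum(np.corrcoef(ref[s], aligned[b, 1 - s])[0, 1] for s in range(2))
        if swap > keep:
            aligned[b] = aligned[b][::-1].copy()   # undo the permutation
        ref = 0.9 * ref + 0.1 * aligned[b]         # running reference
    return aligned

rng = np.random.default_rng(1)
frames = 200
a, b = rng.random(frames), rng.random(frames)
env = np.stack([np.stack([a, b]) for _ in range(6)])   # 6 bins, 2 sources
env[3] = env[3, ::-1].copy()                           # permute one bin
fixed = align_permutations(env)
print(np.allclose(fixed[3, 0], a))                     # True: bin 3 flipped back
```

Real systems combine this kind of envelope correlation with direction-of-arrival cues, but the core idea is exactly this consistency check across frequencies.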
The other factor I wanted to touch on here is that interdisciplinary research is often a favoured modality, and in our community we can see benefits coming from cross-fertilisation of different topic areas, for example dereverberation and blind source separation: we are starting to see papers where these are jointly addressed, with good leverage from both types of techniques. Equally, dereverberation coupled with speech recognition, where a classical speech recognizer is enhanced such that it has knowledge of the models of clean speech but also has models for the reverberation, and by combining these is able to make big improvements in word accuracy.

So I want to talk a bit about a recurring theme that I've been seeing over the last two years, both in this community and elsewhere, and that's sparsity; no, we're not talking about my hair. The first time I saw this was in the matching pursuit work; it was presented here in '97, I think, and was first done in the IEEE Transactions on Signal Processing in '93. At the time I thought it was interesting but a dumb idea, and now I take that back, because it has shown up in a number of interesting places in the work that has been done at ICASSP and elsewhere. Compressed sensing, a few years ago, was probably the best example in this community, and we've seen it in newer technologies such as deep belief networks: sparsity has been a big part of the work on deep belief networks in machine learning, and I think that's been interesting. And in a lot of papers that we saw this year, L1 regularization is a way of providing solutions that make sense when you have a very overcomplete, very complex basis set. So I titled this section sparsity, but it's probably better described as sparsity in combination with overcomplete basis sets, and I think that combination is interesting.
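As a concrete illustration of the matching pursuit idea just mentioned, here is a minimal greedy decomposition over a random overcomplete dictionary; it is a hedged sketch of the generic algorithm, with illustrative sizes, not a reconstruction of any particular paper.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=3):
    """Greedy matching pursuit: repeatedly pick the dictionary atom most
    correlated with the residual. dictionary: (n_dims, n_total_atoms),
    columns assumed unit-norm."""
    residual = signal.astype(float).copy()
    chosen = []
    for _ in range(n_atoms):
        scores = dictionary.T @ residual
        k = int(np.argmax(np.abs(scores)))
        coeff = float(scores[k])
        residual = residual - coeff * dictionary[:, k]
        chosen.append((k, coeff))
    return chosen, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))            # overcomplete: 256 atoms, 64 dims
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 10] - 1.5 * D[:, 99]           # sparse mix of two atoms
atoms, res = matching_pursuit(x, D, n_atoms=2)
print(sorted(k for k, _ in atoms))
print(np.linalg.norm(res))
```

Two greedy steps recover the two active atoms and leave only a small residual, which is the whole appeal: a sparse explanation picked out of a basis far larger than the signal dimension.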
One example of this was talked about a little in the session before this one: work using a cortical representation to model sound. The cortex is probably the original sparse representation; it predates all of us, and the idea is that you want to represent sound with the least amount of biological energy. What seems to work well there is to use spikes that represent very distinct sound atoms; how they all get put together is still a matter of discussion, but I think it's going to be interesting. The way this has been used is to take noisy speech, project it onto this kind of very overcomplete basis set, and then filter it, keeping the regions that are likely to contain speech. In a sense it's a Wiener filter, but in a very rich environment where it's very easy to separate speech from noise and things like that. What's on the bottom is noisy speech, then the kind of filter that makes sense for speech, which for example has a lot of energy at around a four hertz modulation rate, and then the cleaned speech at the top. The deep belief networks are interesting, I think, for similar reasons; it all ties together. Shown on the left-hand side is a little bit of a waveform that has been applied to a restricted Boltzmann machine, which is just a way of saying that they learn a weight matrix transforming the input on the bottom into an output on top. There's a weight matrix and a little bit of a nonlinearity, and you can learn these things in a way that can reconstruct the input: you find basis vectors such that, given the hidden units, you can reconstruct the visible units. People have been doing this in the image processing domain for a long time; these are some results in the waveform domain that are new this year, and there's a bunch of learned features that often look like Gabor functions.
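A minimal sketch of the restricted Boltzmann machine idea just described, trained with one step of contrastive divergence: toy binary patterns stand in for waveform fragments, and the network sizes and learning rate are illustrative assumptions, not values from the papers discussed.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: two binary patterns standing in for input fragments.
data = np.array([[1, 1, 0, 0, 1, 1],
                 [0, 0, 1, 1, 0, 0]], dtype=float)

n_visible, n_hidden = 6, 4
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

for epoch in range(2000):                       # CD-1 training loop
    v0 = data
    ph0 = sigmoid(v0 @ W + b_h)                 # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)               # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b_h)
    W += 0.05 * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    b_v += 0.05 * (v0 - pv1).mean(axis=0)
    b_h += 0.05 * (ph0 - ph1).mean(axis=0)

# After training, the learned weight matrix reconstructs the inputs.
recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)
print(np.round(recon, 2))
```

The columns of `W` play the role of the learned basis vectors; in the waveform-domain work those columns are what end up looking like Gabor functions.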
But the interesting thing is that you also see some very complex features: this is in the spectral domain, and you get features that have two frequency peaks, which might be akin to formants. They were applying that to speech recognition, and I think that's an interesting direction. I'm going to go out on a limb here, because I think the reason sparsity is important is that it gives us a way of representing things that we can't do as well in other domains. We all grew up with the Fourier transform domain. What's shown on the left-hand side is two basis functions, a basis at just two frequencies, and with those two basis functions you can represent the entire subspace: any point shown there can be anywhere in that subspace. It's a very rich representation; as we all know, if you satisfy the Nyquist criterion you can do anything. But I think that's also the problem with a dense representation like that. The alternative is to look at something like an overcomplete basis and just pick out elements that you've seen before. These are just some synthetic formants, but the way I like to think about these things working is that if you build a system that exploits sparseness, whether it's a deep belief network, whether it's matching pursuit, whatever your favourite implementation technology is, you can learn patterns that look like these formants. What's on the left is a vowel with one vocal tract length, and on the right-hand side a different vowel with a different vocal tract length, and the system on the right, with a sparse overcomplete representation, is just going to learn these kinds of things: vowels with different vocal tract lengths. It's not covering the entire space.
So if you want to process things and you're working in this space, only valid sounds that you've seen before will be represented by the sparse basis set, and it can do useful things. I think that's going to be an important trend and an important direction for our community.

One of the things we wanted to do was to reach out to different sectors of our topic area and put in some hopefully interesting quotations from leaders in those fields. Here's one that comes from NTT, a telecommunications company; thank you to our colleagues there for this quote: "Remaining challenges in source separation could include blind source separation for an unknown or dynamic number of sources." Incidentally, Cherry's photograph actually hangs on the wall of the large lecture theatre in our EE department. What about other areas? If we think about mixed-signal ICs, the people working on those functionalities really support what we want to do, so I think it's important to listen to the hardware people as well. From Wolfson Microelectronics: "Moore's law is driving DSP speed and memory capacity, enabling implementation of sophisticated DSP functions resulting from years of research. The end user experience", and maybe this is a wish rather than the reality of the moment, "is one of natural wideband voice communications, devoid of acoustic background noise and unwanted artifacts." It seems to me the hardware manufacturers are on our side. We heard a little this morning about the Xbox Kinect; Ivan Tashev, thanks for this contribution: the applications of sound capture, enhancement, and processing technologies shift, and he calls it a paradigm shift, gradually from communications, which is where they originated, towards mostly recognition and building natural human-machine interfaces, and he highlights mobile devices, cars, and living rooms as key applications. Malcolm, you get the last word.
Well, I don't know about the last word, but we have one more slide, and we can decide whether the last word goes to Steve Jobs or to Lady Gaga. In either case the message is the same: there are large commercial applications for the work that we're doing. It started with MP3, which enabled this market, but there's still a lot to be done in terms of finding music, adding to things, and understanding what people's needs are; we really haven't talked about that very much. This is an information retrieval task: people are looking for things that interest them, whether they're songs or music or whatever. These are signals, and working with them is an important thing to do. So I think both Lady Gaga and Steve Jobs can have the final word. Thank you.

Thank you, Malcolm and Patrick. Now we have very little time for discussion, but we certainly should not miss this opportunity to hear other voices as well. As we mentioned, these views are not completely balanced; how could they be? So maybe somebody in the forum would like to add something, and we can have a little discussion. Anybody?

Yes, thank you for that great summary. I just want to add one more thing: we have two ears and two eyes and they work together, and I think cross-modal issues are likely to be very important; the ears direct the eyes, the eyes direct the ears, and so on. Likewise, I think audio research and vision research should not proceed separately.

Thanks very much for this comment. This is certainly something we highly appreciate, and we always like to be in touch with the multimedia people, who do see audio as a medium. There are certainly many applications where we work closely with vision people; just think about source tracking.
If you want to track acoustic sources and a source falls silent, then you'd better use your camera, so there are quite a few applications where it is quite natural to join forces.

Just to reinforce that: there was a nice poster, I don't remember who did it, where they were looking for joint audiovisual sources, and I think that's important, and it can be easier now. The signals are no longer a big deal, so it's easy to get the data, and computing power is pretty easy. It would be fun to follow that up.

People have talked about the cocktail party for years; is there any research that uses binaural signal processing for musical signals?

So the question was whether there is any binaural music research. I don't know of any. People certainly worry about synthesizing high-fidelity sound fields; the Fraunhofer group, for example, has been working on synthesizing sound fields that sound good no matter where you are, and there is work, at Stanford and in various places, on creating 3-D sound fields for musical experiences, but I'm not sure where that's going. If you had asked me ten years ago whether we'd have 5.1 speakers in the living room, I would have said no, but look what's happened.

You talked about 5.1 in the living room, but we're also seeing a lot of new algorithms that do microphone array processing. Will we be seeing devices that let us use them? The Kinect has a few microphones, and I've seen a few cell phones that have multiple microphones for noise cancellation. Will we have more devices that allow us to run better processing algorithms?

So the question was whether we'll have devices with the ability to let us implement these things, APIs and so on. I understand from this morning's talks that a software development kit
will be available for Kinect, and that could be a lot of fun. I think the hardware is there to enable us to do it, and the key point, I think, is one of the trends that we see: a move in audio from single channel to multichannel. That's been happening for a while, and there's no sign of it stopping, so we would expect the facilities, the processing power, the interoperability, and the software development kits, to come with it as well. Any other questions or comments?

I have one final remark, which I would like to put as a challenge. Sensor networks are out there, and they are discussed in many papers where nice algorithms are provided, always based on the assumption that all the sensors are synchronised. This is actually a tough problem, and we in the audio community would benefit a lot if somebody could really build devices which ensure that all the audio front ends in a distributed network work in synchrony. The underlying problem is simply that once you correlate signals from different sensors that do not have exactly synchronous clocks, the correlation falls apart; and just look at all our optimisation and all the adaptive filtering work that we have, it's always based on correlation, and even on higher orders. So this problem has to be solved, and if you want to do something really good for us, then please solve it; that's what I would wish for, right after lunch. OK, thank you very much for attending.
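As a closing illustration of the synchronisation challenge raised in that final remark, here is a toy simulation: two copies of the same signal, one resampled with an exaggerated clock offset, and the peak cross-correlation measured over windows of increasing length. All the numbers (sample rate, drift) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 2000
n = rate * 4                                    # four seconds of "audio"
x = rng.standard_normal(n)                      # sensor A

# Sensor B's clock runs 0.1% fast (drift exaggerated for illustration):
# its k-th sample falls at time (1 + 1e-3) * k / rate.
t_b = np.arange(n) * (1 + 1e-3)
y = np.interp(t_b, np.arange(n, dtype=float), x)

def peak_corr(a, b, length):
    """Peak normalised cross-correlation over the first `length` samples."""
    a, b = a[:length], b[:length]
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.max(np.correlate(a, b, mode='full')) / length)

for seconds in (0.05, 0.5, 4.0):
    print(seconds, peak_corr(x, y, int(seconds * rate)))
# As the window grows, no single lag aligns the whole window any more
# (4 s at 0.1% drift is 8 samples of slip), and the peak collapses.
```

This is exactly why correlation-based algorithms, from adaptive filtering to localization, degrade on unsynchronised distributed arrays: the effective coherence time is set by the clock offset, not by the acoustics.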