| 0:00:13 | oh |
|---|
| 0:00:13 | welcome |
|---|
| 0:00:15 | ladies and gentlemen to this |
|---|
| 0:00:17 | experts session on trends in audio and acoustic signal processing |
|---|
| 0:00:23 | and it's great |
|---|
| 0:00:24 | that so many of you came |
|---|
| 0:00:27 | and thank you all in advance for postponing your lunch break a bit |
|---|
| 0:00:31 | um i hope they will make it interesting |
|---|
| 0:00:34 | i was just realising that we could also use this opportunity to do |
|---|
| 0:00:39 | some advertisement for our TC which is the TC on |
|---|
| 0:00:42 | audio and acoustic signal processing |
|---|
| 0:00:44 | as i'm not really prepared for this please take the whole thing as advertisement |
|---|
| 0:00:50 | for our TC and whoever wants to get involved |
|---|
| 0:00:53 | please contact us |
|---|
| 0:00:55 | and |
|---|
| 0:00:55 | there are various ways of getting involved in our activities |
|---|
| 0:00:59 | and of course we first would like to |
|---|
| 0:01:01 | tell you about what this is |
|---|
| 0:01:03 | so um |
|---|
| 0:01:04 | in my role as the chair of this TC |
|---|
| 0:01:09 | i would like to present to you two experts who are also from our TC who represent the |
|---|
| 0:01:14 | acoustic signal processing community and the audio community |
|---|
| 0:01:18 | in a very specific and i think very |
|---|
| 0:01:21 | uh |
|---|
| 0:01:21 | renowned way and i would first like to |
|---|
| 0:01:24 | uh point to Patrick Naylor please come step forward |
|---|
| 0:01:28 | so that you can be seen |
|---|
| 0:01:30 | Patrick Naylor is |
|---|
| 0:01:32 | the |
|---|
| 0:01:33 | from Imperial College London |
|---|
| 0:01:36 | and i think uh the most important thing about him right now is |
|---|
| 0:01:40 | that he just recently coauthored the first book on speech dereverberation |
|---|
| 0:01:45 | and you might look at his slides which also have very nice pictures |
|---|
| 0:01:52 | and uh on the other hand we have Malcolm |
|---|
| 0:01:55 | who is well known in |
|---|
| 0:01:57 | the audio and |
|---|
| 0:01:59 | especially music community and |
|---|
| 0:02:02 | his scope of course goes actually much beyond that |
|---|
| 0:02:04 | he is from Yahoo Research |
|---|
| 0:02:07 | uh i should not forget to mention that actually both have ties |
|---|
| 0:02:11 | to both worlds though |
|---|
| 0:02:13 | Malcolm is |
|---|
| 0:02:14 | also teaching at Stanford and Patrick also has |
|---|
| 0:02:18 | strong industry connections |
|---|
| 0:02:20 | so without further ado i would say uh i should stop |
|---|
| 0:02:28 | well thanks very much for coming along to this uh session hopefully it's gonna be interesting to you |
|---|
| 0:02:33 | um |
|---|
| 0:02:34 | we um |
|---|
| 0:02:36 | we tried to think about what you might expect from this kind of session |
|---|
| 0:02:40 | and i have to say that |
|---|
| 0:02:42 | the idea of trends is a very personal thing |
|---|
| 0:02:45 | so uh we're going to present |
|---|
| 0:02:47 | uh what we personally think are hopefully interesting things |
|---|
| 0:02:51 | but uh obviously given the time constraints |
|---|
| 0:02:54 | we can't cover everything so some of these things are like uh |
|---|
| 0:02:58 | a easy to define like counting papers as a measure of activity |
|---|
| 0:03:02 | or counting achievements maybe in terms of accepted papers rather than submitted papers |
|---|
| 0:03:07 | some of them are much less uh |
|---|
| 0:03:09 | uh easy to pin down |
|---|
| 0:03:12 | they're more uh soft concepts but we try to go around this a little |
|---|
| 0:03:17 | bit |
|---|
| 0:03:18 | and see what we can find |
|---|
| 0:03:21 | so the first thing we did was to look at the distribution of submissions to |
|---|
| 0:03:25 | uh the transactions on uh audio speech and language processing |
|---|
| 0:03:29 | and uh |
|---|
| 0:03:30 | i plotted this out there's a lot of detail on this pie chart here |
|---|
| 0:03:34 | but the thing to note from this |
|---|
| 0:03:36 | is that there is some big |
|---|
| 0:03:38 | uh subjects which are very active within our community in terms of the amount of effort |
|---|
| 0:03:44 | going into them |
|---|
| 0:03:45 | so speech enhancement is a big one and has been for a long time |
|---|
| 0:03:50 | source separation continues to be very active |
|---|
| 0:03:53 | uh we've had ICA sessions here |
|---|
| 0:03:55 | at ICASSP uh |
|---|
| 0:03:58 | microphone array signal processing |
|---|
| 0:04:00 | still very big and uh showing up at something like thirteen percent of submissions |
|---|
| 0:04:05 | content-based music processing let's just call it music processing |
|---|
| 0:04:09 | music is huge for us now and continues to grow |
|---|
| 0:04:15 | if not more |
|---|
| 0:04:17 | and um |
|---|
| 0:04:18 | uh this is a real evolution that we're seeing maybe even a revolution |
|---|
| 0:04:23 | in our uh profile of activities |
|---|
| 0:04:26 | uh also we could look at audio analysis as a |
|---|
| 0:04:29 | as a big topic |
|---|
| 0:04:30 | the ones that i've highlighted there are the ones that we're going to try to focus on in this session |
|---|
| 0:04:34 | as i mentioned we can't possibly focus on |
|---|
| 0:04:37 | everything |
|---|
| 0:04:39 | so that leads us to music |
|---|
| 0:04:41 | so music has um become very big here as Patrick mentioned and this year at ICASSP |
|---|
| 0:04:46 | there |
|---|
| 0:04:47 | are three sessions as you can um see listed there |
|---|
| 0:04:49 | there's a number of reasons i thought worth highlighting just because it's interesting to see how the field |
|---|
| 0:04:53 | developed |
|---|
| 0:04:54 | um so the first reason is that the EDICS which is how people describe the papers they're |
|---|
| 0:04:59 | submitting |
|---|
| 0:05:00 | to a conference |
|---|
| 0:05:01 | um was changed to include music as a subject so |
|---|
| 0:05:05 | it's a rather bureaucratic |
|---|
| 0:05:06 | um reason |
|---|
| 0:05:08 | but it probably has much to do with the fact that there are some music papers now at |
|---|
| 0:05:12 | ICASSP |
|---|
| 0:05:13 | and i think that's a good thing |
|---|
| 0:05:15 | um a second reason is there's a lot more content to work with um |
|---|
| 0:05:18 | music is easy to work with as you know we all own large collections |
|---|
| 0:05:22 | um and the third reason is it's become very commercially relevant in the last few years |
|---|
| 0:05:27 | um so iTunes and Pandora are certainly two examples |
|---|
| 0:05:31 | of companies who are making a large amount of money from |
|---|
| 0:05:34 | from music um ideas |
|---|
| 0:05:36 | um |
|---|
| 0:05:37 | as i mentioned the data is easy um we all have um large um CD collections |
|---|
| 0:05:43 | and and |
|---|
| 0:05:44 | one of the things that |
|---|
| 0:05:45 | is difficult about music is it's all copyrighted all the stuff we wanna work with is copyrighted |
|---|
| 0:05:50 | and one way that the community has dealt with this um i'll talk about in a |
|---|
| 0:05:55 | little bit |
|---|
| 0:05:56 | but another way that the community has um worked with this is to |
|---|
| 0:06:02 | create what's called the Million Song Dataset |
|---|
| 0:06:04 | um and the idea of this is to distribute features of the songs not the actual |
|---|
| 0:06:10 | copyrighted material |
|---|
| 0:06:11 | and so um |
|---|
| 0:06:13 | and forgive me if i err but i think it's a hundred features |
|---|
| 0:06:16 | per song and they're over time too |
|---|
| 0:06:18 | um |
|---|
| 0:06:19 | and Columbia and The Echo Nest uh provide this database |
|---|
| 0:06:22 | um online |
|---|
| 0:06:24 | and there's a lot of data there that people can use and it's freely available and it's a very large |
|---|
| 0:06:29 | database |
|---|
| 0:06:29 | and i expect we'll see more and more papers |
|---|
| 0:06:32 | um that use this database |
|---|
| 0:06:34 | MIREX is and has been the best um thing for the |
|---|
| 0:06:39 | scientific component of music analysis and music processing |
|---|
| 0:06:42 | this is the uh list of tasks |
|---|
| 0:06:44 | that are being uh worked on for the two thousand eleven competition |
|---|
| 0:06:48 | um as i mentioned copyright is a big issue and |
|---|
| 0:06:51 | what the MIREX people do um is |
|---|
| 0:06:53 | provide an environment at the University of Illinois where people can run their algorithms on a large database |
|---|
| 0:06:58 | of songs |
|---|
| 0:07:00 | so the songs never leave the University of Illinois |
|---|
| 0:07:02 | so instead of you know getting data and running your algorithms and sending results back |
|---|
| 0:07:06 | you send your algorithm to the University of Illinois |
|---|
| 0:07:08 | um in a particular environment a java environment |
|---|
| 0:07:11 | and they'll do a bit of debugging for you |
|---|
| 0:07:14 | and then they run the algorithm on their machines and their clusters |
|---|
| 0:07:17 | and give you back results |
|---|
| 0:07:18 | i want to highlight um three uh tasks |
|---|
| 0:07:21 | that are circled right here |
|---|
| 0:07:23 | that are um very um important and very uh popular |
|---|
| 0:07:26 | one is audio tag um classification so how you tag audio with various things |
|---|
| 0:07:30 | um is it happy is it blues |
|---|
| 0:07:33 | um anything you can think of can be a tag |
|---|
| 0:07:36 | and people work on that very hard |
|---|
| 0:07:38 | um multiple fundamental frequency estimation and tracking |
|---|
| 0:07:40 | um has been popular yeah |
|---|
| 0:07:42 | yeah even before MIREX started |
|---|
| 0:07:45 | but MIREX has i think provided a common database and really upped the scientific level now people |
|---|
| 0:07:51 | can compare things on common ground |
|---|
| 0:07:53 | and the other one is audio chord estimation |
|---|
| 0:07:55 | so in that sense a chord is just another tag |
|---|
| 0:07:58 | but a very specialised term |
|---|
| 0:07:59 | and it helps people understand music and people work on it a lot |
|---|
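The "chord as a specialised tag" idea can be sketched as template matching on a chroma vector. This is a generic illustration, not any MIREX entry; the templates, labels, and scoring are my own minimal choices:

```python
import numpy as np

PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_templates():
    """Build 24 binary templates: 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        maj = np.zeros(12)
        maj[[root, (root + 4) % 12, (root + 7) % 12]] = 1.0
        mnr = np.zeros(12)
        mnr[[root, (root + 3) % 12, (root + 7) % 12]] = 1.0
        templates[PITCHES[root] + ':maj'] = maj
        templates[PITCHES[root] + ':min'] = mnr
    return templates

def estimate_chord(chroma):
    """Tag a 12-bin chroma frame with the template of highest cosine similarity."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_score = None, -1.0
    for name, t in chord_templates().items():
        score = chroma @ t / (np.linalg.norm(chroma) * np.linalg.norm(t) + 1e-12)
        if score > best_score:
            best, best_score = name, score
    return best

# a chroma frame with energy on C, E and G should be tagged C:maj
frame = np.zeros(12)
frame[[0, 4, 7]] = [1.0, 0.8, 0.9]
label = estimate_chord(frame)
```

Real systems replace the binary templates with learned models and add temporal smoothing, but the tagging structure is the same.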
| 0:08:03 | um something else that's happened and has been very active this year |
|---|
| 0:08:06 | um is a lot of work in separation and analysis |
|---|
| 0:08:09 | and there are very many different approaches |
|---|
| 0:08:13 | so this particular um graphical model um |
|---|
| 0:08:17 | is from a paper um |
|---|
| 0:08:19 | um |
|---|
| 0:08:21 | by our friends in France all right |
|---|
| 0:08:23 | and it shows um a sequence of notes along the top so in this case they have a score |
|---|
| 0:08:27 | and know what's being played and that's hard information to get |
|---|
| 0:08:30 | and then they're generating um |
|---|
| 0:08:33 | um data about the uh harmonics |
|---|
| 0:08:36 | um from there so you have the amplitude |
|---|
| 0:08:39 | the frequency and the variance of the gaussian in the spectral domain |
|---|
| 0:08:43 | oops sorry that get combined |
|---|
| 0:08:45 | and then you have the observables so these are the spectral slices |
|---|
| 0:08:49 | and what you're trying to do |
|---|
| 0:08:51 | um given the note sequence you have um |
|---|
| 0:08:53 | i'm sorry |
|---|
| 0:08:55 | is build um or find these |
|---|
| 0:08:58 | um emission probabilities |
|---|
| 0:09:00 | that describe the music |
|---|
| 0:09:01 | and from that you can do a lot of um very interesting work |
|---|
| 0:09:05 | um you can do things like um tagging which i mentioned for things like uh emotion and genre |
|---|
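The generative idea described here, notes emitting Gaussian bumps in the spectral domain with an amplitude, a centre frequency, and a variance, can be sketched as follows. The parametrisation is my own toy version, not the paper's model:

```python
import numpy as np

def spectral_slice(f0, n_harmonics, amps, var, freqs):
    """Expected magnitude spectrum for one note: a Gaussian bump per harmonic."""
    slice_ = np.zeros_like(freqs, dtype=float)
    for h in range(1, n_harmonics + 1):
        centre = h * f0                            # harmonic frequency
        slice_ += amps[h - 1] * np.exp(-(freqs - centre) ** 2 / (2.0 * var))
    return slice_

def log_likelihood(observed, expected, noise_var=1e-2):
    """Gaussian emission log-probability of an observed slice given the model."""
    return -0.5 * np.sum((observed - expected) ** 2) / noise_var

freqs = np.linspace(0.0, 2000.0, 2001)             # 1 Hz grid
model = spectral_slice(220.0, 4, [1.0, 0.5, 0.25, 0.125], 50.0, freqs)
peak_bin = int(np.argmax(model))                   # strongest harmonic
ll_self = log_likelihood(model, model)             # perfect match scores 0
```

Fitting the amplitudes and variances from data, per note, is what "finding the emission probabilities" amounts to.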
| 0:09:10 | and uh um something that's kind of dear to my heart but shows uh the kind of work |
|---|
| 0:09:15 | that's being done in this area |
|---|
| 0:09:16 | um is some work on morphing um |
|---|
| 0:09:19 | and the question that um |
|---|
| 0:09:21 | um Caetano and Rodet wanted to ask was |
|---|
| 0:09:24 | what's the right way to think about um audio perception |
|---|
| 0:09:27 | and in morphing |
|---|
| 0:09:29 | and so if you do morphing correctly |
|---|
| 0:09:31 | the |
|---|
| 0:09:33 | the path in feature space should be a line |
|---|
| 0:09:35 | so if you're morphing between one position and another position |
|---|
| 0:09:38 | that feature moves along a line in the physical domain |
|---|
| 0:09:40 | and you want the same sort of thing to happen in the auditory domain |
|---|
| 0:09:44 | so |
|---|
| 0:09:44 | the |
|---|
| 0:09:45 | um |
|---|
| 0:09:46 | the graph that's shown here on the left is of poor quality but just to give you a sense of |
|---|
| 0:09:50 | it |
|---|
| 0:09:50 | there's a range of uh line spectral frequency envelopes |
|---|
| 0:09:56 | and then on the right hand side are |
|---|
| 0:09:58 | all the perceptual measures that have been used there have been calculated based on these |
|---|
| 0:10:03 | on these uh LSFs |
|---|
| 0:10:05 | and what they're doing is looking for one that's a straight line which you can see in the |
|---|
| 0:10:08 | middle there |
|---|
| 0:10:09 | and um some pieces work better than others and i think that research is still being |
|---|
| 0:10:14 | pursued |
|---|
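The "straight line in feature space" idea is easy to illustrate with line spectral frequencies: any convex combination of two valid (strictly ascending) LSF vectors is itself strictly ascending, so every intermediate morph step corresponds to a stable all-pole filter. The vectors below are illustrative, not from the paper:

```python
import numpy as np

def morph_lsf(lsf_a, lsf_b, alpha):
    """Interpolate two LSF vectors; alpha=0 gives lsf_a, alpha=1 gives lsf_b."""
    lsf_a, lsf_b = np.asarray(lsf_a), np.asarray(lsf_b)
    return (1.0 - alpha) * lsf_a + alpha * lsf_b

lsf_a = np.array([0.3, 0.8, 1.4, 2.1, 2.7])   # e.g. one vowel, radians in (0, pi)
lsf_b = np.array([0.5, 1.0, 1.8, 2.4, 2.9])   # e.g. another vowel

# sample the straight-line path between the two sounds
path = [morph_lsf(lsf_a, lsf_b, a) for a in np.linspace(0.0, 1.0, 11)]

# every intermediate vector stays strictly ascending, hence a valid LSF set
all_valid = all(np.all(np.diff(p) > 0) for p in path)
```

The research question in the talk is which feature makes the *perceptual* path equally straight; LSFs are just one candidate with this convenient stability property.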
| 0:10:17 | right so uh |
|---|
| 0:10:18 | the audio and acoustic signal processing TC |
|---|
| 0:10:22 | covers quite a wide range of areas um |
|---|
| 0:10:25 | which are |
|---|
| 0:10:26 | well |
|---|
| 0:10:27 | i have to say that to me they're exciting and i hope you feel also that same excitement about |
|---|
| 0:10:32 | the technologies that are being developed |
|---|
| 0:10:34 | and i think we see trends that a lot of these have been in the laboratory |
|---|
| 0:10:39 | for many years |
|---|
| 0:10:41 | and now starting to come to the point of applications industrial applications |
|---|
| 0:10:44 | and we heard about some of these in the plenary |
|---|
| 0:10:47 | and and in that kind of context |
|---|
| 0:10:50 | if we look at uh the research that we do |
|---|
| 0:10:53 | um i ask the question of how much of it is driven by |
|---|
| 0:10:57 | uh the desires we have for exciting applications |
|---|
| 0:11:00 | and how much of it is fundamental how much of it |
|---|
| 0:11:03 | underpins |
|---|
| 0:11:04 | the |
|---|
| 0:11:04 | technology with good algorithmic research |
|---|
| 0:11:08 | um so i ask you you know is there a happy marriage here |
|---|
| 0:11:14 | and uh i hope the Duke and Duchess of Cambridge will forgive me for using that photograph |
|---|
| 0:11:19 | uh but there is a serious point behind this um but before we come to the serious point |
|---|
| 0:11:28 | um |
|---|
| 0:11:29 | so uh of course Prince William is very very pleased um having uh now found his very fine bride |
|---|
| 0:11:41 | so he's maximised his expectations |
|---|
| 0:11:44 | um and uh had a very uh happy day |
|---|
| 0:11:48 | then coming back to something a little bit more serious i think um things which look good have to |
|---|
| 0:11:54 | be underpinned by |
|---|
| 0:11:56 | excellence |
|---|
| 0:11:57 | in uh algorithmic and fundamental research |
|---|
| 0:12:00 | so if there is a trend perhaps |
|---|
| 0:12:02 | towards things that look great |
|---|
| 0:12:04 | let's just not lose sight of the fact that the power |
|---|
| 0:12:08 | behind them uh is |
|---|
| 0:12:09 | uh the algorithms that we do |
|---|
| 0:12:12 | okay |
|---|
| 0:12:13 | so one of the areas of algorithmic research which is very hot and has been for a long time |
|---|
| 0:12:18 | is in uh array signal processing as applied to |
|---|
| 0:12:21 | microphones maybe also loudspeaker arrays |
|---|
| 0:12:25 | and here we see um a number of applications hearing aids has been very busy for a long time |
|---|
| 0:12:31 | and has a |
|---|
| 0:12:32 | uh many applications as well as excellent underpinning technology |
|---|
| 0:12:36 | i do see now a big branch out into the living room |
|---|
| 0:12:40 | and the living room means TV |
|---|
| 0:12:43 | it means entertainment perhaps it means an Xbox 360 with a Kinect |
|---|
| 0:12:47 | a microphone array uh perhaps it means Sky TV |
|---|
| 0:12:51 | and so these are new applications which are really coming on stream now |
|---|
| 0:12:55 | and uh i think we'll start to shape |
|---|
| 0:12:58 | the way that we do research |
|---|
| 0:13:00 | the tasks haven't changed that much we still want to do localization we still want to do tracking |
|---|
| 0:13:05 | we still want to extract the desired source from any interference |
|---|
| 0:13:08 | uh be that noise or other talkers |
|---|
| 0:13:11 | um and then a new task a new task is to try to learn something about the acoustic |
|---|
| 0:13:16 | environment |
|---|
| 0:13:18 | uh by inferring it from the multichannel signals that we can obtain with the microphone array |
|---|
| 0:13:24 | and this gives us additional prior information on which we can condition estimation |
|---|
| 0:13:30 | um |
|---|
| 0:13:31 | another issue is what kind of microphone array should we use and how can we understand how it's |
|---|
| 0:13:36 | gonna behave |
|---|
| 0:13:38 | people started off perhaps looking at linear arrays |
|---|
| 0:13:41 | um |
|---|
| 0:13:41 | certainly extending into planar and cylindrical and spherical even distributed arrays that don't really have any |
|---|
| 0:13:48 | geometry |
|---|
| 0:13:50 | and uh the design of such arrays including the spacing |
|---|
| 0:13:53 | of microphone elements and the orientation uh is uh an important and expanding topic i think |
|---|
| 0:13:59 | people started off with linear arrays |
|---|
| 0:14:01 | um |
|---|
| 0:14:02 | a bunch of microphones in a line |
|---|
| 0:14:04 | perhaps uh this is the well-known Eigenmike from mh acoustics |
|---|
| 0:14:08 | uh thirty-two sensors on the surface of a rigid sphere uh eight centimetres or so |
|---|
| 0:14:13 | from the little laboratory prototypes |
|---|
| 0:14:17 | they come now into real products you can buy |
|---|
| 0:14:20 | and uh Kinect your TV sets Sky TV |
|---|
| 0:14:23 | have |
|---|
| 0:14:24 | uh the opportunity to include microphone arrays |
|---|
| 0:14:27 | for relatively low cost |
|---|
| 0:14:28 | uh such that you can communicate uh using your living room equipment |
|---|
| 0:14:33 | um |
|---|
| 0:14:34 | for a very low cost |
|---|
| 0:14:35 | addition to |
|---|
| 0:14:37 | the communications hardware as well |
|---|
| 0:14:39 | and the challenge here is that you're probably sitting far away from the microphone |
|---|
| 0:14:44 | so uh this is going to be i think a really hot application for us |
|---|
| 0:14:49 | in the future |
|---|
| 0:14:52 | interestingly uh people are still doing fundamental research so i'm pleased to see that and here's a paper i |
|---|
| 0:14:57 | picked out uh |
|---|
| 0:14:58 | i can't say at random but it caught my eye |
|---|
| 0:15:01 | um here's a problem given N sources and M microphones |
|---|
| 0:15:06 | where should you put the microphones |
|---|
| 0:15:09 | and uh in this work which is some uh work i spotted from another group |
|---|
| 0:15:15 | uh given a planar microphone array |
|---|
| 0:15:17 | they present some analysis which enables one to predict |
|---|
| 0:15:20 | the directivity index obtained for different geometries and therefore obviously then allows optimisation |
|---|
| 0:15:26 | of those geometries |
|---|
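The kind of analysis mentioned, predicting the directivity index for a given geometry, can be sketched for a simple delay-and-sum beamformer under the standard spherically isotropic (diffuse) noise model. The geometry and frequency below are illustrative, not from the cited paper:

```python
import numpy as np

def directivity_index(positions, look_dir, freq, c=343.0):
    """DI in dB for a unit-weight delay-and-sum beamformer steered to look_dir."""
    positions = np.asarray(positions, dtype=float)
    k = 2.0 * np.pi * freq / c
    # far-field steering vector towards look_dir (unit vector)
    delays = positions @ np.asarray(look_dir)
    d = np.exp(1j * k * delays)
    w = d / len(d)                                  # delay-and-sum weights
    # diffuse-field coherence: Gamma_ij = sin(k r_ij) / (k r_ij)
    r = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    gamma = np.sinc(k * r / np.pi)                  # np.sinc(x) = sin(pi x)/(pi x)
    df = np.abs(w.conj() @ d) ** 2 / np.real(w.conj() @ gamma @ w)
    return 10.0 * np.log10(df)

# broadside linear array of four microphones, 5 cm spacing
mics = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
di_array = directivity_index(mics, look_dir=[0.0, 1.0, 0.0], freq=2000.0)
di_single = directivity_index(mics[:1], look_dir=[0.0, 1.0, 0.0], freq=2000.0)
```

Optimising geometry then just means searching over `positions` for the highest DI, which is the spirit of the analysis described above.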
| 0:15:29 | okay so source separation is uh another hot topic and has been for a while |
|---|
| 0:15:34 | i thought i should say that obviously trends |
|---|
| 0:15:37 | start somewhere |
|---|
| 0:15:38 | the trend |
|---|
| 0:15:39 | has to begin with the trend setter |
|---|
| 0:15:42 | and i put this photograph up of uh Colin Cherry |
|---|
| 0:15:45 | um simply because i think he used to have the office which is above my office now so |
|---|
| 0:15:50 | i also feel some kind of uh proximity effect |
|---|
| 0:15:53 | um |
|---|
| 0:15:54 | and uh his definition of the cocktail party problem in his nineteen-fifties book on human communication is often |
|---|
| 0:16:01 | quoted in people's papers |
|---|
| 0:16:03 | um and the early experiments were asking the question as to the behavior of listeners |
|---|
| 0:16:08 | when they were receiving two almost simultaneous signals |
|---|
| 0:16:11 | and uh |
|---|
| 0:16:12 | he called that the cocktail party problem |
|---|
| 0:16:14 | and the picture here i put it up on purpose because i don't think many people would really have a |
|---|
| 0:16:19 | good image of what a cocktail party was in nineteen fifty |
|---|
| 0:16:25 | and so i i guess it looks a bit different now a |
|---|
| 0:16:29 | but anyway |
|---|
| 0:16:30 | uh so |
|---|
| 0:16:31 | progress in this area has led us to be able to handle cases where we have both determined |
|---|
| 0:16:36 | and underdetermined and overdetermined scenarios |
|---|
| 0:16:39 | and clustering has been a very effective technique |
|---|
| 0:16:42 | uh the permutation |
|---|
| 0:16:44 | uh problem |
|---|
| 0:16:46 | has been addressed uh with some great successes as well |
|---|
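One common recipe for the permutation problem (a generic sketch, not any specific paper's method) is to align the per-frequency separated outputs by correlating their magnitude envelopes with a running reference:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

n_src, n_frames, n_bins = 2, 200, 16
true_env = rng.random((n_src, n_frames))          # ground-truth source envelopes

# simulate per-bin ICA outputs: same sources, unknown ordering per frequency bin
perms, bins = [], []
for _ in range(n_bins):
    p = rng.permutation(n_src)
    perms.append(p)
    bins.append(true_env[p] + 0.05 * rng.standard_normal((n_src, n_frames)))

def align(bins):
    """Per bin, pick the permutation maximising envelope correlation with a reference."""
    ref = bins[0].copy()                          # first bin defines the ordering
    fixed = [np.arange(bins[0].shape[0])]
    for b in bins[1:]:
        best_p, best_c = None, -np.inf
        for p in permutations(range(b.shape[0])):
            c = sum(np.corrcoef(ref[i], b[list(p)][i])[0, 1]
                    for i in range(b.shape[0]))
            if c > best_c:
                best_p, best_c = np.array(p), c
        fixed.append(best_p)
        ref = 0.9 * ref + 0.1 * b[best_p]         # smooth running reference
    return fixed

fixed = align(bins)
# the recovered permutations should make every bin consistent with bin 0
consistent = all(np.array_equal(perms[k][fixed[k]], perms[0][fixed[0]])
                 for k in range(n_bins))
```

Real systems use smarter correlation structure (e.g. across neighbouring bins and direction-of-arrival cues), but the envelope-correlation core is the same.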
| 0:16:49 | and now we're starting to see results in the practical context where we have reverberation as well |
|---|
| 0:16:56 | the uh usual effect of reverberation is talked about in the context |
|---|
| 0:17:00 | um of dereverberation algorithms for speech enhancement |
|---|
| 0:17:04 | and uh this is something that i've uh myself tried to address |
|---|
| 0:17:08 | and uh perhaps we're now at the stage where there is a push to take some of the |
|---|
| 0:17:13 | algorithms from the laboratory and start to roll them out into real world applications |
|---|
| 0:17:19 | and we'll then learn whether they work or not |
|---|
| 0:17:22 | and uh we have to address the cases which are both single channel and multichannel |
|---|
| 0:17:27 | uh often by using acoustic channel inversion if we can estimate the acoustic channel |
|---|
| 0:17:33 | and although |
|---|
| 0:17:35 | this slide is |
|---|
| 0:17:35 | uh titled speech enhancement of course reverberation |
|---|
| 0:17:39 | uh is widely relevant |
|---|
| 0:17:41 | both positively and with negative effects also in music so let's not lose sight of that |
|---|
| 0:17:48 | the other factor which i wanted to touch on here was synergy |
|---|
| 0:17:52 | so |
|---|
| 0:17:53 | um interdisciplinary research is often a favoured modality |
|---|
| 0:17:57 | and in our community we can see some benefits coming from |
|---|
| 0:18:01 | cross fertilisation of different topic areas |
|---|
| 0:18:04 | for example |
|---|
| 0:18:06 | uh dereverberation and blind source separation |
|---|
| 0:18:09 | and we start to see papers where |
|---|
| 0:18:11 | these are jointly |
|---|
| 0:18:13 | uh addressed with some uh good leverage from both |
|---|
| 0:18:17 | uh types of techniques |
|---|
| 0:18:19 | equally |
|---|
| 0:18:20 | speech uh dereverberation coupled with speech recognition |
|---|
| 0:18:25 | where |
|---|
| 0:18:26 | a classical speech recognizer is enhanced |
|---|
| 0:18:29 | uh such that it has knowledge of the models of clean speech but also |
|---|
| 0:18:33 | has models for the reverberation |
|---|
| 0:18:36 | and by combining these |
|---|
| 0:18:37 | is able to make a big improvement in word accuracy |
|---|
| 0:18:45 | so i want to talk a bit about a recurring theme that i've been seeing over the last two years |
|---|
| 0:18:49 | um |
|---|
| 0:18:50 | both in this community and elsewhere but i thought i'd mention it here first |
|---|
| 0:18:55 | and that's about sparsity |
|---|
| 0:18:56 | um |
|---|
| 0:18:57 | and no we're not talking about my hair here |
|---|
| 0:19:00 | um |
|---|
| 0:19:03 | the |
|---|
| 0:19:03 | the first place i saw this um |
|---|
| 0:19:05 | was in the matching pursuit work that was presented here in ninety-seven i think it was first done in |
|---|
| 0:19:10 | you know the signal processing |
|---|
| 0:19:12 | uh transactions in ninety-three |
|---|
| 0:19:14 | and um at the time i thought it was interesting but a dumb idea |
|---|
| 0:19:18 | um |
|---|
| 0:19:20 | and so now i'm correcting myself |
|---|
| 0:19:21 | um but it's shown up in a number of interesting places um in the work that has been done |
|---|
| 0:19:27 | um at ICASSP and elsewhere |
|---|
| 0:19:28 | um compressed sensing a few years ago um was probably the best example |
|---|
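Matching pursuit, mentioned above, is easy to sketch: greedily pick the dictionary atom most correlated with the residual and subtract its contribution. The dictionary here is a toy random one rather than Mallat and Zhang's Gabor dictionary:

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_atoms = 64, 256                              # overcomplete: 256 atoms in 64-D
D = rng.standard_normal((n, n_atoms))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms

# signal built from two atoms, so a 2-sparse representation exists
x = 3.0 * D[:, 10] - 2.0 * D[:, 100]

def matching_pursuit(x, D, n_iter):
    """Greedy sparse decomposition of x over dictionary D."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))          # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]             # remove its contribution
    return coeffs, residual

coeffs, residual = matching_pursuit(x, D, n_iter=10)
rel_err = np.linalg.norm(residual) / np.linalg.norm(x)
```

After a handful of iterations the residual is small and the two largest coefficients sit on the atoms the signal was built from.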
| 0:19:33 | um |
|---|
| 0:19:34 | but in this community um |
|---|
| 0:19:36 | and we've seen it in new technologies such as deep belief networks |
|---|
| 0:19:41 | um |
|---|
| 0:19:42 | sparsity has been a big part of the work that's been done on deep belief networks and |
|---|
| 0:19:46 | in machine learning |
|---|
| 0:19:47 | i think that's been um you know interesting |
|---|
| 0:19:50 | and |
|---|
| 0:19:51 | um in a lot of papers that we saw this year um |
|---|
| 0:19:54 | L1 regularization is a way of providing solutions that make sense |
|---|
| 0:20:00 | um |
|---|
| 0:20:01 | when you have a very um overdetermined um very complex um basis set |
|---|
| 0:20:06 | and so i |
|---|
| 0:20:07 | i titled this slide sparsity uh but it's probably better described as sparsity |
|---|
| 0:20:12 | in combination with um overcomplete basis sets |
|---|
| 0:20:16 | and i think that combination is interesting |
|---|
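The L1 story can be made concrete: with an overcomplete dictionary, least squares alone is ill-posed, but adding an L1 penalty (the lasso) selects a sparse, sensible solution. The sketch below uses plain iterative shrinkage-thresholding (ISTA) on toy data:

```python
import numpy as np

rng = np.random.default_rng(2)

n, n_atoms = 32, 128                              # overcomplete: 128 atoms in 32-D
D = rng.standard_normal((n, n_atoms))
D /= np.linalg.norm(D, axis=0)

truth = np.zeros(n_atoms)
truth[[5, 50]] = [1.5, -1.0]                      # sparse ground truth
y = D @ truth

def ista(y, D, lam, n_iter=500):
    """ISTA for min_c ||y - Dc||^2 / 2 + lam * ||c||_1."""
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = c + D.T @ (y - D @ c) / L             # gradient step
        c = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return c

c = ista(y, D, lam=0.05)
support = set(np.flatnonzero(np.abs(c) > 0.1))
```

With no penalty there are infinitely many exact solutions; the L1 term picks the one concentrated on the few atoms that actually generated the signal (up to a small shrinkage bias of order `lam`).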
| 0:20:18 | oh one example of that um was talked about a little bit ago |
|---|
| 0:20:21 | in the session before this |
|---|
| 0:20:22 | um in the work |
|---|
| 0:20:24 | presented there |
|---|
| 0:20:26 | um using a cortical representation to um |
|---|
| 0:20:30 | um |
|---|
| 0:20:31 | model sound |
|---|
| 0:20:32 | and |
|---|
| 0:20:33 | and the cortex is probably the original um |
|---|
| 0:20:36 | uh sparse representation |
|---|
| 0:20:37 | um |
|---|
| 0:20:38 | it predates all of us |
|---|
| 0:20:40 | and the idea is that you wanna represent sound with the least amount of biological energy |
|---|
| 0:20:46 | and what seems to work well there is to use spikes that |
|---|
| 0:20:49 | represent uh very um |
|---|
| 0:20:52 | distinct sound atoms and how they're all put together is still a matter of discussion |
|---|
| 0:20:56 | but uh |
|---|
| 0:20:57 | i think it has been and is gonna be you know interesting |
|---|
| 0:20:59 | and the way uh they have been using that is to |
|---|
| 0:21:03 | take noisy speech and put it through this kind of um this very overcomplete basis set |
|---|
| 0:21:09 | and then |
|---|
| 0:21:10 | um |
|---|
| 0:21:12 | filter it |
|---|
| 0:21:13 | keeping the regions |
|---|
| 0:21:15 | that are |
|---|
| 0:21:17 | likely to contain speech |
|---|
| 0:21:19 | and so |
|---|
| 0:21:20 | in a sense |
|---|
| 0:21:21 | um it's a wiener filter but it's in a very rich environment |
|---|
| 0:21:25 | where it's very easy to separate um speech from noise and things like that |
|---|
| 0:21:28 | and what's on the bottom is noisy speech then the kind of filter that makes sense for speech |
|---|
| 0:21:32 | which for example has a lot of energy at around a four hertz modulation rate |
|---|
| 0:21:36 | and then the clean speech on uh on the top |
|---|
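The "Wiener filter in a rich representation" point can be sketched abstractly: in any transform domain where speech and noise separate well, the per-coefficient Wiener gain is snr / (snr + 1), near one where speech dominates and near zero elsewhere. The 2-D grid below is a stand-in for a cortical or modulation representation, not an implementation of the work described:

```python
import numpy as np

# toy "rich" representation: an 8x8 grid of coefficients where speech
# energy is concentrated in a few cells and noise is spread everywhere
speech_power = np.zeros((8, 8))
speech_power[2:4, 1:5] = 4.0                      # speech occupies few cells
noise_power = 0.5 * np.ones((8, 8))               # diffuse noise floor

snr = speech_power / noise_power
gain = snr / (snr + 1.0)                          # per-coefficient Wiener gain

noisy = speech_power + noise_power                # power of the mixture
enhanced = gain * noisy                           # suppress noise-only cells
```

The richer and sparser the representation, the fewer cells speech and noise share, and the closer this gain gets to an ideal pass/stop mask.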
| 0:21:40 | um |
|---|
| 0:21:40 | the deep belief networks are interesting um i think um for similar reasons this all ties together |
|---|
| 0:21:46 | um |
|---|
| 0:21:46 | what's shown on the left hand side is um |
|---|
| 0:21:49 | um |
|---|
| 0:21:50 | a little bit of a waveform that's been applied to a |
|---|
| 0:21:54 | restricted boltzmann machine |
|---|
| 0:21:56 | which is just a way of saying that they learn a weight matrix |
|---|
| 0:21:59 | that transforms the input |
|---|
| 0:22:01 | on the bottom here |
|---|
| 0:22:03 | to an output |
|---|
| 0:22:04 | uh so on top there |
|---|
| 0:22:05 | through um a weight matrix |
|---|
| 0:22:08 | and there's a little bit of a nonlinearity there |
|---|
| 0:22:11 | and you can learn these things in a way that um |
|---|
| 0:22:14 | um |
|---|
| 0:22:16 | can reconstruct the input so they find |
|---|
| 0:22:18 | the basis vectors um the weight vectors |
|---|
| 0:22:23 | so that given these hidden units they can reconstruct the visible units |
|---|
| 0:22:28 | um |
|---|
| 0:22:28 | and they've been doing this in the image processing domain for a long time |
|---|
| 0:22:32 | and these are some results |
|---|
| 0:22:33 | in the waveform domain that are new this year |
|---|
| 0:22:36 | and there's a bunch of um things that often look like um |
|---|
| 0:22:40 | uh gabors of various sizes |
|---|
| 0:22:42 | but the interesting thing is you start to see some very complex features so this is in the |
|---|
| 0:22:46 | spectral domain |
|---|
| 0:22:47 | and you've got these things that have two frequency peaks |
|---|
| 0:22:49 | which you know might be akin to formants |
|---|
| 0:22:52 | um |
|---|
| 0:22:53 | and so they're applying that to speech recognition and i think that's an interesting direction |
|---|
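The description above, a learned weight matrix plus a nonlinearity that can reconstruct its input, is essentially a restricted Boltzmann machine. A bare-bones Bernoulli RBM trained with one-step contrastive divergence (CD-1) on toy binary patterns looks like this; real waveform models use Gaussian visible units and far more care:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid = 16, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))    # the learned weight matrix
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

# toy dataset: two complementary binary patterns
data = np.array([[1, 1, 1, 1, 0, 0, 0, 0] * 2,
                 [0, 0, 0, 0, 1, 1, 1, 1] * 2], dtype=float)

lr = 0.1
for _ in range(2000):                             # CD-1 training loop
    v0 = data
    ph0 = sigmoid(v0 @ W + b_hid)                 # hidden probabilities (up pass)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_vis)               # reconstruction (down pass)
    ph1 = sigmoid(pv1 @ W + b_hid)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)

# after training, an up-down pass should reproduce the input
recon = sigmoid(sigmoid(data @ W + b_hid) @ W.T + b_vis)
recon_err = np.abs(recon - data).mean()
```

The rows of `W` play the role of the learned basis vectors (the Gabor-like and formant-like features in the talk's figures).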
| 0:22:58 | i'm gonna go out on a limb here because um |
|---|
| 0:23:00 | i think the reason that um |
|---|
| 0:23:02 | sparsity is important |
|---|
| 0:23:04 | is because it gives us a way of representing things that we can't |
|---|
| 0:23:08 | do as well in other domains |
|---|
| 0:23:10 | so we grew up with the fourier transform domain and what's on the left hand side are |
|---|
| 0:23:14 | two basis functions |
|---|
| 0:23:15 | it's a basis with just two frequencies |
|---|
| 0:23:18 | and with those two basis functions you can represent the entire subspace |
|---|
| 0:23:22 | so the point that's shown there can be anywhere in that subspace and you can do all those things |
|---|
| 0:23:26 | and it's a very rich representation as we all know |
|---|
| 0:23:29 | you know as long as you satisfy the nyquist criterion you can do anything |
|---|
| 0:23:33 | but |
|---|
| 0:23:34 | i think that's the problem with |
|---|
| 0:23:35 | with |
|---|
| 0:23:36 | a dense representation like that |
|---|
| 0:23:37 | an alternative is you look at something like an overcomplete basis |
|---|
| 0:23:41 | and just pick out elements that you've seen before |
|---|
| 0:23:44 | so here are some synthetic formants |
|---|
| 0:23:47 | but the way i like to think about these things working is that |
|---|
| 0:23:50 | if you train um if you build a system that exploits um sparseness |
|---|
| 0:23:55 | whether it be a deep belief network whether it be matching pursuit |
|---|
| 0:23:58 | um whatever your favourite implementation technology is |
|---|
| 0:24:01 | you can learn patterns that look like these formants and so what's on the left is one vowel |
|---|
| 0:24:06 | with different vocal tract lengths |
|---|
| 0:24:08 | and uh on the right hand side is a different vowel with different vocal tract lengths |
|---|
| 0:24:13 | and |
|---|
| 0:24:15 | the system on the right with a sparse overcomplete representation is just gonna learn these kinds of things |
|---|
| 0:24:20 | it's gonna learn vowels with different vocal tract lengths |
|---|
| 0:24:22 | it's not gonna learn the entire space |
|---|
| 0:24:24 | and so if you wanna process things |
|---|
| 0:24:26 | if you're working in this space |
|---|
| 0:24:28 | then only things that are valid sounds that you've seen before |
|---|
| 0:24:31 | will be represented by the sparse basis |
|---|
| 0:24:33 | uh basis vectors |
|---|
| 0:24:34 | and it can do |
|---|
| 0:24:35 | useful things and so i think that's why it's gonna be an important trend and an important direction for |
|---|
| 0:24:39 | our community |
|---|
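The sparse-coding idea discussed above can be made concrete with a small sketch. Below is a minimal matching pursuit over a random overcomplete dictionary; the dictionary, sizes, and signal here are invented for illustration and are not from the talk.

```python
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    """Greedily approximate x as a sparse combination of the
    unit-norm columns (atoms) of the overcomplete dictionary D."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual              # correlate every atom with the residual
        k = int(np.argmax(np.abs(corr)))   # pick the best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove that atom's contribution
    return coeffs, residual

# overcomplete dictionary: 32-dim signals, 128 random unit-norm atoms
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 128))
D /= np.linalg.norm(D, axis=0)

# a signal that truly is sparse in D: only 3 active atoms
true_coeffs = np.zeros(128)
true_coeffs[[5, 40, 100]] = [1.0, -0.5, 0.8]
x = D @ true_coeffs

coeffs, residual = matching_pursuit(x, D)
```

Because the signal lives on a tiny subset of atoms, a few greedy steps shrink the residual dramatically, which is the "only valid sounds get represented" property described in the discussion.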
| 0:24:44 | so one of the things we wanted to do is to reach out to different sectors of the topic |
|---|
| 0:24:48 | area and put in some hopefully interesting quotations from |
|---|
| 0:24:53 | experts in those fields |
|---|
| 0:24:55 | and here's one that comes |
|---|
| 0:24:58 | from NTT, so here we have a telecommunications company |
|---|
| 0:25:01 | thank you for the contribution here |
|---|
| 0:25:04 | for this quote: remaining challenges in source separation |
|---|
| 0:25:08 | could include blind source separation for an unknown or dynamic |
|---|
| 0:25:12 | number of sources |
|---|
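For contrast with the quoted challenge (an unknown or dynamic number of sources), the classical fixed-count case can be sketched in a few lines: two-source instantaneous blind source separation via whitening plus a FastICA-style fixed point. The mixing matrix, sizes, and nonlinearity below are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
S = rng.uniform(-1.0, 1.0, size=(2, n))    # two independent sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # "unknown" mixing matrix
X = A @ S                                  # two observed mixtures

# step 1: whiten the mixtures
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# step 2: FastICA fixed point (kurtosis nonlinearity), one unit at a time
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        u = w @ Z
        w_new = (Z * u ** 3).mean(axis=1) - 3.0 * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflate against rows already found
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < 1e-10
        w = w_new
        if done:
            break
    W[i] = w

Y = W @ Z   # recovered sources, up to permutation and sign
```

The quoted challenge is exactly what this sketch cannot handle: the code hard-wires two sources, while real scenes have sources appearing and disappearing.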
| 0:25:14 | and this was illustrated with a photograph on the wall of the lab |
|---|
| 0:25:22 | so if we think about mixed-signal ICs |
|---|
| 0:25:27 | the guys that are working on those |
|---|
| 0:25:31 | functionalities |
|---|
| 0:25:32 | really support what we want to do |
|---|
| 0:25:34 | so i think it's important to listen to the hardware guys as well |
|---|
| 0:25:39 | so from Wolfson Microelectronics |
|---|
| 0:25:41 | Moore's law is driving dsp speed and memory capacity, enabling implementation of sophisticated dsp functions |
|---|
| 0:25:49 | resulting from years of research |
|---|
| 0:25:51 | the end user experience |
|---|
| 0:25:53 | maybe this is a wish rather than the reality of the moment |
|---|
| 0:25:56 | the end user experience is one of natural wideband voice communications devoid |
|---|
| 0:26:01 | of acoustic background noise and unwanted artifacts |
|---|
| 0:26:04 | seems to me like the hardware manufacturers are on our side |
|---|
| 0:26:09 | we heard a little bit this morning about the Xbox Kinect |
|---|
| 0:26:15 | thanks |
|---|
| 0:26:15 | for this contribution here: the applications of sound capture and enhancement and processing technologies shift |
|---|
| 0:26:23 | or rather, it's a paradigm shift |
|---|
| 0:26:24 | gradually from communications |
|---|
| 0:26:29 | which is where they originated |
|---|
| 0:26:31 | mostly towards recognition and building natural human-machine interfaces |
|---|
| 0:26:38 | and he highlights mobile devices |
|---|
| 0:26:41 | cars and living rooms |
|---|
| 0:26:42 | as key application areas |
|---|
| 0:26:45 | Malcolm, you get the last word |
|---|
| 0:26:46 | well, i don't know about the last word, but we have one more slide and we can decide whether |
|---|
| 0:26:50 | the last word comes from |
|---|
| 0:26:51 | Steve Jobs or from Lady Gaga |
|---|
| 0:26:54 | but in either case the message is the same: there are large commercial applications for the work that we're doing |
|---|
| 0:26:59 | it started with MP3, which enabled this market |
|---|
| 0:27:03 | but there's still a lot of things to be done in terms of finding music |
|---|
| 0:27:07 | adding to things, understanding |
|---|
| 0:27:09 | what people's needs are, so we really haven't talked about that very much |
|---|
| 0:27:12 | but |
|---|
| 0:27:14 | this is an information retrieval task, but not only an information retrieval task: people are looking for things that entertain |
|---|
| 0:27:18 | them, whether it be songs or music or whatever |
|---|
| 0:27:22 | these are audio signals, and working with them is an important thing to do |
|---|
| 0:27:25 | and so |
|---|
| 0:27:26 | i think both Lady Gaga and Steve Jobs can have a final word |
|---|
| 0:27:30 | so thank you |
|---|
| 0:27:39 | so |
|---|
| 0:27:40 | thank you |
|---|
| 0:27:41 | Malcolm and |
|---|
| 0:27:42 | Patrick |
|---|
| 0:27:43 | now we have very little time for discussion, but we certainly should not miss this opportunity |
|---|
| 0:27:49 | to hear other voices as well as the ones we mentioned |
|---|
| 0:27:52 | obviously these views are not completely balanced |
|---|
| 0:27:56 | how could they be |
|---|
| 0:27:58 | so maybe somebody in the forum would like to add |
|---|
| 0:28:01 | something, and we can |
|---|
| 0:28:03 | have a little more discussion |
|---|
| 0:28:06 | anybody |
|---|
| 0:28:08 | yeah |
|---|
| 0:28:13 | thank you for that great summary |
|---|
| 0:28:15 | i just want to add one more thing |
|---|
| 0:28:18 | we have two eyes and two ears, and they work together |
|---|
| 0:28:21 | and i think cross-modal issues are |
|---|
| 0:28:24 | likely to be very important |
|---|
| 0:28:27 | the eyes direct the ears and the ears direct the eyes and so on |
|---|
| 0:28:30 | likewise i think audio research and vision research should not |
|---|
| 0:28:35 | proceed separately |
|---|
| 0:28:37 | thanks |
|---|
| 0:28:38 | thank you very much for this comment |
|---|
| 0:28:41 | this is certainly something which we highly appreciate, and we always like to be in touch with the |
|---|
| 0:28:46 | multimedia guys who see audio as one of the media |
|---|
| 0:28:51 | but |
|---|
| 0:28:53 | certainly there are many applications where we are actually closely working |
|---|
| 0:28:58 | with vision people; just think about |
|---|
| 0:29:01 | acoustic source tracking |
|---|
| 0:29:03 | so if you want to track some acoustic sources |
|---|
| 0:29:06 | and the source falls silent, then you'd better use your camera |
|---|
| 0:29:11 | so there are |
|---|
| 0:29:12 | quite a few applications where it is quite natural to join forces |
|---|
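The audio-plus-camera tracking idea just mentioned is a sensor-fusion problem. As a toy illustration, here is a 1-D random-walk Kalman filter fusing two sensors, where the "audio" measurement drops out when the source goes silent; all noise figures and the scenario are invented for this sketch.

```python
import numpy as np

def fuse_track(audio_meas, video_meas, q=0.0025, r_audio=0.0025, r_video=0.04):
    """Toy 1-D random-walk Kalman filter fusing two sensors.
    A measurement of None means that sensor is unavailable
    (e.g. the microphone array while the source is silent)."""
    x, p = 0.0, 1.0                  # state estimate and its variance
    track = []
    for za, zv in zip(audio_meas, video_meas):
        p += q                       # predict step: random-walk process noise
        for z, r in ((za, r_audio), (zv, r_video)):
            if z is None:
                continue             # skip an unavailable sensor
            k = p / (p + r)          # Kalman gain
            x += k * (z - x)
            p *= 1.0 - k
        track.append(x)
    return np.array(track)

rng = np.random.default_rng(2)
n = 200
truth = np.cumsum(rng.normal(0.0, 0.05, n))        # slowly moving source
audio = [truth[i] + rng.normal(0.0, 0.05) if i < 100 else None
         for i in range(n)]                        # source goes silent halfway
video = [truth[i] + rng.normal(0.0, 0.2) for i in range(n)]

est = fuse_track(audio, video)
```

The fused track stays usable after the audio drops out because the (noisier) video channel keeps feeding the update step, which is exactly the "use your camera when the source is silent" point from the discussion.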
| 0:29:19 | i just want |
|---|
| 0:29:21 | to reinforce that; there was a nice paper, i don't remember who did it |
|---|
| 0:29:24 | where they were looking for joint sources |
|---|
| 0:29:26 | joint audiovisual sources, and i think that's |
|---|
| 0:29:29 | important and |
|---|
| 0:29:30 | it can be easier, i mean |
|---|
| 0:29:31 | the signals are no longer a big deal |
|---|
| 0:29:34 | it's easy to get the storage, computing power is pretty easy |
|---|
| 0:29:37 | it would be fun |
|---|
| 0:29:42 | following up on |
|---|
| 0:29:44 | the talks we've heard |
|---|
| 0:29:47 | is there any research |
|---|
| 0:29:49 | well, you spoke about binaural signal processing |
|---|
| 0:29:53 | binaural processing for musical signals |
|---|
| 0:29:59 | i don't know; so the question was whether there is any binaural music research |
|---|
| 0:30:03 | i don't know of any; i mean, people certainly worry about synthesizing |
|---|
| 0:30:08 | high-fidelity sound fields |
|---|
| 0:30:14 | there is, for example, a group working on synthesizing |
|---|
| 0:30:17 | sound fields that sound good no matter where you are |
|---|
| 0:30:20 | and so, you know, there is work with people at Stanford |
|---|
| 0:30:22 | who are very interested in computing, in creating 3D sound fields |
|---|
| 0:30:26 | for musical experiences |
|---|
| 0:30:29 | but i'm not sure exactly where it will go |
|---|
| 0:30:33 | i mean, if you'd asked me ten years ago whether we would have 5.1 speakers in |
|---|
| 0:30:36 | the living room |
|---|
| 0:30:37 | i would have said no |
|---|
| 0:30:38 | but |
|---|
| 0:30:39 | look what's happened |
|---|
| 0:30:40 | so we'd better be careful |
|---|
| 0:30:46 | anything else before lunch |
|---|
| 0:30:52 | okay, you talked about 5.1 speakers in the living room, but |
|---|
| 0:30:56 | we're seeing a lot of new algorithms that let us do microphone array processing |
|---|
| 0:31:01 | will we be seeing devices that let us do it |
|---|
| 0:31:03 | i mean, the Microsoft Kinect has a few microphones, and i've seen a few |
|---|
| 0:31:08 | cell phones that have multiple microphones for noise cancellation; will we have more devices that allow us to |
|---|
| 0:31:14 | run better processing algorithms |
|---|
| 0:31:16 | yeah, so the question was whether we will have devices that will have |
|---|
| 0:31:19 | the ability to allow us to implement these things |
|---|
| 0:31:24 | with APIs |
|---|
| 0:31:25 | and so on and so forth |
|---|
| 0:31:26 | i understand from this morning's talks that software |
|---|
| 0:31:30 | development kits will be available for the Kinect |
|---|
| 0:31:32 | and that could be a lot of fun |
|---|
| 0:31:34 | i think the hardware is there to enable us to do it, and |
|---|
| 0:31:38 | the key point of this, i think, is one of the trends that |
|---|
| 0:31:43 | we have seen, which is a move |
|---|
| 0:31:46 | in audio from single-channel to multichannel |
|---|
| 0:31:48 | that's been happening for a while, and there is no sign of it stopping |
|---|
| 0:31:52 | and so we would expect the facilities |
|---|
| 0:31:54 | the processing power |
|---|
| 0:31:56 | the interoperability and software development kits to come with that as well |
|---|
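The single-channel-to-multichannel trend described in this answer is usually exploited with array processing such as delay-and-sum beamforming. Here is a minimal frequency-domain sketch; the array geometry, sample rate, and test tone are invented for illustration.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Delay-and-sum beamformer: time-align each channel for a far-field
    source in `direction` (unit vector) and average.  Fractional-sample
    delays are applied exactly in the frequency domain."""
    n_mics, n_samples = signals.shape
    delays = mic_positions @ direction / c          # per-mic delay in seconds
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        spec = np.fft.rfft(signals[m])
        spec *= np.exp(2j * np.pi * freqs * delays[m])   # undo the propagation delay
        out += np.fft.irfft(spec, n=n_samples)
    return out / n_mics

# simulate a 440 Hz tone hitting a 4-mic linear array from endfire
fs, f0, n = 16000, 440.0, 2048
t = np.arange(n) / fs
s = np.sin(2 * np.pi * f0 * t)
mics = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
u = np.array([1.0, 0.0, 0.0])
taus = mics @ u / 343.0
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
X = np.stack([np.fft.irfft(np.fft.rfft(s) * np.exp(-2j * np.pi * freqs * tau), n=n)
              for tau in taus])

y = delay_and_sum(X, mics, u, fs)
```

Steering toward the true direction recovers the tone coherently, while sources from other directions add incoherently and are attenuated, which is the point of putting multiple microphones in a phone or a Kinect.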
| 0:32:05 | any other questions |
|---|
| 0:32:07 | comments |
|---|
| 0:32:09 | i have one |
|---|
| 0:32:10 | final remark which came to mind |
|---|
| 0:32:13 | increasingly |
|---|
| 0:32:15 | and i would like to put it as a challenge, because |
|---|
| 0:32:18 | there are sensor networks out there and they are |
|---|
| 0:32:21 | under discussion |
|---|
| 0:32:24 | in many papers, where nice |
|---|
| 0:32:28 | algorithms are provided, always based on the assumption that all the sensors are synchronised |
|---|
| 0:32:36 | this is a |
|---|
| 0:32:37 | tough problem actually, so |
|---|
| 0:32:39 | we feel in the audio community it would help us |
|---|
| 0:32:43 | a lot if somebody could really build devices which make sure that all the audio front ends in a |
|---|
| 0:32:49 | distributed network work |
|---|
| 0:32:51 | synchronously, or can be synchronised |
|---|
| 0:32:53 | the underlying problem is simply that |
|---|
| 0:32:57 | once you |
|---|
| 0:32:58 | correlate signals from different sensors that |
|---|
| 0:33:01 | have |
|---|
| 0:33:03 | not exactly synchronous clocks |
|---|
| 0:33:06 | then this |
|---|
| 0:33:08 | correlation |
|---|
| 0:33:09 | will fall apart |
|---|
| 0:33:11 | and |
|---|
| 0:33:11 | just look at all our optimization and all the adaptive filtering stuff that we have |
|---|
| 0:33:16 | it's always based on correlation, and |
|---|
| 0:33:18 | even higher-order statistics |
|---|
| 0:33:20 | but then |
|---|
| 0:33:22 | this problem has to be solved |
|---|
| 0:33:24 | and so if you want to do something really |
|---|
| 0:33:27 | good for us, then please solve this problem |
|---|
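The synchronisation problem raised here is easy to demonstrate numerically: even a tiny clock skew between two sensors destroys their correlation once the offset accumulates past a sample. The 50 ppm skew, sample rate, and white-noise source below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
n = 4 * fs                       # 4 seconds of signal
t = np.arange(n) / fs
src = rng.standard_normal(n)     # a broadband (white) source

# sensor A samples on a perfect clock; sensor B's clock runs 50 ppm fast,
# so B's sample instants slowly drift relative to A's
ppm = 50e-6
t_b = np.arange(n) / (fs * (1.0 + ppm))
b = np.interp(t_b, t, src)       # what sensor B actually records

def norm_corr(x, y):
    """Normalized correlation between two aligned windows."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

w = int(0.1 * fs)                    # 100 ms analysis windows
early = norm_corr(src[:w], b[:w])    # near t=0: almost no accumulated drift
late = norm_corr(src[-w:], b[-w:])   # after 4 s: about 3.2 samples of drift
```

At the start the two recordings are nearly identical, but after four seconds the accumulated offset of roughly 3.2 samples wipes out the sample-level correlation that adaptive filtering and correlation-based algorithms rely on, which is exactly the challenge posed above.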
| 0:33:32 | perhaps after lunch |
|---|
| 0:33:34 | after lunch, okay |
|---|
| 0:33:36 | thank you very much for attending |
|---|