Speech Transcript - BROADBAND DIRECTION ESTIMATION METHOD UTILIZING COMBINED PRESSURE AND ENERGY GRADIENTS FROM OPTIMIZED MICROPHONE ARRAY

0:00:13	I will give a presentation about a broadband direction estimation method
0:00:18	using combined pressure and energy gradients
0:00:21	and
0:00:22	this work has been done with
0:00:23	Ville Pulkki from
0:00:25	Aalto University
0:00:29	okay, well, here is the outline
0:00:31	of my presentation
0:00:33	first, a short introduction to this topic
0:00:36	and then some background
0:00:38	about the direction estimation
0:00:41	in the DirAC analysis
0:00:43	and also, uh,
0:00:45	I will present the microphone array which we have used to obtain the
0:00:48	signals for this analysis
0:00:52	and then, uh,
0:00:53	I will present the
0:00:54	proposed method
0:00:56	for direction estimation
0:00:58	which uses the combined
0:01:00	pressure and energy gradients
0:01:03	and also the microphone array which is optimized for this method
0:01:08	and then some evaluations and
0:01:11	finally the summary of this presentation
0:01:15	ah
0:01:15	well
0:01:17	the estimation of direction,
0:01:19	well, it serves a purpose in several applications
0:01:23	like source localization and beamforming
0:01:27	and, uh, also in, uh,
0:01:30	this kind of parametric spatial audio coding methods
0:01:34	and there is a huge,
0:01:36	or a large, range of
0:01:38	methods to estimate direction
0:01:41	like MUSIC
0:01:42	and ESPRIT
0:01:43	et cetera,
0:01:44	but here we are concentrating on
0:01:47	the, uh,
0:01:48	sound-intensity-based
0:01:50	methods
0:01:52	so
0:01:52	we are using that
0:01:54	for direction estimation
0:01:57	and
0:01:58	this kind of approach
0:02:00	has been used
0:02:02	with Directional Audio Coding
0:02:04	which is a
0:02:06	technique for recording and reproducing spatial sound
0:02:11	and, uh,
0:02:13	here in this figure you can see
0:02:15	one application,
0:02:17	teleconferencing,
0:02:19	where we have
0:02:20	some remote location
0:02:22	where there are some
0:02:23	talkers and a
0:02:24	microphone array which captures the sound
0:02:27	and then we do some
0:02:29	encoding and decoding
0:02:31	and
0:02:32	then we should have a somehow
0:02:34	spatialized teleconference at the other end
0:02:38	okay
0:02:42	uh
0:02:43	so, um, this
0:02:45	DirAC analysis is based on sound intensity vectors
0:02:50	which, uh,
0:02:52	represent the direction and magnitude of the
0:02:55	net flow of sound energy
0:02:58	and, uh,
0:02:59	these vectors are
0:03:01	computed as
0:03:03	pressure times particle velocity at one point of the sound field
0:03:08	and, uh,
0:03:09	the direction of arrival
0:03:11	is obtained
0:03:13	simply by taking the opposite,
0:03:16	the opposite direction of the
0:03:17	sound intensity vector
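
In symbols, the relation just described could be written roughly as follows. The notation is an assumption made here for illustration and is not taken from the slides: p is the sound pressure, u the particle velocity vector, I the intensity vector, and d the unit vector pointing towards the source.

```latex
\mathbf{I}(t) = p(t)\,\mathbf{u}(t),
\qquad
\hat{\mathbf{d}}_{\mathrm{DOA}}(t) = -\,\frac{\mathbf{I}(t)}{\lVert \mathbf{I}(t) \rVert}
```

In practice the product is usually averaged over a short time window or computed per frequency band before the direction is taken; the talk does not say which variant is used.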
0:03:20	and in our applications
0:03:23	related to DirAC we have used
0:03:26	B-format microphone signals
0:03:28	in this analysis
0:03:30	so these
0:03:31	signals consist of
0:03:34	one omnidirectional signal and three,
0:03:37	three dipoles
0:03:38	for the X, Y and Z
0:03:41	directions
0:03:42	so these dipoles,
0:03:44	they approximate the
0:03:46	particle velocities
0:03:50	and uh
0:03:54	uh
0:03:54	instead of using,
0:03:56	for instance, a SoundField microphone
0:03:59	or
0:03:59	another
0:04:01	kind of microphone for the B-format microphone signals, we have,
0:04:04	we have used this kind of,
0:04:06	uh,
0:04:08	microphone array of
0:04:09	four
0:04:10	omnidirectional microphones
0:04:13	which are placed close to one another
0:04:15	and, uh,
0:04:17	uh,
0:04:18	the horizontal B-format signals can be derived
0:04:21	from this kind of array
0:04:24	and the dipoles are
0:04:26	computed just by, you know,
0:04:28	taking a pressure gradient
0:04:30	from opposing microphones
0:04:32	so the X dipole and the Y dipole are just
0:04:35	the difference between microphones one and
0:04:37	two, and
0:04:38	so on
0:04:40	and, uh,
0:04:41	well,
0:04:42	and this
0:04:43	W signal, this omnidirectional signal, is
0:04:45	just an average over
0:04:47	all microphone signals here
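
The derivation of horizontal B-format-like signals from the square array can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the microphone numbering (mic 1 at +x, mic 2 at -x, mic 3 at +y, mic 4 at -y) and the absence of any dipole equalization are all assumptions.

```python
import numpy as np

def square_array_to_bformat(p1, p2, p3, p4):
    """Sketch: W, X, Y from four closely spaced omnidirectional microphones.

    Assumed (hypothetical) layout: mic 1 at +x, mic 2 at -x,
    mic 3 at +y, mic 4 at -y. No equalization of the dipoles is applied.
    """
    p1, p2, p3, p4 = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p4))
    w = (p1 + p2 + p3 + p4) / 4.0  # omnidirectional signal: average of all mics
    x = p1 - p2                    # pressure gradient along x (dipole)
    y = p3 - p4                    # pressure gradient along y (dipole)
    return w, x, y
```

The inputs can be time-domain frames or per-bin spectra of equal length; a real system would additionally scale and equalize the dipole signals.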
0:04:51	but unfortunately we have some
0:04:53	problems with this,
0:04:55	this kind of array
0:04:57	when creating those dipoles
0:05:00	at high frequencies
0:05:02	the dipoles are deformed
0:05:04	because of spatial aliasing
0:05:06	and
0:05:08	well, this,
0:05:10	this, uh,
0:05:10	spatial aliasing frequency
0:05:13	here,
0:05:14	it depends on the distance between opposing microphones
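
The dependence on microphone distance can be illustrated with the common half-wavelength rule of thumb, f = c / (2d). The talk does not state which exact formula is on the slide, so both the criterion and the function name below are assumptions.

```python
def spatial_aliasing_frequency(spacing_m, speed_of_sound=343.0):
    """Sketch: frequency (Hz) at which the spacing equals half a wavelength.

    Assumes the half-wavelength criterion f = c / (2 d); the criterion used
    in the talk is not given in the transcript.
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)
```

Under this assumption a 1 cm spacing aliases only near the top of the audio band, while a spacing of a few centimetres already aliases in the mid kilohertz range.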
0:05:19	and here I have
0:05:22	three different,
0:05:23	three,
0:05:24	three figures for three different arrays
0:05:27	so, uh,
0:05:30	this,
0:05:31	uh,
0:05:32	first one here,
0:05:34	well, this is for
0:05:36	an array with a one-centimetre distance
0:05:39	and, uh,
0:05:40	well, it produces quite good dipoles
0:05:43	at
0:05:44	all frequencies here
0:05:46	but when we increase the distance between microphones, then there
0:05:51	will be some problems at high frequencies here
0:05:54	and here
0:05:55	so these are not dipoles anymore
0:05:59	and, uh,
0:06:01	obviously
0:06:02	this
0:06:03	has some influence on
0:06:05	the direction estimation
0:06:08	at high frequencies, and, uh,
0:06:11	here is the,
0:06:12	the direction,
0:06:14	well, the estimation error here,
0:06:16	it is expressed as root mean square error here in this figure
0:06:22	as a function of frequency
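
A root mean square error for direction estimates has to respect the circular nature of angles. The following is one possible way to compute it, assuming azimuths in degrees; the function name and the wrap-to-±180° convention are choices made here, not details from the talk.

```python
import numpy as np

def angular_rmse_deg(estimated_deg, true_deg):
    """Sketch: RMS of azimuth errors, with each error wrapped to [-180, 180)."""
    diff = np.asarray(estimated_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    wrapped = (diff + 180.0) % 360.0 - 180.0  # e.g. 350 deg vs 0 deg -> -10 deg
    return float(np.sqrt(np.mean(wrapped ** 2)))
```

Evaluating this per frequency band gives an error curve as a function of frequency, like the one described for the figure.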
0:06:23	so
0:06:25	at high frequencies,
0:06:26	above the spatial aliasing frequency,
0:06:29	this, uh,
0:06:30	error is quite
0:06:32	huge
0:06:33	and, uh,
0:06:35	on the other hand, at low frequencies,
0:06:38	also depending on the distance between microphones, we have some
0:06:41	estimation error because of the internal
0:06:44	noise of the microphones
0:06:46	and, uh,
0:06:49	basically we can estimate the direction reliably
0:06:52	only within a
0:06:53	certain frequency window
0:06:59	um um
0:07:02	so um
0:07:03	in this work
0:07:05	we are proposing to use, um,
0:07:08	this kind of array
0:07:10	which consists of four,
0:07:13	four omnidirectional microphones with
0:07:16	relatively large housings, so
0:07:18	they shadow the sound at high frequencies
0:07:21	and this,
0:07:22	this provides us some
0:07:24	inter-microphone level differences
0:07:27	so
0:07:28	these microphones are arranged such that their on-axis directions are pointing in
0:07:34	opposite directions
0:07:36	in these microphone pairs here
0:07:40	and, uh,
0:07:42	well,
0:07:43	this perhaps just illustrates how,
0:07:46	how the
0:07:48	sound is shadowed and, uh, attenuated
0:07:51	because of the shadowing, in these
0:07:54	directional patterns here
0:07:56	and these are for two different microphones; this left one is for an
0:08:00	AKG microphone
0:08:02	which is larger than this other one, this G.R.A.S. microphone here
0:08:06	but anyway we can see that
0:08:08	these are not
0:08:09	omnidirectional anymore at high frequencies
0:08:13	and, uh, so,
0:08:15	uh,
0:08:16	this,
0:08:17	this effect
0:08:18	is utilized here,
0:08:20	here with
0:08:21	direction estimation then
0:08:26	and, uh,
0:08:28	so, um,
0:08:30	um,
0:08:31	for estimating direction
0:08:33	we are
0:08:34	proposing to use,
0:08:36	or compute, the energy gradients between those microphones at high frequencies, so
0:08:42	it's, uh,
0:08:44	just computing the,
0:08:46	the
0:08:47	subtraction between the power spectra of the microphones
0:08:51	so that
0:08:52	we are,
0:08:53	we are approximating sound intensity directly with this,
0:08:55	with this subtraction here
0:08:59	and, uh,
0:09:00	it
0:09:01	produces
0:09:02	this kind of
0:09:03	dipole directivities for,
0:09:07	for,
0:09:08	for these
0:09:10	approximated intensity vectors here
0:09:13	and we are using directly
0:09:16	these for direction estimation at high frequencies
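
The energy-gradient idea, subtracting the power spectra of opposing microphones and reading a direction from the result, might look like this in code. It is a sketch under stated assumptions: the function name is hypothetical, mics 1 to 4 are assumed to face +x, -x, +y and -y, and the sign is chosen so that the vector points towards the source (the microphone facing the source receives more energy).

```python
import numpy as np

def energy_gradient_azimuth(P1, P2, P3, P4):
    """Sketch: azimuth (degrees) from power-spectrum differences of opposing mics.

    P1..P4 are complex spectra (scalars or equal-length arrays) of microphones
    whose on-axis directions are assumed to point to +x, -x, +y and -y.
    """
    gx = np.abs(P1) ** 2 - np.abs(P2) ** 2  # energy gradient along x
    gy = np.abs(P3) ** 2 - np.abs(P4) ** 2  # energy gradient along y
    return np.degrees(np.arctan2(gy, gx))   # direction of larger energy
```

With array inputs the function returns one azimuth per frequency bin, which is how it would be used above the frequency where shadowing becomes prominent.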
0:09:21	and, uh,
0:09:22	well,
0:09:23	but on the other hand we don't have any,
0:09:25	any, uh,
0:09:27	major, or,
0:09:28	or we don't have prominent
0:09:30	inter-microphone level differences at low frequencies, so there we use,
0:09:34	use just the traditional method of computing first the
0:09:37	pressure gradients and then,
0:09:39	then the intensity vectors from them
0:09:42	so this is somehow a
0:09:44	combination of pressure and energy gradients
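
One simple way to combine the two estimates is a hard switch per frequency bin: pressure-gradient estimates below a crossover frequency, energy-gradient estimates at and above it. The talk does not say how the transition is made, so the hard switch, the function name and the `crossover_hz` parameter are assumptions for illustration only.

```python
import numpy as np

def combine_azimuths(freqs_hz, az_pressure_deg, az_energy_deg, crossover_hz):
    """Sketch: choose the pressure-gradient azimuth below the crossover
    frequency and the energy-gradient azimuth at or above it."""
    freqs = np.asarray(freqs_hz, dtype=float)
    return np.where(freqs < crossover_hz, az_pressure_deg, az_energy_deg)
```

For example, `combine_azimuths([100, 10000], [10, 20], [30, 40], 5000)` keeps the first value from the pressure-gradient estimates and the second from the energy-gradient estimates.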
0:09:50	uh
0:09:52	okay, well,
0:09:54	uh,
0:09:55	then another
0:09:57	topic in this presentation was to
0:10:00	optimize the
0:10:01	microphone array for this
0:10:03	computation
0:10:05	so the idea here is to
0:10:07	match
0:10:09	the spatial aliasing frequency with the
0:10:12	frequency limit for using the energy gradients
0:10:16	and, uh,
0:10:17	so, as I mentioned, this,
0:10:20	this spatial aliasing frequency depends on the
0:10:24	inter-microphone distances
0:10:26	and, uh, the
0:10:28	frequency limit for the energy gradients depends on the diaphragm size of the microphone
0:10:34	and, uh,
0:10:36	here this, um,
0:10:38	shadowing effect for an omnidirectional microphone is
0:10:42	described with the
0:10:43	directivity index
0:10:45	which is a
0:10:47	ratio
0:10:48	between the on-axis energy
0:10:51	and the total
0:10:53	energy, which is integrated over all directions
0:10:57	so this
0:10:58	gives some,
0:11:00	some idea about the directionality of the
0:11:03	omnidirectional microphone at high frequencies
0:11:06	and, on the other hand, this directivity index
0:11:10	depends on the ratio
0:11:12	between the diaphragm circumference
0:11:16	and the wavelength
0:11:18	well, this ka
0:11:19	factor
0:11:20	represents this ratio
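
For reference, the two quantities mentioned here are commonly written as below. This is the standard textbook form, given as an assumption about what the slide shows: D(θ, φ) is the microphone's directional response with (0, 0) the on-axis direction, a is the diaphragm radius and λ the wavelength. Whether the slide normalizes the integral by 4π is not recoverable from the transcript.

```latex
\mathrm{DI} = 10 \log_{10}
\frac{\lvert D(0,0) \rvert^{2}}
     {\dfrac{1}{4\pi} \displaystyle\int_{0}^{2\pi}\!\!\int_{0}^{\pi}
      \lvert D(\theta,\phi) \rvert^{2} \sin\theta \, d\theta \, d\phi},
\qquad
ka = \frac{2\pi a}{\lambda} = \frac{\text{diaphragm circumference}}{\text{wavelength}}
```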
0:11:23	and, uh,
0:11:25	and, uh, after,
0:11:26	and with this,
0:11:29	and with this directivity index and the ka factor we get this kind of,
0:11:35	this kind of curve
0:11:37	for an omnidirectional microphone
0:11:39	so this represents the directivity index
0:11:42	as a function of
0:11:43	ka
0:11:46	and, uh, finally
0:11:47	we can compute the
0:11:49	optimal distance between microphones
0:11:53	with this
0:11:54	formula here
0:11:56	so
0:11:57	basically we just
0:11:59	define how much we want
0:12:02	to use this shadowing effect here
0:12:05	uh,
0:12:07	so we just
0:12:08	choose
0:12:09	some
0:12:10	directivity index value here
0:12:13	and then we take the corresponding ka value here and then compute
0:12:17	the distance
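
The formula on the slide is not recoverable from the transcript, but the matching idea can be sketched under two assumptions: the energy-gradient limit frequency follows from the chosen ka value as f = ka · c / (2πa), and the spatial aliasing frequency follows the half-wavelength rule f = c / (2d). Equating the two gives d = πa / ka. The function names and the example ka value below are hypothetical.

```python
import math

def limit_frequency_from_ka(ka, diaphragm_radius_m, speed_of_sound=343.0):
    """Sketch: frequency (Hz) at which the chosen ka value is reached."""
    return ka * speed_of_sound / (2.0 * math.pi * diaphragm_radius_m)

def matched_spacing(ka, diaphragm_radius_m):
    """Sketch: spacing d (m) whose assumed aliasing frequency c / (2 d)
    equals the limit frequency above, i.e. d = pi * a / ka."""
    return math.pi * diaphragm_radius_m / ka
```

As a purely illustrative check, ka = 1 with a radius of 1.05 cm (half of 2.1 cm, if that figure is read as a diameter) gives a spacing of about 3.3 cm. This does not establish that these are the values or the formula the authors used.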
0:12:21	okay, well,
0:12:24	then some evaluations
0:12:25	uh, these were
0:12:27	conducted in
0:12:28	an anechoic chamber,
0:12:30	the measurements were done in an anechoic chamber
0:12:33	and
0:12:34	using
0:12:35	AKG microphones,
0:12:37	four AKG microphones with a
0:12:39	diaphragm of
0:12:40	two point one centimetres
0:12:43	and this results in a spacing of
0:12:46	three point three centimetres for this array
0:12:50	and
0:12:51	also using a G.R.A.S. microphone array
0:12:54	which has a
0:12:56	smaller diaphragm size than
0:12:58	this AKG microphone
0:13:01	and, uh, again we have
0:13:03	this, uh,
0:13:04	estimation error
0:13:06	expressed
0:13:07	as a
0:13:08	root mean square error here
0:13:10	so, um,
0:13:12	in these results, this solid line,
0:13:15	this is for,
0:13:17	for using the traditional method, using the pressure gradients only
0:13:22	and
0:13:23	well,
0:13:24	well, as you can see,
0:13:26	at high frequencies the error is quite,
0:13:28	it's very significant
0:13:31	above the spatial aliasing frequency
0:13:34	but,
0:13:37	but these energy gradients, they produce a very nice estimation for us
0:13:43	and, uh,
0:13:44	using a combination
0:13:46	of these different
0:13:48	gradients
0:13:49	we get a
0:13:50	somehow,
0:13:52	uh,
0:13:52	reliable estimation for all,
0:13:54	for,
0:13:55	for the entire
0:13:56	audio frequency range here
0:13:59	and the same with this
0:14:00	G.R.A.S. microphone array
0:14:02	too
0:14:06	uh
0:14:08	so um
0:14:09	yeah, the summary of my,
0:14:11	my presentation
0:14:13	so, uh,
0:14:15	so the basic idea was to,
0:14:17	to improve
0:14:18	the analysis, which is
0:14:20	the direction estimation,
0:14:22	when using this square microphone array
0:14:26	and, uh,
0:14:27	the improvement
0:14:28	has been
0:14:29	achieved by using,
0:14:32	using the shadowing of the microphones and this proposed method
0:14:37	and
0:14:39	also it was shown that this
0:14:41	optimized
0:14:42	microphone array works with this proposed method
0:14:48	okay, well,
0:14:49	thank you
0:14:50	thank you, we have time
0:14:56	for questions
0:15:04	i i i think about
0:15:05	a work the way
0:15:07	which
0:15:09	and
0:15:10	right
0:15:12	i
0:15:12	were
0:15:13	have you ever conducted the experiments in the presence of reverberation
0:15:17	you showed the experiment results in the anechoic chamber
0:15:21	no
0:15:22	but
0:15:25	oh right
0:15:26	do you have any, uh, experiments,
0:15:29	uh, any experiments in noisy or real environments
0:15:33	oh yeah, yeah, I have, I have
0:15:36	tried this with the teleconferencing application
0:15:40	and, uh,
0:15:41	you know,
0:15:42	well,
0:15:43	it works
0:15:43	nicely
0:15:45	this, in,
0:15:46	well, in our experiments we have used it in a normal room and
0:15:50	also some
0:15:52	such environments
0:15:53	so yeah
0:15:57	any more questions
0:16:02	okay, thank you very
0:16:03	much

Microphone Array Signal Processing

Presented by: Jukka Ahonen, Author(s): Jukka Ahonen, Ville Pulkki, Aalto University, Finland