Speech Transcript - FREQUENCY SELECTIVE PITCH TRANSPOSITION OF AUDIO SIGNALS

0:00:16	thank you very much
0:00:17	and um as you can see i'm not the first author of this paper
0:00:22	but in our case i must say unfortunately sascha disch cannot be here
0:00:26	today because his family has been increased by a second daughter two weeks ago so he can't
0:00:31	be here
0:00:33	um
0:00:33	and i'm working at the international audio laboratories erlangen which is a
0:00:37	joint institution
0:00:39	of
0:00:39	um the university of erlangen-nuremberg
0:00:42	and the fraunhofer institute for integrated circuits
0:00:48	the motivation for the work i'm going to present here
0:00:51	is that in music production you often rely on
0:00:55	mixing prerecorded material
0:00:58	samples
0:00:59	and um you also need to adapt these samples frequently to
0:01:04	different musical contexts
0:01:06	than
0:01:07	the context they were recorded in
0:01:09	so in some cases you might need key mode conversion
0:01:13	this means major to minor or vice versa
0:01:16	and
0:01:17	the
0:01:18	algorithm for
0:01:20	enabling this task
0:01:22	has been presented
0:01:23	um in
0:01:24	previous conferences
0:01:26	this is called modvoc modulation vocoder
0:01:30	um it's somewhat suited to this task
0:01:33	but um we also found out that there are
0:01:36	special enhancements necessary
0:01:38	in order to address
0:01:40	special requirements for this application
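The key mode conversion mentioned above can be pictured on the note level. The following is a minimal sketch, assuming the target is the parallel natural minor key (the talk does not say which minor variant is used); the names `MAJOR_TO_MINOR` and `major_to_minor` are hypothetical and only serve the illustration.

```python
# Semitone offsets above the tonic that differ between a major scale and
# its parallel natural minor: the 3rd, 6th and 7th degrees drop by one.
# Assumption: natural minor; harmonic minor would keep the 7th degree.
MAJOR_TO_MINOR = {4: 3, 9: 8, 11: 10}


def major_to_minor(midi_note, tonic_pitch_class=0):
    """Map one MIDI note from a major key to the parallel natural minor."""
    degree = (midi_note - tonic_pitch_class) % 12
    return midi_note + MAJOR_TO_MINOR.get(degree, degree) - degree


c_major_triad = [60, 64, 67]  # C4, E4, G4
c_minor_triad = [major_to_minor(n) for n in c_major_triad]
```

Only some notes move, which is why the transposition has to be frequency selective rather than a global pitch shift.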
0:01:46	so i first want to um give a short overview of this modulation
0:01:51	vocoder
0:01:52	which performs a single-pass analysis
0:01:55	in block-wise processing
0:01:57	which is shown in the block diagram here
0:02:01	it does
0:02:01	first
0:02:02	uh signal adaptive band-pass filtering
0:02:05	which is aligned with spectral centres of gravity
0:02:09	this means we first do the um dft analysis
0:02:15	yeah
0:02:15	a dft analysis
0:02:17	and from the dft spectra
0:02:19	the um centres of
0:02:21	gravity
0:02:22	and perceptually adjusted bands
0:02:25	are uh determined and the bands are adjusted so this decomposition is flexible
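As a rough illustration of a spectral centre of gravity, the sketch below computes the power-weighted mean frequency of a group of DFT bins. The power weighting and the function name are assumptions made for the example; the perceptual adjustment of the bands mentioned in the talk is not modelled.

```python
def spectral_centre_of_gravity(freqs_hz, magnitudes):
    """Power-weighted mean frequency of a group of DFT bins."""
    powers = [m * m for m in magnitudes]
    total = sum(powers)
    if total == 0.0:
        raise ValueError("a silent region has no centre of gravity")
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total


# Three neighbouring bins with most of the energy in the middle one.
cog = spectral_centre_of_gravity([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
```

A band-pass filter would then be centred on a value like `cog` instead of on a fixed grid frequency.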
0:02:31	so from these centres
0:02:32	centre frequencies
0:02:34	um and around these centre frequencies we construct bandpass filters
0:02:38	and
0:02:39	this is all
0:02:40	done in the frequency domain
0:02:41	and an inverse
0:02:43	uh
0:02:44	dft
0:02:45	gets us back for each bandpass signal
0:02:47	to a time domain signal
0:02:49	and um this time domain signal
0:02:52	bandpass signal
0:02:53	is then analysed with an am and fm analysis
0:02:56	and with this
0:02:57	you basically have the carrier frequency which corresponds to the centre of gravity of this special frequency region
0:03:04	and
0:03:05	uh the fm signal which gives the um
0:03:08	instantaneous frequency offset
0:03:11	um relative to this carrier frequency
0:03:14	and you get um the instantaneous magnitude or amplitude in the am component
0:03:20	and then you can process the signal in this modulation domain
0:03:24	for example you can change the carrier frequencies
0:03:27	and still maintain the uh fine temporal structure
0:03:31	um by keeping the am and the fm
0:03:34	as it is
0:03:35	um
0:03:35	in the synthesis
0:03:36	you have to combine the am and fm components with the maybe modified
0:03:42	um carrier frequency
0:03:43	you have to
0:03:44	somehow bond the different
0:03:46	um components from one block to the next block because it's temporal blocks as said before
0:03:51	um
0:03:53	yeah
0:03:54	and
0:03:55	um you do uh
0:03:56	an overlap-add
0:03:57	processing of the am and the fm
0:04:00	or instantaneous frequency
0:04:02	signals in order to get continuous
0:04:05	um parameters
0:04:06	and then you do
0:04:07	the synthesis
0:04:08	and add up
0:04:10	um all the signals from the different bands you had
0:04:13	decomposed the signal into before
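The AM/FM analysis of one band and the resynthesis with a changed carrier can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the analytic signal (computed with a plain O(N^2) DFT so that only the standard library is needed), and it leaves out the block bonding and overlap-add steps. All function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real block via a plain O(N^2) DFT."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spec[k] *= 2   # double the positive frequencies
        elif k > n / 2:
            spec[k] = 0    # discard the negative frequencies
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def am_fm(bandpass, carrier_hz, fs):
    """Split a band into AM (envelope) and FM (offset from the carrier in Hz)."""
    z = analytic_signal(bandpass)
    am = [abs(v) for v in z]
    fm = [cmath.phase(cur * prev.conjugate()) * fs / (2 * math.pi) - carrier_hz
          for prev, cur in zip(z, z[1:])]
    return am, fm


def resynthesize(am, fm, new_carrier_hz, fs):
    """Rebuild the band with a changed carrier while keeping AM and FM."""
    phase, out = 0.0, []
    for a, f in zip(am, fm):
        out.append(a * math.cos(phase))
        phase += 2 * math.pi * (new_carrier_hz + f) / fs
    return out


fs, n = 8000, 128
tone = [0.5 * math.cos(2 * math.pi * 1000 * t / fs) for t in range(n)]
am, fm = am_fm(tone, carrier_hz=950.0, fs=fs)
shifted = resynthesize(am, fm, new_carrier_hz=1140.0, fs=fs)
```

For this stationary 1000 Hz tone and an assumed carrier of 950 Hz, the AM is a constant amplitude and the FM is a constant offset from the carrier; moving the carrier shifts the band while that fine structure is kept.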
0:04:17	so this is the basic structure of the modulation
0:04:19	vocoder
0:04:20	but due to the structure with the relatively
0:04:23	long blocks in the dft analysis
0:04:26	you still um miss some of the um
0:04:30	signal
0:04:31	uh
0:04:32	characteristics by this processing
0:04:35	um and this is
0:04:36	one of the parts we address by the enhancements
0:04:40	and this
0:04:40	the first of these enhancements was the so-called envelope shaping
0:04:44	this means
0:04:45	temporal envelopes within
0:04:47	the uh
0:04:49	dft blocks
0:04:50	might get um lost or distorted
0:04:54	because you um can lose the
0:04:57	um due to dispersion you can
0:05:00	lose phase
0:05:01	relations between the different tones
0:05:04	and
0:05:05	this could cause temporal smearing of transients
0:05:08	and in this case it's better to use
0:05:11	an explicit
0:05:12	temporal envelope
0:05:14	and you get access to the parameters of these
0:05:16	um of this temporal envelope
0:05:18	by doing an lpc analysis in the frequency domain
0:05:21	because correlation in the frequency domain
0:05:24	corresponds to multiplication in the time domain
0:05:27	this means with the coefficients
0:05:30	you get from lpc analysis
0:05:33	along the frequency axis
0:05:34	you get parameters
0:05:35	you can use for um getting
0:05:38	a
0:05:39	time function
0:05:40	you could say a time response
0:05:42	yeah which you can
0:05:43	then at the end
0:05:45	apply to get back the temporal envelope
0:05:49	this is what is done
0:05:50	with these
0:05:51	red blocks
0:05:52	these are the um
0:05:53	enhancements for the
0:05:55	envelopes
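The idea of obtaining a temporal envelope from an LPC analysis along the frequency axis can be sketched as below. This is a simplified illustration, assuming a DCT-II as the transform and a low prediction order; the talk uses DFT blocks and does not give these details, and every function name here is hypothetical. Linear prediction over the spectral coefficients yields an all-pole model whose response, read along the time axis of the block, approximates the shape of the temporal envelope.

```python
import math


def dct_ii(x):
    """Unnormalised DCT-II of a block."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def autocorrelation(seq, max_lag):
    return [sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Prediction-error filter [1, a1, ..., ap] from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a, err = new_a, err * (1.0 - k * k)
    return a


def temporal_envelope(block, order=2):
    """Envelope shape over the block from LPC across spectral coefficients."""
    n = len(block)
    r = autocorrelation(dct_ii(block), order)
    if r[0] == 0.0:
        return [1.0] * n  # silent block: flat envelope
    a = levinson_durbin(r, order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n  # time index mapped to filter angle
        re = sum(a[j] * math.cos(j * theta) for j in range(order + 1))
        im = sum(a[j] * math.sin(j * theta) for j in range(order + 1))
        env.append(1.0 / (re * re + im * im))
    return env


block = [0.0] * 64
block[40] = 1.0  # a single click as an extreme transient
envelope = temporal_envelope(block)
peak_position = max(range(len(envelope)), key=envelope.__getitem__)
```

The envelope peaks close to the position of the click, which is the information that gets smeared when phase relations within a long block are lost.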
0:05:59	um
0:06:00	another
0:06:01	enhancement
0:06:02	the enhancement which is necessary
0:06:04	once you um start modifying spectral components
0:06:08	is
0:06:09	that you have to take into account
0:06:10	that
0:06:11	um
0:06:12	musical sounds are normally consisting of a fundamental and a lot of harmonics of the tone
0:06:18	and um you should keep this in mind when you modify frequencies
0:06:24	so the overtones are um quasi harmonic on uh the linear frequency scale
0:06:31	which means they are normally integer multiples of the fundamental frequency or nearly integer multiples
0:06:38	um on the other hand musical intervals are based on a logarithmic scale
0:06:43	and um now it's
0:06:45	a question
0:06:46	when you modify frequencies in which way you should modify them
0:06:51	um and of course we want to modify them in the best way for the
0:06:56	uh for what we intend to do for example for the transposition
0:07:00	and we have to consider
0:07:02	this
0:07:03	because if it's an overtone of one fundamental
0:07:07	frequency it has to be modified in accordance with the fundamental and not according to the musical scale
0:07:13	as um it would be for a single tone
0:07:17	on another um
0:07:19	part of the um scale
0:07:23	so yeah this leads to
0:07:25	some kind of ambiguity when you get one tone and just
0:07:29	um
0:07:29	look at it on its own
0:07:31	so that's why we have to um get some additional interpretation
0:07:35	to find out whether it's uh
0:07:37	a fundamental frequency
0:07:39	or if it's an overtone or uh a harmonic component of uh
0:07:42	a more complex sound structure
0:07:47	this is just an example
0:07:48	of um how the tones of this keyboard uh
0:07:52	can match the
0:07:54	um harmonics
0:07:56	and um just one example to pick out
0:08:00	could be the number five which is
0:08:02	five times the
0:08:03	uh fundamental frequency of one tone
0:08:06	but could be also
0:08:08	um another tone which is a major third
0:08:10	apart
0:08:11	um
0:08:12	in this diagram the octaves are not taken into account so
0:08:17	so you can have
0:08:18	um
0:08:19	some ambiguities between
0:08:21	a
0:08:21	second and also the fourth um
0:08:24	harmonic
0:08:25	which would then be just multiples of
0:08:27	octaves and so on
0:08:28	so that's why you get
0:08:30	kind of an ambiguity with um overtones
0:08:33	and
0:08:34	musical scales
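The ambiguity between overtones and notes of the musical scale can be made concrete with a little arithmetic. The sketch below, assuming twelve-tone equal temperament, finds the nearest equal-tempered interval for each harmonic and the remaining offset in cents; `nearest_interval` is a hypothetical helper name.

```python
import math


def nearest_interval(harmonic):
    """Nearest equal-tempered interval in semitones above the fundamental,
    plus the offset of the harmonic from that interval in cents."""
    exact = 12 * math.log2(harmonic)
    semitones = round(exact)
    return semitones, 100 * (exact - semitones)


# Harmonics 1 to 6 of one tone.
table = {h: nearest_interval(h) for h in range(1, 7)}
```

The fifth harmonic lands 28 semitones up, that is two octaves plus a major third, and misses that note by less than 14 cents, so from frequency alone it can hardly be told apart from a separate note a major third away. The second and fourth harmonics fall exactly on octaves.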
0:08:37	and that's why this
0:08:38	second enhancement has been added to modvoc
0:08:41	which is
0:08:42	which is called harmonic locking
0:08:45	so um as said before the estimated fundamentals
0:08:49	have to be mapped directly
0:08:51	and then you have to um decide for the other components
0:08:55	if it's an
0:08:57	um
0:08:57	overtone
0:08:58	then it has to be locked to the
0:09:01	transposition of its fundamental
0:09:04	so in the processing here
0:09:06	you decide um for any tone if it's
0:09:09	um locked
0:09:10	to another
0:09:11	frequency or if it
0:09:12	has to be transposed on its own
0:09:14	and by this switch
0:09:16	here um you select either the transposition
0:09:18	by the midi note based mapping which is done for the fundamental frequency
0:09:23	or it is
0:09:24	um
0:09:25	transposed according to its fundamental
0:09:28	if it
0:09:29	if it's locked as shown by the
0:09:31	uh indication here
0:09:33	if it's
0:09:34	non locked
0:09:35	then it's mapped on its own otherwise it has to be locked to the fundamental frequency and its mapping
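The switch between the two transposition paths can be sketched as follows. This is a minimal illustration, not the decision rule of the actual system: the tolerance in cents, the harmonic limit and all function names are assumptions, and the fundamentals are taken as already estimated.

```python
import math


def hz_to_midi(f):
    return 69 + 12 * math.log2(f / 440.0)


def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)


def map_by_note(f, note_map):
    """Shift f by the semitone step that note_map assigns to its nearest MIDI note."""
    nearest = round(hz_to_midi(f))
    shift = note_map.get(nearest, nearest) - nearest
    return f * 2 ** (shift / 12)


def transpose_with_locking(component_hz, fundamentals_hz, note_map,
                           tol_cents=30.0, max_harmonic=16):
    """Return (new frequency, locked flag) for one component."""
    for f0 in fundamentals_hz:
        h = round(component_hz / f0)
        if 2 <= h <= max_harmonic:
            deviation = 1200 * abs(math.log2(component_hz / (h * f0)))
            if deviation <= tol_cents:
                # Locked: follow the transposition of the fundamental.
                return component_hz * map_by_note(f0, note_map) / f0, True
    # Non locked: apply the MIDI note based mapping to the component itself.
    return map_by_note(component_hz, note_map), False


note_map = {76: 75}        # hypothetical mapping: E5 is lowered to E flat 5
c3 = midi_to_hz(48)        # fundamental that the mapping leaves unchanged
fifth_harmonic = 5 * c3    # lies close to E5 on the note grid

locked_result = transpose_with_locking(fifth_harmonic, [c3], note_map)
free_result = transpose_with_locking(fifth_harmonic, [], note_map)
```

With the fundamental known, the fifth harmonic stays where it is because its fundamental is not moved; without that knowledge it would be lowered by a semitone and the tone would lose its harmonic structure.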
0:09:42	now we come to the um listening test
0:09:45	methodology
0:09:47	it's quite
0:09:48	a difficult task if you do um
0:09:50	this kind of transposition
0:09:52	so we uh selected
0:09:55	midi samples
0:09:56	which we first had in the original domain
0:09:59	and we did
0:10:00	midi transposition to obtain
0:10:03	um wave files which we could then put into the test
0:10:06	so this is the uh transposed
0:10:09	um
0:10:10	reference signal which is done by midi
0:10:13	and then uh transferred to a wave file
0:10:16	and on the other hand we get the original wave file
0:10:19	and we processed it um
0:10:21	with the transposition and then we can compare the two
0:10:25	and we have
0:10:26	different versions
0:10:28	three versions of the modvoc and one reference
0:10:32	transposition
0:10:33	system
0:10:34	which i'll
0:10:35	also present
0:10:36	here
0:10:37	um there's one commercial system available which is the direct note access in the melodyne editor by
0:10:43	celemony
0:10:45	and this has been available since autumn
0:10:47	two thousand and nine
0:10:49	and it also allows
0:10:50	selective editing of polyphonic music
0:10:53	but it performs a multi-pass analysis
0:10:56	and it does an automatic decomposition into notes and um
0:11:00	uses heuristic classification rules
0:11:03	but it also can be used to perform this key mode
0:11:06	key mode conversion
0:11:07	and so that's why we also tried to um compare our
0:11:11	approach with this one
0:11:15	these are the um items we used
0:11:18	um from the mutopia project we used some different signals
0:11:23	and
0:11:23	different midi files
0:11:24	as said before
0:11:26	are shown here
0:11:27	and we
0:11:28	tried to get some variety of more complex
0:11:31	orchestral music
0:11:33	and some more um solo instrument
0:11:36	parts
0:11:37	so quite a mixture of
0:11:39	complexity of
0:11:41	um content
0:11:44	these were the results of a
0:11:46	so-called mushra test but we don't want to go too much into detail
0:11:50	in this test we have um
0:11:52	normally a hidden reference
0:11:54	which is
0:11:55	um
0:11:56	denoted by one
0:11:57	we have um
0:11:59	uh
0:11:59	a so-called anchor which is just a
0:12:02	low-pass filtered signal which is denoted by number two
0:12:06	and we have the modvoc the original modvoc
0:12:09	the modvoc um is number three modvoc with the harmonic locking is four
0:12:14	and modvoc with the harmonic locking and the um
0:12:17	envelope shaping
0:12:19	is
0:12:19	five and six is the dna the um
0:12:24	system we compared to
0:12:26	um but first we want to see how um
0:12:28	our enhancements work in the modvoc
0:12:31	um for this one example we see a um difference between four and five this means the addition of
0:12:37	envelope shaping
0:12:39	we see uh for the guitar um
0:12:41	the guitar sounds much clearer
0:12:43	and so
0:12:44	somewhat preferred by
0:12:46	the listeners
0:12:48	and
0:12:48	um
0:12:49	here um we have the difference between
0:12:53	uh the original modvoc and the modvoc with harmonic locking
0:12:58	which
0:12:59	um is better for the piano signal
0:13:03	we also see that uh in most of the cases
0:13:07	um the dna
0:13:08	performed better
0:13:11	and
0:13:13	um
0:13:14	i can make a first summary of these slides here that
0:13:17	the harmonic locking really improved the timbre
0:13:20	the envelope shaping also improved the transient
0:13:23	parts
0:13:25	but dna was rated better for five
0:13:27	out of seven items
0:13:29	and um the rating could cover different aspects
0:13:33	of
0:13:34	this sound change which was performed here
0:13:36	like unnatural sounding artifacts or melody or chord transposition errors
0:13:41	or timbre preservation problems
0:13:44	um and the listeners mainly reported transposition
0:13:49	errors
0:13:50	um
0:13:51	in the dna
0:13:52	and
0:13:53	uh timbre problems for modvoc
0:13:56	so we made an additional test which was a preference test
0:14:01	on these main quality aspects to find out more if this is really the case
0:14:07	for this
0:14:08	um we had twelve expert listeners
0:14:11	meaning both technical and musical background
0:14:13	and we had now with them the extended modvoc
0:14:16	and compared it to the dna
0:14:19	and
0:14:20	um we also found out in the first test
0:14:22	that an unknown melody which is a
0:14:24	transposed version of the original um midi
0:14:28	is
0:14:29	somehow hard to
0:14:30	um to grade for people so we did it the other way around we did the transposition with midi
0:14:36	and then tried to
0:14:37	transpose it back to the original um score
0:14:41	um with dna or with our algorithm
0:14:44	for four signals
0:14:45	which are shown here also orchestra and some mixture and piano
0:14:50	and
0:14:50	now we put these
0:14:52	files in the preference test
0:14:56	and the outcome was
0:14:59	quite clear in the sense
0:15:00	that as people had
0:15:01	reported before
0:15:03	there was quite a preference for
0:15:07	uh the melody transposition for modvoc which is shown here modvoc is always on the left side
0:15:14	and these are the results for the transposition music transposition
0:15:19	and here are the results for timbre
0:15:22	which uh
0:15:23	show the clear preference for the dna
0:15:26	i can play an example
0:15:30	i can play all the files
0:15:31	together
0:15:33	yeah in short versions in the order as they are shown here
0:15:36	first the original
0:15:46	[audio examples played]
0:16:23	i think there are some problems in the
0:16:25	in the music transposition in the dna
0:16:28	a number uh a pressing this listening conditions yeah
0:16:34	so um the next example this is the piano if we still have time i play also the
0:16:40	um
0:16:41	this device here
0:16:45	[audio examples played]
0:18:06	okay so um just a short summary
0:18:09	um
0:18:10	we have shown now the modvoc for selective
0:18:13	transposition of pitch
0:18:15	which is capable of real-time processing
0:18:18	and which can reproduce
0:18:19	transients
0:18:20	and
0:18:21	uh also improves the timbre by harmonic locking
0:18:24	and it's
0:18:25	um
0:18:26	preferred over the commercial system
0:18:28	in terms of transposition of the melody but the dna is um
0:18:33	preferred
0:18:34	in timbre preservation
0:18:37	so and maybe in general
0:18:39	the
0:18:39	both of the systems were in the range from fair to good so there's room for improvement
0:18:45	but they're already
0:18:46	somewhat useful
0:18:48	systems thank you
0:18:57	any questions
0:19:00	one question i had is were these trained listeners were these golden ears or were they
0:19:05	um for the preference test they were people who also had
0:19:10	some music background which is
0:19:12	um quite important for
0:19:14	this uh timbre um grading let's say
0:19:18	but they weren't signal processors or not specially golden ears no
0:19:24	any questions
0:19:27	one harder question
0:19:29	what would you like to do i mean if you had all the signal processing power and all the smarts you could
0:19:33	do
0:19:33	what would you like to do to
0:19:35	this problem
0:19:36	um i think
0:19:38	it can be
0:19:39	that it
0:19:40	can be made a bit more complicated if you
0:19:42	you can imagine that you have tones which are
0:19:45	a mixture of
0:19:46	uh maybe harmonics and fundamental frequency and so on
0:19:50	or different harmonics of different tones which match on the grid
0:19:54	so then of course the decomposition is much more complicated
0:19:58	and of course for this you would need quite a lot more um computation um so i
0:20:04	think this would be one of the ways a further improvement could be achieved
0:20:09	a because the see
0:20:11	anything else
0:20:14	thank
0:20:15	okay can you use the microphone
0:20:19	on your bullet point up there about a reproduction of transients improved by lpc based envelope shaping could you comment
0:20:26	on that what that is yeah we use the lpc parameters um obtained in the frequency
0:20:30	domain and apply this as a time envelope in the time domain
0:20:35	this is what i showed with the red blocks in uh
0:20:38	the overview diagram
0:20:43	thank you

FREQUENCY SELECTIVE PITCH TRANSPOSITION OF AUDIO SIGNALS

Music Signal Processing

Presented by: Bernd Edler, Author(s): Sascha Disch, Fraunhofer Institute for Integrated Circuits (IIS), Germany; Bernd Edler, International Audio Laboratories Erlangen, Germany