thank you very much

As you can see, I am not the first author of this paper. Unfortunately, the first author cannot be here today, because his family grew by a second daughter two weeks ago.

I am working at the International Audio Laboratories Erlangen, which is a joint institution of the University of Erlangen-Nuremberg and the Fraunhofer Institute for Integrated Circuits.

What is the motivation for the work I am going to present here? In music production you often rely on mixing prerecorded material, samples, and you frequently need to adapt these samples to a musical context different from the one they were recorded in.

So in some cases you might need a key mode conversion. This means major to minor or vice versa.
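As an editorial illustration of what key mode conversion means at the note level, here is a minimal sketch that maps MIDI note numbers from a major key to its parallel minor. It assumes the natural minor scale (the talk does not say which minor variant is used), and the names `MAJOR_TO_MINOR` and `convert_mode` are hypothetical.

```python
# Semitone offsets above the tonic that differ between a major scale and its
# parallel natural minor (assumption: natural minor, not stated in the talk).
MAJOR_TO_MINOR = {4: 3, 9: 8, 11: 10}   # 3rd, 6th and 7th degree lowered
MINOR_TO_MAJOR = {minor: major for major, minor in MAJOR_TO_MINOR.items()}


def convert_mode(midi_note, tonic_pitch_class, table):
    """Return the MIDI note after key mode conversion; other notes stay unchanged."""
    rel = (midi_note - tonic_pitch_class) % 12
    return midi_note + table.get(rel, rel) - rel


c_major_triad = [60, 64, 67]   # C4, E4, G4
c_minor_triad = [convert_mode(n, 0, MAJOR_TO_MINOR) for n in c_major_triad]
print(c_minor_triad)           # [60, 63, 67], i.e. C4, E-flat 4, G4
```

Only the third, sixth and seventh scale degrees move, so most notes of a sample stay where they are. This is why the transposition has to be selective rather than a global pitch shift.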

The algorithm enabling this task has been presented at previous conferences. It is called MODVOC, the modulation vocoder. It is somewhat suited to this task, but we also found that special enhancements are necessary in order to address the special requirements of this application.

So I first want to give a short overview of this modulation vocoder, which performs a single-pass analysis in block-wise processing, as shown in the block diagram here.

It first does a signal-adaptive band-pass filtering, which is aligned with spectral centres of gravity. This means we first do a DFT analysis, and from the DFT spectra the centres of gravity are determined in a perceptually adjusted way. The bands are adjusted as well, so this decomposition is flexible.
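The centre-of-gravity step can be sketched as follows. This is a simplified illustration, not the method from the paper: it computes a plain power-weighted mean frequency over a fixed range of DFT bins and leaves out the perceptual adjustment and the adaptive band boundaries mentioned in the talk. The function names and the test signal are assumptions.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of a real block (naive O(N^2) DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def centre_of_gravity(mags, lo, hi, fs, n):
    """Power-weighted mean frequency in Hz of DFT bins lo..hi (inclusive)."""
    power = [m * m for m in mags[lo:hi + 1]]
    total = sum(power)
    if total == 0.0:
        return 0.5 * (lo + hi) * fs / n      # empty band: fall back to its middle
    mean_bin = sum(k * p for k, p in zip(range(lo, hi + 1), power)) / total
    return mean_bin * fs / n


fs, n = 8000.0, 256
x = [math.cos(2 * math.pi * 20 * t / n) for t in range(n)]   # partial exactly on bin 20
cog_hz = centre_of_gravity(dft_magnitudes(x), 10, 30, fs, n)
print(round(cog_hz, 3))                                      # 625.0
```

In a signal-adaptive scheme the band edges would then be re-centred on the estimated centres of gravity and the estimate repeated. That iteration is omitted here.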

So around these centre frequencies we construct band-pass filters, which is done in the frequency domain, and with an inverse DFT we get back a time-domain signal for each band-pass signal.

This time-domain band-pass signal is then analysed for AM and FM. So basically you have the carrier frequency, which corresponds to the centre of gravity of this spectral frequency region, and the FM signal, which gives the instantaneous frequency offset relative to this carrier frequency, and you get the instantaneous magnitude, the amplitude, in the AM component. Then you can process the signal in this modulation domain.
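The AM/FM analysis of one band-pass signal can be illustrated with an analytic-signal approach. This is a sketch under assumptions: the talk does not say how the AM and FM signals are extracted, so the Hilbert-style construction below is a standard technique swapped in for illustration, and the function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real block: zero the negative-frequency DFT bins."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spec[k] *= 2          # positive frequencies are doubled
        elif k > n / 2:
            spec[k] = 0           # negative frequencies are removed
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def am_fm(x, fs, carrier_hz):
    """Return (AM, FM): instantaneous amplitude and frequency offset from the carrier."""
    z = analytic_signal(x)
    am = [abs(v) for v in z]
    inst_hz = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
               for t in range(len(z) - 1)]
    return am, [f - carrier_hz for f in inst_hz]


fs, n = 8000.0, 128
band = [math.cos(2 * math.pi * 1000.0 * t / fs) for t in range(n)]   # steady 1 kHz partial
am, fm = am_fm(band, fs, carrier_hz=1000.0)
```

For this steady partial the AM is constant at 1 and the FM is zero, because the carrier was chosen equal to the partial's frequency. A real band would show amplitude fluctuations in `am` and vibrato or detuning in `fm`.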

For example, you can change the carrier frequencies and still maintain the fine temporal structure by keeping the AM and the FM as they are.

In the synthesis you have to combine the FM component with the possibly modified carrier frequency. You have to somehow bond the different components from one block to the next block, because the processing works on temporal blocks, as said before. You do an overlap-add processing of the AM and the FM, or instantaneous frequency, signals in order to get continuous parameters. Then you do the synthesis and add up all the signals from the different bands you had decomposed the signal into.
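The synthesis of a single band with a modified carrier can be sketched as a phase accumulator driven by the new carrier plus the unchanged FM, scaled by the unchanged AM. This is a minimal illustration under assumptions: the bonding between blocks and the overlap-add of the parameter tracks are omitted, and `synthesize_band` is a hypothetical name.

```python
import math


def synthesize_band(am, fm, carrier_hz, fs):
    """Resynthesise one band from AM, FM offset (Hz) and a possibly new carrier (Hz)."""
    phase, out = 0.0, []
    for a, df in zip(am, fm):
        out.append(a * math.cos(phase))
        phase += 2 * math.pi * (carrier_hz + df) / fs   # integrate instantaneous frequency
    return out


fs = 8000.0
am = [1.0] * 64            # flat amplitude envelope
fm = [0.0] * 64            # no frequency modulation
shifted = synthesize_band(am, fm, carrier_hz=1200.0, fs=fs)   # carrier moved to 1200 Hz
```

The full output signal would be the sum of such band signals over all bands. Because only `carrier_hz` changes, the temporal fine structure carried by `am` and `fm` is preserved.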

So this is the basic structure of the modulation vocoder. But due to the structure with the relatively long blocks in the DFT analysis, you still miss some of the signal characteristics with this processing, and this is one of the parts we address with the enhancements.

The first of these enhancements is the so-called envelope shaping. Temporal envelopes within the DFT blocks might get lost or distorted, because of dispersion and because you can lose the phase relations between the different tones. This could cause a temporal smearing of transients.

In this case it is better to use an explicit temporal envelope. You get access to the parameters of this temporal envelope by doing an LPC analysis in the frequency domain, because correlation in the frequency domain corresponds to multiplication in the time domain. This means that with the coefficients you get from an LPC analysis along the frequency axis, you get parameters you can use for obtaining a time function, you could say a time response, which you can then apply at the end to get back the temporal envelope. This is what is done with these red blocks.
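The duality behind this step can be sketched in code: linear prediction along the frequency axis of a block's DFT yields an all-pole model whose "spectrum" is an estimate of the squared temporal envelope of the block. The sketch below is an illustration under assumptions, not the MODVOC implementation. It works on the full complex DFT of one block, uses a complex Levinson-Durbin recursion, and picks an arbitrary prediction order of 4; all names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def lpc_along_frequency(spec, order):
    """Complex Levinson-Durbin on the circular autocorrelation of the spectrum."""
    n = len(spec)
    r = [sum(spec[k] * spec[(k - m) % n].conjugate() for k in range(n))
         for m in range(order + 1)]
    a, err = [1.0 + 0j], r[0].real
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                                   # reflection coefficient
        a = [a[0]] + [a[j] + k * a[i - j].conjugate() for j in range(1, i)] + [k]
        err *= 1.0 - abs(k) ** 2                         # remaining prediction error
    return a, err


def temporal_envelope(a, err, n):
    """Evaluate the all-pole model at the n time positions of the block."""
    env = []
    for t in range(n):
        poly = sum(a[m] * cmath.exp(2j * math.pi * m * t / n) for m in range(len(a)))
        env.append(err / abs(poly) ** 2)
    return env


n = 64
x = [0.05 * math.cos(0.9 * t) for t in range(n)]   # quiet background tone
for t in range(40, 44):
    x[t] += math.cos(0.9 * t)                      # short transient burst at t = 40..43
a, err = lpc_along_frequency(dft(x), order=4)
env = temporal_envelope(a, err, n)
peak = max(range(n), key=env.__getitem__)          # position of the envelope maximum
```

The maximum of `env` falls at the position of the burst, although the coefficients were computed purely from the spectrum. Applying such an envelope after synthesis is what restores transients that the long DFT blocks would otherwise smear.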

These are the enhancements for the envelopes.

Another enhancement, which is necessary once you start modifying spectral components, is that you have to take into account that musical sounds normally consist of a fundamental and a lot of harmonics, the overtones, and you should keep this in mind when you modify frequencies. The overtones are quasi-harmonic on the frequency scale: they are normally integer multiples of the fundamental frequency.

On the other hand, musical intervals are based on a logarithmic scale. So now it is a question, when you modify frequencies, in which way you should modify them. Of course we want to modify them in the best way for what we intend to do, for example for the transposition. We have to consider this, because if a component is an overtone of a fundamental frequency, it has to be modified in accordance with its fundamental and not according to the musical scale, as it would be if it were a single tone at another part of the scale.

So this leads to some kind of ambiguity when you take one tone and just look at it on its own. That is why we have to get some additional interpretation to find out whether it is a fundamental frequency or an overtone, a harmonic component of a more complex sound structure.

This is just an example of how the pitches of the keys can match the harmonics. One example to pick out could be number five, which is five times the fundamental frequency of one tone alone, but could also be another tone, which is a major third apart. In this diagram the multiples of octaves are not taken into account, so you can also have ambiguities with the second and the fourth harmonic, which would then be just multiples of octaves, and so on. That is why you get a kind of ambiguity between overtones and musical scales.
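The coincidence between harmonics and scale tones can be checked numerically. The following sketch assumes twelve-tone equal temperament and, for the printed note names, a fundamental on C. It lists the nearest interval for each of the first eight harmonics and how far off it is in cents; the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def harmonic_interval(h):
    """Nearest equal-tempered interval of harmonic h: (semitones, deviation in cents)."""
    exact = 12 * math.log2(h)
    nearest = round(exact)
    return nearest, 100 * (exact - nearest)


for h in range(1, 9):
    semitones, cents = harmonic_interval(h)
    # Note name assumes the fundamental is a C; octaves are folded away by % 12.
    print(h, NOTE_NAMES[semitones % 12], f"{cents:+.1f} cents")
```

The fifth harmonic lands 28 semitones above the fundamental, that is two octaves plus a major third, and is only about 14 cents flat of the equal-tempered tone. The second and fourth harmonics are exact octaves. A component found at such a frequency can therefore be either an overtone or an independent note.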

That is why this second enhancement has been added to the MODVOC, which is called harmonic locking. As said before, the estimated fundamentals have to be mapped directly, and then you have to decide for all the other components: if a component is an overtone, then it has to be locked to the transposition of its fundamental. So in the processing you decide for any tone whether it is locked to another frequency or whether it has to be transposed on its own. By this switch you choose either the transposition by the MIDI-note-based mapping, which is done for the fundamental frequencies, or a transposition according to its fundamental. This is the indication here: if it is non-locked, it gets the MIDI note mapping; if it is locked, it has to be locked to the fundamental frequency and its mapping.
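The locking decision can be sketched as follows. This is an illustration under stated assumptions, not the decision rule from the paper: a component counts as locked when it lies within a hypothetical tolerance of 30 cents of an integer multiple (2 or higher) of an estimated fundamental, the mode conversion uses a natural-minor table with tonic C, and all function names are made up for the example.

```python
import math

MAJOR_TO_MINOR = {4: 3, 9: 8, 11: 10}   # assumed natural-minor mapping, tonic-relative


def hz_to_midi(f):
    return 69 + 12 * math.log2(f / 440.0)


def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)


def map_by_note(f, tonic_pc=0):
    """Transpose a frequency by the scale mapping of its nearest MIDI note."""
    m = hz_to_midi(f)
    rel = (round(m) - tonic_pc) % 12
    return midi_to_hz(m + MAJOR_TO_MINOR.get(rel, rel) - rel)


def transpose(components, fundamentals, tol_cents=30.0):
    """Locked overtones follow their fundamental; everything else is mapped by note."""
    out = []
    for f in components:
        target = map_by_note(f)                       # default: non-locked
        for f0 in fundamentals:
            h = round(f / f0)
            if h >= 2 and abs(1200 * math.log2(f / (h * f0))) < tol_cents:
                target = f * map_by_note(f0) / f0     # locked: same ratio as fundamental
                break
        out.append(target)
    return out


e4 = midi_to_hz(64)                                   # E4, the third of C major
new_f0, new_h5 = transpose([e4, 5 * e4], fundamentals=[e4])
```

Here the fundamental E4 moves down a semitone to E-flat 4, and its fifth harmonic moves by the same ratio, so the tone stays harmonic. Mapped on its own, the fifth harmonic sits near G-sharp, which the scale table leaves untouched, so it would no longer be five times the new fundamental.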

Now we come to the listening test methodology. Testing is a difficult task if you do this kind of transposition. So we selected MIDI samples, which we first had in the original mode, and we did a MIDI transposition to obtain a file which we could then put into the test. So this is the transposed reference signal, which is done by MIDI and then transferred to a wave file. On the other hand, we take the original wave file and process it with the transposition, and then we can compare the two.

And we have different versions: three versions of the MODVOC, and one reference transposition system is also present.

There is one commercial system available, which is the Direct Note Access in the Melodyne editor by Celemony. It has been available since autumn 2009, and it also allows selective editing of polyphonic music. But it performs a multi-pass analysis, and it does an automatic decomposition into notes with heuristic classification rules. It can also be used to perform this key mode conversion, and that is why we also tried to compare our approach with this one.

These are the items we used. From the Mutopia project we used some different signals and different MIDI files, as said before, which are shown here. We tried to get some variety: more complex orchestral music and some more solo instrument parts, so quite a mixture in the complexity of the content.

These were the results of a so-called MUSHRA test. I don't want to go too much into detail. In this test we have the hidden reference, which is denoted by one. We have a lower-quality reference, which is just a low-pass filtered signal, denoted by number two. The original MODVOC is number three, MODVOC with harmonic locking is four, and MODVOC with harmonic locking and envelope shaping is five. Six is the DNA, the system we compared to.

But first we want to see how our enhancements work in the MODVOC. One example is the difference between four and five, which means the addition of envelope shaping. We see for the guitar that it sounds much clearer and is somewhat preferred by the listeners. And here we have the difference between the original MODVOC and the MODVOC with harmonic locking, which delivered better results for the piano signal. We also see that in most of the cases the DNA performed better.

I can summarise these slides here: the harmonic locking really improves the timbre, and the envelope shaping also improves the transient parts, but DNA was rated better for five out of seven items. The rating could cover different aspects of the sound change which was performed here, like unnatural-sounding artifacts, melody or chord transposition errors, timbre preservation, or pitch. And the listeners informally reported a trend: transposition errors in the DNA and timbre problems from the MODVOC.

So we made an additional test, which was a preference test on these main quality aspects, to find out if this is really the case. For this we had twelve expert listeners, mainly with a technical and musical background, and we now took only the extended MODVOC and compared it to the DNA.

and

um we also found out in the first test

that is unknown mailer T which is a

a transcribed version of the original the um me D

is

somehow hard to

um to great for for people so we did it the other way around we did the transcription with me

D integral tries

transcribe it back to the original um score

um with a right for with our egg

for for signals

which are shown yeah also orchestra and some mixture and P know

Now we put these files into the preference test, and the outcome was quite clear, in the sense of what people had reported before: there was a clear preference for the melody transposition of the MODVOC, which is shown here. The MODVOC is always on the left side. These are the results for the melody transposition, and here are the results for timbre, which show a clear preference for the DNA.

I can play an example. I can play all the files together, in short versions, in the order they are shown here. First the original.

[audio examples played]

I think you can hear some problems in the melody transposition of the DNA, even in these listening conditions. Another example is the piano. If we still have time, I will also play this one here.

[audio example played]

Okay, so just a short summary. We now have the MODVOC for selective transposition of pitch, which is capable of real-time processing, which can reproduce transients, and which also improves the timbre by harmonic locking. It is preferred over the commercial system in terms of the transposition of the melody, but the DNA is preferred in timbre preservation. In general, both of the systems were in the range from fair to good, so there is room for improvement, but they are already somewhat useful, both systems. Thank you.

Any questions?

One question I had: were these trained listeners, golden ears, or were they not?

For the preference test it was people who also had some music background, which is quite important for this timbre grading, let's say.

But they weren't signal processors or special golden ears?

No.

Any other questions?

One harder question: what would you like to do if you had all the signal processing power and all the smarts you could want? What would you like to do to the problem?

I think the decomposition can be made a bit more complicated. You can imagine that you have tones which are a mixture of harmonics and fundamental frequencies and so on, different harmonics of different tones which match on the grid. Then of course the decomposition is much more complicated, and for this you would need quite a bit more computation. So I think this would be one of the ways a further improvement could be achieved.

Okay, I see.

Anything else?

Thank you.

Okay, can you use the microphone?

On your bullet point up there about the reproduction of transients improved by LPC-based envelope shaping, could you comment on what that is?

Yes. We use the LPC parameters, which are obtained in the frequency domain, and apply them as a time envelope in the time domain. This is what I showed with the red blocks in the overview diagram.

Thank you.