so much

um yeah

My name is [name unclear], from the Nara Institute of Science and Technology, Japan, and today I will

talk about automatic music

thumbnailing based on audio object localization.

So this is the outline of this talk.

First I will explain our motivation and the background.

Next I will explain the spatial-information-based method, which uses

k-means clustering, and after the experimental results

I will conclude.

So this is the background of our research.

The music thumbnail is a digest

of the music tune, and it shows the abstract information of the musical tune.

For example,

if we can hear the musical thumbnail,

we can easily listen to and understand the abstract information of the tune,

and we can easily judge

whether to buy it

or not.

uh i a pretty far about one

So

the musical thumbnail

is very important.

But the problem is that, commonly, music thumbnails are mainly made manually.

That is a very big problem, because

so many music tunes

exist.

and uh for example the

so many music uh

uh produce

a by the current to be shows as we as the uh all these

So

one difficulty is how to deal with

the music tune

and make the thumbnail

in an

automatic way.

and the and seven

problem is the back so have now provide that's that very files

and the bound to the time segment

So, for example, a randomly picked-up digest

of the tune is a bad way.

This is a very random digest, and

the abstract information it gives is not so good.

Our goal is a technology for finding a good digest

of the musical tune.

The related works are

this one and this one.

This one is based on structure analysis by beat tracking,

and this one is structure understanding by main melody analysis.

Both methods

deal with the monaural signal,

and both methods

try to extract temporal features such as the melody.

But

in this talk

I propose

to use another, alternative kind of information.

For example,

most music tunes are distributed in stereo, that is, a two-channel format,

so we can easily extract the spatial information instead of the temporal information.

That is the motivation.

So the aim of this research is that

we try to provide

an alternative

cue

for making the musical thumbnail,

instead of the

temporal information that is used in the conventional methods.

First,

I propose a new analysis method

for picking up the

spatial information of the tune.

Next,

we investigate the difference

between the proposed method and a temporal-based method.

And then we propose a merged

approach.

So I am

going to explain the localization estimation

of the audio objects.

We estimate

the spatial information included in the audio tunes.

For example,

the drums, the percussion, and the

vocals

are located at the center

normally,

and the piano and the side guitar or something

are located at the sides.

So

the musical tune has a structure in the spatial direction

as well as in the temporal direction.

So we

want to extract this spatial information by using a vector quantization technique.

This is the overall estimation process for the audio objects.

First we apply short-time Fourier analysis, like the conventional methods.

Then

we do k-means clustering,

which I will explain later,

and we pick up

the quantization vectors that

express the direction

of each

object, that is, each instrument.

Then

we

also extract the activation function

of each

object.

Then

we classify,

and we

detect

the changing time of the structure of the

music tune.

So this is a model of the time-frequency input signal.

This is the general form, but nowadays we mostly use

only the

two-channel signal, the stereo format, so M equals two.

And this is the quantization vector that we want to

fit to this

signal.

and that this is yeah the uh uh a sorry for B is the in the

um maybe the in

so this is a mistake

um that this is that any given engine now can tie this sum bit to

but that this and can be that the mean by our so

And this is a figure of the

spectral components

that exist in the right-hand side

and the left-hand side channels.

So,

at a

frequency and a time,

suppose that

the stereo signal has such a configuration.

Then

we want to

fit some quantization vectors like this,

each of which expresses one direction of some

sources.

For example, this direction

represents the instruments located almost at the center,

like the drums and the vocal,

and this

left-side

vector means the left-hand side instruments, like the side guitar or something.

So this is the clustering,

the mathematics of the clustering.

We want to

classify

each object

not by the norm of each cell

but only by the direction.

This is a difference from the conventional, normal approach.

For example,

we do not want to take the absolute value of each component in the

left-hand and right-hand

channels in the time-frequency domain,

but we want to extract the

angle

from the quantization vector.

so oh

So this is the input at one frequency and one time,

and this is one

candidate for the quantization vector.

At this point the vector is normalized

to unit norm,

for example.

And we calculate not the Euclidean difference between

these vectors,

but the angle

between this quantization vector

and the input signal vector.
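The point about comparing by direction rather than by magnitude can be sketched as follows. This is only an illustration under assumptions: the helper names `unit_direction` and `cosine_similarity` and the example vectors are hypothetical and do not come from the talk, and each input is taken to be the pair of left and right magnitudes at one time-frequency point.

```python
import numpy as np

def unit_direction(x):
    """Scale a magnitude vector such as [|L|, |R|] to unit norm (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def cosine_similarity(x, c):
    """Cosine of the angle between an input vector x and a candidate vector c."""
    return float(np.dot(unit_direction(x), unit_direction(c)))

# Illustrative values only: two center-panned points with very different loudness
# and one point panned to the left channel.
loud_center = [2.0, 2.0]
quiet_center = [0.1, 0.1]
left_heavy = [1.0, 0.1]
center_vector = [1.0, 1.0]

print(cosine_similarity(loud_center, center_vector))
print(cosine_similarity(quiet_center, center_vector))
print(cosine_similarity(left_heavy, center_vector))
```

With an angle-based measure the loud and the quiet center-panned points match the center vector equally well, whereas a Euclidean distance would separate them by loudness.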

So

the k-means clustering

is an

iterative

procedure

as follows.

First we set the initial classes

and the centroids; this C is the initial centroid.

Then

we update the centroid

like this:

we calculate the

cosine distance between the centroid and the

input signals.

This cosine error can be simplified, so the problem

becomes very simple:

the solution is given by solving a maximum eigenvalue problem

for the correlation matrix of the input signals within one class.

We solve this maximization problem using SVD or something.

Then we

define the

new centroid.

Then

we define the new

classes.

Then we

go back to the update step.

We calculate this

optimization for each class.

Now we obtain the optimal quantization vector for each class, that is, the

centroid C,

and we classify each time-frequency component into one of the classes, which gives the class index i.
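The iteration described above can be sketched roughly as below. This is a minimal sketch under assumptions, not the speaker's implementation: the function name `direction_kmeans`, the farthest-point initialization, the fixed iteration count, and the normalization of every input vector to unit norm before clustering are all choices made here for illustration. The centroid update takes the eigenvector with the largest eigenvalue of the within-class correlation matrix, as the talk describes.

```python
import numpy as np

def direction_kmeans(X, n_classes=2, n_iter=20):
    """Cluster non-negative magnitude vectors by direction only (hypothetical sketch).

    X has shape (N, M); every row is assumed to be non-zero.
    Returns (centroids, labels), where centroids have unit norm.
    """
    X = np.asarray(X, dtype=float)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)  # keep direction, drop norm

    # Assumed initialization: start from the first observation, then repeatedly
    # add the observation least similar to the centroids chosen so far.
    chosen = [U[0]]
    for _ in range(1, n_classes):
        sim = np.max(U @ np.array(chosen).T, axis=1)
        chosen.append(U[np.argmin(sim)])
    C = np.array(chosen)

    for _ in range(n_iter):
        # Assignment: largest inner product with a unit centroid = smallest angle.
        labels = np.argmax(U @ C.T, axis=1)
        for k in range(n_classes):
            members = U[labels == k]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty class
            R = members.T @ members            # within-class correlation matrix
            _, eigvecs = np.linalg.eigh(R)     # eigenvalues come in ascending order
            c = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
            C[k] = c if c.sum() >= 0 else -c   # fix the arbitrary sign
    labels = np.argmax(U @ C.T, axis=1)        # class index for each input
    return C, labels

# Illustrative data: three roughly center-panned points, three left-panned points.
X = [[2.0, 2.0], [0.5, 0.55], [1.0, 0.9], [3.0, 0.2], [1.0, 0.1], [0.4, 0.02]]
C, labels = direction_kmeans(X, n_classes=2)
print(C)
print(labels)
```

In practice the rows of `X` would be the per-channel magnitudes at each time-frequency point of the short-time Fourier analysis, and `labels` reshaped to the time-frequency grid would play the role of the class index i.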

This

class index i is important because it is just like

a representation of the activation of the sound direction

at each time and frequency.

So we call this

class index function the audio object localization.

so that mean the at that one

at uh

frequency and can time point

the if the i function equals and

and for example the one and two and three

and that there

um the this very equal one um that the wide do so this

mean the

some up to be should so

Let's see an example of the i function.

This is a real music tune,

played by trumpet, trombone, bass, drums, and saxophone.

The interval A

is the trumpet solo part,

and the interval B is the saxophone solo part.

So I think you can see that

one class is very active in interval A, where the trumpet is very active.

Then we see a very short,

relatively active area for the other class,

which means that the instruments

play together

for a short time,

and then

the activity shifts

in interval B to the

saxophone solo, so this is the saxophone.

So this is just like a

separation result on the spectrogram.

And then we

want to pick up the changing time point.

So we simplify the information further:

we merge the time-frequency activation information

along the frequency axis,

and we define this as the activation at each time,

where we give some frequency weighting.
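A rough sketch of merging the class index map along frequency is given below. It assumes the class index is stored as an integer array of shape (frequency, time); the function name `class_activation` and the uniform default weights are assumptions made here, since the talk does not specify the weighting.

```python
import numpy as np

def class_activation(index_map, n_classes, freq_weights=None):
    """Per-class activation over time from a class index map (hypothetical sketch).

    index_map[f, t] is the class index at frequency bin f and time frame t.
    Returns an array of shape (n_classes, n_time).
    """
    index_map = np.asarray(index_map)
    n_freq, n_time = index_map.shape
    if freq_weights is None:
        w = np.ones(n_freq)                    # assumed default: no weighting
    else:
        w = np.asarray(freq_weights, dtype=float)
    activation = np.zeros((n_classes, n_time))
    for k in range(n_classes):
        # Weighted count of frequency bins assigned to class k in each frame.
        activation[k] = w @ (index_map == k).astype(float)
    return activation

# Illustrative 2-bin, 3-frame index map.
index_map = [[0, 0, 1],
             [0, 1, 1]]
act = class_activation(index_map, n_classes=2)
print(act)
```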

and that this is the sum example

or the us

block this um do then T

so

a is the changing time be L structure change point

so the first and this back and

crass and is that

that maybe that one

it's a we up

and that then of the the changing point

the and and that's that

S S is a

so

This means that the

instrument

activity

shifts from one class to another, here to class one. So this is just like an

activation

sequence

along the spatial

direction.

So

we

want to pick up

this

changing point.

Sometimes

there are some fine fluctuations, like a vibration,

so we

use a very simple smoothing technique along the time axis.
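One simple way to realize smoothing followed by change detection is sketched below. The talk does not give the smoothing or the decision rule, so everything here is an assumption: a moving average is used as the smoother, the dominant class per frame is taken as the state, and a change point is reported wherever that dominant class changes. The name `change_points` and the parameter `smooth_len` are hypothetical.

```python
import numpy as np

def change_points(activation, smooth_len=5):
    """Frames where the dominant class changes after moving-average smoothing."""
    act = np.asarray(activation, dtype=float)
    kernel = np.ones(smooth_len) / smooth_len
    # Smooth each class activation along time (zero padding at the edges).
    smoothed = np.array([np.convolve(row, kernel, mode="same") for row in act])
    dominant = np.argmax(smoothed, axis=0)
    return np.flatnonzero(np.diff(dominant) != 0) + 1

# Illustrative activation: class 0 is active for frames 0-9 except a one-frame
# glitch at frame 4, then class 1 takes over from frame 10.
a0 = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1] + [0] * 10, dtype=float)
a1 = 1.0 - a0
act = np.vstack([a0, a1])

print(change_points(act, smooth_len=1))  # no smoothing: the glitch is reported too
print(change_points(act, smooth_len=5))  # smoothing removes the glitch
```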

and also also do we do the we ring

R be at this a requisition things P

and uh into the ability to get the number of classes

For example, we assumed four

states. The first state is

that

only one side is active,

or only the right-hand side is active,

or both are active,

or

no

signal is

active.

So we classify into four states,

and this

classification gives a very robust result.

Okay,

let's go to the evaluation.

We did the experiment using the RWC

popular music database. We used twenty-five

popular music signals, and the genres are

such as hip-hop, rock, pops,

blues, and metal.

We manually put two hundred sixty-seven structure change times

by hand

in the database, and these are regarded as the correct answers.

So in this experiment

the number of classes is two.

the real marked

and uh we

set the quantization but the the number of sent on this some but the

and

E so we

So this is the

result using the proposed method.

The number of correct answers is two hundred sixty-seven,

and

we picked up one hundred ninety-three,

and this is the

false detection.

So the precision and the recall are almost 0.7, that is, seventy percent or more,

and the F-measure

is

0.74.

So more than seventy percent

of the detections are correct.
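For reference, precision, recall, and F-measure follow from three counts in the standard way. The sketch below uses made-up counts purely for illustration; they are not the figures from this experiment, and the function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(n_correct, n_detected, n_truth):
    """Standard precision, recall, and F-measure from detection counts.

    n_correct:  detections that match an annotated change point
    n_detected: all detections returned by the method
    n_truth:    all annotated change points
    """
    precision = n_correct / n_detected if n_detected else 0.0
    recall = n_correct / n_truth if n_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts, not taken from the talk.
p, r, f = precision_recall_f(n_correct=3, n_detected=4, n_truth=6)
print(p, r, f)
```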

So

somebody

may have some interest

in the comparison between this

spatial-information-based method

and a conventional temporal-based method.

So we compare them.

We did the experiment with a competitive method, the SI-PLCA proposed by

Weiss

in 2010. This is an NMF-based method

with automatic detection of the duration,

so this is a temporal-based method,

a state-of-the-art

conventional method.

But

what I propose in this talk

is the spatial-based

method.

So this is the comparison, and we

show the

precision, the recall, and the F-measure.

As you can see,

in the F-measure

there is not so much

difference

in the value itself,

but

the content

of the detection

behavior

is different. So this is an investigation of the difference

between the proposed spatial-based method and the temporal-based method.

yeah

We

investigated the correlation between the conventional method and the proposed method.

One hundred twenty-seven detections

were made

by both

methods,

but

fifty-three detections

were

detected only by the proposed method,

so these are detected by the spatial information.

On the other side,

forty-nine detections

were

detected only by the SI-PLCA, that is, by the temporal

method.

So,

as you can see,

there is a very low correlation.

The F-measure is almost the same, but the detected points are

very different,

so the two methods are very complementary.

So next we apply a

merging technique.

We have many ideas for the merging technique, but in this

talk I

applied a very simple one,

the OR operation, which is very simple but very effective.
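An OR-style merge of two sets of detected change times can be sketched as below. The talk only says that a simple OR operation is used, so the handling of near-duplicate detections is an assumption made here: two detections closer than a tolerance `tol` are counted once. The function name `merge_or` and the example times are hypothetical.

```python
def merge_or(points_a, points_b, tol=0.0):
    """Union of two lists of change times; detections within tol are merged."""
    merged = []
    for t in sorted(list(points_a) + list(points_b)):
        # Keep a time only if it is not within tol of the last kept time.
        if not merged or t - merged[-1] > tol:
            merged.append(t)
    return merged

# Hypothetical detections in seconds from a spatial and a temporal method.
spatial = [10.0, 30.0, 52.0]
temporal = [10.5, 40.0]
print(merge_or(spatial, temporal, tol=1.0))
```

A union raises recall when the two methods find different points, which fits the complementary behavior reported in the talk, but it can also add false detections, so the net effect on the F-measure has to be checked on data.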

So this is the result of the

merged output,

and we can see a very good

F-measure.

The baseline is the proposed method by itself.

The F-measure of the baseline is 0.74, but with this merging

technique

we achieve more than eighty percent

accuracy.

So, in conclusion, the spatial and the temporal information

are both very good

information and very complementary,

so if we

use

both,

we can have a better result.

So this is the conclusion. I proposed a new alternative method

to detect the changing points of music for the purpose of music thumbnailing.

The conventional methods are based on temporal structure

extraction,

but the proposed method is a spatial-

information-based method:

we detect

the changing time of the spatial structure

of the multichannel music tune.

Using this method, we achieve about seventy percent accuracy,

but more than eighty percent accuracy can be obtained

if we use the

merged

approach with the

temporal information, that is, the temporal-based method.

Thank you so much.

for

If I have the time, I want to show a demo of the thumbnail.

okay

for

yeah

i mean um the

that are so that's so

yeah

yeah yeah

i

that

this is the second pass


that you can and that's standard that is music

yeah

Okay, thanks so much.

I'm not sure that I'd buy the music,

but that might be because my ear is not tuned.

I do have one question. I haven't thought about audio summaries,

but

with video summaries of movies, the trailer often does not look at all like what the movie would

normally be,

and I'm wondering what the right answer is for music. Do you want

just excerpts, or do you want

some other style?

What's

going to be the best thumbnail?

and if you if you are disk could design or on from know will the use

we only do with the uh music

B six them out

a by the much on the L format

yeah

What would be the ideal summary?

I mean, if you could make your own summary, what would it

look like?

no no things you know what user can

um but from some some the mean this some the house say not

like the you know

some no

What would be the ideal thumbnail for a piece of music?

and the

this one

No, in general: would it be just pieces of the music, a beginning, a middle, and an end, or would it

be

something completely different?

um

So

for

pop music it is very easy to detect the spatial

information, and we can easily make the thumbnail.

But in general,

some

music is very difficult

to deal with.

For example, symphony music

is difficult.

In general,

to deal with such music,

one big problem is

music with many

instruments,

like a symphony. In such a case

there are so many

quantization vectors.

After that we can

detect the

changing points, but the

classification algorithm becomes very

complex.

Okay.

An alternate solution that colleagues of mine are working on is actually putting it up on the web and seeing what people click

on.

They are doing it for image thumbnails,

and so then we can actually measure

click-through performance as a way to rate

thumbnails,

which might be a point of view at some point.

No more questions?

Thanks very much.

Thank you.