Thank you so much.
My name is [name unclear], from the Nara Institute of Science and Technology, Japan, and today I will talk about automatic music summarization based on audio object localization.
This is the outline of the talk. First, I explain our motivation and the background.
Next, I explain the spatial information extraction method based on k-means clustering, and after the experimental results, I will conclude.
So this is the background of our research.
A music summary is a digest of a music tune that shows the abstract information of the tune.
For example, if we can hear a music summary, we can easily understand the abstract information of the tune, and we can easily judge whether to buy it or not, or whether it is a preferable one.
So the music summary is very important.
But the problem is that, commonly, music summaries are mainly made by hand.
That is a very big problem because so many music tunes exist.
For example, so many music tunes are produced, current ones as well as the oldies.
So one difficulty is how to deal with the music tunes and make the summary in an automatic way.
Another problem is that what is provided now is very short files that are bound to a fixed time segment.
For example, picking up a digest of the tune at random is a bad way.
It is a very random digest, and the abstract information is not so good.
Our goal is to address the technology for finding a good digest of the musical tune.
The related works are this one and this one.
This one is based on structure analysis by beat tracking, and this one is structure understanding by main melody analysis.
Both methods deal with the monaural signal, and both methods try to extract the harmony, melody, or beat.
But in this talk, I propose to use another, alternative kind of information.
For example, most of the available music tunes are in stereo, a multichannel format, so we can easily extract the spatial information instead of the temporal information.
That is the motivation.
So the aim of this research is as follows.
We try to provide an alternative cue for making the music summary, instead of the temporal information that is used in the conventional methods.
First, I propose a summarization method that picks up the spatial information of the tune.
Next, we investigate the difference between the proposed method and a temporal-based method.
And then we propose a merged approach.
So I am going to explain the estimation of the audio objects.
We estimate the spatial information included in the audio tunes.
For example, the drums, some percussion, and the vocals are located at the center, normally.
But sometimes the piano and the side guitar or something are located at the side.
So the music tune has a structure in the spatial direction as well as in the temporal direction, and we want to extract this spatial information by using a vector quantization technique.
This is the overall estimation process of the audio objects.
First, we apply short-time Fourier analysis, like the conventional methods.
Then we do k-means clustering, which I will explain later, and we pick up the quantization vectors that express the direction of each audio object, that is, each instrument.
Then we also extract the activation function of each object.
Then we classify, and we detect the changing times of the spatial structure of the music tune.
So this is a model of the time-frequency input signal.
This is the general form, but normally we mostly use only the two-channel signal, the stereo format, so N is equal to two.
And this is the quantization vector that we want to fit to this signal.
Sorry, this symbol is a mistake on the slide.
This is the given number of quantization vectors, and this can be determined by us.
And this is a figure of the spectral components that exist in the right-hand-side and left-hand-side channels.
So in frequency and time, the stereo signal has such a configuration.
Then we want to fit some quantization vectors like this, each of which expresses the direction of some sources.
For example, this direction represents the instruments located almost at the center, like the drums and the vocal, and this left-side vector means the left-hand-side instruments, like the side guitar or something.
So this is the clustering, the mathematics of the clustering.
We want to classify each object not from the amplitude of each cell but only from the direction.
This is a difference from the conventional, normal approach.
For example, we do not want to take the amplitude of each component in the left-hand and right-hand channel domain, but we want to extract the angle from the quantization vector.
So this is the input at one frequency and one time, and this is one candidate for the quantization vector.
At this point the vector is normalized to a unit vector, for example.
We do not calculate the Euclidean difference between these vectors, but we want to calculate the angle between this vector and the input signal vector.
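The point about direction versus amplitude can be illustrated with a minimal Python sketch. It assumes that each time-frequency cell is reduced to a pair of non-negative magnitudes (left, right); the function names and the example values are hypothetical and are not taken from the talk.

```python
import math

def cosine_similarity(x, c):
    """Cosine of the angle between two 2-D vectors (left, right)."""
    dot = x[0] * c[0] + x[1] * c[1]
    norm = math.hypot(x[0], x[1]) * math.hypot(c[0], c[1])
    return dot / norm if norm > 0.0 else 0.0

def euclidean_distance(x, c):
    """Ordinary distance, which does depend on the amplitude of x."""
    return math.hypot(x[0] - c[0], x[1] - c[1])

# Hypothetical unit-norm quantization vector pointing at the center.
center = (math.sqrt(0.5), math.sqrt(0.5))

# Two cells with the same direction but very different amplitudes.
quiet = (0.1, 0.1)
loud = (5.0, 5.0)

print(cosine_similarity(quiet, center), cosine_similarity(loud, center))
print(euclidean_distance(quiet, center), euclidean_distance(loud, center))
```

Both cells get the same cosine score because they point in the same direction, whereas the Euclidean distance separates them only because one is louder.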
So the k-means clustering proceeds as follows.
First, we set the initial centroids.
Then we update the centroids like this.
We calculate such a cosine error, a cosine distance, with respect to the signals.
Minimizing this cosine error is very simplified: the solution is given by solving the maximum eigenvalue problem of the correlation matrix of the input signals within one class.
We solve this maximization problem using SVD or something.
Then we define the new centroids, and then we define the new classes.
Then we go back to the update.
We carry out this optimization for each class.
Now we obtain the optimal quantization vector of each class, that is, the centroid c, and we classify each component of the audio, and that is the class index I.
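The loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each cell is a pair of real, non-negative magnitudes (left, right), the correlation matrix is left unnormalized, and the 2-by-2 eigenproblem is solved in closed form rather than with an SVD routine. All function names are hypothetical.

```python
import math

def unit(v):
    """Scale a 2-D vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def principal_direction(vectors):
    """Unit eigenvector for the largest eigenvalue of R = sum of x x^T (2x2)."""
    rxx = sum(x * x for x, _ in vectors)
    ryy = sum(y * y for _, y in vectors)
    rxy = sum(x * y for x, y in vectors)
    theta = 0.5 * math.atan2(2.0 * rxy, rxx - ryy)
    return (math.cos(theta), math.sin(theta))

def squared_cosine(x, c):
    """Squared cosine between x and a unit-norm centroid c."""
    norm2 = x[0] * x[0] + x[1] * x[1]
    if norm2 == 0.0:
        return 0.0
    dot = x[0] * c[0] + x[1] * c[1]
    return dot * dot / norm2

def assign(vectors, centroids):
    """Class index of each vector: the centroid with the smallest angle."""
    return [max(range(len(centroids)),
                key=lambda k: squared_cosine(x, centroids[k]))
            for x in vectors]

def direction_kmeans(vectors, initial_centroids, iterations=20):
    centroids = [unit(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for k in range(len(centroids)):
            members = [x for x, l in zip(vectors, labels) if l == k]
            if members:  # keep the old centroid if a class is empty
                centroids[k] = principal_direction(members)
    return centroids, assign(vectors, centroids)

# Hypothetical cells: three near the center, three on the left side.
cells = [(1.0, 1.0), (2.0, 2.1), (0.5, 0.45),
         (3.0, 0.3), (1.0, 0.1), (2.0, 0.25)]
centroids, labels = direction_kmeans(cells, [(1.0, 0.0), (0.0, 1.0)])
print(centroids, labels)
```

The returned labels play the role of the class index at each time-frequency point, and the centroids play the role of the quantization vectors.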
This class index I is important because it is just like a representation of the activation of a sound direction at each time and frequency.
So we regard the class index function as the audio object localization.
That means, at one frequency and time point, the I function equals n, for example one, two, or three.
And where this value equals one, it is a white dot, so this means some activation.
Let's see an example of the I function.
This is a real music tune, played with trumpet, bass, drums, and saxophone, among others.
The entire bar A is the trumpet part, and the entire bar B is the saxophone solo part.
I think you can see that one class is very active in bar A, where the trumpet is very active.
Then we see a very short, relatively active area for another class, but it is a short duration.
And then the activity shifts, in bar B, to the second one, so this is the sax.
So this is just like a separation result of the spectrogram.
And then we want to pick up the changing time points.
So we simplify the information more: we merge the activation information along the frequency axis.
Then we define this as the activity at each time, where we give some frequency weighting.
And this is an example of the result.
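One way to read this merging step is as a weighted count, per time frame, of the frequency bins assigned to each class. The Python sketch below assumes exactly that; the normalization by the total weight and the names `class_activity` and `index_map` are assumptions for illustration, not the formula on the slide.

```python
def class_activity(index_map, num_classes, weights=None):
    """Merge a class-index map over frequency.

    index_map[f][t] is the class index of frequency bin f at frame t.
    Returns activity[k][t], the weighted fraction of bins that belong
    to class k at frame t.
    """
    num_bins = len(index_map)
    num_frames = len(index_map[0])
    if weights is None:
        weights = [1.0] * num_bins  # uniform frequency weighting
    total = sum(weights)
    activity = [[0.0] * num_frames for _ in range(num_classes)]
    for f, row in enumerate(index_map):
        for t, k in enumerate(row):
            activity[k][t] += weights[f] / total
    return activity

# Hypothetical map: 4 frequency bins by 3 frames, 2 classes.
index_map = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
]
activity = class_activity(index_map, num_classes=2)
print(activity)
```

Passing a non-uniform `weights` list would emphasize some frequency bands over others, which is where a frequency weighting would enter.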
So this is the changing time, the structure change point.
First, this class is active, which is maybe the trumpet, and then, after the changing point, this class is the sax.
This means that the instrument is active in one class and then shifts to class one.
So this is just like an activation sequence along the spatial direction.
So we want to pick up the changing points.
Sometimes there is some fine fluctuation, like vibration, so we apply a very simple smoothing technique along the time axis.
We also limit the number of states.
For example, we assume four states: the first state is that only one side is active, another is that only the right-hand side is active, and the others cover the remaining cases of the signal.
So we classify into four states, and this gives a very robust result.
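A minimal Python sketch of smoothing followed by state-change detection is given below. The moving-average smoother, the threshold of 0.5, and the encoding of a state as a tuple of active or inactive flags (which yields four states for two classes) are all assumptions made for illustration; the talk does not specify them.

```python
def moving_average(seq, width):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out

def change_points(activity, width=5, threshold=0.5):
    """Frames at which the set of active classes changes.

    activity[k][t] is the activity of class k at frame t.
    """
    smoothed = [moving_average(a, width) for a in activity]
    num_frames = len(smoothed[0])
    states = [tuple(s[t] >= threshold for s in smoothed)
              for t in range(num_frames)]
    return [t for t in range(1, num_frames) if states[t] != states[t - 1]]

# Hypothetical activities: class 0 dominates the first half, with a
# one-frame dropout at t = 3 that should not count as a change.
class0 = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
class1 = [1 - a for a in class0]
points = change_points([class0, class1])
print(points)
```

The smoothing removes the single-frame dropout, so only the genuine switch between the two classes is reported.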
OK, let's go to the evaluation.
We did the experiment using the RWC popular music database.
We took twenty-five popular music signals, and the genres include pop, rock, blues, and metal.
We manually put two hundred sixty-seven structure change times in the database, which are regarded as the correct answers.
In this experiment, the number of channels is two, so this is the stereo format, and we set the number of quantization vectors, that is, the number of centroids.
So this is the result using the proposed method.
The number of correct answers is two hundred sixty-seven, and we picked up one hundred ninety-three of the correct ones.
The others were not detected, and this is the false detection.
So the precision and the recall are almost point seven, more than seventy percent, and the F-measure is point seven four.
So more than seventy percent of the detections are correct.
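For reference, precision, recall, and F-measure for detected change times can be computed as in the Python sketch below. The matching rule (each reference time may be matched once, within a tolerance) and the tolerance value are assumptions, since the talk does not state them, and the example times are toy data rather than results from the experiment.

```python
def evaluate(detected, reference, tolerance):
    """Precision, recall, and F-measure of detected change times."""
    unused = sorted(reference)
    hits = 0
    for d in sorted(detected):
        # Match the first unused reference time within the tolerance.
        match = next((r for r in unused if abs(r - d) <= tolerance), None)
        if match is not None:
            unused.remove(match)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example in seconds (hypothetical values).
reference_times = [10.0, 30.0, 50.0, 70.0]
detected_times = [11.0, 29.0, 55.0, 90.0]
scores = evaluate(detected_times, reference_times, tolerance=2.0)
print(scores)
```

Here two of the four detections fall within the tolerance of a reference time, so the false detections lower the precision and the missed reference times lower the recall.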
So somebody may have some interest in a comparison between this spatial-information-based method and a conventional temporal-based method.
So we compare them.
We did the experiment with an up-to-date method, that is, PLCA, proposed by [name unclear] in two thousand ten.
This is an NMF-based method that automatically detects the duration.
As you know, this is a temporal-based method, a state-of-the-art conventional method.
But note that what I propose in today's talk is the spatial-based method.
So this is the comparison. We show the precision, the recall, and the F-measure.
As you can see, in the F-measure there is not so much difference in the value itself.
But the content, the behavior of the detection, is different.
So this is an investigation of the difference between the proposed spatial-based method and the temporal-based method.
Here we show the correlation between the detections of the conventional method and the proposed method.
One hundred twenty-seven detections were made by both methods.
But fifty-three detections were detected only by the proposed method, so these are detected by the spatial information.
On the other side, forty-nine detections were detected only by PLCA, that is, the temporal method.
So as you can see, there is very low correlation.
The F-measure is almost the same, but the detected results are very different.
So the two methods are very complementary.
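The bookkeeping behind these three counts can be sketched in Python with set operations. The index ranges below are hypothetical placeholders chosen only so that the group sizes match the numbers quoted in the talk (127, 53, and 49), and the last line assumes that all of these detections are correct hits among the 267 reference points, which the talk does not state explicitly.

```python
def split_by_agreement(spatial_hits, temporal_hits):
    """Group detected reference points by which method found them."""
    spatial, temporal = set(spatial_hits), set(temporal_hits)
    return {
        "both": spatial & temporal,
        "spatial_only": spatial - temporal,
        "temporal_only": temporal - spatial,
    }

# Hypothetical reference-point indices sized to match the talk.
spatial_hits = range(0, 180)    # 127 shared + 53 spatial-only
temporal_hits = range(53, 229)  # 127 shared + 49 temporal-only

groups = split_by_agreement(spatial_hits, temporal_hits)
counts = {name: len(ids) for name, ids in groups.items()}

# Fraction of the 267 reference points found by at least one method,
# under the assumption stated above.
union_recall = len(set(spatial_hits) | set(temporal_hits)) / 267
print(counts, union_recall)
```

Under that assumption, at least one of the two methods finds 229 of the 267 points, which is why combining them looks attractive.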
So next, we apply a merging technique.
Maybe, as you know, we have so many ideas for the merging technique, but here I apply a very simple one: the OR operation. It is very simple but very effective.
So this is the result of the merged approach, and we can see a very good F-measure.
The baseline, the proposed method, is shown in B.
Its F-measure is point seven four, whereas this merging technique achieves more than eighty percent accuracy.
So, in conclusion, the spatial and temporal information are both very good information and very complementary, so if we use both, we can have a better result.
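An OR operation on two lists of detected change times can be sketched in Python as a union in which near-duplicates are kept only once. The tolerance used to decide that two detections are the same change point is an assumption, as are the function name and the example times.

```python
def merge_or(times_a, times_b, tolerance):
    """Union of two detection lists, dropping near-duplicate times."""
    merged = []
    for t in sorted(list(times_a) + list(times_b)):
        if merged and t - merged[-1] <= tolerance:
            continue  # already covered by the previously kept detection
        merged.append(t)
    return merged

# Hypothetical detections in seconds from the two methods.
spatial_times = [10.0, 30.0, 70.0]
temporal_times = [10.5, 50.0, 70.2]
merged_times = merge_or(spatial_times, temporal_times, tolerance=1.0)
print(merged_times)
```

A change point found by either method survives the merge, so the recall can only go up, while any false detections from both methods are also carried over.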
So this is the conclusion.
I proposed a new alternative method to detect the changing points for music summarization.
The conventional methods are based on temporal structure extraction, but the proposed method is a spatial-information-based method: we detect the changing times of the spatial structure of a multichannel music tune.
Using this method, seventy percent accuracy is achieved, but higher accuracy can be achieved if we use the merged approach with the temporal-based method.
Thank you so much.
If I have the time, I want to show a demo of the summary.
OK.
[demo audio plays]
This is the second part.
[demo audio plays]
I think you can understand that this is music.
OK.
OK, thanks so much.
I'm not sure that I'd buy the music, but that might be that my ear is not tuned.
I do have one question. I haven't thought about audio summaries, but with video summaries of movies, the trailer often does not look at all like the normal movie.
And I'm wondering what the right answer is for music: do you want just excerpts, or do you want something stylized?
What's going to be the best thumbnail?
And if you could design your own thumbnail, what would you use?
We only deal with music in the multichannel format.
Yeah, but what would be the ideal summary?
The summary, you know, or thumbnail, I mean: if you could design your own summary, what would it look like?
Not the things that, you know, the user can...
But the thumbnail, I mean the summary, how should I say, like, you know, the summary.
What would be the ideal thumbnail for a piece of music?
And this one...
No, in general: would it be just pieces of the music, would it be a beginning, a middle, and an end, or would it be something completely different?
Um.
So for popular music, it is very easy to detect the spatial information, and we can easily make the summary.
But in general, some music is very difficult to deal with.
For example, symphonic music is difficult.
Well, that is not so well suited to the present method.
In general, to deal with that kind of music, one big problem is music with many instruments, like a symphony.
In such a case we need so many quantization vectors.
After that we can detect the changing points, but the classification algorithm is very complex.
okay
An alternate solution that colleagues of mine are doing is actually putting it up on the web and seeing what people click on.
We are doing it for image thumbnails, and then we can actually measure click-through performance as a way to rate thumbnails, which might be of use at some point.
Any other questions?
Thank you very much.
Thank you.