Thank you so much.
Um, yeah.
My name is [inaudible], from [inaudible] Institute of Science and Technology, Japan. Today I will talk about automatic music thumbnailing based on audio object localization.
So this is the outline of this talk. First, I explain the problem and our motivation, that is, the background. Next, I explain the spatial-information-based method, which is built on k-means clustering. After that, I show the experimental results, and then I will conclude.
So this is the background of our research. A music thumbnail is a short excerpt of a music tune that conveys the abstract information of the tune. For example, if we can listen to an easy-to-hear thumbnail, we can easily understand the tune, extract its information, and judge whether to buy it or not. So the music thumbnail is very important.
But the problem is that, currently, music thumbnails are mainly made manually [inaudible]. That is a very big problem, because there is so much data: so many music tunes are produced [inaudible]. So one difficulty is how to deal with this many tunes and make their thumbnails automatically.
The other problem is that some services provide thumbnails that are simply bound to a fixed time segment, for example a randomly picked excerpt [inaudible]. Such a randomly scraped excerpt does not convey the abstract information of the tune very well.
Our goal is a technology for automatically extracting such a thumbnail from a musical tune. There are related works, this one and this one. This one is based on structure analysis by beat tracking, and this one is based on structure analysis using the main melody. But both methods deal with the monaural signal, and both try to extract the temporal structure of the tune.
But in this talk, I propose to use another, alternative kind of information. For example, most music tunes are distributed in stereo, two-channel format, so we can easily extract spatial information as well as temporal information.
That is the motivation. So in this research, we try to provide a new kind of cue for making music thumbnails: spatial information, as opposed to the temporal information used in conventional methods.
First, I propose a thumbnailing method that picks up the spatial information of the tune. Next, we investigate the differences between the proposed method and a temporal-based method. Then we propose merging the two.
So first, audio object estimation. We estimate the spatial information included in the audio tune. For example, the drums, bass, and vocals are normally located at the center, but sometimes the piano, a side guitar, or something similar is located at the side. So a tune has a structure in the spatial direction as well as in the temporal direction, and we want to extract this spatial information by using a vector quantization technique.
This is the overview of the estimation process for the audio objects. First, we apply short-time Fourier analysis, as in the conventional methods. Then we apply a k-means clustering technique, which I will explain later, and pick up the quantization vectors that express the direction of each instrument. Then we extract the activation function of each object, classify it, and detect the changing times of the spatial structure of the music tune.
So this is the model of the time-frequency input signal. This is the general form, but since most music is stereo, we use only the two-channel stereo signal, so N equals 2. And this is the quantization vector that we want to estimate. Sorry, this B on the slide is a mistake. [inaudible]
This figure shows the spectral components existing in the right-hand and left-hand channels at each frequency and time; a stereo signal has such a configuration. We want to find quantization vectors like these, each of which expresses one direction. For example, this direction represents almost the center, where instruments such as the drums and the vocal are located, and this left-side vector represents the left-hand-side instruments, such as a side guitar.
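The two-channel time-frequency observation described above can be sketched in Python with NumPy. This is a minimal sketch under assumptions the talk does not state: a Hann window, a 1024-sample frame, and a 512-sample hop. Each column `X[:, w, t]` is the stereo observation vector at one time-frequency cell; a source panned hard left shows up only in the first component.

```python
import numpy as np

def stereo_stft(x, n_fft=1024, hop=512):
    """Short-time Fourier transform of a multichannel signal.

    x: array of shape (N, samples); N = 2 for a stereo tune.
    Returns X of shape (N, n_fft // 2 + 1, frames), so X[:, w, t] is the
    N-dimensional observation vector at frequency bin w and frame t.
    Frame length and hop are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, t * hop : t * hop + n_fft] * window for t in range(n_frames)],
        axis=-1,
    )  # shape (N, n_fft, frames)
    return np.fft.rfft(frames, axis=1)

# Toy "left-panned" source: a 440 Hz tone present only in the left channel.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
x = np.stack([tone, np.zeros_like(tone)])
X = stereo_stft(x)
print(X.shape)  # (2, 513, 30)
```

The right-channel spectrum of this toy input is exactly zero, so every observation vector points along the left axis.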
So this is the clustering strategy, the mathematical expression. We want to cluster the cells into objects not by the amplitude of each cell but only by its direction. This is the difference from the conventional, normal approach: we do not take the absolute value of each component in the left- and right-channel domain; instead, we extract the angle relative to the quantization vector.
So this is the input vector at one frequency and one time, and this is one candidate quantization vector, which is assumed to be normalized. We do not calculate the plain difference between these two vectors; instead, we calculate the distance from the input signal vector to its orthogonal projection onto the quantization vector.
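A minimal sketch of this projection-based distance, assuming the quantization vector a is normalized to unit length as stated above (the function name `projection_distance` is ours, for illustration). For unit-norm a, the residual is d(x, a) = ||x - (a^H x) a||^2 = ||x||^2 - |a^H x|^2 = ||x||^2 (1 - cos^2 θ), so only the angle θ decides which quantization vector is closest, while louder cells carry more weight.

```python
import numpy as np

def projection_distance(x, a):
    """Squared distance from x to its orthogonal projection onto span(a).

    a is normalised to unit length first, so only its direction matters:
    d(x, a) = ||x||^2 - |a^H x|^2.
    Works for complex STFT vectors as well as real ones.
    """
    x = np.asarray(x, dtype=complex)
    a = np.asarray(a, dtype=complex)
    a = a / np.linalg.norm(a)
    proj = np.vdot(a, x)  # a^H x (np.vdot conjugates its first argument)
    return float(np.vdot(x, x).real - abs(proj) ** 2)

print(projection_distance([1.0, 1.0], [1.0, 0.0]))  # 1.0
print(projection_distance([2.0, 2.0], [1.0, 1.0]))  # ~0.0, same direction
```

A phase difference alone (for example x = [1j, 0] against a = [1, 0]) also gives zero distance, which is what a direction-only criterion should do for complex spectra.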
The k-means clustering proceeds as follows. First, we set the initial centroids. Then we update the centroids: we calculate the cosine-based distance, and with this squared error the update problem becomes very simple. Its solution is given by the eigenvector for the maximum eigenvalue of the correlation matrix of the input signals belonging to the class, and we solve this maximization problem using SVD or something similar. Then we define the new centroids and go back to the assignment step.
After convergence, we obtain the optimal quantization vectors, that is, the centroids c, and we classify each time-frequency component into one of the classes, with class index i.
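The clustering loop described above can be sketched as a direction-based k-means. Assumptions not given in the talk: the iteration count, the farthest-direction initialization, and the use of `numpy.linalg.eigh`. The talk mentions SVD; for a Hermitian positive semidefinite correlation matrix, the eigenvector with the largest eigenvalue coincides (up to phase) with the leading singular vector, so either route gives the same centroid.

```python
import numpy as np

def direction_kmeans(X, n_classes, n_iter=20, seed=0):
    """Cluster the columns of X (shape (N, M), complex) by direction only.

    Assignment: each observation x goes to the centroid c with the smallest
    projection residual ||x||^2 - |c^H x|^2, i.e. the largest |c^H x|^2.
    Update: the unit vector maximising sum |c^H x|^2 over one class is the
    principal eigenvector of that class's correlation matrix R = sum x x^H.
    """
    X = np.asarray(X, dtype=complex)
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=0)
    U = X[:, norms > 0] / norms[norms > 0]   # unit-length directions, zeros dropped
    # Assumed initialisation: one random direction, then repeatedly the
    # direction least aligned with the centroids chosen so far.
    centroids = [U[:, rng.integers(U.shape[1])]]
    for _ in range(1, n_classes):
        best_fit = np.max(np.abs(np.conj(np.stack(centroids)) @ U) ** 2, axis=0)
        centroids.append(U[:, np.argmin(best_fit)])
    C = np.stack(centroids, axis=1)          # shape (N, K), unit columns

    def assign(C):
        # |c_k^H x_m|^2 for every centroid k and observation m, shape (K, M).
        return np.argmax(np.abs(C.conj().T @ X) ** 2, axis=0)

    for _ in range(n_iter):
        labels = assign(C)
        for k in range(n_classes):
            members = X[:, labels == k]
            if members.shape[1] == 0:
                continue                      # keep the old centroid for an empty class
            R = members @ members.conj().T    # N x N correlation matrix
            _, eigvecs = np.linalg.eigh(R)    # eigenvalues in ascending order
            C[:, k] = eigvecs[:, -1]          # principal eigenvector, unit length
    return C, assign(C)

# Synthetic check: 400 cells, each lying exactly on one of two stereo
# directions (angles 0.3 rad and 1.2 rad) with a random complex amplitude.
rng = np.random.default_rng(1)
directions = np.array([[np.cos(0.3), np.cos(1.2)],
                       [np.sin(0.3), np.sin(1.2)]])   # columns are unit vectors
true_labels = rng.integers(0, 2, size=400)
amplitudes = rng.rayleigh(size=400) * np.exp(2j * np.pi * rng.random(400))
X = directions[:, true_labels] * amplitudes
C, labels = direction_kmeans(X, n_classes=2)
```

On this noise-free toy data the sketch should recover both directions, up to a phase factor and the ordering of the classes; real STFT data would of course be noisier.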
This class index i is important because it is just a representation of the activation at each time and frequency.
0:10:51 So we call this class index function I the audio object localization. That means that at one frequency and time point, if the I function equals k, for example 1, 2, or 3, the cell belongs to object k; an indicator equal to one there means that object k is active at that point.
0:11:16 Let's see an example of the I function. This is a music tune played by trumpet, trombone, bass, drums, and saxophone. Bar A is [inaudible], and bar B is the saxophone solo part.
0:11:39 I think you can see that one class is very active in bar A: the trumpet is very active there. Then we see a very short active area in another class, [inaudible]. And then in bar B, the sax is active. So this is just like a separation result of the instruments.
0:12:16 Then we want to pick up the changing time points, so we simplify the information further: we merge the activation information over frequency and define the activation at each time, applying some frequency weighting.
0:12:44 This is an example of the activation functions. The [inaudible] line marks a changing time, that is, a structure change point. First, this class is active; then, at the changing point, the active instrument changes from the previous one. So this is just like an activation pattern along the spatial direction.
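Merging the class-index function over frequency can be sketched as a weighted count of the cells assigned to each class in every frame. The talk applies "some frequency weighting" without specifying it, so the sketch takes optional, hypothetical weights and defaults to uniform weighting.

```python
import numpy as np

def activation_functions(labels, n_classes, freq_weights=None):
    """Turn the class-index function I(w, t) into one activation curve per class.

    labels: integer array of shape (F, T) holding the class index of each cell.
    freq_weights: optional weights of shape (F,); uniform if omitted
    (an assumption, since the talk does not specify the weighting).
    Returns A of shape (K, T): A[k, t] is the weighted fraction of
    frequency bins at frame t that belong to class k.
    """
    labels = np.asarray(labels)
    F = labels.shape[0]
    w = np.ones(F) if freq_weights is None else np.asarray(freq_weights, dtype=float)
    w = w / w.sum()
    # One-hot over classes: onehot[k, f, t] is True where labels[f, t] == k.
    onehot = labels[None, :, :] == np.arange(n_classes)[:, None, None]
    return np.einsum("f,kft->kt", w, onehot.astype(float))

# Two frequency bins, three frames, two classes.
A = activation_functions(np.array([[0, 0, 1],
                                   [0, 1, 1]]), n_classes=2)
```

Here class 0 fully occupies frame 0, the two classes share frame 1 equally, and class 1 fully occupies frame 2; with uniform weights each column of `A` sums to one.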
0:13:21 We want to pick up the changing points, but sometimes there are sound fluctuations, some variation. So we use a very simple smoothing technique along time, and we also do [inaudible] according to the number of classes. For example, we assumed four states: the first state is [inaudible], another is the center, another is only the right-hand side, and so on.
0:14:22 So we classify each frame into one of the four states, and this classification gives a very robust result.
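The smoothing and state classification above can be sketched as follows. The moving-average window, the on/off threshold, and the reading of the "four states" as the on/off patterns of two classes (2^2 = 4) are all our assumptions for illustration, not settings from the talk; a change point is then any frame whose state differs from the previous one.

```python
import numpy as np

def change_points(A, win=5, threshold=0.3):
    """Detect spatial-structure change points from activation curves A (K x T).

    Hypothetical parameters: `win` (moving-average length in frames) and
    `threshold` (activation level above which a class counts as "on").
    Returns (indices of change frames, per-frame state ids).
    """
    A = np.asarray(A, dtype=float)
    kernel = np.ones(win) / win
    # Simple smoothing along time, one class at a time (centred window).
    smooth = np.stack([np.convolve(a, kernel, mode="same") for a in A])
    on = smooth > threshold                          # (K, T) booleans
    # Encode each frame's on/off pattern as one id; K classes give 2**K states.
    state = (on * (2 ** np.arange(A.shape[0]))[:, None]).sum(axis=0)
    # A change point is a frame whose state differs from the previous frame's.
    return np.flatnonzero(np.diff(state) != 0) + 1, state

# Class 0 plays for 20 frames, then class 1 takes over.
A = np.vstack([np.r_[np.ones(20), np.zeros(20)],
               np.r_[np.zeros(20), np.ones(20)]])
points, state = change_points(A)
```

Because of the smoothing, the hand-over produces a short "both active" state, so this toy input yields two nearby change frames (19 and 21) rather than one; a real system would likely merge such close detections.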
0:14:31 Let's go to the evaluation. We did the experiment using the RWC popular music database. We used twenty-five popular music signals covering genres such as pop, rock, blues, and metal, and we manually put 267 structure change times into the database, which are regarded as the correct answers.
0:15:02 In our experiment, the number of [inaudible] is set to two [inaudible], and we set the number of quantization vectors, that is, centroids, to [inaudible].
0:15:20 This is the result using the proposed method. The number of correct change points is 267, and we picked up 193 of them correctly; the remaining detections are false detections. So the precision and the recall are both around 0.7, that is, about seventy percent, and the F-measure is also about 0.7. So about seventy percent of the detections are correct.
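Precision, recall, and F-measure for change-point detection can be sketched as below. The talk does not say how a detection is judged correct, so the tolerance window (in seconds) and the greedy one-to-one matching are hypothetical choices. With the reported counts, the recall of the proposed method is 193 / 267, roughly 0.72.

```python
def evaluate(detected, reference, tolerance=1.0):
    """Precision, recall and F-measure for change-point detection.

    A detection counts as correct if it lies within `tolerance` seconds of a
    reference point that has not been matched yet (greedy one-to-one matching).
    The tolerance value is a hypothetical choice, not taken from the talk.
    """
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        best = min(unmatched, key=lambda r: abs(r - d), default=None)
        if best is not None and abs(best - d) <= tolerance:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return hits, precision, recall, f_measure

hits, p, r, f = evaluate([10.2, 31.0, 55.5], [10.0, 30.0, 60.0])
print(hits)                 # 2
print(round(193 / 267, 3))  # 0.723, recall from the reported counts
```

Matching one-to-one prevents a burst of detections around a single boundary from being counted several times as correct.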
0:16:02 Somebody may be interested in a comparison between this spatial-information-based method and conventional temporal-based methods. So we did an experiment with a temporal-based method, a PLCA-based method proposed by [inaudible]; this is an NMF-based method that automatically detects [inaudible]. This is a temporal-based method, like the conventional methods, whereas what I propose is spatial.
0:16:52 This table shows the comparison of precision, recall, and F-measure. As you can see, the F-measures are not so different; however, the detections themselves, that is, which points each method finds, are different.
0:17:17 So this is an investigation of the difference between the proposed spatial-based method and the temporal-based method, that is, how the detections of the conventional and proposed methods relate. 127 detections are made by both, but 53 detections are made only by the proposed method, that is, they are found by the spatial information. On the other side, 49 detections are made only by the PLCA-based method, that is, by the temporal information.
0:18:09 As you can see, the overall evaluation is very similar and the F-measure is almost the same, but the detected points are very different. So the two methods are very complementary.
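The overlap analysis above can be sketched by asking which reference change points each method hits and splitting them into "both", "spatial only", and "temporal only". The helper names and the tolerance (in seconds) are hypothetical; the talk does not describe how its counts were computed.

```python
def matched_references(detected, reference, tolerance=1.0):
    """Indices of reference change points with a detection within `tolerance` seconds."""
    return {i for i, r in enumerate(reference)
            if any(abs(d - r) <= tolerance for d in detected)}

def complementarity(det_spatial, det_temporal, reference, tolerance=1.0):
    """Count reference points found by both methods or by only one of them."""
    s = matched_references(det_spatial, reference, tolerance)
    t = matched_references(det_temporal, reference, tolerance)
    return {"both": len(s & t),
            "spatial_only": len(s - t),
            "temporal_only": len(t - s)}

counts = complementarity([10.3, 59.5], [10.1, 89.8], [10.0, 30.0, 60.0, 90.0])
print(counts)  # {'both': 1, 'spatial_only': 1, 'temporal_only': 1}
```

Large "only" counts relative to "both" are what signals complementary methods, even when their F-measures are similar.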
0:18:28 So next, we apply a merging technique. There are many possible merging techniques, but here I use a very simple one, the OR operation: very simple but very effective.
0:18:52 This is the result of the merging, and we can see a very good F-measure. The F-measure of the baseline, the proposed method alone, is about 0.7, but the merged result achieves more than eighty percent accuracy. So spatial and temporal information are both very good and very complementary, and if we combine both, we get a better result.
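The OR merging can be sketched as the union of the two methods' change points. Treating points from the second list that fall within a tolerance of an already-kept point as duplicates is our assumption; the talk only says the operation is a simple OR.

```python
def or_merge(det_a, det_b, tolerance=1.0):
    """OR-merge two lists of change points (in seconds).

    Every point from either method is kept, except that a point from det_b
    lying within `tolerance` of an already-kept point is treated as a
    duplicate (hypothetical de-duplication rule).
    """
    merged = sorted(det_a)
    for d in sorted(det_b):
        if all(abs(d - m) > tolerance for m in merged):
            merged.append(d)
    return sorted(merged)

print(or_merge([10.0, 30.0], [10.4, 45.0]))  # [10.0, 30.0, 45.0]
```

The OR can only add detections, so it raises recall at the risk of extra false alarms; it pays off here because the two methods find largely different correct points.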
0:19:34 So, in conclusion, I proposed a new alternative method to detect the changing points of music tunes for music thumbnailing. The conventional methods are based on temporal structure extraction and tracking, but the proposed method is based on spatial information: we detect the changing times of the spatial structure of the music using the multichannel, stereo signal. Using this method, we obtained an F-measure of about seventy percent, and we can get higher accuracy if we combine it with temporal-information-based methods.
0:20:24 Thank you so much.
0:20:31 If I have time, I want to show more of the thumbnails.
[demo playback; mostly inaudible]
0:21:09 This is the second part.
[demo playback continues]
0:21:55 In this set, you can hear the standard part of this music.
0:21:59 Okay, thanks so much.
0:22:01 I'm not sure that I'd buy the music.
0:22:02 Maybe it's just not to my taste.
0:22:07 I do have one question. I haven't thought much about audio summaries, but for video summaries of movies, the trailer often does not look at all like the normal movie. So I'm wondering what the answer is for music: do you want just excerpts, or do you want some other summarization style? What's going to be the best thumbnail?
0:22:27 If you could design it from scratch, [inaudible], what would be the ideal summary? If you could do any summary you want, what would it look like?
0:22:46 [inaudible] What would be the ideal thumbnail for a piece of music? In general, would it be just pieces of the music, a beginning or a middle, or would it be something completely different?
0:23:15 So in popular music, it is very easy to detect the spatial information, and we can easily make the thumbnail. But in general, some music is very difficult to deal with; for example, [inaudible] music is difficult. [inaudible] In general, one big problem in dealing with all kinds of music is instrumental music, such as a symphony. In such a case, we need so many quantization vectors. After that, we can detect the changing points, but the classification algorithm becomes very [inaudible].
0:24:19 But another solution, which some colleagues of mine are working on, is actually putting thumbnails up on the web so that people click on them. We are doing that for image thumbnails, so we can actually measure click-through performance as a way to rate the thumbnails, which might be another point of view at some point.
0:24:36 Any more questions? Thank you very much.
0:24:40 Thank you.