0:00:13 Thank you so much.
0:00:15 My name is [name unclear], from Nara Institute of Science and Technology, Japan, and today I will
0:00:20 talk about automatic music thumbnailing based on audio object localization.
0:00:27 This is the outline of this talk. First, I will explain our motivation and the background.
0:00:33 Next, I will explain the spatial-information-based thumbnailing method, which uses k-means clustering, and after the experimental results I will conclude.
0:00:47 This is the background of our research.
0:00:50 A music thumbnail is a feature that describes a musical tune and shows the abstract information of the tune.
0:01:03 For example, if we can hear the music thumbnail, we can easily understand the abstract information of the tune, and we can easily judge whether to buy it or not [partly unclear].
0:01:23 So the music thumbnail is very important.
0:01:25 But the problem is that, commonly, music thumbnails are mainly made manually.
0:01:32 That is a very big problem, because there are so many music tunes.
0:01:43 For example, so many tunes are produced by [unclear] as well as [unclear].
0:01:52 So one difficulty is how to deal with the musical tune and make the thumbnail in an automatic way.
0:02:07 The second problem is that [unclear] provide very short files that are bound to a time segment.
0:02:15 For example, they randomly pick up a part of the tune [partly unclear].
0:02:21 So this is a very random description, and the abstract information is not so good.
0:02:30 Our goal is a technology for finding a way to abstract the musical tune.
0:02:41 The related works are this one and this one.
0:02:46 This one is based on structure analysis by beat tracking, and this one is structure understanding by main melody analysis.
0:02:57 Both methods deal with the monaural signal, and both methods try to extract the temporal or melody information [partly unclear].
0:03:13 But in this talk I propose to use another, alternative kind of information.
0:03:23 For example, most music tunes are distributed in a stereo-channel format, so we can easily extract spatial information instead of such temporal information.
0:03:38 That is the motivation.
0:03:42 So the goal of this research is that we try to provide an alternative cue for making the music thumbnail, instead of the temporal information that is used in the conventional methods.
0:04:03 First, I propose a new method for picking up the spatial information of the tune.
0:04:11 Next, we investigate the difference between the proposed method and a temporal-based method, and then we propose to merge them.
0:04:28 So I am going to the estimation of the audio objects.
0:04:34 We estimate the spatial information included in the audio tunes.
0:04:42 For example, the drums, some percussion and the vocals are normally located at the center [partly unclear], and the piano, the side guitar and so on are located at the side.
0:05:01 So the musical tune has a structure in the spatial direction as well as in the temporal direction.
0:05:08 We want to extract this spatial information by using a vector quantization technique.
0:05:19 This is the overall estimation process of the audio objects.
0:05:24 First, we apply short-time Fourier analysis, like the conventional methods.
0:05:30 Then we do a k-means-like clustering technique, which I will explain later, and we pick up the quantization vectors that express the direction of each audio object, that is, each instrument.
0:05:49 Then we also extract the activation function of each object, we classify it, and we detect the changing times of the structure of the music tune.
0:06:14 This is the model of the time-frequency input signal.
0:06:19 This is the general form, but normally we use only the two-channel signal, the stereo format, so N is equal to two.
0:06:30 And this is the quantization vector that we want to fit to this.
0:06:36 And this is, sorry, [unclear]; this is a mistake.
0:06:44 [unclear] the number of quantization vectors, but this can be determined by ourselves [partly unclear].
0:06:54 This figure shows the spectral components that exist in the right-hand-side and left-hand-side channels at each frequency and time.
0:07:08 It means that the stereo signal has such a configuration.
0:07:15 We want to fit some quantization vectors like this, where one vector expresses one direction of some sources.
0:07:26 For example, this direction represents the instruments located almost at the center, like the drums and the vocal, and this left-side vector means the left-hand-side instruments, like the side guitar or something.
0:07:46 This is the clustering, the mathematics in the clustering.
0:07:52 We want to cluster each object not from the amplitude of each [unclear] but only from the direction. This is a difference from the conventional, normal approach.
0:08:08 For example, we do not want to take the absolute value of each component in the left and right channel domain; we want to extract the angle from the quantization vector.
0:08:29 This is the input at one frequency and one time, and this is one candidate of the quantization vector. At this point the vector is normalized, to unit norm for example.
0:08:45 We calculate not the [Euclidean, partly unclear] difference between these vectors; we want to calculate the angle between this quantization vector and the input signal vector.
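The contrast between an amplitude-sensitive distance and an angle-based one can be illustrated with a small sketch. It assumes, purely for illustration, that each time-frequency bin is a real two-element vector of left and right channel magnitudes; the talk does not specify the exact representation, and the function names below are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def cosine_distance(x, c):
    """1 - |cos(angle)| between input x and a unit-norm vector c."""
    x_norm = math.sqrt(sum(a * a for a in x))
    if x_norm == 0.0:
        return 1.0  # silent bin: treated as maximally distant (assumption)
    dot = sum(a * b for a, b in zip(x, c))
    return 1.0 - abs(dot) / x_norm

def euclidean_distance(x, c):
    """Ordinary Euclidean distance, which depends on the amplitude of x."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

# Hypothetical unit vector for a center-panned source (equal energy in L and R).
center = normalize([1.0, 1.0])

# Two bins with the same direction but very different loudness.
quiet = [0.2, 0.2]
loud = [20.0, 20.0]

print(cosine_distance(quiet, center), cosine_distance(loud, center))
print(euclidean_distance(quiet, center), euclidean_distance(loud, center))
```

Both bins get essentially the same cosine distance to the center vector, while their Euclidean distances differ greatly, which is why a direction-only measure suits localization.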
0:09:03 The k-means clustering is [unclear], like this.
0:09:10 First, we set the initial centroids.
0:09:17 Then we update the centroids like this: we calculate the cosine distance [partly unclear] with respect to the centroids.
0:09:30 This cosine error can be simplified, so the problem becomes very simple: the solution is given by solving the maximum eigenvalue problem for the correlation function of the input signal within one class.
0:09:49 We solve this maximization problem using SVD or something.
0:09:55 Then we define the new centroid, go back [to the previous step], and calculate the activation for each class, each index.
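A minimal sketch of such a direction-based k-means loop follows. It is not the speaker's implementation: it assumes real, non-zero two-channel magnitude vectors, a fixed number of iterations, and it uses the closed-form principal axis of a 2x2 correlation matrix in place of a general SVD. All names are hypothetical.

```python
import math

def principal_direction(vectors):
    """Unit eigenvector for the largest eigenvalue of sum(x x^T), 2-D case."""
    a = sum(x[0] * x[0] for x in vectors)
    b = sum(x[0] * x[1] for x in vectors)
    d = sum(x[1] * x[1] for x in vectors)
    # Principal-axis angle of the symmetric matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    return [math.cos(theta), math.sin(theta)]

def assign(vectors, centroids):
    """Label each vector with the centroid of largest |cosine| similarity."""
    labels = []
    for x in vectors:
        x_norm = math.hypot(x[0], x[1])  # inputs are assumed non-zero
        scores = [abs(x[0] * c[0] + x[1] * c[1]) / x_norm for c in centroids]
        labels.append(scores.index(max(scores)))
    return labels

def directional_kmeans(vectors, centroids, iterations=20):
    """Alternate assignment and eigenvector-based centroid updates."""
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [x for x, lab in zip(vectors, labels) if lab == k]
            # Keep the old centroid if a class happens to be empty.
            updated.append(principal_direction(members) if members else old)
        centroids = updated
    return centroids, assign(vectors, centroids)

# Hypothetical bins: three near the center direction, three panned left.
bins = [[1.0, 1.0], [2.0, 2.1], [0.5, 0.45],
        [1.0, 0.1], [3.0, 0.2], [0.4, 0.05]]
centroids, labels = directional_kmeans(bins, [[1.0, 0.0], [0.0, 1.0]])
print(labels)
```

Because only the direction matters, loud and quiet bins from the same source end up in the same class, and each centroid is a unit vector pointing along its class.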
0:10:20 Finally, we obtain the optimal quantization vector for each class, that is, the centroid c, and we classify each component [unclear] into a class index i.
0:10:35 This class index i is important because it is just the representation of the activation of a [unclear] direction at each time and frequency.
0:10:47 So we regard this class index function as the audio object localization.
0:10:55 That means, at one frequency and time point, the i function equals n, for example one, two or three, and [unclear] this means some activation.
0:11:16 Let's see an example of the i function.
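As an illustration of how a class index function can be read as per-object activation, the sketch below turns a small index map into one binary map per class. The map values and the function name are hypothetical, and class numbering starts at zero here for convenience.

```python
def activation_maps(index_map, num_classes):
    """Turn a class-index map I[f][t] into one binary map per class."""
    return [
        [[1 if label == k else 0 for label in row] for row in index_map]
        for k in range(num_classes)
    ]

# Hypothetical index map: 3 frequency bins (rows) x 4 time frames (columns).
index_map = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
maps = activation_maps(index_map, 2)
for k, class_map in enumerate(maps):
    print(k, class_map)
```

Each time-frequency point is active in exactly one class map, so the maps together look like a hard separation of the spectrogram.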
0:11:22 This is a real music tune, played with trumpet, [unclear], bass, drums and saxophone.
0:11:31 The entire bar A is [unclear], and the entire bar B is the saxophone solo part.
0:11:39 I think you can see that one class is very active in bar A; the trumpet is very active.
0:11:48 Then we see a very short, relatively active area for [unclear].
0:12:02 Then, in bar B, the saxophone class [is active], so this is the sax.
0:12:07 So this is just like a separation result of the spectrogram.
0:12:16 Then we want to pick up the changing time points, so we simplify the information more.
0:12:25 We merge the activation information along the frequency axis [partly unclear], and then we define this as the activation at each time, where we give some frequency weighting.
0:12:42 This is an example. The black [unclear] denotes the changing time, the structure change point.
0:12:52 First, this [unclear] class is [active], and then, after the changing point, [unclear]; this means that the instrument [moved] to class one.
0:13:18 So this is just like the activation along with the spatial direction [partly unclear].
0:13:27 We define this, and we want to pick up the changing points like this.
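One way to picture the frequency merging and change-point picking is sketched below. The decision rule used here (a change is declared whenever the most active class switches between frames) is an assumption made for illustration; the talk does not spell out the exact rule, and the weights and names are hypothetical.

```python
def class_activation(index_map, num_classes, weights):
    """Frequency-weighted count of bins assigned to each class, per frame."""
    num_frames = len(index_map[0])
    act = [[0.0] * num_frames for _ in range(num_classes)]
    for row, w in zip(index_map, weights):
        for t, label in enumerate(row):
            act[label][t] += w
    return act

def change_points(act):
    """Frames where the most active class differs from the previous frame."""
    num_frames = len(act[0])
    dominant = [
        max(range(len(act)), key=lambda k: act[k][t])
        for t in range(num_frames)
    ]
    return [t for t in range(1, num_frames) if dominant[t] != dominant[t - 1]]

# Hypothetical index map (3 frequency bins x 4 frames) and frequency weights.
index_map = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
weights = [1.0, 1.0, 0.5]

act = class_activation(index_map, 2, weights)
print(act)
print(change_points(act))
```

In this toy input, class 0 dominates the first two frames and class 1 the last two, so a single change point is reported at frame index 2.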
0:13:39 Sometimes there are some fine artifacts, like vibration, so we apply a very simple smoothing technique along the time axis.
0:13:52 We also do the weighting of [unclear], and in addition [we set] the number of classes [partly unclear].
0:14:02 For example, we assumed four states: the first state is [unclear]; only the right-hand side is [active]; [unclear].
0:14:22 So we classify into four states, and this [unclear] gives a very robust result.
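The effect of a simple temporal smoothing step can be sketched with a centered moving average followed by a threshold. The window width, the threshold and the sample values are assumptions chosen for illustration; the talk only says that a very simple smoothing technique is used.

```python
def moving_average(values, width):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = width // 2
    out = []
    for t in range(len(values)):
        window = values[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical activation of one class: an isolated spike, then a real onset.
raw = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
smooth = moving_average(raw, 3)
active = [v > 0.5 for v in smooth]
print(active)
```

After smoothing and thresholding, the isolated one-frame spike no longer counts as active, while the sustained onset at the end is preserved.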
0:14:31 Let's go to the evaluation.
0:14:34 We did the experiment using the RWC popular music database. We used twenty-five popular music signals; the genres are hip-hop, rock, [unclear], pops, blues and metal [partly unclear].
0:14:52 We manually put two hundred sixty-seven structure change times in the database, which are regarded as the correct answers.
0:15:02 In this experiment the number of channels is two, so this is [stereo, partly unclear].
0:15:09 And we set the number of quantization vectors, the number of centroids, to [value unclear].
0:15:20 This is the result using the proposed method.
0:15:25 The number of correct answers is two hundred sixty-seven, and we picked up one hundred ninety-three of the correct ones; [unclear] this is the false detection.
0:15:41 The precision and the recall are almost point seven, more than seventy percent, and the F-measure is point seven four.
0:15:53 So more than seventy percent of the detections are correct.
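For readers unfamiliar with the metrics, the sketch below shows how precision, recall and F-measure are typically computed for boundary detection. The matching rule (a detection counts if it lies within a tolerance of a not-yet-matched reference time), the tolerance value and the example times are all assumptions; the talk does not state how detections were matched to the manual labels.

```python
def match_boundaries(detected, reference, tolerance):
    """Count detections within `tolerance` seconds of an unused reference."""
    unused = sorted(reference)
    hits = 0
    for d in sorted(detected):
        close = [r for r in unused if abs(r - d) <= tolerance]
        if close:
            # Consume the nearest reference so it cannot be matched twice.
            unused.remove(min(close, key=lambda r: abs(r - d)))
            hits += 1
    return hits

def precision_recall_f(hits, num_detected, num_reference):
    """Standard precision, recall and their harmonic mean (F-measure)."""
    precision = hits / num_detected if num_detected else 0.0
    recall = hits / num_reference if num_reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical reference and detected change times in seconds.
reference = [10.0, 30.0, 50.0, 70.0, 90.0]
detected = [11.0, 29.5, 52.0, 60.0]

hits = match_boundaries(detected, reference, tolerance=3.0)
precision, recall, f_measure = precision_recall_f(hits, len(detected), len(reference))
print(hits, precision, recall, f_measure)
```

Here three of the four detections match a reference boundary, giving a precision of 0.75, a recall of 0.6 and an F-measure of about 0.67.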
0:16:02 Somebody may have some interest in the comparison between this spatial-information-based method and some conventional temporal-based method, so we compare them.
0:16:19 We did the experiment with a competitive method, that is PLCA, proposed by [name unclear]. This is an NMF-based method that automatically detects the [unclear], so this is a temporal-based method.
0:16:39 It is a [state-of-the-art, partly unclear] conventional method, but what I propose is [unclear] spatial.
0:16:52 This is the comparison. We show the precision, the recall and the F-measure.
0:17:01 As you can see, in the F-measure there is not so much difference.
0:17:08 However, in the detection itself, in its contents, the detection behavior is different.
0:17:17 So this is an investigation of the difference between the proposed spatial-based method and the temporal-based method: we evaluate the correlation between the conventional method and the proposed method.
0:17:36 One hundred twenty-seven detections are detected by both.
0:17:48 But fifty-three detections are detected only by the proposed method, so these are detected by the spatial information.
0:17:59 On the other side, forty-nine detections are detected only by PLCA, that is, by the temporal information.
0:18:09 As you can see, there is a rather low correlation [partly unclear]: the F-measure is almost the same, but the detected points are very different.
0:18:24 So these are very complementary.
0:18:28 So the next step is that we apply a merging technique.
0:18:36 We may have many ideas for the merging technique, but in this talk I tried a very simple one: the Boolean OR operation [partly unclear], very simple but very effective.
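An OR-style merge of two sets of detected change points can be sketched as a union in which near-duplicate times are fused. The fusing tolerance and the example times are assumptions; the talk only describes the merge as a very simple operation.

```python
def merge_or(a, b, tolerance):
    """Union of two boundary lists; points closer than `tolerance` are fused."""
    merged = []
    for t in sorted(a + b):
        if merged and t - merged[-1] <= tolerance:
            continue  # the same boundary was already kept
        merged.append(t)
    return merged

# Hypothetical change times (seconds) from a spatial and a temporal detector.
spatial = [10.0, 30.5, 70.0]
temporal = [10.4, 50.0, 70.2]

merged = merge_or(spatial, temporal, tolerance=1.0)
print(merged)
```

Boundaries found by either detector survive the merge, so points that only one cue can see are kept, at the cost of also keeping each detector's false alarms.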
0:18:52 This is the result of the merged output, and we can see a very good F-measure.
0:19:00 For the baseline, the proposed method [unclear], the F-measure is point seven four, but this merged method gives more than eighty percent accuracy.
0:19:18 So, in conclusion, the spatial and temporal information are both very good information and very complementary, so if we use both we can have a better result.
0:19:34 This is the conclusion. I proposed a new alternative method to detect the changing points of a music tune for music thumbnailing.
0:19:45 The conventional methods are based on temporal structure extraction and tracking, but the proposed method is a spatial-information-based method: we detect the changing times of the spatial structure of a multichannel music tune.
0:20:04 Using this method, seventy percent accuracy is achieved, but a better accuracy [partly unclear] can be obtained if we use it together with temporal information or a temporal-based method.
0:20:24 Thank you so much.
0:20:31 If I have the time, I want to show a demo of some thumbnails.
0:20:52 [Demo playback; the remarks made during playback are largely unintelligible.]
0:21:09 This is the second [unclear].
0:21:55 [unclear] you can understand this music [partly unclear].
0:21:59 Okay, thanks so much.
0:22:01 I'm not sure that I'd buy the music, but that might be my taste [partly unclear].
0:22:07 I do have one question. I haven't thought about audio summaries.
0:22:11 With video summaries of movies, the trailer often does not look at all like the normal movie [partly unclear].
0:22:17 I'm wondering what the right answer is for music: do you want just excerpts, or do you want some [other] summary style? What is going to be the best thumbnail?
0:22:27 And if you could design one [unclear]?
0:22:31 We only deal with the music [distributed] in the multichannel format [partly unclear].
0:22:39 What would be the ideal summary? I mean, if you could do whatever you want for a summary, what would it look like?
0:22:46 [unclear]
0:22:57 What would be the ideal thumbnail for a piece of music?
0:23:01 This one?
0:23:04 No, in general: would it be just pieces of the music, would it be a beginning, a middle and an end, or would it be something completely different?
0:23:15 In popular music it is very easy to detect the spatial information, and we can easily make the thumbnail.
0:23:23 But in general, some music is very difficult to deal with; for example, symphony music is difficult.
0:23:37 [unclear]
0:23:50 One big problem is instrumental music, like a symphony. In such a case we [need] so many quantization vectors.
0:24:05 After that we can [detect] some changing points, but the classification is very [difficult, partly unclear].
0:24:19 An alternative solution, which colleagues of mine are doing, is actually putting it up on the web and seeing what people click.
0:24:24 We are doing this for image thumbnails, so we can actually measure click-through performance as a way to rate thumbnails, which might be [useful, partly unclear] at some point.
0:24:36 Any other questions?
0:24:39 [unclear]
0:24:40 Thank you.