Thank you so much. My name is [unclear], and I am from [unclear] of Science and Technology, Japan. Today I will talk about automatic music thumbnailing based on audio object localization.

This is the outline of the talk. First I will explain our motivation and the background. Next I will explain the spatial-information-based method, which is built on k-means clustering. After the experimental results, I will conclude.

This is the background of our research. A music thumbnail is a short clip of a music tune that shows the abstract information of the tune. For example, if we can hear a music thumbnail, we can easily understand the abstract information of the tune, and we can easily judge whether to buy it or not. So the music thumbnail is very important.

But the problem is that, commonly, music thumbnails are mainly made manually. That is a very big problem because so many music tunes exist; for example, so many tunes are produced by [unclear] as well as [unclear]. So one difficulty is how to deal with all these music tunes and make the thumbnails in an automatic way. A second problem is that [unclear] providers are bound to a fixed time segment, for example a randomly picked clip of the tune. That is a very randomly chosen clip, and its abstract information is not so good.

Our goal is a technology for finding a good clip of the music tune. Related work includes this one, which is based on structure analysis by beat tracking, and there is also a second one.
That second one is based on understanding the structure through the main melody. But these methods deal with the monaural signal, and both methods try to extract the melody or the beat. In this talk, I propose to use another, alternative kind of information. For example, most popular music tunes exist in a stereo, two-channel format, so we can easily extract spatial information rather than temporal information.

That is the motivation. The aim of this research is to provide an alternative cue for making the music thumbnail, instead of the temporal information that is used in the conventional methods. First, I propose an analysis method for picking up the spatial information of the tune. Next, we investigate the difference between our proposed method and a temporal-based method. Then we propose a merged approach.

So I am going to the estimation of the audio objects. We estimate the spatial information included in the audio tunes. For example, the drums, some percussion and the vocals are normally located at the center, but sometimes the piano, a side guitar or something similar is located at the side. So the music tune has a structure in the spatial direction as well as in the temporal direction, and we want to extract this spatial information by using a vector quantization technique.

This is an overview of the estimation process of the audio objects. First we apply short-time Fourier analysis, like the conventional methods. Then we do a k-means-like clustering technique, which I will explain, and we pick up the quantization vectors that express the direction of each instrument.
Then we also extract the activation function of each object, we classify it, and we detect the changing times of the structure of the music tune.

This is the model of the time-frequency input signal. This is the general form, but normally we use only the two-channel signal, the stereo format, so the number of channels equals two. And this is the quantization vector that we want to fit to this signal. Sorry, this is a mistake. [unclear]

This is a figure of the amplitude spectra that exist in the right-hand-side and left-hand-side channels. At each frequency and time, the stereo signal has such a configuration. Then we want to fit some quantization vectors like this one, each of which expresses the direction of one source. For example, this direction represents the instruments located almost at the center, like the drums and the vocal, and this left-side vector means the left-hand-side instruments, like a side guitar or something.

This is the mathematics of the clustering. In the clustering, we want to classify each object not by the norm of each vector but only by its direction. This is the difference from the conventional, normal approach. For example, we do not want to take the [unclear] of each component in the left and right-hand-side domain; we want to extract the angle from the quantization vector. This is the input at one frequency and one time, and this is one candidate for the quantization vector.
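
The idea of keeping only the direction of each two-channel component, and not its norm, can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the helper name `direction_vector` and the toy magnitudes are hypothetical, and the inputs stand for left and right amplitude-spectrum values at a single time-frequency point.

```python
import math

def direction_vector(left_mag, right_mag):
    """Unit-norm [left, right] vector for one time-frequency point.

    Hypothetical helper: scaling both channels by the same gain leaves
    the result unchanged, so only the direction is kept.
    """
    norm = math.hypot(left_mag, right_mag)
    if norm == 0.0:
        return (0.0, 0.0)  # silent point: no direction is defined
    return (left_mag / norm, right_mag / norm)

# A centered source (equal level in both channels), once quiet and once loud.
quiet_center = direction_vector(0.1, 0.1)
loud_center = direction_vector(5.0, 5.0)

# A source panned toward the left channel.
left_source = direction_vector(0.9, 0.1)
```

The quiet and the loud centered sources map to the same vector, which is the property the clustering relies on when it ignores the norm.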
At this point, the vectors are normalized, to unit norm for example. We do not calculate the Euclidean difference between these vectors; instead we want to calculate the angle between the quantization vector and the input signal vector.

The k-means clustering itself is the same as the well-known one. First we set the initial classes and the initial centroids. Then we update the centroids: we calculate the cosine distance to the signals, and this cosine error can be simplified like this. So this problem becomes very simple: the solution is given by solving the maximum eigenvalue problem for the correlation matrix of the input signal. Within one class, we solve this maximization problem using SVD or something, and then we define the new centroid. Then we define the new classes and go back to the first step. We do this optimization for each class in an iterative manner.

Finally we obtain the optimal quantization vector for each class, that is, the centroid C, and we classify each component into a class, which gives the class index i. This class index i is important because it represents the activation of one direction at each time and frequency. So we regard the class index function as the audio object localization. That means that, at one time-frequency point, the index function equals, for example, one, two or three, and where the value equals one, [unclear], the object in that direction is active.
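
The clustering loop described above can be sketched as follows: assign each unit vector to the centroid with the largest cosine similarity, then replace each centroid by the dominant eigenvector of the correlation matrix of its members. This is a sketch under assumptions, not the speaker's code. The function names, the toy angles and the initial centroids are hypothetical; the inputs are assumed to be non-negative two-channel magnitudes; and the closed-form eigenvector is valid only for the two-channel case (the talk mentions SVD for the general case).

```python
import math

def unit(angle_deg):
    """Unit [left, right] vector; 0 degrees = left only, 90 = right only."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def principal_direction(points):
    """Dominant eigenvector of the 2x2 correlation matrix of `points`.

    Closed form for two channels; with more channels one would use an
    eigendecomposition or SVD instead.
    """
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (math.cos(theta), math.sin(theta))

def cosine_kmeans(points, centroids, n_iter=20):
    """Cluster unit vectors by cosine similarity.

    Returns (centroids, labels); labels[i] is the class index of points[i].
    """
    labels = []
    for _ in range(n_iter):
        # Assignment step: pick the centroid with the largest cosine.
        labels = [
            max(range(len(centroids)),
                key=lambda k: p[0] * centroids[k][0] + p[1] * centroids[k][1])
            for p in points
        ]
        # Update step: principal direction of each class's members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(principal_direction(members) if members else old)
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Toy data: three groups of directions (left-ish, center, right-ish).
points = [unit(a) for a in (10, 12, 14, 44, 45, 46, 78, 80, 82)]
centroids, labels = cosine_kmeans(points, [unit(0), unit(45), unit(90)])
angles = [math.degrees(math.atan2(c[1], c[0])) for c in centroids]
```

Here `labels` plays the role of the class index function for the toy points, and `angles` gives the estimated direction of each class.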
Let us see an example of the class index function. This is a real music tune, played by trumpet, [unclear], bass, drums and saxophone. Interval A is the [unclear] part, and interval B is the saxophone solo part. I think you can see that one class is very active in interval A, where the trumpet is very active. Then we see a very short active area of another class, [unclear], and then it shifts to interval B, the saxophone. So this is just like a separation result of the spectrogram.

Now we want to pick up the changing time points, so we simplify the information further. We merge the activation information over frequency and define the activity of each class at each time, where we give some frequency weighting. This is an example of the result: this is the changing time, the structure change point. First this class is active, [unclear], and then at the changing point the instrument [unclear] and it shifts to class one. So this is just like an activation sequence along the spatial direction.

We want to pick up these changing points, but sometimes there are fine fluctuations like this, some vibration. So we use a very simple smoothing technique along time, and we also [unclear] to limit the number of classes. For example, we assume four states, such as the center only or the right-hand side only [the remaining states are unclear].
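
The steps from the class index function to the changing points can be sketched as follows: merge over frequency into a per-class activation, take the most active class in each frame, smooth along time, and report the frames where the class changes. This is a sketch under assumptions: the function names are hypothetical, the talk does not say which smoothing was used (a sliding majority vote is assumed here), and the toy frames are invented for illustration.

```python
from collections import Counter

def activations(class_index, n_classes, weights=None):
    """Per-frame activation of each class: weighted count over frequency.

    class_index[t][f] is the class label at frame t, frequency bin f.
    """
    result = []
    for frame in class_index:
        w = weights if weights is not None else [1.0] * len(frame)
        act = [0.0] * n_classes
        for label, weight in zip(frame, w):
            act[label] += weight
        result.append(act)
    return result

def dominant_sequence(acts):
    """Most active class in each frame."""
    return [max(range(len(a)), key=a.__getitem__) for a in acts]

def smooth(labels, half_width=2):
    """Majority vote in a sliding window (an assumed smoothing choice)."""
    out = []
    for t in range(len(labels)):
        window = labels[max(0, t - half_width): t + half_width + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out

def change_points(labels):
    """Frame indices where the label differs from the previous frame."""
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]

# Toy class index function: 3 frequency bins, 14 frames.
class0_frame = [0, 0, 1]  # class 0 dominates this frame
class1_frame = [1, 1, 0]  # class 1 dominates this frame
frames = ([class0_frame] * 2 + [class1_frame] + [class0_frame] * 4
          + [class1_frame] * 7)

raw = dominant_sequence(activations(frames, n_classes=2))
raw_changes = change_points(raw)
smoothed_changes = change_points(smooth(raw))
```

In this toy input, the one-frame fluctuation at frame 2 produces spurious changes in `raw_changes`, while `smoothed_changes` keeps only the real transition at frame 7.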
We classify the signal into these four states, and this gives a very robust result.

OK, let's go to the evaluation. We did the experiment using the RWC popular music database. We used 25 popular music signals; the genres are pop, rock, [unclear], rhythm and blues, or metal. In the database, 267 structure change times are annotated manually, and these are regarded as the correct answers. In this experiment the number of channels is two, and we set the number of quantization vectors, that is, the number of centroids, to [unclear].

This is the result using the proposed method. The number of correct answers is 267, and we pick up 193 of the correct ones. The rest are not detected, and this is the false detection. The precision and the recall are both almost 0.7, more than seventy percent, and the F-measure is 0.74. So more than seventy percent of the detections are correct.

Some of you may be interested in a comparison between this spatial-information-based method and a conventional temporal-based method. So we did the experiment with a competitive method, PLCA, proposed by Weiss [unclear] in 2010. This is an NMF-based method that automatically detects the [unclear] and its duration. It is a temporal-based method, and it can be regarded as a conventional method, whereas what I propose in this talk is the spatial-based method. This is the comparison; we show the precision, the recall and the F-measure. As you can see, in the F-measure there is not so much difference.
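
The precision, recall and F-measure quoted above can be computed along the following lines. This is a generic sketch, not the evaluation script used in the talk: the matching rule (greedy, one-to-one, within a tolerance in seconds), the tolerance value and the toy times are all assumptions, since the talk does not state them.

```python
def match_detections(detected, reference, tolerance):
    """Greedy one-to-one matching of detected times to reference times."""
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        for r in unmatched:
            if abs(d - r) <= tolerance:
                unmatched.remove(r)  # each reference point is used once
                hits += 1
                break
    return hits

def precision_recall_f(detected, reference, tolerance=1.0):
    """Precision, recall and F-measure of detected change times."""
    hits = match_detections(detected, reference, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

# Toy example: four annotated change times, four detections (seconds).
reference = [10.0, 25.0, 40.0, 55.0]
detected = [10.4, 24.2, 33.0, 55.9]
precision, recall, f_measure = precision_recall_f(detected, reference)
```

In the toy example three of the four detections fall within the tolerance of an annotated point, so precision, recall and F-measure are all 0.75.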
The F-measure itself is almost the same, but the content, the actual detection behavior, is different.

This is an investigation of the difference between the proposed spatial-based method and the temporal-based method. We evaluated the correlation between the detections of the conventional method and those of the proposed method. There are 127 detections made by both methods. But 53 detections are made only by the proposed method, so these are detected by the spatial information. On the other side, 49 detections are made only by PLCA, the temporal-based result. So you can see there is very low correlation: the F-measure is almost the same, but the detected points are very different. So the two are very complementary.

Next, we apply a merging technique. Maybe you have many ideas for merging techniques, but in this talk I apply a very simple one, the OR operation. It is very simple but very effective. This is the result of the merged approach, and we can see a very good F-measure. For the baseline, the proposed method, the F-measure is 0.74, but the merged technique gives more than eighty percent accuracy. So the spatial and temporal information are both very good information and very complementary, and if we use both of them we can get a better result.

This is the conclusion. I proposed a new alternative method to detect the changing points of a music tune for music thumbnailing. The conventional methods are based on tracking the temporal structure.
Our proposed method is instead based on spatial information: we detect the changing times of the spatial structure of a multichannel music tune. Using our method, we achieve about seventy percent accuracy, and we can get more than eighty percent accuracy if we use the merged approach together with the temporal-based method. Thank you so much.

If I have time, I want to show a demo of the thumbnail. OK. [demo playback; speech unclear] This is the second part. [unclear] OK, thanks.

Well, I'm not sure that I'd buy the music, but that might be my taste [unclear]. I do have one question. I haven't thought about audio summaries, but with video summaries of movies, the trailer often does not look at all like the movie itself. I'm wondering what the right answer is for music. Do you want just excerpts, or do you want some style [unclear]? What is going to be the best thumbnail, if you could design your own thumbnail?

We only deal with music [unclear] in the multichannel format.

Yeah, but what would be the ideal summary, or thumbnail? I mean, if you could design the summary, what would it look like? [unclear] What would be the ideal thumbnail for a piece of music? Not this one, in general. Would it be just pieces of the music, a beginning, a middle and an end? Or would it be something completely different?

In popular music it is very easy to detect the spatial information, and we can easily make the thumbnail. But in general, some music is very difficult to deal with; for example, symphony music is difficult. [unclear]
[unclear] In general, one big problem in dealing with such music is music with many instruments, like a symphony. In such a case we need so many quantization vectors. After that we can detect the changing points, but the classification algorithm becomes very complex.

OK. An alternative solution that colleagues of mine are working on is actually putting it up on the web and seeing what people click on. We are doing that for image thumbnails, so then we can actually measure click-through performance as a way to rate thumbnails, which might be a point of view at some point.

No more questions? Thank you very much.