Good afternoon, everyone. I am from the Interactive Audio Lab at Northwestern University, and I am going to present melody extraction using probabilistic latent component analysis. This is joint work with colleagues from the Media Technology Lab of Gracenote. Here is how the talk is organized: first I will give a brief introduction to our system; then I will introduce the model we use, describe the system itself, present the experimental results, and finally conclude the talk.

In this paper we treat the singing voice as the melody of the song. This is not always true, but it is true in many cases. Here is a brief overview of our melody extraction system. We have an audio signal, and we first feed it to a singing voice detection module, which segments it into non-vocal segments and vocal segments. A non-vocal segment contains only the accompaniment of the song, while a vocal segment contains the singing voice together with the accompaniment. In the next step, we train an accompaniment model from the non-vocal segments using PLCA, which I will discuss later in this talk. The trained accompaniment model is then applied to the vocal segments to extract the singing voice of the song, and in the end a pitch tracking algorithm is applied to the extracted singing voice to track the melody of the song.

Now let me introduce the model we use in our system. We start from a single slice of the spectrogram, shown on the right of the figure, and we treat this spectrum as a histogram generated
by a multinomial distribution. If we look at different frames of the spectrogram, we can see that different frames have different spectra. We could in principle use a separate multinomial distribution for every single frame of the signal, but in that case we would end up with a huge number of components, a very large dictionary. Instead, we use a dictionary of, say, one hundred spectral vectors, and each spectrum of the song is explained as a linear combination of those spectral vectors. This gives us our model, probabilistic latent component analysis. We have a dictionary of spectral vectors, which we need to learn from the spectrogram of the song. It is called latent component analysis because its parameters are latent random variables: these are the mixture weights and these are the spectral vectors, and we model each spectrum as a linear combination of the spectral vectors. These two sets of parameters can be estimated with the expectation-maximization algorithm; I am not going to go into the details of the estimation here, and you can refer to our paper for them.

So now we have a model with which we can do singing voice extraction. Due to the time constraint, I am going to focus on this part of our paper. In this image, this is the audio signal, and our singing voice detection algorithm segments it into different parts: this part is a non-vocal segment and this part is a vocal segment. This is just an idealized illustration; in the real case it is going to be much more complicated. It is not the case that the non-vocal segment conveniently sits at the beginning and the vocal part at the end; I just wanted to show an idealized picture of the situation.
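As a rough sketch of the kind of dictionary learning described above, and not the authors' exact implementation, the PLCA estimation can be written in its equivalent multiplicative (KL-NMF-style) EM form on a magnitude spectrogram. The component count, iteration count, and initialization below are illustrative choices:

```python
import numpy as np

def plca(V, n_components=100, n_iter=200, seed=0):
    """Learn a dictionary of spectral vectors W (columns sum to 1,
    playing the role of P(f|z)) and mixture weights H so that
    V[f, t] ~ (W @ H)[f, t], via multiplicative EM updates."""
    eps = 1e-12
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components))
    W /= W.sum(axis=0, keepdims=True)          # normalized spectral vectors
    H = np.full((n_components, T), V.sum() / (n_components * T))
    for _ in range(n_iter):
        R = W @ H + eps                        # current model reconstruction
        W *= (V / R) @ H.T                     # update dictionary...
        W /= W.sum(axis=0, keepdims=True) + eps  # ...and renormalize columns
        R = W @ H + eps
        H *= W.T @ (V / R)                     # update mixture weights
    return W, H
```

A usage note: on a real song one would run this on the short-time Fourier transform magnitude; here any nonnegative matrix works.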
After we identify the non-vocal segments, we use the PLCA model to train a dictionary of spectral vectors for the accompaniment. Now we have a dictionary that explains only the accompaniment. In the next step, for the vocal segments, we run the same PLCA training, but we fix some of the components to the spectral vectors already trained on the non-vocal segments, and we leave some free components to explain whatever is left over, namely the singing voice. In the end we have two groups in the dictionary: this group contains the pre-trained, fixed components for the accompaniment, and this group contains the components newly trained on the vocal segments, which mostly explain the singing voice of the song. Then we simply reconstruct the signal separately: we use the newly trained components to reconstruct the singing voice, and we use the fixed non-vocal components to reconstruct the accompaniment within the vocal segments. Finally, a simple pitch estimation algorithm is applied to the extracted singing voice to track the melody of the song.

I should mention that a very similar system was proposed in this paper. The difference between our system and theirs is that our system has a pre-trained singing voice detection module, so it is fully automatic, whereas in their paper the singing voice segments were detected manually and the training data were labeled by hand.
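The semi-fixed training on the vocal segments can be sketched the same way: the accompaniment columns stay frozen at their pre-trained values while the free "voice" columns and all mixture weights are updated. The Wiener-style mask at the end is one common way to resynthesize the two parts and is my assumption, not necessarily the paper's exact reconstruction:

```python
import numpy as np

def separate_voice(V, W_acc, n_voice=30, n_iter=200, seed=0):
    """Split a vocal-segment magnitude spectrogram V into voice and
    accompaniment estimates, given a dictionary W_acc pre-trained on
    the non-vocal segments (columns normalized to sum to 1)."""
    eps = 1e-12
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K0 = W_acc.shape[1]
    W_v = rng.random((F, n_voice))
    W_v /= W_v.sum(axis=0, keepdims=True)
    H = np.full((K0 + n_voice, T), V.sum() / ((K0 + n_voice) * T))
    for _ in range(n_iter):
        W = np.hstack([W_acc, W_v])
        R = W @ H + eps
        W_up = W * ((V / R) @ H.T)
        W_v = W_up[:, K0:]                         # only free columns move
        W_v /= W_v.sum(axis=0, keepdims=True) + eps
        W = np.hstack([W_acc, W_v])
        R = W @ H + eps
        H *= W.T @ (V / R)                         # all weights move
    R = np.hstack([W_acc, W_v]) @ H + eps
    voice = (W_v @ H[K0:]) / R * V                 # Wiener-style mask
    return voice, V - voice
```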
We also evaluated our system experimentally; this is just a simple example from the paper. On the top of the figure is the spectrogram of the original song; the second figure is the singing voice extracted when our algorithm is applied to the polyphonic music; the third one is the original, separately recorded singing voice; and the last one is the melody estimated from the extracted singing voice. Let's listen to the examples. As you can hear, the system is not perfect, but the extraction suppresses much of the accompaniment while keeping the singing voice, and the vocal in the separated result is much more prominent. After we extract the singing voice, we apply a very simple autocorrelation-based pitch estimation algorithm to it. We can still hear some accompaniment in the extracted singing voice, but the autocorrelation still picks out the correct lag in most frames; in this example we extract about eighty percent of the correct pitches for the singing voice.

We also compare against two other systems. The first is a multi-pitch estimation system previously developed in our lab: we treat the highest pitch estimated in each frame as the melody of the song. The second is a singing voice separation system; and the third row is the result of our own system. As we can see, our system has better recall, F-measure, and accuracy compared to the other systems, and comparable precision to the best system in the precision evaluation. The second system has relatively low performance. We believe that is because its singing voice extraction algorithm only uses the predominant pitch estimation result, and sometimes the tracked pitch is not the pitch of the singing voice but the pitch of an accompaniment instrument; that is, it may mistake an accompaniment instrument for the predominant pitch.
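The "very simple autocorrelation-based" pitch estimator mentioned above could look something like this per-frame sketch; the frequency search range and frame handling are my assumptions, not the paper's stated settings:

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate the F0 of one voiced frame by picking the largest
    autocorrelation peak within a plausible singing-voice lag range."""
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr // fmax)                 # shortest lag considered
    hi = int(sr // fmin)                 # longest lag considered
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```

In a full tracker this would run frame by frame over the extracted voice, typically with a voicing decision and some temporal smoothing on top.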
Perhaps some fine-tuning of the parameters of the second system would increase its performance.

To conclude: a probabilistic latent variable model is introduced to model the accompaniment and the singing voice in our melody extraction system, and the experimental results show that the melody of the singing voice can be effectively extracted. There are also some future directions. Our current singing voice detection algorithm is based on a Gaussian mixture model, and it leaves some room for improvement, so we want to do further research on that part. We also want a better pitch estimation algorithm for the singing voice, along with the better singing voice detection module. This concludes our paper. I did this work while I was an intern at Gracenote, so I want to thank my colleagues at Gracenote for the useful discussions; we also want to thank the reviewers of our paper for helping us improve it; and I want to thank my advisor and colleagues for helping me improve this presentation. Thank you.

Question: My name is Gael. I have one question concerning the segmentation. You have not talked much about the segmentation, but I guess that if you have errors in the segmentation, the separation afterwards will probably be less good, since the two are connected. Can you give us some hints on how well the segmentation performs?

Answer: Sure. The segmentation is based on a Gaussian mixture model. We trained the Gaussian mixture model on about fifty pre-labeled, manually labeled songs of commercial music,
including pop music and rock music. We pre-trained this model once, and it is then applied as-is to new tracks for singing voice detection. The accuracy of the singing voice detection module is around seventy percent. And as you mention, our system does depend on the performance of the singing voice detection module: if the detection module fails, the system fails, because it will treat some vocal segments as non-vocal segments, and then the dictionary we train on those segments will also explain part of the singing voice; in that case the system will not work well. That is something I want to research further in the future.

Question: Your audio example was wonderful, but to my ear the major drawback was the cymbals coming through in both outputs; that one really was popping out.

Answer: The system automatically trains the latent dictionary to explain the accompaniment, so one possible reason is that the non-vocal segments simply do not contain the cymbal, or the cymbal is not predominant in them; that would be my best explanation. Also, when we use the segmentation, we use the non-vocal segments to explain the vocal segments, not the other way around; the underlying assumption is that the accompaniment is consistent throughout the song, but it may in fact change over time.

Are there any other questions? Okay, thank you.
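For reference, the vocal/non-vocal frame classifier discussed in this answer could be sketched as follows. For brevity I reduce each class's Gaussian mixture model to a single diagonal-covariance Gaussian, which is a simplification of the actual module; the feature choice (e.g. MFCCs per frame) is also assumed:

```python
import numpy as np

class VocalDetector:
    """Frame-level vocal/non-vocal classifier: one diagonal-covariance
    Gaussian per class, fit on labeled feature frames. A simplified
    stand-in for the GMM-based detection module described in the talk."""

    def fit(self, X_vocal, X_nonvocal):
        # Per-class mean and variance of the training frames
        self.params = [(X.mean(axis=0), X.var(axis=0) + 1e-6)
                       for X in (X_vocal, X_nonvocal)]
        return self

    def _loglik(self, X, mu, var):
        # Diagonal-Gaussian log-likelihood of each frame
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var,
                             axis=1)

    def predict(self, X):
        ll = np.stack([self._loglik(X, m, v) for m, v in self.params])
        return ll[0] > ll[1]             # True -> frame classified as vocal
```

In the actual system, frame decisions like these would be smoothed into contiguous vocal and non-vocal segments before the PLCA training.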