Speech Transcript - POINT PROCESS MCMC FOR SEQUENTIAL MUSIC TRANSCRIPTION

[Opening words unclear] ... it's not Simon. So I'm going to talk about the music transcription work from my master's project last year.

Just to go through what we mean by transcription: a musical signal might look a bit like this. This is just a time-domain signal. It's going to be roughly periodic, and it's going to have a whole set of sinusoidal components, each with a different time-varying amplitude. But that's not how we perceive music. What we think of is a set of notes, and then some higher-level properties such as the expression and the timbre of the instrument. So what we would like is a system that can take something like this and turn it into something like this. Now, that's quite an ambitious thing to do in one step, so we're going to aim for an intermediate result, something like this. This is a piano roll: we've got the pitch of the notes up the side and time along the bottom, and the lines indicate which notes are present. [Reference to earlier work unclear.]

So what I'm going to do is talk about a sequential framework for doing this note estimation, and then talk about the models that we're using: we've got a likelihood model using point processes, and then some simple dynamic models for the note evolution. Then I'll talk about the MCMC scheme and show some results.

First of all, music is a continuous signal, and we want to look at a frequency-domain model, so the first thing we're going to do is chop it up into frames, and I'll reference the frames with this subscript T here. Then for each frame we'd like to estimate the set of notes present, which we'll call [theta?] T, given the data that we've got for that frame, which we'll call Y T. And the way we
can do this is by looking at this joint posterior of the notes in the current frame and the previous frame. You'll recognise this from the previous talk; it's the same thing. We can expand this into three terms: a likelihood, a transition density, and then this posterior term from the previous processing step. You might note that this T minus one term is just a marginal of that, so if we've got a particle approximation for the previous frame we can use that.

So let's look at the models that we're using, starting with the likelihood. I mentioned we're going to use a frequency-domain model, so this is just the short-time Fourier transform of one of the frames. You can see that what we're interested in is this set of peaks here, and there's a lot of redundant information down here at the noise level. So the first thing we're going to do, straight away, is get rid of that with a peak detection algorithm. This is very simple: we're just looking at the first-order difference and then applying a median threshold to it. So we reduce the spectrum down to just this set of red-circled peaks.

Now, what we'd like to model is both the frequency and the amplitude. It turns out the amplitude of these peaks is dependent on an awful lot of factors, including the instrument that's playing and the recording environment, and most of all they vary over time. So putting together a simple and robust model is difficult, so we're going to start off by just looking at a model for the frequencies of the set of peaks.

If we have one note playing in a frame, then what we see characteristically is a peak at some fundamental frequency; that's the lowest peak, the fundamental. Then we see a set of peaks at higher partial frequencies, which is the set up here, and they're approximately at integer multiples of
the fundamental. But we don't always get a peak in all these locations; [one appears to be missing] here. And we don't know how many of them there'll be, how high we have to go up. In addition, we're going to get some clutter peaks as well, due to noise or transient effects, which we're not really modelling, or other non-musical sounds in the recording.

So if we have a lot of notes present in the frame, we end up with a horrible data association issue, where we'd like to link every peak with either one of the notes present or a clutter process. That gives us some horrible scaling in complexity as we increase the number of notes or the number of overtones.

We can get around this by using a Poisson process assumption about the generation of peaks in the spectrum. We assume that, for each overtone, the peaks are generated in the spectrum according to a Poisson process, and we can construct an intensity function for this Poisson process which has a maximum at the expected frequency of the overtone. Now, this is quite a significant assumption, given that really we only expect to see no peak or one peak, or maybe in some rare cases a split peak. With this assumption we're going to have a Poisson distribution on the number of peaks for that overtone. So that's the bad thing. The good thing is that, because of the union property of Poisson processes, we can just add the intensity functions for each overtone to give us an intensity function like this for the whole note, as a Poisson process. You can see we've constructed this with a very narrow, large component at the fundamental, showing that we're pretty certain there's going to be a peak there and we're quite sure what frequency it will be at, and we've got some smaller components at higher frequencies, where we're less sure exactly what frequency the peak will
occur at. Then if we have more than one note present, again we can just add these intensity functions together for all the different notes, to give us a Poisson process for all the peaks in the whole spectrum.

So, just for the maths: we've been using a Gaussian mixture model to construct these note intensity functions, and then we're just adding them together to give the entire frame intensity function, and then adding on a little bit extra, uniformly, to account for the clutter peaks, so a clutter Poisson process. Once we've got this, we can write likelihood expressions for the frame. Just integrating the intensity function over each bin of the fast Fourier transform gives us an expectation for the presence of a peak in that bin, and then we can take a likelihood like this for each bin and multiply them all together to give the total frame likelihood.

Now, I said earlier that the overtones occur approximately at integer multiples of the fundamental. It turns out that, especially for stringed instruments, they tend to spread out at high frequencies, so we've been using a model of this inharmonicity according to this formula [source of formula unclear]. This introduces another parameter that we have to estimate, which is this B here, an inharmonicity parameter for each note.

So the things we have to estimate are now adding up. [Unclear] the set of parameters we need to estimate: we've got the number of notes, and then for each note a fundamental frequency, the number of partials, and also that inharmonicity.

Moving on to the transition density: we've been using some very simple models, at least so far, and they've been based on two quite basic observations. Firstly, that if a note is
present in one frame, then it's likely that it is also present in the next frame; and secondly, that whatever the number of notes present in one frame, it's likely you will have the same number of notes in the next frame. Obviously there are far higher levels of modelling we could do here, looking at how the number of partial frequencies changes between frames, as you'd expect them to decay, and also modelling note onsets and offsets, which we haven't [looked at] yet.

But we've now got everything we need to do some inference. This is the thing we're trying to estimate, remember. We've defined a model for the likelihood, that's the Poisson model, and we've got our simple models for the transition density, so now we can use the MCMC particle filter algorithm, which was [covered] in the last talk, to estimate this joint density.

Now the problem is that if we've got a large number of notes then we've got a lot of parameters, three for each note, remember, which means that if we try and change all of them at once we end up with very low acceptance rates in our Markov chain. The way to get around this is that we're going to have two sorts of move. We have moves where we only change the current-frame parameters, and then moves where we try to change both the previous-frame and the current-frame parameters.

For the current frame it's nice and easy: we can just use Metropolis-within-Gibbs and choose to change some subset of the parameters at once; we'll just change the three parameters associated with one note in each step.

For the joint moves it gets a little more complex. What we'd like to do is sample the T minus one parameters from the particle distribution from the previous frame and then propose the current-frame parameters from some [proposal or prior; word unclear]. The problem here is that when we do this sampling we will be changing all of the T minus
one parameters in one go, and again that gives us very low acceptance rates. A solution to this has been to take the particle distribution and sort of collapse it onto a single univariate histogram for all the different possible notes that we have in the previous frame. We then use this as an approximation for the marginal distribution of each note, and [treat the notes as] independent at T minus one. This means that we can sample the T minus one parameters one note at a time, and that again gives acceptable acceptance rates.

Finally, we want to estimate the number of notes present in each frame, and that can be done very nicely just by putting the whole thing into a reversible jump formulation.

So let's look at some results. This is the output from a couple of Markov chains. This is a simple case where we've just got one note and we're not looking at reversible jump; we're fixing the number of notes at one. You can see that it picks up the correct note on the first iteration, in fact, and the other [parameters; description of the plot unclear] ... takes about twenty frames [unclear].

Then here on the right we've got a three-note case, and we're doing reversible jump MCMC now, so estimating the number of notes here, and that's pretty much correct. With the frequencies, we see it fixes two of the notes very quickly, and then it struggles to choose between three possibilities here. These three cases are in fact spaced an octave apart, and the reason there's some confusion there is because the three notes have pretty much the same sets of partial frequencies.

Finally, just a few results. This is a simple sort of test piece; it's just three chords, each of three notes. So we've got time on
the bottom here, and the frequency of the notes present up the side, and the blue dots are the estimates. We can see it picks up all the notes quite nicely here, with just this one dropping out here as it decays at the end of the note. There are a few errors here at the beginning of each note, and these are caused by transient effects at the beginning of the note, which we're not modelling at the moment.

And then finally we tried it on some real music. This is a [piano?] piece, and you can see it picks up the bass notes quite nicely, but for the treble notes it's doing a bad job up here; there's a lot of false alarms and [unclear] going on. Again that's due to transient effects at the beginning of each note, which we're not modelling well.

So, to summarise: [unclear] the point process model which we're using as a likelihood for each frame given the notes, and some simple dynamic models for the evolution of the notes over time. These allow us to do sequential inference using the MCMC particle filter algorithm, to find the number of notes in each frame and estimates of their frequencies and other parameters.

There are lots of ways we could extend this. I mentioned that we don't look at peak amplitudes because that's too hard, and [we might get a nice performance improvement if] we looked at them, and also at the phase; we haven't looked at that at all. And also by looking at more complex dynamic models, although given the simplicity of them it seems to work quite well. At the moment it's quite a long way off real time, but we haven't been aiming to get it real time. [Closing words unclear.]

[Audience question, partly unclear] ... rather than simple peak detection, is it possible to find a
[set?] of features from the spectrum, rather than simple peak detection [unintelligible] ... measurements, just peak detection [unintelligible] ... for example [unintelligible] ... some smoothing [unintelligible] ... Yeah, that's a trade-off [unintelligible] ... averaging [unintelligible]
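
[Illustrative sketch, not part of the talk.] The peak detection step is described in the talk only as "looking at the first-order difference" and then applying "a median threshold". The Python below is one possible reading of that description, not the speaker's implementation. The function name `detect_peaks`, the parameter `threshold_scale`, and the choice of a global median of the magnitude spectrum as the threshold are all assumptions made for illustration.

```python
import numpy as np


def detect_peaks(magnitude, threshold_scale=3.0):
    """Return indices of spectral peaks.

    A bin is a candidate peak if the first-order difference changes from
    positive to non-positive there (a local maximum). Candidates are kept
    only if they exceed `threshold_scale` times the median magnitude.
    Both the threshold form and its default value are assumptions.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    diff = np.diff(magnitude)              # first-order difference
    rising = diff[:-1] > 0                 # spectrum rises into bin i + 1
    falling = diff[1:] <= 0                # spectrum does not rise out of it
    candidates = np.flatnonzero(rising & falling) + 1
    threshold = threshold_scale * np.median(magnitude)
    return candidates[magnitude[candidates] > threshold]


# Synthetic magnitude spectrum: flat noise floor, three strong peaks,
# and one small bump that should be rejected by the median threshold.
spectrum = np.ones(64)
spectrum[[10, 20, 30]] = [10.0, 8.0, 6.0]
spectrum[40] = 2.0
peaks = detect_peaks(spectrum)
print(peaks)
```

A local (sliding-window) median would be a natural alternative when the noise floor varies with frequency; the talk does not say which was used.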

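[Illustrative sketch, not part of the talk.] The likelihood described in the talk builds a Gaussian-mixture intensity for each note, adds a uniform clutter term, integrates the intensity over each FFT bin, and multiplies per-bin likelihoods together. The Python below sketches that construction under several assumptions that are not confirmed by the recording: the partial frequencies follow the common stiff-string form f_k = k f0 sqrt(1 + B k^2); every partial has the same mixture weight and a standard deviation proportional to its frequency; and each bin is scored as "at least one peak" versus "no peak" under a Poisson count. All function and parameter names (`partial_frequencies`, `bin_expectations`, `frame_log_likelihood`, `weight`, `rel_std`, `clutter_rate`) are hypothetical.

```python
from math import erf, sqrt

import numpy as np

_erf = np.vectorize(erf)


def partial_frequencies(f0, n_partials, B):
    """Assumed stiff-string inharmonicity model: f_k = k * f0 * sqrt(1 + B k^2)."""
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k**2)


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + _erf((x - mean) / (std * sqrt(2.0))))


def bin_expectations(bin_edges, notes, weight=0.9, rel_std=0.01, clutter_rate=1e-3):
    """Expected number of peaks per bin: the intensity integrated over each bin.

    `notes` is a list of (f0, n_partials, B) tuples. Intensities of all
    partials of all notes are summed (Poisson superposition), plus a uniform
    clutter intensity of `clutter_rate` peaks per Hz.
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    mu = clutter_rate * np.diff(bin_edges)
    for f0, n_partials, B in notes:
        for fk in partial_frequencies(f0, n_partials, B):
            cdf = normal_cdf(bin_edges, fk, rel_std * fk)
            mu = mu + weight * np.diff(cdf)
    return mu


def frame_log_likelihood(peak_bins, bin_edges, notes, **kwargs):
    """Sum of per-bin log-likelihoods for a set of detected peak bins."""
    mu = bin_expectations(bin_edges, notes, **kwargs)
    has_peak = np.zeros(mu.size, dtype=bool)
    has_peak[np.asarray(peak_bins, dtype=int)] = True
    log_p_empty = -mu                      # log P(no peak in bin)
    log_p_peak = np.log1p(-np.exp(-mu))    # log P(at least one peak in bin)
    return float(np.sum(np.where(has_peak, log_p_peak, log_p_empty)))


# 200 bins of 10 Hz, centred on 0, 10, 20, ... Hz.
edges = np.arange(-5.0, 2000.0, 10.0)
observed = [22, 44, 66, 88]                # peaks near 220, 440, 660, 880 Hz
ll_match = frame_log_likelihood(observed, edges, [(220.0, 4, 0.0)])
ll_mismatch = frame_log_likelihood(observed, edges, [(300.0, 4, 0.0)])
print(ll_match > ll_mismatch)
```

Because intensities add, the cost of evaluating a frame grows linearly with the number of notes and partials, with no explicit peak-to-note association, which is the benefit the talk attributes to the Poisson assumption.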
POINT PROCESS MCMC FOR SEQUENTIAL MUSIC TRANSCRIPTION

Particle Filtering for High Dimensional Problems

Presented by: Pete Bunch, Author(s): Pete Bunch, Simon J. Godsill, University of Cambridge, United Kingdom