0:00:15mean
0:00:15and
0:00:16i'm started to just one right but it's it's not simon
0:00:19i
0:00:20um
0:00:21 So I'm going to talk about the music transcription work from my master's project last year.
0:00:28 Just to go over what we mean by transcription: a musical signal might look a bit like this.
0:00:34 This is just a time-domain signal. It's going to be roughly periodic, and it's going to have a whole set of sinusoidal components, each with a different time-varying amplitude.
0:00:46 But that's not how we perceive music. This is what we think of: a series of notes, and then some higher-level properties, such as the expression and the timbre of the instrument.
0:01:01 So what we'd like is a system that can take something like this and turn it into something like this.
0:01:07 Now that's quite an ambitious thing to do in one step, so we're going to aim for an intermediate result that's something like this.
0:01:14 This is a piano roll. We've got the pitch of the notes up the side and time along the bottom, and the lines indicate which notes are present.
0:01:24and this is from them and you work silence
0:01:26and just on a single byte modeling
0:01:30 So what I'm going to do is talk about a sequential framework for doing this note estimation.
0:01:39 Then I'll talk about the models that we're using: we've got a likelihood model using Poisson point processes, and then some simple dynamic models for the note evolution.
0:01:49 Then I'll talk about an MCMC scheme and show some results.
0:01:52 So first of all, music is a continuous signal, and we want to look at frequency-domain models, so the first thing we're going to do is chop it up into frames. I will reference the frames with this subscript t here.
0:02:09 Then for each frame we'd like to estimate the set of notes present, which we'll call theta_t, given the data that we've got for that frame, which we'll call y_t.
0:02:18 The way we can do this is by looking at this joint posterior of the notes in the current frame and the previous frame. You'll recognise this from the previous talk; it's the same.
0:02:30 We can expand this into three terms: a likelihood, a transition density, and then this posterior term from the previous processing step.
0:02:44 This theta_{t-1} term is then just a marginal of that, so if we've got a particle approximation for the previous frame, we can use that.
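The three-term expansion described here can be written out as below. This is a reconstruction from the spoken description rather than a copy of the slide. The symbols theta_t and y_t follow the talk; the range notation y_{1:t} for "all data up to frame t" is an assumption, as is the usual hidden-Markov structure (the notes form a Markov chain, and each frame's data depends only on that frame's notes).

```latex
p(\theta_t, \theta_{t-1} \mid y_{1:t}) \;\propto\;
  \underbrace{p(y_t \mid \theta_t)}_{\text{likelihood}}\;
  \underbrace{p(\theta_t \mid \theta_{t-1})}_{\text{transition density}}\;
  \underbrace{p(\theta_{t-1} \mid y_{1:t-1})}_{\text{previous posterior}}
```

The last factor is the marginal over theta_{t-2} of the joint posterior from the previous step, which is why a particle approximation of the previous frame is enough to evaluate it.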
0:02:56 So let's look at the models that we're using for the likelihood.
0:03:03 I mentioned we're going to use frequency-domain models, so this is just the short-time Fourier transform of one of the frames.
0:03:10 You can see that what we're interested in is this set of peaks here, and there's a lot of redundant information down here at the noise level.
0:03:17 So the first thing we're going to do, straight away, is run a peak detection algorithm. This is very simple: we're just looking at the first-order difference and then applying a median threshold to it.
0:03:29 So we reduce the spectrum down to just this set of red-circled peaks.
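A peak detector of the kind described (sign change in the first-order difference, followed by a median-based threshold) might look like the sketch below. The function name `detect_peaks` and the `threshold_factor` multiplier are assumptions for illustration; the talk does not say how the median threshold was scaled.

```python
import numpy as np

def detect_peaks(magnitude, threshold_factor=3.0):
    """Return indices of local maxima that exceed a median-based threshold.

    threshold_factor is a hypothetical tuning constant, not from the talk.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    diff = np.diff(magnitude)                      # first-order difference
    # A local maximum is where the difference goes from positive to non-positive.
    is_peak = (diff[:-1] > 0) & (diff[1:] <= 0)
    candidates = np.where(is_peak)[0] + 1          # shift to index into magnitude
    threshold = threshold_factor * np.median(magnitude)
    return candidates[magnitude[candidates] > threshold]
```

Applied to the magnitude of one frame's Fourier transform, this returns the bin indices of the retained peaks; everything near the noise floor is discarded.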
0:03:36 Now, what we'd like to model is both the frequency and the amplitude.
0:03:40 It turns out that the amplitude of these peaks is dependent on an awful lot of factors, including the instrument that's playing and the recording environment, and most of them vary over time.
0:03:51 So putting together a simple and robust model is difficult, and what we're going to start with is just a model for the frequencies of the set of peaks.
0:04:08 If we have one note playing in a frame, then what we see characteristically is a peak at some fundamental frequency, which is the lowest peak.
0:04:21 Then we see a set of peaks at the overtone or partial frequencies, which is the set up here, and they're approximately at integer multiples of the fundamental.
0:04:31 But we don't always get a peak in all these locations; there's one missing here. And we don't know how many of them there'll be, or how high we have to go up.
0:04:40 In addition, we're going to get some clutter peaks, and these are going to be due to noise or transient effects, which we're not really modeling, or to non-musical sounds in the recording.
0:04:54 So if we have a lot of notes present in the frame, we end up with a horrible data association issue, where we'd like to link every peak we've observed with one of the notes present or with the clutter process.
0:05:08 That gives us some horrible scaling in complexity as we increase the number of notes or the number of overtones.
0:05:14 We can get around this by making a Poisson process assumption about the generation of peaks in the spectrum.
0:05:24 So we assume that, for each overtone, the peaks are generated in the spectrum according to a Poisson process, and we can construct an intensity function for this Poisson process which has a maximum at the expected frequency of the overtone.
0:05:44 Now, this is quite a significant assumption. Generally we only expect to see no peak or one peak, or maybe in some rare cases a split peak.
0:05:55 With this assumption we're going to have a Poisson distribution on the number of peaks at each overtone. That's the bad thing.
0:06:02 The good thing is that, because of the union property of Poisson processes, we can just add the intensity functions for each overtone together to give us an intensity function like this for the whole note as a Poisson process.
0:06:17 You can see we've constructed this with a very narrow, large component at the fundamental, showing that we're pretty certain there's going to be a peak there, and we're quite sure what frequency it will be at.
0:06:31 And we've got some smaller components at higher frequencies, where we're less sure exactly what frequency the peak will occur at.
0:06:39 Then if we have more than one note present, again we can just add these intensity functions together for all the different notes, to give us a Poisson process for all the peaks in the whole spectrum.
0:06:51 So, just the maths: we've been using a Gaussian mixture model to construct these note intensity functions.
0:07:03 Then we're just adding them together to give the entire frame intensity function, and then adding on a little bit extra, uniformly, to account for the clutter peaks, so that's the clutter Poisson process.
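The construction described here (a Gaussian mixture per note, summed over notes, plus a uniform clutter term) can be sketched as follows. All numerical values, and the way component weights and widths change with partial number (`weight0`, `decay`, `sigma0`, `sigma_growth`, `clutter_rate`), are illustrative assumptions; the talk only says the fundamental component is large and narrow and the higher components are smaller and less certain. Exact harmonic spacing is also assumed here.

```python
import numpy as np

def note_intensity(freqs, f0, n_partials, weight0=0.95, decay=0.9,
                   sigma0=2.0, sigma_growth=1.0):
    """Gaussian-mixture intensity for one note: one component per partial.

    All default values are hypothetical and chosen only for illustration.
    """
    freqs = np.asarray(freqs, dtype=float)
    lam = np.zeros_like(freqs)
    for k in range(1, n_partials + 1):
        centre = k * f0                          # harmonic partial frequency
        sigma = sigma0 + sigma_growth * (k - 1)  # wider at higher partials
        weight = weight0 * decay ** (k - 1)      # expected peak count, shrinking
        lam += weight * np.exp(-0.5 * ((freqs - centre) / sigma) ** 2) / (
            sigma * np.sqrt(2.0 * np.pi))
    return lam

def frame_intensity(freqs, notes, clutter_rate=1e-4):
    """Sum note intensities (union of Poisson processes) plus uniform clutter."""
    freqs = np.asarray(freqs, dtype=float)
    lam = np.full_like(freqs, clutter_rate)
    for f0, n_partials in notes:
        lam += note_intensity(freqs, f0, n_partials)
    return lam
```

Because the intensities simply add, no peak ever has to be assigned to a particular note or to clutter, which is what removes the data association problem.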
0:07:16 Once we've got this, we'd like a likelihood expression for the frame.
0:07:27 Just integrating the intensity function over each bin of the fast Fourier transform gives us an expectation for the presence of a peak in that bin.
0:07:36 Then we can just take a likelihood like this for each bin, and multiply them all together to give the whole frame likelihood.
0:07:48now i said i'll of the uh
0:07:50and attains a cow approximately at integer multiples of the fundamental
0:07:54um
0:07:55and
0:07:56it it sends out that for um
0:07:58especially for a stringed instruments
0:08:00uh they the you ten step the spread out so high frequent
0:08:03so
0:08:03and we've been using a um a models of this in a menace T and the going to this formula
0:08:08i can from the that
0:08:09and and this introduces another parameter that we can have to rest which is if this be here which is
0:08:14that it's a and in how many city parameter a for each night
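The formula itself was on the slide and does not appear in the transcript. A widely used stiff-string model, given here as an assumption about what was shown, places the k-th partial of a note with fundamental f_0 and inharmonicity parameter B at:

```latex
f_k = k \, f_0 \sqrt{1 + B k^2}, \qquad k = 1, 2, \dots, \quad B \ge 0
```

Setting B = 0 recovers exact integer multiples of the fundamental, and for B > 0 the partials drift progressively sharp as k increases, matching the spreading-out described.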
0:08:19 So the things we have to estimate are now adding up. That's the theta_t that we had earlier.
0:08:25 If we take theta to be the set of parameters that we need to estimate, we've got the number of notes, and then for each note a fundamental frequency, the number of partials, and also the inharmonicity.
0:08:39 Moving on to the transition density. We've been using some very simple models, at least so far, and they've been based on two quite basic observations.
0:08:47 First, if a note is present in one frame, then it's likely that it is also present in the next frame.
0:08:56 Second, if there is a given number of notes present in one frame, then it's likely you will have the same number of notes in the next frame.
0:09:02 Obviously there are higher levels of modeling that we could do here, looking at how the number of partial frequencies changes between frames, since you'd expect them to decay, and also modeling note onsets and things like that.
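The talk does not give the form of the transition density, so the sketch below is only one simple way to encode the two observations. It assumes notes are drawn from a discrete candidate set (for example, MIDI note numbers) and that each candidate switches on or off independently, with hypothetical probabilities `p_stay` and `p_birth`. The model in the talk also carries continuous parameters per note, which this sketch ignores.

```python
import math

def log_transition(prev_notes, curr_notes, candidates, p_stay=0.9, p_birth=0.01):
    """Log-probability of the current note set given the previous one.

    p_stay and p_birth are hypothetical values, not taken from the talk.
    """
    prev_notes, curr_notes = set(prev_notes), set(curr_notes)
    logp = 0.0
    for note in candidates:
        # A note that was on tends to stay on; a note that was off rarely starts.
        p_on = p_stay if note in prev_notes else p_birth
        logp += math.log(p_on if note in curr_notes else 1.0 - p_on)
    return logp
```

With a high `p_stay` and a low `p_birth`, keeping the same notes scores highest, and swapping one note for another scores lower than simply dropping one, so the number of notes also tends to persist.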
0:09:22 But we've now got everything we need to do some inference.
0:09:25 This is the thing we're trying to estimate, remember. We've defined a model for the likelihood, that's the Poisson model, and we've got some simple models for the transition density.
0:09:35 So now we can use the MCMC particle filter algorithm, which was covered in the last talk, to estimate this joint density.
0:09:50 Now, the problem is that if we've got a large number of notes then we've got a lot of parameters, about three for each note, remember, which means that if we try and change all of them at once we end up with very low acceptance rates in our Markov chain.
0:10:08 The way to get around this is that we're going to have two sorts of move. We're going to have moves where we only change the current-frame parameters, and then moves where we try to change both the previous-frame and the current-frame parameters.
0:10:20 For the current frame it's nice and easy: we can just use Metropolis-within-Gibbs and choose to change some subset of the parameters at once. We'll just change the three parameters associated with one note in each step.
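The current-frame move can be illustrated with a generic Metropolis-within-Gibbs sweep that perturbs one note's block of three parameters at a time. This is a minimal sketch under stated assumptions: the random-walk Gaussian proposal, the `step_sizes` argument, and the treatment of all three parameters as continuous are choices made for illustration (the number of partials is an integer in the talk's model), and `log_target` stands in for the unnormalised log posterior.

```python
import numpy as np

def mwg_sweep(theta, log_target, step_sizes, rng):
    """One Metropolis-within-Gibbs sweep over the rows (notes) of theta.

    theta       : array of shape (n_notes, 3), one row of parameters per note
    log_target  : function mapping theta to an unnormalised log-density
    step_sizes  : length-3 random-walk proposal standard deviations
    """
    theta = np.array(theta, dtype=float)
    current = log_target(theta)
    accepted = 0
    for i in range(theta.shape[0]):
        proposal = theta.copy()
        proposal[i] += rng.normal(0.0, step_sizes)   # perturb one note only
        candidate = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is just the target ratio.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
    return theta, accepted
```

Changing three parameters per step rather than all of them keeps each proposal close to the current state, which is what keeps the acceptance rate usable as the number of notes grows.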
0:10:34 For the joint moves it gets a little more complex.
0:10:36 What we'd like to do is sample theta_{t-1} from the particle distribution from the previous frame, and then propose the current-frame parameters from some proposal.
0:10:49 The problem here is that when we do the sampling we will be changing all of the theta_{t-1} parameters in one go, and again that gives us very low acceptance rates.
0:11:01 Our solution to this has been to take the particle distribution and sort of collapse it onto a single univariate histogram over all the different possible notes that we have in the previous frame.
0:11:12 Then we use this as an approximation for the marginal distribution of each note, and assume the notes at t minus one are independent.
0:11:22 This means that we can sample the theta_{t-1} parameters one note at a time, and that again gives acceptable acceptance rates.
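Collapsing the particle set onto a per-note histogram can be sketched as below. The representation is an assumption: each particle is taken to be a set of discrete note labels (for example, MIDI note numbers) and particles are equally weighted. The talk's particles also carry continuous parameters for each note, which are left out here.

```python
from collections import Counter

def note_marginals(particles):
    """Estimate P(note present at t-1) for each note from equally weighted particles."""
    counts = Counter()
    for notes in particles:
        counts.update(set(notes))          # count each note once per particle
    n = len(particles)
    return {note: c / n for note, c in counts.items()}
```

For example, `note_marginals([{60, 64}, {60, 64}, {60, 67}, {60}])` gives note 60 probability 1.0, note 64 probability 0.5 and note 67 probability 0.25. Treating these as independent lets a joint move propose a change to a single previous-frame note instead of redrawing the whole particle.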
0:11:36 Finally, we want to estimate the number of notes present in each frame, and that can be done very nicely just by putting the whole thing into a reversible jump formulation.
0:11:47 So let's look at some results.
0:11:49 This is the output from a couple of Markov chains. This is a simple case where we've just got one note, and we're not looking at reversible jump; we're fixing the number of notes at one.
0:12:00 You can see that it picks up the correct note on the first iteration, in fact.
0:12:06um
0:12:08and the other a from just on the tree can you a green i think but it
0:12:11so takes about twenty frames segments
0:12:15 Then here on the right we've got a three-note case, and we're doing reversible jump MCMC now, so we're estimating the number of notes here, and that's pretty much correct.
0:12:28 With the frequencies, we see it fixes two of the notes very quickly, and then it struggles to choose between three possibilities here.
0:12:35 These three cases are in fact spaced an octave apart, and the reason there's some confusion there is because the three notes have very much the same sets of overtone or partial frequencies.
0:12:46um
0:12:48i finally just
0:12:49a few results
0:12:49uh this is
0:12:51and a simple um sort of
0:12:53a loud test piece
0:12:54so it's it's just three chords each of three nights
0:12:57um
0:12:57so we've got time on bottom them here and then the the frequency of the knight's present a of the
0:13:02this
0:13:03and
0:13:03and the the blue dots that
0:13:06um it's K and it estimates and we can see a fixed up
0:13:08um all night quite nicely here
0:13:10and this that one
0:13:12but one just dropping out here um as the the I the K at the end of the night
0:13:17um
0:13:19do errors here a a at the beginning of each night
0:13:21and and easy
0:13:22"'cause" by a transient effects the beginning of the like which will we're not modelling at my
0:13:28 Then finally we tried it on some real music. This is one such piece, and you can see it picks up these bass notes quite nicely.
0:13:40 For the treble notes it's doing a bad job up here; there are a lot of false alarms and that sort of thing going on, and again that's due to transient effects at the beginning of each note, which we're not modeling well.
0:13:54 And sorry, the overlay is just the ground truth.
0:13:59 So, just to summarise: we've got the Poisson point process model, which we're using as a likelihood for each frame given the notes, and some simple dynamic models for the evolution of the notes over time.
0:14:14 These allow us to do sequential inference using the MCMC particle filter algorithm, to find the number of notes in each frame and estimates of their frequencies and other parameters.
0:14:28 There are lots of ways we could extend this. I mentioned earlier that we don't look at peak amplitudes because that's too hard, so we'd most likely get a nice performance improvement if we looked at them, and also at the phase data, which we haven't looked at at all.
0:14:41 We could also look at more complex dynamical models. Given the simplicity of the models, though, it seems to be working quite well now.
0:15:00um
0:15:12 It's quite a long way off real time at the moment.
0:15:15it's a
0:15:24 We haven't been aiming to get it real time.
0:15:27oh
0:15:39uh yes or something
0:15:42and
0:15:43so
0:15:47and i was able to look through what looks simple peak detection we is possible to find a the of
0:15:51features would like to more spectrum to simple peter good score but
0:15:55your term record more as you are used to but spike to hear more just can be limited to from
0:16:00do
0:16:00spurt maybe or
0:16:02you you such teams still
0:16:04so a i i i a by the your we use to do are go to be to measurements just
0:16:08peak detection like and the peach that you're detecting great
0:16:12and number uh uh is better to four two peaks for example of after some some doing to get to
0:16:16do to use like more still
0:16:18sparks some stuff you that was a some smoothing moving that's
0:16:23but it's doing a
0:16:24sure that for
0:16:25all source or they're them
0:16:27i was to the this to the the errors and the of real hard we can what because of to
0:16:33use from minimum or maybe
0:16:35yeah that's that a trade between if you give a pretty much averaging in that it news the sum of
0:16:39the different now or site it's got just a very sure on the which seems
0:17:13which
0:17:21and i think another silence do is that something along
0:17:49in in