The goal of this work was to improve upon state-of-the-art transcription by explicitly incorporating information about note onsets and offsets. Some general background on transcription, which you might have heard before: it is the process of converting an audio recording into some form of music notation. It has numerous applications in MIR and in interactive music systems, such as automated score following, and in computational musicology, and it can be divided into several subtasks, such as multi-pitch estimation, detection of note onsets and offsets, instrument identification, and extraction of rhythmic information such as the tempo. In the multi-pitch, multiple-instrument case it still remains an open problem.

Some related work which is linked to this work: the iterative spectral-subtraction-based system by Klapuri, which proposed the spectral smoothness principle; the rule-based system by Zhou, who also proposed as a time-frequency representation the resonator time-frequency image, which is also used in this work; Yeh's joint multiple-F0 estimation method, which continues to rank first in the MIREX public evaluations for multiple-F0 estimation and note tracking; and an iterative estimation system for multi-pitch estimation which exploits the temporal evolution, which we proposed previously. There is also some related work on onset detection: the well-known fused onset detection functions by Bello et al., which combine energy- and phase-based measures, and a more recent development, the late fusion by Holzapfel et al., which fused the onset descriptors at the decision level.

In this work we propose a system for joint multi-pitch estimation which tries to exploit onset and offset detection, in an effort to obtain improved multiple-pitch estimation results. Novel onset detection features were developed and proposed, derived from preprocessing steps of the transcription system.
Note offsets are, we believe for the first time, explicitly exploited here, using hidden Markov models.

This is the basic outline of the system. There is a preprocessing step where the time-frequency representation is extracted, spectral whitening is performed, noise is suppressed, and a pitch salience, or pitch strength, function is extracted. The core of the system is the onset detection using late fusion of the proposed descriptors and the joint multi-pitch estimation; afterwards pitch-wise offset detection is applied, and the result is the final transcription in MIDI format.

This is an example of the time-frequency representation we used, the resonator time-frequency image, which is a resonator filter bank. We used that instead of, for example, the more common constant-Q transform because, due to its exponential decay factor, it has better temporal resolution in the low frequencies, as you might see here; this is a fairly typical recording, from the MIREX 2007 competition, which is often employed. After the extraction of the time-frequency representation, spectral whitening is performed in order to suppress timbral information and make the system more robust to different sound sources; Klapuri's method was used to that end, and it was followed by a noise suppression procedure based on filtering with a two-thirds-octave span. Based on that whitened and noise-suppressed representation, a pitch salience, or pitch strength, function is extracted, along with tuning and inharmonicity coefficients. In the figure at the bottom you can see the RTFI spectrum of a C4 piano note, and in the lower left and right you can see the corresponding pitch salience function, where you see a prominent peak at the C4 note, but you can also see several peaks at subharmonic and superharmonic positions, which can cause errors. As far as onset detection is concerned, two descriptors were extracted and proposed, utilising information from the preprocessing steps of the multi-pitch estimation stage.
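As a rough illustration of what a pitch salience (pitch strength) function computes, here is a toy harmonic-summation sketch. This is our own simplification, not the system's actual formulation, which operates on the whitened RTFI representation and also models tuning and inharmonicity.

```python
import numpy as np

def pitch_salience(spectrum, freqs, f0_grid, n_harmonics=8):
    """Toy harmonic-summation salience: for each candidate F0, sum the
    spectral magnitude found near its first few harmonic positions.
    (Illustrative only; the talk's system also uses tuning and
    inharmonicity coefficients, which are omitted here.)"""
    salience = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harmonics + 1):
            # nearest spectral bin to the h-th harmonic position
            k = np.argmin(np.abs(freqs - h * f0))
            salience[i] += spectrum[k] / h  # weight higher partials less
    return salience

# Tiny demo: a synthetic spectrum holding the partials of a 100 Hz tone
freqs = np.arange(0.0, 2000.0, 10.0)
spectrum = np.zeros_like(freqs)
for h in range(1, 6):
    spectrum[np.argmin(np.abs(freqs - 100.0 * h))] = 1.0 / h
f0_grid = np.array([50.0, 100.0, 150.0, 200.0])
s = pitch_salience(spectrum, freqs, f0_grid)
print(f0_grid[np.argmax(s)])  # the 100 Hz candidate dominates
```

Note that the 50 Hz (subharmonic) and 200 Hz (superharmonic) candidates still receive non-zero salience, which is exactly the kind of spurious peak the talk mentions for the C4 example.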
The first proposed descriptor was a spectral-flux-based one, which also incorporated tuning information. It was mostly motivated by the fact that in many cases you have false alarms caused by vibrato or by tuning changes, and these might trigger many false alarms in a normal energy-based onset detection measure. The proposed measure is basically a half-wave-rectified, semitone-resolution filter bank which also incorporates information from the extracted pitch salience function, so onsets can be easily detected by peak-picking that function. A second function for onset detection was also proposed in order to detect soft onsets, the ones that are produced without any notable energy change, as may be produced by bowed strings, for example; the proposed function was based on a pitch-chroma version of the extracted pitch salience function, which was also half-wave rectified. In order to combine these two onset descriptors, late fusion was applied, and in order to train the late fusion parameters a development set from Ghent University was used, consisting of ten thirty-second classical music excerpts.

For multiple-F0 estimation, pitch candidates are extracted for each frame, and for each possible combination the overlapped partials are estimated and overlapping partial treatment is applied. Basically, for each combination the partial collisions are computed, and the amplitudes of the overlapped partials are estimated by a discrete-cepstrum-based spectral envelope estimation procedure in the log-frequency domain. In the figure on the right you can see the harmonic partial sequence of a G5 piano note and the corresponding spectral envelope. Afterwards, for each possible pitch combination for a given frame, a score function is computed.
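To make the energy-based descriptor concrete, here is a minimal half-wave-rectified spectral flux with naive peak picking. This is a generic sketch of the underlying idea, not the proposed semitone-filter-bank descriptor, and the threshold value is invented for the demo.

```python
import numpy as np

def spectral_flux_onsets(S, threshold=0.1):
    """Toy energy-based onset detection: half-wave-rectified spectral flux
    with naive peak picking. S is a (frames x bins) magnitude spectrogram.
    The talk's descriptors additionally use a semitone-resolution filter
    bank, tuning information, and the pitch salience function."""
    diff = np.diff(S, axis=0)            # frame-to-frame spectral change
    hwr = np.maximum(diff, 0.0)          # half-wave rectification:
    odf = hwr.sum(axis=1)                # keep only energy increases
    odf = np.concatenate(([0.0], odf))   # align with frame indices
    # naive peak picking: local maxima above an absolute threshold
    onsets = [t for t in range(1, len(odf) - 1)
              if odf[t] > threshold
              and odf[t] >= odf[t - 1] and odf[t] > odf[t + 1]]
    return odf, onsets

# Demo: a note "starting" at frame 3 in a two-bin toy spectrogram
S = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
              [1.0, 0.5], [1.0, 0.5], [1.0, 0.5]])
odf, onsets = spectral_flux_onsets(S)
print(onsets)  # → [3]
```

The half-wave rectification is the key detail: only energy increases contribute, so note decays do not trigger false onsets, but soft bowed onsets with no energy change are still missed, which is what motivates the second, salience-based descriptor.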
The score function exploits several spectral features and also aims to minimize the residual spectrum. The features that were used were: the spectral flatness of the harmonic partial sequence; a smoothness measure based on the spectral smoothness principle; the spectral centroid, which is the centre of gravity of the harmonic partial sequence, where a low spectral centroid is usually an indication of a musical, harmonic sound; and a novel feature, the harmonically related pitch ratio, which was proposed in order to suppress any harmonic or subharmonic errors. Finally, we try to minimize the flatness of the residual spectrum. So the optimal pitch candidate set is the one that maximises that score function, and the weight parameters of the score function were trained using nonlinear optimisation on a development set of one hundred piano samples from the MAPS piano sounds database, developed by Emiya and colleagues.

After the pitch estimation stage, offset detection is performed, using two-state on/off hidden Markov models for each single pitch. In this system, a note event is delimited by the time frames between two consecutive onsets, and at this stage each pitch is decoded into on and off states. In order to compute the state priors and state transitions, MIDI files from the RWC database, from the classic and jazz genres, were used. For the observation probability we use the information from the previously extracted pitch salience function; basically, the observation function for a note pitch is essentially a sigmoid function of the extracted salience function. Here you can see the basic structure of the pitch-wise HMMs for offset detection.

For evaluation, we used as ground truth a set of twelve twenty-three-second excerpts from the RWC database, which consists of classic and jazz music excerpts.
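The offset-detection idea can be sketched as a two-state on/off HMM per pitch, decoded with Viterbi, where the "on" observation likelihood is a sigmoid of the pitch salience. The transition and sigmoid parameters below are invented for illustration, whereas the actual system trains them on RWC MIDI files.

```python
import numpy as np

def decode_note_states(salience, p_stay=0.9, k=4.0, thresh=0.5):
    """Viterbi decoding of a two-state (off=0 / on=1) HMM for one pitch.
    The 'on' observation likelihood is a sigmoid of the pitch salience,
    loosely following the talk; p_stay, k and thresh are invented here,
    whereas the system trains priors/transitions on RWC MIDI data."""
    p_on = 1.0 / (1.0 + np.exp(-k * (salience - thresh)))  # sigmoid obs.
    obs = np.vstack([1.0 - p_on, p_on])                    # shape (2, T)
    logA = np.log(np.array([[p_stay, 1 - p_stay],
                            [1 - p_stay, p_stay]]))
    T = len(salience)
    logd = np.log(np.array([0.5, 0.5])) + np.log(obs[:, 0] + 1e-12)
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + logA      # (from_state, to_state)
        back[:, t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(obs[:, t] + 1e-12)
    # backtrack the most likely on/off state sequence
    states = np.zeros(T, dtype=int)
    states[-1] = int(logd.argmax())
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[states[t], t]
    return states

# A pitch clearly active for frames 2..6, then released
sal = np.array([0.0, 0.1, 0.9, 0.95, 0.9, 0.85, 0.8, 0.1, 0.0, 0.0])
print(decode_note_states(sal))  # → [0 0 1 1 1 1 1 0 0 0]
```

The self-transition probability acts as a smoother: brief salience dips inside a note do not flip the state to "off", so the decoded off-transition gives a more stable offset estimate than thresholding the salience frame by frame.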
You can see most of these pieces here, though not all of them; there are many guitar pieces, and there is a very nice string quartet as well. Here is a basic example of the transcription: in the upper figure you can see the pitch ground truth for a guitar excerpt, and in the lower half you can see the transcription. This is what the original recording sounds like... and this is the synthesized transcription of the same recording. Generally you can see that the transcription does not have many false alarms, but in some cases it tends to underestimate the chord size, the polyphony level, so it has some missed detections; overall, though, it is quite good.

These are the results for the system. The accuracy using a ten-millisecond frame-based evaluation is 60.5% without onset and offset detection; it is 59.7% when utilizing information from onsets only, because that case has many more false alarms, since it does not include any deactivation of pitches; and it rises to 61.2% for the joint onset and offset case. When compared with various state-of-the-art systems in the literature, such as the method by Cañadas-Quesada, the specmurt method by Saito, and the HTC algorithm that was also presented before, the results are about a two percent improvement in terms of accuracy. More detail is given by some additional metrics, where it can be seen that most of the errors are false negatives, that is, missed detections, whereas the number of false positives is relatively smaller. Finally, some results on the onset detection procedure: it should be noted that we were aiming for a high recall rather than a high F-measure, because we were not interested in segmenting the signal but rather in capturing as many onsets as possible.
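The frame-based accuracy quoted here is, we assume, the usual multi-pitch accuracy TP / (TP + FP + FN); a minimal sketch of that metric:

```python
def frame_accuracy(reference, estimate):
    """Frame-based transcription accuracy Acc = TP / (TP + FP + FN),
    the common measure for multi-pitch evaluation (our assumption of the
    metric behind the quoted 60.5% / 59.7% / 61.2% figures).
    `reference` and `estimate` are lists of per-frame sets of active
    pitches (e.g. MIDI note numbers)."""
    tp = fp = fn = 0
    for ref, est in zip(reference, estimate):
        tp += len(ref & est)   # pitches correctly detected
        fp += len(est - ref)   # spurious detections
        fn += len(ref - est)   # missed pitches
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0

# Toy example: one missed note in frame 2, one false alarm in frame 3
ref = [{60, 64}, {60, 64}, {62}]
est = [{60, 64}, {60}, {62, 65}]
print(frame_accuracy(ref, est))  # 4 TP, 1 FP, 1 FN → 4/6 ≈ 0.667
```

Because FP and FN sit together in the denominator, a system that fires onsets without ever deactivating pitches accumulates false positives and loses accuracy, which is consistent with the onsets-only score being lower than the joint onset/offset one.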
To conclude, the contributions of this work were: the onset detection features that were derived from the multi-pitch estimation preprocessing; a score function that combines several features for multi-pitch estimation, including a novel feature for suppressing harmonically related pitches; offset detection using pitch-wise HMMs; and transcription results on the RWC database which outperform the state of the art. In the future we would like to explicitly model the temporal evolution of the produced sound, with states such as the attack, transient, sustain, and decay parts of the note; to form a joint model for multi-pitch estimation and note tracking, rather than treating them separately; and finally to publicly release the methods into the MARSYAS framework, as was done for a previous method of ours. Thank you.

[Audience questions]

Q: I noticed that you said you trained your onset detection on piano, I think it was?

A: No, the onset detection was trained on general classical music excerpts, most of them strings actually.

Q: I guess that was going to be my next question, because a lot of the time, on plucked strings and struck instruments, it is much, much easier to do onset detection. I was just wondering if you feel your onset detection has anything new to say about detecting onsets in things with bows, or with singing, or things with words?

A: That is why the second, salience-based measure was proposed, to detect soft onsets. It is a pitch-based measure, which is, I think, the only reliable way to detect onsets without any energy change. In fact we have put some transcription examples, like the one I played before, on the web, where there is another example from a string quartet transcription which is actually pretty accurate.

Q: One question: can you say more about why note onsets, apart from being perceptually important, were so important to your performance?

A: Well, the thing is that most multi-pitch estimation methods do not explicitly exploit any information about the activation time of the produced sound.
Nor do they exploit the deactivation time of that sound. By incorporating note onset and offset information, in fact, we demonstrated that we can also improve a bit on the frame-based multi-pitch estimation accuracy. And generally, I think onsets should in fact be used more widely and not be left outside transcription systems, as is usually done.

Q: It looked like a lot of the errors were caused by missing notes, so the onsets were helping, but performance was impaired by the missing notes?

A: The missing notes were mostly produced in the case of dense chords, where you might also have some octave errors; sometimes the upper pitch might not be detected, in which case you have an octave error. That does not have anything to do with onsets, because the onsets are there for the lower note, but it does have something to say about the features we might use for multi-pitch estimation: we need features that are more robust, let's say, in the case of overlapping partials.

Q: When you have a chord, can we really hope to get all the notes by automatic means?

A: Well, it depends on the instrument models that you have. If, for example, you have trained the parameters of your system on that specific instrument, it might generally be easier, compared to, let's say, training your parameters on a piano and testing on excerpts from bowed strings. So instrument-dependent models can be used, and I think the general trend in the future would be to estimate a complete transcription that would also include joint instrument identification.

[Applause]