The subject of my talk is to introduce an improved method for text-independent phonetic segmentation based on the microcanonical multiscale formalism. In brief, I will first focus on why we view speech as a complex signal in the physical sense, that is to say, as a realisation of a complex system. After that, I will introduce an approach from the study of complex systems which may be a powerful tool to capture the nonlinear character of the speech signal. This is called the microcanonical multiscale formalism, MMF, and I will show the general potential of the MMF to be applied in speech analysis. Then I will turn to one application of this formalism, the phonetic segmentation of the speech signal, and I will introduce a basic algorithm and an improvement to it. Finally, I will take some time to present experimental results and to conclude.

It has been theoretically and experimentally established that there exist nonlinear phenomena in the production process of the speech signal. For example, the Reynolds number, which is a number characterising different flow regimes, is reported to be in the thousands, which corresponds to turbulent flow. However, as we know, most of the methods in speech processing are based on the linear source-filter model, which cannot adequately take into account the nonlinear character of the speech signal.

Hence our goal here is to find and evaluate key parameters which are responsible for the complex character of the speech signal. Previous studies have shown that such parameters do exist, but they are very hard to estimate. Our strategy is to take advantage of knowledge coming from statistical physics and to relate the complexity to the predictability of each point inside the signal. In practice, we need to develop computationally efficient tools to estimate these parameters, if they exist, and to
use them in practical applications.

There are two important phases in the study of complex systems. The first phase started in the late forties with the classical work of Kolmogorov, which was the basis for the later approaches in this domain, which are based on the study of structure functions. The main result of these methods is to recognise, globally, the existence of a multiscale structure, without giving access to it. Their usefulness is limited because they are based on statistical averages and on the stationarity assumption. They can be used to decide whether a system is complex or not, but they do not give much more information.

In the second phase, we try to determine geometrically, inside the signal, where the complexity happens and how it organises itself. In more precise terms, we try to find a subset inside the signal which has the highest information content, and we try to explain how the transfer of information between different scales organises itself. These methods have been made possible by advances in statistical physics, in the study of highly disordered systems, and by the precise study of the notion of transition inside a complex system. It has been shown that a geometric multiscale organisation is responsible for the complexity inside a signal. A typical example is the cascade of energy in fully developed turbulence. The fingerprint of this is the existence of power-law behaviour in the temporal correlation function, which has to be evaluated without any stationarity assumption, at each point inside the signal. The exponent related to this power law is, as we will see shortly, called the singularity exponent, and it can be shown that it completely explains the organisation of the multiscale structures.

An example is the canonical formalism in the study of multiscale signals, which was the first attempt to access singularity
exponents, as a global property of the signal, through what is called the Legendre spectrum. In this equation we have a complex signal s and a multiresolution function Gamma r operating at the scale r, and the brackets stand for expectation over a statistical ensemble. The exponent of this power law, tau of p, can be related to the distribution of singularity exponents through a Legendre transform. But the main problem is that it is a global description: it doesn't give access to the geometry and the local dynamics of the signal.

So in the microcanonical formalism, instead of relying on statistical averages, we try to look at the signal itself, and we introduce singularity exponents geometrically, related to each location inside the signal. We have the time index t here, the multiresolution function Gamma r, and the scale r here. The exponent of this power law is called the singularity exponent, and if it can be estimated precisely, it gives a view of the transition fronts of the signal.

The main problem is the precise estimation of these parameters, and in this regard one of the crucial choices is the choice of the functional Gamma r. For example, we can simply use linear increments, but it has been shown that this doesn't give a precise estimation of h(t), because of the instability and sensitivity of these linear increments. A better choice is the gradient modulus measure, which is defined as the integral of the gradient's modulus over a ball of radius r around t in this equation, with the Lebesgue measure on the real line. This is defined from the typical characterisation of kinetic energy in turbulence, and it has been shown that it is related to the information content of each point if we use this measure for the calculation of h(t). So if we can have a good estimate of h(t), we can derive a very important subset inside the signal, which is called
the Most Singular Manifold, MSM. This corresponds to the points inside the signal which have the lowest values of singularity exponents. It has been shown that the lower the value of the singularity exponent, the higher the unpredictability at the given point, so the critical transitions of the signal are happening at these points. A reconstruction formula has been proposed, and it has been shown in many applications that we can reconstruct the whole signal having access to only this small subset of points. This was just to show the importance of the singularity exponents. After that, we can turn to see how they can be applied to the speech signal.

Previously we have presented the estimation procedure of h(t) for a speech signal, and we have shown that we can have a good estimate of h(t) for the majority of points in the speech signal. Here we have a speech signal extracted from the TIMIT database, with vertical red lines which show the phoneme boundaries, taken from the manual transcriptions provided in the TIMIT database. Of course, the objective of text-independent phonetic segmentation is to identify these phoneme boundaries within a tolerance window.

Since different phonemes have, as we know, different statistical properties, we expect the singularity exponents to have different behaviours. To show this, we studied the time evolution of the distribution of singularity exponents. We take windows of length thirty milliseconds, we compute the histogram of h, and we plot its evolution over time. In this graphical representation, which is the histogram of singularity exponents conditioned on time, we can easily note a remarkable change in the distribution of singularity exponents between different phonemes. This has been extensively evaluated over different speech signals, but the problem is that we cannot use this
graphical representation for developing an automatic segmentation algorithm. To provide something that can be used by an automatic algorithm, we observe that the easiest interpretation of this change in distribution is a change in the average. So we define a new measure, we call it the ACC, which is simply the primitive of the singularity exponents, and this can be considered as an instantaneous average of the singularity exponents. We can see the resulting functional, and it is clear that it shows the difference in distributions more clearly: inside each phoneme the ACC is more or less linear, and we note a change in slope at each phoneme boundary.

So, to develop an automatic segmentation algorithm, we use a very simple method: we fit a piecewise linear curve to this ACC by minimising the mean square error. Here we have the ACC along with the fitted curve, and we have identified the breaking points as the candidate boundaries. You can see that we have identified most of the boundaries with very good resolution, because we don't have any windowing problem in this method: we have access to the highest possible resolution, which is the sampling frequency of the speech signal.

The preliminary simulations show that this simple method has comparable results with the state of the art, which was presented in our previous work, and its advantage is that it is not sensitive to threshold selection, as we will see in the experimental results. However, by performing an error analysis of this method, we observed that there are points where there is a distinct difference in the distribution of singularity exponents but the ACC is not able to reveal it to identify the boundaries, and there are points where there is no distinctive change in the distributions but the ACC and the linear curve fitting make some mistakes. So we tried to use a classical approach to change detection, which
has been widely used in segmentation. It is a two-step procedure: first, to select a set of candidate boundaries generously, and then to make the decision whether each candidate corresponds to a change in the features or not.

For the first step, the selection, we have two observations. First, we saw that some of the missed boundaries correspond to transitions between fricatives, stops and vowels. Second, we saw that the easiest positions to detect are the transitions from inactive segments, silences or pauses, to phonemes, because in silence we have mostly positive values of singularity exponents and in active parts we have mostly negative values, so it is easy to detect a change in the slope of the ACC.

Hence we decided to apply a low-pass filter to the original signal and do exactly the same: compute the singularity exponents and the ACC for the low-pass signal. As an example, in the left figure you can see the ACC of the original signal, and in the right one you can see the ACC of the low-pass filtered signal. We know that fricatives and stops are essentially high-band signals, so low-pass filtering converts them into a low-energy signal. You can see that in the left figure we have some change in the shape of the ACC, but it is not easy to detect with the linear curve fitting, whereas on the right-hand side it is much easier to detect. Here is another example. Again I emphasise that we have a change in the original ACC, but it is not easy to detect, whereas in the low-pass version on the right-hand side it is really easy to detect.

So as the first step, we apply the ACC-based algorithm to the signal and to its low-pass filtered version, and we gather all the breaking points as the candidates. In the second step, we perform dynamic windowing followed by a log-likelihood ratio hypothesis test to see, for each
one of the candidates, whether it actually corresponds to a change in the distribution of singularity exponents or not. I emphasise that we do this on the singularity exponents of the signal itself, because we are interested in showing the strength of the singularity exponents. The low-pass filtered version does not have any real meaning; it is just some diversity added to the algorithm.

So this is the dynamic windowing procedure. For each candidate point we consider three windows, and we have two hypotheses: either the singularity exponents are generated by a single Gaussian, or they are generated by two Gaussians. If the likelihood of H1 is greater than that of H0, we take the candidate as a boundary; otherwise we remove it from the candidate list. Then we move on to the next three windows.

Our experimental simulations were done on the TIMIT database, on the full training part of TIMIT, which consists of four thousand six hundred sentences, and we developed the algorithm over randomly chosen files from this database. We have tried to report all the possible performance measures, because it is difficult in the literature to compare the reported results, and this should simplify later comparisons.

There are two categories of scores. The partial scores are the hit rate, which shows the rate of correctly detected boundaries; over-segmentation, which shows how many more boundaries we have detected than there are in the reference; and false alarm, which shows how many false boundaries we have detected. The problem with these partial scores is that they can go in opposite directions: for example, an improvement in hit rate could correspond to an increase in false alarm rate. So we cannot make a fair judgement with the partial scores only. There are global scores which take these partial scores into account together. For example, the F1 value takes hit rate and false alarm into account, and the R-value takes hit
rate and over-segmentation into account, with more emphasis on the over-segmentation rate.

So, the experimental results. First, we compare the basic ACC-based algorithm with the improved one. On the test utterances, we can see that we have about two or three percent improvement in false alarm rate and about four percent in over-segmentation, and the hit rates are more or less the same. This shows the improvement over the previous algorithm.

Then, compared to the reference method, which is the state of the art in the literature, we can see that for the tolerance of twenty-five milliseconds the hit rates are almost the same, while we have an improvement in false alarm and ten percent improvement in the over-segmentation rate. More importantly, if we go to lower tolerances, five milliseconds for example, we can see that for all of these we have more than ten percent improvement in hit rate, false alarm and over-segmentation. This is because of the high resolution of the ACC function: as I mentioned, we don't have windowing, so we have access to the finest possible resolution. In terms of the global measures, we can see that for the lower tolerances we have more than ten percent improvement in both of the scores, and for twenty-five milliseconds we have about six or four percent improvement in R-value and F1 value.

I mentioned that the method is not sensitive to the threshold, which is a problem of the classical text-independent methods of phonetic segmentation. Here we have shown the sensitivity of the algorithm to the curve-fitting threshold: I have changed the threshold by over four hundred percent of its value, and the R-value has changed by only about zero point five percent. This shows that the choice of the threshold is not important at all in this algorithm. Threshold independence is an important
feature of the method.

To conclude, we have emphasised the strength of singularity exponents in the detection of transition fronts in the speech signal. More importantly, the encouraging results in phonetic segmentation show the potential of the MMF in the analysis of the local dynamics of the speech signal. Hence our future work is to use the MMF in other domains of speech technology, and to use the reconstruction formula, which is ongoing research, and I hope to have good results in that. Thank you very much.

Right on time. We can take questions one on one, but this is officially the end of the session.
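
Note on the decision step described in the talk: the ACC and the single-Gaussian versus two-Gaussian test can be sketched roughly as below. This is only an illustration under stated assumptions, not the speaker's implementation. The function names, the way the windows are chosen around each candidate, and the `threshold` parameter are all hypothetical. One point worth keeping in mind: with maximum-likelihood fits the two-Gaussian model can never score lower than the single Gaussian, so the sketch assumes some positive threshold on the log-likelihood ratio, which the talk does not specify.

```python
import numpy as np

def acc(h):
    """Discrete primitive (running sum) of the singularity exponents h."""
    return np.cumsum(h)

def gaussian_loglik(x):
    """Maximised log-likelihood of samples x under one Gaussian (ML mean and variance)."""
    n = len(x)
    var = max(float(np.var(x)), 1e-12)  # floor avoids log(0) on constant windows
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)

def llr(h, left, cand, right):
    """Log-likelihood ratio of H1 (two Gaussians, split at cand) against H0 (one Gaussian)."""
    x, y, z = h[left:cand], h[cand:right], h[left:right]
    return gaussian_loglik(x) + gaussian_loglik(y) - gaussian_loglik(z)

def prune_candidates(h, candidates, threshold):
    """Keep candidates whose log-likelihood ratio exceeds threshold.

    Assumed windowing: each candidate is tested on the span from the last
    accepted boundary to the next candidate (hypothetical reading of the
    'dynamic windowing' in the talk).
    """
    cands = sorted(candidates)
    kept, left = [], 0
    for i, c in enumerate(cands):
        right = cands[i + 1] if i + 1 < len(cands) else len(h)
        if c - left < 2 or right - c < 2:
            continue  # too few samples on one side to fit a Gaussian
        if llr(h, left, c, right) > threshold:
            kept.append(c)
            left = c  # the window now starts at the accepted boundary
    return kept
```

For instance, on a synthetic sequence of exponents that is mostly positive in its first half and mostly negative in its second half, a candidate placed at the midpoint should survive the test while candidates placed inside either half should be removed, provided the threshold is chosen well above the small ratios that arise by chance.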