Thank you, and thanks to all of you for coming. This is the outline of my talk. First, I will give a brief introduction to the topic and then describe the phase-based features which are studied in this work. Then I will show you the results of our experimental evaluation of these features within the frame of voice pathology detection, and finally I will conclude.

In the great majority of speech processing applications, the focus is on the use of the amplitude spectrum of the Fourier transform. Nonetheless, there might be a benefit in also considering the phase information. For example, there have been approaches using phase-based features for speaker recognition or automatic speech recognition, such as the work of Murthy or Hegde, and there has been an improvement by using phase-based features in these systems. This is possible since phase provides complementary information with regard to the amplitude spectrum. Investigating the usefulness of the phase information therefore seems to be a promising approach in speech analysis.

Now I will describe the phase-based features which are the object of this work. We focus on the group delay function. The group delay is defined as minus the first derivative of the unwrapped phase of the Fourier transform, and it can be written as follows here, where X_R and X_I are the real and imaginary parts of the Fourier transform of x(n), while Y is the Fourier transform of n multiplied by x(n). An advantage of using that equation is that it does not require any phase unwrapping. The group delay function has most of the time been avoided in conventional speech processing applications. To understand why, consider the z-transform of the signal: there might be some zeros close to the unit
circle, and this is especially true for the speech signal. The Fourier transform is evaluated at frequencies located on the unit circle in the z-plane, and for zeros close to the unit circle the variation in the phase information is quite high, which results in high spikes in the group delay function. You can also understand that because, for such frequencies, the denominator of that expression becomes low, resulting in spikes in the group delay.

There have been some approaches aiming at reducing the spikes in the group delay. A first approach is the modified group delay proposed by Hegde. You can see that in the denominator the Fourier spectrum has been replaced by a cepstrally smoothed version of the Fourier transform. This representation also makes use of two smoothing parameters, alpha and gamma, which also aim at reducing the spikes in the group delay. Another version is the product of the power spectrum and the group delay proposed by Zhu. Here you can see that the main source of the spikes in the group delay comes from the denominator, so we just get rid of it and consider only the numerator of the expression. We have also investigated the chirp group delay proposed by Bozkurt. This uses another contour in the z-plane instead of the unit circle, that is, another circle in the z-plane, to evaluate the z-transform, and this leads to a both smooth and high-resolution representation of the peaks in the speech spectrum. For comparison purposes, we also used the STRAIGHT spectrogram by Kawahara, which is a pitch-adaptive time smoothing of the speech spectrum, and as a baseline we also considered the Fourier magnitude, that is, the magnitude spectrum of the Fourier
transform. Here you have an example of what these five spectrograms look like for a sustained vowel produced by a normophonic patient on top, and below for a dysphonic patient. Here you have the Fourier magnitude, the STRAIGHT spectrum, the modified group delay, the product of the power spectrum and the group delay, and the chirp group delay. You can see that for the normophonic patient we have a structure which is very regular in time, while this is not true for the dysphonic patient, and this is especially visible in the chirp group delay. Basically, to explain this: during the production of a sustained vowel you can assume that the vocal tract shape is constant, so that the vocal tract function can be assumed to be stationary. So if you find some variations, they come from turbulences or irregularities during the glottal production.

In this work we also use features derived from the mixed-phase decomposition. To introduce the mixed-phase decomposition, just consider the source-filter approach: we have the glottal flow, which is convolved in the time domain with the vocal tract response to give the speech signal. The mixed-phase model of speech says that the glottal flow is mostly maximum-phase, which means an anticausal signal, while the vocal tract is minimum-phase, that is to say a causal one. The key idea of the mixed-phase decomposition is to separate the minimum- and maximum-phase components of speech. This is possible, for example, with the zeros of the z-transform technique proposed by Bozkurt. Here is the z-plane in polar coordinates, and you can see that the zeros related to the glottal flow are outside the unit circle, while those for the vocal tract are inside it, since the vocal tract is a causal, minimum-phase system. So you can see that in this domain there is a possible separation between the
minimum- and maximum-phase components of speech. We have shown that it is also possible in the complex cepstrum domain, just using the quefrency origin as a boundary for the separation. In this work we focus on the use of the complex cepstrum-based decomposition. Basically, we have the speech signal, we apply a specific window which is synchronous with the glottal closure instants, the GCIs, and two pitch periods long, and then we compute the complex cepstrum. In the complex cepstrum domain it is very easy: just keeping the negative indexes, by inverse complex cepstrum we get the maximum-phase component of speech, which is mainly related to the glottal flow, while for the positive indexes we get the minimum-phase component of speech, which is mainly influenced by the vocal tract. In this work we just isolated the maximum-phase component of speech, which is a kind of glottal flow estimate.

Here you have examples of two cycles of the maximum-phase component. On one hand, for some frames the mixed-phase model is respected, that is to say we obtain waveforms which corroborate those of the glottal flow, such as in the LF model. For some other frames the decomposition fails and we have such irrelevant waveforms. To know whether the mixed-phase model is respected or not, we just computed two time parameters from the maximum-phase waveform.

Now the experimental evaluation of these features. As for the database, we used the Kay database, which is made of the productions of 53 normophonic and 657 dysphonic patients, and we just considered the production of the sustained vowel. As features, we use the frame-to-frame variation for the five spectrograms. As I said, if you assume that the vocal tract shape is constant during the production of this sustained vowel,
the frame-to-frame variations are mainly due to irregularities in the glottal production. We also use the two time parameters characterizing the respect of the mixed-phase model, and for comparison purposes we also use three perceptual spectral balances, which are extracted from the Fourier magnitude spectrum. These are obtained using three distinct subbands in the spectrum, and they are considered here because they were the most informative in our previous study.

Here you have an example of the distribution of some features. Here you have the Fourier magnitude, the modified group delay and the chirp group delay, and what is shown is the relative frame-to-frame variation. You can see that for normophonic patients we have values which are much lower than for dysphonic patients, and this is especially true for the chirp group delay representation. On the right you have the histogram for T1, a time constant characterizing the respect of the mixed-phase model. Actually, if the waveform corroborates that of the glottal flow, we expect values in a typical range, and this is true for the great majority of the normophonic frames, but you can see that for dysphonic patients, most of the time, the mixed-phase decomposition fails.

We have assessed these features in terms of mutual information. Basically, this is the percentage of useful information that the features bring to the classification problem. Here we have the five spectrograms, and you can see that the chirp group delay gives the highest amount of useful information for the classification problem between normophonic and dysphonic patients. You can also see the values for the modified group delay and for the two time parameters characterizing the respect of the mixed-phase model. As expected from our previous study,
you can see that the spectral balances give the highest amount of information. But you have to note that this is the intrinsic discrimination power of each feature considered separately. If you consider, for example, the combination of two features, if you use Pb1 with Pb2 you can see that it only brings 64 percent of mutual information, because they are highly redundant, while the best combination of two features is Pb1 with T2, which leads to 79 percent of mutual information. This is possible because these two sources of information are mainly complementary and not that much redundant.

We also used a classifier-based evaluation using an artificial neural network with sixteen neurons. We used a ten-fold cross-validation, and for the performance measure we used the error rate, both at the frame and at the patient levels. A patient is diagnosed as normophonic or dysphonic, and for that we use a majority decision strategy considering the frames.

So, the results. Just using a single feature, you can see the comparison between the baseline, the Fourier magnitude, and the chirp group delay, and you can see an improvement using the chirp group delay representation both at the frame level and at the patient level. Using now two features, you have the two time constants characterizing the respect of the mixed-phase model, and the best combination of two features, Pb1 and T2. We can see that up to now the chirp group delay gives the best result at the patient level, but at the frame level we obtain the best result with Pb1 and T2. Now with three features: this is the conventional representation using the perceptual spectral balances, and you can see that with these three features we obtain a worse result than just using the chirp group delay at the patient level. And now you can also see
a very interesting result just using the three group delay representations, with a very low error rate both at the frame and at the patient level. Now using five features: we add the Fourier magnitude and STRAIGHT, or the two time constants, to the three group delay representations, and you can see, comparing with this line, that it actually does not bring anything more. Finally, using the whole feature set, so the ten features, we obviously get the best result, that is, the lowest error rate at the frame level, but considering the patient level we get about 0.8 percent, which was already obtained just using the three group delay representations.

As a conclusion, we have shown that phase-based features are appropriate for characterizing irregularities in the phonation during sustained vowels. These phase-based features are actually complementary to the features derived from the magnitude spectrum, which are commonly used in speech processing, and we obtain quite good performance just using the three features of the group delay representations. Thank you for your attention. If you have any questions or comments, you are welcome.

Thanks. Do we have questions or comments? Yes, please.

My observation is that you say the mixed-phase decomposition fails on dysphonic speech, but you did not explain what the reason for that is.

Okay. Theoretically, I would say that the production does not respect the mixed-phase model. But as I said, there is also the windowing: first of all, you have to apply a GCI-synchronous, two-pitch-period-long window. For some dysphonic patients the GCIs are not well marked or are not present at all, and the pitch estimation may fail too. That might explain some of the bad results.

Thanks for the talk.
I would like to ask: if you have increased interaction between the vocal tract and the source, would the sensitivity go up? I mean, for the dysphonic patients, if they happen to have marked coupling, would that affect the mixed-phase model?

Okay. I am not sure I can answer that question. Anyway, you will always find a maximum- and a minimum-phase component, but to say that it is really relevant to consider that the maximum-phase component is a glottal flow estimate, I would not say that; I am not sure. In my experience, the decomposition for normophonic voices, let's say on speech synthesis databases, works well, but when you have more, let's say, coupling between the vocal tract and the glottal source, I cannot answer.

Hi, thanks for your talk. Just a question: you are assuming the glottal source to be just a maximum-phase component, but there is also a minimum-phase component, which is due to the return phase, and that shapes the spectral tilt of the glottal source. That might also vary from frame to frame, so if you take that component into account as well, you could get better results.

Okay. The spectral tilt is mainly due to the glottal return phase, which is a minimum-phase signal, so it is mixed in the minimum-phase component, which is not the object of this work; we just focus on the analysis of the maximum-phase component. Also, for the features derived from the mixed-phase model, we just consider two parameters, and that might also answer the previous question. Even though it is not really a glottal flow estimate, you might have, let's say, a relevant waveform, or one which is very noisy, meaning that the mixed-phase decomposition
just fails, so you cannot interpret that as a glottal flow estimate. Nonetheless, you still have an indication of whether the mixed-phase decomposition is respected.

Any other questions? I have a question myself. In the dysphonic database, I guess you have different classes of dysphonia. Could you comment on that, and on whether you tried to distinguish those classes as well?

We did not do that work. We just made, let's say, a binary decision, so the classification is normophonic against dysphonic. Also, in the database you might have various pathologies for a single patient, so we just considered a binary classification.

That concludes the discussion. Let's thank the speaker again.
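
Editorial illustration of the group delay expression described in the talk. The sketch below is not the speakers' implementation; it only assumes the standard identity in which the group delay equals (X_R Y_R + X_I Y_I) / |X|^2, with X the Fourier transform of x(n) and Y the Fourier transform of n x(n). The function names, the naive DFT, the FFT size, the `eps` guard and the radius `rho = 1.12` are all hypothetical choices made for the example. The product spectrum keeps only the numerator, and the chirp variant evaluates the z-transform on a circle of radius `rho` by weighting x(n) with rho^(-n). The modified group delay is left out because its smoothing parameters are not given in the talk.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive DFT of a real sequence x, implicitly zero-padded to n_fft bins."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft)
    ]


def _numerator_and_power(x, n_fft):
    """Return (X_R*Y_R + X_I*Y_I, |X|^2) per bin, with Y the DFT of n*x[n]."""
    X = dft(x, n_fft)
    Y = dft([n * v for n, v in enumerate(x)], n_fft)
    num = [a.real * b.real + a.imag * b.imag for a, b in zip(X, Y)]
    power = [abs(a) ** 2 for a in X]
    return num, power


def group_delay(x, n_fft=64, eps=1e-12):
    """Group delay in samples, computed without any phase unwrapping."""
    num, power = _numerator_and_power(x, n_fft)
    # eps only guards against division by an exactly zero power value (assumption).
    return [n / max(p, eps) for n, p in zip(num, power)]


def product_spectrum(x, n_fft=64):
    """Power spectrum times group delay: the numerator alone, no division."""
    num, _ = _numerator_and_power(x, n_fft)
    return num


def chirp_group_delay(x, rho=1.12, n_fft=64):
    """Group delay of the z-transform evaluated on |z| = rho (hypothetical rho)."""
    return group_delay([v * rho ** (-n) for n, v in enumerate(x)], n_fft)


if __name__ == "__main__":
    # A zero at z = 0.99, very close to the unit circle, gives a large spike at DC.
    near_zero = [1.0, -0.99]
    print("group delay at DC:      ", group_delay(near_zero)[0])
    print("chirp group delay at DC:", chirp_group_delay(near_zero)[0])
```

For a unit impulse delayed by three samples, the group delay is three samples at every frequency, which is a quick sanity check of the expression. For the two-tap signal with a zero at z = 0.99, the ordinary group delay shows a large spike at the zero-frequency bin, while moving the analysis circle away from the zero reduces it, which is the effect the talk attributes to zeros close to the unit circle.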