Actually, I am here to present this paper on behalf of the first author, who is from the University of Science and Technology of China. This work started when the first author visited us as a research intern. We implemented the harmonic plus noise model, and in the very beginning we wanted it to be used for speech analysis and especially for speech synthesis. After she went back to school, we used that harmonic plus noise model to implement a new feature, applied this time to speaker verification, and we got a surprisingly promising speaker verification result using this new set of features. So that is basically the whole story behind this work.

Today I will first introduce our motivation and the so-called SSER feature, which stands for spectral subband energy ratio feature. I will briefly introduce the harmonic plus noise analysis of speech, how we calculate the spectral subband energy ratio feature, and how we model the SSER feature, and finally I will present our evaluation results and conclusions.

It is probably a well-known problem that for today's speaker identification and verification tasks we usually still use features borrowed from automatic speech recognition. The problem is that those features are actually supposed to normalize away the speaker information. So the motivation is quite straightforward: we want to find new features that are complementary to the current MFCC features, features that can carry the speaker characteristics and therefore improve speaker verification performance. That is the motivation of this work.

There are several steps to extract our proposed SSER features. The first step is to apply the harmonic plus noise analysis of speech and then to calculate the subband energy ratios. I will introduce the details later; essentially, in each subband you calculate the energy of the harmonic part versus the energy of the noise part. That gives a new feature, which you plug into the current speaker verification system, which is a conventional GMM-UBM system, and you use it as a complementary feature to the MFCC feature.

Let me briefly introduce the harmonic plus noise analysis of speech. This model was proposed by Professor Stylianou; you can find the reference in the paper. Basically, for each input utterance we first do F0 extraction, that is, pitch extraction, to get an F0 estimate, and of course you also get the voiced/unvoiced labels. We discard the unvoiced frames and only use the voiced frames for further analysis. After this we apply pitch-synchronous windowing on the input utterance, so we get several frames to represent the input utterance. For each given frame, that is, a short speech segment, we perform an HNM analysis, where HNM stands for harmonic plus noise model.
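To make the analysis step concrete, here is a minimal sketch, assuming a per-frame F0 estimate is already available and the frame is a voiced, pitch-synchronous segment. It fits a sum of harmonics of F0 by least squares and takes the residual as the noise part; the function name, the NumPy-based least-squares fit, and the 6 kHz default (the maximum voiced frequency mentioned later in the talk) are illustrative choices, not the authors' exact implementation.

```python
# A minimal sketch (not the authors' implementation) of the per-frame harmonic-plus-noise
# decomposition: a least-squares fit of harmonics of F0, with the residual taken as the
# noise part. The frame is assumed to be an already-extracted voiced, pitch-synchronous
# segment with a known F0 estimate.
import numpy as np

def harmonic_noise_decompose(frame, f0, fs, max_voiced_freq=6000.0):
    """Return (harmonic_part, noise_part) for one voiced frame.

    The noise part is simply the residual, i.e. the input frame minus the
    reconstructed harmonic (purely periodic) signal.
    """
    frame = np.asarray(frame, dtype=float)
    t = np.arange(len(frame)) / fs                 # time axis in seconds
    num_harmonics = int(max_voiced_freq // f0)     # harmonics up to the maximum voiced frequency
    # Design matrix with a cosine/sine pair for every harmonic of F0.
    cols = []
    for k in range(1, num_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    A = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(A, frame, rcond=None)
    harmonic = A @ coeffs                          # periodic (harmonic) reconstruction
    noise = frame - harmonic                       # residual = noise part
    return harmonic, noise
```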
The basic idea of the HNM analysis is to decompose the input speech signal into a harmonic part, which is purely periodic, and a noise part. Several methods can be used to represent the noise part; in this work we use the residual, basically the input signal minus the harmonic part, as the noise. Some basic settings of the HNM analysis: we use a two-pitch-period Hamming window for each frame to chop up the input speech, and another important thing we need to define is the maximum voiced frequency of the HNM analysis, which we fix to 6 kHz. As mentioned before, the noise part is defined as the residual signal.

Here is an example: the same vowel pronounced by two different speakers. The red curve is the harmonic spectrum of a particular input frame, and the green curve is the spectrum of the noise part. For this frequency subband, as you can see, for the first speaker the energy ratio between the harmonic part and the noise part is almost one, which basically means the energy of the harmonic part is similar to the energy of the noise part. For the other speaker, you can see that more energy is assigned to the harmonic part than to the noise part. So we hope these characteristics can differentiate different speakers.

To turn this observation into a feature, two things need to be defined. The first is the bandwidth of each subband. In this case we define the bandwidth as the average of the minimum possible F0 and the maximum possible F0 for a given speaker; these two numbers are gender dependent, so we define one set of values for female speakers and another set of values for male speakers. The second is the center frequency of each subband. This is quite straightforward: for the HNM analysis we use integer multiples of F0, and the subbands together cover the whole frequency range. After the subbands are defined in frequency, we can calculate the subband energy of the harmonic part and the subband energy of the noise part, then calculate the energy ratio between the two and convert the value into dB. After this, for each frame you get a fixed-dimensional feature vector. The dimensionality is gender dependent: in our experiments we have a 33-dimensional feature for female speakers and a 45-dimensional feature for male speakers, because male speakers usually have a lower F0.

After the feature has been calculated, we need to model it. The first thing we want to check is whether the distribution of the SSER features is Gaussian, so that we can use a GMM to model it.
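The subband energy ratio itself can be sketched as follows, given the harmonic and noise signals of one voiced frame. The bandwidth definition (average of the minimum and maximum possible F0 for the speaker group) and the conversion to dB follow the description above; the placement of contiguous bands up to the maximum voiced frequency, the FFT-based energy computation, and the small flooring constant are assumptions made for illustration, since the talk does not spell out the exact band edges.

```python
# A hedged sketch of the SSER (spectral subband energy ratio) feature for one
# voiced frame, given its harmonic and noise (residual) signals from the HNM step.
import numpy as np

def sser_features(harmonic, noise, fs, f0_min, f0_max, max_voiced_freq=6000.0):
    bandwidth = 0.5 * (f0_min + f0_max)              # average of min / max possible F0 (gender dependent)
    n_bands = int(max_voiced_freq // bandwidth)       # fixed dimensionality per gender (e.g. 33 / 45)
    spec_h = np.abs(np.fft.rfft(harmonic)) ** 2       # power spectrum of the harmonic part
    spec_n = np.abs(np.fft.rfft(noise)) ** 2          # power spectrum of the noise part
    freqs = np.fft.rfftfreq(len(harmonic), d=1.0 / fs)

    ratios_db = []
    for k in range(n_bands):
        band = (freqs >= k * bandwidth) & (freqs < (k + 1) * bandwidth)
        e_h = spec_h[band].sum() + 1e-12              # subband energy of the harmonic part
        e_n = spec_n[band].sum() + 1e-12              # subband energy of the noise part
        ratios_db.append(10.0 * np.log10(e_h / e_n))  # energy ratio converted to dB
    return np.array(ratios_db)
```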
So we plotted the feature distributions to see whether we could model them, and it turns out that modeling this feature with a Gaussian distribution is quite reasonable; it looks roughly Gaussian. We therefore use the conventional GMM-UBM system to do speaker verification. We use a conventional MFCC feature as the baseline and implemented the SSER feature based system alongside it. We use Mandarin data from a commonly used database that is widely used in China for speech recognition, speech synthesis, and even speaker-related tasks. We measure the EER as the performance metric, and no score normalization was used. As for the statistics of the training and test corpora: we have a hundred and five speakers altogether, and we use a fixed amount of training speech and a fixed amount of test speech per speaker for the verification task; this is reading-style speech. Of the speakers, seventy-six are male and the remainder are female. We have the number of unique test sentences shown on the slide, and we arranged them into roughly six thousand male trials and seven thousand female trials.

Now let us look at the results. We can first look at the MFCC baseline: from the EER of the MFCC baseline you can observe that the female speakers seem to be a little more difficult to handle. Using the SSER features alone, the performance is actually worse than with the MFCC features; you can see those numbers. But if we combine the two systems, we get a reasonable performance improvement, especially for the female speakers. In fact, after combining the two systems, the performance for the female speakers actually becomes better than for the male speakers, so this is a quite interesting and surprisingly good improvement.

To conclude, this paper is actually quite straightforward. We proposed a new feature named SSER for speaker verification. It characterizes the interaction between vocal tract movements and the glottal airflow, and it seems to be quite able to capture speaker characteristics. This feature is complementary to MFCC, and it reduces the EER when fused with the MFCC baseline system. As future work we want to do more experiments to see whether it performs well, for example, in noisy environments and after post-processing techniques. Okay, thank you very much.
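For completeness, here is a hedged sketch of a conventional GMM-UBM back-end of the kind used here, with mean-only MAP adaptation and an average per-frame log-likelihood-ratio score. The number of mixture components, the relevance factor, and the use of scikit-learn are illustrative assumptions; the talk only states that a conventional GMM-UBM system is used with no score normalization.

```python
# A hedged sketch of a conventional GMM-UBM back-end. Hyper-parameters
# (number of mixtures, relevance factor) are illustrative, not values
# reported in the talk.
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_frames, n_components=512):
    """Train the universal background model on pooled background data."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(background_frames)
    return ubm

def map_adapt_means(ubm, enroll_frames, relevance=16.0):
    """Classic mean-only MAP adaptation of the UBM to one target speaker."""
    enroll_frames = np.asarray(enroll_frames, dtype=float)
    gamma = ubm.predict_proba(enroll_frames)               # frame-level posteriors
    n_k = gamma.sum(axis=0) + 1e-10                        # soft counts per mixture
    e_k = (gamma.T @ enroll_frames) / n_k[:, None]         # data means per mixture
    alpha = (n_k / (n_k + relevance))[:, None]             # adaptation coefficients
    spk = copy.deepcopy(ubm)
    spk.means_ = alpha * e_k + (1.0 - alpha) * ubm.means_  # interpolate with UBM means
    return spk

def verify(test_frames, spk_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM."""
    test_frames = np.asarray(test_frames, dtype=float)
    return float(np.mean(spk_gmm.score_samples(test_frames) - ubm.score_samples(test_frames)))
```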
Hopefully I can answer your questions, although I am not very familiar with the speaker verification task itself; that part of the work was basically done after the first author went back to school, and she implemented the GMM-UBM part of this work herself. So hopefully I can answer your questions with more focus on the HNM part, and hopefully I can still cover the details.

Question: So your feature only calculates the ratio of the harmonic and the noise parts?

Answer: Yes.

Question: So where does the noise come from?

Answer: The noise here is the noise of the model; it is different from additive noise or noise from the environment, as you would have in speech recognition. Basically, in the HNM analysis part, for each input speech frame you decompose the input speech signal into two different parts. The first part is called the harmonic part, which is purely periodic, and the remaining part, the residual, is defined as the noise. So this noise is different from the noise in, for example, speech recognition.

Question: So in your system you are using clean signals, recorded in clean conditions?

Answer: Yes.

Question: So you call noise everything that is not periodic?

Answer: Yes.

Question: So if there were some additive noise in the original signal, would this noise part be robust to it?

Answer: That is a problem we want to look into. If the input signal is noisy, there are several things that could be affected by the additive noise. The HNM analysis still relies heavily on an accurate estimate of F0, and if you have very strong noise, that part could be affected. Also, depending on the type of noise, it could affect the harmonic estimation and it could affect the noise estimation as well. But the current results are based on clean data, which is why we want to investigate this as future work.

Question: Okay. I would also like to ask what method is used to fuse the MFCC and SSER systems.

Answer: This is score fusion, because for the MFCC system you use all the input frames, while for the HNM-based SSER system the unvoiced frames are discarded. So each subsystem produces a frame-averaged log-likelihood score, and then we combine the two scores.

Question: And what are the fusion coefficients for the individual systems?

Answer: I cannot answer that; maybe we need to check with the first author. I do not know whether there is any weighting, or whether it is critical to tune those weights.

Question: Okay.
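Following the answer above, the fusion can be sketched as a simple linear combination of the two subsystems' frame-averaged scores; the equal weighting below is an assumption, since the talk does not state how (or whether) the fusion weight was tuned.

```python
# A hedged sketch of the score-level fusion described above: each subsystem
# (MFCC on all frames, SSER on voiced frames only) yields a frame-averaged
# log-likelihood-ratio score, and the two scores are combined linearly.
def fuse_scores(score_mfcc, score_sser, weight=0.5):
    """Linear score fusion; `weight` is an illustrative, untuned value."""
    return weight * score_mfcc + (1.0 - weight) * score_sser

# Example, reusing verify() from the GMM-UBM sketch above:
# final = fuse_scores(verify(mfcc_frames, mfcc_spk, mfcc_ubm),
#                     verify(sser_voiced_frames, sser_spk, sser_ubm))
```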
Question: One more question: do you do anything with the residual? Do you use it somehow as a feature?

Answer: No, we don't use it on its own; we only calculate the energy ratio between the harmonic part and the residual.

Question: Can I also ask a question? The fact that adding these features to MFCC improves the results, does it mean that the two sets of features are uncorrelated, so that they genuinely complement each other, or does it just work better for a subset of the speakers you tested? That would be another possible explanation.

Answer: I am not so sure about this part, and we didn't try to analyze it. For example, one thing is that we have discarded the unvoiced frames, so basically you cannot apply something like PCA to combine, say, MFCC together with the SSER feature, map them to a reduced dimension, and apply that to a new system. We haven't tried that, because we have difficulties handling the unvoiced frames: in conventional harmonic plus noise analysis, for unvoiced frames you do not have an estimate of the harmonic part, so basically we cannot calculate the subband energy ratio for the unvoiced segments.

[Inaudible audience remark.]

Okay, thanks. Let's thank the speaker again.