0:00:15 So, hi everyone. I'll present "Iterative Bayesian and MMSE-based noise compensation techniques for speaker recognition in the i-vector space". Let's start by setting up the problem. We are working on noise robustness: noise is one of the biggest problems in speaker recognition, and a lot of techniques have been proposed over the past years to deal with it in different domains, such as speech enhancement, feature compensation, model compensation, and robust scoring; and in recent years, DNN-based techniques for robust feature extraction, robust computation of statistics, or i-vector-like representations of speech.

What we are proposing here is a combination of two algorithms in order to clean up noisy i-vectors. We use a clean front end, that is, a system trained on clean data, and a clean back end, that is, a scoring model trained on clean data.

The first algorithm, I-MAP, was presented in previous work. It is an additive-noise model operating in the i-vector space, based on two hypotheses: the Gaussianity of the i-vector distribution and the Gaussianity of the noise distribution in the i-vector space. I am not saying that noise is additive in the i-vector space; we just use this model to represent the relationship between clean and noisy i-vectors, to be clear. Using the MAP criterion we derive this equation, and we end up with a model such that, given a noisy i-vector Y0, we can denoise it, clean it up, using the hyperparameters of the clean i-vector distribution and of the noise distribution.

In practice, this algorithm is implemented like this. Given a test segment, we start by checking its SNR level. If the segment is clean, we are okay; if it's not, we extract the noisy version of the i-vector, Y0, and then, using a voice activity detection system, we extract noise from the signal using the silence intervals, and we inject this noise into clean training utterances.
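Under the two Gaussian hypotheses, the MAP clean-up step described above can be sketched as follows. This is a minimal illustration of a MAP estimator for the additive model y = x + n with known Gaussian hyperparameters; the function and variable names are mine, not the paper's:

```python
import numpy as np

def imap_denoise(y, mu_x, cov_x, mu_n, cov_n):
    """MAP estimate of the clean i-vector x from a noisy observation
    y = x + n, where x ~ N(mu_x, cov_x) and n ~ N(mu_n, cov_n).
    Illustrative sketch only, not the paper's exact implementation."""
    prec_x = np.linalg.inv(cov_x)              # precision of the clean i-vector prior
    prec_n = np.linalg.inv(cov_n)              # precision of the noise model
    post_cov = np.linalg.inv(prec_x + prec_n)  # posterior covariance of x given y
    # Posterior mean: a precision-weighted blend of the prior mean and
    # the noise-shifted observation.
    return post_cov @ (prec_x @ mu_x + prec_n @ (y - mu_n))
```

With identity covariances this reduces to averaging the prior mean with the mean-corrected observation, which matches the Gaussian-posterior formula term by term.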
This way we have clean i-vectors and their noisy counterparts generated with the test noise, so we can build the noise model as a Gaussian distribution, and then use the previous equation to clean up the noisy i-vector.

The novelty of this paper is how we can improve I-MAP. The problem is that we cannot apply I-MAP successfully many times iteratively, because we cannot guarantee the Gaussianity hypothesis on the residual noise. The solution we came up with is to use a second algorithm and to iterate between the two algorithms in order to achieve better denoising of the i-vectors.

This second algorithm is called the Kabsch algorithm. It is used mainly in chemistry to align different molecules; here we apply it to i-vectors. Starting from noisy i-vectors, we want to estimate the best translation vector and rotation matrix that map them to their clean versions. Formally, the problem is known as the orthogonal Procrustes problem. It starts from two data matrices: the noisy i-vectors, represented as a matrix, and their clean versions. This way we can estimate the best rotation matrix that relates the two.

In training, since we are estimating a translation vector and a rotation matrix, we first get rid of the translation by centering the data: we compute the centroids of the clean data and of the noisy data, and then we center the clean and noisy i-vectors. Now we can compute the best rotation matrix between the noisy i-vectors and their clean versions using an SVD decomposition. Once we have the best translation and rotation for a given noise, at test time we extract the test i-vector and start by applying the translation, the minus here meaning that we subtract the centroid of the noisy i-vectors; then we apply the rotation, and then the other translation, to end up with its clean version.

We used NIST and Switchboard data for training.
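The centering-plus-SVD procedure above can be sketched as follows. This is a generic Kabsch/Procrustes step under the assumption of paired noisy/clean i-vectors; the function names are illustrative, not from the paper:

```python
import numpy as np

def kabsch_fit(noisy, clean):
    """Estimate the translation and rotation that map noisy i-vectors onto
    their clean counterparts (rows of `noisy` and `clean` are paired).
    Sketch of the Kabsch solution to the orthogonal Procrustes problem."""
    c_noisy, c_clean = noisy.mean(axis=0), clean.mean(axis=0)
    P, Q = noisy - c_noisy, clean - c_clean    # centering removes the translation
    U, _, Vt = np.linalg.svd(P.T @ Q)          # SVD of the cross-covariance matrix
    sign = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations
    D = np.eye(U.shape[0]); D[-1, -1] = sign
    R = Vt.T @ D @ U.T                         # optimal rotation matrix
    return R, c_noisy, c_clean

def kabsch_clean(y, R, c_noisy, c_clean):
    """Clean a test i-vector: subtract the noisy centroid, rotate,
    then translate onto the clean centroid."""
    return R @ (y - c_noisy) + c_clean
```

In the paper this alignment is estimated per noise and alternated with the I-MAP clean-up step across iterations.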
The test data is the NIST 2008 evaluation, condition 7. We are using 19 MFCC coefficients plus energy, plus their first and second derivatives, a 512-component GMM, and 400-dimensional i-vectors, with two-covariance scoring.

Here we apply each algorithm independently, and then the combination of the two. With the first algorithm, I-MAP, we achieve from 40 to 60 percent of equal error rate improvement for each noise; with the second algorithm we achieve up to 45 percent of equal error rate improvement; but when we combine the two, for one or two iterations, we end up with up to 85 percent of equal error rate improvement. Here I presented the results for male data; for female data the error rates are a little bit higher, but the approach is efficient for both.

Here we compare the two algorithms and their combination in a heterogeneous setup, where we use a lot of noisy and clean data for enrollment and test, with different SNR levels on enrollment and test, and we can see that the approach remains efficient in this context.

As a summary: using I-MAP or the Kabsch algorithm we can improve the equal error rate by 40 to 60 percent, but the interesting part is that combining the two achieves far better gains. Thank you.

Q: Is the rotation matrix estimated per noise, or is it noise-independent?

A: Yes, here we are estimating, for each different noise, a different translation and rotation matrix. We just wanted to show the efficiency of this technique; but in future work, in another paper that will be published at Interspeech, I guess, well, it is accepted, we propose another approach that does not assume a certain model of noise in the i-vector space, and that can be trained on many noises and used efficiently
on test data with different noises. So here the goal was just to show how far we can go in the best-case scenario, but in the other paper we show how we can extend this to handle many noises. And that was my presentation.

Q: If you go back many years, Lim and Oppenheim had a sequential MAP estimation that was used for speech enhancement; it iterated back and forth between noise suppression filters and speech parameterization. So you're iterating back and forth between two algorithms here, and you show results for one iteration, two iterations. Maybe two questions here: first, is there any way to come up with some form of convergence criterion that you can assess? And second, is there any way to look at the i-vectors as you go through the iterations, to see which i-vectors are actually changing the most? That might tell you a little bit more about which vectors are more sensitive to the type of noise.

A: So the first question was whether there is any way to define a convergence criterion, because when you iterate, you need to know whether you have converged. Well, here what we did was just to iterate many times and see at which level the results start getting worse; so it's not really a criterion, we haven't gone that far yet.

Q: If you look at the two noise types, I think you had fan noise and car noise, so both are low-frequency-type noises: can you see whether you have similar changes in the i-vectors for both noise types?

A: Maybe I can't comment on that, because I haven't done the full analysis, but what I can tell you for sure is that the efficiency depends on which noise you are dealing with. And it can be that this difference makes the approach more efficient when we have different noises between enrollment and test.

Q: Thank you for the nice presentation. A while ago I tried to read the original I-MAP
paper, so if you don't mind, I'll just ask a question about the original I-MAP, not the iterative one. Can you go back to the block diagram of the original I-MAP? So you're extracting noise from the signal, or somehow estimating the noise in the signal, and then you go up to noise at 0 dB, where speech and noise have similar or equal strength. Can you tell us how you extract noise from the signal at 0 dB?

A: Here we are using an energy-based voice activity detection system, but we make the threshold more strict in order to avoid ending up with speech confused as noise. So we didn't build a sophisticated voice activity detection system specifically for this task; we just avoid, as much as possible, ending up with speech, by using a very strict threshold on the energy.

Q: It's quite amazing, the level of improvement you gain here. It feels that you have a very good model of the noise, and if you have such a thing, then it would make sense to also compare with speech enhancement, I mean an MMSE-based approach like Wiener filtering. If you have a good model of the noise, then it would be good to also compare with feature enhancement or noise reduction. Just a comment.

A: Yes, okay.

Okay, there don't seem to be any more questions, so let's thank the speaker.