The title of my talk is "Bayesian Speaker Verification with Heavy-Tailed Priors". In a nutshell, it is about applying joint factor analysis with i-vectors as features, so I will be assuming that you have some familiarity with joint factor analysis, i-vectors and cosine distance scoring.

The key fact about i-vectors is that they provide a representation of speech segments of arbitrary duration by vectors of fixed dimension. These vectors seem to contain most of the information needed to distinguish between speakers, and as a bonus they are of relatively low dimension: typically four hundred, rather than the hundred thousand or so of GMM supervectors. This means that it is possible to apply modern Bayesian methods of pattern recognition to the speaker recognition problem. We have banished the time dimension altogether, and we are in a situation which is quite analogous to other pattern recognition problems.

I think I should explain at the outset what I mean by "Bayesian", because the term is open to several interpretations. In my mind the terms "Bayesian" and "probabilistic" are synonymous. The idea is, as far as possible, to do everything within the framework of the calculus of probability. It does not really matter whether you prefer to interpret probabilities in frequentist terms or in Bayesian terms: the rules of probability are the same — there are only two, the sum rule and the product rule, which I will write down in a moment — and they give you the same results in both cases. The advantage of this is that you have a logically coherent way of reasoning in the face of uncertainty. The disadvantage is that in practice you usually run into a computational brick wall in pretty short order if you try to follow these rules consistently. So it is really only in the past ten years that the field of Bayesian pattern recognition has taken off, and that is thanks to the introduction of fast approximate methods of Bayesian inference, in particular variational Bayes, which make it possible to treat probabilistic models which are far more sophisticated than was possible with traditional statistics.

The unifying theme of my talk will be the application of variational Bayes methods to the speaker recognition problem. I start out with the traditional assumptions of joint factor analysis, namely that speaker and channel effects are statistically independent and Gaussian distributed. In the first part of my talk I will simply aim to show how joint factor analysis can be done under these assumptions, using i-vectors as features, in a Bayesian way. This already works very well: in my experience it gives better results than traditional joint factor analysis. The second part of my talk will be concerned with how variational Bayes can be used to model non-Gaussian behaviour in the data. I have found that this leads to a substantial improvement in performance, and as an added bonus it seems to be possible to do away with the need for score normalisation altogether. The final part of my talk is speculative: it is concerned with the problem of how to integrate the assumptions of joint factor analysis and cosine distance scoring into a coherent framework. On the face of it this looks like a hopeless exercise, because the assumptions appear to be completely incompatible.
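For the record, here are the two rules I referred to above, written in standard notation (nothing here is specific to these slides):

$$ p(x) \;=\; \sum_{y} p(x, y) \quad \text{(sum rule)}, \qquad p(x, y) \;=\; p(y \mid x)\, p(x) \quad \text{(product rule)}. $$

Bayes' rule, $p(x \mid y) = p(y \mid x)\,p(x)/p(y)$, is an immediate consequence of the two.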
However, it is possible to do something about this thanks to the flexibility provided by variational Bayes, and even though this part of the talk is speculative, I think it is worth talking about because it is a real object lesson in how powerful these Bayesian methods are, at least potentially.

Before getting down to business, let me just say something about the way I have organised this presentation. In preparing the slides I tried to ensure that they were reasonably complete and self-contained; the idea I had in mind is that if anyone is interested in reading through the slides afterwards, they should tell a fairly complete story. But because of time constraints I am going to have to gloss over some points in the presentation, and for the same reason there is going to be some hand-waving in the slides as well. I found that by focusing on the Gaussian and statistical independence assumptions I could explain the variational Bayes ideas with a minimal amount of technicalities, so I will spend almost half my time on the first part of the talk. The last part of the talk, on the other hand, is technical: it is addressed primarily to members of the audience who have read, say, the chapter on variational Bayes in Bishop's book.

So here are the basic assumptions of factor analysis with i-vectors as features. We use D for data, s for speaker, and c (or r) for channel or recording. We have a collection of recordings per speaker, and we assume that each i-vector can be decomposed into two statistically independent parts: a speaker part and a channel part. These assumptions are questionable, but I am going to stick with them for the first part of the talk.

This model, in which the hidden supervector of classical joint factor analysis is replaced by an observable i-vector, already has a name: it is known in face recognition as probabilistic linear discriminant analysis (PLDA). The full-rank version is sometimes referred to as the two-covariance model, but the formulation I will follow is the one you will find in the face recognition literature. It is not perhaps quite as straightforward as it appears, because if you are dealing with high-dimensional features — for example MLLR features — you cannot treat the covariance matrices as being of full rank, and you need a hidden-variable representation of the model which is exactly analogous to the hidden-variable description of joint factor analysis.

So here, on the left-hand side, D is an observable i-vector, not a hidden supervector. It turns out to be convenient for the heavy-tailed material to refer to the eigenvoice matrix and the eigenchannel matrix using subscripts, U1 and U2, rather than the traditional names, and the same goes for the hidden variables: the speaker factors are labelled x1, and the channel factors I label x2r, where r indicates the dependence on the recording, that is, on the channel. There is one difference here from the conventional formulation of joint factor analysis: in PLDA the residual term epsilon, which is generally modelled by a diagonal covariance or precision matrix, is traditionally associated with the channel rather than with the speaker. In JFA I formulated it slightly differently, but I am just going to follow the PLDA convention in this presentation. Because the residual epsilon is associated with the channel, there are two noise terms: the contribution of the eigenchannels to the channel variability, and the contribution of the residual.
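As a sketch, using the notation just introduced, the hidden-variable form of the model for recording r of a given speaker is

$$ D_r \;=\; m + U_1\, x_1 + U_2\, x_{2r} + \epsilon_r, \qquad x_1 \sim \mathcal{N}(0, I), \quad x_{2r} \sim \mathcal{N}(0, I), \quad \epsilon_r \sim \mathcal{N}\!\left(0, \Lambda^{-1}\right), $$

where U1 is the eigenvoice matrix, U2 the eigenchannel matrix and Λ the (diagonal) residual precision. The offset m is not spelled out in the talk but is part of the standard PLDA formulation; the standard normal priors are the ones mentioned in a moment.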
The precision matrix — that is to say, the inverse of the covariance matrix — of the residual is diagonal. Because of the statistical independence assumptions, this is the graphical model that goes with the formulation. If you are not familiar with these, let me take a minute to explain how to read the diagrams. A shaded node indicates an observable variable, the blank nodes indicate hidden variables, the dots indicate model parameters, and the arrows indicate conditional dependencies. So the i-vector is assumed to depend on the speaker factors, the channel factors and the residual. The plate notation indicates that something is replicated several times: there are several sets of channel factors, one for each recording, but there is only one set of speaker factors, so that sits outside the plate. Here I have specified the parameter lambda, but I did not bother specifying the distribution of the speaker factors because it is understood to be standard normal.

As I mentioned, including the channel factors enables this decomposition, but it is not always necessary. If you have i-vectors of dimension four hundred, it is actually possible to model full-rank covariance — or rather full precision — matrices instead of diagonal ones, and in that case this term does not actually contribute anything. I have found it useful in experimental work to use this term to estimate eigenchannels on microphone data, so it is worth keeping. In fact it turns out that the channel factors can always be eliminated at recognition time; that is a technical point and I will come back to it later if time permits.

So how do you do speaker recognition with the PLDA model? I am going to make some provisional assumptions here. One is that you have already succeeded in estimating the model parameters — the eigenvoices, the eigenchannels and so on — and the other is that you know how to evaluate the thing known as the evidence integral: you have a collection of i-vectors associated with a speaker, you also have a collection of hidden variables, and to evaluate the marginal likelihood you have to integrate over the hidden variables. So assume that we have tackled these two problems. It turns out that the key to solving both of them is to evaluate the posterior distribution of the hidden variables, and I will return to that in a minute, but first I just want to show you how to do speaker recognition.

Take the simplest case, the core condition in the NIST evaluations: one recording is usually designated as test, the other as train, and you are interested in the question of whether the two speakers are the same or different. If the two speakers are the same — I think it is natural to call that the alternative hypothesis, although there does not seem to be universal agreement about that — then the likelihood of the data is calculated on the assumption that there is a common set of speaker factors but different channel factors for the two recordings. If, on the other hand, the two speakers are different, then the calculation of the two likelihoods can be done independently, because the speaker factors and the channel factors are untied for the two recordings. The point is that everything here is an evidence integral: if you can evaluate the evidence integral, you are in business. A few things to note: unlike traditional likelihood ratios, this is symmetric in D1 and D2, and it also has an unusual denominator.
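Written out — a sketch, with P(·) denoting the evidence integral over the hidden variables as just described — the verification score for the core condition is

$$ \mathrm{LR}(D_1, D_2) \;=\; \frac{P(D_1, D_2 \mid \text{same speaker})}{P(D_1)\, P(D_2)}, \qquad P(D) \;=\; \int P(D \mid x_1, x_2)\, P(x_1)\, P(x_2)\; dx_1\, dx_2, $$

where the numerator is the evidence integral computed with a single, shared set of speaker factors for the two recordings but separate channel factors, and the denominator ties nothing together.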
You do not see anything like this denominator in joint factor analysis; it is something that comes out of following the Bayesian party line, and — as we will see later — it is potentially an effective method of score normalisation. The other point I would like to stress is that you can write down the likelihood ratio for any type of speaker recognition problem in the same way. For instance, you might have eight conversations in training and one conversation in test, or three conversations in train and two in test; in all cases it is just a matter of following the rules of probability consistently, and you can write down the likelihood ratio — or Bayes factor, as it is usually called in this field.

The evidence integral can be evaluated exactly under Gaussian assumptions, but it is rather cumbersome, and if you relax the Gaussian assumptions you cannot do it at all. I believe that even in the Gaussian case you are better off using variational Bayes; not everyone agrees with me on that, but I decided to let it stand, and we can go into it later if there is time.

The key insight here is this inequality: you can always find a lower bound on the evidence using any distribution Q on the hidden variables. I grant you it is not obvious just by looking at it, but the derivation turns out to be just a consequence of the fact that Kullback-Leibler divergences are non-negative. What I will be focusing on is the use of the variational Bayes method to find a principled approximation to the true posterior.

Let me digress a minute to explain why posteriors are a problem. There is nothing mysterious about the posterior distribution: you just apply Bayes' rule and this is what you get. You can read off this term here from the graphical model, this is the prior, and this is the evidence — it is all perfectly straightforward. The only problem in practice is that you cannot evaluate it exactly: evaluating the evidence and evaluating the posterior are two sides of the same problem, and you cannot do it just by numerical integration because these integrals are in hundreds of dimensions. Another way of stating the difficulty, which I think is a useful way of thinking about it, is that whatever factorisations you have in the prior get destroyed when you multiply by the likelihood. Factorisations in the prior are statistical independence assumptions, and those statistical independence assumptions get destroyed in the posterior. It is easy to see why this is the case in terms of the graphical model, but as I said I am going to have to gloss over a few things; let us return to variational Bayes.

The idea in the variational Bayes approximation is that you acknowledge that independence has been destroyed in the posterior, but you go ahead and impose it anyway: you look for what is called a variational approximation to the posterior — variational because it is actually free-form, as in the calculus of variations; you do not impose any restriction on the functional form of Q. There is a standard set of coupled update formulas that you can apply here — coupled because this expectation is calculated with the posterior on x2 and this expectation is calculated with the posterior on x1, so you have to iterate between the two. The nice thing is that this iteration comes with EM-like convergence guarantees, and it avoids altogether the need to invert large sparse block matrices — which is the only way to evaluate the evidence exactly, and then only in the Gaussian case.
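In symbols — a schematic of the standard mean-field recipe being described, not anything taken from the slides — with a factorised approximation $Q(x_1, x_2) = Q(x_1)\,Q(x_2)$ the lower bound and the coupled updates are

$$ \ln P(D) \;\ge\; \mathbb{E}_{Q}\!\left[\ln P(D, x_1, x_2)\right] - \mathbb{E}_{Q}\!\left[\ln Q(x_1, x_2)\right] \;=\; \ln P(D) - \mathrm{KL}\!\left(Q \,\|\, P(x_1, x_2 \mid D)\right), $$

$$ \ln Q(x_1) = \mathbb{E}_{Q(x_2)}\!\left[\ln P(D, x_1, x_2)\right] + \text{const}, \qquad \ln Q(x_2) = \mathbb{E}_{Q(x_1)}\!\left[\ln P(D, x_1, x_2)\right] + \text{const}. $$

The gap is exactly the KL divergence, which is why the bound is tight when Q is the true posterior, and why alternating the two updates can only increase the bound.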
This posterior distribution — or rather its variational approximation — is also the key to estimating the model parameters. You use the lower bound as a proxy for the log evidence, and you seek to optimise the lower bound calculated over a collection of training speakers. Here I have just taken the definition and rewritten it this way; it is convenient to do so because this term here does not involve the model parameters at all. So the first approach to the problem would be simply to optimise this term — the contribution of a given speaker to the evidence criterion — summed over all speakers. When you work it out, it turns out to be formally identical to probabilistic principal components analysis; it is just a least-squares problem. In fact it is the EM auxiliary function for probabilistic principal components analysis; the only difference is that you have to use the variational posterior rather than the exact one.

There is another method of estimation, which I call minimum divergence estimation. There is some potential for confusion here, so let me try to explain it briefly. Concentrate on this term here, which is independent of the model parameters. You can make changes of variables which minimise these divergences but are constrained in such a way as to preserve the value of the auxiliary function, and if you minimise the divergences while keeping that fixed, you will increase the value of the evidence criterion. The way this works, say in the case of the speaker factors, is that to minimise the divergence you look for an affine transformation of the speaker factors such that their first- and second-order moments, averaged over the speakers in the training set, agree with the first- and second-order moments of the prior. That is just a matter of finding an affine transformation that satisfies this condition. You then apply the inverse transformation to the model parameters in such a way as to keep the value of the auxiliary function fixed. It turns out that if you interleave these two kinds of update you can substantially accelerate convergence; a minimal sketch of the speaker-factor case follows below.

Just one comment about what I have set out to do here, which is to produce point estimates of the eigenvoice matrix and the eigenchannel matrix. If you are a really hardcore Bayesian, you do not allow point estimates into your model: you have to do everything in terms of prior and posterior probabilities. A true-blue Bayesian approach would put a prior on the eigenvoices and calculate the posterior, again by variational Bayes; even the number of speaker factors could be treated as a hidden random variable, and its posterior distribution could be calculated, again by variational Bayes. There is an extensive literature on this subject. I would say that if there is one problem with variational Bayes, it is that it provides too much flexibility: you have to exercise good judgement as to which things you should try and which are probably not going to help — in other words, do not lose sight of your engineering objective. The particular thing I chose to focus on was the Gaussian assumption.
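Here is a minimal sketch of the minimum divergence step for the speaker factors, as just described. This is my own illustration, not code from the talk: the function name and the choice of a Cholesky factor for the affine transform are assumptions, but the idea — match the average posterior moments to the standard normal prior, then push the inverse transform into the model parameters so the auxiliary function is unchanged — is the one on the slide.

```python
import numpy as np

def minimum_divergence_update(U1, m, post_means, post_covs):
    """One minimum-divergence update for the speaker factors (sketch).

    post_means[s], post_covs[s] are the variational posterior mean and
    covariance of x1 for training speaker s.
    """
    mu = np.mean(post_means, axis=0)  # average first-order moment
    # average second-order moment about the average mean
    S = np.mean([C + np.outer(x - mu, x - mu)
                 for x, C in zip(post_means, post_covs)], axis=0)
    T = np.linalg.cholesky(S)         # x1 -> T^{-1}(x1 - mu) standardises the moments
    # apply the inverse map to the parameters so U1 @ x1 + m is unchanged:
    # U1 @ x1 + m = (U1 @ T) @ y + (m + U1 @ mu)  with  y = T^{-1}(x1 - mu)
    return U1 @ T, m + U1 @ mu
```

Interleaving this with the PPCA-style update of the eigenvoice matrix is what gives the acceleration mentioned above.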
As far as I can see, the Gaussian assumption is just not realistic for the kind of data we are dealing with, and what I set out to do, using variational Bayes, was to replace the Gaussian assumption — with its famously thin, exponentially decaying tails — by a power-law distribution which allows for outliers in the data: exceptional speaker effects, severe channel distortions.

The term "black swan" is amusing. The Romans had a phrase, "a rare bird, much like a black swan", intended to convey the notion of something impossible or inconceivable, and they were in no position to know that black swans actually do exist, in Australia. A financial forecaster by the name of Taleb wrote, a few years ago, a polemic against the Gaussian distribution called The Black Swan. It actually came out before the crash of 2008 — which of course is the mother of all black swans — and as a result it made quite a big media splash.

It turns out that the textbook definition of the Student's t distribution — the one which I am going to use in place of the Gaussian — is not workable with variational Bayes. There is, however, a construction that represents the Student's t distribution as a continuous mixture of normal random variables. It is based on the gamma distribution, a unimodal distribution on the positive reals which has two parameters that enable you to adjust the mean and the variance independently of each other. The way it works is this: to sample from a Student's t distribution, you start with a Gaussian distribution with precision matrix lambda, you scale the covariance matrix by a random scale factor drawn from the gamma distribution, and then you sample from the normal distribution with the modified covariance matrix. It is that random scale factor that introduces the heavy-tailed behaviour. The parameters of the gamma distribution determine the extent to which the result is heavy-tailed: you have the Gaussian at one extreme, and at the other extreme something called the Cauchy distribution, which is so heavy-tailed that the variance is infinite. The term "degrees of freedom" comes from classical statistics, but it does not have any particular meaning in this context.

So, for example, suppose you want to make the channel factors heavy-tailed in order to model outlying channel distortions. All you have to do is this: remember that there is one set of channel factors for each recording, so this is inside the plate; you associate a random scale factor with that hidden variable, and that random scale factor is sampled from a gamma distribution with, call it, n2 degrees of freedom. Heavy-tailed PLDA does this for all of the hidden variables in the Gaussian PLDA model: the speaker factors have an associated random scale factor, the channel factors have an associated random scale factor, and the residual has an associated random scale factor. So in fact all I have added are three extra parameters — three extra degrees of freedom — in order to model the heavy-tailed behaviour.

There are some technical points about how you can carry variational Bayes over from the Gaussian case to the heavy-tailed case and do so in a computationally efficient way; I refer you to the paper for these. The key point I would like to draw your attention to is that these numbers of degrees of freedom can actually be estimated using the same evidence criterion as the eigenvoices and the eigenchannels.
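As a concrete illustration of the scale-mixture construction just described — my own sketch, with the shape/rate convention chosen so that the scale factor has mean one; nothing here is taken from the slides:

```python
import numpy as np

def sample_student_t(mean, cov, nu, rng):
    """Draw one sample from a multivariate Student's t via the Gaussian scale mixture.

    nu is the number of degrees of freedom: large nu recovers the Gaussian,
    nu = 1 gives the Cauchy distribution (infinite variance).
    """
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)   # random scale factor, E[u] = 1
    return rng.multivariate_normal(mean, cov / u)   # small draws of u produce the heavy tails

rng = np.random.default_rng(0)
x = sample_student_t(np.zeros(2), np.eye(2), nu=3.0, rng=rng)
```

In heavy-tailed PLDA a separate scale factor of this kind is attached to the speaker factors, the channel factors and the residual, which is where the three extra degrees of freedom come from.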
Here are some results: a comparison of Gaussian and heavy-tailed PLDA on several conditions of the NIST 2008 evaluation, in terms of equal error rate and the 2008 detection cost function. It is clear that in all three conditions there is a very dramatic reduction in errors, both at the DCF operating point and at the EER, and this was done without score normalisation. If you do apply score normalisation, what happens is this: you get a uniform improvement in all cases with Gaussian PLDA, and a uniform degradation with the Student's t distribution. So with the Student's t, not only does score normalisation not help you, it is a nuisance.

Let me say a word about score normalisation. It is usually needed in order to set the decision threshold in speaker verification in a trial-dependent way. It is typically computationally expensive, and it complicates life if you ever have to do cross-gender trials. On the other hand, if you have a good generative model for speech — in other words, if you insist on the probabilistic way of thinking — there is no room for score normalisation. (There may still be a need for calibration, but we are not there yet.) In practice it is needed because of outlying recordings, which tend to produce exceptionally low scores for all of the trials in which they are involved, and what the Student's t distribution appears to be doing is this: the extra hidden variables, the scale factors I introduced, appear to be capable of modelling this outlier behaviour adequately, thus doing away with the need for score normalisation.

I should also say a word about microphone speech. The situation with telephone speech seems to be quite clear: Gaussian PLDA with score normalisation gives results which are comparable to cosine distance scoring; you get better results with heavy-tailed PLDA, at least on the 2008 data, and in general these are about twenty-five percent better than traditional joint factor analysis. But it turns out to break down, and in an interesting way, on microphone speech. Najim described yesterday an i-vector extractor of dimension six hundred which could be used for recognition with both microphone and telephone speech. We started out by training a model using only telephone speech — speaker factors, with the residual modelled by a full precision matrix — then we augmented that with eigenchannels, and everything was treated in the heavy-tailed way. What happened, unfortunately, is that we ran straight into the Cauchy distribution for the microphone transducer effects. What that means is that the variance of the channel effects for microphone data is infinite, and it is a short step to realise that if you have infinite variance for channel effects you are not going to be able to do speaker recognition. I have not been able to fix this; at present the best strategy would seem to be to project away the troublesome dimensions using some type of LDA, which I believe is the strategy we will hear about in the next presentation.

Now I come to the third part of my talk, which concerns the question of how it might be possible to integrate joint factor analysis, or PLDA, and cosine distance scoring — or something resembling it — in a coherent probabilistic framework. If you have not seen these types of scatter plots before, they are very interesting: each colour here represents a speaker, and each point represents an utterance of speech.
This is a plot of supervectors projected onto what are essentially the first two i-vector components, and you can see what is going on: this is the real motivation for cosine distance scoring. Cosine distance scoring ignores the magnitude of the vectors and uses only the angle between them as the similarity measure, and this is completely inconsistent with the assumptions of joint factor analysis, because there seems to be, for each speaker, a principal axis of variability that passes through the speaker's mean: the session variability for a speaker is augmented in a particular direction, the direction of the mean vector. JFA, or PLDA, assumes instead that session variability is modelled in the same way for all speakers — that is the statistical independence assumption in JFA. I should add that you have to be careful in interpreting these plots: it may be an artifact of the way the supervectors are estimated, and so on. We do find this behaviour with i-vectors, but we had to cherry-pick the results in order to get pictures like the one I showed you. Still, the principal evidence for this type of behaviour, which I call directional scattering, is the effectiveness of the cosine distance measure in speaker recognition. I do not know how to account for it, and I am not concerned with that question; the only question I would like to answer is how to model this type of behaviour probabilistically.

As I said, this part is going to get a bit technical; it is addressed to people who have read the chapter on variational Bayes in Bishop's book. In order to get a handle on this problem there seems to be a natural strategy: instead of representing each speaker by a single point x1 in the speaker factor space, represent each speaker by a distribution, specified by a mean vector mu and a precision matrix lambda. The i-vectors are then generated by sampling "speaker factors" from this distribution. I put this in inverted commas because the speaker factors now vary from one recording to another, just as the channel factors do, but the mechanism by which this comes about is quite different from the channel mechanism. The trick is to choose a prior on the mean and the precision matrix of each speaker in which mu and lambda are not statistically independent, because what you want is a precision matrix for each speaker which varies with the location of the speaker's mean vector. And of course, once you set this up, you are immediately going to run into problems: you do not have a hope of doing point estimation of the precision matrix if you only have one or two observations of the speaker. You have to follow the rules of probability consistently and integrate over the prior, and the way to do that, of course, is with variational Bayes.

So here is how the model is constructed. There seems to be only one natural prior on precision matrices, namely the Wishart prior. I will not talk about it in detail; I just put it down so that, if you are interested, you will be able to recognise that it is just a generalisation of the gamma distribution: if you take the dimension equal to one it reduces to the gamma distribution, and in higher dimensions it is concentrated on positive definite matrices. There is a parameter, again called the number of degrees of freedom, that determines how peaked the distribution is. Also, a point I think is worth mentioning: there is no loss of generality in assuming that W, the scale matrix here, is equal to the identity.
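For reference — the standard textbook form, which is what I take the slide to be showing — the Wishart density on d-by-d positive definite matrices is

$$ \mathcal{W}(\Lambda \mid W, \nu) \;\propto\; |\Lambda|^{(\nu - d - 1)/2} \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left(W^{-1}\Lambda\right)\right), $$

with scale matrix W and ν degrees of freedom; for d = 1 and scalar W this is just a gamma density in Λ.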
The reason this is worth mentioning is that it turns out to correspond exactly to something that Najim does in his processing: if you are familiar with his work, you know that he estimates a WCCN matrix in the speaker factor space and then whitens the data with that matrix before evaluating the cosine distance.

So, first we have generated the precision matrix for the speaker; the next step is to generate the mean vector for the speaker, and you do that using a Student's t distribution. Once you have a precision matrix, that is all you need: if you just add in the gamma distribution, you can sample the mean vector according to a Student's t distribution. It is explained in the paper why you need to use the Student's t distribution here. The point I would like to draw your attention to at this stage is that because the distribution of mu depends on lambda, the conditional distribution of lambda depends on mu, and that means that the precision matrix for a speaker depends on the location of the speaker in the speaker factor space — so you have some hope of modelling this directional scattering.

Let me skip that and go to the graphical model. I think it is clear from this — remember, when you are confronted with something like this, that everything inside the plate is replicated for each of the recordings of a speaker, and everything outside the plate is done once per speaker. So the first step is to generate the precision matrix; you then generate the mean for the speaker by sampling from a Student's t distribution — I have called the hidden scale factor w, and the parameters of its gamma distribution alpha and beta. Once you have the mean and the precision matrix, you generate the speaker factors for each recording — remember, we are making the speaker factors depend on the recording — by sampling from another Student's t distribution. The interesting thing is that these parameters, alpha and beta and the degrees of freedom, determine whether or not the whole business is going to exhibit directional scattering.

This cannot be explained without a little calculation. Remember that lambda is the precision matrix, so lambda inverse is the covariance matrix. What I am comparing here is the distribution of the covariance matrix given the speaker-dependent parameters, and the prior distribution of the covariance matrix. What you get is a weighted average of the prior expectation and another term. Now, this second term involves the speaker's mean: it is a rank-one covariance matrix, so the only extra variability that is allowed is in the direction of the mean vector. This is just the type of behaviour that the doctor ordered for directional scattering; I will write it out schematically below. I would also draw your attention to the fact that this term is multiplied by a factor that depends on the number of degrees of freedom and on the random scale factor. So the extent of directional scattering is going to depend on the behaviour of that factor — it depends, in fact, on the parameters which govern the distribution of the random scale factor w. If w has a large mean and a small variance, this term will favour variability in the direction of the mean vector, so in that case directional scattering will be present, to a large extent, for most speakers in the data. On the other hand, there is another limiting case where you can show that the whole thing reduces to heavy-tailed PLDA and there is no directional scattering at all. So the key question is to see how this model trains, and to be frank that is going to take a couple of months; I do not have any results to report yet.
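Schematically, the comparison just described comes out like this. I am hedging on the constants, which depend on the Wishart conventions; this follows from standard conjugacy and is my reconstruction rather than the slide itself:

$$ \mathbb{E}\!\left[\Lambda^{-1}\right] \;\propto\; I, \qquad \mathbb{E}\!\left[\Lambda^{-1} \mid \mu,\, w\right] \;\propto\; I \;+\; w\,\mu\,\mu^{\top}, $$

so conditioning on the speaker's mean adds a rank-one term — extra variability only along the direction of mu — whose weight is governed by the scale factor w and, through the omitted constants, by the degrees of freedom.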
So, in conclusion. PLDA is an effective model for speaker recognition, and it is just joint factor analysis with i-vectors as features. My experience has been that it works better than traditional joint factor analysis, even though the basic assumptions are open to question. Variational Bayes allows you to go a long way in relaxing those assumptions: you can model outliers by adding hidden scale factors, and you can model directional scattering by adding speaker-dependent precision matrices. The derivation of the variational Bayes update formulas is mechanical — I am not saying it is always easy, but it is mechanical — and it comes with convergence guarantees, so you have some hope of debugging your implementation. One caveat is that in practice you have to stay inside the exponential family; conjugate priors, in other words. I am also personally of the opinion that in order to get the full benefit of these methods we need to work with informative priors, that is to say, prior distributions on the hidden variables whose parameters can be estimated — and I use the word "estimated" advisedly, because it really is appropriate here — from larger training sets. The example is that the hidden variables I have just described are all controlled by a handful of scalar degrees of freedom, and these can all be estimated, using the evidence criterion, from training data.

To sum up: the advantage of probabilistic methods is that you have a logically coherent way of reasoning in the face of uncertainty; the disadvantage is that it takes time and effort to master the techniques and to program them. If your principal concern is to get a good system up and running quickly, I would recommend something like cosine distance scoring. On the other hand, if you are interested in mastering this family of methods, I think there are really only three things you need to look at. There is the original paper by Prince and Elder on probabilistic linear discriminant analysis for face recognition — that is the Gaussian case. Everything you need to know about probability is in Bishop's book, which I highly recommend: it is very well written and it starts from first principles. And there is this paper — I do not believe it actually found its way into the proceedings, but it is available online. Thank you very much.

Thank you for the presentation, in which you encouraged us, as you said, to take the quick solution if we want something fast and the more principled solution otherwise. But one thing I cannot help noticing is that your algorithm is based on a point estimate: you have a speech utterance, you use your factor analysis to summarise it as an i-vector, and you completely ignore the uncertainty in that process. If you were being principled, you should keep track of that uncertainty. How do you justify that?

It is an entirely empirical decision, based on the effectiveness of cosine distance scoring — it just works really well. Attempts to somehow incorporate the uncertainty in the i-vector estimation procedure do not seem to help; they complicate life. It is really empirical: it is dictated by what works rather than by intuition.

One question regarding the results you presented:
one of the categories is the ten-second conversation sides. When you trained your i-vector extractor and your heavy-tailed setup, did you include the ten-second data, and did you use score normalisation for that condition?

Well, the best results were obtained without score normalisation, so there was no question of introducing a cohort. Your question is perhaps whether, in the Gaussian case, we should have used the ten-second data when estimating the distributions. My experience has been — and it is not black or white — that it is better not to use the ten-second data in training. An interesting aspect of i-vectors is that they perform very well on the ten-second conditions; in other words, the estimation of i-vectors is much less sensitive to short durations than, say, relevance MAP.

There was another question, about the assumption that the data in the scatter plots on the last slides exhibits a particular parametric, Gaussian-like behaviour, when one could instead model it nonparametrically. I think I was careful to use Student's t distributions everywhere, and that is what gives me the flexibility to model outliers and directional scattering — does that answer your question? As for the parametric form: variational Bayes does require it, and in fact there is the extra restriction that you have to stay inside the exponential family; the art consists in achieving what you want to do subject to those constraints. Is that an adequate response?

Finally there was a question about the degrees-of-freedom parameters and how they were tuned. In fact we used the evidence criterion — exactly the same criterion for estimating the numbers of degrees of freedom as for estimating the eigenvoices and the eigenchannels — so it is completely consistent; there was no manual tuning. Thank you.