0:00:15 | Given an i-vector w, the model assumes that it can be decomposed into a speaker part and a residual part: a mean μ, plus a matrix V, whose columns contain the basis of the eigenvoice subspace, multiplied by a latent variable y, which is the speaker factor, normally distributed, plus a Gaussian residual ε. So the decomposition we use is w = μ + V y + ε, given in equation 2.
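As an illustrative sketch of this generative model (our addition, not code from the talk; dimensions and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 400, 150                       # i-vector dimension and subspace rank (hypothetical)
mu = rng.normal(size=p)               # global mean
V = rng.normal(size=(p, r))           # eigenvoice matrix: columns span the speaker subspace
Sigma = np.eye(p)                     # residual covariance (full in general)

y = rng.normal(size=r)                # speaker factor, y ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(p), Sigma)   # residual, eps ~ N(0, Sigma)
w = mu + V @ y + eps                  # synthetic i-vector following w = mu + V y + eps
```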
0:00:50 | This is the most commonly used PLDA system for i-vectors, in which the channel effect is kept in the residual. The decision score, proposed by Prince, is a log-likelihood ratio, in which we can see that computing the score depends only on the mean μ, on the matrix V V^T of the speaker factor, and on V V^T + Σ, which corresponds to the total variability.
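A minimal NumPy sketch of this two-covariance log-likelihood-ratio scoring (our illustration, assuming the parameters mu, V and Sigma are already estimated):

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(w1, w2, mu, V, Sigma):
    """G-PLDA verification score for two i-vectors (sketch).

    Under the same-speaker hypothesis the pair shares one speaker factor,
    so the stacked vector is Gaussian with cross-covariance V V^T.
    """
    B = V @ V.T                        # between-speaker covariance V V^T
    T = B + Sigma                      # total covariance V V^T + Sigma
    p = len(mu)
    stacked = np.concatenate([w1 - mu, w2 - mu])
    cov_same = np.block([[T, B], [B, T]])
    cov_diff = np.block([[T, np.zeros((p, p))], [np.zeros((p, p)), T]])
    log_same = multivariate_normal.logpdf(stacked, mean=np.zeros(2 * p), cov=cov_same)
    log_diff = multivariate_normal.logpdf(stacked, mean=np.zeros(2 * p), cov=cov_diff)
    return log_same - log_diff
```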
0:01:30 | Gaussian PLDA modeling can provide good performance, but it has been shown that the best performances are achieved only if a conditioning procedure follows the extraction of the i-vectors. This conditioning is most commonly summarized by whitening: whitening is a standardization followed by length normalization.
0:01:58 | The variability matrix chosen for the standardization can be the total covariance matrix or the within-speaker covariance matrix, and eventually we iterate this process. The parameters are computed from the i-vectors present in the training corpus and applied to the test i-vectors.
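A minimal sketch of this conditioning, assuming standardization with the total covariance matrix (the within-class variant and the iteration count follow the same pattern):

```python
import numpy as np

def whiten_and_length_norm(train, test, n_iter=1):
    """Standardize with the training total covariance, then length-normalize.

    `train` and `test` are (n, p) arrays of i-vectors; the whitening
    parameters are estimated on the training set only and applied to both.
    """
    for _ in range(n_iter):                      # optionally iterate the process
        mu = train.mean(axis=0)
        cov = np.cov(train - mu, rowvar=False)   # total covariance matrix
        # inverse square root of the covariance via eigendecomposition
        vals, vecs = np.linalg.eigh(cov)
        inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
        train = (train - mu) @ inv_sqrt
        test = (test - mu) @ inv_sqrt
        train /= np.linalg.norm(train, axis=1, keepdims=True)  # length normalization
        test /= np.linalg.norm(test, axis=1, keepdims=True)
    return train, test
```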
0:02:22 | The assumptions of the Gaussian PLDA are: the Gaussianity; the linearity of the eigenvoices, which means that the speaker part is constrained to lie in a linear subspace; and the homoscedasticity of the residual, which means that the model assumes that the speaker classes share the same statistics, so that channel effects can be modeled in a speaker-independent way and the distributions share a common covariance matrix.
0:02:58 | There is also the independence between the residual and the speaker factor, and the equality of covariance: the residuals between the actual mean of each class and the model position computed by the PLDA are assumed to be uncorrelated with the class, normally distributed, and explained by a simple random sampling of the development corpus, so that they do not vary with the effects being modeled.
0:03:38 | On the left, the graph shows the simplest condition of the PLDA model: a one-dimensional speaker factor and a one-dimensional subspace, with a standard normal prior for the speaker factor and classes sharing the same variability matrix.
0:04:01 | Our remark is that i-vectors now lie on a nonlinear, finite and connected subset of the space, and so does the distribution of the i-vector norms, which is referred to as a spherical distribution.
0:04:19 | We think that the assumption that there exists a single speaker-independent within-speaker variability parameter is then questionable: on such a set, can channel effects still be modeled in a speaker-independent way? It is difficult to prove that something is right or wrong, but, for example, if we find a significant correlation between the position of a class and the class parameter, this effect would invalidate the estimation of the random variables.
0:04:54 | First, we present the deterministic approach. Why present a deterministic approach to compute the PLDA parameters? Because, first, we remark that other approaches are not always relevant: sometimes what is optimal in theory is not so in practice, and the theoretically optimal estimator is perhaps not optimal for the i-vector spherical distribution.
0:05:22 | Can we replace the sophistication of the expectation-maximization maximum-likelihood estimation of the parameters by a simple and straightforward deterministic approach? So we want to know if the application of the maximum-likelihood approach to compute the parameters of the PLDA brings a significant improvement of performance.
0:05:54 | To do so, we diagonalize the between-speaker covariance matrix computed on our development corpus. A singular value decomposition of the between-speaker covariance matrix gives a matrix P, whose columns are the eigenvectors of the between-speaker variability, and a diagonal matrix Δ of eigenvalues sorted in decreasing order.
0:06:24 | Given a rank r less than p, we can compute the r principal directions of between-speaker variability and summarize them in a p × r matrix V, defined by equation 4 as V = P_{1:r} Δ_{1:r}^{1/2}, where P_{1:r} denotes the matrix composed of the first r columns of P, and the diagonal matrix Δ_{1:r} comprises only the r highest eigenvalues.
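A small sketch of this deterministic estimation (our illustration; the unweighted between-speaker covariance of the speaker means is an assumption, as the exact weighting is not specified here):

```python
import numpy as np

def deterministic_V(ivectors, labels, r):
    """Rank-r eigenvoice matrix from the between-speaker covariance (sketch)."""
    mu = ivectors.mean(axis=0)
    speakers = np.unique(labels)
    means = np.stack([ivectors[labels == s].mean(axis=0) for s in speakers])
    B = np.cov(means - mu, rowvar=False)     # between-speaker covariance
    vals, vecs = np.linalg.eigh(B)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]           # sort in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    # V = P_{1:r} Delta_{1:r}^{1/2}: the r principal between-speaker directions
    return vecs[:, :r] @ np.diag(np.sqrt(vals[:r]))
```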
0:07:03 | So we propose to carry out experiments with the two conditionings, a standardization according to the total covariance matrix and a standardization according to the within-class covariance matrix, each followed by length normalization, and with this direct, deterministic estimation of the parameters of the PLDA, without EM, on the development corpus.
0:07:35 | In the scoring, the total covariance V V^T + Σ is then replaced by S, the total covariance matrix estimated from the i-vectors of the development corpus, and the speaker variability matrix V V^T by B_{1:r}, its rank-r approximation.
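Plugging these deterministic estimates into the earlier scoring sketch might look as follows (again illustrative, reusing the hypothetical plda_llr and deterministic_V above; dev_ivectors, dev_labels, w1 and w2 are placeholders):

```python
import numpy as np

# S: total covariance of the development i-vectors; V V^T: its rank-r speaker part
mu = dev_ivectors.mean(axis=0)
S = np.cov(dev_ivectors - mu, rowvar=False)
V = deterministic_V(dev_ivectors, dev_labels, r=150)
Sigma = S - V @ V.T          # residual covariance so that V V^T + Sigma = S
                             # (assumed positive definite for small enough r)
score = plda_llr(w1, w2, mu, V, Sigma)
```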
0:07:55 | This proposal can be justified if we consider how the model sees the data of the development corpus: we can express each i-vector as a function of the parameters and of the speaker and residual factors, where ȳ_s denotes the mean factor of speaker s.
0:08:15 | We show in the article that the covariance matrix of these speaker factors is then the identity, as is desirable: the speaker factor is standardized, with mean zero and the identity matrix for its variability, and independent of the other latent variables. Remark that only the nullity of the covariance, which is a necessary condition of independence, is actually achieved; and with that we arrive at the PLDA scoring.
0:08:48 | Length normalization is known to improve the Gaussianity, so we computed the density of the norms of the speaker and residual factors of the development corpus, before and after length normalization.
0:09:00 | The top graphs show the distribution of the squared norms of the standardized latent factors: on the left the speaker factors, on the right those of the residual. The dashed line is the theoretical distribution that the squared norm of the speaker factor must follow: a chi-squared with p degrees of freedom, where p is the dimension of the i-vector space.
0:09:37 | We show here that, for both the development data and the evaluation datasets, there is a mismatch between the empirical distributions and this theoretical chi-squared distribution.
0:10:01 | Remark also the severe dataset shift between the development and evaluation datasets.
0:10:09 | The bottom graphs show the same distributions after length normalization.
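A hedged sketch of this diagnostic, comparing the empirical squared norms of standardized factors to the theoretical chi-squared distribution with p degrees of freedom (the Kolmogorov-Smirnov distance is our choice of mismatch measure, not the talk's):

```python
import numpy as np
from scipy.stats import chi2, kstest

def norm_vs_chi2(factors):
    """Compare squared norms of standardized factors to chi2(p)."""
    p = factors.shape[1]                      # dimension of the factor space
    sq_norms = np.sum(factors ** 2, axis=1)   # squared norm of each factor
    # Kolmogorov-Smirnov distance as one way to quantify the mismatch
    stat, pvalue = kstest(sq_norms, chi2(df=p).cdf)
    return sq_norms, stat, pvalue
```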
0:10:14 | We carried out experiments with ML-computed parameters and with the deterministic approach. In both cases, we can see that the mismatch with the theoretical distribution is partially reduced, as is the shift between the development and evaluation data. Remark that the deterministic approach improves the Gaussianity in a manner similar to the ML technique.
0:10:47 | What follows are the speaker recognition results, in terms of equal error rate and minimum detection cost, always with three systems. We evaluated on the telephone conditions of the NIST speaker recognition evaluations of 2008, 2010 and 2012, the last one in a noisy environment, with a system with length normalization alone on the i-vectors as provided by the extractor, and with the two conditionings, each in two cases: with the ML estimate of the parameters and with the deterministic estimate of the parameters.
0:11:32 | We can see that the results are the same in terms of equal error rate between the last two techniques; in terms of DCF, the probabilistic approach remains superior. And we remark that the within-class conditioning performed better than the total-covariance conditioning, even with the deterministic approach.
0:12:04 | So now we consider that the fact that the ML-based approach does not bring the expected improvement of performance is maybe due to the fact that the G-PLDA model is not optimal for the i-vector spherical distribution.
0:12:26 | So we computed two series on the development corpus. First, the average log-likelihood of the residuals of our observations given the model, that is, given μ, V and Σ under the homoscedasticity assumption, which we consider as a likelihood of the class variability, or the likelihood of the class given its members.
0:12:55 | Then we compare this likelihood to a parameter of position of the class: we consider, as an indicator of the probabilistic class position, the posterior likelihood of the speaker factor of the class.
0:13:12 | And we display the two series: on the horizontal axis, the likelihood of the parameter of class position, and on the vertical axis, the likelihood of the residue according to the model.
0:13:35 | The first graph shows the results without length normalization, with the i-vectors as provided by the extractor, and we remark here that no relation appears between the position of the class and the likelihood of the residue.
0:13:58 | Each time we display the coefficient of determination R², which goes from zero to one and indicates how well the data points fit a line. Here R² is equal to 0.04, close to zero.
0:14:16 | After length normalization, a significant relation appears between the likelihoods of the class factors and the likelihood of the residue: the R² are equal to 0.59 and 0.64.
0:14:35 | So there is a dependency between the actual variability matrix of a class and the probabilistic position of this class, expressed by the likelihood of its factor. We can see here that this challenges the homoscedasticity of the residue.
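A minimal sketch of this R² diagnostic between the two per-class likelihood series (our illustration; the two input arrays are placeholders):

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line fitting y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# x: per-class likelihood of the class position (speaker factor)
# y: per-class average log-likelihood of the residuals
# r2 = r_squared(x, y)   # close to 0 before length norm, about 0.6 after, per the talk
```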
0:14:58 | We computed the previous results with a training set in which the data are not evenly distributed across the speakers, so one could object that the relation is due to the differing amount of information per speaker.
0:15:17 | So we computed the same graphs as before, but keeping only the training speaker classes with a minimum number of sessions per training speaker. Here you see that the minimum number of sessions per speaker runs from 2 to 62, and each time, keeping only the segments of the speakers which have more than this minimum, we compute the R² score.
0:15:50 | We see that before length normalization there is no problem, because the two series are independent; and after length normalization we see that, even for the subsets of speaker classes with the maximum number of sessions, we obtain the same result: R² values which are higher than 0.6.
0:16:24 | So we remark that the G-PLDA modeling is a good model, but if we are obliged to project the data onto a nonlinear surface, the sphere, the problem is to be sure that a homoscedastic model, with equality of covariance, will fit these assumptions.
0:16:51 | One idea could be to replace the overall within-class variability parameter by class-dependent parameters, taking into account the local position of the class, to fit the actual distortions.
0:17:05 | But such an adaptation is difficult to carry out, because it induces a complex density, expressing the within-class variability parameters as a nonlinear function of the position.
0:17:19 | Other options are giving up length normalization and choosing approaches which better represent the i-vector distribution, attempting to find an adequate prior, such as heavy-tailed PLDA, or discriminative classifiers such as pairwise discriminative training. Or just asking why we are obliged to ignore the norm: maybe because it contains unexpected variabilities, maybe related to some acoustic parameters.
0:17:51 | A last remark: the within-class conditioning transforms the within-class variability into the identity matrix, and an identity matrix has no principal components; maybe this alleviates the constraint of homoscedasticity.
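For concreteness, a tiny sketch of a transform that maps the within-class covariance to the identity (an inverse-Cholesky whitening; our illustration, not necessarily the exact transform used):

```python
import numpy as np

def within_class_whitening(W):
    """Return M such that M @ W @ M.T is the identity matrix."""
    L = np.linalg.cholesky(W)        # W = L L^T
    return np.linalg.inv(L)          # M = L^{-1}, so M W M^T = I
```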
0:18:16 | Thank you.
0:18:33 | Question: Can you comment on the experiments where you replaced the probabilistic approach of estimating the parameters with the deterministic one on the screen? I think that, in the limit, if your training set has many speakers, these two solutions are exactly the same; the only difference is that you are putting the prior in the one case.
0:19:00 | So it depends on the number of speakers, the average number of sessions per speaker, and I guess that you could go down to a small number of speakers when you train the model.
0:19:12 | Answer: Yes; the difference is that the deterministic approach is not intended to compete with the ML estimation, which remains the best way. I was just surprised by the slight gap of performance, and so we assume that it is maybe because the ML estimate cannot be optimal, since there is a problem of sphericity of the data, while the deterministic approach is not affected by it.
0:19:43 | And that is exactly the topic here, where we try to show that the norms of the speaker factors do not follow the expected distribution.
0:19:56 | Question: Yes, I guess, because you have to treat them as random variables: they are not simply points; under the PLDA scheme they have a posterior distribution. So a better way to consider whether they follow the distribution would probably be to add the trace of the posterior covariance matrix; it should also be added when you compute the norm, in order to see the overall distribution rather than just the dot products.
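A hedged sketch of that suggestion as we understand it (our reading of the question, not a method from the talk): for a posterior-distributed factor, the expected squared norm adds the trace of the posterior covariance to the squared norm of the point estimate.

```python
import numpy as np

def expected_sq_norm(y_hat, post_cov):
    """E[||y||^2] = ||E[y]||^2 + trace(Cov[y]) for a posterior-distributed factor."""
    return float(y_hat @ y_hat + np.trace(post_cov))
```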
0:20:30 | Answer: Remark that we found the same behaviour when the evaluation or test i-vectors were used in place of the development corpus vectors: the same effect, except that the difference is that, before length normalization, the R² score is not close to zero. The R² for the test i-vectors before length normalization, as provided by the extractor, is equal to 0.3, because the test vectors were not used for training the PLDA factor analysis; so there is a shift not only in the mean, but also in this problem of homoscedasticity.
0:21:19 | Question: Just one quick one; I just missed your point when you said, I think, that trying to make the data spherically distributed you thought was inconsistent with it being Gaussian. Why is that?
0:21:36 | Answer: It is empirical, but Gaussians in a high-dimensional space concentrate near a sphere.
0:21:41 | Yes, somewhat, but we constrain the speaker factors to fall on a sphere, and the danger is then to assume that the within-class variability is not a function of the position of the class, when it will in fact be affected by the position.
0:22:07 | Right, and the prior distribution of the i-vectors, zero mean and identity covariance, in a high-dimensional space will be approximately on the sphere. So that is what happens mathematically in a high-dimensional space; so why is it inconsistent?
0:22:28 | Here we actually allow, let us say, a spherical distribution for the data as well, and applying a model with equality of covariances on this surface is difficult. Maybe we should say that length normalization is a crude technique which projects onto the sphere, instead of adjusting to, or taking into account, the length information, I think.
0:22:59 | Good; let us leave the rest for the discussion.