| 0:00:15 | Given an i-vector w, the model assumes that it |
|---|
| 0:00:18 | can be decomposed into a |
|---|
| 0:00:20 | speaker part and a residual part, around the mean μ |
|---|
| 0:00:24 | with |
|---|
| 0:00:27 | the matrix Φ |
|---|
| 0:00:29 | whose columns constitute the basis of |
|---|
| 0:00:32 | the eigenvoice subspace |
|---|
| 0:00:34 | and it is weighted by y |
|---|
| 0:00:37 | which is called the speaker factor, normally distributed |
|---|
| 0:00:42 | with zero mean and identity covariance |
|---|
| 0:00:43 | and the residual ε |
|---|
| 0:00:45 | which is normally distributed |
|---|
| 0:00:48 | with a full covariance matrix Σ |
|---|
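In symbols, a plausible reconstruction of the generative model described above (notation w, μ, Φ, y, ε, Σ as used in the talk):

```latex
w = \mu + \Phi y + \varepsilon, \qquad
y \sim \mathcal{N}(0, I), \qquad
\varepsilon \sim \mathcal{N}(0, \Sigma)
```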
| 0:00:50 | and this is the most commonly used |
|---|
| 0:00:53 | PLDA system for i-vectors, in which the channel effect is kept in the residual |
|---|
| 0:00:59 | The decision score |
|---|
| 0:01:01 | proposed by Simon Prince |
|---|
| 0:01:03 | is a log-likelihood ratio |
|---|
| 0:01:06 | in which we can see that |
|---|
| 0:01:09 | computing the score depends only on the |
|---|
| 0:01:12 | mean μ, on the |
|---|
| 0:01:13 | matrix ΦΦᵀ of |
|---|
| 0:01:16 | speaker |
|---|
| 0:01:18 | variability |
|---|
| 0:01:19 | and on ΦΦᵀ plus Σ |
|---|
| 0:01:23 | which corresponds to the total variability |
|---|
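A minimal sketch of that score, assuming the usual two-covariance form of the PLDA verification log-likelihood ratio; mu, B and T are placeholders standing for the quantities just listed, to be estimated on a development corpus:

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(w1, w2, mu, B, T):
    """Log-likelihood ratio between the same-speaker and
    different-speaker hypotheses for two i-vectors w1 and w2.

    B : speaker variability, Phi Phi^T
    T : total variability, Phi Phi^T + Sigma
    A sketch of the scoring described in the talk, not the authors' code.
    """
    # Same-speaker hypothesis: the pair is jointly Gaussian with
    # cross-covariance B between the two i-vectors.
    joint_mean = np.concatenate([mu, mu])
    joint_cov = np.block([[T, B], [B, T]])
    log_same = multivariate_normal.logpdf(
        np.concatenate([w1, w2]), joint_mean, joint_cov)
    # Different-speaker hypothesis: the two i-vectors are independent.
    log_diff = (multivariate_normal.logpdf(w1, mu, T)
                + multivariate_normal.logpdf(w2, mu, T))
    return log_same - log_diff

# Toy usage with a random low-rank speaker subspace.
rng = np.random.default_rng(0)
p, R = 20, 5
Phi = rng.standard_normal((p, R))
B = Phi @ Phi.T                 # speaker variability
T = B + np.eye(p)               # total variability (Sigma = I here)
mu = np.zeros(p)
print(plda_llr(rng.standard_normal(p), rng.standard_normal(p), mu, B, T))
```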
| 0:01:30 | The Gaussian PLDA modeling can provide good performance, but |
|---|
| 0:01:34 | it has been shown that the best performance is achieved only if a conditioning procedure |
|---|
| 0:01:40 | follows the extraction of the i-vectors; these conditioning procedures |
|---|
| 0:01:46 | are summarized here; the most commonly used |
|---|
| 0:01:51 | is a whitening, that is a standardization, followed by length normalization |
|---|
| 0:01:58 | The matrix |
|---|
| 0:01:59 | of variability chosen for the standardization |
|---|
| 0:02:02 | can be the total covariance matrix |
|---|
| 0:02:05 | or the within-speaker covariance matrix |
|---|
| 0:02:09 | and eventually we iterate this process |
|---|
| 0:02:13 | The parameters are computed on the i-vectors present in the training corpus and applied to the test |
|---|
| 0:02:18 | i-vectors |
|---|
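A sketch of this conditioning step, assuming whitening by the total covariance followed by length normalization; `dev` and `test` are hypothetical arrays of row i-vectors:

```python
import numpy as np

def fit_whitening(dev):
    """Estimate whitening parameters on the development i-vectors."""
    mu = dev.mean(axis=0)
    cov = np.cov(dev, rowvar=False)       # total covariance matrix
    # Inverse square root of the covariance via its eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return mu, W

def condition(w, mu, W):
    """Whiten, then project onto the unit sphere (length normalization)."""
    x = (w - mu) @ W.T
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical usage: parameters fitted on dev, applied to test i-vectors.
rng = np.random.default_rng(0)
dev, test = rng.standard_normal((500, 40)), rng.standard_normal((10, 40))
mu, W = fit_whitening(dev)
dev_c, test_c = condition(dev, mu, W), condition(test, mu, W)
```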
| 0:02:22 | The assumptions of the Gaussian PLDA are, firstly, the Gaussianity |
|---|
| 0:02:27 | then the linearity of eigenvoices; it means that |
|---|
| 0:02:31 | the speaker part can be constrained to a linear subspace |
|---|
| 0:02:36 | then the homoscedasticity of the residual |
|---|
| 0:02:39 | it means that the Gaussian PLDA model assumes |
|---|
| 0:02:42 | that the speaker classes |
|---|
| 0:02:46 | share the same statistics: channel effects can be modeled |
|---|
| 0:02:50 | in a speaker-independent way |
|---|
| 0:02:53 | so that all the distributions share the same covariance matrix |
|---|
| 0:02:58 | then the independence between the residual |
|---|
| 0:03:01 | and the speaker factor |
|---|
| 0:03:03 | and the equality of covariance |
|---|
| 0:03:06 | across classes; it means that the residuals, between the actual |
|---|
| 0:03:12 | i-vectors of a class |
|---|
| 0:03:13 | and the model parameter |
|---|
| 0:03:14 | computed |
|---|
| 0:03:16 | by the PLDA, are assumed to be uncorrelated |
|---|
| 0:03:20 | normally distributed, and not explained by |
|---|
| 0:03:24 | the specific sample |
|---|
| 0:03:27 | of the development corpus |
|---|
| 0:03:30 | drawn randomly |
|---|
| 0:03:31 | and that they do not vary with the effects being modeled |
|---|
| 0:03:38 | On the left, the graph |
|---|
| 0:03:41 | is a simple depiction of the PLDA model, with a speaker factor of one dimension |
|---|
| 0:03:47 | a one-dimensional subspace |
|---|
| 0:03:50 | where we assume |
|---|
| 0:03:52 | a standard normal prior for the speaker factor |
|---|
| 0:03:56 | and some classes with the same |
|---|
| 0:03:59 | variability matrix |
|---|
| 0:04:01 | Our remark is that i-vectors now lie on |
|---|
| 0:04:04 | a nonlinear and finite connected subset of the i-vector space |
|---|
| 0:04:10 | and so does the distribution of the i-vectors |
|---|
| 0:04:13 | which is referred to as a spherical distribution |
|---|
| 0:04:19 | We think that the assumption that there exists a unique, speaker-independent parameter |
|---|
| 0:04:24 | of within-speaker variability is questionable |
|---|
| 0:04:27 | that is, that channel effects can be modeled in a speaker-independent way |
|---|
| 0:04:31 | It is difficult to show that such an assumption is right or wrong |
|---|
| 0:04:37 | For example, if we find a correlation, a significant correlation, between |
|---|
| 0:04:42 | the residual |
|---|
| 0:04:44 | and the class parameter |
|---|
| 0:04:46 | this effect would dramatically invalidate the estimation of the random variables |
|---|
| 0:04:54 | First, we present the deterministic approach |
|---|
| 0:04:58 | Why a deterministic approach to compute the PLDA parameters fast? |
|---|
| 0:05:03 | Because, first, when we tried this |
|---|
| 0:05:07 | deterministic approach, we remarked that the other approaches were |
|---|
| 0:05:11 | not |
|---|
| 0:05:12 | more relevant, and sometimes not as well suited |
|---|
| 0:05:17 | especially if the EM-ML estimation is not optimal for the i-vector spherical distribution |
|---|
| 0:05:22 | Can we replace the sophistication of the expectation-maximization maximum-likelihood |
|---|
| 0:05:28 | estimation of |
|---|
| 0:05:30 | parameters |
|---|
| 0:05:32 | by a simple and straightforward deterministic approach? |
|---|
| 0:05:37 | So we want to know if |
|---|
| 0:05:40 | the application of the maximum-likelihood |
|---|
| 0:05:44 | approach to compute the parameters of the PLDA |
|---|
| 0:05:49 | brings a significant improvement of performance |
|---|
| 0:05:54 | To do that, we first diagonalize the between-speaker covariance |
|---|
| 0:05:59 | matrix, computed |
|---|
| 0:06:00 | on our development corpus |
|---|
| 0:06:04 | A singular value decomposition of the between-speaker covariance matrix |
|---|
| 0:06:08 | gives a matrix P |
|---|
| 0:06:10 | whose columns are |
|---|
| 0:06:12 | the eigenvectors of the between-speaker |
|---|
| 0:06:16 | variability, and a diagonal matrix D of eigenvalues |
|---|
| 0:06:21 | sorted in decreasing order |
|---|
| 0:06:24 | Given a rank R less than p |
|---|
| 0:06:27 | we can |
|---|
| 0:06:30 | compute |
|---|
| 0:06:31 | the rank-R principal between-speaker variability |
|---|
| 0:06:36 | and summarize it in the matrix Φ times Φᵀ |
|---|
| 0:06:40 | defined by equation (4) |
|---|
| 0:06:45 | The first matrix, P one to R, is the p-by-R |
|---|
| 0:06:49 | matrix composed of the first R columns of P |
|---|
| 0:06:52 | and the diagonal matrix D one to R |
|---|
| 0:06:57 | is only comprised of the R |
|---|
| 0:06:59 | highest |
|---|
| 0:07:01 | eigenvalues |
|---|
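A sketch of this deterministic construction, assuming equation (4) is the rank-R truncation of the eigendecomposition of the between-speaker covariance B; variable names are illustrative, not the authors' code:

```python
import numpy as np

def between_within_cov(X, spk):
    """Between- and within-speaker covariances from row i-vectors X
    with speaker labels spk (any hashable labels)."""
    labels, inv = np.unique(spk, return_inverse=True)
    means = np.array([X[inv == k].mean(axis=0) for k in range(len(labels))])
    B = np.cov(means, rowvar=False, bias=True)           # between-speaker
    W = np.cov(X - means[inv], rowvar=False, bias=True)  # within-speaker
    return B, W

def rank_r_phi_phit(B, R):
    """Rank-R principal between-speaker variability:
    P_{1:R} D_{1:R} P_{1:R}^T, keeping the R highest eigenvalues."""
    vals, vecs = np.linalg.eigh(B)      # eigenvalues in increasing order
    idx = np.argsort(vals)[::-1][:R]    # indices of the R largest
    P_R, D_R = vecs[:, idx], np.diag(vals[idx])
    return P_R @ D_R @ P_R.T

# Toy usage with random data and random speaker labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
spk = rng.integers(0, 30, size=300)
B, W = between_within_cov(X, spk)
PhiPhiT = rank_r_phi_phit(B, R=10)
```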
| 0:07:03 | And so we propose to |
|---|
| 0:07:07 | carry out |
|---|
| 0:07:09 | experiments with, on one hand |
|---|
| 0:07:11 | the LW conditioning |
|---|
| 0:07:14 | that is, a standardization according to the |
|---|
| 0:07:19 | within-class covariance matrix |
|---|
| 0:07:21 | followed by length normalization |
|---|
| 0:07:23 | and, on the other hand, the direct estimation of the parameters of the |
|---|
| 0:07:28 | PLDA, without the EM algorithm |
|---|
| 0:07:31 | on the development corpus |
|---|
| 0:07:35 | So the scoring matrices are replaced: the total covariance matrix |
|---|
| 0:07:40 | ΦΦᵀ plus Σ |
|---|
| 0:07:42 | is estimated by |
|---|
| 0:07:44 | the total covariance of the development corpus |
|---|
| 0:07:47 | and the speaker variability matrix ΦΦᵀ by B one to R |
|---|
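In symbols, a plausible reading of this substitution, with T the total covariance of the development corpus and B one to R the rank-R between-speaker covariance of equation (4):

```latex
\Phi\Phi^{T} + \Sigma \;\leftarrow\; T, \qquad
\Phi\Phi^{T} \;\leftarrow\; B_{1:R} = P_{1:R}\, D_{1:R}\, P_{1:R}^{T}
```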
| 0:07:55 | This proposal can be justified if we consider the decomposition of the data of the development corpus |
|---|
| 0:08:02 | We can express the factors and the parameters |
|---|
| 0:08:06 | the speaker and residual |
|---|
| 0:08:08 | factors |
|---|
| 0:08:10 | where w-bar of s is the mean i-vector of speaker s |
|---|
| 0:08:15 | We show in the article that, with these covariance matrices |
|---|
| 0:08:19 | we obtain, as desired, that the speaker factor is standardized, with mean zero and identity matrix of |
|---|
| 0:08:26 | variability |
|---|
| 0:08:27 | and the uncorrelatedness between the random variables |
|---|
| 0:08:32 | Remark that only the nullity of the covariance, which is a necessary condition |
|---|
| 0:08:36 | of independence |
|---|
| 0:08:39 | is achieved |
|---|
| 0:08:40 | and this is what is required to obtain the PLDA scoring |
|---|
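A hedged reconstruction of that decomposition, splitting each development i-vector of speaker s into its class mean and a residual; the second line is one consistent choice of standardized speaker factor (the exact normalization is in the article):

```latex
w_{s,i} \;=\; \bar{w}_s + (w_{s,i} - \bar{w}_s), \qquad
y_s \;\approx\; D_{1:R}^{-1/2}\, P_{1:R}^{T}\, (\bar{w}_s - \mu)
```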
| 0:08:48 | Length normalization is known to improve the Gaussianity, so we compute the distributions |
|---|
| 0:08:53 | of the speaker and residual factors of the development corpus |
|---|
| 0:08:57 | before and after length normalization |
|---|
| 0:09:00 | The top graphs show |
|---|
| 0:09:03 | the distribution of the squared norms of the standardized factors |
|---|
| 0:09:09 | on the left, the speaker factors; on the right, the residuals |
|---|
| 0:09:14 | The dashed line is |
|---|
| 0:09:17 | the distribution that the squared norm of |
|---|
| 0:09:20 | the speaker factor must follow |
|---|
| 0:09:25 | a chi-squared with P degrees of freedom |
|---|
| 0:09:31 | where P is the dimension of the i-vector space |
|---|
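A sketch of this diagnostic, assuming we compare the empirical squared norms of standardized factors with the chi-squared law with P degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

def norm_vs_chi2(factors):
    """Compare squared norms of standardized factors to chi2(P).

    factors : (n, P) array of standardized speaker (or residual) factors.
    A quick check of the Gaussianity assumption discussed in the talk.
    """
    sq_norms = np.sum(factors ** 2, axis=1)
    P = factors.shape[1]
    # Under the N(0, I_P) assumption, sq_norms ~ chi2(P): mean P, var 2P.
    print(f"empirical mean {sq_norms.mean():.1f}  vs  chi2 mean {P}")
    print(f"empirical var  {sq_norms.var():.1f}  vs  chi2 var  {2 * P}")
    # Density values for plotting against a histogram of sq_norms.
    grid = np.linspace(chi2.ppf(0.001, P), chi2.ppf(0.999, P), 200)
    return sq_norms, grid, chi2.pdf(grid, P)

# Toy usage: truly Gaussian factors match; length-normalized ones may not.
rng = np.random.default_rng(0)
norm_vs_chi2(rng.standard_normal((5000, 400)))
```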
| 0:09:37 | We show with these graphs that |
|---|
| 0:09:39 | for the |
|---|
| 0:09:40 | development corpus |
|---|
| 0:09:43 | and also for the evaluation |
|---|
| 0:09:46 | datasets |
|---|
| 0:09:48 | there is a mismatch |
|---|
| 0:09:50 | between them |
|---|
| 0:09:54 | and the distribution of the norms deviates from the expected chi-squared distribution |
|---|
| 0:10:01 | Remark also the dataset shift between the |
|---|
| 0:10:05 | development and evaluation datasets |
|---|
| 0:10:09 | After length normalization |
|---|
| 0:10:12 | on the right |
|---|
| 0:10:14 | we carried out experiments with |
|---|
| 0:10:18 | the ML-computed parameters and with the deterministic approach |
|---|
| 0:10:23 | In both cases |
|---|
| 0:10:24 | we can see that |
|---|
| 0:10:26 | the mismatch between the norms and the chi-squared distribution |
|---|
| 0:10:29 | is partially reduced |
|---|
| 0:10:31 | as is the shift |
|---|
| 0:10:34 | between the development and evaluation sets |
|---|
| 0:10:38 | Remark that the deterministic approach |
|---|
| 0:10:42 | improves the Gaussianity |
|---|
| 0:10:43 | in a similar manner to the ML technique |
|---|
| 0:10:47 | Here are the results on the NIST recognition task, in terms of equal error rate and minimum DCF |
|---|
| 0:10:53 | always with |
|---|
| 0:10:54 | three systems |
|---|
| 0:10:58 | We evaluate over all the conditions of the NIST speaker recognition evaluations of 2008, 2010 |
|---|
| 0:11:05 | and 2012, telephone data |
|---|
| 0:11:08 | including the noisy environments |
|---|
| 0:11:11 | The first system |
|---|
| 0:11:13 | uses a length normalization |
|---|
| 0:11:15 | following a standardization by the total covariance matrix |
|---|
| 0:11:19 | and then the LW conditioning in |
|---|
| 0:11:21 | two cases |
|---|
| 0:11:24 | one with the ML |
|---|
| 0:11:26 | estimate of the parameters, and one with the deterministic |
|---|
| 0:11:29 | estimate of the parameters |
|---|
| 0:11:32 | We can see |
|---|
| 0:11:34 | you can see that the results are the same in terms of |
|---|
| 0:11:39 | equal error rate |
|---|
| 0:11:40 | between the last two techniques |
|---|
| 0:11:43 | In terms of DCF, the probabilistic approach remains superior |
|---|
| 0:11:48 | And we remark that the LW conditioning performed better |
|---|
| 0:11:53 | than the conditioning by the total covariance |
|---|
| 0:11:56 | even with the deterministic approach |
|---|
| 0:12:04 | So now we consider that maybe the fact that |
|---|
| 0:12:09 | the EM-ML approach doesn't bring the expected |
|---|
| 0:12:13 | improvement of performance |
|---|
| 0:12:15 | is due to the fact that the G-PLDA model is |
|---|
| 0:12:20 | not optimal for i-vector spherical distributions |
|---|
| 0:12:26 | So we compute |
|---|
| 0:12:28 | two series for the development corpus |
|---|
| 0:12:32 | First, the average log-likelihood of the residues of the observations |
|---|
| 0:12:36 | given the model |
|---|
| 0:12:38 | given |
|---|
| 0:12:39 | the parameters μ, Φ and Σ |
|---|
| 0:12:45 | which we consider as the likelihood |
|---|
| 0:12:48 | of |
|---|
| 0:12:49 | the residue |
|---|
| 0:12:51 | of a class |
|---|
| 0:12:55 | Then we compare this |
|---|
| 0:12:57 | likelihood to a parameter of position of the class; we consider as indicator of probabilistic class |
|---|
| 0:13:04 | position the posterior likelihood of the speaker factor of |
|---|
| 0:13:09 | the class |
|---|
| 0:13:12 | And we display |
|---|
| 0:13:16 | the two series |
|---|
| 0:13:19 | with, on the horizontal |
|---|
| 0:13:21 | axis, the parameter of class position, and on the vertical axis |
|---|
| 0:13:26 | the likelihood |
|---|
| 0:13:28 | of the residue |
|---|
| 0:13:30 | according |
|---|
| 0:13:31 | to the model |
|---|
| 0:13:35 | The first graph |
|---|
| 0:13:37 | shows the results without length normalization |
|---|
| 0:13:41 | with the i-vectors as provided by the extractor |
|---|
| 0:13:44 | and we remark here that |
|---|
| 0:13:47 | no correlation occurs between the position of the class |
|---|
| 0:13:53 | and the likelihood |
|---|
| 0:13:55 | of the residue |
|---|
| 0:13:58 | Each time, we display the coefficient of determination R-squared |
|---|
| 0:14:02 | which goes from zero to one |
|---|
| 0:14:05 | and which indicates how well the data points fit a line |
|---|
| 0:14:10 | Here the R-squared is equal to 0.04 |
|---|
| 0:14:13 | close to zero |
|---|
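A sketch of this coefficient of determination for the two series, assuming a simple least-squares line of residue likelihood against class position; the array names are hypothetical:

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ a*x + b.

    x : per-class position indicator (likelihood of the speaker factor)
    y : per-class average log-likelihood of the residues
    R^2 ranges from 0 (no linear relation) to 1 (perfect linear fit).
    """
    a, b = np.polyfit(x, y, deg=1)   # slope and intercept of the fit
    residuals = y - (a * x + b)
    return 1.0 - residuals.var() / y.var()

# Toy usage: independent series give R^2 near 0, as before normalization.
rng = np.random.default_rng(0)
print(r_squared(rng.standard_normal(1000), rng.standard_normal(1000)))
```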
| 0:14:16 | After length normalization |
|---|
| 0:14:19 | a significant correlation |
|---|
| 0:14:21 | appears between the likelihoods of the class factors and the likelihood of the residue |
|---|
| 0:14:27 | with R-squared values equal to 0.59 and 0.64 |
|---|
| 0:14:35 | So there is a dependency between |
|---|
| 0:14:37 | the actual variability |
|---|
| 0:14:41 | matrix of a class and the probabilistic position of this class, expressed by the likelihood of |
|---|
| 0:14:46 | the factor |
|---|
| 0:14:48 | So we can say that this shows a heteroscedasticity |
|---|
| 0:14:51 | of the residue |
|---|
| 0:14:58 | We computed the previous |
|---|
| 0:15:02 | results with a training set |
|---|
| 0:15:05 | in which the data are not evenly distributed across speakers |
|---|
| 0:15:10 | So one could object that the correlations are due to the quantity of information per speaker |
|---|
| 0:15:15 | or something of the sort |
|---|
| 0:15:17 | So we compute the same graphs as before |
|---|
| 0:15:21 | but keeping only the speaker |
|---|
| 0:15:25 | training classes |
|---|
| 0:15:27 | with a minimum number of sessions per training speaker |
|---|
| 0:15:33 | We vary this minimal number of sessions per speaker |
|---|
| 0:15:36 | from two to sixty-two |
|---|
| 0:15:41 | and each time, for only the segments of the speakers which |
|---|
| 0:15:46 | have more than this minimum |
|---|
| 0:15:48 | we compute the R-squared score |
|---|
| 0:15:50 | We see that before length normalization there is no problem |
|---|
| 0:15:54 | because the |
|---|
| 0:15:55 | two series are independent |
|---|
| 0:15:57 | and after normalization we see |
|---|
| 0:16:00 | that even for |
|---|
| 0:16:04 | subsets of speaker classes with the |
|---|
| 0:16:07 | maximum number of sessions |
|---|
| 0:16:10 | the same |
|---|
| 0:16:11 | effect occurs |
|---|
| 0:16:14 | with R-squared values which are higher than 0.6 |
|---|
| 0:16:24 | So we remark |
|---|
| 0:16:27 | that the Gaussian PLDA modeling is a good model |
|---|
| 0:16:32 | but if we are obliged to |
|---|
| 0:16:34 | project the data onto a nonlinear surface, the sphere |
|---|
| 0:16:38 | the problem is to be sure that a homoscedastic model, with equality of |
|---|
| 0:16:44 | covariance |
|---|
| 0:16:46 | will conform to this assumption |
|---|
| 0:16:51 | One idea to fix this heteroscedasticity could be to replace the overall within-class |
|---|
| 0:16:55 | variability parameter by a class-dependent parameter |
|---|
| 0:16:58 | taking into account the local position of the class, to fit the |
|---|
| 0:17:01 | actual distortions |
|---|
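In symbols, a hedged sketch of that idea: let the residual covariance depend on the class position instead of being shared across classes:

```latex
\varepsilon_s \sim \mathcal{N}\bigl(0,\; \Sigma(y_s)\bigr)
\quad \text{instead of} \quad
\varepsilon_s \sim \mathcal{N}(0,\; \Sigma)
```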
| 0:17:05 | But such a modeling is difficult to carry out |
|---|
| 0:17:11 | because it induces a complex density |
|---|
| 0:17:14 | by making the within-class variability parameter a nonlinear function |
|---|
| 0:17:19 | Or we could give up length normalization and |
|---|
| 0:17:22 | pursue approaches which preserve the i-vector norm |
|---|
| 0:17:27 | attempting to find an adequate prior, such as the heavy-tailed PLDA |
|---|
| 0:17:31 | or discriminative classifiers, pairwise discriminative training |
|---|
| 0:17:36 | Or we can ask why we are obliged to ignore the norm: maybe because it does not |
|---|
| 0:17:41 | contain the expected |
|---|
| 0:17:43 | variabilities |
|---|
| 0:17:46 | maybe it is related to some parameters |
|---|
| 0:17:49 | acoustic ones |
|---|
| 0:17:51 | Just one last remark |
|---|
| 0:17:54 | the LW conditioning |
|---|
| 0:17:59 | transforms the within-class variability into the identity matrix |
|---|
| 0:18:03 | and an identity matrix has no |
|---|
| 0:18:06 | principal components |
|---|
| 0:18:08 | maybe it alleviates this constraint of |
|---|
| 0:18:13 | homoscedasticity |
|---|
| 0:18:16 | thank you |
|---|
| 0:18:33 | Can you comment on the experiments where you replaced the probabilistic approach of estimating the |
|---|
| 0:18:42 | parameters with the one on the screen |
|---|
| 0:18:47 | the deterministic one? |
|---|
| 0:18:49 | I think that |
|---|
| 0:18:50 | in the limit, if your training set has many speakers |
|---|
| 0:18:53 | these two solutions are exactly the same; the only difference is that you're putting the |
|---|
| 0:18:58 | prior in, in the one case |
|---|
| 0:19:00 | Okay, so it depends on the number of speakers, the average number |
|---|
| 0:19:06 | of speakers; and I guess that it matters when you go to a small number of |
|---|
| 0:19:10 | speakers, when you train the model |
|---|
| 0:19:12 | Yes, and a difference is that the deterministic approach is not intended to compete |
|---|
| 0:19:19 | with the ML method, which remains the best way |
|---|
| 0:19:22 | but I was surprised by the slight gap of performance |
|---|
| 0:19:28 | and so |
|---|
| 0:19:31 | I assume that it is maybe because the EM-ML cannot |
|---|
| 0:19:35 | be optimal, because there is a problem of sphericity of the data |
|---|
| 0:19:40 | but the deterministic approach is not affected |
|---|
| 0:19:43 | And that's exactly this topic, where we try to show whether the norm |
|---|
| 0:19:49 | of the speaker factors |
|---|
| 0:19:52 | follows the awaited distribution |
|---|
| 0:19:56 | Yes, I guess, because you have to treat them as random variables, because they're |
|---|
| 0:20:01 | not simply points |
|---|
| 0:20:03 | under the PLDA scheme, they have a posterior distribution |
|---|
| 0:20:07 | Okay, so a better way |
|---|
| 0:20:09 | to consider whether they follow the distribution |
|---|
| 0:20:13 | would be |
|---|
| 0:20:14 | probably to add the trace of |
|---|
| 0:20:16 | the posterior covariance matrix; it should also be added when you compute |
|---|
| 0:20:20 | the norm, in order to see the overall distribution rather than |
|---|
| 0:20:25 | dot products, okay? |
|---|
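The suggestion corresponds to a standard identity for a Gaussian posterior (my gloss of the question, not a slide from the talk): the expected squared norm of the latent factor adds the trace of its posterior covariance to the squared norm of the point estimate:

```latex
\mathbb{E}\bigl[\|y\|^2 \mid w\bigr]
= \|\hat{y}\|^2 + \operatorname{tr}\bigl(\mathrm{Cov}(y \mid w)\bigr),
\qquad \hat{y} = \mathbb{E}[y \mid w]
```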
| 0:20:30 | Regarding that |
|---|
| 0:20:31 | we did the same analysis with the evaluation and test i-vectors |
|---|
| 0:20:37 | as with the development corpus vectors |
|---|
| 0:20:42 | Same effect, okay, but the difference is that, before length normalization, the R-squared score is |
|---|
| 0:20:47 | not close to zero |
|---|
| 0:20:49 | The score for the test |
|---|
| 0:20:52 | i-vectors, before normalization, as provided by the extractor |
|---|
| 0:20:58 | is equal to 0.3 |
|---|
| 0:21:02 | whereas the test vectors are not used for training the PLDA factor analysis; so |
|---|
| 0:21:08 | there is a shift, not only for the mean |
|---|
| 0:21:11 | but also for |
|---|
| 0:21:13 | this problem of homoscedasticity |
|---|
| 0:21:19 | Just one quick question; I just missed your point when you said |
|---|
| 0:21:23 | I think you were saying that |
|---|
| 0:21:27 | trying to make the data spherically distributed, you thought, was inconsistent with being Gaussian |
|---|
| 0:21:32 | why is that? |
|---|
| 0:21:36 | It's empirical, but Gaussians in a high-dimensional space are on a sphere |
|---|
| 0:21:41 | yes, nearly |
|---|
| 0:21:45 | but |
|---|
| 0:21:47 | we constrain the speaker factors to lie on |
|---|
| 0:21:52 | a sphere |
|---|
| 0:21:53 | and so the doubt |
|---|
| 0:21:55 | is whether the within-class variability can be assumed to be the same, because |
|---|
| 0:22:03 | it will be affected |
|---|
| 0:22:05 | by the position |
|---|
| 0:22:07 | But that's the posterior; |
|---|
| 0:22:09 | the prior distribution of the i-vectors |
|---|
| 0:22:11 | zero mean and unit identity covariance, in a high-dimensional space, will be approximately spherical |
|---|
| 0:22:19 | So that's, I mean, that's what happens, mathematically, in a |
|---|
| 0:22:23 | high-dimensional space; so why is it inconsistent? |
|---|
| 0:22:28 | Here we actually observe a spherical distribution as well |
|---|
| 0:22:38 | and applying a model with equality of covariances on this surface is difficult |
|---|
| 0:22:44 | Maybe we can say that length normalization is a whole technique which projects onto the sphere |
|---|
| 0:22:50 | instead of adjusting to it, taking the norm information into account, I think |
|---|
| 0:22:59 | Good, but we should |
|---|
| 0:23:02 | move on; thank you for the |
|---|
| 0:23:05 | discussion |
|---|