0:00:15 | hello,

0:00:16 | so this presentation deals with investigations about discriminative training

0:00:22 | applied to i-vectors that have been previously normalized

0:00:28 | here is the system on which we focus:

0:00:33 | the usual i-vector based system for speaker recognition:

0:00:37 | first normalization — within-class covariance normalization, then length normalization —

0:00:42 | then Gaussian modeling, the PLDA modeling, providing parameters:

0:00:48 | the mean value mu and the covariance matrices,

0:00:52 | and the LLR score.

0:00:57 | some works have proposed to

0:01:01 | optimize the parameters of this PLDA modeling

0:01:06 | in a discriminative way.

0:01:09 | these discriminative classifiers use logistic regression

0:01:15 | maximization,

0:01:16 | applied to the score coefficients of PLDA

0:01:21 | or directly to the PLDA parameters.

0:01:30 | the goal here is to add a new step, an additional step, to the normalization

0:01:36 | procedure,

0:01:37 | which doesn't modify the distances between i-vectors,

0:01:41 | and which then introduces constraints for the discriminative training.

0:01:49 | once this additional normalization step

0:01:52 | is carried out, it's possible to

0:01:56 | train the discriminative classifier with a limited number of coefficients to optimize:

0:02:03 | the number of coefficients to optimize in a discriminative way

0:02:08 | is reduced to d, the dimension of the i-vector.

0:02:13 | then we carry out the state-of-the-art logistic regression based

0:02:18 | discriminative training,

0:02:19 | and also a new approach, the orthonormal discriminative classifier,

0:02:25 | which is a novelty.

0:02:28 | first some notation: in the PLDA model, the residual term

0:02:32 | epsilon

0:02:35 | is assumed to be statistically

0:02:38 | independent of the speaker term, and the speaker term

0:02:42 | is constrained to lie in a low-rank subspace,

0:02:50 | the eigenvoice subspace.

0:02:53 | then a few comments about the two-covariance model,

0:02:56 | which is nowadays

0:03:00 | the most commonly used model

0:03:04 | in speaker recognition.

0:03:07 | so the LLR score can be written as a second-degree polynomial function

0:03:11 | of the components of the two vectors of the trial, w1

0:03:15 | and w2,

0:03:17 | which can be written

0:03:20 | analytically with matrices P and Q.
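This quadratic form can be made concrete. Below is a hedged NumPy sketch of two-covariance LLR scoring (`mu`, `B`, `W` are toy placeholders for the model parameters, not values from the talk): expanding the two block-Gaussian log-densities yields exactly a second-degree polynomial in w1 and w2, with two matrices P and Q.

```python
import numpy as np

def llr_score(w1, w2, mu, B, W):
    """LLR of the two-covariance model: speaker mean y ~ N(mu, B), w | y ~ N(y, W).
    Compares the 'same speaker' pair density against the 'different speakers' one."""
    d = len(mu)
    S = B + W                                   # marginal covariance of one i-vector
    x = np.concatenate([w1 - mu, w2 - mu])
    same = np.block([[S, B], [B, S]])           # pair covariance under same speaker
    diff = np.block([[S, np.zeros((d, d))],
                     [np.zeros((d, d)), S]])    # pair covariance under different speakers

    def log_gauss(v, C):
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (v @ np.linalg.solve(C, v) + logdet + len(v) * np.log(2 * np.pi))

    return log_gauss(x, same) - log_gauss(x, diff)
```

Multiplying out the two quadratic forms gives the P and Q matrices of the slide.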

0:03:28 | recall that the state-of-the-art

0:03:31 | logistic regression based

0:03:33 | discriminative classifiers

0:03:35 | try to optimize coefficients initialized by the PLDA modeling.

0:03:42 | they use the log probability of correctly classifying all training

0:03:48 | trials — target as target, non-target as non-target — called the total cross entropy,

0:03:55 | by using gradient descent with respect to these coefficients.

0:03:59 | the coefficients

0:04:01 | that have to be optimized can be

0:04:03 | the PLDA score coefficients,

0:04:06 | that is, the matrices P and Q of the

0:04:09 | previous slide,

0:04:11 | and, following this way proposed by Burget et al.,

0:04:16 | the LLR score can be written

0:04:18 | as a dot product

0:04:20 | between an expanded vector of the trial

0:04:23 | and a vector omega, which is initialized with the PLDA parameters.
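A hedged sketch of this expanded-vector formulation (the exact expansion used by Burget et al. is not spelled out in the talk; this is one plausible variant that reproduces the quadratic score above):

```python
import numpy as np

def expand_trial(w1, w2):
    # one plausible expansion (assumed form): quadratic monomials of the trial
    return np.concatenate([
        (np.outer(w1, w1) + np.outer(w2, w2)).ravel(),   # within-vector terms
        (np.outer(w1, w2) + np.outer(w2, w1)).ravel(),   # cross-vector terms
        w1 + w2,
        [1.0],
    ])

def omega_from_plda(P, Q, c, k):
    # stacking the PLDA score coefficients gives the initial omega,
    # so that score = omega . expand_trial(w1, w2)
    return np.concatenate([P.ravel(), Q.ravel(), c, [k]])
```

With symmetric P and Q, the dot product equals w1'Pw1 + w2'Pw2 + 2 w1'Qw2 + c'(w1+w2) + k.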

0:04:30 | Borgström and McCree proposed in two thousand

0:04:34 | thirteen to

0:04:36 | optimize the PLDA parameters — the mean value mu,

0:04:40 | the eigenvoice subspace matrix

0:04:43 | Phi, and the nuisance variability matrix Lambda —

0:04:48 | by using this

0:04:50 | total cross entropy

0:04:51 | function.

0:04:56 | discriminative training suffers from some limitations; recall that the issues are,

0:05:00 | first,

0:05:01 | overfitting —

0:05:02 | overfitting on development data —

0:05:05 | and the respect of the parameter conditions:

0:05:09 | covariance matrices must be positive

0:05:14 | definite,

0:05:16 | and the matrices P and Q have to remain negative or positive

0:05:21 | semi-definite.

0:05:22 | so,

0:05:23 | some solutions have been proposed.

0:05:27 | constrained discriminative training

0:05:30 | attempts to train only a small number of parameters —

0:05:33 | of order

0:05:35 | d, where d is the dimension of the i-vector —

0:05:37 | rather than the d-squared coefficients of the score.

0:05:42 | such solutions, proposed for example by Rohdin et al.,

0:05:48 | optimize only some coefficients for each dimension of the i-vector.

0:05:53 | indeed, for approaches like this quadratic scoring,

0:06:04 | you can see that the score is composed of a sum of

0:06:08 | several terms;

0:06:10 | it is possible to optimize a parametric coefficient for

0:06:16 | each of these terms.

0:06:21 | also, only the mean vector

0:06:24 | and the eigenvalues of the PLDA matrices

0:06:27 | can be trained, or we can optimize only a scaling factor,

0:06:33 | a unique scalar for each matrix.

0:06:39 | it's also possible to use a singular value decomposition of P to reparameterize,

0:06:44 | in order to respect the semi-definiteness parameter conditions.

0:06:50 | if discriminative training

0:06:53 | has provided interesting results when i-vectors were not normalized,

0:06:58 | it struggles to improve

0:07:00 | speaker detection once i-vectors have been first normalized,

0:07:04 | whereas this configuration achieves the best performance.

0:07:09 | now we present our additional normalization step, a simple rotation,

0:07:14 | proposed and intended to constrain the discriminative training.

0:07:19 | recall that the within-class covariance matrix W is isotropic after WCCN;

0:07:25 | after length normalization, it has been shown that it remains

0:07:30 | almost exactly isotropic —

0:07:32 | I mean, an identity matrix multiplied by a scalar.

0:07:37 | we propose simply to

0:07:40 | rotate by the eigenvector basis of the between-class covariance matrix B of the training

0:07:45 | dataset,

0:07:46 | computed via the eigendecomposition of B,

0:07:49 | and we apply this matrix of eigenvectors of B to each i-vector,

0:07:56 | training or test.

0:07:58 | this is a very simple operation which doesn't modify the distances between i-vectors.

0:08:03 | after this deterministic rotation, B is diagonal and W remains almost exactly

0:08:09 | isotropic,

0:08:11 | and therefore diagonal,

0:08:13 | because the eigenvector basis of B is orthogonal.
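The rotation step can be sketched as follows (toy data; `B` here is a hypothetical between-class covariance built from random "speaker means", not the one from the talk). An orthogonal rotation leaves all pairwise distances unchanged while diagonalizing B:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical between-class covariance from toy "speaker means"
speaker_means = rng.standard_normal((20, 5))
B = np.cov(speaker_means.T)

eigvals, E = np.linalg.eigh(B)      # eigenvector basis of B (orthogonal matrix)
w = rng.standard_normal((10, 5))    # toy i-vectors (training or test)
w_rot = w @ E                       # the proposed rotation, applied to each i-vector
B_rot = E.T @ B @ E                 # B expressed in the new basis: diagonal
```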

0:08:16 | we assume —

0:08:18 | the key point is that we assume — that the PLDA matrices Phi Phi-transposed and Lambda become almost

0:08:23 | diagonal, and even isotropic for Lambda.

0:08:27 | as a consequence, the matrices P and Q involved in the LLR score become

0:08:32 | almost diagonal.

0:08:36 | moreover, the solution of LDA is

0:08:39 | almost exactly

0:08:41 | spanned by the first eigenvectors of B,

0:08:45 | because the within-class covariance is almost exactly equal to

0:08:48 | the identity, up to a multiplicative constant.

0:08:52 | so the first components of an i-vector approximate its projection into the LDA

0:08:57 | subspace.

0:09:00 | so the score can be written as a sum of

0:09:04 | terms, plus a residual term:

0:09:08 | there is one term for each dimension of the i-vector,

0:09:14 | and the residual term gathers

0:09:17 | the off-diagonal terms of the initial scoring,

0:09:22 | the diagonal terms beyond the last dimension,

0:09:25 | and the offsets.

0:09:29 | we show that a major proportion of the PLDA score can be concentrated into this

0:09:34 | sum of

0:09:36 | terms, one for each

0:09:38 | dimension — independent

0:09:39 | terms.
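A small sketch of this per-dimension decomposition, with toy nearly-diagonal score matrices (an assumption standing in for the rotated PLDA score matrices): keeping only the diagonals of P and Q gives one term per dimension, and everything else goes into the residual.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# toy nearly-diagonal score matrices (small off-diagonal perturbation)
P = np.diag(rng.standard_normal(d)) + 0.01 * rng.standard_normal((d, d))
P = (P + P.T) / 2
Q = np.diag(rng.standard_normal(d)) + 0.01 * rng.standard_normal((d, d))
Q = (Q + Q.T) / 2
w1, w2 = rng.standard_normal(d), rng.standard_normal(d)

full = w1 @ P @ w1 + w2 @ P @ w2 + 2 * w1 @ Q @ w2
# one term per dimension, using only the diagonals of P and Q
per_dim = np.diag(P) * (w1**2 + w2**2) + 2 * np.diag(Q) * w1 * w2
residual = full - per_dim.sum()     # gathers the off-diagonal contributions
```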

0:09:42 | here is an analysis of the PLDA parameters before and after this rotation.

0:09:46 | we measure the diagonality, via the entropy, of the matrices:

0:09:52 | a maximal value of one indicates that the matrix is exactly diagonal.

0:09:58 | we can see that, right after

0:10:02 | the rotation,

0:10:03 | all the values are close to one,

0:10:05 | so the PLDA matrices are very close to being diagonal,

0:10:09 | and so are the score matrices

0:10:11 | P and Q.
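The exact entropy-based diagonality measure is not given in the talk; a simple stand-in that is likewise maximal at one for exactly diagonal matrices can be sketched as:

```python
import numpy as np

def diagonality(M):
    # fraction of the matrix "energy" carried by the diagonal;
    # equals 1 exactly when M is diagonal (a simple stand-in for the
    # entropy-based measure mentioned in the talk, whose exact form is not given)
    M = np.asarray(M, dtype=float)
    return np.sum(np.diag(M) ** 2) / np.sum(M ** 2)
```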

0:10:14 | we also compare with LDA, by using the distance between orthogonal projections

0:10:19 | onto the subspaces:

0:10:23 | the first eigenvectors of B span

0:10:25 | almost exactly the LDA subspace,

0:10:30 | so the difference is negligible.

0:10:33 | to assess the variance of the residual term, we

0:10:36 | compute, on the last line of the table,

0:10:39 | the ratio between the variance

0:10:42 | of the residual term and the variance of the whole score,

0:10:46 | and we can see that, for both the male and female

0:10:50 | training sets, the values are close to zero.

0:10:55 | in terms of performance,

0:10:57 | we compare the full PLDA baseline with the simplified scoring

0:11:01 | in which we have removed

0:11:05 | the residual term, and we can see that this residual

0:11:08 | plays little or no

0:11:13 | role in speaker detection.

0:11:18 | so we can

0:11:20 | carry out discriminative training applied to these rotated vectors.

0:11:26 | first, the state-of-the-art logistic regression based

0:11:30 | approach, following Burget:

0:11:33 | since only d coefficients are of interest, the discriminative training can be

0:11:38 | performed by optimizing a

0:11:42 | vector omega; the

0:11:44 | score is a dot product between an expanded vector of the trial, given its two i-vectors,

0:11:51 | and omega. remark that the score can now be written

0:11:54 | with a vector of size

0:11:58 | d plus one, instead of the d-squared coefficients of the initial

0:12:04 | discriminative training.
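A minimal sketch of this reduced logistic-regression training on synthetic trials (the `expand` function below is a hypothetical per-dimension variant with a d+1-sized omega, not the talk's exact expansion):

```python
import numpy as np

def expand(w1, w2):
    # hypothetical reduced expansion: one product term per dimension, plus an offset
    return np.append(w1 * w2, 1.0)

def total_cross_entropy(omega, Phi, y):
    # binary cross entropy of the sigmoid of the scores, and its gradient
    p = 1.0 / (1.0 + np.exp(-(Phi @ omega)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, Phi.T @ (p - y) / len(y)

# synthetic target / non-target trials
rng = np.random.default_rng(2)
d = 4
spk = rng.standard_normal((50, d))
tar = np.stack([expand(s + 0.1 * rng.standard_normal(d),
                       s + 0.1 * rng.standard_normal(d)) for s in spk])
non = np.stack([expand(rng.standard_normal(d), rng.standard_normal(d))
                for _ in range(50)])
Phi = np.vstack([tar, non])
y = np.concatenate([np.ones(50), np.zeros(50)])

omega = np.zeros(d + 1)
losses = []
for _ in range(200):                 # plain gradient descent on the d+1 weights
    loss, grad = total_cross_entropy(omega, Phi, y)
    losses.append(loss)
    omega -= 0.5 * grad
```

Only d+1 parameters are trained, against d-squared in the unconstrained version.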

0:12:07 | our second approach is based on the works of Borgström and McCree.

0:12:13 | it can be remarked that, as the matrices are close to being diagonal,

0:12:18 | they are close to

0:12:22 | their diagonal matrices of eigenvalues,

0:12:23 | and so, following Borgström and McCree, we only

0:12:28 | perform a discriminative training

0:12:31 | intended to optimize the diagonal of Phi Phi-transposed, the scalar of

0:12:37 | Lambda,

0:12:38 | and the mean value mu.

0:12:44 | then we introduce an alternative to the logistic regression

0:12:50 | discriminative training.

0:12:55 | we define this vector,

0:12:59 | the expanded vector of the score of the trial —

0:13:05 | a vector

0:13:08 | with one

0:13:10 | component for each dimension of the

0:13:13 | eigenvoice subspace, and a last component which is

0:13:18 | the residual term.

0:13:21 | so the score is equal to the dot product of this vector

0:13:25 | and a vector of ones.

0:13:28 | the goal here is to replace this

0:13:31 | unique normal vector —

0:13:32 | the all-ones vector — by a

0:13:35 | basis of discriminant axes, extracted by using the Fisher criterion.

0:13:40 | then, since

0:13:41 | we have extracted

0:13:43 | not one but

0:13:45 | several vectors, we have to combine this

0:13:48 | basis of discriminant axes to find the unique normal vector

0:13:53 | needed by speaker detection.

0:13:58 | so we can use the Fisher criterion to

0:14:02 | extract the discriminant axes

0:14:07 | in this space of expanded vectors.

0:14:11 | consider a dataset comprised of trials, target and non-target

0:14:15 | trials;

0:14:17 | for each one of them we

0:14:20 | compute the expanded vector

0:14:23 | of the trial.

0:14:25 | on this dataset, of constrained dimension,

0:14:31 | we can compute the statistics of the target and non-target trials:

0:14:37 | the within-class and between-class covariance matrices of

0:14:41 | this dataset —

0:14:45 | in this case a two-class classifier, target versus non-target — and we can extract the axis

0:14:51 | maximizing the Fisher criterion

0:14:54 | of equation nine.

0:15:01 | problem:

0:15:02 | you can easily see the problem —

0:15:05 | with two classes,

0:15:08 | the between-class matrix is of rank one, so we can only

0:15:12 | extract one nonzero

0:15:14 | eigenvalue.

0:15:16 | one axis only can be extracted, because we are

0:15:21 | limited by the number of classes.

0:15:25 | but some time ago, a recursive method was proposed in order

0:15:30 | to extract more axes than classes using the Fisher criterion — the method

0:15:35 | behind our orthonormal discriminative classifier.

0:15:39 | it was sometimes used in face recognition,

0:15:47 | in the two-thousands;

0:15:49 | researchers used it in those areas.

0:15:52 | the idea is the following: given a training corpus

0:15:56 | of expanded vectors

0:15:59 | of score trials,

0:16:01 | target and non-target trials,

0:16:03 | we compute the statistics and extract the vector

0:16:10 | which maximizes the Fisher criterion,

0:16:13 | and then

0:16:15 | we project the dataset onto the orthogonal subspace of this vector.

0:16:20 | so we extract a vector and we

0:16:24 | project the data onto the hyperplane orthogonal to this vector,

0:16:31 | and we iterate; so we can extract more axes

0:16:35 | than

0:16:37 | there are classes.

0:16:41 | it can be noted that the Fisher criterion is a geometrical approach which doesn't need

0:16:48 | assumptions of Gaussianity; and the expanded vectors corresponding to the scores

0:16:55 | are not Gaussian-distributed.

0:16:58 | it can be shown that each component of the expanded score, for one

0:17:04 | given dimension, follows a non-central chi-squared distribution, with distinct parameters

0:17:10 | for target trials and non-target trials.

0:17:14 | moreover, supposing that we

0:17:17 | carry out an experiment using expanded vectors of scores drawn from these chi-squared distributions,

0:17:24 | we obtain exactly the same values.

0:17:26 | so we set aside the idea of

0:17:30 | modeling them probabilistically, because the chi-squared modeling

0:17:33 | does not bring new information

0:17:36 | when i-vectors follow a standard normal prior;

0:17:41 | the distribution of the components of the score

0:17:46 | stays the same.

0:17:49 | so we use this method to extract the

0:17:54 | discriminant axes.

0:17:57 | the remaining issue to address is to combine this subspace of

0:18:02 | discriminant

0:18:03 | axes to

0:18:05 | obtain the unique

0:18:07 | normal vector needed by speaker detection: we need only

0:18:11 | one vector to apply.

0:18:14 | so we have to find weights to

0:18:18 | apply to each

0:18:19 | orthonormal discriminant vector.

0:18:25 | we propose

0:18:27 | weights equal to the norms of the vectors,

0:18:30 | because this way it can be shown that the variance of the scores along

0:18:37 | the axes

0:18:41 | is decreasing over the iterations.

0:18:43 | so this method is similar to a singular value decomposition,

0:18:48 | in which we extract the

0:18:51 | most important axes in terms of score variability, then

0:18:56 | the others,

0:18:58 | with decreasing variance; and remark that, at the end,

0:19:02 | the impact of the last

0:19:06 | discriminant vectors on the score is negligible.

0:19:11 | so,

0:19:14 | equation ten shows that, for a trial, we apply the rotation by B, compute the

0:19:20 | expanded vector between the two i-vectors,

0:19:24 | and take the dot product

0:19:27 | of this expanded vector with

0:19:31 | the single vector equal to the

0:19:33 | weighted sum of the Fisher criterion

0:19:37 | axes.
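A small sketch of this combination step (toy orthonormal axes and assumed decreasing weights): the several discriminant axes collapse into a single scoring vector omega, and scoring a trial is one dot product.

```python
import numpy as np

rng = np.random.default_rng(3)
D, k = 8, 3
# toy orthonormal discriminant axes (rows), e.g. from the recursive extraction
axes = np.linalg.qr(rng.standard_normal((D, k)))[0].T
weights = np.array([2.0, 1.0, 0.5])     # e.g. the norms, decreasing over iterations

omega = weights @ axes                  # single scoring vector: weighted sum of axes
phi = rng.standard_normal(D)            # expanded vector of one trial
score = phi @ omega                     # speaker detection needs only this one vector
```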

0:19:40 | for the training task: even if the dimension of the expanded vector

0:19:46 | is far lower than that of the full score,

0:19:50 | we can have more than one hundred million non-target

0:19:56 | trials,

0:19:57 | and we have to compute the covariance matrix of a

0:20:01 | set of

0:20:05 | several hundred

0:20:07 | million

0:20:09 | trials.

0:20:11 | we can parameterize these scores with the statistics of

0:20:17 | the training set;

0:20:18 | these statistics can be expressed

0:20:22 | as linear combinations

0:20:24 | of the statistics of subsets,

0:20:26 | so it's possible to split the task —

0:20:31 | in our experiments, to split the computation of these statistics over the

0:20:38 | huge training dataset.
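The subset-statistics trick can be sketched as follows (toy data; per-chunk first- and second-order accumulators combine linearly into the global mean and covariance, so the huge trial set never has to be held at once):

```python
import numpy as np

def merge_stats(chunks):
    # accumulate zeroth/first/second order statistics per subset, then combine:
    # the global mean and covariance are linear combinations of subset statistics
    n = sum(len(c) for c in chunks)
    s1 = sum(c.sum(axis=0) for c in chunks)     # first-order accumulator
    s2 = sum(c.T @ c for c in chunks)           # second-order accumulator
    mean = s1 / n
    cov = s2 / n - np.outer(mean, mean)         # biased (maximum-likelihood) covariance
    return mean, cov
```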

0:20:41 | another remark,

0:20:44 | which was not made by the authors of this method:

0:20:50 | the method needs,

0:20:51 | theoretically, to project the data onto the orthogonal subspace

0:20:55 | at each iteration,

0:20:56 | and if you have

0:20:59 | billions of data points, it's very long. but it is possible

0:21:06 | to extract the discriminant axes without

0:21:10 | the concern of projecting the data at each iteration: only by updating the statistics,

0:21:16 | it is possible to extract the axes without

0:21:21 | any effective projection of the data at each iteration.

0:21:26 | now the results,

0:21:28 | on the NIST recognition condition five

0:21:33 | of the two thousand ten telephone extended evaluation,

0:21:40 | with i-vectors provided by

0:21:44 | Brno University of Technology —

0:21:47 | thanks to them —

0:21:53 | for the male set and the female set.

0:21:55 | the first line, for each set, is the baseline,

0:22:00 | PLDA;

0:22:02 | then the two approaches using logistic regression, on the score coefficients and on the PLDA parameters;

0:22:09 | and the fourth line is our orthonormal discriminative classifier.

0:22:15 | we can see first that the logistic regression based approaches struggle to improve the

0:22:20 | performance of PLDA.

0:22:23 | maybe, even if the training

0:22:30 | is constrained,

0:22:34 | there is overfitting on the development data,

0:22:40 | and the results are not better than PLDA.

0:22:45 | maybe, also, after length normalization the i-vectors are more

0:22:50 | Gaussian —

0:22:51 | it improves Gaussianity —

0:22:53 | and thus logistic regression is unable

0:22:56 | to improve

0:22:59 | the performance any further.

0:23:01 | we remark that the orthonormal discriminative classifier is able to improve performance in terms of

0:23:08 | equal error rate

0:23:09 | and minimum detection cost,

0:23:12 | for the male set and even more for the female set.

0:23:17 | note that, to take into account the distortions in the detection cost region of

0:23:22 | low false alarm rates,

0:23:24 | it is possible to train it only on the trials providing the highest

0:23:32 | scores — the non-target trials providing the highest scores:

0:23:38 | keeping the densest and highest non-target

0:23:42 | trial scores,

0:23:44 | we trained the classifier

0:23:50 | with only a subset of the non-target set.

0:23:56 | we also tested on the recent Speakers in the Wild

0:24:01 | evaluation, which is a good way to assess the robustness of an approach,

0:24:04 | because the conditions are not controlled,

0:24:08 | with reverberation, noise, short durations, and mixing of

0:24:12 | male and female.

0:24:14 | we can see that this orthonormal discriminative classifier

0:24:19 | is able to slightly improve the performance of PLDA.

0:24:26 | note that the results indicated

0:24:32 | on the official scoreboard are better than ours,

0:24:39 | because we did

0:24:40 | not correctly calibrate

0:24:45 | our scores on the development set,

0:24:48 | and so the results differ between the

0:24:51 | two versions.

0:24:54 | future works: we are working on short-duration utterances, where the method

0:24:59 | is able to improve slightly —

0:25:02 | sometimes more —

0:25:03 | over the PLDA baseline,

0:25:06 | in particular when the estimation of the speaker variability is not very accurate,

0:25:14 | as is the case for short durations;

0:25:16 | and also on i-vector-like representations:

0:25:24 | following works which propose

0:25:27 | to extract low-dimensional speaker factors for speaker diarization

0:25:32 | by using deep neural networks,

0:25:35 | we showed that this PLDA framework is able to deal with

0:25:42 | these new representations

0:25:45 | and with session mismatch.

0:25:50 | thank you |