0:00:15 So here I present investigations about discriminative training applied to i-vectors that have been previously normalized.
0:00:28 This slide shows the system on which we focus: a standard i-vector based speaker recognition system. First, normalization: within-class covariance normalization, then length normalization; then PLDA modeling, providing the parameters (the mean value mu and the covariance matrices) and the LLR score.
0:00:57 Some works have pointed out one of two ways to optimize the parameters of this PLDA modeling in a discriminative way. These discriminative classifiers use logistic regression maximization, applied either to the score coefficients of PLDA or to the PLDA parameters themselves.
0:01:30 The goal here is to add a new step, an additional step, to the normalization procedure, which doesn't modify the distances between i-vectors, and which is intended to constrain the discriminative training.
0:01:49 Once this additional normalization step is carried out, it's possible to train the discriminative classifier with a limited number of coefficients: the number of coefficients to optimize in a discriminative way decreases to the order of d, the dimension of the i-vector.
0:02:13 Then we carry out the state-of-the-art logistic regression based discriminative training, and also a new approach, the orthonormal discriminative classifier, which is a novel technique.
0:02:28 First, some notation: in the PLDA model, the speaker term, built with the matrix Phi, is assumed to be statistically independent of the residual, and is constrained to lie in a low-rank subspace, the eigenvoice subspace.
0:02:53 This model has been, in recent years, the most commonly used one in speaker recognition.
0:03:07 So the LLR score can be written as a second-degree polynomial function of the components of the two vectors of the trial, w1 and w2, which can be written out with the matrices P and Q.
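As a reminder of the form this takes: in the usual Gaussian PLDA formulation (a standard result; the notation below is mine and the constants are absorbed into the offset), with \(\Sigma_{\mathrm{ac}} = \Phi\Phi^{\top}\) the across-class covariance and \(\Sigma_{\mathrm{tot}} = \Phi\Phi^{\top} + \Lambda\) the total covariance of centered i-vectors,

$$ s(w_1, w_2) = w_1^{\top} Q\, w_1 + w_2^{\top} Q\, w_2 + 2\, w_1^{\top} P\, w_2 + \mathrm{const}, $$

$$ Q = \tfrac{1}{2}\Big(\Sigma_{\mathrm{tot}}^{-1} - \big(\Sigma_{\mathrm{tot}} - \Sigma_{\mathrm{ac}}\,\Sigma_{\mathrm{tot}}^{-1}\,\Sigma_{\mathrm{ac}}\big)^{-1}\Big), \qquad P = \tfrac{1}{2}\,\Sigma_{\mathrm{tot}}^{-1}\,\Sigma_{\mathrm{ac}}\,\big(\Sigma_{\mathrm{tot}} - \Sigma_{\mathrm{ac}}\,\Sigma_{\mathrm{tot}}^{-1}\,\Sigma_{\mathrm{ac}}\big)^{-1}. $$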
0:03:28 Recall that the state-of-the-art logistic regression based discriminative classifiers try to optimize coefficients initialized by the PLDA modeling. They use as loss the probability of correctly classifying the training trials (target as target, non-target as non-target), called the total cross entropy, by using gradient descent with respect to some coefficients.
0:04:01 The coefficients that have to be optimized can be the PLDA score coefficients, that is, the matrices P and Q of the previous slide.
0:04:11 Following this way, Burget et al. proposed an approach where the LLR score is written as a dot product between an expanded vector of the trial and a vector omega, which is initialized with the PLDA parameters.
0:04:30 Borgström and McCree proposed, in 2013, to optimize the PLDA parameters themselves (the mean value, the eigenvoice subspace matrix Phi and the nuisance variability matrix Lambda) by using this total cross entropy function.
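To make this family of approaches concrete, here is a minimal Python sketch of logistic-regression discriminative training of the score coefficients, assuming centered i-vectors; expand_trial and train are hypothetical helpers, the expansion is the full order-d² one, and real systems use weighted priors and smarter optimizers:

```python
import numpy as np

def expand_trial(w1, w2):
    """Expanded vector of a trial: the monomials whose dot product with a
    coefficient vector omega reproduces the second-degree PLDA score
    (full expansion, of order d**2 coefficients)."""
    return np.concatenate([
        2.0 * np.outer(w1, w2).ravel(),   # cross terms, paired with vec(P)
        np.outer(w1, w1).ravel(),         # within terms, paired with vec(Q)
        np.outer(w2, w2).ravel(),         # idem for the second i-vector
        [1.0],                            # offset
    ])

def cross_entropy_grad(omega, X, labels):
    """Total cross entropy of the target/non-target classification of the
    training trials, and its gradient w.r.t. omega."""
    p = 1.0 / (1.0 + np.exp(-(X @ omega)))       # posterior of "target"
    eps = 1e-12
    loss = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    grad = X.T @ (p - labels) / len(labels)
    return loss, grad

def train(omega0, X, labels, lr=0.1, iters=200):
    """Plain gradient descent; omega0 comes from the generative PLDA."""
    omega = omega0.copy()
    for _ in range(iters):
        _, grad = cross_entropy_grad(omega, X, labels)
        omega -= lr * grad
    return omega
```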
0:04:56 Discriminative training suffers from two limitations. The first that comes to mind is overfitting on the development data; the second is the respect of the parameter conditions: the covariance matrices must be positive definite, and the score matrices P and Q must satisfy negative or positive definiteness conditions.
0:05:22 So some solutions have been proposed. Constrained discriminative training attempts to train only a small number of parameters, of the order of d, where d is the dimension of the i-vector, rather than of the order of d squared for the full score.
0:05:42 Solutions proposed, for example, by Rohdin et al., following the Burget framework, optimize only some coefficients for each dimension of the i-vector, and also a weight for each term that makes up the score.
0:06:02 You can see that the score is composed of a sum of several terms; it is possible to optimize one coefficient for each of these terms.
0:06:21 Also, only the mean vector and the eigenvalues of the PLDA matrices can be trained, optimizing either a scaling factor per dimension or a unique scalar for each matrix.
0:06:39 It is also possible to use a singular value decomposition of P to parameterize it, so as to respect the parameter conditions.
0:06:50 While discriminative training has provided interesting results when the i-vectors were not normalized, it struggles to improve speaker detection when the i-vectors have first been normalized, whereas this configuration achieves the best performance.
0:07:09 Here we propose an additional normalization step, simple to carry out, intended to constrain the discriminative training.
0:07:19 Recall that, after within-class covariance normalization, the within-class covariance matrix W is the identity; after length normalization, it has been shown that it remains almost exactly isotropic, I mean an identity matrix up to a scalar.
0:07:37 We propose simply a rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset, computed by the eigendecomposition of B, and we apply this matrix of eigenvectors of B to each i-vector, training or test.
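A minimal sketch of this rotation step, assuming i-vectors already length-normalized, stored row-wise in a NumPy array with one speaker label per row (function names are mine):

```python
import numpy as np

def between_class_covariance(ivectors, speaker_ids):
    """Between-class covariance B of a labelled training set
    (ivectors: (n, d) array, speaker_ids: (n,) array of labels)."""
    mu = ivectors.mean(axis=0)
    d = ivectors.shape[1]
    B = np.zeros((d, d))
    speakers = np.unique(speaker_ids)
    for spk in speakers:
        m = ivectors[speaker_ids == spk].mean(axis=0) - mu
        B += np.outer(m, m)
    return B / len(speakers)

def diagonalizing_rotation(ivectors, speaker_ids):
    """Orthogonal eigenvector basis of B; applying it to every i-vector
    (training and test) diagonalizes B while leaving distances, and the
    almost-isotropic W, unchanged."""
    B = between_class_covariance(ivectors, speaker_ids)
    eigvals, eigvecs = np.linalg.eigh(B)
    V = eigvecs[:, np.argsort(eigvals)[::-1]]  # decreasing eigenvalue order
    return V                                    # rotated i-vector: V.T @ w
```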
0:07:58 This is a very simple operation which doesn't modify the distances between i-vectors. After it, the between-class covariance matrix B is diagonal, while W remains almost exactly isotropic, and therefore unchanged, because the eigenvector basis of B is orthogonal.
0:08:16 The key point is that we assume that the PLDA matrices Phi Phi^T and Lambda become almost diagonal as well. As a consequence, the matrices P and Q involved in the LLR score become almost diagonal.
0:08:36 Moreover, as the within-class covariance matrix is almost exactly equal to the identity up to a constant, the LDA solution is almost exactly the subspace spanned by the first eigenvectors of B; so the first components of the rotated i-vectors approximate their projection onto the LDA subspace.
0:09:00 So the score can be written as a sum of one-dimensional terms: there is one term for each dimension of the i-vector, plus a residual term epsilon, which gathers the off-diagonal terms of the initial scoring, the diagonal terms beyond the first dimensions, and the offsets.
0:09:29 Thus a substantial proportion of the LLR score can be concentrated into this sum of independent one-dimensional terms.
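One way to write the decomposition described here, reusing the diagonal entries \(p_{ii}\) and \(q_{ii}\) of the score matrices above (notation mine):

$$ s(w_1, w_2) \approx \sum_{i=1}^{d} \Big( q_{ii}\,\big(w_{1i}^{2} + w_{2i}^{2}\big) + 2\, p_{ii}\, w_{1i}\, w_{2i} \Big) + k + \varepsilon, $$

where \(k\) is the offset and \(\varepsilon\) the residual term gathering everything else.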
0:09:42 Here is an analysis of the PLDA parameters before and after this diagonalization. We measure the diagonality of the matrices: a maximal value of one indicates an exactly diagonal matrix.
0:09:58 We can see that, right after the rotation, all the values are close to one, so the PLDA matrices are very close to being diagonal, and so are the score matrices P and Q.
0:10:14 As for LDA, using the distance between the projections onto the two subspaces, we see that the subspace of the first eigenvectors of B and the LDA subspace are almost exactly the same: the mismatch is negligible.
0:10:34 To assess the residual term's variance, we compute, on the last line of the table, the ratio between the variance of the residual term and the variance of the whole score, and we can see that, for both the male and female training sets, the values are close to zero.
0:10:55 In terms of performance, we compare the full PLDA baseline with the simplified scoring in which we have removed the residual term, and we can see that this residual term plays little or no role in speaker detection.
0:11:18 So we can carry out discriminative training applied to these vectors.
0:11:26 First, the state-of-the-art logistic regression based approach, following Burget: since only the diagonal coefficients remain significant, the discriminative training can be performed by optimizing a vector omega such that the score is a dot product between an expanded vector of the trial, given the two i-vectors, and omega.
0:11:51 Remark that in the original approach the expanded vector is of the order of d squared, whereas here it is of the order of d, and the PLDA coefficients provide the initialization of the discriminative training.
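What this reduced expansion could look like (a sketch under my own naming; linear terms are included for generality and vanish for centered i-vectors):

```python
import numpy as np

def expand_trial_diagonal(w1, w2):
    """Reduced expansion once the score matrices are (almost) diagonal:
    one cross term and one within term per i-vector dimension, plus
    linear terms and an offset -- order d coefficients to train instead
    of order d**2 with the full expansion."""
    return np.concatenate([
        w1 * w2,          # diagonal cross terms, paired with the p_ii
        w1**2 + w2**2,    # diagonal within terms, paired with the q_ii
        w1 + w2,          # linear terms (vanish for centered i-vectors)
        [1.0],            # offset
    ])
```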
0:12:07 Our second approach is based on the works of Borgström and McCree. It can be remarked that, as the matrices are close to diagonal, they are close to their diagonal matrix of eigenvalues.
0:12:23 So, following Borgström and McCree, we perform a discriminative training intended to optimize only the diagonal of Phi Phi^T, the scalar value of Lambda, and the mean value mu.
0:12:44 Then we introduce our novel alternative to the logistic regression discriminative training.
0:12:55 We define the expanded score vector of the trial, shown here: a vector with one component for each dimension of the eigenvoice subspace, and a last component which is the residual term.
0:13:21 So the score is equal to the dot product of this vector with a vector of ones.
0:13:28 The goal here is to replace this unique vector of ones by a basis of discriminant axes extracted by using the Fisher criterion. Then, as we extract not one but several vectors, we have to combine this basis of discriminant axes to find the unique vector needed by speaker detection.
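A sketch of this expanded score vector and of the baseline scoring, following the diagonal form above (names are mine):

```python
import numpy as np

def expanded_score_vector(w1, w2, p_diag, q_diag, k, eps=0.0):
    """One component per eigenvoice-subspace dimension, plus a last
    component carrying the residual term; its dot product with the
    all-ones vector gives back the (almost diagonal) LLR score."""
    per_dim = q_diag * (w1**2 + w2**2) + 2.0 * p_diag * (w1 * w2)
    return np.append(per_dim, k + eps)

# baseline scoring: dot product with the all-ones vector; the proposed
# classifier replaces this ones vector by a learned direction u
# score = expanded_score_vector(w1, w2, p, q, k) @ np.ones(d + 1)
```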
0:13:58 So we use the Fisher criterion to extract the discriminant axes in this space of expanded vectors.
0:14:11 Consider a dataset comprised of trials, target and non-target; for each one of them we compute the expanded vector of the trial.
0:14:25 On this dataset, of constrained dimension, we can compute the statistics of the target and non-target trials: the within-class and between-class covariance matrices of this dataset.
0:14:45 This is the case of a two-class classifier (target versus non-target), and we can extract the axes maximizing the Fisher criterion of equation nine.
0:15:01 The problem, as is well known with two classes, is that the between-class matrix is of rank one, so we can only extract one non-zero eigenvalue: one axis only can be extracted, because we are limited by the number of classes.
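For two classes the Fisher solution has the well-known closed form \(a \propto S_w^{-1}(\mu_{\mathrm{tar}} - \mu_{\mathrm{non}})\); a minimal sketch (illustrative function name):

```python
import numpy as np

def fisher_axis(X_tar, X_non):
    """Single Fisher discriminant axis for the two-class case: with two
    classes the between-class scatter has rank one, so the criterion is
    maximized in closed form by Sw^{-1} (mu_tar - mu_non)."""
    mu_t, mu_n = X_tar.mean(axis=0), X_non.mean(axis=0)
    Sw = np.cov(X_tar, rowvar=False) + np.cov(X_non, rowvar=False)
    return np.linalg.solve(Sw, mu_t - mu_n)
```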
0:15:25 But, some time ago, a method was proposed in order to extract more axes than classes using the Fisher criterion. This method is the basis of our orthonormal discriminative classifier; it has sometimes been used in face recognition, where researchers applied it in the 2000s.
0:15:52 The idea is the following: given a training corpus of expanded vectors of trials, target and non-target, we compute the statistics and we extract the vector which maximizes the Fisher criterion of equation nine.
0:16:15 Then we project the dataset onto the orthogonal subspace of this vector: we extract a vector, then we project the data onto the hyperplane orthogonal to this vector,
0:16:31 and we iterate; in this way we can extract more axes than classes.
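A minimal sketch of this iterative extraction (my own naming; the pseudo-inverse copes with the rank deficiency that deflation introduces in the within-class scatter):

```python
import numpy as np

def orthonormal_discriminant_axes(X_tar, X_non, n_axes):
    """Iteratively extract Fisher axes for the two-class (target /
    non-target) problem, projecting the data onto the orthogonal
    complement of each extracted axis before the next iteration, so
    that more axes than classes can be obtained."""
    axes, norms = [], []
    Xt, Xn = X_tar.astype(float).copy(), X_non.astype(float).copy()
    for _ in range(n_axes):
        Sw = np.cov(Xt, rowvar=False) + np.cov(Xn, rowvar=False)
        raw = np.linalg.pinv(Sw) @ (Xt.mean(axis=0) - Xn.mean(axis=0))
        n = np.linalg.norm(raw)
        a = raw / n
        axes.append(a)
        norms.append(n)
        # deflate: remove from every vector its component along a
        Xt -= np.outer(Xt @ a, a)
        Xn -= np.outer(Xn @ a, a)
    return np.array(axes), np.array(norms)
```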
0:16:41 It can be noted that the Fisher criterion is a geometrical approach which doesn't need any assumption of Gaussianity, for the vectors corresponding to the trial scores are not normally distributed:
0:16:58 it can be shown that each component of the expanded score vector, for one given dimension, follows dependent non-central chi-squared distributions, with distinct parameters for target trials and non-target trials.
0:17:14 It can be checked, moreover, that if we carry out an experiment using expanded vectors drawn from these chi-squared distributions, we obtain exactly the same results as when extracting i-vectors from the standard normal prior:
0:17:26 the chi-squared model does not lose or add any information, so this really is the distribution of the components of the score.
0:17:49 But if we use this method to extract the discriminant axes, a question remains to be addressed: how to combine this subspace of discriminant axes to obtain the unique vector needed by speaker detection, since we need only one vector to apply at test time.
0:18:14 So we have to find the weights to apply to each orthonormal discriminant vector.
0:18:25 We propose weights equal to the norms of the extracted vectors, because in this way it can be shown that the variance of the scores along the axes is decreasing over the iterations.
0:18:43 So this method is similar to a singular value decomposition, in which we extract first the most important axes in terms of score variability, then the others with decreasing variance; and we remark that, at the end, the impact of the last orthonormal discriminant vectors on the score is negligible.
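Reusing the sketch above, the weighted combination could look like this on toy data (shapes only, not a real experiment; here d = 50 eigenvoice dimensions plus one residual component):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tar = rng.normal(1.0, 1.0, size=(1000, 51))   # toy expanded target trials
X_non = rng.normal(0.0, 1.0, size=(5000, 51))   # toy expanded non-target trials

axes, norms = orthonormal_discriminant_axes(X_tar, X_non, n_axes=20)
u = (norms[:, None] * axes).sum(axis=0)   # unit axes weighted by their norms
scores = X_tar @ u                        # trial scores: one dot product each
```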
0:19:11 So, equation ten shows that, for a trial, we only have to apply the rotation, compute the expanded vector given the two i-vectors, and take the dot product of this expanded vector with the weighted sum of the Fisher discriminant axes.
0:19:40 For the training, even if the dimension of the expanded vector is of the order of d and not d squared, we can have more than one hundred million non-target trials, and we have to compute the covariance matrices of this huge set of trials.
0:20:11 So we parameterize the scores by the statistics of the training set: since these statistics can be expressed as linear combinations of the statistics of subsets, it is possible to split the task; in our experiments we split the task of computing them over chunks of the training dataset.
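A minimal sketch of this split computation, assuming the expanded trial vectors arrive as an iterable of NumPy chunks (names are mine): the zeroth, first and second order statistics add up across subsets, so the mean and covariance of the whole set never require a joint pass:

```python
import numpy as np

def accumulate_stats(chunks):
    """Mean and covariance of a huge trial set from per-chunk sufficient
    statistics; the full set is never held in memory."""
    n, s1, s2 = 0, 0.0, 0.0
    for X in chunks:                  # X: (n_chunk, dim) expanded vectors
        n += X.shape[0]
        s1 = s1 + X.sum(axis=0)       # first-order statistic
        s2 = s2 + X.T @ X             # second-order statistic
    mu = s1 / n
    cov = s2 / n - np.outer(mu, mu)   # covariance from pooled statistics
    return mu, cov
```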
0:20:41 Another remark, which was not made by the authors of this method: a priori, it needs to project the data onto an orthogonal subspace at each iteration, and if you have billions of data points this is very long.
0:21:06 But, as shown in the paper, it is possible to extract the axes without effectively projecting the data at each iteration, only by updating the statistics.
0:21:26 Our experiments use condition five of the NIST SRE 2010 evaluation (telephone, extended trial lists), with i-vectors kindly provided by Brno University of Technology for the male set and the female set.
0:21:55 In each table, the first line is the PLDA baseline, then come the two approaches using logistic regression on the score coefficients and on the PLDA parameters, and the fourth line is our orthonormal discriminative classifier.
0:22:15 We can see, first, that the logistic regression based approaches struggle to improve the performance of PLDA.
0:22:23 Maybe this is because, even though the training is constrained, there is some overfitting on the development data, so the results are not better than PLDA;
0:22:45 and maybe also because, after length normalization, the i-vectors are more Gaussian: normalization improves Gaussianity, and so logistic regression is perhaps unable to further improve the performance.
0:23:01 We remark that the orthonormal discriminative classifier is able to improve the performance, in terms of equal error rate and of detection cost, for both the male and the female sets.
0:23:17 Note that, to take into account the distortions of the DET curve in the critical region of low false alarms, it is able to learn only on the trials providing the highest scores, that is, the non-target trials providing the highest scores.
0:23:44 Trained with only the highest non-target trial scores, it performed better than with the whole non-target set.
0:23:56 We also ran the recent Speakers in the Wild evaluation, which is a good way to assess the robustness of an approach, because the conditions are not controlled: there are reverberation, noise, short durations and mixed male/female speech.
0:24:14 We can see that the orthonormal discriminative classifier is able to slightly improve the performance of PLDA.
0:24:26 Note that the results presented here are not those indicated on the official scoreboard: we uploaded our scores but did not correctly calibrate them on the development set, and so the two versions of the results differ.
0:24:54 As future works, we are working on short duration utterances, for which the method is able to slightly improve, and sometimes more, over the PLDA baseline, in particular because the estimation of the speaker variability is not very accurate for short durations.
0:25:16 We are also working on i-vector-like representations: following works which propose to extract low-dimensional factors for speaker diarization by using deep neural networks, we showed that this PLDA framework is able to take such a new representation as input and to deal with it.
0:25:50 Thank you.