0:00:29 … university [in] Spain, speaker recognition.
0:01:02 i-vector speaker recognition.
0:01:16 To get the parameters of the PLDA, we need to compute point estimates of
0:01:23 the parameters
0:01:24 by supervised maximum likelihood,
0:01:30 [which requires] plenty of
0:01:43 development data.
0:02:04 The PLDA considers the i-vector decomposed [into speaker and channel terms],
0:02:22 where the prior is Gaussian.
0:02:34 To use this model [we need]
0:02:41 a large amount of data.
0:02:47 If we don't have a large [amount] of data, we are forced to [use only a]
0:02:54 speaker vector,
0:03:03 where the prior for y is Gaussian.
0:03:14 In this case we need less [data].
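The decomposition being described (an i-vector as a global mean, a speaker term, and a Gaussian residual) can be sketched in a few lines. This is a toy sketch, not the speaker's actual configuration: the dimensions, the identity channel precision, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, ny = 400, 90        # i-vector dim and speaker-vector dim (illustrative)
mu = rng.normal(size=d)           # global mean
V = rng.normal(size=(d, ny))      # speaker subspace (eigenvoice) matrix
W = np.eye(d)                     # within-class (channel) precision, assumed identity here

def sample_speaker_ivectors(n_segments):
    """Draw i-vectors for one speaker: phi = mu + V y + eps,
    with speaker vector y ~ N(0, I) and residual eps ~ N(0, W^-1)."""
    y = rng.normal(size=ny)       # shared across all of this speaker's segments
    eps = rng.multivariate_normal(np.zeros(d), np.linalg.inv(W), size=n_segments)
    return mu + V @ y + eps       # shape (n_segments, d)
```

All segments of one speaker share the same y; that shared factor is what lets PLDA separate speaker variability from channel variability.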
0:03:24 So if we have, for example, twenty …
0:03:30 a number of …
0:03:36 [and a] dimension of the speaker vector [of] ninety.
0:03:44 In the Bayesian approach,
0:03:59 for the parameters,
0:04:04 we assume they are [random variables]: we place priors
0:04:13 on the model parameters,
0:04:15 and then we compute the posterior
0:04:20 given the [development] i-vectors.
0:04:45 In this case we compute the posterior [of the parameters];
0:04:56 from now on we call this [posterior] the prior [for the target domain].
0:05:06 And finally we take [the adapted parameters]
0:05:13 by computing their expected values given the target posterior.
0:05:20 To get the posterior of the model parameters,
0:05:31 what we do is decompose [it]:
0:05:35 we assume the model parameters' [posterior factorizes],
0:05:47 then we compute [the factors] in a cyclic fashion,
0:05:57 and finally we approximate [the true posterior with the factorized one].
0:06:19 [Here the relevant count] is the number of speakers in the database,
0:06:22 and [in] the posterior for the channels,
0:06:29 [it] is the number of segments in the [database].
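The "cyclic fashion" described here is coordinate-ascent variational Bayes: each factor of the assumed factorized posterior is updated in turn, holding the expectations of the others fixed. The actual PLDA updates are involved, so the following is a standard toy example of the same scheme, not the paper's model: a 1-D Gaussian with unknown mean and precision under conjugate Normal-Gamma priors (all numbers illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=500)   # toy observations
n, xbar = len(x), x.mean()

# Broad conjugate priors: mu ~ N(m0, 1/(beta0*tau)), tau ~ Gamma(a0, b0)
m0, beta0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_tau = 1.0                        # initial guess for E[tau]
for _ in range(50):                # the "cyclic" coordinate-ascent updates
    # Update q(mu) = N(m, 1/((beta0+n)*E_tau)), holding q(tau) fixed
    m = (beta0 * m0 + n * xbar) / (beta0 + n)
    E_mu = m
    E_mu2 = m**2 + 1.0 / ((beta0 + n) * E_tau)
    # Update q(tau) = Gamma(a, b), holding q(mu) fixed
    a = a0 + (n + 1) / 2
    b = b0 + 0.5 * (np.sum(x**2) - 2 * E_mu * np.sum(x) + n * E_mu2
                    + beta0 * (E_mu2 - 2 * m0 * E_mu + m0**2))
    E_tau = a / b
```

After convergence, the factorized q(mu)q(tau) approximates the joint posterior; point estimates are then read off as expectations under these factors, mirroring the procedure described in the talk.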
0:06:35 Then we can compute [the adapted model]
0:06:38 for the target data set.
0:06:47 [Going] from the original data set to the target data set,
0:06:54 we can compute the weight of the prior
0:06:59 [relative to the] target data.
0:07:01 To do that we should modify the prior distribution:
0:07:05 the weight of the prior depends
0:07:10 on the number of speakers
0:07:13 that we have in the [original] data set.
0:07:19 So we change the parameters:
0:07:22 [if] we want to multiply the weight of the prior [by some factor],
0:07:29 we need to modify the alpha …
0:07:31 These two parameters [change],
0:07:42 but at the same time they give the same expectation values for [the model parameters].
0:07:49 We can do the same with the prior of W,
0:07:53 and finally,
0:07:59 for the number of speakers and the number of segments,
0:08:03 [we obtain an] effective number of speakers and segments for the prior Gaussian.
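One way to read the trick described here: for a Wishart prior on a precision matrix, scaling the degrees of freedom down and the scale matrix up by the same factor lowers the prior's effective sample count (its weight) while leaving its expectation unchanged. A sketch under that reading; the parametrization E[W] = nu * S is an assumption for illustration, not necessarily the paper's notation.

```python
import numpy as np

def reweight_wishart_prior(nu, S, weight):
    """Scale a Wishart(nu, S) prior by a relative weight.

    nu plays the role of an effective sample count, so multiplying it by
    `weight` (and dividing S by `weight`) reweights the prior while
    keeping its expectation E[W] = nu * S unchanged."""
    return weight * nu, S / weight

# Example: a prior worth ~500 samples, reweighted to be worth ~50
nu, S = 500.0, np.eye(3) / 500.0
nu2, S2 = reweight_wishart_prior(nu, S, weight=0.1)
assert np.allclose(nu * S, nu2 * S2)   # expected precision is preserved
```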
0:08:10 We are going to compare our method [with length normalization].
0:08:14 The normalization
0:08:20 does centering and whitening
0:08:30 to make [the i-vectors] more Gaussian,
0:08:41 [and then projects them onto the] unit hypersphere
0:08:49 to reduce the data set [mismatch].
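Length normalization as described (centering, whitening, then projection onto the unit hypersphere) might look like this; estimating the mean and whitening transform from the data passed in is an assumption made for the sketch.

```python
import numpy as np

def length_normalize(X, mu=None, W=None):
    """Center, whiten, and project i-vectors onto the unit hypersphere.

    X  : (n, d) matrix of i-vectors, one per row.
    mu : mean of the development set (estimated from X if None).
    W  : whitening matrix (estimated from X's covariance if None).
    """
    if mu is None:
        mu = X.mean(axis=0)
    Xc = X - mu                                   # centering
    if W is None:
        # Whitening via the inverse Cholesky factor of the covariance.
        C = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        W = np.linalg.inv(np.linalg.cholesky(C)).T
    Xw = Xc @ W                                   # whitening
    # Scale every vector to unit length (the hypersphere projection).
    return Xw / np.linalg.norm(Xw, axis=1, keepdims=True)
```

In deployment the mean and whitening transform would be fixed from development data and reused on the target data.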
0:08:56 Now I explain the data sets.
0:09:04 This is the
0:09:07 [target] data set we will use,
0:09:13 similar to the
0:09:18 telephone channels [of the development data];
0:09:26 it contains 30 male and 30 female [speakers].
0:09:29 The data has similar conditions,
0:09:40 [with segments of] two to three minutes.
0:09:52 [As the] data set with [a] large [number of speakers],
0:09:55 we use this [one]
0:10:04 that contains more than five hundred males and seven hundred females,
0:10:12 and it has a variety of channels.
0:10:18 [For] speaker verification,
0:10:24 we got twenty MFCCs plus deltas and …
0:10:36 We build the system …
0:10:50 We use the normalization too
0:10:53 [for] the parameters,
0:11:02 and finally we used s-norm score normalization with cohorts from the
0:11:09 first [data set].
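s-norm (symmetric score normalization) averages the z-norm and t-norm of a raw score: the score is standardized both against the enrollment model's scores on a cohort and against the test segment's scores on the same cohort. A minimal sketch, with all names illustrative:

```python
import numpy as np

def s_norm(score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (s-norm).

    score                : raw verification score for one trial.
    enroll_cohort_scores : scores of the enrollment model against a cohort.
    test_cohort_scores   : scores of the test segment against the same cohort.
    """
    ze = (score - enroll_cohort_scores.mean()) / enroll_cohort_scores.std()
    zt = (score - test_cohort_scores.mean()) / test_cohort_scores.std()
    return 0.5 * (ze + zt)   # average of z-norm and t-norm
```

The symmetry (the same formula whichever side is enrollment) is what distinguishes s-norm from plain z-norm or t-norm alone.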
0:11:24 We compare [the systems].
0:11:34 We can see [an] improvement …
0:11:50 We can see that …
0:11:58 the prior distribution …
0:12:01 If we compare, for instance, the first line and the last line: [in] equal error rate,
0:12:07 [an improvement of] forty percent for males and fourteen percent for females; for min DCF, [an] improvement
0:12:13 of twelve percent for males and forty-six percent for females.
0:12:17 Here is a table comparing different parameters.
0:12:27 We can see …
0:12:41 Here we show length normalization with s-norm and without s-norm.
0:12:48 When we use [it, there is an]
0:12:57 improvement using [the normalized] i-vectors, but not as much as …
0:13:09 We can see too that,
0:13:11 in this data set, [i-]vector normalization
0:13:23 [is] better or …
0:13:29 Here we show some improvements …
0:14:03 and for females …
0:14:42 We see that …
0:14:49 We can see that without normalization …
0:14:58 Finally, the conclusions: we have developed a method to adapt a PLDA
0:15:03 i-vector classifier from a domain with a large amount of development data to a domain
0:15:07 with scarce development data.
0:15:09 We have conducted experiments [and]
0:15:15 we can see this technique improves the performance of the system,
0:15:19 and this improvement mainly comes from the adaptation of the channel matrix W.
0:15:28 We have compared this method with length normalization
0:15:38 [and] we have better results.
0:15:48 We have discussed length normalization …
0:15:51 As future work: Bayesian adaptation of the UBM and the i-vector extractor.
0:16:22 [Q&A] No, the i-vector length means [the norm of the vector],
0:16:31 not the dimension of the i-vector.
0:17:40 Maybe we can do the same,
0:17:45 as we have more norm data.