0:00:29 | university in Spain, speaker recognition

0:01:02 | i-vector speaker recognition |

0:01:11 | PLDA |

0:01:16 | to get the parameters of the PLDA, we need to do the point estimates of |

0:01:23 | the parameters |

0:01:24 | maximum likelihood, supervised

0:01:30 | plenty of data |

0:01:43 | development data from |

0:02:04 | the PLDA model considers that the i-vector decomposes

0:02:22 | where the prior is Gaussian |

0:02:34 | to use this model |

0:02:41 | a large amount of data

0:02:47 | if we don't have a large amount of data, we are forced to

0:02:54 | speaker vector |

0:03:03 | where the prior for y is Gaussian |

0:03:09 | Gaussian |
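The generative model being described, an i-vector decomposed into a global mean, a speaker term with a standard Gaussian prior, and residual channel noise, can be sketched as follows. The dimensions, matrix names, and covariance values here are illustrative assumptions, not numbers from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 400-dim i-vectors, 90-dim speaker factor y ~ N(0, I)
D, Q = 400, 90
mu = rng.normal(size=D)          # global mean
V = rng.normal(size=(D, Q))      # speaker subspace (eigenvoices)
C = 0.1 * np.eye(D)              # within-speaker (channel) covariance

def sample_ivectors(n_sessions):
    """Sample n_sessions i-vectors for one speaker: phi = mu + V y + eps."""
    y = rng.normal(size=Q)                            # speaker factor
    eps = rng.multivariate_normal(np.zeros(D), C, size=n_sessions)
    return mu + V @ y + eps                           # shape (n_sessions, D)

phis = sample_ivectors(3)
print(phis.shape)  # (3, 400)
```

Scoring a trial then amounts to asking whether two i-vectors share the same latent speaker factor y.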

0:03:14 | in this case we need less |

0:03:24 | so if we have for example twenty |

0:03:30 | a number of |

0:03:36 | dimension of speaker vector ninety |

0:03:44 | in the Bayesian approach |

0:03:59 | for the parameters |

0:04:04 | we assume that they are

0:04:09 | priors |

0:04:13 | on the model parameters |

0:04:15 | and then we compute the posterior |

0:04:20 | given the i-vectors and |

0:04:25 | so |

0:04:27 | methods |

0:04:32 | compute the posterior |

0:04:37 | prior |

0:04:45 | in this case we compute the posterior |

0:04:56 | from now on we call this prior |

0:05:06 | and finally we take |

0:05:13 | by computing their expected values given the target posterior |

0:05:20 | to get the posterior of the model parameters |

0:05:27 | solutions |

0:05:31 | what we do is decompose

0:05:35 | assume model parameters |

0:05:47 | then we compute in a cyclic fashion |

0:05:57 | and finally we approximate |
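The cyclic computation of approximate posteriors described here is coordinate-ascent variational Bayes. A toy illustration of the cyclic updates, on a univariate Gaussian with unknown mean and precision under conjugate priors rather than the full PLDA model (all prior values and data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=200)   # toy observations
N, xbar = len(x), x.mean()

# Illustrative priors: mu ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0  # initial guess for E[tau]
for _ in range(50):  # cyclic (coordinate-ascent) updates
    # Update q(mu) = N(mu_n, 1/lam_n), holding q(tau) fixed
    mu_n = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_n = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
    # Update q(tau) = Gamma(a_n, b_n), holding q(mu) fixed
    a_n = a0 + (N + 1) / 2
    b_n = b0 + 0.5 * (np.sum(x**2) - 2 * np.sum(x) * E_mu + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

# Approximate posterior mean and std: close to the true 2.0 and 0.5
print(round(E_mu, 2), round(1 / E_tau**0.5, 2))
```

Each factor of the assumed-independent posterior is refreshed in turn using the current expectations of the other, exactly the "cyclic fashion" mentioned in the talk.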

0:06:19 | is the number of speakers in the database |

0:06:22 | and the posterior for the |

0:06:25 | for the channels |

0:06:29 | is the number of segments in the

0:06:35 | then we can compute |

0:06:38 | for the target data set |

0:06:47 | from the original data set to the target data set |

0:06:54 | we can compute the weight of the prior |

0:06:59 | target data |

0:07:01 | to do that we should modify the prior distribution |

0:07:05 | the weight of the prior depends

0:07:10 | on the number of speakers

0:07:13 | that we have in the original data set

0:07:19 | so we change the parameters |

0:07:22 | we want to multiply the weight of the prior

0:07:29 | we need to modify the alpha

0:07:31 | these two parameters |

0:07:42 | but at the same time, they give the same expectation values for |

0:07:49 | we can do the same with the prior of w |

0:07:53 | and finally

0:07:59 | for the number of speakers and the number of segments |

0:08:03 | effective number of speakers and segments of the prior Gaussian |
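The reweighting trick described here, changing the prior's two parameters so the effective count shrinks while every expectation stays the same, can be illustrated with a Gamma prior as a simpler stand-in for the talk's Gaussian/Wishart priors (the Gamma choice and the numbers are illustrative assumptions):

```python
# Gamma(alpha, beta) in shape/rate form: mean = alpha/beta, and alpha acts
# as an effective sample count. Scaling BOTH parameters by the same weight
# gamma keeps the mean fixed while weakening (or strengthening) the prior.
def weight_gamma_prior(alpha, beta, gamma):
    return gamma * alpha, gamma * beta

alpha, beta = 100.0, 50.0          # e.g. built from a large source domain
a_w, b_w = weight_gamma_prior(alpha, beta, gamma=0.1)

mean_before, mean_after = alpha / beta, a_w / b_w
var_before = alpha / beta**2       # Gamma variance = alpha / beta^2
var_after = a_w / b_w**2

print(mean_before, mean_after)     # 2.0 2.0 -> expectation preserved
print(var_before < var_after)      # True   -> weaker (wider) prior
```

Dividing both parameters by ten behaves like having seen one tenth of the original speakers, which matches the talk's notion of an effective number of speakers and segments.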

0:08:10 | we are going to compare our methods

0:08:14 | the normalization

0:08:20 | does centering and whitening

0:08:30 | to make the i-vectors more Gaussian

0:08:32 | fixing Gaussian |

0:08:41 | unit hypersphere

0:08:49 | to reduce the data set mismatch
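The length-normalization pipeline just described (centering and whitening with development statistics, then projecting each i-vector onto the unit hypersphere) might look like this minimal numpy sketch; the data shapes are illustrative:

```python
import numpy as np

def length_normalize(X_dev, X):
    """Center and whiten X using development-set statistics, then
    project each row onto the unit hypersphere."""
    mu = X_dev.mean(axis=0)
    cov = np.cov(X_dev, rowvar=False)
    # Whitening transform from the eigendecomposition of the dev covariance
    w, U = np.linalg.eigh(cov)
    W = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    Z = (X - mu) @ W
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

rng = np.random.default_rng(2)
X_dev = rng.normal(size=(1000, 10))           # toy development i-vectors
Xn = length_normalize(X_dev, rng.normal(size=(5, 10)))
print(np.linalg.norm(Xn, axis=1))             # all ones: on the unit sphere
```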

0:08:56 | now I explain the data set |

0:09:01 | data set |

0:09:04 | this is |

0:09:07 | data set we will use |

0:09:13 | similar to the |

0:09:18 | telephone channels |

0:09:26 | that contains 30 male and 30 female |

0:09:29 | the data has similar conditions

0:09:32 | conditions |

0:09:40 | two to three minutes |

0:09:52 | data set with large |

0:09:55 | we use this five |

0:10:04 | that contains more than five hundred males and seven hundred females |

0:10:12 | and it has a variety of channels

0:10:18 | speaker verification |

0:10:24 | we got twenty MFCCs plus deltas and

0:10:36 | we build the system |

0:10:50 | we use the normalization too |

0:10:53 | the parameters |

0:11:02 | and finally we used S-norm score normalization with cohorts from the
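The S-norm used here is commonly the symmetric score normalization, which z-normalizes a trial score against cohort scores on both the enrollment and test sides and averages the two; a small sketch under that assumption, with toy cohort values:

```python
import numpy as np

def s_norm(score, enroll_cohort, test_cohort):
    """Symmetric score normalization (S-norm): average of the raw score
    z-normalized against enrollment-side and test-side cohort scores."""
    ze = (score - enroll_cohort.mean()) / enroll_cohort.std()
    zt = (score - test_cohort.mean()) / test_cohort.std()
    return 0.5 * (ze + zt)

# Toy cohorts with mean 1 and std 1 on both sides
cohort = np.array([0.0, 2.0])
print(s_norm(5.0, cohort, cohort))  # 0.5 * ((5-1)/1 + (5-1)/1) = 4.0
```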

0:11:09 | first here |

0:11:24 | we compare |

0:11:34 | we can see improvement |

0:11:50 | we can see that |

0:11:58 | the prior distribution |

0:12:01 | we compare for instance the first line and the last line equal error rate |

0:12:07 | forty percent for males and fourteen percent for females; for minDCF, an improvement

0:12:13 | of twelve percent for males and forty six percent for females |
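The equal error rate quoted in these comparisons is the operating point where false-accept and false-reject rates coincide. A toy sketch of the usual threshold sweep, using synthetic scores rather than the talk's systems:

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal error rate: the threshold where the false-accept rate
    equals the false-reject rate, found by scanning candidates."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])

rng = np.random.default_rng(4)
tgt = rng.normal(2.0, 1.0, 1000)   # target trials score higher
non = rng.normal(0.0, 1.0, 1000)
print(eer(tgt, non))               # roughly 0.16 for this 2-sigma separation
```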

0:12:17 | here is a table comparing different parameters

0:12:27 | we can see |

0:12:31 | improvement |

0:12:41 | here we show length normalization with s norm and without s norm |

0:12:48 | when we use |

0:12:57 | improvement using i-vector but not as much as |

0:13:09 | we can see too that |

0:13:11 | in this data set vector normalization |

0:13:23 | better or |

0:13:29 | here we show some improvements |

0:14:03 | and for females |

0:14:28 | finally |

0:14:42 | we see that |

0:14:49 | we can see that without normalization |

0:14:58 | finally, the conclusions: we have developed a method to adapt a PLDA

0:15:03 | i-vector classifier from a domain with a large amount of development data to a domain |

0:15:07 | with scarce development data |

0:15:09 | we have conducted experiments |

0:15:15 | we can see this technique improves the performance of the system |

0:15:19 | and this improvement mainly comes from the adaptation of the channel matrix W

0:15:28 | we have compared this method with the length normalization |

0:15:38 | we have better results |

0:15:48 | we have discussed length normalization |

0:15:51 | as future work, Bayesian adaptation of the UBM and the i-vector extractor

0:16:22 | no, the i-vector length means

0:16:31 | not the dimensionality of the i-vector

0:17:40 | maybe we can do the same |

0:17:45 | as we have more norm data |