0:00:16 | Changhuai You, Haizhou Li, Ambikairajah, Kong Aik Lee

0:00:27 | presented by

0:00:48 | good afternoon everyone

0:00:51 | the paper i would like to present is entitled

0:00:53 | Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition

0:01:06 | the outline

0:01:07 | of this presentation is shown here

0:01:11 | in this pair language recognition system we mainly focus on

0:01:17 | studying three

0:01:20 | techniques: the Bhattacharyya-based GMM-SVM,

0:01:25 | the adaptive relevance factor, as well as strategies for pair language recognition

0:01:34 | given a specified language pair, the task of

0:01:38 | pair language

0:01:39 | recognition is to decide which of the

0:01:43 | two languages is in fact spoken in a given segment

0:01:50 | so we develop pair language recognition systems by studying the Bhattacharyya-based GMM-SVM

0:01:59 | by introducing the mean supervector and the covariance supervector, and we merge these two

0:02:07 | sub-kernels together to obtain better performance

0:02:12 | in this

0:02:13 | hybrid system

0:02:17 | in order to compensate for the duration effect

0:02:21 | we introduce the adaptive relevance factor

0:02:27 | for MAP in GMM-SVM systems

0:02:31 | and for the purpose of pair language recognition we introduce two sets of strategies

0:02:42 | we also report our system design

0:02:47 | for the LRE 2011 submission

0:02:56 | in a speaker and language recognition system, normally

0:03:01 | there are two typical kernels for the GMM-SVM:

0:03:07 | the Kullback-Leibler kernel and the Bhattacharyya kernel

0:03:12 | the conventional KL kernel only includes mean information

0:03:19 | for the recognition modeling

0:03:22 | however

0:03:23 | a symmetrized version of the KL kernel

0:03:27 | can be extended

0:03:28 | to include the covariance term

0:03:38 | so why do we choose the

0:03:40 | Bhattacharyya-based kernel for language pair

0:03:44 | recognition?

0:03:46 | based on many experiments

0:03:50 | on speaker and language recognition systems

0:03:54 | we observed that the Bhattacharyya-based kernel performs better than the KL kernel

0:04:02 | in the Bhattacharyya kernel

0:04:07 | this kernel can actually be split

0:04:09 | into three terms: the first term

0:04:13 | is contributed by the mean and covariance of the

0:04:18 | GMM

0:04:21 | the second term

0:04:22 | involves the covariance only; the third term

0:04:27 | involves the weight

0:04:29 | parameters of the GMM only

0:04:32 | so these three terms can be used independently to give

0:04:37 | a recognition decision score

0:04:40 | with different degrees of information contribution
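The split just described can be sketched for two diagonal-covariance Gaussians (a simplified single-component view; the full kernel in the talk also carries a GMM-weight term, omitted here):

```python
import numpy as np

def bhattacharyya_terms(mu1, var1, mu2, var2):
    """Split the Bhattacharyya distance between two diagonal-covariance
    Gaussians into its two terms: one driven by the means (and covariances),
    one driven by the covariances only."""
    var_avg = 0.5 * (var1 + var2)
    # First term: depends on both means and covariances.
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    # Second term: depends on the covariances only (log-determinant ratio).
    cov_term = 0.5 * (np.sum(np.log(var_avg))
                      - 0.5 * (np.sum(np.log(var1)) + np.sum(np.log(var2))))
    return mean_term, cov_term
```

Identical Gaussians give zero for both terms; a pure mean shift only moves the first term, which is why the two terms can score independently.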

0:04:46 | by using the first term of the Bhattacharyya kernel

0:04:51 | while keeping the covariance

0:04:54 | not updated

0:04:56 | we can get the mean supervector

0:05:02 | so this kind of kernel can be used independently as a sub-

0:05:08 | model

0:05:10 | and then the

0:05:11 | second term only includes the covariance

0:05:14 | so we can get the

0:05:18 | covariance supervectors from this term

0:05:21 | we only use

0:05:22 | the first two terms of the Bhattacharyya kernel

0:05:28 | for our pair language recognition

0:05:31 | system design
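As an illustrative sketch of the two supervectors, each can be formed by stacking per-component statistics of the adapted GMM; the exact normalisation used in the talk may differ, so treat these formulas as assumptions:

```python
import numpy as np

def mean_supervector(weights, means, variances):
    # Stack weight- and variance-normalised component means
    # (the form suggested by the first, mean-driven Bhattacharyya term).
    return np.concatenate(
        [np.sqrt(w) * m / np.sqrt(v) for w, m, v in zip(weights, means, variances)]
    )

def cov_supervector(variances, ubm_variances):
    # Stack per-component log-variance ratios against the UBM
    # (an illustrative encoding of the covariance-only term).
    return np.concatenate(
        [np.log(v / uv) for v, uv in zip(variances, ubm_variances)]
    )
```

With 512 components and 80-dimensional features, as later in the talk, each supervector has 512 x 80 = 40960 dimensions.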

0:05:35 | the NAP for both

0:05:39 | the mean supervector and the covariance supervector of the Bhattacharyya kernel

0:05:47 | is trained by using different

0:05:49 | databases with

0:05:51 | a certain amount of overlap

0:05:53 | the purpose is to

0:05:57 | increase the compensation effect

0:06:03 | for the UBM and the

0:06:07 | relevance factor training databases

0:06:12 | we use ones common to both

0:06:15 | supervectors, mean and covariance
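NAP itself can be sketched as removing a learned nuisance subspace from each supervector; `U` here stands for a hypothetical matrix of orthonormal nuisance directions estimated on the development data, not the talk's actual trained projection:

```python
import numpy as np

def nap_project(supervectors, U):
    """Nuisance attribute projection: project supervectors onto the
    complement of the nuisance (channel/session) subspace.

    supervectors : (num_utterances, dim) array
    U            : (dim, num_nuisance_dirs) orthonormal nuisance basis
    """
    projection = np.eye(U.shape[0]) - U @ U.T  # I - U U^T
    return supervectors @ projection.T
```

Any component of a supervector lying along a nuisance direction is zeroed out, while the orthogonal (language-bearing) part is untouched.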

0:06:21 | in order to compensate for duration variability we introduce the adaptive relevance factor

0:06:29 | this adaptive relevance factor is for MAP

0:06:33 | in the GMM-SVM

0:06:35 | here we show the position of MAP

0:06:38 | in the GMM-SVM system

0:06:41 | this equation is the mean update

0:06:46 | of MAP

0:06:48 | here x_i is the first-order sufficient

0:06:52 | statistic

0:06:54 | you can see the relevance factor gamma_i indirectly affects the degree of update

0:07:02 | for the mean vectors of the GMM
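The mean update can be sketched as the standard form of MAP mean adaptation (variable names are ours, not the talk's):

```python
import numpy as np

def map_mean_update(ubm_mean, first_order_stat, occ_count, gamma):
    """MAP mean adaptation for one Gaussian component.

    first_order_stat : x_i, posterior-weighted sum of feature frames
    occ_count        : N_i, the occupation count
    gamma            : gamma_i, the relevance factor
    """
    alpha = occ_count / (occ_count + gamma)        # adaptation coefficient
    posterior_mean = first_order_stat / max(occ_count, 1e-10)
    # gamma large -> alpha small -> stay close to the UBM mean
    # N_i large   -> alpha ~ 1   -> move toward the data mean
    return alpha * posterior_mean + (1.0 - alpha) * ubm_mean
```

This makes the role of gamma_i explicit: it sets how much data is needed before the component mean moves away from the UBM.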

0:07:09 | we assume that

0:07:13 | once the relevance factor is a function of duration, it is possible to do

0:07:19 | some compensation work

0:07:21 | in this

0:07:24 | mean update

0:07:27 | so far there are two types of relevance factors

0:07:30 | one is in the classical MAP

0:07:34 | where usually we use a fixed value of the relevance factor

0:07:38 | the relevance factor can also be data-dependent, by this equation

0:07:45 | this equation is derived

0:07:48 | from factor analysis research

0:07:53 | here phi is a diagonal matrix that can be trained by using a development database

0:08:01 | assume this relevance factor is a function of K, related to the number of

0:08:09 | feature frames, which is connected to duration

0:08:14 | consider the occupation

0:08:18 | count N_i

0:08:20 | taking the expectation

0:08:22 | of this occupation count, we can see that

0:08:26 | the expectation of the occupation count is directly

0:08:30 | proportional to the duration

0:08:34 | so if we choose this duration function

0:08:43 | for the relevance factor, the expectation of the adaptation coefficient

0:08:51 | of the MAP mean adaptation tends to a constant vector, so we can get the

0:08:58 | adaptive relevance factor by this equation

0:09:03 | this equation results in the

0:09:06 | GMM being independent of duration

0:09:13 | now we come to the third point of our presentation

0:09:17 | we propose two strategies for pair language recognition; the first one is the one-

0:09:25 | to-all strategy

0:09:27 | also called core-to-pair modeling

0:09:32 | in this modeling we train GMM-SVM models for a certain

0:09:38 | target language against all other target languages

0:09:42 | so we can have the score vectors here

0:09:45 | with this score vector, and by using our development database for all the target

0:09:53 | languages, we can train

0:09:56 | the Gaussian backend models

0:10:01 | for these N languages

0:10:04 | finally

0:10:07 | the language pair scores can be obtained

0:10:10 | through the log-likelihood ratios shown here
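The pair score from the Gaussian backend can be sketched as a log-likelihood ratio between the two languages' score-vector models; a shared diagonal covariance is assumed here for simplicity:

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    # Diagonal-covariance Gaussian log-likelihood.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def pair_llr(score_vec, mean_a, mean_b, shared_var):
    # Positive -> language a of the pair, negative -> language b.
    return (gaussian_loglik(score_vec, mean_a, shared_var)
            - gaussian_loglik(score_vec, mean_b, shared_var))
```

The backend means would be estimated per language from development score vectors; the LLR then reduces the N-language score vector to a single pair decision score.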

0:10:16 | the second

0:10:17 | strategy is a pairwise strategy, also called pair modeling

0:10:22 | this modeling is very simple: we just use

0:10:28 | the databases of the two languages from the language pair

0:10:31 | to directly train the GMM-SVM model, and we get

0:10:36 | this model

0:10:38 | and we get

0:10:39 | the scores

0:10:44 | for the fusion of the two strategies

0:10:46 | we simply apply equal weights

0:10:50 | that means we assume

0:10:52 | the importance of the two strategies

0:10:54 | is the same

0:10:55 | so we get the final score by fusing the two strategies
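Equal-weight score-level fusion of the two strategies is then simply the average of the two pair scores:

```python
def fuse_scores(score_core_to_pair, score_pair):
    # Equal weights: both strategies assumed equally important.
    return 0.5 * (score_core_to_pair + score_pair)
```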

0:11:03 | here we show the hybrid

0:11:05 | pair language recognition system

0:11:10 | given the test utterance we compute the

0:11:13 | Bhattacharyya mean supervector and covariance supervector

0:11:19 | which are input together to

0:11:21 | the two

0:11:22 | strategies

0:11:24 | we merge the two supervectors in each of the

0:11:31 | strategies

0:11:33 | finally we fuse the two strategies together and get the final score

0:11:42 | we evaluate our

0:11:45 | pair language recognition design

0:11:47 | by using the

0:11:49 | NIST LRE 2011 platform

0:11:53 | there are twenty-four target languages, so in total

0:12:00 | there are two hundred and seventy-six language pairs

0:12:03 | we choose

0:12:05 | five hundred and twelve Gaussian components for the GMM

0:12:09 | and UBM

0:12:13 | we do these experiments

0:12:16 | and show the results based on the thirty-second task in this paper

0:12:22 | but we also cover other durations in our experiments

0:12:29 | here we use eighty-dimensional MFCC-SDC features

0:12:39 | with energy-based VAD

0:12:42 | the performance is computed

0:12:45 | as the average cost

0:12:47 | over the N worst language pairs
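The metric can be sketched as follows, assuming "worst" means the pairs with the highest detection cost:

```python
def avg_cost_n_worst(pair_costs, n):
    # Average detection cost over the n worst (highest-cost) language pairs.
    worst = sorted(pair_costs, reverse=True)[:n]
    return sum(worst) / len(worst)
```

Sweeping n from 1 up to the full 276 pairs produces the curves of cost versus N shown later in the talk.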

0:12:51 | here we list

0:12:52 | the training databases

0:12:54 | for both the CTS and BNBS

0:12:59 | sets

0:13:02 | for our language pair recognition training

0:13:06 | now we show the experimental results

0:13:09 | firstly we compare the effect of the fixed relevance factor and the adaptive relevance factor

0:13:17 | table one shows,

0:13:19 | under

0:13:21 | the core-to-pair

0:13:22 | strategy,

0:13:27 | the fixed relevance factor set to three different

0:13:31 | values: zero point two five, eight, and thirty-two, and we give

0:13:36 | the EER and the minimum cost

0:13:39 | here, compared with ARF, that is, the

0:13:43 | adaptive relevance factor

0:13:48 | comparing these data we can say

0:13:54 | the adaptive relevance factor performs

0:13:56 | better than any of the

0:13:59 | fixed relevance factor settings

0:14:02 | similar observations are

0:14:04 | found

0:14:08 | in the pair strategy

0:14:11 | here, say, twelve point

0:14:13 | seven five percent

0:14:15 | in terms of EER

0:14:19 | while the others are higher

0:14:22 | with the fixed relevance factor settings

0:14:28 | the second experiment we do

0:14:32 | is on the effect of merging

0:14:34 | the two sets of supervectors,

0:14:36 | the mean supervector and the covariance supervector

0:14:40 | the blue color denotes the mean supervector

0:14:44 | the green color represents

0:14:48 | the Bhattacharyya covariance

0:14:49 | supervector, with eighty-dimensional

0:14:52 | MFCC-SDC features

0:14:54 | and ARF, the adaptive relevance factor

0:14:59 | we do this experiment

0:15:03 | under the core-to-pair strategy, and we show

0:15:11 | the merging effect

0:15:12 | in the red color, and we can see the

0:15:15 | performance is obviously better

0:15:18 | than the previous ones, that is, mean and covariance alone

0:15:26 | this figure is based on the

0:15:28 | top N

0:15:29 | language pairs, that is, those with

0:15:33 | the worst

0:15:36 | EER performance

0:15:38 | out of the N times N minus one divided by two

0:15:42 | language pairs

0:15:45 | similar

0:15:47 | results

0:15:50 | can be found in the

0:15:52 | pair strategy

0:15:54 | again, the red color, for

0:15:59 | most of the language pairs, gives a

0:16:03 | lower minimum detection cost

0:16:10 | finally

0:16:11 | we show the fusion effect

0:16:19 | of the two strategies

0:16:22 | the blue one is the core-to-pair and the green one is the pair

0:16:27 | strategy; after merging these two strategies we can get the final results

0:16:34 | with an EER of

0:16:36 | ten point

0:16:37 | something percent

0:16:38 | and a minimum cost of zero point zero nine

0:16:46 | to conclude the presentation: we have developed a hybrid

0:16:52 | Bhattacharyya-based GMM-SVM system for pair language recognition

0:16:57 | for the purpose of the LRE 2011 submission

0:17:03 | the performance gain from merging the

0:17:06 | mean supervector and covariance supervector is obvious

0:17:10 | compared to the fixed relevance factor

0:17:14 | we observed that the adaptive relevance factor is effective

0:17:18 | for pair language recognition

0:17:24 | and finally, we can say the fusion of the core-to-pair and pair strategies

0:17:29 | is useful

0:17:32 | here we show some reference papers, especially the first one from Patrick Kenny, who

0:17:41 | proposed the data-dependent relevance factor

0:17:44 | thank you

0:18:11 | okay

0:18:14 | firstly, we choose these

0:18:16 | mean and covariance supervectors

0:18:20 | this means we don't want to merge

0:18:24 | the mean and covariance information in one kernel

0:18:29 | we want to separate them, because we find that if we separate them

0:18:35 | we may get better performance after merging these two

0:18:39 | supervectors together

0:18:44 | we did compare them

0:18:49 | that is, we took

0:18:53 | the kernel with the first term and the second term merged together

0:18:59 | to produce only one kernel, and compared it with the separated kernels, that is, the mean kernel

0:19:05 | and the covariance kernel fused together afterwards

0:19:12 | the latter is better

0:19:24 | okay

0:19:32 | i think, at least,

0:19:37 | because it is based on different training and testing environments

0:19:42 | and databases

0:19:44 | overall the effect is obvious