0:00:16 | Changhuai You, Haizhou Li, Ambikairajah, Kong Aik Lee

0:00:27 | presented by

0:00:48 | good afternoon everyone

0:00:51 | the paper i would like to present is entitled

0:00:53 | Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition

0:01:06 | the outline

0:01:07 | of this presentation is shown here

0:01:11 | in this pair language recognition system we mainly focus on

0:01:17 | studying three

0:01:20 | techniques: the Bhattacharyya-based GMM-SVM,

0:01:25 | the adaptive relevance factor, as well as strategies for pair language recognition

0:01:34 | given a specified language pair, the task of

0:01:38 | pair language

0:01:39 | recognition is to decide which of the

0:01:43 | two languages is in fact spoken in a given segment

0:01:50 | so we develop pair language recognition systems by studying the Bhattacharyya-based GMM-SVM

0:01:59 | by introducing the mean supervector and the covariance supervector, and we merge these two

0:02:07 | sub-kernels together to obtain better performance

0:02:12 | in this

0:02:13 | hybrid system

0:02:17 | in order to compensate for the duration effect

0:02:21 | we introduce the adaptive relevance factor

0:02:27 | for MAP in GMM-SVM systems

0:02:31 | and for the purpose of pair language recognition we introduce two sets of strategies

0:02:42 | we also report our system design

0:02:47 | for the LRE 2011 submission

0:02:56 | in a speaker and language recognition system, normally

0:03:01 | there are two typical kernels for the GMM-SVM:

0:03:07 | the Kullback-Leibler kernel and the Bhattacharyya kernel

0:03:12 | the conventional KL kernel only includes mean information

0:03:19 | for the recognition modeling

0:03:22 | however

0:03:23 | a symmetrized version of the KL kernel

0:03:27 | can be extended

0:03:28 | to include the covariance term

0:03:38 | so why do we choose the

0:03:40 | Bhattacharyya-based kernel for language pair

0:03:44 | recognition?

0:03:46 | based on many experiments

0:03:50 | on speaker and language recognition systems

0:03:54 | we observed that the Bhattacharyya-based kernel performs better than the KL kernel

0:04:02 | in the Bhattacharyya kernel

0:04:07 | this kernel can actually be split

0:04:09 | into three terms: the first term

0:04:13 | is contributed by the mean and covariance of the

0:04:18 | GMM

0:04:21 | the second term

0:04:22 | involves the covariance only; the third term

0:04:27 | involves the weight

0:04:29 | parameters of the GMM only

0:04:32 | so these three terms can be used independently to give

0:04:37 | a recognition decision score

0:04:40 | with different degrees of information contribution
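The split just described can be sketched for two diagonal-covariance Gaussians (a simplified single-component view; the full kernel in the talk also carries a GMM-weight term, omitted here):

```python
import numpy as np

def bhattacharyya_terms(mu1, var1, mu2, var2):
    """Split the Bhattacharyya distance between two diagonal-covariance
    Gaussians into its two terms: one driven by the means (and covariances),
    one driven by the covariances only."""
    var_avg = 0.5 * (var1 + var2)
    # First term: depends on both means and covariances.
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    # Second term: depends on the covariances only (log-determinant ratio).
    cov_term = 0.5 * (np.sum(np.log(var_avg))
                      - 0.5 * (np.sum(np.log(var1)) + np.sum(np.log(var2))))
    return mean_term, cov_term
```

Identical Gaussians give zero for both terms; a pure mean shift only moves the first term, which is why the two terms can score independently.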

0:04:46 | by using the first term of the Bhattacharyya kernel

0:04:51 | while keeping the covariance

0:04:54 | not updated

0:04:56 | we can get the mean supervector

0:05:02 | so this kind of kernel can be used independently as a sub-

0:05:08 | model

0:05:10 | and then the

0:05:11 | second term only includes the covariance

0:05:14 | so we can get the

0:05:18 | covariance supervectors from this term

0:05:21 | we only use

0:05:22 | the first two terms of the Bhattacharyya kernel

0:05:28 | for our pair language recognition

0:05:31 | system design
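As an illustrative sketch of the two supervectors, each can be formed by stacking per-component statistics of the adapted GMM; the exact normalisation used in the talk may differ, so treat these formulas as assumptions:

```python
import numpy as np

def mean_supervector(weights, means, variances):
    # Stack weight- and variance-normalised component means
    # (the form suggested by the first, mean-driven Bhattacharyya term).
    return np.concatenate(
        [np.sqrt(w) * m / np.sqrt(v) for w, m, v in zip(weights, means, variances)]
    )

def cov_supervector(variances, ubm_variances):
    # Stack per-component log-variance ratios against the UBM
    # (an illustrative encoding of the covariance-only term).
    return np.concatenate(
        [np.log(v / uv) for v, uv in zip(variances, ubm_variances)]
    )
```

With 512 components and 80-dimensional features, as later in the talk, each supervector has 512 x 80 = 40960 dimensions.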

0:05:35 | the NAP for both

0:05:39 | the mean supervector and the covariance supervector of the Bhattacharyya kernel

0:05:47 | is trained by using different

0:05:49 | databases with

0:05:51 | a certain amount of overlap

0:05:53 | the purpose is to

0:05:57 | increase the compensation effect

0:06:03 | for the UBM and the

0:06:07 | relevance factor training databases

0:06:12 | we use ones common to both

0:06:15 | supervectors, mean and covariance
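NAP itself can be sketched as removing a learned nuisance subspace from each supervector; `U` here stands for a hypothetical matrix of orthonormal nuisance directions estimated on the development data, not the talk's actual trained projection:

```python
import numpy as np

def nap_project(supervectors, U):
    """Nuisance attribute projection: project supervectors onto the
    complement of the nuisance (channel/session) subspace.

    supervectors : (num_utterances, dim) array
    U            : (dim, num_nuisance_dirs) orthonormal nuisance basis
    """
    projection = np.eye(U.shape[0]) - U @ U.T  # I - U U^T
    return supervectors @ projection.T
```

Any component of a supervector lying along a nuisance direction is zeroed out, while the orthogonal (language-bearing) part is untouched.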

0:06:21 | in order to compensate for duration variability we introduce the adaptive relevance factor

0:06:29 | this adaptive relevance factor is for MAP

0:06:33 | in the GMM-SVM

0:06:35 | here we show the position of MAP

0:06:38 | in the GMM-SVM system

0:06:41 | this equation is the mean update

0:06:46 | of MAP

0:06:48 | here x_i is the first-order sufficient

0:06:52 | statistic

0:06:54 | you can see the relevance factor gamma_i indirectly affects the degree of update

0:07:02 | for the mean vectors of the GMM
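The mean update can be sketched as the standard form of MAP mean adaptation (variable names are ours, not the talk's):

```python
import numpy as np

def map_mean_update(ubm_mean, first_order_stat, occ_count, gamma):
    """MAP mean adaptation for one Gaussian component.

    first_order_stat : x_i, posterior-weighted sum of feature frames
    occ_count        : N_i, the occupation count
    gamma            : gamma_i, the relevance factor
    """
    alpha = occ_count / (occ_count + gamma)        # adaptation coefficient
    posterior_mean = first_order_stat / max(occ_count, 1e-10)
    # gamma large -> alpha small -> stay close to the UBM mean
    # N_i large   -> alpha ~ 1   -> move toward the data mean
    return alpha * posterior_mean + (1.0 - alpha) * ubm_mean
```

This makes the role of gamma_i explicit: it sets how much data is needed before the component mean moves away from the UBM.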

0:07:09 | we assume that

0:07:13 | once the relevance factor is a function of duration, it is possible to do

0:07:19 | some compensation work

0:07:21 | in this

0:07:24 | mean update

0:07:27 | so far there are two types of relevance factors

0:07:30 | one is in the classical MAP

0:07:34 | where usually we use a fixed value of the relevance factor

0:07:38 | the relevance factor can also be data-dependent, by this equation

0:07:45 | this equation is derived

0:07:48 | from factor analysis research

0:07:53 | here phi is a diagonal matrix that can be trained by using a development database

0:08:01 | assume this relevance factor is a function of K, related to the number of

0:08:09 | feature frames, which is connected to duration

0:08:14 | consider the occupation

0:08:18 | count N_i

0:08:20 | taking the expectation

0:08:22 | of this occupation count, we can see that

0:08:26 | the expectation of the occupation count is directly

0:08:30 | proportional to the duration

0:08:34 | so if we choose this duration function

0:08:43 | for the relevance factor, the expectation of the adaptation coefficient

0:08:51 | of the MAP mean adaptation tends to a constant vector, so we can get the

0:08:58 | adaptive relevance factor by this equation

0:09:03 | this equation results in the

0:09:06 | GMM being independent of duration

0:09:13 | now we come to the third point of our presentation

0:09:17 | we propose two strategies for pair language recognition; the first one is the one-

0:09:25 | to-all strategy

0:09:27 | also called core-to-pair modeling

0:09:32 | in this modeling we train GMM-SVM models for a certain

0:09:38 | target language against all other target languages

0:09:42 | so we can have the score vectors here

0:09:45 | with this score vector, and by using our development database for all the target

0:09:53 | languages, we can train

0:09:56 | the Gaussian backend models

0:10:01 | for these N languages

0:10:04 | finally

0:10:07 | the language pair scores can be obtained

0:10:10 | through the log-likelihood ratios shown here
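The pair score from the Gaussian backend can be sketched as a log-likelihood ratio between the two languages' score-vector models; a shared diagonal covariance is assumed here for simplicity:

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    # Diagonal-covariance Gaussian log-likelihood.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def pair_llr(score_vec, mean_a, mean_b, shared_var):
    # Positive -> language a of the pair, negative -> language b.
    return (gaussian_loglik(score_vec, mean_a, shared_var)
            - gaussian_loglik(score_vec, mean_b, shared_var))
```

The backend means would be estimated per language from development score vectors; the LLR then reduces the N-language score vector to a single pair decision score.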

0:10:16 | the second

0:10:17 | strategy is a pairwise strategy, also called pair modeling

0:10:22 | this modeling is very simple: we just use

0:10:28 | the databases of the two languages from the language pair

0:10:31 | to directly train the GMM-SVM model, and we get

0:10:36 | this model

0:10:38 | and we get

0:10:39 | the scores

0:10:44 | for the fusion of the two strategies

0:10:46 | we simply apply equal weights

0:10:50 | that means we assume

0:10:52 | the importance of the two strategies

0:10:54 | is the same

0:10:55 | so we get the final score by fusing the two strategies
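Equal-weight score-level fusion of the two strategies is then simply the average of the two pair scores:

```python
def fuse_scores(score_core_to_pair, score_pair):
    # Equal weights: both strategies assumed equally important.
    return 0.5 * (score_core_to_pair + score_pair)
```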

0:11:03 | here we show the hybrid

0:11:05 | pair language recognition system

0:11:10 | given the test utterance we compute the

0:11:13 | Bhattacharyya mean supervector and covariance supervector

0:11:19 | which are input together to

0:11:21 | the two

0:11:22 | strategies

0:11:24 | we merge the two supervectors in each of the

0:11:31 | strategies

0:11:33 | finally we fuse the two strategies together and get the final score

0:11:42 | we evaluate our

0:11:45 | pair language recognition design

0:11:47 | by using the

0:11:49 | NIST LRE 2011 platform

0:11:53 | there are twenty-four target languages, so in total

0:12:00 | there are two hundred and seventy-six language pairs

0:12:03 | we choose

0:12:05 | five hundred and twelve Gaussian components for the GMM

0:12:09 | and UBM

0:12:13 | we do these experiments

0:12:16 | and show the results based on the thirty-second task in this paper

0:12:22 | but we also cover other durations in our experiments

0:12:29 | here we use eighty-dimensional MFCC-SDC features

0:12:39 | with energy-based VAD

0:12:42 | the performance is computed

0:12:45 | as the average cost

0:12:47 | over the N worst language pairs
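The metric can be sketched as follows, assuming "worst" means the pairs with the highest detection cost:

```python
def avg_cost_n_worst(pair_costs, n):
    # Average detection cost over the n worst (highest-cost) language pairs.
    worst = sorted(pair_costs, reverse=True)[:n]
    return sum(worst) / len(worst)
```

Sweeping n from 1 up to the full 276 pairs produces the curves of cost versus N shown later in the talk.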

0:12:51 | here we list

0:12:52 | the training databases

0:12:54 | for both the CTS and BNBS

0:12:59 | sets

0:13:02 | for our language pair recognition training

0:13:06 | now we show the experimental results

0:13:09 | firstly we compare the effect of the fixed relevance factor and the adaptive relevance factor

0:13:17 | table one shows,

0:13:19 | under

0:13:21 | the core-to-pair

0:13:22 | strategy,

0:13:27 | the fixed relevance factor set to three different

0:13:31 | values: zero point two five, eight, and thirty-two, and we give

0:13:36 | the EER and the minimum cost

0:13:39 | here, compared with ARF, that is, the

0:13:43 | adaptive relevance factor

0:13:48 | comparing these data we can say

0:13:54 | the adaptive relevance factor performs

0:13:56 | better than any of the

0:13:59 | fixed relevance factor settings

0:14:02 | similar observations are

0:14:04 | found

0:14:08 | in the pair strategy

0:14:11 | here, say, twelve point

0:14:13 | seven five percent

0:14:15 | in terms of EER

0:14:19 | while the others are higher

0:14:22 | with the fixed relevance factor settings

0:14:28 | the second experiment we do

0:14:32 | is on the effect of merging

0:14:34 | the two sets of supervectors,

0:14:36 | the mean supervector and the covariance supervector

0:14:40 | the blue color denotes the mean supervector

0:14:44 | the green color represents

0:14:48 | the Bhattacharyya covariance

0:14:49 | supervector, with eighty-dimensional

0:14:52 | MFCC-SDC features

0:14:54 | and ARF, the adaptive relevance factor

0:14:59 | we do this experiment

0:15:03 | under the core-to-pair strategy, and we show

0:15:11 | the merging effect

0:15:12 | in the red color, and we can see the

0:15:15 | performance is obviously better

0:15:18 | than the previous ones, that is, mean and covariance alone

0:15:26 | this figure is based on the

0:15:28 | top N

0:15:29 | language pairs, that is, those with

0:15:33 | the worst

0:15:36 | EER performance

0:15:38 | out of the N times N minus one divided by two

0:15:42 | language pairs

0:15:45 | similar

0:15:47 | results

0:15:50 | can be found in the

0:15:52 | pair strategy

0:15:54 | again, the red color, for

0:15:59 | most of the language pairs, gives a

0:16:03 | lower minimum detection cost

0:16:10 | finally

0:16:11 | we show the fusion effect

0:16:19 | of the two strategies

0:16:22 | the blue one is the core-to-pair and the green one is the pair

0:16:27 | strategy; after merging these two strategies we can get the final results

0:16:34 | with an EER of

0:16:36 | ten point

0:16:37 | something percent

0:16:38 | and a minimum cost of zero point zero nine

0:16:46 | to conclude the presentation: we have developed a hybrid

0:16:52 | Bhattacharyya-based GMM-SVM system for pair language recognition

0:16:57 | for the purpose of the LRE 2011 submission

0:17:03 | the performance gain from merging the

0:17:06 | mean supervector and covariance supervector is obvious

0:17:10 | compared to the fixed relevance factor

0:17:14 | we observed that the adaptive relevance factor is effective

0:17:18 | for pair language recognition

0:17:24 | and finally, we can say the fusion of the core-to-pair and pair strategies

0:17:29 | is useful

0:17:32 | here we show some reference papers, especially the first one from Patrick Kenny, who

0:17:41 | proposed the data-dependent relevance factor

0:17:44 | thank you

0:18:11 | okay

0:18:14 | firstly, we choose these

0:18:16 | mean and covariance supervectors

0:18:20 | this means we don't want to merge

0:18:24 | the mean and covariance information in one kernel

0:18:29 | we want to separate them, because we find that if we separate them

0:18:35 | we may get better performance after merging these two

0:18:39 | supervectors together

0:18:44 | we did compare them

0:18:49 | that is, we took

0:18:53 | the kernel with the first term and the second term merged together

0:18:59 | to produce only one kernel, and compared it with the separated kernels, that is, the mean kernel

0:19:05 | and the covariance kernel fused together afterwards

0:19:12 | the latter is better

0:19:24 | okay

0:19:32 | i think, at least,

0:19:37 | because it is based on different training and testing environments

0:19:42 | and databases

0:19:44 | overall the effect is obvious