0:00:15 My name is Omid Ghahabi, from the TALP Research Centre, and the topic of my talk is i-vector modeling with deep belief networks for multi-session speaker recognition.
0:00:32 As you know, acoustic modeling using deep belief networks has been shown to be effective in speech recognition and is getting popular nowadays, but only a few attempts, using RBMs (restricted Boltzmann machines) or generative DBNs, have been carried out in the speaker recognition area.
0:00:54 In our previous work, which was published at ICASSP 2014, we used both generative and discriminative DBNs. In that work we used only single-session target i-vectors as the inputs to the networks.
0:01:15 In this paper we extend our previous work from a single-session to a multi-session task, using the NIST i-vector challenge database in these experiments. We have also modified our proposed impostor selection method to be more accurate and more robust against its parameters.
0:01:41 First I will give a short background on deep belief networks, then I will describe our DBN-based system, then I will go into more detail on our proposed impostor selection method, and finally I will show the experimental results and the conclusions.
0:02:07 Deep belief networks are originally probabilistic generative models in which every two adjacent layers are treated as a restricted Boltzmann machine, the outputs of each RBM being the inputs to the RBM above it, and the network is trained layer by layer. However, by adding a top label layer, this generative DBN can be converted into a discriminative one by doing standard backpropagation.
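To make the structure concrete, here is a minimal sketch (not the authors' code) of how greedily pre-trained RBM layers can be stacked and topped with a label layer so that standard backpropagation turns the generative DBN into a discriminative network. It assumes PyTorch and RBM weights stored as (visible x hidden) NumPy arrays; layer sizes and the single-output label layer are illustrative.

```python
import torch
import torch.nn as nn

def dbn_to_discriminative(rbm_weights, rbm_hbiases, n_classes=1):
    # Stack the greedily pre-trained RBM layers; each weight matrix is
    # assumed to be (visible_dim, hidden_dim) with a matching hidden bias.
    layers = []
    for W, b in zip(rbm_weights, rbm_hbiases):
        lin = nn.Linear(W.shape[0], W.shape[1])
        with torch.no_grad():
            lin.weight.copy_(torch.as_tensor(W, dtype=torch.float32).T)
            lin.bias.copy_(torch.as_tensor(b, dtype=torch.float32))
        layers += [lin, nn.Sigmoid()]
    # The added top label layer (randomly initialised) is what makes the
    # network discriminative once standard backpropagation is applied.
    layers.append(nn.Linear(rbm_weights[-1].shape[1], n_classes))
    return nn.Sequential(*layers)
```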
0:02:48 On this slide I have some information about how the RBM is trained and how well it fits as pre-training for neural networks, but I think I can skip this; it is better to focus on our method.
0:03:17 Let's recall what the problem is. The problem is to model each target speaker given the available i-vectors: what we have here are five i-vectors per target speaker and a large amount of background i-vectors from the development set.
0:03:37 Our proposal is to use deep belief networks for two main reasons: first, to take advantage of unsupervised learning using the available background data of the development set, and second, to take advantage of supervised learning to train each target model discriminatively.
0:04:04 This is the whole block diagram of our proposed method. Let's divide it into three main steps.
0:04:15 The first step is balanced training. What is the imbalanced training problem here? In this case we have a large amount of background i-vectors as negative samples and only a few target i-vectors as positive samples. As we are going to model each target speaker discriminatively, training the network with such unbalanced data will lead to overfitting.
0:04:51 So the solution we have proposed here is to decrease the number of background i-vectors as much as possible in an effective way. We do this in three steps: first, we select only those background i-vectors that are more informative; then we cluster the selected impostors with the k-means algorithm using the cosine distance criterion, and use the impostor cluster centroids as the negative samples; and finally we distribute the positive and negative samples equally in the minibatches.
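As a rough illustration of this balanced-training step, the following sketch (assuming NumPy and scikit-learn; the cluster count and batch size are placeholders, not the paper's values) clusters the selected impostors with k-means after length normalization, which approximates the cosine criterion, and builds minibatches with equal numbers of positive and negative samples.

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_batches(target_ivecs, impostor_ivecs, n_clusters=400,
                     batch_size=10, seed=0):
    rng = np.random.default_rng(seed)
    # Length-normalise so Euclidean k-means behaves like a cosine-based clustering.
    imp = impostor_ivecs / np.linalg.norm(impostor_ivecs, axis=1, keepdims=True)
    centroids = KMeans(n_clusters=n_clusters, n_init=5,
                       random_state=seed).fit(imp).cluster_centers_

    pos = np.asarray(target_ivecs)    # the few target i-vectors (positive samples)
    neg = centroids                   # impostor cluster centroids (negative samples)
    half = batch_size // 2
    batches = []
    for start in range(0, len(neg), half):
        neg_chunk = neg[start:start + half]
        # Re-use (oversample) the few target i-vectors so every minibatch
        # contains as many positive as negative samples.
        pos_chunk = pos[rng.integers(0, len(pos), size=len(neg_chunk))]
        x = np.vstack([pos_chunk, neg_chunk])
        y = np.concatenate([np.ones(len(pos_chunk)), np.zeros(len(neg_chunk))])
        batches.append((x, y))
    return batches
```

Using the centroids instead of all selected impostors is what shrinks the negative class enough for the per-speaker discriminative training to stay balanced.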
0:05:47 The second step is the adaptation process that we proposed in our previous work. Using all the background i-vectors we first train a deep network unsupervisedly, without labels, and we call the trained model the universal deep belief network (UDBN). Then each target speaker network will be adapted from this universal DBN.
0:06:21 But how does the adaptation work? Instead of initializing the network randomly, we initialize it with the UDBN parameters, and then do unsupervised learning on the balanced data from step one for only a few iterations.
0:06:50 In our previous work we showed that pre-training in this case works better than random initialization, and that the proposed adaptation works better than pre-training.
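A minimal sketch of the adaptation idea, under the assumption of a single Bernoulli-hidden RBM layer with real-valued visibles and plain CD-1 updates; the learning rate and iteration count are illustrative, and details such as momentum or Gaussian-Bernoulli visible units are omitted:

```python
import numpy as np

def adapt_from_udbn(udbn_W, udbn_hbias, udbn_vbias, balanced_data,
                    n_iters=5, lr=0.01, seed=0):
    # Initialise the target-speaker RBM with the universal DBN parameters
    # (instead of random initialisation) and run a few CD-1 iterations
    # on that speaker's balanced data.
    rng = np.random.default_rng(seed)
    W, hb, vb = udbn_W.copy(), udbn_hbias.copy(), udbn_vbias.copy()

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    v0 = np.asarray(balanced_data, dtype=float)      # rows = i-vectors (visible units)
    for _ in range(n_iters):                         # only a few unsupervised iterations
        h0 = sigmoid(v0 @ W + hb)                    # hidden probabilities given the data
        h0_s = (rng.random(h0.shape) < h0).astype(float)
        v1 = h0_s @ W.T + vb                         # mean-field reconstruction (real-valued visibles)
        h1 = sigmoid(v1 @ W + hb)
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)  # CD-1 gradient estimate
        hb += lr * (h0 - h1).mean(axis=0)
        vb += lr * (v0 - v1).mean(axis=0)
    return W, hb, vb
```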
0:07:05 The last step is fine-tuning, which is actually backpropagation through the neural network using the label layer. But we change something here compared with the standard procedure: we do one-layer-only error backpropagation for a few iterations before the full backpropagation is carried out. Our experimental results in our previous work have shown that this works better: because of the randomly initialized top label layer, this is something like pre-training the top layer as well, and it works better than running the whole backpropagation without doing this.
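A hedged sketch of this two-stage fine-tuning, assuming PyTorch, a network whose last module is the label layer (e.g. an nn.Sequential), and a data loader yielding float 0/1 labels; iteration counts and learning rate are placeholders, not the authors' values:

```python
import torch
import torch.nn as nn

def fine_tune(net, loader, top_iters=3, full_epochs=20, lr=1e-3):
    loss_fn = nn.BCEWithLogitsLoss()

    # Phase 1: one-layer error backpropagation -- only the randomly
    # initialised top (label) layer is updated for a few iterations.
    top_opt = torch.optim.SGD(net[-1].parameters(), lr=lr)
    for _ in range(top_iters):
        for x, y in loader:                          # y: float labels in {0, 1}
            top_opt.zero_grad()
            loss_fn(net(x).squeeze(-1), y).backward()
            top_opt.step()

    # Phase 2: full backpropagation through all layers.
    full_opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(full_epochs):
        for x, y in loader:
            full_opt.zero_grad()
            loss_fn(net(x).squeeze(-1), y).backward()
            full_opt.step()
    return net
```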
0:08:03 On the other hand, we can divide our block diagram into two main phases: the first phase is target-independent and the second is target-dependent. In the target-independent phase, using the whole set of background i-vectors, we train the universal deep belief network and compute the impostor centroids; this process is carried out only once for all the target speakers. In the second phase, using the UDBN, the impostor centroids, and the available target i-vectors, we train our networks discriminatively.
0:09:00 Let's go into more detail on the proposed impostor selection method. This method is similar to the support-vector-frequency-based approach proposed by McLaren, but here we have used the cosine distance criterion and we have changed some other things.
0:09:28 It is composed of four main steps. Assume we have the whole set of background i-vectors on one hand and the client i-vectors on the other. Each client i-vector, which in this case is the average of the five i-vectors per client, is compared with all the background i-vectors using the cosine distance criterion, and the top n closest background i-vectors to each client are kept in a storage at this step. We do the same for all the client i-vectors, until the last client i-vector we have.
0:10:15 Then we compute the impostor frequencies in this storage and normalize them by n, the number of top i-vectors kept for each client, and by the whole number of client i-vectors. We believe that with this normalization the impostor frequencies are more robust against the threshold that we will define on them. Then we set a threshold on the normalized impostor frequencies, and those impostors whose frequencies are higher than this threshold will be selected as the most informative impostors.
0:11:05 Actually, for each of the background i-vectors we will obtain one impostor frequency, and those impostors whose frequencies are higher than the defined threshold will be selected. The threshold and the n parameter will be determined experimentally in the experimental section.
0:11:41 If we order the impostor frequencies of the impostors, we will see that many impostors have the same impostor frequency. That is why we have defined a threshold on the impostor frequencies rather than just selecting a fixed number of top samples.
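The four steps can be summarized in a short sketch like the following (NumPy assumed, with i-vectors as rows of arrays; the threshold must be chosen experimentally, as discussed next, and n_top=100 reflects the value reported later in the talk):

```python
import numpy as np

def select_impostors(client_ivecs, background_ivecs, threshold, n_top=100):
    # Length-normalise so dot products equal cosine similarities.
    C = client_ivecs / np.linalg.norm(client_ivecs, axis=1, keepdims=True)
    B = background_ivecs / np.linalg.norm(background_ivecs, axis=1, keepdims=True)

    counts = np.zeros(len(B))
    sims = C @ B.T                               # clients x background cosine scores
    for row in sims:
        counts[np.argsort(row)[-n_top:]] += 1    # keep the n_top closest impostors per client

    # Normalise the impostor frequencies by n_top and by the number of clients,
    # then keep the impostors whose frequency exceeds the chosen threshold.
    freq = counts / (n_top * len(C))
    keep = freq > threshold
    return background_ivecs[keep], freq
```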
0:12:12 In the experimental setup, the dataset that we have used is the NIST 2014 i-vector challenge, where the i-vector size, as you know, is six hundred. The post-processing we have applied to the i-vectors is mean normalization plus whitening. One hidden layer is used in these experiments, and the hidden layer size is four hundred.
0:12:42 For tuning the two parameters of the impostor selection method, the threshold and the n parameter, we plot the minimum DCF versus the threshold for different n. We see that if n is too small the results are not good, and if n is too high the performance of the system does not vary much as the threshold changes. The best choice according to our experiments is n equal to one hundred, and by setting the threshold at the point shown we obtain the minimum value of the minimum DCF with n equal to one hundred.
0:13:44 In the experimental results for this challenge we had one baseline system; everyone knows what the baseline is. Our proposed DBN-based system uses the target-independent impostors, that is, global impostors that are the same for all the target speakers. If we do this experiment we obtain these results, and there is a big difference between the baseline system and our system.
0:14:18 If we add target-dependent impostors to the target-independent ones, with n in this case equal to one hundred, so that the pool is both target-dependent and target-independent, we obtain a better performance, which is this one. But in this case, by adding the target-dependent impostors, the complexity of the system becomes higher than in the first one, because for each target speaker we need to do the clustering separately, whereas in the first case we compute the impostor centroids only once for all the speakers.
0:15:11 Now for z-norm score normalization on our DBN-based system. Without z-norm, the results are these. If we add z-norm using the whole impostor database we have, the development set, we get worse results. If we select only the top one thousand closest i-vectors as impostors, we get somewhat better results, but it is still worse than without using z-norm.
0:15:47 But if we use the same impostor selection method for z-norm, setting the parameters n and the threshold again for this normalization, we see a big improvement here. In comparison with the baseline system we obtain about a twenty-three percent improvement. Actually, this improvement is in comparison with these results; compared with the other results, the improvement is more than this.
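For reference, z-norm itself is simple once the impostor cohort is fixed; a minimal sketch, where the cohort scores are the target model's scores against the impostors chosen by the selection method rather than the whole development set:

```python
import numpy as np

def znorm_score(raw_score, cohort_scores):
    # cohort_scores: the target model's scores against the selected impostor cohort.
    mu = np.mean(cohort_scores)
    sigma = np.std(cohort_scores) + 1e-12        # guard against zero spread
    return (raw_score - mu) / sigma
```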
0:16:44 But in these experiments, for the impostor selection method we have used the client i-vectors. Our new experimental results have shown that if we do not use the client i-vectors, and instead just select the same number of i-vectors from only the development set, we obtain almost the same, very similar, results. So actually, for our proposed system it does not matter whether we use the client i-vectors in the impostor selection method or randomly choose the same number of i-vectors from only the background i-vectors.
0:17:44 The main conclusions: in this paper we have improved the proposed impostor selection method, and we have shown that it helps the whole system achieve a good performance in the multi-session task. Having more i-vectors available per target speaker helped the DBN system capture more speaker and session variabilities in comparison with the single-session task. Also, the final discriminative DBN-based approach showed a considerable performance in comparison with the conventional baseline system proposed by NIST in this challenge.
0:18:42 Thank you.
0:18:51 We have time for questions.
0:18:58 Thanks for the talk; I like the extension of the background dataset selection that you did. One question that comes to mind is: when you are doing the selection, are you looking at all the clients that are going to be enrolled in the system, or only at what is statistically important? If the selection looks at the clients that are going to be enrolled, then your system itself contains information about what you are going to test on, so why wouldn't you just do closed-set speaker identification?
0:19:33 So rewording it: when you are choosing your impostors, before your DBN training or z-norm, that selection process itself is aware of all your target speakers?
0:19:47 Yes, that's correct.
0:19:48 So why not take it further and just do closed-set speaker identification for the i-vector challenge?
0:19:53 Yes, that's why I mentioned it in the experimental results: if we do not use the target i-vectors and just randomly select the same number of i-vectors from only the development set, using an iterative process, for instance taking one thousand three hundred i-vectors randomly from the development set, computing the impostor frequencies, then again choosing random i-vectors, doing the same, averaging the impostor frequencies over all iterations, and applying the same threshold and parameter settings, we obtained results very similar to the ones we had using the target i-vectors.
0:20:56 So that is a selection that is not aware of the other clients in that sense. Very nice.
0:21:03 Yes, technically, looking at the other clients would be against the rules of the i-vector challenge, but he has a solution that does not need it. The other thing is that closed-set scoring wouldn't actually work here, because they are all different speakers.