0:00:15thank you very much
0:00:17thanks to the organisation for the chance to present our work
0:00:24which we are still trying to complement
0:00:28with some post analyses of the lre evaluation
0:00:34due to some difficulties a teammate couldn't come here so i'm
0:00:42going to try to present
0:00:45so i am going to present an overall overview of our lre submissions
0:00:52our system
0:00:55we had some hypotheses and objectives that i would like to show you
0:01:00how we worked with the development dataset and the main innovations that we made
0:01:07the evaluation results and some post evaluation analyses and configurations and the lessons that we
0:01:13learned from this
0:01:16okay so
0:01:18very briefly the lre evaluation was focused on the development
0:01:23of language recognition systems
0:01:26for very closely related languages
0:01:30so we had twenty target languages split across
0:01:35six different clusters and the participants had to devise their own development set
0:01:43there were mainly two channels telephone speech and broadcast speech
0:01:50and here we have the six different clusters arabic chinese english french slavic and
0:01:56iberian
0:01:58then the performance metric was the average of the performance within each cluster so
0:02:04this allowed
0:02:06the development of six different separate systems one for
0:02:11each cluster
0:02:13since we had to detect the target languages within each cluster
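A minimal sketch of the metric described above, assuming each cluster already has a single cost value. The cluster keys and the numbers are hypothetical placeholders, not results from the talk, and the per-cluster cost itself is not computed here.

```python
from statistics import mean


def average_cluster_cost(cluster_costs):
    """Average the per-cluster costs into one overall score."""
    if not cluster_costs:
        raise ValueError("at least one cluster cost is required")
    return mean(cluster_costs.values())


# Hypothetical per-cluster costs, for illustration only.
example_costs = {
    "cluster_1": 0.20, "cluster_2": 0.10, "cluster_3": 0.10,
    "cluster_4": 0.30, "cluster_5": 0.05, "cluster_6": 0.15,
}
overall = average_cluster_cost(example_costs)
```

Because every cluster has the same weight in the average, a poor result in one cluster cannot be hidden by a cluster with more languages or more test segments.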
0:02:18okay so
0:02:20before the lre we had some hypotheses the first one was that
0:02:27there would be limited mismatch between the dev and the
0:02:33test set
0:02:36as we have seen in the previous talks of course we were
0:02:41wrong on that
0:02:43the second one was that the bottleneck features were
0:02:47good features for this kind of task
0:02:50and on that
0:02:52we were right with this hypothesis
0:02:57the third hypothesis here was that the fusion of multiple systems
0:03:02was a nice approach to increase the performance
0:03:07and we were wrong
0:03:12and the last one was that a good development dataset design would be crucial
0:03:15and we were right
0:03:19we had mainly three objectives here the first one was to design a
0:03:23development dataset
0:03:25the second to develop innovative approaches to dialect id
0:03:31and the third one to select a robust fusion coming from the variety of complementary
0:03:36bottleneck features and other features
0:03:40that we were already developing under the
0:03:43darpa rats program
0:03:44and also
0:03:46fusion with the different backend classifiers
0:03:52so first we split the data into eighty percent for training and twenty percent
0:03:56for dev
0:03:58as was mentioned in the last question it was a bad decision
0:04:04as you will see
0:04:05or it could have been better
0:04:09and we had ten audio files per language in each split
0:04:17we prevented having the same telephone conversation callers in both training and dev
0:04:23and here we included an equal proportion of telephone speech and
0:04:29broadcast speech in each split
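The split described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(path, language, channel)`, the function name `split_train_dev`, and the file names are hypothetical, and the sketch does not handle keeping callers from the same conversation on one side of the split.

```python
import random
from collections import defaultdict


def split_train_dev(files, train_fraction=0.8, seed=0):
    """Split (path, language, channel) records into train and dev lists.

    Records are grouped by (language, channel) so that each split keeps
    roughly the same channel mix for every language.
    """
    groups = defaultdict(list)
    for record in files:
        _, language, channel = record
        groups[(language, channel)].append(record)

    rng = random.Random(seed)
    train, dev = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        cut = int(round(train_fraction * len(members)))
        train.extend(members[:cut])
        dev.extend(members[cut:])
    return train, dev


# Hypothetical file list: 2 languages x 2 channels x 10 files.
files = [(f"{lang}_{chan}_{i}.wav", lang, chan)
         for lang in ("lang_a", "lang_b")
         for chan in ("telephone", "broadcast")
         for i in range(10)]
train, dev = split_train_dev(files)
```

Changing `train_fraction` to 0.6 gives the sixty forty split that is discussed later in the talk.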
0:04:33and we excluded switchboard one and two basically because
0:04:38our first experiments didn't show a great impact from that
0:04:43probably because we
0:04:45didn't expect this huge mismatch
0:04:50and so
0:04:53apart from that we chunked the audio into
0:04:59segments as short as three seconds to cover short durations
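As a rough sketch of chunking audio into short segments, the function below returns the time boundaries of consecutive, non-overlapping chunks. The function name, the choice to drop a short trailing remainder, and the example durations are assumptions made for illustration; the talk does not specify these details.

```python
def chunk_bounds(total_seconds, chunk_seconds):
    """Return (start, end) times of consecutive non-overlapping chunks.

    A trailing remainder shorter than chunk_seconds is dropped.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    n_chunks = int(total_seconds // chunk_seconds)
    return [(i * chunk_seconds, (i + 1) * chunk_seconds)
            for i in range(n_chunks)]


# A 10 second file cut into 3 second chunks.
bounds = chunk_bounds(total_seconds=10.0, chunk_seconds=3.0)
```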
0:05:06at the end we had around a hundred k used for the ubm and
0:05:10i-vector training and the training data was used for the back
0:05:17end classifiers
0:05:21we contextualized features with different methods like sdc
0:05:26and deltas and double deltas and pca dct and also
0:05:32we fused different i-vector systems from traditional features and at the end the
0:05:40bottleneck features were trained with this combination of different
0:05:44traditional features with different contextualizations
0:05:52for the backend classifiers we used the gaussian backend and neural networks
0:06:00both methods are very well known in the community
0:06:05and two methods for the adaptive gaussian backend which aims to better
0:06:10cope with mismatched conditions
0:06:13basically based on the test i-vector we try to select some i-vectors
0:06:19from the training data to train the gaussian backends
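For readers less familiar with the Gaussian backend mentioned above, here is a minimal pure-Python sketch: one mean per language and a variance shared by all languages. The shared diagonal variance is a simplifying assumption (a full shared covariance is more common), the function names and toy vectors are hypothetical, and the adaptive selection of training i-vectors is not shown.

```python
import math
from collections import defaultdict


def train_gaussian_backend(vectors, labels):
    """Estimate one mean per class and a shared diagonal variance."""
    dim = len(vectors[0])
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)
    means = {
        label: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for label, vecs in by_class.items()
    }
    # Pool squared deviations from each class mean; a small floor avoids zeros.
    variance = [1e-6] * dim
    for vec, label in zip(vectors, labels):
        for d in range(dim):
            variance[d] += (vec[d] - means[label][d]) ** 2 / len(vectors)
    return means, variance


def score(vec, means, variance):
    """Return the Gaussian log-likelihood of vec under each class."""
    scores = {}
    for label, class_mean in means.items():
        ll = 0.0
        for x, m, var in zip(vec, class_mean, variance):
            ll += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
        scores[label] = ll
    return scores


# Toy 2-dimensional "i-vectors", for illustration only.
vectors = [[0.0, 0.1], [0.2, -0.1], [2.0, 2.1], [2.2, 1.9]]
labels = ["lang_a", "lang_a", "lang_b", "lang_b"]
means, variance = train_gaussian_backend(vectors, labels)
scores = score([0.1, 0.0], means, variance)
predicted = max(scores, key=scores.get)
```

An adaptive variant in the spirit of what the talk describes would call `train_gaussian_backend` on a subset of the training vectors chosen for each test vector; how that subset is chosen is not specified in the talk.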
0:06:24and also the multi-resolution neural network
0:06:29which was a new method that we propose here
0:06:32and it aims to exploit the short dialect differences that we capture
0:06:39with the phonetic information
0:06:42so we have different chunk durations from short durations to thirty two second
0:06:51duration chunks and the phone segment and we have different weights for each
0:06:59chunk
0:07:01okay and here we have a comparison
0:07:05for all these five
0:07:07backend systems that we had
0:07:10the multi-resolution neural network was the best performing solution we were using the best
0:07:20single bottleneck features and the non bottleneck features in the case of the multi-resolution
0:07:25neural network we were using just the bottleneck features because
0:07:29we need phonetic information so it makes sense to use the bottleneck features
0:07:37since our bottleneck features were trained with senones
0:07:42and another thing is that the adaptive gaussian backend approaches were more complementary
0:07:49with the non bottleneck i-vectors
0:07:54for all these systems as we can see here for our data
0:07:59and here
0:08:00what i would like to show you is that the bottleneck features clearly work much better
0:08:04than the non bottleneck features
0:08:07for all the backends
0:08:14okay so this is
0:08:15in general the pipeline of our system
0:08:20at the end for the submissions we used fusions of some of these
0:08:26systems fusions of five or six of them
0:08:34either with cluster specific fusion or with overall data fusion and
0:08:41with the scores we did the log-likelihood ratio conversions either within the
0:08:45cluster or with a global
0:08:47log-likelihood ratio and at the end these are the four
0:08:51systems that we presented
0:08:55so for our primary system we used a five way cluster based fusion
0:09:02with cluster based log-likelihood ratio conversions
0:09:05the second one was a two system fusion with cluster based conversions the third
0:09:10one was using the gaussian backend only in a five way cluster based fusion
0:09:16and the fourth one was the same as the second one
0:09:20but with global conversion of the log-likelihood ratios
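The difference between cluster based and global conversion can be illustrated with a small sketch. It assumes the common detection log-likelihood ratio in which each target language is compared with the average likelihood of the competing languages under a flat prior; the talk does not give the exact formula used, and the function names and scores are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def to_llrs(log_likelihoods):
    """Convert per-language log-likelihoods into detection LLRs.

    Each target is compared with the average likelihood of the other
    languages in the same dict (for example, one cluster).
    """
    if len(log_likelihoods) < 2:
        raise ValueError("at least two languages are required")
    llrs = {}
    for target, ll in log_likelihoods.items():
        others = [v for lang, v in log_likelihoods.items() if lang != target]
        llrs[target] = ll - log_mean_exp(others)
    return llrs


# Hypothetical log-likelihoods for the languages of one cluster.
cluster_scores = {"lang_a": -1.0, "lang_b": -3.0, "lang_c": -3.0}
llrs = to_llrs(cluster_scores)
```

Passing only the languages of one cluster gives a cluster based conversion; passing the scores of all target languages at once would correspond to a global conversion.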
0:09:24okay so some evaluation analyses
0:09:32when we got the
0:09:33test data we could see the huge difference that we had between the dev
0:09:39and the test we went from around
0:09:41three percent to twenty three percent
0:09:45it is huge
0:09:47and of course we had questions what happened right
0:09:51so this is a breakdown also of the cost to compare the dev and the test
0:09:58as we can see here this is our primary system
0:10:01so i think it is relevant to say that there is a thirty
0:10:06five percent relative gain over the best single system
0:10:13on the dev but
0:10:14we got an eight percent loss on the evaluation
0:10:19okay so
0:10:22for us the question was what is more important okay
0:10:25the use of different
0:10:27algorithms that we have to develop or the use of a good development set
0:10:38due to this severe mismatch what is more important the algorithms or the use of
0:10:42the data
0:10:44and we ran some analyses to try to have some answers to this
0:10:51using mfcc
0:10:53plus deltas and double deltas with the nn and a gaussian backend
0:11:00is that sixty nine twenty here
0:11:04so after
0:11:07some good discussions with some sites after the evaluation there were several factors
0:11:13in the development lists
0:11:16among others
0:11:17the chunking didn't help at all
0:11:21so we ran some experiments just removing the chunks of
0:11:27the audio
0:11:30also the different split
0:11:34most of the teams were using sixty forty sixty percent for
0:11:39training and forty percent for development
0:11:44i would like to thank the mit guys for providing the lists that
0:11:48we were using
0:11:51and also using all the data for the final backend training and calibration
0:11:56was also a key
0:11:58thing to do
0:12:01and using a uniform speech duration for the dev segments
0:12:06and also we ran some augmentation of the data and some other algorithms that we
0:12:13okay so here are the post evaluation results so as we can see we
0:12:20went from our primary system at twenty three point three
0:12:25to the two way fusion system at twenty one point nine with just that one fusion
0:12:31and we keep
0:12:35improving if we modify the training and dev splitting we are using
0:12:40all the data for training the ubm and the backend systems and
0:12:46the fusions and also
0:12:49if we are not chunking we are also improving
0:12:53the performance so in the end we could have a fifteen percent relative gain
0:13:01so
0:13:03that shows that the development data was a crucial decision
0:13:09also since
0:13:12other sites said they were using a different ubm system for each cluster
0:13:17we wanted to also
0:13:19use this solution and we also
0:13:22could see some improvement
0:13:25thanks to guys from prior for that
0:13:30so we wanted to study how sensitive are the different
0:13:36blocks in our pipeline to this mismatch so we used a trick to get
0:13:42some data from the test and put it in the development we created
0:13:46four variations of this to put some data into the different parts
0:13:51of our pipeline
0:13:55basically we can say that the backend and the i-vector extractor
0:14:02are significantly impacted by the mismatch because we can see there is a few
0:14:07percent of relative gain and a sixty percent relative gain in these
0:14:16steps respectively
0:14:18so some messages to take home are that
0:14:23for us the fusion and the chunking of the training data for
0:14:30the classification did not work
0:14:32and what works
0:14:34and it also works for the rest of the groups i guess are the bottleneck features
0:14:39and the gaussian and neural network backends
0:14:45and also
0:14:48it was shown here that having a good development set was
0:14:54something very important for this
0:14:57okay that is all
0:15:05we have time for questions
0:15:12all the channels cz getting they segments that we have and lead segment a speeding
0:15:20very short segment
0:15:22from the second two seconds
0:15:27for the backend was used for the work
0:15:37and the question
0:15:46just a comment i guess this is a commonality with others but we did find in fact that
0:15:51we could be successful with an eighty twenty split and with varying segment durations
0:15:58for all classifier training
0:16:03figure two no
0:16:04so we are
0:16:06is not the ones for this okay good to know
0:16:09we could you sure the spleen at least
0:16:12just yes i think we could we had documentations in it too so we have
0:16:17to talk about that part of this
0:16:23could you put up to us like the can where you didn't the twenty at
0:16:27the at twenty and then went down to the sixty forty splits
0:16:33so that it was really nice to see that because i think most groups we
0:16:37saw most sensitive using sixty forty than the data retrain right we didn't have an
0:16:43operating cycles receive you cycles what an hour training so we did we actually started
0:16:47to sixty which was where her track what hurt us
0:16:50but i think most folks of they started with the at if they didn't do
0:16:54a retrain probably
0:16:56did or did okay
0:16:58but i think that's actually showed really nice improvement on where exactly so when you
0:17:03do all
0:17:05you did is then all test
0:17:09that is the you that is the and
0:17:16any other questions
0:17:23okay well let's thank the speaker again