0:00:06 Good morning. What I would like to present here today is our Language Recognition Evaluation 2009 submission. We did a lot of work after the evaluation to figure out what happened, because we actually saw a big difference between the performance on our development data and the performance on the actual evaluation data.
0:00:36 So, first I will try to explain what was new in language recognition in two thousand nine, where some new data appeared. Then I will go through a very quick and brief description of our whole system, and then I will concentrate on the issues of calibration and data selection, and on how we resolved the problems with our original development set. Then I will try to conclude our work.
0:01:12 So, what was new in two thousand nine: a new source of data came into language recognition. These data are broadcasts from the Voice of America. There is a big archive of these broadcasts covering many languages, and from this archive only the detected telephone calls were actually used. These data brought a big variability on top of the original CTS data we had always used for training our language ID systems, so they brought some new problems with calibration and channel compensation.
0:01:57 These are the languages which are present; I would have to check whether they are all still present in the Voice of America archive. As you can see, the number of languages here is very large, and it gave us a very nice dataset to test our systems on and the ability to improve language recognition systems so that they can classify many more languages.
0:02:31 For the two thousand nine NIST LRE, these are the twenty-three target languages. The bold ones are the languages for which we had only data coming from the Voice of America archive, so there was no CTS data for training on these languages. For the other languages we also had normal conversational telephone speech data recorded by the LDC for previous evaluations and also for the two thousand nine evaluation. So we had to deal with this issue and do proper calibration and channel compensation.
0:03:22 What motivated us after the evaluation to do this work, to go back to our development set and to run a lot of experiments, was that we saw a huge difference between the performance on our original development set and on the evaluation set collected by NIST. All of the numbers you will see here will be the average detection cost defined by NIST.
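For orientation, here is my own simplified rendering of that cost over the N closed-set target languages (the official LRE 2009 definition also includes an out-of-set term, omitted here); the values quoted later in the talk appear to be this quantity times one hundred:

C_{avg} = \frac{1}{N}\sum_{L_T}\Big[ C_{miss}\, P_{tar}\, P_{miss}(L_T) \;+\; \frac{C_{fa}\,(1-P_{tar})}{N-1}\sum_{L_N \neq L_T} P_{fa}(L_T, L_N) \Big], \qquad C_{miss}=C_{fa}=1,\quad P_{tar}=0.5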
0:03:55 At the language recognition workshop there were a lot of discussions about the crafting of development sets for the systems. Some people created rather small and very clean development sets; we had actually a very huge development set containing a lot of data, which brought some computational issues when training the systems, but we decided to go with this big development set. In the end it did not turn out to be the best decision, but we had to live with that.
0:04:42 So, a presentation of our system, of what we had in the submission. We had two types of front ends. The first are acoustic front ends, which are based on GMM modelling, and the features are MFCC-derived; actually these are the popular shifted delta cepstral features. Among the acoustic systems we had a JFA system, we tried a new feature extraction based on RDLT, and we had a GMM trained with the maximum mutual information criterion using channel-compensated features; we also tried a plain GMM with the RDLT features without any channel compensation. We performed vocal tract length normalisation, cepstral mean and variance normalisation, and we did the voice activity detection using our Hungarian phoneme recogniser, where we mapped all of the phonemes to speech and non-speech classes to make the decision.
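As an illustration only, not the actual feature code of this submission, here is a minimal sketch of shifted delta cepstra, assuming the common 7-1-3-7 configuration:

import numpy as np

def sdc(mfcc, N=7, d=1, P=3, k=7):
    # Shifted delta cepstra over a (T, >=N) matrix of cepstral coefficients,
    # assuming the common N-d-P-k = 7-1-3-7 configuration.
    c = mfcc[:, :N]
    T = len(c)
    cp = np.pad(c, ((d, k * P + d), (0, 0)), mode="edge")  # pad so all shifted indices stay in range
    deltas = []
    for t in range(T):
        blocks = [cp[t + d + i * P + d] - cp[t + d + i * P - d] for i in range(k)]
        deltas.append(np.concatenate(blocks))
    return np.hstack([c, np.asarray(deltas)])  # static cepstra plus k*N delta dimensions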
0:05:55 Then, it is a standard JFA system, as you can see, but this time of course without the eigenvoices; there is only the channel variability present. So we have a supervector of GMM means for every speech segment, which is then channel dependent. The channel loading matrix was trained using the EM algorithm, and five hundred sessions per language were used to train it. The language-dependent supervectors were adapted using relevance MAP, also trained using the five hundred segments per language.
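In equations, as my own compact summary of the standard language-ID flavour of JFA described here (not copied from the slides), the GMM mean supervector of a segment from language L is modelled as

\mathbf{s} = \mathbf{m}_L + \mathbf{U}\,\mathbf{x}, \qquad \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),

where U is the channel loading matrix trained by EM and x is the segment's channel factor. The language supervectors come from relevance MAP adaptation of the UBM supervector, per Gaussian c:

\mathbf{m}_{L,c} = \alpha_c\,\tilde{\mathbf{m}}_{L,c} + (1-\alpha_c)\,\mathbf{m}_{0,c}, \qquad \alpha_c = \frac{n_{L,c}}{n_{L,c} + \tau},

with n_{L,c} the amount of data assigned to Gaussian c, \tilde{\mathbf{m}}_{L,c} the corresponding data mean, and \tau the relevance factor.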
0:06:49 This is actually the core acoustic system here, because it also uses our RDLT features, and as you will see later on, we decided to drop the RDLT features and use just the JFA system with the plain shifted delta cepstra.
0:07:11 We tried a new discriminative technique to derive our features. This technique is based on region dependent linear transforms; it was introduced in speech recognition, where it is known as fMPE. The idea is that we have some linear transformations which take our features, and we then take a linear combination of these transformations to form a new feature which is discriminatively trained. I borrowed a picture and I will try to describe, at least very briefly, what is going on. At the start we have some linear transformations; in the beginning they are initialised so that they create just the shifted delta cepstral features. We have a GMM which is trained over all languages and which is supposed to select the transformations in every step; it actually provides the weights with which we combine the transformations. So for every twenty-one frames, we take the twenty-one frames of MFCCs, put them into the GMM, take the most likely Gaussian components, which provide us the weights, and we linearly combine the transformations according to these weights. Usually it happened that only one to three Gaussian components were non-zero for these twenty-one frames, so not all of the transformations were combined; all the other weights are set to zero. Then we take the linearly combined transformations and sum them up, and there is a GMM which evaluates these features; according to the training criterion we update the linear transforms, and then we move on to the next twenty-one frames and keep training the system. In the end, after the training, these are the features we feed into our JFA.
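A minimal sketch of how such region dependent linear transforms might be applied at run time, as my own illustration with hypothetical shapes (the discriminative update of the transforms during training is not shown):

import numpy as np

def rdlt_features(context, means, covs, weights, transforms, top_k=3):
    # context:    stacked features of one window (e.g. 21 frames of MFCCs), shape (D,)
    # means, covs, weights: diagonal-covariance GMM with R components over such windows
    # transforms: one linear transform per GMM component ("region"), shape (R, F, D)
    # Returns one F-dimensional discriminatively derived feature vector.
    ll = -0.5 * (((context - means) ** 2) / covs + np.log(2 * np.pi * covs)).sum(axis=1)
    ll += np.log(weights)
    post = np.exp(ll - ll.max())
    post /= post.sum()
    keep = np.argsort(post)[-top_k:]            # only the few most likely regions survive
    gamma = np.zeros_like(post)
    gamma[keep] = post[keep] / post[keep].sum()
    return sum(gamma[r] * (transforms[r] @ context) for r in keep)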
0:09:43 The next acoustic system was a GMM which was discriminatively trained using the maximum mutual information criterion, and we used features which were channel compensated.
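As a reminder, in my own notation rather than a formula from the talk, MMI training of the language GMMs \lambda maximises the posterior of the correct language for every training segment:

\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_{s} \log \frac{p_\lambda(\mathbf{X}_s \mid L_s)\, P(L_s)}{\sum_{L} p_\lambda(\mathbf{X}_s \mid L)\, P(L)},

where X_s are the (channel-compensated) features of segment s and L_s its true language label.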
0:10:04 So that was it for the acoustic subsystems; now some comments on the phonotactic ones. The core of our phonotactic systems were of course our phoneme recognisers. The first one, the English one, is a GMM-based phoneme recogniser built on the triphone acoustic models from an LVCSR system, but with just a simple language model. The two other phoneme recognisers for the phonotactic systems, the Russian and the Hungarian ones, are neural network based: the neural network estimates the posterior probabilities of the phonemes and then feeds them to the HMM for decoding. These phoneme recognisers were used to build three binary decision tree language models and one SVM system, which was based on the Hungarian phoneme recogniser. Here four-grams were used, and the SVM was actually using only the trigram lattice counts as features.
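A minimal sketch of the phonotactic SVM idea, as my own illustration: it uses plain 1-best phoneme strings, whereas the system described here used expected trigram counts collected from lattices.

import numpy as np
from itertools import product

def trigram_count_vector(phones, inventory):
    # Normalised trigram counts of one decoded phoneme string.
    index = {t: i for i, t in enumerate(product(inventory, repeat=3))}
    v = np.zeros(len(index))
    for t in zip(phones, phones[1:], phones[2:]):
        v[index[t]] += 1.0
    return v / max(v.sum(), 1.0)

# Hypothetical usage with scikit-learn, one linear SVM per target language:
#   from sklearn.svm import LinearSVC
#   X = np.vstack([trigram_count_vector(p, inventory) for p in decoded_segments])
#   svms = {lang: LinearSVC().fit(X, [l == lang for l in labels]) for lang in set(labels)}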
0:11:21 Then we were doing the fusion. We used the multi-class logistic regression from the FoCal toolkit. The new thing is that, for the first time, we did not train three separate backends, one for each duration condition; we tried to do a duration-independent fusion. Every system was outputting some raw scores, and in addition it was also outputting some information about the length of a segment, which for the acoustic systems was the number of frames, while the phonotactic systems provided the number of phonemes. These raw scores from every system were then going into a Gaussian backend. We had three Gaussian backends per system, because we used three kinds of length normalisation: either we divided the scores by the length, or by its square root, or we did not do anything. Then we put all of the log-likelihoods from these Gaussian backends into the multi-class logistic regression, which is discriminatively trained and outputs the calibrated language log-likelihood scores.
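A minimal sketch of this backend and fusion stage, as my own illustration with hypothetical array shapes (the actual submission used the FoCal toolkit for the multi-class logistic regression):

import numpy as np
from sklearn.linear_model import LogisticRegression

def length_normalise(raw, length, mode):
    # The three variants mentioned in the talk: none, divide by length, divide by its square root.
    return {"none": raw, "len": raw / length, "sqrt": raw / np.sqrt(length)}[mode]

def gaussian_backend(scores, labels):
    # Per-language Gaussians with a shared covariance over raw score vectors;
    # returns a function mapping one score vector to per-language log-likelihoods.
    labels = np.asarray(labels)
    langs = sorted(set(labels))
    mu = np.array([scores[labels == l].mean(axis=0) for l in langs])
    centred = np.vstack([scores[labels == l] - mu[i] for i, l in enumerate(langs)])
    icov = np.linalg.inv(np.cov(centred.T) + 1e-6 * np.eye(scores.shape[1]))
    return lambda x: -0.5 * np.einsum("ij,jk,ik->i", x - mu, icov, x - mu)

# Hypothetical final stage: stack the backend log-likelihoods for the three length
# normalisations plus the segment lengths, then calibrate with multi-class logistic regression:
#   fused = np.hstack([ll_none, ll_len, ll_sqrt, lengths[:, None]])
#   calibrator = LogisticRegression(max_iter=1000).fit(fused, labels)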
0:12:51 So here is a scheme of the fusion. Again, each system outputs its score, and the score is either taken as it is, or normalised by the square root of the length, or divided by the length; the outputs of the Gaussian backends then go, together with the information about the lengths, into the discriminatively trained multi-class logistic regression.
0:13:22 So, the actual core of this paper was to go through our development set and decide what the problem was and how to address it. We were lucky that our friends in Torino provided us with their development set, so we were able to do this analysis. Actually, in Torino they had a much smaller development set than we had: if I remember correctly, it contained about ten thousand segments from thirty-three or thirty-four languages. Our development set was very huge: it contained data from fifty-seven languages and about sixty thousand segments.
0:14:14 So we did an experiment where we tried to recreate the calibration using the whole training and development sets; of course we had both our own training and development data and the Torino data, and we did four types of experiments: either we were training our system and calibrating it on the Politecnico di Torino development set, or we trained on the LPT set and then calibrated on our set, or we trained on our set and calibrated on the LPT set, or we trained on our set and calibrated on our set. The violet columns are our original scores. This analysis was of course done using only one acoustic subsystem, the JFA system, because it would not have been feasible to run all of the systems through the training again.
0:15:16 As you can see, we had some serious issues for some languages; actually, these were the languages for which only the Voice of America data were available. So Bosnian was an issue: you can see a big difference between the LPT set and our set. The blue column is training on our set and using the Torino development set for calibration, so there must have been some bothersome issue in our development set. The problem languages were Bosnian and Farsi, and also in the final score we were losing some performance everywhere.
0:16:08 So we tried to focus on these languages and find what issues we had in our development set. The first issue we found was ridiculous: we had mislabelled one language in our development set; actually there was one label for Farsi and another for Persian, and we treated them as different languages. We corrected this, and the problems for that language mostly disappeared. The next problem we addressed was finding the repeating speakers between the training and the development sets, because based on the discussions at the language recognition workshop we already suspected this could be a problem for our training and development data.
0:16:58 So what we did: we took our speaker ID system from previous evaluations, which is a GMM-based speaker ID system, trained a model for every training segment within a language, and tested it against the segments in the development set. What we ended up with was this bimodal distribution of scores. This part here, the high speaker ID scores, shows that there are some recurring speakers between the training and the development sets.
0:17:46 When we looked at these pictures, we decided to threshold the data and to discard from our development set everything with a speaker ID score higher than the threshold, which for this Ukrainian language, for example, was twenty. When we did this experiment, we discovered that for some languages we were discarding almost everything from our development set; for example, for Bosnian we ended up with just fourteen segments in our development set. For the other languages where we were doing the speaker identification filtering we also discarded a lot of the data; for example for Ukrainian only twelve segments remained.
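A minimal sketch of this filtering step, as my own illustration; speaker_id_score is a hypothetical stand-in for the GMM speaker-ID scoring used by the group, and the threshold of twenty is the one quoted for Ukrainian:

def filter_dev_set(train_segments, dev_segments, speaker_id_score, threshold=20.0):
    # Keep only development segments whose best speaker-ID score against any
    # training segment of the same language stays below the threshold.
    kept = []
    for dev in dev_segments:
        same_lang = [t for t in train_segments if t.language == dev.language]
        best = max((speaker_id_score(t, dev) for t in same_lang), default=float("-inf"))
        if best <= threshold:  # low score: most likely a new speaker, keep the segment
            kept.append(dev)
    return kept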
0:18:39 So what was the performance change when we did this? Merely correcting the label already showed some improvement, and then the speaker ID filtering made quite a huge difference in the performance. Again, these are the results for our acoustic subsystem, the JFA system with the RDLT features.
0:19:17 When we did this, we decided to run the whole fusion on our filtered data. Note that we did not change anything else; we did not retrain any subsystem we had in the submission for the NIST language recognition evaluation, we just filtered out scores from our development set and ran the fusion again. And we were gaining some quite substantial performance improvements. For the thirty-second condition, the C average went from two point three to one point ninety-three, which is quite a nice improvement, and if you look at the table, for every duration there is an improvement; I think there is no number which deteriorated. So it worked over all the conditions, over the whole set, for every language and for every duration.
0:20:25 What we also saw here was a slight deterioration of the results on our own development set. The cause could be that our system was actually partly trained on the speakers, and for some languages it can recognise the speaker better than the language.
0:20:55 So then we decided to work on our acoustic system, the JFA RDLT system, because we wanted to do other experiments to improve the final fusion. What we did was simply discard the RDLT features, use the plain shifted delta cepstra, and retrain the system, and there was some improvement out of this. Also, we trained the JFA using all the segments per language instead of five hundred segments per language, and this brought some nice improvement.
0:21:41 So when we did the final fusion, we discarded the RDLT JFA and replaced it with the normal JFA system; the MMI system still remained in the fusion, and instead of all the other binary trees and that one SVM we put in quite a lot of SVM systems which are phonotactic based, built on all of our phoneme recognisers. There will be a talk at two p.m. which will explain more about this system. When we did this, the final fusion went from one point nine, as we saw previously, to one point fifty-seven, which is a very competitive result; of course, it is a post-evaluation result.
0:22:44 So what are the conclusions of this work? We really have to care about our development data, and rather than creating a huge development set it is better to pay attention and have a smaller but filtered and clean development set. We actually did experiments with giving more data to our systems, and it did not help us. The problem of the repeating speakers between the training and the development sets was quite large, and we should pay attention to it when we are preparing the next evaluations, so that this is handled well. So, thank you.
0:23:33 [Audience question, largely inaudible; apparently about the Torino development set and whether it was filtered for repeating speakers.]
0:24:02 We looked at this, and we talked with them at the workshop; they were doing the speaker filtering as well. But we did not filter their set against our own training set, so even if some repeating speakers remained in there, we just used it as it was.
0:24:26 [Audience question, partly inaudible; whether the same speakers also appear in the evaluation set.]
0:24:36 We do not know that, and we did not check it. We just wanted to treat our evaluation set as an evaluation set, so we did not look into it.
0:24:48 [Audience follow-up, partly inaudible.]
0:24:53 Well, I think there are not that many repeating speakers in the evaluation set, because as I understood it, NIST was using some previously recorded data, and it is much less likely that there will be repeating speakers there again. For some of them it can of course happen, but we did not actually check it.
0:25:19 [Audience question, partly inaudible; about which of the systems actually turned out to work best.]
0:25:33 Yeah, it is like that. We were making a lot of effort to try this new RDLT technique of ours, and it did not work. What was working was combining scores of many phonotactic systems, as you did in your submission: just combining thirteen PCA-based SVM systems built on our phoneme recognisers gave actually very nice results, quite comparable, with the number at one point seventy-eight. These SVM systems alone were better than our final submission, even after the filtering of the calibration data.
0:26:16 [Inaudible exchange.]