0:00:06 Okay, so my name is Sandro Cumani, I am from Politecnico di Torino, and I will present our work. Its title is "Analysis of Large-Scale SVM Training Algorithms for Language Recognition".
0:00:21 This is the outline of the work: first a short introduction, then I will spend a few words on support vector machines. Then we discuss some algorithms for training large-scale support vector machines. I will present the subset of the LRE models that we trained in order to evaluate their performance. Then I will present our experimental results, continue with some notes on pushed GMM systems, and close with the conclusions on SVM training.
0:00:58 So, why SVMs? As you can see, SVMs tend to appear in many different systems; here we will focus on our language recognition systems. Just to make some examples, we have a phonetic n-gram based system, a GSV SVM system, and pushed GMMs. They are quite different, but they all share one block, which is SVM training and classification. For the phonetic and GSV SVM systems the SVM is the classifier itself; in GMM pushing it is actually used in a different way. However, they all need SVM training.
0:01:38 So, support vector machines. An SVM is a linear classifier whose objective function can be cast as a regularized risk minimization problem. The most used loss function is the hinge loss, which gives rise to what is called a soft-margin classifier, and the regularization term is given by the norm of the hyperplane, which is actually related to the inverse of the margin. So we have a trade-off between the margin and the misclassification error.
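For reference, the soft-margin objective described here is the standard regularized risk formulation; in the usual notation (not taken from the slides), with hyperplane w, regularization coefficient λ, and labeled patterns (x_i, y_i):

```latex
\min_{\mathbf{w}} \;\;
\underbrace{\frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^{2}}_{\text{regularizer}\ \sim\ 1/\text{margin}^{2}}
\;+\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n} \max\!\left(0,\; 1 - y_i\,\mathbf{w}^{\top}\mathbf{x}_i\right)}_{\text{hinge loss}}
```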
0:02:19 Another formulation is given by the dual form of the SVM problem, which is actually a constrained quadratic optimization problem. This formulation is interesting because we have a matrix of dot products between training patterns, and the fact that we can work with dot products alone allows us to extend the support vector machine to nonlinear classification by means of what is called the kernel trick: we map our data to a high-dimensional space by just evaluating dot products in that space, without the need to actually perform any kind of projection.
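In standard notation (again, not the slides' own), the dual referred to here is the constrained quadratic problem

```latex
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{n} \alpha_i
\;-\; \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

where the training patterns enter only through the dot products k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩; replacing the plain dot product with another kernel function is the kernel trick.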
0:03:04 So, why large-scale SVMs? Because we have many training patterns: for LRE09 we have around seventeen thousand, which may not be so many for a recognition system in general, but for our needs they are many. And the dimensionality also varies a lot, because we can go from about one thousand for a bigram model with thirty-five phone units to more than one hundred thousand for a GMM system.
0:03:41 So now I present different algorithms to train the SVM in an efficient way. Most of these algorithms actually work for linear kernels only, but the kernels we use are almost always linear, so this is not a problem.
0:04:04 This is our baseline, SVMLight, which is one of the most famous dual-space solvers. It solves the dual problem in an iterative way, by decomposing the actual problem into smaller subproblems. The problem is that it has a quadratic time behavior. It can be sped up by caching the kernel evaluations; however, the matrix of dot products also tends to grow quadratically in memory. So we are interested in algorithms which are memory bounded and possibly linear in time.
0:05:10 The first algorithm we analyzed was Pegasos, which is a primal-space solver based on subgradient stochastic descent. We talk about subgradients because the loss function is not differentiable everywhere, so we cannot take the gradient and we resort to subgradients. And we have stochastic selection of the learning samples: we do not train the system every time on the whole database, but we just select a random subset of training patterns. In order to improve the convergence, there is also a projection onto a ball whose radius is the inverse square root of the regularization coefficient of the SVM problem formulation, and this actually helps to reach convergence. The problem with this algorithm is that it does not directly provide the dual solution of the SVM problem; however, if we need it, as we do if we want to implement pushed GMMs as MIT proposed, we can actually recover it while we are training the hyperplane.
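The following is a minimal sketch of a Pegasos-style update loop, assuming the standard algorithm of Shalev-Shwartz et al.; the function name, batch size, and iteration count are illustrative, not the talk's actual configuration:

```python
import numpy as np

def pegasos_train(X, y, lam, n_iters, batch_size=64, seed=0):
    """Pegasos-style primal solver: stochastic subgradient steps on the
    hinge loss, plus projection onto a ball of radius 1/sqrt(lam)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    radius = 1.0 / np.sqrt(lam)
    for t in range(1, n_iters + 1):
        # stochastic selection: a random subset, not the whole database
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        Xb, yb = X[idx], y[idx]
        viol = yb * (Xb @ w) < 1          # where the hinge loss is active
        g = lam * w                       # subgradient of the objective
        if viol.any():
            g -= (yb[viol, None] * Xb[viol]).sum(axis=0) / len(idx)
        w -= g / (lam * t)                # decaying step size 1/(lam * t)
        norm = np.linalg.norm(w)          # projection step: helps
        if norm > radius:                 # convergence, as noted in the talk
            w *= radius / norm
    return w
```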
0:06:33 Next we have dual coordinate descent. This time we move back to a dual-space solver, and again we have an iterative solver, which performs coordinate descent in the dual space. We split the dual problem into a series of univariate optimizations, where we keep all but one variable fixed and we optimize just that one variable by some kind of one-dimensional minimization; we just have to project the gradient in order to ensure that the SVM dual problem constraints are satisfied. This time we do not directly have the primal solution, but it is very easy to update it while we are updating the dual solution. This is nice also because, in order to evaluate the scores, we do not have to store the support vectors, since we already have the hyperplane. This algorithm can be sped up by performing a random permutation of the subproblems, that is, we just switch the order in which we optimize the variables, and also by introducing some sort of shrinking, which means that we tend not to update the variables which have reached the bounds of the constraints of the SVM problem, because they will probably stay there; we just check that this assumption is correct when we meet the convergence criterion.
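A minimal sketch of such a dual coordinate descent loop, in the spirit of the LIBLINEAR solver of Hsieh et al.; the simple at-bound test below stands in for full shrinking, and all names are illustrative:

```python
import numpy as np

def dcd_train(X, y, C, n_epochs=10, seed=0):
    """Dual coordinate descent for the L1-loss SVM dual: optimize one
    alpha_i at a time while keeping the primal hyperplane w in sync."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Qii = np.einsum('ij,ij->i', X, X)      # diagonal of the Gram matrix
    for _ in range(n_epochs):
        for i in rng.permutation(n):       # random order speeds convergence
            if Qii[i] == 0.0:
                continue
            G = y[i] * (X[i] @ w) - 1.0    # gradient along alpha_i
            # projected gradient: variables sitting at a bound whose gradient
            # pushes them outward are left alone (a real solver would
            # "shrink" them away until the final convergence check)
            if (alpha[i] == 0.0 and G >= 0.0) or (alpha[i] == C and G <= 0.0):
                continue
            old = alpha[i]
            alpha[i] = min(max(old - G / Qii[i], 0.0), C)   # box-clipped step
            w += (alpha[i] - old) * y[i] * X[i]             # cheap primal update
    return w, alpha
```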
0:08:22 Then we have a primal-space solver, the cutting-plane approach introduced in SVMperf. This is based on a different formulation of the SVM problem, the so-called one-slack formulation, where we optimize over the hyperplane and a single slack variable, and this time we have a much larger set of constraints. What this algorithm does is iteratively rebuild a working set of the constraints over which we solve the quadratic problem. What is interesting in this algorithm is that the solution is not represented by means of support vectors, but by means of so-called basis vectors, which essentially play the same role but are not actually taken from the training set itself. So what we obtain is a much sparser representation, because the number of basis vectors is much more limited compared with the support vectors, whose number actually tends to increase linearly with the training set size. However, the problem is that this time we cannot easily recover the dual solution of the SVM problem. What is nice to see is that, since we have so few basis vectors, it is easy to extend this technique to nonlinear kernels, but we actually did not try.
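For reference, the one-slack formulation referred to here is, in Joachims' standard notation (not the slides' own):

```latex
\min_{\mathbf{w},\,\xi \ge 0} \;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\,\xi
\qquad \text{s.t.} \quad
\forall\, \mathbf{c} \in \{0,1\}^{n}: \;\;
\frac{1}{n}\,\mathbf{w}^{\top} \sum_{i=1}^{n} c_i\, y_i\, \mathbf{x}_i
\;\ge\; \frac{1}{n} \sum_{i=1}^{n} c_i \;-\; \xi
```

A single slack variable ξ is shared by the 2^n constraints, and the cutting-plane solver only materializes the few constraints in the working set; the aggregated vectors Σ_i c_i y_i x_i are what play the role of the basis vectors mentioned above, which is why they are not training patterns themselves.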
0:10:03 The final algorithm is BMRM, which comes from a more general regularized risk minimization framework and which, for the SVM, amounts to a form of cutting planes. This time again we iteratively build a working set of approximate solutions by taking tangent planes to the objective function, and we solve the minimization over the functional approximated by means of the tangent planes. So we still need to solve a quadratic problem, but the size of this quadratic problem is much smaller: it is actually equal to the number of tangent planes we are using to approximate the function, which is also equal to the number of iterations we have taken. So the size of this problem is much, much smaller than the size of the original problem, and usually it can be neglected, since we do not need more than two hundred or so iterations. BMRM handles the primal formulation of the SVM problem, but the dual solution can be derived easily this time as well.
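A sketch of the approximation being described, in standard bundle-method notation (not the slides'): each subgradient a_j of the empirical risk R_emp at iterate w_j gives a tangent (cutting) plane R_emp(w) ≥ ⟨a_j, w⟩ + b_j, and iteration t solves

```latex
\mathbf{w}_{t+1} \;=\; \arg\min_{\mathbf{w}} \;\;
\frac{\lambda}{2}\lVert\mathbf{w}\rVert^{2}
\;+\; \max_{1 \le j \le t}\,\bigl(\langle \mathbf{a}_j, \mathbf{w}\rangle + b_j\bigr)
```

whose dual is a quadratic problem with only t variables, one per tangent plane, which is why the cost of this inner problem is negligible for a couple of hundred iterations.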
0:11:23 Now, the models. We trained a small subset of the models we used for the LRE09 evaluation. The phonetic model is just a standard bigram-based system, where we perform phone decoding using an Italian tokenizer. Then we extract the bigram counts, we perform SVM training, and we adopt the TFLLR kernel, which is actually a linear kernel, so we just perform some kind of frequency normalization of the counts before feeding them to the SVM.
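A hedged sketch of what a TFLLR-style normalization looks like, assuming the recipe of Campbell et al.; the helper name and the background-frequency input are illustrative, not the talk's actual pipeline:

```python
import numpy as np

def tfllr_features(ngram_counts, background_freq, floor=1e-8):
    """Scale per-utterance n-gram frequencies by 1/sqrt(background
    frequency), so a plain dot product approximates a TFLLR kernel."""
    # relative n-gram frequencies for this utterance
    rel = ngram_counts / max(ngram_counts.sum(), 1)
    # floor the background frequencies to avoid division by zero
    return rel / np.sqrt(np.maximum(background_freq, floor))
```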
0:12:02 The acoustic system is a standard 2048-Gaussian GMM with 56-dimensional features. We stack the Gaussian means into supervectors, and we use the KL kernel, which again just amounts to normalizing the patterns.
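A hedged sketch of the KL-kernel normalization for mean supervectors, assuming the usual Campbell-style recipe; the talk does not spell out the exact scaling:

```python
import numpy as np

def kl_supervector(adapted_means, ubm_weights, ubm_diag_covs):
    """Stack per-utterance adapted GMM means into one supervector, scaled
    so that a plain dot product approximates the KL-divergence kernel.

    adapted_means: (C, d) adapted Gaussian means
    ubm_weights:   (C,)   UBM mixture weights
    ubm_diag_covs: (C, d) UBM diagonal covariances
    """
    # per-component scaling sqrt(w_c) * Sigma_c^(-1/2)
    scaled = np.sqrt(ubm_weights)[:, None] * adapted_means / np.sqrt(ubm_diag_covs)
    return scaled.reshape(-1)
```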
0:12:22 The system we were actually interested in evaluating was the pushed GMM system, where we use the SVM dual solution as the combination weights for the model and the anti-model, and scoring is performed by means of a log-likelihood ratio.
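A hedged sketch of this weighting, assuming the usual pushed-GMM construction (the talk does not give the exact formula): the dual coefficients α_i select and weight the training utterances' GMM means, e.g.

```latex
\boldsymbol{\mu}_{\text{model}} \;\propto\; \sum_{i:\, y_i = +1} \alpha_i\, \boldsymbol{\mu}_i,
\qquad
\boldsymbol{\mu}_{\text{anti}} \;\propto\; \sum_{i:\, y_i = -1} \alpha_i\, \boldsymbol{\mu}_i
```

with the test utterance scored by the log-likelihood ratio between the two pushed models. Under this reading, the "arithmetic mean" pushing discussed later simply corresponds to setting all the α_i equal.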
0:12:44 As for the evaluation conditions, we tested on LRE09, which combines twenty-three languages with narrowband broadcast and telephone data. We tested the systems on the thirty-second, ten-second, and three-second evaluation conditions. Training was performed using more or less seventeen thousand training sentences. The main difference between what we did this time and what we did in the LRE09 evaluation is that this time we trained channel-independent systems, while for the evaluation we used channel-dependent systems, so our models are not exactly the ones we used in the evaluation. Class balancing is simulated for all systems except for SVMperf, which does not easily allow it; we perform the simulation by just playing with the C factor of the loss function. And in order to improve time performance, all models were trained together, so that with just one scan of the database we train all the models.
0:14:00 Here are the results for the phonetic system. The reference here is the same system trained with the hinge loss, shown just to give a reference result. What you can see is that all results refer to systems which have met the tight convergence criterion, and all the systems perform almost the same, except for SVMperf, which suffers from the lack of class balancing.
0:14:46 This is the same for the acoustic system: the results are almost the same. This time we do not have SVMperf, because it does not provide the dual solution, which we need to build the pushed GMM system.
0:15:04 Now the training times; here is the phonetic system, thirty-second condition. What we can see is that dual coordinate descent performs very well. SVMLight, which is our baseline, is not shown because it took more than nine thousand seconds to train, so we just report its cost; its training time was just too large to show in the plot. What we can see is that all the algorithms improve performance with training time, but the best performing one is actually coordinate descent, which allows us to train the SVM in more or less two hundred seconds.
0:15:56 It is the same for the ten-second condition. For the three-second condition we can actually notice that the best generalization of the SVM is reached slightly before we actually meet the convergence criterion; however, it is not so relevant.
0:16:23 Now we get to the acoustic pushed GMM system. For the thirty-second condition there is nothing new with respect to the previous graphs: again coordinate descent performs quite well, and there is just a small difference in cost between one system and another, but it is very little.
0:16:48 For the ten-second condition we start to see some interesting things, and for the three-second condition we obtained results which we did not expect. Here, for the pushed system, the first part of the graph represents pushing with weights which are far from the SVM optimum, and what we obtain is that the SVM optimum is actually not optimal for pushed GMMs, at least for the three-second condition. Here we have the first iteration of the BMRM method, which essentially amounts to pushing by simply taking the arithmetic mean of the true-class patterns and of the false-class patterns, without any need of SVM training; and for the three-second condition this actually performs even better than using the SVM-optimal weighting. And here we have some other kinds of weightings which are very far from the optimum; they have no real meaning, they are just intermediate solutions of the algorithm. They are very far from the optimum, but they are the best performing for the three-second condition.
0:18:22 So, as we said, for pushed GMMs we obtained unexpected results: we get good performance even when the pushing weights are very far from the SVM optimum. A model and an anti-model trained by just taking the arithmetic mean actually improve performance on the three-second condition; and while SVM-based pushing improves performance for the thirty-second condition, and slightly for the ten-second condition, for the three-second condition we still have to look into it.
0:18:58 So now the conclusions on SVM modeling. We tried different algorithms. What we obtained is that dual coordinate descent proved to be the fastest one for our problems, and if we are still interested in the dual solution, it is provided natively. However, since the solution is updated after each pattern, it cannot directly be distributed in a grid or cluster environment. SVMperf is the second fastest algorithm, and it can take advantage of a distributed environment, since the updates are performed just at the end of a complete database scan; the scaling is good also for nonlinear kernels. However, the dual solution is not provided, and class balancing cannot be implemented directly in an easy way, as is possible with the other algorithms. Then BMRM: BMRM is much slower than the other algorithms; however, we still have to see how much we can speed it up by using a distributed environment, since again the solution update is performed after a complete database scan. What is also interesting in this algorithm is that it allows working with very different loss functions. And finally Pegasos, which turned out to be slower than the other ones and moreover cannot exploit a distributed environment.
0:20:44 So, that was all. Thank you. Questions?
0:20:59 (Question from the audience, partly inaudible:) You showed that stopping before reaching the SVM solution performs better; does it mean that the model is overfitting?
0:21:34 Okay, actually I think that the optimal solution of the SVM problem is trying to minimize an estimate of the generalization error, and it is just an estimate, because we train on the training set and the SVM tries to estimate the generalization error while still using the training set. So when we actually deploy it, it can happen that the actual best generalization error is not obtained when we have reached the tight convergence criterion, but when we impose a less tight criterion for convergence, that is, when we stop some iterations before. So maybe it is just a matter of imposing less tight convergence conditions.
0:22:38 (Question:) With the number of samples that you trained on, doesn't the kernel matrix become too large to store in memory?
0:23:05 Yeah, actually it should. We went from LRE05, where we had five thousand samples, to LRE09, where we have seventeen thousand, and we do not know what we will have next: if we had, say, thirty thousand, I do not know if it would still fit in memory. And yes, SVMLight was trained by evaluating the kernel matrix and storing it in memory.
0:23:35 (Follow-up question inaudible.)
0:23:44 (Question:) I had a quick question: when you say class balancing, do you mean giving equal weight to the positive and the negative samples? (Answer:) Yeah, well, actually I mean we tried to simulate having the same number of true samples and false samples by playing with the scaling parameter of the loss function: that is, the losses for the true patterns and the losses for the false patterns are just weighted differently.
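In standard notation (not the speaker's slides), this amounts to weighting the two sides of the loss with class-dependent factors, e.g.

```latex
C^{+} \sum_{i:\, y_i = +1} \xi_i \;+\; C^{-} \sum_{i:\, y_i = -1} \xi_i,
\qquad \text{with} \quad C^{+}\, n^{+} \;=\; C^{-}\, n^{-}
```

so that the n⁺ true samples and the n⁻ false samples contribute equally to the objective.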
0:24:15 Okay, let's thank the speaker again.