Okay, so my name is [inaudible]; I am from the Politecnico di Torino, and I will present our work. Its title is "Analysis of Large-Scale SVM Training Algorithms for Language Recognition".

This is the outline of the talk: I will spend a few words on support vector machines, then we will discuss some algorithms for fast training of large-scale support vector machines. I will present the subset of our LRE09 models that we trained in order to evaluate their performance, then I will present our experimental results, and we will conclude with some notes on the pushed-GMM system and some conclusions on the training algorithms.

So, why SVMs? SVMs appear in many different systems; here we will focus on language recognition. We have a phonetic n-gram based system, a GMM-supervector system and pushed GMMs. They are quite different, but they all share one block, which is SVM training and classification. For the phonetic and GMM-supervector systems the SVM is used directly as a classifier; in pushed GMMs it is used in a different way; however, they all need SVM training.

A support vector machine is a linear classifier. Its objective function can be cast as a regularized risk minimization problem. The most used loss function is the hinge loss, which gives rise to what is called a soft-margin classifier, and the regularization term is given by the squared norm of the hyperplane normal, which is related to the inverse of the margin; so we have a trade-off between the margin and the misclassification error.

Another formulation is given by the dual of the SVM problem, which is a constrained quadratic optimization problem. The dual formulation is interesting because it involves only dot products between training patterns.
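Since the talk only sketches the objective verbally, here is a minimal numerical illustration of the regularized hinge-loss (soft-margin) objective just described; the toy data and the regularization constant are made up for illustration:

```python
# Toy illustration of the soft-margin SVM objective:
# J(w) = (lam/2) * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge_objective(w, X, y, lam):
    reg = 0.5 * lam * dot(w, w)          # regularizer: controls the margin width
    losses = [max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y)]
    return reg + sum(losses) / len(X)    # margin vs. misclassification trade-off

X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]]
y = [1, 1, -1]
print(hinge_objective([0.5, 0.5], X, y, 0.1))   # all margins >= 1, so only the regularizer remains: 0.025
```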
Since the solution can be expressed through dot products, we can extend the support vector machine to nonlinear classification by means of what is called the kernel trick: we implicitly map our data to a high-dimensional space by just evaluating dot products in that space, without the need to actually perform any kind of projection.

Now, about the scale of our SVM problems. We have many training patterns: for LRE09 we have about seventeen thousand, which may not seem many for a recognition system in general, but for our needs they are many. The feature dimensionality is also big: we go from about four thousand for a phonetic system with thirty-five phone units to more than one hundred thousand for a GMM-supervector system.

So now I present different algorithms to train the SVM in an efficient way. Most of these algorithms are designed for linear kernels only, but the kernels we use are almost always linear, so this is not a problem.

Our baseline is SVMLight, which is a dual-space solver. It solves the dual problem in an iterative way by decomposing the actual problem into smaller subproblems. The problem is that it exhibits a roughly quadratic time behaviour, and although we did some work to speed it up by caching and optimizing the kernel evaluation, evaluating the kernel products remains expensive, and the kernel matrix tends to grow quadratically in memory. So we are interested in algorithms which are memory bounded and possibly linear in time.

The first one we analysed is Pegasos, which is a primal solver based on stochastic subgradient descent over the hinge loss.
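A minimal sketch of a Pegasos-style update: a stochastic subgradient step on the hinge loss followed by a projection onto a ball of radius 1/sqrt(lambda). The toy data, lambda and iteration count are assumptions for illustration, not the settings used in the talk:

```python
import math
import random

def pegasos_step(w, x, yi, lam, t):
    """One Pegasos-style update: a subgradient step on the hinge loss,
    then projection onto the ball of radius 1/sqrt(lam)."""
    eta = 1.0 / (lam * t)                          # decaying step size
    margin = yi * sum(a * b for a, b in zip(w, x))
    if margin < 1.0:                               # hinge loss is active here
        w = [(1.0 - eta * lam) * wj + eta * yi * xj for wj, xj in zip(w, x)]
    else:                                          # only the regularizer acts
        w = [(1.0 - eta * lam) * wj for wj in w]
    norm = math.sqrt(sum(wj * wj for wj in w))
    radius = 1.0 / math.sqrt(lam)
    if norm > radius:                              # projection helps convergence
        w = [wj * radius / norm for wj in w]
    return w

random.seed(0)
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
lam = 0.1
w = [0.0, 0.0]
for t in range(1, 201):
    i = random.randrange(len(X))                   # stochastic sample selection
    w = pegasos_step(w, X[i], y[i], lam, t)
print([round(v, 2) for v in w])
```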
We talk about subgradients because the hinge loss is not differentiable everywhere, so we cannot compute a proper gradient and we resort to a subgradient. We also have stochastic selection of the learning samples: we do not train at every step on the whole database, but on randomly selected training patterns. Finally, to improve convergence, the solution is projected onto a ball of radius 1/sqrt(lambda), where lambda is the regularization parameter of the problem formulation; this actually helps to reach convergence. The drawback is that this algorithm does not directly provide the dual solution of the SVM problem; however, if we need it, as we do if we want to implement pushed GMMs, we can track the dual variables while we are training the hyperplane.

Next we move to dual-based algorithms: an iterative solver which performs coordinate descent in the dual space. We split the problem into a series of one-variable optimizations: we keep all but one variable fixed and optimize just that variable with a simple closed-form minimization; we only have to project the update so that it satisfies the box constraints of the dual problem. This time we do not directly obtain the primal solution, but it is very easy to update it while we are updating the dual solution, and this is also nice because, in order to evaluate the dot products, we do not need to store the support vectors: we already have the hyperplane. This algorithm can be sped up by performing a random permutation of the training samples, that is, by switching the order in which we optimize the variables, and also by introducing some sort of shrinking, which means that we tend not to update the variables which have already reached the bounds of the box constraints.
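The dual coordinate descent just described can be sketched roughly as follows; the toy data are an assumption, the one-variable update with clipping to the box [0, C] follows the standard linear-SVM dual, and the hyperplane w is kept in sync with the dual variables as described above:

```python
def dcd_train(X, y, C=1.0, epochs=20):
    """Dual coordinate descent for the linear SVM: optimize one dual
    variable at a time, clipping it to the box [0, C], while keeping
    the primal hyperplane w in sync so support vectors need not be stored."""
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    diag = [sum(xj * xj for xj in X[i]) for i in range(n)]  # Gram diagonal
    for _ in range(epochs):
        for i in range(n):      # a random permutation here would speed this up
            g = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            new_ai = min(max(alpha[i] - g / diag[i], 0.0), C)  # projected update
            delta = new_ai - alpha[i]
            if delta != 0.0:    # update w incrementally along with alpha_i
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, X[i])]
                alpha[i] = new_ai
    return w, alpha

X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, alpha = dcd_train(X, y)
print(all(yi * sum(a * b for a, b in zip(w, xi)) > 0 for xi, yi in zip(X, y)))
```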
Such variables will probably stay at the bound, so we skip them and just verify that this assumption was correct when we meet the convergence criterion.

Then we have SVMPerf, which is based on a different formulation of the SVM problem, the so-called one-slack formulation. Here we optimize over the hyperplane and a single slack variable, and this time we have a much larger set of constraints. What is done here is to iteratively build a working set of constraints over which the quadratic problem is solved. What is interesting in this algorithm is that the solution is not represented by means of support vectors, but by means of what they call basis vectors, which essentially play the same role but are not taken from the training set itself. What we obtain is a much sparser representation, because the number of basis vectors is much more limited with respect to the number of support vectors, which tends to grow linearly with the training set size. However, this time we cannot easily recover the dual solution of the SVM problem. What is nice is that, since we have so few basis vectors, it would be easy to extend this technique to nonlinear kernels; but we did not try that.

The final algorithm is BMRM, which comes from a general regularized risk minimization framework, of which the SVM objective is an instance. This time, again, we iteratively build a working-set approximation of the objective: we take tangent (cutting) planes of the objective function and solve the minimization on the function approximated by means of these tangent planes.
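A toy one-dimensional illustration of the cutting-plane idea behind BMRM: approximate the risk by the maximum of tangent planes and minimize the regularized approximation at each step. The data and lambda are made up, and a fine grid search stands in for the small quadratic problem mentioned in the talk:

```python
# Toy 1-D data: the regularized risk is J(w) = lam/2 * w^2 + R(w),
# with R(w) the average hinge loss; the values below are made up.
X = [1.0, 2.0, -1.0, -2.5]
y = [1, 1, -1, -1]
lam = 0.5

def risk(w):
    return sum(max(0.0, 1.0 - yi * w * xi) for xi, yi in zip(X, y)) / len(X)

def subgrad(w):
    # subgradient of R: sum of -y*x over the samples with an active hinge loss
    return sum(-yi * xi for xi, yi in zip(X, y) if 1.0 - yi * w * xi > 0.0) / len(X)

planes = []    # each tangent plane (a, b) satisfies R(v) >= a*v + b for all v
w = 0.0
for _ in range(10):
    a = subgrad(w)
    planes.append((a, risk(w) - a * w))
    # minimize lam/2 * v^2 + max over planes; a fine grid stands in for the QP
    grid = [i / 1000.0 for i in range(-5000, 5001)]
    w = min(grid, key=lambda v: 0.5 * lam * v * v + max(p[0] * v + p[1] for p in planes))
print(round(w, 3))   # converges to the true minimizer w = 1.0
```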
This still requires solving a quadratic problem, but its size is much smaller: it is equal to the number of tangent planes used to approximate the function, which is also equal to the number of iterations taken so far. So the size of this problem is much, much smaller than the size of the original problem, and its cost can usually be neglected, since we do not need more than two hundred or so iterations. BMRM works on the primal formulation of the SVM problem, but the dual solution can be derived as well.

Now the models. We trained a small subset of the models of our LRE09 evaluation. The phonetic system is nothing fancy: it is just a standard bigram-based system. We perform phone decoding using an Italian tokenizer, then we collect the n-gram counts and perform SVM training; we adopt a linear kernel, so we just perform some kind of count normalization before feeding the features to the SVM. The acoustic system is a standard 2048-Gaussian GMM: we stack the Gaussian means into supervectors, and we use a KL-divergence-based kernel, which again just amounts to normalizing the patterns. The system we were actually interested in evaluating is the pushed-GMM system, where we use the SVM dual solution as combination weights for the target and non-target models, and scoring is performed by means of log-likelihood ratios.

On the evaluation conditions: we tested on NIST LRE09, which combines twenty-three languages with narrowband broadcast and telephone data. We tested the systems on the thirty-second, ten-second and three-second evaluation conditions. Training was performed using, more or less, seventeen thousand training sentences.
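As a rough illustration of the phonotactic features mentioned above, here is a toy function mapping a decoded phone string to normalized bigram statistics. The talk does not detail the actual normalization used, so plain relative frequencies over a made-up three-phone inventory are used here:

```python
from collections import Counter

def bigram_features(phones, inventory):
    """Map a decoded phone sequence to a fixed-order vector of
    relative bigram frequencies (a simplification of the real kernel)."""
    pairs = list(zip(phones, phones[1:]))
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    keys = [(a, b) for a in inventory for b in inventory]  # fixed feature order
    return [counts[k] / total for k in keys]

phones = ["a", "b", "a", "b", "c"]
vec = bigram_features(phones, ["a", "b", "c"])
print(len(vec), sum(vec))   # 9 features, relative frequencies summing to 1.0
```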
The main difference between what we did this time and what we did in the LRE09 evaluation is that here we trained channel-independent systems, while for the evaluation we used channel-dependent systems. One more note on training: class balancing is simulated for all systems except SVMPerf, which does not easily allow simulating class balancing; we implemented it by just playing with the C scaling factor of the loss function. Also, in order to improve the time performance, all models were trained together, so that a single scan of the database is enough to train all of them.

So here are the results for the phonetic system. All the results shown refer to systems trained with the hinge loss up to a tight convergence criterion. We can see that all the systems perform almost the same, except SVMPerf, which is slightly worse due to the lack of class balancing. The same holds for the acoustic system: the results are almost the same; this time we do not show SVMPerf, because we do not have the dual solution that we need to derive the pushed-GMM system.

Now the training times for the phonetic system, thirty-second condition. What we can see is that the dual coordinate descent performs very well. SVMLight, which is our baseline, is not shown because it took more than nine thousand seconds to train: we report just its final cost result, but its training time was simply too large to fit in the plot. All the algorithms improve performance with training time, but the best performing one is the coordinate descent, which allows us to train the SVM in, more or less, less than two hundred seconds.
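The class-balancing simulation mentioned above, rescaling the loss so that both classes contribute equally regardless of their sample counts, can be sketched as follows; the toy data and the exact weighting are assumptions for illustration:

```python
def balanced_hinge_risk(w, X, y):
    """Hinge risk where each class contributes 1/2 regardless of its size,
    simulating a balanced training set by rescaling the per-class losses."""
    def hinge(xi, yi):
        return max(0.0, 1.0 - yi * sum(a * b for a, b in zip(w, xi)))
    pos = [xi for xi, yi in zip(X, y) if yi == 1]
    neg = [xi for xi, yi in zip(X, y) if yi == -1]
    return 0.5 * (sum(hinge(xi, 1) for xi in pos) / len(pos)
                  + sum(hinge(xi, -1) for xi in neg) / len(neg))

X = [[1.0], [2.0], [3.0], [-1.0]]   # three target samples, one non-target
y = [1, 1, 1, -1]
print(balanced_hinge_risk([0.0], X, y))   # 1.0: each class weighs the same
```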
The same holds for the ten-second and the three-second conditions. Here we can also notice the generalization properties of the SVM: the best performance is actually reached slightly before the convergence criterion is met; however, the difference is not so relevant.

Now we get to the acoustic pushed-GMM system. For the thirty-second condition there is nothing new with respect to the previous graphs: again the coordinate descent performs quite well, and there is just a very small difference in cost between the systems. For the ten-second condition we start to see some interesting things, and for the three-second condition we obtain results which we did not expect. For the pushed system, the first part of the graph represents pushing weights which are still far from the SVM optimum, and what we obtain is that the SVM optimum is actually not optimal, at least for the three-second condition of pushed GMMs. We can also see that the first iteration of the BMRM algorithm, which, as I verified, is equivalent to pushing by simply taking the arithmetic mean of the true-class samples and of the false-class samples, without any need of SVM training, actually performs even better than the optimal weighting on the three-second condition. And here we have some other kinds of weighting which are very far from the optimum: they have no particular meaning, they just come from the random initialization of the algorithm; yet they are the best performing on the three-second condition. So, this is what I said before: for pushed GMMs we obtain good results even when the pushing weights are very far from the SVM optimum.
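A sketch of the two weightings being compared above: combining training supervectors into a pushed model with either uniform weights (the arithmetic mean, corresponding to the first BMRM iteration) or non-uniform weights such as those derived from the SVM dual solution. The vectors and weight values below are entirely hypothetical:

```python
def push_model(supervectors, weights):
    """Weighted combination of per-utterance supervectors into one model."""
    total = sum(weights)
    return [sum(wi * sv[k] for wi, sv in zip(weights, supervectors)) / total
            for k in range(len(supervectors[0]))]

svs = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
uniform = [1.0, 1.0, 1.0]   # arithmetic mean: the "first BMRM iteration" case
alphas = [0.7, 0.2, 0.1]    # hypothetical SVM dual weights
print(push_model(svs, uniform))   # the plain mean: [3.0, 2.0]
```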
To summarize: while SVM-based pushing improves performance for the thirty-second condition, and slightly for the ten-second condition, for the three-second condition we actually have to look further.

Now the conclusions on SVM modelling. We trained different algorithms. The dual coordinate descent is the fastest one and solves our problems well; and if we are still interested in the dual solution, it is directly provided. However, it is not well suited to a distributed environment, since the SVM solution is updated after each pattern, so the computation cannot easily be spread across a cluster. SVMPerf is the second fastest algorithm and can take advantage of a distributed environment, since its weight updates are performed just at the end of a complete database scan; its scaling is good also for nonlinear kernels. However, the dual solution is not provided, and class balancing cannot be implemented in an easy way, as is instead possible with the other algorithms. BMRM is much slower than the other algorithms; however, we still have to see how much we can speed it up in a distributed environment, since, again, the solution update is performed after a complete database scan. Also interesting is that it works with very different loss functions. Finally, Pegasos, which turned out slower than the coordinate descent, cannot exploit a distributed environment either, but it can provide the dual solution, as said before.

So, this is all. Questions?

[Audience question, mostly inaudible, about whether reaching the exact SVM solution overtrains the model.]
[Answer:] Well, actually, I think that the optimal solution of the SVM problem minimizes an estimate of the generalization error: we train on the training set, and the objective tries to estimate the generalization error, but still using the training set. So, when we actually deploy the system, it can happen that the best actual generalization error is reached not when we meet the tight convergence criterion, but when we impose a less tight criterion and stop some iterations earlier. So maybe just imposing less tight convergence conditions would be enough, without changing the training process.

[Audience question, paraphrased: does the number of samples you train on imply that the kernel matrix no longer fits in memory?] Well, actually, at some point it should: we went from LRE05, where we had five thousand training sentences, to LRE09, where we have seventeen thousand, and we do not know what comes next; if we had, say, thirty thousand, I do not know whether the matrix would still fit in memory. And yes, SVMLight was actually trained by evaluating the kernel matrix and storing it in memory.

[Audience question:] A quick question: when you say class balancing, do you mean giving equal weight to the positive and the negative samples? [Answer:] Yes; we tried to simulate having the same number of true samples and false samples by playing with the scaling parameter of the loss function: we just divide the losses for the true patterns and the losses for the false patterns by the respective number of patterns. [Session chair:] Okay, let's thank the speaker.
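To put the kernel-matrix memory question in numbers, a back-of-the-envelope computation of the storage needed for a full single-precision Gram matrix at the training-set sizes mentioned (5,000 and 17,000, plus the hypothetical 30,000):

```python
def gram_matrix_gib(n, bytes_per_entry=4):
    """Memory (GiB) to hold a full n x n single-precision kernel matrix."""
    return n * n * bytes_per_entry / 2**30

for n in (5000, 17000, 30000):
    print(n, round(gram_matrix_gib(n), 2))   # about 0.09, 1.08 and 3.35 GiB
```

The quadratic growth is why moving from 17,000 to 30,000 patterns roughly triples the memory requirement.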