So, what I would like to talk about today is how we combine multiple binary classifiers to solve the multi-class problem, and the technique that we are going to use is geometric programming, where we can solve the problem using one of the convex optimization solvers.

I would like to start with multi-class learning. There are two different approaches to solving multi-class problems: either a direct method, or we can reduce the multi-class problem into multiple binary problems. In the latter case we need to combine the multiple binary predictions to determine the final answer to the multi-class problem, and I will formulate this aggregation problem as a geometric program so that we can always find a global solution. So I will introduce the geometric programming formulation, then the softmax model and L1-norm regularized maximum likelihood estimation, then show some of the numerical experiments, and then conclude.

In a multi-class problem we need to assign a class label, from 1 to K, to each data point. With a direct method, suppose we have three classes: the direct method tries to find a separating boundary that discriminates these three classes, and in general this boundary is not linear. On the other hand, with binary decomposition, for example the all-pairs method, we look at the pairs (1, 2), (2, 3), and (1, 3). For each binary pair problem we can always find a binary classifier, and the remaining problem is how we aggregate the solutions of the binary problems in order to determine the final answer to the multi-class problem.

So what are the advantages of binary decomposition over the direct method? It is easier and simpler to learn the classifiers, and a lot of sophisticated classifiers are already available for the binary problems, for example support vector machines. It is also better suited to parallel computation.

These three are well-known examples of binary decomposition, and we can view binary decomposition as a binary encoding problem; in other words, how we aggregate the multiple binary answers is a decoding problem. In one-versus-all, with three classes, the first binary classifier discriminates the first class from the remaining classes, the second binary classifier discriminates the second class from the remaining classes, and so on. In all-pairs we look at the pairs (1, 2), (2, 3), and (1, 3). Error-correcting output coding can also be used, where we choose code words with maximal Hamming distance from each other, so that predictions with a tolerable number of errors can still be correctly classified. In other words, binary decomposition leads to a code matrix; in this case we have three classes and three binary classifiers.
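To make this concrete, here is a small illustration for three classes, assuming the usual 0/1 coding with rows as classes, columns as binary classifiers, and an asterisk marking a class that a given classifier ignores (the slides may use a different convention):

$$
M_{\text{one-vs-all}} =
\begin{pmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix},
\qquad
M_{\text{all-pairs}} =
\begin{pmatrix}
1 & \ast & 1\\
0 & 1 & \ast\\
\ast & 0 & 0
\end{pmatrix}.
$$

An error-correcting output code would use longer code words (more binary classifiers), chosen so that the rows are far apart in Hamming distance.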
These are exemplary code matrices for one-versus-all, all-pairs, and error-correcting output coding. In other words, we need to train three different binary classifiers, each following one column of the code matrix. For example, in the one-versus-all case, this is the code matrix produced by the one-versus-all scheme, and we need to train three different binary classifiers. If x_i is a data point and its target label is 2, then in terms of the binary classifiers the correct labels should be 0, 1, and 0 for the first, second, and third binary classifiers, respectively. So each binary classifier follows the binary labels in this code matrix.

We train the binary classifiers, and each binary classifier produces a probability estimate; for example, we can use support vector machines with a sigmoid model so that each binary classifier produces a score between zero and one. The problem is this: we have trained three binary classifiers, and each produces a score between zero and one, but to answer the multi-class problem we have to combine the scores produced by the three binary classifiers.

So how do we aggregate the binary classifiers? Some simple heuristics are: for all-pairs, we do majority voting, and for one-versus-all, the maximum wins. In the hard decoding case, we find the code word that best matches the collection of predictions computed by the binary classifiers. With three classes we have three code words, and we train three binary classifiers, so given a test data point the three binary classifiers produce scores, and the collection of those three values constitutes a three-dimensional vector. We then search for the code word that best matches this three-dimensional prediction in order to determine the final answer to the multi-class problem.
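As a minimal sketch of this kind of hard decoding (an illustration only, assuming a 0/1 code matrix and scores in [0, 1]; the talk itself gives no code):

```python
import numpy as np

def decode(code_matrix, scores):
    """Pick the class whose code word best matches the binary predictions.

    code_matrix: K x M array, one 0/1 code word (row) per class.
    scores: length-M array of binary-classifier outputs in [0, 1].
    """
    # Distance between each code word and the prediction vector; with
    # hard 0/1 predictions this reduces to the Hamming distance.
    distances = np.abs(code_matrix - scores).sum(axis=1)
    return int(np.argmin(distances))

# One-versus-all code matrix for three classes (rows = classes).
one_vs_all = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1]])

print(decode(one_vs_all, np.array([0.2, 0.7, 0.1])))  # -> 1, i.e. the second class
```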
Alternatively, we can do probabilistic decoding; in that case we need to compute the class membership probabilities, and once we have the class membership probabilities we can make the prediction for the class. One of the popular approaches to probabilistic decoding is based on the Bradley-Terry model, so let me briefly explain what the Bradley-Terry model is doing in this case. Again we have three classes, and the Bradley-Terry model has been used to relate the binary predictions to the class membership probabilities. We have three answers produced by the three binary classifiers, and we have to relate those answers to the class membership probabilities; to do this, we treat the class membership probabilities as parameters. In the case of the all-pairs binary decomposition, P_1* is the class membership probability for the data point x*. The relations highlighted in blue are based on the Bradley-Terry model: we introduce parameters p̂_1, p̂_2, and p̂_3, these relations come directly from the Bradley-Terry model, and the quantities r̂_j* are the probability estimates determined by the binary classifiers. So, in order to compute the class membership probabilities, we treat them as parameters and estimate them by minimizing the KL divergence between the binary predictions and the corresponding quantities coming from the model.

The problem with the techniques that exploit this approach is that the number of parameters grows with the number of training examples. If you have a huge number of training examples, then you have a huge number of parameters that have to be optimized. Most of the existing techniques are based on the Bradley-Terry model, and one of the recent techniques tries to find an optimal aggregation. Why is optimal aggregation good? Because some of the predictions are biased or unreliable, and they degrade the overall performance; if we can come up with weights that optimally aggregate the binary predictions, we can alleviate this problem. That work on optimal aggregation is again based on the Bradley-Terry model, but the problem is that, as with the other probabilistic decoders using the Bradley-Terry model, the parameters are the aggregation weights and also the class membership probabilities, which grow with the number of examples. Having that many parameters to optimize is a problem, and moreover this is not a convex optimization problem, so it does not guarantee a global solution.

What I would like to do here is formulate this problem as a convex optimization problem. In our aggregation model we do not use the Bradley-Terry model; instead we use a softmax model, which was also used in some recent work last year. So let me introduce the softmax model. We have M different binary classifiers, and our approach parameterizes the aggregation weights: each binary classifier is weighted by a different coefficient, w_1 through w_M, and our goal is to optimize these coefficients to produce the best combination of the binary predictions. The class membership probabilities then follow a softmax function; in other words, the probability that y_i equals k, given the aggregation weights and the data point x_i, follows a softmax function whose exponent is the weighted sum of the discrepancies between the code word and the binary predictions. For example, we can use cross-entropy as the discrepancy function. So this is really a probabilistic extension of loss-based decoding, and in this way we have only the aggregation weights as parameters.
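Roughly, the kind of model being described looks like the following (the notation is illustrative, assuming a code matrix with entries $M_{km}$, binary predictions $f_m(x)$, and a discrepancy $d(\cdot,\cdot)$ such as cross-entropy; the exact form is in the paper):

$$
P(y_i = k \mid \mathbf{w}, x_i) \;=\;
\frac{\exp\!\Big(-\sum_{m=1}^{M} w_m\, d\big(M_{km}, f_m(x_i)\big)\Big)}
     {\sum_{l=1}^{K} \exp\!\Big(-\sum_{m=1}^{M} w_m\, d\big(M_{lm}, f_m(x_i)\big)\Big)},
\qquad w_m \ge 0 .
$$

With all weights equal to one, the most probable class under this model is the one a plain loss-based decoder would pick, which is why it can be read as a probabilistic extension of loss-based decoding.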
Based on this model, we write the likelihood of the training data (this is the likelihood; you can find the details in the paper) and then we add an L1-norm regularizer. Taking the negative log-likelihood plus the L1-norm regularization term, we end up with a log-sum-exp function, so our optimization is to minimize the log-sum-exp function plus the sum of the coefficients, which is the L1 penalty. The log-sum-exp function is convex, so we can solve this as a convex optimization problem, and what we figured out about two years ago is that we can fit this into geometric programming.

Here is a short introduction to geometric programming. This is the standard form of a geometric program: we minimize a posynomial. A posynomial looks like a polynomial, but the difference is that the exponents are allowed to be real-valued, whereas in a polynomial the exponents must be nonnegative integers. So we minimize a posynomial subject to posynomial inequality constraints and monomial equality constraints. A geometric program in standard form can always be converted into a geometric program in convex form, which is simply a convex problem. So this is our optimization problem, and we can write it as a geometric program in either convex or standard (posynomial) form. There are efficient solvers available, so we simply use such a solver to find the minimum of the objective function.
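For reference, the standard form of a geometric program (the textbook form, not taken from the slides) is the following, where each $f_i$ is a posynomial, that is, a sum of monomials $c\, x_1^{a_1} \cdots x_n^{a_n}$ with $c > 0$ and real exponents $a_j$, and each $g_j$ is a single monomial:

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, p, \\
& g_j(x) = 1, \quad j = 1, \dots, q, \\
& x \succ 0 .
\end{aligned}
$$

The change of variables $x_k = e^{z_k}$ followed by taking logarithms turns every posynomial into a log-sum-exp function of $z$, which is how the standard form maps onto the convex form mentioned above.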
In the experiments we compared against some existing work: loss-based decoding, which is one of the hard decoding methods, and W-MAP, which is an optimal-aggregation method based on the Bradley-Terry model. These are the data sets, from the UCI repository, together with the number of samples, the number of attributes, and the number of classes. We compared the classification performance for three different encoding techniques: all-pairs, one-versus-all, and error-correcting output coding. These are the results for loss-based decoding and W-MAP, and these are the results of our method. Across the experiments, our method performs better than the two existing methods. Although W-MAP is also an optimal aggregation method, it involves a huge number of parameters, so in terms of run time our method is much faster, because in our case the parameters are only the aggregation weights.

In conclusion, we presented a convex optimization technique for the aggregation of binary classifiers to solve multi-class problems. We chose geometric programming because our objective function can easily be fit into the standard form of a geometric program, and we compared the classification performance against some of the existing methods to show that the method we propose seems to work better than them. That concludes my talk.

The fact that you have fewer parameters for your method: I presume that directly relates to it being less likely to overfit. Is that right?

Yes. Because the previous one has a huge number of parameters, it easily overfits, and that might be one of the reasons why our method performs better than some of the existing ones.

Did you want to compare your results with direct multi-class classification? For example, you could have used multinomial logistic regression as the combiner, instead of comparing only against the fusion of binary classifiers for solving the multi-class problem.

Yes, maybe we can compare, but I don't think we really compared with multinomial logistic regression. Multinomial logistic regression is also convex, so you might be right. We didn't do it, but we will.

Regarding the number of features: in the data descriptions the number of attributes ranges roughly from ten to six hundred. Did you do anything with the features of the data?

No, no, we just use the whole set of features. This is just a matter of classifier performance, not feature extraction. Right, thank you.