0:00:34 This work was supported in part by grants from United States agencies, including the National Science Foundation.
0:00:48 Okay, so the topic of this talk is classification. In model-based classification, as you are all aware, we are given a prior distribution on the classes and the likelihood function of the observations given the class. Given these two things, we can come up with the minimum-probability-of-error decision rule, which is the well-known maximum a posteriori probability rule, and which simplifies to the maximum likelihood rule for equally likely classes. That is model-based classification: if the model is fully specified, then you can in principle come up with the optimum rule.
0:01:25 In contrast to this is what is sometimes called learning-based classification, where everything is data-driven: you are only given examples of the two classes, say, and you want to come up with an algorithm which separates these classes.
0:01:39 The challenge we wish to address in this scenario is that very often you encounter situations where you have high-dimensional data: for example, surveillance video, which might be gigabytes or terabytes of data, hyperspectral images, synthetic aperture radar images, and so forth. So you have high-dimensional data on the one hand, and very few examples compared to the dimensionality of the data on the other hand.
0:02:03 Now you might say, why not just use a generic dimensionality-reduction technique, such as PCA, LLE, or Isomap? Well, on the one hand, these are really generic methods which have not been devised with classification problems in mind; they optimize other generic merit measures, such as preserving pairwise distances. And on the other hand, they have not been designed with a view to the high-dimensionality problem, where you have very few examples.
0:02:34 So our approach is to exploit what I shall call latent low-dimensional sensing structure.
0:02:41 To make this clear, let us take a cartoon example. Suppose you are given examples of each class, only two classes here, and a learning-based classification algorithm, such as an SVM or a kernel SVM, which simply takes the data and learns a classification rule, completely ignoring whether any sensing structure was present or not.
0:03:02 In contrast to this is what I would call sensing-aware classification, where, let us say, we know that these observations came from some underlying sensing process, for example a blurring operator, and we may have either full or partial information about that operator, together with some noise. The question is: can we exploit knowledge of the fact that these observations came from some underlying sensing structure to improve the classification performance?
0:03:29 This naturally leads to the study of the fundamental asymptotic limits of classification in the regime of high-dimensional data and very few samples. To make things more concrete, let us assume that the data dimension, and possibly the number of samples, goes to infinity, while the number of samples per dimension goes to zero. This is the regime where you have very few samples of very high-dimensional data.
0:03:55 This is in contrast to a number of studies in the literature which have focused on an asymptotically easy situation. We want to fix the problem difficulty as the dimension scales, meaning that even as the dimension increases to infinity, it does not become easy to classify. What this essentially means is that we fix the signal-to-noise ratio as the problem scales, and this would be considered the more demanding mathematical regime.
0:04:18 The fundamental question we wish to answer is: what is the asymptotic classification performance? In this asymptotic regime, does the probability of error go to one half, meaning it is no better than random guessing? Does it go to the optimum Bayes probability of error, which by design is not equal to one half and not equal to zero (this is what I mean by fixing the problem difficulty)? Or does it go to something else?
0:04:42 To make things more concrete, I will analyze a specific model; the rest of the talk is based on it, because to understand the anatomy of these issues we decided to start with a simple model. The model is simple in that each observation is built around a mean location lying in a one-dimensional sensing subspace; think of the vector h as spanning this sensing subspace. Given the class label, you are at either the positive or the negative mean location, and you have a scalar Gaussian perturbation along the h axis, followed by a vector Gaussian noise perturbation which takes you outside this subspace into the general p-dimensional ambient space. That is the sensing model whose performance we analyze. The condition under which the classes can be separated is that the class-conditional means are different: we know that the means lie along a subspace, with a scalar perturbation component along the subspace followed by a vector Gaussian perturbation that takes you outside it. So that is the simple model, and the goal is as follows: you are given many p-dimensional vectors, n p-dimensional vectors from each class, you need to come up with a classifier, and we want to understand the asymptotic classification performance for different scenarios.
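The generative model just described can be sketched in code. This is a minimal sketch: the parameter values and the particular choice of h below are illustrative assumptions, not values from the talk.

```python
import numpy as np

def sample_observations(n, p, sigma_a=0.5, sigma=1.0, seed=0):
    """Draw n labeled observations from the two-class sensing model.

    Each observation is x = (y + a) * h + z, where
      - h is a fixed sensing direction in R^p (its norm is kept fixed
        as p grows, which fixes the problem difficulty),
      - y in {-1, +1} is the class label (the two mean locations are
        +h and -h),
      - a ~ N(0, sigma_a^2) is the scalar Gaussian perturbation along h,
      - z ~ N(0, sigma^2 I_p) is the ambient Gaussian noise that takes
        the observation outside the sensing subspace.
    """
    rng = np.random.default_rng(seed)
    h = np.zeros(p)
    h[0] = 1.0                       # illustrative unit-energy direction
    y = rng.choice([-1.0, 1.0], size=n)
    a = rng.normal(0.0, sigma_a, size=n)
    z = rng.normal(0.0, sigma, size=(n, p))
    x = (y + a)[:, None] * h + z
    return x, y, h
```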
0:05:58 We chose the model to be simple to keep things tractable, since we are after an analytical understanding. Even though it is fairly simple, the model does make sense. For example, suppose you have a sensor network, so p, the dimension of the observation on the previous slide, is the number of sensors, with each component being one sensor. The sensors observe some kind of weak underlying signal: under the positive class they observe h plus noise, and under the negative class they observe the negative of h plus noise. The point, of course, is that you are given n observations of the weak signal per sensor for each class, and the question is how to come up with a classifier which decides whether the next observation belongs to the positive class or the negative class.
0:06:42 Moving ahead, these are the kinds of classifiers that we will consider for the rest of the talk. First, we look at the baseline classifier, which is the fully Bayes rule: we know everything about the model, so what is the test which implements it? This also gets us familiar with the notation. Then we look at what I call the unstructured classifier: I know that the observations are conditionally Gaussian, but I do not know the means or the covariances, so I would like to estimate everything using maximum likelihood estimates. How does that perform? Finally, we look at structure-based classification approaches. In the first case, the classifier knows the exact sensing subspace; how do things behave then? In the second case, we look at a structured maximum likelihood approach: I know that there is an underlying low-dimensional subspace, but I do not know the subspace itself. We will see that there are negative results in that case, and this motivates a structured sparsity model.
0:07:44 So, the baseline model. From the likelihood ratio test you can turn the crank and come up with the optimum Bayes decision rule. It is going to be a linear discriminant rule based on the parameters delta, mu, and sigma; the exact expressions are not important here. Delta stands for the difference of the class-conditional means, mu is the average of the class-conditional means, and sigma is the covariance of the observations, so the discriminant rule depends on these parameters. The minimum error probability can be evaluated in closed form: it is given by the Q function, which is nothing but the tail probability of a standard normal, evaluated in terms of these parameters, whose exact expressions we will not dwell on. The important point is that we wish to fix the difficulty of the problem as the dimension scales, which means fixing the argument of the Q function. That amounts essentially to fixing almost everything here, in particular the energy of the sensing vector h. So we want to keep the norm of h fixed as things scale, and that is an important part of this work.
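A minimal sketch of the Q-function error just mentioned, derived under the symmetric model with means at plus and minus h; the slide's exact expressions are not in the transcript, so this closed form is my own reconstruction for that special case.

```python
import math

def q_function(t):
    """Tail probability of a standard normal: Q(t) = P(N(0,1) > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def bayes_error(h_norm, sigma_a, sigma):
    """Closed-form Bayes error for the symmetric two-class model.

    Projecting x = (y + a) h + z onto h gives a scalar that is
    N(+-||h||^2, sigma_a^2 ||h||^4 + sigma^2 ||h||^2), so the minimum
    error of the linear rule sign(h^T x) is a single Q-function value.
    Fixing ||h|| (the energy of the sensing vector) as p grows is what
    fixes the argument of the Q function, i.e. the problem difficulty.
    """
    return q_function(h_norm / math.sqrt(sigma_a**2 * h_norm**2 + sigma**2))
```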
0:08:44 So that is what the fully Bayes rule looks like. Now consider the case where we know the observations are conditionally Gaussian, but we do not know any of the parameters. The Bayes classifier has the form shown before, but I do not know the model, so I have to estimate all of these parameters from the data I am given. One natural approach is a plug-in estimator: estimate all the parameters using the given data and plug them into the optimum decision rule. That gives you what is known as the empirical Fisher rule. You can analyze its probability of error: you can get a closed-form expression and look at what happens to the probability of error as the number of samples per dimension goes to zero, the dimension increases to infinity, and the difficulty level is kept fixed.
0:09:32 It turns out, not surprisingly, that the probability of error goes to one half, which means no better than random guessing. This is not surprising, because you are trying to estimate far more parameters than you have data for; asymptotically, you never catch up with the amount of information needed to estimate them. So being unstructured and estimating all the parameters is not a good idea, and this leads us to structured approaches.
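The plug-in approach described above can be sketched as follows. The ridge term is my own addition, needed only to keep the sample covariance invertible when n is smaller than p; it does not change the talk's conclusion that this rule degrades to random guessing when samples per dimension vanish.

```python
import numpy as np

def empirical_fisher_rule(x, y, ridge=1e-6):
    """Unstructured plug-in classifier: estimate the class-conditional
    means and a pooled covariance by maximum likelihood and substitute
    them into the optimal linear discriminant. With n/p -> 0 there are
    far more parameters than data, and the error of this rule tends to
    1/2 in the talk's asymptotic regime."""
    mu_pos = x[y > 0].mean(axis=0)
    mu_neg = x[y < 0].mean(axis=0)
    delta_hat = mu_pos - mu_neg              # estimate of delta
    mu_bar = 0.5 * (mu_pos + mu_neg)         # estimate of mu
    centered = np.vstack([x[y > 0] - mu_pos, x[y < 0] - mu_neg])
    sigma_hat = centered.T @ centered / len(x)   # pooled ML covariance
    w = np.linalg.solve(sigma_hat + ridge * np.eye(x.shape[1]), delta_hat)
    return lambda x_new: np.sign((x_new - mu_bar) @ w)
```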
0:09:58 So let us return to the sensing model. Suppose, at one extreme, that we know the one-dimensional sensing structure, meaning we know the subspace in which the class means lie. The natural thing to do in this case is: why not project everything down to that one-dimensional subspace? This reduces it to a scalar learning-based classification problem. Then estimate all the parameters of the reduced one-dimensional problem from the data you have, using maximum likelihood estimates, and see what happens. That leads to what I will call the projected empirical Fisher rule; the exact expression is shown on the slide but, as I said, it is not very important. The idea is that you know the sensing subspace, you project the data down to it, and you reduce this to a one-dimensional problem.
0:10:43 The probability of error is shown here. Asymptotically, as the number of samples goes to infinity, it turns out, not surprisingly again, that if you keep the difficulty level of the problem fixed, the probability of error goes to the Bayes error probability, which is the optimum you can do. This is to be expected: you know that there is a latent one-dimensional structure in this problem, and you know it exactly, so when you project down to it, the actual ambient dimension of the data becomes irrelevant; p does not appear in this expression at all. You are back to a scalar classification problem, and we know that with maximum likelihood estimation and an increasing number of samples you can asymptotically achieve optimal performance when the data dimension is fixed. So in this case, the rule effectively takes into account the dimensionality reduction inherent in the problem.
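A sketch of the projected rule for a known sensing direction; the function and parameter names are mine, not from the slides.

```python
import numpy as np

def projected_fisher_rule(x, y, h):
    """Projected empirical Fisher rule when the sensing direction h is
    known: collapse every observation to the scalar t = h^T x, then fit
    the one-dimensional rule by maximum likelihood. The ambient
    dimension p never enters, which is why the error of this rule
    converges to the Bayes error as the number of samples grows."""
    t = x @ h                                 # scalar projections
    m_pos = t[y > 0].mean()
    m_neg = t[y < 0].mean()
    thresh = 0.5 * (m_pos + m_neg)            # midpoint threshold
    s = 1.0 if m_pos > m_neg else -1.0        # orientation along h
    return lambda x_new: s * np.sign(x_new @ h - thresh)
```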
0:11:42 But the trouble is that in general we do not even know the sensing structure: we do not know the sensing subspace, so the natural next step is to estimate the sensing subspace from the data you have. What would be one approach to estimate it? Well, we know that the difference in the class-conditional means is exactly aligned with h. So a natural thing to do is to use the maximum likelihood estimate of that difference, which we computed before, as a proxy for h, project everything down to it, and then you are back to the previous situation. Again you get a projected empirical rule, except that the direction you project onto is not h itself, because h is not known to you, but the estimated h. What would you expect to get here?
0:12:28 It turns out that if you analyze the probability of misclassification error as the number of samples per dimension goes to zero and the difficulty level is kept fixed, the probability of classification error goes to one half. So even though you knew there was an underlying one-dimensional sensing structure, and you knew that the mean difference was aligned with it, trying to estimate it using a naive maximum likelihood estimate does not do the job: you are no better than random guessing asymptotically.
0:12:55 This strongly suggests that you need additional sensing structure to exploit here. In fact, although this was not presented in our ICASSP paper, since then we have been able to show that this is fundamental: for this particular problem, without any additional structure on h, it is impossible for any learning algorithm to do asymptotically better than random guessing. That result was not presented at ICASSP and will be appearing elsewhere, but it is an actual fundamental lower bound on the achievable error probability, which goes to one half if you do not make any assumptions on the sensing structure.
0:13:27 This motivates the need for additional structure on h, and one of the structures we would like to study, a popular one these days, is sparsity. Let us say that the sensing direction h is sparse, meaning that the energy in h lies in relatively few components compared to the number of dimensions. In particular, look at the tail energy of the vector h: pick a truncation point d and look at the energy in the tail of the h vector beyond this truncation point. As d and p go to infinity, we want this tail energy to go to zero. That is certainly a statement about the sparsity, a simple kind of sparsity, of the signal.
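The tail-energy condition above is easy to state in code. This is a minimal sketch; the polynomial-decay example mirrors the decay assumed later in the talk's simulation, but the specific exponent is my illustrative choice.

```python
import numpy as np

def tail_energy(h, d):
    """Fraction of the energy of h lying beyond truncation point d.

    The sparsity condition in the talk asks that this quantity go to
    zero as d and p grow, i.e. the energy of h is concentrated in
    relatively few leading components."""
    h = np.asarray(h, dtype=float)
    return np.sum(h[d:]**2) / np.sum(h**2)

# For a polynomially decaying direction, the tail energy shrinks as the
# truncation point grows:
h = 1.0 / np.arange(1, 10_001)        # h_i ~ 1/i, p = 10^4 (illustrative)
print(tail_energy(h, 20) > tail_energy(h, 200))   # tail shrinks with d
```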
0:14:17 In this case, a natural thing to try is the following. Earlier we used the maximum likelihood estimate of the mean difference as a proxy for h, and that did not work. But now we know something more about h, namely that its tail energy goes to zero. So one interesting thing to try is: why not truncate that estimator, and use the truncated estimator as the proxy for h instead? The idea is to keep the components of the estimate only up to some truncation parameter d, and set everything beyond it to zero. That leads to a truncation-based estimate of the direction along h, which we then use. Let us see how things behave.
0:14:54 We were able to show that as the dimension, the number of samples, and the truncation point go to infinity, with the truncation point chosen to grow slower than the number of samples, this truncated estimate recovers the signal subspace perfectly, meaning that the mean-square error between the truncated estimate and the true direction goes to zero. So we can asymptotically estimate the one-dimensional subspace; and if we can estimate the subspace perfectly, it is then unsurprising that, as things scale with the difficulty level kept fixed, the probability of classification error goes to the Bayes error probability. So knowing not just the sensing structure but also additional sparsity assumptions, or some additional structural information, can asymptotically give you the Bayes probability of error.
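The truncation-based direction estimate can be sketched as follows; the exact estimator on the slides is not in the transcript, so this is a minimal sketch assuming the half-mean-difference form of the ML estimate.

```python
import numpy as np

def truncated_direction_estimate(x, y, d):
    """Truncation-based estimate of the sparse sensing direction.

    Form the maximum likelihood estimate delta_hat of (half) the mean
    difference, keep only its first d components, zero out the tail,
    and renormalize. The talk's result: if the truncation point d grows
    slower than the number of samples (and the tail energy of h
    vanishes), the mean-square error between this estimate and the true
    direction goes to zero, and the projected rule built on it attains
    the Bayes error."""
    delta_hat = 0.5 * (x[y > 0].mean(axis=0) - x[y < 0].mean(axis=0))
    h_hat = np.zeros_like(delta_hat)
    h_hat[:d] = delta_hat[:d]            # keep the head, truncate the tail
    norm = np.linalg.norm(h_hat)
    return h_hat / norm if norm > 0 else h_hat
```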
0:15:40 Here is a little simulation to reinforce some of these insights. We have fixed the Bayes probability of error, the difficulty, to be 0.1 throughout as the dimension scales; the energy of h is fixed to some value, along with some other parameters of the model. The number of samples grows slower than the data dimension, as shown here, and the truncation point is chosen to grow slower than the number of samples, as shown here. We assume a polynomial decay for h. In the plot, for example, the green line is h, or rather one particular realization of h. The red line is the noisy maximum likelihood estimate, delta-hat; both are normalized to have unit energy. The blue line is the truncated version of the red one; the truncation point here is at exactly twenty or so.
0:16:34 On the right side, the probability of error is on the vertical axis, plotted against the ambient dimension. As the dimension scales, the unstructured approach, where you do not know anything about the sensing structure and try to estimate all the parameters using maximum likelihood estimates, approaches a probability of error of one half. On the other hand, if you knew that a one-dimensional sensing subspace existed but estimated it naively, simply using delta-hat, which is the maximum likelihood estimate, then you also get one half. But if you use the truncation-based estimate, you approach the Bayes optimal performance.
0:17:12 So, to conclude my talk, the key take-away points are these. In possibly many problems, you encounter situations where the number of samples is far fewer than the ambient data dimension. In addition, there often exists a latent low-dimensional sensing structure which can be exploited. If you totally ignore the sensing structure and try to estimate everything using maximum likelihood estimates, you will probably be no better than random guessing in many scenarios. Even having general knowledge of the sensing structure, such as knowing that there is a one-dimensional signal h without knowing what it is, and trying to estimate it naively, may not do the job. The good news is that if you have general knowledge of the sensing structure plus some additional structure on h, then you can often recover the optimum, asymptotically optimal, performance. That brings me to the end of my talk.