This work was supported in part by grants from the United States National Science Foundation.

Okay, so the topic of this talk is classification.
In model-based classification, as you all know, you are given a prior distribution on the classes and the likelihood function of the observations given each class. Given these two things, you can come up with the minimum-probability-of-error decision rule, which is the well-known maximum a posteriori probability rule, and which simplifies to the maximum likelihood rule for equally likely classes. So that's model-based classification: if the model is fully specified, then you can in principle come up with the optimum decision rule.
In contrast to that, there is what is known as learning-based classification, where everything is data-driven. You are only given examples of the two classes, say, and you want to come up with an algorithm which separates the classes. The challenge we wish to address in this scenario is that very often you encounter situations where you have high-dimensional data: for example, surveillance video, hyperspectral images, synthetic aperture radar images, and so forth. So you have high-dimensional data on the one hand, and very few examples compared to the dimensionality of the data on the other hand.
Now you might say, why not just use a generic dimensionality reduction technique, say PCA, LLE, or Isomap? Well, on the one hand these are really generic methods which are not devised for the classification problem; they optimize other generic criteria, such as preserving pairwise distances and so forth. And on the other hand, they have not been designed with a view to the high-dimensionality, few-samples problem.
So our approach is to exploit what I shall call the latent low-dimensional sensing structure. To make this clear, let's take a cartoon example.
Suppose you are given examples of each class, only two classes here. A learning-based classification algorithm such as an SVM, or a kernel SVM, simply takes the data and learns a classification rule, completely ignoring whether any sensing structure is present or not. In contrast to this is what I would call sensing-aware classification, where, let's say, we know that these observations came from some underlying sensing process, say for example a blurring operator, and we may have either full or partial information about that operator, together with some noise. The question is: can we exploit knowledge of the fact that these observations came from some underlying sensing structure to improve the classification performance?
Now, what we are actually interested in studying are the fundamental asymptotic limits of classification in this scenario of high-dimensional data and very few samples. To make things more concrete, we assume that the data dimension, and possibly the number of samples, goes to infinity, while the number of samples per dimension goes to zero. This is a mathematical model of having very few samples of very high-dimensional data.
In contrast to a number of studies in the literature which have focused on an asymptotically easy situation, we want to fix the problem difficulty. Asymptotically easy means that as the dimension increases to infinity, it becomes easy to classify. Fixing the difficulty essentially means fixing the signal-to-noise ratio as the problem scales, and this should be considered the more interesting mathematical regime. The fundamental question we wish to answer is: what is the asymptotic classification performance in this regime? Does the probability of error go to one half, meaning it is no better than random guessing? Does it go to the optimum Bayes probability of error, which, by fixing the problem difficulty, is not equal to one half and not equal to zero? Or does it go to something else?
Now, to make things more concrete, I have to introduce a model; the rest of the talk is based on this specific model, because we need to understand the root of these issues with a simple model first. The model is simple in that the observations are made up of a mean location lying in a one-dimensional sensing subspace: think of the vector H as spanning the sensing subspace. Given the class label, you are either at the mean location or at its negative. You then have a scalar Gaussian perturbation along the H axis, followed by a vector Gaussian noise perturbation which takes you outside the subspace into the ambient p-dimensional space.

So that's the sensing model whose performance we will analyze. The condition is that the class-conditional means are different: we know the means lie along the subspace, and there is a scalar perturbation component along the subspace followed by a vector Gaussian perturbation that takes you outside it. So that's the simple model, and the goal of the problem is this: you are given a number of p-dimensional vectors, n from each class, and you want to come up with a classifier and understand the asymptotic classification performance for different scenarios.
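To make the model concrete, here is a minimal data-generation sketch in Python. The parameter names (H for the sensing vector, sigma_a for the scalar perturbation along H, sigma_z for the ambient noise) are my own notation, not necessarily the talk's, and the constants are illustrative.

```python
import numpy as np

def sample_class(label, H, n, sigma_a=1.0, sigma_z=1.0, rng=None):
    """Draw n observations x = label*H + a*H + z, where a is a scalar Gaussian
    perturbation along H and z is ambient Gaussian noise off the subspace."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = H.shape[0]
    a = rng.normal(0.0, sigma_a, size=(n, 1))   # scalar perturbation along H
    z = rng.normal(0.0, sigma_z, size=(n, p))   # takes x out of the subspace
    return label * H + a * H + z                # shape (n, p)

p, n = 1000, 50
H = np.full(p, 1.0)
H *= np.sqrt(2.0) / np.linalg.norm(H)           # fix the energy ||H||^2 = 2
X_pos = sample_class(+1.0, H, n)                # class +1: mean location +H
X_neg = sample_class(-1.0, H, n)                # class -1: mean location -H
```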
Now, we chose the model to be simple to keep things tractable; our aim is an analytical understanding. Even though it's fairly simple, the model does make sense. For example, suppose you have a network of p sensors, where p is the dimension of the observation in the previous slide, each component being one sensor, observing some kind of a weak signal. Under one class you are observing H, which is the signal, plus noise, and under the other class you observe the negative of H plus noise. The point, of course, is that you are given n observations of the weak signal at the sensors for each class, and the question is to come up with a classifier which decides whether the next observation belongs to the positive class or the negative class.
Moving ahead, the kinds of classifiers we consider for the rest of the talk are the following. First we look at the baseline classifier, which is the fully informed one: you know everything about the model. What is the test which implements it? We will also get familiar with the notation there. Then we look at what I call the unstructured ML classifier: I know that the observations are conditionally Gaussian, but I don't know the means or the variances, so I have to estimate everything using maximum likelihood estimates. How does that perform? And finally we look at structure-based classification approaches. In the first case we know the exact sensing subspace; how do things behave then? The second case is what I refer to as structured maximum likelihood, which means I estimate the parameters knowing that there is a latent low-dimensional subspace, but I don't know the subspace itself. We will see that we have negative results in these cases, and that will motivate a structured sparsity model.
Starting from the likelihood ratio test, you can turn the crank and come up with the optimum Bayes decision rule. It's going to be a linear discriminant rule based on these parameters delta, mu, and Sigma; it's not important to know the exact expressions. Delta stands for the difference of the class-conditional means, mu is the average of the class-conditional means, and Sigma is the covariance of the observations. So the decision rule depends on these parameters, and its minimum probability of error can be evaluated in closed form: it is given by a Q function, which is nothing but the tail probability of a standard normal, evaluated at an argument built from these parameters through the expressions written up here. The important thing is this: if you want to fix the difficulty of the problem as the dimension scales, you have to fix the argument of the Q function, and that amounts to fixing almost everything here, in particular the energy of the sensing vector H. So we want to keep the norm of H fixed as things scale, and that's an important part of this work.
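As a sketch, assuming the standard linear-discriminant form of the rule (decide by the sign of delta' Sigma^{-1}(x - mu), with error Q(sqrt(delta' Sigma^{-1} delta)/2)), the fully informed baseline might look like this; the exact slide expressions may differ.

```python
import numpy as np
from scipy.stats import norm   # norm.sf is the Q function (standard normal tail)

def oracle_fisher(X, H, sigma_a=1.0, sigma_z=1.0):
    """Fully informed rule for class means +H / -H under the model above."""
    p = H.shape[0]
    delta = 2.0 * H                          # difference of class-conditional means
    mu = np.zeros(p)                         # average of the class-conditional means
    Sigma = sigma_a**2 * np.outer(H, H) + sigma_z**2 * np.eye(p)
    w = np.linalg.solve(Sigma, delta)        # Fisher direction Sigma^{-1} delta
    labels = np.sign((X - mu) @ w)           # linear discriminant rule
    bayes_err = norm.sf(0.5 * np.sqrt(delta @ w))   # Q(sqrt(delta' Sigma^{-1} delta)/2)
    return labels, bayes_err
```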
So that's what the fully informed Bayes rule looks like. Now consider the case where we know the data are conditionally Gaussian but we don't know any of the parameters. The Bayes classifier has the form shown, but since I don't know the model, I have to estimate all of these parameters from the data I am given. One natural approach is a plug-in estimator: estimate all the parameters using the given data and plug them into the optimum decision rule. That gives you what is known as the empirical Fisher rule. You can analyze its probability of error: you can get a closed-form expression and look at what happens as the number of samples per dimension goes to zero, the dimension increases to infinity, and the difficulty level is held fixed. It turns out, not surprisingly, that the probability of error goes to one half, which means no better than random guessing. This is not surprising, because you are trying to estimate far more parameters than you have data for; asymptotically, you never catch up with the growing load of information that has to be estimated. So ignoring the structure and estimating all the parameters is not a good idea, and this leads us to structured approaches.
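Here is a sketch of the plug-in (empirical Fisher) rule under the same assumed notation. Since n is much smaller than p, the sample covariance is singular; a pseudo-inverse is one common workaround, but per the result just stated the error still tends to one half in this regime.

```python
import numpy as np

def empirical_fisher(X_pos, X_neg, X_test):
    """Unstructured plug-in rule: ML estimates substituted into the Bayes rule."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    delta_hat = m_pos - m_neg                       # ML estimate of delta
    mu_hat = 0.5 * (m_pos + m_neg)                  # ML estimate of mu
    Xc = np.vstack([X_pos - m_pos, X_neg - m_neg])  # pooled centered samples
    Sigma_hat = (Xc.T @ Xc) / Xc.shape[0]           # ML covariance (rank-deficient)
    w = np.linalg.pinv(Sigma_hat) @ delta_hat       # pseudo-inverse in place of inverse
    return np.sign((X_test - mu_hat) @ w)
```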
So here is the sensing model again. Let's suppose, at one extreme, that we know the sensing structure exactly, meaning that I know the subspace in which the observations lie: the underlying one-dimensional subspace. The natural thing to do in this case is to project everything down to that one-dimensional subspace, which reduces it to a scalar learning-based classification problem; then estimate all the parameters in the reduced one-dimensional problem from the data, using maximum likelihood estimates, and see what happens. That leads to what I call the projected empirical Fisher rule. The exact expression is shown here, but as I said it's not very important; the idea is that you know the sensing subspace, you project everything down to it, and you reduce it to a one-dimensional problem.

The probability of error is shown here: asymptotically, as the number of samples goes to infinity, keeping the difficulty level of the problem fixed, the probability of error goes to the Bayes probability of error, which is the optimum thing you can do. This is to be expected, because there is a latent one-dimensional structure in this problem and you know it exactly, so when you project down to it, the ambient dimension of the data becomes irrelevant; p doesn't appear in this equation at all. You are back to a scalar classification problem, and as we know, when you do maximum likelihood estimation with an increasing number of samples, you can asymptotically get optimal performance when the data dimension is fixed. So in this case the dimensionality reduction effectively takes into account the inherent low-dimensional structure of the problem.
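A sketch of the projected rule with the subspace known exactly; after projecting onto the unit vector along H, everything is scalar, which is why p drops out.

```python
import numpy as np

def projected_fisher(X_pos, X_neg, X_test, H):
    """Projected empirical Fisher rule when the subspace H is known."""
    u = H / np.linalg.norm(H)                # unit vector spanning the subspace
    m_pos, m_neg = (X_pos @ u).mean(), (X_neg @ u).mean()   # scalar ML means
    thresh = 0.5 * (m_pos + m_neg)           # scalar plug-in threshold
    sgn = np.sign(m_pos - m_neg)             # orient the decision correctly
    return np.sign((X_test @ u - thresh) * sgn)
```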
Now, the more interesting part is that in general we don't even know the sensing structure; we don't know the sensing subspace, so we have to estimate it from the data we have. What would be a natural approach to estimating the sensing subspace? What we do know is that the difference of the class-conditional means, delta, is actually aligned with H. So a very natural thing to do is to use the maximum likelihood estimate of delta, which was computed before, and use that as a proxy for H: project everything down to that direction, and then you are back to the previous situation. Again you get a projected empirical Fisher rule, except that the direction onto which you are projecting is not H, which is not known to you, but the estimated delta.

What do you expect to get here? It turns out that if you analyze the probability of misclassification error as the number of samples per dimension goes to zero, with the difficulty level fixed, the probability of classification error goes to one half. So even though you knew that there was an underlying one-dimensional sensing structure, and you knew that delta was aligned with it, trying to estimate it using a naive maximum-likelihood kind of estimate doesn't do the job: you are no better than random guessing asymptotically. This suggests that you need additional sensing structure to exploit here.
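A sketch of this naive structured approach: it is identical to the projected rule above, except that the projection direction is the noisy ML estimate delta-hat rather than the true H.

```python
import numpy as np

def naive_projected_fisher(X_pos, X_neg, X_test):
    """Project onto delta_hat, the ML estimate of the mean difference."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    u_hat = m_pos - m_neg
    u_hat = u_hat / np.linalg.norm(u_hat)    # noisy proxy for the direction of H
    thresh = 0.5 * (m_pos + m_neg) @ u_hat   # scalar threshold after projection
    return np.sign(X_test @ u_hat - thresh)
```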
Now, although this was not presented in our ICASSP paper, since then we have been able to show that this is fundamental, meaning that for this particular problem, without any additional structure on H, it is impossible for any learning algorithm to do better than random guessing asymptotically. That result will be appearing elsewhere, but it is a fundamental lower bound on the misclassification probability, which actually goes to one half if you make no assumptions on the sensing structure. So this motivates the need for additional structure on H, and one of the structures we would like to study is, of course, a popular one these days: sparsity.
So let's say that the subspace direction, the vector H, is sparse, meaning that the energy in H lies mostly in a few components compared to the number of dimensions. In particular, consider the total energy of the vector H over its p components, pick a truncation point t, and look at the energy in the tail of the H vector beyond this truncation. As t and p go to infinity, we want this tail energy to go to zero. That is essentially a statement about the sparsity, a compressibility condition on the signal.
In this case, here is a natural thing to try. So far we have used the maximum likelihood estimate of delta as a proxy, and that didn't work. But now we know something more about H, namely that its tail energy goes to zero. So one interesting thing to try is this: why not truncate the estimator, component by component, and use that as the proxy for delta instead? The idea is to keep the estimated components only at indices below some truncation parameter t, and set everything beyond it to zero. That leads to a truncation-based estimate of the direction along H, and we then use that.

How do things behave? We can show that as the dimension, the number of samples, and the truncation point go to infinity, with the truncation point chosen to grow slower than the number of samples, then asymptotically we can estimate the signal subspace perfectly, meaning that the mean-square error between the truncated estimate and the true direction goes to zero; we can consistently estimate the one-dimensional subspace. And of course, if we can estimate the subspace perfectly, it is unsurprising that, as things scale with the difficulty level fixed, the probability of classification error goes to the Bayes probability of error. So knowing the general sensing structure, plus additional sparsity assumptions or some other structural side information, can asymptotically give you the Bayes probability of error.
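A sketch of the truncation-based estimator; I am assuming the components are indexed so that the energy of H concentrates in the leading ones, as in the decay condition above, and t should grow slower than the sample size n (say t around n^{1/2}).

```python
import numpy as np

def truncated_direction(X_pos, X_neg, t):
    """Keep the first t components of delta_hat, zero the tail, renormalize."""
    delta_hat = X_pos.mean(axis=0) - X_neg.mean(axis=0)
    h_hat = np.zeros_like(delta_hat)
    h_hat[:t] = delta_hat[:t]                # keep components below truncation point t
    return h_hat / np.linalg.norm(h_hat)     # consistent estimate of H's direction
```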
Here is a little simulation to reinforce some of these insights. We have fixed the difficulty, the Bayes probability of error, to be 0.1 throughout as the dimension scales. The energy of H is fixed to the value two, and there are some other parameters in the model. The number of samples grows slower than the data dimension, as shown here, and the truncation point is chosen to grow slower than the number of samples, as shown here. We assume a polynomial decay for H.

In the plot on the left, one of the curves is a particular realization of H, and the red line is the noisy maximum likelihood estimate, delta-hat; they are normalized to have unit energy. The blue line is the truncated version of the red one; the truncation point here is at exactly twenty or so. On the right is the probability of error on the vertical axis versus the ambient dimension. As the dimension scales, the unstructured approach, where you don't know anything about the sensing structure and you try to estimate all the parameters using maximum likelihood estimates, approaches a probability of error of one half, as you would expect. On the other hand, if you knew there was a sensing subspace but you estimated it naively, simply using delta-hat, the maximum likelihood estimate, then you also get one half. But if you use the truncation-based estimate, you approach the Bayes-optimal performance.
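For completeness, here is a toy end-to-end run mirroring the simulation setup, reusing the naive_projected_fisher and truncated_direction sketches from above. The schedules (n around p^0.6, t around n^0.5) and the constants are illustrative guesses, not the talk's exact settings, so the numbers will only show the qualitative gap between the two estimators.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2000
n = int(p**0.6)                              # samples grow slower than dimension
t = int(n**0.5)                              # truncation grows slower than samples
H = 1.0 / np.arange(1, p + 1)                # polynomial decay of the components
H *= np.sqrt(2.0) / np.linalg.norm(H)        # fix the energy ||H||^2 = 2

def draw(label, m):
    a = rng.normal(size=(m, 1))              # scalar perturbation along H
    return label * H + a * H + rng.normal(size=(m, p))

X_pos, X_neg = draw(+1, n), draw(-1, n)
X_test = np.vstack([draw(+1, 2000), draw(-1, 2000)])
y_test = np.concatenate([np.ones(2000), -np.ones(2000)])

naive = naive_projected_fisher(X_pos, X_neg, X_test)            # from the sketch above
trunc = np.sign(X_test @ truncated_direction(X_pos, X_neg, t))  # threshold 0: means are +H/-H
for name, y_hat in [("naive delta-hat", naive), ("truncated", trunc)]:
    print(f"{name}: test error ~ {(y_hat != y_test).mean():.3f}")
```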
So, to conclude my talk, the take-away points are these. For many practical problems you encounter situations where the number of samples is far fewer than the ambient data dimension. In addition, there often exists a latent low-dimensional sensing structure which can be exploited. If you totally ignore the sensing structure and naively try to estimate everything using maximum likelihood estimates, you will be no better than random guessing in many scenarios. Even having general knowledge of the sensing structure, like knowing that there is a one-dimensional signal H without knowing what it is, and trying to estimate it naively, cannot do the job. But the story recovers: if you have the general sensing structure plus some additional structure on H, then you can often recover asymptotically optimum classification.
That brings me to the end of my talk. Thank you.