"'kay" but everybody uh my name hear close and from dot university of technology and uh i will present some
joint work with uh house you and joe just you're not "'cause" from the university of minnesota
So I will talk about a blend, basically, of total least squares and sparse reconstruction. I think you have seen so many sessions here at ICASSP on compressive sampling, so we give a little twist to that in this talk.
As some of you might know, total least squares is a method where you basically try to solve a set of linear equations, but where you have perturbations both in the data vector, which you normally also have in least squares, and in the system matrix, or the measurement matrix, or whatever you want to call it. This has some applications in statistics, for instance in the errors-in-variables model, but it also has a connection to linear algebra, because the solution is actually based on computing a singular value decomposition.
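As a reference, a minimal statement of the classical total least squares problem (in generic notation, with E and e denoting the perturbations of A and y; these symbols are labels of mine, not necessarily those on the slides) would be:

```latex
\min_{\mathbf{x},\,\mathbf{E},\,\mathbf{e}}\ \big\| [\,\mathbf{E}\ \ \mathbf{e}\,] \big\|_F^2
\quad \text{s.t.} \quad (\mathbf{A}+\mathbf{E})\,\mathbf{x} = \mathbf{y}+\mathbf{e},
```

whose solution, in the overdetermined case, follows from the right singular vector of the composite matrix [A y] associated with its smallest singular value.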
People have also extended this total least squares principle to the case where you have some more prior knowledge on these perturbations on the data vector and the system matrix. This could be statistical knowledge, but it could also be knowledge about the structure. On the other hand, you have this large body of work, of course, on compressive sampling, where you for instance use L1-norm minimization to enforce sparsity. I will not go into details there; there have been enough talks on that.
So what we do here is we basically try to solve a compressive sensing problem, but in a case where you have perturbations also in the data matrix. It is somehow comparable to the problem of the second speaker, but this time we use some kind of statistical uncertainty on the system matrix instead of a worst-case scenario.
These perturbations appear in compressive sensing in a number of applications. For instance, you could have non-idealities in the implementation of the compression matrix, because often this is something that happens in hardware and you do not know exactly what is going on there, so it is a realistic assumption to take some perturbations there into account. But it also shows up in a lot of compressive sensing techniques where you basically try to do some sensing and you use some kind of grid-based approach to sense a location, or a direction of arrival, or a frequency, and the targets might not be exactly on the grid. You can model that as a perturbation on the basis matrix.
There are some analyses of the performance of compressive sensing including such a perturbation in the measurement matrix, but those are usually performance analyses of standard sparse reconstruction methods, like the lasso or basis pursuit. So people have looked at what happens to the RIP, for instance, in that case, and how sensitive these sparse approximations are to that basis mismatch.
Here we are basically going to look at how we can take those perturbations into account. We also look at the statistical optimality of that problem, and at global and local optima; some of these results have already appeared recently in a Transactions on Signal Processing paper. Today we will focus more specifically on the case where you have prior knowledge on the perturbations, such as correlations and structure, and that leads then to weighted and structured sparse total least squares.
So here you see an outline of the problem. You have a simple underdetermined system of equations, y = Ax, so we have fewer equations than unknowns, and we assume that the unknown vector is sparse. Usually this problem is solved using some kind of least squares, for instance regularized with an L1 norm on x, and that leads to some kind of lasso problem, where you try to minimize the squared two-norm of the residual, regularized here with this L1 norm on x.
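Written out in generic notation (with lambda the regularization parameter that comes back later in the talk), that lasso problem is:

```latex
\hat{\mathbf{x}} \;=\; \arg\min_{\mathbf{x}}\ \|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 \;+\; \lambda\,\|\mathbf{x}\|_1 .
```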
Now, this only accounts for errors in the data vector y, but when you also have errors in the A matrix, you have to think about other types of solutions. One of those is given by total least squares; in a way, it is a way to compute least squares in case you want to include some robustness against perturbations on this system matrix. In that case you end up with a problem like this, where you basically try to minimize the squared Frobenius norm of the total perturbation, both on the system matrix A and on the data vector y, and this is again regularized with an L1-norm constraint on x.
The other constraint that you add is basically the fact that, if you include the errors, you should have equality between the perturbed data vector and the perturbed data matrix times the unknown vector x. Normally, without this sparsity constraint, you have classical total least squares, in case you would have an overdetermined system, and then the solution is actually given by a singular value decomposition that you have to carry out on the composite matrix of A concatenated with the data vector y. But here we also have this extra L1-norm constraint on x.
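So, as a sketch in the same generic notation, the sparse total least squares problem combines the two ideas:

```latex
\min_{\mathbf{x},\,\mathbf{E},\,\mathbf{e}}\ \big\| [\,\mathbf{E}\ \ \mathbf{e}\,] \big\|_F^2 \;+\; \lambda\,\|\mathbf{x}\|_1
\quad \text{s.t.} \quad \mathbf{y}+\mathbf{e} = (\mathbf{A}+\mathbf{E})\,\mathbf{x} .
```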
So basically, this is the problem we can solve in case we have no further information on the structure of these perturbations; that would then be the unweighted, unstructured case. But if you have, for instance, statistical knowledge about these perturbations, you could think about a weighted version of that problem, weighted by the inverse of the covariance matrix of these perturbations. Or you could look at a structured version, where you take the structure into account on this composite matrix of A concatenated with y.
This happens in various applications, for instance in deconvolution, system identification, or linear prediction, where this A matrix will often be Toeplitz or Hankel. But you could also have circulant structures, or Vandermonde structures, for instance in harmonic retrieval. And also when you have these grid-based sensing approaches, this A is basically constructed from complex exponentials, so you also have this Vandermonde structure in that type of problem.
Mathematically, the structure is basically modelled by writing this composite matrix as a function of a parameter vector p. All the structure is modelled in this way, so there is a unique mapping between the parameter vector and this composite matrix, which is called S here.
The problem that we are solving here is basically a weighted and structured sparse total least squares problem, where we are going to minimize a squared weighted norm on epsilon, which is now the perturbation not on the composite matrix S but on the parameter vector. So you basically minimize a norm on the perturbation of the parameter vector, which is the standard approach in weighted and structured total least squares. And now we again add the sparsity constraint to that cost function, subject to this equality here, which is basically the same equality as we had on the previous slide, but now given as a function of the parameter vector p and its perturbation epsilon. So this might not necessarily be a linear equation now; that depends on how you model the structure.
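Roughly, and again only as a sketch in generic notation (with Sigma the covariance matrix of the parameter perturbation epsilon and S(.) the mapping from parameter vector to composite matrix), the weighted and structured problem reads:

```latex
\min_{\mathbf{x},\,\boldsymbol{\varepsilon}}\ \boldsymbol{\varepsilon}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon} \;+\; \lambda\,\|\mathbf{x}\|_1
\quad \text{s.t.} \quad \mathbf{S}(\mathbf{p}+\boldsymbol{\varepsilon})\begin{bmatrix}\mathbf{x}\\ -1\end{bmatrix} = \mathbf{0}.
```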
That is why we introduce an assumption, and I will actually start with the second part of it, where we model S as a linear, or affine if you want, function of p. So basically what we assume is that S(p) can be expressed as an affine function of p, so that this constraint can be transformed into a linear constraint. The other part is more of a notational assumption that makes things a little bit easier: we also make the structure in S separable, so we split the parameter vector into a part that is related to the system matrix A and a part that is related to the data vector y, so that the perturbations on those two can be whitened separately, for instance. After the whitening that happens here, you basically get the following problem. This epsilon_A is the perturbation on the parameter vector related to A, and epsilon_y is the perturbation related to the data vector y, but they are both whitened now according to their inverse covariance matrices. And the constraint here is now this linear expression, which is due to the fact that we assumed the affine form for this structure in the matrix, and hence in the perturbation.
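A minimal sketch of what such an affine structure model looks like (the fixed basis matrices S_i are generic placeholders of mine; for a Toeplitz A, each S_i would simply place one parameter on one diagonal):

```latex
\mathbf{S}(\mathbf{p}) = \mathbf{S}_0 + \sum_i p_i\,\mathbf{S}_i
\quad\Longrightarrow\quad
\mathbf{S}(\mathbf{p}+\boldsymbol{\varepsilon}) = \mathbf{S}(\mathbf{p}) + \sum_i \varepsilon_i\,\mathbf{S}_i ,
```

so that, for fixed x, the constraint is linear in epsilon, and for fixed epsilon it is linear in x.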
So this is what we have to solve. Of course, you could always eliminate epsilon_y by the solution that is given by the constraint and turn it into an unconstrained problem; for instance, you could replace epsilon_y by the solution given here. Then you get an unconstrained problem, but it is a non-convex problem that you have to solve, because you have here both x and the unknown perturbation on the system matrix.
But before we start looking into solutions, there is some optimality related to that problem. You can interpret the solution of this problem, in both x and epsilon, as a maximum a posteriori optimal solution under certain conditions. The conditions are that you need Gaussian perturbations, that there is independence between all the variables, that the parameter vector of A has a non-informative, uniform distribution, and that the unknown vector x is Laplacian distributed. Under those circumstances you can show that the solution to this problem gives you the maximum a posteriori optimal estimate. So this problem has some statistical meaning.
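As a rough sketch of where that MAP interpretation comes from (under the stated assumptions and in generic notation): whitened Gaussian perturbations and a Laplacian prior on x give a negative log-posterior of the form

```latex
-\log p(\boldsymbol{\varepsilon},\mathbf{x}\mid \mathbf{y})
\;=\; \boldsymbol{\varepsilon}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}
\;+\; \lambda\,\|\mathbf{x}\|_1 \;+\; \text{const},
```

so that minimizing the weighted quadratic plus the L1 penalty, subject to the perturbed data relation, coincides with maximizing the posterior.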
To solve this, we first think about an alternating descent method, where you basically solve iteratively, alternating between epsilon_A, the perturbation on the system matrix, and x. So you could, for instance, fix epsilon_A here and here, and in that case it becomes a classical, well, slightly altered, lasso-like problem. Basically, the solution can then be found by any algorithm that has been proposed to solve sparse reconstruction problems with a least squares cost function. And once that is given, you can use that solution to find the result for the perturbation on the system matrix: if x is given, everything becomes an unconstrained quadratic program, so for epsilon you can then find the solution in closed form. If you start with a perturbation equal to zero, you basically start with the solution given by the classical lasso problem, and you can show that you always improve your cost function and that you converge to at least a stationary point.
So within this cost function, you can show in this way that you will always improve upon the classical solution, the classical lasso solution. Of course, to solve this lasso problem you can use your favourite solver. What you could also do is use coordinate descent, a sort of coordinate descent, to solve the lasso, which basically means that you fix all the entries of your x vector except for one and you solve for that one separately; then it becomes a scalar lasso, which gives you a closed-form solution by means of soft thresholding. And you can merge those iterations altogether, so you basically alternate between epsilon_A and then every entry of x, then go back to epsilon_A, then again solve for every entry of x separately, and so on. That gives you a global coordinate descent method that can also be shown to converge to at least a stationary point. Of course, this is not necessarily the global optimum, but at least you know that you improve upon the initial solution, which is the lasso solution.
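To make those two building blocks concrete, here is a minimal Python sketch for the simplest special case, the unweighted, unstructured sparse total least squares problem; the function names, the fixed iteration counts, and the restriction to real-valued data are choices of mine, and the weighted and structured variant discussed in the talk would replace the simple closed-form perturbation update by its whitened, structured counterpart.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding operator: the closed-form solution of the scalar lasso."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_cd(A, y, lam, x0=None, n_sweeps=50):
    """Cyclic coordinate descent for  min_x ||y - A x||_2^2 + lam * ||x||_1."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.copy()
    col_sq = np.sum(A ** 2, axis=0)           # squared column norms ||a_j||^2
    r = y - A @ x                              # current residual
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            r = r + A[:, j] * x[j]             # remove the j-th contribution
            x[j] = soft_threshold(A[:, j] @ r, lam / 2.0) / col_sq[j]
            r = r - A[:, j] * x[j]             # put the updated contribution back
    return x

def sparse_tls(A, y, lam, n_outer=10):
    """Alternating descent for the unweighted, unstructured sparse TLS problem
       min_{x,E} ||E||_F^2 + ||y - (A + E) x||_2^2 + lam * ||x||_1."""
    E = np.zeros_like(A)
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        # Step 1: with E fixed, the problem in x is a (slightly altered) lasso.
        x = lasso_cd(A + E, y, lam, x0=x)
        # Step 2: with x fixed, the problem in E is an unconstrained quadratic
        # program with closed-form minimizer E = (y - A x) x^T / (1 + ||x||^2).
        r = y - A @ x
        E = np.outer(r, x) / (1.0 + x @ x)
    return x, E
```

The perturbation step uses the fact that, for fixed x, minimizing ||E||_F^2 + ||(y - Ax) - Ex||_2^2 over E is an unconstrained quadratic program with the closed-form minimizer shown in the comment.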
Here are some numerical comparisons. We assume that we have a twenty-by-forty A matrix, so the compression is fifty percent, you could say. There is some Toeplitz structure in the A matrix, and we assume different variances for the perturbations on A and y; note that the perturbation on A also has a Toeplitz structure. The signal vector x here is generated with ten nonzero entries. What is shown here is the L0 error versus the parameter lambda, and the L1 error versus the parameter lambda. This lambda basically gives you a trade-off between solving the total least squares problem and how sparse the solution is: the bigger the lambda, the more weight you give to a sparse solution. The L0 error is basically related to support recovery: it is the percentage of entries where the support of the true solution and of the estimated solution are not the same, so it tells you something about support recovery.
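As a small illustration of that L0-error metric (a sketch; the function name and the tolerance are my own), it could be computed as:

```python
import numpy as np

def support_error(x_true, x_hat, tol=1e-6):
    """Fraction of entries whose support (zero versus nonzero) differs between
    the true sparse vector and the estimated one."""
    return np.mean((np.abs(x_true) > tol) != (np.abs(x_hat) > tol))
```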
There you see that if you take everything into account, so the blue curve here, the weighted and structured sparse total least squares, you basically get the best support recovery. If you only take the weights, i.e. the correlations, into account, or only the structure, these are the red and black curves, then you get somewhat bigger L0 errors. If you do not take any weights or structure into account, you are a little bit worse again, and the lasso gives you basically the worst L0 error. For the L1 error, so this is basically the L1 norm of the estimation error, the performance curves are closer to each other, but of course support recovery is the most important in many of these applications.
Like I told you before, this approach is very useful in cases where you do sensing and you use some kind of grid-based approach. That can for instance be used in direction-of-arrival estimation, where you basically divide the whole angular space into different grid points, into an angle grid; for instance, every two degrees you can pick a grid point. In that case you can express your received vector y as a linear combination of array response vectors. Basically, this first vector here, a(theta_1), tells you how the signal would be received if it comes in at an angle of arrival of theta_1. So you get a linear combination of all these array response vectors at the different grid points, and this x contains the combining weights. But of course the combining weights will be sparse, because only where you have a target will you have a combining weight.
Of course, whenever you have targets that are in between the grid points, this equality will not be exactly true, and there is some kind of perturbation on the grid. So you could say that the true exponent, the true angle of your source, can then be modelled as the exponent at your grid point plus some linear correction. Because, like I said before, we want to have the perturbations in a linear form; we want an affine expression for the perturbations. That means that in this case we need some kind of approximation, because there is a lot of structure in these perturbations but it is not linear, so we approximate it by an affine function of the parameter vector. I am not going to go into the details here. That then allows you to get a better estimate, because next to solving for x, you also allow these grid points basically to shift towards the true solutions.
So if you have a source that is somewhere in between two grid points, because you allow for perturbations on this A matrix, a grid point might shift to the true solution. You get some kind of super-resolution effect for free with this approach. Other approaches usually start from a rough grid and then refine the grid in those locations where you have a target; here you basically get it in one shot.
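To illustrate the grid-based model and the first-order correction a(theta_g + delta) ≈ a(theta_g) + delta · a'(theta_g) that the affine approximation relies on, here is a small Python sketch for a uniform linear array; the function name, the half-wavelength spacing, and the particular steering-vector convention are assumptions of mine, not taken from the talk.

```python
import numpy as np

def ula_grid_dictionaries(n_ant, grid_deg, d_over_lambda=0.5):
    """Steering-vector dictionary on an angular grid for a uniform linear array,
    plus its derivative with respect to the angle, as used by the affine
    (first-order) off-grid model a(theta_g + delta) ~ a(theta_g) + delta * a'(theta_g)."""
    k = np.arange(n_ant)[:, None]                              # antenna index 0..n_ant-1
    theta = np.deg2rad(np.asarray(grid_deg, float))[None, :]   # grid angles in radians
    A = np.exp(2j * np.pi * d_over_lambda * k * np.sin(theta))       # columns a(theta_g)
    dA = A * (2j * np.pi * d_over_lambda * k * np.cos(theta))        # columns a'(theta_g)
    return A, dA

# Example: ten antennas, a grid point every two degrees; a source at one degree
# then falls exactly between the grid points at zero and two degrees.
A, dA = ula_grid_dictionaries(10, np.arange(-90, 90, 2))
```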
Here is an example where you have, I believe, ninety grid points, so every two degrees you have a grid point, and you have a source at one degree and one at minus nine degrees, so they are both exactly in between two grid points. Then you see that the classical lasso basically gives you four nonzeros, basically the grid points around the sources.
You could say, okay, we can interpolate those and then we get the solution, but you can only do that if you know that you have only two sources; if you do not know the number of sources, you could think that there are four sources in this problem. The weighted and structured sparse total least squares, on the other hand, gives you basically two peaks at the red locations, which correspond to these black arrows where the true sources are located; so the grid basically shifts to the right positions. You also see another peak here, but it is about twenty dB below the first two, so you can basically also do some kind of detection of the number of sources using this approach.
So I think that brings me to the conclusions. We have proposed a weighted and structured sparse total least squares problem, which is motivated, first of all, by the fact that you have non-idealities in the compression matrix, but it is also motivated by a lot of these sensing applications. We can account for correlations and structure in these perturbations. We looked at the MAP optimality of this problem, and we looked at reduced-complexity alternating descent and coordinate descent solutions. Ongoing and future research consists of recursive and robust implementations of this method. We also try to see whether the SVD can be used in some of these problems, since computing an SVD basically also boils down to an iterative method; you could wonder whether in those iterations you could also include sparsity and still use SVD-based types of methods to solve the sparse total least squares problem. So that concludes my presentation. Are there any questions?
I was thinking, how much more is the complexity of your solution? I mean, what do you do for a very large problem, say a large microphone array or something like that?
Well, I can say that the complexity is basically determined by how you solve the initial sparse reconstruction problem, and you do that iteratively, so maybe five times; I do not know exactly how many iterations we used here. In general you can stop whenever you want: right after one iteration you already know that you have improved upon the classical sparse reconstruction method. So you could say, for instance, that we have twice the complexity, because solving for the perturbation is a closed-form expression, so that is not the biggest complexity. It depends basically on the solver that you use for the sparse reconstruction.
Just one comment, in addition to the question of how complex this is: it can sometimes be less complex than ordinary TLS, because you have to remember that structured TLS itself entails an iterative solution, right, so it can be more or less complex. And, as was mentioned, when people have to regularize the TLS, then in a sense they again just solve some sort of iterative optimization, right?
Right, yeah, even the regularized TLS with a quadratic penalty, yeah.
Okay, and could you change the norm, for instance use a different norm instead of the L1 norm?
Yeah, that is possible, because in these iterations, at the start of the iterations you basically set your perturbation to zero, and then instead of a lasso problem you have another type of sparsity-regularized least squares problem that you could solve in that step. And whatever you fix there, the solution in the second step is always a closed-form expression for the perturbations. So you could change this L1 norm.
Thank you.
Coming back to the high-resolution localization and estimation techniques: what is the SNR threshold when you use this kind of compressive-sensing-inspired technique?
Yeah, I do not know; we should test it on more applications. This is more, let us say, the theoretical framework, and now we have to check it on many different applications. The thing is, you can use here a kind of rough grid as an initial point, and the question is how rough you can make your grid, how much you can correct for. That is something we did not investigate analytically; we have simulations, but not a theoretical analysis of that yet.
uh
well
uh
well for you so see you know we have also there is also spectrum sensing application but
so it's always compared to the there the the standard uh a sparse reconstruction methods
so uh
is it is that the comparison you would like
I mean...
Oh, okay, yeah, actually, this one I did not show, but here, over different Monte Carlo simulations, you see what happens if you compare the lasso, which is the blue curve; this is what I showed before, where you have the four peaks. You could say, okay, you interpolate those, and then you get this blue dashed line. But what you see is that even then, the full weighted and structured sparse total least squares still does better than the interpolation, and if we interpolate our own solution we do not gain anything, because we are already at the good solution. So this is a gain for the direction-of-arrival estimation: even with interpolation, although you need to know the number of sources for that interpolation, we still have a better performance.
Okay, thank you.