Speech Transcript - WEIGHTED AND STRUCTURED SPARSE TOTAL LEAST-SQUARES FOR PERTURBED COMPRESSIVE SAMPLING

0:00:19	Okay, hello everybody. My name is Geert Leus, from Delft University of Technology, and I will present some
0:00:25	joint work with Hao Zhu and Georgios Giannakis from the University of Minnesota.
0:00:30	So I will talk about a blend, basically, of total least squares and sparse reconstruction.
0:00:36	I think you have seen many
0:00:38	sessions here at ICASSP on compressive sampling, so
0:00:42	we will give a little twist to that in this talk.
0:00:46	As some of you might know, total least squares is a method where
0:00:52	you basically try to solve a set of linear equations,
0:00:55	but here you have perturbations both in
0:00:59	the data vector, which you normally also have in least squares, and in
0:01:03	the system matrix, or the measurement matrix, or
0:01:06	whatever you want to call it.
0:01:08	This has some applications in statistics,
0:01:12	for instance in the errors-in-variables model, but
0:01:14	it also has a connection to linear algebra, because the solution
0:01:18	is actually based on computing a singular value decomposition.
0:01:23	People have also extended this total least squares principle to the case where you have some more prior knowledge
0:01:29	on these perturbations on the data vector and the system matrix.
0:01:33	This could be statistical
0:01:35	knowledge, but it could also be knowledge on the structure.
0:01:39	On the other hand, you have this large body of work, of course, on compressive sampling, where you for
0:01:43	instance use L1-norm minimization
0:01:46	to effect sparsity. I will not
0:01:48	go into details there; there are
0:01:50	enough talks on that.
0:01:53	What we do here is basically try to solve the compressive sensing problem, but in a case where you
0:01:58	have
0:01:59	perturbations also in the data matrix. It is somehow
0:02:02	comparable to the
0:02:04	problem of the second speaker, but
0:02:06	this time we use some kind of statistical uncertainty on the system matrix instead of a
0:02:13	worst-case scenario.
0:02:16	These perturbations appear in compressive sensing in a
0:02:22	number of applications. For instance, you could have
0:02:25	non-idealities in the implementation of the compression matrix,
0:02:29	because often this is something that should happen in hardware and you often do not know exactly
0:02:33	what is going on there, so it is a realistic assumption that you
0:02:36	take into account some perturbations there.
0:02:39	But it also has a lot of applications in
0:02:44	compressive sensing techniques where
0:02:47	you basically try to do some sensing
0:02:49	and you use some kind of grid-based approach to sense
0:02:53	a location, a direction of arrival, or a frequency,
0:02:56	and the targets might not be exactly on the grid. So
0:03:00	you can model that
0:03:03	error using a perturbation on the basis matrix.
0:03:08	There are some analyses of the performance of compressive sensing
0:03:13	including such a perturbation in the measurement matrix,
0:03:17	but those are usually performance analyses of standard sparse reconstruction methods like Lasso
0:03:23	or basis pursuit or
0:03:25	those types of methods. So people have looked at what happens to the RIP, for instance, in
0:03:28	that case,
0:03:30	and how sensitive these sparse approximations are to that basis mismatch.
0:03:34	Here we are basically going to look at how to take those perturbations into account.
0:03:41	We also look at the statistical optimality of that problem, and at
0:03:47	global and local optima. Some of these results have already appeared
0:03:51	in a Transactions on Signal Processing paper recently.
0:03:54	Today we will focus more specifically on
0:03:58	the case where you have this prior knowledge on the perturbations, such as correlations and
0:04:03	structure,
0:04:04	so that leads then to weighted and structured sparse total least squares.
0:04:10	Here you see basically an outline of the problem.
0:04:15	You have a simple underdetermined system of equations, y = Ax,
0:04:19	so we have fewer equations than unknowns, and we assume that the
0:04:24	unknown vector is sparse. Usually
0:04:26	this problem is solved
0:04:29	using some kind of
0:04:31	least squares, for instance regularized with an L1 norm on x, so that leads to some
0:04:35	kind of Lasso problem,
0:04:37	where you try to minimize the squared L2 norm of the residual,
0:04:42	which is called e here, regularized with this L1 norm on x.
0:04:48	Now this only accounts, basically, for errors in the data vector y,
0:04:53	but when you also have errors in the A matrix,
0:04:57	you have to think about other types of
0:05:01	solutions.
0:05:02	One of those is given by total least squares, so in a way it is some kind of
0:05:07	way to compute least squares in case you want to include some robustness against perturbations on this
0:05:13	system matrix.
0:05:15	In that case you end up with
0:05:17	a problem like this,
0:05:19	basically a least squares problem where you try to minimize the squared Frobenius norm
0:05:24	of the total perturbations, both on the system matrix
0:05:28	A and the data vector y,
0:05:30	and this is again regularized with
0:05:32	an L1-norm constraint on x.
0:05:38	The other constraint that you add is basically the fact that,
0:05:43	if you include the errors, you should have an equality between the perturbed data vector
0:05:48	and the perturbed data matrix times the unknown vector x.
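For reference, the two problems described so far can be written out as below. This is a sketch reconstructed from the spoken description rather than a copy of the slides; the symbols E (perturbation on A), e (perturbation on y) and \lambda (regularization weight) are notation assumed here.

```latex
% Lasso: errors only in the data vector y
\min_{x,\,e} \; \|e\|_2^2 + \lambda \|x\|_1
\quad \text{subject to} \quad y + e = A x

% Sparse total least squares: errors in both A and y
\min_{x,\,E,\,e} \; \big\| [\, E \;\; e \,] \big\|_F^2 + \lambda \|x\|_1
\quad \text{subject to} \quad y + e = (A + E)\, x
```

Setting E = 0 in the second problem gives back the first, which is why the Lasso solution is a natural starting point.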
0:05:54	Normally, without the sparsity constraint, you
0:06:00	basically have classical total least squares in case you have an overdetermined system, and then the
0:06:04	solution is actually
0:06:06	given by a singular value decomposition
0:06:08	that you have to carry out on the composite matrix of A
0:06:13	concatenated with the data vector y.
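A minimal sketch of the classical (non-sparse) total least squares solution mentioned here, assuming an overdetermined system and NumPy. The function name `tls_svd` is hypothetical; it takes the right singular vector of [A y] belonging to the smallest singular value and rescales it so that its last entry is -1.

```python
import numpy as np

def tls_svd(A, y):
    """Classical total least squares for an overdetermined system y ~ A x."""
    n = A.shape[1]
    C = np.hstack([A, y.reshape(-1, 1)])   # composite matrix [A y]
    _, _, Vt = np.linalg.svd(C)            # rows of Vt are right singular vectors
    v = Vt[-1]                             # vector for the smallest singular value
    if np.isclose(v[-1], 0.0):
        raise ValueError("TLS solution does not exist: last component is zero")
    return -v[:n] / v[-1]                  # scale so that [x; -1] spans the null direction
```

For a noise-free consistent system the composite matrix is rank deficient and this returns the exact solution; with noise in both A and y it returns the total least squares estimate.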
0:06:16	But here we also have this
0:06:20	extra L1-norm constraint on the
0:06:23	vector x.
0:06:27	So basically this problem
0:06:30	we can solve in case we have no
0:06:32	further information on the structure of these perturbations.
0:06:36	So that would be the case then, but if you have,
0:06:40	for instance, statistical knowledge about these perturbations, you could think about
0:06:44	a weighted version of that problem,
0:06:46	weighted by the inverse of the covariance matrix of these perturbations,
0:06:50	or you could also look at a structured version, where you take into
0:06:54	account the structure of
0:06:55	this composite matrix of A
0:06:58	concatenated with y.
0:07:00	This happens in various applications, for instance in
0:07:05	deconvolution, system identification, or linear prediction, where
0:07:09	this A matrix often will be Toeplitz or Hankel.
0:07:12	But you could also have circulant structures or Vandermonde structures,
0:07:16	for instance in harmonic retrieval,
0:07:18	and also when you have these grid-based sensing approaches.
0:07:22	In that case this A is basically
0:07:25	constructed from complex exponentials, so you also have this Vandermonde structure there
0:07:30	in this type of problem.
0:07:32	Mathematically, the structure is basically modelled by writing this composite
0:07:40	matrix
0:07:41	as a function of a parameter vector p.
0:07:44	So all the structure is basically modelled in this way: there is a unique mapping between the parameter vector
0:07:49	and this composite matrix, which is called S here.
0:07:53	The problem that we are solving here is basically a weighted and structured sparse total least squares problem,
0:08:00	where we are going to minimize a squared weighted norm
0:08:04	on epsilon, which is now the perturbation not on the composite matrix S but on the
0:08:09	parameter vector. So you basically minimize
0:08:12	a norm on the perturbation of the parameter vector, which is the standard approach
0:08:16	in weighted and structured total least squares.
0:08:19	And now we again add here the sparsity constraint
0:08:23	to that cost function,
0:08:26	again subject to this equality here, which is basically
0:08:31	the same equality as we had
0:08:33	on the previous slide, but now it is
0:08:35	given as a function of the parameter vector p and its
0:08:39	perturbation epsilon. So this might not necessarily be a linear equation now;
0:08:44	that depends on how you model the structure.
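Written out, the weighted and structured problem described here would look roughly as follows. This is a reconstruction under assumptions: W denotes the weighting matrix (for instance the inverse covariance of the perturbation), S(p) = [A y] is the composite matrix, and the constraint is the earlier equality y + e = (A + E)x rewritten in terms of S.

```latex
\min_{x,\,\epsilon} \; \epsilon^{T} W \epsilon + \lambda \|x\|_1
\quad \text{subject to} \quad
S(p + \epsilon) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0
```

With S(p + \epsilon) = [\, A + E \;\; y + e \,], the constraint reads (A + E)x - (y + e) = 0, which matches the unstructured case.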
0:08:48	That is why we introduce an assumption.
0:08:52	I will start with the second part, actually, where we model S as a linear function,
0:08:58	an affine function if you want,
0:09:00	of p. So basically what we assume is that
0:09:03	S(p) can be expressed as a linear function of p, so this constraint can be
0:09:08	transformed into a linear
0:09:10	constraint.
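To make the linear-structure assumption concrete, here is a small sketch using a Toeplitz matrix as the example structure. The helper names `toeplitz_basis` and `structured` are hypothetical; the point is only that the matrix is a weighted sum of fixed basis matrices, so a perturbation of the parameter vector enters linearly.

```python
import numpy as np

def toeplitz_basis(m, n):
    """Fixed 0/1 matrices S_k, one per diagonal of an m-by-n Toeplitz matrix."""
    return [np.eye(m, n, k=d) for d in range(-(m - 1), n)]

def structured(p, basis):
    """Structured matrix S(p) = sum_k p[k] * S_k, linear in the parameter vector p."""
    return sum(pk * Sk for pk, Sk in zip(p, basis))

basis = toeplitz_basis(3, 4)            # 3 + 4 - 1 = 6 free parameters
p = np.arange(1.0, 7.0)                 # nominal parameters
eps = 0.1 * np.ones(6)                  # perturbation on the parameters
S_perturbed = structured(p + eps, basis)
```

Because the map is linear, `structured(p + eps, basis)` equals `structured(p, basis) + structured(eps, basis)`, so the perturbed matrix keeps the same Toeplitz structure as the nominal one.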
0:09:13	The other part is
0:09:14	more of a notational
0:09:17	assumption that makes things a little bit easier: we also make the structure separable, so
0:09:22	we split it into
0:09:23	a part of the parameters that is related to the system matrix A and a part that is
0:09:28	related to the data vector
0:09:30	y,
0:09:32	so that the perturbations on those vectors can be,
0:09:35	for instance, whitened separately.
0:09:37	After the whitening that happens here, you basically get the following problem.
0:09:41	This epsilon_A is the perturbation on the parameter vector related to A,
0:09:45	and epsilon_y is the
0:09:47	perturbation
0:09:49	related to the data vector y, but they are both whitened now
0:09:52	according to their inverse covariance matrices.
0:09:55	And again, the constraint here is now this linear expression, which is due to the fact that we
0:10:00	assume the linear form
0:10:01	for this structure in the matrix
0:10:05	and in the perturbation.
0:10:08	So this is what we have to solve.
0:10:09	Of course you could always
0:10:11	replace
0:10:13	either epsilon_A or epsilon_y by
0:10:15	its solution given by the constraint and turn it into an unconstrained problem. For instance,
0:10:21	you could replace epsilon_y by the solution that is given
0:10:24	here, so then you get an unconstrained problem, but it is
0:10:27	a non-convex
0:10:29	problem that you have to solve, because you have here both
0:10:32	x and the unknown perturbation on the
0:10:34	system matrix.
0:10:38	Before we start looking into solutions,
0:10:41	there is some optimality related to that problem.
0:10:46	You can interpret the solution of this
0:10:48	problem in both x and epsilon as a
0:10:51	maximum a posteriori optimal solution
0:10:53	under certain conditions.
0:10:55	The conditions are that you need Gaussian
0:10:58	perturbations,
0:10:59	that there is independence between all the variables, that the parameter vector
0:11:06	of A
0:11:07	has a kind of non-informative, uniform distribution,
0:11:10	and that the unknown vector x is Laplacian distributed. So
0:11:15	under those
0:11:16	circumstances you can show that
0:11:19	the solution to this problem gives you maximum a posteriori optimality.
0:11:24	So
0:11:25	this problem has some statistical meaning.
0:11:33	To solve this, we think for instance about an alternating descent method, where you basically
0:11:38	solve iteratively between epsilon_A, so the perturbation on the system matrix,
0:11:45	and x. You could for instance fix epsilon_A,
0:11:48	so I could fix it here
0:11:50	and fix it here, and in that case it becomes like a
0:11:52	classical,
0:11:55	but a bit altered, Lasso-like problem.
0:11:58	So basically the solution can be found by any algorithm that
0:12:02	has been proposed to solve
0:12:07	sparse reconstruction problems using the least squares cost function.
0:12:11	Then once x is given,
0:12:13	you can use that solution to
0:12:17	find the result for the perturbation on the system matrix.
0:12:21	If x is given, everything becomes an unconstrained quadratic program,
0:12:26	so then for epsilon_A you can find the solution
0:12:28	in closed form.
0:12:30	If you start with a perturbation that is equal to zero, you basically start
0:12:34	with the solution that is given by the classical Lasso problem,
0:12:38	and you can show that you always
0:12:40	improve your cost function, and that you converge to at least
0:12:44	a stationary point.
0:12:46	So
0:12:48	within this cost function, you can show this way that you will always improve upon the classical
0:12:53	solution, the classical Lasso solution.
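The alternating scheme can be sketched as below for the simplest special case only: no weighting and no structure, with the data perturbation eliminated through the constraint. Everything here is an assumption made for illustration. The function names are hypothetical, proximal gradient (ISTA) stands in for "any Lasso solver", and the closed-form update E = r x^T / (1 + x^T x) with r = y - A x follows from setting the gradient of the quadratic cost in E to zero.

```python
import numpy as np

def soft(z, t):
    """Elementwise soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(B, y, lam, x0, n_iter=500):
    """Minimise ||y - B x||_2^2 + lam * ||x||_1 by proximal gradient steps."""
    L = 2.0 * np.linalg.norm(B, 2) ** 2        # Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(n_iter):
        grad = -2.0 * B.T @ (y - B @ x)
        x = soft(x - grad / L, lam / L)
    return x

def cost(A, y, E, x, lam):
    """Unweighted sparse TLS cost with the data perturbation eliminated."""
    return np.sum(E**2) + np.sum((y - (A + E) @ x) ** 2) + lam * np.sum(np.abs(x))

def sparse_tls(A, y, lam, n_outer=10):
    E = np.zeros_like(A)                       # start from zero perturbation (plain Lasso)
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(n_outer):
        x = lasso_ista(A + E, y, lam, x)       # x-step: Lasso with the perturbed matrix
        r = y - A @ x
        E = np.outer(r, x) / (1.0 + x @ x)     # E-step: closed-form quadratic minimiser
        history.append(cost(A, y, E, x, lam))
    return x, E, history
```

Each step can only lower the cost: the x-step is warm-started from the previous x and uses a step size for which proximal gradient never increases the objective, and the E-step is an exact minimisation. The recorded `history` is therefore non-increasing, and it never exceeds the cost of the plain Lasso solution with E = 0.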
0:12:57	Of course, to solve this Lasso problem you can use your favourite solver.
0:13:01	What you could also do is use coordinate descent
0:13:05	to solve
0:13:06	the Lasso,
0:13:08	which basically means that
0:13:10	you fix all the entries except for one in your x vector and you solve for that one
0:13:15	separately. Then it becomes like a scalar Lasso,
0:13:18	which gives you a closed-form solution
0:13:20	by means of soft thresholding.
0:13:22	And you can
0:13:23	do those iterations all together, so you can basically alternate between
0:13:27	epsilon_A and then every entry of x,
0:13:30	and then go back to epsilon_A, and then solve for every entry of x separately,
0:13:35	and so on. That gives you a global
0:13:37	coordinate descent method that can also be shown to converge to at least a stationary point.
0:13:47	Of course,
0:13:48	this is not necessarily the global optimum, but at least you know that you improve upon the initial
0:13:53	solution, which is the Lasso solution.
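A minimal sketch of the coordinate descent step for the Lasso part, assuming the cost ||y - B x||_2^2 + lam * ||x||_1 with B standing for the (possibly perturbed) system matrix. The name `lasso_cd` is hypothetical. With all other entries fixed, the scalar problem for entry j is solved exactly by soft thresholding the correlation between column j and the partial residual.

```python
import numpy as np

def soft(z, t):
    """Soft thresholding: shrink z towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(B, y, lam, n_sweeps=100):
    """Minimise ||y - B x||_2^2 + lam * ||x||_1 one coordinate at a time."""
    n = B.shape[1]
    x = np.zeros(n)
    col_sq = np.sum(B**2, axis=0)              # squared column norms
    r = y.astype(float).copy()                 # residual y - B x for x = 0
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue                       # a zero column cannot reduce the cost
            r += B[:, j] * x[j]                # remove entry j from the current fit
            c = B[:, j] @ r                    # correlation with the partial residual
            x[j] = soft(c, lam / 2.0) / col_sq[j]   # scalar Lasso in closed form
            r -= B[:, j] * x[j]                # put the updated entry back
    return x
```

The threshold is lam / 2 because the quadratic term here carries no factor one half; with a different scaling of the cost the threshold changes accordingly.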
0:13:57	Here is a numerical comparison. We assume
0:14:01	that we have a twenty-by-forty A matrix, so the compression here is
0:14:05	fifty percent, you could say.
0:14:08	There is some Toeplitz structure in the A matrix. We also assume different variances on
0:14:12	A and y.
0:14:15	Note that the perturbation on A
0:14:19	also
0:14:20	has a Toeplitz structure.
0:14:22	The signal vector x here is generated with ten nonzero entries.
0:14:26	What is shown here is the L0 error versus the parameter lambda,
0:14:31	and the L1 error versus the parameter lambda.
0:14:34	This lambda basically gives you a trade-off between
0:14:37	solving the total least squares problem and
0:14:41	enforcing sparsity: the bigger the lambda, the more
0:14:44	weight you give to the sparse solution.
0:14:49	You see the best solution here for the L0 error. This
0:14:54	is basically related to support recovery: it is the percentage of
0:14:58	entries where the support of the true solution and the estimated solution
0:15:03	are not the same.
0:15:04	So this tells you something about support recovery.
0:15:07	There you see that if you take everything into account, so the blue curve here,
0:15:11	the weighted and structured sparse total least squares, you basically get the best
0:15:15	sparsity recovery.
0:15:16	If you just take the weights,
0:15:19	the correlations, into account,
0:15:21	or the structure, so these are the red curves and the black curves,
0:15:24	then you
0:15:26	get a little bit
0:15:28	bigger
0:15:29	L0 errors.
0:15:31	And if you
0:15:33	do not take any weights or structure into account, you are a little bit
0:15:37	worse, and Lasso gives you basically the worst L0 error.
0:15:42	For the L1 error, so this is basically the L1 norm of the error, the
0:15:48	performances are closer to each other, but of course support recovery is
0:15:53	the most important in many of these applications.
0:15:58	Like I told you before, this approach is very useful in cases where you do sensing
0:16:03	and you use some kind of grid-based approach.
0:16:06	That can for instance be used in direction-of-arrival estimation,
0:16:11	where you basically
0:16:14	divide the whole angle space into different grid points, into an angle grid.
0:16:20	For instance, every two degrees you can pick a grid point.
0:16:24	In that case you can express your received vector y_t as a linear combination of
0:16:28	array response vectors.
0:16:30	Basically, this first one here, a(theta_1), tells you
0:16:34	how the signal of the target would be received if it
0:16:38	comes in at an angle of arrival of
0:16:41	theta_1.
0:16:42	So you get a linear combination of all these
0:16:45	array response vectors on the different grid points,
0:16:47	and this x contains the combining weights.
0:16:51	But of course the combining weights will be sparse, because only where you have a target will you have a
0:16:55	combining weight.
0:16:57	Of course, whenever you have
0:17:00	targets that are in between the grid points,
0:17:03	this equality will not be exactly true and there is some kind of
0:17:06	perturbation
0:17:07	on the
0:17:09	grid.
0:17:10	So you could say that the true
0:17:13	exponent
0:17:15	of your source
0:17:17	could then be modelled as
0:17:21	the exponent at your grid point
0:17:24	plus some linear correction,
0:17:27	because, like I said before, we want to have the perturbations in a linear form.
0:17:32	We want
0:17:33	to have an affine expression for the perturbations, so
0:17:36	that means that in this case we need some kind of
0:17:38	approximation, because there is
0:17:40	a lot of structure in
0:17:42	these perturbations, but it is not linear. So we approximate it here
0:17:47	by an affine function of the parameter vector.
0:17:51	I am not going to go into the details here.
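Since the details are skipped in the talk, the following is only one possible way to picture the affine approximation: a first-order Taylor expansion of the array response around a grid point. The uniform linear array with half-wavelength spacing, the function names, and the choice to expand in the angle itself are all assumptions made for this sketch and may differ from the construction used by the authors.

```python
import numpy as np

def steering(theta_deg, m):
    """Response of an m-element uniform linear array, half-wavelength spacing (assumed)."""
    k = np.arange(m)
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))

def steering_derivative(theta_deg, m):
    """Derivative of the response with respect to the angle, per degree."""
    k = np.arange(m)
    th = np.deg2rad(theta_deg)
    return 1j * np.pi * k * np.cos(th) * steering(theta_deg, m) * (np.pi / 180.0)

def affine_steering(grid_deg, delta_deg, m):
    """Affine (first-order) model of the response at grid_deg + delta_deg."""
    return steering(grid_deg, m) + delta_deg * steering_derivative(grid_deg, m)

m = 8                                          # number of antennas (illustrative)
a_true = steering(1.0, m)                      # source halfway between grid points 0 and 2
err_grid = np.linalg.norm(a_true - steering(0.0, m))
err_affine = np.linalg.norm(a_true - affine_steering(0.0, 1.0, m))
```

In this setup `err_affine` is smaller than `err_grid`: letting the grid column move along its derivative direction explains an off-grid source better than the fixed grid column does, while the correction stays linear in the unknown offset.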
0:17:55	That allows you then
0:17:58	to get a better estimate, because next to solving for x you also allow these grid points basically to
0:18:03	shift to the true solutions.
0:18:04	So if you have
0:18:06	a source that is somewhere in between two grid points,
0:18:09	because you allow for perturbations on this A matrix, a grid point might be
0:18:13	shifting to the true solution.
0:18:16	So you get some kind of super-resolution effect for free with this approach.
0:18:20	Other approaches usually start from a rough grid and then they refine the grid
0:18:24	in those locations where you have a target.
0:18:27	Here you get it basically in one shot.
0:18:31	Here is an example where you have eight antennas and ninety
0:18:35	grid points,
0:18:36	so every two degrees you have a grid point, and you have a source at one degree and one at minus
0:18:40	nine degrees,
0:18:41	so they are exactly in between two grid points.
0:18:45	Then you see that the classical Lasso basically gives you four nonzeros,
0:18:49	basically the grid points around the
0:18:52	sources.
0:18:54	You could say, okay, we can interpolate those and then we get the solution, but
0:18:58	you can only do that if you know that you have only two sources. Of course, if you do not know
0:19:02	the number of sources, you could think that
0:19:04	there are four sources now in this problem,
0:19:07	while the weighted and structured
0:19:09	sparse total least squares
0:19:11	gives you basically two peaks in the red locations,
0:19:16	which correspond to these black arrows where the true sources are located. So the grid basically shifts to the
0:19:21	right position.
0:19:23	You see here also another peak, but
0:19:25	this is in dB, so this is about twenty dB below the
0:19:30	first one.
0:19:33	So you basically
0:19:34	can also do some kind of number-of-sources recovery using this approach.
0:19:40	So I think
0:19:41	that brings me to the conclusion. We have proposed a weighted and structured sparse
0:19:47	total least squares problem,
0:19:48	which is motivated,
0:19:50	first of all, by the fact that you have non-idealities in the
0:19:53	compression matrix, but it is also motivated by a lot of these sensing applications.
0:19:58	We can also account for correlations and structure in these perturbations.
0:20:03	We looked at the MAP optimality of this problem,
0:20:09	and we looked at reduced-complexity alternating descent and coordinate descent solutions.
0:20:15	Ongoing and future research consists of recursive and robust implementations of this method.
0:20:21	We also try to see whether the SVD can be used in some of these problems,
0:20:25	since basically solving an SVD also boils down to an iterative method,
0:20:30	and you wonder whether in those iterations you could also include
0:20:33	sparsity and still use
0:20:37	SVD-based types of methods to solve also sparse total least squares.
0:20:43	That concludes my presentation.
0:20:54	Any questions?
0:21:00	I
0:21:01	was thinking
0:21:04	more about the complexity of your solution.
0:21:09	I mean,
0:21:09	what do you do for a large problem, say
0:21:14	a microphone array or something like that?
0:21:16	Well, I can say that
0:21:17	the complexity is basically determined by how you solve the initial
0:21:22	sparse reconstruction problem,
0:21:24	and you do that
0:21:26	iteratively, so you do that maybe
0:21:28	five times.
0:21:29	I do not know exactly how many iterations we used here, but
0:21:33	in general you can stop whenever you want, right? After one iteration you
0:21:37	know that
0:21:38	you already improve upon
0:21:40	the classical sparse reconstruction method. So
0:21:43	for instance
0:21:46	you can say you have twice the complexity,
0:21:48	because solving then for the perturbation is a closed-form expression, so that is not
0:21:52	the biggest complexity.
0:21:54	So it depends basically on the solver that you use for this sparse reconstruction.
0:22:02	Just
0:22:04	one comment on the question
0:22:06	of how complex this is:
0:22:08	it can sometimes be less complex than
0:22:11	ordinary TLS,
0:22:14	because you have to remember that TLS
0:22:16	entails an SVD,
0:22:18	right?
0:22:19	a can be more
0:22:20	a less complex and how are you
0:22:23	this is way
0:22:25	What is worth
0:22:26	mentioning is that
0:22:29	when people have to regularize
0:22:31	TLS,
0:22:34	then a this sense of this use of the world
0:22:37	so again you need to solve
0:22:39	some sort of iterative
0:22:41	optimization.
0:22:45	Right,
0:22:46	yeah, the regularized TLS with a quadratic.
0:22:50	Yeah.
0:22:53	Okay, can you
0:22:54	change the framework
0:22:58	to use, instead of
0:23:00	L1,
0:23:01	a different norm?
0:23:04	Yeah, that is possible, because
0:23:08	in these iterations, I mean at the first start of
0:23:10	all the iterations, you basically set your perturbation to zero, and then,
0:23:14	instead of a Lasso problem, you have another type of sparse regularized
0:23:19	problem that you could solve in that step,
0:23:22	whatever you fixed.
0:23:24	And the solution in the second step is always a closed-form expression for the perturbations, so
0:23:29	you could change this
0:23:30	L1.
0:23:32	thank you
0:23:33	yeah
0:23:34	and
0:23:35	yeah
0:23:36	Coming back to high-resolution localization and estimation techniques,
0:23:42	what is the
0:23:43	SNR threshold
0:23:45	when you use this kind of compressive-sensing-inspired technique?
0:23:49	Yeah, I do not know. We should test it on
0:23:53	more applications. This is more, let's say, the theoretical framework,
0:23:57	and now we have to check this on many different applications.
0:24:01	The thing is, you can use here a kind of rough grid as an initial point, and the
0:24:05	question is how rough can you make your grid, how much can you correct for.
0:24:10	But that is something we did not analytically
0:24:14	investigate.
0:24:16	Yeah, we have simulations but not a theoretical analysis.
0:24:22	yeah
0:24:23	uh
0:24:24	well
0:24:27	uh
0:24:35	Well, you see, we also have a spectrum sensing application, but
0:24:40	it is always compared to the standard sparse reconstruction methods.
0:24:46	So
0:24:48	is that the comparison you would like?
0:24:50	I mean
0:24:53	oh okay yeah and and actually that's
0:24:55	the
0:24:56	yeah
0:24:59	So this one I did not show, but
0:25:01	this is when you do
0:25:05	different Monte Carlo simulations. You see what happens if you
0:25:09	compare Lasso, which is the blue curve;
0:25:12	this is what I showed before, where you have four peaks.
0:25:14	You could say, okay, you interpolate, and then you get this blue dashed line.
0:25:19	What you see is that even
0:25:20	the full weighted sparse total least squares still does better than the interpolation,
0:25:25	and if we interpolate we do not gain anything, because we already have the good solution.
0:25:30	So this is for the direction of arrival.
0:25:33	So even with interpolation, although you need to know the number of sources for this interpolation,
0:25:37	even then we have a better performance.
0:25:44	Okay, thank you.

WEIGHTED AND STRUCTURED SPARSE TOTAL LEAST-SQUARES FOR PERTURBED COMPRESSIVE SAMPLING

Estimation Theory and Methods

Presented by: Geert Leus, Author(s): Hao Zhu, Georgios B. Giannakis, University of Minnesota, United States; Geert Leus, Delft University of Technology, Netherlands