I was just thinking about my first conference presentation. I spent the whole session hammering all the speakers with questions, and of course after my presentation they all hammered me back. So my plan is to speak the whole time, go five minutes over, and leave no time for questions. Anyway, I guess we're going to get started here. Being a professor, I can basically fill whatever time is required; I have about fifty backup slides, so I can do this in an hour, in twenty minutes, or in five.

This is joint work with my PhD student, who is at National Instruments now. It's work he did for his PhD, actually about a year ago; somehow, once students graduate, it becomes a lot harder to get them to submit the work on time. Anyway, it's nice to be here presenting it. The title is "Manifold Predictive Coding for Limited Feedback Multiuser MIMO Systems," and I'm especially glad about the previous presentation, because it gives me the introduction I was expecting to get in this session: I don't have to tell you what multiuser MIMO is.

The system I'm considering in this talk is a multiuser MIMO communication system with limited feedback. I'm going to make the main assumption that, as we just found in the previous talk, isn't entirely innocent: I'm going to assume the channel quality information is perfect, and focus on quantizing the direction information. So the setup is the usual one: K users, zero-forcing precoding at the transmitter, and a single receive antenna at each user. The objective at each user is to quantize its channel direction vector and send it back over the limited feedback link; the base station puts all of those together to design the transmit beamformers, and everything works just like in the previous talk.
So that's the setup we're considering, and the basic challenge is as follows. If you've been following the field for a while, you know we started off doing this with single-user beamforming, looking at codebooks on the order of three to six bits for up to four antennas; in 3GPP it ended up taking four bits. So you don't need that much feedback there, which is great. The problem is that when you leverage this work for multiuser MIMO, you find that, generally speaking, to achieve a constant gap from the sum rate you would get with perfect channel state information, you need a codebook that scales with the number of users and also with the SNR. The implication is that you tend to need high resolution; otherwise you become interference limited. Exactly as in that nice plot in the previous talk, at some point at high SNR you no longer want to do multiuser MIMO at all, and the reason is that you don't have enough resolution: you become quantization-interference limited and performance suffers. We're also seeing this right now in base station coordination, where people are applying multiuser MIMO in distributed settings, and you need just as much feedback there, if not more. So the main point is that as we move to more sophisticated transmission techniques, we really need bigger codebooks, and that becomes a problem because it means more feedback.

One of the other issues is that in addition to quantization error, we also have delay and mobility. If you have a large codebook and a mobile channel, you have to send a lot of bits back very quickly, which means the net feedback rate on that link is very high. Otherwise, for a fast channel, the transmitter won't know much about the channel, and that's a problem.
So we're going to propose a predictive coding framework that allows us to take advantage of correlation in the channel, so that when the channel is not moving very quickly, we get effectively larger codebook sizes than we otherwise would. I'll explain the key idea and mention some prior work.

First, the predictive coding concept: for anyone working in source coding, this would probably be the first thing you would try. The idea is as follows. If you have a source evolving over time, you find a good predictor; this might be an MMSE predictor, for example, or whatever your favorite is. You use that predictor to predict the next value of the source, and then instead of quantizing the new value of the source directly, you quantize the error between the predicted value and the value that actually arrives. This is used with great success in speech and image coding; it's used all over the place. The main things you need are a predictor, the ability to quantize the error, the ability to compute the difference between the predicted and true values, and the ability to track the decoded sequence. The decoder essentially implements the same predictor: it takes the quantized error updates, keeps running the prediction, and in that way keeps updating its state over time. For example, with a traditional one-step predictor you might have two weights: you take a linear combination of the previous two symbols, that becomes your predicted value, and you can optimize the weights using an MMSE criterion. If you want to design the codebook, the typical way to do it is with the Lloyd algorithm.
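To make that classical loop concrete, here is a minimal sketch of scalar predictive (DPCM-style) quantization of an AR(1) source. The parameters and the uniform codebooks are my own illustrative choices, not anything from the paper; the point is just that with the same bit budget, quantizing the small prediction error beats quantizing the raw source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated scalar source: AR(1) with coefficient rho (illustrative values).
rho, T = 0.95, 5000
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()

def nearest(v, levels):
    """Map v to the nearest codeword in `levels` (a 1-D array)."""
    return levels[np.argmin(np.abs(levels - v))]

R = 3  # 3 bits per sample for both schemes
direct_levels = np.linspace(-3.0, 3.0, 2**R)       # spans +/- 3 sigma of the source
resid_sigma = np.sqrt(1 - rho**2)                  # the prediction error is much smaller
resid_levels = np.linspace(-3 * resid_sigma, 3 * resid_sigma, 2**R)

xq = np.zeros(T)  # decoder reconstruction; the encoder tracks the same state
for t in range(1, T):
    pred = rho * xq[t - 1]                             # predict from the *reconstructed* past
    xq[t] = pred + nearest(x[t] - pred, resid_levels)  # quantize only the error

mse_dpcm = np.mean((x[1:] - xq[1:]) ** 2)
mse_direct = np.mean((x[1:] - np.array([nearest(v, direct_levels) for v in x[1:]])) ** 2)
```

The gain comes entirely from the residual having a much smaller dynamic range than the source, so the same number of levels can be packed more densely.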
There are also structured techniques, tree-structured codebooks and so on. This is well known and widely deployed; people have been doing it for at least twenty or thirty years.

Now, the problem is as follows. In the limited feedback beamforming case, the information we want to quantize is a normalized vector that is also phase invariant; it's actually represented by a point on the Grassmann manifold. In this case the space of beamforming vectors can be considered as points on G(N, 1), the set of one-dimensional subspaces of an N-dimensional space. You can draw these like points on a sphere; that's more an illustration than an exact picture, but it gives you the idea. So effectively, the problem we have in limited feedback beamforming is that we want to do predictive quantization of a source that lives on this manifold.

That's the basic idea; now let's figure out what the problem is, and why we're only talking about this now, after we've been doing limited feedback for eight or nine years. The problem is this: first of all, we have a subspace source going into the coder, and we need to generate an error. The difficulty with working on a manifold, especially the Grassmann manifold, is that simple operations like adding two points are not necessarily well defined. For example, looking at those two lines on the slide, what does it mean to add those two lines and get another line? Or if we add two points on a sphere, we get a point that is not on the sphere, which is what happens if you do the addition in Euclidean space. You need to do something special just to add these things. The net of that is that even the concept of an error is actually not obvious to define. You also need to predict things on the subspace, and if you really want to take the manifold structure into account, you shouldn't be predicting in Euclidean space; you should be predicting on the manifold itself.
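As a trivial illustration of the point about addition (my own toy example, real-valued for simplicity): the Euclidean sum of two unit vectors is generally not a unit vector, and even renormalizing is ambiguous once you remember that a subspace and its negative are the same point.

```python
import numpy as np

# Two "directions": points on the unit sphere, i.e. lines through the origin.
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

s = x1 + x2               # naive Euclidean addition leaves the manifold:
print(np.linalg.norm(s))  # norm is sqrt(2), not 1

# Renormalizing doesn't rescue you either: x2 and -x2 are the *same* line,
# but the two "sums" below renormalize to completely different points.
a = (x1 + x2) / np.linalg.norm(x1 + x2)
b = (x1 - x2) / np.linalg.norm(x1 - x2)
```

Here `a` and `b` end up orthogonal to each other, i.e. maximally far apart on the manifold, even though the inputs were identical as subspaces.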
So we need a manifold predictor. But then, what is an MMSE manifold predictor? What does multiplying by weights even mean there? These things are not well defined, and that's why it has been very hard to come up with even extremely simple examples of predictive coding in this setting: the basic operations are hard. What that means is that in the predictive coder, generating the error is not straightforward, quantizing the error is not straightforward, predicting the sequence is not straightforward, and updating the predictor is not straightforward. For each of these blocks you need something new.

There has been a lot of work on this, certainly in my group and others; everybody wants to exploit the temporal correlation of the channel. It's the right thing to do: we know multiuser MIMO doesn't work well when the channel varies quickly, and this temporal correlation is just sitting there begging to be exploited. Among the prior attempts, the earliest work I know of is from around 2002, with some nice papers on gradient-based updates. There are dynamic codebook approaches, where you adaptively select a subset of a codebook depending on how fast the channel is moving: if it's very slow you end up with a very directional codebook matched to the correlation, and if it's fast you end up with more of a Grassmannian codebook. There are progressive or successive refinement techniques, where you zoom in on the channel estimate depending on how fast or slow it's changing; that's more like a tree-structured quantization. And there has been some work using Euclidean prediction: you can use a Euclidean predictor and then quantize, and depending on how you do it, sometimes that actually works very well, but sometimes you lose the phase invariance, the very structure we were trying to exploit by going to the Grassmann manifold in the first place, and that can be a problem.
Probably the best alternative I've seen is a differential approach, related to something that has actually been proposed in the 3GPP standard, where essentially you look at a difference between the last vector and the next vector. That approach works reasonably well, but it doesn't really use anything stochastic. There are also some rotation and compression techniques. So the main message is that there is definitely a lot of work on this topic, but there isn't really a comprehensive framework for solving the problem I just described: predictive quantization on the Grassmann manifold.

So what I'm going to do is tell you about our approach to this problem. Fortunately, we do have a general solution for our setting; I should say that the fully general problem, with arbitrary manifolds and predictors built from many past points, is still open, so I'll tell you about the solution we're proposing for the case where the predictor uses two points. I'm going to build up the mathematical concepts we need; I have the equations here, but I'm really going to focus on the pictures, because the point is to get the intuition of what's happening.

Operating on these kinds of manifolds, you can define several concepts. One of them is the notion of a tangent vector. If I have two points x1 and x2 that live on the manifold, I can define a tangent vector e12, which is a vector pointing in the direction of x2 from x1 and which is orthogonal to x1. Remember x1 is a point on the Grassmann manifold that we can represent as a unit vector; e12 is orthogonal to x1 but is not itself a unit vector: it has a length, and that length is determined by the arc distance between x1 and x2.
There are some nice papers with very general descriptions of the notions of tangents and geodesics, but because everything there is so general, it's actually very hard to pick out what you need. One of the contributions here is that we simplified everything down for the beamforming case; it turns out the equations simplify dramatically, and there's actually a lot of intuition in them. Basically, you can decompose the tangent vector into two pieces: one is the length, which is the arc length between x1 and x2, and the other is the unit tangent direction. Both are functions of the inner product between x1 and x2, and the length is a function of the chordal distance. We're going to use the tangent vector to give us a notion of quantization error.

The second concept we use is the geodesic. This is the curve between x1 and x2 that is the shortest path between them. You can write an equation for that curve as a function of x1 and x2, but it's more convenient to write it as a function of x1 and the tangent vector: the geodesic at parameter t is x1 times the cosine of the arc length scaled by t, plus the unit tangent direction times the sine of the same quantity. Setting t = 0 gives you x1, and t = 1 gives you x2. And notice the two terms are orthogonal: since the tangent vector is orthogonal to x1, you get this nice orthogonal decomposition. We're going to use the geodesic to get something that looks like an addition.

The third thing we need is prediction: we want to figure out where the sequence of points on the Grassmann manifold is going, and we'd like some function with parameters we can optimize over.
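Specialized to the beamforming case, the tangent and geodesic just described reduce to a few lines. Here is a sketch as I understand it (complex unit vectors, with the phase ambiguity handled by aligning x2 to x1; the function names and tolerances are mine):

```python
import numpy as np

def tangent(x1, x2):
    """Tangent vector at x1 pointing toward x2 on G(N, 1).

    x1, x2: unit-norm complex vectors. The result is orthogonal to x1,
    and its Euclidean norm equals the arc distance arccos(|x1^H x2|).
    """
    c = np.vdot(x1, x2)                          # x1^H x2
    theta = np.arccos(np.clip(abs(c), 0.0, 1.0))
    if theta < 1e-12:
        return np.zeros_like(x1)
    p = np.conj(c) / abs(c)                      # phase that aligns x2 with x1
    return (x2 * p - x1 * np.cos(theta)) * (theta / np.sin(theta))

def geodesic(x1, e, t):
    """Point at parameter t on the geodesic leaving x1 with tangent e.

    t = 0 returns x1; t = 1 returns the endpoint the tangent points at
    (up to an irrelevant phase). This plays the role of "addition".
    """
    theta = np.linalg.norm(e)
    if theta < 1e-12:
        return x1.copy()
    return x1 * np.cos(theta * t) + (e / theta) * np.sin(theta * t)
```

A quick check on random directions confirms the three properties from the slide: the tangent is orthogonal to x1, its length is the arc distance, and walking the geodesic to t = 1 lands on x2 as a subspace (equal up to phase).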
For this work we did something very simple: we use the concept of parallel transport. Parallel transport is essentially a way of taking the tangent vector at x1 and mapping it over to x2. Remember that a tangent vector has to be orthogonal to its base point, so you have to do this mapping in such a way that orthogonality to the new point is maintained. You can do that, and it turns out the result looks something like the negative of the previous tangent vector. We're going to use the parallel transport concept to define a predicted value.

Now let me explain how we use these mathematical operations to do predictive quantization. The first step is generating the error. Suppose we have a predicted sequence, denoted x-tilde. The predicted value is computed purely from the state, so it is known at both the transmitter and the receiver. When a new observation arrives, we compute the tangent vector that takes us from the predicted value to the new observation, and that tangent vector is our error. The second step is to quantize that error tangent vector. The first reaction, at least mine, on seeing this problem is: great, it's another Grassmannian quantization problem. But it actually isn't, because the tangent vector is not unit norm and it has structure: it is orthogonal to the predicted point, but otherwise it lives in the whole space orthogonal to that point. So it's a slightly different quantization problem.
To quantize the tangent vector, in this paper we decomposed it into two pieces: a Grassmannian-like quantization of the direction, and a magnitude, which is kind of like a quality and a direction. We propose a codebook for that, and that gives us the quantized tangent. The third step is the state update: we add the quantized error onto the predicted value, and we use the geodesic for that. We take the predicted point x-tilde, apply the quantized, parallel-transported error tangent, take a full step along the geodesic, and that gives us the updated state.

For the prediction itself, we take the tangent between the two previous state vectors, parallel transport it, and essentially keep going in the same direction: we apply that tangent at the current state, move one full step along the geodesic, and that's our predicted value. I should point out that in this paper we simplified everything down and always take a full step. As you can probably already see, a full step may not be the right thing to do, and in some of our other work we have actually optimized the step size, which you can adapt over time; there are some nice results related to that which appeared at ITA. But the full-step version is our proposed predictor here.

So that's basically the idea: we take the geometric tools for working on the Grassmann manifold and use them to get notions of prediction, error, quantized error, and state update, and then we use all of that to quantize a correlated sequence over time. For the simulations in this paper we used an autoregressive channel model, which is fairly standard; you could do better with a Clarke-Gans type model, which is band-limited and therefore more predictable, so our setup is actually closer to a worst case.
We consider the sum rate, so we apply this to multiuser MIMO, and we compare against random vector quantization, which is essentially a good way of getting a fixed codebook at different sizes.

The first result just shows that using this algorithm does give you effectively higher resolution. I forget the exact simulation parameters, but this is basically a nine-bit codebook; the channel sequence is varying over time and we keep quantizing it. The chordal distance fluctuates because sometimes your quantized value is close to the true channel and sometimes it's farther away. The curve down here is our approach, also with nine bits, and we're getting about a factor of five improvement. Because we're tracking while things keep changing, we don't converge to zero error, but this shows the approach really does decrease the error in terms of average chordal distance.

Now let's look at the sum rate comparison. The blue curve is perfect CSI with four users and four antennas; we don't have any more users than that, and we didn't do user selection: we just picked four users at random, all with the same average SNR, as if they're sitting on the same circle around the base station. So blue is what we're trying to achieve. The mustard-colored curve is a Grassmannian codebook. One kind of odd thing is that there aren't really good Grassmannian codebook constructions at large codebook sizes, but with random vector quantization you get more or less a Grassmannian codebook: essentially a very good codebook whose geometry works. So that really is a good fixed codebook; you probably won't do much better than that.
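For reference, the RVQ baseline is easy to sketch: draw 2^B random unit vectors as the codebook and pick the codeword closest to the channel direction in the phase-invariant sense. The parameters below are mine, chosen to mirror the nine-bit, four-antenna setting:

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 4, 9
codebook = rng.standard_normal((2**B, N)) + 1j * rng.standard_normal((2**B, N))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def rvq(x, codebook):
    """Memoryless quantization: pick the codeword maximizing |c^H x| (phase invariant)."""
    return codebook[np.argmax(np.abs(codebook.conj() @ x))]

# Average squared chordal distance sin^2(theta) over random channel directions.
dist2 = []
for _ in range(200):
    h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    x = h / np.linalg.norm(h)
    c = rvq(x, codebook)
    dist2.append(1.0 - abs(np.vdot(c, x)) ** 2)
avg_dist2 = float(np.mean(dist2))  # roughly of order 2**(-B/(N-1)) for RVQ
```

This is a memoryless benchmark: every channel realization is quantized from scratch, which is exactly what the predictive scheme is trying to beat by reusing the past.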
And here you see the standard result: with a fixed codebook at high SNR, you hit an error floor from the residual interference. The different curves correspond to different Doppler-times-symbol-period products: 0.01 is a very slow channel, and the largest value is a reasonably fast channel. What you can see is that with nine bits of feedback, in an effectively slow channel, our algorithm gets very good performance, tracking the ideal sum rate. With a somewhat faster channel we still get a little bit of improvement, but the performance gain is not as large. A nice property is that every user can essentially run the adaptive prediction algorithm on its own CSI. In this plot all the users have the same channel profile, but the algorithm is flexible enough that if a user's channel happens to be slow, the base station gets better information about that channel, and if it's fast, worse. So it actually tracks quite well, and this also includes a five-millisecond feedback delay, which is a standard 3GPP assumption. Even with a delay we're not explicitly accounting for, we can still track the sum rate under reasonable conditions.

So that's essentially what I've done in this talk: I proposed a manifold predictive coding framework, and at least from a source coding perspective, I think the picture is nice and the story is nice. There are a lot of limitations, though. First, we're only using the previous two points, and I'd like to use more than that; clearly, if I had figured out how to do this with three or five previous points, I would have shown you that. It's hard, because the tangent and transport concepts are inherently based on two points, so with three you have to do something else. We have some ideas, but it's proving to be very difficult.
My student, in his dissertation, has some extensions of this to Stiefel manifolds, so you can do that as well. We did some work with limited feedback MIMO-OFDM, where you have to do a whole sequence of Grassmannian quantizations, and the concept works there too, but the feedback requirements become really high, so I'm not sure this is exactly the right solution for higher dimensions yet, though I think it's a good idea. And I showed you the multiuser MIMO results. Since we developed this, another student who graduated has enhanced it for multicell MIMO, where we actually predict the channels from interfering base stations using a similar concept; she came up with, frankly, a much better predictor, and that result is at ICC. We have another result where we've come up with a better predictor for interference alignment, and that's going to appear as well. So I think the concept is good, and there's still a lot of work to be done here. Okay, and it looks like I took about the right amount of time after all.

[Audience question: does this method depend on an assumed model of the temporal correlation?]

It's not really very dependent on the channel model. In the journal version, which I think we just submitted and should be on arXiv today or tomorrow, we have simulations with other temporal models, like multi-tap AR models, and it still works well. We're not really exploiting detailed knowledge of the channel correlation; you could probably come up with a correlated channel model that makes the process even more predictable, but we're not using the correlation function anywhere in the algorithm. We'd like to exploit that, but right now we're not.
Right, the statistics really only come in when we quantize the magnitude of the tangent vector. When the channel is varying slowly, the quantized magnitude is small and we take a small step; when it's varying quickly, we take a larger step. That's the only place the correlation enters. In the other paper I mentioned, where we adapt the step size, the statistics come in there too, but not in the way you're asking about.

[Audience question: do you have to stick to unit vectors, or could you quantize an unnormalized estimate, for instance combining the direction with the quality information?]

So the question is whether you need to restrict to unit-length vectors. Remember that this vector lives on the Grassmann manifold: it's unit length and it's also phase invariant, and those two properties take degrees of freedom out of the vector. A complex vector of dimension N has 2N real parameters; with the norm constraint you're down to 2N minus 1, and with phase invariance you're down to 2N minus 2. The whole reason we work on the Grassmann manifold is so that we quantize less information, and you can show, from the point of view of capacity or probability of error, that all you need is this normalized direction information. Now, if you also want to quantize the magnitude, with the CQI, then you're actually back up to 2N minus 1, and if you want to combine those together, you could; you'd end up with something different, but you'd still have the phase invariance, so it wouldn't be the same as quantizing a raw Gaussian random vector: it would live on a different manifold.
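The point above, that the tangent magnitude (and hence the quantized step size) automatically shrinks on slow channels, is easy to check numerically. A quick sketch assuming AR(1) channels at two speeds; all parameters here are mine:

```python
import numpy as np

def avg_step(rho, T=2000, N=4, seed=5):
    """Average arc distance between consecutive channel directions when
    the channel follows h[t] = rho * h[t-1] + sqrt(1 - rho^2) * w[t]."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    total = 0.0
    for _ in range(T):
        w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        h_new = rho * h + np.sqrt(1 - rho**2) * w
        x = h / np.linalg.norm(h)
        x_new = h_new / np.linalg.norm(h_new)
        total += np.arccos(np.clip(abs(np.vdot(x, x_new)), 0.0, 1.0))
        h = h_new
    return total / T

slow, fast = avg_step(0.999), avg_step(0.9)
# The slow channel produces much smaller direction steps, so the encoder
# spends its bits on small, finely resolved tangent magnitudes.
```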
But that doesn't really change the framework in this context: you still want a sensible notion of adding an update at each step, and if the norm carries no useful information, you don't want to spend bits on it.

Okay, let me give you another answer to that question. The Grassmann manifold is a Riemannian manifold, so it's locally Euclidean. In fact, you could use a kind of standard Euclidean predictor, add things up to get a point that's off the manifold, and then, whenever you need a valid point, project back onto the manifold. And indeed, that works if you're operating in a small enough local region; we found that the analysis in the journal version of this paper effectively makes a small-angle assumption of that kind. The problem is that with more variation, that add-then-project operation drifts further and further from the geodesic kind of addition. So I think there actually is a case for doing that; it just depends on the scenario. The approach we've taken in the ICC paper actually has a better predictor that doesn't predict on the Grassmann manifold, although we still do the updates on the Grassmann manifold. So your point is well taken.

Okay, so, coffee break.