Hi everybody, my name is [unclear], and today I am going to talk about distributed optimization, as in the previous talks. I should say two things before beginning. First, of course, I should acknowledge my co-authors [names unclear in the recording]. I should also say that some of the results I am going to present were derived very recently and are not in the paper.

For starters, a few words about stochastic approximation. Imagine I have a function F and I want to minimize it over a finite-dimensional vector space. You all know, I am sure, the deterministic gradient algorithm, which is an iterative algorithm and consists of following the gradient lines; since I want to minimize, I take the opposite direction of the gradient, with gamma_n being a positive step size. Now, if I only know the function F up to a random perturbation, which has to be unbiased, I can do the same thing, and the corresponding algorithm is the celebrated stochastic gradient algorithm. Under classical assumptions, and for well-behaved functions, one can show that the sequence of estimates converges to the set of roots of the gradient of F.
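Just to fix ideas before moving to the distributed setting, here is a minimal sketch of the two iterations just described. The quadratic objective, the noise level, and the step-size choice gamma_n = 1/n are illustrative assumptions of mine, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective f(theta) = 0.5 * ||A theta - b||^2 (my own example, not from the talk).
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
grad = lambda theta: A.T @ (A @ theta - b)                              # exact gradient
noisy_grad = lambda theta: grad(theta) + rng.normal(scale=0.1, size=3)  # unbiased perturbation

theta_det = np.zeros(3)
theta_sto = np.zeros(3)
for n in range(1, 5001):
    gamma_n = 1.0 / n                              # positive, decreasing step size
    theta_det -= gamma_n * grad(theta_det)         # deterministic gradient algorithm
    theta_sto -= gamma_n * noisy_grad(theta_sto)   # stochastic gradient (Robbins-Monro) iteration

print(np.linalg.norm(grad(theta_det)), np.linalg.norm(grad(theta_sto)))
```

Both runs drive the gradient norm toward zero; the stochastic one only ever sees the perturbed gradient, which is exactly the situation considered in the rest of the talk.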
So we are going to address this very minimization problem, but in a distributed framework. The outline of my talk is the following: first we deal with unconstrained optimization and analyze the algorithm asymptotically; then we move on to constrained optimization and illustrate it on a power allocation example.

So, for unconstrained optimization, imagine we have a network of N agents. The agents can be of very different natures: sensors, mobile phones, robots, and so on. They are connected according to a random graph, whose topology is supposed to be time-varying, and they want to achieve a global mission; for that, they are able to talk within their neighborhood, that is, to cooperate. More precisely, in this talk we assume that each agent i has a utility function f_i, and the network wants to minimize the sum of the utility functions. Here we have to face two difficulties. The first difficulty is that agent i ignores the utility functions of the other nodes, yet we want to optimize something that depends on them, so obviously cooperation is going to be needed. The second remark is that agent i only knows its own utility function up to a random perturbation, so we also have to use the stochastic approximation framework.

The algorithm we are going to analyze (I will state it first and comment on it afterwards) is based on ideas going back to the work of Tsitsiklis; there have been many contributions since, with interesting work by Nedic among many, many others. It is essentially the very algorithm that we saw in the previous talk, and it has two steps. In the first step, each agent receives a perturbed version of its gradient and updates its estimate into a temporary estimate, theta-tilde_{n+1,i}; every agent does this. Then the agents talk: agent i, for instance, updates its final estimate as a weighted sum of the temporary estimates of its neighbors. The weight w_{n+1}(i,j) has to be zero if nodes i and j are not connected; this is the first constraint, coming from the network model. There are also constraints on the weights themselves: the weights must sum to one in each row, and in fact the matrix W_{n+1} must be doubly stochastic. The first requirement is not very costly, since each agent i can tune its own weights so that they sum to one, but the second one is more difficult to achieve, because the agents have to cooperate to make the columns sum to one as well. There is a nice solution using pairwise gossip, as introduced by Boyd and co-authors, which ensures that the matrices W_n are doubly stochastic. We also assume that the W_n are i.i.d., although this can be weakened, and the fourth assumption is a technical one involving a spectral radius; loosely speaking, it says that the network is connected, that it has one connected component, so it is a very mild assumption.
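To make the two steps concrete, here is a minimal sketch of the scheme under simplifying assumptions of mine: quadratic local utilities, a small network, and a doubly stochastic W_n obtained from a single random pairwise exchange per round, in the spirit of pairwise gossip. None of these choices, nor the function names, come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 2                                    # number of agents, dimension of the estimate

# Illustrative local utilities f_i(x) = 0.5 * ||x - c_i||^2; their sum is minimized at mean(c_i).
c = rng.normal(size=(N, d))
noisy_grad = lambda i, x: (x - c[i]) + rng.normal(scale=0.1, size=d)   # unbiased local gradient

def pairwise_gossip_matrix(i, j, n_agents):
    """Doubly stochastic matrix for one pairwise exchange: i and j average, the others keep their value."""
    W = np.eye(n_agents)
    W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
    return W

theta = rng.normal(size=(N, d))                # one row per agent (the stacked vector theta_n)
for n in range(1, 20001):
    gamma_n = 1.0 / n
    # Step 1: each agent takes a local stochastic gradient step, giving a temporary estimate.
    tmp = np.array([theta[i] - gamma_n * noisy_grad(i, theta[i]) for i in range(N)])
    # Step 2: agents combine their neighbors' temporary estimates with the doubly stochastic weights.
    i, j = rng.choice(N, size=2, replace=False)
    theta = pairwise_gossip_matrix(i, j, N) @ tmp

print("disagreement:", np.linalg.norm(theta - theta.mean(axis=0)))
print("average estimate vs. minimizer:", np.linalg.norm(theta.mean(axis=0) - c.mean(axis=0)))
```

Because each W is doubly stochastic, step 2 preserves the average of the agents' estimates, which is what lets the average behave like a centralized stochastic gradient iterate.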
Let us now move on to the asymptotic analysis. We define a vector theta_n by stacking the estimates of all the agents into a single vector, and we are interested in two things: first, the average of the estimates, and second, the disagreement, that is, how the local estimates differ from the average. The difficult technical point is to propose satisfying conditions for stability, which is the property that the vectors theta_n remain bounded with probability one. We provide such a condition using a Lyapunov function, and when we are interested in minimization we can take the function F itself as the Lyapunov function; I am not going to enter into the details.

The first result is that, with probability one, consensus is achieved, meaning that all the estimates eventually reach the same value, and that the average of the estimates converges to the set of roots of the gradient of F, which is our target. Moreover, if this set is made of isolated points, then we converge to one of these points. We also prove another result, which is a central limit theorem. I am not going to enter into the details; it says that, under suitable assumptions, the normalized vector of estimates converges in distribution to a Gaussian vector, and I should point out that the distribution of this Gaussian vector is degenerate, since all of its components, one per agent, are the same. So instead of going into the details, let me draw your attention to three consequences of this result. The first one is that the estimates converge at speed square root of gamma_n, and a consequence of this is that the distributed algorithm performs as well as the centralized one. Also, from the degenerate nature of the limiting distribution, we can say that the disagreements are negligible with respect to the difference between the estimates and the target value, because the limiting distribution puts its mass only on the consensus line. Another remark is that, since agreement is achieved at very high speed compared to the speed at which we move toward the target value, we do not need to communicate too often; actually, we can show that if the probability that a communication occurs goes to zero, as long as it goes to zero more slowly than one over the square root of n, we can still guarantee the convergence and keep the same performance, and hence save network resources.

Now let us go to constrained optimization. The problem is the same, except that now my estimates are required to remain in a compact and convex set D, and this set is known by all the agents. We can use pretty much the same algorithm, except that now, if one estimate goes out of the set D, we bring it back using a projection onto D; the second step is unchanged. Two remarks here. The first remark is that there are no more stability issues, which was the difficult technical point, so that is good news; the bad news is that the projection introduces other technical difficulties. The result is that consensus is still achieved with probability one and that the average of the estimates converges to the set of KKT points; again, if this set is made of isolated points, then the average converges to one of them. Perhaps I should skip the sketch of the proof and come back to it at the end if I have time.

As an illustration, we are going to address an interference channel problem. We have two source-destination pairs, and we assume that there is no channel state information at the transmitters; only the destinations observe the channel gains. The goal is for the destinations to tune the transmit powers in order to minimize the probability of error of the channels, and the exact function they want to minimize is a weighted sum of the probabilities of error of all the source-destination pairs. The constraints come from the fact that each transmitter cannot use more power than a given threshold. Each user, that is, each destination, has an estimate of the powers of all the transmitters, and we apply the algorithm we presented, just replacing the abstract gradient of F with the gradient of our specific function to minimize, and using the specific set D imposed by the power constraints. From what we have seen, we can guarantee that this algorithm converges with probability one to a local minimum, or more precisely to the set of KKT points. We can also use a more involved setting, with more than one channel from each source to its destination.
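Since the constrained algorithm only differs by the projection onto D, here is a minimal sketch of that projection for the power allocation example. It assumes (my assumption, not stated in the talk) that the feasible set for one transmitter with several channels is D = {p : p >= 0, sum(p) <= P_max}; the routine is the standard sort-based simplex projection, and the names in the usage comment are placeholders.

```python
import numpy as np

def project_power(p, p_max):
    """Euclidean projection onto D = {p : p >= 0, sum(p) <= p_max} (one transmitter's power budget)."""
    q = np.maximum(p, 0.0)
    if q.sum() <= p_max:
        return q                                    # clipping the negative entries already gives a feasible point
    # Otherwise the projection lies on the face sum(p) = p_max: standard sort-based simplex projection.
    u = np.sort(p)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (p_max - css) / np.arange(1, len(p) + 1) > 0)[0][-1]
    tau = (css[rho] - p_max) / (rho + 1)
    return np.maximum(p - tau, 0.0)

# Constrained version of step 1 for agent i (step 2, the averaging, is unchanged):
#   tmp_i = project_power(theta_i - gamma_n * noisy_grad_i(theta_i), p_max)
print(project_power(np.array([0.8, 0.7, -0.1]), 1.0))   # -> [0.55 0.45 0.  ]
```

In the constrained algorithm, this projection is applied in the first step, right after each local gradient update, while the gossip averaging step stays exactly as before.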
So let me show some numerical results very rapidly. Here we plotted the power allocations of the two transmitters, estimated over time. There are two curves each time: one is for the centralized algorithm and the other is for the distributed algorithm. The centralized algorithm is in blue and the distributed one is in red here, and here the centralized is in turquoise and the distributed is in green; this is for the first user and this is for the second user. We see that we eventually reach, maybe not so clearly here, but here, a stable point, as predicted by our results.

So let me conclude. First, we studied a distributed stochastic gradient algorithm in both the constrained and the unconstrained frameworks. We provided realistic sufficient conditions for convergence with probability one, and we stated a central limit theorem, but only for the unconstrained case. As future work, we would like to get rid of the doubly stochastic constraint on the weight matrices, since it is a bit cumbersome for the network, and we would also like to establish such a central limit theorem in the constrained case. Thank you for your attention.

[Question from the audience, largely inaudible, about the weights given to node i and whether the scheme is already known.]

If you want to get the doubly stochastic property, one way at least is of course to use pairwise gossiping. But our point here is not that; we are absolutely aware of it. Sorry, that is what we say here: the algorithm itself is known, and we do not claim novelty for it.