| 0:00:18 | Hi everybody. |
|---|
| 0:00:19 | My name is Jérémie Jakubowicz, and today I'm going to |
|---|
| 0:00:24 | talk about distributed optimization, |
|---|
| 0:00:26 | as in some of the talks before. |
|---|
| 0:00:30 | And I should say two things |
|---|
| 0:00:33 | before beginning. First, of course, I should acknowledge my |
|---|
| 0:00:37 | co-authors, Pascal Bianchi, Gersende Fort and Walid Hachem. |
|---|
| 0:00:40 | We are all with Télécom ParisTech; |
|---|
| 0:00:44 | Gersende Fort and Walid Hachem are also affiliated with CNRS. |
|---|
| 0:00:49 | I should also say that some of the results |
|---|
| 0:00:52 | I am going to present were derived very recently and are not in |
|---|
| 0:00:58 | our ICASSP paper. |
|---|
| 0:01:01 | So, for starters, let |
|---|
| 0:01:04 | us say a few words about stochastic approximation. Imagine I have |
|---|
| 0:01:09 | a function F that I want to minimize over a finite-dimensional |
|---|
| 0:01:13 | vector space. |
|---|
| 0:01:15 | You all know, I am sure, the deterministic gradient algorithm, which |
|---|
| 0:01:21 | is an iterative algorithm and consists of |
|---|
| 0:01:26 | following the gradient lines. |
|---|
| 0:01:28 | Actually, since I want to minimize, I take the opposite direction, with |
|---|
| 0:01:33 | gamma_n being a positive step size. |
|---|
| 0:01:37 | This is if I know the function F. |
|---|
| 0:01:42 | If I only know the function F up to a random |
|---|
| 0:01:47 | perturbation, |
|---|
| 0:01:48 | which |
|---|
| 0:01:50 | has to be unbiased, |
|---|
| 0:01:53 | I can do the same thing, and the corresponding algorithm |
|---|
| 0:01:57 | is the celebrated stochastic gradient algorithm. |
|---|
| 0:02:03 | Under |
|---|
| 0:02:05 | classical assumptions, and for well-behaved functions, one can show that |
|---|
| 0:02:12 | the sequence of estimates converges to the roots of the gradient of F. |
|---|
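As an aside, the two update rules just described (the noisy gradient step with a decreasing positive step size) can be sketched in a few lines. This is a minimal illustration, not code from the talk: the quadratic objective, the noise scale, and the step size gamma_n = 1/n are all assumed for the example.

```python
import numpy as np

def sgd(grad, x0, gamma, n_iters, rng=None):
    """Stochastic gradient: x_{n+1} = x_n - gamma(n) * (grad F(x_n) + noise),
    where the perturbation is zero-mean (unbiased), as required in the talk."""
    x = np.asarray(x0, dtype=float)
    rng = rng or np.random.default_rng(0)
    for n in range(1, n_iters + 1):
        noise = rng.normal(scale=0.1, size=x.shape)  # unbiased perturbation
        x = x - gamma(n) * (grad(x) + noise)
    return x

# Toy example: F(x) = ||x - 1||^2 / 2, so grad F(x) = x - 1; the minimizer is 1.
grad = lambda x: x - 1.0
x_star = sgd(grad, x0=np.zeros(2), gamma=lambda n: 1.0 / n, n_iters=5000)
```

With gamma(n) = 1/n the iterates settle near the root of the gradient, matching the convergence statement above.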
| 0:02:19 | So we are going to talk about this very minimization problem, but in a distributed |
|---|
| 0:02:26 | framework. |
|---|
| 0:02:27 | So, first, |
|---|
| 0:02:29 | the outline of my talk is the following. First, we are going to |
|---|
| 0:02:33 | deal with unconstrained optimization and |
|---|
| 0:02:36 | analyse the algorithm asymptotically, |
|---|
| 0:02:40 | and then move on to constrained optimization and illustrate it on a power allocation example. |
|---|
| 0:02:46 | So, for unconstrained optimization now, |
|---|
| 0:02:49 | imagine we have |
|---|
| 0:02:52 | a network of N agents. |
|---|
| 0:02:55 | The agents can be of very different natures: sensors, mobile phones, robots, |
|---|
| 0:03:00 | et cetera. |
|---|
| 0:03:01 | The agents are connected according to a random graph, |
|---|
| 0:03:07 | possibly with a time-varying topology, |
|---|
| 0:03:10 | and they want |
|---|
| 0:03:11 | to achieve a global mission; for that, they are able to talk within their neighbourhood, |
|---|
| 0:03:16 | to cooperate. |
|---|
| 0:03:19 | More precisely, in this talk we assume that each agent, |
|---|
| 0:03:23 | each agent i, has a utility function |
|---|
| 0:03:27 | F_i, and the network wants to minimize |
|---|
| 0:03:33 | the sum of the utility functions. |
|---|
| 0:03:36 | So, |
|---|
| 0:03:39 | let me stress two difficulties. The first difficulty is that |
|---|
| 0:03:45 | agent i ignores |
|---|
| 0:03:48 | the utility functions of the other nodes, |
|---|
| 0:03:51 | yet we want to optimize something that depends on them, |
|---|
| 0:03:54 | so obviously cooperation is going to be needed. |
|---|
| 0:03:58 | The second remark is that |
|---|
| 0:04:00 | agent i only knows its own utility function up to a random perturbation, |
|---|
| 0:04:07 | so we also have to use |
|---|
| 0:04:09 | the stochastic approximation framework. |
|---|
| 0:04:13 | So, the algorithm we are going to analyse |
|---|
| 0:04:18 | is |
|---|
| 0:04:19 | stated here; |
|---|
| 0:04:20 | I am going to comment on it in a second. |
|---|
| 0:04:23 | It is based on |
|---|
| 0:04:26 | the seminal work of Tsitsiklis, and there |
|---|
| 0:04:29 | have been many contributions afterwards: |
|---|
| 0:04:32 | interesting work by Nedić and co-authors, among many, many others. |
|---|
| 0:04:37 | So, |
|---|
| 0:04:41 | it is exactly the algorithm that we saw in the previous talk. |
|---|
| 0:04:45 | There are two steps in this algorithm. |
|---|
| 0:04:50 | In the first step, |
|---|
| 0:04:52 | each agent receives |
|---|
| 0:04:54 | a perturbed version |
|---|
| 0:04:57 | of the gradient and updates its estimate into a temporary estimate, |
|---|
| 0:05:02 | denoted |
|---|
| 0:05:04 | theta-tilde at time n+1. |
|---|
| 0:05:06 | Each agent does this, |
|---|
| 0:05:09 | and then they talk: |
|---|
| 0:05:11 | each agent, |
|---|
| 0:05:13 | for instance agent i here, |
|---|
| 0:05:16 | updates its |
|---|
| 0:05:17 | final estimate |
|---|
| 0:05:19 | using the temporary |
|---|
| 0:05:21 | estimates |
|---|
| 0:05:22 | of its neighbours, |
|---|
| 0:05:23 | in a weighted sum. |
|---|
| 0:05:26 | So W_{n+1}(i,j) |
|---|
| 0:05:29 | has to be zero |
|---|
| 0:05:31 | if the nodes are not connected; |
|---|
| 0:05:33 | this is the first constraint. |
|---|
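The two-step scheme described above, a local noisy gradient step followed by a weighted combination over neighbours, might look like this in code. Everything concrete here (the three-agent network, the quadratic utilities, the uniform weight matrix, the noise model) is assumed for illustration only:

```python
import numpy as np

def distributed_step(theta, grads, W, gamma, rng):
    """One iteration of the two-step algorithm:
    (1) each agent i takes a noisy local gradient step (temporary estimate),
    (2) agents combine temporary estimates with weights W,
        where W[i, j] = 0 if agents i and j are not connected."""
    n_agents, dim = theta.shape
    theta_tilde = np.empty_like(theta)
    for i in range(n_agents):
        noisy_grad = grads[i](theta[i]) + rng.normal(scale=0.1, size=dim)
        theta_tilde[i] = theta[i] - gamma * noisy_grad
    return W @ theta_tilde  # weighted sum over neighbours

# Toy example: 3 agents, F_i(x) = (x - i)^2 / 2; the sum is minimized at x = 1.
rng = np.random.default_rng(0)
grads = [lambda x, i=i: x - float(i) for i in range(3)]
W = np.full((3, 3), 1.0 / 3.0)   # complete graph, doubly stochastic weights
theta = np.zeros((3, 1))
for n in range(1, 3000):
    theta = distributed_step(theta, grads, W, gamma=1.0 / n, rng=rng)
```

Note that no agent ever sees the other agents' utilities: agent i only evaluates its own noisy gradient and then averages estimates with its neighbours.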
| 0:05:38 | The network model is the following. There are also constraints on the weights W: |
|---|
| 0:05:43 | the weights must sum up to one in each row, and W must be a doubly |
|---|
| 0:05:50 | stochastic matrix. |
|---|
| 0:05:52 | The first requirement here is not very costly: each |
|---|
| 0:05:59 | agent i can tune its weights so that they sum up to one. But the |
|---|
| 0:06:03 | second one is more difficult to achieve, because the agents have |
|---|
| 0:06:07 | to cooperate to be able to sum up to one. |
|---|
| 0:06:11 | There is a nice solution |
|---|
| 0:06:13 | using pairwise gossip, as introduced by Boyd et al., |
|---|
| 0:06:17 | that ensures the W |
|---|
| 0:06:20 | matrices are doubly stochastic. |
|---|
| 0:06:23 | We also assume that the W's are i.i.d.; that |
|---|
| 0:06:29 | can be weakened. And the fourth assumption is a technical one, |
|---|
| 0:06:33 | involving the spectral radius; loosely speaking, it says that the network |
|---|
| 0:06:39 | is connected, that it has one connected component. |
|---|
| 0:06:43 | So it is a very mild assumption. |
|---|
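A pairwise-gossip weight matrix of the kind the speaker refers to can be built like this: one randomly chosen edge averages its two endpoints, and the resulting W is symmetric and doubly stochastic by construction. The ring topology and the uniform edge choice here are assumptions for the sketch, not details from the talk:

```python
import numpy as np

def pairwise_gossip_matrix(n, edge):
    """W for one pairwise-gossip round on edge (i, j): agents i and j
    average their estimates, everyone else keeps theirs. Each row and
    each column of W sums to one, so W is doubly stochastic."""
    i, j = edge
    W = np.eye(n)
    W[i, i] = W[j, j] = 0.5
    W[i, j] = W[j, i] = 0.5
    return W

rng = np.random.default_rng(1)
n = 5
ring_edges = [(k, (k + 1) % n) for k in range(n)]  # assumed ring topology
W = pairwise_gossip_matrix(n, ring_edges[rng.integers(len(ring_edges))])
```

Drawing the active edge at random at each iteration gives the i.i.d. sequence of doubly stochastic matrices assumed in the analysis.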
| 0:06:46 | So now let us |
|---|
| 0:06:47 | move on to the |
|---|
| 0:06:50 | asymptotic analysis. |
|---|
| 0:06:52 | We define a vector theta in which we stack |
|---|
| 0:06:56 | all the estimates of all the agents, |
|---|
| 0:07:00 | and we are interested in two things: first, in the average of the estimates, |
|---|
| 0:07:06 | and also in the disagreements, |
|---|
| 0:07:09 | that is, how the local estimates differ from the average. |
|---|
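The two quantities of interest, the average of the estimates and the disagreement vector, are easy to compute from the stacked estimates. The numbers below are purely illustrative:

```python
import numpy as np

# Stack the agents' estimates: one row per agent.
theta = np.array([[1.0, 2.0],
                  [1.2, 1.8],
                  [0.8, 2.2]])      # 3 agents, 2-dimensional estimates

average = theta.mean(axis=0)        # the quantity that should track the minimizer
disagreement = theta - average      # how each local estimate differs from the average

# Consensus means this disagreement vanishes asymptotically.
consensus_error = np.linalg.norm(disagreement)
```

The analysis in the talk tracks exactly these two components: the average converges to a root of the gradient, while the disagreement goes to zero.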
| 0:07:13 | The difficult technical point is to propose |
|---|
| 0:07:20 | satisfying conditions, |
|---|
| 0:07:22 | and the difficulty lies in the stability: |
|---|
| 0:07:26 | this is the property that the whole vector theta_n remains bounded |
|---|
| 0:07:32 | with probability one. |
|---|
| 0:07:34 | We provide such a condition using a Lyapunov function, and when |
|---|
| 0:07:39 | interested in minimization we can take |
|---|
| 0:07:42 | the function F itself as our Lyapunov function. I am not going |
|---|
| 0:07:46 | to enter into the details. |
|---|
| 0:07:48 | The first result is that, with probability one, consensus is achieved, |
|---|
| 0:07:54 | meaning that all the estimates |
|---|
| 0:07:55 | eventually reach the same value, |
|---|
| 0:07:59 | and that the average of the estimates converges to the set of roots of the |
|---|
| 0:08:06 | gradient of F, which is our target. |
|---|
| 0:08:09 | Moreover, if this set is made of isolated points, then we converge to one of |
|---|
| 0:08:16 | these points. |
|---|
| 0:08:18 | We also prove another result, which is a |
|---|
| 0:08:24 | central limit theorem. |
|---|
| 0:08:26 | I am not going to enter into the details, but |
|---|
| 0:08:29 | it says that, under |
|---|
| 0:08:34 | suitable assumptions, |
|---|
| 0:08:35 | the normalized disagreements converge in distribution |
|---|
| 0:08:42 | to |
|---|
| 0:08:43 | a Gaussian vector. |
|---|
| 0:08:45 | And I should point out that here, |
|---|
| 0:08:48 | the distribution of this Gaussian vector is degenerate, since all the random vectors here are the |
|---|
| 0:08:54 | same. |
|---|
| 0:08:56 | So instead of going into the details, let me |
|---|
| 0:09:01 | draw your attention to three consequences of this |
|---|
| 0:09:05 | result. The first one is that |
|---|
| 0:09:08 | the estimates converge |
|---|
| 0:09:11 | at speed square root of gamma_n, |
|---|
| 0:09:16 | and a consequence of this is that the algorithm performs as well as the centralized |
|---|
| 0:09:23 | algorithm. |
|---|
| 0:09:24 | Also, from the degenerate nature of the distribution, |
|---|
| 0:09:28 | we can conclude that the disagreements |
|---|
| 0:09:32 | are negligible |
|---|
| 0:09:35 | with |
|---|
| 0:09:37 | respect to the difference between the estimates and the target value, |
|---|
| 0:09:41 | because the |
|---|
| 0:09:43 | distribution of the vector |
|---|
| 0:09:45 | puts mass only on the consensus line. |
|---|
| 0:09:50 | Another |
|---|
| 0:09:59 | remark is that, since agreement is achieved |
|---|
| 0:10:02 | at very high speed |
|---|
| 0:10:04 | compared |
|---|
| 0:10:08 | to the speed of convergence towards the target value, |
|---|
| 0:10:11 | we do not need to communicate too often. |
|---|
| 0:10:13 | And actually, we can show that if the probability |
|---|
| 0:10:17 | that there is a communication goes to zero, |
|---|
| 0:10:21 | as soon as it is lower than one over the square |
|---|
| 0:10:26 | root of n, |
|---|
| 0:10:27 | we can still guarantee convergence, and hence |
|---|
| 0:10:30 | save network resources while keeping the same performance. |
|---|
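The sparse-communication idea above amounts to running the gossip step only with some probability p_n at iteration n. A minimal sketch, with the two-agent averaging matrix and the probabilities chosen only for illustration (the exact admissible decay rate of p_n is the talk's result, not encoded here):

```python
import numpy as np

def maybe_communicate(theta, W, p_n, rng):
    """Perform the gossip/averaging step only with probability p_n.
    The talk's point: p_n may decay towards zero (under a suitable rate
    condition) while convergence, and hence performance, is preserved."""
    if rng.random() < p_n:
        return W @ theta   # communication round happens
    return theta           # communication skipped, estimates unchanged

rng = np.random.default_rng(0)
theta = np.array([[0.0], [2.0]])
W = np.full((2, 2), 0.5)   # averaging matrix for 2 agents
always = maybe_communicate(theta, W, 1.0, rng)  # gossip always fires
never = maybe_communicate(theta, W, 0.0, rng)   # gossip never fires
```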
| 0:10:35 | So now let us move on to constrained optimization. The problem is the same, except |
|---|
| 0:10:40 | that now my estimates are required to remain in a compact and convex set |
|---|
| 0:10:47 | D, |
|---|
| 0:10:48 | and this set is known by all the agents. |
|---|
| 0:10:52 | So we can use pretty much the same algorithm, except that now, |
|---|
| 0:10:57 | if one estimate |
|---|
| 0:10:58 | goes out of the set D, we bring it back to the boundary using |
|---|
| 0:11:04 | a projection, |
|---|
| 0:11:07 | while the second step is unchanged. |
|---|
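For the constrained variant, the only change is projecting the temporary estimate back onto D after the gradient step. Taking D to be a box is a hypothetical choice made here so that the projection is a simple clip; any compact convex set with a computable projection would do:

```python
import numpy as np

def projected_step(x, grad, gamma, lo, hi, noise):
    """Noisy gradient step followed by Euclidean projection onto the
    box D = [lo, hi]^d: if the step leaves D, the estimate is brought
    back to the boundary."""
    x_tmp = x - gamma * (grad(x) + noise)
    return np.clip(x_tmp, lo, hi)   # projection onto the box

# Example: minimizing F(x) = (x - 2)^2 / 2 over D = [0, 1].
# The unconstrained minimizer 2 lies outside D, so the iterates
# are pinned at the boundary point 1.
x = np.array([0.5])
for n in range(1, 200):
    x = projected_step(x, lambda y: y - 2.0, 1.0 / n, 0.0, 1.0, noise=0.0)
```

The limit here is a KKT point of the constrained problem, which is exactly the kind of point the convergence result below refers to.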
| 0:11:12 | Two remarks. The first remark is that there are no more stability issues, which |
|---|
| 0:11:18 | was a difficult technical point, so that is good news. But the bad news is that the projection |
|---|
| 0:11:23 | introduces other technical difficulties. |
|---|
| 0:11:28 | The result is that consensus is still achieved with probability one, |
|---|
| 0:11:35 | and that the average of the estimates converges to the set of KKT points. |
|---|
| 0:11:40 | And again, if this set is made of isolated points, then the average converges to one |
|---|
| 0:11:47 | of them. |
|---|
| 0:11:50 | Perhaps I can give the sketch of the proof |
|---|
| 0:11:56 | at the end, if I have time. |
|---|
| 0:11:58 | So, |
|---|
| 0:12:01 | as an illustration, we are going to address the problem of |
|---|
| 0:12:06 | the interference channel. We have two |
|---|
| 0:12:09 | source-destination pairs, |
|---|
| 0:12:12 | and we assume that there is no channel state information at the transmitters; only the destinations |
|---|
| 0:12:19 | observe the channel gains. |
|---|
| 0:12:21 | And the |
|---|
| 0:12:25 | goal here is for the destinations to tune the rates |
|---|
| 0:12:28 | in order to minimize |
|---|
| 0:12:32 | the probability of error over the channels. |
|---|
| 0:12:36 | The exact function they want to minimize is actually |
|---|
| 0:12:40 | a weighted sum of the probabilities of error for each source-destination pair, |
|---|
| 0:12:46 | and the constraints come from the fact that each transmitter cannot |
|---|
| 0:12:51 | use a power higher than a given |
|---|
| 0:12:56 | threshold. |
|---|
| 0:12:59 | Each user has an estimate; |
|---|
| 0:13:05 | that is, each destination has an estimate of all the |
|---|
| 0:13:09 | transmit powers. |
|---|
| 0:13:12 | And we apply the |
|---|
| 0:13:15 | algorithm we presented, just replacing the abstract gradient of F with our specific |
|---|
| 0:13:21 | function to maximise, |
|---|
| 0:13:23 | to minimize, sorry, and using |
|---|
| 0:13:27 | the specific set D imposed by the constraints. |
|---|
| 0:13:31 | And from what we have seen, we can guarantee that |
|---|
| 0:13:37 | this |
|---|
| 0:13:37 | algorithm is going to converge to |
|---|
| 0:13:40 | a local minimum, or more precisely to the KKT points, |
|---|
| 0:13:45 | with probability one. |
|---|
| 0:13:48 | We could |
|---|
| 0:13:49 | also use a more involved |
|---|
| 0:13:54 | setting, |
|---|
| 0:13:57 | with more than one channel |
|---|
| 0:14:00 | from each source |
|---|
| 0:14:01 | to each destination. |
|---|
| 0:14:04 | So let me show, very rapidly, some numerical results. |
|---|
| 0:14:09 | Here we plotted the |
|---|
| 0:14:13 | power allocations for the two |
|---|
| 0:14:16 | transmitters, estimated |
|---|
| 0:14:20 | over time. |
|---|
| 0:14:22 | And |
|---|
| 0:14:24 | there are two curves each time: one is for the centralized algorithm and the other is for |
|---|
| 0:14:30 | the distributed algorithm. |
|---|
| 0:14:31 | The centralized algorithm here is in blue, |
|---|
| 0:14:33 | the distributed is in red; |
|---|
| 0:14:35 | here, |
|---|
| 0:14:38 | the centralized is in green |
|---|
| 0:14:41 | and this is the distributed. |
|---|
| 0:14:42 | This is for the first user, and this for the second user. |
|---|
| 0:14:46 | And we see that eventually, well, here not quite, but here, |
|---|
| 0:14:52 | we reach a stable |
|---|
| 0:14:55 | point, |
|---|
| 0:14:57 | as predicted by our results. |
|---|
| 0:15:00 | So let me conclude. First, |
|---|
| 0:15:04 | we studied a distributed stochastic gradient algorithm |
|---|
| 0:15:07 | in both the constrained and unconstrained frameworks. |
|---|
| 0:15:12 | We provided realistic sufficient conditions for convergence with probability one, |
|---|
| 0:15:18 | and we stated a central limit theorem, |
|---|
| 0:15:21 | only for the unconstrained case. |
|---|
| 0:15:24 | As |
|---|
| 0:15:26 | future work, we would like to get rid of |
|---|
| 0:15:32 | the doubly stochastic constraint on the weight matrices, since it is |
|---|
| 0:15:35 | a bit cumbersome for the network, |
|---|
| 0:15:38 | and we would also like to establish such a central limit theorem |
|---|
| 0:15:42 | in the constrained case. |
|---|
| 0:15:44 | Thank you for your attention. |
|---|
| 0:16:04 | [inaudible] |
|---|
| 0:16:09 | [Question:] But why is this one given to the node? |
|---|
| 0:16:15 | [inaudible] |
|---|
| 0:16:21 | [inaudible] |
|---|
| 0:16:29 | Okay, if you want to get into [inaudible], |
|---|
| 0:16:33 | at least it uses, of course, [inaudible], since it is |
|---|
| 0:16:36 | [inaudible]. |
|---|
| 0:16:44 | Yeah, our point here is not... we are absolutely aware |
|---|
| 0:16:48 | of this algorithm. |
|---|
| 0:16:52 | Sorry, that is what we say here: the algorithm is |
|---|
| 0:16:56 | known, and we do not claim |
|---|
| 0:16:58 | [inaudible] |
|---|
| 0:17:00 | [inaudible] |
|---|