0:00:18 Hi everybody.
0:00:19 My name is Jérémie Jakubowicz, and today I'm going to talk about distributed optimization,
0:00:26 as in some of the talks before.
0:00:30 I should say two things before beginning. First, of course, I should acknowledge my co-authors, Pascal Bianchi, Gersende Fort, and Walid Hachem.
0:00:40 We are all at Télécom ParisTech; Gersende and Walid are also affiliated with the CNRS.
0:00:49 I should also say that some of the results I'm going to present were derived very recently and are not in our ICASSP paper.
0:01:01 For starters, let me say a few words about stochastic approximation. Imagine I have a function F that I want to minimize over a finite-dimensional vector space.
0:01:15 You all know, I'm sure, the deterministic gradient algorithm, which is an iterative algorithm and consists of following the gradient lines:
0:01:28 since I want to minimize, I take the direction opposite to the gradient, with gamma_n being a positive step size.
0:01:37 Now, if I only know the function F up to a random perturbation,
0:01:50 which has to be unbiased,
0:01:53 I can do the same thing, and the corresponding algorithm is the celebrated stochastic gradient algorithm.
0:02:03 Under classical assumptions, and for well-behaved functions, one can show that the sequence of estimates converges to the roots of the gradient of F.
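A minimal sketch of this recursion in Python, with an illustrative quadratic F, noise level, and step-size schedule (none of these specifics are from the talk):

import numpy as np

rng = np.random.default_rng(0)

def grad_F(theta):
    # Gradient of the toy quadratic F(theta) = 0.5 * ||theta - 1||^2.
    return theta - 1.0

theta = np.zeros(3)
for n in range(1, 10001):
    gamma = 1.0 / n                        # positive, decreasing step size
    noise = rng.normal(scale=0.1, size=3)  # zero-mean (unbiased) perturbation
    # Stochastic gradient step: move opposite to the perturbed gradient.
    theta = theta - gamma * (grad_F(theta) + noise)

print(theta)  # close to the minimizer (1, 1, 1), a root of the gradient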
0:02:19 So we are going to talk about this very minimization problem, but in a distributed framework.
0:02:27 First, the outline of my talk is the following: we are first going to deal with unconstrained optimization and analyse the algorithm asymptotically,
0:02:40 and then move on to constrained optimization and illustrate it on a power allocation example.
0:02:46 For unconstrained optimization now, imagine we have a network of N agents.
0:02:55 The agents can be of very different natures: sensors, mobile phones, robots, et cetera.
0:03:01 The agents are connected according to a random graph, which is supposed to have a time-varying topology,
0:03:10 and they want to achieve a global mission; for that, they are able to talk to their neighbours, to cooperate.
0:03:19 More precisely, in this talk we assume that each agent i has a utility function F_i, and the network wants to minimize the sum of the utility functions.
0:03:36 We have to face two difficulties here. The first difficulty is that agent i ignores the utility functions of the other nodes, yet we want to optimize something that depends on them;
0:03:54 so, obviously, cooperation is going to be needed.
0:03:58 The second remark is that agent i only knows its own utility function up to a random perturbation,
0:04:07 so we also have to use the stochastic approximation framework.
0:04:13 The algorithm we are going to analyse is stated here; I'm going to comment on it in a second.
0:04:23 It is rooted in the work of Tsitsiklis et al.; there have been many contributions afterwards,
0:04:32 with interesting work by Nedić et al., among many, many related works.
0:04:37 It is exactly the very algorithm that we saw in the previous talk. There are two steps in this algorithm.
0:04:50 In the first step, each agent receives a perturbed version of the gradient of its own utility and updates its estimate into a temporary estimate, theta-tilde at time n+1; each agent does this.
0:05:09 Then they talk: agent i, for instance, updates its final estimate using the temporary estimates of its neighbours, in a weighted sum.
0:05:26 So the weight W_{n+1}(i,j) has to be zero if nodes i and j are not connected; this is the first constraint.
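A minimal sketch of these two steps in Python, under illustrative assumptions: quadratic local utilities and a fixed ring-shaped gossip matrix rather than the random, time-varying one from the talk.

import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 2
targets = rng.normal(size=(N, d))  # agent i minimizes 0.5 * ||x - targets[i]||^2

# A fixed doubly stochastic weight matrix on a ring (illustrative choice).
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i + 1) % N] = 0.25
    W[i, (i - 1) % N] = 0.25

theta = np.zeros((N, d))  # row i is agent i's estimate
for n in range(1, 5001):
    gamma = 1.0 / n
    noise = rng.normal(scale=0.1, size=(N, d))
    # Step 1: local perturbed gradient step -> temporary estimates.
    theta_tmp = theta - gamma * ((theta - targets) + noise)
    # Step 2: gossip -- each agent takes a weighted sum of its neighbours'
    # temporary estimates.
    theta = W @ theta_tmp

print(theta)                 # rows agree (consensus) ...
print(targets.mean(axis=0))  # ... near the minimizer of the sum, the mean target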
0:05:38 The network model is the following. There are also constraints on the weights W:
0:05:43 the Ws must sum up to one in a row and in a column, that is, W must be a doubly stochastic matrix.
0:05:52 The first requirement here is not very costly: each agent i can tune its weights so that they sum up to one. But the second one is more difficult to achieve, because the agents have to cooperate to make the columns sum up to one.
0:06:11 There is a nice solution, using pairwise gossip as introduced by Boyd et al., that ensures the W matrices are doubly stochastic.
0:06:23 We also assume that the Ws are i.i.d., and the fourth assumption is a technical one,
0:06:33 here, on the spectral radius; essentially, loosely speaking, it says that the network is connected, that it has one connected component.
0:06:43 So it is a very mild assumption.
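A sketch of why pairwise gossip gives doubly stochastic weights: when a random pair of agents wakes up and averages, the resulting matrix is symmetric with rows and columns summing to one. A toy Python construction (the pair-selection rule here is an illustrative assumption):

import numpy as np

def pairwise_gossip_matrix(N, i, j):
    # Weight matrix when agents i and j average their temporary estimates:
    # identity except on rows/columns i and j, hence symmetric, and both
    # its rows and its columns sum to one (doubly stochastic).
    W = np.eye(N)
    W[i, i] = W[j, j] = 0.5
    W[i, j] = W[j, i] = 0.5
    return W

rng = np.random.default_rng(2)
N = 5
i, j = rng.choice(N, size=2, replace=False)  # a random pair wakes up
W = pairwise_gossip_matrix(N, i, j)
print(W.sum(axis=0), W.sum(axis=1))  # all ones: doubly stochastic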
0:06:46 So now let us move on to the asymptotic analysis.
0:06:52 We define a vector theta in which we stack all the estimates of all the agents,
0:07:00 and we are interested in two things: first, in the average of the estimates,
0:07:06 and also in the disagreements,
0:07:09 that is, in how the local estimates differ from the average.
0:07:13 The difficult technical point is to propose satisfying conditions,
0:07:22 and the difficulty lies in the stability:
0:07:26 this is the property that the whole vector theta_n remains bounded with probability one.
0:07:34 We provide such a condition using a Lyapunov function; when interested in minimization, we can take the function F itself as our Lyapunov function. I'm not going to enter into the details.
0:07:48 The first result is that, with probability one, consensus is achieved,
0:07:54 meaning that all the estimates eventually reach the same value,
0:07:59 and that the average of the estimates converges to the set of roots of the gradient of F, which are our targets.
0:08:09 Moreover, if this set is made of isolated points, then we converge to one of these points.
0:08:18 We also prove another result, which is a central limit theorem.
0:08:26 I'm not going to enter into the details, but it just says that, under good assumptions,
0:08:35 the normalized estimates converge in distribution to a Gaussian vector.
0:08:45 I should point out that the distribution of this Gaussian vector is degenerate, since all the random vectors here are the same.
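Loosely, and glossing over the assumptions, the statement has the following shape; the notation below is a hedged reconstruction, not the exact formula from the slides:

\[
\gamma_n^{-1/2}\,\bigl(\theta_n - \mathbf{1}\otimes\theta^\star\bigr)
\;\xrightarrow{d}\;
\mathcal{N}(0,\Sigma),
\qquad
\Sigma=(\mathbf{1}\mathbf{1}^{\top})\otimes\Sigma_0,
\]

so the limiting Gaussian puts all its mass on the consensus line \(\{\mathbf{1}\otimes x\}\), all agents sharing the same limiting fluctuation.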
0:08:56 Instead of going into the details, let me draw your attention to three consequences of this result.
0:09:05 The first one is that the estimates converge at speed square root of gamma_n,
0:09:16 and a consequence of this consequence is that the algorithm performs as well as the centralized algorithm.
0:09:24 Also, from the degenerate nature of the distribution we saw,
0:09:28 we can conclude that the disagreements are negligible
0:09:37 with respect to the difference between the estimates and the target value,
0:09:41 because the distribution of the vector puts mass only on the consensus line.
0:09:50 Another remark is that, since agreement is achieved at very high speed
0:10:04 compared to the speed at which we move towards the target value,
0:10:11 we do not need to communicate too often.
0:10:13 Actually, we can show that if the probability that a communication occurs goes to zero,
0:10:21 then, as long as it decays more slowly than one over the square root of n,
0:10:27 we can still guarantee convergence, and hence save network resources while keeping the same performance.
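A sketch of this sparse-communication variant, grafted onto the toy loop above; the schedule p_n proportional to n^{-0.4}, which vanishes more slowly than one over the square root of n, is an illustrative choice consistent with the stated condition.

import numpy as np

rng = np.random.default_rng(3)
N, d = 4, 2
targets = rng.normal(size=(N, d))
# Fixed doubly stochastic ring gossip matrix (illustrative, as before).
W = 0.5 * np.eye(N) + 0.25 * (np.roll(np.eye(N), 1, axis=0) + np.roll(np.eye(N), -1, axis=0))

theta = np.zeros((N, d))
for n in range(1, 20001):
    gamma = 1.0 / n
    noise = rng.normal(scale=0.1, size=(N, d))
    theta_tmp = theta - gamma * ((theta - targets) + noise)
    # Gossip only with probability p_n, decaying more slowly than 1/sqrt(n).
    p_n = min(1.0, n ** -0.4)
    theta = W @ theta_tmp if rng.random() < p_n else theta_tmp

print(theta.std(axis=0))                         # disagreement stays small
print(theta.mean(axis=0), targets.mean(axis=0))  # average still near the target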
0:10:35 So now let us go to constrained optimization. The problem is the same, except that now my estimates are required to remain in a compact and convex set D,
0:10:48 and this set is known by all the agents.
0:11:07uh a a and of the second that's the second guy step is and change
0:11:12so uh a uh two remarks that the first remark is that the now no more stability issues which which
0:11:18is a difficult to technical point so it's a good news but the bad news that a the projection
0:11:23yeah introduce the introduces other a technical difficult
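A minimal sketch of the projected first step, assuming for illustration that D is a box, so that the Euclidean projection reduces to a coordinate-wise clip:

import numpy as np

def project_onto_box(x, low, high):
    # Euclidean projection onto the box D = [low, high]^d.
    return np.clip(x, low, high)

rng = np.random.default_rng(4)
d = 3
theta_i = np.zeros(d)              # agent i's current estimate
gamma, low, high = 0.1, -1.0, 1.0
noisy_grad = (theta_i - 2.0) + rng.normal(scale=0.1, size=d)  # toy perturbed gradient
# Projected local step: gradient move, then bring the estimate back into D.
theta_tmp = project_onto_box(theta_i - gamma * noisy_grad, low, high)
print(theta_tmp)  # stays inside [-1, 1]^3; the gossip step is unchanged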
0:11:28 The result is that consensus is still achieved with probability one,
0:11:35 and that the average of the estimates converges to the set of KKT points;
0:11:40 again, if this set is made of isolated points, then the average converges to one of them.
0:11:50 Perhaps I should skip the sketch of the proof and get back to it at the end if I have time.
0:12:01as an illustration uh we are are going to address the problem of uh
0:12:06of uh interference channel we have to uh
0:12:09source destination uh pass
0:12:12and uh we assume that there is no uh channel state information at the transmitter are only the nations
0:12:19observes that's the channel the channel gains
0:12:21uh and the the
0:12:25the goal here is that the dish the destinations rates
0:12:28in order to minimize the a
0:12:32the probability of error or uh of the the the channels
0:12:36and this uh i the exact function they want to minimize is actually
0:12:40uh a weighted sum of all the probability of error for each source path uh destination
0:12:46and uh the constrained come from from the fact that each uh transmitter can not
0:12:51uh uh use uh a a a power a power more than and uh a given uh
0:12:56uh a given threshold
0:12:59 Each user, or rather its destination, has an estimate of all the transmitters' powers,
0:13:12 and we apply the algorithm we presented, just replacing the abstract gradient of F with our specific function to maximize,
0:13:23 to minimize, sorry, and using the specific set D imposed by the power constraints.
0:13:37uh this
0:13:37i agrees um you is going to converge to a a
0:13:40a look minimum a a a zero ugh like a cake at points
0:13:45and the and with probability one
0:13:48so we could do the
0:13:49we could use a more uh a a a uh involved the
0:13:54uh settings the we more than uh
0:13:57uh with more than the uh one channel
0:14:00or from each source
0:14:01uh this nation
0:14:04 Let me show, very rapidly, some numerical results.
0:14:09 Here we plotted the power allocations estimated by the two transmitters over time.
0:14:22 There are two curves each time: one is for the centralized algorithm and the other is for the distributed algorithm.
0:14:31 The centralized one here is in blue and the distributed one in red; there, the distributed one is in green. This is for the first user, and this is for the second user.
0:14:43 We see that eventually, well, here it is not so clear, but we do reach a stable point,
0:14:57 as predicted by our results.
0:15:00 Let me conclude. First, we studied a distributed stochastic gradient algorithm in both the constrained and the unconstrained frameworks.
0:15:12 We provided realistic sufficient conditions for convergence with probability one,
0:15:18 and we stated a central limit theorem, though only for the unconstrained case.
0:15:26 As future work, we would like to get rid of the doubly-stochasticity constraint on the weight matrices, since it is a bit cumbersome for the network,
0:15:38 and we would also like to establish such a central limit theorem in the constrained case.
0:15:44 Thank you for your attention.
0:16:04 [Audience question, largely inaudible] ...but why is this one given to the node? [inaudible]
0:16:29 OK, first, if you want to get into... [inaudible]
0:16:44 Yeah, our point here is not... we are absolutely aware of that, of this algorithm.
0:16:52 Sorry, that is what we say here: the algorithm is known, and we do not claim... [inaudible]