Hi, everybody.

My name is [inaudible], and today I'm going to talk about distributed optimization, as in some of the talks before.

Before beginning, I should say two things. First, of course, I should acknowledge my coauthors [names inaudible]; we are all with [institution inaudible], and two of my coauthors are also affiliated with the CNRS.

I should also say that some of the results I'm going to present were derived very recently and are not in our ICASSP paper.

For starters, let me say a few words about stochastic approximation. Imagine I have a function F that I want to minimize over a finite-dimensional vector space.

You all know, I'm sure, the deterministic gradient algorithm, which is an iterative algorithm and consists of following the gradient lines. Actually, since I want to minimize, I take the opposite direction, with γ_n being a positive step size; the update is written below.
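A hedged reconstruction of the iteration just described (θ_n denotes the iterate; the notation is guessed from the talk):

$$\theta_{n+1} \;=\; \theta_n \;-\; \gamma_{n+1}\,\nabla F(\theta_n), \qquad \gamma_n > 0.$$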

This is if I know the function F exactly. If I only know F up to a random perturbation, which has to be unbiased, I can do the same thing, and the corresponding algorithm is the celebrated stochastic gradient algorithm.

Under classical assumptions on the step sizes, and for well-behaved functions, one can show that the sequence of estimates converges to the roots of the gradient of F.
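As a minimal sketch of this stochastic gradient iteration; the quadratic objective, the Gaussian noise, and the 1/n step sizes are my illustrative assumptions, not from the talk:

```python
import numpy as np

def stochastic_gradient(grad, theta0, n_iters=10_000, noise_std=0.1, seed=0):
    """Robbins-Monro iteration: theta <- theta - gamma_n * (grad F(theta) + noise)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for n in range(1, n_iters + 1):
        gamma = 1.0 / n  # step sizes with sum(gamma) = inf and sum(gamma^2) < inf
        noisy_grad = grad(theta) + noise_std * rng.standard_normal(theta.shape)  # unbiased perturbation
        theta = theta - gamma * noisy_grad
    return theta

# Toy objective F(theta) = ||theta - 1||^2 / 2, whose gradient is theta - 1.
print(stochastic_gradient(lambda th: th - 1.0, theta0=np.zeros(3)))  # approaches (1, 1, 1)
```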

So we are going to talk about this very minimization problem, but in a distributed framework.

So, first, the outline of my talk is the following: we begin with unconstrained optimization and the asymptotic analysis of the algorithm, and then move on to constrained optimization and illustrate it on a power allocation example.

So, unconstrained optimization now. Imagine we have a network of N agents. The agents can be of very different natures: sensors, mobile phones, robots, et cetera. They are connected according to a random graph, whose topology is possibly time-varying, and they want to achieve a global task; for that, they are able to talk to their neighbours, to cooperate.

More precisely, in this talk we assume that each agent i has a utility function f_i, and the network wants to minimize the sum of the utility functions, F = f_1 + ... + f_N.

We have to face two difficulties here. The first difficulty is that agent i ignores the utility functions of the other nodes, yet we want to optimize something that depends on them; so, obviously, cooperation is going to be needed.

The second remark is that agent i only knows its own utility function up to a random perturbation, so we also have to work in the stochastic approximation framework.

The algorithm we are going to analyse (I will state it first, and comment on it in a second) is based on ideas going back to the work of Tsitsiklis et al.; there have been many contributions afterwards, including interesting work by Nedić et al., among many, many others.

And it is exactly the algorithm that we saw in the previous talk.

There are two steps in this algorithm. In the first step, each agent receives a perturbed observation of the gradient of its own utility and updates its estimate into a temporary estimate, θ̃_{n+1,i}.

Each agent does this. Then the agents talk: agent i, for instance, updates its final estimate using the temporary estimates of its neighbours, in a weighted sum. The weight W_{n+1}(i,j) has to be zero if nodes i and j are not connected; the two steps are written below.
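A hedged reconstruction of the two steps, with notation guessed from the description (Y_{n+1,i} is the noisy gradient of f_i at the current local estimate):

$$\tilde\theta_{n+1,i} \;=\; \theta_{n,i} \;-\; \gamma_{n+1}\, Y_{n+1,i}, \qquad \theta_{n+1,i} \;=\; \sum_{j=1}^{N} W_{n+1}(i,j)\,\tilde\theta_{n+1,j},$$

with $W_{n+1}(i,j)=0$ whenever $i$ and $j$ are not connected at time $n+1$.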

This is the first constraint; the network model also imposes further constraints on the weights W: the weights must sum up to one in each row, and W must in fact be a doubly stochastic matrix.

The first requirement is not very costly: each agent i can tune its own weights so that they sum up to one. But the second one is more difficult to achieve, because the agents have to cooperate to make the columns sum up to one as well.

There is a nice solution, using pairwise gossip as introduced in the work of Boyd et al., that ensures the W matrices are doubly stochastic.

We also assume that the W's are i.i.d. (this can be weakened), and the fourth assumption is a technical one involving a spectral radius; loosely speaking, it says that the network is connected, that it has one connected component. So it is a very mild assumption. A sketch of the whole scheme is given below.
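A minimal runnable sketch of the two-step algorithm with pairwise-gossip weights; the ring topology, the quadratic local utilities f_i(x) = ||x - t_i||^2/2, and all variable names are my illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 2                                   # number of agents, dimension of the estimate
targets = rng.standard_normal((N, d))         # t_i: minimizer of the local utility f_i
edges = [(i, (i + 1) % N) for i in range(N)]  # toy ring connectivity
theta = np.zeros((N, d))                      # one row per agent

for n in range(1, 20_000 + 1):
    gamma = 1.0 / n
    # Step 1: each agent takes a noisy local gradient step (temporary estimates).
    tmp = theta - gamma * ((theta - targets) + 0.1 * rng.standard_normal((N, d)))
    # Step 2: pairwise gossip. One random edge averages its two temporary estimates;
    # the corresponding weight matrix W is doubly stochastic by construction.
    i, j = edges[rng.integers(len(edges))]
    W = np.eye(N)
    W[i, i] = W[i, j] = W[j, i] = W[j, j] = 0.5
    theta = W @ tmp

# All agents agree, near the minimizer of sum_i f_i (the mean of the targets).
print(theta[0], targets.mean(axis=0))
```

Note the design point from the talk: only the two endpoints of the chosen edge exchange anything, which is what makes the doubly stochastic constraint implementable in a network.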

So now let's move on to the asymptotic analysis.

We define a vector θ_n in which we stack all the estimates of all the agents. We are interested in two things: first, in the average of the estimates, and also in the disagreements, that is, how the local estimates differ from the average; these quantities are written below.
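In symbols (my reconstruction of the quantities just described):

$$\theta_n \;=\; \bigl(\theta_{n,1}^{\top},\dots,\theta_{n,N}^{\top}\bigr)^{\top}, \qquad \langle\theta_n\rangle \;=\; \frac1N\sum_{i=1}^{N}\theta_{n,i}, \qquad \text{disagreements: } \theta_{n,i}-\langle\theta_n\rangle .$$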

The difficult technical point is to propose satisfying conditions for stability, which is the property that the whole vector θ_n remains bounded with probability one.

We provide such a condition using a Lyapunov function; when we are interested in minimization, we can take the function F itself as our Lyapunov function. I'm not going to enter into the details.

The first result is that, with probability one, consensus is achieved, meaning that all the estimates eventually reach the same value, and that the average of the estimates converges to the set of roots of the gradient of F, which is our target. Moreover, if this set is made of isolated points, then we converge to one of these points.

We also prove another result, which is a central limit theorem. I'm not going to enter into the details, but it says that, under good assumptions, the normalized disagreements converge in distribution to a Gaussian vector. And I should point out that the distribution of this Gaussian vector is degenerate, since all the random vectors here are the same.
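A loose paraphrase of the statement, with notation guessed from the talk (θ* is the limit root of ∇F, 𝟙 the all-ones vector; the exact normalization in the paper may differ):

$$\gamma_n^{-1/2}\,\bigl(\theta_n-\mathbb 1\otimes\theta^\star\bigr)\;\xrightarrow{\;d\;}\;\mathcal N(0,\Sigma),$$

where the covariance $\Sigma$ is degenerate: it puts all its mass on the consensus subspace $\{\mathbb 1\otimes x\}$.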

So, instead of going into the details, let me draw your attention to three consequences of this result. The first one is that the estimates converge at speed square root of γ_n, and a consequence of this is that the algorithm performs as well as the centralized algorithm.

Also, from the degenerate nature of the distribution we saw, we can conclude that the disagreements are negligible with respect to the difference between the estimates and the target value, because the distribution of the limiting vector puts mass only on the consensus line.

Another remark is that, since agreement is achieved at a very high speed compared to the much slower speed at which we move towards the target value, we do not need to communicate too often.

Actually, we can show that if the probability that a communication occurs goes to zero, then, as long as it does not decay faster than one over the square root of n, we can still guarantee convergence, and hence save network resources while keeping the same performance; a sketch is given below.
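A minimal sketch of this sparse-communication variant, reusing the toy setup from the earlier snippet; the decay p_n = n^(-1/4), which vanishes more slowly than 1/√n, is my illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 2
targets = rng.standard_normal((N, d))
edges = [(i, (i + 1) % N) for i in range(N)]
theta = np.zeros((N, d))

for n in range(1, 20_000 + 1):
    gamma = 1.0 / n
    # Step 1 is unchanged: noisy local gradient steps.
    tmp = theta - gamma * ((theta - targets) + 0.1 * rng.standard_normal((N, d)))
    # Step 2 happens only with probability p_n -> 0 (more slowly than 1/sqrt(n)).
    p_n = n ** -0.25
    if rng.random() < p_n:
        i, j = edges[rng.integers(len(edges))]
        tmp[[i, j]] = tmp[[i, j]].mean(axis=0)  # pairwise average, a doubly stochastic step
    theta = tmp

print(np.abs(theta - targets.mean(axis=0)).max())  # disagreement and error stay small
```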

So now let's move on to constrained optimization. The problem is the same, except that now my estimates are required to remain in a compact and convex set D, and this set is known by all the agents.

We can use pretty much the same algorithm, except that now, if a temporary estimate goes out of the set D, we bring it back to the boundary of D using a projection; the second, gossip step is unchanged. A sketch of the projected step is given below.
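A minimal sketch of the projected first step, assuming for illustration that D is a box [0, p_max]^d, so that the Euclidean projection is a simple clip; the box and the function name are mine:

```python
import numpy as np

def projected_local_step(theta_i, noisy_grad, gamma, p_max):
    """Step 1 with projection: local gradient step, then project back onto D = [0, p_max]^d."""
    tmp = theta_i - gamma * noisy_grad
    return np.clip(tmp, 0.0, p_max)  # Euclidean projection onto the box

# Step 2 (gossip) is unchanged: since D is convex and the weights are stochastic,
# convex combinations of projected estimates remain inside D.
```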

Two remarks. The first remark is that there are no more stability issues, which was the difficult technical point, so that is good news; but the bad news is that the projection introduces other technical difficulties.

The result is that consensus is still achieved with probability one, and that the average of the estimates converges to the set of KKT points; again, if this set is made of isolated points, then the average converges to one of them.

Perhaps I should skip the sketch of the proof and get back to it at the end if I have time.

As an illustration, we are going to address the problem of the interference channel. We have two source-destination pairs, and we assume that there is no channel state information at the transmitters: only the destinations observe the channel gains.

The goal here is for the destinations to tune the transmit powers in order to minimize the probability of error of the channels.

The exact function we want to minimize is actually a weighted sum of the probabilities of error, one for each source-destination pair, and the constraints come from the fact that each transmitter cannot use more power than a given threshold; formally, the program looks like the one written below.
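A hedged formalization of the problem as described, with notation of my own (w_k are the weights, P_{e,k} the error probability of pair k, and p̄_k the power budgets):

$$\min_{p_1,\dots,p_K}\;\sum_{k=1}^{K} w_k\,P_{e,k}(p_1,\dots,p_K)\qquad\text{subject to}\qquad 0\le p_k\le \bar p_k,\;\; k=1,\dots,K.$$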

Each user, that is, each destination, has an estimate of all the transmit powers, and we apply the algorithm we just presented, replacing the abstract gradient of F with our specific function to minimize, and using the specific set D imposed by the power constraints.

From what we have seen, we can guarantee that this algorithm is going to converge, with probability one, to a local minimum, or at least to a KKT point.

We could also use a more involved setting, with more than one channel from each source to its destination.

So let me show, very rapidly, some numerical results. Here we have plotted the power allocations estimated by the two transmitters over time.

There are two curves each time: one is for the centralized algorithm and the other is for the distributed algorithm. Here, the centralized one is in blue and the distributed one in red; there, the centralized one is in green and this is the distributed one. This panel is for the first user and this one is for the second user.

And we see that eventually (well, here it is not so clear, but here it is) we reach a stable point, as predicted by our results.

So let me conclude. First, we studied a distributed stochastic gradient algorithm, in both the constrained and the unconstrained frameworks. We provided realistic sufficient conditions for convergence with probability one, and we stated a central limit theorem, only for the unconstrained case.

As for future work, we would like to get rid of the doubly stochasticity constraint on the weight matrices, since it is a bit cumbersome for the network, and we would also like to establish such a central limit theorem in the constrained case.

Thank you for your attention.

[Question, largely inaudible: about the weights given to the nodes, and about the use of pairwise gossip.]

Yeah, our point here is not that: we are absolutely aware of that algorithm. This is what we say here: the algorithm is known, and we do not claim it as a contribution; [remainder inaudible].