that

it there is a joint work with uh

and i is due that of a seems to the found my

the

and you guy and uh a a separation much very from us C

um so first

the problem we are considering a a is that dynamic spectrum access and on the model

and the is illustrated that system where looking at to um

there are and need given then channels uh

i i don't get it it to a primary system that's using a slot transmission structure

and uh of second and their use of the objective here is to find the the i'd slot and the

limit its sensing

oh we a at any given so uh a set use the can only choose K channels to set

and if

the channel is i do it what transmit and uh

if you transmit if you patch what i'd slot you will receive one unit to work then essentially is you

get one packet delete

and the problem here is how to way i choose

which K channels was we should sense and the beginning of a

based on

observation as a visual have so far

and to answer this question the first thing we need to specify

is the model of channel all two

so if we look at

each channel so

each channel is essentially piece the i do is a stochastic process

and each channel is model as a stock as a process

with on parameter

so we do know what kind of stochastic process with be ways

but the prime in

so the simple liz

so the process we can be

i be here will be a i pro

so the channel or as the ah for channel i will the i people in the only with a known

means it

so see the i'd say is the probability channellise i'd in any given as

and

changing

i

and not

we want to choose

without knowing which channel was more likely to be bus so i don't

imagining for pink is with on the by S and you want to get as many had

oh

so now how do we choose

channel

um K out of

you know there is lot

and objective is to maximise house swoop

so what is the maximum throughput would we kind

even assume

we not the perfect

we know all the C

in that scenario we no we should always sense the channel with the largest to seat

with

the max probability

i

and you that case

i was we'll put we look at it is just see

as every i does well we get

when you in

the now without knowing all the see guys is it possible to achieve the same maximum throughput be fine by

the non model seen that

and if yes how

a turns of this is actually a well studied problem you've we take a D model

this is you was special case of more the amp and

vol

and the region problem we can explain using this slot machine with a dependent are

and that you just time count is that like one um to play

but we don't know the we word statistics

and the we word processor soon i

and maximise long term

i

as essence of this problem is in the tradeoff between exploitation and exploration

so you to it to like

we want to play a a a um that has a large is the sample me based on what we

have

is there are so

but that same time

we have to play each are sufficiently many both

a sufficient number of times it so that our some whole me is close enough to the true

oh that's essentially the

i which a problem

and uh you can be read was we formulate it in the following way

see the one to see that and is the we what mean of each are which is a little or

no

and the maximum we what we can i get over a reading of lance T

you

it has it's no is given by the maximal see that i use subscript the one here to in the

extreme the here

to see the white the largest one multiplied by right

which is achieved by always playing the bass ah

very into it

or not we don't know all the see so we don't know which way is the best um

we can design design a policy high

that tell us which um to play based on what we have

there are so far

and uh this is the re what we can get

for a given policy

and how do we know how to these policy is

it's measured by that you friends

to these ideals in their real way where you do not

so this is called the required

also called cost of nor learning that's the cost we pay

for not snowy them uh

and

very intuitive to this

every time we spend time

a a a a a a a that's not the

best

we waiting incur a loss of C that one minus seat

we are spending time

i

a multiplied by the expected time

we

yeah play a the ice best

so to minimize wet

we want to minimize a model

spend a each

and uh object here is to minimize the growth rate of the web

she

it's intuitive

whether always grew of with time because we and never stop playing

each a

to make sure our sample means zach

and we can also see

if we have a sampling year roles of whether with time T then we can achieve the maximum average reward

of the best uh

a find five

the normal models area because if you divide by T C these is not be know you with

and there is a problem is nice solved by at all

i five

they show that

the minimum required will street is lot

with

T

they also found the best leading constant uh and given in terms of kl divergence

and they the also construct a the policy to achieve peace past

well

but was rather complicated policy

and uh about ten to a fifty years later

there's simple policies to achieve lot

at

this is

essentially a index

type of policy based on simple me

one of the most uh uh uh well no one is

proposed by our uh

is the so called U C B one policy

so in this case what compute the in that

for each a are at each time T

the in is them play the sum of me of these are

class

this

turn is called of the confidence of on

used term is given by a square with the two of log T divide by G i

yeah i is the number of times we have played arm

i

not a car

so you can see

if we are playing a a a a significant a last log T used remote allow

it will make the index the large as

that we will have to

so the is always played around with a large

oh use actually were thing

the policy to at you spend a lot of keep time

each uh

which that's what line all be has shown that

what we have

so you've what it you know i be model though wow a channel occupancy

the problem is completely so

it's just a special case of the classic the um and

so you we consider a more general more um

complex model that's the each channel is a markov chain

to state B the i don't

it change according to this

a a transition that

here

and we don't know the transition problem it's we don't know P zero one Q one

at all the channels are still has had an equal they have the same transition

not

how do we design channels action policy

to maximise

swoop

turns out

this is a variation of the uh a more yeah and it problem but has never been self so we

are

essentially a on the first to to address

these

more down band

and um my

and the what to challenges here

so first here we can see if we have a markovian model for each channel

then the optimal policy even if we know the transition probability

is no longer the state home one channel

we need to

so each across channels based on our prediction or which channel is more likely to

i do like these

a fixed

because of the channel memory makes prediction

so that in this case we can not simply

minimize our required by funding at that time was spent on a bad channel

because

there's no really bad channel

say we need to switch the was smart we are all switching on much

so you actually large mine to or what is the best way

to switching on among channels based on how observation

and in general

there are infinite possibilities of

how to to switching

so the wheel are

from these infinite possibility which one is the bad

so that's really that difficult part

and perhaps the reason that this problem has not addressed

a lead to each

so actually turns out

if we do one of the transition probability

it's is this problem itself is complex enough not this is is actually the S is multi um band problem

and they're not that that makes

uh formulate that we do ninety eight eight

and this problem in general is peace space ha

that's shown but have them each one that's

the cool think here is we have a special recipe spend it

the think we are you ways here is the only a two state markov G channel each channel

and we have shown and they no model

the optimal policy is a simple now

and do this is uh a a a a real work uh um have down with a a couple of

clever with er

so the optimality is

current to and there's certain conditions

so if we have a key one one we to that P zero one which be is is that daisy

chain

it's more likely to stay rather than change state

then of to melody is generally um shoe

but that you are the think like you P was model that P zero once a sure what three channels

and the all K equal to and mine there's one

but we come it's can gently these holds for all and for

you you a rising case we have be able to prove

so but that's

assume we can this time your to project or is true then we know and the not model myopic policy

is the optimal

and the more significant to these on the the problem

is that the mouth you policy has a simple so a universal struck

oh which means a we need to know is what if P one mine's squid to that P zero one

we always switch in this way

as we stay at a i don't channel if this channel with current the sense is i don't who was

stay with

you it becomes a it then we switched to the channel visited the log is

that's are are way of speech

and if

you want one small the

if zero one then way switching a you from way

we actually stay if the channel was a the and the was which if becomes i with where

why you re what then we go to somewhere else

and how the with switch with switched to the channel most recently visited a are all channels with a T

that you most of all

you've there are not such an no

the we switch to the channel busy long as

anyway

there essentially a of this says is all optimal we of change

among channels

i'm only takes two possible

depending on the transition

so that's really

makes the are no problem

significant a simpler to some instead of searching my infinite possible ways of how to switching on channels

we know there are only two possible

if we don't to P zero one

what is either there this way or that

the

we can formulate a sort of conceptual more the and it

well

so we treat each way of channel switching as a a

you really can see that what is a a a remote on a problem

is nothing but a choice

a possible option

have and we don't know which of change the bass

as exactly what do we have

so then not we'll a which um is that what are we only have forms we of switching

so we want to do which one is the good one

because we don't know the transition probably so we don't know which one to choose

so it same we just read of the problem to the class more they are banded problem

not that i C there are two channel is we have to address

the first one the price are we have to answer is how long to play each i

so a region that we just play one as we got some word for the classic

model

but not a a a a a one way of switching on much

so we have to follow that we of switching among channels long enough you all there to see if i

so how long we play

and terms not to the optimal lattice of formal each way use spell star

depends on the transition probability

unfortunately fortunately and that's something we don't not

so that's one they we have to dress

and the out way is the reward in this case and the longer i be kinda

not even independent

oh i be of rows are

because

each um is are only able way of switching on among channels

and that they are also reaching a on the same as that of and J

they are no longer

and that right

so to address these two challenges that's our uh solution here

so first how long to play each uh

the optimal lattice of playing

each arm of only each we of a channel switching

depends on the parameters are no priming mean

so to get a a a round of was back to the trick we have here is

we play a each and with increasing that

so we do not fixed

a a specific lance for playing each a

but rather choose a sequence that rules with

and

we will pace

certain price in terms of required

but we can choose the increasing read of view sequence which really small a lot a lot of T

and

do the price a pay we pay will be how which we really

a

yeah a thing to deal with not i T samples

um what we did it is we um proved a modified each one of thing

which essentially is the chief for proving a lot required to in the classic i D and eight uh

model

so the uh channel thing an eh

uh are we should not foundation is all the conditional um

expectations should have should have the same me so that a conditional exactly expectation should be the same me

and then we come by to the probability that

the sample mean is to far from the truth

and in this case we how we don't have identical um every we sample anymore

so we need to to modify that because the conditional expectation no longer the same

but we can get a the result if the conditional expectations are with thing

a certain range of each other

so

then we can use similar techniques to prove there is

and the what we have shown here is

we can achieve a which very close to a log all there for the required

essentially

in in front of a lot we don't have a constant anymore is

it's but a you know increasing sequence

but this sequence can be a it is small

as slow as you want

so you choose

a small you want to you hug close you want to get a lot T and based on that

we can choose

the increasing the for each

L N

so i was give the details

on the relation between G T and uh L T actually very simple

is just showing at each time

what is the current L where you

and uh that's a it or what is conclude um

this paper here

so what

we have considered a a a a of the problem was a multi bit it by um dynamic spectrum access

um we believe this general approach would be useful for for solving a either

uh rest we spend it probably even a mark of general mark of C your process is on the primary

a what is the message here

the method here is if we are dealing with a small T are bandit problem

with a known

uh then them

and we want to achieve

the same i reach word as we can get and the not

so in general problem is intractable

but the problem may have a whole

if

and the the not model

the optimal policy has what we call a find that option

by means

there are only fun and two ways of

do we find the and optimal policy only has four that possible for

depending on the primary

don't we treat each possible for

as as a a a a we don't to know the prior

parameters then we tried to or which one is actually a one for that and a line on them

so that's the general message

but of course for each specific problem we have to but yeah sure

whether log

oh i'm sure close to log where can be but she

and in the context of dynamic spectrum access that's great we have been focusing are

the problem is really possible because house

of this i a universal structure of the our P policy for that

no model pace

and uh i a little bit advertisement

um i be giving a another problem related a problem being about forty minutes

uh

because this one

a uh we can only somehow for a very special cases where you have only to state markov chain and

all the channels are are are arms a stochastic identical

the what if we're dealing with a general fine state markov chain with the no in the transition problem

what can we do there

so this is paper actually is a dressing

this complete be general model

um but we can also achieve not require

but

the definition of the required it is uh

we

oh

oh

path

i don't one person something to sort of question from the speech

true

oh i yes

so that's essentially here

you that cool

yes yeah

so

that the required we get it is these G Q locked chi right so we want chi G to grow

very slow is time T

and uh so if you want to achieve a lot a lot should be

uh for chi chi

then

eventually do sequence

uh

this is just

the number of

but what is the value of L for the current

so

L one i is

well we start we play one i L L one

slot

so then for the first L once lots the these sequence was what have the same value a one

for the that

what a a a a a model be out to we play out to time

so for the next L tools as well as the but will be out

so

L is increasing

so these sequence will be also be in crazy but even slower or

because it's repeating the same value

but it

yeah

okay

or running times to time well just a record some

um

i field to see if a connection was not little huh but this is really trying to learn

um

trying to maximise this throughput put and there known parameters

but a a is more like the inclusion was your peer users here we even only have one

uh and secondary user

i i i probably miss

um

but an action that i i don't really see whether two problems i really to

um

maybe we can talk more data

you to to rooms

so so let's

that's

the speaker