0:00:15 | that |
---|---|

0:00:17 | it there is a joint work with uh |

0:00:19 | and i is due that of a seems to the found my |

0:00:22 | the |

0:00:23 | and you guy and uh a a separation much very from us C |

0:00:27 | um so first |

0:00:29 | the problem we are considering a a is that dynamic spectrum access and on the model |

0:00:34 | and the is illustrated that system where looking at to um |

0:00:38 | there are and need given then channels uh |

0:00:41 | i i don't get it it to a primary system that's using a slot transmission structure |

0:00:46 | and uh of second and their use of the objective here is to find the the i'd slot and the |

0:00:52 | limit its sensing |

0:00:53 | oh we a at any given so uh a set use the can only choose K channels to set |

0:00:59 | and if |

0:01:00 | the channel is i do it what transmit and uh |

0:01:04 | if you transmit if you patch what i'd slot you will receive one unit to work then essentially is you |

0:01:09 | get one packet delete |

0:01:11 | and the problem here is how to way i choose |

0:01:14 | which K channels was we should sense and the beginning of a |

0:01:18 | based on |

0:01:20 | observation as a visual have so far |

0:01:21 | and to answer this question the first thing we need to specify |

0:01:25 | is the model of channel all two |

0:01:27 | so if we look at |

0:01:29 | each channel so |

0:01:30 | each channel is essentially piece the i do is a stochastic process |

0:01:34 | and each channel is model as a stock as a process |

0:01:37 | with on parameter |

0:01:39 | so we do know what kind of stochastic process with be ways |

0:01:42 | but the prime in |

0:01:44 | so the simple liz |

0:01:45 | so the process we can be |

0:01:47 | i be here will be a i pro |

0:01:50 | so the channel or as the ah for channel i will the i people in the only with a known |

0:01:55 | means it |

0:01:56 | so see the i'd say is the probability channellise i'd in any given as |

0:02:01 | and |

0:02:02 | changing |

0:02:02 | i |

0:02:04 | and not |

0:02:05 | we want to choose |

0:02:07 | without knowing which channel was more likely to be bus so i don't |

0:02:11 | imagining for pink is with on the by S and you want to get as many had |

0:02:16 | oh |

0:02:17 | so now how do we choose |

0:02:18 | channel |

0:02:19 | um K out of |

0:02:20 | you know there is lot |

0:02:22 | and objective is to maximise house swoop |

0:02:25 | so what is the maximum throughput would we kind |

0:02:28 | even assume |

0:02:29 | we not the perfect |

0:02:31 | we know all the C |

0:02:33 | in that scenario we no we should always sense the channel with the largest to seat |

0:02:38 | with |

0:02:39 | the max probability |

0:02:40 | i |

0:02:41 | and you that case |

0:02:42 | i was we'll put we look at it is just see |

0:02:45 | as every i does well we get |

0:02:46 | when you in |

0:02:48 | the now without knowing all the see guys is it possible to achieve the same maximum throughput be fine by |

0:02:54 | the non model seen that |

0:02:55 | and if yes how |

0:02:58 | a turns of this is actually a well studied problem you've we take a D model |

0:03:03 | this is you was special case of more the amp and |

0:03:06 | vol |

0:03:07 | and the region problem we can explain using this slot machine with a dependent are |

0:03:14 | and that you just time count is that like one um to play |

0:03:17 | but we don't know the we word statistics |

0:03:20 | and the we word processor soon i |

0:03:23 | and maximise long term |

0:03:25 | i |

0:03:26 | as essence of this problem is in the tradeoff between exploitation and exploration |

0:03:32 | so you to it to like |

0:03:33 | we want to play a a a um that has a large is the sample me based on what we |

0:03:37 | have |

0:03:38 | is there are so |

0:03:39 | but that same time |

0:03:40 | we have to play each are sufficiently many both |

0:03:44 | a sufficient number of times it so that our some whole me is close enough to the true |

0:03:49 | oh that's essentially the |

0:03:51 | i which a problem |

0:03:53 | and uh you can be read was we formulate it in the following way |

0:03:56 | see the one to see that and is the we what mean of each are which is a little or |

0:04:01 | no |

0:04:02 | and the maximum we what we can i get over a reading of lance T |

0:04:07 | you |

0:04:08 | it has it's no is given by the maximal see that i use subscript the one here to in the |

0:04:13 | extreme the here |

0:04:14 | to see the white the largest one multiplied by right |

0:04:17 | which is achieved by always playing the bass ah |

0:04:20 | very into it |

0:04:22 | or not we don't know all the see so we don't know which way is the best um |

0:04:26 | we can design design a policy high |

0:04:29 | that tell us which um to play based on what we have |

0:04:32 | there are so far |

0:04:33 | and uh this is the re what we can get |

0:04:36 | for a given policy |

0:04:38 | and how do we know how to these policy is |

0:04:41 | it's measured by that you friends |

0:04:43 | to these ideals in their real way where you do not |

0:04:46 | so this is called the required |

0:04:48 | also called cost of nor learning that's the cost we pay |

0:04:51 | for not snowy them uh |

0:04:53 | and |

0:04:54 | very intuitive to this |

0:04:56 | every time we spend time |

0:04:58 | a a a a a a a that's not the |

0:05:01 | best |

0:05:02 | we waiting incur a loss of C that one minus seat |

0:05:05 | we are spending time |

0:05:07 | i |

0:05:08 | a multiplied by the expected time |

0:05:10 | we |

0:05:11 | yeah play a the ice best |

0:05:13 | so to minimize wet |

0:05:14 | we want to minimize a model |

0:05:16 | spend a each |

0:05:18 | and uh object here is to minimize the growth rate of the web |

0:05:23 | she |

0:05:23 | it's intuitive |

0:05:25 | whether always grew of with time because we and never stop playing |

0:05:28 | each a |

0:05:29 | to make sure our sample means zach |

0:05:32 | and we can also see |

0:05:34 | if we have a sampling year roles of whether with time T then we can achieve the maximum average reward |

0:05:40 | of the best uh |

0:05:41 | a find five |

0:05:42 | the normal models area because if you divide by T C these is not be know you with |

0:05:49 | and there is a problem is nice solved by at all |

0:05:53 | i five |

0:05:54 | they show that |

0:05:55 | the minimum required will street is lot |

0:05:58 | with |

0:05:58 | T |

0:05:59 | they also found the best leading constant uh and given in terms of kl divergence |

0:06:04 | and they the also construct a the policy to achieve peace past |

0:06:08 | well |

0:06:09 | but was rather complicated policy |

0:06:11 | and uh about ten to a fifty years later |

0:06:14 | there's simple policies to achieve lot |

0:06:17 | at |

0:06:17 | this is |

0:06:18 | essentially a index |

0:06:20 | type of policy based on simple me |

0:06:22 | one of the most uh uh uh well no one is |

0:06:25 | proposed by our uh |

0:06:27 | is the so called U C B one policy |

0:06:29 | so in this case what compute the in that |

0:06:31 | for each a are at each time T |

0:06:34 | the in is them play the sum of me of these are |

0:06:38 | class |

0:06:39 | this |

0:06:39 | turn is called of the confidence of on |

0:06:42 | used term is given by a square with the two of log T divide by G i |

0:06:47 | yeah i is the number of times we have played arm |

0:06:50 | i |

0:06:51 | not a car |

0:06:53 | so you can see |

0:06:54 | if we are playing a a a a significant a last log T used remote allow |

0:07:00 | it will make the index the large as |

0:07:02 | that we will have to |

0:07:03 | so the is always played around with a large |

0:07:06 | oh use actually were thing |

0:07:08 | the policy to at you spend a lot of keep time |

0:07:11 | each uh |

0:07:12 | which that's what line all be has shown that |

0:07:15 | what we have |

0:07:16 | so you've what it you know i be model though wow a channel occupancy |

0:07:20 | the problem is completely so |

0:07:22 | it's just a special case of the classic the um and |

0:07:27 | so you we consider a more general more um |

0:07:30 | complex model that's the each channel is a markov chain |

0:07:33 | to state B the i don't |

0:07:35 | it change according to this |

0:07:37 | a a transition that |

0:07:38 | here |

0:07:39 | and we don't know the transition problem it's we don't know P zero one Q one |

0:07:43 | at all the channels are still has had an equal they have the same transition |

0:07:48 | not |

0:07:48 | how do we design channels action policy |

0:07:51 | to maximise |

0:07:52 | swoop |

0:07:53 | turns out |

0:07:54 | this is a variation of the uh a more yeah and it problem but has never been self so we |

0:08:00 | are |

0:08:00 | essentially a on the first to to address |

0:08:03 | these |

0:08:04 | more down band |

0:08:05 | and um my |

0:08:07 | and the what to challenges here |

0:08:10 | so first here we can see if we have a markovian model for each channel |

0:08:15 | then the optimal policy even if we know the transition probability |

0:08:19 | is no longer the state home one channel |

0:08:21 | we need to |

0:08:23 | so each across channels based on our prediction or which channel is more likely to |

0:08:27 | i do like these |

0:08:29 | a fixed |

0:08:29 | because of the channel memory makes prediction |

0:08:33 | so that in this case we can not simply |

0:08:36 | minimize our required by funding at that time was spent on a bad channel |

0:08:41 | because |

0:08:41 | there's no really bad channel |

0:08:43 | say we need to switch the was smart we are all switching on much |

0:08:48 | so you actually large mine to or what is the best way |

0:08:51 | to switching on among channels based on how observation |

0:08:55 | and in general |

0:08:56 | there are infinite possibilities of |

0:08:58 | how to to switching |

0:08:59 | so the wheel are |

0:09:00 | from these infinite possibility which one is the bad |

0:09:03 | so that's really that difficult part |

0:09:05 | and perhaps the reason that this problem has not addressed |

0:09:09 | a lead to each |

0:09:10 | so actually turns out |

0:09:12 | if we do one of the transition probability |

0:09:15 | it's is this problem itself is complex enough not this is is actually the S is multi um band problem |

0:09:21 | and they're not that that makes |

0:09:23 | uh formulate that we do ninety eight eight |

0:09:26 | and this problem in general is peace space ha |

0:09:29 | that's shown but have them each one that's |

0:09:32 | the cool think here is we have a special recipe spend it |

0:09:35 | the think we are you ways here is the only a two state markov G channel each channel |

0:09:40 | and we have shown and they no model |

0:09:42 | the optimal policy is a simple now |

0:09:46 | and do this is uh a a a a real work uh um have down with a a couple of |

0:09:50 | clever with er |

0:09:51 | so the optimality is |

0:09:53 | current to and there's certain conditions |

0:09:55 | so if we have a key one one we to that P zero one which be is is that daisy |

0:09:59 | chain |

0:10:00 | it's more likely to stay rather than change state |

0:10:03 | then of to melody is generally um shoe |

0:10:06 | but that you are the think like you P was model that P zero once a sure what three channels |

0:10:12 | and the all K equal to and mine there's one |

0:10:14 | but we come it's can gently these holds for all and for |

0:10:18 | you you a rising case we have be able to prove |

0:10:21 | so but that's |

0:10:22 | assume we can this time your to project or is true then we know and the not model myopic policy |

0:10:28 | is the optimal |

0:10:30 | and the more significant to these on the the problem |

0:10:33 | is that the mouth you policy has a simple so a universal struck |

0:10:39 | oh which means a we need to know is what if P one mine's squid to that P zero one |

0:10:43 | we always switch in this way |

0:10:46 | as we stay at a i don't channel if this channel with current the sense is i don't who was |

0:10:50 | stay with |

0:10:51 | you it becomes a it then we switched to the channel visited the log is |

0:10:56 | that's are are way of speech |

0:10:58 | and if |

0:10:59 | you want one small the |

0:11:00 | if zero one then way switching a you from way |

0:11:03 | we actually stay if the channel was a the and the was which if becomes i with where |

0:11:08 | why you re what then we go to somewhere else |

0:11:11 | and how the with switch with switched to the channel most recently visited a are all channels with a T |

0:11:17 | that you most of all |

0:11:18 | you've there are not such an no |

0:11:20 | the we switch to the channel busy long as |

0:11:23 | anyway |

0:11:24 | there essentially a of this says is all optimal we of change |

0:11:29 | among channels |

0:11:30 | i'm only takes two possible |

0:11:32 | depending on the transition |

0:11:34 | so that's really |

0:11:36 | makes the are no problem |

0:11:38 | significant a simpler to some instead of searching my infinite possible ways of how to switching on channels |

0:11:44 | we know there are only two possible |

0:11:47 | if we don't to P zero one |

0:11:48 | what is either there this way or that |

0:11:51 | the |

0:11:52 | we can formulate a sort of conceptual more the and it |

0:11:55 | well |

0:11:56 | so we treat each way of channel switching as a a |

0:12:00 | you really can see that what is a a a remote on a problem |

0:12:04 | is nothing but a choice |

0:12:05 | a possible option |

0:12:07 | have and we don't know which of change the bass |

0:12:09 | as exactly what do we have |

0:12:11 | so then not we'll a which um is that what are we only have forms we of switching |

0:12:16 | so we want to do which one is the good one |

0:12:18 | because we don't know the transition probably so we don't know which one to choose |

0:12:22 | so it same we just read of the problem to the class more they are banded problem |

0:12:27 | not that i C there are two channel is we have to address |

0:12:31 | the first one the price are we have to answer is how long to play each i |

0:12:35 | so a region that we just play one as we got some word for the classic |

0:12:39 | model |

0:12:40 | but not a a a a a one way of switching on much |

0:12:43 | so we have to follow that we of switching among channels long enough you all there to see if i |

0:12:50 | so how long we play |

0:12:52 | and terms not to the optimal lattice of formal each way use spell star |

0:12:57 | depends on the transition probability |

0:12:59 | unfortunately fortunately and that's something we don't not |

0:13:02 | so that's one they we have to dress |

0:13:04 | and the out way is the reward in this case and the longer i be kinda |

0:13:08 | not even independent |

0:13:10 | oh i be of rows are |

0:13:12 | because |

0:13:12 | each um is are only able way of switching on among channels |

0:13:15 | and that they are also reaching a on the same as that of and J |

0:13:19 | they are no longer |

0:13:20 | and that right |

0:13:22 | so to address these two challenges that's our uh solution here |

0:13:26 | so first how long to play each uh |

0:13:28 | the optimal lattice of playing |

0:13:30 | each arm of only each we of a channel switching |

0:13:34 | depends on the parameters are no priming mean |

0:13:37 | so to get a a a round of was back to the trick we have here is |

0:13:41 | we play a each and with increasing that |

0:13:43 | so we do not fixed |

0:13:45 | a a specific lance for playing each a |

0:13:47 | but rather choose a sequence that rules with |

0:13:51 | and |

0:13:51 | we will pace |

0:13:52 | certain price in terms of required |

0:13:55 | but we can choose the increasing read of view sequence which really small a lot a lot of T |

0:14:01 | and |

0:14:02 | do the price a pay we pay will be how which we really |

0:14:05 | a |

0:14:06 | yeah a thing to deal with not i T samples |

0:14:10 | um what we did it is we um proved a modified each one of thing |

0:14:15 | which essentially is the chief for proving a lot required to in the classic i D and eight uh |

0:14:21 | model |

0:14:22 | so the uh channel thing an eh |

0:14:25 | uh are we should not foundation is all the conditional um |

0:14:29 | expectations should have should have the same me so that a conditional exactly expectation should be the same me |

0:14:35 | and then we come by to the probability that |

0:14:38 | the sample mean is to far from the truth |

0:14:41 | and in this case we how we don't have identical um every we sample anymore |

0:14:46 | so we need to to modify that because the conditional expectation no longer the same |

0:14:50 | but we can get a the result if the conditional expectations are with thing |

0:14:54 | a certain range of each other |

0:14:56 | so |

0:14:57 | then we can use similar techniques to prove there is |

0:15:01 | and the what we have shown here is |

0:15:03 | we can achieve a which very close to a log all there for the required |

0:15:08 | essentially |

0:15:09 | in in front of a lot we don't have a constant anymore is |

0:15:12 | it's but a you know increasing sequence |

0:15:14 | but this sequence can be a it is small |

0:15:17 | as slow as you want |

0:15:18 | so you choose |

0:15:19 | a small you want to you hug close you want to get a lot T and based on that |

0:15:24 | we can choose |

0:15:25 | the increasing the for each |

0:15:27 | L N |

0:15:28 | so i was give the details |

0:15:30 | on the relation between G T and uh L T actually very simple |

0:15:34 | is just showing at each time |

0:15:36 | what is the current L where you |

0:15:39 | and uh that's a it or what is conclude um |

0:15:43 | this paper here |

0:15:44 | so what |

0:15:45 | we have considered a a a a of the problem was a multi bit it by um dynamic spectrum access |

0:15:52 | um we believe this general approach would be useful for for solving a either |

0:15:57 | uh rest we spend it probably even a mark of general mark of C your process is on the primary |

0:16:03 | a what is the message here |

0:16:05 | the method here is if we are dealing with a small T are bandit problem |

0:16:09 | with a known |

0:16:10 | uh then them |

0:16:12 | and we want to achieve |

0:16:14 | the same i reach word as we can get and the not |

0:16:18 | so in general problem is intractable |

0:16:21 | but the problem may have a whole |

0:16:24 | if |

0:16:24 | and the the not model |

0:16:26 | the optimal policy has what we call a find that option |

0:16:30 | by means |

0:16:32 | there are only fun and two ways of |

0:16:34 | do we find the and optimal policy only has four that possible for |

0:16:39 | depending on the primary |

0:16:42 | don't we treat each possible for |

0:16:44 | as as a a a a we don't to know the prior |

0:16:47 | parameters then we tried to or which one is actually a one for that and a line on them |

0:16:53 | so that's the general message |

0:16:54 | but of course for each specific problem we have to but yeah sure |

0:16:59 | whether log |

0:17:00 | oh i'm sure close to log where can be but she |

0:17:04 | and in the context of dynamic spectrum access that's great we have been focusing are |

0:17:08 | the problem is really possible because house |

0:17:11 | of this i a universal structure of the our P policy for that |

0:17:15 | no model pace |

0:17:16 | and uh i a little bit advertisement |

0:17:19 | um i be giving a another problem related a problem being about forty minutes |

0:17:25 | uh |

0:17:25 | because this one |

0:17:27 | a uh we can only somehow for a very special cases where you have only to state markov chain and |

0:17:33 | all the channels are are are arms a stochastic identical |

0:17:37 | the what if we're dealing with a general fine state markov chain with the no in the transition problem |

0:17:44 | what can we do there |

0:17:45 | so this is paper actually is a dressing |

0:17:48 | this complete be general model |

0:17:50 | um but we can also achieve not require |

0:17:53 | but |

0:17:54 | the definition of the required it is uh |

0:17:56 | we |

0:17:57 | oh |

0:17:58 | oh |

0:18:02 | path |

0:18:03 | i don't one person something to sort of question from the speech |

0:18:08 | true |

0:18:15 | oh i yes |

0:18:16 | so that's essentially here |

0:18:20 | you that cool |

0:18:21 | yes yeah |

0:18:22 | so |

0:18:23 | that the required we get it is these G Q locked chi right so we want chi G to grow |

0:18:28 | very slow is time T |

0:18:30 | and uh so if you want to achieve a lot a lot should be |

0:18:34 | uh for chi chi |

0:18:36 | then |

0:18:36 | eventually do sequence |

0:18:38 | uh |

0:18:39 | this is just |

0:18:41 | the number of |

0:18:43 | but what is the value of L for the current |

0:18:46 | so |

0:18:47 | L one i is |

0:18:48 | well we start we play one i L L one |

0:18:51 | slot |

0:18:52 | so then for the first L once lots the these sequence was what have the same value a one |

0:18:58 | for the that |

0:18:59 | what a a a a a model be out to we play out to time |

0:19:02 | so for the next L tools as well as the but will be out |

0:19:06 | so |

0:19:06 | L is increasing |

0:19:08 | so these sequence will be also be in crazy but even slower or |

0:19:12 | because it's repeating the same value |

0:19:15 | but it |

0:19:16 | yeah |

0:19:19 | okay |

0:19:20 | or running times to time well just a record some |

0:19:31 | um |

0:19:33 | i field to see if a connection was not little huh but this is really trying to learn |

0:19:38 | um |

0:19:38 | trying to maximise this throughput put and there known parameters |

0:19:42 | but a a is more like the inclusion was your peer users here we even only have one |

0:19:48 | uh and secondary user |

0:19:49 | i i i probably miss |

0:19:51 | um |

0:19:52 | but an action that i i don't really see whether two problems i really to |

0:19:56 | um |

0:19:57 | maybe we can talk more data |

0:19:59 | you to to rooms |

0:20:01 | so so let's |

0:20:02 | that's |

0:20:02 | the speaker |