| 0:00:15 | i'm not like a | 
|---|
| 0:00:17 | and my dog adviser a woman devilish and that he picked him | 
|---|
| 0:00:22 | and i want to talk about user adaptation |
|---|
| 0:00:25 | in dialogue systems |
|---|
| 0:00:28 | so most of the state-of-the-art |
|---|
| 0:00:33 | dialogue systems and most of the production dialogue systems |
|---|
| 0:00:36 | are adopting |
|---|
| 0:00:39 | a generic strategy |
|---|
| 0:00:42 | so we have the same behavior | 
|---|
| 0:00:44 | for any user | 
|---|
| 0:00:46 | users | 
|---|
| 0:00:47 | and what we're going to do is to learn one strategy |
|---|
| 0:00:51 | for each of these users | 
|---|
| 0:00:55 | the problem with learning a strategy from scratch |
|---|
| 0:00:59 | is one has to do some exploration |
|---|
| 0:01:04 | and exploration leads to |
|---|
| 0:01:08 | very bad |
|---|
| 0:01:10 | performance in the first interactions |
|---|
| 0:01:13 | so we want to design | 
|---|
| 0:01:17 | a framework | 
|---|
| 0:01:18 | which is | 
|---|
| 0:01:20 | very good during the cold start phase |
|---|
| 0:01:24 | and it must also be good during the asymptotic |
|---|
| 0:01:29 | convergence phase |
|---|
| 0:01:31 | so we propose | 
|---|
| 0:01:34 | a process for user adaptation |
|---|
| 0:01:36 | and it is composed of several phases |
|---|
| 0:01:41 | and it goes this way |
|---|
| 0:01:44 | so let's say we have a bunch of robots representing dialogue systems |
|---|
| 0:01:49 | and each of these robots |
|---|
| 0:01:52 | learned a strategy versus a specific user |
|---|
| 0:01:57 | and they also gather |
|---|
| 0:01:58 | all the dialogues done with this user |
|---|
| 0:02:04 | so all the knowledge of these robots |
|---|
| 0:02:08 | is represented | 
|---|
| 0:02:09 | by the dialogues | 
|---|
| 0:02:11 | so we want to elect | 
|---|
| 0:02:15 | some representatives | 
|---|
| 0:02:16 | of the database |
|---|
| 0:02:18 | and for example gives a little bit and i did one | 
|---|
| 0:02:22 | and let's say now we have a target user |
|---|
| 0:02:25 | and we don't have a system |
|---|
| 0:02:27 | to dialogue with this target user so we want to design a system from |
|---|
| 0:02:31 | scratch | 
|---|
| 0:02:33 | and what we're going to do is to transfer the knowledge of one of the |
|---|
| 0:02:37 | representatives to the system |
|---|
| 0:02:39 | so at first we want to select the best representative to dialogue with our |
|---|
| 0:02:44 | target user |
|---|
| 0:02:47 | and we will try each of the representatives one by one |
|---|
| 0:02:51 | and at the end |
|---|
| 0:02:52 | we select the best dialogue system which is the blue one here |
|---|
| 0:02:58 | so now we transfer all the knowledge |
|---|
| 0:03:01 | to the new system | 
|---|
| 0:03:03 | so let's say we have | 
|---|
| 0:03:06 | a scratch system |
|---|
| 0:03:08 | and we're going to learn the strategy thanks to the knowledge transferred and also |
|---|
| 0:03:15 | all the dialogues done during the source selection phase |
|---|
| 0:03:19 | so we're going to use this new dialogue system |
|---|
| 0:03:23 | to dialogue with this user |
|---|
| 0:03:25 | and we collect more dialogues | 
|---|
| 0:03:28 | and then we can learn a new system more and more specialised |
|---|
| 0:03:32 | to this target user |
|---|
| 0:03:34 | and we repeat this process until we reach |
|---|
| 0:03:37 | a very specialised |
|---|
| 0:03:41 | dialogue system for the target user |
|---|
| 0:03:46 | so in the end we add the new |
|---|
| 0:03:48 | target system to the sources |
|---|
| 0:03:53 | so i will detail each of these phases |
|---|
| 0:03:56 | so the sources are dialogue managers |
|---|
| 0:04:00 | so dialogue managers are components of dialogue systems |
|---|
| 0:04:04 | and this manager takes as input a representation of the user utterance |
|---|
| 0:04:09 | for example i would like to book a flight to london |
|---|
| 0:04:13 | and the dialogue manager returns an action |
|---|
| 0:04:16 | for example a good field or a good nine | 
|---|
| 0:04:21 | and the usual way to design a dialogue manager |
|---|
| 0:04:27 | is to cast it as a reinforcement learning problem |
|---|
| 0:04:31 | so reinforcement learning problems |
|---|
| 0:04:35 | deal with an agent |
|---|
| 0:04:38 | in interaction with an environment |
|---|
| 0:04:40 | so for example our agent is a dialogue manager |
|---|
| 0:04:44 | and the environment will be a target user | 
|---|
| 0:04:48 | so the agent can take |
|---|
| 0:04:52 | an action |
|---|
| 0:04:53 | and the environment will react |
|---|
| 0:04:57 | and we can observe its reaction |
|---|
| 0:05:01 | so o prime is an observation and we can also observe a reward |
|---|
| 0:05:08 | so r prime |
|---|
| 0:05:09 | and given this observation and also the action taken |
|---|
| 0:05:14 | by the agent it can update |
|---|
| 0:05:17 | its internal state |
|---|
| 0:05:19 | so here we go from s to s prime |
|---|
| 0:05:24 | so we can deduce that |
|---|
| 0:05:27 | all the knowledge of the environment is contained |
|---|
| 0:05:31 | in the tuple s a |
|---|
| 0:05:35 | s prime and |
|---|
| 0:05:37 | r prime |
|---|
| 0:05:39 | so this is | 
|---|
| 0:05:41 | the main thing to know in reinforcement learning |
|---|
| 0:05:43 | so we have knowledge of the environment |
|---|
| 0:05:47 | taking the form of these samples |
|---|
| 0:05:49 | and we want to design a good strategy for the dialogue manager |
|---|
| 0:05:56 | and we call this strategy a policy so this is a function mapping |
|---|
| 0:06:02 | states to actions |
|---|
| 0:06:04 | and we want to find the optimal policy | 
|---|
| 0:06:06 | so the optimal policy | 
|---|
| 0:06:08 | is a policy which maximizes | 
|---|
| 0:06:10 | the cumulative reward |
|---|
| 0:06:12 | during the interaction |
|---|
| 0:06:14 | between the dialogue manager and the target user | 
|---|
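One compact way to write the objective just described is sketched here. The discount factor $\gamma$ and the dialogue horizon $T$ are assumptions introduced for the notation; the talk only says that the optimal policy maximizes the cumulative reward over the interaction.

$$\pi^{*} \in \arg\max_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t}\, r'_{t} \;\middle|\; a_t = \pi(s_t)\right]$$

Here $s_t$ is the internal state of the dialogue manager at turn $t$, $a_t$ is the action it takes, and $r'_t$ is the reward observed after that action, matching the samples $(s, a, s', r')$ mentioned earlier.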
| 0:06:19 | so now |
|---|
| 0:06:22 | there is an equivalence between the dialogue manager the |
|---|
| 0:06:26 | robots and a policy |
|---|
| 0:06:28 | so we want to find the best |
|---|
| 0:06:32 | policies to represent all the database |
|---|
| 0:06:36 | so this is the source selection phase |
|---|
| 0:06:39 | and we introduce and this is the main contribution of the paper |
|---|
| 0:06:43 | we introduce the policy-driven distance |
|---|
| 0:06:47 | so this is a metric |
|---|
| 0:06:48 | which computes |
|---|
| 0:06:50 | the behavioral differences between policies |
|---|
| 0:06:54 | so |
|---|
| 0:06:54 | we sample some states and we look at which action is taken |
|---|
| 0:07:00 | in each of these states |
|---|
| 0:07:03 | and for example one can see that the third one | 
|---|
| 0:07:07 | is very close to populate one | 
|---|
| 0:07:10 | and the yellow is very different to the to the little | 
|---|
| 0:07:15 | so one can see this policy-driven distance |
|---|
| 0:07:19 | as a binary vector |
|---|
| 0:07:22 | where the ones |
|---|
| 0:07:25 | represent the action taken in a given state |
|---|
| 0:07:29 | so for example |
|---|
| 0:07:31 | the robot will take these actions |
|---|
| 0:07:34 | and the binary vector will look like this |
|---|
| 0:07:37 | and if we combine this binary vector |
|---|
| 0:07:41 | with the euclidean norm |
|---|
| 0:07:43 | we have a unique quantity |
|---|
| 0:07:45 | which is the |
|---|
| 0:07:47 | policy-driven distance |
|---|
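The binary-vector view of the distance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy states and the three actions are hypothetical, and the only parts taken from the talk are the one-hot encoding of the action chosen in each sampled state and the use of the euclidean norm.

```python
import math

def policy_vector(policy, states, actions):
    """Concatenate one one-hot block per sampled state; the 1 marks the chosen action."""
    vector = []
    for state in states:
        chosen = policy(state)
        vector.extend(1 if action == chosen else 0 for action in actions)
    return vector

def pd_distance(policy_a, policy_b, states, actions):
    """Euclidean norm of the difference between the two binary vectors."""
    va = policy_vector(policy_a, states, actions)
    vb = policy_vector(policy_b, states, actions)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))

# Hypothetical toy setup: four sampled states and three actions.
ACTIONS = ["propose", "repeat", "accept"]
STATES = [0, 1, 2, 3]

def always_propose(state):
    return "propose"

def mostly_propose(state):
    return "accept" if state == 3 else "propose"

def always_repeat(state):
    return "repeat"

# Policies that disagree in one state are closer than policies that disagree everywhere.
near = pd_distance(always_propose, mostly_propose, STATES, ACTIONS)
far = pd_distance(always_propose, always_repeat, STATES, ACTIONS)
```

Each state where two policies disagree contributes two differing coordinates, so the distance grows with the number of sampled states on which the behaviors differ.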
| 0:07:49 | so this allows us to use a clustering algorithm called k-means |
|---|
| 0:07:56 | so k-means will group all the dialogue managers |
|---|
| 0:08:02 | as clusters |
|---|
| 0:08:04 | and since we want representatives |
|---|
| 0:08:07 | we will have to learn one policy per cluster |
|---|
| 0:08:12 | so we gather all the knowledge of each cluster and we learn a policy with that |
|---|
| 0:08:18 | but we can also use another algorithm |
|---|
| 0:08:21 | called k-medoids |
|---|
| 0:08:22 | and with k-medoids thanks to the policy-driven distance |
|---|
| 0:08:26 | we find directly the representatives |
|---|
| 0:08:31 | okay so now we want to select the best |
|---|
| 0:08:34 | policy to dialogue with the target user |
|---|
| 0:08:39 | so this is the source selection |
|---|
| 0:08:41 | so for that we're going to use a bandit algorithm |
|---|
| 0:08:44 | called ucb1 |
|---|
| 0:08:45 | so usually one will test |
|---|
| 0:08:48 | each of the representatives one time |
|---|
| 0:08:51 | so you would deal with when one and two score is to with a one | 
|---|
| 0:08:56 | and then the with one | 
|---|
| 0:08:58 | and now the next system that the user will dialogue |
|---|
| 0:09:04 | with |
|---|
| 0:09:05 | is the system which maximizes the ucb value so |
|---|
| 0:09:09 | now we will dialogue with the blue one |
|---|
| 0:09:12 | and the ucb value is still the best |
|---|
| 0:09:15 | so we keep dialoguing with the blue one |
|---|
| 0:09:17 | until we reach a very bad score |
|---|
| 0:09:20 | and at this point |
|---|
| 0:09:22 | the red system has the better value so we switch robots |
|---|
| 0:09:27 | and we repeat this process until we reach a maximum time limit |
|---|
| 0:09:31 | for example one hundred time steps |
|---|
| 0:09:36 | and so we return the system maximizing the mean |
|---|
| 0:09:42 | so the point of using ucb1 is that this algorithm takes |
|---|
| 0:09:46 | into account the high variability | 
|---|
| 0:09:49 | of the dialogs | 
|---|
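The selection loop described above can be sketched like this. It is a minimal illustration under assumptions: the exploration bonus is the standard UCB1 form (empirical mean plus the square root of 2 ln t over the number of plays), `play_dialogue` is a hypothetical callback returning the score of one dialogue with a given representative, and returning the representative with the best empirical mean at the end is one reading of the talk.

```python
import math

def ucb1_choice(counts, sums, t):
    """Play every representative once, then pick the highest UCB value."""
    for index, count in enumerate(counts):
        if count == 0:
            return index
    def ucb(index):
        mean = sums[index] / counts[index]
        return mean + math.sqrt(2.0 * math.log(t) / counts[index])
    return max(range(len(counts)), key=ucb)

def select_source(play_dialogue, n_sources, horizon=100):
    """Run `horizon` dialogues and return the representative with the best mean score."""
    counts = [0] * n_sources
    sums = [0.0] * n_sources
    for t in range(1, horizon + 1):
        index = ucb1_choice(counts, sums, t)
        sums[index] += play_dialogue(index)  # one dialogue with the target user
        counts[index] += 1
    best = max(range(n_sources), key=lambda i: sums[i] / counts[i])
    return best, counts

# Hypothetical deterministic scores for three representatives.
SCORES = [0.2, 0.9, 0.5]
best_source, play_counts = select_source(lambda i: SCORES[i], n_sources=3)
```

With real users the scores are noisy, which is why the bonus term keeps revisiting representatives that have been tried only a few times.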
| 0:09:53 | okay so now we transfer the knowledge of this source to a new system |
|---|
| 0:09:59 | so this is the transfer phase |
|---|
| 0:10:01 | so let's say we have two batches of samples the source batch and the |
|---|
| 0:10:05 | target batch |
|---|
| 0:10:07 | and we want to remove |
|---|
| 0:10:09 | all the samples from the source batch |
|---|
| 0:10:11 | already present in the target batch |
|---|
| 0:10:14 | so for that we use density-based selection |
|---|
| 0:10:18 | so this is a filtering algorithm |
|---|
| 0:10:20 | it will consider each sample of the source batch |
|---|
| 0:10:24 | so let's say we start with this one |
|---|
| 0:10:26 | and it will look at the target samples with the same action |
|---|
| 0:10:30 | so these two |
|---|
| 0:10:32 | and since this state is very different from the red states of these two |
|---|
| 0:10:37 | samples |
|---|
| 0:10:38 | we can add the source sample |
|---|
| 0:10:40 | to the final batch |
|---|
| 0:10:43 | now we consider the other sample |
|---|
| 0:10:46 | and we can see that the light red state is very close to the red |
|---|
| 0:10:51 | state |
|---|
| 0:10:52 | so we don't add this sample to the batch |
|---|
| 0:10:55 | and we continue this for each sample of the source batch |
|---|
| 0:11:01 | and in the end we add the target batch |
|---|
| 0:11:05 | and we will use the result of this |
|---|
| 0:11:08 | for learning a new policy |
|---|
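The filtering step can be sketched as below. This is an illustration under assumptions: states are numeric tuples compared with a euclidean distance, and `threshold` is a hypothetical parameter deciding when two states count as close; the talk only says that a source sample is dropped when a target sample with the same action has a very close state.

```python
import math

def density_filter(source_batch, target_batch, threshold):
    """Keep source samples not covered by a nearby target sample with the same action,
    then append the whole target batch."""
    final_batch = []
    for state, action, reward, next_state in source_batch:
        covered = any(
            action == t_action and math.dist(state, t_state) <= threshold
            for t_state, t_action, _, _ in target_batch
        )
        if not covered:
            final_batch.append((state, action, reward, next_state))
    return final_batch + list(target_batch)

# Hypothetical samples (state, action, reward, next_state).
far_sample = ((0.0, 0.0), "propose", 0.0, (0.1, 0.0))
close_sample = ((1.0, 1.0), "propose", 0.0, (1.0, 0.9))
other_action = ((1.0, 1.0), "accept", 1.0, (1.0, 1.0))
target_sample = ((0.95, 1.0), "propose", 0.0, (1.0, 1.0))

merged = density_filter(
    [far_sample, close_sample, other_action], [target_sample], threshold=0.2
)
```

As the target batch grows, more source samples become covered, so the merged batch drifts toward data from the target user only.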
| 0:11:11 | so the learning |
|---|
| 0:11:13 | is done thanks to fitted-q |
|---|
| 0:11:17 | so fitted-q is a reinforcement learning algorithm which takes as input |
|---|
| 0:11:23 | a bunch of samples |
|---|
| 0:11:25 | and it computes the optimal policy for these samples |
|---|
| 0:11:31 | fitted-q is |
|---|
| 0:11:33 | an algorithm coming from fitted value iteration and this specific algorithm also comes from |
|---|
| 0:11:41 | value iteration |
|---|
| 0:11:42 | and value iteration is a very famous algorithm to solve markov decision processes |
|---|
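A batch version of this idea can be sketched as follows. It is a simplified stand-in rather than the algorithm used in the talk: the regression step of fitted-q is replaced by a per state-action average so the example stays dependency-free, and the discount factor, the iteration count and the terminal flag in each sample are assumptions.

```python
from collections import defaultdict

def fitted_q(batch, actions, gamma=0.9, n_iterations=50):
    """Repeat the Bellman backup over a fixed batch of samples.
    The 'fit' is a per (state, action) average, a tabular stand-in for a regressor."""
    q = defaultdict(float)
    for _ in range(n_iterations):
        targets = defaultdict(list)
        for state, action, reward, next_state, terminal in batch:
            bootstrap = 0.0 if terminal else max(q[(next_state, b)] for b in actions)
            targets[(state, action)].append(reward + gamma * bootstrap)
        q = defaultdict(float, {key: sum(v) / len(v) for key, v in targets.items()})
    return q

def greedy_policy(q, actions):
    """Map each state to the action with the highest estimated value."""
    return lambda state: max(actions, key=lambda a: q[(state, a)])

# Hypothetical two-state task: "go" twice reaches a rewarded terminal state.
ACTIONS_Q = ["wait", "go"]
BATCH = [
    ("start", "go", 0.0, "mid", False),
    ("start", "wait", 0.0, "start", False),
    ("mid", "go", 1.0, "end", True),
    ("mid", "wait", 0.0, "mid", False),
]
q_values = fitted_q(BATCH, ACTIONS_Q)
policy = greedy_policy(q_values, ACTIONS_Q)
```

With a real regressor in place of the average, the same loop generalizes to states that never appear in the batch, which is what makes the approach usable on dialogue data.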
| 0:11:51 | so if we combine the filtering and the learning |
|---|
| 0:11:54 | one can see that we learn a |
|---|
| 0:11:58 | a system |
|---|
| 0:11:59 | which is a mix between the source and the real user |
|---|
| 0:12:04 | so we're going to use this new |
|---|
| 0:12:07 | this new system |
|---|
| 0:12:09 | to dialogue now |
|---|
| 0:12:11 | with the target user |
|---|
| 0:12:13 | so we add new dialogues to the target batch |
|---|
| 0:12:16 | and you can see that the three samples added to the batch are very similar |
|---|
| 0:12:20 | to the samples of the source batch |
|---|
| 0:12:23 | so in the end |
|---|
| 0:12:25 | it remains only the samples from the target batch |
|---|
| 0:12:30 | so when we're going to learn the new policy |
|---|
| 0:12:34 | we will learn a very specialised system for this target user |
|---|
| 0:12:41 | so this is the overall adaptation process for |
|---|
| 0:12:46 | for users |
|---|
| 0:12:48 | and now we want to test |
|---|
| 0:12:51 | our framework on some experiments |
|---|
| 0:12:54 | so we're going to use the negotiation dialogue game |
|---|
| 0:12:57 | so we focus on negotiation because |
|---|
| 0:13:01 | we have two actors |
|---|
| 0:13:04 | having different behaviors |
|---|
| 0:13:07 | so we want to adapt to these behaviors |
|---|
| 0:13:09 | so in the negotiation dialogue game you have two players |
|---|
| 0:13:12 | and they are given some time slots |
|---|
| 0:13:17 | and preferences |
|---|
| 0:13:18 | for each time slot |
|---|
| 0:13:20 | and at each round |
|---|
| 0:13:22 | each agent |
|---|
| 0:13:25 | will propose a slot |
|---|
| 0:13:28 | for example kenny proposed a this drinks but | 
|---|
| 0:13:32 | and the wheel but we shoes and propose it's one utterance but | 
|---|
| 0:13:37 | so since this negotiation game is an abstraction of a real |
|---|
| 0:13:42 | dialogue we introduce noise |
|---|
| 0:13:45 | in the communication channel |
|---|
| 0:13:47 | in the form of switching some time slots so for example we replace the previous time |
|---|
| 0:13:54 | slot with the yellow one |
|---|
| 0:13:56 | and can you will result we will assign a new information | 
|---|
| 0:14:01 | in the form of an automatic speech recognition score |
|---|
| 0:14:06 | and given this information you can continue the dialogue |
|---|
| 0:14:10 | or you can ask the other agent to repeat the proposition |
|---|
| 0:14:14 | or you can end the dialogue |
|---|
| 0:14:16 | so for example you ask to repeat |
|---|
| 0:14:21 | and the robot repeats |
|---|
| 0:14:24 | and at some point |
|---|
| 0:14:26 | can you can accept the proposition | 
|---|
| 0:14:29 | or you can also deny and end the dialogue |
|---|
| 0:14:34 | at the end of the dialogue the users are rewarded |
|---|
| 0:14:39 | with a score |
|---|
| 0:14:41 | and this score is a function |
|---|
| 0:14:44 | of the |
|---|
| 0:14:46 | reward of the time slot agreed |
|---|
| 0:14:50 | so i forgot to say that the point of the game |
|---|
| 0:14:53 | is to find an agreement |
|---|
| 0:14:56 | between the two players |
|---|
| 0:14:58 | so can you really ugly well the less buttons here the all but see so | 
|---|
| 0:15:03 | that estimates is | 
|---|
| 0:15:04 | is smaller | 
|---|
| 0:15:07 | so now we want to test this game |
|---|
| 0:15:10 | we need users interacting with the system so |
|---|
| 0:15:15 | we designed simulated users |
|---|
| 0:15:17 | with very different profiles |
|---|
| 0:15:21 | and so we have for example the deterministic user |
|---|
| 0:15:24 | which will |
|---|
| 0:15:26 | propose its slots in decreasing order |
|---|
| 0:15:30 | and we have also this one now proposing instance | 
|---|
| 0:15:34 | taking random actions |
|---|
| 0:15:37 | this one always proposes its best slot |
|---|
| 0:15:42 | and this one accepts as soon as possible and finally |
|---|
| 0:15:46 | this one ends the dialogue as soon as possible so these are very different |
|---|
| 0:15:52 | behaviors and we want to adapt to these behaviors |
|---|
| 0:15:55 | we also design human models |
|---|
| 0:15:59 | so each human model is |
|---|
| 0:16:01 | is a model of you man thanks to everything off | 
|---|
| 0:16:08 | one and read the dialogue by men so for you man | 
|---|
| 0:16:13 | and we model it is these | 
|---|
| 0:16:17 | is that so we used results | 
|---|
| 0:16:19 | with a k-nearest neighbor algorithm | 
|---|
| 0:16:23 | and you can see in the table |
|---|
| 0:16:25 | the distribution of actions for each of the real humans |
|---|
| 0:16:31 | so you can see that we'll and at x are very similar |
|---|
| 0:16:36 | and you go and no one are pretty different |
|---|
| 0:16:41 | so now we want to design the systems |
|---|
| 0:16:43 | which will dialogue directly with these users |
|---|
| 0:16:48 | so they will have the same actions as the users to simplify the |
|---|
| 0:16:52 | design |
|---|
| 0:16:54 | so the set of actions is restricted |
|---|
| 0:16:58 | and we learn as we saw previously this system with fitted-q |
|---|
| 0:17:04 | and moreover we add an epsilon-greedy algorithm to do some exploration |
|---|
| 0:17:10 | so the internal state of the dialogue system the dialog manager |
|---|
| 0:17:17 | is actually a combination of the costs the automatic speech |
|---|
| 0:17:22 | recognition score |
|---|
| 0:17:23 | and also the number of |
|---|
| 0:17:25 | rounds during the game |
|---|
| 0:17:29 | so before testing the |
|---|
| 0:17:32 | main framework we want to show that learning one system per user is a |
|---|
| 0:17:39 | good thing |
|---|
| 0:17:40 | so here we have a bunch of systems so v s u one two three |
|---|
| 0:17:45 | et cetera and each of these systems learned a strategy |
|---|
| 0:17:49 | with the this users so obviously when we don't know | 
|---|
| 0:17:53 | the strategy against a pu one | 
|---|
| 0:17:56 | and you can notice that the bold values |
|---|
| 0:18:00 | actually indicate that |
|---|
| 0:18:02 | the best system to dialogue with a given user is |
|---|
| 0:18:07 | the system which learned the strategy |
|---|
| 0:18:09 | with this user |
|---|
| 0:18:10 | so there is a real need for adaptation |
|---|
| 0:18:16 | we can share the same with you'll and when they're users | 
|---|
| 0:18:19 | the t and the difference is that well if you | 
|---|
| 0:18:23 | and actually it is the especially for is a screen and thus use alex | 
|---|
| 0:18:30 | the | 
|---|
| 0:18:31 | the both | 
|---|
| 0:18:33 | one point or seventy four in one way or seventy three | 
|---|
| 0:18:38 | a very close and you can do sources and the thing for the line we | 
|---|
| 0:18:43 | will | 
|---|
| 0:18:45 | so | 
|---|
| 0:18:46 | no we can test the main framework for adaptation | 
|---|
| 0:18:50 | so for that we introduce two new methods | 
|---|
| 0:18:55 | one is |
|---|
| 0:18:57 | called scratch so scratch is just going to learn |
|---|
| 0:19:01 | the system from scratch without |
|---|
| 0:19:04 | transferring any knowledge |
|---|
| 0:19:05 | and the other one is a limited so this is the generic | 
|---|
| 0:19:09 | generic method |
|---|
| 0:19:11 | which learns a policy with all the knowledge of the database |
|---|
| 0:19:14 | so we generate two source system databases one for the simulated users and one for |
|---|
| 0:19:20 | the human model users |
|---|
| 0:19:22 | and each source system is learned thanks to |
|---|
| 0:19:25 | one thousand two hundred dialogues |
|---|
| 0:19:29 | and each method is tested |
|---|
| 0:19:33 | with two hundred dialogues |
|---|
| 0:19:37 | so for simulated users | 
|---|
| 0:19:40 | alternate alternative is intent on the other show a significant better result than i don't | 
|---|
| 0:19:45 | know and scratch for the two metrics | 
|---|
| 0:19:48 | the score and the task completion |
|---|
| 0:19:51 | but on the other hand for the human model results |
|---|
| 0:19:54 | our methods are a bit better |
|---|
| 0:19:56 | but not that much and |
|---|
| 0:19:59 | the reason for that is the negotiation dialogue game is too simple for humans |
|---|
| 0:20:04 | and actually most of the humans have the same behavior in the game |
|---|
| 0:20:10 | so there is no point in learning |
|---|
| 0:20:14 | an adapted strategy |
|---|
| 0:20:15 | since all the people have the same behavior | 
|---|
| 0:20:21 | so to conclude we provide a framework for user adaptation |
|---|
| 0:20:26 | and we introduce the policy-driven distance which is a way to |
|---|
| 0:20:32 | compute the behavioral differences |
|---|
| 0:20:35 | and we validate the framework on both |
|---|
| 0:20:40 | the simulated user and the human model user setups |
|---|
| 0:20:43 | and finally we show that the overall |
|---|
| 0:20:47 | dialogue quality is enhanced |
|---|
| 0:20:50 | based on two metrics the task completion and the score |
|---|
| 0:20:55 | so thank you | 
|---|
| 0:21:23 | i wasn't sure what you scored for your cross comparison |
|---|
| 0:21:28 | i we want to see this way | 
|---|
| 0:21:33 | next table so what are these numbers and what's good |
|---|
| 0:21:39 | well which | 
|---|
| 0:21:42 | each row represents the score |
|---|
| 0:21:44 | of each system given the user of the column |
|---|
| 0:21:48 | so the system is | 
|---|
| 0:21:50 | and the other thing we the each user | 
|---|
| 0:21:54 | so | 
|---|
| 0:21:54 | so for example a dispute to have a score of zero point forty four | 
|---|
| 0:22:01 | we the b one | 
|---|
| 0:22:03 | what is that score | 
|---|
| 0:22:05 | score is a score is |
|---|
| 0:22:07 | is the mean reward of a dialogue |
|---|
| 0:22:10 | at the end of the dialogue there is a reward okay and |
|---|
| 0:22:13 | we do some you know g though | 
|---|
| 0:22:15 | on the register maximum rate is the maximum score | 
|---|
| 0:22:22 | yes actually it's | 
|---|
| 0:22:24 | it's too | 
|---|
| 0:22:26 | higher | 
|---|
| 0:22:28 | sorry the higher the better that's |
|---|
| 0:22:48 | okay | 
|---|
| 0:22:49 | the question could you | 
|---|
| 0:22:51 | more details about a reinforcement learning | 
|---|
| 0:22:56 | i e c | 
|---|
| 0:23:00 | the key | 
|---|
| 0:23:02 | you want you are | 
|---|
| 0:23:13 | i | 
|---|
| 0:23:15 | speaker once again | 
|---|