0:00:15 Hello, I'm Nicolas Carrara, and my PhD advisors are Romain Laroche and Olivier Pietquin.
0:00:22 I want to talk about user adaptation in dialogue systems.
0:00:28 Most of the state-of-the-art dialogue systems, and most of the production dialogue systems, adopt a generic strategy: they have the same behaviour for any user. What we are going to do instead is to learn one strategy for each of these users.
0:00:55 The problem with learning a strategy from scratch is that one has to do some exploration, and exploration leads to very bad performance during the first interactions.
0:01:13 So we want to design a framework which is very good during the cold-start phase, and which must also remain good afterwards, once the system has converged.
0:01:31 So we propose a process for user adaptation, composed of several phases, and it goes this way.
0:01:44 Let's say we have a bunch of robots, each representing a dialogue system, and each of these robots has learned a strategy against a specific user. Each of them also keeps all the dialogues done with its user, so all the knowledge of a robot is represented by its dialogues.
0:02:11 We want to elect some representatives of this database, for example the blue one and the red one.
0:02:22 Now let's say we have a target user, and we don't have a system to dialogue with this target user, so we would have to design a system from scratch. What we are going to do instead is to transfer the knowledge of one of the representatives to this new system.
0:02:39 So first we want to select the best representative to dialogue with our target user. We will try each representative one by one, and at the end we select the best dialogue system, which here is the blue one.
0:02:58 Now we transfer all its knowledge to the new system. Let's say we have a from-scratch system: we learn its strategy thanks to the knowledge transfer, and also thanks to all the dialogues done during the source selection phase. We then use this new dialogue system to dialogue with the user, we collect more dialogues, and then we can learn a new system that is more specialised to this target user.
0:03:34 We repeat this process until we reach a system that is very specialised to the target user. So in the end, we add the new target system to the set of sources.
0:03:53 I will now detail each of these phases.
0:03:56 The sources are dialogue managers. The dialogue manager is a component of a dialogue system: it takes as input a representation of the user's utterance, for example "I would like to book a flight to London", and the dialogue manager takes an action in response, for example booking the flight or asking for more details.
0:04:21 The usual way to design a dialogue manager is to cast the task as a reinforcement learning problem. So let me first define the problem: we have an agent in interaction with an environment. In our case, the agent is the dialogue manager and the environment is the target user.
0:04:48 The agent takes an action, and the environment reacts. We encode this reaction as an observation o', and we also add the reward r'. Given this observation, and knowing also the action taken, the agent can update its internal state: here we go from state s to state s'.
0:05:24 We consider that all the knowledge of the environment is contained in the tuple (s, a, r', s'). This is the usual setting in batch reinforcement learning: we have knowledge of the environment in the form of samples.
0:05:49 We want to design a good strategy for the dialogue manager. In reinforcement learning this is called a policy: a function mapping states to actions. We want to find the optimal policy, which is the policy that maximizes the cumulative reward gathered during the interaction between the dialogue manager and the target user.
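A minimal sketch of the interaction loop just described, collecting the (s, a, r', s') transitions that make up a dialogue batch. The agent and user interfaces used here (initial_state, act, step, update_state) are hypothetical stand-ins for the dialogue manager and the target user, not the actual system from the talk.

```python
def collect_dialogue(agent, user, max_turns=20):
    """Run one dialogue and return the list of (s, a, r, s') transitions.

    Hypothetical interfaces:
      agent.act(state) -> action
      user.step(action) -> (observation, reward, done)
      agent.update_state(state, action, observation) -> next_state
    """
    transitions = []
    state = agent.initial_state()
    for _ in range(max_turns):
        action = agent.act(state)                      # the agent takes an action
        observation, reward, done = user.step(action)  # the environment reacts
        next_state = agent.update_state(state, action, observation)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions

# The optimal policy is the one maximizing the cumulative (discounted) reward
# gathered over such dialogues: sum over t of gamma**t * r_t.
```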
0:06:19 Now, note that there is an equivalence between the dialogue manager, what I drew as a robot, and a policy. So we want to find the best policies to represent the whole database: this is the representative selection phase.
0:06:39 We introduce, and this is the main contribution of the paper, the policy-driven distance. This is a metric which computes the behavioural difference between policies.
0:06:54 We sample some states and we look at which action is taken in each of these states. For example, one can see that the third one is very close to the purple one, and that the yellow one is very different from the others.
0:07:15 One can see this policy-driven distance as working on binary vectors, where the ones represent the action taken in a given state. For example, if a policy takes these actions, its binary vector will look like this. And if we compare two of these binary vectors together, we obtain a distance between policies, which is essentially a Hamming distance.
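One way to compute a distance like the one described here: encode each policy as a binary action vector over a shared set of sampled states and take the Hamming distance between the two vectors. The probe states and the number of actions are assumptions for the example, and the exact encoding in the paper may differ.

```python
import numpy as np

def action_vector(policy, probe_states, n_actions):
    """One-hot encode which action the policy takes in each sampled state."""
    vec = np.zeros((len(probe_states), n_actions))
    for i, state in enumerate(probe_states):
        vec[i, policy(state)] = 1.0
    return vec.ravel()

def policy_driven_distance(policy_a, policy_b, probe_states, n_actions):
    """Hamming distance between the two binary action vectors
    (two bits differ for every probe state where the policies disagree)."""
    va = action_vector(policy_a, probe_states, n_actions)
    vb = action_vector(policy_b, probe_states, n_actions)
    return int(np.sum(va != vb))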
0:07:49 This allows us to use a clustering algorithm called k-means. K-means groups all these robots, the dialogue managers, into clusters, and since we want one representative per cluster, we have to learn one policy per cluster: we gather the knowledge of each cluster and we learn a policy with it. But we can also use another algorithm called k-medoids, and in that case, thanks to the policy-driven distance, we directly obtain the representatives.
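A rough sketch of the k-medoids variant mentioned here, run directly on a precomputed matrix of pairwise policy-driven distances, so that the cluster centres are themselves existing policies (the representatives). The simple alternating scheme below is one possible implementation, not necessarily the one used in the paper.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Cluster policies from a pairwise distance matrix; return the medoid indices."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # assign each policy to its closest medoid
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # the medoid is the member minimising total distance to its cluster
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids  # indices of the representative policies
```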
0:08:31 OK, so now we want to select the best policy to dialogue with the target user. This is the source selection phase. For that we use a bandit algorithm called UCB1.
0:08:45 First we test each of the representatives one by one: we dialogue with one of them and record its score, then with the next one, and so on. After that, the next system the user dialogues with is always the system which maximizes the UCB value.
0:09:09 So now we dialogue with the blue one, and its value is the best, so we keep dialoguing with the blue one until it gets a very bad score. At that point, the red system has a better value, so we switch robots. We repeat this process until we reach a maximum time limit, for example one hundred time steps.
0:09:36 In the end, we know that this is the system maximizing the mean reward. The point of using UCB1 is that it takes into account the high variability of the dialogues.
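A compact sketch of UCB1 applied to source selection as described: each arm is one representative system, and the reward of an arm is the score of one dialogue between that system and the target user. `dialogue_score` is a hypothetical helper standing in for running one dialogue and returning its score.

```python
import math

def ucb1_select_source(representatives, target_user, dialogue_score, horizon=100):
    """Return the index of the representative with the best empirical mean score."""
    n = len(representatives)
    counts = [0] * n
    totals = [0.0] * n
    # play each representative once
    for i, system in enumerate(representatives):
        totals[i] += dialogue_score(system, target_user)
        counts[i] = 1
    # then always dialogue with the system maximizing the UCB value
    for t in range(n, horizon):
        ucb = [totals[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(n)]
        best = max(range(n), key=lambda i: ucb[i])
        totals[best] += dialogue_score(representatives[best], target_user)
        counts[best] += 1
    return max(range(n), key=lambda i: totals[i] / counts[i])
```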
0:09:53 OK, so now we transfer the knowledge of this source to the new system; this is the transfer phase. Let's say we have two batches of samples, the source batch and the target batch, and we want to remove every sample of the source batch that is already represented in the target batch.
0:10:14 For that we use density-based filtering. This filtering algorithm considers each sample of the source batch in turn. Let's say we start with this one: it looks at the target samples with the same action, these two. Since the source state is very different from the states of these two samples, we can add this source sample to the final batch. Now we process the next sample, and we can see that its state is very close to the state of one target sample, so we do not add this sample to the batch.
0:10:55 We continue this for each sample of the source batch, and in the end we have an augmented target batch, which we will use for learning a new policy.
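A sketch of the filtering step as I understand it from the description above: a source transition is kept only if no target transition with the same action has a state closer than some threshold. The Euclidean distance and the threshold value are assumptions made for the illustration.

```python
import numpy as np

def filter_and_merge(source_batch, target_batch, threshold=0.1):
    """Drop source transitions already covered by the target batch, then merge.

    Each transition is a tuple (state, action, reward, next_state),
    with states as numeric vectors and actions as hashable ids.
    """
    kept = []
    for s, a, r, s2 in source_batch:
        # target states observed with the same action
        neighbours = [np.asarray(ts) for ts, ta, _, _ in target_batch if ta == a]
        covered = any(np.linalg.norm(np.asarray(s) - ts) < threshold
                      for ts in neighbours)
        if not covered:
            kept.append((s, a, r, s2))
    return kept + list(target_batch)
```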
0:11:11 The learning itself is done with Fitted-Q. Fitted-Q is a reinforcement learning algorithm which takes as input a bunch of samples and computes the optimal policy for these samples. Fitted-Q comes from fitted value iteration, which itself comes from value iteration, a very famous algorithm for solving Markov decision processes.
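A small sketch of Fitted-Q iteration on such a batch of transitions, using an extra-trees regressor as the function approximator (a common choice for this algorithm; the talk does not say which regressor is used). States are assumed to be numeric vectors and actions integer ids.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q(batch, n_actions, gamma=0.95, n_iterations=50):
    """Fitted-Q iteration: regress Q(s, a) on r + gamma * max_a' Q(s', a')."""
    S  = np.array([s  for s, a, r, s2 in batch], dtype=float)
    A  = np.array([a  for s, a, r, s2 in batch], dtype=float).reshape(-1, 1)
    R  = np.array([r  for s, a, r, s2 in batch], dtype=float)
    S2 = np.array([s2 for s, a, r, s2 in batch], dtype=float)
    X = np.hstack([S, A])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = R  # first iteration: Q equals the immediate reward
        else:
            next_q = np.column_stack([
                q.predict(np.hstack([S2, np.full((len(S2), 1), a, dtype=float)]))
                for a in range(n_actions)])
            targets = R + gamma * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q  # greedy policy: argmax over a of q.predict([[*state, a]])
```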
0:11:51 So if we combine the filtering and the learning, one can see that we learn a system which is a mix between the source dialogue system and the real user. We are now going to use this new system to dialogue with the target user.
0:12:13 We add the new dialogues to the target batch, and you can see that many samples of the source batch are now very similar to samples of the target batch, so after filtering, almost only samples from the target batch remain. So when we run the learning algorithm again, we learn a system that is very specialised to this target user.
0:12:41 So this is the overall adaptation process for users. Now we want to test our framework with some experiments.
0:12:54 We use the negotiation dialogue game. We focus on negotiation because we have two actors with different behaviours, and we want to adapt to that. In the negotiation dialogue game there are two players, and they are given some time slots and preferences for each time slot.
0:13:20 At every round, each agent can propose a slot: for example, here the first agent proposes its preferred slot, and the other one refuses and proposes its own preferred slot.
0:13:37 Since the negotiation game is an abstraction of a real spoken dialogue, we introduce noise in the communication channel, in the form of sometimes switching the proposed time slot. For example, here we replace the previous time slot with the yellow one, and the agent receiving the proposal also gets an additional piece of information, in the form of an automatic speech recognition score.
0:14:06 Given this information, the agent can continue the dialogue, ask the other agent to repeat the proposition, or end the dialogue. For example, here it asks to repeat, and the other agent repeats its proposition. At some point, an agent can accept the proposition, or it can refuse it and end the dialogue.
0:14:34 At the end of the dialogue, both players are rewarded with a score, and this score is a function of the cost of the time slot that was agreed on. Oh, I forgot to say that the point of the game is to find an agreement between the two players, and each player prefers to agree on a slot whose own cost is as small as possible.
0:15:07 Now, to test on this game we need users interacting with the system, so we designed simulated users with very different profiles. For example, the deterministic user proposes its slots in decreasing order of preference; this one proposes slots at random, taking random actions; this one always proposes its best slot; this one accepts as soon as possible; and finally, this one ends the dialogue as soon as possible. So these are very different behaviours, and we want to adapt to these profiles.
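A toy sketch of what such handcrafted user profiles could look like as simple policies. The action names and the exact profile details below are illustrative assumptions, not the exact simulated users from the paper.

```python
import random

def deterministic_user(turn, slots_by_preference, last_offer):
    """Propose own slots in decreasing order of preference."""
    i = min(turn, len(slots_by_preference) - 1)
    return ("propose", slots_by_preference[i])

def random_user(turn, slots_by_preference, last_offer):
    """Take a random action."""
    return random.choice([("propose", random.choice(slots_by_preference)),
                          ("repeat", None),
                          ("accept", last_offer),
                          ("end", None)])

def stubborn_user(turn, slots_by_preference, last_offer):
    """Always propose the preferred slot."""
    return ("propose", slots_by_preference[0])

def eager_user(turn, slots_by_preference, last_offer):
    """Accept as soon as something has been offered."""
    if last_offer is not None:
        return ("accept", last_offer)
    return ("propose", slots_by_preference[0])

def quitter_user(turn, slots_by_preference, last_offer):
    """End the dialogue as soon as possible."""
    return ("end", None)
```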
0:15:55 We also designed human models. Each human model is a model of one human, built from a batch of dialogues done by that human, and we model it with a k-nearest-neighbour algorithm. You can see in the table the distribution of actions for each of the real humans: some of them are very similar to each other, while others are quite different.
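A sketch of how such a human model could be fitted with a k-nearest-neighbour classifier: the recorded dialogue states of one human are the inputs, and the action the human took in each state is the label. The feature layout and the value of k are assumptions for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_human_model(recorded_states, recorded_actions, k=5):
    """Fit a k-NN model predicting a human's action from the dialogue state."""
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(np.asarray(recorded_states, dtype=float),
              np.asarray(recorded_actions))
    return model

# Usage: the fitted model then acts as a simulated user,
#   action = model.predict([current_state])[0]
```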
0:16:41 Now we want to design the systems that will dialogue directly with these users. They have the same action set as the users and, to simplify the design, their set of possible propositions is restricted. As we saw previously, we learn these systems with Fitted-Q, together with an epsilon-greedy policy to do some exploration. The internal state of the dialogue system, the dialogue manager, is actually a combination of, among other things, the costs, the automatic speech recognition score, and the number of turns so far in the game.
0:17:29 Before testing the whole framework, we want to show that learning one system per user is a good thing. Here we have a bunch of systems, one trained against each simulated user (versus u1, versus u2, versus u3, and so on), and each of these systems learned its strategy with that user; so, obviously, the first one learned its strategy against user u1. You can note that the bold values indicate that the best system to dialogue with a given user is the system which learned its strategy with this user. So there is a real need for adaptation.
0:18:16 We can see the same thing with the human-model users. The difference is that for some of them, in particular for two of the users, the two scores, 1.74 and 1.73, are very close, and you can observe the same thing on another row.
0:18:45 Now we can test the main framework for adaptation. For that we introduce two new methods for comparison. One is learning without transfer, "scratch": it just learns the system from scratch, without transferring any knowledge.
0:19:05 The other one is "generic": this generic method learns one policy with all the knowledge of the database. We generate two source-system databases, one for the simulated users and one for the human-model users. Each source system is learned with one thousand two hundred dialogues, and each method is then run for two hundred dialogues with the target user.
0:19:37 For the simulated users, our method shows significantly better results than "generic" and "scratch" on the two metrics, the score and the task completion. On the other hand, for the human-model results, our method is better, but not by much. The reason for that is that the negotiation dialogue game is too simple for humans: actually, most of the humans have the same behaviour in this game, so there is no point in learning an adapted strategy, since all the people behave the same way.
0:20:21 So, to conclude: we provide a framework for user adaptation; we introduce the policy-driven distance, which is a way to compute the behavioural difference between policies; we validate the framework on both simulated-user and human-model-user setups; and finally, we show that the overall dialogue quality is enhanced, based on two metrics, the task completion and the score. Thank you.
0:21:23 I wasn't sure what your score was, for your cross-comparison.
0:21:28 Could we see that slide again, the table? What are these numbers, and what counts as good?
0:21:39 Each row represents the score of one system, given the user of the column: the system in the row dialogues with each user. So, for example, this system has a score of 0.44 with this user.
0:22:03 What is that score?
0:22:05 The score is the mean reward over the dialogues: at the end of each dialogue there is a reward, and we average it over all the dialogues.
0:22:15 And what is the maximum score?
0:22:22 Actually, the higher the score, the better.
0:22:48 OK, another question: could you give more details about the reinforcement learning part?
0:22:56 [inaudible]
0:23:15 Let's thank the speaker once again.