i'm not like a

and my dog adviser a woman devilish and that he picked him

and i want to talk about user adaptation

in dialogue systems

so most of the state-of-the-art

dialogue systems and most of the production dialogue systems

are adopting

a generic strategy

so we have the same behavior

for all

users

and what we are going to do is to learn one strategy

for each of these users

the problem with learning a strategy from scratch

is that one has to do some exploration

and exploration leads to

very bad

performance in the first interactions

so we want to design

a framework

which is

very good during the cold start phase

and it must also be good during the asymptotic

convergence phase

so we propose

a full process for user adaptation

and it is composed of several phases

and it goes this way

so let's say we have a bunch of robots which represent dialogue systems

and each of these robots

has learned a strategy versus a specific user

and they also keep

all the dialogues done with this user

so all the knowledge of this robot

is represented

by the dialogues

so we want to elect

some representatives

of the database

and for example these ones

and now we have a target user

and we don't have a system

to dialogue with this target user so we want to design a system from

scratch

and what we are going to do is to transfer the knowledge of one of the

representatives to the new system

so first we want to select the best representative to dialogue with our

target user

and we will try each of the representatives one by one

and at the end

we select the best dialogue system, which is the blue one here

so now we transfer all the knowledge

to the new system

so let's say we have

a scratch system

and we are going to learn the strategy thanks to the knowledge transferred and also

with all the dialogues done during the source selection phase

so we are going to use this new system

to dialogue with this user

and we collect more dialogues

and then we can learn a new system, more and more specialised

to this target user

and we repeat this process until we reach

a very specialised

system for the target user

so in the end we add the new

target system to the sources

so i will detail each of these phases

so the sources are dialogue managers

so the dialogue manager is a component of dialogue systems

and this manager takes as input a representation of the user utterance

for example i would like to book a flight to london

and the dialogue manager will return an action

for example a good field or a good nine

and the usual way to design a dialogue manager

is to cast the task as a reinforcement learning problem

so first, what are reinforcement learning problems

you have one agent

in interaction with an environment

so for example our agent is the dialogue manager

and the environment will be the target user

so the agent can take

an action

and the environment will react

and we call this reaction

s prime, which is an observation, and we also get a reward

r prime

and given this observation and also the action taken

the agent can update

its state

so here we go from s to s prime

so we can deduce that

all the knowledge of the environment is contained

in the tuple s, a,

s prime and

r prime

so this is

the main thing to know in reinforcement learning

so we have knowledge of the environment

taking the form of samples

and we want to design a good strategy for the dialogue manager

and we call this a policy, so this is a function mapping

states to actions

and we want to find the optimal policy

so the optimal policy

is the policy which maximizes

the cumulative reward

during the interaction

between the dialogue manager and the target user
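
The vocabulary introduced so far (samples, policy, cumulative reward) can be written down as a small sketch. The names `Sample`, `Policy` and `cumulative_reward`, and the optional discount `gamma`, are illustrative assumptions and are not taken from the talk.

```python
from typing import Callable, List, NamedTuple, Tuple

State = Tuple[float, ...]


class Sample(NamedTuple):
    """One interaction step: the tuple (s, a, r', s') described in the talk."""
    s: State
    a: int
    r_prime: float
    s_prime: State


# A policy maps a state to an action index.
Policy = Callable[[State], int]


def cumulative_reward(rewards: List[float], gamma: float = 1.0) -> float:
    """Sum of the rewards of one dialogue; gamma is an assumed discount."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# A toy two-step dialogue (made-up numbers, for illustration only).
dialogue = [Sample((0.0,), 1, 0.0, (1.0,)), Sample((1.0,), 0, 1.0, (2.0,))]
score = cumulative_reward([smp.r_prime for smp in dialogue])
```

The optimal policy is then the one whose dialogues give the highest expected value of this sum.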

so now

there is an equivalence between a dialogue manager, a system,

a robot and a policy

so we want to find the best

policies to represent the whole database

so this is the representative selection phase

and we introduce, and this is the main contribution of the paper,

the policy-driven distance

so this is a metric

which computes

the behavioral differences between policies

so

we sample some states and we look at which action is taken

in each of these states

and for example one can see that two of the policies

are very close to one another

and the yellow one is very different from the two others

so one can see a policy under this policy-driven distance

as a binary vector

where the ones

represent the action taken in a given state

so for example

this robot takes these actions

and the binary vector will look like this

and if we combine these binary vectors

with the euclidean norm

we have an equivalence

with the policy-driven

distance
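
A minimal sketch of the binary-vector view of a policy, assuming a finite list of sampled states and integer action indices. The function names are hypothetical. The exact normalisation and state-sampling scheme of the distance are not given in the talk, so the plain Euclidean norm used here is an assumption.

```python
import math


def policy_to_binary_vector(policy, states, n_actions):
    """One-hot encode the action chosen in each sampled state, then concatenate."""
    vector = []
    for s in states:
        chosen = policy(s)
        vector.extend(1 if a == chosen else 0 for a in range(n_actions))
    return vector


def behavioral_distance(policy_a, policy_b, states, n_actions):
    """Euclidean distance between the two binary vectors."""
    va = policy_to_binary_vector(policy_a, states, n_actions)
    vb = policy_to_binary_vector(policy_b, states, n_actions)
    return math.dist(va, vb)


# Two toy policies over four sampled states and two actions.
states = [0, 1, 2, 3]
always_zero = lambda s: 0
alternate = lambda s: s % 2
d = behavioral_distance(always_zero, alternate, states, n_actions=2)
```

With this encoding, two policies that disagree on k sampled states are at distance sqrt(2k), so the distance only depends on how often the behaviors differ.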

so this allows us to use a clustering algorithm called k-means

so k-means will group all the dialogue managers

as clusters

and since we want representatives

we will have to learn one policy per cluster

so we gather all the knowledge of each cluster and we learn a policy with that

but we can also use another algorithm

called k-medoids

and with k-medoids, thanks to the policy-driven distance

we obtain directly three representatives
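
The reason k-medoids gives representatives directly is that each cluster centre is one of the existing policies. Here is a rough sketch of a basic alternating k-medoids over binary policy vectors. The initialisation (first k vectors) and the stopping rule are simplifying assumptions, not the speaker's implementation.

```python
import math


def k_medoids(vectors, k, n_iter=20):
    """Return (medoid indices, label of each vector as a medoid index)."""

    def assign(medoids):
        # Each vector joins the cluster of its nearest medoid.
        return [min(medoids, key=lambda m: math.dist(v, vectors[m])) for v in vectors]

    medoids = list(range(k))  # assumed initialisation: the first k vectors
    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(m)
                continue
            # The new medoid is the member closest to all the other members.
            best = min(
                members,
                key=lambda c: sum(math.dist(vectors[c], vectors[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Four toy policies encoded as binary vectors (3 states, 2 actions).
policies = [
    (1, 0, 1, 0, 1, 0),
    (0, 1, 0, 1, 0, 1),
    (1, 0, 1, 0, 0, 1),
    (0, 1, 0, 1, 1, 0),
]
representatives, labels = k_medoids(policies, k=2)
```

With k-means the centre of a cluster is an average vector that does not correspond to any existing policy, which is why a new policy has to be learned per cluster in that case.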

okay so now we want to select the best

policy to dialogue with the target user

so this is the source selection phase

so for that we can use a bandit algorithm

called ucb1

so usually one will first test

each of the representatives one time

so the user dialogues with the first one and gets a score, then with the second one

and then with the third one

and now the next system that the user will dialogue

with

is the system which maximizes the ucb value so

now we will dialogue with the blue one

and its ucb value is still the best

so we keep dialoguing with the blue one

and we reach a very bad score

and at this point

the red system has the better value so we switch robots

and we repeat this process until we reach a maximum time limit

for example one hundred time steps

and so we know that this is the system maximizing the score

so the point of using ucb1 is that it takes

into account the high variability

of the dialogues
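
The selection loop can be sketched with the standard UCB1 rule (empirical mean plus sqrt(2 ln t / n)). This sketch assumes dialogue scores are scaled to [0, 1], which the talk does not state. The names `ucb1_choose` and `run_source_selection` are hypothetical, and each representative is modelled as a function returning the score of one dialogue.

```python
import math


def ucb1_choose(counts, totals):
    """Pick the arm to play next under UCB1."""
    # Play every representative once before using the UCB values.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    values = [
        totals[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=values.__getitem__)


def run_source_selection(representatives, horizon):
    """Run `horizon` dialogues and return how often each representative was used."""
    counts = [0] * len(representatives)
    totals = [0.0] * len(representatives)
    for _ in range(horizon):
        arm = ucb1_choose(counts, totals)
        totals[arm] += representatives[arm]()  # score of one dialogue
        counts[arm] += 1
    return counts


# Toy representatives with fixed scores (real dialogue scores are noisy).
toy = [lambda: 0.2, lambda: 0.8, lambda: 0.5]
usage = run_source_selection(toy, horizon=100)
```

The exploration bonus shrinks as a representative is used more often, so a system with one unlucky dialogue still gets retried instead of being discarded.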

okay so now we transfer the knowledge of this source to a new system

so this is the transfer phase

so let's say we have two batches of samples, the source batch and the

target batch

and we want to remove

all the samples from the source batch

already present in the target batch

so for that we use a density-based filter

so this is a filtering algorithm

it will consider each sample of the source batch

so let's say we start with this one

and it will look at the target samples with the same action

so these two

and since its state is very different from the states of these two

samples

we can add the source sample

to the final batch

now we consider another sample

and we can see that the light red state is very close to the red

state

so we don't add this sample to the batch

and we continue this for each sample of the source batch

and in the end we have the final batch

and we will use this batch

for learning a new policy
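
A sketch of the filtering step as described: a source sample is kept only if no target sample with the same action has a nearby state. The distance `threshold` and the `(s, a, r, s_next)` tuple layout are assumptions made for illustration. The talk does not say how "very close" is measured.

```python
import math


def filter_source_batch(source, target, threshold):
    """Return the final batch: kept source samples plus all target samples."""
    kept = []
    for sample in source:
        state, action = sample[0], sample[1]
        # Only target samples taken with the same action are compared.
        same_action = [t for t in target if t[1] == action]
        if all(math.dist(state, t[0]) > threshold for t in same_action):
            kept.append(sample)
    return kept + list(target)


# Samples are (s, a, r, s_next) with made-up values.
source_batch = [
    ((0.0, 0.0), 0, 0.0, (0.1, 0.0)),  # far from every target state: kept
    ((1.0, 1.0), 0, 0.0, (1.0, 0.9)),  # close to a target state: dropped
    ((1.0, 1.0), 1, 0.0, (0.9, 0.9)),  # no target sample with action 1: kept
]
target_batch = [((0.95, 1.0), 0, 1.0, (1.0, 1.0))]
final_batch = filter_source_batch(source_batch, target_batch, threshold=0.2)
```

As the target batch grows, fewer source samples pass the filter, which matches the gradual specialisation described in the talk.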

so the learning

is done thanks to fitted q

so fitted q is a reinforcement learning algorithm which takes as input

a bunch of samples

and it computes the optimal policy for these samples

so fitted q

is derived from fitted value iteration and this algorithm itself comes from

value iteration

and value iteration is a very famous algorithm to solve markov decision processes
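
To make the idea concrete, here is a heavily simplified fitted Q iteration over a batch of samples. The regressor is a plain per-(state, action) average, which only works for discrete states. The regressor used in the talk is not named, so this choice, the `done` flag and the discount `gamma` are assumptions.

```python
from collections import defaultdict


def fitted_q(samples, n_actions, gamma=0.9, n_iter=50):
    """samples: iterable of (s, a, r, s_next, done). Returns a dict-like Q."""
    q = defaultdict(float)  # unseen (state, action) pairs default to 0
    for _ in range(n_iter):
        targets = defaultdict(list)
        for s, a, r, s_next, done in samples:
            best_next = 0.0 if done else max(q[(s_next, b)] for b in range(n_actions))
            targets[(s, a)].append(r + gamma * best_next)
        # "Regression" step: average the targets of each (state, action) pair.
        q = defaultdict(float, {key: sum(v) / len(v) for key, v in targets.items()})
    return q


def greedy_policy(q, n_actions):
    """Policy that picks the action with the highest Q-value."""
    return lambda s: max(range(n_actions), key=lambda a: q[(s, a)])


# Toy batch: action 1 moves 0 -> 1 -> end (reward 1), action 0 stays in place.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]
q_values = fitted_q(batch, n_actions=2)
policy = greedy_policy(q_values, n_actions=2)
```

In a real dialogue manager the states are continuous (for example a speech recognition score), so the averaging step would be replaced by a proper function approximator.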

so if we combine the filtering and the learning

one can see that we learn

a system

which is a mix between the source representative and the real user

so we are going to use this new

system

to dialogue now

with the target user

so we add new dialogues to the target batch

and you can see that the samples of the target batch are very similar

to the samples in the source batch

so in the end

there remain only the samples from the target batch

so when we learn on the new batch

we will learn a system very specialised to this target user

so this is the overall adaptation process

for users

and now we want to test

our framework on some experiments

so we are going to use the negotiation dialogue game

so we focused on negotiation because

we have two actors

having different behaviors

so we want to adapt to these behaviors

so in the negotiation game two agents want to set an appointment

and they are given some time slots

and preferences

for each time slot

and at each round

each agent

will propose a slot

for example kenny proposes this time slot

and the robot refuses and proposes its own time slot

so since the negotiation game is an abstraction of a real

dialogue we introduce noise

in the communication channel

in the form of switching some time slots, so for example we replace the previous time

slot with the yellow one

and kenny will receive a new piece of information

in the form of an automatic speech recognition score

and given this information he can continue the dialogue

or he can ask the other agent to repeat the proposition

or he can end the dialogue

so for example he asks to repeat

and the robot repeats

and at some point

kenny can accept the proposition

or he can also deny and end the dialogue

at the end of the dialogue the users are rewarded

with a score

and this score is a function

of

the reward of the time slot agreed

so i forgot to say that the point of the game

is to find an agreement

between the two actors

so can you really ugly well the less buttons here the all but see so

that estimates is

is smaller

so now we want to test this game

and we need users interacting with the system so

we designed simulated users

with very different profiles

and so we have for example the deterministic user

which will

propose its slots in decreasing order

and we have also this one

taking random actions

this one always proposes its best slot

and this one accepts as soon as possible and finally

this one ends the dialogue as soon as possible so these are very different

behaviors and we want to adapt to these behaviors

we also designed human models

so each human model is

a model of a human, built thanks to

one hundred dialogues per human

and we model these

humans from this data

with a k-nearest neighbor algorithm
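
As an illustration of what a k-nearest-neighbor user model can look like, the sketch below predicts a user's action by majority vote among the k recorded states closest to the current one. The state features, the action names and the use of a majority vote (rather than sampling among the neighbours) are all assumptions. The talk does not give these details.

```python
import math
from collections import Counter


def knn_action(state, recorded, k=3):
    """recorded: list of (state, action) pairs observed in a human's dialogues."""
    nearest = sorted(recorded, key=lambda pair: math.dist(state, pair[0]))[:k]
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]


# Made-up recordings: the single state feature could be a recognition score.
recordings = [
    ((0.0,), "accept"),
    ((0.1,), "accept"),
    ((0.9,), "repeat"),
    ((1.0,), "repeat"),
    ((1.1,), "repeat"),
]
low_score_action = knn_action((0.05,), recordings)
high_score_action = knn_action((1.0,), recordings)
```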

and you can see in the table

the distribution of actions for each of the humans

so you can see that will and alex are very similar

and the others are pretty different

so now we want to design the systems

which will dialogue with these users

so they have the same actions as the users, to simplify the

design

but the set of actions is restricted

and we learn, as we saw previously, these systems with fitted q

and moreover an epsilon-greedy algorithm to do some exploration
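
Epsilon-greedy exploration can be sketched in a few lines: with probability epsilon the system takes a random action, otherwise it takes the action with the highest estimated value. The function name and the value of epsilon are illustrative. The talk does not give the setting used.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: list of estimated values, one per action. Returns an action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
random_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0)
```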

so the state representation of the dialogue system, the dialogue manager,

is actually a combination of the automatic speech

recognition score

and also the number

of turns during the game

so before testing the

main framework we want to show that learning one system per user is a

good thing

so here we have a bunch of systems, so vs pu one, two, three

etc., and each of these systems learned its strategy

with one of these users, so for example vs pu one learned

its strategy against pu one

and you can notice that the bold values

actually indicate that

the best system to dialogue with a given user is

the system which learned its strategy

with this user

so there is a real need for adaptation

we can show the same with the human-model users

the t and the difference is that well if you

and actually it is the especially for is a screen and thus use alex

the

the both

one point or seventy four in one way or seventy three

a very close and you can do sources and the thing for the line we

will

so

now we can test the main framework for adaptation

so for that we introduce two baseline methods

one is

scratch, so scratch just learns

the system from scratch without

transferring any knowledge

and the other one is the generic

method

which learns one policy with all the knowledge of the database

so we generate two source system databases, one for the simulated users and one for

the human-model users

and each system is learned thanks to

one thousand two hundred dialogues

and each method is tested

with two hundred dialogues

so for simulated users

our method shows significantly better results than the generic method

and scratch for the two metrics

the score and task completion

but on the other hand for the human-model results

our method is better

but not that much and

the reason for that is that the negotiation game is too simple for humans

and actually most of the humans have the same behavior in the game

so there is no point in learning

an adapted strategy

since all the people have the same behavior

so to conclude, we provide a framework for user adaptation

and we introduce the policy-driven distance which is a way to

compute behavioral differences

and we validate the framework on both

the simulated user and human-model user setups

and finally we show that the overall

dialogue quality is enhanced

based on two metrics, the task completion and the score

so thank you

i wasn't sure what the score was in your cross comparison

do you want to see this one

yes, the table, so what are the numbers and what's good

well

each row represents the score

of each system given the user of the column

so the system is

and the other thing we the each user

so

so for example vs pu two has a score of zero point forty four

with pu one

what is that score

the score is

the mean reward of the dialogues

i.e. at the end of the dialogue there is a reward, okay, and

we do some you know g though

on the register maximum rate is the maximum score

yes actually it's

it's too

sorry, the higher the better

okay

a question: could you give

more details about the reinforcement learning

i e c

the key

you want you are

i

speaker once again