Hello and welcome again to the next session, on policy and knowledge. We will start this session with the talk on reinforcement learning for modeling chitchat dialogue with discrete attributes. The paper is by Chinnadhurai Sankar and Sujith Ravi, and the presenter is Sujith Ravi from Google AI.

Hi everyone. Thank you for being here; it's pretty exciting to be at SIGDIAL. I'm Sujith Ravi, and let me give a little background intro to what we do. I'm from Google AI; I'm a research scientist leading multiple machine learning groups. My group is focused on deep learning problems where you actually have to inject structure into deep networks, for example combining graph learning, the traditional graph-based learning approaches, with deep learning. We've released a bunch of things doing semi-supervised learning at scale; if you use any of the Google products, Gmail, Photos and so on, you will actually be using some of this. We also do computer vision; I'll show you one example of that on detecting intents, but the work spans language and also vision, using state-of-the-art vision technology.

There's a misnomer: people might think that Google and other large companies have so many resources that we label all the data sets that we have. Then how are we actually able to build the object recognition and image recognition systems that you use in Google Photos and Cloud? We have less than one percent annotation. And the reason it works is, in two words: semi-supervised plus deep learning, along with a lot of other optimizations going on under the hood. My group is responsible for some of these things.

And finally, a lot of the problems that we deal with require a lot of compute on the cloud. My group is also looking at how to do things on-device. Imagine you have to build a dialogue generation system, or a conversational system, that has to fit on your watch, which doesn't have access to gigabytes of memory or a lot of compute, unlike the cloud, where you can use CPUs, GPUs and all the latest-generation hardware. So with that, hopefully you have a rough map of the things we work on.

This is joint work with my fabulous intern Chinnadhurai Sankar, who couldn't be here; he is from MILA, from Yoshua Bengio's lab. The talk is going to be about deep reinforcement learning for modeling chitchat dialogue with discrete attributes. That's quite a mouthful, but all it means is: we try to do dialogue generation, but with controllable semantics. I will give you an overview of what we are talking about here.

So first off, like for any generation system, you have to predict responses. Here are two applications where we have to predict responses; these are not toy problems but real systems, operating at the order of millions or even billions of predictions per day. One is Smart Reply, which our team developed several years ago. How many of you are familiar with Smart Reply? Okay, quite a few. For those of you who don't know: if you use Gmail on your phone and you see those blue suggestion boxes that pop up at the bottom, that's exactly what it is. Given an email or chat message, it contextually generates responses that are relevant for you, and if you notice, the three suggestions are actually very different responses, not necessarily the same. So this is the Smart Reply system. And for folks who think that this is a simple encoder-decoder problem, I can assure you that getting it to work is definitely not; there is a lot more going on. You can read our paper from KDD, and I'll touch on some of these attributes later in the talk as well.

You can take this to the multimodal setting as well. We released a photo reply feature after the initial Smart Reply version, where now, when you receive an image, you have to understand the semantics of the visual content and generate an appropriate response. So if you look at the picture and it shows a baby, the system would say "so cute", and you'd probably send it, unless perhaps you don't have a heart, right? Or for other favorite things, if you see a skydiving video or image, it'll actually suggest "how brave". I've always wondered whether a suggestion like "how stupid" would come out at the end of it as well, but we control for that set of things.

So these are just examples of generation systems, but the task that we're trying to solve in this paper: basically, we try to model open-domain dialogue. I don't need to introduce task-oriented dialogue systems to everybody here; they are available in everyday systems, whether you're talking about booking reservations, playing music, et cetera. There is a task, and the parameters of the prediction system that you build are optimized towards solving that task. Open-ended dialogue is much harder. One of the common ways people solve this is the standard sequence-to-sequence model, where you try to model it as a machine translation problem: you are given a history of dialogue utterance sequences, and then you try to translate some representation of that encoded sequence into a decoded sequence, in this case the utterance that you are going to send.

What's the problem? Almost every system, especially the neural systems that you have today, no matter which one, seems quite repetitive when you use it, and they sound very redundant. The problem, from an ML perspective, is that unlike task-oriented dialogue, the vocabulary we cover is much larger, and there is high entropy: you have a few responses that are very commonly occurring, and then this long tail of rare responses. So, given a choice, most of these systems, which maximize likelihood in some form or other, will actually prefer to generate the responses that give the maximum likelihood or the lowest perplexity. This is a common problem, and of course it's not a new one; anyone who has built such systems will have realized this, and there are many ways to address it. People have tried extending the loss functions, where you basically bias your system to produce longer sequences or non-redundant responses; adding an RL layer on top of the deep learning system, so that you can optimize your policy towards something non-redundant; and even injecting knowledge from sources like Wikipedia, et cetera.

So in our work, what we propose instead is a conditional model, where we condition the utterance generation, the dialogue generation, on interpretable and discrete dialogue attributes. I will unpack each of those phrases within the next few slides, but here is the building block for the model.

We use the standard encoder-decoder model, but a hierarchical encoder-decoder model, as originally introduced in Serban et al. You can think of this as two levels of RNNs, recurrent neural networks, where the first level operates over the words in the utterance at any given time step, and that generates a context state; then you have another RNN that operates over the sequence of time steps, so it basically operates over the multiple turns in the dialogue. Simple enough. Of course, training these things is never simple; there are all kinds of hyperparameter tunings and so on, but we're not going to talk about that.
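To make the two-level structure concrete, here is a minimal sketch in plain Python: toy dimensions, a vanilla tanh RNN cell instead of the GRU/LSTM cells a real hierarchical encoder would use, and all names and sizes are illustrative, not taken from the paper's implementation.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rnn_step(W_in, W_rec, x, h):
    # vanilla RNN cell: h' = tanh(W_in x + W_rec h)
    pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
    return [math.tanh(v) for v in pre]

EMB, HID = 4, 8

# utterance-level (word) RNN and context-level (turn) RNN parameters
Wu_in, Wu_rec = rand_matrix(HID, EMB), rand_matrix(HID, HID)
Wc_in, Wc_rec = rand_matrix(HID, HID), rand_matrix(HID, HID)

def encode_utterance(word_vectors):
    """First level: run over the words of one utterance, return final hidden state."""
    h = [0.0] * HID
    for x in word_vectors:
        h = rnn_step(Wu_in, Wu_rec, x, h)
    return h

def encode_dialogue(turns):
    """Second level: one step per dialogue turn, over the per-utterance states."""
    c = [0.0] * HID
    for utterance in turns:
        c = rnn_step(Wc_in, Wc_rec, encode_utterance(utterance), c)
    return c  # context state that would seed the decoder

# three turns, each a list of toy word embeddings
dialogue = [[[0.1] * EMB, [0.2] * EMB], [[0.3] * EMB], [[0.4] * EMB, [0.5] * EMB]]
context = encode_dialogue(dialogue)
print(len(context))  # HID-dimensional context state
```

The two loops mirror the two RNN levels: the inner one summarizes one utterance, the outer one summarizes the conversation so far.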

Instead, what our model does: we propose a conditional response generation model, where we try to learn a conversational network that is conditioned on interpretable and composable dialogue attributes. You have the same first level of RNN operating over the words in the utterance, but instead of using just the context state to start decoding and generating a response, we are now going to model attributes, dialogue attributes, and I'll tell you what those dialogue attributes are. These are interpretable and discrete attributes. This is not like latent attributes, where you have continuous representations that model a dialogue state and so on; here we use discrete attributes, which are predicted and modeled during the generation process. Once you predict the attribute at a given time step, that plus the context state are together used to generate the decoding state, and then you start generating the utterance after that point.
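As a rough sketch of that conditioning step (toy sizes, a single linear layer standing in for the MLP, and concatenation as one possible way to fuse attribute and context; none of these specifics come from the paper's code):

```python
import math
import random

random.seed(1)

HID, N_ATTR = 8, 4  # context size; e.g. 4 toy dialogue acts

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# attribute predictor: a single linear layer here (the talk mentions a small MLP)
W_attr = [[random.uniform(-0.1, 0.1) for _ in range(HID)] for _ in range(N_ATTR)]
attr_embeddings = [[random.uniform(-0.1, 0.1) for _ in range(HID)] for _ in range(N_ATTR)]

def predict_attribute(context):
    logits = [sum(w * c for w, c in zip(row, context)) for row in W_attr]
    probs = softmax(logits)
    return max(range(N_ATTR), key=lambda a: probs[a]), probs

def decoder_init(context):
    # the decoder is seeded by the context state AND the predicted discrete
    # attribute, not by the context state alone
    attr, _ = predict_attribute(context)
    return context + attr_embeddings[attr]  # concatenation; other fusions possible

context = [0.05 * i for i in range(HID)]
attr, probs = predict_attribute(context)
state = decoder_init(context)
print(attr, len(state))
```

The key point is the shape of `decoder_init`: the decoding state depends on a discrete, inspectable attribute choice, which is what makes the semantics controllable.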

So what is a dialogue attribute? We intentionally chose things like dialogue acts, sentiment, emotion, speaker persona: these are things that we actually want to model about a dialogue. The reason is that we want control over the semantics. It's not just about asking whether the output looks fluent or not; imagine I want to make the dialogue sound more happy, or adopt a specific speaker style, or a specific emotion, or, in the extreme, and this is much further along, you may want your dialogue systems to start becoming empathetic. First of all, quantifying what that even means is a hard problem; we could have a whole talk on just that.

And this is the crucial part here: we are trying to force the encoder not to just generate the contextual state, but to also use it to generate a latent but interpretable representation of the dialogue at that particular time step, and to use them together to start the generation process. Now, these are composable, as I said. It's not just one single dialogue act or dialogue attribute that you predict; you can actually predict multiple of them. You can have a sentiment and a dialogue act and an emotion and a style all being represented in the same model, and in a few slides you will see why this is useful.

So this is pretty much the gist of the model. The part that changes is that now you also model the attribute sequence, and predicting the attribute itself is a simple MLP, a multilayer perceptron; you can have fancier things, but this is integrated with the joint model and then used during the generation process. Now, about inference. You might say that we are complicating the model even more: we've just introduced another bunch of parameters, so obviously it's going to do better on perplexity, but what are you going to do for annotation? Do you need another system just to give you manually labeled, annotated data at the attribute level for your dialogues? The good news is that you don't. Here's how you do the inference.

You start by predicting the dialogue attributes from the dialogue context: at any time step, you use the context vector to predict the attribute. Then, conditioned on the previous attribute, you predict the next attribute; that is, at time step i you use the attribute at i minus one to predict, say, the dialogue act, combine it with the context state at i minus one, and start the generation process. And as I mentioned, attribute annotation is not required during inference; you only use it during training.
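The inference loop can be sketched roughly as follows; the linear scorer, the dimensions, and the choice of a dummy start attribute are all illustrative assumptions rather than details from the paper:

```python
import math
import random

random.seed(2)

HID, N_ATTR = 6, 3

# linear scorer over [context ; one-hot(previous attribute)]
W = [[random.uniform(-0.5, 0.5) for _ in range(HID + N_ATTR)] for _ in range(N_ATTR)]

def next_attribute(context, prev_attr):
    prev_onehot = [1.0 if a == prev_attr else 0.0 for a in range(N_ATTR)]
    feats = context + prev_onehot
    logits = [sum(w * f for w, f in zip(row, feats)) for row in W]
    return max(range(N_ATTR), key=lambda a: logits[a])  # greedy pick

# at inference time no gold attribute labels exist: start from a dummy attribute
# and chain each prediction into the next turn
contexts = [[random.uniform(-1, 1) for _ in range(HID)] for _ in range(4)]
attrs, prev = [], 0
for ctx in contexts:
    prev = next_attribute(ctx, prev)
    attrs.append(prev)
print(attrs)  # one predicted attribute per dialogue turn
```

This is why no attribute annotation is needed at inference: each turn's attribute comes from the model's own previous prediction plus the context.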

Now, there is a whole bunch of things you can do to get away even from the attribute annotation during training time. For example, you might say: I need my training data also to be tagged with semantic labels, or emotion labels, or dialogue acts. Instead, you could learn an open-ended set of things, for example open-ended topics of the dialogue. I won't get into that in this talk, but I'd be happy to answer questions about it.

So this is the crux of the model, but of course it doesn't stop there. For most dialogue systems we also have an RL, reinforcement learning, layer on top, where you try to optimize with policy gradients. Usually these objectives are slightly different from the maximum likelihood objective; for instance, you're trying to bias towards long responses or some other goal. We use the standard REINFORCE, and usually the policies are initialized from the supervised pre-training: the attribute-conditional hierarchical recurrent encoder-decoder model is pre-trained first, and then you initialize the RL policy parameters from that state.

In standard works, this is how it looks: you formulate the policy as a token prediction problem. The state space is basically represented by the context state, that is, the encoder state, and the action space is predicting tokens from the vocabulary, one at a time. What's the problem with this? Besides the vocabulary being large for the open domain, what usually ends up happening is that these policy gradient methods exhibit high variance, basically because of the large action space. And the RL, which was introduced to bias the supervised learning system away from what it was originally trained to do, instead of learning to produce meaningful dialogue, tends to step away from the linguistic, natural-language phenomena, simply because certain words are more frequent than others. Again, the policy's tendency is to pick those words from the vocabulary that will maximize its reward or utility function. And of course, training and convergence is another issue in this setting as well.

Instead, what we say is: rather than doing token generation, we formulate the policy as a dialogue attribute prediction problem. The state space now becomes a combination of the dialogue context and the contextual attributes, the dialogue attributes we mentioned in the previous slide. The action space is the set of dialogue attributes: something much smaller, something more interpretable. In fact, think about it: if you capture some aspect of the semantics, say a sentiment, you don't need all the words possible in the English vocabulary, or any language's vocabulary, to generate that specific sentiment. As soon as you capture that gist, the generation downstream can do much more interesting things. So you are elevating the problem from the lexical level to the semantic level.
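To illustrate why a small discrete action space helps, here is a minimal REINFORCE sketch over a handful of attribute actions. The stateless toy policy and the made-up reward are illustrative only; a real setup would condition on the dialogue state and use a learned or heuristic dialogue reward.

```python
import math
import random

random.seed(3)

N_ATTR = 4             # action space: a few dialogue attributes, not ~50k tokens
theta = [0.0] * N_ATTR  # logits of a tiny stateless softmax policy
LR = 0.1

def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [x / s for x in e]

def sample(probs):
    r, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a
    return len(probs) - 1

def reward(action):
    # stand-in reward: pretend attribute 2 (say, a "positive sentiment" act)
    # leads to the engaging responses we want
    return 1.0 if action == 2 else 0.0

p0 = softmax(theta)[2]
for _ in range(500):
    probs = softmax(theta)
    a = sample(probs)
    r = reward(a)
    # REINFORCE update: grad of log pi(a) for a softmax is onehot(a) - probs
    for i in range(N_ATTR):
        theta[i] += LR * r * ((1.0 if i == a else 0.0) - probs[i])
p1 = softmax(theta)[2]
print(round(p0, 2), round(p1, 2))  # probability of the rewarded attribute rises
```

With four actions the gradient estimate is low-variance and the policy locks onto the rewarded attribute quickly; the identical update over a vocabulary-sized action space is far noisier, which is the variance problem described above.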

There's a reason why this works. People might say: okay, you introduced another attribute, another set of parameters, a latent layer; it's interpretable, that's great, and of course it's going to improve perplexity. I'll show you that it's not just about perplexity. What ends up happening, even from a learning-theory perspective, is that because you're introducing these latent, interpretable discrete-variable models, it actually converges better, learns to generate much more fluent and smooth responses, and explores parts of the search space that it wouldn't have before, simply because, as you know, almost every problem in this space is non-convex. So here you are actually using the semantics, the natural-language phenomena, to guide it in a better direction, so to speak.

The experimental results confirm the same. We ran on a bunch of datasets; the table shows perplexity, and the columns are how much training data the model was trained on. Obviously, as you go from left to right, the more data it is trained on, the better the perplexity of the generated dialogue. And here are the attributes that we used to model the dialogue: "sentiment" means you are incorporating sentiment in the dialogue-attribute stage of the model's prediction, "Switchboard" is basically its dialogue acts, and "Frames" is another set of dialogue acts. These can be mutually exclusive, complementary or even overlapping, and what we notice is that it's actually beneficial to compose these attributes, because they provide very different information. The fact that you model sentiment is not the same as the fact that you model dialogue acts, and modeling dialogue acts from one particular domain is not the same as modeling dialogue acts from a different one. So you can compose these attributes in a very flexible fashion, and in fact it improves the generation, meaning the perplexity goes down.

So overall what we see is that both the attribute conditioning and the reinforcement learning part generate much better responses, and more interesting and diverse responses. As I said, I keep bringing up perplexity because, for any deep learning system, it's easy to improve perplexity; trust me, you add more parameters to the system, and the way it works is that more parameters, plus more data, let you improve perplexity by optimizing towards better parameter settings and configurations.

So in addition, we did human evals on the generated responses, to see whether they actually make sense, because that is the whole goal of generation; I believe every generation system should do human evals in some setting, if at all possible. What we notice, comparing a standard sequence-to-sequence model with the attribute conditioning, is that the attribute conditioning clearly helps diversity and also relevance: it has a much better win-to-loss ratio compared to the baseline model. And when you add the RL on top of that, meaning we do the policy optimization from the supervised pre-training step, it does even better. So the RL, as I said, moves the nicely supervised training state from that initialization to an even better policy, but instead of learning it at the token level, it now learns at the attribute level; so we are injecting attribute conditioning both at the RL level and in the supervised pre-training model.

You can also compute diversity scores; there are standard ways to do this in the literature. You look at the responses and do automatic computation of diversity metrics, for example computing the number of overlapping n-grams, or how many distinct phrases are generated by the system.
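The distinct-phrase idea is usually measured as distinct-n, the ratio of unique n-grams to total n-grams over the generated responses; a minimal sketch:

```python
def distinct_n(responses, n):
    """distinct-n: unique n-grams / total n-grams across all generated responses."""
    total, unique = 0, set()
    for resp in responses:
        tokens = resp.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# a degenerate system that keeps saying "i don't know" scores low;
# a varied system scores high
dull = ["i don't know", "i don't know", "i don't know"]
varied = ["i don't know", "that sounds great", "tell me more about it"]
print(distinct_n(dull, 1), distinct_n(varied, 1))  # dull ~0.33, varied 1.0
```

Higher distinct-1 and distinct-2 values indicate less redundant output, which is exactly what the attribute conditioning and RL are meant to encourage.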

Overall, the sequence-to-sequence model is worse than the attribute-conditioned model, and the RL one is even better than both. In addition, if you take the head of the response space, that is, the most likely responses, and look at the percentage of them generated by the new systems, the percentage goes down significantly. How many times have you asked a chatbot or a voice assistant a question and it says "I don't know"? That's a default fallback mechanism, but the goal is, instead of that, can we model something like emotional responses or other attributes, so as to engage the user in a better fashion? What this allows you to do is avoid the standard, frustrating "I don't know"; you get something more nuanced instead. It may not answer directly, but it will probably take the conversation in a much better direction.

Here are some examples, which I won't go through in detail. For standard inputs (though nothing is really standard, since they come from Reddit), you get interesting responses: instead of saying things like "I don't know" or "I have no idea", you start getting longer responses, and also things that probably make more sense, for example, "I'm honestly a bit confused why no one has brought me or my books any cake; there should be cake, I think." That is not anything the plain sequence-to-sequence model would generate, but the attribute conditioning does. Or the system says, "I can't wait to see it in the city." Some of the context is missing from this example, because the previous dialogue history has been cut off here, but something about the city was mentioned there; that's why it says "see".

Okay, just to summarize: we propose a new approach for dialogue generation with controllable and composable semantics. I think this is a super important and interesting topic, because it's very easy to do vanilla generation; we can use GANs and all kinds of things like that, but making it interpretable and controllable in this fashion, I believe, and we also showed this in our empirical experiments, helps the learning process as well. It's not just about saying that this is a nice natural-language phenomenon that we want to model. Both the RL and the attribute conditioning improve over the baseline model by generating interesting and diverse responses.

There are a number of things that we are looking at for the future, in addition to incorporating multimodal input. What is the impact of pre-training the classifiers? For example, we didn't use pre-trained classifiers for the attribute prediction problem here. How do we measure interpretability by modeling it during the training process: is the generated dialogue data actually respecting the semantics of the attributes that it predicts, in other words, does it even make sense? And then, how do you do this for speaker persona, and extend it to more open-ended concepts? These are open questions and thoughts. If you have any questions related to any of these things, I'm happy to answer them.

Question: I was very interested in your training corpus size. In the examples you gave for the dialogue model training, you had up to two million training examples. Obviously, in that situation, I assume you're not manually generating them. Are you getting them from user data, or where else do you get them?

Answer: Some of them are from Reddit and the OpenSubtitles corpora; these are publicly available. As I said, the attributes themselves are not necessarily always manually annotated. For Switchboard, I believe, part of one of the datasets comes with annotations, but what we ended up doing for the rest is: you can take standard LDA or any other tool and actually label the data, with sentiment for instance. So you can have, let's say, a high-precision classifier and run it over the training corpus; these can be single labels, for instance. And the interesting part is that, after modeling all this, the accuracy of the dialogue act prediction does not have to be high in the latent system; even though it might only be around eighty percent or so, it is still good enough for the generation system. So there is some work to be done on how good we can get: if we bumped it up to ninety-nine percent, would that have an effect on the generation? These are things that we are looking at.
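A rough sketch of that weak-labeling idea, using a tiny hand-made sentiment lexicon as a stand-in for a high-precision classifier (the lexicon and the abstention rule are illustrative assumptions, not what was actually used):

```python
# hypothetical tiny sentiment lexicon; a real setup would use a trained
# high-precision classifier (or LDA for topics) instead
POSITIVE = {"great", "love", "awesome", "thanks"}
NEGATIVE = {"hate", "terrible", "awful", "sorry"}

def weak_sentiment_label(utterance):
    tokens = set(utterance.lower().split())
    pos, neg = len(tokens & POSITIVE), len(tokens & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # abstain when unsure: keeps precision high at the cost of coverage

corpus = [
    "i love this so much thanks",
    "that was a terrible idea",
    "meet me at five",
]
labels = [weak_sentiment_label(u) for u in corpus]
print(labels)  # ['positive', 'negative', None]
```

Abstaining on ambiguous utterances is one way to get the "high precision, imperfect coverage" labels the answer describes; imperfect labels can still be sufficient for training the attribute-conditioned generator.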

Question: I'm from a German research lab. Did you look at speaker persona at all? I was just curious; maybe you can speculate about it. Do you think, with enough data, with the conditional model, you could model individual users, maybe tied to Reddit usernames or something like that?

Answer: There is a joke from when we released Smart Reply; after the first version, I think it was some professor from a university who said, "These suggestions are getting to seem very snotty to me." Well, it's trained on your own data; I mean, we don't look at the data, but it's basically reflecting yourself. So the short answer is yes, but of course you want to do this with the right data, and you also want to do it in a privacy-preserving manner, which I haven't talked about here at all. Part of my group focuses on how to do all of this in a privacy-preserving manner: for example, you can build a general system, but then all the inference happens only on-device, with your data siloed off from everybody else. And then there is the question, again, that what you feel is your specific personality and what you actually write might be very different; so there are aspects of that to be considered. I'll be here afterwards if you want to chat.