0:00:17two
0:00:30hello and the lightning
0:00:33again and welcome to the next session on policy and knowledge and we will
0:00:41start this session with the talk
0:00:44on deep reinforcement learning for modeling chitchat dialogue with discrete attributes
0:00:50and the authors are
0:00:53chinnadhurai sankar and sujith ravi and the presenter is
0:00:58sujith
0:01:10i works
0:01:12and you at the trial run
0:01:14hi everyone
0:01:16thank you for being here and it's pretty exciting to be at sigdial
0:01:20i'm sujith ravi and let me give a little background intro
0:01:25to what we do i'm from google ai i'm a research scientist and lead multiple machine
0:01:30learning groups my group is focused on dealing with a lot of deep learning problems
0:01:35where you actually have to inject structure into deep networks like how do you combine graph learning
0:01:39the traditional graph learning approaches
0:01:41with deep learning so we've actually released like a bunch of things on doing semi
0:01:45supervised learning at scale if you're using any of the google products gmail
0:01:49photos anything et cetera you will actually be using stuff that we built
0:01:53we also do conversational ai so i'll show you one example of
0:01:57that on detecting intents but also like multimodal
0:02:02both for language and also for vision so using state-of-the-art vision
0:02:07technology
0:02:09misnomer
0:02:10people might think google
0:02:12large companies have a lot of resources we label all the data sets that we
0:02:16have
0:02:17to actually enable the state-of-the-art image recognition system that you're using in
0:02:21google photos and cloud
0:02:23we have less than one percent
0:02:25annotation
0:02:26and the reason it works is
0:02:28in like two words
0:02:30semi supervised
0:02:31plus
0:02:32deep learning and a lot of other optimisations that are going on under the hood
0:02:36so my group is responsible for some of these things
0:02:39and finally
0:02:40a lot of the problems that we have to deal with
0:02:43actually require a lot of compute on the cloud
0:02:45my group is also looking at things like how to do things on device
0:02:49imagine you have to build a dialog generation system
0:02:52or a conversational system that has to fit on your watch that cannot actually have
0:02:56access to gigabytes of memory or even you know a lot of compute unlike you
0:03:01know the cloud where you can do cpus gpus and all the latest generation hardware
0:03:06so with that
0:03:07hopefully that gives you a gist of the things we work on
0:03:09this is joint work with
0:03:11my fabulous intern chinnadhurai who couldn't be here he's from yoshua
0:03:15bengio's lab
0:03:18the talk is gonna be about deep reinforcement learning for modeling chitchat
0:03:22dialogue with discrete attributes that's quite a mouthful
0:03:25all it means is
0:03:26we try to do dialog generation but with controllable semantics
0:03:30and i will give you an overview of what we are talking about here so
0:03:34first off
0:03:36like for any generation system you have to predict responses
0:03:39here are two applications where we have to predict responses and these are not more data
0:03:44and but equally hard
0:03:46at the order of like millions or even billions of predictions per day
0:03:50one is smart reply which our team developed
0:03:54several years ago
0:03:55i mean if you're familiar with smart reply
0:03:57okay quite a few for those of you who don't know
0:04:00if you're using gmail
0:04:02on your phone
0:04:03if you see those blue suggestion boxes that pop up at the bottom that's exactly
0:04:07what it is
0:04:08so
0:04:09if you have any email or chat message it actually contextually generates responses that are
0:04:13relevant for you and if you notice these are actually very different responses all
0:04:17the three suggestions are not necessarily the same so this is the smart reply system
0:04:22and for folks who think that this is a simple
0:04:24encoder decoder problem
0:04:26i can assure you that
0:04:28to get it to work
0:04:29it's definitely not there's a lot more things going on you can read the paper from
0:04:33kdd
0:04:34i'll talk about some of these attributes later in the talk today as well
0:04:37but you can take this to the multi modal setting as well so we also
0:04:40released something called photo reply after the initial smart reply version
0:04:44where now you receive an image and you have to understand the
0:04:49semantics of the visual content
0:04:51and generate an appropriate response so if you look at the picture
0:04:55and it shows a baby
0:04:56the system would say so cute
0:04:59and you'd probably send it unless probably you don't have a heart
0:05:02right
0:05:03or if you see like other things like if you see a skydiving
0:05:07video or an image it'll actually suggest how brave
0:05:11i always think that's a very good start
0:05:13one more suggestion how stupid should come at the end of it as well but
0:05:17we control for those sort of things
0:05:20so these are just examples of generation systems but
0:05:23like the task that we're trying to solve in this paper is well basically we
0:05:26try to model open-domain dialogue so everybody here i don't need to introduce
0:05:30task-oriented dialog systems are available in everyday systems i mean you're talking about booking reservations
0:05:35like you know playing music et cetera there is a task and all the you
0:05:39know prediction systems that you build
0:05:42the parameters are optimized towards solving the task
0:05:45open-ended dialogue is much harder
0:05:47and one of the common ways that people solve this is the standard
0:05:51sequence to sequence model
0:05:52where you try to model it as a machine translation problem so you're given a history
0:05:56of dialogue utterance sequences
0:05:57and then you're trying to translate
0:05:59some representation of that encoded sequence
0:06:03into
0:06:04you know decoder sequence in this case an utterance that you're going to
0:06:07like send
0:06:09what's the problem
0:06:10almost every system especially the neural systems
0:06:14that you have today
0:06:16like doesn't matter which architecture you use seem quite repetitive and they sound
0:06:20very redundant right so the problem is like from an ml perspective
0:06:26unlike the task oriented dialogue the vocabulary is much larger and
0:06:30there's a high entropy that you have like few responses that are very commonly occurring
0:06:33but then there's this long tail of like rare responses so
0:06:37given a choice most of these systems are trying to maximize likelihood in some form
0:06:41or the other
0:06:42they'll actually prefer to generate responses that give you the maximum
0:06:46likelihood or the lowest perplexity
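[editor's illustration] The link between likelihood and perplexity mentioned here can be made concrete with a small sketch. It assumes perplexity is computed as the exponential of the average negative log-likelihood per token; the per-token probabilities below are made up for illustration and do not come from the talk.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by some trained model.
generic = [0.5, 0.5, 0.5]    # a frequent, generic reply
rare = [0.1, 0.05, 0.2]      # a rarer, more specific reply

print(perplexity(generic))
print(perplexity(rare))
```

A decoder that only chases the highest likelihood will keep choosing the first kind of response, which is the repetitiveness problem described above.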
0:06:48so this is a common problem of course it's not a new problem like anyone
0:06:52who's
0:06:53built systems would have realised this and there are many ways to address this like
0:06:56people have tried adding like you know objective function extending the loss functions
0:07:02to basically bias the system to produce longer sequences you know non-redundant responses
0:07:08adding an rl layer on top of the you know the deep learning system so
0:07:12that you can actually optimise your policy to do something that is non redundant and
0:07:16even injecting knowledge from sources like wikipedia et cetera
0:07:22so
0:07:22in our work
0:07:23what we propose instead is
0:07:25to do a
0:07:26conditional model where we're trying to condition the utterance generation or the dialog generation
0:07:30based on interpretable and discrete dialog attributes
0:07:34so
0:07:34i will unpack each of those phrases like within the next few slides but
0:07:41here is the building block for the model
0:07:43so we use the standard
0:07:45encoder-decoder model but this is a hierarchical encoder-decoder model like originally introduced in serban et
0:07:50al
0:07:50and
0:07:50you can think of this as like two levels of rnn recurrent neural network
0:07:55where the first layer is actually operating over words in the utterance
0:07:59at any given time step and then that generates a context state
0:08:02and then you have another rnn that operates over a sequence of
0:08:06time steps
0:08:07so basically that operates over the multiple turns in the dialogue
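[editor's illustration] The two-level encoder described here can be sketched in a few lines. This is a toy under stated assumptions, not the speaker's implementation: the recurrent update is a fixed-weight elementwise tanh rather than a trained GRU or LSTM, and `embed` is a hypothetical stand-in for a learned word embedding.

```python
import math

def rnn_step(state, inputs, w_state=0.5, w_in=0.5):
    """One toy recurrent update, elementwise: tanh(w_state * state + w_in * input)."""
    return [math.tanh(w_state * s + w_in * x) for s, x in zip(state, inputs)]

def embed(word, dim=4):
    """Hypothetical stand-in for a learned embedding: deterministic values in [0, 1)."""
    return [((sum(map(ord, word)) * (i + 1)) % 17) / 17.0 for i in range(dim)]

def encode_utterance(words, dim=4):
    """Word-level RNN: runs over the words of one utterance, returns its final state."""
    state = [0.0] * dim
    for word in words:
        state = rnn_step(state, embed(word, dim))
    return state

def encode_dialogue(utterances, dim=4):
    """Turn-level RNN: runs over utterance encodings, returns one context state per turn."""
    context = [0.0] * dim
    contexts = []
    for utterance in utterances:
        context = rnn_step(context, encode_utterance(utterance.split(), dim))
        contexts.append(context)
    return contexts

dialogue = ["how are you", "i am fine thanks", "great to hear"]
contexts = encode_dialogue(dialogue)
print(len(contexts), len(contexts[0]))
```

Each entry of `contexts` plays the role of the context state for one turn, which a decoder would then condition on.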
0:08:11simple enough of course
0:08:12training these things is never ever simple enough there's like you know all kinds of
0:08:16hyperparameter tunings et cetera but we're not gonna talk about that
0:08:20instead what our model does is we propose a conditional response generation model
0:08:24where we're trying to learn a conversational network that is conditioned on interpretable and
0:08:29composable dialogue attributes so
0:08:32you have the same first layer of rnn operating over the words in the
0:08:36utterance
0:08:36but instead of actually using just the context state to start decoding and generate a
0:08:41response we're now going to model attributes
0:08:44dialog attributes and i'll tell you what those dialog attributes are
0:08:47these are interpretable and discrete attributes
0:08:50this is not like previous work on latent attributes where you have continuous
0:08:53representations like to model a dialog state et cetera but here we use discrete
0:08:58attributes
0:08:59which are predicted
0:09:00and modeled
0:09:01during the generation process
0:09:02and now once you predict the attribute at a given time step
0:09:06that plus the context state is
0:09:09together used to generate the decoding state that means then you're gonna start generating the
0:09:13utterance after that point
0:09:14so what is a dialog attribute
0:09:17so we intentionally chose things like
0:09:20dialogue acts
0:09:21sentiment emotion speaker persona these are things that we actually want to model about a
0:09:25dialogue
0:09:26so the reason is we want to get controllable semantics so
0:09:29it's not just about
0:09:30saying that hey does it look fluent or not
0:09:33but imagine if i want to say that
0:09:37make the dialogue sound more happy
0:09:38or
0:09:39for example
0:09:40add a specific speaker style
0:09:43or a specific emotion
0:09:44or in the extreme and this is like
0:09:46further along if you want your dialogue systems to start becoming empathetic et cetera
0:09:52like first of all quantifying what that means is also a hard problem like
0:09:56we could have a whole talk on just that
0:09:59and
0:10:00this is the
0:10:01crucial part here
0:10:02so we are trying to force the encoder not to just generate the contextual
0:10:06state but instead use that to also generate a latent but interpretable representation of the dialogue
0:10:11at that particular time step and together use it to start the generation process
0:10:16now these are composable as i said
0:10:19so it's not just one single dialogue act or dialogue attribute that you
0:10:22would predict you can actually predict multiple ones of them so you can have a
0:10:25sentiment and a dialogue act
0:10:28and an emotion and a style all being represented in the same model and in
0:10:33a few slides we'll see why this is useful
0:10:36so
0:10:38this is pretty much the gist of the model
0:10:40so the
0:10:42part that you change is now you model the attribute sequence
0:10:45and predicting the attribute itself is a simple mlp multilayer perceptron you can have more
0:10:50fancier things
0:10:51but this is integrated with the joint model
0:10:53and then used for the generation process
0:10:55during inference the best part about this is you would say that now you're complicating
0:10:59the model even more
0:11:00you just introduced another bunch of parameters there
0:11:02obviously it's gonna do better on perplexity but
0:11:06what are you going to do for annotation like do you need another system just
0:11:09to give you manually labeled annotated data at the attribute level now for your dialogue
0:11:14the good news is that you don't need it so here's how you do the
0:11:17inference
0:11:18so you start predicting the dialog attributes of the dialogue context so at any time
0:11:22step you use the context vector to predict the attribute
0:11:25now conditioned on the previous attribute
0:11:28you actually predict the next
0:11:30attribute that means at time step i you use the attribute at i minus one to
0:11:34predict you know the dialogue act
0:11:36combine it with the context state at i minus one
0:11:40to start the generation process
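[editor's illustration] The inference step described here (predict the next attribute from the context state and the previous attribute, then combine the two to start decoding) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the attribute labels, the untrained weights, and the use of a single linear layer plus softmax in place of the mlp mentioned in the talk.

```python
import math

ATTRIBUTES = ["statement", "question", "agreement"]  # hypothetical dialogue-act labels

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def predict_attribute(context, prev_attr, weights):
    """Linear layer + softmax over attributes; input is [context ; one-hot(previous attribute)]."""
    features = context + one_hot(prev_attr, len(ATTRIBUTES))
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

def decoder_init(context, attr):
    """Decoder start state: context state concatenated with the predicted attribute."""
    return context + one_hot(attr, len(ATTRIBUTES))

context = [0.2, -0.1]            # hypothetical context state from the encoder
weights = [                      # hypothetical untrained weights: 3 attributes x 5 features
    [0.1, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.9, 0.0],
    [0.2, 0.2, 0.0, 0.0, 0.4],
]
attr, probs = predict_attribute(context, prev_attr=1, weights=weights)
print(ATTRIBUTES[attr], decoder_init(context, attr))
```

No attribute label is consumed at this point: the previous attribute is itself a prediction, which matches the remark that annotation is only needed during training.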
0:11:43and as i mentioned the
0:11:44attribute annotation is not required during inference you just use it during training
0:11:49now there is a whole
0:11:50bunch of things you can do to get away even from the actual annotation during training
0:11:56time for example
0:11:57you don't need to say that
0:11:58i need my training data also to be tagged with sentiment labels or like
0:12:02emotion labels or dialogue acts
0:12:04you could learn
0:12:05an open-ended
0:12:06set of things like for example open-ended topics of the dialogue
0:12:10and i won't get into that in this talk but if a person is interested i'll
0:12:13be happy to answer that too
0:12:16so
0:12:17this is the crux of the model
0:12:19of course it doesn't stop there
0:12:21for most dialogue systems we also have to add an rl reinforcement layer on
0:12:25top of that where you try to optimize a policy gradient
0:12:28usually these objectives are slightly different from the maximum likelihood objective that means you're trying
0:12:33to bias towards long responses or some other goal
0:12:36we use the standard reinforce
0:12:37and usually the policies are initialized from the supervised pre-training so the
0:12:42attribute conditional hierarchical recurrent
0:12:44encoder model is the one that is pre-trained and then you initialise the rl policy
0:12:49parameters
0:12:50from that state
0:12:52in standard works this is how it looks like
0:12:55you formulate the policy as a token prediction problem so the state space is basically
0:13:00represented by the context that means the encoder state
0:13:04and the action space is you're trying to predict the token from the vocabulary one at a
0:13:08time
0:13:09what's the problem with this
0:13:11besides that the vocabulary is large for open-domain
0:13:14usually what ends up happening is these
0:13:16policy gradient methods exhibit high variance and this is basically because of the large action
0:13:20space
0:13:21and
0:13:22the rl which is actually introduced to bias this supervised learning system you
0:13:26know away from what it was supposed to learn and like produce
0:13:29more meaningful dialogue
0:13:31instead tries to step away from linguistic and natural language phenomena
0:13:35simply because
0:13:36certain words are more frequent than others
0:13:38again
0:13:39the policy tends
0:13:40to pick
0:13:40those words
0:13:41from the vocabulary that will actually maximize its reward or utility function
0:13:46so
0:13:47of course
0:13:48training and convergence is another issue in this
0:13:51setting as well
0:13:52instead what we say is like
0:13:55instead of doing the
0:13:57token generation we formulate the policy as a dialog attribute prediction problem the state space now
0:14:02becomes
0:14:04a combination of the dialogue context
0:14:06and the contextual attributes and these attributes are the dialogue attributes that i mentioned in
0:14:10the previous slide
0:14:11the action space is
0:14:13the set of dialog attributes
0:14:15something more latent
0:14:17something more interpretable
0:14:18and
0:14:19in fact
0:14:20think about it like if you capture some aspect of the semantics of a sentiment
0:14:25do you need all the words possible
0:14:28in the english vocabulary or any language vocabulary to generate that specific sentiment i mean
0:14:33as soon as you've got that gist
0:14:35the generation can actually downstream do much more interesting things so you're elevating the problem
0:14:39from the lexical level to the semantic level
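[editor's illustration] A minimal sketch of a reinforce update when the action space is a handful of attributes rather than a whole vocabulary. It is a toy under stated assumptions: the policy is a bare softmax over logits with no dependence on dialogue context, the attribute names are hypothetical, and `toy_reward` is an invented reward that stands in for whatever response-level reward a real system would use.

```python
import math
import random

ATTRIBUTES = ["neutral", "positive", "negative"]   # hypothetical sentiment attributes

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_reward(action):
    """Invented reward: pretend responses generated under 'positive' score higher."""
    return 1.0 if ATTRIBUTES[action] == "positive" else 0.0

def reinforce_step(logits, reward_fn, lr=0.1, rng=random):
    """Sample an attribute, observe its reward, move logits along reward * grad log pi(a)."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)
    # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    return [l + lr * reward * ((1.0 if k == action else 0.0) - p)
            for k, (l, p) in enumerate(zip(logits, probs))]

rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits = reinforce_step(logits, toy_reward, rng=rng)
print([round(p, 2) for p in softmax(logits)])
```

With three actions instead of tens of thousands of tokens, each sampled action says much more about the policy, which is the intuition behind the lower-variance argument made here.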
0:14:44so
0:14:45so people might say okay you introduced another attribute or
0:14:50like another set of parameters a latent layer there this is interpretable it's great
0:14:56of course this is gonna improve perplexity
0:14:58i'll show you that it's not just about perplexity what ends up happening is even
0:15:02from the
0:15:03learning theory perspective
0:15:05because you're introducing these
0:15:06latent models and interpretable discrete variable models
0:15:10it actually converges better and learns to generate much more fluent and smooth responses
0:15:15and explores parts of the search space that it wouldn't have before
0:15:19simply because almost every problem in this space is nonconvex so here
0:15:24we start with that but
0:15:25so here you're actually using the semantics or the natural language phenomena to guide
0:15:30it to a better
0:15:31part of the search space
0:15:33so the experimental results confirm the same so we ran on a bunch of
0:15:37datasets this is the perplexity and the table shows basically
0:15:41the columns are how much training data it was trained on
0:15:44obviously if you go from left to right
0:15:46the more data trained on the better the perplexity of the generated dialogue that it's
0:15:50e
0:15:51and here are the attributes that we use to model the dialogue
0:15:56now
0:15:57like sentiment means you're actually incorporating sentiment in the dialogue attribute stage of the model
0:16:01prediction switchboard is basically the dialogue acts frames is another set of dialogue acts
0:16:06so
0:16:07these can all be mutually exclusive or be complementary or even overlapping
0:16:12and what we notice is it's actually even beneficial to compose
0:16:15these attributes so they provide very different information so
0:16:18the fact that you model sentiment is not the same as the fact that you
0:16:21model
0:16:21dialogue acts the fact that you model dialogue acts from one particular
0:16:25domain is not the same as modeling
0:16:27dialogue acts from a different domain so you can actually compose these attributes in very
0:16:31flexible fashion and in fact it actually improves the generation
0:16:34that means the perplexity goes down
0:16:38so overall what we see is that
0:16:40both the attribute conditioning and the reinforcement learning part
0:16:44generate like much better responses and more interesting and diverse responses
0:16:49so one we obviously
0:16:51as i said i keep repeating perplexity because every time you see a deep learning
0:16:55system i mean it's easy to improve perplexity right i mean you add more parameters to
0:16:59the system i mean
0:17:00the
0:17:01way it works is like more parameters means and you add more data you can
0:17:05actually improve perplexity by optimising towards better parameter settings configurations
0:17:12now we also in addition
0:17:14did human evals on the generated responses to see if it actually makes sense
0:17:18i mean because that's the whole goal of generation i believe every generation system should
0:17:22do
0:17:22human evals in some setting if at all possible
0:17:26and what we notice is like
0:17:27a standard sequence to sequence model compared with the attribute conditioning
0:17:32obviously the attribute conditioning actually helps the diversity and also relevance
0:17:36that means it has a much better win to loss ratio compared to this baseline model
0:17:41now in addition
0:17:42when you add the rl on top of that that means like we do
0:17:46the policy optimisation from the supervised pre-training step
0:17:49it does even better
0:17:51so the rl as i said is actually going to
0:17:54move or nudge the supervised training state from that initialization state to a better
0:18:00policy but instead of learning it at the token level
0:18:02now it's actually gonna learn at the attribute level so we're injecting attribute conditioning both at the
0:18:06rl level and also the supervised pre-training model
0:18:11if you compute the diversity scores and there are standard ways to
0:18:15do these based on the literature
0:18:17look at the responses and you can do automatic
0:18:20you know computation of the diversity metrics like
0:18:23compute the number of you know n-grams
0:18:25that are overlapping et cetera
0:18:27or how many distinct phrases are you know generated in the system
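[editor's illustration] One common way to count distinct phrases automatically is the distinct-n ratio: unique n-grams divided by total n-grams over a set of responses. Whether this is exactly the metric reported in the talk is an assumption, and the example responses below are made up.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def distinct_n(responses, n):
    """Unique n-grams divided by total n-grams across all responses (0.0 if there are none)."""
    all_ngrams = [g for r in responses for g in ngrams(r.split(), n)]
    return len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0

# Made-up system outputs for illustration.
repetitive = ["i don't know", "i don't know", "i don't know what"]
diverse = ["i can't wait to see it", "that sounds like a great trip", "how brave"]

print(distinct_n(repetitive, 1), distinct_n(diverse, 1))
print(distinct_n(repetitive, 2), distinct_n(diverse, 2))
```

A system that keeps falling back to the same generic reply scores low on this ratio, while a system producing varied responses scores close to one.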
0:18:31overall the
0:18:33sequence to sequence model is worse than the attribute conditioned model and the rl one
0:18:37is actually even better than both of that
0:18:42in addition
0:18:45if you take like the head
0:18:47of the response space that means like the most likely responses
0:18:50and you look at the percentage of them generated in the new systems
0:18:54the percentage goes down significantly how many times have you seen a chatbot or anything
0:18:58or any of the voice assistants you ask a question it says i don't know right
0:19:02so the goal is
0:19:06that's a default you know fallback mechanism but the goal is like instead of that
0:19:10can we model something about for example
0:19:13emotional responses or other things just to sort of engage the user in a better fashion
0:19:18what this allows you to do is like you don't get the
0:19:20standard frustrating i don't know instead you get something more nuanced it may not be
0:19:24the answer directly but it'll probably steer the conversation in a much better path
0:19:28or direction
0:19:31and here are some examples which i won't go through but like
0:19:34for standard inputs or not that standard either they're from reddit so they're never standard
0:19:39you get like interesting responses instead of saying things like
0:19:45you know i don't know or you know i have no
0:19:48idea you are getting like longer responses but also things that like you know
0:19:53probably make more sense like for example i'm honestly bit confused
0:19:57why
0:19:58no one is brought me or my books any k might but it should be
0:20:01box i think at kick
0:20:04i don't think you'd get anything like that with the sequence to sequence model or
0:20:07even with attribute conditioning
0:20:10whereas the rl one says i can't wait to see it in the city
0:20:13some of the context is missing from this example because the previous dialogue history it's
0:20:16been cut off here but there's something about the city being mentioned there that's
0:20:20why it's there
0:20:22okay just to summarize
0:20:25we propose a new approach for dialog generation with controllable and composable semantics i
0:20:29think this is a super important and interesting topic because
0:20:33it's very easy to
0:20:34go and do generation we can do gans and all kinds of things like
0:20:38that but
0:20:39making it actually interpretable and controllable in this fashion we also show that this in our
0:20:44empirical experiments helps the learning process as well it's not just about saying that this
0:20:48is a good natural language phenomenon that we wanna model
0:20:51both the rl and the attribute conditioning
0:20:54tend to improve the baseline model by generating interesting and diverse responses
0:20:58there are a number of things that we
0:21:00you know are looking at in the future
0:21:02in addition to incorporating multimodal but
0:21:05what is the impact of pre-training
0:21:07classifiers like for example as i said we didn't use pre-trained classifiers for the
0:21:11attribute prediction problem there
0:21:13and how do we like
0:21:15measure the interpretability via modeling this during the training process
0:21:18is the dialogue that's generated actually
0:21:22respecting the semantics of the attributes that it actually predicts i mean does that even
0:21:26make sense
0:21:28and then like how do you know do this for
0:21:30speaker persona and extend it to more open-ended concepts
0:21:34these are
0:21:36questions and like you know thoughts
0:21:37if you have any questions related to any of these things happy to answer them
0:21:50i am residuals from start of five am i was very interested in your training
0:21:54corpus size in the examples you gave for the dialogue model training you had up
0:21:57to two million training examples obviously in that situation i assume you're not manually
0:22:03generating them are you getting them from reddit examples or where else do you
0:22:06get it yes sure some of them are from
0:22:09reddit and the opensubtitles corpora these are available
0:22:13as i said
0:22:14the attributes
0:22:15themselves are not necessarily always manually annotated for example for switchboard i believe
0:22:20part of that is hand annotated
0:22:22for one of the datasets but what we ended up doing is like you
0:22:25can take the
0:22:26standard lda or any other you know tool
0:22:29actually label them with the sentiment so you can have a high precision
0:22:32classifier which you actually
0:22:34run on the training corpus so these can be single label for instance
0:22:37and interesting part is that
0:22:40after modeling all this like it's not necessary that the accuracy of the dialogue attribute
0:22:45prediction will go up in the latent system
0:22:48even though that might be in the eighties or something like that it still is good
0:22:52enough for the generation system
0:22:54so there is some work to be done about like
0:22:57how good can we get like i mean should it be bumped up to like
0:23:00ninety nine percent then would that have an effect on the generation
0:23:04these are things that we are looking at
0:23:18i am adding more german research lab just had a question about i guess did
0:23:21you look at speaker persona at all i was only curious maybe you can speculate
0:23:25about it do you think with enough data
0:23:29with the conditional model you could model individual users
0:23:32maybe like reddit user names or something like that
0:23:35there is a joke when we released smart reply after the first
0:23:38version
0:23:41i think it was some professor from a university who said
0:23:43this smart reply is getting to seem very snotty to me
0:23:46i was like
0:23:47it's training on your own data i mean we don't look at the data but
0:23:50you know it's basically reflecting yourself
0:23:52so short answer is yes but of course you want to do this with
0:23:56you know data right and you also want to do it in a privacy preserving
0:23:59manner which i haven't talked about here at all but part of my group focuses
0:24:01on like
0:24:02how do you do this all in a privacy preserving manner right for example you
0:24:05can build a general system
0:24:06but then
0:24:07all the inference and things can happen only on-device or in like sort of like
0:24:11your data is like siloed off from everybody else
0:24:14and the question is again
0:24:16do you really do you feel like you have a specific personality or what you feel
0:24:20versus what you actually write
0:24:21might be very different right so there are aspects of that to be considered
0:24:35i'll be here if you want