0:00:15 Hi everyone. I'm a student at Stanford University, and I'll be discussing some joint work between my collaborators in the Stanford NLP group and the Ford Research and Innovation Center.
0:00:27 Before I actually get started: this talk is going to be pretty deep learning heavy, so before you cast me as the harbinger of everything that is bad in dialogue today, end-to-end learning and all, I want to ask you to please keep an open mind about this. So, with that said.
0:00:47 Before I get started, I'd like to take a step back and discuss what I think are some of the larger motivations of dialogue research. To do that, I'd like to talk about the film Her, which some of you may have seen.
0:01:00 In it, the protagonist, played by Joaquin Phoenix, essentially develops an intimate relationship, in this near-future world, with his super-intelligent assistant Samantha. And what draws him to Samantha is her charisma, her ability to conduct very intelligent conversations.
0:01:17 Without necessarily spoiling the details of the movie, I would say that I think it does a fantastic job of illustrating what is really at the core of a lot of dialogue research. On the one hand, we are trying to build very practically useful agents; we're trying to build things that people can use on a daily basis. But more broadly, I think we should also be trying to build agents that are sociable, compassionate, empathetic, relatable, and collaborative. I think in doing so we will learn a lot about ourselves: what we as humans are, what makes us human, what is at the core of our humanity. And I think this dual motive is something that drives a lot of dialogue research, and it certainly guides a lot of the research that I like to do.
0:01:59 Moving now into the actual talk itself, a quick roadmap: I'm going to discuss some background to this work, the model that we developed, a dataset that we also developed, the experiments that validate the approach, and some concluding remarks.
0:02:19 So, background.
0:02:22 If we take this snippet of dialogue, where a human is asking a fairly simple query, "What time is my doctor's appointment?", we would like an agent to be able to answer the query with reasonable effectiveness and say something like, "Your appointment is at 3pm on Thursday."
0:02:38 Traditional dialogue systems tend to have a lot going on in the back end. We have a number of modules doing various things, including natural language understanding, interfacing with some sort of knowledge base, and then of course natural language generation. Traditionally we have separate modules doing all of these things together, and oftentimes it can be very difficult to make the interaction between all of these different modules smooth.
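To make the modular setup concrete, here is a purely illustrative sketch of that kind of pipeline; none of this is from the talk, and the module names, toy knowledge base, and hard-coded rules are my own assumptions:

```python
# Illustrative modular pipeline: separate NLU, KB lookup, and NLG stages
# glued together through hand-defined interfaces (the brittleness the talk
# refers to). All names and data here are hypothetical.
def nlu(utterance):
    """Map an utterance to an intent frame (trivially hard-coded here)."""
    if "doctor" in utterance and "time" in utterance:
        return {"intent": "get_time", "event": "doctor_appointment"}
    return {"intent": "unknown"}

def kb_lookup(frame, kb):
    """Fetch the KB record for the requested event, if any."""
    return kb.get(frame.get("event"), {})

def nlg(frame, record):
    """Render a templated response from the frame and KB record."""
    if frame["intent"] == "get_time" and "time" in record:
        return f"Your appointment is at {record['time']} on {record['date']}."
    return "Sorry, I couldn't find that."

kb = {"doctor_appointment": {"time": "3pm", "date": "Thursday"}}
frame = nlu("what time is my doctor's appointment")
print(nlg(frame, kb_lookup(frame, kb)))  # every hand-off here is a rigid interface
```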
0:03:01 And so I think the question posed to a lot of present-day end-to-end dialogue researchers is whether we can automate away all of these separate modules in a way that is effective and doesn't limit performance.
0:03:15 More specifically, I think one of the big challenges that a lot of present-day neural dialogue systems suffer from is interfacing with the knowledge base itself. Really, the kind of thing we would like to see is a smooth interaction between these heterogeneous components. If we could replace all of these separate, hard-working little robots with one mega-robot, i.e. an end-to-end dialogue system, then maybe we would be making some sort of progress. This is, of course, an ideal that we would like to work towards.
0:03:45 For the purposes of this work, let me first discuss some previous work that has been done in this general line of inquiry.
0:03:53 Some work from Wen et al. has sought to essentially take the traditional modular, connected paradigm and replace some or all of the components with neural equivalents. Other work has tried to handle the KB lookups and the interaction with the KB through some sort of soft operation that still maintains some form of belief state tracking.
0:04:16 There's another line of work that tries to find a middle ground, seeking the best of the rule-based, heuristic systems and of the more neural systems that are amenable to end-to-end neural training.
0:04:27 And then there's some work that we have been pursuing in the past that seeks to build an end-to-end system on top of the traditional seq2seq paradigm, enhancing that paradigm with mechanisms that allow for more effective dialogue exchanges.
0:04:43 The motivation of our work is then twofold. One, we would like to develop a system that can interface with the knowledge base in a more or less end-to-end fashion, without the need for explicit training of belief state trackers.
0:04:55 And two, a subproblem of that is: how do we get the sequence-to-sequence architecture, this very popular architecture, to interact nicely with intrinsically structured information? We're talking about a sequential model combining with a more structured representation, and getting these to work together is something that I think is going to be a challenge going forward.
0:05:17 Some details on the model.
0:05:20 First off, I don't know what people's general familiarity with neural seq2seq models is, but the encoder-decoder with attention framework is one that has been investigated in a number of different works, and for the purposes of dialogue it is more or less the exact same starting paradigm, the same general backbone. On the encoder side we're basically feeding in a single token of the dialogue context at a time through a recurrent unit, highlighted in blue, and we unroll the recurrent unit for some number of time steps. After some number of computations we get the hidden state that is used to initialize the decoder, which is also a recurrent unit and is also unrolled for some number of time steps.
0:05:57 At each step of the decoding we're going to be referring back to the encoder and essentially computing some sort of distribution over the various tokens of the encoder. This will be used to generate a context vector that is then combined with the decoder hidden state to form a distribution over possible output tokens that we can argmax over and essentially read our system response from.
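As a rough illustration of that decoding step (my own minimal sketch, not the talk's implementation; the layer names, shapes, and dot-product scoring function are assumptions):

```python
# One decoding step of an encoder-decoder with attention: score the encoder
# states with the decoder hidden state, form a context vector, and produce a
# distribution over output tokens.
import torch
import torch.nn.functional as F

def attention_decode_step(dec_hidden, enc_outputs, combine, vocab_proj):
    # dec_hidden: (hidden,)           current decoder hidden state
    # enc_outputs: (src_len, hidden)  encoder hidden states, one per input token
    scores = enc_outputs @ dec_hidden               # one score per encoder token
    attn = F.softmax(scores, dim=0)                 # attention distribution
    context = attn @ enc_outputs                    # weighted sum: context vector
    combined = torch.tanh(combine(torch.cat([dec_hidden, context])))
    logits = vocab_proj(combined)                   # scores over the output vocabulary
    return torch.argmax(logits).item(), attn

hidden, vocab_size = 8, 100
combine = torch.nn.Linear(2 * hidden, hidden)
vocab_proj = torch.nn.Linear(hidden, vocab_size)
token_id, attn = attention_decode_step(torch.randn(hidden), torch.randn(5, hidden),
                                        combine, vocab_proj)
```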
0:06:22 So with this general background, I hypothesized that in principle we should be able to take this decoder hidden state that we're already computing at a given timestep, push it one step further, and say: hey, use this exact same decoder hidden state to compute some sort of attention over the rows of a knowledge base.
0:06:38 So the question then is how we actually represent the knowledge base in such a way that this is actually feasible. I mean, we're talking about structured information, and we're trying to deal with it in more of a sequential fashion, since we're interested in sequences.
0:06:52 So again, the question that is really guiding the work is: how can we represent a KB effectively? To do so, we draw inspiration from the key-value memory networks of Miller et al., which essentially showed that a key-value representation is not only a nice, elegant design paradigm, but can also be shown directly to be quite effective on a number of different tasks. So maybe it's something helpful for us.
0:07:20 To show how this would actually play out for our purposes, I'm going to take one row of a KB and show how we transform it into something that is amenable to a key-value representation.
0:07:33 So consider this single row of a KB; here we're talking about a calendar scheduling task, and we have some structured information that we want to convert into what is essentially a subject-relation-object triple format. Here we have some event, the dinner, which is connected to a number of different items, facts about the dinner, through some relation. So we have some time, which is connected through a time relation, a date, which is connected through a date relation, et cetera. Everything that was originally represented in the row of the knowledge base is now collapsed into triple format.
0:08:11 And so this is the first operation that we're going to work with.
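A minimal sketch of that row-to-triples conversion (the helper name and the example row values are hypothetical, chosen only to mirror the dinner example):

```python
# Collapse one KB row into (subject, relation, object) triples, with the
# designated subject column (here the event) anchoring every triple.
def row_to_triples(row, subject_column):
    subject = row[subject_column]
    return [(subject, relation, obj)
            for relation, obj in row.items()
            if relation != subject_column]

row = {"event": "dinner", "time": "8pm", "date": "the 13th", "party": "Ana"}
print(row_to_triples(row, subject_column="event"))
# [('dinner', 'time', '8pm'), ('dinner', 'date', 'the 13th'), ('dinner', 'party', 'Ana')]
```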
0:08:17 Going from the subject-relation-object triple format, we then make just one small change, which converts it into a key-value store: we take the subject and the relation and essentially concatenate them to form a sort of canonicalized representation that is our key. That is exactly what we're trying to do.
0:08:35 So if you look at the first row, we had the time relation with the object 8pm for the dinner. This subject and relation essentially become this new canonicalized mega-key called dinner_time, for lack of a better word, and the object is just mapped one-to-one to the value. And we do the same for every single other row of the original triple format.
0:09:00 And because we're dealing with embeddings, the keys in this case end up being just the sum of the subject and relation embeddings. So dinner_time in this case is just literally the sum of the dinner embedding and the time embedding.
0:09:11 An important detail is that now, when we're doing the decoding, we're argmaxing over an augmented vocabulary, which includes not only the original vocabulary that we started off with but also these additional canonicalized key representations.
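Here is a rough sketch of those two details, the summed key embeddings and the augmented vocabulary; the variable names and toy vocabulary are my own assumptions:

```python
# Build canonicalized keys whose embeddings are the sum of the subject and
# relation embeddings, and add those keys to the output vocabulary.
import numpy as np

emb_dim = 4
vocab = ["<pad>", "your", "dinner", "is", "at", "time", "date", "8pm"]
embedding = {w: np.random.randn(emb_dim) for w in vocab}

def add_kb_keys(triples):
    kv_store = {}
    for subj, rel, obj in triples:
        key = f"{subj}_{rel}"                       # e.g. 'dinner_time'
        embedding[key] = embedding[subj] + embedding[rel]
        if key not in vocab:
            vocab.append(key)                       # augment the output vocabulary
        kv_store[key] = obj                         # value the key points to
    return kv_store

kv = add_kb_keys([("dinner", "time", "8pm"), ("dinner", "date", "the_13th")])
```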
0:09:28 When we put it all together, we essentially have what we started out with, the seq2seq encoder-decoder with attention framework, but now we have folded in this attention over the knowledge base. We compute some weight over every single row of the knowledge base, and so, for example, in the case of something like the football at 2pm that is visible here, there is a weight that is used to upweight the appropriate entry, in this case the football_time canonicalized representation, in the distribution we are argmaxing over.
0:10:00 We do this for every single row of the new canonicalized KB, and this essentially is the adjusted model.
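Putting the pieces together, a simplified sketch of the combined output distribution over the augmented vocabulary (again my own approximation; the scoring function and names are assumptions, not the exact formulation from the talk):

```python
# Standard vocabulary logits plus one logit per canonicalized KB key, scored
# by attending over the key embeddings with the decoder hidden state; argmaxing
# over the result can emit either an ordinary word or a KB entry.
import torch
import torch.nn.functional as F

def kb_augmented_distribution(dec_hidden, vocab_logits, kb_key_embeddings):
    # dec_hidden: (hidden,)                  current decoder hidden state
    # vocab_logits: (V,)                     logits over the original vocabulary
    # kb_key_embeddings: (num_rows, hidden)  one embedding per canonicalized key
    kb_logits = kb_key_embeddings @ dec_hidden        # one score per KB entry
    augmented = torch.cat([vocab_logits, kb_logits])  # vocabulary + KB keys
    return F.softmax(augmented, dim=0)

dist = kb_augmented_distribution(torch.randn(8), torch.randn(100), torch.randn(6, 8))
next_token_index = torch.argmax(dist).item()          # may point at a KB key
```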
0:10:14 Moving on to the dataset that we used. First off, a quick note: data scarcity is an obvious issue in a lot of dialogue research, especially for the neural dialogue models that a lot of people are dealing with; it seems that more data often helps.
0:10:31 Given that our collaboration was with Ford, which is obviously a car company and hence really only interested in things relevant to cars, we had to go about building essentially a new dataset, one that would still let us ask the same questions we want to ask about knowledge bases but is more relevant to their use case. That ended up being the in-car virtual assistant domain.
0:10:53 So here the three subdomains we were interested in are calendar scheduling, weather, and point-of-interest navigation.
0:11:04 The way we went about collecting the dataset is essentially a Wizard-of-Oz scheme, adapted from the work of Wen et al. Essentially, we have crowdsourced workers playing one of two roles: they can be either the driver or the car system, and we progress the dialogue collection one exchange at a time.
0:11:24 The driver-facing interface looks like this: you have a task that is generated automatically for the worker, and usually they are provided with the dialogue history, but because this is the first exchange of the dialogue, there is no history to begin with. The worker is then tasked with essentially progressing the dialogue by a single turn.
0:11:47 On the car system's side, we also provide the dialogue history so far, but the car system is actually asked to use some private collection of information that it has access to and the user does not. It is then supposed to use that information to also progress the dialogue, iteratively supporting exactly what the user wants.
0:12:10 The dataset ontology has a number of different entity types and associated values across the different domains, and I guess that lends itself to a fairly large amount of diversity in the types of things that people can talk about.
0:12:26 Once data collection was done, we had a little over three thousand dialogues, split more or less evenly across the three different domains, with an average of around five utterances per dialogue as well as around nine tokens per utterance.
0:12:42 Now for some experiments using this dataset and the model we propose.
0:12:48 The baselines that we used for benchmarking our model were two. First, we built a sort of traditional rule-based system that uses manually written rules to do the natural language understanding as well as the natural language generation, and to do all of the interfacing with the KB.
0:13:05 And then the neural competitor that we put up against our new model was the copy-augmented seq2seq model that we had built previously in prior work, which at its core is also an encoder-decoder framework with attention, but augments that with an additional copy mechanism over the entities that are mentioned in the dialogue context. We chose this because, one, it is the exact same class of models as the new one we're proposing, a seq2seq model with attention, and, two, previous work had shown that it is actually pretty competitive with other model classes, including the end-to-end memory network from Facebook. And also because the code was already there.
0:13:51 For automatic evaluation we had a number of different metrics, and I'm going to say this up front and bite the bullet: we did provide some automatic evaluation, but I know that in dialogue especially, automatic evaluation is something that is a little tricky to do and is a rather divisive topic. But there are metrics that people have reported previously, so we just followed that line of previous work.
0:14:16 We use BLEU, which is of course adopted from machine translation. There is some work that says it's actually an awful metric with no correlation to human judgment, and then there is some more recent work that says it's actually pretty decent and the n-gram-based abstraction is not really all that bad.
0:14:32 And then we provide an entity F1, which is basically a micro-averaged F1 over the set of entities that are mentioned in the generated response, as compared to those in the target response that we're going for.
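For concreteness, a hedged sketch of a micro-averaged entity F1 in the spirit of what is described here; the exact entity extraction and matching rules used in the actual evaluation may differ:

```python
# Micro-averaged entity F1: pool true positives, false positives, and false
# negatives over all responses before computing precision and recall.
def entity_f1(predicted_entity_sets, gold_entity_sets):
    tp = fp = fn = 0
    for pred, gold in zip(predicted_entity_sets, gold_entity_sets):
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(entity_f1([{"dinner", "8pm"}, {"gas_station"}],
                [{"dinner", "7pm"}, {"gas_station"}]))  # ~0.67
```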
0:14:46 When we pit all the models against each other, we see, first off, that the rule-based model doesn't have a particularly high BLEU, which again I wouldn't read too much into; that can simply be explained by the fact that maybe we didn't write especially diverse templates for the natural language generation. But its entity F1 is decent, in the sense that we did engineer the rules in a way that would be pretty accurate at picking out and accommodating search queries.
0:15:14 The copy network, on the other hand, had a pretty decent BLEU score, which can of course be attributed to the fact that these seq2seq models are known to be good at language modeling, but its entity F1 is pretty bad comparatively, and this is, I guess, a function of the fact that the copy network doesn't really make use of the KB directly, instead relying entirely on the dialogue context to generate entities.
0:15:36 And then the key-value retrieval network outperforms these on the various metrics: it performs pretty well on BLEU as well as on entity F1. But we also show human performance on this, and there is still a sizable gap to be filled. So while this is encouraging, I'm not reading too much into it, and it by no means suggests that one model is definitively better than the other, but it is there as a coarse-grained evaluation.
0:16:03 We also performed a human evaluation, where we generated about a hundred and twenty distinct scenarios across the three different domains, ones that had never been seen in training or test. We then paired the different model classes with AMT workers in real time, had them conduct the dialogue, and then assess the quality of the dialogue based on fluency, cooperativeness, and human-likeness on a one-to-five scale.
0:16:28 This kind of human evaluation tends to be a little more consistent, a little more seriously regarded, and here again the key-value retrieval network actually outperforms the various competitors, especially getting good gains over the copy network, which is of course encouraging. Here again we also report human performance, which, as a sort of sanity check, does provide an upper bound, and there is still a really large margin between even our best-performing system and human performance. So there is still a gap to be filled there.
0:16:59 Just as an example of a dialogue from one of these scenarios: we have here a sort of truncated knowledge base, in a point-of-interest navigation setting, and we have the driver asking for the gas station with the shortest route from where they are. The car answers appropriately; the driver then asks a follow-up about the nearest gas station, and the car answers again, responding appropriately with respect to the knowledge base it is given. So it is nice to see that there is a reference to the knowledge base and that it is handling things appropriately.
0:17:35 Some conclusions and final thoughts.
0:17:38 The main contributions of the work are, namely, that we have this new class of seq2seq-style models that is able to perform a lookup over the knowledge base in a way that is fairly effective, and it does this without any slot or belief state tracking, which is a nice benefit, and it does outperform several baselines on a number of different metrics.
0:18:01 In the process we also created a new dataset of roughly three thousand dialogues in the in-car assistant setting, a relatively new domain.
0:18:11 For future directions, I think one of the main ones is scaling up the knowledge bases. Right now we're not exactly at the scale of knowledge base that people would see in real-world applications; if you think about somebody's typical Google calendar, or anything of that nature, there is always a disparity in the size of these knowledge bases. So we would like to move toward the actual realm and magnitude of the types of things that people talk about.
0:18:37 We would also like to move away from operating in this mostly static exchange regime and instead do more RL-based things, which would accommodate any deviations from the typical dialogue templates that we might see.
0:18:49 And even further down the line, it would be nice to see models that can actually incorporate more pragmatic reasoning into the kinds of inferences they are able to make, so that for a simple query like "Will I need to wear a jacket today?", pragmatic reasoning allows the model to say: hey, wearing a jacket is indicative of some sort of temperature-related reasoning that I'm going to have to do, and to bake that into the model as well.
0:19:11 So that's my presentation. Thank you, and I'd be happy to take questions.
0:19:24 [Audience question, largely inaudible]
0:19:45 I think that's a great question. I think right now, for this particular iteration of the model, it is relatively dependent on the types of things being talked about, because the entire lookup operation depends on embeddings, and those embeddings have to have been trained on the appropriate types of entities. So if you have been talking about calendar scheduling for, you know, five hundred dialogues, and all of a sudden you're talking about, say, ponies or something, it's going to be hard to have well-trained embeddings that allow you to do that.
0:20:18 So I think this is certainly something that is a subject of future work, and I can think of some ways, for example using pre-trained embeddings, that might let you circumvent the need to literally train from scratch again and bootstrap a little bit more onto the kinds of things you expect to see. I think it's definitely something to explore further.
0:20:44 [Audience] Thank you for your presentation. I just wanted to ask: during the experiments, the training process, and the testing as well, how do you deal with unseen situations? You know, if you are shown unseen knowledge, like entities used in new situations, how does the model deal with that?
0:21:11 [Speaker] In what particular sense, exactly, are you talking about?
0:21:15 [Audience] So, for example, if something is entirely different from what you've seen before, or maybe just some new KB values, while the task itself does not change.
0:21:29 I mean, I think in this case the model would have to be augmented a little bit more with some sort of copy mechanism over the KB values. I guess in this case it is a little bit dependent on the kinds of things it has seen; right now it is able to do this pattern only having seen these entities in training as well.
0:21:57 I think in general it's something where we would have to look at how it can be done in a way that is less dependent on the keys and values as they have been trained, and I think right now it would probably be a little difficult for the model to handle, but it's something to work toward.
0:22:23 [Audience] The last point that you had on your slide, about future directions: with structured knowledge, the additional benefit is that you can perform reasoning over it, you can reason about what you believe. Would you like to incorporate that kind of reasoning into the model?
0:22:41 [Speaker] Right, you mean allowing for these kinds of more complex styles of reasoning. That's a really good point, and I think the last direction especially is right now a little bit of a long shot, in the sense that, even though it covers the kinds of things that are common, the model still more or less falls into one particular type of pattern, where you fill the slots as well as you can and act on that. I think that right now the model would struggle most with things that deal with, you know, synonyms and various figures of speech, et cetera. I don't have a super concrete answer for what that would look like, because the model is very much in this slot-filling mold, but I think the interplay of chit-chat systems and this kind of more structured information is one that should definitely be explored more; I think that has really been touched on a lot as well.
0:23:41 [Session chair] Thank you to the speaker.