and she's going to present a flexibly-structured model for task-oriented dialogues,
0:00:32another end-to-end dialog model
hello everyone, I am Lei Shu from the University of Illinois at Chicago; I will present our
work, flexibly-structured task-oriented dialogue modeling, FSDM for short
this work is joint with Piero Molino, Mahdi Namazifar,
0:01:22Hu Xu, Bing Liu,
0:01:24Huaixiu Zheng, and Gokhan Tur
let us quickly recap modularized and end-to-end dialog systems
0:01:33traditional modularized dialogue systems are a pipeline of natural language understanding, dialog state tracking, knowledge
0:01:40base query,
0:01:42a dialogue policy engine, and natural language generation
0:01:47an end-to-end system connects all these modules together and trains them
0:01:51together, with text in and text out
0:01:54the advantage of the end-to-end fashion is that it can reduce error propagation
0:02:02dialog state tracking is the key module, which understands user intentions,
0:02:08tracks dialog history, and updates the dialog state at every turn
0:02:13the updated dialog state is used for querying the knowledge base, for the
0:02:17policy engine, and for response generation
0:02:20there are two popular approaches; we call them the fully structured approach and the freeform approach
0:02:30the fully structured approach uses the full structure of the knowledge base,
0:02:35both its schema
0:02:37and the values
0:02:39it assumes that
0:02:41the sets of informable slot values and requestable slots are fixed
0:02:47the network is a multiclass classifier
0:02:51the advantage is that values and slots are well aligned
0:02:55the disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values
0:03:03that appear in the user's utterance
0:03:10the freeform approach does not exploit any information
0:03:14about the knowledge base
0:03:16in the model architecture
0:03:18it treats the dialog state as a sequence of informable values and requestable slots
0:03:25for example, in the picture,
0:03:27in the restaurant domain,
0:03:29the dialog state is
0:03:30"italian", semicolon, "cheap", semicolon,
0:03:34"address", semicolon, and "phone"
0:03:37the network is sequence-to-sequence
0:03:40the pros are that
0:03:42it can adapt to new domains
0:03:44and to changes in the content of the knowledge base
0:03:48it also solves the out-of-vocabulary problem
0:03:50the disadvantage is that
0:03:53values and slots
0:03:54are not aligned
0:03:56for example,
0:03:57in a travel booking system,
0:03:59given a
0:04:00dialog state with Chicago and another city,
0:04:03can you tell
0:04:04which is the departure city and which one is the arrival city?
0:04:09and also,
0:04:10the freeform approach models an unwanted order of requestable slots, and it can produce
0:04:16invalid states
0:04:18that is, it may generate non-requestable-slot words
0:04:24so what we propose is the
0:04:26flexibly-structured dialogue model
0:04:29it contains five components
0:04:31the first is the green part
0:04:33the green part is our input encoder module
0:04:37the yellow and orange parts are our dialog state tracking
0:04:41the purple part is the knowledge base query
0:04:45the red part is a new module we propose, called the response slot decoder
0:04:50and the green part and the blue part together would
0:04:54be the response generation
0:04:58so we propose a flexibly-structured dialog state tracking
0:05:04which uses only the information in the schema
0:05:08of the knowledge base but does not use the information about the values
0:05:13the architecture we propose contains two parts:
0:05:18the informable slot value decoder, the yellow part in this picture,
0:05:22and the requestable slot decoder, the orange part
0:05:26the informable slot value decoder has a separate decoder for each informable slot
0:05:32for example, in this picture,
0:05:36for the food slot,
0:05:37given the start-of-sentence token "food",
0:05:40the decoder generates "italian" and "end of food"
0:05:45the requestable slot decoder is a multi-label classifier for requestable slots,
0:05:50or you can think of it as
0:05:53a binary classification given a requestable slot
0:05:57you can see that the flexibly-structured approach has a lot of advantages: first, slots and
0:06:04values are aligned
0:06:06it also solves the out-of-vocabulary problem
0:06:09and it can easily adapt to new domains and to changes in the
0:06:12content of the knowledge base, because we are using a generation method for the informable values
0:06:19and also we remove the unwanted order of the requestable slots and the chance
0:06:24of generating invalid states
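As a rough illustration of the contrast described above, the following sketch stores a flexibly-structured state as per-slot entries and then flattens it into a freeform-style sequence. The class name, the function name, and the slot names `pricerange` and `postcode` are assumptions made for this example, not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class FlexiblyStructuredState:
    # slot name -> generated value (free text, so unseen values are allowed)
    informable: dict = field(default_factory=dict)
    # slot name -> whether the user requested it (one binary decision per slot)
    requestable: dict = field(default_factory=dict)


def to_freeform(state):
    """Flatten the state into a freeform-style sequence, dropping slot names."""
    values = list(state.informable.values())
    requested = [slot for slot, on in state.requestable.items() if on]
    return "; ".join(values + requested)


# Hypothetical restaurant-domain state; slot names are illustrative only.
state = FlexiblyStructuredState(
    informable={"food": "italian", "pricerange": "cheap"},
    requestable={"address": True, "phone": True, "postcode": False},
)

print(state.informable["food"])  # the value stays attached to its slot
print(to_freeform(state))        # the flattened form no longer says which slot each value fills
```

In the structured form each value is tied to its slot and each requestable slot is an independent yes/no decision, so there is no ordering among requestable slots to learn.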
0:06:29the nice thing about flexibly-structured dialog state tracking is that
0:06:33it can explicitly
0:06:35assign values to slots
0:06:38like the fully structured approach
0:06:40while also preserving the capability of dealing with out-of-vocabulary values
0:06:45like the freeform approach
0:06:47meanwhile, it brings challenges in response generation
0:06:52the first challenge is:
0:06:54is it possible to improve the response generation quality based on flexibly-structured DST?
0:07:01the second challenge is:
0:07:04how to incorporate the output of the flexibly-structured DST
0:07:08for response generation?
0:07:12so regarding the first challenge,
0:07:14how to improve the response generation, we propose a novel module called the response slot decoder,
0:07:21the red part in the picture
0:07:25the response slots
0:07:29are the slot names or the slot tokens
0:07:32that appear in the delexicalized response
0:07:35for example,
0:07:36the user requests the address
0:07:39the system replies
0:07:40"the address
0:07:41of name_slot
0:07:43is address_slot"
0:07:45so for the response slot decoder we also adopt a multi-label classifier
0:07:52regarding the second challenge,
0:07:54how to incorporate
0:07:56the flexibly-structured
0:07:57DST
0:07:59for response generation,
0:08:01we propose a word copy distribution
0:08:04it will increase the chance of words
0:08:07in the informable slot values,
0:08:10requestable slots, and response slots to appear in the agent response
0:08:15for example, in
0:08:17"the address of name_slot is
0:08:20address_slot", we are trying to increase the chance of
0:08:25name_slot and address_slot appearing in the response
0:08:31from now on, I'm going to go into detail on how we link these modules together
0:08:39first is our input encoder
0:08:42the input encoder
0:08:44takes three kinds of input
0:08:46the first is the agent's response in the past turn
0:08:50the second is the dialog state
0:08:53and the third is the current user utterance
0:08:56the output is
0:08:58the last hidden state of the encoder
0:09:01it serves as the initial hidden state
0:09:04for the dialog state tracker and the response generation
0:09:12the informable slot value decoder is one part of our flexibly-structured DST
0:09:18it has two kinds of input
0:09:21the inputs are the last hidden state from the encoder
0:09:25and the unique start-of-sentence symbol for each slot
0:09:29for example,
0:09:31for the food slot, the starting word is "food"
0:09:34the output:
0:09:35for each slot,
0:09:37a sequence of words regarding the slot value is generated
0:09:41for example,
0:09:43the value generated for the food slot here
0:09:46is "italian"
0:09:48the intuition here is that
0:09:50the unique start-of-sentence symbol ensures
0:09:54the slot and value alignment,
0:09:56and the copy-mechanism-augmented sequence-to-sequence allows copying of values
0:10:01directly from the encoder input
0:10:05the requestable slot binary classifier:
0:10:08this is the other part of our
0:10:10flexibly-structured DST
0:10:13the input is the
0:10:14last hidden state of the encoder
0:10:17and the unique start-of-sentence symbol for each slot
0:10:20for example,
0:10:22for the food slot, the starting word is also "food"
0:10:25the output is:
0:10:26for each slot,
0:10:28a binary prediction,
0:10:29true or false,
0:10:31is produced regarding whether the slot is requested by the user or not
0:10:38note that
0:10:39the GRU here has only one step
0:10:42it may be replaced with any classification architecture you want
0:10:47we use a GRU because we want to use the hidden state here
0:10:50as the initial state for our response slot binary classifier
0:10:57the knowledge base query takes the generated informable slot values
0:11:02and the knowledge base, and outputs
0:11:05a one-hot vector that represents the number of records matched
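A minimal sketch of such a knowledge base query, assuming exact matching on slot values and a fixed number of count buckets where the last bucket stands for "this many or more". The bucket count, the function name `kb_query`, and the toy records are assumptions for illustration; the talk only states that the output is a one-hot vector for the number of matched records.

```python
def kb_query(kb, informable, num_buckets=5):
    """Return the matching records and a one-hot vector over the match count.

    kb          : list of records, each a dict of slot name -> value
    informable  : dict of slot name -> generated value
    num_buckets : assumed size of the one-hot vector; the last bucket
                  covers every count >= num_buckets - 1
    """
    matches = [
        record for record in kb
        if all(record.get(slot) == value for slot, value in informable.items())
    ]
    index = min(len(matches), num_buckets - 1)
    one_hot = [0] * num_buckets
    one_hot[index] = 1
    return matches, one_hot


# Toy knowledge base with hypothetical entries.
kb = [
    {"name": "a", "food": "italian", "pricerange": "cheap"},
    {"name": "b", "food": "italian", "pricerange": "expensive"},
    {"name": "c", "food": "chinese", "pricerange": "cheap"},
]

matches, one_hot = kb_query(kb, {"food": "italian"})
print(len(matches), one_hot)
```

Because the query only consumes generated values, the knowledge base content can change without retraining the query step itself.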
0:11:12here is our response slot binary classifier
0:11:16the inputs are
0:11:17the knowledge base query result
0:11:20and the hidden state from the requestable slot binary classifier
0:11:25the output is:
0:11:26for each response slot, a binary prediction,
0:11:29true or false,
0:11:30is produced regarding whether this response slot appears in the agent response
0:11:36or not
0:11:38the motivation is
0:11:39incorporating all the relevant information about the retrieved entities
0:11:45and the requested slots into the response
0:11:52the word copy distribution
0:11:56the motivation here is that
0:11:58the canonical copy
0:12:00mechanism only takes a sequence of word indexes as input
0:12:05but does not accept
0:12:06the multiple Bernoulli distributions we obtain
0:12:09from the binary classifiers
0:12:12so we take
0:12:14the predictions from the informable slot value decoders,
0:12:18from the requestable slot binary classifier, and from the response slot binary classifier,
0:12:25and output a word distribution
0:12:28if a word is a requestable slot or a response slot,
0:12:33the probability is the binary classifier output
0:12:37if a word appears in the generated informable slot values,
0:12:42the probability is equal to one
0:12:45for other words, it is zero
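The three rules just stated can be sketched as a small function over a vocabulary. This is only an illustration under assumptions: the function name, the toy vocabulary, and the probabilities are made up, and taking the maximum when a word is both a requestable slot and a response slot is a choice made here, not something stated in the talk.

```python
def word_copy_distribution(vocab, informable_values, requestable_probs, response_probs):
    """Assign each vocabulary word a copy score in [0, 1].

    informable_values : slot name -> generated value string
    requestable_probs : requestable slot word -> classifier probability
    response_probs    : response slot token -> classifier probability
    """
    value_words = {w for value in informable_values.values() for w in value.split()}
    scores = []
    for word in vocab:
        if word in requestable_probs or word in response_probs:
            # Slot words take the binary classifier output (max if in both: assumption).
            score = max(requestable_probs.get(word, 0.0), response_probs.get(word, 0.0))
        elif word in value_words:
            # Words from generated informable values get probability one.
            score = 1.0
        else:
            score = 0.0
        scores.append(score)
    return scores


# Toy inputs; all numbers are illustrative.
vocab = ["the", "address", "of", "name_slot", "is", "address_slot", "italian"]
scores = word_copy_distribution(
    vocab,
    informable_values={"food": "italian"},
    requestable_probs={"address": 0.9},
    response_probs={"name_slot": 0.8, "address_slot": 0.95},
)
print(dict(zip(vocab, scores)))
```

The result is a per-word score rather than a normalized distribution; how it is combined with the decoder's own output distribution is not covered by this sketch.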
0:12:53the agent response decoder
0:12:55takes the
0:12:56last hidden state of the encoder,
0:12:59the knowledge base query result,
0:13:01and the word copy distribution
0:13:04the output is a delexicalized agent response
0:13:08the overall loss for the whole network includes the informable slot value
0:13:14loss, the requestable slot loss, the response slot loss, and
0:13:20the agent response loss
0:13:27experimental settings
0:13:28we use two datasets:
0:13:31the Cambridge restaurant dataset and the Stanford in-car assistant dataset
0:13:35and for the evaluation metrics we use:
0:13:38for dialog state tracking
0:13:41we report precision, recall, and F-score for informable slot values and requestable slots
0:13:47and for task completion
0:13:49we use the entity match rate and the success F1 score
0:13:54and BLEU is applied to the generated agent response for evaluating the language quality
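For reference, precision, recall, and F-score over a set of predicted slots can be computed as below. This is a generic per-turn sketch, assuming set-valued predictions; how the scores are aggregated across turns and dialogues in the actual evaluation is not stated in the talk.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 between two sets of slot labels."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical turn: the user requested three slots, the tracker found two.
p, r, f1 = precision_recall_f1({"date", "party"}, {"date", "time", "party"})
print(round(p, 3), round(r, 3), round(f1, 3))
```

A tracker that misses one of three requested slots keeps perfect precision but loses recall, which is the kind of error that shows up in the requestable slot scores.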
0:14:02we compare our method to these baselines:
0:14:05NDM
0:14:06and LIDM and their variants
0:14:09they use the fully structured approach
0:14:12for dialog state tracking
0:14:14and KVRN, from Stanford,
0:14:16does not do dialog state tracking
0:14:19and TSCP
0:14:21and TSCP without RL
0:14:23are the freeform approaches
0:14:27they use a two-stage copy-mechanism-augmented sequence-to-sequence,
0:14:31which has one encoder and two copy-mechanism-augmented decoders
0:14:36to decode the belief state first and then the response
0:14:40and TSCP is also tuning
0:14:46the response slots by reinforcement learning
0:14:51here are the turn-level dialog state tracking results
0:14:54you will notice that
0:14:56our proposed method, FSDM, performs much better than the freeform approach
0:15:01TSCP,
0:15:04especially on the requestable slots; the reason is that
0:15:08the freeform approach models the unwanted order
0:15:12of the requestable slots
0:15:14so that is why our FSDM can perform better than them
0:15:22these are our dialogue-level task completion results
0:15:26you will also notice that FSDM can perform
0:15:30better than the baseline models on
0:15:33most of the metrics, except BLEU on the KVRET dataset
0:15:39here is an example of the generated dialog state and response from the freeform approach
0:15:45and our approach
0:15:46in the calendar domain
0:16:10the ground-truth belief state here is that, for
0:16:14the informable slot, the event is equal to "meeting", and for the requestable
0:16:18slots, the user is trying to query date,
0:16:20time, and party
0:16:22the freeform approach would generate "meeting", "date", and "party", and FSDM would generate
0:16:27event equal to "meeting", date is true, time is
0:16:31true, and party is true
0:16:33you will notice that here the freeform approach cannot generate "time"
0:16:38the reason is that, in the training dataset,
0:16:42a lot of examples
0:16:44contain date and party, so the freeform approach models
0:16:49this kind of order
0:16:51so it memorizes date and party together, so
0:16:57during testing, if the user requests date, time, and party, it cannot predict
0:17:02that properly
0:17:05and also for the
0:17:06agent response,
0:17:09the ground truth is "your next meeting is with
0:17:12party_slot on date_slot at time_slot"; TSCP generates
0:17:17"the next meeting is at time_slot on date_slot at time_slot", and
0:17:21our FSDM can generate
0:17:24a response
0:17:25that ends with "at time_slot with party_slot"; here the freeform approach
0:17:30generates a response that repeats time_slot
0:17:37in conclusion, we propose an end-to-end architecture with a flexible structure
0:17:44for task-oriented dialogues
0:17:46and the experiments
0:17:47suggest that the architecture is competitive with the state-of-the-art models, while
0:17:53our model can be applicable to real-world scenarios
0:17:57our code will be available in the next few weeks at this link
0:18:05and this is another new work, regarding modeling multi-action policy for
0:18:09task-oriented dialogues; it will appear at EMNLP 2019
0:18:14the preprint and the code are publicly accessible at this link, or you can
0:18:17scan the QR code
0:18:19the traditional policy engine predicts one action per turn, which limits expressive power
0:18:24and introduces unwanted turns of interaction
0:18:28we propose to generate multiple actions per turn by generating a sequence of tuples; the
0:18:34tuple units are continue, act, and slots
0:18:37the continue here means whether we are going to stop generating tuples or
0:18:42we are going to continue to generate tuples; the slots here mean the arguments
0:18:47of the dialogue act, and the slots here do not carry values; a slot is
0:18:51something like movie name
0:18:53we propose a novel recurrent cell
0:18:56called gated continue-act-slots, gCAS,
0:18:59which contains three units:
0:19:01the continue unit, the act unit, and the slots unit,
0:19:05and they are sequentially connected in this recurrent cell
0:19:09so the whole decoder is in a recurrent-of-recurrent fashion
0:19:15we would like to deliver a special thanks to alex janice woman maps and the
thank you
0:19:40thank you very much, that was very interesting
0:19:43so what would the system do if somebody didn't respond with a slot name or a
0:19:48slot value?
0:19:49you know, "what restaurant do you want?", and they say "the
0:19:52closest one to the form theatre"
0:19:59excuse me, could you repeat the question again?
0:20:03your system prompts somebody for a restaurant where they want to eat; you know they want
0:20:07some italian food; the system says "what restaurant would you like to eat at?" and
0:20:12the user says "the closest italian restaurant to the form theatre"
0:20:16so I'm not giving you a slot value, I'm giving you a constraint on the
0:20:20slot value
0:20:21what would this kind of architecture do with something like that as a response? okay
0:20:25thank you a generate a
0:20:28does not the menus provided user to what we are working for most of the
0:20:32values were detected
0:20:34so when we gent
0:20:35the informable slot value decoder
0:20:45with the informable slot value decoder, we are trying to catch this
0:20:49information from the user side, so when we are trying to generate this kind of
0:20:53thing we are also using the copy
0:20:56we are also trying to increase the chance of these words appearing in the response generation
0:21:01for example,
0:21:02the titanium at the italian restaurant or you want to what b
0:21:05a someplace this method that
0:21:08I understand how you do that, but the question is how would you get the
0:21:11what would the internal representation be, so that somehow we get "the closest",
0:21:16get the superlative,
0:21:18in the result? how would it compute "the closest"? if all you're doing is attending
0:21:22to values, then you have to compute some function, like distance
0:21:27actually that's a very good question; I think that,
0:21:31if I'm getting it, you are trying to ask whether, if the
0:21:36informable slot values from the user do not exactly match
0:21:40something that appears in the knowledge base,
0:21:42is that right? no, I'm saying the user doesn't know what's in
0:21:46the knowledge base; it's just saying "whatever is the closest one, you tell me"
0:21:50okay, "the closest one", for example, it would be something like,
0:21:56it would be something like the area slot value; actually, this kind of situation
0:22:01our current model cannot handle, and the past work cannot handle either, because
0:22:05it does not actually appear in the datasets we are using
0:22:09right thank you
0:22:23I noticed that you were evaluating your model on two datasets, the Cambridge restaurant and
0:22:28the KVRET, and I was wondering would it be, or how difficult would
0:22:33it be, to extend the model to work on the MultiWOZ dataset, which is,
0:22:37you know, bigger than those two and has more domains
0:22:41actually that's a very good question
0:22:45actually,
0:22:48for the MultiWOZ dataset, in the latest ACL
0:22:52conference, the TRADE network is trying to do it
0:22:57and they showed that; actually they use
0:23:01a similar kind of technique,
0:23:04using different start-of-sentence symbols,
0:23:07different start-of-sentence symbols to generate the values
0:23:11so I think that work kind of
0:23:16proves
0:23:17that the flexibly-structured DST can be applied to the
0:23:22MultiWOZ part
0:23:23and for the response generation part, we believe that
0:23:28our proposed word copy mechanism can also work
0:23:31okay, so basically you think that just retraining should be sufficient? I think so
0:23:36okay, thanks. okay, any other questions?
0:23:42then I guess I have one more
0:23:46and there was
0:23:51basically, when you were showing the slot response model, or response slot
0:23:59decoder, that was the
0:24:02I mean
0:24:03you said that you have like a one-step GRU
0:24:11what does that exactly mean? is there like one GRU
0:24:15cell, that is
0:24:17yes, we are
0:24:18kind of using the GRU cell, but we do not use it recurrently
0:24:23right, and the output is like a
0:24:25one-hot encoding of the slots to be inserted in the response, or is it
0:24:32some kind of embedding?
0:24:36here the output is a probability;
0:24:40for each slot, you can think of it as a
0:24:42distribution from zero to one. right, okay. so that's why our
0:24:48word copy distribution is using this kind of zero-to-one values, and with the
0:24:52probability we decide whether
0:24:54to increase this word's chance to appear in the agent response
0:24:58right, okay, thank you very much. thank you
alright, so let's thank the speaker again