0:00:18 Okay, so the last speaker in this session is Lei Shu, and she is going to present a flexibly-structured model for task-oriented dialogues, so another end-to-end dialogue model.
0:00:37 Go ahead.
0:01:07 Hello everyone. I am Lei Shu from the University of Illinois at Chicago. I will present our work, Flexibly-Structured Model for Task-Oriented Dialogues, FSDM for short. This work is joint with Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, and Gokhan Tur.
0:01:28 Let us quickly recap modularized and end-to-end dialogue systems.
0:01:33 Traditional modularized dialogue systems use a pipeline of natural language understanding, dialogue state tracking, knowledge base query, dialogue policy engine, and natural language generation. An end-to-end system connects all these modules and chains them together, with text in and text out. The advantage of the end-to-end fashion is that it can reduce error propagation.
0:02:02 Dialogue state tracking is the key module: it understands user intentions, tracks the dialogue history, and updates the dialogue state at every turn. The updated dialogue state is used for querying the knowledge base, for the policy engine, and for response generation.
0:02:20 There are two popular approaches; we call them the fully-structured approach and the free-form approach.
0:02:30 The fully-structured approach uses the full structure of the knowledge base, both its schema and its values. It assumes that the set of informable slot values and requestable slots is fixed, and the network performs multi-class classification. The advantage is that values and slots are well aligned. The disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values appearing in the user's utterance.
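As a toy illustration of that limitation (hypothetical names, not the paper's code): a fully-structured tracker is a classifier over a fixed candidate set, so a value outside that set can never be predicted.

```python
# Toy sketch of fully-structured DST: each informable slot is a multi-class
# classifier over a FIXED candidate value set, so an out-of-vocabulary value
# (e.g. "korean") can never be predicted, no matter what the user says.
FOOD_VALUES = ["italian", "chinese", "indian", "dontcare"]  # fixed at training time

def classify_food(scores):
    """Pick the highest-scoring candidate value for the 'food' slot."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return FOOD_VALUES[best]

print(classify_food([0.1, 0.7, 0.15, 0.05]))  # chinese
```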
0:03:10 The free-form approach does not exploit any information about the knowledge base in the model architecture. It treats the dialogue state as a single sequence of informable values and requestable slots. For example, in the picture, in the restaurant domain, the dialogue state is "italian; cheap; address; phone". The network is a sequence-to-sequence model.
0:03:40 The pros are that it can adapt to new domains and to changes in the content of the knowledge base, and it solves the out-of-vocabulary problem. The disadvantage is that values and slots are not aligned. For example, in a travel booking system, given the dialogue state "chicago; seattle", can you tell which is the departure city and which is the arrival city?
0:04:09 Also, the free-form approach models an unwanted order of the requestable slots, and it can produce invalid states that contain generated non-requestable-slot words.
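A minimal sketch of the alignment problem just described (the separator tokens are illustrative; real systems differ): because the free-form state is one flat token sequence, the values recovered from it carry no slot labels.

```python
# Toy free-form dialogue state: informable values and requestable slots live
# in one flat token sequence, so parsing recovers values without slot labels.
state_sequence = ["chicago", ";", "seattle", "<sep>", "address", "<eos>"]

informable_part = state_sequence[:state_sequence.index("<sep>")]
values = [tok for tok in informable_part if tok != ";"]
print(values)  # ['chicago', 'seattle'] -- which is departure, which is arrival?
```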
0:04:24 So we propose the flexibly-structured dialogue model. It contains five components. The green part is the input encoder module; the yellow and orange parts are the dialogue state tracking; the purple part is the knowledge base query; the red part is a new module we propose, called the response slots; and the remaining parts together form the response generation.
0:04:58 So we propose flexibly-structured dialogue state tracking, which uses only the information in the schema of the knowledge base, not the information about the values. The architecture we propose contains two parts: the informable slot value decoder, the yellow part in this picture, and the requestable slot decoder, the orange part.
0:05:26 The informable slot value decoder has a separate decoder for each informable slot. For example, in this picture, for the food slot, given the start-of-sentence token "food", the decoder generates "italian" followed by the end-of-food token. The requestable slot decoder is a multi-label classifier over the requestable slots, or you can think of it as binary classification given a requestable slot.
0:05:57 You can see that the flexibly-structured approach has a lot of advantages. First, slots and values are aligned. It also solves the out-of-vocabulary problem, and it easily adapts to new domains and to changes in the content of the knowledge base, because we use a generation method for the informable values. We also remove the unwanted order of the requestable slots and the chance of generating invalid states.
0:06:29 A nice property of flexibly-structured dialogue state tracking is that it can explicitly assign values to slots, like the fully-structured approach, while also preserving the capability of dealing with out-of-vocabulary values, like the free-form approach.
0:06:47 Meanwhile, it brings challenges in response generation. The first challenge is: is it possible to improve the response generation quality based on flexibly-structured DST? The second challenge is: how do we incorporate the output of flexibly-structured DST into response generation?
0:07:12 Regarding the first challenge, how to improve the response generation, we propose a novel module called the response slots, the red part in the picture. The response slots are the slot names, or slot tokens, that appear in the delexicalized response. For example, the user requests the address, and the system replies "the address of name_slot is address_slot". For the response slot decoder we also adopt a multi-label classifier.
0:07:52 Regarding the second challenge, how to incorporate flexibly-structured DST into response generation, we propose a word copy distribution. It increases the chance that words in the informable slot values, the requestable slots, and the response slots appear in the agent response. For example, in "the address of name_slot is address_slot", we are trying to increase the chance of name_slot and address_slot appearing in the response.
0:08:31 From now on, I am going to go into detail about how we link these modules together.
0:08:39 First is the input encoder. The input encoder takes three kinds of input: the first is the agent response in the past turn, the second is the dialogue state, and the third is the current user utterance. The output is the last hidden state of the encoder, which serves as the initial hidden state for both the dialogue state tracker and the response generation.
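A sketch of that encoder interface (a bag-of-words stand-in for the actual recurrent encoder; all names are illustrative): the three inputs are concatenated into one sequence, and a single summary "hidden state" comes out.

```python
# Illustrative input encoder: concatenate previous agent response, previous
# dialogue state, and current user utterance into one token sequence. A real
# implementation would run a recurrent encoder and return its last hidden
# state; here a bag-of-words count stands in for that summary vector.
def encode_turn(prev_agent_response, prev_state, user_utterance):
    tokens = prev_agent_response + prev_state + user_utterance
    hidden = {}
    for tok in tokens:
        hidden[tok] = hidden.get(tok, 0) + 1
    return hidden

h = encode_turn(["any", "preference", "?"], ["italian"], ["cheap", "please"])
print(h["italian"], h["cheap"])  # 1 1
```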
0:09:12 The informable slot value decoder is one part of our flexibly-structured DST. It has two kinds of input: the last hidden state from the encoder, and a unique start-of-sentence symbol for each slot. For example, for the food slot the starting word is "food". The output, for each slot, is a generated sequence of words for the slot value. For example, the value generated for the food slot here is "italian" followed by the end-of-food symbol.
0:09:48 The intuition here is that the unique start-of-sentence symbols ensure the slot-value alignment, and that the copy-augmented sequence-to-sequence decoder allows copying values directly from the encoder input.
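The per-slot decoding idea above can be sketched as follows (the step function is a hypothetical stand-in for a copy-augmented seq2seq decoder): each slot gets its own start symbol, and decoding runs until that slot's end symbol, so the generated words are aligned to the slot by construction.

```python
# Sketch of the informable slot value decoder: a unique start symbol per
# slot, and generation stops at that slot's own end symbol.
def decode_slot_value(slot, step_fn):
    token, value = "<sos_%s>" % slot, []
    while True:
        token = step_fn(slot, token)
        if token == "<eos_%s>" % slot:
            return value
        value.append(token)

# toy step function: emits "italian" for food, then the end symbol
def toy_step(slot, prev):
    return "italian" if prev == "<sos_food>" else "<eos_%s>" % slot

print(decode_slot_value("food", toy_step))  # ['italian']
```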
0:10:05 The requestable slot binary classifier is the other part of our flexibly-structured DST. Its input is the last hidden state of the encoder and a unique start-of-sentence symbol for each slot; for example, for the address slot the starting word is "address". The output, for each slot, is a binary prediction, true or false, regarding whether the slot is requested by the user or not.
0:10:38 Note that the GRU here takes only one step. It could be replaced with any classification architecture you like; we use a GRU because we want to use its hidden state as the initial state for our response slot binary classifier.
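The classifier's decision rule can be sketched like this (thresholding a per-slot score stands in for the single GRU step followed by a sigmoid; slot names and scores are made up):

```python
# Sketch of the requestable slot classifier: one binary decision per slot,
# obtained by thresholding a per-slot probability.
def predict_requestable(slot_scores, threshold=0.5):
    return {slot: score >= threshold for slot, score in slot_scores.items()}

preds = predict_requestable({"address": 0.91, "phone": 0.88, "postcode": 0.07})
print(preds["address"], preds["postcode"])  # True False
```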
0:10:57 The knowledge base query takes the generated informable slot values and the knowledge base, and outputs a one-hot vector representing the number of records matched.
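A small sketch of that query step (the bucketing of the match count is my assumption; the paper only says the result is a one-hot vector over the number of matched records):

```python
# Sketch of the knowledge base query: use the generated informable values as
# equality constraints, then summarize the result as a one-hot vector over
# "number of matching records" buckets (capped at max_bucket).
def kb_query(records, constraints, max_bucket=3):
    matches = [r for r in records
               if all(r.get(k) == v for k, v in constraints.items())]
    onehot = [0] * (max_bucket + 1)
    onehot[min(len(matches), max_bucket)] = 1
    return onehot

kb = [{"food": "italian", "price": "cheap"},
      {"food": "italian", "price": "expensive"}]
print(kb_query(kb, {"food": "italian"}))  # [0, 0, 1, 0]
```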
0:11:12 Here is our response slot binary classifier. Its input is the knowledge base query result and the hidden state from the requestable slot binary classifier. The output, for each response slot, is a binary prediction, true or false, regarding whether this response slot appears in the agent response or not. The motivation is to incorporate all the relevant information about the retrieved entities and the requested slots into the response.
0:11:52 Word copy distribution. The motivation here is that the canonical copy mechanism only takes a sequence of words as its input; it does not accept the multiple Bernoulli distributions we obtain from the binary classifiers. So we take the predictions from the informable slot value decoders, from the requestable slot binary classifier, and from the response slot binary classifier, and output a word distribution. If a word is a requestable slot or a response slot, its probability is the binary classifier output. If a word appears in the generated informable slot values, its probability is equal to one. For all other words, it is zero.
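The three rules just stated translate directly into a small sketch (vocabulary and probabilities are made up for illustration):

```python
# Sketch of the word copy distribution: informable slot value words get
# probability 1; requestable and response slot tokens get their classifier
# probability; every other vocabulary word gets 0. This distribution is
# later combined with the decoder's generation distribution.
def word_copy_distribution(vocab, informable_values, slot_probs):
    dist = {}
    for w in vocab:
        if w in informable_values:
            dist[w] = 1.0
        else:
            dist[w] = slot_probs.get(w, 0.0)
    return dist

dist = word_copy_distribution(
    vocab=["italian", "address_slot", "name_slot", "the"],
    informable_values={"italian"},
    slot_probs={"address_slot": 0.9, "name_slot": 0.8})
print(dist["italian"], dist["the"])  # 1.0 0.0
```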
0:12:53 The agent response decoder takes the last hidden state of the encoder, the knowledge base query result, and the word copy distribution. Its output is the delexicalized agent response.
0:13:08 The overall loss for the whole network includes the informable slot value loss, the requestable slot loss, the response slot loss, and the agent response generation loss.
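In other words, all four modules are trained jointly; a minimal sketch of that objective (equal weights are my assumption, and the numbers are arbitrary):

```python
# Sketch of the joint objective: the total loss is the sum of the four
# component losses, so the whole network is optimized end-to-end together.
def total_loss(informable, requestable, response_slot, response_gen):
    return informable + requestable + response_slot + response_gen

print(total_loss(0.5, 0.25, 0.125, 0.625))  # 1.5
```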
0:13:27 Experimental settings: we use two datasets, the Cambridge restaurant dataset and the Stanford in-car assistant dataset. For the evaluation metrics, for dialogue state tracking we report precision, recall, and F-score for the informable slot values and the requestable slots. For task completion, we use the entity match rate and the success F1 score. And BLEU is applied to the generated agent responses to evaluate the language quality.
0:14:02 We compare our method to these baselines. NDM and LIDM, and their variants, use the fully-structured approach for dialogue state tracking. KVRN, from the Stanford dataset, does not do dialogue state tracking. TSCP, with and without RL, is the free-form approach: it uses a two-stage copy-mechanism-augmented sequence-to-sequence model, with one encoder and two copy-augmented decoders, to decode the belief state first and then generate the response. TSCP with RL additionally tunes the response slots by reinforcement learning.
0:14:51 Here are the turn-level dialogue state tracking results. You will notice that our proposed method, FSDM, performs much better than the free-form approach TSCP, especially on the requestable slots. The reason is that the free-form approach models the unwanted order of the requestable slots; that is why all variants of FSDM can perform better than it.
0:15:22 This is our dialogue-level task completion result. You will also notice that FSDM performs better than the baseline models on most of the metrics, except BLEU on the KVRET dataset.
0:15:39 Here is an example of the generated dialogue state and response from the free-form approach and from our approach, in the calendar domain.
0:16:10 The ground-truth belief state here is: for the informable slot, the event is equal to "meeting", and for the requestable slots, the user tries to acquire the date, the time, and the party. The free-form approach generates "meeting date party", while FSDM generates: event equal to "meeting", date is true, time is true, and party is true. You will notice that the free-form approach cannot generate the time. The reason is that in the training dataset, the dominant examples contain date and party together, so the free-form approach models this kind of order and memorizes date and party together; during testing, when the user requests date, time, and party, it cannot predict the time.
0:17:05 Also, for the response: the ground-truth delexicalized response contains the party slot, the date slot, and the time slot. TSCP generates "the next meeting is at time_slot on date_slot", without the party slot, while FSDM can generate a response containing the date slot, the time slot, and the party slot.
0:17:37 The conclusion is that we propose an end-to-end architecture with a flexible structure for task-oriented dialogues, and the experiments suggest that the architecture is competitive with the state-of-the-art models, while our model is also applicable to real-world scenarios. Our code will be available in the next few weeks at this link.
0:18:05 And here is another new work of ours, regarding modeling multi-action policy for task-oriented dialogues; it will appear at EMNLP. The paper and the code are publicly accessible at this link, or you can scan the QR code.
0:18:19 The traditional policy engine predicts one action per turn, which limits its expressive power and introduces unwanted turns of interaction. We propose to generate multiple actions per turn by generating a sequence of tuples; each tuple contains continue, act, and slots. "Continue" here indicates whether we are going to stop generating tuples or continue generating them; the act is the dialogue act; and the slots here do not carry values, so there is nothing like a movie name. We propose a novel recurrent cell called gated Continue-Act-Slots, gCAS, which contains three units, the continue unit, the act unit, and the slots unit, sequentially connected inside the recurrent cell. So the whole decoder runs in a recurrent-of-recurrents fashion.
0:19:15 We would like to deliver a special thanks to our colleagues and the SIGDIAL reviewers. Thank you.
0:19:27 Thank you very much for the talk. So, are there any questions? Okay, there in the back.
0:19:40 Hi, thank you very much, that was very interesting. So what would the system do if somebody didn't respond with a slot name or a slot value? You know, you ask what restaurant they want, and the answer is that it is the closest one to the Forum Theatre.
0:19:59 Excuse me, could you repeat the question again?
0:20:03 Your system prompts somebody for a restaurant where they want to eat. You may want some Italian food; the system says, "What restaurant would you like to eat at?" and the user says, "the closest Italian restaurant to the Forum Theatre." So I'm not giving you a slot value, I'm giving you a constraint on the slot value. What would this kind of architecture do with something like that as a response?
0:20:25 Okay, thank you. With the informable slot value decoder, we are trying to catch this information from the user's side, and when we generate these things we are also using the copy mechanism, so we are trying to increase the chance of these words appearing in the response generation. For example, "the Italian restaurant", or "you want to go to some place"...
0:21:08 I understand how you do that, but the question is how you would get the act. What would the internal representation be? Somehow you would have to handle "the closest", to handle the superlative in the result. How would it compute the closest, if all you're doing is attending to values? You would have to compute some function, like distance.
0:21:27 Actually, that is a very good question. I think what you are trying to ask is whether the informable slot values from the user might not exactly match something that appears in the knowledge base.
0:21:42 That's not quite it. I'm saying the user doesn't know what's in the knowledge base; they are just saying, "whatever is the closest one, you tell me."
0:21:50 Okay, "the closest one" would, for example, be something like a value for the area slot. Actually, this kind of situation our current model cannot handle, and the past work cannot handle it either, because it does not actually appear in the datasets we are using.
0:22:09 Right, thank you.
0:22:14 Any other questions?
0:22:19 Okay, in that case I'd like to ask a question.
0:22:23 I noticed that you were evaluating your model on two datasets, the Cambridge restaurant and the KVRET, and I was wondering whether it would be possible, or how difficult it would be, to extend the model to work on the MultiWOZ dataset, which is, you know, bigger than those two and has more domains.
0:22:41 Actually, that is a very good question. For the MultiWOZ dataset, at the latest ACL conference, the TRADE network is trying to do that. They showed that a similar kind of technique, using different start-of-sentence symbols to generate the values, works there. So I think our work kind of proves that the flexibly-structured DST can be applied to MultiWOZ. And for the response generation part, we believe that our proposed word copy mechanism can also work.
0:23:31 Okay, so basically you think that just retraining should be sufficient? I think so.
0:23:36 Okay, thanks. Okay, any other questions?
0:23:42 Then I guess I have one more. Basically, when you were showing the response slot model, or the response slot decoder, you said that you have just one GRU step. What does that exactly mean? Is there like one GRU cell that is...
0:24:17 Yes, we are kind of using the GRU cell, but we do not use it recurrently.
0:24:23 Right, and is the output like a one-hot encoding of the slots to be inserted in the response, or is it some kind of embedding?
0:24:36 It depends on what the output is used for. For the slots, we can get a probability distribution from it. That is why our word copy distribution uses these zero-to-one values: the probabilities decide whether to increase the chance of these words appearing in the agent response.
0:24:58 Right, okay, thank you very much. Thank you.
0:25:02 All right, so let's thank the speaker again.