Speech Transcript - Flexibly-Structured Model for Task-Oriented Dialogues

0:00:18	Okay, so the last
0:00:21	speaker in this session is Lei Shu,
0:00:26	and she's going to present a flexibly-structured model for task-oriented dialogues, so
0:00:32	another end-to-end dialogue model.
0:00:35	So,
0:00:37	go ahead, please.
0:01:07	Hello everyone, I'm Lei Shu from the University of Illinois at Chicago. I'll present our
0:01:13	work, Flexibly-Structured Model for Task-Oriented Dialogues, FSDM for short.
0:01:19	This is joint work with Piero Molino, Mahdi Namazifar,
0:01:22	Hu Xu, Bing Liu,
0:01:24	Huaixiu Zheng and Gokhan Tur.
0:01:28	Let's quickly recap the modules in end-to-end dialogue systems.
0:01:33	A traditional modularized dialogue system is a pipeline of natural language understanding, dialogue state tracking, knowledge
0:01:40	base query,
0:01:42	a dialogue policy engine and natural language generation.
0:01:47	An end-to-end system connects all these modules together and chains them
0:01:51	together, from text in to text out.
0:01:54	The advantage of the end-to-end fashion is that it can reduce error propagation.
0:02:02	Dialogue state tracking is the key module: it understands user intentions,
0:02:08	tracks the dialogue history and updates the dialogue state at every turn.
0:02:13	The updated dialogue state is used for querying the knowledge base, for the
0:02:17	policy engine and for response generation.
0:02:20	There are two popular approaches for it: the fully structured approach and the free-form approach.
0:02:30	The fully structured approach uses the full structure of the knowledge base,
0:02:35	both its schema
0:02:37	and the values.
0:02:39	It assumes that
0:02:41	the set of informable slot values and the requestable slots are fixed.
0:02:47	The network performs multiclass classification.
0:02:51	The advantage is that the value and the slot are well aligned.
0:02:55	The disadvantage is that it cannot adapt to a dynamic knowledge base or detect out-of-vocabulary values
0:03:03	appearing in the user's utterance.
0:03:10	The free-form approach does not exploit any information
0:03:14	about the knowledge base
0:03:16	in the model architecture.
0:03:18	It represents the dialogue state as a sequence of informable values and requestable slots.
0:03:25	For example, in the picture,
0:03:27	in the restaurant domain,
0:03:29	the dialogue state is
0:03:30	"italian", then a semicolon, "cheap", then a semicolon,
0:03:34	"address", then a semicolon, and [unclear].
0:03:37	The network is sequence-to-sequence.
0:03:40	The pros are that
0:03:42	it can adapt to new domains
0:03:44	and to changes in the content of the knowledge base,
0:03:48	and it solves the out-of-vocabulary problem.
0:03:50	The disadvantage is that
0:03:53	the value and the slot
0:03:54	are not aligned.
0:03:56	For example,
0:03:57	in a travel booking system,
0:03:59	given a
0:04:00	dialogue state with "Chicago" and [unclear],
0:04:03	can you tell
0:04:04	which one is the departure city and which one is the arrival city?
0:04:09	And also,
0:04:10	the free-form approach models an unwanted order of requestable slots, and it can produce
0:04:16	invalid states:
0:04:18	they may be generated with non-requestable slot words.
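A toy illustration of the two weaknesses just described, assuming hypothetical slot names `departure` and `arrival` and a hypothetical second city (the city named in the talk is unclear). A free-form state is a flat token sequence, so its values carry no slot labels, and comparing requestable slots as a sequence makes their order matter.

```python
# Toy illustration; slot names and the second city are hypothetical.
free_form_state = ["chicago", ";", "seattle"]
structured_state = {"departure": "chicago", "arrival": "seattle"}


def free_form_values(tokens, sep=";"):
    """Split a flat, separator-delimited state into its value strings."""
    values, current = [], []
    for tok in tokens:
        if tok == sep:
            values.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        values.append(" ".join(current))
    return values


# The flat sequence recovers the values but not which slot each one fills.
print(free_form_values(free_form_state))  # ['chicago', 'seattle']
print(structured_state["departure"])      # chicago

# Requestable slots compared as a sequence: order matters, so an exact
# match fails even though the same slots were requested.
gold = ["date", "time", "party"]
predicted = ["date", "party", "time"]
print(predicted == gold)                  # False
print(set(predicted) == set(gold))        # True
```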
0:04:24	So our proposal is
0:04:26	the flexibly-structured dialogue model.
0:04:29	It contains five components.
0:04:31	The first is the green part.
0:04:33	The green part is the input encoder module.
0:04:37	The yellow and orange parts are dialogue state tracking.
0:04:41	The purple part is the knowledge base query.
0:04:45	The red part is a new module we propose, called the response slot
0:04:49	decoder.
0:04:50	And the green part and the blue part together
0:04:54	form the response generation.
0:04:58	So we propose a flexibly-structured dialogue state tracking
0:05:03	approach,
0:05:04	which uses only the information in the schema
0:05:08	of the knowledge base, but not the information about the values.
0:05:13	The architecture we propose contains two parts:
0:05:18	the informable slot value decoder, the yellow part in this picture,
0:05:22	and the requestable slot decoder, the orange part.
0:05:26	The informable slot value decoder has a separate decoder for each informable slot.
0:05:32	For example, in this picture,
0:05:36	[unclear]:
0:05:37	given the start-of-sentence token "food",
0:05:40	the decoder generates "italian" and then the end-of-sentence token for "food".
0:05:45	The requestable slot decoder is a multi-label classifier for requestable slots,
0:05:50	or you can think of it as
0:05:53	a binary classification given each requestable slot.
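A minimal sketch of how the two DST outputs could be assembled into a flexibly-structured state. The per-slot end token name `EOS_<slot>`, the slot names, the probabilities and the 0.5 threshold are assumptions for illustration, not details confirmed in the talk.

```python
# Hypothetical outputs of the per-slot informable value decoders.
informable_outputs = {
    "food": ["italian", "EOS_food"],
    "pricerange": ["cheap", "EOS_pricerange"],
    "area": ["EOS_area"],  # nothing mentioned for this slot yet
}
# Hypothetical sigmoid outputs of the requestable-slot binary classifier.
requestable_probs = {"address": 0.93, "phone": 0.08, "postcode": 0.12}


def build_state(informable_outputs, requestable_probs, threshold=0.5):
    """Return slot-aligned informable values and the sorted list of requested slots."""
    informable = {}
    for slot, tokens in informable_outputs.items():
        value = []
        for tok in tokens:
            if tok == f"EOS_{slot}":  # stop at this slot's own end token
                break
            value.append(tok)
        if value:
            informable[slot] = " ".join(value)
    requested = sorted(s for s, p in requestable_probs.items() if p > threshold)
    return {"informable": informable, "requested": requested}


print(build_state(informable_outputs, requestable_probs))
# {'informable': {'food': 'italian', 'pricerange': 'cheap'}, 'requested': ['address']}
```

Because each value comes from its own slot's decoder, the value-to-slot alignment is structural, and the requested slots carry no order.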
0:05:57	You can see that the flexibly-structured approach has a lot of advantages. First, the slot and
0:06:04	the values are aligned.
0:06:06	It also solves the out-of-vocabulary problem,
0:06:09	and it can easily adapt to new domains and to changes in the
0:06:12	content of the knowledge base, because we are using a generation method for the informable value
0:06:18	decoder.
0:06:19	And also, we remove the unwanted order of the requestable slots and the chance
0:06:24	of generating invalid states.
0:06:29	In a nutshell, flexibly-structured dialogue state tracking
0:06:33	can explicitly
0:06:35	assign values to slots,
0:06:38	like the fully structured approach,
0:06:40	while also preserving the capability of dealing with out-of-vocabulary values,
0:06:45	like the free-form approach.
0:06:47	Meanwhile, it brings challenges in response generation.
0:06:52	The first challenge is:
0:06:54	is it possible to improve response generation quality based on flexibly-structured DST?
0:07:01	The second challenge is
0:07:04	how to incorporate the output of flexibly-structured DST
0:07:08	into response generation.
0:07:12	So regarding the first challenge,
0:07:14	how to improve response generation, we propose a novel module called the response slot
0:07:20	decoder,
0:07:21	the red part in the picture.
0:07:25	The response slot
0:07:27	decoder
0:07:28	predicts the response slots.
0:07:29	Response slots are the slot names or slot tokens
0:07:32	that appear in the delexicalized response.
0:07:35	For example,
0:07:36	the user requests the address,
0:07:39	and the system replies,
0:07:40	"The address
0:07:41	of name_slot
0:07:43	is address_slot."
0:07:45	So for the response slot decoder, we also adopt a multi-label classifier.
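The multi-label targets for the response slot decoder can be read off a delexicalized response. A small sketch, assuming slot tokens follow the `name_slot` / `address_slot` pattern from the example above; the slot inventory here is hypothetical.

```python
import re

# Hypothetical response-slot inventory; the real one depends on the dataset.
RESPONSE_SLOTS = ["address_slot", "area_slot", "food_slot", "name_slot", "phone_slot"]


def response_slot_labels(delex_response):
    """Multi-label target: 1 if the slot token occurs in the delexicalized response."""
    present = set(re.findall(r"\b\w+_slot\b", delex_response))
    return [1 if slot in present else 0 for slot in RESPONSE_SLOTS]


labels = response_slot_labels("the address of name_slot is address_slot .")
print(labels)  # [1, 0, 0, 1, 0]
```

Note that the plain word "address" is not a response slot; only the placeholder tokens are labeled.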
0:07:52	Regarding the second challenge,
0:07:54	how to incorporate
0:07:56	flexibly-structured
0:07:57	DST
0:07:59	into response generation,
0:08:01	we propose a word copy distribution.
0:08:04	It increases the chance of words
0:08:07	in the informable slot values,
0:08:10	requestable slots and response slots appearing in the agent response.
0:08:15	For example,
0:08:17	in "The address of name_slot is
0:08:20	address_slot", we are trying to increase the chance of
0:08:23	"address",
0:08:25	name_slot and address_slot appearing in the response.
0:08:31	From now on, I'm going to go into detail about how we link these modules together.
0:08:39	First is our input encoder.
0:08:42	Our input encoder
0:08:44	takes three kinds of input.
0:08:46	The first is the agent response in the past turn,
0:08:50	the second is the dialogue state,
0:08:53	and the third is the current user utterance.
0:08:56	The output is
0:08:58	the last hidden state of the encoder,
0:09:01	which serves as the initial hidden state
0:09:04	for the dialogue state tracker and the response generation.
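How the three encoder inputs are combined is not spelled out here; one common choice, assumed in this sketch, is to concatenate them into a single token sequence with a separator. The `<sep>` token and the flattened "slot value" state format are hypothetical.

```python
def build_encoder_input(prev_agent_response, prev_state, user_utterance, sep="<sep>"):
    """Concatenate past agent response, dialogue state and user utterance.

    The separator and the flattened state layout are illustrative assumptions.
    """
    state_tokens = []
    for slot, value in prev_state.items():
        state_tokens += [slot, *value.split()]
    return (prev_agent_response.split() + [sep] + state_tokens + [sep]
            + user_utterance.split())


tokens = build_encoder_input("what food would you like ?", {"food": "italian"}, "cheap please")
print(tokens)
# ['what', 'food', 'would', 'you', 'like', '?', '<sep>', 'food', 'italian', '<sep>', 'cheap', 'please']
```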
0:09:12	The informable slot value decoder is one part of our flexibly-structured DST.
0:09:18	It has two kinds of input:
0:09:21	the last hidden state from the encoder,
0:09:25	and the unique start-of-sentence symbol for each slot.
0:09:29	For example,
0:09:31	for the slot "food", the starting word is "food".
0:09:34	As output,
0:09:35	for each slot,
0:09:37	a sequence of words for the slot value is generated.
0:09:41	For example,
0:09:43	the value generated for the slot here is
0:09:45	"italian"
0:09:46	and the end-of-sentence token.
0:09:48	The intuition here is that
0:09:50	the unique start-of-sentence symbol ensures
0:09:54	the slot-value alignment,
0:09:56	and the copy-augmented sequence-to-sequence model allows copying of values
0:10:01	directly from the encoder input.
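A minimal greedy decoding loop for one informable slot, assuming the start symbol is simply the slot name and the end token is `EOS_<slot>`. The `step` callable is a hypothetical stand-in for the shared copy-augmented decoder; here it replays a lookup table so the sketch runs on its own.

```python
from typing import Callable, List


def decode_slot_value(step: Callable[[List[str]], str], slot: str, max_len: int = 10) -> List[str]:
    """Greedily decode the value for one informable slot.

    Decoding starts from the slot's own start symbol, which is what ties the
    generated value to that slot; it stops at the slot's end token.
    """
    tokens = [slot]  # unique start-of-sentence symbol for this slot
    for _ in range(max_len):
        nxt = step(tokens)
        if nxt == f"EOS_{slot}":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start symbol


# Stand-in decoder: replays fixed outputs keyed by the start symbol.
SCRIPTED = {
    "food": ["italian", "EOS_food"],
    "pricerange": ["cheap", "EOS_pricerange"],
}


def toy_step(tokens: List[str]) -> str:
    return SCRIPTED[tokens[0]][len(tokens) - 1]


for slot in SCRIPTED:
    print(slot, decode_slot_value(toy_step, slot))
# food ['italian']
# pricerange ['cheap']
```

In the real model the same decoder weights serve every slot; only the start symbol changes, so adding a slot does not add a new decoder architecture.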
0:10:05	The requestable slot binary classifier
0:10:08	is the other part of our
0:10:10	flexibly-structured DST.
0:10:13	The input is the
0:10:14	last hidden state of the encoder
0:10:17	and the unique start-of-sentence symbol for each slot.
0:10:20	For example,
0:10:22	for the slot "address", the starting word is "address".
0:10:25	The output is,
0:10:26	for each slot,
0:10:28	a binary prediction,
0:10:29	true or false,
0:10:31	of whether the slot is requested by the user or not.
0:10:38	Note that
0:10:39	the GRU here has only one step.
0:10:42	It may be replaced with any classification architecture you want.
0:10:47	We use a GRU because we want to use its hidden state here
0:10:50	as the initial state for our response slot binary classifier.
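A NumPy sketch of a one-step GRU used as a binary classifier: the encoder's last hidden state is the initial state, the slot's start-symbol embedding is the single input, and a sigmoid over the new hidden state gives the probability that the slot is requested. Sizes, random weights, weight sharing across slots and the output layer are assumptions; the gates follow the standard GRU formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class OneStepGRUClassifier:
    """A single GRU cell step followed by a logistic output layer."""

    def __init__(self, emb_dim, hid_dim, rng):
        s = 0.1
        self.W = rng.normal(0.0, s, (3, hid_dim, emb_dim))  # input weights: z, r, n
        self.U = rng.normal(0.0, s, (3, hid_dim, hid_dim))  # recurrent weights: z, r, n
        self.b = np.zeros((3, hid_dim))
        self.w_out = rng.normal(0.0, s, hid_dim)
        self.b_out = 0.0

    def __call__(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])  # candidate state
        h_new = (1.0 - z) * n + z * h
        p = sigmoid(self.w_out @ h_new + self.b_out)
        return float(p), h_new  # h_new can seed the response slot classifier


rng = np.random.default_rng(0)
EMB, HID = 6, 8
encoder_last_hidden = rng.normal(size=HID)
slot_embeddings = {s: rng.normal(size=EMB) for s in ["address", "phone", "postcode"]}
clf = OneStepGRUClassifier(EMB, HID, rng)

results = {s: clf(e, encoder_last_hidden) for s, e in slot_embeddings.items()}
for slot, (p, _) in results.items():
    print(f"P(requested={slot}) = {p:.3f}")
```

Running the cell for one step only, rather than as a recurrent layer, keeps each slot's decision independent while still producing a hidden state that a later module can reuse.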
0:10:57	The knowledge base query takes the generated informable slot values
0:11:02	and the knowledge base, and outputs
0:11:05	a one-hot vector representing the number of records matched.
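A toy knowledge base query: filter records by the generated informable values and encode the match count as a one-hot vector. The three buckets (no match, exactly one, more than one) and the restaurant records are assumptions for illustration; the talk only says the vector represents the number of matched records.

```python
def kb_query(kb, constraints):
    """Return a one-hot match-count vector and the matching records.

    Buckets (assumed): index 0 = no match, 1 = exactly one, 2 = more than one.
    """
    matches = [row for row in kb
               if all(row.get(slot) == value for slot, value in constraints.items())]
    n = len(matches)
    bucket = 0 if n == 0 else (1 if n == 1 else 2)
    one_hot = [0, 0, 0]
    one_hot[bucket] = 1
    return one_hot, matches


# Hypothetical knowledge base records.
KB = [
    {"name": "rest_a", "food": "italian", "pricerange": "cheap", "area": "centre"},
    {"name": "rest_b", "food": "italian", "pricerange": "expensive", "area": "north"},
    {"name": "rest_c", "food": "italian", "pricerange": "cheap", "area": "south"},
]

print(kb_query(KB, {"food": "italian", "pricerange": "cheap"})[0])  # [0, 0, 1]
print(kb_query(KB, {"food": "italian", "area": "north"})[0])        # [0, 1, 0]
print(kb_query(KB, {"food": "chinese"})[0])                         # [1, 0, 0]
```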
0:11:12	Here is our response slot binary classifier.
0:11:16	Its inputs are
0:11:17	the knowledge base query result
0:11:20	and the hidden state from the requestable slot binary classifier.
0:11:25	The output is,
0:11:26	for each response slot, a binary prediction,
0:11:29	true or false,
0:11:30	of whether this response slot appears in the agent response
0:11:36	or not.
0:11:38	The motivation is
0:11:39	to incorporate all the relevant information about the retrieved entities
0:11:45	and the requested slots into the response.
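A simplified sketch of the response slot binary classifier. To keep it short, it uses a single logistic layer (not necessarily the architecture used in the paper) over the concatenation of the requestable classifier's hidden state (mean-pooled over slots, an assumption), the knowledge base one-hot result and a per-slot embedding. All sizes, weights and slot names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def response_slot_probs(req_hidden_states, kb_one_hot, slot_embeddings, w, b=0.0):
    """P(slot appears in the delexicalized response) for each response slot."""
    h = np.mean(np.stack(req_hidden_states), axis=0)  # pool per-slot hidden states
    probs = {}
    for slot, e in slot_embeddings.items():
        features = np.concatenate([h, np.asarray(kb_one_hot, dtype=float), e])
        probs[slot] = float(sigmoid(w @ features + b))
    return probs


rng = np.random.default_rng(1)
HID, KB_DIM, EMB = 8, 3, 4
req_hidden = [rng.normal(size=HID) for _ in range(3)]  # one per requestable slot
kb_result = [0, 1, 0]                                  # exactly one match
slot_embs = {s: rng.normal(size=EMB) for s in ["name_slot", "address_slot", "phone_slot"]}
w = rng.normal(0.0, 0.1, HID + KB_DIM + EMB)

probs = response_slot_probs(req_hidden, kb_result, slot_embs, w)
print({s: round(p, 3) for s, p in probs.items()})
```

Feeding the KB result in lets the classifier learn, for example, that entity slots such as name_slot are likely only when something matched.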
0:11:52	Next is our
0:11:52	word copy distribution.
0:11:56	The motivation here is that
0:11:58	the canonical copy
0:12:00	mechanism only takes a sequence of words as input,
0:12:05	but does not accept
0:12:06	the multiple Bernoulli distributions we obtain
0:12:09	from the binary classifiers.
0:12:12	So we take
0:12:14	the predictions from the informable slot value decoders,
0:12:18	the requestable slot binary classifier and the response slot binary classifier,
0:12:25	and output a word distribution.
0:12:27	So,
0:12:28	if a word is a requestable slot or a response slot,
0:12:33	its probability is the binary classifier's output.
0:12:37	If a word appears in the generated informable slot values,
0:12:42	its probability is equal to one.
0:12:45	For all other words, it is zero.
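The three rules above can be written as a small function. This is a sketch: the vocabulary and probabilities are made up, taking the maximum when a word matches more than one rule is an assumption, and how the response decoder consumes these weights is not shown. The values are independent per-word weights in [0, 1], not a normalized distribution.

```python
def word_copy_distribution(vocab, informable_values, requestable_probs, response_probs):
    """Per-word copy weight following the three rules described in the talk."""
    inform_words = {w for value in informable_values.values() for w in value.split()}
    dist = {}
    for word in vocab:
        p = 0.0  # rule 3: all other words get zero
        if word in requestable_probs:  # rule 1: requestable slot -> classifier output
            p = max(p, requestable_probs[word])
        if word in response_probs:     # rule 1: response slot -> classifier output
            p = max(p, response_probs[word])
        if word in inform_words:       # rule 2: generated informable value -> one
            p = 1.0
        dist[word] = p
    return dist


vocab = ["the", "address", "of", "is", "italian", "name_slot", "address_slot", "phone"]
dist = word_copy_distribution(
    vocab,
    informable_values={"food": "italian"},
    requestable_probs={"address": 0.9, "phone": 0.1},
    response_probs={"name_slot": 0.8, "address_slot": 0.7},
)
print(dist)
```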
0:12:53	Our response decoder
0:12:55	takes the encoded input, that is,
0:12:56	the last hidden state of the encoder,
0:12:59	the knowledge base query result
0:13:01	and the word copy distribution.
0:13:04	Its output is a delexicalized agent response.
0:13:08	The overall loss for the whole network includes the informable slot value decoder
0:13:14	loss, the requestable slot classifier loss, the response slot classifier loss, and
0:13:20	the agent response decoder loss.
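A NumPy sketch of the overall objective, assuming an unweighted sum of the four losses named above: token-level cross-entropy for the two decoders and binary cross-entropy for the two slot classifiers. The talk does not state any loss weights.

```python
import numpy as np


def cross_entropy(step_probs, targets):
    """Mean negative log-likelihood of target token ids under per-step distributions."""
    step_probs = np.asarray(step_probs, dtype=float)
    picked = step_probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked)))


def binary_cross_entropy(p, y, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def total_loss(inf_probs, inf_targets, req_p, req_y, res_p, res_y, resp_probs, resp_targets):
    """Sum of informable-decoder, requestable, response-slot and response-decoder losses."""
    return (cross_entropy(inf_probs, inf_targets)
            + binary_cross_entropy(req_p, req_y)
            + binary_cross_entropy(res_p, res_y)
            + cross_entropy(resp_probs, resp_targets))


# Tiny example: every prediction puts probability 0.5 on the correct answer,
# so each term equals ln 2 and the total is 4 ln 2 (about 2.773).
loss = total_loss([[0.5, 0.5]], [0], [0.5], [1], [0.5], [0], [[0.5, 0.5]], [1])
print(round(loss, 3))  # 2.773
```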
0:13:27	Experimental settings:
0:13:28	we use two datasets,
0:13:31	the Cambridge Restaurant dataset and the Stanford In-Car Assistant dataset.
0:13:35	As for the evaluation metrics:
0:13:38	for dialogue state tracking,
0:13:41	we report precision, recall and F-score for informable slot values and requestable slots;
0:13:47	for task completion,
0:13:49	we use the entity match rate and the success F1 score;
0:13:54	and BLEU is applied to the generated agent response for evaluating language quality.
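For reference, precision, recall and F1 over a set of predicted versus gold items (for example, the requestable slots in one turn) can be computed as below. This is a generic sketch; how the paper aggregates over turns or dialogues, and the exact definitions of entity match rate and success F1, are not covered here.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1; empty sets score zero."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# A prediction that misses one of three requested slots.
p, r, f1 = precision_recall_f1({"date", "party"}, {"date", "time", "party"})
print(round(p, 3), round(r, 3), round(f1, 3))  # 1.0 0.667 0.8
```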
0:14:02	We compare our method to these baselines:
0:14:05	NDM
0:14:06	and LIDM [unclear].
0:14:09	They use the fully structured approach
0:14:12	for dialogue state tracking.
0:14:14	KVRN, from Stanford,
0:14:16	does not do dialogue state tracking.
0:14:19	And TSCP
0:14:21	and TSCP/RL:
0:14:23	TSCP and TSCP/RL are free-form approaches.
0:14:27	They use a two-stage CopyNet, a sequence-to-sequence model
0:14:31	which contains one encoder and two copy-mechanism-augmented decoders
0:14:36	to decode the belief state first and then the response.
0:14:40	And TSCP/RL additionally tunes
0:14:46	the response slots with reinforcement learning.
0:14:51	Here are the turn-level dialogue state tracking results.
0:14:54	You will notice that
0:14:56	our proposed method, FSDM, performs much better than the free-form approach,
0:15:01	TSCP,
0:15:04	especially on the requestable slots. The reason is that
0:15:08	the free-form approach models the unwanted order
0:15:12	of the requestable slots,
0:15:14	which is why our FSDM can perform better.
0:15:22	These are our dialogue-level task completion results.
0:15:26	You will also notice that FSDM performs better than
0:15:30	the baseline models
0:15:33	on most of the metrics, except BLEU on the KVRET dataset.
0:15:39	Here is an example of the generated dialogue state and response from the free-form approach
0:15:45	and our approach,
0:15:46	in the calendar domain.
0:16:09	Okay.
0:16:10	For the belief state, the ground-truth belief state here is: for
0:16:14	the informable slot, the event equals "meeting", and for the requestable
0:16:18	slots, the user tries to acquire date,
0:16:20	time and party.
0:16:22	The free-form approach would generate "meeting", "date" and "party", and FSDM would generate
0:16:27	event equals "meeting", date is true, time is
0:16:31	true and party is true.
0:16:33	You will notice that here the free-form approach cannot generate "time".
0:16:38	The reason is that in the training dataset
0:16:42	there are a lot of examples
0:16:44	containing date and party, so the free-form approach models
0:16:49	this kind of order:
0:16:51	it memorizes date and party together. So
0:16:57	during testing, if the user requests date, time and party, it cannot predict
0:17:02	"time".
0:17:05	And also, for
0:17:06	the generated response,
0:17:09	the ground truth is "[unclear]
0:17:12	party_slot on date_slot at time_slot". TSCP generates
0:17:17	"the next meeting at time_slot on date_slot and time_slot", and
0:17:21	our FSDM can generate
0:17:24	"[unclear]
0:17:25	on date_slot at time_slot with party_slot". Here the free-form approach
0:17:30	generates a response that misses the party slot and repeats the time slot.
0:17:37	In conclusion, we propose an end-to-end architecture with a flexibly-structured
0:17:42	model
0:17:44	for task-oriented dialogues,
0:17:46	and the experiments
0:17:47	suggest that the architecture is competitive with state-of-the-art models, and that
0:17:53	our model is applicable to real-world scenarios.
0:17:57	Our code will be available in the next few weeks at this link.
0:18:05	And this is another new work of ours, on modeling multi-action policy for
0:18:09	task-oriented dialogues; it will appear at EMNLP 2019.
0:18:14	The preprint and the code are publicly accessible at this link, or you can scan
0:18:17	the QR code.
0:18:19	A traditional policy engine predicts one action per turn, which limits its expressive power
0:18:24	and introduces unwanted turns in interactions.
0:18:28	So
0:18:28	we propose to generate multiple actions per turn by generating a sequence of tuples.
0:18:34	Each tuple includes continue, act and slot.
0:18:37	"Continue" here means whether we are going to stop generating tuples or
0:18:42	continue generating them. "Act" means
0:18:47	the dialogue act, and "slot" means a slot that does not carry its value; it is
0:18:51	a slot like "movie name", not an actual movie's name.
0:18:53	We propose a novel recurrent cell
0:18:56	called gated Continue-Act-Slot, gCAS,
0:18:59	which contains three units:
0:19:01	a continue unit, an act unit and a slot unit,
0:19:05	which are sequentially connected in this recurrent cell,
0:19:09	so the whole decoder works in a recurrent-of-recurrent fashion.
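A toy decoding loop over (continue, act, slot) tuples, showing how the continue decision controls how many actions are emitted in one turn. The three unit callables are hypothetical stand-ins that replay a fixed script; in gCAS they are gated units inside one recurrent cell. Treating a "stop" decision as ending the loop without emitting an act is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str, str]


def decode_multi_action(continue_unit: Callable[[List[Action]], str],
                        act_unit: Callable[[List[Action], str], str],
                        slot_unit: Callable[[List[Action], str, str], str],
                        max_steps: int = 10) -> List[Action]:
    """Emit (continue, act, slot) tuples until the continue unit says stop."""
    tuples: List[Action] = []
    for _ in range(max_steps):
        cont = continue_unit(tuples)
        if cont == "stop":
            break
        act = act_unit(tuples, cont)         # act conditioned on the continue decision
        slot = slot_unit(tuples, cont, act)  # slot conditioned on continue and act
        tuples.append((cont, act, slot))
    return tuples


# Scripted stand-ins (hypothetical acts and slot names, no slot values).
SCRIPT = [("continue", "inform", "moviename"), ("continue", "request", "starttime")]

actions = decode_multi_action(
    continue_unit=lambda prev: SCRIPT[len(prev)][0] if len(prev) < len(SCRIPT) else "stop",
    act_unit=lambda prev, cont: SCRIPT[len(prev)][1],
    slot_unit=lambda prev, cont, act: SCRIPT[len(prev)][2],
)
print(actions)
# [('continue', 'inform', 'moviename'), ('continue', 'request', 'starttime')]
```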
0:19:15	We would like to give special thanks to Alex, Janice, [unclear] and the
0:19:20	SIGDIAL reviewers. Thank you.
0:19:27	Thank you very much for the talk.
0:19:29	So, are there any questions? Okay, over in the back.
0:19:40	Hi, thank you very much, that was very interesting.
0:19:43	So what would the system do if somebody didn't respond with a slot name or a
0:19:48	slot value?
0:19:49	You know, "What restaurant do you want?" "I want the one that is the
0:19:52	closest one to the [unclear] theatre."
0:19:59	Excuse me, could you repeat the question?
0:20:03	Your system prompts somebody for a restaurant where they want to eat. You know, they
0:20:07	say Italian food, the system says "What restaurant would you like to eat at?", and
0:20:12	the user says "The closest Italian restaurant to the [unclear] theatre."
0:20:16	So I'm not giving you a slot value; I'm giving you a constraint on the
0:20:20	slot value.
0:20:21	What would this kind of architecture do with something like that as a response? Okay.
0:20:25	Thank you. [unclear]
0:20:28	For most of the cases we are working on, the
0:20:32	values are detected from the user's utterance.
0:20:34	So when we generate with
0:20:35	our informable slot value decoder,
0:20:45	the informable slot value decoder tries to catch and use this
0:20:49	information from the user side. So when we are trying to generate this kind of
0:20:53	thing, we are also using the copy mechanism;
0:20:56	we are also trying to increase the chance of these words appearing in the response generation,
0:21:01	for example,
0:21:02	"the Italian restaurant" or [unclear].
0:21:05	[unclear]
0:21:08	I understand how you do that, but the question is: how would you get the
0:21:11	act? What would the internal representation be, such that we get "the closest"?
0:21:16	How would you get the superlative
0:21:18	in the result? How would you compute "the closest" if all you're doing is attending
0:21:22	to values? You have to compute some function, like distance.
0:21:27	Actually, that's a very good question. I think
0:21:31	what I'm getting is that you are asking whether, if the
0:21:36	informable slot values from the user do not exactly match
0:21:40	something that appears in the knowledge base...
0:21:42	No, that's not what I'm trying to say. I'm saying the user doesn't know what's in
0:21:46	the knowledge base; they're just saying "whatever is the closest one, you tell me".
0:21:50	Okay, "the closest one". For example, it would be something like
0:21:56	a value for the area slot. Actually, this kind of situation
0:22:01	our current model cannot handle, and past work cannot handle either, because
0:22:05	it does not actually appear in the datasets we are using.
0:22:09	Right, thank you.
0:22:14	Any other questions?
0:22:19	Okay, in that case I'd like to ask a question.
0:22:23	I noticed that you were evaluating your model on two datasets, Cambridge Restaurant and
0:22:28	KVRET, and I was wondering whether it would be possible, or how difficult it would
0:22:33	be, to extend the model to work on the MultiWOZ dataset, which is,
0:22:37	you know, bigger than those two and has more domains.
0:22:41	Actually, that's a very good question.
0:22:45	Actually, in the...
0:22:48	for MultiWOZ, at the latest ACL
0:22:52	conference, the TRADE network was trying to do that,
0:22:57	and they showed that their
0:23:01	system uses a similar kind of technique,
0:23:04	using different start-of-sentence symbols
0:23:07	to generate the values.
0:23:11	So I think that work kind of
0:23:16	proves that
0:23:17	flexibly-structured DST can be applied to
0:23:22	MultiWOZ.
0:23:23	And for the response generation part, we believe that
0:23:28	our proposed word copy distribution can also work.
0:23:31	Okay, so basically you think that just retraining should be sufficient? I think so.
0:23:36	Okay, thanks. Okay, any other questions?
0:23:42	Then I guess I have one more.
0:23:46	And there was...
0:23:51	basically, when you were showing the response slot model, or response slot
0:23:59	decoder, that was the...
0:24:02	I mean,
0:24:03	you said that you have like a one-step GRU.
0:24:11	What does that mean exactly? Is there like one GRU
0:24:15	cell that is...
0:24:17	Yes, we are
0:24:18	kind of using the GRU cell, but we are not using it as a recurrent
0:24:22	layer.
0:24:23	Right. And the output is like a
0:24:25	one-hot encoding of the slots to be inserted in the response, or is it
0:24:32	some kind of embedding?
0:24:36	Here, the output is passed through
0:24:40	a sigmoid, so we can think of it as a
0:24:42	distribution from zero to one. Right, okay. So that's why our
0:24:48	word copy distribution uses this kind of zero-to-one value as the
0:24:52	probability, and we decide whether
0:24:54	to increase this word's chance of appearing in the agent response.
0:24:58	Right, okay, thank you very much. Thank you.
0:25:02	All right, so let's thank the speaker again.

Flexibly-Structured Model for Task-Oriented Dialogues

Oral Session 3: Generation and End-to-end Dialogue Systems

Lei Shu, Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng and Gokhan Tur