0:00:17so the next speaker
0:00:31so please
0:00:36we start the presentation
0:00:40okay everyone, my name is sanchit and i'm going to present our work on dialog
0:00:44state tracking using a reading comprehension approach
0:00:46this is joint work with my collaborators from
0:00:49the amazon alexa team in sunnyvale, california
0:00:52so i'll first briefly introduce the problem of dialog state tracking, which i guess most of
0:00:55you already know, but for completeness, and then i'll
0:00:57talk about the motivation of our approach, go into the details of the architecture, show some
0:01:03results and ablation studies, and finally conclude with some error analysis
0:01:07so let's start. what is a dialog state? the dialog state is basically
0:01:12a compact summary of the dialogue history: it basically
0:01:17represents what the user is interested in at any point in the conversation, and
0:01:20typically you represent the dialog state with
0:01:23slots and values
0:01:24so here in the first turn the user says that he needs to
0:01:27book a hotel in the east that has four stars, and
0:01:30this corresponds to a state where you have two slots, area and stars, together with the respective
0:01:34values
0:01:35the "hotel" here represents the domain that the user is talking about
0:01:37and it will become more evident later why that's important, because
0:01:41a conversation can have multiple domains
0:01:43so in the second turn the agent
0:01:48responds asking if they have a price preference, and the user says that it does not matter as long as it
0:01:52has free wifi and parking, so the state gets updated with three new
0:01:57slots, parking and internet with the value yes and the price with don't care, and the
0:02:01earlier slots area and stars here get carried over
0:02:04in the next turn the agent gives some recommendation and the user says that sounds good, i
0:02:08would also like a taxi to the hotel from cambridge, so now here we see
0:02:11that the slots corresponding to the hotel domain get carried over
0:02:14but there are
0:02:15two new slots, departure and destination
0:02:17corresponding to
0:02:18a new domain, taxi
0:02:21which also gets added to the dialog state
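to make the running example concrete, here is a minimal sketch of a dialog state as a (domain, slot) to value mapping; the slot names follow a multiwoz-style convention and are illustrative, not the exact schema:

```python
# Minimal sketch of a dialog state as (domain, slot) -> value pairs.
# Slot names are illustrative, MultiWOZ-style; not the exact schema.

def update_state(state, new_values):
    """Return a new state: carry over every old slot, then apply updates."""
    updated = dict(state)          # carryover: old slots persist by default
    updated.update(new_values)     # updated/new slots overwrite or extend
    return updated

# Turn 1: "I need to book a hotel in the east that has four stars."
state = update_state({}, {("hotel", "area"): "east", ("hotel", "stars"): "4"})

# Turn 2: price doesn't matter, but free wifi and parking are required.
state = update_state(state, {
    ("hotel", "internet"): "yes",
    ("hotel", "parking"): "yes",
    ("hotel", "pricerange"): "dontcare",
})

# Turn 3: a taxi from Cambridge to the hotel adds a second domain;
# all hotel slots are simply carried over.
state = update_state(state, {
    ("taxi", "departure"): "cambridge",
    ("taxi", "destination"): "hotel",
})
```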
0:02:24now, what is the task of dialog state tracking? dialog state tracking basically
0:02:28means you want to predict
0:02:29the dialog state of the user. more concretely, you are given the dialogue
0:02:33history plus the current user utterance and you want to predict a distribution over the
0:02:37dialogue states, and we saw that the dialog state is typically represented as slots
0:02:42and values, so this means state trackers
0:02:44output a distribution over the slots and all the associated values
0:02:47and the dialogue context typically consists of features like past user utterances, past system
0:02:52responses
0:02:52it can have the previous belief state or even an nlu interpretation if that is available
0:02:56so this is the task
0:02:58now i want to talk briefly about the traditional approaches to state
0:03:01tracking
0:03:02so one of the common approaches is where you encode the dialogue history
0:03:07with some model architecture and then you have
0:03:10a linear plus softmax layer on top and you output a distribution
0:03:13over the vocabulary
0:03:14of the slot type, and you do this for each slot in your schema or
0:03:17dialog state
0:03:19for example, here you see an approach to joint state tracking where they encode the dialogue
0:03:23history using a hierarchical lstm and then on top of that, on the hidden representation
0:03:28of the context, they have a feed-forward layer, one for each slot type
0:03:32and then a softmax layer to output the distribution over the values that that
0:03:36particular slot can take, and these are the values which you have seen in the
0:03:39training set
0:03:40this brings us to the two
0:03:42main problems with such approaches
0:03:44one is that they cannot handle out-of-vocabulary slot value mentions, because they only output a
0:03:49distribution over values that have been seen in the training set
0:03:52so in such approaches it is assumed that the vocabulary or the ontology
0:03:56is known in advance
0:03:57and the second thing is that they do not scale well for slots that have a
0:04:00large vocabulary
0:04:00for example, for a slot like hotel name, you can imagine that
0:04:04the slot can take values from a possibly very large set, so there's not enough
0:04:08data to learn a good distribution over this large vocabulary
0:04:11on the other hand, reading comprehension approaches typically do not rely on a
0:04:15fixed vocabulary
0:04:16this is because reading comprehension approaches are typically structured as
0:04:20extractive question answering, where the goal is to find a span of tokens
0:04:23in the
0:04:24passage which contains the answer, so there is no fixed vocabulary
0:04:28and the second thing is
0:04:29that there have been a lot of recent advancements in reading comprehension that
0:04:32we can leverage
0:04:33if we structure our problem of state tracking as reading comprehension. this led us to
0:04:37propose this reading comprehension formulation for dialog state tracking
0:04:43in the next slide,
0:04:44before i go into exactly how we formulate the problem, i also want to
0:04:47just give a one-slide overview of how
0:04:50machine reading comprehension problems are typically posed
0:04:52the general idea in reading comprehension is you are given a question and a passage
0:04:55and you are looking for a span of tokens in the passage that can be
0:04:58the answer
0:04:59this is also called extractive question answering
0:05:01and how people do this is: you encode the passage, giving a representation of each
0:05:06token in the passage, you encode the question, giving a question representation, and
0:05:09on top you generally have two attention heads attending from the question
0:05:13to each token in the passage, one of the attention heads representing the
0:05:16start probability distribution
0:05:17and the other representing the end probability distribution. once you have these two probability distributions, you
0:05:21just output
0:05:22the most probable span
0:05:25and that is your answer
0:05:26the slide shows a popular architecture from microsoft, which
0:05:31uses a bunch of self-attention and encoding layers to encode the passage
0:05:35tokens
0:05:36but the general idea is that you encode passage and question and then you
0:05:38have attention heads representing the start and end of the span
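the "output the most probable span" step can be made concrete; a small sketch (not necessarily the exact decoding used in any particular system, and the span-length cap is an assumption): given per-token start and end probabilities, pick the pair with start before end that maximizes the product:

```python
def best_span(p_start, p_end, max_len=10):
    """Return (i, j) maximizing p_start[i] * p_end[j] with i <= j < i + max_len."""
    best, best_score = (0, 0), 0.0
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Toy distributions over a 5-token passage: start peaks at token 1, end at token 3.
p_start = [0.1, 0.6, 0.1, 0.1, 0.1]
p_end   = [0.05, 0.1, 0.15, 0.6, 0.1]
span = best_span(p_start, p_end)
```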
0:05:41so now let's look at how we formulate the dialog state tracking problem
0:05:45as reading comprehension
0:05:46this is the same dialogue as before
0:05:48the user is looking for a hotel
0:05:49and after the second turn you want to predict the values for each of these
0:05:52slots, hotel area, hotel pricerange
0:05:55and so on
0:05:56in our formulation this changes into something like this:
0:05:59the whole dialogue context becomes the passage, with alternating agent and user turns
0:06:04and then the questions are something like "what is the requested hotel area?", basically
0:06:08the requested value of the slot that you want to track, or
0:06:11something like "is parking required in the hotel?" and so on, and then what you want
0:06:14to find is the answer to these questions
0:06:17so for the first question you can look for "east" in the passage and
0:06:20the model should point to the span "east", and similarly for the second question
0:06:24where you are looking for hotel pricerange, the model should point to the corresponding span of tokens
0:06:27in the context
0:06:28so, as simple as that
0:06:31now a few words about how we represent the different components. the dialogue history
0:06:36which is analogous to the passage in our formulation, is represented as the concatenated user
0:06:41and agent turns so far
0:06:42it can be either a one-dimensional representation or you could have a two-dimensional, matrix-like
0:06:47hierarchical representation, and then you can use whatever encoder you like to
0:06:50encode them
0:06:52and the slot, which is the question in our formulation, is the domain plus slot embedding
0:06:57we want the domain as well because, as we saw in the previous
0:06:59example,
0:07:01a dialogue can span multiple domains
0:07:04and we have a fixed-dimensional vector for this domain-slot combination which is learned
0:07:08along with the full model
0:07:10one thing to note here is that, unlike prior work,
0:07:13we don't actually convert the slot into a full natural language question; we just treat
0:07:17the embedding of the slot plus domain
0:07:19as the question itself
0:07:21and finally the answer is just the
0:07:23start and end position in the conversation
0:07:26okay, so this is the main model in our approach, which we call the slot span
0:07:30model
0:07:31which is just like a typical extractive QA model. what it does is predict
0:07:34the slot value as a span of tokens in the dialogue. you have start and end pointers,
0:07:38and the start and end of the span are computed with bilinear attention between the dialogue context and
0:07:42the slot embedding
0:07:43just like reading comprehension models. the example shown here is
0:07:46the same dialogue as before: the user wants to book a hotel in
0:07:49the east with four stars, so after the first turn, if you want to track the slot
0:07:53hotel area, in this case we would expect that our model outputs a
0:07:58start and end probability which is high for the "east" token in the context, which represents
0:08:02basically the answer "east"
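the bilinear attention just described can be sketched with plain numpy; this is a sketch under the assumption of separate start and end weight matrices, with toy sizes and random stand-ins for the learned parameters, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def span_logits(H, q, W_start, W_end):
    """Bilinear attention from the slot embedding q to each context token.

    H: (T, d) token representations from the context encoder.
    q: (d,) learned (domain, slot) embedding acting as the question.
    Returns per-token start and end scores, each of shape (T,).
    """
    return H @ W_start @ q, H @ W_end @ q

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 6, 8                      # toy sizes: 6 context tokens, hidden dim 8
H = rng.standard_normal((T, d))  # stand-in for BiLSTM token outputs
q = rng.standard_normal(d)       # stand-in for the slot/domain embedding
W_s = rng.standard_normal((d, d))
W_e = rng.standard_normal((d, d))

start_scores, end_scores = span_logits(H, q, W_s, W_e)
p_start, p_end = softmax(start_scores), softmax(end_scores)
```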
0:08:04okay, but this model is not sufficient
0:08:08and this is true also for question answering in general, because
0:08:11there are certain slots that can take values from a closed set, like parking and
0:08:14internet with yes or no, so we need to account for that. also there are slots
0:08:18that can have the value "don't care", for example pricerange in the previous example
0:08:22and
0:08:23many of the slots in the schema are never mentioned, so you need
0:08:26to fill them with the default "none" value. so these are the cases that
0:08:28cannot be correctly handled by the span model
0:08:31to deal with this we augment our QA model with two other auxiliary models, a
0:08:35slot carryover model and a slot type model
0:08:37the carryover model predicts whether we should
0:08:40update a slot value in the current dialogue turn or carry the old value over
0:08:44from the previous turn, and in the beginning each slot is initialized to the
0:08:47default "none" value
0:08:48and the type model is just a simple classifier which makes a
0:08:51decision about one of four classes for the slot: yes, no, don't care, or span type
0:08:57so now i'll go into details of the two models. the carryover model, as i
0:09:00said, just predicts whether we update the slot value at the current turn
0:09:02or carry it over, and it makes this binary decision for all the slots
0:09:06jointly at each turn
0:09:08as an example, after the first turn you have
0:09:11these values. one thing i want to clarify, because the name "carryover
0:09:14model" is a kind of confusing:
0:09:18what it exactly is is a slot update model, by which i mean
0:09:21that
0:09:21a one represents that
0:09:23you want to update the slot and a zero represents that you want to carry
0:09:25over. i just keep this convention because we have it that way in the paper
0:09:28so in here, when you go from the first turn to the second
0:09:32turn, the user has mentioned three new slots,
0:09:36internet, parking, and pricerange, so those slots will get updates with the
0:09:39label one, while the earlier slots area and stars
0:09:43will be zero, because they will just get carried over from the
0:09:46previous turn
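the update-versus-carryover labels just described can be derived by diffing consecutive gold states; a minimal sketch (label 1 = update, 0 = carry over, matching the convention above; slot names are illustrative):

```python
def carryover_labels(prev_state, curr_state, all_slots):
    """1 if the slot's value changed (or was newly filled) this turn, else 0."""
    return {
        slot: int(curr_state.get(slot) != prev_state.get(slot))
        for slot in all_slots
    }

slots = ["hotel-area", "hotel-stars", "hotel-parking",
         "hotel-internet", "hotel-pricerange"]

# Turn 1 -> Turn 2 from the running example: three new slots appear,
# while area and stars are unchanged and should be carried over.
turn1 = {"hotel-area": "east", "hotel-stars": "4"}
turn2 = dict(turn1, **{"hotel-parking": "yes", "hotel-internet": "yes",
                       "hotel-pricerange": "dontcare"})
labels = carryover_labels(turn1, turn2, slots)
```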
0:09:48and the type model is simple: it just predicts the slot type given the
0:09:53question, which is the slot, and the dialogue context, and it makes a four-way
0:09:57decision between yes, no, don't care, and span. a simple example would be: the slot hotel
0:10:01area in this context would be a span type, because you want to
0:10:04find the value "east" in the context, and for the slot hotel parking the value
0:10:08would be just
0:10:08yes, so that is the type that the model should output
0:10:12okay, so putting all this together, the combined model looks like this. at the
0:10:16bottom-most layer we have a word embedding layer covering the tokens in the passage
0:10:20next we have a contextual embedding encoding, which is basically a bidirectional lstm
0:10:25we just use a single bidirectional lstm layer, so this gives us the contextual
0:10:28representation for each of the tokens; we use the last hidden layer of the lstm, which
0:10:31gives us the embedding of the dialogue
0:10:33we embed the question using just the slot plus domain embedding, randomly initialized,
0:10:38and we just learn it with the model
0:10:40then this
0:10:42dialogue embedding vector is what we use to predict the slot carryover
0:10:47decisions, so we have a feed-forward
0:10:49layer on top of that which just makes the binary decision for each of the
0:10:51slots
0:10:53for the slot type model, the inputs are the dialogue embedding vector along with the
0:10:56question vector, and then it makes a softmax decision to predict
0:10:59one of the four classes
0:11:00and the span model, finally, takes as input the question vector and then has
0:11:04an attention from the question to each of the tokens in the passage, just
0:11:08like any QA model, and you have the start span prediction and the end
0:11:11prediction
0:11:12so at inference, what happens is: we begin with the
0:11:17slot carryover model. if the carryover model says
0:11:20a zero, then we
0:11:23just carry over the slot value from the previous turn. if it says one, which
0:11:26means you want to update the slot,
0:11:27then we invoke the type model
0:11:30if the type model says yes, no, or don't care, we update the slot value with that;
0:11:32if it's a span, then we invoke the span model to get
0:11:36the start and end position of the slot value, and then we just extract that
0:11:40from the conversation and update the slot value
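the inference cascade just described can be sketched as a small decision function; the three model calls are stubbed out with lambdas here, and the names are illustrative, not the paper's code:

```python
def track_slot(slot, prev_value, tokens, carryover_model, type_model, span_model):
    """One slot, one turn: carry over, or update via type/span prediction."""
    if carryover_model(slot) == 0:          # 0 = keep previous value
        return prev_value
    slot_type = type_model(slot)            # "yes" | "no" | "dontcare" | "span"
    if slot_type != "span":
        return slot_type                    # closed-set value used directly
    start, end = span_model(slot)           # pointers into the dialogue
    return " ".join(tokens[start:end + 1])  # extract the span as the value

# Toy stand-ins for the three trained models.
tokens = "i need a hotel in the east".split()
value = track_slot(
    slot="hotel-area",
    prev_value="none",
    tokens=tokens,
    carryover_model=lambda s: 1,      # decide to update this slot
    type_model=lambda s: "span",      # value must be extracted from context
    span_model=lambda s: (6, 6),      # points at the token "east"
)
```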
0:11:42okay, so
0:11:44for evaluation we have been using the same dataset as everyone else these days,
0:11:46the multiwoz dataset
0:11:48which is a human-human dialogue collection with about two point five thousand single-domain and
0:11:52seven thousand multi-domain dialogues
0:11:54it has annotations for dialog state and system acts; we only use the dialog state in
0:11:59this work
0:12:00and some statistics: it has about ten and a half thousand dialogues, about a hundred fifteen
0:12:04thousand turns
0:12:06an average of about thirteen and a half turns per dialogue, and the total number of slots we're tracking here is thirty-
0:12:10seven
0:12:10across six domains
0:12:14some results
0:12:15before the results, the metric: it is joint goal accuracy
0:12:20which basically means that at every turn you want to predict all the slots correctly; if
0:12:24any of the slots is wrong then the accuracy for that turn is zero, otherwise one
0:12:29so it is a strict metric
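joint goal accuracy, as defined here, can be computed in a few lines; a sketch (real evaluation scripts also normalize slot values, which is omitted):

```python
def joint_goal_accuracy(gold_states, pred_states):
    """Fraction of turns where every slot-value pair matches exactly."""
    correct = sum(int(g == p) for g, p in zip(gold_states, pred_states))
    return correct / len(gold_states)

gold = [{"hotel-area": "east"},
        {"hotel-area": "east", "hotel-parking": "yes"}]
pred = [{"hotel-area": "east"},
        {"hotel-area": "east", "hotel-parking": "no"}]  # one slot wrong

# The second turn has a single wrong slot, so the whole turn counts as wrong.
jga = joint_goal_accuracy(gold, pred)
```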
0:12:31so the rows here are the other
0:12:33approaches. the first number is from the original multiwoz paper. the next rows are
0:12:36glad and gce, approaches that have been there earlier; glad uses self-attention and
0:12:41splits the tracking
0:12:43into a global tracking and a local tracking module, and gce is just a simplified
0:12:46version of glad
0:12:47so those two numbers, and then
0:12:49the joint state tracking approach that i showed before, where they encode the dialogue history
0:12:53through a hierarchical lstm
0:12:54and then have a feed-forward layer for each slot type
0:12:57so that number is about thirty-eight. now, our approach with a single model
0:13:01beats all these approaches
0:13:03and then we also trained an ensemble model, which basically just takes a
0:13:06majority vote between three different models trained with three different seeds
0:13:10and finally we also wanted to check
0:13:13how well this works if you just combine our approach with
0:13:17a closed-vocabulary approach like the joint state tracking model, and
0:13:21how we combine is very simple: we just
0:13:23choose one of the two approaches based on,
0:13:26for each slot, which of the two
0:13:28is better
0:13:29for that particular slot on the dev set
0:13:31and this gives us a considerable boost of about five percent
0:13:34and we'll see why this happens
0:13:38we did some ablation studies. the first and most important is:
0:13:41what if we feed the ground truth for all the three models? note that these
0:13:45ablations are for the single model, not for the
0:13:49model
0:13:49we combined
0:13:51with the joint state tracker
0:13:52so here, if we feed the ground-truth carryover decisions, slot types, and span
0:13:56positions into the model,
0:13:57you get a joint goal accuracy on the dataset of seventy-three
0:14:01which basically means our approach is upper-bounded by seventy-three
0:14:04why is that? basically, because some seven percent of slot values are not
0:14:07even present verbatim in the conversation. an example would be something like: "what kind of sports
0:14:11attractions are available in the centre of town?"
0:14:15and you want to fill the slot attraction type
0:14:18if the answer is "multiple sports", our model will never get it right; even
0:14:22if the model points to "sports", that value, "sports",
0:14:25is not the same as the ground truth "multiple sports". so our approach is
0:14:28upper-bounded
0:14:30in this way, and this is also the reason that when we combine our approach with the
0:14:33closed-vocabulary approach, which is more based on the ontology, we get some
0:14:36boost
0:14:37another ablation is on the ensemble: if we add voting we get about two percent
0:14:41gain. then we did some oracle experiments with each of the model types. if we
0:14:45replace just the slot type model with the ground truth in this already-
0:14:49constructed model, we don't get much gain; we get about like one percent, or
0:14:54half a percent
0:14:55if we replace the slot span model with the ground truth, we get about four percent
0:14:59gain
0:15:00but if we replace the slot carryover model with the ground truth,
0:15:03we get about twenty percent
0:15:05gain. so as you can see, the bottleneck here is the carryover
0:15:08model. this is also evident if you look at the accuracy for each
0:15:11model: the type and span models have like
0:15:13ninety to ninety-five percent, which is pretty high
0:15:15but the carryover model only has like seventy to seventy-six percent turn-level
0:15:18accuracy
0:15:19so this gives a direction for future work: mainly we want to improve this
0:15:23carryover model
0:15:26we also analyzed how the performance varies as we vary the conversation history length,
0:15:30and we see a steady decrease in performance as the conversation gets deeper, and this
0:15:34is
0:15:35because of the error propagation from the carryover model
0:15:40and finally we did some error analysis. we basically took some two hundred error samples
0:15:45and
0:15:47we analyzed them manually and bucketed
0:15:50them into four different categories
0:15:52the first and biggest category is called unanswerable slot errors
0:15:55so these are the errors which are made by our carryover, slot update,
0:15:58model
0:15:59there are two cases here. in the first one, the reference
0:16:03is "none" and the hypothesis is not "none", and it basically means
0:16:05the reference says that we should carry over the slot value from the previous turn but
0:16:09the model instead chose to update it
0:16:12and the second one is the opposite of
0:16:14this. so in the first case,
0:16:16even though this is the bulk of the errors, like forty-two percent, when
0:16:19we look at the errors manually, these are mostly not real errors; the model is
0:16:22making a prediction which is actually correct
0:16:25but there is a lot of annotation noise in the dataset, because the states
0:16:28are sometimes updated late; they get
0:16:33updated only one turn after. so because of this, all of these get
0:16:37counted as errors, but a bunch of them,
0:16:40a lot of them, are not really errors
0:16:42in the second case,
0:16:44the ground truth says
0:16:47to update the slot value while our model predicts
0:16:49to just carry over from the previous turn. in this case there are some real
0:16:53errors
0:16:54for example, here you can see the user is trying to book
0:16:58a restaurant in the centre part of the town, and finally the agent
0:17:01is able to make the reservation, and then the user next says that he
0:17:05also needs an attraction near the restaurant. so here, when you want
0:17:09to fill a slot, say attraction area, the model's prediction
0:17:14is that this would be "none", which basically means it has not been mentioned
0:17:17but as you can see, the user says it should be near the
0:17:20restaurant, so it should be carried over from the previous domain, and our model is
0:17:23unable to do that
0:17:26so the next category is what we call incorrect extractive reference, which
0:17:29basically means there are multiple possible candidates in the context but our model predicts the
0:17:33wrong candidate. in this case you see the user is trying to book a
0:17:37hotel for four people, the agent responds that the booking was unsuccessful,
0:17:41and the user
0:17:42basically changes the request to eight people
0:17:45the ground truth is eight, of course, but our model predicts four
0:17:47so we see that a lot of this happens when there is anaphora, as
0:17:50in this case, or when the user changes their mind. so our model is
0:17:53not
0:17:54robust to these kinds of things, and a possible reason would be that the model
0:17:57overfitted to a particular entity, like "four", which is more frequent in the
0:18:00training set
0:18:02this accounts for about twenty percent of the errors
0:18:05the next category is what we call slot resolution errors. here you see the
0:18:10context is something like "i want to leave the hotel by two thirty"
0:18:13the model pointed to "two thirty", but the ground truth is actually "fifteen
0:18:17thirty". so this is kind of an unintended output, because we only do
0:18:20pointing into the context
0:18:21so these are more like normalization errors; this is about thirty percent
0:18:25the final category is slot boundary errors, where the span model makes a mistake:
0:18:28it either expands the span, outputting a span which is a superset of
0:18:34the reference, or it outputs a subset of the reference, as in this case, where the reference
0:18:37is a subset of the span guessed by our model
0:18:40but this is only a small bucket, like two percent of
0:18:43the errors
0:18:45finally, i also want to
0:18:47put one slide on this: the number that i showed was state-of-the-art at
0:18:50submission time, but since then there's a paper, this is
0:18:53trade,
0:18:53or "transferable multi-domain state generator for task-oriented dialogue systems"
0:18:57here what they do is use a pointer generator network to combine a fixed vocabulary along
0:19:01with a distribution over the
0:19:03dialogue history, and they get a slightly better accuracy than
0:19:05our model combined with the joint state tracker. but the key difference between their
0:19:10approach and ours is that
0:19:11they use a decoder to generate the slot value token by token, whereas we just use
0:19:15two pointers to point to the start and end of the span
0:19:20that's probably all. i wanted to thank you, and
0:19:22i can take questions
0:19:29okay, so we have time for questions
0:19:32thank you, thank you for the talk. my question is: when you're
0:19:36considering the different types, like yes, no, don't care, and the span,
0:19:42this potentially misses another case, right? that is when the user doesn't really
0:19:49say
0:19:50the value of the slot but it can be inferred. like, suppose i ask "what
0:19:54cuisine type do you want?" and i say "i want some pizza tonight". the classifier could
0:19:59infer that the value for the cuisine type would be italian, but the
0:20:04user never said "italian", so the span would not cover that case, right?
0:20:08so your scenario is the user says "i want some pizza
0:20:12tonight" or something like that. okay, that's true, so those are not covered here
0:20:16because we are just doing pointing. our type model probably would output span
0:20:20because it's not one of the other types, and we would then probably
0:20:24point to "pizza" as the category, but we would fail, just like in the other cases
0:20:28so we have a future direction where we can take inspiration from reading
0:20:31comprehension, where you can do more abstractive question answering: you can use the span
0:20:35as a rationale and then have a generative model that generates the value
0:20:39which in this case is most likely "italian", grounded in that selection. we can do that
0:20:43in future
0:20:45okay
0:20:51thanks for the great talk. just one simple question: so if i give
0:20:56you a sentence like "i want to go from cambridge to london", then you know
0:21:00the destination is london and the departure is cambridge. can your model, in this case,
0:21:06do better, because they are both values for a place,
0:21:09right?
0:21:10can your model do better than a baseline system on these kinds of
0:21:14designs?
0:21:15because you are still predicting slot by slot, so then how does the model know the
0:21:19destination is
0:21:20london and not cambridge?
0:21:24i see
0:21:26so, because we feed the context, right, the model can learn from the "to"
0:21:29and "from" in the context that...
0:21:32what about, because you predict the span, is it possible
0:21:37that both slots,
0:21:39both predictions, are the same span?
0:21:42no, because we also feed in the slot type, right?
0:21:47so destination and source...
0:21:56so when we are predicting
0:22:01the span
0:22:02we also
0:22:03have a question vector, right? so it would be either destination or the source
0:22:07so based on that the span model can infer which it should be
0:22:11the question is the user query embedding, so for the two slots, is it the same user query?
0:22:17no, it would be different, right? it would be either the destination or the source
0:22:22so you also consider slot information? yes, the question embedding
0:22:26is the slot
0:22:28okay, then you can tell them apart. okay, cool, thanks
0:22:35other questions?
0:22:50maybe a provocative question, but we have heard many papers about, you know,
0:22:56dialog state tracking, and in particular on this particular corpus, and so my question is:
0:23:02what do you think we need, to take it to the next level
0:23:05when, you know, we don't talk about going from cambridge to london or looking
0:23:11for a chinese restaurant
0:23:14tonight?
0:23:15so if you're asking particularly about improving on this dataset, i think, to
0:23:20be honest,
0:23:24i think,
0:23:26i mean, it may not be necessary, i would say. having particularly experimented
0:23:30with this dataset, i found that there are a lot of errors in
0:23:33it, especially with respect to the dialog state annotations. so if you're just trying to improve
0:23:38upon this, it's not a good idea, because we won't even know whether we are
0:23:41doing better or not. so there are the new dstc datasets that we
0:23:45can look into and see
0:23:47if our approaches do better. but otherwise, i mean, i feel like now people have
0:23:52begun to do more end-to-end approaches, where you don't even need the state, it's
0:23:55more implicit, but then that again runs into the same problem of to pipeline or not to
0:24:00pipeline, so
0:24:01i don't have a good answer
0:24:05any other questions?
0:24:08i have one question. so have you considered the way of evaluation? i'm
0:24:14not sure if the carryover is, you know, causing some problem in the evaluation
0:24:18because if we keep the previous slot values, it sort of back-propagates errors to the
0:24:25next turns. but if you, if you sort of
0:24:28had another metric, like a slot update rate or something like that, would it
0:24:33be possible for you to evaluate your misses more accurately,
0:24:37how a slot would be treated,
0:24:41also?
0:24:42i see your point
0:24:44so the numbers, i think, for the...
0:24:48so the number, the seventy-six percent, is more like a
0:24:51turn-level accuracy: for a particular turn, whether the carryover model predicts everything
0:24:55correctly. it's more like
0:24:57whether a slot should or should not be updated
0:24:59something like precision and recall for the updates would be better, exactly; i can put
0:25:03that there. but actually here, these two error rates:
0:25:07you can think of the first one as more like the precision
0:25:10of the slot update model, thinking of the carryover model as an
0:25:13update model. so in that case the model predicts that we should update
0:25:17but the ground truth is not to update, so this is like precision, and the
0:25:20second is more connected to
0:25:22recall; that statistic is more like recall
0:25:23so i don't know the numbers offhand, but the eighty, eighty-four percent number
0:25:27is, i think, actually somewhat inflated; it is more meaningful to look at the turn level, because it
0:25:32wants all the
0:25:34slots to be right, because the eventual goal is joint goal accuracy, where you
0:25:37want all the slots to be correctly predicted
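the precision and recall framing mentioned here can be written down directly; a sketch that treats the carryover model as a binary "update" classifier over slot-turn pairs (the labels below are toy numbers, not figures from the paper):

```python
def update_precision_recall(gold, pred):
    """Precision/recall of the 'update' (1) decision over slot-turn pairs."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: updates (1) are rare, as in the real data, where most
# slot-turn pairs are plain carryovers (0).
gold = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
pred = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
prec, rec = update_precision_recall(gold, pred)
```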
0:25:39okay, and one important
0:25:43thing is also that we train the carryover model jointly across slots, and not
0:25:48per slot, and this is important because if you do it per slot, we
0:25:51don't get good performance, because, for one,
0:25:55the per-slot training examples, particularly for the carryover model, are highly biased: you
0:26:00can imagine, like, the number of updates is very few; most of the time a slot
0:26:03is just getting carried over. so if you trained it that way, you wouldn't have
0:26:06enough signal for the updates and you would just get biased
0:26:11training
0:26:14okay, so it's about time, so let's thank the speaker again