This talk is about reference resolution in situated dialogue with learned semantics.

Let's first look at situated dialogue, that is, dialogue situated in an environment, for example in human-robot interaction. In this image, the human is trying to teach the robot to learn a map of the physical environment in the room. The next example is an intelligent tutoring system, where the tutor is trying to teach college students to use computers to solve complex problems. As we can see, the natural language dialogue in these environments is highly related to the environment: it frequently refers to the objects or events in the environment.
Here is an example from the tutorial dialogue, which is about Java programming. In each tutoring session there is a human tutor and a human student, and the tutor is trying to teach the student Java programming. As we can see, everything in the dialogue is related to the content of the Java code; for example, they talk about the objects in the Java code. So to build an intelligent tutoring system that understands the user's dialogue, we have to understand the dialogue within this environment, including interpreting the referring expressions.
The problem is defined as follows: given a referring expression, which is a sequence of words or tokens, and an environment, which in this case we simplify as a set of objects, the goal is to find the most compatible object for this referring expression.
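(In other words, writing the referring expression as r and the set of objects in the environment as E, we pick o* = argmax over o in E of compat(r, o), where compat is some compatibility score; later in the talk this score is instantiated as the probability output of a learned classifier.)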
For the rest of this talk I'll introduce the corpus we used, the challenges, related work, our solution, the experiments, and finally future work.
The corpus we used is from a tutorial study: it is a set of tutorial dialogues for Java programming, and the dialogues are between a human tutor and a human student. Here is the interface we used to collect the data, which is an Eclipse plug-in. This plug-in lets the tutor and the student work remotely, in different rooms, much like using Google Docs: whenever the student edits the code, the tutor sees it, and they can also send text messages to each other within the tool. We captured the dialogue between them and all of the editing behaviors.
The tutorial dialogue is mostly on introductory Java programming, which involves creating, traversing, and modifying parallel arrays. The data was collected in 2007 and includes forty-five tutoring sessions with almost five thousand utterances in total. Each session lasts about one hour and has on average one hundred and eight utterances.
There are some challenges in doing reference resolution in such a setting. The easy case is when the user refers to something in the Java code using only its proper name: intuitively, we can just compare the string from the object and the string from the referring expression to see whether they match. But this accounts for only about a third of all the cases. It can be harder than that, when the user refers to something in the Java code using only its attributes rather than its name, like "the two-dimensional array". And it can be even harder, when they refer to something that is not properly defined, such as a general concept, or just a piece of code. For example, here, "you could copy and use that code if you wanted": here "that code" is just an arbitrary stretch of code defined at this point. So those are the challenges so far, or actually two of them. The last one is that the number of objects in the Java code can be very large, since it includes the methods, the variables, the objects, or any piece of code, and it is dynamic, because as the programming goes on, objects can be removed from the code or newly introduced.
Next I'll talk about some closely related prior work on how people have dealt with this problem before. The first is Iida et al., who worked on reference resolution for dialogues from a collaborative game called Tangram. In this game there are seven objects and two players: one is the instructor, and the other one applies the instructions from the instructor to manipulate those objects. They used dialogue history and task history, where the dialogue history is any object that was mentioned, recently or from the beginning of the dialogue, and the task history is any object that was manipulated from the beginning of the task. That's how they did it.
The next one is Kennington and Schlangen's 2015 paper. They used words as classifiers to learn the relationship between referring expression tokens and physical attributes. In their setting the environment is also a set of objects, and they use co-occurrence information: for a token, they find all of the co-occurring attributes. They manually matched the referring expressions with their referents, so they can find the co-occurrence information between tokens and attributes, and then they use what is learned to predict the referent for a new given referring expression. In this paper we follow Iida et al. in that we use similar dialogue history and task history features.
Here's an example from the corpus. Here the student has just typed a line of code creating a new int array, and then another line of code. The tutor then says that the array "looks like it is set up correctly now", so we can see there is a relationship between the editing behavior and the referring behavior. After that the tutor asks, "in the for loop, what should you be storing in that array?" These utterances come very close together and keep referring to the same thing, repeatedly and locally. That's why we think the dialogue history and the task history are very important, so we use them.
The third kind of information we use is semantic information. A referring expression is a noun phrase, and this noun phrase has different segments; each segment can indicate some attribute of the referent of the referring expression. So we used a conditional random field to segment and label the referring expression, in order to find out the attribute information it gives. After this segmentation and labeling we can find the attribute segments: for example, in the referring expression "the two-dimensional array", the segment "array" gives its category, and "two-dimensional" gives, in this case, the dimension of the array. After that we extract the attribute value from each segment, and we use these to build the attribute vector. The attribute vector is the set of attributes that the referent of this referring expression should have, if we do everything correctly.
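As a rough illustration (not the system's actual code; the label names and the example tokens are assumptions for the sketch), here is how per-token labels from a CRF-style tagger could be turned into such an attribute vector:

    # Sketch: turn per-token attribute labels (e.g. CRF output) into an attribute vector.
    # The label set and the example below are illustrative only.
    def attribute_vector(tokens, labels):
        attributes = {}
        for token, label in zip(tokens, labels):
            if label == "O":                      # token carries no attribute information
                continue
            attributes.setdefault(label, []).append(token)
        # join multi-token segments into a single attribute value
        return {attr: " ".join(words) for attr, words in attributes.items()}

    # attribute_vector(["the", "two", "dimensional", "array"],
    #                  ["O",   "DIM", "DIM",         "CATEGORY"])
    # -> {"DIM": "two dimensional", "CATEGORY": "array"}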
Before starting the reference resolution task itself, we want to build a candidate list, because the number of objects in the Java code can be very large; it contains more or less everything. We take a very intuitive approach. First, we include all of the objects mentioned so far, from the beginning of the session. Then we include all of the objects manipulated from the beginning of the session into the candidate list. And finally, we include all of the objects that match any attribute mentioned in the referring expression. The reason we require a match on only one attribute is that we don't want to miss the real referent just because of a mistake in predicting the semantics. That's how we create the candidate list.
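A minimal sketch of this candidate-list construction (the object fields and helper names are assumptions for illustration, not the paper's code):

    # Union of (1) objects mentioned so far, (2) objects manipulated so far,
    # and (3) any code object matching at least one predicted attribute.
    def build_candidates(mentioned_so_far, manipulated_so_far, all_code_objects, attribute_vec):
        candidates = set(mentioned_so_far) | set(manipulated_so_far)
        for obj in all_code_objects:
            # requiring only one matching attribute keeps the true referent in the
            # list even if part of the semantic labeling is wrong
            if any(obj.attributes.get(attr) == value for attr, value in attribute_vec.items()):
                candidates.add(obj)
        return candidates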
The reference resolution task is then defined as finding the most compatible object from the candidate list for this referring expression, where the compatibility probability is defined as the output of a classification function. For the classification function we tried four different kinds of classifiers to see how they work in this setting: logistic regression, decision trees, naive Bayes, and neural networks. Once we have the probability of the given referring expression referring to each candidate in the candidate list, we can rank all of the candidates by this probability and pick the candidate with the highest probability as the referent. That's how we did it.
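A minimal sketch of this ranking step, assuming scikit-learn's logistic regression for concreteness (the paper also tried decision trees, naive Bayes, and neural networks); the featurize helper corresponds to the feature groups described next, and the training-data layout is an assumption:

    from sklearn.linear_model import LogisticRegression

    def train_resolver(pair_features, pair_labels):
        # one training row per (referring expression, candidate) pair;
        # label 1 if the candidate is the true referent, else 0
        clf = LogisticRegression(max_iter=1000)
        clf.fit(pair_features, pair_labels)
        return clf

    def resolve(clf, candidates, featurize):
        # featurize(candidate) builds the feature vector for the current referring
        # expression; score every candidate and return the highest-probability one
        probs = clf.predict_proba([featurize(c) for c in candidates])[:, 1]
        return max(zip(probs, candidates), key=lambda pair: pair[0])[1]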
Here are the features we used. The first group is the dialogue history features, for example whether this object has been mentioned, and how long ago it was mentioned. The second group of features is the task history features, for example how long ago this object was manipulated, such as typed or selected. The third group of features is the semantic features, which measure how well the semantics of the referring expression match a given candidate.
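A rough illustration of the three feature groups for one (referring expression, candidate) pair; the candidate fields such as last_mention_turn are hypothetical names for this sketch, not the paper's actual feature set:

    def make_features(candidate, current_turn, attribute_vec):
        return [
            # dialogue-history features: mentioned at all, and how long ago
            1.0 if candidate.last_mention_turn is not None else 0.0,
            current_turn - (candidate.last_mention_turn or 0),   # falls back to session start
            # task-history features: how long ago it was manipulated (typed / selected)
            current_turn - (candidate.last_manipulation_turn or 0),
            # semantic features: how many predicted attributes the candidate matches
            sum(candidate.attributes.get(attr) == value
                for attr, value in attribute_vec.items()),
        ]

    # for resolve() above this would be wrapped per expression, e.g.
    # featurize = lambda c: make_features(c, current_turn, attribute_vec)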
For the experiments we used six sessions of the tutorial dialogue, which contain three hundred and sixty-four referring expressions, and we manually labeled their referents in the Java code. We had two annotators, and we got a kappa of 0.65. We used six-fold cross-validation, which basically means taking one session out, training on the other five sessions, and testing on the held-out one.
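A sketch of this leave-one-session-out evaluation (scikit-learn's LeaveOneGroupOut; it assumes pair_features, pair_labels, and session_ids arrays built as above, and reuses the train_resolver sketch from earlier):

    from sklearn.model_selection import LeaveOneGroupOut

    # session_ids marks which of the six sessions each (expression, candidate)
    # row came from, so each fold holds one whole session out
    for train_idx, test_idx in LeaveOneGroupOut().split(pair_features, pair_labels,
                                                        groups=session_ids):
        clf = train_resolver(pair_features[train_idx], pair_labels[train_idx])
        # ... run resolve() on the referring expressions of the held-out session
        #     and measure reference-resolution accuracy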
To evaluate our approach we compared it with two baseline models. The first one is the Iida baseline model; they used the dialogue history and the task history in their approach, so to make it fair we added a handcrafted lexicon to provide some semantic information for this model. The second baseline is the Kennington baseline model; because it was a weakly supervised approach and did not perform reference resolution in a dialogue setting, to make it fair we added the dialogue history and task history features to this approach.
Here are the results we got. As we can see, our approach got a higher accuracy on reference resolution, and the reason it is higher is the semantics we learn using the conditional random field, which has a higher accuracy on the semantics. Actually there are two groups of referring expressions in the reference resolution task, because some of the referring expressions contain semantic information, as their segments indicate, and some of them are just pronouns, which do not carry semantic information. The contribution of our work here is mostly on reference resolution for those referring expressions that contain semantic information.
To see how well our approach could work given better semantic information, we also tested it using gold-standard semantic labels, which were produced manually. Using the gold semantics to run the same approach again, we got an even higher accuracy. This means the semantic information is very important for this reference resolution task, but there is still room to improve, because the human agreement is about eighty-five percent, which is a lot higher than the accuracy we got from the approach.
As for future work, I think it will be promising to consider the structure of the dialogue, and an unsupervised or weakly supervised approach would also be very interesting, since it doesn't require much annotation. That's it. I want to thank our colleagues for their input and thank our sponsors. Thank you.
Let me repeat the question: you were asking whether we have different approaches for different kinds of referring expressions, like pronouns and non-pronouns, right? Yes. The difference here is only in the semantic information, because the main contribution of this work is employing the semantic information from the referring expression, and pronouns are pretty simple; we don't get much information from them. So we could run this model, this approach, by splitting the set of referring expressions, and the results would look fairly similar. But yes, I think we will consider this when we actually build the entire tutoring system. Thank you.
Yes, eye gaze would definitely give us more information when we do the reference resolution. It comes back to an assumption here, that people would look at the object when they refer to it. So that could be another feature added directly to this approach, or maybe there would be some more sophisticated way to use that kind of information. Thank you.
As for the mouse cursor: actually, we do use the selection, which is when the student might highlight a part of the code to show what it looks like or to ask a question about it, so that is one case of using the mouse or the cursor. But yes, that is definitely also very interesting information to consider in this setting.
Well, actually I haven't given this very deep consideration. I just feel that the different relations in the discourse structure could give some interesting information for determining the referent, but I haven't worked out the details.