Speech Transcript - Cogent: A Generic Dialogue System Shell Based on a Collaborative Problem Solving Model

0:00:16	so i'm presenting this on behalf of a whole team of people from my
0:00:21	agency choh man and ian are here james is not
0:00:28	and this is gonna be a little bit different 'cause we're gonna have no
0:00:32	neural networks knock or run with of the pause and no f scores
0:00:38	no numbers
0:00:39	so it's gonna be a little different
0:00:41	so here's the problem that
0:00:45	we are at
0:00:46	the so we start state-of-the-art in dialogue systems actually a couple of you please and
0:00:52	the you know and others have had a similar slide
0:00:57	what we're doing mostly is very simple parsing based on keywords phrases and so on
0:01:04	regular expressions and so on
0:01:08	very simple dialogue models based on either finite state automata or frame systems with slot
0:01:18	filling and so on
0:01:20	engineered for a specific application
0:01:24	and there's
0:01:24	thousands of applications for these
0:01:28	but
0:01:29	every single dialog system is developed for that specific application in some cases
0:01:33	here this in get out
0:01:37	multi-domain but essentially those are sort of separate dialogue systems that kind of work together
0:01:41	with a single interface
0:01:44	but importantly there is no transfer between these domains there is no generic
0:01:52	capability in these systems to transfer from
0:01:56	one domain to another
0:01:59	and as far as the kind of interactions that these other systems allow
0:02:04	there's
0:02:05	no effective verification or corrections the kind of dialogue that they allow is actually very
0:02:12	limited
0:02:15	so here's our position
0:02:17	dialogue is an activity that can be and should be modeled independently of the
0:02:23	application domain
0:02:26	we need understanding of language to effectively and robustly handle a broad range of
0:02:32	user utterances that the same
0:02:36	intention can be expressed in so many different ways
0:02:41	added
0:02:42	most of these
0:02:45	finite state based and with simple parsing hubris of data that are sitting in a
0:02:50	day just common
0:02:53	unless somebody's willing to spend years just
0:02:57	encoding lots of regular expressions i suppose
0:03:00	and we also think that the community needs frameworks to facilitate the development
0:03:05	of these complex mixed-initiative systems with very sophisticated back-end reasoning and i think there's
0:03:12	a dearth of such tools
0:03:15	we see for example in parsing with the stanford tools or nltk or
0:03:21	other various tools
0:03:23	people adopted them and they started using them and they got better outcomes of that
0:03:29	but in dialogue maybe we don't have sophisticated enough tools
0:03:34	tools that allow people to develop such systems
0:03:39	so
0:03:41	as you saw in the title our model is
0:03:45	based on collaborative problem solving so what is collaborative problem solving
0:03:50	well when people collaborate what do they do they deliberate they develop solutions jointly they
0:03:57	identify and resolve errors and problems they keep track of the progress as the
0:04:03	task is going on
0:04:07	they jointly perform actions of course they can negotiate roles
0:04:13	and they learn from one another
0:04:14	and all these things are done through communication right it's not necessarily by language communication
0:04:21	could be gestures it could be other kinds of communication but it is by communication
0:04:26	so we need to
0:04:29	so
0:04:30	our central thesis is that essentially all or at least most of the human machine
0:04:36	language based communication can be modeled effectively
0:04:40	as collaborative problem solving
0:04:44	so
0:04:45	what does the collaborative problem solving model entail
0:04:50	so what we mean by this is that we need to model the
0:04:54	shared intentional space between the two agents some people actually have
0:05:01	i heard something about
0:05:04	multiple agents sort of
0:05:07	multi
0:05:09	agent dialogue here we just limit ourselves to two but
0:05:13	even with multiple the same principles apply
0:05:16	so what is this intentional space what kind of objects are we dealing with
0:05:22	these are goals solutions
0:05:24	and understanding common ground session that strange
0:05:29	and all this shared understanding
0:05:32	arises from communication we need to communicate and agree on things and so on
0:05:39	one agent can't
0:05:40	create a collaborative goal or a solution they have to pursue something together
0:05:48	obviously a selection japan like to go without
0:05:51	the other person
0:05:53	so this is a picture taken from a paper by james allen and a couple
0:05:59	of my other colleagues from
0:06:01	two thousand two
0:06:03	the model places the sort of tasks in this model in
0:06:10	four different areas communicative interaction collaborative problem solving problem solving individual problem
0:06:19	solving actually
0:06:20	of course my interest in this talk is just about the
0:06:24	collaborative problem solving here
0:06:26	which really can look at the object in there really reflect the problem solving actually
0:06:32	the same kind of thing
0:06:33	except that their properties
0:06:35	so
0:06:37	the central thesis that we had in the early two thousands and
0:06:43	it wasn't just us other people have had the same idea is that at that level
0:06:48	you can reason in a domain independent way and represent things in a domain independent
0:06:54	way
0:06:55	but this has never been realized properly and we also didn't do it properly
0:07:02	we had a prototype we never really did it so here today i'm announcing
0:07:08	that we now have
0:07:10	and
0:07:12	this
0:07:13	this architecture would be familiar to all of you it doesn't look very different from
0:07:17	other things that we've seen so far
0:07:19	so we have natural language understanding there's lexicon ontology
0:07:23	the dialogue management which is really the collaborative problem solving agent that we have
0:07:28	it is in the centre
0:07:29	there's the backend problem solving or as we call it here
0:07:33	a behavioral agent there's generation so this doesn't look very different from other systems
0:07:41	the parts that are in colour are the components of cogent
0:07:46	it is a domain independent shell right so by itself you're not gonna
0:07:51	have a dialogue system just by having that but you can have a
0:07:56	dialogue system by adding to that
0:08:01	the behavioral agent which is domain specific and not to mention the language generation and of course
0:08:07	for generation you could possibly have some higher level domain independent generation components but
0:08:14	we don't have it
0:08:16	so a lot of people can do sort of in domain generation
0:08:22	and
0:08:23	so
0:08:25	i'm just gonna talk a little bit about the components there
0:08:29	so the natural language understanding the workhorse of everything that we did for the last
0:08:35	twenty some years has been the trips parser
0:08:39	it's a deep
0:08:45	parser that produces a very rich representation of the meaning of every
0:08:50	sentence it has a very principled ontology it has a very large lexicon
0:08:57	some of it about ten thousand maybe more
0:09:00	are hand built lexical entries we extend it by learning from wordnet but
0:09:07	essentially we derive automatically so for verbs for example we derive automatically the roles that
0:09:13	they have from definitions
0:09:16	and so on
0:09:19	and
0:09:22	i'm not gonna go into too many details but it is available online
0:09:26	and you can actually check it there's a web service for the basic
0:09:30	parser and a number of variations of the parser as well
0:09:34	the output
0:09:37	positions
0:09:51	i don't see that
0:09:53	data
0:09:54	so i don't think this is actually visible but
0:09:58	so this is the
0:10:00	web interface i just put in a sentence something that came up earlier i
0:10:06	need a hotel in the centre of calibration
0:10:09	and that's
0:10:10	what the parse looks like and you can see that
0:10:15	everything so there's a speech act at all
0:10:18	every single word represented here has a type in the ontology
0:10:24	so for hotel it's accommodation for need it's want
0:10:28	centre is a geographic region
0:10:31	even with the british spelling there it got that right
0:10:37	and if you look for example at the next one i prefer very nice hotels
0:10:42	then you can see that prefer is also want just like need which is something
0:10:47	that you probably want
0:10:51	and you can see how adjectives have
0:10:55	very interesting types here the space here is basically a value on a scale of
0:11:00	expressiveness as it for and so on
0:11:03	so you get very rich representation
0:11:14	well
0:11:15	there's additional things here dealing with reference resolution ellipsis processing ontology mapping
0:11:21	i'm not gonna talk too much about this
0:11:25	one thing here is that there's conventional speech act identification so sometimes
0:11:29	you can ask a question by making essentially an assertion or you can
0:11:36	make an assertion by asking a question or make a request by asking a question
0:11:41	so there's a conventional mapping between the surface speech act and the intended speech act but
0:11:49	you just really
0:11:51	so now to the cps agent
0:11:54	so
0:11:56	essentially the output of all this natural language analysis is fed into the
0:12:01	collaborative problem solving agent and what it does is it provides a domain independent model of
0:12:06	communication adaptable to new domains
0:12:10	on
0:12:11	one side it does
0:12:13	what really could be called just intention recognition
0:12:16	so there's a communicative act coming in from the user utterance you want to understand what the
0:12:22	intention of the user is and we call that a collaborative problem solving act
0:12:28	and obviously on the other side i'm not gonna spend much time on
0:12:32	that
0:12:34	if the system itself wants to communicate to the user it will do that by
0:12:39	actually creating a collaborative problem solving act which gets sent to the generation component
0:12:45	and eventually will get turned into language
0:12:50	so the cps agent does that and essentially maintains the collaborative state
0:12:57	which
0:12:58	all these acts together essentially drive the conversational structure so that's why it is
0:13:03	a dialogue model
0:13:06	and again i'm going to repeat myself here but this is premised on the idea that there
0:13:09	is a domain independent semantics of language that supports
0:13:14	reasoning about intentions
0:13:18	so but
0:13:20	there is a tension here between the desire for domain independent processing and the need for
0:13:26	very domain specific processing so
0:13:29	understanding the intention of the user is almost always impossible to do in just a domain
0:13:35	independent way so the way we deal with this problem is that essentially the collaborative
0:13:41	problem solving agent's understanding of the user intention is a hypothesis
0:13:48	and then this is passed over to the behavioral agent which includes sort of grounding of
0:13:52	all objects and is actually trying to figure out does this make any sense in
0:13:57	this particular state of the task does this make sense and if so then that
0:14:04	gets
0:14:05	committed as shared if it's a goal then the system commits to it as a
0:14:12	shared goal but if not there can be clarification and so on going on
0:14:19	so actually the way this is done is based on this evaluate commit
0:14:24	cycle
0:14:25	so the collaborative problem solving agent will figure out a collaborative problem solving act which
0:14:32	explains the user utterance
0:14:35	it will send an evaluation an evaluate act to the behavioral agent
0:14:40	and if the behavioral agent agrees with it it will send back an acceptable and only then
0:14:45	will we have a commit and the goal is shared
0:14:51	and this is the same way that we're dealing with requests proposals and
0:14:57	questions as well
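
The evaluate commit exchange described here can be sketched in a few lines. This is only a minimal illustration under assumptions, not the Cogent implementation: the names `CpsAct`, `CollaborativeState`, `evaluate_commit` and `toy_behavioral_agent`, and the string verdicts, are hypothetical and chosen to mirror the spoken description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class CpsAct:
    """A hypothesized collaborative problem solving act (illustrative names)."""
    kind: str      # e.g. "ADOPT"
    content: str   # e.g. a goal description


@dataclass
class CollaborativeState:
    """What both parties have agreed on so far."""
    shared_goals: List[str] = field(default_factory=list)


def evaluate_commit(act: CpsAct,
                    evaluate: Callable[[CpsAct], str],
                    state: CollaborativeState) -> str:
    """Ask the behavioral agent to evaluate the hypothesis; commit only if acceptable."""
    verdict = evaluate(act)
    if verdict == "ACCEPTABLE":
        state.shared_goals.append(act.content)
        return "COMMIT"
    # Any other verdict leaves the shared state untouched and opens a clarification.
    return "CLARIFY"


def toy_behavioral_agent(act: CpsAct) -> str:
    """Stand-in for a domain specific agent: it only knows one task."""
    return "ACCEPTABLE" if act.content == "build a tower" else "UNACCEPTABLE"


state = CollaborativeState()
first = evaluate_commit(CpsAct("ADOPT", "build a tower"), toy_behavioral_agent, state)
second = evaluate_commit(CpsAct("ADOPT", "bake a cake"), toy_behavioral_agent, state)
```

The point of the sketch is the ordering: the interpretation stays a hypothesis until the domain specific side accepts it, and only then does anything enter the shared state.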
0:15:00	if the ba
0:15:03	doesn't
0:15:04	like
0:15:05	the evaluation there are several different ways it can deal
0:15:10	with this one is just say a rejection actually i think this should be unacceptable
0:15:15	but anyway
0:15:16	but
0:15:17	we use the like to do this and it can actually give a reason
0:15:23	it's a horizontal we don't have enough box for corporate law
0:15:28	it is also possible to propose alternative way and together that for a to the
0:15:34	resulting
0:15:36	i'm gonna skip on aspect is just models
0:15:39	so in the paper there is a very detailed description of the various collaborative
0:15:44	problem solving acts
0:15:46	so i'm not gonna go into the details so there's a number of them that have
0:15:50	to do with goals so we can adopt select defer a goal if
0:15:55	you don't wanna deal with it right now you can completely abandon the goal or
0:15:59	we can release it which means that it's completed
0:16:02	satisfactorily or not
0:16:06	and there's a bunch that are about knowledge you can make an assertion that
0:16:10	actually once it is committed that means the agent now believes whatever
0:16:18	the human user intends it to believe
0:16:25	there's questions ask if and ask wh just two kinds of
0:16:30	questions
0:16:31	you can see a number of examples here
0:16:33	quite complicated examples these are actual examples from our systems
0:16:38	including something like doesn't amount of sorely
0:16:41	at the conditional you
0:16:43	at a one that if we increase the amount of whatever the some other proteins
0:16:47	all
0:16:48	or i wh with choices of the gt wagner propose which are regulated by a
0:16:53	reinstall
0:16:55	so this is all there and there's a number of acts related to the
0:16:58	problem solving status so again acceptable and unacceptable are essentially the interpretation acts where
0:17:06	the ba says i like that i don't like it goals can be
0:17:11	rejected
0:17:13	there can be failures of execution answers to questions and execution status which can
0:17:20	be either
0:17:21	done at the very end but it can also be used to
0:17:25	just mark progress i'm still working on this
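
As a rough summary of the acts named in this part of the talk, here is a small sketch that groups them into the three families mentioned: goals, knowledge, and problem solving status. The exact spellings, the grouping, and the names `Family`, `CPS_ACTS` and `acts_in` are assumptions reconstructed from the spoken names; the paper should be consulted for the authoritative inventory.

```python
from enum import Enum
from typing import Dict, List


class Family(Enum):
    GOAL = "goal"
    KNOWLEDGE = "knowledge"
    STATUS = "status"


# Act names as heard in the talk; spellings are assumed, not taken from the system.
CPS_ACTS: Dict[str, Family] = {
    "ADOPT": Family.GOAL,
    "SELECT": Family.GOAL,
    "DEFER": Family.GOAL,
    "ABANDON": Family.GOAL,
    "RELEASE": Family.GOAL,
    "ASSERTION": Family.KNOWLEDGE,
    "ASK-IF": Family.KNOWLEDGE,
    "ASK-WH": Family.KNOWLEDGE,
    "ACCEPTABLE": Family.STATUS,
    "UNACCEPTABLE": Family.STATUS,
    "REJECTED": Family.STATUS,
    "FAILURE": Family.STATUS,
    "ANSWER": Family.STATUS,
    "EXECUTION-STATUS": Family.STATUS,
}


def acts_in(family: Family) -> List[str]:
    """Return the act names of one family, in declaration order."""
    return [name for name, fam in CPS_ACTS.items() if fam is family]


goal_acts = acts_in(Family.GOAL)
```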
0:17:30	okay well as you one is the u
0:17:35	so
0:17:36	what does it mean to add a behavioral agent to actually have a dialogue system based
0:17:42	on cogent
0:17:43	so
0:17:45	you can think of the cps acts as establishing a sort of a protocol once you implement the
0:17:49	protocol and ensure that the obligations that these things create
0:17:55	are satisfied
0:17:57	then after that there's nothing else to do essentially there's no requirement for how the
0:18:02	behavioral agent represents things internally
0:18:05	whether it's a planning system or a very simple database lookup
0:18:09	what kind of complexity it has
0:18:12	how many sub agents are out there as long as there's a single
0:18:17	interface a single overarching ba everything should be fine
0:18:21	with it has a models alone
0:18:24	there are some related ways of affecting how the natural language understanding works
0:18:30	but is somewhat so you really want to use this and actually
0:18:36	change how the natural language understanding work because it's not good enough you ask the
0:18:42	did you never i'm not reliable
0:18:45	so we have a number of implemented cogent based systems in very different domains
0:18:51	very different interaction styles
0:18:54	so biocuration
0:18:57	that is an assistant for biologists and a bunch of systems that have
0:19:03	to do with the blocks world
0:19:05	more or less
0:19:07	and some others that are sort of music composition visual storytelling that's creating
0:19:13	scenarios for making movies essentially with animated characters so very different domains very different
0:19:20	vocabulary very different interaction style
0:19:25	so i'm not gonna go too much into any of these systems
0:19:29	but one of the reviewers wanted to see the biocuration system
0:19:34	and i couldn't put too much into the paper because it wasn't published and it
0:19:38	still isn't really
0:19:40	but i'm gonna give you a little video of the system and
0:19:45	so these are all systems except for the one that ian presented the other
0:19:50	day all these systems are not developed by us people took cogent and they developed them on
0:19:56	their own
0:19:57	so let's look at a bit of a dialogue
0:20:09	providing you understand looks like logical systems like
0:20:15	was there
0:20:19	one is going to be sensor
0:20:21	but the trees are a little bit
0:20:23	the rule machine i don't want the one here
0:20:27	sorry but
0:20:31	alright so here we have sort of the dialogue history then is a
0:20:36	idea a system by averages
0:20:39	what you from an implementation and what you what the goal here i want to
0:20:45	find out how you be shown in the
0:20:50	b equal to these two genes
0:20:54	and there's just outline i think it's probably best work
0:20:58	so i'm so what is the goal here i want to find an explanation so
0:21:03	it's a very interesting type of goal of how this happens
0:21:09	and the way the system knows how to provide an answer to that is to
0:21:13	build a model of the molecular interactions
0:21:18	and can try to find out
0:21:20	one that you are maybe we which is kind of the source
0:21:25	useless is g the joan i in this particular cells
0:21:30	so
0:21:31	i'm gonna you go your
0:21:39	so the user then asks how does your maybe if we regulate pi okay now
0:21:43	why did they know you can see here about the p eight we hate you
0:21:47	'cause they're biologists obviously this is not a system for novices
0:21:51	and what the system does it actually there's a huge array of
0:21:57	biology specific agents
0:21:59	including ones that go look up pathways in databases
0:22:05	there's one that actually reads papers and can extract information from there
0:22:11	so it defines a watermark task between these two
0:22:16	genes and it creates a network that the user can use as a source
0:22:22	of information
0:22:24	so i'm gonna speed up because i know my ties are already right it is
0:22:29	okay
0:22:31	so a and creates a so
0:22:35	i'm just gonna lexical and only because it is below
0:22:39	so now the user creates with the system a very specific model
0:22:44	of this
0:22:46	the system actually based on what it sees it can suggest additional information based on
0:22:52	what it knows
0:22:54	and the user can look at it and say well okay that's
0:22:56	good enough but actually i know something even more specific than that
0:23:00	and the system comes back you can see here
0:23:03	but
0:23:05	to actually explain
0:23:07	the original question that the user asked
0:23:11	and there's more it can actually take this and create a dynamic model out of it
0:23:17	you can ask questions for example is the amount of whatever protein high and you can
0:23:23	see all kinds of useful information about it so i'll stop here
0:23:44	for
0:23:46	for plan recognition we actually don't
0:23:49	in the cps agent
0:23:53	we don't actually use plan recognition right now i know that
0:23:58	more me
0:23:59	running when you
0:24:01	understanding dialog
0:24:03	we
0:24:04	for now we don't at high
0:24:11	so
0:24:12	essentially one way of answering this question
0:24:17	is why were we successful with this where we weren't before and had done
0:24:23	more work before it's because of the way we split
0:24:27	what can be done in a domain independent way from what can be done in
0:24:30	a domain dependent way
0:24:32	so a lot of the time as i said in this evaluate commit cycle
0:24:36	we basically just throw things over the fence and say well you figure it out
0:24:39	so most of the situational context and there is not a model of user
0:24:43	modelling in this thing but all of this would actually reside right now
0:24:48	in the ba obviously you want at the cps agent level to
0:24:52	have some of it
0:24:53	to be able to do some more reasoning but right now we don't
0:25:10	we don't offer generation all the teams that have worked on this have
0:25:15	essentially created template based generation on their own and so we don't
0:25:25	provide
0:25:34	no
0:25:35	shortcuts
0:25:37	would be very difficult
0:25:51	well we started with similar goals right as collagen there are
0:26:00	actually some of these older papers dealing more with that question about the differences
0:26:07	i
0:26:08	there are some limitations in the collagen model there are some really good features in the
0:26:14	collagen model
0:26:15	so i think we go in the same direction but kind of
0:26:20	tackle things a little bit differently but actually i just learned recently that
0:26:29	charles rich and others have put together a toolkit
0:26:35	moving in the same direction
0:26:37	although as far as i understand i haven't seen it in practice that their there's
0:26:42	is more task oriented kind of like reading floor
0:26:47	so you know what you know way they can move their expectations as the kind
0:26:52	of reduce their expectations
0:26:57	so i don't know if you saw on the slide there was a
0:27:03	link you can actually download it i recommend you to use
0:27:07	at least the parser you can actually do much better than what people do
0:27:10	and if you want to use the whole system will be

Cogent: A Generic Dialogue System Shell Based on a Collaborative Problem Solving Model

Oral Session 5: State Tracking

Lucian Galescu, Choh Man Teng, James Allen, Ian Perera