0:00:17 Our next speaker will now talk about interactive and lifelong knowledge learning in dialogues.
0:00:50 Hello everyone. I'm Sahisnu Mazumder, and I'm going to present our work "Lifelong and Interactive Learning of Factual Knowledge in Dialogues". This is joint work with Bing Liu, Shuai Wang, and Nianzu Ma.
0:01:07 First of all, human conversation is inherently knowledge-driven: humans most of the time ground their responses in facts to generate meaningful and intelligent replies.
0:01:20 Motivated by this, in recent years many researchers have worked on the area of knowledge-grounded conversation modeling. Here are two prominent works on this topic. The first extends the sequence-to-sequence model so that response generation is conditioned on facts selected from a knowledge base, in addition to the context drawn from the conversation history.
0:01:58 The second is a response ranking model for retrieval-based dialogue systems, where a fact encoder is combined with shared utterance and response encoders in order to generate prediction scores.
0:02:18 Crucially, these existing works all operate over fixed knowledge bases. As we know, knowledge bases are highly incomplete, and knowledge keeps growing over time during conversations.
0:02:33 So how can an agent learn new knowledge during the conversation process? There are two major approaches we can follow. One is passive learning, that is, learning by extracting information from web corpora or past conversation history. The other is interactive learning, that is, learning by interacting over multiple dialogue turns, which is the focus of this paper.
0:02:57 Here is an example of how humans learn and use knowledge in a lifelong manner. User 2 learns that Stockholm is the capital of Sweden from user 1, and then leverages that knowledge in another conversation to recommend Stockholm to user 3. This kind of knowledge learning happens continuously in a multi-user environment.
0:03:23 Based on this idea, we previously proposed a new paradigm called lifelong interactive learning and inference (LiLi). The motivation is that acquiring knowledge alone is not enough: the agent has to learn its semantics continuously, otherwise it never learns how to utilize that knowledge in downstream applications.
0:03:45 The paradigm consists of three stages that depend on each other. First, formulate and execute a query-specific inference strategy that interleaves processing and interactive actions. Second, after executing the strategy, the agent learns from the interaction, that is, it gains knowledge about deciding what to ask and when to ask. Third, the acquired knowledge is leveraged in current and future inference processes.
0:04:24 In this work, our focus is to develop a system that can learn new knowledge effectively from the user when the system is unable to answer the user's WH-question. For example, if the user asks "Which country is Boston located in?", the answer may not be directly present in the knowledge base, but the system can ask for some supporting facts from the user and reason with them. With respect to the query triple, "Boston" is the head entity, "located in" is the relation, and the country to be predicted is the tail entity.
0:05:05 In this study we focus on developing the learning engine itself. Building a full-fledged dialogue system that also handles question understanding and conversational semantic parsing is a separate task, which we leave for future work.
0:05:25 There are several challenges in developing this kind of system. Knowledge acquisition and the learning model have to be trained simultaneously, in an online fashion. The answer may not be present in the KB but may still be inferable, so the system has to reason under a lot of uncertainty and incompleteness. And by acquiring clues from the user, going beyond existing techniques, it also has to learn the semantics of the new knowledge so that it can be leveraged in conversational applications.
0:05:58 Moreover, the setting is open-world: the query relation and entities may be unknown, because given the open-ended nature of conversation, new concepts and relations keep appearing. The answer may or may not be derivable, so the system has to learn how to reject, that is, to say "I don't know" when it is not confident enough.
0:06:25 Formally, the problem can be stated like this. Given a query entity and a query relation, we build a structured query. The goal is two-fold: answering the user query, or rejecting it when the system believes the true answer does not exist in the KB; and learning or acquiring some knowledge in the process that can be leveraged in future interactions and inference steps.
0:06:55 We primarily deal with two kinds of queries: closed-world queries, where both the query entity and the relation are known to the system, and open-world queries, where either of them can be unknown.
0:07:09 To address this, we propose an engine for continuous and interactive learning of knowledge, in short CILK. The main idea of the system is to convert an open-world query into a closed-world query by acquiring some supporting facts from the user during the conversation process, and then to infer the query answer by utilizing the acquired supporting facts together with the existing facts in the KB.
0:07:40 The inference process is basically like this. The system uses each entity present in the KB to form a candidate triple with the query entity and relation, then scores each triple and chooses the entity with the maximum score as the answer. Querying thus works as a confidence-scored ranking process.
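To make that ranking step concrete, here is a minimal Python sketch, assuming a triple-scoring function has already been trained; the names (answer_query, kb_entities) and the toy scorer are hypothetical illustrations, not the paper's implementation:

```python
def answer_query(head, relation, kb_entities, score):
    """Rank every KB entity as a candidate tail and return the best one."""
    candidates = [(tail, score(head, relation, tail)) for tail in kb_entities]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[0]  # (best_entity, best_score)

# Toy usage with a dummy scoring function:
toy_score = lambda h, r, t: float(len(h) + len(r)) / (1 + abs(len(t) - 6))
print(answer_query("Boston", "located_in", ["USA", "Sweden", "England"], toy_score))
```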
0:08:05 CILK consists of three components: a knowledge base that stores the acquired knowledge and keeps growing over time; an interaction module that executes the interaction, that is, decides when to ask and what to ask; and an inference module that performs the inference, that is, learns the semantics of the acquired knowledge.
0:08:30 Let's look at an example. On the left-hand side we show the architecture of the CILK system with its various components, and on the right-hand side an example interaction where the system learns some supporting facts. If the user asks what country Boston is located in and the system does not know the relation "located in", it asks the user to provide an example for that relation, that is, a clue, to learn the semantics of the relation. And if an entity is unknown, say the system does not know what "Boston" is, it can ask for some additional facts from the user to learn more about the entity's semantics before it goes for the inference.
0:09:21 Note that this is not a teacher-student learning setup: the user may not know the actual answer, so they cannot provide exact supervision, but they can provide related facts from which the system has to learn and infer.
0:09:40 The inference module is constructed based on the knowledge base completion (KBC) problem, whose basic idea is to infer new facts from existing facts in the KB. However, KBC works under a closed-world assumption, so it is not directly applicable to continuous knowledge learning in conversation: it can only handle queries with known relations and entities, and it does not do any rejection. We therefore remove the closed-world assumption and support rejection in the inference. We use a neural knowledge base completion model; virtually any embedding-based model can be used to solve this problem, and we specifically adopt a bilinear model in the style of DistMult.
0:10:32 The idea of the model is as follows. We convert each entity in the query, the relation, and each candidate answer entity into one-hot encoding vectors, and then learn embeddings for the entities and relations. Each triple is scored with a bilinear scoring function, as shown in the figure. We train the overall model by minimizing a margin-based ranking loss, which pushes the scores of positive triples above the scores of corrupted, i.e., negative, triples.
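As an illustration of the scoring function and training objective just described, here is a minimal PyTorch sketch of a bilinear (DistMult-style) scorer with a margin ranking loss; the embedding dimension, vocabulary sizes, and the single hand-built training step are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class BilinearScorer(nn.Module):
    def __init__(self, n_entities, n_relations, dim=100):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)   # entity embeddings
        self.rel = nn.Embedding(n_relations, dim)  # relation embeddings

    def score(self, h, r, t):
        # DistMult-style bilinear score with a diagonal relation matrix.
        return (self.ent(h) * self.rel(r) * self.ent(t)).sum(dim=-1)

model = BilinearScorer(n_entities=1000, n_relations=50)
loss_fn = nn.MarginRankingLoss(margin=1.0)

# One training step: push a positive triple above a corrupted one.
h, r, t = torch.tensor([0]), torch.tensor([3]), torch.tensor([7])
t_neg = torch.tensor([42])                       # corrupted tail entity
pos, neg = model.score(h, r, t), model.score(h, r, t_neg)
loss = loss_fn(pos, neg, torch.ones_like(pos))   # target 1: pos must rank higher
loss.backward()
```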
0:11:12 That is the negative-sampling option we use here. 0:11:17 Now, how do we do rejection in the KB inference? We propose prediction thresholds, one per entity and per relation, which are continuously updated over time. The threshold for an entity or relation is computed as the average of two quantities: the mean score of the positive validation triples involving that entity or relation, and the mean score of the negative validation triples involving it. Here the validation data for a particular entity or relation is a held-out set of query triples involving it, where D+ denotes the positive instances and D- the negative instances.
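Under those definitions, the threshold is just the midpoint of two mean scores; here is a minimal sketch, where the score lists stand in for the model's scores on the positive and negative validation triples of one entity or relation:

```python
from statistics import mean

def prediction_threshold(pos_scores, neg_scores):
    """Midpoint between mean positive and mean negative validation scores."""
    return 0.5 * (mean(pos_scores) + mean(neg_scores))

def accept(best_candidate_score, threshold):
    """Answer if the top-ranked candidate clears the threshold, else reject."""
    return best_candidate_score >= threshold

theta = prediction_threshold([0.9, 0.8, 0.7], [0.2, 0.1, 0.3])
print(theta, accept(0.75, theta))  # 0.5 True
```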
0:12:13 Deciding when to interact is also crucial. As you know, CILK asks for supporting facts to learn the embeddings of entities and relations, but the user can only provide very few supporting facts per session, which may not be sufficient to learn good embeddings. At the same time, asking for too many supporting facts is annoying to the user, and is also unnecessary if the model has already learned good representations for a given entity and relation. Apart from that, we also need sufficient validation data to learn the thresholds. So the main strategy is to ask for supporting facts for unknown entities and relations, and also for known ones for which the system is not confident enough. For this we propose the performance buffer, which records the past performance statistics of the inference module over time.
0:13:18 Here PB_e and PB_r denote performance buffers that store, for each entity and relation, the MRR, that is, the mean reciprocal rank, achieved by the inference model, evaluated on validation query triples sampled from the knowledge base at each dialogue session. CILK detects the bottom-k percent of query entities and relations based on their MRR scores; these form the diffident entity and relation sets for the next dialogue session, that is, the sets of entities and relations the model is not performing well on. The system then asks the user for supporting facts if the query entity or relation is unknown or falls in the corresponding diffident set. That is the basic strategy.
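A minimal sketch of such a performance buffer follows; the class name, the per-item MRR bookkeeping, and the bottom-20% default cutoff are illustrative assumptions (the actual cutoff is a system parameter):

```python
from collections import defaultdict

class PerformanceBuffer:
    """Tracks per-entity/relation MRR and flags the worst performers."""
    def __init__(self):
        self.mrr = defaultdict(list)           # item -> list of MRR values

    def record(self, item, reciprocal_rank):
        self.mrr[item].append(reciprocal_rank)

    def diffident_set(self, bottom_pct=0.2):
        avg = {i: sum(v) / len(v) for i, v in self.mrr.items()}
        ranked = sorted(avg, key=avg.get)      # worst performers first
        k = max(1, int(len(ranked) * bottom_pct))
        return set(ranked[:k])

pb = PerformanceBuffer()
pb.record("located_in", 1.0)
pb.record("capital_of", 0.2)
print(pb.diffident_set())                      # {'capital_of'}
```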
0:14:12 Putting it all together, each dialogue session has three steps. First, the knowledge acquisition step, where the system interacts with the user to acquire supporting facts. Second, the knowledge retention step, where we store the supporting facts in the knowledge base and also retain model data in the form of training and validation triples. Third, the inference model update step, where we sample a list of training and validation triples, update the inference model, and also update the prediction thresholds. All of this happens within one session, and the process continues lifelong over interactions with multiple users.
0:14:59 The overall evaluation setup is as follows. Crowdsourcing this evaluation is very difficult to conduct and also very time-consuming for this specific setup, because it needs continuous interaction with users. It is also not strictly necessary, given that we deal with structured queries, as I mentioned before. We do not have a proper dataset collected from real interactions, and a static dataset would not work either, because the system has to be evaluated in a streaming way, in an online environment.
0:15:41 So we created a simulated user program that interacts with the system. It has two components: a knowledge base, used for answering the system's questions, that is, for providing supporting facts, and a query dataset, from which it issues queries to the system. We converted two well-known knowledge base datasets, WordNet and NELL, created a large triple store out of each, and then split it into the simulated user's knowledge base, the base KB, and the query dataset used for evaluation. The base KB can be regarded as the initial KB with which the system is deployed and begins knowledge learning while interacting with users.
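The split itself can be sketched very simply; the ratios below are purely illustrative assumptions, not the paper's actual proportions:

```python
import random

def split_triple_store(triples, base_frac=0.6, user_frac=0.3, seed=0):
    """Split one triple store into base KB, simulated user's KB, and query set."""
    shuffled = triples[:]
    random.Random(seed).shuffle(shuffled)
    n_base = int(len(shuffled) * base_frac)
    n_user = int(len(shuffled) * user_frac)
    base_kb = shuffled[:n_base]
    user_kb = shuffled[n_base:n_base + n_user]
    query_set = shuffled[n_base + n_user:]
    return base_kb, user_kb, query_set
```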
0:16:32 The overall evaluation proceeds in two phases. In the initial training phase, we train the inference model only on known query relations and entities, using the facts present in the base KB, the initial KB with which the system is deployed. In the online training and evaluation phase, the system interacts with the simulated user, acquires supporting facts in the process, and answers the queries. The query dataset contains both closed-world queries and queries involving unknown relations and entities. This table shows the overall statistics of the data; the details of the creation process are in the paper.
0:17:25 We have almost one hundred unknown relations, which we created basically by randomly deleting triples from the original graph, and the table also shows the statistics of the unknown entities and relations appearing in the query dataset.
0:17:43 Since there is no existing work that solves exactly the same task, we did an ablation study. We evaluate two kinds of variants of the model: threshold variants, and variants based on the dataset sampling strategy. There are four threshold variants: using only the entity threshold, using only the relation threshold, taking the minimum of the two thresholds, and taking the maximum. For the dataset sampling strategy, we compare training the inference model only with triples involving the query entities, training only with triples involving the query relation, and training with triples involving both the query relation and entities.
0:18:35 This table shows the overall performance of all the variants. The metrics are MRR, Hits@1, and Hits@10. Hits@1 measures how often the correct answer is ranked first among the predictions, and Hits@10 measures how often the true answer exists in the top ten predictions.
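For reference, both ranking metrics are simple functions of the 1-based rank of the true answer in each query's candidate ranking; a minimal sketch:

```python
def mrr(ranks):
    """Mean reciprocal rank over the 1-based ranks of the true answers."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of queries whose true answer appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 3, 12, 2]
print(mrr(ranks), hits_at_k(ranks, 1), hits_at_k(ranks, 10))
```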
0:19:03 As you can see, the max-threshold variant achieves the best overall performance. The numbers are a bit low in absolute terms, basically because this is a very hard problem; it needs some transfer of the knowledge learned earlier in the process, which is what we are working on currently. Performance on NELL in particular is a bit lower than on WordNet, because NELL contains a lot of long-tail relations, that is, relations for which the number of triples is very low, so it is very hard to learn proper semantics for them.
0:19:46 Considering the rejection performance, we evaluate two probabilities: the probability of predicting an answer given that the answer exists in the KB, and the probability of rejecting a query given that the answer does not exist in the KB. We want both of these to be high. As you can see, the max-threshold variant, where the entity and relation thresholds are combined by taking their maximum, again performs best overall.
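Both quantities can be read off per-query records; here is a minimal sketch, assuming hypothetical records of (answer_exists_in_kb, system_rejected) pairs:

```python
def rejection_stats(records):
    """records: list of (answer_exists_in_kb, system_rejected) booleans."""
    answerable = [rej for exists, rej in records if exists]
    unanswerable = [rej for exists, rej in records if not exists]
    p_answer = 1 - sum(answerable) / len(answerable)    # P(answer | answer in KB)
    p_reject = sum(unanswerable) / len(unanswerable)    # P(reject | answer not in KB)
    return p_answer, p_reject

records = [(True, False), (True, True), (False, True), (False, False)]
print(rejection_stats(records))  # (0.5, 0.5)
```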
0:20:17 We also studied the prediction and rejection behaviour to see the effect of the interaction. We ran experiments varying the number of clues and supporting facts acquired by the system per session over time. If we increase the number of supporting facts, performance improves in general, especially for WordNet. That suggests supporting facts are much more important than clues: a clue is specific to the query relation, which may appear again quite frequently in future queries, whereas the query entity recurs with comparatively lower frequency, so facts about it help the current query most. We also compared against a version with the performance buffer removed, and performance drops in that case. That shows that learning the performance statistics over time, and using them to guide the knowledge acquisition during interaction, indeed helps.
0:21:26 This shows the overall performance improvement over time, given that the model has made some predictions. As we can see, as the streaming query data grows from fifty percent to a hundred percent, the results improve steadily. The performance improves especially for open-world queries, because for an open-world query the system always gathers supporting facts. The overall performance also improves significantly in terms of Hits@1, because the system additionally learns facts for known entities and relations for which it is not confident.
0:22:05 To summarize, we proposed a continuous knowledge learning engine for dialogue systems and evaluated it in a simulated-user setting, with some promising results. This kind of system can be useful for conversational information-seeking or recommendation systems that deal with real-world facts; essentially, any system whose knowledge can be expressed in terms of triples and queried accordingly can make use of it. Apart from that, open-ended conversational learning is another potential use case.
0:22:47 In future work, we are working on improving the knowledge learning component of the system to relieve the cold-start problem: when a relation appears in only a small number of triples, it is often hard to learn its semantics properly, so if we can transfer some knowledge from similar relations that we have already learned in the past, it can boost the performance. We are also working on jointly learning all the different components with the core learning engine so that they benefit from each other, and finally on collecting suitable datasets to train on.
0:23:27 That's all; thank you, and I'm happy to take any questions.
0:23:35 We have time for one or two questions.
0:23:43 Hi, thanks for the talk. I'm just wondering: in your example the query is essentially a short phrase. When you do this learning, have you considered using a proper semantic parser to build the structured queries, as opposed to just pattern matching?
0:24:15 Definitely. Semantic parsing and knowledge learning are quite interrelated, because each can influence the other in terms of alignment. Semantic parsing is hard here, because users can express the same question in many different ways, and the parser also has to consider the conversation history and track what the user is talking about. We are currently working on that part, and a semantic parser could be used as a component here. How to align the two, and how to jointly train both models, are open issues; it is a challenging problem we are working on.
0:25:14 Thanks, interesting talk. I think one of the interesting things about this work is the interaction module. I know it was a very simple design, but maybe you can talk about your design decisions, and what you would want as a baseline for that sort of interaction.
0:25:31 Okay, so regarding how the interaction strategy was designed: we did some preliminary study, available in an earlier arXiv version of the paper, where we used a rule-based strategy for the interaction behaviour, and it is very specific to the application in which you deploy the system. So there we proposed the original model and then some example scenarios for which we designed simple interaction strategies.
0:26:06 But it really depends on the specific application you have in mind. Currently we are experimenting with finite-state automata, or if-else dialogue-flow style systems, to design the interaction strategy, and then parameterizing it so that the agent can learn to adapt what it asks, and how, from the knowledge gained in the interaction process. That is still some way off; the problem is much more complex, and a lot of things can be done in the future.
0:26:47 And I think we're at the end of our time.
0:26:55 Thank you. Let's thank the speaker.