Speech Transcript - Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification

0:00:17	the next talk will be presented by charlotte roze
0:00:22	entitled which aspects of discourse relations are hard to learn primitive decomposition for discourse relation
0:00:28	classification
0:00:31	so hi everyone i'm charlotte roze i'm going to present a joint work with chloé braud and
0:00:37	philippe muller
0:00:39	in this work we are interested in the following question which aspects
0:00:43	of discourse relations are hard to learn
0:00:45	by aspects we mean that the information encoded by discourse relations
0:00:50	can be decomposed into a small set of characteristics
0:00:55	that we call primitives
0:00:57	and in this work we implement a primitive decomposition of discourse relations
0:01:02	in the hope that it will help discourse relation classification
0:01:06	so the global task we are interested in is discourse parsing
0:01:10	which aims at identifying discourse structure
0:01:13	this structure is composed of semantic and pragmatic links between discourse units
0:01:19	these units can cover text spans of various sizes
0:01:23	the links are called discourse relations and these relations can be either explicit or implicit
0:01:29	for example in one
0:01:31	we have a contra-expectation relation here we use
0:01:36	the relations from the penn discourse treebank that we are going to present later
0:01:42	and this relation is explicitly marked by the connective but
0:01:47	whereas in the second example
0:01:49	we have a reason relation and here we don't have any connective to mark the
0:01:53	relation
0:01:54	it's an implicit relation
0:01:58	so there are several theoretical frameworks that aim at representing discourse structure among the
0:02:03	most well-known we have rst sdrt and the penn discourse treebank framework
0:02:10	so we have corpora annotated following these various frameworks
0:02:14	but we have no consensus on the label set of discourse relations
0:02:20	in each framework we have more or less specific relations encoding information at
0:02:25	different levels of granularity for instance the contrast relation from
0:02:32	sdrt
0:02:33	corresponds to three relations in rst
0:02:38	so even if the different label sets
0:02:43	are different we argue that they include a common range of semantic and pragmatic information
0:02:48	and we wonder if it's possible to find a way to represent this common information
0:02:54	so discourse relations identification is generally seen as a classification task
0:03:00	and it is usually separated into explicit and implicit relation identification
0:03:07	the second task implicit relation identification is considered the hardest
0:03:13	in fact the results remain quite low on this task despite the variety
0:03:19	of approaches that have been tried
0:03:22	so we can ask ourselves whether the problem is only about the way
0:03:26	we represent the data
0:03:28	or also about the way the task is modeled
0:03:31	so in this work we want to act on the way we model the task by
0:03:34	splitting it
0:03:36	into several simpler tasks
0:03:38	the idea is to decompose the problem and to investigate the reasons for the difficulty
0:03:42	of discourse relation identification
0:03:45	so to have several simpler tasks we decompose the information encoded by the
0:03:52	relation labels into values for a small set of characteristics that we call primitives
0:03:58	to do this we rely on the cognitive approach to coherence relations
0:04:03	which provides an inventory of
0:04:07	dimensions that we call primitives of relations
0:04:11	this inventory is provided with mappings from the relations of pdtb rst and
0:04:17	sdrt into primitive values
0:04:21	there are core primitives which are the ones in the original
0:04:26	ccr
0:04:29	and additional ones that were introduced to make explicit the specificities of the various frameworks in
0:04:35	the mappings
0:04:37	so these mappings can be seen as an interface between the existing frameworks
0:04:44	so in our work we provide an operational mapping from annotated relations to sets of
0:04:51	primitive values and we test the approach on the penn discourse treebank but the
0:04:56	goal is to extend the approach to other frameworks later
0:05:01	so we try to answer the question which primitives are harder to predict by
0:05:05	defining a classification task for each primitive
0:05:10	then we do a reverse mapping from the sets of
0:05:14	predicted primitive values to a set of compatible relation labels
0:05:20	and we end up with a relation identification system that we want to evaluate
0:05:26	so here is the penn discourse treebank hierarchy
0:05:29	it has three levels representing the different granularities so we have more or less specific relations
0:05:36	at the top level level one we have relations that are called classes
0:05:41	and then we have types at level two and subtypes at level three
0:05:45	so we have end labels which are the most specific relations at level three or
0:05:49	two
0:05:50	and intermediate ones which are underspecified relations they have
0:05:56	relations under them that are
0:05:58	finer
0:06:00	so we take each pdtb relation and map it into a set of primitive
0:06:04	values
0:06:06	we have five core primitives that we're going to illustrate they each have two or
0:06:11	three values
0:06:12	plus we added the ns value for non-specified
0:06:18	it was to treat some cases of ambiguity when in the ccr mapping
0:06:22	there were several possible values for one primitive
0:06:26	or to treat the case of intermediate labels that were absent from the ccr
0:06:30	mapping
0:06:32	and we have three additional primitives that are binary conditional alternative and specificity
0:06:39	so to illustrate the mapping to core primitives we can take our example of the
0:06:44	contra-expectation relation
0:06:48	here
0:06:50	from the content of the first unit
0:06:53	we have an expected implication which is that the biofuel costs more
0:06:57	because it's more expensive to produce
0:07:00	and in the second unit this expectation is denied
0:07:05	in fact the biofuel does not cost more
0:07:09	so here is the mapping of contra-expectation into primitive values
0:07:14	so because this relation involves an implication it is associated with the
0:07:19	basic operation that is causal
0:07:21	otherwise it would be additive
0:07:24	because it involves a negation the polarity is set to negative otherwise it would
0:07:30	be positive
0:07:32	and we have the value basic for implication order we have here an implication
0:07:38	and the implication order refers to
0:07:43	the argument in which the premise of the implication lies
0:07:49	the other values are non-basic and na
0:07:52	which is not applicable for additive relations
0:07:57	another primitive is called source of coherence which refers to a common
0:08:04	distinction in the literature
0:08:07	we have objective relations which operate at the level of propositional content and
0:08:13	subjective ones that operate at the epistemic or
0:08:16	speech act level
0:08:18	sorry
0:08:20	here we have an example of a subjective relation which is justification
0:08:25	here i state that mrs yeargin is lying because they found students who said
0:08:30	she gave them
0:08:31	similar help
0:08:33	so we have the mapping of justification into primitive values it's causal positive
0:08:38	and non-basic and we have the value subjective
0:08:42	and it remains non-specified for temporal order
0:08:46	and temporal order has three values chronological anti-chronological and synchronous
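A minimal sketch of the two mappings just described, written as plain Python dictionaries. The primitive values are the ones stated in the talk; the identifier names are hypothetical, and None is used as a placeholder wherever the talk does not state a value.

```python
# "NS" is the non-specified value mentioned in the talk.
NS = "NS"

# The five core primitives (identifier spellings are an assumption).
CORE_PRIMITIVES = (
    "basic_operation",
    "polarity",
    "implication_order",
    "source_of_coherence",
    "temporal_order",
)

# Only values stated in the talk are filled in; None means "not mentioned".
RELATION_TO_PRIMITIVES = {
    "contra-expectation": {
        "basic_operation": "causal",
        "polarity": "negative",
        "implication_order": "basic",
        "source_of_coherence": None,
        "temporal_order": None,
    },
    "justification": {
        "basic_operation": "causal",
        "polarity": "positive",
        "implication_order": "non-basic",
        "source_of_coherence": "subjective",
        "temporal_order": NS,
    },
}


def primitive_targets(relation):
    """Return one (primitive, value) target per primitive with a stated value."""
    values = RELATION_TO_PRIMITIVES[relation]
    return [(p, values[p]) for p in CORE_PRIMITIVES if values[p] is not None]
```

Each (primitive, value) pair then becomes the target of a separate classification task, one task per primitive.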
0:08:54	so with respect to the penn discourse treebank hierarchy all these primitives are not
0:08:59	equal in importance
0:09:01	some of them are able to make distinctions between the top level classes it's the
0:09:06	case for basic operation and polarity for instance basic operation has the value causal
0:09:11	for all relations
0:09:13	under the contingency class
0:09:15	and additive for relations under the expansion class
0:09:18	it's the same for polarity we have all the comparison relations that are
0:09:23	negative for polarity
0:09:27	and we have other primitives that make label distinctions at lower levels
0:09:32	at levels two or three
0:09:36	so we have applied the mapping to each relation in the penn
0:09:40	discourse treebank
0:09:42	and here's the distribution of values for each primitive in the corpus
0:09:48	on the left we have the list of primitives and on the right the
0:09:52	list of all values which are mixed together
0:09:59	so
0:09:59	for each primitive we define a classification task
0:10:03	we have around twenty eight thousand pairs of arguments for our training set
0:10:09	we use a quite straightforward architecture for the classification
0:10:15	each argument of the relation is represented with the infersent sentence encoder which
0:10:22	is very common for semantic tasks so each argument is mapped into pre-trained word
0:10:27	embeddings and then encoded with a bilstm with max pooling
0:10:31	and after that we combine the two argument representations with concatenation difference and product
0:10:39	we tested various settings
0:10:42	we tested various sizes for an additional layer on top
0:10:49	of the argument combination and different regularization values
0:10:56	so we take the best setting as the best model
0:10:59	for each task
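A minimal pure-Python sketch of the pooling and combination step described above. The talk only says "concatenation difference and product"; taking the element-wise absolute difference is an assumption here, and the numbers standing in for encoder hidden states are made up for illustration.

```python
def max_pool(token_vectors):
    """Element-wise maximum over a sequence of equal-length token vectors."""
    return [max(dims) for dims in zip(*token_vectors)]


def combine(u, v):
    """Concatenate u and v with their element-wise |u - v| and u * v."""
    diff = [abs(a - b) for a, b in zip(u, v)]  # assumption: absolute difference
    prod = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + diff + prod


# Made-up hidden states for two arguments (2 tokens each, 2 dimensions).
arg1_states = [[0.1, 0.9], [0.4, 0.2]]
arg2_states = [[0.3, 0.5], [0.7, 0.1]]

u = max_pool(arg1_states)
v = max_pool(arg2_states)
features = combine(u, v)  # four blocks, so 4 times the argument dimension
```

The combined vector is what a classifier for each primitive would take as input in this kind of setup.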
0:11:02	and as a baseline we take a majority classifier
0:11:07	so here are the results in accuracy and macro f one
0:11:11	for the baseline in blue and the best model in orange
0:11:16	for each core primitive polarity basic operation source of coherence implication order and temporal order
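A small sketch of a majority baseline scored with accuracy and macro F1, to show why the two measures can diverge on a skewed primitive. The function names and the toy label lists are hypothetical and are not data from the talk.

```python
from collections import Counter


def majority_label(train_labels):
    """Most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: a skewed two-valued primitive.
train = ["positive"] * 8 + ["negative"] * 2
gold = ["positive", "positive", "positive", "negative"]
pred = [majority_label(train)] * len(gold)
```

On this toy data the baseline reaches an accuracy of 0.75 while its macro F1 stays below 0.5, because the minority value is never predicted.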
0:11:25	we
0:11:26	only have
0:11:27	a fraction of the argument pairs in the corpus
0:11:31	in the test
0:11:33	set
0:11:34	for which all primitives are correctly predicted which is not very good but on
0:11:39	average we have eighty percent of primitives that are correctly predicted
0:11:44	we're going to discuss polarity and basic operation we said before that
0:11:49	they are the most important primitives with respect to the penn discourse treebank hierarchy and
0:11:54	they have a similar distribution of values so they are comparable
0:11:59	basic operation has the lowest improvement with respect to the baseline over all the
0:12:04	core primitives
0:12:06	and we identify correctly only seventeen percent of causal relations
0:12:14	and we have better results for polarity
0:12:17	it has a greater improvement with respect to the baseline and we have fifty percent of
0:12:22	the negative relations that we correctly
0:12:25	label
0:12:27	source of coherence is the primitive that has the greatest improvement with respect to the
0:12:31	baseline but we have to temper this result because we have less than one percent
0:12:36	of subjective relations in our dataset so we mostly have only objective relations
0:12:42	and relations for which we have the non-specified value so that's not very
0:12:46	informative
0:12:48	for temporal order we have a little improvement with respect to the
0:12:52	baseline and this is due to the fact that relations are
0:12:56	mainly mapped
0:12:57	mainly labeled as non-specified
0:13:02	after that we wanted to evaluate the performance of our systems on predicting discourse relations
0:13:08	so we operate the reverse mapping from the set of predicted values for
0:13:13	each primitive to a set of compatible relation labels
0:13:18	so we start with a set containing all the possible relations at all levels
0:13:24	then we remove the relations that are incompatible with the primitive values that we predicted
0:13:29	for instance if the polarity is predicted positive we remove all relations associated with a
0:13:35	negative polarity and we do the same for each primitive
0:13:40	and then we remove redundant information if the set contains all
0:13:45	the subtypes under a type or all the types under a class we only keep the
0:13:50	upper level underspecified relation
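The reverse mapping steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tiny label inventory and its primitive values are placeholders, and treating the non-specified value as compatible with any prediction is an assumption.

```python
NS = "NS"

# Toy inventory with two primitives only; values are illustrative placeholders.
RELATIONS = {
    "Comparison": {"basic_operation": NS, "polarity": "negative"},
    "Comparison.Contrast": {"basic_operation": "additive", "polarity": "negative"},
    "Comparison.Concession": {"basic_operation": "causal", "polarity": "negative"},
    "Contingency": {"basic_operation": "causal", "polarity": NS},
    "Contingency.Cause": {"basic_operation": "causal", "polarity": "positive"},
}
CHILDREN = {
    "Comparison": ["Comparison.Contrast", "Comparison.Concession"],
    "Contingency": ["Contingency.Cause"],
}


def is_compatible(relation_values, predicted):
    """A relation is kept unless a specified value contradicts a prediction."""
    for primitive, value in predicted.items():
        known = relation_values.get(primitive, NS)
        if NS not in (known, value) and known != value:
            return False
    return True


def collapse(labels):
    """If all children of a label are present, keep only the parent label."""
    result = set(labels)
    changed = True
    while changed:
        changed = False
        for parent, kids in CHILDREN.items():
            if kids and all(kid in result for kid in kids):
                result -= set(kids)
                result.add(parent)
                changed = True
    return result


def reverse_mapping(predicted):
    """Start from all relations, filter by compatibility, drop redundancy."""
    candidates = {
        name for name, values in RELATIONS.items()
        if is_compatible(values, predicted)
    }
    return collapse(candidates)
```

With only polarity predicted as negative, the toy inventory yields the two underspecified labels Comparison and Contingency. How to treat a parent that survives next to only some of its children is left open in this sketch.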
0:13:53	so to evaluate we have a number of questions we need a measure for
0:13:59	hierarchical classification we have underspecification
0:14:03	in the evaluation
0:14:06	the
0:14:06	predicted label can be more or less specific than the gold label from the
0:14:11	pdtb
0:14:12	and we need a measure for multi-label classification
0:14:16	in fact our system can predict a disjunction of relations
0:14:21	so we use hierarchical precision and recall on the set of all
0:14:27	labels
0:14:30	so for instance
0:14:33	on the left if we have in the gold one relation
0:14:37	which is expansion alternative
0:14:40	and we predict two relations that are finer
0:14:44	we are ok on two labels and we have two elements that are wrong so
0:14:50	our precision is zero point five
0:14:53	whereas in the example on the right if we have two relations in the gold
0:14:59	and
0:15:00	we only predict one relation which is less specific
0:15:04	we have only good labels and we miss some of them so we
0:15:09	have a recall of zero point five
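A short sketch that reproduces the two worked examples above. It assumes the usual formulation of hierarchical precision and recall, where each label set is extended with all ancestors before comparing; the dotted label names are hypothetical.

```python
def with_ancestors(labels):
    """Expand dotted labels such as 'Class.Type.sub' with all their ancestors."""
    expanded = set()
    for label in labels:
        parts = label.split(".")
        for i in range(1, len(parts) + 1):
            expanded.add(".".join(parts[:i]))
    return expanded


def hierarchical_pr(gold, predicted):
    """Precision and recall over the ancestor-expanded label sets."""
    gold_set = with_ancestors(gold)
    pred_set = with_ancestors(predicted)
    overlap = len(gold_set & pred_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    return precision, recall


# Left example: one gold relation, two finer predictions.
left = hierarchical_pr({"Class.Type"}, {"Class.Type.sub1", "Class.Type.sub2"})

# Right example: two gold relations, one less specific prediction.
right = hierarchical_pr({"Class.Type.sub1", "Class.Type.sub2"}, {"Class.Type"})
```

In the left case two of the four expanded predicted labels are correct, giving a precision of 0.5; in the right case two of the four expanded gold labels are recovered, giving a recall of 0.5.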
0:15:11	so we compare the system
0:15:16	with the reverse mapping from predicted primitives into a set of relations with a system with
0:15:24	direct discourse relation classification with no decomposition
0:15:29	into primitives
0:15:30	and as measures we give the accuracy the hierarchical precision and
0:15:34	recall that we just presented and
0:15:36	again the hierarchical scores but only on the best match between what we predicted and
0:15:43	the pdtb relations
0:15:46	so here we can see that
0:15:49	the system
0:15:52	that predicts relations with the reverse mapping from the predicted
0:15:59	primitives
0:16:00	has lower results on all the measures except for
0:16:04	the max hierarchical precision
0:16:09	and by observing the results we see that we are missing a lot of
0:16:13	contingency class relations which is consistent with what we saw on
0:16:19	the primitive prediction because we are missing the value causal in most of the
0:16:24	cases for the primitive basic operation
0:16:29	and we wrongly predict the temporal class relations very often
0:16:36	so this is due to the fact that this relation is associated with quite
0:16:41	underspecified values for the primitives
0:16:45	and generally we can say that predicting primitives still leaves too much underspecification and
0:16:51	it has an impact on the recall
0:16:54	and we predict too many labels so it has an impact on our precision
0:17:00	so to conclude
0:17:03	we can see that one of the most important primitives that is basic operation seems
0:17:08	to be the hardest to predict
0:17:11	and we saw that the primitives obviously are not independent from each other
0:17:14	so when we learn them in isolation we are less accurate than when we learn
0:17:20	a fully specified relation
0:17:23	so one of the things that we want to do is to use a multi
0:17:28	task learning setting
0:17:31	and we also want to extend the approach by applying this decomposition to other
0:17:37	discourse frameworks
0:17:39	in order to have cross-corpora training and prediction
0:17:43	thank you
0:17:52	thank you very much are there questions
0:18:05	alright then let's thank the speaker again

Oral Session 7: Discourse

Charlotte Roze, Chloé Braud and Philippe Muller