0:00:17 The next talk will be presented by Charlotte Roze, entitled "Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification".
0:00:31 So hi everyone, I'm going to present joint work with my coauthors.
0:00:39 In this work, we are interested in the following question: which aspects of discourse relations are hard to learn? By "aspects" we mean that the information encoded by discourse relations can be decomposed into a small set of characteristics that we call primitives. In this work, we implement a primitive decomposition of discourse relations and ask whether it helps discourse relation classification.
0:01:06 The global task we are interested in is discourse parsing, which aims at identifying discourse structure. This structure is composed of semantic and pragmatic links between discourse units; these units can cover text spans of various sizes. The links are called discourse relations, and these relations can be either explicit or implicit.
0:01:29 For example, in (1) we have a contra-expectation relation, if we use the relations from the Penn Discourse Treebank that we are going to present later, and this relation is explicitly marked by the connective "but". Whereas in the second example we have a reason relation, and here we don't have any connective to mark the relation: it's an implicit relation.
0:01:58 There are several existing frameworks that aim at representing discourse structure; among the most well-known we have RST, SDRT and the Penn Discourse Treebank framework. We have corpora annotated following these various frameworks, but there is no consensus on the label set of discourse relations. In each framework we have more or less specific relations, encoding relations at different levels of granularity: for instance, the contrast relation from SDRT corresponds to three relations in RST.
0:02:38 So even if the label sets are different, we argue that they include a common range of semantic and pragmatic information, and we wonder if it is possible to find a way to represent this shared information.
0:02:54 Discourse relation identification is generally seen as a classification task, and it is usually separated into explicit and implicit relation identification. The second task, implicit relation identification, is considered the hardest: in fact, the results remain quite low on this task, despite the variety of approaches that have been tried.
0:03:22 So we can ask ourselves if the problem is only about the way we represent the data, or also about the way the task is modeled. In this work, we want to act on the way we model the task by splitting it into several simpler tasks. The idea is to decompose the problem and to investigate the reasons for the difficulty of discourse relation identification.
0:03:45 To obtain several simpler tasks, we decompose the information encoded by the relation labels into values for a small set of characteristics that we call primitives. To do this, we rely on the Cognitive approach to Coherence Relations (CCR), which provides an inventory of dimensions that we call primitives of relations. This inventory comes with mappings from the relations of the PDTB, RST and SDRT into primitive values.
0:04:21 There are core primitives, which are the ones in the original CCR, and additional ones that were introduced to make explicit the specificities of the various frameworks in the mappings. So these mappings can be seen as an interface between the existing frameworks.
0:04:44 In our work, we provide an operational mapping from annotated relations to sets of primitive values, and we test the approach on the Penn Discourse Treebank, but the goal is to extend it to other frameworks later.
0:05:01 We try to answer the question of which primitives are harder to predict by defining several classification tasks, one for each primitive. Then we do a reverse mapping from the sets of predicted primitive values to a set of compatible relation labels, and we end up with a relation identification system that we want to evaluate.
0:05:26 So here is the Penn Discourse Treebank hierarchy. It has three levels, representing different granularities, so we have more or less specific relations: at the top level, level one, we have relations called classes, then we have types at level two and subtypes at level three. So we have end labels, which are the most specific relations, at level three or two, and intermediate ones, which are underspecified relations: they have finer relations under them.
0:06:00 We take each PDTB relation and map it into a set of primitive values. We have five core primitives, which we are going to illustrate; each has two or three values. In addition, we added an NS value, for "not specified": it was used to treat some cases of ambiguity, when in the CCR mapping there were several possible values for one primitive, or to treat the case of intermediate labels that were absent from the CCR mapping.
0:06:32 And we have three additional primitives that are binary: conditional, alternative and specificity.
0:06:39 To illustrate the mapping to the core primitives, we can take the example of the contra-expectation relation. Here, from the content of the first unit, we derive an expected implication, namely that it should cost more because it is more expensive to produce, and in the second unit this expectation is denied.
0:07:09 So here is the mapping of contra-expectation into primitive values. Because it involves an implication, that is a causal relation, it is associated with the basic operation value "causal"; otherwise it would be "additive". Because it involves a negation, the polarity is set to "negative"; otherwise it would be "positive". And we have the value "basic" for implication order, which refers to the order of the arguments, that is, in which argument the premise of the implication appears. The other values are "non-basic" and "NA", which means "not applicable", used for additive relations.
0:07:57 Another primitive is called source of coherence, which refers to a common distinction in the literature: we have objective relations, which operate at the level of propositional content, and subjective ones, which operate at the epistemic or speech-act level.
0:08:20 Here we have an example of a subjective relation, which is justification: I state that she is lying because I found students who said so. So we have the mapping of justification into primitive values: it is causal, positive and non-basic, we have the value "subjective" for source of coherence, and it remains not specified for temporal order. Temporal order has three values: chronological, anti-chronological and synchronous.
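The two example decompositions just described can be written out as data. This is a minimal sketch that fills in only the values mentioned in the talk; the key and value names are illustrative, not the official inventory ("ns" stands for "not specified"):

```python
# Primitive values of the two example relations, as described above.
# Only values stated in the talk are filled in; 'ns' = not specified.
DECOMPOSITION = {
    "contra-expectation": {
        "basic_operation": "causal",    # involves an implication
        "polarity": "negative",         # the expectation is denied
        "implication_order": "basic",   # order in which the premise appears
    },
    "justification": {
        "basic_operation": "causal",
        "polarity": "positive",
        "implication_order": "non-basic",
        "source_of_coherence": "subjective",  # speech-act level
        "temporal_order": "ns",               # remains not specified
    },
}
```

Each relation label thus becomes a small bundle of primitive values, which is what the per-primitive classifiers will try to predict.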
0:08:54 With respect to the Penn Discourse Treebank hierarchy, all these primitives are not equal in importance. Some of them are able to make distinctions between the top-level classes; it is the case for basic operation and polarity, for instance. Basic operation has the value "causal" for all relations under the Contingency class, and "additive" for relations under the Expansion class. It is the same for polarity: all the Comparison relations are negative for polarity. And we have other primitives that make label distinctions at the lower levels, two and three.
0:09:36 So then we applied the mapping to each relation in the Penn Discourse Treebank, and here is the distribution of values for each primitive in the corpus: on the left we have the list of primitives, and on the right the distribution over all the values, mixed together.
0:09:59 So, for each primitive we define a classification task.
0:10:03 We have around twenty-eight thousand pairs of arguments in our training set.
0:10:09 We use a quite straightforward architecture for the classification: each argument of the relation is represented with the InferSent sentence encoder, which is very common for semantic tasks. So each argument is mapped into pre-trained word embeddings and then encoded with a BiLSTM with max pooling. After that, we combine the two argument representations with concatenation, difference and product.
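The argument encoding and combination just described can be sketched as follows. This is a toy illustration with hand-written three-dimensional "embeddings"; as an assumed simplification, max pooling is applied directly over the token vectors instead of over BiLSTM hidden states:

```python
# Sketch of the argument-pair representation used for primitive classification.
# Toy embeddings; the BiLSTM encoder is replaced by direct max pooling.

def encode(tokens):
    """Element-wise max pooling over the token vectors of one argument."""
    return [max(vals) for vals in zip(*tokens)]

def combine(u, v):
    """InferSent-style pair features: [u; v; |u - v|; u * v]."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod

# Toy 3-dimensional "embeddings" for the two arguments of one relation.
arg1 = [[0.1, 0.5, -0.2], [0.3, 0.0, 0.4]]
arg2 = [[0.2, -0.1, 0.1]]

u, v = encode(arg1), encode(arg2)
features = combine(u, v)  # 4 * 3 = 12-dimensional input to the classifier
```

The resulting feature vector, four times the size of one argument encoding, is what the classifier for each primitive consumes.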
0:10:39 We tested various settings: various layer sizes, an additional layer on top of the argument combination, and different regularization values.
0:10:56 Then we take the best setting as the best model for each task, and as a baseline we take a majority classifier.
0:11:07 So here are the results, in accuracy and macro F1, for the baseline in blue and the best model in orange, for each core primitive: polarity, basic operation, source of coherence, implication order and temporal order.
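One reason to report an F1-style score next to accuracy: with skewed value distributions, a majority classifier reaches high accuracy while its macro-averaged F1 stays low. A small self-contained sketch with toy labels (assumed data, not the corpus figures):

```python
# Macro F1 versus accuracy for a majority-class baseline on toy labels.
def macro_f1(gold, pred):
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for lab in labels:
        tp = sum(g == p == lab for g, p in zip(gold, pred))
        fp = sum(p == lab and g != lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["pos", "pos", "pos", "neg"]
majority = ["pos"] * len(gold)  # the baseline predicts 'pos' everywhere
acc = sum(g == p for g, p in zip(gold, majority)) / len(gold)
```

Here the baseline reaches 0.75 accuracy but only about 0.43 macro F1, since the minority class contributes an F1 of zero.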
0:11:25 Only for a minority of the argument pairs in the test set are all the primitives correctly predicted, which is not very good, but on average around eighty percent of the primitives are correctly predicted.
0:11:44 We are going to discuss polarity and basic operation. We said before that they are the most important primitives with respect to the Penn Discourse Treebank hierarchy, and they have a similar distribution of values, so they are comparable. Basic operation has the lowest improvement with respect to the baseline over all the core primitives, and we correctly identify only seventeen percent of the causal relations.
0:12:14 We have better results for polarity: it has a greater improvement with respect to the baseline, and fifty percent of the negative relations are correctly labeled.
0:12:27 Source of coherence is the primitive with the greatest improvement with respect to the baseline, but we have to temper this result, because we have less than one percent of subjective relations in our dataset: we mostly have objective relations, and relations with the not-specified value, so it is not very informative.
0:12:48 For temporal order, we have a little improvement with respect to the baseline, and this is due to the fact that relations are mainly mapped to the not-specified value.
0:13:02 After that, we wanted to evaluate the performance of our system on predicting discourse relations. So we operate the reverse mapping, from the set of values predicted for each primitive to a set of compatible relation labels. We start with a set containing all the possible relations at all levels; then we remove the relations that are incompatible with the primitive values that we predicted. For instance, if the polarity is predicted positive, we remove all relations associated with a negative polarity, and we do the same for each primitive. Then we remove redundant information: if the set contains all the subtypes of a type, or all the types of a class, we only keep the upper-level underspecified relation.
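The reverse mapping just described can be sketched with a toy inventory; the labels and primitive values below are illustrative, not the actual PDTB/CCR mapping ("ns" = not specified):

```python
# Toy label inventory with primitive values (illustrative only).
PRIMS = {
    "Comparison":            {"polarity": "negative", "basic_op": "ns"},
    "Comparison.Contrast":   {"polarity": "negative", "basic_op": "additive"},
    "Comparison.Concession": {"polarity": "negative", "basic_op": "causal"},
    "Contingency.Cause":     {"polarity": "positive", "basic_op": "causal"},
}
CHILDREN = {"Comparison": {"Comparison.Contrast", "Comparison.Concession"}}

def compatible(predicted):
    """Drop every label whose primitive values clash with a predicted value;
    'ns' (on either side) never filters."""
    keep = set(PRIMS)
    for prim, value in predicted.items():
        if value != "ns":
            keep = {r for r in keep
                    if PRIMS[r].get(prim, "ns") in (value, "ns")}
    return keep

def prune(labels):
    """If all subtypes of an underspecified label survive, keep only the
    upper-level label."""
    out = set(labels)
    for parent, kids in CHILDREN.items():
        if parent in out and kids <= out:
            out -= kids
    return out
```

For example, predicting only polarity = negative leaves all three Comparison labels, and the pruning step then collapses them into the single underspecified `Comparison` label.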
0:13:53 To evaluate, we have a number of issues: we need a measure for hierarchical classification, since we have underspecification in the evaluation (the predicted label can be more or less specific than the gold label from the PDTB), and we need a measure for multi-label classification, since our system can in fact predict more than a single relation.
0:14:21 So we use hierarchical precision and recall on the set of all labels.
0:14:30 For instance, on the left, if we have one relation in the gold, which is underspecified, and we predict two relations that are finer, we are right on two labels but we also have two elements that are wrong, so our precision is 0.5. Whereas in the example on the right, if we have two relations in the gold and we only predict one relation, which is less specific, we have only good labels but we missed some of them, so we have a recall of 0.5.
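The hierarchical precision and recall used in these examples can be sketched by closing each label set under its ancestors and then computing ordinary set precision and recall (toy slice of a hierarchy, assumed for illustration):

```python
# Hierarchical precision/recall via ancestor closure of label sets.
PARENT = {
    "Expansion.Restatement": "Expansion",
    "Expansion.Restatement.specification": "Expansion.Restatement",
    "Expansion.Restatement.equivalence": "Expansion.Restatement",
}

def expand(labels):
    """Close a label set under its ancestors in the hierarchy."""
    closed = set()
    for lab in labels:
        while lab is not None:
            closed.add(lab)
            lab = PARENT.get(lab)
    return closed

def hier_pr(gold, pred):
    g, p = expand(gold), expand(pred)
    return len(g & p) / len(p), len(g & p) / len(g)

# Left example: one underspecified gold label, two finer predictions.
prec1, rec1 = hier_pr({"Expansion.Restatement"},
                      {"Expansion.Restatement.specification",
                       "Expansion.Restatement.equivalence"})
# Right example: two specific gold labels, one less specific prediction.
prec2, rec2 = hier_pr({"Expansion.Restatement.specification",
                       "Expansion.Restatement.equivalence"},
                      {"Expansion.Restatement"})
```

Predicting finer labels than the gold lowers precision (left case, 0.5), while predicting a coarser label than the gold lowers recall (right case, 0.5), matching the two examples.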
0:15:11 So we compare the system with the reverse mapping from predicted primitives into a set of relations against a system doing direct discourse relation classification, with no decomposition into primitives. As measures, we give the accuracy, the hierarchical precision and recall that we just presented, and again the hierarchical scores but computed only on the best match between what we predicted and the PDTB relations.
0:15:46 Here we can see that the system that predicts relations through the reverse mapping from the predicted primitives has lower results on all the measures, except for the max hierarchical precision.
0:16:09 By observing the results, we see that we are missing a lot of Contingency class relations, which is consistent with what we saw on the primitive predictions, since we miss the value "causal" in most of the cases for the primitive basic operation. And we wrongly predict Temporal class relations very often: this is due to the fact that this relation is associated with quite underspecified values for the primitives.
0:16:45 More generally, we can say that predicting primitives still leaves too much underspecification, which hurts the recall, and we predict too many labels, which hurts our precision.
0:17:00 To conclude, we can see that one of the most important primitives, namely basic operation, seems to be the hardest to predict. And we saw that the primitives are obviously not independent from each other, so when we learn them in isolation we are less accurate than when we learn a fully specified relation.
0:17:23 So one of the things we want to do is to test a multi-task learning setting, and we also want to extend the approach by applying this decomposition to all the discourse frameworks, in order to have cross-corpora training and prediction.
0:17:43 Thank you.
0:17:52 Thank you very much. Are there any questions?
0:18:05 All right then, let's thank the speaker again.