0:00:17 | the next talk will be presented by charlotte roze

0:00:22 | entitled "which aspects of discourse relations are hard to learn? primitive decomposition for discourse relation

0:00:28 | classification"

0:00:31 | so hi everyone, i'm charlotte roze and i'm going to present joint work with

0:00:37 | chloé braud and philippe muller

0:00:39 | in this work we are interested in the following question: which aspects

0:00:43 | of discourse relations are hard to learn |

0:00:45 | by aspects we mean that the information encoded by discourse relations

0:00:50 | can be decomposed into a small set of characteristics |

0:00:55 | that we call primitives |

0:00:57 | and in this work we implement a primitive decomposition of discourse relations

0:01:02 | in the hope that it will help discourse relation classification

0:01:06 | so the global task we are interested in is discourse parsing |

0:01:10 | which aims at identifying discourse structure |

0:01:13 | this structure is composed of semantic and pragmatic links between discourse units

0:01:19 | these units can cover text spans of various sizes

0:01:23 | the links are called discourse relations, and these relations can be either explicit or implicit

0:01:29 | for example in (1)

0:01:31 | we have a contra-expectation relation if we use

0:01:36 | the relations from the penn discourse treebank that we are going to present later

0:01:42 | and this relation is explicitly marked by the connective "but"

0:01:47 | whereas in the second example |

0:01:49 | we have a reason relation and here we don't have any connective to mark the |

0:01:53 | relation |

0:01:54 | it's an implicit relation |

0:01:58 | so there are several theoretical frameworks that aim at representing discourse structure; among the

0:02:03 | most well-known we have rst, sdrt and the penn discourse treebank framework

0:02:10 | so we have corpora annotated following these various frameworks |

0:02:14 | but we have no consensus on the label sets of discourse relations

0:02:20 | in each framework we have more or less specific relations, encoding relations at

0:02:25 | different levels of granularity; for instance the contrast relation from

0:02:32 | sdrt

0:02:33 | corresponds to three relations in the rst

0:02:38 | so even if the label sets

0:02:43 | are different, we argue that they include a common range of semantic and pragmatic information

0:02:48 | and we wonder if it's possible to find a way to represent this common information

0:02:54 | so discourse relation identification is generally seen as a classification task

0:03:00 | and it is usually separated between explicit and implicit relation identification

0:03:07 | the second task, implicit relation identification, is considered the hardest

0:03:13 | in fact the results remain quite low on this task despite the variety

0:03:19 | of approaches that have been tried

0:03:22 | so we can ask ourselves if the problem is only about the way

0:03:26 | we represent the data

0:03:28 | or about the way the task is modeled

0:03:31 | so in this work we want to act on the way we model the task by

0:03:34 | splitting it

0:03:36 | into several simpler tasks

0:03:38 | the idea is to decompose the problem and to investigate the reasons for the difficulty

0:03:42 | of discourse relation identification

0:03:45 | so to have several simpler tasks we decompose the information encoded by the

0:03:52 | relation labels into values for a small set of characteristics that we call primitives

0:03:58 | to do this we rely on the cognitive approach to coherence relations (ccr)

0:04:03 | which provides an inventory of

0:04:07 | dimensions that we call primitives of relations

0:04:11 | this inventory comes with mappings from the relations of the pdtb, rst and

0:04:17 | sdrt into primitive values

0:04:21 | there are core primitives, which are the ones in the original

0:04:26 | ccr

0:04:29 | and additional ones that were introduced to make explicit the specificities of the various frameworks in

0:04:35 | the mappings

0:04:37 | so these mappings can be seen as an interface between the existing frameworks |

0:04:44 | so in our work we provide an operational mapping between annotated relations and sets of

0:04:51 | primitive values, and we test the approach on the penn discourse treebank, but the

0:04:56 | goal is to extend the approach to other frameworks later

0:05:01 | so we try to answer the question "which primitives are harder to predict" by

0:05:05 | defining a classification task for each primitive

0:05:10 | then we do a reverse mapping from the sets of

0:05:14 | predicted primitive values to a set of compatible relation labels

0:05:20 | and we end up with a relation identification system that we want to evaluate

0:05:26 | so here is the penn discourse treebank hierarchy

0:05:29 | it has three levels representing different granularities, so we have more or less specific relations

0:05:36 | at the top level, level one, we have relations called classes

0:05:41 | and then we have types at level two and subtypes at level three

0:05:45 | so we have end labels, which are the most specific relations, at level three or level

0:05:49 | two

0:05:50 | and intermediate ones, which are underspecified relations: they have

0:05:56 | relations under them that are

0:05:58 | finer
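the three-level hierarchy just described can be sketched as a nested structure; this is only an illustrative fragment with a few pdtb branches, not the full inventory:

```python
# Illustrative fragment of the PDTB sense hierarchy (class > type > subtype);
# only a few branches are shown, not the full inventory.
PDTB_HIERARCHY = {
    "Comparison": {
        "Contrast": ["Juxtaposition", "Opposition"],
        "Concession": ["Expectation", "Contra-expectation"],
    },
    "Contingency": {
        "Cause": ["Reason", "Result"],
    },
}

def end_labels(hierarchy):
    """The most specific relations: subtypes at level three, or the type
    itself when it has no subtypes."""
    out = []
    for types in hierarchy.values():
        for typ, subtypes in types.items():
            out.extend(subtypes if subtypes else [typ])
    return out
```

an intermediate label such as "Contrast" is underspecified with respect to its subtypes "Juxtaposition" and "Opposition".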

0:06:00 | so we take each pdtb relation and map it into a set of primitive

0:06:04 | values

0:06:06 | we have five core primitives that we're going to illustrate; each has two or

0:06:11 | three values

0:06:12 | plus we added an "ns" value for "not specified"

0:06:18 | it is used to treat some cases of ambiguity, when in the ccr mapping

0:06:22 | there were several possible values for one primitive

0:06:26 | or to treat the case of intermediate labels that were absent from the ccr

0:06:30 | mapping

0:06:32 | and we have three additional primitives that are binary: conditional, alternative and specificity

0:06:39 | so to illustrate the mapping to core primitives we can consider the example of the

0:06:44 | contra-expectation relation

0:06:48 | here,

0:06:50 | from the content of the first unit

0:06:53 | we have an expected implication, which is that it should cost more

0:06:57 | because it's more expensive to produce

0:07:00 | and in the second unit this expectation is denied

0:07:05 | in fact the implication doesn't hold

0:07:09 | so here is the mapping of contra-expectation into primitive values

0:07:14 | because it involves an implication, that is a causal relation, it is associated with the

0:07:19 | basic operation value causal

0:07:21 | otherwise it would be additive

0:07:24 | because it involves a negation, the polarity is set to negative; otherwise it would

0:07:30 | be positive

0:07:32 | and we have the value basic for implication order

0:07:38 | implication order refers to the order of

0:07:43 | the arguments, that is the argument in which the premise of the implication lies

0:07:49 | the other values are non-basic and na

0:07:52 | which is not applicable, for additive relations
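as a sketch, the mapping just described for contra-expectation could be encoded as a simple value assignment; the values for source of coherence and temporal order are not discussed for this relation at this point, so they are filled with "ns" purely as placeholders:

```python
# Hypothetical encoding of the mapping described above; the "ns" entries are
# placeholders for values not discussed at this point in the talk.
CONTRA_EXPECTATION = {
    "basic_operation": "causal",   # the relation involves an implication
    "polarity": "negative",        # the expected implication is denied
    "implication_order": "basic",  # the premise lies in the first argument
    "source_of_coherence": "ns",   # placeholder
    "temporal_order": "ns",        # placeholder
}
```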

0:07:57 | another primitive is called source of coherence, which refers to a common

0:08:04 | distinction in the literature

0:08:07 | we have objective relations, which operate at the level of propositional content, and

0:08:13 | subjective ones, which operate at the epistemic or

0:08:16 | speech act level

0:08:18 | sorry |

0:08:20 | here we have an example of a subjective relation, which is justification

0:08:25 | here i state that mrs. yeargin is lying because they found students who said

0:08:30 | she gave them

0:08:31 | similar help

0:08:33 | so we have the mapping of justification into primitive values: it's causal, positive

0:08:38 | and non-basic, and we have the value subjective

0:08:42 | and it remains not specified for temporal order

0:08:46 | and temporal order has three values: chronological, anti-chronological and synchronous
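putting the pieces together, the inventory of core primitives and their possible values, as described so far, might be represented like this; the exact value names are assumptions based on the talk:

```python
# Core primitives with their value sets as described in the talk, plus
# "ns" (not specified) everywhere and "na" (not applicable) where mentioned.
CORE_PRIMITIVES = {
    "basic_operation": {"causal", "additive", "ns"},
    "polarity": {"positive", "negative", "ns"},
    "source_of_coherence": {"objective", "subjective", "ns"},
    "implication_order": {"basic", "non-basic", "na", "ns"},
    "temporal_order": {"chronological", "anti-chronological", "synchronous", "ns"},
}

# The three additional binary primitives.
ADDITIONAL_PRIMITIVES = ("conditional", "alternative", "specificity")

def is_valid(assignment):
    """Check that an assignment gives each core primitive a legal value."""
    return all(assignment.get(p) in vals for p, vals in CORE_PRIMITIVES.items())
```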

0:08:54 | so with respect to the penn discourse treebank hierarchy all these primitives are not

0:08:59 | equal in importance

0:09:01 | some of them are able to make distinctions between the top-level classes; it's the

0:09:06 | case for basic operation and polarity; for instance basic operation has the value causal

0:09:11 | for all relations

0:09:13 | under the contingency class

0:09:15 | and additive for relations under the comparison class

0:09:18 | it is the same for polarity: we have all the comparison relations that are

0:09:23 | negative for polarity

0:09:27 | and we have other primitives that make label distinctions at lower levels,

0:09:32 | levels two and three

0:09:36 | so then we have applied the mapping to each relation in the penn

0:09:40 | discourse treebank

0:09:42 | and here's the distribution of values for each primitive in the corpus

0:09:48 | on the left we have the list of primitives and on the right the

0:09:52 | list of values with their distributions

0:09:59 | so |

0:09:59 | for each primitive we define a classification task |

0:10:03 | we have a hundred and twenty-eight thousand pairs of arguments for our training set

0:10:09 | we use a quite straightforward architecture for the classification

0:10:15 | each argument of the relation is represented with the infersent sentence encoder, which

0:10:22 | is very common for semantic tasks: each argument is mapped into pre-trained word

0:10:27 | embeddings and then encoded with a bi-lstm with max pooling

0:10:31 | and after that we combine the two argument representations with concatenation, difference and product
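the argument-combination step can be sketched as below; the bi-lstm encoder itself is omitted and u, v stand for the two max-pooled argument encodings. using the absolute difference and the element-wise product alongside the concatenation is the usual infersent-style combination, assumed here:

```python
import numpy as np

def combine(u, v):
    """Combine two argument encodings via concatenation, absolute
    difference and element-wise product (InferSent-style features)."""
    return np.concatenate([u, v, np.abs(u - v), u * v])

# toy 2-dimensional "encodings" just to show the shapes
u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])
features = combine(u, v)  # shape (8,): [u; v; |u-v|; u*v]
```

the resulting feature vector is what the classification layer on top receives.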

0:10:39 | we tested various settings

0:10:42 | we tested various layer sizes, an additional layer on top

0:10:49 | of the argument combination, and different regularization values

0:10:56 | and we take the best setting as the best model

0:10:59 | for each task

0:11:02 | and as a baseline we take a majority classifier

0:11:07 | so here are the results in accuracy and macro f1

0:11:11 | for the baseline in blue and the best model in orange

0:11:16 | for each core primitive: polarity, basic operation, source of coherence, implication order and temporal order

0:11:25 | for forty percent of all argument pairs

0:11:31 | in the test set

0:11:34 | all primitives are correctly predicted, which is not very good, but on

0:11:39 | average we have eighty percent of primitives that are correctly predicted

0:11:44 | we're going to discuss polarity and basic operation; we said before that

0:11:49 | they are the most important primitives with respect to the penn discourse treebank hierarchy, and

0:11:54 | they have a similar distribution of values, so they are comparable

0:11:59 | basic operation has the lowest improvement with respect to the baseline over all the

0:12:04 | core primitives

0:12:06 | and we identify correctly only seventeen percent of causal relations

0:12:14 | and we have better results for polarity |

0:12:17 | it has a greater improvement with respect to the baseline and we have fifty percent of

0:12:22 | the negative relations that are correctly

0:12:25 | labeled

0:12:27 | source of coherence is the primitive that has the greatest improvement with respect to the

0:12:31 | baseline, but we have to temper this result because we have less than one percent

0:12:36 | of subjective relations in our dataset, so we mainly have objective relations

0:12:42 | and relations for which we have the not specified value, so that's not very

0:12:46 | informative

0:12:48 | for temporal order we have a little improvement with respect to the

0:12:52 | baseline and this is due to the fact that relations are

0:12:56 | mainly mapped

0:12:57 | to the not specified value

0:13:02 | after that we wanted to evaluate the performance of our systems on predicting discourse relations |

0:13:08 | so we operate the reverse mapping from the set of predicted values for

0:13:13 | each primitive to a set of compatible relation labels

0:13:18 | so we start with a set containing all the possible relations at all levels |

0:13:24 | then we remove the relations that are incompatible with the primitive values that we predicted

0:13:29 | for instance if the polarity is predicted positive we remove all relations associated with a |

0:13:35 | negative polarity and we do the same for each primitive |

0:13:40 | and then we remove redundant information: if the set contains all

0:13:45 | the subtypes under a type, or all the types under a class, we only keep the

0:13:50 | upper-level underspecified relation
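the two reverse-mapping steps just described, filtering out incompatible labels and then collapsing complete sets of children into their underspecified parent, can be sketched like this over a toy fragment; the primitive assignments per label are hypothetical illustrations:

```python
# Toy data: primitive values per label (hypothetical) and one
# parent-children link from the hierarchy.
RELATION_PRIMITIVES = {
    "Reason":   {"basic_operation": "causal",   "polarity": "positive"},
    "Result":   {"basic_operation": "causal",   "polarity": "positive"},
    "Contrast": {"basic_operation": "additive", "polarity": "negative"},
}
CHILDREN = {"Cause": {"Reason", "Result"}}

def compatible(predicted):
    """Step 1: keep labels whose primitive values do not contradict the
    predicted ones; an 'ns' prediction rules nothing out."""
    return {label for label, prims in RELATION_PRIMITIVES.items()
            if all(predicted.get(p, "ns") in ("ns", v) for p, v in prims.items())}

def collapse(labels):
    """Step 2: if all children of an upper-level label are present, keep
    only the underspecified parent."""
    out = set(labels)
    for parent, kids in CHILDREN.items():
        if kids <= out:
            out = (out - kids) | {parent}
    return out

candidates = collapse(compatible({"basic_operation": "causal"}))
```

here a causal prediction rules out "Contrast", and since both children of "Cause" remain, the set collapses to the underspecified "Cause".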

0:13:53 | so for the evaluation we have a number of things we need to measure: for

0:13:59 | hierarchical classification we have underspecification in

0:14:03 | the evaluation

0:14:06 | since the

0:14:06 | predicted label can be more or less specific than the gold label from the

0:14:11 | pdtb

0:14:12 | and we need a measure for multi-label classification

0:14:16 | in fact our system can predict more than a single relation

0:14:21 | so we use hierarchical precision and recall on the set of all

0:14:27 | labels

0:14:30 | so for instance

0:14:33 | on the left, if we have in the gold one relation

0:14:37 | which is, say, expansion

0:14:40 | and we predict two relations that are finer

0:14:44 | we are okay on two labels and we have two elements that are wrong, so

0:14:50 | our precision is 0.5

0:14:53 | whereas in the example on the right, if we have two relations in the gold

0:14:59 | and

0:15:00 | we only predict one relation, which is less specific

0:15:04 | we have only two good labels and we missed some of them, so we

0:15:09 | have a recall of 0.5
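one common way to realize such hierarchical precision and recall (assumed here; the paper's exact formulation may differ) is to expand every label with its ancestors and compute set-based precision and recall. the toy hierarchy below reproduces the first case, with two fine-grained predictions against one coarser gold label:

```python
# Toy ancestor table for a fragment of the hierarchy.
ANCESTORS = {
    "Contrast": ["Comparison"],
    "Juxtaposition": ["Comparison", "Contrast"],
    "Opposition": ["Comparison", "Contrast"],
}

def expand(labels):
    """Add all ancestors of each label."""
    out = set(labels)
    for label in labels:
        out.update(ANCESTORS.get(label, []))
    return out

def hierarchical_pr(gold, predicted):
    g, p = expand(gold), expand(predicted)
    correct = len(g & p)
    return correct / len(p), correct / len(g)  # (precision, recall)

# gold is one coarser relation, we predict two finer ones:
precision, recall = hierarchical_pr({"Contrast"}, {"Juxtaposition", "Opposition"})
# precision is 0.5: two of the four expanded predicted labels match the gold set
```

swapping gold and prediction gives the mirror case: one coarser prediction against two fine gold labels yields a recall of 0.5.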

0:15:11 | so we compare the system

0:15:16 | with the reverse mapping from predicted primitives into a set of relations against a system with

0:15:24 | direct discourse relation classification, with no decomposition

0:15:29 | into primitives

0:15:30 | and as measures we give the accuracy, the hierarchical precision and

0:15:34 | recall that we just presented, and

0:15:36 | again the hierarchical scores but only on the best match between what we predicted and

0:15:43 | the pdtb relations

0:15:46 | so here we can see that

0:15:49 | the system that

0:15:52 | predicts relations with the reverse mapping from the predicted

0:15:59 | primitives

0:16:00 | has lower results on all the measures except for

0:16:04 | the max hierarchical precision

0:16:09 | and by observing the results we see that we are really missing a lot of

0:16:13 | contingency class relations, which is consistent with what we saw on

0:16:19 | the primitive prediction, because we are missing the value causal in most of the

0:16:24 | cases for the primitive basic operation

0:16:29 | and we wrongly predict temporal class relations very often

0:16:36 | and this is due to the fact that this relation is associated with quite

0:16:41 | underspecified values for the primitives

0:16:45 | and generally we can say that predicting primitives still leaves too much underspecification, and

0:16:51 | that has an impact on the recall

0:16:54 | and we predict too many labels, so that has an impact on our precision

0:17:00 | so to conclude,

0:17:03 | we can see that one of the most important primitives, that is basic operation, seems

0:17:08 | to be the hardest to predict

0:17:11 | and we saw that the primitives obviously are not independent from each other

0:17:14 | so when we learn them in isolation we are less accurate than when we learn

0:17:20 | a fully specified relation

0:17:23 | so one of the things that we want to do is to learn them in

0:17:28 | a multi-task learning setting

0:17:31 | and we also want to extend the approach by applying this decomposition to other

0:17:37 | discourse frameworks

0:17:39 | in order to have cross-corpora training and prediction

0:17:43 | thank you |

0:17:52 | thank you very much. are there any questions?

0:18:05 | alright, then let's thank the speaker again