0:00:14 | and that |
---|
0:00:17 | the structure of my talk will be as follows first i'm going to motivate why we're looking |
---|
0:00:21 | at pdtb in the context of this corpus |
---|
0:00:24 | explain the corpus and then talk about two studies one involving manual annotation and one |
---|
0:00:29 | involving automatic discourse parsing |
---|
0:00:34 | so why are we looking at pdtb for student data |
---|
0:00:38 | so probably most people are familiar with pdtb the penn discourse treebank framework and i'm going |
---|
0:00:42 | to use the abbreviation to refer to the framework |
---|
0:00:45 | rather than the actual corpus when i talk about |
---|
0:00:49 | pdtb although the original corpus is on the wall street journal |
---|
0:00:54 | it's one of the currently dominant theories of discourse structure in the |
---|
0:00:58 | community |
---|
0:00:59 | it's lexically grounded and i'll give examples of what i mean by that in a moment |
---|
0:01:04 | and unlike alternative theories such as rst it's much more shallow so basically the |
---|
0:01:10 | analysis is at the local level with relations that have two arguments |
---|
0:01:14 | it's become increasingly studied because first there are by now a lot of studies in |
---|
0:01:20 | many languages and many genres |
---|
0:01:22 | and it's been shown that it's a framework that people can reliably annotate |
---|
0:01:26 | and now because of all this annotation there's a lot of data which has really |
---|
0:01:29 | spurred interest in automatic discourse parsing |
---|
0:01:32 | in fact at the last two conll conferences there have been shared |
---|
0:01:37 | tasks on pdtb discourse parsing |
---|
0:01:43 | so although it has been used in a lot of languages and other genres one |
---|
0:01:46 | area where it hasn't been used and the area of interest that i work |
---|
0:01:49 | in is student produced content |
---|
0:01:53 | and in particular we've been looking at a corpus of student essays |
---|
0:01:57 | which differs from the prior corpora that have been examined in this framework |
---|
0:02:02 | along the three dimensions shown here |
---|
0:02:06 | first there's the argumentative structure these essays basically have an argumentative nature |
---|
0:02:10 | second in addition to the text being somewhat different the people who are writing |
---|
0:02:15 | the texts are also different from for example newspaper writers in that they're students |
---|
0:02:19 | so they're still learning how to |
---|
0:02:22 | convey discourse structure and they also have a lot of problems with other aspects |
---|
0:02:25 | of writing more low-level issues |
---|
0:02:30 | okay so the goals of the work i'm presenting today are twofold because |
---|
0:02:35 | of these differences between student data and prior data we're interested in whether |
---|
0:02:41 | this kind of corpus pushes |
---|
0:02:43 | the annotation procedures that have been developed on other genres |
---|
0:02:47 | and also due to these differences how do existing discourse parsers that have |
---|
0:02:52 | been developed primarily for the wall street journal |
---|
0:02:55 | work on this more challenging domain |
---|
0:02:58 | and from my education and nlp perspective |
---|
0:03:01 | from my other hat as a researcher in ai in education |
---|
0:03:06 | i'm also interested in how we can use some of these analyses to |
---|
0:03:10 | support downstream applications which might take advantage of discourse analysis |
---|
0:03:15 | such as writing tutors and |
---|
0:03:19 | essay analysis and so forth |
---|
0:03:22 | okay so let me briefly describe my corpus |
---|
0:03:26 | our data consist of first and second drafts of persuasive essays written by |
---|
0:03:31 | high school students in the pittsburgh area these were actually written in |
---|
0:03:35 | the context of two classrooms |
---|
0:03:38 | our corpus comes from forty seven students who each wrote a first and second draft so |
---|
0:03:42 | we have twice as many papers |
---|
0:03:44 | and all of the data is in response to the prompt shown in red explain |
---|
0:03:50 | why a contemporary should be sent to each of the first six sections of dante's hell so |
---|
0:03:54 | this is |
---|
0:03:54 | a class of advanced students in the us these are advanced placement courses which prepare |
---|
0:04:00 | students for taking exams which can give them |
---|
0:04:03 | college credit or help them place out of college level english classes |
---|
0:04:08 | and so in this corpus students first wrote their essays in response to this prompt |
---|
0:04:12 | these were then given to other students in a peer review process where |
---|
0:04:17 | they were graded according to a rubric with a numerical grade and a lot of feedback |
---|
0:04:21 | and then they revised their papers |
---|
0:04:22 | to hopefully make them better |
---|
0:04:27 | here's an example of a fairly well written essay as dante descends into the |
---|
0:04:32 | second circle he sees the sinners who let their reason fall under the yoke |
---|
0:04:36 | of their lust these were the souls of those |
---|
0:04:38 | who committed acts of love but inappropriately on an impulse this would be a fine |
---|
0:04:42 | level of hell for all those who cheat on their boyfriends or girlfriends in high |
---|
0:04:45 | school |
---|
0:04:46 | because let's face it they aren't really in love |
---|
0:04:50 | okay so by the second draft the goal is to have people write this nice |
---|
0:04:54 | persuasive essay with a fairly canonical structure there should usually be an introduction where the thesis |
---|
0:04:59 | is laid out |
---|
0:05:00 | and there should be some |
---|
0:05:02 | paragraphs developing the reasoning which is where this example comes from and |
---|
0:05:07 | then there should be a conclusion |
---|
0:05:09 | so these essays unlike for example the wall street journal where |
---|
0:05:13 | much of the pdtb community's work |
---|
0:05:16 | has taken place have an argumentative structure |
---|
0:05:21 | there has been another recent large-scale corpus that followed pdtb the biodrb in |
---|
0:05:26 | the medical community where they looked at scientific medical argument |
---|
0:05:30 | papers and so those are similar in their argumentative nature to our corpus but those |
---|
0:05:35 | are written by professional scientists unlike high school students so |
---|
0:05:40 | even though they share the argumentative genre our corpus differs from them in the |
---|
0:05:43 | skill level of the people producing the text |
---|
0:05:47 | and i'm not gonna read this one in detail but here's an essay which is |
---|
0:05:50 | not as well written |
---|
0:05:51 | you can kind of read it in the background it sort of has problems at lots of |
---|
0:05:56 | levels |
---|
0:05:57 | and so even though they get feedback the essays are still quite noisy for many |
---|
0:06:02 | students even in |
---|
0:06:04 | the final version |
---|
0:06:06 | so their problems range from low-level issues such as grammatical and spelling errors to |
---|
0:06:10 | more discourse oriented |
---|
0:06:12 | issues such as lack of coherence with references and discourse relations |
---|
0:06:19 | okay so that's the data so i'm first gonna talk about how we created our |
---|
0:06:24 | manually annotated corpus |
---|
0:06:29 | now for those unfamiliar with pdtb i'm briefly just gonna review some |
---|
0:06:33 | of the |
---|
0:06:35 | major |
---|
0:06:36 | annotation |
---|
0:06:38 | concepts in the framework that we were interested in annotating |
---|
0:06:40 | so as i said pdtb is a lexically grounded discourse theory which |
---|
0:06:45 | has the idea that |
---|
0:06:47 | discourse relations between two arguments can be signaled lexically |
---|
0:06:51 | so when there's an explicit discourse connective this is called an explicit relation when it's |
---|
0:06:55 | not explicit then we have |
---|
0:06:57 | these other options |
---|
0:07:00 | so if the discourse connective isn't there explicitly but the annotator could put it in |
---|
0:07:04 | that's called an implicit relation if the discourse connective would be redundant because the relation |
---|
0:07:10 | has an alternative lexicalization that's called altlex |
---|
0:07:14 | sometimes the coherence is not in terms of |
---|
0:07:18 | a relation signaled by connectives but by entities that's an entity relation |
---|
0:07:21 | and then in some cases where we have incoherent |
---|
0:07:25 | relations they were classified as no relation so those are the five relation types |
---|
0:07:30 | that we'll be annotating |
---|
0:07:33 | each of those relations can then be categorized in terms of senses and |
---|
0:07:38 | the full blown theory of the pdtb framework has a hierarchical annotation |
---|
0:07:43 | that you can see with this tree structure for our work because this was our |
---|
0:07:48 | first |
---|
0:07:49 | study and we weren't even sure we could do |
---|
0:07:53 | the highest level the top of each of these four trees so we limited our |
---|
0:07:59 | current study to just that so we're just |
---|
0:08:02 | labelling them with respect to what's called level one which are the highest levels of |
---|
0:08:06 | the tree comparison contingency |
---|
0:08:09 | expansion and temporal |
---|
0:08:10 | and then as you can see in a full blown pdtb analysis |
---|
0:08:14 | a temporal can then be labeled as synchronous or asynchronous and then if you |
---|
0:08:19 | want to go all the way down to level three asynchronous could also be labeled with respect to whether it |
---|
0:08:23 | represents precedence or succession |
---|
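the level-one simplification just described can be sketched in code; this is a minimal illustration of the dotted sense paths (not the official pdtb tooling), and the example path strings are just for illustration:

```python
# A minimal sketch (not official PDTB tooling) of the sense hierarchy
# discussed above: dotted paths like "Temporal.Asynchronous.Precedence"
# are collapsed to their level-one class.
LEVEL_ONE = {"Comparison", "Contingency", "Expansion", "Temporal"}

def to_level_one(sense):
    """Return the level-one class of a full sense path."""
    top = sense.split(".")[0]
    if top not in LEVEL_ONE:
        raise ValueError("unknown level-one sense: " + top)
    return top

print(to_level_one("Temporal.Asynchronous.Precedence"))  # Temporal
print(to_level_one("Comparison.Contrast"))               # Comparison
```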
0:08:27 | okay so here are just a few annotated examples to make this a little clearer so |
---|
0:08:32 | the first example |
---|
0:08:34 | filled with hatred for many yet never acts upon his wrong thoughts |
---|
0:08:38 | the notation i'll be using which is the one typically used in pdtb |
---|
0:08:42 | is that the connective is shown with underlining here the connective is yet because that's actually |
---|
0:08:47 | in the text |
---|
0:08:48 | this is an explicit relation |
---|
0:08:50 | and then it |
---|
0:08:51 | can be associated with several |
---|
0:08:54 | senses and in this case it's labeled as a comparison and then it has two |
---|
0:08:59 | arguments the first argument is shown in italics and the second |
---|
0:09:02 | is shown in bold |
---|
0:09:04 | next example the man was stuck in this layer he never devoted his |
---|
0:09:08 | entire life to other people's good over his own |
---|
0:09:12 | so there's no connective here it's |
---|
0:09:15 | just shown by the underlining |
---|
0:09:16 | so this is an implicit relation because even though the writer didn't put the connective |
---|
0:09:21 | in the annotator could infer that an appropriate connective could have been placed there |
---|
0:09:25 | namely because so it's implicit and then the sense of the relation that's implicitly |
---|
0:09:30 | signaled in this example is contingency |
---|
0:09:35 | okay so that's sort of the output of the annotation the process is as |
---|
0:09:40 | follows |
---|
0:09:41 | so we retained |
---|
0:09:42 | sort of the key aspects of the pdtb framework namely |
---|
0:09:46 | we wanted to annotate with respect to the five relation types that i |
---|
0:09:50 | just explained and the four level one senses |
---|
0:09:53 | but following prior studies we modified some of the conventions to fit our domain which |
---|
0:09:58 | i think differs from some of the prior work |
---|
0:10:00 | to help increase the reliability of the annotation and reduce the time that it took because it's |
---|
0:10:05 | very expensive to |
---|
0:10:06 | hire expert annotators to do this |
---|
0:10:10 | following prior work that applied this framework our annotation basically made |
---|
0:10:16 | one pass through each essay doing one kind of relation at a time |
---|
0:10:21 | because our data has all these sort of low-level issues that you won't see |
---|
0:10:24 | for example in the wall street journal we allowed the annotator to label relations between |
---|
0:10:29 | ungrammatical units if it was clear |
---|
0:10:32 | what really should have been |
---|
0:10:33 | written if the low-level problems |
---|
0:10:36 | hadn't been there so here we see the first layer held the vestibule the |
---|
0:10:39 | entrance of hell this is a large open gate symbolising that it's easy to get into |
---|
0:10:43 | so you can see that there's no capitalisation before this and there's no period |
---|
0:10:49 | after hell but we can mentally put those in there ourselves so we let |
---|
0:10:52 | the annotator pretend that those errors aren't there and |
---|
0:10:57 | we have those be the two arguments even though if we enforced this constraint |
---|
0:11:01 | as for well written text you wouldn't have labelled that and then the relation |
---|
0:11:04 | here is an entity relation there's no explicit or implicit connective between hell and this but |
---|
0:11:10 | we can infer coherence through the entity |
---|
0:11:13 | and i'd like to note that because of some of the modifications we made when |
---|
0:11:17 | we apply the parsers which follow the strict |
---|
0:11:19 | pdtb conventions obviously they're not going to be able to get |
---|
0:11:24 | these examples right so it will be impossible for |
---|
0:11:26 | a parser to get a hundred percent on our corpus currently |
---|
0:11:32 | another change that we made which we followed from the biodrb corpus |
---|
0:11:36 | which as i mentioned like ours is argumentative |
---|
0:11:39 | is to permit implicit arguments to be non-adjacent within a paragraph unit so you can see in this |
---|
0:11:44 | example |
---|
0:11:46 | we have the implicit relation so |
---|
0:11:49 | so isn't actually in the text but the annotator felt it could have |
---|
0:11:53 | been placed there so it's an implicit |
---|
0:11:55 | and the first argument of so is the first sentence of the |
---|
0:11:59 | paragraph while the second argument is the although sentence and as you can see |
---|
0:12:02 | they're non-adjacent so in strict pdtb this wouldn't be allowed and we'd have |
---|
0:12:06 | either a weaker relationship or no relationship and we'd be missing some of the coherence this |
---|
0:12:11 | was found as i said to be an issue |
---|
0:12:13 | in the biodrb corpus as well |
---|
0:12:18 | okay so once we completed our annotation our first interest was in comparing how the |
---|
0:12:24 | distribution of what we annotated compared to these other corpora in the literature to see |
---|
0:12:28 | the impact of both |
---|
0:12:30 | the argumentative genre as well as conjoined with that |
---|
0:12:34 | the |
---|
0:12:36 | elementary level of the writing ability of the people producing the text |
---|
0:12:40 | so on the first row you can see the distribution across the five relation types |
---|
0:12:44 | for our essay data and below you can see the comparison with the two other |
---|
0:12:48 | corpora i've mentioned the wall street journal and the biodrb |
---|
0:12:52 | and i've highlighted two things i just want to draw your attention to there are more |
---|
0:12:55 | details about some other things in the paper |
---|
0:12:58 | the first is that unlike |
---|
0:13:00 | the other two corpora which have |
---|
0:13:02 | exactly the same percentage of explicitly signaled relations our data has much fewer |
---|
0:13:07 | and we believe this probably reflects the novice nature of the people producing the texts |
---|
0:13:13 | they're still actually learning how to construct |
---|
0:13:15 | a coherent discourse and haven't quite figured out the proper use of connectives and so |
---|
0:13:19 | as i said we feel this is something where discourse structure could be used in |
---|
0:13:23 | downstream applications to highlight areas that might benefit from tutoring |
---|
0:13:29 | we also see in the last |
---|
0:13:33 | column the no relation cases |
---|
0:13:35 | that although it's very low in all of the corpora in ours we basically got |
---|
0:13:39 | it down to zero and we believe that's because of the loosening of the adjacency |
---|
0:13:43 | constraint although the biodrb also loosened this constraint it |
---|
0:13:46 | still didn't really differ from the wall street journal |
---|
0:13:51 | with respect to the other major component that we annotated the sense distributions |
---|
0:13:56 | you can see in the first column that |
---|
0:13:59 | both the essays and the biodrb have fewer comparisons and this suggests that |
---|
0:14:03 | this might be a feature that's relevant to the argumentative nature of a text |
---|
0:14:06 | rather than to the skill level of the writers and this is kind of opposite |
---|
0:14:11 | to contingency where we see that |
---|
0:14:14 | the wall street journal and the biodrb which differ in whether |
---|
0:14:18 | they're argumentative or not |
---|
0:14:19 | are much more similar to each other as opposed to the essays where it is |
---|
0:14:23 | the skill level of the students that is what's |
---|
0:14:26 | notable there |
---|
0:14:31 | okay and then the final thing that was identified in our manual annotation was |
---|
0:14:37 | that the annotator had a lot of |
---|
0:14:42 | ambiguities that she had trouble annotating consistently |
---|
0:14:45 | in particular between the three things i've shown there and i've just given two examples |
---|
0:14:50 | and so in the first example she had a lot of trouble deciding should this |
---|
0:14:53 | be an implicit expansion or an entity relation and some of these concerns were because of |
---|
0:14:58 | the way pdtb works there is a predefined set of connectives that came |
---|
0:15:02 | largely out of the wall street journal and in our student data we're seeing a |
---|
0:15:05 | lot of things which probably could |
---|
0:15:07 | be considered connectives but aren't in |
---|
0:15:09 | the resources that are used to guide most manual annotation efforts |
---|
0:15:17 | here we see another ambiguity between explicit expansion and contingency |
---|
0:15:24 | this |
---|
0:15:25 | issue of causality which is related to contingency was also a problem |
---|
0:15:30 | in the biodrb in fact they |
---|
0:15:32 | added some extra senses to reflect the sort of contingency that is specific to argumentation |
---|
0:15:40 | okay so now turning to the automatic parsing |
---|
0:15:44 | in this study we used the off-the-shelf end-to-end nlp discourse parser which was the first |
---|
0:15:48 | end-to-end pdtb parser it was produced at the national university of singapore |
---|
0:15:55 | and was trained on the wall street journal |
---|
0:15:57 | and it basically has a pipeline architecture where |
---|
0:16:01 | the set of predefined discourse connectives that i mentioned before is identified first once |
---|
0:16:05 | those are identified then for all the explicit relations the arguments are identified and a |
---|
0:16:10 | sense is assigned and then all the non-explicit relations are dealt with |
---|
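the pipeline can be sketched roughly as follows; this is a simplified illustration, not the actual singapore parser, and the tiny connective list and relation labels are placeholders:

```python
# Simplified sketch of a PDTB-style pipeline parser (not the actual
# end-to-end parser discussed in the talk): (1) match predefined
# connectives, (2) treat matches as explicit relations, (3) handle
# everything else as a non-explicit relation with the prior sentence.
CONNECTIVES = {"because", "but", "yet", "so", "although"}  # tiny placeholder list

def parse(sentences):
    relations = []
    for i, sent in enumerate(sentences):
        tokens = sent.lower().split()
        found = [t for t in tokens if t in CONNECTIVES]
        if found:
            # step 2: explicit relation anchored on the matched connective
            relations.append(("Explicit", found[0], i))
        elif i > 0:
            # step 3: non-explicit relation with the previous sentence
            relations.append(("NonExplicit", None, i))
    return relations

print(parse(["He was filled with hatred", "yet he never acted on it"]))
```

a real parser would of course also identify the argument spans and assign senses at each step; the sketch only shows the pipeline ordering.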
0:16:15 | in our study we used two versions of the parser we first used the one |
---|
0:16:19 | that you can download directly which is trained on level two senses since |
---|
0:16:23 | our data is only in terms of level one we could parse in terms of |
---|
0:16:26 | level two and then |
---|
0:16:28 | rewrite that into the more abstract level one versions |
---|
0:16:31 | or we thought it might be more productive to actually retrain the parser by not |
---|
0:16:35 | using the level two senses in the wall street journal but simplifying them to level |
---|
0:16:40 | one and then training and testing directly on |
---|
0:16:42 | that so basically we trained a new parser |
---|
0:16:47 | in the second version |
---|
0:16:51 | okay so here are our results on end-to-end performance using f one score |
---|
0:16:56 | which is |
---|
0:16:57 | the standard way that these parsers are currently evaluated |
---|
0:17:01 | so in the first column you can see the configuration for training a particular |
---|
0:17:05 | parser the data it was trained on and the level of the |
---|
0:17:09 | sense annotation that was used for the training and then you can see |
---|
0:17:12 | the testing situation in our case we not only |
---|
0:17:15 | switched from the wall street journal for training to evaluation on essays |
---|
0:17:19 | but as you can see sometimes we |
---|
0:17:21 | trained on the same level that we |
---|
0:17:23 | tested on and other times these varied |
---|
0:17:26 | and then there are two different ways of evaluating end-to-end performance based on |
---|
0:17:30 | whether you need an exact match on arguments or a partial match obviously the partial match is |
---|
0:17:34 | a looser evaluation so you get higher performance |
---|
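the exact versus partial distinction can be sketched as follows, treating each argument as a tuple of token indices; this is only an illustration, since the real shared-task scorers define matching more carefully:

```python
# Illustrative sketch of exact vs partial argument matching for F1
# (real shared-task scorers define matching more carefully).
def f1(gold, pred, partial=False):
    def match(g, p):
        if partial:
            # partial match: any token overlap between the spans counts
            return set(g) & set(p)
        return g == p  # exact match: identical spans
    tp = sum(1 for p in pred if any(match(g, p) for g in gold))
    precision = tp / len(pred) if pred else 0.0
    recall = (sum(1 for g in gold if any(match(g, p) for p in pred)) / len(gold)
              if gold else 0.0)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [(0, 1, 2), (5, 6)]
pred = [(0, 1), (5, 6)]
print(f1(gold, pred))                # exact: only the (5, 6) argument matches
print(f1(gold, pred, partial=True))  # partial: both arguments overlap
```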
0:17:37 | and here we can see that as we suspected our best results are obtained by |
---|
0:17:41 | retraining the parser so that it |
---|
0:17:44 | trains and tests at the same sense level |
---|
0:17:49 | although it isn't |
---|
0:17:51 | really |
---|
0:17:53 | possible to do a very careful comparison we were interested in just looking at absolute |
---|
0:17:58 | performance levels because our real interest is using the output of parsing |
---|
0:18:03 | for downstream applications and although these performance levels are not great prior studies |
---|
0:18:09 | have |
---|
0:18:10 | found that it is possible to use the output of parsers at these levels |
---|
0:18:15 | and so our goal was to make changes such as the changes to the annotation |
---|
0:18:20 | method and the use of level one to get our absolute levels up to prior |
---|
0:18:24 | work |
---|
0:18:24 | so that we could then use them |
---|
0:18:26 | so in the top row you can see what i had shown on the prior |
---|
0:18:29 | table and on the bottom you can see some benchmarks |
---|
0:18:32 | kind of the state of the art in the literature so the first row here shows |
---|
0:18:36 | the same parser we used but trained and |
---|
0:18:40 | tested on the wall street journal |
---|
0:18:42 | and you can see that under both partial and exact match we're fairly comparable |
---|
0:18:47 | the second two rows show the best performing parser from the conll competition not |
---|
0:18:52 | this year's but |
---|
0:18:54 | the two thousand fifteen one which was the one available at the time we did our work |
---|
0:18:58 | and again you can see that even though that was trained on the wall street journal |
---|
0:19:01 | and tested on different levels |
---|
0:19:03 | if you look at the last column our performance levels are fairly comparable |
---|
0:19:08 | as well |
---|
0:19:10 | and finally just a few more observations as i noted earlier there are different kinds |
---|
0:19:17 | of relations that one can predict explicit versus all the others |
---|
0:19:22 | so we were interested in how performance varied when you took that into |
---|
0:19:26 | account |
---|
0:19:28 | and not surprisingly again you can see that it's much easier to predict explicit relations compared |
---|
0:19:33 | to non-explicit relations in our corpus that's true in all the other prior |
---|
0:19:37 | studies as well |
---|
0:19:39 | and this is largely due to the fact that the pipeline is based on first doing connective |
---|
0:19:43 | identification which is fairly reliable in our case it's ninety percent which although good is |
---|
0:19:48 | still as i |
---|
0:19:50 | said a little lower than in prior corpora because the list of connectives that drives |
---|
0:19:54 | this |
---|
0:19:54 | was developed for the wall street journal and doesn't necessarily match as well as it |
---|
0:19:59 | could to student data |
---|
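to make the connective-list issue concrete, here's a toy sketch; both lists are made up for illustration and are not the actual inventories used in the work:

```python
# Toy illustration (made-up lists) of why a connective inventory from
# one genre can under-identify connectives in another: student writers
# may use informal signals absent from a newswire-derived list.
WSJ_CONNECTIVES = {"because", "however", "although", "therefore"}
STUDENT_GOLD = {"because", "although", "plus", "that's why"}  # hypothetical gold connectives

found = WSJ_CONNECTIVES & STUDENT_GOLD
missed = STUDENT_GOLD - WSJ_CONNECTIVES
recall = len(found) / len(STUDENT_GOLD)
print(sorted(missed), recall)  # the informal signals are the ones missed
```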
0:20:03 | and finally when we looked at the two different ways of combining the levels for |
---|
0:20:06 | training and testing we can see that there was a clear benefit for level |
---|
0:20:10 | one training and testing for the non-explicit results |
---|
0:20:14 | while for level two we had a slightly flipped version although the differences weren't |
---|
0:20:18 | quite as dramatic we can see that training on the more specific level and |
---|
0:20:24 | testing on the abstracted version actually works better which suggests some sort of hybrid |
---|
0:20:29 | approach combining the two or using different |
---|
0:20:33 | parsers for different senses might give us better results than any single approach |
---|
0:20:40 | in the paper there's a lot of error analysis like detailed confusion matrices if you're |
---|
0:20:45 | interested interestingly many errors that the parser makes reflect the cases that |
---|
0:20:51 | the annotator felt to be difficult ambiguities like i discussed earlier and as i also mentioned |
---|
0:20:55 | the parser would never be able to actually get a hundred percent in our case |
---|
0:20:59 | because of |
---|
0:20:59 | the changes that we made to some of the conventions |
---|
0:21:03 | which the current off-the-shelf parsers don't yet have implemented |
---|
0:21:08 | okay so in this paper i tried to |
---|
0:21:12 | show an analysis of a very well developed framework that's been used in many other languages |
---|
0:21:17 | and genres and how it sort of |
---|
0:21:19 | gets stressed when it's applied to this new corpus which differs in the three |
---|
0:21:24 | ways i've shown here |
---|
0:21:26 | first in the area of manual relation annotation by comparing our distributions to prior corpora we've identified some |
---|
0:21:32 | issues some methodological complexities in annotation that need to be further developed to |
---|
0:21:37 | further enhance the generality of this framework and that also could be used |
---|
0:21:42 | to |
---|
0:21:43 | motivate our writing tutors |
---|
0:21:45 | with respect to automatic relation parsing our studies compared a variety of parsers and |
---|
0:21:50 | different training and testing conditions |
---|
0:21:52 | and suggest that the adaptations we made to our annotation framework give us comparable |
---|
0:21:58 | results in absolute performance level |
---|
0:22:02 | in our current directions unfortunately this data was not originally collected by me it was |
---|
0:22:06 | collected by people who didn't know anything about releasing corpora so the human subjects |
---|
0:22:11 | protocol was not |
---|
0:22:14 | written such that we can release the data but we're now creating a new |
---|
0:22:18 | corpus |
---|
0:22:19 | of a similar type of data where that problem has been fixed this time we are correctly |
---|
0:22:23 | collecting and annotating the data and then should be able to make |
---|
0:22:27 | a |
---|
0:22:27 | corpus that's very similar to this publicly available |
---|
0:22:31 | we're also now doing a larger scale study of discourse parsing where we're basically trying |
---|
0:22:36 | to find every parser that is available to the public and to use them either off-the-shelf or |
---|
0:22:40 | for those that allow retraining to actually retrain them on student data and test them |
---|
0:22:46 | on student data and what we'd eventually like to do is not just use them |
---|
0:22:50 | off the shelf but really try to |
---|
0:22:51 | modify them in ways that |
---|
0:22:53 | optimize them for a particular kind of performance |
---|
0:22:56 | and then finally we're trying to use the output of both our automatic |
---|
0:23:01 | and manual annotation in downstream tasks in writing analysis such as essay scoring |
---|
0:23:06 | and revision analysis we have some promising results there that are under submission |
---|
0:23:13 | thank you |
---|
0:23:36 | yes that would be |
---|
0:23:39 | one place to do it or add some sort of confidence |
---|
0:23:42 | rating as well and try to use those in the analysis |
---|
0:24:27 | we're actually doing that in two ways so one way is |
---|
0:24:31 | in our study of using discourse parsers we would actually like to try some |
---|
0:24:36 | of the rst parsers even though our data isn't annotated in that framework so we can't |
---|
0:24:39 | do an intrinsic evaluation |
---|
0:24:41 | and how well that works since we are using it for other tasks such as |
---|
0:24:44 | essay scoring and |
---|
0:24:46 | revision analysis we could see if that more global discourse structure helps others |
---|
0:24:50 | have done those kinds of comparative studies and found it useful |
---|
0:24:54 | and the second thing we're doing is within the pdtb framework we're trying |
---|
0:24:59 | to do some inference so still not getting at the really global structure but |
---|
0:25:03 | trying to infer from these very local relations |
---|
0:25:05 | some less local ones by various inference rules and we've got some preliminary results that |
---|
0:25:11 | suggest that's also a promising approach |
---|
0:25:15 | and have |
---|
0:25:51 | i think at this point we're not necessarily |
---|
0:25:55 | i don't have such a lofty goal i think we're more just telling them they |
---|
0:25:59 | should have a discourse marker as opposed to which one they should have |
---|
0:26:04 | but that's an interesting question which |
---|
0:26:07 | i'll have to think about |
---|