0:00:14 And the structure of my talk will be: first I'm going to motivate why we're looking at PDTB in the context of this corpus, then explain the corpus, and then talk about two studies, one involving manual annotation and one involving automatic discourse parsing.
0:00:34 So why are we looking at PDTB for student data?
0:00:38 Probably most people are familiar with PDTB, the Penn Discourse Treebank framework, and I'm going to use the abbreviation to refer to the framework rather than the actual corpus built on the Wall Street Journal; when I talk about that one, I'll call it the Wall Street Journal PDTB. It's currently one of the dominant theories of discourse structure in the community.
0:00:59 It's lexically grounded, and I'll give examples of what I mean by that in a moment. And unlike alternative theories such as RST, it's much shallower: basically the analysis is at the local level, with relations that have two arguments.
0:01:14 It's become increasingly studied: first, there have by now been a lot of studies in many languages and many genres, and it's been shown that it's a framework that people can reliably annotate. And because of all this annotation there's now a lot of data, which has really spurred interest in automatic discourse parsing; in fact, at the last two CoNLL conferences there has been a shared task on PDTB discourse parsing.
0:01:43 So although it has been used for a lot of languages and genres, one area where it hasn't been used is the area of interest that I work in, which is student-produced content. In particular, we've been looking at a corpus of student essays, which differs from the prior corpora that have been examined in this framework along the three dimensions shown here.
0:02:06 First, the essays have an argumentative structure; they are basically argumentative in nature. Second, in addition to the texts being somewhat different, the people writing the texts are also different from, for example, newspaper writers, in that they are students: they're still learning how to convey discourse structure, and they also have a lot of problems with other, more low-level aspects of writing.
0:02:30 Okay, so the goals of the work I'm presenting today are twofold. Because of these differences between student data and prior data, we were interested in whether this kind of corpus pushes the annotation procedures that have been developed on other genres. And also, given these differences, how do existing discourse parsers, which have been developed primarily for the Wall Street Journal, work on this more challenging domain?
0:02:58 And from my educational NLP perspective, from my other hat as a researcher in AI in education, I'm also interested in how we can use these analyses to support downstream applications which might take advantage of discourse analysis, such as writing tutors, essay analysis, and so forth.
0:03:22 Okay, so let me briefly describe my corpus. The data consist of first and second drafts of persuasive essays written by high school students in the Pittsburgh area; they were actually written in the context of two classrooms. Our corpus comes from forty-seven students who each wrote a first and second draft, so we have twice as many papers. All of the data is in response to the prompt shown in red: explain why a contemporary should be sent to each of the first six sections of Dante's Hell.
0:03:54 This was a class of advanced students; in the US these are Advanced Placement courses, which prepare students for taking exams that can give them college credit or help them place out of college-level English classes.
0:04:08 So in this corpus, students first wrote their essay in response to this prompt. The essays were then given to other students in a peer review process, where they were graded according to a rubric, with a numerical grade and an amount of feedback, and then the students revised their papers, hopefully making them better.
0:04:27 Here's an example of a fairly well-written essay: as Dante descends into the second circle he sees the sinners who make their reason fall under the yoke of their lust. These were the souls of those who committed the act of love, but inappropriately, on an impulse. This would be a fine level of Hell for all those who cheat on their boyfriends or girlfriends in high school, because, let's face it, they aren't really in love.
0:04:50 Okay, so by the second draft the goal is to have people write this nice persuasive essay with a fairly canonical structure: there will usually be an introduction where the thesis is laid out, there should be some paragraphs developing the reasoning, which is where this example comes from, and then there should be a conclusion. So the essays, unlike for example the Wall Street Journal, where much of the PDTB community's work has taken place, have an argumentative structure.
0:05:21 There has been another recent large-scale corpus that follows PDTB, the BioDRB from the medical community, where they looked at scientific medical argumentative papers. Those are similar in their argumentative nature to our corpus, but they are written by professional scientists rather than high school students. So even though they share the argumentative genre, our corpus differs from them in the skill level of the people producing the text.
0:05:47 I'm not going to read this one in detail, but here's an essay which isn't as well written; you can kind of read it in the background. It has problems at lots of levels. So even though the students get feedback, the essays are still quite noisy for many of them, even in the final version. Their problems range from low-level issues, such as grammatical and spelling errors, to more discourse-oriented issues: lack of coherence with references and discourse relations.
0:06:19 Okay, so that's the data. First I'm going to talk about how we created our manually annotated corpus.
0:06:29 For those unfamiliar with PDTB, I'm briefly going to review the major annotation concepts in the framework that we were interested in annotating.
0:06:40 As I said, PDTB is a lexically grounded discourse theory, built around the idea that discourse relations between two arguments can be signaled lexically. When there's an explicit discourse connective, this is called an explicit relation. When it's not explicit, we have these other options: if the discourse connective isn't there explicitly but the annotator could put one in, that's called an implicit relation; if a discourse connective would be redundant because the relation has an alternative lexicalization, that's called AltLex; sometimes the coherence is not in terms of a relation signaled by connectives but comes through entities, which is an entity relation (EntRel); and in some cases, where we have incoherent relations, they are classified as no relation (NoRel). So those are the five relation types that we will be annotating.
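To make the five types concrete, here is a minimal sketch in Python of the decision procedure as just described; the labels follow PDTB, but the function and its boolean flags are our own illustration, not the official annotation manual.

```python
from enum import Enum

class RelationType(Enum):
    EXPLICIT = "Explicit"  # a discourse connective appears in the text
    IMPLICIT = "Implicit"  # no connective, but the annotator can insert one
    ALTLEX = "AltLex"      # inserting a connective would be redundant:
                           # the relation is alternatively lexicalized
    ENTREL = "EntRel"      # coherence comes from a shared entity
    NOREL = "NoRel"        # no coherence relation can be inferred

def relation_type(has_connective, connective_redundant,
                  connective_insertable, entity_coherence):
    """Paraphrase of the annotator's decision procedure (a sketch)."""
    if has_connective:
        return RelationType.EXPLICIT
    if connective_redundant:
        return RelationType.ALTLEX
    if connective_insertable:
        return RelationType.IMPLICIT
    if entity_coherence:
        return RelationType.ENTREL
    return RelationType.NOREL
```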
0:07:33 Each of those relations can then be categorized in terms of senses. The full-blown PDTB framework has a hierarchical annotation, which you can see in this tree structure. For our work, because this was the first study and we weren't even sure we could do the highest level, the top of each of these four trees, we limited our current study to just that: we label relations with respect to what's called Level 1, the highest level of the tree: Comparison, Contingency, Expansion, and Temporal. And then, as you can see, in a full-blown PDTB analysis a Temporal relation can be further labeled as synchronous or asynchronous, and if you want to go all the way to Level 3, an asynchronous relation can also be labeled with respect to whether it's precedence or succession.
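As a reference point, here is the sense hierarchy as nested Python dicts. The Temporal branch is expanded to Level 3 exactly as described in the talk; the Level-2 entries under the other three classes are taken from the PDTB 2.0 manual and are included only for orientation.

```python
# Level 1 -> Level 2 -> Level 3; only the Temporal branch mentioned
# in the talk is expanded down to Level 3.
SENSE_TREE = {
    "Comparison": {"Contrast": {}, "Pragmatic Contrast": {},
                   "Concession": {}, "Pragmatic Concession": {}},
    "Contingency": {"Cause": {}, "Pragmatic Cause": {},
                    "Condition": {}, "Pragmatic Condition": {}},
    "Expansion": {"Conjunction": {}, "Instantiation": {}, "Restatement": {},
                  "Alternative": {}, "Exception": {}, "List": {}},
    "Temporal": {"Synchrony": {},
                 "Asynchronous": {"Precedence": {}, "Succession": {}}},
}

LEVEL1_SENSES = list(SENSE_TREE)  # the four labels used in this study
```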
0:08:27 Okay, so here are just a few annotated examples to make this a little clearer. The first example: filled with hatred for many, yet never acts upon his wrong thoughts. In the notation I'll be using, which is typical for PDTB, the connective is shown with underlining; here the connective is yet. Because it actually appears in the text, this is an explicit relation. It can then be associated with several senses, and in this case it's labeled as a Comparison. It has two arguments: the first argument is shown in italics and the second is shown in bold.
0:09:04 Next example: the man was stuck in this layer; he never once devoted his entire life to other people's problems, only to his own. There's no connective actually in the text here; the inferred connective is just shown by the underlining. So this is an implicit relation: even though the writer didn't put the connective in, the annotator could infer that an appropriate connective, namely because, could have been placed there. So it's implicit, and the sense that's implicitly signaled in this example is Contingency.
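Putting the pieces together, a relation instance can be represented roughly like this; the field names are our own illustration rather than the official PDTB file format, and the two examples are paraphrased from the slides.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    rel_type: str    # Explicit / Implicit / AltLex / EntRel / NoRel
    connective: str  # connective found in, or inserted into, the text
    sense: str       # Level-1 sense label in this study
    arg1: str        # first argument (italics on the slides)
    arg2: str        # second argument (bold on the slides)

ex1 = Relation("Explicit", "yet", "Comparison",
               "filled with hatred for many",
               "never acts upon his wrong thoughts")
ex2 = Relation("Implicit", "because", "Contingency",
               "the man was stuck in this layer",
               "he never once devoted his entire life to other people's problems")
```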
0:09:35 Okay, so that's the output of the annotation; the process was as follows. We retained the key aspects of the PDTB framework: namely, we annotated with respect to the five relation types that I just explained and the four Level-1 senses. But following prior studies, we modified some of the conventions to fit our domain, in ways that differ from some of the prior work, to help increase the reliability of the annotation and to reduce the time it took, because it's very expensive to hire expert annotators to do this.
0:10:10 Following prior work that applied this framework to Hindi, our annotator basically made one pass through each essay, handling every kind of relation at one time. And because our data has all these low-level issues that you won't see, for example, in the Wall Street Journal, we allowed the annotator to label relations between ungrammatical units if it was clear what really would have been written if the low-level problems hadn't been there. Here we see: the first layer, called the Vestibule, is the entrance of Hell. This is a large open gate, symbolising that it's easy to get into. You can see that there's no capitalisation before this and no period after Hell; we mentally put those in ourselves. So we let the annotator pretend those errors weren't there and take those two units as the two arguments, even though if we enforced the constraints for well-written text we wouldn't have been able to label this. The relation here is an entity relation: there's no explicit or implicit connective between the Hell sentence and the this sentence, but we can infer coherence through the entity.
0:11:13 And I'd like to note that, because of some of the modifications we made, when we apply parsers which follow the strict PDTB, they're obviously not going to be able to get these examples right. So it is currently impossible for a parser to get a hundred percent on our corpus.
0:11:32 Another change that we made, following the BioDRB corpus, which as I mentioned is argumentative like ours, is to permit the arguments of implicit relations to be non-adjacent within-paragraph units. You can see in this example we have the implicit relation so: so isn't actually in the text, but the annotator felt it could have been placed there, so it's implicit. The first argument of so is the first sentence of the paragraph, while the second argument is the although sentence, and as you can see they're non-adjacent. In strict PDTB this wouldn't be allowed: we'd have either a weaker relationship or no relationship, and we'd be missing some of the structure. As I said, this was found to be an issue in the BioDRB corpus as well.
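A sketch of how the relaxed constraint can be checked, assuming sentences are indexed by (paragraph, sentence) position; this is our paraphrase of the convention, not code from an annotation tool.

```python
def implicit_args_allowed(arg1_pos, arg2_pos, strict_pdtb=False):
    """Can two sentences serve as Arg1/Arg2 of an implicit relation?

    arg1_pos, arg2_pos: (paragraph_index, sentence_index) pairs,
    with arg1 preceding arg2.
    """
    (p1, s1), (p2, s2) = arg1_pos, arg2_pos
    if strict_pdtb:
        # Strict PDTB: Arg1 must be the immediately preceding sentence.
        return p1 == p2 and s2 - s1 == 1
    # Our convention (following BioDRB): non-adjacent is fine,
    # as long as both arguments are in the same paragraph.
    return p1 == p2
```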
0:12:18 Okay, so once we completed our annotation, our first interest was in comparing the distribution of what we annotated against these other corpora in the literature, to see the impact of both the argumentative genre and, conjoined with that, the elementary level of the writing ability of the people producing the text.
0:12:40 In the first row you can see the distribution across the five relation types for our essay data, and below you can see the comparison with the two other corpora I've mentioned, the Wall Street Journal and the BioDRB. I've highlighted the two things I want to talk about; there are more details about other aspects in the paper. First, unlike the other two corpora, which have almost exactly the same percentage of explicitly signaled relations, our data has far fewer. We believe this probably reflects the novice nature of the people producing the texts: they're still learning how to construct a coherent discourse and haven't quite figured out the proper use of connectives. As I said, we feel this is something where discourse structure could be used in downstream applications to highlight areas that might benefit from tutoring.
0:13:29 We also see, in the last column, the no-relation type: although it's very low in all of the corpora, in ours we basically got it down to zero, and we believe that's because of the loosening of the adjacency constraint, although the BioDRB, which also loosened this constraint, still didn't really differ from the Wall Street Journal.
0:13:51 With respect to the other major component that we annotated, the sense distributions: you can see in the first column that the essays and the BioDRB have fewer Comparisons, which suggests this might be a feature that's related to the argumentative nature of a text rather than to the skill level of the writers. This is the opposite of Contingency, where we see that the Wall Street Journal and the BioDRB, regardless of whether they're argumentative or not, are much more similar to each other than to the essays, where it seems to be the skill level of the students that is notable.
0:14:31 Okay, and the final thing identified in our manual annotation was that the annotator had a lot of ambiguities that she had trouble annotating consistently, in particular between the three things I've shown here; I'll just give two examples. In the first example she had a lot of trouble deciding whether this should be an implicit Expansion or an entity relation. Some of these concerns arise from the way PDTB works: there is a predefined list of connectives that came largely out of the Wall Street Journal, and in our student data we're seeing a lot of things which probably could be considered connectives but aren't in the resources that are used to guide most manual annotation efforts.
0:15:17 Here we see another ambiguity, between explicit Expansion and Contingency. This issue of causality, which relates to Contingency, was also a problem in the BioDRB; they added some extra senses to reflect the sort of contingency that is specific to argumentation.
0:15:40 Okay, so now turning to the automatic parsing. In this study we used an off-the-shelf end-to-end discourse parser, which was the first end-to-end PDTB parser; it was produced at the National University of Singapore and was trained on the Wall Street Journal. It basically has a pipeline architecture: first, occurrences of the predefined discourse connectives I mentioned before are identified; once those are identified, the arguments of all the explicit relations are identified and assigned a sense; and then all the non-explicit relations are dealt with.
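In outline, the pipeline looks something like the following; this is our paraphrase of the architecture just described, with toy placeholder steps, not the parser's actual code.

```python
CONNECTIVE_LIST = ["because", "although", "so", "yet"]  # sample entries

def parse_discourse(sentences):
    """Skeleton of an end-to-end PDTB-style pipeline (a sketch)."""
    relations = []
    covered = set()
    # Steps 1 and 2: find explicit connectives from the predefined list,
    # take surrounding sentences as arguments, and assign a sense
    # ("<sense>" stands in for a real classifier's output).
    for i, sent in enumerate(sentences):
        for conn in CONNECTIVE_LIST:
            if f" {conn} " in f" {sent.lower()} ":
                arg1 = sentences[i - 1] if i else ""
                relations.append(("Explicit", conn, "<sense>", arg1, sent))
                covered.add(i)
    # Step 3: remaining adjacent sentence pairs become non-explicit
    # relations (Implicit / AltLex / EntRel / NoRel), to be classified.
    for i in range(1, len(sentences)):
        if i not in covered:
            relations.append(("NonExplicit", "", "<sense>",
                              sentences[i - 1], sentences[i]))
    return relations
```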
0:16:15 In our study we used two versions of the parser. We first used the one you can download directly, which is trained on Level-2 senses; since our data is only labeled in terms of Level 1, we parsed in terms of Level 2 and then rewrote the output into the more abstract Level-1 versions. But we thought it might be more productive to actually retrain the parser, not using the Level-2 senses in the Wall Street Journal but simplifying them to Level 1, and then training and testing directly on that. So for the second version we trained our own parser.
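The rewrite from Level 2 to Level 1 is straightforward when senses are encoded as dot-delimited paths into the hierarchy, as in the CoNLL shared-task data; a minimal sketch:

```python
def to_level1(sense):
    """Abstract a fine-grained sense tag to its Level-1 class,
    e.g. "Contingency.Cause" -> "Contingency"."""
    return sense.split(".")[0]

assert to_level1("Temporal.Asynchronous.Precedence") == "Temporal"
assert to_level1("Comparison") == "Comparison"  # already Level 1
```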
0:16:51 Okay, so here are our results for end-to-end performance, using F1 score, which is the standard way these parsers are currently evaluated. In the first column you can see the training configuration for each parser: the data it was trained on and the level of sense annotation used for training. Then you can see the testing situation; in our case we not only switch from training on the Wall Street Journal to evaluating on essays, but also, sometimes we trained on the same sense level that we tested on and other times they varied. There are two different ways of evaluating end-to-end performance, based on whether you require an exact match on the arguments or a partial match; the partial match is obviously the looser evaluation, so you get higher performance. And here we can see that, as we suspected, our best results are obtained by retraining the parser so that it trains and tests at the same sense level.
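For concreteness, here is a simplified sketch of the two matching criteria and the resulting F1; the partial criterion here (any token overlap) is our simplification of the looser evaluation, and real scorers also check the sense label, which is omitted for brevity.

```python
def spans_match(gold, pred, partial=False):
    """gold, pred: sets of token indices for one argument span."""
    return bool(gold & pred) if partial else gold == pred

def end_to_end_f1(gold_rels, pred_rels, partial=False):
    """A relation counts as correct when both arguments match."""
    tp = sum(
        any(spans_match(g1, p1, partial) and spans_match(g2, p2, partial)
            for p1, p2 in pred_rels)
        for g1, g2 in gold_rels)
    p = tp / len(pred_rels) if pred_rels else 0.0
    r = tp / len(gold_rels) if gold_rels else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```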
0:17:49 Although this doesn't really allow a very careful comparison, we were interested in just looking at absolute performance levels, because our real interest is using the output of parsing for downstream applications, and although these performance levels are not great, prior studies have found that it is possible to use parser output at such levels in those applications. So our goal was to make changes, such as the changes to the annotation method and the use of Level 1, to get our absolute levels up to prior work, so that we could then use them.
0:18:26 In the top rows you can see what I showed on the prior table; on the bottom you can see some benchmarks, kind of the state of the art in the literature. The first row here shows the same parser we used, but not only trained the way we used it, also tested on the same kind of data it was trained on. You can see that under both partial and exact match we're fairly comparable. The next two rows show the best-performing parser from the CoNLL competition, not this year's but the 2015 one, which was the one available at the time we did our work. And again you can see, even though that one was trained on the Wall Street Journal and tested at different levels, that if you look at the last column our performance levels are fairly comparable as well.
0:19:10 Finally, just a few more observations. As I said earlier, there are different kinds of relations that one can predict: explicit versus all the others. So we were interested in how performance varied when you took that into account. Not surprisingly, you can see that it's much easier to predict explicit relations than non-explicit relations; that's true in our corpus and in all the prior studies as well. This is largely due to the fact that the pipeline rests first on connective identification, which is fairly reliable: in our case it's ninety percent, which, although good, is still, as I said, a little lower than in prior corpora, because the list of connectives that drives this was developed for the Wall Street Journal and doesn't match student data as well as it could.
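Connective identification in this style of parser starts from simple surface matching against the predefined list before a classifier filters out non-discourse uses; a sketch with a handful of sample entries (the real WSJ-derived list has on the order of a hundred connective types):

```python
import re

# A few sample entries; the full list comes from the WSJ PDTB.
CONNECTIVES = ["because", "although", "however", "for example", "yet", "so"]

def connective_candidates(sentence):
    """Surface-match candidate connectives in one sentence. A real parser
    then classifies each candidate as discourse vs. non-discourse usage."""
    hits = []
    for conn in CONNECTIVES:
        for m in re.finditer(r"\b" + re.escape(conn) + r"\b", sentence.lower()):
            hits.append((conn, m.start()))
    return hits

print(connective_candidates("So this would be a fine level of Hell, "
                            "because let's face it, they aren't in love."))
```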
0:20:03 And finally, when we looked at the two different ways of combining the levels for training and testing, we can see that there was a clear benefit to Level-1 training and testing for the non-explicit results, while for Level 2 we had a slightly flipped picture: although the differences weren't quite as dramatic, training on the more specific level and testing on the abstracted version actually works better. This suggests that some sort of hybrid approach combining the two, using different parsers for different senses, might give us better results than any single approach.
0:20:40 In the paper there's a lot of error analysis, like detailed confusion matrices, if you're interested. Interestingly, many errors that the parser makes reflect the cases that the annotator felt to be difficult ambiguities, like those discussed earlier. And as I also mentioned, the parser would never be able to get a hundred percent in our case, because of the changes that we made to some of the conventions, which the current off-the-shelf parsers don't yet have implemented.
0:21:08 Okay, so in this paper I tried to show an analysis of a very well-developed framework that's been used in many other languages and genres, and how it gets stressed when it's applied to this new corpus, which differs in the three ways I've shown here. First, via manual relation annotation: by comparing our distributions to prior corpora, we've identified some issues, some methodological complexities in annotation, that need to be further developed to enhance the generality of the framework, and that could also be used to motivate our writing tutors.
0:21:45 With respect to automatic relation parsing, our studies compared a variety of parsers and different training and testing conditions, and suggest that the changes we made to our annotation framework give us comparable results in absolute performance levels.
0:22:02 As for our current directions: unfortunately, this data was not originally collected by me; it was collected by people who didn't know anything about releasing corpora, so the human subjects protocol was not written in a way that lets us release the data. But we're now creating a new corpus of a similar type of data where that problem has been fixed: we are going to collect and annotate the data correctly, and then we should be able to make a corpus very similar to this one publicly available.
0:22:31 We're also now doing a larger-scale study of discourse parsing, basically trying to find every parser that is publicly available and use it either off the shelf or, for those that allow retraining, retrain it on student data and test it on student data. And what we'd eventually like to do is not just use them off the shelf, but really try to modify them in ways that optimize them for our particular kind of performance.
0:22:56 And then finally, we're trying to use the output of both our automatic and manual annotation in downstream tasks in writing analysis, such as essay scoring and revision analysis in our system; we have some promising results there that are under submission.
0:23:13 Thank you.
0:23:36 Yes, that would be one place to do it, or to add some sort of confidence rating as well and try to use those in the analysis.
0:24:27 We're actually doing that in two ways. One way: in our study of discourse parsers, we would like to try some of the RST parsers as well. Even though our data isn't annotated in that framework, so we can't do an intrinsic evaluation of how well they work, since we are using the output for other tasks, such as essay scoring and revision analysis, we could see whether that more global discourse structure helps; others have done those kinds of comparative studies and found it useful.
0:24:54 And the second thing we're doing is, within the PDTB framework, trying to do some inference: still not getting at the really global structure, but trying to infer, from these very local relations, some less local ones by various inference rules. We've got some preliminary results that suggest that's also a promising approach.
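As a toy illustration only (the actual inference rules used in the preliminary work are not spelled out in the talk), one such rule might chain two local relations that share an argument into a longer-range one:

```python
def chain_contingency(relations):
    """relations: (sense, arg1_id, arg2_id) triples over sentence ids.
    Hypothetical rule: Contingency(a, b) and Contingency(b, c)
    suggest a longer-range Contingency(a, c)."""
    inferred = []
    for sense1, a, b in relations:
        for sense2, b2, c in relations:
            if sense1 == sense2 == "Contingency" and b == b2:
                inferred.append(("Contingency", a, c))
    return inferred
```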
0:25:51 I think at this point we don't necessarily have such a lofty goal; I think we're more just telling them they should have a discourse marker, as opposed to which one they should have. But that's an interesting question to think about.