The structure of my talk will be as follows: first I'm going to motivate why we're looking at PDTB in the context of this corpus, then explain the corpus, and then talk about two studies, one involving manual annotation and one involving automatic discourse parsing.

So why are we looking at PDTB for student data?

Probably most people are familiar with the Penn Discourse Treebank (PDTB) framework. I'm going to use the abbreviation PDTB to refer to the framework rather than to the actual corpus built on the Wall Street Journal; when I talk about that corpus, I'll call it the Wall Street Journal.

PDTB is currently one of the dominant theories of discourse structure in the community. It's lexically grounded, and I'll give examples of what I mean by that in a moment. And unlike alternative theories such as RST, it's much more shallow: the analysis is basically at the local level, with relations that each have two arguments.

It's become increasingly studied: there are now a lot of studies in many languages and many genres, and it's been shown to be a framework that people can reliably annotate. Because of all this annotation there's now a lot of data, which has really spurred interest in automatic discourse parsing. In fact, at the last two CoNLL conferences there have been shared tasks on PDTB discourse parsing.

So although the framework has been used for a lot of languages and genres, one area where it hasn't been used is the area of interest that I work in, which is student-produced content. In particular, we've been looking at a corpus of student essays, which differs from the prior corpora that have been examined in this framework along the three dimensions shown here.

First, their structure: the essays basically have an argumentative nature. Second, in addition to the texts being somewhat different, the people writing the texts are also different from, for example, newspaper writers, in that they're students. So they're still learning how to convey discourse structure, and they also have a lot of problems with other aspects of writing, more low-level issues.

Okay, so the goals of the work I'm presenting today are twofold. Because of these differences between student data and prior data, we're interested in whether this kind of corpus pushes the annotation procedures that have been developed on other genres. And also, given these differences, how well do existing discourse parsers, which have been developed primarily for the Wall Street Journal, work on this more challenging domain?

And from my other hat, as a researcher in AI in education rather than NLP, I'm also interested in how we can use these analyses to support downstream applications that might take advantage of discourse analysis, such as writing tutors, essay analysis, and so forth.

Okay, so let me briefly describe our corpus.

Our data consist of first and second drafts of persuasive essays written by high school students in the Pittsburgh area; they were actually written in the context of two classrooms. The corpus comes from forty-seven students, who each wrote a first and a second draft, so we have twice as many papers as students.

All of the data is in response to the prompt shown in red: explain why contemporary figures should be sent to each of the first six sections of Dante's Hell. This was a class of advanced students; in the US these are Advanced Placement courses, which prepare students for exams that can give them college credit or help them place out of college-level English classes.

In this corpus, students first wrote their essays in response to this prompt. These were then given to other students in a peer review process, where they were graded according to a rubric, with a numerical grade and some feedback, and then they revised their papers to hopefully make them better.

Here's an example from a fairly well-written essay: "As Dante descends into the second circle, he sees the sinners who made their reason fall under the yoke of their lust. These were the souls of those who committed the act of love, but inappropriately, on an impulse. This would be a fine level of Hell for all those who cheat on their boyfriends or girlfriends in high school, because, let's face it, they aren't really in love."

Okay, so by the second draft the goal is to have students write a nice persuasive essay with a fairly canonical structure: there should usually be an introduction where the thesis is laid out, there should be some paragraphs developing the reasoning (which is where this example comes from), and then there should be a conclusion. So these essays, unlike for example the Wall Street Journal, where much of the PDTB work in the community has taken place, have an argumentative structure.

There has been another recent large-scale corpus, the BioDRB, in the medical community, where they looked at scientific medical argumentative papers. Those are similar in their argumentative nature to our corpus, but they're written by professional scientists rather than high school students. So even though they share the argumentative nature, our corpus differs from them in the skill level of the people producing the text.

I'm not going to read this one in detail, but here's an essay that isn't as well written; you can kind of read it in the background. It has problems at lots of levels. So even though the students get feedback, the essays are still quite noisy for many students, even in the final version. Their problems range from low-level issues such as grammatical and spelling errors to more discourse-oriented issues, such as lack of coherence in references and discourse relations.

Okay, so that's the data. I'm first going to talk about how we created our manually annotated corpus. For those unfamiliar with PDTB, I'm briefly going to review the major annotation concepts in the framework that we were interested in annotating.

As I said, PDTB is a lexically grounded discourse theory, built on the idea that discourse relations between two arguments can be signaled lexically. When there's an explicit discourse connective, this is called an explicit relation; when it's not explicit, we have these other options. If the discourse connective isn't there explicitly but the annotator could put it in, that's called an implicit relation. If a connective would be redundant because the relation has an alternative lexicalization, that's called AltLex. Sometimes the coherence comes not from a relation signaled by connectives but from entities, which is an entity relation. And in some cases where we have incoherence, those are classified as no relation. So those are the five relation types that we'll be annotating.
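As a minimal sketch of how these five types carve up the space (the decision flow and function here are my own paraphrase of the definitions above, not the official annotation manual):

```python
from enum import Enum

class RelationType(Enum):
    EXPLICIT = "Explicit"  # a connective such as "because" appears in the text
    IMPLICIT = "Implicit"  # no connective, but the annotator could insert one
    ALTLEX = "AltLex"      # a connective would be redundant: the relation is
                           # already lexicalized another way
    ENTREL = "EntRel"      # coherence comes from a shared entity, not a relation
    NOREL = "NoRel"        # no discernible coherence link

def classify_pair(has_connective, connective_insertable, connective_redundant,
                  entity_coherence):
    """Hypothetical decision flow over a pair of text spans."""
    if has_connective:
        return RelationType.EXPLICIT
    if connective_insertable:
        return RelationType.ALTLEX if connective_redundant else RelationType.IMPLICIT
    if entity_coherence:
        return RelationType.ENTREL
    return RelationType.NOREL
```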

Each of those relations can then be categorized in terms of senses. The full-blown PDTB framework has a hierarchical annotation, which you can see in this tree structure. Because this was a first study, and we weren't even sure we could do the highest level at the top of each of these four trees, we limited our current study to just that: we're labeling relations with respect to what's called Level 1, the highest level of the tree: Comparison, Contingency, Expansion, and Temporal.

And then, as you can see, in a full-blown PDTB analysis a Temporal relation can be further labeled as synchronous or asynchronous, and if you want to go all the way down to Level 3, asynchronous can also be labeled with respect to whether it denotes precedence or succession.
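A compact way to picture the part of the hierarchy just described, as a sketch in Python (only the Temporal branch is spelled out; the deeper children of the other three Level 1 classes are elided):

```python
# Top of the PDTB sense hierarchy as described in the talk. Our study labels
# only the four Level 1 roots; deeper levels are shown for Temporal only.
SENSE_HIERARCHY = {
    "Comparison": {},                    # Level 2/3 children elided
    "Contingency": {},
    "Expansion": {},
    "Temporal": {
        "Synchronous": {},               # Level 2
        "Asynchronous": {                # Level 2
            "Precedence": {},            # Level 3
            "Succession": {},            # Level 3
        },
    },
}
```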

Okay, here are a few annotated examples to make this a little clearer. The first example: "Filled with hatred for many, yet he never acts upon his thoughts." In the notation I'll be using, which is the one typically used in PDTB, the connective is shown with underlining. Here the connective is "yet"; because it's actually in the text, this is an explicit relation. A connective can be associated with several senses, and in this case it's labeled as a Comparison. The relation then has two arguments: the first argument is shown in italics and the second is shown in bold.

Next example: "The man was stuck in this layer. He devoted his entire life to other people's problems rather than his own." There's no connective actually in the text here; the inferred one is just shown by the underlining. This is an implicit relation: even though the writer didn't put the connective in, the annotator could infer that an appropriate connective, namely "because", could have been placed there. So it's implicit, and the sense that's implicitly signaled in this example is Contingency.
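To fix the notation in one place, here's a hypothetical record type for one annotated relation (the field layout is mine, not the official file format, and the example strings paraphrase the slides):

```python
from dataclasses import dataclass

@dataclass
class AnnotatedRelation:
    rel_type: str    # Explicit, Implicit, AltLex, EntRel, or NoRel
    connective: str  # the underlined word; inferred, not in the text, if Implicit
    sense: str       # here a Level 1 sense: Comparison/Contingency/Expansion/Temporal
    arg1: str        # the span shown in italics on the slides
    arg2: str        # the span shown in bold on the slides

ex1 = AnnotatedRelation("Explicit", "yet", "Comparison",
                        "Filled with hatred for many,",
                        "he never acts upon his thoughts")
ex2 = AnnotatedRelation("Implicit", "because", "Contingency",
                        "The man was stuck in this layer.",
                        "He devoted his entire life to other people's "
                        "problems rather than his own.")
```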

Okay, so that's the output of the annotation; the process is as follows. We retained the key aspects of the PDTB framework: namely, we wanted to annotate with respect to the five relation types that I just explained and the four Level 1 senses.

But following prior studies, we modified some of the conventions to fit our domain, which differs from some of the prior work, to help increase the reliability of the annotation and reduce the time it took, because it's very expensive to hire expert annotators to do this. Following prior work that applied this framework to Hindi, our annotator basically made one pass through an essay, rather than one pass per kind of relation.

Because our data has all these low-level issues that you won't see, for example, in the Wall Street Journal, we allowed the annotator to label relations with ungrammatical units if it was clear what really should have been written had the low-level problems not been there. Here we see "the first layer called the vestibule is the entrance of hell this is a large open gate symbolising that it's easy to get into". You can see that there's no capitalization before "this" and there's no period after "hell"; we mentally put those in ourselves. So we let the annotator pretend those errors were repaired and take those to be the two arguments, even though if we had enforced the constraints used for well-written text, we wouldn't have been able to label that. The relation here is an entity relation: there's no explicit or implicit connective between "hell" and "this", but we can infer coherence through the entity.

I'd like to note that because of some of the modifications we made, when we apply parsers that follow the strict PDTB conventions, they're obviously not going to be able to get these examples right, so it will currently be impossible for a parser to get a hundred percent on our corpus.

Another change we made, following the BioDRB corpus, which as I mentioned is argumentative like ours, is to permit the arguments of implicit relations to be non-adjacent within a paragraph unit. You can see in this example that we have an implicit "so": the "so" isn't actually in the text, but the annotator felt it could have been placed there, so it's an implicit relation. The first argument of "so" is the first sentence, while the second argument is the sentence beginning with "although", and as you can see, they're non-adjacent. In strict PDTB this wouldn't be allowed; we'd have either a weaker relationship or no relationship, and we'd be missing some of the coherence. As I said, this was found to be an issue in the BioDRB corpus as well.

Okay, so once we completed our annotation, our first interest was in comparing the distribution of what we annotated against these other corpora in the literature, to see the impact both of the argumentative genre and, conjoined with that, of the novice writing ability of the people producing the text.

In the first row you can see the distribution across the five relation types for our essay data, and below it you can see the comparison with the two other corpora I've mentioned, the Wall Street Journal and the BioDRB. I've highlighted the two things I just want to talk about; there are more details about other aspects in the paper.

First, unlike the other two corpora, which have exactly the same percentage of explicitly signaled relations, our data has far fewer. We believe this probably reflects the novice nature of the people producing the texts: they're still learning how to construct a coherent discourse and haven't quite figured out the proper use of connectives. As I said, we feel this is something where discourse structure could be used in downstream applications to highlight areas that might benefit from tutoring.

We also see, in the last column, the no-relation type: although it's very low in all of the corpora, in ours we basically got it down to zero, and we believe that's because of the loosening of the adjacency constraint. The BioDRB also loosened this constraint, yet it still didn't really differ from the Wall Street Journal.

With respect to the other major component we annotated, the sense distributions, you can see in the first column that both the essays and the BioDRB have fewer Comparisons; this suggests that this might be a feature relevant to the argumentative nature of a text rather than to the skill level of the writers. It's kind of the opposite for Contingency, where we see that the Wall Street Journal and the BioDRB, which differ in whether they're argumentative or not, are much more similar to each other, as opposed to the essays, where it is the skill level of the students that is what's notable.

Okay, and the final thing identified in our manual annotation was that the annotator faced a lot of ambiguities that she had trouble annotating consistently, in particular among the three things I've shown here; I've just given two examples. In the first example, she had a lot of trouble deciding whether this should be an implicit Expansion or an entity relation. Some of these difficulties arise from the way PDTB works: there is a predefined list of connectives, which came largely out of the Wall Street Journal, and in our student data we're seeing a lot of things which probably could be considered connectives but aren't in the resources that are used to guide most manual annotation efforts.

Here we see another ambiguity, between explicit Expansion and Contingency. This issue of causality, which relates to Contingency, was also a problem in the BioDRB; in fact, they added some extra senses to reflect a sort of contingency that is specific to argumentation.

Okay, so now turning to the automatic parsing. In this study we used the off-the-shelf end-to-end discourse parser that was the first end-to-end PDTB parser; it was produced at the National University of Singapore and was trained on the Wall Street Journal.

It basically has a pipeline architecture: a set of predefined discourse connectives, the list I mentioned before, is identified first; once those are identified, the arguments of all the explicit relations are identified and assigned a sense, and then all the non-explicit relations are dealt with.
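As a rough sketch of that pipeline (this is my own stand-in for illustration, not the NUS parser's actual API; the lexicon, argument choices, and sense labels are crude placeholders):

```python
# Naive stand-ins for the pipeline's learned components, for illustration only.
CONNECTIVE_LEXICON = {"because", "but", "yet", "so", "although", "however"}

def identify_connectives(sentences):
    """Stage 1: spot tokens that appear in the predefined (WSJ-derived) list."""
    found = []
    for i, sent in enumerate(sentences):
        for tok in sent.lower().split():
            if tok.strip(".,") in CONNECTIVE_LEXICON:
                found.append((i, tok.strip(".,")))
    return found

def parse_document(sentences):
    relations, explicit_sents = [], set()
    # Stage 2: each explicit connective gets its two arguments and a sense
    # (a real parser uses trained classifiers for both decisions; taking the
    # preceding sentence as Arg1 is a crude placeholder).
    for i, conn in identify_connectives(sentences):
        arg1 = sentences[i - 1] if i > 0 else ""
        relations.append(("Explicit", conn, "<sense>", arg1, sentences[i]))
        explicit_sents.add(i)
    # Stage 3: the remaining adjacent sentence pairs are classified as
    # Implicit / AltLex / EntRel / NoRel, with a sense where one applies.
    for i in range(1, len(sentences)):
        if i not in explicit_sents:
            relations.append(("<non-explicit>", None, "<sense>",
                              sentences[i - 1], sentences[i]))
    return relations
```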

In our study we used two versions of the parser. We first used the one you can download directly, which is trained on Level 2 senses; since our data is annotated only in terms of Level 1, we could parse in terms of Level 2 and then rewrite the output into the more abstract Level 1 senses. But we thought it might be more productive to actually retrain the parser: instead of using the Level 2 senses in the Wall Street Journal, we simplified them to Level 1 and then trained and tested directly at that level. So for the second version we trained our own parser.
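Since PDTB sense labels below Level 1 are conventionally written as dotted paths under a Level 1 class, the abstraction step in both configurations can be tiny; a sketch under that assumption:

```python
def abstract_to_level1(sense: str) -> str:
    """Map a finer-grained label such as "Contingency.Cause" to its
    Level 1 class ("Contingency") by taking the path's first component."""
    return sense.split(".")[0]

# Version 1: run the downloaded Level-2 parser, then abstract its output:
#   level1_preds = [abstract_to_level1(s) for s in level2_preds]
# Version 2: abstract the WSJ *training* labels the same way first, then
# retrain so the parser trains and tests directly at Level 1.
```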

Okay, so here are our results for end-to-end performance using F1 score, which is the standard way these parsers are currently evaluated. In the first column you can see the training configuration: the particular parser we used, the data it was trained on, and the level of the sense annotation used for training. Then you can see the testing situation: in our case we not only switch from training on the Wall Street Journal to evaluating on essays, but, as you can see, we sometimes trained on the same sense level that we tested on and other times they vary. Finally, there are two different ways of evaluating end-to-end performance, based on whether you require an exact match on the arguments or a partial match; obviously partial match is a looser evaluation, so you get higher performance.
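As a minimal sketch of what exact- versus partial-match F1 means here, treating arguments as sets of token indices (the matching criteria are my simplification; for instance, partial match is reduced to mere overlap):

```python
def f1(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def spans_match(gold, pred, partial: bool) -> bool:
    """gold/pred are (arg1_tokens, arg2_tokens) as sets of token indices.
    Exact match requires identical spans; partial match (simplified here)
    only requires overlap on both arguments."""
    if partial:
        return bool(gold[0] & pred[0]) and bool(gold[1] & pred[1])
    return gold[0] == pred[0] and gold[1] == pred[1]

def end_to_end_f1(gold_relations, pred_relations, partial: bool) -> float:
    """A predicted relation counts as correct if its arguments match some
    gold relation (exactly or partially) and the sense label agrees."""
    correct = sum(
        any(spans_match(g["args"], p["args"], partial) and g["sense"] == p["sense"]
            for g in gold_relations)
        for p in pred_relations)
    precision = correct / len(pred_relations) if pred_relations else 0.0
    recall = correct / len(gold_relations) if gold_relations else 0.0
    return f1(precision, recall)
```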

Here we can see that, as we suspected, our best results are obtained by retraining the parser so that it trains and tests at the same sense level.

Although this isn't really a very careful comparison, we were interested in just looking at absolute performance levels, because our real interest is using the output of parsing for downstream applications. And although these performance levels are far from great, prior studies have found that it is possible to use parser output at such levels in those applications. So our goal was to make changes, such as the changes to the annotation method and the use of Level 1, to get our absolute levels up to prior work, so that we could then use the output.

In the top rows you can see what I showed on the prior table, and on the bottom you can see some benchmarks, roughly the state of the art in the literature. The first of those rows shows the same parser we used, not only trained the way we used it but also tested on the same training data. You can see that under both partial and exact match we're fairly comparable. The next two rows show the best-performing parser from the CoNLL competition, not this year's but the 2015 one, which was the one available at the time we did our work. And again, even though that was trained on the Wall Street Journal and tested at different levels, if you look at the last column, our performance levels are fairly comparable as well.

Finally, just a few more observations. As I said earlier, there are different kinds of relations one can predict: explicit versus all the others. So we were interested in how performance varied when you take that into account. Not surprisingly, you can see that it's much easier to predict explicit relations than non-explicit relations in our corpus; that's true in all the other prior studies as well.

This is largely because explicit prediction rests first on discourse connective identification, which is fairly reliable: in our case it's ninety percent, which, although good, is still, as I said, a little lower than in prior corpora, because the list of connectives that drives this was developed for the Wall Street Journal and doesn't necessarily match student data as well as it could.
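As a toy illustration of why a WSJ-derived connective list caps performance on student data (the list is a tiny excerpt and the matching is deliberately naive):

```python
# A tiny excerpt of a WSJ-style connective list (the real lexicon has on the
# order of a hundred entries); matching here is deliberately naive.
CONNECTIVES = {"because", "but", "yet", "so", "although", "for example", "however"}

def candidate_connectives(sentence):
    lowered = sentence.lower()
    # Anything a student uses connectively but spells or phrases differently
    # (e.g. "cuz") is invisible to the downstream pipeline.
    return [c for c in sorted(CONNECTIVES) if c in lowered]

print(candidate_connectives("He was punished cuz he lied."))  # [] -> missed
```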

And finally, when we looked at the two different ways of combining the levels for training and testing, we can see that there was a clear benefit to Level 1 training and testing for the non-explicit results, while for the explicit results we had a slightly flipped pattern: although the differences weren't quite as dramatic, training on the more specific Level 2 and testing on the abstracted Level 1 version actually works better. This suggests that some sort of hybrid approach combining the two, using different parsers for different relation types, might give us better results than either approach alone.
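The hybrid those numbers hint at could be as simple as routing by relation type; this is a purely speculative sketch, not something the study implemented:

```python
def hybrid_parse(text, level2_parser, level1_parser, abstract_to_level1):
    """Speculative sketch: explicit relations come from the Level-2-trained
    parser (senses abstracted to Level 1), non-explicit relations from the
    parser retrained directly at Level 1. Both parser callables are assumed
    to return (rel_type, connective, sense, arg1, arg2) tuples."""
    explicit = [(t, c, abstract_to_level1(s), a1, a2)
                for (t, c, s, a1, a2) in level2_parser(text)
                if t == "Explicit"]
    non_explicit = [r for r in level1_parser(text) if r[0] != "Explicit"]
    return explicit + non_explicit
```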

In the paper there's a lot of error analysis, like detailed confusion matrices, if you're interested. Interestingly, many of the errors that the parser makes reflect the cases the annotator felt to be difficult ambiguities, like those discussed earlier. And as I also mentioned, the parser would never be able to actually reach a hundred percent in our case, because of the changes we made to some of the conventions, which the current off-the-shelf parsers don't yet implement.

Okay, so in this paper I tried to show an analysis of a very well-developed framework that's been used in many other languages and genres, and how it gets stressed when it's applied to this new corpus, which differs in the three ways I've shown here. First, for manual relation annotation: by comparing our distributions to prior corpora, we've identified some issues, some methodological complexities in annotation, that need to be further worked out to further enhance the generality of the PDTB framework, and that could also be used to motivate our writing tutors.

With respect to automatic relation parsing, our studies compared a variety of parsers and different training and testing conditions, and suggest that the changes we made to our annotation framework give us results comparable to prior work at an absolute performance level.

As for our current directions: unfortunately, this data was not originally collected by me; it was collected by people who didn't know anything about releasing corpora, so the human subjects protocol was not written in a way that lets us release the data. But we're now creating a new corpus of a similar type of data where that problem has been fixed; we'll be collecting and annotating the data properly, and then we should be able to make a corpus that's very similar to this one publicly available.

We're also now doing a larger-scale study of discourse parsing, basically trying to find every parser that is publicly available and to use each either off the shelf or, for those that allow retraining, to actually retrain on student data and test on student data. What we'd eventually like to do is not just use them off the shelf, but really try to modify them in ways that optimize them for particular kinds of performance.

And finally, we're trying to use the output of both our automatic and our manual annotation in downstream writing-analysis tasks, such as essay scoring and revision analysis; we have some promising results there that are under submission. Thank you.

Yes, that would be one place to do it, or to add some sort of confidence rating as well and try to use those in the analysis.

We're actually doing that in two ways. One way is that in our study of using discourse parsers we'd like to try some of the RST parsers. Even though our data isn't annotated in that framework, so we can't do an intrinsic evaluation of how well they work, since we are using the parsers for other tasks such as essay scoring and revision analysis, we could see whether that more global discourse structure helps; others have done those kinds of comparative studies and found it useful.

The second thing we're doing, within the PDTB framework, is, while still not getting at a really global structure, trying to infer from these very local relations some less local ones via various inference rules, and we've got some preliminary results that suggest that's also a promising approach.
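To give a flavor of what such a rule might look like (purely hypothetical; the talk doesn't spell out the actual rules):

```python
def infer_nonlocal(local_relations):
    """local_relations: (unit_a, sense, unit_b) triples over text units.
    Chain two local Contingency links that share a middle unit into a
    proposed, less-local Contingency link; a purely hypothetical rule."""
    proposed = []
    for a, s1, b in local_relations:
        for b2, s2, c in local_relations:
            if b == b2 and s1 == s2 == "Contingency" and a != c:
                proposed.append((a, "Contingency", c))
    return proposed
```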

I think at this point we don't have such a lofty goal; we're more just telling students that they should have a discourse marker, as opposed to which one they should have. But that's an interesting question to think about.