the structure of my talk will be: first i'm going to motivate why we're looking at pdtb in the context of this corpus, then explain the corpus, and then talk about two studies, one involving manual annotation and one involving automatic discourse parsing.
so why are we looking at pdtb for student data?
probably most people are familiar with pdtb, the penn discourse treebank framework. i'm going to use the abbreviation to refer to the framework rather than to the actual corpus annotated on the wall street journal; when i talk about that corpus i'll call it the wall street journal pdtb.
it's currently one of the dominant theories of discourse structure in the community. it's lexically grounded, and i'll give examples of what i mean by that in a moment. and unlike alternative theories such as rst, it's much shallower: basically the analysis is at the local level, with relations that have two arguments.
it's become increasingly studied: there are now a lot of studies in many languages and many genres, and it's been shown to be a framework that people can reliably annotate. and because of all this annotation there's a lot of data, which has really spurred interest in automatic discourse parsing; in fact, at the last two conll conferences there have been shared tasks on pdtb discourse parsing.
so although it has been used in a lot of languages and genres, one area where it hasn't been used is the area i work in, which is student-produced content. in particular, we've been looking at a corpus of student essays, which differs from the prior corpora that have been examined in this framework along the three dimensions shown here.
first, the essays basically have an argumentative nature. second, in addition to the texts being somewhat different, the people writing the texts are also different from, for example, newspaper writers, in that they're students, so they're still learning how to convey discourse structure. and third, they also have a lot of problems with other aspects of writing, more low-level issues.
okay, so the goals of the work i'm presenting today are twofold. because of these differences between student data and prior data, we're interested in whether this kind of corpus pushes the annotation procedures that have been developed on other genres, and also in how existing discourse parsers, which have been developed primarily for the wall street journal, work on this more challenging domain.
and from my educational nlp perspective, my other hat as a researcher in ai in education, i'm also interested in how we can use this kind of analysis to support downstream applications that might take advantage of discourse analysis, such as writing tutors, essay analysis, and so forth.
okay, so let me briefly describe our corpus. the data consist of first and second drafts of persuasive essays written by high school students in the pittsburgh area; they were actually written in the context of two classrooms. our corpus comes from forty-seven students who each wrote a first and second draft, so we have twice as many papers.
all of the data is in response to the prompt shown in red: explain who among their contemporaries should be sent to each of the first six sections of dante's hell. these were classes of advanced students; in the us, advanced placement courses prepare students for taking exams which can give them college credit or help them place out of college-level english classes.
so in this corpus students first wrote an essay in response to this prompt. the essays were then given to other students in a peer review process, where they were graded according to a rubric, with a numerical grade and an amount of feedback, and then the students revised their papers, hopefully to make them better.
here's an example of a fairly well written essay: "as dante descends into the second circle he sees the sinners who made their reason fall under the yoke of their lust. these were the souls of those who committed the act of love, but inappropriately, on an impulse. this would be a fine level of hell for all those who cheat on their boyfriends or girlfriends in high school because, let's face it, they aren't really in love."
okay, so by the second draft the goal is to have people write this nice persuasive essay with a fairly canonical structure: there should usually be an introduction where a thesis is laid out, there should be some paragraphs developing the reasoning, which is where this example comes from, and then there should be a conclusion. so the essays, unlike for example the wall street journal, where much of the pdtb work in the community has taken place, have an argumentative structure.
there has been another recent large-scale corpus, the biodrb, in the medical community, where they looked at scientific medical papers with arguments. those are similar in their argumentative nature to our corpus, but they're written by professional scientists, unlike high school students, so even though they share the argumentative nature, our corpus differs from them in the skill level of the people producing the text.
i'm not going to read this one in detail, but here's an essay which isn't as well written; you can kind of read it in the background. it has problems at lots of levels. so even though they get feedback, the essays are still quite noisy for many students, even in the final version. their problems range from low-level issues such as grammatical and spelling errors to more discourse-related issues, such as lack of coherence with references and discourse relations.
okay, so that's the data. i'm first going to talk about how we created our manually annotated corpus. for those unfamiliar with pdtb, i'm briefly going to review the major aspects of the framework that we were interested in annotating.
as i said, pdtb is a lexically grounded discourse theory, which has the idea that discourse relations between two arguments can be signaled lexically. when there is an explicit discourse connective, this is called an explicit relation; when it's not explicit, we have these other options. if the discourse connective isn't there explicitly but the annotator could put it in, that's called an implicit relation. if an inserted connective would be redundant because the relation has an alternative lexicalization, that's called altlex. sometimes the coherence is not in terms of a relation signaled by connectives but through entities, which is an entity relation. and in some cases where we have incoherent relations, these are classified as no relation. so those are the five relation types that we'll be annotating.
each of those relations can then be categorized in terms of senses. the full-blown pdtb framework has a hierarchical sense annotation, which you can see with this tree structure. because this was our first study and we weren't even sure we could do the highest level at the top of each of these four trees, we limited our current study to just that, so we're just labeling relations with respect to what's called level one, the highest level of the tree: comparison, contingency, expansion, and temporal. and then, as you can see, in a full-blown pdtb analysis a temporal relation can also be labeled as synchronous or asynchronous, and if you went all the way down to level three, asynchronous could further be labeled with respect to whether it denotes precedence or succession.
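to make that level collapsing concrete, here is a minimal sketch, not from the paper, of mapping a fine-grained pdtb sense label down to its level-one class; the dotted label strings are just illustrative pdtb 2.0 style paths.

```python
# minimal illustrative sketch: collapse a fine-grained pdtb sense path
# such as "Temporal.Asynchronous.Precedence" to its level-one class.
LEVEL_ONE = {"Temporal", "Comparison", "Contingency", "Expansion"}

def to_level_one(sense: str) -> str:
    """keep only the top of the pdtb sense hierarchy."""
    top = sense.split(".")[0]
    if top not in LEVEL_ONE:
        raise ValueError(f"unexpected sense label: {sense}")
    return top

assert to_level_one("Temporal.Asynchronous.Precedence") == "Temporal"
assert to_level_one("Comparison.Contrast") == "Comparison"
```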
okay, so here are just a few annotated examples to make this a little clearer. the first example: "filled with hatred for many, yet he never acts upon his wrong thoughts." in the notation i'll be using, which is typically used in pdtb work, the connective is shown with underlining; here the connective is "yet". because it's actually in the text, this is an explicit relation. it can be associated with several senses, and in this case it's labeled as a comparison. it has two arguments: the first argument is shown in italics and the second is shown in bold.
the next example: "the man was stuck in this layer. he never once devoted his entire life to other people, only to his own." there's no connective actually in the text here; the inferred connective is just shown by the underlining. this is an implicit relation, because even though the writer didn't put the connective in, the annotator could infer that an appropriate connective, namely "because", could have been placed there. so it's implicit, and the sense of the relation that's implicitly signaled in this example is contingency.
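purely as an illustration of what the annotated output looks like, here is a small sketch of how those two examples might be represented; the field names are my own and this is not the actual pdtb file format.

```python
# illustrative only: the field names are mine, not the pdtb file format.
relations = [
    {
        "type": "Explicit",
        "connective": "yet",        # actually present in the text
        "sense": "Comparison",      # level-one sense
        "arg1": "filled with hatred for many",            # shown in italics on the slide
        "arg2": "he never acts upon his wrong thoughts",  # shown in bold
    },
    {
        "type": "Implicit",
        "connective": "because",    # inferred by the annotator, not in the text
        "sense": "Contingency",
        "arg1": "the man was stuck in this layer",
        "arg2": "he never once devoted his entire life to other people, only to his own",
    },
]
```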
okay, so that's the output of the annotation; the process is as follows. we retained the key aspects of the pdtb framework, namely we wanted to annotate with respect to the five relation types that i just explained and the four level-one senses. but following prior studies, we modified some of the conventions to fit our domain, in ways that differ from some of the prior work, to help increase the reliability of the annotation and reduce the time it took, because it's very expensive to hire expert annotators to do this.
following prior work that applied this framework to hindi, our annotation basically made one pass through an essay, so we did all kinds of relations at one time. and because our data has all these sorts of low-level issues that you won't see, for example, in the wall street journal, we allowed the annotator to label relations between ungrammatical units if it was clear what really should have been written if the low-level problems hadn't been there. so here we see "the first layer, called the vestibule, is the entrance of hell this is a large open gate symbolising that it's easy to get into." you can see that there's no capitalization before "this" and there's no period after "hell"; we could have put those there ourselves. so we let the annotator pretend those errors weren't there, and we have those be the two arguments, even though if we enforced this constraint for well written text we wouldn't have allowed that. the relation here is an entity relation: there's no explicit or implicit connective between "hell" and "this", but we can infer coherence through the entity.
and i'd like to note that because of some of the modifications we made, when we apply the parsers, which follow the strict pdtb conventions, they're obviously not going to be able to get these examples right, so it will currently be impossible for a parser to get a hundred percent on our corpus.
another change that we made, which we followed from the biodrb corpus, which as i mentioned is argumentative like ours, is to permit the arguments of implicit relations to be non-adjacent within a paragraph unit. you can see in this example we have an implicit "so": the "so" isn't actually in the text, but the annotator felt it could have been placed there, so it's an implicit relation. the first argument of "so" is the first sentence shown, while the second argument is the sentence beginning with "although", and as you can see they're non-adjacent. in strict pdtb this wouldn't be allowed, and we'd have either a weaker relationship or no relationship, and we'd be missing some of the coherence. this was found, as i said, to be an issue in the biodrb corpus as well.
okay, so once we completed our annotation, our first interest was in comparing the distribution of what we annotated to these other corpora in the literature, to see the impact of both the argumentative genre and, conjoined with that, the elementary level of the writing ability of the people producing the text.
on the first row you can see the distribution across the five relation types for our essay data, and below you can see the comparison with the two other corpora i've mentioned, the wall street journal and the biodrb. i've highlighted the two things i just want to draw attention to; there are more details about some other things in the paper.
first, unlike the other two corpora, which have exactly the same percentage of explicitly signaled relations, our data has many fewer. we believe this probably reflects the novice nature of the people producing the texts: they're still learning how to construct a coherent discourse and haven't quite figured out the proper use of connectives. as i said, we feel this is somewhere discourse structure could be used in downstream applications, to highlight areas that might benefit from tutoring.
we also see, in the last column, the no-relation category: although it's very low in all of the corpora, in ours we basically got it down to zero, and we believe that's because of the loosening of the adjacency constraint, although the biodrb, which also loosened this constraint, still didn't really differ from the wall street journal.
with respect to the other major component that we annotated, the sense distributions, you can see in the first column that both the essays and the biodrb have fewer comparisons, which suggests that this might be a feature that's relevant to the argumentative nature of a text rather than to the skill level of the writers. and it's kind of the opposite for contingency, where we see that the wall street journal and the biodrb, which differ in whether they're argumentative or not, are much more similar to each other, as opposed to the essays, where it is the skill level of the students that is what's notable there.
okay, and the final thing that was identified in our manual annotation was that the annotator had a lot of ambiguities that she had trouble annotating consistently, in particular between the three things i've shown there; i've just given two examples. in the first example, she had a lot of trouble deciding whether this should be an implicit expansion or an entity relation, and some of these concerns arise because of the way pdtb works: there is a predefined list of connectives that came largely out of the wall street journal, and in our student data we're seeing a lot of things which could probably be considered connectives but aren't in the resources that are used to guide most manual annotation efforts.
here we see another ambiguity, between explicit expansion and contingency. this issue of causality, which relates to contingency, was also a problem in the biodrb; in fact, they added some extra senses to reflect the sort of contingency that is specific to argumentation.
okay, so now turning to the automatic parsing. in this study we used the off-the-shelf end-to-end discourse parser which was the first end-to-end pdtb parser; it was produced at the national university of singapore and was trained on the wall street journal. it basically has a pipeline architecture: the set of predefined discourse connectives that i mentioned before is identified first; once those are identified, then for all the explicit relations the arguments are identified and assigned a sense, and then all the non-explicit relations are dealt with.
in our study we used two versions of the parser. we first used the version that you can download directly, which is trained on level-two senses; since our data is only annotated in terms of level one, we could parse in terms of level two and then rewrite the output into the more abstract level-one versions. but we thought it might be more productive to actually retrain the parser, not using the level-two senses in the wall street journal but simplifying them to level one and then training and testing directly at that level, and that is basically what we did in the second version.
okay, so here are our results for end-to-end performance, using f1 score, which is the standard way these parsers are currently evaluated. in the first column you can see the configuration for training that particular parser: the data it was trained on and the level of the sense annotation that was used for training. then you can see the testing situation; in our case we not only switch from training on the wall street journal to evaluation on essays, but, as you can see, sometimes we trained on the same sense level that we tested on and other times they vary. and then there are two different ways of evaluating end-to-end performance, based on whether you need an exact match on the arguments or a partial match; obviously the partial match is an easier evaluation, so you get higher performance.
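for readers unfamiliar with how these scores are computed, here is a simplified sketch of end-to-end f1 under exact versus partial argument matching; it's my own illustration, treating arguments as sets of token indices, not the official scorer.

```python
# simplified illustration of end-to-end f1 with exact vs. partial argument
# matching (my own sketch, not the official scorer); each relation is
# (arg1_token_indices, arg2_token_indices, sense).
def args_match(pred_arg, gold_arg, partial):
    return bool(pred_arg & gold_arg) if partial else pred_arg == gold_arg

def end_to_end_f1(predicted, gold, partial=False):
    matched, used = 0, set()
    for p_arg1, p_arg2, p_sense in predicted:
        for j, (g_arg1, g_arg2, g_sense) in enumerate(gold):
            if (j not in used and p_sense == g_sense
                    and args_match(p_arg1, g_arg1, partial)
                    and args_match(p_arg2, g_arg2, partial)):
                matched += 1
                used.add(j)
                break
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

pred = [(frozenset({0, 1, 2}), frozenset({3, 4}), "Comparison")]
gold = [(frozenset({0, 1}), frozenset({3, 4, 5}), "Comparison")]
print(end_to_end_f1(pred, gold, partial=False))  # 0.0: spans differ
print(end_to_end_f1(pred, gold, partial=True))   # 1.0: spans overlap
```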
here we can see that, as we suspected, our best results are obtained by retraining the parser so that it trains and tests at the same sense level. although this isn't really a very careful controlled comparison, we were interested in just looking at absolute performance levels, because our real interest is using the output of parsing for downstream applications; although these performance levels are far from great, prior studies have found that it is possible to use parser output at these levels in such applications. so our goal was to make changes, such as the changes to the annotation method and the use of level one, to get our absolute levels up to prior work so that we could then use them.
in the top rows you can see what i showed in the prior table, and on the bottom you can see some benchmarks for what's kind of the state of the art in the literature. the first row here shows the same parser we used, but not only trained but also tested on the same wall street journal data; you can see that under both partial and exact match we're fairly comparable. the second two rows show the best-performing parser from the conll competition, not this year's but the two thousand fifteen one, which was the one available at the time we did our work, and again you can see that, even though that was trained on the wall street journal and tested at different levels, if you look at the last column our performance levels are fairly comparable as well.
finally, just a few more observations. as you saw earlier, there are different kinds of relations that one can predict, explicit versus all the others, so we were interested in how performance varied when you took that into account. not surprisingly, you can see that it's much easier to predict explicit relations than non-explicit relations; that's true in our corpus and in all the prior studies as well. this is largely due to the fact that explicit parsing is based first on connective identification, which is fairly reliable; in our case it's ninety percent, which although good is still, as i said, a little lower than in prior corpora, because the list of connectives that drives this was developed for the wall street journal and doesn't necessarily match student data as well as it could.
and finally, when we looked at the two different ways of combining the levels for training and testing, we can see that there was a clear benefit for level-one training and testing for the non-explicit results, while for the explicit relations we had a slightly flipped pattern: although the differences weren't quite as dramatic, we can see that training on the more specific level two and testing on the abstracted level-one version actually works better, which suggests that some sort of hybrid approach, combining the two and using different parsers for different senses, might give us better results than either approach alone.
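a minimal sketch of what that hybrid suggestion could look like is below; this is hypothetical, not something we implemented, and the routing rule and parser arguments are just placeholders.

```python
# hypothetical sketch of the hybrid suggestion above: take explicit relations
# from the parser trained on level-two senses (then abstract to level one),
# and take non-explicit relations from the parser trained directly on level one.
def hybrid_parse(sentences, level1_parser, level2_parser, to_level_one):
    relations = []
    for rel_type, conn, sense, arg1, arg2 in level2_parser(sentences):
        if rel_type == "Explicit":
            relations.append((rel_type, conn, to_level_one(sense), arg1, arg2))
    for rel in level1_parser(sentences):
        if rel[0] != "Explicit":
            relations.append(rel)
    return relations
```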
in the paper there's a lot of error analysis, like detailed confusion matrices, if you're interested. interestingly, many of the errors that the parser makes reflect the cases that the annotator felt to be difficult ambiguities, like those discussed earlier. and as i also mentioned, the parser would never be able to actually get a hundred percent in our case, because of the changes that we made to some of the conventions, which the current off-the-shelf parsers don't yet have implemented.
okay, so in this paper i tried to show an analysis of a very well-developed framework that's been used in many other languages and genres, and how it gets stressed when it's applied to this new corpus, which differs in the three ways i've shown here. first, for manual relation annotation, by comparing our distributions to prior corpora we've identified some issues and some methodological complexities in annotation that need to be further developed to further enhance the generality of this framework, and that also could be used to motivate our writing tutors.
with respect to automatic relation parsing, our studies compared a variety of parsers and different training and testing conditions, and suggest that the adaptations we made to our annotation framework give us comparable results at an absolute performance level.
as for our current directions: unfortunately this data was not originally collected by me; it was collected by people who didn't know anything about releasing corpora, so the human subjects protocol was not written such that we can release the data. but we're now creating a new corpus of a similar type of data where that problem has been fixed, so we're correctly collecting and annotating the data and should then be able to make a corpus that's very similar to this one publicly available.
we're also now doing a larger-scale study of discourse parsing, basically trying to find every parser that is publicly available and to use it either off the shelf or, for those that allow retraining, to actually retrain and test on student data. what we'd eventually like to do is not just use them off the shelf but really try to modify them in ways that optimize them for particular kinds of performance.
and then finally, we're trying to use the output of both our automatic and manual annotation in downstream tasks in writing analysis, such as essay scoring and revision analysis systems; we have some promising results there that are under submission.
thank you.
yes, that would be one place to do it, or to add some sort of confidence rating as well and try to use those in the analysis.
we're actually doing that in two ways. one way is that, in our study of using discourse parsers, we'd actually like to try some of the rst parsers; even though our data isn't annotated in that framework, so we can't do an intrinsic evaluation of how well they work, since we're using the output for other tasks such as essay scoring and revision analysis, we could see whether that more global discourse structure helps. others have done those kinds of comparative studies and found it useful. the second thing we're doing is, within the pdtb framework, trying to do some inference: still not getting at the really global structure, but trying to infer from these very local relations some less local ones by various inference rules, and we've got some preliminary results that suggest that's also a promising approach.
i think at this point we don't have such a lofty goal; i think we're more just telling them they should have a discourse marker, as opposed to which one they should have. but that's an interesting question to think about.