0:00:14 And the structure of my talk will be: first I'm going to motivate why we're looking at PDTB in the context of this corpus, then explain the corpus, and then talk about two studies, one involving manual annotation and one involving automatic discourse parsing.
0:00:34 So why are we looking at PDTB for student data?
0:00:38 Probably most people are familiar with PDTB, the Penn Discourse Treebank framework, and I'm going to use the abbreviation to refer to the framework rather than the actual corpus built on the Wall Street Journal; when I talk about that one, I'll call it the Wall Street Journal PDTB. It's currently one of the dominant theories of discourse structure in the community.
0:00:59 It's lexically grounded, and I'll give examples of what I mean by that in a moment. And unlike alternative theories such as RST, it's much shallower: basically the analysis is at the local level, with relations that have two arguments.
0:01:14 It's become increasingly studied: first, there have by now been a lot of studies in many languages and many genres, and it's been shown that it's a framework that people can reliably annotate. And because of all this annotation there's now a lot of data, which has really spurred interest in automatic discourse parsing; in fact, at the last two CoNLL conferences there has been a shared task on PDTB discourse parsing.
0:01:43 So although it has been used for a lot of languages and genres, one area where it hasn't been used is the area of interest that I work in, which is student-produced content. In particular, we've been looking at a corpus of student essays, which differs from the prior corpora that have been examined in this framework along the three dimensions shown here.
0:02:06 First, the essays have an argumentative structure; they are basically argumentative in nature. Second, in addition to the texts being somewhat different, the people writing the texts are also different from, for example, newspaper writers, in that they are students: they're still learning how to convey discourse structure, and they also have a lot of problems with other, more low-level aspects of writing.
0:02:30 Okay, so the goals of the work I'm presenting today are twofold. Because of these differences between student data and prior data, we were interested in whether this kind of corpus pushes the annotation procedures that have been developed on other genres. And also, given these differences, how do existing discourse parsers, which have been developed primarily for the Wall Street Journal, work on this more challenging domain?
0:02:58 And from my educational NLP perspective, from my other hat as a researcher in AI in education, I'm also interested in how we can use these analyses to support downstream applications which might take advantage of discourse analysis, such as writing tutors, essay analysis, and so forth.
0:03:22 Okay, so let me briefly describe my corpus. The data consist of first and second drafts of persuasive essays written by high school students in the Pittsburgh area; they were actually written in the context of two classrooms. Our corpus comes from forty-seven students who each wrote a first and second draft, so we have twice as many papers. All of the data is in response to the prompt shown in red: explain why a contemporary should be sent to each of the first six sections of Dante's Hell.
0:03:54 This was a class of advanced students; in the US these are Advanced Placement courses, which prepare students for taking exams that can give them college credit or help them place out of college-level English classes.
0:04:08 So in this corpus, students first wrote their essay in response to this prompt. The essays were then given to other students in a peer review process, where they were graded according to a rubric, with a numerical grade and an amount of feedback, and then the students revised their papers, hopefully making them better.
0:04:27 Here's an example of a fairly well-written essay: as Dante descends into the second circle he sees the sinners who make their reason fall under the yoke of their lust. These were the souls of those who committed the act of love, but inappropriately, on an impulse. This would be a fine level of Hell for all those who cheat on their boyfriends or girlfriends in high school, because, let's face it, they aren't really in love.
0:04:50 Okay, so by the second draft the goal is to have people write this nice persuasive essay with a fairly canonical structure: there will usually be an introduction where the thesis is laid out, there should be some paragraphs developing the reasoning, which is where this example comes from, and then there should be a conclusion. So the essays, unlike for example the Wall Street Journal, where much of the PDTB community's work has taken place, have an argumentative structure.
0:05:21 There has been another recent large-scale corpus that follows PDTB, the BioDRB from the medical community, where they looked at scientific medical argumentative papers. Those are similar in their argumentative nature to our corpus, but they are written by professional scientists rather than high school students. So even though they share the argumentative genre, our corpus differs from them in the skill level of the people producing the text.
0:05:47 I'm not going to read this one in detail, but here's an essay which isn't as well written; you can kind of read it in the background. It has problems at lots of levels. So even though the students get feedback, the essays are still quite noisy for many of them, even in the final version. Their problems range from low-level issues, such as grammatical and spelling errors, to more discourse-oriented issues: lack of coherence with references and discourse relations.
0:06:19 Okay, so that's the data. First I'm going to talk about how we created our manually annotated corpus.
0:06:29 For those unfamiliar with PDTB, I'm briefly going to review the major annotation concepts in the framework that we were interested in annotating.
0:06:40 As I said, PDTB is a lexically grounded discourse theory, built around the idea that discourse relations between two arguments can be signaled lexically. When there's an explicit discourse connective, this is called an explicit relation. When it's not explicit, we have these other options: if the discourse connective isn't there explicitly but the annotator could put one in, that's called an implicit relation; if a discourse connective would be redundant because the relation has an alternative lexicalization, that's called AltLex; sometimes the coherence is not in terms of a relation signaled by connectives but comes through entities, which is an entity relation (EntRel); and in some cases, where we have incoherent relations, they are classified as no relation (NoRel). So those are the five relation types that we will be annotating.
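To make the five types concrete, here is a minimal sketch in Python of the decision procedure as just described; the labels follow PDTB, but the function and its boolean flags are our own illustration, not the official annotation manual.

```python
from enum import Enum

class RelationType(Enum):
    EXPLICIT = "Explicit"  # a discourse connective appears in the text
    IMPLICIT = "Implicit"  # no connective, but the annotator can insert one
    ALTLEX = "AltLex"      # inserting a connective would be redundant:
                           # the relation is alternatively lexicalized
    ENTREL = "EntRel"      # coherence comes from a shared entity
    NOREL = "NoRel"        # no coherence relation can be inferred

def relation_type(has_connective, connective_redundant,
                  connective_insertable, entity_coherence):
    """Paraphrase of the annotator's decision procedure (a sketch)."""
    if has_connective:
        return RelationType.EXPLICIT
    if connective_redundant:
        return RelationType.ALTLEX
    if connective_insertable:
        return RelationType.IMPLICIT
    if entity_coherence:
        return RelationType.ENTREL
    return RelationType.NOREL
```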
0:07:33 Each of those relations can then be categorized in terms of senses. The full-blown PDTB framework has a hierarchical annotation, which you can see in this tree structure. For our work, because this was the first study and we weren't even sure we could do the highest level, the top of each of these four trees, we limited our current study to just that: we label relations with respect to what's called Level 1, the highest level of the tree: Comparison, Contingency, Expansion, and Temporal. And then, as you can see, in a full-blown PDTB analysis a Temporal relation can be further labeled as synchronous or asynchronous, and if you want to go all the way to Level 3, an asynchronous relation can also be labeled with respect to whether it's precedence or succession.
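As a reference point, here is the sense hierarchy as nested Python dicts. The Temporal branch is expanded to Level 3 exactly as described in the talk; the Level-2 entries under the other three classes are taken from the PDTB 2.0 manual and are included only for orientation.

```python
# Level 1 -> Level 2 -> Level 3; only the Temporal branch mentioned
# in the talk is expanded down to Level 3.
SENSE_TREE = {
    "Comparison": {"Contrast": {}, "Pragmatic Contrast": {},
                   "Concession": {}, "Pragmatic Concession": {}},
    "Contingency": {"Cause": {}, "Pragmatic Cause": {},
                    "Condition": {}, "Pragmatic Condition": {}},
    "Expansion": {"Conjunction": {}, "Instantiation": {}, "Restatement": {},
                  "Alternative": {}, "Exception": {}, "List": {}},
    "Temporal": {"Synchrony": {},
                 "Asynchronous": {"Precedence": {}, "Succession": {}}},
}

LEVEL1_SENSES = list(SENSE_TREE)  # the four labels used in this study
```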
0:08:27 Okay, so here are just a few annotated examples to make this a little clearer. The first example: filled with hatred for many, yet never acts upon his wrong thoughts. In the notation I'll be using, which is typical for PDTB, the connective is shown with underlining; here the connective is yet. Because it actually appears in the text, this is an explicit relation. It can then be associated with several senses, and in this case it's labeled as a Comparison. It has two arguments: the first argument is shown in italics and the second is shown in bold.
0:09:04 Next example: the man was stuck in this layer; he never once devoted his entire life to other people's problems, only to his own. There's no connective actually in the text here; the inferred connective is just shown by the underlining. So this is an implicit relation: even though the writer didn't put the connective in, the annotator could infer that an appropriate connective, namely because, could have been placed there. So it's implicit, and the sense that's implicitly signaled in this example is Contingency.
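Putting the pieces together, a relation instance can be represented roughly like this; the field names are our own illustration rather than the official PDTB file format, and the two examples are paraphrased from the slides.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    rel_type: str    # Explicit / Implicit / AltLex / EntRel / NoRel
    connective: str  # connective found in, or inserted into, the text
    sense: str       # Level-1 sense label in this study
    arg1: str        # first argument (italics on the slides)
    arg2: str        # second argument (bold on the slides)

ex1 = Relation("Explicit", "yet", "Comparison",
               "filled with hatred for many",
               "never acts upon his wrong thoughts")
ex2 = Relation("Implicit", "because", "Contingency",
               "the man was stuck in this layer",
               "he never once devoted his entire life to other people's problems")
```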
0:09:35 Okay, so that's the output of the annotation; the process was as follows. We retained the key aspects of the PDTB framework: namely, we annotated with respect to the five relation types that I just explained and the four Level-1 senses. But following prior studies, we modified some of the conventions to fit our domain, in ways that differ from some of the prior work, to help increase the reliability of the annotation and to reduce the time it took, because it's very expensive to hire expert annotators to do this.
0:10:10 Following prior work that applied this framework to Hindi, our annotator basically made one pass through each essay, handling every kind of relation at one time. And because our data has all these low-level issues that you won't see, for example, in the Wall Street Journal, we allowed the annotator to label relations between ungrammatical units if it was clear what really would have been written if the low-level problems hadn't been there. Here we see: the first layer, called the Vestibule, is the entrance of Hell. This is a large open gate, symbolising that it's easy to get into. You can see that there's no capitalisation before this and no period after Hell; we mentally put those in ourselves. So we let the annotator pretend those errors weren't there and take those two units as the two arguments, even though if we enforced the constraints for well-written text we wouldn't have been able to label this. The relation here is an entity relation: there's no explicit or implicit connective between the Hell sentence and the this sentence, but we can infer coherence through the entity.
0:11:13 And I'd like to note that, because of some of the modifications we made, when we apply parsers which follow the strict PDTB, they're obviously not going to be able to get these examples right. So it is currently impossible for a parser to get a hundred percent on our corpus.
0:11:32 Another change that we made, following the BioDRB corpus, which as I mentioned is argumentative like ours, is to permit the arguments of implicit relations to be non-adjacent within-paragraph units. You can see in this example we have the implicit relation so: so isn't actually in the text, but the annotator felt it could have been placed there, so it's implicit. The first argument of so is the first sentence of the paragraph, while the second argument is the although sentence, and as you can see they're non-adjacent. In strict PDTB this wouldn't be allowed: we'd have either a weaker relationship or no relationship, and we'd be missing some of the structure. As I said, this was found to be an issue in the BioDRB corpus as well.
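A sketch of how the relaxed constraint can be checked, assuming sentences are indexed by (paragraph, sentence) position; this is our paraphrase of the convention, not code from an annotation tool.

```python
def implicit_args_allowed(arg1_pos, arg2_pos, strict_pdtb=False):
    """Can two sentences serve as Arg1/Arg2 of an implicit relation?

    arg1_pos, arg2_pos: (paragraph_index, sentence_index) pairs,
    with arg1 preceding arg2.
    """
    (p1, s1), (p2, s2) = arg1_pos, arg2_pos
    if strict_pdtb:
        # Strict PDTB: Arg1 must be the immediately preceding sentence.
        return p1 == p2 and s2 - s1 == 1
    # Our convention (following BioDRB): non-adjacent is fine,
    # as long as both arguments are in the same paragraph.
    return p1 == p2
```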
0:12:18 Okay, so once we completed our annotation, our first interest was in comparing the distribution of what we annotated against these other corpora in the literature, to see the impact of both the argumentative genre and, conjoined with that, the elementary level of the writing ability of the people producing the text.
0:12:40 In the first row you can see the distribution across the five relation types for our essay data, and below you can see the comparison with the two other corpora I've mentioned, the Wall Street Journal and the BioDRB. I've highlighted the two things I want to talk about; there are more details about other aspects in the paper. First, unlike the other two corpora, which have almost exactly the same percentage of explicitly signaled relations, our data has far fewer. We believe this probably reflects the novice nature of the people producing the texts: they're still learning how to construct a coherent discourse and haven't quite figured out the proper use of connectives. As I said, we feel this is something where discourse structure could be used in downstream applications to highlight areas that might benefit from tutoring.
0:13:29 We also see, in the last column, the no-relation type: although it's very low in all of the corpora, in ours we basically got it down to zero, and we believe that's because of the loosening of the adjacency constraint, although the BioDRB, which also loosened this constraint, still didn't really differ from the Wall Street Journal.
0:13:51 With respect to the other major component that we annotated, the sense distributions: you can see in the first column that the essays and the BioDRB have fewer Comparisons, which suggests this might be a feature that's related to the argumentative nature of a text rather than to the skill level of the writers. This is the opposite of Contingency, where we see that the Wall Street Journal and the BioDRB, regardless of whether they're argumentative or not, are much more similar to each other than to the essays, where it seems to be the skill level of the students that is notable.
0:14:31 Okay, and the final thing identified in our manual annotation was that the annotator had a lot of ambiguities that she had trouble annotating consistently, in particular between the three things I've shown here; I'll just give two examples. In the first example she had a lot of trouble deciding whether this should be an implicit Expansion or an entity relation. Some of these concerns arise from the way PDTB works: there is a predefined list of connectives that came largely out of the Wall Street Journal, and in our student data we're seeing a lot of things which probably could be considered connectives but aren't in the resources that are used to guide most manual annotation efforts.
0:15:17 Here we see another ambiguity, between explicit Expansion and Contingency. This issue of causality, which relates to Contingency, was also a problem in the BioDRB; they added some extra senses to reflect the sort of contingency that is specific to argumentation.
0:15:40 Okay, so now turning to the automatic parsing. In this study we used an off-the-shelf end-to-end discourse parser, which was the first end-to-end PDTB parser; it was produced at the National University of Singapore and was trained on the Wall Street Journal. It basically has a pipeline architecture: first, occurrences of the predefined discourse connectives I mentioned before are identified; once those are identified, the arguments of all the explicit relations are identified and assigned a sense; and then all the non-explicit relations are dealt with.
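In outline, the pipeline looks something like the following; this is our paraphrase of the architecture just described, with toy placeholder steps, not the parser's actual code.

```python
CONNECTIVE_LIST = ["because", "although", "so", "yet"]  # sample entries

def parse_discourse(sentences):
    """Skeleton of an end-to-end PDTB-style pipeline (a sketch)."""
    relations = []
    covered = set()
    # Steps 1 and 2: find explicit connectives from the predefined list,
    # take surrounding sentences as arguments, and assign a sense
    # ("<sense>" stands in for a real classifier's output).
    for i, sent in enumerate(sentences):
        for conn in CONNECTIVE_LIST:
            if f" {conn} " in f" {sent.lower()} ":
                arg1 = sentences[i - 1] if i else ""
                relations.append(("Explicit", conn, "<sense>", arg1, sent))
                covered.add(i)
    # Step 3: remaining adjacent sentence pairs become non-explicit
    # relations (Implicit / AltLex / EntRel / NoRel), to be classified.
    for i in range(1, len(sentences)):
        if i not in covered:
            relations.append(("NonExplicit", "", "<sense>",
                              sentences[i - 1], sentences[i]))
    return relations
```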
0:16:15 In our study we used two versions of the parser. We first used the one you can download directly, which is trained on Level-2 senses; since our data is only labeled in terms of Level 1, we parsed in terms of Level 2 and then rewrote the output into the more abstract Level-1 versions. But we thought it might be more productive to actually retrain the parser, not using the Level-2 senses in the Wall Street Journal but simplifying them to Level 1, and then training and testing directly on that. So for the second version we trained our own parser.
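The rewrite from Level 2 to Level 1 is straightforward when senses are encoded as dot-delimited paths into the hierarchy, as in the CoNLL shared-task data; a minimal sketch:

```python
def to_level1(sense):
    """Abstract a fine-grained sense tag to its Level-1 class,
    e.g. "Contingency.Cause" -> "Contingency"."""
    return sense.split(".")[0]

assert to_level1("Temporal.Asynchronous.Precedence") == "Temporal"
assert to_level1("Comparison") == "Comparison"  # already Level 1
```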
0:16:51 Okay, so here are our results for end-to-end performance, using F1 score, which is the standard way these parsers are currently evaluated. In the first column you can see the training configuration for each parser: the data it was trained on and the level of sense annotation used for training. Then you can see the testing situation; in our case we not only switch from training on the Wall Street Journal to evaluating on essays, but also, sometimes we trained on the same sense level that we tested on and other times they varied. There are two different ways of evaluating end-to-end performance, based on whether you require an exact match on the arguments or a partial match; the partial match is obviously the looser evaluation, so you get higher performance. And here we can see that, as we suspected, our best results are obtained by retraining the parser so that it trains and tests at the same sense level.
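For concreteness, here is a simplified sketch of the two matching criteria and the resulting F1; the partial criterion here (any token overlap) is our simplification of the looser evaluation, and real scorers also check the sense label, which is omitted for brevity.

```python
def spans_match(gold, pred, partial=False):
    """gold, pred: sets of token indices for one argument span."""
    return bool(gold & pred) if partial else gold == pred

def end_to_end_f1(gold_rels, pred_rels, partial=False):
    """A relation counts as correct when both arguments match."""
    tp = sum(
        any(spans_match(g1, p1, partial) and spans_match(g2, p2, partial)
            for p1, p2 in pred_rels)
        for g1, g2 in gold_rels)
    p = tp / len(pred_rels) if pred_rels else 0.0
    r = tp / len(gold_rels) if gold_rels else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```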
0:17:49 Although this doesn't really allow a very careful comparison, we were interested in just looking at absolute performance levels, because our real interest is using the output of parsing for downstream applications, and although these performance levels are not great, prior studies have found that it is possible to use parser output at such levels in those applications. So our goal was to make changes, such as the changes to the annotation method and the use of Level 1, to get our absolute levels up to prior work, so that we could then use them.
0:18:26 In the top rows you can see what I showed on the prior table; on the bottom you can see some benchmarks, kind of the state of the art in the literature. The first row here shows the same parser we used, but not only trained the way we used it, also tested on the same kind of data it was trained on. You can see that under both partial and exact match we're fairly comparable. The next two rows show the best-performing parser from the CoNLL competition, not this year's but the 2015 one, which was the one available at the time we did our work. And again you can see, even though that one was trained on the Wall Street Journal and tested at different levels, that if you look at the last column our performance levels are fairly comparable as well.
0:19:10 Finally, just a few more observations. As I said earlier, there are different kinds of relations that one can predict: explicit versus all the others. So we were interested in how performance varied when you took that into account. Not surprisingly, you can see that it's much easier to predict explicit relations than non-explicit relations; that's true in our corpus and in all the prior studies as well. This is largely due to the fact that the pipeline rests first on connective identification, which is fairly reliable: in our case it's ninety percent, which, although good, is still, as I said, a little lower than in prior corpora, because the list of connectives that drives this was developed for the Wall Street Journal and doesn't match student data as well as it could.
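Connective identification in this style of parser starts from simple surface matching against the predefined list before a classifier filters out non-discourse uses; a sketch with a handful of sample entries (the real WSJ-derived list has on the order of a hundred connective types):

```python
import re

# A few sample entries; the full list comes from the WSJ PDTB.
CONNECTIVES = ["because", "although", "however", "for example", "yet", "so"]

def connective_candidates(sentence):
    """Surface-match candidate connectives in one sentence. A real parser
    then classifies each candidate as discourse vs. non-discourse usage."""
    hits = []
    for conn in CONNECTIVES:
        for m in re.finditer(r"\b" + re.escape(conn) + r"\b", sentence.lower()):
            hits.append((conn, m.start()))
    return hits

print(connective_candidates("So this would be a fine level of Hell, "
                            "because let's face it, they aren't in love."))
```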
0:20:03 And finally, when we looked at the two different ways of combining the levels for training and testing, we can see that there was a clear benefit to Level-1 training and testing for the non-explicit results, while for Level 2 we had a slightly flipped picture: although the differences weren't quite as dramatic, training on the more specific level and testing on the abstracted version actually works better. This suggests that some sort of hybrid approach combining the two, using different parsers for different senses, might give us better results than any single approach.
0:20:40 In the paper there's a lot of error analysis, like detailed confusion matrices, if you're interested. Interestingly, many errors that the parser makes reflect the cases that the annotator felt to be difficult ambiguities, like those discussed earlier. And as I also mentioned, the parser would never be able to get a hundred percent in our case, because of the changes that we made to some of the conventions, which the current off-the-shelf parsers don't yet have implemented.
0:21:08 Okay, so in this paper I tried to show an analysis of a very well-developed framework that's been used in many other languages and genres, and how it gets stressed when it's applied to this new corpus, which differs in the three ways I've shown here. First, via manual relation annotation: by comparing our distributions to prior corpora, we've identified some issues, some methodological complexities in annotation, that need to be further developed to enhance the generality of the framework, and that could also be used to motivate our writing tutors.
0:21:45 With respect to automatic relation parsing, our studies compared a variety of parsers and different training and testing conditions, and suggest that the changes we made to our annotation framework give us comparable results in absolute performance levels.
0:22:02 As for our current directions: unfortunately, this data was not originally collected by me; it was collected by people who didn't know anything about releasing corpora, so the human subjects protocol was not written in a way that lets us release the data. But we're now creating a new corpus of a similar type of data where that problem has been fixed: we are going to collect and annotate the data correctly, and then we should be able to make a corpus very similar to this one publicly available.
0:22:31 We're also now doing a larger-scale study of discourse parsing, basically trying to find every parser that is publicly available and use it either off the shelf or, for those that allow retraining, retrain it on student data and test it on student data. And what we'd eventually like to do is not just use them off the shelf, but really try to modify them in ways that optimize them for our particular kind of performance.
0:22:56 And then finally, we're trying to use the output of both our automatic and manual annotation in downstream tasks in writing analysis, such as essay scoring and revision analysis in our system; we have some promising results there that are under submission.
0:23:13 Thank you.
0:23:36 Yes, that would be one place to do it, or to add some sort of confidence rating as well and try to use those in the analysis.
0:24:27 We're actually doing that in two ways. One way: in our study of discourse parsers, we would like to try some of the RST parsers as well. Even though our data isn't annotated in that framework, so we can't do an intrinsic evaluation of how well they work, since we are using the output for other tasks, such as essay scoring and revision analysis, we could see whether that more global discourse structure helps; others have done those kinds of comparative studies and found it useful.
0:24:54 And the second thing we're doing is, within the PDTB framework, trying to do some inference: still not getting at the really global structure, but trying to infer, from these very local relations, some less local ones by various inference rules. We've got some preliminary results that suggest that's also a promising approach.
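As a toy illustration only (the actual inference rules used in the preliminary work are not spelled out in the talk), one such rule might chain two local relations that share an argument into a longer-range one:

```python
def chain_contingency(relations):
    """relations: (sense, arg1_id, arg2_id) triples over sentence ids.
    Hypothetical rule: Contingency(a, b) and Contingency(b, c)
    suggest a longer-range Contingency(a, c)."""
    inferred = []
    for sense1, a, b in relations:
        for sense2, b2, c in relations:
            if sense1 == sense2 == "Contingency" and b == b2:
                inferred.append(("Contingency", a, c))
    return inferred
```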
0:25:51 I think at this point we don't necessarily have such a lofty goal; I think we're more just telling them they should have a discourse marker, as opposed to which one they should have. But that's an interesting question to think about.