0:00:15 | everyone so i will continue to talk |
---|
0:00:19 | about the topic of rst but we will focus instead on discourse units in |
---|
0:00:25 | the context of summarisation |
---|
0:00:27 | this is joint work with kapil and amanda from when i was interning |
---|
0:00:31 | at yahoo |
---|
0:00:34 | for summarisation let's first look at an example and i will read it |
---|
0:00:39 | for decades as the global warming created by human emissions caused land ice to |
---|
0:00:43 | melt and ocean water to expand |
---|
0:00:45 | scientists warned that the accelerating rise of the sea would eventually imperil the united states |
---|
0:00:50 | coastline |
---|
0:00:51 | now these warnings are no longer theoretical the inundation of the coast has begun the |
---|
0:00:55 | sea has crept up to the point that a high tide and a brisk wind are all |
---|
0:00:59 | it takes to send water pouring into streets and homes and so on |
---|
0:01:03 | so here i'm showing a real human summary that says |
---|
0:01:07 | scientists' warnings that the rise of the sea would eventually imperil the united states |
---|
0:01:11 | coastline are no longer theoretical |
---|
0:01:14 | so if we compare the two we see that humans edit the document's sentences in |
---|
0:01:17 | order to capture the document's meaning |
---|
0:01:20 | and they do so by trimming extraneous content |
---|
0:01:23 | by combining sentences |
---|
0:01:25 | by replacing phrases or clauses |
---|
0:01:27 | and so on |
---|
0:01:29 | so for machine summarisation usually there are two big classes of systems one is |
---|
0:01:35 | extractive summarization where the summarizer extracts whole sentences from the original |
---|
0:01:41 | article |
---|
0:01:42 | the second one is abstractive summarisation where the system actually generates the text |
---|
0:01:47 | for the summary |
---|
0:01:49 | and if we look at the number of results returned by a search engine we |
---|
0:01:53 | see that actually the extractive techniques are very popular |
---|
0:01:57 | and since they select sentences from the documents the summaries are |
---|
0:02:02 | always grammatical |
---|
0:02:04 | so the systems can focus on things like content selection and coherence |
---|
0:02:10 | now |
---|
0:02:11 | if we want to have an extractive summary that conveys everything that the human |
---|
0:02:16 | was trying to convey in their summary |
---|
0:02:19 | then these two sentences would be selected |
---|
0:02:21 | and we can see that the summary here is very long and it's |
---|
0:02:24 | nothing like what the human was trying to produce |
---|
0:02:28 | so in this paper we look at single document summarization |
---|
0:02:31 | we want to ask the question whether extractive summarization techniques can be used to produce |
---|
0:02:36 | more human-like summaries |
---|
0:02:38 | in particular we are interested in whether extracting sub-sentential units would help to produce |
---|
0:02:43 | a wider range of summaries |
---|
0:02:45 | by a wider range what i mean is for summaries to be near-extractive |
---|
0:02:50 | where the tokens are extracted from contiguous and non-contiguous spans |
---|
0:02:54 | from the original sentences |
---|
0:02:57 | and for sub-sentential units we are particularly interested in elementary discourse units or |
---|
0:03:03 | edus and we want to see whether they are good summarisation units |
---|
0:03:08 | so just as a quick recap what are elementary discourse units |
---|
0:03:13 | this is part of rhetorical structure theory or rst where edus are |
---|
0:03:18 | defined as the segmentation of sentences |
---|
0:03:20 | into independent clauses |
---|
0:03:23 | so for example as the floppy drive writes or reads |
---|
0:03:26 | the disk is working four ways to keep loose particles and dust |
---|
0:03:30 | from causing soft errors and dropouts |
---|
0:03:33 | so here the sentence is segmented into three edus |
---|
0:03:37 | in a full discourse tree the second and third edus have a purpose relationship |
---|
0:03:41 | and |
---|
0:03:42 | they also have a circumstance relationship with the first edu |
---|
0:03:47 | in the full discourse tree the more important part of a relation is called the |
---|
0:03:51 | nucleus |
---|
0:03:52 | and the less important part is called the satellite and this fact will be used |
---|
0:03:56 | later |
---|
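To make the recap concrete, here is a minimal sketch in Python (with invented variable names; this is not code from the paper) of how the three edus and the two relations in this example could be represented, using the nucleus/satellite convention just described.

```python
# hypothetical encoding of the example rst fragment above
edus = [
    "as the floppy drive writes or reads,",   # edu 1
    "the disk is working four ways",          # edu 2
    "to keep loose particles and dust from causing soft errors and dropouts.",  # edu 3
]

# (relation, nucleus edu, satellite edu), using 1-based edu numbers
relations = [
    ("purpose", 2, 3),       # edu 2 is the nucleus, edu 3 the satellite
    ("circumstance", 2, 1),  # the edu 2-3 span is the nucleus, edu 1 the satellite
]

for rel, nuc, sat in relations:
    print(f"{rel}: nucleus = edu {nuc}, satellite = edu {sat}")
```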
0:03:58 | here are the contributions of this paper |
---|
0:04:00 | we first of all do an analysis of |
---|
0:04:03 | automatically obtained edus against human-identified concepts |
---|
0:04:08 | we show that edus correspond with these conceptual units identified by humans |
---|
0:04:13 | and second we show that the importance of edus |
---|
0:04:17 | correlates with the importance of concepts |
---|
0:04:20 | next we look at the context of near extractive summarization where we first introduce a |
---|
0:04:25 | large dataset of extractive and near-extractive summaries |
---|
0:04:29 | and then we show that edu boundaries align with human content extraction in this dataset |
---|
0:04:34 | and furthermore we show that edus are superior to sentences in near-extractive |
---|
0:04:40 | summarisation |
---|
0:04:41 | under varying length constraints |
---|
0:04:44 | okay so i will start with the first contribution where we look at edus |
---|
0:04:48 | and their correspondence with human-identified conceptual units |
---|
0:04:53 | the idea is on the one hand we have abstract units of information on the other |
---|
0:04:58 | hand we have sentences that contain these units |
---|
0:05:01 | and what we want to see is whether elementary discourse units are a happy middle ground between |
---|
0:05:04 | the two |
---|
0:05:06 | so what we have is articles with human-identified and labeled conceptual units |
---|
0:05:13 | and we can segment them automatically into edus so we can get a correspondence between |
---|
0:05:17 | edus and concepts |
---|
0:05:20 | and then using this correspondence we can look at the lexical coverage for edus |
---|
0:05:25 | the articles with human-labeled concepts we use are the human |
---|
0:05:29 | summaries from duc two thousand five to two thousand seven and tac two thousand eight |
---|
0:05:33 | to two thousand eleven |
---|
0:05:34 | the concepts here are summary content unit contributors and here each summary content |
---|
0:05:40 | unit or scu contains at least one contributor extracted from each summary |
---|
0:05:45 | so what do i mean by contributors |
---|
0:05:47 | so say here is an original article |
---|
0:05:50 | and humans come in and write summaries for this article and |
---|
0:05:54 | at this point we will |
---|
0:05:55 | disregard the original article and consider the summaries as independent |
---|
0:06:01 | articles |
---|
0:06:02 | except that they have the same topic |
---|
0:06:05 | now other humans come in and they mark contributors |
---|
0:06:10 | from these summaries |
---|
0:06:11 | and they're aggregated into summary content units with a weight where the weight is |
---|
0:06:15 | determined by |
---|
0:06:18 | how many summaries contain |
---|
0:06:21 | a contributor with the same semantic content |
---|
0:06:24 | so here the weight of four means that it comes from four summaries |
---|
0:06:28 | and here a weight of two means that it comes from two summaries |
---|
0:06:32 | so what do they look like |
---|
0:06:35 | so for example the american booksellers association represents private bookstore owners and sponsors |
---|
0:06:40 | book expo an annual convention |
---|
0:06:43 | here the first contributor is the american booksellers association represents private bookstore owners |
---|
0:06:49 | the second one is american booksellers association sponsors book expo |
---|
0:06:53 | and the third one is book expo an annual convention |
---|
0:06:58 | so in all we have more than thirty two thousand contributors and about seventy nine |
---|
0:07:03 | percent of them are contiguous spans in the text |
---|
0:07:06 | and from now on we will refer to these contributors as concepts |
---|
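As a rough illustration of the weighting just described, here is a small Python sketch; the contributor annotations, labels, and summary ids are invented for illustration and this is not the actual pyramid annotation tooling.

```python
from collections import defaultdict

# hypothetical contributor annotations: (scu label, id of the summary it was marked in)
contributors = [
    ("aba represents private bookstore owners", "summary_1"),
    ("aba represents private bookstore owners", "summary_3"),
    ("aba sponsors book expo", "summary_1"),
    ("aba sponsors book expo", "summary_2"),
    ("aba sponsors book expo", "summary_3"),
    ("aba sponsors book expo", "summary_4"),
]

scu_summaries = defaultdict(set)
for scu_label, summary_id in contributors:
    scu_summaries[scu_label].add(summary_id)

# an scu's weight is the number of distinct summaries that contain
# a contributor with the same semantic content
scu_weight = {label: len(ids) for label, ids in scu_summaries.items()}
print(scu_weight)  # weights of 2 and 4 for the two example scus
```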
0:07:12 | so now we have human-labelled concepts from the summaries how do we get the |
---|
0:07:17 | edus to do so we do full discourse parsing automatically using feng and hirst's |
---|
0:07:22 | tool |
---|
0:07:24 | so in the previous example everything before the word and is the first edu and |
---|
0:07:29 | everything afterwards is the second |
---|
0:07:31 | so now we can look at the number of overlapping edus per concept in particular this |
---|
0:07:36 | graph shows the number of edus that overlap with at least one token |
---|
0:07:41 | with each concept |
---|
0:07:43 | and we see that it's usually one sometimes two and rarely more than |
---|
0:07:47 | three |
---|
0:07:48 | so on average |
---|
0:07:51 | concepts overlap with one point five six edus |
---|
0:07:55 | and |
---|
0:07:56 | the number of concepts per whole sentence is two point one eight |
---|
0:08:00 | so we can see that sentences are much more coarse than edus |
---|
0:08:04 | or concepts |
---|
0:08:06 | and if we want to represent concepts using edus we would not like |
---|
0:08:11 | extraneous content in the concept that's not present in the edu so |
---|
0:08:15 | here we show the number of words that need to be deleted from each concept |
---|
0:08:19 | to be covered by a single edu |
---|
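A sketch of the two coverage statistics used in this analysis, assuming concepts and edus are already available as (start, end) token offsets; the span format and function names are illustrative rather than the paper's actual code.

```python
def overlapping_edus(concept, edus):
    """indices of edus that share at least one token with the concept span."""
    c_start, c_end = concept
    return [i for i, (s, e) in enumerate(edus) if s < c_end and e > c_start]

def tokens_outside_best_edu(concept, edus):
    """tokens that must be deleted from the concept for it to be covered
    by the single edu that overlaps it the most."""
    c_start, c_end = concept
    best = max((max(0, min(c_end, e) - max(c_start, s)) for s, e in edus), default=0)
    return (c_end - c_start) - best

edu_spans = [(0, 6), (6, 14), (14, 20)]                   # three edus as token offsets
concept_span = (4, 14)                                    # a 10-token concept
print(overlapping_edus(concept_span, edu_spans))          # [0, 1]
print(tokens_outside_best_edu(concept_span, edu_spans))   # 2
```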
0:08:21 | and here |
---|
0:08:23 | we see that |
---|
0:08:26 | in most cases edus are larger than concepts |
---|
0:08:29 | and less than eight percent of the concepts are observed to have more than |
---|
0:08:32 | four words outside their corresponding edu |
---|
0:08:37 | so now we see that edus do correspond with human-identified conceptual labels |
---|
0:08:42 | so now we can look at |
---|
0:08:45 | this from another angle which is whether the importance of edus correlates with the importance |
---|
0:08:49 | of concept weights |
---|
0:08:52 | so how do we do this so remember that each concept is associated with the |
---|
0:08:56 | weight |
---|
0:09:00 | that is in how many summaries a |
---|
0:09:04 | concept with the same semantic content is present |
---|
0:09:04 | so we have the weight of concepts and we have for each concept the overlapping |
---|
0:09:08 | edus |
---|
0:09:09 | so now if we can get the weights of edus we have the full picture for |
---|
0:09:13 | comparison |
---|
0:09:14 | and indeed we can |
---|
0:09:16 | i will not elaborate on how to derive this |
---|
0:09:19 | but the idea is to use the nucleus and satellite information and in this case |
---|
0:09:26 | the second edu is the most important one |
---|
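The derivation is skipped in the talk; one common way to turn nucleus/satellite structure into an edu score (roughly in the spirit of promotion-based salience, and possibly different in detail from what the paper actually does) is sketched below: an edu is more salient the closer to the root it survives as a nucleus.

```python
def satellite_depth(node, depth=0, out=None):
    """how many times an edu is demoted as a satellite on its path from the root."""
    if out is None:
        out = {}
    if isinstance(node, int):              # a leaf is just an edu number
        out[node] = min(out.get(node, depth), depth)
        return out
    for child, nuclearity in node["children"]:
        satellite_depth(child, depth if nuclearity == "N" else depth + 1, out)
    return out

# the floppy-drive example: edu 1 is a satellite of the (edu 2, edu 3) span,
# and edu 3 is a satellite of edu 2
tree = {"children": [(1, "S"),
                     ({"children": [(2, "N"), (3, "S")]}, "N")]}
depths = satellite_depth(tree)
salience = {edu: max(depths.values()) - d for edu, d in depths.items()}
print(salience)  # {1: 0, 2: 1, 3: 0} -> edu 2 gets the highest salience
```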
0:09:30 | now in this table i show the average salience score for edus |
---|
0:09:35 | that overlap with concepts with different weights and we can see that as the weight |
---|
0:09:40 | of a concept becomes larger the weight of the edu also goes higher |
---|
0:09:45 | and |
---|
0:09:46 | i want to stress that the weight for concepts is from different documents |
---|
0:09:50 | but the weight for an edu is from a single document so that intuitively |
---|
0:09:55 | the weight of the edu can have some notion of the importance of the |
---|
0:10:01 | concept in itself |
---|
0:10:04 | okay so now we see that intra-document edu weights correlate with inter-document |
---|
0:10:08 | concept weights next we can investigate near-extractive summarization and i will first |
---|
0:10:14 | talk about the dataset |
---|
0:10:16 | the data we use is from the ldc release of the new york times annotated |
---|
0:10:22 | dataset |
---|
0:10:23 | in particular it contains about two hundred forty five thousand online lead paragraphs |
---|
0:10:29 | from two thousand one to two thousand seven so these are the paragraphs under the |
---|
0:10:34 | headlines |
---|
0:10:35 | on the new york times homepage |
---|
0:10:37 | and indeed the first example there in the beginning |
---|
0:10:42 | is one of these online lead paragraphs |
---|
0:10:46 | so in particular in this dataset we have identified three subsets of extractive and |
---|
0:10:52 | near-extractive summaries |
---|
0:10:54 | so the first one is |
---|
0:10:55 | sentence-extractive which contains more than thirty eight thousand examples |
---|
0:11:00 | where the summary sentences are extracted from the original text sentences |
---|
0:11:05 | the second one is near-extractive span |
---|
0:11:07 | it contains more than fifteen thousand examples |
---|
0:11:10 | where the summary sentences are from contiguous spans from the original text sentences |
---|
0:11:15 | and the third one is near-extractive subsequence |
---|
0:11:18 | which contains more than twenty five thousand examples where the summary sentences are from non-contiguous spans |
---|
0:11:24 | from the original text sentences |
---|
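The three subsets can be characterised by a simple matching test between a summary sentence and an article sentence; the sketch below assumes both are already tokenised, and the helper names and labels are just for illustration.

```python
def is_contiguous_span(short, long):
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def is_subsequence(short, long):
    it = iter(long)
    return all(tok in it for tok in short)   # tokens appear in order, gaps allowed

def match_type(summary_sent, article_sent):
    if summary_sent == article_sent:
        return "extractive"                   # sentence copied verbatim
    if is_contiguous_span(summary_sent, article_sent):
        return "near-extractive span"         # one contiguous span
    if is_subsequence(summary_sent, article_sent):
        return "near-extractive subsequence"  # non-contiguous spans, in order
    return "other"

print(match_type("the plan would be built".split(),
                 "the plan , which rivals battery park city , would be built".split()))
# -> near-extractive subsequence
```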
0:11:26 | and we have cleaned up the data and it's released with the code |
---|
0:11:30 | on this website |
---|
0:11:33 | okay so with this dataset now we can look at how edu boundaries |
---|
0:11:37 | align with human content extraction and we are only interested in the near-extractive |
---|
0:11:44 | datasets because there the humans actually need to delete something |
---|
0:11:48 | so we have on the one hand the article and on the other hand we |
---|
0:11:51 | have the summary what we can do is we can get the corresponding units whether |
---|
0:11:55 | sentences or edus and we can study the number of |
---|
0:11:58 | words that need to be deleted or added from each unit to recover the summary |
---|
0:12:04 | for example here i'm showing a summary sentence |
---|
0:12:08 | with three edus |
---|
0:12:09 | and below i'm showing the corresponding sentences |
---|
0:12:12 | from the document and we can see that some of the content of the edus is |
---|
0:12:16 | deleted |
---|
0:12:18 | from the original text |
---|
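Counting the deleted and added tokens amounts to a token-level alignment between the document unit and the summary text; a bare-bones sketch using Python's difflib (the paper's actual alignment procedure may be more careful):

```python
from difflib import SequenceMatcher

def deleted_and_added(unit_tokens, summary_tokens):
    """tokens to delete from the unit, and tokens to add, to recover the summary text."""
    sm = SequenceMatcher(a=unit_tokens, b=summary_tokens, autojunk=False)
    deleted = added = 0
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        if op in ("delete", "replace"):
            deleted += a2 - a1
        if op in ("insert", "replace"):
            added += b2 - b1
    return deleted, added

doc_sent = "the city plan , officials said on monday , would be built in queens".split()
summ = "the city plan would be built in queens".split()
print(deleted_and_added(doc_sent, summ))  # (6, 0): six tokens deleted, none added
```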
0:12:21 | so here we show the average number of tokens |
---|
0:12:24 | that need to be deleted or added for each type of units in order to |
---|
0:12:29 | recover the summary |
---|
0:12:30 | and we can see that on average twelve tokens need to be deleted from sentences |
---|
0:12:35 | but for edus this average number is less than two |
---|
0:12:39 | and the number of added tokens for edus is also less than one |
---|
0:12:44 | so we see that edus do involve much less token deletion and very little |
---|
0:12:47 | addition |
---|
0:12:49 | so what are the words that are deleted so here i'm showing different part-of-speech categories |
---|
0:12:54 | and the |
---|
0:12:56 | darker colours are the sentences so the takeaway here is that for sentences a |
---|
0:13:01 | lot of the content words need to be deleted and these are kind of difficult |
---|
0:13:04 | to solve |
---|
0:13:06 | okay so now we see that edu boundaries do align with human content extraction |
---|
0:13:11 | now we can look at summarisation and whether edus are superior to sentences |
---|
0:13:17 | so we do single-document summarization on the new york times dataset and we vary our |
---|
0:13:23 | length constraint from a hundred to three hundred characters so a hundred here is |
---|
0:13:27 | about one standard deviation below the |
---|
0:13:30 | average length of the shortest subset, the near-extractive spans |
---|
0:13:33 | and three hundred character is |
---|
0:13:35 | one standard deviation |
---|
0:13:37 | above the average length of the longest subset, the extractive sentence |
---|
0:13:41 | dataset |
---|
0:13:43 | the summarization framework that we use is a supervised greedy summarizer |
---|
0:13:47 | where we have n units |
---|
0:13:49 | we want to select a subset |
---|
0:13:52 | where the feature weights are maximized |
---|
0:13:55 | and the length constraint is satisfied |
---|
0:13:58 | and for inference we do greedy search |
---|
0:14:00 | for learning we do structured perceptron |
---|
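A sketch of the selection step, assuming a feature function and a learned weight vector (both hypothetical here); the structured-perceptron loop that learns the weights is a separate piece and is not shown. A sketch of what the features might look like follows the feature list below.

```python
def greedy_summarize(units, feature_fn, weights, budget):
    """repeatedly add the highest-scoring unit that still fits the character budget."""
    selected, length = [], 0
    remaining = list(units)
    while remaining:
        def score(u):
            return sum(weights.get(f, 0.0) * v for f, v in feature_fn(u, selected).items())
        for unit in sorted(remaining, key=score, reverse=True):
            if length + len(unit) <= budget:
                selected.append(unit)
                length += len(unit)
                remaining.remove(unit)
                break
        else:
            break  # nothing left fits within the budget
    return selected

# toy usage with two invented features
units = ["the inundation of the coast has begun.", "officials met on monday.", "seas are rising."]
feats = lambda u, sel: {"position": -units.index(u), "length": len(u) / 100.0}
print(greedy_summarize(units, feats, {"position": 1.0, "length": 0.1}, budget=60))
```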
0:14:04 | for the features |
---|
0:14:05 | we want to use neutral features that are not biased towards the benefits or disadvantages |
---|
0:14:12 | for each type of unit |
---|
0:14:14 | so we basically use |
---|
0:14:17 | things like position of the unit position of the paragraph containing the unit |
---|
0:14:21 | cosine weighted similarity between the document and the unit |
---|
0:14:25 | and whether the unit is adjacent to something that's previously added to the summary so far |
---|
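A sketch of what such a unit-neutral feature function could look like; the names and exact definitions are illustrative, not the paper's feature set.

```python
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def unit_features(unit_text, unit_index, paragraph_index, doc_tokens, selected_indices):
    return {
        "unit_position": 1.0 / (1 + unit_index),          # earlier units get larger values
        "paragraph_position": 1.0 / (1 + paragraph_index),
        "cosine_with_document": cosine(unit_text.lower().split(), doc_tokens),
        "adjacent_to_selected": float(any(abs(unit_index - i) == 1 for i in selected_indices)),
    }
```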
0:14:31 | so for evaluation we use rouge one and two |
---|
0:14:35 | so rouge is a recall-oriented metric that looks at the coverage of the summary |
---|
0:14:40 | content |
---|
0:14:41 | and rouge one here means unigram and rouge two means bigram |
---|
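For reference, rouge-n recall is essentially the fraction of the reference summary's n-grams that also appear in the system summary; a bare-bones sketch (the official rouge script adds stemming, stopword options, and multiple references):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

ref = "the inundation of the coast has begun".split()
cand = "the inundation has begun".split()
print(rouge_n_recall(cand, ref, 1))  # 4 of 7 reference unigrams covered, ~0.571
print(rouge_n_recall(cand, ref, 2))  # 2 of 6 reference bigrams covered, ~0.333
```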
0:14:45 | okay so before i show the varying length results |
---|
0:14:51 | if we think about single-document summarization a strong baseline is just selecting |
---|
0:14:55 | the first k units such that the length constraint is satisfied |
---|
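That lead baseline amounts to taking units in document order until the budget is exhausted; a tiny sketch:

```python
def lead_baseline(units, budget):
    """take units from the top of the article until the character budget would be exceeded."""
    selected, length = [], 0
    for unit in units:
        if length + len(unit) > budget:
            break
        selected.append(unit)
        length += len(unit)
    return selected
```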
0:15:02 | so |
---|
0:15:03 | we want to compare with that and here we show the results for each type |
---|
0:15:07 | of unit |
---|
0:15:08 | for each system and we see that the supervised summarizers outperform |
---|
0:15:13 | the baseline in all cases and then |
---|
0:15:16 | edus outperform sentences in all cases |
---|
0:15:19 | and this is under a length constraint of two hundred characters |
---|
0:15:23 | now we are ready to look at varying budget results |
---|
0:15:26 | so here i'm showing the results for extractive sentence in almost all cases edus improve |
---|
0:15:32 | on sentences |
---|
0:15:34 | for near-extractive span in all cases edus outperform sentences and for near-extractive |
---|
0:15:40 | subsequence the situation is similar to the extractive sentence situation |
---|
0:15:46 | and in particular we see that when the length constraint is tighter edus have |
---|
0:15:51 | a much bigger advantage over sentences |
---|
0:15:55 | so why are edus still good here's an example |
---|
0:15:59 | the reference summary is the plan which rivals the scope of battery park city would |
---|
0:16:03 | be spread over a one seventy five block area of greenpoint |
---|
0:16:05 | and williamsburg |
---|
0:16:07 | so here we can see that the summarizer is not selecting the right sentence at |
---|
0:16:10 | all |
---|
0:16:12 | but for edus all of the content is selected so |
---|
0:16:15 | we see that it's not the case that the summarizer cannot find the right sentence |
---|
0:16:18 | it's sometimes just that the rest of the sentence is too long |
---|
0:16:25 | and also edu boundaries really correspond well with human-identified content |
---|
0:16:32 | boundaries and finally since edus are clauses |
---|
0:16:35 | they have much better readability than things like n-grams |
---|
0:16:40 | okay so in conclusion we first conduct a corpus analysis where we show that edus |
---|
0:16:46 | correspond well with human-identified conceptual units |
---|
0:16:49 | we show that the importance of edus from intra-document |
---|
0:16:55 | weights |
---|
0:16:55 | correlates with the inter-document concept weights |
---|
0:16:58 | and we also look at near extractive summarization where first i introduce a large dataset |
---|
0:17:04 | of extractive and near-extractive summaries |
---|
0:17:07 | which are released on this website |
---|
0:17:09 | and |
---|
0:17:11 | we showed that in this dataset edu boundaries align with human content extraction and finally |
---|
0:17:16 | edus are superior to sentences in near-extractive summarization under varying length constraints |
---|
0:17:22 | and that's all thanks for your attention i welcome questions |
---|
0:18:11 | so |
---|
0:18:13 | are you referring to kind of the boundary for edus or are you referring to |
---|
0:18:17 | so there's also like the importance of the concept right |
---|
0:18:22 | i think it depends on how someone wants to express something the importance itself may be |
---|
0:18:27 | different but as we can see the summaries are produced by different people |
---|
0:18:32 | but we also |
---|
0:18:33 | observe this kind of correlation which we found really interesting but we need to look |
---|
0:18:37 | into more like why this is the case |
---|
0:18:39 | but for edus i think |
---|
0:18:41 | for |
---|
0:18:42 | like we analyzed two corpora one is like |
---|
0:18:45 | different summaries from different people and the second one is gold summaries from editors |
---|
0:18:50 | we see good correspondence in each case so i'm pretty confident that you know |
---|
0:18:54 | this is okay |
---|
0:19:15 | right we're not looking at coherence and grammaticality for this work but it's |
---|
0:19:20 | part of the future plans that we have |
---|
0:19:24 | so |
---|
0:19:26 | for some reason we still find good readable summaries |
---|
0:19:30 | so |
---|
0:19:33 | for example |
---|
0:19:38 | well if we look at this one for the edus it's |
---|
0:19:42 | built very reasonably but i wouldn't say like everything is super grammatical and |
---|
0:19:48 | like |
---|
0:19:48 | we will see different edus being attached just because the summarizer wants to fulfil the |
---|
0:19:54 | length constraint even when it doesn't make sense and |
---|
0:19:56 | things like that do happen |
---|
0:20:16 | no not at all so for all of our features for the summarizer we bypassed anything |
---|
0:20:22 | that would |
---|
0:20:24 | show the advantage or disadvantage of each type of unit |
---|
0:20:29 | so we are only using things like position and |
---|
0:20:32 | cosine similarity and things like adjacency and so on |
---|
0:20:44 | the weights |
---|
0:20:46 | yes we didn't use the parser for the summarization task we only |
---|
0:20:51 | use the edus |
---|
0:20:53 | but for the analysis part we did look at the weights for the edus and |
---|
0:20:57 | we associated them with the weights for concepts |
---|
0:21:07 | right there is existing work that's what we used for parsing |
---|
0:21:25 | right so |
---|
0:21:27 | the pdtb |
---|
0:21:29 | it doesn't have two things that i think we really need in this task |
---|
0:21:33 | the first one is a full segmentation |
---|
0:21:36 | so with the pdtb arguments |
---|
0:21:39 | there is |
---|
0:21:41 | a lot of freedom in where the arguments are positioned |
---|
0:21:46 | and it's not a segmentation and nothing is necessarily contiguous |
---|
0:21:49 | the second part is that for the pdtb there's nothing |
---|
0:21:54 | associated with salience so if we want to consider weights |
---|
0:21:57 | or salience we cannot do that with the pdtb |
---|