Everyone, so I will continue on the topic of RST, but we will focus instead on discourse units in the context of summarization. This is joint work with Kapil and Amanda from when I was interning at Yahoo.
For summarization, let's first look at an example, and I will read it. In this case: "As the global warming created by human emissions caused land ice to melt and ocean water to expand, scientists warned that the accelerating rise of the sea would eventually imperil the United States' coastline. Now these warnings are no longer theoretical: the inundation of the coast has begun. The sea has crept up to the point that a high tide and a brisk wind are all it takes to send water pouring into streets and homes," and so on.
So here I'm showing a real human summary, which says: "Scientists' warnings that the rise of the sea would eventually imperil the United States' coast are no longer theoretical."
If we compare the two, we see that humans compress the document's sentences in order to capture the document's meaning, and they do so by trimming extraneous content, by combining sentences, by replacing phrases or clauses, and so on.
Now, for machine summarization there are usually two big schools of systems. One is extractive summarization, where the summarizer extracts full sentences from the original article. The second one is abstractive summarization, where the system actually generates the text of the summary. And if we look at the number of results returned by a search engine, we see that the extractive techniques are very popular. Since they select full sentences from the documents, the summaries are always grammatical, so the systems can focus on things like content selection and coherence.
Now, if we want an extractive summary that conveys everything the human was trying to convey in their summary, these two sentences would have to be selected. And we can see that the resulting summary is very long; it's nothing like what the human was trying to produce.
So in this paper we look at single-document summarization. We want to ask whether extractive summarization techniques can be used to produce more human-like summaries. In particular, we are interested in whether extracting sub-sentential units would help produce a wider range of summaries. By a wider range, I mean for summaries to be near-extractive, where the tokens are extracted from contiguous and non-contiguous spans of the original sentences. And as sub-sentential units we are particularly interested in elementary discourse units, or EDUs, and we want to see whether they are good summarization units.
So, just as a quick recap: what are elementary discourse units? They are part of Rhetorical Structure Theory, or RST, where they are defined as a segmentation of sentences into independent clauses. For example: "As the floppy drive writes or reads, it cleans the disk it is working on, to keep loose particles and dust from causing soft errors and dropouts."
Here the sentence is segmented into three EDUs. In the full discourse tree, the second and third EDUs have a Purpose relationship, and together they have a Circumstance relationship with the first EDU. In the full discourse tree, the more important part of a relation is called the nucleus, and the less important part is called the satellite; this fact will be used later.
Here are the contributions of this paper. First, we do an analysis of automatically obtained EDUs against human-identified concepts. We show that EDUs correspond well with the conceptual units identified by humans, and second, we show that the importance of EDUs correlates with the importance of concepts. Next, we look at near-extractive summarization, where we first introduce a large dataset of extractive and near-extractive summaries, and then we show that EDU boundaries align with human content extraction in this dataset. Furthermore, we show that EDUs are superior to sentences in near-extractive summarization under varying length constraints.
Okay, so I will start with the first contribution: how we look at EDUs and their correspondence with human-identified conceptual units. The idea is that on the one hand we have abstract units of information, and on the other hand we have sentences that contain these units; what we want to see is whether elementary discourse units are a happy middle ground between the two. So what we have is articles with human-identified and labeled conceptual units, and we can segment them automatically into EDUs, so we get a correspondence between EDUs and concepts. Then, using this correspondence, we can look at the lexical coverage of EDUs.
The articles with human-labeled concepts that we use are the human summaries from DUC 2005 to 2007 and TAC 2008 to 2011. The concepts here are summary content unit contributors, where each summary content unit, or SCU, contains at least one contributor extracted from each summary that expresses it.
So what do I mean by contributors? Say here is an original article, and humans come in and write summaries for this article. At this point we disregard the original article and consider the summaries as independent articles, except that they share the same topic. Now other humans come in and mark contributors in these summaries, and these are aggregated into summary content units with a weight, where the weight is determined by how many summaries contain a contributor with the same semantic content. So here a weight of four means that it comes from four summaries, and here a weight of two means that it comes from two summaries.
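As a minimal sketch (hypothetical data of mine, not the actual annotation tooling), the weighting works like this:

```python
from collections import defaultdict

# Hypothetical contributors: (summary_id, scu_label) pairs, where the SCU
# label identifies the shared semantic content a contributor expresses.
contributors = [
    (0, "ABA represents private bookstore owners"),
    (1, "ABA represents private bookstore owners"),
    (2, "ABA represents private bookstore owners"),
    (3, "ABA represents private bookstore owners"),
    (0, "ABA sponsors Book Expo"),
    (2, "ABA sponsors Book Expo"),
]

# The weight of an SCU is the number of distinct summaries contributing to it.
scu_summaries = defaultdict(set)
for summary_id, scu in contributors:
    scu_summaries[scu].add(summary_id)

weights = {scu: len(ids) for scu, ids in scu_summaries.items()}
print(weights)
# {'ABA represents private bookstore owners': 4, 'ABA sponsors Book Expo': 2}
```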
So what do they look like? For example: "The American Booksellers Association represents private bookstore owners, and sponsors Book Expo, an annual convention." Here the first contributor is "the American Booksellers Association represents private bookstore owners"; the second one is "the American Booksellers Association sponsors Book Expo"; and the third one is "Book Expo is an annual convention." In all, we have more than 32,000 contributors, and about 79% of them are contiguous spans in the text. From now on, we will refer to these contributors as concepts.
So now we have human-labeled concepts from the summaries; how do we get the EDUs? We do full discourse parsing automatically using Feng and Hirst's tool. In the previous example, everything before the word "and" is the first EDU, and everything afterwards is the second.
Now we can look at the number of overlapping EDUs per concept. In particular, this graph shows the number of EDUs that overlap with at least one token of each concept, and we see that it's usually one, sometimes two, and rarely more than three. On average, each concept overlaps with 1.56 EDUs, while the corresponding number for whole sentences is 2.18. So we can see that sentences are much coarser than EDUs. And if we want to represent concepts using EDUs, we would not like extraneous content in the concept that is not present in the EDU.
So here we show the number of words that would need to be deleted from each concept for it to be covered by a single EDU. We see that in most cases EDUs are larger than the concepts, and less than 8% of the concepts have more than four words outside their corresponding EDU. (A sketch of how these overlap and coverage counts can be computed follows below.)
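Here is a minimal sketch, assuming character-offset spans for concepts and EDUs (hypothetical inputs of mine; the actual alignment details may differ), of the two statistics above: EDUs overlapping each concept, and concept tokens falling outside the best single EDU:

```python
def overlapping_edus(concept_span, edu_spans):
    """EDUs that share at least one character with the concept span."""
    c_start, c_end = concept_span
    return [(s, e) for (s, e) in edu_spans if s < c_end and e > c_start]

def tokens_outside_best_edu(concept_tokens, concept_span, edu_spans):
    """Fewest concept tokens to delete so the rest fits in a single EDU.

    concept_tokens: list of (start, end) character spans, one per token.
    """
    best = None
    for s, e in overlapping_edus(concept_span, edu_spans):
        inside = sum(1 for ts, te in concept_tokens if ts >= s and te <= e)
        outside = len(concept_tokens) - inside
        best = outside if best is None else min(best, outside)
    return best

# Hypothetical spans: one concept mostly covered by the first of two EDUs.
edus = [(0, 40), (41, 90)]
concept = (10, 50)
tokens = [(10, 15), (16, 24), (25, 39), (42, 50)]
print(len(overlapping_edus(concept, edus)))            # 2
print(tokens_outside_best_edu(tokens, concept, edus))  # 1
```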
So now we have seen that EDUs do correspond with human-identified conceptual units. Next we can look at another angle: comparing the importance of EDUs with the importance of concepts, i.e., the concept weights. How do we do this? Remember that each concept is associated with a weight, namely the number of summaries in which the same semantic content is present. So we have the weight of each concept, and for each concept we have its overlapping EDUs. Now, if we can also get weights for the EDUs, we have the full picture for comparison. And indeed we can. I will not elaborate on how to derive them, but the idea is to use the nucleus and satellite information; in our earlier example, the second EDU is the most important one. A rough sketch of this idea follows below.
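Reusing the tree representation from the earlier sketch, here is one way to realize that idea (a Marcu-style nucleus promotion; a simplification of mine, not necessarily the paper's exact scoring):

```python
def salience(node, depth=0, scores=None):
    """Score each EDU by how high in the tree it survives as a nucleus:
    depth 0 is the most salient (promoted all the way to the root)."""
    if scores is None:
        scores = {}
    if isinstance(node, EDU):
        scores[node.text] = depth
        return scores
    salience(node.nucleus, depth, scores)        # the nucleus keeps its depth
    salience(node.satellite, depth + 1, scores)  # the satellite is demoted
    return scores

print(salience(tree))
# EDU 2 gets depth 0 (most salient); EDUs 1 and 3 get depth 1.
```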
Now, in this table I show the average salience score for EDUs that overlap with concepts of different weights, and we can see that as the weight of a concept becomes larger, the weight of the EDU also goes higher. I want to stress that the weight of a concept comes from multiple documents, while the weight of an EDU comes from a single document; so, intuitively, the weight of an EDU in itself carries some notion of the importance of the concept.
Okay, so now we have seen that intra-document EDU weights correlate with inter-document concept weights. Next we can investigate near-extractive summarization, and I will first talk about the dataset. The data we use comes from the LDC release of the New York Times Annotated Corpus. In particular, it contains about 245,000 online lead paragraphs from 2001 to 2007; these are the paragraphs under the headlines on the New York Times homepage. And indeed, the very first example at the beginning of the talk is one of these lead paragraphs.
In particular, in this dataset we have identified three subsets of extractive and near-extractive summaries. The first one is extractive sentence; it contains more than 38,000 examples, where the summary sentences are extracted verbatim from the original text sentences. The second one is near-extractive span; it contains more than 15,000 examples, where the summary sentences are contiguous spans from the original text sentences. The third one is near-extractive subsequence, which contains more than 25,000 examples; here the summary sentences are non-contiguous subsequences of the original text sentences. We have cleaned up the data, and it is released, along with the code, on this website. (A sketch of these three conditions follows below.)
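As a minimal sketch (a hypothetical check of mine, assuming both sides are tokenized; the released code may differ), the three conditions can be tested like this:

```python
def is_extractive(summary_toks, source_toks):
    """Summary sentence equals an original sentence."""
    return summary_toks == source_toks

def is_contiguous_span(summary_toks, source_toks):
    """Summary tokens form one contiguous span of the source sentence."""
    n, m = len(summary_toks), len(source_toks)
    return any(source_toks[i:i + n] == summary_toks for i in range(m - n + 1))

def is_subsequence(summary_toks, source_toks):
    """Summary tokens appear in order in the source, possibly with gaps."""
    it = iter(source_toks)
    return all(tok in it for tok in summary_toks)

src = "the plan would be built on a 175-block area".split()
print(is_contiguous_span("built on a 175-block area".split(), src))  # True
print(is_subsequence("the plan would be on a area".split(), src))    # True
```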
Okay, so with this dataset we can now look at how EDU boundaries align with human content extraction. We are only interested in the near-extractive subsets, because there the humans actually needed to delete something. So on the one hand we have the article, and on the other hand we have the summary; what we can do is get the corresponding units, whether sentences or EDUs, and study the number of words that need to be deleted from or added to each unit to recover the summary. For example, here I'm showing a summary sentence with three EDUs, and below I'm showing the corresponding sentences from the document; we can see that some of the EDUs' content is deleted from the original text.
Here we show the average number of tokens that need to be deleted or added for each type of unit in order to recover the summary. We can see that on average twelve tokens need to be deleted from sentences, but for EDUs this average is less than two, and the number of added tokens for EDUs is also less than one. So EDUs involve much less token deletion and very little addition. (A sketch of how these counts can be computed is below.)
So what are the words that are deleted? Here I'm showing different part-of-speech categories, where the darker colors are the sentences. The takeaway is that for sentences, a lot of content words need to be deleted, and those deletions are difficult to get right.
Okay, so now we have seen that EDU boundaries do align with human content extraction. Now we can look at summarization itself: whether EDUs are superior to sentences. We do single-document summarization on the New York Times dataset, and we vary the length constraint from 100 to 300 characters. One hundred is about one standard deviation below the mean length of the shortest subset, near-extractive span, and 300 characters is one standard deviation above the mean length of the longest subset, extractive sentence.
The summarization framework we use is a supervised greedy summarizer: given n units, we want to select a subset such that the learned feature weights are maximized and the length constraint is satisfied. For inference we use a greedy procedure; for learning we use the structured perceptron. For the features, we want neutral features that are not biased towards the benefits or disadvantages of either type of unit, so we basically use things like the position of the unit, the position of the paragraph containing the unit, the cosine similarity between the unit and the document, and whether the unit is adjacent to something previously added to the summary. (A sketch of this setup follows below.)
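Here is a minimal sketch of that setup (hypothetical data structures of mine, illustrating the idea rather than the paper's exact system): greedy inference under a character budget, with structured-perceptron weight updates:

```python
import numpy as np

def greedy_select(units, feats, w, budget):
    """Greedy inference: add units in order of model score while the
    character budget allows.  units: strings; feats: one vector per unit."""
    chosen, length = [], 0
    for i in sorted(range(len(units)), key=lambda i: float(feats[i] @ w),
                    reverse=True):
        if feats[i] @ w > 0 and length + len(units[i]) <= budget:
            chosen.append(i)
            length += len(units[i])
    return sorted(chosen)

def perceptron_update(feats, w, predicted, gold, lr=1.0):
    """Structured perceptron: move the weights toward the gold extraction
    and away from the current prediction."""
    gold_vec = sum(feats[i] for i in gold)
    pred_vec = sum(feats[i] for i in predicted)
    return w + lr * (gold_vec - pred_vec)

# Training loop sketch: for each article, predict, then update on mistakes.
# w = np.zeros(num_features)
# for units, feats, gold in data:
#     pred = greedy_select(units, feats, w, budget=200)
#     if set(pred) != set(gold):
#         w = perceptron_update(feats, w, pred, gold)
```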
For evaluation we use ROUGE-1 and ROUGE-2. ROUGE is a recall-oriented metric that looks at the coverage of the summary content; ROUGE-1 here means unigrams, and ROUGE-2 means bigrams. (A sketch of the computation is below.)
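For reference, a minimal sketch of ROUGE-N recall (the official ROUGE toolkit adds stemming and other options omitted here):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: fraction of reference n-grams covered by the
    candidate, with clipped counts.  Both inputs are token lists."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum(min(c, cand[g]) for g, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

ref = "the rise of the sea is no longer theoretical".split()
cand = "scientists said the rise of the sea has begun".split()
print(round(rouge_n_recall(cand, ref, n=1), 2))  # unigram recall
```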
Okay, so before I show the varying-length results: if we think about single-document summarization, a strong baseline is just selecting the first k units such that the length constraint is satisfied; a sketch is below.
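A minimal sketch of that lead baseline (an illustration of mine):

```python
def lead_baseline(units, budget):
    """Take units in document order until the character budget is exhausted."""
    chosen, length = [], 0
    for i, u in enumerate(units):
        if length + len(u) > budget:
            break
        chosen.append(i)
        length += len(u)
    return chosen
```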
We want to compare with that, and here we show the results for each type of unit and each system. We see that the supervised summarizers outperform the baseline in all cases, and that EDUs outperform sentences in all cases. This is under a length constraint of 200 characters.
Now we are ready to look at the varying-budget results. Here I'm showing the results: for extractive sentence, in almost all cases EDUs outperform sentences; for near-extractive span, in all cases EDUs outperform sentences; and for near-extractive subsequence, the situation is similar to the extractive sentence setting. In particular, we see that when the length constraint is tighter, EDUs have a much bigger advantage over sentences.
Why are EDUs so good? Here's an example. The reference summary is: "The plan, which rivals the scope of Battery Park City, would be on a 175-block area of Greenpoint and Williamsburg." Here we can see that the sentence summarizer is not selecting the right sentence at all, but with EDUs, all of the content is selected. So it's not the case that the summarizer cannot find the right sentence; sometimes the rest of the sentence is just too long. Also, EDU boundaries really do correspond well with human-identified content boundaries. And finally, since EDUs are clauses, they have much better readability than units like n-grams.
Okay, so in conclusion: we first conducted a corpus analysis where we showed that EDUs correspond well with human-identified conceptual units, and that the importance of EDUs, from intra-document weights, correlates with inter-document concept weights. We also looked at near-extractive summarization, where we first introduced a large dataset of extractive and near-extractive summaries, which is released on this website. We showed that in this dataset EDU boundaries align with human content extraction, and finally, that EDUs are superior to sentences in near-extractive summarization under varying length constraints.
And that's all. Thanks for your attention; I welcome questions.
So, are you referring to the boundaries of the EDUs, or are you referring to... there's also the importance of the concepts, right?
I think it depends on how someone wants to express something; that importance itself may be different. But as we can see, the summaries are from different people, and we still observe this kind of correlation, which we found really interesting, but we need to look more into why this is the case.
But for EDUs, I think... we analyzed two corpora: one is different summaries from different people, and the second one is gold summaries from editors. We see good correspondence in each case, so I'm pretty confident that this is okay.
Right, we're not looking at coherence and grammaticality in this work, but it's part of our future plans.
So, for some reason we still find the summaries quite readable. For example, if we look at this one, for the EDUs it is built very reasonably, but I wouldn't say everything is perfectly grammatical. We do see different EDUs being attached, just because the summarizer wants to fill the length constraint, where it doesn't make sense; things like that do happen.
No, not at all. For the summarizer's features, we left out anything that would reveal the advantage or disadvantage of either type of unit. So we are only using things like position, cosine similarity, adjacency, and so on.
The weights? Yes, we didn't use the parser there; for the summarization task we only use the EDUs. But for the analysis part we did look at the weights for the EDUs, and we associated them with the weights for the concepts.
Right, there is existing work; that's what we used for the parsing.
Right, so the PDTB doesn't have two things that I think we really need for this task. The first one is full segmentation: PDTB arguments have a lot of freedom in where they are positioned, they do not form a segmentation of the text, and they are not necessarily contiguous. The second part is that in the PDTB there is nothing associated with salience, so if we want to consider weights, or salience, we cannot do that with the PDTB.