0:00:20 And this is another resources paper, like the last one, that describes how we built a corpus of human-authored reference summaries, in our case for reader comment conversations in online news.
0:00:30 So I'll start off with the motivation for this work.
0:00:34 I think most of you know, or are aware, that reader comments appear in a wide range of online news sources, some of which are shown down the right.
0:00:42 These are multi-way conversations, and they contain lots of information, a fair amount of rubbish as well, but also a lot of information of potential value to a range of users: not just people reading, but people posting comments as well as reading them, journalists, news editors and so on.
0:00:59 However, and I'm sure you've noticed this if you've looked at these, a major problem is that a news article may quickly attract hundreds or even thousands of comments, and few readers have the patience to wade through that much.
0:01:10 So the obvious suggestion is that it would be good if we could automatically summarise these comment sets, to give some overview of what's going on in the conversation.
0:01:20 Of course, some work has been done on this already, and I'd divide the approaches into two broad categories.
0:01:27 What you might call technology-driven approaches say: let's try what we already know how to do and see how well it works.
0:01:32 So the idea is: we know how to cluster things topically, so let's cluster all these comments topically using something like LDA, then rank them using some sort of ranking algorithm, ranking the clusters and the sentences within the clusters, and build an extractive summary from the results.
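As a rough illustration of what such a technology-driven, cluster-then-rank baseline might look like (this is a minimal sketch for orientation, not one of the systems being discussed; it assumes scikit-learn and a list of comment strings):

```python
# Minimal sketch of a cluster-then-rank extractive baseline for reader comments.
# Assumes scikit-learn is installed; `comments` is a list of comment strings.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def extractive_summary(comments, n_topics=5, per_topic=1):
    # Bag-of-words representation of the comments
    X = CountVectorizer(stop_words="english", min_df=2).fit_transform(comments)

    # Topic model over the comments (LDA), giving comment-by-topic proportions
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(X)

    # Treat each comment's dominant topic as its cluster
    assignment = doc_topics.argmax(axis=1)

    # Rank clusters by size, then take the most topic-central comment(s) of each
    summary = []
    for topic in np.argsort(-np.bincount(assignment, minlength=n_topics)):
        members = np.where(assignment == topic)[0]
        if len(members) == 0:
            continue
        best_first = members[np.argsort(-doc_topics[members, topic])]
        summary.extend(comments[i] for i in best_first[:per_topic])
    return summary
```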
0:01:48 Systems of this kind have been built and they generate so-called summaries, but in fact if you look at them, they are not very good summaries: they fail to capture the argument-oriented nature of these conversations, pretty spectacularly.
0:02:03 The other set of approaches, which haven't really come to fruition yet but are promising, might be called argument-oriented approaches. There has been a lot of work on argument mining in social media, and this has resulted in various schemes defining argument elements and relations in argumentative discourse.
0:02:20 If such elements and relations could in fact be detected in these comments, they might form the basis for building a summary, and a number of people working in this area have cited summarisation as a motivation for their work. But no one has yet proposed, given an analysis of this sort, a process by which one could actually derive a summary from the full set of reader comments.
0:02:44 Well, what we thought when we started this project, the SENSEI project, on which this work is based, was that what was really needed was an answer to the underlying fundamental question: what should a summary of reader comments look like?
0:02:59 It would also be helpful if we had human-authored exemplars for real sets of reader comments. This would allow us both to better select appropriate technologies for reader comment summarisation and to evaluate and develop our systems using these materials.
0:03:17 Okay, so that's been the introduction, and it suggests the structure for the talk.
0:03:20 Next I'll look at the question of what a summary of reader comments should be like, then talk about a method that we developed for building, or authoring, reader comment summaries, then talk about the corpus we built, then, if there's time, some comments on related work, and finally conclusions and future work.
0:03:37 Well, what should a summary of reader comments be like? I think one can start from some remarks made by Karen Spärck Jones: what a summary should be like depends on the nature of the material one wants to summarise and on the use to which the summaries are to be put.
0:03:52 So if we look at reader comments and ask what their characteristics are, one thing that is common is that comment sets are typically organised into threads based on reply-to structure.
0:04:01 Every comment falls into exactly one thread, and either initiates a new thread or replies to exactly one comment earlier in the thread.
0:04:09 As a consequence, these conversations have the formal character of a set of trees: each thread-initial comment is the root of a separate tree, and the other comments are intermediate or leaf nodes whose parent is the comment to which they reply.
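To make that structure concrete, here is a small illustrative sketch (not from the paper; the comment ids and the parent field are assumed) of how the reply-to relation yields a forest of thread trees:

```python
# Sketch: build the forest of reply-to trees from a flat list of comments.
# Each comment is assumed to carry an id and the id of the comment it replies
# to (None for a thread-initial comment); field names are illustrative.
from collections import defaultdict

def build_threads(comments):
    children = defaultdict(list)   # parent comment id -> ids of its replies
    roots = []                     # thread-initial comments (tree roots)
    for c in comments:
        if c["parent_id"] is None:
            roots.append(c["id"])
        else:
            children[c["parent_id"]].append(c["id"])
    return roots, children

# Two threads; the second contains a nested reply.
comments = [
    {"id": 1, "parent_id": None},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": None},
    {"id": 4, "parent_id": 3},
    {"id": 5, "parent_id": 4},
]
roots, children = build_threads(comments)   # roots == [1, 3]
```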
0:04:23 Now, you might naively think that these threads are going to be topically cohesive. In practice they rarely are: the same topic may be addressed across multiple threads, especially as conversations get long and people don't bother reading what has gone before, so they start off on the same thing again; and a single thread may drift from one topic onto another. So there is a many-to-many relation between threads and topics.
0:04:46 Here's a quick example. This is from the Guardian, which is the data source for our paper, on the hotly debated issue that arose when Bury council, a town council in northern England, decided to reduce rubbish collection to once every three weeks rather than once every two weeks.
0:05:05 As you can imagine, that sparked off quite a row.
0:05:08 Here you can see how the original article appears in the Guardian, with a quick summary of the article at the top followed by the detail, and then the comments start.
0:05:18 They go sort of like this. It starts off with something like: 'I can see how it would attract rats and other vermin. I know some difficult decisions have to be made with council funding, but this seems like a very poorly founded idea.' Then someone replies: 'Plenty of people use compost bins and have no trouble with rats or foxes.' And so it rolls on like this.
0:05:40 Our observation, having looked at very many of these, is that reader comments are primarily, though not exclusively, argumentative in nature, with readers making assertions that either express a viewpoint, or stance as some call it, on an issue raised in the original article or by an earlier comment, or provide evidence or grounds for believing a viewpoint or assertion that has already been expressed.
0:06:06 So in this approach we developed a theoretical framework, which is reported in a paper at the Argument Mining workshop in Berlin, and it works as follows.
0:06:16 The framework is based on the notion of an issue, where an issue is a question on which alternative viewpoints are possible. For instance: should bin collections be reduced to once every three weeks? That one has binary alternatives, but issues needn't be binary; they can be open-ended as well, like: what was the best film of 2015?
0:06:37 It is also worth noting that issues are often implicit, that is, not directly expressed in the comments. For instance, the issue which unfolds in the comment set I just referred to, will reducing bin collection lead to an increase in vermin, is never explicitly mentioned as an issue. People take positions on either side of it, and the reader is left to infer that this issue, will reducing bin collection lead to an increase in vermin, is in fact there.
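A minimal sketch of how these notions might be represented in code (purely illustrative; the class and field names are my own assumptions, not the annotation scheme from the paper):

```python
# Illustrative data model for issues, viewpoints and comment stances.
# Names are hypothetical, not taken from the SENSEI annotation scheme.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Issue:
    question: str                # e.g. "Will reducing bin collection increase vermin?"
    explicit: bool = False       # issues are often only implicit in the comments
    viewpoints: List[str] = field(default_factory=list)   # alternative positions

@dataclass
class CommentAssertion:
    comment_id: int
    issue: Issue
    viewpoint: Optional[str] = None    # the stance this comment takes, if any
    grounds: Optional[str] = None      # evidence offered in support of a viewpoint

vermin = Issue("Will reducing bin collection lead to an increase in vermin?",
               explicit=False, viewpoints=["yes", "no"])
a1 = CommentAssertion(comment_id=1, issue=vermin, viewpoint="yes",
                      grounds="uncollected waste attracts rats and foxes")
```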
0:07:04 Again, as I mentioned, while comments are primarily argumentative, of course they do other things as well: for instance they may seek clarification about facts, they may provide background, and, as previous speakers have mentioned, they frequently include jokes or sarcasm or expressions of emotion. But often these other things are really in the service of addressing some viewpoint or taking a stand on a particular issue.
0:07:31 So sarcasm and emotive terms, which occur in this Bury bin collection argument, things like 'lame-brained' and 'crazy' and so on, can indicate a commenter's stance as well as their emotional attitude.
0:07:44 Okay, so given that these things are primarily argumentative, we decided that a useful sort of summary would be a generic, informative summary that attempted to give an overview of the arguments in the comments.
0:07:56 When we reflected on that and discussed it at some length, it seemed that the key things we wanted in such an overview summary were that it should articulate the main issues in the comments, that is, the questions people are worried about and are taking sides on, and that it should characterise the opinion on the main issues: identifying alternative viewpoints, indicating the grounds given in support of viewpoints, aggregating across comments when the same opinion is expressed multiple times and saying what proportion of the commenters is on one side or another of an argument, and indicating whether there is consensus or disagreement across the comments.
0:08:29 We then put this proposal, among several other proposed summary types, to a set of respondents via a questionnaire, and we got very positive feedback on this summary type. The respondents included not just authors and readers of reader comments but journalists and news editors as well.
0:08:52 Based on that, we developed a set of guidelines for authoring these summaries.
0:08:58 We tried not to make them too prescriptive, in the sense of giving someone a theory of argumentation and saying you must build a summary in accordance with this theory. Rather, we introduced these ideas of identifying issues and characterising opinion, and then let the authors more or less follow their own judgement as to what they felt was the best way to summarise.
0:09:20 Okay, so on to the method then.
0:09:22 As you will have worked out already, if you've thought about it since I started speaking or have tried this yourselves, writing summaries of large numbers of reader comments is very hard.
0:09:32 When we first started on this problem we had no idea how to go about it; we sat down and read a hundred or so comments and thought: how on earth do we summarise this?
0:09:41 So it's clear you need to break it down into some multiple-stage process and to build tools to support that process, and that's what we've done.
0:09:48 We settled on a four-stage process, although really only the first three stages have to do with summary authoring; the last stage is something extra, which I'll come back to.
0:09:57 The first stage is what we call comment labelling. In this stage the annotators go through the conversation comment by comment and write a brief label, or if you like a mini summary, which tries to capture the essential point the person is making in that comment.
0:10:15 There are some further instructions we give the annotators, and there are a few examples up there at the top. One thing the authors have to bear in mind is that they may look at these labels out of context later, so they need to write enough that the label can be understood without going back to look at the whole conversation. In some cases anaphora will be expanded in the label, making the label paradoxically longer than the comment, because the labels need to be usable independently later on.
0:10:46 This is the interface we built for this first stage. The pane on the left, in the green circle, is pre-populated from the conversation automatically, and the annotators then fill in their labels on the right. Here it's a conversation about Network Rail being fined for late-running trains in the UK, and the annotator has written short labels, for instance one noting that the commenter thinks rail ticket prices seem high, and so on. These are mini summaries of the comments, and as you can see they are mostly much shorter than the comments themselves.
0:11:25 The second stage, then, is to group these labels together topically.
0:11:30 The annotators group the labels, placing those which address similar things in the same group, and then assign a group label that describes the common theme of the group.
0:11:41 We allow them one level of subgrouping, since some people in particular found it much easier to group things roughly first and then, as the structure of the conversation became clearer, to subgroup things a bit; but we didn't want the subgrouping to become arbitrarily deep.
0:11:58 Going through this process of grouping puts the annotators in a much better place to make sense of the raw content of the comments before they come to writing a summary.
0:12:07 Again there's an interface, which looks like this. First they just get all the labels, and then they create groups by pressing a button to add a new group and a group label. So you end up with something where you've got a group label with the labels, the mini summaries of the comments, underneath it, then the next group, and so on.
0:12:26 The annotators can go back to the previous screen to see the full text of the comments if they wish as well.
0:12:34 The third stage is generating the summary.
0:12:37 We asked the annotators to write two summaries. The one to do first is an unconstrained one, where we said don't worry about the length too much, just try to summarise. The second one is constrained, where we said no fewer than 150 and no more than 250 words, and they write that with the first, unconstrained summary available as a reference.
0:12:59 Further analysis obviously takes place as the annotators go through this stage, as they take a group label and turn it into summary sentences and write them down. We encourage annotators to use phrases like 'many', 'several', 'a few commenters observed that', 'opinion was divided on', 'the consensus was', and so on, to try to capture the aggregation, or abstraction, over a number of separate comments.
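Purely as a hypothetical illustration of the sort of aggregation those phrases express (the thresholds and wording here are my own assumptions, not part of the authoring guidelines), an automated version might map proportions to quantifier phrases like this:

```python
# Illustrative only: map the proportion of commenters holding a view to the
# kind of quantifier phrase the guidelines encourage annotators to use.
def quantifier_phrase(n_holding_view, n_commenters):
    p = n_holding_view / n_commenters
    if p >= 0.8:
        return "the consensus was that"
    if p >= 0.5:
        return "many commenters felt that"
    if p >= 0.2:
        return "several commenters felt that"
    return "a few commenters observed that"

# e.g. 14 of 60 commenters expressing the view that vermin will increase
print(quantifier_phrase(14, 60), "reduced collection will attract vermin")
# -> several commenters felt that reduced collection will attract vermin
```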
0:13:26 Again there's an interface for this. On the left, in the green circle, you see the output of the previous stage, stage two, and they do the writing on the right. They author the summary, with a word count at the bottom right which changes dynamically as they write, so they can see how long the summary is.
0:13:43 Okay, so that completes the summary writing. The fourth stage is a backtracking stage, which isn't strictly necessary for creating the summaries but is very useful as a resource for further work, for instance algorithm development, as you'll see later. We asked the authors to link each sentence in the constrained-length summary to the one or more groups that informed the creation of that sentence.
0:14:07 On the face of it this only links sentences to groups of labels, but since the labels themselves have an associated comment id, we can actually link directly back from the summary sentences to the source comments that support them.
0:14:20 There's an interface again, which I won't look at in detail here: effectively each summary sentence is presented in turn, the annotator selects which of the groups informed the construction of that sentence, and all of that is recorded.
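To make that traceability concrete, here is a minimal sketch (with assumed ids and field names, not the released corpus format) of following the links from a summary sentence back to its supporting comments:

```python
# Sketch of tracing a summary sentence back to its supporting source comments.
# Ids and structures are illustrative; the released corpus has its own format.
label_to_comment = {"L1": 101, "L2": 102, "L3": 103}       # label id -> comment id
group_to_labels = {"G1": ["L1", "L2"], "G2": ["L3"]}       # group id -> label ids
sentence_to_groups = {"S1": ["G1"], "S2": ["G1", "G2"]}    # sentence id -> group ids

def supporting_comments(sentence_id):
    comment_ids = set()
    for g in sentence_to_groups[sentence_id]:
        for label in group_to_labels[g]:
            comment_ids.add(label_to_comment[label])
    return sorted(comment_ids)

print(supporting_comments("S2"))   # -> [101, 102, 103]
```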
0:14:36 Okay, so coming on to the corpus.
0:14:38 There were fifteen annotators who carried out the summary writing task, mostly people with varying degrees of expertise in language and writing, including academics. The majority were native English speakers, and they all had good English writing skills. They were given a training session, and written guidelines were produced as well.
0:14:59 The data source was a set of about three and a half thousand Guardian articles with their associated comment sets, published in 2014.
0:15:08 We then selected a small subset of that, in fact eighteen articles, in the domains listed here, sport, health, et cetera, and from each of these we took approximately the first hundred comments of the full comment set. There is more detail of precisely how this was done in the paper.
0:15:26 You can see a summary of the figures we derived for the underlying corpus at the top, in terms of article length, comment counts and so on. Overall, then, there are eighteen articles, and the total number of comments comes to around eighteen hundred, almost ninety thousand words in total.
0:15:48 As for the annotation characteristics: of the eighteen articles plus comment sets, fifteen were doubly annotated and three were triply annotated.
0:15:57 Even with the tools, the annotators took three and a half to six hours to complete the task for one article plus comments, so this is a non-trivial undertaking; you can't do it right without some serious commitment.
0:16:11 But we are pleased with the results. There are thirty-nine annotations in total, each of these annotations containing summaries, and each of the summary sentences links to one or more groups of comments. All of this is in the corpus, which is now available for download.
0:16:26 There are further statistics in the paper, which is why I'm not going to go into detail about the numbers here.
0:16:35 Now just a bit of qualitative and quantitative analysis before I turn to related work and conclusions.
0:16:40 Looking over the annotations, one of the striking things is that people group things in different sorts of ways; I suppose this is the familiar finding about annotator variability showing up here.
0:16:55 Across all annotations, the average number of groups per annotation was about nine, ranging from roughly four up to fourteen and a half, and the average number of subgroups per annotation was five.
0:17:12 Most annotators used the subgroup option at least once, but in fact there's quite a divide between those who used subgroups quite frequently and those who used them only rarely.
0:17:26 Pleasingly, from the point of view of what we set out initially for the target summaries, all of them contain sentences reporting different views on issues; they frequently picked out points of contention, provided examples of the reasons people gave in support of viewpoints, and frequently indicated what proportion of commenters took particular views, and so on. So the things we wanted them to do, they do quite well.
0:17:55 Here are a couple of examples. I've coded this one with red highlighting the parts expressing this sort of aggregation, and green identifying some of the issues that are stated more explicitly in the summaries. I've got another one which I'll skip over.
0:18:11 So these are quite healthy-looking summaries of the sort we wanted. We actually showed them to various people, in particular the Guardian themselves, and they were very impressed: if you could do this automatically, they said, we'd be very happy.
0:18:29 We also quickly looked at trying to determine how similar the summaries were to one another, using a method of the sort used in earlier work from 2001, where you compare the summaries pairwise: for each sentence in summary A you judge whether its content is covered in summary B, and then you do the reverse, using a sort of Likert-scale system to see how much commonality there is.
0:18:55 As I'm running out of time I'll skip through this very quickly, but essentially we determined that the answer is affirmative: the summaries are quite similar. You might worry that there is a problem in having multiple different reference summaries for one article, but they are relatively similar, and there is a high level of agreement between the judges making the similarity judgements.
0:19:16 I've only got a very short time left, so I'll be brief about the related work; it is covered in the paper. At a high level I think of three sorts of things.
0:19:27 First, sentence assessment, an approach that others have used for building resources for evaluating extractive summaries, including from reader comments, which we don't think is necessarily the right way to go.
0:19:42 Second, work on the AMI corpus. I won't do a detailed comparison here, you can read that in the paper, but essentially what we do is similar while differing in several key ways, perhaps most importantly that they are summarising meeting records, which are much more of a fixed domain where you can anticipate the sorts of things that are going to emerge in a meeting, whereas you can't with reader comments.
0:20:03 And finally, some work by Misra and colleagues on summarising arguments across conversations, but where the focus of the work is really on trying to summarise an argument, on something like gun control or gay marriage, across a whole set of different online conversations, rather than trying to summarise all the comments in a single conversation, which may address multiple different topics.
0:20:27 So, concluding then: we've proposed a form of overview summary that captures the key content of these multi-party, argument-oriented conversations; we've developed a method by which humans can author such summaries; and we've used the method to build the first publicly available corpus of reader comments paired with human-authored summaries and other annotation.
0:20:47 We think the summaries produced are pretty good, which is quite an achievement.
0:20:52 We've also already been able to use the corpus for a whole set of things: for instance, we've used the groupings to evaluate clustering algorithms (see the sketch below for the kind of comparison involved), we've used the annotations to inform an unsupervised cluster labelling algorithm, and we've used the summaries to inform assessors in task-based system evaluation.
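For the clustering evaluation just mentioned, a minimal sketch of the kind of comparison one might run (illustrative only; it assumes the human grouping and the machine clustering have each been flattened to one group or cluster id per comment, and it uses scikit-learn's standard clustering measures):

```python
# Sketch: compare a machine clustering of comments against a human grouping,
# assuming both are flattened to one id per comment (the data here is made up).
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

human_groups    = [0, 0, 1, 1, 1, 2, 2, 3]   # group id per comment (annotator)
system_clusters = [0, 0, 1, 2, 1, 2, 2, 3]   # cluster id per comment (system)

print("ARI:", adjusted_rand_score(human_groups, system_clusters))
print("NMI:", normalized_mutual_info_score(human_groups, system_clusters))
```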
0:21:11 And just very quickly, future work. Obviously the corpus is of limited size; we would like to make it bigger.
0:21:17 Scalability: we still have to show that the method scales to, say, two thousand comments from a hundred. We think it will, but that's just a belief; we'd have to investigate it. We would also like to see whether we can think of ways of perhaps crowdsourcing parts of the task, or of sampling the comments.
0:21:35 There are more questions to do with groups and subgroups, and finally there's evaluation: how do you evaluate against these summaries? Is the standard evaluation methodology appropriate here? That needs to be investigated, and if it isn't, what should we do instead?
0:21:50 So, to finish, I would like to acknowledge the European Community for funding this work under the SENSEI project, the Guardian for letting us use the materials and redistribute them, our annotators for their hard work, and the reviewers for their helpful comments.
0:22:04 Thank you, and I'll take any questions. If you would like to download the corpus, it is available.
0:22:20 Yes, the question at the back.
0:22:45 Well, that's an interesting question, thank you.
0:22:50 We have a system that does clustering, with several clustering algorithms including LDA, and we put all the comments belonging to particular clusters together and have people look at the clusters.
0:23:06 What happens with the clusters is that some of the argument structure is lost, and people actually don't like having these clusters put in front of them; users say they want to go back to see the original context, because they can really only make sense of the comments in the dialogic context where there is an argument. Pulled out on their own and clustered together, they often don't make sense.
0:23:25 So it's an interesting idea, but I don't think it's going to help people speed up in doing the task; I think they need to do the grouping themselves in order to interpret the comments.
0:23:36 It would be interesting to see the extent to which... well, we have done a formal evaluation, using the standard sort of measures for evaluating clustering, of the machine-generated clusters against ours, and the scores aren't that good. It would be more interesting to do some qualitative analysis on that, to look at the sorts of things the algorithms are putting into clusters that humans are excluding.
0:24:02 But essentially I don't think it would help with the summary writing; it could help in algorithm development, which is obviously also important.
0:24:31 I think, if I've understood correctly, the suggestion was that we think about, I guess, a sort of active learning approach or something like that, where you annotate something and the system uses that to annotate more comments somehow, hopefully speeding up the annotation. Is that correct?
0:24:48 So, it is a good idea; we have thought about doing things like that, but we haven't got as far as trying them in practice to see how well they would really work. Thanks.
0:25:01then
0:25:43which ones
0:25:47 Ah, so this was an after-the-fact assessment of what was going on; it wasn't part of the summary creation, this was done afterwards.
0:26:32 Well, we don't have to, I mean, the resource actually has multiple different reference summaries, in the way a lot of reference summary data sets do, and then we just came back afterwards and asked, as a matter of interest, how similar they are to each other.
0:26:50 So it's not part of producing the resource; we did that as part of analysing it afterwards, to see the extent to which these things are similar.
0:27:12 Yes, so I guess we could do that.
0:27:16 It's a bit like what people call reconciliation, where you have multiple annotators do something and then you try to process that to propose a single gold standard. So we could in fact do another stage now: take each of these multiple annotations, do the reconciliation, and come up and say, well, this is the reconciled, the perfect summary if you like, of the comment set.
0:27:39 Yes, I'd like to do that, but I guess the thing is we don't have the resources to do it at present. There's lots more one could do, and in fact if somebody wanted to do that on top of what we're releasing, that would be great.
0:27:55 Okay, let's thank Robert.