0:00:20 And this is another resources paper, like the last one, that describes how we built a corpus of human-authored reference summaries, in our case for reader comment conversations in online news.
0:00:30 So I'll start off with the motivation for this work.
0:00:34 I think most of you know, or are aware, that reader comments appear in a wide range of online news sources, some of which are shown down the right.
0:00:42 These are multi-way conversations, and they contain lots of information, a fair amount of rubbish as well, but also a lot of information of potential value to a range of users: not just people reading, but people posting comments as well as reading them, journalists, news editors and so on.
0:00:59 However, and I'm sure you've noticed this if you've looked at these, a major problem is that a news article may quickly attract hundreds or even thousands of comments, and few readers have the patience to wade through that much.
0:01:10 So the obvious suggestion is that it would be good if we could automatically summarise these comment sets, to give some overview of what's going on in the conversation.
0:01:20 Of course, some work has been done on this already, and I'd divide the approaches into two broad categories.
0:01:27 What you might call technology-driven approaches say: let's try what we already know how to do and see how well it works.
0:01:32 So the idea is: we know how to cluster things topically, so let's cluster all these comments topically using something like LDA, then rank them using some sort of ranking algorithm, ranking the clusters and the sentences within the clusters, and build an extractive summary from the results.
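As a rough illustration of what such a technology-driven, cluster-then-rank baseline might look like (this is a minimal sketch for orientation, not one of the systems being discussed; it assumes scikit-learn and a list of comment strings):

```python
# Minimal sketch of a cluster-then-rank extractive baseline for reader comments.
# Assumes scikit-learn is installed; `comments` is a list of comment strings.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def extractive_summary(comments, n_topics=5, per_topic=1):
    # Bag-of-words representation of the comments
    X = CountVectorizer(stop_words="english", min_df=2).fit_transform(comments)

    # Topic model over the comments (LDA), giving comment-by-topic proportions
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(X)

    # Treat each comment's dominant topic as its cluster
    assignment = doc_topics.argmax(axis=1)

    # Rank clusters by size, then take the most topic-central comment(s) of each
    summary = []
    for topic in np.argsort(-np.bincount(assignment, minlength=n_topics)):
        members = np.where(assignment == topic)[0]
        if len(members) == 0:
            continue
        best_first = members[np.argsort(-doc_topics[members, topic])]
        summary.extend(comments[i] for i in best_first[:per_topic])
    return summary
```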
0:01:48 Systems of this kind have been built and they generate so-called summaries, but in fact if you look at them, they are not very good summaries: they fail to capture the argument-oriented nature of these conversations, pretty spectacularly.
0:02:03 The other set of approaches, which haven't really come to fruition yet but are promising, might be called argument-oriented approaches. There has been a lot of work on argument mining in social media, and this has resulted in various schemes defining argument elements and relations in argumentative discourse.
0:02:20 If such elements and relations could in fact be detected in these comments, they might form the basis for building a summary, and a number of people working in this area have cited summarisation as a motivation for their work. But no one has yet proposed, given an analysis of this sort, a process by which one could actually derive a summary from the full set of reader comments.
0:02:44 Well, what we thought when we started this project, the SENSEI project, on which this work is based, was that what was really needed was an answer to the underlying fundamental question: what should a summary of reader comments look like?
0:02:59 It would also be helpful if we had human-authored exemplars for real sets of reader comments. This would allow us both to better select appropriate technologies for reader comment summarisation and to evaluate and develop our systems using these materials.
0:03:17 Okay, so that's been the introduction, and it suggests the structure for the talk.
0:03:20 Next I'll look at the question of what a summary of reader comments should be like, then talk about a method that we developed for building, or authoring, reader comment summaries, then talk about the corpus we built, then, if there's time, some comments on related work, and finally conclusions and future work.
0:03:37 Well, what should a summary of reader comments be like? I think one can start from some remarks made by Karen Spärck Jones: what a summary should be like depends on the nature of the material one wants to summarise and on the use to which the summaries are to be put.
0:03:52 So if we look at reader comments and ask what their characteristics are, one thing that is common is that comment sets are typically organised into threads based on reply-to structure.
0:04:01 Every comment falls into exactly one thread, and either initiates a new thread or replies to exactly one comment earlier in the thread.
0:04:09 As a consequence, these conversations have the formal character of a set of trees: each thread-initial comment is the root of a separate tree, and the other comments are intermediate or leaf nodes whose parent is the comment to which they reply.
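To make that structure concrete, here is a small illustrative sketch (not from the paper; the comment ids and the parent field are assumed) of how the reply-to relation yields a forest of thread trees:

```python
# Sketch: build the forest of reply-to trees from a flat list of comments.
# Each comment is assumed to carry an id and the id of the comment it replies
# to (None for a thread-initial comment); field names are illustrative.
from collections import defaultdict

def build_threads(comments):
    children = defaultdict(list)   # parent comment id -> ids of its replies
    roots = []                     # thread-initial comments (tree roots)
    for c in comments:
        if c["parent_id"] is None:
            roots.append(c["id"])
        else:
            children[c["parent_id"]].append(c["id"])
    return roots, children

# Two threads; the second contains a nested reply.
comments = [
    {"id": 1, "parent_id": None},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": None},
    {"id": 4, "parent_id": 3},
    {"id": 5, "parent_id": 4},
]
roots, children = build_threads(comments)   # roots == [1, 3]
```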
0:04:23 Now, you might naively think that these threads are going to be topically cohesive. In practice they rarely are: the same topic may be addressed across multiple threads, especially as conversations get long and people don't bother reading what has gone before, so they start off on the same thing again; and a single thread may drift from one topic onto another. So there is a many-to-many relation between threads and topics.
0:04:46 Here's a quick example. This is from the Guardian, which is the data source for our paper, on the hotly debated issue that arose when Bury council, a town council in northern England, decided to reduce rubbish collection to once every three weeks rather than once every two weeks.
0:05:05 As you can imagine, that sparked off quite a row.
0:05:08 Here you can see how the original article appears in the Guardian, with a quick summary of the article at the top followed by the detail, and then the comments start.
0:05:18 They go sort of like this. It starts off with something like: 'I can see how it would attract rats and other vermin. I know some difficult decisions have to be made with council funding, but this seems like a very poorly founded idea.' Then someone replies: 'Plenty of people use compost bins and have no trouble with rats or foxes.' And so it rolls on like this.
0:05:40 Our observation, having looked at very many of these, is that reader comments are primarily, though not exclusively, argumentative in nature, with readers making assertions that either express a viewpoint, or stance as some call it, on an issue raised in the original article or by an earlier comment, or provide evidence or grounds for believing a viewpoint or assertion that has already been expressed.
0:06:06 So in this approach we developed a theoretical framework, which is reported in a paper at the Argument Mining workshop in Berlin, and it works as follows.
0:06:16 The framework is based on the notion of an issue, where an issue is a question on which alternative viewpoints are possible. For instance: should bin collections be reduced to once every three weeks? That one has binary alternatives, but issues needn't be binary; they can be open-ended as well, like: what was the best film of 2015?
0:06:37 It is also worth noting that issues are often implicit, that is, not directly expressed in the comments. For instance, the issue which unfolds in the comment set I just referred to, will reducing bin collection lead to an increase in vermin, is never explicitly mentioned as an issue. People take positions on either side of it, and the reader is left to infer that this issue, will reducing bin collection lead to an increase in vermin, is in fact there.
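A minimal sketch of how these notions might be represented in code (purely illustrative; the class and field names are my own assumptions, not the annotation scheme from the paper):

```python
# Illustrative data model for issues, viewpoints and comment stances.
# Names are hypothetical, not taken from the SENSEI annotation scheme.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Issue:
    question: str                # e.g. "Will reducing bin collection increase vermin?"
    explicit: bool = False       # issues are often only implicit in the comments
    viewpoints: List[str] = field(default_factory=list)   # alternative positions

@dataclass
class CommentAssertion:
    comment_id: int
    issue: Issue
    viewpoint: Optional[str] = None    # the stance this comment takes, if any
    grounds: Optional[str] = None      # evidence offered in support of a viewpoint

vermin = Issue("Will reducing bin collection lead to an increase in vermin?",
               explicit=False, viewpoints=["yes", "no"])
a1 = CommentAssertion(comment_id=1, issue=vermin, viewpoint="yes",
                      grounds="uncollected waste attracts rats and foxes")
```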
0:07:04 Again, as I mentioned, while comments are primarily argumentative, of course they do other things as well: for instance they may seek clarification about facts, they may provide background, and, as previous speakers have mentioned, they frequently include jokes or sarcasm or expressions of emotion. But often these other things are really in the service of addressing some viewpoint or taking a stand on a particular issue.
0:07:31 So sarcasm and emotive terms, which occur in this Bury bin collection argument, things like 'lame-brained' and 'crazy' and so on, can indicate a commenter's stance as well as their emotional attitude.
0:07:44 Okay, so given that these things are primarily argumentative, we decided that a useful sort of summary would be a generic, informative summary that attempted to give an overview of the arguments in the comments.
0:07:56 When we reflected on that and discussed it at some length, it seemed that the key things we wanted in such an overview summary were that it should articulate the main issues in the comments, that is, the questions people are worried about and are taking sides on, and that it should characterise the opinion on the main issues: identifying alternative viewpoints, indicating the grounds given in support of viewpoints, aggregating across comments when the same opinion is expressed multiple times and saying what proportion of the commenters is on one side or another of an argument, and indicating whether there is consensus or disagreement across the comments.
0:08:29 We then put this proposal, among several other proposed summary types, to a set of respondents via a questionnaire, and we got very positive feedback on this summary type. The respondents included not just authors and readers of reader comments but journalists and news editors as well.
0:08:52 Based on that, we developed a set of guidelines for authoring these summaries.
0:08:58 We tried not to make them too prescriptive, in the sense of giving someone a theory of argumentation and saying you must build a summary in accordance with this theory. Rather, we introduced these ideas of identifying issues and characterising opinion, and then let the authors more or less follow their own judgement as to what they felt was the best way to summarise.
0:09:20 Okay, so on to the method then.
0:09:22 As you will have worked out already, if you've thought about it since I started speaking or have tried this yourselves, writing summaries of large numbers of reader comments is very hard.
0:09:32 When we first started on this problem we had no idea how to go about it; we sat down and read a hundred or so comments and thought: how on earth do we summarise this?
0:09:41 So it's clear you need to break it down into some multiple-stage process and to build tools to support that process, and that's what we've done.
0:09:48 We settled on a four-stage process, although really only the first three stages have to do with summary authoring; the last stage is something extra, which I'll come back to.
0:09:57 The first stage is what we call comment labelling. In this stage the annotators go through the conversation comment by comment and write a brief label, or if you like a mini summary, which tries to capture the essential point the person is making in that comment.
0:10:15 There are some further instructions we give the annotators, and there are a few examples up there at the top. One thing the authors have to bear in mind is that they may look at these labels out of context later, so they need to write enough that the label can be understood without going back to look at the whole conversation. In some cases anaphora will be expanded in the label, making the label paradoxically longer than the comment, because the labels need to be usable independently later on.
0:10:46 This is the interface we built for this first stage. The pane on the left, in the green circle, is pre-populated from the conversation automatically, and the annotators then fill in their labels on the right. Here it's a conversation about Network Rail being fined for late-running trains in the UK, and the annotator has written short labels, for instance one noting that the commenter thinks rail ticket prices seem high, and so on. These are mini summaries of the comments, and as you can see they are mostly much shorter than the comments themselves.
0:11:25 The second stage, then, is to group these labels together topically.
0:11:30 The annotators group the labels, placing those which address similar things in the same group, and then assign a group label that describes the common theme of the group.
0:11:41 We allow them one level of subgrouping, since some people in particular found it much easier to group things roughly first and then, as the structure of the conversation became clearer, to subgroup things a bit; but we didn't want the subgrouping to become arbitrarily deep.
0:11:58 Going through this process of grouping puts the annotators in a much better place to make sense of the raw content of the comments before they come to writing a summary.
0:12:07 Again there's an interface, which looks like this. First they just get all the labels, and then they create groups by pressing a button to add a new group and a group label. So you end up with something where you've got a group label with the labels, the mini summaries of the comments, underneath it, then the next group, and so on.
0:12:26 The annotators can go back to the previous screen to see the full text of the comments if they wish as well.
0:12:34 The third stage is generating the summary.
0:12:37 We asked the annotators to write two summaries. The one to do first is an unconstrained one, where we said don't worry about the length too much, just try to summarise. The second one is constrained, where we said no fewer than 150 and no more than 250 words, and they write that with the first, unconstrained summary available as a reference.
0:12:59 Further analysis obviously takes place as the annotators go through this stage, as they take a group label and turn it into summary sentences and write them down. We encourage annotators to use phrases like 'many', 'several', 'a few commenters observed that', 'opinion was divided on', 'the consensus was', and so on, to try to capture the aggregation, or abstraction, over a number of separate comments.
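Purely as a hypothetical illustration of the sort of aggregation those phrases express (the thresholds and wording here are my own assumptions, not part of the authoring guidelines), an automated version might map proportions to quantifier phrases like this:

```python
# Illustrative only: map the proportion of commenters holding a view to the
# kind of quantifier phrase the guidelines encourage annotators to use.
def quantifier_phrase(n_holding_view, n_commenters):
    p = n_holding_view / n_commenters
    if p >= 0.8:
        return "the consensus was that"
    if p >= 0.5:
        return "many commenters felt that"
    if p >= 0.2:
        return "several commenters felt that"
    return "a few commenters observed that"

# e.g. 14 of 60 commenters expressing the view that vermin will increase
print(quantifier_phrase(14, 60), "reduced collection will attract vermin")
# -> several commenters felt that reduced collection will attract vermin
```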
0:13:26 Again there's an interface for this. On the left, in the green circle, you see the output of the previous stage, stage two, and they do the writing on the right. They author the summary, with a word count at the bottom right which changes dynamically as they write, so they can see how long the summary is.
0:13:43 Okay, so that completes the summary writing. The fourth stage is a backtracking stage, which isn't strictly necessary for creating the summaries but is very useful as a resource for further work, for instance algorithm development, as you'll see later. We asked the authors to link each sentence in the constrained-length summary to the one or more groups that informed the creation of that sentence.
0:14:07 On the face of it this only links sentences to groups of labels, but since the labels themselves have an associated comment id, we can actually link directly back from the summary sentences to the source comments that support them.
0:14:20 There's an interface again, which I won't look at in detail here: effectively each summary sentence is presented in turn, the annotator selects which of the groups informed the construction of that sentence, and all of that is recorded.
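To make that traceability concrete, here is a minimal sketch (with assumed ids and field names, not the released corpus format) of following the links from a summary sentence back to its supporting comments:

```python
# Sketch of tracing a summary sentence back to its supporting source comments.
# Ids and structures are illustrative; the released corpus has its own format.
label_to_comment = {"L1": 101, "L2": 102, "L3": 103}       # label id -> comment id
group_to_labels = {"G1": ["L1", "L2"], "G2": ["L3"]}       # group id -> label ids
sentence_to_groups = {"S1": ["G1"], "S2": ["G1", "G2"]}    # sentence id -> group ids

def supporting_comments(sentence_id):
    comment_ids = set()
    for g in sentence_to_groups[sentence_id]:
        for label in group_to_labels[g]:
            comment_ids.add(label_to_comment[label])
    return sorted(comment_ids)

print(supporting_comments("S2"))   # -> [101, 102, 103]
```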
0:14:36 Okay, so coming on to the corpus.
0:14:38 There were fifteen annotators who carried out the summary writing task, mostly people with varying degrees of expertise in language and writing, including academics. The majority were native English speakers, and they all had good English writing skills. They were given a training session, and written guidelines were produced as well.
0:14:59 The data source was a set of about three and a half thousand Guardian articles with their associated comment sets, published in 2014.
0:15:08 We then selected a small subset of that, in fact eighteen articles, in the domains listed here, sport, health, et cetera, and from each of these we took approximately the first hundred comments of the full comment set. There is more detail of precisely how this was done in the paper.
0:15:26 You can see a summary of the figures we derived for the underlying corpus at the top, in terms of article length, comment counts and so on. Overall, then, there are eighteen articles, and the total number of comments comes to around eighteen hundred, almost ninety thousand words in total.
0:15:48 As for the annotation characteristics: of the eighteen articles plus comment sets, fifteen were doubly annotated and three were triply annotated.
0:15:57 Even with the tools, the annotators took three and a half to six hours to complete the task for one article plus comments, so this is a non-trivial undertaking; you can't do it right without some serious commitment.
0:16:11 But we are pleased with the results. There are thirty-nine annotations in total, each of these annotations containing summaries, and each of the summary sentences links to one or more groups of comments. All of this is in the corpus, which is now available for download.
0:16:26 There are further statistics in the paper, which is why I'm not going to go into detail about the numbers here.
0:16:35 Now just a bit of qualitative and quantitative analysis before I turn to related work and conclusions.
0:16:40 Looking over the annotations, one of the striking things is that people group things in different sorts of ways; I suppose this is the familiar finding about annotator variability showing up here.
0:16:55 Across all annotations, the average number of groups per annotation was about nine, ranging from roughly four up to fourteen and a half, and the average number of subgroups per annotation was five.
0:17:12 Most annotators used the subgroup option at least once, but in fact there's quite a divide between those who used subgroups quite frequently and those who used them only rarely.
0:17:26 Pleasingly, from the point of view of what we set out initially for the target summaries, all of them contain sentences reporting different views on issues; they frequently picked out points of contention, provided examples of the reasons people gave in support of viewpoints, and frequently indicated what proportion of commenters took particular views, and so on. So the things we wanted them to do, they do quite well.
0:17:55 Here are a couple of examples. I've coded this one with red highlighting the parts expressing this sort of aggregation, and green identifying some of the issues that are stated more explicitly in the summaries. I've got another one which I'll skip over.
0:18:11 So these are quite healthy-looking summaries of the sort we wanted. We actually showed them to various people, in particular the Guardian themselves, and they were very impressed: if you could do this automatically, they said, we'd be very happy.
0:18:29 We also quickly looked at trying to determine how similar the summaries were to one another, using a method of the sort used in earlier work from 2001, where you compare the summaries pairwise: for each sentence in summary A you judge whether its content is covered in summary B, and then you do the reverse, using a sort of Likert-scale system to see how much commonality there is.
0:18:55 As I'm running out of time I'll skip through this very quickly, but essentially we determined that the answer is affirmative: the summaries are quite similar. You might worry that there is a problem in having multiple different reference summaries for one article, but they are relatively similar, and there is a high level of agreement between the judges making the similarity judgements.
0:19:16 I've only got a very short time left, so I'll be brief about the related work; it is covered in the paper. At a high level I think of three sorts of things.
0:19:27 First, sentence assessment, an approach that others have used for building resources for evaluating extractive summaries, including from reader comments, which we don't think is necessarily the right way to go.
0:19:42 Second, work on the AMI corpus. I won't do a detailed comparison here, you can read that in the paper, but essentially what we do is similar while differing in several key ways, perhaps most importantly that they are summarising meeting records, which are much more of a fixed domain where you can anticipate the sorts of things that are going to emerge in a meeting, whereas you can't with reader comments.
0:20:03 And finally, some work by Misra and colleagues on summarising arguments across conversations, but where the focus of the work is really on trying to summarise an argument, on something like gun control or gay marriage, across a whole set of different online conversations, rather than trying to summarise all the comments in a single conversation, which may address multiple different topics.
0:20:27 So, concluding then: we've proposed a form of overview summary that captures the key content of these multi-party, argument-oriented conversations; we've developed a method by which humans can author such summaries; and we've used the method to build the first publicly available corpus of reader comments paired with human-authored summaries and other annotation.
0:20:47 We think the summaries produced are pretty good, which is quite an achievement.
0:20:52 We've also already been able to use the corpus for a whole set of things: for instance, we've used the groupings to evaluate clustering algorithms (see the sketch below for the kind of comparison involved), we've used the annotations to inform an unsupervised cluster labelling algorithm, and we've used the summaries to inform assessors in task-based system evaluation.
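For the clustering evaluation just mentioned, a minimal sketch of the kind of comparison one might run (illustrative only; it assumes the human grouping and the machine clustering have each been flattened to one group or cluster id per comment, and it uses scikit-learn's standard clustering measures):

```python
# Sketch: compare a machine clustering of comments against a human grouping,
# assuming both are flattened to one id per comment (the data here is made up).
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

human_groups    = [0, 0, 1, 1, 1, 2, 2, 3]   # group id per comment (annotator)
system_clusters = [0, 0, 1, 2, 1, 2, 2, 3]   # cluster id per comment (system)

print("ARI:", adjusted_rand_score(human_groups, system_clusters))
print("NMI:", normalized_mutual_info_score(human_groups, system_clusters))
```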
0:21:11 And just very quickly, future work. Obviously the corpus is of limited size; we would like to make it bigger.
0:21:17 Scalability: we still have to show that the method scales to, say, two thousand comments from a hundred. We think it will, but that's just a belief; we'd have to investigate it. We would also like to see whether we can think of ways of perhaps crowdsourcing parts of the task, or of sampling the comments.
0:21:35 There are more questions to do with groups and subgroups, and finally there's evaluation: how do you evaluate against these summaries? Is the standard evaluation methodology appropriate here? That needs to be investigated, and if it isn't, what should we do instead?
0:21:50 So, to finish, I would like to acknowledge the European Community for funding this work under the SENSEI project, the Guardian for letting us use the materials and redistribute them, our annotators for their hard work, and the reviewers for their helpful comments.
0:22:04 Thank you, and I'll take any questions. If you would like to download the corpus, it is available.
0:22:20 Yes, the question at the back.
0:22:45 Well, that's an interesting question, thank you.
0:22:50 We have a system that does clustering, with several clustering algorithms including LDA, and we put all the comments belonging to particular clusters together and have people look at the clusters.
0:23:06 What happens with the clusters is that some of the argument structure is lost, and people actually don't like having these clusters put in front of them; users say they want to go back to see the original context, because they can really only make sense of the comments in the dialogic context where there is an argument. Pulled out on their own and clustered together, they often don't make sense.
0:23:25 So it's an interesting idea, but I don't think it's going to help people speed up in doing the task; I think they need to do the grouping themselves in order to interpret the comments.
0:23:36 It would be interesting to see the extent to which... well, we have done a formal evaluation, using the standard sort of measures for evaluating clustering, of the machine-generated clusters against ours, and the scores aren't that good. It would be more interesting to do some qualitative analysis on that, to look at the sorts of things the algorithms are putting into clusters that humans are excluding.
0:24:02 But essentially I don't think it would help with the summary writing; it could help in algorithm development, which is obviously also important.
0:24:31 I think, if I've understood correctly, the suggestion was that we think about, I guess, a sort of active learning approach or something like that, where you annotate something and the system uses that to annotate more comments somehow, hopefully speeding up the annotation. Is that correct?
0:24:48 So, it is a good idea; we have thought about doing things like that, but we haven't got as far as trying them in practice to see how well they would really work. Thanks.
0:25:01then
0:25:43which ones
0:25:47 Ah, so this was an after-the-fact assessment of what was going on; it wasn't part of the summary creation, this was done afterwards.
0:26:32 Well, we don't have to, I mean, the resource actually has multiple different reference summaries, in the way a lot of reference summary data sets do, and then we just came back afterwards and asked, as a matter of interest, how similar they are to each other.
0:26:50 So it's not part of producing the resource; we did that as part of analysing it afterwards, to see the extent to which these things are similar.
0:27:12 Yes, so I guess we could do that.
0:27:16 It's a bit like what people call reconciliation, where you have multiple annotators do something and then you try to process that to propose a single gold standard. So we could in fact do another stage now: take each of these multiple annotations, do the reconciliation, and come up and say, well, this is the reconciled, the perfect summary if you like, of the comment set.
0:27:39 Yes, I'd like to do that, but I guess the thing is we don't have the resources to do it at present. There's lots more one could do, and in fact if somebody wanted to do that on top of what we're releasing, that would be great.
0:27:55 Okay, let's thank Robert.