0:00:17 We are dealing with the problem of document summarization.
0:00:21 The goal of summarization is to find the most important bits of information
0:00:26 from either a single document, such as a news story or a voicemail,
0:00:31 or multiple documents, such as reviews of a product or news stories online,
0:00:36 or spoken documents,
0:00:37 such as broadcast news, broadcast conversations, lectures, or meetings.
0:00:42 The main issue is basically tackling the information overload problem.
0:00:47 There are a variety of sources from which you get information,
0:00:51 and these are usually redundant,
0:00:52 and you only have a limited time to process all this information.
0:00:56 Also, the information is not necessarily presented in the optimal order; sometimes we
0:01:01 read a paper and realize that we
0:01:03 should have read another paper before this one
0:01:05 to be able to understand it.
0:01:08 So we are working on both speech and text summarization,
0:01:12 but basically NIST has been organizing summarization evaluations, providing the research community a good framework
0:01:19 to do summarization research, and these include
0:01:21 multi-document summarization,
0:01:24 and this has been going on for the past ten years.
0:01:29 Researchers are provided with a set of documents paired with the corresponding human summaries,
0:01:36 and I'll be showing results on the NIST data, to be comparable with previous research.
0:01:41 In previous work, people have treated summarization as a classification problem,
0:01:48 and we are also doing the same.
0:01:50 Usually, in these approaches, the original document sentences don't have a category label;
0:01:55 we only have the human summaries.
0:01:57 The first step is assigning a category to each document sentence, such as summary sentence or
0:02:02 non-summary sentence.
0:02:04 Most of the previous work used word-based similarity measures between the document sentences
0:02:09 and the summary
0:02:10 to assign labels to the sentences,
0:02:13 and then did binary classification with features such as the sentence length, position in the
0:02:20 document, and so on.
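To make the word-based labeling step concrete, here is a minimal sketch. It assumes whitespace tokenization, bag-of-words cosine similarity, and a hypothetical cutoff `threshold=0.5`; the names `bow`, `cosine`, and `label_sentences` are illustrative and not taken from the talk or any toolkit.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_sentences(doc_sentences, summary_sentences, threshold=0.5):
    """Label a document sentence 1 (summary) if its best cosine match against
    any human-summary sentence reaches the (hypothetical) threshold, else 0."""
    summary_vecs = [bow(s) for s in summary_sentences]
    labels = []
    for sent in doc_sentences:
        v = bow(sent)
        best = max((cosine(v, s) for s in summary_vecs), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels

doc = ["the storm hit the coast", "officials said little", "markets rose today"]
summary = ["a storm hit the coast"]
labels = label_sentences(doc, summary)
```

Because the match is purely lexical, a paraphrased summary sentence scores low, which is the weakness the next part of the talk addresses.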
0:02:22 The main issue with such approaches is that word-based similarity measures usually fail to capture the semantic
0:02:29 similarity between the sentences and the summaries.
0:02:32 So, in addition to the summarization-as-classification approach, we use generative models
0:02:37 and then use their
0:02:39 latent concepts to figure out the similarity between the summaries and the documents, so that we can tackle
0:02:44 this problem.
0:02:45 Generative models have been used by others for summarization as well; for example,
0:02:50 Haghighi and Vanderwende have used hierarchical latent Dirichlet allocation.
0:02:55 We have used latent Dirichlet allocation-based models as well, and our work here is
0:03:00 based on our previous work,
0:03:02 which is sumLDA, or sLDA.
0:03:05 That is a semi-supervised extractive summarization method,
0:03:10 and it uses
0:03:13 a supervised version of LDA to cluster the documents and summaries together,
0:03:18 and then uses that for classification; that is the sLDA approach.
0:03:24 The main assumption is that
0:03:26 there are two types of concepts in the documents: generic concepts and specific concepts.
0:03:31 The generic ones are those that are usually included in the summary; that's the main assumption.
0:03:36 The specific ones are those that are usually specific to each individual document.
0:03:44 At a very high level, the process is as follows: we have a set of documents
0:03:48 and the corresponding human summaries,
0:03:50 and we use sLDA, which I'll describe next, to
0:03:54 find the latent variables, or latent concepts,
0:03:57 that are specific or generic, using some supervision from the human summaries.
0:04:02 Then we go back and look at the sentences of the original training documents,
0:04:07 and we mark the ones that contain mainly specific concepts as negative examples and the ones
0:04:12 that have generic concepts as positive examples.
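The label-transfer step can be sketched as follows, assuming each training sentence comes with the per-word topic assignments produced by the topic model, and reading "mainly generic" as at least half of the words (the hypothetical `min_fraction=0.5`). The function name is illustrative only.

```python
def label_by_concepts(sentence_topics, generic_topics, min_fraction=0.5):
    """sentence_topics: one list of per-word topic ids per sentence.
    A sentence is positive (1) when at least `min_fraction` of its words were
    assigned to generic topics; otherwise it is negative (0)."""
    labels = []
    for topics in sentence_topics:
        if not topics:
            labels.append(0)  # empty sentence: treat as negative
            continue
        share = sum(t in generic_topics for t in topics) / len(topics)
        labels.append(1 if share >= min_fraction else 0)
    return labels

labels = label_by_concepts([[0, 0, 3], [3, 4, 4], []], generic_topics={0, 1})
```

This is exactly the point where topic-level information is collapsed into a sentence-level label, which the talk later identifies as a source of suboptimality.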
0:04:15 Now we train a classifier, and at run time, that is, at inference time, we use that classifier to
0:04:20 decide whether a sentence
0:04:21 should be included in the summary or not.
0:04:25 So while this approach works, there is a cost, or rather
0:04:31 a suboptimality, of transferring labels from the topics to the sentences.
0:04:36 So in this work, we are basically looking at the topics,
0:04:40 and now we are trying to classify the latent variables themselves instead of
0:04:44 the sentences,
0:04:46 and then learn to distinguish the types of latent variables that would be useful when we are trying to
0:04:51 summarize a new document set.
0:04:53 So it works like this: we train a classifier
0:04:56 to distinguish the two types of topics,
0:04:58 and then at inference time we use regular latent Dirichlet allocation
0:05:02 to find the topics, and then use the classifier to determine which ones should be in
0:05:08 the summary,
0:05:11 and then we pick the sentences that include the generic concepts.
0:05:16 So here is more detail.
0:05:17 As in this morning's keynote speech, which already introduced LDA,
0:05:22 our approach is an extension of LDA for the summarization task.
0:05:26 LDA is a topic model, and it allows us to
0:05:30 explain a set of observations
0:05:33 by unobserved, or hidden, or latent groups,
0:05:36 and this explains why some parts of the data are similar.
0:05:41 The assumption is that each document is a mixture of a small number of topics,
0:05:45 and each word's creation is attributable to one of the document's topics,
0:05:49 so each word is sampled from a topic.
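The LDA generative story just described can be sketched in a few lines: draw a topic mixture for the document from a Dirichlet prior, then, for each word, draw a topic from that mixture and a word from the topic. The topic-word distributions and `alpha` below are toy values chosen for illustration; in practice they are learned.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, topics, alpha, rng):
    """LDA generative story for one document.

    topics: list of dicts mapping word -> probability (one dict per topic).
    Returns the sampled words and the topic that produced each word.
    """
    theta = sample_dirichlet(alpha, rng)  # the document's topic mixture
    words, assignments = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this word
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        words.append(w)
        assignments.append(z)
    return words, assignments

rng = random.Random(0)
topics = [{"storm": 0.6, "coast": 0.4}, {"market": 0.5, "stock": 0.5}]
words, z = generate_document(8, topics, alpha=[0.5, 0.5], rng=rng)
```

Inference (for example, Gibbs sampling) runs this story in reverse, recovering the topic assignments from observed words.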
0:05:53 What we do in sLDA is also very similar to LDA, but
0:05:59 in the case of LDA it was unigrams, and
0:06:01 here we are looking at both unigrams and bigrams.
0:06:04 When we are looking at a unigram, we check whether it appears in the
0:06:08 human summary or not.
0:06:09 If it appears in a human summary, then, since we have two sets of topics,
0:06:13 two sets of concepts,
0:06:14 we assume that it should be sampled only from the generic concepts instead
0:06:19 of the specific ones,
0:06:21 and if it does not, then it could be generated from any topic.
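A minimal sketch of this constraint as one masked sampling step, assuming the sampler already has an unnormalized weight per topic (for instance, the usual collapsed-Gibbs conditional) and that topic ids are split into a `generic` and a `specific` set. The helper names are hypothetical; the talk does not specify the sampler's internals.

```python
import random

def allowed_topics(word, summary_vocab, generic, specific):
    """Topics a word may be drawn from under the summary constraint."""
    if word in summary_vocab:
        return sorted(generic)  # seen in a human summary: generic topics only
    return sorted(generic | specific)  # otherwise: any topic

def sample_constrained_topic(word, weights, summary_vocab, generic, specific, rng):
    """Sample a topic id for `word`, where weights[k] is an unnormalized score
    for topic k, masked to the topics the constraint allows."""
    candidates = allowed_topics(word, summary_vocab, generic, specific)
    return rng.choices(candidates, weights=[weights[k] for k in candidates])[0]

rng = random.Random(1)
generic, specific = {0, 1}, {2, 3}
summary_vocab = {"storm"}
weights = [1.0, 1.0, 100.0, 100.0]  # specific topics would dominate if allowed
draws = [sample_constrained_topic("storm", weights, summary_vocab, generic, specific, rng)
         for _ in range(200)]
```

The mask is the only source of supervision here: it pushes summary vocabulary into the generic topics without labeling any topic directly.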
0:06:26 So let me now proceed with CBC. Once we have these two sets of topics,
0:06:31 generic and specific,
0:06:33 we extract features for every single topic.
0:06:36 The first thing we do is basically try to find the most frequent unigrams and bigrams in the
0:06:41 document set,
0:06:42 and then we
0:06:44 generate a set of features for every topic.
0:06:47 The first set of features has
0:06:49 as many features as
0:06:51 the number of frequent terms:
0:06:53 each is the probability of
0:06:55 this topic cluster appearing with that
0:06:58 frequent word.
0:06:59 We also look at all the frequent words for each topic cluster
0:07:04 for the other feature: we basically use a threshold to determine which ones to count, count them,
0:07:10 and then normalize by the number of frequent terms.
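The per-topic features can be sketched as follows. Two readings are assumptions here: "the probability of this topic cluster appearing with this frequent word" is taken as the topic's word probability p(term | topic), and the threshold for the count feature is a free parameter. Names are hypothetical.

```python
from collections import Counter

def frequent_terms(sentences, top_n):
    """Most frequent unigrams and bigrams across the document set."""
    counts = Counter()
    for s in sentences:
        toks = s.lower().split()
        counts.update(toks)
        counts.update(" ".join(pair) for pair in zip(toks, toks[1:]))
    return [term for term, _ in counts.most_common(top_n)]

def topic_features(topic_term_probs, freq_terms, threshold):
    """One feature per frequent term (the topic's probability for that term),
    plus the fraction of frequent terms whose probability reaches `threshold`."""
    probs = [topic_term_probs.get(t, 0.0) for t in freq_terms]
    above = sum(p >= threshold for p in probs)
    return probs + [above / max(len(freq_terms), 1)]

terms = frequent_terms(["storm hit coast", "storm hit town"], top_n=3)
feats = topic_features({"storm": 0.4, "hit": 0.05}, ["storm", "hit", "storm hit"], 0.1)
```

Each topic thus becomes a fixed-length vector, so topics from different document sets can be fed to one classifier.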
0:07:13 We use maximum entropy classification.
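For two classes (generic vs. specific topic), a maximum entropy classifier is equivalent to logistic regression. The sketch below trains one with plain batch gradient ascent; the optimizer, learning rate, epoch count, and optional L2 term are illustrative choices, not the settings used in this work.

```python
import math

def predict_proba(w, b, x):
    """P(y = 1 | x) under the logistic (binary maxent) model, computed stably."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_maxent(X, y, epochs=500, lr=0.5, l2=0.0):
    """Batch gradient ascent on the (optionally L2-penalized) log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, b, xi)  # observed minus expected label
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj + lr * (gj / n - l2 * wj) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

# Toy topic feature vectors: label 1 = generic topic, 0 = specific topic.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_maxent(X, y)
```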
0:07:17 At inference time, when a new set of documents is given,
0:07:20 since we don't have any summaries, we just run the usual LDA model on each document set,
0:07:26 and then we use the classifier that we trained in the previous step
0:07:29 to estimate labels for the K topics,
0:07:32 generic or specific.
0:07:34 Then, using these, we compute the sentence scores
0:07:37 and decide whether a sentence
0:07:39 should be included in the summary or not.
0:07:42 In addition to this, previous work has shown that unigram and bigram frequencies
0:07:47 are useful in determining
0:07:49 whether a sentence should be included in a summary or not,
0:07:51 so we are merging them with the scores based on CBC.
0:07:56 So how do we compute the scores? Basically,
0:08:00 for every sentence, in this first part,
0:08:03 we look at the number of
0:08:06 generic topics it contains and then normalize it by K,
0:08:10 but we also look at the sLDA score and then combine the two of them,
0:08:15 and then, as the next step, we interpolate the CBC score with the unigram and bigram scores.
0:08:22 Basically, the unigram and bigram scores are the normalized total counts of the high-frequency unigrams or bigrams that the
0:08:29 sentence contains.
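The scoring scheme can be sketched as below. The talk only says the two concept scores are combined and then interpolated, so the simple average, the interpolation weight `lam`, and the n-gram normalization (fraction of frequent unigrams and of frequent bigrams that the sentence contains, averaged) are all assumptions. The `slda_score` is taken as an input from the earlier sLDA model.

```python
def ngrams(tokens, n):
    """Set of space-joined n-grams in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_score(sentence, freq_unigrams, freq_bigrams):
    """Assumed normalization: share of frequent unigrams and of frequent
    bigrams found in the sentence, averaged."""
    toks = sentence.lower().split()
    uni = len(ngrams(toks, 1) & freq_unigrams) / max(len(freq_unigrams), 1)
    bi = len(ngrams(toks, 2) & freq_bigrams) / max(len(freq_bigrams), 1)
    return (uni + bi) / 2

def sentence_score(sentence_topics, generic_topics, k, slda_score,
                   sentence, freq_unigrams, freq_bigrams, lam=0.5):
    """Concept score = distinct generic topics in the sentence / K, averaged
    with the sLDA score, then linearly interpolated with the n-gram score."""
    concept = len(set(sentence_topics) & generic_topics) / k
    combined = (concept + slda_score) / 2
    return lam * combined + (1 - lam) * ngram_score(sentence, freq_unigrams, freq_bigrams)

score = sentence_score([0, 1, 3], {0, 1, 2}, 4, 0.3, "storm hit the coast",
                       {"storm", "coast", "market", "rain"}, {"storm hit", "stock market"})
```

Interpolation keeps the frequency signal that earlier work found useful while letting the concept labels reweight sentences.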
0:08:42 Since we need to select a subset of these sentences according to their scores, we apply a greedy approach.
0:08:51 We rank-order all the sentences according to their scores, then start from the highest-scoring one:
0:08:57 we take the highest one,
0:08:59 then start looking at the rest of the sentences,
0:09:03 and we add each one to the summary only if it is not redundant with the sentences
0:09:07 that are already in the summary.
0:09:10 So this is very similar
0:09:12 to the MMR approach, but not exactly the same.
0:09:16 We skip the sentences that are redundant with the already
0:09:21 generated summary,
0:09:22 and we stop once the summary length limit is satisfied.
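The selection loop can be sketched as follows. Redundancy is measured here with word-set Jaccard overlap against a hypothetical threshold, and sentences that would overflow the word budget are skipped; both are assumptions. Unlike MMR, which trades relevance against redundancy with a weight, this is a hard redundancy filter.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def greedy_select(sentences, scores, max_words, threshold=0.5, redundancy=jaccard):
    """Visit sentences from highest to lowest score; keep one only if it fits the
    word budget and is not redundant with any sentence already kept."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        if used + length > max_words:
            continue  # would exceed the length limit
        if any(redundancy(sentences[i], sentences[j]) >= threshold for j in chosen):
            continue  # too similar to the summary built so far
        chosen.append(i)
        used += length
        if used == max_words:
            break  # length limit reached
    return [sentences[i] for i in chosen]

sents = ["storm hits the coast", "the storm hits the coast",
         "markets fall", "officials meet today"]
summary = greedy_select(sents, [0.9, 0.8, 0.7, 0.1], max_words=10)
```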
0:09:27 We evaluate the performance of summarization using ROUGE scores, just like the previous papers.
0:09:33 We look at ROUGE-1, ROUGE-2, and ROUGE-SU4.
0:09:37 ROUGE-1 and ROUGE-2 basically compute the unigram and bigram overlap between
0:09:43 a human summary and the system summary, and ROUGE-SU4 looks at
0:09:47 skip bigrams,
0:09:48 or gap bigrams, up to a distance of four, plus unigrams.
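A simplified, recall-only sketch of these metrics against a single reference. It assumes "distance of four" means at most four words between the two words of a skip bigram, and omits stemming, stopword removal, multiple references, and jackknifing, so its numbers will not match the official ROUGE toolkit.

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def skip_bigram_counts(tokens, max_gap):
    """Ordered word pairs with at most `max_gap` words between them."""
    return Counter((tokens[i], tokens[j])
                   for i in range(len(tokens))
                   for j in range(i + 1, min(i + max_gap + 2, len(tokens))))

def overlap_recall(ref_counts, sys_counts):
    """Clipped matches (multiset intersection) over the reference total."""
    total = sum(ref_counts.values())
    match = sum((ref_counts & sys_counts).values())
    return match / total if total else 0.0

def rouge_n(reference, system, n):
    r, s = reference.lower().split(), system.lower().split()
    return overlap_recall(ngram_counts(r, n), ngram_counts(s, n))

def rouge_su(reference, system, max_gap=4):
    """Skip bigrams plus unigrams (the 'U' in SU)."""
    r, s = reference.lower().split(), system.lower().split()
    ref = skip_bigram_counts(r, max_gap) + ngram_counts(r, 1)
    hyp = skip_bigram_counts(s, max_gap) + ngram_counts(s, 1)
    return overlap_recall(ref, hyp)

ref, hyp = "the storm hit the coast", "the storm hit"
r1, r2, su4 = rouge_n(ref, hyp, 1), rouge_n(ref, hyp, 2), rouge_su(ref, hyp)
```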
0:09:51 For training, we use the DUC 2005 and 2006
0:09:55 data sets.
0:09:56 In total there are a hundred document sets in this data,
0:10:00 and each
0:10:01 document set contains twenty-five news articles,
0:10:04 and there are about eighty thousand sentences.
0:10:06 For testing, we use the DUC 2007
0:10:10 data set, so there are forty-five document sets,
0:10:14 each with twenty-five news articles again, and about twenty-five thousand sentences.
0:10:19 We use these data sets because we want to be able to compare our results with most of the
0:10:25 previous work, which uses them for evaluation.
0:10:28 Also, the
0:10:30 format of the task,
0:10:31 actually the format of the NIST evaluation task, has changed
0:10:34 after 2007, so later evaluations are not comparable with this one.
0:10:39 The goal is to create 250-word summaries from each document set.
0:10:45 Here are the results. For the baseline, we just use the cosine similarity to mark the
0:10:50 initial set of sentences,
0:10:52 similar to the previous work,
0:10:54 but, different from the previous work, here we are only using word-based features
0:11:00 that we are computing, so it is a bit weaker than the previous work.
0:11:04 Another baseline that we are comparing with is the PYTHY system. This
0:11:09 was also classification-based summarization, and it was the top performer in
0:11:14 that evaluation,
0:11:15 but it uses much more sophisticated features than the ones that we are trying.
0:11:20 The other one is HierSum, which uses a hierarchical LDA-based
0:11:25 generative method.
0:11:27 The way they form the summaries is basically this: they use hLDA
0:11:32 to find the topics,
0:11:34 then they pick the sentences;
0:11:36 they try to keep the topic distribution of the original document
0:11:40 in the summary, and they use the KL divergence measure to make sure that
0:11:44 the summary topics
0:11:46 are not significantly different from the original document topics.
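The KL-divergence criterion can be illustrated with a minimal sketch: compare topic distributions given as dicts and pick the candidate summary closest to the document. The direction KL(document || summary) and the `eps` floor for zero probabilities are assumptions for illustration, not details confirmed for HierSum.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over topic ids; eps floors q to avoid log(0) (assumed smoothing)."""
    return sum(pk * math.log(pk / max(q.get(k, 0.0), eps))
               for k, pk in p.items() if pk > 0)

def closest_summary(doc_dist, candidate_dists):
    """Index of the candidate whose topic distribution is closest to the document's."""
    return min(range(len(candidate_dists)),
               key=lambda i: kl_divergence(doc_dist, candidate_dists[i]))

doc = {"a": 0.6, "b": 0.4}
candidates = [{"a": 0.9, "b": 0.1}, {"a": 0.55, "b": 0.45}]
best = closest_summary(doc, candidates)
```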
0:11:50 In terms of ROUGE-1 and ROUGE-2
0:11:53 scores, both sLDA and CBC perform
0:11:56 significantly better than all the other approaches,
0:11:59 and CBC is a little bit better than sLDA, but it also uses sLDA.
0:12:04 In terms of ROUGE-SU4, we are about the same, so we are in the ballpark.
0:12:07 One of the main reasons is that some of the previous work actually optimizes for ROUGE,
0:12:12 using both unigrams and bigrams,
0:12:15 but we didn't have access to the HierSum summaries, so we don't know exactly what they did.
0:12:22 So, in conclusion, we are trying to learn a summary content distribution
0:12:27 from the provided document sets,
0:12:34 and we use the human summaries to have some supervision, and we are finding the
0:12:39 generic and the specific concepts.
0:12:42 We have shown improvements in terms of ROUGE scores.
0:12:45 There is some future work that we are thinking of.
0:12:49 One item I forgot to include here is that
0:12:53 we are only computing word-based features, so future work is to actually improve
0:12:58 the feature set
0:12:59 that we are using.
0:13:00 The other future work item is the sentence selection part.
0:13:06 In our previous work, we have shown that
0:13:09 you can actually do an exact search using integer linear programming,
0:13:12 and that system was the best-performing system in TAC,
0:13:18 in 2009 and 2010.
0:13:19 So it can easily be adapted here; that's also part of
0:13:24 future work, because we are also finding concepts, and that approach
0:13:27 optimizes the concepts that are included in the summary.
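To show what "exact search" over concepts means, here is a brute-force sketch of a concept-coverage objective: pick the sentence subset that maximizes the total weight of distinct concepts covered, subject to a length budget. It enumerates subsets instead of calling an ILP solver, so it is only viable for a handful of sentences; the weights and the exact objective are illustrative assumptions.

```python
from itertools import combinations

def exact_concept_summary(sent_concepts, sent_lengths, concept_weights, max_len):
    """Exhaustively search for the sentence subset that maximizes the total
    weight of distinct concepts covered, subject to a length budget."""
    n = len(sent_concepts)
    best_value, best_subset = 0.0, ()
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if sum(sent_lengths[i] for i in subset) > max_len:
                continue  # violates the length constraint
            covered = set().union(*(sent_concepts[i] for i in subset))
            value = sum(concept_weights[c] for c in covered)  # each concept counted once
            if value > best_value:
                best_value, best_subset = value, subset
    return best_subset, best_value

concepts = [{"storm", "coast"}, {"storm"}, {"market"}, {"coast", "market"}]
result = exact_concept_summary(concepts, [10, 5, 5, 10],
                               {"storm": 3, "coast": 2, "market": 2}, max_len=15)
```

Counting each concept once is what discourages redundancy, so no separate redundancy check is needed, in contrast to the greedy selection described earlier.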
0:13:31 Furthermore, we have also proposed a hierarchical topic model
0:13:34 for summarization,
0:13:35 and that is the direction we are moving towards.
0:13:40 Thank you.
0:14:08 A possible simple solution would be to cover most of the ones that are available,
0:14:14 as far as you can.
0:14:18 Actually, we did not
0:14:21 try to cover all of the
0:14:22 generic concepts,
0:14:24 but in the ILP framework we could actually do that:
0:14:30 we could try to maximize the generic concepts covered,
0:14:37 but we haven't had time.
0:15:09 As far as I know, they have information from just the
0:15:13 human summaries.
0:15:40 Well, actually, even with just position-based features, if you take the first sentences, most of the
0:15:45 previous work has shown that this really does well.
0:15:48 But of course, it's not...