We are dealing with document summarization. The goal of summarization is finding the most important bits of information from either a single document, such as a news story or a voicemail, or multiple documents, such as reviews of a product or news stories about an event, or spoken documents, such as broadcast news, broadcast conversations, or lectures.

The main issue we are tackling is the information overload problem. There are a variety of sources from which we get information, these are usually redundant, and we only have a limited time to process all of this information. Usually the information is also not in the optimal order; sometimes we read a paper and everything tells us we should have read another paper before this one to be able to understand it. So we are working on both speech and text summarization.

NIST has been organizing summarization evaluations, providing researchers a good framework to do summarization research, and these include multi-document summarization; this has been going on for the past ten years. Researchers are provided a set of documents paired with corresponding human summaries, and I will be showing results on the NIST data, to be comparable with the previous research.

In the related work, people have treated summarization as a classification problem, and we are doing the same. Usually in these approaches the original document sentences do not have a category; we only have the human summaries. So the first step is assigning a category to each document sentence, such as summary sentence or non-summary sentence. Most of the previous work has used word-based similarity measures between the document sentences and the summary to assign labels to the sentences, and then trained a binary classifier.
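The word-based labeling step just described can be sketched as follows. This is a minimal illustration, not the exact setup from the prior work: the threshold value, the toy sentences, and plain cosine similarity over raw word counts are my own assumptions.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def label_sentences(doc_sentences, summary, threshold=0.3):
    """Mark each document sentence as a summary sentence (1) or not (0)
    by word overlap with the human summary -- the word-based labeling
    scheme used by the classification approaches described above."""
    summary_bag = Counter(summary.lower().split())
    labels = []
    for sent in doc_sentences:
        sim = cosine(Counter(sent.lower().split()), summary_bag)
        labels.append(1 if sim >= threshold else 0)
    return labels

sents = ["the storm hit the coast on monday",
         "officials said damage was heavy",
         "a local bakery reopened last week"]
human = "a heavy storm hit the coast and caused damage"
print(label_sentences(sents, human, threshold=0.3))  # -> [1, 0, 0]
```

The labeled sentences would then feed a binary classifier; note how the second sentence, which a human might consider summary-worthy, falls below the surface-overlap threshold, which is exactly the weakness discussed next.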
The classifier uses features such as the sentence length and the position in the document. The drawback in such approaches is that a word-based similarity measure usually fails to capture the semantic similarity between the sentences and the summaries. So, in addition to the summarization-as-classification approach, we use generative models, and use their latent concepts to estimate the similarity between the sentences and the documents, so that we can attack this problem.

Generative models have been used by others for summarization as well; for example, Haghighi and Vanderwende have used hierarchical topic models, and we have used Latent Dirichlet Allocation (LDA) based models as well. Our work here is based on our previous work, SumLDA, which is a semi-supervised extractive summarization method. It uses a supervised version of LDA to cluster the document and summary terms into topics, and then uses classification. The main assumption is that there are two types of concepts in the documents: generic concepts and specific concepts. The generic ones are the ones that are usually included in the summary; that is the main assumption. The specific ones are the ones that are usually specific to each individual document.

At a very high level, the process is as follows. We have a set of documents and the corresponding human summaries, and we use SumLDA, which I will describe next, to estimate two sets of latent variables, the generic concepts and the specific concepts, using some supervision from the human summaries. Then we go back and look at the original training document sentences, and we mark the sentences that contain specific concepts as negative examples.
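The transfer of topic-level labels down to sentences can be sketched as below. The majority-vote rule, the per-word topic map, and all names here are illustrative assumptions standing in for the real per-token assignments an LDA sampler would produce.

```python
def sentence_labels(sentences, word_topic, generic_topics):
    """Transfer topic-level labels to sentences: a sentence whose words
    are mostly assigned to generic topics becomes a positive training
    example, otherwise a negative one. `word_topic` is a toy stand-in
    for real per-token topic assignments from the (Sum)LDA sampler."""
    labels = []
    for sent in sentences:
        words = sent.lower().split()
        generic = sum(1 for w in words if word_topic.get(w) in generic_topics)
        labels.append(1 if generic >= len(words) / 2 else 0)
    return labels

word_topic = {"storm": 0, "coast": 0, "damage": 0, "bakery": 1, "reopened": 1}
print(sentence_labels(["storm damage coast", "bakery reopened early"],
                      word_topic, generic_topics={0}))  # -> [1, 0]
```

This label-transfer step is exactly where the suboptimality discussed next creeps in: the sentence label is only as good as the topic-to-sentence mapping.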
The sentences that contain generic concepts are marked as positive examples. Now we train a classifier, and at inference time we use that classifier to decide if a sentence should be included in the summary or not.

While this works, there is a cost, or rather a suboptimality, in transferring labels from the topics to the sentences. So in this work we are basically looking at the topics: we are trying to classify the latent variables themselves instead of the sentences, and to learn to distinguish, in terms of those latent variables, what would be useful when we are trying to summarize. It works like this: we train a classifier to distinguish the two types of topics; at inference time we use regular LDA to find the latent topics, use the classifier to determine which topics should be in the summary and which should not, and then pick the sentences that include the generic concepts.

Here is more detail. I believe LDA was already introduced this morning; our SumLDA approach is an extension of LDA for the summarization task. LDA is a generative model, and it allows us to explain a set of observations by unobserved, hidden, latent groups, which explains why some parts of the data are similar. The assumption is that each document is a mixture of a small number of topics, and each word occurrence is attributable to one of the document's topics, so each word is sampled from a topic. What we do in SumLDA is very similar to how LDA works, but whereas LDA uses unigrams, we are looking at both unigrams and bigrams; and when we are looking at a unit, we check whether it already appears in the human summary or not.
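The LDA generative story summarized above can be sketched as a sampling procedure. The vocabulary, topic mixtures, and function names below are invented for illustration; a real model would of course infer these distributions from data rather than hard-code them.

```python
import random

def generate_document(doc_topic_probs, topic_word_probs, length, rng):
    """Sample a toy document under the LDA generative story: each word
    position first draws a topic from the document's topic mixture,
    then draws a word from that topic's word distribution."""
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(doc_topic_probs)),
                            weights=doc_topic_probs)[0]
        vocab, probs = zip(*topic_word_probs[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return words

topics = [
    {"election": 0.5, "vote": 0.5},   # a "politics" topic
    {"rain": 0.6, "flood": 0.4},      # a "weather" topic
]
# A document that is a 70/30 mixture of the two topics.
doc = generate_document([0.7, 0.3], topics, length=8, rng=random.Random(0))
print(doc)
```

SumLDA modifies this story only in the sampling constraint: for units observed in the human summary, the topic draw is restricted to the generic set, as described next.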
We estimate two sets of topics, the generic concepts and the specific concepts, and we assume that a unit appearing in the human summary should be sampled from the generic concepts instead of the specific ones; if it does not appear there, it can be generated from any topic. Then we proceed with the CVC approach.

Once we have these two sets of topics, generic and specific, we extract features for every single topic. The first thing we do is basically find the most frequent unigrams and bigrams in the documents, and these generate a set of features for every topic. In the first set there are as many features as the number of frequent terms: each feature is the probability of this topic cluster appearing with this frequent term. We also look at how many of the frequent terms each topic cluster includes; that is the other feature, where we basically use a threshold, count the terms above it, and normalize by the number of frequent terms. We use maximum entropy classification.

At inference time, a new set of documents is given. Since we do not have any summaries, we run just the usual LDA model on each document set, and then we use the classifier that we trained in the previous step to estimate labels for the K topics, generic or specific. Using these, we compute the sentence scores and decide if a sentence should be included in the summary or not.

In addition to this, previous work has shown that unigram and bigram frequencies are useful in determining if a sentence should be included in a summary or not, so we are merging the scores. With CVC we compute a score for every sentence: in the first part we look at the number of generic topics it contains and normalize it by K, and we also look at the SumLDA score, and we interpolate the two of them.
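The per-topic feature extraction described above can be sketched as follows. The threshold value, the toy probabilities, and the function names are my assumptions; the paper's exact feature set may differ.

```python
def topic_features(topic_word_probs, frequent_terms, threshold=0.01):
    """Features for one topic, as sketched in the talk:
    (i) the topic's probability for each frequent term, and
    (ii) the fraction of frequent terms the topic contains with
    probability above a threshold (counted, then normalized)."""
    probs = [topic_word_probs.get(t, 0.0) for t in frequent_terms]
    coverage = sum(1 for p in probs if p > threshold) / len(frequent_terms)
    return probs + [coverage]

topic = {"storm": 0.30, "coast": 0.20, "damage": 0.05}
feats = topic_features(topic, ["storm", "damage", "bakery"])
print(feats)  # per-term probabilities plus the coverage fraction
```

These feature vectors, one per topic, would then be fed to the maximum entropy classifier to label each topic as generic or specific.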
We then interpolate the CVC score with the unigram and bigram scores. The unigram and bigram scores are basically normalized counts of the high-frequency unigrams or bigrams that the candidate sentence contains.

Since we need to pick a subset of the sentences given their scores, we apply a greedy selection. We first order all the sentences according to their scores and start from the highest-scoring one: we take the highest one, and then we start looking at the rest of the sentences, adding a sentence to the summary only if it is not redundant with the sentences that are already in the summary. This is very similar to the MMR approach, but not exactly the same. We keep dropping the sentences that are largely redundant with the already generated summary, and we stop when the summary length is satisfied.

We evaluate the performance of summarization using ROUGE scores, just like the previous papers; we look at ROUGE-1, ROUGE-2, and ROUGE-SU4. ROUGE-1 and ROUGE-2 basically compute the unigram and bigram overlap between the human and system summaries, and ROUGE-SU4 looks at skip bigrams with a gap of up to four words.

For training we use the DUC 2005 and 2006 data sets: there are a hundred document sets in total, each document set contains twenty-five news articles, and that is about eighty thousand sentences. For testing we use the DUC 2007 data set: forty-five document sets, each again with twenty-five news articles, about twenty-five thousand sentences. We use these because we want to be able to compare our results with most of the previous work that used these data sets for evaluation. The format of the NIST evaluation task actually changed after 2007, so our results are not comparable with the later evaluations.
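The greedy, redundancy-aware selection described above can be sketched like this. The word-overlap redundancy measure, the overlap threshold, and the toy scores are illustrative choices, not necessarily the paper's exact measure.

```python
def greedy_select(scored_sentences, max_words, max_overlap=0.5):
    """Greedy selection: take sentences in score order, skip any that
    is too redundant with the summary so far, stop at the word budget.
    Redundancy is measured here as word overlap -- an illustrative
    stand-in for whatever similarity the real system uses."""
    summary, summary_words = [], set()
    for sent, score in sorted(scored_sentences, key=lambda x: -x[1]):
        words = set(sent.lower().split())
        if summary and len(words & summary_words) / len(words) > max_overlap:
            continue  # too redundant with what we already picked
        if sum(len(s.split()) for s in summary) + len(sent.split()) > max_words:
            break  # length budget reached
        summary.append(sent)
        summary_words |= words
    return summary

cands = [("the storm hit the coast", 0.9),
         ("a storm hit the coast monday", 0.8),
         ("officials reported heavy damage", 0.7)]
print(greedy_select(cands, max_words=12))
# -> ['the storm hit the coast', 'officials reported heavy damage']
```

Note how the second candidate, despite its high score, is dropped for overlapping the first, which is the MMR-like behavior the talk describes.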
The goal is to create 250-word summaries from each document set. Here are the results.

For the baseline, we just use the cosine similarity to mark the initial set of sentences, similar to the previous work; but different from the previous work, here we are only using word-based features, so it is a bit weaker than the previous systems. Another baseline that we are comparing with is the PYTHY system: that was also classification-based summarization and was the top performer in the DUC 2007 evaluation, but it uses much more sophisticated features than the ones we are using. The other one is HierSum, the one that uses a hierarchical LDA based generative method. The way they form the summaries is that they use hLDA to find the topics, and then they pick sentences so as to keep the summary's topic distribution close to that of the original documents, using the KL divergence measure to make sure that the summary topics are not significantly different from the original document topics.

In terms of ROUGE-1 and ROUGE-2 scores, both SumLDA and CVC perform significantly better than all the other approaches, and CVC does a little bit better than SumLDA, while also making use of the SumLDA score. In terms of ROUGE-SU4 we are about the same, in the ballpark. One of the main reasons is that some of the previous work actually optimizes toward ROUGE-SU4, as they are using both unigrams and bigrams; but we did not have access to their system summaries, so we do not know exactly what the reason is.
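The ROUGE-n recall used in these comparisons can be sketched as below. This is a bare-bones version for a single reference; the official ROUGE toolkit additionally handles stemming, stopword options, multiple references, and the SU4 skip-bigram variant.

```python
from collections import Counter

def ngrams(words, n):
    """Multiset of n-grams in a word sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def rouge_n(system, reference, n):
    """ROUGE-n recall: the fraction of the reference summary's n-grams
    that the system summary recovers (with clipped counts)."""
    sys_ngrams = ngrams(system.lower().split(), n)
    ref_ngrams = ngrams(reference.lower().split(), n)
    matched = sum(min(c, sys_ngrams[g]) for g, c in ref_ngrams.items())
    total = sum(ref_ngrams.values())
    return matched / total if total else 0.0

print(rouge_n("the storm hit the coast",
              "a storm hit the coast and flooded roads", 2))  # -> 3/7
```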
In conclusion, we are trying to learn a summary content distribution from the given document sets, using the human summaries to have some supervision, and we are finding the generic and the specific concepts. We have shown improvements in terms of ROUGE scores.

There are a few things that we think can be improved; one of them I forgot to include here. We are not really competing on features, since we are only using word-based features, so one piece of future work is to improve the feature set that we are using. The other piece of future work is improving the sentence selection part. In our previous work we have shown that you can actually do an exact search using integer linear programming, and the ICSI system was the best performing system in the TAC 2008 and 2009 evaluations. That approach can easily be adapted here, since we are also finding concepts and then deciding which of the generic or specific concepts should be included in the summary; so that is also part of the future work. Furthermore, we have also proposed a hierarchical topic model for summarization, and that is the direction this research is moving towards. Thank you.

[Question, partly inaudible, about whether the coverage of the generic concepts could be optimized directly.] Yes, within the ILP framework we could actually do that: we could try to maximize the generic concepts that end up in the summary. We have not done that yet.

[Question, partly inaudible, about position-based features and the human summaries.] Well, actually, even position-based features alone are strong: if you take the first sentences from most of the documents, previous work has shown that this does really well. But that is not all of it, of course.
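The concept-based exact search mentioned as future work can be sketched as follows. A real system would solve this as an integer linear program (as in the ICSI system); the brute-force search, the substring-based concept test, and the toy weights below are only illustrative and viable only for tiny inputs.

```python
from itertools import combinations

def exact_select(sentences, concept_weight, max_words):
    """Exhaustive version of concept-based exact selection: choose the
    sentence subset covering the highest total concept weight within
    the length budget. Each covered concept is counted once, which is
    what the ILP formulation encodes with indicator variables."""
    best, best_score = [], -1.0
    for r in range(len(sentences) + 1):
        for subset in combinations(sentences, r):
            if sum(len(s.split()) for s in subset) > max_words:
                continue
            covered = {c for s in subset for c in concept_weight if c in s}
            score = sum(concept_weight[c] for c in covered)
            if score > best_score:
                best, best_score = list(subset), score
    return best, best_score

weights = {"storm": 3.0, "coast": 2.0, "damage": 2.0, "bakery": 0.5}
sents = ["storm hits coast", "heavy damage reported", "bakery reopens"]
print(exact_select(sents, weights, max_words=6))
# -> (['storm hits coast', 'heavy damage reported'], 7.0)
```

Maximizing generic-concept coverage, as raised in the question above, would amount to using the generic concepts and their weights as `concept_weight` in this objective.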