0:00:15 Thank you. My name is Elahe, and I work with Marilyn Walker at the Natural Language and Dialogue Systems Lab at UC Santa Cruz. I'm going to talk about learning fine-grained knowledge about contingent relations between everyday events.
0:00:32 Our goal in this work is to capture commonsense knowledge about the fine-grained events of everyday experience, the events that occur in people's everyday lives: for example, opening a fridge enables preparing food, or an alarm going off triggers waking up and getting out of bed. We believe that the relation between these events is a contingency relation, based on the definition of contingency from the Penn Discourse Treebank, which has two types, Cause and Condition.
0:01:13 Another motivation of our work is that much of the user-generated content on social media is produced by ordinary people telling stories about their daily lives, and stories are rich in commonsense knowledge and in contingency relations between events. I have two examples here from our dataset. Our dataset is actually a subset of the Spinn3r corpus, which has millions of blog posts and contains personal stories written by people about their daily lives on their blogs.
0:01:47 In the examples you can see that there are sequences of coherent events in the stories. The first one is about going on a camping trip, and you have a sequence of events: they pack everything, they wake up in the morning to get to the campground, they find a place and set up the tent. The second story is about witnessing a storm: the hurricane makes landfall, the wind blew, a tree fell, and then people start cleaning up, getting rid of the trees and all the debris.
0:02:23 This commonsense knowledge and the contingency relations between events exist implicitly in stories, and we want to learn them. I will show in this talk that this fine-grained knowledge is not found in previous work on the extraction of narrative events and event collections.
0:02:45 Much of the previous work does not focus specifically on the relation between the events; they characterize what they learn as collections of events that tend to co-occur, and they are somewhat vague about what the relation between the events in a sequence actually is. The previous work has also mostly focused on the newswire genre, so the type of knowledge they can learn is limited to the newsworthy events found in news articles, like bombings and explosions. As for evaluation, they have mostly used the narrative cloze test, which we believe is not the right way to evaluate this type of knowledge.
0:03:25 So in our work we focus on the contingency relation between events. We use personal blog stories as the dataset, so we can learn new types of knowledge about events other than the newsworthy ones, and we use two evaluation methods: one of them is inspired and motivated by previous work, and the other one is completely new.
0:03:57 The stories in this dataset tend to be told in chronological order, so there is a temporal order between the events told in the story. This is great, because temporal order between events is a strong cue to contingency, so it makes this dataset suitable for our task.
0:04:15 But this dataset comes with its own challenges. It has a more informal structure compared to news articles, which are well structured, and the structure of these stories is more similar to oral narrative. In one of our previous studies we applied Labov and Waletzky's oral narrative framework to label the clauses in these personal stories, and we showed that only about a third of the sentences in the personal narratives describe actions and events; the other two thirds talk about the background and try to describe the emotional state of the narrator.
0:04:55 I have an example here. I'm not going to go through all the labels, but you can see that there is some background, like the orientation at the beginning of the story, then there are some actions and events about the person getting arrested by the traffic police, and then there is the narrator's evaluation, like saying they let me go free. So it is not all events; there is a lot of other material going on, which makes it more challenging.
0:05:24 So we need novel methods to infer useful relations between events from this dataset, and I'm going to show in the experiments that if we apply the methods that work on news articles to extract event collections, we do not get good results on this dataset.
0:05:44 What are events? We define an event as the verb with three arguments: the subject, the direct object, and the particle. Here are some examples. This definition is motivated by previous work by Pichotta and Mooney in 2014, who showed that a multi-argument representation is richer and more capable of capturing the interaction between events. They used the verb with its subject and object; we also added the particle, because we think it is necessary for conveying the right meaning of an event. For example, the first event in the table is "put up tent": you have "put", the direct object "tent", and the particle "up", and you can see how all of these arguments contribute to the meaning of the event. "Put" by itself has a different meaning than "put up a tent", and the particle matters: "put" and "put up" tell you different things. In our genre this is especially important, because the writing is more informal and has a lot of light verbs, so it is important to have all the arguments in the event representation.
0:06:54 For extracting events we use the Stanford dependency parser; we use the dependency parse trees to extract the verbs and their arguments. We also use the Stanford named entity recognizer to do a bit of generalization of the arguments: for example, phrases that refer to a location are mapped to their type, LOCATION, and the same for PERSON, TIME, and DATE.
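To make this step concrete, here is a minimal sketch of extracting (subject, verb, particle, direct object) tuples and generalizing named-entity arguments to their types. The talk uses the Stanford dependency parser and NER; this sketch substitutes spaCy, which bundles both, so the labels and output are illustrative rather than the exact pipeline from the paper.

```python
# Minimal sketch of the event extraction step described above, using spaCy
# as a stand-in for the Stanford dependency parser and NER used in the talk.
# The (subject, verb, particle, direct_object) tuple mirrors the event
# representation; entity types (PERSON, GPE, DATE, TIME) generalize arguments.
import spacy

nlp = spacy.load("en_core_web_sm")

def generalize(token):
    """Replace named entities with their type, otherwise keep the lemma."""
    return token.ent_type_ if token.ent_type_ else token.lemma_.lower()

def extract_events(text):
    events = []
    for token in nlp(text):
        if token.pos_ != "VERB":
            continue
        subj = obj = prt = None
        for child in token.children:
            if child.dep_ in ("nsubj", "nsubjpass"):
                subj = generalize(child)
            elif child.dep_ in ("dobj", "obj"):
                obj = generalize(child)
            elif child.dep_ == "prt":          # verb particle, e.g. "put *up*"
                prt = child.lemma_.lower()
        events.append((subj, token.lemma_.lower(), prt, obj))
    return events

print(extract_events("We put up the tent near the lake."))
# e.g. [('we', 'put', 'up', 'tent')]
```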
0:07:23 So, the contributions of our work: we have a data collection step in which we generate topic-sorted sets of personal stories using a bootstrapping algorithm. Then we directly compare our method for extracting contingency relations between events on a general-domain set of stories and on the topic-specific data that we have generated, and we show that we can learn more fine-grained, richer, and more interesting knowledge from the topic-specific corpus, and that our model works significantly better on the topic-specific corpus. This is the first time this comparison has been done directly on these two types of dataset for event collection, and we show that the improvement is possible even with a smaller amount of data on the topic-specific corpus. We also have experiments in which we directly compare our work to the most relevant previous work, and we use two evaluation methods for these experiments.
0:08:25 Now, the data collection. We have an unsupervised algorithm for generating a topic-specific dataset using a bootstrapping method. The corpus here is the general, unannotated blog corpus that has all the personal blog stories. We first manually label a small seed set for the bootstrapping, about two to three hundred stories for each topic, and we feed it to AutoSlog-TS, a weakly supervised pattern learner. It generates event patterns specific to that topic: for example, if you are looking at the camping trip stories, it can generate a pattern like an NP followed by a preposition followed by an optional NP, where the head of the first noun phrase is, for instance, "camping". So it generates event patterns that are strongly correlated with the topic, and then we use these patterns to bootstrap, automatically labeling more stories on that topic from the corpus: we feed in the unlabeled data, apply the patterns, and based on how many of a topic's patterns we find in an unlabeled story, we label it with that topic. So we start with about two to three hundred stories for each topic, and with one round of bootstrapping we generate about a thousand newly labeled stories.
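As a rough illustration of the labeling half of this loop only (the event patterns themselves come from AutoSlog-TS in the talk), here is a sketch in which an unannotated story is assigned a topic when enough of that topic's patterns match its extracted events. The match test and the threshold are illustrative assumptions, not the paper's exact criteria.

```python
# Rough sketch of the bootstrapping labeling step: label an unannotated story
# with a topic if enough of that topic's learned event patterns match it.
# The threshold and exact-string matching are assumptions for illustration.
from collections import Counter

def label_story(story_events, topic_patterns, min_matches=3):
    """story_events: list of event strings extracted from one story.
    topic_patterns: dict mapping topic -> set of event patterns (strings).
    Returns the best-matching topic, or None if no topic passes the threshold."""
    scores = Counter()
    for topic, patterns in topic_patterns.items():
        scores[topic] = sum(1 for ev in story_events if ev in patterns)
    topic, hits = scores.most_common(1)[0]
    return topic if hits >= min_matches else None

topic_patterns = {
    "camping": {"go camping", "put up tent", "pack gear"},
    "storm":   {"make landfall", "lose power", "tree fall"},
}
print(label_story(["pack gear", "go camping", "put up tent"], topic_patterns))
# -> 'camping'
```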
0:09:55 Here I am presenting the results on two topics from our corpus: the camping stories and stories about witnessing a major storm. Starting with about three hundred stories each, we generated an expanded corpus of about a thousand stories per topic.
0:10:13 For learning the contingency relation between events we use the causal potential method introduced by Beamer and Girju in 2009. It is an unsupervised distributional measure: it measures the tendency of an event pair to encode a causal relation, and event pairs with a higher causal potential score have a higher probability of occurring in a causal context. The first component is the pointwise mutual information, and the second one takes into account the temporal order between the events: if the events co-occur more often in this particular order, we get a higher causal potential score. This is great for our corpus, because the events tend to be told in the right temporal order.
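As a hedged reconstruction of the measure from this description, the causal potential of an ordered event pair combines pointwise mutual information with a term that rewards the pair for occurring in this particular temporal order:

```latex
% Causal potential as described above (Beamer and Girju, 2009):
% PMI plus a term favoring the observed temporal order.
\[
CP(e_1, e_2) \;=\;
\underbrace{\log \frac{P(e_1, e_2)}{P(e_1)\,P(e_2)}}_{\text{PMI}}
\;+\;
\underbrace{\log \frac{P(e_1 \rightarrow e_2)}{P(e_2 \rightarrow e_1)}}_{\text{temporal order}}
\]
% where P(e_i -> e_j) is the probability of e_i occurring before e_j.
```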
0:11:08 We then calculate the causal potential for every pair of adjacent events in the corpus using a skip-2 bigram model. Because, as I showed in the example, not all sentences are events, and events can be interrupted by non-events, we use this skip-2 bigram, which defines two events to be adjacent if they are within two or fewer events of each other.
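Here is a minimal sketch of how pair counts could be gathered under this skip-2 adjacency and turned into causal potential scores; the smoothing and other corpus-specific details are simplified assumptions.

```python
# Minimal sketch: collect ordered pair counts under the skip-2 bigram model
# (two events are "adjacent" if separated by at most two other events) and
# score pairs with the causal potential formula above. Add-one smoothing on
# the order term is an illustrative simplification.
import math
from collections import Counter

def pair_counts(event_sequences, max_skip=2):
    unigrams, ordered = Counter(), Counter()
    for events in event_sequences:            # one list of events per story
        unigrams.update(events)
        for i, e1 in enumerate(events):
            for e2 in events[i + 1 : i + 2 + max_skip]:
                ordered[(e1, e2)] += 1        # counts respect temporal order
    return unigrams, ordered

def causal_potential(e1, e2, unigrams, ordered):
    n = sum(unigrams.values())
    p1, p2 = unigrams[e1] / n, unigrams[e2] / n
    p12 = (ordered[(e1, e2)] + ordered[(e2, e1)]) / sum(ordered.values())
    pmi = math.log(p12 / (p1 * p2))
    order = math.log((ordered[(e1, e2)] + 1) / (ordered[(e2, e1)] + 1))
    return pmi + order

stories = [["wake up", "pack gear", "drive", "put up tent", "build fire"]]
uni, pairs = pair_counts(stories)
print(round(causal_potential("pack gear", "put up tent", uni, pairs), 2))
```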
0:11:38 Most of the previous work uses the narrative cloze test for evaluating the sequences of events they have learned: you are given a sequence of narrative events in a document from which one event has been removed, and the task is to predict the removed event. We believe this is not suitable for our task of evaluating the coherence of events. Also, previous work by Pichotta and Mooney showed that a unigram model's results on this task are nearly as good as those of more complicated, more sophisticated models, so it is not good at capturing all the capabilities of a model.
0:12:22 So we are proposing a new evaluation method, which is motivated by COPA. COPA was an evaluation for commonsense causal reasoning that uses two-choice questions. We automatically generate these two-choice questions from a separate held-out test set that we have for each dataset. Each question consists of one question event (for example, the one shown here), which is extracted from the test set, so it occurs in the test set. One of the choices, which is the correct answer, is the event that follows the question event in the test set; the second one, which is not the correct answer, is randomly generated from the list of all events in the test set. So the event that actually follows the question event in the test data is the correct choice, and the distractor is randomly generated. The model is supposed to predict which of these two choices is more likely to have a contingency relation with the event in the question, and we calculate accuracy based on the answers the model gives.
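A small sketch of this two-choice evaluation, assuming the question/answer event pairs have already been read from the held-out test set; `score` is a placeholder for whatever model is being evaluated (for example, the causal potential function above).

```python
# Sketch of the COPA-style two-choice evaluation described above. For each
# question event from the held-out test set, the correct choice is the event
# that actually follows it there; the distractor is sampled at random. A model
# is counted correct when it scores the true follow-up higher.
import random

def make_questions(test_pairs, all_events, seed=0):
    rng = random.Random(seed)
    questions = []
    for q_event, true_next in test_pairs:
        distractor = rng.choice([e for e in all_events if e != true_next])
        questions.append((q_event, true_next, distractor))
    return questions

def accuracy(questions, score):
    correct = sum(score(q, a) > score(q, b) for q, a, b in questions)
    return correct / len(questions)

# usage: accuracy(make_questions(test_pairs, all_events), score=my_model_score)
```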
0:13:39 The previous work we compare to directly is the work by Balasubramanian et al. in 2013. They generate what they call Rel-gram tuples, basically pairs of relational tuples of events, so they generate pairs of events that tend to occur together. They use news articles as their dataset and use co-occurrence statistics based on symmetric conditional probability, the SCP here, which basically just combines the bigram model in two directions.
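Assuming the standard definition from the Rel-grams work, "combining the bigram model in two directions" corresponds to multiplying the conditional probabilities taken in both orders:

```latex
% Symmetric conditional probability: the bigram (conditional) probability
% in both directions, multiplied together.
\[
SCP(e_1, e_2) \;=\; P(e_1 \mid e_2)\,P(e_2 \mid e_1)
\;=\; \frac{P(e_1, e_2)^2}{P(e_1)\,P(e_2)}
\]
```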
0:14:17 The Rel-grams they have learned on their corpus are publicly available; you can access them through an online search interface. And they showed in their work that they outperform the previous work on learning narrative events.
0:14:34 We designed two experiments to compare against this previous work. We compare the content of what we have learned, to show that what we learn does not exist in the previous collections, and we also apply their model to our dataset, to show that a model that works on more structured data like news articles cannot get good results on our data. For the baselines we use a unigram model, which is basically the prior probability distribution of the events; a bigram model, which is the bigram probability of the event pair, again using the skip-2 bigram model; and event SCP, the symmetric conditional probability from the Rel-grams work.
0:15:16 And our main method here is the causal potential. We have two datasets. In the general-domain stories dataset, the stories are randomly selected from the corpus and do not have a specific theme or topic; we have four thousand stories in the training set and two hundred stories in the held-out test set. We also have the topic-specific datasets; here I will be presenting the results on two topics, the camping stories and the stories about witnessing a storm. Here is the data split for each topic: we split the hand-labeled set into test and training, so we have a hand-labeled test set and a hand-labeled training set, and then we create for each topic a larger training set that has the hand-labeled training data plus the bootstrapped data, to see whether the bootstrapping is helpful at all or not.
0:16:14 Here are the results; this is the accuracy on all the two-choice questions for each topic. I am reporting the results of the baselines on the largest training set, the hand-labeled data plus the bootstrapped data, because the results on the hand-labeled data alone are just a little worse, so I am reporting the best results for the baselines. For causal potential I have the results for both the hand-labeled training set, which is small, about one or two hundred stories, and the largest training set, about a thousand stories, which is the hand-labeled data plus the bootstrapped data. You can see that the causal potential results are significantly stronger than all of the baselines, and also that the results on the topic-specific datasets are significantly stronger than the results on the general domain. Even for causal potential, the accuracy on the general domain is around 0.51, but on the topic-specific data, even with a smaller dataset, we get about 68 percent accuracy for one topic and about 88 percent accuracy for the other. Also, if you compare the results on the smaller hand-labeled training set to the training set with the bootstrapped data, which is larger, you can see that the additional training data collected by bootstrapping improves the results, so the bootstrapping was actually effective.
0:17:52 Note that event SCP and the bigram models that were used in previous work for generating these event collections did not work very well on our dataset. The next thing is that we want to compare the content of what we have learned and see whether it actually exists in the previous collections or not. So here I want to show the results of comparing the event pairs we extracted from the camping trip stories against the Rel-gram tuples. The Rel-grams are not sorted by topic.
0:18:30 So, to get the ones related to camping, we used our top ten event patterns generated in the bootstrapping process and used them to search their interface. For example, "go camping" is one of the event patterns we have; we search for it in the interface and retrieve all the pairs in which at least one of the events is "go camping". Then we apply the filtering and ranking used in their paper, filtering based on frequency and ranking based on their symmetric conditional probability metric, and we evaluate the top-ranked pairs on our next evaluation task, which I will describe next.
0:19:28 These are some examples of the pairs extracted for camping from the Rel-grams. If you look at the second events in these pairs, with "person" and various organizations as arguments, it seems that this is not about the camping trips we are trying to find; it is mostly about things like aid groups and refugee camps.
0:19:54 So we propose a new evaluation method on Amazon Mechanical Turk for evaluating the topic-specific contingent event pairs. We evaluate the pairs based on their topic relevance and their contingency relation. We asked the annotators to rate the pairs on a scale of zero to three: zero means the events are not contingent; one means the events are contingent but not relevant to the topic; two means they are contingent and somewhat relevant to the topic; and three is the strongest, the events are contingent and strongly related to the topic. To make the event representation more readable for the annotators, we reorder it to subject, verb, particle, and direct object, so an event with subject "person", verb "put", particle "up", and object "tent" is shown as "person put up tent", which is more readable for the users.
0:20:49 And this is the result of the Rel-grams evaluation: only seven percent are judged to be contingent and topic-relevant, and we think this is because the camping trip topic does not actually exist in their collection; overall, only forty-two percent are judged to be contingent.
0:21:11 We evaluated our own topic-specific contingent event pairs in the same way, with the same filtering and ranking method: selecting the pairs that occur more than five times, filtering by the same event patterns, ranking by the causal potential model, and evaluating the top hundred for each topic. This is the result: for the camping topic forty-four percent and for the storm topic thirty-three percent are contingent and topic-relevant, and overall about eighty percent of all the pairs we evaluated are contingent. The average inter-annotator reliability on these Mechanical Turk tasks was 0.73, which shows substantial agreement.
0:22:02 Finally, I want to show some examples of the event pairs. We showed that the results on the topic-specific data are stronger, and even by looking at the examples you can see that the knowledge we learn is more interesting, like climbing a rock, the wind blowing a transformer so the power goes out, or a tree falling and crushing something, whereas the ones from the general-domain dataset are more general, like "person walk down trail", "person see", or "person sleep".
0:22:38 In conclusion, we learn a new type of knowledge, contingency knowledge about everyday events, that is not available in the previous work on the newswire genre. We have a data collection step that uses a weakly supervised method to create topic-specific data, and this is the first work that directly compares results on topic-specific versus general-domain story data. We have two evaluation methods, one of them completely new on Mechanical Turk and the other one inspired by the COPA task, and the results I have already talked about. Thank you.
0:23:59 [Q&A] I think that's true: if you have a dataset that is specific to a topic, it is easier to learn these relations, and the methods are more effective.
0:24:20 That is definitely an interesting idea. I have tried word2vec models on the corpus, but the results did not look good: the events that are considered similar, when I look at them, are actually not similar for our task.
0:25:13 The labeling is only for the stories, not the event patterns; the event patterns are generated automatically by AutoSlog. You just need to come up with some topics: you think about what people write about on their blogs, for instance a storm, and then you go to the corpus and try to find a small set of stories that are on that topic.
0:25:59 What I did initially was run topic modeling, but the topics that are generated are not coherent. They do help you get some idea of what topics exist in the data, so you know you can go look for stories about, say, going on a camping trip. But once you come up with the topics, I think you can expand this, and with more and more rounds of bootstrapping you can collect more data.