0:00:15 So I'm presenting this paper — I should say up front that I'm not one of the authors, so I didn't work on this myself.
0:00:36 This is the outline: I'll first give some background and the motivation, then explain what is meant by update intents, then introduce the problem statement and the model, and how the authors deal with cross-domain generalization, and then the data, the experiments, and finally conclusions and future work.
0:01:01 So, in terms of the background: dialogue state tracking is critical to the successful completion of tasks in task-oriented dialogue systems. The belief state expresses a probability distribution over user goals, which are represented as slot-value pairs.
0:01:20 Typically, state tracking approaches use dialogue acts to infer user intentions towards the slot values that have been detected, and typical dialogue acts would be inform, deny, request, negate. So for an example utterance, "find me french restaurants in Boston", the SLU output would be inform(cuisine=french), inform(city=Boston).
0:01:52 The motivation of this work is that dialogue acts do not always adequately capture the user's intent towards slot values, and there are some corner cases. One example is implicit denial: in this example here, the user invites John and Joe for dinner, and then says that Joe can't make it. This is an implicit denial, because it doesn't correspond to a deny dialogue act. On the slide we have the utterances on the left and the expected SLU output of a typical system on the right.
0:02:38 Another limitation is expressing preferences for slot values — this applies, I think, specifically to disjunctive slot values. In this example, the user first asks to find french restaurants in Los Gatos; the second utterance says to find some in San Jose too, and the third to find some in Gilroy instead. Current SLU with dialogue acts wouldn't distinguish between the second and third utterances, whereas the intents are quite different: the third utterance basically expresses a preference for Gilroy, which would imply replacing the value currently in the state, while in the second instance you just want to append it.
0:03:25 And then another limitation is that it doesn't deal well with numerical updates — incrementality. In this example, you ask for a table for four, and then you might say "four more seats" or "two fewer seats", and it's not clear how a typical SLU system would deal with that.
0:03:48 So the solution that the authors propose is update intents, which is basically a new semantic class of intents tied directly to the updates to the user's goal. Here is the list of intents. The first one is append: the user specifies a value, or multiple values, for a multi-valued slot — so it's basically the intent to add values to a multi-valued slot. Remove is basically its complement: removing a value from a multi-valued slot. Prefer expresses a preference for a slot value: a new value that is to be preferred over the previous value. Replace means replace the existing value. And then there are increase_by and decrease_by, which are specific to numeric slots.
0:04:49 Here are some examples. What we have here is an utterance, its conventional SLU, and then the categorical update intents. So for example, for the earlier invitation example, "Joe can't make it" would become inform(person_names=Joe) with intent remove. For restaurant search, "find some in San Jose too" would be inform with intent append, whereas "find me some in Gilroy instead" would become inform with intent replace. And then for the numerical examples, "four more seats" would become inform with intent increase_by four, and if you remove two seats it would be inform with intent decrease_by two.
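To make the semantics of these intents concrete, here is a minimal sketch — my own illustration, not the authors' code — of what each update intent would do to a dialogue state, modeled as a dict from slot names to a number or a set of values. The function name, slot names, and the simplification of prefer to replace are all my assumptions.

```python
# Sketch (not from the paper): applying (slot, intent, value) triples to a
# dialogue state represented as {slot_name: number | set of values}.

def apply_update(state, slot, intent, value):
    """Apply one update intent to the state dict in place and return it."""
    if intent == "append":                 # add a value to a multi-valued slot
        state.setdefault(slot, set()).add(value)
    elif intent == "remove":               # drop a value from a multi-valued slot
        state.get(slot, set()).discard(value)
    elif intent in ("replace", "prefer"):  # simplification: both supersede old values
        state[slot] = {value}
    elif intent == "increase_by":          # numeric slots only
        state[slot] = state.get(slot, 0) + value
    elif intent == "decrease_by":
        state[slot] = state.get(slot, 0) - value
    return state

# "table for four", then "four more seats":
state = apply_update({}, "num_guests", "increase_by", 4)
state = apply_update(state, "num_guests", "increase_by", 4)
# "invite John and Joe for dinner", then "Joe can't make it":
state = apply_update(state, "people", "append", "John")
state = apply_update(state, "people", "append", "Joe")
state = apply_update(state, "people", "remove", "Joe")
# state is now {"num_guests": 8, "people": {"John"}}
```

In a real tracker the prefer case would presumably keep the old value around with lower probability rather than discard it; this sketch only shows the deterministic effect on the state.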
0:05:41 Okay, so how do they formulate this as a problem? Basically: given a user utterance, identify the update intents for the slot values mentioned in it. So the input is the user utterance tagged with slots and values, and the output is an update intent for each slot value, from these classes, for all slots. Here we see two examples. For "drop one person from the reservation", number_of_guests is the slot name, "one" is the slot value, and the update intent is decrease_by. In the example "Joe can't make it", person_names is the slot name, Joe is the slot value, and the update intent here is remove. They formulate this as multiclass classification over the update-intent classes for each slot value.
0:06:29 The modeling here is a sequence labeling problem with a bidirectional LSTM. The user utterance is a sequence of tokens, some of which are slot values. The labels on the tokens corresponding to slot values are the update intents; all other tokens get a generic token label. They also study the effect of delexicalization of the slot values.
0:06:58 So this is what it looks like. On the bottom we have the input, "okay forget Sunnyvale try Cupertino instead". First it is delexicalized: each slot value is replaced by its slot name, which has been shown in previous work to generalize better with limited training data — the slot values themselves may be out of vocabulary in the training data. Then we have the embedding layer, then a typical bidirectional LSTM, and finally a softmax layer where we predict the targets. You can see in this example that "okay" and "forget" are generic tokens, Sunnyvale and Cupertino are delexicalized to location, and for Cupertino the predicted intent would be replace.
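The delexicalization step described here can be sketched as follows — an assumed, simplified version that handles only single-token slot values, not the paper's implementation:

```python
# Sketch (my own): replace each detected slot-value token with its slot-name
# token, so the classifier never needs to have seen the value in training.

def delexicalize(tokens, slot_values):
    """slot_values maps a lowercased surface token to its slot-name token,
    e.g. 'sunnyvale' -> '<location>'. Other tokens pass through unchanged."""
    return [slot_values.get(tok.lower(), tok) for tok in tokens]

utterance = "okay forget Sunnyvale try Cupertino instead".split()
delex = delexicalize(utterance, {"sunnyvale": "<location>",
                                 "cupertino": "<location>"})
# delex == ['okay', 'forget', '<location>', 'try', '<location>', 'instead']
```

A real pipeline would get the value spans from the SLU component and would handle multi-word values; the point is just that the BiLSTM sees slot names, not raw values.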
0:07:52 The delexicalization to slot names is helpful for generalizing to slot values not seen in the training data, but only really within a single domain. Cross-domain, the slot names may be different: you may see slot names in the target domain that didn't exist in the source domain. However, if we can group slot names into types, different domains should share the same types of slots. As an example, the restaurant reservation and online shopping domains have number_of_guests and number_of_grocery_items respectively, which are both numeric slots, so if we delexicalize to the slot type instead, we may be able to generalize across domains. So the solution is to delexicalize to slot types.
0:08:47 These are the three slot types defined in the paper. Numeric slots, which can be increased and decreased. And two types of multi-valued slots: disjunctive slots, which can take multiple values in disjunction — location is an example of this — and conjunctive slots, which can take multiple values in conjunction, like the names of the people going to dinner, or the items in a shopping order.
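As a sketch of how type-level delexicalization bridges the two domains — with hypothetical slot names, since the exact ontology labels here are my own — disjoint slot names can be mapped to shared type tokens:

```python
# Sketch (my own, hypothetical slot names): slot names are disjoint across
# domains, but mapping them to shared slot *types* lets a model trained on
# one domain label utterances from the other.

SLOT_TYPES = {
    # restaurant domain
    "num_guests": "<numeric>",
    "person_names": "<conjunctive>",   # people attending: A *and* B
    "location": "<disjunctive>",       # acceptable places: A *or* B
    # shopping domain
    "item_quantity": "<numeric>",
    "order_items": "<conjunctive>",
}

def type_delexicalize(tokens, token_slots):
    """Replace each slot-value token with its slot *type* rather than its name.
    token_slots maps a surface token to its slot name."""
    return [SLOT_TYPES.get(token_slots.get(t, ""), t) for t in tokens]

tokens = "add two more shirts".split()
typed = type_delexicalize(tokens, {"two": "item_quantity",
                                   "shirts": "order_items"})
# typed == ['add', '<numeric>', 'more', '<conjunctive>']
```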
0:09:21 Okay, so, the data. To evaluate this, what was required was a dataset whose dialogues contain numeric and multi-valued slots in the domain ontology, along with annotations for the proposed update intents. Basically, no existing dataset had all of these, so the authors created their own. It covers two domains, restaurants and online shopping, and they had eight different professional editors generate conversations in these domains. The editors were asked to create conversations corresponding to a task; the tasks were: search for a restaurant, make a dinner booking, buy groceries, buy clothes. They were told to assume appropriate bot responses, so this did not require building an end-to-end system. The utterances generated were then annotated with slot names and the update intents.
0:10:27 Just as a reminder, this is what the data essentially looks like: you have the utterance, annotated with the slot name and the slot value, which would be input to the system, and the update intent to be predicted.
0:10:44 For the restaurant and shopping domains, this is the list of the slot names and their types. We have participant names, number of guests, menu items, cuisine and location for restaurants; and grocery items, quantity of grocery items, apparel items, colour and size for shopping. You can see that although the slot names of the two domains are disjoint, they still share the same slot types.
0:11:17 Okay, so after the data was created and annotated, this is what the distribution looks like. We have similar distributions for shopping and restaurants — the same number of conversations each — and you can see that on average there is more than one slot value mentioned in each utterance.
0:11:39 Then, in terms of the actual update intents themselves, this is the distribution. You can see that in both domains append is the most common update, followed by replace. For shopping, increase_by is noticeably more frequent than in restaurants: it's about twelve percent in shopping versus four percent in restaurants.
0:12:08 Okay, so in terms of the experiments: they implemented the bidirectional LSTM and trained it with a mini-batch size of sixty-four and a cross-entropy loss. The embedding layer was initialized with pre-trained GloVe embeddings from the Common Crawl dataset, and missing words were initialized randomly. The evaluation was leave-one-out cross-validation: because the data was created by eight individual editors, and the same editor may express the same intent in a consistent way, they didn't want any intra-editor overlap between training and evaluation. So for a given fold they would always train on seven editors' data and test on the remaining editor's data, and every result is the average over all folds. They only did hyperparameter tuning on the learning rate.
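The leave-one-editor-out protocol, as I understand it from the talk, can be sketched as follows; the editor IDs and data here are placeholders, not the actual dataset:

```python
# Sketch of editor-wise leave-one-out cross-validation: eight folds, each
# holding out one editor's conversations entirely; the reported score would
# be the average over the folds.

def editor_folds(examples):
    """examples: list of (editor_id, example) pairs.
    Returns one (train, test) split per editor."""
    editors = sorted({e for e, _ in examples})
    folds = []
    for held_out in editors:
        train = [x for e, x in examples if e != held_out]
        test = [x for e, x in examples if e == held_out]
        folds.append((train, test))
    return folds

# Placeholder data: 8 editors, 3 utterances each.
data = [(e, f"utt{e}-{i}") for e in range(8) for i in range(3)]
folds = editor_folds(data)
# 8 folds; each test set is exactly one editor's 3 utterances,
# each training set is the other 21.
```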
0:13:12 They also had some baselines. One was a simple n-gram baseline, based on a word window around the slot values as context, with a logistic regression classifier. Of course, there may be multiple slot values in an utterance, so they have to decide which slot value a given word or n-gram belongs to. I won't go into the details, but they basically had two approaches to this. One was hard segmentation, which is a rule-based approach to deciding which slot value a word should belong to. The other was soft segmentation, where basically for every word a feature is created for each position it could be encountered in — to the left of, to the right of, or between two slot values — so you basically increase the size of the feature representation. Another baseline was the full model without delexicalization.
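To illustrate the soft-segmentation idea, here is my reconstruction — not the authors' feature extractor — of position-tagged window features around a slot value:

```python
# Sketch (my reconstruction): every word in a window around a slot value is
# emitted as a feature tagged with its position relative to the slot values
# (Left, Right, or Between two values), so the classifier itself learns which
# words belong to which value.

def window_features(tokens, slot_idx, all_slot_idxs, window=2):
    feats = []
    lo, hi = max(0, slot_idx - window), min(len(tokens), slot_idx + window + 1)
    for i in range(lo, hi):
        if i == slot_idx:
            continue
        side = "L" if i < slot_idx else "R"
        # tag as "Between" if the word also sits in another slot value's window
        between = any(j != slot_idx and abs(i - j) <= window
                      for j in all_slot_idxs)
        feats.append(("B" if between else side) + ":" + tokens[i])
    return feats

tokens = "find some in san_jose too".split()
feats = window_features(tokens, 3, [3])
# feats == ['L:some', 'L:in', 'R:too']
```

These sparse features would then be fed to the logistic regression classifier.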
0:14:17 These are the classification results for the full model. I guess the key point here is that you can get pretty accurate: an overall F1 score over ninety in both domains, and over ninety percent F1 for quite a few of the intents. For both domains the most difficult intent, for some reason, is remove. It could be the case that there isn't enough training data for it, although increase_by and decrease_by actually have less.
0:14:54 Then, compared to the baselines, perhaps unsurprisingly the model does much better than the n-gram baseline, and we can also see that the delexicalization helps a lot: for restaurants the F1 improves from eighty percent to ninety percent, and for shopping from eighty-four to ninety.
0:15:24 Okay, and then cross-domain generalization. Some terminology first: in the paper they use in-domain versus out-of-domain. They try two settings. One is combined training, where you just train on the combination of the in-domain and out-of-domain data. The other is pre-training with fine-tuning, where they pre-train on the out-of-domain data and then fine-tune only on the in-domain data. In both settings they vary the percentage of in-domain data used.
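The two transfer settings can be sketched schematically; `train()` and `finetune()` below are hypothetical stand-ins for fitting the BiLSTM, since the actual training code isn't shown in the talk:

```python
# Schematic sketch (my own) of the two cross-domain training settings.

def combined_training(train, in_domain, out_domain, frac):
    """Train once on all out-of-domain data plus a fraction of in-domain data."""
    n = int(frac * len(in_domain))
    return train(out_domain + in_domain[:n])

def pretrain_finetune(train, finetune, in_domain, out_domain, frac):
    """Pre-train on out-of-domain data, then fine-tune on the in-domain slice."""
    model = train(out_domain)
    n = int(frac * len(in_domain))
    return finetune(model, in_domain[:n])

# Dummy stand-ins: "training" just counts the examples seen at each stage.
size = combined_training(len, list(range(10)), list(range(100, 105)), 0.2)
# size == 7: five out-of-domain examples plus 20% of ten in-domain examples
stages = pretrain_finetune(len, lambda model, data: (model, len(data)),
                           list(range(10)), list(range(100, 105)), 0.2)
# stages == (5, 2): pre-trained on 5 examples, fine-tuned on 2
```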
0:16:04 So here are the results when restaurant was the out-of-domain source and shopping was the target domain. The green curve is what happens if you only train on in-domain data; the pink, I think, is the pre-training approach; and the other is combined training. You can see that even with zero in-domain data the transfer already does pretty well, versus the mid-nineties being the optimum, and you can get close-to-optimal results with only twenty percent of the in-domain data. Going the opposite way, the results are still pretty encouraging but not quite as good: with zero in-domain data the F1 is only seventy percent.
0:16:51 So it seems to me, at least, that this suggests the restaurant data may be richer and more varied, and training on the simpler shopping data just doesn't transfer as well.
0:17:09 Okay, so, conclusions. Basically, they propose a new type of slot-specific user intents; these update intents address user intents involving implicit denials, numerical updates, and preferences for slot values. They present a sequence labeling model for classifying the update intents, and also propose a method for transfer learning across domains. They showed strong classification performance on this task and promising domain-independent results. In future work they plan to incorporate update intents into real dialogue state tracking.
0:17:50 And so — I'm not an author, but I can try to answer some questions, especially clarification-type questions, because I have a lot of questions about this myself. I'm not sure I can answer everything.
[audience question, inaudible]
0:18:53 I'm not sure — I don't see how you could just replace the NLU with this, because if you have "four people" in a task, you'd still need to get the number four from the NLU there.
0:19:28 Sure — I mean, it only makes sense added to the model. It's a good question, but to me that seems a more difficult thing to frame.
0:19:59 I have a question about that myself, but I only thought of it last night, so it was too late to ask the authors whether they have it available. It's something that occurred to me as well — it would be interesting to see what exactly is being confused.
0:20:44 I'm not sure I can answer the question, but I guess there are two steps to the annotation: one is creating the dialogues, and the other is actually annotating the utterances with the slot names, values and intents. So I guess for the second part you could get inter-annotator agreement, but I don't believe they report inter-annotator agreement. I mean, the fact that they can get ninety percent F1 suggests the labels can't be too noisy — if they were very noisy it would be hard to be that accurate — but of course that's not the same as explicitly measuring it.