| 0:00:17 | so the first presenter is emanuele so |
|---|
| 0:00:19 | please start your presentation |
|---|
| 0:00:22 | good afternoon to everyone |
|---|
| 0:00:24 | so my name is emanuele bastianelli i am from the interaction lab |
|---|
| 0:00:29 | of heriot-watt university and i'm gonna present work i have done with andrea vanzo |
|---|
| 0:00:34 | and oliver lemon |
|---|
| 0:00:36 | about a hierarchical multi-task natural language understanding system for cross-domain conversational ai that |
|---|
| 0:00:42 | we call hermit nlu |
|---|
| 0:00:45 | so natural language understanding is quite a wide concept |
|---|
| 0:00:50 | most of the time when we talk about conversational ai or dialogue systems it refers |
|---|
| 0:00:54 | to the process of extracting the meaning from natural language and providing it to |
|---|
| 0:00:58 | the dialogue system in a structured way so that the dialogue system can perform definitely | 
|---|
| 0:01:03 | better | 
|---|
| 0:01:04 | and we didn't end up |
|---|
| 0:01:07 | studying this problem just for the sake of it but actually |
|---|
| 0:01:10 | we did it in the context of the mummer project which is a european |
|---|
| 0:01:14 | project that was about |
|---|
| 0:01:16 | the deployment of a robot with |
|---|
| 0:01:18 | multimodal interaction capabilities it was supposed to be deployed in a shopping mall in finland |
|---|
| 0:01:23 | and it was supposed to interact with the users giving them instructions entertaining |
|---|
| 0:01:27 | them with a little bit of chit-chatting |
|---|
| 0:01:29 | and i'm gonna show a video of it that may explain better |
|---|
| 0:01:33 | what the robot was supposed to do |
|---|
| 0:01:35 | i hope you can hear the audio although there are the subtitles |
|---|
| 0:01:45 | i dunno one of the recording | 
|---|
| 0:01:50 | so the robot with both i sent and if no indication we just the and | 
|---|
| 0:02:00 | voice | 
|---|
| 0:02:01 | in this five phase | 
|---|
| 0:02:03 | and with or without the backing being detriment and the preference of the user | 
|---|
| 0:02:09 | right | 
|---|
| 0:02:16 | one value no straight i actually and no not attacking | 
|---|
| 0:02:32 | but for some with of the next to | 
|---|
| 0:02:35 | so we saw a lot of generation but everything started with a request from the |
|---|
| 0:02:38 | user | 
|---|
| 0:02:39 | and that's the bit we are focusing on today so it is basically designing an |
|---|
| 0:02:45 | nlu component that is robust enough to work in this very complex |
|---|
| 0:02:49 | multimodal dialogue system |
|---|
| 0:02:52 | again most often in conversational ai |
|---|
| 0:02:56 | natural language understanding is a synonym of shallow semantic parsing so this can actually |
|---|
| 0:03:00 | be linked to the |
|---|
| 0:03:02 | morning keynote and which is the process of extracting some frame and argument structure |
|---|
| 0:03:08 | that completes the meaning of a sentence and it doesn't really matter how we call them |
|---|
| 0:03:12 | if it is intent or slot |
|---|
| 0:03:13 | well and most of the time these types are defined according to |
|---|
| 0:03:17 | the application domain |
|---|
| 0:03:18 | rather than having a theory behind like frame semantics which offers a nice level of |
|---|
| 0:03:22 | abstraction and is the one we are using in our context |
|---|
| 0:03:26 | but this actually raises some problems especially in our case when we wanted to build an interface |
|---|
| 0:03:30 | that was able to work across several different domains while most of the time |
|---|
| 0:03:35 | in dialogue systems when you have a natural language understanding component it always deals with a |
|---|
| 0:03:39 | single domain or |
|---|
| 0:03:41 | a few domains at the same time |
|---|
| 0:03:44 | and this is also |
|---|
| 0:03:44 | because |
|---|
| 0:03:45 | the resources that are available are always about booking restaurants or booking flights |
|---|
| 0:03:51 | while we wanted our interface to be used in several different locations that can |
|---|
| 0:03:55 | be a domestic environment or the shopping mall or in scenarios for example |
|---|
| 0:04:00 | where you have to command a robot |
|---|
| 0:04:02 | for maintenance on offshore oil rigs |
|---|
| 0:04:04 | and so |
|---|
| 0:04:05 | one of the first problems was that we wanted the system to be a system that was |
|---|
| 0:04:08 | cross domain |
|---|
| 0:04:09 | and even if there may be no definite recipe for that we were trying |
|---|
| 0:04:13 | to tackle this problem anyway |
|---|
| 0:04:16 | and the big problem is that |
|---|
| 0:04:17 | most of the time the sentences in the datasets that are designed for dialogue systems |
|---|
| 0:04:22 | only contain a single intent or frame |
|---|
| 0:04:25 | while in our case there are many sentences that are given to the robot |
|---|
| 0:04:29 | which contain two or three or more intents and for us it can be very important to |
|---|
| 0:04:35 | detect both of them because if we ignore the temporal relation between these two |
|---|
| 0:04:41 | different frames it is very important to you know satisfy the user both for the locating |
|---|
| 0:04:46 | action and also the needing of a coffee at the same time |
|---|
| 0:04:50 | so that's another problem that when you rely on these | 
|---|
| 0:04:54 | hi you know the and structure | 
|---|
| 0:04:57 | most of the time | 
|---|
| 0:04:58 | two different kinds of interaction might end up being the exact same intent or frame |
|---|
| 0:05:03 | like in this case while they actually belong in the dialogue |
|---|
| 0:05:06 | to two different kinds of interaction so what we actually wanted to do is not only |
|---|
| 0:05:10 | tagging the frame and |
|---|
| 0:05:13 | and the slots |
|---|
| 0:05:14 | but also adding a layer of dialogue acts that will tell the dialogue system |
|---|
| 0:05:18 | the context in which this has been said so for example in the first |
|---|
| 0:05:21 | case we are informing the robot's that starbucks next on the all imagine that we | 
|---|
| 0:05:24 | want to teach the robot how the shopping mall is done and the second one | 
|---|
| 0:05:28 | is a customer that is asking for information about the location |
|---|
| 0:05:32 | of starbucks |
|---|
| 0:05:33 | so to |
|---|
| 0:05:35 | quickly recap we wanted to deal with different domains at the same time if |
|---|
| 0:05:39 | possible |
|---|
| 0:05:40 | we wanted to tag more than one single intent and arguments |
|---|
| 0:05:44 | per sentence and since we are also tagging the dialogue acts so we have a |
|---|
| 0:05:48 | multi-task architecture |
|---|
| 0:05:49 | we also have to deal with multiple dialogue acts |
|---|
| 0:05:52 | we might argue why |
|---|
| 0:05:54 | it is actually very important to understand both the dialogue acts in this case |
|---|
| 0:05:58 | if the final intent is only to give information about the location of starbucks |
|---|
| 0:06:03 | but actually we might want also to understand why |
|---|
| 0:06:06 | the user is asking for starbucks because he needs a coffee if maybe he was needing |
|---|
| 0:06:09 | a milkshake and not starbucks you could have pointed him somewhere else |
|---|
| 0:06:13 | so having this stuff is really important |
|---|
| 0:06:16 | and of course | 
|---|
| 0:06:17 | we wanted to try to benchmark our nlu system |
|---|
| 0:06:24 | against off-the-shelf tools and this was given by the people that |
|---|
| 0:06:28 | were actually |
|---|
| 0:06:29 | providing us with these utterances and evaluations and we will see later |
|---|
| 0:06:34 | now very quickly i mean it is nothing complicated we tried to deal with |
|---|
| 0:06:39 | this problem by |
|---|
| 0:06:40 | addressing three different tasks |
|---|
| 0:06:42 | at the same time so these tasks are the ones of locating dialogue acts frames |
|---|
| 0:06:48 | and arguments |
|---|
| 0:06:50 | each task was solved with a sequence labeling approach in which we were giving |
|---|
| 0:06:55 | a label to each token of the sentence which is |
|---|
| 0:06:57 | something very common in nlp |
|---|
| 0:07:00 | and each label was actually composed of the class |
|---|
| 0:07:03 | of the structure we were aiming to tag for a given task |
|---|
| 0:07:08 | enriched with a label that can be b i or o |
|---|
| 0:07:12 | depending on whether |
|---|
| 0:07:13 | the tagged token was the beginning of a span of a structure was inside it |
|---|
| 0:07:18 | or was outside one of these and here we have a very easy example |
|---|
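
The labeling scheme described above can be sketched as follows. This is a minimal illustration, not the system's code: the example sentence, the `Locating` class name and the span indices are assumptions made up for the sketch.

```python
def spans_to_bio(tokens, spans):
    """Turn (start, end, class) spans (end exclusive) into one BIO label per token."""
    labels = ["O"] * len(tokens)          # every token starts as "outside"
    for start, end, cls in spans:
        labels[start] = "B-" + cls        # first token of the span
        for i in range(start + 1, end):
            labels[i] = "I-" + cls        # remaining tokens of the span
    return labels


tokens = ["where", "is", "starbucks"]
frame_spans = [(0, 3, "Locating")]        # hypothetical frame annotation
frame_labels = spans_to_bio(tokens, frame_spans)
print(list(zip(tokens, frame_labels)))
```

The same function can be called once per task (dialogue acts, frames, arguments), giving three parallel label sequences over the same tokens.
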
| 0:07:21 | now the problem is that | 
|---|
| 0:07:23 | this is a linear solution for a problem which is | 
|---|
| 0:07:26 | not linear because language is recursive and we might end up |
|---|
| 0:07:29 | having some structures which are actually nested inside other structures especially for frames and arguments this |
|---|
| 0:07:35 | basically never happens for dialogue acts |
|---|
| 0:07:39 | but for frames and arguments this happens quite often especially in the data |
|---|
| 0:07:44 | we collected | 
|---|
| 0:07:45 | so what we did as a solution was to |
|---|
| 0:07:48 | basically collapse |
|---|
| 0:07:49 | the hierarchical structure in a single linear sequence and try to get whether one of |
|---|
| 0:07:53 | these structures |
|---|
| 0:07:54 | was actually inside |
|---|
| 0:07:56 | a previously tagged one |
|---|
| 0:07:58 | by using some heuristics on the syntactic relations among the words for example if |
|---|
| 0:08:02 | find was actually |
|---|
| 0:08:04 | a syntactic child of to |
|---|
| 0:08:06 | we could by some heuristics by some rules actually say that the locating |
|---|
| 0:08:11 | frame was actually embedded inside the requirement argument of the needing frame |
|---|
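
A rule of the kind mentioned above could look roughly like this. It is only a sketch under stated assumptions: the sentence, the dependency heads, the `Sought_entity` argument name and the exact rule are hypothetical, chosen to mirror the find / to example from the talk rather than taken from the system.

```python
def find_embedded(frames, arguments, heads):
    """Return (inner_frame, outer_frame, argument) triples.

    frames:    {name: (start, end)} flat frame spans, end exclusive
    arguments: list of (name, owner_frame, start, end)
    heads:     heads[i] is the index of token i's syntactic head (-1 for root)
    """
    nested = []
    for inner, (f_start, f_end) in frames.items():
        for arg, owner, a_start, a_end in arguments:
            if owner == inner:
                continue  # a frame cannot be embedded in its own argument
            # rule: a token of the inner frame is a syntactic child of a
            # token that lies inside an argument of another frame
            if any(a_start <= heads[i] < a_end for i in range(f_start, f_end)):
                nested.append((inner, owner, arg))
    return nested


tokens = ["i", "need", "to", "find", "starbucks"]
heads = [1, -1, 1, 2, 3]   # hypothetical parse in which "find" is a child of "to"
frames = {"Needing": (0, 3), "Locating": (3, 5)}
arguments = [
    ("Requirement", "Needing", 2, 3),
    ("Sought_entity", "Locating", 4, 5),
]
result = find_embedded(frames, arguments, heads)
print(result)
```

With these inputs the rule reports that the locating frame sits inside the requirement argument of the needing frame, which is the nesting the flat sequence cannot express on its own.
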
| 0:08:18 | now this has been solved in a multitask fashion so we basically created |
|---|
| 0:08:23 | a single network that was dealing with the three tasks at the same |
|---|
| 0:08:26 | time it is basically a sequence-to-sequence network with encoders and crf layers that |
|---|
| 0:08:31 | i'm gonna show in the |
|---|
| 0:08:32 | next slide it is nothing overly complicated but there are two main reasons why |
|---|
| 0:08:37 | we adopted this |
|---|
| 0:08:39 | architecture first of all we wanted more or less to replicate |
|---|
| 0:08:42 | a hierarchy of |
|---|
| 0:08:44 | task difficulty in the sense that we were assuming actually we were |
|---|
| 0:08:48 | led to think that tagging dialogue acts is easier than tagging frames and |
|---|
| 0:08:52 | it is easier to tag frames than tagging arguments |
|---|
| 0:08:56 | and there's also |
|---|
| 0:08:57 | a kind of structural relationship between these three because many times |
|---|
| 0:09:00 | some frames tend to appear more often in the context of some dialogue acts and |
|---|
| 0:09:05 | arguments are almost always dependent on frames |
|---|
| 0:09:09 | especially when there is a strong theory behind like frame semantics |
|---|
| 0:09:12 | and |
|---|
| 0:09:13 | so these are the reasons why the network is done like this |
|---|
| 0:09:17 | and i'm going to illustrate the network quite quickly because this is a little bit | 
|---|
| 0:09:21 | more | 
|---|
| 0:09:22 | technical stuff so | 
|---|
| 0:09:24 | the input of the network was only pre-trained elmo embeddings that we |
|---|
| 0:09:27 | were not training and that was fed to the first layer that was encoding with a step |
|---|
| 0:09:32 | of encoding with some self-attention that was supposed to capture |
|---|
| 0:09:36 | some relationships that the bidirectional lstm encoder was not capturing because sometimes self- |
|---|
| 0:09:42 | attention is more able to capture relationships among words which are quite distant in the |
|---|
| 0:09:47 | sentence |
|---|
| 0:09:48 | and then we were feeding a crf layer |
|---|
| 0:09:51 | that was actually tagging the sequence of bio tags for the dialogue acts |
|---|
| 0:09:56 | on top of the self-attention layer |
|---|
| 0:10:00 | so for the frames it was basically the same thing | 
|---|
| 0:10:04 | but we were | 
|---|
| 0:10:06 | using shortcut connections before because we wanted to provide the encoder with the fresh information |
|---|
| 0:10:11 | from the first layer so actually the lexical information but also |
|---|
| 0:10:16 | with some information that was encoded while |
|---|
| 0:10:18 | being |
|---|
| 0:10:19 | kind of indirectly conditioned on what |
|---|
| 0:10:23 | the dialogue act layer was tagging so we were putting the information together and we were serving |
|---|
| 0:10:28 | the information to the next layer |
|---|
| 0:10:30 | and then a crf for tagging as before |
|---|
| 0:10:32 | and finally for the arguments again the same thing |
|---|
| 0:10:36 | another step of encoding and a crf layer with self-attention and this came up |
|---|
| 0:10:40 | from the experiments we have done with some ablation studies it is in the paper |
|---|
| 0:10:44 | but we're not bothering you here about this this is the final network we managed to |
|---|
| 0:10:49 | tune at the very end | 
|---|
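
The data flow just described (three stacked taggers, with the input embeddings fed again to each later layer through a shortcut connection) can be sketched in plain Python. Everything here is an assumption made for illustration: a `tanh` projection stands in for the bidirectional LSTM encoder with self-attention, a greedy argmax stands in for CRF decoding, the weights are random, and the tag sets and dimensions are invented.

```python
import math
import random


def make_weights(out_dim, in_dim, rng):
    """Random matrix, a placeholder for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, vectors, activation=None):
    """Apply the same linear map (plus optional activation) to every token vector."""
    out = []
    for vec in vectors:
        row_out = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        out.append([activation(v) for v in row_out] if activation else row_out)
    return out


def greedy_tags(scores, tagset):
    """Pick the best tag per token (stand-in for CRF decoding)."""
    return [tagset[row.index(max(row))] for row in scores]


def concat(a, b):
    """Concatenate two per-token vector sequences token by token."""
    return [x + y for x, y in zip(a, b)]


def forward(embeddings, params, tagsets):
    """Three stacked taggers: dialogue acts, then frames, then arguments."""
    outputs, previous = {}, None
    for task in ("dialogue_act", "frame", "argument"):
        # shortcut connection: the raw embeddings are fed again together
        # with the output of the previous encoder
        layer_in = embeddings if previous is None else concat(embeddings, previous)
        encoded = project(params[task]["encoder"], layer_in, math.tanh)
        scores = project(params[task]["tagger"], encoded)
        outputs[task] = greedy_tags(scores, tagsets[task])
        previous = encoded
    return outputs


rng = random.Random(0)
emb_dim, hid_dim, n_tokens = 4, 3, 5
tagsets = {
    "dialogue_act": ["O", "B-Inform", "I-Inform"],
    "frame": ["O", "B-Locating", "I-Locating"],
    "argument": ["O", "B-Theme", "I-Theme"],
}
params = {}
for idx, task in enumerate(tagsets):
    in_dim = emb_dim if idx == 0 else emb_dim + hid_dim
    params[task] = {
        "encoder": make_weights(hid_dim, in_dim, rng),
        "tagger": make_weights(len(tagsets[task]), hid_dim, rng),
    }
embeddings = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(n_tokens)]
tags = forward(embeddings, params, tagsets)
print(tags)
```

The point of the sketch is only the wiring: each later task sees both the lexical input and a representation already shaped by the task below it.
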
| 0:10:51 | so as i was saying at the beginning we wanted to |
|---|
| 0:10:57 | benchmark this |
|---|
| 0:10:59 | this nlu |
|---|
| 0:11:01 | component now benchmarking an nlu for dialogue systems is quite a big issue in |
|---|
| 0:11:05 | the sense that the datasets as i was saying before most of these |
|---|
| 0:11:10 | are quite |
|---|
| 0:11:12 | single domain |
|---|
| 0:11:13 | and there is very few stuff |
|---|
| 0:11:15 | i mean now there are some datasets that |
|---|
| 0:11:18 | started popping up but at the beginning of this year we were still poor |
|---|
| 0:11:22 | on that side |
|---|
| 0:11:24 | but luckily there was this resource which is called the nlu benchmark |
|---|
| 0:11:29 | which is basically a cross-domain corpus of interactions with a house assistant |
|---|
| 0:11:33 | robot |
|---|
| 0:11:34 | it is mostly iot oriented and it is not a collection of dialogues it is only |
|---|
| 0:11:38 | single utterance interactions with the system |
|---|
| 0:11:42 | and covers a lot of domains we will see later |
|---|
| 0:11:45 | but it is mostly iot oriented there are some |
|---|
| 0:11:50 | commands that can be used for a robot but it is mostly again iot |
|---|
| 0:11:53 | oriented |
|---|
| 0:11:53 | then there is a second resource that we started collecting a long time ago and is taking |
|---|
| 0:11:58 | a lot of time |
|---|
| 0:11:59 | which is the romulus corpus it is called like that because it |
|---|
| 0:12:03 | stands for robotics oriented multitask language understanding corpus |
|---|
| 0:12:07 | and again it is a collection of single interactions with the robot that covers |
|---|
| 0:12:12 | different domains that are more like kinds of interaction there is |
|---|
| 0:12:16 | chit-chatting there are |
|---|
| 0:12:17 | commands to the robot there is also a lot of information you can |
|---|
| 0:12:21 | give to the robot about the configuration of the environment the names of objects |
|---|
| 0:12:25 | all this kind of stuff |
|---|
| 0:12:26 | there's quite a huge overlap between the two in terms of kinds of interaction |
|---|
| 0:12:30 | but they span over |
|---|
| 0:12:32 | different domains |
|---|
| 0:12:33 | so | 
|---|
| 0:12:35 | the first corpus the nlu benchmark provides us three different semantic layers |
|---|
| 0:12:41 | and they are called scenario action and entity i know this sounds completely different |
|---|
| 0:12:44 | from what we said before but we had to find some mapping with the stuff |
|---|
| 0:12:48 | we wanted to tag over the sentences |
|---|
| 0:12:52 | the corpus is quite big the full set is twenty five almost twenty |
|---|
| 0:12:57 | six thousand sentences |
|---|
| 0:13:00 | and there are eighteen different scenario types and each scenario is basically a domain |
|---|
| 0:13:05 | and there are fifty four different action types and fifty six different entity types |
|---|
| 0:13:11 | there is something they call intent which is basically the sum of scenario |
|---|
| 0:13:15 | plus action and this is important for the evaluation as we will see later |
|---|
| 0:13:20 | as you can see there is a problem with this dataset that is |
|---|
| 0:13:24 | it is cross domain |
|---|
| 0:13:26 | it is multi-task because we have three different semantic layers |
|---|
| 0:13:29 | but |
|---|
| 0:13:30 | we always have one single scenario and action so one single intent per sentence |
|---|
| 0:13:35 | so what we could benchmark on this |
|---|
| 0:13:38 | corpus was mostly these two initial | 
|---|
| 0:13:42 | these two initial factors | 
|---|
| 0:13:45 | we did the evaluation according to the paper that was presenting |
|---|
| 0:13:49 | the benchmark |
|---|
| 0:13:50 | and this was done on a ten fold cross validation with like half of the |
|---|
| 0:13:53 | sentences eleven thousand of the sentences and this was to balance |
|---|
| 0:13:56 | the number of classes and entities inside the results |
|---|
| 0:14:02 | so as i was saying we had to do a mapping |
|---|
| 0:14:05 | between |
|---|
| 0:14:06 | their tagging scheme and what we wanted to tag which is a very general approach for |
|---|
| 0:14:11 | extracting the semantics from sentences in the context of a dialogue system |
|---|
| 0:14:16 | but we also saw that |
|---|
| 0:14:18 | the kind of relationships that were holding between |
|---|
| 0:14:20 | their semantic layers were more or less the same that were holding for |
|---|
| 0:14:24 | our approach |
|---|
| 0:14:26 | and so these are some results |
|---|
| 0:14:28 | these are the ones reported in the paper they are quite old in the |
|---|
| 0:14:31 | sense that they are from the beginning of this year they have been evaluated in two |
|---|
| 0:14:34 | thousand eighteen |
|---|
| 0:14:35 | they have been run on all the open source |
|---|
| 0:14:39 | versions of these nlu components of dialogue systems available off the shelf |
|---|
| 0:14:46 | there's a problem with watson because you know it requires specific training for entities |
|---|
| 0:14:50 | and this was not possible because there is a constraint on the number |
|---|
| 0:14:56 | of entity types and entity examples you can pass we tried to |
|---|
| 0:15:00 | talk with the watson people but we didn't manage to get the license at least |
|---|
| 0:15:03 | to run one training with the full set of things so you don't have |
|---|
| 0:15:08 | to take that into account too much unfortunately |
|---|
| 0:15:11 | the intent as i was saying is the sum of the scenario |
|---|
| 0:15:14 | and an action |
|---|
| 0:15:16 | and these |
|---|
| 0:15:19 | performances are |
|---|
| 0:15:21 | obtained on ten fold cross validation i didn't put the standard deviations because |
|---|
| 0:15:25 | they were almost all stable but if you want to look at them |
|---|
| 0:15:28 | they're in the paper |
|---|
| 0:15:29 | and the other important thing is that we weren't taking into account whether the |
|---|
| 0:15:34 | span |
|---|
| 0:15:35 | of a tagged structure was matching exactly actually |
|---|
| 0:15:39 | the authors of the paper weren't taking that into account |
|---|
| 0:15:41 | but they counted a true positive whenever there was an overlap |
|---|
| 0:15:45 | an overlap of the span |
|---|
| 0:15:48 | so this is a kind of loose metric |
|---|
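
The difference between the loose, overlap-based criterion and an exact span match can be illustrated as below. This is a simplified sketch, not the benchmark's scoring script: the span representation, the `place_name` label and the matching loop (which does not stop one gold span from being matched twice) are assumptions.

```python
def overlaps(a, b):
    """True when two (start, end) spans with exclusive ends share a token."""
    return a[0] < b[1] and b[0] < a[1]


def count_true_positives(gold, predicted, exact):
    """gold / predicted: lists of (start, end, label) spans."""
    hits = 0
    for p_start, p_end, p_label in predicted:
        for g_start, g_end, g_label in gold:
            if p_label != g_label:
                continue
            same_span = (p_start, p_end) == (g_start, g_end)
            loose_ok = not exact and overlaps((p_start, p_end), (g_start, g_end))
            if same_span or loose_ok:
                hits += 1
                break
    return hits


gold = [(2, 5, "place_name")]        # hypothetical gold entity
predicted = [(3, 5, "place_name")]   # right label, span one token short
loose = count_true_positives(gold, predicted, exact=False)
strict = count_true_positives(gold, predicted, exact=True)
print(loose, strict)
```

A prediction that clips one token off the gold span is rewarded under the loose criterion and rejected under the exact one, which is why the two kinds of scores are not directly comparable.
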
| 0:15:50 | that we are evaluating on |
|---|
| 0:15:52 | we can see that for the entity and the combined setting |
|---|
| 0:15:57 | our system was performing on average better than the others while for the intent |
|---|
| 0:16:01 | we were actually not performing as well as watson but better than the other |
|---|
| 0:16:06 | two systems |
|---|
| 0:16:07 | the other important bit is that the combined |
|---|
| 0:16:11 | measure is actually the sum of the two confusion matrices of intents and entities |
|---|
| 0:16:15 | so it doesn't |
|---|
| 0:16:16 | actually tell us anything about the pipeline |
|---|
| 0:16:18 | how the full pipeline is working |
|---|
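
As a rough illustration of the combined measure, the counts from the two tasks can be added and a single F1 computed from the totals. The numbers below are made up, and reducing each confusion matrix to true positive, false positive and false negative counts is an assumption of this sketch.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


intent_counts = {"tp": 90, "fp": 10, "fn": 10}   # made-up numbers
entity_counts = {"tp": 70, "fp": 30, "fn": 20}   # made-up numbers

# the combined score pools the counts of the two tasks before computing F1
combined_counts = {k: intent_counts[k] + entity_counts[k] for k in intent_counts}
combined_f1 = f1_from_counts(**combined_counts)
print(round(combined_f1, 4))
```

Because the counts are pooled, a sentence with a correct intent and a wrong entity still contributes a true positive, so the number says nothing about whether a whole sentence was parsed correctly end to end.
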
| 0:16:20 | but this is something that we have done |
|---|
| 0:16:22 | on our corpus which is much smaller |
|---|
| 0:16:25 | and is not yet available because we are still gathering data |
|---|
| 0:16:29 | probably end of this year we're gonna release it |
|---|
| 0:16:32 | i know it covers a very narrow environment but for people doing hri |
|---|
| 0:16:37 | or dialogue in the context of robotics this can be |
|---|
| 0:16:39 | quite interesting |
|---|
| 0:16:42 | so here we have eleven dialogue act types and fifty eight frame types |
|---|
| 0:16:46 | which compared to the number of examples is quite high |
|---|
| 0:16:49 | and eighty four frame element types which are the arguments |
|---|
| 0:16:52 | and as you can see |
|---|
| 0:16:54 | not always but there are many cases in which we have more than one |
|---|
| 0:16:58 | frame per sentence and more than one dialogue act per sentence |
|---|
| 0:17:01 | and the frame elements are quite a lot |
|---|
| 0:17:07 | we have like |
|---|
| 0:17:09 | three different semantic layers but these three are more formally only two |
|---|
| 0:17:13 | because |
|---|
| 0:17:13 | we have thirteen dialogue acts exactly like we saw in the rest of |
|---|
| 0:17:16 | the presentation |
|---|
| 0:17:17 | and we also provide semantics in terms of frame semantics |
|---|
| 0:17:22 | where we have frames and frame elements these are actually the same |
|---|
| 0:17:25 | semantic layer theoretically but they are two different layers operationally |
|---|
| 0:17:30 | and as you can see we have a lot of |
|---|
| 0:17:32 | embedded structures a frame inside another frame and this kind of stuff |
|---|
| 0:17:36 | this is the mapping we had to do again |
|---|
| 0:17:39 | with the different semantic layers it is basically dialogue acts to dialogue acts frames to frames |
|---|
| 0:17:43 | and frame elements to arguments |
|---|
| 0:17:46 | and of course | 
|---|
| 0:17:47 | these are the two aspects that we could tackle while using this corpus so |
|---|
| 0:17:51 | it is not really cross domain because it doesn't cover as many domains as the |
|---|
| 0:17:54 | other one |
|---|
| 0:17:54 | it is enough to say that we have |
|---|
| 0:17:56 | different kinds of interaction and we have also sentences coming |
|---|
| 0:17:59 | from two different scenarios that can be |
|---|
| 0:18:03 | the house scenario and the shopping mall scenario and we are also starting adding something coming from these interactions |
|---|
| 0:18:09 | with the mummer robot |
|---|
| 0:18:12 | but we don't want to say it is completely cross domain mostly because the other |
|---|
| 0:18:17 | one covers many more domains than this one |
|---|
| 0:18:19 | but it is very multi-task and there are really multiple dialogue acts and frames in each |
|---|
| 0:18:23 | sentence |
|---|
| 0:18:24 | and okay the results |
|---|
| 0:18:27 | they might look quite weird but |
|---|
| 0:18:29 | i'm gonna explain why they are like this |
|---|
| 0:18:31 | so the first one i report here is the same exact measure that was reported |
|---|
| 0:18:36 | for the nlu benchmark so |
|---|
| 0:18:38 | we take into account only when the spans |
|---|
| 0:18:40 | of two structures overlap okay |
|---|
| 0:18:43 | and | 
|---|
| 0:18:43 | the results are quite high | 
|---|
| 0:18:45 | and the main reason is that the corpus has not been delexicalised |
|---|
| 0:18:49 | so the sentences are quite similar |
|---|
| 0:18:52 | and the system performs very well |
|---|
| 0:18:53 | but you don't have to get impressed by that because |
|---|
| 0:18:56 | if we look at the last one the second one is basically only |
|---|
| 0:18:59 | using the |
|---|
| 0:19:00 | conll two thousand shared task evaluation which is a standard and we report |
|---|
| 0:19:05 | it for general comparison with other systems |
|---|
| 0:19:07 | but the most important one is the last one that is the exact |
|---|
| 0:19:11 | match |
|---|
| 0:19:11 | and the last one the exact match is telling us |
|---|
| 0:19:14 | how well the system the whole pipeline was working completely so we were taking into |
|---|
| 0:19:18 | account the exact span |
|---|
| 0:19:21 | of |
|---|
| 0:19:23 | all of the tagged structures |
|---|
| 0:19:24 | and also | 
|---|
| 0:19:25 | we were | 
|---|
| 0:19:26 | yes we were | 
|---|
| 0:19:30 | we were actually | 
|---|
| 0:19:31 | trying to get | 
|---|
| 0:19:32 | i mean a frame was actually correctly tagged only if also the dialogue act |
|---|
| 0:19:36 | was correctly tagged so it is actually the end-to-end system |
|---|
| 0:19:39 | in a pipeline and that is |
|---|
| 0:19:40 | the measure we have to chase |
|---|
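
The pipeline criterion described above, where a frame only counts if the dialogue act it belongs to is also right, can be sketched like this. The tuple layout, the labels and the spans are hypothetical and chosen only to show the idea.

```python
def pipeline_correct_frames(gold, predicted):
    """Count frames that are right end to end.

    Each item is (dialogue_act, da_span, frame, frame_span). A predicted
    frame counts only if its label and exact span match AND the dialogue
    act it belongs to is matched exactly as well.
    """
    gold_set = set(gold)
    return sum(1 for item in predicted if item in gold_set)


gold = [("Request", (0, 3), "Locating", (0, 3))]
right_frame_wrong_act = [("Inform", (0, 3), "Locating", (0, 3))]
all_right = [("Request", (0, 3), "Locating", (0, 3))]

print(pipeline_correct_frames(gold, right_frame_wrong_act))
print(pipeline_correct_frames(gold, all_right))
```

Under this criterion a perfectly tagged frame is still an error when the layer above it is wrong, which makes the score much harsher than the per-layer ones.
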
| 0:19:43 | now to |
|---|
| 0:19:45 | conclude and some future work so the system that i presented which is this |
|---|
| 0:19:49 | cross domain multi-task |
|---|
| 0:19:52 | nlu system for natural language understanding |
|---|
| 0:19:55 | for conversational ai that we designed is actually running in the shopping mall |
|---|
| 0:20:01 | in finland |
|---|
| 0:20:03 | the video i showed you was from the deployment we have done |
|---|
| 0:20:07 | and it is gonna be there for three months in a row |
|---|
| 0:20:09 | with some pauses during the weekend to do some maintenance like rebooting the system |
|---|
| 0:20:13 | but we |
|---|
| 0:20:14 | managed to collect a lot of data that we will maybe integrate in the corpus |
|---|
| 0:20:17 | and release it at the end of this year |
|---|
| 0:20:19 | if we manage to tag them properly otherwise at the latest the beginning of |
|---|
| 0:20:23 | next year |
|---|
| 0:20:25 | we have to deal with this hierarchy in a different way |
|---|
| 0:20:28 | it means not relying on these heuristics on the syntactic structure but actually simultaneously |
|---|
| 0:20:33 | tagging |
|---|
| 0:20:35 | embedded sequences or multiple sequences that can be one inside the other |
|---|
| 0:20:38 | it is a hot topic because we actually already have this system we |
|---|
| 0:20:42 | finalized it a few months ago so we didn't have time to submit it |
|---|
| 0:20:45 | here but this exists and there is a branch on the repository that i |
|---|
| 0:20:50 | showed you which is about this new system |
|---|
| 0:20:55 | the bulk of our work is |
|---|
| 0:20:56 | this one of creating a general framework for frame-like structures so it doesn't |
|---|
| 0:21:01 | matter what the application is or the theory behind |
|---|
| 0:21:04 | we are trying to create a network that can deal with all the possible frame- |
|---|
| 0:21:08 | like structure parsing this is our long-term goal something very big but we are |
|---|
| 0:21:13 | actually pushing for that |
|---|
| 0:21:14 | and the last bit is mostly dealing with this special tagging of |
|---|
| 0:21:19 | segmented utterances we realized that in our corpus there were many |
|---|
| 0:21:23 | small bits of sentences that the user was saying because they were stopping |
|---|
| 0:21:27 | or hesitating so they were saying the first part of the sentence like i would |
|---|
| 0:21:30 | like to |
|---|
| 0:21:31 | and the asr was actually sending the thing to the |
|---|
| 0:21:36 | parser and the parser worked correctly by tagging it but with some |
|---|
| 0:21:39 | bit missing |
|---|
| 0:21:40 | now when the user was saying |
|---|
| 0:21:42 | to find the starbucks for example we were receiving this find the starbucks that was contextualized |
|---|
| 0:21:47 | as a finding locating frame |
|---|
| 0:21:50 | but we didn't know it was also a frame element of the previous |
|---|
| 0:21:53 | structure so we are studying a way to |
|---|
| 0:21:55 | make the system aware of what has been parsed before |
|---|
| 0:21:58 | so that you can actually give more information in the context of the |
|---|
| 0:22:02 | same utterance even if it is broken by the asr |
|---|
| 0:22:05 | and | 
|---|
| 0:22:06 | this is everything | 
|---|
| 0:22:07 | okay thanks very much | 
|---|
| 0:22:13 | okay so there's time for questions |
|---|
| 0:22:23 | no him | 
|---|
| 0:22:30 | hi and thanks for the great talk it's always good to see rasa being |
|---|
| 0:22:34 | benchmarked i'm just curious did you use just default out of the box parameters |
|---|
| 0:22:38 | or did you do a bit of tuning |
|---|
| 0:22:40 | so we just took the results from the paper of the benchmark and they |
|---|
| 0:22:45 | were only saying that |
|---|
| 0:22:48 | something like a little bit of specific training was done for the entities |
|---|
| 0:22:51 | something like that |
|---|
| 0:22:54 | and for rasa they used the version |
|---|
| 0:22:57 | that was using the crf and not the neural one the tensorflow |
|---|
| 0:23:01 | one okay so that's actually like a very basic version i suppose |
|---|
| 0:23:08 | questions | 
|---|
| 0:23:09 | okay | 
|---|
| 0:23:12 | so you showed the architecture there with some intermediate layers i'm curious is there |
|---|
| 0:23:18 | also intermediate supervision here |
|---|
| 0:23:21 | so these labels are they also |
|---|
| 0:23:25 | supervised labels yes you know that is all supervised that's the point of the |
|---|
| 0:23:29 | multitasking in the sense that we are solving the three tasks at the same time |
|---|
| 0:23:32 | so you need | 
|---|
| 0:23:34 | a slightly more complicated dataset for that to have all of that supervised |
|---|
| 0:23:38 | well we need more labels than just |
|---|
| 0:23:41 | we need the dialogue acts or in this case the scenarios we need |
|---|
| 0:23:44 | the actions and the frames and the arguments basically so that's why |
|---|
| 0:23:49 | the architecture is called multi-task because we have these three layers okay |
|---|
| 0:23:53 | but for us it was really important to differentiate between action and dialogue |
|---|
| 0:23:57 | acts because as i showed you |
|---|
| 0:23:58 | there were many cases in which it was important for the robot to have a |
|---|
| 0:24:02 | better idea of what was going on in the single sentence okay |
|---|
| 0:24:06 | okay | 
|---|
| 0:24:10 | thanks for the talk a question in the last slide you mentioned frame-like |
|---|
| 0:24:15 | so what's the difference between frame-like and framenet |
|---|
| 0:24:19 | frame-like so frame-like refers to whatever can be |
|---|
| 0:24:25 | an abstraction which represents a predication in a sentence and has |
|---|
| 0:24:30 | some arguments |
|---|
| 0:24:32 | this is like the general frame like you know like the very | 
|---|
| 0:24:35 | bold | 
|---|
| 0:24:35 | it's the same as framenet so the idea is basically the |
|---|
| 0:24:39 | same the big difference is that framenet has a very specific theory behind |
|---|
| 0:24:43 | and there are some extra things like some relationships |
|---|
| 0:24:47 | between frames and there are special frame elements like the lexical unit itself |
|---|
| 0:24:51 | which makes it easier to locate the frame in the sentence |
|---|
| 0:24:54 | but |
|---|
| 0:24:55 | what we'd like to do is it doesn't matter whether it is framenet or just |
|---|
| 0:24:58 | intents and slots like from this corpus or any other corpus |
|---|
| 0:25:02 | what we are trying to build is a shallow semantic parser |
|---|
| 0:25:07 | that can deal with all this stuff at the same time |
|---|
| 0:25:09 | as well as possible it is a kind of hard task but we are trying |
|---|
| 0:25:13 | to incorporate these different |
|---|
| 0:25:14 | aspects of the theories and we are trying to deal with them |
|---|
| 0:25:17 | more or less in different ways but without compromising |
|---|
| 0:25:21 | the assistive led to all their kind of formant | 
|---|
| 0:25:24 | one other question what tool was used for data annotation |
|---|
| 0:25:29 | so actually for our corpus we had to develop our own interface |
|---|
| 0:25:34 | it is basically a web interface where we have all the tokenized sentences |
|---|
| 0:25:39 | and we can tag everything on that and the corpus has been entirely i |
|---|
| 0:25:45 | mean it is something we've been collecting in the last years it takes a long |
|---|
| 0:25:48 | time it's a it's |
|---|
| 0:25:51 | it is a hard task to collect these sentences and also we have to filter |
|---|
| 0:25:54 | out many of them because the contexts were mostly different sometimes we went |
|---|
| 0:25:59 | to robocup to do this collection and there was a lot of noise and |
|---|
| 0:26:02 | things we were also value that you're | 
|---|
| 0:26:05 | file of these then we stopped but in the end we were always employing some |
|---|
| 0:26:09 | people from our lab |
|---|
| 0:26:11 | to annotate them like two or three of them then you know doing some inter-annotator |
|---|
| 0:26:14 | agreement trying to get whether they actually understood the task it is |
|---|
| 0:26:18 | a very long process okay and |
|---|
| 0:26:21 | we're the computational linguist but opposite thing point so | 
|---|
| 0:26:24 | it is very hard but that's |
|---|
| 0:26:29 | that's the situation with the corpus |
|---|
| 0:26:32 | okay so we have run out of time so let's thank the speaker again |
|---|
| 0:26:36 | okay | 
|---|