0:00:15 Hello, I'm from Maluuba, and I'm here to present a dataset that I collected and annotated with my colleagues there. My colleague Hannes is actually here with me if you want to talk to him.
0:00:32 The motivation behind this dataset is that there is a need for dialogue systems to be able to handle complex interactions. One motivation comes from studies on e-commerce: there is a paper from 2011 showing that users who come to an e-commerce website sometimes arrive with a very well-defined goal in mind, but sometimes they just come to shop around; they don't really know what they want and just want to look at options.
0:01:01 There is also some interest in the dialogue community. Most notably, there was a paper last year, "Task Lineages: Dialog State Tracking for Flexible Interaction" by Lee and Stent; I think it was one of last year's best papers. In this paper they try to move beyond the traditional linear slot-filling paradigm and handle more complex conversations where you have different user goals, possibly across domains.
0:01:33 For this work they actually didn't have a proper dataset to test their method, because there wasn't anything available, so they modified an existing dataset. We therefore decided to collect data ourselves and promote this kind of work for future dialogue systems. We collected 1,369 human-human dialogues in the travel domain. We also propose a new task, frame tracking, and the dataset is fully annotated and publicly available at this URL.
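For readers of this transcript: the corpus statistics quoted later in the talk are easy to recompute once the dataset is downloaded. The sketch below is not from the talk; it assumes the released file is a JSON list of dialogues, each with a "turns" list, which may differ from the actual release format.

```python
import json

def load_dialogues(path):
    """Load the corpus; assumes a JSON file containing a list of
    dialogues, each with a 'turns' list of {'author', 'text'} dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def corpus_stats(dialogues):
    """Return (number of dialogues, number of turns, average turns per dialogue)."""
    n_turns = sum(len(d["turns"]) for d in dialogues)
    return len(dialogues), n_turns, n_turns / max(len(dialogues), 1)

# Toy example in the assumed schema:
sample = [
    {"turns": [{"author": "user", "text": "I want to go to Toronto."},
               {"author": "wizard", "text": "I found 3 packages."}]},
    {"turns": [{"author": "user", "text": "Anything under $2000?"}]},
]
print(corpus_stats(sample))
```

Pointing `corpus_stats` at the real file via `load_dialogues` should reproduce the dialogue and turn counts mentioned in the talk, provided the assumed schema matches the release.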
0:02:11 When I talk about linear slot filling, I mean something like this; this is an actual dialogue from the dataset. The user gives some constraints: he wants to go somewhere from Columbus but doesn't really know where. The wizard, the agent who plays the role of the dialogue system, proposes two options, Vancouver and Toronto. The user then gives a bit more information about his constraints, asks for information about the offers from the wizard, and at the end books one of the proposed trips. Here the user goal never really changes during the dialogue; the user is just drilling down into some options.
0:02:53 By nonlinear slot filling I mean something like this dialogue, which is also from our dataset. It was too long to fit entirely on the slide, so I just cut the interesting part. On the left is a representation of the different options and goals that the user might have during the dialogue. At the beginning the user is talking about going to Toronto; then he explores other options, shown in green; but at the end of the dialogue he actually decides to go back to the Toronto trip. In this case the user goal changes during the dialogue, and the user also jumps from one goal to another. If we want to be able to actually book the Toronto package for this user, we need to remember it.
0:03:51 Let's dive into the details of the dataset. First, the domain: it's the travel domain, with travel packages consisting of a round-trip flight and a hotel. This is an example of a package: you have the flights with their times and dates, and for the hotel you have the category (the number of stars), guest ratings on a scale of one to ten, amenities, and vicinity. The top graph, which is a bit too small to read, shows the distribution of hotel vicinities: things like shopping malls, museums, universities, airports, et cetera. The bottom graph shows the number of amenities per hotel; amenities could be breakfast, wifi, whether the hotel has a spa, those kinds of things. Most hotels have more than one amenity, so that users had some grounds on which to compare hotels against each other. In total we had 268 hotels in 109 cities in the database.
0:05:10 For this dataset we hired twelve participants to collect the entire corpus over twenty days. Four of the participants worked through the entire data collection, and the others were hired for just one week. Each dialogue was conducted through a chat on Slack: we had a bot that paired up a user and a wizard, and then they were able to chat. When a user was paired with a wizard, he would get a task. We generated those tasks from templates like this one: basically, we tell the user his goal, and to generate the tasks we replace the placeholders for the different entities with values that we randomly drew from the database.
0:06:03 To vary the tasks, we added a success probability to each template. For this template, we would say it has a probability of 0.5 of succeeding. That means that when we actually query the database with the chosen entities, fifty percent of the time it will return results and fifty percent of the time it won't. When it won't return results, we would either tell the user to close the dialogue, or give him some alternative, like: if nothing matches your constraints, then try increasing your budget by twelve hundred.
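The template-instantiation scheme just described can be sketched as follows. This is a minimal illustration, not the authors' code: the template string, the toy database, and all field names are hypothetical.

```python
import random

def generate_task(template, db, p_success=0.5, rng=random):
    """Instantiate a task template: with probability p_success, pick
    constraint values the database can satisfy; otherwise pick values
    that return no results, so an alternative must be offered."""
    dest = rng.choice(sorted({p["dest"] for p in db}))
    prices = [p["price"] for p in db if p["dest"] == dest]
    if rng.random() < p_success:
        budget = max(prices)        # at least one package will match
    else:
        budget = min(prices) - 1    # no package will match
    task = template.format(dest=dest, budget=budget)
    solvable = any(p["dest"] == dest and p["price"] <= budget for p in db)
    return task, solvable

# Hypothetical template and database (placeholders use str.format syntax):
template = "Find a package to {dest} for under ${budget}."
db = [{"dest": "Toronto", "price": 1800},
      {"dest": "Toronto", "price": 2400},
      {"dest": "Recife", "price": 1500}]
print(generate_task(template, db, rng=random.Random(0)))
```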
0:06:42 As I said, we only had twelve participants, and we collected a bit more than a thousand dialogues. To keep it interesting for them, we told them to play roles and to vary the way they speak to the wizard. To encourage this a bit more, we also wrote some fun templates like this one: this was at the time when Pokémon Go was very popular, so we told them to pretend that they are a Pokémon hunter who really wants to go to a city because there is a very rare Pokémon there, and that they should find a good package to do so. We created such templates and released them throughout the data collection, so that the participants would keep getting different tasks and would stay engaged.
0:07:39 We also gave some instructions to the users to make sure that we collected dialogues we could use. We told them not to use too many abbreviations, but also to use some, so that the data stays a bit realistic. We told them to make the dialogues personal. We also told them to feel free to end the conversation at any time, because we wanted them to behave like real users, and we created some templates that would encourage this. One of the templates read: you're a pop star, you're an absolute diva, and you won't accept anything under five stars. So sometimes the user would act like a diva and just close the dialogue and leave. That was interesting for us, because it gives us different cases in the dataset: successful dialogues, but also dialogues where the user just leaves. We also told them to try to spell things correctly, to keep the language processing not too complicated. Finally, we told them to try to determine what they could get for their money, so that they would really explore the options, compare the hotels, and try to figure out what's in the database.
0:08:49 On the wizard side, the agent playing the role of the dialogue system, at the beginning of each dialogue they get a link to a search interface that looks like this. On the left you have all the searchable fields, and on the right you have the results; for each search the wizard always gets up to ten results, so from zero to ten. You can also see the little tabs on top. Every time the user changed a constraint (here it's the city, Baltimore), say the user asks "okay, what about Toronto?", we create a new search tab, so that if the user wants to go back, the wizard can do it easily and doesn't have to repeat the search all over again.
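The tab behaviour described here amounts to caching one search per distinct constraint set. A minimal sketch, with a hypothetical run_query callback standing in for the real database search:

```python
def get_search_tab(tabs, constraints, run_query):
    """One tab per distinct constraint set: reuse the cached results if
    the user comes back to an earlier search, otherwise run the query
    and open a new tab. The wizard only ever sees up to ten results."""
    key = tuple(sorted(constraints.items()))
    if key not in tabs:
        tabs[key] = run_query(constraints)[:10]
    return tabs[key]

# Demo with a fake query function that records how often it runs:
calls = []
def fake_query(constraints):
    calls.append(dict(constraints))
    return [{"hotel": "Hotel %d" % i} for i in range(12)]

tabs = {}
first = get_search_tab(tabs, {"dst_city": "Baltimore"}, fake_query)
again = get_search_tab(tabs, {"dst_city": "Baltimore"}, fake_query)
print(len(first), len(calls))
```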
0:09:52 We also gave instructions to the wizards, and those were quite critical for us to obtain a dataset where we can actually try to imitate the wizard's behaviour. We told them to be polite, and not to jump on the role played by the user or claim that it's a mistake. The next point relates to that: we told them "your knowledge of the world is limited to the database", because we don't want the wizard to start talking about Pokémon, or doing things that we wouldn't want a dialogue system to do. So we told them: you know the user is going to play a role and be kind of funny, but try to just talk like a dialogue system, basically. We also told them to try to spell things correctly. For the second point, we told them to vary the way they answer the user, and that sometimes they could say something a bit impromptu: imagine you're having a dialogue and in the middle of it the wizard says "hello"; it doesn't make sense. We did that because we have a lot of experience training dialogue systems with reinforcement learning, and the problem there is that if you only have positive examples, you never see what a mistake looks like, something that you shouldn't do at some point of the dialogue, and that makes training a bit hard.
0:11:03 As a way to measure how good the wizards were, we asked the user to rate the dialogue at the end of each dialogue. We told them to base the rating only on the wizard's behaviour: if they didn't get any results because there was nothing matching in the database, but the wizard was helpful, we told them to give the maximum score. We have those scores on a scale of one to five, and they are available with the dataset. As you can see, most dialogues have the maximal score of five, but some have lower scores, because the wizard was not fully cooperative or took actions that were not very helpful.
0:11:49 Now some other statistics of the corpus. This is the proportion of dialogues per dialogue length, that is, the number of turns in a dialogue. As you can see, the bulk of the dataset is around fifteen turns, and the average is about fifteen turns per dialogue, so even though we have only 1,369 dialogues, we have about twenty thousand turns in total. Then this is the distribution of dialogue act types in the dataset; we had about twenty dialogue act types. And this is the number of dialogue acts per turn: because these are human-human dialogues, there was very often more than one dialogue act per turn, and as you can see, about three percent of the time there is more than one dialogue act type per utterance.
0:12:46 Now, the frames in the dataset. What is a frame? As I said, what we really want to do is remember everything that the user has told us during the dialogue, so that we can get back to an option if the user decides to book that option in the end. We took inspiration from state tracking and from the definition of a state in the Dialog State Tracking Challenge. In that challenge, the state is defined by the user constraints and the user requests, that is, everything that the user asks for: if he asks for the price or for the name of the hotel, that's a request. We also added things that we saw in the dataset and that we needed. One is user binary questions: a request is when the user asks for the price; a binary question is when the user asks "is the price two thousand dollars?", for instance, which calls for a yes/no answer. We also added comparison requests, where the user asks to compare something between two hotels; you can ask whether hotel A is cheaper than hotel B, for instance.
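Putting the pieces of this state definition together, one frame could be represented roughly like this (a sketch only; the field names are illustrative, not the dataset's actual annotation keys):

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """One user goal: the constraints set so far, plus the three kinds
    of user questions described in the talk."""
    frame_id: int
    constraints: dict = field(default_factory=dict)  # e.g. {"dst_city": "Toronto"}
    requests: list = field(default_factory=list)     # slots asked for, e.g. "price"
    binary_questions: list = field(default_factory=list)  # (slot, value) yes/no checks
    compare_requests: list = field(default_factory=list)  # (slot, frame_a, frame_b)

f = Frame(1, constraints={"dst_city": "Toronto", "budget": 2000})
f.requests.append("price")                  # "what is the price?"
f.binary_questions.append(("price", 2000))  # "is the price $2000?"
f.compare_requests.append(("price", 2, 3))  # "is hotel A cheaper than hotel B?"
print(f)
```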
0:13:58 These are examples of frames and of how they are related: the two hotel frames are children of the frame above them, as you can see. Something new in our dataset is that frames can be created by users but also by wizards: every time the wizard makes a proposition for a hotel, we create a frame, because we want to remember it in case the user wants to book this hotel. We made up a few rules for frame creation after analysing the dataset and seeing what makes sense.
0:14:35 For frame creation, we create a new frame every time the user changes a value. Here, at the beginning, the user wants to go to Atlantis, so that's one frame. Then, in this other utterance, the user asks to go to Neverland instead; the destination city changes, so we create a new, separate frame with this value for the destination city. He actually changes more entities here, but one entity changing is enough to create a new frame. That's one type of frame creation, but we also create a new frame when the wizard makes a proposition for a hotel, and we put all the properties of the hotel into this frame. This slide gives you the frequencies of those behaviours in the dataset.
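The two creation rules (a user-changed value, or a wizard offer) can be sketched as a single function. This is an illustration of the rules as described, not the annotation code; the act and frame representations are made up for the example.

```python
def maybe_create_frame(active, act, frames, next_id):
    """Frame-creation rules as described: a user 'inform' that changes an
    existing constraint value starts a new frame, and a wizard 'offer'
    always starts a new frame holding the proposed package's properties."""
    author, act_type, slots = act
    changed = any(active["constraints"].get(s) not in (None, v)
                  for s, v in slots.items())
    if (author == "user" and act_type == "inform" and changed) or \
       (author == "wizard" and act_type == "offer"):
        new = {"id": next_id, "constraints": {**active["constraints"], **slots}}
        frames.append(new)
        return new
    active["constraints"].update(slots)  # no new frame: extend the current one
    return active

frames = [{"id": 1, "constraints": {"dst_city": "Atlantis"}}]
active = frames[0]
# Changing the destination city creates frame 2:
active = maybe_create_frame(active, ("user", "inform", {"dst_city": "Neverland"}), frames, 2)
# Adding a brand-new constraint does not create a frame:
active = maybe_create_frame(active, ("user", "inform", {"budget": 2000}), frames, 3)
print([f["id"] for f in frames], active["constraints"])
```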
0:15:21 As for changing frames, as you can see, it's all user-controlled, because we want the wizard, and thus the dialogue system, to really be an assistant that proposes things; the user controls what we're talking about, the topic of the dialogue. So only the user has the power to change the frame we're talking about. That happens when we create a new frame: when the user provides a new value, for example changes the destination city, we automatically switch to that new frame. If the user decides to consider an option, a hotel, and asks for more information about this option, then we also switch to the frame corresponding to that option. And we can also switch to an earlier frame: if the user says, for instance, "okay, let's go back to the Toronto package", then we switch to the frame corresponding to the Toronto package.
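The "go back to an earlier frame" rule can be sketched as a matching step over previously created frames. Again, this only illustrates the described behaviour, with a made-up frame representation.

```python
def switch_frame(active, user_slots, frames):
    """If the user mentions slot values that all match an earlier frame
    (e.g. 'let's go back to the Toronto package'), make that frame
    active again; otherwise stay on the current frame."""
    for frame in reversed(frames):  # prefer the most recently created match
        if frame is active:
            continue
        if user_slots and all(frame["constraints"].get(s) == v
                              for s, v in user_slots.items()):
            return frame
    return active

toronto = {"id": 1, "constraints": {"dst_city": "Toronto"}}
neverland = {"id": 2, "constraints": {"dst_city": "Neverland"}}
frames = [toronto, neverland]
back = switch_frame(neverland, {"dst_city": "Toronto"}, frames)
print(back["id"])
```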
0:16:21 We also have annotations for dialogue acts and slots. For the dialogue acts, we have general-purpose functions, the typical dialogue acts: inform, offer, compare. We also have dialogue acts specific to frame tracking, such as switch_frame for the case when the user switches to a frame. For the slots, we have all the fields in the database, plus specific slots describing particular aspects of the dialogue. One is intent, the intent of the user, to book for instance. Action is its counterpart on the wizard side: if the wizard books a hotel, we annotate it as action=book. Count is when the number of hotels in the database matching the user constraints is given: sometimes the wizard will say "I have three more matching your constraints", and we would annotate that with count=3. Then we have slots specific to reporting the creation and modification of frames. We actually annotated the frames and their content automatically based on these slots: ref assigns each new frame a new reference id and is also used every time the user references a past frame; and then there are read and write.
0:17:47 I'm going to go faster here. This is an example of how we used read and write. For read: we are in frame 5 here, meaning the active frame is frame 5, but the wizard talks about values that were provided in frame 4, so we read those values from frame 4 and put them into frame 5. For write: on the last utterance, the wizard provides new information about a frame that we already talked about before, so we write this information into the previous frame, frame 4, even though the currently active frame is frame 6. It's a bit complicated, but it's basically a way to keep track of all the values and populate the content of the frames.
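In code, the read and write operations described here are just value copies between frames. A sketch with a made-up frame representation (not the dataset's actual annotation format):

```python
def read_from(frames, src_id, dst_id, slots):
    """'Read': values first given in frame src_id are mentioned again
    while frame dst_id is active, so copy them into the active frame."""
    src = next(f for f in frames if f["id"] == src_id)
    dst = next(f for f in frames if f["id"] == dst_id)
    for slot in slots:
        dst["constraints"][slot] = src["constraints"][slot]

def write_to(frames, frame_id, slots):
    """'Write': the wizard gives new information about a non-active
    frame, so store it directly in that frame."""
    frame = next(f for f in frames if f["id"] == frame_id)
    frame["constraints"].update(slots)

frames = [{"id": 4, "constraints": {"price": 1800}},
          {"id": 5, "constraints": {}},
          {"id": 6, "constraints": {}}]
read_from(frames, 4, 5, ["price"])        # wizard repeats frame 4's price in frame 5
write_to(frames, 4, {"breakfast": True})  # new info about frame 4 while frame 6 is active
print(frames)
```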
0:18:42 Here are some statistics of frame changes in the dataset. The average number of frames created per dialogue is 6.7, and the average number of frame switches is 3.58, with a lot of variability between the dialogues, as you can see here. So we did observe the behaviour that we wanted to observe. We also had five experts annotating the dataset, and we evaluated how well they agreed on the annotation; we got reasonable agreement.
0:19:21 We propose baselines for this dataset. One is an NLU baseline, meant to show how hard the NLU task is. We adapted a model published in 2016, predicting dialogue act types, slots, and slot values, and we get about eighty percent accuracy; that's already pretty good, but there is room for improvement. For frame tracking, we propose a first task. If you want to create a dialogue system that can keep in memory all the frames talked about during the dialogue, you will have to create the frames dynamically throughout the dialogue; but we decided to take a first step with a simpler task. You know all the frames created so far, you have the new user utterance, and you have the NLU annotation for this utterance, so you know the dialogue acts and the slot types; the task consists of finding, for each dialogue act, the frame that it references. Here, for instance, the first slot references frame number 1, "budget=cheaper" actually makes us create a new frame, and "flexibility=true" refers to the current frame.
0:20:42 We proposed a rule-based baseline that was very simple: we just observed some behaviours in the dataset and wrote very simple rules. Basically, if the user informs a new value, we create a new frame; we switch to a previous frame if we find the values that the user is talking about in one of the previous frames; and we have similarly simple rules for the other frame switches. The performance was bad, because rules are not enough for this task. We broke the performance down over different cases in the dataset. For frame switching: if the user provides a slot, say "let's go back to the Toronto package", then we get about forty-five percent performance. If the user refers to a previous frame without specifying a slot, it's harder to understand which frame the user is talking about. After the wizard proposes a hotel, so after an offer, most of the time the user will ask for more information about this hotel, so very often we switch to that frame; that case is easier to predict than when there is no offer, where we get lower performance. For frame creation, we can predict when no frame is created, but it's harder to predict when a frame is created.
0:22:13 As follow-up work, we wrote a paper with a better model that outperforms this baseline by a lot; we presented it at a workshop at ACL very recently. To conclude: this is a new human-human dataset for studying complex state tracking; we have turn-level annotations of dialogue acts, slots, and frames; and we propose a new task, frame tracking, along with some baselines. Thanks for your attention.
0:22:49 Chair: We have a few minutes for questions.
0:23:02 Audience (partly inaudible): With only twelve participants producing over a thousand dialogues, did the users actually vary their language?
0:23:18 Speaker: Just by eyeballing, we didn't really compute anything, but looking at the dialogues, they really got into it: they played the roles and they changed their language. Sometimes it goes from very polite to more, like, young speech. There's a lot of variability thanks to the role playing.
0:23:51 Audience (partly inaudible): Is it possible for the user to ask about combinations, several things at once?
0:24:10 Speaker: That's something we decided not to deal with: we actually asked the participants to always talk about one thing at a time. [The follow-up exchange is largely inaudible.]
0:24:42 Audience (partly inaudible): Thank you, very interesting talk. Could you use the wizard's search queries and their detailed results as additional supervision?
0:25:04 Speaker: We record all those searches and the end results of the searches. That's an idea we had; we haven't really tried to see if it's reliable. Also, not everything was searchable in the database, so that's probably hard. But we're actually collecting more dialogues right now to make the dataset bigger, and now we're going to make all the fields in the database searchable, so that we can record all those searches and then do something like that.
0:25:39 Chair: Just one more question? No? Okay, let's thank the speaker again.