hi everyone i'm layla i'm from maluuba and i'm here to present

a dataset i collected and annotated with my colleagues at maluuba

hannes is actually here with me if you want to talk to him


the motivation behind this dataset is that there is a need

for dialogue systems to be able to handle complex interactions

one motivation comes from studies in e-commerce there is a paper by moe and

fader in twenty eleven

where they show that users who come to an e-commerce website sometimes come with

a very well defined goal

in mind but sometimes they just come to shop around or they don't really know

what they want and just want to look for options

there is also

some interest in the dialogue community and most notably there was a paper last

year at sigdial by sungjin lee and amanda stent

i think it was the best paper

last year

it's task lineages dialog state tracking for flexible interaction

and in this paper they try to move beyond the traditional

linear slot filling paradigm and try to handle more complex

conversations where you have different user goals possibly across domains

so for this work they actually didn't have a proper dataset to

test their method because there wasn't anything available

so they

modified an existing dataset and so we decided to actually try to collect data

and promote this kind of work for future dialogue systems

so we collected one thousand three hundred and sixty nine human-human interactions in the travel domain


we also propose a new task frame tracking and the dataset is fully annotated and

publicly available at this url

so when i talk about linear slot filling what i mean is something like this

this is actually a dialogue from the dataset

and here the user basically gives some constraints he wants to go

somewhere from columbus he doesn't really know where

then the wizard that is the agent who plays the role of the dialogue system

proposes two options vancouver or toronto then the user gives a bit more information

about his constraints

and then the user asks

for information about the offers from the wizard

and at the end the user books

one of the proposed trips

so here the user goal never really changes during the dialogue it's really just drilling

down some options

and by nonlinear slot filling i mean something like this dialogue which is also from

our dataset it was too long to put entirely on the slides

so i just cut the interesting part

so here this is a representation of the different options that the user

can you see the mouse okay

so on the left

this is a representation of the different options and goals that the user might

have during the dialogue

so by nonlinear slot filling what i mean is that at the beginning the user

is talking about going to toronto

and then he explores other options i think in green

but at the end of the dialogue he actually decides to go back to the

toronto trip

so in this case

the user goal changes during the dialogue but the user also goes from one

goal to the other and if we want to be able to actually book the

toronto package for this user we need to remember it

so let's dive into the details of the dataset first the domain so it's

a travel domain we had travel packages with a round trip flight and a hotel

this is an example of a package so you had the hotel

the flights with their times and dates

and for the hotel we had the category which is the number of stars

we also had guest ratings on a scale

of one to ten and amenities and vicinity so

on the graphs

the first one is

a bit too small to read

but it's vicinity so in the vicinity of the hotel you have something like shopping malls museums

universities airports et cetera so that's

the distribution

and on the

bottom graph we had the number of amenities per hotel so the amenities could

be breakfast wifi

whether the hotel has a spa those kinds of things

and for most hotels we had more than one amenity so that

the users

had some ground

some material to compare hotels against each other

and we had two hundred and sixty eight hotels and one hundred and nine cities in the database


so for this dataset we hired

twelve participants to collect the entire data

over twenty days so our data collection lasted

twenty days four of the participants

did the entire data collection and the other ones were hired for just one week

and each dialogue was performed via a chat on slack

so we had a bot that was pairing up two users

and then they were able to chat so when the user was paired

with a wizard he would get a task

and we generated those tasks based on templates like this one

so basically we tell the user his goal

and to generate those tasks from the templates we just replace the placeholders for

the different entities with values that we randomly drew from the database


to vary the tasks

we actually

set a success probability for each template

so for this template we would say

it has a probability of

point five to succeed

so that means that when we actually query the database with the entities

fifty

percent of the time it will return results and fifty percent of the time

it won't return results

and when it won't return results we would either

tell the user to close the dialogue

or we would give him some alternative like if nothing matches your constraints then try

increasing your budget by twelve hundred
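
The task generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the template text, the tiny `PACKAGES` database, the `search` helper and the way a failing task is built (a budget just below the cheapest matching package) are all hypothetical and are not taken from the actual collection scripts.

```python
import random

# Hypothetical toy database; the real one held hotels, flights and cities.
PACKAGES = [
    {"origin": "columbus", "destination": "vancouver", "price": 1800},
    {"origin": "columbus", "destination": "toronto", "price": 1500},
    {"origin": "baltimore", "destination": "toronto", "price": 2100},
]

# Hypothetical template with placeholders for the entities.
TEMPLATE = "Find a trip from {origin} to {destination} for at most {budget} dollars."

# Alternatives handed to the user when the search is meant to fail.
FALLBACKS = ["close the dialogue", "try increasing your budget by 1200"]


def search(origin, destination, budget):
    """Return every package that satisfies the user constraints."""
    return [
        p for p in PACKAGES
        if p["origin"] == origin
        and p["destination"] == destination
        and p["price"] <= budget
    ]


def generate_task(success_probability, rng):
    """Fill the template with drawn values; succeed with the given probability."""
    should_succeed = rng.random() < success_probability
    package = rng.choice(PACKAGES)
    if should_succeed:
        # A budget equal to the package price guarantees at least one result.
        budget = package["price"]
    else:
        # A budget below the cheapest package on this route guarantees none.
        cheapest = min(
            p["price"] for p in PACKAGES
            if p["origin"] == package["origin"]
            and p["destination"] == package["destination"]
        )
        budget = cheapest - 100
    values = {
        "origin": package["origin"],
        "destination": package["destination"],
        "budget": budget,
    }
    task = {
        "text": TEMPLATE.format(**values),
        "values": values,
        "expect_results": should_succeed,
    }
    if not should_succeed:
        task["fallback"] = rng.choice(FALLBACKS)
    return task


task = generate_task(0.5, random.Random(0))
print(task["text"])
```

With a success probability of 0.5, roughly half of the generated tasks lead to an empty search, and those tasks carry one of the fallback instructions.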


so as i said we only had twelve participants and we collected a bit more

than a thousand dialogues

so to keep it interesting for them

we tried to tell them to play roles and try to vary the way they

speak to the wizard and to encourage this a bit more we also

wrote some fun

templates like this one so that was at the time when pokemon go was very

popular so we told them to pretend that they are a pokemon hunter and they really

wanna go to this city because there is a very rare pokemon there and that

they should find a good package to do that


to keep it interesting we created such templates and we introduced them

throughout the data collection so that they would have different tasks and

they would stay engaged in the data collection

we also gave some instructions to the users to make sure that we collected dialogues

that we could use so we told them to not use too much uncommon

slang but also to use some so that the data is a bit realistic

so we told them to make personally the lectures and


we also told them to feel free to end the conversation at any time because

we wanted them to feel like they're real users

and for that we also created some templates that would

encourage this like one of the templates was

you're a pop star you're an absolute diva and you won't accept anything under five


so sometimes you know they would act like a diva and just close the

dialogue and leave so that was interesting for us to have different cases the

successful dialogues and dialogues where the user would just leave

we also told them to try to spell things correctly to keep it not too complicated

and we told them to

try to determine what they can get for their money so that they would really

explore the options compare the hotels and

try to figure out what's in the database

so on the wizard side so the agent

playing the role of the dialogue system at the beginning of each dialogue they get

a link to a search interface that looked like that

so on the left

you have all the searchable fields and on the right you have the results

and for each search the wizard will always get up to ten results so from

zero to ten

and you can also see

the little tabs on top

so basically what we did is that

every time the user would change a constraint so here it's

for the city baltimore

if the user would say then okay what about toronto then we create

a new search tab so that if the user wants to go back to an earlier search

the wizard can do it easily and wouldn't have to repeat the search over again

and we also gave instructions to the wizards

those were quite

critical for us to be able to have a dataset where we can actually try

to imitate the wizard behaviour

so we told them to be polite and not jump

in on the role played by the user

claim that a mistake

and this point also relates to that we told them your knowledge of

the world is limited by the database because we don't want the wizard

to start talking about pokemon

or things that we don't want a dialogue system to do so we just

told them

you know that the user is gonna play a role and be kind of funny

but try to just

talk like a dialogue system basically

we also told them to try to spell things correctly for


and now the second point we told them to vary the way they answer

the user and we told them that sometimes

they can try to say something that is a bit impromptu so imagine if you're

having a dialogue and in the middle of it the wizard would say hello

it doesn't make sense

and we did that because

we have a lot of experience in training dialogue systems with reinforcement learning and the

problem with that is that if you only have

positive examples and you don't know

what a mistake looks like so something that you shouldn't do at some point of

the dialogue it makes it a bit hard

and as a way to

measure how

often that

was there in the dataset we asked

the user to rate the dialogue at the end of each dialogue

and we told them to base the rating only on the wizard behaviour so if

they didn't get any results because there wasn't any result in the database

but the wizard was helpful then we told them to give a maximum score

so we had scores on a scale of one to five and those are available

with the dataset

and as we can see most of

them have

the maximal score of five but some have

lower scores because the wizard was not completely cooperative and took actions that were not

very helpful

then other statistics of the corpus this is the proportion of dialogues

per dialogue length so number of turns in a dialogue as you can see

most of the dataset is around

fifteen turns the average is fifteen turns per dialogue so even though we have

only one thousand three hundred sixty nine dialogues we have about twenty thousand turns in the dataset


then this is

the distribution of dialogue act types in the dataset so we had about

twenty dialogue act types

and the number of dialogue acts per turn so during one turn because it's human

dialogues

there was more than one dialogue act per turn very often as you can see

about three percent of the time

there is more than one dialogue act type per utterance


so now the frames so what's a frame

so as i said what we really want to do is

remember everything that the user has

told us during the dialogue so that we can

get back to one option if the user decides to book that option in the end


so we took inspiration from state tracking and the definition of a state in the

dialog state tracking challenge in this challenge they define the state by the user constraints

and the user requests so everything that the user asks if he asks for

the price or for

the name of the hotel that's a request

and we also added things that we

saw in the dataset and that we needed

one is user binary questions

so

a request is like the user is asking for the price

a binary question is when the user asks is the price

two thousand dollars for instance so that's a yes no answer

and we also had comparison requests

where the user asks

to compare something between two hotels he can ask if hotel

a is cheaper than hotel b for instance
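
The state described here, with constraints, requests, binary questions and comparison requests, can be pictured as a small record. The following is a minimal sketch in Python; the class name `Frame` and the field names are illustrative assumptions and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One user goal or option, holding the four kinds of state from the talk."""
    frame_id: int
    # Slot values the user has asked for, e.g. destination or budget.
    constraints: dict = field(default_factory=dict)
    # Slots whose value the user wants to know, e.g. "price".
    requests: list = field(default_factory=list)
    # Yes/no questions: slot mapped to the value being checked.
    binary_questions: dict = field(default_factory=dict)
    # Slots the user wants compared between two options.
    compare_requests: list = field(default_factory=list)


frame = Frame(frame_id=1, constraints={"dst_city": "toronto"})
frame.requests.append("name")            # "what is the name of the hotel"
frame.binary_questions["price"] = 2000   # "is the price two thousand dollars"
frame.compare_requests.append("price")   # "is hotel a cheaper than hotel b"
print(frame)
```

Keeping the four kinds of information in separate fields makes it explicit that a request asks for a value while a binary question only asks to confirm one.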

and so those are examples of frames and how they're related so those two

hotels are children of the

the bowl


as you can see

and something new in our dataset is that

frames can be created by users but also by wizards so every time the

wizard makes a proposition for a hotel we create a frame because we want to remember

it in case the user wants to book this hotel

so we

made up a few rules for frame creation after analysing the dataset and seeing what

makes sense

and for frame creation

we create a new frame every time the user changes a value so here at

the beginning the user wants to go to atlantis so that's one frame

and then on this other utterance the user asks to go to neverland and

so the destination city changes and we create a new separate frame with this value

for the destination city

he actually changes more entities here but we just need one entity to

change to create a new frame

and so that's one type of frame creation but we also create a new frame

when the wizard makes a proposition for a hotel and we put in this frame all

the properties of the hotel

so that gives you the frequencies of those behaviours

in the dataset

as for changing frames

as you can see it's all user controlled

because we want

the wizard to really be an assistant

so a dialogue system to really be an assistant and propose things but then the

user controls what we're talking about the user controls the topic

in the dialogue so only the user has the power to change the frame

that we're talking about

and so that happens

we switch to a new frame when the user proposes new values like if he changes the

destination city then we automatically switch to that new frame

if the user decides to consider an option a hotel and asks for more information about

this option then we also switch to the frame corresponding to

that option

and we can also switch to an earlier frame if the user says for instance

in the dialogue that i showed earlier okay let's go back to the toronto package then we

switch to the frame corresponding to the toronto package
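
The creation and switching rules above can be sketched as a small tracker. This is a simplified illustration, not the annotation tool used for the dataset: the class `FrameTracker`, its method names, and the choice to copy the remaining values into a newly created frame are assumptions made for the example.

```python
class FrameTracker:
    """Toy tracker: users and wizards create frames, only users switch."""

    def __init__(self):
        self.frames = []    # each frame is a dict of slot values
        self.active = None  # index of the frame currently discussed

    def _new_frame(self, values):
        self.frames.append(dict(values))
        return len(self.frames) - 1

    def user_inform(self, slot, value):
        """A changed value creates a new frame and makes it active."""
        if self.active is None:
            self.active = self._new_frame({slot: value})
            return
        current = self.frames[self.active]
        if slot in current and current[slot] != value:
            # Assumption: the new frame inherits the other values.
            updated = dict(current)
            updated[slot] = value
            self.active = self._new_frame(updated)
        else:
            current[slot] = value

    def wizard_offer(self, hotel):
        """An offer creates a frame but does not change the active one."""
        base = {} if self.active is None else self.frames[self.active]
        return self._new_frame({**base, **hotel})

    def user_switch(self, frame_index):
        """Only the user moves the conversation to another frame."""
        self.active = frame_index


tracker = FrameTracker()
tracker.user_inform("dst_city", "atlantis")    # first frame
tracker.user_inform("budget", 2000)            # same frame, new slot
tracker.user_inform("dst_city", "neverland")   # changed value: second frame
offer = tracker.wizard_offer({"name": "hotel a", "price": 1800})
tracker.user_switch(offer)                     # user asks about the offer
tracker.user_switch(0)                         # "let's go back to atlantis"
print(len(tracker.frames), tracker.active)
```

Note that `wizard_offer` returns the index of the frame it created without touching `active`, which mirrors the point that the wizard proposes but the user controls the topic.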

we also have annotations for dialogue acts and slots

so the dialogue acts

we have general purpose functions so kind of typical dialogue acts inform offer compare

we also have a dialogue act specific to frame tracking which is switch frame

that is the case when the user switches to another frame

then for the slots we have all the fields in the database we also

have specific slots describing specific aspects of the dialogue

one is intent so the intent of the user is to book for instance

action is the counterpart on the wizard side so if the wizard books a

hotel we annotate it as action equals book

and count is when the wizard gives the number of hotels in the database corresponding

to the user constraints so sometimes the wizard will say i have three hotels

in baltimore for instance and we would

annotate it with count equals three

and then we have specific

slots to record

the creation and modification of frames

so we actually

automatically annotated the frames and the content of the frames based on those slots

so those slots are id so each new frame gets a new id

reference so every time the user references a past frame

and read and write

so i'm gonna go faster here

so that's an example of how we use read and write

for read

basically so we're in frame five here because the active frame is

frame five

but the wizard talks about

values that were provided in frame four so we read those values from frame four and

we put them in frame five

and for write it's on the last utterance

the wizard provides new information

about a frame that we already talked about before so we write this information

in the previous frame frame four

even though the currently active frame is

frame number six i guess it's a bit

complicated like that but

it's basically a way to keep track of all the values and then

populate the content of the frames
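
The read and write operations can be illustrated with two small helpers. This is a hedged sketch: the function names `read_values` and `write_values`, the dictionary layout and the slot values are assumptions chosen to mirror the frame four, five and six example, not the dataset's annotation format.

```python
def read_values(frames, active_id, source_id, slots):
    """Copy values mentioned from an earlier frame into the active frame."""
    for slot in slots:
        frames[active_id][slot] = frames[source_id][slot]


def write_values(frames, target_id, new_values):
    """Add new information to a non-active frame without switching to it."""
    frames[target_id].update(new_values)


# Hypothetical frame contents, keyed by frame id.
frames = {
    4: {"dst_city": "toronto", "budget": 2000},
    5: {"dst_city": "vancouver"},
    6: {"dst_city": "baltimore"},
}

# Read: frame five is active, the wizard reuses the budget given in frame four.
read_values(frames, active_id=5, source_id=4, slots=["budget"])

# Write: frame six is active, the wizard adds a detail about frame four.
write_values(frames, target_id=4, new_values={"wifi": True})

print(frames[5], frames[4])
```

In both cases the active frame stays the same; only the content of the frames is updated.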

so here are some statistics of frame changes in the dataset

the average number of frames

created per dialogue is six point seven

and the average number of frame switches is three point

fifty eight and we have a lot of variability between the dialogues

as you can see here

so we do observe the behaviour that we wanted to observe

we also had five experts annotating the dataset and we

evaluated how well they agreed on the annotation

and we got reasonable agreement

so we propose baselines for this dataset one is an nlu baseline that was

to show you kind of how hard the nlu task was

we adapted a model from arnold and colleagues published in twenty sixteen

and we predict dialogue act type slot type

and slot values and we get about eighty percent accuracy so

it's already pretty good but there is room for improvement

so for frame tracking we propose the following task

so if you want to create a dialogue system that's gonna be able to

keep in memory all the frames talked about during the dialogue you'll have

to create the frames dynamically throughout the dialogue but we decided to take the

first step

of having a simpler task

so you know all the frames created so far you have the new user utterance

and the nlu annotation for this user utterance so you know the dialogue acts and

the slot types

and the task consists of for each

slot value finding the frame that it references so here for instance

destination city equals mannheim references frame number one

budget equals cheaper actually creates a new frame

and flexibility equals true refers to the current frame

we proposed a rule based baseline that was very simple

we just observed some behaviours in the dataset and so we proposed a

very simple baseline so basically if the user informs a new value we create

a new frame

we switch to a previous frame if we find the values that the user

is talking about in one of the previous frames

and basically

very simple rules like those for

switching frames
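
A rule-based predictor in the spirit of this baseline might look like the sketch below. It is not the baseline from the paper: the function `predict_frame`, the `NEW_FRAME` sentinel, the order of the rules and the rule that an unseen slot fills the current frame are all assumptions, and the slot values are purely illustrative.

```python
NEW_FRAME = "new"  # sentinel meaning "this value creates a new frame"


def predict_frame(slot, value, frames, active_id):
    """Guess which frame a user slot value refers to, using simple rules."""
    # Rule 1: the value is already in the active frame.
    if frames[active_id].get(slot) == value:
        return active_id
    # Rule 2: the value appears in a previous frame, most recent first.
    for frame_id in sorted(frames, reverse=True):
        if frame_id != active_id and frames[frame_id].get(slot) == value:
            return frame_id
    # Rule 3 (assumption): a slot not set yet simply fills the active frame.
    if slot not in frames[active_id]:
        return active_id
    # Rule 4: a changed value creates a new frame.
    return NEW_FRAME


# Hypothetical frames created so far; frame 2 is the active one.
frames = {
    1: {"dst_city": "mannheim", "budget": "2000"},
    2: {"dst_city": "toronto", "budget": "2000"},
}

print(predict_frame("dst_city", "mannheim", frames, active_id=2))
print(predict_frame("budget", "cheaper", frames, active_id=2))
print(predict_frame("flex", "true", frames, active_id=2))
```

Rules of this kind only match literal values, so they cannot resolve a reference that does not repeat a slot value, which is one reason a learned model is expected to do better.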

and so the performance was bad because rules are not enough to do this

we kind of break it down based on

different cases in the dataset so

for frame switching

if the user provides a slot so he says let's go back to

the toronto package

then we get about forty five percent performance

if the user refers to a previous frame but without specifying a specific slot

then it's harder because it's harder to understand what the user is talking about

after the wizard proposes a hotel so after an offer

most of the time the user will ask for more information about this hotel so

very often we would switch to that frame so that's easier also to predict

and it's easier than when there is no offer where we get a lower performance


and for frame creation we can predict that no frame is created but it's harder

to predict when a frame is created

and as followup work

we had a paper with a better model that

outperforms the baseline by a lot

we presented it at a workshop at acl very recently

and so to conclude this is a new human-human dataset to study complex state tracking

we have turn level annotation of dialogue acts slots and frames we also propose a

new task which is frame tracking and some baselines

thanks for your attention

we have a few minutes for questions

fixed would talk could utilize the language variability

but it's a few but anyway

over one thousand dialogues actually the user actually filled or increasing the

so by just eyeballing we didn't really

compute anything but by just looking at the dialogues they really get

into it they play the roles and they just change their language sometimes it goes

from very polite to more

like young speaking so there's a lot of variability thanks to

possible to combinations so it is to monitor from you to see would be

to generate will fall

so it's of combinations able to do something over it sorry if

but only from you

to work well

so that's

that's something we decided not to deal with we actually asked them to always talk

about one thing at a time

but with the true for example the system

is it should have seen from small words


we would have would have all right

to thank you for interesting to before i just quickly you and the u

three point but the among the you can use you pixels detailed results tools to

promote collagen dreams

so we recorded all those searches and the end result of the searches

that's an idea that we had we haven't really tried to see if

it's really reliable but

not everything was searchable in the database so that's probably hard and we're actually

we're collecting more dialogues right now to make it bigger

and now we're gonna make all the fields in the database searchable

so that we can record all those searches and then do something like that

just one more question

okay let's thank the speaker again