0:00:14good afternoon
0:00:17i am casey kennington
0:00:20currently at boise state university but this is work that i did
0:00:24while i was at bielefeld university along with david schlangen and
0:00:29and i'm gonna give my two cents on
0:00:31a continuation i guess on yesterday's discussion on personal assistants
0:00:36'cause we're gonna tell you a little bit about a personal assistant that we've
0:00:39been working on
0:00:40and if you don't know what a personal assistant is you're in the wrong conference
0:00:46you've heard of them you've used them and they're great i mean they're useful
0:00:51and we dialogue people aren't the only ones using them lay people are using them
0:00:55quite often quite regularly
0:00:58but
0:01:00when these laypeople use these
0:01:03systems
0:01:03these dialogue systems essentially these personal assistants they do weird things with them and they
0:01:08complain about a myriad of things
0:01:11and so today i want to talk about a few of those things and maybe make
0:01:15an approach at addressing a couple of them
0:01:18one thing is that they kind of have a difficulty signalling affordances someone showed a book
0:01:23yesterday on things you can do with siri
0:01:25why does it need a book
0:01:28that needs to be signalled somehow and a lot
0:01:35of these show speech recognition output and sometimes it's great perfect
0:01:39but you know well
0:01:41that speech recognition even if it is perfect does not mean it understood
0:01:46there's something else that needs to happen here
0:01:49they don't know that it understood until it finally does something comes back and the results
0:01:52are
0:01:53maybe what they wanted maybe not
0:01:55another thing is the user has to
0:01:58express their intent in one go
0:01:59they have to say the whole thing wait for it to get back to them
0:02:02and then they can continue wanting
0:02:05sort of like this again with the system
0:02:08looking into that a little bit more if you if you consider a
0:02:12personal assistant on a continuum at one extreme you have these
0:02:17personal assistants that don't even really want to talk to you
0:02:22they
0:02:23it's apparently easier to predict your life than it is to predict
0:02:28what you're trying to say and so google now is trying to do this and this
0:02:31is useful
0:02:34on the other side of the continuum you have the full turn
0:02:38personal assistant that is expecting you to
0:02:40give an entire intent and then it
0:02:42does all its understanding and gives you some kind of response maybe there's something
0:02:46in the middle that would be a little bit nicer
0:02:49sub-turn a little bit to the left here so
0:02:52i say call mom and there's some sort of feedback that it understood me
0:02:56and i know that it understood me and i can amend it and then i can
0:02:59say on speaker phone and okay good
0:03:03and we can move this maybe even a little bit more to the left
0:03:06and say something like call and it shows mom
0:03:09on speaker phone
0:03:13it's
0:03:14exactly that's what i meant to say
0:03:16so there's a little bit of prediction it's not trying to predict your entire life it's
0:03:19allowing you to give at least part of the intent but it's doing some prediction
0:03:22there so we can maybe make our dialogue systems fit somewhere on this continuum that's useful
0:03:27for any particular user
0:03:29we want to look at this a little bit
0:03:32really quick related work some inspiration joyce tries work on misalignment manners signalling understanding and
0:03:37others work
0:03:40on backchannels stuff on arts and
0:03:44work on guis which we kind of are gonna do here and then of course
0:03:48the luis project
0:03:50we take inspiration from all of these
0:03:52for some reason none of these people are here
0:03:59but we're gonna do something using all of these as a sort of
0:04:03inspiration so we're gonna signal ongoing understanding
0:04:06using a gui
0:04:07assuming here of course that people have a way to display a gui so this might
0:04:11not work on something like the amazon
0:04:13echo but most people have their phones with them and can use the personal assistant
0:04:18with the display
0:04:21and with this gui backchannels don't overlap speech so if they're talking and it's
0:04:25updating and showing them its understanding then it's not gonna have any problems importantly it works
0:04:30incrementally
0:04:31that is word for word i'll explain that in a moment a little bit more and
0:04:34it works with
0:04:36minimal or no training data
0:04:41the rest of the talk is as follows i'm gonna explain our system
0:04:44and the components of it and then
0:04:47see if that system is worth its salt
0:04:50well first the system
0:04:53at first blush looks like any other dialogue system you've ever seen there's speech there's
0:04:57nlu there's dialogue management there's some way to convey the
0:05:03response to the user
0:05:04usually with nlg but in this case a gui
0:05:07the speech recognition i'm not gonna go into too much it's
0:05:10google asr we have it modularised here nicely to give us incremental
0:05:15results so word by word it's coming back to us and we take that incremental
0:05:22output from the asr give it to our nlu
0:05:25and our nlu is working in lockstep with that so it takes a word
0:05:30and we're gonna use the simple incremental update model which we introduced at
0:05:33sigdial in two thousand thirteen
0:05:36and without getting technical you can look at the paper if you like
0:05:40equations and things like that what you get is you give it a word
0:05:45and it's going to produce a distribution over slots
0:05:48and that can be given to the dm the dm the dialogue manager is gonna use
0:05:52that somehow
0:05:54with this little provision when someone utters a word
0:05:57and the asr gives us a word
0:05:59that is the same as or similar to
0:06:02a value that could fill a candidate slot
0:06:05then that's gonna get more credit and this is how we are able to make
0:06:09the system work with little or no training data and then build up from there
0:06:13that's the nlu
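The similarity provision described above can be illustrated with a minimal sketch. This is not the model from the paper; it only shows how each incoming word could add credit to candidate slot values in proportion to string similarity, so that a distribution is available after every word without any training data. The slot names, candidate values, and the `IncrementalNLU` class are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the real system uses.

```python
from difflib import SequenceMatcher

# Hypothetical candidate values per slot; not taken from the talk's system.
CANDIDATES = {
    "cuisine": ["thai", "italian", "chinese"],
    "contact": ["peter", "anna", "mom"],
}


def similarity(word, value):
    """String similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, word.lower(), value.lower()).ratio()


class IncrementalNLU:
    """Accumulates credit for (slot, value) pairs one word at a time."""

    def __init__(self, candidates):
        self.credit = {
            (slot, value): 0.0
            for slot, values in candidates.items()
            for value in values
        }

    def add_word(self, word):
        """Update the credit with one new word and return the distribution."""
        for slot, value in self.credit:
            self.credit[(slot, value)] += similarity(word, value)
        return self.distribution()

    def distribution(self):
        """Normalise the accumulated credit into a probability distribution."""
        total = sum(self.credit.values())
        if total == 0:
            uniform = 1.0 / len(self.credit)
            return {key: uniform for key in self.credit}
        return {key: credit / total for key, credit in self.credit.items()}
```

Calling `add_word` once per recognised word mirrors the lockstep with the incremental speech recogniser: the dialogue manager can inspect the distribution after each word rather than waiting for the end of the utterance.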
0:06:16the dialogue manager is taking these
0:06:19word for word the nlu is giving these slot
0:06:23distributions to dialogue management and the dialogue manager has to do something with that
0:06:28so
0:06:29in fact it's making one of four
0:06:32fairly simple decisions one is
0:06:34i get a slot i look at its confidence value and what i do is i
0:06:38can wait
0:06:39if the confidence value is low just sort of ignore it
0:06:43a particular value isn't enough to make the slot the one that i
0:06:47want
0:06:49or i can select something
0:06:51if it's above some confidence threshold then the slot is good let's fill it with this
0:06:54value
0:06:55or the other case is we're close to that threshold
0:06:58but not quite there so let's make a clarification request and somehow display that on the gui
0:07:05and then of course they have to be able to confirm that request
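The four decisions can be sketched as a small rule-based policy. This is an illustration under assumptions, not the system's actual dialogue manager: the threshold values, the `Action` names, and the `decide` function are hypothetical, since the talk does not give the values used.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"        # confidence too low, keep listening
    SELECT = "select"    # fill the slot with this value
    REQUEST = "request"  # show a clarification request on the GUI
    CONFIRM = "confirm"  # the user confirmed a pending request


# Hypothetical thresholds; the talk does not state the real ones.
SELECT_THRESHOLD = 0.8
REQUEST_THRESHOLD = 0.5


def decide(confidence, pending_request=False, user_said_yes=False):
    """Map the top slot value's confidence to one of the four decisions."""
    if pending_request and user_said_yes:
        return Action.CONFIRM
    if confidence >= SELECT_THRESHOLD:
        return Action.SELECT
    if confidence >= REQUEST_THRESHOLD:
        return Action.REQUEST
    return Action.WAIT
```

Because `decide` runs after every word, a `WAIT` result simply means nothing changes on the display until more evidence arrives.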
0:07:08i want to point out here that it is here between the nlu
0:07:12and the dialogue manager
0:07:14where the endpointing is done we're not doing endpointing with speech recognition that's
0:07:18just always on
0:07:20and it's here
0:07:22so they can stop and pause and think and do something it'll wait for
0:07:25them to finish so they can do things in instalments so it's sort of semantically
0:07:28driven endpointing
0:07:30and we use opendial
0:07:32for this it's sort of rule based at the moment but the provisions are
0:07:36there now for
0:07:38reinforcement learning and learning online to improve the system as people interact with it
0:07:43now
0:07:45the dialogue manager decides which slot is to be filled and it says gui here's
0:07:50the decision i've made please convey this information to the user
0:07:53and the gui you'll notice right off the bat we
0:07:58obviously aren't ui designers
0:08:01but here it is you turn the system on and
0:08:04this comes up it's in javascript so
0:08:07and it just looks like a right branching tree and really that's all it is
0:08:10but right here you can already see what the affordances are oh we can
0:08:13do these five things that's nice
0:08:14i don't have to guess i don't have to play with it and figure out what
0:08:17it knows and what it doesn't know
0:08:19and so i look at this thing and say well you know i am kinda
0:08:22hungry and it will go then into the food domain and sort of open up
0:08:26the tree for that slot
0:08:28if you're hungry then
0:08:30you know you know what you want and where
0:08:34you're gonna eat it
0:08:35and i can say you know i'm hungry for some thai food and at
0:08:38that point it'll
0:08:40go to the top here and
0:08:43show this node and a red question mark for this clarification state did you say thai
0:08:47and this to me is
0:08:50intuitive in that it
0:08:52is trying to understand me and all i have to do is say yes or i
0:08:55meant thai and that would
0:08:56basically fill that slot which
0:09:00conveyed visually means that it just collapses that part of the tree and shows it like this
0:09:03so here's a frame that is filled
0:09:06and it's shown visually like this
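The expand-and-collapse behaviour of the tree can be sketched in a few lines. The real GUI is written in JavaScript; this Python sketch only illustrates the idea that an expanded branch lists its options and a filled slot collapses to a single line. The `Node` class and `render` function are hypothetical names.

```python
class Node:
    """One node of the tree: a domain, a slot, or a candidate value."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.value = None      # set once the slot is filled
        self.expanded = False  # whether the children are shown

    def fill(self, value):
        """Fill the slot and collapse its branch."""
        self.value = value
        self.expanded = False


def render(node, depth=0):
    """Return the visible lines of the tree, indented by depth."""
    text = node.label if node.value is None else f"{node.label}: {node.value}"
    lines = ["  " * depth + text]
    if node.expanded:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines
```

In this sketch, expanding the food domain lists its slots, and calling `fill("thai")` on the cuisine node replaces its list of options with the single line `cuisine: thai`.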
0:09:11that's our system
0:09:13recall right
0:09:16now we did some experiments to see if that system was everything we
0:09:21hoped it would be and we put some people in front of it
0:09:25so
0:09:27we want to test a couple of things about this system so we're gonna break
0:09:31it up into basically four different
0:09:34different settings
0:09:36we want to test
0:09:38we want to see if our incremental system is better than or more useful i
0:09:42suppose than the traditional one
0:09:46so we're gonna let them play with it first and give them a trial
0:09:49phase here's our system here are some tasks do them and get used to the
0:09:52interface and then we're gonna
0:09:55start on the very right side of the continuum where they're
0:09:58doing this
0:09:59traditional
0:10:02turn taking full intent
0:10:11personal assistant
0:10:12so it endpoints
0:10:14as usual
0:10:15kind of like the traditional personal assistant
0:10:17so then we
0:10:19move on the continuum a little bit to the left and
0:10:23now it's incremental now we're doing sub-turns
0:10:25and you can
0:10:29do things in instalments
0:10:31and then we have phase three where we're moving that
0:10:34a little bit more to the left on the continuum
0:10:37now it's going to adapt to you a little bit and try to predict and
0:10:41fill some of these slots for you
0:10:44to expand on that a little bit phase one acted like a standard personal assistant silence
0:10:48endpointing before the gui would even show and the asr was shown like it
0:10:52is in your standard personal assistant
0:10:55phase two is the incremental phase so they did phase one for four minutes
0:10:59and then they began phase two phase two did not display asr it was just the gui and
0:11:03it was always there always showing always updating
0:11:07and the endpointing as i mentioned was done semantically
0:11:11after phase two there was a questionnaire and we just asked them you know
0:11:14what do you think about
0:11:16these different systems so there were ten questions and we asked them you know
0:11:20did they prefer the first system the second system either or both
0:11:24and phase three this was the adaptive phase
0:11:29which is basically the same as phase two with adaptation and the way it worked is a
0:11:33very simple way
0:11:35basically
0:11:36if they did a task
0:11:38basically filled a slot or frame
0:11:41and they
0:11:42did that same thing again it would remember it and start to
0:11:45just immediately ask a clarification so instead of saying i
0:11:50want the thai food they would say i'm hungry and then
0:11:53they'd just have to say yes and it would fill the slots for them
0:11:56and then after three times we'd just fill in the frame entirely for them
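The adaptation rule can be sketched as a simple memory of completed frames. This is an illustration under assumptions: the `Adapter` class, its method names, and the exact counting (clarify once a frame has been seen before, fill outright once it has been seen three times) are a hypothetical reading of the description above, not the system's code.

```python
from collections import Counter


class Adapter:
    """Remembers completed frames and predicts them on repeated tasks."""

    def __init__(self, fill_after=3):
        self.fill_after = fill_after
        self.history = {}  # domain -> Counter of completed frames

    def record(self, domain, frame):
        """Store one completed frame, e.g. {"cuisine": "thai"}."""
        key = tuple(sorted(frame.items()))
        self.history.setdefault(domain, Counter())[key] += 1

    def predict(self, domain):
        """Return (mode, frame); mode is 'none', 'clarify', or 'fill'."""
        seen = self.history.get(domain)
        if not seen:
            return "none", {}
        key, count = seen.most_common(1)[0]
        mode = "fill" if count >= self.fill_after else "clarify"
        return mode, dict(key)
```

In the 'clarify' mode the user only needs to say yes after entering the domain; in the 'fill' mode the whole frame is shown as filled right away.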
0:12:00and i'll show an example of that in a video in a moment
0:12:03and then after phase three we had another questionnaire that compared phases two and three
0:12:08so here's that video
0:12:10so this is in german i'm doing this
0:12:13so if you speak german i apologise for my accent and so anyway so
0:12:18i'm saying something like this i'm hungry i want to eat something around here
0:12:22maybe thai food
0:12:23and it does a clarification so i say exactly
0:12:26and then i repeat this several times to show you the adaptability of this
0:12:30this isn't something you would do you're not gonna take your personal assistant and repeat
0:12:34yourself five times
0:12:36it's gonna give us a lot
0:12:38but just to show the functionality of this
0:12:45stress
0:12:49are
0:12:52i
0:12:54so
0:12:56it's filter not just one more kiss
0:13:00we are hungry and now it's also
0:13:03i feel like
0:13:05and i don't see that same thing i am hungry
0:13:09so
0:13:16and then the last time i said calmly
0:13:19if someone else
0:13:25i'm
0:13:27pretty easy to predict yes but this is common
0:13:30people who want to use these personal assistants do the same thing
0:13:33over and over again
0:13:35my brother here's an anecdote my brother every day twice a day opens up his
0:13:40iphone and says to siri
0:13:42google voice you traffic
0:13:44every day
0:13:45he does it just like that and it gets the response he wants and people do
0:13:49this and it could probably just pop up and show him the traffic
0:13:54where am here
0:13:55so we got fourteen participants to come and sit down with our system so we
0:14:00sat them down at a table there was a
0:14:02a screen that showed the task that they were to do i'll explain that in a moment
0:14:05and then there was a tablet that was turned on its side it
0:14:08showed the gui and the gui was as i showed you and
0:14:13it's javascript so it was in a web browser basically
0:14:16and then a keyboard to push a button to let them know that they could go
0:14:19on
0:14:21or to signal that the task was complete rather so the tasks were like
0:14:26this there are five possible tasks call reminder
0:14:29find a restaurant leave a message or find a route between two cities
0:14:34and the tasks were shown as icons and the task items were randomly chosen we randomly chose the task
0:14:39randomly chose the slots that we wanted them to convey to the system and then
0:14:43there was a fifty percent chance later that the task would be repeated
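The task selection procedure can be sketched as follows. The five task names come from the talk; the slot names, the slot values, and the way a repeat is drawn (uniformly from the tasks already shown) are assumptions made for illustration.

```python
import random

# Hypothetical slot inventories; the talk names the tasks but not these values.
TASKS = {
    "call": {"contact": ["peter", "anna"]},
    "reminder": {"time": ["morning", "evening"]},
    "restaurant": {"cuisine": ["thai", "italian"], "location": ["nearby", "downtown"]},
    "message": {"recipient": ["peter", "anna"]},
    "route": {"origin": ["bielefeld", "berlin"], "destination": ["hamburg", "munich"]},
}


def next_task(history, rng):
    """Pick the next task, repeating an earlier one half of the time."""
    if history and rng.random() < 0.5:
        task = rng.choice(history)
    else:
        name = rng.choice(sorted(TASKS))
        slots = {slot: rng.choice(values) for slot, values in TASKS[name].items()}
        task = (name, slots)
    history.append(task)
    return task
```

Passing a seeded `random.Random` instance as `rng` makes a session reproducible, which helps when comparing participants across settings.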
0:14:48here's an example
0:14:49they would be sitting down playing with the system and then something
0:14:53like this would pop up on the screen and the task is call
0:14:56peter
0:14:57and the system would then
0:15:00do its magic then show
0:15:03really show its gui and once they
0:15:06recognised that it understood then they would push a button and a new task would pop up
0:15:12and they were charged with doing as many of these tasks as possible
0:15:15we wanted to do this
0:15:19and not just let them play with it because the tasks
0:15:22help us
0:15:25collect some objective measures as well if we tell them we want them to do
0:15:28as many tasks as possible in the four minutes that they have to interact with
0:15:32each setting of the system then we can learn a little bit more about how
0:15:35productive they were
0:15:37so here are the other tasks they would see stuff like this
0:15:39so we have the twenty most common german names the most populous
0:15:43cities in germany bielefeld it turns out is among them
0:15:48and you know everything else so there's quite a few possibilities that
0:15:52could be said here
0:15:54but again
0:15:55we didn't train this at all we just sort of typed these in got a
0:15:58list of stuff and threw it into the system and that was the end
0:16:01of it and it worked
0:16:05here are some results from the questionnaire we can conclude
0:16:09the following based on some significance scores that they generally liked the gui
0:16:15they found it intuitive to use and easy and understandable
0:16:18and that was our main focus our main goal
0:16:22the gui allowed mistakes to be taken care of locally and they did this a lot
0:16:26if a slot was filled with the wrong thing they
0:16:29would immediately try to fix it
0:16:31they didn't always just push a button and move on to the next task or
0:16:34there was a keyword they could say to restart from the beginning they
0:16:37generally tried to fix it right there and it was able to do it
0:16:40most of the time
0:16:42and they didn't generally notice between phase two and phase three the incremental and
0:16:47adaptive phases they didn't really notice there was
0:16:48something adapting but for those who did notice which was about half of them they
0:16:52noticed that it was phase three and never did get it wrong and there's a listing of all
0:16:56the questions and there's more in the results section of the paper on
0:16:58this
0:17:00these are just some things we wanted to highlight from that
0:17:04so
0:17:05the objective results these tell an interesting story so we
0:17:10just counted the number of tasks they were able to do in the different
0:17:14settings
0:17:15and once they get to incremental and adaptive they're able to do quite a few more tasks
0:17:19at least they thought the tasks were complete
0:17:22and the next row is frame accuracy so when all the slots in
0:17:26the frame were the same as the ones that we wanted them to convey in the
0:17:30task that we showed
0:17:32and the adaptive one
0:17:33does quite well because part of the time the slots are already filled
0:17:38for them
0:17:39so score one for google now
0:17:41i guess trying to predict your life is actually maybe easier than learning how to
0:17:45understand language
0:17:48the other two tell a more interesting story we get f-score which is
0:17:52basically maybe the entire frame wasn't correct but this gives an idea of
0:17:58the correctness of the slots of the frame maybe two of the slots were correct
0:18:01one wasn't
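The difference between the two measures can be made concrete with a short sketch. Frame accuracy only counts a frame when every slot matches, while a slot-level F-score gives partial credit. The talk does not say exactly how the F-score was computed, so the micro-averaged version below, along with the function names, is an assumption.

```python
def frame_accuracy(pairs):
    """Fraction of (predicted, gold) frames that match on every slot."""
    return sum(pred == gold for pred, gold in pairs) / len(pairs)


def slot_f1(pairs):
    """Micro-averaged F1 over (slot, value) items across all frames."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        pred_items, gold_items = set(pred.items()), set(gold.items())
        tp += len(pred_items & gold_items)  # slots filled correctly
        fp += len(pred_items - gold_items)  # slots filled with a wrong value
        fn += len(gold_items - pred_items)  # gold slots missed or wrong
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For a frame with three slots where two are right and one is wrong, frame accuracy scores it as a miss while the slot-level measure still credits the two correct slots.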
0:18:03and
0:18:04in this case incremental is lower and then look at the time the time is about
0:18:08the same across all and this tells us that the gui was
0:18:12intuitive enough that in the
0:18:15phase where they are just playing with it in the trial phase
0:18:19they learned enough about it and experienced enough that they aren't just getting used to it
0:18:22over time
0:18:26and
0:18:28both these rows kind of tell that story
0:18:31so it helps them to be a little bit more productive especially in the adaptive the
0:18:34adaptive
0:18:36setting
0:18:37so they're kinda nice results not the most stellar thing this thing isn't you
0:18:42know going to be in everyone's phone next month
0:18:46but
0:18:47like i said we didn't use any training data and it was fairly robust
0:18:55some discussion here
0:18:57our incremental personal assistant or ipa i suppose allows users to fix mistakes
0:19:02easier and sooner allows the users to interpret the state of the system's understanding
0:19:08and under the adaptive setting it allows users to be more productive you get more
0:19:12tasks done in this kind of setting where we're driving them to do tasks
0:19:16like this
0:19:17and endpointing is based on semantics not based on silence
0:19:20that's a nice thing
0:19:23future work
0:19:27online learning is the obvious thing we have a system with no training data let's interact
0:19:31with it and it should start to learn and do things better
0:19:34and the mechanisms are there sium the nlu model we have the dialogue manager we
0:19:39have all have provisions for this we just need some kind of a supervision signal
0:19:42which we have if the frame is filled and gets sent on and they're happy with that
0:19:46we can give feedback now to say those utterances led to this and that should
0:19:50help the nlu and help the dialogue manager work better
0:19:53same for adaptation
0:19:55and better user modelling and adaptability
0:19:58could be improved
0:20:00also web based authoring luis does this a lot of other systems do this
0:20:04right now it's not too bad you can author a json file and it'll import it there's
0:20:08tools for that and it's actually fairly quick and easy but web based authoring might
0:20:11be nice and then of course we need to scale up to
0:20:15larger domains the gui is the bottleneck here and it's sort of a two edged sword you
0:20:18wanna show your stuff but also be able to handle lots and lots of general
0:20:23things so
0:20:25that is it thank you
0:20:33note that focus on
0:20:37if the
0:20:51right
0:20:58right like i said we're not ui
0:21:02experts bring us to if you're right it's gives call i guess on but what
0:21:05we have right now is sort of a max of seven or eight nodes
0:21:09that is just sort of dot the thing you have to do there is there
0:21:13and what gets shown what are the top seven that you will show and if
0:21:17those are if there's something that's not english on their then you doing something wrong
0:21:21so there's more user modelling that happens in that regard what gets shown on the
0:21:24gui
0:21:26better nlu would help with that
0:21:29better user modelling would help with that
0:21:31good question
0:21:33research future stuff
0:21:35i q
0:21:47right that's future work i mean the
0:21:51provisions are there also in this you can click on it
0:21:54clicking doesn't do anything but the idea is kind of like staffan larsson
0:21:57and his gui where you can talk about the gui itself and navigate and
0:22:01say no i don't want any of those go down a little bit start
0:22:04right there exactly
0:22:06exactly so you can flip through it put stuff and you can add something if
0:22:10it's not there that would be nice to and i guess but right and system
0:22:12in as becomes intent that you can use in the future the gui should be
0:22:15able to help with that
0:22:18okay
0:22:38right so it
0:22:42right so the question comment was on the semantic endpointing bit of it
0:22:49it's something to look at i don't have
0:22:52don't have an answer
0:22:54definitely something worth considering
0:23:05right
0:23:06agree
0:23:19no not
0:23:20i want to be really clear on that in the trial phase maybe they've
0:23:24done all the adapting they're done adapting the system is so rudimentary and simple
0:23:30and the gui doesn't do much you know there's only
0:23:34a couple of things that it does they learn about it very quickly
0:23:38that's why the time doesn't really change
0:23:40you know the average time per
0:23:43task
0:23:45so they weren't just
0:23:46getting used to it over time because they were already used to it before they even
0:23:49started the first phase that's kind of the takeaway i got from the objective
0:23:53scores
0:23:55that's something we were concerned with that's why we designed it this way
0:23:59i knew someone would ask that question i'm glad somebody did exactly
0:24:03because of the way we wanted to do the comparisons we wanted to do
0:24:08subjective comparisons and we wanted to do some objective scores and this was a
0:24:12debate we had but we ended up doing it this way with the hope that
0:24:14if we designed it the right way
0:24:16you don't get used to write beginning we will have as facts and the numbers
0:24:20can show that
0:24:22i'm glad you asked that