0:00:15 | hi and i'm going to introduce a new statistical dialogue framework for state tracking that is called task lineages |
---|
0:00:37 | to handle flexible interaction |
---|
0:00:40 | so this is my outline |
---|
0:00:43 | after giving a brief introduction i will get to some challenges we want to address in this work |
---|
0:00:50 | and the approaches we used to solve the problems |
---|
0:00:54 | and i'm gonna show some experimental results on benchmark test datasets and then i'll conclude my talk |
---|
0:01:00 | and if time permits i will go deeper into some technical details on task frame parsing |
---|
0:01:10 | so we have seen a lot of recent advances in statistical dialogue state tracking |
---|
0:01:18 | and one of the nice things is that many algorithms have been shown to be effective as observed in the results from the dialogue state tracking challenge tasks |
---|
0:01:26 | so we got really robust systems against noise and errors like asr errors and speaking style variations |
---|
0:01:35 | but they are usually limited to session based simple tasks on simple corpora of dialogues |
---|
0:01:44 | given the surge in the interest in conversational agents and the enormous use cases |
---|
0:01:52 | it seems necessary to extend the previous methods to handle multiple tasks with complex goals in long interactions |
---|
0:02:07 | so let me talk about the set of challenges we want to address here and our approaches to solve them |
---|
0:02:15 | the first challenge is complex goals |
---|
0:02:18 | what i mean by complex goals is any combination of positive and negative constraints |
---|
0:02:27 | so in the restaurant finding domain you can say italian or french but not thai that kind of thing |
---|
0:02:35 | and the approach we take is very straightforward |
---|
0:02:38 | we just do constraint level belief tracking rather than slot level tracking |
---|
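The constraint-level tracking described above can be sketched roughly as follows. This is a minimal illustration, not the talk's actual model: the `Constraint` class, the noisy-or style accumulation, and all confidence values are assumptions. The point is that "italian or french but not thai" becomes three tracked constraints rather than one slot value.

```python
# A sketch of constraint-level belief tracking: one belief score per
# (slot, value, polarity) constraint instead of one distribution per slot.
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    slot: str
    value: str
    positive: bool  # True: "want this", False: "but not this"

beliefs: dict = {}

def observe(constraint: Constraint, confidence: float) -> None:
    """Accumulate SLU confidence for one constraint across turns."""
    prior = beliefs.get(constraint, 0.0)
    # Simple noisy-or style accumulation of evidence (an assumed rule).
    beliefs[constraint] = prior + (1.0 - prior) * confidence

# "italian or french but not thai"
observe(Constraint("food", "italian", True), 0.7)
observe(Constraint("food", "french", True), 0.6)
observe(Constraint("food", "thai", False), 0.8)
```

Because each constraint is tracked independently, positive and negative evidence for the same slot can coexist, which a single slot-level distribution cannot represent.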
0:02:45 | the second challenge is to handle complex input from distributed slus |
---|
0:02:51 | so the complex input includes not only complex goals but also multiple tasks at the same time |
---|
0:02:57 | so we introduce a new concept called task frame parsing to address this challenge |
---|
0:03:08 | so to scale a conversational agent platform we usually adopt a distributed architecture |
---|
0:03:16 | in this architecture we have numerous service providers with their own slu and service components and we also provide a range of common components like slus and tasks |
---|
0:03:33 | and then when a user input comes in it is dispatched to all these components and each component will return its own interpretation |
---|
0:03:42 | so it is up to the platform to resolve the possibly competing semantic interpretations and come up with a coherent semantic interpretation |
---|
0:03:56 | so for example when the user says connections from soho to midtown at one p m italian restaurant near times square and a friendly coffee shop |
---|
0:04:08 | then the transit domain will detect soho as from midtown as to one p m as time and so on |
---|
0:04:18 | and it goes similarly for the local domain as well |
---|
0:04:23 | so as you can see there are competing slots |
---|
0:04:28 | and what we want to get is a list of coherent task frame parses like these two examples |
---|
0:04:39 | so the first parse identifies the first three spans as the transit task frame and it also has two more local task frames |
---|
0:04:54 | and it is probably right so it gets a high score like zero point eight |
---|
0:04:59 | and the second one is less likely it only has two local task frames and it gets a rather low score like zero point two |
---|
0:05:11 | so we call this process task frame parsing |
---|
0:05:16 | so we use beam search using mcmc with simulated annealing |
---|
0:05:23 | of course there are many algorithms we can use to do this |
---|
0:05:27 | we chose to use this method because it allows us to integrate hard constraints with probabilistic reasoning very easily |
---|
0:05:38 | so one example of hard constraints would be mutual exclusivity |
---|
0:05:48 | that is two dialogue act items with the same span cannot be used at the same time that is the kind of constraint we mean |
---|
0:05:56 | and to do probabilistic reasoning we use a normalized global log linear model like this |
---|
0:06:04 | so we can get a confidence score at the end |
---|
0:06:07 | and there are numerous features you can use so please refer to our paper for more details |
---|
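The search just described can be sketched as a toy simulated-annealing MCMC over assignments of dialogue act items to task frames. Everything concrete here is an invented illustration: the items, spans, frame names, and the `weight` table standing in for log-linear feature weights. The hard constraint (items over the same span are mutually exclusive) filters proposals, exactly as described in the talk.

```python
# Toy task frame parsing via simulated annealing over item-to-frame
# assignments, with a hard mutual-exclusivity constraint on spans.
import math
import random

# (item -> text span): "from:soho" and "place:soho" compete for one span.
items = {
    "from:soho": (0, 1), "place:soho": (0, 1),
    "to:midtown": (2, 3), "food:italian": (4, 5),
}
frames = ["transit", "local", "etc"]  # "etc" absorbs unused items

# Assumed compatibility scores (stand-ins for log-linear feature weights).
weight = {
    ("from:soho", "transit"): 2.0, ("place:soho", "local"): 0.5,
    ("to:midtown", "transit"): 2.0, ("food:italian", "local"): 1.5,
}

def score(assign):
    return sum(weight.get((i, f), 0.0) for i, f in assign.items())

def valid(assign):
    # Hard constraint: used items (frame != "etc") occupy distinct spans.
    used = [items[i] for i, f in assign.items() if f != "etc"]
    return len(used) == len(set(used))

def anneal(steps=3000, seed=0):
    rng = random.Random(seed)
    assign = {i: "etc" for i in items}       # a trivially valid start
    for step in range(steps):
        temp = max(0.01, 1.0 - step / steps)
        proposal = dict(assign)
        proposal[rng.choice(list(items))] = rng.choice(frames)  # one move
        if not valid(proposal):
            continue                          # hard constraints filter moves
        delta = score(proposal) - score(assign)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            assign = proposal                 # MH-style accept step
    return assign

best = anneal()
```

The appeal of this setup is the one the speaker names: hard constraints are just a rejection test inside the sampler, while the soft preferences live entirely in the score.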
0:06:14 | and the third challenge is about flexible task management |
---|
0:06:18 | so to address this we also introduce a new concept called task lineage |
---|
0:06:27 | so suppose this situation |
---|
0:06:31 | a user starts this conversation with two tasks let's say weather information and restaurant finding |
---|
0:06:38 | and then she continues this conversation with the transportation and ticket booking tasks without completing the first ones |
---|
0:06:47 | and then she left and attended some meeting and came back and tried to resume the restaurant related tasks |
---|
0:06:57 | and then she now finishes the restaurant booking and moves to the transportation and ticket booking again and completes them |
---|
0:07:09 | so if you use a traditional stack based task management then you might have several problems |
---|
0:07:17 | first you might not be able to do multiple tasks at the same time |
---|
0:07:23 | and the other problem is information loss |
---|
0:07:27 | so when you come to turn three if the system regards the first restaurant finding as complete then the relevant information will be gone so you just restart the restaurant task at turn three again |
---|
0:07:46 | on the contrary if the system regards it as incomplete then the system can resume the restaurant related tasks with the relevant information from the past |
---|
0:07:56 | but when you get to turn four the system should pop off the transportation and ticket booking to resume the restaurant booking task |
---|
0:08:11 | so the relevant information for the tasks popped off at turn four will be gone |
---|
0:08:18 | so either way you might suffer from information loss |
---|
0:08:23 | to handle this problem we came up with task lineages |
---|
0:08:28 | and there is no restriction on the number of task states in a task lineage |
---|
0:08:34 | so you can have as many task states as you want |
---|
0:08:39 | and we also update the task lineage at each turn |
---|
0:08:44 | i mean whenever we have a new turn we just add new task states and retrieve relevant information from the task states in the past |
---|
0:08:55 | so for the transportation and ticket booking you can get some information from restaurant finding at turn two if these task states are related |
---|
0:09:10 | and at turn three you can resume the restaurant finding even after a long time |
---|
0:09:19 | and you can retrieve the relevant information from the task states at turn one without any problem |
---|
0:09:27 | and you can do similarly for turn four |
---|
0:09:32 | so basically we don't remove or abandon any information in the past |
---|
0:09:37 | and you can always retrieve relevant information from the past |
---|
0:09:41 | and the current task states give you the idea of the current focus |
---|
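The task lineage idea above can be sketched as an append-only record of task states: nothing is ever popped or discarded, and resuming a task just means fetching its most recent state. The class names and fields below are illustrative assumptions, not the paper's exact schema.

```python
# A sketch of a task lineage: an append-only list of task states, one or
# more added per turn, with no cap on its size and nothing ever removed.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskState:
    turn: int
    task: str                    # e.g. "restaurant_finding"
    beliefs: dict = field(default_factory=dict)

class TaskLineage:
    def __init__(self) -> None:
        self.states: list = []   # no restriction on the number of states

    def add(self, state: TaskState) -> None:
        self.states.append(state)

    def latest(self, task: str) -> Optional[TaskState]:
        """Resume a task by fetching its most recent state, however old."""
        for state in reversed(self.states):
            if state.task == task:
                return state
        return None

lineage = TaskLineage()
lineage.add(TaskState(1, "weather", {"city": "nyc"}))
lineage.add(TaskState(1, "restaurant_finding", {"food": "italian"}))
lineage.add(TaskState(2, "transportation", {"to": "midtown"}))
lineage.add(TaskState(2, "ticket_booking", {}))
# Turn 3: resume restaurant finding; the turn-1 information is still there.
resumed = lineage.latest("restaurant_finding")
```

Contrast this with a stack: here the transportation and ticket booking states stay in place while the restaurant state is retrieved, so no information is lost in either direction.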
0:09:47 | and how do we do the context matching |
---|
0:09:50 | we construct context sets |
---|
0:09:53 | it is very simple given the task lineage we set a time window |
---|
0:09:59 | and then we construct a belief set by collecting all the latest belief estimates within the time window |
---|
0:10:10 | and then we construct a machine act set and a user act set by collecting all the machine acts and task frame parses in the time window |
---|
0:10:24 | and then given the context sets and the current machine act and the current task frame parse you try to select which information you want to use to update the current belief |
---|
0:10:42 | so it is nothing but a bunch of binary classifications |
---|
0:10:47 | so we use logistic regression |
---|
0:10:50 | and there are a bunch of features for this task so you can refer to my paper for them |
---|
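The context fetch just described can be sketched as a bank of binary classifiers: for each candidate piece of past context, a logistic regression decides whether to use it in the current update. The two features and all weights below are invented for illustration; the paper uses a much richer feature set.

```python
# A sketch of the context fetch as per-item binary logistic regression.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Assumed weights: prefer same-task context, penalize stale context.
W_SAME_TASK, W_AGE, BIAS = 3.0, -0.5, -1.0

def fetch_probability(same_task: bool, age_in_turns: int) -> float:
    """Probability that this past context item should be fetched."""
    z = BIAS + W_SAME_TASK * float(same_task) + W_AGE * age_in_turns
    return sigmoid(z)

def should_fetch(same_task: bool, age_in_turns: int) -> bool:
    return fetch_probability(same_task, age_in_turns) > 0.5

# Fresh same-task context is fetched; old or unrelated context is not.
```

Because each decision is independent, the tracker can fetch several pieces of context at once, one per relevant past task state.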
0:10:58 | and the fourth challenge is about task disambiguation |
---|
0:11:05 | there always could be some ambiguity in task detection |
---|
0:11:11 | to solve this problem we use an n best list of task lineages |
---|
0:11:19 | so let's say the user says i want thai then this could be interpreted as restaurant finding or travel |
---|
0:11:34 | so we have the two task lineages here |
---|
0:11:37 | and then when the user clarifies her real intention like saying i meant i want to travel to thailand |
---|
0:11:48 | then the second task lineage will get a higher score because it is more coherent |
---|
0:11:56 | in this way we mitigate the task ambiguity |
---|
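The n-best mechanism above can be sketched as keeping every competing lineage alive with a score, then rescoring when a clarification arrives. The multiplicative boost is a stand-in of my own choosing; the real system rescores lineages with its tracking model.

```python
# A toy n-best list of task lineage hypotheses, rescored on clarification.
hypotheses = [
    {"task": "restaurant_finding", "score": 0.6},  # "thai" as a cuisine
    {"task": "travel", "score": 0.4},              # "thailand" as destination
]

def rescore(hyps, clarified_task, boost=2.0):
    """Multiply in evidence from a clarification, then renormalize."""
    for h in hyps:
        if h["task"] == clarified_task:
            h["score"] *= boost
    total = sum(h["score"] for h in hyps)
    for h in hyps:
        h["score"] /= total
    return sorted(hyps, key=lambda h: h["score"], reverse=True)

# User clarifies: "i meant i want to travel to thailand"
ranked = rescore(hypotheses, "travel")
```

Because the losing interpretation is never discarded, a later clarification can flip the ranking without restarting the dialogue.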
0:12:03 | so the overall dialogue state tracking procedure consists of three steps |
---|
0:12:07 | first we do task frame parsing so given a set of possibly competing semantic frames from distributed slus we generate coherent task frame parses |
---|
0:12:24 | and then given the task frame parses we try to retrieve relevant information from the past task states in the lineage |
---|
0:12:33 | and this happens for each lineage |
---|
0:12:36 | and we use this retrieved information and the input information to update the task states at this turn |
---|
0:12:44 | then how do we do the task state update |
---|
0:12:48 | actually this is one of the most trivial tasks in this framework |
---|
0:12:54 | because we can pick up any method developed so far for dialogue state tracking because the task state update is nothing but dialogue state tracking in the conventional setting |
---|
0:13:08 | so you can enjoy a wide range of different algorithms like discriminative models and generative models or rule based ones tuned on the data |
---|
0:13:21 | and you can control the trade off between the use of prior belief estimates and the raw observations |
---|
0:13:31 | so to make the analysis simple we actually just adopted a generative rule based update method from prior work |
---|
0:13:43 | and we use this algorithm for belief tracking for each slot value pair |
---|
0:13:49 | and these rules just update the current belief by accumulating negative and positive confidence scores |
---|
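A rule-based update of the kind just mentioned can be sketched per slot-value pair as follows. The two-line update rule is a simplification I chose for illustration, not the exact rules of the referenced method: positive evidence raises the belief, negative evidence discounts it.

```python
# A sketch of generative rule-based belief tracking for one slot-value
# pair, accumulating positive and negative confidence scores per turn.

def update_belief(prior: float, pos_conf: float, neg_conf: float) -> float:
    """One turn of belief update for a single slot-value pair."""
    b = prior + (1.0 - prior) * pos_conf   # positive evidence raises belief
    b = b * (1.0 - neg_conf)               # negative evidence discounts it
    return b

b = 0.0
b = update_belief(b, pos_conf=0.6, neg_conf=0.0)  # "italian" heard at 0.6
b = update_belief(b, pos_conf=0.3, neg_conf=0.0)  # re-mention, lower conf
b = update_belief(b, pos_conf=0.0, neg_conf=0.9)  # "no, not italian"
```

This kind of update needs no training data at all, which is why it keeps the analysis simple: any gain over it can be attributed to the framework rather than to a strong tracker.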
0:14:02 | so let's move on to evaluation |
---|
0:14:04 | so we used dstc two to evaluate our algorithm and it is based on the restaurant finding domain |
---|
0:14:13 | and one interesting characteristic of this dataset is relatively frequent user goal changes |
---|
0:14:21 | so if our method works well on this test dataset then the context fetch should not fetch any information for the old goals |
---|
0:14:34 | so let's look at the results |
---|
0:14:37 | actually our method shows the best performance so far on accuracy |
---|
0:14:45 | and this actually tells you the importance of proper context handling in the dialogue state tracking problem |
---|
0:14:55 | and we got this performance without using any ensemble method like system combination or neural networks or decision trees |
---|
0:15:06 | it is just a rule based update for the task state update |
---|
0:15:13 | and we wanted to evaluate our system on more complex interactions |
---|
0:15:19 | but unfortunately there is no such dataset available out there so we had to simulate some datasets |
---|
0:15:28 | and we took the dstc three data as our base corpus |
---|
0:15:34 | because that data contains multiple tasks like restaurant finding coffee shop finding and pub finding |
---|
0:15:42 | so we simulated three datasets |
---|
0:15:48 | with different representative settings |
---|
0:15:52 | the first one does not have any complex user goals and no multiple tasks so we just use the dstc three data itself |
---|
0:16:01 | and for the second setting we have complex user goals simulated and no multiple tasks |
---|
0:16:14 | and for the third dataset we have both complex user goals and multiple tasks |
---|
0:16:19 | these numbers are the overall statistics for these corpora |
---|
0:16:25 | so let's look at the results |
---|
0:16:28 | if you look at the joint goal accuracy we actually compare our system with the baseline system in dstc |
---|
0:16:38 | and the joint goal accuracy of the baseline system drops very sharply from zero point five seven to zero point three one and zero point zero two |
---|
0:16:52 | while our system drops much more gently from zero point five nine to around zero point four and zero point three |
---|
0:17:04 | so keeping in mind the fact that the task gets exponentially harder with respect to the complexity |
---|
0:17:12 | this gentle reduction is a big win |
---|
0:17:16 | and we further evaluate our system with oracle results |
---|
0:17:21 | tl dst op uses oracle parses and tl dst o uses both oracle parses and oracle context fetches |
---|
0:17:35 | and you can see the improved results by using the oracle information |
---|
0:17:40 | so this indicates that there is some room for future improvement |
---|
0:17:46 | then let me conclude my talk |
---|
0:17:49 | we have proposed a new statistical dialogue state tracking framework called task lineages to orchestrate multiple tasks with complex goals across multiple domains in continuous interaction |
---|
0:18:06 | and as a proof of concept we demonstrated good performance on a common benchmark test dataset and purposely simulated dialogue corpora |
---|
0:18:17 | and some interesting future directions include the use of sophisticated machine learning models like gbdt or random forests or recurrent neural networks |
---|
0:18:29 | i'm pretty sure you can get performance much higher than the performance shown here by just using these techniques for the task state update |
---|
0:18:39 | and i'm also interested in extending this framework for weakly supervised learning to reduce the cost |
---|
0:18:51 | and i'm also interested in seeing some potential impact on other dialogue system components since we provide a more comprehensive state representation through task lineages |
---|
0:19:04 | okay i have about one minute left |
---|
0:19:11 | so basically task frame parsing works like this |
---|
0:19:15 | given this input utterance |
---|
0:19:22 | let's say there are two domains and they generate two different interpretations like the upper two and the bottom two |
---|
0:19:33 | and we identify all possible candidate task frames for each dialogue act item |
---|
0:19:39 | and we have a special task frame to accommodate all the leftover information |
---|
0:19:45 | and then the task is to get the right assignment from dialogue act items to the right task frames |
---|
0:19:58 | so the parsing algorithm starts with some configuration that is somehow valid |
---|
0:20:05 | then it moves assignments one at a time |
---|
0:20:10 | and according to the scores it eventually gets to a proper configuration with a high score |
---|
0:20:26 | i think that is all for my presentation thank you |
---|
0:20:35 | okay |
---|
0:20:46 | right sure |
---|
0:20:50 | right |
---|
0:20:53 | actually it is done through some feature functions |
---|
0:21:03 | let's say as you extend the task lineage we actually keep the timestamps |
---|
0:21:09 | and the feature functions to match the context use the timestamp as one of the features |
---|
0:21:18 | so as the context gets farther and farther from the current timestamp you will have less chance to fetch the information |
---|
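The timestamp feature described in this answer can be sketched as a simple recency function over the stored timestamps. The exponential form and the half-life value are my own illustrative assumptions; any monotonically decaying function of the age would serve the same role as a context-fetch feature.

```python
# A sketch of a timestamp-based recency feature for the context fetch:
# older context yields a smaller feature value, so it is fetched less.
import math

def recency_feature(now: float, then: float, half_life: float = 300.0) -> float:
    """Decays from 1.0 toward 0.0 as the context ages (times in seconds)."""
    age = max(0.0, now - then)
    return math.exp(-math.log(2.0) * age / half_life)
```

Plugged into the logistic-regression fetch with a positive weight, this feature makes distant context progressively less likely to be retrieved, which is exactly the behavior the speaker describes.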
0:21:28 | and |
---|
0:21:29 | so it's |
---|
0:21:32 | okay |
---|
0:21:35 | okay so actually it involves another notion i guess which is long term memory |
---|
0:21:42 | this talk is more about short term interaction management |
---|
0:21:45 | and then another module should be about long term memory management so you are gonna have a more perpetual memory there |
---|
0:21:54 | and it would also be used for some features to disambiguate or to boost some evidence that kind of thing |
---|
0:22:04 | so you need a memory structure other than just the short term dynamic structure |
---|
0:22:16 | random of course |
---|
0:22:49 | i missed the initial part of the question so can you repeat it |
---|
0:22:54 | when you have multiple interpretations where do you start |
---|
0:22:57 | it is probably more about the dialogue policy |
---|
0:23:04 | so given a lot of ambiguity or high entropy in your state representation |
---|
0:23:10 | you can actually train some smart policy that decides whether it is better to ask a confirmation at this time or to assume something or to try to retrieve some user habits from long term memory |
---|
0:23:27 | so all these are determined by your policy so it is kind of another module that takes care of such kind of things |
---|
0:23:40 | small question |
---|
0:24:01 | i'm repeating the question i think the question is whether i did a classification for each constraint for complex goals is that right |
---|
0:24:38 | okay so what he asks is whether we can use a task classification to predict the user's intention with a different classifier is that right |
---|
0:24:59 | to predict the task okay so i think that is actually orthogonal to this framework and i didn't actually use that kind of classifier |
---|
0:25:06 | but actually that is a necessary part i guess for scalability |
---|
0:25:14 | because if we consider all possible interpretations for all possible slus and other components then the complexity will explode |
---|
0:25:27 | so we can do some filtering as a preprocessing step |
---|
0:25:32 | and then do this parsing to construct parses that can contain multiple tasks in one utterance |
---|
0:25:39 | but with a single classification it is a little bit difficult to handle multiple tasks |
---|
0:25:48 | so does that answer your question |
---|
0:25:53 | okay so terrible |
---|
0:26:13 | okay let me repeat the question whether i have to use the context to interpret the user's utterances |
---|
0:26:22 | actually i'm using just the corpus so i didn't have to use the context to understand the user utterance |
---|
0:26:31 | but there have been a lot of research efforts that try to use the context to interpret the intention so there is no reason not to use it |
---|
0:26:43 | right okay thanks again |
---|