0:00:15 Hi. Today I am going to introduce a new statistical dialog state tracking framework, called task lineage-based dialog state tracking, designed to handle flexible interaction. This is my outline: after giving a brief introduction, I will go over some challenges we wanted to address in this work and the approaches we used to solve those problems. Then I am going to show some experimental results on benchmark test datasets, and then I will conclude my talk. If time permits, I will dig deeper into some technical details of task frame parsing.
0:01:10 We have seen a lot of recent advances in statistical dialog state tracking, and one nice thing is that many algorithms have been shown to be effective through a series of shared research tasks. So we got really robust systems against noise and errors, like ASR errors and style variance. But they are usually limited to session-based, simple tasks and simple corpora of dialogues. Given the surge of interest in conversational agents and their enormous range of use cases, it seems necessary to extend the previous methods to handle multiple tasks with complex goals in long, continuous interactions.
0:02:07 So let me talk about the set of challenges we want to address here, and our approaches to solve them. The first challenge is complex goals. What I mean by a complex goal is any combination of positive and negative constraints. In the restaurant-finding domain, for instance, you can say "Italian or French, but not Thai," that kind of thing. The approach we take is very straightforward: we just do constraint-level belief tracking rather than slot-level tracking.
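As a concrete illustration, a constraint-level belief state can be sketched roughly like this. Everything here (class name, threshold, confidence values) is hypothetical and only meant to show the idea of tracking constraints instead of single slot values:

```python
# Hypothetical sketch of constraint-level belief tracking for complex
# goals such as "Italian or French, but not Thai". Instead of tracking
# one value per slot, we track beliefs over (slot, value, polarity)
# constraints.

class ConstraintTracker:
    def __init__(self):
        self.beliefs = {}  # (slot, value, polarity) -> belief in [0, 1]

    def update(self, constraint, confidence):
        # Accumulate evidence: belief moves toward 1 with each
        # consistent observation of the constraint.
        b = self.beliefs.get(constraint, 0.0)
        self.beliefs[constraint] = b + (1.0 - b) * confidence

    def matches(self, entity):
        # Positive constraints on a slot form a disjunction ("or");
        # negative constraints exclude values ("but not").
        positives, negatives = {}, {}
        for (slot, value, polarity), b in self.beliefs.items():
            if b < 0.5:  # ignore weakly believed constraints
                continue
            bucket = positives if polarity == "+" else negatives
            bucket.setdefault(slot, set()).add(value)
        for slot, allowed in positives.items():
            if entity.get(slot) not in allowed:
                return False
        for slot, excluded in negatives.items():
            if entity.get(slot) in excluded:
                return False
        return True

tracker = ConstraintTracker()
tracker.update(("food", "italian", "+"), 0.9)
tracker.update(("food", "french", "+"), 0.8)
tracker.update(("food", "thai", "-"), 0.9)
```

With this state, a French restaurant matches while a Thai one does not, so the negated value never blocks the positive disjunction.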
0:02:45 The second challenge is to handle complex input from distributed SLUs. The complex input includes not only complex goals but also multiple tasks at the same time, so we introduce a new concept called task frame parsing to address this challenge. To scale a conversational agent platform, we usually adopt a distributed architecture. In this architecture we have numerous third-party service providers with their own SLU and dialog state update components, and we also provide an array of common components like SLUs and tasks. When a user input comes in, it is dispatched to all of these components, and each component returns its own interpretation. So it is up to the platform to resolve the possibly competing semantic interpretations into one coherent semantic interpretation.
0:03:56 For example, when a user says "connections from Soho to Midtown at 1 pm, an Italian restaurant near Times Square, and a pet-friendly coffee shop," the transit domain will detect "Soho" as the from slot, "Midtown" as the to slot, "1 pm" as the time, and so on; it goes similarly for the local domain as well. As you can see, there are competing slots, and what we want to get is a list of coherent task frame parses, like these two examples. The first parse identifies the first three spans as a transit task frame, and it also has two more local task frames. It is probably right, so it gets a high score, like 0.8. The second one is less likely: it only has two local task frames, so it gets a low score, like 0.2. We call this process task frame parsing.
0:05:16 To do this search, we use beam search with MCMC and simulated annealing. There are many algorithms we could use, but we chose this method because it allows us to integrate hard constraints with probabilistic reasoning very easily. An example of a hard constraint would be mutual exclusion: dialog act items with the same span cannot be used at the same time; that is the kind of constraint we mean. For the probabilistic reasoning, we use a globally normalized log-linear model, so we can get a confidence score at the end. There are numerous features you can use; please refer to our paper for more details.
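To make the search concrete, here is a toy sketch of the annealing loop over assignments of dialog act items to task frames. The scoring function and the mutual-exclusion check below are illustrative stand-ins; the real system scores configurations with a globally normalized log-linear model over many features:

```python
import math
import random

# Toy sketch: parse by assigning each dialog act item to a task frame
# (or to "N/A"), searching with simulated annealing while hard
# constraints always hold.

def violates(assignment, items):
    # Hard constraint: two act items covering the same span cannot
    # both be used at the same time.
    used_spans = set()
    for item, frame in zip(items, assignment):
        if frame == "N/A":
            continue
        if item["span"] in used_spans:
            return True
        used_spans.add(item["span"])
    return False

def score(assignment, items):
    # Toy score: reward assigning an item to the frame its domain
    # suggests. A real model combines many weighted features.
    return sum(1.0 for item, frame in zip(items, assignment)
               if frame != "N/A" and frame == item["domain"])

def anneal(items, frames, steps=2000, t0=2.0, seed=0):
    rng = random.Random(seed)
    cur = ["N/A"] * len(items)            # trivially valid start
    best, best_s = list(cur), score(cur, items)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-6
        prop = list(cur)
        prop[rng.randrange(len(items))] = rng.choice(frames + ["N/A"])
        if violates(prop, items):          # never leave the valid region
            continue
        delta = score(prop, items) - score(cur, items)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur = prop                     # move one assignment at a time
            if score(cur, items) > best_s:
                best, best_s = list(cur), score(cur, items)
    return best, best_s

# Two domains propose competing interpretations for the same span.
items = [
    {"span": (0, 3), "domain": "transit"},
    {"span": (0, 3), "domain": "local"},   # competes with the item above
    {"span": (4, 7), "domain": "local"},
]
best, best_s = anneal(items, ["transit", "local"])
```

The search settles on a configuration that uses each span at most once while maximizing the toy score, which is exactly the shape of the parsing problem, just with a much simpler objective.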
0:06:14 The third challenge is about flexible multi-task management. To address it, we also introduce a new concept called task lineage. So suppose this situation: a user starts a conversation with two tasks, let's say weather information and restaurant finding. Then she continues the conversation with a transportation task and a ticket booking task without completing the first two. Then she leaves, attends some meeting, comes back, and tries to resume the restaurant-related task. Finally she finishes the restaurant booking, moves on to the transportation and ticket booking again, and completes them.

If you use traditional stack-based task management, you might have several problems. First, you usually cannot handle multiple tasks at the same time. The other problem is information loss. When you get to turn three, if the system regarded the first restaurant finding as complete, then the relevant information was removed and is gone, so you just restart the restaurant task at turn three. On the contrary, if the system kept it as incomplete, then it can resume the restaurant-related task without losing information from the past; but when you get to turn four, the system would have had to pop off the transportation and ticket booking to resume the restaurant booking task, so the relevant information for the popped tasks is gone at turn four. Either way, you might suffer from information loss.
0:08:23 To handle this problem, we came up with task lineages. There is no restriction on the number of task states in a task lineage: you can have as many task states as you want. You can also move between tasks at each turn. Whenever we have a new turn, you just add a new task state and retrieve relevant information from the task states in the past. So at turn two, the transportation and ticket booking can get some information from restaurant finding if those task states are related. At turn three, you can resume the restaurant finding even after a long time, and you can retrieve the relevant information from the task state at turn one without any problem. It goes similarly for turn four. Basically, we never remove or abandon any information from the past: you can always retrieve relevant past information, and the current task states give you the idea of the current focus.
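The data structure itself can be sketched in a few lines. The relatedness signal here is just a name match standing in for the learned context fetching; all names are illustrative:

```python
from dataclasses import dataclass, field

# Sketch of a task lineage: an append-only sequence of task states.
# Nothing is ever popped, so past information stays retrievable.

@dataclass
class TaskState:
    turn: int
    task: str
    beliefs: dict = field(default_factory=dict)  # slot -> value

class TaskLineage:
    def __init__(self):
        self.states = []                  # no stack, no removal

    def add_turn(self, turn, task, related_tasks=()):
        state = TaskState(turn, task)
        # Retrieve relevant info from related past states instead of
        # re-asking the user.
        for past in self.states:
            if past.task == task or past.task in related_tasks:
                state.beliefs.update(past.beliefs)
        self.states.append(state)
        return state

lineage = TaskLineage()
s1 = lineage.add_turn(1, "restaurant_finding")
s1.beliefs["area"] = "midtown"
# Turn 2: a related task inherits "area" from restaurant finding.
s2 = lineage.add_turn(2, "transportation", related_tasks=("restaurant_finding",))
# Turn 3: resuming restaurant finding recovers the old beliefs.
s3 = lineage.add_turn(3, "restaurant_finding")
```

Unlike a stack, resuming restaurant finding at turn 3 does not require popping (and losing) the transportation state added at turn 2.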
0:09:47 How do we do the context matching? We construct a context set. It is very simple: given the task lineage, we set a time window, and then we construct a belief set by collecting all the latest belief estimates within the time window. Then we construct a machine act set and a user act set by collecting all the machine acts and the task frame parses in the time window. Once you have the context set, then based on the current machine act and the current task frame parse, you try to select which pieces of information you want to use to update the current belief. So it is just a bunch of binary classifications, and we use logistic regression. There are a bunch of other features for this task; you can refer to my paper for them.
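The per-item binary decision can be sketched as a logistic scorer. The weights and features below are hand-set for illustration only; the real system learns them with logistic regression over many features:

```python
import math

# Sketch of context fetching as independent binary classifications:
# for each item in the context set, decide whether to fetch it for
# the current belief update. Weights and features are illustrative.

WEIGHTS = {"same_task": 2.0, "recency": 1.5, "slot_overlap": 1.0}
BIAS = -2.0

def fetch_probability(features):
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))    # logistic function

def select_context(candidates, threshold=0.5):
    # Keep every context item whose fetch probability clears the bar.
    return [c for c in candidates if fetch_probability(c["features"]) >= threshold]

candidates = [
    {"name": "area=midtown",
     "features": {"same_task": 1.0, "recency": 0.9, "slot_overlap": 1.0}},
    {"name": "food=thai",
     "features": {"same_task": 0.0, "recency": 0.1, "slot_overlap": 0.0}},
]
selected = select_context(candidates)
```

Because each item is scored independently, any subset of the context can be fetched, which is what lets a resumed task pull in old beliefs while ignoring unrelated ones.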
0:10:58 The fourth challenge is about task disambiguation: there could always be some ambiguity in task detection. To solve this problem, we use an n-best list of task lineages. Suppose the user says "I wanna go to Thai." This could be interpreted as restaurant finding or as travel, so we have two task lineages here. Then, when the user clarifies her real intention, like saying "I meant I wanna travel to Thailand," the second task lineage will get a higher score because it is more coherent. In this way we mitigate the task ambiguity.
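A minimal sketch of this n-best rescoring, assuming toy coherence bonuses (the real system rescores lineages with its learned model):

```python
# Sketch of task disambiguation with an n-best list of task lineages.
# Each hypothesis keeps its own interpretation of the ambiguous turn;
# a later clarification rescores them by coherence. The bonus and
# discount factors are illustrative.

def rescore(hypotheses, clarified_task):
    # A lineage whose latest task matches the clarified task is more
    # coherent and gets boosted; the others are discounted.
    rescored = []
    for tasks, score in hypotheses:
        factor = 2.0 if tasks[-1] == clarified_task else 0.5
        rescored.append((tasks + [clarified_task], score * factor))
    rescored.sort(key=lambda h: h[1], reverse=True)
    return rescored

# "I wanna go to Thai": ambiguous between two lineages.
nbest = [
    (["restaurant_finding"], 0.6),
    (["travel"], 0.4),
]
# "I meant I wanna travel to Thailand": the travel lineage wins.
nbest = rescore(nbest, "travel")
```

The initially weaker travel hypothesis overtakes restaurant finding once the clarification makes it the more coherent lineage.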
0:12:03 The overall dialog state tracking procedure consists of three steps. First, we do task frame parsing: given a set of possibly competing semantic frames from the distributed SLUs, we generate coherent task frame parses. Then, given the task frame parses, we try to retrieve relevant information from the past task states in the lineage, and this happens for each lineage. Finally, we use this retrieved information together with the input information to update the task state at this turn.

How do we do the task state update? Actually, this is one of the most trivial tasks in this framework, because we can pick up any method developed so far for dialog state tracking: the task state update is nothing but dialog state tracking in the conventional setting. So you can enjoy a wide range of different algorithms, whether discriminative models, generative models, or rules tuned on the data, and you can control the balance between the prior belief estimates and the new observations. To keep the analysis simple, we actually just adopted a generative rule-based update method from prior work, and we use this algorithm for belief tracking for each slot-value pair. These rules just update the current belief by accumulating negative and positive confidence scores.
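For one slot-value pair, that accumulation can be sketched like this. The rule below is a generic illustration of the idea, not the exact rule from the cited work, and the confidence numbers are made up:

```python
# Sketch of a generative rule-based belief update for a single
# slot-value pair: positive confidence pushes the belief toward 1,
# negative confidence pulls it toward 0, accumulating over turns.

def update_belief(belief, pos_conf=0.0, neg_conf=0.0):
    belief = belief + (1.0 - belief) * pos_conf  # accumulate positive evidence
    belief = belief * (1.0 - neg_conf)           # discount on negative evidence
    return belief

b = 0.0
b = update_belief(b, pos_conf=0.6)   # "Italian food"          -> 0.6
b = update_belief(b, pos_conf=0.5)   # confirmed again         -> about 0.8
b = update_belief(b, neg_conf=0.7)   # "actually, not Italian" -> about 0.24
```

Because the update is a simple closed-form rule, any stronger single-turn tracker could be dropped in at exactly this point without touching the rest of the framework.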
0:14:02 Let's move on to evaluation. We used DSTC2 to evaluate our algorithm; it is based on the restaurant-finding domain. One interesting characteristic of this dataset is its relatively frequent user goal changes, so if our method works well on this test dataset, then the context fetching is surely fetching the relevant information for the changed goals. Let's look at the results: our method actually shows the best performance so far on accuracy. This tells you the importance of better context handling in the dialog state tracking problem. And we got this performance without using any ensemble methods like system combination, and without neural networks or decision trees; it is just a rule-based update for the task state update.
0:15:13 We also wanted to evaluate our system on more complex interactions, but unfortunately there is no such dataset available out there, so we had to simulate some datasets. We took the DSTC3 data as our base corpus, because it contains multiple tasks, like restaurant finding, coffee shop finding, and pub finding. So we simulated three datasets with different representative settings. The first one does not have any complex user goals and no multiple tasks, so we just use the DSTC3 data itself. For the second setting, we have complex user goals simulated, but still no multiple tasks. And for the third dataset, we have both complex user goals and multiple tasks. These numbers are the overall statistics for these corpora.
0:16:25 So let's look at the results. We compared our system with the baseline system in DSTC. If we look at the joint goal accuracy, the accuracy of the baseline system drops very sharply, from 0.57 to 0.31 and then 0.02, while our system drops much more gently, from 0.59 to around 0.4 and then 0.3. Keeping in mind that the task gets exponentially harder with respect to the complexity, this gentle reduction is a big win. We further evaluated our system with two more oracle results: TL-DST-OP uses oracle parses, and TL-DST-O uses both oracle parses and oracle context fetches. You can see the improved results from using the oracle information, which indicates that there is some room for future improvement.
0:17:46 Let me conclude my talk. We have proposed a new statistical dialog state tracking framework, built around task lineages, to orchestrate multiple tasks with complex goals across multiple domains in continuous interaction. As a proof of concept, we demonstrated good performance on a common benchmark test dataset and on simulated dialog corpora. Some interesting future directions include the use of more sophisticated machine learning models, like GBDT, random forests, or recurrent neural networks; I am pretty sure you can push the performance much higher than the performance shown here just by using those techniques for the task state update. I am also interested in extending this framework with weakly supervised learning to reduce the labeling cost, and in seeing some potential impact on other dialog system components by providing a more comprehensive state representation like task lineages.
0:19:04 Okay, I have about one minute left, so basically, task frame parsing works like this. Given an input like "I wanna go to Thai," let's say there are two domains and they generate two different interpretations, like the upper two and the bottom two dialog act items. We identify all possible candidate task frames for each dialog act item, and we have a special task frame, "N/A," to accommodate all the unnecessary information. Then the task is to get the right assignment from each dialog act item to the right task frame. The parsing algorithm starts with some configuration that is valid, and then it moves one assignment at a time; eventually, guided by the reasoning with the scores, it actually gets to the proper configuration with a high score.

0:20:26 I think this is it for my presentation. Thank you.
0:20:35 Okay. Right, sure. Right.
0:20:53 Actually, it is done through some feature functions. As you extend the task lineage, we keep the timestamps, and the feature functions that match the context use the timestamp as one of the features. So as a piece of context gets farther and farther from the current timestamp, you will have less chance of fetching that information.

0:21:35 Okay, so that actually involves another notion, I guess, which is long-term memory. This talk is more about short-term interaction management, and another line of my research is about long-term memory management, where you are going to have a more perpetual memory. It would be used, for example, for some features to disambiguate or to boost some evidence, that kind of thing. So you would need another memory structure beyond just this short-term dynamic structure.
0:22:16 Right, of course.
0:22:49 I missed the initial part of the question, so can you repeat it? [Question: when you have multiple competing interpretations, where do you start?] That is probably more about policy. Given a lot of ambiguity, or high entropy in your state representation, you can actually train some smart policy that decides whether it is better to ask for confirmation at this point, or to assume something, or to try to retrieve something like the user's habits from long-term memory. All of these are determined by your policy, so it is kind of another module that takes care of such things.
0:23:40 [A short question from the audience.]

0:24:01 I am repeating the question: I think the question is whether I did classification for each constraint of the complex goals, is that right? Okay, so what he asked is whether we could use classification to predict the user's intention, with a different classifier, is that right? Actually, as shown in this task frame parsing, I did not use that kind of classifier, though that is actually a necessary part, I guess, for scalability: if we considered all possible interpretations from all possible SLUs and dialog act components, then the complexity would explode. So I just do some filtering as a preprocessing step, and then do the parsing to construct parses that can contain multiple tasks in one utterance. With just classification, it is a little bit difficult to handle multiple tasks, and that is why we use this structure.
0:25:53 Okay, sure.

0:26:13 Absolutely. Okay, let me repeat the question: did I have to use the context to interpret the user's utterances? Actually, I am using just the corpus, so I did not have to use the context to understand the user utterances. But there has been a lot of research trying to use the context to interpret intentions, so there is no reason not to use it. Right, okay, thanks again.