0:00:15 | So, I'm from Carnegie Mellon. This is collaboration work with a collaborator and my advisors, Alan Black and Alex Rudnicky over there. |
0:00:25 | Today I'm going to talk about strategy and policy learning for non-task-oriented conversational systems. |
0:00:30 | As we know, non-task-oriented conversational systems are what people call chatbots, or social chat systems. |
0:00:38 | So the task is, in other words, social chatting, and people always ask me: why do we need social chatting? |
0:00:47 | The motivation is actually simple. If we look at human conversations, we use a lot of social chatting. When you are meeting someone for a certain task, you try to do some social chatting to ease into the conversation: you talk about your weekend before you get into the meeting agenda. |
0:01:08 | Some social chatting is a type of conversation in itself; it mostly builds social ties with your coworkers or your friends. Of course, it also has other application fields, like education: |
0:01:20 | you want a tutor to be socially intelligent, able to use this kind of casual chatting to interleave the conversation. |
0:01:28 | The same goes for healthcare and language learning. In the complex tasks that arise in these areas, social chatting is essential. |
0:01:39 | So we want to design a system that is able to perform social chatting, and we have some goals in mind: one, the system needs to be appropriate; |
0:01:50 | we want the system to be able to go in depth with the conversation; |
0:01:54 | and we want the system to provide a variety of answers, suited to different users. |
0:02:03 | The main goal is to make sure the system is coherent, meaning appropriate, at the single-turn level. |
0:02:10 | So we define appropriateness as whether the response is coherent with the user utterance, and we use three labels: appropriate, interpretable, and inappropriate. |
0:02:21 | Later we are going to use these labels to evaluate our systems. |
0:02:25 | First of all, we need a lot of data in order to evaluate the system, and at the same time we want a fairly easy pipeline to actually do the evaluation. |
0:02:36 | People who have been working on dialogue systems know that it is hard to get data, |
0:02:44 | and user evaluation, where you have to have a user interact with the system, is also very expensive. |
0:02:51 | So here, in order to expedite the process, we developed a text API so people can access the chatbot in a web browser. |
0:03:02 | Multiple people can talk to it at the same time; it is multi-threaded. |
0:03:06 | We also automatically connect the user to a rating task after the conversation, where they can rate whether a certain response is appropriate or not; we give them the whole dialogue history to review. |
0:03:20 | We have made both the data and the code open source, so you can go and get them. |
0:03:27 | We also have demos running on Amazon Mechanical Turk, on a machine that runs twenty-four hours a day, seven days a week. |
0:03:35 | Let me demo it a little. Here is the screen; you type in something, |
0:03:48 | for example you say to TickTock, "Let's talk about music," |
0:03:50 | and it replies, "Sure, what do you want to talk about?" |
0:03:59 | It can talk about almost everything, and the interaction is very easy; it is a very nice way to motivate users to interact with the system. |
0:04:10 | It is also a very easy way to collect evaluation data, so we sometimes post it on Mechanical Turk or on social networks to get more users. |
0:04:26 | Now, let's take a step back and look at previous work on task-oriented systems. |
0:04:31 | We are all familiar with this architecture: once we get the user input, we do language understanding, then a dialog manager decides what to generate, and in the end we produce the system output. |
0:04:45 | A lot of work has looked at what happens if there is some non-understanding in the system, that is, the user says something that is not comprehensible to the system. |
0:04:55 | Many people have designed conversational strategies to handle these errors, for example saying "Can you say that again?" We are all very familiar with these conversational strategies. |
0:05:09 | There is also a lot of work, including by a number of people here at CMU, that uses POMDPs or MDPs to optimize the process of choosing which strategy to use, planning globally to optimize the task completion rate. |
0:05:29 | Given this previous work on task-oriented systems, can we do the same for non-task-oriented systems? |
0:05:35 | So the research questions are: can we develop conversational strategies to handle, for example, inappropriate responses (appropriateness is what we really care about) in a non-task-oriented system? |
0:05:48 | And can we actually use a globally planned policy to regulate the conversation? |
0:06:33 | (I apologize for the technical disturbance.) |
0:07:05 | So what we are trying to ask is: can we design conversational strategies and conversational policies to help the non-task-oriented system's utterances be more appropriate? |
0:07:16 | Here we designed an architecture which is very similar to that of a task-oriented system. |
0:07:21 | First, once we get the user input, we apply some context tracking strategies that we developed, and then we generate a response. |
0:07:32 | If the system has high confidence that the response is a good one, we simply return the system response to the user. |
0:07:45 | If the system is not confident that it is a good response, we go into the lexical-semantic strategies that we introduced recently to deal with the low confidence; if one of them triggers, we use it to generate the output. |
0:08:05 | If none of the conditions triggers these strategies, we go to our engagement strategies to actively generate a reply (see the sketch below). |
0:08:15 | In yesterday's session we also talked about another system which is similar to this one, where we also take user engagement into consideration throughout the whole process. |
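To make this control flow concrete, here is a minimal sketch of the dispatch, assuming placeholder generators and an invented 0.5 confidence threshold; it is illustrative, not the authors' implementation.

```python
# Hypothetical sketch of the confidence-based dispatch described above.
# All helper names and the 0.5 threshold are assumptions for illustration.
CONF_THRESHOLD = 0.5

def track_context(utterance, history):
    return utterance  # placeholder for pronoun resolution etc.

def generate_response(utterance, history):
    return "That sounds interesting.", 0.3  # placeholder (response, confidence)

def lexical_semantic_strategy(utterance, history):
    return None  # placeholder: returns a grounding response, or None if nothing triggers

def engagement_strategy(utterance, history):
    return "How about we talk about movies?"  # placeholder engagement move

def respond(utterance, history):
    tracked = track_context(utterance, history)
    response, confidence = generate_response(tracked, history)
    if confidence >= CONF_THRESHOLD:
        return response                       # high confidence: answer directly
    grounded = lexical_semantic_strategy(tracked, history)
    if grounded is not None:
        return grounded                       # a lexical-semantic strategy triggered
    return engagement_strategy(tracked, history)  # fall back to engagement

print(respond("Do you like Clinton?", []))
```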
0:08:27 | So we have three sets of strategies, which I will talk about later in detail, for making the system more appropriate, |
0:08:37 | and also a policy that chooses between the different strategies so as to optimize the whole process globally. |
0:08:50 | That is, we have two components: the response generation side and the conversational strategy selection side. |
0:08:58 | First, how do we track context? We start with anaphora resolution, mainly pronoun resolution, |
0:09:09 | because we wanted a strategy that covers ninety percent of the cases. |
0:09:15 | For example: "Do you like Taylor Swift?", and we detect the entity Taylor Swift; |
0:09:20 | if the user says "Yes, I like her a lot," we replace "her" with "Taylor Swift" for the next response generation (sketched below). |
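A minimal sketch of that pronoun-replacement heuristic, assuming a simple regex substitution of third-person pronouns with the most recent entity; the talk does not specify the implementation.

```python
import re

PRONOUNS = {"he", "she", "him", "her", "it", "they", "them"}

def resolve_anaphora(utterance: str, last_entity: str) -> str:
    """Replace third-person pronouns with the most recently seen named entity."""
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    return re.sub(pattern, last_entity, utterance, flags=re.IGNORECASE)

# Example from the talk:
#   System: "Do you like Taylor Swift?"   User: "Yes, I like her a lot."
print(resolve_anaphora("Yes, I like her a lot", "Taylor Swift"))
# -> "Yes, I like Taylor Swift a lot"
```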
0:09:29 | We also do response ranking using history similarity: basically, we use word2vec to rank the similarity between the candidates and the previous user utterance. |
0:09:41 | For example, the user says, "I watch a lot of baseball games," and then asks, "Which one do you like most?" |
0:09:48 | Here we have two candidates: one is "I like Taylor Swift"; the other is a sports-related answer. |
0:09:52 | If we run the word2vec similarity test, we narrow it down: the second one is preferred, because the two utterances are closer in the same semantic space (see the sketch below). |
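Here is a sketch of that ranking step under stated assumptions: sentence vectors are plain averages of word2vec embeddings and similarity is cosine; the talk does not give these details.

```python
import numpy as np

def sentence_vector(sentence, embeddings, dim=50):
    """Average the word vectors of all in-vocabulary words."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank_candidates(history, candidates, embeddings):
    """Return the candidate closest to the conversation history in embedding space."""
    h = sentence_vector(history, embeddings)
    return max(candidates, key=lambda c: cosine(h, sentence_vector(c, embeddings)))
```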
0:10:06 | Then we go to our response generation methods. After we take the context and history into account, we do the actual generation. We have two methods, and we select between them based on confidence. |
0:10:20 | One is keyword retrieval: basically, we find the keywords in the user's utterance, match them against the database, |
0:10:32 | and return the corresponding response that has the highest aggregated weight. |
0:10:39 | For the data, we used existing interview transcripts, and we also collected additional question-answer data using MTurk. (A toy sketch of the retrieval step follows.) |
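A toy sketch of the keyword-retrieval step; the database contents and weights below are invented for illustration, whereas the real system mines them from the interview and MTurk data.

```python
from collections import defaultdict

# keyword -> list of (response, weight) pairs; illustrative toy entries only
DATABASE = {
    "music":  [("I listen to pop mostly.", 2.0), ("Do you play an instrument?", 1.5)],
    "sports": [("I watch a lot of baseball.", 2.5)],
}

def retrieve(user_utterance):
    """Return the response with the highest aggregated keyword weight."""
    scores = defaultdict(float)
    for word in user_utterance.lower().split():
        for response, weight in DATABASE.get(word, []):
            scores[response] += weight   # aggregate weight over all matched keywords
    if not scores:
        return None, 0.0                 # no keyword matched: low confidence
    best = max(scores, key=scores.get)
    return best, scores[best]

print(retrieve("i love music"))  # -> ('I listen to pop mostly.', 2.0)
```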
0:10:48 | The other method is a sequence-to-sequence neural network model: basically, we use an encoder and a decoder to generate the response, following an existing implementation of this method. |
0:11:02 | So we have two methods, and we select the one with the highest confidence, as sketched below. |
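A one-function sketch of that selection step; the generators are passed in as callables returning (response, confidence) pairs, and breaking ties in favor of retrieval is an assumption.

```python
def best_response(utterance, retrieval_fn, seq2seq_fn):
    """Run both generators and keep the higher-confidence output."""
    r_resp, r_conf = retrieval_fn(utterance)
    s_resp, s_conf = seq2seq_fn(utterance)
    return (r_resp, r_conf) if r_conf >= s_conf else (s_resp, s_conf)

demo = best_response("I like baseball",
                     lambda u: ("I watch a lot of baseball.", 0.8),
                     lambda u: ("Me too.", 0.4))
print(demo)  # -> ('I watch a lot of baseball.', 0.8)
```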
0:11:11 | If the confidence of the response generation model is high, we just send the response back to the user. |
0:11:18 | If it is low, what are we going to do? |
0:11:53 | (I apologize again for the technical interruption.) |
0:13:05 | So here, if the generation confidence score is low, we go through some lexical-semantic strategies; then, finally, I will talk about the other set. |
0:13:22 | We designed a set of such strategies. For example, if the user repeats themselves, we say, "You already said that." |
0:13:29 | If the user keeps replying with a single word, we react to that, saying something like "You keep saying incomplete sentences." |
0:13:38 | We also have grounding strategies: grounding on named entities and on out-of-vocabulary words. |
0:13:44 | For named entities, we detect the entity, try to find it in the knowledge base, and use a template to ask for clarification. |
0:13:52 | For example: "Do you like Clinton?" "Which Clinton are you talking about, Bill Clinton, the former president of the United States, or Hillary Clinton, the Democratic candidate?" |
0:14:02 | We also ground on out-of-vocabulary words: if we detect an out-of-vocabulary word, we use a template to generate the sentence, and at the same time we update the vocabulary. |
0:14:13 | For example, if the user says, "You are very confrontational," the system asks, "What do you mean by confrontational?" A toy sketch of both grounding strategies follows. |
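A toy sketch of the two grounding strategies; the knowledge base, vocabulary, and templates are invented stand-ins for the system's real resources.

```python
# Illustrative toy resources; the real system uses a full knowledge base and lexicon.
KNOWLEDGE_BASE = {
    "clinton": ["Bill Clinton, the former US president",
                "Hillary Clinton, the Democratic candidate"],
}
VOCABULARY = {"do", "you", "are", "very", "like", "i", "think"}

def ground(utterance: str):
    words = utterance.lower().strip("?!.").split()
    # Named-entity grounding: ask which referent the user means.
    for w in words:
        if w in KNOWLEDGE_BASE and len(KNOWLEDGE_BASE[w]) > 1:
            options = " or ".join(KNOWLEDGE_BASE[w])
            return f"Which {w.title()} are you talking about, {options}?"
    # OOV grounding: ask about the unknown word, then add it to the vocabulary.
    for w in words:
        if w not in VOCABULARY and w not in KNOWLEDGE_BASE:
            VOCABULARY.add(w)
            return f"What do you mean by {w}?"
    return None  # neither strategy triggers

print(ground("Do you like Clinton?"))
print(ground("You are very confrontational"))
```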
0:14:20 | We then ran a lot of queries to evaluate how these strategies are doing, based on crowd annotation of appropriateness. |
0:14:29 | We can see that mostly people think they are appropriate, but there are some problems: for example, if the detected named entity is wrong, the generated responses will not be correct. |
0:14:40 | We also have issues with out-of-vocabulary words: if the user is typing with a more casual way of spelling, the system will try to ground on that word, and users find it inappropriate. |
0:15:02 | These strategies need an existing condition to trigger them, so if none of the conditions triggers, we go to our engagement strategies to actively try to bring the user into the conversation. |
0:15:16 | We looked into the previous literature: in the communication literature, active participation is really important, |
0:15:28 | as are positive feedback and encouragement. We mainly implemented a set of strategies that goes with active participation. |
0:15:37 | Whenever we start a conversation, we usually pick a topic to initiate with the user, and we designed the strategies with respect to the topic: the system can stay on the topic or change the topic. |
0:15:53 | If we try to stay on the topic, we can tell jokes ("Did you know that people usually spend far more time watching sports than actually playing them?"), |
0:16:03 | initiate an activity ("Do you want to watch a game together sometime?"), |
0:16:07 | or talk more ("Let's talk more about sports"). |
0:16:11 | We can also change the topic ("How about we talk about ...?"), |
0:16:15 | or end the topic with an open question, such as "Could you share with me some interesting news you saw on the internet?" |
0:16:22 | We also evaluated the appropriateness of these strategies based on user ratings. Here we used only a random selection policy, meaning that whenever the generation was not good and the confidence was low, and none of the lexical-semantic strategies was triggered, we randomly selected one of these engagement strategies (sketched below). |
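A sketch of that random baseline over the engagement strategies just listed; the strategy names and wordings are paraphrases, not the system's actual templates.

```python
import random

ENGAGEMENT_STRATEGIES = {
    "joke": "Did you know that people usually spend far more time watching "
            "sports than actually playing them?",
    "initiation": "Do you want to watch a game together sometime?",
    "tell_more": "Let's talk more about sports.",
    "switch_topic": "How about we talk about something else?",
    "open_question": "Could you share with me some interesting news you saw on the internet?",
}

def random_policy():
    """The baseline: pick an engagement strategy uniformly at random."""
    name = random.choice(list(ENGAGEMENT_STRATEGIES))
    return name, ENGAGEMENT_STRATEGIES[name]

print(random_policy())
```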
0:16:47 | We do find some of them doing pretty well, for example activity initiation and telling more, |
0:16:54 | but some of them are doing pretty badly, for example jokes. |
0:16:59 | Without the context, these strategies can go very wrong. Here is one example. |
0:19:15 | Apologies again for the interruption. To make up the time, here is the failure case. |
0:19:17 | TickTock says, "A lot of people really like politics; let's talk about politics," and the user says, "No, I don't like politics." TickTock asks, "Why?", and the user says, "I just don't like politics." |
0:19:33 | And then the system goes into the activity-initiation strategy: "Do you want to watch one together sometime?" But I told you, I don't want to talk about politics! |
0:19:43 | Basically, we find there is more inappropriateness whenever we select the strategy without taking the context into consideration. |
0:19:47 | If we look closely at the semantic context, we find the user is expressing negative sentiment twice in a row, and at this point |
0:20:02 | the correct move is to pick the switch-topic strategy, which can handle the situation where the user is not happy about the current topic. |
0:20:13 | So we concluded that we need to model the context in the strategy selection. |
0:20:20 | Basically, our aim is twofold: we want to avoid inappropriateness, and we use reinforcement learning to do the global planning. |
0:20:25 | We take as state variables quantities that capture the uncertainty, some of which we mentioned before: |
0:20:35 | for example, the system's appropriateness confidence, |
0:20:38 | the previous utterance's sentiment, the number of times each strategy has been executed, the turn position, and the most recently used strategy. We take all of these into consideration when training our reinforcement learning policy; a sketch of the state is below. |
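A sketch of the state the talk describes, as a plain record; the exact types and any discretization are assumptions, since the talk only names the variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DialogueState:
    appropriateness_conf: float  # confidence of the appropriateness predictor
    prev_sentiment: float        # sentiment of the previous user utterance
    strategy_counts: tuple       # times each engagement strategy has been executed
    turn_position: int           # current turn index in the conversation
    last_strategy: str           # most recently used strategy

state = DialogueState(0.3, -0.6, (1, 0, 2, 0, 1), 7, "tell_more")
```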
0:20:54 | We use another chatbot as a user simulator to train the conversational policy. |
0:21:00 | We have a reward function which is a combination of response appropriateness, conversational depth, and information gain. |
0:21:09 | Appropriateness we have already defined; we trained a binary classifier on the human labels, and this automatic predictor is then used in the reinforcement learning training process. |
0:21:23 | Conversational depth we define as the number of consecutive utterances that keep to the same topic, and we also trained an automatic predictor for it based on the human annotation. |
0:21:39 | Finally, information gain accounts for the variety of the conversation: |
0:21:45 | we simply count the number of unique words that the user and the system have spoken. |
0:21:52 | In the end, we empirically decided the weights of the three terms in the reward function; later we may use machine learning methods to train the weights. A sketch of this reward is below. |
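A minimal sketch of that reward, assuming placeholder weights; the talk says the weights were set empirically and does not give their values.

```python
def reward(appropriateness: float, depth: int, info_gain: int,
           w1: float = 1.0, w2: float = 0.5, w3: float = 0.1) -> float:
    """Weighted sum of the three reward terms described in the talk.

    w1-w3 are placeholder weights; the real ones were tuned empirically.
    """
    return w1 * appropriateness + w2 * depth + w3 * info_gain

# e.g. an appropriate turn, 3 consecutive on-topic turns, 12 new unique words
print(reward(1.0, 3, 12))  # -> 3.7
```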
0:22:06 | We have two other policies that we compare our reinforcement learning policy against: the first is the random selection policy; |
0:22:12 | the other is a local greedy policy, which decides the strategy based on the sentiment of the previous three utterances. |
0:22:19 | For example, if the user is positive several turns in a row, we talk more about the topic; if negative, the policy switches the topic, as in the sketch below. |
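A sketch of the local greedy baseline, assuming a simple average-sentiment threshold at zero; the talk only says it looks at the previous three utterances' sentiment.

```python
def greedy_policy(last_three_sentiments):
    """Stay on the topic after positive user sentiment, otherwise switch."""
    avg = sum(last_three_sentiments) / len(last_three_sentiments)
    return "tell_more" if avg > 0 else "switch_topic"

print(greedy_policy([-0.8, -0.5, -0.9]))  # negative streak -> "switch_topic"
```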
0:22:28 | In the end, after training the policy with the user simulator using reinforcement learning and testing with real humans interacting with the system, |
0:22:37 | we decreased the inappropriateness |
0:22:40 | and increased the conversational depth and the total information gain. |
0:22:45 | So, the conclusions: we think the designed conversational strategies, in particular the lexical-semantic strategies, are useful; |
0:22:54 | considering the conversational history is useful; |
0:22:57 | and integrating the uncertainty of the different upstream ML models into the reinforcement learning is useful. |
0:23:06 | Any questions? |
0:23:08 | Okay. |
0:23:31 | Yes, that's a good question. We basically do have different surface forms when designing these strategies. |
0:23:41 | This is actually our future work: we want to see how we can generate sentences with the pragmatics inside them. |
0:23:48 | Right now it is based on templates: basically, we tried to use different wordings, but it is still templates, not really very general. |
0:24:18 | That's a good question. The idea is that we try to integrate as much of the uncertainty of the conversation as possible into the dialogue planning. Definitely all of these, for example the word2vec similarity, |
0:24:32 | are extra information that can go into the strategy selection, |
0:24:36 | and so is the ASR error if you have a spoken dialogue system. |
0:24:41 | So I think if you can optimize while considering all of these uncertainties inside the dialogue system, it would be better; but we haven't done that yet. |
0:24:52 | How many states? Basically, the state space expands exponentially as you add more variables. |
0:25:08 | Any other questions? |
0:25:30 | Oh, that's a good question. Basically, we ask the user directly: with respect to the user's utterance, do you think the response is appropriate and coherent, or not? |
0:25:43 | So sometimes, if a topic change comes at the right time, people think it is appropriate; |
0:25:50 | if it does not, they would think it is inappropriate. |
0:25:54 | So in total we give them a pretty broad interpretation of what "appropriate" means, |
0:25:59 | and a lot of people do take context into consideration when they are rating. |
0:26:08 | True, true. |
0:26:25 | Right, exactly; that's why in the reward function, in the optimization, we try to account for the variety as well. |
0:26:34 | Basically, appropriateness is just one aspect of making the system a good conversational partner; |
0:26:44 | others, like being funny or being provocative, or anything else, could be added on top of that. |
0:26:50 | So I think these are different dimensions, and variety or personalization are things that could be considered. |