Speech Transcript - Lexical Acquisition through Implicit Confirmations over Multiple Dialogues

0:00:16	[inaudible]
0:00:17	[inaudible] from Osaka University, Japan.
0:00:20	Today I'd like to talk about lexical acquisition through implicit confirmations over multiple dialogues.
0:00:27	This is joint work with [inaudible] Osaka University and Honda Research Institute Japan.
0:00:34	Okay, so in chat-oriented dialogue systems, [inaudible] the system should acquire and use the knowledge accumulated during the dialogues. Here's an example.
0:00:49	[inaudible] In one dialogue, the user says, "[inaudible]," and in another dialogue, the user says, "I'd like to [inaudible]."
0:01:05	From these user utterances, the system should acquire this kind of information about the user [inaudible].
0:01:15	Nice.
0:01:16	And this can be used for future recommendations.
0:01:21	Conventionally, this kind of knowledge is prepared manually by system developers [inaudible], but we think it should be acquired during the dialogue.
0:01:32	And we also [inaudible] a closed domain [inaudible]. [inaudible] a knowledge base is necessary, but assuming a dialogue corpus that includes all lexical items is unrealistic.
0:01:46	So we want to make the system acquire new concepts through dialogue, which will reduce the cost of manually [building] the knowledge base. We are now building a chatbot in the food and restaurant domain.
0:02:06	Our target is to acquire [inaudible] the category [inaudible], and the system should be able to continue the dialogue even for unknown terms.
0:02:20	Here is an example. The user says, "I tried to cook [inaudible] today," and that term is not known to the system.
0:02:31	In the simplest case, [inaudible]
0:02:35	the system asks "What is it?", which is actually boring, and such simple questions may be abrupt [inaudible].
0:02:44	So our approach is lexical acquisition through implicit confirmation. Here, the system tries to acquire the [inaudible] category of the unknown term [inaudible].
0:02:59	This is also an example. The user says, "I will try to [inaudible]," which contains an unknown term, and the system predicted [its category].
0:03:12	Okay, we use a [inaudible] predicted category [inaudible]. This is done by our previous method.
0:03:23	Our previous method uses these features: character n-grams and the four Japanese character types, and [inaudible].
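The previous category-prediction method is described only as using character n-grams and four Japanese character types. A minimal sketch of such a feature extractor follows, assuming the four types are hiragana, katakana, kanji, and everything else (the transcript does not name them). The function names are hypothetical.

```python
from collections import Counter


def char_type(ch: str) -> str:
    """Map one character to a coarse script class (assumed four-way split)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def char_features(term: str, n_max: int = 2) -> Counter:
    """Bag of character n-grams (n = 1..n_max) plus character-type counts."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(term) - n + 1):
            feats["ng:" + term[i:i + n]] += 1
    for ch in term:
        feats["type:" + char_type(ch)] += 1
    return feats


print(char_features("カレー"))
```

These sparse features would then be fed to a classifier; which classifier the previous method used is not stated in this talk.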
0:03:40	Next, the system generates an implicit confirmation request with the predicted category c, for example, "[term] is a good [inaudible] restaurant [inaudible]."
0:03:53	And the user responds with something like "I think so."
0:03:58	From the user's response, the system determines whether the predicted category c is correct or not [inaudible].
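An implicit confirmation request embeds the predicted category in a statement instead of asking a yes/no question. The sketch below shows that idea with template filling; the templates and the function name are hypothetical illustrations, not the system's actual wording.

```python
import random

# Hypothetical templates: each one states (rather than asks about) the
# predicted category c of the unknown term w, so the user may ignore it.
TEMPLATES = (
    "Oh, {c} food like {w} sounds nice.",
    "I also like {c} dishes such as {w}.",
)


def implicit_confirmation(term: str, category: str, rng: random.Random) -> str:
    """Embed the predicted category in a statement about the unknown term."""
    return rng.choice(TEMPLATES).format(w=term, c=category)


print(implicit_confirmation("carbonara", "Italian", random.Random(0)))
```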
0:04:11	The system acquires [the category] [inaudible]. If incorrect categories are acquired by the system, [inaudible], which is not good for this task.
0:04:34	So, this is an example of explicit confirmation where the prediction was incorrect: [inaudible] the system asks if the [term] is really Italian.
0:04:48	We think that this kind of confirmation may degrade the user experience.
0:04:55	This is another example of explicit confirmation, where the prediction is correct.
0:05:01	[inaudible] The user says, "I sometimes have [inaudible] mushroom [inaudible]," the system asks if the mushroom [inaudible] is Italian, and the user replies "Yes."
0:05:11	We think such explicit confirmations degrade the user experience, so our approach is to use implicit confirmation.
0:05:25	Okay, but to determine whether the category is correct or not, [inaudible] we use the user's response.
0:05:35	User responses vary in expression, and they include more than simple affirmative and negative responses.
0:05:43	Okay, here are [inaudible] examples. The user says, "[inaudible] yesterday," and the system says, "I want to [inaudible] Japanese food."
0:05:54	The user says, "What are you talking about?"
0:05:58	By using this response, the system can easily recognize that the predicted category is incorrect.
0:06:06	On the other hand, in this example, the user says, "[inaudible] yesterday," the system says, "I wanted to [inaudible] food," and the user says, "I'd like to [inaudible]."
0:06:18	So in this case it is difficult to determine [whether it is] Italian.
0:06:23	[inaudible] It is hard to predict whether the category is correct or not.
0:06:31	So our approach to the first problem is to take various features into consideration, drawn from the user utterances before and after the implicit confirmation request.
0:06:43	Another problem is that users do not always respond correctly.
0:06:53	That means their responses are sometimes inconsistent.
0:06:58	So this is also an incorrect confirmation: "[inaudible] onion [inaudible] Japanese [inaudible]," and this user response is just "[inaudible]."
0:07:10	So if the system takes this response as indicating that the category is correct, incorrect knowledge will be added to the system's knowledge base.
0:07:27	So our approach to the second problem is to exploit responses over multiple dialogues.
0:07:35	Okay, so let me summarize our proposed method. The first part is to design a feature set for machine-learning-based classification that considers expressions other than simple affirmative or negative responses, and that exploits the user utterances before and after the implicit confirmation request.
0:08:01	The second part of the proposed method is to exploit the determination results over multiple dialogues.
0:08:07	This becomes possible if the system is deployed on [inaudible]. This is a conventional one-to-one dialogue, but we are now building a chatbot, and [inaudible], so the system can interact with multiple users about the same content.
0:08:27	So we integrate the results from multiple users to determine whether the acquired predicted category is the correct one.
0:08:42	Okay, so this is an overview of our method. As I explained, a user says some unknown term, the system generates an implicit confirmation with a predicted category c, and the user responds with this utterance.
0:09:02	Then the system calculates the probability [inaudible] from this single user response.
0:09:10	Next, the system [inaudible] the responses from other users, like this.
0:09:20	After that, we integrate these probabilities into one confidence measure to determine whether [the category] is the correct one.
0:09:38	Okay, so I have explained the background, the proposed method, and the problem. Now I will explain the first proposal and its results in more detail, along with the data for the experiment. Next, I will explain our second proposal and its results, and then [inaudible] my talk.
0:10:01	[Let me start with] the first part of our proposed method.
0:10:04	Well, we calculate the probability that, given the user's response, the predicted category c for the unknown term w is correct [inaudible].
0:10:18	Let me introduce our notation, u1 and u2.
0:10:24	u1 is the first user utterance, containing the unknown term w. It is followed by the implicit confirmation request including the predicted category, and u2 is the user's response to that request.
0:10:43	Here we use logistic regression to determine whether the predicted category is correct or not, and we incorporated the features shown in this table.
0:11:06	The first group is the expressions in u2: not only affirmative or negative expressions but also some other expressions. We also consider the expressions in u1 and their relationship with u2.
0:11:20	And finally, we also incorporated the relationship between u1, u2, and [inaudible]. These are listed as the features.
0:11:30	So this part consists of six features representing the expressions in u2.
0:11:40	The first two correspond to the baseline, and we also incorporated [the others].
0:11:50	The second group represents the expressions in u1, the user utterance before the implicit confirmation request, and [inaudible].
0:12:00	The last group is the relationship between u1 and u2, that is, whether u1 and u2 contain the same word or not, and also [inaudible].
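The three feature groups (expressions in u2, expressions in u1, and the u1–u2 relationship) can be illustrated with a small extractor. This is a sketch only: the cue-word lists and the six features below are hypothetical English stand-ins, not the paper's actual feature set.

```python
AFFIRMATIVE = {"yes", "yeah", "right", "sure"}  # hypothetical cue-word lists
NEGATIVE = {"no", "not", "nope", "wrong"}
PUZZLED = {"what", "huh"}


def tokens(text: str) -> set:
    """Lower-cased word set with surrounding punctuation stripped."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return {w for w in words if w}


def extract_features(u1: str, u2: str, term: str, category: str) -> list:
    """Illustrative features over u1 (contains w) and u2 (reply to the request)."""
    s1, s2 = tokens(u1), tokens(u2)
    w, c = term.lower(), category.lower()
    return [
        float(bool(s2 & AFFIRMATIVE)),  # u2: affirmative expression
        float(bool(s2 & NEGATIVE)),     # u2: negative expression
        float(bool(s2 & PUZZLED)),      # u2: puzzled reaction
        float(w in s2),                 # u2 repeats the unknown term
        float(c in s2),                 # u2 mentions the predicted category
        float(bool((s1 - {w}) & s2)),   # u1 and u2 share a word other than w
    ]


print(extract_features("I tried to cook carbonara today",
                       "What are you talking about?", "carbonara", "Japanese"))
```

A puzzled reply such as "What are you talking about?" fires only the third feature here, which is the kind of signal the talk says plain affirmative/negative cues miss.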
0:12:14	Okay, before the results, let me explain the data collection.
0:12:19	We collected user utterances before and after implicit confirmation requests by crowdsourcing.
0:12:27	First, we asked workers to input an utterance about a specified term, for example, "I ate [inaudible]."
0:12:41	Then the system responded with a generated implicit confirmation request whose category was either correct or incorrect.
0:12:54	These requests correspond to the specified term, so we prepared two implicit confirmation requests for each specified term, for example, "[inaudible] Italian [inaudible] dishes."
0:13:13	Then we asked the workers to respond to this confirmation request.
0:13:22	So we prepared twenty [inaudible] and their corresponding correct and incorrect [categories] [inaudible].
0:13:30	We asked about one hundred workers and collected a total of two thousand [inaudible] utterances. After that, we excluded invalid utterances.
0:13:44	Okay, so this is the result of using logistic regression with ten-fold cross-validation.
0:13:51	We regarded the predicted category as correct if the probability was larger than 0.5.
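The talk names the model (logistic regression), the evaluation (ten-fold cross-validation), and the decision rule (probability above 0.5). The dependency-free sketch below puts those three pieces together. The plain gradient-descent trainer is chosen only to keep the example self-contained; it is an assumption, not necessarily how the authors fitted their model, and the toy data is made up.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def is_correct(model, x, threshold=0.5):
    """Decision rule from the talk: category judged correct if p > 0.5."""
    return predict_proba(model, x) > threshold


def cross_val_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validation accuracy (folds drawn after a seeded shuffle)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    hits = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_logreg([X[i] for i in train], [y[i] for i in train])
        hits += sum(is_correct(model, X[i]) == bool(y[i]) for i in fold)
    return hits / len(X)


X = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10  # toy, linearly separable data
y = [1] * 10 + [0] * 10
print(cross_val_accuracy(X, y))
```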
0:14:01	This row shows the baseline, and this row shows the result of the proposed method.
0:14:07	This table shows a confusion matrix. We can see that the classification accuracy improved, and in particular, the precision of detecting [incorrect] predicted categories improved.
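Reading overall accuracy and the precision for detecting incorrect categories off a confusion matrix is simple arithmetic. A sketch follows, treating "incorrect category" (label 0) as the target class for precision, which is our reading of the talk; the example labels are made up, not the paper's results.

```python
def confusion(gold, pred):
    """Counts with label 1 = predicted category correct, 0 = incorrect."""
    pairs = list(zip(gold, pred))
    tp = sum(g == 1 and p == 1 for g, p in pairs)
    fp = sum(g == 0 and p == 1 for g, p in pairs)
    fn = sum(g == 1 and p == 0 for g, p in pairs)
    tn = sum(g == 0 and p == 0 for g, p in pairs)
    return tp, fp, fn, tn


def accuracy_and_incorrect_precision(gold, pred):
    """Overall accuracy, and precision when 'incorrect category' is the target."""
    tp, fp, fn, tn = confusion(gold, pred)
    accuracy = (tp + tn) / len(gold)
    flagged_incorrect = tn + fn  # items the classifier labelled 0
    precision = tn / flagged_incorrect if flagged_incorrect else 0.0
    return accuracy, precision


print(accuracy_and_incorrect_precision([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```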
0:14:25	The most significant feature was whether u2 includes [inaudible] used in u1, which means that the same topic is shared between u1 and u2.
0:14:42	Another significant feature was whether the user included [inaudible].
0:14:50	This result shows that the proposed features improve the detection of incorrect categories.
0:14:57	Okay, let's move on to the next part, our second proposal.
0:15:04	For this proposal, we integrate the probabilities [from multiple user responses] into one confidence measure to determine, from the user responses, whether the category is correct.
0:15:23	Here we also used logistic regression. We actually tested other regression functions such as random forests [inaudible], but the results showed that logistic regression outperformed them.
0:15:38	We use [inaudible] as the features [inaudible], and the category c is regarded as a correct one for the term w [inaudible].
0:15:47	We compute the measure and compare it with a threshold value; if it exceeds the threshold, the system can add the term to its knowledge base.
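The integration step feeds N single-response probabilities into a second logistic regression and adds the term to the knowledge base only if the resulting confidence exceeds a threshold. The transcript does not make clear which features summarize the N probabilities, so the mean/min/max summary, the weights, and the 0.9 threshold below are hypothetical placeholders, not fitted values from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def summary_features(probs):
    """Hypothetical fixed-length summary of N single-response probabilities."""
    return [sum(probs) / len(probs), min(probs), max(probs)]


def confidence(probs, weights, bias):
    """Second-stage logistic regression over the summary features."""
    z = bias + sum(w * f for w, f in zip(weights, summary_features(probs)))
    return sigmoid(z)


def maybe_acquire(kb, term, category, probs, weights, bias, threshold=0.9):
    """Add term -> category to the knowledge base only above the threshold."""
    score = confidence(probs, weights, bias)
    if score > threshold:
        kb[term] = category
    return score


kb = {}
W, B = [4.0, 2.0, 2.0], -4.0  # made-up weights, not learned from data
maybe_acquire(kb, "carbonara", "Italian", [0.9, 0.9, 0.9], W, B)  # ~0.96, added
maybe_acquire(kb, "natto", "Italian", [0.9, 0.2, 0.3], W, B)      # ~0.52, rejected
print(kb)
```

Consistent responses from several users push the confidence up, while one optimistic response among sceptical ones does not, which mirrors the motivation for pooling responses across dialogues.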
0:16:05	Okay, so this is the experimental condition. We used the same data as I explained before.
0:16:12	We divided them into training and test data, and to make the experiment perfectly open, [we split them by] request [inaudible].
0:16:22	We selected [inaudible] from forty-nine or forty-eight [inaudible] responses [inaudible].
0:16:34	[inaudible] the feature values [inaudible] according to [inaudible].
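The transcript only says the split was made "perfectly open". One common way to obtain that property is a grouped split, so that no confirmation request (or term) contributes responses to both training and test data. The sketch assumes that reading; the grouping key and function name are hypothetical.

```python
import random


def group_split(samples, key, test_ratio=0.5, seed=0):
    """Split samples so that every group (same key) falls entirely on one side."""
    groups = sorted({key(s) for s in samples})
    random.Random(seed).shuffle(groups)
    n_test = max(1, int(len(groups) * test_ratio))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if key(s) not in test_groups]
    test = [s for s in samples if key(s) in test_groups]
    return train, test


# Hypothetical data: (request_id, response_index) pairs, 3 responses per request.
samples = [("req%d" % (i % 4), i) for i in range(12)]
train, test = group_split(samples, key=lambda s: s[0])
print(sorted({s[0] for s in train}), sorted({s[0] for s in test}))
```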
0:16:42	Now I'll show the results.
0:16:46	We address three questions. The first is whether recognition performance is improved by multiple user responses; the second is how many user responses are needed to acquire the categories correctly; and the third is how to set the threshold for the confidence measure.
0:17:04	Well, this is the result for the first question.
0:17:08	We introduced PRBEP, the precision-recall break-even point, which is the point where the precision rate equals the recall rate.
0:17:22	So this is the precision-recall curve, and we can see that the PRBEP values [inaudible] are larger [inaudible].
0:17:43	This means that adding user responses improved the performance of the logistic regression in determining whether the predicted categories are correct.
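PRBEP can be computed from ranked scores without drawing the curve. With R positive items, cutting the ranking at rank R gives precision TP/R and recall TP/R, so the two coincide there. The sketch below uses that rank-based formulation; some papers interpolate the precision-recall curve instead, and which variant the authors used is not stated in the talk.

```python
def prbep(scores, labels):
    """Precision-recall break-even point of a ranking.

    With R positives, cutting the ranking at rank R gives precision TP/R and
    recall TP/R, so the two are equal there.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


print(prbep([0.9, 0.8, 0.7, 0.6, 0.3], [1, 0, 1, 0, 1]))  # 2 of the top 3 are positive
```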
0:17:59	The second question is how many user responses are needed.
0:18:03	So I plotted the increase in the PRBEP value; the horizontal axis is the number of user responses, N.
0:18:15	This graph shows that the increase in the PRBEP value is large when N is small.
0:18:24	This indicates that it is worthwhile to ask multiple users to respond to the same implicit confirmation request.
0:18:36	We can also see that the increase diminishes as N becomes larger, which means there is [little] need to keep asking more users.
0:18:49	The final question is how we can set the threshold.
0:18:52	We think that a high precision rate is required because the system should avoid acquiring incorrect information.
0:19:01	So we set a high threshold so that the precision rate becomes almost one, and we plotted the recall rate in this case.
0:19:11	We can see that the recall rate for N = 5 was 0.21 [inaudible: "one seven five"].
0:19:20	This is not high, but we think it is acceptable because we want to avoid acquiring incorrect information.
0:19:30	We also see that the recall rate increases as N increases, which means that even with a high threshold, the system can acquire more categories while keeping a high precision rate.
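Setting "a high threshold so that the precision rate becomes almost one" and then reading off the recall can be expressed as a search over candidate thresholds. In this sketch an item is accepted (added to the knowledge base) when its score is at or above the threshold. Because lowering the threshold never lowers recall, the lowest threshold that still meets the precision target gives the best recall. The scores and labels are made-up illustrations.

```python
def recall_at_precision(scores, labels, target=1.0):
    """Lowest threshold (among observed scores) whose precision >= target.

    Items with score >= threshold are accepted. Returns (threshold, recall),
    or None if no threshold reaches the target precision.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return None
    best = None
    for t in sorted(set(scores), reverse=True):
        accepted = [label for s, label in zip(scores, labels) if s >= t]
        true_pos = sum(accepted)
        if true_pos / len(accepted) >= target:
            best = (t, true_pos / n_pos)
    return best


print(recall_at_precision([0.95, 0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1, 0]))
```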
0:19:44	Okay, so let me summarize this talk.
0:19:47	Our ultimate goal is to enable a dialogue system that [acquires knowledge] through dialogue.
0:19:56	The problem tackled in this paper is to determine, from user responses to implicit confirmation requests, whether the predicted category is correct or not.
0:20:04	We proposed two methods: designing a feature set, and integrating the probabilities into one confidence measure. The results showed that performance was improved.
0:20:19	Our future work is as follows. First, [inaudible] to compare implicit and explicit confirmation requests.
0:20:28	We assume that implicit confirmation is better from the viewpoint of user experience, but we need to verify this.
0:20:38	The second is to incorporate the proposed method into a prototype system.
0:20:43	Thank you for your attention.
0:20:51	Okay, so we have time for [inaudible] questions.
0:21:04	[Audience question, largely inaudible]
0:21:10	In future work, [inaudible]
0:21:12	[inaudible]
0:22:18	Yes, thank you for your comment. I think if we want to do that, we need to carefully design the experiment, and we need to compare [inaudible] explicit and [implicit confirmation] [inaudible].
0:22:47	A question. One way to do this is to have the system say "[inaudible]," not [ask a question].
0:23:01	That gives the user a chance to say "No, stop," or to do nothing, so it is not as intrusive and there is no problem.
0:23:11	But sometimes it is [not] very clear [when you can] make this assumption.
0:23:15	So, one point is [inaudible] this method in that sense.
0:23:22	[inaudible] to keep the user enjoying the conversation. As you said, we think that this kind of repeated [confirmation] is very annoying.
0:23:36	We want to keep the conversation going, and to do that, we are introducing implicit confirmation.
0:23:49	I mean, it is not a question, it is just a statement, so if the user doesn't respond, [the topic] is simply dropped.
0:23:59	Is most of your research on [inaudible]? There are lots of [inaudible] target restaurants.
0:24:20	So, what about combining implicit and explicit confirmation? Intuitively, if you have high-confidence understanding, you could use implicit confirmation, and with very low confidence, maybe explicit confirmation.
0:24:38	That seems a little more natural, and it may get around some of the [inaudible] issues.
0:24:43	[On] the first question, you could also [inaudible] these things so as not to annoy [the user], for example by keeping a threshold on how many questions are allowed to be asked, using those kinds of techniques.
0:24:55	So I [agree that] we need to [consider] various dialogue strategies [inaudible].
0:25:02	We just think that repeating this kind of expression [inaudible] is very annoying.
0:25:07	But we need to [inaudible] explicit confirmation and implicit confirmation [inaudible] to control the confirmation process.
0:25:27	Okay, so it's just about time, so let's thank the speaker.

Lexical Acquisition through Implicit Confirmations over Multiple Dialogues

Second WOCHAT Special Session on Chatbots and Conversational Agents (WOCHAT-SS)

Kohei Ono, Ryu Takeda, Eric Nichols, Mikio Nakano and Kazunori Komatani