0:00:14 Thank you. My name is [inaudible], and I am going to give this talk with this title.

0:00:35 The overall goal of our research is to build interview dialogue systems that users are willing to use. Why do we focus on interview dialogue systems? Because they can be used for collecting information from humans, and they can organize that information. Also, users are expected to disclose personal information to such systems more easily than to human interviewers.
0:01:03 So interview dialogue systems have commercial potential. Why do we focus on systems that users are willing to use? Because some applications need to be used repeatedly. For example, systems for diet recording and health care need to be used repeatedly to be useful.
0:01:34 There are a couple of previous systems. Interview dialogue systems have not become popular applications, but there are a couple of existing systems, such as survey systems, that have been useful. However, such systems focus mainly on obtaining as much information as possible from users, and we are not sure people are willing to use them repeatedly.
0:02:21 Our approach is to add small talk to the interview dialogue system, because small talk allows users to enjoy the conversation. There are also a couple of earlier studies showing that small talk increases users' rapport and engagement, and some studies suggest that it increases trust.
0:02:56 There are two possible approaches to integrating small talk into interview dialogues. The first is to treat small talk as the primary strategy and sometimes embed interview questions; the second is to treat the interview as the primary strategy and sometimes insert small talk. The first approach is closer to human conversations, but it might generate inappropriate utterances, because with current technology chat utterances cannot be generated with good accuracy. The second approach is not as natural, but it has the advantage that the system can go back to the interview even if the small talk goes wrong. So we chose the second approach.
0:03:46 We implemented a system based on our approach. It is a Japanese text-based interview dialogue system for diet recording. It asks the user what he or she ate the day before, like this. After the system's question and the user's reply, the system makes a small talk utterance, and then the interview continues.

0:04:19 The objective of this system is to obtain concrete information on what the user ate: what he or she had, how much, and when.
0:04:48 This is the architecture, and I will explain each module. The first step is analysis of the input: user utterances are analyzed with a Japanese morphological analyzer, since Japanese text is not segmented into words. The system also has a food dictionary with several hundred entries, and part of its job is finding the food groups corresponding to the food names the user mentions, which I will explain later.
0:05:33 The language understanding module performs utterance type classification and semantic content extraction. Utterance type classification classifies the user utterance into three types, such as informative answers, negative answers, and others. The number of utterance types is small because the interview dialogue is relatively simple. We trained the classifier on annotated example utterances, and we use logistic regression with bag-of-words features for utterance type classification.
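A rough sketch of the utterance type classifier just described, as it might look in code. The label set, vocabulary, and training examples below are invented for illustration (the talk does not give them), and plain gradient descent on a softmax model stands in for whatever training setup was actually used:

```python
import math

# Toy training data: (tokens, utterance type). Labels and examples are
# invented stand-ins; the talk does not specify the actual type inventory.
DATA = [
    (["i", "had", "curry", "for", "lunch"], "inform"),
    (["i", "ate", "an", "apple"], "inform"),
    (["no", "nothing", "else"], "negative"),
    (["no", "i", "did", "not"], "negative"),
    (["well", "let", "me", "think"], "other"),
    (["hmm", "maybe"], "other"),
]
LABELS = sorted({y for _, y in DATA})
VOCAB = sorted({w for x, _ in DATA for w in x})

def features(tokens):
    """Bag-of-words counts plus a bias term."""
    v = [0.0] * (len(VOCAB) + 1)
    v[-1] = 1.0  # bias
    for w in tokens:
        if w in VOCAB:
            v[VOCAB.index(w)] += 1.0
    return v

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

W = {y: [0.0] * (len(VOCAB) + 1) for y in LABELS}  # one weight vector per type

def predict_probs(tokens):
    x = features(tokens)
    zs = [sum(wi * xi for wi, xi in zip(W[y], x)) for y in LABELS]
    return dict(zip(LABELS, softmax(zs)))

# Multinomial logistic regression trained by plain gradient descent.
for _ in range(200):
    for tokens, gold in DATA:
        x = features(tokens)
        ps = predict_probs(tokens)
        for y in LABELS:
            err = ps[y] - (1.0 if y == gold else 0.0)
            W[y] = [wi - 0.1 * err * xi for wi, xi in zip(W[y], x)]

def classify(tokens):
    ps = predict_probs(tokens)
    return max(ps, key=ps.get)

print(classify(["i", "had", "an", "apple"]))
print(classify(["no", "nothing"]))
```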
0:06:15 Semantic content extraction extracts five kinds of information, including food and drink names, eating time, and amount. We use a rule-based matching method and dictionary lookup. The training data was collected by crowdsourcing and consists of about 5,600 utterances.
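The extraction step might be sketched like this; the dictionary entries, group names, and regular-expression patterns below are invented for illustration, standing in for the system's actual rules and food dictionary:

```python
import re

# Toy food dictionary mapping surface forms to food groups; the entries
# and group names are invented for illustration.
FOOD_DICT = {"curry": "dish", "apple": "fruit", "coffee": "drink"}

# Very rough patterns for amount and eating time; the real system's rules
# are not given in the talk, so these are stand-ins.
AMOUNT_RE = re.compile(r"\b(\d+|one|two|three|a cup of|a bowl of)\b")
TIME_RE = re.compile(r"\b(breakfast|lunch|dinner|morning|noon|evening)\b")

def extract(utterance):
    """Extract food names, amount, and eating time from one utterance."""
    text = utterance.lower()
    foods = [w for w in FOOD_DICT if re.search(r"\b%s\b" % w, text)]
    amount = AMOUNT_RE.search(text)
    time = TIME_RE.search(text)
    return {
        "foods": foods,
        "amount": amount.group(0) if amount else None,
        "time": time.group(0) if time else None,
    }

print(extract("I had a bowl of curry and an apple for lunch"))
```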
0:06:45 Now let me explain the dialogue management module. It employs frame-based dialogue management, and the principle works like this. Let us assume that the system asks a question like this, and the user utterance is understood, with its type and content identified by the language understanding module. When the type is informative, the content is extracted like this, and the dialogue manager fills the frame slots with the extracted content, with one group of slots for each food item the user mentioned. Based on this frame, the next system utterance is selected, for example asking about the slots that are not yet filled.
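Frame-based management of this kind can be sketched as follows; the slot names and question templates are hypothetical, since the talk does not give the actual frame design:

```python
# A minimal frame-based dialogue manager sketch. Slot names and question
# templates are illustrative, not the system's actual design.
EMPTY = None

def new_frame():
    return {"food": EMPTY, "amount": EMPTY, "time": EMPTY}

# Question templates for unfilled slots, asked in a fixed priority order.
QUESTIONS = [
    ("food", "What did you have yesterday?"),
    ("time", "When did you have {food}?"),
    ("amount", "How much {food} did you have?"),
]

def update(frame, understood):
    """Fill frame slots with the content of an understood user utterance."""
    for slot, value in understood.items():
        if slot in frame and value is not None:
            frame[slot] = value
    return frame

def next_question(frame):
    """Pick the next interview question based on the unfilled slots."""
    for slot, template in QUESTIONS:
        if frame[slot] is EMPTY:
            return template.format(food=frame["food"] or "it")
    return None  # frame complete: move on to the next food item

frame = new_frame()
print(next_question(frame))
update(frame, {"food": "curry"})
print(next_question(frame))
```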
0:07:43 Next, food group estimation. For each food name extracted from a user utterance, the system needs to know its food group, because it needs to know which slot of the frame the name should fill. The system generates its utterances while identifying food groups using the food dictionary, which works when the name is in the dictionary, like this. When the name is not in the dictionary, the system estimates the food group using the characters of the name as features, with logistic regression, and generates a confirmation question like this. Whether to confirm is determined based on the estimated probability, but I omit the detailed explanation for that.
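A minimal sketch of character-feature food group estimation with a confirmation question, assuming (as the talk suggests) logistic regression over the characters of the name; the training names, the single fruit/non-fruit distinction, and the decision rule are all invented for illustration:

```python
import math

# Toy binary classifier: is an unknown food name a fruit? Features are
# character bigrams of the name. The training names below are invented,
# standing in for dictionary entries used as training data.
FRUIT = ["apple", "pineapple", "grape", "grapefruit"]
NOT_FRUIT = ["noodle", "pizza", "cheese", "sausage"]

def bigrams(name):
    return {name[i:i + 2] for i in range(len(name) - 1)}

VOCAB = sorted(set().union(*(bigrams(n) for n in FRUIT + NOT_FRUIT)))

def vec(name):
    bs = bigrams(name)
    return [1.0 if b in bs else 0.0 for b in VOCAB] + [1.0]  # + bias

W = [0.0] * (len(VOCAB) + 1)

def prob_fruit(name):
    z = sum(w * x for w, x in zip(W, vec(name)))
    return 1.0 / (1.0 + math.exp(-z))

# Binary logistic regression trained by plain gradient descent.
for _ in range(300):
    for name, y in [(n, 1.0) for n in FRUIT] + [(n, 0.0) for n in NOT_FRUIT]:
        err = prob_fruit(name) - y
        W = [w - 0.5 * err * x for w, x in zip(W, vec(name))]

def confirmation(name):
    """Generate a confirmation question from the estimated food group."""
    group = "fruit" if prob_fruit(name) >= 0.5 else "something other than fruit"
    return "Is %s a kind of %s?" % (name, group)

print(confirmation("grape"))  # a fruit-like name: should lean toward fruit
```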
0:08:54 Next, small talk utterance generation, which works like this. Candidates for the system's small talk utterances are selected from a predefined set of about four hundred utterances, based on the type and content of the preceding user utterance. For example, when the user utterance type is negative, a small talk utterance like this is selected. The four hundred utterances were created manually. There are also several patterns of small talk utterances; for example, "This is my favorite food" is an example of showing empathy, and "Do you like it?" is an example of asking a question.
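The selection step might look roughly like this; the candidate utterances and the keying by utterance type and food group are illustrative guesses at how a predefined list of about four hundred utterances could be indexed:

```python
import random

# Sketch of small talk selection from a predefined candidate list.
# The candidates and the keying scheme are illustrative; the real system
# has about four hundred such utterances.
CANDIDATES = {
    ("inform", "fruit"): [
        "{food} is my favorite fruit!",   # showing empathy
        "Do you often eat {food}?",       # asking a question
    ],
    ("inform", None): [
        "That sounds delicious.",
        "Do you like {food}?",
    ],
    ("negative", None): [
        "I see, sorry to press you.",
    ],
}

def small_talk(utt_type, food=None, group=None, rng=random):
    """Pick one small talk utterance for the preceding user utterance."""
    key = (utt_type, group) if (utt_type, group) in CANDIDATES else (utt_type, None)
    pool = CANDIDATES.get(key)
    if not pool:
        return "I see."  # fallback
    return rng.choice(pool).format(food=food or "that")

print(small_talk("inform", food="apple", group="fruit"))
```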
0:09:57 Finally, let me explain the dialogue control strategy, which selects one utterance at each turn from either the interview module or the small talk module. Here we employ a very simple strategy: the number of small talk utterances per exchange is fixed to N. After each user reply, the system makes N small talk utterances. So an exchange consists of an interview question, the user's reply that provides the information, and N small talk utterances randomly chosen from the candidates.
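The control strategy just described, an interview question followed by a fixed number N of small talk utterances per exchange, can be sketched as:

```python
# Sketch of the dialogue control strategy: after each user reply to an
# interview question, the system emits a fixed number of small talk
# utterances before the next interview question. Texts are placeholders.
def run_dialogue(questions, small_talk_per_exchange, get_small_talk):
    """Yield system utterances, interleaving interview and small talk."""
    for q in questions:
        yield ("interview", q)
        # (the user's reply would be read and understood here)
        for _ in range(small_talk_per_exchange):
            yield ("small_talk", get_small_talk())

turns = list(run_dialogue(
    ["What did you have yesterday?", "How much did you have?"],
    small_talk_per_exchange=1,
    get_small_talk=lambda: "That sounds delicious.",
))
for kind, text in turns:
    print(kind, "|", text)
```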
0:10:49 We conducted a user study to investigate the effectiveness of the small talk utterances. We compared three conditions. The first is the No-SU condition, meaning no small talk utterances: the number of small talk utterances in each exchange is zero. This is the baseline. We also compared the 1-SU condition and the 3-SU condition, where the number of small talk utterances in each exchange is one or three, respectively.

0:11:39 We recruited one hundred participants by crowdsourcing. Each participant was asked to engage in a dialogue with the system in each of the three conditions, with the order of the conditions counterbalanced. Afterwards, they were asked to evaluate each system by rating it on five-point scales. The number of exchanges in each dialogue was limited to avoid too long a conversation.
0:12:30 We initially collected data from one hundred participants, but we found that a portion of the dialogues were problematic, for example because the participant did not engage seriously or because of system errors. So we excluded those and used the data of the remaining participants, as shown here.
0:13:07 These are the language understanding results on the collected data. The utterance type classification accuracy was around 90%, which is not bad, and the semantic content extraction accuracy was also reasonably high. The food group estimation accuracy is shown here as well, and it is also not bad.
0:13:47 These are examples of collected dialogues under the No-SU condition, the 1-SU condition, and the 3-SU condition. In the 1-SU condition, one small talk utterance is made in each exchange, as shown here, and in the 3-SU condition the dialogues are longer because three small talk utterances are made in each exchange. The user inputs are shown as well.
0:14:35 This graph shows the evaluation results. These bars show the scores for the No-SU condition, the blue bars show the scores for the 1-SU condition, and the green bars show the scores for the 3-SU condition. For simplicity, No-SU was the best, as it received the highest score there. We found that 1-SU outperformed No-SU in items such as naturalness, willingness to talk again, and friendliness. We also found that 3-SU is not as good as 1-SU: 1-SU outperformed 3-SU in naturalness, willingness to talk again, and friendliness, although there was no statistically significant difference for willingness to talk again and friendliness; on average, though, 3-SU was worse than 1-SU.
0:16:19 Let me discuss why 3-SU was not as good as 1-SU. This is probably because increasing the number of small talk utterances increases the possibility of generating inappropriate small talk utterances; it is hard to keep generating appropriate small talk. To verify this, we analyzed the dialogues of the 3-SU condition with human annotation, and we found that about eighty percent of the first small talk utterances in each exchange were appropriate, but only twenty-eight percent of the triples of small talk utterances were entirely appropriate. That is probably why 3-SU did not give a good impression.
0:17:50 Now let me conclude this talk. In this research, we proposed adding small talk utterances to improve user impressions of interview dialogue systems, and we reported a user study using a Japanese text-based interview dialogue system for diet recording. It showed that adding small talk utterances improves users' impressions of the system. It also showed that adding too many small talk utterances makes the users' impressions worse, because it increases the possibility of generating inappropriate utterances.

0:18:45 As for future work, one remaining issue is how adding small talk utterances affects repeated use of the system. Our goal is to develop systems that users are willing to use repeatedly, but in the user study reported in this paper, each participant used the system only once. So we need to investigate this issue in another study. Another piece of future work is to improve the method for small talk generation. In this work, the number of small talk utterances per exchange was fixed, but we think it is important to determine the number of small talk utterances dynamically, depending on whether appropriate ones can be generated. We are currently working on improving small talk utterance generation. Thank you very much.
0:21:04 [Audience question, largely inaudible, about how the small talk utterances are generated.]

0:21:10 Well, the system selects small talk utterances from the predefined list, and currently we are using very simple rules to do it, for example selecting an utterance of a certain kind when the user's reply is a negative answer. Of course this is too simple, and we need to work on a corpus-based method to generate appropriate small talk utterances. We are trying to use various features, not only the words but also the types of the utterances and the dialogue history. And if we have a large enough amount of data, maybe we can use deep learning or neural-network-based methods to generate the most appropriate utterances based on the dialogue context.
0:23:38 So you have the statistics showing how the frequency of acceptable small talk remarks decreased as you had second and third remarks, and that seemed like a possible explanation for why people prefer the one with one versus three utterances. But I am wondering if you have the possibility to look at just the subset of cases that had more than one acceptable remark, and to see whether that had a different behavior from the overall set of three small talk utterances.
0:24:23 You mean, what happens when all three small talk utterances are actually appropriate? Well, we have not examined that yet. Probably we need to look at that and divide the data accordingly. Sorry, but the dialogues were each quite long, and there are many small talk utterances in each one; in some of them the small talk worked well, and in some it did not.
0:25:28 But this might be a good possibility for a following experiment, specifically looking at good versus not-so-good small talk.

0:25:35 I think so. It would be good to know how the user feels about each small talk utterance, for example by asking users to rate each one; that would be another approach.