Speech Transcript - Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction

0:00:15	so my name is guillaume dubuisson duplessis and i'm currently a
0:00:19	postdoctoral researcher
0:00:22	and i'm going to present this work with chloé clavel and frédéric landragin
0:00:29	and
0:00:30	first let me say a few words about the context of this work
0:00:34	so this work is part of the european project
0:00:39	aria-valuspa
0:00:40	which aims at designing artificial retrieval of information assistants
0:00:46	and these assistants take the form of virtual agents
0:00:51	that are able to engage in a multimodal interaction
0:00:57	involving verbal and nonverbal behavior
0:01:01	these agents also aim at adapting to the user
0:01:07	and adapting for instance to unexpected situations such as interruptions
0:01:12	as well as to the social and emotional state of the user
0:01:17	and
0:01:18	in this project we are interested in convergence and verbal alignment
0:01:25	as shown by the communication accommodation theory
0:01:29	convergence of behaviour is a very important feature of human-human interaction
0:01:36	that occurs both at low levels such as posture accent speech rate and at
0:01:43	high levels such as the mental emotional and cognitive levels
0:01:48	and in particular
0:01:52	in human-human interaction
0:01:53	the participants
0:01:56	align at many linguistic levels such as the lexical syntactic and
0:02:01	semantic ones
0:02:05	and one consequence of successful alignment in dialogue is a certain
0:02:12	repetitiveness
0:02:13	and
0:02:16	as a consequence there is going to be
0:02:20	a set of dialogue routines that are going to emerge between the dialogue participants
0:02:27	under the form of lexical items for instance
0:02:31	so on the slide you can see two examples of dialogue which represent the
0:02:35	same phase the introduction phase of a negotiation
0:02:40	and in these
0:02:43	in these examples
0:02:45	the dialogue patterns
0:02:47	are coloured and these patterns are the main focus of this work
0:02:52	so on the left you can see that there are very few patterns
0:02:56	in this case we say that the verbal alignment is very low on the contrary
0:03:01	on the right example you can see that
0:03:03	the participants align on many dialogue routines
0:03:09	such as nice to meet you how are you good
0:03:13	in this case we are going to say that the verbal alignment is higher
0:03:18	so the main focus of this work is to propose measures of verbal alignment
0:03:22	based on this data which
0:03:27	so why think about alignment for human-machine interaction so first
0:03:32	we can see from human-human interaction that this is a subconscious phenomenon that naturally
0:03:38	appears and it has been shown by previous work
0:03:42	that speakers reuse lexical as well as syntactic structures from previous utterances
0:03:50	on top of that
0:03:53	verbal and temporal alignment may facilitate successful task-oriented conversations
0:03:59	however in human-machine interaction
0:04:02	it has been shown that linguistic alignment occurs
0:04:06	and in particular users adopt the lexical items and syntactic structures from the system
0:04:12	but this is only one way
0:04:15	in most systems the user aligns with the system but the system is not able to
0:04:20	align
0:04:23	so in this work our goal is to provide a virtual agent with the ability
0:04:27	to detect the alignment behavior of its human interlocutor
0:04:32	and to align or not depending on the strategy with the user
0:04:37	so the main motivation
0:04:39	for using verbal alignment for an agent
0:04:45	is that it provides a natural source of variation in dialogue and in particular for the
0:04:51	natural language generation
0:04:53	it also makes it possible to take into account the socio-emotional behavior of the
0:04:58	user and works
0:05:00	as a social glue
0:05:02	and
0:05:03	it's also a way of adapting without the need of an extensive user profile
0:05:10	and what we expect from
0:05:13	providing an agent with the ability to verbally align is to enhance the
0:05:19	agent's believability likability and friendliness to improve
0:05:24	interaction naturalness as well as to maintain and foster user engagement
0:05:30	finally we aim at improving collaboration in task-oriented dialogue
0:05:36	so
0:05:37	in this work our approach is to provide measures characterizing verbal alignment
0:05:45	that are going to be based on the transcript of the dialogue and on the shared
0:05:49	expressions at the lexical level
0:05:52	and our proposition stands on
0:05:55	three main parts
0:05:57	the first one is to extract
0:06:00	the dialogue routines that is to say the shared expressions from the dialogue transcripts
0:06:05	the second part is to build an expression lexicon from these shared expressions
0:06:11	that keeps track of the expressions and some features of these expressions
0:06:17	and then deriving measures of verbal alignment from the dialogue transcript and the
0:06:23	expression lexicon
0:06:25	let me say a few words about the automatic building of the expression lexicon
0:06:29	so in this work we provide a model where we define
0:06:33	a shared expression as a surface text
0:06:37	pattern at the utterance level that has been produced by both speakers in dialogue
0:06:42	so for instance you can see
0:06:45	an example of dialogue
0:06:47	on the left of the slide and in the middle
0:06:50	there is a shared expression that's not gonna work for me
0:06:55	which is used to reject a proposition in a negotiation dialogue and that is used
0:06:59	by interlocutor a
0:07:02	in one turn and by interlocutor b in a later
0:07:08	turn
0:07:09	so this shared expression is part of the expression lexicon
0:07:14	and has been initiated by a
0:07:18	and so in this paper we present a framework of expressions that maybe and but
0:07:24	the or not
0:07:26	and we also provide
0:07:28	a way of automatically extracting these shared expressions to build the expression lexicon
0:07:34	automatically
0:07:36	so this is an instance of sequential pattern mining
0:07:42	and it involves the use of bioinformatics algorithms that are usually used to
0:07:49	mine dna sequences
0:07:52	so in short
0:07:54	it involves resolving the multiple common subsequence problem with the
0:08:00	generalized suffix tree data structure
0:08:03	and through this
0:08:05	use of sequential pattern mining we can build from the transcript of the dialogue the
0:08:10	expression lexicon
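The extraction step described above can be illustrated with a small sketch. This is not the method from the talk: the speaker mentions a generalized suffix tree, whereas the sketch below simply enumerates every contiguous token sequence, which is only workable for short dialogues. The function names (`ngrams`, `is_inside`, `build_expression_lexicon`), whitespace tokenisation, and the choice to keep only maximal shared sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(tokens):
    """Yield every contiguous token sequence of an utterance as a tuple."""
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            yield tuple(tokens[start:end])


def is_inside(small, big):
    """True if `small` occurs contiguously inside the strictly longer `big`."""
    n = len(small)
    return len(big) > n and any(
        big[i:i + n] == small for i in range(len(big) - n + 1)
    )


def build_expression_lexicon(dialogue):
    """Build a toy expression lexicon.

    `dialogue` is a list of (speaker, utterance) pairs in turn order.
    A sequence counts as shared when both speakers produced it; the
    initiator is whoever produced it first (hypothetical simplification).
    """
    first_speaker = {}               # expression -> speaker who used it first
    speakers = defaultdict(set)      # expression -> speakers who produced it
    frequency = defaultdict(int)     # expression -> number of utterances containing it
    for speaker, utterance in dialogue:
        for expr in set(ngrams(utterance.lower().split())):
            first_speaker.setdefault(expr, speaker)
            speakers[expr].add(speaker)
            frequency[expr] += 1
    shared = [e for e in speakers if len(speakers[e]) >= 2]
    # Assumption: drop sequences that are part of a longer shared sequence.
    maximal = [e for e in shared if not any(is_inside(e, o) for o in shared)]
    return {
        " ".join(e): {"initiator": first_speaker[e], "frequency": frequency[e]}
        for e in maximal
    }


dialogue = [
    ("A", "that's not gonna work for me"),
    ("B", "how about two"),
    ("A", "no"),
    ("B", "that's not gonna work for me either"),
]
lexicon = build_expression_lexicon(dialogue)
```

On this toy dialogue the lexicon holds a single entry, the rejection formula from the slide, initiated by A. A suffix-tree implementation would find the same shared sequences without the quadratic enumeration per utterance.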
0:08:13	then from the dialogue transcript and the expression lexicon we derive some straightforward
0:08:19	measures
0:08:21	to characterize verbal alignment
0:08:23	so the first measures are global to a single dialogue
0:08:29	namely the expression lexicon size that is the number of unique
0:08:33	shared expressions established between dialogue participants
0:08:37	and the expression variety which is the expression lexicon size normalized by
0:08:43	the length of the dialogue given as the total
0:08:47	number of tokens in the dialogue
0:08:50	we also derive
0:08:53	measures that are specific to the speakers
0:08:57	first the expression repetition measure
0:09:01	which
0:09:04	gives the amount of tokens dedicated
0:09:09	to the repetition of an expression by the user
0:09:13	over the total amount of tokens
0:09:15	and the initiated expression ratio which determines for a given speaker the number
0:09:23	of expressions that have been initiated by him
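The four measures just listed can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact definitions from the paper: the keys `ELS`, `EV`, `ER` and `IE` are shorthand chosen here, the lexicon is assumed to be a plain mapping from expression to initiating speaker, tokens are split on whitespace, the initiated expression ratio is assumed to be normalised by the lexicon size, and every token covered by a lexicon expression is counted as repetition (including the initiator's first use).

```python
from collections import defaultdict


def covered_positions(tokens, expressions):
    """Return the token positions covered by any occurrence of an expression."""
    covered = set()
    for expr in expressions:
        n = len(expr)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == expr:
                covered.update(range(i, i + n))
    return covered


def alignment_measures(dialogue, lexicon):
    """Compute toy versions of the measures described in the talk.

    `dialogue`: list of (speaker, utterance) pairs.
    `lexicon`: dict mapping expression string -> initiating speaker (assumed shape).
    """
    expressions = [tuple(e.split()) for e in lexicon]
    total = defaultdict(int)      # tokens produced per speaker
    repeated = defaultdict(int)   # tokens per speaker that belong to a shared expression
    for speaker, utterance in dialogue:
        tokens = utterance.lower().split()
        total[speaker] += len(tokens)
        repeated[speaker] += len(covered_positions(tokens, expressions))

    size = len(lexicon)
    n_tokens = sum(total.values())
    return {
        # expression lexicon size: number of unique shared expressions
        "ELS": size,
        # expression variety: lexicon size normalised by dialogue length in tokens
        "EV": size / n_tokens if n_tokens else 0.0,
        # expression repetition: share of a speaker's tokens inside shared expressions
        "ER": {s: repeated[s] / total[s] for s in total if total[s]},
        # initiated expression ratio: share of the lexicon initiated by each speaker
        "IE": {
            s: sum(1 for who in lexicon.values() if who == s) / size
            for s in total
        } if size else {},
    }


dialogue = [
    ("A", "that's not gonna work for me"),
    ("B", "how about two"),
    ("A", "no"),
    ("B", "that's not gonna work for me either"),
]
lexicon = {"that's not gonna work for me": "A"}
measures = alignment_measures(dialogue, lexicon)
```

Comparing `ER` and `IE` across the two speakers is what exposes symmetry or asymmetry in alignment, which is how the measures are used later in the talk.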
0:09:31	so to study the proposed framework we present in this paper a corpus-based contrastive
0:09:39	study
0:09:40	that stands on real interaction corpora involving human-human and human-agent
0:09:46	dialogues
0:09:47	as well as artificial corpora which
0:09:52	are used as a baseline
0:09:54	and in this work we provide several studies comparing
0:09:59	the real interaction corpora to our baseline
0:10:02	comparing verbal alignment in human-human corpora and human-agent corpora and also studying
0:10:07	some conditions on the human-agent corpus such as the negotiation type
0:10:13	so let me say a few words about
0:10:15	the negotiation corpora that we are using in this work
0:10:21	so this negotiation corpora
0:10:26	involve two participants that are required to find an agreement
0:10:32	over the amount of
0:10:36	items they have to share
0:10:39	and this negotiation task can be integrative that is to say that it can
0:10:45	turn out to be a win-win for both participants
0:10:48	or competitive
0:10:51	and
0:10:54	these corpora are available in the human-human setting and in the human-agent setting
0:11:01	you can see on the slide an image from the human-agent corpus
0:11:08	in the human-agent setting
0:11:11	the agent is controlled by a wizard of oz system
0:11:15	that has been designed to be as natural as possible
0:11:19	and this woz system involves more than eleven thousand possible utterances so
0:11:26	the agent has a wide variety of utterances to express itself
0:11:35	the human-human corpus involves eighty four dialogues while the human-agent corpus
0:11:41	involves one hundred and fifty four dialogues
0:11:46	from these corpora we constructed our baseline the shuffled
0:11:52	corpora
0:11:53	which have been designed to break the dynamics of the interactive alignment process
0:12:00	and to do that we decided to break the coupling between utterances
0:12:04	so starting from a real interaction dialogue
0:12:08	what we have done is that we have kept
0:12:12	all the utterances from a speaker
0:12:15	while substituting all the utterances from the other speaker
0:12:20	by utterances which were chosen from one pool
0:12:26	sorry from several pools
0:12:30	the utterances are chosen randomly
0:12:32	and the pools are specific
0:12:37	for the human participant
0:12:39	the human participant facing an agent and for the agent
0:12:44	system
0:12:45	so on the slide you can see an example of real dialogue from the corpus
0:12:50	on the left
0:12:52	and one randomized version where all the utterances from the human participant have been
0:13:00	substituted by randomly chosen ones
0:13:03	so the main idea of these corpora is to break the dynamics of the interactive alignment
0:13:10	process
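The construction of the shuffled baseline can be sketched in a few lines. This is an illustrative assumption about the procedure as described in the talk: one speaker's utterances are kept, and each utterance of the other speaker is replaced by one drawn at random from a pool of utterances of the same speaker type. The function name `shuffle_dialogue`, the `seed` parameter, and the example pool are hypothetical.

```python
import random


def shuffle_dialogue(dialogue, kept_speaker, pool, seed=None):
    """Return a copy of `dialogue` where only `kept_speaker` keeps their utterances.

    Every other utterance is replaced by a random draw from `pool`, which is
    assumed to hold utterances produced by the same type of speaker elsewhere
    in the corpus.
    """
    rng = random.Random(seed)  # seeded generator so a baseline can be reproduced
    return [
        (speaker, utterance if speaker == kept_speaker else rng.choice(pool))
        for speaker, utterance in dialogue
    ]


real = [
    ("agent", "nice to meet you"),
    ("human", "nice to meet you too"),
    ("agent", "how are you"),
    ("human", "good how are you"),
]
human_pool = ["i want the lamp", "that works for me", "what about the records"]
shuffled = shuffle_dialogue(real, kept_speaker="agent", pool=human_pool, seed=0)
```

Because the replaced utterances were never produced in response to the kept ones, any shared expression found in the shuffled dialogue is incidental, which is what makes it usable as a baseline.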
0:13:14	so the first hypothesis that we are investigating in this
0:13:20	work
0:13:21	is that the dialogue participants should constitute a richer expression lexicon
0:13:27	in the real interaction corpora than what would happen incidentally in the shuffled corpora
0:13:35	the artificial ones
0:13:37	and so to investigate this hypothesis we looked at the expression variety
0:13:44	measure from our model
0:13:47	and
0:13:48	what we found
0:13:50	is that there is a significant difference between the human-human
0:13:56	corpus and its artificial counterpart as well as for the human-agent
0:14:00	corpus and its
0:14:03	artificial counterpart
0:14:06	in the sense that the expression variety is higher in the real interaction
0:14:11	corpora than in the shuffled ones
0:14:15	so what we have observed is that we have provided
0:14:21	some arguments to confirm this hypothesis in the sense that
0:14:27	we have observed a richer expression lexicon in the real interaction corpora than
0:14:32	in the artificial ones
0:14:34	which have been designed to avoid
0:14:39	the interactive alignment process and thus the constitution of an expression lexicon
0:14:47	then we have been interested in the comparison of verbal alignment through the
0:14:54	measures that we propose
0:14:57	between the human-human corpus and the human-agent corpus
0:15:03	so here what we expected is more verbal alignment from the human
0:15:10	in the human-agent interaction
0:15:15	than from the agent the main reason is that
0:15:18	the agent even if it is a woz has not been designed
0:15:23	to be able to align
0:15:24	and the second reason is that
0:15:27	the human participant may be influenced by their belief about the limitations of the communicative
0:15:32	capabilities of the agent
0:15:35	so to test this hypothesis we looked at the initiated expression ratio
0:15:42	that we propose in our model as well as the expression repetition ratio
0:15:49	and in the human-human interaction
0:15:53	it turns out that there are no differences between the two speakers in that
0:15:57	there is a symmetrical verbal alignment
0:16:01	regarding these two measures
0:16:04	both dialogue participants initiate
0:16:06	approximately the same amount of expressions
0:16:10	and they also repeat the same amount of
0:16:14	expressions
0:16:16	however
0:16:17	this is not the case in the human-agent setting
0:16:21	and we observe here
0:16:25	an asymmetry
0:16:28	so
0:16:29	this asymmetry
0:16:32	is
0:16:35	this asymmetry
0:16:38	can be summarized by the fact that
0:16:42	the human participants adopt more woz-initiated expressions
0:16:48	which is not surprising because the woz cannot
0:16:51	easily adopt the human participant's expressions the human participants also dedicate
0:16:57	more tokens to the repetition of expressions
0:17:01	so here
0:17:03	this gives some
0:17:07	arguments to say that the human participant
0:17:10	is influenced by their belief about the limitations
0:17:14	of the communicative capabilities of the agent
0:17:17	and it should be stressed that this asymmetry does not appear
0:17:23	when considering the number of tokens produced by each speaker or when considering the
0:17:27	change proportion
0:17:29	is the proportion of vocabulary
0:17:35	finally we looked at some conditions on the human-agent corpus and
0:17:42	we have mainly focused on the negotiation type
0:17:47	we wanted to see if there was an impact
0:17:50	on the verbal alignment indicators
0:17:53	given the type of negotiation so integrative negotiation which turns out to be a
0:17:59	win-win and distributive
0:18:03	negotiation
0:18:04	which is a competitive one
0:18:06	and what we found is that
0:18:08	both negotiation types have a similar amount that is a similar value for
0:18:16	the expression variety
0:18:18	that is to say that
0:18:20	the same amount of expressions
0:18:22	is created in both dialogues but there is a clear difference in the
0:18:28	expression repetition ratio
0:18:30	which shows that
0:18:32	in the competitive negotiation
0:18:36	dialogue participants
0:18:39	repeat
0:18:40	more and verbally align more
0:18:42	than in win-win negotiation
0:18:48	so
0:18:51	what we provide here is arguments about the fact that
0:18:59	competitive negotiation
0:19:01	leads to more verbal alignment and one hypothesis is that
0:19:07	the participants need to verbally align more on counter propositions
0:19:14	so to conclude in this work we have proposed automatic and generic measures
0:19:20	of verbal alignment based on sequential pattern mining at the level of surface
0:19:25	text utterances
0:19:26	that make it possible to characterize
0:19:30	interesting aspects of verbal alignment such as the routinization process
0:19:35	the degree of repetition between dialogue participants and the orientation of verbal
0:19:39	alignment
0:19:41	we have contrasted human-human and human-agent verbal
0:19:47	alignment showing that there is a symmetry in verbal alignment
0:19:53	regarding our indicators
0:19:57	in human-human interaction while there is an asymmetry in human-agent interaction
0:20:02	and thus we wanted to empirically confirm some hypotheses from the
0:20:08	literature
0:20:10	and the perspective that we want to explore is to use the measures that
0:20:16	we propose in a dialogue system it should be stressed that the measures are based on
0:20:21	very efficient algorithms that is to say
0:20:26	linear complexity algorithms
0:20:31	we would also like to investigate this
0:20:36	more thoroughly and to do a qualitative analysis of verbal alignment between
0:20:40	human-human interaction and human-agent interaction
0:20:43	such as a functional analysis of the repetitions
0:20:47	and finally we would like to investigate
0:20:51	other comparable human-human and human-agent corpora
0:20:55	to confirm our results
0:20:57	thank you for your attention and i'm now ready to answer your questions
0:21:14	thanks for the talk i was wondering several things about it actually one
0:21:20	of them is i'm not quite sure you said something about
0:21:26	ways of the machine adapting
0:21:30	to the user that there's nothing out there
0:21:34	do you have any idea why there is nothing out there because when i looked into
0:21:40	it it was a slot filling kind of dialogue and that was difficult because you don't
0:21:44	have a lot of data about the user
0:21:46	to make the system adapt to but in this kind of data it might be
0:21:50	different and also the second question is whether
0:21:52	the measures that you come up with would work at turn level
0:21:57	so if you have the decision to change the lexical expression
0:22:00	would those work to make changes at the turn level rather than
0:22:04	several turns
0:22:06	but like rather than taking into consideration example for
0:22:11	so for the first question there are systems that are able to align
0:22:17	there is some interesting work and it is pointed out in our paper
0:22:22	the main disadvantage is that most of the systems are rule based
0:22:27	and specific to some domain
0:22:30	or to some tasks
0:22:32	and the idea in providing measures is to go towards a more data driven way
0:22:38	an automatic way of aligning
0:22:42	but there are some system that i module
0:22:45	and the second question
0:22:49	so if i understand you well it is that if we change the granularity of where
0:22:55	we look for expressions
0:22:59	so we can not be over your problem i
0:23:03	don't see any problem in using our measures it would be just changing the
0:23:07	granularity
0:23:10	of the units
0:23:13	which we
0:23:14	do you think you would get this you would keep the same accuracy
0:23:20	i don't know we have one check because here we go to variable for a
0:23:26	couple always very when the limited you challenge is
0:23:33	if we look at
0:23:36	i'm not sure i understand your point in fact
0:23:39	we can talk a yes
0:23:49	hello i am here but talk about how you are looking on the degree of
0:23:57	repetition and what i didn't you are looking as repeated
0:24:01	i think perhaps not counting
0:24:04	probably so you get things like
0:24:08	i'm interested in shares or whatever was and in the next one you're getting a
0:24:12	time
0:24:14	in content items as being the repetition
0:24:18	in terms of being you know sort of
0:24:22	alignment which i think in this case where the participants don't really have so much
0:24:27	like what they say that phone first-person pronoun there is only one
0:24:34	and
0:24:35	you have similar ones for me i think that if you were doing alignments
0:24:40	on the on that might also be the same sort of a problem
0:24:45	i think that it's just one of the difficulties when
0:24:51	we work on alignment is that it can be on very
0:24:56	very specific
0:24:59	words such as the difference between what time and at what time
0:25:05	and it is going to be very important in that case
0:25:10	and in this work we have chosen to
0:25:15	select all the expressions
0:25:17	and to count everything even though we are probably counting some
0:25:25	expressions that are random and that are still going to happen even without verbal
0:25:31	alignment
0:25:32	but what we show by comparing to the shuffled corpora
0:25:37	i think is that
0:25:41	when people align
0:25:44	they will create more expressions
0:25:49	so
0:25:52	i just if you were telling
0:25:57	information for
0:25:59	right
0:26:01	i think you would want to understand some of these things are alignment in some
0:26:05	ways
0:26:07	so that you would be producing delays
0:26:09	thinking
0:26:11	and regarding that
0:26:13	since
0:26:14	the expression lexicon keeps track of expressions and their features such as the frequency
0:26:20	such as the recency of an expression we can use these
0:26:24	features to filter out
0:26:28	uninteresting expressions
0:26:30	but can you because i could be extremely frequent
0:26:34	and it could be very recent as well
0:26:37	mm
0:26:43	i can just two
0:26:45	to copy this behaviour
0:26:50	we can choose to stop my sentences by the same expression that we use for
0:26:54	instance i want to align
0:26:59	thank you very much let's thank the speaker again

Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction

Joint Special Session on Negotiation Dialog

Guillaume Dubuisson Duplessis, Chloé Clavel and Frédéric Landragin