Speech Transcript - Understanding the User in Socialbot Conversations

0:00:15	okay
0:00:16	so i'm going to talk about understanding the user in socialbot conversations
0:00:24	and i'm really representing
0:00:27	a team of students here so i want to acknowledge the students who are really a
0:00:33	part of this
0:00:34	i also work with yejin choi and noah smith as faculty advisers
0:00:40	and that it's
0:00:42	it's been a lot of fun and the students i really
0:00:46	okay so in case you don't know about the amazon alexa prize i should point out
0:00:52	that the name here is sounding board and then i don't think it is
0:00:58	again something
0:00:59	and is an important e
0:01:02	it was a response to a call for a competition
0:01:06	which was the amazon alexa prize and so the idea
0:01:10	back in twenty sixteen was they issued this call and they wanted university students
0:01:16	to build a socialbot
0:01:18	and the socialbot was to converse quote unquote coherently and engagingly
0:01:24	with people on popular topics and current events and so it's very open domain
0:01:29	so my grad student who was the team leader said i want to do this and
0:01:34	i said i think you're crazy but okay
0:01:36	and he got a team together and they wrote a proposal and then
0:01:42	the team got selected and then we fielded the system
0:01:46	and all that
0:01:48	at the end of that we had about ten million or more than ten million
0:01:52	conversations
0:01:54	with real users
0:01:55	and between that
0:01:57	and the fact that we're working with a new type of conversational ai basically what
0:02:02	we learned is there are a lot of research problems in this space of
0:02:07	dialogue that i hadn't thought of before and so the focus of
0:02:14	this talk is going to be understanding the user
0:02:17	in particular including user modelling but i want to start out by saying this is
0:02:23	just one small piece of the overall big picture so i'll give you a little of the
0:02:28	big picture since this is just one small piece
0:02:31	so what do i mean by socialbots and why do i
0:02:34	think this is a new type of conversational ai
0:02:38	so a lot of work in conversational ai falls into two spaces and people often talk
0:02:45	about it as two different possible tasks
0:02:48	so there is the virtual assistant doing task oriented dialogue
0:02:53	and in that
0:02:54	type of dialogue system
0:02:57	you're executing commands or answering questions without much social back
0:03:03	and forth
0:03:04	on the opposite end of the spectrum is a chatbot which is oriented towards chitchat
0:03:10	kind of how are you what are you doing today but with really limited content
0:03:16	to talk about
0:03:20	i like to think of these not as two different options
0:03:23	but as
0:03:25	two different types of conversation in a broader space that has at least two dimensions
0:03:32	probably more but there is the accomplish task dimension where the virtual assistant is trying to
0:03:39	do something and the chatbot is not and there is a social conversation dimension
0:03:43	where the chatbot is being social but doesn't have as much to talk about
0:03:49	so what we are trying to do is something that's in between
0:03:55	we're a little bit less social and a little bit less
0:04:00	task oriented
0:04:02	than the other two
0:04:04	well i'd argue that it is to some extent
0:04:09	goal oriented because you're providing information
0:04:14	so there's some social exchange and some information so with that background
0:04:19	what i'm gonna talk about
0:04:22	is initially the notion of the socialbot for ours specifically and that is
0:04:31	as a conversational gateway
0:04:33	and then a system overview i'm going to kind of rush through that because this is
0:04:39	early days of working on socialbots and the architecture we use is not
0:04:45	going to be the architecture that anybody will use a couple years from now but we
0:04:50	need to understand it to see how we're collecting the data and what we're doing
0:04:55	then i want to focus in on characteristics of real users and this is just
0:05:00	an analysis somewhat anecdotal but i think it's important to understand where we're going and
0:05:05	then i'll talk a little bit about our first steps in user
0:05:09	modelling and out in this was something queries
0:05:12	okay so the socialbot as a conversational gateway
0:05:17	so what
0:05:19	we see
0:05:20	is that when people come to talk to a socialbot they
0:05:25	don't have a specific task that they want to do they don't want to make
0:05:30	a restaurant reservation for example but they do come with some sort of
0:05:36	ideas of what they might
0:05:38	want to converse about
0:05:40	and they learn new information and their priorities and interests evolve so their
0:05:45	goals evolve
0:05:46	and so the social but is still indicating that a balding
0:05:50	so one of vocal set
0:05:54	the users are also in this case
0:05:56	coming to a little device
0:05:59	to talk to okay an amazon echo dot so they know they're talking to a bot
0:06:05	we are not trying to pass a turing test
0:06:10	i would argue that users should know that they're talking to a bot and so
0:06:15	making the bot so human like as to fool the users may not be such a
0:06:21	good thing to do
0:06:24	pretty much the systems
0:06:26	i know that you know
0:06:29	for some people chatbots are a little controversial but this is not a chatbot and
0:06:33	i think there really are applications for this
0:06:37	for example you could imagine in language learning having a conversational agent that can converse
0:06:43	with you which is a good way to practise or tutoring systems as a good way to
0:06:48	interact and learn about information at your own pace depending on your own interests
0:06:56	you know
0:06:57	are you we're using
0:07:00	information exploration interactive health information recommendations and just to give you an
0:07:07	idea of how you can imagine that so when i come home i actually use
0:07:11	i'm not a power user but i actually use my
0:07:15	alexa and oftentimes when i come home i want to listen to
0:07:20	the news while i'm at dinner well you can imagine if you could interact with
0:07:25	it you could tailor the news to the stuff that you're actually interested in
0:07:32	and then there is the notion of an exercise coach or another kind of coach so we
0:07:38	ended up teaching a conversational ai course
0:07:41	spring quarter building on what we had learned with teams of students and
0:07:46	there was a great coaching ai system that
0:07:52	one of the student teams built so there are a lot of actual applications that i think this
0:07:58	technology can lead to and a lot of people have shown interest in that
0:08:04	okay so our view is that it's a conversational gateway to online content so again when
0:08:10	you get home
0:08:12	you might want to talk to your system
0:08:15	to learn about what's going on in the world
0:08:18	and in this particular case we're scraping content and the content could be a news
0:08:24	source it could be video well actually no we're all text
0:08:30	so it's news sources it could be weather we're not using twitter but we
0:08:34	use a and b
0:08:36	we read from reddit which is a discussion forum so all of
0:08:41	the stuff that's online you could interact with
0:08:46	so just to give you an example
0:08:48	this is an actual dialog and all the examples i'm going to give you are actual examples
0:08:53	so i'm exposing a lot of our system
0:08:57	so in the first case you have to start out with the user saying let's chat
0:09:02	that invokes a system and because we're supposed to be anonymous in the competition everybody was
0:09:08	required to say this is an alexa prize socialbot and then after that you
0:09:12	can just go on and chat and
0:09:16	you have to chat about topics you have just play games and chat about whether
0:09:22	so we are for or something
0:09:26	somebody will accept the
0:09:29	we will talk about that and try to lead the conversation forward if somebody's
0:09:35	not saying too much
0:09:38	and so in this case we're talking about movies and we might
0:09:43	talk about a director or we might go to something else of that sort
0:09:51	so that's how the dialogue goes
0:09:53	in the beginning i'm showing here
0:09:56	a recognition error in what the person says and the reason we
0:10:01	can respond to that correctly is because actually we
0:10:07	have n best alternatives
0:10:09	and so we could figure out based on probabilities and
0:10:14	based on the dialogue context what the person actually said
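The idea of choosing among n-best recognition alternatives using both recognizer probabilities and dialogue context can be sketched roughly as follows. This is only an illustrative sketch, not the system's actual method: the function name `rerank_nbest`, the `expected_words` set, the linear weighting, and the example hypotheses are all assumptions made for the example.

```python
def rerank_nbest(hypotheses, expected_words, context_weight=0.5):
    """Pick one hypothesis from an ASR n-best list.

    hypotheses: list of (text, asr_confidence) pairs, confidence in [0, 1].
    expected_words: words the current dialogue state makes likely (assumed).
    """
    def score(item):
        text, asr_confidence = item
        tokens = set(text.lower().split())
        # Fraction of the hypothesis that matches what the context expects.
        overlap = len(tokens & expected_words) / max(len(tokens), 1)
        return (1 - context_weight) * asr_confidence + context_weight * overlap

    return max(hypotheses, key=score)[0]


# Hypothetical n-best list: the top ASR hypothesis is a misrecognition.
nbest = [("let's talk about move ease", 0.55),
         ("let's talk about movies", 0.45)]
best = rerank_nbest(nbest, expected_words={"movies", "music", "news"})
print(best)
```

With no contextual expectations the function simply falls back to the recognizer's top hypothesis, which is why the context term matters here.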
0:10:21	okay
0:10:23	so i want to highlight why and how this type of socialbot is
0:10:31	different from a virtual assistant which has had much more research
0:10:38	well so any sort of conversational ai system has these
0:10:44	components and even if you're doing end to end
0:10:47	your end to end would sort of often have different stages that you may be training
0:10:53	and those different stages are the speech and language understanding
0:10:57	the dialogue management response generation but also every system is going to have some sort of
0:11:04	backend application that you're interacting with
0:11:08	so in a virtual assistant
0:11:12	the speech and language understanding is constrained domain
0:11:16	it can be an easier task you look at task intents
0:11:21	oftentimes you're filling out forms and finding constraints to resolve what the person wants to
0:11:28	do on the socialbot the intents
0:11:33	are more social
0:11:37	or information oriented i want information on this topic
0:11:42	so the intents are a little bit different
0:11:44	and in terms of understanding the sentiment is going to play a role
0:11:51	on the dialogue management side in the virtual assistant you're trying to resolve ambiguities among
0:11:58	options to figure out what's the best solution to this problem
0:12:01	and then execute the task
0:12:04	and the reward would be timely completion of the task
0:12:09	in a socialbot
0:12:11	you're actually trying to learn about the interests of the user
0:12:15	and make suggestions at least in our system that's information oriented you want to make
0:12:20	suggestions of things that they might want to hear about
0:12:24	and the reward is user satisfaction which is not so concrete
0:12:29	and that's very challenging
0:12:32	the backend for a virtual assistant is typically a structured database our back
0:12:40	end is totally unstructured so we have to add structure
0:12:46	and lastly because it's a constrained domain the virtual assistant response generation is more constrained
0:12:53	than in our case
0:12:55	which is open domain because we could be presenting information on anything
0:13:02	okay so let me tell you a little bit about our system
0:13:06	and i'll get into how we
0:13:11	developed it give a little bit of overview and then how we evaluate the system
0:13:18	so
0:13:19	again this was a new problem when we started we had no experience with the
0:13:23	alexa skills kit we didn't have our own dialogue system and
0:13:27	using their tools
0:13:29	wasn't really a good solution because they were for designing speech apps
0:13:34	fine
0:13:36	and that's not what we were doing we were actually doing conversation
0:13:39	as opposed to you know the form filling task oriented things that people have designed
0:13:44	apps for
0:13:47	so
0:13:48	that was a little hard
0:13:50	and beyond that
0:13:53	there was no data now people often challenge us on that saying no amazon had
0:14:00	data they just should have given it to you no there was no data amazon
0:14:04	did not have data they had interactions straight transactional interactions like set a kitchen
0:14:11	timer
0:14:12	you know play music
0:14:14	they did not have conversations
0:14:17	this was one of the reasons i'm sure for this competition
0:14:20	and
0:14:23	after
0:14:24	getting the data from the teams the recognizer recognition error
0:14:31	rate went down according to them in a paper three percent
0:14:35	so they really didn't have the data
0:14:38	so it was a new problem and what that meant
0:14:43	was there's no existing data for end to end training so we started out thinking that's what we would
0:14:48	do when we started out with sequence to sequence modeling and it
0:14:53	doesn't work
0:14:54	because it's all data
0:14:58	so we have read a yes in terms of starting from scratch
0:15:04	i think
0:15:06	because we're starting from scratch our system was you see that in
0:15:12	so the data that we collected in the beginning
0:15:15	you know was good for retraining the recognizer
0:15:17	but was not so good for learning how to improve our system
0:15:21	so this is all to say that at the beginning the system wasn't so good
0:15:25	it evolved it had to okay so that's setting the stage for
0:15:30	the system design
0:15:33	alright so when we first started building a system and we first started getting data
0:15:38	we realised that we had to step back and say okay what do we want to think about
0:15:42	in terms of designing this system
0:15:45	so
0:15:46	think about what makes someone a good conversationalist
0:15:50	so you know you go to a reception and you're looking for people to talk to you
0:15:56	generally want to talk to somebody who has something interesting to say
0:16:00	okay
0:16:01	and you also want to talk to somebody who's listening to you and
0:16:05	showing interest in what
0:16:07	you have said
0:16:09	okay
0:16:10	these principles seem reasonable to apply to a socialbot and in fact i think
0:16:15	they really work for us here are some examples
0:16:18	so
0:16:20	we saw that users would react positively to learning something new i'll tell you later how
0:16:26	we got that information so for example around christmas time
0:16:31	people liked to talk about christmas and we in crawling our content had
0:16:36	found
0:16:37	this little tidbit spacex sent beer ingredients to the international space station just in
0:16:43	time for christmas and a lot of people thought that was kind of interesting and they
0:16:47	liked that piece of information and they also liked sort of
0:16:52	cool science a lot of the users are techies and so they liked
0:16:58	the fact that babies as young as ten months can assess how much someone
0:17:01	values a particular goal
0:17:03	by observing how hard they are willing to work to achieve that
0:17:07	people thought that was interesting and liked that
0:17:11	they do not like old news
0:17:14	so we had to fix that problem really early on if we tell somebody something
0:17:19	that's two years old that gave us bad reviews
0:17:22	they also didn't like unpleasant news and you know it turns out there's a lot
0:17:26	of bad news in terms of current events i mean if you're scraping you
0:17:31	will get plane accidents where people die and things like that
0:17:37	so we started filtering after seeing users' bad reactions
0:17:42	but filtering is a really hard problem
0:17:45	so we can filter for people dying but a piece of news that
0:17:50	people really didn't like was something about cutting a dog's head off so that's really
0:17:56	unpleasant and we want to avoid that
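A minimal sketch of the two filters described here, staleness and unpleasant keywords, might look like the following. The blocklist, the 30-day cutoff, the dates, and the example headlines are all hypothetical; the talk does not say how the real filters were built.

```python
from datetime import date, timedelta

# Hypothetical blocklist; a real filter would need far more than keywords.
UNPLEASANT = {"dies", "died", "dead", "killed", "crash"}


def keep_item(text, published, today, max_age_days=30, blocklist=UNPLEASANT):
    """Return True if a content item is fresh and has no blocklisted word."""
    if today - published > timedelta(days=max_age_days):
        return False  # stale news drew bad reactions
    tokens = {t.strip(".,!?\"'").lower() for t in text.split()}
    return not (tokens & blocklist)


today = date(2017, 12, 25)  # illustrative date only
items = [
    ("Beer ingredients arrive at the space station", date(2017, 12, 20)),
    ("Ten killed in plane crash", date(2017, 12, 24)),
    ("A fun fact from two years ago", date(2015, 12, 25)),
    ("Man cuts off dog's head", date(2017, 12, 23)),
]
kept = [text for text, published in items if keep_item(text, published, today)]
print(kept)
```

The last headline slips through the keyword filter, which illustrates the point made above that filtering for unpleasant content is hard to get right with simple rules.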
0:17:59	so another thing that we want to try to do is show interest in what the
0:18:02	user says of course they're going to lose interest if you're not
0:18:06	if you give them too much stuff
0:18:08	that they don't want to talk about they want to get acknowledgement
0:18:12	that's something that really works in these conversations they need to get encouragement to express their
0:18:17	opinions they're not used to this
0:18:20	so we ask questions like have you seen superman
0:18:26	it's layer
0:18:27	which part did you like best
0:18:30	so that's an important part of the dialogue
0:18:33	and unfortunately to ask questions you need a little bit of knowledge of the world
0:18:40	so you can ask some standard questions about movies but once the domain gets broader
0:18:45	we might ask questions like this article mentioned google have you heard of it
0:18:52	yes
0:18:53	i guarantee this happened to us in the demo
0:18:57	we were doing this at a demo so in this case you know everybody laughs
0:19:02	but sometimes you know with the actual users it gets annoying
0:19:06	alright so this leads to our design philosophy which i'll just summarise briefly
0:19:12	we're content driven and user centric
0:19:15	so we have to do daily mining to keep our
0:19:19	information fresh
0:19:22	so we had a large and dynamic content collection represented with a knowledge
0:19:27	graph
0:19:28	and a dialogue manager that promotes popular content and diverse sources
0:19:33	on the user centric side we had language understanding that incorporates
0:19:42	sentiment analysis
0:19:44	we try to learn a user personality and we handle topic changes and track
0:19:49	engagement and on the language generation side
0:19:53	we tried to use prosody and appropriate grounding
0:20:00	so
0:20:01	this is the system and i'm not going to tell you everything i'm just giving you
0:20:05	the architecture but you can see there's a language understanding component a dialogue management component
0:20:11	language generation and there's this back end where we're doing content management
0:20:15	we're using a
0:20:17	question answering system that
0:20:20	amazon provided
0:20:22	and we're using aws for
0:20:26	some text analysis
0:20:29	so that's the big picture there's lots of modules because we're at the beginning stages
0:20:33	we're constantly swapping in and changing things
0:20:37	and enhancing things so it is a modular architecture to be able to allow for
0:20:42	rapid development
0:20:45	so very quickly on each of the different components
0:20:50	natural language understanding is multidimensional
0:20:55	we're trying to capture different things some responses can be long and can capture both
0:21:01	questions and commands
0:21:03	we have to detect the topics that people are trying to talk about and the
0:21:08	user reactions
0:21:11	the dialogue manager is hierarchical
0:21:14	so we have a master and miniskills and the master is trying to control the
0:21:20	overall conversation negotiate the right topics to talk about
0:21:26	thinking about coherence of topics
0:21:29	engagement of the user and of course it's important since we're
0:21:34	content driven
0:21:36	to consider content availability you don't want to suggest talking about something that you
0:21:40	don't have anything to say about
0:21:42	the miniskills are focused on things
0:21:46	related to social aspects of the conversation and different types of information sources because
0:21:52	different types of news sources
0:21:54	or information sources
0:21:56	come with different types of
0:21:59	metadata and extra information so with movies we have relations between you know actors and
0:22:05	movies while for a general news source we just have the news and the metadata
0:22:11	about the topic
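The hierarchical arrangement described above, a master that negotiates topics and checks content availability before handing off to a source-specific miniskill, can be sketched like this. The class names, method names, and fallback wording are hypothetical and greatly simplified; this is not the system's actual code.

```python
class Miniskill:
    """Handles one information source (hypothetical interface)."""

    def __init__(self, name, content_by_topic):
        self.name = name
        self.content_by_topic = content_by_topic  # topic -> list of utterances

    def has_content(self, topic):
        return bool(self.content_by_topic.get(topic))

    def respond(self, topic):
        return self.content_by_topic[topic].pop(0)


class Master:
    """Negotiates topics and only hands off when a miniskill has content."""

    def __init__(self, miniskills):
        self.miniskills = miniskills

    def available_topics(self):
        return sorted({t for s in self.miniskills
                       for t in s.content_by_topic if s.has_content(t)})

    def next_turn(self, topic):
        for skill in self.miniskills:
            if skill.has_content(topic):
                return skill.name, skill.respond(topic)
        # Nothing left to say on this topic: propose topics we can cover.
        options = ", ".join(self.available_topics()) or "something else"
        return "master", f"I don't have anything on {topic}. How about {options}?"


movies = Miniskill("movies", {"superman": ["Have you seen Superman?"]})
news = Miniskill("news", {"google": ["This article mentioned Google."]})
master = Master([movies, news])
first = master.next_turn("superman")
second = master.next_turn("superman")  # content exhausted, master takes over
print(first, second)
```

Keeping the availability check in the master is one way to avoid suggesting a topic the system has nothing to say about.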
0:22:15	this is
0:22:16	back to the example i showed you before
0:22:19	and in this example there are stages of negotiation and that would be handled by the
0:22:24	master and
0:22:27	different types of information sources that we're jumping around between
0:22:32	that are handled by the different
0:22:35	miniskills so the movies miniskill is one skill
0:22:40	we great from a celebrated that skulls channel is
0:22:45	and so that the last hole
0:22:47	those often are willie
0:22:52	and then we also sh great from another source it's giving us a and that's
0:22:55	we're that job you're between skills
0:23:00	and on the language generation side
0:23:02	basically we get dialogue acts
0:23:05	from
0:23:07	the dialogue manager and we get information that's to be presented from the dialogue manager
0:23:13	and the response generation is going to take those and turn them into the actual text that
0:23:19	we're going to say that includes phrase generation but also prosody adjustment
0:23:25	the tricky thing is that for the things we say a lot we adjust the
0:23:29	prosody in the speech synthesis
0:23:31	so we have no control over audio but we do have control
0:23:34	using ssml
0:23:37	so you can
0:23:38	make your
0:23:39	acknowledgements enthusiastic
0:23:43	which you have to do with the prosody instead of having the default
0:23:46	reading
0:23:48	intonation
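Since the only handle on the synthesized voice is SSML markup, making a frequent phrase sound more enthusiastic could look roughly like the sketch below. The `<speak>` and `<prosody>` elements with `pitch` and `rate` attributes are standard SSML; the function name and the specific values are illustrative guesses, not settings from the system described in the talk.

```python
from xml.sax.saxutils import escape


def enthusiastic(text, pitch="+15%", rate="medium"):
    """Wrap a short acknowledgement in SSML prosody markup.

    The pitch and rate values are illustrative guesses, not tuned settings.
    """
    return (f'<speak><prosody pitch="{pitch}" rate="{rate}">'
            f"{escape(text)}</prosody></speak>")


ssml = enthusiastic("That's great! Tom & Jerry is a classic.")
print(ssml)
```

Escaping the text matters because scraped or templated content can contain characters such as `&` that would otherwise produce invalid markup.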
0:23:49	but for the content that we present like
0:23:53	news we actually just read it as it is we filter it to get things
0:23:58	that are more conversational
0:24:00	but we're
0:24:01	but that's text
0:24:03	that is open domain and that's really hard to control prosody for
0:24:08	actually we also do some filtering in the response generation which you'll see later
0:24:15	content management is the back end we crawl online content
0:24:19	we have to filter inappropriate and depressing content
0:24:23	then we index it using some parsing and entity detection
0:24:30	we use metadata that we get from the source
0:24:34	for topic information but also use popularity metadata
0:24:38	and then we
0:24:39	put it all into a big knowledge graph our knowledge graph had eighty thousand entries
0:24:46	and three thousand topics and an entry can have multiple topics
0:24:51	so here's the idea
0:24:53	so we would take for example over in
0:24:57	the upper left hand side
0:24:59	are a bunch of news articles or
0:25:03	bits of content that mention ut austin and over here is a bunch of things
0:25:07	that mention google
0:25:09	et cetera
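One simple way to picture a knowledge graph in which each content entry can carry multiple topics or entities is a two-way index between entries and the entities they mention. The sketch below is an assumption-laden illustration: the class name, identifiers, and the `related` lookup are hypothetical, and the real graph also carries metadata such as popularity that is omitted here.

```python
from collections import defaultdict


class ContentGraph:
    """Two-way index between content items and the entities they mention."""

    def __init__(self):
        self.items_by_entity = defaultdict(set)
        self.entities_by_item = defaultdict(set)

    def add(self, item_id, entities):
        for entity in entities:
            self.items_by_entity[entity].add(item_id)
            self.entities_by_item[item_id].add(entity)

    def related(self, item_id):
        """Items sharing at least one entity with item_id."""
        found = set()
        for entity in self.entities_by_item[item_id]:
            found |= self.items_by_entity[entity]
        found.discard(item_id)
        return sorted(found)


graph = ContentGraph()
graph.add("article-1", {"ut austin", "research"})
graph.add("article-2", {"google", "research"})
graph.add("article-3", {"google"})
print(graph.related("article-2"))
```

A lookup like `related` is one way a dialogue manager could move from one piece of content to another that shares an entity, which keeps topic changes coherent.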
0:25:12	okay so the system is evaluated
0:25:17	by what amazon decided and basically that was really one to five user ratings that
0:25:24	was the most important thing and then in terms of the finals there is
0:25:28	also duration the ultimate goal
0:25:31	if we had made it to twenty minutes
0:25:34	with all the judges then the team would've gotten a million dollars
0:25:38	so we actually did really well
0:25:41	i didn't expect us to get to five minutes
0:25:43	so ten minutes was pretty good
0:25:46	it's a really hard problem
0:25:47	but the interesting thing is the finals judges were different so all of the
0:25:53	development was with the amazon users
0:25:55	but there were three people as interactors and three people as judges
0:26:02	for the finals and they were people who were motivated to improve the system
0:26:09	people who were like news reporters who are conversationalists
0:26:12	and so the motivated conversationalists
0:26:15	actually lasted a lot longer than the average amazon user however they are more critical
0:26:19	so the average amazon user gives a higher score so that's basically how it works
0:26:26	so what we
0:26:27	i actually pretty balanced and
0:26:29	is the average over the amazon users
0:26:32	but the rating is at the end of the conversation
0:26:36	you have a huge amount of variance
0:26:39	and some of them
0:26:42	decline to rate actually more than half of them declined to rate the system
0:26:47	so the ratings are expensive noisy and sparse
0:26:51	and on top of that
0:26:52	you can have you know recognition errors and we get word sense
0:26:58	ambiguities and word sense ambiguities can lead you to do something that's
0:27:03	off topic
0:27:05	and so you can have bad parts of conversations you can get that
0:27:08	depressing news you can have sections of the conversation that are working well
0:27:13	and sections that don't work so well
0:27:15	so your overall score
0:27:17	is not equally representing all parts of the conversation
0:27:22	and so in order to actually use that overall score
0:27:27	to meaningfully do design
0:27:29	we have taken advantage of the fact that users give us more information
0:27:34	they actually accept or reject topics that we propose
0:27:39	they propose topics
0:27:41	and the reaction to the content is important
0:27:45	so what we actually do
0:27:47	is we take the conversation level rating and we project it back to dialogue segments we
0:27:53	can segment just because we know the topics from the system's perspective
0:27:58	and we project that using the information of user engagement
0:28:03	so it could be projected non-uniformly
0:28:07	and once we have those segment level estimated ratings
0:28:12	then we can aggregate across conversations for example we can aggregate across topic we can
0:28:17	aggregate across specific content
0:28:19	or we can eventually aggregate across users
0:28:25	so this is how we could figure out this is the content a lot
0:28:28	of people like
0:28:29	this is content a lot of people don't like so that's basically it
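The projection of one conversation-level rating onto topic segments, followed by aggregation across conversations, can be illustrated with a small sketch. The talk does not give the actual projection formula, so the linear shift by relative engagement used here, the `spread` parameter, the engagement scale, and the example data are all assumptions.

```python
from collections import defaultdict


def project_rating(rating, segments, spread=1.0):
    """Spread one conversation-level rating over topic segments.

    segments: list of (topic, engagement) with engagement in [-1, 1],
    e.g. derived from topic acceptance and reactions (an assumption here).
    Segments with above-average engagement get a higher estimated rating.
    """
    mean_engagement = sum(e for _, e in segments) / len(segments)
    projected = []
    for topic, engagement in segments:
        estimate = rating + spread * (engagement - mean_engagement)
        projected.append((topic, min(5.0, max(1.0, estimate))))
    return projected


def aggregate_by_topic(conversations):
    """Average the segment-level estimates per topic across conversations."""
    totals = defaultdict(list)
    for rating, segments in conversations:
        for topic, estimate in project_rating(rating, segments):
            totals[topic].append(estimate)
    return {topic: sum(v) / len(v) for topic, v in totals.items()}


# Hypothetical data: (user rating, [(topic, engagement), ...]).
conversations = [
    (3, [("movies", 1.0), ("news", -1.0)]),
    (5, [("movies", 0.5), ("vampires", 0.5)]),
]
topic_scores = aggregate_by_topic(conversations)
print(topic_scores)
```

The same aggregation could be keyed on a specific content item or on a user instead of a topic; the sketch assumes every conversation has at least one segment.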
0:28:34	so what i'm a bunch of the user's task just some kind regarding constraints
0:28:38	we could not you
0:28:40	and i think we have a audio side
0:28:43	so speech recognition
0:28:46	all we get is text we don't get audio for privacy reasons
0:28:49	asr is imperfect
0:28:51	we don't get any audio so we don't get
0:28:54	pauses we don't have sentence segmentation that's been changed in the new version but we didn't
0:28:59	have that
0:29:01	we don't have intonation so there's a lot of things that we can't
0:29:06	detect
0:29:09	and on the output side we can do ssml but that's all we
0:29:14	can do
0:29:15	so there are some constraints so that's just to say
0:29:18	a lot of the errors are false alarm errors are all gonna show you have
0:29:23	any examples you can appreciate
0:29:25	okay so on to the user conversations
0:29:29	so what i want to do here
0:29:31	is give some observations and then talk about implications
0:29:38	and then for all three of these i can talk about the user modeling
0:29:42	so
0:29:44	there are
0:29:46	four different points i want to make users have different interests
0:29:51	they may have different opinions on the same thing
0:29:56	for example in the us
0:29:59	news about trump
0:30:00	elicits wholly opposite reactions from users
0:30:06	they have different senses of humour
0:30:09	some people like our jokes and some people don't
0:30:13	they have different interaction styles different goals and they're different ages within a
0:30:18	family so just to give you an example of how this impacts the system
0:30:23	one of the things that we found
0:30:26	was people like to talk about vampires for some reason
0:30:29	so this was the piece of information that we presented a lot to people and
0:30:36	that
0:30:37	basically says did you know that malaysian vampires are tiny monsters that burrow into people's
0:30:42	heads
0:30:43	and force them to talk about cats
0:30:45	now we don't control the prosody on this because this is general content so it's
0:30:50	basically read prosody
0:30:52	and so when people are listening to this
0:30:55	if they're actually listening they are often amused but
0:31:01	sometimes
0:31:03	they think it's
0:31:04	bad
0:31:05	okay so they're not of
0:31:09	or
0:31:10	they what they had
0:31:13	because this is didn't make sense to them
0:31:18	i times you can tell they're not really listen
0:31:22	so far well
0:31:26	citrus
0:31:29	but last three
0:31:30	there are a user community is a little more complicated
0:31:35	there are also the callipers
0:31:38	and so this would and
0:31:41	resulting in topic changes for those people like that
0:31:47	they have different interaction styles so this is one user
0:31:51	talking about vampires this was a verbose user i'll come back to
0:31:56	this for other examples
0:31:59	and then we have the terse user which is actually a more frequent category
0:32:05	where a lot of the answers are one word so
0:32:10	this is important to appreciate because it affects
0:32:14	language understanding
0:32:15	so
0:32:16	the first type of user actually is a lot harder for language understanding
0:32:22	because
0:32:23	there are more recognition errors
0:32:27	and
0:32:28	you know it's harder to get intent
0:32:31	the second type of user actually is also hard for language understanding because
0:32:37	we don't have prosody
0:32:40	so consider someone saying no
0:32:44	so
0:32:46	if i ask a question
0:32:48	do you want to hear more about this and the person says no
0:32:51	that means they do not want to hear more about this but
0:32:54	if you say something surprising and the person says no
0:33:01	that means they do want to hear more about this
0:33:03	and so it's important because we don't have prosody
0:33:07	that we use state dependent dialogue and language understanding but even that doesn't always get
0:33:14	it
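State-dependent understanding of a bare "no" can be sketched as a lookup on the system's previous dialogue act. The act labels and intent names below are hypothetical, and as noted above even this kind of rule does not always resolve the ambiguity without prosody.

```python
def interpret_no(last_system_act):
    """Map a bare "no" to an intent using the system's previous dialogue act.

    The act and intent labels are hypothetical.
    """
    if last_system_act == "ask_continue":       # "do you want to hear more?"
        return "reject"
    if last_system_act == "inform_surprising":  # "did you know that ...?"
        return "surprise_continue"
    return "unknown"


print(interpret_no("ask_continue"))
print(interpret_no("inform_surprising"))
```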
0:33:15	so this is my argument for
0:33:17	why industry should give us prosody
0:33:21	okay so they have different goals so there's the information seeking goal
0:33:26	some people just generally want to know more others ask specific questions others
0:33:32	ask really hard questions like why
0:33:35	i there is
0:33:36	a like maybe empire percent
0:33:39	well i'll laugh are on and start asking a relevant question to the topic of
0:33:45	vampires
0:33:46	but not
0:33:48	and the user to call or the that are that we were talking about is
0:33:53	it really true that they are like tv vampires and then there is a speech recognition error
0:33:58	here
0:34:00	opinion sharing
0:34:02	some people would like to also share their opinions that's
0:34:07	actually not so hard to deal with because you can do what you might
0:34:11	do at a party and nod and say uh huh
0:34:15	and then there's other people who want to get to know each other they want
0:34:18	to find out
0:34:19	what alexa's favourite x is and tell us about their favourite xs and so those
0:34:24	are different goals you have to accommodate
0:34:27	we also have adversarial users we're supposed to be family friendly
0:34:33	if we do things that are not family friendly we get taken offline and
0:34:37	this is really in the field
0:34:40	for us
0:34:41	did not use everything
0:34:44	so we did not want to get taken offline
0:34:46	so we worked really hard and we did get taken offline many times we worked really hard though
0:34:51	to build content filters and to come up with strategies to handle adversarial users
0:34:57	so in this particular case we're not supposed to talk about anything
0:35:03	related to pornography or sex or anything like that
0:35:07	but a lot of users do so you just have to have a
0:35:11	strategy for dealing with that so
0:35:14	in this case
0:35:17	we just tell people are much as well
0:35:20	when they use offensive language one time we got taken offline because if
0:35:24	you didn't understand what they said sometimes a good strategy is to repeat what they said
0:35:30	and that
0:35:31	and so what we were doing is we were filtering all the content we were
0:35:35	presenting but we forgot to filter what the people said
0:35:38	so our solution there was to take the bad word and replace it with random
0:35:45	funny words one of my students came up with this and
0:35:48	i thought it was a really stupid idea but it actually made people laugh so
0:35:52	people really liked it
0:35:54	so we say things like unicorn you can imagine or it's actually more funny
0:36:00	if it's in the middle of a sentence and it's you know butterfly or
0:36:05	whatever it is
0:36:06	and then we change the subject and then there's a lot of cases where
0:36:10	you just have to have a strategy
0:36:14	which is
0:36:15	i don't understand or whatever okay
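The fix described here, filtering the user's own words before echoing them back by swapping offensive words for random funny ones, can be sketched as follows. The word lists are mild stand-ins and the function name is hypothetical; the actual lists and matching used in the system are not given in the talk.

```python
import random

BLOCKED = {"darn", "heck"}                   # stand-ins for a real profanity list
FUNNY = ["unicorn", "butterfly", "rainbow"]  # hypothetical replacements


def safe_echo(user_text, rng=random):
    """Echo the user's words, swapping blocked words for random funny ones."""
    words = []
    for word in user_text.split():
        core = word.strip(".,!?").lower()
        words.append(rng.choice(FUNNY) if core in BLOCKED else word)
    return " ".join(words)


echo = safe_echo("what the heck is that", rng=random.Random(0))
print(echo)
```

Passing a seeded `random.Random` makes the replacement reproducible for testing, while the default uses the module-level generator.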
0:36:20	the last problem is working with children and you have a lot of children and
0:36:25	problem one in working with children it is a that speech recognition just doesn't work
0:36:30	as well for young children everybody knows that
0:36:32	the companies
0:36:34	have included some stuff to get it to work over a range of ages down to
0:36:38	lower ages but really young children it doesn't work as well
0:36:41	i'm quite sure looking at the n best
0:36:44	this is a kid talking about the pet hamster but other than that it's really
0:36:48	hard to figure out what they were talking about in this case asking them to
0:36:52	repeat
0:36:53	is not gonna solve the problem it's better to just change the topic
0:37:00	the other thing is content filtering
0:37:02	so when you're talking to a kid at christmas time
0:37:08	a lot of times in the us a lot of people want to talk about
0:37:11	santa claus
0:37:13	unfortunately a lot of the content
0:37:16	that we were scraping was for adults
0:37:20	and
0:37:22	we got taken offline because we said santa claus was a lie
0:37:27	another concept sequences that
0:37:31	i we were not only people this i two is that would so ever saw
0:37:35	that what points that start talking about the other class
0:37:39	so results actually
0:37:42	okay so we have a user personality not well
0:37:45	it's based on the five factor model we ask questions based
0:37:51	on those questions but reworded to be more conversational
0:37:55	we also put in some questions that we don't actually use to make it
0:38:01	more engaging for people but we can't ask too many because this is an
0:38:06	interaction where we're supposed to talk about topics and people don't want to just
0:38:10	you know do you with all sorry
0:38:12	so the data we have is very noisy and impoverished we're not asking that many
0:38:17	questions
0:38:18	but it does give us some information so what we can see
0:38:23	is that personality for the things that we explored
0:38:27	does correlate certain types of personality correlate with higher user ratings
0:38:32	so people who are extroverted
0:38:35	agreeable
0:38:36	or imaginative give us higher ratings okay sort of makes sense
0:38:44	the thing that's interesting is there is a statistically significant correlation
0:38:49	between personality traits and some of the topics that they like
0:38:54	you know not for the topics a lot of people use
0:38:59	not everything but there is system
0:39:01	this correlation
0:39:04	for certain types like kindergarten actually hurts
0:39:09	there was the that data seem to be pretty good some extra perks like recent
0:39:13	fashion introvert like a i-th routing task
0:39:18	if you are imaginative you like
0:39:22	you like things like ai time travel anyway and
0:39:27	low conscientiousness as was explained is you know you don't like to do your
0:39:31	homework and those people like pokemon and minecraft
0:39:36	so that data actually sort of it sounds
0:39:39	okay so just summary here
0:39:43	the implications are that
0:39:45	age and dialect
0:39:47	the implications are the user characteristics affect every single component of the system
0:39:55	age dialect verbosity affect language understanding your interests affect
0:40:03	dialogue management and if you talk a lot there are more errors that
0:40:10	affects the dialogue management strategy
0:40:13	your interests affect content management
0:40:15	your age does because of how you filter
0:40:20	as we begin to do user modeling we want multidimensional content
0:40:24	in that so we can get ratings for the different user types
0:40:28	and lastly the phrasing that we use in generation
0:40:34	if we have more information about the user
0:40:36	should be adjusted based on
0:40:39	so user modeling
0:40:42	this is really early work
0:40:44	so this is preliminary so nothing public but i thought it would be fun to
0:40:48	talk about in this audience
0:40:50	so i'm gonna talk a little bit about why we care for content ranking and
0:40:54	then the user embedding models
0:40:57	so
0:40:59	so the task that we're interested in is given a particular
0:41:04	piece of content
0:41:06	predict whether the user is going to engage positively or negatively
0:41:10	or neutrally with that content
0:41:13	and so the content is gonna be characterized in terms of the information source
0:41:17	topic entities
0:41:20	at some point later sentiment and valence but we haven't done that yet
0:41:24	the user engagement is characterized in terms of what topics does the user suggest
0:41:31	what topics
0:41:32	does the user accept or reject
0:41:36	positive or negative sentiment in reaction to the content but also a positive or negative
0:41:42	sentiment in reaction to the bot
0:41:45	because that reflects an overall being unhappy with content but maybe not a specific one
0:41:51	probably generally
0:41:55	so the types of features we're using
0:41:58	include both some user independent stuff that's like the bias term
0:42:03	so relatedness to the current topic and general popularity in dialogues
0:42:08	but then the user specific features we're mapping these different types of measures of engagement
0:42:14	into a few additional features
0:42:18	and then we're trying to use dialogue cues
0:42:24	from the user to capture things like age personality
0:42:29	now the issue here is
0:42:32	we have very little data so we don't know
0:42:36	we have to treat each conversation independently we know that the conversation came
0:42:41	from the same device
0:42:43	but these devices are used by families and oftentimes use more than one person so
0:42:47	you cannot assume that
0:42:49	the person is the same from
0:42:54	conversation to conversation
0:42:56	for a specific device
0:42:58	in the future you could have that information but for this we have to
0:43:02	use only one conversation
0:43:04	so the data is very sparse
0:43:07	so you have to learn from other users
0:43:09	so
0:43:11	just this is just a motivational slide
0:43:14	this is just to say that the user is really important so when we're predicting the
0:43:19	final rating of the conversation if we consider topic factors
0:43:25	agent factors and user factors so topic factors are what the topics are the
0:43:30	topic coherence stuff like that
0:43:33	what was suggested by the agent there are things that the
0:43:37	agent says
0:43:40	how they say that and then the user factors are user engagement and
0:43:46	verbosity and things like that
0:43:49	user factors alone
0:43:52	give you better performance than everything together
0:43:55	in predicting the final conversation level rating so the user is really important
0:44:02	okay so
0:44:04	i did not mention neural networks except to say that we didn't do end to end training
0:44:11	so i'm gonna now mention them it doesn't mean that they in fact are used
0:44:18	because everything has to be fast et cetera but we are using them in terms
0:44:22	of finding user embeddings
0:44:24	so the first thing we did was actually not use a neural network
0:44:30	it was latent dirichlet allocation
0:44:34	which is a standard way to do topic modeling that works for many
0:44:42	tasks
0:44:43	so what we're thinking about what we think about this is each user is a
0:44:46	bag of words
0:44:48	and that would be like a document
0:44:52	and lda would come up with clusters the different topics
0:44:59	of lda would be user types so unsupervised learning of user types
0:45:05	so we just decided let's just use a handful of topics or clusters
0:45:11	because we don't think there's that many different user types and this would be
0:45:16	somewhat interpretable
0:45:17	and that if you look at the most frequent words
0:45:22	you see the following phenomena
0:45:26	people who like interact with certain types of things the people like to know what's
0:45:30	one particular cluster people who talk about music was another particular cluster
0:45:35	and the personality quiz
0:45:38	like this
0:45:39	shows that another cluster
0:45:43	interesting and
0:45:44	a lot interest in the let's the
0:45:47	shows that
0:45:48	and another cluster
0:45:50	some are you know bot oriented so with things like what's your name what's your favourite
0:45:57	and there's a self oriented person i think i am
0:46:03	there's people who are generally positive
0:46:06	a whole one interesting
0:46:09	and there's people who are interested immediately
0:46:14	so that i l
0:46:18	so first of all if you play with the lda in order to get the
0:46:23	interesting interpretable clusters you have to do some things
0:46:27	you have to drop frequent words it turns out that we needed to keep
0:46:32	yes and no in there as there's positive people and negative people
0:46:35	but because you get yes no a questions are just so i'm gonna get those
0:46:40	in there
0:46:40	that you have to for them out
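A minimal sketch of the preprocessing described above, in which each user becomes one bag-of-words document and very frequent words are dropped before a topic model such as lda is fit. The function name, the `max_doc_fraction` threshold, and the `keep` set are assumptions made for illustration. The talk only says that frequent words had to be dropped and that yes and no needed special handling.

```python
from collections import Counter


def build_user_documents(conversations, max_doc_fraction=0.5, keep=frozenset()):
    """Turn {user_id: [utterance, ...]} into {user_id: Counter of words}.

    Words used by more than max_doc_fraction of the users are dropped,
    unless they are listed in keep (a hypothetical allow-list).
    """
    # One token list per user: the user's "document".
    tokens = {
        user: [word for utt in utterances for word in utt.lower().split()]
        for user, utterances in conversations.items()
    }

    # Document frequency: in how many user documents each word occurs.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    limit = max_doc_fraction * len(tokens)
    dropped = {w for w, df in doc_freq.items() if df > limit and w not in keep}

    return {
        user: Counter(w for w in words if w not in dropped)
        for user, words in tokens.items()
    }
```

The resulting counts could be handed to any lda implementation. Whether words like yes and no belong in `keep` is exactly the kind of tuning decision the talk describes.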
0:46:44	so uniqueness to make it work and there is you know there's this class and
0:46:47	fundamentally it's perplexity that's what we're doing
0:46:53	is that the right objective to characterize users
0:46:57	well for another problem we played around with a different objective
0:47:03	to learn user embeddings and this was user re-identification this is also unsupervised
0:47:09	and the idea is
0:47:11	you're gonna take a bunch of sentences from a user and a bunch of other sentences
0:47:18	from the same user
0:47:21	and try to learn embeddings that make those things from the same user closer together
0:47:28	and things
0:47:29	from a different user
0:47:32	farther apart
0:47:33	okay so we have
0:47:35	distance to self
0:47:37	we want to minimize
0:47:39	and distance to others we're gonna maximize so it's a minus sign
0:47:43	so when somebody's talking about tasks and they keep talking about task
0:47:46	we want those to be close
0:47:48	and when they talk about something totally different that's gonna be far away
0:47:52	that is another way of dealing with drawing up things
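A minimal sketch of the objective just described, distance to self minus distance to others, written for a single triple of embedding vectors. The function names and the choice of euclidean distance are assumptions, since the talk does not say which distance or which network was used. Practical variants of this kind of loss often add a margin, which is omitted here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reid_loss(anchor, same_user, other_user):
    """Distance to self minus distance to others; lower is better.

    anchor and same_user are embeddings of text from one user,
    other_user is an embedding of text from a different user.
    """
    return euclidean(anchor, same_user) - euclidean(anchor, other_user)
```

Minimizing this quantity over many sampled triples pulls text from the same user together and pushes text from different users apart.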
0:47:57	so
0:48:00	if we this work was actually done related
0:48:04	and we have this problem where we're gonna let cid each and what are you
0:48:09	serious and i say finally somebody else like that
0:48:13	you from their tweets
0:48:15	so using this unsupervised learning which we call re-id it turns out when you're picking
0:48:21	one person from forty three thousand random people we evaluated with mean
0:48:27	reciprocal rank
0:48:29	so basically the mean rank
0:48:32	with our best
0:48:34	system which was initialized with word2vec
0:48:37	and then used re-identification is twelve well twelve out of forty three thousand is
0:48:41	pretty good
0:48:42	lda is five hundred
0:48:44	so this type of user embedding i think is very promising
0:48:49	for dealing with learning about user types
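A minimal sketch of the two ranking measures mentioned above. For each query the correct user gets a rank among all candidates, and mean reciprocal rank averages one over that rank. The function names and the similarity-score dictionary are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
def rank_of_target(scores, target):
    """1-based rank of target when candidates are sorted by descending score.

    scores maps each candidate user to a similarity with the query.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(target) + 1


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(1.0 / r for r in ranks) / len(ranks)


def mean_rank(ranks):
    """Plain average rank, the quantity quoted informally in the talk."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(ranks) / len(ranks)
```

Mean reciprocal rank and mean rank are different summaries of the same list of ranks. The first rewards getting the right user near the top, while the second is easier to read against the candidate pool size.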
0:48:53	okay so how do we evaluate them that's with this task of using the embedding to predict
0:49:00	engagement
0:49:02	and conversation level ratings
0:49:05	okay so in summary
0:49:07	i'm gonna summarize the sounding board stuff and then the user stuff so basically
0:49:13	the socialbot
0:49:16	as a conversational gateway
0:49:18	involves not
0:49:20	accomplishing tasks
0:49:22	but helping the user evolve their goals and collaborating to learn
0:49:28	interests
0:49:30	and the user what the user is doing is learning new facts
0:49:35	exploring information and sharing opinions
0:49:37	so that's a new form of conversational ai system
0:49:42	the critical system components are basically related to the user and to the content tracking
0:49:47	the user intents
0:49:50	and engagement
0:49:51	but also managing an evolving collection of content
0:49:58	which you can think about as social chat knowledge
0:50:01	and as i said in the beginning
0:50:04	ten million conversations with real users and this new form of conversational ai
0:50:10	leads to many new problems so this is just the tip of the iceberg
0:50:14	okay so the sociological asr that user group that information exploration
0:50:20	re a user variation so
0:50:23	you know i'm sure that other conversational ai gets a lot of user variation but
0:50:27	it
0:50:28	but with a lot
0:50:30	understanding the user involves no
0:50:34	not just what they said but also who they are and lastly
0:50:38	the user model has implications for all components of the dialogue system
0:50:43	and for evaluation
0:50:45	so lots of open issues this is the tip of the iceberg
0:50:49	user dependent reward functions dialogue policy learning
0:50:54	user response generation so you can have context aware language modeling that uses the
0:50:59	user model as an input
0:51:01	and great for user simulators all those kinds of things you could do we haven't
0:51:05	started out and you have this
0:51:09	but well as dependent the word function we have anyway it so it's a at
0:51:13	this platform for language processing research and with that i will stop
0:51:43	so that is the stuff that i know best about and there definitely were other people
0:51:48	who
0:51:49	participated who are interested in user modelling
0:51:53	so the system we fielded had no user modeling this is
0:51:58	the post
0:52:00	evaluation this is post competition
0:52:03	using our data
0:52:05	okay
0:52:05	so we had no user modelling in there
0:52:07	we did have the detection of engagement and the personality stuff
0:52:13	and we did actually we used personality to predict topics
0:52:18	so we had a little bit
0:52:19	but not a lot of it
0:52:23	so there were other people interested in user modelling i don't know specifically what it
0:52:27	what they did
0:52:30	the presentations that were so i know more about the three finalists because of their
0:52:37	presentations
0:52:41	i
0:52:42	don't think there was
0:52:45	much user modelling in that
0:52:50	so
0:52:52	so i would say i don't know as much
0:52:57	more of that was so we did less of
0:53:03	trying to use reinforcement learning and that sort of stuff because
0:53:08	we just felt we don't have the data
0:53:11	so other people did more of
0:53:14	that approach so i think there is a difference
0:53:19	in terms of the styles of the approaches
0:53:22	you know and the thing is
0:53:26	everything is important
0:53:27	so you know but most important
0:53:31	you know that i think the user modeling definitely the user
0:53:35	centric stuff so that the thing is in terms of being user centred we will
0:53:38	change topics quickly
0:53:41	if things were going stale
0:53:44	so i think that helped us i think the prosody sensitive generation helps
0:53:49	but i think most importantly having lots of topic
0:53:52	content
0:53:53	interesting content
0:53:55	helps us
0:53:57	but you know the other stuff that people did it probably would have helped us
0:54:01	if we had incorporated it it's just that there was not always time
0:54:05	so it's hard to compare what was more important
0:54:09	across teams
0:54:43	exactly and that was indeed the strategy
0:54:46	i
0:54:51	i agree and so we don't do it very often
0:54:54	so what we did is we had
0:54:57	a series of strategies
0:54:59	for when we didn't understand what the person said
0:55:02	that was one of them
0:55:05	we also have the strategy of asking
0:55:07	for repetition
0:55:10	we also have the strategy of saying we don't understand
0:55:15	so there was
0:55:17	i think there is at least five different strategies
0:55:21	we would cycle between with some randomness but also some use of the sentiment
0:55:28	of that person to figure out
0:55:31	the detected sentiment to figure out
0:55:34	which to prioritise
0:55:36	to bias
0:55:38	between the different strategies so our way of dealing with it is to sample between
0:55:42	different strategies
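A minimal sketch of sampling between recovery strategies with some randomness and a bias from the detected sentiment, as described above. The strategy names, the weights, and the direction of the sentiment bias are all hypothetical. The talk only says that there were at least five strategies and that sentiment was used to prioritise among them.

```python
import random

# Hypothetical strategy labels; the talk names only some of them.
STRATEGIES = [
    "ask_repeat",
    "say_not_understood",
    "echo_back",
    "change_topic",
    "funny_reply",
]


def choose_strategy(sentiment, last=None, rng=random):
    """Sample a recovery strategy.

    sentiment: detected user sentiment, negative values meaning unhappy.
    last: the strategy used on the previous failure, which is excluded
          so that the bot cycles instead of repeating itself.
    """
    weights = {name: 1.0 for name in STRATEGIES}
    if sentiment < 0:
        # Assumed bias: an unhappy user is moved on rather than asked to repeat.
        weights["change_topic"] += 2.0
        weights["ask_repeat"] *= 0.5
    if last in weights:
        weights[last] = 0.0
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible for testing.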
0:55:44	there was actually at least one team maybe more than one team that actually used
0:55:50	eliza
0:55:51	and incorporated it not in the same way it wasn't like our mini skills
0:55:58	well a little bit like our mini skills so they would
0:56:01	slot eliza into the conversation
0:56:06	we did not do that
0:56:08	we just had that as one
0:56:11	particular strategy our own implementation of it
0:56:21	very few
0:56:25	but people do assets order to take
0:56:29	and
0:56:31	they ask questions that are a little bit more difficult so that's the that's like
0:56:35	the why question
0:56:37	very few people do that that's really hard we don't have
0:56:41	a solution for that right now
0:56:44	more often
0:56:46	they'll ask a slightly more specific question
0:56:50	and we can come up with not a great response but at least
0:56:58	better than i don't know
0:57:00	i think i don't know when you say what did you find interesting is a
0:57:06	valid but not great response
0:57:42	whether they are asking a question or they are not and because we don't
0:57:46	have the prosody we can't tell
0:57:50	and so in a different version of this talk i have those examples and it's
0:57:56	very frustrating
0:58:05	you will know you would have a i mean prosody analysis is not perfect right
0:58:09	but you would have a much better idea so it would be easier
0:58:13	to get sarcasm
0:58:17	no request
0:58:38	so our natural language generation is not at all sophisticated
0:58:44	that's an area where i would definitely want to improve it's just
0:58:49	in my own mind it's not the highest priority so when we were generating the
0:58:54	content
0:58:55	about you know the news of the information or whatever it was
0:59:00	basically we take what we got from reddit and present that with minimal transformations
0:59:07	so there's transformations to make it
0:59:11	shorter
0:59:14	there's transformations to there are some simple things
0:59:18	to make it a little bit more suited to a conversation
0:59:21	but mostly things that are really not suited to conversation we just filter out
0:59:26	so that strictly just
0:59:28	the wrappers around the
0:59:31	are generated but that's fairly straightforward
0:59:36	so this is an area
0:59:37	that
0:59:39	we could do a whole lot better
0:59:52	so the knowledge graph
0:59:56	basically provides links
0:59:58	we we've it's
0:59:59	if you want the details the actual technical details
1:00:05	they use dynamo db on the amazon cloud stuff and i can point you to
1:00:11	my grad student for how we do that it's really important because we have to
1:00:16	handle lots of conversations when we're live we have to handle conversations all over the
1:00:22	country
1:00:23	so everything had to be super efficient
1:00:27	within a conversation you have to respond quickly so everything has to be super efficient
1:00:32	so what the knowledge graph allows you to do is
1:00:37	say from this point
1:00:40	if i want to stay on topic
1:00:44	or keep with related topics
1:00:45	this is
1:00:46	the region of the set of things that i could go to and then we
1:00:50	have a content ranking
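A minimal sketch of the two steps just described: the knowledge graph gives the set of related things the conversation could move to from the current point, and a content ranking orders them. The in-memory dictionary graph, the score table, and the function name are assumptions for illustration. This is not the dynamo db implementation mentioned in the answer.

```python
def next_content(graph, scores, current, visited=()):
    """Return related topics reachable from current, best ranked first.

    graph: dict mapping a topic to the set of topics linked to it.
    scores: dict mapping a topic to a ranking score (higher is better).
    visited: topics already discussed, which are skipped.
    """
    # The "region" of candidates: neighbours of the current node.
    candidates = [t for t in graph.get(current, ()) if t not in visited]
    # Content ranking: order candidates by score, unknown topics last.
    return sorted(candidates, key=lambda t: scores.get(t, 0.0), reverse=True)
```

Keeping the neighbour lookup a single dictionary access reflects the efficiency concern raised in the answer, although a deployed system would query a database rather than a local dictionary.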

Understanding the User in Socialbot Conversations

Keynotes

Prof. Mari Ostendorf