Speech Transcript - Decision Learning in Data Science: Where John Nash Meets Social Media

0:00:15	Thank you, Brian.
0:00:16	Good morning.
0:00:17	It is my honour to be here
0:00:19	and this is my first time in Interspeech.
0:00:23	I thought that
0:00:24	Interspeech have an interesting culture that before the talk there is dancing and singing,
0:00:30	but it was not offered to me.
0:00:32	So
0:00:33	and I saw many old friends here, many familiar faces
0:00:37	and this community is technically like a cousin to me.
0:00:43	I have learnt a lot from this community.
0:00:46	I'm working in signal processing in general
0:00:52	with application to communication, multimedia and many other things, but not speech.
0:00:56	I don't know why, but I found the answer yesterday.
0:01:05	thing, okay.
0:01:07	Yesterday,
0:01:09	when I first came to America as a teaching assistant
0:01:13	I was teaching assistant for signal systems
0:01:16	and I had to teach students there's linear shift in invariance ??,
0:01:21	there's a concept ??
0:01:23	And every time they were laughing behind me,
0:01:25	so I asked them once: Why are you laughing?
0:01:28	And they told me.
0:01:30	F. You need to pronounce your F.
0:01:35	That's why I cannot do speech processing.
0:01:38	So
0:01:40	today
0:01:42	just like Brian said,
0:01:43	there's no speech, there's no keynote, if I submit a paper to Interspeech it will
0:01:47	be 'yes' or it will be rejected,
0:01:49	because it doesn't match heading, keynote yesterday or keyword at all. However
0:01:58	I hope this can be useful to you. I learnt something like hidden Markov model
0:02:03	and deep learning
0:02:04	from this community and I hope that what you see today, the idea, may be
0:02:08	useful to you.
0:02:10	Okay,
0:02:10	so the story begins.
0:02:12	First
0:02:14	social media.
0:02:15	This is a new phenomenon. More and more decisions and
0:02:20	activity in our daily life are being recorded, tracked and shared.
0:02:25	It's scary, right?
0:02:26	Yes, it's scary. This is kind of user-generated big data.
0:02:31	For example like our motion, our movement everywhere, they are being tracked.
0:02:35	Yes, it is scary. However it is also a 'plus'.
0:02:40	Some potential and very interesting and important research direction for us
0:02:45	and machine learning is a very powerful tool
0:02:49	as being used in this community and also potentially
0:02:52	in many other enterprises, such as social media.
0:02:56	Machine learning aims to use reasoning to find new and relevant
0:03:00	information given some background knowledge. There're many applications
0:03:04	and
0:03:05	you all know, I believe many of you using that machine learning consists of three
0:03:10	elements:
0:03:10	representation, evaluation, automatisation. I will not get into detail of these.
0:03:15	Although it is a very powerful tool
0:03:17	there are limitations and constraints.
0:03:20	Why?
0:03:21	Because the generalization assumption may not hold.
0:03:24	Especially in user-generated data scenario.
0:03:27	Why?
0:03:28	Because the user behaves differently at different time under different settings.
0:03:34	You behave very differently when you are hungry or
0:03:36	you need to go to restroom then you cannot sit here, yeah? You, we
0:03:40	at different we may behave in a different way.
0:03:42	And the training set may not be statistically consistent
0:03:46	with the testing set.
0:03:48	That was the assumption in this machine learning algorithm
0:03:52	and also single objective functions
0:03:54	cannot cover user's
0:03:56	difference interests.
0:03:58	Users have different interests and therefore they have different objective functions.
0:04:02	And users are also rational and therefore they are all selfish. They all want to
0:04:08	optimize their own utility.
0:04:10	And we need to consider this.
0:04:12	So
0:04:14	the knowledge contained in the data
0:04:17	can be difficult to fully exploited from
0:04:20	this machine learning macroscopic view. Although it's very powerful.
0:04:25	Because the data are the outcome of users' interactions.
0:04:29	We cannot ignore the users' interactions.
0:04:32	We need to explicitly consider interaction of users
0:04:36	and basically that mean their decision process
0:04:38	should be taken into consideration.
0:04:41	And in the sequels you will find out that Game theory is
0:04:45	powerful tool
0:04:46	that can study users' strategic decision making, because we can take into
0:04:51	multiple objective function at a single time and we can
0:04:54	also take into consideration of users' local and individual interests
0:04:58	into consideration.
0:05:01	Okay,
0:05:02	let's face it.
0:05:03	Learning is for what? Learning is for making optimal decisions. We don't learn just for
0:05:10	fun.
0:05:10	And learning and decision making are coupled together
0:05:13	due to network externalities. What is that? What does I mean network externalities? I mean
0:05:18	that my decision, my action
0:05:20	will affect your decisions and your behavior.
0:05:24	We all affect each other. You made decision to come
0:05:26	here and I made decision to come here, that's why we are all here.
0:05:30	We affect each other. However what is the problem?
0:05:33	The problem is that there is a missing link.
0:05:36	And the missing link is that
0:05:38	in a very traditional sense machine/social learning and strategic decision making are still two
0:05:45	unrelated disciplines.
0:05:47	And so here
0:05:48	we propose
0:05:49	a notion of decision learning to preach in between
0:05:53	that
0:05:54	when we are doing learning we also need to consider the decision making process.
0:05:59	So this is a
0:06:00	title of the
0:06:01	talk. 'Decision learning' by that we mean learning with
0:06:06	strategic
0:06:07	decision making.
0:06:09	Okay,
0:06:10	I'm going to use three example to illustrate
0:06:12	how decision learning work.
0:06:14	First decision learning with Evolutionary User Behavior. Second is with Sequential User Behavior and then
0:06:21	we'll talk about how we design mechanism
0:06:23	from system perspective.
0:06:26	First let's talk about Evolutionary User Behavior. Here I want
0:06:30	to use information diffusion over social network as an example.
0:06:34	Okay,
0:06:35	now you see
0:06:37	this is the Twitter hashtag during 2008 US Presidential elections.
0:06:43	It
0:06:44	captures spreading of comments and phrasing by candidates.
0:06:48	And look at
0:06:49	pink one, you can see that pink one
0:06:51	this one says it is lipstick on a pig.
0:06:54	It is a phrase by famous Sarah Palin, if you know her.
0:06:58	And you can say that when a phrase was comment
0:07:01	there is a duration and it'll reach to the top
0:07:04	and then eventually it decreases and then goes down.
0:07:08	You see that everything have a durations.
0:07:10	And
0:07:11	this is for the political comment
0:07:14	and this was done in 2008. Now, if you are politician you want to measure
0:07:22	that your
0:07:23	message can be delivered. When to reach to the peak?
0:07:26	When will be the starting point and when will be the end point?
0:07:28	It had to be before the election not after, right?
0:07:31	Okay, now this is the other point.
0:07:34	That is online advertising for new product.
0:07:36	How to predict
0:07:38	the popularity of this advertisement and eventually what is the market share?
0:07:45	Can we predict that?
0:07:47	Okay, all of this relate to one problem that we call Information Diffusion Problem.
0:07:51	How information diffuses?
0:07:54	Users exchange information over social
0:07:56	network. Okay, we're talking about this on the our social network
0:07:59	and the study of this information diffusion,
0:08:01	this problem is important. Why? Because if we understand the dynamics
0:08:06	of this information diffusion. We can predict and maybe control
0:08:10	the start/peak timing and distribution.
0:08:13	And we can estimate a final reach of the populations
0:08:16	it also perhaps we can identify the influential users and links
0:08:21	for our purpose.
0:08:23	Okay,
0:08:24	now
0:08:25	Dynamics In Information Diffusion.
0:08:28	It is a sequence of decision making processes.
0:08:31	It is not what we said, what we though that information just
0:08:35	spreads by itself.
0:08:37	No.
0:08:38	For information to diffuse
0:08:40	it relies on other users to forward or post the information they receive.
0:08:45	Like I have my social network and if you are in my network, I post
0:08:48	something and you will see that
0:08:50	and once you see that you have to make a decision to post it or
0:08:53	not
0:08:53	and then from there now everybody when you are in somebody's social network
0:08:58	everybody have a decision making process: Should I post, is the information exciting,
0:09:03	are our friends interested in it? Or I don't want to post because
0:09:07	it could be embarrassing.
0:09:09	There is a decision making process here
0:09:11	and information diffusion is an evolving process
0:09:15	like our evolution.
0:09:17	So
0:09:18	for information diffusion we ask
0:09:21	whether to post
0:09:23	or forward or not.
0:09:25	And for evolution process
0:09:27	when there's a gene that have a mutant
0:09:29	we ask
0:09:30	whether to take mutant or not.
0:09:33	So it's similar, so we want to relay this together and model this problem.
0:09:37	And social network is always illustrated by a graph structure and we understand that so
0:09:42	we are using the
0:09:43	Graphical evolutionary game.
0:09:45	Now
0:09:46	what is this Graphical Evolutionary Game? Let me make a very simple explanation.
0:09:51	Each player have
0:09:52	a notion of fitness.
0:09:55	How fit you are? How fit I am?
0:09:57	Okay and so this is an evolution that we all know. The stronger one, the
0:10:03	fittest will survive.
0:10:05	A users' fitness
0:10:07	is determined by this as you can see I tried to use this equation and
0:10:11	this is the user's fitness.
0:10:13	And B is the baseline fitness that
0:10:15	depend on myself, okay.
0:10:17	And this U
0:10:18	is interaction payoff
0:10:20	determined by the environment.
0:10:22	And this alpha is selection intensity.
0:10:26	So if I'm going to interpret this I would say
0:10:28	that
0:10:29	someone, like for someone here Haizhou Li here in Singapore
0:10:34	his fitness depends on his own fitness
0:10:38	plus the selection intensity multiplied by overall environment fitness of Singapore together, that is
0:10:44	his own fitness strenght.
0:10:47	I think that makes sense.
0:10:48	So
0:10:50	an evolution process has two important elements. One is selection and one randomness.
0:10:56	These two are very important in the evolutionary process.
0:10:59	So let me use this example. We have a graph
0:11:02	and then randomly we select
0:11:05	a node
0:11:06	that can be a user.
0:11:08	And once we select this one we have already randomness in here, now
0:11:13	we are going to make selections, okay.
0:11:15	So
0:11:16	we have to compute fitness.
0:11:19	Its own fitness and also its neighbours' fitness.
0:11:23	I want to compute all these fitness and we select
0:11:26	the strongest fits
0:11:28	to imitate.
0:11:29	It doesn't have to be imitation, I just used imitation as an example.
0:11:33	There's a ?? protest and ?? despair there many different kinds of processes.
0:11:36	Okay,
0:11:37	so
0:11:38	we need to know which one is the best
0:11:41	fits
0:11:42	and then we are going to imitate
0:11:45	and
0:11:46	this is the best fit and we imitate,
0:11:49	okay.
0:11:49	So this is the
0:11:51	a typical evaluation process that is determined
0:11:55	by users fitness.
0:11:57	Now
0:11:58	Information Diffusion Over Online Social Networks
0:12:00	I have just say that
0:12:03	online social network can be presented as a graph.
0:12:06	As we can see it's a graph.
0:12:07	And this information diffusion depend on the user's actions to forward or not depends on
0:12:13	utility.
0:12:14	If I like or my friend may like it I will calculate the utility of
0:12:19	that and if
0:12:20	that utility is strong I'm going to forward or post.
0:12:23	If that is embarrassing then I am not going to do it. It'd damage my
0:12:27	reputation.
0:12:28	So this is a very similar to the
0:12:31	Graphical evolutionary game.
0:12:33	Now on the left hand side
0:12:35	is the Graphical evolutionary game. On the right hand side we have the Social network,
0:12:39	okay.
0:12:39	Now
0:12:40	there we have notion of fitness as I
0:12:43	explain. Right hand side we have the user's utility. Now we
0:12:46	basically we try to model the entire problem using this utility.
0:12:51	Is there interest or not that we can do so. So on left hand side
0:12:55	is that how can we map
0:12:58	and relay that utility or our interest to the notion of fitness
0:13:02	and then we can use Graphical evolutionary game which is a
0:13:05	very powerful tool that have been useful sometime, that we can use.
0:13:09	Okay, so how can we calculate that? I'd use a very simple example here.
0:13:15	Now let's look at this graph.
0:13:17	Now we have a user.
0:13:20	He chooses strategy to foward.
0:13:23	Okay. What is his own fitness?
0:13:25	Here I choose B=1.
0:13:27	This is based on his own.
0:13:29	And the neighbour is what?
0:13:30	We have Kf neighbor here. We have Kf neighbour
0:13:34	with utility that forwards also.
0:13:37	The rest of them K-Kf
0:13:39	choose not to forward.
0:13:41	Okay, so this is the utility of all his neighbours and you have selection intensity
0:13:46	if that
0:13:46	overall together with his own
0:13:49	fitness. This is the overall fitness of
0:13:53	the user who have the strategies to forward.
0:13:56	For someone not to forward,
0:13:57	we'll be doing the same,
0:13:59	okay.
0:14:00	Now for this
0:14:01	network, we have this configuration probability.
0:14:05	We have the fitness of the 'center guy'.
0:14:07	Now we need to calculate the fitness of all his neighbours.
0:14:11	Let's say someone with strategy to forward also.
0:14:14	What is the
0:14:16	fitness
0:14:17	and if someone with strategy not to forward
0:14:20	what is the fitness? So we calculated all this.
0:14:23	Now
0:14:24	with this we want to calculate what is the
0:14:28	probability that someone will change from forward strategy
0:14:32	to not to forward strategy?
0:14:34	And we can calculate all this, okay. The details I will just omit.
0:14:37	By doing all this
0:14:39	we can find the dynamics of information diffusion.
0:14:43	This dynamics of percentage of users who will
0:14:46	forward the information will be captured by this equation that we can calculate
0:14:50	and
0:14:51	the final evolutionary stable state
0:14:54	can be also obtained by this. And let me explain this. What is this
0:14:58	if the utility of forwards, all forwards is much larger than the rest who will
0:15:03	not do anything about it then
0:15:06	one hundred percent of the users will all forward and post.
0:15:08	Because it's too advantageous to them.
0:15:12	If on the other hand
0:15:14	not to forward have the highest utility and to forward
0:15:17	have the lowest utility then noone is going to do it.
0:15:20	Follows in between, there will be a stable percentage of population that will do that,
0:15:27	okay. This looks complicated.
0:15:29	Now K is degree of freedom. Meaning for each node how many friends you have.
0:15:34	This assumes that
0:15:35	(scale) is a large enough, okay. Let's assume K is sufficiently large. Now you can
0:15:40	see that basically it
0:15:41	is independent of the degree of freedom.
0:15:44	And meaning what? Meaning only the utility can determine how much percentage of the population
0:15:50	will forward or post information or not.
0:15:53	Okay, now let's see this
0:15:56	experiment. These are real data we acquired from Facebook. In this particular graph we have
0:16:02	4039 users with 88234 edges.
0:16:05	And here we have ten
0:16:07	subgroups, ten major user subgroups. First this is a social network so you can see
0:16:12	that in
0:16:13	the low scale it is a straight line here. This is so called scale-free network
0:16:18	meaning
0:16:18	that this is low scale so low scale people have very high
0:16:23	degree of connectivity, meaning it's very powerful. It is exponentially less, okay. So that is
0:16:29	something that we all knew before.
0:16:31	Okay, now let's look at Evolutionary Stable States.
0:16:35	That is
0:16:36	now let's look at left hand side first.
0:16:39	We use four cases as examples. So now we know the utility function, okay. Utility
0:16:44	function of
0:16:45	four parameters now are our model parameters. We want do model of information diffusion process.
0:16:50	So now look at this. The first case.
0:16:53	If utility of forwards is very high or higher than others, then what?
0:16:57	One hundred percent of user or everybody will post.
0:17:01	If
0:17:02	it reduces
0:17:03	and is not as good
0:17:05	however still good enough then
0:17:07	then there are some percentages in this case some sixty and some percent that will
0:17:12	post or forward.
0:17:13	If it's getting less
0:17:15	then about thirty and some percent
0:17:17	will post and forward.
0:17:18	If on the other hand
0:17:20	it's not good at all. The utility is so bad, there's nobody left to post
0:17:24	then
0:17:25	nobody is going to forward it
0:17:27	and do anything. And there is a rising time and we can calculate that time
0:17:32	and eventual population,
0:17:33	that percentage of people. We said there are ten subgroups and in each group they
0:17:37	all behave the same. This
0:17:38	is just to show that.
0:17:40	Okay, now this is something that we say: Okay, if we had a model parameter
0:17:45	what is the behavior?
0:17:46	Now, how about we don't have the model parameter. We don't.
0:17:49	We only had the data and this of the data from MemeTracker.
0:17:53	MemeTracker is this
0:17:54	it is
0:17:55	this news, this cycle network. It builds map
0:18:01	for the daily news cycle, okay.
0:18:03	By trying to link
0:18:05	that what others,
0:18:06	these are news report,
0:18:08	for these information.
0:18:10	You see as example here is a comment: we're not commenting
0:18:15	on that story I'm afraid. Okay. With this
0:18:18	then someone reports: we're not commenting on that. And there are three different links. And
0:18:23	then somebody
0:18:23	changed a little bit: we're not commenting on that story.
0:18:27	And then there is a link, okay. So
0:18:30	this is a very good source of
0:18:32	to find these information, how you diffuse, okay. So we had this data and this
0:18:36	is a huge amount of
0:18:37	data. As you can see that we have more than 172 million news, articles and
0:18:42	sources.
0:18:43	Okay, so we might from here
0:18:45	now what we want to do is
0:18:47	we want to learn the utility
0:18:48	that is, I have this vast amount of data,
0:18:51	can I find a utility? I can reduce to few parameters to describe how informations
0:18:56	may diffuse.
0:18:58	Okay, this is the result.
0:19:00	This is dynamic of four pieces of information.
0:19:03	This curve associates with word
0:19:06	Google Wave.
0:19:08	This associates with the word
0:19:10	Tehran.
0:19:11	The grey line is a real data.
0:19:14	And the red one is our fitting, our result.
0:19:17	You can see it very well and the blue line
0:19:20	is using these only
0:19:22	using only the machine learning approach.
0:19:24	So you can see that
0:19:26	what we can do can fit much better.
0:19:29	And most importantly that not only that
0:19:32	we describe this information diffusion phenomena into, in this case only in two parameter, because
0:19:38	we normalize these two, okay there are more and more. We only used two parameters
0:19:43	we can describe how the information
0:19:44	diffuse, when it stops, when it reached to the peak and the
0:19:48	way it (ends) and if we only have partial information we can predict.
0:19:51	I didn't have something to show you we can predict
0:19:55	and also.
0:19:57	Okay.
0:19:58	Now
0:19:59	let's do this experiment. We have five
0:20:02	group of sites from the database, okay. We know this is a
0:20:05	large database. Each group contains five hundred sites.
0:20:10	And we estimate the final stable population,
0:20:13	the lowest population that posts or forwards these information.
0:20:16	So these five groups we can find very interesting.
0:20:20	At the end it had different percentage of
0:20:23	users in this particular group.
0:20:25	That will post or forward information.
0:20:28	And
0:20:31	this is to show that the black one is, these are from,
0:20:36	these are our model.
0:20:38	And the red one is from the real dataset and you can see that they
0:20:42	are very consistent.
0:20:43	Okay, now back to this.
0:20:44	What does this means?
0:20:46	This is very interesting. We can see that group five
0:20:49	behave cohesively
0:20:52	or share major common interests.
0:20:54	That means that those users in this particular group
0:20:59	more than eighty one percent will agree with the social (group),
0:21:04	with their own networking friends. Whenever they post something
0:21:07	eighty one percent that they will also post.
0:21:11	So
0:21:12	and however in the group one. There only about
0:21:16	nineteen percent will do so.
0:21:18	Therefore if you hire an advertiser you want to
0:21:21	do advertisements. Which group are going to forward to?
0:21:25	And then to expect
0:21:27	a high
0:21:28	percentage of
0:21:30	results that many people with see.
0:21:32	That would be this group.
0:21:34	So we can
0:21:35	expect our group five in this particular case behave
0:21:38	cohesively and share major common interests. However group one
0:21:43	share rather very little common interest.
0:21:45	This is to mine
0:21:47	the coherence of group and
0:21:50	if we can do anything about it or not.
0:21:53	Okay.
0:21:54	Now I'm going to change gear to talk about Sequential user behavior.
0:21:59	Here I'm going to use some
0:22:01	well known website that we collected data from to
0:22:07	illustrate the problem. You all know Groupon. I think some of you may also use
0:22:10	it and advantage of it,
0:22:11	right? On Groupon we sometimes find out there is a very good deal for a
0:22:15	restaurant
0:22:16	that you always go and we buy from there.
0:22:19	And
0:22:20	and Yelp.
0:22:22	We're also using Yelp.
0:22:23	When I am here I am using Singapore app to determine which restaurant I want
0:22:29	to go.
0:22:31	Once we pull them together it gets very interesting.
0:22:33	We found that
0:22:35	this is a Yelp rating
0:22:36	for some very good restaurant.
0:22:39	We found that
0:22:41	Yelp's star rating decline, okay you see this is declining after a succesful Groupon deal.
0:22:49	With rating that's very high and
0:22:52	many people buy this Groupon
0:22:54	and then in the next few months you can see that
0:22:57	its rating is declining.
0:23:00	Why?
0:23:03	This is a negative network externality
0:23:06	in place here. The one I'm mentioned, I'm going to explain that.
0:23:09	So this degrading quality may be due to overwhelming number of customers, because it's too
0:23:15	successful. Many people buy and then in the following day many people will show up
0:23:20	and they only have three waitress. And then you have one hundred people who show
0:23:24	up. Quality would not be good.
0:23:26	Kitchen is limited and may not be good.
0:23:29	That could be the reason.
0:23:31	Okay, so
0:23:33	what is a phenomena here?
0:23:35	Learning
0:23:36	to get best deal. Everybody want to get best deal.
0:23:39	And then they make a decision to take advantage of that and then their decisions
0:23:44	affect each other.
0:23:45	Reduce all this ?? fashion from each other.
0:23:49	That's what we call
0:23:50	negative network externality.
0:23:53	Your decision and my decision
0:23:55	eventually affect each other,
0:23:58	okay.
0:23:59	So how can we have model this problem?
0:24:01	I like to go to a problem in machine learning called Chinese restaurant problem or
0:24:07	Chinese
0:24:07	restaurant process. Some of you are using it also.
0:24:10	I don't need to explain this is Singapore if you don't know
0:24:13	what Chinese restaurant problem is, just choose chinese restaurant you go to, okay.
0:24:17	So
0:24:17	Chinese restaurant problem
0:24:19	we have limited round tables with unknown size
0:24:22	and this round table is of unknown size could be cloud service or it could
0:24:25	be a deal,
0:24:26	okay.
0:24:27	And customer which choose table to sit
0:24:29	or open a new table.
0:24:32	For the Chinese restaurant process we have infinite number of a tables,
0:24:36	with infinite number of seats.
0:24:38	A customer enters and with predefined probably to
0:24:43	either choose table to sit or open new table.
0:24:46	This is non-strategic because it's parametric, okay.
0:24:50	So now
0:24:51	we introduce Chinese restaurant game.
0:24:53	We want to introduce this strategic behaviour, this decision making here. Why? Because
0:24:58	eventually we want to understand the negative
0:25:02	externality effect, to model it and to understand that.
0:25:06	Okay, so this Chinese restaurant game.
0:25:09	We have table
0:25:11	with unknown size
0:25:12	Rx (θ):
0:25:13	is system state (θ). System state is the condition of the restaurant. How much money
0:25:16	they have, how much
0:25:17	budget they have and the environment, okay.
0:25:19	A customer has signal S about the system state from
0:25:22	advertisement, from friends, from wherever they have been before.
0:25:26	Or from advertisement.
0:25:28	Which table to seat
0:25:30	to have maximum utility?
0:25:32	You know that if you go to chinese restaurant some day it'd very unfortunate many
0:25:36	people seat
0:25:36	with you and eat, is that right?
0:25:39	Utilities is very bad. It's not confortable at all.
0:25:42	And this is a negative network externality.
0:25:44	More customers at one table less space for each user and therefore
0:25:49	utility is less
0:25:50	for past and future.
0:25:52	What does that mean in future?
0:25:54	That mean
0:25:55	when you come in
0:25:57	based on the condition you sit at a table that may not be the best
0:26:01	decision.
0:26:03	In ten minutes
0:26:04	ten people may sit with you, okay. So you need to predict the future and
0:26:08	that is the problem and the
0:26:09	future affect you decision right now.
0:26:11	And that make it in fact difficult.
0:26:15	Okay, so this is a sequential decision making problem, customers make decisions sequentially
0:26:20	and information of observation that we have is that for every customer comes,
0:26:24	every player, they can only seat
0:26:27	at that moment how many people seat at the table.
0:26:30	And then also the signal, okay, from those who seat before him
0:26:34	that's all he can get.
0:26:35	So when first customer arrives
0:26:38	it picks up a table
0:26:39	and then the second one
0:26:40	pick another table, and so on and everybody picks one. Sequential decision making process.
0:26:49	Can
0:26:49	number three user make the best decision at that moment?
0:26:53	No.
0:26:54	If he cannot predict the future and see the future his decision at that moment
0:26:58	may not be the
0:26:59	best, because people not yet coming are going to affect him,
0:27:03	okay.
0:27:04	So now let's assume there is a perfect signal, meaning that
0:27:07	everybody know all the signals, all the decisions other people may want to do given
0:27:11	that
0:27:11	condition that they have.
0:27:15	I would not get to the details so I'll explain
0:27:17	then there is equilibrium grouping meaning the best utility is
0:27:21	that when you choose that
0:27:24	with this signal from everybody it will be the best.
0:27:29	When anyone changes to other table
0:27:32	the utility will only reduce.
0:27:34	So that would be the best strategy.
0:27:36	When a signal is perfect.
0:27:38	Let me explain this. So what will be the strategy?
0:27:41	When everything was,
0:27:43	all the condition if I come in I will know
0:27:46	ten minutes or thirty minutes later of what people come here
0:27:48	I have the perfect information. You know what?
0:27:50	If I come in
0:27:51	I'm going to know which table have the best utility and I'm going to choose
0:27:55	that.
0:27:55	The second one will choose that also, until the table is filled. And then they
0:27:58	will go to the
0:27:59	second best utility and do so, because the signal is perfect, right.
0:28:03	So
0:28:05	customers who come early
0:28:07	have
0:28:08	advantage
0:28:10	but those were protect signals.
0:28:11	In real life
0:28:13	it's not that easy.
0:28:14	It's imperfect signal. We cannot have the
0:28:17	perfect signal and therefore we have to learn
0:28:20	from the noisy signal to form our belief and this is the learning process
0:28:25	and we can use Bayesian learning or some other thing to construct the belief that
0:28:29	we have,
0:28:31	okay.
0:28:32	So
0:28:35	now this.
0:28:36	What is the best response? The best response is:
0:28:39	let's choose the best utility.
0:28:42	But in order to choose the best utility we need to know the final distribution
0:28:46	of the users and like
0:28:48	I said we need to know who come in, where they will seat?
0:28:52	So this utility function depend on subsequent users' decisions
0:28:56	and this is a negative network externality.
0:29:00	And we need to predict the future decisions to find a best response
0:29:03	therefore we had to ?? backward induction to do so. So that everybody is going
0:29:08	to have optimal
0:29:09	decision we can predict and then backward induction from there.
0:29:13	Okay. Now this is the summary of this Chinese restaurant game and this is some
0:29:18	example.
0:29:19	Remember this?
0:29:20	I say that
0:29:21	if there is a good Groupon deal
0:29:25	the utility will go down like that, okay. The red one is the real data.
0:29:29	And now let's use linear model in this model
0:29:32	the utility, okay.
0:29:33	So
0:29:34	this is utility that we model. We learnt from the real data that this is
0:29:37	the utility.
0:29:38	And based on this utility
0:29:40	this utility and then we can attain all these parameters that we have and then
0:29:44	we can use
0:29:46	what we had just derived.
0:29:47	We consider all this negative network extarnalities. We can predict the future to find optimal
0:29:52	result.
0:29:53	As you can see the blue one is learning with negative network externality that we
0:29:58	propose and then
0:29:59	the red one is without. Meaning only at that moment.
0:30:02	We can increase ?? DSSR rating.
0:30:04	We can indeed increased the rating by doing so.
0:30:08	So if we consider this
0:30:11	negative network externality then the strategy will perform much better.
0:30:17	In fact we can also use this to devise strategy.
0:30:20	This is a New restaurant's strategy, okay. Now we have a new restaurant.
0:30:24	This is a low quality restaurant and a high quality restaurant.
0:30:29	Now
0:30:30	what is the number of customers?
0:30:34	Look at this. For a low quality restaurant
0:30:37	in order to increase the number of customers who arrive
0:30:41	the deal price had to be less.
0:30:45	If your quality is not good, you make it cheaper.
0:30:49	If
0:30:50	high quality restaurant even if deal price is a little higher you can still make
0:30:55	a good profit.
0:30:57	This is a signal quality. Your advertisement and also all affects to be none.
0:31:03	If you a high-quality restaurant you want signal quality
0:31:06	to be very good, so people know this is a good restaurant.
0:31:10	If you have a low quality restaurant. Don't let know too much about you, okay.
0:31:14	That would be better.
0:31:16	And next we get to revenue. This is now what is the revenue?
0:31:20	High quality and low high quality restaurants they can also all achive peak revenue.
0:31:26	So
0:31:27	for high quality restaurant to achive peak revenue
0:31:30	the signal equality had to be good
0:31:32	then the revenue was higher. Let people know
0:31:34	and then
0:31:36	through the word of mouth people will come.
0:31:39	However for low quality restaurant it's not good.
0:31:42	Don't let people..
0:31:44	The better the signal quality the lower revenue you are going to have and see.
0:31:49	Okay, so I think this all is common sense, however it's highly nonlinear
0:31:53	and we had to model that and this is indeed a very highly nonlinear problem
0:31:57	and I don't have this
0:31:59	slide to show you. And in our paper you can take a look. So high
0:32:03	quality restaurant should try
0:32:04	to increase the signal quality and low quality address should hide the
0:32:09	quality information and use low deal price to attract the customers.
0:32:15	Indeed we developped a family of this. We had a Chinese restaurant game and we
0:32:18	also had Dynamic Chinese
0:32:19	restaurant game for people who come and go.
0:32:21	And that is for example just like this Wi-Fi, okay. Many people come here and
0:32:25	many people may leave and
0:32:27	they decide which network they want to choose. And also we had this Indian Buffet
0:32:30	Game.
0:32:31	And that is
0:32:32	interesting. We see all this in US, okay.
0:32:36	Chinese restaurant in this all round tables you can
0:32:39	see in Indian restaurant also had these buffet thing.
0:32:42	Okay so
0:32:43	we have multiple
0:32:45	choice of what we can do
0:32:47	and this can be applied to many different scenarios.
0:32:50	And we've applied to for example we have seen this in Yelp, okay. There is
0:32:54	Yelp on Amazon.
0:32:55	And this is so for this review, okay. It's a sequential decision and Question and
0:33:00	answer sites. You can see
0:33:01	all these Question and Answers sites, we also had a problem on this.
0:33:05	Also many other sites as long as you have this sequential decision process that come
0:33:10	in
0:33:10	and why these .. we can apply this to that, because social computing system in
0:33:16	this case, they are all structured in the same fashion that
0:33:20	users arise sequentially
0:33:21	and to make decisions on
0:33:23	whether to do something or not, whether to forward something or not.
0:33:26	And also there are externalities amount users.
0:33:29	Your decision, your actions will affect each other and also there is some uncertainty. So
0:33:34	in this
0:33:35	sense Chinese restaurant game will be a natural choice in modeling and analyzing these users'
0:33:40	behavior, so
0:33:41	that we can come up with the best
0:33:43	piece of the model we have seen and also to devise some incentive mechanism,
0:33:47	okay. To attain some results that you want.
0:33:50	And I believe that in speech and language community
0:33:53	that you also have something that you may want to do.
0:33:56	Speaking of this mechanism
0:33:58	we want to see this I believe that you use a lot of this a
0:34:01	labeling also in this community.
0:34:03	Now you have seen all this
0:34:05	that's all from users' point of view, so you may ask can
0:34:08	system designer design something based on this?
0:34:10	Yes.
0:34:11	So we want to see if we can design some mechanism to
0:34:14	attain something that we really want.
0:34:17	In this case I want to show you
0:34:19	is that how can we collect
0:34:21	good
0:34:22	data
0:34:23	okay.
0:34:25	Large scale labeled dataset
0:34:28	is very important in many applications. I think you all know better than me.
0:34:31	Because more data leads to better accuracy,
0:34:34	however large scale in-house annotation is very expensive.
0:34:38	You need to have a thick pocket to do so.
0:34:41	And often it becomes a bottleneck for that.
0:34:43	And recently you have seen that microtask crowdsourcing is very popular.
0:34:49	It claims to be a very promising way, because it can handle large volume, short
0:34:53	time and low cost.
0:34:54	All the benefits
0:34:56	and a question is: is that so?
0:34:58	We have many website that can do so,
0:35:01	okay.
0:35:03	There's a problem
0:35:04	you see all the
0:35:05	positive side, but there's something that didn't tell you.
0:35:08	That
0:35:09	due to limited budget, okay you and I don't have deep pockets, okay
0:35:13	so that the collected data often have low quality.
0:35:18	Why
0:35:19	low quality? I will explain that. T mutual incentive thing, okay.
0:35:22	Now
0:35:24	for machine learning all signal processing, okay.
0:35:27	Sometime we say okay
0:35:29	let's deal with it, let's cope with this
0:35:31	so let's filter out low quality data or cope with
0:35:34	noise with this modified algorithm to be more robust and probable.
0:35:38	But there is after-effect,
0:35:40	okay. This is a our typical behavior up to that.
0:35:44	Now with these decision learing solution
0:35:47	maybe we can ask
0:35:49	can we do a better job before that?
0:35:51	Can we incentivize
0:35:54	with a mechanism to ensure high quality data from the first place?
0:36:00	Okay
0:36:02	you want to give me more money? No, I don't have money to get you.
0:36:05	Let's see if we can detain ?? something
0:36:07	that can make you happy.
0:36:09	We allow
0:36:09	increase in your budget, maybe reduce your budget, okay. What that is
0:36:13	okay.
0:36:15	This
0:36:16	crowdsourcing
0:36:18	has such characteristic
0:36:20	repetitive and tedious small tasks. Yes or no, correct or not correct. And you do
0:36:25	that
0:36:25	and how much do they earn?
0:36:27	USD 0.04 cents for one thing.
0:36:29	Too many
0:36:31	too many of them have house or wife and they have nothing to do,
0:36:34	when baby is sleeping they can make a little bit more budget, so they can
0:36:38	help pay for some of the bill.
0:36:41	So worker receives a small amount of reward and there's no competion among workers.
0:36:45	So what's the problem?
0:36:47	The problem is that this may lack proper incentive.
0:36:51	It is profitable for a worker to submit poor solutions.
0:36:55	Yeah?
0:36:56	Because of the
0:36:58	the number of solution is just too high, nobody is going to check on it.
0:37:02	It'd be too expensive for you to check on it, right?
0:37:04	Think about it. That's why it's expensive.
0:37:07	So
0:37:08	and however you want to check on all this, just like I said.
0:37:11	Provide incentive is challanging. Why? Requesters typically have low budgets for each task
0:37:18	and also to offer better incentive you have to pay more, so people can do
0:37:22	better job, you need more
0:37:23	money and also in order to verify the solution is also expensive. All these are
0:37:28	problems.
0:37:29	So what incentive mechanism should
0:37:31	requesters employ to collect high quality solutions in a cost-effective way?
0:37:38	So here we propose a mechanism
0:37:40	that sounds interesting and we also have the Oracle analysts that found it can work
0:37:44	and we also did
0:37:45	experiments, so what is that? I'll show you.
0:37:47	Now up to now there are two very basic reward mechanisms
0:37:51	that we can do so. That we can evaluate the solution.
0:37:54	One is called consensus mechanism. Consensus mean what? Okay
0:37:58	now I have all these solutions people submitted.
0:38:00	I do a majority vote. I don't know the right answer but
0:38:04	I do a majority check. After majority check I know it's correct.
0:38:10	Correct fits the majority check.
0:38:12	Or I can do reward accuracy mechanism. Meaning I can
0:38:17	I can verify
0:38:19	the solution is correct or not with probability.
0:38:21	It's too expensive with probability. I want to verify all of them, but I don't
0:38:25	have time.
0:38:25	If I had time I'd it myself.
0:38:27	Okay, now I have these people so I may have some probability points, probability principle,
0:38:32	probability
0:38:32	that I'm going to make a sample
0:38:34	and then to check on the solution.
0:38:37	Now you can see these two had cost C or ??
0:38:41	And this relates to a parameter and this is not under control of requester.
0:38:46	So there's a problem here.
0:38:48	This cost, the one I mentioned, is a fundamental problem.
0:38:51	For the traditional (methods) it's a problem.
0:38:54	So
0:38:55	now
0:38:56	before we do so
0:38:58	let's
0:38:59	come up with this incentive mechanism first that's called Mt, okay.
0:39:03	This is called Incentive Mechanism via Training. Idea is this
0:39:06	to employ quality-aware training to reduce mechanism cost.
0:39:11	What does that mean?
0:39:13	We will assign
0:39:15	enforced unpaid training to workers when
0:39:18	performing poorly.
0:39:20	You flip off one poorly
0:39:22	then they had to go to some training time and in this training time they're
0:39:25	not going to make any money
0:39:27	okay and to guess some credential back so before they can do that again.
0:39:32	So this is more like a punishment if they don't do well.
0:39:35	So
0:39:36	okay now we had two states. Everybody had two states.
0:39:39	What are they?
0:39:40	Okay this is normal.
0:39:42	They produce results, give them money so that they are happy.
0:39:46	However
0:39:47	if they don't do well
0:39:49	they are going to go to the training state.
0:39:51	And in this training state workers have to do a set of training tasks.
0:39:56	Unpaid.
0:39:58	Okay
0:39:58	to gain qualification
0:40:00	for the working state.
0:40:03	Okay this can be just a policy
0:40:05	to deter it,
0:40:06	not necessary may be implemented. You will see that.
0:40:09	So everybody now
0:40:10	had two states.
0:40:11	Now you can see two states.
0:40:13	One of these is worker's action at working state.
0:40:16	Now everybody's action at working state.
0:40:19	This will determine the probability of this worker may continue to be in working state
0:40:24	or
0:40:24	he may go to the training state.
0:40:26	And for how long it may get from training state back to working state.
0:40:31	Worker will prefer to be here to make money.
0:40:34	However if the job is not good enough he is going to go here. Nobody
0:40:38	want to come over here.
0:40:41	Okay, so worker's action at this moment
0:40:44	may not only affect his immediate utility, but also his future utility or both
0:40:51	together.
0:40:52	There's a notion of this long-term expected utility here
0:40:56	and basically for you formulate this problem
0:40:58	it is
0:41:00	is a like the
0:41:01	Markov Decision Process (MDP).
0:41:02	But this Markov Decision Process is not easy, because that
0:41:07	the
0:41:08	this MDP faced by each workers also depends on others' actions.
0:41:14	Our action, how's the quality of our work is depending on how others are doing
0:41:19	their quality work, so it is indeed
0:41:21	Game-Theoretic Markov Decision Process. That means what you are doing affects what I am doing.
0:41:27	What I'm doing will affect what you're doing.
0:41:29	Okay
0:41:30	so from here
0:41:31	I will just summarize the results. Basically we show that
0:41:34	with this Game-Theoretic MDP
0:41:37	there is a guarantee of Nash Equilibrium
0:41:39	as long as the number of training set N is large enough.
0:41:44	If the number or training set,
0:41:45	I mean the training problem to be done N is too small,
0:41:48	nobody will worry, okay. I go to the training state, very soon I can get
0:41:52	out just like somebody commits a crime,
0:41:54	you put them in the jail and one day
0:41:56	they come out. You know what? Many people walk out and in everyday.
0:41:59	If you put them in for ten years. Very few will .. they will think
0:42:03	twice before do something again.
0:42:05	Okay, now
0:42:06	we also show that
0:42:08	if we
0:42:08	follow certain design procedure
0:42:11	we can achieve arbitrary low mechanism cost. The cost can be even
0:42:15	very small, almost to zero, if we design it correctly.
0:42:21	Okay, now let's see the experiment setup.
0:42:25	Task, a very simple task
0:42:27	this is to calculate the sum of two randomly generated double digit numbers.
0:42:33	Second year,
0:42:34	second grade student can do so, okay. And (workers are given) ten point for each
0:42:38	accepted solution.
0:42:39	And the goal is for
0:42:40	these participants to maximize or to accumulate these points. This is not real money. We
0:42:45	work
0:42:45	with points, we are doing an experiment.
0:42:48	And the text assigned to these participants comes in three sets. And each one is
0:42:53	limited to three minutes, okay.
0:42:54	Only in this limit had to do so.
0:42:58	And so we had these three sets.
0:43:00	The first set is that reward accuracy mechanism with sampling probability 1, that mean
0:43:05	in this
0:43:06	in this set
0:43:08	we tell all the participants
0:43:10	that we are going to check your every solutions.
0:43:13	And the set number two
0:43:15	with sampling probability 3.
0:43:17	That we will only have thirty percent or probability 0.3 to check your solutions.
0:43:23	That is set two. Set three is
0:43:26	sampling probability policy 0.3 only, however we also tell them we also have this incentive
0:43:31	mechanism.
0:43:32	That is if you don't do well enough
0:43:34	you are going to do some training problem before you can come back to do
0:43:38	that.
0:43:39	Okay, so what are the results that we get?
0:43:42	If we tell them that we are going to have this sampling probability 1, you
0:43:46	can tell that
0:43:47	the accuracy, this is the histogram, number of wokers and their accuracy, the accuracy is
0:43:52	quite good.
0:43:55	With 0.3 probability you can see that most of them don't care
0:43:59	okay, because only
0:44:02	0.3 probability will be checked.
0:44:04	And this is the sampling probability.
0:44:08	You can say .. how about this guy? You know there's always
0:44:11	nerd, okay. And they always do good job.
0:44:15	And now with this
0:44:17	sampling probability 0.3. We tell them: hey we have this, if you got checked, you
0:44:21	are going to put
0:44:22	into there. Now you can see the result is basicly as good as this result.
0:44:29	And you say: Okay, who are these students?, can they do this easily?
0:44:33	They are all
0:44:34	engineering students, graduated student, okay. With forty one of them to participate.
0:44:40	So these show that
0:44:41	in fact with such machanism we indeed can collect good data from beginning. We don't
0:44:46	have to
0:44:47	take it saying that data is not good enough and just take it from there.
0:44:52	Let's
0:44:53	do something you know
0:44:56	do to better if we can collect much better data.
0:45:00	Okay, the conclusion and final thought here is that here we see three social media
0:45:06	examples
0:45:08	to illustrate the concept of decision learning.
0:45:12	In this information diffusion problem
0:45:15	we
0:45:16	try to learn
0:45:18	users' utility
0:45:20	functions
0:45:21	for understanding and modeling strategic decision making.
0:45:25	And for the Groupon and Yelp example that we've seen
0:45:28	we try to learn from each others' interactions.
0:45:32	We have this negative network externality
0:45:35	for better decision making.
0:45:37	And for this crowdsouricng we design a mechanism to
0:45:41	enforce users' strategic behaviour to attain better quality data for better learning.
0:45:47	So
0:45:49	by decision learning
0:45:51	we mean that it is the learning with strategic decision making.
0:45:55	And these two are something that has to be considered together.
0:45:59	And we can analyze users' optimal behaviour from user's perspective
0:46:03	and we can also design optimal mechanism from
0:46:05	system designers' perspective.
0:46:08	Now for the
0:46:10	coming big data tsunami. This is something that we envision.
0:46:14	We are going to have big data here
0:46:17	and these big data and
0:46:18	is not
0:46:20	?? steady data overlay
0:46:22	that
0:46:23	Bob and Alice is going to
0:46:26	learn
0:46:27	from this data and be it whatever model here. I say Markov decision process, it
0:46:30	can be anything to learn from here.
0:46:32	Once you learn something from here, you are going to make decision.
0:46:35	But (when) you make decision you are going to take action and most of these
0:46:38	are sequential actions. We are at
0:46:40	different places and different people.
0:46:42	And then you have the outcome and this outcome will come back to affect the
0:46:46	data.
0:46:47	Like I said the data is not static data
0:46:49	and therefore
0:46:50	Bob and Alice
0:46:52	one can be in Singapore and one can be in US, their actions and their
0:46:56	decisions will eventually affect each other,
0:46:57	because the data have been changed.
0:47:00	And this is something that
0:47:02	we believe that is something that
0:47:04	both
0:47:05	challenging and the potential future for research. And I believe that
0:47:12	this is something that also can be
0:47:15	very promising in speech and language community.
0:47:20	Okay, that's it. End of story. Thank you for your attention.
0:47:32	So I am glad, I think we all made a good decision to come to
0:47:36	professor's Ray Liu talk.
0:47:38	We have time for few questions.
0:47:47	I think this was a wonderful talk.
0:47:49	I've noticed that in your Chinese restaurant decision process that you had this backward reduction
0:47:54	stamp and
0:47:55	you showed your backward reduction for just one time step, from T+1 to T.
0:48:00	The question is: is it computationally feasible to do backward reduction from the end of
0:48:05	the end of time to different time?
0:48:07	Oh yes. That's correct.
0:48:49	I think it's surprising issue, right? I mean it's easier
0:48:52	okay to make one cent for task, if they are more difficult they make 10
0:48:55	cent for task and they had to think twice
0:48:57	right? Which one can make better profit?
0:49:01	So I think that different
0:49:03	what I show is just one dimension, meaning I just used one example. In that
0:49:08	particular we
0:49:08	designed mechanism in that way.
0:49:11	In a different scenario
0:49:12	you can design different mechanism in a different way. I think you should be all
0:49:17	fine. There's
0:49:17	no single solution here. This is just a concept, meaning that we can do so.
0:49:22	And I used an example to illustrate that.
0:49:27	Do I have to push? Yes.
0:49:29	I am a little intriqued by the experiments that you have done.
0:49:32	I'm just wondering that if we look at social networking
0:49:36	you know first time you are using it, maybe you're kind of getting
0:49:42	bogged by that information diffusion that happens.
0:49:46	But you know second time or third time user, you just ignore all the incentives.
0:49:50	I'm just going to do what I normally do.
0:49:52	Say every time you do that experiments I am going to get a new set
0:49:55	of
0:49:56	people who are
0:49:57	using this and then making decisions based on that?
0:50:02	Okay, because there are some microphone problems I didn't hear clearly. But let me
0:50:08	guess that I think you are saying that
0:50:10	if they do many times they have different experiences so may game the system, right?
0:50:16	So I think it would not (happen), because we use game theory to formulate that,
0:50:20	we found
0:50:20	ultimate equilibrium in that solution, so
0:50:23	basicly doesn't matter how they game,
0:50:25	they will have the best strategy that can achieve their best equilibrium solution for that
0:50:30	so doesn't matter what they game. They only have
0:50:33	some of the best strategy they can do, they can go.
0:50:35	Whatever other strategy would not get them better so it's okay.
0:50:44	Anymore questions?
0:50:53	So seems like they want to make a decision to go for coffee
0:50:57	early, but before that
0:50:59	I would like to show our gratitude to a professor Ray Liu and our vice-president
0:51:05	Haizhou will present
0:51:06	a souvenir to professor Liu.

Decision Learning in Data Science: Where John Nash Meets Social Media

Keynotes

K. J. Ray Liu, University of Maryland, College Park, USA