Speech Transcript - Augmenting conversations with a speech understanding anticipatory search engine

0:00:15	All right, first let me thank you for the invitation and the opportunity
0:00:20	to come to Olomouc.
0:00:22	It's so funny, because a friend of mine was saying, "Oh, you're going to the middle
0:00:25	of nowhere," and I said, "No, I'm going to the middle of Moravia."
0:00:30	And I really enjoy coming to new places that I've never been to.
0:00:35	So I'll talk about what I think is
0:00:38	a new trend, sort of a technology trend, that is really emerging and taking off,
0:00:43	and that is this notion of anticipatory search, and how much speech can contribute
0:00:49	to that.
0:00:52	Here's sort of our vision. Imagine you're having a conversation with a friend, and she
0:00:56	says to meet her somewhere in five minutes, and as I'm putting down the phone
0:01:00	and I look at the screen, this is what I want to see, right?
0:01:05	I want to
0:01:06	basically have the directions of where to go and where we need
0:01:10	to be in five minutes.
0:01:12	And if you think about it, we already have all the pieces, right? We
0:01:16	have user location, we have good maps, we have good directions, we have speech recognition,
0:01:22	we have some reasonable understanding, and so it's kind of a matter of putting it
0:01:27	all together into one compelling application.
0:01:32	So that's kind of the premise: we realized that the way you find information
0:01:37	is changing,
0:01:39	and we're moving towards a kind of query-free search, in the sense that instead
0:01:44	of you having to be proactive when you want to find something out, having
0:01:48	to fire up a browser,
0:01:50	find a search box, type in your query and get results, it can be much
0:01:55	more proactive: given your context, what you've said and where you are,
0:02:00	the information can come to you, as opposed to you having to find the information.
0:02:06	But of course we're not alone in this idea.
0:02:09	Ray Kurzweil, the technologist and futurist who recently joined Google, has
0:02:15	a pretty similar vision: he sees search engines that
0:02:20	won't wait to be asked questions,
0:02:22	but will be listening in on our conversations,
0:02:25	what we say, what we
0:02:26	write, what we read, what we hear, and will anticipate our needs.
0:02:30	And that's
0:02:31	really the same premise that
0:02:34	Expect Labs was built on.
0:02:36	So let's look at some of the enabling trends
0:02:40	for anticipatory search.
0:02:42	There are mobile devices,
0:02:44	there's AI that is making progress,
0:02:47	and so if you put them together, there are applications that can take contextual
0:02:52	information and start making good predictions about what the informational needs of the
0:02:57	user might be.
0:02:58	So let's look at these in more detail.
0:03:02	It's obviously no surprise that
0:03:04	you can probably go anywhere,
0:03:08	and, you know, a few minutes later there are a couple of
0:03:13	videos on YouTube already about that event, and hundreds of pictures. In
0:03:18	fact, there are technologies now that are trying to recreate some sort of a 3D
0:03:22	map just from images taken from different points of view.
0:03:27	So
0:03:29	then there's the amazing growth of mobile devices. This is a statistic
0:03:35	for smartphones and tablets running
0:03:38	iOS and Android, and of course in absolute count the US
0:03:43	and China, because of their
0:03:44	population, have the highest numbers, but if you look at the growth, the growing markets
0:03:49	are basically Southeast Asia, South America and some other
0:03:54	growing markets.
0:03:56	So
0:03:57	we're ending up in a position where pretty much every adult is going to have
0:04:03	a smartphone in their pocket,
0:04:05	and that really changes the possibilities of what you can do with it,
0:04:11	because these mobile devices have a lot of sensors. You can
0:04:16	think of, well, of course we have cameras, we have microphones, Wi-Fi, there's
0:04:22	GPS,
0:04:23	but also, if you look closely, for example at the
0:04:26	Samsung Galaxy S4, there are gesture sensors, proximity sensors, a compass, accelerometers;
0:04:33	there's even a humidity sensor, so that if you drop your phone in the water
0:04:38	they can void the warranty,
0:04:41	and
0:04:42	a barometer.
0:04:43	So basically it turns out that these devices that we carry in our pockets
0:04:48	to some extent know more about where we are than we ourselves might be
0:04:52	aware of.
0:04:56	And there's more, right?
0:04:58	We all know about Google Glass, which also has
0:05:02	a bone-conduction transducer in addition to other stuff, and then there are more futuristic things,
0:05:08	right? There's actually research
0:05:12	that is able to do recognition based just on
0:05:18	facial muscle activity: you have these sensors, so I could be talking
0:05:25	without phonation and you'd still be able to recognize it. In fact, I
0:05:30	was telling Mary that this may be an interesting challenge
0:05:34	for some
0:05:36	future evaluation.
0:05:39	Then there are these more futuristic electroencephalogram headsets;
0:05:45	it's still not very clear what you can do with
0:05:50	them, but they're becoming more stylish, so people might start wearing them.
0:05:55	And then there are interesting things like this patent application from Motorola,
0:06:00	where
0:06:01	basically they have this idea that we'll all wear an electronic tattoo here
0:06:07	on our neck
0:06:07	that is going to have a microphone and can also help with speech recognition.
0:06:13	There are all kinds of ideas about how to
0:06:17	collect more data about what we do and where we are.
0:06:21	And then there's the progress in the back end: once we get this
0:06:25	information, what can we do with it?
0:06:28	There's been some talk here about how much progress we're making. We're all familiar
0:06:33	with this
0:06:34	chart of the famous word error rates for different tasks.
0:06:39	Now, are we reaching some sort of a plateau? We know that's not
0:06:43	the case, because there's work in dynamic speaker adaptation, there's all this work in
0:06:49	deep neural networks that we've been talking about, and also work in extremely large language
0:06:53	models that are making recognition better.
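The word error rate plotted on that chart is the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. Below is a minimal sketch of the standard dynamic-programming computation. The example strings come from this transcript and are only illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution, or match when cost == 0
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `word_error_rate("any good places around union square", "any good base around union square")` is 1/6: one substitution over six reference words.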
0:06:58	There's also some work in natural language understanding around conversation and topic modeling,
0:07:03	there's the knowledge graph, which I'll talk about in a second, and so if you put all these
0:07:07	together with some machine learning algorithms, we're getting to a point where we can
0:07:13	start to be reasonably good at understanding
0:07:16	a human conversation.
0:07:19	So,
0:07:20	in this audience this is obviously very well known, but it
0:07:24	is remarkable that we now have
0:07:27	these fairly substantial improvements in conversational speech recognition accuracy thanks to these
0:07:33	deep neural networks, and there's work here from Microsoft, IBM and Google, and there are
0:07:37	others in the room who are working on this.
0:07:40	Something you might not be as familiar with is the fact that deep learning
0:07:45	is also being applied to natural language understanding,
0:07:49	and I
0:07:51	wanted
0:07:52	to make sure that you're aware of the so-called Stanford Sentiment Treebank,
0:07:56	which was recently released by Stanford University,
0:08:00	and there's a nice paper, "Recursive Deep Models for Semantic Compositionality Over a Sentiment
0:08:05	Treebank," by Richard Socher and colleagues, from the same group as Andrew Ng
0:08:09	and Chris Manning.
0:08:12	What they did is
0:08:16	they made available this corpus of over eleven thousand annotated sentences that have been
0:08:25	parsed into binary parse trees, and then every node is annotated with its
0:08:30	sentiment, from very negative through neutral to very positive.
0:08:37	And then the interesting part is
0:08:41	how
0:08:42	they make use of multiple
0:08:48	layers of a deep neural network to actually model the different levels in
0:08:54	a parse tree,
0:08:56	so that through bottom-up composition you can really find the sentiment value at any
0:09:03	node by doing these steps.
0:09:05	So, for example, if you look at the sentence "This film doesn't care about cleverness,
0:09:09	wit or any other kind of intelligent humor,"
0:09:12	there are words like "humor," which in this case is a plus, a very positive one, and "intelligent" also, so
0:09:17	this whole parse tree
0:09:19	is positive,
0:09:20	except when you reach the negation: it just doesn't
0:09:24	care about these, and so the overall sentiment is negative.
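To see why composing bottom-up over the tree can flip polarity where a bag of words cannot, here is a deliberately simplified sketch. It is not Socher et al.'s recursive neural tensor network. The learned composition function is replaced by hand-set rules (average the two children, and flip the sign under a negator), and the lexicon scores and negator list are invented for illustration.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf phrase, or a binary node (left, right)

# Hypothetical toy lexicon: polarity in [-1, 1]; unknown words are neutral.
LEXICON = {"cleverness": 0.6, "wit": 0.6, "intelligent": 0.7, "humor": 0.8}
# Hypothetical negation operators that flip the polarity of their sibling subtree.
NEGATORS = {"doesn't care about"}

def compose(tree: Tree) -> float:
    """Score a binary parse tree bottom-up (hand-set rules, not learned weights)."""
    if isinstance(tree, str):
        return LEXICON.get(tree, 0.0)
    left, right = tree
    if isinstance(left, str) and left in NEGATORS:
        return -0.8 * compose(right)  # negation flips and dampens the right subtree
    return (compose(left) + compose(right)) / 2  # crude stand-in for a composition function

def leaves(tree: Tree) -> list:
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def bag_of_words(tree: Tree) -> float:
    """Order-insensitive baseline: mean polarity of the leaves, ignoring structure."""
    words = leaves(tree)
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)

SENTENCE = ("this film",
            ("doesn't care about",
             (("cleverness", "wit"),
              ("or any other kind of", ("intelligent", "humor")))))
```

On this tree the bag-of-words baseline comes out positive, because every lexicon hit is positive, while `compose(SENTENCE)` comes out negative, because the negation node sits above all of them.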
0:09:28	And this is very powerful, because up to now the traditional model has been bag of
0:09:33	words,
0:09:34	a vector space, and it's
0:09:38	hard to model these relationships, and
0:09:41	we all know that
0:09:42	language has a deep structure, a kind of recursive structure, and
0:09:47	there are long-distance relationships between
0:09:49	certain constituents within the sentence
0:09:51	that are hard to capture
0:09:54	unless you
0:09:56	really get a full sense of the parse tree.
0:09:58	So, applying this,
0:10:01	they're getting gains of, well,
0:10:05	about twenty-five percent
0:10:07	improvement in the
0:10:08	accuracy of sentiment recognition over this corpus, which, by the way, is
0:10:13	about movies; it's from
0:10:15	movie reviews.
0:10:17	So it's encouraging
0:10:19	that this technique, which is now popular in ASR, can also be transferred to natural
0:10:25	language understanding.
0:10:27	Then there's another very important trend,
0:10:30	the way I see it, in how we can improve language understanding.
0:10:35	And
0:10:36	as someone was saying earlier today, the U in
0:10:40	ASRU has kind of been missing a bit;
0:10:42	I think knowledge graphs are really the answer to that.
0:10:46	And why is that? Well, because
0:10:48	we can go from these kind of disembodied strings
0:10:52	to anchored entities in the real world. There's a nice blog post
0:10:57	that says "from strings to things."
0:11:00	So what is that?
0:11:03	A knowledge graph, really, you can think of as this giant network where the
0:11:08	nodes are concepts and there are links that relate one entity to another. For example,
0:11:13	George Clooney appears in Ocean's Twelve; this is movies
0:11:19	and actors
0:11:20	and how they relate to each other.
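At its simplest, such a graph is a set of (subject, relation, object) triples, indexed so you can walk links in either direction. The sketch below is a minimal in-memory version of that idea. The relation names (`appears_in`, `instance_of`) are hypothetical and are not the schema of any particular knowledge graph.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # subject -> [(relation, object)]
        self.in_edges = defaultdict(list)   # object -> [(relation, subject)]

    def add(self, subject, relation, obj):
        self.out_edges[subject].append((relation, obj))
        self.in_edges[obj].append((relation, subject))

    def objects(self, subject, relation):
        """Follow outgoing links, e.g. which films an actor appears in."""
        return [o for r, o in self.out_edges[subject] if r == relation]

    def subjects(self, relation, obj):
        """Follow incoming links, e.g. which actors appear in a film."""
        return [s for r, s in self.in_edges[obj] if r == relation]

kg = KnowledgeGraph()
kg.add("George Clooney", "appears_in", "Ocean's Twelve")
kg.add("Brad Pitt", "appears_in", "Ocean's Twelve")
kg.add("George Clooney", "appears_in", "Gravity")
kg.add("Ocean's Twelve", "instance_of", "Film")
```

Here `kg.objects("George Clooney", "appears_in")` returns both films, and `kg.subjects("appears_in", "Ocean's Twelve")` returns both actors.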
0:11:23	And the interesting part is, if you know some history,
0:11:27	you might remember Cyc,
0:11:30	which was an attempt (well, OpenCyc still exists),
0:11:34	an attempt to create this very complex representation of
0:11:39	all known human
0:11:41	knowledge, especially common sense,
0:11:44	but the problem is that it was built by hand,
0:11:47	and they spent a lot of time deciding whether a property of an object is
0:11:51	intrinsic or extrinsic,
0:11:54	kind of splitting hairs over something that is not quite relevant. The
0:11:58	way these knowledge graphs are being built now is different.
0:12:01	You start
0:12:05	with Wikipedia,
0:12:08	and there,
0:12:09	there are datasets, machine-readable versions of Wikipedia, that you
0:12:13	can ingest, and then you can start extracting these entities and the relationships. There's
0:12:18	some degree of manual curation, but we can get pretty far with an automatic process,
0:12:22	and so companies are doing this.
0:12:25	And,
0:12:26	for example, one knowledge graph has ten million entities and thirty million properties,
0:12:32	you know, connections. Microsoft has their own, Satori, and they have three
0:12:36	hundred million entities;
0:12:38	another has five hundred twenty million, and it has eighteen billion properties.
0:12:43	And then there are also more specialized ones,
0:12:46	like Factual, for example, which is a database of places, points of interest and local businesses,
0:12:52	and they're also getting to sixty-six million entries
0:12:56	in fifty different countries.
0:12:58	And then of course you can take social media
0:13:01	and see its graph of entities and relations, where the entities are people, as
0:13:07	a version of a knowledge graph, and so LinkedIn is now at two hundred twenty-five million
0:13:11	users and Facebook is over a billion.
0:13:15	So
0:13:16	if you think carefully about this, it means that
0:13:20	anytime you refer to a concept
0:13:23	or a named entity, like a place, an organization or a person,
0:13:27	you're actually able to grab that and map it onto one of these
0:13:32	entities.
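Mapping a mention onto an entity usually starts with spotting: matching surface strings in the utterance against an alias table and returning the candidate entities for each match. The sketch below does greedy longest-match spotting. The alias table and entity IDs are made up for illustration. Notice that "union square" yields two candidates, and disambiguating between them is the job of the context signals discussed next.

```python
import re

# Hypothetical alias table: normalized surface form -> candidate entity IDs.
ALIASES = {
    "george clooney": ["person/george_clooney"],
    "ocean's twelve": ["film/oceans_twelve"],
    "union square": ["place/union_square_sf", "place/union_square_nyc"],
}
MAX_ALIAS_LEN = max(len(alias.split()) for alias in ALIASES)

def link_mentions(text):
    """Greedy longest-match spotting of known aliases in an utterance."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_ALIAS_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in ALIASES:
                found.append((span, ALIASES[span]))
                i += n
                break
        else:  # no alias starts at this token
            i += 1
    return found
```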
0:13:34	So the traditional idea, more on the linguistic side, is that
0:13:39	we do part-of-speech tagging and we find the subject and the object,
0:13:43	and we can say there's some relationship,
0:13:45	but this is still not really grounded;
0:13:48	it's a bit immaterial. With the knowledge graph you can anchor these
0:13:53	and say you're referring to this movie, you're referring to that person, and then there are
0:13:58	all kinds of inferences and disambiguation that you can do
0:14:02	with that knowledge, right?
0:14:04	So
0:14:04	I think the fact that we can start to represent pretty much all human
0:14:09	knowledge,
0:14:10	at least in terms of
0:14:12	concepts and entities,
0:14:14	in a way that is reified, you know, a canonical
0:14:18	representation, is very important, and that's a very big step towards real natural language understanding, because
0:14:23	it's more grounded.
0:14:27	One of the uses
0:14:29	for
0:14:31	a knowledge graph is disambiguation. There's this classic sentence from linguistics: "I
0:14:38	saw the man on the hill
0:14:39	with the telescope,"
0:14:41	which can be interpreted in a variety of ways, some of which are depicted in this
0:14:45	funny graphic, right? It's what linguists call a prepositional phrase attachment
0:14:51	problem: is
0:14:52	"with the telescope" attached to the hill, or to the man,
0:14:56	or to me? And "on the hill," again, does it attach to the man or to
0:14:59	me?
0:15:02	Traditionally there's been really no way to solve this except through context, but if you
0:15:07	think about it, imagine that you have access to my Amazon purchase history
0:15:14	and you saw
0:15:15	that I just bought a telescope two weeks ago. Then you would have
0:15:19	this idea of priors, right? You could have a very
0:15:22	strong prior that it is me who is using the telescope to see the man
0:15:26	on the hill.
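The telescope example is Bayes' rule: the posterior over attachments is proportional to a context prior times how plausible each reading is on its own. The sketch below shows that arithmetic. Both the syntactic scores and the purchase-history prior are invented numbers, and `context_prior` is a hypothetical helper, not a real data source.

```python
def attachment_posterior(likelihood, prior):
    """P(a | sentence, context) is proportional to P(a | context) * P(sentence | a), normalized."""
    unnorm = {a: prior[a] * likelihood[a] for a in likelihood}
    total = sum(unnorm.values())
    return {a: p / total for a, p in unnorm.items()}

# Candidate attachments of "with the telescope" in
# "I saw the man on the hill with the telescope".
# Hypothetical syntactic scores: the hill reading is judged less plausible.
LIKELIHOOD = {"speaker": 1.0, "man": 1.0, "hill": 0.5}

def context_prior(recent_purchases):
    """Hypothetical prior: a recent telescope purchase strongly favors 'I used the telescope'."""
    if "telescope" in recent_purchases:
        return {"speaker": 0.8, "man": 0.15, "hill": 0.05}
    return {"speaker": 1 / 3, "man": 1 / 3, "hill": 1 / 3}
```

Without purchase history, the speaker reading and the man reading tie at 0.4 each. With the telescope purchase, the speaker reading rises to about 0.82.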
0:15:27	So
0:15:28	it's obvious that the more context, and the more different sources of context, we
0:15:33	can have access to,
0:15:34	the more it's going to help disambiguate natural language.
0:15:37	That's one aspect of context; a different idea is that we also
0:15:42	know that your intent and what you're looking for also depend on where you are,
0:15:47	so that's another
0:15:48	place where
0:15:50	location is now an important contextual signal.
0:15:54	This is not new; there's a bunch of companies that are using, for
0:15:58	example, the user's location. Local search, obviously: if I search for Japanese restaurants, depending
0:16:05	on where I am, I'm going to get different results,
0:16:07	on Yelp, for example.
0:16:07	one yell for example
0:16:09	then there's also company select employee i that focus on
0:16:12	sort of predicting what you might need based on your calendar entries there's Q at
0:16:17	startup that was recently part by apple also in this space and then there's also
0:16:21	obviously google now
0:16:22	that
0:16:23	sort of
0:16:24	use able to ingest things like your email and makes sense at and understand that
0:16:29	you wanna have a flight or a hotel reservation and then take it makes use
0:16:32	of that information to bring a relevant alerts when the time is right
0:16:38	And finally, the last piece is recommender systems, right? We're all familiar with
0:16:44	things like Amazon, where you get recommendations for books depending on the stuff
0:16:48	that you've bought before,
0:16:49	and the way these systems work is they collect a lot of
0:16:53	data about the users, then they cluster the users and say you're
0:16:57	similar to these users, so you might also like this other book. And this
0:17:01	is expanding to Netflix for movies, Spotify for music,
0:17:05	LinkedIn and Facebook for people that you might know, et cetera. So
0:17:10	all these
0:17:11	systems are using context to make predictions or anticipate things that you might
0:17:17	need.
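"You're similar to these users, so you might also like this" is user-based collaborative filtering. The minimal sketch below measures similarity between users as cosine similarity over their sets of items. Each unseen item is then scored by summing the similarities of the k nearest neighbors who have it. The users and books are invented for illustration.

```python
import math

def cosine(a: set, b: set) -> float:
    """Cosine similarity of two binary item sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(target, histories, k=2):
    """User-based CF: rank items the target lacks by the similarity of the neighbors who have them."""
    mine = histories[target]
    neighbors = sorted(
        ((cosine(mine, items), user) for user, items in histories.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for item in histories[user] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))

histories = {
    "ana": {"dune", "neuromancer"},
    "ben": {"dune", "neuromancer", "foundation"},
    "cai": {"dune", "hyperion"},
    "dee": {"cookbook"},
}
```

For `ana`, the nearest neighbors are `ben` (about 0.82) and `cai` (0.5), so `recommend("ana", histories)` ranks "foundation" ahead of "hyperion".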
0:17:18	So
0:17:19	it is within this general context of the emergence of anticipatory search that we started
0:17:26	this company. Expect Labs is a technology company based in San Francisco
0:17:31	that we started about
0:17:32	two and a half years ago,
0:17:34	with this idea of creating a technology platform that is especially designed
0:17:41	for
0:17:42	these real-time applications that are going to be able to ingest a lot of data and
0:17:47	give you relevant contextual information.
0:17:50	So,
0:17:51	in a nutshell,
0:17:52	the way it works is we
0:17:55	are able to receive
0:17:57	real-time updates about where you are,
0:18:00	what you might be saying,
0:18:02	what you're reading, like a new email,
0:18:05	and you can
0:18:05	assign different weights to some of these modalities, right? So something that I say or
0:18:10	something that I tweet is going to have a higher
0:18:14	weight than something like
0:18:15	an email that I receive, which I may just sort of skim
0:18:19	rather than
0:18:22	read deeply.
0:18:23	So we take all these inputs in real time, and we
0:18:28	process them, we extract important pieces of information from all the sources, and that creates a
0:18:33	dynamic model, our best representation of what the user is doing and their intent, and
0:18:40	therefore we're able to
0:18:42	search for information across many different data sources to try to provide information
0:18:47	that is going to be useful to that user at that point in time.
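One way to picture a dynamic model fed by weighted modalities is a running score per key phrase. Each update adds the weight of the modality it came from, and all scores decay exponentially with time, so recent context dominates. This is a guess at the kind of structure described, not Expect Labs' implementation. The weights and the half-life are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical weights: speech and tweets count more than a skimmed email.
MODALITY_WEIGHTS = {"speech": 1.0, "tweet": 1.0, "location": 0.6, "email": 0.4}

class ContextModel:
    """Running key-phrase scores with exponential time decay (illustrative sketch)."""

    def __init__(self, half_life_s=300.0):
        self.decay_rate = math.log(2) / half_life_s
        self.scores = defaultdict(float)
        self.last_t = 0.0

    def _age(self, t):
        # Decay every existing score by the time elapsed since the last update.
        factor = math.exp(-self.decay_rate * (t - self.last_t))
        for phrase in self.scores:
            self.scores[phrase] *= factor
        self.last_t = t

    def observe(self, t, modality, phrases):
        self._age(t)
        for phrase in phrases:
            self.scores[phrase] += MODALITY_WEIGHTS.get(modality, 0.5)

    def top(self, n=3):
        return sorted(self.scores, key=self.scores.get, reverse=True)[:n]
```

With the default five-minute half-life, a phrase spoken ten seconds after an email arrives outranks the email's phrase, and any phrase that is not mentioned again fades by half every five minutes.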
0:18:52	And as a first example of this platform,
0:18:55	we created MindMeld.
0:18:58	MindMeld is right now an iPad app
0:19:00	that understands your conversation
0:19:02	and finds content as you speak.
0:19:05	You can think of it a little bit like Skype, where you can invite people and
0:19:09	start talking,
0:19:10	and then we'll get
0:19:12	interesting content based on that,
0:19:16	and I'm going to give a demo in a second.
0:19:19	An important aspect of the design of MindMeld is that we wanted to make it
0:19:22	very easy to share information, because if you've ever tried to have a kind of
0:19:27	collaboration session using Skype, people quickly find, especially
0:19:32	on the iPad, that it's difficult. Say you want to share an article: you
0:19:36	have to leave the Skype app, you have to find a browser, do some searches,
0:19:41	and then you find the URL, and then you try to send the
0:19:45	URL through the Skype IM, which may or may not be active, and
0:19:49	so it's a bit cumbersome. So we wanted to
0:19:52	make it very easy for users to be able to discover,
0:19:57	to navigate and then to share information,
0:20:02	and the stuff that you share becomes a permanent archive of the conversation that you
0:20:07	can look back to.
0:20:10	Right, so with that,
0:20:13	I want to give a little demo
0:20:18	of MindMeld
0:20:20	and show how it
0:20:21	works.
0:20:24	So this is MindMeld, and you can see that I have access to
0:20:27	some of the sessions or conversations that have taken place in the past. You can
0:20:33	think of it this way: you may have recurring meetings, like every Tuesday you have your update
0:20:38	with your colleagues, and so you would join that session because everybody's already
0:20:43	invited,
0:20:44	and plus you have all the context,
0:20:47	all the
0:20:48	shared items and the conversation that was previously happening in that
0:20:54	session.
0:20:55	But for now I'm going to start a new session,
0:20:59	and I can give it a name,
0:21:03	"Olomouc."
0:21:04	I can make it friends only,
0:21:07	I can make it public, or invite only.
0:21:11	And
0:21:16	let's see if the connection works.
0:21:23	This is now making a call to Facebook,
0:21:25	the Facebook API.
0:21:28	Okay, here we go. So
0:21:31	let's say that I will invite Alex.
0:21:37	Okay.
0:21:41	So
0:21:42	right now I'm the only one in the conversation, but as soon
0:21:47	as Alex joins, you would also see
0:21:49	information about the speaker, right?
0:21:51	The thing that we found is that when you talk to people you don't know
0:21:57	on some sort of a conference call,
0:21:59	people tend to Google each other and find the LinkedIn profile, so here
0:22:03	it is, we give that to you, right?
0:22:05	And this is a discovery screen, so I'm the only one seeing this information,
0:22:11	but if I decide to share, then everybody else in the conversation would see it.
0:22:15	And it shows, for example,
0:22:17	the current location
0:22:21	of the user, right here,
0:22:23	in this Congress Hotel in Olomouc.
0:22:28	So
0:22:29	the most interesting part is
0:22:31	when you have multiple speakers, but for now I'm just going to give
0:22:35	you a real demo of what this looks like.
0:22:40	Okay, MindMeld.
0:22:46	MindMeld.
0:22:49	So I was
0:22:50	wondering whether you heard about President Obama's brain mapping initiative.
0:22:56	I saw this new technique called CLARITY that makes brains transparent;
0:23:00	that might be a help
0:23:02	for this mapping initiative.
0:23:12	So
0:23:12	you can see that we show you these ticker items
0:23:18	here
0:23:19	of what we recognized. We try to extract some of the
0:23:23	key phrases,
0:23:26	and
0:23:27	then we do some post-processing and bring up relevant results.
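The talk doesn't say how key phrases are extracted from the recognized text. A common, simple baseline is RAKE (Rapid Automatic Keyword Extraction). It splits text into candidate phrases at stopwords and punctuation, scores each word by degree/frequency, and scores a phrase as the sum of its word scores. The sketch below is a minimal RAKE-style version with a tiny, hypothetical stoplist, shown as one plausible technique rather than MindMeld's.

```python
import re
from collections import defaultdict

# Tiny illustrative stoplist; real RAKE setups use a much longer one.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "about", "with", "so",
    "i", "you", "we", "we're", "was", "is", "it", "this", "that", "some", "have",
    "gonna", "over", "maybe", "should", "can", "do", "whether", "heard",
}

def candidate_phrases(text):
    """Split text into runs of non-stopwords, also breaking at punctuation."""
    phrases = []
    for chunk in re.split(r"[.,;:!?]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", chunk):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def extract_key_phrases(text, top_n=3):
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

On the next utterance in the demo, "So we're gonna have some friends over, maybe we should cook some Italian food.", the top phrase is "italian food". Multi-word runs score higher than isolated words such as "friends" or "cook".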
0:23:33	Let's see what else.
0:23:36	Okay, MindMeld.
0:23:38	So we're going to have some friends over; maybe we should cook some Italian food.
0:23:43	We can do a minestrone soup,
0:23:46	fettuccine Alfredo,
0:23:48	maybe that would be nice.
0:24:01	So you can see the way it works.
0:24:06	If I like this, for example, I can drag it
0:24:10	and share it,
0:24:11	and this is what becomes part of the archive,
0:24:15	which everybody in the conversation then sees, and it also becomes a permanent archive that I can
0:24:20	also access through a browser.
0:24:28	Does anybody have a topic or something they might be interested in?
0:24:42	Okay, MindMeld. So one of you here was interested in deep belief neural networks;
0:24:48	that's something that we've been talking about
0:24:51	at this IEEE ASRU
0:24:54	conference in Olomouc.
0:25:12	So
0:25:14	one of the issues is, I think, that he and I are not connected on Facebook,
0:25:19	because otherwise we would have found
0:25:22	the right person.
0:25:36	However, if we are...
0:25:42	Not even this one. Okay,
0:25:44	but you can see, right? So
0:25:49	let's stick to IEEE. Okay.
0:25:54	So one of the things that we do is look at the intersection
0:25:58	of the social graphs of the different participants in the call,
0:26:01	so that we can then
0:26:03	be better at
0:26:06	disambiguating
0:26:07	named entities, right? So
0:26:09	if we had been connected,
0:26:12	the right person in Brno would have been the result
0:26:15	right here.
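The sketch below shows one simple way to use the participants' social graphs to disambiguate a spoken name. Each candidate entity is ranked by how many call participants it is connected to, and a candidate connected to everyone lies in the intersection of their networks. If nobody is connected, the function falls back to generic results. The graph and IDs are hypothetical, and this is not MindMeld's actual scoring.

```python
def disambiguate(candidates, participants, friends_of):
    """Pick the candidate entity best connected to the people in the call.

    candidates: entity IDs matching a spoken name (hypothetical IDs).
    participants: user IDs of everyone in the conversation.
    friends_of: user ID -> set of connected IDs (the social graph).
    Returns None when no participant is connected to any candidate.
    """
    def connections(candidate):
        return sum(candidate in friends_of.get(p, set()) for p in participants)

    best = max(candidates, key=connections, default=None)
    if best is None or connections(best) == 0:
        return None  # no social evidence: fall back to generic search results
    return best
```

With `friends_of = {"speaker": {"jan_brno"}, "colleague": {"jan_brno", "jan_prague"}}`, the candidate `jan_brno` wins because both participants know him. With an empty graph, the function returns `None`, which is roughly the situation in the demo above.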
0:26:23	All right, so
0:26:27	let me go back to the
0:26:29	presentation real quick here.
0:26:31	So
0:26:32	this is the platform that we've built, and
0:26:36	if you want to
0:26:39	dig a little bit deeper,
0:26:41	one of the novelties, I think, is that we're combining traditional NLP
0:26:45	with more of what we call a search-style approach,
0:26:50	because the interesting part is that we're able to model
0:26:53	semantic relevance
0:26:55	based on the context:
0:26:57	what the speakers have recently said, the user model, and also
0:27:01	the different data sources that you have access to.
0:27:05	So if somebody says something like "Where should we go for dinner?" and the other person says
0:27:09	"I don't know, do you like Japanese?" "Sure, any good places around Union Square?",
0:27:13	we're building this incremental context
0:27:16	about the overall intent of the conversation.
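The dinner exchange can be pictured as slot filling across turns. Each utterance adds or overrides slots in a running context, and the accumulated context, not any single turn, becomes the query. The keyword rules below are hypothetical and far cruder than the NLP pipeline described next. They are only meant to show the incremental accumulation.

```python
# Hypothetical keyword -> slot rules; a real system would use NER, parsing, etc.
SLOT_KEYWORDS = {
    "activity": {"dinner": "dinner", "lunch": "lunch"},
    "cuisine": {"japanese": "japanese", "italian": "italian"},
    "place": {"union square": "union square"},
}

def update_context(context, utterance):
    """Return a new context with slots filled from this turn; later mentions win."""
    text = utterance.lower()
    updated = dict(context)
    for slot, keywords in SLOT_KEYWORDS.items():
        for keyword, value in keywords.items():
            if keyword in text:
                updated[slot] = value
    return updated

def to_query(context):
    """Turn the accumulated context into a single search query."""
    words = [context[slot] for slot in ("cuisine", "activity") if slot in context]
    query = " ".join(words)
    if "place" in context:
        query += " near " + context["place"]
    return query.strip()
```

Feeding the three turns in order yields the context `{"activity": "dinner", "cuisine": "japanese", "place": "union square"}` and the query "japanese dinner near union square". No single turn contains all three pieces.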
0:27:19	And so
0:27:21	we're able to then
0:27:23	do natural language processing, the usual stuff: part-of-speech tagging, noun phrase chunking, named entity extraction,
0:27:28	anaphora resolution, semantic parsing, topic modeling, and some degree of discourse modeling and pragmatics.
0:27:35	But then the other piece is the signal
0:27:39	that we get from each of these different data sources. You can think of
0:27:42	the social graph that I was mentioning,
0:27:45	the local businesses that Factual or Yelp can give you,
0:27:48	personal files: if you give us access to Dropbox or to your Google Drive,
0:27:54	we can take that as a data source,
0:27:57	and then there's the more general web, with news and general content and videos.
0:28:04	But what's interesting is that even the response that we get when we do
0:28:08	all these searches
0:28:09	also informs us about what is relevant and what is not
0:28:13	in that particular
0:28:14	conversation.
0:28:17	To put it in other words: if, for example, you were to build an application that only
0:28:20	deals with movies, TV shows and actors, then any reference to something else
0:28:25	would not find a match
0:28:27	and would basically not give you
0:28:28	results,
0:28:29	but that also means it would be much more precise, right, in terms of the
0:28:34	answers that you give and the relevance of the content.
0:28:38	And so this is something that,
0:28:40	because we have
0:28:42	a very scalable and fast back end,
0:28:45	allows us to do multiple searches,
0:28:48	and we have some caching as well, but basically this
0:28:50	enables us
0:28:52	to compute the semantic relevance of an utterance in a dynamic way,
0:28:56	based on context and also based on the type of results that we obtain.
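The movies-only example suggests a simple rule: use the search response itself as a relevance signal, and cache searches because the same phrases recur across turns. The sketch below keeps a phrase only if a domain index returns hits for it, and caches lookups with `functools.lru_cache`. The movie index is a hypothetical stand-in for a real data source.

```python
from functools import lru_cache

# Hypothetical domain index for a movies/TV-only application.
MOVIE_INDEX = {
    "woody allen": ["Blue Jasmine (2013)"],
    "george clooney": ["Gravity (2013)"],
}

@lru_cache(maxsize=1024)
def search_movies(phrase):
    """Cached lookup; returns an immutable tuple so cached results can't be mutated."""
    return tuple(MOVIE_INDEX.get(phrase, ()))

def relevant_phrases(phrases):
    """Keep only phrases the domain search actually has results for."""
    results = {}
    for phrase in phrases:
        hits = search_movies(phrase)
        if hits:  # no match in this domain -> treat the phrase as not relevant here
            results[phrase] = hits
    return results
```

For example, `relevant_phrases(["woody allen", "italian food"])` keeps only "woody allen". When the same phrase comes up again later in the conversation, the lookup is served from the cache.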
0:29:02	So this is a technical conference, and some of
0:29:07	the ongoing R&D, as you can imagine, is quite substantial.
0:29:11	On the speech side,
0:29:14	we have two engines: we have an embedded engine that runs on the iPad,
0:29:18	and we also have fast cloud-based speech processing. So an interesting
0:29:22	research question is how to balance that, and how
0:29:27	to be able, on the one hand, to listen continuously, but on the other
0:29:31	also be robust to network issues.
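One simple policy for balancing the two engines is to prefer the cloud recognizer and fall back to the on-device engine if the cloud call fails or misses a latency budget. The sketch below uses `concurrent.futures` for the timeout. The recognizer callables and the 0.5-second budget are hypothetical placeholders, and the talk does not say which policy MindMeld uses.

```python
import concurrent.futures as cf

def recognize(audio, cloud_asr, embedded_asr, timeout_s=0.5):
    """Prefer cloud recognition; fall back to the embedded engine on error or timeout.

    cloud_asr / embedded_asr: hypothetical callables taking audio and returning text.
    Returns (text, engine_name).
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(cloud_asr, audio)
    try:
        return future.result(timeout=timeout_s), "cloud"
    except (cf.TimeoutError, OSError):  # slow network, or connection errors
        return embedded_asr(audio), "embedded"
    finally:
        pool.shutdown(wait=False)  # don't block on a slow cloud call
```

A real client would also cancel or ignore late cloud results and might run both engines in parallel. The sketch only shows the fallback decision.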
0:29:34	And then, in terms of practical usage, there are things you can imagine, like detecting
0:29:38	suboptimal audio conditions: when the speaker is too far from the mic, or noisy environments,
0:29:44	and, as we all know, heavy accents are an issue.
0:29:47	And then
0:29:48	one of the things we found is that because it's an iPad app, it's very natural for
0:29:51	people to leave it on the table, and two things happen: they speak
0:29:55	to it from far away, and there can also be multiple people
0:29:58	speaking to the same device, and our models try to do some
0:30:02	speaker adaptation,
0:30:04	and sometimes that doesn't work that well.
0:30:08	And then there's this kind of holy grail: could
0:30:11	we detect a sequence of wrong
0:30:14	and ungrammatical words,
0:30:18	when it's kind of gibberish?
0:30:19	Of course there are techniques to do that, but
0:30:21	we're trying to
0:30:23	improve the accuracy of that.
0:30:24	And then, in terms of natural language processing and information retrieval, the
0:30:28	open questions are things like the classic NLP problems, such as word sense disambiguation,
0:30:33	although obviously the knowledge graph helps a lot,
0:30:36	and then anaphora resolution, some of which we do with the social graph.
0:30:42	An important aspect is that
0:30:43	this knowledge graph is useful, but
0:30:45	how do you dynamically update it? How do you keep it fresh?
0:30:49	And we have
0:30:50	some techniques for that, but
0:30:53	it's
0:30:54	ongoing research.
0:30:56	Then another important aspect is
0:30:59	deciding search-worthiness, right?
0:31:02	As we all know, if you leave a speech engine on...
0:31:05	I remember an anecdote that Alex Waibel told me once: he
0:31:09	had an engine running in his house, and when he was doing the dishes,
0:31:13	with all the clinking and clanking, the speech engine was spouting all kinds
0:31:17	of interesting
0:31:19	hypotheses.
0:31:21	This has been alluded to; of course you can have fairly robust voice activity
0:31:24	detection,
0:31:25	but there's
0:31:27	always room for improvement.
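The dishwashing anecdote shows both what a basic voice activity detector does and where it fails. The sketch below is a minimal energy-based VAD. It estimates a noise floor as roughly the 10th-percentile frame energy, flags frames some margin above it, and discards active runs shorter than a minimum duration. The duration rule rejects a single clink, but sustained noise would still pass, which is why there is room for improvement. The frame length, margin and minimum run are hypothetical defaults.

```python
import math

def frame_energies_db(samples, frame_len=160):
    """Mean power per non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))
    return energies

def voice_activity(samples, frame_len=160, margin_db=10.0, min_run=3):
    """Per-frame speech flags: above noise floor + margin, for at least min_run frames."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    noise_floor = sorted(energies)[len(energies) // 10]  # ~10th percentile
    active = [e > noise_floor + margin_db for e in energies]
    # Keep only runs of at least min_run active frames (drops short transients).
    result, run_start = [False] * len(active), None
    for i, is_active in enumerate(active + [False]):  # sentinel closes a final run
        if is_active and run_start is None:
            run_start = i
        elif not is_active and run_start is not None:
            if i - run_start >= min_run:
                result[run_start:i] = [True] * (i - run_start)
            run_start = None
    return result
```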
0:31:30	Search-worthiness, as I mentioned, is not just
0:31:33	understanding that something is speech, but also detecting how relevant something is
0:31:38	within the context, and this connects to the other point, interruptibility.
0:31:45	MindMeld is a bit too verbose, right? This is just a showcase of what
0:31:49	you can do, also because the iPad has a lot of real estate, so it can show
0:31:53	different articles. In practice, through the API that I'll talk about in a second,
0:31:57	you have a lot of control over how much you want to be interrupted,
0:32:02	when you want
0:32:03	a search result or an article
0:32:07	to be shown, and this is
0:32:09	a function of at least two factors. One is
0:32:13	how explicit the request is, how much the user wants to have certain information,
0:32:18	and the other one is what I was mentioning about the nature of the information
0:32:23	found: how strong the signal from the data sources is about the relevance of what
0:32:27	I'm going to show.
0:32:29	And what I mean by that is,
0:32:31	you can think of
0:32:36	the difference between
0:32:38	"What is the latest movie by Woody Allen?"
0:32:42	versus "I've been talking about Woody Allen,
0:32:44	and I mentioned
0:32:46	his latest movie," et cetera.
0:32:48	Right? So one is a direct question, where the intent is clear, more like
0:32:53	a Siri-like application where I'm trying to find specific information; the other one
0:32:58	is a reference, sort of in passing, to
0:33:00	something.
0:33:01	And so
0:33:03	that would be this understanding of
0:33:06	how eager I am to receive that bit of information,
0:33:09	and being able to model that is ongoing work.
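The two factors can be combined into a single decision. Multiply an estimate of how eager the user is for an answer by the strength of the evidence from the data sources, and show the result only if the product clears a user-tunable threshold. In the sketch below, the eagerness estimate is a crude, hypothetical heuristic (question mark or question opener means direct, anything else means a passing mention). It stands in for the intent modeling the speaker describes as ongoing work.

```python
QUESTION_OPENERS = ("what", "who", "where", "when", "which", "how", "find", "show me")

def explicitness(utterance: str) -> float:
    """Hypothetical proxy for how eager the user is to receive an answer."""
    text = utterance.lower().strip()
    if text.endswith("?") or text.startswith(QUESTION_OPENERS):
        return 0.9  # direct question or command, Siri-style
    return 0.3      # reference in passing

def should_interrupt(utterance: str, result_signal: float, threshold: float = 0.5) -> bool:
    """Show a result only if eagerness x evidence clears the threshold.

    result_signal: 0..1, how strongly the data sources back the result.
    threshold: lower means the user is willing to be interrupted more often.
    """
    return explicitness(utterance) * result_signal >= threshold
```

With the same evidence strength of 0.8, "What is the latest movie by Woody Allen?" clears the default threshold (0.72). A passing mention of Woody Allen does not (0.24) unless the user lowers the threshold.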
0:33:14	And then, finally,
0:33:16	we get a fair amount of feedback from this, right? Especially when the user shares
0:33:21	an article, that's a pretty strong signal that it was relevant.
0:33:25	On the negative side, I haven't shown you this, but you can flick one of the
0:33:31	entries on the left-hand side, the ticker items, as
0:33:34	we call them, to delete them, so that would be a good negative signal
0:33:38	about
0:33:39	a certain entity or key phrase that was not
0:33:41	deemed relevant by the user.
0:33:44	How to
0:33:45	optimize what we can learn from that user feedback
0:33:49	is also something
0:33:50	that we're working on,
0:33:52	especially because
0:33:54	the decision to show a certain article is complex enough that
0:33:58	sometimes it's hard to assign the right credit or blame for how we
0:34:03	got there.
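Here is a naive way to turn shares and deletions into learning. Keep a per-entity bias, nudge it toward +1 on a share and toward -1 on a deletion, and add the average bias of a result's entities to its base score. The update rule and learning rate are hypothetical. The naive part is that every entity attached to an item receives the full update. That is exactly where the credit-assignment problem mentioned above shows up, since an article is usually linked to several entities and only some of them made it relevant.

```python
from collections import defaultdict

class EntityPreferences:
    """Per-entity bias in [-1, 1], nudged by user feedback (hypothetical update rule)."""

    def __init__(self, learning_rate=0.2):
        self.lr = learning_rate
        self.bias = defaultdict(float)

    def share(self, entities):
        """Sharing an item: strong positive signal, move each entity toward +1."""
        for e in entities:
            self.bias[e] += self.lr * (1.0 - self.bias[e])

    def delete(self, entities):
        """Deleting a ticker item: explicit negative signal, move toward -1."""
        for e in entities:
            self.bias[e] += self.lr * (-1.0 - self.bias[e])

    def adjust(self, base_score, entities):
        """Re-rank a result by the mean learned bias of its entities."""
        if not entities:
            return base_score
        return base_score + sum(self.bias[e] for e in entities) / len(entities)
```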
0:34:06	So just to wrap up and
0:34:09	sort of
0:34:10	summarize what we're doing: there are two products that we're offering.
0:34:14	One is the MindMeld app,
0:34:16	MindMeld is what you see here,
0:34:18	and as a matter of fact
0:34:19	the MindMeld app
0:34:21	is gonna be live on the Apple App Store tonight.
0:34:25	So
0:34:27	we've been working on it for a while and it's finally happening,
0:34:30	so you're welcome to try it out.
0:34:33	I guess it will be tonight, well,
0:34:36	for whatever time zone your App Store is set to, so I think
0:34:41	New Zealand users might already be able to download it,
0:34:44	and then for the US it will be
0:34:46	in a few hours.
0:34:50	So that's MindMeld, but then
0:34:52	the other thing is
0:34:54	we're also offering the same functionality via an API, a REST-based API,
0:35:00	that
0:35:01	you're able to, well,
0:35:03	use to create sessions and users and give it real-time updates, and
0:35:08	then you can query for what is the most relevant. You can also
0:35:13	select the different data sources, and so at any given point you can ask for
0:35:17	what the system thinks is the most relevant set of articles,
0:35:22	with certain parameters for ranking, et cetera. So we're
0:35:27	already doing,
0:35:29	to a certain degree,
0:35:31	pilots
0:35:32	with companies,
0:35:33	for example some of our backers, which include, by the way, Google Ventures
0:35:38	and also Samsung,
0:35:40	Intel, Telefónica,
0:35:42	Liberty Mutual.
0:35:44	They're all among the backers that we're trying to do some prototypes with.
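The REST API described above (create sessions and users, push real-time updates, then query for the most relevant articles from selected data sources) can be pictured with a small in-memory stand-in. This is a hypothetical Python sketch: the talk gives no endpoint names, parameters, or ranking method, so the class, method names, and word-overlap ranking below are illustrative assumptions, not the actual API.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Session:
    user_id: str
    transcript: list[str] = field(default_factory=list)


class MockContextService:
    """Hypothetical in-memory stand-in for a session-based contextual-search API.

    Not the real MindMeld API: method names and ranking are illustrative only.
    """

    def __init__(self, documents: dict[str, list[str]]):
        # documents maps a data-source name to a list of article titles.
        self.documents = documents
        self.sessions: dict[int, Session] = {}
        self._ids = itertools.count(1)

    def create_session(self, user_id: str) -> int:
        session_id = next(self._ids)
        self.sessions[session_id] = Session(user_id)
        return session_id

    def post_update(self, session_id: int, text: str) -> None:
        # Real-time update: append the latest transcribed utterance.
        self.sessions[session_id].transcript.append(text.lower())

    def query(self, session_id: int, sources: list[str], limit: int = 3) -> list[str]:
        # Rank articles from the selected sources by naive word overlap
        # with everything said in the session so far; drop zero-overlap ones.
        words = set(" ".join(self.sessions[session_id].transcript).split())
        candidates = [t for s in sources for t in self.documents.get(s, [])]
        scored = [(len(words & set(t.lower().split())), t) for t in candidates]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [title for _, title in ranked[:limit]]
```

A client would create a session once, call `post_update` as each utterance is transcribed, and call `query` whenever it wants the current best articles, restricting `sources` to the data sources it cares about. A real service would replace the word-overlap ranking with the relevance and eagerness modeling discussed earlier.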
0:35:49	So I encourage you to try it out. And
0:35:53	I was thinking that,
0:35:55	because I'm actually gonna be missing the launch party that is happening in San Francisco,
0:35:59	I'm gonna take our banquet at the Bishop's Palace as the launch party for MindMeld.
0:36:10	That's all I wanted to say, and we have some time for questions.
0:36:32	was at all
0:36:34	I was wondering how you track the user's state. In the example, say we
0:36:41	want to eat something, and then
0:36:44	is it still sticking to the restaurant domain or not?
0:36:49	In the example you showed, you're only adding information. How about when you change
0:36:55	information that you previously used, or switch to another domain?
0:37:00	How do you track that?
0:37:03	There are two types of information that we use for that. One is simply time, right,
0:37:08	so as time passes you sort of decay certain previous entries.
0:37:13	The other one is some
0:37:15	kind of topic detection and clustering that we're doing, so that
0:37:19	sentences that still seem to relate to the same topic, you know,
0:37:24	help
0:37:25	sort of ground that topic.
0:37:28	And then there's also
0:37:31	some user modeling about, you know, your previous sessions, so that we have certain
0:37:37	prior weights.
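The three ingredients in this answer (time decay of earlier entries, grounding in the current topic, and prior weights from previous sessions) can be combined in a simple score. This Python sketch is hypothetical: the speaker declines to name the actual algorithm, so the exponential half-life decay, the 60-second half-life, and the multiplicative topic boost are assumptions.

```python
import math


def entity_score(mention_times: list[float], now: float, half_life: float = 60.0,
                 prior: float = 1.0, same_topic: bool = False,
                 topic_boost: float = 1.5) -> float:
    """Hypothetical relevance score for one entity in an ongoing conversation.

    - Time decay: each past mention contributes exp(-ln(2) * age / half_life),
      so a mention half_life seconds old counts half as much as a fresh one.
    - User modeling: prior is a weight learned from the user's previous sessions.
    - Topic grounding: entities in the current topic cluster get topic_boost.
    """
    decay = sum(math.exp(-math.log(2) * (now - t) / half_life) for t in mention_times)
    return prior * decay * (topic_boost if same_topic else 1.0)
```

An entity mentioned once, a minute ago, scores 0.5; mentioned again just now, it scores 1.5. When the conversation moves to another domain, the old restaurant entities fade unless they keep being mentioned or stay in the active topic cluster.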
0:37:48	what
0:37:53	Well, so, you know, there's...
0:37:58	I'm not gonna cite some specific algorithm that we use, but you can imagine there are
0:38:03	some, you know, statistical techniques
0:38:06	to do that modeling.
0:38:09	We're a small startup; we cannot reveal everything.
0:38:15	Thank you very much, that's great. Another question?
0:38:21	At one point you happened to mention
0:38:26	ASRU in Olomouc, and predictably enough it came out as "that's are you,"
0:38:31	"a SLU," and "Columbus."
0:38:34	No, it's...
0:38:36	it would seem,
0:38:38	really,
0:38:39	that what you've shown us are ways of organising information at the output end of the
0:38:44	process,
0:38:45	but also, it seems, particularly in that example,
0:38:50	not only that, it actually, well, it does know exactly where you are, it's got the
0:38:55	map,
0:38:56	and it might even figure out that you're at this ASRU,
0:39:00	but these things are not being reflected in the lower-level transcription process, so I
0:39:06	was wondering how the mites you don't have to tell us anything it's buster's father
0:39:12	figured and to train nice things
0:39:15	Well, it's obviously a research issue of how you
0:39:20	make the most of the contextual information, and unfortunately
0:39:24	ASR, especially these cloud-based ASR services,
0:39:30	at this point doesn't
0:39:32	fully support the kind of adaptation and dynamic modification that we would like to
0:39:38	do.
0:39:39	But that's kind of an obvious thing to do: in the same way
0:39:43	that you construct larger contexts and find, you know, all the people that you're
0:39:47	related to and add them to your specific lexicon, having something like the location and the
0:39:52	towns nearby would be something
0:39:55	very, you know, sort of natural to do.
0:39:58	But we're not doing this.
0:40:02	i have to say your search for better more innocent implement because
0:40:06	the previous one used to be a and the step and it has so that
0:40:10	you made
0:40:11	so when you search for permanent you go okay so this is better
0:40:16	Well, the ASR was not a hundred percent accurate.
0:40:20	Which one do you use?
0:40:23	Actually, we use a variety, including Nuance's and Google's.
0:40:35	Thanks for the talk. I was also wondering about privacy concerns. I was under the impression
0:40:41	that the more
0:40:43	I want to
0:40:45	interact with this MindMeld app, somehow I need to be transparent
0:40:50	with my personal data.
0:40:57	Well, I have
0:40:59	actually a philosophical reflection:
0:41:02	that as a society, with this technology, we are going towards what I'm calling the
0:41:07	transparent brain.
0:41:08	And
0:41:10	if you think closely about it,
0:41:12	the better we are at collecting data about users and modeling their intentions,
0:41:18	we can get to a point where
0:41:21	it can almost auto-complete your thought,
0:41:24	right? Assume that you start typing a query and Google knows what you might
0:41:27	want.
0:41:28	And of course this is just a little bit of science fiction, but
0:41:31	we're kind of getting there. And so I think the way to address that is
0:41:35	by being very transparent about this process
0:41:40	and giving you full control over what it is that you wanna share, and for how
0:41:43	long,
0:41:44	because
0:41:45	that's really the only way to modulate it. It's not just saying "I'm gonna opt out
0:41:49	and not gonna use
0:41:50	any of this
0:41:52	anticipatory search," because basically it will be unavoidable, right? So I think
0:41:59	what we need to do is have some clear
0:42:03	settings about
0:42:04	what you wanna share with this app, and for how long,
0:42:07	and then ensure on the back end that that's really
0:42:09	the only usage
0:42:11	of that information.
0:42:15	But as an example,
0:42:16	we're not recording
0:42:18	this,
0:42:19	the voice, right,
0:42:20	and the only things that are permanent in this particular MindMeld application
0:42:24	are the articles that you've specifically shared.
0:42:28	That's the only thing that...
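The "clear settings about what you wanna share, and for how long" can be read as a per-kind retention policy enforced on the back end. This Python sketch is hypothetical: the voice and shared-article entries follow what the talk says (voice not recorded, shared articles persist), while the one-hour transcript window, the record kinds, and the function name are assumptions.

```python
import math
from dataclasses import dataclass

# Hypothetical per-kind retention settings, in seconds.
# 0 means "never stored"; math.inf means "kept until the user deletes it".
RETENTION_SECONDS = {
    "voice_audio": 0,            # per the talk, the voice is not recorded
    "transcript": 3600,          # assumption: kept only for a one-hour session window
    "shared_article": math.inf,  # per the talk, shared articles are what persists
}


@dataclass
class Record:
    kind: str
    created_at: float  # seconds since an arbitrary epoch


def retained(records: list[Record], now: float,
             policy: dict[str, float] = RETENTION_SECONDS) -> list[Record]:
    """Keep only records still inside their kind's retention window.

    Kinds missing from the policy default to 0, i.e. nothing is kept
    unless the settings explicitly allow it.
    """
    return [r for r in records if now - r.created_at < policy.get(r.kind, 0)]
```

Defaulting unknown kinds to zero retention is the design choice that matches the spirit of the answer: data survives only when a user-visible setting says it may, for as long as that setting allows.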
0:42:34	So I'm thinking that maybe if you're looking at Pedro, if you ask for Pedro
0:42:40	in a police record, would you see something
0:42:44	that maybe you wouldn't wanna see? So is there any way, like when you're
0:42:47	looking at your space,
0:42:48	do you have certain
0:42:52	contexts that you're searching for things in when you bring information back, like, let's say, you
0:42:57	know, a business or social setting or some other context?
0:43:02	Yes. So one of the shortcomings of the little demo I did is, first of
0:43:06	all, there was only one speaker; it's always more interesting when it's a real conversation.
0:43:10	And the second is it wasn't really a long-ranging conversation about a certain topic, which is where
0:43:16	MindMeld excels. Let's say you wanna
0:43:20	plan a vacation with, you know, some of your friends who are somewhere else, and you say,
0:43:24	well, we're gonna go here; then you explore different places you can stay, things you can
0:43:27	do, and you share that. When you have
0:43:30	a long-ranging conversation with this kind of overarching goal,
0:43:34	that's where it works the best. If you keep sort of switching around, then it
0:43:38	becomes more like a Siri-like search that doesn't have much...
0:43:42	And just a quick question: how do you build your pronunciations? If you
0:43:46	look at ASRU, you would spell it out, but if you look at ICASSP, you
0:43:50	actually say it as a word; spelling it out doesn't work.
0:43:52	That's mostly in the lexicon. There are certain
0:43:56	abbreviations that are more typically
0:44:00	spelled out, like, you know, ASRU, and some other ones, like NATO,
0:44:02	that would be spoken as a word.
0:44:05	So it's all in the pronunciation lexicon, pretty much.
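The lexicon-based answer can be sketched as a lookup that decides, per acronym, whether to spell it out letter by letter or treat it as a single word. This Python sketch is hypothetical: the override table, the all-caps default, and the function name are assumptions, and a real recognizer would map each returned unit to phone sequences in its pronunciation dictionary. As the question suggests, ASRU is spelled out while ICASSP is said as a word.

```python
# Hypothetical lexicon overrides: True = pronounced as a word, False = spelled out.
SPOKEN_AS_WORD = {"ASRU": False, "ICASSP": True}


def lexicon_units(token: str) -> list[str]:
    """Return the units whose pronunciations would be looked up for a token.

    Acronyms default to letter-by-letter spelling unless the lexicon says they
    are spoken as a word; ordinary words are looked up whole.
    """
    if token.isupper() and not SPOKEN_AS_WORD.get(token, False):
        return list(token)  # e.g. "ASRU" -> "A", "S", "R", "U"
    return [token.lower()]
```

Unlisted all-caps tokens such as "IBM" fall back to spelling, so only the exceptions that are spoken as words need explicit lexicon entries.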
0:44:12	Any more questions?

Augmenting conversations with a speech understanding anticipatory search engine

Applications Day

Marsal Gavalda (Expect Labs)