0:00:16 Okay, so what should follow now, at the application end, is that we should have the selected posters. As I have found out, we somehow didn't manage to organise this well, so we didn't have exactly the posters, so I quickly went around searching for the best posters that would fit the application panel, and I actually found that we have them here. So I found the best posters: Google, Nuance, Microsoft Research. These are probably not all of the relevant posters, but I would invite some people to this panel so we can discuss the application issues. But maybe let me do it this way, so that I don't invite again... I'm sorry... is the speaker of the last database talk still here? I would just invite all the speakers that we had here today to the seats here, and of the people that we see on the posters, like somebody from Nuance or Microsoft, I don't know if we have anybody here, but if you want to join us, if your company is represented, join us also. Can I keep you here for a little longer? And then we should...
0:01:27 Well, I hope that the audience will help me to ask the important questions that we can ask the people from industry and the people that build the applications. We had several talks about applications. And we have Niko Brümmer here because, while not all the people here are talking about applications, he was talking about how to calibrate systems so that they work at all the operating points and we can use them for all the different applications.
0:01:59 So the first thing that I want to talk about, the thing that maybe was the most interesting today... I probably shouldn't ask this question, but my question here would be: did we actually find this session useful? And I would like to hear something from the people that presented: what do you think about how it went, and do we want to organise such sessions maybe at some other conference? Do we think that this was actually relevant, anything useful? Or what do the people on the panel think we should have learned from it? Maybe you now have a chance to tell us what should have been the take-away message from your talks, again in a short summary, and what you think we should have learned from your research.
0:02:59 I mean... (partially inaudible) ...very interesting, because you kind of... I mean... and technology... I think it is important for researchers that are working in this field to be able to explain what we do, to show its importance, and ultimately the fact that it can... And now we have all these... like this talk, and we get use out of them, but I think that also...
0:04:00 Did you notice how much data they said they have? That is not... the only result we have... So, you said that you are collecting how much, like two thousand hours per second, or what was it, per hour?
0:04:17 I haven't done... no... my back-of-the-envelope estimate is...
0:04:28 But once you told me that there are some speech analytics companies that process thousands of hours of audio... right, think of all the audio recorded in call centres. When you say this... maybe my order of magnitude is off, but the audio is always recorded for liability purposes, right? Not much of it is processed, except, more and more, by the speech analytics companies... and that means it is really tens of thousands of hours.
0:05:19 Really? But, I mean, I know there will be the privacy issues, but you really collect something like thousands of hours. So I guess that you could even do things like negotiating with your customers that they would be willing to give us one second per hour for free, and if you were willing to share that, that would actually be nine thousand hours per year, and we would be pretty happy about that.
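The back-of-the-envelope arithmetic behind this can be made explicit. The processing-volume figure below is an assumption (one hour of audio processed per second, i.e. 3,600 hours per wall-clock hour, which is what makes the quoted numbers line up), not something stated precisely in the discussion:

```python
# Back-of-the-envelope check of the "one second per hour" proposal above.
# ASSUMPTION (not stated explicitly in the discussion): the industry
# processes about one hour of audio every second, i.e. 3,600 hours of
# audio per wall-clock hour.
processed_hours_per_hour = 3600
hours_per_year = processed_hours_per_hour * 24 * 365   # audio handled yearly
donated_fraction = 1 / 3600                            # one second per hour
donated_hours = hours_per_year * donated_fraction
print(round(donated_hours))  # 8760, on the order of the 9000 hours quoted
```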
0:05:58 You know, the problem is the legal framework... and, you know, many people... I don't know if they would like their voice samples to be available. It's a lost battle; there is no way that the legal framework works for Nuance, for us, for whoever is doing speech at this scale; it is not in our favour. I was telling somebody before that I think, actually, we do collect these initial databases. You know, at least in the case of a new language, we send people to a country and we collect like a couple hundred hours, and those are collected with consent from the users, so those databases might be feasible to open source. The problem is that I am not sure that the consent agreement, the wording of the consent agreement, says that the data can be made available outside. I don't know.
0:07:01 Is there anybody in the audience... anybody open to it? It does help me to push them if you say that it should be possible.
0:07:12 Okay, so I think we sort of heard what you want from us: just data. And I was curious, now that you are sitting on the other side of this table: what is it that you would like to see this community really be working on, from your perspective?
0:07:35 I mean, all the work done on neural networks is great, and we have been actively participating in that. There is another thing at Google: just funding. We spend like a few million dollars a year in grants, many of which go to places like CMU; I don't know whether you have heard about them, but I know people at CMU get them. So it is not just... we do give money. Are the lawyers here listening to me? We might... I am not sure I will... I don't have, you know, strong suggestions; I think the work that I have seen here is at least relevant. It is true that the kind of things we care about involve more big data, and we cannot share it, so that's a problem; we need to think about some mechanism to help. I mean, we have released things like the n-gram corpora, because those are aggregated statistics on text, and they are not subject to all these privacy considerations.
0:08:56 I think the work related to semantic understanding and conversational systems is really relevant to us. I would encourage universities to send proposals in that area; I think that will resonate well. The work on low-resource languages, I have to say, we don't feel is that relevant to us, because, I mean, we care about languages that have a writing system. A lot of the limitations that we operate under are kind of self-imposed, right? We can collect two hundred hours... and a lot of the stuff we store is not available outside. Lexical modeling, for example, that's interesting, you know, learning pronunciations from data, but we have a lot of research in that area too. I am not sure what else...
0:09:52 I have another comment about sharing of data. This is not directly relevant for speech recognition, but it works for speaker and also for language recognition. Many of you probably already know what an i-vector is: you take a whole segment of speech, possibly even a few minutes long, and basically train a GMM to reflect what is happening in the speech, and then you project the parameters of the GMM onto a relatively small vector, maybe four hundred to six hundred dimensions. That works really well for recognizing languages and speakers. So people are far less reluctant to ship data in that form; people will let you take from their sites a bunch of i-vectors, because you cannot hear what is being said. One example: NIST has just launched a new speaker recognition evaluation and has made a whole bunch of i-vectors available. This is data which they normally cannot share with the world; it is some LDC data, I believe, so there are strings attached to the LDC data, but they are giving away these i-vectors basically without conditions.
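As a rough illustration of the idea being described (not the actual i-vector recipe, which trains a total-variability subspace with an EM factor analysis; here PCA over per-utterance adapted GMM supervectors stands in for that projection, and random vectors stand in for real acoustic features):

```python
# Toy stand-in for the i-vector idea described above: summarise each
# utterance by adapting a background GMM ("UBM") to it, stack the adapted
# means into one long supervector, and project that onto a small subspace.
# PCA replaces the real total-variability (factor-analysis) training, and
# random vectors replace real acoustic features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

background = rng.normal(size=(2000, 20))       # pooled "background speech"
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

def supervector(utt, ubm):
    """Stack per-component posterior-weighted means into one long vector."""
    resp = ubm.predict_proba(utt)              # (frames, components)
    counts = resp.sum(axis=0)                  # zeroth-order statistics
    first = resp.T @ utt                       # first-order statistics
    return (first / (counts[:, None] + 1e-6)).ravel()

utts = [rng.normal(size=(300, 20)) for _ in range(50)]
sv = np.array([supervector(u, ubm) for u in utts])   # (50, 160)

proj = PCA(n_components=10).fit(sv)            # low-dimensional subspace
ivecs = proj.transform(sv)                     # compact utterance summaries
print(ivecs.shape)  # (50, 10)
```

The point the panelist makes is visible in the shapes: minutes of audio collapse to a few hundred numbers from which the words cannot be recovered, which is what makes sharing them palatable.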
0:10:27 I'd like to comment, and add to Alex's question. I think there is actually a disconnect between the research and where the industry is going with regard to the applications that are actually driving the speech work. Most of the bigger companies are going after conversational systems; Siri is an example, Google Now, and there's Microsoft's Xbox. So what I see is that, even though this is actually speech recognition and understanding, there are only a handful of papers on understanding while everyone is working on speech recognition; it is not balanced right now. And if I look at EMNLP or ACL, you know, they mostly work on the data and modeling, on the theoretical side; there is not as much on the application side. I see that this is the community we should be investing in more, because these are the right people, but I know we are not doing that.

0:11:33 And the second piece is search. What we observed after Xbox actually launched... it is free-form, natural conversational search, entertainment search, but if you look at the most frequent queries, people are using single-word or two-word queries; they are not really using it. You can say "show me movies with Tom Hanks from the nineteen-eighties" and it will do the search, but even though the system handles it, people don't. So there is a barrier now between keyword-based search and more natural conversational spoken search, and of course the habits from keyword search and voice search, the priors in people's minds, those are the blockers. How are we going to get over this? Is it just going to take time, or what do we need to do about it?
0:13:37 I will make a comment on the question about the amount of data. The last speaker hit the nail on the head: on the internet there is a lot of data... for example on YouTube and other such sites... people have made that data public, and we should find out how to use it.
0:14:09 I work at IBM, so I am in your position and I understand the problems of sharing, and also, on the applied side, the problems with models. And I must say, from my perspective, the thing that you could do for us is: you could share the error analysis of your data. Now I must say, and I can say this as strongly as I can: I don't know of any scientific endeavour that made progress by just counting how big the number of errors is; that is simply counting. But an analysis of the kinds and types of errors that you see, the types of conditions under which those errors happen, would be very helpful for the entire community. You see a tremendous amount of data, and I am sure that you categorise the errors on that data; we would love to see the categorisation.
0:15:19 Sanjeev, I don't know if he is still here... he argued earlier that quality was much more important than quantity of data. We have the quality people out there... could you argue it the other way?
0:15:45 I think you need both, right?
0:16:05 ...that in the long run that's useless.
0:16:09 I wouldn't call it useless. But, you know, within the speech team we do have a little bit of this quality-versus-quantity split, because our acoustic modeling team, for the most part, uses annotated, transcribed data, while on my team we don't, because we are the ones in charge of maintaining forty-eight languages and doing all the training. So I always argue that some of the techniques, or improvements, that they manage to get may not be translatable to the other situation, where you are training in an unsupervised way. I think, realistically... personally, I would argue that unsupervised is the way, and I would love it if the community could put more and more research into this area, because it is very open; we still don't know. You talk to people in machine learning about the way we do training and they are shocked, like, "what the hell are you doing?" Because if you think about it, it is kind of weird, right: you are running a system, and you are using the output of the system to train itself. It is something bizarre... but it works, right?
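The loop being described, a system decoding unlabelled data and then training on its own confident outputs, can be sketched in miniature. A scikit-learn classifier stands in for an ASR system here; the data, confidence threshold, and number of rounds are all illustrative:

```python
# Miniature version of the self-training loop described above: a model
# "decodes" unlabelled data and its own confident outputs become training
# labels. A scikit-learn classifier stands in for an ASR system.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_lab, y_lab = X[:100], y[:100]      # small "transcribed" set
X_unlab = X[100:]                    # large "untranscribed" set

model = LogisticRegression().fit(X_lab, y_lab)

for _ in range(3):                   # a few rounds of self-training
    probs = model.predict_proba(X_unlab)
    confident = probs.max(axis=1) > 0.9          # keep confident "decodes"
    pseudo_labels = probs.argmax(axis=1)[confident]
    X_train = np.vstack([X_lab, X_unlab[confident]])
    y_train = np.concatenate([y_lab, pseudo_labels])
    model = LogisticRegression().fit(X_train, y_train)

print(model.score(X, y))             # accuracy of the self-trained model
```

The confidence filter is what keeps the loop from reinforcing its own mistakes, which is exactly the "bizarre but it works" property the panelist describes.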
0:17:45 And I was trying to organise a workshop... I mean, we thought about this particular topic, unsupervised acoustic and language and lexical modeling, for the next Interspeech, you know, in Singapore. There was little work on it, and I was just lazy, but I would encourage somebody to organise a workshop or a special session on it, and I will gladly enroll and help.
0:18:13 So, I should be up there, but... There is an elephant in the room, and we heard a little about it. In the old days we used to say that we were looking for the keys under the lamppost, and that is why we use the cepstrum. And now we are doing very well in ASR, but the real problem is not ASR, it is semantics, and it is not being addressed at all. What this community is doing is very important: you want to get very good at transcribing, but the amount of data that you transcribe, as well as the amount of data that you use for training, will never be able to be read by anybody. You really need to go much further, into language understanding, some sort of summarisation, before this becomes useful.
0:19:19 I'd like to follow up on the prior comment. All of you have seen lots of great papers and presentations here at ASRU. A year from now we will have SLT, and so I'd like to ask if anyone on the panel might have some suggestions on challenges, things that you have seen here that might motivate a challenge or some type of collaborative effort: something that might take the things we have learned from this meeting and turn them into planning for next December, to try to address issues that have come up in this discussion.
0:20:08 No one...?

0:20:14 I mean, some of the things I mentioned in my talk would be very valuable, such as distant speech recognition. In fact, just being able to recognise that the speaker is too far away, let alone correctly recognise what they are saying, would be useful; anything that relates to detecting that the speaker is in a sub-optimal condition would be useful.
0:20:49 Ten or fifteen years ago, when I started in speech, there was a lot of work on multimodality; it seems to be totally dead now, I heard the word only once or twice today. Is that something that universities could work on, or is that something that you will drive anyway with thousands of hours of annotated or unannotated data, so we shouldn't even bother to look at it again? Multimodality using robots, or video material?
0:21:20 I mean, we have an application that has a video feed constantly on our user, and I think it would be useful for us to be able to make use of that kind of data to improve speech, or any number of other types of inputs from our users. That being said, we have devices now that have a camera aimed at the user all the time; I don't know that that was necessarily true fifteen years ago, that wasn't always the case. Now we carry cameras and microphones around in our pockets constantly. So from my perspective it would be lovely for the universities to solve the problem for me; I would just take a nice black box, plug it in, and get twenty percent better success at everything. At the same time, you were just saying we have thousands of hours of data that you won't have; but then, you have tens or hundreds of grad students that I don't have, so...

0:22:12 Maybe not right here, but I know there are a lot of grad students; we will enslave them for you.
0:22:24 I wanted to say that I think Microsoft has done a very good job with the Kinect, right, where you can capture gestures. I found that really interesting because, you know, in a home environment maybe you can even use it to compensate, to help the recognizer. So I personally think it is interesting, but I would like to hear what you have to say.
0:22:49 So, it is also my view that... it is connected, so it is a device that can be easily used for data collection, and you can communicate with it by voice and by gesture... and it has cameras and microphones... so research on it is very important.
0:23:10 A quick question, or comment: do you actually use the recognition output itself?
0:23:27 Yes. So, for our language model training we use a lot of sources, as I mentioned, and one of the sources we use is the transcriptions produced by the recognizer, after some filtering. If you actually apply some standard interpolation techniques and you look at which data source contributes the most to the quality of the language model, then this unsupervised data source contributes a lot, so we will keep using it.
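A minimal sketch of what "looking at which data source contributes the most" can mean in practice: interpolate models estimated from different sources and tune the mixture weight on held-out text; a source that earns a large optimal weight is contributing a lot. Smoothed unigram models over toy word lists stand in for real n-gram language models, and the texts are illustrative:

```python
# Minimal sketch of source interpolation: mix language models trained on
# different sources and tune the mixture weight on held-out text.
import math
from collections import Counter

def unigram(text, vocab, alpha=0.1):
    """Additively smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(text)
    total = len(text) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

source_a = "play music play song stop music".split()   # e.g. typed queries
source_b = "play music please stop the song".split()   # e.g. ASR transcripts
heldout = "play the song stop music".split()
vocab = set(source_a) | set(source_b) | set(heldout)

p_a, p_b = unigram(source_a, vocab), unigram(source_b, vocab)

def perplexity(lam):
    """Held-out perplexity of the mixture lam*p_a + (1-lam)*p_b."""
    ll = sum(math.log(lam * p_a[w] + (1 - lam) * p_b[w]) for w in heldout)
    return math.exp(-ll / len(heldout))

# Grid-search the weight; a source whose optimal weight is large is the
# one "contributing the most" to the mixture.
best = min((perplexity(l / 100), l / 100) for l in range(1, 100))
print(best)
```

Real toolkits tune these weights with EM rather than a grid search, but the criterion, held-out perplexity of the mixture, is the same.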
0:24:00 And the data that you are using for training, do you combine it with additional information, other signals?
0:24:12 Okay, yes, we do have access to other, what I would call metadata, other information: for example, whether the user clicked on the result, meaning they accepted the hypothesis we provided, or whether the user stayed in a conversation, things like that. Actually, this whole thing was a surprise to us. Initially we looked at this kind of data and we figured it was going to be great, because we would be able to sample regions of the confidence distribution where the confidence is lower, and compensate, because the user click basically is telling us we did something right. But we haven't seen any improvement. It turns out that, at least so far, confidence scoring, posterior estimates and things like that work pretty well, so, I mean, it has been a bit of a disappointment to us that these latter signals don't seem to add much.
0:25:15 Thank you. If there are no questions at the moment, let me maybe return to what you were talking about before, where i-vectors were mentioned. What I have seen, just recently with people working with us: someone from Google came with an interesting problem; he wanted to train a neural network on i-vectors, and since he could extract i-vectors from thousands or millions of recordings, he could use a completely different technique, and eventually he was successful on short durations. This is something that we would possibly also be interested in: if you had those i-vectors available, we could be interested in running something on such data, because at the end the only thing that we care about is that the next ASRU will again be in some nice sunny place and we need to write papers for that.

0:26:17 So perhaps the companies could be more proactive in this sense: maybe you see an interesting problem, so maybe you could think of how to generate something that you can actually share with us, something that has real value for us in the sense that we could train our systems on it; generating these kinds of challenges, where you give us the i-vectors and say "play with them however you want". Because even if we knew that such a problem exists for Google, or could guess it, we wouldn't know how short the segments are, or what kind of data you are interested in running language identification on. And I guess a similar problem exists maybe in natural language understanding: you would have some sparsity problems, and you could possibly extract some information from the data and share it with us. Maybe people are not working on such problems because, again, we don't have the data. So what I am saying is that maybe we should think of some project that Google would even be willing to pay for, but maybe people don't even think of such projects, because they never had the initial data to play with and so never found out that there is actually some interesting problem there.
0:27:37 Anybody else? Anything you would like to add?
0:27:40 I think what the previous speakers are saying is that it is a matter of mindset... the mindset of a corporate legal department that says "this is dangerous" and doesn't make a cost-benefit analysis. So maybe I should give an example. I'm at Johns Hopkins, and while I think we are a little bit known for the speech and language groups, we are actually known for the hospital and the medical school, and there are gobs and gobs of medical data there which is similarly extremely valuable. And any time a large medical dataset is collected, believe it or not, the people who work on it actively look for ways to make it available. In other words, the tendency is not "oh, the lawyers will say no, so let's not bother"; they work really hard to figure out how to anonymize it, de-identify it, or whatever they call it. Their attitude is: we can get good things out of this data, but maybe someone out there in the world will get something more out of it, so let's see how we can make it available. And like in the case of the speaker-ID and language-ID datasets, it turned out that, given the state of the art, it might be enough to give people i-vectors; I have seen other examples of this with gene data and things like that, where you de-identify the data and then you give it out. So you should start thinking that way and start pushing back, because these lawyers always say the same thing: their first answer will be no. Right, so don't take no for an answer; just try to explore what will pass legal muster, because it is really in the interest of the community to expose students to these kinds of datasets and problems; that is where the next innovative breakthroughs are going to come from. So I think you should actually commit yourselves to saying "let's try". Take Google in particular: there is a big commitment to open source, and that didn't come about easily; I mean, you remember the days when companies would copyright everything before any code went out. But that changed, and in the same way I think we should actively push these lawyers and say: this is the necessary way to go.
0:29:53 I think there is another aspect. I definitely see your point, and at some level I agree. So there is the legal aspect, there is the privacy aspect, and then there is the trouble with public perception: "oh, they are collecting data", the privacy... So there is the public-relations aspect, and this has to be managed very carefully, because it only takes one journalist saying "oh, Google is collecting data and sending it to universities for analysis"... I remember, some years ago, well, I can't remember quite what they did, but a company tried to release some chat data, and something happened: somebody found out something about a woman, and it was a huge PR disaster. Things like that make these large companies cautious. So it is difficult; you know, I have to be honest, it is very difficult to pass through all these barriers. And then the other thing you have to deal with is executives, who often look at data as a competitive advantage. It is possible, it has happened in the past, like when we released these n-grams, but it requires a lot of work by someone... whether it is worth the money, or whether you want to spend the effort... what do we get out of this? It is difficult.
0:31:31 I know success stories. I don't think many people know this, but when work on Kaldi started, Dan was at Microsoft, and Microsoft's initial reaction was "we can keep it all in house"; I believe he fought really hard, and I give him the credit for making sure that Kaldi stayed open source.

0:31:50 I didn't know that. Examples where we have succeeded: we should try.
0:31:59 I agree with that. I really would love people to work on child speech, and we have a dataset that we have been collecting that we would love to be able to release. The problem we have is on the legal side: you know, we are a twenty-person company; if we have a problem like that, we are gone, we are going to be crushed, there won't be anything left if someone sues us because we "stole" their kid's voice, and who knows what happens then. I mean, we would be hurt completely, and I think, from a cost-benefit analysis, that risk is just too big to take for a company of our size. But that doesn't mean that we would not love to have the bright minds in this room and around the world working on children's speech. We think that is a wonderful problem that has interesting and unique issues that are not present in adult speech, especially the conversational aspects, which you generally don't see very much of. We would love to be able to do it. Getting there... de-identification is challenging, because under the regulations in the US, if it is a child's voice, the audio itself is personally identifiable; there is no way to de-identify it and still have audio. That is the challenge.
0:33:27 ...and a large amount of data to drive the research. I think this should start with the NSF or DARPA: they should create the next Babel, or something along those lines, almost on the model of information search using speech as the main interface. They should generate the data, rather than looking to Google or Microsoft.

0:33:53 That won't happen. The thing is, you have to push the envelope, so I will give another example: the Google and Microsoft n-gram corpora showed you can harvest trillions of web pages, and they turned out to be very useful. So, in other words, let's start by finding point solutions, and hopefully, in the limit, the lawyers will eventually get the message that these kinds of things are okay. But I think we really should take an exemplar and say: can we solve this problem by giving something out? Maybe that's the way to go. So I will say that where there is a will there is a way.
0:34:31 Corporations like Google and Microsoft really are hiding behind the lawyers, and I have a very specific case... I cannot go into all the details... We had LDC generate data for us, and that was good, but we knew that there would be other phenomena that would happen in the field. There happened to be a huge collection... from nineteen ninety-three, that was actually totally cleared and released, but somehow somebody in the government decided that it really could not be released, and they reclassified the data and put it away. Through a lot of pains, mostly mine and my staff's, we managed to get that data re-released, on the condition, and it cost a bit of money, that somebody would go through all the released data and simply remove all the PII, the personal information. And once that was done, we had an incredibly valuable corpus to work with. So it may be possible for Google, Microsoft, Amazon, Facebook to go through some expense, make sure that the data is cleansed, and then release it to the world. I give them the challenge to try to do that.
0:36:08 I just thought of a suggestion that might help with this, which would be: let it come from the user. Let's say that we allow the user to opt in and click a checkbox that says "whenever I use Google voice search, I actually want these data to be shared with the research community", in the same way that there is a box on your driver's licence where you can decide whether you want to be an organ donor, right? And the thing is, the new generations are much more eager to share, to basically share everything, right? I am sure that even if it is just one percent of the users who would be happy to let that data be used for any purpose, that would already be, you know, millions of hours. So maybe it is not that far-fetched, and then there are no issues; and more and more people are becoming, quote-unquote, transparent; if you have read "The Circle", for example... So it would be an easy way to just have this data available, and in fact it could even be framed as a donation: I am donating this speech, so I actually want it to go to, you know, the whole research community.
0:37:25 I would like to... sorry. So, if I can, maybe I can pose, you know, a challenge for Microsoft and Google: would you consider maybe bringing in some summer internship students, so that, even if you have to go through that same type of data vetting, they could set up and work on pieces that could be shared with the community? Because even if someone has released their data and checked that box, assuming they really meant it, there can still be sensitive information in there that they were not thinking about when they were actually doing this. And so if there were some way to have, like, a litmus test of what constitutes something beyond, you know, what would be publicly available... I am just trying to identify that space, and if something strays out of it, remove it. So would you consider supporting a couple of summer internships to go build that for the community?
0:38:31 I'd rather expect that from a small startup...
0:38:37 I don't know, I mean,
0:38:40 this is not something I alone can decide.
0:38:44 You think I have a lot of power; I don't.
0:38:50 No, I...
0:38:54 I'll bring it up, but, you know, I have low expectations.
0:38:58 To be honest, this is a lot of work.
0:39:07 But with all this talk about data, and back to Pedro: you had
0:39:12 mentioned the fifty languages or so you've collected, one week at a time. I
0:39:17 presume there's some sort of network of contractors out there that are actually doing the
0:39:21 crowdsourcing and providing some of the language expertise. Could you say something about that?
0:39:32 When we started with the languages,
0:39:37 we basically made a conscious decision to not outsource
0:39:43 the whole
0:39:45 effort
0:39:47 to external companies.
0:39:51 We realized it was easier and faster for us to do it ourselves,
0:39:56 so we built this organization to do a lot of the data collections and the linguistic annotation.
0:40:01 So it's a combination: the core staff is like five
0:40:06 people full time,
0:40:08 and then there are a lot of contractors that we bring in as linguistic teams for
0:40:14 three to six months.
0:40:16 We have all the tool infrastructure so they can work remotely,
0:40:21 and a lot of the work from our staff is managing this organization, because
0:40:26 at any time there are like a hundred and fifty full-timers, and they're all
0:40:29 contractors doing the linguistic annotations.
0:40:32 And then, for some... so we
0:40:36 consciously made the decision to do it internally, to have control of the whole
0:40:39 thing. So for things that are small annotations that we require
0:40:44 done quickly, we use our own teams, so we have a
0:40:49 linguist and the annotators,
0:40:50 and then when we require large-volume annotations we use vendors. We use
0:40:58 a lot of vendors, not just one,
0:41:01 mostly to keep a little bit of competitive pressure,
0:41:04 and we force them to use our tools.
0:41:07 The advantage of doing that is that if they use
0:41:11 our tools,
0:41:13 you know, the annotations come into our web tools, and
0:41:15 since it's web-based, they also immediately
0:41:19 come into our system and then go straight into our process.
0:41:24 But at least at that level, you know, it sounds like you are, I don't...
0:41:28 I mean, it sounds like you are
0:41:31 applying a reasonable amount of
0:41:34 annotation and quality control, and your process isn't all that different from what
0:41:39 Mary described with the Babel program.
0:41:42 Is that, I mean, is that reasonable
0:41:44 to say? I mean, a lot of this stuff is for testing sets, right?
0:41:51 So it's not necessarily training corpora; it's mostly testing sets, and because of the scale
0:41:55 of languages, that is a lot of data, right? If every quarter you
0:42:01 transcribe thirty thousand utterances per language, and then you focus on three or four domains
0:42:06 of the language model for the top languages, you are talking about,
0:42:11 I don't know, about a million
0:42:14 utterances per month being transcribed just for testing purposes.
0:42:22 And something which we also...
0:42:24 I mean, as I said, lexicons are something that
0:42:28 probably need a little bit more work to automate. But the other thing,
0:42:34 from the point of view of quality:
0:42:36 there are things you can do with money, and there are things you can do
0:42:39 by investing a lot in algorithms.
0:42:43 And, you know, we are, okay, I don't want to sound... we're more limited in engineers
0:42:47 and speech scientists than in money, not as much or something, but
0:42:51 so it's easier, no, seriously, it's easier for us to spend money and get data
0:42:57 than
0:42:59 to hire
0:43:00 a lot of people, sometimes.
0:43:03 So it...
0:43:04 I don't like the way
0:43:12 this conversation is going, because it keeps staying
0:43:15 with "let's get a lot of data"
0:43:18 and "let's get a better ASR unit".
0:43:21 And one of the problems, and I saw that in the past,
0:43:27 when we had lots of computing power, is that people, once you get corrupted
0:43:31 by all this data, keep working in the same paradigms. Lately we have a slight paradigm shift,
0:43:37 and nobody bothers to,
0:43:40 so to say,
0:43:42 come up with new methods of dealing with that.
0:43:46 The entire black hole of semantics will not be solved, no matter how much
0:43:51 data you are going to throw at it.
0:44:00 So I suggest that the LDC delete all the databases that we have
0:44:03 at the moment and we start from scratch, and we should start thinking about
0:44:07 what kind of data we should actually start collecting now, because I think, again, the
0:44:11 data that we have at the moment would be boring; it would be the same thing.
0:44:23 So I have one question.
0:44:26 The biggest part of this community, I think, is the graduate students,
0:44:30 or at least a large part of it, and I see that
0:44:39 the work is heavily driven by what's happening in the industry; you
0:44:45 know, it's very fast, and it's very much changing.
0:44:48 And we have a very good panel here, I think,
0:44:52 so could you tell us what we
0:44:55 should and shouldn't work on?
0:45:00 Are there university programs that you could recommend, steps that a student could take,
0:45:06 so to say, to get up to speed with
0:45:09 what's going on?
0:45:11 That's my first question.
0:45:13 And the second question is more for Pedro; your presentation was very good.
0:45:19 I just wanted to ask how to, so to say, scale up from the university
0:45:26 to what it is that you are doing. So those are two questions, thanks.
0:45:31 Let me take the first one.
0:45:33 Actually, going back to the data:
0:45:37 maybe we should change the way we think. We're all expecting companies to do stuff
0:45:43 for us.
0:45:46 I think this is a large community, and, you know, we can collect
0:45:49 the type of data that we need through crowdsourcing with the people
0:45:56 here, and there's a lot of us, I mean, you know,
0:45:59 if you look at Interspeech, I'd guess it's on the order of thousands of people
0:46:04 in this community. So, you know, one could develop an application where you can get
0:46:08 all the data; I would trust this community with my personal data, say.
0:46:13 So that's one way, perhaps, of getting data: rather than, you know, "who's gonna give
0:46:19 me the data", can we generate the data?
0:46:21 And going back to the question: as I said, I think there's a disconnect. Where
0:46:27 companies are going is, you know, the data is the most important
0:46:33 thing; it's not really the machine learning or the techniques that you're using.
0:46:37 And they also own the devices to access the data: they own the hardware, they
0:46:43 own the software, they own the data, and they want to control how you access
0:46:48 that data. And speech is the natural user interface, one of the modalities for this,
0:46:53 and they want to control speech. That's why, you know, you see Apple,
0:46:57 Google, Amazon, Microsoft, and other companies investing heavily in this area. That is an area
0:47:03 I would, you know, like to have the students working on, and there are challenges.
0:47:08 And also there's another gap, between, you know, the search community and the language understanding and speech community;
0:47:16 the new direction is actually falling in between them: that's large-scale language understanding,
0:47:21 and those are the areas I would intend to focus on.
0:47:36 So there is a very interesting question, right: the relation between
0:47:42 speech and text, because we have the whole domain of text processing,
0:47:48 of data mining of some sort. So we need to get the data from...
0:47:54 it doesn't need to be from a lot of people, but the analysis of the data,
0:47:58 and the
0:47:59 analysis of the correlations between the data, those are also something we can extract from
0:48:04 speech, and there is a huge
0:48:07 possibility for that analysis.
0:48:12 This is, I think, a very important topic,
0:48:17 all about this big data analysis; the data is here, and
0:48:21 you can't just delete it.
0:48:24 There was the other half of the question for...
0:48:28 Okay, the other half of the question is about how to scale from a university
0:48:32 to a business.
0:48:35 I would say that
0:48:37 the simple answer is this:
0:48:40 go outside and ask the users who really need what we are able to do.
0:48:46 If you do, if you go up to companies
0:48:52 that work with speech data, the data immediately tells you which target difficulties
0:48:58 would be worth solving.
0:49:01 And these users, these companies, have money, so if you are able to save
0:49:07 them some money, or get them customers, they will give you the money
0:49:13 to do whatever you need.
0:49:14 I guess that was originally...
0:49:16 the multilingual...
0:49:19 the group, so how it goes from university research to...
0:49:25 Well, I guess the question, Pedro, was originally like how did Google
0:49:29 manage to scale up from the university research.
0:49:37 The expertise? Right now, I think everybody came from industry:
0:49:43 the seed of this team is industry people,
0:49:47 IBM, AT&T Labs,
0:49:50 SpeechWorks.
0:49:54 Can I speak?
0:49:57 So I just had a couple of
0:50:01 thoughts about some of the various things going on. First, like, I can
0:50:04 agree that the Kinect has been a great resource for people doing multimodal research in universities.
0:50:11 It's really a nice piece of hardware that's easy to use for things like gestures;
0:50:18 people in our lab and other places I know are using it,
0:50:21 as well as, sort of, publicly available speech recognizers.
0:50:26 On the issue of the data, I think...
0:50:31 I don't think anything's ever gonna happen with companies that are collecting the data, for
0:50:35 the reasons that have been described. All through the years, even, Joe, at Bell Labs, when
0:50:40 they had all the data,
0:50:43 it wasn't shared with the community. Sometimes these things later in time come out through
0:50:49 the LDC,
0:50:51 but for the various reasons that Pedro and others described,
0:50:58 privacy issues and potential competitive issues,
0:51:04 it's not gonna be released. They'll still take students, though, and the students work on the
0:51:09 data as interns.
0:51:09 But having said that, with the techniques that they're using,
0:51:13 it's not impossible to collect data ourselves. There are
0:51:18 efforts to collect data for different languages; you can go out yourself and make
0:51:24 apps
0:51:24 and have people read speech; there are mechanisms to crowdsource annotation if you really want
0:51:30 to do that. The community could do that; we've deployed apps, and
0:51:35 you're not gonna collect data on the same scale, but you can certainly, as people
0:51:41 said, there's a way, you can make it happen. So I don't think we
0:51:44 should look to the big companies to feed us crumbs we can work on;
0:51:50 if something's really important, we can go out as a community and make it happen.
0:51:55 Another thing, talking about what research people not at the companies should be doing,
0:51:59 or what students should be looking at:
0:52:01 Joe mentioned the analogy of the keys under the spotlight. Well, publicly available corpora, sure, they're
0:52:08 a spotlight; people tend to work on those problems, and the problems that companies are
0:52:13 working on also tend to be spotlights, if you think about it. But there's a
0:52:17 lot of hard problems out there.
0:52:19 Joe mentioned semantics;
0:52:22 there are plenty of others that maybe are not commercially viable but are really hard
0:52:29 and interesting problems, and I think they would come back and benefit the more conventional things.
0:52:34 So people shouldn't just look at what's out there right now as what they should
0:52:39 be working on, but think about:
0:52:42 what are people not working on that are interesting, hard problems?
0:52:47 So that's my two cents.
0:52:50 So I'd also like to pose a question.
0:52:54 It does seem to me that industrial research is really development; it tends to be
0:52:59 near-term,
0:53:01 and universities should be doing basic research, and possibly things that could feed into
0:53:08 development-type work.
0:53:09 I personally think universities and industry have to find a way to partner,
0:53:15 in order to make sure that there is relevancy in terms of the research, but
0:53:19 that you don't
0:53:21 forgo the basic research that has to go on at the university level. And the
0:53:25 question is, I think there's a tension there; data is an aspect of it. The data certainly
0:53:29 does drive problems; people will go and participate in an open evaluation because
0:53:35 of the data. The question I have is:
0:53:38 what do you see as the ideal partnership between your companies and universities? Because
0:53:44 ideally it shouldn't just be a matter of recruiting; there has to be a reason why
0:53:48 you want to come to these conferences,
0:53:50 and you have the potential to be able to shape the future students, the future
0:53:55 PhD students, in a wide variety of countries, and it does seem like something along
0:53:59 those lines is an important thing to do.
0:54:03 So that's the context, but I also would like to hear a little
0:54:07 bit about your thoughts about what the ideal partnership might be.
0:54:27 I think there has to be an incentive.
0:54:31 There are enough problems.
0:54:34 We had a sizable team working in the product group,
0:54:38 and we learned that, you know, if you are not in research, you are not
0:54:42 really setting your agenda in terms of the time schedule:
0:54:46 you have certain deliverables; you have great ideas, but it's just not really
0:54:51 the priority because of the next deadline. So a summer intern is actually a
0:54:56 lifeline for us: we have these great problems, we just don't have time to work
0:55:01 on them, and we have the summer interns, and that's working, but that's not really
0:55:04 the solution. The solution is,
0:55:06 you know, the problems are all there, and academia could
0:55:11 be working on those. It's just: what is the incentive on the university side
0:55:17 that will engage them in working on these problems? To me that is missing.
0:55:25 And I'd also say that there has been some shift
0:55:29 in how you fund research.
0:55:32 When I first started, long-term research was about fifteen years;
0:55:37 today, long-term research is three years,
0:55:40 and that's a real problem.
0:55:43 To answer your question, Mary: I'm not sure that industry should drive this.
0:55:49 I think,
0:55:50 if the hard problems
0:55:52 are attacked and possibly solved,
0:55:54 eventually they'll find their way into research. If you wait for
0:55:58 the industry to do that research,
0:56:01 most likely the hard problems will never get done.
0:56:07 I wasn't going to say it,
0:56:09 but the idea here is,
0:56:11 and a lot of this is happening already, right? I mean,
0:56:15 industry sponsors things like the Johns Hopkins summer workshop;
0:56:19 industry sends employees there on the company salary. I mean, I know
0:56:25 everybody there.
0:56:28 We sponsor conferences,
0:56:30 students through scholarship programs,
0:56:34 and actually that's an indirect way of influence. I think many ideas like this get
0:56:38 initiated because of the student programs: they work with somebody or
0:56:43 whoever, and they say "hey, let's try this", and in the end it might get expanded.
0:56:49 There are university grants that most companies, once they have a certain size, use
0:56:55 to direct
0:56:57 the research into the areas they care about.
0:57:00 So I'm not sure
0:57:02 there is anything extra to be done there. And then of course there is that personal
0:57:06 connection, right?
0:57:09 I mean, the fact that I'm friends with some faculty,
0:57:13 it definitely, it totally, it definitely works.
0:57:18 And coming here: I always say that when I come to these conferences,
0:57:22 well, this particular one is small enough that I can actually see the posters,
0:57:26 but at the larger conferences, I guess for me the value is to catch
0:57:31 up with people in academia and see what they're doing, and to go and drink
0:57:37 a beer.
0:57:38 It's a more kind of informal
0:57:40 way of... and sometimes to tell them, "hey, if you submitted that work around here, we
0:57:44 would be interested in that".
0:57:47 So, you know, there are more indirect ways of influence; I don't think we
0:57:51 need to formalize it
0:57:52 so much.
0:57:54 Now, I think there have been exceptions where
0:57:57 a whole research lab or something was created with sponsorship funding
0:58:04 from a company, I know.
0:58:08 For example, the awards are typically small, seventy thousand dollars, fifty thousand at most, but there
0:58:14 have been cases where a million dollars, a million dollars or something, was given to a university
0:58:19 to seed a new center.
0:58:24 So, I mean, sometimes that happens,
0:58:27 but that again is not decided at my level; that kind of money
0:58:32 comes from somewhere far above these guys, and then they give half a million.
0:58:38 So I guess we have come to the time that was reserved for this panel discussion.
0:58:42 So we should remember the idea: at the next one, I guess, maybe there
0:58:47 should be a special discount for the people that are willing to record a conversation, and
0:58:51 then we can collect the data. And, of course, now that the conversation has
0:58:55 ended: maybe there should be a special discount for the people that make this conversation at
0:58:59 the end of the banquet, which would make it
0:59:02 a more difficult condition, and I guess that we now should all go and
0:59:06 practice for that.
0:59:09 So let me thank all the speakers again.