Speech Transcript - Closing discussion "What's wrong with ASR and what can we do about it"

0:00:15	so i mean i don't have this is all one solely basically let's go back
0:00:19	to the they to the first day all the way and what is that you
0:00:24	didn't ask what is it whatever it is that you didn't say
0:00:28	you sort of each that maybe you said
0:00:31	that now is a good opportunity because it happens very often that there are questions
0:00:36	that come out i don't know how fast but then you come home and
0:00:41	say oh i wish i had said this so alex definitely yes there is one
0:00:46	thing to say
0:00:49	sorry for saying something but
0:00:55	well i'll close the circle
0:00:58	basically at the beginning of the meeting we
0:01:01	learned a lot
0:01:03	well at least i learned a lot about what happens in people's brains and so forth
0:01:09	that's
0:01:11	i think an existence proof that we can understand language
0:01:19	well i thought that's my
0:01:23	but it here the and you're basically tell us all that stuff but have read
0:01:29	out from between right after that talk i'm right now is basically you know probably
0:01:34	the wrong idea
0:01:35	so
0:01:36	are you buying well whatever you why are we do we need to learn something
0:01:41	from those are
0:01:43	and how can we do that
0:01:45	that there are so real sounds so you have two choices a one choice you
0:01:52	have it is about the models used all or something probably existence proof
0:01:57	but
0:01:58	the other is to think about computation
0:02:01	and build a model that way
0:02:02	models don't have to be the same so there are certainly two paths
0:02:07	and i think learning for hence is a very bright
0:02:12	i pay some attention to that
0:02:15	there is a lot to learn from the auditory system
0:02:18	to the extent that we understand it and we understand part of it
0:02:23	so i think there are two avenues of information to mine one is
0:02:30	physiology the other is computation
0:02:33	but still there's also the model suggests also so what kind of the evidence we
0:02:39	can take advantage of
0:02:41	so i have a couple of cents for this question too so you know regarding
0:02:47	monday and tuesday the low resource day and we had some zero-resource talks and that
0:02:51	sort of related work and it turns out when you actually start removing supervision
0:02:57	from the system
0:02:58	the things that actually allow you to discover units of speech automatically are not
0:03:04	the same features that we use for supervised processing and not the same models that
0:03:08	we use for supervised processing
0:03:10	and so somehow
0:03:12	it is the case that i think well a lot of people might not be
0:03:15	interested in sort of that extreme of research because it might not always be practical
0:03:19	from one you don't insist that you can sell and so forth i think that
0:03:23	style of work where you're forced to sort of connect yourself to something like for
0:03:28	example of the and many was talking about with human language acquisition and make something
0:03:31	consistent
0:03:32	between those things can send you to new classes of models and new representations that
0:03:37	you're forced into and that i think could eventually be fed back into
0:03:42	the supervised case
0:03:49	i'm glad that you go also back to the early days like monday and tuesday
0:03:53	be more skillful of optimism and not for the two for thursday where we all
0:03:59	i like of the coach our
0:04:01	i like to remind you that indeed i think that a community is diving into
0:04:07	new types of models
0:04:09	well for better or for worse because of course always when you start some new paradigm everybody
0:04:14	jumps in and quickly you may also get discouraged but these nonlinear systems call
0:04:21	them neural networks or something they are very good in being able to
0:04:26	construct all kinds of
0:04:29	architectures highly parallel architectures
0:04:32	we have to think about the new a select up think the models the maximum
0:04:36	likelihood is gone right and this ordinance along i think there is a plenty of
0:04:41	work to do there if i may speak for myself i am a big believer
0:04:47	in highly parallel systems
0:04:50	where there are many views of speech being provided
0:04:54	and then the big issue is how do you pick up the most appropriate view
0:05:00	which might be appropriate for the situation so adaptation not by adapting
0:05:06	the parameters of the model but by picking up the right processing stream very
0:05:12	much along the lines i was quite impressed
0:05:14	by what chris was telling us that when he added a lot of noise
0:05:18	of course very few neurons were good but the ones that were were still
0:05:24	very good so essentially i'm speaking for myself now my view is
0:05:29	that the system should be highly parallel
0:05:33	trained on whatever data are available but not like one global model but
0:05:39	many parallel models
0:05:41	and as much as possible different and independent models and then the big issue is how
0:05:45	you pick up a good one so this is one direction i'm thinking about
0:05:49	i don't know what other people think about it
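The "pick up the right processing stream" idea in the preceding remarks can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration: the discussion does not say which selection criterion the speaker has in mind, so the sketch simply prefers the stream whose frame posteriors are most peaked (lowest average entropy), and the names `mean_entropy`, `pick_stream` and the toy numbers are hypothetical.

```python
import math


def mean_entropy(posteriors):
    """Average per-frame entropy (in nats) of one stream's class posteriors."""
    total = 0.0
    for frame in posteriors:
        # Zero-probability classes contribute nothing to the entropy.
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def pick_stream(streams):
    """Return the name of the stream with the lowest mean entropy.

    Assumption: a more peaked posterior is treated as a sign that the
    stream is more reliable on this input. This is only one possible
    criterion, not the one used by any system mentioned in the discussion.
    """
    return min(streams, key=lambda name: mean_entropy(streams[name]))


# Toy posteriors: two frames, three classes, for two hypothetical streams.
streams = {
    "stream_a": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    "stream_b": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],
}
best = pick_stream(streams)
```

In this toy case the selection lands on `stream_a`, whose posteriors are far from uniform. The model parameters are never touched; only the choice of stream changes, which is the contrast with parameter adaptation drawn above.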
0:05:52	but i think that there is a whole i think that whole new a whole
0:05:56	new area of research is and whole possibility for new paradigms is coming
0:06:02	i mean that's what we see all the past few years with the re you
0:06:06	re invention or rediscovery of alternatives to gmm models
0:06:14	i didn't mean to speak i mean i just wanted to give you some
0:06:18	space for thinking about what you want to say or what you want to ask
0:06:33	so i would just like to
0:06:35	ask a question about the possible eventual death of the field in the future
0:06:42	so it happened i mean i'm not old enough to have seen this but for example
0:06:45	for
0:06:46	for coding it happened that after a strong technology transfer and standards being established
0:06:53	the research field
0:06:54	well it didn't die
0:06:56	but it shrank terribly right
0:06:58	and this will happen one day with automatic speech recognition
0:07:00	we will have some standard methods and then
0:07:03	there won't be that many things to research and this is going to happen some
0:07:07	day
0:07:08	and i was wondering how much time do we have
0:07:11	because
0:07:12	we are already seeing a very strong technology transfer and there's a lot of
0:07:16	investment by
0:07:17	all the major
0:07:19	technology players in the market
0:07:21	so are we close to really solving it i don't mean solving semantic context
0:07:26	that's not recognition
0:07:28	but are we close to
0:07:31	setting some standards
0:07:32	and then it is done
0:07:33	because what i we got the research on
0:07:35	how close are we
0:07:37	ten years twenty years because this is about my career right maybe it's a side effect
0:07:48	i've life yes for the
0:07:51	i
0:07:54	i think i people that
0:07:57	that's good
0:08:00	this is average spectral for your funding sources
0:08:06	it's a can be all close hope that there is going to do and we
0:08:10	will that
0:08:11	i think i tell my students and i still believe that if
0:08:15	they are getting into speech recognition they are safe for life that was my experience
0:08:24	somehow i think
0:08:26	comparing speech coding to
0:08:29	speech recognition just doesn't fly at all
0:08:32	i mean speech coding
0:08:35	unless you're going to try for the
0:08:39	utopia of three hundred bits per second which then requires synthesis coding
0:08:45	there's just no comparison
0:08:48	it is very straightforward and eventually yes
0:08:52	standards were set
0:08:54	in the field the same
0:08:58	could be said
0:08:58	about coding of pictures
0:09:01	it is very trivial to code pictures
0:09:04	we have mpeg three mpeg four
0:09:06	it's all done
0:09:10	picture understanding which is very much like
0:09:14	is the thing
0:09:16	sort of book
0:09:19	i do think
0:09:20	that
0:09:21	the field is very far from that
0:09:26	but i think the field
0:09:28	will kill itself
0:09:30	if it assumes that it has the solutions
0:09:33	and then continues
0:09:35	to plough through just working the solutions that we have right now
0:09:41	all done so one other thing that i would probably
0:09:47	like to see happen is
0:09:51	rather than sitting around and talking about what's wrong with the field
0:09:56	to possibly construct certain experiments
0:10:01	that could point
0:10:03	to what's going on
0:10:07	just
0:10:09	for example when steve was talking before
0:10:13	i was thinking
0:10:15	so you have a mismatch in acoustics and you have a mismatch in language
0:10:19	try to fix one without the other
0:10:22	and see
0:10:23	what the result is and where it fails
0:10:29	so i think it's wonderful i want to remind people john pierce was advising us
0:10:35	to design clear experiments with clear answers
0:10:40	so that a science of speech can grow steadily step by step
0:10:46	rather than the rapture for computers and unproven theories
0:10:52	i have
0:10:54	maybe a couple of observations
0:10:56	we talk about neural nets
0:10:58	right now as an improvement and i'm sure it's obviously an improvement
0:11:03	but it actually goes in the opposite direction
0:11:08	to what we're all advising ourselves to do that is it does nothing about any independence
0:11:13	assumption it's just building a better gmm which is the place where you said that
0:11:17	wasn't a problem
0:11:18	it's not modeling dependence
0:11:19	except to the extent that we model longer feature sequences which we tried to do
0:11:25	with the gmms also
0:11:28	in terms of
0:11:30	you know when will we solve it obviously not in five years but
0:11:35	that doesn't mean never
0:11:38	so it would be nice if we could come up with the right model obviously
0:11:40	that would be the best answer
0:11:42	i'm not sure that
0:11:45	speech coding and image coding i don't believe they were solved by coming up with
0:11:51	the right answer i think they were solved by coming up with good enough
0:11:56	answers that
0:11:59	wouldn't have been practical
0:12:02	twenty five years ago because the computing was not enough to
0:12:06	implement those solutions but they are now
0:12:09	and so those
0:12:11	fairly simple fairly brute force
0:12:15	expensive methods now are practical and work just well enough
0:12:19	so i think speech recognition could go the same way you know
0:12:23	if someone is very smart and picks the right answer that's great
0:12:27	but if you
0:12:30	look at how much we've improved over say the last twenty five to fifty years
0:12:36	there's been a big improvement
0:12:40	say in twenty five years
0:12:43	and if you imagine the improvement from twenty five years ago to now
0:12:49	maybe two more times
0:12:51	and this you know grows exponentially so fifty years from now
0:12:55	i think we could say with almost absolute certainty
0:13:00	speech recognition will be completely solved to all intents and purposes that is it'll work
0:13:06	for all the things you want to do it'll work very well it'll be fast
0:13:09	it'll be cheap there will be no more research in it
0:13:13	because you will have
0:13:16	computers with
0:13:18	i don't know what the right term is but ten to the ninth
0:13:21	memory and you know ten to the fifteenth computation and you'll have modeled
0:13:29	all those differences
0:13:31	by brute force it still would never work to train on one thing
0:13:38	and then test on another but you won't have to you will have trained on everything
0:13:43	you know you will have trained on samples of everything so that it just works
0:13:47	so
0:13:49	the doom and gloom doesn't have to work that way it would just be nicer
0:13:52	to find a more elegant solution sooner
0:13:55	bcmvn this is also positive value there is a just for fast
0:14:00	i don't know nine is probably this probably few more data people in this room
0:14:04	this is actually a good point there are some ten to the nine neurons in the auditory
0:14:08	cortex so that must be ten to the nine
0:14:12	ten to the nine is a way of solving the problem and maybe it is the
0:14:16	right way to go
0:14:19	i think there is another aspect that's missing which is a
0:14:23	looking at is speech recognition this is a little
0:14:28	no acoustic signal and you're model
0:14:31	model for
0:14:32	i think we need to bring in the context and
0:14:35	we are moving towards that
0:14:39	future where there is information about the context about your personality
0:14:44	the personalisation all these things should be
0:14:49	incorporated into whatever model
0:14:51	and that will reduce some of these ambiguities that you have if you're just looking at
0:14:55	the acoustics
0:14:56	that's another you know feature you know it
0:15:02	actually i would also like to continue on what she was telling us that there
0:15:08	is not one solution to speech recognition there are many right i mean
0:15:12	just like there are many cars and many bicycles and many whatever i
0:15:16	mean we need a solution to a problem
0:15:21	and of course what we keep thinking about all the time is that we will
0:15:24	solve it in one piece i think it's okay to find many
0:15:29	smaller solutions there is no question in my mind that recognition made enormous progress i mean
0:15:36	actually even i use it here and there i mean google voice
0:15:40	and this is already quite something so google voice is a good
0:15:44	example since we have a over here i mean where the solution came to
0:15:50	the point where it's becoming useful just like a car is useful though we
0:15:55	all agree that this is not an ideal way of
0:15:58	moving people from one place to another it works to some extent so maybe
0:16:03	we should also think not only about the solution but about many
0:16:08	solutions
0:16:10	i wasn't those say that
0:16:15	and this relates to
0:16:18	about data
0:16:19	one thing we see anything this is that
0:16:23	given our models language acoustic models
0:16:27	young a particular size
0:16:29	with a C V
0:16:31	and
0:16:32	and in that sense what you say about what was also somewhat
0:16:39	you were kind of suggesting ensembles of classifiers and rocky suggesting personalisation their
0:16:44	estimate well because
0:16:47	we also know that if i build the model just for you
0:16:50	an acoustic model just for you a language model just for you it really works
0:16:54	well
0:16:55	and
0:16:56	maybe it is not the most elegant solution but
0:17:00	given enough data and enough context
0:17:02	and enough computational resources that works really well
0:17:06	and i think we are going to see a lot of work in that direction the
0:17:10	price we will have to pay is that
0:17:12	you have to let whoever's building the recognizer for you whether it is
0:17:16	nuance or microsoft or whatever
0:17:19	you have to let them access your data
0:17:22	and without that you will have to live with a speaker independent and
0:17:26	context independent system which might be good but not as good as it can be
0:17:30	or you may also provide the means for the user to modify the technology
0:17:35	in such a way that it works best for the given user and a given
0:17:38	task right you don't have to rely necessarily on the big brother whatever
0:17:43	for these things but if you provide technology
0:17:46	which is adaptive just like actually most of the technology which we are
0:17:50	using think about the car i mean you know you can drive it fast you
0:17:53	can drive it slow you can drive it crazy you can drive it safely and
0:17:57	it's a little bit up to you the technology basically was provided in such a way
0:18:01	that the user can adapt
0:18:03	it to his needs i think so this is
0:18:08	one way the other way is what we are trying to build this big
0:18:12	huge model which will encompass everything i'm more like a
0:18:18	believer in many parallel models very much along the lines of human perception in general
0:18:23	because wherever you're looking in sensory perception you typically always find many channels each
0:18:31	of them looking at the problem in a different way
0:18:34	and of course what we have available to us is to pick up the best
0:18:38	way at any given time and this is something which we have too and perhaps
0:18:42	you know but i don't want to push this particular direction which i'm thinking about i'd
0:18:45	like to
0:18:48	my belief is that just building one solution for everything is maybe not
0:18:53	the best way
0:18:56	quite
0:18:59	so i just wanted to say that
0:19:01	that the world is a dramatically different place
0:19:05	now than it was in nineteen so
0:19:10	and that
0:19:11	that the constraints
0:19:14	that drove
0:19:16	the current sort of formalism they don't exist anymore and i think chip you're
0:19:21	in shell but says that and i agree that you know if somebody didn't know
0:19:26	anything about the way we do this and they started
0:19:30	afresh
0:19:31	and thought about it in the current context it would be remarkable
0:19:37	if that person came up with the formalism that we do have now
0:19:41	and
0:19:42	i think that
0:19:44	we should spend more time i don't know we should do i certainly will thinking
0:19:50	you know about how to do this in a different way given what we have
0:19:54	and what we know about the brain i mean it's remarkable how much
0:19:59	more we know about humans
0:20:15	just comment concerning the speaker-dependent stuff that you put gets it seems year
0:20:22	but it's not really solving the problem i mean you can make a really very good
0:20:26	speaker dependent model but then the person i don't know switches the microphone and you
0:20:30	are again lost or he's calling i don't know using some obscure digital coding which
0:20:34	is completely clear for human beings but because of some strange digital artifacts your
0:20:40	whole algorithm breaks again
0:20:41	so this is i think this is somehow for the people each i'm i mean
0:20:46	to help get business in the i completely speaker-dependent environment
0:20:49	and i assume that for the people reach are in the i don't know in
0:20:53	the environment which is completely speaker independent it must be kind of the power of
0:20:57	these you know because you have a huge amount of the data which a speaker
0:20:59	dependent so
0:21:01	but it's not really solving the problem it is making the problem easier in terms
0:21:05	of error rate and everything obviously because you can train to the speaker but
0:21:08	it's not really the solution
0:21:10	that you're looking for
0:21:12	this is just a comment and then also somehow my
0:21:15	intuition or feeling is that
0:21:18	i just know that if i understand what the people are talking about
0:21:22	it is easier for me to perform speech recognition
0:21:26	so it has to do something with semantics
0:21:30	and with intelligence and
0:21:35	i don't know and so on but this is just the kind
0:21:39	of intuition
0:21:43	i have a comment about the semantics
0:21:46	my perception is that
0:21:49	in many groups
0:21:51	i mean many companies not so low resource
0:21:55	they tend to treat the recognition as a black box
0:21:58	and semantic models are built on top of it
0:22:01	maybe they do a little bit of accounting like or maybe let's go phonetic matches
0:22:07	just in case the recognizer makes a mistake
0:22:09	and
0:22:11	that's okay to get something up and running but i think that's a
0:22:15	stupid mistake
0:22:17	the semantics and the recognition should be closer together
0:22:24	i have to say it's difficult to convince some of the people doing
0:22:29	semantics that don't have any speech background
0:22:33	that things should be done differently but i believe
0:22:36	these should influence each other
0:22:37	back and forth
0:22:51	was mentioned that is
0:22:54	someone starting fresh
0:22:57	start with the approach we do
0:22:59	and it probably really true
0:23:01	one of you hear it
0:23:04	the someone E mailed out so gone into that once is
0:23:08	now we apply all the in that station the speaker adaptation or all the compensation
0:23:14	development features now neural networks someone have that right
0:23:19	it's just not gonna work right out by
0:23:22	and you can i
0:23:24	compensate for thousands of hours that on in its current a broken
0:23:37	the renaissance neural networks so morgan
0:23:47	using neural networks in the hybrid formalism because nobody
0:23:55	you know
0:23:56	was that interested because of all the other things that were working so well and
0:24:01	why would anyone in their right mind want it right
0:24:04	but then all of a sudden we're back you know we're back in this
0:24:08	zone where people are doing it so all i'm saying is that the lesson
0:24:11	i take from that is
0:24:13	you know if you can if you can work in if you can get something
0:24:16	that is that is that makes sense and is and that is demonstrated really good
0:24:23	on a small problem
0:24:25	well then maybe that would be pretty compelling
0:24:28	i mean i agree with you though it's a it's the success is pretty are
0:24:33	you know if i have it is something that i am i gonna say what
0:24:36	we think about this for forty years know exactly
0:24:42	we all know thirty six
0:24:44	and maybe i'd like to mention something that we should not do which is designing experiments
0:24:48	where we say
0:24:50	i will show you on the state-of-the-art systems that my method works a little bit
0:24:55	better
0:24:55	because that is not really very scientific is it i
0:25:00	mean a scientific experiment is that you isolate one problem and you sort of
0:25:03	try to change the conditions and see if things go up or things go down and in
0:25:09	a well designed experiment if you get worse and you predicted it would be worse
0:25:14	given your hypothesis i think you are winning right we almost never
0:25:20	report results like that because our belief is that the only way to convince our
0:25:25	peers that what you are doing is useful is that
0:25:30	you get as low a word error rate as possible on the state-of-the-art system with the
0:25:35	commonly accepted task whatever it is at the moment
0:25:38	so designing good experiments again going back it seems to john pierce
0:25:44	design clear definite experiments so that science can grow step by step by step
0:25:50	i think that we have to learn how to do that and since you mentioned
0:25:54	neural networks i want to share with you my personal experience
0:25:58	it's different houses here is going to be and he may not even remember
0:26:02	but a long time ago when he was a postdoc at icsi he ran an experiment
0:26:07	where he had a context independent hmm model with context independent phonemes and the
0:26:14	neural net model and the neural net model was doing twice as good as the hmm and
0:26:20	that convinced me i mean you know that we stuck to neural nets
0:26:23	throughout the dark ages of neural nets partially because we invent
0:26:28	have a so but in hmms an lvcsr as but as a partially because i
0:26:32	truly believed that because that was an experiment which was very convincing to me if
0:26:36	i have a simple gmm model
0:26:39	without any context-dependency to try easy to of course building to do system and context
0:26:44	the i mean context independent hmm model which was the only way which we between
0:26:49	you have to be noted at a time
0:26:51	and the neural net is doing twice as good as the hmm why wouldn't i
0:26:56	stick to this neural net model i'm glad that we did
0:27:00	i don't know steve if you remember this experiment i say good but i think
0:27:03	it actually got published even in transactions eventually right
0:27:10	you know one other way you can get yourself out of a local
0:27:13	optimum is to change the evaluation criterion right and i think that's i
0:27:19	mean in part what mary's done with the babel program you know having keyword search as
0:27:22	the task and atwv well it's correlated with word error rate it's not always perfect and i
0:27:26	think another thing that
0:27:28	people it seems to me really ought to be reporting when you
0:27:32	report a word error rate is not just the mean word error rate but the
0:27:35	variance across the utterances because you can have a five percent word error rate but
0:27:39	if a quarter of your utterances are essentially you know eighty percent word error rate
0:27:43	which can happen then you know that's a good way to start figuring out how
0:27:47	to get your
0:27:48	technology a little more reliable
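The suggestion to report the spread of word error rate across utterances, and not only its mean, can be sketched as follows. This is a minimal illustration under stated assumptions: the word error count is a plain Levenshtein distance over whitespace-separated words, every reference is non-empty, and the function names and toy sentences are hypothetical. Note also that the mean of per-utterance rates computed here is not the same quantity as the usual corpus-level rate (total errors divided by total reference words).

```python
from statistics import mean, pvariance


def word_errors(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def utterance_wers(pairs):
    """Per-utterance word error rate for (reference, hypothesis) strings.

    Assumes every reference contains at least one word.
    """
    return [word_errors(ref.split(), hyp.split()) / len(ref.split())
            for ref, hyp in pairs]


def wer_summary(pairs):
    """Return (mean, population variance, worst case) of per-utterance WER."""
    wers = utterance_wers(pairs)
    return mean(wers), pvariance(wers), max(wers)


# Toy data: three utterances recognised perfectly, one badly.
pairs = [
    ("the cat sat", "the cat sat"),
    ("the cat sat", "the cat sat"),
    ("the cat sat", "the cat sat"),
    ("a b c d", "x y z d"),
]
avg, var, worst = wer_summary(pairs)
```

With this toy data the average alone looks moderate, while the variance and the worst case expose the one utterance on which recognition broke down, which is the kind of reliability problem the speaker is pointing at.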
0:27:51	i was hoping you would have a comment
0:27:54	i feel
0:27:56	i feel obligated to
0:27:58	to
0:28:00	talk about ancient history since i'm getting a little older now
0:28:05	i remember when hmms started and we were certainly not the first to use them
0:28:11	we were sort of in the middle of that
0:28:13	of that previous
0:28:15	a revolution
0:28:17	the big criticism there were two big criticisms of hmms
0:28:22	relative to the previous method the previous method was just write the rules because we
0:28:26	all know about speech and say how it works and those systems which i wrote
0:28:31	systems like that back and the early seventies because i was a late adopter of
0:28:35	hmms
0:28:37	those systems were very simple easy to understand extremely fast
0:28:44	needed no training data
0:28:46	that sounds nice right
0:28:49	and they could do very well on simple problems without training data and
0:28:54	with hmms the government argued and other people argued and sometimes we argued hmms
0:28:59	were too complicated required too much storage too much training too much memory and would
0:29:06	never be practical
0:29:09	well obviously things changed and it wasn't only computing power that was a big factor
0:29:15	but it was also learning how to make it more efficient and a
0:29:22	combination of all of those not being
0:29:25	so rigid as to say we have to do it with zero data and
0:29:29	just what i learned in my acoustic phonetics class
0:29:33	we could use data
0:29:34	more data always helped
0:29:36	learning to do speaker adaptation rather than speaker dependent models
0:29:42	okay neural nets
0:29:44	neural nets worked on simple problems but not on more complicated problems
0:29:50	and i'd say the reason it works now is because we can
0:29:55	now do you know two three years ago the things that were working were
0:29:59	requiring two months of computation which is just you know unacceptable completely unacceptable some bold
0:30:06	people did that that's great and then they figured out how to get better computers
0:30:12	all of this argues that each revolution which happens on a twenty five year
0:30:18	cycle
0:30:20	is the realisation that all of the intelligent things that we thought we knew
0:30:27	ken stevens would tell us what happens with formant frequencies and i learned all those
0:30:31	things all of those were not the way to go the real understanding was not
0:30:36	the way to go which bothered us because we'd like to think about
0:30:42	we like to think about you know phonemes and things like that
0:30:48	but we know that phonemes are abstractions
0:30:51	we know that formants are an oversimplification
0:30:54	everything that we learn is an oversimplification and computers are just simply more powerful than
0:31:00	we are
0:31:02	than anything we can write they're not more powerful than the brain but
0:31:06	they're more powerful than anything that we can write in a program
0:31:10	so i think
0:31:12	that would argue against
0:31:16	i'm not saying that you shouldn't keep trying to find the
0:31:21	right answer but i think history has told us that the right answer is to think
0:31:26	about more efficient ways
0:31:29	you know computing will increase it's increased by a factor of a thousand in the
0:31:33	last twenty five years both in terms of memory and storage and it will increase by a
0:31:37	factor of a thousand every twenty five years forever
0:31:41	and that's a big number in fifty years
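The arithmetic behind "a big number in fifty years" can be checked with a short sketch. It takes the speaker's figure of a thousandfold increase every twenty five years at face value and assumes smooth exponential growth in between; the function names are illustrative only.

```python
import math


def growth_factor(years, factor_per_period=1000.0, period_years=25.0):
    """Total growth after `years`, assuming a constant exponential rate."""
    return factor_per_period ** (years / period_years)


def doubling_time(factor_per_period=1000.0, period_years=25.0):
    """Years needed to double at that same constant rate."""
    return period_years / math.log2(factor_per_period)


fifty_year_factor = growth_factor(50)   # a thousand times a thousand
years_to_double = doubling_time()       # roughly two and a half years
```

Under these assumptions fifty years compounds to a factor of about a million, and the implied doubling time is about two and a half years, somewhat slower than the eighteen-month doubling mentioned later in the discussion.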
0:31:46	but at the same time we can think about algorithms that are a thousand times
0:31:51	more efficient
0:31:52	that had that has happened and it will happen
0:31:57	it a little you know collects that's of data other people can collect parts of
0:32:01	data i think it will happen that we will have corpora that include the speech
0:32:06	of millions of people from
0:32:09	hundreds of languages in hundreds of environments
0:32:15	and if you just imagine that it was let's just posit that it was
0:32:21	simple and easy to collect millions of hours from all these environments and memorise all
0:32:26	of it and learn what to do with it and compute it store it all
0:32:29	in something that fits in your you know in the chip that's embedded in your
0:32:34	in your hand or something or in your head
0:32:39	well then it just works you don't know why or how it works but it
0:32:42	works
0:32:44	so i
0:32:46	while i have the same desire to understand
0:32:53	intellectually what's going on i would bet almost anything that that will be the solution
0:32:58	that eventually works
0:33:04	so i'd like to take the other side
0:33:07	and the other side is if you look at the history of science
0:33:10	what's happened is
0:33:11	our truly
0:33:13	stupendous advances have come from understanding where
0:33:18	our current models don't work
0:33:20	it's not
0:33:21	that we shouldn't try to push models
0:33:23	but the thing that you're describing
0:33:26	is engineering
0:33:27	i'm a fan of engineering but true understanding comes from looking at the places where our
0:33:33	current models fail
0:33:35	and all of the things that we've been doing for the past twenty years are
0:33:39	data
0:33:40	for the next
0:33:42	and we should be paying attention to where we fail
0:33:45	and that's where we're gonna find the success
0:33:49	so a
0:33:51	i want to add to it a little bit
0:33:54	it seems like this i think that i like which we always think
0:34:00	a
0:34:01	the old story is if you take
0:34:04	an infinite number of monkeys and give them
0:34:07	an infinite number of typewriters eventually you will get shakespeare
0:34:11	and i think that's what you're suggesting
0:34:14	you have a few problems number one
0:34:18	moore's law
0:34:20	pretty much came to an end
0:34:23	and that industry is facing the same problem unless there is a dramatic
0:34:28	technological shift
0:34:30	you're not going to get
0:34:33	the kind of doubling that we've seen every eighteen months
0:34:37	in the future
0:34:38	basically quantum mechanics eventually gets in your way
0:34:43	the lines are so narrow now that there are not too many atoms
0:34:48	to allow for them to continue to be
0:34:52	a
0:34:53	somebody else said something about
0:34:56	well what would happen if people started
0:35:00	doing this research all over again would they find the same solution
0:35:05	i'm reading now a marvellous book design in nature which tries to explain evolution
0:35:12	not just of humans but of rivers and everything else in terms of
0:35:18	physical laws
0:35:19	i highly suggest reading it it's very entertaining but basically
0:35:24	and then going back to the coding i think when the coding what was done
0:35:28	it really was fundamental in the sense that we understood
0:35:33	that pitch and spectrum were the essence so for example the coding that works on
0:35:40	your cell phone which is really meant to code speech if there is music in the
0:35:45	background it totally distorts it because it really is adapted to the speech signal
0:35:51	so it wasn't just a random brute force process it really depended on first lpc then
0:35:59	are is a coding the residual and all of that and that's why we have
0:36:03	such good coders and i think
0:36:06	a
0:36:07	the theory behind that was of course much more trivial than it is in
0:36:12	language
0:36:13	so i do think that
0:36:15	we need to continue the work that we're doing but on the other hand
0:36:21	allow for some paradigm shifts that would be more than just increasing
0:36:27	the stochastic ability by introducing neural nets and
0:36:34	from where i sit a thousand miles away neural nets essentially are a generalization
0:36:40	of hmms they're both stochastic models it's just that in an hmm you have essentially a
0:36:47	single layer
0:37:00	so i think the point about how much data we need to solve the
0:37:03	problem by brute force comes down also to the question of
0:37:07	artificial intelligence right
0:37:08	so continuing with these doomsday scenarios one even scarier is that one day we're
0:37:14	going to get a activity in to use right
0:37:17	and so this process this when this happened or in the way so that moment
0:37:21	we're going to lose control of abstraction right machines are going to be better than
0:37:25	us at creating their own abstractions so all this prior knowledge we want to
0:37:30	put into our models
0:37:31	is going to be our way of seeing things but machines are going to have
0:37:35	their way of seeing things
0:37:36	and when i see these discussions saying
0:37:39	we have to look at the problem and think like humans and
0:37:42	i think well
0:37:43	it is already happening that machines create their own abstractions and they are
0:37:47	not intuitive to us but since they are going to do better than
0:37:50	us in the long term we might be better off just thinking you
0:37:54	know how does the machine think about this not how i like to think
0:37:57	about this
0:37:58	how i can express the problem okay you add a generative model that is intuitive
0:38:02	to me
0:38:03	maybe it should be intuitive to the machine
0:38:05	or to the hardware right and deep neural networks
0:38:08	to some extent
0:38:09	okay
0:38:11	i know we are very far away from that singularity right but when we will
0:38:16	reach that so maybe we'll be better off thinking
0:38:20	and i
0:38:33	that they are really always looking in the light and basically after fifty years of
0:38:38	artificial intelligence we have essentially developed
0:38:42	tremendous methods for optimization and classification but very little more in inference and logic
0:38:50	so i'm very glad that the field is alive and well as i can see
0:38:55	from this discussion it really reminds me of which it reminded us that for one
0:39:02	of the first asru workshops and i will also remember that even
0:39:07	in my introduction
0:39:08	where people were discussing fighting and it was always the desire to move the field further
0:39:15	and i'm very happy that i think that we succeeded to a large extent in
0:39:19	this asru too so let's just keep it going i think otherwise i will
0:39:24	i will pass the microphone to one zap who has a
0:39:28	a sound
0:39:29	since to say about is it is it the time for post the room or
0:39:33	basically i have just one comment on this discussion i think
0:39:38	what we were discussing with the data that models the adequacy of models monitored by
0:39:43	C
0:39:43	i think well it turned a little bit speech centric
0:39:47	so a little bit too selfish i find so i think we forgot about the
0:39:52	users of our technologies because i have the impression
0:39:55	that well rarely people would just ultimately use the output of asr and
0:40:00	say this is the output and here it finishes most of the time it is
0:40:04	just some intermediate product that would be further used by someone so actually
0:40:08	i like the way that the better what so speaking about that the well for
0:40:13	you the wer is not the ultimate metric but it is the click through
0:40:16	rate or for call center traffic it might be the customer satisfaction so
0:40:21	they have measures for it
0:40:23	for a government agency it might be the number of caught
0:40:27	bad guys
0:40:28	and so on and so on so i think actually there is still quite some
0:40:31	work to do in propagating these target metrics
0:40:34	back to our field and i don't know if there was like sufficient work
0:40:38	on this maybe they are not at all interested
0:40:41	in at W or wer and stuff like this they just need to get
0:40:46	their work done
0:40:51	okay so we cook is sorry i didn't i didn't mean that the
0:40:55	find technical common and in the i did so no
0:41:01	no comments on this
0:41:02	one
0:41:04	lost

4th Day