| 0:00:15 | uh, welcome to, uh, i guess |
|---|
| 0:00:18 | good morning everyone |
|---|
| 0:00:20 | first a couple of practical announcements |
|---|
| 0:00:24 | we have a change of room |
|---|
| 0:00:26 | you know that club B was really small and we were afraid that people would not |
|---|
| 0:00:31 | fit in |
|---|
| 0:00:32 | so uh we moved everything from club B, and the expert sessions from club E, |
|---|
| 0:00:38 | to the north hall |
|---|
| 0:00:39 | it's actually the uh hall on the second floor, next to the registration |
|---|
| 0:00:45 | and we should have more space there, so it should be fine |
|---|
| 0:00:49 | uh actually |
|---|
| 0:00:50 | club B |
|---|
| 0:00:51 | should be closed then |
|---|
| 0:00:53 | signs |
|---|
| 0:00:54 | will be there |
|---|
| 0:00:55 | then uh for the internet, really sorry for the trouble yesterday |
|---|
| 0:00:59 | that was caused by the provider |
|---|
| 0:01:05 | the range problems are all sorted |
|---|
| 0:01:08 | so it should be available again |
|---|
| 0:01:10 | but please uh |
|---|
| 0:01:13 | we have just |
|---|
| 0:01:15 | five hundred twelve addresses available |
|---|
| 0:01:17 | there is no |
|---|
| 0:01:18 | way |
|---|
| 0:01:19 | to get more |
|---|
| 0:01:20 | so please disconnect when you |
|---|
| 0:01:22 | do not need to be connected |
|---|
| 0:01:29 | [a few words unintelligible] |
|---|
| 0:01:33 | then uh for the banquet tonight |
|---|
| 0:01:35 | we have uh |
|---|
| 0:01:37 | you need a ticket |
|---|
| 0:01:38 | i'm sorry for that, but if you don't have it you will not be allowed to get on the |
|---|
| 0:01:43 | bus |
|---|
| 0:01:45 | there is a very limited number of tickets still available at the registration desk |
|---|
| 0:01:51 | the departure will be right after the uh session |
|---|
| 0:01:56 | at seven, from entrance number ten |
|---|
| 0:01:59 | and the transportation back from the venue |
|---|
| 0:02:01 | is not provided, so |
|---|
| 0:02:03 | you may |
|---|
| 0:02:05 | continue your evening there and make your own way back |
|---|
| 0:02:12 | and uh i'm pretty much done, so uh there will be a short introduction |
|---|
| 0:02:17 | of our keynote speaker |
|---|
| 0:02:27 | [applause and pause; largely unintelligible] |
|---|
| 0:03:08 | and uh and uh |
|---|
| 0:03:09 | it's time for the second keynote |
|---|
| 0:03:12 | so uh |
|---|
| 0:03:14 | it's going to be given by |
|---|
| 0:03:16 | nelson morgan |
|---|
| 0:03:17 | from icsi berkeley |
|---|
| 0:03:19 | and uh, and hynek, |
|---|
| 0:03:21 | pardon the pronunciation of the name, |
|---|
| 0:03:23 | will introduce the speaker and chair the session |
|---|
| 0:03:31 | thank you very much for coming so early |
|---|
| 0:03:36 | it is my great pleasure |
|---|
| 0:03:40 | to introduce |
|---|
| 0:03:45 | nelson morgan |
|---|
| 0:03:52 | for those of you |
|---|
| 0:03:54 | who have been working in speech for a very long time |
|---|
| 0:03:58 | he is the author of a number of techniques |
|---|
| 0:04:02 | and is also known to a number of |
|---|
| 0:04:05 | you in the audience as the director of icsi |
|---|
| 0:04:09 | so |
|---|
| 0:04:09 | for those people |
|---|
| 0:04:12 | he doesn't need much of an introduction |
|---|
| 0:04:15 | for those of you who don't know him |
|---|
| 0:04:17 | he is also the co-author |
|---|
| 0:04:19 | of a textbook |
|---|
| 0:04:20 | together with one of the fathers of |
|---|
| 0:04:22 | signal processing |
|---|
| 0:04:24 | ben gold |
|---|
| 0:04:26 | and i hear a new edition |
|---|
| 0:04:28 | is coming out |
|---|
| 0:04:33 | what else can i say, well i think i will keep it short, as |
|---|
| 0:04:37 | hearing him will be better than |
|---|
| 0:04:40 | looking at me |
|---|
| 0:04:54 | thanks, hynek |
|---|
| 0:04:55 | well i thought it was time for a little bit of a reality check |
|---|
| 0:04:59 | and uh speech recognition |
|---|
| 0:05:01 | and |
|---|
| 0:05:02 | it's been around for a long time as i think everybody here knows |
|---|
| 0:05:06 | very long research history |
|---|
| 0:05:08 | uh lots of publications for decades many projects |
|---|
| 0:05:12 | and many sponsored projects |
|---|
| 0:05:14 | systems have continually gotten better |
|---|
| 0:05:17 | it actually tended to converge so that there is |
|---|
| 0:05:20 | in some sense a a standard |
|---|
| 0:05:22 | automatic speech recognition system now |
|---|
| 0:05:24 | uh it's made it to a lot of commercial products |
|---|
| 0:05:28 | actually been used |
|---|
| 0:05:29 | actually works from time to time |
|---|
| 0:05:32 | and so in some sense |
|---|
| 0:05:33 | it seems to have graduated |
|---|
| 0:05:36 | but |
|---|
| 0:05:39 | yet it fails where humans don't |
|---|
| 0:05:41 | and by the way those of you who have your P H Ds |
|---|
| 0:05:44 | know that your education hopefully was not done at that point |
|---|
| 0:05:49 | and there's probably a lot more to do here |
|---|
| 0:05:51 | uh some would argue |
|---|
| 0:05:53 | that there is little basic science that's been developed in quite a bit of time |
|---|
| 0:05:58 | lots of good engineering methods though |
|---|
| 0:06:00 | but they often require a great amount of data |
|---|
| 0:06:03 | uh as we learned yesterday there is a great deal of data |
|---|
| 0:06:07 | but not all of it is |
|---|
| 0:06:08 | available for use in the way that you like |
|---|
| 0:06:10 | and there are many tasks where you don't have that much |
|---|
| 0:06:13 | and each new task requires |
|---|
| 0:06:15 | uh essentially the same amount of effort you sort of have to start over again |
|---|
| 0:06:20 | so how do we get to this point |
|---|
| 0:06:21 | this is not gonna be anything like a complete history but |
|---|
| 0:06:25 | enough to make my point, hopefully |
|---|
| 0:06:27 | so |
|---|
| 0:06:28 | i'm gonna talk about the current status and the standard methods |
|---|
| 0:06:31 | a very briefly |
|---|
| 0:06:33 | uh talk about some of the alternatives that people have worked with over the years |
|---|
| 0:06:37 | and where could we go from here |
|---|
| 0:06:41 | so |
|---|
| 0:06:41 | as i mentioned |
|---|
| 0:06:42 | speech recognition research has been around for a very long time |
|---|
| 0:06:46 | uh a significant papers for sixty years |
|---|
| 0:06:50 | by the nineteen seventies |
|---|
| 0:06:52 | in some sense the major advances in modeling had happened |
|---|
| 0:06:56 | that is the basic |
|---|
| 0:06:57 | mathematics behind hidden markov models |
|---|
| 0:07:00 | was done by then |
|---|
| 0:07:02 | there have been lots of improvements |
|---|
| 0:07:03 | that happened uh for the next twenty years or so |
|---|
| 0:07:06 | and also in the features |
|---|
| 0:07:08 | which became |
|---|
| 0:07:09 | more or less standard by nineteen ninety or so |
|---|
| 0:07:12 | there were some really important methodology improvements by nineteen ninety; in earlier days |
|---|
| 0:07:17 | people did many experiments but it was very hard to compare them |
|---|
| 0:07:20 | and the notions of standard evaluations and standard datasets really took hold by nineteen ninety or so |
|---|
| 0:07:27 | and over all of these years |
|---|
| 0:07:29 | uh especially the last twenty or thirty years there have been continuous improvements |
|---|
| 0:07:33 | which were to some extent really closely related to moore's law |
|---|
| 0:07:36 | improvements in the technology |
|---|
| 0:07:38 | that is |
|---|
| 0:07:39 | um more and more computational capability |
|---|
| 0:07:42 | more and more storage capability |
|---|
| 0:07:44 | allowing people to work with very large datasets |
|---|
| 0:07:46 | and develop very large models to well represent those large datasets |
|---|
| 0:07:51 | so on |
|---|
| 0:07:53 | so |
|---|
| 0:07:54 | there's an elephant in the room, which is that things |
|---|
| 0:07:57 | are not entirely working still |
|---|
| 0:08:00 | and these systems in fact have converged |
|---|
| 0:08:02 | which was kind of a byproduct of all of these standard evaluations, which were |
|---|
| 0:08:06 | very good in many ways |
|---|
| 0:08:08 | but |
|---|
| 0:08:09 | when people found out that the other group |
|---|
| 0:08:11 | had something that they didn't, they would copy it, and very soon the systems would become very much the same |
|---|
| 0:08:18 | so |
|---|
| 0:08:19 | what are some of the remaining problems |
|---|
| 0:08:22 | well |
|---|
| 0:08:22 | systems still perform pretty poorly, despite a large amount of work on this, |
|---|
| 0:08:27 | in the presence of significant amounts of acoustic noise |
|---|
| 0:08:30 | also reverberation |
|---|
| 0:08:32 | which is natural for |
|---|
| 0:08:34 | just about any situation |
|---|
| 0:08:37 | uh unexpected speaking rate or accent |
|---|
| 0:08:39 | that is, by unexpected i mean something that is not well represented in the training set |
|---|
| 0:08:45 | uh unfamiliar topics |
|---|
| 0:08:47 | uh the language models bring us a lot of the performance that we have, and if you |
|---|
| 0:08:51 | don't have a particular topic represented in the language model, it can do poorly |
|---|
| 0:08:57 | and |
|---|
| 0:09:01 | apart from the recognition performance per se, how many words you get right, |
|---|
| 0:09:01 | another thing that's important is knowing whether you're right or wrong |
|---|
| 0:09:05 | and that's very important for practical applications |
|---|
| 0:09:08 | and that still needs some work as well |
|---|
| 0:09:12 | so it turns out that even some fairly simple speech recognition tasks can still fail under some of these conditions |
|---|
| 0:09:17 | yielding some strange results |
|---|
| 0:09:20 | [video clip: a comedy sketch about voice recognition technology; the audio is largely unintelligible in this transcript] |
|---|
| 0:10:45 | so that was funny |
|---|
| 0:10:47 | i hope you think it was funny but |
|---|
| 0:10:49 | what |
|---|
| 0:10:49 | hasn't worked in real life, as opposed to just the jokes, |
|---|
| 0:10:53 | and what has |
|---|
| 0:10:56 | so uh let me start off with |
|---|
| 0:10:58 | uh |
|---|
| 0:10:59 | some results from some of these standard evaluations i referred to |
|---|
| 0:11:03 | this is a graph that people in speech have seen a million times |
|---|
| 0:11:06 | uh |
|---|
| 0:11:07 | is this other one |
|---|
| 0:11:09 | um |
|---|
| 0:11:10 | for those of you who aren't familiar with this, the main thing to note is that W E |
|---|
| 0:11:14 | R stands for word error rate |
|---|
| 0:11:16 | a high word error rate is obviously bad; this is time on the axis |
|---|
| 0:11:20 | and each of these lines represents a series of tests |
|---|
| 0:11:23 | oh this is a kind of messy graph so let's clean it up a little |
|---|
| 0:11:26 | and |
|---|
| 0:11:27 | uh this is a task done in the early nineties uh called ATIS |
|---|
| 0:11:32 | and the main thing to see here, as with a lot of these, is that it starts off at a |
|---|
| 0:11:35 | pretty high error rate, people work for a while, |
|---|
| 0:11:38 | and after a while it gets down to a pretty reasonable error rate |
|---|
| 0:11:43 | let's go to another one, this was uh |
|---|
| 0:11:45 | conversational telephone speech |
|---|
| 0:11:47 | you have the same sort of effect, and do remember that this is a um |
|---|
| 0:11:52 | logarithmic scale here |
|---|
| 0:11:53 | so even though it looks like it hasn't come down very far, it really did come down pretty far, but after a |
|---|
| 0:11:58 | while it sort of levels off |
|---|
| 0:12:00 | uh more recently there's been a bunch of work on speech from meetings, which is also conversational |
|---|
| 0:12:05 | these are from the uh individual head-mounted microphones |
|---|
| 0:12:09 | so we still didn't have huge effects of background noise or reverberation or anything |
|---|
| 0:12:14 | and there wasn't actually a huge amount of progress after some of the initial work |
|---|
| 0:12:20 | uh now those were |
|---|
| 0:12:21 | research evaluations |
|---|
| 0:12:23 | uh for commercial products |
|---|
| 0:12:25 | i think |
|---|
| 0:12:26 | uh you know |
|---|
| 0:12:27 | a lot of information is proprietary |
|---|
| 0:12:29 | but i think what we can say is that |
|---|
| 0:12:31 | commercial products work some of the time for some people |
|---|
| 0:12:34 | and they often don't work |
|---|
| 0:12:35 | for others |
|---|
| 0:12:37 | so what is the state |
|---|
| 0:12:39 | well, the recognition systems will either |
|---|
| 0:12:42 | work really well for somebody |
|---|
| 0:12:44 | or they'll be terribly brittle and unreliable |
|---|
| 0:12:47 | uh i know that when my wife and i both tried uh dictation systems, they worked wonderfully for |
|---|
| 0:12:52 | her and terribly for me; i think i swallow my words or something |
|---|
| 0:12:57 | so here's an abbreviated review |
|---|
| 0:12:59 | of what was standard |
|---|
| 0:13:01 | by nineteen ninety-one |
|---|
| 0:13:03 | we had |
|---|
| 0:13:05 | uh feature extraction |
|---|
| 0:13:06 | basically being based on frames every ten milliseconds or so |
|---|
| 0:13:10 | computing |
|---|
| 0:13:11 | something from a short-term spectrum |
|---|
| 0:13:14 | uh things called mel-frequency cepstral coefficients |
|---|
| 0:13:17 | i'll |
|---|
| 0:13:18 | mention a bit more about that in a second |
|---|
| 0:13:20 | uh |
|---|
| 0:13:21 | P L P is another common method developed by then |
|---|
| 0:13:25 | delta cepstra |
|---|
| 0:13:26 | uh |
|---|
| 0:13:26 | uh essentially temporal derivatives of the cepstra |
|---|
| 0:13:30 | and on the statistical side |
|---|
| 0:13:32 | uh acoustic modeling hidden markov models were quite standard |
|---|
| 0:13:36 | they typically by this point represented |
|---|
| 0:13:38 | context-dependent phone units, or phoneme-like units |
|---|
| 0:13:42 | uh the language models were pretty much by this time all statistical |
|---|
| 0:13:46 | and they represented context-dependent words |
|---|
| 0:13:50 | so all this was there by nineteen ninety-one |
|---|
| 0:13:52 | now let's move to two thousand eleven |
|---|
| 0:13:56 | there it is |
|---|
| 0:13:58 | uh |
|---|
| 0:13:59 | notice all the changes |
|---|
| 0:14:02 | okay that's a little unfair |
|---|
| 0:14:04 | uh people have actually done work in the last twenty years |
|---|
| 0:14:07 | and this is |
|---|
| 0:14:09 | a representation of a lot of it, i think |
|---|
| 0:14:11 | and these have had big effects |
|---|
| 0:14:13 | i don't mean to minimize |
|---|
| 0:14:14 | them |
|---|
| 0:14:15 | uh various kinds of normalisation: uh mean and variance kinds of normalisation |
|---|
| 0:14:20 | uh an online version of that that we call rasta |
|---|
| 0:14:23 | uh vocal tract length normalisation which |
|---|
| 0:14:26 | compresses or expands the spectrum and |
|---|
| 0:14:29 | in such a way is as to match the models better |
|---|
| 0:14:33 | um |
|---|
| 0:14:34 | and uh then |
|---|
| 0:14:35 | adaptation and feature transformation |
|---|
| 0:14:38 | uh either adapting better to a test set that's somewhat different from the training set |
|---|
| 0:14:42 | uh or uh |
|---|
| 0:14:44 | various changes to make the features more discriminative |
|---|
| 0:14:49 | discriminative training |
|---|
| 0:14:51 | actually |
|---|
| 0:14:52 | uh changing the statistical models |
|---|
| 0:14:55 | in such a way as to make them more discriminant between different speech sounds |
|---|
| 0:14:59 | we did have more and more data over the years, and that required |
|---|
| 0:15:03 | lots of work to figure out how to handle that |
|---|
| 0:15:05 | but aside from handling it was also taking advantage of lots of data |
|---|
| 0:15:09 | which didn't come for free, so there was lots of engineering work there |
|---|
| 0:15:14 | uh people found that |
|---|
| 0:15:15 | combining systems helped and sometimes combining |
|---|
| 0:15:18 | pieces of systems helped |
|---|
| 0:15:20 | and that's been an important thing in improving uh performance |
|---|
| 0:15:24 | and because |
|---|
| 0:15:25 | uh speech recognition was starting to go into applications you had to be concerned about speed |
|---|
| 0:15:30 | and there's been a lot of work on that |
|---|
| 0:15:33 | well, a bit more on some of this uh |
|---|
| 0:15:35 | the main point uh about mel cepstrum and plp i wanna make is that |
|---|
| 0:15:40 | each of them uses this kind of warped frequency scale |
|---|
| 0:15:43 | uh in which you have better resolution at low frequencies than at high frequencies |
|---|
| 0:15:47 | "'cause" our perception of different uh |
|---|
| 0:15:50 | speech sounds is very different at low frequencies versus high frequencies |
|---|
| 0:15:53 | mel cepstrum and plp use different mechanisms |
|---|
| 0:15:57 | for getting a smooth spectrum uh |
|---|
| 0:16:00 | delta cepstrum uh |
|---|
| 0:16:02 | uh as i said is basically |
|---|
| 0:16:05 | uh time derivatives uh of the cepstrum |
|---|
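To make the front end just described concrete, here is a minimal sketch of mel-warped cepstral features with delta (temporal-derivative) features. It is an illustration only: the filter shapes, frame length, sampling rate, and filter counts below are assumptions chosen for the example, not values from the talk, and production front ends differ in many details.

```python
# Minimal sketch: frame -> power spectrum -> mel filterbank -> log -> DCT
# (cepstrum) -> delta features. Constants are illustrative assumptions.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the warped (mel) axis, giving finer
    # resolution at low frequencies than at high frequencies.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_frame(frame, fb, n_ceps=13):
    spectrum = np.abs(np.fft.rfft(frame)) ** 2             # short-term power spectrum
    log_energies = np.log(fb @ spectrum + 1e-10)            # warped, smoothed spectrum
    n = len(log_energies)
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return dct @ log_energies                               # cepstral coefficients

def deltas(cepstra, window=2):
    # Simple regression-based temporal derivative over +/- `window` frames.
    denom = 2 * sum(d * d for d in range(1, window + 1))
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    t = len(cepstra)
    return sum(d * (padded[window + d:t + window + d] - padded[window - d:t + window - d])
               for d in range(1, window + 1)) / denom

fb = mel_filterbank(n_filters=26, n_fft=400, sample_rate=16000)
frames = np.random.randn(50, 400) * np.hamming(400)         # 50 illustrative 25 ms frames
ceps = np.array([mfcc_frame(f, fb) for f in frames])
features = np.hstack([ceps, deltas(ceps)])                  # static + delta features
print(features.shape)
```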
| 0:16:09 | um |
|---|
| 0:16:10 | hidden markov model this is a graphical form of it |
|---|
| 0:16:13 | and main thing to see here this is a a |
|---|
| 0:16:16 | a statistical dependency graph |
|---|
| 0:16:18 | uh and |
|---|
| 0:16:20 | say X three is only dependent on the current state |
|---|
| 0:16:24 | each of these |
|---|
| 0:16:25 | time steps |
|---|
| 0:16:26 | uh |
|---|
| 0:16:27 | are represented here |
|---|
| 0:16:29 | and if you know Q three |
|---|
| 0:16:31 | uh then Q two, Q one, X one, X two tell you nothing about X three |
|---|
| 0:16:35 | so that's a very very strong statistical conditional independence model |
|---|
| 0:16:40 | and that's pretty much what people have used in these |
|---|
| 0:16:43 | are now standard systems |
|---|
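In equation form, the conditional-independence assumptions in that dependency graph can be written as follows (a standard statement of the HMM assumptions, not an equation taken from the slides):

```latex
P(x_t \mid q_1,\dots,q_t,\; x_1,\dots,x_{t-1}) \;=\; P(x_t \mid q_t),
\qquad
P(q_t \mid q_1,\dots,q_{t-1}) \;=\; P(q_t \mid q_{t-1})
```

so, given q3, the earlier states and observations tell you nothing more about x3.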
| 0:16:45 | this is my only equation |
|---|
| 0:16:47 | and uh those of you in speech will go oh yeah, in fact probably |
|---|
| 0:16:50 | most people say oh yeah |
|---|
| 0:16:52 | this |
|---|
| 0:16:53 | basically bayes rule |
|---|
| 0:16:55 | the idea is that |
|---|
| 0:16:56 | in a statistical system |
|---|
| 0:16:58 | you want to pick the model |
|---|
| 0:16:59 | that is most probable given the data |
|---|
| 0:17:02 | and bayes' rule says you can expand it in this way |
|---|
| 0:17:05 | and then you can get rid of the P of X because there's no dependence on the model |
|---|
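Spelled out, that one equation is the usual Bayes decision rule: pick the model M that is most probable given the acoustics X,

```latex
\hat{M} \;=\; \arg\max_{M} P(M \mid X)
        \;=\; \arg\max_{M} \frac{P(X \mid M)\,P(M)}{P(X)}
        \;=\; \arg\max_{M} P(X \mid M)\,P(M)
```

where the denominator P(X) can be dropped because it does not depend on the model.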
| 0:17:12 | um |
|---|
| 0:17:12 | so |
|---|
| 0:17:13 | you realise these |
|---|
| 0:17:14 | uh likelihoods, |
|---|
| 0:17:16 | the probability of the acoustics given the model, with mixtures of gaussians typically |
|---|
| 0:17:21 | you typically have each gaussian just represented by means and variances; there's no covariance represented between the features |
|---|
| 0:17:29 | and there's the weights of each of the gaussians |
|---|
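As an illustration of that kind of emission model, here is a minimal sketch of a diagonal-covariance Gaussian mixture log-likelihood; the parameter values are placeholders for the example, not trained numbers.

```python
# Diagonal-covariance GMM emission likelihood: log sum_k w_k N(x; mu_k, diag(var_k)).
import numpy as np

def log_gmm_likelihood(x, weights, means, variances):
    x = np.asarray(x, dtype=float)
    log_components = []
    for w, mu, var in zip(weights, means, variances):
        # Diagonal covariance: no correlations between features are modeled.
        log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * var))
        log_quad = -0.5 * np.sum((x - mu) ** 2 / var)
        log_components.append(np.log(w) + log_norm + log_quad)
    m = max(log_components)                  # log-sum-exp for numerical stability
    return m + np.log(sum(np.exp(c - m) for c in log_components))

# Two mixture components over a 3-dimensional feature vector (illustrative values).
weights = [0.6, 0.4]
means = [np.zeros(3), np.ones(3)]
variances = [np.ones(3), 2.0 * np.ones(3)]
print(log_gmm_likelihood([0.1, -0.2, 0.3], weights, means, variances))
```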
| 0:17:31 | the language priors |
|---|
| 0:17:32 | P of M |
|---|
| 0:17:34 | are uh |
|---|
| 0:17:35 | implemented with an n-gram |
|---|
| 0:17:37 | you do a bunch of counting, you do some smoothing |
|---|
| 0:17:40 | and it's basically a probability of a word given some word history, such as the recent |
|---|
| 0:17:46 | n minus one words |
|---|
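As a minimal sketch of that counting-and-smoothing idea, here is a bigram model with add-one smoothing; real systems use higher-order n-grams, far more data, and better smoothing methods, so this is only an illustration.

```python
# Bigram language model estimated by counting, with add-one smoothing.
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])             # history counts
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab_size = len(set(w for s in sentences for w in s)) + 2

    def prob(word, prev):
        # P(word | previous word), add-one smoothed
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

prob = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(prob("cat", "the"), prob("dog", "the"))
```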
| 0:17:49 | now i |
|---|
| 0:17:50 | the math is lovely but in practice we actually raise each of these things to some kind of power |
|---|
| 0:17:55 | this is to compensate for the fact that the models are not quite right |
|---|
| 0:17:58 | and that uh |
|---|
| 0:18:00 | there really are other dependencies |
|---|
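In practice, then, what gets maximized looks more like the following, where the language model weight (written here as lambda, just an illustrative name) and often a word insertion penalty are tuned empirically to compensate for those broken assumptions:

```latex
\hat{M} \;=\; \arg\max_{M}\; \log P(X \mid M) \;+\; \lambda\,\log P(M)
```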
| 0:18:04 | um |
|---|
| 0:18:04 | this is a picture of the acoustic likelihood |
|---|
| 0:18:07 | uh uh uh uh estimator |
|---|
| 0:18:09 | there's a few steps in here each of these boxes can actually be fairly complicated but |
|---|
| 0:18:14 | just generally speaking |
|---|
| 0:18:15 | there's some kind of short-term spectral estimation |
|---|
| 0:18:19 | there's this vocal tract length normalisation i mentioned, which compresses or expands the spectrum |
|---|
| 0:18:24 | then some kind of smoothing, either by |
|---|
| 0:18:26 | uh throwing away some of the upper cepstral coefficients or by autoregressive modeling as is done in P L P |
|---|
| 0:18:33 | there's various kinds of linear transformations for instance for dimensionality reduction |
|---|
| 0:18:38 | uh and for discrimination better discrimination |
|---|
| 0:18:41 | then there's the statistical engine |
|---|
| 0:18:43 | that i mentioned before with this funny scaling um |
|---|
| 0:18:46 | in the log domain or raising to a power |
|---|
| 0:18:49 | in order to mix it with the |
|---|
| 0:18:50 | uh language model |
|---|
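A schematic sketch of that front-end chain, with each stage reduced to its simplest possible form, might look like the following; the warp factor, the number of retained cepstra, and the projection matrix are illustrative assumptions rather than values from any real system.

```python
# Schematic front end: short-term spectral estimation -> VTLN-style frequency
# warping -> smoothing by keeping low-order cepstra -> linear transform.
import numpy as np

def short_term_spectrum(frame):
    return np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2

def vtln(spectrum, alpha=1.1):
    # Compress (alpha > 1) or expand (alpha < 1) the frequency axis so a
    # speaker's spectrum better matches the models.
    n = len(spectrum)
    warped_bins = np.clip(np.arange(n) * alpha, 0, n - 1)
    return np.interp(warped_bins, np.arange(n), spectrum)

def smooth_cepstra(spectrum, n_keep=13):
    # Smoothing by discarding the higher-order cepstral coefficients.
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    return cepstrum[:n_keep]

def linear_transform(features, projection):
    # Stand-in for LDA/HLDA-style transforms for reduction or discrimination.
    return projection @ features

frame = np.random.randn(400)                     # one 25 ms frame at 16 kHz
feats = smooth_cepstra(vtln(short_term_spectrum(frame)))
projection = np.random.randn(10, len(feats))     # placeholder transform
print(linear_transform(feats, projection).shape)
```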
| 0:18:52 | okay well that seems simple enough but |
|---|
| 0:18:54 | actual systems that get the very best scores are a bit more complicated than this |
|---|
| 0:18:58 | uh there's well |
|---|
| 0:18:59 | first off there's the decoder and the language priors coming in |
|---|
| 0:19:03 | um |
|---|
| 0:19:05 | well you might have two of these front ends |
|---|
| 0:19:08 | and |
|---|
| 0:19:09 | people found that this is very helpful for getting the best performance |
|---|
| 0:19:13 | but you don't just put "'em" in in a very simple way |
|---|
| 0:19:17 | it's very often the case that you have all sorts of stages |
|---|
| 0:19:20 | with uh |
|---|
| 0:19:22 | C W here is crossword |
|---|
| 0:19:24 | or non-crossword models, and you produce graphs or lattices and you combine them at different points and you cross- |
|---|
| 0:19:30 | adapt |
|---|
| 0:19:32 | well |
|---|
| 0:19:33 | this kind of reminds me of some work |
|---|
| 0:19:36 | by a uh |
|---|
| 0:19:37 | a berkeley grad of uh about a century ago named rube goldberg |
|---|
| 0:19:41 | and this is the self-operating napkin |
|---|
| 0:19:44 | the self-operating napkin is activated when the soup spoon A is raised to mouth |
|---|
| 0:19:50 | uh pulling string B and thereby jerking ladle C |
|---|
| 0:19:54 | which throws cracker D past parrot E |
|---|
| 0:19:57 | uh parrot jumps after cracker and perch F tilts |
|---|
| 0:20:01 | which uh upsets seeds G into pail H |
|---|
| 0:20:06 | the extra weight in the pail pulls the cord I which opens and |
|---|
| 0:20:10 | uh which lights the cigarette lighter J |
|---|
| 0:20:13 | and this uh |
|---|
| 0:20:14 | in turn lights the rocket, which pulls the sickle, which cuts the string |
|---|
| 0:20:19 | which |
|---|
| 0:20:20 | causes the pendulum to swing back and forth |
|---|
| 0:20:22 | thereby wiping the chin |
|---|
| 0:20:25 | and this, for me, |
|---|
| 0:20:26 | typifies my view of current speech recognition systems |
|---|
| 0:20:32 | it's successful at wiping the chin sometimes |
|---|
| 0:20:35 | so i wanna talk a little bit about alternatives |
|---|
| 0:20:37 | and i wanna say at the outset |
|---|
| 0:20:40 | that these are just some of the alternatives |
|---|
| 0:20:42 | a conference like this has uh a lot of work |
|---|
| 0:20:45 | happily |
|---|
| 0:20:46 | uh in in many different directions |
|---|
| 0:20:48 | these are the ones i wanted to give as examples |
|---|
| 0:20:52 | but first i wanna say |
|---|
| 0:20:54 | a little bit |
|---|
| 0:20:55 | about |
|---|
| 0:20:57 | what else is there |
|---|
| 0:20:58 | besides the mainstream |
|---|
| 0:21:02 | the great sage nasrudin |
|---|
| 0:21:04 | was tracked down by a seeker |
|---|
| 0:21:06 | and the seeker asked the sage |
|---|
| 0:21:09 | what is the secret to happiness |
|---|
| 0:21:12 | sage answered |
|---|
| 0:21:13 | good judgement |
|---|
| 0:21:16 | well the seeker said that's |
|---|
| 0:21:17 | that's all very well |
|---|
| 0:21:18 | master, but |
|---|
| 0:21:20 | how does one obtain good judgement |
|---|
| 0:21:23 | and the master said |
|---|
| 0:21:24 | from experience |
|---|
| 0:21:27 | so the seeker said okay, experience |
|---|
| 0:21:30 | but |
|---|
| 0:21:31 | how does one obtain this experience |
|---|
| 0:21:34 | and the master said |
|---|
| 0:21:35 | bad judgement |
|---|
| 0:21:39 | so |
|---|
| 0:21:40 | here's some of the exercises that we and many other people have done in bad judgement |
|---|
| 0:21:44 | we've pursued |
|---|
| 0:21:46 | different signal representations |
|---|
| 0:21:48 | uh some of them are related to perception |
|---|
| 0:21:50 | to auditory models, for instance |
|---|
| 0:21:53 | mean rate and synchrony, that's seneff's model from some time ago |
|---|
| 0:21:57 | uh and the ensemble interval histogram |
|---|
| 0:22:00 | from uh ghitza |
|---|
| 0:22:02 | i each of these |
|---|
| 0:22:03 | were |
|---|
| 0:22:04 | related to models of neural firing |
|---|
| 0:22:08 | uh how |
|---|
| 0:22:09 | how fast they fire, how much they synchronise with one another |
|---|
| 0:22:12 | what uh timing there was between the firings |
|---|
| 0:22:15 | and they had some interesting performance in noise uh they |
|---|
| 0:22:19 | have not been adopted in any serious way |
|---|
| 0:22:22 | but |
|---|
| 0:22:23 | there's interesting technology there and interesting scientific models |
|---|
| 0:22:27 | then there's stuff that's more on the psychological side; these were sort of based on models of |
|---|
| 0:22:32 | physiology |
|---|
| 0:22:33 | uh then there are models uh |
|---|
| 0:22:36 | really from the psychological side, multi-band systems based on critical bands, going all the way back to |
|---|
| 0:22:42 | fletcher's work and work of others |
|---|
| 0:22:44 | uh and |
|---|
| 0:22:46 | uh the idea here is that if you have a system that's just looking at part of the spectrum |
|---|
| 0:22:50 | if the disturbance is in that part of the spectrum |
|---|
| 0:22:53 | uh then you can deal with that separately |
|---|
| 0:22:56 | those have had some successes |
|---|
| 0:22:58 | and then something that uh |
|---|
| 0:23:00 | you can observe both at the physiological and psychological level |
|---|
| 0:23:04 | is the importance of different um modulations |
|---|
| 0:23:08 | particularly temporal but also spectral modulations in the signal |
|---|
| 0:23:13 | uh then on the production side there's been a bunch of work by people on, |
|---|
| 0:23:17 | uh given the fact that there are only a few articulatory uh mechanisms, |
|---|
| 0:23:22 | uh maybe you can represent things that way and it would be more parsimonious and |
|---|
| 0:23:26 | a better |
|---|
| 0:23:27 | representation of the signal; you want to |
|---|
| 0:23:29 | represent this over time and there have been |
|---|
| 0:23:31 | hidden dynamic |
|---|
| 0:23:32 | uh models that attempt to do this and |
|---|
| 0:23:35 | trajectory models sometimes the trajectory models had nothing to do with the physiological models but |
|---|
| 0:23:40 | uh sometimes they did |
|---|
| 0:23:43 | and articulatory features, which you could think of as a quantized version of the articulator positions and so forth |
|---|
| 0:23:51 | then another direction was artificial neural networks, which have been around for a very long time |
|---|
| 0:23:57 | um |
|---|
| 0:23:58 | actually before nineteen sixty one but |
|---|
| 0:24:00 | i picked out this one discriminant analysis iterative design |
|---|
| 0:24:04 | i picked that out "'cause" a lot of people don't know about it; a lot of people think that |
|---|
| 0:24:07 | multilayer networks began in the eighties |
|---|
| 0:24:10 | but actually back in sixty-one they had a multilayer network that worked very well for some problems; it was actually |
|---|
| 0:24:15 | used industrially |
|---|
| 0:24:16 | for a while after that |
|---|
| 0:24:19 | um in which the first uh layer of units was a bunch of gaussians, and after that you had |
|---|
| 0:24:24 | a linear perceptron |
|---|
| 0:24:27 | a couple years later there was work at stanford |
|---|
| 0:24:30 | in which they actually did apply some of this kind of stuff to speech; these were actually linear adaptive units |
|---|
| 0:24:35 | actually called adalines |
|---|
| 0:24:37 | uh bernie widrow sent me uh |
|---|
| 0:24:39 | a technical report |
|---|
| 0:24:40 | it's of historical interest, it's the cover of a real technical report from nineteen sixty-three |
|---|
| 0:24:46 | here's a page from it that shows a |
|---|
| 0:24:48 | uh a block diagram that i blew up here |
|---|
| 0:24:51 | a bit |
|---|
| 0:24:52 | it starts off with some band filters, basically getting some power measures in each band |
|---|
| 0:24:57 | and then here are these adalines, which uh give you some sets of outputs |
|---|
| 0:25:02 | which went to a typewriter |
|---|
| 0:25:06 | um |
|---|
| 0:25:07 | the nineteen eighties saw an explosion of interest in the neural network |
|---|
| 0:25:11 | uh |
|---|
| 0:25:11 | area |
|---|
| 0:25:13 | uh part of this |
|---|
| 0:25:14 | was sparked by |
|---|
| 0:25:16 | a rediscovery, say, of error back propagation |
|---|
| 0:25:20 | just basically propagating the effect of errors from the output of the system |
|---|
| 0:25:24 | back to the individual weights |
|---|
| 0:25:27 | uh in the late eighties uh number of us worked on hybrid hmm artificial neural network systems |
|---|
| 0:25:34 | where the neural networks were used as probability estimators to get the emission uh probabilities for the hmm |
|---|
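The usual way this hybrid is formulated: the network estimates state posteriors, which are converted to scaled likelihoods for the HMM by dividing by the state priors,

```latex
P(x_t \mid q_t) \;\propto\; \frac{P(q_t \mid x_t)}{P(q_t)}
```

where P(q_t | x_t) is the network output for state (or phone) q_t and P(q_t) is its prior, typically estimated from the training alignments.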
| 0:25:41 | um |
|---|
| 0:25:41 | last decade or so uh quite a few people have taken off on the tandem idea |
|---|
| 0:25:46 | which is a particular way of using artificial neural networks |
|---|
| 0:25:50 | as feature extractors |
|---|
| 0:25:52 | and i will just mention uh briefly |
|---|
| 0:25:55 | uh a fairly recent development, deep networks |
|---|
| 0:25:59 | and |
|---|
| 0:26:00 | how uh |
|---|
| 0:26:02 | how innovative it is, is the question |
|---|
| 0:26:04 | but there's definitely some new things going on there which i think are interesting |
|---|
| 0:26:09 | uh |
|---|
| 0:26:10 | the obvious difference between these and the previous networks tends to be more layers, that they're deep |
|---|
| 0:26:15 | there's also sometimes an unsupervised pre-training |
|---|
| 0:26:20 | uh |
|---|
| 0:26:21 | there's actually several papers at this conference there's also a special issue |
|---|
| 0:26:24 | uh in uh november of the transactions |
|---|
| 0:26:28 | um here's a couple papers at this conference, i think there's a few others, as well as one from the |
|---|
| 0:26:32 | [affiliation unintelligible] |
|---|
| 0:26:34 | they had a lot of different numbers in the paper, but uh i picked one out |
|---|
| 0:26:38 | and just |
|---|
| 0:26:40 | well, most of the numbers had the same general trend |
|---|
| 0:26:43 | mfcc |
|---|
| 0:26:44 | bad |
|---|
| 0:26:45 | deep mlp good |
|---|
| 0:26:47 | uh and the old mlp somewhere in between |
|---|
| 0:26:50 | these are error rates, so again uh low is good |
|---|
| 0:26:54 | and uh there is a large vocabulary um |
|---|
| 0:26:58 | voice search |
|---|
| 0:26:59 | uh paper which uh |
|---|
| 0:27:01 | is at the poster session today |
|---|
| 0:27:03 | uh it had a sixteen percent, uh, their metric was sentence error reduction |
|---|
| 0:27:08 | and they had a nice improvement compared to |
|---|
| 0:27:10 | a system that used uh MPE, which is a very common discriminative training |
|---|
| 0:27:15 | approach |
|---|
| 0:27:20 | okay |
|---|
| 0:27:20 | so those were some of the alternatives; again, i'm sure |
|---|
| 0:27:24 | many people in this audience could think of many more |
|---|
| 0:27:29 | where could we go from here |
|---|
| 0:27:30 | or |
|---|
| 0:27:31 | in my opinion, where should we go from here |
|---|
| 0:27:35 | well |
|---|
| 0:27:36 | better features and models |
|---|
| 0:27:39 | um |
|---|
| 0:27:40 | i've suggested |
|---|
| 0:27:41 | better models of hearing in production |
|---|
| 0:27:44 | uh could perhaps lead to better features |
|---|
| 0:27:48 | uh better models of these features |
|---|
| 0:27:50 | better acoustic models |
|---|
| 0:27:53 | models of understanding better language models dialogue models pragmatics and so on |
|---|
| 0:27:58 | all these are likely to be important |
|---|
| 0:28:01 | the other thing which i'm gonna go into a bit especially at the end is understanding the errors |
|---|
| 0:28:06 | understanding what the assumptions are |
|---|
| 0:28:08 | that are going into our models |
|---|
| 0:28:10 | and how to get past them |
|---|
| 0:28:15 | so we start with models of hearing |
|---|
| 0:28:17 | so there are |
|---|
| 0:28:19 | useful approximations to the action of the periphery, that is uh |
|---|
| 0:28:23 | uh from the ear |
|---|
| 0:28:25 | to the auditory nerve |
|---|
| 0:28:27 | and when i say useful approximations i mean that there are a number of people who've worked |
|---|
| 0:28:33 | on |
|---|
| 0:28:34 | simplifying the models that were used earlier |
|---|
| 0:28:38 | and |
|---|
| 0:28:39 | crafting them more towards |
|---|
| 0:28:41 | uh good engineering |
|---|
| 0:28:43 | tools |
|---|
| 0:28:44 | some of those are looking kind of promising |
|---|
| 0:28:47 | uh there's new information about the auditory cortex, which i'm gonna briefly refer to in the |
|---|
| 0:28:51 | next few slides |
|---|
| 0:28:53 | including some results with noise |
|---|
| 0:28:56 | um |
|---|
| 0:28:57 | it's good to learn from a biological examples because uh you know humans are pretty good in many situations that |
|---|
| 0:29:03 | at recognizing speech |
|---|
| 0:29:05 | but |
|---|
| 0:29:06 | it's |
|---|
| 0:29:06 | probably good also not to be purist |
|---|
| 0:29:08 | and to mix |
|---|
| 0:29:09 | insights that you get from these things with good engineering approaches |
|---|
| 0:29:13 | and i i i think there's some |
|---|
| 0:29:15 | uh good possibilities there |
|---|
| 0:29:17 | uh this bottom bullet it is just to note that |
|---|
| 0:29:20 | as with many things in this talk, i'm only talking about some of the field |
|---|
| 0:29:24 | and i'm mostly talking about single channel |
|---|
| 0:29:26 | but uh people have two ears and they make pretty good use of them when they work |
|---|
| 0:29:31 | uh and that's |
|---|
| 0:29:33 | something to keep in mind |
|---|
| 0:29:34 | and of course you can go to many ears in some situations with microphone arrays, and that's a good thing |
|---|
| 0:29:39 | to |
|---|
| 0:29:39 | think about |
|---|
| 0:29:40 | that's not a topic i'm expanding on in this talk |
|---|
| 0:29:44 | and the same thing with visual information visual information is used by people whenever they can |
|---|
| 0:29:49 | uh and i'm not gonna talk about that, but it's obviously important |
|---|
| 0:29:53 | okay, i'm gonna talk about this uh cortical stuff |
|---|
| 0:29:58 | uh the slide is courtesy of uh shihab shamma; it's not just the slide but also the idea |
|---|
| 0:30:03 | uh and the idea comes from experiments that uh he and his guys |
|---|
| 0:30:09 | and gals |
|---|
| 0:30:10 | have |
|---|
| 0:30:11 | uh done with a small mammals |
|---|
| 0:30:14 | uh that have |
|---|
| 0:30:16 | a pretty similar |
|---|
| 0:30:17 | early part of the cortex |
|---|
| 0:30:19 | uh the primary auditory cortex |
|---|
| 0:30:21 | to what people have |
|---|
| 0:30:23 | there's also been some other work with people |
|---|
| 0:30:25 | uh and |
|---|
| 0:30:27 | these |
|---|
| 0:30:27 | uh |
|---|
| 0:30:28 | you can envision this as being the kind of spectrogram that's received at this primary auditory cortex |
|---|
| 0:30:35 | what they've observed is that there's a bunch of what are called spectro-temporal receptive fields, S T R Fs |
|---|
| 0:30:41 | which are little filters |
|---|
| 0:30:42 | that process it in time and frequency |
|---|
| 0:30:46 | and you could think of them as processing temporal modulations, which are called rate, and spectral modulations, which are called scale |
|---|
| 0:30:53 | and you imagine there being a cube |
|---|
| 0:30:55 | at each time point |
|---|
| 0:30:57 | with auditory frequency |
|---|
| 0:30:59 | and uh |
|---|
| 0:31:00 | rate and scale |
|---|
| 0:31:02 | and much as you would like to be able to, in a regular spectrogram, |
|---|
| 0:31:07 | uh |
|---|
| 0:31:08 | de-emphasise the areas where the signal-to-noise was poor |
|---|
| 0:31:11 | and emphasise areas where the signal-to-noise was good |
|---|
| 0:31:14 | you have perhaps an even greater chance |
|---|
| 0:31:16 | to do this kind of emphasis |
|---|
| 0:31:19 | uh if you've expanded out to this cube |
|---|
| 0:31:22 | that's the general idea |
|---|
| 0:31:23 | so you could end up with a lot of these different spectrotemporal receptive fields |
|---|
| 0:31:27 | you could implement them and you could try to do something good with them pick out a good |
|---|
| 0:31:32 | an implementation that we and a number of people have been trying |
|---|
| 0:31:36 | is |
|---|
| 0:31:37 | a |
|---|
| 0:31:38 | uh what we would call a many-stream |
|---|
| 0:31:41 | uh implementation |
|---|
| 0:31:43 | as opposed to multi-stream, which uh was what was shown before where we'd have two or three streams; it just |
|---|
| 0:31:49 | refers to the quantity |
|---|
| 0:31:50 | but what's in each stream is |
|---|
| 0:31:53 | one of these representations, one of these spectro-temporal receptive fields, implemented by a gabor filter |
|---|
| 0:31:57 | and by a multilayer perceptron |
|---|
| 0:31:59 | that's discriminatively trained to discriminate between different speech sounds |
|---|
| 0:32:04 | you get a whole lot of these; in some of our implementations we had three hundred |
|---|
| 0:32:07 | uh and then you have to figure out how to combine them or select them |
|---|
| 0:32:11 | hopefully again to de-emphasise the ones that are uh bad indicators of what was said |
|---|
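As a minimal sketch of one such stream, here is a 2-D Gabor filter used as a spectro-temporal receptive field and correlated with a spectrogram-like input; the rate, scale, and size parameters are illustrative assumptions, and a real many-stream system would use a whole bank of these, each feeding its own discriminatively trained MLP.

```python
# One spectro-temporal "receptive field" as a 2-D Gabor filter applied to a
# (frequency x time) representation. Parameters are illustrative only.
import numpy as np
from scipy.signal import correlate2d

def gabor_strf(n_freq=15, n_time=15, rate=0.25, scale=0.25):
    # rate  ~ temporal modulation (cycles per frame step)
    # scale ~ spectral modulation (cycles per frequency channel)
    t = np.arange(n_time) - n_time // 2
    f = np.arange(n_freq) - n_freq // 2
    T, F = np.meshgrid(t, f)
    envelope = np.exp(-(T ** 2) / (2 * (n_time / 4) ** 2)
                      - (F ** 2) / (2 * (n_freq / 4) ** 2))
    carrier = np.cos(2 * np.pi * (rate * T + scale * F))
    g = envelope * carrier
    return g - g.mean()             # zero mean: responds to modulations, not level

def apply_strf(spectrogram, strf):
    # Valid 2-D correlation; the output of each filter is one "stream".
    return correlate2d(spectrogram, strf, mode="valid")

spectrogram = np.random.rand(40, 200)    # 40 frequency channels x 200 frames
print(apply_strf(spectrogram, gabor_strf()).shape)
```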
| 0:32:19 | so |
|---|
| 0:32:20 | another interesting side light of this kind of approach |
|---|
| 0:32:23 | is that it's a good fit to modern high speed computing that it's |
|---|
| 0:32:27 | as i think a lot of you know |
|---|
| 0:32:29 | the clock rates and or long going up the way they used to other cpus use |
|---|
| 0:32:33 | and so the way that manufacturers are trying to give us more performances by having many more core |
|---|
| 0:32:38 | the graphics processors are an extreme example of this |
|---|
| 0:32:41 | this kind of structure is a really good match to that |
|---|
| 0:32:44 | uh because it's it's what they call an embarrassingly parallel |
|---|
| 0:32:48 | um we found that this room this kind of approach does remove a significant number of errors particularly and noise |
|---|
| 0:32:54 | but also a as it turns out in the clean condition |
|---|
| 0:32:58 | um |
|---|
| 0:32:59 | it combines well with pure engineering not auditory |
|---|
| 0:33:02 | kind of methods |
|---|
| 0:33:03 | uh such as wiener filter based methods |
|---|
| 0:33:06 | and we'd like to think that it could combine well with other auditory models, although we haven't really done that |
|---|
| 0:33:11 | work yet |
|---|
| 0:33:14 | um |
|---|
| 0:33:15 | statistical |
|---|
| 0:33:16 | acoustic models |
|---|
| 0:33:19 | uh we currently use these critical assumptions |
|---|
| 0:33:22 | and one of things about using very different kinds of features is that this can really change their statistical properties |
|---|
| 0:33:27 | from the ones we have now |
|---|
| 0:33:29 | and so these assumptions |
|---|
| 0:33:31 | could be violated in yet different ways |
|---|
| 0:33:35 | uh there have been alternative models proposed that allow you to bypass these typical assumptions |
|---|
| 0:33:41 | but part of the problem is to figure out |
|---|
| 0:33:43 | which statistical dependencies to put in |
|---|
| 0:33:48 | um models of language an understanding |
|---|
| 0:33:51 | i think it's probably pretty clear to those of you who know me that this isn't my research area |
|---|
| 0:33:55 | but it's of obvious importance |
|---|
| 0:33:57 | and |
|---|
| 0:33:58 | one of the things that uh |
|---|
| 0:34:01 | has been frustrating to a lot of people; in fact i remember fred jelinek being visibly frustrated about this |
|---|
| 0:34:07 | is that |
|---|
| 0:34:08 | it's very very tough to get much improvement |
|---|
| 0:34:10 | over simple n-grams, that is, a probability of a word given some number of previous words |
|---|
| 0:34:16 | but |
|---|
| 0:34:16 | it can be very important |
|---|
| 0:34:18 | to |
|---|
| 0:34:19 | get further information |
|---|
| 0:34:21 | and we know this for sure for people |
|---|
| 0:34:25 | let me tell you a little story |
|---|
| 0:34:27 | uh one day |
|---|
| 0:34:29 | i was walking out of icsi |
|---|
| 0:34:31 | and i had on one of these caps; this is a cap for the oakland athletics, the local |
|---|
| 0:34:37 | major league baseball club |
|---|
| 0:34:39 | i also had on a jacket |
|---|
| 0:34:41 | that had the same insignia on it |
|---|
| 0:34:44 | and i had a radio |
|---|
| 0:34:45 | held to my head as i was walking down the street |
|---|
| 0:34:49 | and a guy across the street |
|---|
| 0:34:50 | moderately noisy street |
|---|
| 0:34:51 | yelled |
|---|
| 0:34:52 | score? |
|---|
| 0:34:55 | and i said |
|---|
| 0:34:56 | oakland, five to three |
|---|
| 0:35:00 | anyway |
|---|
| 0:35:01 | we'd like to be able to do that with a machine |
|---|
| 0:35:06 | so where we go from here |
|---|
| 0:35:09 | well |
|---|
| 0:35:10 | researchers will continue to get good ideas |
|---|
| 0:35:13 | uh |
|---|
| 0:35:15 | every time you get in the shower, maybe you have a good idea coming out |
|---|
| 0:35:20 | but |
|---|
| 0:35:20 | what's the best methodology |
|---|
| 0:35:22 | what's the best way to proceed along this path |
|---|
| 0:35:25 | so maybe we can learn from some other disciplines |
|---|
| 0:35:29 | and let me give |
|---|
| 0:35:30 | uh a kind of stretched analogy to |
|---|
| 0:35:33 | the search for a cure for cancer |
|---|
| 0:35:35 | and again i'm gonna tell you a little story |
|---|
| 0:35:38 | it's a personal one, about an uncle of mine named sidney farber |
|---|
| 0:35:43 | um |
|---|
| 0:35:44 | now |
|---|
| 0:35:44 | my uncle sid, in the forties, |
|---|
| 0:35:46 | uh was |
|---|
| 0:35:48 | a pathologist |
|---|
| 0:35:49 | at harvard medical school |
|---|
| 0:35:50 | uh |
|---|
| 0:35:52 | and at children's hospital boston |
|---|
| 0:35:55 | and |
|---|
| 0:35:56 | he |
|---|
| 0:36:00 | unfortunately got to see lots of little children |
|---|
| 0:36:01 | with leukemia |
|---|
| 0:36:05 | uh once they were diagnosed they only had a few weeks |
|---|
| 0:36:11 | as a pathologist he mostly dealt with petri dishes and so forth; he wasn't really a clinician |
|---|
| 0:36:11 | but he got this thought |
|---|
| 0:36:13 | that maybe if he could come up with chemicals |
|---|
| 0:36:16 | that were more poisonous to the cancer cells than they were to the normal cells |
|---|
| 0:36:20 | maybe he could extend the lives of these kids |
|---|
| 0:36:23 | and he experimented with this in the petri dishes of course, for the most part, for a while |
|---|
| 0:36:27 | and then he came up with something that he thought would work |
|---|
| 0:36:31 | and he tried it out |
|---|
| 0:36:32 | with everybody's permission |
|---|
| 0:36:34 | on some of these kids |
|---|
| 0:36:35 | and low and behold |
|---|
| 0:36:36 | it actually did extend their lives for a while |
|---|
| 0:36:39 | this was |
|---|
| 0:36:40 | the first |
|---|
| 0:36:41 | known |
|---|
| 0:36:41 | case of chemotherapy |
|---|
| 0:36:45 | this was just |
|---|
| 0:36:46 | great, and it started a whole revolution; he ended up starting a big center, national cancer institute stuff uh |
|---|
| 0:36:52 | there's now the dana-farber, in reverence to him |
|---|
| 0:36:55 | and |
|---|
| 0:36:57 | um |
|---|
| 0:36:59 | the key point i wanna make about it |
|---|
| 0:37:01 | is that |
|---|
| 0:37:02 | there's this quandary |
|---|
| 0:37:03 | between curing patients |
|---|
| 0:37:05 | you have these patients are coming through |
|---|
| 0:37:07 | who are in terrible straits |
|---|
| 0:37:10 | but on the other hand |
|---|
| 0:37:12 | you don't have any time |
|---|
| 0:37:14 | to figure out what's really going on |
|---|
| 0:37:17 | and there were |
|---|
| 0:37:18 | important early successes based on hunches that my uncle and many others had |
|---|
| 0:37:24 | and there wasn't time to work out the real cause of things |
|---|
| 0:37:27 | and by the way there are stories like this |
|---|
| 0:37:29 | for surgical interventions and for radiation as well |
|---|
| 0:37:34 | uh |
|---|
| 0:37:35 | so there's some success |
|---|
| 0:37:37 | but they still |
|---|
| 0:37:38 | didn't find a general cure, and uh as you know to this day there still is no general cure |
|---|
| 0:37:42 | for cancer |
|---|
| 0:37:43 | but things are a lot better, remissions are longer and so forth |
|---|
| 0:37:47 | and now there's |
|---|
| 0:37:48 | starting to be some understanding of the biological mechanisms, and one hopes that this will lead to |
|---|
| 0:37:54 | uh a solution |
|---|
| 0:37:56 | so there's a wonderful book i strongly recommend, the emperor of all maladies |
|---|
| 0:38:00 | uh |
|---|
| 0:38:01 | about uh the history of cancer |
|---|
| 0:38:05 | and i'll just read this |
|---|
| 0:38:06 | it's interesting, the writer here says: in thinking of remedies, |
|---|
| 0:38:09 | in such time as we have considered of the cause, |
|---|
| 0:38:12 | they must be imperfect, lame, and to no purpose, |
|---|
| 0:38:15 | wherein the cause hath not first been searched |
|---|
| 0:38:18 | this again doesn't belie the fact that it can be very useful |
|---|
| 0:38:22 | to uh go ahead and try to fix something along the way |
|---|
| 0:38:26 | but in the long term you need to understand what's going on |
|---|
| 0:38:30 | so as opposed to just |
|---|
| 0:38:32 | trying our bright ideas which we all do |
|---|
| 0:38:35 | how about finding out what's wrong |
|---|
| 0:38:39 | the statistical approach |
|---|
| 0:38:40 | to speech recognition requires |
|---|
| 0:38:42 | assumptions that i made reference to |
|---|
| 0:38:44 | they're known literally to be false |
|---|
| 0:38:47 | this may or may not be a problem |
|---|
| 0:38:49 | maybe it's just handled by uh, say, raising these |
|---|
| 0:38:52 | uh likelihoods to a power |
|---|
| 0:38:55 | how can we learn |
|---|
| 0:38:57 | so there's some work that's been started that i wanted to call your attention to |
|---|
| 0:39:01 | from steve wegmann and larry gillick |
|---|
| 0:39:03 | starting a couple years ago |
|---|
| 0:39:05 | where what they did was to consider each assumption separately |
|---|
| 0:39:12 | and then, rather than trying to fix the models, |
|---|
| 0:39:13 | modify the data |
|---|
| 0:39:15 | by some resampling, um |
|---|
| 0:39:16 | some uh |
|---|
| 0:39:18 | bootstrapping kind of approaches |
|---|
| 0:39:18 | to match the models |
|---|
| 0:39:20 | observe the improvement |
|---|
| 0:39:22 | and use that to inspire more bright ideas |
|---|
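A minimal sketch of the resampling step at the heart of that diagnosis might look like the following: keep the real frames, but redraw them within each state-aligned segment so that the data actually satisfies the conditional-independence assumption, then decode again and compare error rates. The alignment format here is a placeholder; this only illustrates the data manipulation, not their full methodology.

```python
# Resample real acoustic frames within each aligned state, destroying
# frame-to-frame dependence while keeping each state's marginal distribution.
import numpy as np

rng = np.random.default_rng(0)

def resample_within_states(frames, state_alignment):
    frames = np.asarray(frames, dtype=float)
    states = np.asarray(state_alignment)
    out = frames.copy()
    for s in np.unique(states):
        idx = np.where(states == s)[0]
        # Draw replacement frames (with replacement) from the same state's pool.
        out[idx] = frames[rng.choice(idx, size=len(idx), replace=True)]
    return out

# Toy usage: 10 frames of 3-dimensional features aligned to two states.
frames = np.arange(30, dtype=float).reshape(10, 3)
alignment = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
print(resample_within_states(frames, alignment))
```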
| 0:39:26 | but at this point |
|---|
| 0:39:27 | they've really just focused on the diagnosis part and not on the |
|---|
| 0:39:30 | new bright ideas, frankly |
|---|
| 0:39:32 | so this is being pursued also at icsi in the OUCH project, which is outing unfortunate characteristics of hmms |
|---|
| 0:39:39 | and |
|---|
| 0:39:40 | uh i'm gonna give you just a couple results from a more recent version; i should add by the way that |
|---|
| 0:39:44 | uh |
|---|
| 0:39:45 | this is a different gillick; this is larry's son dan |
|---|
| 0:39:48 | who just did his PhD with us |
|---|
| 0:39:51 | um |
|---|
| 0:39:52 | but |
|---|
| 0:39:53 | first this is a |
|---|
| 0:39:55 | very simplified system, so the error rate for wall street journal is pretty high here |
|---|
| 0:40:01 | and uh it's |
|---|
| 0:40:03 | the output |
|---|
| 0:40:04 | uh demonstrably does not really fit the GMM distribution that you got from the training set |
|---|
| 0:40:10 | and it definitely doesn't satisfy the independence assumptions and you get this thirteen percent |
|---|
| 0:40:16 | uh |
|---|
| 0:40:17 | now if you use simulated data, really just generated from the models, |
|---|
| 0:40:21 | you should do pretty well, and in fact you do; basically |
|---|
| 0:40:23 | uh virtually all of the errors go away |
|---|
| 0:40:27 | but here's the interesting one i think |
|---|
| 0:40:29 | if you |
|---|
| 0:40:30 | use resampled data, so this is the actual speech data, |
|---|
| 0:40:34 | but you're just resampling it in such a way |
|---|
| 0:40:37 | to assure the statistical conditional independence |
|---|
| 0:40:41 | it also gets rid of nearly all of the errors |
|---|
| 0:40:45 | now their studies are a lot more detailed than this; there's a lot of |
|---|
| 0:40:48 | a lot of things that they're looking at |
|---|
| 0:40:50 | a lot of things they're trying out |
|---|
| 0:40:51 | but i think this gives the flavour |
|---|
| 0:40:53 | of what they're doing |
|---|
| 0:40:58 | so |
|---|
| 0:40:59 | in summary |
|---|
| 0:41:02 | uh speech recognition is mature |
|---|
| 0:41:05 | in some sense it has an advanced degree |
|---|
| 0:41:07 | this because it's been around a long time and their commercial systems and so forth |
|---|
| 0:41:12 | and yet we still find it to be brittle |
|---|
| 0:41:15 | and uh we essentially have to start over again with each new task |
|---|
| 0:41:20 | uh the recent improvements |
|---|
| 0:41:21 | have been really quite incremental, and a lot of things have sort of levelled off |
|---|
| 0:41:26 | we need to rethink |
|---|
| 0:41:28 | kind of like going back to school |
|---|
| 0:41:30 | kind of like continuing education |
|---|
| 0:41:33 | uh we may need more basic models |
|---|
| 0:41:35 | uh we may need more |
|---|
| 0:41:37 | basic features |
|---|
| 0:41:39 | we may need more study of errors |
|---|
| 0:41:43 | and |
|---|
| 0:41:44 | the other thing i wanna briefly mention is that |
|---|
| 0:41:48 | we do live in an era where there is a huge amount of computation available |
|---|
| 0:41:52 | and even though the clock rates don't continue to go up as they have, |
|---|
| 0:41:56 | uh due to uh many-core systems |
|---|
| 0:41:59 | and |
|---|
| 0:42:00 | cloud computing and so forth |
|---|
| 0:42:01 | there is gonna continue to be |
|---|
| 0:42:03 | an increased availability of lots of computation |
|---|
| 0:42:06 | and this |
|---|
| 0:42:07 | should make it possible for us to consider |
|---|
| 0:42:10 | huge numbers of models |
|---|
| 0:42:12 | uh and methods |
|---|
| 0:42:13 | that we wouldn't consider before |
|---|
| 0:42:15 | for instance on the front end side |
|---|
| 0:42:17 | these uh auditory-based or cortical-based things can really blow up the computation |
|---|
| 0:42:22 | from the simple kind of stuff that you have with mfccs or P L P |
|---|
| 0:42:28 | uh |
|---|
| 0:42:28 | so |
|---|
| 0:42:30 | it's good to do that it's good to try things |
|---|
| 0:42:35 | that might take a lot of computation, even if they might not work yet on your iphone just now |
|---|
| 0:42:39 | um um so |
|---|
| 0:42:43 | you also have to know, and i'm sure you all do, that just having more computation is not a panacea |
|---|
| 0:42:48 | doesn't actually solve things |
|---|
| 0:42:49 | but it can potentially |
|---|
| 0:42:51 | give you a lot more possibilities |
|---|
| 0:42:54 | that's pretty much what i want to say |
|---|
| 0:42:56 | uh |
|---|
| 0:42:57 | i do wanna acknowledge that the stuff i have talked about is not particularly from me but from many people, including |
|---|
| 0:43:03 | people outside our lab |
|---|
| 0:43:05 | uh but i do want to thank |
|---|
| 0:43:07 | the many current and former students and postdocs, visitors, and icsi staff |
|---|
| 0:43:12 | and particularly give a shout out to |
|---|
| 0:43:14 | hynek hermansky, hervé bourlard, shihab shamma, steve wegmann, jordan cohen |
|---|
| 0:43:18 | here's my shameless plug for a book |
|---|
| 0:43:21 | uh which he already mentioned |
|---|
| 0:43:23 | that is gonna be out this fall thanks to tons of work from dan ellis |
|---|
| 0:43:27 | and other contributors i should say |
|---|
| 0:43:29 | uh like uh |
|---|
| 0:43:31 | [names unintelligible] |
|---|
| 0:43:33 | and |
|---|
| 0:43:37 | simon king for instance |
|---|
| 0:43:39 | and |
|---|
| 0:43:41 | thank you for your attention |
|---|
| 0:43:50 | [applause] |
|---|
| 0:43:56 | [the session chair thanks the speaker and invites questions; the first audience question is asked largely off-microphone and is unintelligible in this transcript] |
|---|
| 0:44:55 | um |
|---|
| 0:44:57 | i i think that the right answer is |
|---|
| 0:44:59 | i don't know |
|---|
| 0:45:02 | because |
|---|
| 0:45:03 | for instance |
|---|
| 0:45:04 | what i used to say when people talked to me about this is that, |
|---|
| 0:45:07 | okay, i think of |
|---|
| 0:45:08 | speech recognition as being in three pieces: there's |
|---|
| 0:45:12 | the representations that you have |
|---|
| 0:45:14 | there's the statistical models and the search and so forth in the middle |
|---|
| 0:45:19 | and then there's |
|---|
| 0:45:20 | uh all of the things that you could imagine doing with speech understanding and pragmatics |
|---|
| 0:45:24 | et cetera and so forth |
|---|
| 0:45:26 | and i used the think that okay the first one i know a little bit about |
|---|
| 0:45:30 | uh and i feel very strongly, and you know there are plenty of results to back this up, that that's |
|---|
| 0:45:35 | very important for improving |
|---|
| 0:45:37 | the last one is not my area of expertise, but from what i have seen in others, and certainly in the human |
|---|
| 0:45:42 | case |
|---|
| 0:45:44 | i believe that's very important |
|---|
| 0:45:45 | so i sort of thought the middle part, |
|---|
| 0:45:47 | you know, works well enough |
|---|
| 0:45:50 | uh but then there's this |
|---|
| 0:45:51 | this study |
|---|
| 0:45:53 | and i'm not so sure |
|---|
| 0:45:54 | no, i actually think that you should |
|---|
| 0:45:56 | pursue whatever it is that you |
|---|
| 0:46:00 | feel is of greatest interest |
|---|
| 0:46:01 | i actually think the key thing |
|---|
| 0:46:03 | is to have interesting friends |
|---|
| 0:46:07 | that's what has worked for me, anyway |
|---|
| 0:46:10 | now |
|---|
| 0:46:11 | you may like it or not, but it's what i actually think |
|---|
| 0:46:16 | [the next questioner takes the microphone] can you hear me? okay |
|---|
| 0:46:21 | is mine loud enough? |
|---|
| 0:46:24 | all of the techniques you described rely on spectral analysis approaches; pretty much everything does, in almost all cases |
|---|
| 0:46:41 | spectral techniques like plp do capture some aspects, the coarse things, quite well |
|---|
| 0:46:51 | but the big problems, it seems to me, are still interference from other sources, |
|---|
| 0:46:57 | reverberation, |
|---|
| 0:47:00 | spatial hearing and so forth, where the spectrum does not give |
|---|
| 0:47:04 | you much help |
|---|
| 0:47:06 | distinguishing multiple sources |
|---|
| 0:47:08 | by direction |
|---|
| 0:47:08 | direction |
|---|
| 0:47:10 | the other dimension is |
|---|
| 0:47:13 | fine temporal |
|---|
| 0:47:15 | information, which is something that has been explored a lot |
|---|
| 0:47:18 | in the psychoacoustic and |
|---|
| 0:47:20 | physiological literature |
|---|
| 0:47:24 | and in a few attempts, |
|---|
| 0:47:25 | such as the ensemble interval histogram, |
|---|
| 0:47:30 | which point to an entirely |
|---|
| 0:47:32 | different kind of representation, |
|---|
| 0:47:36 | one that might |
|---|
| 0:47:39 | get at separating the sources |
|---|
| 0:47:41 | at the same time |
|---|
| 0:47:42 | and you didn't say much about that |
|---|
| 0:47:45 | that's a direction, of course |
|---|
| 0:47:46 | so |
|---|
| 0:47:48 | what do you think about that |
|---|
| 0:47:49 | direction, and should we get |
|---|
| 0:47:51 | people working on it |
|---|
| 0:47:52 | and pay more attention |
|---|
| 0:47:55 | to things beyond the short-term spectrum |
|---|
| 0:47:58 | why spectral? i guess by which you mean short-term spectral, right? |
|---|
| 0:48:02 | and i may not have done this as clearly as i could have, but i think the shamma |
|---|
| 0:48:07 | stuff that i was making reference to |
|---|
| 0:48:10 | certainly can be long-term; their spectro-temporal representation, |
|---|
| 0:48:15 | what you feed |
|---|
| 0:48:17 | to the different |
|---|
| 0:48:18 | cortical |
|---|
| 0:48:20 | filters, |
|---|
| 0:48:21 | can be a very different kind of spectrogram, one that takes advantage of that sort of stuff, and i think that's |
|---|
| 0:48:26 | absolutely what we should do |
|---|
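The answer above points at spectro-temporal ("cortical") filtering without spelling it out, so here is a minimal sketch of the idea. This is not the speaker's code; the library choices (numpy/scipy), the 100 frames-per-second rate, the kernel sizes, and the rate/scale values are all assumptions made for illustration, and 2-D Gabor kernels are only one common stand-in for Shamma-style spectro-temporal filters.

```python
# Hedged sketch, not from the talk: 2-D Gabor ("cortical") filtering of a
# log-mel spectrogram, one crude stand-in for a spectro-temporal representation.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(rate_hz, scale_cyc_per_chan, frame_rate_hz=100.0,
                 size_t=41, size_f=15):
    # A temporal modulation ("rate") times a spectral modulation ("scale"),
    # each windowed by a Hann envelope; returns a (freq, time) kernel.
    t = (np.arange(size_t) - size_t // 2) / frame_rate_hz   # seconds
    f = np.arange(size_f) - size_f // 2                     # mel channels
    carrier_t = np.cos(2 * np.pi * rate_hz * t) * np.hanning(size_t)
    carrier_f = np.cos(2 * np.pi * scale_cyc_per_chan * f) * np.hanning(size_f)
    return np.outer(carrier_f, carrier_t)

def spectro_temporal_features(log_mel, frame_rate_hz=100.0):
    # log_mel: (n_mel_channels, n_frames). Returns one filtered "spectrogram"
    # per (rate, scale) pair: a long-time view, not a single short-term frame.
    rates = [2.0, 4.0, 8.0, 16.0]   # temporal modulations in Hz (assumed values)
    scales = [0.06, 0.12, 0.25]     # spectral modulations in cycles per channel
    return np.stack([
        convolve2d(log_mel, gabor_kernel(r, s, frame_rate_hz), mode="same")
        for r in rates for s in scales
    ])
```

Each filter pools information over hundreds of milliseconds and several channels, which is the sense in which what reaches the classifier "can be a very different kind of spectrogram" from the frame-by-frame short-term spectrum.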
| 0:48:28 | and as for these disturbances, the multiple sources, the reverberation, et cetera |
|---|
| 0:48:33 | i agree that's |
|---|
| 0:48:34 | that's the biggest challenge that we see |
|---|
| 0:48:36 | if someone talks about the performance of humans versus |
|---|
| 0:48:40 | speech recognition systems, in the current generation of systems that's the easiest explanation of the difference |
|---|
| 0:48:46 | so uh |
|---|
| 0:48:48 | i completely agree |
|---|
| 0:48:49 | sorry, um |
|---|
| 0:48:50 | i'm not being a politician i actually do agree |
|---|
| 0:48:54 | [question, largely inaudible, about results with hmm modeling and whether more attention should be paid there] |
|---|
| 0:49:32 | okay; i didn't, yeah |
|---|
| 0:49:34 | but |
|---|
| 0:49:35 | but you are certainly reinforcing my biases |
|---|
| 0:49:39 | and my ego is getting pumped up, but |
|---|
| 0:49:42 | um |
|---|
| 0:49:43 | i'm mostly a front-end person these days, have been for a while, and i agree that there's a lot |
|---|
| 0:49:49 | to be done there |
|---|
| 0:49:50 | i didn't mean to say at all that the language modeling and so forth was |
|---|
| 0:49:54 | was the bulk of it |
|---|
| 0:49:55 | even that study at the end was just saying that for a fairly simple case with essentially matched training and test |
|---|
| 0:50:01 | uh that |
|---|
| 0:50:02 | uh |
|---|
| 0:50:03 | you could |
|---|
| 0:50:04 | jimmy with the data in such a way |
|---|
| 0:50:07 | to match the model's assumptions and you could do much better |
|---|
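One possible reading of "jimmy with the data ... to match the model's assumptions", offered only as an assumed illustration since the study itself is not described in detail here: given a forced state alignment, replace every frame with an independent draw from the training frames aligned to the same state, so the resulting data satisfies the HMM's frame-independence assumption by construction. The function names and the alignment format below are invented for the sketch.

```python
# Hypothetical illustration (not the cited study's code): resample frames so the
# data satisfies the HMM assumption that frames are independent given the states.
import numpy as np
from collections import defaultdict

def pool_frames_by_state(train_frames, train_states):
    # train_frames: (n_frames, feat_dim); train_states: (n_frames,) state ids
    # from a forced alignment. Groups the training frames by HMM state.
    pools = defaultdict(list)
    for frame, state in zip(train_frames, train_states):
        pools[state].append(frame)
    return {s: np.asarray(frames) for s, frames in pools.items()}

def resample_to_match_model(test_states, pools, seed=0):
    # Replace each test frame with a random draw from its state's training pool;
    # the resampled sequence matches the model's independence assumptions exactly.
    rng = np.random.default_rng(seed)
    return np.stack([pools[s][rng.integers(len(pools[s]))] for s in test_states])
```

Decoding such resampled frames in place of the real ones is the kind of controlled manipulation that can make matched-condition results look much better, which is why the follow-up work mentioned next turns to mismatched conditions.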
| 0:50:10 | but one of the things that we're going to be trying to do in follow-ups to that study is looking |
|---|
| 0:50:15 | at mismatched conditions |
|---|
| 0:50:17 | you know |
|---|
| 0:50:18 | cases with noise and reverberation and so forth |
|---|
| 0:50:21 | in which case i don't think the effect will be quite as big |
|---|
| 0:50:24 | and |
|---|
| 0:50:25 | you know it's garbage in, garbage out: if basically you feed in representations |
|---|
| 0:50:30 | that are not |
|---|
| 0:50:31 | giving you the information you need, how are you going to get it out at the end, yeah, so |
|---|
| 0:50:36 | i agree with you, but i was trying to be fair, not only to other people's work |
|---|
| 0:50:40 | but also because |
|---|
| 0:50:42 | i feel that if you cover the space |
|---|
| 0:50:45 | of all these different cases |
|---|
| 0:50:47 | there are many cases where these other areas are in fact very poor |
|---|
| 0:50:51 | and human beings, as with my earlier example, do make use of higher-level information |
|---|
| 0:50:56 | uh often |
|---|
| 0:50:57 | in order to figure out what was said and what was important about it |
|---|
| 0:51:01 | which leads me to george's question |
|---|
| 0:51:03 | as you were talking |
|---|
| 0:51:05 | i was |
|---|
| 0:51:05 | constantly struck by the |
|---|
| 0:51:07 | analogy |
|---|
| 0:51:09 | between speech recognition and, almost |
|---|
| 0:51:12 | you know |
|---|
| 0:51:12 | irresistibly, |
|---|
| 0:51:15 | things in optical character recognition |
|---|
| 0:51:18 | and so |
|---|
| 0:51:19 | uh |
|---|
| 0:51:20 | almost every slide had irresistible analogies, from the current successes to future directions to problems that |
|---|
| 0:51:28 | are being experienced |
|---|
| 0:51:29 | uh_huh and i i'm just wondering is there |
|---|
| 0:51:31 | cross-disciplinary knowledge that can be leveraged, and is it being leveraged |
|---|
| 0:51:37 | to speech recognition? well, except in the sense that some of these alternative, uh |
|---|
| 0:51:42 | approaches |
|---|
| 0:51:43 | have tried looking at, uh, the spectrogram as an image |
|---|
| 0:51:47 | uh and so forth some of the neural network techniques that were developed uh in optical character recognition |
|---|
| 0:51:54 | sort of came back the other way but a lot of it's gone |
|---|
| 0:51:57 | gone the other way |
|---|
| 0:51:58 | but |
|---|
| 0:51:59 | you know, we tend to be a fairly fragmented community and not listen to each other quite as much |
|---|
| 0:52:04 | as we should |
|---|
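For concreteness, "looking at the spectrogram as an image" with network machinery borrowed from optical character recognition might look roughly like the sketch below. This is purely illustrative: the framework (PyTorch), the 40-mel by 100-frame patch size, the layer sizes, and the class count are assumptions, not anything described in the talk.

```python
# Illustrative only: a small OCR-style convolutional net applied to a
# log-mel spectrogram patch treated as a single-channel image.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes=40):  # e.g. phone classes (assumed number)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        # assumes input patches of 40 mel channels x 100 frames -> 10 x 25 after pooling
        self.classifier = nn.Linear(32 * 10 * 25, n_classes)

    def forward(self, x):  # x: (batch, 1, 40, 100)
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

# usage sketch: a batch of 8 random "spectrogram images" -> (8, n_classes) logits
logits = SpectrogramCNN()(torch.randn(8, 1, 40, 100))
```

The design choice being echoed is exactly the cross-pollination the question asks about: the convolution and pooling stages are the same local-pattern machinery that OCR networks used on character bitmaps, applied here along time and frequency.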
| 0:52:06 | who's next |
|---|
| 0:52:11 | [inaudible exchange] |
|---|
| 0:52:17 | oh, i'm sorry, i was drawing |
|---|
| 0:52:21 | [inaudible remarks as the next questioner begins] |
|---|
| 0:52:27 | um |
|---|
| 0:52:32 | i have some exposure, and probably most people also have some exposure, to modern |
|---|
| 0:52:37 | speech recognition technology |
|---|
| 0:52:39 | in real applications |
|---|
| 0:52:40 | yeah i think you know of um |
|---|
| 0:52:43 | i've been exposed to google voice |
|---|
| 0:52:45 | perhaps many people have |
|---|
| 0:52:47 | yeah and |
|---|
| 0:52:48 | and this is not a |
|---|
| 0:52:49 | plug |
|---|
| 0:52:50 | for google voice, but |
|---|
| 0:52:52 | i think |
|---|
| 0:52:52 | modern, deployed speech recognition technology seems to me amazingly |
|---|
| 0:52:58 | good |
|---|
| 0:53:01 | considering that the |
|---|
| 0:53:02 | systems |
|---|
| 0:53:04 | have no |
|---|
| 0:53:06 | okay, no really great |
|---|
| 0:53:10 | semantic context |
|---|
| 0:53:13 | it is interesting to see what the systems can do with acoustic |
|---|
| 0:53:17 | cues alone |
|---|
| 0:53:18 | it amazes me |
|---|
| 0:53:21 | yeah, and so |
|---|
| 0:53:23 | where i see the challenge, though i don't know how to do it, |
|---|
| 0:53:28 | where i see the challenge is |
|---|
| 0:53:31 | in, um |
|---|
| 0:53:34 | creating models of the semantic context that give the kind of support |
|---|
| 0:53:39 | to speech recognition that |
|---|
| 0:53:41 | we have seen from the |
|---|
| 0:53:44 | usual |
|---|
| 0:53:45 | language models |
|---|
| 0:53:46 | which |
|---|
| 0:53:47 | don't |
|---|
| 0:53:48 | model |
|---|
| 0:53:49 | that |
|---|
| 0:53:52 | okay, well |
|---|
| 0:53:53 | was that a question? |
|---|
| 0:53:56 | i know it wasn't |
|---|
| 0:53:57 | um, but i'll say something anyway, which is that |
|---|
| 0:54:00 | uh i i am really taking the middle position |
|---|
| 0:54:03 | there are plenty of tasks |
|---|
| 0:54:04 | uh where in fact |
|---|
| 0:54:06 | recognition does fail, particularly in noise and reverberation and so on |
|---|
| 0:54:10 | google voice search is is very impressive |
|---|
| 0:54:12 | but |
|---|
| 0:54:13 | you know there's a lot of |
|---|
| 0:54:13 | a lot of cases where things do fail |
|---|
| 0:54:16 | and |
|---|
| 0:54:17 | uh |
|---|
| 0:54:17 | we can see significant improvements |
|---|
| 0:54:20 | in a number of tasks |
|---|
| 0:54:21 | by changing the front end, so i think there is something important there |
|---|
| 0:54:25 | but in your statement you weren't really attacking the front end; what you're saying is we have to pay |
|---|
| 0:54:29 | attention to the back end, and i completely agree |
|---|
| 0:54:34 | one more, and then it's probably time |
|---|
| 0:54:36 | i want to change the subject a little bit, um; given that, can you say something about the |
|---|
| 0:54:40 | role |
|---|
| 0:54:41 | of industry versus academia in this research |
|---|
| 0:54:43 | you have seen both sides |
|---|
| 0:54:47 | what is good and what is bad in each for speech |
|---|
| 0:54:49 | right now, and |
|---|
| 0:54:52 | where should we go |
|---|
| 0:54:53 | i'm actually at a pretty small form of |
|---|
| 0:54:55 | industry |
|---|
| 0:54:57 | but uh |
|---|
| 0:54:58 | well, i think industry should fund academia |
|---|
| 0:55:12 | exactly |
|---|
| 0:55:14 | thanks for the lecture |
|---|