Does ASR have a PHD, or is it just Piled Higher and Deeper?
Presented by: Nelson Morgan (International Computer Science Institute and UC Berkeley, USA), Author(s): Nelson Morgan (International Computer Science Institute and UC Berkeley, USA)
Automatic Speech Recognition (ASR) is a venerable research discipline, with significant publications going back to the early 1950s and many of the important conceptual breakthroughs occurring in the 1970s and 1980s. The technology is now good enough for ASR to be used as a component in many commercial applications. However, many limitations remain, in particular failures under moderate amounts of noise or reverberation, or with unexpected speaking styles or topics, all conditions under which human beings can often recognize speech quite well. These remaining problems may be due to limited progress in the basic principles underlying ASR, as opposed to the ingenious engineering methods that have been developed to take advantage of Moore's Law improvements in storage and computational clout. Modern ASR systems include many heterogeneous computational levels, piled one on top of another, each one added after it showed some modest success in combination with the previous full system. It is possible that such complexity is required, given the nature of the signal and the hidden nature of the intrinsic information. On the other hand, perhaps there are more principled ways of designing ASR systems, based on core principles that are as yet undiscovered (or unexploited), some of which may also be enabled by the increased computational capabilities that we expect in the future, particularly via parallelism in multi-core CPUs and GPUs. The presentation will review some of the history as well as the current status of ASR systems, with a particular emphasis on research approaches that have not made it into the mainstream but that might still have some potential for ameliorating the remaining problems. The talk will conclude with some suggestions of promising directions, including recent developments in diagnostics that may lead to deeper understanding.
The speaker will make no attempt to provide a complete or even balanced description of ASR's history, but will instead focus on those developments that clarify his theme: improving short-term performance is valuable, but understanding the mechanisms of failure (and potentially of success) is often much more important.