Robust Speech Recognition: more than just a lot of noise

Michael Seltzer (Microsoft Research)

The robustness of speech recognition systems to acoustic variability is a key factor in their success or failure. This variability can arise from multiple sources, often simultaneously, including environmental noise, reverberation, speaker differences, and channel bandwidth. In this talk, we will discuss techniques that mitigate such variability and reduce the mismatch between the speech observed at runtime and the recognizer's acoustic models. We will compare and contrast front-end methods, which enhance the signal or features, with model-domain methods, which adapt the HMM parameters. While most algorithms target a single source of variability, we will also introduce methods that jointly compensate for multiple sources of mismatch. Although robustness algorithms are often evaluated using a recognizer trained on clean speech, most large-scale commercial systems are built from data collected in the field from real users. We will show how the described robustness techniques can be incorporated into training to reduce the unwanted variability in such data and create more accurate systems. Finally, we will look at the role of robustness algorithms in commercial applications such as in-car infotainment systems and voice search on smartphones, and discuss open challenges.
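To make the front-end versus model-domain contrast concrete, here is a minimal sketch under a deliberately simple assumption: the only mismatch is a fixed convolutional channel, which appears as an additive bias in the cepstral domain (the log turns the spectral product into a sum, and the DCT is linear). The front-end option removes the bias from the features with cepstral mean normalization; the model-domain option instead adds an estimated bias to the Gaussian means. The function names, the toy data, and the bias estimate (the difference of feature means) are hypothetical illustrations, not the specific methods presented in the talk.

```python
from typing import List

Frame = List[float]  # one cepstral feature vector

def mean_vector(frames: List[Frame]) -> Frame:
    """Per-dimension mean over all frames of an utterance."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]

def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Front-end: subtract the utterance mean, cancelling any constant cepstral bias."""
    mu = mean_vector(frames)
    return [[x - m for x, m in zip(f, mu)] for f in frames]

def adapt_means(model_means: List[Frame], bias: Frame) -> List[Frame]:
    """Model domain: shift every Gaussian mean by the estimated channel bias."""
    return [[m + b for m, b in zip(mean, bias)] for mean in model_means]

# Toy data (hypothetical): "clean" features and a channel that adds bias h.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
h = [0.5, -1.0]
noisy = [[x + b for x, b in zip(f, h)] for f in clean]

# Front-end: normalized noisy features match normalized clean features.
front_end = cepstral_mean_normalize(noisy)

# Model domain: estimate the bias from the data and move the model toward it
# (assumes the model mean was trained on the clean features).
est_bias = [n - c for n, c in zip(mean_vector(noisy), mean_vector(clean))]
adapted = adapt_means([mean_vector(clean)], est_bias)
```

The two routes reach the same match from opposite ends: one changes the observations, the other changes the model. Under additive noise the cepstral effect is no longer a simple bias, which is one reason the more elaborate compensation methods discussed in the talk are needed.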