ASRU 2013

Recent Progress in Unsupervised Speech Processing

Jim Glass (MIT)

The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, and acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static. While this approach has been effective for problems where there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language, where learning is an intrinsic capability.

In this talk, I will describe some of the speech and language research topics being investigated at MIT that require fewer or even zero conventional linguistic resources. In particular, I plan to describe our recent progress in unsupervised spoken term discovery and an inference-based method to automatically learn sub-word unit inventories from unannotated speech.


James Glass is a Senior Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory, where he heads the Spoken Language Systems Group. He is also a Lecturer in the Harvard-MIT Division of Health Sciences and Technology. He received his graduate degrees in Electrical Engineering and Computer Science from MIT in 1985 and 1988. His primary research interests are in the area of speech communication and human-computer interaction, centered on automatic speech recognition and spoken language understanding. He has lectured, taught courses, supervised students, and published extensively in these areas. He is currently a Senior Member of the IEEE, an Associate Editor for the IEEE Transactions on Audio, Speech, and Language Processing, and a member of the Editorial Board for Computer Speech and Language.