The Babel Program and Low Resource Speech Technology
Mary Harper (IARPA)
This presentation will describe the data resources collected to support the Babel Program, the challenges performers faced when working with this data during the Base Period, and the lessons learned. The goal of the Babel Program is to rapidly develop speech recognition capability for keyword search in new languages, working with speech recorded in a variety of conditions and with limited amounts of transcription. This effort requires collecting speech data in a wide variety of languages to facilitate research and to assess progress toward Program objectives. The speech data is recorded in the country where the speakers reside and varies in speaker demographics and recording conditions. The Program will ultimately address a broad set of languages with a variety of phonotactic, phonological, tonal, morphological, and syntactic characteristics. In the Base Period, performers worked with four Development Languages (Cantonese, Pashto, Tagalog, and Turkish) and were then evaluated on a Surprise Language, Vietnamese, for which they had to build their systems in four weeks. The Program focused solely on telephone speech in the Base Period; starting in the Option 1 period, performers will also work on speech collected with additional types of devices (e.g., table-top microphones) in order to foster research on channel robustness.
Dr. Mary P. Harper is currently a Program Manager in Incisive Analysis at the Intelligence Advanced Research Projects Activity (IARPA), where she manages the Babel Program. She earned a BA (Psychology, 1976) at Kent State University, an MS (Psychology, 1980) at the University of Massachusetts, and both an MS and a PhD in Computer Science at Brown University (1986, 1990). From 1989 to 2007, she was a professor in the School of Electrical and Computer Engineering at Purdue University. Dr. Harper's academic research has focused on computer modeling of human communication, with an emphasis on methods for incorporating multiple types of knowledge sources, including lexical, syntactic, prosodic, and visual sources. She has published over 100 peer-reviewed articles. Dr. Harper served as a rotating Program Director at the National Science Foundation from 2002 to 2005 while continuing her research activities at Purdue. She joined the Center for the Advanced Study of Language (CASL) at the University of Maryland in 2005 as a senior research scientist investigating the use of human language technology (HLT). From 2006 to 2008, she also served as Area Director for the Technology Use sub-area at CASL, leading a team of researchers working in this area. From 2008 to 2010, Dr. Harper, as a principal research scientist, worked with researchers at the Johns Hopkins HLT Center of Excellence to develop next-generation human language technologies.