ISCSLP 2014

The 9th International Symposium on Chinese Spoken Language Processing

Keynote 2: Multilingual Automatic Speech Recognition for Code-switching Speech

Tanja Schultz

The performance of speech and language processing technologies has improved dramatically in recent years, and an increasing number of systems are being deployed across a variety of languages and applications. Unfortunately, current methods and models rely heavily on massive amounts of resources, which typically exist only for languages spoken by many people, in countries of great economic interest, and by populations with immediate information technology needs. Furthermore, today's speech processing systems target monolingual scenarios, in which speakers are assumed to use a single language while interacting via voice. However, I believe that today's globalized world requires truly multilingual speech processing systems that support phenomena of multilingualism such as code-switching and accented speech. Because these phenomena occur mainly in spoken language, little written data exists for them, so methods are required that perform reliably even when only few resources are available.

In my talk I will present ongoing work at the Cognitive Systems Lab on applying concepts of Multilingual Speech Recognition to rapidly adapt systems to as yet unsupported or under-resourced languages. Building on these concepts, I will describe the challenges of building a code-switching speech recognition system, using the example of Singaporean speakers who switch between Mandarin and English. Proposed solutions include sharing data and models across both languages to build truly multilingual acoustic models, dictionaries, and language models. Furthermore, I will describe the web-based Rapid Language Adaptation Toolkit (RLAT, see http://csl.ira.uka.de/rlat-dev), which lowers the overall cost of system development by automating the system building process, leveraging crowdsourcing, and reducing data needs without significant performance loss. The toolkit enables native language experts to build speech recognition components without requiring detailed technical expertise. Components can be evaluated in an end-to-end system, allowing for iterative improvements. By keeping users in the development loop, RLAT can learn from their expertise to continually adapt and improve. This will hopefully revolutionize the system development process for still under-resourced languages.