ISCSLP 2014

The 9th International Symposium on Chinese Spoken Language Processing

Keynote 1: Large Scale Neural Network Optimization for Mobile Speech Recognition Applications

Michiel Bacchiani

Recent years have seen large-scale adoption of speech recognition by the public, particularly on mobile devices. Google has integrated speech recognition into its Android operating system as a key input modality. The decade's worth of speech that our recognizer processes each day clearly shows how popular this technology has become with the public. This talk will describe current mobile speech applications in more detail. In particular, it will describe the Deep Neural Network (DNN) technology used as the acoustic model in this system, along with its distributed, asynchronous training infrastructure. Because a DNN is a static classifier, it is poorly matched to speech recognition, which is a sequence classification problem. The asynchrony inherent in our distributed training infrastructure further complicates the optimization of such models. Our recent research has focused on optimizing the DNN model in a way that matches the speech recognition problem. This work has produced three related algorithmic improvements: first, a novel way to bootstrap the training of a DNN model; second, the use of a sequence-based rather than a frame-based optimization metric; and third, the successful application of a recurrent neural network structure to our large-scale, large-vocabulary application. These algorithms have proven effective despite the asynchrony of our training infrastructure. They have reduced the error rate of our system by 10% or more compared with DNNs well optimized with a frame-based objective, and this trend holds across all 48 languages for which we support speech recognition as an input modality.
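The abstract does not say which sequence criterion is used, so the following is only a toy illustration of the frame-based versus sequence-based distinction. It assumes an MMI-style criterion computed over a small, explicit list of competing hypotheses. As a simplification, it also uses the DNN posteriors directly as acoustic scores, whereas real hybrid systems typically use scaled likelihoods. The posterior values and the helper names (`frame_cross_entropy`, `sequence_mmi_loss`) are hypothetical. The key point is that the frame loss scores each frame against its label in isolation, while the sequence loss compares the whole reference against competing label sequences.

```python
import math

# Hypothetical per-frame DNN posteriors: posteriors[t][s] = p(state s | frame t).
# Three frames, three states; the numbers are invented for illustration.
posteriors = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
]


def frame_cross_entropy(post, labels):
    """Frame-based objective: each frame is scored independently against its label."""
    return -sum(math.log(post[t][s]) for t, s in enumerate(labels))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_mmi_loss(post, reference, hypotheses, lm_logprob):
    """Toy MMI-style sequence objective over an explicit hypothesis list.

    score(W) = sum_t log post[t][W_t] + lm_logprob(W).
    loss = -(score(reference) - log sum_W exp(score(W))), where the sum runs
    over the reference plus every distinct competing hypothesis.
    """
    def score(seq):
        acoustic = sum(math.log(post[t][s]) for t, s in enumerate(seq))
        return acoustic + lm_logprob(seq)

    competitors = [reference] + [h for h in hypotheses if h != reference]
    return -(score(reference) - log_sum_exp([score(h) for h in competitors]))


reference = (0, 1, 2)
hypotheses = [(0, 1, 1), (0, 0, 2)]
uniform_lm = lambda seq: 0.0  # assumed flat language model

frame_loss = frame_cross_entropy(posteriors, reference)
seq_loss = sequence_mmi_loss(posteriors, reference, hypotheses, uniform_lm)
print(f"frame CE = {frame_loss:.4f}, sequence MMI loss = {seq_loss:.4f}")
```

Here the frame loss equals -log(0.7 * 0.6 * 0.5). The sequence loss equals log((0.21 + 0.126 + 0.105) / 0.21), which shrinks only when the reference scores better than its competitors. Gradients of such a loss therefore depend on the competing hypotheses, while the frame loss ignores them.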