The 9th International Symposium on Chinese Spoken Language Processing

Keynote 3: Selected Challenges and Solutions for DNN Acoustic Modeling

Yifan Gong

Acoustic modeling with DNNs (Deep Neural Networks) has been shown to deliver high speech recognition accuracy across a broad range of application scenarios. DNNs are increasingly used in commercial speech recognition products, on both server-based and device-based computing platforms. This creates opportunities for developing algorithms and engineering solutions for DNN-based modeling.

For large-scale speech recognition applications, this presentation focuses on several recent techniques that make DNNs more effective: reducing sparseness and run-time cost with SVD-based training, improving robustness to the acoustic environment with i-vector-based DNN modeling, adapting to speakers with a small number of free parameters, increasing language capability by reusing speech training material across languages, tying parameters for multi-style DNN training, reducing word error rate by adding a large amount of untranscribed data, and boosting the accuracy of small DNNs with behavior-transferring training.
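
As a rough illustration of the SVD item in the list above: one common way SVD is used to cut run-time cost is to replace a layer's weight matrix with the product of two thinner matrices obtained from a truncated singular value decomposition. The sketch below shows only that factorization step, using NumPy. The layer sizes, the rank, the synthetic weight matrix, and the function name `low_rank_factorize` are assumptions made for illustration; the abstract does not specify how the presented SVD-based training is actually carried out.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate `weight` (m x n) as left (m x rank) @ right (rank x n).

    Hypothetical helper for illustration only, not the speaker's method.
    """
    # Thin SVD: weight = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank]
    # Fold the kept singular values into the right-hand factor.
    right = s[:rank, None] * vt[:rank, :]
    return left, right


# Assumed example: a 512 x 256 weight matrix built to have rank 16 exactly,
# so that a rank-16 factorization reproduces it up to rounding error.
rng = np.random.default_rng(0)
weight = rng.standard_normal((512, 16)) @ rng.standard_normal((16, 256))

left, right = low_rank_factorize(weight, rank=16)

params_before = weight.size              # m * n
params_after = left.size + right.size    # rank * (m + n)
max_error = float(np.max(np.abs(weight - left @ right)))

print(params_before, params_after, max_error)
```

A rank-k factorization stores k(m + n) values instead of mn, so it saves parameters and multiply-adds only when k is well below mn / (m + n). Real weight matrices are not exactly low rank, so some approximation error remains after truncation.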

The presentation will also identify and elaborate on the limitations of current DNNs in acoustic modeling, illustrated by experimental results from various applications, and discuss some future directions for DNNs in speech recognition.