Pushing the Limits of Non-Autoregressive Speech Recognition
(Oral presentation)

Edwin G. Ng (Google, USA), Chung-Cheng Chiu (Google, USA), Yu Zhang (Google, USA), William Chan (Google, Canada)

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition and push the non-autoregressive state of the art on multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. The key to our recipe is CTC on giant Conformer neural network architectures, combined with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on the LibriSpeech test-clean/test-other sets, 5.1%/9.8% WER on Switchboard/CallHome, and 3.4% WER on Wall Street Journal, all without a language model.
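The abstract does not say how the CTC outputs are decoded. The following is a minimal sketch, assuming greedy (best-path) CTC decoding with no language model, to show why a CTC model is non-autoregressive: the label at every frame is chosen independently of the others, after which consecutive repeats are collapsed and blank symbols are removed. The blank index, the toy vocabulary, and the per-frame scores are hypothetical illustrations, not values from the paper.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding.

    Each frame's label is picked independently (argmax), so all frames can be
    decoded in parallel; no output token is conditioned on earlier outputs.
    """
    # 1. Pick the highest-scoring label at every frame, independently.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # 2. Collapse consecutive repeats and 3. drop blanks.
    #    A blank between two identical labels keeps them as separate tokens.
    output: List[int] = []
    prev = None
    for label in best_path:
        if label != prev and label != blank:
            output.append(label)
        prev = label
    return output


# Hypothetical vocabulary: index 0 is blank, 1-3 are characters.
vocab: Dict[int, str] = {1: "c", 2: "a", 3: "t"}

# Hypothetical per-frame scores over [blank, c, a, t] for 6 frames.
scores = [
    [0.1, 0.7, 0.1, 0.1],    # c
    [0.2, 0.6, 0.1, 0.1],    # c (repeat, collapsed)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.6, 0.1, 0.1, 0.2],    # blank
]

print("".join(vocab[i] for i in ctc_greedy_decode(scores)))  # → cat
```

Because the per-frame argmax does not depend on any previously emitted token, the whole utterance is decoded in a single pass over the encoder outputs; an autoregressive decoder would instead need one step per output token.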

InterSpeech 2021

Related Recordings

Layer Pruning on Demand with Intermediate CTC
(Oral presentation)

Real-time End-to-End Monaural Multi-speaker Speech Recognition
(Oral presentation)