InterSpeech 2021

Streaming for ASR/RNN Transducers

Super-Human Performance in Online Low-latency Recognition of Conversational Speech
(3 minutes introduction)

Thai-Son Nguyen (KIT, Germany), Sebastian Stüker (KIT, Germany), Alex Waibel (KIT, Germany)

Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
(3 minutes introduction)

Vikas Joshi (Microsoft, India), Amit Das (Microsoft, USA), Eric Sun (Microsoft, USA), Rupesh R. Mehta (Microsoft, India), Jinyu Li (Microsoft, USA), Yifan Gong (Microsoft, USA)

An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling
(3 minutes introduction)

Tara N. Sainath (Google, USA), Yanzhang He (Google, USA), Arun Narayanan (Google, USA), Rami Botros (Google, USA), Ruoming Pang (Google, USA), David Rybach (Google, USA), Cyril Allauzen (Google, USA), Ehsan Variani (Google, USA), James Qin (Google, USA), Quoc-Nam Le-The (Google, USA), Shuo-Yiin Chang (Google, USA), Bo Li (Google, USA), Anmol Gulati (Google, USA), Jiahui Yu (Google, USA), Chung-Cheng Chiu (Google, USA), Diamantino Caseiro (Google, USA), Wei Li (Google, USA), Qiao Liang (Google, USA), Pat Rondon (Google, USA)

Reducing Exposure Bias in Training Recurrent Neural Network Transducers
(3 minutes introduction)

Xiaodong Cui (IBM, USA), Brian Kingsbury (IBM, USA), George Saon (IBM, USA), David Haws (IBM, USA), Zoltán Tüske (IBM, USA)

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
(3 minutes introduction; longer introduction)

Thibault Doutre (Google, USA), Wei Han (Google, USA), Chung-Cheng Chiu (Google, USA), Ruoming Pang (Google, USA), Olivier Siohan (Google, USA), Liangliang Cao (Google, USA)

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition
(3 minutes introduction)

Kartik Audhkhasi (Google, USA), Tongzhou Chen (Google, USA), Bhuvana Ramabhadran (Google, USA), Pedro J. Moreno (Google, USA)