InterSpeech 2021

Topics in ASR: Robustness, feature extraction, and far-field ASR

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-switching Speech Recognition
(3 minutes introduction)

Shuai Zhang (UCAS, China), Jiangyan Yi (CAS, China), Zhengkun Tian (UCAS, China), Ye Bai (UCAS, China), Jianhua Tao (UCAS, China), Xuefei Liu (CAS, China), Zhengqi Wen (CAS, China)

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties
(3 minutes introduction)

Kathleen Siminyu (Georgia Tech, USA), Xinjian Li (Carnegie Mellon University, USA), Antonios Anastasopoulos (George Mason University, USA), David R. Mortensen (Carnegie Mellon University, USA), Michael R. Marlo (University of Missouri, USA), Graham Neubig (Carnegie Mellon University, USA)

Speech Acoustic Modelling using Raw Source and Filter Components
(3 minutes introduction)

Erfan Loweimi (University of Edinburgh, UK), Zoran Cvetkovic (King’s College London, UK), Peter Bell (University of Edinburgh, UK), Steve Renals (University of Edinburgh, UK)

IR-GAN: Room Impulse Response Generator for Far-Field Speech Recognition
(3 minutes introduction)

Anton Ratnarajah (University of Maryland, USA), Zhenyu Tang (University of Maryland, USA), Dinesh Manocha (University of Maryland, USA)

Multi-Channel Transformer Transducer for Speech Recognition
(3 minutes introduction)

Feng-Ju Chang (Amazon, USA), Martin Radfar (Amazon, USA), Athanasios Mouchtaris (Amazon, USA), Maurizio Omologo (Amazon, USA)

Multi-Channel Transformer Transducer for Speech Recognition
(longer introduction)

Feng-Ju Chang (Amazon, USA), Martin Radfar (Amazon, USA), Athanasios Mouchtaris (Amazon, USA), Maurizio Omologo (Amazon, USA)

Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
(3 minutes introduction)

Guodong Ma (Xinjiang University, China), Pengfei Hu (Tencent, China), Jian Kang (Tencent, China), Shen Huang (Tencent, China), Hao Huang (Xinjiang University, China)

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition
(3 minutes introduction)

Max W.Y. Lam (Tencent, China), Jun Wang (Tencent, China), Chao Weng (Tencent, China), Dan Su (Tencent, China), Dong Yu (Tencent, USA)