Applications in transcription, education and learning

Weakly-supervised word-level pronunciation error detection in non-native English speech
(longer introduction)

Daniel Korzekwa (Amazon, Poland), Jaime Lorenzo-Trueba (Amazon, UK), Thomas Drugman (Amazon, UK), Shira Calamaro (Amazon, UK), Bozena Kostek (Gdansk University of Technology, Poland)

End-to-End Speaker-Attributed ASR with Transformer
(3-minute introduction)

Naoyuki Kanda (Microsoft, USA), Guoli Ye (Microsoft, USA), Yashesh Gaur (Microsoft, USA), Xiaofei Wang (Microsoft, USA), Zhong Meng (Microsoft, USA), Zhuo Chen (Microsoft, USA), Takuya Yoshioka (Microsoft, USA)

Explore Wav2vec 2.0 for Mispronunciation Detection
(3-minute introduction)

Xiaoshuo Xu (Tencent, China), Yueteng Kang (Tencent, China), Songjun Cao (Tencent, China), Binghuai Lin (Tencent, China), Long Ma (Tencent, China)

Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings
(3-minute introduction)

Shintaro Ando (University of Tokyo, Japan), Nobuaki Minematsu (University of Tokyo, Japan), Daisuke Saito (University of Tokyo, Japan)

Deep feature transfer learning for automatic pronunciation assessment
(3-minute introduction)

Binghuai Lin (Tencent, China), Liyuan Wang (Tencent, China)

"You don't understand me!": Comparing ASR results for L1 and L2 speakers of Swedish
(3-minute introduction)

Ronald Cumbal (KTH, Sweden), Birger Moell (KTH, Sweden), José Lopes (Heriot-Watt University, UK), Olov Engwall (KTH, Sweden)

NeMo Inverse Text Normalization: From Development To Production
(3-minute introduction)

Yang Zhang (NVIDIA, USA), Evelina Bakhturina (NVIDIA, USA), Kyle Gorman (CUNY Graduate Center, USA), Boris Ginsburg (NVIDIA, USA)

Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using Sentence Speakability
(3-minute introduction)

Satsuki Naijo (Tohoku University, Japan), Akinori Ito (Tohoku University, Japan), Takashi Nose (Tohoku University, Japan)

Interspeech 2021