"You don't understand me!": Comparing ASR results for L1 and L2 speakers of Swedish <BR>(3 minutes introduction)

"You don't understand me!": Comparing ASR results for L1 and L2 speakers of Swedish
(3 minutes introduction)

Ronald Cumbal (KTH, Sweden), Birger Moell (KTH, Sweden), José Lopes (Heriot-Watt University, UK), Olov Engwall (KTH, Sweden)

The performance of Automatic Speech Recognition (ASR) systems has improved steadily with state-of-the-art development. However, performance tends to drop considerably in more challenging conditions (e.g., background noise, multi-speaker social conversations) and with atypical speakers (e.g., children, non-native speakers, or people with speech disorders). General improvements therefore do not necessarily transfer to applications that rely on ASR, such as educational software for younger students or language learners. In this study, we focus on the performance gap between recognition results for native and non-native, read and spontaneous Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may give rise to the observed transcription errors.
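
For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the Swedish example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; a real evaluation may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("talar" -> "tala") and one
# deletion ("lite") against a four-word reference.
wer = word_error_rate("jag talar lite svenska", "jag tala svenska")
print(f"WER = {wer:.2f}")
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.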

InterSpeech 2021

"You don't understand me!": Comparing ASR results for L1 and L2 speakers of Swedish
(3 minutes introduction)

Search in Audio
