Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning
(3 minutes introduction)

Nilaksh Das (Georgia Tech, USA), Sravan Bodapati (Amazon, USA), Monica Sunkara (Amazon, USA), Sundararajan Srinivasan (Amazon, USA), Duen Horng Chau (Georgia Tech, USA)

Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech, which typically contains high variability in pronunciation and other semantics, because obtaining large amounts of annotated accented data is both tedious and costly. Often, we only have access to large amounts of unannotated speech from different accents. In this work, we leverage this unannotated data to provide semantic regularization to an ASR model that has been trained on only one accent, improving its performance across multiple accents. We propose Accent Pre-Training (Acc-PT), a semi-supervised training strategy that combines transfer learning and adversarial training. Our approach improves the performance of a state-of-the-art ASR model by 33% on average over the baseline across multiple accents, while training only on annotated samples from one standard accent and as little as 105 minutes of unannotated speech from a target accent.
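The abstract does not spell out the Acc-PT loss, so the sketch below is only an assumption: it shows the generic domain-adversarial (DANN-style) minimax that "adversarial training" with unannotated data usually refers to, not the authors' exact objective. Annotated source-accent speech contributes an ASR loss; speech from both accents, including unannotated target-accent audio, feeds an accent discriminator. The encoder is pushed to minimize the ASR loss while maximizing the discriminator's loss (weighted by a hypothetical `lam`), and the discriminator minimizes its own loss. The function names `binary_cross_entropy` and `adversarial_objectives` are illustrative, not from the paper.

```python
import math
from typing import Sequence, Tuple


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy of an accent discriminator.

    probs:  discriminator's predicted P(target accent) per utterance.
    labels: 0 = source (annotated) accent, 1 = target (unannotated) accent.
    """
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def adversarial_objectives(asr_loss_labeled: float,
                           disc_probs: Sequence[float],
                           accent_labels: Sequence[int],
                           lam: float = 0.1) -> Tuple[float, float]:
    """Hypothetical DANN-style objectives (assumed, not the published Acc-PT loss).

    asr_loss_labeled: ASR loss (e.g. CTC) computed on annotated source-accent data only.
    disc_probs / accent_labels: discriminator outputs and accent labels for
        utterances from both accents; transcripts are not needed here.
    Returns (encoder_objective, discriminator_objective):
        the encoder minimizes asr_loss - lam * disc_loss (i.e. it tries to fool
        the discriminator), while the discriminator minimizes disc_loss.
    """
    disc_loss = binary_cross_entropy(disc_probs, accent_labels)
    encoder_objective = asr_loss_labeled - lam * disc_loss
    return encoder_objective, disc_loss


if __name__ == "__main__":
    # A discriminator that cannot tell the accents apart (p = 0.5) has loss ln 2.
    enc, disc = adversarial_objectives(2.0, [0.5, 0.5], [0, 1], lam=0.1)
    print(f"encoder objective={enc:.4f}, discriminator loss={disc:.4f}")
```

In a real system these scalars would come from an autodiff framework, and the sign flip on the encoder side is often implemented with a gradient reversal layer; the transfer-learning part of the abstract would correspond to initializing the encoder from the model already trained on the standard accent. Both points are assumptions about a typical setup rather than details confirmed by this page.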

Extending Pronunciation Dictionary with Automatically Detected Word Mispronunciations to Improve PAII's System for Interspeech 2021 Non-Native Child English Close Track ASR Challenge
(3 minutes introduction)

Wei Chu, Peng Chang, Jing Xiao

InterSpeech 2021

Related Recordings

Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
(3 minutes introduction)