Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
(3 minutes introduction)

Heting Gao (University of Illinois at Urbana-Champaign, USA), Junrui Ni (University of Illinois at Urbana-Champaign, USA), Yang Zhang (MIT-IBM Watson AI Lab, USA), Kaizhi Qian (MIT-IBM Watson AI Lab, USA), Shiyu Chang (MIT-IBM Watson AI Lab, USA), Mark Hasegawa-Johnson (University of Illinois at Urbana-Champaign, USA)

Many existing languages are too sparsely resourced for monolingual deep learning networks to achieve high accuracy. Multilingual phonetic recognition systems mitigate data sparsity by training on data from multiple languages and learning a speech-to-phone or speech-to-text model that is universal to all of them. However, despite good performance on the languages seen in training, multilingual systems perform poorly on unseen languages. This paper argues that, in the real world, even an unseen language has metadata: linguists can tell us the language name, its language family and, usually, its phoneme inventory. Even with no transcribed speech, it is possible to train a language embedding, using only typological data (phylogenetic node and phoneme inventory), that reduces ASR error rates. Experiments on a 20-language corpus show that our methods reduce phonetic token error rate (PTER) on all of the unseen test languages. An ablation study shows that using the wrong language embedding usually harms PTER if the two languages are from different language families. However, even the wrong language embedding often improves PTER if it belongs to another member of the same language family.
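The results above are reported as phonetic token error rate (PTER). The paper's own scoring script is not shown here, so the following is only a minimal sketch, assuming PTER is computed like a standard token error rate: the Levenshtein distance (substitutions + deletions + insertions) between the reference and hypothesized phone-token sequences, summed over utterances and divided by the total number of reference tokens. The function names `edit_distance` and `pter` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] is the distance between the ref tokens processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (no cost if tokens match)
            )
        prev = curr
    return prev[-1]


def pter(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level error rate: total edits divided by total reference tokens.

    Assumption: PTER is scored like a conventional phone/word error rate.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy phone sequences (illustrative only, not from the paper's corpus).
refs = [["k", "a", "t"], ["d", "ɔ", "g"]]
hyps = [["k", "æ", "t", "s"], ["d", "ɔ", "g"]]
print(pter(refs, hyps))  # 1 substitution + 1 insertion over 6 reference tokens
```

Because errors and reference lengths are summed over the whole test set before dividing, long utterances weigh more than short ones, which is the usual convention for corpus-level error rates.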

Related Recordings

The TAL system for the INTERSPEECH2021 Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech
(longer introduction)

Gaopeng Xu, Song Yang, Lu Ma, Chengfei Li, Zhongqin Wu

Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning
(3 minutes introduction)

Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau

InterSpeech 2021