Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formosa Speech Recognition Challenges
(Oral presentation)

Hung-Pang Lin (National Sun Yat-sen University, Taiwan), Yu-Jia Zhang (National Sun Yat-sen University, Taiwan), Chia-Ping Chen (National Sun Yat-sen University, Taiwan)

As team NSYSU-MITLab, we participated in the low-resource speech recognition tasks of the Open Automatic Speech Recognition Challenge 2020 (OpenASR20) and the Formosa Speech Recognition Challenge 2020 (FSR-2020). For these tasks, we build and compare end-to-end (E2E) systems and Deep Neural Network Hidden Markov Model (DNN-HMM) systems. The E2E systems use an encoder with the Conformer architecture and a decoder with the Transformer architecture. In addition, a speaker classifier with a gradient reversal layer is included during training to improve robustness to speaker variation. The DNN-HMM systems use Time-Restricted Self-Attention and Factorized Time Delay Neural Networks to learn acoustic representations in the DNN front end. In OpenASR20, the best word error rates we achieved are 61.45% for Cantonese and 74.61% for Vietnamese. In FSR-2020, the best character error rate we achieved is 43.4% for Taiwanese Southern Min Recommended Characters, and the best syllable error rate is 25.4% for Taiwan Minnanyu Luomazi Pinyin.
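The word, character, and syllable error rates quoted above are conventionally computed as the edit distance between the reference and the recognized token sequence, divided by the reference length. The following is a minimal sketch of that conventional definition, not the challenges' official scoring procedure, which may apply text normalization and alignment rules not shown here. The function names (`edit_distance`, `error_rate`, `word_error_rate`, `character_error_rate`) are hypothetical and chosen for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(ref: str, hyp: str) -> float:
    """Error rate over whitespace-separated tokens (assumed tokenization)."""
    return error_rate(ref.split(), hyp.split())


def character_error_rate(ref: str, hyp: str) -> float:
    """Error rate over characters, ignoring spaces (assumed convention)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
```

A syllable error rate follows the same pattern if each whitespace-separated token is one syllable, so `word_error_rate` can be reused on syllable-segmented text. Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.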

InterSpeech 2021

Related Recordings

The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20
(Oral presentation)

Combining Hybrid and End-to-end Approaches for the OpenASR20 Challenge
(Oral presentation)
