InterSpeech 2021

The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20
(Oral presentation)

Jing Zhao (Tsinghua University, China), Zhiqiang Lv (Tencent, China), Ambyera Han (Tencent, China), Guan-Bo Wang (Tsinghua University, China), Guixin Shi (Tsinghua University, China), Jian Kang (Tencent, China), Jinghao Yan (Tencent, China), Pengfei Hu (Tencent, China), Shen Huang (Tencent, China), Wei-Qiang Zhang (Tsinghua University, China)
This paper presents our work for the OpenASR20 Challenge. We describe our Automatic Speech Recognition (ASR) systems for Cantonese and Mongolian under both the constrained and unconstrained conditions. For the constrained condition, a hybrid NN-HMM ASR system plays the main role, while for the unconstrained condition, an end-to-end ASR system significantly outperforms traditional hybrid systems thanks to adequate training data. In addition, for each of the two languages we adapt to the challenging PSTN conditions using publicly available wideband dictated speech with a similar accent. Furthermore, data cleanup, language-tailored features, multi-band training, data augmentation, pre-training, and system fusion are incorporated. Our submitted systems achieved excellent performance under both conditions.