SRI-B End-to-End System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
(3-minute introduction)
Hardik Sailor (Samsung, India), Kiran Praveen T. (Samsung, India), Vikas Agrawal (Samsung, India), Abhinav Jain (Samsung, India), Abhishek Pandey (Samsung, India)
This paper describes SRI-B’s end-to-end Automatic Speech Recognition (ASR) system proposed for subtask-1 of the multilingual ASR challenge for Indian languages. Our end-to-end (E2E) ASR model is based on the transformer architecture, trained by jointly minimizing the Connectionist Temporal Classification (CTC) and Cross-Entropy (CE) losses. A conventional multilingual model, trained by pooling data from multiple languages, generalizes well, but at the expense of performance degradation compared to its monolingual counterparts. In our experiments, a multilingual model is trained by conditioning the input features on a language-specific embedding vector. These language-specific embedding vectors are obtained by training a language classifier using an attention-based transformer architecture and taking its bottleneck features as language identification (LID) embeddings. We further adapt the multilingual system with language-specific data to reduce the degradation on individual languages. We also propose a novel hypothesis elimination strategy, based on LID scores and length-normalized probabilities, that optimally selects a model from the pool of available models. Experimental results show that the proposed multilingual training and hypothesis elimination strategy give an average relative word error rate (WER) improvement of 3.02% on the blind set over the challenge hybrid ASR baseline system.
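The feature-conditioning step described above can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the abstract does not specify how the LID embedding is combined with the input features, so the common tile-and-concatenate scheme (and the 80-dim filterbank / 32-dim embedding sizes) are assumptions.

```python
import numpy as np

def condition_features(feats: np.ndarray, lid_embedding: np.ndarray) -> np.ndarray:
    """Condition frame-level acoustic features on a language embedding.

    feats:         (T, F) frame-level features (e.g. log-mel filterbanks)
    lid_embedding: (E,) utterance-level bottleneck embedding from the
                   LID classifier

    One common recipe (assumed here) is to tile the utterance-level
    embedding across time and concatenate it to every frame, so the
    acoustic model sees the language identity at each time step.
    """
    tiled = np.tile(lid_embedding, (feats.shape[0], 1))  # (T, E)
    return np.concatenate([feats, tiled], axis=1)        # (T, F + E)

feats = np.random.randn(100, 80)  # 100 frames of 80-dim filterbanks (assumed dims)
lid = np.random.randn(32)         # 32-dim LID embedding (assumed dim)
out = condition_features(feats, lid)
# out has shape (100, 112): original features plus the tiled embedding
```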
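The hypothesis elimination strategy could be sketched roughly as below. The abstract only names the two signals (LID scores and length-normalized probabilities), so the threshold-then-rank logic, the field names, and the LID threshold value here are all illustrative assumptions, not the paper's actual scoring rule.

```python
def length_normalized_logprob(logprob: float, hyp_len: int) -> float:
    """Length-normalized log-probability of a hypothesis.

    Dividing by the token count removes the bias toward shorter
    hypotheses that raw sequence log-probabilities carry.
    """
    return logprob / max(hyp_len, 1)

def select_hypothesis(candidates, lid_threshold=0.5):
    """Pick one hypothesis from a pool of per-model candidates.

    Each candidate is a dict (hypothetical structure) with:
      "text"    - decoded hypothesis string
      "logprob" - total log-probability from that model's decoder
      "lid"     - LID score for the expected language, in [0, 1]

    Candidates whose LID score falls below the threshold are
    eliminated first; among the survivors, the hypothesis with the
    highest length-normalized log-probability wins.  If every
    candidate is eliminated, fall back to the full pool so the
    function always returns something.
    """
    surviving = [c for c in candidates if c["lid"] >= lid_threshold] or candidates
    return max(
        surviving,
        key=lambda c: length_normalized_logprob(c["logprob"], len(c["text"].split())),
    )

candidates = [
    {"text": "namaste duniya",        "logprob": -4.2, "lid": 0.91},
    {"text": "hello world again now", "logprob": -6.0, "lid": 0.12},
]
best = select_hypothesis(candidates)
# the second candidate is eliminated by its low LID score,
# so the first one is selected
```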