Speaker normalization using Joint Variational Autoencoder
(3 minutes introduction)

Shashi Kumar (Samsung, India), Shakti P. Rath (Reverie Language Technologies, India), Abhishek Pandey (Samsung, India)

Speaker adaptation is known to provide significant improvements in speech recognition accuracy. However, in practical scenarios, only a few seconds of audio are available, which can make it infeasible to apply speaker adaptation methods such as i-vectors and fMLLR robustly. In addition, decoding with an fMLLR transformation requires two passes, which is impractical for real-time applications. In the recent past, mapping speech features from the speaker-independent (SI) space to the fMLLR-normalized space using a denoising autoencoder (DA) has been explored. To the best of our knowledge, such a mapping generally does not yield consistent improvement. In this paper, we show that our proposed joint VAE-based mapping achieves a large improvement over ASR models trained using filterbank SI features. We also show that the joint VAE outperforms the DA by a large margin. We observe a relative improvement in word error rate (WER) of 17% compared to an ASR model trained using filterbank features with i-vectors, and 23% compared to one trained without i-vectors.

Related Recordings

Low Resource German ASR with Untranscribed Data Spoken by Non-native Children - INTERSPEECH 2021 Shared Task SPAPL System
(longer introduction)

Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan

The TAL system for the INTERSPEECH2021 Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech
(3 minutes introduction)

Gaopeng Xu, Song Yang, Lu Ma, Chengfei Li, Zhongqin Wu

InterSpeech 2021
