Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
(Oral presentation)

Xurong Xie (CAS, China), Rukiye Ruzi (CAS, China), Xunying Liu (CUHK, China), Lan Wang (CAS, China)

Dysarthric speech recognition is a challenging task because of acoustic variability and the limited amount of available data. The diverse conditions of dysarthric speakers account for this acoustic variability and make it difficult to model precisely. This paper presents a variational auto-encoder based variability encoder (VAEVE) that explicitly encodes such variability for dysarthric speech. The VAEVE uses both phoneme information and a low-dimensional latent variable to reconstruct the input acoustic features, so that the latent variable is forced to encode the phoneme-independent variability. The stochastic gradient variational Bayes algorithm is applied to model the distribution from which variability encodings are generated, and these encodings are then used as auxiliary features for DNN acoustic modeling. Experimental results on the UASpeech corpus show that the VAEVE based variability encodings are complementary to learning hidden unit contributions (LHUC) speaker adaptation. Systems using the variability encodings consistently outperform comparable baseline systems without them, obtaining absolute word error rate (WER) reductions of up to 2.2% on dysarthric speech with the “Very low” intelligibility level, and up to 2% on the “Mixed” type of dysarthric speech with diverse or uncertain conditions.
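
The idea in the abstract, reconstructing acoustic features from phoneme information plus a latent variable and training with stochastic gradient variational Bayes, can be illustrated with a minimal NumPy sketch of the forward pass and loss. This is not the authors' implementation: the layer sizes, the single tanh hidden layers, the squared-error reconstruction term, the standard normal prior, and the use of the posterior mean as the variability encoding are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the paper).
FEAT_DIM, NUM_PHONES, LATENT_DIM, HIDDEN = 40, 10, 8, 32


def init_linear(n_in, n_out):
    """Small random weights and zero bias for one affine layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


params = {
    "enc": init_linear(FEAT_DIM, HIDDEN),
    "mu": init_linear(HIDDEN, LATENT_DIM),
    "logvar": init_linear(HIDDEN, LATENT_DIM),
    "dec_h": init_linear(LATENT_DIM + NUM_PHONES, HIDDEN),
    "dec_out": init_linear(HIDDEN, FEAT_DIM),
}


def linear(x, wb):
    w, b = wb
    return x @ w + b


def encode(x):
    """Map acoustic features to the mean and log-variance of q(z | x)."""
    h = np.tanh(linear(x, params["enc"]))
    return linear(h, params["mu"]), linear(h, params["logvar"])


def reparameterize(mu, logvar, noise):
    """SGVB reparameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * noise


def decode(z, phone_onehot):
    """Reconstruct features from the latent variable AND the phoneme label,
    so the latent variable need not carry phoneme identity itself."""
    h = np.tanh(linear(np.concatenate([z, phone_onehot], axis=1), params["dec_h"]))
    return linear(h, params["dec_out"])


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per frame."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)


def negative_elbo(x, phone_ids, noise):
    """Per-batch loss: squared-error reconstruction plus KL regularizer."""
    phone_onehot = np.eye(NUM_PHONES)[phone_ids]
    mu, logvar = encode(x)
    z = reparameterize(mu, logvar, noise)
    x_hat = decode(z, phone_onehot)
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=1)
    return np.mean(recon + kl_to_standard_normal(mu, logvar)), mu


# Usage: compute the loss on random frames, then append the encoding
# (assumed here to be the posterior mean) to the acoustic features.
frames = rng.normal(size=(5, FEAT_DIM))
phones = np.array([0, 1, 2, 3, 4])
eps = rng.normal(size=(5, LATENT_DIM))
loss, encoding = negative_elbo(frames, phones, eps)
augmented = np.concatenate([frames, encoding], axis=1)
```

In a real system the parameters would be trained by gradient descent on this loss with an automatic differentiation framework; the sketch only shows how the phoneme-conditioned decoder and the KL term fit together, and how an encoding could be concatenated to the input features of a DNN acoustic model.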

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition
(Oral presentation)

Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng

InterSpeech 2021

Related Recordings

Adversarial Data Augmentation for Disordered Speech Recognition
(Oral presentation)

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition
(Oral presentation)