InterSpeech 2021

Handling acoustic variation in dysarthric speech recognition systems through model combination
(Oral presentation)

Enno Hermann (Idiap Research Institute, Switzerland), Mathew Magimai-Doss (Idiap Research Institute, Switzerland)
Developing automatic speech recognition (ASR) systems that recognise dysarthric speech as well as control speech from unimpaired speakers remains challenging. Including more highly variable dysarthric speech in training can also degrade performance on control speakers, which is undesirable when developing speech recognisers for a wider audience. In this work, we analyse how the acoustic variability of dysarthric speech affects ASR systems and propose combining multiple acoustic models, each trained on a different subset of speakers, to mitigate this effect. This approach improves recognition for both dysarthric and control speakers on the Torgo and UA-Speech corpora.
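
The abstract does not say how the acoustic models are combined, so the following is only a minimal sketch of one common option: a weighted average of frame-level class posteriors from models trained on different speaker subsets. The function name `combine_posteriors`, the uniform default weights, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
def combine_posteriors(posteriors, weights=None):
    """Weighted average of frame-level posteriors from several models.

    posteriors: one entry per model; each entry is a list of frames, and
                each frame is a list of class probabilities.
    weights:    optional per-model weights (assumed uniform if omitted).
    """
    n_models = len(posteriors)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1

    n_frames = len(posteriors[0])
    n_classes = len(posteriors[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(weights[m] * posteriors[m][t][c] for m in range(n_models))
            for c in range(n_classes)
        ]
        combined.append(frame)
    return combined


# Toy posteriors (2 frames, 3 classes) from two hypothetical models,
# e.g. one trained on control speakers and one on dysarthric speakers.
control_model = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
dysarthric_model = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3]]

combined = combine_posteriors([control_model, dysarthric_model])
```

Because the weights are normalised, each combined frame is still a probability distribution and could be passed to a decoder in place of a single model's output. Other schemes, such as combining in the log domain or at the hypothesis level, would fit the abstract's description equally well.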