Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases
(Oral presentation)

Jordan R. Green (MGH Institute of Health Professions, USA), Robert L. MacDonald (Google, USA), Pan-Pan Jiang (Google, USA), Julie Cattiau (Google, USA), Rus Heywood (Google, USA), Richard Cave (MND Association, UK), Katie Seaver (MGH Institute of Health Professions, USA), Marilyn A. Ladewig (Cerebral Palsy Associations of New York State, USA), Jimmy Tobin (Google, USA), Michael P. Brenner (Google, USA), Philip C. Nelson (Google, USA), Katrin Tomanek (Google, USA)

This study evaluated the accuracy of personalized automatic speech recognition (ASR) models in recognizing disordered speech, using an open vocabulary, from a large cohort of individuals with a wide range of underlying etiologies. The performance of these models was benchmarked against that of expert human transcribers and two speaker-independent ASR models trained on typical speech. A total of 432 individuals with self-reported disordered speech each recorded at least 300 short phrases using a web-based application. Word error rates (WERs) were estimated for the three ASR models and for the human transcribers. Metadata were collected to evaluate the potential impact of participant factors, atypical speech characteristics, and technical factors on recognition accuracy. Personalized models outperformed human transcribers, with median and maximum recognition accuracy gains of 9% and 80%, respectively. The accuracy of the personalized models was high (median WER: 4.6%) and exceeded that of the speaker-independent models (median WER: 31%). The largest improvements were observed for the most severely affected speakers. Low signal-to-noise ratio and fewer training utterances were associated with poor word recognition, even for speakers with mild speech impairments. Our results demonstrate the efficacy of personalized ASR models in recognizing speech across a wide range of impairment types and severities with an open vocabulary.
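The abstract reports results as word error rates but does not spell out the metric. WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, with the counts taken from a minimum edit-distance alignment at the word level. The sketch below illustrates that standard definition only; it assumes lowercasing and plain whitespace tokenization, and the study's actual text normalization and scoring tooling are not described here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption for illustration: text is lowercased and split on whitespace;
    no punctuation stripping or other normalization is applied.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion of a reference word
                dist[i][j - 1] + 1,         # insertion of a hypothesis word
                dist[i - 1][j - 1] + cost,  # substitution, or match if cost == 0
            )

    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion ("on") and one substitution ("lights" -> "light") over 5 words.
    print(word_error_rate("turn on the kitchen lights", "turn the kitchen light"))  # 0.4
```

Because insertions are counted against the reference length, WER can exceed 1.0 for a hypothesis with many extra words. Per-speaker figures such as the medians in the abstract also require a choice of how phrase-level errors are pooled for each speaker; the abstract does not state that aggregation rule, so it is left out of this sketch.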

Related Recordings

Investigating the Utility of Multimodal Conversational Technology and Audiovisual Analytic Measures for the Assessment and Monitoring of Amyotrophic Lateral Sclerosis at Scale
(Oral presentation)

Michael Neumann, Oliver Roesler, Jackson Liscombe, Hardik Kothare, David Suendermann-Oeft, David Pautler, Indu Navar, Aria Anvar, Jochen Kumm, Raquel Norel, Ernest Fraenkel, Alexander V. Sherman, James D. Berry, Gary L. Pattee, Jun Wang, Jordan R. Green, Vikram Ramanarayanan

Handling acoustic variation in dysarthric speech recognition systems through model combination
(Oral presentation)

Enno Hermann, Mathew Magimai-Doss

InterSpeech 2021