InterSpeech 2021

Assessing Posterior-Based Mispronunciation Detection on Field-Collected Recordings from Child Speech Therapy Sessions
(3-minute introduction)

Adam Hair (Texas A&M University, USA), Guanlong Zhao (Texas A&M University, USA), Beena Ahmed (UNSW Sydney, Australia), Kirrie J. Ballard (University of Sydney, Australia), Ricardo Gutierrez-Osuna (Texas A&M University, USA)
A critical component of child speech therapy is home practice with a caregiver, who can provide feedback. However, caregivers often struggle to rate speech accurately and to perceive pronunciation errors. One potential solution is to embed automatic mispronunciation detection (MPD) algorithms within digital speech therapy applications. To address the need for MPD within child speech therapy, we investigated posterior-based mispronunciation detection using a custom corpus of disordered child speech that an expert clinician had manually annotated. Specifically, we trained a family of phoneme-specific logistic regression classifiers (LRC) and support vector machines (SVM) on log posterior probability and log posterior ratio features. Our results show that these classifiers outperformed baseline Goodness of Pronunciation scoring by 11% and 10%, respectively. More importantly, in an offline test, the LRC and SVM classifiers outperformed student clinicians at identifying mispronunciations by 18% and 16%, respectively. These results suggest that posterior-based mispronunciation detection may be suitable for providing at-home therapy feedback for children.
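The abstract does not define its features, so the following is only a minimal sketch of how log posterior probability (LPP) and log posterior ratio (LPR) features, and a Goodness of Pronunciation (GOP) style baseline score, are commonly computed from frame-level phoneme posteriors. The function names, the frame-averaging of log posteriors, the max-normalized GOP form, and the toy posterior values are all assumptions made for illustration, not the authors' implementation. In a setup like the one described, a feature vector of this kind would feed a classifier specific to the canonical phoneme (an LRC or SVM), which is not shown here.

```python
import math

EPS = 1e-10  # floor to avoid log(0); an illustrative choice


def log_posterior_probs(frame_posteriors):
    """Return one LPP per phoneme: the mean log posterior over the segment's frames.

    frame_posteriors: list of frames, each a list of posteriors over phonemes.
    (Assumed definition; the abstract does not specify one.)
    """
    n_frames = len(frame_posteriors)
    n_phones = len(frame_posteriors[0])
    return [
        sum(math.log(max(frame[k], EPS)) for frame in frame_posteriors) / n_frames
        for k in range(n_phones)
    ]


def posterior_features(frame_posteriors, canonical):
    """Concatenate LPPs for all phonemes with LPRs relative to the canonical phoneme.

    LPR for phoneme k is LPP[k] - LPP[canonical]; the canonical phoneme's own
    ratio (always 0) is left out.
    """
    lpp = log_posterior_probs(frame_posteriors)
    lpr = [lpp[k] - lpp[canonical] for k in range(len(lpp)) if k != canonical]
    return lpp + lpr


def gop_score(frame_posteriors, canonical):
    """GOP-style baseline: canonical LPP minus the best competing LPP (<= 0)."""
    lpp = log_posterior_probs(frame_posteriors)
    return lpp[canonical] - max(lpp)


# Toy segment: two frames, three hypothetical phonemes.
frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
features = posterior_features(frames, canonical=0)  # 3 LPPs + 2 LPRs
baseline = gop_score(frames, canonical=1)           # negative: phoneme 1 is not the best match
```

A GOP baseline thresholds a single score per segment, whereas the classifier approach sees the whole posterior profile, which may be one reason a learned classifier can separate errors that a single threshold cannot.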