InterSpeech 2021

Source and Vocal Tract Cues for Speech-based Classification of Patients with Parkinson’s Disease and Healthy Subjects

Tanuka Bhattacharjee (Indian Institute of Science, India), Jhansi Mallela (Indian Institute of Science, India), Yamini Belur (NIMHANS, India), Nalini Atchayaram (NIMHANS, India), Ravi Yadav (NIMHANS, India), Pradeep Reddy (NIMHANS, India), Dipanjan Gope (Indian Institute of Science, India), Prasanta Kumar Ghosh (Indian Institute of Science, India)
Parkinson’s disease (PD) affects both the source and vocal tract components of speech. Various speech cues explored in the literature for automatic classification of individuals with PD and healthy controls (HC) implicitly carry information about both of these components. This work explicitly analyzes the contribution of source and vocal tract attributes toward automatic PD vs. HC classification, which, to the best of our knowledge, has not been done before. Here, fundamental frequency (fo) is used to capture source information. To quantify vocal tract information, speech waveforms are converted to unvoiced forms, and mel-frequency cepstral coefficients (MFCC) are obtained from them; these are denoted voicing-removed MFCC. Experimental results suggest that (1) the relative merit of source and vocal tract cues in classifying PD vs. HC depends largely on the speech task being considered, (2) the two cues complement each other across all tasks, (3) while MFCC encodes both source and vocal tract features, the source information captured by fo is different and further complements MFCC when the classifiers are trained and tested under clean or matched noise conditions, so that feature-level fusion of fo and MFCC achieves the best classification accuracy, and (4) under unseen noise conditions, fo alone proves to be a highly noise-robust feature.
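As a rough illustration of what feature-level fusion of fo and MFCC might look like, the following minimal sketch estimates a per-frame fo and concatenates it with a frame-aligned MFCC matrix. It is not the authors' pipeline: the abstract does not name the fo extractor, so a plain autocorrelation peak picker is swapped in here, and the function names (`estimate_fo_autocorr`, `fuse_features`), the 75 to 400 Hz search range, and the 13-coefficient MFCC shape are all assumptions made for the example. The voicing-removal step is not sketched, since the abstract does not describe how it is done.

```python
import numpy as np


def estimate_fo_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Rough fo estimate (Hz) for one frame via autocorrelation peak picking.

    Hypothetical stand-in for a real pitch tracker; returns 0.0 for a
    silent frame. The search range fmin..fmax is an assumed default.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    if ac[0] <= 0 or lag_max <= lag_min:
        return 0.0
    # The strongest peak inside the allowed lag range gives the period.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def fuse_features(fo, mfcc):
    """Feature-level fusion: prepend fo as one extra column per frame.

    fo   : shape (T,)   one value per frame
    mfcc : shape (T, D) one MFCC vector per frame
    Returns an array of shape (T, D + 1).
    """
    fo = np.asarray(fo, dtype=float).reshape(-1, 1)
    mfcc = np.asarray(mfcc, dtype=float)
    if mfcc.ndim != 2 or fo.shape[0] != mfcc.shape[0]:
        raise ValueError("fo and mfcc must have the same number of frames")
    return np.hstack([fo, mfcc])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(640) / sr                    # one 40 ms frame
    frame = np.sin(2 * np.pi * 200.0 * t)      # synthetic 200 Hz tone
    fo_hz = estimate_fo_autocorr(frame, sr)
    placeholder_mfcc = np.zeros((1, 13))       # placeholder, not real MFCCs
    fused = fuse_features([fo_hz], placeholder_mfcc)
    print(fo_hz, fused.shape)
```

The fused matrix would then be passed to a classifier in place of either feature stream alone. In practice the fo and MFCC frames need identical frame length and hop so that the rows align.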