InterSpeech 2021

Visual Speech for Obstructive Sleep Apnea Detection
(3-minute introduction)

Catarina Botelho (INESC-ID Lisboa, Portugal), Alberto Abad (INESC-ID Lisboa, Portugal), Tanja Schultz (Universität Bremen, Germany), Isabel Trancoso (INESC-ID Lisboa, Portugal)
Obstructive sleep apnea (OSA) affects almost one billion people worldwide and substantially limits people's quality of life. Furthermore, it is responsible for significant morbidity and mortality associated with hypertension, cardiovascular diseases, and work and traffic accidents. Thus, the early detection of OSA can save lives. In our previous work, we used speech as a biomarker for automatic OSA detection. More recently, we leveraged the fact that OSA patients have anatomical and functional abnormalities of the upper airway and an altered craniofacial morphology, and therefore explored information from facial images for OSA detection. In this work, we propose to combine speech and facial image information to detect OSA from YouTube vlogs. This in-the-wild data offers an inexpensive alternative to standard data collected for medical applications, which is often scarce, imbalanced and costly to acquire. Besides speech and facial images, we propose to include visual speech as a third modality, inspired by the emerging field of silent computational paralinguistics. We hypothesize that embeddings trained for lip reading integrate information on the craniofacial structure, on speech articulation and on breathing patterns, and thus contain relevant cues for OSA detection. Fusion of the three modalities achieves an accuracy of 82.5% at the speaker level.
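
The abstract does not say how the three modalities are fused or how segment scores are turned into a speaker-level decision. As an illustration only, the sketch below assumes a simple late-fusion scheme: each modality yields an OSA probability per video segment, the probabilities are averaged across modalities, and the fused scores are then averaged per speaker and thresholded. The function names, the modality keys (`speech`, `face`, `visual_speech`), the uniform weights and the 0.5 threshold are all hypothetical choices, not the authors' method.

```python
from collections import defaultdict
from statistics import mean


def fuse_segment(scores, weights=None):
    """Weighted average of per-modality OSA probabilities for one segment.

    `scores` maps a modality name to a probability in [0, 1].
    Uniform weights are assumed when none are given (an illustrative choice).
    """
    if weights is None:
        weights = {modality: 1.0 for modality in scores}
    total = sum(weights[modality] for modality in scores)
    return sum(weights[m] * p for m, p in scores.items()) / total


def speaker_level_predictions(segments, threshold=0.5):
    """Aggregate fused segment scores into one decision per speaker.

    `segments` is an iterable of (speaker_id, scores) pairs.
    Returns a dict mapping speaker_id to True (OSA) or False (control).
    """
    per_speaker = defaultdict(list)
    for speaker_id, scores in segments:
        per_speaker[speaker_id].append(fuse_segment(scores))
    return {spk: mean(vals) >= threshold for spk, vals in per_speaker.items()}


def speaker_accuracy(predictions, labels):
    """Fraction of speakers whose predicted label matches the reference."""
    correct = sum(predictions[spk] == labels[spk] for spk in labels)
    return correct / len(labels)


# Toy data, made up purely to exercise the functions.
segments = [
    ("spk1", {"speech": 0.8, "face": 0.6, "visual_speech": 0.7}),
    ("spk1", {"speech": 0.4, "face": 0.7, "visual_speech": 0.6}),
    ("spk2", {"speech": 0.2, "face": 0.3, "visual_speech": 0.4}),
]
labels = {"spk1": True, "spk2": False}
predictions = speaker_level_predictions(segments)
```

Evaluating at the speaker level rather than the segment level means every speaker counts once, regardless of how many segments their videos contribute.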