Pascal Hecker (audEERING, Germany), Florian B. Pokorny (Universität Augsburg, Germany), Katrin D. Bartl-Pokorny (Universität Augsburg, Germany), Uwe Reichel (audEERING, Germany), Zhao Ren (Universität Augsburg, Germany), Simone Hantke (audEERING, Germany), Florian Eyben (audEERING, Germany), Dagmar M. Schuller (audEERING, Germany), Bert Arnrich (Universität Potsdam, Germany), Björn W. Schuller (audEERING, Germany)
Since the onset of the COVID-19 pandemic, several research teams have reported advances in the automated recognition of COVID-19 from voice. The resulting voice-based screening tools could support large-scale testing efforts. While machine capabilities on this task are progressing, we address the so far unexplored question of whether human raters can distinguish speakers who tested positive for COVID-19 from those who tested negative on the basis of voice samples, and we compare their performance to a machine learning baseline. To account for the challenging similarity of symptoms between COVID-19 and other respiratory diseases, we use a carefully balanced dataset of voice samples in which COVID-19-positive and COVID-19-negative speakers are matched by their symptoms, and which additionally includes COVID-19-negative speakers without symptoms. Both the human raters and the machine struggle to reliably identify COVID-19-positive speakers in our dataset. These results indicate that particular attention should be paid to the distribution of symptoms across all speakers of a dataset when assessing the capabilities of existing systems. Identifying the acoustic aspects of COVID-19-related symptom manifestations might be the key to reliable voice-based COVID-19 detection in the future, by both trained human raters and machine learning models.