InterSpeech 2021

Ensemble-within-ensemble classification for escalation prediction from speech
(Oral presentation)

Oxana Verkholyak (RAS, Russia), Denis Dresvyanskiy (Universität Ulm, Germany), Anastasia Dvoynikova (RAS, Russia), Denis Kotov (Universität Ulm, Germany), Elena Ryumina (RAS, Russia), Alena Velichko (RAS, Russia), Danila Mamontov (Universität Ulm, Germany), Wolfgang Minker (Universität Ulm, Germany), Alexey Karpov (RAS, Russia)
Conflict situations arise frequently in daily life and often require a timely response to resolve. To automatically classify conflict (also referred to as escalation) in speech utterances, we propose ensemble learning, which improves prediction performance by combining several heterogeneous models that compensate for each other’s weaknesses. However, the effectiveness of a classification ensemble depends greatly on its constituents and their fusion strategy. This paper provides experimental evidence for the effectiveness of different prediction-level fusion strategies and demonstrates the performance of each proposed ensemble on the Escalation Sub-Challenge (ESS) within the framework of the Computational Paralinguistics Challenge (ComParE-2021). The ensembles comprise various machine learning approaches based on acoustic and linguistic characteristics of speech. The training strategy is specifically designed to improve generalization to unseen data, while the diverse nature of the ensemble candidates ensures high predictive power and accurate classification.
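
As a rough illustration of what prediction-level fusion means, the sketch below combines per-model class probabilities in two common ways: weighted averaging and majority voting. It is a generic example, not the authors' method: the abstract does not say which fusion strategies were compared, and the three-class setup, the model names, the probability values, and the weights are all hypothetical.

```python
from collections import Counter


def average_fusion(prob_sets, weights=None):
    """Fuse by a weighted average of class probabilities, then take the argmax.

    prob_sets: one entry per model, each a list of per-sample probability lists.
    weights:   optional per-model weights (uniform if omitted).
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    fused = []
    for i in range(n_samples):
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
            for c in range(n_classes)
        ]
        fused.append(max(range(n_classes), key=scores.__getitem__))
    return fused


def majority_vote(prob_sets):
    """Fuse by letting each model vote for its own most probable class.

    Ties are broken in favour of the lowest class index.
    """
    n_samples = len(prob_sets[0])
    fused = []
    for i in range(n_samples):
        votes = [max(range(len(p[i])), key=p[i].__getitem__) for p in prob_sets]
        counts = Counter(votes)
        fused.append(min(counts, key=lambda c: (-counts[c], c)))
    return fused


# Hypothetical outputs of three models for two utterances and three classes.
acoustic_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
linguistic = [[0.1, 0.2, 0.7], [0.1, 0.2, 0.7]]
acoustic_b = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3]]
models = [acoustic_a, linguistic, acoustic_b]

avg_labels = average_fusion(models)
vote_labels = majority_vote(models)
weighted_labels = average_fusion(models, weights=[0.2, 0.6, 0.2])
```

On the second utterance the two strategies disagree: two models weakly prefer class 1, so voting selects it, while averaging lets the single confident model pull the decision to class 2. This is one way in which the choice of fusion strategy can change the behaviour of an ensemble built from the same constituents.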