InterSpeech 2021

Annotation Confidence vs. Training Sample Size: Trade-off Solution for Partially-Continuous Categorical Emotion Recognition
(3-minute introduction)

Elena Ryumina (RAS, Russia), Oxana Verkholyak (RAS, Russia), Alexey Karpov (RAS, Russia)
The commonly adopted design of emotional corpora includes multiple annotations of the same instance by several annotators. Most previous studies assume the ground truth to be either the average of all labels or the most frequent label. The current study shows that this approach may not be optimal for training. By filtering training data according to the level of annotation agreement, it is possible to increase the performance of the system even on unreliable test samples. However, raising the required annotation confidence inevitably leads to a loss of data. Therefore, balancing the trade-off between annotation quality and sample size requires careful investigation. This study presents experimental findings on audio-visual emotion classification using the recently introduced RAMAS dataset, which contains rich, partially-continuous categorical annotation for six basic emotions, and draws important conclusions about the optimal formulation of ground truth. With the proposed approach, it is possible to achieve a classification performance of UAR = 70.51% on speech utterances with more than 60% annotation agreement, which surpasses previously reported results on this corpus.
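The core idea of filtering training data by annotation agreement can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`agreement`, `filter_by_agreement`), the toy data, and the majority-vote definition of agreement are assumptions for the sake of the example; the 60% threshold mirrors the agreement level reported in the abstract.

```python
from collections import Counter

def agreement(labels):
    """Fraction of annotators who chose the majority label (assumed metric)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def filter_by_agreement(samples, threshold=0.6):
    """Keep samples whose annotation agreement exceeds the threshold,
    assigning the majority label as ground truth (hypothetical helper)."""
    kept = []
    for features, labels in samples:
        if agreement(labels) > threshold:
            majority_label = Counter(labels).most_common(1)[0][0]
            kept.append((features, majority_label))
    return kept

# Toy data: each sample is (utterance id, per-annotator labels).
data = [
    ("utt1", ["happy", "happy", "happy", "sad"]),    # 75% agreement -> kept
    ("utt2", ["angry", "sad", "happy", "neutral"]),  # 25% agreement -> dropped
]
print(filter_by_agreement(data))  # → [('utt1', 'happy')]
```

Raising `threshold` yields cleaner labels but fewer training samples, which is exactly the trade-off the study investigates.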