Annotation Confidence vs. Training Sample Size: Trade-off Solution for Partially-Continuous Categorical Emotion Recognition (3 minutes introduction)

Elena Ryumina (RAS, Russia), Oxana Verkholyak (RAS, Russia), Alexey Karpov (RAS, Russia)

A commonly adopted design for emotional corpora includes multiple annotations of the same instance from several annotators. Most previous studies take the ground truth to be either the average of all labels or the most frequently assigned label. The current study shows that this approach may not be optimal for training. Filtering the training data by level of annotation agreement can improve system performance even on unreliable test samples. However, raising the required annotation confidence inevitably leads to a loss of data, so the trade-off between annotation quality and sample size requires careful investigation. This study presents experimental findings on audio-visual emotion classification using the recently introduced RAMAS dataset, which contains rich categorical partially-continuous annotation for 6 basic emotions, and draws conclusions about the optimal formulation of ground truth. With the proposed approach, it is possible to achieve an unweighted average recall (UAR) of 70.51% on speech utterances with more than 60% agreement, which surpasses the values previously reported for this corpus in the literature.
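The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy utterances, the definition of agreement as the share of annotators who chose the majority label, and the helper names `agreement`, `filter_by_agreement`, and `uar` are all assumptions made for illustration. The sketch shows how raising the agreement threshold shrinks the retained training set, and how UAR is computed as the mean of per-class recalls.

```python
from collections import Counter


def agreement(labels):
    """Return (majority_label, share of annotators who chose it).

    Assumption: agreement is measured as the majority-label share.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def filter_by_agreement(samples, threshold):
    """Keep samples whose agreement is strictly greater than the threshold."""
    kept = []
    for sample_id, labels in samples:
        label, score = agreement(labels)
        if score > threshold:
            kept.append((sample_id, label, score))
    return kept


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: each utterance is labeled by five annotators.
toy = [
    ("u1", ["anger", "anger", "anger", "anger", "anger"]),
    ("u2", ["joy", "joy", "joy", "sadness", "fear"]),
    ("u3", ["fear", "fear", "fear", "fear", "surprise"]),
    ("u4", ["disgust", "anger", "sadness", "joy", "fear"]),
]

# A higher threshold means more reliable labels but fewer training samples.
for threshold in (0.0, 0.4, 0.6, 0.8):
    kept = filter_by_agreement(toy, threshold)
    print(f"agreement > {threshold:.1f}: {len(kept)} of {len(toy)} samples kept")
```

In practice, a model would be retrained on each filtered subset and its UAR compared on a fixed test set, so that the threshold giving the best balance between label reliability and data volume can be identified.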

InterSpeech 2021