InterSpeech 2021

Automatic Error Correction for Speaker Embedding Learning with Noisy Labels
(3-minute introduction)

Fuchuan Tong, Yan Liu, Song Li, Jie Wang, Lin Li, Qingyang Hong (Xiamen University, China)
Despite the superior performance deep neural networks have achieved in speaker verification tasks, much of their success benefits from the availability of large-scale, carefully labeled datasets. However, noisy labels often arise during data collection. In this paper, we propose an automatic error correction method for deep speaker embedding learning with noisy labels. Specifically, we propose a label noise correction loss that leverages a model’s generalization capability to correct noisy labels during training. In addition, we improve the vanilla AM-Softmax to estimate a more robust speaker posterior by introducing sub-centers. When applied to the VoxCeleb dataset, the proposed method degrades gracefully as noisy labels are introduced. Moreover, when combined with a Bayesian estimation of PLDA trained on noisy labels at the back-end, the whole system performs better under conditions in which noisy labels are present.
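The abstract mentions extending AM-Softmax with sub-centers, where each speaker class is represented by several weight vectors rather than one, so that a few sub-centers can absorb mislabeled samples. The paper's exact formulation is not given here, so the following is only an illustrative NumPy sketch of a common sub-center AM-Softmax variant: each class logit is the maximum cosine similarity over that class's sub-centers, and an additive margin `m` is subtracted from the target-class logit before scaling by `s` (the function name, the `(C, K, D)` weight layout, and the default `s`/`m` values are assumptions).

```python
import numpy as np

def subcenter_am_softmax_logits(embeddings, weights, labels, s=30.0, m=0.2):
    """Illustrative sub-center AM-Softmax logits (not the paper's exact loss).

    embeddings: (B, D) speaker embeddings
    weights:    (C, K, D) K sub-center weight vectors per speaker class
    labels:     (B,) ground-truth class indices
    """
    # L2-normalize both sides so dot products become cosine similarities
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=2, keepdims=True)
    # Cosine similarity of every embedding to every sub-center: (B, C, K)
    cos = np.einsum('bd,ckd->bck', e, w)
    # Each class is represented by its closest sub-center
    cos = cos.max(axis=2)                      # (B, C)
    # Scale all logits, then apply the additive margin to the target class
    logits = s * cos
    idx = np.arange(len(labels))
    logits[idx, labels] = s * (cos[idx, labels] - m)
    return logits
```

Training would feed these logits to a standard cross-entropy loss; with K = 1 sub-center per class this reduces to the vanilla AM-Softmax.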