InterSpeech 2021

Graph-based Label Propagation for Semi-Supervised Speaker Identification
(3 minutes introduction)

Long Chen (Amazon, USA), Venkatesh Ravichandran (Amazon, USA), Andreas Stolcke (Amazon, USA)
Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semi-supervised learning to improve speaker profiles. We propose a graph-based semi-supervised learning approach for speaker identification in the household scenario, to leverage the unlabeled speech samples. In contrast to most of the works in speaker recognition that focus on speaker-discriminative embeddings, this work focuses on speaker label inference (scoring). Given a pre-trained embedding extractor, graph-based learning allows us to integrate information about both labeled and unlabeled utterances. Considering each utterance as a graph node, we represent pairwise utterance similarity scores as edge weights. Graphs are constructed per household, and speaker identities are propagated to unlabeled nodes to optimize a global consistency criterion. We show in experiments on the VoxCeleb dataset that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.