Graph-based Label Propagation for Semi-Supervised Speaker Identification (3 minutes introduction)

Long Chen (Amazon, USA), Venkatesh Ravichandran (Amazon, USA), Andreas Stolcke (Amazon, USA)

Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semi-supervised learning to improve speaker profiles. We propose a graph-based semi-supervised learning approach for this scenario that leverages the unlabeled speech samples. In contrast to most work in speaker recognition, which focuses on speaker-discriminative embeddings, this work focuses on speaker label inference (scoring). Given a pre-trained embedding extractor, graph-based learning allows us to integrate information about both labeled and unlabeled utterances. Treating each utterance as a graph node, we represent pairwise utterance similarity scores as edge weights. Graphs are constructed per household, and speaker identities are propagated to unlabeled nodes to optimize a global consistency criterion. Experiments on the VoxCeleb dataset show that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.
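
The per-household propagation described above can be illustrated with a small sketch. The abstract does not give the exact formulation, so this example assumes the classic local-and-global-consistency update F ← αSF + (1 − α)Y, with clipped cosine similarity as edge weights. The function name propagate_labels, the parameters alpha and iterations, and the toy 3-dimensional embeddings are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def propagate_labels(embeddings, labels, num_speakers, alpha=0.9, iterations=200):
    """Assign a speaker index to every utterance in one household graph.

    labels[i] is a speaker index for enrollment utterances and None for
    unlabeled ones. alpha and iterations are assumed values, not from the paper.
    """
    n = len(embeddings)
    # Edge weights: cosine similarity clipped at zero, no self-loops.
    W = [[0.0 if i == j else max(0.0, cosine(embeddings[i], embeddings[j]))
          for j in range(n)] for i in range(n)]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot label matrix Y; rows for unlabeled utterances are all zero.
    Y = [[1.0 if labels[i] == k else 0.0 for k in range(num_speakers)]
         for i in range(n)]
    # Iterate F <- alpha * S * F + (1 - alpha) * Y, starting from F = Y.
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(n)) + (1 - alpha) * Y[i][k]
              for k in range(num_speakers)] for i in range(n)]
    # Each utterance takes the speaker with the highest propagated score.
    return [max(range(num_speakers), key=lambda k: F[i][k]) for i in range(n)]


# Toy household: two enrolled speakers, four unlabeled utterances.
embeddings = [
    (1.0, 0.0, 0.0),    # enrollment utterance, speaker 0
    (0.0, 1.0, 0.0),    # enrollment utterance, speaker 1
    (0.9, 0.1, 0.1),    # unlabeled
    (0.95, 0.05, 0.1),  # unlabeled
    (0.1, 0.9, 0.1),    # unlabeled
    (0.05, 0.95, 0.1),  # unlabeled
]
labels = [0, 1, None, None, None, None]

predicted = propagate_labels(embeddings, labels, num_speakers=2)
print(predicted)
```

In this toy household, each unlabeled utterance ends up with the enrolled speaker whose cluster it is most strongly connected to. A real system would take the embeddings from the pre-trained extractor and would likely need a tuned similarity function and a strategy for utterances from guests who are not enrolled, neither of which is covered here.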

Related Recordings

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition
(3 minutes introduction)

Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke

Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition
(3 minutes introduction)

Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno

InterSpeech 2021
