Variational Information Bottleneck based Regularization for Speaker Recognition
(3 minutes introduction) [![full paper at ISCA](/images/interspeech/full-paper-isca.png)](https://www.isca-speech.org/archive/interspeech_2021/wang21j_interspeech.html)
Dan Wang, Yuanjie Dong, Yaxing Li, Yunfei Zi, Zhihui Zhang, Xiaoqi Li, Shengwu Xiong (WHUT, China)
Speaker recognition (SR) is inevitably affected by noise in real-life scenarios, which degrades recognition accuracy. In this paper, we introduce a novel regularization method, the variational information bottleneck (VIB), into speaker recognition to extract robust speaker embeddings. VIB encourages the neural network to discard as much speaker-identity-irrelevant information as possible. We also propose a more effective feature extractor, VoVNet with an ultra-lightweight subspace attention module (ULSAM). ULSAM infers a separate attention map for each feature-map subspace, enabling efficient learning of cross-channel information together with multi-scale and multi-frequency feature representations. Experimental results demonstrate that the proposed framework outperforms the ResNet-based baseline by 11.4% in terms of equal error rate (EER), and the VIB regularization provides a further boost, decreasing EER by 18.9%.
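To make the VIB idea concrete: the standard variational information bottleneck treats the embedding as a stochastic Gaussian variable and adds a KL penalty toward a standard normal prior, which pushes the network to discard information not needed for the speaker-classification loss. Below is a minimal NumPy sketch of that objective; the shapes, the `beta` weight, and the placeholder `speaker_loss` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def vib_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    # This is the regularizer that penalizes information kept in the embedding.
    return 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def sample_embedding(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, so sampling stays
    # differentiable with respect to mu and log_var in a real training setup.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy batch: 4 utterances, 8-dimensional speaker embeddings (hypothetical sizes).
mu = 0.1 * rng.standard_normal((4, 8))
log_var = np.full((4, 8), -2.0)

z = sample_embedding(mu, log_var)      # stochastic embedding fed to the classifier
beta = 1e-3                            # bottleneck trade-off weight (assumed value)
speaker_loss = 1.5                     # placeholder for the classification loss
total_loss = speaker_loss + beta * vib_kl(mu, log_var)
```

In a real system `mu` and `log_var` would be produced by the feature-extractor network, and `total_loss` would be minimized end to end; the KL term vanishes only when the embedding distribution collapses onto the uninformative prior, so `beta` controls how aggressively irrelevant information is squeezed out.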