InterSpeech 2021

Variational Information Bottleneck based Regularization for Speaker Recognition
(3-minute introduction)

Dan Wang (WHUT, China), Yuanjie Dong (WHUT, China), Yaxing Li (WHUT, China), Yunfei Zi (WHUT, China), Zhihui Zhang (WHUT, China), Xiaoqi Li (WHUT, China), Shengwu Xiong (WHUT, China)
Speaker recognition (SR) is inevitably affected by noise in real-life scenarios, which reduces recognition accuracy. In this paper, we introduce a novel regularization method, the variational information bottleneck (VIB), into speaker recognition to extract robust speaker embeddings. VIB prompts the neural network to ignore as much speaker-identity-irrelevant information as possible. We also propose a more effective network, VovNet with an ultra-lightweight subspace attention module (ULSAM), as a feature extractor. ULSAM infers a different attention map for each feature-map subspace, enabling efficient learning of cross-channel information along with multi-scale and multi-frequency feature representations. The experimental results demonstrate that our proposed framework outperforms the ResNet-based baseline by 11.4% in terms of equal error rate (EER). The VIB regularization method gives a further performance boost, with an 18.9% decrease in EER.
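
The abstract does not spell out the VIB objective, so the following is only a minimal sketch of how a VIB-style regularizer is commonly written, not the authors' implementation. It assumes a diagonal-Gaussian embedding distribution with a standard normal prior, and the function names, the `beta` weight, and the tensor shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def vib_sample(mu, log_var, rng):
    """Draw an embedding z = mu + sigma * eps (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Per-example KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def cross_entropy(logits, labels):
    """Per-example softmax cross-entropy against integer speaker labels."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def vib_loss(logits, labels, mu, log_var, beta=1e-3):
    """Classification loss plus beta-weighted KL penalty (beta is an assumed value)."""
    per_example = cross_entropy(logits, labels) + beta * kl_to_standard_normal(mu, log_var)
    return float(np.mean(per_example))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.standard_normal((4, 8))        # hypothetical batch of 4, embedding size 8
    log_var = np.full((4, 8), -2.0)         # hypothetical predicted log-variances
    z = vib_sample(mu, log_var, rng)        # stochastic embedding fed to the classifier
    logits = z @ rng.standard_normal((8, 5))  # stand-in linear classifier over 5 speakers
    labels = np.array([0, 1, 2, 3])
    print(vib_loss(logits, labels, mu, log_var))
```

In this sketch, the KL term penalizes information kept in the embedding, while the cross-entropy term rewards information that predicts speaker identity; `beta` trades one against the other.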