Variational Information Bottleneck based Regularization for Speaker Recognition
(3 minutes introduction) [![full paper at ISCA](/images/interspeech/full-paper-isca.png)](https://www.isca-speech.org/archive/interspeech_2021/wang21j_interspeech.html)
Dan Wang, Yuanjie Dong, Yaxing Li, Yunfei Zi, Zhihui Zhang, Xiaoqi Li, Shengwu Xiong (WHUT, China)
Speaker recognition (SR) is inevitably affected by noise in real-life scenarios, which degrades recognition accuracy. In this paper, we introduce a novel regularization method, the variational information bottleneck (VIB), into speaker recognition to extract robust speaker embeddings. VIB encourages the neural network to discard as much speaker-identity-irrelevant information as possible. We also propose a more effective feature extractor, VoVNet with an ultra-lightweight subspace attention module (ULSAM). ULSAM infers a separate attention map for each feature-map subspace, enabling efficient learning of cross-channel information together with multi-scale and multi-frequency feature representations. Experimental results demonstrate that the proposed framework outperforms the ResNet-based baseline by 11.4% in terms of equal error rate (EER), and the VIB regularization provides a further boost, decreasing EER by 18.9%.
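To make the VIB idea concrete: the standard variational information bottleneck treats the embedding as a stochastic Gaussian variable and adds a KL penalty toward a standard normal prior, which pushes the network to discard information not needed for the speaker-classification loss. Below is a minimal NumPy sketch of that objective; the shapes, the `beta` weight, and the placeholder `speaker_loss` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def vib_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    # This is the regularizer that penalizes information kept in the embedding.
    return 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def sample_embedding(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, so sampling stays
    # differentiable with respect to mu and log_var in a real training setup.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy batch: 4 utterances, 8-dimensional speaker embeddings (hypothetical sizes).
mu = 0.1 * rng.standard_normal((4, 8))
log_var = np.full((4, 8), -2.0)

z = sample_embedding(mu, log_var)      # stochastic embedding fed to the classifier
beta = 1e-3                            # bottleneck trade-off weight (assumed value)
speaker_loss = 1.5                     # placeholder for the classification loss
total_loss = speaker_loss + beta * vib_kl(mu, log_var)
```

In a real system `mu` and `log_var` would be produced by the feature-extractor network, and `total_loss` would be minimized end to end; the KL term vanishes only when the embedding distribution collapses onto the uninformative prior, so `beta` controls how aggressively irrelevant information is squeezed out.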