Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng
Speaker verification systems usually suffer from mismatch between the training and evaluation data, such as speaker population mismatch and channel and environment variations. Addressing this issue requires a system that generalizes well to unseen data. In this work, we incorporate a Bayesian neural network (BNN) into the deep neural network (DNN) x-vector speaker verification system to improve its generalization ability. With the weight uncertainty modeling provided by the BNN, we expect the system to generalize better to the evaluation data and make more accurate verification decisions. Our experimental results indicate that the DNN x-vector system can benefit from BNN, especially when the mismatch is severe, as in the out-of-domain evaluation. Specifically, BNN yields relative EER decreases of 2.66% and 2.32% for the short- and long-utterance in-domain evaluations, respectively. Additionally, fusing the DNN x-vector and Bayesian x-vector systems yields further improvement. Moreover, the out-of-domain evaluation on the NIST SRE10 core test, which involves a larger mismatch, suggests that BNN can bring a larger relative EER decrease of around 4.69%.
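The abstract does not say how the weight uncertainty is modeled or inferred. One common approach, assumed here and not necessarily the one used in this work, is mean-field variational inference: each weight gets a Gaussian posterior N(mu, sigma^2), weights are sampled with the reparameterization trick w = mu + sigma * eps, and a closed-form KL term to a Gaussian prior regularizes training. The pure-Python sketch below shows this for a single hypothetical `BayesianLinear` layer, along with a hypothetical `relative_eer_decrease` helper that illustrates how a relative EER figure is derived from two absolute EERs. It is an illustration only, not the authors' implementation.

```python
import math
import random


def softplus(x: float) -> float:
    # Numerically stable softplus; maps rho to a strictly positive sigma.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


class BayesianLinear:
    """Hypothetical fully connected layer with a factorized Gaussian weight posterior."""

    def __init__(self, n_in, n_out, prior_sigma=1.0, seed=0):
        self.rng = random.Random(seed)
        # Variational parameters: posterior mean and pre-softplus scale per weight.
        self.mu = [[self.rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.rho = [[-3.0 for _ in range(n_in)] for _ in range(n_out)]
        self.prior_sigma = prior_sigma

    def sample_weights(self):
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1).
        return [
            [m + softplus(r) * self.rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(self.mu, self.rho)
        ]

    def forward(self, x):
        # One stochastic forward pass using freshly sampled weights.
        w = self.sample_weights()
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def predict(self, x, n_samples=20):
        # Monte Carlo average over weight samples approximates the predictive mean.
        outs = [self.forward(x) for _ in range(n_samples)]
        return [sum(col) / n_samples for col in zip(*outs)]

    def kl_to_prior(self):
        # Closed-form KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over all weights.
        p = self.prior_sigma
        total = 0.0
        for mu_row, rho_row in zip(self.mu, self.rho):
            for m, r in zip(mu_row, rho_row):
                s = softplus(r)
                total += math.log(p / s) + (s * s + m * m) / (2.0 * p * p) - 0.5
        return total


def relative_eer_decrease(eer_baseline, eer_new):
    # Relative decrease in percent: 100 * (baseline - new) / baseline.
    return 100.0 * (eer_baseline - eer_new) / eer_baseline


if __name__ == "__main__":
    layer = BayesianLinear(n_in=3, n_out=2, seed=1)
    print(layer.predict([1.0, 0.5, -0.2], n_samples=50))
    print(layer.kl_to_prior())
```

Under this assumed setup, training would minimize the expected task loss plus the KL term (the negative evidence lower bound), and averaging predictions over weight samples is what lets the model express uncertainty about its weights rather than committing to a single point estimate.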