Interspeech 2021

Cross-database replay detection in terminal-dependent speaker verification
(3-minute introduction)

Xingliang Cheng (Tsinghua University, China), Mingxing Xu (Tsinghua University, China), Thomas Fang Zheng (Tsinghua University, China)
The vulnerability of automatic speaker verification (ASV) systems to replay attacks has become a severe problem. Although various methods have been proposed for replay detection, their generalization capability is still limited: a detection model trained on one database may fail completely when tested on another. In this paper, we adopt one-class learning to address this cross-database problem. Unlike conventional two-class models, which discriminate genuine speech from replay attacks, a one-class model focuses on the within-class variance of genuine speech, which makes it naturally robust to unseen attacks. In this study, we choose the Gaussian mixture model (GMM) as the one-class model and design two utterance-level features that reduce the uncertainty of the genuine class while remaining distinguishable from the non-genuine class. Experiments conducted on three public replay datasets show that, compared with state-of-the-art methods, the proposed method demonstrates promising generalization capability in cross-database scenarios.
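The abstract does not specify the two utterance-level features or the GMM configuration, so the following is only a minimal sketch of the general one-class idea under stated assumptions. A diagonal-covariance GMM is fitted by EM on genuine-speech feature vectors only, and an utterance is accepted as genuine when its log-likelihood exceeds a threshold. The features are synthetic 2-D Gaussians standing in for real utterance-level features; the names `fit_one_class_gmm` and `score`, the two-component model, and the 5th-percentile threshold are hypothetical choices, not the authors' method.

```python
import numpy as np


def _component_log_pdf(X, means, variances):
    """Log density of every sample under every diagonal Gaussian, shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]  # (n, K, d)
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))


def _log_sum_exp(a):
    """Row-wise log(sum(exp(a))), computed stably; returns shape (n, 1)."""
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))


def fit_one_class_gmm(X, n_components=2, n_iter=100, reg=1e-6, seed=0):
    """Hypothetical helper: fit a diagonal GMM by EM on genuine-only features."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each genuine sample
        log_p = _component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - _log_sum_exp(log_p))
        # M-step: re-estimate weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ X ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 0.0) + reg
    return weights, means, variances


def score(X, gmm):
    """Hypothetical helper: per-utterance log-likelihood under the genuine GMM."""
    weights, means, variances = gmm
    log_p = _component_log_pdf(X, means, variances) + np.log(weights)
    return _log_sum_exp(log_p).ravel()


# Synthetic data only: genuine features near the origin, and a shifted
# cluster standing in for an attack type never seen during training.
rng = np.random.default_rng(1)
genuine_train = rng.normal(0.0, 1.0, size=(500, 2))
genuine_test = rng.normal(0.0, 1.0, size=(100, 2))
replay_test = rng.normal(4.0, 1.0, size=(100, 2))

gmm = fit_one_class_gmm(genuine_train)  # trained on genuine data only
threshold = np.percentile(score(genuine_train, gmm), 5)  # assumed operating point
accept_genuine = np.mean(score(genuine_test, gmm) >= threshold)
accept_replay = np.mean(score(replay_test, gmm) >= threshold)
print(f"genuine accepted: {accept_genuine:.2f}, replay accepted: {accept_replay:.2f}")
```

Because no attack data enter training, the decision boundary does not depend on which replay devices or recording conditions a particular database happens to contain, which is the property the abstract credits for robustness to unseen attacks. In practice, how well such a detector separates real replays rests on how compact and discriminative the chosen features are.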