Odyssey 2016

The Speaker and Language Recognition Workshop

Spoofing Detection on the ASVspoof2015 Challenge Corpus Employing Deep Neural Networks

Md Jahangir Alam, Patrick Kenny, Vishwa Gupta, Themos Stafylakis
This paper describes the application of deep neural networks (DNNs), trained to discriminate between human and spoofed speech signals, to improve spoofing detection performance. In this work we use amplitude-, phase-, linear prediction residual-, and combined amplitude-phase-based acoustic-level features. First, we train a DNN on the spoofing challenge training data to discriminate between human and spoofed speech signals. Delta filterbank spectra (DFB), delta plus double delta linear prediction cepstral coefficients (DLPCC), and product spectrum-based cepstral coefficients (DPSCC) are used as inputs to the DNN. For each feature type, the trained DNN is then used to generate posteriors and bottleneck features (BNF) for all the spoofing challenge data. The DNN posteriors are used directly to decide whether a test recording is spoofed or human. For spoofing detection with the acoustic-level features and the bottleneck features, we build a standard Gaussian mixture model (GMM) classifier. On spoofing attacks S1-S9 of the ASVspoof2015 challenge evaluation corpus, the DFB-BNF, DLPCC-BNF, DPSCC-BNF, and DPSCC-DNN systems provided equal error rates (EERs) of 0.013%, 0.0%, 0.022%, and 1.00%, respectively. On all ten spoofing attacks (S1-S10), the EERs obtained by these four systems are 3.23%, 3.3%, 3.28%, and 2.18%, respectively.
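All of the results above are reported as equal error rates (EERs), the operating point at which the rate of spoofed trials accepted as human equals the rate of human trials rejected. The following is a minimal sketch of how an EER can be estimated from per-trial detection scores. It assumes that higher scores mean "more likely human" (for a two-class GMM back-end this would typically be the log-likelihood ratio between a human-speech model and a spoofed-speech model). The function name `compute_eer` and the simple threshold sweep over observed scores are illustrative assumptions, not the authors' scoring tool, and official evaluation scripts may interpolate between operating points differently.

```python
from typing import Sequence


def compute_eer(human_scores: Sequence[float], spoof_scores: Sequence[float]) -> float:
    """Approximate the equal error rate as a fraction in [0, 1].

    Assumption: higher scores indicate human speech. A trial is accepted as
    human when its score >= threshold. Candidate thresholds are the observed
    scores; the EER is taken at the threshold where the false rejection and
    false acceptance rates are closest, averaging the two rates there.
    """
    if not human_scores or not spoof_scores:
        raise ValueError("need at least one human and one spoof score")
    n_human, n_spoof = len(human_scores), len(spoof_scores)
    best_gap = float("inf")
    best_eer = 1.0
    for thr in sorted(set(human_scores) | set(spoof_scores)):
        # False rejection rate: human trials scored below the threshold.
        frr = sum(s < thr for s in human_scores) / n_human
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / n_spoof
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores, purely illustrative (not data from the paper).
    print(compute_eer([2.0, 3.0], [0.0, 1.0]))  # perfectly separated -> 0.0
```

Multiplying the returned fraction by 100 gives the percentage form used in the abstract. Sweeping only the observed scores keeps the sketch dependency-free; with many trials, a sorted single-pass sweep or an ROC utility would be more efficient.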