Spoofing Detection on the ASVspoof2015 Challenge Corpus Employing Deep Neural Networks

Md Jahangir Alam, Patrick Kenny, Vishwa Gupta, Themos Stafylakis

This paper describes the application of deep neural networks (DNNs), trained to discriminate between human and spoofed speech signals, to improve the performance of spoofing detection. In this work we use amplitude, phase, linear prediction residual, and combined amplitude-phase-based acoustic level features. First, we train a DNN on the spoofing challenge training data to discriminate between human and spoofed speech signals. Delta filterbank spectra (DFB), delta plus double delta linear prediction cepstral coefficients (DLPCC), and product spectrum-based cepstral coefficients (DPSCC) are used as inputs to the DNN. For each feature, posteriors and bottleneck features (BNF) are then generated for all the spoofing challenge data using the trained DNN. The DNN posteriors are used directly to decide whether a test recording is spoofed or human. For spoofing detection with the acoustic level features and the bottleneck features, we build a standard Gaussian Mixture Model (GMM) classifier. When tested on the ASVspoof2015 challenge evaluation corpus, the DFB-BNF, DLPCC-BNF, DPSCC-BNF, and DPSCC-DNN systems provided equal error rates (EERs) of 0.013%, 0.0%, 0.022%, and 1.00%, respectively, on the S1-S9 spoofing attacks. Over all ten spoofing attacks (S1-S10), these four systems obtained EERs of 3.23%, 3.3%, 3.28%, and 2.18%, respectively.
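The results above are reported as equal error rates, so a small illustration of how an EER can be computed from detection scores may help. The following is a minimal sketch, not the authors' evaluation code. The function names `utterance_score` and `equal_error_rate` are hypothetical, and averaging frame-level log-odds to obtain one score per recording is an assumption; the abstract only says that the DNN posteriors are used directly. Scores are assumed to be oriented so that higher means "more likely human".

```python
import math


def utterance_score(human_posteriors, eps=1e-10):
    """Collapse per-frame posteriors P(human | frame) into one score.

    Assumption for illustration: the score is the mean log-odds over
    frames. Posteriors are clipped to [eps, 1 - eps] to avoid log(0).
    """
    if not human_posteriors:
        raise ValueError("need at least one frame posterior")
    total = 0.0
    for p in human_posteriors:
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p / (1.0 - p))
    return total / len(human_posteriors)


def equal_error_rate(human_scores, spoof_scores):
    """Return an approximate EER as a fraction in [0, 1].

    Sweeps every observed score as a decision threshold and returns the
    mean of the false acceptance and false rejection rates at the
    threshold where the two rates are closest.
    """
    if not human_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(human_scores) | set(spoof_scores))
    # One extra threshold above every score covers "reject everything".
    thresholds.append(thresholds[-1] + 1.0)
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a human recording scored below the threshold.
        frr = sum(s < t for s in human_scores) / len(human_scores)
        # False acceptance: a spoofed recording scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores, invented purely for illustration.
human = [0.9, 0.8, 0.3, 0.7]
spoof = [0.1, 0.2, 0.75, 0.4]
eer = equal_error_rate(human, spoof)
print(f"EER = {100 * eer:.2f}%")
```

With a finite set of trials the false acceptance and false rejection rates may never be exactly equal, which is why the sketch returns their average at the closest crossing; evaluation toolkits often interpolate instead.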


Odyssey 2016

The Speaker and Language Recognition Workshop


