VOICE LIVENESS DETECTION FOR SPEAKER VERIFICATION BASED ON A TANDEM SINGLE/DOUBLE-CHANNEL POP NOISE DETECTOR
Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui
This paper presents an algorithm for detecting spoofing attacks against automatic speaker verification (ASV) systems. While such systems now achieve performance comparable to that of other biometric modalities, the spoofing techniques used against them have also advanced dramatically. Spoofing materials can be generated with several techniques (e.g., speech synthesis and voice conversion), and detecting them solely on the basis of differences at the acoustic speaker-modeling level is a challenging task. Moreover, the differences between live and artificially generated material are expected to decrease gradually in the near future as synthesis technologies advance. A previously proposed voice liveness detection framework, which aims to validate whether a speech signal was produced by a person or artificially created, uses elementary algorithms to detect pop noise; the presence of pop noise is taken as evidence of liveness. A more advanced detection algorithm has now been developed that combines single- and double-channel pop noise detection. Experiments demonstrated that this tandem algorithm detects pop noise more effectively: its detection error rate was up to 80% lower than that achieved with the elementary algorithms.
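To make the idea of combining the two detectors concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the features, the thresholds, or how the two detectors are combined, so everything below is an assumption made for illustration: the double-channel check compares a hypothetical unfiltered microphone (`raw`) with a hypothetical pop-filtered microphone (`filtered`), the single-channel check measures how much of a frame's energy lies at low frequencies, and the two are chained as a cascade in which a frame counts as pop noise only if both checks pass. All function names, parameter names, and threshold values are hypothetical.

```python
import math

def frame_signal(x, frame_len):
    """Split a sequence into non-overlapping frames, dropping any remainder."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]

def energy(frame):
    """Sum of squared samples."""
    return sum(s * s for s in frame)

def low_freq_ratio(frame, sample_rate, cutoff_hz):
    """Single-channel cue (assumed): fraction of spectral energy at or below
    cutoff_hz, computed with a naive DFT that skips the DC bin."""
    n = len(frame)
    low = total = 0.0
    for k in range(1, n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        total += power
        if k * sample_rate / n <= cutoff_hz:
            low += power
    return low / total if total > 0 else 0.0

def channel_diff_ratio(raw, filtered):
    """Double-channel cue (assumed): energy of (raw - filtered) relative to
    the energy of the pop-filtered channel."""
    diff = energy([a - b for a, b in zip(raw, filtered)])
    ref = energy(filtered)
    return diff / ref if ref > 0 else 0.0

def tandem_pop_detect(raw, filtered, sample_rate, frame_len=160,
                      cutoff_hz=100.0, diff_thresh=0.5, low_thresh=0.5):
    """Return indices of frames flagged as pop noise by both checks.
    The cascade order and the thresholds are illustrative assumptions."""
    hits = []
    frames = zip(frame_signal(raw, frame_len), frame_signal(filtered, frame_len))
    for i, (fr, ff) in enumerate(frames):
        # Stage 1: the two channels must differ substantially.
        if channel_diff_ratio(fr, ff) < diff_thresh:
            continue
        # Stage 2: the frame must be dominated by low-frequency energy.
        if low_freq_ratio(fr, sample_rate, cutoff_hz) >= low_thresh:
            hits.append(i)
    return hits

# Synthetic demo: a 500 Hz tone stands in for speech, and a 50 Hz burst in
# the second frame of the unfiltered channel stands in for pop noise.
SR, N = 8000, 160
tone = [math.sin(2 * math.pi * 500 * t / SR) for t in range(3 * N)]
pop = [3 * math.sin(2 * math.pi * 50 * t / SR) if N <= t < 2 * N else 0.0
       for t in range(3 * N)]
hiss = [3 * math.sin(2 * math.pi * 1000 * t / SR) if N <= t < 2 * N else 0.0
        for t in range(3 * N)]

live_hits = tandem_pop_detect([a + b for a, b in zip(tone, pop)], tone, SR)
replay_hits = tandem_pop_detect(tone, tone, SR)
hiss_hits = tandem_pop_detect([a + b for a, b in zip(tone, hiss)], tone, SR)
```

In this toy setup, `live_hits` contains the frame with the low-frequency burst, `replay_hits` is empty because the two channels are identical, and `hiss_hits` is empty because the channel difference is not concentrated at low frequencies. The last case shows why chaining the two cues can reject events that a double-channel check alone would flag.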