Odyssey 2020

The Speaker and Language Recognition Workshop

Phase Spectrum of Time-flipped Speech Signals for Robust Spoofing Detection

Sung-Hyun Yoon, Min-Sung Koh, Ha-Jin Yu
In spoofing detection, it is important to capture the attributes of a speech signal that are related to spoofing attacks. A speech signal carries various kinds of information, such as the speaker, phrase, and environment. When the time sequence of a speech signal is flipped (i.e., time reversal followed by an additional circular shift), the phase spectrum changes although the magnitude spectrum does not. Time flipping therefore acts as a form of data augmentation, exposing additional attributes in the phase spectrum that are not present in the magnitude spectrum. We assume that these additional attributes in the phase spectrum of time-flipped speech signals are related to unseen intraclass conditions. Motivated by this assumption, we propose a method that uses phase-spectrum-based features from both the original and time-flipped speech signals together. If our assumption holds, the method reduces intraclass variance because attributes previously unseen in the magnitude spectrum can be taken into account through the phase spectrum. These additional attributes help build more robust spoofing detection systems. Experimental results on the ASVspoof 2019 logical and physical access scenarios show significant performance improvements over the baseline in both scenarios.
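
The claim that time flipping changes the phase spectrum but not the magnitude spectrum can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes the circular shift is by one sample, so that the flipped signal is y[n] = x[(-n) mod N]; the helper name `time_flip` and the random test frame are hypothetical and chosen only for illustration. Under that assumption, the DFT of a real signal becomes its complex conjugate, which keeps the magnitude and flips the sign of the phase.

```python
import numpy as np

def time_flip(x):
    """Hypothetical helper: reverse x in time, then circularly shift by one
    sample, giving y[n] = x[(-n) mod N] (the shift amount is an assumption)."""
    return np.roll(x[::-1], 1)

# Random real-valued frame used as a stand-in for speech samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
y = time_flip(x)

X = np.fft.fft(x)
Y = np.fft.fft(y)

# The magnitude spectrum is identical for the original and flipped frames.
magnitude_unchanged = np.allclose(np.abs(X), np.abs(Y))

# For real x, the flipped spectrum is the complex conjugate of the original,
# so the phase spectrum differs (its sign is flipped).
spectrum_conjugated = np.allclose(Y, np.conj(X))

print(magnitude_unchanged, spectrum_conjugated)
```

Because the two magnitude spectra coincide, any additional information contributed by the time-flipped signal in this sketch can only come from its phase spectrum.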