Itshak Lapidot, Jean-Francois Bonastre
In the context of detecting identity impersonation in speaker recognition, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of spoofed (identity theft) excerpts. In previous work we presented this analysis for logical access (LA), i.e., for synthesized or converted speech. In this work we extend the analysis to physical access (PA), i.e., replayed speech. We show that, for replayed data, the changes in PMF significantly influence spoofing detection performance. We then seek to reduce the distribution gap between bona fide speech waveforms and replayed speech waveforms. We propose a genuinization of the spoofed speech (by analogy with Gaussianisation), which shifts the PMF of the spoofed speech close to the PMF of genuine speech. Our genuinization is evaluated on the ASVspoof 2019 challenge datasets, using the baseline systems provided by the challenge organizers. In terms of equal error rate (EER), it seems that both the linear frequency cepstral coefficient (LFCC) and the constant Q cepstral coefficient (CQCC) feature-based systems give better results when applied to non-genuinized replayed data (even though the CQCC system scores lower in terms of min-tDCF). On the other hand, when the systems are trained on genuinized data, the results on genuinized replayed data are very good compared to the results obtained without applying genuinization to the data. As in the LA case, the performance is not consistent, which raises troubling questions about the generalization capabilities of anti-spoofing systems.
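To make the idea of shifting one waveform PMF toward another concrete, the following is a minimal sketch that assumes genuinization can be approximated by classical histogram (CDF) matching on 16-bit integer samples. The abstract does not specify the exact procedure, so this mapping, the function names `waveform_pmf` and `genuinize`, and the synthetic signals are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def waveform_pmf(samples, n_bits=16):
    """Empirical PMF over all quantization levels of an integer waveform."""
    levels = 2 ** n_bits
    offset = levels // 2  # shift signed sample values to indices 0..levels-1
    idx = samples.astype(np.int64) + offset
    return np.bincount(idx, minlength=levels) / idx.size


def genuinize(spoof, genuine_pmf, n_bits=16):
    """Map spoofed samples so their PMF moves toward genuine_pmf.

    Assumption: plain CDF matching. Each spoofed amplitude level is sent
    to the first genuine level whose cumulative probability reaches the
    cumulative probability of the spoofed level.
    """
    levels = 2 ** n_bits
    offset = levels // 2
    cdf_spoof = np.cumsum(waveform_pmf(spoof, n_bits))
    cdf_genuine = np.cumsum(genuine_pmf)
    mapping = np.searchsorted(cdf_genuine, cdf_spoof, side="left")
    mapping = np.clip(mapping, 0, levels - 1)  # guard against rounding at 1.0
    idx = spoof.astype(np.int64) + offset
    return (mapping[idx] - offset).astype(np.int16)


if __name__ == "__main__":
    # Synthetic stand-ins for real recordings (illustration only).
    rng = np.random.default_rng(0)
    genuine = np.clip(np.round(rng.laplace(0, 500, 50000)),
                      -32768, 32767).astype(np.int16)
    replayed = np.clip(np.round(rng.normal(0, 2000, 50000)),
                       -32768, 32767).astype(np.int16)
    genuinized = genuinize(replayed, waveform_pmf(genuine))
    print(genuinized.dtype, genuinized.shape)
```

In such a setup the genuine PMF would presumably be estimated once from a pool of bona fide recordings and then applied to each spoofed file; whether it is estimated per file, per speaker, or globally is a design choice the abstract leaves open.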