Analysis and Optimization of Bottleneck Features for Speaker Recognition
Alicia Lozano-Diez, Anna Silnova, Pavel Matejka, Ondrej Glembek, Oldrich Plchot, Jan Pesan, Lukas Burget, Joaquin Gonzalez-Rodriguez
Degraded signal quality and incomplete voice probes severely affect the performance of a speaker recognition system. Unified audio characteristics (UACs) have been proposed to quantify multi-condition signal degradation effects as posterior probabilities of quality classes. We recently showed that UAC-based quality vectors (q-vectors) are effective at the score-normalization stage. This motivates q-vector based calibration using functions of quality estimates (FQEs). In this work, we examine the robustness of calibration approaches to low-SNR and short-duration conditions, using both measured and estimated quality indicators. Comparisons are drawn to quality measure functions (QMFs) that employ oracle SNRs and sample durations. In the robustness study, low-SNR and short-duration conditions are excluded from calibration training. The analysis provides insights into how well calibration schemes approximate idealized calibration under combined conditions of high signal degradation and short segment duration. We seek calibration methods that parsimoniously preserve robustness against unseen data. Separate analyses are provided for duration-only and noise-only scenarios as well as for combined duration and noise scenarios. QMFs and FQEs significantly outperform the conventional condition-mismatched calibration scheme. We conclude with a hybrid concept for unknown-quality calibration.
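To make the idea of quality-aware calibration concrete, the following is a minimal sketch, not the authors' method. It assumes a QMF of a simple additive form (raw score plus linear terms in SNR and log-duration plus a bias), trained by logistic regression on synthetic trials. The feature form, the function names (`qmf_features`, `train_calibration`, `calibrate`, `make_synthetic_trials`), and all numeric constants are hypothetical and chosen only for illustration.

```python
import math
import random


def qmf_features(score, snr_db, duration_s):
    """Feature vector for an ASSUMED additive QMF: score, SNR, log-duration, bias."""
    return [score, snr_db / 10.0, math.log(duration_s), 1.0]


def linear(w, x):
    """Dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(w, trials, labels):
    """Mean logistic loss, written in a numerically stable form."""
    total = 0.0
    for x, y in zip(trials, labels):
        z = linear(w, x)
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(trials)


def train_calibration(trials, labels, epochs=300, lr=0.05):
    """Fit calibration weights by batch gradient descent on the logistic loss."""
    dim = len(trials[0])
    w = [0.0] * dim
    n = len(trials)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(trials, labels):
            err = sigmoid(linear(w, x)) - y
            for j in range(dim):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def calibrate(w, score, snr_db, duration_s):
    """Calibrated log-odds for one trial (balanced training prior assumed)."""
    return linear(w, qmf_features(score, snr_db, duration_s))


def make_synthetic_trials(n, seed=0):
    """Synthetic trials whose raw scores shift with SNR and duration (made-up model)."""
    rng = random.Random(seed)
    trials, labels = [], []
    for i in range(n):
        y = i % 2  # alternate target (1) and non-target (0) trials
        snr_db = rng.uniform(0.0, 30.0)
        duration_s = rng.uniform(2.0, 60.0)
        # Hypothetical condition-dependent score shift.
        shift = 0.5 * (snr_db / 10.0) + 0.3 * math.log(duration_s)
        score = (1.5 if y else -1.5) + shift + rng.gauss(0.0, 1.0)
        trials.append(qmf_features(score, snr_db, duration_s))
        labels.append(y)
    return trials, labels
```

In this sketch, calling `train_calibration` on the output of `make_synthetic_trials` yields weights that `calibrate` applies to a new trial's raw score, SNR, and duration. An FQE-style variant would, under the same assumptions, replace the oracle SNR and duration inputs with estimated quality indicators such as q-vector posteriors.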