Odyssey 2012

The Speaker and Language Recognition Workshop

Regularization of All-Pole Models for Speaker Verification Under Additive Noise

Presented by:
Cemal Hanilci
Cemal Hanilci, Tomi Kinnunen, Rahim Saeidi, Jouni Pohjalainen, Paavo Alku and Figen Ertas

This paper considers the regularization of linear prediction (LP) based mel-frequency cepstral coefficient (MFCC) extraction for speaker verification. MFCCs are commonly extracted from the discrete Fourier transform (DFT) spectra of speech frames. Our recent study showed that replacing the DFT spectrum estimation step with conventional and temporally weighted LP methods and their regularized versions increases recognition performance considerably. Here, we provide a thorough analysis of the regularization of the conventional and temporally weighted LP methods. Experiments on the NIST 2002 corpus indicate that regularized all-pole methods yield large improvements in recognition accuracy under additive factory and babble noise (e.g., a 10% relative improvement over the standard DFT method for factory noise at 0 dB SNR), in terms of both equal error rate (EER) and minimum detection cost function (MinDCF).
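
To make the idea of swapping the DFT spectrum for a regularized all-pole spectrum concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not state the regularizer, so a simple ridge (diagonal loading) penalty stands in for it, and temporal weighting is left out. The function names, the Hamming window, the model order of 20, and the `lam` parameter are all assumptions made for illustration.

```python
import numpy as np


def regularized_lp_coefficients(frame, order=20, lam=0.0):
    """Predictor coefficients a_k such that s[n] ~ sum_k a_k * s[n - k].

    Assumption: the penalty is lam * r[0] * I (ridge / diagonal loading),
    a stand-in for whatever regularizer the paper actually uses.
    """
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    # Biased autocorrelation lags r[0..order]
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|]
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]
    # Regularized normal equations: (R + lam * r[0] * I) a = r[1..order]
    a = np.linalg.solve(R + lam * r[0] * np.eye(order), r[1:])
    return a, r, R


def regularized_lp_spectrum(frame, order=20, lam=0.0, nfft=512):
    """All-pole power spectrum G / |A(e^{jw})|^2 on nfft // 2 + 1 frequency bins."""
    a, r, R = regularized_lp_coefficients(frame, order, lam)
    # Prediction error energy for the (possibly regularized) coefficients
    gain = r[0] - 2.0 * np.dot(a, r[1:]) + a @ R @ a
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}
    A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)
    return gain / np.abs(A) ** 2
```

In an MFCC front end, the returned spectrum would take the place of the squared-magnitude DFT of each frame, before the mel filterbank, logarithm, and discrete cosine transform. Setting `lam=0.0` gives conventional autocorrelation-method LP; larger values shrink the predictor coefficients and flatten the spectral envelope.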