Utterance Partitioning with Acoustic Vector Resampling for I-Vector based Speaker Verification

Presented by:

Wei RAO

Author(s):

Wei RAO and Man-Wai MAK

The i-vector approach has become a state-of-the-art technique for text-independent speaker verification. The major advantage of i-vectors is that they can represent speaker-dependent information in a low-dimensional Euclidean space, which opens up opportunities for using statistical techniques to suppress session and channel variability. This paper investigates the effect of varying the conversation length and the number of training sessions per speaker on the discriminative ability of i-vectors. The paper demonstrates that the amount of speaker-dependent information an i-vector can capture saturates once the utterance length exceeds a certain threshold. This finding motivates us to maximize the feature representation capability of i-vectors by partitioning a long conversation into a number of sub-utterances in order to produce more i-vectors per conversation. Results on NIST 2010 SRE suggest that (1) using more i-vectors per conversation enhances the capability of LDA and WCCN in suppressing session variability, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps the i-vector based SVMs to find better decision boundaries, so that SVM scoring outperforms cosine distance scoring by 22% in terms of minimum normalized DCF.
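The partitioning idea described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: "acoustic vector resampling" is read here as randomly reordering the frame-level acoustic vectors of a conversation before splitting them into equal-sized sub-utterances, and repeating this several times. The function names `partition_with_resampling` and `cosine_score`, and the parameters `num_partitions` and `num_resamplings`, are hypothetical. I-vector extraction itself is not shown; each returned sub-utterance would be passed to an existing extractor.

```python
import math
import random


def partition_with_resampling(frames, num_partitions, num_resamplings, seed=0):
    """Split one conversation into several sub-utterances (illustrative only).

    frames: list of acoustic vectors (one per frame) for a conversation.
    Each resampling round shuffles the frame indices (an assumed reading of
    "acoustic vector resampling") and cuts them into `num_partitions`
    near-equal parts, giving num_partitions * num_resamplings sub-utterances.
    """
    rng = random.Random(seed)
    sub_utterances = []
    for _ in range(num_resamplings):
        order = list(range(len(frames)))
        rng.shuffle(order)
        # Distribute any remainder frames over the first few partitions.
        base, extra = divmod(len(order), num_partitions)
        start = 0
        for p in range(num_partitions):
            size = base + (1 if p < extra else 0)
            chunk = order[start:start + size]
            sub_utterances.append([frames[i] for i in chunk])
            start += size
    return sub_utterances


def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_target * norm_test)
```

Under these assumptions, a conversation split with `num_partitions=4` and `num_resamplings=2` would yield eight sub-utterances, and hence eight i-vectors, that could be used for LDA/WCCN estimation or SVM training alongside the full-length i-vector. The `cosine_score` helper only shows the baseline scoring that the abstract compares against.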

Odyssey 2012

The Speaker and Language Recognition Workshop

