Study on the Effects of Intrinsic Variation using i-Vectors in Text-Independent Speaker Verification

Presented by:

Sheng Chen

Author(s):

Sheng Chen, Mingxing Xu, and Emlyn Pratt

Speaker verification performance degrades when training and testing data are mismatched in their intrinsic variations. This paper explores how well recent techniques that model total variability address the effects of intrinsic variation in speaker verification. The effects are investigated along six aspects: speaking style, speaking rate, speaking volume, emotional state, physical status, and speaking language. Speaker and session variability are modeled with the i-vector framework in the total variability space, and cosine similarity is used as the final decision score in the i-vector based speaker verification system. Intrinsic variations are compensated in the i-vector framework with three techniques: Linear Discriminant Analysis (LDA), Within-Class Covariance Normalization (WCCN), and Nuisance Attribute Projection (NAP). Experiments on the intrinsic variation corpus show that speaking volume has a dramatic effect on speaker verification results, and that whispered speech causes the largest degradation in performance. Among the i-vector based systems, the best results are obtained by combining LDA and WCCN compensation. Compared with the GMM-UBM based system, the i-vector+LDA+WCCN system achieves a relative improvement of about 36.78% in Equal Error Rate (EER).
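The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes i-vectors have already been extracted as NumPy arrays, and it shows only cosine scoring and a standard WCCN projection; LDA and NAP are omitted. The function names, the small ridge term, and the array layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors, used as the decision score."""
    denom = np.linalg.norm(w_target) * np.linalg.norm(w_test)
    return float(w_target @ w_test / denom)

def wccn_projection(ivectors, speaker_ids, ridge=1e-6):
    """Estimate a WCCN matrix B such that B @ B.T = inv(W),
    where W is the average within-speaker covariance.
    The ridge term is an assumption added for numerical stability."""
    ivectors = np.asarray(ivectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for s in speakers:
        x = ivectors[speaker_ids == s]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)
    W /= len(speakers)
    W += ridge * np.eye(dim)
    # Cholesky factor of inv(W): lower-triangular B with B @ B.T = inv(W)
    return np.linalg.cholesky(np.linalg.inv(W))

def compensated_score(w_target, w_test, B):
    """Cosine score after mapping both i-vectors through B.T."""
    return cosine_score(B.T @ w_target, B.T @ w_test)
```

In a verification trial, the score returned by `compensated_score` would be compared against a threshold tuned on development data; a score above the threshold accepts the claimed identity.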

Odyssey 2012

The Speaker and Language Recognition Workshop

Related Recordings

Variance-Spectra based Normalization for I-vector Standard and Probabilistic Linear Discriminant Analysis

Utterance Partitioning with Acoustic Vector Resampling for I-Vector based Speaker Verification