Incorporating Duration Information into I-Vector-Based Speaker Recognition Systems

Bostjan Vesnicer, Jerneja Zganec-Gros, Simon Dobrisek and Vitomir Struc

Most of the existing literature on i-vector-based speaker recognition focuses on recognition problems in which i-vectors are extracted from speech recordings of sufficient length. The majority of modeling and recognition techniques therefore simply ignore the fact that i-vectors are most likely estimated unreliably when short recordings are used for their computation. Only recently have a number of solutions been proposed in the literature to address the problem of duration variability, all treating the i-vector as a random variable whose posterior distribution can be parameterized by the posterior mean and the posterior covariance. In this setting, the covariance matrix serves as a measure of uncertainty that is related to the length of the available recording. In contrast to these solutions, we address the problem of duration variability through weighted statistics. We demonstrate in the paper how established feature transformation techniques regularly used in the area of speaker recognition, such as PCA or WCCN, can be modified to take duration into account. We evaluate our weighting scheme within the i-vector challenge organized as part of Odyssey 2014, the Speaker and Language Recognition Workshop, and achieve a minimal DCF of 0.280, which at the time of writing puts our approach in third place among all the participating institutions.
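To make the idea of weighted statistics concrete, the following is a minimal sketch of a duration-weighted mean and covariance, the two quantities that a PCA or WCCN transform would normally be built from. The abstract does not state the exact weighting scheme, so weighting each i-vector in direct proportion to its recording duration is an assumption made here for illustration, and the function name `weighted_mean_and_covariance` is hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_mean_and_covariance(
    vectors: Sequence[Sequence[float]],
    durations: Sequence[float],
) -> Tuple[List[float], List[List[float]]]:
    """Return the duration-weighted mean and covariance of a set of vectors.

    Assumption (not taken from the paper): each vector is weighted in
    direct proportion to the duration of the recording it came from.
    """
    if not vectors or len(vectors) != len(durations):
        raise ValueError("need one non-negative duration per vector")
    total = float(sum(durations))
    if total <= 0.0:
        raise ValueError("durations must sum to a positive value")

    dim = len(vectors[0])

    # Weighted mean: longer recordings pull the mean more strongly.
    mean = [
        sum(d * v[k] for v, d in zip(vectors, durations)) / total
        for k in range(dim)
    ]

    # Weighted covariance: accumulate duration-scaled outer products
    # of the centered vectors, normalized by the total duration.
    cov = [[0.0] * dim for _ in range(dim)]
    for v, d in zip(vectors, durations):
        centered = [v[k] - mean[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d * centered[i] * centered[j] / total

    return mean, cov


# Toy example: the second vector comes from a recording three times longer.
mean, cov = weighted_mean_and_covariance([[0.0, 0.0], [2.0, 2.0]], [1.0, 3.0])
```

With equal durations the function reduces to the ordinary (population) mean and covariance, so an unweighted PCA or WCCN estimate is recovered as a special case.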

Odyssey 2014

The Speaker and Language Recognition Workshop

Related Recordings

STC Speaker Recognition System for the NIST i-Vector Challenge

Linearly Constrained Minimum Variance for Robust I-vector Based Speaker Recognition