Odyssey 2016

The Speaker and Language Recognition Workshop

Compensation for phonetic nuisance variability in speaker recognition using DNNs

Themos Stafylakis, Patrick Kenny, Vishwa Gupta, Jahangir Alam, Marcel Kockmann
In this paper, a new way of using phonetic DNN in text-independent speaker recognition is examined. Inspired by the Subspace GMM approach to speech recognition, we try to extract i-vectors that are invariant to the phonetic content for the utterance. We overcome the assumption of gaussian distributed senones by combining DNN with UBM posteriors and we form a complete EM algorithm for training and extracting phonetic content compensated i-vectors. A simplified version of the model is also presented, where the phonetic content and speaker subspaces are learned in a decoupled way. Covariance adaptation is also examined, where the covariance matrices are reestimated rather than copied from the UBM. A set of primary experimental results is reported on NIST-SRE 2010, with modest improvement when fused with the standard i-vectors.