Joint Factor Analysis for Text-Dependent Speaker Verification

Patrick Kenny, Themos Stafylakis, Alam Jahangir, Pierre Ouellet and Marcel Kockmann

We tackle the problem of text-dependent speaker verification using a version of Joint Factor Analysis (JFA) in which speaker-phrase variability is modeled with a factorial prior and channel variability with a subspace prior. We implemented this using Zhao and Dong’s variational Bayes algorithm, an extension of Vogt’s Gauss-Seidel method that supports UBM adaptation to the speaker and channel effects in enrollment and test utterances. We report results on the RSR2015 dataset obtained with two types of likelihood ratio and several strategies for UBM adaptation. We found that using a large UBM and decomposing JFA into a feature extractor and a simple back-end classifier (in a way broadly analogous to the i-vector/PLDA cascade) gives better results than using likelihood ratios of either type to make verification decisions. This method involves no UBM adaptation other than to the lexical content of utterances, and it is based on Vogt’s algorithm rather than Zhao and Dong’s. It achieves an equal error rate of 0.5% on the RSR2015 evaluation set.
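The alternating estimation behind Vogt’s Gauss-Seidel method can be illustrated with a minimal sketch. It assumes the standard JFA supervector decomposition M = m + Ux + Dz, where x ~ N(0, I) are channel factors (subspace prior, loading matrix U) and z ~ N(0, I) are speaker-phrase factors with a diagonal D (factorial prior), and it assumes a fixed UBM with zeroth-order statistics N_c, centered first-order statistics F_c and diagonal covariances Sigma_c. The function name `jfa_gauss_seidel` and its argument layout are hypothetical, not the authors’ code. It computes point estimates only; Zhao and Dong’s variational Bayes algorithm would also track posterior covariances and adapt the UBM, which this sketch does not do.

```python
import numpy as np


def jfa_gauss_seidel(N, F, Sigma, U, d, n_iter=50):
    """Hypothetical sketch: alternate MAP updates of channel factors x and
    speaker-phrase factors z for one utterance, with the UBM held fixed.

    N:     (C,)      zeroth-order Baum-Welch statistics per mixture component
    F:     (C, Dim)  first-order statistics, centered on the UBM means
    Sigma: (C, Dim)  diagonal UBM covariances
    U:     (C*Dim, R) channel loading matrix (subspace prior)
    d:     (C*Dim,)  diagonal of D (factorial prior)
    """
    C, Dim = F.shape
    n = np.repeat(N, Dim)           # occupancy count for every supervector dimension
    f = F.reshape(-1)               # stacked centered first-order statistics
    prec = 1.0 / Sigma.reshape(-1)  # inverse diagonal covariances
    R = U.shape[1]

    x = np.zeros(R)
    z = np.zeros(C * Dim)
    # Posterior precisions of x (given z) and of each z component (given x);
    # neither depends on the current estimate of the other factor.
    Lx = np.eye(R) + U.T @ (U * (n * prec)[:, None])
    Lz = 1.0 + n * prec * d**2

    for _ in range(n_iter):
        # Update x with z fixed: remove the speaker-phrase offset from the stats.
        x = np.linalg.solve(Lx, U.T @ (prec * (f - n * d * z)))
        # Update z with x fixed: remove the channel offset (elementwise, D diagonal).
        z = d * prec * (f - n * (U @ x)) / Lz
    return x, z


# Usage on random statistics (illustrative sizes only).
rng = np.random.default_rng(0)
C, Dim, R = 4, 3, 2
N = rng.uniform(1.0, 5.0, C)
F = rng.normal(size=(C, Dim))
Sigma = rng.uniform(0.5, 2.0, size=(C, Dim))
U = rng.normal(size=(C * Dim, R))
d = rng.uniform(0.1, 1.0, C * Dim)
x, z = jfa_gauss_seidel(N, F, Sigma, U, d, n_iter=500)
```

Each update is the exact conditional MAP estimate of one factor given the other, so the iteration is block Gauss-Seidel on the joint (symmetric positive-definite) normal equations and converges to the same point as solving for x and z jointly, while avoiding inversion of the full (R + C·Dim)-dimensional precision matrix.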

Odyssey 2014: The Speaker and Language Recognition Workshop