MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition

Daniel Garcia-Romero, Greg Sell, Alan McCree

We present a magnitude estimation network that is combined with a modified ResNet x-vector system to generate embeddings whose inner product produces calibrated scores with increased discrimination. A three-step training procedure is used. First, the network is trained on short segments with a multi-class cross-entropy loss and angular margin softmax. In the second step, only a reduced subset of the DNN parameters is refined, using full-length recordings. Finally, the magnitude estimation network is trained with a binary cross-entropy loss over pairs of target and non-target trials. The resulting system is evaluated on four widely used benchmarks and provides significant discrimination and calibration gains at multiple operating points.
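The scoring idea and the third training step can be illustrated with a small NumPy sketch. The abstract does not give the exact scoring formula, so this is one plausible reading rather than the authors' implementation: each embedding is treated as a unit-length direction scaled by an estimated magnitude, and a scalar offset is added to the inner product so the score can be read as a log-likelihood ratio. The function names, the toy vectors, the magnitudes, and the offset value are all hypothetical.

```python
import numpy as np

def unit(x):
    """Length-normalize embeddings along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def pair_scores(emb_a, emb_b, mag_a, mag_b, offset):
    """Inner product of magnitude-scaled unit embeddings, plus an offset.

    Assumption: the score has the form mag_a * mag_b * cos(a, b) + offset.
    """
    cos = np.sum(unit(emb_a) * unit(emb_b), axis=-1)
    return mag_a * mag_b * cos + offset

def binary_cross_entropy(scores, labels):
    """Mean binary cross-entropy, treating each score as a logit.

    labels: 1.0 for target trials, 0.0 for non-target trials.
    """
    signs = 2.0 * labels - 1.0
    # log(1 + exp(-s)) for targets, log(1 + exp(s)) for non-targets
    return float(np.mean(np.logaddexp(0.0, -signs * scores)))

# Toy trials (hypothetical values): one target pair, one non-target pair.
emb_a = np.array([[3.0, 4.0], [1.0, 0.0]])
emb_b = np.array([[3.0, 4.0], [0.0, 1.0]])
mag_a = np.array([2.0, 2.0])   # would come from the magnitude network
mag_b = np.array([3.0, 1.0])
labels = np.array([1.0, 0.0])

scores = pair_scores(emb_a, emb_b, mag_a, mag_b, offset=-1.0)
loss = binary_cross_entropy(scores, labels)
print(scores, loss)
```

In this reading, the magnitudes control how confident a score is while the direction of the embedding carries the speaker identity, so a pair of low-magnitude embeddings yields a score near the offset regardless of their cosine similarity.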

Odyssey 2020

The Speaker and Language Recognition Workshop
