I-Vector Representation Based on GMM and DNN for Audio Classification

Najim Dehak

The i-vector approach has become the state of the art in several audio classification tasks, such as speaker and language recognition. It models the variability between audio recordings as shifts in the mean components of a Gaussian Mixture Model (GMM), captured in a low-dimensional subspace. More recently, several subspace approaches have been extended to model the variability between the GMM weights rather than the GMM means. These techniques, such as Non-negative Factor Analysis (NFA) and the Subspace Multinomial Model (SMM), must account for the fact that GMM weights are always positive and sum to one. In this talk, we will show how NFA, SMM, and other similar subspace approaches can also be used to model the hidden-layer neuron activations of a deep neural network (DNN) for sequential data recognition tasks such as language and dialect recognition.
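The constraint on the GMM weights mentioned above can be illustrated with a small sketch. This is not the implementation from the talk: it assumes the commonly published forms of the two models (SMM passes an affine function of the latent vector through a softmax; NFA adds a linear offset to the universal background model weights), and all names, sizes, and the rescaling step in `nfa_weights` are hypothetical choices made only to show how each model keeps the weights positive and summing to one.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 8  # number of GMM components (hypothetical, tiny for illustration)
R = 3  # subspace dimension (hypothetical)

# Background-model weights: positive and summing to one.
b = rng.dirichlet(np.ones(C))


def smm_weights(m, T, r):
    """SMM-style weights: softmax of an affine function of the latent vector r.

    The softmax guarantees positivity and unit sum by construction.
    """
    logits = m + T @ r
    logits = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(logits)
    return e / e.sum()


def nfa_weights(b, L, r):
    """NFA-style weights: linear offset b + L r.

    Assumes the columns of L sum to zero, so the total weight stays one.
    Non-negativity is NOT automatic in a linear model; here the offset is
    simply shrunk until every weight is non-negative. That shrinking is an
    illustrative device, not the constrained optimization used in practice.
    """
    delta = L @ r
    neg = delta < 0
    scale = 1.0
    if neg.any():
        scale = min(1.0, float(np.min(-b[neg] / delta[neg])))
    return b + scale * delta


# Random subspaces standing in for trained ones (assumption).
T = rng.normal(size=(C, R))
L = rng.normal(size=(C, R))
L = L - L.mean(axis=0, keepdims=True)  # make each column sum to zero

r = rng.normal(size=R)  # latent vector for one recording

w_smm = smm_weights(np.log(b), T, r)
w_nfa = nfa_weights(b, L, r)

print("SMM weights sum:", w_smm.sum())
print("NFA weights sum:", w_nfa.sum())
```

With `r` set to zero, both functions return the background weights `b` unchanged, which matches the idea that the latent vector describes how one recording deviates from the background model.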

Odyssey 2016

The Speaker and Language Recognition Workshop
