Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15

Alan McCree, Greg Sell, Daniel Garcia-Romero

This paper presents the JHU HLTCOE submission to the NIST 2015 Language Recognition Evaluation, including critical and novel algorithmic components, use of limited and augmented training data, and additional post-evaluation analysis and improvements. All of our systems used i-vectors based on Deep Neural Networks (DNNs) with discriminatively-trained Gaussian classifiers, and linear fusion was performed with duration-dependent scaling. A key innovation was the use of three different kinds of i-vectors: acoustic, phonotactic, and joint. In addition, data augmentation was used to overcome the limited training data of this evaluation. Post-evaluation analysis shows the benefits of these design decisions, as well as further potential improvements.

Odyssey 2016

The Speaker and Language Recognition Workshop
