Odyssey 2016

The Speaker and Language Recognition Workshop

Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification

Yan Song, Ruilian Cui, Ian McLoughlin, Lirong Dai
Recently, the i-vector representation based on a deep bottleneck network (DBN) pre-trained for automatic speech recognition has received significant interest for both speaker verification (SV) and language identification (LID). In previous work, we presented a unified DBN-based i-vector framework, referred to as DBN-pGMM i-vector [1]. In this paper, we replace the pGMM with a phonetic mixture of factor analyzers (pMFA) and propose a new DBN-pMFA i-vector. The DBN-pMFA i-vector includes the following improvements over the previous framework: 1) a pMFA model is derived from the DBN, which can jointly perform feature dimension reduction and de-correlation in a single linear transformation; 2) a shifted deep bottleneck feature (DBF), termed SDBF, is proposed to exploit temporal contextual information; and 3) a senone selection scheme is proposed to make the i-vector extraction more efficient. We evaluate the proposed DBN-pMFA i-vector on the six most confusable languages selected from NIST LRE 2009. The experimental results demonstrate that DBN-pMFA consistently outperforms the previous DBN-based framework [1], and that the computational complexity can be significantly reduced by applying a simple senone selection scheme.
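For reference, the mixture-of-factor-analyzers model underlying pMFA can be written in its standard form; the specific parameterization used in the paper is not given in this abstract, so the symbols below are generic and purely illustrative:

\[
x_t = \mu_k + W_k h_t + \epsilon_t, \qquad h_t \sim \mathcal{N}(0, I), \qquad \epsilon_t \sim \mathcal{N}(0, \Psi_k),
\]

where \(x_t\) is a frame-level bottleneck feature assigned to mixture component (e.g., senone) \(k\), \(W_k\) is a low-rank factor loading matrix, and \(\Psi_k\) is a diagonal noise covariance. The implied component covariance is \(W_k W_k^\top + \Psi_k\), and the posterior mean of the latent factor, \(\mathrm{E}[h_t \mid x_t] = W_k^\top (W_k W_k^\top + \Psi_k)^{-1}(x_t - \mu_k)\), is a single linear projection of \(x_t\), which is consistent with the abstract's claim that dimension reduction and de-correlation can be performed jointly by one linear transformation.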