Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Kong Aik Lee, Bin Ma, Haizhou Li
This paper presents a detailed description and analysis of the I2R submission to the 2015 NIST language recognition i-vector machine learning challenge, which was among the top-performing systems. Our submission is a fusion of several sub-systems based on linear discriminant analysis (LDA), support vector machines (SVM), multi-layer perceptrons (MLP), deep neural networks (DNN), and multi-class logistic regression. Central to this work is a novel out-of-set (OOS) detection scheme for selecting i-vectors from an unlabeled development set, which consists of a best-fit out-of-set selection followed by cluster purification. We also propose a novel empirical kernel map for use with the SVM. Experimental results show that the proposed approach achieves significant improvement on both the progress and evaluation sets defined for the i-vector challenge. Our final submission achieves relative improvements of 55.0% and 54.5% over the baseline system on the progress and evaluation sets, respectively.
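
The abstract does not spell out the proposed kernel map, so the following is only an orientation sketch of the standard empirical kernel map, not the authors' novel variant. In the standard form, a vector is represented by its kernel evaluations against a fixed reference set, and the resulting vector is then passed to a linear classifier such as an SVM. The cosine kernel, the toy two-dimensional vectors, and the function names are all assumptions made for illustration.

```python
import math

def cosine_kernel(x, y):
    """Cosine similarity between two equal-length vectors (assumed kernel)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_x * norm_y)

def empirical_kernel_map(x, references, kernel=cosine_kernel):
    """Map x to the vector of its kernel values against each reference vector."""
    return [kernel(x, r) for r in references]

# Toy reference set standing in for, e.g., labeled training i-vectors (hypothetical data).
references = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mapped = empirical_kernel_map([2.0, 0.0], references)
```

The mapped vector has one entry per reference vector, so its dimensionality is set by the size of the reference set rather than by the original i-vector dimension.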