Hierarchical speaker clustering methods for the NIST i-vector Challenge

Elie Khoury, Laurent El Shafey, Marc Ferras and Sebastien Marcel

Manually labeling data is very expensive and sometimes infeasible due to privacy and security issues. This paper investigates two algorithms for clustering unlabeled training i-vectors, with the aim of improving speaker recognition performance by making state-of-the-art supervised techniques usable in the context of the NIST i-vector Machine Learning Challenge 2014. The first algorithm is the well-known Ward clustering, which optimizes a global objective across all clusters: at each step it merges the pair of clusters whose fusion least increases the total within-cluster variance. The second is a cascade clustering, which benefits from the latest advances in speaker modeling and session compensation techniques and relies on both the cosine similarity and probabilistic linear discriminant analysis (PLDA). Furthermore, this paper investigates multi-clustering fusion, which opens the door to further improvements. The experimental results show that using the automatically labeled i-vectors to train supervised methods such as LDA, PLDA or linear logistic regression-based fusion reduces the minimum decision cost function (minDCF) by up to 22%.
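As a rough illustration of the first algorithm only, the following Python sketch implements textbook Ward agglomerative clustering with NumPy. It assumes the i-vectors are length-normalized first (a common i-vector preprocessing step, not confirmed by this abstract) and stops at a fixed number of clusters; the function names `length_normalize` and `ward_cluster`, the stopping rule, and the toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def length_normalize(x):
    """Scale each i-vector (one per row) to unit Euclidean norm (assumed preprocessing)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def ward_cluster(x, n_clusters):
    """Naive Ward agglomerative clustering (hypothetical helper).

    Starts with one cluster per row of x and repeatedly merges the pair of
    clusters whose merge gives the smallest increase in total within-cluster
    sum of squares, until n_clusters remain. Returns one integer label per row.
    Cost is O(n^3), so this is only suitable for small toy examples.
    """
    clusters = [[i] for i in range(len(x))]
    means = [x[i].astype(float) for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = means[a] - means[b]
                # Ward merge cost: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # Merge cluster b into cluster a; since b > a, deleting b keeps index a valid.
        means[a] = (na * means[a] + nb * means[b]) / (na + nb)
        clusters[a].extend(clusters[b])
        del clusters[b]
        del means[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy usage: two synthetic "speakers", five noisy i-vectors each (made-up data).
rng = np.random.default_rng(0)
spk_a = rng.normal(0.0, 0.05, (5, 4)) + np.array([1.0, 0.0, 0.0, 0.0])
spk_b = rng.normal(0.0, 0.05, (5, 4)) + np.array([0.0, 1.0, 0.0, 0.0])
ivecs = length_normalize(np.vstack([spk_a, spk_b]))
labels = ward_cluster(ivecs, n_clusters=2)
```

Since the number of speakers in unlabeled data is unknown, a threshold on the merge cost would be a more realistic stopping rule than a fixed `n_clusters`; the resulting cluster labels could then stand in for speaker labels when training LDA or PLDA, as the abstract describes.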

Odyssey 2014: The Speaker and Language Recognition Workshop
