Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification

SESSION 4: Speaker and language recognition – scoring, confidences and calibration

Added: 14. 7. 2010 11:08, Author: Stephen Shum, Najim Dehak (Massachusetts Institute of Technology), Reda Dehak (Laboratoire de Recherche et de Développement de l'EPITA), James Glass (Massachusetts Institute of Technology), Length: 0:29:13

This paper proposes a new approach to unsupervised speaker adaptation inspired by the recent success of the factor analysis-based Total Variability Approach to text-independent speaker verification [1]. This approach represents speaker variability in terms of low-dimensional total factor vectors and, when paired with the simplicity of cosine similarity scoring, allows for easy manipulation and efficient computation [2]. The development of our adaptation algorithm is motivated by three goals: a robust method of setting the adaptation threshold, minimal computation for each adaptation update, and, where possible, simpler score normalization procedures. To address the last goal, we propose the Symmetric Normalization (S-norm) method, which exploits the symmetry of cosine similarity scoring and achieves performance competitive with that of the ZT-norm while requiring fewer parameter calculations. In subsequent experiments, we also assess whether score normalization can be replaced altogether by a Normalized Cosine Similarity scoring function [3].
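A minimal NumPy sketch of the scoring ideas above, under stated assumptions: total factor vectors are already extracted and given as arrays (extraction is not shown), and S-norm is written as the average of the score normalized by each side's impostor-cohort statistics, which is how the symmetry of cosine scoring is exploited here (a sum instead of an average differs only by a constant factor). The adaptation rule (append an accepted test vector to the model and average the scores) and the names `AdaptiveSpeakerModel` and `threshold` are hypothetical illustrations, not the authors' exact algorithm.

```python
import numpy as np


def cosine_score(w1: np.ndarray, w2: np.ndarray) -> float:
    """Cosine similarity between two total factor vectors."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))


def s_norm(w_a: np.ndarray, w_b: np.ndarray, cohort: np.ndarray) -> float:
    """Symmetric normalization (S-norm) of a cosine score.

    Both vectors are scored against the same impostor cohort (one vector per row).
    The raw score is normalized with each side's cohort mean and standard
    deviation, and the two results are averaged, so s_norm(a, b) == s_norm(b, a).
    """
    raw = cosine_score(w_a, w_b)
    scores_a = np.array([cosine_score(w_a, c) for c in cohort])
    scores_b = np.array([cosine_score(w_b, c) for c in cohort])
    norm_a = (raw - scores_a.mean()) / scores_a.std()
    norm_b = (raw - scores_b.mean()) / scores_b.std()
    return 0.5 * (norm_a + norm_b)


class AdaptiveSpeakerModel:
    """Hypothetical unsupervised adaptation scheme (an assumption, not the paper's).

    The target model is a growing set of total factor vectors. A test vector whose
    normalized score exceeds `threshold` is appended without any label, so each
    adaptation update costs only one list append.
    """

    def __init__(self, enrollment: np.ndarray, cohort: np.ndarray, threshold: float):
        self.vectors = [enrollment]
        self.cohort = cohort
        self.threshold = threshold

    def score(self, w_test: np.ndarray) -> float:
        # Average the S-normalized scores against every vector in the model.
        return float(np.mean([s_norm(v, w_test, self.cohort) for v in self.vectors]))

    def verify_and_adapt(self, w_test: np.ndarray) -> float:
        s = self.score(w_test)
        if s > self.threshold:
            self.vectors.append(w_test)  # accepted: adapt the model in place
        return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 400  # vector dimension chosen arbitrarily for this demo
    cohort = rng.standard_normal((200, dim))  # synthetic impostor vectors
    enrollment = rng.standard_normal(dim)
    model = AdaptiveSpeakerModel(enrollment, cohort, threshold=3.0)
    genuine = enrollment + 0.1 * rng.standard_normal(dim)
    print(model.verify_and_adapt(genuine), len(model.vectors))
```

Because the S-normalized score is identical whichever vector plays the "target" role, only one set of cohort statistics per vector is needed, rather than separate Z-norm and T-norm parameter sets.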

We evaluated the performance of our unsupervised speaker adaptation algorithm under various score normalization procedures on the 10sec-10sec and core conditions of the 2008 NIST SRE dataset. Relative to a baseline without adaptation, the proposed methods consistently improved speaker verification performance, achieving state-of-the-art results.

