Cosine Similarity Scoring without Score Normalization Techniques
SESSION 4: Speaker and language recognition – scoring, confidences and calibration
Added: 14. 7. 2010 11:08, Author: Najim Dehak (MIT Computer Science and Artificial Intelligence Laboratory, Cambridge), Reda Dehak (Laboratoire de Recherche et de Developpement de l'EPITA (LRDE), Paris), James Glass (MIT Computer Science and Artificial Intelligence Laboratory, Cambridge), Douglas Reynolds (MIT Lincoln Laboratory, Lexington), Patrick Kenny (Centre de Recherche d’Informatique de Montréal (CRIM), Montréal), Length: 0:28:16
Recent work introduced a simplified and highly effective approach to speaker recognition based on the cosine similarity between low-dimensional vectors, termed ivectors, defined in a total variability space. The total variability representation is motivated by the popular Joint Factor Analysis (JFA) approach, but it does not require estimating separate speaker and channel spaces, and it has been shown to depend less on score normalization procedures such as z-norm and t-norm. In this paper, we introduce a modification to the cosine similarity that requires no explicit score normalization and relies instead on simple mean and covariance statistics from a collection of impostor speaker ivectors. Avoiding z- and t-norm also makes it possible to apply a new unsupervised speaker adaptation technique to models defined in the ivector space. Experiments on the core condition of the NIST 2008 corpora show that, with adaptation, the new approach achieves an equal error rate (EER) of 4.8% and a minimum decision cost function (MinDCF) of 2.3% on all female speaker trials.
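To make the scoring idea concrete, here is a minimal Python sketch that contrasts plain cosine similarity with a variant built from impostor statistics. The abstract does not state the formula, so the form below (centering both ivectors on the impostor mean and scaling by the per-dimension impostor standard deviation, i.e. a diagonal covariance) is an assumption for illustration only. The function names, the `impostors` array layout, and the `eps` guard are hypothetical as well.

```python
import numpy as np


def cosine_score(w_target, w_test):
    """Plain cosine similarity between two ivectors."""
    denom = np.linalg.norm(w_target) * np.linalg.norm(w_test)
    return float(w_target @ w_test / denom)


def impostor_normalized_score(w_target, w_test, impostors, eps=1e-12):
    """Cosine-like score that folds in impostor mean and covariance.

    ASSUMPTION: this is one plausible form, not necessarily the exact
    formula used in the paper. Only the diagonal of the impostor
    covariance is used.

    impostors: array of shape (n_impostors, dim), one ivector per row.
    """
    mean_imp = impostors.mean(axis=0)
    # Per-dimension standard deviation (square root of the diagonal covariance).
    std_imp = impostors.std(axis=0) + eps

    # Center both ivectors on the impostor mean.
    numerator = (w_target - mean_imp) @ (w_test - mean_imp)
    # Scale each ivector by the impostor spread before taking its norm.
    denominator = (np.linalg.norm(std_imp * w_target)
                   * np.linalg.norm(std_imp * w_test))
    return float(numerator / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    impostors = rng.normal(size=(200, 16))   # synthetic impostor ivectors
    target = rng.normal(size=16)
    test = target + 0.1 * rng.normal(size=16)  # a noisy copy of the target

    print("cosine:", cosine_score(target, test))
    print("impostor-normalized:", impostor_normalized_score(target, test, impostors))
```

Because the impostor mean and standard deviation are computed once from a fixed impostor set, a score of this kind needs no per-trial z-norm or t-norm pass. That independence is what the abstract credits with enabling unsupervised adaptation in the ivector space.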