Hunting for Wolves in Speaker Recognition

SESSION 7: Speaker and Language recognition - Evaluations and performance testing

Added: 14. 7. 2010 11:08, Author: Lara Stoll (International Computer Science Institute and UC-Berkeley), George Doddington, Length: 0:25:59

Identifying and selecting speaker pairs that are difficult to distinguish offers the possibility of better focusing speaker recognition research, while also reducing the amount of data needed to estimate system performance with confidence. This work aims to predict which speaker pairs will be difficult for automatic speaker recognition systems to distinguish, using features that characterize speakers and thus provide a measure of speaker similarity. Features tested include pitch, jitter, shimmer, formant frequencies, energy, long-term average spectrum energy, histograms of frequencies from roots of LPC coefficients, and spectral slope. Absolute and percent differences, Euclidean distance, and correlation coefficients are used to measure the closeness of these speaker features. Using data from NIST's 2008 Speaker Recognition Evaluation, the largest changes in detection cost and false alarm rate for similar speaker pairs (relative to all speaker pairs) occur when speaker pairs are selected using the Euclidean distance between vectors of the mean first, second, and third formant frequencies. Even larger differences in performance occur when speaker pairs are selected using the KL divergence between speaker-specific GMMs as a measure of similarity. In general, the feature-measures considered here are more successful at finding easy-to-distinguish speaker pairs than difficult-to-distinguish ones, and can provide potentially useful information about a speaker's tendency to be similar or dissimilar to other speakers.
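
As a rough illustration of the formant-based selection described in the abstract, the sketch below ranks speaker pairs by the Euclidean distance between their mean first, second, and third formant frequencies. It is not the authors' implementation: the speaker labels, the formant values, and the helper name `rank_speaker_pairs` are all hypothetical and chosen only to show the idea.

```python
from itertools import combinations
from math import dist

# Hypothetical per-speaker mean formant frequencies (F1, F2, F3) in Hz.
# These values are invented for illustration; they are not from the NIST data.
mean_formants = {
    "spk_a": (510.0, 1490.0, 2500.0),
    "spk_b": (520.0, 1500.0, 2510.0),
    "spk_c": (650.0, 1750.0, 2800.0),
    "spk_d": (430.0, 1300.0, 2350.0),
}


def rank_speaker_pairs(features):
    """Return (distance, speaker_1, speaker_2) tuples, most similar pair first.

    Similarity is measured as the Euclidean distance between the two
    speakers' feature vectors; a smaller distance means more similar.
    """
    scored = [
        (dist(features[a], features[b]), a, b)
        for a, b in combinations(sorted(features), 2)
    ]
    return sorted(scored)


ranked = rank_speaker_pairs(mean_formants)
closest = ranked[0]    # candidate difficult-to-distinguish pair
farthest = ranked[-1]  # candidate easy-to-distinguish pair

for distance, a, b in ranked:
    print(f"{a} vs {b}: {distance:.1f} Hz")
```

Pairs at the top of the ranking would be candidates for the difficult-to-distinguish set and pairs at the bottom for the easy-to-distinguish set. How many pairs to keep at either end is a choice this sketch leaves open.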


  Lecture Information

Video resolution: 720x576 px
Audio track: MP3 [8.92 MB], 0:25:59