Intra-speaker variability effects on Speaker Verification performance

SESSION 5: Speaker recognition – Inter-session variability

Added: 14 July 2010, 11:08, Authors: Juliette Kahn, Nicolas Audibert, Solange Rossato, Jean-François Bonastre (Laboratoire Informatique d'Avignon, University of Avignon), Duration: 0:23:55

Speaker verification systems (SVS) have shown significant progress and have reached a level of performance that makes their use in practical applications possible. Nevertheless, large differences in performance are observed, depending on the speaker or the speech excerpt used. This highlights the importance of analyzing system performance in more depth than the average error rate allows. In this paper, the effect of the training excerpt is investigated using ALIZE/SpkDet on two different corpora: NIST-SRE 08 (conversational speech) and BREF 120 (controlled read speech). The results show that SVS performance is highly dependent on the voice samples used to train the speaker model: the overall Equal Error Rate (EER) ranges from 4.1% to 29.1% on NIST-SRE 08 and from 1.0% to 33.0% on BREF 120. The hypothesis that such performance differences are explained by the phonetic content of the voice samples is studied on BREF 120.
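For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal illustrative sketch, not the ALIZE/SpkDet implementation: the function name `equal_error_rate` and the example scores are hypothetical, and the threshold sweep over observed scores is an assumed, simplified way of approximating the crossing point.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by using every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only (not taken from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
impostors = [0.6, 0.2, 0.1, 0.05]
print(equal_error_rate(targets, impostors))
```

Computing this value separately for each training excerpt, rather than once over all trials, is the kind of breakdown that would expose a spread of EERs like the one reported above.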
