Short-Duration Speaker Modelling with Phone Adaptive Training

Giovanni Soldi, Simon Bozonnet, Federico Alegre, Christophe Beaugeant and Nicholas Evans

This paper presents a new approach to feature-level phone normalisation, referred to as phone adaptive training (PAT), which aims to improve speaker modelling when only short-duration training data is available. Based on constrained maximum likelihood linear regression (cMLLR) and previous work in speaker adaptive training (SAT), PAT learns a set of transforms which project features into a new phone-normalised but speaker-discriminative space. PAT was originally investigated in the context of speaker diarization; this paper presents new work to assess and optimise it at the level of speaker modelling and in the context of automatic speaker verification (ASV). Experiments show that PAT improves the performance of a state-of-the-art iVector ASV system by 50% relative to the baseline.

Odyssey 2014

The Speaker and Language Recognition Workshop
