AN INVESTIGATION OF SUBSPACE MODELING FOR PHONETIC AND SPEAKER VARIABILITY IN AUTOMATIC SPEECH RECOGNITION
Acoustic Modeling
Speaker: Richard Rose, Authors: Richard Rose, Shou-Chun Yin, Yun Tang, McGill University, Canada
This paper investigates the impact of subspace-based techniques for modeling speaker variability and phonetic variability in automatic speech recognition (ASR). There are many well-known approaches to speaker-space-based adaptation which represent sources of variability as a projection within a low-dimensional subspace. A new approach to acoustic modeling in ASR, referred to as the subspace-based Gaussian mixture model (SGMM), represents phonetic variability as a set of projections applied at the state level in a hidden Markov model (HMM) based acoustic model. The impact of the SGMM in modeling these intrinsic sources of variability is evaluated for a continuous speech recognition (CSR) task where the performance of continuous density HMM (CDHMM) based ASR systems is already reasonably good. Speaker-independent SGMM-based ASR was shown to provide an 18% reduction in word error rate (WER) over the CDHMM and a 5% reduction in WER over unsupervised speaker adaptation in the Resource Management CSR domain.
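As an illustration of what "projections applied at the state level" can mean, the sketch below follows the commonly published SGMM formulation, which the abstract itself does not spell out: each HMM state j holds only a low-dimensional vector v_j, and its Gaussian means and mixture weights are derived from globally shared parameters as mu_ji = M_i v_j and w_ji = softmax_i(w_i^T v_j). All names (`sgmm_state_params`, `projections`, `weight_vectors`, `state_vector`) and the toy dimensions are assumptions made for this example, not taken from the paper.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sgmm_state_params(projections, weight_vectors, state_vector):
    """Derive one state's Gaussian means and mixture weights.

    projections    : I shared matrices M_i, each D x S (hypothetical values)
    weight_vectors : I shared vectors w_i, each of length S
    state_vector   : the state's own low-dimensional vector v_j, length S
    """
    # Mean of Gaussian i in this state: mu_ji = M_i v_j
    means = [mat_vec(m_i, state_vector) for m_i in projections]

    # Mixture weights: softmax over w_i^T v_j (shifted by the max for stability)
    scores = [sum(w * v for w, v in zip(w_i, state_vector))
              for w_i in weight_vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# Toy setup (assumed): I = 2 shared Gaussians, D = 3 features, S = 2 subspace dims
projections = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, -0.5], [2.0, 0.0], [0.0, 1.0]],
]
weight_vectors = [[0.0, 0.0], [0.0, 0.0]]
state_vector = [2.0, -1.0]

means, weights = sgmm_state_params(projections, weight_vectors, state_vector)
```

In this formulation a state contributes only the S numbers in its vector, while the projection matrices and weight vectors are shared by all states; in a conventional CDHMM each state would store its own full set of means and weights.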
Lecture information
Recorded: | 2011-05-25 15:25 - 15:45, Panorama |
---|---|
Added: | 15 June 2011, 17:03 |
Views: | 73 |
Video resolution: | 1024x576 px, 512x288 px |
Video length: | 0:20:20 |
Audio track: | MP3 [6.87 MB], 0:20:20 |