SuperLectures.com

SYNTHESIZING VISUAL SPEECH TRAJECTORY WITH MINIMUM GENERATION ERROR

Speech Synthesis

Full Paper at IEEE Xplore

Presenter: Lijuan Wang. Authors: Lijuan Wang, Microsoft Research Asia, China; Yi-Jian Wu, Microsoft Corporation, China; Xiaodan Zhuang, Beckman Institute / University of Illinois at Urbana-Champaign, USA; Frank K. Soong, Microsoft Research Asia, China

In this paper, we propose a minimum generation error (MGE) training method that refines the audio-visual HMM to improve visual speech trajectory synthesis. Unlike traditional maximum likelihood (ML) estimation, the proposed MGE training explicitly optimizes the quality of the generated visual speech trajectory. The audio-visual HMM is jointly refined by a heuristic method that finds the optimal state alignment and a probabilistic descent algorithm that optimizes the model parameters under the MGE criterion. In objective evaluation, the proposed MGE-based method consistently outperforms the ML-based method: it reduces the mean square error, increases the correlation, and better recovers the global variance. In the subjective test, it also improves perceived naturalness and audio-visual consistency.
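The abstract describes MGE training only at a high level. The block below is a minimal NumPy sketch of the idea under simplifying assumptions that are ours, not the paper's: a single visual feature dimension (no audio stream) modeled with static and delta streams, a fixed frame-to-state alignment instead of the heuristic alignment search, diagonal precisions, and plain gradient descent on the state means in place of the probabilistic descent algorithm. All function names and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the paper's implementation: one visual feature,
# static + delta streams, fixed alignment, diagonal precisions, means-only updates.

def delta_matrix(T):
    """Delta window 0.5 * (c[t+1] - c[t-1]); edge frames are clamped (assumption)."""
    D = np.zeros((T, T))
    for t in range(T):
        D[t, min(t + 1, T - 1)] += 0.5
        D[t, max(t - 1, 0)] -= 0.5
    return D

def generate(mean_s, prec_s, align, W):
    """ML parameter generation: c = R^-1 W^T U^-1 mu with R = W^T U^-1 W."""
    mu = np.concatenate([mean_s[align, 0], mean_s[align, 1]])  # frame-level means
    u = np.concatenate([prec_s[align, 0], prec_s[align, 1]])   # diagonal of U^-1
    R = W.T @ (u[:, None] * W)
    c = np.linalg.solve(R, W.T @ (u * mu))
    return c, R, u

def mge_step(mean_s, prec_s, align, W, target, lr):
    """One plain gradient step on D = ||c - target||^2 w.r.t. the state means."""
    c, R, u = generate(mean_s, prec_s, align, W)
    err = c - target
    # dD/dmu = 2 U^-1 W R^-1 (c - target); R is symmetric
    g = 2.0 * u * (W @ np.linalg.solve(R, err))
    T = len(align)
    grad = np.zeros_like(mean_s)
    np.add.at(grad[:, 0], align, g[:T])  # sum frame gradients into tied static means
    np.add.at(grad[:, 1], align, g[T:])  # sum frame gradients into tied delta means
    return mean_s - lr * grad, float(err @ err)  # error is measured before the update

def objective_metrics(c, target):
    """MSE, Pearson correlation, and global-variance ratio (generated / natural)."""
    mse = float(np.mean((c - target) ** 2))
    corr = float(np.corrcoef(c, target)[0, 1])
    gv_ratio = float(np.var(c) / np.var(target))
    return mse, corr, gv_ratio

# Toy data: a synthetic "natural" trajectory, 6 states x 10 frames each.
T, S = 60, 6
align = np.repeat(np.arange(S), T // S)
target = np.sin(np.linspace(0.0, 2.0 * np.pi, T))
W = np.vstack([np.eye(T), delta_matrix(T)])

# ML means under a hard alignment: per-state averages of static and delta targets.
delta_target = delta_matrix(T) @ target
mean_s = np.stack([
    np.array([target[align == s].mean() for s in range(S)]),
    np.array([delta_target[align == s].mean() for s in range(S)]),
], axis=1)
prec_s = np.ones((S, 2))

ml_metrics = objective_metrics(generate(mean_s, prec_s, align, W)[0], target)
errors = []
for _ in range(200):
    mean_s, e = mge_step(mean_s, prec_s, align, W, target, lr=0.02)
    errors.append(e)
mge_metrics = objective_metrics(generate(mean_s, prec_s, align, W)[0], target)
print("ML  (mse, corr, gv):", ml_metrics)
print("MGE (mse, corr, gv):", mge_metrics)
```

Starting from the per-state averages, the generation error recorded in `errors` decreases step by step, because the gradient is taken through the trajectory generation itself rather than through the frame-level likelihood. `objective_metrics` computes the three kinds of objective measure the abstract mentions; the toy numbers say nothing about the paper's actual results.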





  Lecture information

Recorded: 2011-05-26 15:25 - 15:45, Panorama
Added: 2011-06-15 19:16
Views: 28
Video resolution: 1024x576 px, 512x288 px
Video length: 0:22:45
Audio track: MP3 [7.71 MB], 0:22:45