SuperLectures.com

UT-SCOPE: TOWARDS LVCSR UNDER LOMBARD EFFECT INDUCED BY VARYING TYPES AND LEVELS OF NOISY BACKGROUND

Speech Analysis

Full Paper at IEEE Xplore

Speaker: Hynek Boril; Authors: Hynek Boril, John H.L. Hansen, The University of Texas at Dallas, United States

Adverse environments impact the performance of automatic speech recognizers in two ways: directly, by introducing acoustic mismatch between the processed speech signal and the acoustic models of the recognizer, and indirectly, by changing the way speakers talk in order to remain intelligible over noise (the Lombard effect). An increasing number of studies have analyzed the Lombard effect with respect to speech production and perception, yet limited attention has been paid to its impact on speech systems, especially for larger vocabulary tasks. This study presents large vocabulary speech material captured in the recently acquired portion of the UT-Scope database, produced in several types and levels of simulated background noise (highway, crowd, pink). The impact of noisy background variations on speech parameters is studied together with the effects on automatic speech recognition. A front-end cepstral normalization utilizing a modified RASTA filter is proposed and shown to improve recognition performance in a side-by-side evaluation with several common and state-of-the-art normalization algorithms.
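The abstract does not say how the RASTA filter was modified, so the sketch below is only a rough illustration of the general idea of temporal filtering of cepstral trajectories, not the authors' method. It assumes the commonly cited classic RASTA band-pass transfer function, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), applied after plain cepstral mean subtraction. The function names, the pole value, and the random stand-in features are all hypothetical choices made for this example.

```python
import numpy as np


def cepstral_mean_normalize(cepstra):
    """Subtract the per-coefficient mean over time.

    cepstra: array of shape (n_frames, n_coefficients).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def rasta_filter(cepstra, pole=0.98):
    """Apply a classic RASTA-style band-pass filter along the time axis.

    Assumed transfer function (not the modified filter from the talk):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    """
    cepstra = np.asarray(cepstra, dtype=float)
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    out = np.zeros_like(cepstra)
    for t in range(cepstra.shape[0]):
        acc = np.zeros(cepstra.shape[1])
        # FIR part: weighted sum of the current and four previous frames.
        for k, b in enumerate(numer):
            if t - k >= 0:
                acc += b * cepstra[t - k]
        # IIR part: leaky integration of the previous output frame.
        if t >= 1:
            acc += pole * out[t - 1]
        out[t] = acc
    return out


# Usage with random stand-in features (200 frames, 13 coefficients).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 13))
normalized = rasta_filter(cepstral_mean_normalize(features))
```

Because the numerator coefficients sum to zero, the filter suppresses constant offsets in each cepstral trajectory, which is the same kind of slowly varying channel mismatch that mean subtraction targets. Very fast frame-to-frame fluctuations are attenuated by the smoothing of the five-tap numerator.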


  Slides

0:00:16  Slide 1
0:00:40  Slide 2
0:01:23  Slide 3
0:02:39  Slide 4
0:05:39  Slide 5
0:06:44  Slide 6
0:08:02  Slide 7
0:09:21  Slide 8
0:10:07  Slide 9
0:10:49  Slide 10
0:11:31  Slide 11
0:13:56  Slide 12
0:14:16  Slide 13
0:14:55  Slide 14
0:16:11  Slide 15
0:17:45  Slide 16
0:18:15  Slide 17
0:18:35  Slide 18
0:20:41  Slide 19


  Lecture information

Recorded: 2011-05-25 10:10 - 10:30, Panorama
Added: 2011-06-15 15:21
Views: 14
Video resolution: 1024x576 px, 512x288 px
Video length: 0:21:29
Audio track: MP3 [7.27 MB], 0:21:29