SuperLectures.com

MULTISTREAM SPEAKER DIARIZATION THROUGH INFORMATION BOTTLENECK SYSTEM OUTPUTS COMBINATION

Speaker Diarization

Full Paper at IEEE Xplore

Presenter: Petr Motlíček; Authors: Deepu Vijayasenan, Fabio Valente, Petr Motlicek, Idiap Research Institute, Switzerland

Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams such as MFCC and Time Delay of Arrival (TDOA). Typically, the streams are combined using separate models for each feature stream. This work investigates whether multiple feature streams can instead be combined by combining the outputs of multiple diarization systems, each run on one of those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred to as system-based combination. Furthermore, the paper introduces a hybrid model-system combination. Experiments are run on data from the Rich Transcription campaigns and show that the system-based combination outperforms the model-based combination by 37% relative. The hybrid approaches improve by 10-20%. The error analysis shows that the improvements come from the recordings where the individual MFCC and TDOA systems perform very differently.
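To make the phrase "37% relative" concrete, here is a minimal sketch of how a relative improvement is usually computed. It assumes the metric is an error rate where lower is better, such as the diarization error rate (DER); the abstract does not name the metric. The function name and the numbers are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_improvement(baseline_err: float, new_err: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_err <= 0:
        raise ValueError("baseline_err must be positive")
    return (baseline_err - new_err) / baseline_err


# Hypothetical error rates in percent, NOT the figures reported in the paper.
baseline = 20.0  # e.g. a model-based combination (made-up value)
combined = 12.6  # e.g. a system-based combination (made-up value)

print(f"{100 * relative_improvement(baseline, combined):.1f}% relative")
```

With these made-up values the absolute gain is 7.4 points, while the relative gain is 7.4 / 20.0, that is, 37% of the baseline error.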


  Slides

0:00:22  Slide 1
0:01:16  Slide 2
0:02:12  Slide 3
0:03:17  Slide 4
0:04:06  Slide 5
0:07:17  Slide 6
0:08:56  Slide 7
0:09:59  Slide 8
0:11:08  Slide 9
0:11:27  Slide 10
0:12:23  Slide 11
0:12:46  Slide 10 (shown again)
0:12:54  Slide 11 (shown again)
0:13:24  Slide 12
0:13:57  Slide 13
0:14:44  Slide 14
0:15:32  Slide 15
0:16:36  Slide 16
0:18:01  Slide 17

  Lecture information

Recorded: 2011-05-24 14:05 - 14:25, Panorama
Added: 2011-06-16 18:43
Views: 24
Video resolution: 1024x576 px, 512x288 px
Video length: 0:20:11
Audio track: MP3 [6.82 MB], 0:20:11