Semi-supervised On-line Speaker Diarization for Meeting Data with Incremental Maximum A-posteriori Adaptation

Giovanni Soldi, Massimiliano Todisco, Héctor Delgado, Christophe Beaugeant, Nicholas Evans

Almost all current diarization systems are off-line and ill-suited to the growing need for on-line or real-time diarization. Our previous work reported the first on-line diarization system for meeting data, the most challenging speaker diarization domain. Although results were comparable to those reported for on-line diarization in less challenging domains, error rates were high and unlikely to support any practical application. The first novel contribution of this paper is the investigation of a semi-supervised approach to on-line diarization in which speaker models are seeded with a modest amount of manually labelled data. In practical applications involving meetings, such data can be obtained readily from brief round-table introductions. The second novel contribution is an incremental MAP adaptation procedure for efficient, on-line speaker modelling. When combined, these two developments provide an on-line diarization system which outperforms a baseline, off-line system by a significant margin. When configured appropriately, error rates may be low enough to support practical applications.

Odyssey 2016

The Speaker and Language Recognition Workshop

Related Recordings

Influence of transition cost in the segmentation stage of speaker diarization

Analysis of the Impact of the Audio Database Characteristics in the Accuracy of a Speaker Clustering System