Semi-supervised On-line Speaker Diarization for Meeting Data with Incremental Maximum A-posteriori Adaptation
Giovanni Soldi, Massimiliano Todisco, Héctor Delgado, Christophe Beaugeant, Nicholas Evans
Almost all current diarization systems are off-line and ill-suited to the growing need for on-line or real-time diarization. Our previous work reported the first on-line diarization system for meeting data, the most challenging speaker diarization domain. Although results were comparable to those reported for on-line diarization in less challenging domains, error rates were high and unlikely to support any practical application. The first novel contribution of this paper is the investigation of a semi-supervised approach to on-line diarization in which speaker models are seeded with a modest amount of manually labelled data. In practical applications involving meetings, such data can be obtained readily from brief round-table introductions. The second novel contribution is an incremental maximum a-posteriori (MAP) adaptation procedure for efficient, on-line speaker modelling. When combined, these two developments provide an on-line diarization system which outperforms a baseline, off-line system by a significant margin. When configured appropriately, error rates may be low enough to support practical applications.
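To make the idea of incremental MAP adaptation concrete, the following is a minimal sketch and not necessarily the procedure used in this work. It assumes the widely used relevance-factor form of MAP adaptation applied to the means of a one-dimensional Gaussian mixture model, with component posteriors computed against a fixed prior model so that sufficient statistics can simply be accumulated as new speech arrives. The class name `IncrementalMapAdapter`, its parameters and the relevance factor of 10 are all hypothetical choices made for illustration.

```python
import math


class IncrementalMapAdapter:
    """Illustrative mean-only MAP adaptation of a 1-D GMM (hypothetical helper).

    Assumption: posteriors are always computed against the fixed prior model,
    so the zeroth- and first-order statistics are additive across updates.
    """

    def __init__(self, weights, means, variances, relevance=10.0):
        self.weights = list(weights)
        self.prior_means = list(means)
        self.variances = list(variances)
        self.relevance = relevance  # assumed relevance factor
        self.n = [0.0] * len(self.weights)  # zeroth-order statistics
        self.f = [0.0] * len(self.weights)  # first-order statistics

    def _posteriors(self, x):
        # Per-component log-likelihoods under the prior model.
        logs = [
            math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
            for w, m, v in zip(self.weights, self.prior_means, self.variances)
        ]
        # Normalise with the log-sum-exp trick for numerical stability.
        peak = max(logs)
        exps = [math.exp(value - peak) for value in logs]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, frames):
        """Accumulate statistics from a new chunk of (scalar) feature frames."""
        for x in frames:
            for k, p in enumerate(self._posteriors(x)):
                self.n[k] += p
                self.f[k] += p * x

    def means(self):
        """Return MAP-adapted means given all statistics accumulated so far."""
        adapted = []
        for k, prior in enumerate(self.prior_means):
            alpha = self.n[k] / (self.n[k] + self.relevance)
            data_mean = self.f[k] / self.n[k] if self.n[k] > 0 else prior
            adapted.append(alpha * data_mean + (1.0 - alpha) * prior)
        return adapted
```

Under these assumptions, calling `update` once per incoming chunk gives the same adapted means as a single pass over all the data, which is the property that makes this kind of update attractive for on-line use. In a semi-supervised setting, the first call to `update` could be made with the manually labelled seed data for a given speaker.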