Online Speaker Diarization Equipped with Discriminative Modeling and Guided Inference (3 minutes introduction)

Xucheng Wan (Huawei Technologies, China), Kai Liu (Huawei Technologies, China), Huan Zhou (Huawei Technologies, China)

Despite considerable effort, online speaker diarization remains an open challenge. In this study, we tackle it from two perspectives: endowing the diarization model with discriminability, and rectifying less reliable online inference with guidance. Specifically, building on UIS-RNN, the current prior art, we propose two enhancement approaches that put these ideas into practice. Experimental results on the AMI evaluation set validate their effectiveness. With a substantial relative improvement of 48.7%, our online speaker diarization system significantly outperformed its baseline. More impressively, its diarization error rate (DER) is lower than that of most state-of-the-art offline systems.
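The abstract compares systems by diarization error rate and reports a relative improvement over the baseline. As a minimal sketch, assuming the standard DER definition (false alarm + missed speech + speaker confusion, divided by total reference speech time), ignoring scoring details such as forgiveness collars and overlap handling, and assuming the relative improvement is measured on DER, the two quantities relate as follows. The function names and durations are hypothetical illustrations, not values or code from the paper.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """Simplified DER: summed error durations over total reference speech time.

    Assumes all arguments are durations in the same unit (e.g. seconds);
    collars and overlapped-speech scoring rules are not modeled here.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


def relative_improvement(baseline_der: float, system_der: float) -> float:
    """Fraction of the baseline DER that the new system removes."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - system_der) / baseline_der


# Hypothetical durations in seconds, chosen only for illustration.
baseline = diarization_error_rate(30.0, 20.0, 50.0, 1000.0)  # 0.10
improved = diarization_error_rate(30.0, 20.0, 0.0, 1000.0)   # 0.05
print(f"{relative_improvement(baseline, improved):.1%}")     # 50.0%
```

A relative improvement is thus independent of the absolute DER level: a system that cuts the baseline's error time in half reports 50% regardless of how large that error was to begin with.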

InterSpeech 2021

Related Recordings

Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
(3 minutes introduction)

End-to-end speaker segmentation for overlap-aware resegmentation
(3 minutes introduction)