InterSpeech 2021

LEAP Submission for the Third DIHARD Diarization Challenge

Prachi Singh (Indian Institute of Science, India), Rajat Varma (Indian Institute of Science, India), Venkat Krishnamohan (Indian Institute of Science, India), Srikanth Raj Chetupalli (Indian Institute of Science, India), Sriram Ganapathy (Indian Institute of Science, India)
This paper describes the LEAP submission to the DIHARD-III challenge. The proposed system consists of a speech bandwidth classifier and two diarization systems, fine-tuned separately for narrowband and wideband speech. For the narrowband conversational telephone speech recordings, we use an end-to-end speaker diarization system. For the wideband multi-speaker recordings, we use a neural-embedding-based clustering approach similar to the baseline system: embeddings (x-vectors) are extracted from a time-delay neural network and then clustered with the graph-based path integral clustering (PIC) approach. The LEAP system achieved relative improvements of 24% on Track-1 and 18% on Track-2 over the baseline system provided by the organizers. We present the challenge submission, the post-evaluation analysis, and the improvements observed on the DIHARD-III dataset.
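The routing described in the abstract, where a bandwidth classifier chooses between two diarization systems, can be sketched as below. This is only an illustrative outline, not the authors' implementation: the names `make_router`, `classify_bandwidth`, `narrowband_diarizer`, and `wideband_diarizer`, the string labels, and the segment format are all hypothetical, and the real classifier and diarizers are stood in for by caller-supplied functions.

```python
from typing import Callable, List, Tuple

# Hypothetical output format: (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]
Diarizer = Callable[[str], List[Segment]]


def make_router(
    classify_bandwidth: Callable[[str], str],
    narrowband_diarizer: Diarizer,
    wideband_diarizer: Diarizer,
) -> Diarizer:
    """Build a diarizer that dispatches on the predicted bandwidth.

    All three arguments are placeholders (assumptions): in the abstract's
    terms, the narrowband branch would be the end-to-end system and the
    wideband branch the x-vector + PIC clustering system.
    """

    def diarize(recording_path: str) -> List[Segment]:
        band = classify_bandwidth(recording_path)
        if band == "narrowband":
            return narrowband_diarizer(recording_path)
        if band == "wideband":
            return wideband_diarizer(recording_path)
        # Fail loudly on an unexpected label rather than guessing a branch.
        raise ValueError(f"unknown bandwidth label: {band!r}")

    return diarize
```

Passing the components in as functions keeps the dispatch logic separate from the models, so each branch can be tuned or replaced independently, which mirrors the separate fine-tuning mentioned in the abstract.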