ECAPA-TDNN Embeddings for Speaker Diarization (3 minutes introduction)

Nauman Dawalatabad (IIT Madras, India), Mirco Ravanelli (Mila, Canada), François Grondin (Université de Sherbrooke, Canada), Jenthe Thienpondt (Ghent University, Belgium), Brecht Desplanques (Ghent University, Belgium), Hwidong Na (Samsung, Korea)

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker-discriminative characteristics, and popular deep embeddings such as x-vectors are now a fundamental component of modern diarization systems. Recently, several improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, for instance, has shown impressive performance in speaker verification, thanks to a carefully designed neural architecture. In this work, we extend, for the first time, the use of the ECAPA-TDNN model to speaker diarization. Moreover, we improve its robustness with a powerful augmentation scheme that concatenates several contaminated versions of the same signal within the same training batch. The ECAPA-TDNN model provides robust speaker embeddings under both close-talking and distant-talking conditions. Our results on the popular AMI meeting corpus show that our system significantly outperforms recently proposed approaches.
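The augmentation scheme mentioned in the abstract (several contaminated versions of the same signal concatenated within one training batch) can be illustrated with a minimal sketch. This is not the authors' implementation: the two contaminations used here (additive white noise and a single echo), the 10 dB SNR, the echo delay and gain, and all function names are assumptions chosen for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math
import random


def add_noise(wav, snr_db, rng):
    """Hypothetical contamination: add white Gaussian noise at a target SNR (dB)."""
    power = sum(x * x for x in wav) / len(wav)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in wav]


def add_echo(wav, delay, gain):
    """Hypothetical contamination: add one delayed, attenuated copy of the signal."""
    return [
        x + (gain * wav[i - delay] if i >= delay else 0.0)
        for i, x in enumerate(wav)
    ]


def augment_batch(wavs, labels, rng):
    """Concatenate the clean batch with contaminated copies of the same signals.

    The speaker labels are repeated once per version, so every copy of a
    signal keeps the label of its original.
    """
    clean = [list(w) for w in wavs]
    noisy = [add_noise(w, 10.0, rng) for w in wavs]   # assumed 10 dB SNR
    echoed = [add_echo(w, 80, 0.5) for w in wavs]     # assumed delay and gain
    aug_wavs = clean + noisy + echoed
    aug_labels = list(labels) * 3
    return aug_wavs, aug_labels


rng = random.Random(0)
# Four synthetic "utterances" of 1600 samples each, one per speaker.
wavs = [[rng.uniform(-1.0, 1.0) for _ in range(1600)] for _ in range(4)]
labels = [0, 1, 2, 3]

aug_wavs, aug_labels = augment_batch(wavs, labels, rng)
print(len(aug_wavs), aug_labels)
```

In this sketch a batch of 4 signals becomes a batch of 12, so the embedding model sees the clean and contaminated renditions of each speaker in the same optimization step.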

Related Recordings

Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker
(3 minutes introduction)

Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech
(3 minutes introduction)

Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

InterSpeech 2021
