InterSpeech 2021

Team02 Text-Independent Speaker Verification System for SdSV Challenge 2021

Woo Hyun Kang (CRIM, Canada), Nam Soo Kim (Seoul National University, Korea)
In this paper, we describe the systems we submitted to Task 2 of the Short Duration Speaker Verification (SdSV) Challenge 2021. The challenge provides a difficult set of cross-language text-independent speaker verification trials. Our submissions employ ResNet-based embedding networks trained with various strategies that exploit both in-domain and out-of-domain datasets. The results show that the recently proposed joint factor embedding (JFE) scheme can improve performance by disentangling language-dependent information from the speaker embedding. However, an analysis of the speaker embeddings revealed a clear discrepancy between the in-domain and out-of-domain datasets. Therefore, among our submitted systems, the best performance was achieved by pre-training the embedding system on the out-of-domain dataset and fine-tuning it with only the in-domain data, which resulted in a MinDCF of 0.142716 on the SdSV2021 evaluation set.
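
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a normalized minimum detection cost (MinDCF) is commonly computed from trial scores. It is not the challenge's official scoring tool. The function name `min_dcf` is hypothetical, and the default operating point (`p_target=0.01`, `c_miss=10`, `c_fa=1`) is an assumption that is not stated in the abstract.

```python
def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Normalized minimum detection cost over all decision thresholds.

    The default cost parameters are illustrative assumptions, not values
    taken from the abstract.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    n_tar, n_non = len(target_scores), len(nontarget_scores)
    # Label each trial (1 = target, 0 = non-target) and sort by score.
    trials = sorted([(s, 1) for s in target_scores] +
                    [(s, 0) for s in nontarget_scores])

    # Cost of the best trivial system (accept everything or reject everything).
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))

    # Start with the threshold below every score: all trials are accepted,
    # so there are no misses and every non-target is a false alarm.
    misses, false_alarms = 0, n_non
    best = c_fa * (1.0 - p_target)

    i = 0
    while i < len(trials):
        # Raise the threshold just above the current score; tied scores
        # are rejected together.
        score = trials[i][0]
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                misses += 1
            else:
                false_alarms -= 1
            i += 1
        cost = (c_miss * p_target * misses / n_tar
                + c_fa * (1.0 - p_target) * false_alarms / n_non)
        best = min(best, cost)

    return best / norm
```

Under this normalization, a system that separates target and non-target trials perfectly scores 0, and a system no better than always rejecting scores 1, so lower values are better.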