Woo Hyun Kang (CRIM, Canada), Nam Soo Kim (Seoul National University, Korea)
In this paper, we describe the systems we submitted to Task 2 of the Short Duration Speaker Verification (SdSV) Challenge 2021. The challenge provides a difficult set of cross-language, text-independent speaker verification trials. Our submissions employ ResNet-based embedding networks trained with various strategies that exploit both in-domain and out-of-domain datasets. The results show that the recently proposed joint factor embedding (JFE) scheme can improve performance by disentangling language-dependent information from the speaker embedding. However, an analysis of the speaker embeddings revealed a clear discrepancy between the in-domain and out-of-domain datasets. Consequently, among our submitted systems, the best performance was achieved by pre-training the embedding system on the out-of-domain dataset and fine-tuning it with only the in-domain data, which resulted in a MinDCF of 0.142716 on the SdSV2021 evaluation set.
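For readers unfamiliar with the reported metric, the following is a minimal sketch of how a normalized minimum detection cost (MinDCF) can be computed from trial scores. It is not the challenge's official scoring tool. The function name `min_dcf` and the cost parameters (`p_target=0.01`, `c_miss=10`, `c_fa=1`) are assumptions made for illustration and should be checked against the evaluation plan.

```python
def min_dcf(target_scores, nontarget_scores,
            p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Normalized minimum detection cost over all decision thresholds.

    The default cost parameters are assumed values, not taken from the paper.
    A trial is accepted when its score is at or above the threshold.
    """
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one non-target score")

    # Pool the scores with a flag (True = target trial) and sort ascending.
    labeled = sorted(
        [(s, True) for s in target_scores]
        + [(s, False) for s in nontarget_scores],
        key=lambda pair: pair[0],
    )

    def cost(misses, false_accepts):
        p_miss = misses / n_tar
        p_fa = false_accepts / n_non
        return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

    # Threshold below every score: all trials accepted.
    misses, false_accepts = 0, n_non
    best = cost(misses, false_accepts)

    # Raise the threshold past each distinct score value in turn;
    # tied scores are rejected together.
    i = 0
    while i < len(labeled):
        value = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == value:
            if labeled[i][1]:
                misses += 1          # a target trial is now rejected
            else:
                false_accepts -= 1   # a non-target trial is now rejected
            i += 1
        best = min(best, cost(misses, false_accepts))

    # Normalize by the cost of the best trivial system
    # (always accept or always reject).
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    print(min_dcf([0.9, 0.4], [0.5, 0.1]))
```

Under this normalization, a value of 1.0 corresponds to a system no better than always accepting or always rejecting, and 0.0 to perfectly separated scores.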