Speaker Characterization Using TDNN, TDNN-LSTM, TDNN-LSTM-Attention based Speaker Embeddings for NIST SRE 2019

Chien-Lin Huang

In this paper, we explore speaker characterization using speaker embeddings based on the time-delay neural network (TDNN), the long short-term memory (LSTM) neural network, and attention (TDNN-LSTM-Attention). The TDNN, TDNN-LSTM, and TDNN-LSTM-Attention speaker embeddings are investigated on large-scale training and test datasets. Different types of front-end feature extraction are compared to find good features for speaker embedding. To increase the amount and diversity of the training data, 4 kinds of data augmentation are used to create 7 new copies of the original data. The proposed methods are evaluated on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) tasks. Experimental results show that the proposed methods achieve minimum decision cost function values of 0.372 and 0.392 on the NIST SRE 2018 and SRE 2019 evaluation datasets, respectively.
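
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a normalized minimum decision cost function can be computed from trial scores. It is not the official NIST scoring tool and not the author's code: the function name `min_dcf` and the default values of `p_target`, `c_miss`, and `c_fa` are illustrative assumptions, and the official evaluation plans define their own operating points.

```python
def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all decision thresholds.

    p_target, c_miss, and c_fa are illustrative defaults (assumptions),
    not the official NIST SRE evaluation parameters.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    # Label each trial (1 = target, 0 = non-target) and sort by score.
    trials = sorted(
        [(s, 1) for s in target_scores] + [(s, 0) for s in nontarget_scores]
    )
    n_tar, n_non = len(target_scores), len(nontarget_scores)

    # Cost of the best trivial system (always reject or always accept),
    # used to normalize the detection cost.
    default_cost = min(c_miss * p_target, c_fa * (1.0 - p_target))

    # Start with the threshold below every score: all trials are accepted,
    # so there are no misses and every non-target is a false alarm.
    misses, false_alarms = 0, n_non
    best = c_fa * (1.0 - p_target)

    i = 0
    while i < len(trials):
        # Raise the threshold just above the current score; tied scores
        # are rejected together.
        score = trials[i][0]
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                misses += 1
            else:
                false_alarms -= 1
            i += 1
        cost = (c_miss * p_target * misses / n_tar
                + c_fa * (1.0 - p_target) * false_alarms / n_non)
        best = min(best, cost)

    return best / default_cost
```

Under these assumptions, a system whose target scores all exceed its non-target scores yields a value of 0, while a system no better than always rejecting yields 1; lower values are better.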

Related Recordings

0:18:20

Speaker Detection in the Wild: Lessons Learned from JSALT 2019

Leibny Paola Garcia Perera, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

0:13:07

Combined Vector Based on Factorized Time-delay Neural Network for Text-Independent Speaker Recognition

Tianyu Liang, Yi Liu, Can Xu, Xianwei Zhang, Liang He

Odyssey 2020

The Speaker and Language Recognition Workshop