Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network
(3 minutes introduction)

Yibo Wu (Tianjin University, China), Longbiao Wang (Tianjin University, China), Kong Aik Lee (A*STAR, Singapore), Meng Liu (Tianjin University, China), Jianwu Dang (Tianjin University, China)

Joint training of upstream and downstream tasks has recently attracted increasing attention, and with it the challenge of synchronizing multiple loss functions in a multi-objective scenario. In this paper, to address the competing gradient directions between the speaker classification loss and the feature enhancement loss, we propose an asynchronous subregion optimization approach for the joint training of feature enhancement and speaker embedding neural networks. For the asynchronous subregion optimization, the squeeze-and-excitation (SE) method is introduced in the enhancement network to adaptively select the channels that are important for speaker embedding. Furthermore, the input feature and the enhanced feature are concatenated channel-wise to counter the distortion of speaker information caused by the enhancement loss. Using the proposed joint training network with asynchronous subregion optimization and channel-wise feature concatenation, we obtained relative improvements of 11.95% and 6.43% in equal error rate on a noisy version of VoxCeleb1 and on the VOiCES corpus, respectively.
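
The two building blocks named in the abstract, squeeze-and-excitation channel gating and channel-wise concatenation of the input and enhanced features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat-list channel layout, the bias-free two-layer gate, and the weight shapes are all assumptions made for illustration, and the asynchronous subregion optimization itself is not shown.

```python
import math


def squeeze_excite(channels, w1, w2):
    """Reweight channels with a squeeze-and-excitation gate (illustrative only).

    channels: list of C channels, each a flat list of feature values.
    w1: bottleneck weights, shape (C/r, C); w2: expansion weights, shape (C, C/r).
    Biases are omitted for brevity (an assumption, not taken from the paper).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: bottleneck layer with ReLU, then a sigmoid layer giving gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


def concat_channels(input_feature, enhanced_feature):
    """Channel-wise concatenation: stack the channels of both feature maps."""
    return input_feature + enhanced_feature


# Hypothetical toy data: a one-channel input and a two-channel enhanced map.
noisy = [[0.2, 0.4]]
enhanced = [[1.0, 3.0], [2.0, 2.0]]
gated, gates = squeeze_excite(enhanced, w1=[[0.5, -0.5]], w2=[[1.0], [-1.0]])
stacked = concat_channels(noisy, gated)  # three channels go on to the embedding network
```

In this sketch the gate lets the network down-weight enhanced channels that carry little speaker information, while the concatenation keeps the unmodified input available alongside the enhanced feature.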

Related Recordings

Deep Feature CycleGANs: Speaker Identity Preserving Non-parallel Microphone-Telephone Domain Adaptation for Speaker Verification
(3 minutes introduction)

Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

Speaker anonymisation using the McAdams coefficient
(3 minutes introduction)

Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans

InterSpeech 2021
