Interspeech 2021

Source Separation II

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
(3-minute introduction)

Thilo von Neumann (Universität Paderborn, Germany), Keisuke Kinoshita (NTT, Japan), Christoph Boeddeker (Universität Paderborn, Germany), Marc Delcroix (NTT, Japan), Reinhold Haeb-Umbach (Universität Paderborn, Germany)

Teacher-Student MixIT for Unsupervised and Semi-Supervised Speech Separation
(3-minute introduction)

Jisi Zhang (University of Sheffield, UK), Cătălin Zorilă (Toshiba, UK), Rama Doddipatla (Toshiba, UK), Jon Barker (University of Sheffield, UK)

Few-shot learning of new sound classes for target sound extraction
(3-minute introduction)

Marc Delcroix (NTT, Japan), Jorge Bennasar Vázquez (NTT, Japan), Tsubasa Ochiai (NTT, Japan), Keisuke Kinoshita (NTT, Japan), Shoko Araki (NTT, Japan)

AvaTr: One-Shot Speaker Extraction with Transformers
(3-minute introduction)

Shell Xu Hu (Upload AI, USA), Md. Rifat Arefin (Upload AI, USA), Viet-Nhat Nguyen (Upload AI, USA), Alish Dipani (Upload AI, USA), Xaq Pitkow (Upload AI, USA), Andreas Savas Tolias (Upload AI, USA)

Vocal Harmony Separation using Time-domain Neural Networks
(3-minute introduction)

Saurjya Sarkar (Queen Mary University of London, UK), Emmanouil Benetos (Queen Mary University of London, UK), Mark Sandler (Queen Mary University of London, UK)

Vocal Harmony Separation using Time-domain Neural Networks
(longer introduction)

Saurjya Sarkar (Queen Mary University of London, UK), Emmanouil Benetos (Queen Mary University of London, UK), Mark Sandler (Queen Mary University of London, UK)

Speaker Verification-Based Evaluation of Single-Channel Speech Separation
(3-minute introduction)

Matthew Maciejewski (Johns Hopkins University, USA), Shinji Watanabe (Johns Hopkins University, USA), Sanjeev Khudanpur (Johns Hopkins University, USA)

Improved Speech Separation with Time-and-Frequency Cross-Domain Feature Selection
(3-minute introduction)

Tian Lan (UESTC, China), Yuxin Qian (UESTC, China), Yilan Lyu (UESTC, China), Refuoe Mokhosi (UESTC, China), Wenxin Tai (UESTC, China), Qiao Liu (UESTC, China)

Neural Speaker Extraction with Speaker-Speech Cross-Attention Network
(3-minute introduction)

Wupeng Wang (NUS, Singapore), Chenglin Xu (NUS, Singapore), Meng Ge (NUS, Singapore), Haizhou Li (NUS, Singapore)

Deep audio-visual speech separation based on facial motion
(3-minute introduction)

Rémi Rigal (Orange Labs, France), Jacques Chodorowski (Orange Labs, France), Benoît Zerr (Lab-STICC (UMR 6285), France)