InterSpeech 2021

Speech enhancement and intelligibility

Funnel Deep Complex U-net for Phase-Aware Speech Enhancement
(3-minute introduction)

Yuhang Sun (OPPO, China), Linju Yang (OPPO, China), Huifeng Zhu (OPPO, China), Jie Hao (OPPO, China)

Perceptual Contributions of Vowels and Consonant-vowel Transitions in Understanding Time-compressed Mandarin Sentences
(3-minute introduction)

Changjie Pan (SUSTech, China), Feng Yang (Shenzhen Second People’s Hospital, China), Fei Chen (SUSTech, China)

Speech Enhancement with Weakly Labelled Data from AudioSet
(3-minute introduction)

Qiuqiang Kong (ByteDance, China), Haohe Liu (ByteDance, China), Xingjian Du (ByteDance, China), Li Chen (ByteDance, China), Rui Xia (ByteDance, China), Yuxuan Wang (ByteDance, China)

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement
(3-minute introduction)

Tsun-An Hsieh (Academia Sinica, Taiwan), Cheng Yu (Academia Sinica, Taiwan), Szu-Wei Fu (Academia Sinica, Taiwan), Xugang Lu (NICT, Japan), Yu Tsao (Academia Sinica, Taiwan)

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
(3-minute introduction)

Szu-Wei Fu (Academia Sinica, Taiwan), Cheng Yu (Academia Sinica, Taiwan), Tsun-An Hsieh (Academia Sinica, Taiwan), Peter Plantinga (Ohio State University, USA), Mirco Ravanelli (Mila, Canada), Xugang Lu (NICT, Japan), Yu Tsao (Academia Sinica, Taiwan)

A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction
(3-minute introduction; longer introduction)

Amin Edraki (Queen’s University, Canada), Wai-Yip Chan (Queen’s University, Canada), Jesper Jensen (Aalborg University, Denmark), Daniel Fogerty (University of Illinois at Urbana-Champaign, USA)

Self-Supervised Learning Based Phone-Fortified Speech Enhancement
(3-minute introduction)

Yuanhang Qiu (Massey University, New Zealand), Ruili Wang (Massey University, New Zealand), Satwinder Singh (Massey University, New Zealand), Zhizhong Ma (Massey University, New Zealand), Feng Hou (Massey University, New Zealand)

Incorporating Embedding Vectors from a Human Mean-Opinion Score Prediction Model for Monaural Speech Enhancement
(3-minute introduction; longer introduction)

Khandokar Md. Nayem (Indiana University, USA), Donald S. Williamson (Indiana University, USA)

Restoring degraded speech via a modified diffusion model
(3-minute introduction; longer introduction)

Jianwei Zhang (Arizona State University, USA), Suren Jayasuriya (Arizona State University, USA), Visar Berisha (Arizona State University, USA)