InterSpeech 2021

Speech Synthesis: Speaking Style and Emotion

Controllable Context-Aware Conversational Speech Synthesis
(3-minute introduction)

Jian Cong (Northwestern Polytechnical University, China), Shan Yang (Tencent, China), Na Hu (Tencent, China), Guangzhi Li (Tencent, China), Lei Xie (Northwestern Polytechnical University, China), Dan Su (Tencent, China)

Expressive Text-to-Speech using Style Tag
(3-minute introduction)

Minchan Kim (Seoul National University, Korea), Sung Jun Cheon (Seoul National University, Korea), Byoung Jin Choi (Seoul National University, Korea), Jong Jin Kim (SK Telecom, Korea), Nam Soo Kim (Seoul National University, Korea)

SponSpeech: Adaptive Text to Speech for Spontaneous Style
(3-minute introduction)

Yuzi Yan (Tsinghua University, China), Xu Tan (Microsoft, China), Bohan Li (Microsoft, China), Guangyan Zhang (CUHK, China), Tao Qin (Microsoft, China), Sheng Zhao (Microsoft, China), Yuan Shen (Tsinghua University, China), Wei-Qiang Zhang (Tsinghua University, China), Tie-Yan Liu (Microsoft, China)

Towards Multi-Scale Style Control for Expressive Speech Synthesis
(3-minute introduction)

Xiang Li (Tsinghua University, China), Changhe Song (Tsinghua University, China), Jingbei Li (Tsinghua University, China), Zhiyong Wu (Tsinghua University, China), Jia Jia (Tsinghua University, China), Helen Meng (Tsinghua University, China)

Synthesis of expressive speaking styles with limited training data in a multi-speaker, prosody-controllable sequence-to-sequence architecture
(3-minute introduction)

Slava Shechtman (IBM, Israel), Raul Fernandez (IBM, USA), Alexander Sorin (IBM, Israel), David Haws (IBM, USA)