InterSpeech 2021

Speech Synthesis: Speaking Style and Emotion

Controllable Context-Aware Conversational Speech Synthesis
Jian Cong (Northwestern Polytechnical University, China), Shan Yang (Tencent, China), Na Hu (Tencent, China), Guangzhi Li (Tencent, China), Lei Xie (Northwestern Polytechnical University, China), Dan Su (Tencent, China)

Expressive Text-to-Speech using Style Tag
Minchan Kim (Seoul National University, Korea), Sung Jun Cheon (Seoul National University, Korea), Byoung Jin Choi (Seoul National University, Korea), Jong Jin Kim (SK Telecom, Korea), Nam Soo Kim (Seoul National University, Korea)

SponSpeech: Adaptive Text to Speech for Spontaneous Style
Yuzi Yan (Tsinghua University, China), Xu Tan (Microsoft, China), Bohan Li (Microsoft, China), Guangyan Zhang (CUHK, China), Tao Qin (Microsoft, China), Sheng Zhao (Microsoft, China), Yuan Shen (Tsinghua University, China), Wei-Qiang Zhang (Tsinghua University, China), Tie-Yan Liu (Microsoft, China)

Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li (Tsinghua University, China), Changhe Song (Tsinghua University, China), Jingbei Li (Tsinghua University, China), Zhiyong Wu (Tsinghua University, China), Jia Jia (Tsinghua University, China), Helen Meng (Tsinghua University, China)

Synthesis of Expressive Speaking Styles with Limited Training Data in a Multi-Speaker, Prosody-Controllable Sequence-to-Sequence Architecture
Slava Shechtman (IBM, Israel), Raul Fernandez (IBM, USA), Alexander Sorin (IBM, Israel), David Haws (IBM, USA)