InterSpeech 2021

Speech Synthesis: Prosody Modeling I

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows
(3 minutes introduction)

Iván Vallés-Pérez (Amazon, UK), Julian Roth (Amazon, UK), Grzegorz Beringer (Amazon, Poland), Roberto Barra-Chicote (Amazon, UK), Jasha Droppo (Amazon, USA)

Fine-grained Prosody Modeling in Neural Speech Synthesis using ToBI Representation
(3 minutes introduction)

Yuxiang Zou (ByteDance, China), Shichao Liu (ByteDance, China), Xiang Yin (ByteDance, China), Haopeng Lin (ByteDance, China), Chunfeng Wang (ByteDance, China), Haoyu Zhang (ByteDance, China), Zejun Ma (ByteDance, China)

Intra-Sentential Speaking Rate Control in Neural Text-To-Speech for Automatic Dubbing
(3 minutes introduction)
(longer introduction)

Mayank Sharma (Amazon, India), Yogesh Virkar (Amazon, USA), Marcello Federico (Amazon, USA), Roberto Barra-Chicote (Amazon, UK), Robert Enyedi (Amazon, USA)

Applying the Information Bottleneck Principle to Prosodic Representation Learning
(3 minutes introduction)

Guangyan Zhang (CUHK, China), Ying Qin (Beijing Jiaotong University, China), Daxin Tan (CUHK, China), Tan Lee (CUHK, China)