Speech Synthesis: Prosody Modeling I

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows
(3-minute introduction)

Iván Vallés-Pérez (Amazon, UK), Julian Roth (Amazon, UK), Grzegorz Beringer (Amazon, Poland), Roberto Barra-Chicote (Amazon, UK), Jasha Droppo (Amazon, USA)

Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis
(3-minute introduction)

Kenichi Fujita (NTT, Japan), Atsushi Ando (NTT, Japan), Yusuke Ijima (NTT, Japan)

Fine-grained Prosody Modeling in Neural Speech Synthesis using ToBI Representation
(3-minute introduction)

Yuxiang Zou (ByteDance, China), Shichao Liu (ByteDance, China), Xiang Yin (ByteDance, China), Haopeng Lin (ByteDance, China), Chunfeng Wang (ByteDance, China), Haoyu Zhang (ByteDance, China), Zejun Ma (ByteDance, China)

Intra-Sentential Speaking Rate Control in Neural Text-To-Speech for Automatic Dubbing
(3-minute introduction; longer introduction)

Mayank Sharma (Amazon, India), Yogesh Virkar (Amazon, USA), Marcello Federico (Amazon, USA), Roberto Barra-Chicote (Amazon, UK), Robert Enyedi (Amazon, USA)

Applying the Information Bottleneck Principle to Prosodic Representation Learning
(3-minute introduction)

Guangyan Zhang (CUHK, China), Ying Qin (Beijing Jiaotong University, China), Daxin Tan (CUHK, China), Tan Lee (CUHK, China)

InterSpeech 2021