Interspeech 2021

Speech Synthesis: Toward End-to-End Synthesis II

TacoLPCNet: Fast and Stable TTS by Conditioning LPCNet on Mel Spectrogram Predictions
(3-minute introduction)

Cheng Gong (Tianjin University, China), Longbiao Wang (Tianjin University, China), Ju Zhang (Huiyan Technology, China), Shaotong Guo (Tianjin University, China), Yuguang Wang (Huiyan Technology, China), Jianwu Dang (Tianjin University, China)

Phonetic and Prosodic Information Estimation Using Neural Machine Translation for Genuine Japanese End-to-End Text-to-Speech
(3-minute introduction)

Naoto Kakegawa (Okayama University, Japan), Sunao Hara (Okayama University, Japan), Masanobu Abe (Okayama University, Japan), Yusuke Ijima (NTT, Japan)

Phonetic and Prosodic Information Estimation Using Neural Machine Translation for Genuine Japanese End-to-End Text-to-Speech
(longer introduction)

Naoto Kakegawa (Okayama University, Japan), Sunao Hara (Okayama University, Japan), Masanobu Abe (Okayama University, Japan), Yusuke Ijima (NTT, Japan)

Information Sieve: Content Leakage Reduction in End-to-End Prosody Transfer for Expressive Speech Synthesis
(3-minute introduction)

Xudong Dai (Tianjin University, China), Cheng Gong (Tianjin University, China), Longbiao Wang (Tianjin University, China), Kaili Zhang (Tianjin University, China)

Information Sieve: Content Leakage Reduction in End-to-End Prosody Transfer for Expressive Speech Synthesis
(longer introduction)

Xudong Dai (Tianjin University, China), Cheng Gong (Tianjin University, China), Longbiao Wang (Tianjin University, China), Kaili Zhang (Tianjin University, China)

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
(3-minute introduction)

Ye Jia (Google, USA), Heiga Zen (Google, Japan), Jonathan Shen (Google, USA), Yu Zhang (Google, USA), Yonghui Wu (Google, USA)

Speed up Training with Variable Length Inputs by Efficient Batching Strategies
(3-minute introduction)

Zhenhao Ge (Sony, USA), Lakshmish Kaushik (Sony, USA), Masanori Omote (Sony, USA), Saket Kumar (Sony, USA)

Speed up Training with Variable Length Inputs by Efficient Batching Strategies
(longer introduction)

Zhenhao Ge (Sony, USA), Lakshmish Kaushik (Sony, USA), Masanori Omote (Sony, USA), Saket Kumar (Sony, USA)