Improve Cross-Lingual Text-To-Speech Synthesis on Monolingual Corpora with Pitch Contour Information
(3 minutes introduction)

Haoyue Zhan (NetEase, China), Haitong Zhang (NetEase, China), Wenjie Ou (NetEase, China), Yue Lin (NetEase, China)

Cross-lingual text-to-speech (TTS) synthesis on monolingual corpora remains a challenging task, especially when many languages are involved. In this paper, we improve the cross-lingual TTS model on monolingual corpora with pitch contour information. We propose a method to obtain pitch contour sequences for different languages without manual annotation, and extend the Tacotron-based TTS model with the proposed Pitch Contour Extraction (PCE) module. Our experimental results show that the proposed approach effectively improves the naturalness and consistency of synthesized mixed-lingual utterances.

EfficientSing: A Chinese Singing Voice Synthesis System Using Duration-Free Acoustic Model and HiFi-GAN Vocoder
(3 minutes introduction)

Zhengchen Liu, Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao

InterSpeech 2021

Related Recordings

Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
(3 minutes introduction)

EfficientSing: A Chinese Singing Voice Synthesis System Using Duration-Free Acoustic Model and HiFi-GAN Vocoder
(3 minutes introduction)
