CVC: Contrastive Learning for Non-parallel Voice Conversion
(3 minutes introduction)

Tingle Li (Tsinghua University, China), Yichen Liu (Tsinghua University, China), Chenxu Hu (Tsinghua University, China), Hang Zhao (Tsinghua University, China)

Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose a contrastive learning-based adversarial approach for voice conversion, namely contrastive voice conversion (CVC). Compared to previous CycleGAN-based methods, CVC requires only an efficient one-way GAN training by taking advantage of contrastive learning. In non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.

Related Recordings

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion
(3 minutes introduction)

Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

Fine-tuning pre-trained voice conversion model for adding new target speakers with limited data
(3 minutes introduction)

Takeshi Koshizuka, Hidefumi Ohmura, Kouichi Katsurada

InterSpeech 2021
