InterSpeech 2021

CVC: Contrastive Learning for Non-parallel Voice Conversion
(3-minute introduction)

Tingle Li (Tsinghua University, China), Yichen Liu (Tsinghua University, China), Chenxu Hu (Tsinghua University, China), Hang Zhao (Tsinghua University, China)
Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose a contrastive learning-based adversarial approach for voice conversion, namely contrastive voice conversion (CVC). Compared to previous CycleGAN-based methods, CVC requires only efficient one-way GAN training by taking advantage of contrastive learning. In non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
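The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of a generic InfoNCE-style loss, the kind of objective commonly meant by "contrastive learning"; it is not taken from the paper. The function names `cosine` and `info_nce`, the temperature value, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query (illustrative, not the paper's code).

    The loss is the cross-entropy of picking the positive out of
    {positive} + negatives, using temperature-scaled cosine similarities.
    """
    # Index 0 holds the positive pair; the rest are negatives.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy example (hypothetical 2-D features): the loss is small when the
# query matches its positive and large when it matches a negative instead.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

In a voice conversion setting, one would presumably draw the query from the converted output and the positive and negatives from the source input, so that minimising such a loss keeps corresponding content aligned without a second, reverse generator; that mapping is an assumption here, not a claim about the authors' implementation.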