Fre-GAN: Adversarial Frequency-consistent Audio Synthesis (3 minutes introduction)

Ji-Hoon Kim (Korea University, Korea), Sang-Hoon Lee (Korea University, Korea), Ji-Hyun Lee (Korea University, Korea), Seong-Whan Lee (Korea University, Korea)

Although recent work on neural vocoders has improved the quality of synthesized audio, a gap remains between generated and ground-truth audio in frequency space. This difference leads to spectral artifacts such as hissing noise or reverberation, and thus degrades sample quality. In this paper, we propose Fre-GAN, which achieves frequency-consistent audio synthesis with highly improved generation quality. Specifically, we first present a resolution-connected generator and resolution-wise discriminators, which help learn various scales of spectral distributions over multiple frequency bands. Additionally, to reproduce high-frequency components accurately, we leverage the discrete wavelet transform in the discriminators. In our experiments, Fre-GAN achieves high-fidelity waveform generation with a gap of only 0.03 MOS compared to ground-truth audio while outperforming standard models in quality.
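As a rough illustration of why a discrete wavelet transform can help with high-frequency content, the sketch below compares one level of a Haar DWT with plain average pooling on a signal that alternates at the Nyquist rate. The abstract does not say which wavelet is used or how the transform is wired into the discriminators, so the Haar wavelet, the function names (`haar_dwt`, `haar_idwt`, `average_pool`), and the comparison against average pooling are all assumptions made for this example, not the authors' implementation.

```python
import math


def haar_dwt(signal):
    """One level of a Haar DWT (assumed wavelet, for illustration only).

    Returns (approximation, detail), each half the length of the input.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the full-length signal from both bands."""
    scale = 1.0 / math.sqrt(2.0)
    restored = []
    for a, d in zip(approx, detail):
        restored.append((a + d) * scale)
        restored.append((a - d) * scale)
    return restored


def average_pool(signal):
    """Downsample by 2 with average pooling; only a low-pass view survives."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


# A signal that flips sign every sample, i.e. pure high-frequency content.
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

approx, detail = haar_dwt(x)
pooled = average_pool(x)
restored = haar_idwt(approx, detail)

print("average pooled:", pooled)    # the oscillation averages out
print("DWT approx    :", approx)    # low-frequency band
print("DWT detail    :", detail)    # high-frequency band keeps the oscillation
print("reconstructed :", restored)  # both bands together recover x
```

In this toy case, average pooling discards the oscillation entirely, while the DWT moves it into the detail band, so a model that sees both bands still has access to the high-frequency information after downsampling.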

Related Recordings

Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator
(3 minutes introduction)

Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
(3 minutes introduction)

Jinhyeok Yang, Jae-Sung Bae, Taejun Bak, Young-Ik Kim, Hoon-Young Cho

InterSpeech 2021
