InterSpeech 2021

Speech Enhancement with Topology-enhanced Generative Adversarial Networks (GANs)
(3-minute introduction)

Xudong Zhang (CUNY Graduate Center, USA), Liang Zhao (CUNY Lehman College, USA), Feng Gu (CUNY CSI, USA)
Speech enhancement is an effective approach to improving speech quality. Neural network models, such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and generative adversarial networks (GANs), have been widely used for speech enhancement. However, some of these models either handle noise removal in the spectral domain or lack waveform recovery capability. As a result, the enhanced speech still contains residual noise. In this study, we propose a topology-enhanced GAN model that processes noisy speech in an end-to-end structure. We use the topological features of speech waveforms as additional constraints and modify the GAN objective function by adding a penalty term. The penalty term is a Wasserstein distance between topological features, measuring the difference between the generated speech and the corresponding clean speech. We evaluate the proposed speech enhancement model on a public speech data set with 56 speakers and 20 different noise conditions. The experimental results indicate that the topological features improve the performance of GANs on speech enhancement as measured by PESQ, CBAK, COVL, and SSNR.
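
The idea of adding a topology penalty to the generator objective can be illustrated with a minimal sketch. The abstract does not say which topological features are used or how the Wasserstein distance is computed, so everything below is an assumption made for illustration: the features are taken to be 0-dimensional persistence pairs of the waveform's sublevel-set filtration, the distance is a simplified 1-D Wasserstein-1 matching on persistence lifetimes (padded with zeros), and the names `sublevel_persistence`, `lifetime_distance`, `topology_penalized_loss`, and the weight `lam` are hypothetical. The sketch is plain Python and not differentiable, so it shows the shape of the objective rather than a trainable loss.

```python
def sublevel_persistence(signal):
    """0-dimensional persistence pairs (birth, death) of a 1-D signal.

    Assumption: sublevel-set filtration with the elder rule; the component
    born at the global minimum is paired with the global maximum.
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    parent = [None] * n  # None means the sample has not entered the filtration
    birth = {}           # root index -> birth value of its component

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if signal[i] > birth[young]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    if n:
        pairs.append((signal[order[0]], signal[order[-1]]))
    return pairs


def lifetime_distance(pairs_a, pairs_b):
    """Simplified 1-D Wasserstein-1 distance between persistence lifetimes.

    Assumption: the shorter list is padded with zero lifetimes (points on
    the diagonal); in one dimension the optimal matching is the sorted one.
    """
    a = sorted((d - b for b, d in pairs_a), reverse=True)
    b = sorted((d - b for b, d in pairs_b), reverse=True)
    m = max(len(a), len(b))
    a += [0.0] * (m - len(a))
    b += [0.0] * (m - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


def topology_penalized_loss(adv_loss, generated, clean, lam=0.1):
    """Adversarial loss plus a weighted topology penalty (hypothetical form)."""
    penalty = lifetime_distance(sublevel_persistence(generated),
                                sublevel_persistence(clean))
    return adv_loss + lam * penalty


clean = [0.0, 2.0, 1.0, 3.0]
generated = [0.0, 3.0, 1.0, 3.0]
loss = topology_penalized_loss(0.5, generated, clean, lam=0.1)
```

In this toy example the penalty is zero when the generated waveform has the same persistence lifetimes as the clean one and grows as the peaks and valleys of the two waveforms diverge. A trainable version would need a differentiable implementation of both steps.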