InterSpeech 2021

Comparing Speech Enhancement Techniques for Voice Adaptation-Based Speech Synthesis
(3-minute introduction)

Nicholas Eng (University of Auckland, New Zealand), C.T. Justine Hui (University of Auckland, New Zealand), Yusuke Hioka (University of Auckland, New Zealand), Catherine I. Watson (University of Auckland, New Zealand)
This study investigates the use of speech enhancement techniques for creating text-to-speech voices from degraded or noisy speech. A number of synthetic voices were created using speech that was first degraded by different noise types at various signal-to-noise ratios (SNRs), then enhanced using four speech enhancement algorithms: subspace, Wiener filter, SEGAN, and a DNN-based method. Subjective listening tests show that, at low SNRs, synthetic voices built from speech enhanced by the subspace and DNN-based methods are of higher quality than voices built from Wiener filter or SEGAN enhanced speech, and that speech enhanced by the subspace method yields higher-quality synthetic speech at higher SNRs.
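
The degradation step described in the abstract, adding noise to speech at a chosen SNR, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `mix_at_snr` and `measured_snr_db`, the sine wave standing in for speech, the Gaussian noise, and the SNR values 0, 5 and 10 dB are all assumptions made for illustration. The sketch scales the noise by a gain g chosen so that 10·log10(P_speech / (g²·P_noise)) equals the target SNR.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Hypothetical helper: the noise is repeated cyclically to cover the
    speech, then scaled by a gain derived from the two signal powers.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise cyclically so it matches the speech length.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = mean_power(speech)
    noise_power = mean_power(tiled)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")
    # Solve snr_db = 10*log10(speech_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, tiled)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sampling rate, for illustration only
    # A 220 Hz sine wave stands in for one second of speech.
    speech = [math.sin(2 * math.pi * 220 * t / sample_rate)
              for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for target in (0, 5, 10):  # illustrative SNRs, not those of the study
        noisy = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, noisy), 3))
```

Computing the gain from measured signal powers, rather than assuming unit-variance noise, keeps the target SNR accurate for any noise type. The enhancement algorithms named in the abstract would then be applied to the output of such a mixing step.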