InterSpeech 2021

Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder
(3-minute introduction)

Joon Byun (Yonsei University, Korea), Seungmin Shin (Yonsei University, Korea), Youngcheol Park (Yonsei University, Korea), Jongmo Sung (ETRI, Korea), Seungkwon Beack (ETRI, Korea)
This paper presents a loss function that compensates for the perceptual distortion introduced by a deep neural network (DNN)-based speech coder. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder implementing softmax quantization and entropy-based bitrate control. Objective and subjective tests conducted on speech signals showed that the proposed loss function produces higher perceptual quality than previous perceptual loss functions.
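The core idea of an MNR-based loss can be sketched as follows. This is a minimal single-resolution NumPy illustration, not the authors' implementation: the masking thresholds `mask_thresholds` and the PE-based band weights `pe_weights` are assumed to be supplied by an external psychoacoustic model, and the filterbank construction, FFT size, and loss formulation (negative weighted log-MNR) are simplifying assumptions for illustration only.

```python
import numpy as np

def mel_filterbank(n_fft, n_mels, sr):
    """Triangular Mel filterbank (simplified textbook construction)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    return fb

def mnr_loss(ref, coded, mask_thresholds, pe_weights, fb, n_fft=512):
    """Negative PE-weighted log mask-to-noise ratio for one frame (sketch)."""
    R = np.abs(np.fft.rfft(ref, n_fft)) ** 2     # reference power spectrum
    C = np.abs(np.fft.rfft(coded, n_fft)) ** 2   # coded-signal power spectrum
    noise = fb @ np.abs(R - C)                   # coding-noise power per Mel band
    mnr = mask_thresholds / (noise + 1e-12)      # mask-to-noise ratio per band
    # Maximizing MNR == minimizing its negative log, weighted per band by PE.
    return -np.sum(pe_weights * np.log(mnr + 1e-12))
```

In training, a loss of this shape rewards pushing the coding noise in each Mel band below its masking threshold, with the (hypothetical) PE weights steering gradient effort toward perceptually important bands; a multi-resolution version would average such terms over several FFT sizes.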