Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder (3 minutes introduction)

Joon Byun (Yonsei University, Korea), Seungmin Shin (Yonsei University, Korea), Youngcheol Park (Yonsei University, Korea), Jongmo Sung (ETRI, Korea), Seungkwon Beack (ETRI, Korea)

This paper presents a loss function that compensates for the perceptual loss of a deep neural network (DNN)-based speech coder. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses more on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder implementing softmax quantization and entropy-based bitrate control. Objective and subjective tests conducted with speech signals showed that the proposed loss function produced higher perceptual quality than previous perceptual loss functions.
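
To make the idea in the abstract concrete, the following is a minimal sketch of a PE-weighted MNR loss, not the authors' implementation. It assumes the per-band signal energy, coding-noise energy, and masking threshold have already been computed on a Mel scale by some psychoacoustic model, which is not shown. The function names, the simplified PE-style weight `log2(1 + sqrt(E / T))`, and the plain averaging over resolutions are all illustrative assumptions. NumPy is used for readability; actual training would need an autodiff framework.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def pe_weights(signal_energy, mask):
    """Hypothetical PE-style weights (an assumption, not the paper's formula).

    Bands whose energy lies far above the masking threshold get more weight.
    Weights are normalized to sum to (approximately) one.
    """
    pe = np.log2(1.0 + np.sqrt(signal_energy / (mask + EPS)))
    return pe / (pe.sum() + EPS)


def mnr_db(noise_energy, mask):
    """Mask-to-noise ratio per band, in dB."""
    return 10.0 * np.log10((mask + EPS) / (noise_energy + EPS))


def weighted_mnr_loss(signal_energy, noise_energy, mask):
    """Negative PE-weighted MNR, so minimizing the loss maximizes MNR."""
    w = pe_weights(signal_energy, mask)
    return float(-np.sum(w * mnr_db(noise_energy, mask)))


def multi_resolution_loss(resolutions):
    """Average the loss over several Mel resolutions.

    `resolutions` is a list of (signal_energy, noise_energy, mask) tuples,
    one per resolution; the arrays may have different numbers of bands.
    """
    return float(np.mean([weighted_mnr_loss(s, n, m) for s, n, m in resolutions]))


# Toy usage with made-up band energies at two resolutions.
coarse = (np.array([1.0, 1.0]), np.array([0.01, 0.01]), np.array([0.1, 0.1]))
fine = (np.ones(4), np.full(4, 0.01), np.full(4, 0.1))
loss = multi_resolution_loss([coarse, fine])
```

In this sketch, lowering the noise energy in any band lowers the loss, and bands with a high signal-to-mask ratio contribute more to it, which mirrors the weighting behavior described in the abstract.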

InterSpeech 2021

Related Recordings

A Two-stage Approach to Speech Bandwidth Extension
(3 minutes introduction)

Protecting gender and identity with disentangled speech representations
(3 minutes introduction)