Incorporating Embedding Vectors from a Human Mean-Opinion Score Prediction Model for Monaural Speech Enhancement (longer introduction)

Khandokar Md. Nayem (Indiana University, USA), Donald S. Williamson (Indiana University, USA)

Objective measures such as the perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) have recently been used to optimize deep-learning-based speech enhancement algorithms, in an effort to incorporate perceptual constraints into the learning process. Optimizing with these measures, however, may be sub-optimal, since the objective scores do not always correlate strongly with a listener’s evaluation. This motivates approaches that either are optimized with scores that correlate strongly with human assessments or use alternative strategies for incorporating perceptual constraints. In this work, we propose an attention-based approach that uses learned speech embedding vectors from a mean-opinion score (MOS) prediction model and a speech enhancement module to jointly enhance noisy speech. The loss function combines signal approximation and MOS prediction loss terms, which are optimized jointly. We train the model using real-world noisy speech data captured in everyday environments. The results show that our proposed model significantly outperforms other approaches that are optimized with objective measures.
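
The idea of a loss built from a signal approximation term and a MOS prediction term can be illustrated with a minimal sketch. It is not the authors' implementation: the use of squared error for both terms, the weighting factor `mos_weight`, and the function and argument names are all assumptions made for illustration, and the abstract does not specify them.

```python
from typing import Sequence


def mean_squared_error(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equal-length sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if not estimate:
        raise ValueError("inputs must be non-empty")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


def joint_loss(
    enhanced_mag: Sequence[float],
    clean_mag: Sequence[float],
    predicted_mos: float,
    human_mos: float,
    mos_weight: float = 1.0,  # hypothetical trade-off weight, not from the source
) -> float:
    """Combine a signal approximation term with a MOS prediction term.

    Assumption: both terms are squared errors and are summed with a
    scalar weight; the actual formulation in the paper may differ.
    """
    # Signal approximation: how far the enhanced magnitudes are from the clean ones.
    signal_term = mean_squared_error(enhanced_mag, clean_mag)
    # MOS prediction: how far the predicted score is from the human rating.
    mos_term = (predicted_mos - human_mos) ** 2
    return signal_term + mos_weight * mos_term
```

In this sketch, a larger `mos_weight` makes the error in the predicted human rating count for more relative to the spectral reconstruction error; in a real system both terms would be computed on model outputs and minimized together by gradient descent.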


InterSpeech 2021
