Combating Reverberation in NTF-based Speech Separation using a Sub-Source Weighted Multichannel Wiener Filter and Linear Prediction
|Mieszko Fraś (AGH UST, Poland), Marcin Witkowski (AGH UST, Poland), Konrad Kowalczyk (AGH UST, Poland)|
Sound source separation (SS) from the microphone signals capturing speech in reverberant conditions is a formidable task. This paper addresses the problem of joint separation and dereverberation of speech using the multichannel Wiener filter (MWF) that is tailored to the sub-source modeling of each speech source with a full-rank mixing matrix. Specifically, the parameters of the proposed sub-source-weighted (SSW) spatial filter are estimated using the sub-source based expectation maximization (EM) algorithm with multiplicative updates (MU) and the localization prior distribution (LP) on the mixing matrix (SSEM-MU-LP). In addition, we strengthen dereverberation by incorporating a Generalized Weighted Prediction Error (GWPE) algorithm. The proposed method is evaluated using a large dataset of two-channel recordings of clean speech convolved with both real and synthesized impulse responses. The results of the experiments show the superior performance of the proposed method in reverberant conditions in comparison to using the standard NTF-based separation with the vanilla MWF in terms of signal-to-distortion ratio (improvement of 3–5.6 dB) and other commonly used sound separation metrics.