InterSpeech 2021

F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
(Oral presentation)

Shimin Zhang (Northwestern Polytechnical University, China), Yuxiang Kong (Northwestern Polytechnical University, China), Shubo Lv (Northwestern Polytechnical University, China), Yanxin Hu (Northwestern Polytechnical University, China), Lei Xie (Northwestern Polytechnical University, China)
With the increasing demand for audio communication and online conferencing, ensuring the robustness of acoustic echo cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation, and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they are still inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time-LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function to give the model better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
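
For readers unfamiliar with the cost function named in the abstract, the following is a minimal sketch of the standard scale-invariant signal-to-noise ratio (SI-SNR) and its use as a loss. The abstract does not describe how the authors modify SI-SNR, so this shows only the unmodified, textbook form. The function names `si_snr` and `si_snr_loss` and the `eps` stabilizer are illustrative assumptions, not taken from the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB between two equal-length 1-D signals.

    This is the textbook definition, NOT the modified variant used in
    the paper (the abstract does not specify the modification).
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean of each signal (common practice for SI-SNR).
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled target.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t)
    scale = dot / (t_energy + eps)
    s_target = [scale * b for b in t]

    # Everything in the estimate that is not the scaled target is error.
    e_noise = [a - s for a, s in zip(e, s_target)]

    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)


# Hypothetical toy signals: a sinusoid as clean speech, plus residual error.
clean = [math.sin(0.1 * i) for i in range(200)]
residual = [0.1 * math.cos(0.37 * i) for i in range(200)]
noisy_estimate = [c + r for c, r in zip(clean, residual)]
better_estimate = [c + 0.5 * r for c, r in zip(clean, residual)]
```

Because the target is rescaled by projection before the error is measured, multiplying the estimate by a constant leaves the score essentially unchanged, which is the property that gives the measure its name.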