InterSpeech 2021

A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement
(Oral presentation)

Xinlei Ren (Kuaishou Technology, China), Xu Zhang (Kuaishou Technology, China), Lianwu Chen (Kuaishou Technology, China), Xiguang Zheng (Kuaishou Technology, China), Chen Zhang (Kuaishou Technology, China), Liang Guo (Kuaishou Technology, China), Bing Yu (Kuaishou Technology, China)
People are meeting through video conferencing more often. While single-channel speech enhancement techniques are useful for individual participants, speech quality degrades significantly in large meeting rooms, where far-field and reverberant conditions are introduced. Approaches based on microphone array signal processing have been proposed to exploit the inter-channel correlation among the individual microphone channels. In this work, a new causal U-net based multiple-in-multiple-out structure is proposed for real-time multi-channel speech enhancement. The proposed method incorporates the traditional beamforming structure into the multi-channel causal U-net by explicitly adding a beamforming operation at the end of the neural beamformer. The proposed method was entered in the INTERSPEECH Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing. With 1.97M model parameters and a real-time factor of 0.25 on an Intel Core i7 (2.6GHz) CPU, the proposed method outperforms the baseline system of this challenge on the PESQ, Si-SNR and STOI metrics.
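
The explicit beamforming operation mentioned in the abstract can be illustrated with a minimal filter-and-sum sketch in the short-time Fourier transform domain. This is not the authors' implementation: the function names, the tensor layout (channels, frames, frequency bins), and the choice to apply the weights without conjugation are assumptions made for illustration, and the per-channel complex weights that a network would estimate are replaced here by fixed values. A small helper also shows how a real-time factor such as the reported 0.25 is computed, as processing time divided by audio duration.

```python
import numpy as np


def apply_beamformer(weights, mixture):
    """Filter-and-sum beamforming in the STFT domain (illustrative sketch).

    weights, mixture: complex arrays of shape (channels, frames, freq_bins).
    In a neural beamformer the weights would be predicted by the network;
    here they are simply passed in (assumption for illustration).
    Returns a single-channel complex spectrogram of shape (frames, freq_bins).
    """
    if weights.shape != mixture.shape:
        raise ValueError("weights and mixture must have the same shape")
    # Multiply each channel by its weight, then sum over the channel axis.
    return np.sum(weights * mixture, axis=0)


def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: time spent processing divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Usage: four identical channels with uniform weights of 1/4
# reduce to the original single-channel spectrogram.
mixture = np.ones((4, 3, 5), dtype=complex) * (1 + 2j)
weights = np.full(mixture.shape, 0.25, dtype=complex)
enhanced = apply_beamformer(weights, mixture)
print(enhanced.shape)

# Hypothetical timing: 2.5 s of computation for 10 s of audio.
print(real_time_factor(2.5, 10.0))
```

A real-time factor below 1 means the system processes audio faster than it arrives, which is the requirement for real-time use.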