Shoufeng Lin (Curtin University, Australia), Zhaojie Luo (Osaka University, Japan)
In speech signal processing, far-field speaker localization using only the audio modality is a fundamental but challenging problem, especially in the presence of reverberation and a varying number of moving speakers. Many existing methods use speech onsets as reliable directional cues against reverberation and interference. However, the associated signal processing can be computationally costly, especially in the time domain. In this paper, we present a computationally efficient implementation of the recently proposed Onset-Multichannel Cross Correlation Coefficient (MCCC) method. Instead of scanning the entire spatial grid, it uses reverse mapping and linear interpolation. We refer to this more efficient algorithm as the Onset-MCC. Its performance is studied in various reverberant and noisy scenarios. To further suppress outliers, address missed detections, and adaptively track a varying number of moving speakers, we present an adaptive implementation of the generalized labeled multi-Bernoulli (GLMB) filter. In the cases studied, the proposed system gives reliable and accurate location estimates in the far field (T60 = 1 s) and is applicable to tracking an unknown and time-varying number of moving speakers.
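
To illustrate the interpolation idea only, the sketch below evaluates the squared MCCC for one candidate set of inter-microphone delays by linearly interpolating precomputed pairwise cross-correlations at fractional lags, rather than recomputing correlations per grid point. It uses the standard definition, one minus the determinant of the normalized spatial correlation matrix. It is not the authors' implementation: the function names, the white-noise test signal, and the use of `np.interp` are assumptions made for illustration, and onset detection and reverse mapping are not shown.

```python
import numpy as np

def pair_xcorr(a, b):
    """Normalized cross-correlation of two equal-length signals over all integer lags."""
    n = len(a)
    c = np.correlate(a, b, mode="full") / (np.linalg.norm(a) * np.linalg.norm(b))
    lags = np.arange(-(n - 1), n)  # lag k pairs a[m + k] with b[m]
    return lags, c

def mccc_squared(signals, delays):
    """Squared MCCC for candidate per-channel delays (in samples, may be fractional).

    signals: array of shape (M, N); delays: length-M sequence.
    """
    m = signals.shape[0]
    R = np.eye(m)  # normalized spatial correlation matrix
    for i in range(m):
        for j in range(i + 1, m):
            lags, c = pair_xcorr(signals[j], signals[i])
            # Linear interpolation between the two neighbouring integer lags.
            R[i, j] = R[j, i] = np.interp(delays[j] - delays[i], lags, c)
    return 1.0 - np.linalg.det(R)

# Hypothetical test data: one white-noise source seen by three microphones
# with integer delays of 0, 3 and 7 samples (no noise, no reverberation).
rng = np.random.default_rng(0)
source = rng.standard_normal(400)
true_delays = [0, 3, 7]
channels = np.zeros((3, 512))
for ch, d in enumerate(true_delays):
    channels[ch, 50 + d : 450 + d] = source

score_true = mccc_squared(channels, [0.0, 3.0, 7.0])
score_wrong = mccc_squared(channels, [0.0, 20.5, 41.25])
print(score_true, score_wrong)
```

In this idealized setting the score should be close to one for the true delays and close to zero for mismatched delays. A practical system would precompute each pairwise correlation once and reuse it across candidates, which this sketch omits for brevity.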