InterSpeech 2021

Multiple Sound Source Localization Based on Interchannel Phase Differences in All Frequencies with Spectral Masks
(3-minute introduction)

Hyungchan Song (GIST, Korea), Jong Won Shin (GIST, Korea)
Interchannel phase differences (IPDs) in the frequency domain are among the most widely used cues for sound source localization. However, spatial aliasing makes the IPDs at high frequencies difficult to use, especially when the distance between the microphones is large. Recently, the phase replication method was proposed, which considers the direction-of-arrival (DoA) candidates corresponding to all possible unwrapped phase differences in all frequency bins. In that method, however, high-frequency bins with possible spatial aliasing contribute more to the initial DoA histograms than low-frequency bins, which may not be desirable for source localization. In this paper, we propose to utilize the IPDs in all frequency bins with equal weights, regardless of the maximum number of phase wraps at each frequency, for dual-microphone sound source localization. We apply spectral masks based on local signal-to-noise ratios and coherences between microphone signals to exclude time-frequency bins without a directional audio signal from the DoA histogram construction. Experimental results show that the proposed method yields more distinct peaks in the DoA histogram and outperforms the conventional method in various noisy and reverberant environments.
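
The idea of replicating a wrapped IPD into every consistent DoA candidate while giving each frequency bin the same total weight can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a far-field source, a two-microphone array with spacing mic_dist_m, a speed of sound of 343 m/s, and the model IPD = 2*pi*f*d*sin(theta)/c. The function names (doa_candidates, doa_histogram), the choice to split one unit of weight evenly among a bin's candidates, and the boolean mask argument (standing in for the SNR- and coherence-based spectral masks, which are not computed here) are all hypothetical.

```python
import numpy as np


def doa_candidates(ipd, freq_hz, mic_dist_m, c=343.0):
    """Return every DoA (radians) consistent with a wrapped IPD at one frequency.

    Assumed model: unwrapped phase = 2*pi*f*d*sin(theta)/c.
    """
    # Largest physically possible |unwrapped phase| at this frequency.
    max_phase = 2.0 * np.pi * freq_hz * mic_dist_m / c
    # Enough integer wraps k to cover every feasible unwrapped phase.
    k_max = int(np.ceil(max_phase / (2.0 * np.pi))) + 1
    k = np.arange(-k_max, k_max + 1)
    sin_theta = (ipd + 2.0 * np.pi * k) / max_phase
    # Keep only candidates that map to a real angle.
    sin_theta = sin_theta[np.abs(sin_theta) <= 1.0]
    return np.arcsin(sin_theta)


def doa_histogram(ipds, freqs_hz, mic_dist_m, mask=None, n_bins=36, c=343.0):
    """Accumulate a DoA histogram in which each kept bin adds a total weight of 1.

    Assumption of this sketch: the unit weight is split evenly among the
    candidates of a bin, so heavily aliased bins do not dominate.
    """
    edges = np.linspace(-np.pi / 2.0, np.pi / 2.0, n_bins + 1)
    hist = np.zeros(n_bins)
    if mask is None:
        mask = np.ones(len(ipds), dtype=bool)
    for ipd, f, keep in zip(ipds, freqs_hz, mask):
        if not keep or f <= 0.0:
            continue  # bin rejected by the (hypothetical) spectral mask
        cands = doa_candidates(ipd, f, mic_dist_m, c)
        if cands.size == 0:
            continue
        idx = np.clip(np.digitize(cands, edges) - 1, 0, n_bins - 1)
        np.add.at(hist, idx, 1.0 / cands.size)
    return edges, hist


# Synthetic single-frame example: one source at 0.5 rad, 20 cm spacing.
theta_true = 0.5
spacing = 0.2
freqs = np.linspace(100.0, 8000.0, 80)
wrapped = np.angle(np.exp(1j * 2.0 * np.pi * freqs * spacing * np.sin(theta_true) / 343.0))
edges, hist = doa_histogram(wrapped, freqs, spacing)
peak_bin = int(np.argmax(hist))
```

In this synthetic case the true direction appears among the candidates of every frequency bin, whereas the aliased candidates move with frequency and spread over many histogram bins, so the peak falls in the bin containing theta_true. Without the per-bin normalization, a bin with many possible wraps would add one count per candidate, which is the imbalance the abstract describes for the initial histograms of the phase replication method.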