InterSpeech 2021

Time Delay Estimation for Speaker Localization Using CNN-Based Parametrized GCC-PHAT Features
(Oral presentation)

Daniele Salvati (Università di Udine, Italy), Carlo Drioli (Università di Udine, Italy), Gian Luca Foresti (Università di Udine, Italy)
We propose a time delay estimation (TDE) method for speaker localization based on parametrized generalized cross-correlation phase transform (PGCC-PHAT) functions and convolutional neural networks (CNNs). The PGCC-PHAT is used to build a feature matrix that captures TDE information from two microphone signals at different normalization levels of the cross-correlation function. The feature matrix is processed by a CNN composed of several convolutional and fully connected layers, with a regression output that directly estimates the time difference of arrival (TDOA). Simulations in adverse noisy and reverberant conditions show that the proposed method improves TDOA estimation performance compared with GCC-PHAT.
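
To make the feature-matrix idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the "normalization levels" can be modeled as an exponent beta applied to the cross-spectrum magnitude, so that beta = 0 gives the plain cross-correlation and beta = 1 gives the standard GCC-PHAT. The function name `pgcc_phat_matrix`, the beta grid, the lag window, and the synthetic test signal are all hypothetical choices made for illustration. The CNN regressor is not shown; the sketch only builds the matrix that such a network might take as input.

```python
import numpy as np

def pgcc_phat_matrix(x1, x2, betas, max_lag):
    """Stack GCC functions computed with different normalization exponents.

    Assumption: each row uses the weighting 1 / |X2 * conj(X1)|**beta.
    Returns an array of shape (len(betas), 2 * max_lag + 1), with columns
    ordered from lag -max_lag to lag +max_lag (in samples).
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)           # peak lag > 0 means x2 lags x1
    mag = np.abs(cross) + 1e-12        # small constant avoids division by zero
    rows = []
    for beta in betas:
        cc = np.fft.irfft(cross / mag**beta, n=n)
        # Reorder so negative lags come first, then lag 0, then positive lags.
        rows.append(np.concatenate((cc[-max_lag:], cc[:max_lag + 1])))
    return np.stack(rows)

# Synthetic check: x2 is x1 delayed by 5 samples (white-noise source, no reverb).
rng = np.random.default_rng(0)
true_delay = 5
x1 = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(true_delay), x1[:-true_delay]))

max_lag = 20
betas = np.linspace(0.0, 1.0, 11)      # hypothetical grid of normalization levels
features = pgcc_phat_matrix(x1, x2, betas, max_lag)

# Baseline peak-picking per row; the paper replaces this step with CNN regression.
tdoa_estimates = features.argmax(axis=1) - max_lag
print(features.shape, tdoa_estimates)
```

In this clean synthetic case every row peaks at the true delay. The motivation for keeping all rows is that, in noisy and reverberant conditions, different normalization levels may respond differently, and a learned regressor can weigh them jointly instead of relying on a single peak.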