Small Footprint Multi-channel Keyword Spotting

Jilong Wu, Yiteng Huang, Hyun-Jin Park, Niranjan Subrahmanya, Patrick Violette

Noise robustness remains a challenging problem in on-device keyword spotting. Multi-microphone algorithms such as beamforming improve accuracy, but they increase computational complexity and tend to require more memory. In this paper, we propose a new neural-network-based architecture that takes multiple microphone signals as inputs. It achieves better accuracy with only a minimal increase in model size. Compared with a single-channel baseline that runs in parallel on each channel, the proposed architecture reduces the false reject (FR) rate by 36.3% and 46.4% relative on dual-microphone clean and noisy test sets, respectively, at a fixed false accept rate.
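
The "relative" reductions quoted in the abstract are fractions of the baseline FR rate, not absolute percentage-point differences. A minimal sketch of that arithmetic follows. The function name and the example rates are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def relative_reduction(baseline_fr: float, proposed_fr: float) -> float:
    """Return the false-reject reduction as a fraction of the baseline rate."""
    if baseline_fr <= 0:
        raise ValueError("baseline_fr must be positive")
    return (baseline_fr - proposed_fr) / baseline_fr


if __name__ == "__main__":
    # Hypothetical FR rates measured at the same fixed false accept rate.
    # These numbers are illustrative assumptions, not results from the paper.
    baseline_fr = 0.10
    proposed_fr = 0.06
    print(f"relative FR reduction: {relative_reduction(baseline_fr, proposed_fr):.1%}")
```

Both rates must be measured at the same false accept operating point for the comparison to be meaningful, which is why the abstract fixes the false accept rate.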

Odyssey 2020

The Speaker and Language Recognition Workshop
