A Low-Power Text-Dependent Speaker Verification System with Narrow-Band Feature Pre-Selection and Weighted Dynamic Time Warping
Qing He, Gregory Wornell and Wei Ma
To fully enable voice interaction in wearable devices, a system requires low-power, customizable voice-authenticated wake-up. Existing speaker-verification (SV) methods fall short in power consumption and noise susceptibility. To meet the application requirements, we propose a low-power, text-dependent SV system comprising a sparse spectral feature extraction front-end that shows improved noise robustness and accuracy at low power, and a back-end running an improved dynamic time warping (DTW) algorithm that preserves the signal envelope while reducing misalignments. Without background noise, the proposed system achieves an equal error rate (EER) of 1.1%, compared to 1.4% with a conventional Mel-frequency cepstral coefficients (MFCC)+DTW system and 2.6% with a Gaussian mixture model-universal background model (GMM-UBM) based system. At 3 dB signal-to-noise ratio (SNR), the proposed system achieves an EER of 5.7%, compared to 13% with a conventional MFCC+DTW system and 6.8% with a GMM-UBM based system. The proposed system enables a simple, low-power implementation such that the power consumption of the end-to-end system, which includes a voice activity detector, feature extraction front-end, and back-end decision unit, is under 380 µW.
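For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of textbook DTW and of an EER estimate from distance scores. It is not the weighted DTW variant or the feature front-end proposed here; the function names `dtw_distance` and `equal_error_rate`, the Euclidean frame distance, and the threshold sweep are all illustrative assumptions.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic (unweighted) DTW cost between two feature sequences.

    a, b: arrays of shape (frames, dims), or 1-D arrays of scalar frames.
    Assumption: Euclidean distance between frames; the paper's weighted
    variant is NOT reproduced here.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Pairwise frame-to-frame distances, shape (n, m).
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = minimal accumulated cost aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
                acc[i - 1, j - 1],  # advance in both
            )
    return acc[n, m]


def equal_error_rate(genuine, impostor):
    """Approximate EER from distance scores (accept if distance <= threshold).

    genuine: distances for true-speaker trials.
    impostor: distances for impostor trials.
    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([(genuine > t).mean() for t in thresholds])
    far = np.array([(impostor <= t).mean() for t in thresholds])
    k = np.argmin(np.abs(far - frr))
    return (far[k] + frr[k]) / 2.0


if __name__ == "__main__":
    # A time-stretched copy of a template aligns at zero cost.
    template = [1.0, 2.0, 3.0]
    stretched = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    print(dtw_distance(template, stretched))
    # Toy distance scores, made up purely for illustration.
    print(equal_error_rate([0.1, 0.2, 0.6], [0.5, 0.9, 1.0]))
```

In a text-dependent setting, a test utterance would be accepted when its DTW distance to the enrolled passphrase template falls below a threshold; the EER is the operating point at which false accepts and false rejects occur at the same rate.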