End-to-End Transformer-Based Open-Vocabulary Keyword Spotting with Location-Guided Local Attention
(3-minute introduction)
Bo Wei (Samsung, China), Meirong Yang (Samsung, China), Tao Zhang (Samsung, China), Xiao Tang (Samsung, China), Xing Huang (Samsung, China), Kyuhong Kim (Samsung, Korea), Jaeyun Lee (Samsung, Korea), Kiho Cho (Samsung, Korea), Sung-Un Park (Samsung, Korea)
Open-vocabulary keyword spotting (KWS) aims to detect arbitrary keywords in continuous speech, which allows users to define their own personal keywords. In this paper, we propose a novel location-guided end-to-end (E2E) keyword spotting system. First, we use an attention mechanism to predict the endpoints of the keyword within the entire utterance. Second, we compute the probability that the keyword is present by fusing the located keyword speech segment with the keyword text through local attention. Results on the LibriSpeech and Google Speech Commands datasets show that our proposed method significantly outperforms both the baseline method and the latest small-footprint E2E KWS method.
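The abstract describes the two stages only at a high level. The following is a minimal NumPy sketch of the idea, not the authors' Transformer architecture: it assumes frame-level audio encodings and keyword-token encodings of the same width are already available, and the pointer-style endpoint scoring, mean pooling, and all weight names (`w_start`, `w_end`, `w_out`, `b_out`) are hypothetical stand-ins with random values rather than trained parameters.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def predict_endpoints(audio, text, w_start, w_end):
    """Stage 1 (hypothetical): locate the keyword in the utterance.

    audio: (T, d) frame encodings; text: (L, d) keyword-token encodings.
    The keyword text is mean-pooled into one query, every frame is scored
    against projected queries, and the best start/end frames are chosen
    with the constraint end >= start.
    """
    query = text.mean(axis=0)                    # (d,)
    start_logits = audio @ (w_start @ query)     # (T,)
    end_logits = audio @ (w_end @ query)         # (T,)
    start = int(np.argmax(start_logits))
    end = start + int(np.argmax(end_logits[start:]))
    return start, end


def local_attention_score(audio, text, start, end, w_out, b_out):
    """Stage 2 (hypothetical): fuse text with the located segment only.

    Text tokens attend to audio frames; frames outside [start, end] are
    masked with -inf so they get zero attention weight. The attended audio
    and the text are pooled, concatenated and mapped to a probability.
    """
    num_frames, d = audio.shape
    scores = text @ audio.T / np.sqrt(d)         # (L, T)
    mask = np.full(num_frames, -np.inf)
    mask[start:end + 1] = 0.0                    # keep only the located segment
    attn = softmax(scores + mask, axis=-1)       # (L, T)
    fused = attn @ audio                         # (L, d)
    pooled = np.concatenate([fused.mean(axis=0), text.mean(axis=0)])  # (2d,)
    logit = float(pooled @ w_out + b_out)
    prob = 1.0 / (1.0 + np.exp(-logit))          # keyword existence probability
    return prob, attn


# Toy usage with random encodings (no trained weights involved).
rng = np.random.default_rng(0)
T, L, d = 50, 4, 16
audio = rng.standard_normal((T, d))
text = rng.standard_normal((L, d))
w_start = rng.standard_normal((d, d))
w_end = rng.standard_normal((d, d))
w_out = 0.1 * rng.standard_normal(2 * d)
start, end = predict_endpoints(audio, text, w_start, w_end)
prob, attn = local_attention_score(audio, text, start, end, w_out, 0.0)
print(start, end, round(prob, 3))
```

Masking scores with -inf before the softmax is one common way to restrict attention to a window; how the paper actually applies the predicted endpoints inside its local attention may differ.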