InterSpeech 2021

Effects of Prosodic Variations on Accidental Triggers of a Commercial Voice Assistant
(3-minute introduction)

Ingo Siegert (OvG Universität Magdeburg, Germany)
The use of modern voice assistants has grown rapidly, and they can be found in more and more households. By design, these systems have to scan every sound in their surroundings, waiting for their respective wake-word before they can react to the users’ commands. The drawback of this method is that phonetically similar expressions can activate the voice assistant, so that speech utterances or whole private conversations are recorded and streamed to the cloud back-end for further processing. Many news articles and scientific studies have reported inaccurate wake-word detection, which results in user confusion at best and security breaches at worst. The current paper builds on a broader analysis of phonetically similar accidental triggers conducted by Schönherr et al., who presented a systematic approach to detecting accidental triggers using a pronouncing dictionary and a weighted, phone-based Levenshtein distance. In this work, the previously identified accidental triggers are recorded by several speakers under various conditions to investigate the influence of phonetic variation (i.e. intonation and speaking/articulation rate) on the robustness of accidental triggers in a real-world environment.
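
To illustrate the kind of search the abstract attributes to Schönherr et al., the following is a minimal sketch of a weighted, phone-based Levenshtein distance applied to a pronouncing dictionary. It is not the authors' implementation. The substitution weights (0.5 within the vowel or consonant class, 1.0 across classes), the function names, and the three-entry toy lexicon with ARPAbet-style phone symbols are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a small ARPAbet-style vowel set, used only to pick a weight.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def substitution_cost(a: str, b: str) -> float:
    """Hypothetical weighting: swaps within a phone class are cheaper."""
    if a == b:
        return 0.0
    if (a in VOWELS) == (b in VOWELS):
        return 0.5  # vowel-for-vowel or consonant-for-consonant
    return 1.0      # vowel-for-consonant or the reverse


def weighted_levenshtein(
    src: Sequence[str],
    tgt: Sequence[str],
    sub_cost: Callable[[str, str], float] = substitution_cost,
    indel_cost: float = 1.0,
) -> float:
    """Edit distance over phone sequences with weighted substitutions."""
    # prev[j] holds the cost of turning the processed prefix of src into tgt[:j].
    prev = [j * indel_cost for j in range(len(tgt) + 1)]
    for i, s in enumerate(src, start=1):
        curr = [i * indel_cost]
        for j, t in enumerate(tgt, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete s
                curr[j - 1] + indel_cost,      # insert t
                prev[j - 1] + sub_cost(s, t),  # substitute s with t
            ))
        prev = curr
    return prev[-1]


# Assumption: toy pronouncing dictionary; entries are illustrative only.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "alexa": ["AH", "L", "EH", "K", "S", "AH"],
    "alexis": ["AH", "L", "EH", "K", "S", "IH", "S"],
    "banana": ["B", "AH", "N", "AE", "N", "AH"],
}


def rank_candidates(
    wake_word: str, lexicon: Dict[str, List[str]] = PRONUNCIATIONS
) -> List[Tuple[str, float]]:
    """Sort lexicon words by phone distance to the wake-word (closest first)."""
    target = lexicon[wake_word]
    scores = [
        (word, weighted_levenshtein(phones, target))
        for word, phones in lexicon.items()
        if word != wake_word
    ]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    for word, dist in rank_candidates("alexa"):
        print(f"{word}: {dist}")
```

Words with a small distance to the wake-word are candidates for accidental triggers. In a realistic setting the weights would be derived from measured phone confusability rather than the coarse vowel/consonant split assumed here.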