InterSpeech 2021

Take a breath: Respiratory sounds improve recollection in synthetic speech
(3-minute introduction)

Mikey Elmers (Universität des Saarlandes, Germany), Raphael Werner (Universität des Saarlandes, Germany), Beeke Muhlack (Universität des Saarlandes, Germany), Bernd Möbius (Universität des Saarlandes, Germany), Jürgen Trouvain (Universität des Saarlandes, Germany)
This study revisits Whalen et al. (1995, JASA) with a perception experiment that tests whether English-speaking participants' recollection is affected by including breath noises in sentences generated by a speech synthesis system. Whalen and colleagues found that recollection improved for sentences preceded by a breath noise compared to sentences without one. While Whalen and colleagues used formant synthesis to render the English sentences, we use a modern concatenative synthesis system. The present study uses inhalations of three different lengths: 0 ms (no breath noise), 300 ms (short breath noise), and 600 ms (long breath noise). Our results are consistent with Whalen and colleagues for the 600 ms condition, but not for the 300 ms condition, indicating that not every inhalation length improves recollection. The present study also found a significant effect of sentence length: shorter sentences are recalled more accurately than longer sentences. Overall, the present study indicates that respiratory sounds are important to the recollection of synthesized speech and that future studies should focus on longer and more complex types of speech, such as paragraphs or dialogues.