InterSpeech 2021

Parsing speech for grouping and prominence and the typology of rhythm
(3-minute introduction)

Michael Wagner (McGill University, Canada), Alvaro Iturralde Zurita (McGill University, Canada), Sijia Zhang (McGill University, Canada)
Humans appear to be wired to perceive acoustic events rhythmically. English speakers, for example, tend to perceive alternating short and long sounds as a series of binary groups with a final beat (iambs), and alternating soft and loud sounds as a series of binary groups with an initial beat (trochees). This generalization, often called the ‘Iambic-trochaic Law’ (ITL), is viewed by some as an auditory universal, but it has also been argued to be shaped by language experience. Earlier work on the ITL had a crucial limitation: it did not tease apart the percepts of grouping and prominence, which the notions of iamb and trochee inherently confound. We explore how intensity and duration relate to percepts of prominence and grouping in six languages (English, French, German, Japanese, Mandarin, and Spanish). The results show that the ITL is not universal, and that cue interpretation is shaped by language experience. However, there are also invariances: duration appears relatively robust across languages as a cue to prominence (longer syllables are perceived as stressed), and intensity as a cue to grouping (louder syllables are perceived as initial). Together, these findings point to the beginnings of a rhythmic typology based on how the dimensions of grouping and prominence are cued.