InterSpeech 2010

Beyond Sentence Prosody

Chiu-yu Tseng (Institute of Linguistics, Academia Sinica, Taipei, Taiwan)
The prosody of a sentence (utterance) in a discourse context differs substantially from its prosody when uttered in isolation. This talk focuses on why global prosody is an intrinsic part of naturally occurring speech. That is to say, prosodic chunking and phrasing occur not only at the sentence level, but also at the discourse level. Read and spontaneous L1 Mandarin speech data, as well as L1 and L2 English data, will be presented to illustrate our proposal that higher-level discourse information takes syntax, phonology and lexicon as sub-level units, and that hierarchical contributions add higher units to lower ones to derive multi-phrase global prosody. Traces of global prosody in lower-level speech units are abundant in the speech signal; their seemingly random occurrences can, in fact, be systematically derived. In the pitch domain, we will show evidence of down-stepping both within and across phrase boundaries, and explain why phrasal F0 resets are not uniform and why some overall F0 trajectories flatten out. In the temporal domain, we will show how speaking rate is adjusted mostly by and across phrases rather than by words, why pauses sometimes do not occur at boundaries, so that pause duration is not the most reliable boundary cue, and why both pre-boundary (phrase-final) lengthening and shortening are found consistently. Furthermore, examination of units larger than the sentence has revealed why prosodic context exhibits both neighborhood linear adjacency and cross-over associative concurrence, and why phrasal prominence must yield to discourse focus. It is argued here that the sentence is not the ultimate unit of speech planning, and that global prosody must take precedence because it more accurately reflects the size and scale of speech planning. The planning itself is highly flexible, however, as our L1 and L2 speech data reveal.
In summary, to better understand and model realistic speech, looking upward from the sentence level, or top-down from higher levels of prosodic organization, may produce the most interesting results.