InterSpeech 2020

The cognitive status of simple and complex models

Janet B. Pierrehumbert
(University of Oxford)
Abstract

Human languages are extraordinarily rich systems. They have extremely large lexical inventories, and the elements in these inventories can be combined to generate a potentially unbounded set of distinct messages. Regularities at many levels of representation, from the phonetic level through syntax and semantics, support people's ability to process mappings between the physical reality of speech and the objects, events, and relationships that speech refers to. However, human languages also simplify reality. The phonological system establishes equivalence classes amongst articulatory-acoustic events that vary considerably at the parametric level. The semantic system similarly establishes equivalence classes amongst real-world phenomena that vary considerably. The tension between simplicity and complexity is a recurring theme of research on language modelling. In this talk, I will present three case studies in which a pioneering simple model omitted important complexities that were either incorporated into later models or remain challenges to this day. The first is the acoustic theory of speech production, as developed by Gunnar Fant, the inaugural ISCA Medal recipient in 1989. By approximating the vocal tract as a half-open tube (closed at the glottis, open at the lips), it showed that the first three formants of vowels, which are the most important for the perception of vowel quality, can be computed as a linear systems problem. The second is the autosegmental-metrical theory of intonation, to which I contributed early in my career. It made the simplifying assumption that the correct model of phonological representation will support the limited set of observed non-local patterns while excluding non-local patterns that do not naturally occur. The third case concerns how word-formation patterns are generalised in forming new words, whether through inflectional morphology (as in “one wug; two wugs”) or derivational morphology (as in “nickname, unnicknameable”).
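To make the first case study concrete: in the simplest version of the half-open tube approximation, the vocal tract is a uniform tube closed at the glottis and open at the lips. Its resonances fall where the tube length is an odd multiple of a quarter wavelength, giving F_k = (2k - 1) c / (4L). The sketch below is only this limiting case, not Fant's full theory, which also handles non-uniform area functions. It assumes illustrative values of L = 17.5 cm and c = 350 m/s that are not taken from the talk, and the function name `tube_formants` is hypothetical.

```python
from typing import List


def tube_formants(length_m: float, speed_of_sound: float = 350.0, n: int = 3) -> List[float]:
    """Return the first n resonances (Hz) of a uniform tube closed at one end.

    Hypothetical helper for illustration: uses the quarter-wave formula
    F_k = (2k - 1) * c / (4 * L), i.e. the uniform-tube special case of the
    half-open tube approximation (assumed c = 350 m/s by default).
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * speed_of_sound / (4 * length_m) for k in range(1, n + 1)]


# Assumed adult vocal-tract length of 17.5 cm.
print(tube_formants(0.175))
```

With these assumed values the three resonances come out at approximately 500, 1500, and 2500 Hz, the textbook approximation to the formants of a neutral, schwa-like vowel.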
Several early models of word-formation assumed that morphemes are conceptual categories, sharing formal properties with other categories in the cognitive system. For all three case studies, I will suggest that, contrary to what one might imagine, the simple models enjoyed good success precisely because they were cognitively realistic. The most successful early models effectively incorporated ways in which the cognitive system simplifies reality. These simplifications are key to the learnability and adaptability of human languages. The simplified core of the system provides the scaffolding for more complex or irregular aspects of language. In progressing from simple models to fully complex models, we should make sure we continue to profit from insights into how humans learn, encode, remember, and produce speech patterns.

Janet B. Pierrehumbert is the Professor of Language Modelling in the Department of Engineering Science at the University of Oxford. She received her BA in Linguistics and Mathematics from Harvard in 1975 and her Ph.D. in Linguistics from MIT in 1980. Much of her Ph.D. dissertation research on English prosody and intonation was carried out at AT&T Bell Laboratories, where she was also a Member of Technical Staff from 1982 to 1989. At AT&T Bell Labs, she collaborated with 2015 ISCA Medalist Mary Beckman on a theory of tone structure in Japanese, and with 2011 ISCA Medalist Julia Hirschberg on a theory of intonational meaning. After she moved to Northwestern University in 1989, her research program used a wide variety of experimental and computational methods to explore how lexical systems emerge in speech communities. She showed that the mental representations of words are at once abstract and phonetically detailed, and that social factors interact with cognitive factors as lexical patterns are learned, remembered, and generalized.
Pierrehumbert joined the faculty at the University of Oxford in 2015 as a member of the interdisciplinary Oxford e-Research Centre; she is also an adjunct faculty member at the New Zealand Institute of Language, Brain, and Behaviour. Her current research uses machine-learning methods to model the dynamics of online language. She is a founding member of the Association for Laboratory Phonology, and a Fellow of the Linguistic Society of America, the Cognitive Science Society, and the American Academy of Arts and Sciences. She was elected to the National Academy of Sciences in 2019.