InterSpeech 2021

Intonation Transcription and Modelling in Research and Speech Technology Applications

Cong Zhang, Amalia Arvaniti, Kathleen Jepson, Katherine Marcoux
This tutorial covers the theory and practical applications of intonation research. The following three topics will be introduced to speech technology engineers and researchers new to the field of intonation and prosody:
a. the fundamentals of the autosegmental-metrical theory of intonational phonology (AM), a widely accepted phonological framework of intonation;
b. a range of automatic and manual annotation methods that can create fast or detailed transcriptions of prosody;
c. state-of-the-art modelling techniques for explaining intonation.

Organizers

Amalia Arvaniti, Kathleen (Katie) Jepson, Cong Zhang, Katherine Marcoux

Amalia Arvaniti

Amalia Arvaniti is the Chair of English Language and Linguistics at Radboud University, Netherlands. She received her Ph.D. from the University of Cambridge (1991) and has since held appointments at the University of Kent (2012-2020), the University of California, San Diego (2002-2012), the University of Cyprus (1995-2001), and the University of Oxford (1991-1994). She has published extensively on prosody, particularly on the phonetics and phonology of intonation, and the nature and measurement of speech rhythm. Her research is currently supported by an ERC Advanced grant titled SPRINT, which investigates the role of variation in the phonetics and phonology of the intonation systems of English and Greek. Amalia was co-editor and then editor of the Journal of the International Phonetic Association (2014-2015 and 2015-2019 respectively). She also serves on the editorial boards of the Journal of Phonetics, the Journal of Greek Linguistics, and the Studies in Laboratory Phonology series of Language Science Press; from 2000 to 2020 she was also on the editorial board of Phonology. She is currently the President of the Executive Permanent Council for the Organisation of the International Congress of Phonetic Sciences (2019-2023).

Kathleen Jepson

Kathleen Jepson is a postdoctoral researcher on Amalia Arvaniti’s ERC-funded SPRINT project, based at Radboud University. She received her Bachelor’s degree (Honours) from the Australian National University in 2013, and her PhD in Linguistics from the University of Melbourne in 2019. Kathleen’s doctoral research, supervised by Prof. Janet Fletcher, Dr. Ruth Singer, and Dr. Hywel Stoakes, was a description of aspects of the prosodic system of Djambarrpuyŋu, an Australian Indigenous language. She has experience in collecting data for prosodic analysis in remote locations and in developing analyses of under-described languages. Kathleen’s research interests include the production and perception of prosody, particularly intonation, as well as language description of under-resourced languages in Australia and the Pacific region.

Cong Zhang

Cong Zhang is a postdoctoral researcher on the ERC-funded SPRINT project at Radboud University. She is in charge of collecting, analysing, modelling, and interpreting the English intonation data. Cong received her DPhil degree from the University of Oxford in 2018, with a thesis examining the interaction of tone and intonation in Tianjin Mandarin. Following her DPhil, she worked as a TTS Linguistics Engineer at A-Lab, Rokid Inc., where she led a project to develop a Singing Voice Synthesis system. Cong’s research covers various aspects of speech prosody; she is also interested in bridging the gap between linguistic theory and speech technology.

Katherine Marcoux

Katherine Marcoux is the lab manager of SPRINT. She assists with various aspects of the research process, mainly focusing on data analysis. Marcoux completed her MSc at the Universitat Pompeu Fabra, after which she began her PhD thesis at Radboud University investigating the production and perception of native and non-native Lombard speech. She is currently finalizing her doctoral manuscript.