Odyssey 2016

The Speaker and Language Recognition Workshop

Understanding individual-level speech variability: From novel speech production data to robust speaker recognition

Shri Narayanan
The vocal tract is the universal human instrument, played with great dexterity and skill in the production of speech to convey rich linguistic and paralinguistic information. How individuals differ in their speech articulation due to differences in the shape and size of their physical vocal instrument, and what the acoustic consequences of those differences are, is not well understood. Knowledge of how people differ in their speech production can help create improved automatic speaker recognition technologies, as well as inform the design of technologies for robust speech-based access to people and information. The talk focuses on steps toward advancing scientific understanding of how vocal tract morphology and speech articulation interact, and how they explain the variant and invariant aspects of speech signal properties across talkers. Of particular scientific interest is the nature of the articulatory strategies that individuals adopt to achieve phonetic equivalence despite structural differences among them. Of equal interest are which aspects of vocal tract morphological differences are reflected in the acoustic speech signal, how they are reflected, and whether those differences can be estimated from speech acoustics. A crucial part of this goal is to create forward and inverse computational models that relate vocal tract details to speech acoustics, shedding light on individual speaker differences and informing the design of robust speaker recognition technologies. Speech research has mainly focused on surface acoustic properties of speech, and open questions remain about how speech properties co-vary across talker, linguistic, and paralinguistic conditions. However, there are limits to what can be uncovered about the underlying details from the acoustic signal alone.
This talk will describe efforts to directly investigate the dynamic human vocal tract using novel magnetic resonance imaging techniques and computational modeling, in order to illuminate inter-speaker variability in vocal tract structure as well as the strategies by which linguistic articulation is implemented. Applications to speaker modeling and recognition will be presented.