InterSpeech 2018

From Vocoders to Code-Excited Linear Prediction: Learning How We Hear What We Hear

Bishnu S. Atal, ISCA medalist (Department of Electrical Engineering, University of Washington, Seattle)
It all started almost a century ago, in the 1920s. A new undersea transatlantic telegraph cable had been laid. The idea of transmitting speech over the new telegraph cable caught the fancy of Homer Dudley, a young engineer who had just joined Bell Telephone Laboratories. This led to the invention of the Vocoder; its close relative, the Voder, was showcased as the first machine to create human speech at the 1939 New York World's Fair. However, the voice quality of vocoders was not good enough for use in commercial telephony. While speech scientists were busy with vocoders, several major developments took place outside speech research. Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise. Linear Prediction, or Linear Predictive Coding, became a major tool for speech processing. Claude Shannon established that the highest bit rate in a communication channel in the presence of noise is achieved when the transmitted signal resembles random white Gaussian noise. Shannon's theory led to the invention of Code-Excited Linear Prediction (CELP). Nearly all digital cellular standards, as well as standards for digital voice communication over the Internet, use CELP coders. The success in speech coding came with an understanding of what we hear and what we do not: speech encoding at low bit rates introduces errors, and these errors must be hidden under the speech signal to become inaudible. More and more, speech technologies are being used in different acoustic environments, raising questions about the robustness of the technology. Human listeners cope well when the signal at our ears is not just one signal but a superposition of many acoustic signals. We need new research to develop signal-processing methods that can separate a mixed acoustic signal into its individual components and provide performance similar, or superior, to that of human listeners.
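The abstract mentions linear prediction only in passing. As a minimal illustration of the idea behind it (the function names and the toy signal below are my own, not part of the talk), a linear predictor models each sample as a weighted sum of its recent past; the classic way to estimate the weights is the autocorrelation method solved by the Levinson-Durbin recursion:

```python
# Hedged sketch: estimating linear-prediction (LPC) coefficients with the
# autocorrelation method and the Levinson-Durbin recursion. Names and the
# toy signal are illustrative, not taken from the talk.
import math

def autocorr(x, max_lag):
    # Autocorrelation r[0..max_lag] of the windowed signal x.
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    # Solve the Toeplitz normal equations; returns the polynomial
    # coefficients a = [1, a1, ..., a_order] of A(z) and the residual
    # (prediction-error) energy.
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient, |k| < 1
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k              # error energy shrinks at each order
    return a, err

# Toy signal: impulse response of a two-pole resonator with poles at
# 0.9*exp(+-0.3j), i.e. h[t] = 0.9**t * sin(0.3*(t+1)) / sin(0.3).
# A 2nd-order predictor fits this signal almost exactly.
h = [0.9 ** t * math.sin(0.3 * (t + 1)) / math.sin(0.3) for t in range(200)]
a, err = levinson_durbin(autocorr(h, 2), 2)
# a[1] is close to -2*0.9*cos(0.3) and a[2] to 0.81, recovering the
# resonator's denominator; err is close to the unit-impulse energy.
```

The by-product of the recursion, the reflection coefficients, is one reason this formulation became so useful in speech coding: the coefficients can be quantized and transmitted compactly, which is the starting point for predictive coders such as CELP.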

Bishnu S. Atal is an Affiliate Professor in the Electrical Engineering Department at the University of Washington, Seattle, WA. Born in India, Atal received his bachelor's degree in physics from the University of Lucknow, a Diploma from the Indian Institute of Science, Bangalore, and a Ph.D. in electrical engineering from the Brooklyn Polytechnic Institute. He joined Bell Laboratories in 1961, where he researched speech and acoustics until retiring in 2002. Atal holds more than 16 patents. Inspired by the high cost of long-distance phone calls to his family in India when he first moved to the U.S., Atal's research led to the invention of efficient digital speech coders and standards that lie at the heart of practically every mobile phone in use today. His work has enabled wireless networks to use less spectrum and fewer towers, allowing even countries without substantial fiber-optic infrastructure to join the mobile revolution. He is a member of the U.S. National Academy of Sciences and the National Academy of Engineering. His many honors include the IEEE Jack S. Kilby Signal Processing Medal (2013), the Benjamin Franklin Medal in Electrical Engineering (2003), the Thomas Edison Patent Award (1994), the New Jersey Hall of Fame Inventor of the Year Award (2000), and the IEEE Morris N. Liebmann Memorial Field Award (1986). Bishnu resides in Mukilteo, Washington. He has two daughters, Alka and Namita, two granddaughters, Jyotica and Sonali, and two grandsons, Ananth and Niguel.