InterSpeech 2017

Re-inventing speech – the biological way

Björn Lindblom
Professor emeritus, University of Stockholm, Sweden; Professor emeritus, University of Texas at Austin, USA
The mapping of the Speech Chain has so far been focused on the experimentally more accessible links – e.g., acoustics – whereas the brain’s activity during speaking and listening has understandably received less attention. That state of affairs is about to change now thanks to the new sophisticated tools offered by brain imaging technology. At present many key questions concerning human speech processes remain incompletely understood despite the significant research efforts of the past half century. As speech research goes neuro we could do with some better answers. In this paper I will attempt to shed some light on some of the issues. I will do so by heeding the advice that Tinbergen once gave his fellow biologists on explaining behavior. I paraphrase: Nothing in biology makes sense unless you simultaneously look at it with the following questions at the back of your mind: How did it evolve? How is it acquired? How does it work here and now? Applying the Tinbergen strategy to speech I will, in broad strokes, trace a path from the small and fixed innate repertoires of non-human primates to the open-ended vocal systems that humans learn today. Such an agenda will admittedly identify serious gaps in our present knowledge but, importantly, it will also bring out an overarching possibility:

It will strongly suggest the feasibility of bypassing the traditional linguistic operational approach to speech units and replacing it with a first-principles account anchored in biology.

I will argue that this is the road-map we need for a more profound understanding of the fundamental nature of spoken language and for educational, medical and technological applications.

I began by studying for a medical degree but gradually my focus shifted to music and languages. Planning to make a living as a foreign language teacher I attended classes that happened to include two lectures on acoustic phonetics by Gunnar Fant at KTH in Stockholm. ‘Anyone interested in a summer job? We could use people with a linguistics background’. He then went on to describe the project. Although I cannot honestly say that I had understood much of the lectures, I volunteered and got lucky. I was completely blown away by the dynamics of the KTH lab and its research activities. This was the early sixties – the post-World War II era with lavish funding for communications and computer technology.

Later in life, I came across an anecdote about Richard Feynman, the famous physicist, who is said to have left the following formulation permanently on the blackboard of his office: ‘What I cannot create I do not understand!’

Bingo! Was he referring to the acoustic theory of speech production and copy speech synthesis? In a way, he could have been. More importantly, I believe, in this short phrase he managed to capture the ultimate essence of good science – general knowledge based on first principles. It has been at the back of my mind for over fifty years as I have studied how spoken language works on-line, how it is learned and how it came to be.

Applying the Feynman criterion to our own broad field shows that we still have a long way to go. There would be nothing wrong with embarking on that voyage equipped with the tools of Big Data and modern hi-tech neuroscience – on the contrary. But ultimately the quality of our applications – e.g., clinical, educational – will be a function of how well we really understand how humans do it.

End of sermon. Chop, chop.