Well, the main thing I'm grateful for
is the award and this wonderful medal. It's an
amazing honor.
It's particularly pleasing to me because I love this community. I love the Interspeech
community and the Interspeech conferences.
Some people in the audience, I don't know who ??, but she knows particularly that
I'm particularly proud of my ISCA,
previously ESCA, membership number being thirty.
And here is a list of the conferences in the Interspeech series starting
with the predecessor of the first Eurospeech and it was the meeting in Edinburgh in
All of the Eurospeech conferences and all the ICSLP
conferences and since 2000 the Interspeech conferences,
and the one ?? come read and the one I was actually at.
And there are another four where you find my name in the program as
co-author or member or area chair.
And so that's only three of them
that, you see, I have nothing to do with: Geneva, Pittsburgh and Budapest.
I have actually been to
Pittsburgh and I've been to Geneva.
Pity about Budapest.
Such a lovely city and I'll probably never get the chance. I missed it in
However I love these conferences
it's the interdisciplinary nature that I particularly
You heard from the introduction that some
interdisciplinary is
... well it's heart of psycholinguistics
that we're the interdisciplinary undertaking.
But I loved the idea from the beginning of bringing all the speech communities together
in a single organization and
single conference series.
I think the founding fathers of the organisations, the founding
members of Eurospeech
quite a broad theme there
and the founding
father or founding fellow, because we
never knew who it was, for ICSLP that was Hiroya Fujisaki.
These people were visionaries
and the continuing success of this conference series is a tribute
to their vision.
in the 1980s, early 90s,
and that's
why I'm very proud to be part of this
community, this interdisciplinary community.
I love the conferences and I'm just tremendously grateful
for the award of this medal, so thank you very much to everybody.
Back to my title slide.
I'm afraid it's a little messy,
for there are all my affiliations on that. Tanja
already mentioned most of them. You would think, wouldn't you, that
the various people involved would at least have chosen the same shade of red.
Down on the right-hand side is my primary affiliation at the moment,
the MARCS Institute and the University of Western Sydney. My previous European
affiliations, where I still have an emeritus position, are on the
left at the bottom.
The upper layer of logos there
I want to call your attention to for a practical reason.
So on the right is the Centre of Excellence for the Dynamics of
Language, which is
an enormous grant actually; it's the big prize in the Australian
grant landscape,
and this is
gonna run for
seven years. It's just started. In fact, if I'm
not mistaken, it's actually today; it's the first
day of its operation. So it was just awarded, we've just been setting it up
over the last six months and it's starting off today.
And it's a grant worth some 28 million Australian Dollars over seven years
And on the left of that is another big grant
running in the Netherlands; it's been going for about a year
and a half now:
Language in Interaction,
and that's a similar kind of undertaking and again it's 27 million euros
over a period of ten years.
It is remarkable
that two
government organizations, two government research councils, on different sides of
the world, more or less simultaneously saw it was really important to stick some serious money
into language research, speech and language research.
Okay, now the practical reason that I wanted to draw
your attention to these two is that they both have websites.
If you have
bright undergraduates looking for a PhD place
at the moment, please go to the Language in Interaction website, where every
six months for at least the next six years there will be
a bunch of new PhD positions advertised.
We are looking worldwide for bright PhD
candidates. It's being run mainly as a training
ground, so there are mainly PhD positions on this grant.
And on the right, if you know somebody who's looking for a postdoc position, we are
about to, in
the Centre of Excellence, about to advertise a very large number of postdoctoral positions. Most
of them require a linguistics background,
but please go and look at that website
too, if you or your students or anybody you know
is looking for such a position.
Onto my title, Learning about speech: why did I choose that?
As Tanja
rubbed in
there weren't many topics that I could have chosen.
In choosing this one
I was guided by first looking at the abstracts for the other keynote
talks in this conference.
And I discovered that there is a theme
two of them actually have learning in the title, two out of the others.
And all of them address some
form of learning about speech and I thought well okay
it would be really useful
in the spirit of encouraging the interdisciplinary communication and integration across the various
Interspeech areas,
if I took
the same kind of
general theme
and started
by sketching what I think are
some of the most important basic attributes
of human learning about speech. Namely:
that it starts at
the very earliest possible moment,
no kidding,
I will illustrate that in a second;
that it
actually shapes the
processing, it engineers
the algorithms that
are going on in your brain,
that is, that the speech you learn about
sets up the processing that you're going to be using for the rest of your
life. This
also was foreshadowed in what Tanja just told you about me.
And it never stops, it never stops learning.
so onto
the first part of that.
So let's listen to something.
Warning: you won't be able to understand it.
Well, at least I hope not.
Okay, I see several people in the audience
making ...
movements to show that they have understood what was going on.
What we know now is that
infants start learning about speech as soon as the auditory system that they have
is functional.
And the auditory system becomes functional in the third trimester of a mother's pregnancy.
That is to say,
for the last three months before you are born you are already listening
to speech.
When a baby is born,
the baby already shows a preference for the native language over another language. Very likely you
can't tell a difference between individual languages; for instance, it's known that you can't tell
the difference between Dutch and English on the day you are born.
But you have a preference, if you were
exposed to an environment speaking one of those languages, for that kind of language.
So what did you think
was in that audio that I just played, I mean, what did it sound like?
Speech, right? But
what else could do ... What language was that?
Do you have any idea?
What language might that have been?
Was it Chinese?
I think that this is an easy question for you guys, come on.
Well, were they speaking Chinese in that? No!
Yeah, it was English, it was Canadian English actually, so
the point is you can't and the baby can,
before birth.
That's a recording taken from a Canadian team which did the
recording in the womb
at about eight and a half months to nine
months of pregnancy, right? So you can put a little microphone in.
Let's not think
too much about this.
You can actually make a recording within a womb and that's the kind of
audio that you get. So that kind of audio is
presented to babies before they're even born and so that's why
they get born with a preference, with knowing something about the general shape
of the language. So you can tell that's a stress-based language, right?
That was a stress-based language you were listening to.
Learning about speech starts as early as possible.
We also know now, another thing that many people in this audience would know, that
actually infant
speech perception is one of the most rapidly
growing areas in speech processing,
in speech research overall at the moment.
When I set up
a lab 15 years ago in the Netherlands, it was the first modern
infant speech perception lab in the Netherlands; now there are half a dozen.
And
PhD students who graduate in this topic have no trouble finding a position. Everybody in
the U.S. is hiring; every psychology and linguistics
department has to have somebody doing infant speech perception at the moment.
Good job.
Good place
for students to get into.
But the
recent explosion of research in this area
has meant that
we've actually overturned some of the initial ideas that we had in this area, so
we now know that
it is
learning that's really grounded in social communication. It's
these social interactions with the caregivers that ...
the child to continue learning.
We also know that
we don't teach individual words to the babies.
In this very early period they're mainly exposed to continuous speech input and they learn
from it.
We also know that constructing vocabulary and phonology goes together.
It was first thought, because of the results that we had, that you had to
learn the
phoneme repertoire of your language first and only then could you start building a vocabulary.
Successful building of vocabulary is slow, but nevertheless the very first
access to meaning can now be shown
as early as the very first access to ...
And the latest,
also from my colleagues in Sydney, is that part of the,
sorry, you know how there's a kind of speech
called Motherese, the special way you talk to babies.
You know, you see a baby and you start talking in a special way, and
it turns out
that part of this is under the infant's control. It's the infant who
elicits this kind of speech by
responding positively to it and
also trains
caregivers to stop doing or
to start doing one kind of
speech with enhanced phoneme contrasts and then stop doing that later and start doing
individual words and so on. So that's all under the baby's control.
So what we
tried to do in the lab that I set up in
Nijmegen, the Netherlands, some fifteen years ago was to use
the techniques, the electrophysiological techniques
of brain sciences, so using event-related potentials in the infant brain
to look at
the signature of word recognition in an infant brain. That's what we were looking for.
We decided to go
and look for what word recognition looks like
in an infant's brain.
And we found it.
So here's an infant in our lab,
sweet, right?
You don't have to stick the electrodes on their heads
separately, we just have a little cap;
they are quite happy to wear a little cap.
And so
what we usually do is
familiarize them with speech, so it could be words in isolation or it could be ...
and then we
play some
speech, as it might be
continuous sentences containing
the words that they've already heard or containing some other words.
What we find is a particular kind of response, this is the word recognition response,
a negative
response to familiarized words compared to the
unfamiliarized words.
It's in the left side of the brain.
This is word onset, it's the word onset here, right,
and you'll see it's about
half a second after
word onset.
And so this is the word recognition effect that you can see
in an infant's brain.
We know,
as I said, that in the first year of life
infants mainly hear
continuous speech.
Okay, so they're able to learn words from continuous
speech and so in this experiment
we only used continuous speech.
And this was with ten-month-old infants. Now, they don't have to understand any of
this, and you don't have to
understand it either; it's in Dutch.
It's just
showing what they were like, so that in a particular trial
you'd have
eight different
sentences and all the sentences have one word in common,
and this is the word drummer, which happens to be drama, right?
Then you switch to hearing four ...
later on.
The trick is that of course all of these things can occur in pairs, so
for every infant
that hears eight sentences with drummer,
right, there's gonna be another
infant that's gonna hear eight sentences with fakirs.
So then you have two each of these sentences and what you expect is that
you get a more
negative response to whichever word you have actually
already heard,
and that's exactly what we found. This one has just been published, as you see.
And so what we have is the proof that
just exposing
infants to a word in a continuous speech
context is enough for them to recognize that same word form.
And now, they don't understand anything at ten months,
right, they are not understanding anything about it. They're pulling
words out of continuous speech
at this
early age.
Now this is,
given the fact that
input to infants is mainly continuous speech,
of course vital, that they can do this, right? And another
important finding that has come from this series of
experiments using the infants' word recognition effect
is that it predicts
your later
language performance
as a child,
right? So
if you're showing
that negative-going response that I've just talked about already
at seven months, which is very early,
if it's a nice big effect that you get, a big difference,
and if it's a nice clean
response in the brain ...
For instance, here
I've sorted
groups of infants
which had a negative response at the age of seven months
or in the same experiment did not have a negative response.
And at age three
look at their comprehension scores, their sentence production scores, their vocabulary size scores.
The blue guys, the ones who showed that word recognition effect in continuous speech
at age seven months already, are
much better. So it's vital for your
later development of
speech and language competence.
Here is an actual
participant-by-participant correlation
between the size of the response,
so remember that we're looking at a negative response, so
the bigger it is down here, right, the more negative it is,
the bigger your scores
in the number of words you know at age one or the number of words
you can speak
at age two. Both correlate significantly, so this is really important.
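As a rough illustration of the participant-by-participant correlation described here, the sketch below computes a Pearson correlation between an ERP amplitude and a later vocabulary score. The function name `pearson_r`, the variable names and every number are assumptions invented for illustration; they are not data from the study, and the talk does not say which correlation statistic was used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, one entry per infant (NOT real data):
# ERP difference at seven months in microvolts (more negative = bigger effect)
erp_amplitude_uv = [-4.0, -3.0, -2.0, -1.0, 0.5, 1.0]
# number of words known at a later age
words_known = [60, 48, 45, 30, 22, 20]

r = pearson_r(erp_amplitude_uv, words_known)
print(f"r = {r:.2f}")  # a negative r means: more negative response, larger vocabulary
```

Because the response of interest is negative-going, the relation reported in the talk would show up as a negative coefficient in a computation like this one.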
Okay, so starting early,
listening actually just to real continuous speech,
recognizing that what it consists of is
items that you can pull out of that speech signal and store for later use,
that is,
setting up a vocabulary, then,
and starting early on that,
really launches your
language skill.
And we're currently working on just how long
that effect lasts.
So the second
major topic
that I want to talk about is how learning shapes processing.
You'll know already from Tanja's introduction that this has actually been the
theme of my research for the last,
well, I don't think we'll go into how many years it is now,
for a long time.
And I could easily stand here and talk for the whole hour about this topic,
or I could talk for a month about this topic alone, but I'm not going
to. I am going to take one particular
really cool,
very small
example of how it works.
So the point is that
the way you actually deal with the speech signal,
the actual processes that you apply,
are different
depending on the language you
grew up speaking, your primary language, right? So those of you out there
for whom English is not your primary
language, you're gonna have different
processes going on
in your head
than what I have.
I'm gonna take this really tiny
form of processing. So you take a fricative sound, right, s or f.
Now these are pretty simple sounds.
How do we recognise, how do we identify
a sound,
right? For these fricatives, do we actually just use
the frication noise,
which is different for sss and fff?
You can just hear the difference:
sss has high-frequency energy, right?
fff is lower.
Or do we analyze the surrounding
information in the vowels? Well, there is always transitional information in any speech
signal between sounds. So are we using this in identifying s and f?
Maybe we shouldn't, because s and f are
tremendously common
sounds across languages and their pronunciation is very similar across languages, so we'd probably
expect them to be processed in much the same way across languages.
But we can always test whether
vowel information is used, in the following way.
You ask:
is it going to be harder
to identify a particular sound,
this works for any sound, right, now we are talking about s and f,
if you insert it into a context that was originally uttered with another sound?
So in the experiment I'm gonna tell you about
your task is just to detect a sound that might be s or f in
this experiment.
And it's gonna be nonsense you're listening to, so
dokubapi pekida tikufa,
right, and your task would then be to press the button when you hear f,
as in the sound of
f in tikufa.
And the crucial thing is that every one of those target
sounds is gonna come from another recording, every one of them,
and it's gonna be either another
recording which
originally had the same ...
The f in tikufa is either gonna have come from another utterance of tikufa,
or
the tiku_a is gonna come from tikusa
and have the f put into it, right? So you're going to have
mismatching vowel cues if it was originally tikusa
and congruent vowel cues if it was another utterance of tikufa.
Now some of you who teach speech science may recognise
this experiment because it was originally ... it's a very old experiment.
Anybody recognise it?
It was originally published in 1958, right? Really old experiment.
First done with American English,
and the result was very surprising because what
was found was different for f and s.
In the case of f,
if tiku_a was originally tikusa,
if you put the f
into a different context, it was much harder to detect it,
whereas if you did it with the s there was zero effect
of the cross-splicing. No effect whatsoever for s.
But a big effect for f.
So listeners were only using vowel context for f but they weren't using it for
s, right? And so this
just seemed like a bit of a puzzle at the time. But you know, in 1958 ...
These old results have been
in the textbooks for years, you know. It's in the textbooks.
And the explanation was, well, you know, it's the high-frequency energy in s
that makes it clearer,
so you don't need to listen to anything else in the vowels, you can just do
s on the frication noise
alone, but f is not so clear, so you need something else.
As you will see ...
I'm going to tell you about some thesis work of my student A. Wagner
a few years ago.
And she first replicated this experiment, so what I'm gonna plot up here is
the cross-splicing effect
for f minus the effect for s,
right, so
you know that
there's a bigger effect for f
than there is for s, we just saw that, right?
And so she replicated that, right. The original one was American English; she did it
with British English and got exactly the same
effect, so a
huge effect for f and very little effect for s.
So the size of the effect for f is bigger.
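The plotted quantity, the cross-splicing effect for f minus the effect for s, can be sketched as a simple difference of differences. This is only an illustration: the helper names `splicing_cost` and `f_minus_s`, the use of mean response time as the difficulty measure, and all the numbers are assumptions, not values from the thesis work.

```python
def splicing_cost(condition):
    """Cost of cross-splicing for one target sound:
    score with mismatching vowel cues minus score with matching cues."""
    return condition["mismatch"] - condition["match"]

def f_minus_s(results):
    """Cross-splicing effect for f minus the effect for s."""
    return splicing_cost(results["f"]) - splicing_cost(results["s"])

# Hypothetical mean detection times in milliseconds (NOT real data).
hypothetical_ms = {
    "f": {"match": 500, "mismatch": 580},  # f suffers when vowel cues mismatch
    "s": {"match": 480, "mismatch": 485},  # s is barely affected
}

difference = f_minus_s(hypothetical_ms)
print(difference)  # positive: larger effect for f; negative: larger effect for s
```

On this measure, a value near zero would mean the two sounds are affected about equally, and a negative value would mean the effect is larger for s than for f.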
And she did it in Spanish and got exactly the same result.
So it's looking good for the original hypothesis, right?
And then she did it in Dutch.
In fact there was no effect for either s or f in Dutch,
or in Italian, she did it in Italian,
or in German, she did it in German.
So okay.
Audience response time again, right? So I missed that,
I didn't tell you one crucial bit of information here.
The Spanish listeners were in Madrid,
so this is Castilian Spanish.
So what do English,
think now,
what do English
and Castilian Spanish have
that Dutch and
German and
Chinese or whatever languages don't have?
You're good, you're really good.
That's right.
So here, this is the reason you think the original explanation
?? that s is clearer
accounts for the results for English and Spanish, but doesn't account for the results for
Dutch and
Italian and German, right? But
the explanation that
you need extra information for f
because it's so like θ, right? Because f and θ are about the most confusable
phonemes in any phoneme repertoire,
as the confusion matrix of English certainly shows us.
So you need the extra information for f just because there is another sound in
your phoneme repertoire which it's confusable with.
But how do you test that explanation?
You need,
now you know I'm not gonna ask you to guess what's coming
up, right, because you know it if you are looking at the slide,
but you need a language
which has a lot of different s sounds, right?
Because then the effect should reverse
if you find a language with a lot of other sounds like s,
and yes, Polish is such a language.
Then what you should find in that cross-splicing experiment is that
you get a big effect
for mismatching vowel cues for s
and nothing much for f, if you don't also have θ in the ...
And that's exactly what you find in Polish.
Very nice result. How cool is that, to overturn the textbooks in your PhD?
So we listen to different sources of information in different
languages, right? So we learn to process the signal differently.
Even though s and f are really articulated much the same across languages, in Spanish
and English you
have fricatives that resemble f and in Polish
you have fricatives that resemble s, so you have to pay
extra attention to surrounding,
well, it helps to pay extra attention to surrounding
speech information to identify them.
The information that surrounds
intervocalic
consonants is always going to be there. There is always information in the vowel, which
you only use
if it helps you.
Onto the third
point that I want to make.
Learning about speech
never stops.
Even if we were only to speak one language,
even if we knew every word of that language, so we didn't have to learn
any new words,
even if we always heard speech spoken in clean conditions,
there's still learning to be done, especially whenever we meet a new
talker, which we can do every day. Especially at a conference.
When we do meet new talkers, we adapt quickly.
That's one of
the most robust findings in human speech recognition, right? We have no problem walking into
a shop
and engaging in a conversation with somebody behind the counter we've never spoken to before.
And this kind of talker adaptation also begins very early
in infancy
and it continues through ...
As I already said,
you know about
particular talkers; you can tell your
mother's speech from other
talkers at birth.
So these experiments that people do at birth, right, I mean it's literally within
the first couple of hours after an infant is born. In some labs they are
presenting them with speech to see
if they show a preference. And they show a preference by sucking
harder to keep the,
you've got a pacifier with a transducer in it, and
keep the speech signal going, and you find
that infants will suck longer to hear their own mother's voice than other voices.
But when do they,
when do they tell the difference between
talkers, so you have new talkers, when can an infant
tell whether,
whether they're the same or not?
Well, you can test discrimination easily
in infants, right.
And it's a habituation test method that we use.
So what you do is you have a baby sitting on
the caretaker's, the mother's lap.
And mother's listening to something else, right. You bring in a music tape or something,
so mother
can't hear what the baby is hearing.
The baby is hearing speech coming over ...
and is looking at a pattern on the screen which ...
and if they look away the speech will stop.
What happens is you
play them
a repeating
stimulus of some kind, so
in this experiment that I'm gonna talk about, the repeating stimulus is just
some sentences that they wouldn't understand
being spoken by
three different speakers, interchanging. One speaker will say
a sentence and the next one will say a couple of sentences and the first
one will also say a couple of sentences
again and the third speaker also says a sentence. These are just sentences that the babies can't
actually understand.
These babies are actually seven months old. Younger than the baby in the picture there.
So as the
stimulus keeps repeating, the infant keeps listening, right.
And the stimulus keeps repeating,
and the infant keeps listening,
and the stimulus keeps repeating,
and eventually the baby gets bored and looks away, right.
And at that point
you change the input.
And then you wanna know, and that's the way you test discrimination, does the
baby look back? Right.
Look back at the screen and perk up,
okay, and continue to look at
the screen and thereby keep the speech going.
These were seven-month-olds as I said, so really they don't understand anything, like
no words yet.
Maybe they recognise their own name, that's about it.
And we have
got three different voices, three different
young women
that have reasonably similar voices
talking away and saying sentences that are, you know, way beyond seven-month-olds' comprehension,
like: Artists are attracted to life in the capital.
And then at the point at which the infant
loses attention you bring in a fourth voice,
a new voice, and the question is: does the infant notice?
So these are Dutch babies. This was run in Nijmegen.
And yes, they do.
They really do notice the difference, right.
As long as it's in Dutch.
We also did the experiment with four people talking in Japanese,
four people talking in Italian,
and there was no significant
discrimination in that case. So it's only in the native language, right. That is to
say the
language of the environment that they have been exposed to.
This is important because it's not
whether speech is understood that's going on here, it's whether the sound is familiar, because what
infants are doing between six and nine months is
they're building up their knowledge of the phonology of
their language and building up their first
store of words.
And then, this is important. Some of you probably know the literature from forensic
speech science on this and you know that
if you're trying to do a voice lineup and pick a speaker you heard in
a criminal context or something, and that speaker is speaking a language you don't know very well,
you're much poorer at making a judgement than if they're speaking
the same language as your native language.
This appears to be based on exactly
the same
basic phonology
adjustment
that we see happening in the first year of life.
And we can do it in the laboratory. We can show adaptation
to new talkers
and strange speech sounds
in a perceptual learning experiment that we first
ran about eleven years ago
and that has been replicated in many languages and in many labs around the world since.
And in this paradigm what we do is we start with a learning phase, right.
Now there are many different kinds of things you can do in this learning phase,
but one of them is
to ask people to decide, they're listening to individual
tokens and you ask them to decide:
is this a real word or not?
And that's called a lexical decision task, right.
So here's somebody doing lexical decision and
they're hearing: cushion,
astopa, fireplace; fireplace, yes, that's a word; magnify, yes;
heno, no, that's not a word; devilish, yes; defa, no, that's not a word; and
so on, just going through pressing the button.
Yes, no, yes, no and so on.
Now the crucial thing in this experiment that we're doing
is that we're changing one of the sounds
in the experiment.
And we're gonna stick with s and f here, just to keep things simple,
but again we've done it with a lot of different sounds.
If you
for instance had a
sound that was halfway between s and f,
we create a sound along a continuum between s and f that's halfway in between, in
the middle,
and we stick it on the end of a word which would've been giraffe
but then that sounds like ...
No, like here.
Can you hear that it's a blend of f and s?
And there are a dozen other words in the experiment
which all should have an f in them;
if they had an s it would be a non-word. So we expose
a group of people to learning that
the way this speaker says f
is this strange thing which is a bit more s-like.
Meanwhile there's another group
that's doing the same experiment,
and they're hearing things like this.
That's exactly the same sound at the end of what should be horse.
Right, so they have been trained to
hear that particular strange sound and identify it as s,
where the other group identifies it
as f, right.
0:41:06And then you do a standard phoneme categorization experiment,
0:41:12right. Where what everybody hear is exactly the same continue
0:41:28and some of them were better s and some of them were better f,
0:41:32but none of them are really good s but the
0:41:36the point is that
0:41:38you make a
0:41:40categorization function out of an experiment like that, right, which goes from one
0:41:45of those sounds to the other
0:41:47and you would normally,
0:41:49under normal conditions get
0:41:52a baseline categorization function like the one shown up there,
0:41:57but
0:41:59if your f category was expanded
0:42:01you might get that function, and if your s category was expanded you might get
0:42:06that function. Okay, so
0:42:07that's what we're gonna look at
0:42:09as a result of
0:42:10our experiment, which just took one group of people and expanded their f category, and another
0:42:14group of people
0:42:15and expanded their s category, and that's exactly what you get:
0:42:21completely different functions for identical continua.
0:42:26Okay, so we exposed these people to a changed sound in just a few words;
0:42:32so we had
0:42:35up to twenty words in our experiments, but people were
0:42:37tested on many fewer words and obviously
0:42:40in real life with a new talker it probably works with one.
0:42:48It only works if you could work out what the sound was
0:42:51supposed to be, right, and with real words. So if we did
0:42:54the same thing with non-words there's no significant shift; those are both exactly
0:42:59equivalent to the baseline function.
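The shift between the two trained groups can be pictured with a small sketch. This is only an illustration and not the analysis from the experiments: it assumes the categorization function is a logistic curve over a seven-step f-to-s continuum, and the names `prob_f`, `BASELINE_BOUNDARY`, and `SHIFT`, along with every numeric value, are hypothetical.

```python
import math

# Hypothetical 7-step continuum: step 0 is the most f-like sound, step 6 the most s-like.
STEPS = range(7)
BASELINE_BOUNDARY = 3.0  # assumed category boundary at the continuum midpoint
SHIFT = 1.0              # assumed size of the boundary shift after exposure


def prob_f(step, boundary, slope=1.5):
    """Probability of an 'f' response, modeled as a logistic function of the step."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def categorization_curve(boundary):
    """Proportion of 'f' responses at every step for a given boundary."""
    return [prob_f(step, boundary) for step in STEPS]


baseline = categorization_curve(BASELINE_BOUNDARY)
# Listeners who heard the ambiguous sound in f-words: the f category expands,
# so the boundary moves toward the s end of the continuum.
f_trained = categorization_curve(BASELINE_BOUNDARY + SHIFT)
# Listeners who heard it in s-words: the s category expands instead.
s_trained = categorization_curve(BASELINE_BOUNDARY - SHIFT)

for step, (b, f, s) in enumerate(zip(baseline, f_trained, s_trained)):
    print(f"step {step}: baseline={b:.2f}  f-trained={f:.2f}  s-trained={s:.2f}")
```

In this toy model the same ambiguous midpoint step gets more f responses from the f-trained group and fewer from the s-trained group, while a group with no boundary shift (as with the non-word exposure) stays on the baseline curve.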
0:43:02So that's basically what we're doing.
0:43:05Adapting to talkers we just met by adapting our phoneme boundaries
0:43:11especially for them.
0:43:13Now this as I've already said
0:43:18has spawned a huge number of follow-up experiments, not only in our lab.
0:43:23We know that it generalizes across the vocabulary; you don't have to
0:43:27have the same sound in a similar
0:43:33We know that lots of different kinds of exposure
0:43:37can bring about the adaptation;
0:43:40it doesn't have to be a lexical decision task, you don't have to be making any decision
0:43:44about the word,
0:43:45you can just have passive exposure, you can have
0:43:47nonsense words if phonotactic
0:43:51constraints force you to
0:43:54choose one particular sound.
0:43:57And we know that it's pretty much speaker-specific,
0:44:01that is, at least the adjustment is bigger for the speaker you actually heard,
0:44:06and we've done it across many different languages and I brought along some results
0:44:11from Mandarin, because Mandarin gives us something really beautiful.
0:44:16Namely that you can do the same
0:44:18adjustment, the same
0:44:21experiment with segments and with tones, right.
0:44:24Different kinds of speech sounds, as I said, not just
0:44:29the same segments that I used in that
0:44:32experiment, but here they are again, f and s in Mandarin. Same result.
0:44:38Very new data.
0:44:39And there is the result when you do it with tone one and tone two
0:44:43in Mandarin exactly the same way. Make an ambiguous stimulus halfway between tone one and
0:44:49tone two.
0:44:50And you get the same adjustment.
0:44:54You can
0:44:57use this
0:44:59kind of adaptation
0:45:01effectively in a second language which is good.
0:45:06At least
0:45:07in this experiment by colleagues of mine in
0:45:12Nijmegen. Using the same Dutch input, with Dutch listeners you get
0:45:16exactly the same shift, right.
0:45:19German students, now German and Dutch are very close languages, and the German students come to
0:45:26study in the Netherlands, in Nijmegen. They take, imagine this, the rest of
0:45:32you who've gone to study in another
0:45:36country, you know, which doesn't speak your L1 (first language).
0:45:41They take a course for five weeks,
0:45:44a course in Dutch for five weeks and at the end of that five weeks
0:45:48they just go into the lectures
0:45:49which are in Dutch
0:45:50and they're just treated like anybody else
0:45:55in the,
0:45:56so that's how long it takes to learn,
0:45:58to get up to speed.
0:46:00If you're German, that's how long it takes to get up to speed with
0:46:04Dutch, okay.
0:46:05So not surprisingly
0:46:07huge effect, the same effect, the same
0:46:14with German students in the Netherlands. I have to say that I'm actually, this is
0:46:20this is my current research, one of my current research projects
0:46:24and the news isn't a hundred percent good on this
0:46:27topic after all, because I brought along some data
0:46:32which is actually just from a couple of weeks ago, we've only just got it in,
0:46:37and this is
0:46:42adaptation in two languages,
0:46:45in the same individuals. Now you've just seen that graph.
0:46:48That's the Mandarin listeners doing the task in Mandarin
0:46:53and what I'm trying to do in one of my current projects
0:46:58is look at the processing
0:47:01of different languages by the same person,
0:47:05right. Because I want to track down
0:47:08what is the source of native language listening advantages in
0:47:12various different contexts, and so what I'm trying to do now is look at the
0:47:18same people
0:47:19doing the same kind of task.
0:47:23It might be listening to noises, it might be perceptual learning for speakers and so on,
0:47:28in their different languages.
0:47:30So here are the same Mandarin listeners
0:47:33doing the English experiment.
0:47:39Not so good.
0:47:42it looks
0:47:42and these were tested in China so
0:47:47it was,
0:47:49they are not in an immersion situation, it is their second language and they are living
0:47:54in their
0:47:54L1 environment, so that's not quite
0:47:57as hopeful as
0:48:00as the previous
0:48:05study. However, one thing we know
0:48:08about this adaptation to talkers: we've already seen that discrimination
0:48:13between talkers is something that even seven-month-old listeners can do, so what about
0:48:19this kind of
0:48:22lexically based adaptation to strange pronunciation? We decided to test this in children.
0:48:29We couldn't really use a
0:48:35lexical decision experiment, because you can't really ask kids, they don't know a lot of
0:48:42So we did a picture verification experiment with them.
0:48:46A giraffe, and the one on the right is a platypus, right.
0:48:49So the first one ends with the f and the second
0:48:52one ends with the s. We're doing the s/f thing again.
0:48:57And then we had a name continuum
0:49:02for our
0:49:04phoneme categorization, so again you don't want to be asking young kids to
0:49:09decide whether they're hearing f or s, it's not a natural
0:49:12task, but if you teach them that the guy on the left is called Fimpy
0:49:16and the guy on the right is called Simpy
0:49:19and then you give them something that's halfway between Fimpy and Simpy, right,
0:49:25then you can
0:49:27get a phoneme categorization experiment and we first of all had to validate
0:49:33the task with adults. Needless to say, we did not
0:49:37have to do,
0:49:39the adults could just press a button.
0:49:42So they didn't have to point to the character and so on.
0:49:45But we get the same shift again for the adults
0:49:50and we get it with twelve-year-olds and we get it with six-year-olds,
0:49:54and the important difference between twelve-year-olds
0:49:56and six-year-olds is that twelve-year-olds can read already.
0:49:59And six-year-olds can't read.
0:50:01And there is a certain school of thought that believes
0:50:06that you get phoneme categories from reading. But you don't get phoneme categories from reading,
0:50:10you have
0:50:11your phoneme categories in place very early in life.
0:50:17That's exactly the same effect, as you see. Very early in life, even at age
0:50:23you're using your perceptual learning to
0:50:26understand new talkers.
0:50:28And I think I saw Odette over there, so I'm going to show
0:50:31some of ?? data presented,
0:50:34so we know, yes there you are.
0:50:38This is some of the older work, so we know
0:50:43that this kind of perceptual learning goes on in life. I brought this particular
0:50:51result which is again with s and f and was presented at Interspeech in 2012
0:50:57so I
0:50:58hope you were all there and you all heard it actually
0:51:01but they also have a
0:51:052013 paper with
0:51:07a different phoneme continuum which I urge you also to look at.
0:51:14Even when you're losing your hearing you're still doing this perceptual learning
0:51:19and adapting
0:51:23to new talkers, so learning about new talkers is just
0:51:27something that human listeners do across
0:51:31the lifespan.
0:51:32So that brings me
0:51:33to my final slide.
0:51:36So this has been a
0:51:39tour through some highlights of some really important issues in human learning about speech.
0:51:44Namely that it starts as early as it possibly can,
0:51:48that it actually trains up the nature of the processes
0:51:52and that it never actually stops.
0:51:58When I was doing this I thought, well, actually, you know,
0:52:01I love these conferences because they're
0:52:04interdisciplinary, because we get to talk about the same topic
0:52:09from different viewpoints. So what actually
0:52:12would I think after
0:52:14preparing this talk?
0:52:17What do I think is the
0:52:19biggest difference you could put your finger on between human learning about speech and
0:52:24machine learning about speech?
0:52:27So I have been talking about this during the week and I'll give you
0:52:32that question to take to all the other keynotes and think about too.
0:52:39if you'd say, you know, it starts at the earliest possible moment, well I mean
0:52:44so would a good machine
0:52:47learning algorithm, right? I mean
0:52:50it shapes the processing, it actually changes the algorithms that you're using; that's not the
0:52:55way, because we usually start,
0:52:58in programming
0:53:01a machine learning system, we start with the algorithm, right?
0:53:06You don't actually change the algorithm
0:53:08as a result of the input, but you could. I mean
0:53:12there's no logical reason why that can't be done I think.
0:53:19And it never stops? Well, I mean, that's not the difference, is it? No, that's not
0:53:22a difference; you can run
0:53:23any machine learning algorithm as long as you like.
0:53:27I think buried in one of my very early slides is
0:53:32something which is crucially important
0:53:35and that is the social reward.
0:53:38That we now know to be a really important factor in the early human
0:53:43learning about speech and you can think of humans
0:53:46as machines that really
0:53:49want to
0:53:50learn about speech. I'd be very happy to talk about this
0:53:54at any time
0:53:56during the rest of this week
0:53:58or at any other time
0:54:00too and I thank you very much for your attention.
0:54:29Hi and fascinating talk
0:54:31so a quick question. Your boundaries the ??. Do they change as a function
0:54:35of the adjacent vowels? So far versus
0:54:39fa, sa versus fa. ??
0:54:46We've always used a whatever was the constant
0:54:54So you're talking about perceptual learning experiments?
0:54:59The last set of experiments, right? We've always tried to use a
0:55:05varying context so I can't answer that question. If we had used only a
0:55:14or hang on
0:55:16we did use a constant context in the non-word experiment with
0:55:25phonotactic constraints, but then that was different in many other ways so
0:55:31no I can't answer that question but,
0:55:36there is some tangential
0:55:39answer, information from another lab
0:55:44which has shown that people can learn
0:55:47in this way,
0:55:49a dialect feature
0:55:51that is only
0:55:53applied in a certain context.
0:55:57the answer would be yes. People would be sensitive to that if it was consistent,
0:56:11There are two in the same row.
0:56:19Have you found any sex specific differences in the infants' responses?
0:56:24Have we found sex specific differences in the infants' responses? There are some
0:56:29sex specific differences
0:56:31in. But we have not found them
0:56:36in this speech
0:56:37segmentation. In the word recognition in continuous speech we've actually always looked
0:56:44and never found a significant difference between boys and girls.
0:56:52That was a short one. So are there any other questions or not?
0:57:02With respect to the
0:57:05negative responses
0:57:07on the words
0:57:08that you used there,
0:57:10that was presented in the experiment
0:57:15at age three the children were..
0:57:17The size of the negative going brain potential, right?
0:57:25Is that just
0:57:27would you say that could be good to
0:57:31detect pathology?
0:57:35Definitely and the person whose name you saw on the slides as first author Caroline
0:57:42is actually starting a new
0:57:45personal career development award project in Amsterdam
0:57:50and in Utrecht, sorry in Utrecht, where she will actually look at that.
0:57:56Okay so, thank you so much again for delivering this wonderful keynote and
0:58:02congratulations again for being our ISCA medalist. I am happy that you're around so you
0:58:07can back our medallist over
0:58:09the whole duration of the Interspeech conference. Thank you Anne.