0:00:11 Tecumseh Fitch is from the University of Vienna, at the Department of Cognitive Biology. His main interests are in the evolution of language and vocal communication in vertebrates, and what makes this very interesting for us is that he also uses synthetic speech to investigate these questions and to test his hypotheses. And then there is Bart de Boer, from the Artificial Intelligence Lab at the Vrije Universiteit, who is also interested in the cognitive bases of language, and who uses machine learning and speech technology to investigate how these faculties can be modeled. They are also very well known for their recent work on the monkey vocal tract being speech-ready, which we will hear about today.
0:01:29 [inaudible]
0:01:43 Thank you, Michael, for the kind introduction. This is the first time Bart and I have tried to do a tag-team talk like this, so we'll see how well it works. I'll start off, and then Bart will give you the more technical details of the sort that I'm sure you are all hungry for on a Saturday morning. I'll start by giving some perspective on why a biologist like myself, who is interested in animal communication, would dive into speech science. I actually studied speech science, with people like Ken Stevens at MIT, and one part of the arc is to use the tools that you guys invented to investigate how animals make their sounds and what those sounds mean. So in other words, we use the technology of speech science to create animal sounds to understand animal communication, and then in the second part of that arc we will turn that around and ask: how can we use an understanding of the animal vocal tract to understand the evolution of human speech? And the answer may surprise some of you.
0:02:55 Okay, so why would anyone want to synthesize animal vocalisations? Why would you want to make a synthetic cat's meow or a synthetic bark? As I said, my main reason is that I'm a biologist. I'm interested in understanding the biology of animal communication from the point of view of physics and physiology, and because speech scientists have done so much of that work, we can essentially borrow it to understand animal communication. Then in the second part we'll turn that around and try to understand how our own speech evolved.
0:03:30 I'm sure this is very familiar to you, but I want to very quickly run through source-filter theory. Virtually all of you are familiar with how it applies to human language; what you might be more surprised by is how broadly this theory applies across vertebrates. With the possible exception of fish, dolphins and other toothed whales, and probably a few others like some rodent high-frequency sounds, this theory, developed to understand our own speech apparatus basically from the 1930s through the 1970s, turns out to apply to virtually all other sounds you might think of: dogs barking, cows mooing, birds singing, et cetera. The basic idea, of course, is that we can break the speech production process into two components: the source, which turns air flow into sound, and the filter, which then modifies that sound via formant frequencies, the vocal tract resonances that filter out certain frequencies.
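The two-stage process just described can be sketched in a few lines of Python. This is a toy illustration, not the speakers' actual code: an impulse train stands in for the glottal source, and a single second-order resonator stands in for one formant of the filter. The sampling rate, fundamental, and formant values are arbitrary choices for the example.

```python
import numpy as np

FS = 8000          # sampling rate (Hz), arbitrary for this sketch
F0 = 100           # source fundamental (Hz)
FORMANT = 500      # one formant frequency (Hz)
BANDWIDTH = 80     # formant bandwidth (Hz)

def glottal_source(n_samples, f0=F0, fs=FS):
    """Idealised source: an impulse train at the fundamental frequency."""
    x = np.zeros(n_samples)
    x[::fs // f0] = 1.0
    return x

def formant_filter(x, freq=FORMANT, bw=BANDWIDTH, fs=FS):
    """Filter: a single second-order resonator (one formant)."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += 2 * r * np.cos(theta) * y[n - 1]
        if n >= 2:
            y[n] -= r * r * y[n - 2]
    return y

source = glottal_source(FS)            # one second of "voicing"
speech = formant_filter(source)        # source shaped by the vocal tract
spectrum = np.abs(np.fft.rfft(speech))
peak_hz = np.argmax(spectrum[1:]) + 1  # skip DC; 1 Hz bins for 1 s of audio
```

The spectral peak of the output sits at the source harmonic nearest the formant (here 500 Hz, itself a harmonic of the 100 Hz fundamental), which is exactly the "window" picture of a formant.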
0:04:32 And this is an image that may look familiar. These are vocal folds, except they are the vocal folds of a Siberian tiger. The vocal folds in that larynx are about this long, so of course it makes very low-frequency vocalisations, but you can see that the basic process, this aerodynamically excited vibration, is pretty much the same as what you would see in human vocal folds. And of course the vibration rate of these vocal folds, the rate at which they slap together, determines the pitch of the sound. You may be wondering how we did this. We didn't have a live tiger vocalising with an endoscope; I wouldn't want to do that. This is a dead tiger: the larynx was removed from an animal that had been euthanised and put on a table, we blew air through it, and we videotaped it. What that shows is that, just like in humans, we don't need active neural firing at the rate of the fundamental frequency to create the source. And that seems to be true for the vast majority of sounds: songbirds are actually vocalising at fundamentals of eight kilohertz, whales are vocalising at fundamentals of ten kilohertz, all using the same principle.
0:05:45 There are a few exceptions, and my favourite one, which many of you will be familiar with, is the cat's purr. That's a situation where each contraction of the muscle that generates the purr is driven by the brain, so that's one of the few exceptions where it's not this kind of passive vibration. But for the vast majority of the sounds we're talking about, including everything we know from nonhuman primates, this is the way it works.
0:06:12 So that source sound, whether it's noisy or harmonic, passes through the vocal tract. I show my students this image of the formants being like windows that allow certain frequencies to pass through, but it's certainly much more fun to listen to what a formant does. What I've done here is use LPC resynthesis to take human speech, which is of course the source and the filter combined [plays audio], and now I'm going to take the formants of that speech and apply them to this source; this is a bison roaring. [plays audio] And this is what we hear as a result. [plays audio] I think everybody can understand the words, even though it sounds more terrifying when it's a bison saying it. Just another random example: this is an owl [plays audio], and here is the owl with my formants. [plays audio] Okay, I think that illustrates the point: the vocal signal we hear is a composite of source and filter, and in these cases we can hear the filter doing the phonetic work, while the source still comes through loud and clear.
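The cross-synthesis trick just demonstrated, imposing one signal's formant filter on another signal's source, can be sketched with a textbook autocorrelation-method LPC. This is a minimal illustration with synthetic stand-in signals, not the actual tool used in the talk; the filter order and signals are made up for the example.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns coefficients a with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                       # reflection coefficient
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1 :: -1]
        err *= 1.0 - k * k
    return a

def allpole(a, x):
    """Run a source signal x through the all-pole filter 1/A(z)."""
    p = len(a) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * y[n - j]
        y[n] = acc
    return y

rng = np.random.default_rng(0)
# stand-in "human speech": an AR(2) process with known filter [1, -1.3, 0.4]
human = allpole(np.array([1.0, -1.3, 0.4]), rng.standard_normal(8000))
a = lpc(human, 2)                      # estimate the "formant" filter
# cross-synthesis: drive the recovered human filter with an animal-like source
animal_source = rng.standard_normal(8000)
hybrid = allpole(a, animal_source)
```

Inverse-filtering the original with A(z) would give the residual (the source); here we simply swap in a new source and keep the estimated filter, which is the "bison saying the words" effect.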
0:07:25 Taking these basic principles of source-filter theory, we started thinking: okay, what kinds of cues other than speech might there be in animal signals? One of the first things that has now been really extensively investigated was based on the idea that vocal tract length correlates with body size, and because formant frequencies are determined by vocal tract length, maybe formants provide a cue to body size in other species. The first part of this is easy: we just get MRIs or X-rays and measure the vocal tract length, which you can do on anaesthetised animals. It's a little harder to get them to vocalise, but when we do that and measure the formants, we find (and this is just one of many cases; these are monkeys) that vocal tract length correlates with formant dispersion, which is the average spacing between the formants. And because vocal tract length correlates with body size, that means body length correlates very nicely with formants. I first showed this in monkeys, but then it's been shown in pigs, it's true in humans, it's true in deer. This seems like a kind of fundamental aspect of the voice signal: it carries information about body size.
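Formant dispersion, the average spacing between neighbouring formants, has a simple closed form, and under a uniform-tube approximation it yields a vocal tract length estimate. A small sketch (the speed-of-sound value is the usual approximation for warm, humid air):

```python
SPEED_OF_SOUND = 35000.0  # cm/s in warm, humid air (approximate)

def formant_dispersion(formants_hz):
    """Average spacing between successive formants: (F_N - F_1) / (N - 1)."""
    f = sorted(formants_hz)
    return (f[-1] - f[0]) / (len(f) - 1)

def vocal_tract_length(formants_hz, c=SPEED_OF_SOUND):
    """Uniform-tube estimate: successive formants are spaced c / (2L) apart,
    so L = c / (2 * dispersion). Result in cm."""
    return c / (2 * formant_dispersion(formants_hz))
```

For the neutral-tube formant pattern 500, 1500, 2500, 3500 Hz this gives a dispersion of 1000 Hz and a length of 17.5 cm, about right for an adult human, which is why dispersion tracks body size across species.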
0:08:43 This is something that we as scientists can measure objectively, but the question is: do the animals pay attention to it? It's fine if I go and measure formants and can say that formants correlate with body size, but that's kind of meaningless for animal communication unless the animals themselves perceive that signal. This is where animal sound synthesis comes in. How do we ask that question? How do we find out whether an animal is paying attention to formants? This was a long time ago; some of you may recognise this old version of MATLAB running on an old Macintosh, on which I built this animal sound synthesizer using very standard technology that most of you will be familiar with: basically linear prediction. You predict the formants, subtract them away, and you have an error signal which we can use as a source. Then we can shift only the formants, leaving everything else the same, and ask whether the animals perceive that shift in formants.
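The key manipulation, shifting only the formants while leaving everything else the same, amounts to rotating the pole angles of the LPC filter while keeping the pole radii (bandwidths) and the residual source untouched. A sketch under that interpretation (the filter values are invented for illustration):

```python
import numpy as np

def shift_formant_filter(a, factor):
    """Scale every formant frequency of an all-pole filter by `factor`.
    Pole angles (formant frequencies) are multiplied; pole radii
    (bandwidths) stay put. Assumes all poles are complex-conjugate pairs."""
    upper = [p for p in np.roots(a) if np.imag(p) > 0]
    shifted = [abs(p) * np.exp(1j * np.angle(p) * factor) for p in upper]
    shifted += [np.conj(p) for p in shifted]   # restore conjugate pairs
    return np.real(np.poly(shifted))

def pole_frequencies(a, fs):
    """Formant frequencies (Hz) implied by the filter's pole angles."""
    return sorted(np.angle(p) * fs / (2 * np.pi)
                  for p in np.roots(a) if np.imag(p) > 0)

# a toy two-formant filter at 500 and 1500 Hz (fs = 8 kHz)
FS = 8000
poles = []
for f in (500, 1500):
    z = 0.97 * np.exp(2j * np.pi * f / FS)
    poles += [z, np.conj(z)]
a = np.real(np.poly(poles))
raised = shift_formant_filter(a, 1.2)   # formants move to 600 and 1800 Hz
```

Filtering the original residual through `raised` instead of `a` would give the formant-shifted stimulus while fundamental frequency, duration, and amplitude contour ride along unchanged in the source.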
0:09:42 Now, how do we do these experiments? How do you ask an animal whether it perceives that? We usually use something called habituation-dishabituation, where we play a bunch of sounds in which the formants remain the same but other aspects vary: the fundamental frequency, the length, et cetera, vary, but the formants are fixed. Once our listening animal stops paying attention (it may take ten plays or a hundred plays before the animal finally stops looking toward the sound), once it's habituated to the original sounds, we play the sounds where we've changed the formants, or whatever the variable of interest is. If the animal pays attention to that, if it perceives it and finds it salient enough to be noticeable, then it should look again.
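The trial logic of the paradigm can be written down as a simple criterion check. The three-consecutive-no-looks habituation criterion is the one described later for the crane study; the function names are mine, a sketch of the logic rather than any actual lab code:

```python
def habituated(looks, criterion=3):
    """True once the animal has failed to look on `criterion`
    consecutive habituation trials (looks: True = animal looked)."""
    streak = 0
    for looked in looks:
        streak = 0 if looked else streak + 1
        if streak >= criterion:
            return True
    return False

def dishabituation(looked_at_probe, looks):
    """An animal 'passes' only if it first habituated to the originals
    and then looked again at the formant-shifted probe."""
    return habituated(looks) and looked_at_probe
```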
0:10:30 The first species I actually tried this with was whooping cranes; I'll explain why in a second. Let me sort of walk you through this experiment. These are whooping crane contact calls, and what we did was play a bunch of the actual calls from one particular bird. They sound like this. [plays audio] Here's another one; they sound pretty similar to our ears. [plays audio] We keep playing those; these are recordings played from a laptop, and we see if the listening bird looks up. We wait till the bird goes back down to its feeding, we play one of these sounds, and it looks up, because it sounds like there's another whooping crane there. So the logic is pretty simple. In the case of whooping cranes we had to do this in the winter, and it takes these birds hundreds of trials before they stop paying attention; the laptop dies, it starts snowing, et cetera, et cetera. But eventually we were able to do it: you get the bird habituated by playing these kinds of sounds over and over.
0:11:36 And then, just to be safe, we play a synthetic replica that we've run through the synthesizer but without changing the formants; if everything's fine, they shouldn't dishabituate to that. Here's what that sounds like. [plays audio] Pretty similar. And now here's the key moment: we play either the formants lowered [plays audio] or the formants raised. [plays audio] Of course you all can hear that, because you're humans and we already knew you perceive formants; the question is whether the birds do. When we do this, what we find is that initially the birds respond eighty percent of the time on average, but as we get to twenty-five or thirty trials, we reach the last habituation trial, which by definition is one where they don't look at all; we actually require three of those in a row. Then we play that synthetic replica and they don't look, so that means our synthesizer is working. And then finally we play the test stimuli, and we get a massive dishabituation. We've done this with many different species and always found the same thing: paying attention to formant frequencies in this kind of context seems to be a very basic thing. Birds do it, monkeys do it, dogs do it, pigs do it, and of course we do.
0:12:55 Now you might ask: can we go further with that? For example, these are two colleagues who have used animal sound synthesis to look at what other species are using these formant frequencies for. In this case we can show that the deer, or the koalas, are using these sounds as indicators of body size, and the kind of evidence we have is, for example, that males played a recording of another male with lowered formant frequencies, that is, with an elongated vocal tract, run away and are afraid, while females find him more attractive, et cetera, et cetera. This has again been done with many species. Many of you have probably heard deer, but you might not have heard the koala. This is a koala; they have a very impressive vocalisation. [plays audio] If you're wondering how a little teddy-bear-sized animal makes that terrifying sound, it's because they actually have a descended larynx: they've pulled the larynx down to make their vocal tract much longer than it would be in a normal animal, so by elongating their vocal tract they make themselves sound bigger. These are a few of the many publications that use the approach I've just been telling you about to dig deeper into animal communication, so I hope that makes the case that this is a worthwhile thing to do, and in a wide variety of species.
0:14:26 Okay, so now, maybe getting to something that's closer to what a lot of you do, I want to turn to part two. Sorry, we just put this together yesterday. How can you turn this around and start asking questions about human communication based on what we understand about animals? The first fact, the kind of core fact that many people in the world of speech science have been trying to understand for a long time, is that we humans are amazing at imitating sounds. We not only imitate the speech sounds of our environment, we learn to sing songs; we can even imitate animal sounds. Basically, kids will imitate whatever sounds they hear. And it turns out that our nearest living relatives, the great apes, can't do this at all. These are examples of apes that have been raised in human homes. Of course a human child, by the age of about one, is already making the sounds it hears, already starting to say its first words and making the sounds of its environment, whatever its native phonology is. And no ape has ever done that; no ape has even spontaneously said "mama", much less learned complex vocalisations.
0:15:39 People have known this for a long time, and the question that has been driving this field for at least a hundred years, since Darwin's time, is: why is it that an animal that is seemingly so similar to us, that can learn to do things like drive a car, can't produce even the most basic speech sounds with its vocal tract?
0:16:06 So that's the driving force behind the second part of the talk. There are two theories; Darwin had already mentioned this. One is that it has something to do with the peripheral vocal apparatus, and the other is that it has more to do with the brain. Darwin said they probably both matter, but the brain is probably more important. What we're going to try to convince you of now is that it is indeed the brain that's key, and that vocal tract differences, although they exist, are not what is keeping a monkey or an ape from producing speech.
0:16:37 Now, the most famous example of a difference between us and apes is illustrated by these MRIs. On the left side we see a chimpanzee, and the red line marks the vocal folds, so that's the larynx. Of course in humans the larynx is descended in the vocal tract; it pulls down into the throat, whereas in the chimpanzee the larynx is in a high position, engaged with the nasal passage most of the time. That means the tongue rests flat in the mouth; the tongue is basically sitting like this. What happened in humans is that we essentially swallowed the back of our tongue: our larynx descended, pulling the tongue with it, so that we have this two-part tongue that we can move up and down and back and forth, and that's how we get this wide variety of speech sounds.
0:17:27 So the idea, which goes back to Darwin's time but really became concrete in the 1960s, is that with a tongue like that you simply can't make the sounds of speech, and therefore, no matter what brain was in control, that vocal tract could not make the sounds that you would need to imitate speech. And it's a plausible hypothesis. It goes back to my mentor Phil Lieberman, who was my PhD thesis supervisor; he published a series of papers in the late sixties and early seventies. What he did was take a dead monkey, make a cast of the vocal tract of that monkey, and use that to produce a computer program to simulate the sounds that vocal tract could make. There was a lot of guesswork involved, because it was one dead monkey and one cast, but they did the best they could. What they found, in a formant-one-by-formant-two space, is this: here are the famous three point vowels of English, which are found in most languages, and all the numbers in there are what the computer model of the monkey vocal tract could do. So they concluded that the acoustic vowel space of a rhesus monkey is quite restricted: they lack the output mechanism for speech.
0:18:44 This is one of those ideas that, as I said, is well founded in acoustics. If you look at what we actually do when we produce speech, here are just a couple of videos that will be familiar. [video: "A rainbow is a division of white light into many beautiful colours."] You can see the tongue dancing around in that two-dimensional space; here it is slowed down a bit. [video] So we use that additional space gained by swallowing the back of our tongue; we are clearly using it to its full extent when we produce speech. So I think this Lieberman hypothesis is quite plausible.
0:19:26 I became suspicious of this when we first started to take X-rays of animals as they vocalise, instead of looking at dead animals. This is the classic way of analysing the animal vocal tract: take a dead goat, cut it in half, and draw conclusions from that. We tried instead to get a goat vocalising in the X-ray, which is harder than it may seem; I have had many animals sitting in a situation like this without vocalising at all. But this little goat was one of our first subjects: when we played it its mother's bleats, it would respond, and this is what we saw in the X-ray. [video] Again, I want you to look at this region right there. Anatomists claimed that the engagement of the palate with the glottis prevents mouth breathing; in other words, the idea based on static anatomy was that a goat can't breathe through its mouth. And here's what we actually see: the larynx pulling down, such that every one of those vocalisations passes out through the mouth of the goat.
0:20:34 Now, this shouldn't be that surprising if you think about it: if you want to make a loud sound, you should radiate it through your mouth and not through your nose. But again, this is what the anatomical data claimed was impossible, up until we started doing this work. We've seen it in other animals too. This is a dog; you're going to see a very extensive pulling down of the larynx, a descent of the larynx, when the dog barks. This is slow motion. [video] That's the larynx. What you can see here is that every time the dog barks, the larynx pulls down, pulling the back of the tongue with it, and basically going into a human-like vocal tract configuration, but only while the animal is vocalising. The unusual thing about humans is that our larynx stays low; we keep our larynx low all the time, not only while we're vocalising.
0:21:31 When we first got these data, almost twenty years ago, I became convinced that the descent of the larynx can't be the crucial factor keeping animals from producing speech. But unfortunately the textbooks kept saying that the reason monkeys and apes can't talk is peripheral anatomy, that they just don't have the vocal tract for it. And then I saw the Simpsons episode where, you know the Simpsons, the main guy, no, the old guy, Homer's father, gets this monkey, and the monkey can't talk, so he's learning sign language, and they keep saying it's because he doesn't have the vocal tract.
0:22:11 So that's when we decided: okay, this dog and goat stuff isn't enough; we have to do it with nonhuman primates. Working together with Asif Ghazanfar, whose monkeys they were, and Bart, who's going to take over from here, we took X-rays like this one of the monkey vocalising. You'll see there's a little movement of the larynx, just the same as we saw in the goat and in the dog, and then we traced those to create a vocal tract model, and this is where Bart is going to take over. [speaker change] Do you want to take this? That looks good. Are we live?
0:23:01 Yes. So, how did we actually build a model to create vocalisations of the monkey? If you think about it, it's a very different problem from, or a problem that requires a very different solution from, what we use for human speech, because what we're trying to do is figure out what the monkey could do in principle with its vocal tract; it's not based on what it's actually doing. The whole point is that monkeys don't talk. So what we don't have is a corpus of data on which we could use some kind of machine learning. What we need instead is a truly generative approach, based on what is in a sense a very old-fashioned way of going about speech synthesis, which is articulatory synthesis. I'll just quickly recap how it works, although I assume you are all intimately familiar with it. What I would like to stress, however, is that even though we happen to be talking about biology and about speech science, these methods were developed by people who were actually engineers, people interested in trying to put as many phone conversations on transatlantic cables as possible. So this is very much theory that has been developed by engineers, by people who were working with the same goals as you.
0:24:49 So how does articulatory synthesis work? Well, you start with an articulatory model, an idea of how the vocal tract works. With that model you can create different positions of the tongue and lips, et cetera, and from that you need to calculate what is called an area function. An area function is basically the cross-sectional area of the vocal tract at each position along the vocal tract. It turns out that the precise details of the cross-sectional shape don't matter much; the area is the thing that counts. For instance, there is a right angle in the vocal tract, but because of the wavelengths involved you can ignore that, so you can basically model it as a straight tube with a circular cross-sectional shape; the area is the important thing. Now, of course, if you want to put that into any computer model, you have to discretise it, so what you end up with is what is called a tube model: a number of tubes along the length of the vocal tract, from the larynx to the lips. On the basis of that you can calculate the acoustic response, either in the time domain or the frequency domain. So that's what we did. Now, how did we do that for the monkey model?
0:26:26 This is the X-ray image that Tecumseh just showed, with the outline; in red you can see the outline of the vocal tract. This is what we start with, and we had about a hundred of these; the tracings were made by hand. So what we first need to do is figure out how the sound waves propagate through this tract, and for that the technique we use is called a medial axis transform. Basically, you try to squeeze a circle through the tract; that circle represents the propagating acoustic wavefront. The line through the middle is the centre of the wavefront, and the diameter of the circle is the diameter of the vocal tract. So this is what you end up with, and you can then calculate, for each position in the vocal tract from the glottis to the lips, the diameter. Okay, so now you have a function: the diameter of the vocal tract at each point in the vocal tract.
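As a very crude stand-in for the full medial-axis transform used on the hand-traced outlines, one can approximate the inscribed-circle diameter at each position of a gridded tract mask by the air-space extent of the perpendicular slice. Everything here (the grid representation, the slice approximation, the resolution) is an assumption for illustration only:

```python
def diameter_function(mask, spacing=0.1):
    """Approximate tract diameter at each position along the tube.

    `mask` is a list of slices ordered glottis to lips; each slice is a
    list of booleans marking air (True) vs tissue (False). For a tract
    running roughly perpendicular to the slices, the medial-axis
    inscribed-circle diameter is approximated by the number of air
    cells in the slice times the grid spacing (cm per cell; the
    0.1 cm default is a made-up resolution)."""
    return [sum(col) * spacing for col in mask]
```

A real medial-axis transform follows the curved tube centreline instead of a fixed axis, which matters where the tract bends; this sketch only conveys what the output (a diameter per position) looks like.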
0:28:01 However, the problem is that this is just part of what we need: we need the area, and the diameter isn't enough. So the problem is that we need to calculate the area on the basis of the observed diameter. Now, fortunately, it turns out that, to a good approximation, for the monkey vocal tract the function converting diameter to area is more or less the same everywhere in the vocal tract. So how did we figure that out?
0:28:48and if you if you look at that
0:28:51so this is this side view so this is where the basically the monkeys
0:28:55let's are
0:28:58this is it's vocal tract
0:29:00here's the larynx
0:29:01and so you can make if you cross
0:29:04section of cuts there and you can see that the shape of the vocal tract
0:29:12i don't these different
0:29:14cross section there is
0:29:16follows this it's not quite a rabble but
0:29:20in this particular shape is kind of the same everywhere
0:29:24and so what you want to know is
0:29:28for a given opening of the vocal tract how large is that area so suppose
0:29:34that the
0:29:35the diameter would be
0:29:39about this
0:29:42so the area would be this now if you open up further then of obviously
0:29:47the area gets bigger any turns out that follows you know it's just a matter
0:29:53of integration any turns out that what you find is that the areas proportional to
0:29:59some cut some constant
0:30:00times the diameter to the power of
0:30:04one point four there's no deep theoretical reason for that value of one point for
0:30:09each it's something that we learned from observing
0:30:13so now by applying that function to the diameters that we observe we actually find
0:30:23the area function so this is
0:30:26the position
0:30:27and the area that at each point
0:30:30in the vocal tract no
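The diameter-to-area conversion just described is a one-line power law. The exponent 1.4 is the empirical value from the talk; the proportionality constant here is a placeholder, since the real value is fit to the MRI data:

```python
ALPHA = 1.4   # empirical exponent from the MRI cross-sections (no deep theory)
K = 1.0       # proportionality constant; placeholder, really fit to MRI data

def area_from_diameter(d, k=K, alpha=ALPHA):
    """Cross-sectional area (cm^2) from a medial-axis diameter (cm):
    area = k * d ** alpha, the same conversion at every tract position."""
    return k * d ** alpha

def area_function(diameters, k=K, alpha=ALPHA):
    """Apply the conversion along the whole tract, glottis to lips."""
    return [area_from_diameter(d, k, alpha) for d in diameters]
```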
0:30:34 The next step is turning that into formants, and for that we used, again, a very old-fashioned, classical approach: an acoustic model, an electric line analog of the vocal tract. Again you can see that historically a lot of this theory was developed by electrical engineers, because it's an electronic circuit: for each of those discrete tubes, the electric line analog basically models the physical wave equation with a little electrical circuit. And from that we can then calculate the formant frequencies. So for each of those hundred tracings we can calculate the first, the second and the third formant, and these are the values we actually calculated for all those points.
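The study itself used an electric line analog; one standard discrete formulation of the same lossless-tube physics, easy to sketch, converts the area function into junction reflection coefficients, builds the all-pole polynomial with the step-up recursion, and reads the formants off the pole angles. This is an illustrative sketch, not the actual model from the paper:

```python
import numpy as np

C = 35000.0  # speed of sound, cm/s

def formants_from_area(areas, length_cm, lip_reflection=1.0):
    """Formant frequencies (Hz) of a concatenation of lossless tubes.

    `areas` lists cross-sectional areas (cm^2) from glottis to lips.
    Junction reflection coefficients are folded into an all-pole
    polynomial via the step-up recursion; pole angles give formants."""
    n = len(areas)
    dx = length_cm / n
    fs = C / (2 * dx)   # sampling rate implied by the section length
    refl = [(areas[k + 1] - areas[k]) / (areas[k + 1] + areas[k])
            for k in range(n - 1)]
    refl.append(lip_reflection)          # open (radiating) lip end
    a = np.array([1.0])
    for k in refl:                        # step-up recursion
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    poles = np.roots(a)
    return sorted(np.angle(p) * fs / (2 * np.pi)
                  for p in poles if np.imag(p) > 0)
```

Sanity check: a uniform 17.5 cm tube yields the classic quarter-wavelength resonances 500, 1500, 2500 Hz, the neutral schwa pattern.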
0:31:46 So at this point we have determined what the acoustic abilities of the monkey vocal tract are. From there, there are different things you could do. In principle, on the basis of this kind of data, you can actually make a complete articulatory synthesizer. This is something that Shinji Maeda did in 1989, again quite some time ago, on the basis of very similar data about the human vocal tract. It's not certain that we have enough data to do the same thing. What Maeda did was make a thousand tracings of the vocal tract, and if you know how difficult it is to make a single tracing, you can imagine how much time he must have spent on making this model. What he then did was basically subject these articulations to a factor analysis and derive an articulatory model, an articulatory synthesizer, so that you could then use that model to synthesize new sounds. Now, the problem is that we don't have that many tracings, so we probably couldn't make a good-quality model.
0:33:21what we wanted to do and what to comes is going to say in a
0:33:24moment to explain a moment it's re-synthesize some of these sounds and that's still very
0:33:30challenging with a articulatory synthesizer and it wasn't reading necessary for what we wanted to
0:33:37do so we took slightly different approach
0:33:43one of the things we wanted to do was just quantify the
0:33:48articulatory abilities of monkeys and compare them to humans
0:33:53and in order to do that
0:33:55we could measure the
0:33:58acoustic range of the monkey vocalisations, and one way to do that is by calculating
0:34:04the convex hull; now I assume you're all familiar with what a convex hull
0:34:09is, but just to very quickly show you how we did it: if you want to
0:34:14calculate the convex hull
0:34:16you start with one of the extreme points
0:34:21and then you
0:34:26fit a line
0:34:27around those points, as if you would take a rubber band and
0:34:33squeeze it around the points, and then you can do several things: you can calculate
0:34:37the area of the convex hull, or you can calculate the extent of these
0:34:42things in F1, the first formant, or in the second formant, and the
0:34:47thing that we did was we based ourselves on the extent
0:34:52well, on the area and the extent
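The convex-hull measures just described (area and per-formant extents) can be computed like this; the (F1, F2) points below are made-up example values, not the paper's measurements:

```python
# Convex hull of formant measurements: the "rubber band" around the
# (F1, F2) points, plus the area and per-formant extents from the talk.
# The points are invented example values in Hz.
import numpy as np
from scipy.spatial import ConvexHull

points = np.array([
    [300, 900], [350, 2200], [600, 1200],
    [700, 1800], [450, 1500], [500, 2100],
])

hull = ConvexHull(points)
area = hull.volume                    # for 2-D input, .volume is the hull area
f1_extent = np.ptp(points[:, 0])      # F1 range (max - min)
f2_extent = np.ptp(points[:, 1])      # F2 range (max - min)
print(area, f1_extent, f2_extent)
```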
0:34:55and one of the things we did is that we
0:35:00wanted to know how the monkey would sound if
0:35:03it would be speaking
0:35:06and in order to do that we
0:35:08modified some human sounds in a way very similar to what Tecumseh just showed
0:35:16with those recordings
0:35:18and so this is a
0:35:24sentence spoken by a human; we decompose this into the
0:35:30formant tracks, which basically represent the
0:35:35filter, and the source
0:35:38and then we modified those formants
0:35:42in a
0:35:44in a way to make them more similar to a monkey vocal tract; so what
0:35:47you've seen so far in the examples that Tecumseh played to you is
0:35:53where the formants were just shifted up or shifted down; we did a little more
0:35:58so we modified them
0:36:01not just so...
0:36:05we needed to shift the formants up a little bit, because the monkey vocal tract
0:36:10is shorter than the human vocal tract, so the formants tend to be higher
0:36:15but in addition, what we found is that the range of the second formant is
0:36:21somewhat reduced in the monkey vocal tract
0:36:24in comparison to the
0:36:27human vocal tract, so we also
0:36:30compressed the range of the second formant
0:36:33and then we resynthesized the sound
0:36:37the thing with
0:36:40an analysis in terms of source and filter
0:36:44is that it's complete, so if you have the source information and the filter information
0:36:52you can basically
0:36:54re-synthesize the sound perfectly, and there's no loss
0:36:59so if we would use just
0:37:01the human source with the modified formants, the sound would probably have sounded too perfect
0:37:09so what we wanted to do is use a source that was more monkey-like
0:37:14so we actually also synthesized a new source, which was based on a very simple model of
0:37:23the monkey vocal folds, which vibrate in a much more irregular way than human vocal folds
0:37:29do; so we took our monkey source
0:37:35applied the modified formant filter to it
0:37:38and then we got a real monkey vocalization
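As a rough sketch of this kind of source-filter resynthesis (not the actual pipeline used in the paper): take an irregular, jittered pulse train as a monkey-like source and run it through a cascade of formant resonators, with the formant values shifted up for a shorter tract. All numbers below are illustrative assumptions.

```python
# Sketch of source-filter resynthesis with an irregular, "monkey-like"
# source: a jittered pulse train filtered by second-order formant
# resonators. Formant values, f0 and jitter amount are invented.
import numpy as np
from scipy import signal

fs = 16000                              # sample rate (Hz)
dur = 0.5                               # duration (s)
f0 = 120.0                              # mean fundamental frequency (Hz)

# Source: pulse train with per-period jitter to mimic irregular vocal folds
rng = np.random.default_rng(0)
source = np.zeros(int(fs * dur))
t = 0.0
while t < dur:
    source[int(t * fs)] = 1.0
    period = (1.0 / f0) * (1.0 + 0.05 * rng.standard_normal())
    t += max(period, 1.0 / fs)

# Filter: cascade of resonators; formants shifted up ~20% for the
# shorter monkey tract (an assumed, illustrative shift).
monkey_formants = [f * 1.2 for f in (500.0, 1500.0, 2500.0)]
out = source.copy()
for fc in monkey_formants:
    bw = 80.0                           # resonance bandwidth (Hz)
    r = np.exp(-np.pi * bw / fs)        # pole radius from bandwidth
    theta = 2.0 * np.pi * fc / fs
    b, a = [1.0 - r], [1.0, -2.0 * r * np.cos(theta), r * r]
    out = signal.lfilter(b, a, out)
print(out.shape)
```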
0:37:42and this is where Tecumseh takes over again
0:37:51hopefully that satisfied your morning need for technical details, but now you must all be
0:37:57wondering... so this is just a synopsis of the whole process: we x-rayed the
0:38:00monkey making a hundred different vocal tract configurations
0:38:04basically everything that monkey did while he was in our x-ray
0:38:08we traced those
0:38:09we used the medial axis and then the conversion of diameter to the area function to
0:38:15create the
0:38:16model of the vocal tract, and then we can compute the synthesized formants from that
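The last step, going from a tract model to formants, can be illustrated with the simplest possible case: a uniform tube closed at the glottis and open at the lips, whose resonances sit at odd quarter-wavelength multiples. The real model uses the measured area function; the lengths below are rough textbook figures, not the paper's measurements.

```python
# Resonances of a uniform tube, closed at the glottis and open at the
# lips: Fn = (2n - 1) * c / (4 * L). Lengths are rough illustrative values.
def tube_formants(length_m, n=3, c=350.0):
    """First n resonance frequencies (Hz) of a closed-open uniform tube."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n + 1)]

human = tube_formants(0.17)     # ~17 cm adult human vocal tract
monkey = tube_formants(0.13)    # assumed shorter monkey tract
print(human, monkey)            # monkey formants come out higher, as described
```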
0:38:21and so what we have here is the original data from Lieberman that I showed you
0:38:26at the beginning; so the red triangle represents a human female's vowel space, the F1
0:38:32and F2 range of a human female, with /i/, /a/ and /u/ making up
0:38:36the points
0:38:37and that little blue triangle is what the old model from Lieberman said a monkey
0:38:42could do
0:38:43and this is what our model looks like compared to that
0:38:47so unlike Lieberman's model, which is very restricted, we can see that what
0:38:51a real monkey actually does covers quite a wide variety in the
0:38:56first formant but a somewhat compressed second formant
0:39:01we used that to create monkey vowels, so artificial monkey vowels that occupy
0:39:07the corners of that convex hull; so with five monkey vowels in a discrimination
0:39:11task, humans are basically at ceiling, so they do just as well with the
0:39:15monkey vowels as they do with human vowels, and what that shows us
0:39:19is that the monkey's capacity to produce a diverse set of vowels, the same
0:39:23as the number in most human languages, namely five
0:39:26is absolutely intact; so the monkey's vocal tract
0:39:29has no problem doing that
0:39:31we also have good indications that things like bilabial and glottal stops, et cetera et
0:39:37cetera, many of the different consonants, would be possible; so clearly the monkey vocal tract
0:39:42is capable of producing a wide range of sounds
0:39:45now that all sounds very dry, so it's kind of more interesting to hear what our
0:39:49model sounds like if we're trying to imitate human speech
0:39:53so the model for this was my wife
0:39:57so we had her speak a bunch of sentences, but rather than play her first
0:40:01so you could understand, I'm gonna play the monkey model first and see if you
0:40:04can understand what the monkey says
0:40:06right i
0:40:11everybody got it, right?
0:40:14okay, and then this is my wife's formants with that synthetic monkey source
0:40:24right i
0:40:27time so
0:40:28what you can hear is that the phonetic content is basically preserved; the human
0:40:33formants are lower, which makes sense because humans are larger than monkeys, so it has
0:40:38a more bassy and less wheezy sound to it, but
0:40:43the phonetic content is basically present; so what this shows us is that whatever
0:40:48it is that keeps a monkey or an ape, rather than a human, from speaking
0:40:53it's not the peripheral vocal tract, it's not the anatomy of their throat
0:40:59and that's basically the conclusion that we drew from this paper; the paper was called
0:41:02monkey vocal tracts are speech-ready
0:41:05and what that tells us is that rather than looking more at the anatomy of
0:41:09the vocal tract
0:41:10we should be paying attention to the brain that's in charge, and that
0:41:16would be another talk to explain; we have lots of evidence about what it is about
0:41:19the human brain that gives us such exquisite control over our vocal apparatus, but it
0:41:23doesn't seem that the vocal apparatus itself is
0:41:26the crucial thing; and to put it in other terms, we've done it with the monkey, but
0:41:30I'm quite sure that the same thing would be true with a dog or a
0:41:33pig or a cow: if a human brain were in control of a dog or a
0:41:38cow or a pig or a monkey
0:41:41the vocal tract would be perfectly able to communicate English
0:41:46there's a lot of work to do before we make talking animals, but it's gonna
0:41:49involve the brain and not the vocal tract
0:41:53okay, so that is our story; that was actually faster than we thought; just to
0:41:57state our general conclusions:
0:42:01you can use these methods, which were mainly developed by physicists and engineers to understand
0:42:06human language or human speech, to basically understand and synthesize a wide variety of vertebrate sounds
0:42:14I mainly work with formants, with birds and mammals, but other people have used
0:42:18these same methods to do things like alligators and frogs, so these are very general
0:42:24principles; what you all learned in your sort of intro-to-speech class actually applies
0:42:28to most of the species we know about
0:42:31it's not the vocal tract that keeps most mammals from talking it's really their neural
0:42:36control of that vocal tract
0:42:38and I think the more general message that's probably
0:42:42meaningful to pretty much everybody in this room is that a better understanding of the physics
0:42:47and physiology of the vocal production system, whether it's in a dog, a monkey
0:42:52a deer or a whale, can really play a key role, it should play a key
0:42:56role in speech synthesis
0:42:59and do you want to say a few extra words of wisdom, I guess
0:43:06okay, so I think we have plenty of time for questions, so thanks to
0:43:10all the people who did this work and thank you for
0:43:29will you take the question mic, or should I?
0:43:34my question is about
0:43:37being inspired by using the model of the voice box
0:43:43the vocal folds
0:43:45them again example can force for by using the like behaviour the dynamics will say
0:43:52is he trying to imitate a human, or is it just what dogs do when they bark
0:43:56always; this is one point; and the second is that in
0:44:01the last part you said that
0:44:03the key difference lies in neural mechanisms; was it really in the
0:44:07neural mechanism? yes, neural mechanism; so my question is about
0:44:13as sometimes because of the dot plot the that this happens so will be disabilities
0:44:18but actually act was again and almost a result of the bit if
0:44:23it is not gonna but only in time
0:44:26so my question was
0:44:29you just talked about the vocal fold dynamics for the
0:44:33bark, but
0:44:34what about the motor mapping that happens in the brain
0:44:37because of that; so is there any kind of cue for this in the
0:44:41neural mechanisms?
0:44:42good question; are you asking about the recovery of the source properties, or
0:44:47I'm asking about the neural mechanism that is responsible for that
0:44:51for the speech
0:44:53for the auditory perception or for the production? okay, so what we know, and I don't
0:45:00have a slide for this, but we know that in humans there are direct connections
0:45:04from the motor cortex onto the neurons that actually control the laryngeal
0:45:09and the tongue muscles
0:45:11those direct connections from cortex onto the laryngeal motor neurons are not present
0:45:16in most mammals
0:45:17so these are absent in other primates, they appear to be absent in dogs and cats
0:45:22and so on, but in those species which are good vocal imitators, and
0:45:27this includes many birds, the parrots and mynah birds, but it also includes some bats, it
0:45:32includes elephants, it includes various cetaceans
0:45:36so in all of those groups that have been investigated, these direct connections, the equivalent
0:45:40of what we humans have, are present; so the current theory for what it is
0:45:46about our brains that gives us this control is that we have direct connections onto
0:45:51the motor neurons
0:45:52and in most animals there are only indirect connections, via various brain stem intermediaries, onto the
0:45:59vocal system itself
0:46:00so in other words, we've got this... it's essentially like a new gear
0:46:04shift on this ancient vocal tract that we've got
0:46:08that gives our brains more control over it than we would otherwise have
0:46:15thank you for a most interesting talk
0:46:18so myself, I have a few pets at home, and I always
0:46:24thought it would be quite nice to be able to
0:46:28understand what they are saying; as you may know, there
0:46:32are also papers published about converting brain signals to speech
0:46:37that is, using speech synthesis for a reconstruction
0:46:40of speech from brain signals; do you think it would be possible to do
0:46:44something similar for our pets, to be able to analyse their brain signals, is it possible
0:46:52that's an interesting question; so
0:46:55given that we can use neural signals, with fMRI or EEG, to synthesize speech, okay
0:47:04could we do the same thing for animals? and my answer, for most animals
0:47:09my answer to the first question would be no; the reason is that there
0:47:14is no correspondence between the cortical signals that we can measure with something like fMRI
0:47:21or EEG and the actual sounds that are produced
0:47:24because in most animals it's mainly the brain stem and the midbrain that are controlling
0:47:30these sounds, when a monkey is attacking or a dog barks
0:47:33in fact you can remove the cortex and a cat will still meow
0:47:37and a dog will still bark
0:47:39in the same way that a human baby who's born without cortex will still cry
0:47:43and laugh
0:47:44in a normal way
0:47:45so I would also say, it would be a lot easier to do this...
0:47:48it's probably a better use of your grant money to
0:47:51see if you can synthesize laughter and crying
0:47:54from a cortical signal; my prediction would be that even if you can do that in
0:47:59humans, you won't be able to do it in animals; so I would predict a
0:48:03fake laugh, like when I go "ha ha", that's not a real laugh, that should be
0:48:08cortically controlled, but when I really laugh or I really cry
0:48:12that's gonna be coming from the subcortical brain that's very hard to measure, and so
0:48:16you shouldn't be able to synthesize realistic laughter or crying, even in humans, maybe
0:48:29do you have any evidence of at which point in evolution this connection between the
0:48:33brain and the vocal tract started appearing?
0:48:36the unfortunate answer to that is no; probably many of you know there's a
0:48:41there's a whole field, and I used to have a slide about this, there's a whole
0:48:45field that's essentially trying to reconstruct
0:48:49based on fossils when in our history, when in the
0:48:54history of human evolution, this capacity for speech occurred, and the old argument
0:49:01was always based on: if we could know when the larynx descended, then we would
0:49:05know when speech occurred
0:49:06but what I think I've shown you in all this work is that it's not
0:49:10laryngeal descent
0:49:11that's crucial for speech, it's these direct connections
0:49:14and those, unfortunately, there's just no fossil cue
0:49:18to whether there are direct connections; that's basically the stuff that really doesn't preserve, even for
0:49:23an hour
0:49:25much less in the fossil record; you would need
0:49:27detailed neuroanatomy at the micron level to answer that question, so it's
0:49:32even hard with an actual brain
0:49:37so Tecumseh and I
0:49:40well, we agree on the importance of the neural control, of course
0:49:45but we can disagree on the
0:49:48exact, precise interpretation of what the vocal tract data means
0:49:58i can we do this you know how we think we're
0:50:04so in a sense you could say that there has been some fine-tuning of the
0:50:09human vocal tract for vocalization, and if you
0:50:15you know, if you are a little liberal in the interpretation of what we
0:50:18find in the fossil record, you can say
0:50:23it happened somewhere between three million and three hundred thousand years ago
0:50:29it's not very precise
0:50:34so the evidence for this is based on various cues that supposedly indicate, based
0:50:40on the base of the skull, what the position of the larynx and tongue would
0:50:44be; and just 'cause
0:50:46'cause I have these slides, and I took them out 'cause I thought we'd be
0:50:48too long, I want to show you some examples of animals that have independently modified
0:50:54their vocal tract
0:50:56in a way that has nothing to do with speech; so the way you can
0:50:58make your vocal tract longer is, one, make your nose longer like this proboscis monkey
0:51:02or lots of various animals like elephants; of course you can stick your lips out, which
0:51:07many species do, so if you do this you sound bigger and if you do
0:51:11this you sound smaller; or you can do more bizarre things like
0:51:14make an extension to your nasal tract that forms a big crest like that dinosaur
0:51:19up there, or these birds, which, because the source is at the base of the trachea, have
0:51:24an elongated trachea; and all of these adaptations seem to be ways of making that animal
0:51:29sound bigger
0:51:30here's just a nice example: this is an animal with a permanently descended larynx, it's
0:51:35a red deer, and you'll find this a pretty impressive sound
0:51:43so the first thing you probably noticed in that image is that penis pumping back
0:51:47there; ignore that, and look at what's happening
0:51:50at the front of the animal, and you'll see
0:51:54it as well, moving
0:51:57back and forth
0:51:58and so when we first saw these videos we were like, what is this? and
0:52:01it turns out what this is, is the resting position of the larynx; that is a
0:52:06permanently descended larynx in a ruminant animal; and watch what it does when it vocalises
0:52:22so I think we could all agree that that's a much more impressive descent of
0:52:26the larynx than the few centimetres that happens in humans
0:52:30and it turns out
0:52:32these are not the only species, because in our own species there's a secondary
0:52:36descent of the larynx that happens only in men and only at puberty, and
0:52:40I think that's exactly the same kind of adaptation that makes this deer
0:52:44sound bigger or a bird sound bigger; so I guess that's where we
0:52:49differ; I think that
0:52:50even if we knew when the larynx descended in humans, it could have
0:52:54been an adaptation to just make yourself sound bigger, and it might have been a
0:52:58million years after that
0:53:00that we started using that for speech
0:53:02so that's why I really don't think the fossils are gonna answer this, because we do
0:53:05not have any answer; the only way we're gonna get it, I think, is
0:53:08from genetics; now we're recovering
0:53:11genomes from Denisovans and the Neanderthals, and those might help us answer
0:53:16this question about the origin of speech
0:53:22I just want to mention that the voice source, you know, is also
0:53:26part of the story; my question is about what they want to communicate
0:53:33of course, by voice; so
0:53:36you know, you're talking about the vocal tract, but with the voice source, for
0:53:42really low pitch or whatever
0:53:45a lot seems to be done with the voice source; do you have an
0:53:48idea of the vocal repertoire
0:53:50which is available to
0:53:53these species
0:53:57and how much they use the vocal repertoire for emotions, or for social behaviors
0:54:03we've got actually quite a lot of evidence about sort of overall vocabulary size
0:54:08for different species, but most of that comes from relatively intuitive methods:
0:54:13scientists listen and they say, there's about five sounds there, there's about
0:54:18twenty sounds there
0:54:20only in a few species have we really done what we need to do, which is
0:54:23playback experiments, to see what the animals discriminate from others, and I would say
0:54:28in many cases that shows us that something that we think is one thing, what's
0:54:32a, I don't know, a meow or a bark or a grunt, actually has multiple variants
0:54:39so I think a conservative number for animal vocabularies is something like fifteen sounds
0:54:45and a less conservative number would be something like fifty different sounds
0:54:49and in some birds it goes a lot larger than that, but if you're talking
0:54:52about your average mammal it's somewhere in that range; so roughly thirty would be a
0:54:57good nonhuman primate
0:55:00vocabulary size of discriminable sounds that have different meanings
0:55:04of course there are animals that can make thousands of different sounds
0:55:09but they do this, for example birds in their songs or whales in their songs
0:55:13but they don't appear to use this to signal different meanings, so then we
0:55:18can't talk about vocabulary anymore; we have to just start talking about
0:55:23it's more like
0:55:24phoneme or syllable types rather than meanings
0:55:29did you want to say something?
0:55:36is there somebody else? ...okay: what do we know, what is the frequency
0:55:41resolution of the monkey hearing
0:55:44so that it could hear the relative position of all the formants, but
0:55:48to reproduce it? absolutely; I mean most monkeys have a higher high
0:55:53frequency cutoff; most monkeys can hear up to forty or even sixty kHz, so
0:55:58the high frequencies are more extensive than ours
0:56:01but where it counts, in the low frequencies, they have perfect frequency resolution; so from five
0:56:06hundred Hz to twenty-five hundred or thirty-five hundred hertz, which is where all that
0:56:09formant information is, they can hear it
0:56:12and that's why of course an animal like a dog or a chimpanzee or basically any
0:56:16other species you care to test can learn to discriminate different human words
0:56:21virtually every dog knows its name, and in some cases you can train a dog
0:56:24to discriminate between hundreds or even thousands of words
0:56:27and they can do that
0:56:29so the speech perception apparatus seems to be built on a basically widely shared
0:56:34perceptual machinery
0:56:39I'm not in speech synthesis, and of course, knowing about how to...
0:56:42it may be out of place to say this, but
0:56:45did you actually
0:56:47need to do this, or could you do the sort of more
0:56:50standard phonetic thing: just
0:56:52record loads and loads of monkey vocalizations and measure the formants? what
0:56:58would happen if we did that?
0:57:00well, we've done that, and we've actually looked at that subset of the sounds
0:57:04so remember, what we have is some of these vocal tracts doing what monkey vocal
0:57:09tracts do, and that includes things like feeding, chewing, swallowing, et cetera; it
0:57:15also includes a class of
0:57:17non-vocal displays that most nonhuman primates, well, most monkeys and apes, do
0:57:23things like this
0:57:25which is called lip smacking; it's a very typical primate thing, but it's virtually silent
0:57:31so they make maybe a little tiny bit of sound, and in some species
0:57:36they actually vocalise when they do it; it turns out that the monkey is
0:57:40doing a lot more with its vocal tract in these visual displays than it does in
0:57:44its auditory displays
0:57:45so if we just take the vocal tract configurations where the monkey is making
0:57:50a sound, it's a subset of what the vocal tract can actually do, and in
0:57:54particular in these nonvocal communicative, so you
0:57:58could call them visual communication signals, a lot of the
0:58:02interesting variance of the vocal tract shapes is there
0:58:05and because those are silent, we have to figure out what it would sound like if
0:58:09the monkey vocalised; so that's why we had to
0:58:12do all this work, that's why it took
0:58:14years to do this
0:58:16and then, just to add to that
0:58:20well, I guess coincidentally, almost at the same time as our paper came out
0:58:25another group, which was just mentioned here in the front
0:58:29came up with a paper where they did exactly what you would suggest, and
0:58:33they found basically that
0:58:36they actually looked at two different monkey species' actual utterances, and they can produce a
0:58:42surprisingly large range of sounds; that's especially surprising if you compare it to what Lieberman
0:58:50had claimed that they could produce
0:58:52but not as large as the range of sounds that our model produced; so
0:58:58they mainly do not produce, in their actual productions, the potential that they
0:59:06have with their vocal tract
0:59:11I would like to confirm that I understood correctly what you said on this
0:59:17that the vibration, more generally
0:59:22is generally passive
0:59:24based on the output, or at least the experiment
0:59:30the one given in two thousand ten
0:59:35where just air flow is coming out
0:59:39and then we can say that the vibration is generally passive
0:59:45I think this is too risky
0:59:48because this is exactly what would happen if I am dead and you push a
0:59:53strong
0:59:54air flow through my vocal folds
0:59:57I don't think much will be different
1:00:02and in order to say that it is generally passive, I
1:00:07think you have to go and look
1:00:11more at neuronal activity
1:00:15and not just at the experiment; I respect this work, but I think it is too
1:00:24dangerous
1:00:25to say this
1:00:27on that slide, I think there may be a misunderstanding, because we're
1:00:33not saying that you don't need muscles to put the larynx into phonatory position
1:00:38of course you do; in this case I moved the tiger's larynx into
1:00:43the phonatory position
1:00:45what we're saying is that the individual pulses that represent the fundamental frequency, so the
1:00:49openings and closings of the glottis, that's what is passively determined by things
1:00:55like muscle tension and pressure
1:00:58so we're not saying that muscle activity doesn't play a role; what we're saying is that
1:01:03it doesn't have to happen at the periodicity of the fundamental frequency
1:01:08and that's an obvious thing if you think about a bat that's producing sounds at forty
1:01:12thousand hz; there's no way neurons can fire at that rate; neurons
1:01:17basically can't fire faster than a thousand hertz
1:01:20so even if it could work for something like an elephant, and it does work
1:01:24for something like a cat purring at thirty hz
1:01:26it could never work for most of the high vocalisations
1:01:29even a cat at two thousand hz, and certainly not these animals that are producing in
1:01:34the high kHz range; it has to be passive because there's no way neurons can
1:01:38fire or muscles can twitch
1:01:40that rapidly
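The arithmetic behind this argument is simple enough to write down; the ~1 kHz firing ceiling is the rough bound quoted in the talk, and the example rates are approximate:

```python
# Back-of-envelope version of the argument: if f0 exceeds the maximum
# sustained neural firing rate (~1 kHz, the rough bound quoted here),
# each glottal cycle cannot be driven by its own neural command, so the
# oscillation must be a passive tissue property. Example f0s are rough.
MAX_FIRING_HZ = 1000.0

def must_be_passive(f0_hz):
    """True if cycle-by-cycle neural driving is impossible at this f0."""
    return f0_hz > MAX_FIRING_HZ

for name, f0 in [("cat purr", 30.0), ("human speech", 120.0),
                 ("high vocalisation", 2000.0), ("bat call", 40000.0)]:
    print(name, must_be_passive(f0))
```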
1:01:41so the claim is not that in humans, or any animal, you don't
1:01:44need to use muscles to control the larynx; you do
1:01:49but only that you don't need muscle activity at the fundamental frequency
1:01:53does that make sense?
1:01:57it's better
1:02:04I'm just curious
1:02:06you and Lieberman both did work trying to figure out exactly the same
1:02:11thing on the same subject and came to radically different conclusions; so
1:02:17what was Lieberman's problem? is that approach never going to work, or what
1:02:22was the issue that distinguished, you know, that made the difference between what
1:02:27you did and he did, and what can that teach us for other things we want
1:02:31to do as well, so as not to draw wrong conclusions?
1:02:34I would say, and maybe you can comment on this too, but
1:02:38from the point of view of the technology
1:02:41what we're doing to understand how you go from a vocal tract to formant frequencies
1:02:47not much has changed; they did a pretty good job given the computers they
1:02:51had, their simulation was pretty good; their problem was in the biology; their problem was
1:02:55that they took a single dead animal and expected that
1:02:59dead animal was gonna tell them the range of motions that are possible in a
1:03:04living animal's vocal tract
1:03:05so they had no indication of what the dynamics of
1:03:08the vocal tract are
1:03:10from looking at the data, and that's why we needed these x-rays of a
1:03:13behaving monkey to be able to find out
1:03:16okay, but you're not saying that you can never figure out what
1:03:22is going on from a dead animal; so if you...
1:03:29so, by the way, that is Klatt, which should be a familiar name to people working
1:03:33on speech synthesis, who co-authored one of these papers here, and so he
1:03:38was basically the guy who did the acoustic modeling
1:03:42work; and so at the time there were a few competing labs working on speech synthesis
1:03:48and basically the acoustic model I used for my model is basically contemporaneous with
1:03:55Dennis Klatt's model, so indeed, you know, classic stuff
1:03:59so basically they just didn't have the data; it's kind of like old eighties neural
1:04:03nets versus Google
1:04:05they just didn't have the data, and we have the data
1:04:16and I think it's a very
1:04:20interesting point about the different species; you said that there
1:04:25is
1:04:28something like fifteen to fifty sounds per species; and my question is now
1:04:32whether the semantics of what they try to express
1:04:35across species is very similar or very different
1:04:39that is, whether different species are, in what
1:04:44they're trying to express, very different from each other
1:04:47there's a certain set of core vocalisations that are very widely shared among species
1:04:53so for example sounds that mean threat, sounds that say I'm being mean and scary
1:04:58they tend to be low and have very low formants
1:05:02sounds that are appeasing and saying please don't hurt me, I'm just a little
1:05:05guy, tend to be high frequency
1:05:07so we see that class of vocalisations shared widely across mammals and birds
1:05:12then we have this class of kind of mating vocalisations that a lot of species
1:05:17do, but they typically sound very different; sometimes it's males just going "woo" like that
1:05:22and sometimes it's much more interesting and complicated
1:05:24and then there's typically mother-infant communication, so there are usually sounds,
1:05:31particularly in mammals, that the mother uses to communicate
1:05:36again very widespread
1:05:38and then there's really weird stuff like whale songs or echolocation clicks or
1:05:44calls that are really only found in particular groups; so I'd say there's a
1:05:48kind of shared core of semantics, and then, various as biology is, there's all kinds
1:05:54of weird stuff in the corners; but if you say parental care
1:06:00aggression, affiliation
1:06:03there's also alarm calls, and threat calls are pretty common; but a handful of maybe
1:06:08five semantic axes would probably do it for a standard mammal
1:06:20well, there are some vocalisations that are basically saying I'm here
1:06:25and there are other vocalisations that try their best to hide, so they're a
1:06:29very high frequency quiet thing that tails off, which makes it hard to find; so
1:06:33various alarm calls are like that
1:06:36so it's like there is an asymmetry, isn't it
1:06:39so the fact that, for example, a dog
1:06:42in fact understands quite a lot of human words, that's right
1:06:46but the vocabulary it can express is so small, that there's a mismatch between what
1:06:53it can understand, maybe hundreds of words
1:06:55and the seven or so sounds it can produce
1:07:04if I could understand words but my own responses were constrained to a few
1:07:08that's kind of frustrating, very
1:07:14well, I think that is a fundamental finding of animal communication, that animals understand
1:07:20a lot more than they can say
1:07:22so essentially we have many species, for example, that understand not only their own species
1:07:27but they can learn the alarm calls of other species in their environment, and of
1:07:31course animals raised with humans learn to understand human words, and none of those species
1:07:36ever produce those
1:07:38so it's just, as with a child, or with any of us, our receptive vocabulary, the words
1:07:42we understand, is much larger than the number of words we say, typically
1:07:46for most animals I think the receptive vocabulary is large and the productive vocabulary is
1:07:52very limited
1:07:53whether they find that frustrating or not
1:07:55I don't know, that's harder to say
1:08:03so, the
1:08:07do humans have more control overall? or, also, in the
1:08:13model you used, the excitation signal was much more irregular
1:08:18so compared to every other mammal, is ours more clear and more
1:08:23easy to model?
1:08:25this takes us back to this image; we've done a lot of work now doing excised
1:08:31larynx work, and one of the things we found is that most species can very
1:08:35easily be driven into a chaotic state
1:08:38where rather than this nice regular harmonic process that we see here, you get essentially
1:08:45coupled oscillators in the vocal folds generating chaos, and you can see the classic steps
1:08:50from biphonation into triphonation, period doubling, to chaos in vocal folds in
1:08:55virtually every species that we looked at
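The period-doubling route to chaos mentioned here is the generic nonlinear-dynamics cascade, not something specific to vocal folds; as an analogy (not a vocal-fold model), the same cascade shows up in the textbook logistic map:

```python
# Period-doubling route to chaos, illustrated with the logistic map
# x -> r * x * (1 - x). This is an analogy for the cascade described in
# the talk, not a model of vocal-fold tissue.
def attractor_period(r, n_settle=2000, n_sample=64, tol=1e-6):
    """Estimated period of the attractor at parameter r (None if chaotic)."""
    x = 0.5
    for _ in range(n_settle):           # let transients die out
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n_sample):
        x = r * x * (1.0 - x)
        orbit.append(x)
    for p in (1, 2, 4, 8):              # look for a short repeating cycle
        if all(abs(orbit[i] - orbit[i + p]) < tol for i in range(n_sample - p)):
            return p
    return None                         # no short period found: chaotic regime

print(attractor_period(2.8))   # regular regime
print(attractor_period(3.2))   # first period doubling
print(attractor_period(3.9))   # chaos
```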
1:08:58now, it seems to be very easy for most animals to go into a
1:09:02chaotic state, and that's reflected by the fact that many sounds we hear animals produce
1:09:07have a chaotic source
1:09:09so for example monkeys do this all the time, they do this
1:09:13and even dog barks are like that; they let themselves use chaos much
1:09:18more; in speech, you could do this
1:09:22but unless you're batman
1:09:23you know
1:09:25nobody does that; we favour this harmonic source for most things; if you
1:09:30listen to a baby crying you'll hear plenty of chaos
1:09:33so I think what's hard to say is whether humans
1:09:37can produce chaos with their vocal folds but just choose to use
1:09:41this nice regular harmonic, nice clear pitch signal
1:09:44because it's
1:09:46you know, better for understanding, or it sounds nice, or our vocal folds are actually less
1:09:52inclined to go chaotic
1:09:54than those of other species
1:09:55that's a question that I don't think we can answer at present
1:09:58but we certainly do a lot less chaos than monkeys; it's the most common thing you're
1:10:02gonna hear; these threat grunts
1:10:04are chaotic, and so that's what we were trying to model in the sentence
1:10:08so I've done a few
1:10:11models where there's interaction between the vocal tract and the vocal folds, and also looked
1:10:16at chaotic vibrations, and one of the other things that you find, even if you
1:10:21get these chaotic vibrations, is that it's somewhat, well, it's
1:10:25quite a bit harder to control vocal fold onset, so it tends to be more gradual
1:10:30which makes it, for instance, almost impossible to make a distinction between voiced and
1:10:36unvoiced consonants, which is pretty important in speech; and so I'm just speculating
1:10:42here, but it seems that this
1:10:47regular vibration of the human vocal folds is useful for speech; whether it's, you know
1:10:56being used by speech because it was that way, or whether it has become that
1:11:01way because it's useful for speech, that's another question
1:11:17thank you very much