0:00:13so it is my privilege this morning to introduce our keynote speaker frank guenther
0:00:22he is a computational and cognitive neuroscientist specialising in speech and sensorimotor control
0:00:30he is from the
0:00:34department of speech language and hearing sciences and biomedical engineering at boston university where he also obtained his phd
0:00:43and his research combines theoretical modelling
0:00:46with behavioural and neuroimaging experiments to characterise the neural computations underlying speech and language so this is a
0:00:55fascinating research field
0:00:58which we thought would be of interest to us all in our research
0:01:04and so without further ado
0:01:06i'd like you to help me welcome professor frank guenther
0:01:19morning thanks for showing up this early in the morning i'd like to start by thanking the organisers for inviting me to
0:01:25this conference in such a beautiful location
0:01:28and i'd also like to acknowledge my collaborators before i get started the main collaborators on the work i'll talk
0:01:35about today include
0:01:36people from my lab at boston university including jason tourville jonathan brumberg
0:01:42satrajit ghosh alfonso nieto-castanon maya peeva elisa golfinopoulos and oren
0:01:48civier
0:01:50but in addition we collaborate a lot with outside labs and i'll be talking about a number of projects that
0:01:56involve collaborations with people at mit including joseph perkell melanie matthies and harlan lane
0:02:02we've worked with shinji maeda who created the speech synthesizer we use for much of our modelling work
0:02:09and philip kennedy and his colleagues at neural signals who work with us on our neural prosthesis project which i'll
0:02:16talk about at the end of the lecture
0:02:20the research program in our laboratory has the following goals
0:02:25we are interested in understanding the brain first and foremost and
0:02:29we're in particular interested in elucidating the neural processes that underlie normal speech learning and production
0:02:37but we are also interested in looking at disorders and our goal is to provide a mechanistic model based account
0:02:44and by model here i mean a neural network model that mimics the brain processes that are underlying speech and
0:02:52using this model to understand communication disorders problems that happen when part of the circuit is broken
0:03:00and i'll talk a bit about communication disorders today but will focus on the last part of our work which
0:03:06is developing technologies that aid individuals with severe communication disorders and i'll talk a bit about a project involving a patient
0:03:14with locked in syndrome who was
0:03:16given a brain implant in order to try to restore some speech processing
0:03:22the methods we use include neural network modelling we use very simple neural networks the neurons in our
0:03:29models are simply units that sum their inputs with a nonlinear thresholding of the output
0:03:36we have other equations that define synaptic weights between the neurons
0:03:41and we adjust these weights in a learning process as i'll describe in a bit
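the simple units and weight equations just described can be sketched roughly as follows; this is an illustrative toy in python, not the actual diva equations, and the threshold value, learning rate, and hebbian-style rule are assumptions made for the sketch:

```python
import numpy as np

def unit_output(inputs, weights, threshold=0.1):
    """A rate-coded unit: weighted sum of inputs passed through a
    simple nonlinear thresholding (rectification) of the output."""
    return max(float(weights @ inputs) - threshold, 0.0)

def adjust_weights(weights, pre, post, rate=0.01):
    """Toy learning step: strengthen weights in proportion to
    correlated pre- and post-synaptic activity."""
    return weights + rate * post * pre

pre = np.array([0.5, 1.0, 0.2])     # presynaptic activities
w = np.array([0.1, 0.3, 0.0])       # synaptic weights
post = unit_output(pre, w)          # 0.05 + 0.3 + 0.0 - 0.1 = 0.25
w = adjust_weights(w, pre, post)    # weights nudged toward active inputs
```

the point is only that each "neuron" is a weighted sum with a nonlinearity, and that the weights themselves are adjusted by a separate learning rule.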
0:03:45we test the model using a number of different types of experiments we use motor and auditory psychophysics experiments
0:03:52to look at speech look at the formant frequencies for example during different speech tasks
0:03:57and we also use functional brain imaging including fmri but also meg and eeg to try to
0:04:04verify the model or help us improve the model by pointing out weaknesses in the model
0:04:10and the final set of things we do given that we're a computational neuroscience department we're interested in
0:04:17producing technologies that are capable of helping people with communication disorders and i'll talk about one project involving
0:04:24the development of a neural prosthesis for allowing people to speak who have problems with their speech output
0:04:34the studies we carry out are largely organised around one particular model which we call the diva model and this
0:04:41is a neural network model of speech acquisition and production that we've developed over the past twenty years in our lab
0:04:48so in today's talk i'll first give you an overview of the diva model including a description of the process
0:04:53of learning that allows the model to tune up so that it can produce speech sounds
0:04:57i'll talk a bit about how we extract simulated fmri activity from the model fmri is functional magnetic resonance imaging
0:05:05and this is a technique for measuring blood flow in the brain and areas of the brain that are active
0:05:11during a task
0:05:12have increased blood flow so we can identify from fmri what parts of the brain are most active for
0:05:17a task and differences in activities for different tasks
0:05:23this allows us to test the model and i'll show an example of this where we use auditory perturbation of
0:05:28speech in real time so that a speaker is saying a word but they hear something slightly different
0:05:33and we use this to test a particular aspect of the model which involves auditory feedback control of speech
0:05:40and then i'll end the talk with a presentation of a project that involved
0:05:46communication disorders in this case an extreme communication disorder in a patient with locked in syndrome who was completely paralysed and
0:05:54unable to move
0:05:56and so we are working on prostheses for people in this condition to help restore their ability to speak
0:06:03so that they can communicate with people around them
0:06:08this slide shows a schematic of the diva model i will not be talking about the full model much i will
0:06:14use a simplified schematic in a minute
0:06:16what i want to point out is that the different blocks in this diagram correspond to different brain regions
0:06:23that include different
0:06:25what we call neural maps a neural map in our terminology is simply a set of neurons that represent a
0:06:32particular type of information so in motor cortex for example down here in the ventral motor cortex part of the
0:06:38model we have articulator velocity and position maps
0:06:42these are basically neurons that command the positions of the speech articulators in an articulatory synthesizer
0:06:51which is schematized here so the output of our model is a set of commands to an articulatory
0:06:56synthesizer this is just a piece of software to which you provide a set of articulator positions as input the
0:07:04synthesiser we use the most was created by shinji maeda and involves
0:07:09seven articulatory degrees of freedom there's a jaw degree of freedom three tongue degrees of freedom two lip degrees of
0:07:16freedom for opening and protrusion
0:07:18and a larynx height degree of freedom and together once you specify the positions of these articulators you can create
0:07:26a vocal tract area function and you can use that area function to synthesise the acoustic signal that would
0:07:32be produced by a vocal tract of that shape
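the full maeda synthesizer is far more than a few lines, but the idea of going from a vocal tract shape to its acoustics can be hinted at with the textbook uniform-tube approximation; the 17.5 cm tract length and speed of sound below are standard illustrative values, not parameters from the talk:

```python
def uniform_tube_formants(length_cm=17.5, n=3, c_cm_per_s=35000.0):
    """Resonances of a uniform tube closed at the glottis and open at
    the lips: F_k = (2k - 1) * c / (4 * L). A toy stand-in for the
    area-function-to-acoustics step of a real articulatory synthesizer."""
    return [(2 * k - 1) * c_cm_per_s / (4.0 * length_cm)
            for k in range(1, n + 1)]

formants = uniform_tube_formants()  # roughly 500, 1500, 2500 Hz
```

a real synthesizer replaces the uniform tube with the area function computed from the seven articulator positions, but the shape-to-resonance mapping is the same idea.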
0:07:36the model's
0:07:39productions are fed back to the model in the form of auditory and somatosensory information that go to maps
0:07:45for the auditory state and somatosensory state located in auditory cortical areas in heschl's gyrus and the posterior superior temporal gyrus
0:07:54and in somatosensory cortical areas in the ventral somatosensory cortex and supramarginal gyrus
0:08:01each of the large boxes here represents a map in the cerebral cortex
0:08:05and the smaller boxes represent sub cortical components of the model most notably a basal ganglia loop
0:08:13for initiating speech output
0:08:16and a cerebellar loop
0:08:18which contributes to several aspects of production i'm going to focus on the cortical components of the model today
0:08:26and so i'll use this simplified version of the model which doesn't have all the components but it has all
0:08:32the main processing levels that we'll need for today's talk so the highest level of processing in the model
0:08:40is what we call the speech sound map
0:08:42and this corresponds to cells in the left ventral premotor cortex and inferior frontal gyrus
0:08:49in what is commonly called broca's area and the premotor cortex immediately behind broca's area
0:08:57in the model each one of these cells comes to represent a different speech sound and a speech sound in
0:09:03the model can be either a phoneme or syllable or even a multi syllabic phrase the key thing here is
0:09:10that it's something that's produced
0:09:11very frequently so that there's a stored motor program for that speech sound and the canonical sort of speech sound
0:09:18that we use
0:09:19is the syllable so for the remainder of the talk i'll talk mostly about syllable production when referring to the
0:09:24speech sound map
0:09:26so cells in the speech sound map project
0:09:30both to the primary motor cortex through what we call a feedforward pathway which is a set of learned
0:09:37commands for producing these speech sounds and they activate associated cells in the motor cortex that command the right articulators
0:09:45but also the speech sound map cells project to sensory areas
0:09:49and what they do is they send
0:09:51targets to those sensory areas so if i want to produce a particular syllable such as ba
0:09:57when i say ba i expect to hear certain things i expect certain formant frequencies as a function of
0:10:03time and that information is represented by synaptic projections from the speech sound map over to what we call an
0:10:10auditory error map
0:10:11where this target is compared to incoming auditory information
0:10:16similarly when we produce a syllable we expect it to feel a particular way when i say ba for example i
0:10:22expect my lips to touch for the b and then to release
0:10:25for the vowel this sort of information is represented in a somatosensory target that projects over to the somatosensory
0:10:32cortical areas where it is compared to incoming somatosensory information
0:10:37these targets are learned as is the feed forward command during a learning process that i'll describe briefly in just a minute
0:10:45the arrows in the diagram represent synaptic projections from one type of representation to another
0:10:52so you can think of these synaptic projections as basically transforming information from one sort of representation frame into another
0:10:59representation frame and the main representations we focus on here are
0:11:04phonetic representations in the speech sound map
0:11:06motor representations in the articulator velocity and position maps
0:11:11auditory representations in the auditory maps and finally somatosensory representations in the somatosensory maps
0:11:18the auditory dimensions we use in the model typically correspond to formant frequencies and i'll talk about that
0:11:25quite a bit as i go on in the talk
0:11:27whereas the somatosensory targets correspond to things like
0:11:31pressure and tactile information from the lips and the tongue while you're speaking as well as muscle information about
0:11:40lengths of muscles that give you a read of where your articulators are in the vocal tract
0:11:47okay so just to give you a feel for what the model does i'm going to show the synthesizer the articulatory
0:11:54synthesizer with just purely random movements now so this is
0:11:58what we do in the very early stages of learning in the model we randomly move the speech articulators
0:12:05that creates auditory information and somatosensory information
0:12:09from the speech and we can associate the auditory information and the somatosensory information with each other and with the
0:12:16motor information that was used to produce the movements of speech so these movements don't sound anything like speech as
0:12:23you'll see here
0:12:25so this is just randomly activating the seven dimensions of movement
0:12:32so this is what the model does for the first forty five minutes we call this a babbling cycle it takes
0:12:37about forty five minutes of real time to go through this
0:12:40and what the model does is it tunes up many of the projections between the different areas so here for
0:12:45example in red are the projections that are tuned during this random babbling cycle
0:12:50so the key things being learned here are relationships between motor commands
0:12:56somatosensory feedback and auditory feedback
0:12:59and in particular what the model needs to learn for producing sounds later is how to correct for sensory errors
0:13:06and so what the model is learning largely is if i need to change my first formant frequency in an
0:13:13upward direction for example because i'm too low
0:13:16then i need to activate a particular set of motor commands and this will flow through a feedback
0:13:21control map to the motor cortex
0:13:24and will translate this auditory error into a motor corrective command
0:13:29and similarly if i feel that my lips are not closing enough for b there will be a somatosensory
0:13:36error representing that and that somatosensory error will then be mapped into a corrective motor command in the motor cortex
0:13:43these arrows in red here are the transformations basically the synaptic weights encoding these transformations and they're tuned up
0:13:51during this babbling cycle
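one way to picture what this babbling stage accomplishes is to pair random motor perturbations with the sensory changes they cause, then fit the reverse sensory-to-motor mapping from those pairs; the linear "synthesizer" below is a made-up stand-in for the maeda model, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear forward map: 7 articulator changes -> 2 formant changes.
A = rng.normal(size=(2, 7))

d_motor = rng.normal(size=(200, 7))   # random babbling movements
d_audio = d_motor @ A.T               # the auditory consequences

# Learn the auditory -> motor direction from the babbled pairs
# by least squares, mimicking the tuning of the red projections.
inv_map, *_ = np.linalg.lstsq(d_audio, d_motor, rcond=None)

# A desired auditory change can now be turned into a motor change.
wanted = np.array([50.0, -30.0])      # e.g. raise F1, lower F2
command = wanted @ inv_map
achieved = A @ command                # applying the command recovers the goal
```

the model of course learns this relationship with neural weight updates rather than a batch least-squares fit, but the information being stored is the same: which motor change produces which sensory change.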
0:13:55after the babbling cycle so at this point the model still has no sense of speech sounds this corresponds to
0:14:01very early babbling in infants
0:14:04up to about six months of age before they start really learning and producing sounds from a particular language and
0:14:11the next stage of the model handles the learning of speech sounds from a particular language and this is the
0:14:16imitation process in the model
0:14:18and what happens in the imitation process is we provide the model with an auditory target so we give it
0:14:23a sound file of somebody producing a word or phrase
0:14:28the formant frequencies are extracted and are used as the auditory target for the model
0:14:34and the model then attempts to produce the sound by reading out whatever feed forward commands it might have if
0:14:41it just heard the sound for the first time for the first time it will not have any feed forward
0:14:46commands because it hasn't yet produced the sound it doesn't know what commands are necessary to produce the sound
0:14:51and so in this case it's going to rely largely on auditory feedback control in order to produce the sound
0:14:57because all it has is an auditory target
0:14:59the model attempts to produce the sound it makes some errors but it does some things correctly due to the
0:15:05feedback control and it takes whatever commands are generated on the first attempt and uses them as the feed forward
0:15:11command for the next attempt
0:15:13so the next attempt now has
0:15:16a better feed forward command so there will be fewer errors and less of a correction
0:15:22but again both the
0:15:24feed forward command and the correction added together that's the total output that's then
0:15:29turned into the feed forward command for the next iteration and with each iteration the error gets smaller and smaller
0:15:35due to the incorporation of these corrective motor commands into the feed forward command
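the attempt-by-attempt loop just described can be caricatured in a few lines: each attempt's feedback correction is folded into the next attempt's feedforward command, so the error shrinks with every iteration; the target value, gain, and identity "vocal tract" are all invented for the sketch:

```python
target = 700.0        # desired auditory value, e.g. an F1 in Hz
feedforward = 0.0     # first attempt: no stored motor program yet
gain = 0.7            # feedback controller gain

errors = []
for attempt in range(6):
    produced = feedforward           # identity "plant" for simplicity
    error = target - produced        # detected by the error map
    feedforward += gain * error      # correction absorbed into feedforward
    errors.append(abs(error))
```

with any gain between 0 and 1 the residual error shrinks geometrically, which is the qualitative behaviour described for the model's six learning attempts.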
0:15:41just to give you an example of what that sounds like so here is an example that was presented to
0:15:46the model for learning
0:15:50good doggie
0:15:52this is a speaker saying good doggie and
0:15:54here it is once more
0:15:57good doggie
0:15:58and what the model is going to now try to do is it's going to try to mimic this with
0:16:03initially no feed forward command just using auditory feedback control the auditory feedback control system that was tuned up during the
0:16:11earlier babbling stage
0:16:13and so it does a reasonable rendition but it's kind of sloppy
0:16:18this is the second attempt it'll be significantly improved because the feedback commands from the first attempt have been
0:16:25now moved into the feed forward command
0:16:32and then by the sixth attempt the model has perfectly learned the sound meaning that there are no errors
0:16:39in its formant frequencies which is all you can hear from the sound pretty much and so it sounds like
0:16:47this was the original
0:16:49good doggie
0:16:50so what you can hear is that the formant frequencies pretty much track the original formant frequencies in this case
0:16:55they track imperfectly as we looked at just the first three formant frequencies of the speech sound
0:17:01when doing this and so in this case we would say the model has learned to produce this phrase now
0:17:06so it would have a speech sound map cell devoted to that phrase if we activate that cell it reads
0:17:12the phrase out now with no errors
0:17:16well an important aspect of this model is that it's a neural network and the reason we chose the neural
0:17:22network construction is so that we could
0:17:25investigate brain function in more detail so what we've done is we've taken each of the neurons in the model
0:17:31and we localise them in a standard brain space a stereotactic space
0:17:37that is commonly used for analysing neuroimaging results from experiments such as fmri experiments and so here these orange
0:17:46dots represent the different components of the model
0:17:50here for example this is the central sulcus in the brain the motor cortex is in front of
0:17:55the central sulcus and the somatosensory cortex is behind it
0:17:58and we have representations of the speech articulators in this region in both hemispheres
0:18:03the auditory cortical areas include state cells and auditory error cells which was a novel prediction we made from the
0:18:11model that these cells would reside somewhere in the higher level auditory cortical areas and i'll talk about testing that
0:18:17prediction in a minute
0:18:19we have somatosensory cells in the somatosensory cortical areas of the supramarginal gyrus here
0:18:26and these include our somatosensory error cells also crucial to
0:18:30feedback control
0:18:32and so forth so in general the representations in the model are bilateral meaning that the neurons for
0:18:40representing the lips are located in both hemispheres but the highest level of the model the speech sound map
0:18:47is left lateralized and the reason it's left lateralized is that
0:18:52a large amount of data from the neurology literature suggests that
0:18:57the left hemisphere is where we store our speech motor programs
0:19:01in particular if there is damage to the left ventral premotor cortex or adjoining broca's area here in the inferior
0:19:09frontal gyrus
0:19:10speakers have what's referred to as apraxia of speech and this is an inability to read out the motor
0:19:17programs for speech sounds so they hear the sound they understand what the word is and they
0:19:24they try to say it but they just can't get the syllables to come out and this in our view is
0:19:30because their motor programs represented by the speech sound map cells
0:19:34are damaged due to the stroke if you have a stroke in the right hemisphere in the corresponding location there
0:19:41is no apraxia speech is largely spared
0:19:45and in our view this is because the right hemisphere as i'll describe a bit later is more involved in
0:19:51feedback control than feed forward control
0:19:54an important insight is that once adult speakers learn to produce the speech sounds of their language
0:20:01and their speech articulators have largely stopped growing
0:20:04they don't need feedback control very often because their feed forward commands are already accurate
0:20:10and if you for example listen to the speech of somebody who became deaf as an adult for many
0:20:16many years their speech remains largely intelligible presumably because these motor programs are intact
0:20:23and they by themselves are enough to produce the speech properly
0:20:28in an adult however if we do something novel to the person such as
0:20:32block their jaw while they try to speak or we perturb the auditory feedback of their speech then we
0:20:38should reactivate the feedback control system by first activating sensory error cells that detect that the sensory feedback isn't what
0:20:46it should be
0:20:47and then motor correction takes place through the feedback control pathways of the model
0:20:54okay so just to highlight the
0:20:58use of these locations what i'll show you now is a typical simulation where we have the model produce an
0:21:05utterance in this case a short phrase
0:21:08and what you'll see is you'll hear first the production in our model the activities of the neurons correspond to electrical
0:21:15activity in the brain
0:21:17fmri actually measures blood flow in the brain and blood flow is a function of the electrical activity but it's
0:21:23quite slow relative to the activity it peaks four or five seconds after the speech has started and so what you'll see is
0:21:33the brain activity starting to build up in terms of blood flow over time after the utterance is produced
0:21:46quite useful for us because we can do neuroimaging experiments
0:21:50where people speak in silent
0:21:53and then we collect data after they're done speaking at the peak of this blood flow so what we would
0:21:58do is basically have them speak in silence and
0:22:03at this point we would take scans with an fmri scanner is very loud which would interrupt the speech if
0:22:09it was going on during your speech but in this case were able to scan after the speech is completed
0:22:14and get a measure of what brain activity what brain regions where active and how active they were during speech
0:22:23okay so that's an overview of the model next what i'll do is go into a little more detail about
0:22:28the functioning of the feedback control system
0:22:31and my main goal here is simply to give you a feel for the type of experiment we do we've
0:22:36done many experiments of this sort to test and refine the model over the years
0:22:41and the experiment i'll talk about in this case is an experiment involving auditory perturbation of the speech signal while the subject
0:22:48is speaking in an mri scanner
0:22:51so just to review then the model has the feed forward control system shown on the left here and the
0:22:59feedback control system shown on the right
0:23:01and feedback control has both an auditory and a somatosensory component
0:23:06so during production of speech when we activate this speech sound map cell to produce the speech sound
0:23:13in the feedback control system we read out these targets to the somatosensory system and to the auditory
0:23:18system and those targets are compared to the incoming auditory and somatosensory information
0:23:25the targets take the form of regions so there's an acceptable region that f one can be in
0:23:30if it's anywhere within this region it's okay but if it goes outside of the region an error cell is
0:23:35activated and that will drive the feedback control system
0:23:38by driving articulator movements that will move it back into the appropriate target region
0:23:45if we have an error arising in one of these maps and in particular we're gonna be focusing on the
0:23:51auditory error map
0:23:53what happens next in the model is that the error gets transformed
0:23:56through a feedback control map in the right ventral premotor cortex
0:24:01and then projected to the motor cortex in the form of a corrective motor command and so what the model
0:24:07has essentially learned is how to take auditory errors and correct them with motor movements
0:24:13in terms of mathematics this corresponds to a pseudoinverse of the jacobian matrix that relates the articulatory
0:24:20and auditory spaces
0:24:22and this can be learned during babbling simply by moving the articulators around and seeing what changes in somatosensory
0:24:28and auditory state take place
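a minimal numerical sketch of that pseudoinverse idea follows; the jacobian entries here are made up for illustration, and in the model the mapping is learned from babbling data rather than written down:

```python
import numpy as np

# Hypothetical Jacobian: rows are (F1, F2), columns are three articulators.
J = np.array([[ 80.0, -40.0,  10.0],
              [-30.0, 120.0,  60.0]])

J_pinv = np.linalg.pinv(J)              # the auditory-to-motor direction

auditory_error = np.array([50.0, 0.0])  # need F1 raised by 50 Hz
motor_correction = J_pinv @ auditory_error
achieved = J @ motor_correction         # the auditory change this produces
```

because the system is redundant, with more articulators than formants, the pseudoinverse picks the smallest motor correction that achieves the requested auditory change.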
0:24:31the fact that we have this feedback control map in the right ventral premotor cortex in the model now
0:24:36was partially the result of the experiment that i'll be talking about this was not originally in the model originally
0:24:42these projections went to the primary motor cortex
0:24:44i'll show the experimental result that caused us to change that component of the model
0:24:52so based on this feedback control system we can make some explicit predictions about brain activities during speech
0:24:59and in particular we made some predictions about what would happen if we shifted your first formant frequency during speech
0:25:07so that when we feed it back to you over earphones in fifty milliseconds you hear something slightly different than
0:25:14what you're actually producing
0:25:16well according to our model this should cause activity of cells in the auditory error map which we have localised to
0:25:24the posterior superior temporal gyrus and the adjoining planum temporale these regions in the sylvian fissure
0:25:31on the temporal lobe
0:25:32so we should see increased activity there if we perturb the speech
0:25:38and also we should see some motor corrective activity because according to our model the feedback control system will kick
0:25:45in when it hears this error even during the perturbed utterance
0:25:48and it will try to correct if the utterance is long enough it will try to correct the error that
0:25:54it hears
0:25:56now keep in mind that auditory feedback takes time to get back up to the brain so the time from
0:26:02motor cortical activity to movement and sound output to hearing that sound output and
0:26:09projecting it back up to your auditory cortex is somewhere in the neighbourhood of a hundred to a hundred fifty milliseconds
0:26:16and so we should see a corrective command kicking in not at the instant that the perturbation starts
0:26:22but about a hundred or a hundred twenty five milliseconds later because that's how long it takes to process this auditory information
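that timing argument can be checked with a toy closed loop: if the corrective command can only use feedback that is one loop delay old, compensation necessarily begins about one delay after perturbation onset; the step size, gain, and the 125 ms figure below are illustrative choices, not fitted values:

```python
dt_ms = 5
delay_steps = 125 // dt_ms       # ~125 ms auditory loop delay
perturbation = 100.0             # Hz added to the heard F1
gain = 0.02                      # small corrective gain per step

command = 0.0
produced = []                    # the compensatory command over time
for t in range(80):
    produced.append(command)
    if t >= delay_steps:
        # the correction uses the heard error from one delay ago
        heard_error = -(produced[t - delay_steps] + perturbation)
        command += gain * heard_error
```

the output stays flat for the first 125 ms of the perturbation and only then drifts opposite to the shift, mirroring the onset latency predicted for subjects' compensation.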
0:26:30so what we did was we developed a digital signal processing system that allowed us to shift the first formant
0:26:37frequency in real time meaning that a subject hears the sound with a sixty millisecond delay which is pretty much unnoticeable
0:26:46to the subject
0:26:47even unperturbed speech has that same sixty millisecond delay so they're always hearing
0:26:52a slightly delayed version of their speech over headphones we play it rather loud over the headphones and they speak quietly
0:26:59as a result of this and the reason we do that is we want to minimize things like bone conduction
0:27:04of the actual speech
0:27:06and make them focus on the auditory feedback that we're providing them which is the perturbed auditory feedback
0:27:12and what we do in particular is we take the first formant frequency and in one fourth of the utterances
0:27:18we will perturb it either up or down so three out of every four utterances are unperturbed
0:27:25one in four is perturbed well excuse me one in eight is perturbed up and one in eight is perturbed
0:27:32down so
0:27:33they get these perturbations randomly distributed they can't predict them because first of all the direction changes all the time
0:27:42and secondly because many of the productions are not perturbed
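the randomized design just described, one in eight trials shifted up, one in eight shifted down, and the rest unperturbed, is easy to sketch; the trial count and seed are arbitrary:

```python
import random

def make_schedule(n_trials, seed=0):
    """1/8 up-shift, 1/8 down-shift, 3/4 unperturbed, in random order
    so subjects cannot predict either the timing or the direction."""
    assert n_trials % 8 == 0
    trials = (["up"] * (n_trials // 8)
              + ["down"] * (n_trials // 8)
              + ["none"] * (3 * n_trials // 4))
    random.Random(seed).shuffle(trials)
    return trials

schedule = make_schedule(80)
```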
0:27:46and what we did well here's what this sounds like so the people were producing vowels
0:27:52like the eh in bet and so the words that they would produce were words like bet and peck and
0:28:00and here's an example of unshifted speech before the perturbation
0:28:09and here is a case where we've shifted f one upward an upward shift of f one corresponds to a more
0:28:16open mouth and that should make the
0:28:19eh vowel sound a little bit more like the a in bat
0:28:22and so if you hear the perturbed version of that production
0:28:27it sounds more like bat than bet in this case so that's the original
0:28:41so it's consciously noticeable to you now when i play it to you like this but most subjects don't notice what's
0:28:46going on during the experiment we ask them afterwards if they noticed anything sometimes they'll say
0:28:52occasionally my speech sounded a little odd but usually they didn't really notice that much of anything going on with
0:28:59their speech and yet their brains are definitely picking up this difference and we found that with the fmri
0:29:07we also look at their formant frequencies so what i'm showing here is
0:29:13a normalized f one
0:29:16and what normalized means in this case is that the f one in a baseline unperturbed utterance
0:29:22is what we expect to see and we'll take the f one in a given utterance and compare it to that
0:29:30if it's exactly the same then we'll have a value of one so if they're producing the exact same thing as
0:29:36they do in the baseline they would stay flat on this value of one
0:29:39on the other hand if they're increasing their f one then we'll see the normalized f one go above one
0:29:46and if they're decreasing f one we'll see it go below one
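that normalization is just a pointwise ratio against the baseline trace, so a value of 1.0 means identical to baseline; the numbers below are invented for illustration:

```python
def normalized_f1(f1_trace, baseline_trace):
    """Divide each sample of an utterance's F1 trace by the
    corresponding sample of the baseline (unperturbed) F1 trace."""
    return [f / b for f, b in zip(f1_trace, baseline_trace)]

baseline = [600.0, 610.0, 620.0]        # mean unperturbed F1 over time
compensating = [600.0, 640.5, 682.0]    # speaker raising F1 over time
ratios = normalized_f1(compensating, baseline)
```

a trace that climbs above 1.0 over the utterance is the signature of upward compensation against a downward shift, which is what the plots described next show.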
0:29:51the gray shaded areas here are the ninety five percent confidence intervals of the subjects' productions in the experiment
0:29:59and what we see for the down shift is that over time the subjects increase their f one to try
0:30:06to correct for the decrease of f one that we've
0:30:09given them with the perturbation
0:30:12and in the case where we up shift their speech they decrease f one as shown by this confidence interval
0:30:19the split between the two occurs right about where we expect which is somewhere around a hundred to a hundred and
0:30:26fifty milliseconds after the first sound comes out that they hear with the perturbation
0:30:33the solid lines here are the results of simulations of the diva model producing the same speech sounds under perturbed conditions
0:30:41and so the black dashed line here shows the model's productions in the up shift condition we see it waits about
0:30:47a hundred twenty five milliseconds well in this case actually it only waits about eighty milliseconds our delay loop was a bit short here
0:30:52and then it starts to compensate for the perturbation
0:30:56similarly in the down shift case it goes for about eighty milliseconds until it starts to hear the error and
0:31:03then it compensates in an upward direction
0:31:05and we can see that the model's productions fall within the confidence intervals of the subjects' productions so the model
0:31:11produces a good fit to the behavioural data
0:31:16but we also took a look at the neuroimaging data and on the bottom what i'm showing is the results
0:31:23of a simulation that we ran before the study where we generated predictions of fmri activity
0:31:30when we compare shifted speech to non shifted speech as i mentioned when we shift the speech that should turn
0:31:37these auditory error cells on and we've localised them to these posterior areas of the superior temporal gyrus here
0:31:44when those error cells become active they should lead to a motor correction and these are shown by activities in
0:31:51the motor cortex here in the model simulation
0:31:55now we also see a little bit of cerebellar activity here in the model but i'll skip that for today
0:32:02here on the top is what we actually got from our experimental results for the ship minus no ship contrast
0:32:08the auditory hair cells were pretty much where we expected them so first of all there are auditory ourselves there
0:32:15are cells in your brain that detect the difference between what you're saying and what you expect it to sound
0:32:20like even as an adult
0:32:22these auditory errors of become active at but we noticed is that the motor corrective activity we saw was actually
0:32:29right lateralized in it was pretty motor it wasn't bilateral and primary motor as we predicted it's farther forward in
0:32:36the brain it's in a more pretty motor cortical real area
0:32:39and it's right-lateralized. so one of the things we learned from this experiment was that auditory feedback control appears
0:32:46to be right-lateralized in the frontal cortex.
0:32:49and so we modified the model to have an auditory feedback,
0:32:53or sorry, a feedback control map, in the right ventral premotor cortex area, corresponding with this region here.
0:33:01we actually ran a parallel experiment where we perturbed speech with a balloon in the mouth. so we actually
0:33:09built a machine that
0:33:11perturbed your jaw while you were speaking, so you would be saying something like "a-pa", and during
0:33:16the vowel this balloon would blow up very rapidly; it was actually the finger of a glove
0:33:21that would blow up to about a centimetre and a half and would block your jaw from closing, so that when
0:33:26you were
0:33:27done with the vowel and getting ready to say the consonant and the final vowel, the jaw was blocked;
0:33:33the jaw couldn't move as much, and subjects compensate again.
0:33:37and we saw in that experiment activity in the somatosensory cortical areas corresponding to this somatosensory error map,
0:33:45but we also saw right-lateralized motor cortical activity. and so based on these two experiments
0:33:51we modified the model to include a right-lateralized feedback control map that we did not have in the original model.
0:34:02okay so
0:34:03the other thing we can do is look at connectivity in brain activities using techniques such as structural
0:34:10equation modelling. very briefly, in a structural equation modelling analysis what we do is we use a
0:34:18predefined model of connectivity in the brain, and then we go and look at the fMRI data and
0:34:24see how much of the covariance matrix of the fMRI data can be captured by this model
0:34:31if we optimize the connections. and so what SEM does is it
0:34:36produces connection strengths from that modelling and gives you goodness-of-fit data.
0:34:41and in addition to being able to fit the data very well, meaning that the connections in the model are
0:34:47in the right place,
0:34:49we also noted an increase in what's called effective connectivity, an increase in the strength of the
0:34:56effect of these
0:34:57auditory areas on the motor areas in the right hemisphere when the speech was perturbed. so the interpretation of that
0:35:05is, when we perturb your speech with an auditory perturbation like this,
0:35:09the error cells are active, that drives activity in the right ventral premotor cortex, and so we have an
0:35:14increased effect on the motor cortex from the auditory areas in this case.
0:35:19and so this is further support for the structure in the model and the feedback control system that we just
0:35:28described.
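the effective-connectivity finding can be illustrated with a toy version of the analysis. this sketch uses synthetic ROI time series and a single least-squares path coefficient as a stand-in for a full structural equation model; the region names, coupling strengths, and noise level are all assumptions for illustration, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic time series standing in for the fMRI data: an auditory error
# region (posterior superior temporal) driving right ventral premotor
# cortex, with stronger coupling on perturbed (shifted) trials.
n = 200
aud = rng.standard_normal(n)
pm_noshift = 0.2 * aud + 0.5 * rng.standard_normal(n)  # unperturbed trials
pm_shift = 0.8 * aud + 0.5 * rng.standard_normal(n)    # shifted trials

def path_strength(source, target):
    """Least-squares path coefficient from source ROI to target ROI,
    a one-connection stand-in for optimizing SEM connection strengths."""
    return float(np.linalg.lstsq(source[:, None], target, rcond=None)[0][0])

w_noshift = path_strength(aud, pm_noshift)
w_shift = path_strength(aud, pm_shift)
# the perturbed condition recovers the larger auditory-to-premotor influence
```

a real SEM optimizes many connections jointly against the full covariance matrix and reports goodness of fit; this one-path regression only conveys what "an increase in effective connectivity" means operationally.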
0:35:30okay so that's one example of an experimental test we've done a very large number of a test of this
0:35:36we've tested predictions of can "'em" addicts in the model so we look we work with people who measure articulator
0:35:45movements using
0:35:46electromagnetic articulatory this is a technique where you basically glue receiver coils on the talking in the lips and the
0:35:55job and you can measure the very accurately the position of the articulators of these points on the articulators
0:36:03in the midsagittal plane and from this you can estimate quite a accurately in time the positions of speech articulators
0:36:10and compare them to
0:36:12productions that use the in the model we've done a lot of work looking at for example phonetic context effects
0:36:19in our production which i'll come back to later R is a phoneme in english that is produced with a
0:36:24very wide range of articulatory variability
0:36:27but the acoustic cues for /r/ are very stable; this has been shown by people such as Boyce and Espy-Wilson.
0:36:34and what you see if you produce movements with the model is that
0:36:40the model will also produce very different articulations for /r/ in different phonetic contexts, and this has to do with
0:36:45the fact that it's starting from different initial positions and it's simply going to the closest point to
0:36:51the acoustic target
0:36:53that it can get to, and that point will be in different parts of the articulator space depending on where
0:36:58you start.
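the closest-point idea can be sketched with a toy forward map: descending the acoustic-error gradient from different starting configurations lands on different articulations that achieve the same acoustic target. the forward map, target value, and starting "contexts" below are invented for illustration; DIVA itself uses a full articulatory synthesizer.

```python
import numpy as np

# Toy forward map from a 2-D "articulator" configuration to one acoustic
# variable (think F3). Invented for illustration only.
def acoustics(artic):
    return np.sin(artic[0]) + 0.5 * artic[1]

def reach_target(start, target, lr=0.1, steps=600):
    """Descend the acoustic-error gradient: move toward the closest
    articulator configuration that reaches the acoustic target."""
    a = np.array(start, dtype=float)
    eps = 1e-5
    for _ in range(steps):
        err = acoustics(a) - target
        grad = np.array([  # numerical Jacobian of the forward map
            (acoustics(a + np.array([eps, 0.0])) - acoustics(a)) / eps,
            (acoustics(a + np.array([0.0, eps])) - acoustics(a)) / eps,
        ])
        a -= lr * err * grad
    return a

TARGET = 0.9
end_a = reach_target([0.0, 0.0], TARGET)  # "context" 1 starting position
end_b = reach_target([1.5, 1.0], TARGET)  # "context" 2 starting position
# both endpoints hit the same acoustic target via different articulations
```

both runs reach the acoustic target to within rounding, yet the final articulator configurations differ substantially, which is the context-dependent variability the model exhibits for /r/.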
0:37:00we've looked at a large number of experiments on other types of articulatory movements, both in
0:37:08normal-hearing and hearing-impaired individuals. we look at what happens when you put a bite block in, we look
0:37:13at what happens when you noise-mask speakers, and we've also looked at what happens over time in the
0:37:21speech of people with cochlear implants, for example. so,
0:37:24in the case of a cochlear implant recipient who was an adult and had already learned to speak,
0:37:29when they first
0:37:31receive the cochlear implant they hear sounds that are not the same as the sounds that they used to hear,
0:37:38so their auditory targets don't match
0:37:41what's coming in from the cochlear implant, and it actually impairs their speech for a little while, for about
0:37:48a month or so, before they start to improve their speech, and by a year they show very strong
0:37:54improvements in their speech.
0:37:56and according to the model this is occurring because they have to retune their auditory feedback control system to deal
0:38:02with the new feedback, and only when that auditory feedback control system is tuned can they start to retune their
0:38:07movements to produce more distinct speech sounds.
0:38:12we've also done a number of neuroimaging experiments. for example, we predicted that the left ventral premotor cortex
0:38:21involves syllabic motor programs,
0:38:24and we used a technique called repetition suppression in fMRI, where you present stimuli that change in some
0:38:32dimensions but don't change in other dimensions.
0:38:35with this technique you can find out what it is about the stimuli that a particular brain region cares
0:38:41about, and using this technique we were able to show that in fact the only region in the brain that
0:38:46we found that had
0:38:47a syllabic sort of representation was the left ventral premotor cortex, where we believe these syllabic motor programs are located,
0:38:54highlighting the fact that the syllable is a particularly important entity for motor control.
0:39:00and this we believe is because our syllables are very highly practised and well-tuned motor programs
0:39:07that we can read out; we don't have to produce the individual phonemes, we read out the whole syllable as
0:39:12a motor program that we've stored in memory.
0:39:16finally, we've fortunately been able to even test the model's predictions electrophysiologically. this was in
0:39:24the case
0:39:25of a patient with locked-in syndrome that i'll speak about in a bit, and i'll talk about exactly what
0:39:30we were able to verify using electrophysiology, in this case actually recording from neurons in the cortex.
0:39:39okay so
0:39:40the last part of my talk now will start to focus on using the model to investigate communication disorders,
0:39:47and we've done a number of studies of this sort. as i mentioned, we've looked at speech in normal-hearing
0:39:54and hearing-impaired populations.
0:39:57we are now doing quite a bit of work on stuttering, which is a very common speech disorder that affects
0:40:03about one percent of the population. stuttering is a very complicated disorder. it's been known
0:40:10since the beginning of time, basically; every culture seems to have people who stutter within that culture. people have
0:40:17been trying to cure stuttering forever, and we've been unable to do so. and the brains of people who
0:40:23stutter are actually
0:40:24really similar to the brains of people who don't stutter, unless you look very closely. and if you start looking
0:40:30very closely you start to see things like white matter differences
0:40:35and grey matter thickness differences in the brain, and these tend to be localised around the basal
0:40:41ganglia-thalamo-cortical loop. and so our view of stuttering is that several different problems can occur in this loop;
0:40:48different people who stutter
0:40:51can have different locations of damage or of an anomaly in their basal ganglia-thalamo-cortical loop, and all of these can
0:40:59lead to stuttering. and the complexity of this disorder is partly because
0:41:05it's a system-level disorder where different parts of the system can cause problems; it's not always the same part
0:41:11of the system that's problematic in different people who stutter. and so one of the important areas of research
0:41:19for stuttering is
0:41:20computational modelling of this loop, to get a much better understanding of what's going on and how these different problems
0:41:25can lead to similar sorts of behaviour.
0:41:29we're also looking at spasmodic dysphonia, which is a vocal fold problem similar to a dystonia;
0:41:36it's a
0:41:37problem where typically the vocal folds are too tense during speech.
0:41:42again, this appears to be basal ganglia loop related.
0:41:46there's apraxia of speech, which involves left-hemisphere frontal damage; childhood apraxia of speech, which is actually
0:41:52a different disorder from acquired apraxia of speech, and tends to involve
0:42:00kind of lesser damage but in a more widespread portion of the brain;
0:42:05and so forth. and the project i'll talk most about here will be a project involving a neural prosthesis for locked-in
0:42:12syndrome. this is a project that we've done with Phil Kennedy from Neural Signals,
0:42:19who developed technology for implanting the brains of people with locked-in syndrome, and we helped him build a prosthesis
0:42:28from that technology.
0:42:31so typically our studies where we're looking at disorders involve some sort of damaged version of the model. it's a
0:42:37neural network, so we can go in and we can mess up white matter projections, which are these synaptic projections;
0:42:42we can mess up
0:42:43neurons in a particular area; we can even adjust things like levels of neurotransmitters. some studies suggest that there may
0:42:53be an excess of dopamine in some people who stutter,
0:42:56or an excess of dopamine receptors, in the basal ganglia loop. so we can go in and we
0:43:01can start changing dopamine levels and seeing how that changes both the behaviour of the model and also the
0:43:07brain activities of the model.
0:43:09and what we're doing now is running a number of imaging studies involving people who stutter, where we've made predictions
0:43:15based on several possible
0:43:19loci of damage in the brain that may result in stuttering, and we're testing those predictions both by seeing
0:43:25if the model is capable of producing stuttering behaviour but also by seeing if the brain activities
0:43:31match up with what we see in people who stutter. there are many different ways to invoke stuttering in the
0:43:36model, but each way causes a different pattern of brain activity to occur.
0:43:42so by having both the behavioural results and the neuroimaging results we can do a much more detailed
0:43:49treatment of what exactly is going on in this population.
0:43:54the example i'm gonna spend the rest of the talk describing is a bit different. in this case the
0:44:01speech motor system of the patient was damaged: the
0:44:06patient was suffering from locked-in syndrome due to a brain stem stroke.
0:44:11locked-in syndrome is a syndrome where
0:44:15patients have intact cognition and sensation but are completely unable to perform voluntary movement, so it's a case of being
0:44:23almost kind of
0:44:25buried alive in your own body. the patients sometimes have eye movements; the patient we worked with could very slowly
0:44:33move his eyes up and down, his eyelids actually, to answer yes/no questions.
0:44:39this was his only form of communication.
0:44:42and so prior to our involvement in the project he was implanted, as part of a project developing technologies for
0:44:51locked-in patients to control computers or external devices.
0:44:56these technologies are referred to by several different names: brain-computer interface, brain-machine interface, or neural prosthesis,
0:45:05and in this case we were focusing on a neural prosthesis for speech restoration.
0:45:10locked-in syndrome is typically caused by brain stem stroke in the ventral pons or, more commonly, people become
0:45:19locked in through neurodegenerative diseases such as ALS, which attack the motor system.
0:45:25people who suffer from ALS
0:45:27go through a stage in the later stages of the disease where they are basically locked in: unable
0:45:34to move or speak
0:45:35but still fully conscious and with sensation.
0:45:41well, the electrode that was developed by our colleague Phil Kennedy is schematized here, and here's a photograph of
0:45:49it. it's a tiny glass cone that is open on both ends; the cone is about a millimetre long. there are
0:45:56three gold wires inside the cone.
0:45:59they're coated with an insulator except at the very end, where the wire is cut off, and that acts as
0:46:07a recording site. so there are three recording sites within the cone: one is used as a reference and the
0:46:12other two are used as recording channels.
0:46:15this electrode is inserted into the cerebral cortex. here i've got a schematic of the cortex,
0:46:23which consists of six layers of cell types.
0:46:28the goal is to get this near layer five of the cortex,
0:46:32where the output neurons are. these are the output neurons of the motor cortex, the
0:46:39neurons that project to the periphery to cause movement.
0:46:42but it doesn't matter too much where you go, because the cone is filled with a nerve growth factor, and
0:46:47what happens is,
0:46:49over a month or two, axons actually grow into this cone and lock it into place. that's very important because
0:46:55it stops movement. if you have movement of an electrode in the brain,
0:47:00you get problems such as gliosis, which is scar tissue building up around the electrode and stopping the
0:47:06electrode from picking up signals.
0:47:08in this case the wires are actually inside a protective glass cone, and no gliosis builds up inside the cone,
0:47:16so it's a permanent electrode: you can implant this electrode and record from it for many years.
0:47:22when we did the project i'll talk about, the electrode had been in the subject's brain for over three and
0:47:28a half years.
0:47:33the electrode location was chosen in this case by having the subject attempt to produce speech while in an fMRI scanner,
0:47:41and what we noticed was that the brain activity is relatively normal; it looks like the brain activity
0:47:49of a neurologically normal person trying to produce speech. in particular, there's a blob of activity on
0:47:57the precentral gyrus, which is the location of the motor cortex,
0:48:01in the region where we expect it for speech. so i'm going to refer to this region as speech motor cortex;
0:48:08this is where the electrode was implanted. so this is an fMRI scan performed before implantation. here is actually
0:48:15a CT scan afterwards, where you can see in the same brain area the wires of the electrode coming out.
0:48:22this bottom picture is a three-D CT scan showing the skull, where you can see
0:48:29the craniotomy where the electrode was inserted. you can see the wires coming out, and the wires
0:48:34go into a package of electronics that is located under the skin,
0:48:39and these electronics amplify the signal and then send it as radio signals across the scalp.
0:48:44we attach antennas, basically just antenna coils, to the scalp, so the subject has a normal-looking head;
0:48:52he's got hair on his head, there's nothing sticking out of his head.
0:48:56when he comes into the lab we attach these antennas to the scalp and we tune them to just the
0:49:02right frequencies, and they pick up the two signals that we are generating from our electrode.
0:49:08the signals are then routed to a recording system and then to a computer where we can operate on them
0:49:14in real time.
0:49:19Kennedy had implanted the patient several years before we got involved in the project,
0:49:27but they were having trouble decoding the signals, and part of the problem is
0:49:31that if you look in motor cortex there's nothing obvious that corresponds to a word or, for that matter, a syllable or
0:49:38phoneme. you don't see neurons turn on when the subject produces a particular syllable and then shut off when the
0:49:46subject's done.
0:49:47you see instead that all the neurons are just subtly changing their activity over time. so it appears
0:49:53that there's some sort of continuous representation here in the motor cortex; there's not a representation of just words and
0:49:59phonemes, at least at the motor level.
0:50:02Kennedy's group contacted us because we had a model of what these brain areas are doing, and so
0:50:09we collaborated on decoding these signals and routing them to a speech synthesizer so the subject could actually control some
0:50:17speech output.
0:50:20the tricky question here is: what is the neural code for speech in the motor cortex?
0:50:26and the problem of course is that there are no prior studies; people don't go into a human motor cortex
0:50:33and record, normally,
0:50:35and monkeys don't speak, nor do other animals. so we don't have any single-cell data about what's going
0:50:41on in the motor cortex during speech. we have data from arm movements, and we used the insights from those,
0:50:48but we also used insights from what we saw in human speech movements to determine what the
0:50:54variables were that speakers were controlling, what the motor system was caring about:
0:50:59did it mostly care about muscle positions, or did it care about the sound signal?
0:51:04and there is some available data from stimulation studies of the motor cortex. these come from
0:51:11the work of Penfield, who worked with epilepsy patients who were having surgeries to remove portions of the
0:51:18cortex that were
0:51:19causing epileptic fits.
0:51:22before they did the removal, what they would do is actually stimulate in the cortex to see what
0:51:30parts of the brain were doing what; in particular, what they wanted to do was avoid parts of the brain
0:51:35involved in speech, and they mapped out along the motor cortex areas that cause movements of the speech articulators, for
0:51:41example, and other areas that caused interruptions of speech, and so forth.
0:51:46and these studies were informative, and we used them to help us determine where to localise some of
0:51:52the neurons in the model, but they don't really tell you about what kind of representation is being used by
0:51:57the neurons. when you stimulate a portion of cortex you're stimulating hundreds of neurons minimally; they were using something like
0:52:04two volts for stimulation, and the maximum activity of a neuron is fifty-five millivolts, so the stimulation signal was dramatically
0:52:11bigger than any natural signal,
0:52:13and it activates a large area of cortex, and so you see a gross,
0:52:17poorly formed movement coming out. and speech movements tended to be things like a vocalisation: the
0:52:22subject might say "ah",
0:52:24something like that; it's just a movement, it's not really a speech sound. they don't produce any words or anything
0:52:30like that,
0:52:31and from these sorts of studies it's next to impossible to determine what sort of representation is going on in
0:52:37the motor cortex.
0:52:39however, we do have our model, which does provide the first explicit characterisation of what the response properties should
0:52:46be of speech motor cortical cells. we have actual speech motor cortical cells in the model; they are tuned to
0:52:52particular things.
0:52:54and so what we did was we used the model to guide our search for information in this part of
0:53:00the brain.
0:53:01and i want to point out that the characterisation provided by the model was something that we spent twenty years
0:53:08refining: we ran a large number of experiments testing different possibilities about how speech was controlled,
0:53:15and we ended up with a particular format in the model, and that's no coincidence; that's because we spent a
0:53:22lot of time looking at that. here is the result of one such study, which highlights the fact
0:53:28that in motor planning,
0:53:30sound appears to be more important than where your tongue is actually located. and this is a study of the
0:53:37phoneme /r/ that i mentioned before. just to describe what you're going to see here: each of
0:53:43these lines you see represents a tongue shape,
0:53:47and there are two tongue shapes in each panel; there's a dashed line.
0:53:52so this is the tip of the tongue, this is the centre of the tongue, and this is the back of the tongue;
0:53:56we're actually measuring the positions of these transducers that are located on the tongue using electromagnetic articulometry.
0:54:01and the dashed lines show the tongue shape that occurs seventy-five milliseconds before
0:54:09the centre of the /r/, which happens to be the minimum of the F3 trajectory,
0:54:14and the dark bold lines show the tongue shape at the centre of the /r/, at that F3
0:54:21minimum. so in this case you can see the speaker used
0:54:24an upward movement of the tongue tip to produce the /r/
0:54:28in this panel.
0:54:30so what we have over here are two separate subjects, where we have measurements from the subject on the
0:54:36top row and then productions of the model represented in the bottom row, and the model was actually using a speaker-specific
0:54:43vocal tract in this case. so
0:54:45what we did was we took the subject and we collected a number of MRI scans while they
0:54:50were producing different phonemes;
0:54:52we did principal components analysis to pull out their main movement degrees of freedom; we had their acoustic signals; and
0:54:58so we built a synthesiser that had their vocal tract shape and produced their formant frequencies.
0:55:04then we had the DIVA model learn to control their vocal tract. so we put this vocal tract synthesiser in
0:55:10place of the Maeda synthesizer, we babbled the vocal tract around, had it learn to produce /r/s, and
0:55:16then we went back and had it
0:55:18produce the stimuli in the study. and in this case the people were producing nonsense utterances
0:55:24in which
0:55:25the /r/ was either preceded by a vowel, a /d/, or a /g/.
0:55:34what we see is that the subject produces very different movements in these three cases. so in the vowel context the
0:55:40subject uses an upward movement of the tongue tip, like we see over here,
0:55:44but in the /d/ context the subject actually moves their tongue backwards to produce the /r/, and
0:55:49in the /g/ context they move their tongue downward to produce the /r/. so they're using three completely different gestures,
0:55:55or articulatory movements, to produce the /r/, and yet they're producing pretty much the same F3 trace; the
0:56:01F3 traces are very similar in these cases.
0:56:04if we take the model and we have it produce /r/s with the speaker-specific vocal tract, we see that the
0:56:11model, because it cares about the acoustic signal primarily, is trying to hit this F3 target,
0:56:17and the model also uses different movements in the different contexts, and in fact the movements reflect the movements of the
0:56:23speaker. so here the model uses an upward movement of the tongue tip, here the model uses a backward movement
0:56:29of the tongue, and here the model uses a downward movement of the tongue to produce the /r/. so
0:56:34what we see is that with a very simple model that's just going to the appropriate position in formant frequency
0:56:39space, we can capture this complicated variability in the articulator movements
0:56:45of the actual speaker.
0:56:47another thing to note here: this is the second speaker. again the model replicates the movements, and the
0:56:53model also captures speaker-specific differences. in this case, this speaker used a small upward tongue tip movement to produce
0:57:01the /r/,
0:57:02but that speaker, for reasons having to do with the morphology of their vocal tract, had to do a
0:57:06much bigger movement of the tongue tip to produce the /r/ in this context,
0:57:11and again the model produces a bigger movement in that speaker's case than in this speaker's case. so
0:57:17this provides pretty solid evidence that speakers are really concentrating on
0:57:21the formant frequency trajectories of their speech output, more so than where the individual articulators are located.
0:57:29and so we made a prediction that we should see formant frequency representations in the speech motor cortical area if
0:57:38we're able to look at what's going on during speech.
0:57:42this slide, i'm sure everybody here follows; these are actually the formant frequency traces for "good doggie". this
0:57:51is what i used as the target for the
0:57:54simulations i showed you earlier, and down here i show the first two formant frequencies, what's called the formant
0:58:00plane. and the important point here is that if we can just change F1 and
0:58:06F2, we can produce pretty much all of the vowels
0:58:09of the language, because they are differentiated by their first two formant frequencies. and so formant frequency space provides a
0:58:18very low-dimensional continuous space for the planning of movements,
0:58:22and that's crucial for the development of the brain-computer interface.
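a minimal sketch of why two formants go a long way: an impulse-train source passed through two resonators already yields recognizably different vowel-like signals. the formant values and bandwidths below are rough textbook-style guesses, not the synthesizer used in the talk.

```python
import numpy as np

FS = 16000  # sample rate in Hz

def resonator(x, freq, bw):
    """Two-pole IIR resonance at `freq` Hz with bandwidth `bw` Hz,
    the classic building block of formant synthesis."""
    r = np.exp(-np.pi * bw / FS)
    theta = 2.0 * np.pi * freq / FS
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def vowel(f1, f2, f0=120, dur=0.3):
    """Crude two-formant vowel: a glottal impulse train at pitch `f0`
    passed through resonators at F1 and F2 (bandwidths are guesses)."""
    src = np.zeros(int(FS * dur))
    src[:: FS // f0] = 1.0               # impulse train at the pitch period
    out = resonator(resonator(src, f1, 80.0), f2, 100.0)
    return out / np.max(np.abs(out))     # normalize to unit peak

ee = vowel(300, 2300)   # F1/F2 roughly like "ee"
ah = vowel(700, 1200)   # F1/F2 roughly like "ah"
```

moving continuously through (F1, F2) pairs with this kind of synthesizer is exactly the two-dimensional control problem the prosthesis targets.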
0:58:27okay, and why is it crucial? well,
0:58:31there have been a number of brain-computer interfaces that involve implants in the hand area
0:58:37of the motor cortex,
0:58:39and what they do usually is they decode cursor position on the screen from neural activities in the hand area,
0:58:46and people learn to control movement of a cursor by activating their neurons in their hand motor cortex.
0:58:55now, when they build these interfaces they don't try to decode all of the joint angles of the arm
0:59:01and then determine where the cursor would be based on where the mouse would be; instead they go directly to
0:59:06the output space, in this case the two-dimensional cursor space.
0:59:11and the reason they do that is we're dealing with a very small number of neurons in these sorts of
0:59:15studies relative to the entire motor system: there are hundreds of millions of neurons involved in your motor system,
0:59:21and in the best case you might get a hundred neurons in the brain-computer interface. we were actually getting
0:59:26far fewer than that; we had a very old implant that only had two electrode wires,
0:59:32so we had less than ten neurons, maybe as few as two or three neurons;
0:59:38we could pull out more signals than that, but they weren't single-neuron activities.
0:59:42well, if we tried to pull out a high-dimensional representation of the arm configuration from a small number of
0:59:49neurons we would have a tremendous amount of error, and this is why they don't do that; instead they try
0:59:54to pull out a very low-dimensional thing, which is this two-D cursor position.
0:59:58well, we're doing the analogous thing here. instead of trying to pull out all of the articulator positions that determine
1:00:04the shape of the vocal tract, we're simply going to the output space, which is the formant frequency space, which
1:00:10for vowel production can be as simple as a two-dimensional signal.
1:00:16okay, so what we're doing is basically decoding an intended sound position in this two-D formant frequency space
1:00:23that's generated from motor cortical cells, but it is a much lower-dimensional thing than the entire vocal tract shape.
1:00:32well, the first thing we needed to do was verify that this formant frequency information was actually in this part
1:00:37of the brain, and the way we did this was we had the subject try to imitate a minute-long
1:00:44vowel sequence that was something like
1:00:46"ah, ee, oo". this lasted a minute, and the subject was told to do this in synchrony
1:00:56with the stimulus.
1:00:58this is crucial because we don't know otherwise when he's trying to speak, because no speech comes out. and
1:01:04so what we do is we record the neural activities during this minute-long attempted utterance
1:01:08and then we try to map them onto the formant frequencies that the subject was trying to imitate. so the
1:01:14square wave here is the actual F2
1:01:21going up and down, and here's the actual F1 going up and down for the different vowels,
1:01:27and the non-bold squiggly line here is the decoded signal. it's not great, but it's actually
1:01:35highly statistically significant: we did cross-validated training and testing, and we had a very highly significant
1:01:42representation of the formant frequencies, with R values of 0.69 for F1 and 0.68
1:01:48for F2. and so this verifies that there is indeed formant frequency information in your primary motor cortex.
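the calibration-and-decoding step can be sketched as a cross-validated linear regression from unit firing rates to intended formants. everything below is synthetic and assumed (unit count, tuning, noise level); it only illustrates the train-on-one-sequence, test-held-out procedure and the correlation metric described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the calibration run: a handful of motor-cortex
# units whose firing rates carry a noisy linear code for intended (F1, F2).
n_t, n_units = 600, 5
t = np.arange(n_t)
formants = np.column_stack([
    np.sign(np.sin(2 * np.pi * t / 150) + 1e-9),  # F1 alternating up/down
    np.sign(np.sin(2 * np.pi * t / 100) + 1e-9),  # F2 alternating up/down
])
tuning = rng.standard_normal((2, n_units))        # assumed linear tuning
rates = formants @ tuning + rng.standard_normal((n_t, n_units))

# Cross-validated linear decoder: fit on the first half of the attempted
# utterance, evaluate on the held-out second half.
half = n_t // 2
W, *_ = np.linalg.lstsq(rates[:half], formants[:half], rcond=None)
pred = rates[half:] @ W
r_f1 = np.corrcoef(pred[:, 0], formants[half:, 0])[0, 1]
r_f2 = np.corrcoef(pred[:, 1], formants[half:, 1])[0, 1]
```

with only a few noisy units the held-out correlations are far from perfect, mirroring the "not great but highly significant" fits reported for the real recordings.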
1:01:55and so the next step was simply to use this information to try to produce speech output
1:02:00just as a review for most of you: formant synthesis of speech has been around for a long time. Gunnar
1:02:07Fant, for example, in nineteen fifty-three used this very large piece of electronic equipment here
1:02:14with this stylus on a two-dimensional pad, and what he did was he would move the stylus around on the
1:02:20pad, and the location of the stylus was a location in the F1-F2 space,
1:02:26so he was basically moving around in the formant plane, and just by moving this cursor around in this two-dimensional
1:02:32space he was able to produce
1:02:33intelligible speech. so here's an example.
1:02:41so the good news here is that with just two dimensions some degree of speech output can be produced.
1:02:47consonants are very difficult, i'll get back to that at the end, but certainly vowels are possible with this sort
1:02:53of synthesis.
1:02:55so here's what we did with the system. here is a schematic: our electrode is in the
1:03:01speech motor cortex;
1:03:03its signals are picked up and amplified and then sent across the scalp.
1:03:08we record the signals and we then run them through a neural decoder, and what the neural decoder does is
1:03:15it predicts what formant frequencies are being attempted based on the activities. so it's trained up on one of these
1:03:21one-minute-long sequences,
1:03:23and once you train it up, then it can take a set of neural activities and translate that into
1:03:30a predicted first and second formant frequency, which we can then send through a speech synthesiser to the subject.
1:03:36the delay from the brain activity to the sound output was fifty milliseconds in our system, and this is approximately
1:03:42the same delay as from
1:03:43your motor cortical activity to your sound output. and this is crucial, because if the subject is going to be
1:03:49able to learn to use this synthesiser, you need to have a natural feedback delay. if you delay speech feedback
1:03:55by a hundred milliseconds in a normal speaker,
1:03:58they start to become highly disfluent; they go through some stuttering-like behaviour; it's very disruptive. so
1:04:08it's important that this thing operates very quickly
1:04:11and produces this feedback in a natural time frame.
Now what I'm going to show is the subject's performance with the speech BCI. We had him perform a vowel task: the subject would start out at a central vowel, and his task on each trial was to go to the vowel that we told him to go to. So in the video I'll play, you'll hear the computer say the target vowel, something like "E", and then it'll say "speak", and then he's supposed to say "E" with the synthesiser. You'll hear his sound output as produced by the synthesiser as he attempts to produce the vowel that was presented, and you'll see the target vowel in green here.
The cursor you'll see is the subject's location in the formant frequency space.
On most of the trials we did not provide visual feedback; the subject didn't need visual feedback, and we saw no increase in performance from visual feedback. He instead used the auditory feedback that we produced from the synthesiser to produce better and better speech, or vowel sounds at least. And so here are five examples, five consecutive productions in a block.
"E. Speak."
So that one darted very quickly to the target.
Here he goes off a little; there's the error, and he kind of steers it back into the target.
Another direct hit on the next trial. On this one he seems to just make it before the timeout.
And here's the last one.
Straight to the target. So what we saw were two sorts of behaviour. Oftentimes it was straight to the target, but other times he would go off a little bit, and then, once he hears the feedback going off, presumably in his head he's trying to change the shape of his tongue to try to actually say the sound. So he's trying to reshape where that sound is going, and you'll see him kind of steer toward the target in those cases. So what's
happening in these panels is that I'm showing the hit rate as a function of block. In any given session we would have four blocks of trials, with about five to ten productions per block, so during the course of a session he would produce about ten to twenty repetitions, well, actually about five to ten repetitions of each vowel. When he first starts, his hit rate is just below fifty percent; that's above chance, but it's not great. But we see that with practice it gets better with each block, and by the end he's improved his hit rate to over seventy percent
on average. In fact, in the later sessions he was able to get up to about a ninety percent hit rate. If we look at the endpoint error as a function of block, this is how far away he was from the target in formant space at the end of the trial: if the trial was a success it would be zero, and if it's not a success there's an error. We see that this pretty much linearly drops off over the course of a forty-five-minute session, and the movement time also improves a little bit.
This slide shows what happens over many sessions; these are twenty-five sessions, and this is the endpoint error we're looking at. One thing to note is that there's a lot of variability from day to day, and I'll be happy to talk about that. We had to train up a new decoder every day because we weren't sure we had the same neurons every day, so some days the decoder worked very well, like here, and other days it didn't work so well. What we saw on average over the sessions is that the subject got better and better at learning to use the synthesisers, meaning that even though he was given a brand-new synthesiser on the twenty-fifth session, it didn't take him nearly as long to get good at using that synthesiser.
Well, to summarise then for the speech brain-computer interface: there are several novel aspects of this interface. It was the first real-time speech brain-computer interface, so this is the first attempt to actually decode ongoing speech, as opposed to pulling out words or moving a cursor to choose words on a screen. It was the first real-time control using a wireless system, and wireless is very important for this: if you have a connector coming out of your head, which is the case for some patients who get this sort of surgery, that connector can have an infection build up around it, and this is a constant problem for people with this sort of system. Wireless systems are the way of the future. We were able to do a wireless system because we only had two channels of information; current systems usually have a hundred channels or more of information, and the wireless technology is still catching up, so these hundred-channel systems typically still have connectors coming out of the head.
And finally, our project was the first real-time control with an electrode that had been implanted for this long; the subject had been implanted for over three years, and this highlights the utility of this sort of electrode for permanent implantation. The speech that came out was extremely rudimentary, as you saw, but keep in mind that we had two tiny wires of information coming out of the brain, pulling out information from at most ten neurons out of the hundreds of millions of neurons involved in the system, and yet the subject was still able to learn to use the system and improve the speech over time. There are a number of things we're working on now to improve it. Most notably, we're working on improving the synthesis: we are developing two-dimensional synthesisers that can produce both vowels and consonants, and that sound much more natural than a straight formant synthesiser.
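For intuition about what a straight formant synthesiser does, here is a toy sketch, with assumed bandwidths and sample rate, nothing like production quality: a glottal impulse train passed through two second-order digital resonators tuned to F1 and F2.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs):
    """Coefficients of a second-order digital resonator (one formant)."""
    r = math.exp(-math.pi * bw_hz / fs)
    theta = 2.0 * math.pi * f_hz / fs
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    b0 = 1.0 + a1 + a2  # normalise for unity gain at DC
    return b0, a1, a2

def synthesize_vowel(f1, f2, fs=8000, f0=120, dur=0.3):
    """Crude two-formant vowel: impulse train through resonators in series."""
    period = int(fs / f0)
    x = [1.0 if i % period == 0 else 0.0 for i in range(int(fs * dur))]
    for f, bw in ((f1, 80.0), (f2, 120.0)):  # assumed formant bandwidths (Hz)
        b0, a1, a2 = resonator_coeffs(f, bw, fs)
        y1 = y2 = 0.0
        out = []
        for s in x:
            v = b0 * s - a1 * y1 - a2 * y2
            y2, y1 = y1, v
            out.append(v)
        x = out
    return x

samples = synthesize_vowel(700.0, 1200.0)  # roughly an /a/-like vowel
```

Sweeping f1 and f2 continuously while the train runs is exactly the two-dimensional control problem the subject was solving with his cursor.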
A number of groups are working on smaller electronics and more electrodes. The state of the art now, as I mentioned, is probably ten times the information that we were able to get out of this brain-computer interface, so we would expect a dramatic improvement in performance with a modern system.
And we're spending a lot of time working on improved decoding techniques as well. The initial decoder that you give these subjects is very rough, it just gets them in the ballpark, and that's because there's not nearly enough information to tune up a decoder properly from one training sample. So what people are working on, including people in our lab, are decoders that actually tune while the subject is trying to use the prosthesis, so that not only is the subject's motor system adapting to use the prosthesis, but the prosthesis itself is helping that adaptation by cutting the error down on each production, very slowly over time, to help the system adapt.
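The idea of a decoder that co-adapts with its user can be sketched as a simple least-mean-squares update. This is my own illustration, not the lab's algorithm, and it assumes the intended target is known on each trial, as it is in a cued vowel task:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 10

# Hypothetical "true" mapping from firing rates to intended (F1, F2).
W_true = rng.normal(size=(n_units, 2))

# The initial decoder is rough: the short training sample only gets
# the subject into the ballpark.
W = W_true + rng.normal(scale=0.5, size=(n_units, 2))
eta = 0.0005  # small step so the decoder adapts very slowly over time

errors = []
for trial in range(200):
    rates = rng.poisson(5.0, size=n_units).astype(float)
    intended = rates @ W_true   # what the subject is trying to say (cued target)
    decoded = rates @ W         # what the prosthesis actually produced
    err = intended - decoded
    errors.append(float(np.linalg.norm(err)))
    # Cut the error down a little on each production (LMS update).
    W += eta * np.outer(rates, err)
```

Because the step size is small, the decoder drifts toward the user slowly, which keeps the feedback stable while the user's own motor adaptation is also in play.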
And with that I'd like to again thank my collaborators, and also thank the NIDCD and NSF for the funds that supported this research.
Okay, so we have time for two questions.

Really interesting talk. There was a pretty strong emphasis on formants running through the model and the speech it synthesises, like when you had the playback of "good doggie".
That's great. So, is there other work that you're doing with stop consonants, or figuring out a way to put things like that in? Right, so I largely focused on formants for simplicity during the talk. The somatosensory feedback control system in the model actually does a lot of the work for stop consonants. For example, for a /b/ we have a target for the closure itself; so in addition to the formant representation we have tactile dimensions that supplement the targets.
Somatosensory feedback is, in our model, secondary to auditory feedback, largely because during development we get auditory targets in their entirety from the people around us, but we can't tell what's going on in their mouths. So early development, we believe, is largely driven by auditory dimensions; the somatosensory system learns what goes on when you properly produce the sound, and then it later contributes to the production once you build up this somatosensory target.
One other quick note: another simplification here is that formant frequencies, strictly speaking, are very different for women, children and men, and so when we are using different voices we use a normalized formant frequency space, where we actually use ratios of the formant frequencies, to help accommodate this.
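A toy example of why ratios help, using illustrative /i/ formant values roughly in the range of the classic Peterson and Barney averages (these numbers are for illustration, not the model's actual representation):

```python
# Approximate /i/ formants (Hz) for an adult male and a child; the absolute
# frequencies differ a lot because the child's vocal tract is much shorter.
adult_male = {"F1": 270.0, "F2": 2290.0, "F3": 3010.0}
child = {"F1": 370.0, "F2": 3200.0, "F3": 3730.0}

def formant_ratios(f):
    # Represent the vowel by ratios of adjacent formants, discarding much
    # of the overall vocal-tract-length scaling.
    return f["F2"] / f["F1"], f["F3"] / f["F2"]

# Raw F2 differs by roughly 40% between the two speakers, but the F2/F1
# ratio agrees within a few percent: the ratio space is far more
# speaker-independent than raw frequencies.
raw_gap = abs(adult_male["F2"] - child["F2"]) / adult_male["F2"]
ratio_gap = (abs(formant_ratios(adult_male)[0] - formant_ratios(child)[0])
             / formant_ratios(adult_male)[0])
```

This is the sense in which a child can imitate an adult's vowel without being able to imitate the adult's absolute formant frequencies.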
Next question.

I think I understand where you're coming from, but I want to push on it, because we're working with people whose data look very similar to yours. In articulatory phonology you have the gestural score, and from that you can get at the articulatory information; the gesture is still actually made in "perfect memory", for example.
Yeah, okay. So in my view the gestural score is more or less equivalent to the feedforward motor command, and that feedforward command is tuned up to hit auditory targets. So we do have an analogue of a gestural score in the form of a feedforward motor command, and if you produce speech very rapidly that whole feedforward motor command will get read out, but it won't necessarily make the right sounds if you push it to the limit. So for example in the "perfect memory" case, the model would still do the gesture for the /t/ if it's producing it very rapidly; the /t/ may not come out, but the model would presumably hear a slight error and try to correct for that a little bit in later productions. To make a long story short, my view is that the gestural score, which I think does exist, is something that is equivalent to a feedforward motor command, and the DIVA model shows how you tune up that gestural score, how you keep it tuned over time, things like that. Okay.
Thanks, a really interesting talk. It seems to me that auditory and somatosensory feedback don't really tell you whether the words got through or what those words mean to people, and there are also things like visual feedback and other kinds of feedback in speech.

That absolutely matters, but we do not have anything like that in the model. We purposely focused on motor control: speech is a motor control problem, and the words are meaningless to the model. That is of course a simplification, made for tractability, so that we could study a system we could actually characterise computationally. We are now working at a higher level, connecting this model, which is kind of a low-level motor control model if you will, with higher-level models of the sequencing of syllables, and we're starting to think about how these sequencing areas of the brain interact with areas that represent meaning. The middle frontal gyrus, for example, is very commonly associated with word meaning, and temporal lobe areas interface with the sequencing system, but we have not yet modelled that. In our view we're working our way up from the bottom, where the bottom is motor control and the top is language, and we're not that far up yet.
That was a really inspiring talk. I'm wondering, thinking about the beginning of your talk and the babbling and imitation phases: one thing that's pretty apparent is that you're effectively starting your model out with an adult vocal tract, and it's listening to external stimuli which are also matched to it. I've worked a lot on things like normalisation, so I'm curious what your take is on how things change when, you know, you have a six-month-old and their vocal tract grows and so on. How do you see that fitting into the model?
Well, I think that highlights the fact that formants, strictly speaking, are not the representation that's used for this transformation: when a child hears an adult sample, they're hearing a vocal-tract-normalized version of it that their own vocal tract can imitate, because the frequencies themselves they can't imitate. So we've looked at a number of representations that involve things like ratios of the formants and so forth, and those improve the imitation abilities and work well in some cases, but we haven't nailed down what that representation is. Where I think it is in the brain is the planum temporale and the higher-order auditory areas; that's probably where you're representing speech in this speaker-independent manner. But what exactly those dimensions are I can't say for sure. It's some normalized formant representation, but the ones we've tried, Miller's space for example, from his 1989 paper, are not fully satisfactory: they do a lot of the normalisation, but they don't work that well for controlling movements.
I mean, one of the things I was thinking about is that Keith Johnson, for example, really feels that this normalisation is actually a learned phenomenon. So it feels like you have some of the machinery there; you could posit that it's some operation of an adaptive system that actually learns that normalisation.

It's possible. There are examples like parrots being able to imitate speech and so forth, so I think there's something about the mammalian auditory system such that the dimensions it pulls out naturally are largely speaker-independent already. It pulls out all kinds of information, but for the speech system I think that's what it's using. I wish I could give you a more satisfactory answer.
We have time for one more question.

Is it just the first three formants you're using?

We use the first three, or the first two, depending: for the prosthesis project we just used the first two, and for the simulations I showed in the rest of the talk we used the first three.

Okay, because in recent work, for example, you can get information about which particular tongue shape was used if you look at the higher formants, F4 and F5, when those are available.
It would be great if you could include something like that. Do you have any ideas about it?

I was just going to say we can look at that: by controlling F1 through F3 we can see what F4 and F5 would be for different articulator configurations. We haven't looked at that yet, but my view is that they're perceptually not very important or even salient. Of course the physics will make the higher formants slightly different if your tongue shapes are different, but I think that what speakers perceive and control is largely limited to the lower formants.
Regarding some of your earlier work, I've heard the argument, and there's Brad Story's work and some more recent work on this, that the higher formants give you colouring: you can add speaker-specific information to the same sound and make it sound like a different person just by changing what those values are.

I see. So we currently fix those higher formants in our model at constant values for all sounds, and you can hear the sounds properly, but the voice quality may well change if we allowed them to vary.
Very good. Just to continue: with the model you would be able to determine what the acoustic features are in these various cases, because you get the right targets for the vowels but also the continuous trajectories in between. That would be great information for separating what's speaker-independent from speaker identification and characteristics, for speaker recognition systems, as well as for speech therapy and pronunciation tools. So that's just something to think about.

Right, we'll revisit that.
Okay, so we're going to close the session, because I don't want to take too much out of the break, but let's thank our speaker once again.