0:00:21um so hmmm coding everybody so um
0:00:25my my is that's the most
0:00:26and the result in junior at three yeah
0:00:30and the work i'm going to present to you has been done with one of my colleagues [name unclear]
0:00:35who is associate professor
0:00:36at at key uh on the and all of the
0:00:40and the uh i set
0:00:45the problem we are
0:00:47tackling in this work
0:00:48is the acoustic-to-articulatory inversion
0:00:51and we propose to use a new model in this domain
0:00:54which is an episodic memory
0:00:58so here is the outline of my talk
0:01:00in the first part i'm going to briefly present
0:01:05the problem of acoustic-to-articulatory inversion
0:01:09and also
0:01:11a brief presentation of episodic modeling
0:01:14and the motivation
0:01:17then i will present the proposed approach
0:01:21which we call the generative episodic memory
0:01:25and this will be followed by an evaluation
0:01:29before the conclusion
0:01:33so um
0:01:35so what is the acoustic-to-articulatory inversion problem the aim is to recover
0:01:40the articulatory gestures
0:01:42from a speech signal
0:01:45this is an interesting problem because many applications can take advantage
0:01:49of the
0:01:51knowledge about the articulatory gestures
0:01:53such as language learning
0:01:55speech therapy or also speech recognition
0:01:59this is an interesting problem but also a very difficult one
0:02:02because this problem
0:02:03is highly nonlinear
0:02:05and
0:02:07the mapping between the acoustic and the articulatory space
0:02:11is not unique
0:02:15so we think that in fact the dynamics
0:02:19the articulatory dynamics can help us solve
0:02:22at least partially
0:02:23the non-uniqueness of the solution
0:02:26because
0:02:28the dynamics
0:02:30account for
0:02:32some well-known effects
0:02:33such as coarticulation
0:02:36they also account for the physical properties of the articulators
0:02:40such as the elasticity
0:02:43the degrees of freedom
0:02:45and they also account
0:02:48for the strategies
0:02:49that the speaker uses
0:02:51to articulate
0:02:57so what about episodic modeling
0:03:00in the psycholinguistic literature many works
0:03:03agree
0:03:04on the existence of an episodic memory
0:03:07in fact this is a part of the brain
0:03:10where we encode
0:03:14our experiences
0:03:16and these
0:03:18experiences are encoded into episodes
0:03:23that can be retrieved
0:03:24at any time
0:03:26and they are they are maybe that's is that's we use the order to may be speech processing
0:03:31and in fact you can retrieve past episodes in order to interpret present events
0:03:37and also to um
0:03:42and to speak uh you we knew
0:03:47so episodic memory can be used in
0:03:51speech processing
0:03:55such as speech recognition
0:03:57so template-based speech recognition and also
0:04:00speech synthesis
0:04:02with unit selection
0:04:05which can also be seen as a
0:04:08so um
0:04:09this model
0:04:11these models
0:04:12are in fact
0:04:15collections of acoustic realizations of lexical units
0:04:21which can be phones diphones syllables or words
0:04:25and most of the time these
0:04:28episodes are described as acoustic sequences
0:04:33with contextual information
0:04:39the results of
0:04:41these models
0:04:43for both speech recognition and speech synthesis are most of the time expressed
0:04:47as a concatenation of episodes
0:04:50and this concatenation should best explain
0:04:54the input speech signal for speech recognition
0:04:57so the input speech signal would be described as a sequence of episodes
0:05:02and for speech synthesis
0:05:03the synthesized speech is also expressed as a concatenation
0:05:08of episodes
0:05:10i call this architecture
0:05:14a concatenative memory as compared to
0:05:18so let's go back
0:05:19to our problem which is the
0:05:21acoustic-to-articulatory inversion
0:05:24so
0:05:25episodic modeling is attractive for this problem for two reasons
0:05:29the first one is that it relies on observed
0:05:32synchronized acoustic and articulatory data
0:05:35so we don't have to make any assumption about the mapping function
0:05:39the second is that the articulatory dynamics are captured within the episodes
0:05:45and this can help to solve
0:05:47the problem of non-uniqueness
0:05:51however there is a
0:05:55more practical problem than
0:05:57theoretical problem
0:05:59i mean
0:06:00if we consider speech recognition and speech synthesis
0:06:05the mapping is between a continuous space and a discrete space
0:06:08for speech recognition we try to map an acoustic signal to a sequence of lexical units
0:06:14for speech synthesis
0:06:15we try to map
0:06:17the sequence of lexical units
0:06:18so let's say phones
0:06:20to an acoustic signal
0:06:23but if we consider the acoustic-to-articulatory inversion
0:06:27the mapping is between two
0:06:29continuous spaces
0:06:32so um
0:06:34usually for speech recognition and speech synthesis the memories are based on
0:06:39let's say a few hours to tens of hours of speech
0:06:43to have a reasonable coverage
0:06:47but
0:06:49the corpora
0:06:52with articulatory information are very small for now
0:06:58typically we have a few minutes
0:07:00or
0:07:01at most
0:07:02tens of minutes
0:07:03and this
0:07:05small amount of data
0:07:07cannot allow
0:07:08us to efficiently
0:07:09cover the
0:07:11variation in
0:07:13the
0:07:15acoustic and articulatory space
0:07:23so um
0:07:25we propose
0:07:28to
0:07:29combine the episodes and this combination
0:07:34will be based on the local similarity between these episodes
0:07:39this way of combining episodes can produce
0:07:44unseen articulatory trajectories
0:07:46and can uh
0:07:47bit there are or a nice about the
0:07:50that these we can
0:07:51the memory will be able to produce variation of fixed
0:07:57here is a very basic example just to illustrate what i mean
0:08:01by combining episodes
0:08:02so just consider a
0:08:04a very simple like and pro problem
0:08:06and just a that i give you this letter and and
0:08:11ask you
0:08:12two are try to to solve this problem
0:08:14and we think uh only a to six
0:08:19image that you to fine to to try
0:08:23uh within in this that you hand
0:08:24uh the the um
0:08:26the red one and a two one
0:08:28and after that
0:08:29i can ask you could you
0:08:31a a give me or their solution to do so
0:08:35and we get
0:08:38i see three point point
0:08:41uh let's say the some sort of a real E
0:08:44so from the to previously five
0:08:48uh we think the like and we can find a what of want
0:08:54i and
0:08:59but this is a very basic problem and it is only spatial
0:09:03and of course
0:09:04here we don't have to deal with a temporal dimension
0:09:11a a a a a a a solution
0:09:16here right spend oh i bits my memory um
0:09:20we consider an episode as a sequence of synchronized acoustic and articulatory observations
0:09:25and the considered lexical unit is the phone
0:09:33so we consider local similarity
0:09:35so local similarity
0:09:37is
0:09:40two similar articulatory configurations which appear at
0:09:45similar times
0:09:46similar instants
0:09:47during the articulation of a given phone
0:09:51so you have to deal with two dimensions
0:09:54the first one is the temporal dimension
0:09:56and the second one is the spatial
0:09:58so we use the DTW algorithm to
0:10:03deal with the temporal dimension
0:10:05and we you also if the and
0:10:07not the to uh
0:10:08make the
0:10:10the mapping
0:10:11symmetric
0:10:13and to be able to compare different
0:10:16distances
0:10:17between episodes
0:10:20and also the Itakura constraints allow us to control the
0:10:25time distortion
0:10:27of the alignment
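The temporal alignment between two episodes can be illustrated with a small dynamic time warping sketch. This is not the speaker's implementation: the function name `dtw`, the `band` parameter, and the normalization by path length are assumptions made for illustration, and the simple band around the diagonal is only a stand-in for the Itakura-style constraint mentioned in the talk.

```python
import math

def dtw(x, y, band=None):
    """Align two sequences of feature vectors.

    Returns (cost normalized by path length, list of aligned index pairs).
    `band` is a hypothetical limit on how far the path may drift from the
    diagonal; it stands in for the slope constraint mentioned in the talk.
    """
    n, m = len(x), len(y)
    inf = math.inf
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i * m / n - j) > band:
                continue  # cell outside the allowed time distortion
            # Symmetric step pattern: diagonal, vertical, horizontal.
            best, prev = min(
                (cost[i - 1][j - 1], (i - 1, j - 1)),
                (cost[i - 1][j], (i - 1, j)),
                (cost[i][j - 1], (i, j - 1)),
            )
            if best == inf:
                continue  # no admissible predecessor
            cost[i][j] = best + math.dist(x[i - 1], y[j - 1])
            back[i][j] = prev
    if cost[n][m] == inf:
        raise ValueError("no admissible path under the band constraint")
    # Trace the path back from the end of both sequences.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    path.reverse()
    return cost[n][m] / len(path), path
```

Dividing the accumulated cost by the path length is one common way to make distances between episodes of different durations comparable; the talk does not say which normalization was actually used.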
0:10:30for spatial similarity let's consider
0:10:34the plot in the bottom right corner
0:10:37let's just say that it's the trajectory of one articulator
0:10:41and just consider at a given time
0:10:45the position x i
0:10:49and we just say that x i plus one is the natural
0:10:54target of x i
0:10:57and we just
0:10:58make
0:10:59the following assumption
0:11:01that x i plus one could have been slightly different
0:11:05without a significant impact on the acoustics
0:11:10so we define
0:11:12an interval
0:11:13centered around x i plus one
0:11:17and we just say that any articulatory configuration
0:11:23within this interval
0:11:24can be
0:11:26considered similar
0:11:27to x i plus one
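The interval test described here can be sketched in a few lines. The function names and the per-articulator `half_width` tolerance are hypothetical; the talk does not say how the width of the interval is chosen.

```python
def within_target_interval(candidate, target, half_width):
    """True if every articulator coordinate of `candidate` lies inside
    the interval centered on the matching coordinate of `target`."""
    return all(
        abs(c - t) <= h for c, t, h in zip(candidate, target, half_width)
    )

def similar_to_next(trajectory, i, candidate, half_width):
    """Check `candidate` against the natural target of trajectory[i],
    i.e. the observed next configuration trajectory[i + 1]."""
    return within_target_interval(candidate, trajectory[i + 1], half_width)
```

For example, with `trajectory = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]` and `half_width = (0.5, 0.5)`, the configuration `(1.2, 1.8)` counts as a valid continuation of the first frame, while `(1.2, 2.7)` does not.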
0:11:35let's consider two episodes now
0:11:39of a given phone so
0:11:41let's say for example two acoustic and articulatory realizations of the phone G or
0:11:50don't um
0:11:53uh see uh uh oh oh to beats uh
0:11:56the genetic thing
0:11:58we just check beforehand
0:12:01that x and y are similar enough
0:12:04because
0:12:05two realizations of
0:12:08the same phone
0:12:10can be quite different
0:12:12because some articulators are not critical for a given phone
0:12:20so we we we map uh first uh
0:12:23let's say it is that uh a want to the if is that X
0:12:27uh i've represent the the the a line observation
0:12:31we've the got collides
0:12:33so the right one
0:12:38okay okay
0:12:44so just to illustrate that from two episodes
0:12:47the generative memory can build
0:12:50at the bottom of the figure
0:12:53as you can see
0:12:54eight
0:12:55episodes so the memory is able to produce
0:13:00from two episodes eight
0:13:03episodes which are acceptable from an articulatory
0:13:07point of view
0:13:10a but it and can uh and that
0:13:14so the inversion consists in so the generative memory
0:13:18is an oriented graph
0:13:20each node is a
0:13:23synchronized acoustic and articulatory observation
0:13:26and the edges are the allowed transitions
0:13:29derived from the
0:13:31preceding mapping from an episode
0:13:34to another
0:13:36and the inversion consists in finding in this graph
0:13:41the path which best
0:13:43matches the
0:13:47input acoustics
0:13:51and of course the articulatory gesture
0:13:53is derived from the articulatory component of each node
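The graph search described here can be sketched as a Viterbi-style dynamic program. This is a minimal illustration under stated assumptions, not the authors' code: the names `invert`, `nodes`, `edges` and `frames` are hypothetical, every allowed transition is given zero cost, and the local cost is the Euclidean distance in the acoustic space (the distance the talk says was used).

```python
import math

def invert(nodes, edges, frames):
    """Find the path through the graph that best matches the input frames.

    nodes:  list of (acoustic_vector, articulatory_vector) pairs
    edges:  dict mapping a node index to the indices it may transition to
    frames: list of input acoustic vectors, one per time step
    Returns the articulatory vectors along the best path.
    """
    n = len(nodes)
    # Any node may start the path; its cost is the acoustic mismatch.
    cost = [math.dist(nodes[k][0], frames[0]) for k in range(n)]
    back = []
    for frame in frames[1:]:
        new_cost = [math.inf] * n
        ptr = [None] * n
        for k in range(n):
            if cost[k] == math.inf:
                continue
            for s in edges.get(k, ()):
                c = cost[k] + math.dist(nodes[s][0], frame)
                if c < new_cost[s]:
                    new_cost[s], ptr[s] = c, k
        back.append(ptr)
        cost = new_cost
    end = min(range(n), key=cost.__getitem__)
    if cost[end] == math.inf:
        raise ValueError("no path of the required length in the graph")
    # Follow the back-pointers from the best final node.
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [nodes[k][1] for k in path]
```

The returned list is the estimated articulatory trajectory, read off the articulatory component of each node on the best path.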
0:13:59so um
0:14:00for the evaluation we have compared
0:14:02the generative memory
0:14:04with a concatenative one and with a codebook-based approach
0:14:09we the me call uh uh a a constraint
0:14:12here is the corpus we use MOCHA
0:14:15which contains two speakers a male and a female
0:14:19the language is english and
0:14:21we use seven coils
0:14:24two on the lips
0:14:26one on the lower incisors the tongue tip the tongue body
0:14:29the tongue dorsum and the velum
0:14:31and we also use a french corpus we have recorded
0:14:33not your got
0:14:35we don't fix the coil
0:14:38on the velum but on the root
0:14:46okay okay
0:14:49and the
0:14:51evaluation
0:14:52of the estimated trajectories
0:14:55is based on the root mean square error and the Pearson correlation which quantifies the similarity
0:14:59and synchrony between the
0:15:01actual and estimated articulatory trajectories
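The two evaluation measures named here are standard and can be written out for a single articulator coordinate. The function names and the toy values in the usage note are illustrative only and are not results from the talk.

```python
import math

def rmse(estimated, reference):
    """Root mean square error between two equal-length trajectories."""
    n = len(reference)
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n
    )

def pearson(estimated, reference):
    """Pearson correlation between two equal-length trajectories."""
    n = len(reference)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)
```

A trajectory that is shifted by a constant offset, such as `[2, 3, 4, 5]` against `[1, 2, 3, 4]`, has an RMSE of 1 but a correlation of 1, which is why the two measures are reported together: one captures the distance, the other the similarity of shape and timing.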
0:15:06so here are the results
0:15:10the red bar is the codebook result
0:15:13the blue the concatenative memory and the green bar
0:15:15the generative memory and
0:15:17we can observe the same improvement trend
0:15:20over all three corpora
0:15:22so over the two languages and over the three speakers
0:15:27the generative memory always outperforms
0:15:30the concatenative memory and the codebook
0:15:33and uh graph can five the probability of movement
0:15:37so we can expect an improvement
0:15:38between five and and percent
0:15:40with an eight nine person computer
0:15:43for the generative memory over the concatenative memory and
0:15:46between ten and fifteen percent
0:15:48uh for this unit level
0:15:53here is you a uh
0:15:55so as you can see the codebook provides very jerky trajectories
0:16:00while the
0:16:01episodic memories
0:16:03provide us with
0:16:06because it i it it's better model
0:16:10so it correspond to the movement of them
0:16:12a along the at that X
0:16:15for the french and sure
0:16:16she's to can "'cause" extreme the boss
0:16:21okay that uh
0:16:22to compare these results with the other results
0:16:25we can say that
0:16:27we have
0:16:28reasonably good performance
0:16:30for example
0:16:32compared to some
0:16:34machine learning algorithms
0:16:37which have been evaluated on the same database we can see that
0:16:40the mean square errors fall between
0:16:43a a one point four and one once
0:16:47but um
0:16:48it has been reported in an article that
0:16:51the articulatory data acquisition error is about
0:16:56zero point four millimeters so
0:16:57we can just say that okay
0:16:59we have different methods but
0:17:03as we don't share exactly the same
0:17:05processing
0:17:06data processing
0:17:07and because of the acquisition error
0:17:10we are more
0:17:16we propose a generative episodic memory so this model is interesting because
0:17:21it does not require any assumption about the mapping function
0:17:24the memory is able to embed the dynamics
0:17:30and
0:17:32it is also able to produce unseen gestures [unclear]
0:17:39for future work
0:17:40we're focusing on the use of more robust distances because for
0:17:45this work we have used the euclidean distance in the acoustic space and
0:17:51this distance is known not to be
0:17:53robust
0:17:54for the
0:17:56we would also like to take into account
0:17:58the correlation between the articulators
0:18:01because the articulators can compensate for each other
0:18:04and
0:18:06we think this
0:18:08correlation can help to go further
0:18:11we would also like to move from
0:18:15a pure phonetic segmentation
0:18:17during
0:18:18the building of the memory
0:18:19to
0:18:20but not cry just based uh
0:18:23that tension should propose or something but i uh i don't think used
0:18:27and finally can uh
0:18:30or to get further improvement local the application
0:18:33uh because the memory is able to produce new trajectories but face
0:18:38uh two
0:18:40precisely map uh an acoustic frame it is uh
0:18:44in to the up that i've got made if
0:18:48synchronise of solution um
0:18:53i you i
0:18:58we have time for a question
0:19:08just one thing it seems to me there is room for combining the codebook
0:19:12and the generative model and that the codebook be some kind of a starting trajectory
0:19:18i was asking is it possible to combine the codebook and the generative model so the
0:19:22codebook serves as
0:19:24an initialization so to speak
0:19:28yeah i think um
0:19:31and i to the search or would that be computationally to
0:19:37in the memory
0:19:39it's
0:19:40an episode is a kind of code
0:19:42it's much better than the codebook because
0:19:45we have temporal information within the memory
0:19:48uh this is and see that the could
0:20:04okay so thank you again