Good morning, everybody. My name is [unclear], and I am a [unclear] at [unclear]. The work I am going to present has been done with one of my colleagues, [unclear], who is an associate professor at [unclear]. The problem we are addressing in this work is acoustic-to-articulatory inversion, and we propose to use a new model in this domain, which is an episodic memory.

Here is the outline of my talk. In the first part, I am going to briefly present the problem of acoustic-to-articulatory inversion, followed by a brief presentation of episodic modeling and the motivation for using it. Then I will present the proposed approach, which we call the generative episodic memory, and this will be followed by an experimental evaluation before the conclusion.

So what is the acoustic-to-articulatory inversion problem? The aim is to recover the articulatory gestures from a speech signal. This is an interesting problem because many applications can take advantage of knowledge about the articulators, such as language learning, speech therapy, or speech recognition. It is an interesting problem but also a very difficult one. Why? Because this problem is highly nonlinear, and the mapping between the acoustic space and the articulatory space is not unique. We think that the articulatory dynamics can help us to solve, at least partially, the non-uniqueness of the solution, because the dynamics account for some [unclear] effects such as coarticulation. They also account for the physical properties of the articulators, such as [unclear] and the degrees of freedom, and for the [unclear] that the speaker uses to articulate.

So what about episodic modeling? In the psycholinguistic literature, many works agree on the
existence of an episodic memory. In fact, this is a part of the brain where we encode the events we experience in our life, and these experiences are encoded into episodes that can be retrieved at any time. Maybe that is what we use in order to process speech: you can retrieve past episodes in order to interpret present events and also to speak [unclear].

Episodic modeling has already been used in speech processing, such as in speech recognition, with template-based speech recognition, and also in speech synthesis, with unit selection, which can also be seen as an episodic approach. These episodic models are in fact collections of acoustic realizations of lexical units, which can be phones, diphones, syllables, or words, and most of the time these episodes are described by their acoustic features together with contextual information. The results of these models, for both speech recognition and speech synthesis, are most of the time expressed as a concatenation of episodes. For speech recognition, this concatenation should best explain the input signal, that is, the input speech signal is described as a sequence of episodes, and for speech synthesis the output is also expressed as a concatenation of episodes. So I call these approaches concatenative memories, as compared to [unclear].

Let's go back to our problem, which is acoustic-to-articulatory inversion. Episodic modeling is attractive for this problem for two reasons. The first one is that it relies on observed, synchronized acoustic and articulatory data, so we do not have to make any assumption about the mapping function. The second is that the articulatory dynamics are preserved within the episodes, and this can help to
solve the problem of non-uniqueness.

However, there is a problem that is maybe more practical than theoretical. If we consider speech recognition and speech synthesis, the mapping is between a continuous space and a discrete space: in speech recognition we try to map an acoustic signal to a sequence of lexical units, and in speech synthesis we try to map a sequence of lexical units, let's say phones, to an acoustic signal. But if you consider acoustic-to-articulatory inversion, the mapping is between two continuous spaces. Usually, for speech recognition and speech synthesis, the memories are based on, let's say, a few hours to tens of hours of speech in order to reach reasonable performance. But the corpora available with articulatory information are very sparse: for now, the available corpora contain a few minutes or at most tens of minutes, and this small amount of data cannot efficiently cover all the variation in the acoustic and articulatory spaces.

So we propose to go further and to combine the episodes, and this combination will be based on the local similarity between the episodes. This way of combining episodes can produce unseen articulatory trajectories, so the memory will be able to produce variations of [unclear].

Here is a very basic example, just to illustrate what I mean by combining episodes. Consider a very simple labyrinth problem. Say that I give you this labyrinth and ask you to try to solve the problem, and imagine that you find two trajectories within this labyrinth, the red one and the blue one. After that, I can ask you: could you give me another solution? To do so, we can identify [unclear] points, let's say some sort of similarity, so from the two previously found trajectories, we
think that in the labyrinth we can find a lot of other [unclear]. But this is a very basic problem: it is only spatial, and of course here we do not have to deal with a temporal dimension [unclear].

So here I present the generative episodic memory. We consider an episode as a sequence of synchronized acoustic and articulatory observations, and the lexical unit considered is the phone. So what do we consider as local similarity? Local similarity is two similar articulatory configurations which appear at similar times, for instance during the realization of a given phone. So you have to deal with two dimensions: the first one is the temporal dimension and the second one is the spatial dimension. We use DTW to deal with the temporal dimension. We also use [unclear] to make the mapping symmetric and to be able to compare different distances between episodes, and also the Itakura constraint to control the local time distortion of the alignment.

For spatial similarity, let's consider the plot in the bottom right corner. Let's just say that it is the trajectory of one articulator, and consider at time i the position X(i). We say that X(i+1) is the natural target of X(i), and we make the following assumption: X(i+1) could have been slightly different without a significant impact on the acoustics. So we define an interval centered around X(i+1), and we say that any articulatory configuration within this interval can be considered as similar to it.

Let's now consider two episodes of a given phone, let's say for example two acoustic and articulatory realizations of the phone [unclear], and let's see how to build the generative memory. So we just
check beforehand that X and Y are similar enough, because two realizations of the same phone can be quite different, since some articulators are not critical for that phone. We first map, let's say, episode Y onto episode X. I have represented the aligned observations with the [unclear] lines. Just to illustrate that: from two episodes, the generative memory contains, at the bottom of the figure, as you can see, eight different episodes. So the memory is able to produce, from two episodes, eight episodes which are [unclear] from an articulatory point of view [unclear].

The generative memory is an oriented graph. Each node is a synchronized acoustic and articulatory observation, and the edges are the allowed transitions, derived from the preceding mapping, from one node to another node. The inversion consists in finding in this graph the path which best matches the input acoustic observations, and of course the articulatory gesture is derived from the articulatory component of each node.

For the evaluation, we have compared the generative memory with a concatenative memory and with a codebook approach [unclear]. Here are the corpora. We use MOCHA, which contains two speakers, a male and a female, and the language is English. We use seven coils: two on the lips, one on the lower incisor, and the others on the tongue tip, the tongue body, the tongue dorsum, and the velum. We also use a French corpus that we have recorded [unclear]; for this one we did not fix a coil on the velum but on the [unclear].

The evaluation of the estimated trajectories is based on the root mean
square error and the Pearson correlation, which quantifies the similarity and synchrony between two curves [unclear].

Here are the results. The red bar is the codebook result, the blue bar the concatenative memory, and the green bar the generative memory. We can observe the same improvement trend over all three corpora, so over the two languages we used and over the three speakers: the generative memory always outperforms the concatenative memory and the codebook. This graph quantifies the relative improvement. We can expect an improvement between five and [unclear] percent for the generative memory over the concatenative memory, and between ten and fifteen percent [unclear].

Here is an example. As you can see, the codebook provides a very jerky trajectory, while the episodic memories provide us with smoother trajectories because the dynamics are better modeled. It corresponds to the movement of the [unclear] along the x axis for the French [unclear].

If we compare these with the results from the literature, we can say that we have reasonably good performance, for example compared to some machine learning algorithms which have been evaluated on the same database. We can see that the root mean square errors are all between 1.4 and [unclear]. [unclear] reported in an article that the articulatory data acquisition error is about 0.4 millimeters. So we can just say that, okay, we have different methods, but as we maybe do not share exactly the same data preprocessing, and because of the acquisition error, we are more or less [unclear].

So, we proposed a generative episodic memory. This model is interesting because it does not require any assumption about the mapping function, the memory is able to embed the articulatory dynamics, and it is also able to produce unseen articulatory gestures, and
[unclear].

So, for future work, we are focusing on the use of a more robust distance, because for this work we have used the Euclidean distance in the acoustic space, and this distance is known not to be robust. We would also like to take into account the correlation between the articulators, because the articulators can compensate for each other, and we think this correlation can help to get further improvement. We would also like to move from a purely phonetic segmentation during the building of the memory to an articulatory-criterion-based segmentation [unclear]. And finally, we can expect to get further improvement on the local similarity, because the memory is able to produce new trajectories but fails to precisely map an acoustic frame to the appropriate articulatory [unclear]. Thank you.

We have time for a question.

[unclear] It seems to me there is room for combining the codebook and the generative model, with the codebook giving some kind of starting trajectory. Is it possible to combine the codebook and the generative model, so that the codebook serves as an initialization, so to speak, to restrict the search space? Or would that be computationally too expensive?

In fact, the memory can be seen as a kind of codebook. It is much more than a codebook, because we have the temporal information within the memory [unclear].

Okay, so thank you again.
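
Two standard tools the talk relies on, a dynamic time warping (DTW) alignment for the temporal dimension and the root mean square error and Pearson correlation used to score estimated trajectories, can be sketched as below. This is a minimal illustration under stated assumptions, not the speakers' implementation: the function names, the one-dimensional trajectories, the absolute-difference local cost, and the normalization by the summed sequence lengths are all hypothetical choices made here, and no local slope constraint is applied to the warping path.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between two equal-length trajectories."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(estimated, measured):
    """Pearson correlation between two equal-length trajectories."""
    n = len(estimated)
    if n != len(measured) or n < 2:
        raise ValueError("need two equally long trajectories of length >= 2")
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_e == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_e * var_m)


def dtw_distance(a, b):
    """Length-normalized DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    # Assumption: divide by n + m so pairs of different lengths are comparable.
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    measured = [0.0, 1.0, 2.0, 3.0]   # made-up articulator positions
    estimated = [0.1, 0.9, 2.2, 2.8]  # made-up inversion output
    print("RMSE:", rmse(estimated, measured))
    print("Pearson:", pearson(estimated, measured))
    print("DTW:", dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Because the local cost and the set of allowed steps are both symmetric, swapping the two sequences gives the same DTW distance, which is one simple way to obtain a distance that can be compared across pairs of episodes.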