This talk is joint work by Ananthakrishnan, that's me, and Olov Engwall, and it is about resolving non-uniqueness in the acoustic-to-articulatory mapping. I think I'll skip this slide, because the last two presentations were pretty much about the same thing; it's basically just to give an idea of what acoustic-to-articulatory mapping, or inversion, is.

I'll jump to the main focus of this talk, which is the non-uniqueness in this mapping, which has been referred to by a few of the authors who spoke before me. In the literature we have things like lossless tube models of the vocal tract and parametric articulatory models of speech synthesis, and you can see that the inverse mapping from acoustics is actually to a class of area functions, not exactly one. You have similar results from other such experiments. There are also what are called bite-block experiments, where the speaker is constrained, but the speakers can still produce perceptually similar sounds in spite of the unnatural perturbation. So this gives an indication of non-uniqueness. Of course, these are artificial situations, and this may not really occur in natural speech.

So what about continuous speech? You can have different forms of data to study this, which I have listed here. In our case we use the MOCHA-TIMIT database, just like the previous two talks, so I won't go into that too much. This is an example from the dataset. We have a phoneme, and the red and the blue lines here give the magnitude spectrum from two instances, and the figure at the bottom right is actually the positions of the
articulators. You can see that even though the acoustics are quite similar, the articulator positions are quite different. But is this non-uniqueness? You still can't really say that, just because there is a difference in the acoustics. Can this difference in acoustics be explained by the variation in the position of the articulators? That comes to the problem you have with this kind of data, a limited database: you cannot get exactly the same acoustics and exactly the same articulators, or exactly the same acoustics with different articulators. So that's the difficulty with the data.

So the key questions in this talk are: how does one estimate non-uniqueness in limited data? We do it by statistical modeling, based on one of our previous papers. How do these non-unique instances relate to the neighbouring acoustic-articulatory frames? Does applying continuity constraints help resolve the non-uniqueness? These are the main questions.

We have a toy example here. The figure on the top is the acoustic parameter belonging to, say, one phoneme, and this is the articulatory parameter belonging to one phoneme. You can see that the acoustic parameter is unimodal but the articulatory parameter is bimodal. So is this non-unique or not? If you look at the data points here, I don't know whether the points are very clear, but you can see that it's not completely true: there are some clusters if you look at the joint articulatory-acoustic space. Therefore what we do is form a model of this sort of data in the joint articulatory-acoustic space, and then we can look at one value of the acoustic parameter, shown by the blue line there, that's a test sample, and we can
find the conditional probability distribution. In this case it is bimodal, which says that at this value of the acoustic parameter the mapping is non-unique. But if you look at another acoustic parameter here, which belongs to the same acoustic cluster, you can see that the distribution is unimodal, so the mapping is not non-unique. Of course there is the question of the variance, which is also non-uniqueness of some sort, because for one value of the acoustic parameter you can have different values of the articulatory parameter. But we don't look at this sort of non-uniqueness in this paper; we just look at this bimodal kind of non-uniqueness.

Coming to the parameterization of the data, again it's very similar to what has been used in state-of-the-art inversion mapping systems, and somewhat like the one used in the previous paper.

This is an example of non-uniqueness. These plots are actually the conditional distributions: given one vector of acoustics, these are plots of the conditional distributions of the articulator positions. The blue dots and triangles are the peaks of the different modes, and the green line, I hope that's clear in the presentation, is the recorded positions. In this case you can see that they are closer to one of the peaks, and the other peak is actually the non-unique estimate for this particular acoustics.

Now we look at this in a trajectory. In this case you can see that they are all unimodal; all the conditional distributions are unimodal. But look at the next frame. In this case, look at the tongue tip, which is here, and
you can start seeing that there is another mode. You can see the same thing in the lower lip, which is here, and in the tongue dorsum as well. But at the same time, the recorded positions are always closer to one of the modes than to the other.

Then there is another example that does not follow this. In this case you can see that this is largely unimodal, this is one frame, and you can start seeing that the second mode starts appearing somewhere here. You can see that it's there, and in the next estimate the position shifts on to the new mode. So in the first example the second mode appeared and then sort of disappeared from the estimates, whereas in this case it seems like there's a switch from the first set of modes to the second set.

So we have new questions here: what is the difference between the two examples, how often does each type of non-uniqueness occur, and what role does it play in the predictability of the articulation?

What we do now is just shift the view. The previous examples were in the articulatory space, the midsagittal plane, whereas these plots are in space-time. The blue and the pink lines are the peaks of the modes that you saw, and the black line is the recorded trajectory. You can see that in type one, which we call "along the same path", the recorded positions sort of stick to one of the trajectories, whereas there are some non-unique estimates for some part of this trajectory, which we call the non-unique path. And
in the second example you can see that there is a sort of shifting from one of these paths to the second path, from the blue to the pink. We call that "with change in path". It's obvious that type one is easy to estimate using information about the previous frames, but that's not the case with type two; in this case you also need the succeeding frames, because you need to know in which direction the trajectory goes.

But there are some exceptions. For example, you can see here that the expected type is along the same path, but in fact the recorded positions go through WCP, with change in path. So we just want to see how often you get these kinds of exceptions.

What we do is that we have three conditions and we find the RMS error. In the first one, we apply continuity constraints based on dynamic programming from the preceding context and then select one of the peaks. In the second one, we select the mean between the two peaks; actually this is not really a valid articulatory position, but we do it just to see whether it reduces the RMS error. And in the last one, we estimate which of the peaks actually gives the lowest RMS error, so we don't apply continuity constraints, we just see which of the peaks is closest to the recorded position.

So I'll just go to the graph. The first thing we see is this: the x-axis is actually the length of the path, so how many successive frames you get where you have non-unique estimates, that's the x-axis, and the number of occurrences is on the y-axis. You can see that it forms sort of a
decaying function: the number of instances of non-uniqueness decreases with the length of the path, and that's even more so for the with-change-in-path case than for along the same path. For along the same path, you see that for two consecutive frames you get more occurrences. So the frequency of occurrence of along the same path is higher than that of WCP, but only for shorter paths; for longer paths it seems like it's more with change in path.

So fifty-one to fifty-three percent of the frames are resolved with continuity constraints for ASP, but it's much lower for WCP, as expected: it's only twenty-nine to thirty-three percent, and the percentage keeps reducing with the length of the path.

The mean between the two paths actually works pretty well for the WCP in many of the cases, which is not completely intuitive, but it seems to work somewhat. This is probably because you don't know at what point the trajectory switches from one of these paths to the other, so selecting the mean is a pragmatic thing to do. But with this method of selecting the mean, it actually decreases with the length of the path [unclear]. Around eight percent for the ASP and twenty-two percent for the WCP are results where the mode which actually gives you the best result cannot be estimated using continuity constraints. So that's the other result from this paper.

To conclude: yes, the acoustic-to-articulatory inversion can be non-unique, and the non-uniqueness in this inversion can be estimated statistically. It can be resolved with continuity constraints, but not for all instances. Instead we probably need some other
information rather than just continuity, for example the emotional state or the rate of speech [unclear].

There are some semi-definite conclusions: human beings make use of non-unique articulator positions. This is clear, but it cannot be verified unless we have exactly the same acoustics with different articulations. So it's a semi-definite conclusion [unclear].

So, questions. The main question here is: do these non-unique articulator positions change the area function of the vocal tract? It might seem intuitive that they do, but that has at least to be verified, and we can hope that we get some dynamic MRI results to verify it. What kind of compensation mechanism is used to make these non-unique articulations for the same acoustics? And given that we have non-uniqueness in this mapping, what does it mean for learning? How do infants figure out how to control the vocal tract when they acquire speech? So that's open for discussion.

Any questions? Okay, over there.

I have a comment. On your last slide you had a question about what non-unique positions mean for the vocal tract [unclear]. Maybe you can share some comments on the way we measure the articulation. These positions, or even a midsagittal image, provide a sort of projection of this complex geometry that's also moving in time. So we have restrictions of spatio-temporal sampling, and we are again trying to map it to some acoustic feature vector, which is also some sort of projection of the signal. So it's oftentimes not clear whether we are actually mapping the same things, or trying to
find something that's not there [unclear]. As you have shown, to some extent we can show this. So what are your thoughts on how one would actually cover the gaps that still remain?

Yeah, that's a very valid question in this field of research, because, as you said, these are all projections of what the reality is. This is a sort of much larger question in many senses. But what I would like to say is this. Let's say that we don't use the acoustic parameters that we use and instead use some other acoustic parameters, or we use articulatory parameters which are different, or instead of using positions of the articulators we use area functions, for example. The thing is that by using this kind of statistical method we can find out whether the mapping is non-unique or not in a reasonable way. This paper that we worked on sort of tells you the problems that come up when we try to do statistics-based acoustic-to-articulatory mapping. It is quite clear that you are going to have these problems when you do statistics-based acoustic-to-articulatory mapping. That's why I put it in the semi-definite conclusions, because we can't be sure. So I don't know how to go ahead on that unless, I suppose, we have 3D MRI or something.

Yeah, in front.

If I understand it correctly, in this work you're looking at the question of non-uniqueness within the speaker, is that right?

Yes, it's within the speaker.

So is there a way to extend this to cross-speaker non-uniqueness? Because that might be important for [unclear].

Yes, actually that's quite clear. There is much other evidence which
shows that across speakers people use different strategies to produce the same kind of sound. But the problem there is not exactly the same, because you would not produce exactly the same sound: the acoustics vary, and there are other factors as well, like the shape of the vocal tract. So you can produce the same phonemes, the same sounds that we classify as the same phonemes, and different people use different strategies; there are several results which show that. But can you produce exactly the same acoustics with different articulatory configurations? I think that question is more relevant when you look at a single speaker.

So you would say that this is a [unclear]?

I mean, they're just different questions.

Okay, thanks.

So I think this brings the session to a close [unclear].
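
The statistical idea described in the talk (model the joint articulatory-acoustic space, condition on one acoustic value, and look at the modes of the resulting distribution) can be illustrated with a small sketch. This is not the authors' implementation. The two-component mixture, all of its parameter values, the 10 percent peak-height threshold, and the function names `conditional_mixture`, `find_peaks` and `smoothest_path` are assumptions made purely for illustration, using one acoustic and one articulatory dimension as in the toy example. The last function is a generic dynamic-programming search for the smoothest path through the candidate peaks; it stands in for the continuity constraint and may differ from the constraint used in the paper.

```python
import numpy as np

# Hypothetical toy joint mixture over (acoustic x, articulatory y).
# Every value below is an assumption chosen for illustration only.
WEIGHTS = np.array([0.5, 0.5])
MEAN_X = np.array([0.0, 1.0])    # acoustic means (overlapping, so x looks unimodal)
MEAN_Y = np.array([-2.0, 2.0])   # articulatory means (well separated)
VAR_X = np.array([1.0, 1.0])
VAR_Y = np.array([0.25, 0.25])
COV_XY = np.array([0.1, 0.1])


def gauss(x, mean, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def conditional_mixture(x):
    """Parameters of p(y | x) for the joint mixture: weights, means, variances."""
    resp = WEIGHTS * gauss(x, MEAN_X, VAR_X)
    resp = resp / resp.sum()
    cond_mean = MEAN_Y + COV_XY / VAR_X * (x - MEAN_X)
    cond_var = VAR_Y - COV_XY ** 2 / VAR_X
    return resp, cond_mean, cond_var


def find_peaks(x, rel_height=0.1):
    """Peaks of p(y | x) on a grid, ignoring peaks below rel_height of the maximum."""
    grid = np.linspace(-6.0, 6.0, 2401)
    w, m, v = conditional_mixture(x)
    dens = (w[:, None] * gauss(grid[None, :], m[:, None], v[:, None])).sum(axis=0)
    inner = dens[1:-1]
    is_peak = (inner > dens[:-2]) & (inner >= dens[2:])
    is_peak &= inner >= rel_height * dens.max()
    return grid[1:-1][is_peak]


def smoothest_path(candidates):
    """Pick one peak per frame so that the summed squared jumps are minimal."""
    cost = np.zeros(len(candidates[0]))
    back = []
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        prev = np.asarray(prev, dtype=float)
        cur = np.asarray(cur, dtype=float)
        total = (cur[:, None] - prev[None, :]) ** 2 + cost[None, :]
        back.append(total.argmin(axis=1))   # best predecessor for each candidate
        cost = total.min(axis=1)
    idx = int(cost.argmin())
    path = [idx]
    for pointers in reversed(back):         # trace the best path backwards
        idx = int(pointers[idx])
        path.append(idx)
    path.reverse()
    return [float(c[i]) for c, i in zip(candidates, path)]
```

With these assumed parameters, `find_peaks(0.5)` should report two peaks (a non-unique frame) and `find_peaks(-3.0)` a single peak, and `smoothest_path` can then be given the per-frame peak lists to choose one candidate per frame. A search like this only helps in the "along the same path" case; it cannot by itself decide when a switch between paths is the right answer.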