Okay, good. Today I'll be talking about how we generalise and adapt the concept of pronunciation modeling and use that to design a framework to help analyse dialects. Here is the structure of the talk. I'll first start from the motivation in speech science and engineering, and then move on to the model.

In dialect research there are different branches of work. On the one hand there's speech science: for example, speech scientists and sociolinguists work out rules across dialects to understand why these dialects are different. This is very important, but the analysis is often manual, so it's very time consuming and it limits the amount of data that can be analysed. Without enough data, it is possible that some of these rules might be over- or under-specified. On the other hand we have speech technology: for example, speech engineers design automatic dialect recognition systems. These systems can process data very efficiently, even if there is a lot of it, and can also reach decent performance, but the models usually do not convey these dialect differences to humans.

For our work, we decided to combine the strengths of these two research communities to bridge the gap between speech science and technology. In particular, we want to design automatic systems that are able to explicitly learn these rules across dialects and use that to inform humans. Because of this informative nature of the results of the system, we term this approach informative dialect recognition.

To give you a taste of what our system can do, here is an example. As input we have the word transcript and the audio signal, which can be used to generate the reference pronunciation and the dialect-specific pronunciation. In red here is the model, which learns the mapping between this
reference pronunciation and the dialect-specific pronunciation, so that as output we can get these phonetic transformations, these phonetic rules, that tell you how the dialects are different. For example, in this case we see that [r] is deleted when it's followed by a consonant. In addition, we can quantify the occurrence frequency and know how often this happens. That kind of information is extremely important for forensic phoneticians, which is one of the big motivations behind our work.

Before I go into more of the details of our proposed model, I'd like to formally introduce what I mean by phonetic transformation, because we will be characterising dialect differences using phonetic transformations. We represent a word in the reference dialect as reference phones, and in the dialect of interest we represent the pronunciation as surface phones. The mapping between the reference phones and the surface phones is what we call a phonetic transformation. So say we're given the word "bath", and we choose General American English as the reference dialect and British English as the dialect of interest. Now we have the reference phones and surface phones of the word "bath", and here you see that [æ] in the reference phones is mapped to [ɑ] in the surface phones. This is an example of a substitution, which is one kind of phonetic transformation. There are two other kinds, which will be shown later, and I'll say more about them then. But this is what I mean by phonetic transformation.

Now on to our proposed model. We want to make a model that is able to express and learn these phonetic transformations, so it is called the phonetic pronunciation model. We want to answer the following questions using this model. First, when comparing a dialect to a reference dialect, what kinds of phonetic transformations occur: substitutions, insertions, or deletions? And if they occur, do they occur in only certain phonetic
contexts, and how often do they occur? To answer these questions, our model has two parts. The first is a hidden Markov model, which we use to help us automatically align the reference phones with the surface phones. The second part is decision tree clustering, which helps us generalise the phonetic rules.

Here is a slide with the three kinds of phonetic transformations, each with an example. In the examples, American English is the reference dialect and British English is the dialect of interest. The first case is the substitution of [æ]: in American English the word is pronounced "bath" with [æ], and in British English it sounds like "bahth" with [ɑ]. The second is the deletion example, where [r] is followed by a consonant: in American English "part" is pronounced with the [r], while in British English the [r] is dropped. The third kind of phonetic transformation is insertion. Here, compared with General American English, when a word ends with a vowel and the word following it starts with a vowel, an [r] might be inserted in between by a British English speaker, so the phrase "saw a film" sounds more like "saw-r a film".

These are some examples of phonetic transformations, and in the following slides I'll illustrate how these examples fit into our proposed HMM network. Here is a traditional HMM network, where the circles represent the states, the squares represent the observations, and the arrows are the state transitions. This is a trivial case where the reference phones and the surface phones are the same, so there are no dialect differences. And this is the case of a substitution, where [æ] becomes [ɑ]; in this case the traditional HMM system can handle it adequately. However, what about an insertion? If we have an insertion of [r] here, we see that this observation [r] does not have any corresponding state. A solution is that we now have a one-to-two mapping between the reference phones and the states, so each reference phone is
represented by two states. The first one is the white circle, which indicates a normal state, and it is followed by an insertion state, the green circle. So now you see that the observation has a corresponding state to be mapped to. In addition, we further categorise our state transitions according to the phonetic transformations. If a state transition is entering an insertion state, like the red arrow here in the graph, then we call it an insertion state transition.

Okay, so we can handle the case of insertions. How about deletions? Here we see the example where the state [r] has no corresponding surface phone, or observation. To solve this problem we introduce a deletion state transition, which skips normal states. In this case the state [r] is skipped, so it no longer needs to be mapped to an observation. These are some of the highlights of the differences between our proposed HMM network and the traditional one, which help us model the phonetic transformations more explicitly and in a richer way.

Now, after training an HMM system using triphones, we could find rules like these on the right. For example, [æ] becomes [ɑ] when it's followed by [θ], so "bath" becomes "bahth". [æ] also becomes [ɑ] when it's followed by [s], so "class" becomes "clahss". Another example is [f]: "laugh" becomes "lahf". The question here is whether these observed rules actually originate from a more general underlying rule, and if so, how we can find it. Here we use decision tree clustering to help us. From the results of decision tree clustering, by clustering these observed rules, we can find an underlying rule. Here the underlying rule we found was that when [æ] is followed by a voiceless fricative, the phonetic transformation of [æ] to [ɑ] will occur.

So I just talked about the highlights of our model, and now we are going into the evaluation stage. We've done a series of experiments, and because of the time constraint
I will not be able to share all of this information. The dialect recognition task will not be talked about, but you can read a lot of the details in our paper. I'll be focusing on the other two.

The first one is the pronunciation generation experiment, where basically we assess the ability of the model by seeing how well it can convert one dialect's pronunciation into another dialect's pronunciation. The data we used is a big database with five different Arabic dialect regions: UAE, Egypt, Iraq, Palestine, and Syria. They are all conversational telephone speech. Here we chose Iraqi as the reference dialect, and in this table you can see the data partition for our experiment.

In this experiment the assumption is that if we have trained a pronunciation model well, so that it has learned these phonetic rules across dialects correctly, then the model should be able to convert the reference phones into another dialect's surface phones very well. So here, after training a phonetic pronunciation model, we give it the reference phones of the test set, and it will generate the most likely surface phones of the other Arabic dialects. By comparing these generated surface phones to the ground-truth surface phones, we can see how well our model is converting one dialect's pronunciation to another.

Here are the results. The orange bar is the monophone version of the pronunciation model and the blue one is the decision tree pronunciation model. We see here that the decision tree helps improve the recovery rate by 1.7 percent relative, meaning that the decision tree results help us convert these pronunciations better.

Here I'd like to mention a side note. We also did a lot of further analysis and found that there are word usage differences across Arabic dialects, and this could potentially complicate the evaluation of our system. So we also did the same experiment using a phonetic pronunciation model
on multiple English corpora without these word usage differences that would cause complications, and the results are very good. Unfortunately I cannot share them with you today because they will be covered at Interspeech, but that means you should all come to my talk at Interspeech as well.

The second evaluation is a rule evaluation, where we check whether the learned rules agree with the ones in the linguistic literature. Here on the left you see the linguistic descriptions of the Arabic dialects, which are from the literature. On the right you see the learned rules from my proposed system. You can see that the learned rules from the proposed system actually correspond with these linguistic descriptions, and furthermore they might sometimes refine the phonetic context in which these rules occur. Most importantly, we can also quantify the occurrence frequencies of these rules given the phonetic context. This information is very important for forensic phoneticians but is rarely documented in the literature.

I'd like to conclude my talk by talking about the contributions of this work. Here we propose an automatic yet informative approach to analysing dialects, and we call that informative dialect recognition. We use a mathematical framework to characterise phonetic transformations across dialects in a very explicit manner in order to learn these rules. Our proposed system is able to postulate rules from large corpora to discover, refine, and quantify dialect-specific rules. If people have questions or issues that they would like to ask me about the talk, I would be happy to answer them.

[Audience question, inaudible.]

Thank you. I don't know if I can remember all of them to respond to, but for the first one, yes, that is a valid point. My system is also able to
pick up these pronunciation differences that may not actually be a phonetic rule in the literature, existing or not existing; [unclear] is one of them. John Wells has established a lot of very good literature on dialect differences in English, and actually I'll be using a lot of that in my next talk. So that is what you could be looking for too.

You also mentioned something else about the reference dialect. For the selection of the reference dialect, there are both engineering and linguistic considerations. From the linguistic side, I made some decisions. For instance, I would not want to use Egyptian Arabic as the reference dialect, because it seems that the native speakers of Arabic that I know usually know how their dialect is different from the Egyptian dialect. Since I don't really understand Arabic and had to have them help me assess whether the model or the system is going in the right direction, it will be easier for them to tell me if these phonetic transformations are occurring if the Egyptian one is not the reference. Then, for Palestinian and Syrian, both belong to the Levantine Arabic family, so I was more reluctant to use them as references: since they are more closely related, if I use Palestinian then I may not be able to see the Syrian differences very easily, and in the initial establishment of the system it might be better to have more dialect differences. Finally, from the engineering perspective, we actually have a lot more Iraqi data that we can train systems on. So that was the reason why we chose Iraqi. This was a very difficult decision, but it worked out okay in this case.

Are there any other questions? No? Okay.
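
As a rough illustration of the three kinds of phonetic transformation described in the talk (substitution, deletion, insertion) and of counting how often each occurs, the following is a minimal sketch only. It uses plain unit-cost edit-distance alignment, not the HMM and decision tree system presented by the speaker. The function names `align_phones` and `count_transformations`, the ARPAbet-style phone symbols, and the three toy word pairs are assumptions made for this example.

```python
from collections import Counter


def align_phones(reference, surface):
    """Align two phone sequences with unit-cost edit distance.

    Returns a list of (op, ref_phone, surf_phone) tuples, where op is
    'match', 'substitution', 'insertion' or 'deletion'.
    This is a stand-in for the HMM alignment described in the talk.
    """
    n, m = len(reference), len(surface)
    # cost[i][j] = edit distance between reference[:i] and surface[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != surface[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion of a reference phone
                cost[i][j - 1] + 1,             # insertion of a surface phone
            )

    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = reference[i - 1] != surface[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                op = "substitution" if mismatch else "match"
                ops.append((op, reference[i - 1], surface[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", reference[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, surface[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def count_transformations(pairs):
    """Count non-match operations over (reference, surface) phone pairs."""
    counts = Counter()
    for reference, surface in pairs:
        for op, ref, surf in align_phones(reference, surface):
            if op != "match":
                counts[(op, ref, surf)] += 1
    return counts


# Toy data (hypothetical transcriptions; American reference, British surface).
pairs = [
    (["b", "ae", "th"], ["b", "aa", "th"]),       # "bath": substitution
    (["p", "aa", "r", "t"], ["p", "aa", "t"]),    # "part": deletion of r
    (["s", "ao", "ax"], ["s", "ao", "r", "ax"]),  # "saw a": insertion of r
]
counts = count_transformations(pairs)
for rule, freq in sorted(counts.items(), key=str):
    print(rule, freq)
```

Dividing each count by the number of times the reference phone appears in the data would give the kind of occurrence frequency the talk mentions. Conditioning on neighbouring phones, which the speaker handles with triphones and decision tree clustering, is left out of this sketch.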