You know, some people would have you believe that certain prophecies refer to some event coming up; I am thinking that those prophecies have just been slightly misinterpreted, and that the event they were referring to is this wonderful speech today. I don't think anyone here can overestimate the significance of this. Okay. So first, just about the name: it's a kind of coffee reference — Kaldi being the goatherd who, according to legend, discovered coffee — hence the little coffee bean in the logo; but really it's just whatever name we settled on. So, the structure of this whole presentation is: first I'm going to talk for about fifteen or twenty minutes, just giving you a view of this toolkit from all sides; then we're going to allow people to escape, in case they don't want to know more details than that, and have a short break; and then Arnab and Ondřej are going to talk about some more low-level stuff — Arnab is going to talk about some of the acoustic modeling code, and we'll talk about the matrix library, which is kind of independently useful outside of speech. And then after that I'm going to go through some example scripts that we have, to try to give people a sense of how to use the toolkit. Now, the next slide. Some important aspects of the project: it's licensed under Apache v2.0, which is the style of license that basically allows you to do anything you want with the code. There is only an acknowledgement clause, which says you have to acknowledge that the code came from the project, but that's it; it's one of the most open of the standard licenses. The project is currently hosted on SourceForge, which is the standard place for these kinds of open-source projects. Although at the moment it's quite closely associated with particular institutions, our intention is for it to be more of a kind of thing
that lives in the cloud — sorry, I shouldn't have used that word; that's just gratuitous. But the idea is that we want it not to be the pet project of some particular little group, but to represent the best of what's out there, and anyone can be a participant: as long as you can contribute code under this license, that's great. It's basically a C++ toolkit; the code compiles on native Windows and on the common Unix variants. We're not claiming that it compiles on every platform, but it compiles on the normal ones. We have some documentation — not as much as HTK — and we have example scripts. These example scripts are currently just for Resource Management and Wall Street Journal, but we're going to add more. They basically run from the LDC-distributed data, so once you have the disks, you can just point the scripts at them and get an idea of how it all works. Oh no — I now realize that we didn't allow a large enough row here; I think we just had this thing hyphenated too aggressively. Anyway. So I'm going to go through the kinds of things the toolkit supports; these are just the current features, and obviously we're intending to add a lot more. You can build a standard context-dependent LVCSR system, you know, with tree clustering, and it's been written in such a way that it supports arbitrary context sizes, so you can go to quinphones or whatever, and it will work without pain. The training code is FST-based; our code compiles against OpenFst. For those of you who don't know, OpenFst is kind of like the AT&T FSM toolkit, but open source; it's a project from Google and some others. We currently only have maximum likelihood and Viterbi training; we haven't yet done lattice generation, but the timeline for adding discriminative training and lattice generation is this summer.
We support all kinds of linear and affine transforms you can imagine. Not all of these necessarily come in a regression-tree version, where you have multiple regression classes; that's just because we're trying to avoid very complicated frameworks that would make the code difficult to use, so a lot of these just support a single global transform. All of these things also have example scripts, so it's not just something that's in the code that we know works; it's something that you can actually get at and use. I did want to mention other toolkits — as a little disclaimer, we're not claiming that all the other toolkits lack all of these advantages — but we are aiming for clean code and modular design, and by modular we probably mean something a little bit stronger than you would normally imagine: it's written in such a way that it's not only easy to combine the various things that are in there, but easy to extend arbitrarily. We have avoided the kind of code where, when you add something, a bunch of other bits of code have to know about what you added, and you have to modify all kinds of other things. The license is a big advantage: not a lot of toolkits have such a completely free license. Not that we really anticipate this being used for commercial purposes, but our understanding is that a lot of research groups, as a matter of principle, won't use code that has a non-commercial license, because they say "this research can't be commercialized because of the license". And we have example scripts, which serve as a kind of standing documentation, and there's this whole community-building thing. The people involved in Kaldi — currently it's a group of people who were mostly involved in the previous JHU workshops, so myself, Arnab, a bunch of guys from Brno, and a few others — but we are open to new participants. And what we're hoping
for, mainly, is not just people who come and contribute a line or two of code, but people who really want to understand the whole thing and can contribute a significant amount. Kaldi is especially good for stuff that involves a lot of linear algebra; it has a very good matrix library, which Ondřej is going to talk about, so if you want to do stuff that involves a lot of matrix and vector operations, it's a good fit. And of course we compile against the OpenFst library, so you can do FST stuff with the code. It's built in a scalable way, although it doesn't explicitly interact with any parallelization layer — it doesn't interact with things like grid engines or MPI — because we felt that that would just lock it into particular kinds of systems. But it has all been written in such a way that it should still work efficiently when everything is very large scale and you have a lot of data. Our intention is to add all of the state-of-the-art methods for LVCSR: things like discriminative training and all of the standard adaptation techniques. But, as I think I say on the next slide, something that we're not doing in the immediate future is things like online decoding — by which I mean the case where the data is coming in, say, from a microphone or a telephone in some kind of interactive application. You could use the toolkit to do that, and building such a decoder isn't that hard in this framework, but our basic target audience is speech recognition researchers who want to work on speech recognition itself, rather than those who — oh, sorry, someone was entering the room and it disrupted me; that's all right. Okay. So, as some people have noted, it has become popular lately to take a kind of scripting-language wrapper around C++ code, the idea being that you can more easily write your scripts. However, we've avoided that approach, partly
because it's a hassle to do the wrapping and nobody ever understands how it works, and partly because it just forces people to learn a new language — apart from those who already know Python, and not everyone does. So we support that kind of flexibility and configurability in different ways; I think this will become clear later, so perhaps we'll leave that for later questions. We don't have Baum-Welch training, and there are no immediate plans to add it. I think some people like forward-backward for kind of religious reasons, but I don't believe anyone has demonstrated that Viterbi is worse, and we need to use Viterbi because you can write the alignments to disk compactly. [Audience comment] That's really interesting, but even storing it as just a single hypothesis makes it — okay, we'll have to think about that; I mean, it's not like it's really hard to do, it just wasn't something we had planned. [Audience question about the alignment level] Oh, okay — it's at the state level, but it's not really the state, I mean the pdf index — let me be a little bit more precise. If you just wrote out the state sequence, that's fine for model training, but then if you want to recover the phone sequence, depending on how the tree works, it might not be implied by the state sequence. So instead we have these identifiers that also encode the phone and the transition. It's a list of integers, but those integers are not quite the states; they are something that can be mapped to the state and also to the phone. So, I'm just going to describe how this all came to be. We had this workshop in two thousand nine where a lot of the focus was on SGMMs. The people we were working with — some guys from Brno University of Technology, including Ondřej and others — built the infrastructure for training SGMMs; it was written in C++, but it relied on the
HTK system. They also built FST-based decoding code, so that we could decode with our own C++ code, with access to the matrix library. So we kind of had a proto-Kaldi, and we wanted to release that recipe in some kind of open-source way, but we realized that the recipe was just too hard to encapsulate, because it had HTK, plus our stuff, plus a lot of scripts. So we wanted to create something that could support this stuff and was easy to encapsulate, and the next summer we wrote an entirely new toolkit: we wanted everything to be clean and unified, and to have a nice shiny C++ speech recognizer. I think that's what this slide says — ah, the slide order is a bit off somewhere. In two thousand ten we had another workshop, in Brno, where we did a lot of coding, and the vision at that time — which I now realize was very unrealistic — was that we would have a complete working system with example scripts by the end of the summer. That kind of didn't really materialize: we had a lot of pieces, but we didn't really have a complete working system. So after that I felt kind of obligated to finish the system, and we had a lot of help from the others with the coding after that. When we go to the next slide: it's only been officially released something like last week — that's when we actually got all the legal approvals and put it up on SourceForge. This is just a list of the people involved; I don't think I'm going to go through all the names. This is the list of all the people who have written code specifically for Kaldi, and that's the list of the people who have done various other things or helped in various ways. I would describe exactly what each one did, but I'm kind of scared I've left someone off one of these lists, so I'll just let you read it. A lot of these people have some connection to Brno University of Technology, or are people from the workshops
or the like. So: this is a rather messy diagram. I just wanted to give you some idea of what the dependency structure of Kaldi is, and I decided to put some size information in too: the area of these rectangles is roughly proportional to how many lines of code there are. The things that we compile against are OpenFst, which is a C++ library, and, at the left, the math libraries that we compile against. The rough dependency structure is that things on top depend on the things below them, but it's very approximate. So, for instance, there are various FST algorithms with which we've extended OpenFst; stuff relating to tree clustering, for the decision trees; stuff relating to HMM topology; the decoders; language modeling — this is a small box because really all it does is compile an ARPA language model into an FST. This box is mostly I/O stuff — various frameworks for I/O that will be explained later on, after the break, so that we can allow people to escape. This is the matrix library; a lot of this is just wrappers for the stuff down here. I don't know if any of you are familiar with CLAPACK and BLAS and those things, but they are C libraries that, for a C++ programmer, are slightly painful to work with, because they have all of these arguments like the rows, the columns, the stride, and the thing you want to do becomes this very long line of code; there's no notion of a matrix as an object. So this library adds that abstraction, and it is significantly easier to use than the raw libraries. This is feature preprocessing — you know, going from a wav file to MFCCs; that's fairly self-explanatory. Gaussian mixture models, diagonal and full; subspace Gaussian mixture models, which is the reason for my talk; linear transforms — things like fMLLR, MLLR, STC, HLDA, things of that nature; VTLN is in here too, in its kind of linear form. All of these things here
are, you know, directories that contain command-line programs, and that tells you a bit about the structure of the toolkit, which is that we have more than a hundred command-line programs, each of which does a fairly specific thing. We wanted to avoid the phenomenon where you have a program that allegedly does one thing but is really controlled by options, and has rather complicated behavior depending on which options you give it. So this is part of the mechanism that we use to ensure that everything is configurable and easy to understand: there is no Python layer, the programs are simple, and on top of this sit the shell scripts. So to do an actual system build — a recipe — what our example scripts currently do is this: it's a bash script that has a bunch of variables, in bash, to keep track of iterations and so on, and it runs the jobs by invoking the programs from the command line. There are different ways you could do this — if you love Perl or Python or whatever, you could use that instead — but that's how our scripts do it. And something that I haven't really included on this diagram, but that is kind of part of the dependency structure, is some tools that we rely on. For language modeling, by default we use IRSTLM, just because of license issues, but you probably want to use SRILM if you want to do a lot of language modeling work. And there are things like sph2pipe, to convert the data from the LDC format, and so on. We actually have an installation script that will automatically obtain those things, so that the scripts can run without you having to manually install stuff on your system. Now I'm just going to briefly summarize the matrix library — Ondřej will be talking more about it later; the plan was to allow people to escape after this initial segment, in case they are not that devoted and don't want to hear about this stuff — but, as I said, it's a C++ wrapper for BLAS and CLAPACK.
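To give a flavor of the kind of storage that wrapper provides: one of the formats it supports is a packed symmetric matrix, where only the lower triangle is stored, row by row. Here is a minimal sketch of that layout — a hypothetical class written purely for illustration, not Kaldi's actual SpMatrix API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Packed symmetric storage: only the lower triangle is kept, row by row, so
// row i contributes i + 1 entries and a dim x dim matrix needs
// dim * (dim + 1) / 2 floats instead of dim * dim.
class PackedSymmetricMatrix {
 public:
  explicit PackedSymmetricMatrix(std::size_t dim)
      : dim_(dim), data_(dim * (dim + 1) / 2, 0.0f) {}

  // Access element (r, c); by symmetry we can swap indices so that r >= c,
  // then index into the packed lower triangle.
  float &operator()(std::size_t r, std::size_t c) {
    if (r < c) std::swap(r, c);
    return data_[r * (r + 1) / 2 + c];
  }

  std::size_t Dim() const { return dim_; }
  std::size_t NumStored() const { return data_.size(); }

 private:
  std::size_t dim_;
  std::vector<float> data_;
};
```

The payoff is that writing to element (2, 1) and reading element (1, 2) touch the same stored value, and the storage cost is roughly halved.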
Ondřej — well, I should say that Ondřej has really gone to a lot of trouble — has ensured that it can compile in various different configurations, depending on what libraries you have on your system. So it can work either from BLAS plus CLAPACK, or from ATLAS, or using Intel's MKL; the reason is that on some systems you might have one but not the other. ATLAS, by the way, is an implementation of BLAS that is automatically optimized to the specific hardware, and is generally faster. The code that we've wrapped includes generic matrices — ordinary rectangular matrices — and also packed symmetric matrices, where you have a symmetric matrix but only store the lower triangle, in this row-by-row order, and packed triangular matrices. There are other formats that BLAS and CLAPACK support, but these are the ones that we felt were most applicable to speech processing; for instance, we don't have sparse matrices, which are more traditional elsewhere. The matrix library also includes things like SVD, which isn't supplied by any of those libraries; we got permission from its author at Microsoft to use that code. Something about the matrix library: even if you don't buy into the whole toolkit, if you need a C++ matrix library, it's probably quite good. In fact it's surprising that there doesn't seem to be a lot out there that fills this niche; there's Boost's, but that's a rather weird library, and I don't think a lot of people like it. Okay — a few words about OpenFst. I assume everyone knows what FSTs are. AT&T had this command-line toolkit, but I don't believe they ever released the source, so some of those guys, when they went to Google, decided to make one that was open source, and it's Apache licensed. That's part of the reason we made Kaldi Apache-licensed: we figured that, to use OpenFst, there's no real point in having a different license, because it just gives the lawyers a headache. So we went
for the same one. Ah — so, yes, we compile against it. Something else: the decoder doesn't use a special decoding-graph format; it uses the same in-memory structures as OpenFst. And by the way, OpenFst has a lot of templates and such, so there isn't just one FST type — there are a lot of them — so if you wanted to, you could template your decoder on some fancy format that would be, let's say, more compact, or dynamically expanded, or something like that; we're not going to go into that in detail today. We actually implemented various extensions to OpenFst, and some of our recipes are perhaps not totally in the spirit of OpenFst, because those guys have a particular recipe that they follow, and ours is just a little bit different. Later on I can explain why I feel there are good reasons for that; I don't know if those guys would agree. A few words about I/O. It was a somewhat controversial decision within the group to use C++ streams; in the end we decided to do it, partly because OpenFst also does. I know a lot of people prefer C-based I/O, but this is what we do. We support binary and text-mode formats, a little bit like HTK, in that each object in the toolkit has a Write function that takes a boolean argument "binary", so it will put its data out on the stream in binary or text mode, and each object also has a Read function that does the same thing. As is standard in many toolkits, filenames are interpreted in various ways: a dash means the standard input or standard output, depending on whether it's being read or written, and this form, with an offset after a colon, means an offset into a file — it will open the file and seek to that position, which is useful for reasons that will be described later. This archive format is quite a fundamental part of the way Kaldi works, and I'm going to describe it more fully later, in another talk.
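As a small teaser for that later discussion: an archive is conceptually a stream of (string key, object) pairs, read in sequence. Here is a toy text-mode reader in that spirit — a deliberate simplification in which the "objects" are just rows of floats, not Kaldi's actual table-reading code, which handles arbitrary objects in binary or text form:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One archive entry: a key (e.g. an utterance id) plus its object.
struct ArchiveEntry {
  std::string key;
  std::vector<float> values;
};

// Read the whole archive sequentially: each line is "key v1 v2 ...".
std::vector<ArchiveEntry> ReadArchive(std::istream &is) {
  std::vector<ArchiveEntry> entries;
  std::string line;
  while (std::getline(is, line)) {
    std::istringstream ls(line);
    ArchiveEntry e;
    if (!(ls >> e.key)) continue;  // skip blank lines
    float f;
    while (ls >> f) e.values.push_back(f);
    entries.push_back(e);
  }
  return entries;
}
```

The point of the abstraction is that command-line programs just iterate over entries like these and never deal with file opening or error handling themselves.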
The basic concept is that you have a collection of objects — let's imagine that they're matrices — and they are indexed by a string, where the string might be, let's say, an utterance id. So you want to have some way to access this collection of strings and matrices, and there are a couple of different ways you might want to do that: you might want to go sequentially through them, as in an accumulation of some statistics, or you might want to do random access. So there's a whole framework for doing this. Basically, the reason is so that most of the Kaldi code doesn't have to worry about things like opening files and error conditions — there doesn't have to be a lot of logic about that in the command-line programs, because it's all handled by one generic framework. Apart from this, though, we have tried to avoid generic frameworks. The tree-building and clustering code is based on very generic clustering code — K-means-style clustering, I guess, or whatever they call it — so that internal code doesn't assume a lot about what your tree is. It is able to build decision trees in different ways, including sharing the tree roots and asking questions about the central phone, things like that. And it's very scalable to wide contexts, for example quinphones. A lot of the time it's hard to write code that scales to quinphones, because if you have to enumerate all of the contexts, it's hard to get there; we basically avoid ever enumerating those contexts. As an example of how we make use of this generality: in the Wall Street Journal recipe we increased the size of the phone set by marking the phones with word position and stress. Now, you know, HTK supports this too — I think there was a paper about doing that — but if the phone set were much larger than that, an approach based on enumeration of contexts would probably start to break down. [Audience comment] You don't think so? No, I mean, if it were a thousand phones — okay, well,
okay. The HMM and transition-modeling code. We've tried to have an approach where a piece of code only knows the minimum it needs to know, so the HMM and transition-modeling code doesn't really have any notion of a pdf; it purely does what it needs to do, and the rest is kept separate. This is a pretty standard approach: you specify a prototype topology for each phone — that is, how many states there are and what the transitions are. And we make the transitions separate depending on the pdf, so that if the pdfs in two states are different, then the transitions out of those states are separately estimated. This is the most specific level at which you can estimate the transitions without having your decoding graph blow up. It's not at all clear that this matters, but we just felt that we should do the best we could. There are mechanisms for compiling these HMMs into FSTs, because all of the training and decoding is FST-based, so you kind of have to have an FST representation of these things. This is something I touched on earlier — the "H" FST. What you would normally imagine is that it has input symbols that are the pdfs — some symbol that represents the pdf — and the output symbols are the words. The problem with that is: suppose you want to find out what the phone sequence is. That's all well and good if each of your phones has a separate tree, so that you could tell, for each state, which phone it belongs to; but what if you had a larger phone set and you wanted to have a shared tree root, so that there wasn't a one-to-one mapping? So we put input labels on the FSTs that encode a bit more information, and this is also useful in training the transitions, because sometimes the pdf labels alone wouldn't give you quite enough information to train the transitions.
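The idea behind these richer input labels can be sketched as a table of integer ids, each of which maps back to a phone, a pdf, and a transition, so the phone sequence stays recoverable even when the tree is shared across phones. The class below is invented for this illustration; Kaldi's real ids are built by its transition-modeling code from the topology and the tree:

```cpp
#include <cassert>
#include <vector>

// What one id decodes to: more than just the pdf.
struct TransitionInfo {
  int phone;       // which phone this id belongs to
  int pdf;         // which clustered pdf it emits
  int transition;  // which outgoing transition of the HMM state
};

class TransitionTable {
 public:
  // Ids are 1-based, since 0 conventionally means epsilon in FSTs.
  int Add(int phone, int pdf, int transition) {
    infos_.push_back({phone, pdf, transition});
    return static_cast<int>(infos_.size());  // the newly assigned id
  }
  const TransitionInfo &Lookup(int id) const { return infos_[id - 1]; }

 private:
  std::vector<TransitionInfo> infos_;
};
```

An alignment is then just a sequence of these integers, which is compact to store yet can be mapped back to states, pdfs, and phones as needed.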
There are a couple of different ways we create decoding graphs. For training purposes, you have to create a lot of these things at the same time, and combining the FST algorithms using scripts would be quite inefficient, because you have the overhead of process creation. So we call the OpenFst algorithms at the C++ level and combine them together, so that you can create your decoding graphs for training, and we typically put them in one of these archives — basically a big file with the graphs concatenated together, with little keys in it, on disk — so that you don't have the I/O cost of accessing hundreds of little files. Training uses the Viterbi path through these graphs. For test time, we didn't use this C++ approach, because there's just no point; it's basically scripts, and I'm going to go through those scripts later for those who want it. The scripts that create the decoding graph call some OpenFst tools, but also some of our own, and that relates partly to a difference in recipes; I'll talk more about that later, after the break. Now, Arnab is going to talk later about some of the acoustic modeling code, so I'm just going to give a brief summary. Our GMM code is very simple; it's not part of some big framework. It's kind of a dumb object that has, you know, the means and the variances, and it can evaluate likelihoods when you give it a feature vector; but it doesn't inherit from some generic acoustic-model class, and it doesn't attempt to know about things like linear transforms. It just sits there, and things like linear-transform estimation have to access the model and do what they want with it. The reason for that is that if the GMM knows too much, then whatever fancy thing you do, you have to then change the GMM code, and that's just not a nice situation. So, yes, we have a separate class for GMM statistics accumulation and for doing the update. And for a collection of GMMs — like an entire GMM-based system — we have a class that pretty much behaves like a vector of GMMs. So it's a fairly simple thing; there's no notion of the name of
a state — a state is just an integer — and generally we've avoided having names for things in the code's data structures. [Audience question] Oh — this lowercase "vector" just refers to the STL vector; there is an uppercase Vector too, which is something in the matrix library. The code is case-sensitive; it has never not been case-sensitive, even on Windows. Okay. We've got quite a lot of linear-transform code: LDA; HLDA; and MLLT — or STC; I keep sitting on the fence with regard to the naming of that technique, I don't want to offend anyone, and anyway it has multiple names. Regular VTLN — we tried it; as everyone knows, it's kind of tricky to get it to work, and it was a while before we got anything that worked better than the baseline. The exponential transform is something new; it's a kind of replacement for VTLN that works a little bit better, and I'm going to explain what it is at a later date. MLLR, fMLLR — a lot of these transforms are global, and the way we handle them is that the transform just becomes part of the feature space: it's just stored as a matrix on disk, and it's used a lot in pipes. The way it actually works is that this matrix is multiplied by the features as part of a pipe. I mean, obviously, from a computational point of view that's a silly way to do it, but it just makes the scripts really convenient. So when I say the transforms are applied in a unified way, what I mean is that the code that estimates any of these transforms really just outputs a matrix; there's no special transform object. [Audience question] Well, okay, yes — for the regression-tree one there is an object, but for the global ones it's just a matrix. This was a point of contention among us, whether to do it this way, but some of us felt that it was important to keep the simple cases simple, and to avoid having a framework for the cases nobody uses.
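The "transform is just a matrix" idea looks roughly like this in code. I am assuming here the usual affine convention (not something spelled out in the talk): the matrix has one more column than the feature dimension, and the feature vector is implicitly extended with a trailing 1, so the last column acts as an offset:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a global affine transform stored as a plain matrix to one feature
// vector. mat is (out_dim) x (feat_dim + 1); the extra column is the offset
// term that multiplies the implicit trailing 1.
std::vector<float> ApplyTransform(const std::vector<std::vector<float>> &mat,
                                  const std::vector<float> &feat) {
  std::vector<float> out(mat.size(), 0.0f);
  for (std::size_t r = 0; r < mat.size(); ++r) {
    for (std::size_t c = 0; c < feat.size(); ++c)
      out[r] += mat[r][c] * feat[c];
    out[r] += mat[r][feat.size()];  // the trailing-1 (offset) term
  }
  return out;
}
```

Because the transform is nothing but a matrix on disk, a script can splice this multiplication into any feature pipe without the estimation code and the consuming code knowing anything about each other.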
Okay — decoders. The decoders that we currently have use fully-expanded FSTs — and when I say fully expanded, I mean expanded down to the HMM-state level, with the self-loops represented as actual FST arcs. I know there are a lot of ways to do this, and initially one of the thoughts we had was that we wouldn't have the self-loops in the graph, and we might not even have explicit representations of the states; but in the end it was just so much simpler to do it this way, and this is what we have now. We have three decoders — and by "decoder" I mean the C++ code that does decoding, which is not necessarily the same thing as a command-line decoding program. The three decoders are on a spectrum from simple to fast, and the reason for this is that once you have a complicated, fast decoder, it is almost impossible to debug; so if something goes wrong, you can always just run the simple one, and you can find out whether it's a decoder issue. We wanted to make it so that the decoder doesn't assume too much about what your model is. So, again, the decoder has no idea of GMMs or HMMs; it doesn't even know about features. All the decoder knows how to ask is: "give me the likelihood, or score, for this frame index and this pdf index". So the interface that the decoder sees is almost like a matrix — a matrix of floats — but it is not represented that way, because you want to evaluate it on demand. This is the "decodable" interface; it's a very simple interface that says: give me the likelihood for this time and this index, and how many time frames are there, and how many pdf indices are there — that's almost the entire interface, and it's the interface that the decoder requires. So the idea is that when you implement, you know, your fantastic new model, no matter what the interface of that model is, you create a small object that satisfies the decodable interface and knows how to get the likelihoods from your fantastic model, and then you instantiate the decoder with that.
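That description can be sketched as follows. The names mirror the talk but are illustrative; this is not the exact declaration of Kaldi's decodable interface:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// The only thing the decoder ever sees: frame count and per-(frame, pdf)
// scores. It knows nothing about GMMs, HMMs, or features.
class Decodable {
 public:
  virtual float LogLikelihood(int frame, int pdf_index) const = 0;
  virtual int NumFrames() const = 0;
  virtual ~Decodable() {}
};

// A trivial model wrapped for the decoder: it looks scores up in a
// precomputed matrix. A real wrapper would instead call into a GMM — or any
// other model — computing scores on demand.
class DecodableMatrix : public Decodable {
 public:
  explicit DecodableMatrix(std::vector<std::vector<float>> scores)
      : scores_(std::move(scores)) {}
  float LogLikelihood(int frame, int pdf_index) const override {
    return scores_[frame][pdf_index];
  }
  int NumFrames() const override { return static_cast<int>(scores_.size()); }

 private:
  std::vector<std::vector<float>> scores_;
};
```

Swapping in a new acoustic model then means writing one small adapter class like DecodableMatrix, with no change to the decoder itself.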
So, for example, there is a GMM wrapping of it. Okay — the command-line decoding programs are very simple: we don't have multi-pass decoding or anything like that in a single program, and we don't support multiple types of model in one program. An example decoding program is "decode with a GMM, but with no adaptation"; that does the simple thing, and then if you want to support, let's say, multi-class MLLR or fMLLR, we have a separate command-line program. The idea is that there might be people coming into the project who want to be able to understand a given command-line program, and we don't want to make the barrier to entry too high; so we accept the overhead of having to maintain parallel decoding programs, to keep any given one relatively simple to understand. We support the standard types of features: our MFCC and PLP features are quite similar to the HTK ones. We've put in a reasonable range of configurability, but — being realistic with respect to how much people are really working on this stuff — I think most people doing research on features would probably be computing their own features anyway, so we don't support every possible combination of options for every possible variant. We only read the wav format, because our reasoning is that you can always find an external program to convert your audio and do the conversion as part of a pipe. [Audience question about HTK compatibility] Well — we can handle HTK features; beyond that, there isn't more that we support. I mean, the basic concept is to have people use the system as a complete system, because once you start supporting model conversion and so on, it just gets worse; but yes, HTK features are a special case there. We typically write the features for many utterances to a single very large file; this relates to the archive format: the form of the file is a key, a space, then your object, then another key, a space, then another object, and we have efficient mechanisms to read such files. The two normal access patterns are, firstly, sequential
access, where you want to iterate over the things in an archive, and secondly random access. And there are different ways to do the random access. One is that you can write a separate file that has little pointers into the archive. Another is that you can kind of simulate random access, even though you're really going sequentially, if you know that the keys are sorted. And another way, if the file isn't that big, is to do random access by just having the code go through the whole file and store the objects in memory; that's not scalable, but for a lot of object types it really doesn't matter. Oh yes — the feature-level processing, like adding deltas and so on: typically each one of those steps is a separate program, so you have a sequence of programs in a pipe. Again, that's a bit inefficient, but it's not like it's really consuming more than ten percent of your CPU, so you just don't care that much. This has all been written with ease of use in mind. Like I said, there are a lot of command-line tools; this is an example of a command line — the backslashes are just the shell's line continuation. So this is one of the many programs — PLP would be a separate command line. This is just, you know, an option, and these are the two command-line arguments; I'm going to explain later what they mean. This one is the input it has to read, and this one is directing it to write these things to an archive — in the form key, object, key, object — and also to an SCP file that kind of has little pointers into the archive, so that you can efficiently access the features by random access. Another feature of this is that there is only one option here; we have no more than a few options on any given command-line program. The configurability is not really in any given tool; it is more driven by how you combine these programs.
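The first of those random-access mechanisms — a separate file of little pointers into the archive — can be pictured as an index from key to byte offset, so a reader can seek straight to an object instead of scanning. The one-token-offset format below is simplified for illustration; a real Kaldi .scp line instead stores the key plus a filename-and-offset string:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read a toy index file of "key offset" pairs into a map; a reader would
// then look up a key and seek to that offset in the big archive file.
std::map<std::string, long> ReadScpIndex(std::istream &is) {
  std::map<std::string, long> index;
  std::string key;
  long offset;
  while (is >> key >> offset) index[key] = offset;
  return index;
}
```

This is also why the "open a file and seek to an offset" filename convention mentioned earlier is useful: the index entries resolve directly to seek positions.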
Something else about this whole archive formalism is that the C++-level code in the individual command-line tools doesn't have to worry too much about I/O. When you get something like this, there are very short statements in the C++ that will iterate over the stuff, so the tool doesn't have to think too much about the error conditions.

OK, FST generation; decoding is another part of the talk, later on. For training, there's a command-line program that will do the FST generation for you and generate lots of little FSTs, one for each file. For testing, it's a script that calls the OpenFst programs, and our versions of some OpenFst programs, so I'm going to go through that script later on.

Another part of the talk: this slide is not supposed to be obvious, and a lot of you won't understand the script, but it's just to give people some idea of how we do training. So this is a bash script; it's doing a loop over the iterations, and this one is estimating MLLT (sorry about the colours on this slide). So, if it's one of the iterations on which we do MLLT: we have on disk some alignments; these are state-level alignments, in the archive format that I mentioned. This converts them to posteriors in a trivial way, by saying that each frame has a posterior of one. This gives a zero weight to the silence frames; this here would be a bash variable. So this takes the silence and gives it a posterior of zero. And this is an accumulation program: this would be the model, and this is the list of the features, as a bash variable that was set elsewhere. This dash refers to the standard input, meaning that it's reading an archive from the standard input, and this one means that it's writing an archive to standard output.
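(The alignment-to-posteriors and silence-weighting steps just described can be sketched as follows. This is a toy Python illustration, not the actual Kaldi programs; the state ids and the silence set are made up, and in the real scripts these steps are separate command-line tools connected by a pipe.)

```python
# Toy sketch of two steps from the training script above: turning a
# state-level alignment into posteriors by giving every frame a
# posterior of 1.0, and then down-weighting silence frames to zero.
SILENCE_STATES = {0}   # hypothetical set of silence state ids

def ali_to_post(alignment):
    """Each frame's single aligned state gets posterior 1.0."""
    return [[(state, 1.0)] for state in alignment]

def weight_silence(post, silence_states, silence_weight=0.0):
    """Scale the posterior of silence states (here: down to zero)."""
    return [[(s, p * (silence_weight if s in silence_states else 1.0))
             for (s, p) in frame] for frame in post]

post = ali_to_post([3, 3, 0, 5])
print(weight_silence(post, SILENCE_STATES))
# [[(3, 1.0)], [(3, 1.0)], [(0, 0.0)], [(5, 1.0)]]
```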
So yeah, the output of these programs is passed by a pipe. All of the error and logging output goes to the standard error, because we've used the standard output for this kind of data, so we just redirect the logging there.

Then this is a separate program that does the MLLT estimation; it takes in, let me see, it's computing some kind of matrix. And then, because with MLLT you can't just transform the features, you have to change the means of your model as well; and we like to keep everything separate, so transforming the means is a separate operation, and we have a separate program for that. And then we have to compose the MLLT transform with the previous one, so this is another little program that does that. And this line is setting another bash variable, so that the features now correspond to the new LDA-plus-MLLT features. So, as you can see, there's a lot going on in bash: this thing would be passed as a command-line argument to one of the programs, and it's a command involving a pipe that's actually invoking two separate Kaldi programs, each with their own arguments. You can probably guess from the names of those programs what they're doing. And then of course it says "feats sub"... oh yeah, I think we were estimating the MLLT on a subset of the features, so this is the same as this, but using less of the data.

I think I already spoke about these issues... oh yeah, so we have example scripts for Resource Management and Wall Street Journal, and these run from the LDC-distributed data. These numbers we found in the literature, just some baselines; they're just a basic context-dependent system with, I think, cepstral mean normalization. We have of course more advanced things, but those baselines were what we could find in the literature for the same setup, so we're just giving you the unadapted numbers. So it's slightly better than this number, which I believe is from a paper from around two thousand.
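(The transform-composition step just mentioned comes down to a matrix product: if features are first transformed by the previous matrix L and then by the new MLLT matrix M, the combined transform is M·L. A toy Python sketch, with made-up two-by-two matrices rather than real LDA/MLLT estimates:)

```python
# Toy sketch of composing two linear feature transforms: applying the
# previous transform L and then the new MLLT matrix M to a vector x
# gives M @ (L @ x) == (M @ L) @ x, so the composed transform is just
# the matrix product M @ L.
def matmul(a, b):
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

lda  = [[1.0, 0.0], [0.0, 2.0]]     # hypothetical previous transform
mllt = [[0.0, 1.0], [1.0, 0.0]]     # hypothetical new MLLT matrix
composed = matmul(mllt, lda)
print(composed)  # [[0.0, 2.0], [1.0, 0.0]]
```

Keeping this as its own little program, rather than folding it into the estimation tool, is the "everything separate" philosophy mentioned above.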
And the HTK paper from ninety-four has a comparable number; that was a gender-dependent system. So I think basically we're doing about the same as you'd expect, given the same data. I was hoping, at the start of this project, that the results would be better, for reasons relating to the tree building and cross-word phones and stuff, but in the end it's about the same. Anyway, it's working; there are no major bugs that we know of. OK, next slide.

Just a note on speed and decoding: we used bigram numbers, because the baselines we had were bigram numbers. We can't decode with the full trigram language model that's distributed with the Wall Street Journal corpus, because the FSTs get too large; we could with a pruned trigram, but that's why we're quoting the bigram numbers. Hopefully that will change soon; there are a couple of things we're both working on. One is to have a decoder that does some kind of on-the-fly composition, so that we can decode directly with the full model; and the other is to have lattice generation, so that we can rescore. The decoding speed for these Wall Street Journal numbers is about twice as fast as real time, and that's on a good machine; it's tuned so that you don't get more than zero point one degradation versus a wide beam. The Wall Street Journal script takes a few hours on a single machine; we parallelize onto three CPUs. This is just an example script; we didn't want to include things like queue submission in the example scripts, because then they wouldn't run on everyone's machine. Of course it would be faster if you were running it in parallel on a cluster.

[In answer to a question about memory:] It fits in memory, but it takes about ten gigabytes. I mean, everyone knows that FST compilation tends to blow up a bit; it's not like, whatever the size of the model, you can just compile it. I don't recall exactly; it's the trigram one for Wall Street Journal, and I don't remember how many n-grams it had. But I don't think that our stuff is any worse than a
normal, fully-expanded FST setup.

OK, Resource Management results. These baseline results are taken from... I think these are basically the HTK results, but they're actually taken from a paper of mine from around ninety-nine or something, because I just couldn't find them in the README file from RM. So this is HTK on all of the test sets, and the average; as you can see, the average is the same, so with the same algorithms we're getting the same results as HTK. And the decoding on the Resource Management setup is about zero point one times real time. [In answer to a question:] Yeah, the test sets are quite small; it's a very small task, with a vocabulary of about a thousand words.

This page is mainly just to give you some idea of the kinds of things that are in our example scripts. We have a bunch of different configurations. This is the standard configuration, because it's what's in the HTK baseline. Adding MLLT, sorry, adding STC, as they call it, does seem to help. Splicing nine frames plus LDA actually makes it worse, but then when you do STC on top of that, it's better than here. This was the IBM recipe, sorry, this one was the IBM recipe, so I guess there must have been some interaction between those two parts of the recipe that somehow made it work. I don't know if it generalizes to other test sets; we're going to find out. Then there's splice nine frames plus LDA, triple deltas plus HLDA, triple deltas plus LDA plus MLLT; this one is quite good. And the SGMM systems. These are all unadapted; I have a separate slide for adapted. Everything is per utterance unless it's stated otherwise.

Oh yeah, OK, so this column is per-utterance adaptation, and this one is per speaker. So this was four point five before adaptation, and it really doesn't help if you do it per utterance, and that's because there are too many parameters in the model.
This is doing the same thing per speaker, and it gets a lot better. The exponential transform, which again I'm not going to describe, is something like VTLN, and it gets quite a bit better too. And this is linear VTLN, a kind of linear version of VTLN, I believe; that tends to improve things quite a lot, and the improvement is more pronounced at the per-utterance level, because it's just a constrained form of fMLLR, so the only point of it is to use it when you have less data. Splice nine frames plus LDA, plus the exponential transform, plus fMLLR: we only did some of these per speaker, because otherwise it wouldn't help. As you can see, there are a lot of different combinations. This is the SGMM including the speaker offsets, the speaker vectors, if you remember, and it does help. I think Rick was saying that it wasn't working for him, but it seems to be working for us: three point one five goes to, where is it, two point six eight. I must have forgotten to fill in this line; that's the SGMM plus fMLLR but with no speaker vectors, per speaker. I think I have those numbers but I must not have put them in; I think the best number was like two point four or two point three.

So, a general plug for Kaldi. I believe it's easy to use; I mean, I hope the scripts didn't scare you guys off. The attraction is that once you understand them, everything becomes quite simple. But it kind of does assume that you understand how speech recognition works: if you're someone who just randomly modifies the scripts, changing configurations, it's not going to work. It doesn't, like, magically know that the features you have are not compatible with your model, so you do have to know what you're doing from a speech-science point of view. But it's quite easy to use. On the C++ side, it's decent software engineering; it's easy to extend and modify. You can make your own changes and give them back to the Kaldi group; we're open to including other people's stuff.
That might give you more citations, too. So this is really the end of this first part, so in a few minutes you can get up and have a drink. Oh yeah, it has documentation, at kaldi.sourceforge.net. OK, the documentation is not as good as HTK's, and, being realistic, it probably never will be. What we will do is avoid duplicating what the HTK Book has, and point people to the HTK documentation for the general background. ... OK, we can have a short break, we can have a drink, and you can slip out if you're not that committed to it, and then we'll have the next talk, about the internals, after that.