0:00:14 I think this is going to be the official theme tune.
0:00:21 Okay, so I'm going to be talking about how to run Kaldi.
0:00:26 Basically, we have two recipes in Kaldi right now: one is Resource Management and one is Wall Street Journal. We're going to go through some of the Resource Management recipe, and we'll have a few digressions to explain as much of the internals of Kaldi as you need to know to understand it.
0:00:48 We'll go through the installation process. I'll describe the UNIX one, because that's the one most people will use, but there's also a Visual Studio one for Windows.
0:00:57 The scripts are all in bash, again for popularity reasons, but Kaldi is fairly agnostic about the shell; there's nothing in it that's really specific to any one shell.
0:01:07 Now, suppose you want to download Kaldi and run it. You'd probably first go to kaldi.sourceforge.net (kaldi.sf.net will also work). We have a page of documentation there that explains pretty much everything Kaldi-related.
0:01:25 We use a source control program called Subversion; the command name is svn, and it will typically be installed on most of the systems you'll have. svn is a little bit like CVS, but it's a more modern implementation.
0:01:41 So to check out Kaldi you would just type this command, and you'll see a lot of stuff go by on the screen. It checks out a bunch of directories: the code, the installation instructions, the documentation source, and so on.
0:01:57 For the installation instructions, you just look at the INSTALL file. The installation is pretty simple: change directory to here and run install.sh, then cd to here and run configure and make.
0:02:10 There is a rather nonzero probability that something will go wrong, because it does depend on what you've got installed, but we provide instructions for the common cases. If it doesn't install, please ask me and I'll try to help you get it installed.
0:02:28 There is a directory of external tools, and we try to have a script to download, configure and make all of these external tools so that you don't have to worry about that yourself. These include sph2pipe, to read Sphere files; IRSTLM, a language modeling toolkit, which, as I mentioned before, we chose because it has a relatively open license, although it has very limited features; OpenFST; and so on.
0:03:00 So once you've done that, you've checked it out, you've tried to install it... no, scratch that sentence.
0:03:10 In the future we're going to have version numbers and everything, but currently, because we haven't yet reached version 1.0, we just have trunk, which is the version-control term for whatever your current code is.
0:03:21 Inside that you'll find tools, which is where we download and compile the various external tools, and src, the source, which is where all of our source code is, including the source for our documentation. These are the subdirectories in there: the names of the things that I showed you on that funny slide with the rectangles. So these are all the subdirectories of code.
0:03:46 And egs, the directory of example scripts, contains the Resource Management and Wall Street Journal scripts. You can probably see a kind of hubris in the naming scheme here: we went for a deep naming scheme because we believe that eventually there will be tons of scripts in those directories.
0:04:10 Once you've done the installation in tools, you go to the source directory and configure. Sometimes configure scripts are these vast scripts generated by things like automake or whatever it is, but this one is just a hand-written one that tries to find where your ATLAS or CLAPACK libraries are, and if it finds them, it compiles with them. It also detects certain systems, like Cygwin and Mac OS, that have particular common setups, and handles those separately.
0:04:51 It's good to type -j when you make the code, because there are a lot of tools and the code is rather templated, so the compilation is a little bit slow; -j makes it build in parallel.
0:05:03 If you type make test, it goes through all the subdirectories and runs all the programs whose names end in -test. We have a lot of testing programs; they're mostly unit tests, to make sure that all of the code is working. For example, for matrix multiplication, you do the multiplication and verify that the answer is right.
0:05:27 You can also type make valgrind; it runs a program called valgrind to check for memory errors. Right now there are no errors, but it would detect things like use of unallocated memory.
0:05:41 So suppose you've done make, you've typed make test, and nothing went wrong. You cd to egs/rm/s1; this is where the example scripts are. This assumes that you're a member of the LDC or similar and have access to the RM corpus. You need this corpus; I think for members it's something like three hundred dollars, but a lot of sites will have it already.
0:06:06 The Resource Management corpus is a really simple corpus, but we use it because it's really fast to run and because it's kind of an LVCSR-like task. It's really medium vocabulary, but because it contains words and has a lexicon and everything, it behaves like a typical LVCSR system even though it isn't really one.
0:06:29 The corpus will be in some directory, and you have to figure out what that directory is, because at some point you have to pass it to one of the scripts. There's a bunch of commands in run.sh that you're supposed to run, but you're not really expected to run it directly; it will just exit on you if you do. It's just a sequence of commands you're expected to run by hand, because there's a high enough probability that any given one of them might fail that we thought it would be over-optimistic to make it a single script. The failures are usually going to be due to simple things, like a wrong directory name, but anyway.
0:07:03 So I'm going to go through what this run.sh does.
0:07:08 The first thing is data preparation. You'll see that a directory called data will probably be there. You know what the directory of your Resource Management data is (this is the LDC disk), so you give it that, and it'll do a bunch of stuff: basically converting whatever format this corpus has into a format that Kaldi likes, creating lists of file names, and things like that.
0:07:38 If you cd into data, these are just the things that were created by this step; there's a bunch of stuff in this directory that it creates.
0:07:48 Here's an example .scp file. It contains the utterance ID, and then a piped command: this command is going to be run whenever some program tries to read this entry. The .scp file is a Kaldi concept that is not quite the same as HTK's notion of an .scp file; I'll explain later exactly what it is.
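As a rough illustration of the .scp idea described above, here is a minimal Python sketch that reads "<key> <extended filename>" lines into an ordered map. It is not Kaldi's reader; the function name `parse_scp` and the example paths are hypothetical, and it only assumes what the talk states: the first field is the utterance ID and the rest of the line says where the object comes from.

```python
def parse_scp(lines):
    """Parse Kaldi-style .scp lines ("<key> <extended filename>") into an ordered dict.

    Hypothetical helper: the key is the first whitespace-delimited field and the
    rest of the line (possibly a command ending in '|') is kept verbatim.
    """
    table = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<key> <filename>'")
        key, rxfilename = parts
        if key in table:
            raise ValueError(f"line {line_no}: duplicate key {key!r}")
        table[key] = rxfilename
    return table


if __name__ == "__main__":
    # Example lines in the style of the slide (paths are made up).
    scp = parse_scp(["utt1 sph2pipe -f wav /corpus/utt1.sph |",
                     "utt2 /feats/raw.ark:1234"])
    for key, where in scp.items():
        print(key, "->", where)
```

The command is only stored here; a real reader would run it (or open the file) lazily when that key's object is requested.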
0:08:13 Another thing that's created here is a decoding graph in FST format. In some other scripts, like the Wall Street Journal one, this stage wouldn't create any FSTs; it would just create an ARPA language model. We do it like this because RM doesn't use an ARPA language model.
0:08:30 This stage of data preparation also creates, in that directory, a lexicon, which comes from material on the Resource Management disk. It's in this form, which is pretty obvious, and we'll turn it into an FST. The Kaldi tools don't deal with the text lexicon directly; they deal with FSTs, so the lexicon that you give to Kaldi is going to be an OpenFST-format FST.
0:08:57 There are also some speaker maps: utt2spk, which maps utterance IDs to speaker IDs, and spk2utt, which maps speakers to their utterances, so you can go from utterances to speakers and vice versa. There's no notion of HTK-style masks or patterns.
0:09:17 So the concept of an utterance ID is quite important in Kaldi. There's no notion of parsing file names, as in "the last element of the path is the utterance ID" or whatever; you have to have an explicit list, and all of these .scp files and archives are indexed by the utterance ID, which you have to decide on.
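Since both speaker maps carry the same information, one can be derived from the other. The sketch below inverts utt2spk lines into spk2utt lines; it is a plain-Python illustration (the function name and example IDs are hypothetical), not the tool the recipe itself uses.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Invert "<utt-id> <spk-id>" lines into "<spk-id> <utt-id1> <utt-id2> ..." lines.

    Utterances keep their input order within each speaker; speakers are sorted.
    """
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"bad utt2spk line: {line!r}")
        utt, spk = fields
        spk2utt[spk].append(utt)
    return [" ".join([spk] + utts) for spk, utts in sorted(spk2utt.items())]


if __name__ == "__main__":
    for line in utt2spk_to_spk2utt(["adg0_sr009 adg0",
                                    "adg0_sr049 adg0",
                                    "bef0_sr110 bef0"]):
        print(line)
```

Because everything is keyed on explicit IDs, nothing here parses file names, which matches the point made in the talk.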
0:09:37 That script also creates a text format of the transcriptions, but we convert it into an integer format, because Kaldi works with integers; that way the Kaldi programs don't each need a map between the text and integer forms of the words. This is the text format of the transcriptions.
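The text-to-integer conversion can be pictured as a lookup through a word symbol table. This is a minimal sketch assuming a "<symbol> <id>" table and "<utt-id> word word ..." transcript lines; the helper names are hypothetical and this is not the recipe's own script.

```python
def load_symbol_table(lines):
    """Read OpenFST-style text symbol-table lines ("<symbol> <integer-id>")."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            sym, idx = fields
            table[sym] = int(idx)
    return table


def transcript_to_int(line, word2id):
    """Map "<utt-id> word1 word2 ..." to "<utt-id> id1 id2 ...", failing on unknown words."""
    utt, *words = line.split()
    try:
        ids = [str(word2id[w]) for w in words]
    except KeyError as e:
        raise KeyError(f"{utt}: word {e.args[0]!r} not in symbol table") from None
    return " ".join([utt] + ids)


if __name__ == "__main__":
    words = load_symbol_table(["<eps> 0", "SHOW 1", "ALL 2", "SHIPS 3"])
    print(transcript_to_int("utt1 SHOW ALL SHIPS", words))
```

Failing loudly on an unknown word is a design choice for the sketch; a real setup would decide how out-of-vocabulary words are handled.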
0:09:59 The next step, after the data prep, is to prepare the graphs. There's going to be a bunch of OpenFST commands in here, and scripts to convert the lexicon to FST format. Silence is actually added to the lexicon by the script; it's not something that's very deeply embedded in Kaldi.
0:10:25 These little files, if you've ever used the AT&T toolkit or OpenFST, you'll know what they are: they're symbol tables. This is the text form of zero, the text form of one, et cetera, and <eps>, for epsilon, is always zero. That's a common convention in FST toolkits: the ID of epsilon is zero. This is why phones are one-based: zero is always reserved for epsilon.
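The epsilon-is-zero convention can be shown by building a phone symbol table. In this sketch (hypothetical helper names, made-up phone list) `<eps>` is pinned to 0 so real symbols start at 1, and the table is written in the "<symbol> <id>" text form that OpenFST symbol tables use.

```python
def make_symbol_table(symbols, eps="<eps>"):
    """Assign integer IDs with epsilon fixed at 0 and real symbols numbered from 1."""
    table = {eps: 0}
    for sym in symbols:
        if sym in table:
            raise ValueError(f"duplicate symbol {sym!r}")
        table[sym] = len(table)  # next free ID; 0 is already taken by epsilon
    return table


def write_symbol_table(table):
    """Render as text: one "<symbol> <id>" per line, sorted by ID."""
    return [f"{sym} {idx}" for sym, idx in sorted(table.items(), key=lambda kv: kv[1])]


if __name__ == "__main__":
    phones = make_symbol_table(["sil", "aa", "ae"])
    print("\n".join(write_symbol_table(phones)))
```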
0:10:58 OpenFST does have a capability to put symbol tables on the FSTs, so that the FSTs know what the words were. We haven't used that, because it quickly becomes very difficult once you decide to have symbol tables on the FSTs; we basically use integer format throughout.
0:11:23 The script outputs these files. This is G, the grammar, which could also be the language model used for decoding. This one is L, the lexicon, and this one is the lexicon with disambiguation symbols. If you've read the papers by Mohri et al. that describe the standard recipe for FST-based speech recognition, you'll know what those are: little symbols like #1, #2 that they put on the lexicon entries at the ends of words to ensure determinizability. But I'm not going to go into that in more detail, or it's going to take up the entire time of the talk.
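To make the disambiguation-symbol idea slightly more concrete: a pronunciation needs a marker if it is shared by several words (homophones) or is a prefix of another word's pronunciation, since otherwise L composed with G cannot be determinized. The sketch below applies that rule with per-pronunciation counters #1, #2, .... It is a simplified illustration of the idea, not necessarily how Kaldi's own lexicon script numbers the symbols; the function name and the toy lexicon are hypothetical.

```python
from collections import Counter


def add_disambig(lexicon):
    """Append #1, #2, ... to pronunciations that are duplicated or are a prefix of another.

    lexicon: list of (word, [phones]).  Returns a new list in the same order.
    """
    counts = Counter(tuple(p) for _, p in lexicon)
    prefixes = set()
    for _, p in lexicon:
        for i in range(1, len(p)):          # proper prefixes only
            prefixes.add(tuple(p[:i]))
    next_id = Counter()
    out = []
    for word, p in lexicon:
        key = tuple(p)
        if counts[key] > 1 or key in prefixes:
            next_id[key] += 1               # homophones get #1, #2, ... in order
            out.append((word, list(p) + [f"#{next_id[key]}"]))
        else:
            out.append((word, list(p)))
    return out


if __name__ == "__main__":
    lex = [("RED", ["r", "eh", "d"]), ("READ", ["r", "eh", "d"]),
           ("A", ["ah"]), ("ABOUT", ["ah", "b", "aw", "t"])]
    for word, prons in add_disambig(lex):
        print(word, " ".join(prons))
```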
0:12:06 Preparing integer lists of silence and nonsilence phones: we create little files with things like this. They're needed later on, because occasionally a script needs to know the IDs of the silence phones, and because the Kaldi tools work in integer format, it needs them as a list of integers.
0:12:25 Computing MFCCs: this is just a command to invoke another script that computes the MFCCs. Here is the command; I believe I actually showed it before. It basically writes the MFCCs to disk, and then there's a text file that contains, on each line, the utterance ID and then the location: the file name, a colon, and an integer offset, so the code can go directly to that part of the file using fseek.
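The "file name, colon, offset" entries can be illustrated with a toy archive. This sketch writes "<key> <payload>" records, records the byte offset of each payload, and reads one back by seeking straight to it. The text record format and the helper names are assumptions made for illustration; real Kaldi archives are usually binary.

```python
import os
import tempfile


def write_archive(path, items):
    """Write "<key> <payload>\\n" records; return scp lines "<key> <path>:<offset>".

    The offset points just past the key and its space, i.e. at the object itself.
    Toy text format only.
    """
    scp = []
    with open(path, "wb") as f:
        for key, payload in items:
            f.write(key.encode() + b" ")
            offset = f.tell()
            f.write(payload.encode() + b"\n")
            scp.append(f"{key} {path}:{offset}")
    return scp


def read_at(rxfilename):
    """Resolve "<path>:<offset>" by seeking directly to the object (like fseek)."""
    path, offset = rxfilename.rsplit(":", 1)
    with open(path, "rb") as f:
        f.seek(int(offset))
        return f.readline().rstrip(b"\n").decode()


if __name__ == "__main__":
    ark = os.path.join(tempfile.mkdtemp(), "feats.ark")
    lines = write_archive(ark, [("utt1", "1.0 2.0"), ("utt2", "3.5 4.5")])
    key, where = lines[1].split(" ", 1)
    print(key, read_at(where))
```

Random access costs one seek per lookup, which is why writing one big archive plus an offset index is practical.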
0:13:07 Okay, I think this is what I just said: this is the script format I mentioned. Of course, the script format is very generic. The second field doesn't have to be of this form; it can be any one of our extended file names: a real file, something of this form, a pipe, whatever.
0:13:26 I'm showing you what the archive format looks like here; it really is binary data, as you can see. But in some cases it would be text: you can give it the option to write in text, and very often that gives a nice line-by-line format.
0:13:45 I think I mentioned this before: in the script file, the key is an important concept, because there's this concept of a collection of objects indexed by a key, in this case a string. Think of it a little bit like an STL map from string to whatever object. The archives and the scripts both make use of this concept. I'll explain it in a bit more detail on the next slide, but the script format is the key followed by some kind of extended file name.
0:14:24 The types of extended file names include an actual file; a pipe for output, which is the pipe symbol followed by the command, as in the shell; the input version, which is the command followed by the pipe symbol, for reading from a pipe; and an offset into a file, which is useful when you want to write a big archive but have random access into it.
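The extended file name cases listed above can be told apart by their syntax. This is a minimal classifier sketch: trailing "|" means read from a command, leading "|" means write to a command, a trailing ":<digits>" means an offset, and "-" means standard input or output. The function names are hypothetical, and Kaldi's real classification handles more edge cases than this.

```python
import re


def classify_rxfilename(name):
    """Classify an input ("read") extended file name."""
    name = name.strip()
    if name in ("", "-"):
        return ("stdin", None)
    if name.endswith("|"):
        return ("pipe", name[:-1].strip())      # command whose stdout we read
    m = re.fullmatch(r"(.*):(\d+)", name)
    if m:
        return ("offset", (m.group(1), int(m.group(2))))
    return ("file", name)


def classify_wxfilename(name):
    """Classify an output ("write") extended file name."""
    name = name.strip()
    if name in ("", "-"):
        return ("stdout", None)
    if name.startswith("|"):
        return ("pipe", name[1:].strip())       # command that receives our output
    return ("file", name)


if __name__ == "__main__":
    for n in ["gunzip -c foo.gz |", "/data/feats.ark:1234", "foo.txt", "-"]:
        print(repr(n), "->", classify_rxfilename(n))
    print(classify_wxfilename("| gzip -c > foo.gz"))
```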
0:14:52 This might seem like a very minor thing, but I think it's important to understand how this works if you ever want to use Kaldi.
0:15:02 There's the concept of a table. A table doesn't really correspond to any concrete object or class; it's more a generic concept. The idea is a collection of objects of some known type, all indexed by a key, which is a string. We define a key as a non-empty, space-free string; it has to be space-free, otherwise we get into all kinds of issues.
0:15:30 There are three templated classes that relate to tables: TableWriter, SequentialTableReader and RandomAccessTableReader. So there are three ways you can interact with a table. You can write a table: you say "write this object with this key", it writes it to the table, and you keep doing that. You can read a table sequentially, which means repeatedly getting the next key and the next object. Or you can do random access on a table, which means asking "do you have an object with this key?" and, if so, "give me the object". That's how you interact with them.
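The three access patterns can be mimicked in a few lines of Python over an in-memory text "archive". The class and method names below loosely mirror the C++ classes named in the talk, but this is a hypothetical sketch of the concept, not Kaldi's API; values are plain strings and the archive is a list of "<key> <value>" lines.

```python
class TableWriter:
    """Writes (key, object) pairs as "<key> <value>" lines into a list."""

    def __init__(self, sink):
        self.sink = sink

    def write(self, key, value):
        if not key or any(c.isspace() for c in key):
            raise ValueError(f"invalid key {key!r}: must be non-empty and space-free")
        self.sink.append(f"{key} {value}")


class SequentialTableReader:
    """Iterates over an archive in order using done() / key() / value() / next()."""

    def __init__(self, lines):
        self._items = [line.split(" ", 1) for line in lines]
        self._pos = 0

    def done(self):
        return self._pos >= len(self._items)

    def key(self):
        return self._items[self._pos][0]

    def value(self):
        return self._items[self._pos][1]

    def next(self):
        self._pos += 1


class RandomAccessTableReader:
    """Answers "do you have this key?" and "give me the object for this key"."""

    def __init__(self, lines):
        self._map = dict(line.split(" ", 1) for line in lines)

    def has_key(self, key):
        return key in self._map

    def value(self, key):
        return self._map[key]


if __name__ == "__main__":
    ark = []
    writer = TableWriter(ark)
    writer.write("utt1", "3 4 5")
    writer.write("utt2", "6")
    reader = SequentialTableReader(ark)
    while not reader.done():
        print(reader.key(), reader.value())
        reader.next()
```

The space-free key rule is what allows the simple "split at the first space" parsing used here.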
0:16:17 Next I'll describe what they're templated on. They're not actually templated on the object. It would be most natural to template on the type of object that's in the table, but that doesn't work very well with fundamental types like integers, because normal Kaldi objects have a Read function and a Write function with a particular behavior that's common to all of them. We can't just assume that everything we want to read and write will have that form: how would we write an integer, or an STL vector? It would be ridiculous to have to derive a class from integer and give it Read and Write functions; an integer isn't a class.
0:17:01 So instead we have what we call a holder. A holder class is a class that has Read and Write functions, and it has a typedef T inside it that is the actual type of the object the table holds. If you're lost by now, or you don't really know C++, it doesn't matter: this is just how the internals work, and you don't need to know it to understand how the whole thing works.
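The holder idea carries over to Python: a small class that knows how to read and write one kind of value, so the generic table code depends only on the holder and never requires the stored type to have its own Read/Write. The holder names and the one-line text format here are hypothetical illustrations, not Kaldi's actual holder classes.

```python
class Int32Holder:
    """Reads/writes a plain int, which has no Read/Write methods of its own."""
    T = int  # the type of object the table holds (analogous to the typedef T)

    @staticmethod
    def write(value):
        return str(int(value))

    @staticmethod
    def read(text):
        return int(text)


class Int32VectorHolder:
    """Holder for a list of ints, written space-separated on one line."""
    T = list

    @staticmethod
    def write(value):
        return " ".join(str(int(v)) for v in value)

    @staticmethod
    def read(text):
        return [int(tok) for tok in text.split()]


def write_table(holder, items):
    """Generic table writer: it only knows about the holder, not the object type."""
    return [f"{key} {holder.write(value)}" for key, value in items]


def read_table(holder, lines):
    """Generic table reader: splits off the key and lets the holder parse the rest."""
    out = []
    for line in lines:
        key, _, rest = line.partition(" ")
        out.append((key, holder.read(rest)))
    return out


if __name__ == "__main__":
    lines = write_table(Int32VectorHolder, [("utt1", [1, 2, 3]), ("utt2", [])])
    print(lines)
    print(read_table(Int32VectorHolder, lines))
```

Swapping the holder is all it takes to store a different type, which is the point of templating on the holder rather than on the object.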
0:17:37 I think this is an example of how, at the C++ level, you use the table concepts. We introduce bits of terminology here that may seem a bit annoying but eventually become clarifying. An rspecifier is a string that tells the table code how to read a table, and here is an example of one: ark, colon, then this file name. The table code parses this, and when it reads it, it says: okay, this is telling me that this is an archive, a thing that has key, object, key, object; and this part is an extended file name that tells it how to open a pipe or open a file.
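Parsing an rspecifier amounts to splitting at the first colon: before it, a comma-separated list containing exactly one of ark or scp plus any options; after it, the extended file name (which may itself contain colons). This sketch is a hypothetical helper; it does not check that the option names are valid, as the real table code would.

```python
def parse_rspecifier(spec):
    """Split e.g. "ark,s,cs:gunzip -c foo.gz |" into (kind, options, rxfilename)."""
    prefix, sep, rxfilename = spec.partition(":")   # only the first ':' separates
    if not sep:
        raise ValueError(f"no ':' in rspecifier {spec!r}")
    tokens = prefix.split(",")
    kinds = [t for t in tokens if t in ("ark", "scp")]
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one of ark/scp in {prefix!r}")
    options = {t for t in tokens if t not in ("ark", "scp")}
    return kinds[0], options, rxfilename


if __name__ == "__main__":
    print(parse_rspecifier("ark:foo.ark"))
    print(parse_rspecifier("ark,s,cs:gunzip -c foo.gz |"))
```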
0:18:23 Now, this is the type name: a sequential table reader templated on this holder type, so this is for reading something of type int32. This is the object name, and to initialize it we give it this string. As soon as you initialize the object, it opens the pipe, ready to read from it.
0:18:47 Now we use this object in the for loop. What this code is doing is getting each key and object from the sequential table reader, and of course, since it's a sequential table reader, that's what this object expects us to do. The point is that there may be errors: some of the objects may not be there, or something else may go wrong. The templated code handles that, so your user-level code just sees it as sequential access.
0:19:22 This is stuff that I've already told you.
0:19:28 There are some things that the table code has to do that are a little bit tricky. One of them is this: very often you want to do random access on objects that are in an archive, and that archive may come from a pipe; we use a lot of pipes. Suppose you query a key that is not in the archive. To tell you it wasn't there, the code has to read every object in the archive, get to the end of the pipe, and then say no. And because it doesn't know that you won't ask for something else too, it has to store all of that in memory.
0:20:04 To stop it from having to do this, you can add little comma-separated options to the rspecifier: s, which says "this archive is sorted on key", and cs, which says "we're going to query this archive in sorted order". That gives the code enough information to know that it doesn't have to store everything in memory and can still be correct.
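Here is why the s and cs promises make random access cheap. If the stream is sorted and the queries arrive in sorted order, everything before the requested key can be discarded, and a missing key is detected as soon as a larger key appears, so only one object is held at a time. The class below is a hypothetical sketch of that reasoning and does not reproduce Kaldi's implementation.

```python
class SortedRandomAccessReader:
    """Random access over a sorted (key, value) stream, assuming sorted queries.

    Only the current item is kept in memory; earlier items are discarded.
    """

    def __init__(self, pairs):
        self._it = iter(pairs)
        self._cur = next(self._it, None)        # (key, value), or None at end of stream
        self._last_query = None
        self.objects_read = 1 if self._cur is not None else 0

    def _advance_to(self, key):
        if self._last_query is not None and key < self._last_query:
            raise ValueError(f"query {key!r} out of order (the 'cs' promise was broken)")
        self._last_query = key
        while self._cur is not None and self._cur[0] < key:
            self._cur = next(self._it, None)    # safe to drop: no later query can want it
            if self._cur is not None:
                self.objects_read += 1

    def has_key(self, key):
        self._advance_to(key)
        return self._cur is not None and self._cur[0] == key

    def value(self, key):
        if not self.has_key(key):
            raise KeyError(key)
        return self._cur[1]


if __name__ == "__main__":
    stream = ((k, k.upper()) for k in ["a", "b", "d", "e", "f"])
    reader = SortedRandomAccessReader(stream)
    print(reader.has_key("b"), reader.has_key("c"), reader.objects_read)
```

The lookup for "c" stops at "d" instead of draining the whole stream, which is exactly the memory and time saving the options are meant to enable.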
0:20:30 I'm going to go a little bit faster now. I think we already went through computing MFCCs. Next, monophone training.
0:20:42 So you would invoke this script, and we're going to go through it a little. It sets some variables in bash: the directory where you're running your experiment, and the features. I think we saw one of these strings before: this is an rspecifier, as I mentioned. This part tells the code to interpret the string as an archive, and this part tells it how to open the stream. And of course this is a pipe, so it's another command, and that command has its own rspecifier. They can sometimes even be nested, but beyond one level of nesting the shell escaping becomes too painful.
0:21:23 This is a pipe. For these programs the output is always the argument on the right, so what this says is: read in this script file that says where the features are, and write them as an archive on the standard output; then this says that the whole thing is a pipe. So this part gets interpreted by the program that is given the string. You get used to it.
0:21:56 Going into the monophone training script: we create a file called topo that specifies the HMM topology to Kaldi. This file is fairly self-explanatory. There are three emitting states, and then this is the final state, which is non-emitting; the last state always has a final probability of one.
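The topology just described (three emitting states in a left-to-right chain plus a non-emitting final state) can be written as a transition matrix. This is a minimal sketch: the 0.75 self-loop / 0.25 forward split is an assumed initial value chosen for illustration, and training re-estimates these numbers anyway.

```python
def left_to_right_topology(num_emitting=3, self_loop=0.75):
    """Transition matrix for a left-to-right HMM with self-loops and a final state.

    States 0..num_emitting-1 emit; state num_emitting is the non-emitting final
    state, whose final probability is 1.  The self_loop value is an assumed
    initial setting, not a trained one.
    """
    n = num_emitting + 1
    trans = [[0.0] * n for _ in range(n)]
    for s in range(num_emitting):
        trans[s][s] = self_loop             # stay in the same state
        trans[s][s + 1] = 1.0 - self_loop   # move on to the next state
    final_prob = [0.0] * num_emitting + [1.0]
    return trans, final_prob


if __name__ == "__main__":
    trans, final = left_to_right_topology()
    for row in trans:
        print(row)
    print("final:", final)
```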
0:22:33 This is gmm-init-mono, which initializes the GMM; we initialize it with a dimension of thirty-nine, and it outputs the model here. It also outputs a very trivial tree that doesn't really have any splits, and that's how we handle a monophone system: even a monophone system has a decision tree, just so that all the code is unified.
0:22:57 Let's see what we have. Creating decoding graphs for training: all of the training scripts have a command of this form that creates an archive containing all of the FSTs, one per utterance. We do this as a separate command because it would be a little too slow to do on each iteration. It takes the initial model, the lexicon FST, and the training transcriptions in integer format, and the output goes to gzip, which compresses it and puts it in a file. This is the format of the .tra file: it's just an integer transcription, where we've converted all the strings to integers.
0:23:56 Okay, so the very first stage of monophone training is the flat start, where you divide the utterance equally among the phones, or whatever the units are, and create an alignment according to that.
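A flat-start alignment is just an even split of the frames. This sketch assigns frames to a sequence of states as evenly as possible; giving the remainder to the earliest states is an arbitrary choice made for the sketch, and a real implementation aligns to HMM states through transition IDs rather than bare labels.

```python
def equal_align(num_frames, states):
    """Flat-start alignment: split num_frames as evenly as possible across states.

    Returns one state label per frame.  Earlier states absorb the remainder,
    so 10 frames over 3 states gives 4, 3, 3 frames.
    """
    if num_frames < len(states):
        raise ValueError("too few frames to give every state at least one frame")
    base, extra = divmod(num_frames, len(states))
    alignment = []
    for i, state in enumerate(states):
        alignment.extend([state] * (base + (1 if i < extra else 0)))
    return alignment


if __name__ == "__main__":
    print(equal_align(10, ["a", "b", "c"]))
```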
0:24:13 The output of this program is called an alignment, which is basically a vector of integers for each utterance. Each of those integers is an ID that I touched on earlier, which we call a transition-id. It behaves roughly like the pdf-id (the index of the pdf), but it carries a little more information: you know the phone, and you know which transition was taken, so it contains enough information to update the model.
0:24:42 We feed that into this program, gmm-acc-stats-ali. The suffix -ali means that it reads an alignment, because there are different versions of this program that read alignments, posteriors, Gaussian-level posteriors, and so on.
0:24:59 It takes the model and the features (this is a shell variable, as before), reads this from the standard input, and outputs this. By the way, whenever something has ark or scp on it, that's an rspecifier or wspecifier, which means a collection of objects indexed by key is being passed around. If you don't see that, as here, it's just a file: a single stream, with no notion of indexing.
0:25:34 I think I covered this. This is gmm-est, the GMM update: it takes the old model and outputs the new model.
0:25:50 This is the Viterbi stage of training. On selected iterations during training, we redo the alignment. We don't necessarily do it every iteration, simply because it's the thing that takes most of the time, and, if you have multiple Gaussians, it isn't the only thing going on in training, so it makes sense not to do it on every iteration.
0:26:19 I think this command is pretty obvious. You give it the beam, the model, this stream that has all the FSTs in it, and the features, and it writes... sorry, that's an option; I mentioned options briefly. On this wspecifier the t option tells it to write in text format. The default is binary, but you can add b if you want to make that explicit.
0:26:51 In monophone training we realign on almost every iteration, because I think I found that worked better, or maybe it's because you usually have a single Gaussian during those iterations. After that you can still realign, but you don't have to do it so often; typically during triphone training we'd only realign three or four times.
0:27:15 "Mixing up" to increase the number of Gaussians is maybe slightly against the whole Kaldi philosophy, but it's just an option to the update program.
0:27:24 The way we allocate Gaussians: we don't have a constant number of Gaussians per state. It's a power law: the number is proportional to the state's count raised to a power. The value shown here shouldn't be 0.0; it should be 0.2, and I don't know why it shows that. It's just slightly better than having a constant number.
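The power-law allocation can be sketched as: give each state a share of the total Gaussian budget in proportion to count**power (0.2 as in the talk), with at least one Gaussian per state. Simple rounding is used here, so the shares may not sum exactly to the target; the actual update code may distribute the remainder differently. The function name is hypothetical.

```python
def allocate_gaussians(counts, total, power=0.2):
    """Give each state a share of `total` proportional to count**power (at least 1 each).

    Plain proportional rounding; not necessarily how the real update code
    distributes leftovers.
    """
    weights = [c ** power for c in counts]
    z = sum(weights)
    return [max(1, round(total * w / z)) for w in weights]


if __name__ == "__main__":
    # A state seen 100x more often gets only about 2.5x the Gaussians with power 0.2.
    print(allocate_gaussians([100, 10000], 20))
```

With power 0 every state gets an equal share; with power 1 allocation is directly proportional to the data, and 0.2 sits closer to the equal end.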
0:27:46 The schedule we use for allocating Gaussians is typically: you start from a set number, increase it linearly, and then it levels out. It would probably be more natural to increase it with the log, or with a power law, but that just didn't work as well, so we went with linear.
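The linear-then-flat schedule is easy to state in code. In this sketch the target number of Gaussians rises linearly from `start` to `total` over the first `max_iter_inc` iterations and stays at `total` after that; the parameter names are hypothetical stand-ins for whatever the training script calls them.

```python
def gaussian_schedule(start, total, max_iter_inc, num_iters):
    """Target Gaussian count per iteration: linear ramp from `start`, then flat at `total`."""
    inc = (total - start) / max_iter_inc
    return [total if it >= max_iter_inc else int(start + it * inc)
            for it in range(num_iters)]


if __name__ == "__main__":
    sched = gaussian_schedule(250, 1000, 30, 40)
    print(sched[0], sched[10], sched[30], sched[39])
```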
0:28:10 Okay, so triphone training. The first stage is to align all of the data. For the monophone system we used a subset, because there's no point using all of it for such a small system. So we align all of the data and output alignments.
0:28:27 We accumulate a special kind of statistics for training the decision tree: for each unique triphone context, in this case, we accumulate the stats for a single Gaussian, and these are used to train the tree in the standard way.
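The standard likelihood-based tree-building idea can be sketched as follows. Keep count, sum and sum-of-squares for one diagonal Gaussian per triphone context; the value of a candidate question is the log-likelihood of the "yes" and "no" pools, each under its own maximum-likelihood Gaussian, minus that of the unsplit pool. This is a generic illustration with hypothetical names and a small variance floor; Kaldi's tree code is more general.

```python
import math


class GaussStats:
    """Count, sum and sum-of-squares for one diagonal Gaussian."""

    def __init__(self, dim):
        self.count = 0.0
        self.x = [0.0] * dim
        self.x2 = [0.0] * dim

    def add(self, frame):
        self.count += 1.0
        for d, v in enumerate(frame):
            self.x[d] += v
            self.x2[d] += v * v

    def add_stats(self, other):
        self.count += other.count
        self.x = [a + b for a, b in zip(self.x, other.x)]
        self.x2 = [a + b for a, b in zip(self.x2, other.x2)]

    def log_like(self, var_floor=1e-3):
        """Log-likelihood of the pooled frames under their own ML diagonal Gaussian."""
        if self.count == 0:
            return 0.0
        total = 0.0
        for s, s2 in zip(self.x, self.x2):
            mean = s / self.count
            var = max(s2 / self.count - mean * mean, var_floor)
            total += -0.5 * self.count * (math.log(2 * math.pi * var) + 1.0)
        return total


def accumulate_tree_stats(contexts, frames):
    """One GaussStats per unique (left, center, right) context seen in the alignment."""
    stats = {}
    for ctx, frame in zip(contexts, frames):
        stats.setdefault(ctx, GaussStats(len(frame))).add(frame)
    return stats


def pool(stats, keys, dim):
    out = GaussStats(dim)
    for k in keys:
        out.add_stats(stats[k])
    return out


def split_gain(stats, question, dim):
    """Log-likelihood gain from splitting the pooled contexts by a yes/no question."""
    yes = [k for k in stats if question(k)]
    no = [k for k in stats if not question(k)]
    return (pool(stats, yes, dim).log_like() + pool(stats, no, dim).log_like()
            - pool(stats, list(stats), dim).log_like())
```

Tree building then greedily applies the question with the largest gain at each node, which is why only these per-context statistics, and not the raw frames, are needed.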
0:28:44 There's code in the script that does automatic clustering to produce the questions; we don't use handwritten questions, because finding them is a hassle. This produces various files that are read by the tree-building code, so much of the actual control over how the tree is set up happens at the script level.
0:29:07 Building the tree: this is the Kaldi command that builds the tree. What it actually does is grow the tree to fifteen hundred leaves and then cluster it down a little, by an unpredictable amount, because the threshold it uses for clustering after the initial splitting is the same as the gain of the last successful split. So you can't quite predict how big it'll be, but normally it shrinks by twenty percent or so.
0:29:37 Then you initialize the model for this tree. The tree-building program doesn't know whether you're going to create a GMM or an SGMM, so there's a separate program to initialize the model.
0:29:52 There's a nice feature of the whole alignment concept: you can take an alignment produced with one model and convert it to be valid for another model, which lets you avoid a certain amount of regenerating alignments.
0:30:10 Okay, so if you want to decode, you have to build the decoding graph. This is how we do graph generation. The first thing it does is compose L with G, determinize and minimize, and you get LG. There's some handling of disambiguation symbols going on, but I'm not going to go through that.
0:30:35 Then you have to compose with the context FST, which expands the phones into context-dependent phones. The context FST is generated dynamically in memory here, but I'm not going to go into more detail about it.
0:30:54 This last one, make-h-transducer, makes the H FST: it basically expands the HMMs, so on the right (output) side it has the context-dependent phones, and on the left (input) side it has the pdfs, or more precisely transition-ids, and it adds all the HMM structure. So this creates H, and the last step is to compose H with CLG, determinize and minimize. We do that without self-loops and add them at the very end; this is just to make these steps more memory-efficient.
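Each step above is built on FST composition. The sketch below composes two epsilon-free weighted transducers in the tropical semiring (weights add along a path): a composed arc exists where the first machine's output label matches the second machine's input label. Epsilon handling, which real L∘G composition needs, is deliberately left out, and the `Fst` class is a hypothetical toy structure, not OpenFST's API.

```python
from collections import deque


class Fst:
    """Toy weighted transducer: arcs[state] = list of (ilabel, olabel, weight, next_state)."""

    def __init__(self, start):
        self.start = start
        self.arcs = {}
        self.finals = {}  # state -> final weight

    def add_arc(self, src, ilabel, olabel, weight, dst):
        self.arcs.setdefault(src, []).append((ilabel, olabel, weight, dst))


def compose(a, b):
    """Epsilon-free composition (tropical semiring); states are (state_a, state_b) pairs."""
    start = (a.start, b.start)
    out = Fst(start)
    queue, seen = deque([start]), {start}
    while queue:
        sa, sb = s = queue.popleft()
        if sa in a.finals and sb in b.finals:
            out.finals[s] = a.finals[sa] + b.finals[sb]
        for il, ol, w1, na in a.arcs.get(sa, []):
            for il2, ol2, w2, nb in b.arcs.get(sb, []):
                if ol == il2:                        # a's output must match b's input
                    dst = (na, nb)
                    out.add_arc(s, il, ol2, w1 + w2, dst)
                    if dst not in seen:
                        seen.add(dst)
                        queue.append(dst)
    return out


if __name__ == "__main__":
    a = Fst(0); a.add_arc(0, 1, 10, 0.5, 1); a.finals[1] = 0.0
    b = Fst(0); b.add_arc(0, 10, 100, 1.0, 1); b.finals[1] = 0.25
    c = compose(a, b)
    print(c.arcs, c.finals)
```

Only reachable pair-states are built, which is also why composing without self-loops and adding them afterwards keeps the intermediate machines smaller.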
0:31:34so
0:31:34this just goes prefix prefixes
0:31:36for the decoding script
0:31:38we create a shell variable that tells that what the features will be
0:31:42then we invoke this
0:31:43uh
0:31:44program that's three decoders and
0:31:47this what the affect the uh come man it could be jen and decode code
0:31:50simple faster or D
0:31:52it's the the kind of medium one
0:31:54there
0:31:54The gmm-decode-simple is mainly there for debugging,
0:31:57because it's so simple that,
0:31:59you know, there can't be anything wrong with it,
0:32:01so we just compare the two.
0:32:08This is the decoding beam, twenty; this is the acoustic scale, which is a kind of language model
0:32:12scale:
0:32:13we only scale the acoustics, not the language model.
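To make the roles of the beam and the acoustic scale concrete, here is a simplified Python sketch of beam pruning over active decoding hypotheses (tokens). It assumes each token's total cost is its graph cost (language model plus other graph costs) minus its acoustic log-likelihood times the acoustic scale, so only the acoustic term is scaled, as described in the talk. The `Token` class, the `prune` function, and the numbers in the usage lines are hypothetical; Kaldi's decoders do this inside their C++ frame loop and add refinements not shown here.

```python
from dataclasses import dataclass


@dataclass
class Token:
    state: int          # decoding-graph state this hypothesis is in
    graph_cost: float   # accumulated -log prob from the graph (LM, lexicon, HMM)
    acoustic_ll: float  # accumulated acoustic log-likelihood (usually negative)


def total_cost(tok: Token, acoustic_scale: float) -> float:
    # Only the acoustic term is scaled; graph (LM) costs are used as-is.
    return tok.graph_cost - acoustic_scale * tok.acoustic_ll


def prune(tokens: list[Token], beam: float, acoustic_scale: float) -> list[Token]:
    """Keep the tokens whose cost is within `beam` of the best token's cost."""
    if not tokens:
        return []
    best = min(total_cost(t, acoustic_scale) for t in tokens)
    return [t for t in tokens if total_cost(t, acoustic_scale) <= best + beam]


tokens = [Token(0, 5.0, -100.0), Token(1, 2.0, -400.0), Token(2, 9.0, -90.0)]
# Hypothetical settings: costs are then 15, 42 and 18, so state 1 falls outside the beam.
kept = prune(tokens, beam=20.0, acoustic_scale=0.1)
print([t.state for t in kept])
```

A smaller acoustic scale shrinks the acoustic contribution relative to the unscaled graph costs, which is why it behaves like an inverse language-model weight.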
0:32:17This is just to get more human-readable output.
0:32:21Then the model,
0:32:22the FST,
0:32:25the features.
0:32:26This is a wspecifier; it specifies how to write the transcription.
0:32:31It says to write it
0:32:31in text format,
0:32:33but these will be integers,
0:32:35so we're going to have to change them to
0:32:38text format before scoring.
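Because the transcriptions are written as integer IDs, they have to be mapped back through the word symbol table before scoring. A minimal Python sketch of that mapping follows, assuming the usual OpenFst text symbol-table layout of one `symbol integer-id` pair per line and a text-format transcription line of the form `utterance-id id id ...`. The function names and the tiny word list are hypothetical.

```python
def read_symbol_table(lines: list[str]) -> dict[int, str]:
    """Parse 'symbol integer-id' lines into an id -> symbol mapping."""
    table: dict[int, str] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"malformed symbol-table line: {line!r}")
        symbol, index = fields
        table[int(index)] = symbol
    return table


def int2sym(transcript_line: str, table: dict[int, str]) -> str:
    """Turn 'utt-id 1 3' into 'utt-id cat sat'; unknown IDs raise KeyError."""
    utt_id, *ids = transcript_line.split()
    return " ".join([utt_id] + [table[int(i)] for i in ids])


words = read_symbol_table(["<eps> 0", "cat 1", "dog 2", "sat 3"])
print(int2sym("utt1 1 3", words))  # hypothetical utterance and word IDs
```

Raising on an unknown ID, rather than silently skipping it, keeps a mismatched symbol table from quietly corrupting the scoring input.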
0:32:41And this is the alignment.
0:32:45It's not really useful for just decoding,
0:32:47but
0:32:48you might want to do adaptation later using
0:32:51that decoding as the supervision,
0:32:53so we just always produce it, because it doesn't cost much.
0:33:02Okay, so
0:33:04I think that basically comes to the end of this talk.
0:33:07I've given you a very vague idea of how the scripts work and what's in them.
0:33:11As you'll see, there are a lot more details that you'd need
0:33:13to find out before using them, but a lot of that is in the documentation.
0:33:17You have to kind of dig around in the documentation;
0:33:20I've been told that it's not very clear where to start,
0:33:23but there is a lot of it there if you're willing to read it all.
0:33:27The documentation is also heavily cross-referenced,
0:33:31so
0:33:31if you find something that's kind of related to what you need,
0:33:34there will usually be a link you can click on that will take you to what you need.
0:33:41Okay, so that's it.
0:33:49Are there any questions?
0:33:52[inaudible audience question]
0:34:04You never really deal directly
0:34:07with any of those symbols, because it's all integers internally,
0:34:10so
0:34:11it really doesn't matter to Kaldi what they are. So yes, I think you could use
0:34:16UTF-8.
0:34:21Well, those do have to
0:34:24contain no whitespace in them.
0:34:28So
0:34:31as long as the UTF-8 doesn't produce whitespace, it's not going to worry; I think
0:34:35it never checks whether it's actually ASCII.
0:34:39And I think UTF-8 is designed so that if a character is not ASCII,
0:34:43its bytes are never going to be whitespace, because they're always 128 or more.
0:34:49So it should be okay for any of that,
0:34:51although I don't really think it's a good idea to put UTF-8 in those things,
0:34:54because, I mean,
0:34:59you have old-fashioned...
0:35:09I think it should work, but
0:35:11you'd want to be concerned about
0:35:14the shell, because it could be that the shell is doing some kind of
0:35:17manipulation on the lines that goes wrong with weird characters.
0:35:21I don't know whether it will work,
0:35:26but
0:35:27it should be easily changeable to handle that if it really becomes an issue.
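The point about UTF-8 can be checked directly: every byte of a non-ASCII character in UTF-8 is 0x80 or above, so a tool that splits lines on ASCII whitespace can never cut such a symbol in half. The sketch below uses a hypothetical function name to test whether a symbol survives whitespace splitting at the byte level. One caveat relevant to the shell concern above: Unicode-aware tools may treat characters such as U+3000 (ideographic space) as whitespace even though its UTF-8 bytes are all 0x80 or above, so byte-level safety does not guarantee that every tool in a pipeline will agree.

```python
ASCII_WHITESPACE = set(b" \t\n\r\v\f")  # byte values of ASCII whitespace


def survives_byte_split(symbol: str) -> bool:
    """True if symbol is non-empty and its UTF-8 bytes contain no ASCII
    whitespace, so splitting a line on ASCII whitespace keeps it whole."""
    data = symbol.encode("utf-8")
    return len(data) > 0 and not any(b in ASCII_WHITESPACE for b in data)


for word in ["café", "日本語", "two words"]:
    high_bytes = [b for ch in word if ord(ch) >= 0x80 for b in ch.encode("utf-8")]
    # Every byte belonging to a non-ASCII character is >= 0x80.
    assert all(b >= 0x80 for b in high_bytes)
    print(word, survives_byte_split(word))
```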
0:35:38[inaudible audience question]
0:35:44I believe it can, if I'm remembering right,
0:35:46though I don't recall that it's ever been tested.
0:35:49[unclear]
0:35:53Well, the FST itself can have those probabilities, but
0:35:56I think the Perl script that
0:35:58creates the lexicon actually
0:36:00has a flag that makes it accept them,
0:36:02or at least it did at one point. But
0:36:04it's just a little Perl script, so it's not
0:36:08like...
0:36:13Yeah, so it doesn't really care whether a lexicon has probabilities; it's just an FST.
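Since the lexicon is just an FST, pronunciation probabilities only need to become arc weights. The sketch below is a hypothetical Python helper, not the Perl script mentioned above: it writes a simple lexicon FST in OpenFst's text format (`source destination input output weight` per arc, then a line naming the final state), putting each pronunciation's cost, -log p, on its first arc. Real Kaldi lexicon FSTs also deal with optional silence and disambiguation symbols, which are omitted here.

```python
import math


def lexicon_to_fst_text(lexicon: list[tuple[str, float, list[str]]]) -> list[str]:
    """Build OpenFst text-format lines for a toy lexicon FST.

    Each entry is (word, probability, phones). State 0 is both the start
    and the single final state; each pronunciation is a path that leaves
    and returns to state 0. The word label and the cost -log(p) go on the
    first arc; later arcs output <eps> with cost 0.
    """
    lines: list[str] = []
    next_state = 1
    for word, prob, phones in lexicon:
        if not 0.0 < prob <= 1.0:
            raise ValueError(f"bad probability for {word!r}: {prob}")
        if not phones:
            raise ValueError(f"empty pronunciation for {word!r}")
        cost = math.log(1.0 / prob)  # equals -log(p); gives 0.0, not -0.0, for p = 1
        src = 0
        for i, phone in enumerate(phones):
            if i == len(phones) - 1:
                dst = 0  # last phone returns to the start/final state
            else:
                dst = next_state
                next_state += 1
            out_label = word if i == 0 else "<eps>"
            weight = cost if i == 0 else 0.0
            lines.append(f"{src} {dst} {phone} {out_label} {weight:.4f}")
            src = dst
    lines.append("0")  # mark state 0 as final
    return lines


# Hypothetical entries: two pronunciations of "the" plus one of "a".
lexicon = [
    ("the", 0.7, ["dh", "ah"]),
    ("the", 0.3, ["dh", "iy"]),
    ("a", 1.0, ["ah"]),
]
print("\n".join(lexicon_to_fst_text(lexicon)))
```

A lexicon without probabilities is the same FST with every cost set to zero, which matches the remark that the rest of the pipeline does not care either way.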
0:36:18So yeah.
0:36:23[inaudible audience question]
0:36:38Well,
0:36:40they can get very large.
0:36:42I mean, we've had ones that were like
0:36:44one and a half gigabytes;
0:36:45I think that was with a somewhat [unclear] trigram LM.
0:36:50They do get very big, but at some point we're going to create decoders that
0:36:53avoid that problem.
0:36:56I mean, I think the FST
0:36:57approach as it is is great because the decoder is simple, but
0:37:02maybe the memory is
0:37:04something we're going to work on.
0:37:08If there are no more questions, I guess we can call it a day.
0:37:11Oh, one more.
0:37:14[inaudible audience question]
0:37:21You have to redo them [unclear].
0:37:28Yeah.