| 0:00:26 | a a a i i i | 
|---|
| 0:00:27 | no | 
|---|
| 0:00:30 | i | 
|---|
| 0:00:31 | a | 
|---|
| 0:00:32 | i you know what | 
|---|
| 0:00:33 | a i today | 
|---|
| 0:00:35 | and people | 
|---|
| 0:00:36 | and some people would be you know a to up to have an another's with | 
|---|
| 0:00:40 | they on if an sup of for the rest of the live | 
|---|
| 0:00:43 | i'm thinking that those prophecies have been just slightly misinterpreted |
|---|
| 0:00:47 | and the event that they were referring to is this | 
|---|
| 0:00:49 | wonderful speech toolkit |
|---|
| 0:00:51 | i | 
|---|
| 0:00:53 | a that have in of actually that | 
|---|
| 0:00:55 | you know it almost buys review thing it | 
|---|
| 0:00:57 | so uh | 
|---|
| 0:00:59 | i don't think that in any way overestimates the significance of this |
|---|
| 0:01:04 | a that's okay | 
|---|
| 0:01:05 | okay | 
|---|
| 0:01:06 | so first, just about the name |
|---|
| 0:01:08 | it's some kind of coffee reference, hence the little coffee bean |
|---|
| 0:01:13 | but had so | 
|---|
| 0:01:15 | a | 
|---|
| 0:01:16 | but | 
|---|
| 0:01:16 | is just | 
|---|
| 0:01:17 | whatever name we thought of |
|---|
| 0:01:20 | so uh | 
|---|
| 0:01:21 | the structure of this whole presentation is: first i'm gonna talk |
|---|
| 0:01:25 | for about |
|---|
| 0:01:26 | fifteen or twenty minutes |
|---|
| 0:01:28 | just giving you an overview, kind of from all sides, of this toolkit |
|---|
| 0:01:32 | and then we're gonna allow people to escape, in case they don't want to know more details, and then have |
|---|
| 0:01:36 | a short break |
|---|
| 0:01:37 | and then | 
|---|
| 0:01:39 | Arnab and Ondřej are going to talk about |
|---|
| 0:01:43 | some more code-level stuff |
|---|
| 0:01:45 | Arnab is gonna talk about some of the acoustic modeling code |
|---|
| 0:01:48 | and Ondřej will talk about the matrix library |
|---|
| 0:01:51 | which is kind of independently useful |
|---|
| 0:01:53 | uh | 
|---|
| 0:01:54 | speech | 
|---|
| 0:01:55 | and then after that | 
|---|
| 0:01:56 | uh | 
|---|
| 0:01:57 | i'm gonna go through some example scripts that we have, to try to give people |
|---|
| 0:02:01 | more of a, you know |
|---|
| 0:02:02 | give people a sense of how to use it |
|---|
| 0:02:06 | now | 
|---|
| 0:02:07 | or the next slide | 
|---|
| 0:02:09 | so | 
|---|
| 0:02:10 | some important aspects of the project: |
|---|
| 0:02:13 | it's licensed under Apache v2.0, which is a |
|---|
| 0:02:17 | permissive style of license that basically allows you to do anything you want with it |
|---|
| 0:02:21 | there is only a |
|---|
| 0:02:23 | an acknowledgement |
|---|
| 0:02:25 | clause, which says you have to acknowledge that |
|---|
| 0:02:27 | the code came from there, but |
|---|
| 0:02:29 | that's it |
|---|
| 0:02:31 | it's one of the most open of the standard licenses |
|---|
| 0:02:35 | uh | 
|---|
| 0:02:36 | the project is currently hosted on SourceForge, which is the |
|---|
| 0:02:39 | standard place for these kinds of open source projects |
|---|
| 0:02:43 | uh | 
|---|
| 0:02:44 | we we it | 
|---|
| 0:02:45 | some toolkits are very closely associated with a particular institution |
|---|
| 0:02:49 | our intention is for it to be more of a kind of |
|---|
| 0:02:52 | thing that lives |
|---|
| 0:02:54 | in the cloud, out on SourceForge |
|---|
| 0:02:56 | i shouldn't have used that word, that's |
|---|
| 0:02:58 | that's just gratuitous |
|---|
| 0:03:00 | but yeah, the intention is for it not to just be |
|---|
| 0:03:04 | the pet project of some particular little group, but |
|---|
| 0:03:07 | to represent |
|---|
| 0:03:08 | the best of what's out there, and anyone can be a participant; as long as you can contribute |
|---|
| 0:03:12 | code under |
|---|
| 0:03:13 | this license, then that's great |
|---|
| 0:03:16 | uh | 
|---|
| 0:03:17 | it's basically a C plus plus toolkit |
|---|
| 0:03:19 | the code compiles on native windows |
|---|
| 0:03:22 | and the common unix platforms; we're not claiming that it compiles on, you know |
|---|
| 0:03:28 | other weird platforms, but it compiles fine on the normal ones |
|---|
| 0:03:32 | a | 
|---|
| 0:03:34 | we have some documentation, not as much as HTK |
|---|
| 0:03:37 | and we have example scripts |
|---|
| 0:03:40 | these example scripts are |
|---|
| 0:03:42 | currently just for resource management and |
|---|
| 0:03:45 | wall street journal |
|---|
| 0:03:46 | but we're gonna have more too |
|---|
| 0:03:48 | they |
|---|
| 0:03:50 | they basically run from the ldc disks |
|---|
| 0:03:52 | so once you have the disks you can kind of point them to the disks |
|---|
| 0:03:55 | and just |
|---|
| 0:03:56 | get an idea of how it works |
|---|
| 0:04:00 | so | 
|---|
| 0:04:04 | oh no, now i realise that we didn't book a large enough room |
|---|
| 0:04:08 | i think we just advertised this thing too aggressively |
|---|
| 0:04:12 | if these were not guy | 
|---|
| 0:04:14 | uh | 
|---|
| 0:04:15 | yeah | 
|---|
| 0:04:16 | so | 
|---|
| 0:04:20 | okay, so now i'm gonna go through the kind of things that we support; this is just the current features |
|---|
| 0:04:24 | obviously |
|---|
| 0:04:25 | we're intending to add a lot more |
|---|
| 0:04:27 | so you can build a standard context-dependent |
|---|
| 0:04:30 | lvcsr system |
|---|
| 0:04:32 | you know with tree clustering |
|---|
| 0:04:34 | and it's been written in such a way that it supports arbitrary context sizes, so you can go |
|---|
| 0:04:39 | to |
|---|
| 0:04:39 | quinphone or whatever and it will |
|---|
| 0:04:42 | work |
|---|
| 0:04:43 | without pain |
|---|
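
The idea of an arbitrary context size can be sketched as a windowing function over a phone sequence. This is only an illustration, not the toolkit's code: the function name `context_windows`, the padding symbol, and the toy phone labels are all assumptions made for the example.

```python
def context_windows(phones, width=3, pad="<eps>"):
    """Return one context window per phone.

    width is the total window size and must be odd:
    3 gives triphone context, 5 gives quinphone context.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    # Pad both ends so the first and last phones also get full windows.
    padded = [pad] * half + list(phones) + [pad] * half
    return [tuple(padded[i:i + width]) for i in range(len(phones))]


phones = ["k", "ae", "t"]  # toy phone labels, chosen for the example
triphones = context_windows(phones, width=3)
quinphones = context_windows(phones, width=5)
print(triphones)
print(quinphones)
```

Because the width is a parameter rather than something hard-coded, code written on top of such windows does not need to change when moving from triphones to wider contexts.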
| 0:04:45 | the training and decoding are both fst-based |
|---|
| 0:04:49 | our code compiles against openfst |
|---|
| 0:04:52 | for those of you who don't know, openfst is |
|---|
| 0:04:55 | it's kind of like the AT&T toolset, but it's open source |
|---|
| 0:04:58 | it's uh |
|---|
| 0:04:59 | a project from |
|---|
| 0:05:00 | google and some others |
|---|
| 0:05:04 | um | 
|---|
| 0:05:06 | we currently only have maximum likelihood training |
|---|
| 0:05:09 | we haven't yet done lattice generation, but the |
|---|
| 0:05:12 | timeline for adding discriminative training and lattice generation |
|---|
| 0:05:15 | is |
|---|
| 0:05:16 | this summer slash | 
|---|
| 0:05:18 | like | 
|---|
| 0:05:20 | uh | 
|---|
| 0:05:21 | we support all kinds of linear and affine transforms you can imagine |
|---|
| 0:05:25 | not all of these |
|---|
| 0:05:27 | necessarily involve |
|---|
| 0:05:29 | you know, the tree version |
|---|
| 0:05:30 | where you have |
|---|
| 0:05:32 | multiple regression classes |
|---|
| 0:05:34 | that's just because we |
|---|
| 0:05:36 | are trying to avoid very complicated frameworks that would make the code difficult to use |
|---|
| 0:05:41 | so a lot of these just support a single transform |
|---|
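
As a rough illustration of what "a single transform" means here: one affine map, x' = A x + b, is applied to every feature frame, as opposed to choosing among several transforms by regression class. The matrix, offset, and frames below are made-up toy numbers, and `apply_affine` is a hypothetical helper, not a toolkit function.

```python
def apply_affine(A, b, x):
    """Return A @ x + b for a matrix A (list of rows), offset b and vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


# Toy 2-dimensional example: one global transform shared by all frames.
A = [[2.0, 0.0],
     [0.0, 0.5]]
b = [1.0, -1.0]
frames = [[1.0, 2.0], [0.0, 4.0]]

adapted = [apply_affine(A, b, frame) for frame in frames]
print(adapted)
```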
| 0:05:45 | we | 
|---|
| 0:05:45 | all of these things also have example scripts |
|---|
| 0:05:48 | so it's not just something that's in the code that |
|---|
| 0:05:51 | that we know works, it's |
|---|
| 0:05:51 | something that you can also |
|---|
| 0:05:53 | get to | 
|---|
| 0:05:55 | so | 
|---|
| 0:05:57 | i don't want to diss other toolkits, so there's a little disclaimer here |
|---|
| 0:06:01 | that |
|---|
| 0:06:02 | we're not claiming that other toolkits don't have any of these advantages too |
|---|
| 0:06:07 | but uh |
|---|
| 0:06:08 | we're aiming for clean code and modular design |
|---|
| 0:06:12 | uh | 
|---|
| 0:06:13 | and by modular we probably mean something a little bit stronger than you would normally |
|---|
| 0:06:19 | normally imagine; it's written in such a way that |
|---|
| 0:06:22 | it's not only easy to combine the various things that are in there |
|---|
| 0:06:25 | but it's easy to uh |
|---|
| 0:06:27 | kind of extend arbitrarily |
|---|
| 0:06:29 | and we have avoided the kind of code where |
|---|
| 0:06:32 | when you add something |
|---|
| 0:06:34 | a bunch of other bits of code have to know about what you added, and you have to modify all |
|---|
| 0:06:38 | kinds of | 
|---|
| 0:06:39 | you know | 
|---|
| 0:06:40 | all kinds of other | 
|---|
| 0:06:42 | and | 
|---|
| 0:06:44 | the license is a big |
|---|
| 0:06:46 | advantage; i know that not a lot of |
|---|
| 0:06:50 | toolkits have such a completely free license |
|---|
| 0:06:53 | and it's not that we really anticipate this being used for commercial purposes |
|---|
| 0:06:57 | uh |
|---|
| 0:06:58 | our understanding is that |
|---|
| 0:06:59 | a lot of research groups |
|---|
| 0:07:02 | as a matter of principle, they won't |
|---|
| 0:07:04 | use stuff that has a non-commercial license, because they say |
|---|
| 0:07:07 | is this research can the commercial by the | 
|---|
| 0:07:10 | and now | 
|---|
| 0:07:11 | or of the license will | 
|---|
| 0:07:14 | uh | 
|---|
| 0:07:15 | have example scripts which were which were uh | 
|---|
| 0:07:19 | standing documentation | 
|---|
| 0:07:21 | and | 
|---|
| 0:07:22 | and there's this whole community-building thing: the people involved in kaldi currently |
|---|
| 0:07:27 | it |
|---|
| 0:07:28 | it's a group of people, mostly volunteers |
|---|
| 0:07:30 | who were at the previous two workshops, so |
|---|
| 0:07:33 | myself, Arnab, a bunch of guys from Brno |
|---|
| 0:07:36 | and a few others |
|---|
| 0:07:38 | case | 
|---|
| 0:07:39 | uh | 
|---|
| 0:07:40 | but we're open to new participants |
|---|
| 0:07:43 | and uh |
|---|
| 0:07:45 | what we're hoping for mainly is not just people who contribute a line or two |
|---|
| 0:07:49 | of code but |
|---|
| 0:07:50 | the people who really want to understand the whole thing |
|---|
| 0:07:53 | and can contribute a significant amount |
|---|
| 0:07:56 | um | 
|---|
| 0:07:58 | the | 
|---|
| 0:07:59 | kaldi is especially good for stuff that involves a lot of linear algebra |
|---|
| 0:08:03 | it has a |
|---|
| 0:08:04 | very good matrix library, which Ondřej is going to talk about |
|---|
| 0:08:07 | so if you want to do stuff that involves a lot of matrices and vectors |
|---|
| 0:08:12 | H | 
|---|
| 0:08:13 | are | 
|---|
| 0:08:14 | do | 
|---|
| 0:08:15 | also uh | 
|---|
| 0:08:16 | of course we compile against the openfst library, so |
|---|
| 0:08:20 | you can do fst stuff, you know, in the code |
|---|
| 0:08:23 | uh | 
|---|
| 0:08:24 | it's built in |
|---|
| 0:08:26 | a scalable way |
|---|
| 0:08:27 | well |
|---|
| 0:08:30 | it doesn't explicitly interact with any parallelization library |
|---|
| 0:08:34 | it doesn't |
|---|
| 0:08:36 | it doesn't interact with MapReduce or |
|---|
| 0:08:39 | MPI or anything |
|---|
| 0:08:40 | 'cause we felt that that would just lock it into particular kinds of systems |
|---|
| 0:08:44 | so uh | 
|---|
| 0:08:45 | but | 
|---|
| 0:08:46 | all the a | 
|---|
| 0:08:47 | it's been written in such a way that it should still work efficiently when everything is very large scale |
|---|
| 0:08:53 | when you have a lot of data |
|---|
| 0:08:55 | our intention is to add all of the state-of-the-art methods |
|---|
| 0:09:00 | for lvcsr, things like |
|---|
| 0:09:01 | discriminative training | 
|---|
| 0:09:03 | a standard | 
|---|
| 0:09:04 | all of the standard adaptation | 
|---|
| 0:09:07 | uh | 
|---|
| 0:09:09 | but, as i think i say |
|---|
| 0:09:11 | on the next slide |
|---|
| 0:09:12 | uh |
|---|
| 0:09:13 | something that we're not planning on doing in the immediate future |
|---|
| 0:09:17 | is things like online decoding |
|---|
| 0:09:19 | what i mean by that is uh |
|---|
| 0:09:21 | where the data is coming in say from a microphone or telephone |
|---|
| 0:09:25 | and it's some kind of interactive application |
|---|
| 0:09:27 | you could use it to do that, and building a decoder isn't that hard in this framework |
|---|
| 0:09:32 | but uh |
|---|
| 0:09:34 | our basic target audience is |
|---|
| 0:09:37 | speech recognition researchers who want to work |
|---|
| 0:09:40 | on the speech recognition itself |
|---|
| 0:09:42 | other than | 
|---|
| 0:09:43 | rather than those who uh | 
|---|
| 0:09:47 | oh have a mock i was learning what everyone was looking at a multiscale to enter the room um and | 
|---|
| 0:09:52 | disrupted that's all right if very present | 
|---|
| 0:09:56 | i | 
|---|
| 0:09:56 | oh i | 
|---|
| 0:09:59 | i | 
|---|
| 0:10:00 | i | 
|---|
| 0:10:00 | okay | 
|---|
| 0:10:02 | i | 
|---|
| 0:10:04 | so i | 
|---|
| 0:10:06 | some people lately have |
|---|
| 0:10:10 | it's become popular recently to do a kind of python wrapper for C plus plus code |
|---|
| 0:10:15 | the idea being that you can uh |
|---|
| 0:10:17 | more easily write your scripts |
|---|
| 0:10:19 | however, we've avoided that approach |
|---|
| 0:10:22 | partly because it's a hassle to do the wrapping |
|---|
| 0:10:25 | and nobody ever understands how SWIG works |
|---|
| 0:10:28 | partly because uh |
|---|
| 0:10:30 | it just forces people to learn a new language and |
|---|
| 0:10:32 | probably those who just want the colours think that everyone knows by | 
|---|
| 0:10:36 | that | 
|---|
| 0:10:37 | uh | 
|---|
| 0:10:39 | so | 
|---|
| 0:10:40 | we support the kind of | 
|---|
| 0:10:42 | flexibility and configurability of that in different ways |
|---|
| 0:10:46 | but partly uh |
|---|
| 0:10:48 | i think it'll become clear later, so perhaps we'll |
|---|
| 0:10:53 | leave that till later |
|---|
| 0:10:56 | so we don't have forward-backward training in there, and there are no immediate plans to do it |
|---|
| 0:11:01 | i think some people like forward-backward for kind of religious reasons |
|---|
| 0:11:05 | but uh |
|---|
| 0:11:06 | i don't believe anyone has demonstrated that viterbi is worse |
|---|
| 0:11:10 | and it's just so convenient to use viterbi |
|---|
| 0:11:12 | for uh |
|---|
| 0:11:14 | because you can write the alignments to disk compactly |
|---|
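
To make the "compact alignment" point concrete, here is a minimal Viterbi forced-alignment sketch over a toy left-to-right model. It is a simplification made for illustration: transition probabilities are ignored, the per-frame log-likelihoods are invented numbers, and `viterbi_align` is a hypothetical name rather than anything from the toolkit. The output is a single state index per frame, which is why it is cheap to store, whereas forward-backward would yield a posterior over states for every frame.

```python
import math


def viterbi_align(loglikes, num_states):
    """Best left-to-right state sequence for one utterance.

    loglikes[t][s] is log p(frame t | state s). At each frame the path
    either stays in its state or advances by one; it must start in
    state 0 and end in the last state.
    """
    neg_inf = -math.inf
    num_frames = len(loglikes)
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = loglikes[0][0]
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best, prev = (move, s - 1) if move > stay else (stay, s)
            if best > neg_inf:
                score[t][s] = best + loglikes[t][s]
                back[t][s] = prev
    if score[-1][num_states - 1] == neg_inf:
        raise ValueError("no valid alignment: too few frames for the states")
    # Trace back from the final state to recover one state per frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


toy_loglikes = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
alignment = viterbi_align(toy_loglikes, num_states=2)
print(alignment)
```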
| 0:11:17 | and then | 
|---|
| 0:11:18 | on | 
|---|
| 0:11:23 | really | 
|---|
| 0:11:24 | okay | 
|---|
| 0:11:27 | really interesting | 
|---|
| 0:11:29 | but i i i even even not let this as | 
|---|
| 0:11:32 | like | 
|---|
| 0:11:32 | just a single hypothesis | 
|---|
| 0:11:34 | makes it if | 
|---|
| 0:11:36 | okay | 
|---|
| 0:11:37 | so we'll have to think about that i mean it's not like it's really hard to do | 
|---|
| 0:11:40 | but it just wasn't something that we had planned | 
|---|
| 0:11:42 | uh | 
|---|
| 0:11:45 | one | 
|---|
| 0:11:47 | uh_huh | 
|---|
| 0:11:49 | oh okay | 
|---|
| 0:11:50 | well it's at the state level | 
|---|
| 0:11:53 | but it's not really the |
|---|
| 0:11:55 | by state i mean pdf |
|---|
| 0:11:56 | index but |
|---|
| 0:11:57 | it's a little bit more precise than that, because |
|---|
| 0:12:00 | if you just write out the state sequence, it's fine for model training, but then |
|---|
| 0:12:04 | if you wanna work out the phone sequence, depending how the tree works |
|---|
| 0:12:08 | it might not be implied by the state sequence, so then we have these identifiers that also contain the phone |
|---|
| 0:12:13 | and the transition |
|---|
| 0:12:15 | oh, it's an integer, a list of integers, but those |
|---|
| 0:12:19 | integers |
|---|
| 0:12:21 | are not quite the states; they're |
|---|
| 0:12:22 | something that can be mapped to the state and also to the phone |
|---|
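
The answer above can be illustrated with a small lookup table. This is a sketch under assumptions: the identifier values, phone labels, and pdf indices are invented, and `ID_TABLE` is a hypothetical structure, not the toolkit's actual data layout. The point it shows is that when two phones share a pdf (here pdf 11), the pdf sequence alone cannot tell you the phone, but an identifier that maps to both can.

```python
# Hypothetical table: alignment identifier -> (phone, pdf index).
# Pdf 11 is deliberately shared by two phones, as tree tying can cause.
ID_TABLE = {
    1: ("k", 10),
    2: ("k", 11),
    3: ("ae", 11),
    4: ("ae", 12),
    5: ("t", 13),
}


def to_pdfs(alignment):
    """Map each per-frame identifier to its pdf index (enough for training)."""
    return [ID_TABLE[i][1] for i in alignment]


def to_phones(alignment):
    """Map each per-frame identifier to its phone label."""
    return [ID_TABLE[i][0] for i in alignment]


alignment_ids = [1, 1, 2, 3, 4, 4, 5]  # one integer per frame
pdfs = to_pdfs(alignment_ids)
phones_per_frame = to_phones(alignment_ids)
print(pdfs)
print(phones_per_frame)
```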
| 0:12:28 | uh | 
|---|
| 0:12:30 | so i'm just gonna describe how this |
|---|
| 0:12:33 | came to be: we had this workshop in two thousand nine |
|---|
| 0:12:36 | a lot of the focus was on |
|---|
| 0:12:38 | SGMMs |
|---|
| 0:12:41 | um |
|---|
| 0:12:42 | the setup we were using then: some guys from Brno University of Technology |
|---|
| 0:12:47 | including Ondřej, Lukáš and others |
|---|
| 0:12:49 | uh |
|---|
| 0:12:50 | they built this |
|---|
| 0:12:51 | uh infrastructure for uh |
|---|
| 0:12:54 | for training SGMMs that was written in C plus plus but it relied on the HTK |
|---|
| 0:12:58 | system |
|---|
| 0:12:59 | and i also built an fst-based decoder |
|---|
| 0:13:02 | so that we could decode in our own C plus plus code, with access to the matrix library |
|---|
| 0:13:07 | um | 
|---|
| 0:13:08 | so we were kind of calling that proto-kaldi |
|---|
| 0:13:12 | and we wanted to release that |
|---|
| 0:13:14 | recipe |
|---|
| 0:13:15 | you know, in some kind of open source way, but we realised that |
|---|
| 0:13:19 | the recipe was just too hard to encapsulate, because it had HTK, it had our stuff |
|---|
| 0:13:24 | it had a lot of scripts |
|---|
| 0:13:26 | so we wanted to create something that |
|---|
| 0:13:29 | could support this stuff and was easy to encapsulate, so we wrote an entirely new uh |
|---|
| 0:13:35 | uh |
|---|
| 0:13:37 | the next summer we wrote an entirely new toolkit |
|---|
| 0:13:41 | you know |
|---|
| 0:13:43 | we wanted everything to be clean and unified |
|---|
| 0:13:45 | and to have a nice shiny C plus plus |
|---|
| 0:13:48 | speech recognition toolkit |
|---|
| 0:13:51 | i think that's the uh | 
|---|
| 0:13:53 | i think that's this a | 
|---|
| 0:13:55 | so, last summer |
|---|
| 0:13:57 | in two thousand ten we had another workshop in Brno |
|---|
| 0:14:00 | where we uh |
|---|
| 0:14:02 | did a lot of coding |
|---|
| 0:14:03 | and the vision at that time, which i now realise was very unrealistic |
|---|
| 0:14:07 | was that we |
|---|
| 0:14:08 | would have a complete working system with example scripts |
|---|
| 0:14:12 | you know, by the end of the summer |
|---|
| 0:14:14 | but that kind of didn't really materialise; we had a lot of pieces |
|---|
| 0:14:18 | but we didn't really have a complete working system so |
|---|
| 0:14:22 | after uh |
|---|
| 0:14:24 | i kind of felt obligated to |
|---|
| 0:14:26 | you know |
|---|
| 0:14:27 | finish the system, and we had help from others on that |
|---|
| 0:14:31 | doing a lot of coding after that |
|---|
| 0:14:34 | so | 
|---|
| 0:14:37 | uh | 
|---|
| 0:14:40 | when we go to the next slide | 
|---|
| 0:14:42 | it's only been officially released something like last week |
|---|
| 0:14:46 | that's when we actually uh got all the legal approvals and |
|---|
| 0:14:49 | put it up on SourceForge |
|---|
| 0:14:51 | this is just a list of the people i don't think i'm gonna go through all the names |
|---|
| 0:14:55 | this is the list of all the people who have written code specifically for kaldi |
|---|
| 0:14:59 | that's the list of the people who have done various other things or have helped in various ways |
|---|
| 0:15:04 | and | 
|---|
| 0:15:05 | uh | 
|---|
| 0:15:07 | i would describe exactly what each one did, but i'm kind of scared i've left someone off one of these |
|---|
| 0:15:12 | lists | 
|---|
| 0:15:13 | and | 
|---|
| 0:15:14 | i i i i | 
|---|
| 0:15:15 | i'll just let you read it |
|---|
| 0:15:18 | um | 
|---|
| 0:15:20 | a lot of these people are | 
|---|
| 0:15:21 | have some connection to Brno University of Technology |
|---|
| 0:15:25 | oh people but the in uh or | 
|---|
| 0:15:28 | like that | 
|---|
| 0:15:30 | so that this is a | 
|---|
| 0:15:32 | this is a rather messy diagram |
|---|
| 0:15:34 | i i just wanted |
|---|
| 0:15:36 | i want to give you some idea of what the dependency structure of kaldi was, but i decided to put |
|---|
| 0:15:40 | side information in here too, so |
|---|
| 0:15:42 | the area of these uh | 
|---|
| 0:15:45 | of these rectangles is roughly proportional to how many lines of code | 
|---|
| 0:15:49 | there are | 
|---|
| 0:15:50 | so | 
|---|
| 0:15:51 | these are the things that we compile against |
|---|
| 0:15:54 | so openfst is a C plus plus library |
|---|
| 0:15:57 | uh |
|---|
| 0:15:59 | ATLAS / CLAPACK, that refers to the math libraries that we compile against |
|---|
| 0:16:03 | uh |
|---|
| 0:16:06 | and |
|---|
| 0:16:06 | and the rough dependency structure is that things on top depend on the things below them, but |
|---|
| 0:16:10 | it's very approximate |
|---|
| 0:16:11 | so | 
|---|
| 0:16:13 | for instance, here's various |
|---|
| 0:16:14 | fst algorithms that we've extended openfst with |
|---|
| 0:16:18 | uh |
|---|
| 0:16:19 | stuff relating to tree clustering for decision trees |
|---|
| 0:16:23 | uh |
|---|
| 0:16:24 | stuff relating to hmm topology |
|---|
| 0:16:27 | decoders |
|---|
| 0:16:29 | language modeling: this is a small box because really all it does is uh |
|---|
| 0:16:34 | compile an ARPA language model into an fst |
|---|
| 0:16:37 | i two that | 
|---|
| 0:16:38 | uh | 
|---|
| 0:16:41 | utils: this is mostly i/o stuff, there's various frameworks for i/o |
|---|
| 0:16:45 | that |
|---|
| 0:16:46 | will be explained later on, kind of after the break |
|---|
| 0:16:49 | so we can allow people |
|---|
| 0:16:50 | to escape |
|---|
| 0:16:51 | this is the matrix library, so this |
|---|
| 0:16:54 | a lot of this is just wrappers for stuff that's here |
|---|
| 0:16:57 | i don't know if any of you are familiar with |
|---|
| 0:16:59 | CLAPACK and BLAS and those things |
|---|
| 0:17:02 | but they're C libraries that |
|---|
| 0:17:04 | for a C plus plus programmer are slightly painful to work with, 'cause they have all of these arguments like the |
|---|
| 0:17:09 | rows, the columns |
|---|
| 0:17:10 | the stride |
|---|
| 0:17:11 | and everything you wanna do is this very long line of code |
|---|
| 0:17:14 | and uh |
|---|
| 0:17:16 | so there's no notion of like a matrix as an object |
|---|
| 0:17:18 | so this kind of adds that abstraction, and it is significantly easier to use |
|---|
| 0:17:23 | than those libraries directly |
|---|
| 0:17:26 | uh | 
|---|
| 0:17:27 | this is feature |
|---|
| 0:17:29 | preprocessing and you know |
|---|
| 0:17:30 | going from a wav file to mfccs, that's there |
|---|
| 0:17:34 | gaussian mixture models, diagonal and full |
|---|
| 0:17:39 | subspace gaussian mixture models this is | 
|---|
| 0:17:41 | the reason might talk | 
|---|
| 0:17:42 | the | 
|---|
| 0:17:44 | linear transforms | 
|---|
| 0:17:45 | things like fmllr, mllr, stc |
|---|
| 0:17:49 | hlda |
|---|
| 0:17:50 | things of that nature |
|---|
| 0:17:52 | vtln is in here too, kind of the |
|---|
| 0:17:54 | linear form of vtln |
|---|
| 0:17:57 | uh | 
|---|
| 0:17:57 | all of these things up here, these are kind of, you know, directories that contain |
|---|
| 0:18:01 | command line programs; that tells you a bit about the structure of the toolkit, which is that we have |
|---|
| 0:18:06 | really more than a hundred command line programs |
|---|
| 0:18:09 | and each one does a fairly specific thing |
|---|
| 0:18:12 | we wanted to avoid this phenomenon where you have a program that kind of allegedly does one thing |
|---|
| 0:18:17 | but really is controlled by a huge number of options |
|---|
| 0:18:20 | and has rather complicated behavior depending which options you give it |
|---|
| 0:18:24 | so this is part of the mechanism that we use to ensure |
|---|
| 0:18:27 | uh |
|---|
| 0:18:28 | that everything's configurable and easy to understand |
|---|
| 0:18:31 | there's no python layer, but there's a lot of uh |
|---|
| 0:18:34 | programs with simple functions |
|---|
| 0:18:37 | and on top of this | 
|---|
| 0:18:38 | is the | 
|---|
| 0:18:39 | shell scripts | 
|---|
| 0:18:40 | so to do an actual system building recipe |
|---|
| 0:18:45 | what our example scripts currently do is: it's a bash script |
|---|
| 0:18:48 | that, you know, has a bunch of variables in bash to keep track of iterations and stuff |
|---|
| 0:18:53 | and it runs the jobs |
|---|
| 0:18:55 | by invoking them |
|---|
| 0:18:56 | from the command line |
|---|
| 0:18:57 | there's different ways you could do this; if you love perl or python or whatever, you |
|---|
| 0:19:01 | could use that |
|---|
| 0:19:02 | but that's how our scripts work |
|---|
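
The recipe structure just described (a driver that tracks the iteration number and invokes small single-purpose programs) might look roughly like the sketch below. The talk's own scripts are bash; Python is used here only for illustration. The program names `hypothetical-align`, `hypothetical-accumulate`, and `hypothetical-update`, and the file naming scheme, are placeholders invented for the example, not real toolkit binaries or conventions.

```python
def training_commands(exp_dir, feats, num_iters):
    """Build the command lines for a simple iterative training recipe.

    Each iteration calls three small programs in sequence; the only
    state the driver keeps is the iteration number and derived paths.
    """
    commands = []
    for it in range(num_iters):
        model = f"{exp_dir}/{it}.mdl"
        next_model = f"{exp_dir}/{it + 1}.mdl"
        ali = f"{exp_dir}/{it}.ali"
        acc = f"{exp_dir}/{it}.acc"
        commands.append(["hypothetical-align", model, feats, ali])
        commands.append(["hypothetical-accumulate", model, feats, ali, acc])
        commands.append(["hypothetical-update", model, acc, next_model])
    return commands


commands = training_commands("exp/mono", "feats.ark", num_iters=2)
for cmd in commands:
    # A real driver would execute each step, for example with
    # subprocess.run(cmd, check=True); here we only print them.
    print(" ".join(cmd))
```

Keeping each program narrow means the driver, not the programs, holds the recipe logic, so changing the recipe is a matter of editing the script.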
| 0:19:04 | and and something that | 
|---|
| 0:19:06 | i really haven't included on this diagram, but it's kind of part of the |
|---|
| 0:19:10 | dependency structure, is some |
|---|
| 0:19:12 | tools that we rely on so |
|---|
| 0:19:15 | uh |
|---|
| 0:19:17 | for language modeling |
|---|
| 0:19:18 | by default we use IRSTLM, just because of license issues, but probably you'd want to use |
|---|
| 0:19:23 | SRILM if you |
|---|
| 0:19:25 | wanna do a lot of language modeling stuff |
|---|
| 0:19:27 | uh things like sph2pipe |
|---|
| 0:19:29 | to |
|---|
| 0:19:30 | to uh |
|---|
| 0:19:31 | interpret the data from the ldc |
|---|
| 0:19:34 | and so on |
|---|
| 0:19:35 | we actually have a |
|---|
| 0:19:37 | kind of |
|---|
| 0:19:38 | an installation script that will automatically obtain those things, so the scripts can run |
|---|
| 0:19:43 | without you having to manually install stuff |
|---|
| 0:19:46 | on your system |
|---|
| 0:19:48 | so i'm just gonna | 
|---|
| 0:19:49 | briefly summarise the matrix library; Ondřej will be talking more about it later, but the plan was |
|---|
| 0:19:55 | to allow people to escape after this initial segment |
|---|
| 0:19:58 | in case they don't think they want to hear about this stuff |
|---|
| 0:20:01 | but uh |
|---|
| 0:20:03 | as i said, it's a C plus plus wrapper for BLAS and CLAPACK |
|---|
| 0:20:07 | and we've |
|---|
| 0:20:07 | well, i should say Ondřej has gone to a lot of trouble to ensure that it can compile |
|---|
| 0:20:12 | in the various |
|---|
| 0:20:13 | different configurations of what |
|---|
| 0:20:16 | libraries you have on your system |
|---|
| 0:20:18 | so it can either work from BLAS plus CLAPACK |
|---|
| 0:20:21 | or from ATLAS, or using |
|---|
| 0:20:23 | Intel's MKL |
|---|
| 0:20:25 | the reason is that on some systems you might have one but not the other | 
|---|
| 0:20:29 | ATLAS is an implementation of BLAS that's |
|---|
| 0:20:32 | kind of optimized to the specific hardware |
|---|
| 0:20:35 | automatically | 
|---|
| 0:20:37 | is is generally a more | 
|---|
| 0:20:39 | so | 
|---|
| 0:20:40 | the code that we've wrapped |
|---|
| 0:20:41 | includes |
|---|
| 0:20:42 | generic matrices, like square matrices |
|---|
| 0:20:46 | also packed symmetric matrices, where you |
|---|
| 0:20:50 | have a symmetric matrix and only store the lower triangle |
|---|
| 0:20:53 | and it's in this |
|---|
| 0:20:56 | order |
|---|
| 0:20:57 | and packed triangular matrices |
|---|
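
Packed symmetric storage can be sketched as follows. The ordering shown on the slide is not recoverable from the transcript, so this example assumes the lower triangle is stored row by row, which puts element (r, c) with r >= c at index r(r+1)/2 + c. The class `PackedSymmetric` is a hypothetical illustration, not the toolkit's class.

```python
class PackedSymmetric:
    """Symmetric matrix storing only the lower triangle, row by row.

    A dim x dim matrix needs dim * (dim + 1) / 2 numbers instead of
    dim * dim. The row-by-row order is an assumption of this sketch.
    """

    def __init__(self, dim):
        self.dim = dim
        self.data = [0.0] * (dim * (dim + 1) // 2)

    def _index(self, r, c):
        if not (0 <= r < self.dim and 0 <= c < self.dim):
            raise IndexError("index out of range")
        if r < c:
            r, c = c, r  # symmetry: (r, c) and (c, r) share one slot
        return r * (r + 1) // 2 + c

    def get(self, r, c):
        return self.data[self._index(r, c)]

    def set(self, r, c, value):
        self.data[self._index(r, c)] = float(value)


m = PackedSymmetric(3)
m.set(0, 2, 5.0)   # writes the slot shared with (2, 0)
m.set(1, 1, 2.0)
print(m.get(2, 0), m.get(1, 1), len(m.data))
```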
| 0:20:59 | there are other formats that BLAS and CLAPACK support, but these are the ones that we felt were |
|---|
| 0:21:04 | most |
|---|
| 0:21:04 | applicable to |
|---|
| 0:21:06 | speech processing; like, we don't use a lot of sparse matrices |
|---|
| 0:21:10 | and traditional | 
|---|
| 0:21:13 | so | 
|---|
| 0:21:14 | the matrix library also includes things like SVD and FFT |
|---|
| 0:21:18 | FFT isn't supplied by any of those libraries, but we uh |
|---|
| 0:21:22 | we got permission from Rico Malvar at Microsoft |
|---|
| 0:21:25 | to uh |
|---|
| 0:21:26 | use his code |
|---|
| 0:21:27 | so he has a good "'em" | 
|---|
| 0:21:30 | um | 
|---|
| 0:21:32 | something about the matrix library: even if you don't buy into the whole toolkit |
|---|
| 0:21:36 | if you need a C plus plus matrix library it's probably a |
|---|
| 0:21:40 | is probably quite good; in fact it's surprising that there doesn't seem to be a lot out there |
|---|
| 0:21:45 | that fills this niche; there's Boost |
|---|
| 0:21:48 | but it's a rather weird library and i don't think a lot of people like it |
|---|
| 0:21:55 | um | 
|---|
| 0:21:57 | okay, a few words about openfst |
|---|
| 0:22:00 | so i assume everyone here knows what fsts are |
|---|
| 0:22:03 | AT&T had this command line toolkit |
|---|
| 0:22:06 | but i don't believe they ever released |
|---|
| 0:22:07 | the source |
|---|
| 0:22:08 | so when some of those guys went to google they decided to have one that was uh |
|---|
| 0:22:12 | open source, and it's Apache-licensed |
|---|
| 0:22:15 | that's part of the reason we went with the Apache license |
|---|
| 0:22:18 | because we figured that |
|---|
| 0:22:20 | since we use openfst, there's no real point in having a |
|---|
| 0:22:23 | different license, 'cause it just gives the lawyers a headache |
|---|
| 0:22:26 | so we went for the same one | 
|---|
| 0:22:28 | ah | 
|---|
| 0:22:30 | so yeah | 
|---|
| 0:22:31 | we compile against it; so for instance the decoder |
|---|
| 0:22:35 | it doesn't use like a special decoding graph format |
|---|
| 0:22:38 | it uses the same in-memory structures as openfst |
|---|
| 0:22:43 | and by the way, openfst has a lot of templates and stuff, so there |
|---|
| 0:22:47 | is not just one fst format, there's a lot of them |
|---|
| 0:22:49 | so if you wanted to, you could uh |
|---|
| 0:22:52 | kind of template your decoder on some fancy format that would be, let's say, compact or dynamically expanded or something |
|---|
| 0:22:58 | like that |
|---|
| 0:22:59 | we're not gonna go into that in detail today | 
|---|
| 0:23:02 | so we actually implemented various extensions to openfst | 
|---|
| 0:23:07 | some of the recipes are perhaps not totally in the spirit of openfst because |
|---|
| 0:23:12 | those guys have a particular recipe that they do |
|---|
| 0:23:15 | and ours is just a little bit different |
|---|
| 0:23:20 | later on i can explain why |
|---|
| 0:23:21 | i feel that there are good reasons for it; i don't know if those guys would agree |
|---|
| 0:23:26 | uh | 
|---|
| 0:23:29 | so | 
|---|
| 0:23:31 | a few words about i/o |
|---|
| 0:23:33 | it was a controversial decision among the group to use C plus plus streams |
|---|
| 0:23:38 | in the end we decided to do it, partly because openfst also does it |
|---|
| 0:23:42 | uh |
|---|
| 0:23:43 | you know, a lot of people prefer C-based i/o |
|---|
| 0:23:46 | but we do this |
|---|
| 0:23:48 | uh |
|---|
| 0:23:48 | we support binary and text mode formats, a little bit like htk, so that each |
|---|
| 0:23:53 | object in the toolkit |
|---|
| 0:23:55 | has a function that will |
|---|
| 0:23:57 | write, and it takes a little argument, binary or text |
|---|
| 0:24:00 | so it'll just |
|---|
| 0:24:01 | put its data out on the stream in binary or text mode |
|---|
| 0:24:05 | and each object also has a read function that does the same thing |
|---|
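
The write/read convention described above (every object knows how to put itself on a stream, with one flag choosing binary or text) can be sketched like this. The toolkit itself is C++ with C++ streams; this Python version only mirrors the idea. The class `FloatVector` and both on-disk layouts are invented for the example and are not the toolkit's real formats.

```python
import io
import struct


class FloatVector:
    """Toy object with symmetric write/read methods taking a binary flag."""

    def __init__(self, values=()):
        self.values = [float(v) for v in values]

    def write(self, stream, binary):
        n = len(self.values)
        if binary:
            # Assumed layout: int32 count, then n little-endian doubles.
            stream.write(struct.pack("<i", n))
            stream.write(struct.pack(f"<{n}d", *self.values))
        else:
            # Assumed layout: one bracketed line of numbers.
            text = "[ " + " ".join(repr(v) for v in self.values) + " ]\n"
            stream.write(text.encode("ascii"))

    def read(self, stream, binary):
        if binary:
            (n,) = struct.unpack("<i", stream.read(4))
            self.values = list(struct.unpack(f"<{n}d", stream.read(8 * n)))
        else:
            tokens = stream.readline().decode("ascii").split()
            if len(tokens) < 2 or tokens[0] != "[" or tokens[-1] != "]":
                raise ValueError("malformed text vector")
            self.values = [float(t) for t in tokens[1:-1]]


original = FloatVector([1.0, 2.5, -3.0])
round_trips = {}
for binary in (True, False):
    buffer = io.BytesIO()
    original.write(buffer, binary)
    buffer.seek(0)
    copy = FloatVector()
    copy.read(buffer, binary)
    round_trips[binary] = copy.values
print(round_trips)
```

The calling code never needs to know the layout; it only passes the same flag to both functions.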
| 0:24:08 | so | 
|---|
| 0:24:09 | ah | 
|---|
| 0:24:11 | it's a standard thing in many toolkits that filenames are extended in various ways |
|---|
| 0:24:15 | like, this can mean the standard input or standard output |
|---|
| 0:24:18 | this is just a command |
|---|
| 0:24:20 | and this is how it knows that it's |
|---|
| 0:24:22 | can |
|---|
| 0:24:23 | uh |
|---|
| 0:24:24 | this is the |
|---|
| 0:24:25 | an offset into a file, meaning it will |
|---|
| 0:24:28 | open the file and fseek to that position |
|---|
| 0:24:31 | it's uh useful for reasons that will be described later |
|---|
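
A rough sketch of how such extended filenames could be interpreted is given below. The exact syntax was on the slide and is not in the transcript, so the three conventions used here are assumptions made for illustration: `-` for standard input, a trailing `|` for "run this command and read its output", and `path:offset` for "open the file and seek to that byte". The function `open_input` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def open_input(spec):
    """Open an extended filename for binary reading (illustrative syntax)."""
    if spec == "-":
        return sys.stdin.buffer                      # standard input
    if spec.endswith("|"):
        # Treat everything before the trailing "|" as a shell command.
        proc = subprocess.Popen(spec[:-1], shell=True, stdout=subprocess.PIPE)
        return proc.stdout
    path, sep, offset = spec.rpartition(":")
    if sep and offset.isdigit():
        stream = open(path, "rb")
        stream.seek(int(offset))                     # jump to the byte offset
        return stream
    return open(spec, "rb")                          # plain file


# Usage: read a temporary file starting from byte 3.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdef")
with open_input(f"{tmp.name}:3") as stream:
    tail = stream.read()
os.remove(tmp.name)
print(tail)
```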
| 0:24:36 | uh | 
|---|
| 0:24:39 | so this archive format is a | 
|---|
| 0:24:41 | quite fundamental part of the way uh | 
|---|
| 0:24:44 | kaldi works | 
|---|
| 0:24:45 | and i think | 
|---|
| 0:24:46 | i'm gonna describe this more later in another talk but the basic concept is | 
|---|
| 0:24:51 | you have a collection of objects let's imagine that they're matrices | 
|---|
| 0:24:55 | and they are indexed by a string | 
|---|
| 0:24:58 | where the string might be let's say an utterance id | 
|---|
| 0:25:01 | so you want to have some way to | 
|---|
| 0:25:04 | to access this collection of uh | 
|---|
| 0:25:06 | strings and matrices | 
|---|
| 0:25:09 | and there might be a couple of different ways you could do that you might wanna go sequentially | 
|---|
| 0:25:12 | through them | 
|---|
| 0:25:13 | as in an accumulation or something | 
|---|
| 0:25:15 | or you might want to do random access | 
|---|
| 0:25:17 | so there's a whole framework for doing this | 
|---|
| 0:25:20 | uh | 
|---|
| 0:25:21 | basically the reason is so that your | 
|---|
| 0:25:23 | most of the kaldi code doesn't have to worry about | 
|---|
| 0:25:27 | things like opening files and error conditions and | 
|---|
| 0:25:30 | you know there doesn't have to be a lot of logic about that in the command line programs because it's | 
|---|
| 0:25:34 | all handled by some | 
|---|
| 0:25:36 | generic framework | 
|---|
| 0:25:37 | but apart from this we tried to avoid | 
|---|
| 0:25:39 | generic framework | 
|---|
| 0:25:42 | ah | 
|---|
| 0:25:44 | the tree building clustering code | 
|---|
| 0:25:46 | we it's based on | 
|---|
| 0:25:47 | very generic | 
|---|
| 0:25:49 | clustering the can something like | 
|---|
| 0:25:51 | i guess hard to model whatever they call it | 
|---|
| 0:25:53 | uh | 
|---|
| 0:25:54 | so that internal code doesn't assume a lot about what your tree is | 
|---|
| 0:25:59 | it is suitable to build decision trees in different ways including | 
|---|
| 0:26:02 | like sharing the tree roots | 
|---|
| 0:26:04 | and asking questions about the central phone | 
|---|
| 0:26:07 | things like that | 
|---|
| 0:26:08 | um | 
|---|
| 0:26:10 | it's very scalable to wide context for example quinphone | 
|---|
| 0:26:13 | i know a lot of the | 
|---|
| 0:26:16 | it's hard to write code that will scale to quinphone because if you have to enumerate all of | 
|---|
| 0:26:20 | the contexts | 
|---|
| 0:26:22 | that's kind of it's hard hard to go to | 
|---|
| 0:26:24 | a but uh | 
|---|
| 0:26:25 | we basically avoid ever enumerating those contexts | 
|---|
| 0:26:29 | uh as an example of a | 
|---|
| 0:26:30 | how we make use of this generality | 
|---|
| 0:26:33 | in the wall street journal recipe we uh | 
|---|
| 0:26:35 | we increase the phone set so that in the tree we're asking about the phone position and the stress | 
|---|
| 0:26:41 | i | 
|---|
| 0:26:41 | 'cause i know htk supports this 'cause i think you had a | 
|---|
| 0:26:44 | have a paper marked with | 
|---|
| 0:26:45 | he about doing that | 
|---|
| 0:26:47 | so uh | 
|---|
| 0:26:48 | but uh if the phone set was much larger than that probably | 
|---|
| 0:26:52 | an approach based on enumeration of context would start | 
|---|
| 0:26:55 | i | 
|---|
| 0:26:57 | you don't think so no i mean like it was a thousand thousand keep this day | 
|---|
| 0:27:03 | right | 
|---|
| 0:27:04 | okay well i | 
|---|
| 0:27:06 | okay | 
|---|
| 0:27:09 | uh | 
|---|
| 0:27:10 | okay hmm and transition modeling code | 
|---|
| 0:27:13 | so | 
|---|
| 0:27:14 | we've | 
|---|
| 0:27:15 | we try to have an approach where | 
|---|
| 0:27:17 | a piece of code only needs to know | 
|---|
| 0:27:20 | the minimum it needs to know | 
|---|
| 0:27:21 | so the hmm and transition modeling code doesn't really have any notion of a pdf it's purely | 
|---|
| 0:27:27 | it purely does what it needs to do | 
|---|
| 0:27:30 | and the rest is separate | 
|---|
| 0:27:32 | so | 
|---|
| 0:27:32 | this is probably a pretty standard approach you uh | 
|---|
| 0:27:36 | you specify a prototype topology | 
|---|
| 0:27:38 | a topology for each phone that is how many states what the transitions are | 
|---|
| 0:27:43 | uh | 
|---|
| 0:27:44 | and we make the transitions the | 
|---|
| 0:27:46 | separate depending on the uh | 
|---|
| 0:27:49 | depending on the pdf | 
|---|
| 0:27:50 | so that if the pdfs in two states are different then the transitions out of those | 
|---|
| 0:27:54 | states are separately estimated | 
|---|
| 0:27:56 | is this is just the most | 
|---|
| 0:27:58 | specifically that you can estimate the transitions without having your | 
|---|
| 0:28:02 | decoding graph blow up | 
|---|
| 0:28:04 | it's not really clear that this matters but | 
|---|
| 0:28:07 | uh | 
|---|
| 0:28:08 | we just felt that it was that we should do the best we could on | 
|---|
| 0:28:12 | uh | 
|---|
| 0:28:13 | there are mechanisms for turning these hmms into fsts because | 
|---|
| 0:28:17 | all of the training and decoding is fst based so we kind of have to have an fst representation of these | 
|---|
| 0:28:24 | uh | 
|---|
| 0:28:25 | it's is something that we touched on a a | 
|---|
| 0:28:28 | in our fsts so what you would normally imagine is that the fst has input symbols | 
|---|
| 0:28:32 | that are the | 
|---|
| 0:28:33 | the pdfs so some symbol that represents the pdf and the output symbols are the words | 
|---|
| 0:28:39 | but the problem with that is let's suppose you uh | 
|---|
| 0:28:43 | you want to find out what the phone sequence is | 
|---|
| 0:28:45 | it's all well and good if your | 
|---|
| 0:28:48 | if your phones had separate trees | 
|---|
| 0:28:51 | so that you could tell for each state which phone it belonged to | 
|---|
| 0:28:55 | but what if you had a larger phone set and you wanted to have a shared tree root | 
|---|
| 0:28:59 | and there wasn't you know a one to one mapping | 
|---|
| 0:29:01 | so there wasn't the mapping you need so | 
|---|
| 0:29:04 | so we have input labels on the fsts that encode a bit more information | 
|---|
| 0:29:08 | uh | 
|---|
| 0:29:10 | and this is also useful in training the transitions because | 
|---|
| 0:29:12 | sometimes just the pdf labels wouldn't give you quite enough information | 
|---|
| 0:29:17 | to train the transitions | 
|---|
| 0:29:20 | uh | 
|---|
| 0:29:21 | there's a couple of different ways to create decoding graphs | 
|---|
| 0:29:24 | for | 
|---|
| 0:29:24 | for uh training purposes you have to create a lot of these things at the same time | 
|---|
| 0:29:29 | and combining the fst algorithms using scripts | 
|---|
| 0:29:32 | would be quite inefficient because you have the overhead of process creation | 
|---|
| 0:29:37 | so | 
|---|
| 0:29:37 | we uh | 
|---|
| 0:29:39 | we call the openfst algorithms at the C plus plus level and combine them together | 
|---|
| 0:29:43 | so that uh | 
|---|
| 0:29:45 | you can create your decoding graphs for | 
|---|
| 0:29:47 | training | 
|---|
| 0:29:49 | uh | 
|---|
| 0:29:50 | and we typically put them in one of these archives | 
|---|
| 0:29:54 | like basically a big file concatenated together with little keys in it | 
|---|
| 0:29:58 | on disk | 
|---|
| 0:29:59 | so that you don't have the I O of | 
|---|
| 0:30:01 | accessing hundreds of little files | 
|---|
| 0:30:03 | training uses the viterbi path | 
|---|
| 0:30:05 | through these graphs | 
|---|
| 0:30:07 | uh | 
|---|
| 0:30:08 | for test time | 
|---|
| 0:30:09 | we didn't use this approach of C plus plus because there's just no point | 
|---|
| 0:30:14 | we uh | 
|---|
| 0:30:16 | it's basically scripts and i'm gonna go through one of the scripts later | 
|---|
| 0:30:21 | um | 
|---|
| 0:30:23 | at least the scripts that create the decoding graph call some openfst tools but some of our own | 
|---|
| 0:30:28 | and that relates partly to a difference in recipes | 
|---|
| 0:30:31 | but uh | 
|---|
| 0:30:32 | i'll talk more about that later | 
|---|
| 0:30:34 | after the break | 
|---|
| 0:30:36 | so | 
|---|
| 0:30:37 | and i was gonna talk later about some of the acoustic modeling code | 
|---|
| 0:30:41 | i'm just gonna give a brief summary | 
|---|
| 0:30:43 | our gmm code is | 
|---|
| 0:30:45 | it's very simple it's not part of some big framework | 
|---|
| 0:30:47 | it's kind of just like an | 
|---|
| 0:30:49 | an object that has you know the means the variances | 
|---|
| 0:30:52 | it can evaluate likelihoods if you give it the feature | 
|---|
| 0:30:55 | but it doesn't | 
|---|
| 0:30:56 | inherit from some | 
|---|
| 0:30:58 | generic acoustic model class and it doesn't attempt | 
|---|
| 0:31:01 | to kind of know about things like linear transforms it just sits there | 
|---|
| 0:31:04 | and things like linear transforms | 
|---|
| 0:31:07 | they kind of have to access the model and do what they want | 
|---|
| 0:31:10 | the the reason for that is that if | 
|---|
| 0:31:12 | the gmm knows too much | 
|---|
| 0:31:14 | then whatever you do that's fancy | 
|---|
| 0:31:16 | you have to then change the gmm code | 
|---|
| 0:31:19 | and it just | 
|---|
| 0:31:21 | it's not a nice situation | 
|---|
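
A small Python sketch of a GMM object in the spirit described here: it holds weights, means and variances, evaluates a likelihood for a feature vector, and knows nothing else. The class name `DiagGmm`, its method name, and the diagonal-covariance assumption are illustrative choices, not a statement of the toolkit's actual API.

```python
import math


class DiagGmm:
    """A plain diagonal-covariance GMM: weights, means, variances, nothing else."""

    def __init__(self, weights, means, variances):
        self.weights = weights
        self.means = means
        self.variances = variances

    def log_likelihood(self, feature):
        # Per-component log-likelihoods, combined with the log-sum-exp
        # trick for numerical stability.
        logs = []
        for w, mean, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for x, m, v in zip(feature, mean, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            logs.append(ll)
        top = max(logs)
        return top + math.log(sum(math.exp(l - top) for l in logs))
```

Anything fancier, such as a transform that modifies the means, would live outside this class and reach into its fields, so the GMM code itself never has to change. A collection of such objects can simply be a list indexed by an integer.
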
| 0:31:24 | so uh | 
|---|
| 0:31:26 | yeah we have a separate class for gmm stats accumulation | 
|---|
| 0:31:29 | and doing the update | 
|---|
| 0:31:31 | so | 
|---|
| 0:31:32 | for a collection of gmms like an hmm gmm system | 
|---|
| 0:31:36 | we have a class that pretty much behaves similar to a vector of gmms | 
|---|
| 0:31:41 | so it's | 
|---|
| 0:31:42 | it's a fairly simple thing | 
|---|
| 0:31:43 | there's no notion of the name of a state it is just an integer | 
|---|
| 0:31:47 | and really we've avoided having | 
|---|
| 0:31:50 | like names for things in the code | 
|---|
| 0:31:52 | it's | 
|---|
| 0:31:53 | integers | 
|---|
| 0:31:54 | uh_huh | 
|---|
| 0:31:57 | oh this lower case vector just refers to the stl vector | 
|---|
| 0:32:01 | but there is an upper case vector too | 
|---|
| 0:32:03 | but that's something in the matrix library | 
|---|
| 0:32:06 | i | 
|---|
| 0:32:08 | well the code is never been case in that as the code we | 
|---|
| 0:32:12 | i i even on windows | 
|---|
| 0:32:14 | uh | 
|---|
| 0:32:15 | i | 
|---|
| 0:32:19 | yeah | 
|---|
| 0:32:20 | okay | 
|---|
| 0:32:21 | we've got quite a lot of linear transform code | 
|---|
| 0:32:24 | uh | 
|---|
| 0:32:26 | lda hlda | 
|---|
| 0:32:28 | again i'm sitting on the fence with regard to the naming of this technique | 
|---|
| 0:32:32 | i don't wanna and anyway | 
|---|
| 0:32:33 | i | 
|---|
| 0:32:35 | uh | 
|---|
| 0:32:36 | another these multi name okay | 
|---|
| 0:32:38 | uh olympia version of each other i mean we tried regular vtln is | 
|---|
| 0:32:42 | yeah everyone knows that it's kind of tricky to get it to work | 
|---|
| 0:32:45 | it was that you'll anyone that worked better in the N | 
|---|
| 0:32:47 | uh it is something new that | 
|---|
| 0:32:50 | it's kind of a replacement for vtln that works a little bit better | 
|---|
| 0:32:53 | i'm gonna | 
|---|
| 0:32:54 | explain what it is uh at a later date | 
|---|
| 0:32:57 | mllr | 
|---|
| 0:32:58 | uh | 
|---|
| 0:32:59 | a lot of this | 
|---|
| 0:32:59 | so | 
|---|
| 0:33:02 | when these transforms are global the way we handle them is well | 
|---|
| 0:33:06 | it just becomes part of the feature space | 
|---|
| 0:33:09 | so it's just | 
|---|
| 0:33:09 | stored as a matrix on disk and this | 
|---|
| 0:33:12 | uses a lot of pipes so the way it actually works is that this matrix | 
|---|
| 0:33:15 | is multiplied by the features as part of a pipe | 
|---|
| 0:33:18 | it might seem like you're right obviously that is a silly way to do it from a computational point of view but | 
|---|
| 0:33:23 | it just makes the scripts really convenient | 
|---|
| 0:33:25 | to uh | 
|---|
| 0:33:26 | do | 
|---|
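
As a rough illustration of a global transform being "just a matrix" applied to the features, here is a minimal Python sketch. The function name is hypothetical, and the convention that a matrix with one extra column carries an offset term (an affine transform) is an assumption of this sketch.

```python
def transform_features(matrix, frames):
    """Apply one global transform matrix to every feature frame.

    If the matrix has one more column than the feature dimension, the last
    column is treated as an offset; that convention is an assumption here.
    """
    out = []
    for frame in frames:
        vec = list(frame)
        if len(matrix[0]) == len(vec) + 1:
            vec.append(1.0)  # homogeneous coordinate for the offset column
        if len(matrix[0]) != len(vec):
            raise ValueError("matrix and feature dimensions do not match")
        out.append([sum(m * x for m, x in zip(row, vec)) for row in matrix])
    return out
```

Since the estimation step only has to emit a matrix, the same small application step can serve every global transform type, at the cost of redoing the multiplication each time the features are read.
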
| 0:33:27 | uh | 
|---|
| 0:33:29 | yeah so when i say they're applied in a unified way what i mean is that the code that | 
|---|
| 0:33:33 | estimates any of these transforms | 
|---|
| 0:33:35 | really outputs just a matrix | 
|---|
| 0:33:37 | so uh | 
|---|
| 0:33:38 | you know there's no like | 
|---|
| 0:33:40 | and some a lot transform J | 
|---|
| 0:33:43 | that's just | 
|---|
| 0:33:44 | well okay yeah there is so for the uh regression tree one | 
|---|
| 0:33:47 | i | 
|---|
| 0:33:48 | but for but for the global one it's just it's just a matrix | 
|---|
| 0:33:52 | i | 
|---|
| 0:33:52 | i mean that was a point of contention among us as to whether to do it this way | 
|---|
| 0:33:56 | but uh | 
|---|
| 0:33:57 | some of us felt that it was important to keep the simple cases simple | 
|---|
| 0:34:01 | and to avoid having a | 
|---|
| 0:34:03 | a framework | 
|---|
| 0:34:04 | for the cases where one wasn't necessary | 
|---|
| 0:34:07 | uh | 
|---|
| 0:34:10 | okay decoders | 
|---|
| 0:34:11 | all of the decoders that we currently have use | 
|---|
| 0:34:14 | fully expanded fsts i mean when i say fully expanded i mean it's down to the | 
|---|
| 0:34:18 | hmm state level with | 
|---|
| 0:34:20 | self loops represented as uh | 
|---|
| 0:34:23 | actual you know fst arcs | 
|---|
| 0:34:26 | i know there's a lot of ways to do this and initially | 
|---|
| 0:34:28 | one of the thoughts we had | 
|---|
| 0:34:29 | would be that | 
|---|
| 0:34:31 | you know we wouldn't have the self loops or we might even have | 
|---|
| 0:34:34 | representations of the states but in the end it was just so much simpler to do it this way | 
|---|
| 0:34:38 | this is what we have now | 
|---|
| 0:34:41 | we have three decoders but by decoder we mean the uh | 
|---|
| 0:34:44 | C plus plus code that does decoding | 
|---|
| 0:34:47 | it's not necessarily the same thing as a command line decoding program | 
|---|
| 0:34:50 | we have three decoders on the spectrum simple to fast | 
|---|
| 0:34:53 | and the reason for this is that | 
|---|
| 0:34:54 | once you have a complicated fast decoder it's almost impossible to debug | 
|---|
| 0:34:59 | so if something goes wrong you can always just run the simple one | 
|---|
| 0:35:02 | you know and you can find out if it's a decoder issue | 
|---|
| 0:35:06 | uh | 
|---|
| 0:35:07 | decoded | 
|---|
| 0:35:08 | we wanted to make it so the decoder doesn't assume too much about what your model is | 
|---|
| 0:35:13 | so again the decoder has no idea of gmms or hmms it doesn't even know about features | 
|---|
| 0:35:18 | all that | 
|---|
| 0:35:20 | all the decoder knows about is give me the likelihood or | 
|---|
| 0:35:24 | score level | 
|---|
| 0:35:25 | for this | 
|---|
| 0:35:26 | uh frame index | 
|---|
| 0:35:28 | and this pdf index | 
|---|
| 0:35:30 | so the interface that the decoder sees is almost like a matrix | 
|---|
| 0:35:35 | a matrix of uh | 
|---|
| 0:35:37 | of floats but it's not represented that way because you want to | 
|---|
| 0:35:41 | you know you want to have it on demand | 
|---|
| 0:35:43 | i | 
|---|
| 0:35:45 | so yeah this is the decodable interface an interface that the | 
|---|
| 0:35:49 | it's a very simple interface that says give me the likelihood for this you know time in this frame and | 
|---|
| 0:35:54 | like | 
|---|
| 0:35:54 | how many time frames are there | 
|---|
| 0:35:57 | and how many pdf indexes are there that's almost all the interface is | 
|---|
| 0:36:01 | but this is the interface that the decoder requires so the idea is you implement you know | 
|---|
| 0:36:06 | your fantastic model | 
|---|
| 0:36:08 | and you | 
|---|
| 0:36:09 | uh | 
|---|
| 0:36:10 | i | 
|---|
| 0:36:11 | and it doesn't really matter what the interface of that model is | 
|---|
| 0:36:13 | you create a small object that satisfies the decodable interface | 
|---|
| 0:36:17 | and knows how to get the likelihoods from your fantastic model | 
|---|
| 0:36:21 | and then you uh | 
|---|
| 0:36:23 | you instantiate the decoder with that or you give that | 
|---|
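
The decodable interface described above can be sketched in Python as follows. The method names, the `DecodableFromScorer` wrapper, and the toy `best_pdf_per_frame` function are all hypothetical and only illustrate the shape of the idea: the decoder asks for a score by frame index and pdf index, and the wrapper computes it on demand from whatever model sits behind it.

```python
from abc import ABC, abstractmethod


class Decodable(ABC):
    """What the decoder is allowed to ask of an acoustic model."""

    @abstractmethod
    def log_likelihood(self, frame, pdf_index):
        ...

    @abstractmethod
    def num_frames(self):
        ...

    @abstractmethod
    def num_indices(self):
        ...


class DecodableFromScorer(Decodable):
    """Wraps any model exposing a hypothetical score(feature, pdf_index) callable."""

    def __init__(self, score, features, num_pdfs):
        self._score = score
        self._features = features
        self._num_pdfs = num_pdfs
        self._cache = {}

    def log_likelihood(self, frame, pdf_index):
        # Computed on demand and cached, rather than filling a full matrix up front.
        key = (frame, pdf_index)
        if key not in self._cache:
            self._cache[key] = self._score(self._features[frame], pdf_index)
        return self._cache[key]

    def num_frames(self):
        return len(self._features)

    def num_indices(self):
        return self._num_pdfs


def best_pdf_per_frame(decodable):
    """Toy stand-in for a decoder: it only ever touches the Decodable interface."""
    return [
        max(range(decodable.num_indices()),
            key=lambda j: decodable.log_likelihood(t, j))
        for t in range(decodable.num_frames())
    ]
```

A real decoder would search a graph rather than pick the best index per frame, but it would depend on the model through the same three calls.
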
| 0:36:27 | so uh | 
|---|
| 0:36:31 | the gmm wrapping okay | 
|---|
| 0:36:34 | yeah so our command line decoding programs are very simple we don't have like multipass or anything | 
|---|
| 0:36:39 | we don't have uh | 
|---|
| 0:36:42 | we don't attempt to support multiple types of model | 
|---|
| 0:36:46 | an example decoding program is | 
|---|
| 0:36:48 | decode with a gmm | 
|---|
| 0:36:51 | but no | 
|---|
| 0:36:52 | with no multiple class adaptation | 
|---|
| 0:36:54 | yeah so it does the simple thing | 
|---|
| 0:36:55 | and then if you want to support let's say multi-class | 
|---|
| 0:36:58 | mllr fmllr | 
|---|
| 0:37:00 | we uh have a separate command line program | 
|---|
| 0:37:03 | yeah the idea is that | 
|---|
| 0:37:04 | there might be people coming into the project who might want to be able to understand that command line program | 
|---|
| 0:37:09 | and we don't want to make the barrier to entry too high | 
|---|
| 0:37:12 | we'd rather | 
|---|
| 0:37:13 | support the overhead of having to maintain two parallel decoders | 
|---|
| 0:37:17 | and keep it relatively simple to understand any given one | 
|---|
| 0:37:24 | uh | 
|---|
| 0:37:24 | we support the standard types of features | 
|---|
| 0:37:27 | mfcc and plp features are quite similar to | 
|---|
| 0:37:30 | the htk ones | 
|---|
| 0:37:31 | we've | 
|---|
| 0:37:32 | we put in a reasonable range of configurability but | 
|---|
| 0:37:35 | i mean being realistic with respect to how much people are really working on this stuff i mean i think | 
|---|
| 0:37:40 | most people who are doing research on this would probably be coming out with their own features | 
|---|
| 0:37:44 | so we don't support every possible | 
|---|
| 0:37:46 | combination of it | 
|---|
| 0:37:47 | for every possible change | 
|---|
| 0:37:49 | we only read wav format because our reasoning is | 
|---|
| 0:37:53 | you can always find an external program to convert it and | 
|---|
| 0:37:57 | do it as part of a pipe | 
|---|
| 0:38:02 | sorry | 
|---|
| 0:38:07 | well we cannot htk and i won't from uh we don't there's no more that we support | 
|---|
| 0:38:13 | uh | 
|---|
| 0:38:15 | yeah | 
|---|
| 0:38:15 | i mean | 
|---|
| 0:38:16 | our basic concept of how people use the system is | 
|---|
| 0:38:19 | as a complete system | 
|---|
| 0:38:21 | because once you start supporting model you know in a conversion just get work | 
|---|
| 0:38:26 | but yeah there's htk features as a special case | 
|---|
| 0:38:30 | uh | 
|---|
| 0:38:31 | we typically will write features and other large objects to a single very large file this relates to this archive format | 
|---|
| 0:38:37 | so the form of the file is a key a space then your object | 
|---|
| 0:38:41 | and another key a space another object | 
|---|
| 0:38:43 | and uh | 
|---|
| 0:38:45 | we have efficient mechanisms to read such files | 
|---|
| 0:38:48 | the two normal cases are firstly sequential access | 
|---|
| 0:38:51 | where you want to iterate over the things in an archive | 
|---|
| 0:38:54 | secondly random access and there are different ways to do that one is | 
|---|
| 0:38:58 | you can write a separate file that has little | 
|---|
| 0:39:00 | pointers into the file | 
|---|
| 0:39:02 | another is that | 
|---|
| 0:39:03 | you can kind of simulate random access even though you're really going sequentially | 
|---|
| 0:39:07 | if you know that the keys are sorted | 
|---|
| 0:39:10 | uh and another way is if the file isn't isn't that big | 
|---|
| 0:39:13 | you can do random access by just having the code go through the whole file | 
|---|
| 0:39:18 | and store the objects in memory | 
|---|
| 0:39:19 | that's not very scalable but | 
|---|
| 0:39:21 | for for a lot of uh | 
|---|
| 0:39:23 | types of all kinds it really doesn't matter | 
|---|
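
The archive idea (a key, a space, then the object, repeated in one big file) and its access patterns can be sketched in Python as below. This is a simplified text-only illustration: the function names are hypothetical, each object is one line of text, and the offsets dictionary stands in for the separate file of pointers mentioned above.

```python
import io


def write_archive(stream, items):
    """Write (key, text) pairs as 'key value' lines; return {key: byte offset}."""
    offsets = {}
    for key, value in items:
        offsets[key] = stream.tell()
        stream.write(("%s %s\n" % (key, value)).encode("utf-8"))
    return offsets


def read_sequential(stream):
    """Sequential access: iterate over the archive in order, yielding (key, value)."""
    for line in stream:
        key, _, value = line.decode("utf-8").rstrip("\n").partition(" ")
        yield key, value


def read_random(stream, offsets, key):
    """Random access: seek to the recorded offset and read one object."""
    stream.seek(offsets[key])
    return next(read_sequential(stream))[1]


def load_all(stream):
    """Small archives: read everything into memory, then look up by key."""
    return dict(read_sequential(stream))
```

Usage follows the same split as the talk: `read_sequential` for accumulation-style passes, `read_random` when a pointer file is available, and `load_all` when the archive is small enough that scalability does not matter.
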
| 0:39:27 | oh yeah so the feature | 
|---|
| 0:39:29 | feature level processing like adding deltas and fmllr | 
|---|
| 0:39:32 | typically each one of those is a separate program so you have like a sequence of programs in a pipe | 
|---|
| 0:39:37 | and again that's a bit inefficient but | 
|---|
| 0:39:39 | it's not like it's really consuming more than ten percent of your C P U so | 
|---|
| 0:39:43 | you just don't care that much this has been written with | 
|---|
| 0:39:46 | ease of use in mind | 
|---|
| 0:39:49 | uh | 
|---|
| 0:39:50 | like i said there's a lot of command line tools this is an example of uh | 
|---|
| 0:39:54 | a command line and these backslashes are just for | 
|---|
| 0:39:57 | the shell | 
|---|
| 0:39:58 | so uh | 
|---|
| 0:40:01 | this this is one of the many programs | 
|---|
| 0:40:03 | the plp would be a separate command line | 
|---|
| 0:40:06 | this is just you know | 
|---|
| 0:40:07 | an option | 
|---|
| 0:40:08 | these are the two command line arguments and this uh | 
|---|
| 0:40:11 | i'm gonna be explaining later on more about what these mean but this | 
|---|
| 0:40:14 | directed to write these things to it | 
|---|
| 0:40:16 | and archive on the | 
|---|
| 0:40:18 | a key object key object | 
|---|
| 0:40:21 | and then | 
|---|
| 0:40:22 | i don't know this is the input | 
|---|
| 0:40:23 | we have to read it | 
|---|
| 0:40:25 | and then this is telling it to write an archive and also | 
|---|
| 0:40:28 | an scp file that | 
|---|
| 0:40:30 | kind of has little pointers into the archive | 
|---|
| 0:40:32 | so that you can efficiently access the features by random access | 
|---|
| 0:40:36 | um um | 
|---|
| 0:40:39 | so | 
|---|
| 0:40:39 | so yeah another feature of this is that there's only one option here we have | 
|---|
| 0:40:44 | no more than a few options on any given command line | 
|---|
| 0:40:47 | i mean it's a local program i support | 
|---|
| 0:40:49 | less the channel | 
|---|
| 0:40:51 | it's not a very configurable toolkit it's more driven by how you combine these programs | 
|---|
| 0:40:57 | a | 
|---|
| 0:40:58 | oh something else about this whole archive uh formalism is that | 
|---|
| 0:41:02 | the C plus plus level code in the individual command line tools | 
|---|
| 0:41:06 | doesn't have to worry too much about io uh | 
|---|
| 0:41:10 | you can just treat | 
|---|
| 0:41:11 | the uh | 
|---|
| 0:41:13 | when to get something like this there's | 
|---|
| 0:41:15 | there's very short uh | 
|---|
| 0:41:17 | statements in the C plus plus that will iterate over | 
|---|
| 0:41:20 | stuff | 
|---|
| 0:41:20 | so it doesn't have to | 
|---|
| 0:41:22 | think too much about the error conditions | 
|---|
| 0:41:26 | but yep | 
|---|
| 0:41:32 | fst generation | 
|---|
| 0:41:35 | okay that's another part of the talk later on | 
|---|
| 0:41:44 | well | 
|---|
| 0:41:45 | for training | 
|---|
| 0:41:47 | there's there's a command line program that will | 
|---|
| 0:41:49 | kind of do the fst generation for you and generate lots of little fsts one for each | 
|---|
| 0:41:53 | file | 
|---|
| 0:41:54 | yeah so for testing | 
|---|
| 0:41:56 | it's a script that calls openfst programs and our versions of openfst programs | 
|---|
| 0:42:03 | so | 
|---|
| 0:42:05 | i'm gonna go through that script later on in another part | 
|---|
| 0:42:07 | of the talk | 
|---|
| 0:42:09 | a a are you this decide this is not obvious you know a lot stand the script | 
|---|
| 0:42:13 | but this is just to give people some idea | 
|---|
| 0:42:16 | oh of uh | 
|---|
| 0:42:17 | of how we do training | 
|---|
| 0:42:19 | so you know this is a bash script it's doing a loop over the iterations | 
|---|
| 0:42:24 | uh and this one is estimating mllt | 
|---|
| 0:42:29 | i suppose this script review the bias and sorry man i | 
|---|
| 0:42:33 | but as that we are is the colour i've yet | 
|---|
| 0:42:35 | so a | 
|---|
| 0:42:38 | so if it's one of the iterations where we do mllt | 
|---|
| 0:42:42 | then uh | 
|---|
| 0:42:44 | so we have on disk | 
|---|
| 0:42:45 | some uh alignments these are like state level alignments | 
|---|
| 0:42:49 | it's in an archive | 
|---|
| 0:42:51 | format that i mentioned | 
|---|
| 0:42:52 | so this converts them to posteriors | 
|---|
| 0:42:54 | just in a rather trivial way by saying that each | 
|---|
| 0:42:57 | each one has a posterior of one | 
|---|
| 0:43:00 | this takes the this | 
|---|
| 0:43:01 | this gives a zero weight to the silence | 
|---|
| 0:43:03 | that would be a | 
|---|
| 0:43:04 | this would be a variable in bash | 
|---|
| 0:43:07 | uh | 
|---|
| 0:43:08 | yeah so this takes away the uh | 
|---|
| 0:43:10 | the silences from the posteriors | 
|---|
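
The two steps just described, turning a state-level alignment into posteriors of one and then giving silence a zero weight, can be sketched in Python as follows. The function names and the representation of posteriors as per-frame lists of `(state, weight)` pairs are assumptions made for this illustration.

```python
def ali_to_post(alignment):
    """Turn a state-level alignment into posteriors: each state gets weight one."""
    return [[(state, 1.0)] for state in alignment]


def weight_silence(posteriors, silence_states, weight):
    """Scale the posteriors of silence states, dropping entries that become zero."""
    out = []
    for frame in posteriors:
        scaled = [(s, p * weight if s in silence_states else p) for s, p in frame]
        out.append([(s, p) for s, p in scaled if p != 0.0])
    return out
```

With a weight of zero the silence frames end up empty, so a later accumulation step that only sums over the listed pairs would ignore them.
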
| 0:43:12 | and this is an accumulation program | 
|---|
| 0:43:14 | that uh | 
|---|
| 0:43:16 | this would be the model that's the feats the features that's a bash variable that would be | 
|---|
| 0:43:21 | set elsewhere | 
|---|
| 0:43:23 | uh this | 
|---|
| 0:43:24 | a a hmmm | 
|---|
| 0:43:25 | i think that refers to the standard input | 
|---|
| 0:43:28 | meaning that's reading an archive from the standard input and that | 
|---|
| 0:43:30 | output by the | 
|---|
| 0:43:32 | you this | 
|---|
| 0:43:32 | meaning that's writing an archive to the standard output | 
|---|
| 0:43:35 | so | 
|---|
| 0:43:35 | yeah the output of these programs is passed via a pipe | 
|---|
| 0:43:41 | uh | 
|---|
| 0:43:42 | all of the error and logging output goes to the standard error uh | 
|---|
| 0:43:45 | because we've kind of used up the standard output for this type of stuff | 
|---|
| 0:43:49 | so | 
|---|
| 0:43:50 | so we're just directing the logging output | 
|---|
| 0:43:52 | there | 
|---|
| 0:43:53 | so then this is a separate program that does the mllt estimation | 
|---|
| 0:43:58 | it takes in uh | 
|---|
| 0:43:59 | let me see | 
|---|
| 0:44:01 | uh it's computing some kind of matrix | 
|---|
| 0:44:04 | and then uh | 
|---|
| 0:44:06 | because in mllt yeah | 
|---|
| 0:44:08 | after you compute the transform | 
|---|
| 0:44:10 | you have to change the means of your model so | 
|---|
| 0:44:13 | we have a separate we like to keep everything separate | 
|---|
| 0:44:16 | so you know transforming the means is a separate operation so we have a separate program for that | 
|---|
| 0:44:21 | and then | 
|---|
| 0:44:22 | we have to compose the mllt transform with the previous one | 
|---|
| 0:44:26 | so this is another little program that does that | 
|---|
| 0:44:29 | so this is setting another bash variable to make | 
|---|
| 0:44:32 | the uh features correspond now to the | 
|---|
| 0:44:35 | new mllt features | 
|---|
| 0:44:38 | so | 
|---|
| 0:44:40 | so as you can see this is a variable in bash | 
|---|
| 0:44:43 | and it's | 
|---|
| 0:44:43 | this would be passed as a command line argument to one of the programs | 
|---|
| 0:44:47 | and it's a command involving a pipe that actually involves | 
|---|
| 0:44:51 | calling two separate kaldi uh | 
|---|
| 0:44:54 | programs | 
|---|
| 0:44:55 | each with their own arguments | 
|---|
| 0:44:57 | so obviously you can guess from the names of those programs what they're doing | 
|---|
| 0:45:01 | and then of course uh it seems to have features sub | 
|---|
| 0:45:04 | oh yeah i think we were estimating the mllt on a subset of features | 
|---|
| 0:45:08 | so this is like the same as this but it's the | 
|---|
| 0:45:10 | it's using less of | 
|---|
| 0:45:12 | the data | 
|---|
| 0:45:15 | so i think i | 
|---|
| 0:45:17 | i spoke about these issues before | 
|---|
| 0:45:21 | oh yeah so uh | 
|---|
| 0:45:24 | we have example scripts for resource management and wall street journal and these run from the ldc | 
|---|
| 0:45:29 | distributed disks | 
|---|
| 0:45:31 | uh now we found in the literature just some uh | 
|---|
| 0:45:35 | some some uh baseline | 
|---|
| 0:45:37 | these numbers are just the basic context system | 
|---|
| 0:45:42 | with i think uh mean normalization | 
|---|
| 0:45:44 | we have of course more advanced things but | 
|---|
| 0:45:47 | those you know because it's hard to find in the literature the same thing | 
|---|
| 0:45:50 | we're just giving you the unadapted | 
|---|
| 0:45:53 | so it's a | 
|---|
| 0:45:54 | slightly better than this number will can right someone a two thousand | 
|---|
| 0:45:58 | and the htk paper from ninety four | 
|---|
| 0:46:01 | has a slightly better number but this was a gender dependent system | 
|---|
| 0:46:05 | so uh | 
|---|
| 0:46:05 | so i think basically we're doing the same as | 
|---|
| 0:46:08 | you'd expect given the same algorithms | 
|---|
| 0:46:11 | i mean | 
|---|
| 0:46:12 | uh | 
|---|
| 0:46:13 | i was hoping you know the set of this help project that the results would be but uh | 
|---|
| 0:46:17 | for issues relating to the tree in can phone and stuff but | 
|---|
| 0:46:20 | you know that in we give a senate | 
|---|
| 0:46:22 | so | 
|---|
| 0:46:22 | it's working there's no major bug | 
|---|
| 0:46:26 | uh did of the | 
|---|
| 0:46:28 | okay next slide | 
|---|
| 0:46:32 | uh just a note on speed and decoding issues | 
|---|
| 0:46:35 | these are bigram numbers and that's 'cause the baselines were bigram numbers | 
|---|
| 0:46:38 | we can't yet decode with the full | 
|---|
| 0:46:41 | with the full uh trigram language model that's | 
|---|
| 0:46:43 | distributed with the wall street journal corpus | 
|---|
| 0:46:46 | because the fsts uh | 
|---|
| 0:46:48 | they get too large | 
|---|
| 0:46:50 | we have results with a pruned trigram | 
|---|
| 0:46:52 | but that's why we're quoting the bigram numbers | 
|---|
| 0:46:54 | so | 
|---|
| 0:46:56 | hopefully by the summer we're gonna | 
|---|
| 0:46:58 | there's a couple of things we can do that we're both working on one is to have a decoder that | 
|---|
| 0:47:01 | does some kind of on the fly | 
|---|
| 0:47:03 | expansion so that we can uh | 
|---|
| 0:47:05 | decode directly with that | 
|---|
| 0:47:07 | and the other is to have lattice generation so we can rescore | 
|---|
| 0:47:11 | the decoding speed for these wall street journal numbers is about twice as fast as real time | 
|---|
| 0:47:16 | and of course that's on a good machine | 
|---|
| 0:47:18 | so i mean this is tuned so that you don't get more than zero point one degradation | 
|---|
| 0:47:23 | versus a wide beam | 
|---|
| 0:47:26 | the wall street journal script takes a few hours on a single machine using | 
|---|
| 0:47:30 | we parallelize onto three cpus | 
|---|
| 0:47:32 | this is just an example script we didn't want to include things like qsub in the example script | 
|---|
| 0:47:37 | because then it wouldn't run on uh everyone's machine | 
|---|
| 0:47:40 | obviously it would be faster if you were doing it in parallel | 
|---|
| 0:47:44 | yeah | 
|---|
| 0:47:47 | uh_huh | 
|---|
| 0:47:54 | if it in member it well as well | 
|---|
| 0:47:56 | well | 
|---|
| 0:47:58 | but ten gig | 
|---|
| 0:47:59 | i i mean | 
|---|
| 0:48:00 | you know everyone knows that fst compilation tends to blow up a bit | 
|---|
| 0:48:04 | it's not like | 
|---|
| 0:48:06 | if you have the size of the model you can just about compiler | 
|---|
| 0:48:12 | i i don't recall that it's a trigram one for most journal | 
|---|
| 0:48:15 | i i | 
|---|
| 0:48:17 | and then we go how many was but i think | 
|---|
| 0:48:19 | i don't think that the our stuff is any worse than you know normal if T | 
|---|
| 0:48:23 | that ups that fully expand of thing | 
|---|
| 0:48:27 | oh yeah okay resource management | 
|---|
| 0:48:29 | this is a | 
|---|
| 0:48:31 | use she came results | 
|---|
| 0:48:33 | or take an uh | 
|---|
| 0:48:35 | from uh | 
|---|
| 0:48:36 | this is i think this is basically the hey K are each K the be but he's real us to | 
|---|
| 0:48:40 | take it from a paper of mine like in ninety nine or something | 
|---|
| 0:48:43 | 'cause | 
|---|
| 0:48:44 | i just couldn't find in the read me file from are C K on all of the test | 
|---|
| 0:48:49 | and the average as you can see the average is the same | 
|---|
| 0:48:52 | so | 
|---|
| 0:48:53 | with the same algorithms we're getting the same results as htk | 
|---|
| 0:48:55 | okay | 
|---|
| 0:48:56 | uh | 
|---|
| 0:48:59 | yeah and the decoding on this setup is about zero point one times real time | 
|---|
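
The two speed figures quoted in the talk ("about twice as fast as real time" and "about zero point one times real time") can be read as real-time factors, meaning decoding time divided by audio duration. The sketch below is only illustrative: the function name and the durations are hypothetical and were chosen to reproduce those two ratios, not taken from the talk.

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Return decoding time divided by audio duration (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


# Hypothetical durations, picked only to match the ratios quoted in the talk.
# "Twice as fast as real time": one minute of audio decoded in 30 seconds.
twice_as_fast = real_time_factor(decode_seconds=30.0, audio_seconds=60.0)

# "Zero point one times real time": one minute of audio decoded in 6 seconds.
tenth_of_real_time = real_time_factor(decode_seconds=6.0, audio_seconds=60.0)

print(f"{twice_as_fast:.2f} {tenth_of_real_time:.2f}")
```

A factor below 1.0 means the decoder keeps up with incoming audio; both quoted figures fall in that range.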
| 0:49:09 | yeah | 
|---|
| 0:49:09 | yeah | 
|---|
| 0:49:16 | yeah the test sets are quite | 
|---|
| 0:49:20 | oh yeah | 
|---|
| 0:49:21 | it's a very small test set a handful of words that are of | 
|---|
| 0:49:25 | uh this is this page is mainly | 
|---|
| 0:49:27 | just to give you some idea of the kinds of things that are in our example scripts we have a | 
|---|
| 0:49:31 | bunch of | 
|---|
| 0:49:32 | different configurations of this of the standard configuration | 
|---|
| 0:49:35 | well this is the standard configuration because this is what was in the htk baseline | 
|---|
| 0:49:40 | uh | 
|---|
| 0:49:41 | um | 
|---|
| 0:49:42 | adding mllt doesn't really seem to help | 
|---|
| 0:49:45 | sorry adding | 
|---|
| 0:49:46 | stc | 
|---|
| 0:49:47 | doesn't really seem to help | 
|---|
| 0:49:48 | i | 
|---|
| 0:49:51 | uh splicing nine frames plus lda actually makes it worse but then | 
|---|
| 0:49:56 | when you do | 
|---|
| 0:49:57 | uh stc on top of that | 
|---|
| 0:50:00 | it actually gets better than here and so this was kind of this was the ibm | 
|---|
| 0:50:04 | recipe | 
|---|
| 0:50:05 | so | 
|---|
| 0:50:07 | so this was the ibm recipe so i guess there must have been some interaction between these | 
|---|
| 0:50:11 | two parts of the recipe | 
|---|
| 0:50:13 | that somehow made it work | 
|---|
| 0:50:14 | i don't know if it generalizes to other test sets | 
|---|
| 0:50:17 | we're gonna find out | 
|---|
| 0:50:19 | uh that's spliced nine frames plus hlda | 
|---|
| 0:50:22 | triple deltas plus hlda | 
|---|
| 0:50:24 | triple deltas plus | 
|---|
| 0:50:26 | lda plus mllt | 
|---|
| 0:50:28 | this this | 
|---|
| 0:50:29 | quite good | 
|---|
| 0:50:30 | uh sgmm system these are all unadapted | 
|---|
| 0:50:33 | i have a separate slide for uh adapted experiments | 
|---|
| 0:50:37 | um | 
|---|
| 0:50:39 | if is doing it | 
|---|
| 0:50:41 | unless | 
|---|
| 0:50:41 | it's stated otherwise oh yeah okay so this is per utterance adaptation this is per speaker | 
|---|
| 0:50:47 | so | 
|---|
| 0:50:48 | this was four point five my and before uh | 
|---|
| 0:50:51 | adaptation so it really doesn't help if you do it per utterance and that's because there's too many | 
|---|
| 0:50:55 | parameters | 
|---|
| 0:50:57 | in to model a | 
|---|
| 0:50:58 | this is doing the same thing per speaker it gets a lot better | 
|---|
| 0:51:01 | this exponential transform is | 
|---|
| 0:51:03 | again i'm not gonna describe what it is it's something like vtln | 
|---|
| 0:51:07 | uh and it gets quite a bit better | 
|---|
| 0:51:09 | this is vtln kind of a linear version of vtln i believe | 
|---|
| 0:51:14 | it does seem to improve quite a lot | 
|---|
| 0:51:16 | and the improvement is more pronounced at the per utterance level because | 
|---|
| 0:51:20 | uh | 
|---|
| 0:51:21 | you know it's just like a constrained form of fmllr so the only point is | 
|---|
| 0:51:26 | to do it | 
|---|
| 0:51:27 | to do it when you have less data | 
|---|
| 0:51:29 | uh | 
|---|
| 0:51:30 | splice nine frames plus lda plus exponential transform | 
|---|
| 0:51:34 | fmllr uh fmllr | 
|---|
| 0:51:36 | we only did some of these per speaker because it wouldn't help otherwise | 
|---|
| 0:51:39 | uh | 
|---|
| 0:51:41 | as you can see there are a lot of different combinations this is sgmm including the | 
|---|
| 0:51:45 | speaker offsets the n terms if you remember | 
|---|
| 0:51:48 | and it does help so | 
|---|
| 0:51:50 | so uh i think rick was saying that that wasn't working for him but it seems to be working | 
|---|
| 0:51:54 | for us | 
|---|
| 0:51:55 | three point one five goes to uh | 
|---|
| 0:51:59 | where is it two point six eight | 
|---|
| 0:52:01 | i must have uh forgotten to fill this line in | 
|---|
| 0:52:04 | this is sgmm plus fmllr | 
|---|
| 0:52:07 | but no speaker vectors | 
|---|
| 0:52:09 | yeah | 
|---|
| 0:52:10 | a per speaker | 
|---|
| 0:52:12 | yeah | 
|---|
| 0:52:13 | i think i have these numbers but i must not have put them in i think the best number was | 
|---|
| 0:52:17 | like two point four | 
|---|
| 0:52:20 | point three | 
|---|
| 0:52:21 | uh | 
|---|
| 0:52:23 | so general plug for kaldi | 
|---|
| 0:52:26 | uh | 
|---|
| 0:52:27 | i believe it's easy to use i mean i hope the scripts didn't scare you guys off | 
|---|
| 0:52:31 | the attraction is that once you understand them | 
|---|
| 0:52:34 | everything becomes quite simple | 
|---|
| 0:52:36 | but | 
|---|
| 0:52:37 | it kind of does require that you understand how speech works like if you're someone who just | 
|---|
| 0:52:42 | is randomly | 
|---|
| 0:52:43 | moving the scripts you know changing configurations of | 
|---|
| 0:52:45 | the | 
|---|
| 0:52:46 | you're not uh | 
|---|
| 0:52:47 | it's not gonna work | 
|---|
| 0:52:48 | it it doesn't like | 
|---|
| 0:52:50 | it doesn't automagically know that the features you have are not compatible with your model | 
|---|
| 0:52:56 | so you kind of have to know what you're doing from a speech science point of view | 
|---|
| 0:53:00 | but | 
|---|
| 0:53:01 | it's quite uh | 
|---|
| 0:53:02 | it's easy to use at the c plus plus | 
|---|
| 0:53:04 | slash | 
|---|
| 0:53:06 | software engineering level | 
|---|
| 0:53:08 | uh | 
|---|
| 0:53:08 | it's easy to extend and modify | 
|---|
| 0:53:10 | you can redistribute your changes or give them back to | 
|---|
| 0:53:13 | the kaldi group | 
|---|
| 0:53:15 | uh | 
|---|
| 0:53:15 | we're open to including other people's | 
|---|
| 0:53:17 | stuff | 
|---|
| 0:53:18 | so that may give you more citations | 
|---|
| 0:53:21 | so this | 
|---|
| 0:53:21 | is really | 
|---|
| 0:53:23 | the end of this first part so | 
|---|
| 0:53:26 | you can get up and have a drink and after a few minutes | 
|---|
| 0:53:29 | well | 
|---|
| 0:53:32 | yeah here's the documentation kaldi.sourceforge.net | 
|---|
| 0:53:36 | uh okay it is not as good as htk's and probably being realistic will never be | 
|---|
| 0:53:41 | what will do | 
|---|
| 0:53:42 | is will | 
|---|
| 0:53:43 | just leverage the effort that htk has put in and point people to the htk documentation | 
|---|
| 0:53:48 | so then about eight and say that have he had then | 
|---|
| 0:53:51 | yeah i know me | 
|---|
| 0:53:52 | i | 
|---|
| 0:53:54 | i | 
|---|
| 0:53:55 | i | 
|---|
| 0:53:55 | i use C | 
|---|
| 0:53:58 | see i | 
|---|
| 0:53:59 | i | 
|---|
| 0:54:02 | but okay | 
|---|
| 0:54:03 | we can have a short break you can have a drink | 
|---|
| 0:54:06 | and disappear if you're not uh | 
|---|
| 0:54:09 | that committed to it | 
|---|
| 0:54:10 | and then | 
|---|
| 0:54:12 | uh uh we've have a gonna talk up to what | 
|---|
| 0:54:14 | the fact | 
|---|