0:00:15 | I'm Mary Harper, and |
---|
0:00:18 | in October two thousand ten I went to IARPA to develop this program, Babel. |
---|
0:00:25 | And I say "Babel," but there are lots of ways to pronounce Babel, |
---|
0:00:32 | and you can actually go to this website and find out about all the ways |
---|
0:00:38 | of saying it. A lot of people like this example: |
---|
0:00:41 | For me, Babel is "BAY-bul," and I grew up in Buffalo, New |
---|
0:00:44 | York, where that was my |
---|
0:00:46 | dialectal variant of choice. |
---|
0:00:48 | If you want a taller vowel, you say "BAB-ble." |
---|
0:00:52 | And of course there's also the original Hebrew word, and a variety of other ways |
---|
0:00:56 | of pronouncing it as well. |
---|
0:00:58 | But Morgan pointed out that though |
---|
0:01:01 | you might expect it to sound one way, it didn't, |
---|
0:01:03 | for some reason. |
---|
0:01:07 | Okay, so |
---|
0:01:08 | every program, |
---|
0:01:10 | whether it's at DARPA or IARPA, has sort of a back story; you have to |
---|
0:01:14 | have a motivation, |
---|
0:01:15 | sort of an elevator speech. So my challenge is this: |
---|
0:01:19 | you know, you're in a situation where you're dealing with a crisis. It might be |
---|
0:01:24 | a need, for example, where you have to deal with a lot |
---|
0:01:30 | of speech, noisy speech, in order to solve a crisis situation, and you have thousands of |
---|
0:01:36 | hours and no time to listen. You might have one or two people who could |
---|
0:01:40 | listen to it, but you're certainly not going to get through it in any time that |
---|
0:01:43 | would be reasonable in order to help people. |
---|
0:01:46 | And |
---|
0:01:47 | if you have no existing speech technology for that language, you have problems. |
---|
0:01:53 | But if you could rapidly develop that, |
---|
0:01:55 | say in a day or two, you actually might be able to do something with it. |
---|
0:02:02 | It sort of addresses two gaps. It's hard to build up the human capital in |
---|
0:02:06 | a language, |
---|
0:02:07 | because it can take years, and typically we only have one or two people |
---|
0:02:10 | per language, and we see, even with just developing the resources, that we don't |
---|
0:02:15 | have this |
---|
0:02:16 | language capital. |
---|
0:02:18 | And there's also a technology gap. This slide was certainly |
---|
0:02:25 | done a number of years ago, but compared to the |
---|
0:02:29 | three hundred and ninety-three languages that have a million speakers, we've touched very |
---|
0:02:34 | few. |
---|
0:02:35 | Right? And we've only studied a few of those really intensively; I mean, |
---|
0:02:42 | we study English all the time because it's easy, there are corpora, and so on. |
---|
0:02:46 | And it can take way too much time, |
---|
0:02:49 | months to years, to build a new language, especially if you have to transcribe |
---|
0:02:53 | the data. And the systems developed for English don't always carry over well to other languages; |
---|
0:02:59 | they can help with the bootstrap, but |
---|
0:03:02 | they certainly don't give you the kind of |
---|
0:03:04 | error rates that someone might want to see. |
---|
0:03:09 | So the basic idea underlying |
---|
0:03:12 | Babel: rather than just evaluate word error, because |
---|
0:03:17 | the director was very adamant that she wanted to |
---|
0:03:21 | have a real task, not just transcription, |
---|
0:03:24 | and since keyword search |
---|
0:03:26 | fortunately had had an evaluation in two thousand six, |
---|
0:03:30 | we settled on the keyword search task, |
---|
0:03:33 | where the basic idea is you use speech recognition, |
---|
0:03:37 | or phone recognition or something, to index |
---|
0:03:40 | the thousands of hours of audio, and then you have some way |
---|
0:03:44 | of putting in a query. |
---|
0:03:46 | For Babel we use orthographic queries, and those who are doing low-resource work |
---|
0:03:51 | do other things with the data in order to basically accommodate the fact that |
---|
0:03:57 | we use orthographic queries. And then we evaluate |
---|
0:04:00 | whether or not the keyword was correctly identified from the audio. |
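The pipeline described above — recognize, index, then query orthographically — can be sketched as follows. This is only a toy illustration over one-best ASR output with confidence scores (real Babel systems index lattices or confusion networks, which is essential at high word error rates); the function and variable names are invented for illustration.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each hypothesized word to the (file, time, score) places it was decoded.

    `transcripts`: dict of file id -> list of (start_sec, word, score)
    tuples, i.e. one-best ASR output with confidence scores.
    """
    index = defaultdict(list)
    for fid, words in transcripts.items():
        for start, word, score in words:
            index[word.lower()].append((fid, start, score))
    return index

def search(index, query):
    """Return scored occurrences of an orthographic query, best first."""
    hits = index.get(query.lower(), [])
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

In a real system each hit is then thresholded, and it is that thresholded detection list that is scored against the reference.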
---|
0:04:06 | So |
---|
0:04:08 | our approach is really to work with a wide variety of languages, |
---|
0:04:14 | and not just European languages. I think it's really important |
---|
0:04:19 | to study things that have a wide variety of aspects, |
---|
0:04:25 | in real recording conditions as much as possible. I mean, obviously collections are going to suffer from the fact that |
---|
0:04:31 | when you go into these countries you may not be able to |
---|
0:04:34 | record in a highly reverberant room or something, |
---|
0:04:36 | but the hope is that you can get these sort of real-world recording situations. |
---|
0:04:43 | And then we |
---|
0:04:45 | constrain the resources in various ways, in that we actually collect a lot of data |
---|
0:04:51 | but we create a wide variety of conditions for people to evaluate, and |
---|
0:04:55 | they can actually create conditions as well, to answer questions that they think are |
---|
0:04:59 | important, like getting by without a lexicon. |
---|
0:05:03 | So we gradually reduce the amount of ground-truth speech we give them for training, |
---|
0:05:07 | but we also give them the audio |
---|
0:05:10 | untranscribed. We are also reducing the amount of time that they have to basically evaluate |
---|
0:05:15 | the surprise language, and I think that's critical: |
---|
0:05:17 | not starting off with something that's impossible at the outset, but actually getting people |
---|
0:05:22 | to the point where they can develop the technology, is extremely important in this. |
---|
0:05:27 | And we set the targets to be |
---|
0:05:31 | sort of a three-times improvement over what we could get with phonetic search, |
---|
0:05:35 | and I think that was critical. That was done based on the STD 2006 |
---|
0:05:39 | BBN results in Cantonese and Mandarin, |
---|
0:05:42 | where they got roughly about a 0.3 ATWV, so we set that |
---|
0:05:46 | as the target level. |
---|
0:05:51 | So the goal is to improve speech technology with limited amounts of ground-truth data. |
---|
0:05:57 | Building systems for a non-English |
---|
0:06:00 | language is extremely important; |
---|
0:06:02 | improving speech recognition through innovative use |
---|
0:06:05 | of the technology and different approaches, |
---|
0:06:09 | in a wide variety of languages, so that you can get fast development of keyword |
---|
0:06:14 | search systems to tackle this problem. |
---|
0:06:20 | Just to give you a sense of the layout of the program: |
---|
0:06:26 | other than the base year, which was a little longer than |
---|
0:06:29 | nine months, because it was a fifteen-month period, |
---|
0:06:34 | they have roughly about nine months to work with data; |
---|
0:06:38 | the collections don't necessarily arrive on day one. |
---|
0:06:40 | And then the evaluation starts, where they have one month to do the keyword search |
---|
0:06:45 | on the practice languages. So we evaluate everything; |
---|
0:06:48 | it is really important to understand |
---|
0:06:51 | what progress is being made on the different languages, because the languages are all different. |
---|
0:06:55 | And then we actually give them a surprise language, where we give them the |
---|
0:07:00 | data, and I'll talk about that a little bit, |
---|
0:07:03 | where they have a certain number of weeks to build their system, |
---|
0:07:06 | which decreases over the periods: in the base period they had four weeks, |
---|
0:07:11 | and in the option one period they only have three weeks, |
---|
0:07:16 | and then they have one week to run their keyword search. |
---|
0:07:21 | You might ask why one week: there are a lot of research and evaluation methods |
---|
0:07:26 | that people are trying out, what keywords, and so on, so it is important to leave a |
---|
0:07:29 | sufficient amount of time there as well. |
---|
0:07:32 | So the measure that we're using to gauge performance is the Actual Term-Weighted |
---|
0:07:36 | Value, which was developed by NIST, |
---|
0:07:39 | I think in coordination with a number of |
---|
0:07:43 | sponsors of that evaluation. And it's got a use case where |
---|
0:07:49 | you've got people who |
---|
0:07:51 | would like to be able to find stuff, and they don't tolerate a great number of |
---|
0:07:54 | false alarms, so you wouldn't want to use an F-score. The other thing is |
---|
0:07:58 | rare terms: given the Zipfian nature of language, |
---|
0:08:01 | rare terms may be very useful in terms of finding |
---|
0:08:05 | things that are critical. |
---|
0:08:07 | I mean, "tsunami" might be a very common thing in the traffic you're |
---|
0:08:12 | collecting, but it may not have been there in your training data, so you want |
---|
0:08:17 | to be able to find "tsunami" in your audio. |
---|
0:08:23 | What you have to realize is that it's term-weighted: it's basically evaluated |
---|
0:08:29 | over all terms regardless of the frequency of those terms. |
---|
0:08:33 | Right, so a singleton counts the same in the score as something that is |
---|
0:08:37 | highly frequent. |
---|
0:08:39 | And then there are a number of constants in the metric, |
---|
0:08:44 | like the C and the V values. By and large you've got this weighting on the |
---|
0:08:48 | probability of false alarm; |
---|
0:08:50 | the systems typically have a very low probability of false alarms, so |
---|
0:08:56 | you can understand that it's sort of a tradeoff between |
---|
0:09:00 | those two things, |
---|
0:09:01 | but missing something really dings your score when there are singletons. |
---|
0:09:06 | So that's something you want to keep in mind as you look at the results |
---|
0:09:09 | that I'm going to go through. |
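Since all the results that follow are reported in ATWV, it helps to see the metric written out: TWV is one minus the average, over keywords, of the miss probability plus a large constant (derived from the C and V values mentioned above, commonly 999.9) times the false-alarm probability, where the number of false-alarm trials is approximated as one per second of speech. A minimal sketch — the official NIST scoring tools handle many details (term occurrence alignment, keywords that never occur, decision thresholds) omitted here:

```python
def atwv(terms, speech_seconds, beta=999.9):
    """Actual Term-Weighted Value.

    `terms`: one (n_true, n_correct, n_false_alarm) tuple per keyword
    that occurs in the reference. Every keyword counts equally, so
    missing a singleton hurts the average exactly as much as missing
    a very frequent term.
    """
    total = 0.0
    for n_true, n_corr, n_fa in terms:
        p_miss = 1.0 - n_corr / n_true            # fraction of true hits missed
        # Trials ~ one per second of audio that is not a true occurrence.
        p_fa = n_fa / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(terms)
```

Because of the beta weighting, a single false alarm costs roughly as much as missing a quarter of a term's occurrences in an hour of audio, which is why systems run at very low false-alarm rates.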
---|
0:09:15 | So the Babel program has a number of dimensions in terms of who is working on it. |
---|
0:09:21 | Obviously the program wouldn't exist without data, and Appen has been the data collector |
---|
0:09:26 | from day one; I actually talked to them |
---|
0:09:29 | about the notion of the data collection before I went on the job. |
---|
0:09:37 | Then we have the test and evaluation team; that's what T&E stands for. |
---|
0:09:42 | It's actually |
---|
0:09:44 | important to realize that you |
---|
0:09:46 | have NIST, who can run an evaluation and provide the technological support to set up an |
---|
0:09:50 | evaluation |
---|
0:09:52 | in a program like this. |
---|
0:09:53 | We also have a team that actually builds systems, so we can do |
---|
0:09:57 | forced alignments and things like that, |
---|
0:10:00 | and there's work on logistics, and Appen also provides |
---|
0:10:04 | needed help with linguistics; they advise me on a number of |
---|
0:10:09 | dimensions, certainly getting good phonetic coverage of a language and getting the diversity of |
---|
0:10:15 | languages; |
---|
0:10:16 | it would be really hard for me, since I don't know these languages well, to actually |
---|
0:10:21 | do that. And the other thing is there is a sort of teaming between the T&E |
---|
0:10:25 | team and Appen in order to ensure that the quality of the data is |
---|
0:10:30 | appropriate for the task that we're doing. |
---|
0:10:33 | Keyword search is not something that Appen had |
---|
0:10:36 | supported before, and that |
---|
0:10:40 | really does make it very challenging to evaluate keyword search. So |
---|
0:10:45 | we actually do T&E in-house; I'll talk a little bit about that |
---|
0:10:50 | later on. And then we have four teams, |
---|
0:10:52 | where I put the primes on the left: CMU, IBM, |
---|
0:10:56 | ICSI, and BBN are the primes, and you can see |
---|
0:11:01 | all the people who participated in the base period. Sometimes there's some reconfiguration, but this |
---|
0:11:06 | is the picture as it was at the time of the base period, so Mobile Technologies is |
---|
0:11:10 | still in there. |
---|
0:11:13 | So, |
---|
0:11:14 | lots of work. |
---|
0:11:16 | I think there are sixteen papers here that were supported by Babel, and if you |
---|
0:11:22 | go back to |
---|
0:11:23 | ICASSP over the past couple of years, and Interspeech, |
---|
0:11:27 | I think there are probably a hundred papers or so that have been sponsored by |
---|
0:11:32 | Babel, all with great work. |
---|
0:11:34 | I want to point out that as I go through things, I don't |
---|
0:11:37 | have time to touch on |
---|
0:11:38 | all the work or all the cool things that people are doing; |
---|
0:11:41 | I'm just going to point out some things, selecting |
---|
0:11:43 | what are sort of interesting lessons learned. |
---|
0:11:48 | There are a lot of other things people are doing that are |
---|
0:11:50 | quite interesting. I'm also going to point out |
---|
0:11:53 | how we changed things for the option period, |
---|
0:11:56 | and the kinds of things that look like they're |
---|
0:11:58 | really glimmering hopes, I think. |
---|
0:12:01 | There's research I won't cover here; you'll be able to see it |
---|
0:12:04 | at future conferences. |
---|
0:12:07 | So the data collection is actually quite daunting. |
---|
0:12:11 | We're collecting the data in batches, |
---|
0:12:15 | where there are seven collected at a time, |
---|
0:12:18 | and we only needed four practice languages and one surprise language for |
---|
0:12:23 | the base period. We collected seven, |
---|
0:12:25 | and it was a good thing we did, because we had planned to have Vietnamese as |
---|
0:12:29 | a development language, and the surprise language was to be Assamese, and by |
---|
0:12:32 | golly, |
---|
0:12:33 | where Assamese was supposed to be the surprise, |
---|
0:12:37 | things went wrong with the collection, and so we basically had to use the other |
---|
0:12:41 | five languages. So |
---|
0:12:44 | it is really important to be over-collecting for your needs at any particular time. |
---|
0:12:49 | The amount of time we spent collecting seven languages, |
---|
0:12:52 | given the fact that you stagger kickoff, |
---|
0:12:55 | is |
---|
0:12:55 | roughly two years; |
---|
0:12:57 | right, so you can see that there are, like, two-year overlapped periods. |
---|
0:13:02 | It is really interesting: right now we're working on sixteen, getting ready to send |
---|
0:13:06 | funds for five, so this really is sort of the critical |
---|
0:13:11 | period for basically making sure the rest of the program is going to |
---|
0:13:16 | play out. But you can see there is an increasing number |
---|
0:13:19 | of languages in each period; you subtract one for the surprise language and you can |
---|
0:13:25 | see how many are |
---|
0:13:26 | being used for practice. |
---|
0:13:35 | Right, so you can imagine by the time you hit the end of the program, |
---|
0:13:39 | multilingual systems are going to be really highly supported. |
---|
0:13:45 | We have a variety of criteria for selecting languages; I'll talk about that a little bit |
---|
0:13:49 | more on the next slide. |
---|
0:13:52 | Most of these are multi-dialectal, and they also represent a wide variety of recording |
---|
0:13:57 | conditions, |
---|
0:13:58 | and starting in the option period we also started collecting a |
---|
0:14:02 | microphone channel, |
---|
0:14:05 | and the eval data include surprise environments or channels in the evaluation. |
---|
0:14:11 | So there is always something that's new, and it's not a huge fraction of the data, |
---|
0:14:16 | but it's there so people can assess whether their methods are working |
---|
0:14:21 | on these things. |
---|
0:14:22 | So we have languages from a variety of language families, with different features: phonotactic, morphological, syntactic, |
---|
0:14:30 | and so on. They are all collected in country, which I think |
---|
0:14:35 | is really important, so you're living with a wide variety of telecommunications |
---|
0:14:41 | situations. There's dialectal variation, |
---|
0:14:44 | and a wide variety of environments. The easiest environment tends to be the home/office one, |
---|
0:14:49 | where there's a landline or a mobile; not always a landline in some of these countries now, |
---|
0:14:53 | so the landline is disappearing in some of the collections we're doing now. |
---|
0:14:57 | Probably in third place in Babel is the car, and the car tends to |
---|
0:15:01 | be one of the harder ones, |
---|
0:15:03 | and then there are others; |
---|
0:15:05 | obviously you want to have non-telephone-channel data in there as well. |
---|
0:15:10 | And metadata balance: |
---|
0:15:12 | we actually do |
---|
0:15:13 | provide the metadata |
---|
0:15:15 | with each of the files, so that the collection could ultimately be used |
---|
0:15:19 | to support dialect ID or language ID or other things. |
---|
0:15:22 | Right, so |
---|
0:15:23 | you want to collect this data in such a way that it can be used |
---|
0:15:26 | for a variety of purposes. |
---|
0:15:30 | We start off doing risk assessment: obviously you don't want to go into a country where there's |
---|
0:15:37 | a likelihood that people will die when they're doing the collection, so |
---|
0:15:41 | you have to take that into consideration. We also have to take |
---|
0:15:44 | into consideration whether or not |
---|
0:15:47 | there is the potential to get transcribers |
---|
0:15:49 | and people who know something about the language; so all those things are certainly taken |
---|
0:15:53 | into account. |
---|
0:15:54 | Then we begin the work of launching a language, where we actually work on |
---|
0:15:59 | what Appen calls a language-specific peculiarities document. |
---|
0:16:02 | It typically involves providing the phoneme set |
---|
0:16:07 | that is going to be used by Appen, |
---|
0:16:10 | and a variety of other things, and something about the dialects: |
---|
0:16:14 | what the primary dialect is that they would standardize on, for example, |
---|
0:16:24 | or features that some people use and some people don't. |
---|
0:16:28 | It is a part of the process, so we keep |
---|
0:16:31 | it going; it provides |
---|
0:16:33 | the start of the lexicon, |
---|
0:16:35 | and also some statistics, which are very useful. |
---|
0:16:39 | And then there's a small database of transcribed conversation that they send to us, which is reviewed, |
---|
0:16:45 | by us and others, |
---|
0:16:47 | to make sure that |
---|
0:16:49 | the transcription quality is reasonable. |
---|
0:16:52 | Sometimes we also get a lexicon to take a look at and |
---|
0:16:56 | provide feedback |
---|
0:16:57 | that affects things. Then we receive an interim delivery, which is about three hours of |
---|
0:17:01 | conversation, and we actually start looking for heterographs, |
---|
0:17:06 | words whose spellings are diverging, because |
---|
0:17:08 | you can actually use the lexicon to help you spot these, together with some language |
---|
0:17:12 | experts, and we try to clean that up; so spelling normalization is something that we |
---|
0:17:17 | do. |
---|
0:17:19 | Perhaps it adds a certain amount of artificiality, |
---|
0:17:24 | but it certainly is important to do. I can tell you it's not |
---|
0:17:28 | going to be a hundred percent accurate; |
---|
0:17:31 | it's being done with |
---|
0:17:32 | a certain amount of limitation on the resources available. |
---|
0:17:35 | Finally we get the big delivery, and that's reviewed and partitioned into training, |
---|
0:17:40 | dev, and eval. |
---|
0:17:42 | Every collection is collected as if it's a surprise language, |
---|
0:17:47 | where we use seventy-five hours for the eval, but for the development languages, the |
---|
0:17:51 | practice languages, we only use fifteen; |
---|
0:17:53 | so in many cases we have a lot of leftover audio that we just don't |
---|
0:17:57 | pass on. |
---|
0:17:58 | We also develop keywords, |
---|
0:18:00 | using a certain amount that we have annotated by Appen so that we can |
---|
0:18:04 | assign types and so on, |
---|
0:18:06 | so that we can have a certain notion of balance among the keywords: we |
---|
0:18:09 | make sure that we come up with a certain number of names and so on, |
---|
0:18:13 | so that there's balance in the test set. |
---|
0:18:15 | Also, the segments that Appen provides can be very large, |
---|
0:18:21 | so we re-segment |
---|
0:18:26 | using voice activity detection, |
---|
0:18:27 | and basically those segments are passed back to Appen for a judgement of quality, where they're |
---|
0:18:34 | compared to the original segments. |
---|
0:18:36 | Then we do forced alignments on the dev and eval, and give the forced alignments |
---|
0:18:40 | to the performers. |
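The re-segmentation step just described can be sketched with a simple energy-threshold voice activity detector. This is only an illustration: real pipelines use model-based detectors, and the frame size, threshold, and gap length here are arbitrary assumptions.

```python
def vad_segments(frame_energies, threshold=0.1, frame_sec=0.01, max_gap=0.3):
    """Split a long recording into speech segments.

    `frame_energies`: one energy value per fixed-size frame.
    Frames at or above `threshold` are speech; speech runs separated
    by a silence gap longer than `max_gap` seconds start a new segment.
    Returns a list of (start_sec, end_sec) tuples.
    """
    segments = []
    start = None
    last_speech = None
    for i, e in enumerate(frame_energies):
        t = i * frame_sec
        if e >= threshold:
            if start is None:
                start = t                          # open a new segment
            elif t - last_speech > max_gap:        # gap too long: close it
                segments.append((start, last_speech + frame_sec))
                start = t
            last_speech = t
    if start is not None:
        segments.append((start, last_speech + frame_sec))
    return segments
```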
---|
0:18:43 | These are the period one languages, where we began with Cantonese, Pashto, Tagalog, |
---|
0:18:48 | and Turkish, and those were pretty risk-free languages, |
---|
0:18:54 | and then we tested on Vietnamese. Remember, Vietnamese was not meant to be the surprise language, |
---|
0:18:59 | and it ended up being somewhat challenging, in that |
---|
0:19:04 | for Cantonese they provided word boundaries, but in Vietnamese |
---|
0:19:08 | it was just the syllables, right, and so things tend to be short words, and |
---|
0:19:14 | they also did a not-too-bang-up job of including all the dialectal |
---|
0:19:19 | variants of the pronunciations, which |
---|
0:19:21 | I think probably also caused problems. |
---|
0:19:24 | But actually as a resource it's a great resource: if you're interested in |
---|
0:19:28 | understanding the Vietnamese dialects, you can see the number of dialects per language: |
---|
0:19:34 | Cantonese has five, Pashto four, Tagalog three, Turkish seven, and Vietnamese four, for instance. |
---|
0:19:40 | The Cantonese dialects |
---|
0:19:43 | probably were pretty hard for |
---|
0:19:45 | some people to understand, so at the beginning, when we used the data, there was |
---|
0:19:49 | some question about whether those dialects were really Cantonese, but they were. |
---|
0:19:55 | So when we evaluate: NIST |
---|
0:19:59 | developed an evaluation plan, and there were |
---|
0:20:02 | three conditions for the language resources that are used. There was the basic |
---|
0:20:06 | language resource condition, BaseLR: I use the resources in the language pack and nothing more. |
---|
0:20:11 | There's the BabelLR language resource condition, where you could use any Babel language packs |
---|
0:20:16 | that you have available, |
---|
0:20:18 | and that's very nice for multilingual work. |
---|
0:20:21 | And then if you wanted to bring in other, non-Babel resources, OtherLR, you could do that: |
---|
0:20:25 | for example, if you wanted to bring in web text or something like that, |
---|
0:20:29 | or you wanted to bring in a pronunciation lexicon, or if you had some found |
---|
0:20:32 | data, you could do that. |
---|
0:20:35 | And then there's the amount of training that they used from the language |
---|
0:20:40 | pack: |
---|
0:20:40 | you could either use the eighty hours of conversational speech |
---|
0:20:43 | together with the scripted, |
---|
0:20:45 | or you could use something that was limited, which uses |
---|
0:20:48 | ten hours of transcription that is sub-selected from the eighty hours, so |
---|
0:20:54 | it's a proper subset of the eighty-hour set. |
---|
0:20:57 | And then there are two conditions for evaluating keywords. There is the NTAR condition, |
---|
0:21:04 | no test audio reuse: you build your keyword system without knowledge of |
---|
0:21:10 | the keywords; you just basically do the search based on |
---|
0:21:13 | those keywords; you're not able to re-decode, retrain, or something like that with |
---|
0:21:17 | knowledge of the keywords. |
---|
0:21:19 | Obviously you're going to |
---|
0:21:24 | decode, but in the test audio reuse condition, TAR, you can take knowledge of the keywords into consideration: |
---|
0:21:29 | since you have knowledge of the keywords, you could actually do things like automatically add |
---|
0:21:34 | them to the lexicon and do crazy things in terms of the language model. |
---|
0:21:38 | If you were going to do OtherLR, you |
---|
0:21:41 | could actually pull web data for language model data, and so on. |
---|
0:21:44 | Right, so there's a lot of variability here. |
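Putting the three axes together, a submission was characterized by a point in a small grid; a quick enumeration makes the combinatorics concrete (the short condition labels are the standard Babel/NIST names; the pairing into a flat grid is my own simplification):

```python
from itertools import product

resources = ["BaseLR", "BabelLR", "OtherLR"]   # language resource condition
training  = ["FullLP", "LimitedLP"]            # 80 h vs. 10 h of transcripts
audio_use = ["NTAR", "TAR"]                    # no test audio reuse / reuse

# Every combination of the three axes is a possible evaluation condition.
conditions = list(product(resources, training, audio_use))
```

Only one cell (BaseLR, FullLP, with no test audio reuse) was required of everyone, as the speaker notes later; the rest were optional contrasts.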
---|
0:21:47 | In the option period |
---|
0:21:48 | we actually changed things up a lot, |
---|
0:21:51 | where people can declare the resources, and so there are a lot of interesting new conditions |
---|
0:21:56 | that performers can come up with on their own. |
---|
0:21:59 | So this was the start, but there is certainly going to be, I think, a |
---|
0:22:02 | lot more variability in the experiments people do in the future. |
---|
0:22:08 | Another innovation that came out of the program: since we're evaluating so many languages, |
---|
0:22:14 | and we don't want to prevent people from running experimental conditions, |
---|
0:22:19 | NIST developed the Babel scoring server, |
---|
0:22:23 | and this allows researchers to submit |
---|
0:22:26 | and get evaluated against the test data. We don't |
---|
0:22:29 | release all the test data after the test; we release some portion of it. |
---|
0:22:33 | If you want to go and evaluate against the full test set, |
---|
0:22:37 | given the sequestered part, you can, |
---|
0:22:39 | and I think that's really important. So |
---|
0:22:41 | if you're writing a paper |
---|
0:22:43 | ten months after the evaluation, and you want to go back and re-evaluate, or you've discovered |
---|
0:22:48 | something new |
---|
0:22:52 | and you want to basically |
---|
0:22:55 | test your hypothesis on the past languages, |
---|
0:22:58 | right, you can do that |
---|
0:22:59 | and still get the full test set. I think that's really very important, and I |
---|
0:23:04 | really think it's going to make a lot of difference in terms of the pure |
---|
0:23:09 | science the program can support. |
---|
0:23:15 | Jon Fiscus put together this for the open eval: |
---|
0:23:19 | these are submissions |
---|
0:23:22 | over the twenty-seven weeks |
---|
0:23:26 | of the program, |
---|
0:23:28 | and you can see where there are spikes, in terms of rapid increases in the |
---|
0:23:33 | cumulative number of submissions. But you can see, even after the evaluation is over, especially with |
---|
0:23:39 | Vietnamese, |
---|
0:23:41 | people kept submitting, right, because Vietnamese was somewhat challenging and some people wanted to continue |
---|
0:23:46 | to work on it, and of course on a number of other languages as well. |
---|
0:23:50 | The results get back to you as soon as they basically say |
---|
0:23:54 | everything's okay; |
---|
0:23:56 | there is a sort of an intermediate point where they want to make sure that |
---|
0:24:00 | everything is working properly, and so it |
---|
0:24:04 | usually takes about a week before the first results are released, but as soon as they are |
---|
0:24:08 | released, |
---|
0:24:09 | then people can report them openly. |
---|
0:24:15 | So in the first period, people took the data and did a lot of creative things. |
---|
0:24:19 | People submitted primary and contrast systems, and |
---|
0:24:23 | for the most part the primary submissions were system combinations; we'll talk a |
---|
0:24:28 | little bit about system combination, because it really does seem to help. Except for the |
---|
0:24:32 | Swordfish team, |
---|
0:24:33 | all performers were able to make the program targets in all languages, including the surprise language, |
---|
0:24:38 | using the full language pack, |
---|
0:24:40 | and that in the base language resource condition with no test audio reuse. |
---|
0:24:45 | And of course |
---|
0:24:47 | there are other conditions where you could potentially do better. |
---|
0:24:50 | Program targets were exceeded with ten hours of training, and for the five languages, by |
---|
0:24:55 | some people, |
---|
0:24:57 | usually using system combination. |
---|
0:25:01 | System combination reduces |
---|
0:25:04 | the token error rate and increases ATWV compared to single systems, |
---|
0:25:10 | but even single-system, |
---|
0:25:14 | full language pack systems |
---|
0:25:16 | made the program target. |
---|
0:25:23 | All systems, of course, have a very low probability of false alarm, so |
---|
0:25:29 | lowering the miss rate plays a significant role in increasing ATWV, and that's something you |
---|
0:25:34 | want to sort of keep in mind. |
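One common way to combine keyword search systems is to merge their detection lists and average normalized scores, which lowers the miss rate the speaker highlights. A sketch under assumed conventions (real Babel combination methods varied; the key-tuple layout and the normalization to [0, 1] are illustrative assumptions):

```python
def combine(detections_per_system):
    """Merge keyword detection lists from several systems.

    Each system contributes a dict {(keyword, file, rounded_time): score}
    with scores already normalized to [0, 1]. A detection missed by a
    system implicitly counts as score 0 there, so averaging rewards hits
    that several systems agree on while still keeping single-system hits
    (with a lower combined score). Thresholding the combined scores then
    trades misses against false alarms.
    """
    n = len(detections_per_system)
    merged = {}
    for dets in detections_per_system:
        for key, score in dets.items():
            merged[key] = merged.get(key, 0.0) + score / n
    return merged
```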
---|
0:25:36 | And there were several collection factors that actually affected ATWV: language, dialect, environment, gender. |
---|
0:25:44 | And I'm going to just show you some pooled results I think are sort of |
---|
0:25:47 | interesting. |
---|
0:25:48 | I don't think I've shown this even to the performers; I actually |
---|
0:25:52 | put this together for my program review. |
---|
0:25:58 | I thought these slides were posted, |
---|
0:26:06 | but actually they're probably not posted. |
---|
0:26:08 | You can see that the BaseLR full language pack results |
---|
0:26:11 | are all marked in red, |
---|
0:26:13 | and, you know, not everybody submits to every condition; the only one that was required was |
---|
0:26:17 | the full language pack BaseLR, |
---|
0:26:19 | and you can see people made their targets |
---|
0:26:23 | in all the languages. |
---|
0:26:28 | Gender affects ATWV, and, what was kind of interesting, word error as well: |
---|
0:26:33 | in this set of collections, the females did better; |
---|
0:26:36 | systems did better with female speech, which is kind of interesting, |
---|
0:26:41 | and in all the languages, sometimes by a lot. Look at Tagalog, for example: |
---|
0:26:45 | the males are so much worse. |
---|
0:26:48 | I don't know why; I mean, really, we collect two thousand speakers |
---|
0:26:52 | per language, so... |
---|
0:26:57 | And I'm sure there are interactions |
---|
0:26:59 | with other factors. But environment is important: |
---|
0:27:03 | you can see, overall, |
---|
0:27:06 | pooling over all systems, you get an average of 0.51 |
---|
0:27:10 | ATWV. |
---|
0:27:16 | so the car here and he |
---|
0:27:19 | the unexpected environment were sort of |
---|
0:27:23 | equally for the landline a mobile are the home office and those are sort of |
---|
0:27:28 | the best |
---|
0:27:29 | and then the place in street people are sort of |
---|
0:27:33 | somewhere in between |
---|
0:27:35 | typically those are probably done what cell phone so |
---|
0:27:38 | but |
---|
0:27:41 | when you want to cross language |
---|
0:27:43 | this is kind of mse slide the card it is significantly worse than our past |
---|
0:27:48 | over summaries and |
---|
0:27:49 | and you know obviously partial was of her language overall but there's something going on |
---|
0:27:55 | there |
---|
0:27:57 | and it is kind of interesting you know you look at turkish and the |
---|
0:28:01 | landlines are wonderful well they probably have a much more stable |
---|
0:28:06 | environment for landline |
---|
0:28:09 | in some of these maybe landlines are rare so maybe the cell phone |
---|
0:28:13 | so |
---|
0:28:14 | was the predominant thing what i didn't give you is sort |
---|
0:28:18 | of the breakout of the distributions |
---|
0:28:21 | a dialect |
---|
0:28:23 | dialect and atwv interacted and i'll give you the Pashto one and you can see |
---|
0:28:28 | northeast northwest southeast and southwest |
---|
0:28:31 | southwest was really under-represented that really became clear when we did the collection |
---|
0:28:36 | and i won't belabor that right |
---|
0:28:39 | but you can see people could still do something with that and some of these |
---|
0:28:43 | were related but certainly the ones that had a higher amount of data |
---|
0:28:46 | were certainly the best |
---|
0:28:49 | and the ones that had the lower amount of data the least amount of data were |
---|
0:28:52 | sort of the worst |
---|
0:28:53 | and that was true across the board |
---|
0:28:55 | but certainly the dialect does add a dimension of challenge to the data |
---|
0:29:04 | so |
---|
0:29:09 | um |
---|
0:29:11 | i think it's area specific |
---|
0:29:15 | somehow or another and |
---|
0:29:16 | i'm getting |
---|
0:29:19 | an echo |
---|
0:29:20 | so what helps well early on it was clear that |
---|
0:29:24 | especially with the cantonese data that you've got to do |
---|
0:29:27 | re segmentation of the data you have to do silence modeling get rid of |
---|
0:29:31 | the silence or you kind of screw things up |
---|
0:29:35 | robust multilingual mlp features |
---|
0:29:38 | were really important i think |
---|
0:29:41 | they really played a major role deep learning |
---|
0:29:47 | really started to shine in the program very early and then i think there's |
---|
0:29:52 | lots and lots of room for it to keep shining in doing |
---|
0:29:55 | very interesting experiments |
---|
0:29:58 | pitch features on all languages were useful at least for most people |
---|
0:30:03 | and what's kind of cool about that is it sort of gives hope for |
---|
0:30:07 | more universal feature extraction |
---|
0:30:10 | and |
---|
0:30:11 | one of the things that was really extremely important was to develop methods for preserving |
---|
0:30:15 | potential hits the search alternatives |
---|
0:30:17 | and there's a variety of ways of doing that including |
---|
0:30:20 | denser lattices or smarter ways of doing the queries there there's a number of papers |
---|
0:30:25 | here that you can probably |
---|
0:30:26 | see on this topic |
---|
0:30:29 | and another |
---|
0:30:31 | thing people used |
---|
0:30:33 | is combining systems especially with limited training data it really matters a lot |
---|
0:30:38 | it matters a lot |
---|
0:30:39 | whether you try to build the systems differently or just randomly seed them differently |
---|
0:30:45 | system combination is very useful semi supervised training |
---|
0:30:49 | is very helpful for acoustic models and features |
---|
0:30:52 | and score normalization |
---|
0:30:54 | really plays a big role so if you do nothing else score normalization gives you |
---|
0:31:00 | a lot right |
---|
0:31:03 | so i |
---|
0:31:04 | i could i could report any number |
---|
0:31:07 | of things i just picked a smattering of things |
---|
0:31:10 | and typically the reason why i picked it was not an endorsement per se but |
---|
0:31:14 | it was largely because |
---|
0:31:16 | there was some picture |
---|
0:31:17 | that sort of |
---|
0:31:19 | fit with the point i was trying to make |
---|
0:31:22 | but certainly several of these have papers appearing here so i put them there when |
---|
0:31:27 | i when i could sort of line up the result |
---|
0:31:30 | because some of these results were things that i got from site visits as opposed |
---|
0:31:34 | to from papers because |
---|
0:31:35 | i had to prepare the talk |
---|
0:31:38 | a while ago |
---|
0:31:40 | but you can see |
---|
0:31:41 | the stacked bottleneck features versus the simpler bottleneck features you get an eight percent reduction |
---|
0:31:46 | in word error |
---|
0:31:47 | and a concomitant improvement in terms of atwv |
---|
0:31:51 | adding fundamental frequency and probability of voicing |
---|
0:31:54 | reduces word error |
---|
0:31:55 | again this was on vietnamese i believe |
---|
0:31:58 | regenerating the neural network |
---|
0:32:00 | regenerating the neural net |
---|
0:32:02 | alignment targets added a percent and semi supervised training |
---|
0:32:07 | helped a lot too |
---|
0:32:08 | and those were all |
---|
0:32:10 | additive right so |
---|
0:32:12 | very cool |
---|
0:32:13 | so features are very important |
---|
0:32:16 | deep learning is very helpful and |
---|
0:32:19 | we have a comparison here between shallow and deep |
---|
0:32:22 | and you can see the shallow versus the deep atwv and |
---|
0:32:26 | you know two to three percent |
---|
0:32:30 | absolute improvement this was using the Kaldi tandem sat |
---|
0:32:34 | fmpe full language pack models |
---|
0:32:39 | pitch helps even for non-tonal languages |
---|
0:32:41 | this is this is from Dan Povey who has been playing around with pitch features |
---|
0:32:45 | because he was very unhappy with how Kaldi performed with |
---|
0:32:50 | pitch on vietnamese so he's basically done a lot of interesting work |
---|
0:32:55 | and you can see you know when they add the |
---|
0:32:59 | SAcC |
---|
0:33:01 | pitch features sometimes it goes up |
---|
0:33:03 | it goes down a little bit for Bengali but his method that he incorporated |
---|
0:33:07 | in Kaldi gives an improvement |
---|
0:33:09 | in all those languages |
---|
0:33:11 | so vietnamese and cantonese are tonal but you can see Assamese and the like |
---|
0:33:16 | and certainly a lot of other people |
---|
0:33:18 | have a similar approach to the problem and so on |
---|
0:33:21 | have this kind of result large lattices help up to a point |
---|
0:33:26 | right so you've got a |
---|
0:33:28 | i actually haven't said this is a DET curve |
---|
0:33:31 | where random is up in the upper right corner |
---|
0:33:34 | and the further down you go the better that curve shows the operating |
---|
0:33:39 | performance in terms of the trade off between probability of false alarm and probability of miss |
---|
0:33:44 | and so |
---|
0:33:45 | being further down is really important |
---|
0:33:48 | and so you can see the green line is done with small lattices |
---|
0:33:51 | and the purple line is done with larger |
---|
0:33:57 | and then enormous |
---|
0:33:58 | lattices and eventually there's diminishing returns but certainly preserving stuff |
---|
0:34:04 | that you want to find is extremely important |
---|
0:34:10 | knowledge of the keywords helps |
---|
0:34:12 | so you can see |
---|
0:34:13 | it helps even more with the limited language pack where |
---|
0:34:16 | you don't |
---|
0:34:17 | you might not know about those words based on the ten hour subset |
---|
0:34:21 | so if you know about the keywords |
---|
0:34:24 | you can actually leverage that knowledge |
---|
0:34:26 | in interesting ways like not pruning things away you always wanna keep the probabilities right |
---|
0:34:31 | but you might want to set |
---|
0:34:33 | specific |
---|
0:34:34 | beams for specific keywords |
---|
0:34:41 | and one of the teams has developed |
---|
0:34:44 | a white list approach |
---|
0:34:47 | that uses the audio reuse condition so you can see here |
---|
0:34:51 | with knowledge |
---|
0:34:53 | of the keywords before they basically decode they get a recall of keywords of |
---|
0:34:58 | about ninety two percent |
---|
0:35:01 | without knowledge of the keywords it's seventy four percent you can see there's a big |
---|
0:35:05 | jump in atwv and you can see the number of hits per keyword is much |
---|
0:35:11 | lower and the keywords without hits |
---|
0:35:13 | much higher |
---|
0:35:14 | but if you simply look at say infrequent words that may be important |
---|
0:35:19 | just boosting them in the language model actually does give you something |
---|
0:35:23 | in terms of being able to preserve those |
---|
0:35:26 | keywords and so the percent recall is somewhere in between |
---|
0:35:29 | and that's that that's beneficial right so it's preserving stuff |
---|
0:35:33 | so that you don't prune things out |
---|
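The language-model boosting idea described here can be sketched in a few lines. This is not the speaker's code; the boost factor of 10 and the toy vocabulary are invented for illustration, assuming a simple unigram model.

```python
# Toy sketch of boosting infrequent but important keywords in a unigram
# language model so that they are less likely to be pruned away.
# The boost factor and word lists are illustrative assumptions only.
def boost_keywords(unigram, keywords, factor=10.0):
    """Scale up selected word probabilities, then renormalize to a distribution."""
    scaled = {w: p * (factor if w in keywords else 1.0)
              for w, p in unigram.items()}
    z = sum(scaled.values())                 # renormalization constant
    return {w: p / z for w, p in scaled.items()}
```

The renormalization is what keeps the probabilities valid while still shifting mass toward the keyword list, which is the "keep the probabilities right" point made above.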
0:35:35 | and |
---|
0:35:37 | if you look at system combination i think system combination is about preserving stuff too |
---|
0:35:42 | you get big gains |
---|
0:35:44 | this is this is on the dev data set so |
---|
0:35:47 | you can see |
---|
0:35:49 | the best system here and the combined system |
---|
0:35:52 | on a full language pack and a limited language pack and you can |
---|
0:35:55 | see |
---|
0:35:56 | you can see except for Pashto |
---|
0:35:58 | system combination gets you |
---|
0:36:01 | about point three atwv |
---|
0:36:03 | which is pretty amazing |
---|
0:36:05 | high word errors but you know |
---|
0:36:09 | you can actually make the target |
---|
0:36:11 | amazing |
---|
0:36:13 | here's another picture of system combination |
---|
0:36:16 | where you can see the |
---|
0:36:18 | the individual systems using various |
---|
0:36:22 | approaches |
---|
0:36:22 | you know dnns bnfs |
---|
0:36:24 | and then you have the combination and remember this is a limited language pack |
---|
0:36:30 | result as well |
---|
0:36:31 | so you're gonna see much more modest scores |
---|
0:36:37 | right |
---|
0:36:38 | look at score normalisation this is the bbn result on |
---|
0:36:43 | dev and eval |
---|
0:36:44 | per language where you look at |
---|
0:36:46 | cantonese Pashto turkish tagalog and vietnamese you can see normalisation gives you a significant |
---|
0:36:53 | improvement |
---|
0:36:54 | not always the same between |
---|
0:36:57 | the dev and the test right so there's some impact of the set |
---|
0:37:01 | but you can see |
---|
0:37:03 | normalisation and doing it well |
---|
0:37:05 | is certainly a big part of the program and there's a lot of methods that |
---|
0:37:08 | people are working on now including |
---|
0:37:11 | rescoring |
---|
0:37:14 | and the other interesting result which i believe appears here as a poster |
---|
0:37:20 | and i couldn't put all the names of the authors up there so it |
---|
0:37:24 | would be readable so i put an et al |
---|
0:37:26 | is that when you normalize is very important |
---|
0:37:30 | so you've got the contrast between the no audio reuse and the audio reuse |
---|
0:37:36 | but you can you can look at either one row or the other row |
---|
0:37:39 | and if i do |
---|
0:37:41 | normalisation after system combination i only get so far but if i normalize before i |
---|
0:37:47 | do system combination i do really well |
---|
0:37:49 | and |
---|
0:37:50 | if i normalise |
---|
0:37:55 | after the best tokenization before score combination i basically can build a single system |
---|
0:38:01 | that is really better than what you produce if you normalize late instead of normalizing early so |
---|
0:38:07 | if you're doing combinations of various representations it's important to get the scores on the |
---|
0:38:11 | same scale |
---|
0:38:12 | and in the same |
---|
0:38:13 | the same place i mean it |
---|
0:38:15 | it is really important it makes a big difference |
---|
0:38:18 | and quite frankly a single system is gonna be much easier to run so it's kind |
---|
0:38:22 | of an interesting thing to know |
---|
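The normalize-before-you-combine point can be made concrete with a small sketch. This assumes keyword-specific sum-to-one normalization, which is one common choice, not necessarily the exact method in the poster; the system names and scores below are invented.

```python
# Sketch of per-keyword sum-to-one score normalization, applied to each
# system's posting list *before* the lists are averaged.
def normalize(postings):
    """postings: {keyword: {location: raw_score}} -> scores sum to 1 per keyword."""
    out = {}
    for kw, hits in postings.items():
        total = sum(hits.values())
        out[kw] = {loc: s / total for loc, s in hits.items()} if total else {}
    return out

def combine(systems):
    """Average the normalized scores across systems, matching hits by location."""
    merged = {}
    for postings in map(normalize, systems):
        for kw, hits in postings.items():
            bucket = merged.setdefault(kw, {})
            for loc, s in hits.items():
                bucket[loc] = bucket.get(loc, 0.0) + s / len(systems)
    return merged
```

Running `normalize` only after `combine` instead would mix raw scores that live on different scales per system, which is the weaker variant described above.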
0:38:25 | the other paper that appears here |
---|
0:38:28 | also |
---|
0:38:30 | touches on analysis so |
---|
0:38:32 | the effect of thresholds on atwv is also |
---|
0:38:36 | an interesting thing to look at where you can actually get a number of rows |
---|
0:38:40 | with a fair threshold so it's just based on |
---|
0:38:44 | my notion of what i can do here based on |
---|
0:38:47 | what i have in the dev |
---|
0:38:50 | versus if i set the threshold to be optimal |
---|
0:38:54 | for each keyword |
---|
0:38:57 | and then if i play around and make sure that i keep |
---|
0:39:01 | the things that matter and throw away the things that aren't so i basically set |
---|
0:39:05 | the probability |
---|
0:39:06 | of hits to one and my probability of misses to zero you can see |
---|
0:39:11 | the probability space is also playing a major role |
---|
0:39:14 | in terms of your ability to get the keywords it's not just a matter of |
---|
0:39:17 | calibration |
---|
0:39:18 | also getting better probabilities seems to be an important aspect as well |
---|
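The fair-versus-optimal threshold contrast discussed here can be sketched with the term-weighted value metric itself. This is not the speaker's code; the keyword counts are invented toys, and the formulas follow the standard NIST keyword-search definition with beta = 999.9.

```python
# Hedged sketch of term-weighted value (TWV) and the per-keyword decision
# threshold it implies. Toy counts are invented; the formulas follow the
# standard NIST KWS definition with beta = 999.9.
BETA = 999.9

def twv(stats, speech_seconds):
    """stats: {keyword: (n_correct, n_false_alarm, n_true_occurrences)}."""
    losses = []
    for n_corr, n_fa, n_true in stats.values():
        if n_true == 0:                  # keywords with no true hits are excluded
            continue
        p_miss = 1.0 - n_corr / n_true
        p_fa = n_fa / (speech_seconds - n_true)   # non-target trials ~ seconds
        losses.append(p_miss + BETA * p_fa)
    return 1.0 - sum(losses) / len(losses)

def keyword_threshold(n_expected, speech_seconds):
    """Posterior above which accepting a hit improves expected TWV,
    for a keyword expected n_expected times in speech_seconds of audio."""
    return (BETA * n_expected) / (speech_seconds + (BETA - 1.0) * n_expected)
```

A single "fair" global threshold leaves value on the table; the per-keyword threshold above is lower for rarer keywords, which is roughly what the optimal condition in the paper approximates.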
0:39:22 | so there are a lot of interesting things that people can look at and certainly |
---|
0:39:27 | analysis i think is really a very important aspect of the program so understanding |
---|
0:39:32 | why something works why something doesn't work |
---|
0:39:34 | why something doesn't work isn't such a bad thing it basically buys |
---|
0:39:38 | you a piece of knowledge that really is important in terms of solving the |
---|
0:39:41 | problem |
---|
0:39:43 | we also had an open keyword search |
---|
0:39:47 | evaluation in two thousand thirteen for vietnamese and we had a lot of people |
---|
0:39:51 | we had the four babel performers plus eight outside teams who ended up submitting systems |
---|
0:39:57 | and |
---|
0:39:58 | i list them here |
---|
0:40:00 | we had eight wonderful volunteers who actually participated in the open kws meeting these are |
---|
0:40:06 | the results and they're all over the |
---|
0:40:08 | all over the place i kinda put it up there |
---|
0:40:11 | so that and these are posted right the results of the open kws are posted you |
---|
0:40:16 | can go take a look |
---|
0:40:17 | but |
---|
0:40:18 | if people want to participate in the next one maybe they won't feel so shy |
---|
0:40:22 | about the possibility of submitting something that may not be |
---|
0:40:27 | super certainly babel people |
---|
0:40:29 | have a lot more practise with the data |
---|
0:40:31 | but |
---|
0:40:33 | you can see you know that the scores were all over the place |
---|
0:40:36 | and but people really did a lot of interesting things and there were higher resource |
---|
0:40:41 | approaches as well as |
---|
0:40:43 | low resource approaches |
---|
0:40:47 | so in period two we added six languages we have five practice and one surprise |
---|
0:40:53 | they only have sixty hours of transcribed training they do have the remaining twenty hours |
---|
0:40:59 | untranscribed |
---|
0:41:00 | there's also the ten hour training set |
---|
0:41:03 | and they have to exceed the program targets now in both conditions |
---|
0:41:08 | because they got so close right so |
---|
0:41:19 | and also |
---|
0:41:21 | approaches that use things like morphology and so on |
---|
0:41:23 | where maybe they would help you to gain |
---|
0:41:26 | in the ten hour set |
---|
0:41:28 | maybe the sixty hours or the eighty hours is a little bit too |
---|
0:41:31 | large |
---|
0:41:32 | and then they'll have three weeks to build the surprise language |
---|
0:41:36 | the languages are Bengali and Assamese those were collected in the first period they |
---|
0:41:40 | don't have another channel they're pure telephony |
---|
0:41:43 | then we have Haitian Creole Zulu and Lao and of course we have a |
---|
0:41:47 | surprise and i'm not gonna out that here |
---|
0:41:52 | Assamese and Bengali i think are |
---|
0:41:56 | somewhat |
---|
0:41:57 | okay right but Zulu appears to be quite challenging and Haitian Creole |
---|
0:42:02 | appears to be quite |
---|
0:42:04 | simple right and so these are aspects of the language i don't think they're aspects |
---|
0:42:08 | of the collection |
---|
0:42:10 | and then Lao |
---|
0:42:12 | will have its own challenges because again |
---|
0:42:17 | we couldn't annotate the compounds |
---|
0:42:19 | reliably and so |
---|
0:42:21 | the Lao words |
---|
0:42:23 | not the not the borrowed words those are |
---|
0:42:26 | multi syllabic right they're multiple syllables |
---|
0:42:28 | but the rest are single syllables excuse me |
---|
0:42:32 | so one of the sites put together some of the challenges of this |
---|
0:42:36 | and presented it and i thought that was interesting so there's the notion of |
---|
0:42:40 | shared language models where you can |
---|
0:42:42 | share between Bengali and Assamese right |
---|
0:42:46 | Assamese doesn't have as much of a web presence and so it's sort of |
---|
0:42:49 | an interesting thing to do importing the french for the Haitian Creole |
---|
0:42:53 | the phonology there are tonal languages too |
---|
0:42:56 | Lao has tone kinda like |
---|
0:42:58 | cantonese and |
---|
0:42:59 | and vietnamese but the tone is very different |
---|
0:43:03 | unfortunately the tone marking in the lexicon kinda couldn't be done |
---|
0:43:08 | reliably and so it didn't make sense to put it in the resource |
---|
0:43:12 | and then you also have some suprasegmental |
---|
0:43:15 | excuse me segmental phonology issues in Bengali and |
---|
0:43:18 | morphology issues in Zulu big time maybe more so than in the Bengali |
---|
0:43:23 | enough that |
---|
0:43:24 | the oov rate is |
---|
0:43:26 | higher than any of the languages we've seen including turkish which |
---|
0:43:30 | didn't really have a terrible oov rate |
---|
0:43:33 | and then there's other aspects that linguists might be interested in looking at like the |
---|
0:43:38 | mixing levels |
---|
0:43:40 | there's the Bengali script Bengali and Assamese are sort of very similar |
---|
0:43:45 | strictly speaking the Bengali script for Assamese is called the Assamese script but it |
---|
0:43:49 | really is the same as the Bengali |
---|
0:43:52 | and then you have Lao which has another script as well there's a lot |
---|
0:43:56 | of code switching in Haitian Creole but there's a bit in Zulu too i certainly |
---|
0:44:01 | see it |
---|
0:44:02 | and so those can be problems |
---|
0:44:05 | and then there's a lot of short words in Haitian Creole and Lao but |
---|
0:44:08 | i guess the shorter words are hurting Haitian Creole maybe Lao as well |
---|
0:44:14 | maybe not |
---|
0:44:15 | so exciting directions people are going in |
---|
0:44:20 | one of the things that we want is more analysis and so we revised the |
---|
0:44:24 | evaluation plan and it's posted at the open kws site you can actually take a look |
---|
0:44:29 | if you want |
---|
0:44:31 | so that people can actually evaluate a lot more conditions and then |
---|
0:44:35 | share the conditions with each other so that |
---|
0:44:37 | others can evaluate likewise |
---|
0:44:40 | there's a lot of work going on in multilingual processing it's starting to gel right it's |
---|
0:44:46 | very intriguing and very interesting |
---|
0:44:48 | and i think yes the deep learning thing too really |
---|
0:44:52 | those neural net models are certainly seeming to play a role in the progress people |
---|
0:44:58 | are making |
---|
0:45:00 | machine learning |
---|
0:45:02 | sort of got a somewhat slow start because you're trying to integrate this community into |
---|
0:45:07 | the speech community but they're beginning to take off |
---|
0:45:10 | too so stay tuned i think that there's a lot of interesting things that |
---|
0:45:15 | are gonna happen |
---|
0:45:16 | smart lattices and consensus networks were beginning to play a role at the end of |
---|
0:45:21 | the last |
---|
0:45:22 | period |
---|
0:45:23 | but i think they're actually making much more progress now |
---|
0:45:29 | and the thing is that a lot of work was done on consensus networks to |
---|
0:45:32 | make them work with the keyword search task |
---|
0:45:35 | originally they were developed by Lidia Mangu |
---|
0:45:38 | to basically |
---|
0:45:40 | you know do a last pass right before you gave your one best output |
---|
0:45:46 | and it was great for that but there were things that you could do to |
---|
0:45:49 | basically make it work a little bit better with keyword search |
---|
0:45:52 | and then morphology |
---|
0:45:56 | again this is community integration people who largely worked on text working with the speech community |
---|
0:46:02 | so there's a lot of tradeoffs between whether you wanna break |
---|
0:46:05 | break words up into little pieces which might be something that's great if you're doing |
---|
0:46:09 | text |
---|
0:46:10 | and not so great if you're doing speech so |
---|
0:46:14 | a lot of a lot of the integration of the teams |
---|
0:46:17 | is beginning to bear fruit there as well so |
---|
0:46:20 | it's quite interesting and a big thing that i think is really important is |
---|
0:46:24 | getting by with less |
---|
0:46:26 | so |
---|
0:46:26 | ten hours of training or less |
---|
0:46:29 | i haven't seen results with less |
---|
0:46:31 | but i certainly think it would be cool |
---|
0:46:33 | and no pronunciation lexicon |
---|
0:46:35 | so |
---|
0:46:35 | everybody promised to do decimation studies but |
---|
0:46:39 | to a large extent you know the program targets unfortunately |
---|
0:46:42 | seem to sometimes drive the research toward program targets as opposed to actually exploring the space |
---|
0:46:49 | of experiments so there is there is a there's a tradeoff between having annual evaluations |
---|
0:46:55 | and getting people to do research |
---|
0:46:56 | but i really |
---|
0:46:58 | really do hope that people will |
---|
0:47:01 | explore these conditions "'cause" i think they're really important |
---|
0:47:07 | so i'm ending up with a slide about the open kws |
---|
0:47:13 | here's the slide |
---|
0:47:15 | and you can see the timescale right so |
---|
0:47:18 | registration's gonna close at the end of january so if you're interested |
---|
0:47:22 | at all |
---|
0:47:24 | please |
---|
0:47:25 | do consider it |
---|
0:47:27 | the vietnamese language pack will be available for those of you who have not participated |
---|
0:47:31 | before |
---|
0:47:32 | the open kws people who have participated as long as they participate again can keep |
---|
0:47:37 | the data |
---|
0:47:38 | right so if you just keep participating you can actually keep all the surprise languages |
---|
0:47:42 | and hopefully nist opens |
---|
0:47:44 | up some of those languages by evaluating on them too |
---|
0:47:47 | so there's lots of data it's very useful |
---|
0:47:51 | there's a lot of things there that you could |
---|
0:47:54 | do with that data |
---|
0:47:57 | to support basic speech recognition and other types of speech research |
---|
0:48:03 | and hopefully by the time of the |
---|
0:48:04 | end of the program this will be released publicly to everybody |
---|
0:48:08 | since we have all the data alright |
---|
0:48:11 | but you can see |
---|
0:48:12 | the surprise language bill |
---|
0:48:14 | is gonna be sent |
---|
0:48:17 | a week or so before the |
---|
0:48:19 | evaluation begins when we send a password so there won't be any problem with the |
---|
0:48:24 | download |
---|
0:48:25 | the download's gonna be a little bit harder since we have the channel data which is |
---|
0:48:30 | not downsampled in any way |
---|
0:48:33 | because we figured |
---|
0:48:34 | that's an aspect of handling that data |
---|
0:48:37 | right and then people have the three weeks we send out the evaluation pack ahead |
---|
0:48:43 | of time as well since it's larger |
---|
0:48:44 | it's seventy five hours and some of that is channel data |
---|
0:48:48 | right and nist will send a password on april twenty eighth at which point |
---|
0:48:52 | people have a week |
---|
0:48:54 | to complete their submissions you can submit many things |
---|
0:48:58 | nist will keep an eye on things to make sure that submissions are sound and there's no |
---|
0:49:02 | problems |
---|
0:49:03 | there is a point of contact and so on so |
---|
0:49:06 | it should not be a very bad thing and the other thing is that |
---|
0:49:10 | there will be an open kws meeting where everybody will be expected to participate so |
---|
0:49:14 | there is sort of a burden there for people who might participate but |
---|
0:49:19 | i think that the meeting last time was very valuable and |
---|
0:49:22 | the babel folks were really very generous in sharing their insights so |
---|
0:49:27 | i think it's a great opportunity to hear about |
---|
0:49:31 | the work |
---|
0:49:32 | and be able to ask questions and interact with |
---|
0:49:35 | the babel participants so i think it's a really good thing to have the open |
---|
0:49:39 | kws |
---|
0:49:42 | and last but not least this is the wrap up with the vision slide |
---|
0:49:49 | this is this is one of the things you have to do in the pitch |
---|
0:49:53 | for the program |
---|
0:49:54 | i put a little asterisk there |
---|
0:49:57 | after languages covered |
---|
0:49:59 | but |
---|
0:50:01 | obviously it's nice to be able to say all but really there's the caveat that |
---|
0:50:05 | this has to be a language that has an orthographic transcription |
---|
0:50:08 | i have to say even just having an orthographic transcription does not make |
---|
0:50:13 | it easy |
---|
0:50:14 | to create a language pack and so some languages are really much more normalized than others |
---|
0:50:19 | i mean |
---|
0:50:20 | as much as we have done a lot of work in terms of normalizing english |
---|
0:50:23 | and there's a lot of spelling variants that happen it's a lot harder to do |
---|
0:50:28 | it in these other languages where there really aren't |
---|
0:50:31 | well studied conventions so |
---|
0:50:34 | there's the caveat because certainly |
---|
0:50:38 | you really do have to have the ability the capability of being able to clean |
---|
0:50:42 | up the language |
---|
0:50:43 | even when there is a presence on the web |
---|
0:50:47 | and the timelines we talked about them as well |
---|
0:50:50 | we're moving down to ten to forty hours |
---|
0:50:54 | working with variable recording conditions where they develop a |
---|
0:50:59 | system in a week |
---|
0:51:00 | the big the immediate impact has been language data where i've |
---|
0:51:05 | shared language data and had open evals |
---|
0:51:08 | that impacts the community and also helps the government |
---|
0:51:12 | new methods in speech search and speech systems is then sort of the medium impact |
---|
0:51:16 | and getting effective keyword search in new languages delivered quickly is the ultimate delivery so |
---|
0:51:21 | learning how to do that learning how to solve the problem of |
---|
0:51:26 | here's a new language now build the system is really the core |
---|
0:51:30 | principle of the program |
---|
0:51:32 | and everything really needs to be projected in that direction |
---|
0:51:36 | and ultimately there are lots of other ways to say well |
---|
0:51:39 | what if i only have a certain amount of time to transcribe |
---|
0:51:43 | we find that |
---|
0:51:44 | we can't do that very well programmatically but people can certainly investigate that right where |
---|
0:51:49 | they consider |
---|
0:51:51 | the time to the time to transcribe and clean things up in terms of |
---|
0:51:55 | selecting data that they can work with |
---|
0:51:58 | the nice thing is that there is that eighty hours of audio regardless of how |
---|
0:52:03 | much data you use and so there's a lot of room to investigate a wide |
---|
0:52:06 | variety |
---|
0:52:07 | of ways of getting by with less |
---|
0:52:09 | including getting by with no lexicon |
---|
0:52:12 | getting by without transcripts |
---|
0:52:14 | at all certainly there's more work like that going on in the program |
---|
0:52:19 | it may not |
---|
0:52:21 | perform |
---|
0:52:22 | at the |
---|
0:52:23 | level that the best systems do but i would say it's all equally important and |
---|
0:52:28 | vital to the program so having a wide variety of things going on i think |
---|
0:52:31 | is really important |
---|
0:52:34 | i'm done so if you have questions |
---|