I'm Mary Harper, and in October 2010 I went to IARPA to develop this program, Babel. I say "Babel," but there are lots of ways to pronounce Babel, and you can actually go to this website and find out about all the ways of saying it; a lot of people like this example. For me, Babel is /ˈbæbəl/: I grew up in Buffalo, New York, where that was the dialectal variant of choice. Of course, there's also the original Hebrew word and a variety of other ways of pronouncing it as well. And Morgan pointed out yet another pronunciation, one that, for some reason, I hadn't heard before.
Okay, so every program, whether it's at DARPA or IARPA, has a sort of back story; you have to have a motivation, sort of an elevator speech. My challenge is this: you're in a situation where you're dealing with a crisis. It might be an event where you have to work through a lot of noisy speech in order to resolve the situation, and you have thousands of hours of audio and no time to listen. You might have one or two people who could listen to it, but you're certainly not going to get through it in any time frame that would be reasonable for helping people. And if you have no existing speech technology for that language, you have a problem. But if you could rapidly develop that technology, say in a day or two, you actually might be able to do something.
It sort of addresses two gaps. It's hard to build up the human capital in a language, because that can take years, and typically we have only one or two people who know a given language; we see, even just in developing the resources, that we don't have this language capital. And there's also a technology gap. This slide was done a number of years ago, but of the three hundred ninety-three languages that have a million or more speakers, we've touched very few. We've really only studied a handful repeatedly; I mean, we study English all the time because it's easy, there are corpora, and so on. It can take way too much time, months to years, to build a new language, especially if you have to transcribe the audio. And the systems developed for English don't always carry over well to other languages: they can help with the bootstrap, but they certainly don't give you the kind of error rates that someone might want to see.
So the basic idea underlying Babel: rather than just evaluate word error rate, because the director of IARPA was very adamant that she wanted a real task, not just transcription, we settled on the keyword search task, which fortunately had had an evaluation in 2006. The basic idea is that you use speech recognition, or phone recognition, or something else to index the thousands of hours of audio, and then you have some way of putting in a query. For Babel we use orthographic queries, and those who are doing low-resource work do other things with the data to accommodate the fact that we use orthographic queries. Then we evaluate whether or not the keyword was correctly identified in the audio.
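To make the task concrete, here is a minimal sketch of that pipeline in Python. It builds an inverted index from time-stamped word hypotheses with posterior scores and answers an orthographic query by chaining word hits; the data layout and function names are illustrative assumptions, not any Babel system's actual interface (real systems indexed lattices or confusion networks, but the shape of the task is the same).

```python
from collections import defaultdict

def build_index(hypotheses):
    """hypotheses: iterable of (utt_id, word, start_sec, dur_sec, posterior)
    taken from the recognizer's output."""
    index = defaultdict(list)
    for utt, word, start, dur, post in hypotheses:
        index[word.lower()].append((utt, start, dur, post))
    return index

def search(index, query, gap_sec=0.5):
    """Orthographic query; multi-word queries require the word hits to occur
    in order, with at most gap_sec between consecutive words."""
    words = query.lower().split()
    hits = [(u, s, s + d, p) for (u, s, d, p) in index.get(words[0], [])]
    for w in words[1:]:
        extended = []
        for utt, start, end, score in hits:
            for u2, s2, d2, p2 in index.get(w, []):
                if u2 == utt and 0.0 <= s2 - end <= gap_sec:
                    extended.append((utt, start, s2 + d2, score * p2))
        hits = extended
    return hits  # putative occurrences: (utt_id, start_sec, end_sec, score)
```

Each putative hit is then thresholded into a yes/no decision, and it is those decisions that the metric described below scores.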
Our approach is really to work with a wide variety of languages, not just European ones; I think it's really important to study languages that cover a wide variety of properties, in real recording conditions as much as possible. Obviously the collections are going to suffer from the fact that when you go into these countries you may not be able to record in a highly reverberant room or something, but the hope is that you can get these sort of real-world recording situations. Then we constrain the resources in various ways: we actually collect a lot of data, but we create a wide variety of conditions for people to evaluate, and they can create conditions as well to answer questions that they think are important, like getting by without a lexicon. So we gradually reduce the amount of transcribed speech we give them for training, but we also give them the audio untranscribed. We are also reducing the amount of time that they have to work on the surprise language, and I think that's critical, as is not starting off with something that's impossible at the outset; actually getting people to the point where they can develop the technology is extremely important here. And we set the targets to be roughly a three-times improvement over what you get with phonetic search, which I think was critical. That was based on the STD '06 BBN results on Cantonese and Mandarin, where they got roughly 0.3 ATWV, so we set that as the target level.
So the goal is to improve speech technology with limited amounts of ground-truth data, only the speech; building systems for non-English languages is extremely important. It's about improving speech recognition through innovative use of the technology and different approaches, across a wide variety of languages, so that you can get fast development of keyword search systems to tackle this problem.
Just to give you a sense of the layout of the program: other than the base year, which ran a little longer than nine months because it was a fifteen-month period, the teams have roughly nine months to work with the data, and the collections are not necessarily all there on day one. Then the evaluation starts, where they have one month to do keyword search on the practice languages. We evaluate everything; it's really important to understand what progress is being made on the different languages, because the languages are all different. And then we give them a surprise language, where we deliver the pack of data (I'll talk about that a little bit later), and they have a certain number of weeks to build their system, which decreases over the periods: in the base period it was four weeks, and in the option one period they have three weeks. Then they have one week to return their keyword search results. You might ask why a whole week: there's a lot of research on search and evaluation methods and on how people handle the keywords, so it is important to leave a sufficient amount of time there as well.
The measure that we're using for performance is the Actual Term-Weighted Value, which was developed by NIST, I think in coordination with a number of sponsors of that evaluation. It captures a use case where you've got people who would like to be able to find things and who don't tolerate a great number of false alarms, so you wouldn't want to use F-score. The other consideration is rare terms, given the Zipfian nature of language, and the fact that rare terms may be very useful for finding things that are critical. I mean, "tsunami" might be a very common term in the traffic you're collecting, but it may not have been there in your training data, and you want to be able to find the "tsunami" instances in your audio. What you have to realize is that the metric is term-weighted: it's averaged over all terms regardless of their frequency, so a singleton term counts the same in the score as something that is highly frequent. Then there's the beta in the TWV formula: by and large, you've got this heavy weighting on the probability of false alarm. Systems typically have very low probability of false alarm, so you can see that there's a tradeoff between those two quantities, but missing something really does hurt the score when there are singletons. That's something you want to keep in mind as you look at the results that I'm going to go through.
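For reference, the metric just described can be written down compactly. The sketch below follows the published NIST definition: TWV is one minus the average, over all keywords that actually occur in the audio, of the miss probability plus beta times the false alarm probability, with beta = 999.9; ATWV is this value at the system's actual decision threshold. The function name and dictionary layout are mine, for illustration only.

```python
BETA = 999.9  # NIST's weighting, derived from a cost/value ratio of 0.1
              # and an assumed keyword prior of 1e-4

def atwv(keyword_counts, t_speech_sec):
    """keyword_counts: dict kw -> (n_true, n_correct, n_false_alarm),
    counted at the system's actual yes/no threshold. Keywords that never
    occur in the audio are excluded from the average, which is why a missed
    singleton costs as much as missing every token of a frequent term."""
    terms = []
    for n_true, n_corr, n_fa in keyword_counts.values():
        if n_true == 0:
            continue
        p_miss = 1.0 - n_corr / n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_fa / (t_speech_sec - n_true)
        terms.append(1.0 - (p_miss + BETA * p_fa))
    return sum(terms) / len(terms) if terms else 0.0
```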
The Babel program has a number of dimensions in terms of the people working on it. Obviously the program wouldn't exist without data, and Appen has been the data collector from day one; I actually talked with them and proposed the notion of the data collection when I came on the job. Then we have the test and evaluation team, which is what T&E stands for. It's important to realize that you can have NIST run an evaluation, but they need technological support to set up an evaluation and an approach like this. So we have a team that actually builds systems so we can do forced alignments and things like that; MITRE works on some of the logistics; and CASL provides needed help with linguistics. They advise me on a number of dimensions, certainly on getting good phonetic coverage across the languages and getting good diversity of languages; it would be really hard for me to do that on my own, since I don't know all these languages. The other thing is that there is a sort of teaming between the T&E team and Appen to ensure that the quality of the data is appropriate for the task we're doing: keyword search is not something Appen had supported before, and transcription or lexicon problems really do make keyword search challenging to evaluate, so we also do T&E quality checks offline; I'll talk a little bit about that. And then we have four teams, with the primes on the left: CMU, IBM, ICSI, and BBN are the primes, and you can see all the people who participated in the base period. Sometimes there's some reconfiguration, but this is the picture as it was at the time of the base period, so Mobile Technologies is still in there.
So, lots of work. I think there are sixteen papers here that were supported by Babel, and if you go back through ICASSP and Interspeech over the past couple of years, I think there are probably a hundred papers or so that have been sponsored by Babel, all with great work. I want to point out that as I go through this I don't have time to touch on all the work or all the cool things people are doing; I'm just going to present a selection, sort of the interesting lessons learned, and there are a lot of other things people are doing that are quite interesting. I'm also going to point out how we changed things for the option period, and the kinds of things that look like real glimmers of hope. I'm not going to exhaust the research; you'll be able to see it at future conferences.
The data collection is actually quite daunting. We're collecting the data in batches, with roughly seven languages being collected at a time. We only needed four practice languages and one surprise language for the base period, but we collected seven, and it was a good thing we did: what we had planned to use as the development language and the surprise language were Assamese and Bengali, with Assamese supposed to be the surprise, but things went wrong with those collections, and so we basically had to use the other five languages. So it is really important to over-collect relative to your needs at any particular time. The amount of time it takes to collect seven languages, given that you stagger the kickoffs, is roughly two years, so you can see there are these two-year overlapped periods. It is really interesting: right now we're working on the next set of languages and getting ready to send funds for the set after that, so this really is the critical period for making sure the rest of the program plays out. You can see there is an increasing number of languages in each period; subtract one for the surprise language and you can see how many are being used for practice. So you can imagine that by the time you hit the end of the program, multilingual systems are going to be really well supported. We have a variety of criteria for selecting languages, which I'll talk about a little more on the next slide. Most of these languages are multi-dialectal, and they also represent a wide variety of recording conditions; starting in the option period we also began collecting a microphone channel, and the evaluation data include surprise environments or channels. So there is always something unexpected in the evaluation. It's not a large fraction of the data, but it is there so people can assess whether their methods are working on these things.
We pick languages from a variety of language families with different features: phonotactic, morphological, syntactic, and so on, and whether or not there are tones. They are collected in-country, which I think is really important, so you're dealing with a wide variety of telecommunications situations. There's dialectal variation and a wide variety of environments. The easiest environment tends to be the home or office one, where there's a landline or a mobile phone; it's not always a landline in some of these countries now, since the landline is disappearing in some of the collections we're doing. Probably the hardest place in Babel is the car: the car collect tends to be one of the harder ones. And then there are others; obviously you want to have non-telephone-channel data in there as well. As for metadata balance, we do provide the metadata with each of the audio files, so the collection could ultimately be used to support dialect ID, language ID, or other things. You want to collect this data in such a way that it can be used for a variety of purposes.
We start off doing a risk assessment; obviously you don't want to go into a country where there's a likelihood that people will die while doing the collection, so you have to take that into consideration. We also have to consider whether or not there is the potential to get transcribers and people who know something about the language; all of those things are certainly taken into account. Then we begin the work of vetting a language, where we work on what Appen calls a language specific peculiarities document. It typically involves providing the phoneme set that is going to be used by Appen, and a variety of other things: something about the dialects, which primary dialect they would standardize on, and, for example, conventions that some people use and some people don't. It's a living part of the process, so we keep it going; it provides the start of the lexicon, and also some sample sentences, which are very useful. Then there's a small database of transcribed conversational speech that they send to us, which is reviewed by CASL and others to make sure the transcription quality is reasonable. Sometimes we also get a lexicon to look at and provide feedback on, and that affects things. Then we receive an interim delivery, which is about three hours of conversation, and we start looking for spelling variants, words whose written forms are diverging, because you can use the lexicon to help you spot these together with some language experts. We try to clean that up, so spelling normalization is something we do. Perhaps it introduces a certain amount of artificiality, but it certainly is important to do, and I can tell you it's not going to be a hundred percent accurate; it's being done with a certain amount of limitation on the resources available.
Finally we get the big delivery, and that's reviewed and partitioned into training, dev, and eval. Every collection is treated as if it were a surprise language, where we use seventy-five hours for the evaluation set; for the development languages, the practice languages, we only use fifteen. So in many cases we have a lot of leftover audio that we just don't pass on. We also develop keywords using a certain amount of the data; we have them annotated by Appen so that we can assign types and so on, giving us a certain notion of balance among the keywords: we make sure we come up with a certain number of names and so on, so that there's balance in the test. We also look at the segmentation: the segments Appen provides can be very large, so we re-segment using voice activity detection, and those segments are passed back to Appen for a quality judgment, where they are compared to the original segments. Then we do forced alignments on the dev and eval sets and give the forced alignments to the performers.
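The program's actual re-segmentation tooling isn't described here, but the idea is simple enough to sketch: cut long recordings into speech-bearing chunks. Below is a toy energy-based version in Python; all thresholds and names are illustrative assumptions, not the pipeline's real parameters.

```python
import numpy as np

def energy_vad_segments(samples, rate, frame_ms=25, hop_ms=10,
                        threshold_db=-35.0, min_speech_s=0.3, max_gap_s=0.5):
    """Toy energy-based VAD. samples: mono float audio in [-1, 1].
    Returns (start_sec, end_sec) pairs covering the speech regions."""
    frame = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    segments, current = [], None
    for i in range(0, max(len(samples) - frame, 0), hop):
        window = samples[i:i + frame].astype(np.float64)
        level_db = 10 * np.log10(np.mean(window ** 2) + 1e-12)
        t = i / rate
        if level_db > threshold_db:
            if current is None:
                current = [t, t + frame / rate]   # open a new segment
            else:
                current[1] = t + frame / rate     # extend the current one
        elif current is not None and t - current[1] > max_gap_s:
            if current[1] - current[0] >= min_speech_s:
                segments.append(tuple(current))
            current = None
    if current is not None and current[1] - current[0] >= min_speech_s:
        segments.append(tuple(current))
    return segments
```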
These are the period one languages. We began with Cantonese, Pashto, Tagalog, and Turkish, which were pretty risk-free languages, and then we tested on Vietnamese. Remember, Vietnamese was not supposed to be the surprise language, and it ended up being somewhat challenging, in that for Cantonese the collection provided word boundaries, but for Vietnamese it was just the syllables, so things tend to be short words. They also did a not-so-bang-up job of including all the dialectal variants of the pronunciations, which I think probably also caused problems. But as a resource it's a great one if you're interested in understanding the Vietnamese dialects. You can see the number of dialects per language: Cantonese has five, Pashto four, Tagalog three, Turkish seven, Vietnamese four. For instance, the Cantonese dialects were probably pretty hard for some people to understand, so at the beginning, when we used the data, there was some question about whether those dialects were really Cantonese, but they were.
When we evaluated this, NIST developed an evaluation plan, and there were three conditions for the language resources that could be used. There's the basic language pack condition, base LR: I use the resources I'm given and nothing else. There's the Babel LR condition, where you can use any of the Babel language packs you have available, which is very nice for multilingual work. And then, if you want to bring in other, non-Babel resources, the other LR condition lets you do that: for example, if you want to bring in web text or a pronunciation lexicon, or if you have some found data. Then there's the amount of training data used within those conditions: you could either use the eighty hours of conversational training together with the scripted data, or you could use the limited condition, which uses ten hours of transcription sub-selected from the eighty hours, so it's a proper subset of the eighty-hour set. And then there are two conditions for handling the keywords. In the no test audio reuse condition, you build your keyword search system without knowledge of the keywords and then basically run the search on those keywords; you're not allowed to re-decode or retrain or anything like that with knowledge of the keywords. Obviously you're going to decode, but you cannot take knowledge of the keywords into consideration. The test audio reuse condition means you have knowledge of the keywords: you can do things like automatically add them to the lexicon and do creative things with the language model, and if you're in the other LR condition you could even go out and look for language model data, and so on. So there's a lot of variability here. In the option period we've actually changed things up a lot, so that people can declare the resources they use, and there are a lot of interesting new conditions that performers can come up with and are coming up with. This was the start, but I think there is certainly going to be a lot more variability in the experiments people do in the future.
Another innovation that came out of the program: since we're evaluating so many languages, and we don't want to prevent people from running experimental conditions, NIST developed a scoring server. This allows researchers to submit and get evaluated against the test data. We don't release all the test data after the test; we release some portion of it, and there is a sequestered part, but through the server you can still evaluate against the full test set. I think that's really important: if you're writing a paper ten months after the evaluation and you want to go back and re-evaluate, or you've discovered something new and want to test your hypothesis on the past languages, you can do that and still get scores on the full test set. I think that's very important, and I really think it's going to make a lot of difference in terms of the pure science the program can support.
Jon Fiscus put together this plot for the open evaluation. It shows submissions over the weeks of the program, and you can see where there are spikes, rapid increases in the cumulative number of submissions. But notice that even after the evaluation is over, especially for Vietnamese, people kept submitting, because Vietnamese was somewhat challenging and some people wanted to continue the work, and the same holds for a number of other languages. The results get back to you as soon as NIST confirms everything is okay; there's an intermediate point where they make sure everything is working properly, so it usually takes about a week before the first results come back, but after that they come quickly. Then people can report them openly.
In the first period people attacked the data and did a lot of creative things. People submitted primary and contrast systems, and for the most part the primary submissions were system combinations; we'll talk a little about system combination, because it really does seem to help, except for Swordfish. All performers were able to make the program targets in all languages, including the surprise language, using the full language pack, and that in the base language resource condition with no audio reuse. And of course there are other conditions where you could potentially do better. Program targets were even exceeded with only ten hours of training on several of the five languages by some teams, usually using system combination. System combination reduces the token error rate and increases ATWV compared to single systems, but even single-system, full language pack systems made the program target. All systems, of course, have very low probability of false alarm, so lowering the miss rate plays a significant role in increasing ATWV; that's something you want to keep in mind.
There were several collection factors that actually affected ATWV: language, dialect, environment, and gender, and I'm going to show you some pooled results that I think are sort of interesting. I don't think we've shown these even to the performers; I actually put this together for my program review. I'm not sure whether these slides will be posted; actually, they're probably not posted. You can see that the base LR, full language pack results are all marked in red, and while not everybody submits to every condition (the only required one was the full language pack, base LR), you can see people made their targets in all the languages.
Gender affects ATWV and, what was kind of interesting, word error as well. In this set of collections the systems did better with female speech, which is kind of interesting, though not in all the languages, and sometimes by a lot: look at Tagalog, for example, where the males are so much worse. I don't know why; I mean, we collect around two thousand speakers per language, and I'm sure there are interactions with other factors. Environment is important too. You can see that overall, pooling over all systems, you get an average of 0.51 ATWV. The car and the unexpected environments are sort of the worst; the landline and mobile in the home office setting are sort of the best; and the public place and street are somewhere in between, and typically those are probably collected with cell phones.
But when you look across languages, and this is kind of a messy slide, the car data is significantly worse for Pashto and some of the others. Pashto was obviously a harder language overall, but there's something going on there. And it is kind of interesting: you look at Turkish and the landline is wonderful; well, they probably have a much more stable landline environment, while in some of these countries landlines may be rare, so maybe for Pashto the cell phone was the predominant thing. What I didn't give you is the breakout of the distributions. Dialect and ATWV also interacted. This is Pashto for the four teams, and you can see Northeast, Northwest, Southeast, and Southwest. Southwest was really under-represented; that became clear partway through the collection. But you can see people could still do something with it. Some of these dialects are related, but certainly the ones with the most data did the best, and the ones with the least data were sort of the worst, and that was true across the board. Certainly dialect adds a dimension of challenge to the data.
(I think it's something specific to this room; somehow or another I'm getting an echo.)
So what helps? Well, early on it was clear, especially with the Cantonese data, that you've got to re-segment the data and do silence modeling to get rid of the silence, or you kind of screw things up. Robust multilingual MLP features were really important, I think; they really played a major role. Deep learning started to shine in the program very early, and I think there's lots and lots of room for it to keep shining and for very interesting experiments. Pitch features across languages were useful, at least for most people, and what's kind of cool about that is that it gives hope for more universal feature extraction. One of the things that was really extremely important was to develop methods for preserving potential hits, the search alternatives, and there are a variety of ways of doing that, including denser lattices and smarter ways of doing the queries; there are a number of papers here, and at other venues, that you can read on this topic. Then, combining systems, especially with limited training data, really matters a lot; it matters whether you build the systems differently or just randomly seed them differently, but system combination is very useful. Semi-supervised training is very helpful for acoustic models and features. And score normalization plays a big role: if you do nothing else, score normalization gives you a lot.
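Since score normalization comes up repeatedly in what follows, here is a minimal sketch of the keyword-specific, sum-to-one style of normalization used in this literature: each keyword's detection scores are rescaled relative to the other detections of the same keyword, so one global decision threshold behaves sensibly across frequent and rare keywords alike. The sharpening exponent gamma and the data layout are illustrative assumptions.

```python
def normalize_keyword_scores(hits, gamma=1.0):
    """hits: dict keyword -> list of (location, raw_posterior).
    Rescales each keyword's scores to sum to one (after an optional
    sharpening exponent), so a single global threshold can be applied."""
    normalized = {}
    for kw, occurrences in hits.items():
        total = sum(score ** gamma for _, score in occurrences)
        normalized[kw] = [
            (loc, (score ** gamma) / total if total > 0 else 0.0)
            for loc, score in occurrences
        ]
    return normalized
```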
So, I could report a number of things; I just picked a smattering. Typically the reason I picked something was not an endorsement per se, but largely because there was some picture that fit with the point I was trying to make. Several of these have papers appearing here, so I cite those where I could line up the result; some of these results are things I got from site visits rather than from papers, because I had to prepare the talk a while ago.
But you can see: stacked bottleneck features versus first-stage bottleneck features get you an eight percent reduction in word error, and a concomitant improvement in ATWV. Adding fundamental frequency and probability of voicing reduces word error; this was on Vietnamese, I believe. Regenerating the neural network targets added a percent, and semi-supervised training helped a lot too. And those were all additive, so: very cool.
So features are very important, and deep learning is very helpful. We have a comparison here between shallow and deep networks, and you can see the shallow versus the deep ATWV: a two to three percent absolute improvement. This was using the Kaldi tandem SAT fMPE full language pack models.
Pitch helps even for non-tonal languages. This result is from Dan Povey, who has been playing around with pitch features because he was very unhappy with how Kaldi performed with pitch on Cantonese and Vietnamese, so he's done a lot of interesting work there. You can see that when they add the SAcC pitch features, the score sometimes goes up, and it goes down a little bit for Bengali; but his method, which he incorporated into Kaldi, gives an improvement on all those languages. Vietnamese and Cantonese are tonal, but you can see the non-tonal languages, Assamese and the like, benefit as well, and certainly a lot of other people with similar problems have this kind of result.

Large lattices help, up to a point.
This is a DET-style plot, where random is up in the upper right corner and the further down you go the better; the curve shows the operating performance in terms of the tradeoff between probability of false alarm and probability of miss, so being further down really matters. You can see the green line is produced with small lattices and the purple line with larger, even enormous, lattices. Eventually there are diminishing returns, but certainly preserving the stuff you want to find is extremely important.
Knowledge of the keywords helps. You can see it helps even more with the limited language pack, where you might not know about those words from the ten-hour subset. If you know about the keywords, you can leverage that knowledge in interesting ways, like not pruning things away; you always want to keep the probabilities right, but you might want to set specific beams for specific words. One team developed a white list approach under the audio reuse condition, and you can see the effect here: with knowledge of the keywords before decoding, they get a recall of keywords of about ninety-two percent; without knowledge of the keywords it's seventy-four percent. You can see there's a big difference in ATWV, the number of hits per keyword is much higher, and the number of keywords without hits is much lower. But even if you simply look at infrequent words that may be important, just boosting them in the language model actually gives you something in terms of preserving those keywords, with recall somewhere in between. And that's beneficial: it's preserving things so that you don't prune them out.
And when you look at system combination (I think system combination is about preserving stuff, too), you get big gains. This slide shows the best single systems and the combined systems, on both a full language pack and a limited language pack, and you can see that, except for Pashto, system combination gets you to about 0.3 ATWV, which is pretty amazing: high word error rates, but you can actually make the target. Amazing.
Here's another picture of system combination, where you can see the individual systems built with various features and models, DNNs, bottleneck features, and so on, and then the combination. Note that these are limited language pack results as well, so you're going to see much more modest scores.
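The mechanics of this kind of combination are easy to sketch: merge the per-keyword detection lists from several systems, treat detections that overlap in time as the same putative hit, and sum their (optionally weighted) scores, so hits found by several systems float to the top. This is an illustrative CombSUM-style merge under assumed data layouts, not any particular team's recipe.

```python
def combine_systems(system_hits, weights=None, overlap_sec=0.5):
    """system_hits: list of dicts keyword -> [(utt_id, time_sec, score), ...],
    one dict per system, with scores already normalized per keyword.
    Returns a merged dict in the same format."""
    weights = weights or [1.0] * len(system_hits)
    combined = {}
    for hits, w in zip(system_hits, weights):
        for kw, occurrences in hits.items():
            merged = combined.setdefault(kw, [])
            for utt, t, score in occurrences:
                for entry in merged:
                    # Same utterance, close in time: same putative hit.
                    if entry[0] == utt and abs(entry[1] - t) <= overlap_sec:
                        entry[2] += w * score
                        break
                else:
                    merged.append([utt, t, w * score])
    return {kw: [tuple(e) for e in v] for kw, v in combined.items()}
```

Note that, in line with the normalization result discussed below, the per-system scores here are assumed to be normalized before the merge rather than after.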
Right: good normalization. This is the BBN result on dev and eval per language, for Cantonese, Pashto, Turkish, Tagalog, and Vietnamese, and you can see normalization gives you a significant improvement. It's not always the same size on the dev and the eval set, so there's some impact of the particular set, but you can see that normalization, and doing it well, is certainly a big part of the program, and there are a lot of methods people are working on now, including rescoring approaches.
The other interesting result, which I believe appears here as a poster (I couldn't fit all the authors' names and keep it readable, so I put "et al."), is that when you normalize is very important. You've got the contrast between the no audio reuse and the audio reuse conditions, but you can look at either row. If I normalize after system combination, I only get so far; but if I normalize before I do system combination, I do really well. And if I normalize after the best tokenization, before score combination, I can basically build a single system that is better than what you produce by normalizing late rather than early. So if you're doing combinations of various representations, it's important to get the scores into the same space first. It really is important; it makes a big difference. And quite frankly, a single system is going to be much easier to run, so that's a useful thing to know.
Another paper that appears here touches on analysis: the effect of thresholds on ATWV is an interesting thing to look at. You can compare a fair threshold, one set purely from my notion of what I can do based on the data I have for development, versus setting the threshold to the optimal value for each keyword. And then, if I play around and make sure that I keep the things that matter and throw away the things that don't, I basically push the probability of a hit toward one and my probability of a miss toward zero, and you can see that the probability space is also playing a major role in your ability to get the keywords. It's not just a matter of calibration: getting better probabilities seems to be an important aspect as well.
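The per-keyword threshold referred to here falls straight out of the ATWV definition: a detection with posterior p is worth keeping when its expected gain, p divided by the keyword's true count, exceeds its expected cost, beta times (1 - p) divided by the number of non-target trials. Solving for the break-even p gives the rule sketched below; since the true count is unknown at search time, a common trick in this literature is to estimate it as the sum of the system's posteriors for that keyword. Names are illustrative.

```python
BETA = 999.9  # same weighting as in the ATWV definition

def keyword_threshold(posteriors, t_speech_sec):
    """posteriors: the system's detection scores for one keyword.
    Estimate n_true as the sum of posteriors, then return the
    break-even decision threshold for that keyword."""
    n_true = sum(posteriors)  # expected number of true occurrences
    if n_true == 0:
        return 1.0  # nothing expected; accept nothing
    return (BETA * n_true) / (t_speech_sec - n_true + BETA * n_true)
```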
So there are a lot of interesting things people can look at, and analysis, I think, is a very important aspect of the program: understanding why something works and why something doesn't. Learning why something doesn't work is not such a bad thing; it basically buys you a piece of knowledge that really matters for solving the problem.
We also held an open keyword search evaluation in 2013 on Vietnamese, and we had a lot of participants: the four Babel performers plus eight outside teams who ended up submitting systems, and I list them here. We had eight wonderful volunteers who actually participated in the OpenKWS meeting, and the results are all over the place; I put them up there anyway, and they are posted, so you can go look at the OpenKWS results. If people want to participate in the next one, maybe they won't feel so shy about the possibility of submitting something that may not be super; certainly the Babel people have a lot more practice with the data. But you can see the scores were all over the place, people really did a lot of interesting things, and there were high-resource approaches as well as low-resource approaches.
In period two we added six languages: five practice languages and one surprise. The teams only get sixty hours of transcribed training, although they do have the remaining twenty hours untranscribed; there's also the ten-hour training set, and they have to exceed the program targets now on both conditions, because they got so close. Also, approaches that use things like morphology and so on might help more in the ten-hour condition; maybe the sixty hours, or the eighty, is a little too large to show it. And then they'll have three weeks to build the surprise language. The languages are Bengali and Assamese, which were collected in the first period and don't have a second channel, they're pure telephony; then we have Zulu, Haitian Creole, and Lao; and of course we have a surprise language, which I'm not going to reveal here. Assamese and Bengali are, I think, somewhat okay, but Zulu appears to be quite challenging and Haitian Creole appears to be quite simple, and these are aspects of the languages; I don't think they're aspects of the collection.
And then Lao will have its own challenges, because again we couldn't annotate the compounds reliably, and so the Lao words, as opposed to the borrowed words, are not multisyllabic; they are single syllables.
CASL put together some of the challenges of these languages and presented them, and I thought that was interesting. There's the notion of shared language models, where you can share between Bengali and Assamese. Assamese doesn't have much of a web presence, so that's an interesting issue, as is borrowing from French resources for the Haitian Creole. On the phonology, there are tones in Lao too; Lao has tone kind of like Cantonese and Vietnamese, but its tone system is very different. Unfortunately, tone marking in the lexicon could not be done reliably, so it didn't make sense to put it in the resource. You also have some segmental phonology issues in Bengali, and morphology issues in Zulu, big time, maybe more so than in the Bengali: the Zulu OOV rate is higher than in any of the languages we've seen, including Turkish, which didn't really have a terrible OOV rate. And then there are other aspects that linguists might be interested in looking at, like the scripts. The Bengali and Assamese scripts are very similar; strictly speaking Assamese has its own script, but it really is nearly the same as the Bengali one. And then you have Lao, which has yet another script. There's a lot of code switching in Haitian Creole, and in the Zulu as well, certainly, so those can be problems. And then there are a lot of short words in Haitian Creole and Lao; I guess the shorter words could hurt Haitian Creole... well, maybe not.
So, exciting directions people are going in. One of the things we want is more analysis, and so we revised the evaluation plan; it's posted at the OpenKWS site, and you can take a look if you want. The idea is that people can evaluate a lot more conditions and then share those conditions with each other, so that others can evaluate likewise. There's a lot of work going on in multilingual processing; it's very intriguing and very interesting. And the deep learning work: those neural net models certainly seem to play a role in the progress people are making. Machine learning got a somewhat slow start, because you're trying to integrate that community into the speech community, but they're beginning to take off too, so stay tuned; I think a lot of interesting things are going to happen.
Smart lattices and consensus networks were beginning to play a role at the end of the last period, and I think they're actually making much more progress now. The thing is, a lot of work was needed to make consensus networks work with the keyword search task. Originally they were developed by Lidia Mangu and colleagues basically to do a last pass right before you gave your one-best output, and they were great for that, but there were things you had to do to make them work a little better for keyword search.
And then morphology: again, this is community integration, people who largely come from text processing working with the speech community. There are a lot of tradeoffs around whether you want to break words up into little pieces, which might be great if you're doing text but isn't so great if you're doing speech. A lot of the integration across the teams is beginning to bear fruit there as well, so it's quite interesting. A big thing that I think is really important is getting by with less: ten hours of training or less (I haven't seen results with less, but I certainly think that would be cool), and no pronunciation lexicon.
Everybody promised to do ablation studies, but to a large extent the program targets unfortunately seem to sometimes drive the research toward the targets rather than toward actually exploring the space of experiments; there is a tradeoff between having annual evaluations and getting people to do research. But I really do hope people will explore these conditions, because I think they're really important.
So I'm ending with a slide about the OpenKWS. The slide is busy, but you can see the timescale: registration is going to close at the end of January, so if you're interested at all, please do consider it. The Vietnamese language pack will be available for those of you who have not participated before, and the OpenKWS participants who have taken part before can keep the data as long as they participate again. So if you just keep participating, you can actually keep all the surprise languages, and hopefully NIST will open up some of those languages by evaluating on them too. So there's lots of data, and it's very useful; there are a lot of things you could do with that data beyond supporting basic speech recognition and other types of speech research. And hopefully by the time we hit the end of the program this will be released publicly to everybody, since we own all the data.
You can see the surprise language build: the data will be sent a week or so before the evaluation begins, and then we send a password, so there won't be any problem with the download. The download is going to be a little bit harder this time, since we have the microphone channel data, which is not downsampled in any way; we figured handling that is an aspect of dealing with the data. Then people have the three weeks. We send out the evaluation pack ahead of time as well, and it's larger: it's seventy-five hours, and some of that is channel data. We'll send the password for it on April 28th, at which point people have a week to complete their submissions, and you can submit many things. NIST will keep an eye on things to make sure the submissions are sound and there are no problems, and there is a point of contact and so on, so it should not be a very painful thing.
The other thing is that there will be an OpenKWS meeting in which everybody is expected to participate, so there's a bit of a burden there for people who might take part. But I think the meeting last time was very valuable, and the Babel folks were really very generous in sharing their insights, so it's a great opportunity to hear about the work and to be able to ask questions and interact with the Babel participants. I think the OpenKWS is a really good thing.
And last but not least, this is the get-off-the-stage slide; this is one of the things you have to do in the pitch for the program. I put a little asterisk there on "all languages covered," because obviously it's nice to be able to say "all," but really there's the caveat that this has to be a language that has an orthographic transcription. And I have to say, even just having an orthographic transcription does not make it easy to create a language pack; some languages are much more normalized than others. As much as we have done a lot of work on normalizing English, and there are still a lot of spelling variants, it's a lot harder to do in these other languages, where there really aren't well-studied conventions. So I'll star that caveat, because you really do have to have the capability of cleaning up the language, even when it has a presence on the web.
And the tone issues, we talked about those as well. We're moving down to ten to forty hours, working with variable recording conditions, with systems developed in a week. The big immediate impact has been language data: we've shared language data and held open evals, and that impacts the community and also helps the government. New methods in speech search and speech systems are sort of the medium-term impact, and getting effective keyword search in new languages, delivered quickly, is the ultimate deliverable. Learning how to do that, learning how to solve the problem of "here is a new language, now build the system," is really the core principle of the program, and everything really needs to be projected in that direction.
Ultimately there are lots of other questions to ask, like: what if I only have a certain amount of time to transcribe? We find that we can't mandate that very well programmatically, but people can certainly investigate it, considering the time to transcribe and clean things up when selecting the data they work with. The nice thing is that the eighty hours of audio are there regardless of how much transcribed data you use, so there's a lot of room to investigate a wide variety of ways of getting by with less, including getting by with no lexicon or without transcripts at all. There certainly is more work like that going on in the program. It may not reach the performance the best systems achieve, but I would say it's all equally important and vital to the program; having a wide variety of things going on, I think, is really important.
I'm done, so if you have questions...