So, for this past LRE I'll be presenting the MIT Lincoln Laboratory side of the presentation for the submission.
You know, we had a number of systems, so I'll focus more on the systems themselves, with a little bit of analysis and even some listening toward the end of the talk.
So in general I basically want to talk a little bit about the kinds of systems we looked at, and then which ones we ended up using on the evaluation itself, that is, the primary submission. Then a bit about the development data: how we ended up using that data, and the things we looked at to try to augment it. Then some of the evaluation results; like everybody else, we were kind of surprised when we first saw what happened on the evaluation compared to what we had seen on the development set. And then I'll close with some of the reasons and conclusions.
So in terms of systems, we looked at well over ten systems. Not surprisingly, the systems we looked at were either i-vector systems or DNN bottleneck systems in some way. On the more conventional i-vector subset of systems, we had an SDC-type system with cepstra, and then we also had a system that was basically the same system with pitch added to it. Then we had our set of DNN systems, with bottleneck features plus pitch, and with modeling of DNN posteriors. Then we even looked at things like an MMI system, kind of an older system as well, but in that case using bottleneck features instead of the conventional features we had used in the past.
For the open task, and let me emphasize this quite a bit, we also tried the multilingual system, where we used five of the Babel languages. And we also had a few other systems that were maybe on the slightly more exotic side: we had a kind of unit-discovery system, along the lines of what was described earlier, and we also had this DNN-counts multinomial model system, which is something that I think is going to be talked about a bit more in a later talk.
And it turns out that for calibration we really didn't do anything new that we hadn't done over the last few evaluations, so there wasn't really anything new on that side.
Next I want to talk a little bit about the development data. As you've probably heard by now, we had the supplied development data. We did a little bit of work on a variety of ways of augmenting that data, and in the end there wasn't really a whole lot that worked on the side of augmenting it. We basically ended up having kind of two views of the data: we had the full segments, I mean the full utterances, plus segments we derived from that same data, so we counted the data twice, except we got some sort of duration variability out of it. Of the other things we tried, like doing some warping or changes in the spectrum, none really seemed to help performance in the end.
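To make the two-view idea concrete, here is a minimal sketch of deriving shorter cuts from a full utterance; the cut durations and the function name are illustrative, not the team's actual tooling.

```python
def derive_subsegments(samples, rate, chunk_secs=(3.0, 10.0)):
    """Cut one full utterance (a list of samples) into non-overlapping
    shorter pieces. Training then sees the data twice: once as the
    full utterance and once as short cuts with duration variability.
    The chunk lengths here are assumed values for illustration."""
    pieces = []
    for secs in chunk_secs:
        step = int(secs * rate)
        # non-overlapping windows; any trailing remainder is dropped
        for start in range(0, len(samples) - step + 1, step):
            pieces.append(samples[start:start + step])
    return pieces

# a 30-second utterance at 8 kHz yields ten 3 s cuts and three 10 s cuts
utt = [0.0] * (30 * 8000)
cuts = derive_subsegments(utt, 8000)
```

Each derived cut would then be fed through the same front end as the original utterance.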
One thing we did: we did not retrain our whole systems, but we did retrain the backend stage. So basically we kept the front-end systems fixed on the data we had been developing with, but we did retrain the backend with basically one hundred percent of the data. That was, by the way, mainly for the fixed condition.
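The keep-the-front-end, refit-only-the-backend idea can be sketched as follows; the per-language mean model scored by inner product is a hypothetical stand-in for the actual backend, which the talk does not specify.

```python
def retrain_backend(ivectors, labels):
    """Refit only the backend on 100% of the dev data; the front end
    that produced `ivectors` stays frozen. Here the backend is simply
    a per-language mean vector (an illustrative assumption)."""
    grouped = {}
    for vec, lab in zip(ivectors, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: [sum(v[i] for v in vecs) / len(vecs)
                  for i in range(len(vecs[0]))]
            for lab, vecs in grouped.items()}

def classify(means, vec):
    """Pick the language whose mean has the largest inner product."""
    return max(means,
               key=lambda lab: sum(m * x for m, x in zip(means[lab], vec)))
```

The point of the sketch is only the division of labor: the expensive front end is untouched, and the cheap backend is re-estimated on all available data.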
For the open set, of course, like everybody else, we looked at all the sources we had available, and of course there were plenty of sources out there. In the end, the only system that really benefited from having this additional data was the multilingual one. So basically most of the systems we used on the open set were the ones we had developed on the fixed condition, except of course the multilingual one, which needed all the extra data.
One thing I want to talk a little bit more about, and get into some specifics on, is that during that development we noticed that using all the data we had available actually did not help performance. So after doing some kind of early experiments, we decided to only add data in a few of the languages, and I'm going to talk about whether that was the best decision or not.
So, at least on the dev set, we did see that adding data for those languages helped the performance. In terms of the dev results we saw, this addresses both the cluster average and the detail, and what happened between the fixed set and the open set.
What this kind of shows is that, for the most part, Chinese and Iberian were the toughest ones on the dev set, but the performance in general seemed reasonable, so we were pretty happy with it; on average we were somewhere in the 0.10 neighborhood. The other observation here, I think, is that on the open set we did see a little bit of improvement over the fixed condition. Maybe we didn't see as much as we could have expected, but we saw some improvement, so that was also reassuring.
Now I want to talk a little bit about the evaluation results. Well, on the evaluation results, the bad part: we got this big discrepancy between what we saw during development and what we saw on the evaluation set. You can see the gap right away, and I'll get into some of the reasons later. We ended up submitting a five-way fusion of systems: we had the unit-discovery system, we had the counts system, we had the bottleneck-feature systems, and we had the pitch, kind of conventional, system that we trained. The performance that we ended up obtaining was a Cavg of a little bit below 0.18. And of course I'm also showing here this idea of what happened with the French cluster, contrasting both the performance we had as a whole and the performance when not counting the French cluster.
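For reference, the average-cost number quoted here has roughly this shape; the sketch below works from hard decisions with equal miss/false-alarm costs and a 0.5 target prior (all assumptions on my part), and is not NIST's actual scoring tool.

```python
def c_avg(trials, languages, p_target=0.5):
    """Simplified LRE-style average cost. For each target language,
    combine its miss rate with the average false-alarm rate measured
    against each other language, then average over languages.
    `trials` is a list of (true_language, decided_language) pairs."""
    n = len(languages)
    total = 0.0
    for tgt in languages:
        tgt_decisions = [d for t, d in trials if t == tgt]
        p_miss = sum(d != tgt for d in tgt_decisions) / len(tgt_decisions)
        p_fa_sum = 0.0
        for non in languages:
            if non == tgt:
                continue
            non_decisions = [d for t, d in trials if t == non]
            p_fa_sum += sum(d == tgt for d in non_decisions) / len(non_decisions)
        total += p_target * p_miss + (1 - p_target) * p_fa_sum / (n - 1)
    return total / n
```

A perfect classifier scores 0; the 0.18 mentioned above is on this kind of scale.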
And one other observation here is that, like everybody else, we ended up fusing systems, and we had a greedy approach, roughly along the lines of what you saw in the last presentation. After looking at a big evaluation of all the fusions of three-way systems and five-way systems, we ended up with this five-way fusion system. And for the most part we were not necessarily that far off from the best performance we could have obtained: when we look back at what our best selection would have been, we actually would have been very little off from kind of the oracle system.
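The greedy selection loop described here might look something like the following; the simple score-averaging fusion and the names are illustrative assumptions, not necessarily the submission's actual fusion code.

```python
def greedy_fusion(system_scores, dev_cost, max_size=5):
    """Greedy forward selection for system fusion: start empty, keep
    adding the system whose inclusion most lowers a dev-set cost on
    the averaged scores, and stop when nothing helps or max_size is
    reached. `system_scores[name]` is a list of per-trial scores and
    `dev_cost` maps a fused score list to a cost."""
    chosen = []
    best_cost = float("inf")
    while len(chosen) < max_size:
        candidate, cand_cost = None, best_cost
        for name in system_scores:
            if name in chosen:
                continue
            members = chosen + [name]
            # fuse by averaging scores trial by trial
            fused = [sum(s) / len(members)
                     for s in zip(*(system_scores[m] for m in members))]
            cost = dev_cost(fused)
            if cost < cand_cost:
                candidate, cand_cost = name, cost
        if candidate is None:  # no remaining system improves the cost
            break
        chosen.append(candidate)
        best_cost = cand_cost
    return chosen, best_cost
```

The same loop, run to exhaustion, is also what lets you read off how close the chosen fusion came to the best ("oracle") subset in hindsight.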
Other than that, one of the observations is that the best single system we had for this evaluation was the bottleneck-feature system, closely followed by the others.
Of course, this is something that has been talked about quite a bit by now: there was this issue with the French cluster, and there were really two things that came up that we talked about. The first one was that it looks like we were really building a channel detector, which is kind of what was mentioned earlier; and then there were all the things we heard from LDC at the workshop, that there might be other issues, not only channel. Before I forget, I want to draw a connection to the earlier discussion: we did do a lot of analysis on the channel issue in 2009. One thought, and maybe here I can say something different from what everybody else has said, is that what we analyzed in 2009 was mainly based on the language, while here we have one cluster with two classes, which may or may not add to the discussion of why this seems to be pinned on the channel side, even though apparently, when people listened to it, those differences might not be there.
So, going into more detail on this issue with the French cluster: we did see here that things do seem to line up by channel, and the big factor, as I think was obvious earlier, had to do with the fact that for one of the languages we just did not have any data on that channel. So it seemed like, with the channel we saw on the eval not being available in the dev set, the system was keying more on the actual channel instead of the language. And you can see here that there doesn't seem to be a big difference in being able to tell the two classes apart; it seems to be more about the channel element.
One thing we did do is say, well, maybe this is just the nature of the problem. So we looked at a different cluster, the Slavic cluster, which was Polish and Russian, and when we look at that cluster we didn't seem to observe the same issue with this kind of channel alignment. So, even though there is a channel element there as well, we could also tell the classes apart a lot better than we were able to do on the French cluster.
Now moving to the open condition: the main difference here, like I said, was that we had this multilingual bottleneck-feature system, and that actually replaced the bottleneck system we had on the fixed condition. Once again, the performance here was a little bit better; not substantially better than on the fixed condition, but a little bit better. And like I said, the multilingual bottleneck seemed to be the one that drove the difference, the one thing that was actually different in this case.
Like I said earlier, one thing that came up, and we were a little bit surprised by it, was the fact that using extra data did not seem to help on the development set. Here you're looking at what happened in the case of Arabic: we added data in a number of ways, and you can see in the lower right corner there that for the most part it didn't seem to make a big difference. There's only one particular scenario where we got a little bit of improvement, but it's not like, as we add more data, we seem to consistently be able to get improvements.
One thing that also came into play is what happened as we looked at things after the evaluation. And one thing, and I think others have also addressed some of this, was that even though we did not see any improvements by adding data on the development set, we would have gotten substantial improvements if we had carried all that data into the eval. Of course, one issue is that a lot of that has to do with this labeled data that had that particular channel in it, and whether there was some data in there that is essentially the same data or not. We didn't go in and precisely check whether these are exactly the same cuts; of course we're expecting that maybe they're not necessarily the same. But it would have substantially changed our performance, maybe on the order of thirty to forty percent.
Another thing we did a little bit of after the eval was to keep looking at these multilingual bottleneck features, and once again this is on our 2009 setup. So the one thing we saw is that we also get some improvements with the multilingual bottleneck-feature system as we change the diversity of languages, but this is not completely linear, meaning it doesn't mean that as we go from five to seven, five to ten, and five to fifteen languages it's always improving. It's still something we're getting a better handle on, but there's obviously some relationship between the diversity of the languages we use to train the network and the performance. Once again, at this point we've probably seen as much as ten to fifteen percent improvement.
Another thing that we did, and I actually volunteered for this, was that I tried to listen to the languages that I know, so Spanish and English. The idea was, well, for our system, and once again this is my assessment and I'm not a linguist, if I listen to some of the errors we had, is there anything I can hear that seems to be systematic? Once again, this is for our submission. We had a number of errors, probably on the order of two thousand for the whole eval, so what I ended up doing was just randomly picking fifty on each of these two languages, listening to them, and trying to figure out if there was anything that seemed somewhat systematic.
In the Spanish case there were two things that seemed common. The first one, which was a little bit surprising to me, was that it seemed like we had quite a problem with European Spanish. Once again, I don't know exactly why, but one idea that comes to mind is maybe it was somewhat underrepresented in the training. And by the way, when I say Spanish errors, I mean Spanish errors: I took the Portuguese cuts out of the Iberian cluster, so I'm only looking at errors among the three Spanish classes.
One other point is that the examples I sampled covered errors across all durations: I probably listened to maybe a handful of forty-second cuts, maybe ten or so on the order of ten seconds, and maybe seventy percent of the cuts were around, you know, the low twenties down to the three-second part of the range. And that applied for both cases, actually.
One thing I also want to mention is that on the Spanish side we actually had between five and seven cuts that had either non-speech on them, or things like laughter or something. So, I mean, how much you should be able to detect language from that, I'm not quite sure. Obviously, having five cuts like that in there seems like it might be a big number, but whether that usefully extends to the whole set of errors we had is not clear; at least that's the observation on this limited set of data that I listened to. On the English side we also had this issue of basically empty, or nearly empty, speech files. Most of them were on the three-second condition, but even on some of the ten-second ones, we would have this nominal ten seconds of speech in the cut, and then you'd see that the person comes in at the beginning, maybe speaks for a second, and then there's nothing left, yet that gets detected; and then they come in again, and maybe there's laughter or something. So there was a little bit of that. Once again, I guess to some extent that's reality, but it's something peculiar that I wanted to bring to your attention.
The other thing was, once again on this limited sample, that on the English side it seemed like most of the errors I saw were between British English and American English. There were maybe five errors in there that in one way or another involved Indian English, but most of them, maybe on the order of eighty percent or so, were actually confusions between British English and American English. And I think that's actually a particularly hard pair.
We're going too much over time, so, quickly, as a summary, let me just go through it. We did see a little bit of improvement from the added features. Needless to say, bottlenecks and DNN-based i-vectors dominated. We're still kind of parsing out this issue with the French cluster; I actually saw a presentation yesterday where I think they folded in some of the eval data, kind of cross-validating, training with some of that data, and it seemed like they got really big improvements by using a little bit of that data for training. So it does seem like having the channel represented in the training would improve performance quite a bit. And there was also this issue of adding more data helping on the eval but not on the dev. Everything else, you know, hindsight is twenty-twenty. And once again, I guess the general question for the future is: should we focus on some particular conditions, or think about it in terms of robustness? And now I think we have time for some questions.
So I was wondering whether maybe these errors that we have in the Spanish clusters could also be due to, like, issues with the labels, because it raises the question: if you get Spanish from the south of Spain, is it closer to Caribbean Spanish, or to the regular Spanish from Spain?
I mean, in my personal experience, I find that people from, say, Andalucía, for example, sound very close to the people in Puerto Rico, way closer than people from Madrid or anywhere else. And to that point, I saw a lot of those errors: like, people that I would hypothesize had been from the south of Spain. Once again, from what I heard, at least for our system it seems like this particular confusion was something that actually showed up. But absolutely, in my limited understanding and knowledge about this, I would have expected that, because the way people from Andalucía sound to me, they kind of draw out the last syllables, and it seems like that is precisely the way people in Puerto Rico would do it.
Question?
Thank you for your presentation. One of your slides said that for your open-set task you didn't use all the data sets for training this out-of-set model, right?
All right, so, if I recall correctly, what I said was that we only used the open data for the multilingual model, not a lot of the open-set data otherwise. Once again, if I remember, not necessarily all the data: the multilingual was trained on the five Babel languages.
Ah, sorry, and you mentioned here that adding more data did not solve the problem. Which data did you add? Was it an analyzed addition, or just blind?
No, absolutely; like I showed, I think on that Arabic slide, that's just an example of one language. In this case we basically plotted, as we were adding more data, the error rate we were observing on the test set. So obviously, like everybody else, you're doing the best you can on the dev set and hoping it makes a good prediction of what you're going to see on the eval set. What we observed here is that adding more training data, and this is just one example, did not seem to help. The training we started with included all the data that was in there from the sources we had available, and it didn't actually seem to work, so we backtracked and only added training data in some of the languages. Once again, we didn't necessarily go back and redo all of this on the eval data systematically. We did do the analysis of whether, if we had run the systems with the whole training data, that would have been better, and it mostly would have, because of the French cluster, right, because we would have had labeled data representing that channel. But we have not done it systematically, language by language, to say on this language it would have helped and on that one it would have hurt; we have not done that.
Okay, thanks.
Any other questions?
I'm going to ask a question on the slide that you had up here, with the errors from the listening. What I'm going to ask is whether that's really the speech-activity assignment there, because, you know, when we did our test, if you threw away all the speech and just used what you thought was silence, you could still get about five percent.
I'm just not sure, right; I can't necessarily say whether it's channel dependent.
Other questions?
Okay, let's thank the speaker again, from MIT Lincoln Laboratory.