0:00:15 So I'll be presenting the MIT side of the presentation for this submission. We had a number of systems; I'm going to focus mostly on the systems themselves, plus a little bit of analysis, and even some listening to the audio at the end.
0:00:34 In general, I want to talk a little bit about the kinds of systems we looked at, then which ones we ended up using in the evaluation itself, that is, the primary submission. Then a bit about the development data: how we ended up using it, and the things we looked at to try to augment it. 0:00:51 Then some of the evaluation results; like everybody else, we were surprised when we first saw what happened on the evaluation compared to what we had seen on the development set. 0:01:01 And then I'm going to close with some of the reasons and conclusions.
0:01:06 In terms of systems, we looked at well over ten. Not surprisingly, the systems we looked at were either i-vector systems or DNN/bottleneck systems in some way. 0:01:23 On the more conventional i-vector subset we had an SDC-type system with cepstra, and then we also had a system that was basically the same with pitch added to it. 0:01:35 Then we had our set of DNN systems, with bottleneck features plus pitch, and with DNN posteriors used for modeling. 0:01:43 We also had things like an MMI-trained system, similar to what we've fielded before, except that in this case we were using bottleneck features instead of the conventional features we used in the past.
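As an aside, the SDC front end mentioned above is usually built by stacking shifted delta-cepstra blocks. A minimal sketch, assuming the common 7-1-3-7 configuration (7 cepstra, delta spread 1, shift 3, 7 blocks); this is the standard recipe from the language-ID literature, not necessarily this team's exact settings:

```python
def sdc(cepstra, n=7, d=1, p=3, k=7):
    """Shifted-delta-cepstra stacking (N-d-P-k convention).

    cepstra: list of frames, each a sequence of at least n coefficients.
    Returns one n*k-dimensional SDC vector per frame; context frames
    that fall outside the signal are clamped to the edges.
    """
    t_max = len(cepstra) - 1

    def frame(t):
        # Clamp the frame index so edge frames still get a full context.
        return cepstra[min(max(t, 0), t_max)][:n]

    out = []
    for t in range(len(cepstra)):
        vec = []
        for i in range(k):
            base = t + i * p                    # shifted block position
            plus, minus = frame(base + d), frame(base - d)
            vec.extend(a - b for a, b in zip(plus, minus))
        out.append(vec)
    return out
```

Each frame ends up with n·k = 49 coefficients, which is what makes SDC capture longer-span temporal information than plain deltas.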
0:01:55 For the open task, and let me emphasize this quite a bit, we also tried a multilingual system, for which we used five of the Babel languages. 0:02:06 And we had a few other systems that were maybe on the slightly more exploratory side: a kind of acoustic unit discovery system, along the lines of what was described earlier, and also this DNN-counts multinomial model system, which is something I think will be talked about a bit more in a later presentation.
0:02:25 It turns out that for calibration we really didn't do anything new relative to what we've done over the last few evaluations, so there wasn't really anything new on that side.
0:02:37 Next I want to talk a little bit about the development data. As you've probably heard by now, we had the six clusters displayed here. We tried a wide variety of ways to augment that data, and in the end there wasn't really a whole lot that worked on the augmentation side. We basically ended up with two views of the data: the full segments, I mean the full utterances, plus shorter segments we derived from that same data, so we used the data twice, except that we added some duration variability. 0:03:10 A lot of the other things we tried, things like warping, pitch changes, modifications to the spectrum, and things of that nature, in the end didn't really seem to help performance.
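The duration-variability idea described here, keeping the full utterances and also cutting shorter derived segments from the same audio, can be sketched roughly like this. The segment-length bounds and the frame representation are illustrative assumptions, not the team's actual settings:

```python
import random

def add_duration_variability(utterances, min_len=300, max_len=1000, seed=0):
    """Return the original utterances plus shorter segments cut from them,
    so the system sees the same audio at more than one duration.

    utterances: list of (label, frames) pairs; frames is any sequence,
    e.g. one entry per 10 ms frame.
    """
    rng = random.Random(seed)
    augmented = list(utterances)               # first view: full utterances
    for label, frames in utterances:
        start = 0
        while start < len(frames):
            seg_len = rng.randint(min_len, max_len)
            segment = frames[start:start + seg_len]
            if len(segment) >= min_len:        # drop trailing fragments
                augmented.append((label, segment))
            start += seg_len
    return augmented
```

The point is simply that the backend then trains on both the long and the short views of the same material, rather than on new audio.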
0:03:26 One thing we did: we did not retrain our whole systems, but we did retrain the backend. So basically we kept our system stacks frozen and retrained the backend with essentially a hundred percent of the dev data. That, by the way, was mainly for the fixed condition.
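A minimal sketch of what "retrain only the backend" can look like: the front-end embeddings (e.g. i-vectors) are treated as fixed, and only a simple Gaussian-style classifier is re-estimated on the dev data. The identity-covariance scoring below is a simplification for illustration, not the actual backend used in the submission:

```python
def train_backend(embeddings):
    """Re-estimate only the backend: class means over fixed vectors.

    embeddings: dict label -> list of vectors (the frozen front end's
    output on the dev data). The front end itself stays untouched.
    """
    means = {}
    for label, vecs in embeddings.items():
        dim = len(vecs[0])
        means[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return means

def score(means, vec):
    """Score a test vector against each class: negative squared
    Euclidean distance to the class mean, i.e. a Gaussian backend
    simplified to an identity covariance."""
    return {label: -sum((a - b) ** 2 for a, b in zip(vec, m))
            for label, m in means.items()}
```

Because only `train_backend` touches the new data, refreshing the backend on 100% of the dev set is cheap compared with retraining the whole stack.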
0:03:48 For the open set, like everybody else, we looked at the sources we had available, and of course there were plenty of sources out there. In the end, the only system that really benefited from this additional data was the multilingual one. So basically most of the systems we used on the open set were the ones we had developed for the fixed condition, except of course the multilingual system, which needed all the extra data.
0:04:15 One thing I want to get into in a bit more detail: during that development we noticed that using all the data we had available actually did not help performance. So after doing some preliminary experiments, we decided to add data in only a few of the languages. I'll come back to whether that was the best decision or not; at least on the dev set, we did see that adding data to those languages improved the performance.
0:04:46 In terms of the dev results we saw, this covers both the per-cluster averages in detail and what happened between the fixed set and the open set. For the most part, the Slavic, Chinese, and Iberian clusters seemed to be the toughest ones on the dev set, but the performance in general seemed reasonable, so we were pretty happy with it; on average we were somewhere in the 0.10 neighborhood. 0:05:15 The other observation here is that on the open set we did see a little bit of improvement over the fixed condition. Maybe we didn't see as much as we could have expected, but we saw some improvement, so that was also reassuring.
0:05:34 Now I want to talk a little bit about the evaluation results. The bad news: we got a big discrepancy between what we saw during development and what we saw on the evaluation set; it jumps out right away, almost a factor of ten in some cases, and I'll get into some of the reasons here. We ended up submitting a five-way fusion of systems: the unit discovery system, the counts system, the bottleneck-feature system, and the pitch and conventional systems that we trained. 0:06:13 The performance we ended up obtaining was, in average cost, a little below 0.18. 0:06:22 And of course you also see here the issue of what happened with the French cluster, contrasting the performance we had as a whole with the performance we would have had without the French cluster.
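For reference, the average-cost numbers quoted here come from a NIST-style Cavg. A simplified version (unit costs, target prior 0.5, hard decision threshold) can be computed as below; the official LRE cost parameters differ in detail, so treat this as a sketch of the metric, not the evaluation's exact scoring:

```python
def c_avg(trials, languages, threshold=0.0):
    """Simplified NIST-style average cost.

    For each target language: 0.5 * (miss rate + mean false-alarm rate
    against the other languages), averaged over languages
    (C_miss = C_fa = 1, P_target = 0.5).
    trials: list of (true_lang, scored_lang, score); a trial is
    accepted when score >= threshold.
    """
    total = 0.0
    for tgt in languages:
        tgt_trials = [(t, s) for t, l, s in trials if l == tgt]
        target_scores = [s for t, s in tgt_trials if t == tgt]
        p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
        fa_rates = []
        for non in languages:
            if non == tgt:
                continue
            non_scores = [s for t, s in tgt_trials if t == non]
            fa_rates.append(sum(s >= threshold for s in non_scores) / len(non_scores))
        total += 0.5 * (p_miss + sum(fa_rates) / len(fa_rates))
    return total / len(languages)
```

With this convention a perfect system scores 0.0, and the "little below 0.18" figure is the same kind of average over target languages.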
0:06:34 One other observation here: like everybody else we ended up fusing systems, and we took a greedy approach, along the lines of what you saw in the earlier presentations. We sorted it out after looking at a big, long evaluation of all the fusions of two systems, three systems, and so on up to five, and we ended up with this five-way fusion. 0:06:52 And it turns out that, for the most part, we were not necessarily that far off: had we somehow known in advance what our best system was going to be, we would have been very close to that kind of oracle system.
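The greedy fusion selection described above, adding one system at a time as long as the dev-set cost keeps dropping, can be sketched as follows. Equal-weight score averaging stands in for a trained fusion, and the `cost` callable (e.g. a dev-set Cavg) is an assumption of this sketch:

```python
def fuse(score_sets):
    """Equal-weight linear fusion: average score vectors trial by trial."""
    n = len(score_sets)
    return [sum(scores) / n for scores in zip(*score_sets)]

def greedy_select(systems, cost, max_systems=5):
    """Greedy forward selection of systems into a fusion.

    Repeatedly add the system whose inclusion lowers the fused dev-set
    cost the most; stop when nothing helps or max_systems is reached.
    systems: dict name -> per-trial score list.
    cost: callable mapping fused scores -> float (lower is better).
    """
    selected, remaining = [], dict(systems)
    best_cost = float("inf")
    while remaining and len(selected) < max_systems:
        name, trial_cost = min(
            ((n, cost(fuse([systems[s] for s in selected] + [sc])))
             for n, sc in remaining.items()),
            key=lambda item: item[1])
        if trial_cost >= best_cost:
            break                       # no candidate improves the fusion
        selected.append(name)
        best_cost = trial_cost
        del remaining[name]
    return selected, best_cost
```

The stopping rule is what produces a small fusion (here up to five systems) instead of blindly fusing everything available.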
0:07:13 Other than that, one observation is that the best system we had in our fixed submission was the bottleneck-feature system, closely followed by the rest.
0:07:26 Of course, and this is something that has been talked about quite a bit by now, there was this issue with the French cluster, and there were really two things that came up in what we discussed. The first one was that it looked like we were really building a channel detector, which is what was mentioned earlier; and then there were the other things we heard from LDC at the workshop, which have to do with the possibility that there might be other issues, not only channel. 0:07:53 Before I forget, I want to draw a connection to the earlier discussion: we did do a lot of analysis on the channel issue back in 2009. 0:08:09 One thing, and maybe here I can say something slightly different from what everybody else has said: the difference is that what we analyzed in 2009 was mainly across languages, whereas here it is within one cluster's classes. That may or may not add to the discussion of why this seems to be pinned on the channel side, even though, apparently, when people listened to the audio, those differences might not be there.
0:08:39 Going into more detail on this issue with the French cluster, we did see that things line up by channel, and the big issue, as I think was mentioned earlier, had to do with the fact that for one of the languages we just did not have any data on that channel. 0:08:56 So it seemed like the channel we saw on the eval, which was not available at all in the dev set, was pushing the system to go after the actual channel instead of the language. And you can see here that there doesn't seem to be a big difference in our ability to tell the classes apart as such; the separation seems to be more along the channel.
0:09:18 One thing we did do: we said, well, maybe this is just the nature of the problem, so let's look at a different cluster. We looked at the Slavic cluster, which was Polish and Russian, and when we looked at that cluster we did not observe the same issue with this kind of channel alignment. So, to some extent, even though the channel element is there as well, we were able to tell the classes apart a lot better than we could on the French cluster.
0:09:50 Now, going to the open condition: the main difference here, like I said, was that we had this multilingual bottleneck-feature system, which essentially replaced the corresponding system we had in the fixed condition. 0:10:09 Once again, the performance here was a little bit better, not substantially better than on the fixed condition, but a little bit better. And like I said, the multilingual bottleneck seemed to be the one thing that really made a difference in this case.
0:10:27 Like I said earlier, one thing that came up, and we were a little bit surprised by it, was the fact that using extra data did not seem to help on the development set. Here you're looking at what happened in the case of Arabic; we added data in a number of ways, and you can see, on the lower right corner there, that for the most part it didn't seem to make a big difference. There's only one particular scenario where we got a small improvement, but it's not the usual situation where, by adding data, we consistently manage to get improvements.
0:10:59 One thing that also came into play was what we found when we looked at the eval data after the evaluation. One finding, and I think this was also addressed earlier, was that even though we did not see any improvements by adding data on the development set, we would have gotten substantial improvements had we carried all that data into the eval systems. 0:11:24 Of course, one issue is that a lot of that has to do with the labeled data that contained that particular channel, and whether some of the data in there comes from essentially the same source or not. We didn't go in and check precisely whether it is exactly the same cluster or the same examples; we expect that they're maybe not necessarily the same. 0:11:45 But it would have substantially changed our performance, maybe on the order of thirty to forty percent.
0:11:50 Another thing we did after the eval was to keep looking at these multilingual bottleneck features, and once again we went back to our 2009 setup. 0:12:01 One thing we found is that we also get improvements with the multilingual bottleneck-feature system as we change the diversity of the languages involved, and this is not completely linear: it doesn't mean that as we go from five to seven, to ten, to fifteen languages it is always improving. It's still something we're getting a better handle on, but there obviously seems to be some relation between the diversity of the languages we use to train the network and the resulting performance. Once again, at this point we've probably seen as much as ten to fifteen percent improvement.
0:12:38 Another thing, an idea that people actually proposed, was that I try to listen to the languages that I know, so Spanish and English. 0:12:47 The idea was: for our system, and once again this is my own assessment, and I'm not a linguist, if I listen to some of the errors we had, is there anything I can hear that seems to be systematic? This was for our submission. In the case of Spanish, we had a number of errors, probably on the order of two thousand for the whole set, so what if I just randomly picked fifty from each of these two languages, listened to them, and tried to figure out whether anything seemed somewhat systematic? 0:13:20 In the Spanish case there were two things that seemed common. The first one, which surprised me a little, was that we seemed to have quite a problem with Caribbean speech. Once again, I don't know exactly why, but one idea that comes to mind is that maybe those speakers were somewhat underrepresented in the training. And by the way, when I say Spanish errors, I mean errors where I took the cuts to listen to from the Iberian cluster, among the two or three Spanish classes in there.
0:13:52 One other point: I listened to error cuts across all durations. There were maybe a handful of longer cuts around forty seconds, maybe ten or so on the order of ten seconds, and maybe seventy percent of the cuts were at the short end, in the low twenties or down in the three-second part of the range. And that applied to both cases, actually.
0:14:17 One thing I also want to mention: within those, on the Spanish side, we actually had between five and seven cuts that had either non-speech in them, or things like laughter or something, so how much should you be able to detect language from that? I'm not quite sure. 0:14:40 Obviously, having five such cuts in there might seem like a big number, but whether that usefully extends to the whole set of errors we had is not clear; at least that's the observation on the limited set of data I listened to. 0:14:54 On the English side we also had a similar issue, basically empty or non-speech files, most of them on the three-second cuts, but even on some of the ten-second ones: we would have this nominal ten-second speech cut, and you'd see that the person comes in, maybe speaks for a second, then there's nothing left, yet that gets detected as speech, and then it goes quiet again, with maybe laughter or something. 0:15:23 So there was a little bit of that; I guess to some extent that's reality, but it's something peculiar that I wanted to bring to your attention.
0:15:33 The other thing, once again on this limited sample, was that on the English side it seemed like most of the errors I saw were between British English and American English. There were maybe five errors that in one way or another involved Indian English, but most of them, like I said, maybe eighty percent or so, were actually confusions between British English and American English.
0:16:04 And I think we're running too far over time, so let me just quickly go through the conclusions. We did see a little bit of improvement from the fusion. Needless to say, bottlenecks and DNN-based i-vectors dominated. 0:16:24 We're still parsing out the issue with the French cluster; I actually saw a presentation yesterday where I think they folded some of that data in, training with some of the dev data, and it looked like they got really big improvements by using a little bit of that data for training. So it does seem like having the channel represented in the training would improve performance quite a bit. 0:16:45 And there was also this issue of adding more data; as with everything else, hindsight is twenty-twenty, so now we know. Once again, I guess the general question for the future is: should we focus on some particular conditions, or think about it in terms of robustness?
0:17:11 Right, now we have time for some questions.
0:17:24 Just a quick comment: could it be that these errors we have in the Spanish clusters are also due to, you know, the issue of the dialect labels? Because it raises the question: if you get Spanish from the south of Spain, is it closer to Caribbean Spanish, or to the regular Spanish from Spain?
0:17:55 I mean, in my personal experience, I find that people from Andalusia, for example, sound very close to the people in Puerto Rico, way closer than people from Madrid or anywhere else. And to tell you, I saw a lot of those errors: cuts from people that, I would hypothesize, were from the south of Spain. 0:18:20 Once again, this is from a small sample, but at least for our system it seems like this Caribbean confusion was something that actually came up. But absolutely, in my limited understanding and knowledge of this, I would have expected that, because the way people from, again, Andalusia speak, they kind of drop the last syllables, and that seems to be precisely the way people in Puerto Rico speak.
0:18:55 Thank you for your presentation. One of your slides said that for your open-set task you didn't use all the datasets for training this out-of-set model, right? 0:19:11 If I recall correctly, what I said was that we only used the extra data for the multilingual system. 0:19:24 All of the open data as well? Once again, if I remember, not necessarily all the data; the multilingual system was trained on the five well-labeled Babel languages.
0:19:36 Sorry, and you mentioned here that adding more data did not help. Which data did you add to your set? Was it an analyzed addition or just blind? 0:19:50 No, absolutely, it was analyzed. Like I showed, I think on the Arabic slide, that is just one example.
0:20:00 In this case we basically looked at whether, by adding more data, the error rate improved, based on what we were observing on the test set. Obviously, like everyone else, you do the best you can on the dev set and hope it makes a good prediction of what you're going to see on the eval set. What we observed here was that adding more training data, and this is just one example, did not seem to help. The problem was that our training initially included all the data we had available from those sources, and that actually didn't seem to work, so we backtracked and only added training data for some of the languages. 0:20:35 Now, we didn't necessarily go back and redo all of this on the eval data systematically. We did do the analysis showing that, had we built all the systems with the whole training data, that would have been better, mostly because of the French cluster, because we would have had labeled data representing that channel. 0:20:58 But we didn't do it systematically, language by language, to say this language would have helped and that language would have hurt. 0:21:08 Okay, thanks.
0:21:19 I'm going to ask the same question on the slide that you have here that we asked for the previous one. The question is: is it the frame assignments there? Because, you know, when we did our test, if you threw away all the speech and just used what you thought was silence, you got a five percent error. 0:21:42 I'm just not sure, right? I can't necessarily say whether it's channel dependent.
0:21:51 Other questions? People are usually good, I think. Okay, let's thank the speaker again, and MIT Lincoln Laboratory, too.