0:00:15 | So for this part I'll be presenting the MIT-LL side of the presentation for the submission. We have a number of slides kind of focused more on the systems, but I'll also add a little bit of analysis, and even some listening on top of that. |
0:00:34 | So in general I basically want to talk a little bit about the kinds of systems that we looked at, and then I'm going to talk about which ones we ended up using on the evaluation itself, the primary submission. I'll talk a bit about the development data we had, how we ended up using that data, and the things we looked at to try to augment that data. Then I'll show some of the evaluation results we had; like everybody else, we were kind of surprised when we first saw what happened on the evaluation compared to what we had seen on the development set. Then I'm going to end with some of the lessons and conclusions. |
0:01:06 | So in terms of systems, we looked at well over ten systems. Not surprisingly, the systems we looked at were either i-vector systems or DNN and bottleneck systems in some way. On the more conventional i-vector subset of systems, we had an SDC-type system with cepstra, and then we also had a system that was basically the same system with pitch added to it. |
0:01:35 | Then we had our set of DNN systems, with bottleneck features and DNN posteriors. Then we even looked at things like an MMI system, kind of a similar system, but in that case we were using bottleneck features instead of the conventional features we used in the past. |
0:01:55 | For the open task, and let me kind of emphasize this quite a bit, we also tried the multilingual system, and we used five of the Babel languages. And we also had a few other systems that were maybe on the slightly more experimental side: we had a kind of unit-discovery system, along the lines of what was described earlier, and we also had this DNN-counts multinomial model system, which is something that I think is going to be talked about a bit more in a later talk. |
0:02:25 | And it turns out that for calibration we really didn't do anything new relative to what we've done over the last years, maybe the last couple of evaluations, so there wasn't really anything new on that side. |
0:02:37 | Next I want to talk a little bit about the development data. As you have probably heard by now, we had the six language clusters. We did look at a wide variety of ways to augment that data, and in the end there wasn't really a whole lot that worked on the side of augmenting the data. We basically ended up with some duplication of the data, where we had the full segments, I mean the full utterances, plus segments we derived from that same data; so we kind of used the data twice, except we saw some duration variability. A lot of other things we tried, like doing some warping, pitch changes, changing the spectrum, looking at things like rate, in the end didn't really seem to help performance. |
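The duplication scheme described above, keeping each full utterance and also deriving shorter cuts from the same audio to introduce duration variability, can be sketched roughly as follows. This is a minimal illustration; the function name, chunk-length bounds, and segment representation are assumptions, not the actual MIT-LL tooling.

```python
import random

def augment_with_segments(utterances, min_len=3.0, max_len=30.0, seed=0):
    """Return the full utterances plus one randomly cut sub-segment each,
    so the training data covers more duration variability.
    utterances: list of (utterance_id, duration_seconds)."""
    rng = random.Random(seed)
    augmented = []
    for utt_id, duration in utterances:
        augmented.append((utt_id, 0.0, duration))  # keep the full cut
        if duration > min_len * 2:
            # derive a shorter cut from the same audio
            seg_len = rng.uniform(min_len, min(max_len, duration))
            start = rng.uniform(0.0, duration - seg_len)
            augmented.append((utt_id, start, start + seg_len))
    return augmented
```

Very short files are kept only once, since a derived cut would add no duration diversity.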
0:03:26 | One thing we did: we did not retrain our whole systems, but we did retrain the backend stage. So basically we kept our system specs from the data we had during development, but we did retrain the backend with basically a hundred percent of the data. That was mainly, by the way, for the fixed set. |
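The idea of retraining only the backend while leaving the front-end untouched can be sketched as below. The talk doesn't specify which backend was used, so this stands in a simple shared-diagonal-covariance Gaussian classifier over fixed i-vectors as an assumed example:

```python
import math

class GaussianBackend:
    """Minimal Gaussian backend over pre-extracted i-vectors.
    Only this stage is (re)trained; the i-vector extractor is frozen."""

    def fit(self, ivectors, labels):
        dim = len(ivectors[0])
        self.means, counts = {}, {}
        for x, y in zip(ivectors, labels):
            m = self.means.setdefault(y, [0.0] * dim)
            for i, v in enumerate(x):
                m[i] += v
            counts[y] = counts.get(y, 0) + 1
        for y, m in self.means.items():
            self.means[y] = [v / counts[y] for v in m]
        # shared diagonal variance, pooled across all classes
        var = [0.0] * dim
        for x, y in zip(ivectors, labels):
            for i, v in enumerate(x):
                d = v - self.means[y][i]
                var[i] += d * d
        self.var = [max(v / len(ivectors), 1e-6) for v in var]
        return self

    def log_likelihoods(self, x):
        """Per-language Gaussian log-likelihoods for one i-vector."""
        scores = {}
        for y, m in self.means.items():
            ll = 0.0
            for i, v in enumerate(x):
                d = v - m[i]
                ll += -0.5 * (d * d / self.var[i]
                              + math.log(2 * math.pi * self.var[i]))
            scores[y] = ll
        return scores
```

Refitting this object on 100% of the dev data is cheap compared to retraining the extractor, which is presumably why only the backend was refreshed.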
0:03:48 | So for the open set, of course, like everybody else, we did look at all of the sources we had available, and of course there were plenty of sources there. In the end, the only system that really benefited from having this additional data was the multilingual one. So basically most of the systems we used on the open set were as we had developed them on the fixed condition, except of course the multilingual, which needed all the extra data. |
0:04:15 | One thing I want to talk a little bit more about, and get into some specifics, is that during that development we did notice that using all the data we had available actually did not help performance. So after doing some early experiments we decided to only add data in a few of the languages, and I'm going to talk about whether that was the best decision or not. |
0:04:39 | So at least on the dev set we did see that adding data to those languages helped the performance. |
0:04:46 | In terms of the development results we saw, this addresses both the cluster averages in detail and what happened between the fixed set and the open set. This is what we saw: for the most part, Chinese and Iberian were kind of the most challenging ones on the dev set, but the performance in general seemed reasonable, so we were pretty happy with it; on average we were somewhere in the 0.10 neighborhood. |
0:05:15 | The other observation here, I think, is that on the open set we did see that we got a little bit of improvement over the fixed condition. Maybe we didn't see as much as we could have expected, but we saw some improvement, so that was also reassuring. |
0:05:34 | Now I want to talk a little bit about the evaluation results. On the evaluation results, the bad part is that we got a big discrepancy between what we saw on the dev set and what we saw on the evaluation set, right away a big jump, and some others also mentioned similar things here. We ended up submitting a five-way fusion of systems: we had the unit-discovery system, we had the counts system, we had the bottleneck features, and we had the pitch plus conventional system that we trained. The performance that we obtained was, on the C average, a little bit under 0.18. |
0:06:22 | And of course, as was also shown here, there is this issue of what happened with the French cluster, contrasting both the performance we had as a whole and the performance had we not had the French cluster. |
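For reference, the C-average numbers quoted throughout are in the spirit of the following simplified per-language average of miss and false-alarm rates. The actual evaluation metric has specific priors and cluster weighting; this is only an illustrative sketch over hard decisions:

```python
def simple_avg_cost(trials, languages):
    """trials: list of (true_lang, hypothesized_lang).
    For each language, average the miss and false-alarm rates with
    equal weight, then average the per-language costs."""
    costs = []
    for lang in languages:
        targets = [t for t in trials if t[0] == lang]
        nontargets = [t for t in trials if t[0] != lang]
        p_miss = (sum(1 for t in targets if t[1] != lang) / len(targets)
                  if targets else 0.0)
        p_fa = (sum(1 for t in nontargets if t[1] == lang) / len(nontargets)
                if nontargets else 0.0)
        costs.append(0.5 * (p_miss + p_fa))
    return sum(costs) / len(costs)
```

A dev-set C average near 0.10 versus an eval C average near 0.18 is the discrepancy being discussed.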
0:06:34 | One other observation here is that, like everybody else, we ended up using all the systems, and we had a greedy approach to fusion, kind of along the lines of what you saw in the last presentation. Then we sorted it out after looking at a big, long evaluation of all the fusions of n-way systems and five-way systems, and we ended up with this five-way fusion system. And it turns out that for the most part we were not necessarily that far off from the best performance we could have obtained; when we look at what our best selection would have been, we actually would have been very little off from kind of the oracle system. |
0:07:13 | Other than that, one of the observations is that the best single system we had, in terms of performance, was the bottleneck feature system, closely followed by the others. |
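The greedy fusion selection mentioned above, starting from the best single system and repeatedly adding whichever system most improves the fused dev-set cost, can be sketched like this. Fusion is simplified to an equal-weight score average; an actual submission would presumably use trained fusion weights:

```python
def greedy_fusion(system_scores, labels, cost_fn):
    """system_scores: {name: [score, ...]} aligned with labels.
    Greedily grow the fused system set while the cost keeps improving."""
    def fuse(names):
        n = len(system_scores[names[0]])
        return [sum(system_scores[s][i] for s in names) / len(names)
                for i in range(n)]

    remaining = set(system_scores)
    chosen, best_cost = [], float("inf")
    while remaining:
        cand, cand_cost = None, best_cost
        for name in remaining:
            c = cost_fn(fuse(chosen + [name]), labels)
            if c < cand_cost:           # strict improvement required
                cand, cand_cost = name, c
        if cand is None:                # no system helps the fusion: stop
            break
        chosen.append(cand)
        remaining.remove(cand)
        best_cost = cand_cost
    return chosen, best_cost
```

Run on dev scores, this reproduces the "keep adding systems until fusion stops improving" procedure that led to the five-way submission.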
0:07:26 | Of course, and this is something that has been talked about quite a bit by now, there was this issue with the French cluster, and there were really two main things that came up that we talked about. The first one was that it seems like we were really building a channel detector, which is kind of what was mentioned; and then there were other things that we heard from LDC at the workshop, that there might be other issues, not only channel. So before I forget, I want to draw a connection to the earlier discussion: we did do a lot of analysis on the channel issue in 2009. One thing, and maybe I can say something different from what everybody else has said, is that what we analyzed in 2009 was mainly based on the language, and here we are kind of at the cluster level, which may or may not add to the discussion of why we're seeing that this seems to be pointing at the channel side, even though apparently, when people listened to it, those differences might not be there. |
0:08:39 | Going into more detail on this issue with the French cluster, we did see here that things do seem to line up by channel, and the big factor, I think it was mentioned earlier, had to do with the fact that for one of the languages we just did not have any data on that channel. So it seemed like the channel that we saw on the eval, which was not available during the dev set, pushed the system toward going for the actual channel instead of the language. And you can see here that there doesn't seem to be a big difference in being able to tell the language classes apart; it seems to be more on the channel element. |
0:09:18 | One thing we did do: we said, well, maybe it's just the nature of the problem. So we looked at a different cluster, the Slavic cluster, which was Polish and Russian, and when we looked at that cluster we didn't seem to observe the same issue with this kind of channel alignment. So even though there is a channel element here too, we were able, to some extent, to tell the classes apart a lot better than we were able to do on the French cluster. |
0:09:50 | Now, moving to the open condition: the main difference here, like I said, was that we had this multilingual bottleneck feature system, and that actually replaced the bottleneck system we had on the fixed condition. And once again, the performance here was a little bit better; not substantially better than on the fixed condition, but a little bit better. And like I said, the multilingual bottleneck seemed to be the one thing that made a difference, that was actually different in this case. |
0:10:27 | Like I said earlier, one thing that came up, and we were a little bit surprised by it, was the fact that using extra data did not seem to help on the development set. Here you're looking at what happened in the case of Arabic: we added Arabic data in a number of ways, and you can see, on the lower right corner there, that for the most part it didn't seem like it made a big difference. There's only one particular scenario where we got a little bit of an improvement, but it's not like there's a way of adding data where we seemed to consistently be able to get improvements. |
0:10:59 | One thing that also came into play was what happened when we looked at this after the evaluation, and I think others have also addressed some of this: even though we did not see any improvements by adding data on the development set, we would have gotten substantial improvements had we carried all that data into the eval. Of course, one issue is that a lot of that has to do with this labeled data that had that particular channel in it, and whether there was some data in there that happened to be the same data or not; we didn't go in and look precisely at whether it's exactly the same collection, and of course we're expecting that maybe it's not necessarily the same. But it would have substantially changed our performance, maybe on the order of thirty to forty percent. |
0:11:50 | Another thing we did a bit of after the eval was to keep looking at these multilingual bottleneck features, and note that these numbers are scored on our dev set. One thing we saw was that we also got some improvements with the multilingual bottleneck feature system as we changed the diversity of languages in it. This is not completely linear, meaning it doesn't mean that as we go from five to seven to ten to fifteen languages it's always improving; it's still something we're getting a better handle on. But it seems like there is obviously some relation between the diversity of the languages we used to train the bottleneck and the performance, and once again, at this point we've probably seen as much as ten to fifteen percent improvement. |
0:12:38 | Another thing, and this idea actually puzzled people: I tried to listen to the languages that I know, so Spanish and English. The idea was, for our system, and once again this is my assessment and I'm not a linguist: if I listened to some of the errors we had, is there anything I could hear that seems to be systematic? Once again, this was for our submission. We had a number of errors; I think for the whole eval we had on the order of two thousand errors or so. So what if I just randomly picked fifty in each of these two languages, listened to them, and figured out whether there is anything that seems to be somewhat systematic? |
0:13:20 | In the Spanish case there were two things that seemed common. The first one, which was a little bit surprising: it seemed like we had quite a problem with Puerto Rican Spanish. Once again, I don't know why, necessarily, but one idea that comes to mind is maybe it was somewhat underrepresented in the training. And by the way, when I say Spanish errors, I mean Spanish errors: I took all of the errors from the Iberian cluster, among the three classes in there. |
0:13:52 | One other point is that I saw error examples across all durations: I probably listened to maybe a handful of forty-second cuts, maybe ten or so on the order of ten seconds, and maybe like seventy percent of the cuts were in the low twenties or on the three-second side of the range, and that applied to both cases, actually. |
0:14:17 | So one thing that I also want to mention: we actually had, within the Spanish side, between five and seven cuts that had either nonspeech on them, or things like laughter or something, so how much you should be able to detect language from that, I'm not quite sure. Obviously, having five such cuts in there seems like it might be a big number, but whether that would usefully extend to the whole set of errors we had is not clear; at least that's the observation on this limited set of data that I listened to. On the English side we also had a similar issue: basically empty, nonspeech files. Most of them were on the three-second side, but even on some of the ten-second cuts we would have this nominal ten seconds of speech, and then you'll see that the person comes in at first and maybe speaks for a second, then there's nothing left, and that gets detected, and then they come back again with maybe laughter or something. So there was a little bit of that. Once again, I guess to some extent that's reality, but it's something peculiar that I wanted to bring up, or bring to your attention. |
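Screening for the nearly-empty cuts described above, nominal ten-second files where the speaker says a word and then goes silent, could be done with a crude energy-based check like the following. The frame length and thresholds are arbitrary placeholders for illustration, not anything used in the evaluation:

```python
def speech_fraction(samples, frame_len=160, energy_thresh=1e-3):
    """Fraction of frames whose mean energy exceeds a threshold;
    a crude proxy for how much actual speech a cut contains."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    voiced = sum(1 for f in frames
                 if sum(x * x for x in f) / len(f) > energy_thresh)
    return voiced / len(frames)

def flag_near_empty(samples, min_fraction=0.1):
    """Flag cuts where almost no frames look voiced."""
    return speech_fraction(samples) < min_fraction
```

Running such a filter over the error cuts would quantify how many "errors" are really files with almost no usable speech.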
0:15:33 | The other thing was that, once again on this limited sample, on the English side it seemed like most of the errors I saw were between British English and American English. There were maybe five or so cuts I listened to that in one way or another involved Indian English, but most of them, maybe like I said on the order of eighty percent or so, were actually confusions between British English and American English, and I think that's actually a peculiar one. |
0:16:08 | We're running a bit long, so let me quickly go through the conclusions. We did see that there was a little bit of improvement with the multilingual features. Needless to say, bottlenecks and DNN-based i-vectors dominated. We're still kind of parsing out this issue with the French cluster; I actually saw a presentation yesterday where I think they folded some of the data across, like training with some of the eval data, and it seemed like they got really big improvements by using a little bit of that data for training. So it does seem like having the channel represented in the training would improve performance quite a bit. And there was also this issue of adding more data to a system and it not helping; like everything else, hindsight is twenty-twenty, so now we know. And once again, I guess the general question for the future is: should we focus on some particular conditions, or think about it in terms of robustness? |
0:17:11 | Right now we have time for some questions. |
0:17:24 | So we were commenting, just a week ago, that probably these errors that we have in the Spanish cluster could also be due to the dialects of the speakers, because it raises the question of whether the Spanish you get from a speaker from the south of Spain is closer to Caribbean Spanish or to the regular Spanish from Spain. |
0:17:55 | I mean, in my personal experience, I find that people, I think from Andalucía for example, sound very close to the people in Puerto Rico, way closer than people from Madrid or anywhere else. And so I saw a lot of those errors, like people that I would hypothesize were from the south of Spain. Once again, at least for our system, it seems like this Caribbean confusion was something that actually came up. But absolutely, in my limited understanding and knowledge about this, I would say I would have expected that, because the way that people from, say, Andalucía sound to me, they would kind of drop the last syllables, and it seems like that is precisely the way people from Puerto Rico would do it. |
0:18:49 | Questions? |
0:18:55 | Thank you for your presentation. One of your slides said that for your open-set task you didn't use all the data sets for training this out-of-set model, right? |
0:19:07 | Right, so if I recall correctly, what I said was that we only used the open data for the multilingual model, no? |
0:19:24 | A lot of open-set data, yeah. |
0:19:27 | Once again, if I remember, not necessarily all the data; the multilingual was trained on the five well-labeled Babel languages. |
0:19:36 | Ah, sorry. And you have mentioned here that adding more data did not solve the problem. Which data did you add to your set? Was it a per-language addition or just blind? |
0:19:50 | No, absolutely, it was selective. Like I showed, I think on that Arabic slide, that is just an example of one. In this case we basically added more Arabic data and basically looked at what we were getting on the test set. So obviously, like everything else, you're doing the best you can on the dev set and hoping it makes a good prediction of what you're going to see on the eval set. What we had observed here was that adding more training data, in this just one example, did not seem to help. So for the training problem we had, which at first included all the data that was in there from all the sources we had available and didn't actually seem to work, we backtracked and only added training data in some of the languages. Once again, we didn't necessarily go back and redo all of this on the eval data systematically. I mean, we did the analysis of: if we had done all the systems with the whole training data, that would have been better, mostly because of the French cluster, because we would have had labeled data that represented that channel. But we don't know what would have happened had we done it systematically; I don't know, on this language it would have helped, on that language it would have hurt. |
0:21:08 | Okay, thanks. |
0:21:14 | Any other questions? |
0:21:19 | I'm going to ask the question on the slide that you have here, the one with the four cuts with the laughter or whatever. I'm going to ask whether it's the speech assignment there, because, you know, when we did our test, if you threw away all the speech and just used what you thought was silence, you still got about five percent. |
0:21:41 | I'm just... I'm not sure, right; I guess it may be channel dependent. |
0:21:51 | Other questions? |
0:21:53 | People are usually good, I think. |
0:21:57 | Okay, let's thank the speaker again, from MIT Lincoln Laboratory. |