0:00:14she thank you also
0:00:18so the language recognition i-vector challenge had three main goals
0:00:26first to attract people from outside the regular community
0:00:32and to make
0:00:35this
0:00:37work that we do more accessible to that
0:00:39and the idea behind that was for people to explore new approaches and methods
0:00:47from machine learning in language recognition with the overall goal of improving performance in language
0:00:52recognition
0:00:55the task was open set language identification so given an audio segment which of the
0:01:00languages was the audio segment spoken in or whether it was an
0:01:06unknown language
0:01:09the data used was from previous nist lres as well as
0:01:14from the iarpa babel program
0:01:17and the data was selected in such a manner such that multiple sources were used
0:01:25for each language in order to reduce
0:01:27the source language effect
0:01:30and were also selected in order to have highly confusable languages included in the
0:01:37dataset
0:01:40here we see the size of the data there were fifty languages in train and sixty five
0:01:44in dev and test
0:01:47about three hundred segments per language in the training and about a
0:01:53hundred
0:01:53in the dev and test
0:01:55and we see the total number of segments all the way in the right hand column
0:02:01fifteen thousand for training about sixty four hundred for dev and about sixty five
0:02:05hundred for test
0:02:06and the training set did not include data that was from out of set
0:02:12the development set included unlabeled out of set data
0:02:15and the test set was divided into progress and evaluation subsets which we'll
0:02:21cover in just a moment
0:02:23people were able to upload their system outputs and receive some feedback on how that
0:02:29went and that was done using the progress set
0:02:32and then at the end of the evaluation period
0:02:36feedback was given on the evaluation set and it was a partition so there's
0:02:40no overlap
0:02:44here we see data sources for each language
0:02:48on the
0:02:50right hand side i'm not sure how easy that is to see
0:02:53you can see different corpora labels i think at a high level we can say
0:02:58blue is conversational telephone speech green is
0:03:04broadcast narrowband speech and yellow is a combination of the two
0:03:09i think
0:03:10one thing to say is that if you look across
0:03:13the training data which is i guess your leftmost column
0:03:17the dev data which is in the middle and the test data which is on
0:03:20the right
0:03:21the distribution across sources is very similar per language there are a few exceptions
0:03:27and as we mentioned there was no out of set data
0:03:29in the training
0:03:36and here we see speech duration
0:03:41in train dev and test
0:03:43training is beige dev is green and test is blue
0:03:48and we see again a similar distribution among train dev and test
0:03:53this was log normal
0:03:59the performance metric was error rates split into out of set languages and within set
0:04:04languages
0:04:06where the prior probability of out of set languages was point two three
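The metric described above can be sketched as follows. This is only an illustration, assuming the within-set term is an unweighted average of per-language error rates and that the out-of-set error rate is weighted by the stated prior of 0.23. The function name `challenge_cost` and the `"oos"` label are hypothetical and do not come from the talk.

```python
from collections import defaultdict

P_OOS = 0.23  # prior probability of an out-of-set language, as stated in the talk


def challenge_cost(truth, predicted, target_languages, oos_label="oos", p_oos=P_OOS):
    """Combine per-language error rates into one cost (assumed form).

    truth, predicted: equal-length lists of language labels per segment.
    target_languages: the in-set languages to average over.
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for t, p in zip(truth, predicted):
        counts[t] += 1
        if t != p:
            errors[t] += 1

    def error_rate(lang):
        # A language with no test segments contributes zero error here.
        return errors[lang] / counts[lang] if counts[lang] else 0.0

    # Unweighted mean of the error rates of the in-set (target) languages.
    in_set = sum(error_rate(lang) for lang in target_languages) / len(target_languages)
    # Weight in-set and out-of-set error by their assumed priors.
    return (1.0 - p_oos) * in_set + p_oos * error_rate(oos_label)


truth = ["a", "a", "b", "b", "oos", "oos"]
predicted = ["a", "b", "b", "b", "a", "oos"]
cost = challenge_cost(truth, predicted, ["a", "b"])
```

In this toy example the in-set error rates are 0.5 and 0.0, and the out-of-set error rate is 0.5, so the cost is a weighted mix of 0.25 and 0.5.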
0:04:15participation was
0:04:18wonderful more than what we'll typically see in an lre
0:04:23it was from international sites six continents and thirty one countries
0:04:30about eighty participants downloaded the data and a little over fifty five participants submitted
0:04:34results
0:04:36from
0:04:37forty four unique organisations
0:04:41during the evaluation period a little over seventy i'm sorry thirty seven hundred valid submissions
0:04:46were submitted
0:04:49and that number continues to grow
0:04:54after which
0:04:59i mentioned that we
0:05:01had more participation in the i-vector challenge than we typically do with a regular
0:05:05lre and we can see some other comparisons
0:05:09i guess that said one of the main differences between the i-vector challenge
0:05:15and a traditional lre is in the data that we distribute
0:05:19in the traditional lre we send audio segments as input to systems and in the i-vector
0:05:26challenge we send i-vectors instead
0:05:30the task was different the i-vector challenge is open set identification instead of detection
0:05:37in the i-vector challenge the cost was based on a kind of total error rate per
0:05:43language and in the traditional lre it's on miss and false alarm rates
0:05:48a larger number of target languages a different
0:05:52distribution of speech duration i mentioned that was log normal in the i-vector challenge in the
0:05:57traditional lre it's three ten and thirty second bins
0:06:02the i-vector challenge lasted much longer than a traditional lre
0:06:07and it
0:06:08but also the i-vector challenge results were
0:06:12feedback was given during the challenge period which is also not something we
0:06:16do in traditional evaluations
0:06:19and last there was an evaluation platform that was online
0:06:27and this was something that we
0:06:30focused on for the i-vector challenge
0:06:33in particular the goal was to facilitate
0:06:36the evaluation process with limited human involvement
0:06:40all evaluation activities were conducted via this platform including receiving the data
0:06:47uploading submissions and being able to see how things went
0:06:56and now looking at some results on the y-axis we see
0:07:01cost
0:07:03and on the x-axis time
0:07:06the first
0:07:07first dip i think is around may seventeenth the choice certainly first
0:07:12and the second
0:07:14large dip is on may twenty first so
0:07:18about half roughly half of the progress made during the evaluation took place during
0:07:25the first
0:07:25two or three weeks or so
0:07:28and then during the remainder of the four months the rest of the progress was made
0:07:37here we also see cost on the y-axis on the x-axis we see
0:07:43participant id so these are really discrete it's sorted by best cost
0:07:49obtained on the evaluation
0:07:50subset
0:07:52and so we see most of the sites beat the baseline
0:07:59which is trained and a few sites beat an oracle system so i guess speaking
0:08:03of speaking to both of these the baseline i believe is a simple
0:08:13a simple
0:08:17system that used cosine distance and the oracle system used plda
0:08:25so it's called oracle because there were unlabeled data that were distributed to the participants
0:08:30but the oracle system used those labels
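A cosine distance baseline of the kind mentioned above might look like the sketch below. It is not the actual challenge baseline: it assumes each language is modeled by the length-normalized mean of its training i-vectors, and the optional out-of-set threshold is a hypothetical addition for illustration. All function names are made up for this example.

```python
import math


def unit(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def train_language_models(ivectors, labels):
    """Build one model per language: the normalized mean of its unit i-vectors."""
    sums = {}
    for vec, lab in zip(ivectors, labels):
        u = unit(vec)
        if lab not in sums:
            sums[lab] = [0.0] * len(u)
        sums[lab] = [s + x for s, x in zip(sums[lab], u)]
    return {lab: unit(s) for lab, s in sums.items()}


def classify(ivector, models, oos_threshold=None, oos_label="oos"):
    """Pick the language whose model has the highest cosine similarity.

    If oos_threshold is given (an assumption, not from the talk), a best
    score below it is mapped to the out-of-set label.
    """
    u = unit(ivector)
    scores = {lab: sum(a * b for a, b in zip(u, m)) for lab, m in models.items()}
    best = max(scores, key=scores.get)
    if oos_threshold is not None and scores[best] < oos_threshold:
        return oos_label
    return best


models = train_language_models(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    ["a", "a", "b", "b"],
)
```

An oracle-style system in the sense described in the talk would additionally have access to the true labels of the development data, which this sketch does not model.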
0:08:38and here we see the number of submissions per participant
0:08:42in general
0:08:43participants who did well submitted more systems but there were
0:08:48a few exceptions i think now is a reasonable time to mention
0:08:54participant id and
0:08:56site id the distinction between participant and site so
0:09:02a participant is someone who signed up and maybe there were multiple participants per site so
0:09:08ids are not necessarily unrelated for example section three may have also been by thirty
0:09:15just
0:09:20and here we see results by target language we have error rate on the y-axis
0:09:27on the x-axis we see language the lowest error was received on
0:09:39parameters and highest on hindi
0:09:42what was surprising was english also had a high error rate
0:09:47second from hindi actually second from the right
0:09:51and the blue was the out of set languages somewhere in the middle of the pack
0:09:58and here we see results by speech duration i guess no surprise that as you
0:10:04get more audio
0:10:08you tend to do better
0:10:10one thing that
0:10:12is also maybe interesting is there seems to be some diminishing marginal returns
0:10:17so if for example you had three seconds and you could get ten you'd do
0:10:26maybe
0:10:27we
0:10:28point two better but if you went from
0:10:34ten to twenty
0:10:36the difference is not so great
0:10:38just as an example
0:10:42so some lessons learned
0:10:44wonderful participation we're all very grateful to you in the audience who participated
0:10:51as well as those who couldn't join us today
0:10:55a number of systems beat the baseline and surprisingly six were actually better than the oracle
0:10:59system which we're hoping to learn more about
0:11:03half of the improvement was made early on which may suggest we reconsider
0:11:09the timeline
0:11:11surprisingly top systems do not all do so well on english
0:11:18performance on out of set languages also was not as poor as we might have
0:11:22expected
0:11:25we did not receive many system descriptions so it's unclear how many of the participants
0:11:32attempted novel methods although
0:11:34later in the session we'll hear from
0:11:38the team that created the top system
0:11:44that did develop novel techniques and we'll see more of that
0:11:48and the web platform is still up so please feel free to visit and participate in the
0:11:54challenge now
0:11:57and see how you're doing
0:12:00and a quick plug for upcoming activities there's an sre sixteen and workshop
0:12:06where the task is speaker detection on telephone speech recorded over a variety of handsets
0:12:13similar to lre fifteen for those who are familiar there's now a fixed training condition as
0:12:17well as an open condition
0:12:20you can see some other details there about the evaluation and there's also a twenty sixteen
0:12:25lre analysis workshop and all of this will be co-located with slt sixteen in
0:12:30san diego
0:12:32so it looks like we have time for
0:12:35for questions