| 0:00:06 | i |
|---|
| 0:00:07 | you start by one percent |
|---|
| 0:00:10 | this evaluation was carried out in Spain two years ago |
|---|
| 0:00:17 | this is the outline of the presentation |
|---|
| 0:00:20 | i |
|---|
| 0:00:21 | I will talk about the context and motivation, then I will describe the test conditions |
|---|
| 0:00:26 | which are very much like those of the NIST evaluations |
|---|
| 0:00:30 | uh evaluations |
|---|
| 0:00:31 | we took the NIST evaluations as |
|---|
| 0:00:35 | um an example and |
|---|
| 0:00:37 | i understand that in fact |
|---|
| 0:00:39 | so we can be uh |
|---|
| 0:00:41 | maybe uh |
|---|
| 0:00:42 | and ninety percent off |
|---|
| 0:00:44 | the condition |
|---|
| 0:00:45 | then uh i will describe |
|---|
| 0:00:47 | uh basically |
|---|
| 0:00:49 | we thought as possible |
|---|
| 0:00:51 | uh the results and then give some conclusions |
|---|
| 0:00:57 | well uh |
|---|
| 0:00:58 | this evaluation was supported by the Spanish Thematic Network on Speech Technology |
|---|
| 0:01:04 | it was part of the fifth biennial workshop on speech technology, which was |
|---|
| 0:01:09 | held in November |
|---|
| 0:01:12 | with us tonight |
|---|
| 0:01:14 | and in that workshop there were also |
|---|
| 0:01:17 | two other evaluations, on speech translation |
|---|
| 0:01:22 | and screens |
|---|
| 0:01:23 | speech synthesis |
|---|
| 0:01:24 | but wrong |
|---|
| 0:01:25 | the language recognition evaluation |
|---|
| 0:01:27 | i don't you know |
|---|
| 0:01:29 | and there was another motivation, which was that our group was interested in developing |
|---|
| 0:01:36 | language recognition technology for |
|---|
| 0:01:39 | spoken document retrieval applications |
|---|
| 0:01:43 | well what we see on all the points we have |
|---|
| 0:01:47 | we had in mind |
|---|
| 0:01:48 | when designing the evaluation |
|---|
| 0:01:50 | uh |
|---|
| 0:01:51 | well, to promote collaboration between research groups |
|---|
| 0:01:54 | in Spain and also Portugal |
|---|
| 0:01:57 | uh secondly uh |
|---|
| 0:01:59 | to provide |
|---|
| 0:02:00 | a speech database |
|---|
| 0:02:01 | specifically designed |
|---|
| 0:02:03 | to |
|---|
| 0:02:04 | perform recognition of the languages of Spain |
|---|
| 0:02:08 | there are four languages in Spain |
|---|
| 0:02:10 | not everybody knows that |
|---|
| 0:02:12 | that there are |
|---|
| 0:02:13 | four official languages spoken in spain |
|---|
| 0:02:16 | and |
|---|
| 0:02:18 | then another motivation was to measure the accuracy |
|---|
| 0:02:22 | that the state of the art systems |
|---|
| 0:02:24 | could attain for this particular application |
|---|
| 0:02:28 | because these languages uh |
|---|
| 0:02:30 | yeah |
|---|
| 0:02:31 | have evolved jointly in Spain |
|---|
| 0:02:35 | so uh maybe this task |
|---|
| 0:02:37 | could be more challenging than |
|---|
| 0:02:39 | we could expect |
|---|
| 0:02:41 | and finally, to measure |
|---|
| 0:02:44 | this was a diffuse your uh motivation |
|---|
| 0:02:47 | maybe for some you |
|---|
| 0:02:49 | to measure the performance of systems developed |
|---|
| 0:02:52 | on a limited |
|---|
| 0:02:52 | amount of |
|---|
| 0:02:53 | data |
|---|
| 0:02:56 | well uh the language |
|---|
| 0:02:58 | detection task was defined |
|---|
| 0:02:59 | in the same way as for NIST |
|---|
| 0:03:01 | I won't describe this, it is |
|---|
| 0:03:03 | the same |
|---|
| 0:03:04 | it has already been described |
|---|
| 0:03:06 | they were |
|---|
| 0:03:08 | uh |
|---|
| 0:03:11 | yeah that's |
|---|
| 0:03:12 | and this can be assumes |
|---|
| 0:03:13 | simple |
|---|
| 0:03:14 | uh |
|---|
| 0:03:15 | uh |
|---|
| 0:03:16 | what is the |
|---|
| 0:03:17 | described here this this is like |
|---|
| 0:03:19 | the condition on system development, which is |
|---|
| 0:03:22 | a special one |
|---|
| 0:03:23 | and we need to differentiate |
|---|
| 0:03:26 | yeah between uh |
|---|
| 0:03:28 | systems developed in the free condition, |
|---|
| 0:03:31 | using any available materials |
|---|
| 0:03:34 | and systems uh developed |
|---|
| 0:03:36 | using only |
|---|
| 0:03:38 | the data that we provided |
|---|
| 0:03:40 | okay |
|---|
| 0:03:40 | that was very special for this evaluation |
|---|
| 0:03:43 | we were interested in putting all the teams |
|---|
| 0:03:46 | at the same point |
|---|
| 0:03:47 | to develop their systems |
|---|
| 0:03:49 | and then to evaluate |
|---|
| 0:03:50 | what they could |
|---|
| 0:03:51 | do |
|---|
| 0:03:52 | starting from the |
|---|
| 0:03:53 | okay |
|---|
| 0:03:54 | well |
|---|
| 0:03:55 | regarding the set of trials, we defined, as NIST does, the closed-set |
|---|
| 0:03:59 | test condition and the open-set test condition |
|---|
| 0:04:03 | we also defined |
|---|
| 0:04:05 | three kinds of segments: |
|---|
| 0:04:06 | thirty-second, ten-second and three-second segments |
|---|
| 0:04:11 | um we uh defined we used |
|---|
| 0:04:13 | the same performance measures |
|---|
| 0:04:15 | also defined by NIST |
|---|
| 0:04:19 | you have |
|---|
| 0:04:20 | seen |
|---|
| 0:04:22 | these measures in |
|---|
| 0:04:23 | the previous presentation |
|---|
| 0:04:26 | the average cost, |
|---|
| 0:04:27 | we also use |
|---|
| 0:04:28 | the CLLR measure |
|---|
| 0:04:30 | and finally the DET curves to |
|---|
| 0:04:32 | give uh qualitative |
|---|
| 0:04:34 | uh evaluation |
|---|
| 0:04:35 | of the systems |
|---|
| 0:04:36 | we defined the same priors and costs |
|---|
| 0:04:40 | of the last |
|---|
| 0:04:41 | to understand what we |
|---|
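
As a rough illustration of the average cost mentioned above, the following sketch assumes the NIST LRE 2007 definition that the talk says was reused. The default values (miss and false-alarm costs of 1, a target prior of 0.5, and an out-of-set prior of 0 for closed set or 0.2 for open set) are assumptions taken from that plan, not figures stated in the talk; the function name and the dictionary layout are hypothetical.

```python
def average_cost(p_miss, p_fa, p_fa_oos=None,
                 c_miss=1.0, c_fa=1.0, p_target=0.5, p_oos=0.0):
    """Average detection cost over the target languages.

    p_miss[t]    : miss rate for target language t
    p_fa[t][n]   : false-alarm rate of target t on segments of language n
    p_fa_oos[t]  : false-alarm rate of target t on out-of-set segments
    p_oos        : prior of out-of-set speech (0.0 means closed set)
    """
    languages = list(p_miss)
    n_lang = len(languages)
    # Prior mass shared equally by the in-set non-target languages.
    p_non_target = (1.0 - p_target - p_oos) / (n_lang - 1)
    total = 0.0
    for t in languages:
        cost = c_miss * p_target * p_miss[t]
        for n in languages:
            if n != t:
                cost += c_fa * p_non_target * p_fa[t][n]
        if p_oos > 0.0:
            cost += c_fa * p_oos * p_fa_oos[t]
        total += cost
    return total / n_lang


# Toy error rates (made up for illustration only).
langs = ["basque", "catalan", "galician", "spanish"]
toy_miss = {t: 0.1 for t in langs}
toy_fa = {t: {n: 0.1 for n in langs if n != t} for t in langs}
toy_fa_oos = {t: 0.1 for t in langs}

closed_set_cost = average_cost(toy_miss, toy_fa)
open_set_cost = average_cost(toy_miss, toy_fa, toy_fa_oos, p_oos=0.2)
```

With uniform toy error rates the closed-set and open-set costs coincide; with real systems the out-of-set term is what makes the open-set cost higher.
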
| 0:04:45 | well, then, the database we recorded: |
|---|
| 0:04:49 | it was called KALAKA |
|---|
| 0:04:50 | okay |
|---|
| 0:04:51 | i recorded the database |
|---|
| 0:04:53 | from T V |
|---|
| 0:04:53 | in my home |
|---|
| 0:04:56 | just connecting a digital recorder to the |
|---|
| 0:04:59 | to the decoder |
|---|
| 0:05:00 | the |
|---|
| 0:05:01 | cable TV decoder |
|---|
| 0:05:03 | and is described in the paper that a right |
|---|
| 0:05:07 | two thousand two |
|---|
| 0:05:09 | so it includes four target languages: |
|---|
| 0:05:12 | Spanish, Catalan, Basque and Galician, |
|---|
| 0:05:16 | and also other languages, |
|---|
| 0:05:17 | just to allow open-set tests |
|---|
| 0:05:20 | uh the languages |
|---|
| 0:05:22 | uh |
|---|
| 0:05:23 | were French, Portuguese, German and English |
|---|
| 0:05:26 | five or two case is not so close to spanish that you can |
|---|
| 0:05:30 | uh |
|---|
| 0:05:31 | fig |
|---|
| 0:05:31 | so |
|---|
| 0:05:33 | for you want to these people uh |
|---|
| 0:05:35 | find many different |
|---|
| 0:05:37 | um the spanish too |
|---|
| 0:05:38 | it should |
|---|
| 0:05:39 | with the language |
|---|
| 0:05:40 | well, the audio files were WAV files, |
|---|
| 0:05:44 | um |
|---|
| 0:05:45 | sixteen kilohertz |
|---|
| 0:05:46 | uh |
|---|
| 0:05:47 | uh the last frequencies |
|---|
| 0:05:49 | sampling frequency |
|---|
| 0:05:50 | yeah well they were single channel |
|---|
| 0:05:52 | sixteen bits per sample, uncompressed |
|---|
| 0:05:55 | P C M |
|---|
| 0:05:56 | this is another difference |
|---|
| 0:05:58 | with regard to the NIST evaluation: |
|---|
| 0:06:01 | speech signals were all extracted from TV shows, including |
|---|
| 0:06:05 | a lot of speech or spontaneous |
|---|
| 0:06:06 | speech |
|---|
| 0:06:07 | what kind of environment conditions |
|---|
| 0:06:09 | uh yeah |
|---|
| 0:06:11 | for instance, there could be |
|---|
| 0:06:12 | three speakers speaking in a thirty-second |
|---|
| 0:06:15 | segment |
|---|
| 0:06:17 | so |
|---|
| 0:06:17 | there could be many speakers |
|---|
| 0:06:19 | speaking in the same |
|---|
| 0:06:20 | yeah |
|---|
| 0:06:21 | test set |
|---|
| 0:06:23 | well we define |
|---|
| 0:06:24 | disjoint subsets of TV shows |
|---|
| 0:06:27 | for training, development and evaluation |
|---|
| 0:06:29 | this was to guarantee that, |
|---|
| 0:06:33 | more or less |
|---|
| 0:06:34 | that different speakers |
|---|
| 0:06:36 | were in each |
|---|
| 0:06:38 | in each |
|---|
| 0:06:38 | subset |
|---|
| 0:06:40 | and finally, the whole database is pretty |
|---|
| 0:06:42 | small |
|---|
| 0:06:43 | for |
|---|
| 0:06:44 | NIST standards: |
|---|
| 0:06:45 | it's just fifty hours long |
|---|
| 0:06:49 | is distributed to |
|---|
| 0:06:50 | C D V D but |
|---|
| 0:06:52 | and um |
|---|
| 0:06:53 | we are just now uh |
|---|
| 0:06:56 | talking with |
|---|
| 0:06:56 | the LDC to distribute it |
|---|
| 0:06:59 | through |
|---|
| 0:06:59 | the LDC |
|---|
| 0:07:00 | and the training dataset includes |
|---|
| 0:07:03 | good |
|---|
| 0:07:03 | yeah |
|---|
| 0:07:04 | last |
|---|
| 0:07:05 | thirty-six hours, |
|---|
| 0:07:07 | nine hours per target language |
|---|
| 0:07:09 | we didn't provide any data to |
|---|
| 0:07:13 | train |
|---|
| 0:07:14 | or something like that |
|---|
| 0:07:15 | for |
|---|
| 0:07:16 | the out-of-set languages |
|---|
| 0:07:17 | so just |
|---|
| 0:07:18 | nine hours |
|---|
| 0:07:19 | per target language, |
|---|
| 0:07:20 | that's that's all |
|---|
| 0:07:21 | and the out-of-set languages |
|---|
| 0:07:23 | only appear |
|---|
| 0:07:24 | in the development dataset and in the evaluation dataset, |
|---|
| 0:07:28 | which are more or less the we |
|---|
| 0:07:30 | have more or less the same structure |
|---|
| 0:07:32 | but |
|---|
| 0:07:33 | i don't |
|---|
| 0:07:34 | i would then |
|---|
| 0:07:36 | well, when designing the database, |
|---|
| 0:07:40 | we only chose |
|---|
| 0:07:42 | tools um |
|---|
| 0:07:44 | high-SNR speech, |
|---|
| 0:07:46 | discarding segments with |
|---|
| 0:07:47 | right |
|---|
| 0:07:48 | background noise, |
|---|
| 0:07:49 | uh speech overlaps |
|---|
| 0:07:51 | they or what all of that all of them |
|---|
| 0:07:54 | were filtered out |
|---|
| 0:07:55 | and regarding segments for training, there were no length |
|---|
| 0:08:00 | restrictions |
|---|
| 0:08:02 | maybe five minutes |
|---|
| 0:08:03 | you could train with a five minute segment |
|---|
| 0:08:05 | with |
|---|
| 0:08:07 | but for the segments for development and evaluation |
|---|
| 0:08:11 | yeah |
|---|
| 0:08:12 | to cut |
|---|
| 0:08:13 | uh |
|---|
| 0:08:13 | length restrictions, |
|---|
| 0:08:15 | um |
|---|
| 0:08:16 | we defined an automatic |
|---|
| 0:08:18 | a way of constructing them |
|---|
| 0:08:21 | and ensuring that they were enclosed by silence, |
|---|
| 0:08:24 | more or less |
|---|
| 0:08:26 | yeah and in fact the subsets |
|---|
| 0:08:29 | well the subset of |
|---|
| 0:08:30 | three-second segments |
|---|
| 0:08:31 | is |
|---|
| 0:08:32 | extracted from the subset |
|---|
| 0:08:34 | subset |
|---|
| 0:08:35 | of |
|---|
| 0:08:35 | ten-second segments, |
|---|
| 0:08:37 | and in the same way, the ten-second segment |
|---|
| 0:08:40 | subset is extracted from the |
|---|
| 0:08:42 | thirty-second |
|---|
| 0:08:44 | segment subset |
|---|
| 0:08:45 | it was quite difficult, but |
|---|
| 0:08:47 | what |
|---|
| 0:08:47 | we tried is to ensure that differences in |
|---|
| 0:08:51 | in performance |
|---|
| 0:08:53 | would be due only to the length, |
|---|
| 0:08:56 | not to |
|---|
| 0:08:57 | um |
|---|
| 0:08:58 | being tested against different material |
|---|
| 0:09:02 | and |
|---|
| 0:09:03 | there was some tolerance in length: |
|---|
| 0:09:06 | we use in fact |
|---|
| 0:09:08 | segments between |
|---|
| 0:09:09 | three and five seconds |
|---|
| 0:09:10 | ten and twelve seconds and |
|---|
| 0:09:12 | fig the active duty cycle |
|---|
| 0:09:14 | where |
|---|
| 0:09:15 | the door |
|---|
| 0:09:16 | uh interval |
|---|
| 0:09:18 | and finally, the development dataset, |
|---|
| 0:09:20 | and the same for evaluation |
|---|
| 0:09:22 | but uh |
|---|
| 0:09:23 | one thousand eight hundred segments, |
|---|
| 0:09:26 | yeah |
|---|
| 0:09:27 | six hundred for each of the three durations, so there were |
|---|
| 0:09:31 | six hundred segments per duration, |
|---|
| 0:09:33 | and for each duration there were one hundred and twenty segments per target language |
|---|
| 0:09:38 | and |
|---|
| 0:09:39 | i know that |
|---|
| 0:09:40 | one hundred and twenty segments of out-of-set languages |
|---|
| 0:09:43 | it's uh |
|---|
| 0:09:45 | i have to say that |
|---|
| 0:09:46 | this uh |
|---|
| 0:09:48 | it means that |
|---|
| 0:09:49 | yeah |
|---|
| 0:09:50 | the where exactly it |
|---|
| 0:09:52 | oh well too |
|---|
| 0:09:53 | uh yeah |
|---|
| 0:09:56 | twenty percent |
|---|
| 0:09:57 | of um |
|---|
| 0:09:59 | segments were |
|---|
| 0:10:00 | from out-of-set languages |
|---|
| 0:10:02 | as |
|---|
| 0:10:02 | in both the development and evaluation datasets, |
|---|
| 0:10:06 | which might |
|---|
| 0:10:07 | exactly |
|---|
| 0:10:08 | was what |
|---|
| 0:10:09 | what was defined |
|---|
| 0:10:10 | in the |
|---|
| 0:10:11 | in the right |
|---|
| 0:10:13 | thing |
|---|
| 0:10:15 | um |
|---|
| 0:10:17 | well |
|---|
| 0:10:17 | everybody database design |
|---|
| 0:10:19 | the proportions of unknown languages |
|---|
| 0:10:21 | were made |
|---|
| 0:10:22 | deliberately |
|---|
| 0:10:23 | but difficult for me to promote it |
|---|
| 0:10:25 | different for development and evaluation, |
|---|
| 0:10:28 | and to avoid uh |
|---|
| 0:10:30 | tuning systems to reject them specifically |
|---|
| 0:10:33 | so uh |
|---|
| 0:10:34 | here is part of the table of |
|---|
| 0:10:36 | the distribution of segments |
|---|
| 0:10:38 | for development and evaluation |
|---|
| 0:10:40 | you can see that |
|---|
| 0:10:41 | there were |
|---|
| 0:10:42 | seventy segments for French, |
|---|
| 0:10:43 | ten for Portuguese and |
|---|
| 0:10:45 | forty for English, |
|---|
| 0:10:46 | and none for German in the development set, |
|---|
| 0:10:50 | and in the evaluation set |
|---|
| 0:10:52 | the proportions were changed |
|---|
| 0:10:54 | between for example to be sent english and german |
|---|
| 0:10:57 | so |
|---|
| 0:10:58 | was |
|---|
| 0:10:59 | may |
|---|
| 0:11:00 | this way |
|---|
| 0:11:02 | uh |
|---|
| 0:11:03 | evaluation do simply |
|---|
| 0:11:05 | there was an evaluation plan very similar to |
|---|
| 0:11:08 | that of NIST |
|---|
| 0:11:10 | there were four test conditions: open-set free, open-set restricted, closed-set free and closed-set |
|---|
| 0:11:16 | restricted, |
|---|
| 0:11:17 | and three durations as well |
|---|
| 0:11:19 | to attract |
|---|
| 0:11:20 | five |
|---|
| 0:11:21 | for it |
|---|
| 0:11:22 | in each condition, each team could present just one single primary system |
|---|
| 0:11:27 | and any number of contrastive or alternative systems |
|---|
| 0:11:31 | they wanted to present |
|---|
| 0:11:32 | the results should be |
|---|
| 0:11:34 | submitted by the teams |
|---|
| 0:11:37 | in the NIST evaluation format, |
|---|
| 0:11:40 | at this file with one hundred trials section |
|---|
| 0:11:43 | fig spline |
|---|
| 0:11:45 | participants were also required to specify |
|---|
| 0:11:49 | whether or not the scores may be interpreted |
|---|
| 0:11:52 | as |
|---|
| 0:11:52 | log-like |
|---|
| 0:11:53 | uh log |
|---|
| 0:11:53 | likelihood ratios |
|---|
| 0:11:55 | or not |
|---|
| 0:11:56 | and also to send a system description and to participate |
|---|
| 0:12:01 | in the ann arbour scene with us tonight |
|---|
| 0:12:03 | and then with evolution evolution works |
|---|
| 0:12:06 | okay |
|---|
| 0:12:07 | systems were ranked, |
|---|
| 0:12:08 | in fact, according to their average cost |
|---|
| 0:12:12 | and are defined that way |
|---|
| 0:12:14 | in this |
|---|
| 0:12:15 | fancy |
|---|
| 0:12:16 | and |
|---|
| 0:12:17 | though |
|---|
| 0:12:18 | was run it |
|---|
| 0:12:19 | and there was an award for the best system, |
|---|
| 0:12:23 | the system yielding |
|---|
| 0:12:25 | the least |
|---|
| 0:12:25 | average cost in |
|---|
| 0:12:27 | the |
|---|
| 0:12:28 | mandatory condition, |
|---|
| 0:12:30 | closed-set restricted, |
|---|
| 0:12:31 | on |
|---|
| 0:12:32 | the subset of |
|---|
| 0:12:34 | uh |
|---|
| 0:12:35 | thirty-second segments |
|---|
| 0:12:38 | well |
|---|
| 0:12:38 | this was the |
|---|
| 0:12:40 | schedule of the evaluation |
|---|
| 0:12:42 | uh |
|---|
| 0:12:42 | in |
|---|
| 0:12:44 | few words |
|---|
| 0:12:45 | there were |
|---|
| 0:12:46 | three months for developing the systems |
|---|
| 0:12:48 | and there were three weeks to |
|---|
| 0:12:50 | process the evaluation data |
|---|
| 0:12:53 | and |
|---|
| 0:12:54 | I have to say that the database was produced |
|---|
| 0:12:58 | in |
|---|
| 0:12:59 | three months, |
|---|
| 0:12:59 | from april to june |
|---|
| 0:13:01 | depends on a |
|---|
| 0:13:02 | and we also recorded some more um |
|---|
| 0:13:06 | data in september |
|---|
| 0:13:08 | to find something to |
|---|
| 0:13:08 | two |
|---|
| 0:13:09 | uh |
|---|
| 0:13:11 | and uh complete |
|---|
| 0:13:12 | the evaluation on the test |
|---|
| 0:13:14 | okay |
|---|
| 0:13:16 | you can find that in the paper |
|---|
| 0:13:19 | well |
|---|
| 0:13:20 | uh |
|---|
| 0:13:21 | now I begin to describe the results |
|---|
| 0:13:24 | yeah |
|---|
| 0:13:25 | there were four participants, |
|---|
| 0:13:27 | displayed in teams |
|---|
| 0:13:29 | presenting systems; |
|---|
| 0:13:31 | teams were from Spain and Portugal |
|---|
| 0:13:34 | and there were two teams presenting state-of-the-art systems, more or less, |
|---|
| 0:13:39 | and the two first the first ones |
|---|
| 0:13:41 | T one T two |
|---|
| 0:13:43 | and the other two presented systems not |
|---|
| 0:13:46 | specifically designed for uh |
|---|
| 0:13:48 | a language recognition applications so |
|---|
| 0:13:51 | the the source world |
|---|
| 0:13:52 | this is the table of the average cost |
|---|
| 0:13:55 | for |
|---|
| 0:13:56 | thirty-second segments |
|---|
| 0:13:58 | you can |
|---|
| 0:13:59 | see that |
|---|
| 0:14:00 | their |
|---|
| 0:14:00 | performance was |
|---|
| 0:14:02 | very bad |
|---|
| 0:14:04 | so in the following I will only |
|---|
| 0:14:08 | because |
|---|
| 0:14:08 | talk about |
|---|
| 0:14:09 | results of these two two |
|---|
| 0:14:11 | to to |
|---|
| 0:14:13 | okay |
|---|
| 0:14:14 | well |
|---|
| 0:14:15 | now I show results; first, the |
|---|
| 0:14:18 | condition i |
|---|
| 0:14:20 | talk about |
|---|
| 0:14:21 | is |
|---|
| 0:14:21 | the the mandatory one |
|---|
| 0:14:23 | for which all the teams had to |
|---|
| 0:14:26 | present a system |
|---|
| 0:14:28 | you can see here |
|---|
| 0:14:29 | cool |
|---|
| 0:14:30 | that's good |
|---|
| 0:14:31 | um |
|---|
| 0:14:33 | this uh |
|---|
| 0:14:34 | like what this |
|---|
| 0:14:35 | one is uh |
|---|
| 0:14:37 | for a contrastive system from T1, which |
|---|
| 0:14:40 | in fact |
|---|
| 0:14:41 | uh got the best result |
|---|
| 0:14:43 | the best average cost, |
|---|
| 0:14:45 | but the best primary system was also |
|---|
| 0:14:47 | from T one |
|---|
| 0:14:49 | uh |
|---|
| 0:14:50 | they have the |
|---|
| 0:14:51 | then um |
|---|
| 0:14:54 | okay |
|---|
| 0:14:55 | yeah |
|---|
| 0:14:56 | when |
|---|
| 0:14:57 | when channel |
|---|
| 0:14:57 | this was i |
|---|
| 0:14:58 | okay |
|---|
| 0:14:59 | to say that this was in the restricted condition |
|---|
| 0:15:02 | here you can see a |
|---|
| 0:15:04 | big difference between |
|---|
| 0:15:06 | team two and |
|---|
| 0:15:08 | team one |
|---|
| 0:15:09 | uh because uh |
|---|
| 0:15:11 | they were required to develop their systems |
|---|
| 0:15:14 | using |
|---|
| 0:15:15 | just the data provided for this evaluation, |
|---|
| 0:15:18 | not using |
|---|
| 0:15:19 | any other resources |
|---|
| 0:15:20 | they rely on all the data and all the like |
|---|
| 0:15:23 | okay |
|---|
| 0:15:24 | so when changing to the free condition |
|---|
| 0:15:28 | uh |
|---|
| 0:15:29 | we |
|---|
| 0:15:29 | see that the systems got |
|---|
| 0:15:32 | uh |
|---|
| 0:15:32 | much better |
|---|
| 0:15:33 | performance |
|---|
| 0:15:34 | around |
|---|
| 0:15:35 | five percent equal error rate |
|---|
| 0:15:37 | but the |
|---|
| 0:15:39 | in fact we were surprised |
|---|
| 0:15:41 | by this result because we expected |
|---|
| 0:15:43 | much better results |
|---|
| 0:15:45 | around one percent or less |
|---|
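
For reference, an equal error rate like the one quoted above can be estimated from detection scores with a simple threshold sweep. This is only a minimal sketch: the function name and the toy scores are hypothetical, and the evaluation's own scoring tools are not described in the talk.

```python
def equal_error_rate(target_scores, non_target_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(non_target_scores))
    best_gap, eer = None, None
    for th in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        p_fa = sum(s >= th for s in non_target_scores) / len(non_target_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where miss and false-alarm rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer


# Toy scores, made up for illustration only.
toy_targets = [0.9, 0.8, 0.7, 0.3]
toy_non_targets = [0.1, 0.2, 0.4, 0.6]
toy_eer = equal_error_rate(toy_targets, toy_non_targets)
```

With few trials the miss and false-alarm curves are step functions, so the value returned is the midpoint at the closest crossing rather than an exact intersection.
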
| 0:15:47 | yeah |
|---|
| 0:15:49 | and uh we uh |
|---|
| 0:15:51 | made some experiments afterwards |
|---|
| 0:15:53 | the the the one mission which |
|---|
| 0:15:55 | on system |
|---|
| 0:15:57 | a system that got around |
|---|
| 0:16:00 | fig |
|---|
| 0:16:01 | he percent or whatever right |
|---|
| 0:16:02 | in the general language recognition task defined in the NIST |
|---|
| 0:16:05 | two thousand seven |
|---|
| 0:16:07 | evaluation |
|---|
| 0:16:08 | and we've got |
|---|
| 0:16:09 | five |
|---|
| 0:16:10 | yeah |
|---|
| 0:16:11 | forty five percent whatever right |
|---|
| 0:16:13 | so |
|---|
| 0:16:14 | five |
|---|
| 0:16:15 | it seems that uh this task |
|---|
| 0:16:17 | mm |
|---|
| 0:16:18 | the task defined for |
|---|
| 0:16:19 | for the Albayzin evaluation |
|---|
| 0:16:22 | is |
|---|
| 0:16:23 | uh more difficult than |
|---|
| 0:16:24 | expected |
|---|
| 0:16:25 | okay |
|---|
| 0:16:27 | there are some possible issues; the first one: are the results comparable |
|---|
| 0:16:33 | between the NIST |
|---|
| 0:16:34 | evaluation and this evaluation? |
|---|
| 0:16:36 | maybe not |
|---|
| 0:16:38 | the statistical significance |
|---|
| 0:16:40 | there are not many uh trials |
|---|
| 0:16:42 | only |
|---|
| 0:16:43 | six hundred |
|---|
| 0:16:44 | uh |
|---|
| 0:16:46 | per duration |
|---|
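
To give a feeling for the statistical significance concern raised here, the sketch below computes a normal-approximation confidence interval for an observed error rate. Treating the six hundred segments per duration as independent trials, the 5% figure, and the 95% level are all simplifying assumptions made for this illustration; the function name is hypothetical.

```python
import math


def error_rate_interval(error_rate, n_trials, z=1.96):
    """Normal-approximation confidence interval for an observed error rate."""
    std_err = math.sqrt(error_rate * (1.0 - error_rate) / n_trials)
    low = max(0.0, error_rate - z * std_err)
    high = min(1.0, error_rate + z * std_err)
    return low, high


# Illustrative values only: a 5% error rate measured on 600 trials.
low, high = error_rate_interval(0.05, 600)
```

Under these assumptions the interval spans several percentage points, which is why small differences between systems on a set of this size should be read with caution.
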
| 0:16:47 | okay |
|---|
| 0:16:48 | and there are also some possible explanations, maybe the acoustic variability: speakers, channel, |
|---|
| 0:16:53 | background noise |
|---|
| 0:16:55 | there were different conditions |
|---|
| 0:16:56 | and also the phonetic and lexical |
|---|
| 0:16:59 | but for these |
|---|
| 0:17:01 | uh |
|---|
| 0:17:01 | the phonetic and lexical similarity among the target languages, |
|---|
| 0:17:05 | or more than one |
|---|
| 0:17:07 | the same country |
|---|
| 0:17:09 | oh no |
|---|
| 0:17:10 | many years many centuries |
|---|
| 0:17:11 | what we |
|---|
| 0:17:12 | don't leave so |
|---|
| 0:17:14 | maybe this is the |
|---|
| 0:17:15 | main reason |
|---|
| 0:17:16 | in any case |
|---|
| 0:17:18 | as I have said, the task seems challenging enough |
|---|
| 0:17:22 | to support further research |
|---|
| 0:17:24 | in |
|---|
| 0:17:25 | language recognition technology |
|---|
| 0:17:27 | well |
|---|
| 0:17:27 | yeah this is |
|---|
| 0:17:29 | we have been talking about the closed |
|---|
| 0:17:32 | set |
|---|
| 0:17:32 | condition; now I'm talking about |
|---|
| 0:17:35 | the open-set condition |
|---|
| 0:17:36 | the best performance in this case was |
|---|
| 0:17:38 | worse |
|---|
| 0:17:39 | like for |
|---|
| 0:17:41 | yeah because there are |
|---|
| 0:17:43 | unknown languages |
|---|
| 0:17:44 | in there |
|---|
| 0:17:44 | the trials |
|---|
| 0:17:46 | and the best system got around nine percent equal error rate |
|---|
| 0:17:51 | this case |
|---|
| 0:17:52 | which is almost |
|---|
| 0:17:53 | two times the error rate in the |
|---|
| 0:17:55 | closed-set condition |
|---|
| 0:17:57 | so that |
|---|
| 0:17:57 | three conditions |
|---|
| 0:17:59 | yeah well |
|---|
| 0:18:00 | our conclusion is that some unknown languages are being confused with target languages, maybe Portuguese or French, |
|---|
| 0:18:06 | we don't know |
|---|
| 0:18:09 | well |
|---|
| 0:18:09 | yeah you have uh |
|---|
| 0:18:11 | there was these results |
|---|
| 0:18:12 | the error rate per target language |
|---|
| 0:18:15 | uh uh for the best system |
|---|
| 0:18:18 | so here you can see it for the closed |
|---|
| 0:18:21 | set condition and for the open-set condition |
|---|
| 0:18:24 | and |
|---|
| 0:18:25 | the green one is for Basque, |
|---|
| 0:18:28 | which got the best |
|---|
| 0:18:29 | uh performance |
|---|
| 0:18:31 | and then uh |
|---|
| 0:18:32 | the red one is for Catalan, |
|---|
| 0:18:34 | which got the |
|---|
| 0:18:35 | worst performance |
|---|
| 0:18:36 | in the open-set condition |
|---|
| 0:18:38 | and you can see that, for Basque, |
|---|
| 0:18:42 | the change in the performance |
|---|
| 0:18:43 | for Basque |
|---|
| 0:18:44 | really |
|---|
| 0:18:45 | is small |
|---|
| 0:18:47 | also forced by means which is the |
|---|
| 0:18:50 | i think |
|---|
| 0:18:51 | but this is the kid a blue |
|---|
| 0:18:53 | and the power point it's not easy and which also |
|---|
| 0:18:56 | uh wasn't |
|---|
| 0:18:57 | yeah |
|---|
| 0:18:57 | it's performance but not as |
|---|
| 0:18:59 | much as for Catalan |
|---|
| 0:19:02 | so we have analysed this in more detail |
|---|
| 0:19:06 | with |
|---|
| 0:19:06 | this table |
|---|
| 0:19:07 | i have to say that |
|---|
| 0:19:09 | there is an error in the paper: |
|---|
| 0:19:12 | uh these numbers are |
|---|
| 0:19:13 | five |
|---|
| 0:19:14 | the uh error rates |
|---|
| 0:19:16 | uh |
|---|
| 0:19:17 | you need some before somehow |
|---|
| 0:19:19 | we missing there |
|---|
| 0:19:21 | in the |
|---|
| 0:19:22 | diagonal and |
|---|
| 0:19:23 | the false alarm rates outside the diagonal; |
|---|
| 0:19:25 | um |
|---|
| 0:19:27 | we mistook them as costs |
|---|
| 0:19:29 | so yes this |
|---|
| 0:19:30 | they they did but they soon as the same but |
|---|
| 0:19:32 | the |
|---|
| 0:19:34 | the numbers are not |
|---|
| 0:19:35 | what the paper says |
|---|
| 0:19:37 | they are |
|---|
| 0:19:37 | okay |
|---|
| 0:19:38 | ah as you can |
|---|
| 0:19:40 | there is a grey-level code here, |
|---|
| 0:19:43 | the white meaning |
|---|
| 0:19:44 | zero error and black meaning |
|---|
| 0:19:47 | one |
|---|
| 0:19:48 | the maximum possible error |
|---|
| 0:19:51 | uh yeah |
|---|
| 0:19:51 | so this is for the closed-set condition |
|---|
| 0:19:54 | and when changing to the open set |
|---|
| 0:19:57 | you can see here, |
|---|
| 0:19:59 | really |
|---|
| 0:20:00 | especially for Catalan, |
|---|
| 0:20:01 | no languages |
|---|
| 0:20:03 | tyler |
|---|
| 0:20:04 | you find that |
|---|
| 0:20:05 | um |
|---|
| 0:20:06 | many uh |
|---|
| 0:20:07 | trials |
|---|
| 0:20:08 | corresponding to unknown languages were |
|---|
| 0:20:10 | confused with Catalan |
|---|
| 0:20:12 | that's |
|---|
| 0:20:12 | the origin of |
|---|
| 0:20:14 | that uh changing the core |
|---|
| 0:20:17 | for the open set condition |
|---|
| 0:20:19 | okay |
|---|
| 0:20:20 | uh |
|---|
| 0:20:21 | this result I am not going to comment on |
|---|
| 0:20:23 | because it's the same as for NIST: |
|---|
| 0:20:26 | the the performance |
|---|
| 0:20:27 | worsens as |
|---|
| 0:20:29 | the |
|---|
| 0:20:30 | double of the land |
|---|
| 0:20:31 | uh is |
|---|
| 0:20:32 | less |
|---|
| 0:20:34 | of the around |
|---|
| 0:20:35 | the segment |
|---|
| 0:20:37 | and uh |
|---|
| 0:20:38 | this is for |
|---|
| 0:20:39 | more interesting for me |
|---|
| 0:20:41 | is |
|---|
| 0:20:41 | because uh |
|---|
| 0:20:43 | you can see |
|---|
| 0:20:44 | what happens when you restrict |
|---|
| 0:20:46 | the development conditions |
|---|
| 0:20:49 | uh you have here um |
|---|
| 0:20:51 | yeah two different teams |
|---|
| 0:20:53 | the blue ones |
|---|
| 0:20:54 | is team one, |
|---|
| 0:20:55 | the red one is team two |
|---|
| 0:20:57 | and |
|---|
| 0:20:58 | for the free |
|---|
| 0:21:00 | condition |
|---|
| 0:21:01 | well then uh got more or less the same |
|---|
| 0:21:04 | performance |
|---|
| 0:21:06 | this |
|---|
| 0:21:06 | to go |
|---|
| 0:21:08 | but for the restricted condition when |
|---|
| 0:21:10 | restricting the materials |
|---|
| 0:21:12 | they could use |
|---|
| 0:21:13 | to develop their systems |
|---|
| 0:21:16 | what uh the T one |
|---|
| 0:21:18 | okay |
|---|
| 0:21:19 | the the performance |
|---|
| 0:21:21 | quite close to the |
|---|
| 0:21:22 | to the other condition, whereas for the other one |
|---|
| 0:21:25 | uh |
|---|
| 0:21:26 | the performance was much |
|---|
| 0:21:28 | much worse |
|---|
| 0:21:29 | the difference |
|---|
| 0:21:30 | it's |
|---|
| 0:21:32 | uh |
|---|
| 0:21:33 | forty percent worse |
|---|
| 0:21:34 | average cost |
|---|
| 0:21:35 | versus |
|---|
| 0:21:36 | four hundred percent |
|---|
| 0:21:38 | worse average cost |
|---|
| 0:21:40 | these |
|---|
| 0:21:41 | okay |
|---|
| 0:21:41 | so |
|---|
| 0:21:42 | i think this is important because |
|---|
| 0:21:44 | you can |
|---|
| 0:21:45 | i for me |
|---|
| 0:21:46 | this system is much more robust |
|---|
| 0:21:48 | than the other one |
|---|
| 0:21:50 | because it |
|---|
| 0:21:51 | not |
|---|
| 0:21:52 | does |
|---|
| 0:21:52 | does not depend |
|---|
| 0:21:54 | on |
|---|
| 0:21:55 | so much on the materials provided to train it |
|---|
| 0:21:59 | okay |
|---|
| 0:22:00 | well |
|---|
| 0:22:01 | conclusions |
|---|
| 0:22:04 | um |
|---|
| 0:22:05 | well |
|---|
| 0:22:06 | we have presented an evaluation involving |
|---|
| 0:22:10 | the official languages of Spain: |
|---|
| 0:22:12 | Basque, Catalan, Galician and Spanish, |
|---|
| 0:22:14 | using material recorded from TV |
|---|
| 0:22:19 | broadcasts |
|---|
| 0:22:20 | systems applying |
|---|
| 0:22:22 | state-of-the-art technology got around five percent equal error rate |
|---|
| 0:22:25 | in the closed-set, |
|---|
| 0:22:26 | free development condition |
|---|
| 0:22:28 | just |
|---|
| 0:22:28 | what's that |
|---|
| 0:22:29 | the rest for for them |
|---|
| 0:22:32 | and |
|---|
| 0:22:33 | we think |
|---|
| 0:22:34 | that |
|---|
| 0:22:34 | fine task |
|---|
| 0:22:35 | tasks |
|---|
| 0:22:36 | in this uh evaluation |
|---|
| 0:22:38 | may support further developments in language recognition technology |
|---|
| 0:22:42 | and we found |
|---|
| 0:22:44 | that sensitivity to the development restrictions varies, |
|---|
| 0:22:48 | depending depending |
|---|
| 0:22:50 | on the system |
|---|
| 0:22:52 | uh |
|---|
| 0:22:53 | from two different systems |
|---|
| 0:22:54 | that |
|---|
| 0:22:56 | uh |
|---|
| 0:22:57 | the increase in cost |
|---|
| 0:22:58 | was different |
|---|
| 0:23:00 | for them |
|---|
| 0:23:01 | my thing to be |
|---|
| 0:23:03 | this |
|---|
| 0:23:03 | uh |
|---|
| 0:23:05 | condition i don't know if |
|---|
| 0:23:06 | you are interested in restricting |
|---|
| 0:23:08 | the materials |
|---|
| 0:23:10 | but i think |
|---|
| 0:23:11 | it could be interesting for NIST evaluations |
|---|
| 0:23:14 | maybe i don't know |
|---|
| 0:23:15 | you are interested but |
|---|
| 0:23:17 | um |
|---|
| 0:23:19 | and finally we found not the same performance among the target languages: |
|---|
| 0:23:23 | the best performance was found for Basque |
|---|
| 0:23:25 | and the |
|---|
| 0:23:26 | worst performance was found for Catalan |
|---|
| 0:23:29 | speculating about these |
|---|
| 0:23:30 | we can say that Basque |
|---|
| 0:23:33 | is a special language, |
|---|
| 0:23:35 | not a Romance language; |
|---|
| 0:23:37 | uh data |
|---|
| 0:23:38 | its origins are different |
|---|
| 0:23:40 | from |
|---|
| 0:23:40 | all the other languages in Spain, |
|---|
| 0:23:43 | and Catalan is a Romance language |
|---|
| 0:23:45 | which you may be |
|---|
| 0:23:46 | usually confused |
|---|
| 0:23:48 | people by the systems |
|---|
| 0:23:50 | with portuguese or |
|---|
| 0:23:51 | maybe french |
|---|
| 0:23:53 | or maybe |
|---|
| 0:23:54 | spain uses pennies or at least |
|---|
| 0:23:57 | well |
|---|
| 0:23:58 | and |
|---|
| 0:23:59 | finally i have to say our current work |
|---|
| 0:24:01 | is organising |
|---|
| 0:24:03 | in this in this evaluation |
|---|
| 0:24:04 | we are not just now |
|---|
| 0:24:06 | or anything |
|---|
| 0:24:07 | this evaluation the albayzin twenty ten language recognition evaluation |
|---|
| 0:24:12 | yeah yeah |
|---|
| 0:24:12 | we have recorded i mean we have extended |
|---|
| 0:24:15 | the |
|---|
| 0:24:16 | kalaka database |
|---|
| 0:24:17 | which was |
|---|
| 0:24:18 | the one used |
|---|
| 0:24:19 | before |
|---|
| 0:24:20 | to uh define kalaka two |
|---|
| 0:24:23 | we have added portuguese and english as target languages |
|---|
| 0:24:27 | maybe you are interested |
|---|
| 0:24:28 | these languages |
|---|
| 0:24:29 | we have a new set of unknown languages |
|---|
| 0:24:32 | and have included |
|---|
| 0:24:34 | a new test condition for noisy speech |
|---|
| 0:24:39 | this is the schedule |
|---|
| 0:24:40 | you can register |
|---|
| 0:24:42 | yeah until july |
|---|
| 0:24:43 | fifteen |
|---|
| 0:24:45 | i'm september you have |
|---|
| 0:24:47 | more or less three months |
|---|
| 0:24:49 | if you register now |
|---|
| 0:24:52 | until september twenty seven |
|---|
| 0:24:54 | to uh develop your systems |
|---|
| 0:24:57 | and two weeks |
|---|
| 0:24:58 | to uh process |
|---|
| 0:25:00 | uh evaluation data |
|---|
| 0:25:02 | then the key file and results will be released |
|---|
| 0:25:05 | on october |
|---|
| 0:25:07 | fifteen |
|---|
| 0:25:08 | and the workshop |
|---|
| 0:25:10 | yeah |
|---|
| 0:25:11 | language recognition what is and what's not |
|---|
| 0:25:13 | will be held |
|---|
| 0:25:14 | in november in vigo spain |
|---|
| 0:25:18 | in a contest of how to do something that is uh |
|---|
| 0:25:22 | what's up uh |
|---|
| 0:25:23 | spain |
|---|
| 0:25:25 | okay |
|---|
| 0:25:26 | uh you can participate in this |
|---|
| 0:25:28 | uh |
|---|
| 0:25:30 | well |
|---|
| 0:25:31 | and if you like it please |
|---|
| 0:25:33 | participate |
|---|
| 0:25:35 | that's all |
|---|
| 0:25:43 | they should |
|---|
| 0:25:48 | you mentioned at the beginning |
|---|
| 0:25:50 | when you collect the database |
|---|
| 0:25:52 | you made sure |
|---|
| 0:25:53 | uh |
|---|
| 0:25:54 | not to repeat speakers |
|---|
| 0:25:56 | no no i'm not sure |
|---|
| 0:25:57 | i |
|---|
| 0:25:58 | try to uh |
|---|
| 0:25:59 | distribute programs |
|---|
| 0:26:01 | in different sets for instance one uh one T V show |
|---|
| 0:26:04 | was only for evaluation |
|---|
| 0:26:06 | another T V show was only for development and |
|---|
| 0:26:10 | you |
|---|
| 0:26:11 | for instance yeah |
|---|
| 0:26:12 | this uh T V show called colour |
|---|
| 0:26:15 | in basque |
|---|
| 0:26:16 | this T V show was |
|---|
| 0:26:17 | i think it was only for training |
|---|
| 0:26:20 | but not for development not |
|---|
| 0:26:22 | i'm not sure |
|---|
| 0:26:23 | that if they are not |
|---|
| 0:26:24 | the same speakers |
|---|
| 0:26:26 | i like |
|---|
| 0:26:27 | i i |
|---|
| 0:26:27 | we tried |
|---|
| 0:26:28 | to |
|---|
| 0:26:29 | to um |
|---|
| 0:26:31 | to manage |
|---|
| 0:26:31 | the |
|---|
| 0:26:33 | just understood |
|---|
| 0:26:33 | oh |
|---|
| 0:26:34 | i'm just speculating that |
|---|
| 0:26:36 | uh |
|---|
| 0:26:37 | the speakers are also like |
|---|
| 0:26:39 | what |
|---|
| 0:26:39 | well developed |
|---|
| 0:26:40 | see |
|---|
| 0:26:40 | oh |
|---|
| 0:26:41 | if |
|---|
| 0:26:42 | there's a lot |
|---|
| 0:26:44 | uh_huh |
|---|
| 0:26:45 | repeated speaker in |
|---|
| 0:26:47 | elements |
|---|
| 0:26:48 | no |
|---|
| 0:26:49 | uh |
|---|
| 0:26:50 | you could lead to recognise |
|---|
| 0:26:51 | speaker |
|---|
| 0:26:51 | right |
|---|
| 0:26:52 | lang |
|---|
| 0:26:53 | no because uh we try to put on so and |
|---|
| 0:26:57 | and |
|---|
| 0:26:58 | many speakers in in |
|---|
| 0:27:00 | that is not programs like um |
|---|
| 0:27:03 | um |
|---|
| 0:27:04 | broadcast news |
|---|
| 0:27:05 | where there is only one or two speakers speaking all the time |
|---|
| 0:27:08 | or more |
|---|
| 0:27:09 | much yeah |
|---|
| 0:27:10 | much |
|---|
| 0:27:11 | time |
|---|
| 0:27:12 | we try to select various |
|---|
| 0:27:13 | T V shows |
|---|
| 0:27:14 | different T V shows |
|---|
| 0:27:16 | uh essentially debates |
|---|
| 0:27:18 | and talk shows where many people speak |
|---|
| 0:27:21 | and there are also interviews so |
|---|
| 0:27:24 | so |
|---|
| 0:27:25 | maybe it is true what you're telling |
|---|
| 0:27:27 | you are suggesting |
|---|
| 0:27:29 | but |
|---|
| 0:27:30 | i don't think so we |
|---|
| 0:27:32 | i don't know |
|---|
| 0:27:39 | question uh |
|---|
| 0:27:40 | with regard to the data format so you recorded wideband speech |
|---|
| 0:27:45 | and uh |
|---|
| 0:27:46 | you also find that so uh |
|---|
| 0:27:48 | it was a little harder task |
|---|
| 0:27:50 | yeah |
|---|
| 0:27:51 | expect |
|---|
| 0:27:52 | speech |
|---|
| 0:27:52 | in this |
|---|
| 0:27:54 | nation |
|---|
| 0:27:55 | no |
|---|
| 0:27:56 | so you would have more information like |
|---|
| 0:27:59 | speech |
|---|
| 0:28:00 | but |
|---|
| 0:28:01 | um |
|---|
| 0:28:02 | it might also that might be the fact |
|---|
| 0:28:04 | right |
|---|
| 0:28:05 | we stick unique |
|---|
| 0:28:06 | what effect it might be the fact that |
|---|
| 0:28:08 | most people have been developed for telephone |
|---|
| 0:28:11 | oh |
|---|
| 0:28:12 | speech |
|---|
| 0:28:13 | what would you be in there |
|---|
| 0:28:16 | well |
|---|
| 0:28:16 | i i not happy with that obviously |
|---|
| 0:28:18 | because we are developing technology for |
|---|
| 0:28:21 | for T V |
|---|
| 0:28:22 | for for T V signals uh recorded |
|---|
| 0:28:25 | in wideband |
|---|
| 0:28:26 | wideband conditions |
|---|
| 0:28:28 | so uh i i understand the reasons to organise |
|---|
| 0:28:32 | the nist evaluations because |
|---|
| 0:28:34 | was the sponsor or what |
|---|
| 0:28:35 | the sponsor wants |
|---|
| 0:28:36 | to |
|---|
| 0:28:38 | by finance in the the |
|---|
| 0:28:40 | evaluations |
|---|
| 0:28:41 | but |
|---|
| 0:28:41 | from the point of view of the of the research uh community i think |
|---|
| 0:28:45 | we should uh |
|---|
| 0:28:48 | try to |
|---|
| 0:28:49 | to organise how to kind of whatever whatever it seems more devoted to technology got it |
|---|
| 0:28:54 | developments |
|---|
| 0:28:55 | and less to the application i |
|---|
| 0:28:57 | it's my opinion but |
|---|
| 0:28:59 | uh |
|---|
| 0:29:00 | we have to decide |
|---|
| 0:29:01 | we have to maybe to find |
|---|
| 0:29:03 | sponsors and |
|---|
| 0:29:04 | i don't know |
|---|
| 0:29:06 | if |
|---|
| 0:29:06 | uh |
|---|
| 0:29:06 | that's |
|---|
| 0:29:08 | possible or not |
|---|
| 0:29:16 | understood |
|---|
| 0:29:17 | and |
|---|
| 0:29:17 | i just |
|---|
| 0:29:18 | session because the men should discussion going |
|---|
| 0:29:22 | so |
|---|
| 0:29:22 | thank you |
|---|
| 0:29:23 | oh okay |
|---|