I'll start by presenting this evaluation, the Albayzin 2008 language recognition evaluation, which we organized two years ago. This is the outline of the presentation. I'll talk about the context and motivation, then I'll describe the evaluation conditions, which are very much like those of the NIST evaluations: we took the NIST evaluations as an example, and in fact maybe ninety percent of the conditions are the same. Then I'll describe the results, as briefly as possible, and finally give some conclusions.

This evaluation was supported by the Spanish Thematic Network on Speech Technology. It was part of the Albayzin evaluations on speech technologies, held in November, and besides the language recognition evaluation I'm presenting here there were two other evaluations, on speech translation and on speech synthesis. There was also another motivation: our group was interested in developing language recognition technology for spoken document retrieval applications.

These are the goals we had in mind when designing the evaluation. First, to promote collaboration between research groups in Spain and also Portugal. Second, to provide a speech database specifically designed to perform language recognition on the languages of Spain; not everybody knows that there are four official languages spoken in Spain. Third, to measure the accuracy that state-of-the-art systems could attain in this particular application, because these languages have coexisted in Spain for a long time, so maybe this task could be more challenging than expected. And finally, a more diffuse motivation, maybe of interest for some of you: to measure the performance of systems developed on a limited amount of data. Well, the language detection
task was defined in the same way as for NIST. I won't describe it, since it is the same and it has already been described. What is described here is the concept of system development condition, which is a special one: we needed to differentiate between systems developed under free conditions, using any available materials, and systems developed using only the data that we provided. This was very specific to this evaluation: we were interested in putting all the teams at the same starting point to develop their systems, and then evaluating what they could do starting from the same data. Regarding the set of trials, we defined them as for NIST: closed-set detection and open-set detection. We also defined three kinds of segments: thirty-second, ten-second and three-second segments. We used the same performance measures, also defined by NIST, that you have seen in the previous presentation: the average cost, C_avg. We also used the equal error rate, and, in order to evaluate the systems, we defined the same priors and costs as in the last NIST evaluation.

Then, the database. It was called KALAKA. I recorded it from TV at my home, just connecting a PC to the decoder of the cable TV receiver, and it is described in the paper. It includes four target languages, Spanish, Catalan, Basque and Galician, and also other languages, just to allow open-set tests: French, Portuguese, German and English. [Unclear remark about Portuguese and Spanish.] The audio files were WAV files with a sampling frequency of sixteen kilohertz. They were single-channel, sixteen bits per sample, uncompressed PCM.
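The average cost can be made concrete with a short sketch. It assumes the NIST LRE 2007 form of C_avg, with C_miss = C_fa = 1, P_target = 0.5 and an out-of-set prior of 0.2 in the open-set condition (0 in the closed-set one). The talk only says that priors and costs were taken from the last NIST evaluation, so these values, the input layout and the function name `average_cost` are assumptions.

```python
from typing import Dict, Optional

def average_cost(
    p_miss: Dict[str, float],
    p_fa: Dict[str, Dict[str, float]],
    p_fa_oos: Optional[Dict[str, float]] = None,
    c_miss: float = 1.0,
    c_fa: float = 1.0,
    p_target: float = 0.5,
    p_oos: float = 0.0,
) -> float:
    """C_avg in the (assumed) NIST LRE 2007 form.

    p_miss[t]:   miss rate for target language t
    p_fa[t][n]:  rate at which segments in target language n are accepted as t
    p_fa_oos[t]: rate at which out-of-set segments are accepted as t (open set only)
    """
    targets = sorted(p_miss)
    n_l = len(targets)
    if n_l < 2:
        raise ValueError("need at least two target languages")
    # The remaining prior mass is spread evenly over the other target languages.
    p_non_target = (1.0 - p_target - p_oos) / (n_l - 1)
    total = 0.0
    for t in targets:
        cost = c_miss * p_target * p_miss[t]
        cost += sum(c_fa * p_non_target * p_fa[t][n] for n in targets if n != t)
        if p_oos > 0.0:
            if p_fa_oos is None:
                raise ValueError("open-set C_avg needs out-of-set false-alarm rates")
            cost += c_fa * p_oos * p_fa_oos[t]
        total += cost
    return total / n_l

if __name__ == "__main__":
    langs = ["spanish", "catalan", "basque", "galician"]
    miss = {l: 0.06 for l in langs}
    fa = {t: {n: 0.06 for n in langs if n != t} for t in langs}
    print(average_cost(miss, fa))  # closed-set example
```

With all miss and false-alarm rates at 0.06, the closed-set example prints 0.06 (up to floating-point rounding), since half of the cost comes from misses and half from the three non-target languages.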
This is another difference with respect to the NIST evaluation. The speech signals were all taken from TV shows, including a lot of spontaneous speech and all kinds of environment conditions; for instance, there could be three speakers speaking in a thirty-second segment, so there could be many speakers in the same test segment. We defined disjoint subsets of TV shows for training, development and evaluation, to guarantee, more or less, that different speakers were in each subset. And finally, the whole database is pretty small by NIST standards: it is just about fifty hours long. It is distributed on a DVD, and we are just now talking with the LDC to distribute it through the LDC.

The training dataset is about thirty-six hours, nine hours per target language. We don't provide any training data for the non-target languages, so it is just nine hours per target language, that's all. The other languages appear only in the development and evaluation datasets, which have more or less the same structure, but I won't go into detail. When building the database, we only chose high-SNR speech, [discarding] segments with strong noise, speech overlaps and so on. The training segments had no length restrictions; for instance, you could train on a five-minute segment. But for the development and evaluation segments, which had to meet length restrictions, we defined an automatic way of constructing them, ensuring that they would be enclosed by silence, more or less. In fact, the subset of three-second segments is extracted from the subset of ten-second segments, and in the same way the ten-second subset is extracted from the thirty-second subset.
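Keeping TV shows disjoint across training, development and evaluation amounts to a grouped split, where the unit assigned to a subset is a whole show rather than a segment. The sketch below is illustrative only: the `(show, segment_id, seconds)` input, the greedy balancing by duration and the name `split_by_show` are hypothetical, not the procedure actually used to build the database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def split_by_show(
    segments: List[Tuple[str, str, float]],  # (show, segment_id, seconds)
    fractions: Dict[str, float],             # e.g. {"train": 0.5, "dev": 0.25, "eval": 0.25}
) -> Dict[str, List[str]]:
    """Assign whole shows to subsets, so every segment of a show lands in
    exactly one subset (hypothetical helper, greedy duration balancing)."""
    duration: Dict[str, float] = defaultdict(float)
    members: Dict[str, List[str]] = defaultdict(list)
    for show, seg_id, secs in segments:
        duration[show] += secs
        members[show].append(seg_id)
    total = sum(duration.values())
    assigned = {name: 0.0 for name in fractions}
    split: Dict[str, List[str]] = {name: [] for name in fractions}
    # Longest shows first; each goes to the subset furthest below its quota.
    for show in sorted(duration, key=duration.get, reverse=True):
        name = max(fractions, key=lambda n: fractions[n] * total - assigned[n])
        assigned[name] += duration[show]
        split[name].extend(members[show])
    return split
```

Note that this only keeps shows apart; a speaker who appears in two different shows can still end up in two subsets.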
This was quite difficult, but what we tried was to ensure that differences in performance would be due only to the segment length, and not to testing against different material. There was some tolerance in length: in fact, we used segments between three and five seconds, between ten and twelve seconds, and [an unclear interval] for the thirty-second segments. The same was done for development and for evaluation. There were one thousand eight hundred segments in total for the three durations, that is, six hundred segments per duration, and for each duration there were one hundred twenty segments per target language and one hundred twenty segments of out-of-set languages. This means that exactly twenty percent of the segments were out-of-set, in both the development and evaluation sets, which is exactly what was defined in the evaluation plan.

When designing the database, the proportions of unknown languages were deliberately made different for development and evaluation, to prevent systems from being tuned to reject specific languages. Here is part of the table with the distribution of segments for development and evaluation: you can see the number of segments for French, Portuguese, English and German, and between the development and evaluation sets the proportions were changed, for example between English and German.

Regarding the evaluation procedure, there was an evaluation plan very similar to the NIST ones. There were four conditions: closed-set free, open-set free, closed-set restricted and open-set restricted, each with the three durations. For each condition, participants had to submit one single primary system, and any number of contrastive, or alternative, systems they wanted to submit.
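The nested construction of the shorter segments can be sketched as follows. This is a hypothetical helper, `cut_subsegment`, which assumes that silence positions are already known (for example from a voice activity detector) and simply returns the first pair of silence points whose distance falls in the allowed interval, such as three to five or ten to twelve seconds; the actual selection procedure is not described in the talk.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds

def cut_subsegment(
    parent: Segment,
    silences: List[float],
    min_len: float,
    max_len: float,
) -> Optional[Segment]:
    """Return a sub-segment of `parent` bounded by two silence points whose
    length lies in [min_len, max_len], or None if no such pair exists."""
    start, end = parent
    # Only silence points inside the parent segment are candidate boundaries.
    points = sorted(p for p in silences if start <= p <= end)
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            length = b - a
            if length > max_len:
                break  # later points are even further away from a
            if length >= min_len:
                return (a, b)
    return None
```

With silences at 0, 2.5, 7, 13.5, 18, 24 and 31 seconds inside a thirty-one-second parent, the ten-to-twelve second call returns (2.5, 13.5), and cutting that result again with the three-to-five second interval returns (2.5, 7.0), so each shorter segment lies inside the longer one it was taken from.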
The results had to be submitted by the teams in a specific format: a text file with one line per trial. Each submission also had to specify whether or not the scores could be interpreted as log-likelihood ratios, and teams also had to send a system description and participate in the Albayzin evaluation workshop. Regarding the evaluation criteria, systems were ranked according to their average cost, defined as in the NIST evaluations, and the winner, the best system, was the one yielding the lowest average cost in the primary condition: closed-set, restricted, on the subset of thirty-second segments.

This was the schedule of the evaluation. In a few words, there were three months for developing systems and three weeks to process the evaluation data. I have to say that the database was produced in three months, from April to June, and we also recorded some more data in September to complete the evaluation test set. You can find the details in the paper.

Now I'll describe the results. There were four participating teams, from Spain and [unclear]. Two teams presented more or less state-of-the-art systems, the first ones, T1 and T2, and the other two presented systems not specifically designed for language recognition applications. Here is the table of average costs for thirty-second segments: you can see that the performance of these last two was very bad, so in the following I will only talk about the results of T1 and T2. The first condition I'll talk about is the mandatory one, for which all the teams had to send a system.
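Declaring that scores are log-likelihood ratios matters because a calibrated LLR can be turned into a hard decision with the Bayes threshold implied by the costs and the target prior. A minimal sketch, assuming C_miss = C_fa = 1 and P_target = 0.5, which puts the threshold at zero; the function names are hypothetical.

```python
import math

def bayes_threshold(p_target: float = 0.5, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum-expected-cost threshold for a log-likelihood-ratio score."""
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))

def decide(llr: float, **kwargs: float) -> bool:
    """Accept the target-language hypothesis if the LLR exceeds the Bayes threshold."""
    return llr > bayes_threshold(**kwargs)
```

With a lower target prior the threshold moves up; for P_target = 0.2 it becomes log 4, roughly 1.39, so fewer trials are accepted.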
You can see here that this one, which was a contrastive system from T1, in fact got the best result, the lowest average cost, and the best primary system was also from T1. I have to say that this was under the restricted condition, and you can see a big difference between T2 and T1, which seems to be related to the data each team had used to develop its systems [unclear].

When changing to the free condition, the systems got much better performance, around five percent equal error rate. But in fact we were surprised by this result, because we expected much better results, around one percent or less. We made some experiments afterwards with a system that performs well on the general language recognition task defined in the NIST 2007 evaluation, and on this task its equal error rate was clearly higher [exact figures unclear]. So it seems that the task defined for the Albayzin evaluation is more difficult than expected.

There are some possible issues. One is that results may not be comparable between the NIST evaluation and this evaluation; another is statistical significance, since there are not many trials, only six hundred per duration. There are also some possible explanations: maybe the acoustic variability (speakers, channel, background noise), since conditions were very different, and also the phonetic and lexical similarity among the target languages, which have coexisted in the same country for many centuries. So maybe this is the reason. In any case, as I have said, the task seems challenging enough to encourage further research in language recognition technology.
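Two quantities discussed here can be sketched in a few lines: the equal error rate, and a rough idea of how precise an error rate estimated on about six hundred trials can be. This is a sketch under assumptions: the EER is approximated by sweeping thresholds over the observed scores, and the margin uses a normal (Wald) approximation that treats trials as independent, which segments drawn from the same shows and speakers are not.

```python
import math
from typing import Sequence

def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores
    (higher score = more target-like). Returns the mean of the miss and
    false-alarm rates at the threshold where they are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(target) | set(nontarget)):
        p_miss = sum(s < thr for s in target) / len(target)
        p_fa = sum(s >= thr for s in nontarget) / len(nontarget)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2.0
    return best_eer

def binomial_margin(rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval for an error rate
    measured on n independent trials."""
    return z * math.sqrt(rate * (1.0 - rate) / n)
```

For an error rate of five percent measured on 600 trials, `binomial_margin(0.05, 600)` is about 0.017, roughly plus or minus 1.7 points, which gives a feel for why significance was raised as an issue.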
So far we have been talking about the closed-set condition; now I'm talking about the open-set condition. The best performance in this case was worse, as expected, because there are unknown languages in the trials: the best system got around nine percent equal error rate, which is almost twice the error rate of the closed-set condition, in the free condition. Our conclusion is that some unknown languages are being confused with target languages, maybe Portuguese, maybe French; we don't know.

Here you have the equal error rates for the target languages, for the best system, for the closed-set and the open-set conditions. The green one is for Basque, which got the best performance, and the red one is for Catalan, which got the worst performance in the open-set condition. You can see that the change in performance is really small for Basque; [another language], which I think is the light blue one (it's not easy to see in the PowerPoint), also worsened its performance, but not as much as Catalan.

We have analyzed this in more detail with this table. I have to say that there is an error in the paper: these numbers are error rates, with miss probabilities on the diagonal and false alarm probabilities off the diagonal, but we mistakenly labeled them as costs. So the table is the same, but the numbers are not what the paper says they are. As you can see, there is a gray-level code here, white meaning zero error and black meaning one, the maximum possible error. This is for the closed-set condition, and when changing to the open-set condition you can see here, really clearly, that many trials corresponding to unknown languages were confused with Catalan.
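The table's layout, miss rates on the diagonal and false-alarm rates elsewhere, can be computed from hard decisions as sketched below. The trial format, a hypothetical `(true_language, hypothesized_target, accepted)` triple, and the orientation (one row per hypothesized target, one column per true language, with out-of-set languages as extra columns) are assumptions about how such a table could be built.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Trial = Tuple[str, str, bool]  # (true language of segment, hypothesized target, accepted?)

def error_matrix(trials: Iterable[Trial], targets: List[str]) -> Dict[str, Dict[str, float]]:
    """Row = hypothesized target t, column = true language l.
    Diagonal (l == t): miss rate. Off-diagonal: false-alarm rate.
    Out-of-set languages simply appear as extra columns."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])  # [errors, trials]
    for true_lang, hyp, accepted in trials:
        if hyp not in targets:
            continue
        # A rejection is an error on target trials; an acceptance is an error otherwise.
        is_error = (not accepted) if true_lang == hyp else accepted
        cell = counts[(hyp, true_lang)]
        cell[0] += int(is_error)
        cell[1] += 1
    matrix: Dict[str, Dict[str, float]] = {t: {} for t in targets}
    for (hyp, true_lang), (errors, n) in counts.items():
        matrix[hyp][true_lang] = errors / n
    return matrix
```

An out-of-set language that is often accepted as a given target shows up as a dark cell in that target's row, which is the kind of pattern described for Catalan.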
That's the origin of the change in cost for the open-set condition. I'm not going to comment on this result, because it's the same as in the NIST evaluations: performance gets worse as the segment length decreases. This one is more interesting for me, because you can see what happens when you restrict the development conditions. You have here two different teams: the blue one is team one and the red one is team two. For the free condition, both got more or less the same performance. But for the restricted condition, when restricting the materials they could use to develop their systems, T1 kept its performance quite close to that of the free condition, whereas for the other one the performance was much, much worse: forty percent worse in average cost for T1, compared to four hundred percent worse for T2. So I think this is important because, for me, this system is much more robust than the other one, since it does not depend so much on the materials provided to train it.

Well, conclusions. We have presented an evaluation involving the official languages of Spain, where the material was recorded from TV broadcasts. State-of-the-art technology got around five percent equal error rate in the closed-set, free development condition, and we think that the tasks defined in this evaluation may support further developments in language recognition technology. We also found a strong sensitivity to development restrictions, depending on the system: for two different systems, the increase in cost was very different. I think this condition could be interesting; I don't know if you are interested in restricting the materials, but I think it could be interesting for the NIST evaluations, maybe.
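The forty percent versus four hundred percent comparison is a relative increase in average cost from the free to the restricted development condition. A minimal sketch of that arithmetic follows; the cost values in it are made up for illustration and are not the teams' results.

```python
def relative_increase(cost_free: float, cost_restricted: float) -> float:
    """Relative degradation: 0.4 means the restricted-condition cost is 40% higher."""
    if cost_free <= 0.0:
        raise ValueError("cost_free must be positive")
    return (cost_restricted - cost_free) / cost_free

# Illustrative values only (not the evaluation's numbers):
robust = relative_increase(0.05, 0.07)   # about 0.4, i.e. 40% worse
fragile = relative_increase(0.05, 0.25)  # about 4.0, i.e. 400% worse
```

Expressing the change relative to each system's own free-condition cost is what makes two systems with similar free-condition performance directly comparable in robustness.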
Finally, we found different performance among target languages: the best performance was for Basque and the worst for Catalan. Speculating about this, we can say that Basque is a special language: it is not a Romance language, and its origins are different from those of all the other languages in Spain. Catalan, on the other hand, is a Romance language, which may be confused by the systems with Portuguese, or maybe French, or maybe Spanish.

And finally, I have to say that our group is organizing a new evaluation: right now we are organizing the Albayzin 2010 language recognition evaluation. We have recorded new data and extended the KALAKA database, which was used before, to define KALAKA-2: we have added Portuguese and English as target languages (maybe you are interested in these languages), there is a new set of unknown languages, and we have included a new condition for noisy speech. You can register until July fifteenth; you have more or less three months, from now until September twenty-seventh, to develop your systems, then two weeks to process the evaluation data, and the key file and results will be released on October fifteenth. The workshop will be held in November in Vigo, Spain, in the context of FALA 2010. If you are interested, please participate. That's all.

The question is: you mentioned at the beginning, when you collected the database, did you make sure that no speaker appears in more than one subset?

No, no, I'm not sure. I tried to distribute programs in different sets: for instance, one TV show was only for evaluation, another TV show was only for development, and, for
instance, this TV show called [unclear], in Basque, was, I think, only for training and not for development. But I'm not sure that there are no common speakers; we tried to manage that.

I'm just speculating: if there are a lot of repeated speakers across subsets, you could end up recognizing speakers rather than languages.

No, because we tried to include many speakers. This is not like broadcast news, where one or two speakers talk most of the time: we tried to select various TV shows, such as debates and talk shows where many people speak, and also interviews. So maybe what you are suggesting happens, but I don't think so.

Another question, regarding the data format: you recorded wideband speech, and you also found that it was a harder task than expected. With wideband speech you would have more information than with telephone speech, but it might be the fact that most systems have been developed for telephone speech. What would you say about that?

Well, I'm not happy with that, obviously, because we are developing technology for TV signals recorded in wideband conditions. I understand the reasons for organizing the NIST evaluations that way, because that is what the sponsor wants in order to finance the evaluations, but from the point of view of the research community I think we should try to organize other kinds of evaluations, more devoted to technology and less to the application. That's my opinion, but we would have to find sponsors, and I don't know if that's possible or not.

Understood. I'll just close the session here, because the main
discussion is going [to start]. So, thank you.