Hello everybody. Today I will talk about a segment-level confidence measure for spoken document retrieval.

This is the outline of my presentation: after a brief introduction of the motivation of this study, I will speak about the indexability estimation for documents, then the prediction of this indexability, then the experimental results, and finally the conclusion.

This work takes place in the spoken document retrieval task, where an automatic speech recognition system provides the transcription and, when a user formulates a query, the search engine returns a ranked list of documents to the user. Automatic speech recognition systems are not perfect, and transcription errors affect the accuracy of the spoken document retrieval task and the global performance of the system. The goal of this work is to estimate whether a document can be added to the database, that is, whether it is worth indexing; this is a look at per-document performance in terms of spoken document retrieval. More precisely, the automatic speech recognition system produces some good documents and some badly transcribed ones, and when a user formulates a query the search engine can return these bad documents in the first ranks. So we introduce a method to detect such documents so that, for example, they can be corrected before being introduced into the database.

Now I will present the indexability estimation for a document. In the first box, on the left, the document under evaluation (in blue) is the one provided by the automatic speech recognition system while the other documents are manually transcribed; on the right, all documents, including this one, are manually transcribed. We formulate some queries, the search engine returns a ranking, and we obtain the rank of the document. Finally we compute an estimation for the document based on the mean over the twenty best results of each query. This is the indexability estimation for the document.
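As a rough illustration of this indexability estimation (my own sketch, not the exact formulation from the talk), the snippet below averages a rank-based score of one document over a set of queries, looking only at the twenty best results of each query. The `search` helper, the ASR-based index and the rank-to-score mapping are all assumptions.

```python
# Hypothetical sketch of an indexability estimation for one document.
# `search(query, index)` is an assumed helper returning document ids
# ranked by relevance for that query; it is not part of the talk.

from typing import Callable, Dict, List

TOP_N = 20  # the talk mentions averaging over the twenty best results


def indexability(doc_id: str,
                 asr_index: Dict[str, List[str]],
                 queries: List[str],
                 search: Callable[[str, Dict[str, List[str]]], List[str]]) -> float:
    """Mean rank-based score of `doc_id` over all queries.

    The exact score used in the talk is not specified; here a document
    found at rank r (0-based) among the TOP_N results of a query
    contributes 1 - r / TOP_N, and 0 if it is not retrieved at all.
    """
    scores = []
    for query in queries:
        ranking = search(query, asr_index)[:TOP_N]
        if doc_id in ranking:
            rank = ranking.index(doc_id)
            scores.append(1.0 - rank / TOP_N)
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0
```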
Now I will present the prediction of this indexability. The goal is to predict whether the document can be added to the database. The principle is based on a mix of two kinds of measures: the first is the correctness of the words, named the confidence measure, and the second is a semantic modelling of the words, named the semantic compactness index. We use a neural network to combine the two metrics and predict the indexability; I will come back to the results of this prediction in the results section.

The first metric is the confidence measure, which is extracted from the automatic speech recognition system and represents the correctness of the words. We use twenty-three features grouped into acoustic, linguistic and graph classes, and the confidence measure of a document is the mean of the word confidence measures over the meaningful words of the document. As an example for each class: in the acoustic class we find the log-likelihood of the word, in the linguistic class the n-gram probability, and in the graph class a measure computed on the word graph which represents the number of alternative hypotheses around the word.

The second metric is the semantic compactness index. In the state of the art, semantic information has in some cases been used to improve the confidence measure accuracy of automatic speech recognition systems. Here we consider that insertions or substitutions of meaningful words impact the spoken document retrieval results, so we propose a local detection of semantic outliers, based on a sliding context window which represents a bag of words, using a large corpus, Wikipedia, as reference. In the example, one word appears in the same contexts as the other words of the window, but the erroneous word never appears in the same context as the others, so it receives a low value.
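A minimal sketch of a semantic-compactness-style score along those lines, assuming a precomputed co-occurrence table gathered from a reference corpus such as Wikipedia; the window size, the table format and the normalisation are my own assumptions rather than the exact model from the talk.

```python
# Hypothetical sketch of a semantic-compactness-style score.
# `cooccurrence` is an assumed precomputed table counting how often two
# meaningful words appear in the same context in a reference corpus
# (e.g. Wikipedia); building that table is not shown here.

from typing import Dict, FrozenSet, List

WINDOW = 5  # assumed half-width of the sliding context window


def semantic_compactness(words: List[str],
                         cooccurrence: Dict[FrozenSet[str], int]) -> float:
    """Average, over the meaningful words of a document, of how well each
    word fits its local bag-of-words context according to the reference
    corpus. A word that never co-occurs with its neighbours (a likely ASR
    insertion or substitution) contributes a low value."""
    word_scores = []
    for i, w in enumerate(words):
        context = words[max(0, i - WINDOW): i] + words[i + 1: i + 1 + WINDOW]
        if not context:
            continue
        # fraction of context words that co-occur with w in the reference corpus
        hits = sum(1 for c in context if cooccurrence.get(frozenset((w, c)), 0) > 0)
        word_scores.append(hits / len(context))
    return sum(word_scores) / len(word_scores) if word_scores else 0.0
```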
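To show how the two document-level metrics could then be combined by a small neural network to predict indexability and thresholded into indexable versus rejected documents, here is a minimal sketch; scikit-learn's MLPRegressor and the toy values stand in for the unspecified network and data from the talk.

```python
# Hypothetical combination of the two metrics with a small regressor,
# used here only as a stand-in for the neural network mentioned in the talk.

import numpy as np
from sklearn.neural_network import MLPRegressor


def doc_confidence(word_confidences):
    """Document-level confidence: mean confidence of the meaningful words."""
    return float(np.mean(word_confidences)) if len(word_confidences) else 0.0


# Each training example: [document confidence, semantic compactness index];
# target: the indexability estimated with the manual transcriptions.
X_train = np.array([[0.82, 0.61], [0.43, 0.22], [0.91, 0.75]])  # toy values
y_train = np.array([0.78, 0.15, 0.88])                          # toy values

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

# Classify a new document as indexable if its predicted indexability
# exceeds a chosen threshold (the talk uses a configurable threshold).
threshold = 0.7
new_doc = np.array([[0.76, 0.58]])
predicted = model.predict(new_doc)[0]
print("indexable" if predicted >= threshold else "rejected", round(predicted, 2))
```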
Now I will speak about the experiments. The transcriptions are generated using the LIA automatic speech recognition system, which uses a lexicon of sixty-seven thousand words. The corpus is the ESTER data set, which contains approximately eighty hours of broadcast news segmented into approximately seven thousand two hundred documents, with a maximum duration of about two minutes per document. The word error rate of the system is around twenty-five percent. The search engine is based on term frequency and document frequency; the query set contains one hundred sixty thousand queries extracted from newspaper headlines. We also use the Wikipedia corpus, and documents and queries are filtered in order to keep only the meaningful words. The neural network is trained on one half of the corpus and tested on the other half.

First, the prediction results. We use two metrics: the distortion between the predicted indexability and the actual indexability, and the root mean square error. We compare the prediction of the indexability using only the confidence measure, using only the semantic compactness index, and using the combination of the two. As you can see, the combination gives the best performance: about six percent better for the distortion and about fourteen percent better for the root mean square error.

In a second experiment, we split the corpus into two parts: on one hand the indexable documents and on the other hand the rejected documents. For example, if the goal is to select only the well-indexed documents, you can fix a threshold, say at seventy percent, and a document is classified as good if its indexability is above the threshold. A document is correctly classified when its predicted indexability and its actual indexability fall on the same side of the threshold.

This figure shows the classification rate according to the indexability threshold: one curve for the confidence measure, one for the semantic compactness index, and in red the combination of the two metrics used to predict the indexability. As you can see, for low thresholds both metrics classify the indexability correctly, with about eighty-two percent correct classification for the confidence measure. In the second part of the curve the classification rate decreases, especially around a threshold of eighty percent, where the confidence measure reaches only fifty-five percent correct classification. At the same threshold, the combination correctly classifies approximately two hundred documents more than the confidence measure alone, and in all cases the combination of the two metrics gives the best performance.

In conclusion, we have shown the interest of combining semantic information with the confidence measure for spoken document retrieval. The combination of the two metrics improves the classification rate, in terms of indexable versus rejected documents, by about six percent. As future work, we are planning to explore Latent Dirichlet Allocation for the semantic modelling, because it is based on the topic distribution over the document. Thank you.

Chair: We have a few more minutes for questions.

Question: I have one question. You had the manual transcription to build the reference, and you are looking at the recogniser output; roughly what percentage of the queries actually give different results on the automatic transcription than on the manual transcription? Basically, some of the recognition errors may not change the spoken document retrieval results at all, so with a word error rate around twenty-five percent the effect on retrieval may not be that large; do you really need this kind of selection for your task?

Answer: Normally, if you have the manual transcription and you want to correct the documents, you can use this prediction to select the documents which should be corrected. There are a lot of documents which will never appear in the top ranking of the search results, so this kind of document can be removed from the database. On one hand there are documents whose error rate is very high and which would need to be corrected manually: approximately ten percent of the documents of the corpus can be removed by this approach because they carry little useful information, and approximately fifteen percent need to be corrected in order to obtain a good indexability for the whole corpus.

Chair: Thank you. If there are no other questions, let's thank the speaker.