Speech Transcript - A SEGMENT-LEVEL CONFIDENCE MEASURE FOR SPOKEN DOCUMENT RETRIEVAL

0:00:15	hello everybody
0:00:16	today i will speak about a segment-level confidence measure for spoken document retrieval
0:00:22	this is the outline of my presentation
0:00:25	after a brief introduction of the motivation of this study
0:00:29	i will speak about indexability
0:00:31	estimation for documents
0:00:33	and then the prediction of this indexability
0:00:37	then i will speak about the experiments and results and finally the conclusion
0:00:43	so this work is included in the
0:00:46	spoken document retrieval task
0:00:48	where the automatic speech recognition system gives a transcription
0:00:52	and when a user formulates
0:00:54	a query the search engine returns to the user
0:00:58	the documents in a ranking
0:01:00	okay
0:01:01	uh there were is
0:01:02	speech recognition uh systems
0:01:05	automatic speech recognition yeah well
0:01:07	but pros and search your percent
0:01:09	can impact the accuracy of the search engine
0:01:14	and the spoken document retrieval task
0:01:16	oh use
0:01:18	and the global performance
0:01:19	of the system
0:01:21	in this work
0:01:22	the idea is
0:01:25	to predict if a document can damage the database or its indexing
0:01:31	this is a local document-level view of the performance of
0:01:35	spoken document retrieval
0:01:37	more precisely the automatic speech recognition system
0:01:43	gives
0:01:43	some good documents
0:01:45	or erroneous documents
0:01:47	and when the user formulates a query the
0:01:51	search engine can return erroneous documents in the first
0:01:55	ranks
0:01:58	so we have to introduce
0:01:59	a method
0:02:01	in order to detect these erroneous documents
0:02:04	and for example they can be
0:02:06	corrected
0:02:07	but and i could used
0:02:09	and
0:02:10	reintroduced in the database
0:02:15	so now i will present the indexability estimation for a document
0:02:22	in the first box
0:02:23	on the left
0:02:24	the document in blue is provided by the automatic speech recognition system
0:02:30	and the other documents are manually transcribed
0:02:33	in the other box
0:02:35	on the right
0:02:36	all
0:02:37	documents are manually transcribed including the document
0:02:40	and
0:02:43	when we formulate a query the search engine
0:02:47	will return
0:02:49	a ranking
0:02:51	and we have two ranks for the document
0:02:56	finally we compute an estimation for
0:03:01	the document based on the mean over the twenty best
0:03:06	results
0:03:07	this is the indexability
0:03:09	estimation for the document
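The estimation described above compares where one document lands in two rankings (ASR transcript versus manual transcript) over the twenty best results. The exact formula is not recoverable from this transcript, so the following is only a sketch under stated assumptions: the function name `indexability`, the linear rank-shift penalty, and the handling of documents that drop out of the n-best list are all hypothetical choices, not the authors' method.

```python
def indexability(asr_ranks, ref_ranks, n_best=20):
    """Hypothetical indexability estimate for ONE document.

    asr_ranks[i] / ref_ranks[i] are the 1-based ranks of the document for
    query i when it is indexed from the ASR transcript / from the manual
    transcript (None when the document is not retrieved at all).
    The scoring rule below is an assumption made for illustration.
    """
    scores = []
    for asr_rank, ref_rank in zip(asr_ranks, ref_ranks):
        # Only queries that retrieve the reference document in the n-best count.
        if ref_rank is None or ref_rank > n_best:
            continue
        if asr_rank is None or asr_rank > n_best:
            # The ASR version fell out of the n-best list.
            scores.append(0.0)
        else:
            # 1.0 when the rank is unchanged, lower as the rank shifts.
            scores.append(1.0 - abs(asr_rank - ref_rank) / n_best)
    # Mean over the retained queries; 0.0 when no query applies.
    return sum(scores) / len(scores) if scores else 0.0
```

A document whose ASR transcript keeps it at the same ranks as the manual transcript gets a value close to 1, and a document that disappears from the top results gets a value close to 0.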
0:03:14	now
0:03:15	i will present the prediction
0:03:17	of
0:03:17	this indexability
0:03:21	the goal of this work is
0:03:23	to predict
0:03:24	if a document can damage the database or not
0:03:29	the principle is based
0:03:30	on the mix
0:03:31	of
0:03:32	two kinds of
0:03:33	measures
0:03:35	the first is the correctness of the words
0:03:37	named the confidence measure
0:03:39	and the second a semantic modeling
0:03:42	of
0:03:43	the words
0:03:45	named the semantic compactness index
0:03:48	we use a
0:03:50	neural network
0:03:52	to combine the metrics
0:03:55	and predict indexability
0:03:57	afterwards in the results section
0:03:59	we will speak about
0:04:02	the results of the prediction
0:04:08	there is some problem with the coral
0:04:10	yeah
0:04:10	so the first metric is the confidence measure
0:04:14	which is extracted from the automatic speech recognition system
0:04:17	it represents the correctness of the words
0:04:21	we use
0:04:21	twenty-three
0:04:22	features grouped into three classes
0:04:25	acoustic linguistic and graph classes
0:04:28	and the confidence measure of
0:04:31	the document
0:04:32	is the mean of the confidence
0:04:34	values of the meaningful words
0:04:37	of the document
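The document-level confidence just described is a mean of word-level confidence values restricted to meaningful words. A minimal sketch, assuming word confidences are already available as numbers in [0, 1] and that "meaningful" is approximated by excluding a stop-word list (the names `document_confidence` and `stop_words` are illustrative, not from the talk):

```python
def document_confidence(words, stop_words):
    """Mean word confidence over the meaningful words of one document.

    words: list of (word, confidence) pairs taken from the ASR output.
    stop_words: set of words treated as not meaningful (an assumption;
    the talk does not say how meaningful words are selected).
    """
    kept = [conf for word, conf in words if word.lower() not in stop_words]
    # An empty document, or one made only of stop words, gets 0.0.
    return sum(kept) / len(kept) if kept else 0.0
```

For instance, with `[("the", 0.9), ("retrieval", 0.6), ("speech", 0.8)]` and `{"the"}` as stop words, only the last two words contribute to the mean.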
0:04:40	we have an example for each class
0:04:44	in the acoustic class we can find the log-likelihood
0:04:47	of the word
0:04:48	in the linguistic class the n-gram probability and in the graph
0:04:52	class
0:04:53	we have to do
0:04:54	of the complete it's well
0:04:56	which represents a number of
0:04:58	at on that's you that's
0:04:59	in the remote section
0:05:03	the other metric is
0:05:05	the semantic compactness index
0:05:08	in the state of the art
0:05:10	in some cases
0:05:12	semantic information can
0:05:15	improve
0:05:15	the confidence measure accuracy
0:05:18	for automatic
0:05:21	speech recognition systems
0:05:23	but we can assume that the insertion or substitution of meaningful words
0:05:27	impacts
0:05:28	the spoken document retrieval system
0:05:35	that is why in this paper we propose a local detection of
0:05:40	semantic
0:05:41	outliers
0:05:42	based on a sliding context window
0:05:44	which represents
0:05:46	a vector
0:05:48	based on a large corpus
0:05:50	used as reference
0:05:53	we have an example
0:05:56	where as the for up to just to patients so and can a i P are only in the
0:06:02	in the same
0:06:05	context but
0:06:06	the word rain
0:06:07	never appears in the same context as
0:06:11	the other words
0:06:12	so this is and with value
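The sliding-window detection of semantic outliers described above could look roughly like the sketch below. It is an illustration under strong assumptions: the talk does not give the actual scoring, so the co-occurrence test, the window size, and the function names `cooccurrence_counts` and `compactness` are all hypothetical, and the toy data is invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reference_sentences):
    """Count, for each unordered word pair, in how many reference
    sentences both words occur (the reference corpus is tokenized)."""
    counts = Counter()
    for sentence in reference_sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


def compactness(words, counts, window=2):
    """For each word, the fraction of neighbours inside the sliding window
    that it co-occurs with at least once in the reference corpus.
    Low values flag candidate semantic outliers."""
    scores = []
    for i, word in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        context = [c for c in context if c != word]
        if not context:
            scores.append(0.0)
            continue
        seen = sum(1 for c in context if counts[tuple(sorted((word, c)))] > 0)
        scores.append(seen / len(context))
    return scores


# Toy usage: "rain" never shares a reference sentence with the other words.
reference = [["sun", "sky", "cloud"], ["sky", "cloud", "blue"], ["rain", "umbrella"]]
scores = compactness(["sun", "sky", "rain", "cloud"], cooccurrence_counts(reference), window=3)
```

In this toy case the word that never shares a context with its neighbours receives the lowest score, which is the behaviour the talk attributes to an outlier.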
0:06:20	now i will speak about the experiments
0:06:22	and the results
0:06:24	the transcriptions are generated using the automatic speech recognition system of the LIA named
0:06:31	Speeral
0:06:32	it is based on an A-star search algorithm
0:06:35	you you
0:06:37	with a lexicon of
0:06:39	sixty-seven thousand words
0:06:43	the corpus
0:06:45	is the ESTER set
0:06:47	which contains approximately eight hours
0:06:51	of
0:06:51	broadcast news
0:06:52	and contains approximately
0:06:56	seven two hundred documents
0:06:57	we have a maximum i
0:06:59	so it's two seconds
0:07:01	it's documents uh i have a
0:07:03	approximate proximity between uh
0:07:05	so and and uh
0:07:07	at where
0:07:10	the system but for a uh so that's a five percent error rates
0:07:14	in
0:07:15	but a real time system
0:07:21	the search engine used is based on term frequency and document
0:07:26	frequency
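Ranking by term frequency and document frequency, as mentioned above, can be illustrated with a generic TF-IDF scorer. This is a textbook sketch and not the search engine used in the experiments, whose name and exact weighting are not recoverable from the transcript; `tf_idf_scores` and its arguments are illustrative names.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each tokenized document against a query with plain TF-IDF.

    score(d) = sum over query terms t of tf(t, d) * log(N / df(t)),
    where N is the number of documents and df(t) the number of
    documents containing t. Terms absent from the collection are skipped.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = sum(
            term_freq[t] * math.log(n_docs / doc_freq[t])
            for t in query_terms
            if doc_freq[t]
        )
        scores.append(score)
    return scores
```

Sorting documents by decreasing score gives the ranking; a recognition error on a query term changes `tf` for that document and therefore its rank, which is the effect the indexability measure tries to capture.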
0:07:28	the query set
0:07:30	contains
0:07:32	one hundred sixty thousand queries
0:07:35	extracted from the headlines of the newspaper
0:07:38	Le Monde
0:07:40	the corpus
0:07:42	of reference used is Wikipedia
0:07:45	and the corpus and queries
0:07:47	are filtered
0:07:49	in order to keep the meaningful words
0:07:53	we train a neural network
0:07:55	on one half and test
0:07:58	the experiments
0:07:59	on the other half
0:08:04	so i will now present the prediction
0:08:07	error rate
0:08:08	we use
0:08:09	two metrics the distortion
0:08:11	between the prediction of indexability
0:08:13	and the indexability
0:08:15	and the root mean square error
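The two evaluation metrics named above can be written down as follows. The root mean square error is standard; the talk does not define "distortion", so treating it as the mean absolute difference between predicted and actual indexability is an assumption made here for illustration, as are the function names.

```python
import math


def mean_distortion(predicted, actual):
    """Mean absolute difference between predicted and actual indexability
    (assumed reading of "distortion"; not defined in the talk)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and actual indexability."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Both functions expect two equally long, non-empty sequences with one value per document; lower is better for both.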
0:08:18	as we can see in the table we use
0:08:22	as
0:08:23	prediction of indexability
0:08:25	only the confidence measure
0:08:27	and
0:08:27	the semantic compactness index
0:08:30	as
0:08:31	prediction of indexability
0:08:33	and the mix
0:08:34	the combination of the
0:08:36	two metrics
0:08:38	you can see the combination yields a better performance
0:08:41	we have a we have a six been better
0:08:43	for as a distortion and
0:08:46	for a chip or some fourteen percent
0:08:48	for
0:08:49	the root mean square error
0:08:53	now i present another experiment
0:08:56	in which
0:08:58	we decompose
0:08:59	into two parts
0:09:02	the corpus
0:09:04	in order to keep
0:09:05	on the one hand the unindexable documents
0:09:08	and on the other hand the indexable documents
0:09:13	for example if we
0:09:15	want to select only the
0:09:18	so a and and so but the commands you can fix a transfer to such a percent
0:09:24	and each document
0:09:26	is classified as
0:09:28	well
0:09:29	classified or misclassified
0:09:31	if
0:09:32	so um
0:09:34	we have a good classification if
0:09:37	both the indexability and the prediction of indexability are
0:09:41	above
0:09:42	or under the threshold
0:09:44	in this case the document in red is misclassified
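The threshold-based classification described above counts a document as well classified when its actual and predicted indexability fall on the same side of the threshold. A minimal sketch, assuming values and threshold share one scale and that a value equal to the threshold counts as indexable (a tie-breaking choice the talk does not specify; `classification_rate` is an illustrative name):

```python
def classification_rate(predicted, actual, threshold):
    """Fraction of documents whose predicted and actual indexability
    are on the same side of the threshold."""
    correct = sum(
        (p >= threshold) == (a >= threshold)
        for p, a in zip(predicted, actual)
    )
    return correct / len(actual)
```

Sweeping `threshold` over its range and plotting the returned rate gives a curve of classification rate against threshold, one per prediction method.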
0:09:50	yeah
0:09:51	now this is
0:09:55	the classification rate
0:09:57	according to the indexability threshold
0:10:02	in impose a confidence measure or in your of the semantic compactness and X
0:10:06	and in red the combination of the two measures used
0:10:10	to predict indexability
0:10:14	as you can see
0:10:15	and are from to sense
0:10:17	i matrix
0:10:19	i will to classify
0:10:21	correctly is uh the indexability ability
0:10:24	we have a but i have a two percent of classification
0:10:27	for the confidence measure
0:10:29	at the to to find of two
0:10:33	in the second part
0:10:35	a well than of two percent
0:10:39	i intrigued decrease
0:10:40	and especially at eighty percent
0:10:44	where the confidence measure yields fifty-five percent
0:10:48	of classification
0:10:50	we the same transmit
0:10:52	the confidence measure rules
0:10:54	i don't to classify approximatively to and written documents
0:10:59	models and the
0:11:00	confidence measure only
0:11:03	and in all cases the combination of the two metrics
0:11:10	has a better performance
0:11:20	so in conclusion
0:11:22	we demonstrate the interest
0:11:24	of
0:11:25	the semantic information
0:11:28	with the
0:11:30	confidence measure for spoken document retrieval
0:11:33	we use a combination of the two metrics
0:11:36	and the combination
0:11:39	i do to improve about so it's your percent
0:11:42	the classification rate in terms of
0:11:44	indexable or unindexable documents
0:11:47	one with
0:11:49	in does but
0:11:50	well
0:11:51	we are planning to explore
0:11:54	the
0:11:55	latent Dirichlet allocation for the semantic modeling
0:12:00	because it is based
0:12:02	on the topic distribution
0:12:06	thank you
0:12:10	i
0:12:10	i
0:12:12	and you can have a few more minutes
0:12:15	so question
0:12:20	yeah
0:12:20	i i and one question on uh and like maybe thinking about a question
0:12:24	so a real say my west each of their uh quite often a quite is use out to be a
0:12:30	no change like only next no
0:12:33	you you don't and
0:12:34	and a are X i right like to like christ roughly that percentage of to quite so i just
0:12:40	and is the same as sick the same there is that
0:12:43	as an annual you'll transcription
0:12:47	i i so i can just one make it a so i you had to get the transcription right to
0:12:52	create it's nice as you are looking at an output and yeah i S i like a a case like
0:12:57	to five parts
0:12:58	so are what i started each of you a quite results in a change eighteen
0:13:03	the results from S output is the transcription
0:13:08	yeah
0:13:15	yeah
0:13:17	and
0:13:19	okay so basically a and like some of the years at you know a chance all spoken document retrieval achieve
0:13:25	i no not come as that actually S i guess and i think to Q is not much
0:13:30	so i out
0:13:32	i
0:13:33	i don't think that that like they make a case yeah like make a twenty five or so
0:13:39	so it is a task in
0:13:41	so someone i'm it and that i just at that state for plus do you need to like to i
0:13:47	and
0:13:48	at at no you only so that actually get the strain same are split as their your task
0:13:55	i
0:14:01	uh
0:14:08	normally if if you have the
0:14:11	the many
0:14:12	transcription
0:14:13	uh_huh
0:14:14	and we want to correct
0:14:16	i can be used
0:14:17	the
0:14:17	the power of the documents
0:14:19	which
0:14:20	well can be corrected
0:14:22	i really uh is it is there is a lot of
0:14:25	i
0:14:25	would never appear in the top ranking
0:14:28	have the the crew of the the search
0:14:31	so that this kind of the command
0:14:34	the attributes
0:14:35	can select through remote
0:14:37	of the database
0:14:38	and no one hand
0:14:39	and the are and so uh was a lot of documents
0:14:42	we just the
0:14:43	uh right
0:14:45	we are there right
0:14:46	is very are
0:14:47	and needs to be manually approximate you
0:14:51	approximate evenly we have a
0:14:53	on the
0:14:56	a per cent of a lower rate
0:14:59	a the ten percent of a documents
0:15:01	of the corpus
0:15:02	which can be a remote by is that i can just
0:15:05	because it just not not of and
0:15:08	documents
0:15:09	that's the just to uh
0:15:10	we have a uh
0:15:12	in "'cause" use of information a like a low this is
0:15:16	no not the very important information
0:15:19	and approximately fifteen
0:15:21	percent
0:15:22	to be corrected
0:15:24	so have
0:15:25	a good the
0:15:27	and except at uh
0:15:30	vol group
0:15:32	thanks to a close to i at question thank you
0:15:35	another question
0:15:38	let's thank the speaker

Spoken Document Processing

Presented by: Gregory Senay, Author(s): Gregory Senay, Georges Linarès, Benjamin Lecouteux, University of Avignon, France