Lin-Shan Lee (National Taiwan University)
Multimedia content on the Internet is very attractive, and the spoken part of such content very often carries its core information. It is therefore possible to index, retrieve, or browse multimedia content primarily based on the spoken part. If the spoken content could be transcribed into text with very high accuracy, the problem would naturally reduce to text information retrieval. But the inevitably high recognition error rates for spontaneous speech, including out-of-vocabulary (OOV) words, under a wide variety of acoustic conditions and linguistic contexts make this impossible in practice. One primary approach, among many others, is to search over lattices containing multiple recognition hypotheses, so that more of the correct recognition results are included. This talk will briefly review the approaches and directions along this line, covering not only search over lattices but also techniques beyond it, such as relevance feedback, learning approaches, key term extraction, semantic retrieval, and semantic structuring of spoken content.