Usman F. Niaz, Bernard Merialdo
Bag of Visual Words (BoW) is widely regarded as the standard representation of the visual information in images and is broadly used for retrieval and concept detection in videos. Vocabulary generation in the BoW framework typically includes a quantization step that clusters image features into a limited number of visual words. Because this quantization is achieved through unsupervised clustering, it takes no advantage of the relationship between features coming from images that share the same concept(s), thus enlarging the semantic gap. We present a new dictionary construction technique that improves the BoW representation by increasing its discriminative power. Our solution is based on a two-step quantization: we start with k-means clustering, followed by a bottom-up supervised clustering that uses the features' label information. Results on the TRECVID 2007 data show improvements with the proposed BoW construction. We also give upper bounds on the improvement over the baseline retrieval rate for each concept, obtained using the best supervised merging criteria.
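The two-step quantization described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: the supervised merging criterion used here (squared distance between per-cluster concept-label histograms) is an assumption chosen for clarity, and the plain Lloyd's k-means stands in for whatever clustering configuration the paper uses.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Unsupervised first step: plain Lloyd's k-means over image features."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign

def label_histograms(assign, labels, k, n_labels):
    """Per-cluster histogram of concept labels: the supervision signal."""
    H = np.zeros((k, n_labels))
    for c, y in zip(assign, labels):
        H[c, y] += 1
    return H

def supervised_merge(assign, labels, k, n_labels, target_k):
    """Supervised second step (bottom-up): repeatedly fuse the two visual
    words whose normalized label distributions are closest, so clusters
    used by the same concepts collapse into a single, more
    discriminative word.  The distance is an illustrative choice."""
    H = label_histograms(assign, labels, k, n_labels)
    word_of = list(range(k))      # current word id of each initial cluster
    alive = set(range(k))
    while len(alive) > target_k:
        best, pair = None, None
        ids = sorted(alive)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                i, j = ids[a], ids[b]
                p = H[i] / max(H[i].sum(), 1.0)
                q = H[j] / max(H[j].sum(), 1.0)
                d = ((p - q) ** 2).sum()
                if best is None or d < best:
                    best, pair = d, (i, j)
        i, j = pair
        H[i] += H[j]              # fuse j into i
        alive.discard(j)
        word_of = [i if w == j else w for w in word_of]
    remap = {w: n for n, w in enumerate(sorted(alive))}  # dense word ids
    return np.array([remap[word_of[c]] for c in assign])
```

A usage example: quantize features with a generous k, then merge down to the target vocabulary size using the labels, and build the BoW histogram from the merged word assignments.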