Skip to Main Content
Generic visual categorization (GVC) is the pattern classification problem that consists in assigning labels to an image based on its semantic content. This is a challenging task as one has to deal with inherent object/scene variations, as well as changes in viewpoint, lighting, and occlusion. Several state-of-the-art GVC systems use a vocabulary of visual terms to characterize images with a histogram of visual word counts. We propose a novel practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. The main novelty is that an image is characterized by a set of histograms - one per class - where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. This framework is applied to two types of local image features: low-level descriptors such as the popular SIFT and high-level histograms of word co-occurrences in a spatial neighborhood. It is shown experimentally on two challenging data sets (an in-house database of 19 categories and the PASCAL VOC 2006 data set) that the proposed approach exhibits state-of-the-art performance at a modest computational cost.