We propose an unsupervised approach to segmenting color images and annotating their regions. The annotation process uses a multi-modal thesaurus built from a large collection of training images by learning associations between low-level visual features and keywords. Association rules are learned through fuzzy clustering and unsupervised feature selection. We assume that a collection of images is available and that each image is globally annotated. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. Our approach has three main steps. First, each image is coarsely segmented into regions, and visual features are extracted from each region. Second, the regions are categorized using a fuzzy algorithm that performs clustering and feature weighting simultaneously. As a result, we obtain clusters of regions that share subsets of relevant features. Representatives of each cluster, together with their relevant visual and textual features, are used to build the thesaurus. Third, fuzzy membership functions are used to label new regions according to their proximity to the thesaurus entries. The proposed approach is trained on a collection of 2,695 images and tested on several other images.
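The second step, clustering regions while simultaneously learning per-cluster feature weights, can be sketched as follows. This is a minimal illustration only: the abstract does not specify the exact update rules, so this sketch assumes a fuzzy-c-means-style scheme in which each cluster also maintains a feature-weight vector that is refit from the inverse of the fuzzy within-cluster dispersion (resembling SCAD-type algorithms). All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def fuzzy_cluster_with_feature_weights(X, n_clusters, m=2.0, n_iter=50):
    """Illustrative FCM-style clustering with per-cluster feature weights.

    Each cluster i keeps a weight vector V[i] (non-negative, summing to 1),
    so features with low within-cluster dispersion gain more influence on
    that cluster's distance computation. This is a sketch of the idea of
    "clustering and feature weighting simultaneously", not the paper's
    actual algorithm.
    """
    n, d = X.shape
    eps = 1e-9
    # Deterministic initialization: centers from points spread across the data.
    C = X[np.linspace(0, n - 1, n_clusters).astype(int)].astype(float)
    V = np.full((n_clusters, d), 1.0 / d)  # start with uniform feature weights
    for _ in range(n_iter):
        # Weighted squared distance D[i, j] from center i to point j.
        diff2 = (X[None, :, :] - C[:, None, :]) ** 2          # (c, n, d)
        D = np.einsum('cnd,cd->cn', diff2, V) + eps           # (c, n)
        # Standard FCM membership update using the weighted distances.
        ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)                           # (c, n)
        Um = U ** m
        # Center update: fuzzy weighted mean of the points.
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Feature-weight update: inverse fuzzy within-cluster dispersion,
        # normalized per cluster so each V[i] sums to 1.
        diff2 = (X[None, :, :] - C[:, None, :]) ** 2
        E = np.einsum('cn,cnd->cd', Um, diff2) + eps
        V = (1.0 / E) / (1.0 / E).sum(axis=1, keepdims=True)
    return U, C, V
```

On synthetic data where clusters separate along one feature and a second feature is pure noise, the learned weights concentrate on the discriminative feature, which is the behavior the paper relies on to extract "subsets of relevant features" per cluster of regions.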