Skip to Main Content
A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and user semantics. Our approach includes learning prototypes of regions and their spatial relationships for scene classification. First, naive Bayes classifiers perform automatic fusion of features and learn models for region segmentation and classification using positive and negative examples for user-defined semantic land cover labels. Then, the system automatically learns how to distinguish the spatial relationships of these regions from training data and builds visual grammar models. Experiments using LANDSAT scenes show that the visual grammar enables creation of higher level classes that cannot be modeled by individual pixels or regions. Furthermore, learning of the classifiers requires only a few training examples.