Real-world scenes involve many objects that interact with each other in complex semantic patterns. For example, a bar scene can be naturally described as having a variable number of chairs of similar size, close to each other and aligned horizontally. This high-level interpretation of a scene relies on semantically meaningful entities and is most generally described using relational representations or (hyper-)graphs. Although popular in early work on syntactic and structural pattern recognition, relational representations are rarely used in computer vision due to their purely symbolic nature. Yet recent successes in combining them with statistical learning principles motivate us to reinvestigate their use. In this paper we show that relational techniques can also improve scene classification. More specifically, we employ a new relational language for learning with kernels, called kLog. With this language we define higher-order spatial relations among semantic objects. When applied to a particular image, these relations characterize its object arrangement and provide discriminative cues for the scene category. The kernel allows us to learn tractably from such complex features. Our contribution is thus a principled and interpretable approach to learning scene classification from symbolic relations within a statistical framework. We obtain results comparable to state-of-the-art methods on 15 Scenes and a subset of the MIT indoor dataset.
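To make the idea concrete, the following toy sketch illustrates the kind of relational features the abstract describes: symbolic spatial relations (here `close_to` and `aligned_horizontally`, with hypothetical thresholds) extracted from labeled 2-D objects, compared with a simple intersection kernel over relation counts. This is only an illustration of the general principle, not kLog's actual language or kernel.

```python
from itertools import combinations

def spatial_relations(objects, close_thresh=1.0, align_thresh=0.2):
    """Extract a multiset of symbolic binary relations from labeled objects.

    objects: list of (label, x, y) tuples.
    Returns a dict mapping relation tuples, e.g. ('close_to', 'chair', 'chair'),
    to their number of occurrences in the scene.
    """
    rels = {}
    def add(rel):
        rels[rel] = rels.get(rel, 0) + 1
    for (la, xa, ya), (lb, xb, yb) in combinations(objects, 2):
        a, b = sorted([la, lb])  # order labels so relations are symmetric
        if ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5 <= close_thresh:
            add(('close_to', a, b))
        if abs(ya - yb) <= align_thresh:
            add(('aligned_horizontally', a, b))
    return rels

def relation_kernel(r1, r2):
    """Intersection kernel: count relation instances shared by two scenes."""
    return sum(min(c, r2.get(rel, 0)) for rel, c in r1.items())

# Hypothetical scenes: a bar (aligned chairs near a counter) vs. an office.
bar = [('chair', 0.0, 0.0), ('chair', 0.8, 0.05),
       ('chair', 1.6, 0.1), ('counter', 0.8, 1.5)]
bar2 = [('chair', 0.0, 0.0), ('chair', 0.9, 0.0)]
office = [('chair', 0.0, 0.0), ('desk', 0.5, 0.5)]

print(relation_kernel(spatial_relations(bar), spatial_relations(bar2)))    # bar-like scenes overlap
print(relation_kernel(spatial_relations(bar), spatial_relations(office)))  # no shared relations
```

The two bar scenes share `close_to` and `aligned_horizontally` relations between chairs, so their kernel value is positive, while the bar and office scenes share none; a kernel machine trained on such features can exploit exactly this kind of discriminative cue.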