Skip to Main Content
This work is about scene interpretation in the sense of detecting and localizing instances from multiple object classes. We concentrate on object indexing: generate an over-complete interpretation - a list with extra detections but none missed. Pruning such an index to a final interpretation involves a global, often intensive, contextual analysis. We propose a tree-structured hierarchy as a framework for indexing; each node represents a subset of interpretations. This unifies object representation, scene parsing, and sequential learning (modifying the hierarchy as new samples, poses and classes are encountered). Then, we specialize to learning-designing and refining a binary classifier at each node of the hierarchy dedicated to the corresponding subset of interpretations. The whole procedure is illustrated by experiments in reading license plates.