Suppose a set of arbitrary (unlabeled) images contains frequent occurrences of 2D objects from an unknown category. This paper aims to solve three related problems simultaneously: 1) unsupervised identification of photometric, geometric, and topological properties of multiscale regions comprising instances of the 2D category, 2) learning a region-based structural model of the category in terms of these properties, and 3) detection, recognition, and segmentation of objects from the category in new images. To this end, each image is represented by a tree that captures a multiscale image segmentation. The trees are matched to extract the maximally matching subtrees across the set, which are taken as instances of the target category. The extracted subtrees are then fused into a tree union that represents the canonical category model. Detection, recognition, and segmentation of objects from the learned category are achieved simultaneously by matching the category model against the segmentation tree of a new image. Experimental validation on benchmark data sets demonstrates that the learned category models are robust and highly accurate even when only a few training examples are used for learning and no human supervision is provided.
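The idea of extracting maximally matching subtrees from two segmentation trees can be illustrated with a minimal sketch. It is not the matching algorithm of this paper: the `Region` class, its two properties (`area`, `gray`), the similarity measure, the threshold `tau`, and the brute-force assignment of child regions are all assumptions chosen for illustration, and the fusion of subtrees into a tree union is not covered.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Region:
    """Node of a toy segmentation tree: a region and the subregions it contains."""
    area: float  # fraction of the image covered, in [0, 1] (hypothetical property)
    gray: float  # mean intensity, in [0, 1] (hypothetical property)
    children: list = field(default_factory=list)


def node_similarity(a, b):
    """Similarity in [0, 1]; 1 means identical region properties."""
    return 1.0 - 0.5 * (abs(a.area - b.area) + abs(a.gray - b.gray))


def best_child_assignment(xs, ys, tau):
    """Best one-to-one pairing of child regions (brute force, small trees only)."""
    short, long_ = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    return max(
        sum(match_score(s, l, tau) for s, l in zip(short, perm))
        for perm in permutations(long_, len(short))
    )


def match_score(a, b, tau=0.8):
    """Score of the best match between the subtrees rooted at a and b.

    Roots whose similarity does not exceed tau contribute nothing,
    and their descendants are not matched.
    """
    gain = node_similarity(a, b) - tau
    if gain <= 0:
        return 0.0
    return gain + best_child_assignment(a.children, b.children, tau)


def nodes(root):
    """Yield every node of the tree in depth-first order."""
    yield root
    for child in root.children:
        yield from nodes(child)


def best_common_subtree(t1, t2, tau=0.8):
    """Return (score, node1, node2) for the best-matching pair of subtrees."""
    return max(
        ((match_score(a, b, tau), a, b) for a in nodes(t1) for b in nodes(t2)),
        key=lambda triple: triple[0],
    )


# Two toy images with different backgrounds that share one similar object.
obj1 = Region(0.20, 0.90, [Region(0.05, 0.10), Region(0.08, 0.55)])
img1 = Region(1.00, 0.20, [obj1, Region(0.30, 0.50)])
obj2 = Region(0.22, 0.88, [Region(0.08, 0.57), Region(0.05, 0.12)])
img2 = Region(1.00, 0.70, [Region(0.40, 0.30), obj2])

score, found1, found2 = best_common_subtree(img1, img2)
```

In this toy setting the best-scoring pair of subtrees is rooted at `obj1` and `obj2`, even though the objects occupy different positions in their parent trees and their parts are listed in a different order. The exhaustive search over node pairs and child permutations is only practical for very small trees.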