This paper presents a template and its relation extraction and estimation (TREE) algorithm for indexing images from picture libraries with more semantically meaningful descriptions. The algorithm learns the commonality of visual concepts from multiple images to provide a middle-level understanding of image content. In this approach, each image is represented by a set of templates and their spatial relations, which serve as keys that capture the essence of the image. Each template is characterized by a set of dominant regions, which reflect the different appearances of an object under different conditions and are obtained by the template extraction and analysis (TEA) algorithm through region matching. The spatial template relation extraction and measurement (STREAM) algorithm is then proposed to obtain the spatial relations between these templates. Because a template can represent an object's appearances under different conditions, the proposed approach offers greater capability and flexibility in capturing image content than traditional region-based methods. In addition, by maintaining the spatial layout of images, the semantic meaning of a query image can be extracted, leading to significant improvements in retrieval accuracy. Since no time-consuming optimization process is involved, the proposed method learns visual concepts extremely quickly. Experimental results are provided to demonstrate the superiority of the proposed method.
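
The representation described above can be illustrated with a minimal sketch: a template as a set of dominant-region feature vectors, region-to-template matching via the closest dominant region, and a toy directional relation between template centroids. All names, the Euclidean feature distance, and the four-way direction scheme are illustrative assumptions, not the paper's actual TEA/STREAM definitions.

```python
from dataclasses import dataclass
from math import dist

@dataclass
class Template:
    """Hypothetical template: a named set of dominant-region feature
    vectors, each capturing one appearance of the same object under
    different conditions (feature layout is an assumption)."""
    name: str
    dominant_regions: list  # feature vectors as tuples of floats

    def distance(self, region):
        # A region matches a template through its closest dominant region.
        return min(dist(region, r) for r in self.dominant_regions)

def match_region(region, templates):
    """Assign a region to the nearest template (a stand-in for the
    region-matching step used to build templates)."""
    return min(templates, key=lambda t: t.distance(region))

def spatial_relation(c1, c2):
    """Toy pairwise spatial relation between two template centroids,
    reduced here to a dominant direction (image y-axis points down)."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    if abs(dx) >= abs(dy):
        return "right-of" if dx > 0 else "left-of"
    return "below" if dy > 0 else "above"

# An image is then summarized by its matched templates plus the
# pairwise relations between their centroids.
templates = [
    Template("sky", [(0.2, 0.3, 0.9), (0.3, 0.4, 0.8)]),
    Template("grass", [(0.2, 0.8, 0.2)]),
]
region = (0.25, 0.35, 0.85)           # an unseen region's features
best = match_region(region, templates)
rel = spatial_relation((10, 5), (10, 40))
```

In this sketch, `region` matches the "sky" template because it lies closest to one of that template's dominant regions, and the centroid pair yields the relation "below"; real dominant regions and relation measures would come from the TEA and STREAM algorithms, respectively.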