The bag of visual patches (BOP) image representation has been a major research topic in the computer vision literature for scene and object recognition tasks. Building visual vocabularies from local image feature vectors that are extracted automatically from images has a direct effect on producing discriminative visual patches. Local image features carry important information about their locations in the image, but this information is ignored during the quantization process used to build visual vocabularies. In this paper, we propose the Spatial Pyramid Vocabulary Model (SPVM), which builds visual vocabularies from local image features at the pyramid level. Experiments on a multi-class classification task using 700 natural scene images show that the spatial pyramid vocabulary model is suitable for the bag-of-visual-patches semantic image representation and more discriminative than a universal vocabulary model (UVM).