The “bag of words” model has received much attention in the study of object categorization. As the name implies, each image under consideration is modeled as a bag containing multiple features. Despite its simplicity, this model has achieved strong performance on many benchmark object categorization datasets. Using this model, we extract patches from an image and categorize them as codewords, forming the “bag of words”, which is then used for object categorization. The model typically assumes independence between patches, which greatly reduces its complexity. In this paper, however, we drop the independence assumption and model the dependencies among local regions. We go further by taking into account the cardinality of the patches to reduce the effect of noise patches. The collection of codewords acts as the building block of latent themes shared among images and categories, whose distribution is learnt using a variation of the Hierarchical Dirichlet Process. We introduce region cardinality into the linkage of the latent themes to improve learning and detection performance. Our experimental results show that the proposed model handles the presence of noise patches robustly and is more discriminative in categorizing objects. All experiments are executed on the Caltech-4 dataset.
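As an illustration of the baseline representation described above, the following minimal sketch assigns each patch descriptor to its nearest codeword and counts the assignments. It shows only the standard independent bag of words, not the dependency or cardinality modeling proposed in this paper. The function names, the Euclidean nearest-neighbour assignment, and the toy codebook are assumptions made for illustration; in practice the codebook would come from some clustering step.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to a patch descriptor.

    Assumption: closeness is squared Euclidean distance.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors, codebook):
    """Count how many patches fall on each codeword.

    Each patch is treated independently here, which is exactly the
    assumption the baseline model makes.
    """
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1
    return histogram


# Hypothetical toy data: two codewords and three 2-D patch descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]]
histogram = bag_of_words(descriptors, codebook)
```

The resulting histogram discards where the patches were and how they relate to each other, which is the information a dependency-aware model would retain.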