The Bag of Words (BoW) model is one of the most effective ways to represent images through the aggregation of locally extracted descriptors. It uses clustering techniques to build visual dictionaries that map each image into a fixed-length signature. Despite its effectiveness, two major drawbacks of this model are the limited informativeness of the codebook and the computational complexity of building it. In this paper we propose Copula-BoW (C-BoW), an efficient local feature aggregator inspired by copula theory. In C-BoW, we build an efficient codebook for vector quantization in quadratic time, based on the correlation of the marginal distributions of the local features. Our experimental results show that the C-BoW signature is computed far more efficiently than the traditional BoW signature while being just as discriminative for scene recognition and video retrieval (TRECVID data). Moreover, we show that our new model provides complementary information when combined with existing local feature aggregators, substantially improving the final retrieval performance.
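To make the baseline concrete, here is a minimal sketch of how a standard BoW signature is computed from local descriptors: each descriptor is assigned to its nearest visual word in a precomputed codebook, and the normalized histogram of assignments is the fixed-length image signature. This illustrates only the classical BoW pipeline the paper builds on, not the C-BoW construction itself; the function name, dimensions, and random data are illustrative assumptions.

```python
import numpy as np

def bow_signature(descriptors, codebook):
    """Quantize local descriptors against a visual codebook and
    return the L1-normalized histogram (the BoW signature)."""
    # Pairwise squared Euclidean distances: shape (n_descriptors, k)
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: nearest visual word for each descriptor
    words = d.argmin(axis=1)
    # Histogram over the k visual words, normalized to sum to 1
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Illustrative data: 200 local descriptors of dimension 8,
# and a codebook of k = 16 visual words (normally obtained by clustering)
rng = np.random.default_rng(0)
descs = rng.normal(size=(200, 8))
codebook = rng.normal(size=(16, 8))
sig = bow_signature(descs, codebook)
print(sig.shape)  # fixed-length signature, one bin per visual word
```

In the classical pipeline the codebook comes from clustering (e.g. k-means) over a training set of descriptors; the paper's contribution is a cheaper, correlation-based construction of that codebook.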