This paper introduces the contextual dissimilarity measure, which significantly improves the accuracy of bag-of-features-based image search. Our measure takes into account the local distribution of the vectors and iteratively estimates distance update terms in the spirit of Sinkhorn's scaling algorithm, thereby modifying the neighborhood structure. Experimental results show that our approach gives significantly better results than a standard distance and outperforms the state of the art in terms of accuracy on the Nistér-Stewénius and Lola data sets. This paper also evaluates the impact of a large number of parameters, including the number of descriptors, the clustering method, the visual vocabulary size, and the distance measure. The optimal parameter choice is shown to be quite context-dependent. In particular, using a large number of descriptors is beneficial only when combined with our dissimilarity measure. We have also evaluated two novel variants: multiple assignment and rank aggregation. Both are shown to further improve accuracy at the cost of higher memory usage and lower efficiency.
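To make the idea of iteratively estimated distance update terms more concrete, the following is a minimal sketch and not the paper's exact formulation. The abstract does not state the update rule, so the sketch assumes one plausible form: each vector's neighborhood distance is the mean distance to its k nearest neighbors, and its update term is the ratio of the geometric mean of all neighborhood distances to its own, raised to a smoothing exponent. The function names and the parameters `k`, `alpha`, and `n_iter` are illustrative assumptions.

```python
import numpy as np


def neighborhood_distance(d, k):
    """Mean distance from each vector to its k nearest neighbors (self excluded)."""
    n = d.shape[0]
    # Mask the diagonal so a vector is never counted as its own neighbor.
    masked = np.where(~np.eye(n, dtype=bool), d, np.inf)
    return np.sort(masked, axis=1)[:, :k].mean(axis=1)


def contextual_dissimilarity(dist, k=2, alpha=0.5, n_iter=10):
    """Sinkhorn-style rescaling of a symmetric distance matrix (assumed update rule).

    Returns the rescaled matrix and the accumulated per-vector update terms.
    Assumes all off-diagonal distances are strictly positive.
    """
    d = np.array(dist, dtype=float)
    delta_total = np.ones(d.shape[0])
    for _ in range(n_iter):
        r = neighborhood_distance(d, k)
        # Geometric mean of the neighborhood distances.
        r_bar = np.exp(np.mean(np.log(r)))
        # Vectors in dense regions (small r) get a term > 1, sparse ones < 1.
        delta = (r_bar / r) ** alpha
        # Symmetric update: d(i, j) <- d(i, j) * delta_i * delta_j.
        d = d * np.outer(delta, delta)
        delta_total *= delta
    return d, delta_total


if __name__ == "__main__":
    # Toy data: a tight cluster and three isolated points on a line.
    pts = np.array([0.0, 0.1, 0.2, 5.0, 8.0, 12.0])
    dist = np.abs(pts[:, None] - pts[None, :])
    new_d, delta = contextual_dissimilarity(dist, k=2, alpha=0.5, n_iter=5)
    print(np.round(delta, 3))
```

Under this assumed rule the rescaled matrix stays symmetric, and only the accumulated per-vector terms need to be stored to apply the correction at query time. The neighborhood distances become more uniform across vectors, which is the sense in which the neighborhood structure is modified.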