This paper presents a framework for near-duplicate image detection in a visually salient Riemannian space. A visual saliency model is first used to identify the salient regions of an image, and the salient region covariance matrix (SCOV) of various image features is then computed over those regions. SCOV, which lies on a Riemannian manifold, serves as a robust and compact image content descriptor. A coarse-to-fine Riemannian (CTOFR) image search strategy is developed to reduce search cost while maintaining accuracy. CTOFR first uses a computationally fast but less accurate log-Euclidean Riemannian metric to perform a coarse-level search of the entire database and retrieve a subset of likely targets. It then uses a computationally expensive but more accurate affine-invariant Riemannian metric to search only the results returned by the coarse search. We present experimental results demonstrating that SCOV is a very compact, robust, and discriminative descriptor that is competitive with other state-of-the-art descriptors for near-duplicate image and video detection. We also show that CTOFR can yield significant speedups over traditional full-search methods without sacrificing accuracy, and that the speedup factor grows with the size of the database.
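The two-stage search described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each image has already been reduced to a symmetric positive-definite covariance descriptor, and it uses the standard textbook forms of the two metrics: the Frobenius norm of log(A) - log(B) for the log-Euclidean distance, and the root sum of squared log-eigenvalues of A^(-1/2) B A^(-1/2) for the affine-invariant distance. The function names and the `shortlist_size` parameter are hypothetical, and the saliency model and feature set are left out because the abstract does not specify them.

```python
import numpy as np


def spd_log(m):
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T


def log_euclidean_dist(a, b):
    """Coarse metric: Frobenius norm of log(a) - log(b)."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))


def affine_invariant_dist(a, b):
    """Fine metric: sqrt of the sum of squared log-eigenvalues of a^(-1/2) b a^(-1/2)."""
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    lam = np.linalg.eigvalsh(a_inv_sqrt @ b @ a_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def coarse_to_fine_search(query, database, shortlist_size):
    """Return database indices, best match first (hypothetical helper).

    Stage 1 ranks every entry with the log-Euclidean distance and keeps
    the closest `shortlist_size` candidates. Stage 2 re-ranks only those
    candidates with the affine-invariant distance.
    """
    q_log = spd_log(query)
    # In a real system the database logs could be computed once, offline.
    coarse = np.array([np.linalg.norm(spd_log(c) - q_log) for c in database])
    shortlist = np.argsort(coarse)[:shortlist_size]
    fine = sorted((affine_invariant_dist(query, database[i]), int(i)) for i in shortlist)
    return [i for _, i in fine]


def random_spd(rng, dim):
    """Synthetic stand-in for a covariance descriptor (assumption, demo only)."""
    x = rng.standard_normal((dim, 4 * dim))
    return x @ x.T / (4 * dim) + 1e-3 * np.eye(dim)
```

In this sketch the expensive metric is evaluated only `shortlist_size` times per query rather than once per database entry, which is the source of the speedup the abstract attributes to CTOFR. The shortlist size controls the trade-off: a shortlist that is too small risks dropping the true match during the coarse stage.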