Skip to Main Content
Content-based copy detection (CBCD) is one of the emerging multimedia applications for which there is a need of a concerted effort from the database community and the computer vision community. Recent methods based on interest points and local fingerprints have been proposed to perform robust CBCD of images and video. They include two steps: the search of similar fingerprints in the database and a voting strategy that merges all the local results in order to perform a global decision. In most image or video retrieval systems, the search of similar features in the database is performed by a geometrical query in a multidimensional index structure. Recently, the paradigm of approximate knearest neighbors query has shown that trading quality for time can be widely profitable in that context. In this paper, we introduce a new approximate search paradigm, called Statistical Similarity Search (S3), dedicated to local fingerprints and we describe the original indexing structure we have developped to compute efficiently the corresponding queries. The key-point relates to the distribution of the relevant fingerprints around a query. Since a video query can result from (a combination of ) more or less transformations of an original one, we modelize the distribution of the distorsion vector between a referenced fingerprint and a candidate one. Experimental results show that these statistical queries allow high performance gains compared to classical e-range queries. By studying the influence of this approximate search on a complete CBCD scheme based on local video fingerprints, we also show that trading quality for time during the search does not degrade seriously the global robustness of the system, even with very large databases including more than 20,000 hours of video.