The proliferation of digital images, together with the widespread distribution of digital data made possible by the Internet, has aggravated the problem of copyright infringement on digital images. Watermarking schemes have been proposed to safeguard copyrighted images, but watermarks are vulnerable to image processing and geometric distortions and may therefore not be very effective. Thus, the content-based detection of pirated images has become an important application. In this paper, we discuss two important aspects of such a replica detection system: distance functions for similarity measurement, and scalability. We extend our previous work on perceptual distance functions, which proposed the Dynamic Partial Function (DPF), and present enhanced techniques that overcome the limitations of DPF: the Thresholding, Sampling, and Weighting schemes. Experimental evaluations show that these techniques outperform DPF and other distance functions. We then address the issue of using these perceptual distance functions to efficiently detect replicas in large image data sets. Indexing is challenging because the feature space is high-dimensional and the distance functions are nonmetric. We propose using Locality Sensitive Hashing (LSH) to index images under the above perceptual distance functions, and demonstrate good performance through empirical studies on a very large database of diverse images.
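To make the two ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes the DPF definition from the earlier work as we understand it: take the per-feature differences δ_i = |x_i − y_i|, keep only the m smallest, and combine them Minkowski-style, (Σ δ_i^r)^(1/r). The abstract does not say which LSH family is used, so the index below is a hypothetical stand-in: Gaussian (p-stable) hashing for L2 generates candidates, which are then re-ranked with DPF. The class name `L2LSH` and the parameters `n_tables`, `n_hashes`, and `w` are illustrative assumptions. The Thresholding, Sampling, and Weighting schemes are not shown.

```python
import numpy as np


def dpf(x, y, m, r=2.0):
    """Dynamic Partial Function (as assumed here): a Minkowski-style distance
    computed over only the m smallest per-feature differences."""
    delta = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    smallest = np.sort(delta)[:m]  # keep the m most similar features
    return float(np.sum(smallest ** r) ** (1.0 / r))


class L2LSH:
    """Hypothetical index: Gaussian (p-stable) LSH for candidate generation,
    followed by exact DPF re-ranking. Not the paper's LSH configuration."""

    def __init__(self, dim, n_tables=8, n_hashes=4, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, n_hashes, dim))  # random projections
        self.b = rng.uniform(0.0, w, size=(n_tables, n_hashes))  # random offsets
        self.w = w  # bucket width
        self.tables = [dict() for _ in range(n_tables)]
        self.data = []

    def _keys(self, v):
        # One key per table: the tuple of floor((a . v + b) / w) values.
        proj = (self.a @ v + self.b) / self.w  # shape (n_tables, n_hashes)
        return [tuple(row) for row in np.floor(proj).astype(int)]

    def add(self, v):
        v = np.asarray(v, dtype=float)
        idx = len(self.data)
        self.data.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table.setdefault(key, []).append(idx)
        return idx

    def query(self, q, m, k=1, r=2.0):
        q = np.asarray(q, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, []))
        # LSH only narrows the search; the final ranking uses the nonmetric DPF.
        ranked = sorted(candidates, key=lambda i: dpf(q, self.data[i], m, r))
        return ranked[:k]


# Usage: index random 16-dimensional feature vectors, then look up a
# near-duplicate ("replica") of vector 42.
rng = np.random.default_rng(1)
database = rng.uniform(0.0, 1.0, size=(200, 16))
index = L2LSH(dim=16)
for vec in database:
    index.add(vec)
replica = database[42] + 1e-6
print(index.query(replica, m=12, k=1))
```

Separating candidate generation from ranking is one way to work around the nonmetric nature of DPF. Because standard LSH families are defined for metric distances, the hash only prunes the search, and DPF decides the final order among the survivors. A replica whose distortion concentrates in a few features may still fall into different L2 buckets, which is exactly the kind of limitation an index tailored to perceptual distances has to address.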