1. Introduction
Current state-of-the-art Copy-Move Forgery Detection (CMFD) algorithms perform extremely well on known public datasets, but practical use cases must also be considered when studying copy-move forgeries. In particular, applying CMFD algorithms to ID documents requires detecting small duplicated elements in the presence of many Similar but Genuine Objects (SGO). To be practical in such situations, a CMFD algorithm should maintain the lowest possible false positive rate, so that no manual verification is needed. Because CMFD methods search for similarities within an image, they are likely to struggle in the presence of SGO. Although the authors of [1] acknowledged this when proposing the COVERAGE dataset, most studies still evaluate their work on other public datasets, such as [2,3,4,5], on which near-perfect results are common. These datasets do not represent realistic use cases for copy-move forgeries: they often contain large duplicated elements in images without SGO, so the forgeries are usually obvious and rather easy to detect. Apart from [1], no dataset provides challenging images with SGO.

In this paper we propose the Copy-Move ID (CMID) dataset, a novel dataset for copy-move forgery detection. Our dataset contains 893 forged images of ID documents. We chose ID documents because they are a practical use case for CMFD algorithms and because an ID document contains many SGO, which makes it extremely challenging for CMFD methods. We evaluate state-of-the-art algorithms on this novel dataset to further confirm the issue first presented by [1].

We first review the current state-of-the-art CMFD methods and the datasets commonly used to evaluate them. We then describe how we automatically generated our dataset. Finally, we evaluate current methods on this dataset to provide a baseline result.
https://cmiddataset.github.io/