An Efficient CNN Model to Detect Copy-Move Image Forgery

Digital images are now used in many applications, which has made them a focus of digital image processing research. Image forgery is one of the most active topics in this area. We concentrate on copy-move image forgery, a deceptive forgery type in which a part of an image is copied and pasted elsewhere in the same image to produce the forged image. This paper proposes an accurate convolutional neural network (CNN) architecture for the effective detection of copy-move image forgery. The proposed architecture is computationally lightweight, with a suitable number of convolutional and max-pooling layers. We also present a fast and accurate testing process, with a test time as low as 0.83 seconds. Many empirical experiments have been conducted to verify the efficiency of the proposed model in terms of accuracy and time. These experiments were carried out on benchmark datasets and achieved 100% accuracy.


INTRODUCTION
Digital images are essential data used in many applications, such as forensics [1] (for example, as evidence in court), computer-aided medical diagnosis systems [2], social networks [3], and the military [4]. Given their importance, it is necessary to ensure their authenticity and keep their contents tamper-proof. Many computer programs enable ordinary users to falsify digital images, which makes fake images difficult to detect by eye. Because forgery tools are widely available, it is now necessary to assess whether a picture is fabricated or genuine. In other words, modern techniques for detecting forged images must be developed.
The main approaches for discovering image forgery are divided into active and passive approaches [5], as shown in Figure 6. The active approach inserts watermarks or digital signatures into images at creation time. The passive approach requires no embedded information and instead analyzes the image content itself for traces of tampering. Digital image forgery can be classified into five types, namely copy-move forgery, image splicing, image retouching, morphing, and enhancement (Figures 1, 2). Copy-move is one of the most common types of digital image forgery.
Many approaches for detecting copy-move forgery in digital images have been proposed. Generally, these approaches can be classified into three main groups. The first is the traditional copy-move forgery detection approach, in which well-known local feature extractors such as SIFT, SURF, and ORB are used [6]. The second is the orthogonal moment-based approach, which uses geometrically invariant orthogonal moments to extract the features. The third is the deep learning-based copy-move forgery detection approach, in which various deep learning techniques are used.

Traditional Copy-move forgery detection approach
Hashmi et al. [9] proposed an algorithm for copy-move forgery (CMF) detection based on the Discrete Wavelet Transform. Based on DCT and SVD, Zhao et al. [7] introduced an efficient method for CMF detection. This approach gives good results in the case of multiple CMF. Chihaoui et al. [8] combined the Scale-Invariant Feature Transform (SIFT) and Singular Value Decomposition (SVD) methods to introduce an efficient approach for automatic detection of duplicated regions in the same image. The proposed approach demonstrated high robustness against geometrical transformations. Dhivya et al. [10] also suggested an approach for CMF detection. Diwan et al. [14] suggested a new technique for CMF detection. They exploited the good results of the CenSurE keypoint detector and the FREAK feature descriptor to produce a stable and accurate CMF detection algorithm. Priyanka et al. [11] merged DCT and SVD and introduced an efficient CMF detection algorithm. The proposed approach gives high accuracy in the presence of different image deformations. A novel technique for CMF detection based on SIFT and the reduced LBP was introduced by Park et al. [12]. This approach compares favorably with other existing methods.

Moment-based Copy-move forgery detection approach
Recently, various techniques for copy-move forgery detection (CMFD) based on image moments have been proposed. Hosny et al. [20] suggested a fast and accurate algorithm for CMFD based on polar complex exponential transform moments (PCETMs). The proposed approach exhibited high accuracy with different types of image deformations. Hosny et al. [21] later extended this approach [20] with the quaternion concept so that it is applicable to color images. Meena et al. [22] introduced a method for CMFD based on Gaussian-Hermite moments (GHMs). The empirical results proved the accuracy of the proposed approach in detecting copy-moved forged regions. The good characteristics of both speeded-up robust features (SURF) and PCET motivated Wang et al. [23] to introduce an efficient and accurate method for CMFD in which SURF is used to detect the key points, while the features of the images are extracted using the PCETMs. Wang et al. [24] merged singular value decomposition (SVD) and PCET to introduce the SVD-PCET approach. First, the invariant geometric moments of an image are extracted using the PCET; then SVD is used to reduce the dimension of the obtained feature matrix. Various experiments proved the accuracy of SVD-PCET as a CMFD approach.

Deep Learning-based Copy-move forgery detection approach
Deep learning is a hot topic that has been applied in various fields, and CMFD is one of them. Deep learning for image analysis depends mainly on CNNs. A CNN consists of many stages, and a set of features is generated at each stage. Some features are used as a training set. Methods based on deep learning show better performance than traditional and moment-based approaches. Recently, many CMFD approaches based on deep learning have been presented. Elaskily et al. [25] presented an efficient approach for automatic CMFD based on CNN, and the suggested approach achieved 100% accuracy when applied to different datasets. Goel et al. [28] suggested a CMFD system based on a novel technique called dual branch CNN. The proposed system shows good results in terms of time and performance. Ortega et al. [27] proposed two approaches for CMFD based on deep learning: a custom architecture model and a transfer learning model. The proposed system was tested over eight benchmark datasets. Abhishek et al. [26] introduced an efficient system to detect and localize image forgeries based on deep CNN and semantic segmentation. The obtained results give accuracy above 98%. Jaiswal et al. [29] presented a CMFD model that uses multi-scale input and two blocks of convolutional layers: an encoder block and a decoder block. The empirical results proved the high accuracy of the proposed system.
The previous discussion reveals shortcomings in earlier works, which motivate the authors to propose an efficient CNN-based method. The main contributions of this study can be summarized as follows:
• An efficient and accurate CNN model is proposed. It achieves a promising accuracy score compared with the other investigated models.
• The proposed model is lightweight. It contains three convolutional layers, three max-pooling layers, one fully connected layer, and 266,306 parameters.
• An analytical comparison on original and forged images is conducted between the proposed model and the other investigated models (M. Elaskily et al. [25], Amerini et al. [15], Amerini et al. [16], Elaskily et al. [17], Mishra et al. [18], Kaur et al. [19], J. Zhong et al. [31], Y. Wu et al. [32], A. Islam et al. [33], and Y. Zhu et al. [34]). The obtained results are superior to those of other recently published approaches.
The rest of this study is organized as follows: Section 2 gives a preliminary description of the CNN. The structure of the proposed approach is presented in Section 3. Our results are presented and discussed in Section 4. Finally, Section 5 presents the conclusion.

The Description of CNN
In this section, we briefly describe the CNN model. A CNN is a convolutional neural network whose task is to extract the important features in the image. It is built from three basic layer types: the convolutional layer, the pooling layer, and the fully connected layer. In practice, a CNN includes a convolutional layer, a max-pooling layer, a flattening layer, and a fully connected layer, as shown in Figure 7.
A. The convolutional layer: it applies a set of filters to the input to generate feature maps, which are then passed through an activation function. The activation function is a nonlinear function with several types, as shown in Figure 8. The most commonly used of them are:
• ReLU (rectified linear unit): its importance lies in reducing the number of computations performed.
• Sigmoid, which is used in the output layer.
B. Max-pooling layer: it collects the features extracted from the image, reduces the dimensions, and keeps the most important features present in the image, as shown in Figure 9.
C. Flattening layer: it converts the features taken from max-pooling into a one-dimensional vector.
D. Fully connected layer: it connects all the neurons together.
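The layer operations described above can be illustrated with a minimal sketch in plain Python. The 4x4 toy feature map, the 2x2 non-overlapping pooling window, and all function names are assumptions chosen for illustration only; they are not taken from the proposed model.

```python
import math

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: maps a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling over a 2-D list.

    Assumes the height and width of feature_map are even.
    """
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

def flatten(feature_map):
    """Convert a 2-D feature map into a one-dimensional list."""
    return [value for row in feature_map for value in row]

# Hypothetical 4x4 feature map, as if produced by one convolution filter.
toy_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, 1.0, -1.0, -2.0],
    [-5.0, 3.0, 2.5, 6.0],
]

activated = [[relu(v) for v in row] for row in toy_map]  # activation step
pooled = max_pool_2x2(activated)                          # 4x4 -> 2x2
vector = flatten(pooled)                                  # 2x2 -> length 4
print(vector)  # [4.0, 1.5, 3.0, 6.0]
```

In this toy case, pooling reduces sixteen values to four while keeping the strongest response in each window, which is the dimension reduction attributed to the max-pooling layer.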

Proposed method
In this paper, an accurate deep CMF detection method is introduced. The proposed approach is based on the CNN model, as shown in Figure 10. The traditional approach works on a block-based algorithm, while the CNN approach works on the whole image. The presented approach has three stages: preprocessing, feature extraction, and classification.
In the preprocessing stage, the input image is resized, without cropping any image parts, so that it can enter the next stage. The feature extraction stage contains three convolution layers, each followed by a max-pooling layer. At the end of this stage, a fully connected layer connects all features with the dense layer. Finally, the classification stage classifies the data into two classes (forged or original). The convolution layers act as feature extractors: each convolution layer generates its feature maps using its own set of filters, followed by a ReLU activation. Starting from the feature maps produced by the first convolution layer, the next max-pooling layer produces resized, pooled feature maps, which are the inputs of the next convolution layer. The feature maps produced by the final max-pooling layer are flattened into a vector and fed into the fully connected layer. Finally, the dense layer classifies the features coming from the fully connected layer into two classes (original or tampered). The proposed model uses the "rmsprop" optimizer and a batch size of 32, which allows it to be trained efficiently.
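To show how the size of such a three-stage network can be tallied, the following sketch traces feature-map sizes and counts trainable parameters for a stack of three convolution and max-pooling pairs followed by a fully connected layer and a two-class output. Every hyperparameter here (64x64 RGB input, 3x3 kernels without padding, 2x2 pooling, filter counts 16/32/64, and 64 fully connected units) is a hypothetical choice, since the text does not list these values, so the total is not expected to reproduce the parameter count reported for the proposed model.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter for a 2-D convolution layer."""
    return (kernel * kernel * in_channels + 1) * filters

def conv2d_out(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units

def summarize(input_size, in_channels, filter_counts, dense_units, classes=2):
    """Return (flattened feature length, total trainable parameters)."""
    size, channels, total = input_size, in_channels, 0
    for filters in filter_counts:
        total += conv2d_params(channels, filters)
        size = pool_out(conv2d_out(size))   # convolution, then max pooling
        channels = filters
    flat = size * size * channels           # flattening layer output length
    total += dense_params(flat, dense_units)
    total += dense_params(dense_units, classes)
    return flat, total

# Hypothetical configuration, for illustration only.
flat, total = summarize(input_size=64, in_channels=3,
                        filter_counts=(16, 32, 64), dense_units=64)
print(flat, total)  # 2304 171234
```

With these assumed settings, most of the parameters sit in the fully connected layer, which is why the input resolution chosen during preprocessing strongly affects how lightweight such a model is.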

Results and Discussion
This section presents a comprehensive assessment of the proposed approach's findings. The tests were run on the Google Colaboratory server with a Google Compute Engine backend (GPU; RAM: 2.5 GB/12 GB). The model was implemented in Keras with TensorFlow as the backend, using Python 3.0.

Evaluation metrics
To estimate the accuracy of the proposed approach, we used the following accuracy measure:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP represents the number of tampered images that are correctly detected as tampered images, while FP represents the number of original images that are falsely detected as tampered images. FN refers to the number of tampered images falsely detected as original images. TN represents the number of original images that are correctly detected as original images. We used the logarithmic loss (Log Loss) to penalize false classifications. If we have N samples belonging to M classes, the logarithmic loss is:

$$\mathrm{LogLoss} = -\frac{1}{N} \sum_{a=1}^{N} \sum_{b=1}^{M} y_{ab} \log(p_{ab})$$

where $y_{ab}$ indicates whether sample (a) belongs to category (b) or not, and $p_{ab}$ is the predicted probability that sample (a) belongs to category (b). The closer the logarithmic loss is to zero, the higher the accuracy. The test time (TT) is a key factor in assessing the given method, since it varies from one algorithm to another; the TT is the average time spent testing images over (k) iterations of the test process.
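The three measures can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names, the clipping constant eps, and the example counts are hypothetical and are not results from the experiments.

```python
import math
import time

def accuracy(tp, tn, fp, fn):
    """Fraction of images (tampered or original) that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def log_loss(y_true, y_prob, eps=1e-15):
    """Logarithmic loss for N samples and M classes.

    y_true: N rows of one-hot labels; y_prob: N rows of predicted probabilities.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0) (an assumption).
    """
    total = 0.0
    for truth_row, prob_row in zip(y_true, y_prob):
        for y, p in zip(truth_row, prob_row):
            p = min(max(p, eps), 1.0 - eps)
            total += y * math.log(p)
    return -total / len(y_true)

def mean_test_time(test_fn, k):
    """Average wall-clock time of one call to test_fn over k iterations."""
    start = time.perf_counter()
    for _ in range(k):
        test_fn()
    return (time.perf_counter() - start) / k

# Made-up confusion counts, for illustration only.
example_accuracy = accuracy(tp=48, tn=50, fp=0, fn=2)

# Two samples, two classes (tampered, original), both predicted 50/50.
uncertain_loss = log_loss([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])

# Confident and correct predictions give a loss close to zero.
confident_loss = log_loss([[1, 0], [0, 1]], [[1.0, 0.0], [0.0, 1.0]])
```

A completely uncertain two-class prediction yields a loss of log 2 (about 0.693), while confident correct predictions drive the loss toward zero, which matches the relationship between Log Loss and accuracy described above.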

The results over MICC-F2000 data set
The proposed approach was tested over the MICC-F2000 data set [15] [34], as shown in Figure 11. The results in Table 3 show that the proposed approach is superior to the compared method [25], with efficiency gains of 2, -2, and -4.53 in accuracy, Log loss, and TT, respectively. These were the best results. Table 3 also summarizes the results obtained at 25 epochs, which were the closest to the best results. These results also favored the proposed approach, with average efficiency gains of 2.5, -2.59, and -6.36 in accuracy, Log loss, and TT, respectively.

The results over MICC-F600 data set
The proposed approach was tested over the MICC-F600 [16] data set. The obtained results were evaluated against other recently published methods [15, 16, 17, 25, 31, 32, 33, 34]. The confusion matrices for the proposed approach and the investigated approaches [34] are shown in Figure 12.
The introduced approach has its best results at 35 epochs. Compared with the results in [25], the efficiency gain favored the proposed approach, with values of 3.9, -3.9, and -1.2 in accuracy, Log loss, and TT, respectively. These results appear in Table 6. Table 6 also summarizes the results obtained at 25 epochs, which were the closest to the best results we obtained. These results also favored the proposed approach, with average efficiency gains of 3.69, -3.68, and -0.21 in accuracy, Log loss, and TT, respectively. Table 7 presents experimental results for the proposed approach and the other compared approaches [15][16][17][25]. The results show that the proposed approach outperforms them in accuracy and TT.
The introduced approach has its best results at 35 epochs. Compared with the results in [25], the efficiency gain favored the proposed approach, with values of 2.38, -2.38, and -0.48 in accuracy, Log loss, and TT, respectively. These results appear in Table 9. Table 9 also summarizes the results obtained at 25 epochs, which were the closest to the best results we obtained. These results also favored the proposed approach, with average efficiency gains of 2.05, -2.05, and -0.5 in accuracy, Log loss, and TT, respectively. Table 10 presents experimental results for the proposed approach and the other compared approaches [15][16][17][18][19][25]. The results show that the proposed approach outperforms them in accuracy and TT.

Conclusion
In conclusion, this study introduced a copy-move forgery detection methodology based on deep neural learning. The proposed model can recognize tampered images by classifying a candidate image into one of two classes: forged or original. The proposed system extracts feature vectors from an image's features. The suggested approach automatically uses the fully connected layer to find feature correspondences and dependencies. The proposed model must be trained first before it can be tested and used to classify tampered images. The performance of the proposed model was assessed on three benchmark datasets: MICC-F2000, MICC-F600, and MICC-F220. The numerical results, when compared with other approaches, reveal the superiority of the proposed approach. We obtained 100% accuracy at 35 epochs with all datasets. In terms of TT, we also obtained good results compared with other algorithms. With the datasets MICC-F2000, MICC-F600, and MICC-F220, we obtained TT equal to 47.48 sec, 7.73 sec, and 0.83 sec, respectively. All empirical results showed the superiority of the proposed model over other reported algorithms in terms of accuracy and TT.