Border Control Morphing Attack Detection With a Convolutional Neural Network De-Morphing Approach



I. INTRODUCTION
Face recognition is a widely used biometric identification technique because of the high accuracy rates it achieves and its low intrusiveness for the subjects being identified. There are several approaches to facial authentication, and some of them have high security requirements. One example is the automatic border control (ABC) system, in which a biometric trait is used to control and secure the border-crossing process. Three biometric traits (iris, fingerprint and face) could be considered in ABC systems, but in practice only the face is widely used at airports. The ABC authentication process has to determine whether the facial image stored in the Electronic Machine Readable Travel Document (eMRTD) matches a snapshot taken in situ (see Fig. 1).
The associate editor coordinating the review of this manuscript and approving it for publication was Jinjia Zhou.
ABC systems are exposed to multiple attacks or threats, for example, identity theft or fraud, which is also called spoofing. For this reason, many current research works focus on anti-spoofing techniques [1]. The morphing attack is one of the most dangerous attacks because it is very difficult to detect. It applies morphing techniques to the facial image recorded in the passport or travel document: a morphed image that blends the face of the legitimate document owner (accomplice) with that of the surrogate or impostor (criminal) is stored inside the eMRTD. The system must then decide whether the traveler is who they claim to be by comparing the picture taken on site (at the ABC) with the eMRTD image, which may be morphed. If no morphing attack detection (MAD) module is present in the ABC, the usual verification response will be acceptance, given the high similarity between the criminal's face and the morphed criminal+accomplice image.
Morphing originated in the arts (films, video clips and advertisements) as a resource for achieving striking special effects [2], [3]. At first the process was handcrafted, but this changed quickly with the emergence of the first algorithms that automated morphing tasks [4]. It was an arduous task, even for experts, to distinguish the two faces once they had been merged [5]-[7]. Thus, the technique evolved from an artistic resource into a spoofing toolkit [8].
VOLUME 8, 2020
Morphing of facial images can be considered one of the most important threats to ABC systems [9]-[11], since applications based on face recognition are likely to be deceived [12], [13]. For instance, the outcomes of the National Institute of Standards and Technology (NIST) Face Recognition Vendor Test MORPH (https://pages.nist.gov/frvt/html/frvt_morph.html) showed that the submitted MAD algorithms lack robustness and performance on unseen and challenging corpora, as explained in [14]. Other biometric traits, such as fingerprints [15] and the iris [16], have also been targeted by morphing attacks. Nevertheless, this study focuses on facial morphing, since it is the most devastating attack and the most difficult to detect in ABC systems.
In recent years, the wide use of ABC systems in airports has drawn increasing attention to the multiple threats they face (e.g., presentation attacks), as explained by the European Border and Coast Guard Agency (FRONTEX) [17], [18]. These attacks have encouraged the proliferation of presentation attack detection (PAD) algorithms [19]-[21] and, especially, morphing attack detection (MAD) algorithms [13], [22], because morphing attacks are particularly difficult to detect.
This paper proposes a novel method to detect morphing attacks using a reverse de-morphing approach based on convolutional neural networks. It differs from previous works [23], [24] in several respects, explained as follows.
Ferrara et al.'s work covers morphing attack detection, the elaboration of two corpora (PMDB and MorphDB), and the assessment of the quality of both corpora using a commercial off-the-shelf (COTS) algorithm. The key point of Ferrara et al.'s algorithm is that it depends on prior knowledge of how the morphed face was generated, such as the morphing process and the morphing parameters. Moreover, the face reconstruction relies on mathematically reverse-engineering the morphing steps. Finally, this work is based on Delaunay-Voronoi triangulation, whereas newer approaches perform de-morphing with neural networks. For instance, Damer et al. [25] and Peng et al. [24] propose the use of generative adversarial networks (GANs). Peng et al.'s work is based on disentangling the accomplice identity from a potentially morphed image, and their approach has two stages. The first stage consists of unraveling the criminal identity. The second stage compares the image obtained in the previous stage with the in vivo image captured at the ABC gate, which allows the authors to conclude whether a morphing attack occurred. Additionally, Peng et al.'s de-morphing process is based on a GAN, whereas the presented approach relies on an autoencoder architecture. Another key point regarding MAD is that none of these approaches consider print and scan images in their studies. Finally, Peng et al. and Ferrara et al. take the pictures for their corpora in a controlled environment. In contrast, this research work uses 1170 in vivo images captured at the eGates of an automated border control system.
The paper is organized as follows. Section II presents the state of the art, and Section III describes the dataset. Since a morphing method is required, Section IV presents a morphing method and its adaptation to passport control in the ABC. Section V then details the de-morphing approach. Section VI presents and discusses the results, and the conclusions are given in Section VII.

II. PREVIOUS WORKS
In recent years, experimental research on morphing techniques has produced impressive improvements in several aspects, such as visual quality and degree of automation. In practice, morphing corpora are often created with well-known open-source software such as the GNU Image Manipulation Program (GIMP) and its GIMP Animation Package (GAP) plugin [26], which is able to merge images [10], [13], [23], [27]. However, most morphing software uses the Delaunay-Voronoi triangulation (DVT) algorithm [28]-[33] together with a swapping technique to improve the result [34]-[39]. Moreover, some recent research works generate morphed pictures with generative adversarial networks (GANs) instead of the triangulation process mentioned above [25].
Two MAD scenarios can be found in the literature, depending on the morphing attack setting: a) MAD with a single image (no-reference), where only the suspected morphed image is available; and b) MAD with two images (differential MAD), where the suspected morphed picture and a second image of the subject are used. The latter is the typical scenario in ABC systems [10], [23], [24], [28]. Because the picture obtained after the morphing process usually presents low quality, the no-reference approach seeks to detect noise or quality deterioration in the image. For this reason, it relies on micro-texture analysis, spatial descriptor occurrences, or spectral analysis with the Fourier transform.
In addition to structural descriptors and texture analysis, other studies assess image degradation through spectral analysis, which some researchers use to detect possible manipulation [35]. Others evaluate the noise pattern with the photo response non-uniformity (PRNU) approach [50], either over the full image [32] or in each region [33].
With the advent of deep learning in the last decade, some approaches use convolutional neural networks (CNNs) to detect morphing [25], [38], [39], [51]. Some of the best-known architectures are VGG19 [52], AlexNet [53] and GANs [54], [55]. The main drawback of these networks is the large number of samples required to train them. For this reason, some research works use pretrained networks, that is, networks with precomputed weights, such as FaceNet [56] or VGG-Face [57].
Differential MAD requires two images and therefore often provides solutions for ABC systems, where two images of the traveler are available. For instance, Scherhag [58] extracts SIFT descriptors from the passport image and from the in situ image and then compares the descriptors of both images. Note that, in this scenario, the passport image is not trustworthy but potentially fake: it is built from the surrogate's image and the face of the person at the ABC gate. A similar approach [59] compares the number and positions of the detected facial landmarks instead.
Other differential MAD approaches take greater advantage of the two available images: when one of the identities is removed from the morphed image, the other one remains [10], [23]. This removal process is called de-morphing.
Both differential and no-reference MAD approaches struggle with real-world images. Real morphed images have often been printed and scanned before the final image is embedded in the passport, and after this step MAD algorithms are often no longer able to detect the manipulation. The aforementioned studies [10], [23] also analyze this problem.

III. DATASET
This research work was carried out with the FRAV-ABC database (see Fig. 1), designed and developed by the FRAV research group to reproduce all the conditions of a general ABC system. A preliminary study of the facial databases available to the research community found that none of them exactly fulfilled border-crossing and ABC conditions. Considering the strict procedure by which a passport image is acquired and the applicable normative restrictions (ICAO Doc 9303 [60]), a new database was acquired. To this end, a real airport ABC infrastructure meeting all of the aforementioned conditions was used, in order to emulate a real morphing attack as closely as possible.
The corpus was composed of 1170 individuals (640 females and 530 males) aged between 18 and 74 years; 70% of the subjects were between 25 and 50 years old. Each subject provided two images: a chip image from a real passport and an in vivo image. Chip images are 250×300-pixel color images from real passports that comply with the International Civil Aviation Organization (ICAO) Document 9303 standard for the eMRTD [61]. In vivo images are 300×300-pixel color images captured in situ at the airport by an ABC device.
The corpus was divided into two data sets: FRAV-ABC-Train, with 1000 subjects (split 70%-30% as recommended in [62], [63], with 700 subjects used for training and 300 for validation), and FRAV-ABC-Test, with 170 subjects (roughly 15% of the total). From these subjects the authors built a large corpus, as follows. On the one hand, each of the thousand training subjects was morphed with every other subject, but not with itself, which yields (1000 × 1000) - 1000 combinations. Thus, the final training corpus contained 999,000 images. Note that the age, gender and ethnicity of the subjects were not taken into account, in order to produce a robust data set. On the other hand, the test corpus was built in the same way: each of the 170 test subjects was morphed with every other test subject, but not with itself, which yields (170 × 170) - 170 = 28,730 images.
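The corpus sizes above follow from counting ordered pairs of distinct subjects: 1000 × 999 = 999,000 and 170 × 169 = 28,730. The arithmetic implies that morph(A, B) and morph(B, A) are counted as two different images, which is plausible because the face region is pasted onto one of the two source images; this interpretation is an assumption, since the text does not say so explicitly. A minimal sketch of the pair enumeration, using integer subject ids as placeholders:

```python
from itertools import permutations


def morph_pairs(subject_ids):
    """All ordered (A, B) pairs with A != B: one morph per pair, no self-morphs.

    Assumes order matters, i.e. morph(A, B) and morph(B, A) are distinct images.
    """
    return list(permutations(subject_ids, 2))


train_pairs = morph_pairs(range(1000))  # placeholder ids for FRAV-ABC-Train
test_pairs = morph_pairs(range(170))    # placeholder ids for FRAV-ABC-Test

print(len(train_pairs))  # 999000 = 1000 * 1000 - 1000
print(len(test_pairs))   # 28730  = 170 * 170 - 170
```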
The verification process was performed with a TensorFlow implementation of the FaceNet face recognizer described in [56], reimplemented and published by [64], which also builds on ideas from [57]. This publicly available subsystem achieves high accuracy rates (99.63% on LFW (Labeled Faces in the Wild) [65] and 95.12% on YTF (YouTube Faces) [66]).
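FaceNet-style verifiers map each face to an embedding and accept a pair when the distance between the unit-normalized embeddings falls below a threshold. The sketch below assumes that decision rule; the 512-dimensional random vector is a placeholder rather than a real face embedding, and the threshold of 1.1 is a hypothetical operating point, not a value reported in this work.

```python
import numpy as np


def l2_normalize(v):
    """Scale an embedding to unit length, as FaceNet-style models expect."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def verify(emb_a, emb_b, threshold=1.1):
    """Return (accepted, distance) for two face embeddings.

    The distance is the squared L2 distance between unit-norm embeddings,
    which equals 2 - 2 * cos(a, b) and lies in [0, 4]. The threshold is a
    hypothetical value chosen for illustration.
    """
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    dist = float(np.sum((a - b) ** 2))
    return dist < threshold, dist


rng = np.random.default_rng(0)
e = rng.normal(size=512)   # placeholder embedding, not a real face
print(verify(e, e))        # (True, 0.0)
print(verify(e, -e))       # rejected, distance close to 4
```

In practice the threshold would be chosen on a validation set to reach a target false-accept rate.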
To follow the Spanish passport image generation procedure, 170 morphed images of the FRAV-ABC-Test data set were printed at 300 dots per inch (dpi) with a LaserJet color printer and then scanned to build a new data set (denoted FRAV-ABC-Test-P&S-300). Furthermore, an additional degradation step was carried out to assess the robustness of the algorithms used in this research work: FRAV-ABC-Test-P&S-150 was obtained by printing and scanning the same 170 images at 150 dpi (see Fig. 2). In this way, a whole set of ''fake morphed'' passports with a highly realistic appearance was created. Finally, the publicly available CASIAWebFace database [67], which contains about 500K facial images, was used to train the face autoencoder.

IV. MORPHING METHODS AND SETUP
This section describes the morphing process and is split into three parts. The first part details the morphing process selected and adapted to obtain a realistic image, one that a border guard would find hard to detect visually. The general facial morphing procedure has been tailored to the problem at hand; thus, the state-of-the-art algorithm and the enhancements added are described. The second part shows the need for a morphing detection module in a face verification system, and the third examines an existing MAD method on print and scan images.

A. MORPHING PROCESS
Commercial morphing software is readily available [26], [68]-[74]. All of these tools produce high-quality morphed images, but they require manual manipulation, which makes it hard for researchers to generate enough images for an adequate research corpus. The well-known algorithm used in the literature [22], [23] is therefore adapted to the studied problem, as explained below.
a) Given two pictures (see Figure 1(a)), 76 reference points are located in each one: 68 face landmarks, computed with the Kazemi and Sullivan algorithm [59] (Dlib [75]), and eight more in the middle of the image boundaries (see Figure 3(a)).
b) The images must be aligned: the position and size of both faces must match at eye level.
c) Both images are triangulated with the Delaunay-Voronoi algorithm (DVT [76]) (see Figure 3(b) and (c)), so that each triangle in one image has a counterpart triangle in the other image (Figure 4).
d) Each pair of corresponding triangles is blended into a single triangle whose vertices are the midpoints of the corresponding vertices (Figure 4(d)). Merging all triangles yields the average image (Figure 5(a)); this process is called warping [43], [77], [78].
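Steps a), c) and d) can be sketched with NumPy and SciPy, assuming the 68 landmarks have already been detected (for example with dlib's shape predictor, which is not called here). Two choices are assumptions rather than details from the text: the eight boundary points are placed at the four corners and the four edge midpoints, and a single Delaunay triangulation is computed on the midpoint shape and reused for both images, which is one common way to guarantee that triangle k in one image corresponds to triangle k in the other. Random coordinates stand in for real landmarks.

```python
import numpy as np
from scipy.spatial import Delaunay


def boundary_points(width, height):
    """Eight extra border points: 4 corners + 4 edge midpoints (assumed layout)."""
    w, h = width - 1, height - 1
    return np.array([[0, 0], [w / 2, 0], [w, 0], [w, h / 2],
                     [w, h], [w / 2, h], [0, h], [0, h / 2]], dtype=np.float64)


def reference_points(landmarks68, width, height):
    """68 facial landmarks + 8 border points = 76 reference points."""
    pts = np.asarray(landmarks68, dtype=np.float64)
    return np.vstack([pts, boundary_points(width, height)])


def shared_triangulation(pts_a, pts_b, alpha=0.5):
    """Midpoint shape plus ONE Delaunay triangulation reused for both images,
    so triangle k of image A corresponds to triangle k of image B."""
    pts_mid = (1.0 - alpha) * pts_a + alpha * pts_b
    triangles = Delaunay(pts_mid).simplices     # (n_triangles, 3) vertex indices
    return pts_mid, triangles


def affine_from_triangles(src, dst):
    """2x3 affine matrix M with M @ [x, y, 1] = dst for each src vertex."""
    src_h = np.hstack([src, np.ones((3, 1))])   # 3x3 homogeneous coordinates
    return np.linalg.solve(src_h, dst).T        # solves src_h @ M.T = dst


# Random stand-in landmarks on a 250x300 (width x height) chip image
rng = np.random.default_rng(1)
pts_a = reference_points(rng.uniform(40, 200, (68, 2)), 250, 300)
pts_b = reference_points(rng.uniform(40, 200, (68, 2)), 250, 300)
pts_mid, tris = shared_triangulation(pts_a, pts_b)
k = tris[0]
M_a = affine_from_triangles(pts_a[k], pts_mid[k])  # maps A's triangle k onto the midpoint shape
```

Each triangle of both images would then be warped onto its midpoint triangle with its own affine matrix (OpenCV's `cv2.warpAffine` is a common choice, not shown) and the two warped images averaged pixel by pixel to obtain the average image.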
The average image has ghost artifacts in peripheral regions, which makes it too easy to detect and therefore a low-quality attack. For that reason, the enhancements explained below are necessary.
Two main enhancements have been considered to obtain a realistic appearance without losing the morphing effect in the final image: a) cropping: the face region of the target image, delimited by the convex hull of the peripheral face landmarks, is placed on one of the source images (Figure 5(b)); b) the merging process is carried out with Poisson image editing [79], which avoids hard seams and compensates for different capture illumination conditions and distinct skin colors (see Figure 5(c)).
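Poisson image editing keeps the gradients (Laplacian) of the pasted face inside the mask while matching the surrounding image on the mask border, which is why seams and illumination differences disappear. The following minimal single-channel solver uses plain Jacobi iterations; it is an illustrative sketch of the technique, not the implementation used in this work. The mask would in practice be the convex hull of the peripheral landmarks (for example rasterized with OpenCV's `cv2.convexHull` and `cv2.fillConvexPoly`), and the ramp images in the usage example are synthetic.

```python
import numpy as np


def poisson_blend(source, target, mask, iterations=2000):
    """Gradient-domain (Poisson) blending of a single-channel image.

    Inside `mask` the result reproduces the Laplacian of `source`; on the mask
    border it matches `target`. Solved with Jacobi iterations, which is slow but
    simple. The mask must not touch the image border (np.roll wraps around).
    """
    src = source.astype(np.float64)
    out = target.astype(np.float64).copy()
    m = mask.astype(bool)
    # Negative discrete Laplacian of the source: 4*g_p - sum of the 4 neighbours
    lap = (4 * src
           - np.roll(src, 1, 0) - np.roll(src, -1, 0)
           - np.roll(src, 1, 1) - np.roll(src, -1, 1))
    out[m] = src[m]  # initial guess inside the mask
    for _ in range(iterations):
        neighbours = (np.roll(out, 1, 0) + np.roll(out, -1, 0)
                      + np.roll(out, 1, 1) + np.roll(out, -1, 1))
        # Discrete Poisson equation: 4*f_p - sum(f_q) = lap_p
        out[m] = (neighbours[m] + lap[m]) / 4.0
    return out


# Synthetic check: a ramp pasted into a target with the same ramp + 0.2
x = np.linspace(0.0, 1.0, 16)
source = np.tile(x, (16, 1))
target = source + 0.2
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
blended = poisson_blend(source, target, mask)
```

The gradients come from the source and the offset from the target border, so the brightness jump of 0.2 is absorbed without a seam and `blended` converges to `target`. For color images the solver is applied per channel; OpenCV's `cv2.seamlessClone` with `cv2.NORMAL_CLONE` provides a production implementation of the same idea.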

B. FACIAL VERIFICATION UNDER MORPHING ATTACKS
Facial verification systems are not prepared to deal with morphing attacks [12], and common MAD methods are not effective with noisy or low-quality images [31], [41], [80]. The main problem in the verification process is the acceptance threshold: the probability density function (PDF) of the morphing-attack scores lies between those of the positive and negative presentations, as shown in Fig. 6, so it is difficult to tell whether a morphing attack has occurred and to establish a rejection threshold for morphed images. Figs. 6 and 7 depict the similarity scores obtained for different presentations with FaceNet and with one of the COTS systems used by Ferrara et al. in [8], respectively. Each figure shows three kinds of curves or areas: genuine travelers (positive presentations, in blue), impostors (negative presentations, in orange) and morphing attacks (morphing presentations, red striped line). The left plot shows the test with FRAV-ABC-Test, the middle plot the test with FRAV-ABC-Test-P&S-300, and the right plot the test with FRAV-ABC-Test-P&S-150.
As both figures show, the scores of genuine individuals and impostors are well separated, and with digital images it is possible to define a threshold that achieves high accuracy rates with both FaceNet and the COTS system. However, the problem is more complex with print and scan images (see panels (b) and (c) in Figure 6 and Figure 7). Therefore, MAD systems are required to prevent plausible attacks in both cases, open-source systems (FaceNet) and COTS systems.

C. MAD SYSTEM UNDER PRINT AND SCAN IMAGES
The photo response non-uniformity (PRNU) approach [50] is depicted in Fig. 8. PRNU is selected as an example of existing MAD methods for comparison with the current approach on three data sets: (a) FRAV-ABC-Test, (b) FRAV-ABC-Test-P&S-300, and (c) FRAV-ABC-Test-P&S-150. Each row contains two kinds of illustrations: bona fide and morphed images on the left, and the histograms of one hundred bona fide images against one hundred morphed images on the right. In the first row, there is a small difference between the two histograms. However, the histograms are difficult to tell apart when print and scan images are examined.
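PRNU-based methods work on the noise residual W = I - F(I), where F is a denoising filter, because sensor noise survives in W while scene content is largely removed. The sketch below only illustrates that residual and a normalized histogram that could be compared between bona fide and morphed images; it assumes the histograms in Fig. 8 are computed over such residuals, and it uses a 3×3 mean filter as a crude stand-in for the wavelet denoiser usually employed. It is not the full method of [50]. Pixel values are assumed to be in [0, 1].

```python
import numpy as np


def box_denoise(img, k=3):
    """k x k mean filter with edge padding: a crude stand-in for the wavelet
    denoiser normally used to extract PRNU."""
    pad = k // 2
    p = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def noise_residual(img):
    """Residual W = I - F(I), the signal from which PRNU patterns are estimated."""
    return img.astype(np.float64) - box_denoise(img)


def residual_histogram(img, bins=64, value_range=(-0.1, 0.1)):
    """Density-normalized histogram of the residual, comparable across images."""
    hist, _ = np.histogram(noise_residual(img), bins=bins,
                           range=value_range, density=True)
    return hist
```

Printing and scanning replaces the camera's noise pattern with printer and scanner noise, which is consistent with the histograms becoming indistinguishable for the P&S data sets.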

V. DE-MORPHING APPROACH CONSIDERED
With the morphing scheme described in the previous section, the de-morphing approach can now be presented. The de-morphing process does not depend on the morphing scheme considered. The advantage of the morphing method described above is that it avoids ghost effects and abrupt changes in skin texture, which makes de-morphing truly challenging. Less elaborate morphing procedures can be detected visually by paying some attention to ghost artifacts in the hair or at the face boundaries, and to changes in skin color.
The de-morphing process is shown in Fig. 9. Its inputs are the in vivo and passport images, and its output is a single image. The goal of de-morphing is to unravel the chip image. If the chip image is morphed, the in vivo identity is removed from it, and the output is a new image that differs markedly from the in vivo image, since only the information from the other image used in the morphing process (impostor) remains. If the chip image is not morphed, the output will be similar to the in vivo image. Therefore, the last stage of the process is an identity verification between the output image and the in vivo image. If both images are similar and can be assumed to show the same subject, the chip image is not morphed. If they are not similar, that is, if they can be assumed to come from different subjects, the original chip image is considered morphed: a mixture of the in vivo subject and the subject whose information has been kept in the output image.
The de-morphing network has two parts: a facial encoder for each of the input images, followed by a decoder network. Therefore, three neural networks are used: two encoders of the same size and architecture (one for the in vivo images and one for the chip images) and one decoder network.

A. DE-MORPHING PROCESS
This process tries to recover the initial pictures from two images. The chip image may be bona fide (genuine) or fraudulent. Figure 9 uses two pictographs to represent these images: a green passport denotes a genuine chip picture, and a red passport denotes a fraudulent chip image produced by morphing. Finally, a blue camera symbol represents the in vivo picture captured at the ABC.
Equation 1 illustrates the morphing and de-morphing processes. Set operations help to understand the role that the different images (in vivo or chip) play in the process.
$$C = A \cap B, \qquad C - A \approx B \tag{1}$$
The intersection of two images (A and B) represents the morphing process, where C is the morphed image. The difference C - A represents the de-morphing process: if C is a morphed image, C - A should be similar to B and not similar to A. If a morphed image is presented at an ABC gate, only images C and A are available (C from the chip and A from the in vivo capture). If C - A is not similar to A, it can be assumed that a B image was used to compose a morphing attack. Therefore, if the output and the in vivo image are similar, and the in vivo image is also similar to the chip image before de-morphing, it can be concluded that there is no morphing attack, as illustrated in Fig. 10(a). However, if the output picture is not similar to the in vivo image (regardless of its similarity to the chip image before de-morphing), a morphing attack was performed, as depicted in Fig. 10(b).
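The decision rule of Fig. 10 can be written as a few comparisons. In this toy sketch, images are plain vectors, `toy_demorph` is a hypothetical stand-in for the CNN that simply inverts a 50/50 average (2C - A, so a morph (A + B)/2 yields B and a bona fide chip A yields A), cosine similarity stands in for the face verifier, and the threshold of 0.5 is hypothetical. The text does not cover the case where the output matches the traveler but the chip does not; the `"no match"` branch for that case is an added assumption.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity, standing in for a face verifier's score."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_demorph(chip, in_vivo):
    """Hypothetical stand-in for the de-morphing CNN: inverts a 50/50 average."""
    return 2.0 * chip - in_vivo


def decide(chip, in_vivo, demorph=toy_demorph, similarity=cosine, threshold=0.5):
    output = demorph(chip, in_vivo)              # C - A in the set notation
    if similarity(output, in_vivo) < threshold:
        return "morph"                           # Fig. 10(b)
    if similarity(chip, in_vivo) >= threshold:
        return "bona fide"                       # Fig. 10(a)
    return "no match"                            # assumption: ordinary verification failure


A = np.array([1.0, 0.0, 0.0])   # traveler at the gate (toy vector)
B = np.array([0.0, 1.0, 0.0])   # second identity (toy vector)
print(decide((A + B) / 2, A))   # morph
print(decide(A, A))             # bona fide
```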
Once the de-morphing process is performed, the similarity of the output to both the in vivo and the chip images is computed. Fig. 11 shows the output distributions of the classifier, with two examples that illustrate the difference between a morphed and a non-morphed case. When a genuine chip image is compared with an in vivo image, both distributions largely overlap, as shown in Fig. 11(a); in other words, the distance between the pictures is minimal. In contrast, when the comparison involves a morphed or manipulated chip image and an in vivo snapshot, the separation between the distributions is evident, as shown in Fig. 11(b).
Figure 11 also shows, in blue, the similarity between the de-morphed image and the original chip image and, in orange, the similarity between the de-morphed image and the in vivo image. From these two distributions, the probability density function used to detect a morphing attack can be calculated, as shown in Fig. 11(c). This probability density function is computed from the difference between the similarities of the de-morphed image to the passport image and to the in vivo image.
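The feature behind Fig. 11(c) is the per-traveler difference between two similarity scores, and its density can be estimated with a kernel density estimate. The sketch below assumes the convention s(de-morphed, chip) - s(de-morphed, in vivo) (the text does not fix the sign) and a Gaussian kernel with Silverman's rule-of-thumb bandwidth; the paper does not state which density estimator was used.

```python
import numpy as np


def similarity_difference(sim_out_chip, sim_out_vivo):
    """Per-traveler feature: s(de-morphed, chip) - s(de-morphed, in vivo).
    The sign convention is an assumption."""
    return (np.asarray(sim_out_chip, dtype=np.float64)
            - np.asarray(sim_out_vivo, dtype=np.float64))


def gaussian_kde_pdf(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`.
    Bandwidth defaults to Silverman's rule of thumb, 1.06 * sigma * n**(-1/5)."""
    x = np.asarray(samples, dtype=np.float64)
    if bandwidth is None:
        bandwidth = 1.06 * x.std(ddof=1) * x.size ** (-0.2)
    z = (np.asarray(grid, dtype=np.float64)[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (x.size * bandwidth)
```

Estimating one density for bona fide travelers and one for morphing attacks and placing the decision threshold where they cross is one way to turn Fig. 11(c) into a detector; `scipy.stats.gaussian_kde` offers a ready-made estimator.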
The de-morphing process is performed by a convolutional network (see Fig. 12(c)) composed of two feature-extraction branches. These branches are based on an autoencoder (see Fig. 12(b)) whose layers reconstruct the image. As depicted in Fig. 13, the first layers of the convolutional network extract the features of the input images (chip and in vivo). The transposed-convolution reconstruction layers are in charge of identifying the features that are not present in both images. If such features exist, it can be assumed that the process has recovered the criminal's features.

B. AUTOENCODER
The autoencoder, whose architecture is described in Fig. 12(b), has two stages: encoding and decoding. Both stages form a convolutional neural network (CNN) similar to that described in [63]. The first stage, encoding (blue striped line), reduces the 224×224×3 input image to a 28×28×256 representation without losing critical attribute information, as depicted in Fig. 14. This figure shows the original input images (column a) and, on the right, the images reconstructed by the autoencoder (column b). The main facial information is preserved during the process.
The second stage, decoding, reconstructs the original input image (224×224×3) using successive transposed convolution layers (red striped line) [81]. The encoder layers of the autoencoder have the same architecture as the first layers of VGG-Face [57], as depicted in Fig. 12(a). VGG-Face is a CNN designed to identify and verify individuals with high accuracy rates, such as 98.95% on Labeled Faces in the Wild (LFW) [65] and 97.3% on YouTube Faces (YTF) [66]. VGG-Face provides pretrained weights learned from 2.6 million face images. As shown in the state-of-the-art section, these kinds of neural networks are well suited to extracting information from face images.
As explained above, VGG-Face provides pretrained weights for the first (encoder) layers, but these were insufficient to obtain good results in the current problem. Thus, the decoder layers had to be trained to achieve high final accuracy rates.
On the one hand, the autoencoder was trained with the TensorFlow 1.15 library on CASIAWebFace pictures, using the same face as input and target output at every step. It was trained for approximately 3000 epochs or iterations, with 512 samples per batch and the mean squared error (MSE) loss. Moreover, early stopping (patience = 500) was used in all scenarios; that is, training stopped if performance did not improve within 500 iterations. The graphics card used to train the autoencoder was an NVIDIA GeForce GTX 1050 (8 GB RAM). Following [82], the learning rate for model fine-tuning starts at 0.005 and decreases to 0.001. Finally, adaptive moment estimation (Adam) [83] was used as the optimization algorithm.
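The two training controls described above, a learning rate that moves from 0.005 to 0.001 and early stopping with a patience of 500, can be sketched in plain Python. The text does not say how the learning rate decreases, so the linear decay below is an assumption; the early-stopping logic mirrors what `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=500)` does in TensorFlow.

```python
def learning_rate(epoch, total_epochs, lr_start=0.005, lr_end=0.001):
    """Linear decay from lr_start to lr_end (the decay shape is an assumption)."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return lr_start + t * (lr_end - lr_start)


class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=500, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss     # improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1       # no improvement this epoch
        return self.wait >= self.patience


stopper = EarlyStopping(patience=3)
print([stopper.should_stop(l) for l in [1.0, 0.9, 0.95, 0.95, 0.95]])
# [False, False, False, False, True]
```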
On the other hand, the similarity between the input and output images must be assessed to test the autoencoder's performance. To this end, the 170 chip images of the FRAV-ABC-Test corpus were reconstructed by the autoencoder, and the FaceNet verification acceptance probability between each input and its reconstruction was computed. FaceNet was used instead of VGG-Face to avoid noisy outcomes caused by the autoencoder sharing VGG-Face's architecture.

C. DE-MORPHING FACES
Once the encoding process has ended, two feature maps of size 28×28×256 are obtained, one per input image. These are concatenated into a single 28×28×512 tensor that merges the information of both. Finally, the decoder returns an output image at the original resolution (224×224×3) using successive transposed convolution layers, as depicted in Fig. 12(c).
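The tensor sizes quoted in this section can be checked with a short shape walk. It assumes the encoder follows the VGG layout up to the third pooling layer (blocks of 2, 2 and 3 convolutions with 64, 128 and 256 filters, each followed by 2×2 max pooling), which is consistent with 224×224×3 becoming 28×28×256 but is an inference, not a detail stated in the text. The decoder's three stride-2 transposed convolutions and their channel widths are hypothetical.

```python
# VGG-style encoder blocks up to pool3: (filters, number of 3x3 'same' convs)
ENCODER_BLOCKS = [(64, 2), (128, 2), (256, 3)]
# Hypothetical decoder: stride-2 transposed convolutions with assumed widths
DECODER_FILTERS = [256, 128, 3]


def encoder_shape(shape):
    h, w, c = shape
    for filters, _n_convs in ENCODER_BLOCKS:
        # 'same' 3x3 convolutions keep H x W and set the channel count;
        # 2x2 max pooling with stride 2 then halves H and W
        h, w, c = h // 2, w // 2, filters
    return (h, w, c)


def concat_shape(a, b):
    """Channel-wise concatenation of two feature maps of equal spatial size."""
    assert a[:2] == b[:2], "feature maps must share H and W"
    return (a[0], a[1], a[2] + b[2])


def decoder_shape(shape):
    h, w, c = shape
    for filters in DECODER_FILTERS:
        # a stride-2 transposed convolution with 'same' padding doubles H and W
        h, w, c = h * 2, w * 2, filters
    return (h, w, c)


chip_feat = encoder_shape((224, 224, 3))
vivo_feat = encoder_shape((224, 224, 3))
merged = concat_shape(chip_feat, vivo_feat)
out = decoder_shape(merged)
print(chip_feat, merged, out)  # (28, 28, 256) (28, 28, 512) (224, 224, 3)
```

In a Keras model these steps would correspond to `Conv2D`/`MaxPooling2D` blocks in each encoder branch, a `Concatenate` layer on the channel axis, and `Conv2DTranspose` layers in the decoder.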
The de-morphing network is trained in a supervised manner. A large number of combinations is needed to obtain a robust training corpus: of the 1000 training subjects, 700 were used for training and 300 for validation, and all combinations amount to approximately one million morphed images. The network was trained with the TensorFlow 1.15 library for 5000 epochs or iterations, with 512 samples per batch, on a GeForce GTX 1050 (8 GB RAM), using the mean squared error (MSE) loss. As in [82], the learning rate for model fine-tuning starts at 0.005 and decreases to 0.001. Finally, adaptive moment estimation (Adam) [83] was used as the optimization algorithm.

VI. RESULTS AND DISCUSSION
This section presents the evaluation metrics commonly used for morphing attack detection approaches and the results obtained in the presented work.

A. EVALUATION METRICS
Recently, the community has adopted a common standard, ISO/IEC 30107-3:2016 [84], to evaluate PAD systems. In this standard, the attack detection capability is measured with two error rates, defined as follows:
• Attack presentation classification error rate (APCER): the proportion of presentation attacks incorrectly classified as bona fide presentations [84] (Equation 2),
$$\mathrm{APCER} = 1 - \frac{1}{|PAI|}\sum_{\omega=1}^{|PAI|} RES_{\omega} \tag{2}$$
where |PAI| is the number of presentation attack instruments (PAI) and RES_ω takes the value 1 if presentation ω is classified as an attack and 0 if it is classified as bona fide. A PAI is an object or biometric trait used in a presentation attack.
• Bona fide presentation classification error rate (BPCER): the proportion of bona fide presentations incorrectly classified as presentation attacks [84] (Equation 3),
$$\mathrm{BPCER} = \frac{1}{|BF|}\sum_{i=1}^{|BF|} RES_{i} \tag{3}$$
where |BF| is the number of bona fide presentations and RES_i takes the value 1 if presentation i is classified as an attack and 0 if it is classified as bona fide. The detection error trade-off (DET) curve, which plots APCER against BPCER, and the equal error rate (EER), the operating point at which both errors are equal, are used to compare MAD systems.
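Equations 2 and 3 and the EER can be computed directly from detector scores. This sketch assumes scores oriented so that higher means "more likely an attack" (RES = 1 when score ≥ threshold); the similarity-difference feature of Section V could play that role, but that mapping is an assumption. The EER is approximated by sweeping the observed scores as thresholds and averaging APCER and BPCER where they are closest, a common discrete approximation. The score lists in the usage lines are toy values, not results from this work.

```python
import numpy as np


def apcer_bpcer(attack_scores, bona_fide_scores, threshold):
    """APCER (Eq. 2) and BPCER (Eq. 3); RES = 1 when score >= threshold."""
    res_attack = np.asarray(attack_scores) >= threshold
    res_bf = np.asarray(bona_fide_scores) >= threshold
    apcer = 1.0 - res_attack.mean()   # attacks accepted as bona fide
    bpcer = res_bf.mean()             # bona fide presentations flagged as attacks
    return float(apcer), float(bpcer)


def equal_error_rate(attack_scores, bona_fide_scores):
    """Sweep every observed score as a threshold and return (EER, threshold)
    at the point where APCER and BPCER are closest."""
    thresholds = np.unique(np.concatenate([attack_scores, bona_fide_scores]))
    best = None
    for t in thresholds:
        apcer, bpcer = apcer_bpcer(attack_scores, bona_fide_scores, t)
        gap = abs(apcer - bpcer)
        if best is None or gap < best[0]:
            best = (gap, (apcer + bpcer) / 2.0, t)
    return best[1], best[2]


print(apcer_bpcer([0.8, 0.9, 0.95], [0.1, 0.2, 0.3], 0.5))  # (0.0, 0.0)
print(equal_error_rate([0.4, 0.6], [0.5, 0.3]))             # EER 0.5 at threshold 0.5
```

Plotting APCER against BPCER over the same threshold sweep gives the DET curve.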

B. RESULTS
This section evaluates the quality of the proposed morphing attack detection and explores its potential application in ABC, using images of the FRAV-ABC dataset acquired in a real ABC system. The experiments focus on combinations of different kinds of pictures, such as in vivo, chip, and print & scan photos, as described in Fig. 15.
Fig. 15 shows the APCER and BPCER of the three corpora (FRAV-ABC-Test, FRAV-ABC-Test-P&S-300 and FRAV-ABC-Test-P&S-150), one curve per database. Curves closer to the origin (bottom left) have a lower EER and therefore represent better performance. The accuracy rate reaches 98% in all corpora. The first corpus, FRAV-ABC-Test, which contains the original images without any compression or downsampling, obtained an EER of 0.78 and an accuracy rate of 98.7% with a similar threshold. The two other corpora, FRAV-ABC-Test-P&S-300 and FRAV-ABC-Test-P&S-150, obtained analogous outcomes. The second corpus, FRAV-ABC-Test-P&S-300, achieved an EER of 0.80 and an accuracy rate of
The best results are obtained with digital images: their curve is the closest to the origin and therefore has the lowest EER. For P&S images, the performance is very similar even at different resolutions. Since printing and scanning is the procedure followed by passport-issuing authorities, it is worth remarking that the results are nearly independent of the resolution considered: increasing the resolution does not significantly improve attack detection.
Table 1 compares several MAD approaches presented in the literature with the results obtained in this study. Both EER and accuracy values are reported to properly describe the system behavior, for both input types (digital and print and scan), together with the environmental conditions of the data acquisition. The outcomes of this research work are the only ones acquired under real ABC conditions, and they exceed those obtained in several studies presented in the literature.
On the one hand, the EER for digital images is 14% lower than the best result reported in the literature and one order of magnitude better than the others, while the accuracy is similar to the state of the art. On the other hand, the EER for print and scan images is one order of magnitude better than all others, and the print quality used (300 or 150 dpi) has no significant effect. Moreover, the accuracy is even better than the only value reported in the table.
The method is also computationally efficient. On the one hand, the average time of the full de-morphing process was approximately 5 seconds, measured with test images on a personal computer with an Intel® Core™ i7 processor and 8 GB of RAM. Note that Frontex recommends that this time be less than 30 seconds [18].
On the other hand, the average time of the de-morphing network (DMN) process was 3.726 seconds, plus 0.403 seconds for each of the two verifications (de-morphed image vs in vivo image, and de-morphed image vs chip image). Thus, the total time was 4.532 seconds (3.726 + 2 × 0.403).

VII. CONCLUSION
This research work proposes a new de-morphing-based approach using a CNN to detect morphing presentation attacks in real automated border control systems. An existing CNN architecture has been adapted to this specific problem. The neural network was trained with different kinds of images, such as in vivo, chip, print, scan, and print & scan. A thorough evaluation was carried out to assess the morphing attack detection capability, taking into account two images (in vivo and passport chip), which is currently the most typical situation in border control.
The experimental results show that the CNN paradigm is suitable for morphing attack detection, yielding relevant outcomes. The print and scan results are remarkably better than those of the previous works discussed above, and no significant influence of the print and scan resolution (dpi) on attack detection was observed.
The results achieved with digital images are better than those with print and scan samples and improve on the values obtained in the literature. The outcomes were compared against three different studies, and the current research work improved on them by up to one order of magnitude.
One of the most relevant aspects is the quality and visual appearance of the pictures obtained after the de-morphing process. Moreover, the de-morphing network fits well into the ABC procedure. In addition, the hidden identity of the impostor is recovered, a feature that could be very useful for future applications.
Future work will increase the number of subjects in our database, since adding samples should yield better training performance and more reliable test results. Moreover, paying more attention to feature selection for the CNN could improve the outcomes.