Data Augmentation of X-Ray Images in Baggage Inspection Based on Generative Adversarial Networks

Automatically recognizing prohibited items in X-ray security checking images with Convolutional Neural Networks (CNNs) has attracted increasing attention. However, no suitable X-ray security checking image database exists for training a reliable CNN model. Therefore, we propose a data augmentation method for X-ray security checking images. First, a large number of new X-ray prohibited item images are generated using an improved Self-Attention Generative Adversarial Network (SAGAN). Next, a Cycle GAN-based method is proposed to transform natural images of items into X-ray images, which enriches the diversity of the new images in item shape and pose. Then, we combine the prohibited item images with background images to synthesize new X-ray security checking images. Finally, two Single Shot MultiBox Detector (SSD) models, trained on the original and the enlarged database respectively, are used to verify whether the enlarged database achieves data augmentation. Experimental results show that the SSD model trained on the enlarged database outperforms the SSD model trained on the original database, which implies that our method effectively achieves data augmentation for X-ray security checking images.


I. INTRODUCTION
X-ray baggage inspection is an essential part of maintaining public transportation security [1]. However, the reliability and efficiency of manual inspection are unsatisfactory, and missed detections often occur in practice. During rush hours, passengers spend a long time waiting in line for security checks, which puts great working pressure on security inspection operators. Therefore, a reliable automatic detection system for prohibited items is needed to improve the accuracy of threat detection and to speed up the screening process [2]. Recently, Convolutional Neural Network (CNN) models have shown powerful performance on image classification and object detection [3]-[8], and they have also been applied to X-ray baggage inspection. Xu et al. [9] proposed a CNN-based method with an attention mechanism to detect prohibited items in X-ray images. An et al. [10] built a semantic segmentation network with dual attention to identify prohibited items. Although these methods improve the performance of prohibited item detection, they still cannot be applied in real scenarios. A reliable CNN model for detecting prohibited items needs an ideal X-ray security checking image database for both model training and model testing. Currently, the available X-ray image databases are GDX-ray [11] and SIXray [12]. GDX-ray is a grayscale image database, so it is not suitable for training models that detect prohibited items in pseudo-color images. The SIXray database contains only 8929 images with prohibited items. Neither database meets the requirements of CNN training in sample quantity and diversity, and collecting ample X-ray security checking images that contain prohibited items in various poses and scales with an X-ray machine is very difficult.

The associate editor coordinating the review of this manuscript and approving it for publication was Victor Sanchez.
A reasonable way to overcome this shortage of training data is to generate new samples automatically with generative models.
In the last few years, Generative Adversarial Networks (GANs) have achieved considerable success in image generation. Many GAN variants have been proposed to improve the quality of the generated images [13]-[15], notably SAGAN [16] and BigGAN [17]. Some GAN models, such as Pix2Pix GAN [18] and Cycle GAN [19], are also used for image-to-image translation. GAN-based image generation and image-to-image translation schemes have demonstrated their feasibility for data augmentation [20]-[23]. Recently, GAN-based data augmentation has also been applied to X-ray prohibited item image databases. In [24], a GAN model is applied to generate X-ray prohibited item images. However, each generated image contains only a single prohibited item, and the image quality and diversity are not ideal. We therefore focus on improving the GAN model so that it generates new images with better quality and diversity.
In this paper, a GAN-based method is proposed to enlarge the X-ray security checking image database. First, the databases used in this paper are introduced. Second, an improved GAN model is designed to generate new prohibited item images (as shown in Figure 1(b)). These generated images are compared with images generated by other GAN models using the Fréchet Inception Distance (FID) score. Next, to enrich the diversity of the new prohibited item images, we propose an image-to-image translation method based on the Cycle GAN model, which transforms natural item images into X-ray images. Then, we combine the generated prohibited item images with background images (as shown in Figure 1(c)) to synthesize new X-ray security checking images (as shown in Figure 1(a)), which are added to the database as new samples. Finally, SSD models [25] are trained on the original database and on the enlarged database respectively, and we verify the effect of the proposed data augmentation method by comparing their performance. The experimental results show that the enlarged X-ray security checking image database improves detector performance.

II. DATABASE
In this Section, two databases are introduced: the X-ray prohibited item image database (Database A) and the X-ray security checking image database (Database B). All images in both databases were collected manually using X-ray security inspection equipment in our laboratory.

A. X-RAY PROHIBITED ITEM IMAGE DATABASE
Database A includes 10 categories of prohibited item images, such as guns, forks and scissors. Each category contains 200-400 images of size 256×256, as shown in Figure 2. To facilitate X-ray image synthesis in the subsequent work, we extract the prohibited item foreground from the images using the method proposed in [24]. The prohibited item images without background are shown in Figure 3. Different colors represent different materials in X-ray imaging: blue, green and orange indicate metal, mixture and organic materials, respectively. When two items overlap, in an overlapping region made of different materials one material's color covers the other's, whereas in an overlapping region made of the same material the color is deepened. For instance, in Figure 2, the color of the overlap between the gun and the pliers is deepened.

B. X-RAY SECURITY CHECKING IMAGE DATABASE
Database B includes 7 categories of prohibited items: guns, knives, pliers, scissors, forks, power banks and lighters. Each image contains one or more prohibited items. Database B contains 4500 images of size 512×512 (as shown in Figure 4). These images are mostly obtained from X-ray scans of personal luggage, in which object sizes vary widely and items are often stacked randomly.

III. X-RAY PROHIBITED ITEM IMAGE GENERATION
Most GAN models need a large number of training images. However, the X-ray prohibited item image database is small, which can lead to overfitting or mode collapse. To generate prohibited item images with high realism and rich diversity, we adapt the Self-Attention GAN (SAGAN) [16] to the prohibited item image database. Our model is then compared with other GAN models based on FID scores. Finally, we show some images generated by the proposed model.

A. THE IMPROVED SAGAN MODEL
As described in Section 2.1, the prohibited item images have distinctive geometries, and Database A contains only a few images. The GAN model must therefore learn the geometric features of the images well from a small training database. SAGAN introduces a self-attention mechanism that improves the modeling of geometric features. However, when trained on a limited database it does not perform well: the items in its generated images have distorted shapes. To generate better prohibited item images, we improve the network structure and loss function of the SAGAN model. The improved SAGAN model is illustrated in Figure 5. We use convolutional and deconvolutional structures as the Discriminator and the Generator, respectively. To let the model learn more long-range correlations in prohibited item images, we deepen the convolutional network structure. The Discriminator consists of six convolutional layers and one fully connected layer, and the Generator contains seven deconvolutional layers and one fully connected layer. The detailed structure parameters of the improved model are shown in Table 1. We remove Batch Normalization from the Discriminator layers and retain it only in the Generator. We also remove Spectral Normalization [26] from the networks to avoid over-fitting.
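The self-attention mechanism that SAGAN [16] relies on can be illustrated with a small, dependency-free toy. The sketch below assumes the feature map has already been flattened into N position vectors, so SAGAN's 1×1 convolutions become per-position matrix products. The names w_f, w_g, w_h and gamma mirror the f, g, h and γ notation of [16], but the channel reduction and final output projection are omitted, so this is an illustration of the attention computation, not the layer used in the improved model.

```python
import math


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_f, w_g, w_h, gamma):
    """SAGAN-style self-attention over N flattened positions x[i] (each a C-vector).

    beta[j][i] = softmax_i(f(x_i) . g(x_j)), o_j = sum_i beta[j][i] * h(x_i),
    y_j = gamma * o_j + x_j. w_h must map C -> C so the residual add is valid.
    """
    f = [matvec(w_f, xi) for xi in x]  # "query-like" projections
    g = [matvec(w_g, xi) for xi in x]  # "key-like" projections
    h = [matvec(w_h, xi) for xi in x]  # value projections
    out = []
    for j in range(len(x)):
        # Attention of position j over every position i in the map (long-range dependencies).
        scores = [sum(a * b for a, b in zip(f[i], g[j])) for i in range(len(x))]
        beta = softmax(scores)
        o_j = [sum(beta[i] * h[i][c] for i in range(len(x))) for c in range(len(h[0]))]
        # Residual connection scaled by the learnable gamma (initialized to 0 in SAGAN).
        out.append([gamma * o + xv for o, xv in zip(o_j, x[j])])
    return out
```

With gamma = 0 the layer is an identity map, which is why SAGAN can start from purely local convolutional features and gradually learn to rely on non-local evidence.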
In our experiments, the Hinge cost function [27] was less suitable for Database A than the WGAN-GP cost function. Thus, the WGAN-GP cost function is applied to optimize the model:

$$L = \mathbb{E}_{z}\left[D(G(z))\right] - \mathbb{E}_{x}\left[D(x)\right] + \lambda\,\mathbb{E}_{\hat{x}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_2 - 1\right)^2\right]$$

where G and D denote the Generator and Discriminator of the GAN, x is a real image, and z is a random noise vector drawn from a uniform distribution. The gradient penalty point $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated images, and λ is the penalty coefficient. Hyperparameters are also important for the GAN model. The batch size is 36 and the penalty coefficient λ is 10. The learning rates of the Generator and Discriminator are 0.0004 and 0.0001, respectively, and the update ratio of the Generator and Discriminator is 2.
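A minimal sketch of the WGAN-GP losses is given below. To stay dependency-free, it replaces the CNN discriminator with a hypothetical toy critic D(x) = Σ_i w_i tanh(x_i), whose gradient with respect to the input is known in closed form; a real implementation would obtain that gradient through automatic differentiation. The penalty is evaluated at points interpolated between paired real and generated samples with ε drawn uniformly from [0, 1], and λ defaults to 10 as in the text.

```python
import math
import random


def critic(x, w):
    """Hypothetical toy critic D(x) = sum_i w_i * tanh(x_i), standing in for the CNN discriminator."""
    return sum(wi * math.tanh(xi) for wi, xi in zip(w, x))


def critic_grad(x, w):
    """Closed-form gradient of the toy critic with respect to its input x."""
    return [wi * (1.0 - math.tanh(xi) ** 2) for wi, xi in zip(w, x)]


def wgan_gp_losses(real, fake, w, lam=10.0, rng=random):
    """Return (discriminator_loss, generator_loss) for paired batches of flat vectors."""
    n = len(real)
    d_real = sum(critic(x, w) for x in real) / n
    d_fake = sum(critic(x, w) for x in fake) / n
    penalty = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()  # epsilon ~ U[0, 1]
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x_r, x_f)]
        grad_norm = math.sqrt(sum(g * g for g in critic_grad(x_hat, w)))
        penalty += (grad_norm - 1.0) ** 2  # two-sided penalty toward unit gradient norm
    penalty /= n
    d_loss = d_fake - d_real + lam * penalty  # E[D(G(z))] - E[D(x)] + lambda * GP
    g_loss = -d_fake                          # generator maximizes E[D(G(z))]
    return d_loss, g_loss
```

In a full training loop, the Generator and Discriminator would each be updated with their own optimizer and learning rate (0.0004 and 0.0001 in the text).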

B. PERFORMANCE COMPARISON OF DIFFERENT GANs
As shown in Figure 6, the images generated by four GAN models differ in visual quality. The first model is the Deep Convolutional Generative Adversarial Network (DCGAN) [14]. Its generated images contain much noise and no texture information, so their visual quality is poor. The second model is SAGAN [16]. Although the quality of its generated images is better, the shapes of the prohibited items are distorted. The third model is WGAN-GP. Although its generated images are of better quality than those of DCGAN and SAGAN, the edges of the prohibited items are blurred. Therefore, it is difficult for existing models to generate high-quality prohibited item images. The last model is the proposed model. Compared with the other GAN models, the visual quality of the images generated by our model improves noticeably. However, visual assessment is subjective, so a quantitative comparison is also needed. FID [28] is widely used for evaluating generated images and has been shown to be more consistent with human evaluation in assessing the realism and variation of generated samples; a lower FID score indicates better model performance. We therefore compare the four models quantitatively using FID. Table 2 presents the FID scores of the four models: the images generated by our model achieve the lowest FID score, which means that our model approximates the distribution of real images better than the other models.
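FID fits a Gaussian to feature vectors of real and generated images and computes FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below is a simplified, stdlib-only illustration that assumes diagonal covariances, in which case the trace term reduces to Σ_i (σ_r,i − σ_g,i)². It takes precomputed feature vectors as input (extracting Inception-v3 features is out of scope), and the function name is hypothetical. The FID of [28] uses full covariance matrices, so this is not a drop-in replacement for it.

```python
from statistics import fmean, stdev


def frechet_distance_diagonal(real_feats, fake_feats):
    """Fréchet distance between two feature sets, assuming diagonal covariances.

    real_feats, fake_feats: lists of equal-length feature vectors (at least two each).
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        real_col = [f[d] for f in real_feats]
        fake_col = [f[d] for f in fake_feats]
        # Squared difference of the means for this feature dimension.
        total += (fmean(real_col) - fmean(fake_col)) ** 2
        # Trace term for diagonal covariances: (sigma_r - sigma_g)^2.
        total += (stdev(real_col) - stdev(fake_col)) ** 2
    return total
```

Identical feature distributions give 0, and both a mean shift and a spread mismatch increase the score, which is why a lower FID indicates generated images closer to the real ones.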

C. THE GENERATED IMAGES BY OUR MODEL
As shown in Figure 7, our model generates prohibited item images without background, including guns, knives, screwdrivers, scissors, pliers, lighters, wrenches, power banks, hammers and forks. The quality of the generated images is very close to that of the real images: they have high resolution, plausible contours and clear textures, as we expected.
Our model can also generate images of two overlapping prohibited items. As shown in Figure 8, the overlapping parts of the items in the generated images are realistic, and these images have good visual quality. The images in Figure 9 are also generated by our model; they show good prohibited item quality and clear background information. In addition, our model can generate new images that combine the features of multiple real images, so that the prohibited items in Figure 8 and Figure 9 exhibit new features. For example, the real image database contains only images of two overlapping prohibited items, yet the model can generate images with three overlapping prohibited items. Such generated images help enrich the diversity of the database. Figure 10 shows that the model also generates some unusual, low-quality images, but these images can also increase the diversity of the database.

VOLUME 8, 2020

IV. X-RAY PROHIBITED ITEM IMAGES TRANSFORMATION
Data augmentation concerns both sample quantity and sample diversity. In Section 3, a GAN-based method was designed to generate many new prohibited item images. However, the diversity of the generated images is limited by the samples in the training database. In addition, since it is difficult to obtain many different guns, daggers and other prohibited items in practice, the shape diversity of prohibited items in X-ray security checking images is relatively poor. To address these problems, we propose a Cycle GAN-based method for transforming natural prohibited item images into X-ray prohibited item images.

A. CYCLE GAN-BASED TRANSFORMATION METHOD
Cycle GAN [19] achieves image-to-image translation by learning the feature distributions of images from two domains, for example, style translation from horses to zebras. The model has also been used for semantic segmentation, edge extraction and other tasks. Given this strong image-to-image translation ability, transforming natural prohibited item images into X-ray prohibited item images is feasible, and in this way we can enrich the shape and pose diversity of X-ray prohibited item images. Cycle GAN uses two Generators to learn the mappings between the two domains and two Discriminators to distinguish generated images from real images. The flowchart of image transformation based on the Cycle GAN model is illustrated in Figure 11.
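The input is a natural image, which is binarized; the resulting binary image is the input to the generator. $G_{AB}$ and $G_{BA}$ are the two Generators, and $D_A$ and $D_B$ are the two Discriminators. $G_{AB}$ converts binary images into X-ray images, and $G_{BA}$ converts X-ray images into binary images. $D_A$ determines whether a binary image is real or generated by $G_{BA}$, and $D_B$ determines whether an X-ray image is real or generated by $G_{AB}$. We use the least-squares loss function [29] in place of a cross-entropy adversarial loss. The Cycle GAN cost function is

$$L(G_{AB}, G_{BA}, D_A, D_B) = L_{GAN}(G_{AB}, D_B, X, Y) + L_{GAN}(G_{BA}, D_A, Y, X) + \lambda_{cyc}\, L_{Cycle}(G_{AB}, G_{BA})$$

where X and Y denote binary images and X-ray images respectively, and $\lambda_{cyc}$ weights the cycle-consistency term. $L_{GAN}$ is the least-squares adversarial loss, given in Eqs. (4) and (5):

$$L_{GAN}(G_{AB}, D_B, X, Y) = \mathbb{E}_{y}\left[\left(D_B(y) - 1\right)^2\right] + \mathbb{E}_{x}\left[D_B\left(G_{AB}(x)\right)^2\right] \tag{4}$$

$$L_{GAN}(G_{BA}, D_A, Y, X) = \mathbb{E}_{x}\left[\left(D_A(x) - 1\right)^2\right] + \mathbb{E}_{y}\left[D_A\left(G_{BA}(y)\right)^2\right] \tag{5}$$

Each discriminator minimizes its term, while the corresponding generator minimizes $\mathbb{E}\left[\left(D(G(\cdot)) - 1\right)^2\right]$ on its generated images. $L_{Cycle}$ is the cycle-consistency loss, defined in Eq. (6):

$$L_{Cycle}(G_{AB}, G_{BA}) = \mathbb{E}_{x}\left[\left\|G_{BA}\left(G_{AB}(x)\right) - x\right\|_1\right] + \mathbb{E}_{y}\left[\left\|G_{AB}\left(G_{BA}(y)\right) - y\right\|_1\right] \tag{6}$$

After adversarial training, $G_{AB}$ and $G_{BA}$ generate more realistic images. The X-ray images generated by $G_{AB}$ are the ones we need.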
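The least-squares adversarial terms and the L1 cycle-consistency term can be sketched on plain Python lists as follows. This is an illustration under stated assumptions, not the authors' training code: discriminator outputs are passed in as lists of scores, images are flat lists of pixel values, and the generators are arbitrary callables standing in for G_AB and G_BA. The function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss for a discriminator: push real scores to 1 and fake scores to 0."""
    return mean((d - 1.0) ** 2 for d in d_real) + mean(d ** 2 for d in d_fake)


def lsgan_generator_loss(d_fake):
    """Least-squares loss for a generator: push the discriminator's scores on fakes to 1."""
    return mean((d - 1.0) ** 2 for d in d_fake)


def l1(a, b):
    """Mean absolute difference between two flat images."""
    return mean(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(xs, ys, g_ab, g_ba):
    """L1 cycle loss: x -> G_AB -> G_BA should return x, and y -> G_BA -> G_AB should return y."""
    forward = mean(l1(g_ba(g_ab(x)), x) for x in xs)
    backward = mean(l1(g_ab(g_ba(y)), y) for y in ys)
    return forward + backward
```

A generator update would then minimize the sum of both generators' least-squares terms plus a weighted cycle term, with the weight playing the role of the cycle-consistency coefficient in the cost function.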

B. THE TRANSFORMATION RESULT BASED ON CYCLE GAN MODEL
The database used here includes natural prohibited item images and X-ray prohibited item images. We collect natural images of prohibited items with rich shapes and poses from the internet and extract their foregrounds with the foreground extraction method of [24]. Next, both the natural prohibited item images and the X-ray prohibited item images are converted to binary images. Compared with the X-ray images, the natural images have richer diversity in item shape and pose. In this conversion, we only want to change the color features of the prohibited item images while retaining their shape and pose features. Therefore, the X-ray prohibited item images and their corresponding binary images are used to train the Cycle GAN model.
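The text does not specify how the images are binarized. As one plausible choice, the sketch below uses Otsu's global threshold, a swapped-in technique rather than the authors' stated method, on 8-bit grayscale images stored as 2-D lists. It assumes the items are darker than the bright, foreground-extracted background (foreground_dark=True); that assumption and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t maximizing between-class variance (classes: <= t and > t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue  # lower class still empty
        if weight_fg == 0:
            break  # upper class empty from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, foreground_dark=True):
    """Map a 2-D list of 8-bit gray levels to a 0/1 item mask using Otsu's threshold."""
    t = otsu_threshold([p for row in image for p in row])
    if foreground_dark:
        return [[1 if p <= t else 0 for p in row] for row in image]
    return [[1 if p > t else 0 for p in row] for row in image]
```

Because the binary image keeps only the item silhouette, it carries shape and pose but no material color, which is what lets the trained generator repaint natural-image silhouettes in X-ray style.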
After training, the model can transform binary images into X-ray images, so the binary images corresponding to the natural images can be transformed into X-ray images. Figure 12 shows the transformation results of some handguns and hammers from natural images to X-ray images. Each column contains two sets of images, each consisting of a natural image, a binary image and an X-ray image from left to right. The method changes only the color distribution of the items, while the shape and pose of the prohibited items remain unchanged. Take the handgun as an example: since handguns are usually made of metal, which appears mainly blue under X-ray, the generated handgun image is also blue, and its characteristic texture is preserved. Therefore, the rich shape and pose features of the natural item images are transferred to the X-ray images.
In this Section, a method based on the Cycle GAN model is proposed to convert natural images into X-ray images. The images generated by this method can effectively enrich the variety of shapes and poses of the prohibited item images, especially for prohibited items that are difficult to obtain and come in a wide variety of shapes and poses.

V. X-RAY SECURITY CHECKING IMAGE SYNTHESIS
In the previous Sections, we introduced data augmentation methods for X-ray prohibited item images, which generate many prohibited item images with high quality and rich diversity. To obtain X-ray security checking images, these prohibited item images are combined with background images, which were also collected manually in our laboratory. In this Section, a method for synthesizing X-ray security checking images is introduced.
Different materials appear in different colors under X-ray. On the one hand, when prohibited items of different materials overlap, the X-ray image of the higher-priority material covers that of the lower-priority material. On the other hand, the color of overlapping prohibited items made of the same material is deepened in the X-ray image.
According to these characteristics of X-ray imaging, we synthesize X-ray security checking images as follows. Figure 13 shows the architecture of the synthesis scheme. The X-ray security checking images are computed by Eq. (7), where P denotes the generated prohibited item image without background, Mask is obtained from the prohibited item image by binarization, B refers to the background image, and X is the resulting X-ray security checking image. In the previous Sections, we generated not only single-item images but also overlapping multi-item images, and both are used to synthesize X-ray security checking images. During synthesis, to respect the priority of materials in X-ray imaging, we first paste the prohibited items made of organic material, then those made of mixture, and finally the metal items. The item class and item position are also chosen randomly when combining prohibited item images with background images, and the size of each prohibited item is varied randomly according to its actual size. In this way, the diversity of the synthesized X-ray security checking images is enriched. Figure 14 shows some synthesized images, which are very close to the real images shown in Figure 4.
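The priority-ordered pasting described above can be sketched as follows. Because Eq. (7) is not reproduced in the text, the sketch assumes a standard mask composite, X = Mask·P + (1 − Mask)·B, applied item by item; this may differ from the exact form of Eq. (7), and it does not model the color deepening of overlapping same-material items. Images are grayscale 2-D lists for simplicity, and PRIORITY, paste_item and synthesize are hypothetical names.

```python
# Hypothetical material priorities: higher-priority items are pasted later, so they end up on top.
PRIORITY = {"organic": 0, "mixture": 1, "metal": 2}


def paste_item(canvas, item, mask, top, left):
    """Return a copy of canvas with X = mask * P + (1 - mask) * B applied in the item's region."""
    out = [row[:] for row in canvas]
    for i, (item_row, mask_row) in enumerate(zip(item, mask)):
        for j, (p, m) in enumerate(zip(item_row, mask_row)):
            r, c = top + i, left + j
            if 0 <= r < len(out) and 0 <= c < len(out[0]):  # clip items at the image border
                out[r][c] = m * p + (1 - m) * out[r][c]
    return out


def synthesize(background, items):
    """items: list of (material, item_pixels, mask, top, left) tuples."""
    canvas = background
    # Paste organic items first, then mixture items, then metal items.
    for material, item, mask, top, left in sorted(items, key=lambda it: PRIORITY[it[0]]):
        canvas = paste_item(canvas, item, mask, top, left)
    return canvas
```

Random item classes, positions and scales would be chosen before calling synthesize (for example with random.choice and random.randint); sorting by material inside the function guarantees the covering order regardless of the order in which items were sampled.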

VI. VERIFICATION EXPERIMENTS
In this Section, to verify whether the enlarged database achieves data augmentation, we design a comparison experiment that evaluates object detection models trained with and without the synthesized images. If the object detection model trained on the enlarged database performs better, this indicates that our data augmentation method is effective.
In this experiment, we use the classical SSD model as the object detection model. First, we introduce the databases used in the experiment. The original database is Database B, introduced in Section 2.2, which contains only 4500 manually collected images. We synthesize 4200 X-ray security checking images containing the same 7 categories of prohibited items as the real images. (4084 of the synthesized images contain generated prohibited item images, and 634 contain transformed prohibited item images.) The synthesized images are also 512×512 in size. The enlarged database (named Database C) consists of the real and the synthesized X-ray security checking images, so Database C includes 8700 X-ray security checking images of seven classes. For Database B and Database C, we manually annotate a bounding box for each prohibited item. We randomly divide Database B into training and testing subsets. Database C is also divided into training and testing subsets, and its testing subset is identical to that of Database B. The sizes of the subsets of the two databases are listed in Table 3.

Next, SSD models are trained on the training subsets of Database B and Database C respectively, and the trained models are evaluated on the common testing subset. The SSD models are trained on an NVIDIA 1080Ti GPU with a batch size of 16 and a learning rate of 0.0001. A predicted bounding box is considered correct if its intersection over union (IoU) with the ground truth is higher than 0.5. We use average precision (AP) as the metric for evaluating detection performance, and mAP denotes the mean of the per-class APs. Table 4 reports the AP obtained by testing the trained SSD models on real images.
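The IoU > 0.5 matching rule and the per-class AP described above can be sketched as follows, for one class in a single image for brevity. The sketch assumes boxes are (x1, y1, x2, y2) corner tuples and uses the all-point interpolated AP of PASCAL VOC (2010 and later); the text does not state which AP variant was used, so this choice is an assumption. mAP would be the mean of this value over the seven classes.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point interpolated AP for one class; detections are (score, box) pairs."""
    if not gt_boxes:
        return 0.0
    matched = [False] * len(gt_boxes)
    tps = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # Correct only if IoU > threshold and the ground-truth box is not already claimed.
        if best_iou > iou_thr and not matched[best_j]:
            matched[best_j] = True
            tps.append(1)
        else:
            tps.append(0)
    precisions, recalls, tp = [], [], 0
    for k, t in enumerate(tps, start=1):
        tp += t
        precisions.append(tp / k)
        recalls.append(tp / len(gt_boxes))
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        # Interpolated precision: best precision at this recall level or beyond.
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap
```

A duplicate detection of an already matched object counts as a false positive, which is why AP penalizes both missed items and redundant boxes.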
The experimental results show that the SSD model trained on Database C is more accurate than the SSD model trained on Database B, with the mAP increased by 5.6%. Because some prohibited items, such as power banks and pliers, are large and have distinctive colors, they are easily detected by the SSD model, so data augmentation brings only a limited improvement for them. In contrast, the AP of other prohibited items such as lighters, forks and knives improves greatly after data augmentation. This shows that data augmentation based on our method helps the detection model better detect prohibited items with small sizes and similar colors; notably, the AP of lighters rises from 63.9% to 83.4%. However, the AP of guns declines. We speculate that the capacity of the SSD model is limited, so the model cannot learn the features of the real images from the enlarged database very well.
Here we synthesized only 4200 X-ray security checking images. We expect that more synthesized training images could further improve the performance of the SSD model. We conclude that images synthesized by our method can enhance the X-ray security checking image database.

VII. CONCLUSION
In this paper, a data augmentation method for X-ray security checking images is proposed. We improve SAGAN to generate realistic X-ray prohibited item images. To enrich diversity, a Cycle GAN-based image transformation method between natural images and X-ray images is proposed. Then, we combine the generated prohibited item images with background images to synthesize new X-ray security checking images. Finally, we design a comparison experiment that trains SSD models with and without the synthesized images to verify whether the synthesized images are useful. Experimental results indicate that the synthesized X-ray security checking images improve the mAP. Therefore, our method can effectively achieve data augmentation for the X-ray security checking image database.