Synthetic Gastritis Image Generation via Loss Function-Based Conditional PGGAN

In this paper, a novel synthetic gastritis image generation method based on a generative adversarial network (GAN) model is presented. Sharing medical image data is a crucial issue for realizing diagnostic support systems. However, it is still difficult for researchers to obtain medical image data since the data include individual information. Recently proposed GAN models can learn the distribution of training images without seeing real image data, and individual information can be completely anonymized by generated images. If generated images can be used as training images in medical image classification, promoting medical image analysis will become feasible. In this paper, we targeted gastritis, which is a risk factor for gastric cancer and can be diagnosed by gastric X-ray images. Instead of collecting a large amount of gastric X-ray image data, an image generation approach was adopted in our method. We newly propose a loss function-based conditional progressive growing generative adversarial network (LC-PGGAN), a gastritis image generation method that can be used for a gastritis classification problem. The LC-PGGAN gradually learns the characteristics of gastritis in gastric X-ray images by adding new layers during the training step. Moreover, the LC-PGGAN employs loss function-based conditional adversarial learning so that generated images can be used for the gastritis classification task. We show that images generated by the LC-PGGAN are effective for gastritis classification using gastric X-ray images and have clinical characteristics of the target symptom.


I. INTRODUCTION
With the development of image recognition technologies, there have been expectations of their applications to clinical devices in the field of medicine [1]. Although recent machine learning-based image recognition techniques have shown good prospects as diagnostic support systems [2], the usefulness of these techniques is still limited. There have been many studies in which machine learning techniques were applied to inspections that are used worldwide (e.g., CT, MRI, and mammography; hereafter called major inspections) [3]-[5]. For major inspections, well-equipped research environments such as large-scale annotated medical image datasets have already been established [6], [7], and we can easily access such large-scale public clinical datasets containing data obtained from all over the world [8]. However, there have been few studies on regional inspections that are executed in a certain limited area (hereafter called minor inspections). In minor inspections, it is difficult to obtain high-quality annotated clinical data since medical facilities use different types of imaging equipment, which affect the quality of images, and detailed annotation requires specialist knowledge. Moreover, new imaging methods are frequently developed for minor inspections. Even if we can construct an annotated clinical dataset from such data, a new dataset adapted to each new imaging method must be prepared, so updating training data becomes a fundamental requirement, unlike for established major inspections. Hence, minor inspections suffer from a lack of high-quality accessible data, and data-driven approaches are needed to apply machine learning techniques.

The associate editor coordinating the review of this manuscript and approving it for publication was Sudhakar Radhakrishnan.
Sharing data is one of the effective data-driven approaches that can solve the problems of a lack of high-quality data and the need to update training data [9]. However, preservation of patient privacy should be the top priority in the process of sharing clinical information [10]. In the field of medicine, privacy protection and convenience of data have been considered to be in a trade-off relationship, and this is one of the challenging problems. Clinical image data include not only privacy information but also identifiers such as social security number, gender, age, and occupation, and such data should be treated carefully. The possibility of re-identification increases when the amount of data becomes small. Although the removal of identifiers has been performed for anonymization [11], attention has not been paid to anonymization of the image data themselves. In order to accelerate the use of machine learning-based support for minor inspections, simplified approaches that consider privacy preservation issues are required.
In recent years, medical image generation methods that enable meaningful synthetic information to be generated have attracted much attention [12], [13]. The wide availability of synthetic data may allow researchers to develop and validate more sophisticated image recognition techniques [14]. Moreover, since image generation methods learn the distribution of training data without referring to real images, anonymization of individual information can be realized [15]. The use of synthetic data will contribute to image recognition tasks for minor inspections that require sharing and updating clinical data.
In this study, we targeted gastric X-ray images for the diagnosis of gastritis/non-gastritis. Gastritis is a key factor for the onset of gastric cancer, and gastritis can be diagnosed by gastric X-ray images [16], [17]. In some East Asian countries including South Korea and Japan, which have the highest gastric cancer mortality rates in the world, gastric cancer mass screening based on gastric X-ray inspections has been started [18]. Although gastric X-ray inspection is a traditional and important modality, it is still a minor inspection compared to CT or MRI inspections. Data-driven approaches are needed to introduce machine learning techniques in the field of gastric cancer mass screening [19].
We propose LC-PGGAN, a Loss function-based Conditional Progressive Growing Generative Adversarial Network, in this paper. The proposed method learns the distribution of the target data and generates images that follow this distribution from a latent space. Therefore, generated synthetic images are not associated with individual patient image information, and they can easily be used by researchers to develop support systems. LC-PGGAN has two novel points. The first one is a progressive growing network architecture. GANs easily fall into mode collapse, in which all input noise vectors are mapped to the same output image and optimization fails to make progress. We address this problem by making our networks progressively learn the target distribution from low resolution to high resolution. The second novel point is loss function-based conditional adversarial training. Typical conditional GANs try to generate label-domain images by using multiple networks and one-hot vector representation. The discriminator then plays the role of discriminating conditional information in addition to the task of real-fake discrimination. However, it becomes difficult to achieve stable training when the class domain classification task is difficult. In medical images, there always exist abnormal and normal samples, and their differences are subtle. We have designed our method to control the conditional information based on adversarial loss functions, which enables efficient training. Namely, efficient training is realized by adding new loss functions in the high resolution step. Although conventional one-hot vector representation approaches force the model to learn a different domain classification task in the early training stage, our model does not have to perform such a difficult task in the early stage. This contributes to the realization of efficient training of the adversarial network.
For improving gastritis classification performance using anonymized generated images, we mix images generated by LC-PGGAN that have conditional information and images generated by PGGAN that have rich diversity. The three main contributions of LC-PGGAN are summarized below:
• Generating anonymized synthetic medical image data
• Enabling stable training based on the progressive growing network architecture
• Controlling conditional information based on the conditional loss function for efficient training
The rest of this paper is arranged as follows. We briefly review related works in Sec. II. In Sec. III, we show the details of our synthetic image generation approach. Experimental results are provided in Sec. IV. We conclude our paper in Sec. V.

II. RELATED WORKS
In this section, we begin with an explanation of the basic concept of GANs in II-A, and we review more specific relevant works on medical data synthesis in II-B.

A. GENERATIVE ADVERSARIAL NETWORK
A GAN, first proposed by Goodfellow et al. in 2014 [20], is an implicit generative model in which two neural networks are trained competitively. A basic GAN consists of two neural network models: a generative model G that learns the unseen training data distribution and a discriminative model D that learns to classify whether samples come from the training data distribution. Given a latent variable z following the prior distribution $p_z(z)$, the generator G takes z as an input vector and outputs a sample G(z). On the other hand, the discriminator D takes a sample x as an input and outputs D(x), which represents the probability that x is real data. These two models are trained simultaneously with a stochastic gradient descent (SGD) [21] algorithm, and their training procedure can be seen as a two-player minimax game with the following objective function:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (1)$$
where the discriminator D tries to maximize V(D, G), while the generator G tries to minimize it. In other words, the discriminator D distinguishes the images $x \sim p_{data}$ from the generated ones G(z), while the generator G generates samples to fool the discriminator D.
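To make the roles of the two terms in V(D, G) concrete, the following is a minimal sketch that estimates the value function from discriminator outputs by simple averaging. The function name `gan_value` and the toy probabilities are assumptions introduced only for illustration; they are not part of the original method.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) for the standard GAN game.

    d_real: discriminator outputs D(x) for real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) for generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Toy values (hypothetical): a discriminator that separates real from fake well.
confident = gan_value([0.9, 0.8], [0.1, 0.2])
# A fully fooled discriminator outputs 0.5 everywhere, giving V = 2 * log(0.5).
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator prefers the larger value (`confident`), whereas the generator tries to push the game toward the smaller one (`fooled`).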
The concept of GANs has been applied to various tasks. In conditional image generation, supervised and unsupervised domain transformations of images have been explored. For instance, pix2pix proposed by Isola et al. learns an image-to-image translation task using paired data samples [22]. However, this approach requires a large number of paired samples, which are difficult to obtain and annotate. In order to address this problem, unpaired image-to-image translation frameworks have been proposed by many researchers (e.g., UNIT [23], CoGAN [24], CycleGAN [25], and DiscoGAN [26]). Moreover, recent domain classification-based GANs that control the characteristics of generated images by manipulating a latent distribution have shown promising results [27]. When a large and easily accessible training dataset is available, conventional methods have already achieved the generation of high-quality images.

B. DATA SYNTHESIS IN MEDICAL IMAGES
In medical image applications, preservation of patient privacy is the top priority, and this strict regulation makes accessing and collecting clinical data much more difficult than accessing and collecting data for natural images. Data synthesis approaches have recently been used to overcome this problem. Noseong et al. proposed an anonymization technique for clinical table data using GANs, and their synthesized tables can be shared without any concern about information leakage [28]. The advantage of generating synthetic images by GANs is that the generated images can preserve only the characteristics of training data that are effective for image recognition tasks without having any individual information. In other words, there is no one-to-one relationship between real images and synthetic images in a GAN-based approach, which makes it difficult to re-identify the anonymized information. Therefore, it can be considered that anonymization using generative models is a useful and safe approach for realization of data sharing.
Even if we can access medical datasets, they are often highly imbalanced with a paucity of data from rare conditions. Hojjat et al. tried to address this problem of an imbalance by synthesized images in chest pathology classification [29]. They employed a Deep Convolutional GAN [30] architecture for generating synthetic chest X-ray images. The classification performance was improved with synthetic images for balancing the dataset, though the resolution of synthetic images was lower than that of real images. In image-to-image translation approaches, Pedro et al. proposed a synthetic retinal image generation method motivated by the above-mentioned assumption [14].

III. PROPOSED IMAGE GENERATION METHOD
In this section, a gastritis image generation method, Loss function-based Conditional Progressive Growing GAN (LC-PGGAN), is presented. Firstly, in Subsec. III-A, we explain the data production for gastritis image generation using gastric X-ray images. In Subsec. III-B, the details of our progressive growing network architecture are provided. Finally, we explain how to train our model in Subsec. III-C.

A. DATA PRODUCTION
In this subsection, we propose an approach toward the realization of gastritis image generation using gastric X-ray images with consideration of clinical settings. Figure 1 shows examples of gastric X-ray images used in this study, where (a) is a sample with gastritis and (b) is a sample without gastritis (hereafter called non-gastritis). A stomach with gastritis has coarse mucosal surface patterns and non-straight folds, whereas a stomach without gastritis has uniform mucosal surface patterns and straight folds. Gastric X-ray images have a high resolution (e.g., 1,024 × 1,024 or 2,048 × 2,048 pixels), for which computational costs are high. In our previous investigation, we found that patch division is the best approach for the gastritis classification problem using gastric X-ray images since the differences between gastritis and non-gastritis images are in local regions of the images [31]-[33]. If we use resized gastric X-ray images for gastritis classification, extracted image features cannot show the characteristics of gastritis/non-gastritis. Therefore, we use divided patch images for the generation of synthetic images in the same manner as that in our previous works.
Firstly, we divide gastric X-ray images into multiple patches. Let $F_i \in \mathbb{R}^{d \times d}$ ($i = 1, 2, \ldots, I$) denote gastric X-ray images for the image generation, where I is the number of training images, and their class labels are denoted as $y_i \in \{1, -1\}$. Specifically, $F_i$ is divided into H × W patches (H and W being the numbers of patches in the vertical direction and horizontal direction, respectively), and we define $X_i^{(h,w)} \in \mathbb{R}^{\hat{d} \times \hat{d}}$ ($h = 1, 2, \ldots, H$; $w = 1, 2, \ldots, W$), which represent patches extracted from $F_i$. Next, we classify the divided patches $X_i^{(h,w)}$ into the following three kinds of data:
• A: data including gastritis patches,
• N: data including non-gastritis patches,
• O: patches from outside the stomach.
It should be noted that the region annotation of the stomach in this study was defined manually by a radiological technologist since the accuracy of automated stomach region estimation methods for gastric X-ray images is still insufficient for clinical use [34]. Image level labels are assigned for each gastric X-ray image, and divided patches X

B. PROGRESSIVE GROWING NETWORK ARCHITECTURE
It is necessary to detect the fine differences between abnormal and normal characteristics when trying to generate synthetic gastritis/non-gastritis images for a classification task. However, abnormal (gastritis) characteristics in a gastric X-ray image are often only subtly different from normal (non-gastritis) characteristics and are difficult to distinguish. In order to detect the subtle differences between abnormal and normal images, we employ a progressive growing network architecture motivated by Progressive Growing GAN (PGGAN) [35]. PGGAN is a representative generative model in high-quality image generation tasks. PGGAN's training starts with low resolution images, and then progressively increases the resolution by adding new layers to the generator and the discriminator. The architecture of PGGAN is shown in Fig. 3. In order to stabilize the training processes, some techniques (e.g., mini-batch standard deviation, equalized learning rate, and pixel-wise feature vector normalization) are introduced into PGGAN. Employing PGGAN's learning process enables our networks to detect the characteristics of symptoms from coarse to fine.
In LC-PGGAN, we start training our networks at a low spatial resolution of 4 × 4 pixels. The details of our network architecture are shown in Fig. 4. In the low resolution step, LC-PGGAN learns the broad outlines of training images. As the training advances, layers are incrementally added to the generator and the discriminator to reach high resolution images. In the high resolution step, LC-PGGAN learns the detailed regions of training images. By adopting these progressive training procedures, the generator can learn the characteristics of gastritis/non-gastritis shown in the training images. Conditional information is also added to generated synthetic images in the high resolution step, and this is another novel point of the proposed method. The conditional adversarial learning is explained in detail in Subsec. III-C.
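The coarse-to-fine schedule described above can be sketched as a list of resolutions. This is a minimal sketch that assumes the resolution doubles at each growth stage, as in PGGAN, and that the final stage is the 256 × 256 step named in Subsec. III-C. The function names are hypothetical.

```python
def resolution_schedule(start=4, final=256):
    """Square resolutions visited when doubling from `start` up to `final`."""
    if start <= 0 or final < start:
        raise ValueError("require 0 < start <= final")
    resolutions = [start]
    while resolutions[-1] < final:
        resolutions.append(resolutions[-1] * 2)
    if resolutions[-1] != final:
        raise ValueError("final must equal start times a power of two")
    return resolutions

def is_high_resolution_step(resolution, final=256):
    """True only at the last stage, where the conditional losses are introduced."""
    return resolution == final

schedule = resolution_schedule()
```

Under these assumptions the schedule has seven stages, and only the last one would use the conditional losses.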

C. LOSS FUNCTION-BASED CONDITIONAL ADVERSARIAL LEARNING
In training for a typical GAN, a generator learns only a single training distribution. In gastric X-ray inspection, however, two classes of images, gastritis and non-gastritis, always exist. By utilizing this property, generation of synthetic gastritis images can be realized, thus contributing to the improvement of gastritis classification performance.
Let a generator G and a discriminator D have parameters $\theta_G$ and $\theta_D$, respectively. LC-PGGAN utilizes three data distributions, A, N, and O, to consider the above-mentioned situations. Then let $x_a$, $x_n$, and $x_o$ denote mini-batches of A, N, and O, respectively. Note that the method for the following image generation is a method that tries to generate "gastritis" images. Generation of "non-gastritis" images can be realized by simply replacing the feeding data A with N.
At the beginning of the training in the low resolution step, the objective function $J_{D,G}$ of LC-PGGAN (Eq. (2)) consists of two loss functions, $L_A$ and $L_{G(z)}$. The loss function $L_A$ is defined on the mini-batch $x_a$ of A, where $\alpha_{x_a}$ represents the prior distribution of A. If the discriminator D can correctly classify $x_a$ as real images, $D(x_a)$ becomes larger, namely, $L_A$ becomes smaller. Next, the loss function $L_{G(z)}$ is defined on a synthetic image G(z), which is generated by feeding a noise vector z that is sampled from a latent distribution $\alpha_z$ to the generator G. The generator G tries to fool the discriminator D, and $L_{G(z)}$ becomes larger when G(z) passes through the discriminator D. We minimize Eq. (2) by optimizing the two loss functions $L_A$ and $L_{G(z)}$ simultaneously.
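One way to write the low resolution objective that is consistent with this description is given below. It is a sketch under an assumption: the discriminator output is treated as an unbounded Wasserstein-style score, as in the WGAN-GP loss commonly used with PGGAN. A log-likelihood form would satisfy the same verbal description, so the exact expressions should be taken as illustrative.

```latex
J_{D,G} = L_A + L_{G(z)}, \qquad
L_A = -\mathbb{E}_{x_a \sim \alpha_{x_a}}\left[ D(x_a) \right], \qquad
L_{G(z)} = \mathbb{E}_{z \sim \alpha_z}\left[ D(G(z)) \right]
```

Under this assumed form, $L_A$ decreases as $D(x_a)$ increases, and $L_{G(z)}$ increases as the discriminator assigns higher scores to G(z), which matches the behavior stated in the text.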
In this way, we feed the data A as training images to the discriminator D, and the generator G learns the outline of gastritis images in the low resolution step. However, some generated synthetic images are still not effective as training data for the classification problem since the generator G learns only a one-class data distribution. To generate higher-quality synthetic gastritis images for enhancing the gastritis classification performance, we feed other data distributions to our adversarial learning, and update our objective function in the high resolution step. In the high resolution step (the 256 × 256 pixels resolution step), we update our objective function from $J_{D,G}$ to $\hat{J}_{D,G}$, which includes a weight coefficient λ that is a constant used during the high resolution training step, together with two additional loss functions, $L_N$ and $L_O$. The loss function $L_N$ is defined on the mini-batch $x_n$, where $\alpha_{x_n}$ represents the prior distribution of the data N. In the same manner as $L_N$, the loss function $L_O$ is defined on the mini-batch $x_o$, where $\alpha_{x_o}$ represents the prior distribution of the data O. The loss functions $L_N$ and $L_O$ provide conditional information to the generator G and the discriminator D. Specifically, the discriminator D should judge a generated "gastritis" image G(z) as a fake image, and real "non-gastritis" and "outside" images are also judged as fake images under the updated loss function $\hat{J}_{D,G}$. The introduction of constraints on our objective function affects the training of the generator G. In the low resolution step, the generator G focuses on detecting only the outline of the "gastritis" distribution. On the other hand, in the high resolution step, the generator G has to generate more realistic "gastritis" samples since samples similar to "non-gastritis" and "outside" samples are rejected by the trained discriminator D. Consequently, the generation of "abnormal" images for gastritis classification can be realized.
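The switch between the two training steps can be sketched as follows. The sketch rests on two assumptions that are not confirmed by the text: the discriminator outputs are Wasserstein-style scores, and the weight λ multiplies the sum of $L_N$ and $L_O$. The function `lc_pggan_objective` and its arguments are hypothetical names, and plain lists of scores stand in for discriminator outputs on mini-batches.

```python
def mean(values):
    return sum(values) / len(values)

def lc_pggan_objective(d_xa, d_gz, d_xn=None, d_xo=None, lam=0.0):
    """Assumed form of the discriminator-side objective.

    d_xa: scores D(x_a) on real gastritis patches (data A)
    d_gz: scores D(G(z)) on generated patches
    d_xn: scores D(x_n) on real non-gastritis patches (data N), high resolution step only
    d_xo: scores D(x_o) on outside-stomach patches (data O), high resolution step only
    lam:  weight coefficient (lambda), assumed to scale L_N + L_O
    """
    loss_a = -mean(d_xa)   # L_A: smaller when real gastritis patches score high
    loss_g = mean(d_gz)    # L_G(z): larger when generated patches score high
    total = loss_a + loss_g
    if d_xn is not None and d_xo is not None:
        loss_n = mean(d_xn)  # L_N: non-gastritis patches are pushed toward "fake"
        loss_o = mean(d_xo)  # L_O: outside patches are pushed toward "fake"
        total += lam * (loss_n + loss_o)
    return total

# Low resolution step: only L_A and L_G(z) are used.
low = lc_pggan_objective([1.0, 3.0], [0.0, 2.0])
# High resolution step: L_N and L_O are added with weight lambda.
high = lc_pggan_objective([1.0, 3.0], [0.0, 2.0], d_xn=[1.0], d_xo=[3.0], lam=0.5)
```

Because real N and O patches enter with the same sign as generated patches, a discriminator that minimizes this quantity learns to reject anything that resembles them, which is how the conditional information reaches the generator.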
By the training procedures of the progressive growing network architecture and loss function-based conditional adversarial learning of our LC-PGGAN, our generator can produce images that have conditional information. Although images generated by LC-PGGAN have conditional information for gastritis classification, a variety of training data is also an essential element in the classification task. Therefore, we mix the images generated by PGGAN that have rich diversity and the images generated by LC-PGGAN that have conditional information for training of the gastritis classification model.
The most important aspect is that images generated by LC-PGGAN do not have one-to-one correspondence to real images. One of the bottlenecks of medical image analysis is that medical images must be treated with strict confidentiality. This problem must be solved to accelerate research on medical image analysis, particularly for minor inspections. Images generated by LC-PGGAN can contribute to a solution of this challenging problem since they can be used as data that do not include individual information.

IV. EXPERIMENT
In this section, we quantitatively and qualitatively evaluate synthetic images generated by LC-PGGAN. Experimental settings are shown in Subsec. IV-A, and quantitative and qualitative evaluation results are shown in Subsec. IV-B.

A. EXPERIMENTAL SETTINGS
As clinical data, gastric X-ray images of 815 patients (240 with gastritis and 575 without gastritis) were used. The ground truth of gastritis/non-gastritis was determined by endoscopic and X-ray image interpretation results with double-checking by clinicians. Gastric X-ray images were gray-scale and 2,048 × 2,048 pixels, and they were divided into multiple patches of 299 × 299 pixels with a sliding interval of 50 pixels. The sizes of the patches were experimentally determined. In the image generation procedure, these patches were resized for training. We randomly selected 100 gastritis and 100 non-gastritis gastric X-ray images from our original data, and we constructed our training data for image generation. In other words, 200 gastric X-ray images were allocated as training data. The numbers of divided training patches of the data A, N, and O were 45,127, 42,785, and 48,385, respectively. Synthetic image generation was implemented using data for A, N, and O. The remaining 615 X-ray images were allocated to test data. Test data were also divided into multiple patches in the same manner as that for training data, namely, 1,225 patches were extracted from each test gastric X-ray image. Estimated labels of gastritis/non-gastritis were determined for each patch, and the final image level estimation result was determined by the simplest majority voting method.
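The patch count and the image level decision can be checked with a short sketch. It assumes that the sliding window must stay entirely inside the image and that patch labels are encoded as 1 (gastritis) and −1 (non-gastritis). The helper names and the tie-breaking behavior are assumptions made for illustration.

```python
from collections import Counter

def patch_origins(image_size=2048, patch_size=299, stride=50):
    """Top-left coordinates along one axis for windows that fit inside the image."""
    return list(range(0, image_size - patch_size + 1, stride))

def majority_vote(patch_labels):
    """Image level label: the most frequent patch label.

    Ties are resolved in favor of the label encountered first (an assumption).
    """
    return Counter(patch_labels).most_common(1)[0][0]

origins = patch_origins()
patches_per_image = len(origins) ** 2  # 35 positions per axis
```

With these settings there are 35 window positions per axis, so `patches_per_image` equals 1,225, which agrees with the number of patches reported for each test image.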
Many evaluation metrics such as Inception score [36], Fréchet Inception distance (FID) [37], and Sliced Wasserstein distance (SWD) [35] have been proposed for evaluation of the quality of generated images. However, these metrics are not suitable for the evaluation of images for classification problems. A classification-based metric, known as GAN-train, for the evaluation of generated images was proposed by Shmelkov et al. in 2018 [38]. GAN-train evaluates the classification performance of a classifier trained on generated synthetic images and tests the performance on a set of real images. If an optimal GAN model perfectly captures the target distribution, the set of images it generates is indistinguishable from the original training set, and a classifier trained on this set is expected to achieve the same classification performance in GAN-train. Since anonymized generated images were used for gastritis classification in our study, we used GAN-train as our evaluation index.
In the experiment, the support vector machine (SVM) [39] was used as an estimator for the gastritis classification task in GAN-train. In terms of gastritis classification accuracy, a deep learning-based estimator is the first choice. However, such an estimator has many parameters and the classification performance heavily relies on the settings of the parameters. Therefore, we employed the simplest SVM as our estimator to fairly evaluate the effectiveness of generated images. The types of features also affect the classification performance. Hand-crafted features are an old-fashioned approach, and we therefore extracted high-level semantic features from pre-trained deep models, namely, pre-trained VGG-16 [40], Inception-v3 [41], and ResNet-50 [42] models, in the experiment. Specifically, 4,096-dimensional features were extracted from the fully connected layer (fc_7) of VGG-16, and 2,048-dimensional features were extracted from the pool_3 layer of Inception-v3 and the flatten layer of ResNet-50, respectively. The features of generated and real images that were obtained were used for the SVM-based evaluation.
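The GAN-train protocol itself (fit on synthetic features only, score on real features only) can be sketched without any external library. To keep the example dependency-free, a nearest-centroid classifier is used as a stand-in for the SVM, and short two-dimensional vectors stand in for the deep features. Both substitutions, and all function names, are assumptions made purely for illustration.

```python
def fit_centroids(features, labels):
    """Stand-in classifier (not the SVM used in the experiment): one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

def gan_train_accuracy(synth_x, synth_y, real_x, real_y):
    """GAN-train: train on synthetic samples only, evaluate on real samples only."""
    model = fit_centroids(synth_x, synth_y)
    hits = sum(predict(model, x) == y for x, y in zip(real_x, real_y))
    return hits / len(real_y)
```

For example, calling `gan_train_accuracy` with features of generated patches as the first two arguments and features of real test patches as the last two returns the fraction of real patches classified correctly by a model that never saw real data during training.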
As comparative methods, synthetic images generated by an original PGGAN and a basic deep convolutional GAN (DCGAN) [30] were used. In this GAN-train experiment, we generated 10,000 patches from latent distributions through learned image generation models. From this set, we randomly sampled 5,000 generated positive/negative patches and constructed GAN-train data. Note that the SVM training data "LC-PGGAN + PGGAN" were constructed by random sampling from 20,000 (10,000 from LC-PGGAN and 10,000 from PGGAN) patches. Sensitivity (Sen), specificity (Spe), and the harmonic mean of Sen and Spe (HM) were utilized for the evaluation. These criteria can be defined as follows:
$$\mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{HM} = \frac{2 \times \mathrm{Sen} \times \mathrm{Spe}}{\mathrm{Sen} + \mathrm{Spe}},$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
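The three criteria can be computed with a short sketch based on their standard definitions: sensitivity is the fraction of positive samples predicted as positive, specificity is the fraction of negative samples predicted as negative, and HM is their harmonic mean. The function name, the label encoding (1 for gastritis, −1 for non-gastritis), and the handling of empty classes are assumptions for illustration.

```python
def sensitivity_specificity_hm(y_true, y_pred, positive=1):
    """Return (Sen, Spe, HM) for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    # Guard against empty classes; returning 0.0 in that case is an assumption.
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    hm = 2 * sen * spe / (sen + spe) if (sen + spe) else 0.0
    return sen, spe, hm

# Toy example: 4 gastritis and 2 non-gastritis images.
sen, spe, hm = sensitivity_specificity_hm(
    [1, 1, 1, 1, -1, -1],
    [1, 1, 1, -1, -1, 1],
)
```

The harmonic mean penalizes a large gap between sensitivity and specificity, so a classifier that labels every image as gastritis obtains HM = 0 even though its sensitivity is 1.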

B. RESULTS AND DISCUSSION
The goal of our method is the generation of realistic synthetic images for sharing and updating clinical data easily. It is expected that anonymized synthetic data will be as effective as real data for classification problems. As a quantitative evaluation, we compared the gastritis classification performance using synthetic data. Results of the classification performance in GAN-train are shown in Tables 1-3. From the results, we can see that LC-PGGAN outperformed the comparative methods in gastritis classification performance, although it did not reach the performance obtained when real images were used as training data. Although PGGAN showed a certain level of classification performance, its performance was not as high as the performance of LC-PGGAN. Moreover, collaborative use of LC-PGGAN and PGGAN showed the best performance with Inception-v3 and ResNet-50 features. On the other hand, we can see that the model trained on images generated by DCGAN cannot correctly classify real data. Overall, we confirmed that the progressive growing network architecture is effective for detecting the real data distribution since DCGAN does not have such a network architecture.
Next, we discuss the visual quality of the generated images. As shown in Fig. 2, characteristics of gastritis are coarse mucosal surface patterns and non-straight folds, and those of non-gastritis are uniform mucosal surface patterns and straight folds. We can see that the images generated by LC-PGGAN correctly reproduce these characteristics. In particular, we can see that the images generated by LC-PGGAN (in Fig. 5) have more specific characteristics of gastritis/non-gastritis. Also, the images generated by PGGAN (in Fig. 6) have the same characteristics as those of real data. In addition, we can see that generated images have wide varieties. This may contribute to the improvement in gastritis classification performance. On the other hand, the images generated by DCGAN (in Fig. 7) have some noise.
Generally, GANs easily face the mode collapse problem caused by the interplay between the training progress of a generator and that of a discriminator. If mode collapse occurs, the learned generator produces only similar images. In the proposed method, we updated our loss function from $J_{D,G}$ to $\hat{J}_{D,G}$ during the training. Despite this regularization, LC-PGGAN achieved successful generation of high-quality synthetic images.
This study has some limitations. The classification performance of real gastritis/non-gastritis images in this study is not of a sufficient level for clinical applications. In the experiment, instead of using deep neural networks that require complicated parameter tuning processes, we used the simplest SVM models as our estimator since we focused on the evaluation of the quality of generated images. We have already achieved high classification performance using real images based on deep learning [33], and this attempt using synthetic images is the next step for data sharing.

V. CONCLUSION
We have presented a synthetic gastritis image generation method with progressive growing adversarial learning, which is a novel high-quality image generation method that makes sharing and updating of clinical data for machine learning techniques easier. In addition to showing that our anonymized generated images were useful for gastritis classification, we confirmed that these images had characteristics of gastritis/non-gastritis like real data.

MIKI HASEYAMA (S'88-M'91-SM'06) received the B.S., M.S., and Ph.D. degrees in electronics from Hokkaido University, Japan, in 1986, 1988, and 1993, respectively. She joined the Graduate School of Information Science and Technology, Hokkaido University, as an Associate Professor, in 1994, where she is currently a Professor with the Faculty of Information Science and Technology. She was a Visiting Associate Professor with Washington University, USA, from 1995 to 1996. Her research interests include image and video processing and its development into semantic analysis. She is a member of the IEICE, ITE, and the Information Processing Society of Japan (IPSJ). She has been the Vice-President of the Institute of Image Information and Television Engineers, Japan (ITE). She has been the Editor-in-Chief of the ITE Transactions on Media Technology and Applications. She has also been the Director of the International Coordination and Publicity of The Institute of Electronics, Information, and Communication Engineers (IEICE).

VOLUME 7, 2019