An Enhanced Framework of Generative Adversarial Networks (EF-GANs) for Environmental Microorganism Image Augmentation With Limited Rotation-Invariant Training Data

The main obstacle to image augmentation with Generative Adversarial Networks (GANs) is the need for a large amount of training data, which is hard to obtain for small datasets such as Environmental Microorganism (EM) images. EM image analysis plays a vital role in environmental monitoring and protection, but because EM images are difficult to collect, the available datasets are often small. To this end, we propose an Enhanced Framework of GANs (EF-GANs) that combines geometric transformation methods and GANs for EM image augmentation. First, because the color of an EM image has an insignificant impact on its class label, we perform color space augmentation on the original EM images. Second, we train EF-GANs with the augmented EM images to generate entirely new EM images. Finally, we rotate the generated samples in various directions to obtain more natural results. In this study, we use VGG16 and ResNet50 networks to evaluate the proposed EF-GANs on 21 different types of EMs (420 EM images). The average precision (AP) of VGG16 increases by between 4.5% and 84.1% in 20 EM classes and remains unchanged in one class. The AP of ResNet50 rises by between 8.7% and 38.7% in 12 EM classes and reaches 100% in two EM classes. Furthermore, to assess the generalization performance of EF-GANs, we use an entirely new EM image dataset (630 EM images) to test the previous VGG16 networks. We select the VGG16 networks with the original and optimal settings for all the EM classes, and for testing, the optimal setting for each single EM class is considered. In 20 of the 21 one-vs-rest EM image classification tasks, the AP of VGG16 increases by between 1.66% and 88.1%. The results demonstrate that the proposed EF-GANs can augment single EM images with high quality and resolution, thereby improving the APs of EM image classification.


I. INTRODUCTION
In recent years, with the continuous progress of industry, numerous environmental problems have emerged, such as water pollution and particulate matter (PM) pollution, which increase the risk of disease. Instead of using chemicals to eliminate such pollutants, a more harmless approach is to take advantage of their natural consumption by Environmental Microorganisms (EMs). EMs are microscopic living organisms in natural and artificial environments (e.g., forests and farmlands), and they are useful for cleaning environments [1]. For example, Actinophrys can digest organic waste in sludge and improve the quality of freshwater, whereas Rotifera can decompose rubbish in water and reduce the level of eutrophication. To gain more knowledge of EMs, microorganism classification is a primary and significant task in microbiological fields. Image analysis techniques are widely used in EM classification tasks because they are rapid, low-cost, and objective [2] [3] [4]. Since EMs are tiny and invisible to the naked eye (their size usually varies between 0.1 and 100 µm), they can only be observed under microscopes. EM images are captured using a digital camera embedded in the microscope. Note that a microorganism remains the same microorganism regardless of its orientation in the image; EM images are therefore rotation-invariant with respect to their class labels. Since most EMs are colorless and transparent, it is usually challenging to see their outlines and details under the microscope.
Therefore, it is necessary to illuminate the microorganisms with an additional colored light source so that their outlines and details become clear; this additional light source does not affect the class labels of the microorganisms. However, collecting EM images is difficult. EMs differ from laboratory microorganisms (LMs). LMs grow in a culture medium, and there is usually only one type of microorganism in one medium, so the image background is clean and free of impurities. In contrast, EMs grow in natural environments (e.g., forests, rivers, fish ponds, and farmlands). When soil, water, and other samples collected from the natural environment are observed under the microscope, they inevitably contain different types of microorganisms, so finding a microorganism of a specific type in a visual field is difficult. As a result, EM image datasets are normally small. Therefore, although EM classification methods based on image analysis are effective, they often suffer from the small training dataset problem [5]. Novel deep learning methods are even more sensitive to this problem than classical machine learning methods (e.g., the support vector machine (SVM)). When the training dataset is unbalanced (i.e., the amount of available data differs between categories), image classification accuracy often decreases significantly [6]. To this end, data augmentation techniques are normally used to expand the dataset and obtain better training performance.

The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Traditionally, data augmentation methods include image cropping, rotation, flipping, scaling, translation, contrast transformation, color space transformation, and noise injection [7]. In contrast, the more recent Generative Adversarial Networks (GANs) can generate more natural and vivid images [7]. Hence, more and more works use GANs for data augmentation. For example, in [8], CycleGAN is used to generate face images for an emotion classification task. However, for small datasets such as EM microscopic images, directly using GANs for image augmentation has two problems. First, because EMs appear in arbitrary orientations in microscopic images, it is difficult to generate EM images in various orientations from only a few such images: GANs can generate microscopic images in various orientations only when trained on a large number of images covering sufficiently many orientations. Second, because the dataset is small, directly using GANs for augmentation causes many details to be missing from the generated images: without a large number of images containing enough details, GANs cannot generate images with sufficient details. To address these two problems, we propose an Enhanced Framework of GANs (EF-GANs) for small EM datasets with varying orientations. Our model is not a new GAN but an enhanced framework of GANs, which means that any GAN can be inserted into it.
The difference between EF-GANs and GANs is that, in addition to the discriminator and generator networks that make up a GAN, EF-GANs add two heuristic steps, one before and one after the GAN. In step-1, before the EM images are input into the GAN, we rotate them to unify the orientation of EMs of the same class and then perform color space transformation on them. Because EM images are rotation-invariant, unifying the orientation of the EMs removes the need for the GAN to learn EMs in many different orientations from too few images; the GAN only needs to generate EM images in one orientation. The color space transformation increases the number of images input to the GAN enough to meet its training needs, which solves the problem of insufficient training images and enables the GAN to generate more detailed images. In step-2, the GAN generates new images from these inputs. Finally, in step-3, we rotate the generated images in various directions. Because step-1 rotates the EM images to a unified direction and thus discards their orientation information, this final direction compensation is necessary: rotating each generated image in many different directions yields many additional EM images and makes up for the loss of EM directional features.
The workflow of EF-GANs is shown in Fig. 1, and the process is as follows:
• In step-1, to train the GANs effectively, we initially augment each image 15 times. In (I), the original EM images and their Ground Truth (GT) images are prepared. In (II), we localize the EMs in the GT images to find the directions of their main axes and their minimum bounding rectangles (MBRs). In (III) and (IV), we use the MBRs to crop the EMs from the original images and rotate the images so that the main axes of the EMs are horizontal. In (V), because the colors of images have little effect on their class labels [9] [10], we perform color space augmentation on these images.
• In step-2, we input the images from step-1 (V) into the GANs to generate new images, which augments the EM images fourfold.
• In step-3, we rotate the results from step-2 into various directions to obtain more natural EM images, which augments the EM images fourfold again.

Step-1, step-2, and step-3 together form the EM image augmentation process. In step-1, we first use the GT images to acquire the long axis (main axis), the short axis, the centroid, and the minimum bounding rectangle (MBR) of each corresponding original image. Second, we crop away the part of the original image outside the MBR. Then, we rotate the cropped image so that its main axis is horizontal, unifying the orientation of EMs of the same class. Finally, we perform color space transformation on the rotated images. In step-2, we input the images from step-1 into the GANs to generate more images. In step-3, to simulate EM images in real environments, we rotate the generated images in various directions.
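The expansion factors of the three steps compose multiplicatively (15 × 4 × 4 = 240 per original image). The following minimal Python sketch tracks the image count through the pipeline, using the five training images per class described in the dataset settings; the function name and the dictionary of factors are illustrative, not taken from the authors' code.

```python
# Illustrative per-step expansion factors, as stated in the text:
#   step-1: 15 color variants per cropped, re-oriented image
#   step-2: the GAN produces 4x as many images as it is trained on
#   step-3: each generated image is rotated into 4 directions
STEP_FACTORS = {"step-1": 15, "step-2": 4, "step-3": 4}


def augmented_count(n_original: int, factors: dict = STEP_FACTORS) -> dict:
    """Return the number of images after each EF-GANs step."""
    counts = {"original": n_original}
    n = n_original
    for step, factor in factors.items():  # dicts keep insertion order
        n *= factor
        counts[step] = n
    return counts


print(augmented_count(5))
# {'original': 5, 'step-1': 75, 'step-2': 300, 'step-3': 1200}
```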
The main contributions of this paper are as follows:
• We use the location information in the GT images to rotate each EM image to the same direction (the main axis direction), unifying the orientation of EMs so that GANs only need to generate EM images in one direction.
• The backgrounds of EM images have different colors under different light sources. We therefore perform color space transformation on the EM images to reproduce them under light sources of different colors, as in real environments. This technique expands the dataset without affecting the quality or class labels of the images.
• We combine traditional image processing techniques with GANs to build the EF-GANs model, which can generate high-quality EM images from fewer training images, thereby easing the requirement for a large amount of training data.
• The generated images cover arbitrary directions and improve the classification APs of deep learning networks.
The structure of this paper is as follows: Sec. II reviews existing image augmentation methods. Sec. III describes EF-GANs in detail. Sec. IV introduces the experimental settings and the evaluation of the augmented images. Sec. V concludes this paper and introduces our future work.

II. RELATED WORK
In this section, image augmentation approaches are introduced, including basic image manipulation and deep learning based methods.
A. IMAGE MANIPULATION BASED IMAGE AUGMENTATION
1) GEOMETRIC TRANSFORMATIONS
Geometric transformations are traditional data augmentation methods. The effectiveness of a geometric transformation relates to its safety, i.e., the likelihood of preserving the label after the transformation. For example, rotations and flips are generally safe on ImageNet [11] tasks such as ''cat vs. dog,'' but not safe for digit recognition tasks such as ''6 vs. 9.'' Geometric transformations include rotation, flipping, cropping, and translation [7].
• Rotation augmentations rotate the image clockwise or counterclockwise by an angle between 1° and 359°. The safety of rotation augmentations is heavily determined by the rotation angle. Slight rotations, such as between 1° and 20°, can be useful for digit recognition tasks such as MNIST [12], but as the rotation angle increases, the transformation becomes less likely to preserve the label.
• Flipping augmentations include horizontal and vertical axis flipping. Horizontal axis flipping is much more common than vertical flipping. For example, images of animals such as cats and dogs can generally be augmented by horizontal axis flipping without changing their labels.
• Cropping augmentations cut out the edges of an image. Their purpose is to remove some of the background information while retaining the main content of the image. Cropping augmentations are generally safe and do not change the label of the image.
• Translation augmentations shift images up, down, left, or right. When the original image is translated in a direction, the vacated space can be filled with a constant value such as 0s or 255s, or with random or Gaussian noise. This operation does not change the label of the image. Translation augmentations are similar to cropping augmentations in that both retain the main content of images.

Thus, geometric transformations are reasonable solutions for fixing biases present in the training data, and they can be implemented quickly. However, they have some disadvantages, such as additional memory requirements, transformation computational costs, and additional training time. Moreover, some geometric transformations, like rotation and flipping, may not preserve the label after transformation. Therefore, the scope of where and when geometric transformations can be applied is relatively limited.
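The four geometric transformations above can be sketched with plain NumPy array operations. This is a minimal illustration under our own assumptions (images as NumPy arrays of shape (H, W) or (H, W, C), rotation restricted to multiples of 90° to avoid interpolation, constant fill for translation); the helper names are hypothetical and not taken from [7].

```python
import numpy as np


def rotate90(img, k=1):
    # Rotate by k * 90 degrees counterclockwise; lossless, no interpolation.
    return np.rot90(img, k)


def hflip(img):
    # Horizontal flip: mirror left-right by reversing the column axis.
    return img[:, ::-1]


def vflip(img):
    # Vertical flip: mirror top-bottom by reversing the row axis.
    return img[::-1, :]


def center_crop(img, h, w):
    # Keep the central h x w region and discard the borders.
    H, W = img.shape[:2]
    top, left = (H - h) // 2, (W - w) // 2
    return img[top:top + h, left:left + w]


def translate(img, dy, dx, fill=0):
    # Shift by (dy, dx) pixels; vacated pixels receive a constant fill value.
    out = np.full_like(img, fill)
    H, W = img.shape[:2]
    src = img[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)]
    y0, x0 = max(0, dy), max(0, dx)
    out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return out


img = np.arange(9).reshape(3, 3)
print(translate(img, 1, 0).tolist())  # [[0, 0, 0], [0, 1, 2], [3, 4, 5]]
```

Whether any of these is label-safe still depends on the task, as the ''6 vs. 9'' example above shows.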

2) COLOR SPACE TRANSFORMATIONS
An RGB image consists of three color channels. Color space transformations are generally performed by multiplying each color channel by a random number. However, color transformations may discard important color information and do not always preserve the label after augmentation [13]. For example, in [14], the color of blood is the most important feature for distinguishing blood from water or plants, so color transformations could prevent the model from recognizing blood in the image.
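A minimal sketch of the channel-scaling transformation described above, assuming an 8-bit RGB image stored as a NumPy array of shape (H, W, 3) and one independent random factor per channel; the function name and the default factor range are our assumptions.

```python
import numpy as np


def random_channel_scaling(img, rng, low=0.0, high=1.0):
    """Multiply each RGB channel by its own random factor drawn from [low, high).

    img: uint8 array of shape (H, W, 3). Returns the scaled image and the factors.
    """
    factors = rng.uniform(low, high, size=3)        # one factor per channel
    scaled = img.astype(np.float32) * factors       # broadcasts over H and W
    return np.clip(scaled, 0, 255).astype(np.uint8), factors


rng = np.random.default_rng(0)
img = np.full((4, 4, 3), 200, dtype=np.uint8)
out, factors = random_channel_scaling(img, rng)
```

Drawing a single shared factor instead would only change brightness; independent per-channel factors shift the hue, which is exactly what can break labels such as ''blood'' in the example above.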

3) KERNEL FILTER
Kernel filters are used to sharpen and blur images [9] [10]. In [15], they use a unique kernel filter that randomly swaps the pixel values in a sliding window. They call this augmentation technique PatchShuffle Regularization. They show that PatchShuffle improves the generalization ability of convolutional neural networks (CNN), especially for small datasets.
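The PatchShuffle idea from [15] can be sketched as follows. This is a simplified version under our own assumptions: the sliding window is taken as non-overlapping patch × patch blocks, the image size is a multiple of the patch size, and each block is shuffled with probability `prob`; the function and parameter names are hypothetical.

```python
import numpy as np


def patch_shuffle(img, patch=2, prob=1.0, rng=None):
    """Randomly permute pixels inside each non-overlapping patch x patch block.

    img: array of shape (H, W) or (H, W, C); H and W must be multiples of patch.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()                      # leave the input untouched
    H, W = img.shape[:2]
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            if rng.random() < prob:
                block = out[y:y + patch, x:x + patch]
                # Flatten the spatial dimensions, keep any channel dimension.
                flat = block.reshape(patch * patch, *block.shape[2:])
                perm = rng.permutation(patch * patch)
                out[y:y + patch, x:x + patch] = flat[perm].reshape(block.shape)
    return out
```

Because pixels only move within a small block, the global shape of the object is preserved while local texture is perturbed, which is why the method acts as a regularizer.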

4) MIXING IMAGES
Mixing images is a very counterintuitive approach to data augmentation, because the mixed image often cannot be recognized by a human observer. In [16], a SamplePairing method is proposed. First, basic image augmentations such as flipping are applied to two images randomly selected from the training dataset, and then the pixel-wise average of the two images is taken as a new sample. The two images are not even required to belong to the same category. SamplePairing is very simple and useful for medical images, and it significantly improved the classification accuracy on all the test sets. For example, the top-1 error rate on CIFAR-10 [17] was reduced from 8.22% to 6.93%.
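The mixing step of SamplePairing reduces to a pixel-wise average. The sketch below omits the preliminary flips and assumes, as in the original SamplePairing paper, that the mixed sample keeps the label of the first image; the function name is hypothetical.

```python
import numpy as np


def sample_pairing(img_a, img_b, label_a):
    """Pixel-wise average of two same-sized images; keeps img_a's label.

    img_b's label is ignored, which is why the two images need not share a class.
    """
    a = img_a.astype(np.float32)          # avoid uint8 overflow when summing
    b = img_b.astype(np.float32)
    mixed = (a + b) / 2.0
    return mixed.astype(img_a.dtype), label_a


a = np.full((2, 2), 10, dtype=np.uint8)
b = np.full((2, 2), 20, dtype=np.uint8)
mixed, label = sample_pairing(a, b, label_a=3)   # every pixel becomes 15
```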

B. DEEP LEARNING BASED IMAGE AUGMENTATION
1) FEATURE SPACE AUGMENTATION
Neural networks are very powerful at mapping high-dimensional inputs into lower-dimensional representations. In [18], the classification accuracy on CIFAR-100 is increased from 66% to 73% by manipulating the modularity of neural networks to isolate and refine individual layers after training. The Synthetic Minority Over-sampling Technique (SMOTE) [19] is a popular augmentation used to alleviate problems with class imbalance.
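SMOTE creates new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors in feature space. A minimal NumPy sketch of that interpolation, assuming the minority samples are already given as feature vectors (the function name and arguments are our own, not the reference implementation of [19]):

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: (N, D) feature vectors of the minority class, with N > k.
    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances; a point is not its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X))                 # random minority sample
        j = neighbours[i, rng.integers(k)]       # one of its k neighbours
        gap = rng.random()                       # uniform in [0, 1)
        synthetic[t] = X[i] + gap * (X[j] - X[i])
    return synthetic
```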

2) ADVERSARIAL ATTACK
Adversarial attack is a popular technique in image processing. Adversarial attack refers to adding adversarial noise to the original image to make the model misclassify. In [20], they use DeepFool to achieve adversarial attack and cause a misclassification with high confidence. In [21], they can misclassify 70.97% of images by changing one pixel. In [22], they cause misclassifications with adversarial attack limited to the border of images. The success of adversarial attack is especially exaggerated as the resolution of images increases.

3) GAN-BASED DATA AUGMENTATION
The use of GANs to perform data augmentation in medical imaging shows good results. In [23], a cervical intraepithelial neoplasia grade classification problem is investigated on segmented epithelium image patches. A conditional GAN is applied to expand the limited training dataset by synthesizing realistic cervical histopathology images. To control the feature quality of selected synthetic images for data augmentation, they propose a synthetic-image filtering mechanism based on the divergence in feature space between generated images and class centroids. As a result, they increase the classification accuracy from 66.3% to 71.7% using the same ResNet18 baseline classifier after leveraging conditional GAN generated images with feature based filtering.
In [24], a method is proposed for generating liver lesion images using a deep learning GAN. It is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (cysts, metastases, and hemangiomas) and improves the accuracy of liver lesion classification by 7% through synthetic data augmentation, showing that the generated medical images can be used for synthetic data augmentation and improve the performance of CNNs for medical image classification. In [25], a generative algorithm is proposed to produce synthetic abnormal brain tumor multi-parametric MRI images from their corresponding segmentation masks using an image-to-image translation GAN. Under the ''Real+Synthetic'' experimental setting, the Dice score (mean/standard deviation) is 0.82/0.08, a good result. In [26], a novel approach is introduced to generate synthetic medical images using GANs. The proposed model can create brain PET images for three different stages of Alzheimer's disease: normal control (NC), mild cognitive impairment (MCI), and Alzheimer's disease (AD). The mean SSIM between the real and generated images is 77.48, indicating good generation performance.
In [27], Conditional Progressive Growing of Generative Adversarial Networks (CPGGANs) are proposed for medical image data augmentation. To improve training robustness, CPGGANs use automatic bounding box annotation and incrementally incorporate highly rough bounding box conditions into Progressive Growing of Generative Adversarial Networks (PGGANs), placing brain metastases at desired positions and sizes on 256 × 256 Magnetic Resonance (MR) images for Convolutional Neural Network-based tumor detection. This approach is highly novel. The work of [28] proposes a two-step GAN-based data augmentation method that generates and refines brain MR images with and without tumors separately. In the first step, PGGANs are used to generate realistic and diverse 256 × 256 images. In the second step, Multimodal UNsupervised Image-to-image Translation (MUNIT), which combines GANs and Variational AutoEncoders, or SimGAN is used to further refine the texture and shape of the PGGAN-generated images toward the real ones. The resulting tumor-detection sensitivity is between 93.67% and 97.48%. The work of [29] evaluates the use of CycleGAN for data augmentation in CT segmentation tasks, where a CycleGAN is trained to transform contrast CT images into non-contrast images. This approach can reduce manual segmentation effort and cost in CT imaging.
For natural scenes, using GANs for data augmentation also shows good results. In [6], data augmentation is applied to semantic segmentation of natural scenes to balance the label distribution and thereby improve segmentation performance. A pix2pixHD [30] model is used to generate realistic images conditioned on a specific semantic label map. The generated images not only improve the segmentation performance of classes with low accuracy but also increase the average segmentation accuracy by 1.3% to 2.1%.
Although GANs are powerful, it is very difficult for them to generate high-resolution images, and they require a huge number of training images. Thus, using GANs to solve small dataset problems remains a challenge [31].

4) THE APPLICATIONS AND HIGH-PERFORMANCE IMPLEMENTATIONS OF DEEP LEARNING IN BIOMEDICAL AND BIOLOGICAL INFORMATICS
At present, deep learning is widely used in the biomedical and biological informatics domains. For example, [32], [33] use a genetic hierarchical network and SVM to predict credit scores. References [34], [35] use deep neural networks to evaluate and diagnose electrocardiogram signals. Reference [36] uses ResNet networks to recognize sensor signals. All these studies obtain promising results and show the advantages and high performance of deep learning. References [37], [38] present cuART, a CUDA-based CT image reconstruction tool based on the algebraic reconstruction technique (ART). They propose a symmetry-based CSR format (SCSR) to further compress the CSR data structure and optimize data access for both Sparse Matrix-Vector multiplication (SpMV) and SpMV-T via a column-indices permutation, and they report good experimental results. Reference [39] develops MemXCT, a memory-centric system that uses an optimized SpMV implementation with two-level pseudo-Hilbert ordering and multi-stage input buffering, avoiding redundant computations at the expense of additional memory complexity. MemXCT can reconstruct a large (11K × 11K) mouse brain tomogram in 10 seconds using 4096 KNL nodes (256K cores), which is a good result.

C. EM IMAGE CLASSIFICATION
For EM image classification, there are two basic categories of feature extraction techniques: ''hand-crafted features'' and ''feature learning'' [40]. The basic idea of ''hand-crafted features'' is as follows. First, we extract the features of EM images. Second, we classify these EM images according to the features. The hand-crafted features include global shape, local shape (including SIFT), texture, color, etc. However, hand-crafted features are insufficient for representing diverse appearances of EMs because hand-crafted features are manually designed based on prior knowledge and investigation. Compared to this, feature learning is a better technique for EM image classification, including Bag of Visual Words (BoVW), Sparse Coding (SC), deep learning, etc.
Unlike methods that extract specific features, feature learning extracts features that may have no explicit practical meaning but can represent the diverse appearances of EMs. For example, in [40], conditional random fields and deep convolutional neural networks are used to extract features and classify EM images. Their dataset contains 20 classes of EMs, each represented by 20 microscopic images. The mean average precision of the experimental results is 91.40%, showing that the classification accuracy is very high and the method is effective. In [41], SC is used to classify EM images. Their dataset contains 15 classes of EMs, each represented by 20 microscopic images. To overcome the small dataset problem and effectively represent scarce training images, they use SC, which extracts local features from an image and reconstructs it as a sparse linear combination of bases. They also use weakly supervised learning to jointly perform the localization and classification of EMs by examining the local information in training images. The mean average precision of the experimental results is about 55%, which shows the effectiveness of the method.
As these studies show, EM image classification has always suffered from the problem of small datasets. To improve classification accuracy, these studies all design better classification methods rather than augmenting EM image datasets. EM image augmentation therefore offers an alternative route to improving classification accuracy.

III. ENHANCED FRAMEWORK OF GENERATIVE ADVERSARIAL NETWORKS
A. GENERATIVE ADVERSARIAL NETWORKS
GAN was first introduced in [42]. Based on game theory, it consists of a generator (G) network and a discriminator (D) network. The purpose of G is to generate fake images that deceive D, and the purpose of D is to distinguish real images from the fake images generated by G. Training drives G and D toward a Nash equilibrium, at which D can no longer distinguish real images from the fake images generated by G. The framework of GAN is shown in Fig. 2. Because of their good performance in generating images, many improved GANs have been proposed in recent years. For example, Deep Convolutional Generative Adversarial Networks (DCGAN) [43], Wasserstein Generative Adversarial Networks (WGAN) [44], and Improved Wasserstein Generative Adversarial Networks (WGAN-GP) [45] are highly praised GAN models.
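For reference, the adversarial game described above is usually written as the minimax objective of [42]. The notation below ($p_{\text{data}}$ for the real-image distribution, $p_z$ for the noise prior, $z$ for the noise vector) is ours, not taken from this paper:

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

For a fixed G, the optimal discriminator is $D^{*}(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_{g}(x))$, where $p_g$ is the distribution of $G(z)$. At the equilibrium $p_g = p_{\text{data}}$, so $D^{*}(x) = 1/2$ everywhere, which is the formal sense in which D can no longer tell real images from generated ones.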
The discriminator and generator of DCGAN use convolutional neural networks (CNNs) in place of the multi-layer perceptrons of GAN. Meanwhile, to make the whole network differentiable, the pooling layers of the CNN are removed, and global pooling is used in place of fully connected layers to reduce computation. DCGAN is thus an improvement on GAN whose modifications lie mainly in the network structure. DCGAN greatly improves both the stability of GAN training and the resolution of the generated images.
Unlike DCGAN, WGAN improves GAN mainly through the loss function. WGAN gives a theoretical explanation for the instability of GAN training: the Jensen-Shannon (JS) divergence [46] implied by the cross-entropy loss is not suitable for measuring the distance between the generated data distribution and the real data distribution. Instead, WGAN uses the Wasserstein distance [44] to measure this distance, which theoretically solves the problem of unstable training. However, using the Wasserstein distance requires the discriminator (critic) to be Lipschitz continuous [47]. To satisfy this condition, the authors enforce Lipschitz continuity by clipping the weights to a fixed range, but this in turn causes vanishing or exploding gradients.
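In standard notation (ours, not the paper's), the Wasserstein-1 distance is written through its Kantorovich-Rubinstein dual, and WGAN approximates the supremum with a weight-clipped critic $f_w$:

```latex
W(p_{\text{data}}, p_g) = \sup_{\|f\|_{L} \le 1}
  \mathbb{E}_{x \sim p_{\text{data}}}\big[f(x)\big]
  - \mathbb{E}_{\tilde{x} \sim p_g}\big[f(\tilde{x})\big]

\min_{G}\max_{w \in \mathcal{W}}
  \mathbb{E}_{x \sim p_{\text{data}}}\big[f_w(x)\big]
  - \mathbb{E}_{z \sim p_z}\big[f_w(G(z))\big],
\qquad \mathcal{W} = \{\, w : |w_k| \le c \ \text{for all } k \,\}
```

The clipping constant $c$ (0.01 in [44]) is what "restricting the weights to a range" refers to; choosing it too small tends to make gradients vanish, and too large makes them explode.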
To solve the problem of vanishing or exploding gradients and to find a better way to satisfy the Lipschitz continuity condition, WGAN-GP [45] replaces weight clipping with a gradient penalty. WGAN-GP converges faster than WGAN and can produce higher-resolution images.
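The critic loss of WGAN-GP, in the notation commonly used for [45] (symbols ours), adds a penalty that pushes the critic's gradient norm toward 1 at points $\hat{x}$ sampled on straight lines between real and generated images:

```latex
L = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},\quad \epsilon \sim U[0,1]
```

The penalty weight $\lambda$ is set to 10 in [45]. Because the constraint is soft and applied to gradients rather than weights, it avoids the capacity loss and gradient pathologies of clipping.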

B. DETAILS OF THE EF-GANs
As shown in Fig. 3, the structure of the EF-GANs is as follows. In step-1, we rotate the original images to the same direction and perform color space transformation. In (I), let the pixel value of the GT image at point (x, y) be $I_{GT}(x, y)$. In (II), the GT image is rotated counterclockwise in 3° steps from 0° to 180°, and the bounding rectangle with the smallest area is called the minimum bounding rectangle (MBR). As shown in Fig. 4, we denote the rotation angle (in radians) at which the MBR is obtained by θ. The long axis is the line connecting the midpoints of the two short sides of the bounding rectangle, and the short axis is the line connecting the midpoints of its two long sides. In Fig. 4, they are the two perpendicular blue dashed lines, and the orange cross is the centroid.
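Step-1 (II) can be sketched as a brute-force angle search. The sketch below assumes the GT image is available as a binary NumPy mask and rotates the foreground pixel coordinates about their centroid instead of resampling the image; the function name and the returned dictionary are hypothetical, not the authors' code. Since a rectangle rotated by 180° coincides with the unrotated one, the search stops just before 180°.

```python
import numpy as np


def min_bounding_rect(mask, step_deg=3):
    """Search for the minimum-area bounding rectangle of a binary GT mask.

    Foreground pixel coordinates are rotated in step_deg increments over
    [0, 180) degrees; the angle whose axis-aligned bounding box has the
    smallest area is returned together with that box's size.
    Note: angles are counterclockwise in (x, y) array coordinates; because
    image rows grow downward, this appears clockwise on screen.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    best = None
    for deg in range(0, 180, step_deg):
        theta = np.deg2rad(deg)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        q = (pts - centroid) @ rot.T           # rotate about the centroid
        w = q[:, 0].max() - q[:, 0].min()      # extent along x
        h = q[:, 1].max() - q[:, 1].min()      # extent along y
        if best is None or w * h < best[0]:
            best = (w * h, theta, w, h)
    area, theta, w, h = best
    return {"theta": theta, "width": w, "height": h,
            "area": area, "centroid": centroid}
```

After rotating by the returned θ, the long axis is horizontal when `width >= height`; otherwise one extra quarter turn is needed. Extents are measured between pixel centers, so a one-pixel-thick object yields zero height.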
In (III), because the GT image and the original image correspond pixel by pixel, we transfer the MBR and centroid of the GT image to the original image. In (IV), we rotate the original image so that its long axis is horizontal and its centroid lies on the same side for all images. Specifically, if the centroid is on the left of the short axis, we rotate the original image by θ, as shown in Fig. 5(a); if the centroid is on the right of the short axis, we rotate the original image by θ + π, as shown in Fig. 5(b). In (V), we perform color space transformation on the rotated image. Specifically, we generate fifteen color variants of the image by multiplying it by random numbers between 0 and 1. Step-1 thus augments each original image fifteenfold.
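The θ versus θ + π rule of step-1 (IV) can be expressed as a small helper. This sketch assumes the rotation is counterclockwise in (x, y) coordinates, that the short axis passes through the rectangle center, and that "left" is judged after rotating by θ; the function name and arguments are hypothetical.

```python
import math


def step1_rotation(theta, centroid, rect_center):
    """Return the step-1 (IV) rotation angle in radians.

    After rotating by theta the long axis is horizontal and the short axis
    runs vertically through the rectangle centre. If the centroid then lies
    left of the short axis we keep theta; otherwise we add pi so that every
    EM of a class ends up with its centroid on the same (left) side.
    """
    dx = centroid[0] - rect_center[0]
    dy = centroid[1] - rect_center[1]
    # Signed x-offset of the centroid from the short axis after rotating by theta.
    x_after = dx * math.cos(theta) - dy * math.sin(theta)
    return theta if x_after < 0 else theta + math.pi


print(step1_rotation(0.0, (-1.0, 0.0), (0.0, 0.0)))  # 0.0
print(step1_rotation(0.0, (1.0, 0.0), (0.0, 0.0)))   # 3.141592653589793
```

Adding π mirrors the centroid through the rectangle center, so the extra half turn always moves a right-side centroid to the left while keeping the long axis horizontal.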
In step-2, the augmented images from step-1 are input into GANs to generate more images. Specifically, we choose various GANs and insert them into the EF-GANs, such as DCGAN, WGAN, and WGAN-GP.
In step-3, because the angle of an EM in an image does not affect its class label, we rotate images generated by the EF-GANs in different directions to increase image diversity.
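In the experiments, step-3 uses the four quarter turns 0°, 90°, 180°, and 270°. A minimal sketch with `np.rot90`, assuming the generated images are NumPy arrays; the function name is hypothetical. Multiples of 90° need no interpolation and leave no empty corners to fill.

```python
import numpy as np


def step3_rotations(images, quarter_turns=(0, 1, 2, 3)):
    """Rotate every generated image by 0, 90, 180 and 270 degrees."""
    return [np.rot90(img, k) for img in images for k in quarter_turns]


generated = [np.zeros((4, 6)) for _ in range(300)]   # stand-in for GAN outputs
rotated = step3_rotations(generated)                  # 1200 images
```

For non-square images, odd quarter turns swap height and width, so the results may need resizing before they are fed to a fixed-size classifier.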
As shown in Fig. 3, step-1 augments the images fifteenfold. This addresses the fact that GANs need plenty of training images to generate high-resolution images. In step-2, we use GANs to augment the images further and increase image diversity. In step-3, we restore the directional variety of the images through rotation. Together, step-1, step-2, and step-3 augment each image 240-fold (15 × 4 × 4) without degrading image quality. Because we rotate EM images so that their long axes are horizontal and their centroids lie on the same side, we help the GAN reduce the distance between the real and fake images, making the images easier to generate. Therefore, GANs can generate effective images. Although some background information of the EM images is lost in step-1, the background contributes little to GAN image generation and EM classification, so this trade-off is worthwhile.

2) DATASET SETTINGS
We randomly divide each EM class into three parts with a ratio of 1:1:2, corresponding to the training, validation, and test sets. Therefore, for each EM class, we have five, five, and ten images for training, validation, and testing, respectively. To avoid interference, we only use the training set for image augmentation, and the validation and test sets are used for evaluation. In step-1, the five training images of each EM class are expanded to 75 images: we multiply the EM images by random numbers between 0 and 1, mapping each to 15 different color variants. In step-2, we use these 75 images to train EF-GANs and obtain 300 generated images. In the experiment, we observe that after 240 to 250 images have been generated, the images become increasingly similar, so we heuristically cap the number of generated images at 300 to avoid redundant information. In step-3, we rotate each of these 300 images by 0°, 90°, 180°, and 270°, which results in 1200 training images.
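The per-class 1:1:2 split can be sketched with the standard library. The file names and the seeding scheme below are hypothetical; only the ratio and the 20 images per class come from the text.

```python
import random


def split_class(images, ratios=(1, 1, 2), seed=None):
    """Randomly split one EM class into train/validation/test by the given ratio."""
    items = list(images)
    random.Random(seed).shuffle(items)        # reproducible when seed is given
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])          # remainder goes to the test set


files = [f"img_{i:02d}.png" for i in range(20)]   # hypothetical names, one class
train, val, test = split_class(files, seed=0)     # 5, 5 and 10 images
```

Splitting per class, rather than over the pooled dataset, guarantees that every class keeps exactly five training images, which is what the step-1 count of 75 images per class assumes.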

3) EXPERIMENTAL ENVIRONMENT
The experiments are implemented in Python 3. The models used in our experiments are implemented with the Keras [48] framework and a TensorFlow [49] backend. We use a workstation running Windows 10 with an Intel(R) Core(TM) i7-8700K CPU at 3.70GHz, 16GB RAM, and an NVIDIA GEFORCE RTX 1080 8GB GPU.

B. EVALUATION OF IMAGE GENERATION
As shown in Fig. 7, we compare the images generated by the original GANs (i.e., DCGAN, WGAN, and WGAN-GP) with those generated by EF-GANs (i.e., EF-DCGAN, EF-WGAN, and EF-WGAN-GP). The EM images generated by EF-GANs have higher visual quality than those generated by the original GANs. Specifically, the images generated by the original GANs are full of noise, so it is impossible to tell which EMs they depict. In contrast, EF-GANs not only generate distinguishable EM images but also achieve acceptable image resolution. EF-GANs outperform the original GANs in generating EM images because EF-GANs can generate rotation-invariant images, whereas the original GANs cannot. For example, in [43], face images are generated using DCGAN, and face images are directional (the eyes are at the top, the nose is in the middle, and the mouth is at the bottom). In [44], indoor images are generated by WGAN, and indoor images are directional (furniture stands on the floor). In [45], WGAN-GP generates bedroom images, which are directional (beds and chairs stand on the floor). EF-GANs unify the EM direction using classical geometric methods, so the GAN does not need to model orientation and can generate high-quality EM images. There are also differences among the EF-GANs. The EM images generated by EF-DCGAN can be distinguished by eye and have clean backgrounds, but some are blurry, their resolution is not high, and their outlines are not very clear. The EM images generated by EF-WGAN have clearer outlines, cleaner backgrounds, and higher resolution. The EM images generated by EF-WGAN-GP have clearer outlines, higher sharpness, higher resolution, and less noise. Examples of the augmented images generated by EF-DCGAN, EF-WGAN, and EF-WGAN-GP are shown in Fig. 8.

There are K classes of EM images in all.
According to Eq.5, we average all classes of APs to get mAP.
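Eq. 5 itself is not reproduced in this section. The sketch below assumes that each AP is the usual non-interpolated average precision of a ranked list of classifier scores (the mean of precision@k over the ranks k of the positive samples, with ties broken by sort order) and that Eq. 5 is the plain average over the K classes, $\mathrm{mAP} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{AP}_k$. The function names are illustrative, not taken from the authors' code.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of the positives.

    labels: 1 for the positive class, 0 for the rest; scores: classifier confidence.
    Ties in `scores` are broken by the (stable) sort order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positive samples")
    # Rank samples from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k  # precision@k at each retrieved positive
    return precision_sum / n_pos


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP as the plain average of the K per-class APs."""
    if not ap_per_class:
        raise ValueError("need at least one AP value")
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(round(ap, 4))                        # 0.8333 = (1/1 + 2/3) / 2
    print(mean_average_precision([1.0, 0.5]))  # 0.75
```

Libraries such as scikit-learn provide an equivalent AP function with more careful tie handling; the hand-written version only makes the averaging explicit.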

2) EVALUATION OF DATA AUGMENTATION FOR EM IMAGE CLASSIFICATION
In this section, we show that image augmentation improves EM image classification performance. We use VGG16 networks to measure the classifier's improvement when different numbers of augmented images are added to the training set. Because this is a multi-class classification task, we design the experiment with a one-vs-rest strategy. First, for the 21 EM image categories $\{\omega_1, \dots, \omega_{21}\}$, we divide the data into a positive and a negative class: for each $i \in \{1, \dots, 21\}$, $\omega_i$ is the positive class and $\bigcup_{j=1, j \neq i}^{21} \omega_j$ is the negative class. Second, for each one-vs-rest classification task, we add different numbers of images generated by the different EF-GANs to the training set. Third, we train the VGG16 networks on the training set and compute the Average Precision (AP) on the validation set. Finally, for each augmentation setting, we compute the mean Average Precision (mAP) over the 21 one-vs-rest tasks. The results are shown in Table 1.
Each augmentation setting is named by the EF-GAN used and the number of added images. For example, "EF-DCGAN,15" means that 15 images generated by EF-DCGAN are added to the training set, while "Original" means that only the original training set is used.
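The one-vs-rest protocol can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: images are represented by hypothetical string identifiers, the helper names are hypothetical, and, since each EF-GAN is trained on a single EM class, the generated images are assumed to be added to the training set as positives only. The "Original" setting corresponds to `n_added=0`.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (hypothetical image identifier, binary label)


def one_vs_rest_labels(class_ids: Sequence[int], positive: int) -> List[int]:
    """Label omega_positive as 1 and the union of all other classes as 0."""
    return [1 if c == positive else 0 for c in class_ids]


def build_training_set(
    image_ids: Sequence[str],
    class_ids: Sequence[int],
    positive: int,
    generated_ids: Sequence[str],
    n_added: int,
) -> List[Sample]:
    """Real images relabeled for one task, plus the first n_added generated positives."""
    if len(image_ids) != len(class_ids):
        raise ValueError("image_ids and class_ids must have the same length")
    if n_added > len(generated_ids):
        raise ValueError("not enough generated images for this setting")
    labels = one_vs_rest_labels(class_ids, positive)
    real = list(zip(image_ids, labels))
    synthetic = [(g, 1) for g in generated_ids[:n_added]]  # assumed positive-only
    return real + synthetic


if __name__ == "__main__":
    ids = ["img0", "img1", "img2"]
    classes = [1, 2, 3]
    fake = ["ef_wgan_0", "ef_wgan_1"]
    print(build_training_set(ids, classes, positive=2, generated_ids=fake, n_added=2))
    # [('img0', 0), ('img1', 1), ('img2', 0), ('ef_wgan_0', 1), ('ef_wgan_1', 1)]
```

Looping this over the 21 positive classes and over every setting such as "EF-DCGAN,15" yields one training set per table cell.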
Because our datasets are extremely imbalanced, accuracy alone is not enough to measure classification performance. The ratio of positive to negative samples in our validation sets is 1:20. If the classifier always predicts the negative class, its accuracy is 95.24% (20/21), which looks very high, but only because the true negative rate is 100% while the recall (true positive rate) is 0%. Such a result is meaningless. Therefore, we also use precision and recall with confidence intervals, as well as the F1 score, the harmonic mean of precision and recall. Fig. 9 shows the accuracy, precision, recall, and F1 score. The "Optimal" setting is greater than or equal to "Original" in accuracy on 20 of 21 tasks, in precision on 19 of 21 tasks, in recall on 20 of 21 tasks, and in F1 score on 20 of 21 tasks. The 90% confidence intervals for all these results lie between 0% and 8%. These results show that the EM images augmented by EF-GANs can improve classification performance.
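The degenerate 1:20 case above is easy to check numerically. The sketch below computes accuracy, precision, recall, specificity, and F1 from binary predictions; it reports precision as 0 when nothing is predicted positive, a convention we assume here. The section does not say how the 90% confidence intervals are obtained, so the Wilson score interval (z ≈ 1.645) is included only as one common choice for a proportion, not as the authors' method; statsmodels offers the same interval via `proportion_confint(..., method="wilson")`.

```python
import math
from typing import Dict, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall (TPR), specificity (TNR) and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": f1,
    }


def wilson_interval(successes: int, n: int, z: float = 1.645) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.645 gives about 90% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # One positive and 20 negatives; the classifier always predicts "negative".
    m = binary_metrics([1] + [0] * 20, [0] * 21)
    print(round(m["accuracy"], 4), m["specificity"], m["recall"], m["f1"])  # 0.9524 1.0 0.0 0.0
    print(wilson_interval(8, 10))
```

The printed line reproduces the argument in the text: high accuracy and perfect specificity coexist with zero recall and zero F1.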

3) DISCUSSION OF THE VGG16 CLASSIFICATION RESULTS
We evaluate the EM image generation ability of EF-GANs by the improvement in VGG16 classification results. Table 1 shows the classification performance of the VGG16 networks on the 21 one-vs-rest EM image classification tasks. After data augmentation, the APs of the VGG16 networks increase by 4.5% to 84.1% on 20 tasks. "EF-WGAN,15" achieves the highest AP in six tasks and also the highest mAP, which shows that a small number of augmented images is enough to improve VGG16 classification performance. This is because, when a large number of images are added, the newly augmented images also carry background information from images of other classes. Fig. 10 plots the mAP results of Table 1 as a line graph. The mAP is highest when 15 images augmented by EF-WGAN and EF-WGAN-GP are added, and second highest when 15 images augmented by EF-DCGAN are added. Thus, adding a small number of generated EM images to the training sets gives better results. When only a few generated images are added, the mAP increases; when more images generated by EF-GANs are added, they also bring more noise, and the mAP decreases. Adding more EM images to the training sets is therefore not always helpful, because the generated EM images are synthetic. In short, many synthetic EM images in the training sets bring more noise, and once the noise accumulates beyond a certain level, the training effect declines.
In the VGG16 classification tasks "$\omega_5$ vs rest," "$\omega_7$ vs rest," "$\omega_{10}$ vs rest," and "$\omega_{17}$ vs rest," the AP improves by less than 10%. Inspecting these EMs, we find that in the tasks "$\omega_7$ vs rest," "$\omega_{10}$ vs rest," and "$\omega_{17}$ vs rest," the original APs are already above 67%, which leaves less room for improvement. For $\omega_5$, the boundaries of the augmented images are unclear, which limits the improvement to less than 10%.
In the VGG16 classification tasks "$\omega_8$ vs rest," "$\omega_9$ vs rest," "$\omega_{11}$ vs rest," "$\omega_{12}$ vs rest," "$\omega_{13}$ vs rest," "$\omega_{14}$ vs rest," "$\omega_{15}$ vs rest," "$\omega_{16}$ vs rest," "$\omega_{18}$ vs rest," "$\omega_{19}$ vs rest," and "$\omega_{21}$ vs rest," the AP improves by more than 40%. These EMs are roughly circular, and some have a few elongated body parts; we attribute the large improvement to these shapes. In the "$\omega_6$ vs rest" task, the best result is obtained without adding augmented images to the training set. Inspecting the $\omega_6$ images, we find that this class is difficult to classify because some of its EMs are clustered together while others are isolated.
In conclusion, EF-GANs have a strong ability to generate isolated EMs but a weak ability to generate clusters of EMs, and adding a small number of generated images is more effective than adding a large number.

D. ADDITIONAL EXPERIMENT A: TEST WITH ResNet50
To verify the robustness of our method, we also use ResNet50 networks to complete the 21 one-vs-rest EM image classification tasks; the results are shown in Table 2. The APs of the ResNet50 networks increase by 8.7% to 38.7% in 12 tasks, decrease slightly in 7 tasks, and remain unchanged at 100% in 2 tasks. This supports the effectiveness of our method. For the seven classes of EM images whose APs decrease, inspection shows that the EM boundaries are unclear and almost merge with the background, which explains the decrease. Our method therefore works well for generating single EMs with clear edges and clean backgrounds, but is weaker for EM images with fuzzy edges, impure backgrounds, or clusters.

E. ADDITIONAL EXPERIMENT B: TEST WITH OTHER DATASET
Because the EMDS-5 test set contains only 10 images per EM class, we collected another 630 EM images to further validate our method: 21 classes of EM images with 30 images per class. Based on the EMDS-5 results in Table 1, we test each class of EM images with three kinds of VGG16 networks. The first is the VGG16 network without augmentation, "Original." The second is the VGG16 network with the optimal augmentation setting for all EM classes, "EF-WGAN,15," according to Table 1. The third is the VGG16 network with the optimal augmentation setting for each single EM class, that is, 21 different VGG16 networks, each of which is optimal for one EM class according to Table 1. This experiment reflects the generalization performance of our method, and the results are shown in Table 3.
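The two selection rules that define the second and third networks can be written down directly. The sketch below assumes a hypothetical table mapping each augmentation setting (for example "EF-WGAN,15") to its list of per-task APs on EMDS-5; the numbers in the usage example are toy values, not results from Table 1. The global rule picks the setting with the highest mAP, while the per-class rule picks, for each task separately, the setting with the highest AP.

```python
from typing import Dict, List, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def best_setting_overall(ap_table: Dict[str, List[float]]) -> str:
    """Setting with the highest mAP over all one-vs-rest tasks (first one wins ties)."""
    return max(ap_table, key=lambda setting: mean(ap_table[setting]))


def best_setting_per_task(ap_table: Dict[str, List[float]]) -> List[str]:
    """For each task index, the setting with the highest AP on that task alone."""
    n_tasks = len(next(iter(ap_table.values())))
    return [max(ap_table, key=lambda s: ap_table[s][t]) for t in range(n_tasks)]


if __name__ == "__main__":
    toy = {  # hypothetical APs for three tasks, not the paper's numbers
        "Original": [0.70, 0.40, 0.90],
        "EF-WGAN,15": [0.80, 0.60, 0.85],
        "EF-DCGAN,15": [0.75, 0.65, 0.80],
    }
    print(best_setting_overall(toy))   # EF-WGAN,15
    print(best_setting_per_task(toy))  # ['EF-WGAN,15', 'EF-DCGAN,15', 'Original']
```

As the toy example shows, the per-task rule can keep "Original" for a class where no augmented setting helps, which mirrors the single task with a lower AP discussed above.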
From Table 3, we find that in 20 of the 21 one-vs-rest EM image classification tasks, the APs with "Optimal setting for all EM classes" and "Optimal setting for single EM classes" are 1.66% to 88.1% higher than with "Original." In only one task is the AP with "Original" higher than with both optimal settings. The mAPs with "Optimal setting for all EM classes" and "Optimal setting for single EM classes" are 31.95% and 30.81% higher than with "Original," respectively. This shows that our generated images are useful for EM image classification. For the only task with a reduced AP, the AP decreases by less than 10% compared with no augmentation. Inspecting that class of EM images, we find that its EMs are all clustered together, so the VGG16 networks cannot distinguish them. This indicates that our method is weak at generating clustered EMs that would increase the classification APs. In conclusion, testing the VGG16 networks on a completely different EM test set gives promising results, showing both that our augmented EM images are useful for EM image classification and that our method is robust.

[Table caption, partially recovered: The first row shows the classification tasks, the second to the bottom rows show the classification APs, and the first column shows the EF-GANs and the numbers of added images. In each column, values in bold are greater than or equal to "Original," and values with stars are the maximum.]

[Table 3 caption, partially recovered: The first row shows the classification tasks. The second row shows the classification APs without augmentation, the third row the APs with the optimal augmented setting for all EM classes (EF-WGAN,15), and the fourth row the APs with the optimal augmented setting for single EM classes. Values in bold are the maximum of each column.]

VOLUME 8, 2020

F. COMPUTATION TIME
In terms of computation time, training DCGAN, WGAN, and WGAN-GP to generate a single class of EM images takes about 20-30 minutes, with a batch size of 32, a constant learning rate of 0.0002, and 800 training epochs. These parameters follow [44] so that the EF-GANs train well. For classifying a single class of EM images, training VGG16 and ResNet50 takes about 65 seconds, with a batch size of 5, a constant learning rate of 0.001, and 50 training epochs, and testing takes 13 seconds. Our GPU, an NVIDIA GeForce GTX 1080 with 8 GB of memory, is not high-end; switching to an NVIDIA GeForce RTX 2080 Ti with 11 GB [50] could theoretically reduce the training time by about 40%.
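For reproducibility, the reported hyperparameters can be collected in one place. The sketch below is plain Python with hypothetical class and field names; it records only the values stated above (the optimizer and other settings are not given in this section and are deliberately left out) and converts epochs into mini-batch iterations for a training-set size, which is an assumed input.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int
    learning_rate: float  # held constant during training, as reported
    epochs: int

    def iterations(self, n_train_images: int) -> int:
        """Mini-batch iterations over all epochs, counting a final partial batch per epoch."""
        if n_train_images <= 0:
            raise ValueError("n_train_images must be positive")
        return self.epochs * math.ceil(n_train_images / self.batch_size)


# Values as reported in this section.
GAN_CONFIG = TrainConfig(batch_size=32, learning_rate=0.0002, epochs=800)      # per EM class
CLASSIFIER_CONFIG = TrainConfig(batch_size=5, learning_rate=0.001, epochs=50)  # VGG16 / ResNet50
```

For instance, with a hypothetical training set of 100 images, `GAN_CONFIG.iterations(100)` gives 800 × 4 = 3200 iterations. WGAN-style training may take several critic steps per iteration, which this sketch does not model.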

V. CONCLUSION AND FUTURE WORK
In this paper, we propose EF-GANs for the EM image augmentation task. In step-1, we rotate the original images to the main axis direction and perform color space transformation. In step-2, we feed the augmented images from step-1 to EF-GANs to generate new images. In step-3, we rotate the generated images from step-2 to different directions. In total, the images are augmented 300-fold. This addresses the problem that GANs cannot otherwise be used to augment a small dataset. We then add the augmented images to the training sets of VGG16 networks to test their validity: in the 21 one-vs-rest EM image classification tasks, the APs of 20 EM categories improve, 11 of them by more than 40%. This shows that our method can generate high-quality EM images and improve EM image classification results. For only one class of EM images does augmentation fail to improve the AP of VGG16. Inspecting this class, we find that some of its EMs are clustered together, so they are hard to generate and the APs of VGG16 decline when generated images are added. This shows that EF-GANs have a weak ability to generate EM images in which multiple EMs are clustered together.
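Step-1 and step-3 both rely on an orientation for each EM. This section does not specify how the main axis is found; one standard choice, assumed here, is the orientation of the major axis of a binary EM mask computed from its second-order central image moments. The sketch below computes that angle (in image coordinates, with y pointing down) and an evenly spaced set of rotation angles for step-3. The function names and the number of directions are hypothetical, and the actual pixel rotation (for example with an image library, whose sign convention must be checked) is left out.

```python
import math
from typing import List, Sequence


def principal_axis_angle(mask: Sequence[Sequence[int]]) -> float:
    """Orientation of the mask's major axis in degrees, in [0, 180).

    Uses second-order central moments of the foreground pixels; x grows to the
    right and y grows downward (image coordinates). For shapes with no dominant
    axis (e.g. a disc), mu20 == mu02 and mu11 == 0, so the angle is arbitrary.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        raise ValueError("mask has no foreground pixels")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    angle = math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))
    return angle % 180.0  # an axis orientation is only defined modulo 180 degrees


def rotation_schedule(n_directions: int) -> List[float]:
    """Evenly spaced rotation angles in degrees for re-orienting generated images."""
    if n_directions < 1:
        raise ValueError("n_directions must be at least 1")
    return [360.0 * k / n_directions for k in range(n_directions)]


if __name__ == "__main__":
    diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(round(principal_axis_angle(diagonal), 6))  # 45.0
    print(rotation_schedule(4))                      # [0.0, 90.0, 180.0, 270.0]
```

Rotating each mask by the negative of this angle (under a counterclockwise-positive convention) would bring its main axis to the horizontal, which is the kind of orientation unification that lets the generator ignore direction.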
To test the quality of the generated EM images, we also use ResNet50 networks to complete the 21 one-vs-rest EM image classification tasks under the same experimental settings as the VGG16 networks. The APs of the ResNet50 networks increase by 8.7% to 38.7% in 12 tasks, decrease slightly in 7 tasks, and remain unchanged at 100% in 2 tasks. This shows that we can generate high-quality EM images and improve classification results. Inspecting the seven classes of EM images whose results decrease, we find that their boundaries are unclear and almost merge with the background, so our method cannot generate high-resolution images of these EMs.
To further validate our method, we use another, entirely different EM image set for testing. We apply three VGG16 networks previously trained on EMDS-5: "Original," "EF-WGAN,15," and the optimal VGG16 network for each single class of EM images. In 20 of the 21 one-vs-rest EM image classification tasks, the APs with "Optimal setting for all EM classes" and "Optimal setting for single EM classes" are higher than with "Original." This shows that our augmented images have high quality and high resolution and can improve the APs of EM image classification. Because the test images differ from the training data, the results also show that EF-GANs are robust on other EM image datasets.
In conclusion, EF-GANs have a strong ability to augment EM images that contain single EMs with clear boundaries, but a weak ability to augment EM images in which EMs are clustered together and have fuzzy boundaries. EF-GANs also generalize well to other EM image datasets. In the future, we plan to test our framework with more GAN variants to find a more effective solution for augmenting limited rotation-invariant EM images. We also plan to combine other geometric transformation techniques with GANs to augment EM images. For example, in [51], radial transformation is used in medical image augmentation for training deep neural networks, and we will combine radial transformation with GANs to augment EM images. We will use GPU parallel acceleration [52] to reduce the training time of EF-GANs. We will also apply EF-GANs to other medical images, such as cervical histopathology images, CT images of lung cancer, and CT-MR brain images.
In addition, [53] proposes a GAN called GauGAN for image-to-image translation. It converts a semantic segmentation mask into a photorealistic image using spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images from an input semantic layout. GauGAN can produce photorealistic outputs for diverse scenes, including indoor, outdoor, landscape, and street scenes. Hence, we plan to use this model for EM image augmentation: ground-truth (GT) images of EMs will be fed to GauGAN as the semantic layout to generate EM images. Since synthesizing GT images of EMs is easier than synthesizing EM images directly, we can synthesize many GT images of EMs and then use GauGAN to generate more EM images.

He is also the Head of the Environmental Engineering Institute. His research interests are microbial molecular ecology and systems biology, water pollution control, and waste recycling.
SHOULIANG QI (Member, IEEE) received the Ph.D. degree from Shanghai Jiao Tong University, in 2007. He is currently an Associate Professor with the Sino-Dutch Biomedical and Information Engineering School, Northeastern University, China. He joined the GE Global Research Center, where he was responsible for designing innovative magnetic resonance imaging (MRI) systems. From 2014 to 2015, he was a Visiting Scholar with the Eindhoven University of Technology and the Kempenhaeghe Epilepsy Center, The Netherlands. In recent years, he has been conducting productive studies in intelligent medical imaging computing and modeling, machine learning, brain networks, and brain models. He has published more than 80 papers in peer-reviewed journals and international conferences. He has won many academic awards, such as the Chinese Excellent Ph.D. Dissertation Nomination Award and the Award for Outstanding Achievement in Scientific Research from the Ministry of Education.
YUEYANG TENG worked with the PET-CT R&D Department, Shenyang Neusoft Medical System Co., Ltd., from 2005 to 2013. Since 2013, he has been working with Northeastern University, China. He is currently an Associate Professor with the School of Medicine and Bioinformatics, Northeastern University, and a Young Director of the Artificial Intelligence Branch of the Biomedical Engineering Society. His research areas include machine learning (especially deep learning) theory and application, neuroinformatics, and biomedical imaging.