Towards Imbalanced Image Classification: A Generative Adversarial Network Ensemble Learning Method

Learning from a minority class is a significant and challenging task with many potential applications. Weather classification is such a case of imbalanced label distribution: in places like Beijing, some types of weather, such as rain and snow, are relatively rare compared with sunny and hazy days. Existing methods classify weather conditions mainly by relying on dedicated sensors or human assistance, which is usually expensive and time-consuming. In this paper, we propose a new ensemble framework that combines a generative adversarial network with an effective data cleaning step to address the class imbalance problem in weather classification. The proposed method not only generates new and reliable samples for the minority class to restore balance, but also filters out those generated samples that are unreliable. Experiments show that our approach outperforms state-of-the-art methods by a large margin for imbalanced weather classification on several benchmark datasets.


I. INTRODUCTION
Weather conditions not only strongly influence our daily lives in many ways but also have many potential applications ranging from solar energy systems to outdoor intelligent transportation. While weather classification [1], [2] techniques have become quite accurate, most of them rely on expensive sensors or human assistance to identify weather conditions. If we can exploit existing surveillance systems, each of whose cameras captures thousands of weather images every day, it may be possible to turn an expensive sensor-based detection system into a powerful and cost-effective computer vision application.
In this paper, we formulate weather classification as an image classification problem based on computer vision. Weather image classification is influenced by various factors, such as complex lighting changes, shadows and object occlusion. In particular, although previous works [3]-[5] have provided interesting solutions for image classification, most of them only pay attention to feature extraction and matching [6] without considering imbalanced data. That is, these methods assume clean and balanced data. However, as the distribution of weather over the 365 days of a year in Beijing shown in Figure 1 indicates, more than 300 days a year are cloudy or sunny, while snowy or hazy weather accounts for only one to two percent of the year. Overall, real-life weather data are extremely imbalanced. (The associate editor coordinating the review of this manuscript and approving it for publication was Yuyu Yin.)
Existing classification methods tend to underperform on minority classes because their goal is to optimize overall accuracy regardless of the relative distribution of each class [7]. When a small number of samples (such as snowy days) is combined with a much larger class (such as sunny days) to train a model, most machine learning classification algorithms struggle to learn the characteristics of snowy days. To address this problem, we take measures from two directions: constructing a balanced dataset and designing a stronger classifier.
To balance the weather data, we can either add samples to the minority class as supplementary data or remove samples from the majority class. However, merely creating copies of existing samples may cause over-fitting, and reducing the number of samples may lose useful information by removing significant patterns [8]. Thus it is important to find a valid and reliable generative model that mimics the real minority classes of weather in order to augment and balance the training dataset for classification.
(Figure 1: Weather data for 365 days in Beijing in 2018. The data are extremely imbalanced: there are many sunny and cloudy days in a year, and almost no snowy days.)
Motivated by the remarkable success of Generative Adversarial Nets (GAN) [9] in computer vision and machine learning, we adopt GAN to increase the number of samples in the minority class. On the one hand, GAN can leverage a practically unlimited amount of unlabeled images to learn good intermediate representations [10]. On the other hand, instead of simply copying the content and style of the original images, the images produced by GAN exhibit diversity [11]. Moreover, GAN does not require the data to follow any specific assumptions when modeling complex data, even with implicit distributions. Therefore, GAN is naturally suitable for the task of data generation to augment and balance the training dataset for classification. Further, VGG is introduced to extract weather features from the augmented weather dataset, which consists of both the original and the generated data. Although some weather images generated by GAN are severely distorted, almost no studies have tried to filter the generated data. In this paper, we introduce the data cleaning method Edited Nearest Neighbours (ENN) [12] to remove from the training set noisy instances whose class label differs from the class of at least half of their k nearest neighbors. The accuracy of the classification model depends not only on balanced training data but also on the applied classifiers. We therefore present an ensemble learning method that jointly applies k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF) and a boosting method. The framework is shown in Figure 2.
To summarize, our main contributions are as follows:
• A framework is proposed for data augmentation using GAN to generate supplementary data for the weather classification task, aiming to address weather classification on an imbalanced dataset.
• To obtain high-quality data, data cleaning methods are introduced to select more reliably generated samples. In addition, we apply an ensemble learning method that uses KNN, SVM, RF and AdaBoost jointly.
• Extensive experiments have been conducted, and the results demonstrate that the proposed method is effective and performs better than training the model directly on unbalanced data.

II. RELATED WORK
A. LEARNING WITH IMBALANCED DATASETS
Weather classification has attracted much attention in computer vision recently. Current techniques related to weather classification mainly focus on balanced datasets [3], [4]. However, as is well known, weather classification is a case of imbalanced label distribution. Hence, learning with imbalanced weather datasets is of great significance. Data-level methods for addressing class imbalance include over-sampling and under-sampling. Chawla et al. [13] introduce the Synthetic Minority Over-sampling Technique (SMOTE), which produces artificial minority samples by interpolating between existing minority samples and their nearest minority neighbors. The majority weighted minority over-sampling technique (MWMOTE) [14] generates synthetic minority class samples according to Euclidean distance. In imbalanced data sets, noisy and borderline examples can create problems. To address this, some methods combine re-sampling with filtering techniques. Borderline-SMOTE [15] improves upon the original algorithm by also taking majority class neighbors into consideration, limiting over-sampling to the samples near class borders. SMOTE-IPF [16] addresses the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. However, the broadened decision regions of these SMOTE variants [15], [16] are still error-prone because they may synthesize noisy and borderline examples [17]. Therefore, although it risks under-fitting, under-sampling the larger class to make it similar in size to the smaller class is often preferred to over-sampling [18]. In [19], air pollutant data are used to identify weather quality, also in an extremely imbalanced setting, and a prior duplication strategy is applied to improve the forecasting of the minority class.
Different from these works, we aim to resolve this problem by generating weather data of minority classes based on GAN, which does not require the data to follow any specific assumptions when modeling the complex data even with some implicit distributions [20].

B. IMBALANCED LEARNING VIA GAN
Generative Adversarial Networks (GANs) are powerful generative models which have achieved impressive results in many computer vision tasks such as image generation [21], super resolution [22] and image-to-image translation [23]-[27]. GANs formulate generative modeling as a game between two competing networks: a generator network produces synthetic data given some input noise, and a discriminator network distinguishes between the generator's output and true data. The game between the generator G and the discriminator D is formulated as a minimax objective. Unlike GANs, which learn a mapping from a random noise vector to an output image, conditional GANs (cGANs) [10] learn a mapping from a random noise vector to an output image conditioned on additional information. Pix2pix is a generic image-to-image translation algorithm using cGANs [10].
It can produce reasonable results on a wide variety of problems. Given a training set which contains pairs of related images, pix2pix learns how to convert an image of one type into an image of another type, or vice versa. Cycle-consistent GANs (CycleGANs) [24] learn image translation without paired examples. It is also possible to invert the mapping of a cGAN [10], i.e., to map a real image back into a latent space and a conditional representation. More recently, StarGAN [27] was proposed to perform multi-domain image translation using a single network conditioned on the target domain label. It learns the mappings among multiple domains using only a single generator and a discriminator. Different from StarGAN, which learns all domain transformations within a single model, we train different simple composable translation networks for different attributes.
A well-known problem of generative adversarial models is that while they learn to fool the discriminator, they may end up drawing only one or a few repetitive examples. In this work, our aim is to augment an imbalanced image dataset to restore its balance. It is of paramount importance that the augmented dataset is varied enough and does not include a continuously repeating example. Thus we need to avoid mode collapse.

III. THE PROPOSED METHOD
The overall architecture of our method is shown in Figure 2. In the following, we focus on the implementation of the method and describe it in the form of an algorithm.

A. RESAMPLING WITH GENERATIVE ADVERSARIAL NETWORKS
As classification results are biased towards the majority classes in the classification process, some minority categories are seriously ignored. In this paper, a Generative Adversarial Network (GAN) is introduced as a data augmentation approach to effectively increase the number of minority class samples and solve the weather imbalance problem, rather than over-sampling with replacement or transformations such as rotation and skew, which only operate in data space. GAN has the ability to synthesize realistic images of better quality by implicitly modeling high-dimensional data distributions [28], [29].
We mainly focus on the feature space by reusing parts of the generator and discriminator networks as feature extractors. The generator, which takes a uniform noise distribution as input, tries to find the best image to fool the discriminator and allows us to generate visually appealing samples. Note that only the minority class is over-sampled. In our experiments, a class is treated as a minority class only if it contains fewer than half as many samples as another class; each of its samples is then used as input to the discriminator to help the model generate more images with the same properties.

1) ORIGINAL GENERATIVE ADVERSARIAL NETWORKS (GANs)
GANs include two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G [9].
• Generator (G). The generator takes in random numbers with distribution p_z(z) and returns an image, making the generated distribution p_g(x) as close to the target data distribution p_data(x) as possible.
• Discriminator (D). The discriminator takes in both real and fake images and returns probabilities, telling the difference between p_data(x) and p_g(x).
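The adversarial game between these two roles is the standard minimax objective of [9]:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to maximize V while G is trained to minimize it; at the optimum, the generated distribution p_g matches p_data.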
2) DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS (DCGANs)
GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs; hence deep convolutional generative adversarial networks (DCGANs) [30] are introduced in this paper. Similar to GANs, the structure of DCGANs also includes a discriminator network and a generator network. However, DCGANs have certain architectural constraints and have been demonstrated to be a strong candidate for unsupervised learning. Specifically, DCGANs have the following features:
• All pooling layers are removed in DCGAN. In the generator, fractional-strided convolutions are used for up-sampling instead of pooling layers. In the discriminator, strided convolutions are used instead of pooling to perform down-sampling.
• Batch normalization (BN) is used for all layers of the network except the output layer of the generator and the input layer of the discriminator, and the use of BN enables stable learning and helps to deal with training problems caused by poor initialization.
• The fully connected layers are removed and the convolutional layer is used directly to connect the input and output layers of the generator and discriminator.
• The Tanh activation function is used at the output layer of the generator and ReLU at its other layers, while leaky ReLU is used in the discriminator.

B. DATA CLEANING
Besides sampling methods, another pre-processing step that is gaining popularity in class-imbalance classification is data cleaning. Most generated images look real, but there are still some poorly generated ones. To overcome this limitation, the data cleaning method Edited Nearest Neighbor (ENN) Rule is utilized to eliminate outlier samples generated by the GAN. As shown in Figure 3, ENN uses the existing sample set to edit itself and screens the samples at the junction of different weather categories in an appropriate way, which achieves the dual purpose of reducing the number of samples and improving the recognition rate. First, for each sample X_i, find its nearest sample Y_i(X_i). If Y_i and X_i do not belong to the same category, X_i is deleted from the current weather class; finally, a pruned sample set is obtained to replace the original sample set and classify the samples.
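The ENN rule can be sketched in a few lines of pure Python. This is an illustrative k-neighbour variant following the rule stated in the introduction (a sample is kept only if at least half of its k nearest neighbours share its label; Figure 3 illustrates the single-nearest-neighbour case). The toy points and labels are hypothetical, not the paper's data.

```python
def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def enn_clean(X, y, k=3):
    """Return the indices of samples that survive ENN editing:
    sample i is removed when fewer than half of its k nearest
    neighbours (excluding itself) share its class label."""
    kept = []
    for i, xi in enumerate(X):
        dists = sorted((euclidean(xi, xj), j)
                       for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        same = sum(1 for lab in neighbour_labels if lab == y[i])
        if 2 * same >= k:
            kept.append(i)
    return kept
```

For example, a point labelled "snowy" sitting inside a cluster of "sunny" points is deleted, because all of its neighbours disagree with its label.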

C. ENSEMBLE LEARNING
In weather classification, the importance of the image classifier cannot be overstated. Because of the diverse, multi-modal distribution of weather images, a single supervised classifier is probably insufficient to capture the diverse representations of weather data. Thus, in this paper, an ensemble learning method is designed to improve the generalization ability of the overall classifier. First, two effective single classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), are introduced. KNN classifies data based on a distance metric, whereas SVM needs a proper training phase. KNN is less computationally intensive than SVM; however, due to the optimal nature of SVM, it is guaranteed that separable data will be optimally separated [31]. Second, two well-known ensemble learning methods, random forest (RF) and AdaBoost, are included. RF can handle high-dimensional data without feature selection, but may suffer from high bias if not modeled properly; AdaBoost is less susceptible to over-fitting than most learning algorithms but is sensitive to noisy data and outliers [32]. Finally, the predicted class label for a particular sample is the class label that represents the majority of the labels predicted by the individual classifiers (KNN, SVM, RF and AdaBoost). For example, if KNN, SVM and RF predict that a sample is sunny, and only AdaBoost predicts snow, we set the sample label to sunny.

1) k-NEAREST NEIGHBORS (KNN) [33]
KNN classifies samples by measuring the distance between feature vectors. If most of the k closest training examples in the feature space (that is, the most adjacent samples) belong to a certain category, the sample is assigned to that category, where k is usually an integer not greater than 20. This method determines the category of a sample only according to the categories of its nearest one or several samples. First, the distance between the test sample and all training data is calculated. Then, the distances are sorted in ascending order and the k points with the smallest distances are selected. Finally, the occurrences of the categories among these k points are counted, and the category with the highest occurrence is returned as the prediction P_knn.
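The steps above (compute distances, sort, take the k smallest, vote) can be sketched directly. This is a minimal pure-Python illustration on hypothetical toy features, not the paper's feature pipeline.

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=4):
    """Classify `query` by majority vote among its k nearest
    training samples (squared Euclidean distance)."""
    # 1. distance from the query to every training sample
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )
    # 2. keep the k closest, 3. return the most frequent label
    votes = [label for _, label in dists[:k]]
    return Counter(votes).most_common(1)[0][0]
```

The paper sets k = 4 with Euclidean distance; the default here mirrors that choice.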
2) SUPPORT VECTOR MACHINE (SVM) [34]
The fundamental objective of SVM is to find a hyperplane that separates the points of different classes. However, sometimes a full separation of two different classes requires a boundary more complex than a line. We then use a set of mathematical functions, known as kernels, to map the original objects into a higher-dimensional feature space where an optimal separating hyperplane can be found. In short, SVM performs classification by mapping a vector of predictors into a higher-dimensional space that separates cases of two different class labels. With a feature mapping φ implicitly defined by the kernel, the model can be written:

f(x) = sign(w^T φ(x) + b).

To find a linear separating hyperplane, the following error function is minimized:

min_{w, b, ξ} (1/2) w^T w + C Σ_{i=1}^{N} ξ_i,

subject to the constraints:

y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., N,

where w is the vector of coefficients, b is a constant and ξ_i are slack variables. The index i labels the N training cases. Note that y_i ∈ {±1} represents the class labels and x_i represents the independent variables. The kernel is used to transform data from the input (independent) space to the feature space.
3) RANDOM FOREST (RF) [35]
RF takes a bootstrap sample from the data and fits a classification or regression tree. When constructing each decision tree, a random sample of m predictors is selected from the full set of p predictors as split candidates every time a split is considered. Each tree is built according to the following algorithm:
1. Y represents the number of training samples, and X represents the number of features.
2. The input feature number x is used to determine the decision at a node of the tree, where x should be much less than X.
3. The Y training samples are sampled with replacement Y times to form a training set (bootstrap sampling), and the undrawn samples are used to make predictions and evaluate the errors.
4. For each node, x features are randomly selected, and the decision at that node is determined based on these features.
5. Every tree is grown fully without pruning.
The function f_b(x) represents the classifier of a single tree in the random forest; the forest's prediction is the majority vote over all B trees. B is an adjustable parameter, and the best B is selected by cross-validation.
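Steps 3 and 4 of the algorithm above, bootstrap sampling and random feature-subset selection, can be sketched as follows. The tree-fitting itself is omitted; the sizes used are hypothetical.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Step 3: draw n_samples indices with replacement.
    The undrawn (out-of-bag) indices are returned as well,
    since they can be used to estimate the error."""
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(drawn))
    return drawn, oob

def random_feature_subset(n_features, x, rng):
    """Step 4: pick x of the n_features candidates for one split
    (x should be much less than the total number of features)."""
    return rng.sample(range(n_features), x)
```

Each of the B trees repeats this pair of steps, which is what de-correlates the trees and makes the majority vote effective.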

4) AdaBoost [36]
AdaBoost is an ensemble method for improving the predictions of any given learning algorithm; it utilizes a set of weak learners to create a single strong learner. AdaBoost trains different classifiers through operations on the sample set: it changes the sample weights by updating the distribution weight vector, that is, it raises the weight of high-error samples and focuses training on them. First, the weight distribution of the training data is initialized: if there are N samples, each training sample is initially given the same weight 1/N, i.e.

D_1 = (w_{11}, w_{12}, ..., w_{1i}, ..., w_{1N}),  w_{1i} = 1/N.

The second step is to train the weak classifiers through multiple iterations, with m = 1, 2, ..., M denoting the round. From the training set with weight distribution D_m, a basic classifier G_m(x): χ → {−1, +1} is learned (the threshold with the lowest error rate is selected). The classification error rate of G_m(x) on the training set is

e_m = Σ_{i=1}^{N} w_{mi} I(G_m(x_i) ≠ y_i),

and the coefficient of G_m is α_m = (1/2) ln((1 − e_m)/e_m). During training, if a sample point has been accurately classified, its weight is reduced in the construction of the next training set; on the contrary, if a sample is misclassified, its weight is increased. The formulation is:

w_{m+1,i} = (w_{mi} / Z_m) exp(−α_m y_i G_m(x_i)),

where Z_m is a normalization factor. Then, the sample set with updated weights D_{m+1} = (w_{m+1,1}, w_{m+1,2}, ..., w_{m+1,i}, ..., w_{m+1,N}) is used to train the next classifier, and the whole training process proceeds in this iterative manner. Finally, the weak classifiers are combined into a strong classifier G(x) = sign(Σ_{m=1}^{M} α_m G_m(x)).
After the training of each weak classifier, the weight of the weak classifier with small error rate is increased to play a greater decisive role in the final classification function, while the weight of the weak classifier with large error rate is reduced to play a smaller decisive role in the final classification function. In other words, the weak classifier with low error rate has a large weight in the final classifier, otherwise it is small.
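The weight-update scheme described above can be illustrated numerically with one-dimensional data and threshold "stumps" as the weak learners. This is a toy sketch of the standard AdaBoost recipe [36], not the paper's configuration; the data, thresholds and round count are all hypothetical.

```python
import math

def stump_predict(threshold, sign, x):
    # A weak learner: +sign if x exceeds the threshold, else -sign.
    return sign if x > threshold else -sign

def fit_stump(X, y, w):
    """Select the (threshold, sign) with the lowest weighted error."""
    best = None
    for t in sorted(set(X)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(t, s, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best  # (weighted error, threshold, sign)

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n                      # D_1: uniform weights
    learners = []
    for _ in range(rounds):
        err, t, s = fit_stump(X, y, w)
        err = max(err, 1e-10)              # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        learners.append((alpha, t, s))
        # Raise weights of misclassified samples, lower the rest,
        # then normalise to obtain D_{m+1}.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, s, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return learners

def predict(learners, x):
    # Strong classifier: sign of the alpha-weighted vote.
    score = sum(a * stump_predict(t, s, x) for a, t, s in learners)
    return 1 if score >= 0 else -1
```

Each round re-weights the data so that the next stump concentrates on the samples the previous ones got wrong, exactly as described in the two paragraphs above.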

5) VOTING CLASSIFIER
To achieve better results, the outputs of these classifiers are combined to determine the final classification. For each sample, the prediction made by the majority of the classifiers is its final predicted label. We define the decision of the t-th classifier as d_{t,j} ∈ {0, 1}, where t = 1, 2, 3, 4 denotes KNN, SVM, RF and AdaBoost respectively, and j = 1, ..., C, where C is the number of classes. For each weather class, if the t-th classifier chooses class j, then d_{t,j} = 1; otherwise, d_{t,j} = 0. The ensemble then selects the class receiving the most votes, i.e., J = argmax_j Σ_{t=1}^{4} d_{t,j}.
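The hard-voting rule above, with each of the four classifiers casting one vote d_{t,j}, amounts to a simple majority count. Classifier outputs here are placeholder strings standing in for the real model predictions.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one predicted label per classifier
    (KNN, SVM, RF, AdaBoost). The label with the most
    votes is the ensemble's final prediction."""
    return Counter(predictions).most_common(1)[0][0]
```

This reproduces the example from Section C: three votes for sunny against one for snow yields sunny.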

IV. EXPERIMENTS
A. DATASETS
We use three weather datasets, Multi-class Weather Image (MWI) [37], Multi-class Weather Dataset (MWD) [38] and a self-made dataset, for performance evaluation. Each dataset has at least four different weather conditions, as shown in Figure 4. The following is a brief description of these datasets:
• Multi-class Weather Image (MWI). The MWI dataset contains 20K images obtained from many web albums and films, such as Flicker, Picasa, MojiWeather, Poco and Fengniao. Weather conditions are divided into four types: sunny, rainy, hazy and snowy.
• Our self-made dataset: this dataset was collected by our laboratory. It includes four weather categories, sunny, rainy, hazy and snowy, with 2,858 images of sunny days, 1,142 images of rainy days, 1,586 images of hazy days and 750 images of snowy days.
Note that, among the original datasets, the first is imbalanced while the second and third are balanced. To evaluate settings ranging from balanced to imbalanced, we manually adjust the number of samples of each weather type.

B. EXPERIMENTAL SETTINGS
All compared methods are implemented using TensorFlow in Python. The batch size is set to 64 for the GAN and VGG models. During image generation, all input images are resized to 108 × 108, and output images are 128 × 128. For the KNN classifier, the number of nearest neighbors is set to 4 and the Euclidean distance is adopted; the other settings are left at their defaults. For RF, the number of trees (Ntree) is set to 50, with the other settings at their default values. For AdaBoost, the number of trees (n_estimators) is 100.

C. EXPERIMENTAL EVALUATIONS
The proposed method contains three novel ingredients: 1) the GAN model effectively augments and balances the training dataset for classification; 2) the Edited Nearest Neighbor rule is utilized to eliminate outlier samples generated by the GAN; 3) an ensemble learning method is designed to improve the generalization ability of the overall classifier. To reveal how each ingredient contributes to the performance improvement, we implemented the following four variants of the proposed method: Variant 1 (denoted as Unbalance): the three unbalanced datasets are used for classification directly, without any preprocessing.
Variant2 (Denoted as +GAN): We use GAN to generate data of the minority to adjust the imbalanced dataset to a new balanced dataset.
Variant 3 (denoted as +GAN+ENN): considering that some data generated by GAN are unreliable, ENN is adopted to clean the data; it uses the nearest neighbor algorithm to edit the data set, finding those samples that disagree with their neighbors and removing them.
Variant 4 (denoted as +GAN+TL): another data cleaning method, Tomek Link (TL), is compared with ENN. For TL, if sample x and sample y come from different categories and satisfy the following condition, they are called a Tomek link: there is no sample z such that d(x, z) < d(x, y) or d(y, z) < d(x, y). Tomek links are removed because sample x or sample y is likely to be noisy data, or the two samples lie near the category boundary. Each variant is evaluated with four single learning models (KNN, SVM, Random Forest and AdaBoost) and the designed ensemble learning model. Tables 1, 2 and 3 show the evaluation results on the three datasets. For each dataset, the ratio of training set to test set is set to 7:3, 5:5 and 3:7. To clearly show the role of each part, the results are also displayed as histograms in Figure 6.
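The Tomek-link condition used in Variant 4 can be checked directly: x and y from different classes form a link when no third sample z is closer to either of them than they are to each other. A pure-Python sketch on hypothetical 2-D points:

```python
def dist(a, b):
    # Euclidean distance between two feature vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def tomek_links(X, y):
    """Return index pairs (i, j) that satisfy the Tomek-link
    condition: different classes, and no sample z with
    d(x_i, z) < d(x_i, x_j) or d(x_j, z) < d(x_i, x_j)."""
    links = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if y[i] == y[j]:
                continue
            d_ij = dist(X[i], X[j])
            blocked = any(
                dist(X[i], X[k]) < d_ij or dist(X[j], X[k]) < d_ij
                for k in range(len(X)) if k not in (i, j)
            )
            if not blocked:
                links.append((i, j))
    return links
```

The cleaning step would then drop one or both members of each detected pair, removing boundary noise between classes.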

1) THE EFFECTIVENESS OF GAN
To verify the quality of the generated images, we show synthetic images in Figure 5. As observed, the quality of the foggy and hazy images is very high, while the generated pictures of rainy and snowy days are somewhat similar and difficult to distinguish. These results reveal that the proposed method can mostly capture the information of the given weather photos. Moreover, compared to training the model with the original unbalanced dataset, the model with GAN can effectively improve accuracy by creating a balanced dataset. The results indicate that converting an imbalanced weather dataset into a balanced one is important for improving classification accuracy.

2) THE EFFECTIVENESS OF DATA CLEANING
Two different methods, ENN and TL, are adopted to clean the data. For the generated dataset, ENN, which removes examples whose class label differs from the class of at least half of their k nearest neighbors, obtains better results than TL, which removes unwanted overlap between classes by deleting majority-class links until all minimally distanced nearest-neighbor pairs are of the same class.

3) THE EFFECTIVENESS OF ENSEMBLE LEARNING
In this study, the effect of the designed ensemble learning model (voting) is compared with four single classifiers: KNN, SVM, RF and AdaBoost. Among the four classifiers, the SVM algorithm is more robust than the others. Notably, with the voting method, the results can be further improved. This shows that the designed ensemble learning method effectively helps the model classify weather data.

V. CONCLUSION
In this paper, an application of GAN is proposed to solve the problem of weather classification with imbalanced data. We divide the solution into two modules: the first is the design and implementation of the GAN; the second is the weather classification itself. To alleviate the dependence on unreliably generated images, we also introduce a data cleaning method. With these proposed methods, the performance of the proposed imbalanced weather classification method gains significant improvement. In future work, we will further develop more robust methods to ensure the high quality of the generated images for weather classification.
YI JIN (Member, IEEE) received the Ph.D. degree in signal and information processing from the Institute of Information Science, Beijing Jiaotong University, Beijing, China, in 2010. She was a Visiting Scholar with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, from 2013 to 2014. She is currently an Associate Professor with the School of Computer Science and Information Technology, Beijing Jiaotong University. Her research interests include computer vision, pattern recognition, image processing, and machine learning.
YIDONG LI received the Ph.D. degree in computer science from The University of Adelaide, in 2011. He is currently a Ph.D. Tutor and the Vice President of the School of Computer and Information Technology, Beijing Jiaotong University. His researches focus on high-performance computing, privacy protection and information security, social network analysis, and parallel and distributed computing.
ZHIPING LIN (Senior Member, IEEE) received the B.Eng. degree in control engineering from the South China Institute of Technology, Canton, China, in 1982, and the Ph.D. degree in information engineering from the University of Cambridge, England, in 1987. He was with the University of Calgary, Canada, Shantou University, China, and DSO National Laboratories, Singapore, from 1987 to 1999. Since 1999, he has been with Nanyang Technological University (NTU), Singapore. He is currently the Program Director with the Centre for Bio Devices and Signal Analysis, NTU. He served as the Chair of the IEEE CAS Singapore Chapter, from 2007 to 2008 and in 2019.