Distribution Bias Aware Collaborative Generative Adversarial Network for Imbalanced Deep Learning in Industrial IoT

The impact of the Internet of Things (IoT) has become increasingly significant in smart manufacturing, and the deep generative model (DGM) is viewed as a promising learning technique for working with the large amounts of continuously generated industrial Big Data that facilitate modern industrial applications. However, it is still challenging to handle imbalanced data when using conventional Generative Adversarial Network (GAN) based learning strategies. In this article, we propose a distribution bias aware collaborative GAN (DB-CGAN) model for imbalanced deep learning in industrial IoT, designed in particular to overcome the limitations caused by the distribution bias between the generated data and the original data via a more robust data augmentation. An integrated data augmentation framework is constructed by introducing a complementary classifier into the basic GAN model. Specifically, a conditional generator with random labels is designed and trained adversarially with the classifier to effectively augment the number of data samples in minority classes, while a weight sharing scheme is newly designed between two separated feature extractors, enabling the collaborative adversarial training among the generator, discriminator, and classifier. An augmentation algorithm is then developed for intelligent anomaly detection in imbalanced learning, which can significantly improve the classification accuracy based on the correction of distribution bias using the rebalanced data. Compared with five baseline methods, experimental evaluations on two real-world imbalanced datasets demonstrate the outstanding performance of our proposed model in tackling the distribution bias issue for multiclass classification in imbalanced learning for industrial IoT applications.


I. INTRODUCTION
With the rapid development of industrial Internet of Things (IoT) technology, imbalanced learning has become a flourishing research field in smart manufacturing. The large amount of industrial Big Data collected from numerous industrial IoT sensors and devices has already become the primary productive force in modern Industry 4.0. However, a set of specific data-driven manufacturing scenarios related to anomaly detection, e.g., network intrusion detection, fault detection, condition-based maintenance, etc., suffers from challenges caused by imbalanced learning problems [1], [2]. In particular, these problems arise from the imbalanced inter- or intraclass distribution between the large number of benign samples (majority class) and the extremely few anomalies (minority class). Traditional anomaly detection in industrial Big Data inevitably faces learning biases caused by such extremely imbalanced data. Additionally, it becomes even more difficult to cope with novel anomalies developed by newly designed malicious tools [3], [4].
In practice, the identification of minority classes is the part of anomaly detection of real practical interest. To discriminate these minority classes from majority classes, deep learning schemes have been discussed broadly in recent years, and a variety of deep generative models (DGMs) have been developed to tackle such imbalanced learning tasks [5]. In particular, generative adversarial network (GAN) based data augmentation methods are widely applied to generate data samples in minority classes [6]-[8]. However, several challenges have not yet been well resolved. First, when using traditional resampling methods for data augmentation, down-sampling of majority classes may cause the loss of part of the feature information, while oversampling of minority classes may often result in data duplication or marginalization with unreasonable noise. Second, DGMs depend highly on the training dataset due to their reliance on distance similarity between feature vectors; this may lead to distribution bias in the generated data and limit the augmentation of minority classes for further classification accuracy improvement.
In this article, an integrated data augmentation framework using GAN techniques, namely distribution bias aware collaborative GAN (DB-CGAN), is newly proposed to tackle the above challenges, especially the distribution bias issue between the generated data and original data, for imbalanced deep learning in industrial IoT environments. Differing from conventional GANbased data augmentation models, which conduct adversarial training between generator and discriminator, a complementary classifier is introduced and connected into the basic GAN model, so as to realize the collaborative adversarial training among generator, discriminator, and classifier, based on a newly designed weight sharing scheme between two separated feature extractors. An augmentation algorithm is then developed to enhance intelligent anomaly detection in imbalanced learning tasks, which can efficiently improve the classification accuracy based on the appropriate alleviation of distribution bias using the rebalanced data. The main contribution of this article can be summarized as follows.
1) A GAN-based data augmentation framework is constructed to cope with the distribution bias problem in imbalanced learning scenarios, in which a feature classifier associated with the data generator is introduced, so as to realize two collaborative adversarial training processes: between the data generator and the feature discriminator, and between the data generator and the feature classifier, respectively.
2) A conditional generator with random labels is designed, which is trained adversarially with the complementary classifier and can thus efficiently augment the number of data samples in minority classes.
3) A weight sharing scheme is newly developed between the GAN-based feature generator and the original feature extractor, which enables the collaborative adversarial training among the generator, discriminator, and classifier, and finally effectively corrects the distribution bias of the generated data in imbalanced learning.

The rest of this article is organized as follows. The state-of-the-art techniques related to this study are reviewed in Section II. The imbalanced learning problem and the basic learning framework are addressed in Section III. We explain the core mechanisms used to realize the data augmentation based on our proposed model in Section IV. Experiment and evaluation results are demonstrated and discussed in Section V. Finally, Section VI concludes this article.

II. RELATED WORK
In this section, we survey and summarize current studies related to imbalanced learning in industrial IoT, and data augmentation with GAN, respectively.

A. Imbalanced Learning in Industrial IoT
It is common in modern IoT applications, such as disease diagnosis, risk management, and industrial product testing, that the number of samples of different types in a given training dataset varies significantly [3], [9]. Recently, researchers have paid increasing attention to building effective learning models from these kinds of imbalanced datasets. Basically, there are two main aspects of approaches addressing issues in imbalanced learning.
At the algorithm level, existing algorithms were improved to more effectively address imbalanced classification problems. One of the strategies was cost-sensitive learning [5], in which the goal was to construct a cost matrix by attaching different costs to different categories. Generally, three kinds of methods were used to implement cost-sensitive learning: 1) utilizing the classification cost as the weight of the dataset, then using Bootstrap sampling to select samples with the best data distribution; 2) employing integrated learning modules, e.g., some standard learning algorithms as weak classifiers, to realize cost minimization; and 3) combining the cost-sensitive functions or features directly into the parameters of the classifier, to better fit the training process. Tang et al. [10] added cost-sensitive learning to SVM, and modified the learning model to improve its effectiveness on imbalanced classification problems. Zhu et al. [11] used cost-sensitive learning to enhance the weighting of rare but valuable instabilities when evaluating online short-term voltage stability. A number of researchers used other methods to solve the problem of imbalanced classification. Pan et al. [12] used the latent distribution law of normal data captured by a feature-generation network to generate pseudofault features, then trained an improved deep neural network to detect typical faults from the imbalanced dataset.
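As a minimal illustration of the cost-matrix idea behind method 1) above, the sketch below (in Python with NumPy, not from any of the cited works) assigns each class a weight inversely proportional to its frequency, so that errors on minority classes are penalized more heavily:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Assign each class a cost inversely proportional to its frequency
    (the common 'balanced' heuristic used in cost-sensitive learning)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 90 + [1] * 10)   # 90 benign samples, 10 anomalies
w = inverse_frequency_weights(y)
# the minority class receives a 9x larger misclassification cost than the majority
```

Such weights can then be passed to a classifier as per-class costs, which is one of the three implementation routes listed above.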
At the data level, data augmentation methods were mainly used to solve the problem of imbalanced classification by reasonably augmenting the training dataset. Sampling, either randomly oversampling the minority classes or undersampling the majority classes, was a simple but effective way to rebalance skewed datasets. However, these methods might lead to information loss or overfitting issues. Lee et al. [13] proposed a two-stage training method for plankton classification with imbalanced issues, in which a threshold was set according to the distribution of the dataset, and categories with more samples than the threshold were randomly subsampled. Pouyanfar [14] introduced a dynamic sampling method that dynamically adjusted the dataset according to the training result, randomly deleting samples for better results.
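The interpolation idea behind SMOTE-style oversampling can be sketched as follows; this is a simplified illustration, not a reference implementation, and `smote_like` is a hypothetical helper. Each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbors:

```python
import numpy as np

def smote_like(X_min, n_new, k=3, rng=None):
    """SMOTE-style oversampling sketch: interpolate between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = rng or np.random.default_rng(0)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest neighbors per sample
    out = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = nbrs[i, rng.integers(k)]
        lam = rng.random()               # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(1)
X_min = rng.standard_normal((6, 3))      # six minority samples
X_new = smote_like(X_min, n_new=10, k=2, rng=np.random.default_rng(2))
```

Because every synthetic point lies on a segment between two real points, the generated data stay inside the convex hull of the minority class, which is exactly why marginalization and variance shrinkage can occur, as discussed later in Section V.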

B. Data Augmentation With GAN
In deep learning applications, when facing imbalanced data, i.e., when the number of samples in some specific classes is significantly larger or smaller than in the others in the training dataset [15], it is necessary to find a solution that reduces the negative impact of data imbalance on classification accuracy. Odena et al. [16] used auxiliary classifiers to construct adversarial networks and generate more samples for the training dataset. Sampling-based methods [17], [18] have attracted considerable attention for counteracting the effect of imbalanced datasets, and a number of variants of oversampling and undersampling techniques were investigated for different imbalanced learning tasks.
Many researchers have used GAN to avoid the generation of meaningless data caused by the noise added when using conventional data sampling methods for augmentation [19]-[21]. Typically, training a deep learning network requires a large and diverse dataset that covers as many data instances as possible, and the scarcity of training data is still an obstacle to building a robust learning model. Chkirbene et al. [22] applied the GAN structure to learn more informative instances from minority classes. They optimized parameters in the inner learning steps of the discriminator of a generic GAN model, aiming to better identify potential attack types in anomaly detection with less training data. Jiang and Ge [23] built a GAN-based data augmentation classifier to solve the problem of imbalanced fault classification in industry. They designed data filtering and data selection strategies for data purification in a supervised learning process, and introduced a multigenerator structure to deal with the incomplete learning problem of a single generator. Zhang et al. [24] presented an adversarial data augmentation method for object detection tasks, in which a supervised GAN model was constructed based on a discriminator loss that facilitated the finding of decision boundaries in both real and augmented samples. In summary, current research indicates that it is essential to find a practical and effective way to obtain adequate training data, which is still a costly and challenging task in industrial IoT applications.

III. MODELING OF DB-CGAN FOR IMBALANCED DEEP LEARNING
In this section, we briefly address the distribution bias problem faced in imbalanced learning, followed by the introduction of our basic framework for bias-aware data augmentation based on the collaborative adversarial training.

A. Problem Description in Imbalanced Learning
Consider a training dataset D = {(x_i, y_i), i = 1, . . . , n} in imbalanced learning, where x_i indicates a data sample and y_i indicates the corresponding label, which belongs to one of |K| different classes, K = {k_1, k_2, . . . , k_|K|}. The majority class M is defined as a class with relatively more samples, while the minority class S is defined as a class with fewer samples compared with the majority class. Assuming the majority class M contains N_M data samples and the minority class S contains N_S data samples, the general goal is to generate N̂_S augmented data samples for the minority class, so that the rebalanced dataset satisfies N̂_S + N_S ≈ N_M.
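The rebalancing target N̂_S ≈ N_M − N_S for every minority class can be computed directly from the label vector; `augmentation_targets` below is an illustrative helper, not part of the original model:

```python
import numpy as np

def augmentation_targets(labels):
    """For each minority class, return how many samples to generate so
    that N_hat_S + N_S ~= N_M (rebalance up to the majority class size)."""
    classes, counts = np.unique(labels, return_counts=True)
    n_major = counts.max()
    return {int(c): int(n_major - n)
            for c, n in zip(classes, counts) if n < n_major}

y = np.array([0] * 500 + [1] * 40 + [2] * 10)
targets = augmentation_targets(y)   # class 1 needs 460 samples, class 2 needs 490
```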
Generally, every dataset may follow a certain distribution, assumed D ∼ p(x, y), where p(x, y) is the joint distribution of the raw samples and labels. Thus, the augmented data may satisfy N̂_S ∼ p(x̂, ŷ), where p(x̂, ŷ) is the joint distribution of the augmented samples and labels. Conventional GAN-based data augmentation aims to make the distribution of the augmented data as close as possible to that of the original data; in an ideal case, the theoretical distribution of the generated data would have the same shape as the original data distribution. However, since most GAN-based approaches consider the distance similarity between feature vectors, they may inadvertently ignore the problem of bias in the data distribution. Augmented data with distribution bias may contribute relatively little to classification accuracy in downstream tasks, and may even become noisy data that reduces the confidence of minority classes during training. Therefore, in our proposed model, we aim to correct this kind of distribution bias as much as possible during the collaborative adversarial training among generator, discriminator, and classifier, which can significantly improve the classification accuracy by augmenting appropriate data in minority classes.

B. Basic Framework of DB-CGAN
Basically, the framework of our proposed DB-CGAN consists of three main modules: the data generator, feature discriminator, and feature classifier. As shown in Fig. 1, differing from a conventional GAN-based model, the data generator does not simply learn features from the original data, but is trained adversarially with both the feature discriminator and the feature classifier, making the generated data increasingly approximate the original data, and its distribution approximate the real data distribution as well. In particular, the adversarial training between the data generator and the feature classifier, based on the shared feature weights, leads to a gradual correction of the distribution bias during this data augmentation process, thus efficiently solving the problem of insufficient data in minority classes and finally improving the classification accuracy in imbalanced learning.
Specifically, the data generator generates the data for the corresponding minority classes based on a given random label sequence. Given the random noise z ∼ N(0, I) and a random label c ∈ K as inputs to our model, their feature representation vectors are obtained using a feature embedding function, respectively. Label representations are then fused with noise representations via an addition operation, and the augmented data x̂ is finally generated by a data encoder.
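A toy sketch of this fusion step follows, with simple linear maps standing in for the learned embedding functions; all weight shapes and values here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 16, 5

# h(.) and g(.): toy embeddings for noise and label (assumed shapes)
W_h = rng.standard_normal((8, d))          # noise embedding, z -> z_e
E_g = rng.standard_normal((n_classes, d))  # label embedding table, c -> c_e

def fuse(z, c):
    """v_e = h(z) + g(c): map both inputs to a common dimension d and
    fuse them by element-wise addition."""
    z_e = z @ W_h      # h(z)
    c_e = E_g[c]       # g(c)
    return z_e + c_e   # conditional feature vector v_e

z = rng.standard_normal(8)   # z ~ N(0, I)
v = fuse(z, c=2)             # same noise, different label -> different v_e
```

The conditional vector v would then be fed to the data encoder; changing only the label c steers the generator toward a different minority class.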
The feature discriminator is trained to distinguish the generated data x̂ from the original data x. The feature vector F̂ of x̂ and the feature vector F of x are extracted by the GAN-based feature generator and the original feature extractor, respectively, for better discrimination performance. Generally, the feature discriminator and data generator are trained adversarially, so that the generated data gradually approximates the original data.
Compared with conventional GAN-based models, a feature classifier is introduced and connected with the data generator and feature discriminator for collaborative adversarial training. Differing from ECGAN [25], which attached an external classifier outside of the GAN and generated pseudolabels associated with the generated data to deal with few-shot classification tasks in a semisupervised training process, our DB-CGAN adds random labels at the input, and shares the feature weights of both the generated data and the original data via two separated feature extractors. During this supervised training process, the feature classifier is trained adversarially with the data generator, so as to avoid the generated data being easily assigned into majority classes, and finally correct the distribution bias in imbalanced learning.

IV. DISTRIBUTION BIAS AWARE DATA AUGMENTATION BASED ON COLLABORATIVE ADVERSARIAL TRAINING
In this section, we first explain how the collaborative adversarial training is realized in our DB-CGAN, then discuss the core mechanism of distribution bias alleviation for GAN-based data augmentation. A data augmentation algorithm for imbalanced learning is finally developed.

A. Collaborative Adversarial Training in DB-CGAN
Generally, our proposed DB-CGAN is designed and constructed around two collaborative adversarial training processes: between the data generator and the feature discriminator, and between the data generator and the feature classifier, respectively. In addition to the single adversarial training of a conventional GAN, another adversarial training is newly designed by introducing a complementary classifier associated with the data generator. More precisely, as shown in Fig. 1, the original feature extractor shares its weights with the GAN-based feature generator, so as to realize the collaborative adversarial training among the data generator, feature classifier, and feature discriminator, which can efficiently solve the traditional data augmentation problem, alleviate the distribution bias issue, and finally improve the classification accuracy in imbalanced learning.
Similar to a conventional GAN, the data generator is trained adversarially with the feature discriminator, making the generated data closer to the original data. Given the random noise z and random label c, two embedding functions are defined to encode them into feature vectors z_e and c_e, respectively. The addition operation indicated in Fig. 1 is used to unify the dimensions of c_e and z_e, which are mapped into a new conditional feature vector v_e for further data augmentation. The detailed process is formulated as follows:

z_e = h(z), c_e = g(c), v_e = z_e + c_e (1)

where c ∈ K indicates the data label, and z ∼ N(0, I) indicates a random Gaussian noise. h(·) and g(·) are the feature embedding functions of the noise and the label, respectively; in practice, both can be implemented using fully connected neural networks. Furthermore, v_e is fed into the data encoder to generate the augmented data, which can be described as follows:

x̂ = G(z, c) = E(v_e; ω_E) (2)

where ω_E indicates the parameters of the encoder function E(·). It is noted that since z and c are both used as inputs to construct the conditional generator, the generated data x̂ can be controlled by the input c, which means each given label is very important for generating appropriate data for the minority classes during training.
The feature discriminator D(·) outputs a scalar that indicates whether its input is real data or generated data. The data generator G(·) and feature discriminator D(·) work together to learn the features, with G(·) maximizing the probability that D(·) mistakes generated input for original data, according to the following adversarial game:

min_G max_D V(D, G) = E_{x∼p(x)}[D(x)] + E_{(z,c)∼p(z,c)}[1 − D(G(z, c))] (3)

where E_{x∼p(x)}[D(x)] is defined as the expectation of the feature discriminator, with p(x) the distribution of the original data, while E_{(z,c)∼p(z,c)}[1 − D(G(z, c))] is defined as the expectation of the data generator, with p(z, c) the joint distribution of the generator inputs.
Through maximizing E_{x∼p(x)}[D(x)], the feature discriminator learns to identify as accurately as possible whether the input data is real, while through minimizing E_{(z,c)∼p(z,c)}[1 − D(G(z, c))], the data generator makes the augmented data as close to the real data as possible. According to this adversarial training process, the loss of the data generator can be defined as follows:

L_G(z, c) = Entropy(D(G(z, c)), 1) (4)

where Entropy(·, ·) is the binary cross entropy.
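The generator loss in (4) can be illustrated numerically; `bce` below is a plain binary cross entropy on a scalar discriminator score. This is an illustrative sketch under the assumption that D outputs a probability in (0, 1), not the paper's code:

```python
import numpy as np

def bce(p, target):
    """Binary cross entropy for a scalar discriminator output p in (0, 1)."""
    eps = 1e-12
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

# L_G = Entropy(D(G(z, c)), 1): the generator is penalized when the
# discriminator scores its fake sample close to 0 ("fake").
loss_far = bce(0.1, 1.0)    # D is nearly certain the sample is fake -> large loss
loss_near = bce(0.9, 1.0)   # D is almost fooled -> small loss
```

Minimizing this loss pushes D(G(z, c)) toward 1, i.e., toward the generated data being indistinguishable from the original data.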
In particular, we restrict the random labels to the minority classes, which effectively augments the number of data samples in the minority classes, achieving better data augmentation performance under supervised training. On the other hand, considering that most data augmentation may introduce distribution bias during training, another adversarial training is conducted by involving a complementary classifier, and further sharing the weights of the features extracted from the generated data and the original data, respectively, to gradually correct the data distribution in a collaborative adversarial way.

B. Distribution Bias Alleviation in GAN-Based Data Augmentation
To alleviate the distribution bias between the generated data and original data, we introduce a complementary classifier into the GAN, and construct an adversarial training between the generator and classifier, based on the sharing of weights between two separated feature extractors: the GAN-based feature generator and the original feature extractor. The detailed loss of the feature classifier is described as follows:

L_C = α L^o_C + β L^g_C (5)

where α and β are hyperparameters that control the importance of the original and generated data in classification, with α + β = 1, and θ_C is the weight shared by the feature classifications of the original data and generated data. In particular, the loss of feature classification of the original data, L^o_C, and the loss of feature classification of the generated data, L^g_C, can be expressed respectively as follows:

L^o_C(x, y; θ_C) = Entropy(C(x), y) (6)

L^g_C(z, c; θ_C) = Entropy(C(G(z, c)), c) (7)

where C(·) indicates the feature classifier, which outputs the probability of the predicted label. According to (6) and (7), optimization based on L^o_C ensures that the extractor can maximally obtain appropriate features from the original data, while optimization based on L^g_C ensures the gradient transfer from the feature classifier to the data generator during back propagation, thus finally realizing the gradual correction of the distribution bias of the generated data.
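A numerical sketch of the classifier loss (5), using toy probability vectors in place of real classifier outputs; all values are illustrative assumptions:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy of a predicted probability vector against a hard label."""
    return -np.log(probs[label] + 1e-12)

def classifier_loss(p_orig, y, p_gen, c, alpha=0.5):
    """L_C = alpha * L_C^o + beta * L_C^g with alpha + beta = 1: the shared
    classifier is trained on original samples with their true labels y and
    on generated samples with their conditioning labels c."""
    beta = 1.0 - alpha
    L_o = cross_entropy(p_orig, y)   # loss on the original data
    L_g = cross_entropy(p_gen, c)    # loss on the generated data
    return alpha * L_o + beta * L_g

p_real = np.array([0.7, 0.2, 0.1])   # classifier output on an original sample
p_fake = np.array([0.1, 0.6, 0.3])   # classifier output on a generated sample
loss = classifier_loss(p_real, y=0, p_gen=p_fake, c=1, alpha=0.5)
```

Because the same weights score both samples, the gradient of the second term flows back through the generated sample into the generator, which is the mechanism behind the bias correction.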
The design of the sharing weight θ_C between the two separated feature extractors makes the data generator and feature classifier work together with the feature discriminator for better data augmentation within the collaborative adversarial training process. Therefore, the loss of feature classification of the generated data is considered in optimizing the loss of the data generator, which can be described as follows:

L_G(z, c; θ_C) = Entropy(D(G(z, c)), 1) + L^g_C(z, c; θ_C) (8)

Similarly, the loss of feature classification of the original data is employed to optimize the loss of the feature discriminator during the collaborative adversarial training process, which can be described as follows:

L_D(x, y, z, c; θ_C) = 1/2 (Entropy(D(G(z, c)), 0) + Entropy(D(x), 1)) + L^o_C(x, y; θ_C). (9)

Accordingly, with the gradient of the feature classifier transferred to the data generator, G(·) is further optimized according to (8) to gradually reduce the distribution bias, making the generated data increasingly closer to the original data, while D(·) is enhanced according to (9) for better discriminability by learning more accurate features from the original data, ensuring higher quality of the generated data. In this way, the data augmented during this collaborative adversarial training process, enabled by the weight θ_C shared in (5)-(7), can finally improve the classification accuracy for imbalanced learning tasks.
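The combined losses (8) and (9) can be sketched as plain functions of the discriminator scores and the classification loss terms; this is an illustrative reconstruction under the same scalar-probability assumption as before, not the authors' implementation:

```python
import numpy as np

def bce(p, target):
    """Binary cross entropy for a scalar probability p in (0, 1)."""
    eps = 1e-12
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

def generator_loss(d_fake, L_Cg):
    """(8): adversarial term Entropy(D(G(z,c)), 1) plus the classification
    loss L_C^g on the generated sample (shared-weight gradient path)."""
    return bce(d_fake, 1.0) + L_Cg

def discriminator_loss(d_fake, d_real, L_Co):
    """(9): averaged real/fake discrimination terms plus the classification
    loss L_C^o on the original sample."""
    return 0.5 * (bce(d_fake, 0.0) + bce(d_real, 1.0)) + L_Co

g_loss = generator_loss(d_fake=0.5, L_Cg=0.2)
d_loss = discriminator_loss(d_fake=0.5, d_real=0.5, L_Co=0.1)
```

In training, the three modules alternate: the generator step minimizes g_loss, the discriminator step minimizes d_loss, and the classifier step minimizes L_C from (5).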

C. Data Augmentation Algorithm in Imbalanced Learning
Based on the above discussion, we optimize the L C , L G , and L D together in a collaborative adversarial way. According to the core scheme that shares the weights between GAN-based feature generator and original feature extractor, the feature classifier is allowed to be trained adversarially with the data generator and feature discriminator, and realizes the correction of distribution bias in imbalanced learning.
An intelligent data augmentation algorithm based on the DB-CGAN is introduced in Algorithm 1, which includes two main parts. The first part is the training of the DB-CGAN model (from Line 1 to Line 18), which is responsible for learning features from minority classes in the imbalanced dataset. The adversarial training among the data generator, feature discriminator, and feature classifier is conducted alternately. The second part is the data augmentation based on the DB-CGAN (from Line 19 to Line 24). The trained model is used to generate the required data for each target minority class, and the generated data will be combined with the original imbalanced data to form a rebalanced dataset, which can support the improvement of accuracy in further classifications.
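The second part of the algorithm, generating the missing samples for each minority class and merging them with the original data, can be sketched as follows; `generate` is a hypothetical stand-in for the trained conditional generator G(·):

```python
import numpy as np

def rebalance(X, y, generate, rng=None):
    """Sketch of the augmentation stage: for every minority class, draw
    random noise with a fixed label, generate the missing samples with the
    trained conditional generator, and merge them with the original data."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n_major = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        n_new = n_major - n
        if n_new > 0:
            z = rng.standard_normal((n_new, X.shape[1]))  # z ~ N(0, I)
            X_parts.append(generate(z, c))                # x_hat = G(z, c)
            y_parts.append(np.full(n_new, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

X = np.zeros((12, 4))
y = np.array([0] * 10 + [1] * 2)
X_bal, y_bal = rebalance(X, y, generate=lambda z, c: z + c)  # dummy generator
```

The merged (X_bal, y_bal) then serves as the rebalanced training set for the downstream classifier.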

V. EXPERIMENT AND ANALYSIS
In this section, following the introduction of our experiment design based on two public datasets, a series of evaluations are conducted to compare and demonstrate the usefulness and effectiveness of our proposed model for data augmentation when facing imbalanced data.

A. Dataset and Experiment Design
Two open-source datasets, NSL-KDD and UNSW-NB15, which are widely used in IoT environments, are employed to conduct the experiments and evaluate the performance of our model in dealing with distribution bias in data augmentation for imbalanced learning. Table I shows the statistics of each type of attack in NSL-KDD. It is noted that DoS is the most common attack type, accounting for 35.95% of the total, while Probe, R2L, and U2R are the less common types, accounting for 9.48%, 2.52%, and 0.17%, respectively.
The UNSW-NB15 dataset was published by the University of New South Wales in 2015 and contains nine basic types of attacks. As the statistics in Table II show, majority classes such as Generic and Exploits account for 22.85% and 17.28%, respectively, while minority classes such as Shellcode and Worms account for only 0.59% and 0.07%, respectively. Compared with NSL-KDD, UNSW-NB15 is a large but extremely imbalanced dataset, which contains more and smaller minority classes.
The following five baseline methods are adopted for comparison.
1) SMOTE: It generates synthetic data based on the k nearest neighbors of each sample, using oversampling for minority classes and down-sampling for majority classes in data augmentation.
2) ADASYN: It is a sampling approach that considers the data distribution when generating minority samples, aiming to reduce the learning bias from the original imbalanced data and shift the classification decision boundary toward those classes that are harder to learn.
3) DGM-SPOCU: It is a conditional GAN that generates synthetic samples for minority classes, using the KL divergence to ensure that the model generates data more similar to the original data space.
4) ECGAN: It is a GAN-based semisupervised learning method, which employs pseudolabeling to set class labels on the generated data using an external classifier for binary-class classification tasks.
5) MENGNETO: It uses network traffic data, represented as 2-D images via 2-D data mapping techniques, to train a GAN-based data augmentation model that deals with the imbalance issue in intrusion detection on malicious traffic.

To demonstrate the performance of the data augmented by our DB-CGAN when using different classifiers for anomaly detection, three classical algorithms, including random forest (RF), support vector machine (SVM), and deep neural network (DNN), are adopted, and metrics including the F1 score and false alarm rate (FAR) are used for comparison evaluations. All the experiments are conducted on a server with Ubuntu, a GTX 1070, a G39030 dual-core CPU, 16 GB RAM, and Python 3.6.
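The two evaluation metrics can be computed per class from the confusion counts; the helper below is an illustrative sketch, taking FAR as FP/(FP + TN), a common definition:

```python
import numpy as np

def f1_and_far(y_true, y_pred, positive):
    """Per-class F1 score and false alarm rate (FAR = FP / (FP + TN))
    for a one-vs-rest view of a multiclass prediction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0
    return f1, far

f1, far = f1_and_far([1, 1, 1, 0, 0, 0, 0, 0],
                     [1, 1, 0, 1, 0, 0, 0, 0], positive=1)
```

A high F1 with a low FAR on a minority class indicates that the augmented data improved detection without inflating false alarms on the majority class.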

B. Evaluation on Distribution Bias Alleviation in Data Augmentation
The visualization tool UMAP is employed to illustrate the effect of data augmentation in this experiment. Compared with PCA-based visualization, it can better demonstrate the trace of the data generation process, showing the essential features and their distribution in a 2-D/3-D manner. In addition, kernel density estimation (KDE) is used to estimate the density functions of the original and augmented data. Figs. 2 and 3 demonstrate the comparisons between our DB-CGAN and the other five baseline methods based on UMAP visualization using NSL-KDD and UNSW-NB15, respectively. Figs. 4 and 5 demonstrate the comparisons of the corresponding distributions of the original data and augmented data in NSL-KDD and UNSW-NB15, respectively. Comparing these two datasets, it is noted that NSL-KDD has fewer minority classes with denser distributions, while UNSW-NB15 has more minority classes with sparser distributions.
Fig. 2(a) and (b), and Fig. 3(a) and (b), demonstrate the visualization results of data augmentation based on the two sampling methods: SMOTE and ADASYN, respectively. It is obvious that the generated attack data form clear trajectories, with a few marginal points scattered around them. This is because both SMOTE and ADASYN find the K nearest samples of a given sample in a minority class and then randomly generate new samples between them, which may easily cause a marginalization problem in the distribution of the generated data. On the other hand, as their distributions shown in Fig. 4(a) and (b), and Fig. 5(a) and (b), respectively, indicate, both sampling methods fall into the phenomenon of variance shifting, which means the variance of their augmented data becomes smaller. These results indicate that traditional resampling methods are not suitable for training imbalanced learning models.
Fig. 2(c)-(e) and Fig. 3(c)-(e) demonstrate the visualization results of data augmentation based on three state-of-the-art GAN-based methods: DGM-SPOCU, ECGAN, and MENGNETO, respectively. It is easy to find the clustering phenomenon in the results of DGM-SPOCU and MENGNETO, compared with that of ECGAN, which means ECGAN does not perform well on multiclass classification tasks. However, the clusters visualized by DGM-SPOCU and MENGNETO in Fig. 2(c) and (d) have obvious overlaps, and even become relatively small in Fig. 3(c) and (d). This means that both DGM-SPOCU and MENGNETO cannot effectively distinguish the identified minority classes, and even have difficulty generating them when the data in each minority class is extremely imbalanced. Furthermore, according to the distribution results shown in Fig. 4(c)-(e) and Fig. 5(c)-(e), respectively, in addition to the phenomenon of variance shifting, ECGAN and MENGNETO also fall into the phenomenon of mean shifting (e.g., in Fig. 4(d) and (e), and Fig. 5(e)), which indicates the difficulty of using conventional GAN-based methods to deal with the problem of distribution bias.
Compared with the five baseline methods, our proposed DB-CGAN generally demonstrates better performance in both data augmentation and distribution bias alleviation. As shown in Fig. 2(f), our DB-CGAN can clearly distinguish and generate sufficient data for each minority class. Comparing the results shown in Fig. 3, our model demonstrates superior performance in generating more data for each minority class when facing an extremely imbalanced dataset. Especially, as shown in Figs. 4(f) and 5(f), the distributions of the data generated by our DB-CGAN closely approximate those of the original data. These results indicate that our proposed model can effectively generate high-quality data in each minority class, owing to the distribution bias correction for data augmentation in imbalanced learning.

C. Evaluation on Anomaly Detection Based on Augmented Data
We further investigate the effectiveness of the augmented data in anomaly detection based on three classic classification algorithms, RF, SVM, and DNN, comparing our DB-CGAN with the other five baseline methods, including two sampling methods and three GAN-based methods. As mentioned above, the commonly used metrics F1 and FAR are employed to evaluate the performance of these six methods on multiclass classification tasks. Results on the two datasets are shown in Tables III and IV, respectively. From Table III, it is observed that the five baseline methods achieve relatively good results on "DoS" and "Probe" with all three classification algorithms, but perform poorly on "R2L" and "U2R," which suffer from high imbalance ratios. In contrast, our DB-CGAN achieves a higher F1 score with a lower FAR value than all the other methods when dealing with the more imbalanced classes "R2L" and "U2R." Our model reaches 81.58% F1 with 4.01% FAR, while the best result among the five baseline methods in such cases is only 74.10%. These results show the effectiveness of our proposed model in identifying minority classes and generating appropriate data by correcting the distribution bias in imbalanced learning. Table IV shows the performance of all the methods on an extremely imbalanced dataset. Overall, our DB-CGAN still outperforms the other five baseline methods. In particular, since UNSW-NB15 has more minority classes with more severe imbalance ratios than NSL-KDD, the five baseline methods achieve only moderate results on the majority classes "Generic" and "Exploits," and perform extremely poorly on the other seven minority classes. These results not only demonstrate the clear advantage of our proposed model in learning features of minority classes from an extremely imbalanced dataset, but also indicate the significance of correcting distribution bias for imbalanced learning tasks.
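For reference, the two metrics are computed per class in the standard one-vs-rest fashion; the following minimal sketch takes FAR as FP / (FP + TN), and the helper name is our own:

```python
import numpy as np

def per_class_f1_far(y_true, y_pred, cls):
    """One-vs-rest F1 score and false alarm rate (FAR) for class `cls`
    in a multiclass prediction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))   # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))   # false alarms
    fn = np.sum((y_pred != cls) & (y_true == cls))   # missed detections
    tn = np.sum((y_pred != cls) & (y_true != cls))   # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0
    return f1, far

# Toy example: class 1 is predicted with one false alarm and one miss.
f1, far = per_class_f1_far([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], cls=1)
# f1 = 2/3, far = 0.5
```

A high F1 with a low FAR on a minority class thus means the rebalanced data let the classifier detect that class without raising extra alarms on the other classes.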

VI. CONCLUSION
In this article, we proposed an integrated deep learning model, called DB-CGAN, to realize robust data augmentation through collaborative adversarial training, which can effectively improve classification accuracy by correcting the distribution bias between the generated data and the original data for imbalanced learning in industrial IoT applications.
We first presented the design of a GAN-based data augmentation framework, in which a complementary classifier is connected with the basic GAN structure so as to realize two collaborative adversarial training processes: one between the data generator and the feature discriminator, and one between the data generator and the feature classifier. A conditional generator with random labels was then constructed and trained adversarially with the complementary classifier, which effectively enhances the augmentation of data samples in minority classes during a supervised training process. Importantly, a weight sharing scheme was newly developed in this model to enable the collaborative adversarial training among the generator, discriminator, and classifier; it shares feature weights between the GAN-based feature generator and the original feature extractor, thus facilitating the alleviation of distribution bias between the generated data and the original data. An augmentation algorithm was finally developed for intelligent anomaly detection in imbalanced learning, which effectively improves the classification accuracy based on the correction of distribution bias using the rebalanced data. Experiments were conducted using two real-world imbalanced datasets, NSL-KDD and UNSW-NB15. Compared with five baseline methods, including two classic sampling methods and three state-of-the-art GAN-based methods, evaluation results based on three classical algorithms (RF, SVM, and DNN) demonstrated the outstanding augmentation performance of our DB-CGAN in generating high-quality data in minority classes, which can significantly improve the accuracy of multiclass classification tasks, especially on extremely imbalanced datasets, owing to the effective alleviation of distribution bias in imbalanced learning.
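As a structural illustration of the weight sharing described above, the following minimal NumPy sketch shows two branches, a discriminator head and a classifier head, computing features through one shared weight matrix, so that an update coming from either adversarial game moves the common feature extractor. Layer sizes and names are placeholders of our own, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared feature-extractor weight matrix used by BOTH branches.
W_shared = rng.standard_normal((16, 8)) * 0.1
W_disc = rng.standard_normal((8, 1)) * 0.1   # real/fake head
W_clf = rng.standard_normal((8, 4)) * 0.1    # class-logit head (4 classes assumed)

def features(x):
    return np.tanh(x @ W_shared)             # shared feature mapping

def discriminate(x):
    return features(x) @ W_disc              # discriminator branch

def classify(x):
    return features(x) @ W_clf               # classifier branch

x = rng.standard_normal((2, 16))
d_score, c_logits = discriminate(x), classify(x)
# A gradient step on W_shared driven by either the GAN loss or the
# classification loss changes the features seen by the other branch,
# which is what couples the three players during training.
```

Because both branches read and write the same `W_shared`, the features learned from real labeled data and those learned from the generator's adversarial game are forced toward a common representation, which is the mechanism behind the distribution bias alleviation.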
In future studies, we will improve our algorithm with more efficient deep learning schemes, and implement our model in more complex industrial IoT environments to further enhance the augmentation performance for imbalanced learning.