Deep Generative Models in the Industrial Internet of Things: A Survey

Abstract: Advances in communication technologies and artificial intelligence are accelerating the paradigm of industrial Internet of Things (IIoT). With IIoT enabling continuous integration of sensors and controllers with the network, intelligent analysis of the generated Big Data is a critical requirement. Although IIoT is considered a subset of IoT, it has its own peculiarities in terms of higher levels of safety, security, and low-latency communication in an environment of critical real-time operations. Under these circumstances, discriminative deep learning (DL) algorithms are unsuitable due to their need for large amounts of labeled and balanced training data, uncertainty of inputs, etc. To overcome these issues, researchers have started using deep generative models (DGMs), which combine the flexibility of DL with the inference power of probabilistic modeling. In this article, we review the state of the art of DGMs and their applicability to IIoT, classifying the reviewed works into the IIoT application areas of anomaly detection, trust-boundary protection, network traffic prediction, and platform monitoring. Following an analysis of existing IIoT DGM implementations, we identify challenges (i.e., weak discriminative capability, insufficient interpretability, lack of generalization ability, generated data vulnerability, privacy concern, and data complexity) that need to be investigated in order to accelerate the adoption of DGMs in IIoT and also propose some potential research directions.

that are deployed to achieve high production rates with reduced operational costs through real-time monitoring, efficient management, and controlling of industrial processes, assets, and operational time [1]. IIoT is a subset of IoT which needs higher levels of safety, security, and reliable communication while considering real-time industrial operations and critical industrial environment. Moreover, IIoT pays attention to efficient management of industrial assets and operations along with predictive maintenance.
The recent breakthroughs in deep learning (DL) and hardware design empower many IIoT applications. DL offers advantages over traditional machine learning (ML) methods due to three characteristics: 1) generalizing the complicated relationship (such as temporal and spatial dependencies) of massive data collected from IIoT settings; 2) making good use of the massive data resource in IIoT since DL relies on Big Data for powerful training; and 3) automatically extracting effective features from IIoT data without laborious feature specification.
However, there are still a number of open challenges toward successfully implementing DL in IIoT networks and obtaining practical and reliable results.
1) Imbalanced datasets: The assumption of an abundance of both positive and negative samples does not hold in IIoT as much of a mechanical system's lifetime is in a normal state, with a short duration of faulty states. With mechanical components typically replaced or refurbished before reaching "end of life," manufacturing datasets typically have a skewed distribution, with the number of negative samples (normal state) outweighing the positive samples (faulty state) [2], [3].
2) Limited labeled data: Diverse operating conditions and fault modes for sensor data mean that obtaining labeled data is expensive and not always attainable [4], with 80% of the IIoT data being unlabeled [5].
3) Domain adaptation: While DL-enabled transfer learning has addressed domain adaptation, it is limited by its assumption of the source and target domains having the same input and output spaces. IIoT settings feature different input sensor signals and different sets of output labels [e.g., fault type, remaining useful life (RUL) range, etc.] across different machines [6].
4) ... and configuration of security measures, as a secured architecture based on segregation is more difficult [7]. While the deterministic industrial processes result in regular network patterns that facilitate intrusion detection systems (IDS), IIoT networks need more vantage points as traffic does not flow through one central point [7]. Moreover, in critical industrial scenarios where the entire training dataset cannot be disclosed or exchanged with a central server or other agents, federated learning [8] needs to be investigated, in which interconnected devices jointly refine the model parameters in a privacy-preserving manner [9].
5) Low-latency communication: Traditional optimization methods for cellular communications, which require exact models and assume stationary wireless fading channels, are difficult to apply in the dynamic IIoT environment due to the many synchronized processes in industrial settings and diverse quality of service (QoS) requirements [10], [11].
The above findings are summarized in Table I, which highlights the differences between IoT and IIoT along different aspects.
Discriminative models used in traditional DL, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or long short-term memory (LSTM) networks, draw a decision boundary in data space [12]. They provide excellent performance but need large labeled and representative datasets, as they require pretraining to provide precise outputs from the learning process [4]. A lack of representative data or an imbalanced dataset may make directly learning a target intractable via discriminative techniques [4], [12]. Data generation and data augmentation have been proposed as possible solutions to mitigate this risk [6]. Deep generative models (DGMs), which approximate the joint distribution of the target and training data and can therefore generate samples that resemble the real data and remain physically plausible, are thus being leveraged in IIoT applications [6], [13]. DGMs create a probability distribution similar to the original by learning a high number of correlations, as opposed to discriminative techniques that simply assign instances to their most probable classes. On one hand, learning the distribution of the data (generative classifiers) could potentially provide better performance than learning boundaries (discriminative classifiers); however, it is not always possible to infer the real distribution of the data, and it is sometimes approximated by a normal distribution, in which case outliers degrade the performance of the generative models.
On the other hand, the supervised learning nature of discriminative models means that they tend not to generalize well and can be prone to overfitting if insufficient data is available [6].
DGMs not only have DL's aforementioned benefits (feature extraction and relationship representation), but their data generation ability can aid prediction tasks in some IIoT applications where the collected IIoT data suffers from low usability caused by data incompleteness, (partially) unlabeled data, insufficient quantity, noisy data, etc. DGMs are also finding use in addressing the IIoT transfer learning or domain adaptation challenge, through adversarial approaches where a separate discriminator is used to align the distributions, to mitigate the domain difference [26]. In other words, DGMs can integrate the flexibility of DL and the inference power of probabilistic modeling, model the underlying distribution of the real data, and generate realistic "real" data in an unsupervised manner. They can be used as an upfront layer in a stacked network, providing classified data to subsequent discriminative models (e.g., to a subsequent RNN) to process the massive IIoT sequential data. The motivation of this survey arises from these aforementioned aspects of DGMs and their applicability to the domain of IIoT.
Since DGMs and deep neural networks (DNNs) are not mutually exclusive, we have studied existing surveys on both topics as well as those on IIoT. There are some recent studies that analyze the theoretical and implementation concepts of DGMs [12], [27], reviewing early DGM models such as Boltzmann machines, Gaussian mixture and hidden Markov models, autoencoders, and their variants. Pan et al. [28] presented a review of the generative adversarial networks (GANs) category of DGMs and detailed GAN variants from an architectural and objective function-based viewpoint. Similarly, other reviews on GANs focus on specific fields such as computer vision [29] or spatiotemporal data [30]. Emerging applications of DL algorithms such as CNN, vanilla autoencoders, restricted Boltzmann machines, and GANs in the IIoT are presented in [31]. Security aspects of IIoT are also the focus of recent studies, with [7] focusing on security challenges in IIoT and [32] on differential privacy applications.
Although these existing works review conventional DGMs, GANs, DL-based IIoT applications, and the security issues of IIoT, there is no work that reviews the applications of DGMs in the IIoT domain. Therefore, this survey mainly focuses on comprehensively reviewing the applications of DGMs in the IIoT domain. The comparison between existing related surveys and this one is explicitly shown in Table II.
The rest of this article is organized as follows. We review the state-of-the-art DGMs in Section II. We analyze their applications in IIoT scenarios in Section III. We identify existing challenges with respect to their applicability and present some promising research directions for solving the corresponding challenges in Section IV, which also concludes this article.

A. Overview of DGMs
DGMs fall under the class of unsupervised ML algorithms, which aim to extract meaningful concepts from raw data. DGMs enable an approximation of the distribution of the data through conditional density estimation, where the characteristics of the probabilistic generative models enable the uncertainty of data to be captured. They can be thought of as the combination of DL with graphical models. DNNs are based only on point estimates and make deterministic predictions, given some feature vectors. Most works on DNNs do not pay much attention to the complexity of these models. Probabilistic models, on the other hand, are mainly conjugate and linear models and can be said to have a simplicity bias. The basis of DGMs is to have the simplest hypothesis that can explain the data, is tractable, can compute expectations, and remove biases for underfitting/overfitting. By combining the probability distribution view of the dataset with a generative iterative process or Markov chains, DGMs can offer a unified framework for model building, inference, prediction, and decision-making. They are also robust to overfitting and have explicit accounting for uncertainty of data and variability of outcomes.
As a result, DGMs are seeing widespread adoption in many industries, such as those involving computer vision based automation, e.g., image generation/compression and super resolution and object detection within images with relevant bounding boxes (to enable self-driving cars); generating synthetic data to accelerate scientific experiments; and designing new experiments for particle physics or drug discovery. In the following sections, we present the three categories of DGMs, i.e., 1) autoregressive models (ARs), 2) variational autoencoder (VAE), and 3) GANs, outlining their architecture and popular variants with recent advances.

B. Autoregressive Models
An AR [33] is a specific regression model on a time series, in which a value from this time series is regressed on previous values from the same series. The first-order autoregression model, written as AR(1), can be presented as follows:
$$y_t = \beta_0 + \beta_1 y_{t-1} + \epsilon_t$$
where $y_t$ is a time-series variable $y$ measured at time $t$, $y_{t-1}$ is $y$ measured at time $t-1$, and $\epsilon_t$, $\beta_0$, $\beta_1$ denote the assumed error and the parameters in a simple linear regression model, respectively. Similarly, the second-order autoregression model, called AR(2), would be
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t$$
in which the time-series variable's value at time $t$ can be predicted from its values at times $t-1$ and $t-2$. More generally, a $k$th-order autoregression model, denoted as AR($k$), will be a multiple linear regression where the time-series variable's value at any time $t$ can be calculated by a linear function of the values at times $t-1, t-2, \ldots, t-k$, as shown in the following equation:
$$y_t = \beta_0 + \sum_{i=1}^{k} \beta_i y_{t-i} + \epsilon_t.$$
Neural autoregressive distribution estimation (NADE) [34] addresses the problem of modeling a joint distribution. To begin, the assumption is that the dimensions of $x$ are binary (i.e., $x_d \in \{0, 1\}\ \forall d$). To estimate the $D$-dimensional distribution $p(x)$, NADE begins by making the observation that $p(x)$ can be cast into a product of conditional 1-D distributions, in any order $o$ (a permutation of the integers $1, \ldots, D$):
$$p(x) = \prod_{d=1}^{D} p(x_{o_d} \mid x_{o_{<d}})$$
where $o_{<d}$ contains the first $d-1$ dimensions in ordering $o$ and $x_{o_{<d}}$ is the corresponding subvector for these dimensions. Therefore, an "autoregressive" generative model of the data can be obtained simply by specifying a parameterization of all $D$ conditionals $p(x_{o_d} \mid x_{o_{<d}})$. In NADE, we can model each conditional using a feed-forward neural network (NN). Specifically, $p(x_{o_d} = 1 \mid x_{o_{<d}})$ is parameterized as follows:
$$p(x_{o_d} = 1 \mid x_{o_{<d}}) = \sigma(V_{o_d,\cdot}\, h_d + b_{o_d}), \qquad h_d = \sigma(W_{\cdot,o_{<d}}\, x_{o_{<d}} + c)$$
where $\sigma$ is the sigmoid function, $H$ is the number of hidden units, and $V \in \mathbb{R}^{D \times H}$, $b \in \mathbb{R}^{D}$, $W \in \mathbb{R}^{H \times D}$, and $c \in \mathbb{R}^{H}$ are the parameters of the NN model.
Finally, NADE can be trained by maximum likelihood or, equivalently, by minimizing the average negative log-likelihood
$$\frac{1}{N}\sum_{n=1}^{N} -\log p(x^{(n)}) = \frac{1}{N}\sum_{n=1}^{N}\sum_{d=1}^{D} -\log p(x^{(n)}_{o_d} \mid x^{(n)}_{o_{<d}})$$
using the stochastic (minibatch) gradient descent method, where $N$ is the batch size.
Derived AR models: PixelCNN [35] is an autoregressive generative model based on CNNs, which models the conditional distribution of each pixel given the previously seen image pixel values; a gated CNN architecture is used to remember prior pixel values. PixelCNN++ [36] is a modified version of PixelCNN, in which some tweaks, including downsampling, dropout, and skip connections, are used to obtain better performance. PixelRNN [37] is also proposed in the same study, using 12 LSTM layers while adopting the convolutional approach in PixelCNN. PixelVAE [38] is proposed to combine the benefits of VAEs and PixelCNNs, in which a conditional PixelCNN can be exploited as the output of the VAE's decoder.
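Returning to the NADE parameterization and training objective above, the following is a minimal sketch, not the implementation of [34], of how the log-probability of a binary vector and the average negative log-likelihood of a batch could be computed. The function names (`make_params`, `nade_log_prob`, `average_nll`), the random untrained parameters, and the toy sizes are assumptions made purely for illustration. The loop reuses the hidden pre-activation across conditionals, which is possible because all conditionals share `W` and `c`.

```python
import itertools
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def make_params(D, H, seed=0):
    """Random (untrained) parameters: V is D x H, b has length D, W is H x D, c has length H."""
    rng = random.Random(seed)
    V = [[rng.gauss(0.0, 0.5) for _ in range(H)] for _ in range(D)]
    W = [[rng.gauss(0.0, 0.5) for _ in range(D)] for _ in range(H)]
    b = [rng.gauss(0.0, 0.5) for _ in range(D)]
    c = [rng.gauss(0.0, 0.5) for _ in range(H)]
    return V, b, W, c


def nade_log_prob(x, order, V, b, W, c):
    """log p(x) as a sum of log-conditionals, visited in the given ordering."""
    H = len(c)
    a = list(c)  # hidden pre-activation before any dimension has been observed
    log_p = 0.0
    for d in order:
        h = [sigmoid(a_j) for a_j in a]
        p_one = sigmoid(sum(V[d][j] * h[j] for j in range(H)) + b[d])
        log_p += math.log(p_one if x[d] == 1 else 1.0 - p_one)
        # Fold x_d into the shared pre-activation used by the next conditional.
        for j in range(H):
            a[j] += W[j][d] * x[d]
    return log_p


def average_nll(batch, order, params):
    """Average negative log-likelihood over a (mini)batch of binary vectors."""
    V, b, W, c = params
    return -sum(nade_log_prob(x, order, V, b, W, c) for x in batch) / len(batch)


if __name__ == "__main__":
    D, H = 4, 3
    params = make_params(D, H)
    order = [2, 0, 3, 1]  # any permutation of the dimensions is valid
    total = sum(
        math.exp(nade_log_prob(x, order, *params))
        for x in itertools.product([0, 1], repeat=D)
    )
    print("sum of p(x) over all binary vectors:", total)
    print("average NLL of a toy batch:", average_nll([(1, 0, 1, 1), (0, 0, 1, 0)], order, params))
```

Because each conditional is a valid Bernoulli distribution, the probabilities of all binary vectors sum to one for any parameter setting and any ordering, which gives a convenient sanity check for this kind of model.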
To sum up, basic AR models are applied to simple time-series data generation; the NADE model integrates the idea of autoregression with NNs to obtain better generalization performance when generating any data type; PixelCNN specifies the NN in the NADE model using CNNs; PixelCNN++ modifies PixelCNN with helpful tweaks to achieve better performance; PixelRNN is an RNN version; and PixelVAE is a VAE version of the NADE model. This series of image-oriented models is leveraged for image generation.
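For the basic AR models described at the start of this section, the sketch below fits an AR(1) model by ordinary least squares and produces multi-step forecasts. It is an illustrative example under simple assumptions (a univariate series and a closed-form least-squares fit); the helper names `simulate_ar1`, `fit_ar1`, and `forecast_ar1` are hypothetical and not taken from any of the surveyed works.

```python
import random


def simulate_ar1(beta0, beta1, y0, n, noise_std=0.0, seed=0):
    """Generate n values of y_t = beta0 + beta1 * y_{t-1} + eps_t."""
    rng = random.Random(seed)
    ys = [y0]
    for _ in range(n - 1):
        ys.append(beta0 + beta1 * ys[-1] + rng.gauss(0.0, noise_std))
    return ys


def fit_ar1(series):
    """Least-squares estimates of (beta0, beta1) from regressing y_t on y_{t-1}."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    beta1 = cov / var
    beta0 = mean_curr - beta1 * mean_prev
    return beta0, beta1


def forecast_ar1(last_value, beta0, beta1, steps):
    """Iterate the fitted recursion forward, with the error term set to zero."""
    out, y = [], last_value
    for _ in range(steps):
        y = beta0 + beta1 * y
        out.append(y)
    return out


if __name__ == "__main__":
    series = simulate_ar1(beta0=2.0, beta1=0.5, y0=0.0, n=200, noise_std=0.1)
    b0, b1 = fit_ar1(series)
    print("estimated beta0, beta1:", b0, b1)
    print("3-step forecast:", forecast_ar1(series[-1], b0, b1, steps=3))
```

An AR(k) fit follows the same pattern with k lagged regressors, at which point a linear-algebra routine for the multiple regression is usually preferable to the closed form used here.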

C. Variational Autoencoders
VAEs [39]- [41] are deep Bayesian networks using NN, specifically multilayer perceptrons. Hence, they can support complex data distributions with fast training via backpropagation. The goal of VAEs is to find the hidden/latent variables to simplify the generation task. For example, in a task of generating faces, the pose or the color of the eyes is not annotated (latent). An autoencoder is formed by two NNs. The first one is an encoder that codifies the input into a latent vector and the second one is a decoder that converts the latent vector into an output that replicates the input. To properly generate data, the autoencoders need to have a regularized latent space in which all the points in the latent space are meaningful. VAEs solve this issue by encoding the input not into latent points but into distributions over the latent space. Then, the distribution is sampled as points to feed the decoder (see Fig. 1). In this way, VAEs avoid overfitting and enhance the decoder to function as a generator of meaningful data.
Formally, the encoder is represented by $q(z_i) = P(z_i \mid x_i, \theta)$. Its input is $x_i$ and its output is the distribution over the latent space $Z$. A sample of this distribution is the input of the decoder, which computes $P(x_i \mid z_i, \theta)$. For every point $x$ in the dataset, there exists at least one vector of latent variables $z$ able to generate something similar to $x$ with a deterministic function $f(z; \theta)$, parametrized by a vector $\theta$. VAEs aim at optimizing $\theta$ such that $z$ can be sampled from the probability density function of $z$, $P(z)$, and $f(z; \theta)$ will generate $x$; using the law of total probability, $f(z; \theta)$ is replaced by the distribution $P(x \mid z; \theta)$. Hence, the aim of VAEs is to maximize the probability of each $x$ in the training set (the marginal $P(x)$) according to
$$P(x) = \int P(x \mid z; \theta)\, P(z)\, dz.$$
However, this marginal is intractable, especially in high-dimensional space. To make the optimization tractable, VAEs use the encoder to find the $z$ that is likely to reconstruct $x$, i.e., the posterior $P(z \mid x; \theta)$. Hence, the problem is to find a tractable model distribution $q(z)$ to approximate the true posterior $P(z \mid x)$, via variational inference. To that end, VAEs need to reduce the divergence (an asymmetric distance) between the two probability distributions $q(z)$ and $P(z \mid x)$, measured with the Kullback-Leibler (KL) divergence
$$\mathrm{KL}\big(q(z)\,\|\,P(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\ln q(z) - \ln P(z \mid x)\big]$$
where $\mathbb{E}_{q(z)}$ represents the expectation under the distribution $q(z)$.
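Hence, the problem is to minimize the KL divergence $\mathrm{KL}\big(q(z)\,\|\,P(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\ln q(z) - \ln P(z \mid x)\big]$. Applying Bayes' rule, $P(z \mid x) = P(x \mid z)\,P(z)/P(x)$, gives
$$\mathrm{KL}\big(q(z)\,\|\,P(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\ln q(z) - \ln P(x \mid z) - \ln P(z)\big] + \ln P(x). \tag{10}$$
Rearranging (10) gives
$$\ln P(x) = \mathrm{KL}\big(q(z)\,\|\,P(z \mid x)\big) + \mathbb{E}_{q(z)}\left[\ln \frac{P(x, z)}{q(z)}\right]. \tag{11}$$
The left-hand side of (11) is a constant ($\ln P(x)$), which is equal to the sum of a term we want to minimize (the KL divergence) and a term that is the variational lower bound or evidence lower bound (ELBO). The ELBO is a lower bound on the log-probability of observing some data under a model, which is used as an optimization criterion for approximating the posterior distribution. So, the problem can be converted into maximizing the ELBO
$$\mathbb{E}_{q(z)}\left[\ln \frac{P(x, z)}{q(z)}\right] = \mathbb{E}_{q(z)}\big[\ln P(x \mid z)\big] - \mathrm{KL}\big(q(z)\,\|\,P(z)\big).$$
To reduce the complexity, $P(x \mid z; \theta)$ is often chosen as a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ as follows: $P(x \mid z; \theta) = \mathcal{N}(x \mid f(z; \theta), \sigma^2 I)$, or as a Bernoulli distribution for binary data. Hence, $\mathbb{E}_{q(z)}[\ln P(x \mid z)]$ corresponds to the reconstruction error. Assuming a normal distribution, the covariance is a diagonal matrix (stored as a vector) and the encoder generates the mean and variance of $z$ as vectors. To simplify the calculations, Kingma and Welling [39] proposed to reparametrize $z$ as $z_i = \mu_i(x) + \epsilon_i\, \sigma_i(x)$, where $\sigma_i(x)$ is the standard deviation and $\epsilon_i \sim \mathcal{N}(0, 1)$ introduces noise to allow the generation of unseen data (see Fig. 1). This trick reduces variance in the gradients and, hence, allows the use of stochastic gradient descent [41] and error back-propagation [42] in the NN. Therefore, VAEs learn the probabilistic generative model $p_\theta(x \mid z)$ (decoder) as well as an approximated posterior distribution $q_\phi(z \mid x)$ (encoder) by maximizing the ELBO
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\ln p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big).$$
VAEs allow building flexible models, but, due to the diagonal covariance, they cannot capture fine-grained details as autoregressive networks do. VAEs have been used to recognize and generate complex data, mainly images [38], [43], such as handwritten digits [39], [42], [44], faces [39]-[41], [45], and house numbers [43], [46], to rotate or modify the light of an image [45], to handle noisy images [42], and even to predict the next frame in a video [47].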
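The two ELBO terms and the reparametrization step can be sketched in a few lines. The example below is illustrative only: it assumes a standard normal prior on $z$, a Gaussian likelihood with fixed observation noise, and a caller-supplied `decoder` function standing in for the decoder network; all function names are hypothetical. The KL term uses the well-known closed form for a diagonal Gaussian against a standard normal, and the reconstruction term is estimated by Monte Carlo sampling.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """z_i = mu_i + eps_i * sigma_i with eps_i ~ N(0, 1); sigma is a standard deviation."""
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))


def gaussian_log_likelihood(x, x_hat, obs_sigma):
    """ln N(x | x_hat, obs_sigma^2 I), the reconstruction term for one sample of z."""
    var = obs_sigma ** 2
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (xi - xh) ** 2 / (2.0 * var)
        for xi, xh in zip(x, x_hat)
    )


def elbo_estimate(x, mu, sigma, decoder, obs_sigma, rng, n_samples=1):
    """Monte Carlo ELBO: E_q[ln p(x|z)] - KL(q(z|x) || p(z))."""
    recon = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, sigma, rng)
        recon += gaussian_log_likelihood(x, decoder(z), obs_sigma)
    return recon / n_samples - kl_to_standard_normal(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy model (assumption): z ~ N(0, 1), x | z ~ N(z, 1), identity decoder.
    x = [1.0]
    exact_posterior = elbo_estimate(x, [0.5], [math.sqrt(0.5)], lambda z: z, 1.0, rng, 20000)
    poor_posterior = elbo_estimate(x, [-1.0], [1.0], lambda z: z, 1.0, rng, 20000)
    log_px = -0.5 * math.log(2.0 * math.pi * 2.0) - 0.25  # ln N(1 | 0, 2)
    print("ELBO with the exact posterior:", exact_posterior)
    print("ELBO with a poor posterior:   ", poor_posterior)
    print("ln P(x):                      ", log_px)
```

In this linear-Gaussian toy model the posterior is available in closed form, so the gap between $\ln P(x)$ and the ELBO (the KL divergence between $q(z)$ and the true posterior) can be observed directly: it vanishes, up to sampling noise, when $q$ equals the true posterior.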

D. Generative Adversarial Networks
In contrast to autoregressive and VAE models that are likelihood-based, GANs are likelihood-free generative models, which combine a generator and discriminator in the same network. First proposed by Goodfellow et al. [57] in 2014, GANs are based on game theory, with the generator G θ learning the data distribution via unsupervised learning, to create realistic adversarial samples, and the discriminator D φ (or the critic) classifying it as real or fake (simulated).
During learning, the generator and discriminator are updated alternately. $G_\theta$ is a directed latent variable model that generates samples $x$ from $z$, where $x$ denotes samples from the input data or the generator and $z$ is the noise input. The discriminator tries to distinguish samples of the real dataset from those of the generator by maximizing the two-sample test objective (which tests whether $p_{\mathrm{data}} = p_\theta$), i.e., by minimizing $D(G(z; \theta_g))$ for generated samples, which are drawn from $p_z$ rather than from $p_{\mathrm{data}}$. The architecture of a GAN is shown in Fig. 2.
$G_\theta$ minimizes a two-sample test objective ($p_{\mathrm{data}} = p_\theta$), which is equivalent to minimizing $1 - D(G(z; \theta_g))$, as $D$ is a binary classifier. Thus, overall, GANs have a minimax learning objective
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z; \theta_g))\big)\big].$$
The implementation of GANs can prove challenging due to 1) their unstable optimization procedure, where the generator and discriminator losses continue to oscillate without converging, and 2) their potential for mode collapse, with the generator producing one of a few types of samples over and over again. Some studies [28] have proposed that the discriminator use a minibatch layer to reflect the diversity of the samples and thereby avoid mode collapse.
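To make the minimax objective of the original GAN formulation concrete, the sketch below estimates the value function from discriminator outputs on a batch of real and generated samples. It is a minimal illustration: the function names are hypothetical, and the discriminator outputs are supplied directly as probabilities rather than produced by trained networks.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_objective(d_fake):
    """The part of V that depends on G; the generator tries to minimize it."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A discriminator that separates real from generated samples well (high V).
    print("confident D:", gan_value([0.9, 0.8, 0.95], [0.1, 0.2, 0.05]))
    # A discriminator that cannot tell them apart outputs 0.5 everywhere,
    # which gives V = -log 4, the value reached when p_theta matches p_data.
    print("undecided D:", gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]))
    print("-log 4     :", -math.log(4.0))
```

The discriminator update pushes this value up, the generator update pushes it down, and the alternation between the two is the source of the oscillating losses mentioned above.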
Derived GAN models: With the G and D networks being multilayer perceptrons in the original GAN model, various derived GAN architectures have been proposed in order to improve the performance in terms of data diversity, data quality, and more stable training [28]-[30]. Deep convolutional GANs [58] apply CNNs in the generator and critic, for better image feature extraction. Conditional GANs (CGANs) [59] seek to address mode collapse by introducing a conditional variable c in both the generator and discriminator. The input to the discriminator thus becomes G(z|c) from the generator, with the real sample also derived from c. Other approaches to avoid mode collapse include those that combine the adversarial loss of GANs with the objective function of VAEs [60], by replacing the VAE decoder with the GAN generator, and Wasserstein GAN (WGAN) [61], where a new loss function derived from the Wasserstein distance is used and D is used to score data quality by estimating the Wasserstein metric between the generated and original data distributions. For unsupervised image-to-image translation, CycleGAN [62] has been proposed to learn the mapping between an input image and an output image, where paired training data may not be available. Self-attention GAN (SAGAN) [63] includes self-attention layers in the G and D networks, allowing the model to learn global, long-range dependencies for generating images, especially in multiclass image generation. D checks that features in distant parts of the image are consistent (e.g., the nose and ears are in the right place on the face). Your Local GAN (YLG) [64] enhances SAGAN by making the networks as sparse as possible for computational and statistical efficiency. YLG introduces a new local sparse attention layer that preserves the geometry and locality. Multiscale gradient-GAN [65] creates multiscale connections between G and D, which allows the gradients to flow at multiple resolutions simultaneously.
This enhances the adaptation to different datasets, which is uncommon in GANs due, in part, to instability during training because there is not enough overlap between the real and fake distributions. Another approach to improving the performance of GANs is to enhance the loss function. f-GAN [66] uses a more general notion of distance, the f-divergence, which includes Jensen-Shannon and total variation as special cases, for training generative neural samplers. RealnessGAN [67] represents the concept of realness as a distribution rather than a single scalar (real or generated). Loss sensitive GAN [68] introduces a loss function to quantify the quality of generated samples, keeping the loss of the real sample smaller than that of a generated counterpart.
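The remark about insufficient overlap between real and fake distributions can be illustrated with two of the divergences named above. The sketch below computes the Jensen-Shannon divergence and the total variation distance for discrete distributions given as lists of probabilities; the function names and the toy histograms are assumptions for illustration and are not drawn from [66].

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    real = [1.0, 0.0, 0.0, 0.0]
    fake_near = [0.0, 1.0, 0.0, 0.0]  # generated mass one bin away from the real mass
    fake_far = [0.0, 0.0, 0.0, 1.0]   # generated mass three bins away
    # With no overlap, both divergences saturate (JS at ln 2, TV at 1), so they
    # cannot tell the "near" generator from the "far" one.
    print(js_divergence(real, fake_near), js_divergence(real, fake_far), math.log(2.0))
    print(total_variation(real, fake_near), total_variation(real, fake_far))
```

Because these measures are flat whenever the supports are disjoint, they carry no information about how far the generated distribution is from the real one in that regime.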
To address the federated learning challenge in privacy-preserving scenarios, the authors in [69] and [70] have proposed distributed GANs, where a number of cooperating agents learn the GAN task in a decentralized manner without sharing their data with any central server or among themselves. The work in [69] is applied to a distributed IDS with the agents sharing the weights of their D models, while in brainstorming GAN [70], the GAN value function is modified to a brainstorming function to integrate the generated data points across neighboring agents.

III. APPLICATIONS OF DGMS IN INDUSTRIAL IOT
We survey some representative IIoT application domains to which deep generative methods have been applied and demonstrated notable performance improvement. They are in four domains, which include the following: 1) anomaly detection; 2) trust-boundary protection; 3) network traffic prediction; 4) platform monitoring. All references are summarized in Table III.

A. Anomaly Detection
Anomaly detection approaches aim to learn the system behavior under normal operating conditions to be able to identify later system states that are dissimilar. Both VAEs and GANs have been applied for anomaly detection by learning the induced distribution and subsequently asserting if a sample is part of the distribution by mapping it to the closest sample in the generated distribution. VAEs tackle this mapping by adapting an autoencoder learnt during the training [71]. A GAN-based anomaly detection algorithm for imbalanced industrial time series datasets has been proposed in [72], with an encoder-decoder-encoder structured G network with convolutional layers. Only normal samples with elaborately extracted features are used in model training. The model outputs anomaly scores comprising apparent and latent loss, with fault samples generating much higher anomaly scores.
Considering that large volumes of multidimensional data are generated in 6G IIoT, the authors in [14] designed an autoregressive exogenous model (ARX) for eliminating the noise in data for anomaly detection, and a multidimensional data relationship diagram is creatively used to characterize the spatiotemporal correlations among heterogeneous data. The authors in [73] applied CGANs to search for security anomalies, noting that the discriminator needs to be trained for more steps than the generator to ensure that their loss curves converge.
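The general recipe described in this subsection (learn normal behavior, then score how far a new sample lies from it) can be sketched as follows. This is a deliberately simplified illustration and not the method of any cited work: the "generative model" is a hypothetical stand-in that reconstructs every input as the mean of the normal training data, the anomaly score is the squared reconstruction error, and the quantile-based threshold is an assumed calibration rule. In practice, `reconstruct` would be replaced by a trained VAE or GAN generator.

```python
def fit_mean_model(normal_samples):
    """Stand-in for a trained generative model: reconstructs any input as the normal mean."""
    n = len(normal_samples)
    dim = len(normal_samples[0])
    mean = [sum(s[i] for s in normal_samples) / n for i in range(dim)]
    return lambda x: list(mean)


def anomaly_score(x, reconstruct):
    """Squared reconstruction error between x and its reconstruction."""
    x_hat = reconstruct(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def fit_threshold(normal_scores, quantile=0.99):
    """Pick the score below which roughly the given fraction of normal samples fall."""
    ordered = sorted(normal_scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def is_anomalous(x, reconstruct, threshold):
    return anomaly_score(x, reconstruct) > threshold


if __name__ == "__main__":
    # Toy "normal operation" readings clustered around (1.0, 2.0).
    normal = [[1.0 + 0.01 * i, 2.0 - 0.01 * i] for i in range(-5, 6)]
    reconstruct = fit_mean_model(normal)
    threshold = fit_threshold([anomaly_score(x, reconstruct) for x in normal])
    print("normal reading flagged:", is_anomalous([1.01, 1.99], reconstruct, threshold))
    print("faulty reading flagged:", is_anomalous([3.0, 0.0], reconstruct, threshold))
```

Only normal samples are used for fitting and calibration, which mirrors the imbalanced setting discussed above where fault data are scarce.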

B. Trust-Boundary Protection
Trust-boundary techniques are applied in IIoT to segment the networks, with IIoT processes and data storage separated into different segments based on user access privilege [17]. The authors in [88] use the GAN model to generate adversarial samples for aiding the design of trust-boundary protection mechanism against adversarial attacks. However, the distribution of noisy inputs of this GAN model largely differs from real data distribution in IIoT networks.
Therefore, Hassan et al. [17] proposed a downsampler encoder-based cooperative data generator to ensure better extraction of the real distribution of IIoT network data in attack models, which is updated and verified using a DNN discriminator to guarantee its robustness, following the idea of GAN's adversarial training. In [18], they further presented an adaptive trust-boundary protection mechanism for IIoT networks using a semisupervised model based on DL feature extraction, which avoids manual effort to update the attack databases and automatically learns the rapidly changing nature of unknown attack models by using unsupervised learning and unlabeled data from the wild.
The large number as well as the heterogeneity of devices and communication protocols contribute to the large attack surface problem in IIoT networks. Trust-boundary protection, thus, uses intrusion detection as a core technique to control access levels [17]. To this end, Deep-IFS [20] is a forensics-based DL model to detect intrusions in IIoT traffic. Deep-IFS learns local representations using a local gated recurrent unit (LocalGRU) and captures global representations using multihead attention (MHA). The use of two autoregressive units improves the robustness of the Deep-IFS model for intrusion detection on IIoT traffic in a fog environment. In [74], the authors use conditional VAEs to detect network intrusions, using the labels in the training data as an extra input in the decoder to improve the accuracy. DIGFuPAS [75] aims to increase the ability of IDS against adversarial attacks by using a WGAN to repetitively retrain classifiers from crafted network traffic flow. ARIES [76] is a multilayered IDS that integrates an unsupervised GAN with a supervised decision tree and support vector machine (SVM). The first layer classifies attacks such as denial of service, brute force, port scanning attacks, etc., in a supervised manner, with the second and third layers identifying packet and operating data abnormalities.
Special attention needs to be paid to the radio frequency (RF) fingerprinting protection. In IIoT, as there can be numerous devices transmitting their data over RF, they could be subject to attacks where a fake device supplants the identity of a genuine device and transmits malicious data. RF fingerprinting is used to verify the identity of a device based on imperfections of the transmitters, such as in-phase and quadrature signal (IQ) imbalance, amplifier nonlinearity, digital-to-analog converter nonlinearity, carrier frequency offsets, and oscillator drift. In [77], the authors use GAN first to generate malicious data that simulates an authorized device that exists. Then, they use a GAN again to overcome this vulnerability by overtraining the authorized device with generated data.

C. Network Traffic Prediction
End-to-end network traffic is essential information for many network security and management functions in IIoT, so network traffic prediction is not a trivial issue. Cellular traffic optimization for meeting the low-latency requirements in IIoT scenarios is an open problem [10]. Moreover, QoS is sensitive to packet size distributions, packets' interarrival time, and channel fading. Motivated by this, Nie et al. [19] proposed an effective prediction mechanism using a multitask learning architecture and an autoregressive unit, which takes advantage of link loads as additional information to improve network traffic prediction accuracy. To address the issue of limited data samples in channel fading models, Liu et al. [11] applied the GAN model to learn the wireless channel distributions in a factory environment and schedule the controller-to-actuator downlink transmissions accordingly, while also taking into account nonstationary channel fading. GANs have also been employed in [78] to propose a wireless channel modeling framework, with the results offering a good approximation of a real wireless channel. The applicability of GANs in traffic classification scenarios with <20% labeled traffic flows has been demonstrated in [79], by mapping raw traffic data to representation features in a lower dimensional feature space. GANs have also found application in demand-aware resource allocation by network slicing in a 5G cellular environment [21], to meet the diverse QoS needs over a common physical infrastructure, where a GAN-powered deep distributional Q network has been proposed to approximate the action-value distribution.

D. Platform Monitoring
IIoT integration has been a major growth factor for multinational companies, especially in the oil and gas industry. To this end, Sonawane et al. [80] presented a multivariate regression model to predict the future production performance of oil wells based on monthly production time series data, which ensures that oil and gas owners can monitor the equipment at a fine granularity. Similarly, [23] and [24] used an AR on IIoT device data to forecast the value of oil production, to help in detecting anomalous values and to provide an idea about any flaws in the oil well. In [81], an AR is integrated with a DL model (LSTM) to realize artificial lift mechanisms like beam pumping, hydraulic pumping, electric submersible pumping, and gas lifts, while making sure that the oil and gas production is predicted accurately. In addition, IIoT can also be used to control a paste thickener [22], in which an NN-based model predictive control scheme is implemented over an IIoT platform with the help of an autoregression unit (attention RNN). The authors in [25] proposed an NN-based nonlinear autoregressive with external input (NARX) model to predict syngas heating value and hot flue gas temperature for monitoring a waste-to-energy plant by using data collected by IIoT. Moreover, in [82], an RNN-based VAE is used to detect motor faults by using motor vibration time-domain signals, while [83] uses a VAE for fault detection in a hot strip mill process. There, the authors first extract quality-related latent variables using a deep variational information bottleneck, which minimizes the mutual information between latent variables and observations while maximizing the mutual information between latent variables and process quality.
Furthermore, with the help of local interpretable model-agnostic explanations (LIME), which can interpret black-box NNs, [84] proposed a VAE-LIME model for interpreting the models that forecast the temperature of the hot metal produced by a blast furnace. In [85], the generative-discriminative model pair in a GAN drives an active learning-based automatic labeling method for the voltage dip sequences used to train a voltage dip classification system.
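The core idea behind LIME, as referred to above, is to explain a single black-box prediction by fitting a simple, proximity-weighted linear model to the black box around that input. The sketch below is a simplified stand-in, not the `lime` library and not the VAE-LIME procedure of [84]: the Gaussian perturbation scheme, the kernel width, and the toy temperature model `black_box` are assumptions chosen for illustration.

```python
import numpy as np

def local_linear_explanation(predict, x, n_samples=500, scale=0.1,
                             kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate to `predict` around the point x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Perturb the instance with Gaussian noise (assumed sampling scheme).
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # Exponential kernel: perturbations closer to x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares, implemented by scaling rows with sqrt(weight).
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]  # surrogate value at x, per-feature local slopes

def black_box(Z):
    """Hypothetical temperature model standing in for a trained NN."""
    return 1500.0 + 40.0 * Z[:, 0] - 25.0 * Z[:, 1] + 5.0 * Z[:, 2] ** 2

x0 = np.array([0.5, 0.2, 0.0])
intercept, slopes = local_linear_explanation(black_box, x0)
print(slopes)  # local importance of each input feature near x0
```

The slopes approximate the local sensitivity of the prediction to each input; for this toy model the third feature has almost no local effect at `x0`, because its quadratic term is flat there.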
The data generation ability of GANs can address the problems of fault data unavailability and imbalanced datasets in manufacturing IIoT and has thus found use in predictive maintenance functions. Behera et al. [3] proposed a novel prognostics system based on CGAN and a deep gated recurrent unit (GRU) to generate multivariate fault instances for predicting the RUL of manufacturing components, while CGAN and Wasserstein CGAN (WCGAN) are benchmarked in [2] for generating synthetic faulty samples for trucks' air pressure systems (APS). Adversarial training with GANs to optimize both real and fault data for fault detection in gearboxes is proposed in [86], while GANs combined with a CNN are applied to fault detection in rotating machinery in [87].

IV. CONCLUSION
In the following paragraphs, we highlight several challenges and open issues that need to be investigated to accelerate the adoption of DGMs in IIoT, outline future research directions, and then conclude the article.
Limited Expressive Power: Although DGMs are promising for data generation and for assisting prediction tasks, their power is limited by relatively fixed network architectures and stringent requirements on their input. The data collected from IIoT, however, usually contain noise and abnormal values and arrive in unexpected formats. Therefore, the collected data should be preprocessed before being fed into DGMs, and the DGM architectures may need to be redesigned to suit specific scenarios.
Weak Discriminative Capability: Existing DGMs fail to achieve the expected performance on sophisticated structured probabilistic models and on completely unsupervised tasks (e.g., mode collapse of GANs). Combining different kinds of DGMs is a promising direction for further improving generation performance, and semisupervised approaches can be considered to alleviate the side effects of completely unsupervised training.
Insufficient Interpretability: DGMs lack sufficient interpretability because the latent vector used for generation is hard to interpret; as a result, it is difficult to capture the semantic meaning of the generated data. Approaches that improve the understanding of latent vectors should be a focus of future DGM research. Interpretable latent vectors would enable controllable generation, since the semantic meaning of the generated data would then be understood.
Lack of Generalization Ability: Trained DGMs usually lack generalization ability, since they can only generate data samples that conform to the training dataset and cannot generate new samples that are dissimilar to those in the training dataset. For example, a DGM trained on a dataset containing cat and dog images can only be used to generate cat or dog images and cannot generate bird images. In practice, however, it is difficult to collect a comprehensive training dataset, which limits the generalization ability of DGMs. To this end, continual learning can be considered as a way to improve the generalization ability of DGMs.
Generated Data Vulnerability: The data generated by DGMs are not as good as real data, which makes it possible to distinguish generated data from the real data collected from IIoT with the help of ML techniques. Adversarial training can be leveraged to avoid such detection, by training the generator until the generated data pass the corresponding detection models.
Privacy Concern: DGMs use large real-world datasets from IIoT applications to generate IIoT data, which unavoidably raises many privacy concerns. Therefore, a privacy protection mechanism should be an indispensable component of any feasible privacy-preserving DGM, in order to prevent privacy leakage while maintaining the performance of IIoT data generation. There have been encouraging developments in this direction through distributed GANs [69], [70]; however, in the presence of unreliable wireless links and limited resources on IIoT devices, the optimization of scheduling and bandwidth allocation remains an open issue in IIoT privacy-preserving federated learning.
Data Complexity: Data collected from IIoT are massive and come from multiple sources. On the one hand, massive IIoT data bring the challenge of time and structure complexity, so more time-efficient and lightweight DGM architectures should be designed to handle the voluminous input data while maintaining generation performance. On the other hand, DGMs should evolve to generate multisource data in IIoT while managing possible conflicts between different data sources. Related to this is the issue of energy efficiency, considering that model performance optimization can quickly drain energy in low-powered IIoT devices [4], especially in the decentralized cases noted above, where the computation is done on the devices. A promising development in this direction is GAN-powered compressed sensing [89], which enables energy-constrained IIoT sensors to sense signals efficiently without requiring high-rate samplers, minimizing energy consumption. This needs to be supported by the development of models that can be trained to infer useful information directly from the compressed data, without actually decompressing it.
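The sampling saving that motivates compressed sensing, mentioned above under Data Complexity, can be illustrated with a classical recovery routine. The sketch below uses orthogonal matching pursuit (OMP) on a noiseless synthetic signal; it is not the GAN-powered scheme of [89], and the signal length, number of measurements, sparsity level, and random Gaussian sensing matrix are assumptions made for the example.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily recover a sparse x from y = A @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Select the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        support.append(idx)
        # Re-fit on all selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m = 128, 48                            # signal length, number of measurements (assumed)
x = np.zeros(n)
x[[5, 20, 47]] = [1.5, -2.0, 1.0]         # 3-sparse signal, e.g. a few active frequencies
A = rng.normal(size=(m, n)) / np.sqrt(m)  # random sensing matrix (assumed)
y = A @ x                                 # the sensor transmits only m < n values

x_hat = omp(A, y, sparsity=3)
print(np.max(np.abs(x_hat - x)))          # reconstruction error at the receiver
```

In this setup the sensor acquires 48 values instead of 128, and the reconstruction cost is moved to the receiver, which is the trade-off that makes the approach attractive for energy-constrained devices.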
DGMs, and specifically networks incorporating adversarial training, have received much recent research attention owing to their ability to learn the underlying data distribution. As a result, DGMs have great potential in IIoT scenarios. In this article, we presented the state-of-the-art DGMs for IIoT and detailed the applications of DGMs in IIoT. We also outlined several outstanding research challenges and identified future directions. We believe that this survey will motivate IIoT and DGM researchers to further investigate this exciting research topic and to develop more creative and computationally efficient DGMs for IIoT applications.