LSTM Neural Networks With Attention Mechanisms for Accelerated Prediction of Charge Density at Onset Condition of DC Corona Discharge

The onset process of corona discharge is inherently nonlinear and dynamic. Conventional physics-based onset models and numerical computation of the onset charge distribution are limited by computational power and computing time. Here, to model this highly nonlinear dynamic process efficiently, a long short-term memory (LSTM) neural network with attention mechanisms is proposed for accelerated charge density prediction under different atmospheric conditions. The model adaptively chooses the charge-related input variables at each time step and the hidden states related to charge density across all time steps. Our results demonstrate that the well-trained model can make near-instant predictions with high accuracy under given target atmospheric conditions, and that it substantially reduces the computing time compared with physics-based methods. This work provides insights into applying LSTM neural networks to the charge density prediction of other discharge modes as well.


I. INTRODUCTION
Corona discharge is a nonequilibrium electrical discharge that occurs in the non-uniform electric field around sharp electrodes. It plays a vital role in a wide range of industrial applications, such as electrostatic precipitators [1], plasma reactors [2], ionic wind [3], plasma propulsion engines [4], and plasma spectroscopy [5]. Corona discharge commonly forms when the surface electric field of the electrode reaches the onset electric field, which is commonly realized with a rod-plane electrode or a coaxial cylindrical electrode [6], [7], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Ilaria Boscolo Galazzo.
In recent years, to reduce the energy consumption of corona devices and to suppress the corona loss of high-voltage power apparatus, the manipulation of corona onset voltage has attracted considerable attention [9], [10], [11]. Predicting the onset characteristics and conditions, in order to make use of the onset phenomenon, has become important for scientific research and industrial applications. In recent decades, there has been extensive interest in investigating the onset electric field and physics-based onset models [12], [13], [14], [15]. The onset electric field formulas are largely empirical expressions derived from experimental measurements, such as the well-known Peek's formula [16]. However, these empirical formulas cannot capture the variation of the onset field with the surrounding atmospheric conditions and the electrode radius. The physics-based onset models are deduced from avalanche growth, the self-sustained mechanism and photoionization. Although extensive computational effort has been invested and some progress has been made, these methods are time-consuming and require intensive computing resources. The prediction of charge density properties is a highly nonlinear and complex problem.
FIGURE 1. Illustration of the attention-based LSTM neural networks. The attention weight $\alpha_t^k$ is computed by the spatial attention ($U$ and $W$ are parameters), the softmax layer and the hidden state $h_{t-1}$. The new input $\tilde{x}_t$ is then fed into the encoder LSTM cell. The attention weight $\tilde{\alpha}_t^k$ is computed by the temporal attention ($\tilde{U}$ and $\tilde{W}$ are parameters), the decoder hidden state $\tilde{h}_{t-1}$ and the softmax layer. The context vector $\tilde{c}_{t-1}$ is a sum of attention weights multiplied by the decoder hidden state and is fed to the decoder LSTM cell. $\tilde{y}_T$ is the prediction.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition, the atmospheric conditions (i.e., humidity, temperature, and pressure) have a great influence on corona onset. Many obstacles remain to accurately predicting the charge density during the onset process under different atmospheric conditions.
Machine learning, especially deep learning, has become ubiquitous in practical applications ranging from spoken-word recognition to modern scientific research [17], [18]. Researchers in the plasma and corona disciplines increasingly accumulate data from combinatorial experiments [19], [20] and physical model-based computations [21], [22]. Machine learning algorithms have been deployed to find potentially valuable patterns in these data and to predict corona characteristics [23], [24], [25], [26]. Recent work has applied neural networks to plasma discharge, including plasma catalysis [27], [28], discharge process condition sensing [29], and plasma chemical reactions [30]. Long short-term memory (LSTM) is a recurrent neural network architecture that has been successfully applied to nuclear plasma and to partial discharge of apparatus. LSTM has been used for controlling plasma placement in a tokamak [31], density limit disruption prediction on the Experimental Advanced Superconducting Tokamak [32], recognition of anomalous discharge patterns in nuclear fusion [33], and plasma confinement mode classification [34]. In addition, LSTM has been applied to partial discharge detection and pattern recognition for insulated overhead conductors and gas-insulated switchgear [35], [36], [37]. These works have achieved significant improvements in discharge prediction through neural networks. However, few studies focus on the prediction of onset characteristics under different atmospheric conditions, and the application of recurrent models to the onset of corona discharge remains limited.
The main contributions of this article are as follows: (1) We propose a long short-term memory (LSTM) neural network with spatial and temporal attention to model the charge density behavior of DC corona discharge during primary avalanche growth. (2) We propose the Hausdorff metric to measure the similarity between the predicted and the numerically calculated distributions of charge density. (3) We analyze the spatial and temporal attention and the effectiveness of the proposed model, which is compared with other traditional sequential models. (4) We analyze the impacts of temperature, absolute humidity, and pressure on the charge distribution of corona discharge, aiming at reliable prediction of atmospheric impacts.

II. LSTM BASED AUTOENCODER ARCHITECTURE AND SPATIOTEMPORAL ATTENTION
A. MODEL ARCHITECTURE
The overall structure of the proposed method is an autoencoder (encoder-decoder) architecture. This architecture, comprising an encoder and a decoder, is a competitive neural sequence transduction model commonly used to address sequential problems [38]. In the encoder-decoder framework, the encoder is built from stacked RNNs, gated recurrent neural networks or LSTMs. The input sequence is encoded into a hidden state (a fixed-dimensional vector) that encapsulates the useful information of the input sequence. The decoder, also built from an RNN or an RNN variant, then maps this encoding to the output sequence. The length of the input sequence may differ from that of the output sequence. For our study, Figure 1 illustrates the proposed attention-based LSTM neural network for sequence-to-sequence prediction, which belongs to the encoder-decoder architecture. To resolve the issue of different input and output sequence lengths, a fixed-length context vector is produced from the hidden state and a mapping function. However, as the length of the input sequence increases, the performance of encoder-decoder neural networks deteriorates rapidly, because all the information of the input sequence must be compressed into this single fixed-length vector. To address this issue, an attention mechanism can be used to choose the relevant context vector across all time steps, computing representations of the input and output sequences.
As shown in Figure 1, in the encoder, spatial attention adaptively selects the input data relevant to charge density by referring to the hidden state at the previous time step. In the decoder, temporal attention automatically chooses the encoder hidden states relevant to the charge prediction across all time steps. Through these attention mechanisms, the LSTM neural network can learn long-term temporal dependencies more effectively.

B. LSTM NEURAL NETWORKS
In this paper, LSTM neural networks are used to construct the encoder-decoder framework. In each LSTM unit, a memory cell determines what information from the input series is passed on or stored, by means of the input gate $I_t$, forget gate $F_t$, and output gate $O_t$. The LSTM unit thereby captures the long-term dependencies of the series. The hidden state and update of the LSTM unit are expressed in terms of the following quantities: $C_t$, $h_t$ and $s_t$ denote the intermediate state, hidden state and memory cell state of the LSTM at time $t$, respectively; $x_t \in \mathbb{R}^n$ represents the input series at time $t$ and $n$ is the number of input series; $W_f$, $W_i$, $W_o$ and $W_c$ represent the parameter matrices to be learned in the LSTM; $b_f$, $b_i$, $b_o$ and $b_s$ represent the intercept parameters to be learned in the LSTM; $\sigma$ and $\tanh$ represent the sigmoid activation function and the hyperbolic tangent function, respectively; $\odot$ represents element-wise multiplication.
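For reference, the standard LSTM update written with the symbols defined above takes the form below. This is the textbook formulation and is assumed, not confirmed, to match the authors' equations; in particular, the concatenation $[h_{t-1}; x_t]$ and the pairing of $b_s$ with the intermediate state $C_t$ are assumptions.

```latex
\begin{aligned}
F_t &= \sigma\left(W_f [h_{t-1}; x_t] + b_f\right) \\
I_t &= \sigma\left(W_i [h_{t-1}; x_t] + b_i\right) \\
O_t &= \sigma\left(W_o [h_{t-1}; x_t] + b_o\right) \\
C_t &= \tanh\left(W_c [h_{t-1}; x_t] + b_s\right) \\
s_t &= F_t \odot s_{t-1} + I_t \odot C_t \\
h_t &= O_t \odot \tanh(s_t)
\end{aligned}
```

The forget gate scales the previous memory cell state, the input gate scales the new candidate, and the output gate controls how much of the cell state is exposed as the hidden state.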

C. SPATIAL AND TEMPORAL ATTENTIONS
The key challenge in the charge prediction task (sequence transduction) is to learn the long-term dependencies. To improve the prediction performance for long sequences, spatial and temporal attention are used to learn these dependencies.
For a given input time series $x^k = (x_1^k, x_2^k, \cdots, x_T^k)$, the spatial attention mechanism is constructed from the hidden state and memory cell state of the encoder LSTM at the previous time step $t-1$. The spatial attention comprises a multi-layer perceptron network and a softmax layer, where $e_t^k$ represents the output value of the spatial attention, which measures the importance of the input feature at time step $t$; $v_e^k$, $W_{es}$, $W_{eh}$, $U_e$ and $b_e^k$ represent the parameters to be learned; and $\alpha_t^k$ represents the output value of the softmax layer.
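One common form of such an input-attention score, a perceptron followed by a softmax over the $n$ input series, is sketched below. Only the symbol names come from the text; the way the parameters enter the score and the construction of the reweighted input $\tilde{x}_t$ are assumptions.

```latex
\begin{aligned}
e_t^k &= (v_e^k)^{\top} \tanh\left(W_{es}\, s_{t-1} + W_{eh}\, h_{t-1} + U_e\, x^k + b_e^k\right) \\
\alpha_t^k &= \frac{\exp(e_t^k)}{\sum_{j=1}^{n} \exp(e_t^j)} \\
\tilde{x}_t &= \left(\alpha_t^1 x_t^1,\ \alpha_t^2 x_t^2,\ \cdots,\ \alpha_t^n x_t^n\right)
\end{aligned}
```

Because the weights $\alpha_t^k$ sum to one over $k$, the encoder sees each input series scaled by its estimated relevance at time step $t$.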
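Similarly, the temporal attention in the decoder comprises a multi-layer perceptron network and a softmax layer, where $\tilde{e}_t^d$ represents the output value of the temporal attention; $v_e^d$, $W_{\tilde{e}s}$, $W_{\tilde{e}h}$, $U_{\tilde{e}}$ and $b_e^d$ represent the parameters to be learned; and $\tilde{\alpha}_t^d$ represents the output value of the softmax layer. The final prediction output is where $c_t$ =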
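The softmax weighting and context vector used by the temporal attention can be illustrated with a minimal sketch in plain Python. The function names, the toy scores and the toy hidden states are hypothetical; the scoring network that would produce the scores is omitted, and the hidden states are assumed to be those of the encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(scores, hidden_states):
    """Return attention weights and the weighted sum of hidden states.

    scores: one attention score per time step (assumed to come from a
        small scoring network, not implemented here).
    hidden_states: one hidden-state vector per time step.
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [0.0] * dim
    for weight, state in zip(weights, hidden_states):
        for j in range(dim):
            context[j] += weight * state[j]
    return weights, context


# Hypothetical toy example: three time steps, two-dimensional hidden states.
scores = [0.0, 0.0, 0.0]
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = context_vector(scores, hidden)
```

With equal scores every time step receives the same weight, so the context vector is simply the average of the hidden states; unequal scores shift it toward the more relevant time steps.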

D. EVALUATION METRICS
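Three evaluation metrics (mean absolute error, root mean squared error and mean absolute percentage error) are used to measure the onset characteristic prediction performance of the different models. Specifically, the mean absolute error (MAE) is a scale-dependent metric expressed as
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_t^i - \tilde{y}_t^i \right|$$
The root mean squared error (RMSE) is also a scale-dependent metric and is expressed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_t^i - \tilde{y}_t^i \right)^2}$$
The mean absolute percentage error (MAPE) is not a scale-dependent metric and is given by
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_t^i - \tilde{y}_t^i}{y_t^i} \right| \times 100\%$$
where $y_t^i$ and $\tilde{y}_t^i$ represent the label and predicted values at time $t$, respectively, and $N$ represents the total number of samples.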
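The three metrics can be computed as in the following minimal sketch, which uses their standard definitions. The function names and the sample values are hypothetical and serve only to illustrate the arithmetic; MAPE is reported in percent and assumes that no label is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (scale-dependent)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (scale-dependent)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; labels must be nonzero."""
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)


# Hypothetical labels and predictions, not taken from the paper.
y_true = [2.0, 4.0, 5.0]
y_pred = [1.0, 4.0, 7.0]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
mape_value = mape(y_true, y_pred)
```

RMSE penalizes the single large error more strongly than MAE, while MAPE normalizes each error by its label, which makes it comparable across quantities of different magnitude.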

III. EXPERIMENTAL IMPLEMENTATION DETAILS
A. TRAINING AND TESTING DATASETS
The training and testing datasets are generated from numerical computations and collected from previously published data [11], [12], [13], [14], [15], [16]. In our experiment, the number of avalanche charges is taken as the input variable. A total of 30000 labeled samples (number of primary avalanche charges and charge density) are collected during primary avalanche progression. The first 20000 samples are used as training data, the following 2000 samples as validation data, and the remaining 8000 as testing data. The input data are normalized to the range 0 to 1.
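The split and normalization described above can be sketched as follows. The helper names and the placeholder data are hypothetical. Fitting the minimum and maximum on the training portion only is an assumption; the text states only that the input data are scaled to 0 to 1.

```python
def chronological_split(samples, n_train=20000, n_val=2000):
    """Split samples in order: training, then validation, then testing."""
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Placeholder data standing in for the 30000 labeled samples.
samples = [float(i) for i in range(30000)]
train, val, test = chronological_split(samples)

# Assumption: scaling statistics come from the training portion only.
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
val_scaled = apply_min_max(val, lo, hi)
```

When the statistics are fitted on the training data alone, validation and testing values can fall slightly outside the 0 to 1 range; fitting on the full dataset avoids this but lets information from the held-out data influence preprocessing.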

B. HARDWARE AND PARAMETER SETTINGS
The training and testing experiments are implemented mainly on a computing platform with NVIDIA RTX 3090 GPUs (24 GB memory), an Intel Core i9-10900K CPU @ 3.7 GHz and 32 GB RAM, using the open-source PyTorch framework. The NVIDIA compute unified device architecture (CUDA) toolkit and the CUDA deep neural network library (cuDNN) are used to accelerate GPU performance. In the comparison experiments, a maximum of 20 CPU threads is allocated to the non-neural-network models. Python libraries, including numpy, matplotlib, sklearn, skimage, shutil, random, os, json and pandas, are used for post hoc data processing and result visualization in our custom code.
In our LSTM model, three main parameters are optimized: the numbers of hidden states in the encoder and decoder, and the number of time steps in the window. A grid search is used to determine the parameter values that achieve the best performance on the validation data. The window length is 10 time steps, and the number of hidden states is 64 for both the encoder and the decoder. In addition, the mini-batch size is 30, the initial learning rate is 0.01, and the number of epochs is 60. To combat overfitting, our work employs a cross-validation procedure, a learning-rate scheduler and an automated early stopping criterion when training the proposed models.
Figure 2 (a) presents the prediction results and prediction residuals of the charge density distribution along the avalanche length for the rod-plane electrode under a specific atmospheric condition (the rod radius is 0.1 cm, the temperature is 20 °C, the pressure is 1 atm, and the relative humidity is 20 %; the ordinate scale is linear). The predicted charge density distribution fits the target curve very well. Statistical diagnostics, namely the error histogram and the quantile-quantile plot, are used to validate the performance of the proposed model. Figure 2 (b) and (c) present the error histogram and quantile-quantile plot of the charge density prediction for the testing data. These diagnostics indicate that the prediction errors approximately follow a Gaussian distribution: the quantile-quantile plot compares the sample data on the vertical axis with normal distribution data on the horizontal axis, and suggests that the sample data are approximately normally distributed.
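The reported parameter settings and the windowing they imply can be sketched as follows. The dictionary keys and the function `make_windows` are hypothetical names. Pairing each window with the target at its last time step is an assumption, since the text gives the window length but not the exact input-target alignment.

```python
# Settings reported in the text; the key names are hypothetical.
HYPERPARAMS = {
    "window": 10,           # number of time steps per window
    "encoder_hidden": 64,   # encoder hidden state size
    "decoder_hidden": 64,   # decoder hidden state size
    "batch_size": 30,
    "learning_rate": 0.01,  # initial value, before any scheduling
    "epochs": 60,
}


def make_windows(inputs, targets, window):
    """Build (input window, target) pairs from two aligned series.

    Assumption: each window of `window` consecutive inputs is paired
    with the target at the last time step of that window.
    """
    pairs = []
    for start in range(len(inputs) - window + 1):
        x = inputs[start:start + window]
        y = targets[start + window - 1]
        pairs.append((x, y))
    return pairs


# Placeholder series, not data from the paper.
inputs = [float(i) for i in range(25)]
targets = [2.0 * v for v in inputs]
pairs = make_windows(inputs, targets, HYPERPARAMS["window"])
```

A series of length 25 with a window of 10 yields 16 overlapping windows; these would then be grouped into mini-batches of 30 for training.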

IV. PREDICTION RESULTS
To compare prediction performance, the autoregressive integrated moving average (ARIMA) model, the ARIMA model with external input (ARIMAX), recurrent neural networks (RNN), LSTM and the proposed model are used to predict charge density. The mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) are used as accuracy metrics; their definitions are given in the supplementary material. TABLE 1 presents the MAE, RMSE and MAPE of charge density prediction for negative glow corona discharge at a temperature of 20 °C, a pressure of 1 atm and relative humidity from 1 % to 52.5 %. We run the test 10 times for each model and report the average metrics and standard deviations in TABLE 1. ARIMA and ARIMAX perform worse than the neural networks, because ARIMA only considers the target series at previous times and ARIMAX is a linear model. The RNN can capture nonlinear temporal relationships and thus outperforms ARIMAX. The LSTM neural network uses a memory cell and activation functions to learn the long-term dependencies of the data for charge prediction. Our proposed model achieves the best performance among the five methods, since it extracts the relevant information through spatial attention and then uses temporal attention to choose the relevant information across all time steps.
In Figure 3 (a)-(c), we present the predicted and numerically calculated charge density distributions along the primary avalanche length for the rod-plane electrode under different atmospheric conditions at the onset condition of negative glow corona. The solid lines represent the prediction results of the proposed model, and the dotted lines represent the numerical calculation results of the physics-based onset model (the ordinate is on a log10 scale). Visually, the predictions of the proposed model agree well with the numerical calculations.
As the humidity, temperature and pressure increase, the predicted charge density at the onset condition increases [24]. The predictive performance of our model is further tested quantitatively by calculating curve similarity. Here the Hausdorff distance (also called the Hausdorff metric) is used to measure the similarity between two sets; it is a nonlinear operator that quantifies how closely the predicted curve resembles the reference curve [39]. In Figure 3 (a)-(c), the prediction results of the proposed model are the predicted curves, and the numerical calculation results are the reference curves.
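For two point sets $A$ (the predicted curve) and $B$ (the reference curve), the Hausdorff distance is commonly defined as below. The underlying point-to-point distance $d$ (for example, the Euclidean distance) is not stated in the text and is an assumption here.

```latex
d_H(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} d(a, b),\ \sup_{b \in B} \inf_{a \in A} d(a, b) \right\}
```

Each inner term finds, for a point on one curve, its nearest neighbor on the other curve; the outer supremum keeps the worst such mismatch, and the maximum makes the measure symmetric.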
In Figure 3 (d)-(f), we present heat maps that depict the curve similarity between the predicted and numerically calculated charge density of negative glow corona under different atmospheric conditions. The numbers in the heat maps are the values of the Hausdorff distance; a small distance means a high similarity. If every point of the prediction set is close to some point of the numerical calculation set, and vice versa, the Hausdorff distance between the two sets is small. For a new atmospheric condition, the well-trained model takes approximately two seconds to provide the predictions, whereas the numerical calculation takes 25 minutes, a reduction of roughly 750-fold. The closeness of the machine learning (ML)-based predictions to the numerical calculations indicates that the model can predict the charge density almost instantly with reasonable accuracy.
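The curve-similarity computation can be sketched in plain Python for curves sampled as finite point sets. The function names and the sample points are hypothetical, and the Euclidean point-to-point distance is an assumption; the brute-force search is quadratic in the number of points, which is adequate for short curves.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def directed_hausdorff(source, target):
    """Largest distance from a source point to its nearest target point."""
    return max(min(euclidean(p, q) for q in target) for p in source)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sampled curves as (x, y) points, not data from the paper.
predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.5)]
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

distance = hausdorff(predicted, reference)
```

In this toy case the two curves coincide except at the last point, which is offset by 0.5, so the distance is 0.5; identical curves give a distance of zero.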
For the onset condition of positive glow corona, Figure 4 (a)-(c) presents comparisons of the predicted and numerically calculated charge density under different atmospheric conditions (the ordinate is on a log10 scale). Figure 4 (d)-(f) shows heat maps that depict the curve similarity between the predicted and numerically calculated charge density of positive glow corona under different atmospheric conditions. The interactive influence of temperature, relative humidity and pressure on the charge density at the onset condition has thus been quantified. As evinced by the high consistency between the ML predictions and the numerical results in Figures 3 and 4, the charge density can be satisfactorily assessed under different atmospheric conditions by the proposed LSTM model with attention mechanisms. A stable corona may appear in glow or streamer mode depending on the electric field and avalanche growth. Similarly, this LSTM model is also applicable to the prediction of charge density at the onset condition of streamer corona discharge.
Our results demonstrate that, when trained and implemented under the same practices, our proposed model substantially outperforms the traditional time series prediction methods, despite its greater complexity. In addition, the substantial improvement in prediction performance over LSTM and RNN stems from the spatial and temporal attention mechanisms. It is attributed to the spatial attention mechanism, which adaptively chooses the relevant input series, and to the temporal attention mechanism, which captures the relevant long-range encoder hidden states. Note that there may be other ways to improve the prediction performance, such as using an advanced parameter tuning method or a newer autoencoder architecture. The LSTM model with spatial and temporal attention is strongly recommended for making predictions based on the spatiotemporal information of avalanche charge. Notably, the capability of the attention mechanism to select relevant series in the proposed model does not imply that this model will necessarily perform better on all sequential tasks. In future work, further study of prediction robustness is recommended, for example by evaluating the model on another dataset.

V. CONCLUSION
In summary, an LSTM model with attention mechanisms is introduced for the accelerated prediction of charge density at the onset condition of DC corona discharge, since conventional physics-based onset models are limited by computational power and computing time. We show how the necessary data, once generated and curated, can be used to train the proposed model. Our results show that the proposed model performs significantly better on the charge density prediction tasks than ARIMA, ARIMAX, RNN, and LSTM. Notably, our findings also highlight that the trained model can rapidly make predictions with high accuracy for specific atmospheric conditions. Our study motivates the future use of LSTM neural networks with attention mechanisms as a complementary method to numerical calculation and experimental efforts for accelerated prediction under different atmospheric conditions.