Machine Learning Assisted Prediction Algorithm of Atmospheric Ion Mobility

Atmospheric ion mobility is an important electrical parameter related to the filamentary ion flow field of high voltage direct current (HVDC) transmission lines and to other characteristics of glow and streamer corona discharge. Machine learning (ML)-based algorithms have already been widely used in the prediction of air discharge. This article proposes a genetic algorithm (GA)-support vector regression (SVR) method combined with kernel principal component analysis (PCA) to predict the ion mobility, involving dimensionality reduction, feature selection, and parameter optimization of the SVR. Kernel PCA reduces the dimension of the data, and a GA with adaptive probability parameters is employed to optimize the parameters of the SVR model. An improved parallel-plate ion generator is employed to produce corona discharge and to measure the saturation ion current density, from which the training and testing data of ion mobility are obtained. The prediction results show that the proposed algorithm outperforms the other methods in terms of the mean relative error and mean squared error criteria. In addition, the model parameters and the data features have a major influence on the performance of the prediction algorithm. Based on the measured data and reference data, the prediction results for ion mobility under different humidity show satisfactory generalization and effectiveness of the proposed model.


I. INTRODUCTION
Corona discharge occurs on the energized conductors of high voltage direct current (HVDC) power transmission lines. Compared with the narrow ionization region (plasma region) close to the conductor, the drift region, which occupies most of the inter-electrode gap, is of greater concern for the ion flow field of unipolar or bipolar lines [1], [2]. In order to calculate the space electric field profiles and the ambient electromagnetic environment accurately, complex corona phenomena have been studied comprehensively with theoretical models [3]-[5]. In the filamentary ion flow field model, the atmospheric ion mobility is a crucial physical parameter, which refers to an average mobility for the various ionic species in the ion drift region. The atmospheric ion mobility is usually treated as a constant proportionality coefficient in the realm of electro-hydrodynamics and electromagnetism [6]-[8]. In fact, the mobility is a multivariable function determined by many factors, including the atmospheric composition, temperature, humidity, atmospheric pressure, atmospheric pollutants, etc.
The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li.
In the previous literature, it is well established that ion mobility has been measured under different electric field strengths. The point-to-plane electrode was used to generate a severely non-uniform electric field, and the ion mobility in sulfur hexafluoride (SF6) was obtained from the correlation between the voltage-current curve and the ion mobility [9], [10]. However, whether this approach can be applied to air at atmospheric pressure needs to be further verified. In addition, the drift tube method was used to measure mobility spectra of atmospheric ions. The temporal decay of the current and the ion drift time were monitored during the test [11], [12]. However, in the earlier studies, the measurement of ion mobility mainly focused on SF6, or the ion mobility was measured at low temperature. Ion mobility spectrometry was the most widely used approach to measure the ion mobility in the atmosphere; meanwhile, the ionic species could be identified [13]-[16]. However, the ion mobility spectrometer was often operated at a temperature of hundreds of degrees Celsius, which is inconsistent with the actual atmospheric temperature. In addition, a parallel-plate ion current generator was employed to measure the ion mobility, and a method was proposed to calibrate the ion mobility [17], [18]. However, computational prediction research on the atmospheric ion mobility and on the effect of humidity appears sparse in the literature.
The correlation between the atmospheric conditions and the atmospheric ion mobility has been researched by ion mobility spectrometry. The major environmental factors considered were temperature and atmospheric pressure. The charge carriers involved were organic macromolecules and their derivatives, dimeric polymers, hydrates, etc., and they changed with the atmospheric environment. In addition, the previous ion mobility spectrometry measurements were made in the range of 100 °C to 300 °C, which is higher than normal temperature. The effect of humidity on the ion mobility was studied mainly through ion mobility mass spectrometry in the realm of chemistry. However, because of the water vapor, the ion mobility spectrum changes dramatically and the selectivity and sensitivity of the spectrometry apparatus decrease.
In the present paper, a new prediction algorithm of atmospheric ion mobility based on genetic algorithm (GA)-support vector regression (SVR) combined with kernel principal component analysis (PCA) is proposed. The electric field features and charge number density features are extracted and used as the input of the SVR model. The parameters of the SVR are optimized by the GA. The training and testing data are obtained experimentally with an improved parallel-plate ion current generator placed in an environment-controlled chamber. Finally, through comparison of the predictions with the reference data, the prediction results for ion mobility show satisfactory generalization and effectiveness of the proposed model.

II. PREDICTION ALGORITHM BASED ON GA-SVR COMBINED WITH KERNEL PCA

A. KERNEL PRINCIPAL COMPONENTS ANALYSIS
In the linear case, PCA is suitable for unsupervised dimensionality reduction of unlabeled data, especially for small samples. Kernel PCA is used to preprocess datasets with non-linear and high-order correlations [19]. Its idea is to introduce a non-linear mapping function that maps the raw data to a high-dimensional space, in which the linearly inseparable data become linearly separable and the PCA algorithm can then be applied. Kernel PCA is a widely used unsupervised feature extraction algorithm for dimensionality reduction. The detailed kernel PCA algorithm is presented in Algorithm 1.
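The kernel PCA idea described above can be illustrated with a minimal NumPy sketch. It assumes the textbook procedure (build a Gaussian kernel matrix, center it in feature space, eigendecompose, keep the leading components); the function name `kernel_pca` and its arguments are hypothetical, and the sketch is not claimed to reproduce every step of the authors' Algorithm 1.

```python
import numpy as np


def kernel_pca(X, gamma, n_components):
    """Project the rows of X onto the leading kernel principal components.

    Assumes a Gaussian kernel k(x, y) = exp(-||x - y||^2 / gamma^2).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    dist2 = np.maximum(dist2, 0.0)  # guard against tiny negative round-off
    K = np.exp(-dist2 / gamma ** 2)

    # Center the kernel matrix, i.e. center the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Symmetric eigendecomposition; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[order], 0.0)
    eigvecs = eigvecs[:, order]

    # Training-sample projections are eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(eigvals)
```

A call such as `kernel_pca(X, gamma=10.0, n_components=3)` on a matrix with one sample per row returns one reduced feature vector per sample; the values of `gamma` and `n_components` here are placeholders, not settings taken from the paper.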

Algorithm 1 Kernel PCA
Input: Given the feature set $X = \{x_1, \ldots\}$
1. Given the kernel function $k(x, y) = \exp\left(-\|x - y\|^2 / \gamma^2\right)$, where $\gamma$ is the kernel parameter, the kernel matrix is $K_{ij} = k(x_i, x_j)$.
6. Compute the principal components.

In the SVR formulation, $C$ is a regularization parameter that represents the trade-off between a large margin and noise tolerance; $\xi$ denotes the slack variable; the feature vector $x_i$ is mapped into a higher-dimensional space by the function $\psi(x_i)$; $w$ is the weight vector; and $\varepsilon$ is the parameter of the insensitive loss function. Solving the SVR is a quadratic programming problem, which can be converted into its dual problem by means of Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions. The dual formulation of SVR is given in [21], where $\alpha$ is the Lagrange multiplier. In this paper, the Gaussian kernel is used because of its high flexibility and its mapping of the data into a high-dimensional space; $\sigma$ is its tuning kernel parameter. The efficiency of the SVR depends on the selection of the kernel function and the model parameters.
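For reference, the textbook form of the $\varepsilon$-insensitive SVR, which uses the same symbols ($C$, $\xi$, $\psi$, $w$, $\varepsilon$, $\alpha$) as the description here, is sketched below. It is offered as an assumption about the general shape of the problem, not as a transcription of the paper's own equations; the bias $b$, the starred variables $\xi_i^*$, $\alpha_i^*$ and the sample count $n$ are notation introduced for this sketch.

```latex
% Primal problem (standard epsilon-SVR)
\min_{w,\,b,\,\xi,\,\xi^*} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)
\quad \text{s.t.} \quad
\begin{cases}
y_i - w^{T}\psi(x_i) - b \le \varepsilon + \xi_i, \\
w^{T}\psi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
\xi_i,\ \xi_i^* \ge 0 .
\end{cases}

% Dual problem, with kernel K(x_i, x_j) = \psi(x_i)^{T}\psi(x_j)
\max_{\alpha,\,\alpha^*} \quad
-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)
- \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
+ \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C .

% Resulting regression function
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b
```

In this form the only quantities left to the user are $C$, $\varepsilon$ and the kernel parameter, which is why the choice of $(C, \sigma)$ matters so much for the trained model.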
To train the SVR model, the parameters $(C, \sigma)$ must be specified, and their appropriateness directly affects the quality of the model. The genetic algorithm is employed to optimize the cross-validation accuracy. GA is a derivative-free, parallel, global search algorithm that is widely used in many fields. In existing studies, the crossover probability $P_c$ and the mutation probability $P_m$ are kept constant throughout the evolution of the population, which may delay the convergence of the model and lengthen the training time. Therefore, adaptive $P_c$ and $P_m$ are used here, determined by the fitness value $f$. The maximum crossover and mutation probabilities are $P_{c\max}$ and $P_{m\max}$; the minimum crossover and mutation probabilities are $P_{c\min}$ and $P_{m\min}$. The detailed GA is presented in Algorithm 2, where $L$ is the total number of generations and $l$ is the current generation. The population size is 400.
Algorithm 2
2. Compute the fitness of the chromosomes by the fitness function and k-fold cross validation.
3. Stochastic universal sampling is used to select chromosomes.
4. Roulette wheel selection algorithm.
5. Two-point crossover creates new offspring. The probability of crossover is computed by [22] as the sum of $\frac{L-l}{L} P_{c\max} + \frac{l}{L} P_{c\min}$ and a fitness-dependent term in $P_{c\max}$, $f_{\max}$ and $f_{\min}$.
6. Simple mutation. The probability of mutation is computed by [22]
$$P_m = \frac{L-l}{L} P_{m\max} + \frac{l}{L} P_{m\min}$$
7. Stop criterion. If the number of epochs equals 150 or $f_{\max}^{l} - f_{\max}^{l-1} < 0.001$, then the optimal chromosome is presented as the solution; otherwise go back to Step 2.
Return: The new chromosome $Z = \{C, \sigma\}$
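The generation-dependent part of the adaptive probabilities and the stop criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it implements only the linear schedule between the maximum and minimum probability, it omits the fitness-dependent term of the crossover probability, and the function names and the numerical probabilities in the usage note are hypothetical.

```python
def adaptive_probability(l, L, p_max, p_min):
    """Linear schedule: p_max at generation 0, p_min at generation L."""
    if L <= 0 or not 0 <= l <= L:
        raise ValueError("require L > 0 and 0 <= l <= L")
    return (L - l) / L * p_max + l / L * p_min


def should_stop(best_fitness_history, max_generations=150, tol=1e-3):
    """Stop after max_generations, or when the best fitness gain is below tol.

    best_fitness_history holds the best fitness of each generation so far.
    """
    generation = len(best_fitness_history)
    if generation >= max_generations:
        return True
    if generation >= 2:
        gain = best_fitness_history[-1] - best_fitness_history[-2]
        return gain < tol
    return False
```

For example, `adaptive_probability(l, 150, 0.1, 0.01)` would lower a mutation probability from 0.1 to 0.01 over 150 generations, so the search explores widely at first and perturbs good chromosomes less as the population matures.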

Algorithm 3 Prediction Method
Input: Ion mobility (training data and testing data)

The architecture of the prediction algorithm based on GA-SVR and kernel PCA is presented in Algorithm 3. Feature extraction, the weighting parameter and SVR parameter optimization are addressed in the paper. The prediction algorithm first computes the electric field and number density features of the coaxial cylinder electrode model and optimizes the parameters of the SVR by the genetic algorithm, whose crossover and mutation probabilities adapt to the fitness value and the current generation. Then, the kernel PCA technique is used to extract the features, and the SVR dual problem is solved as a quadratic program with linear constraints. Finally, with the optimal parameters and feature subset, the prediction model is applied to the testing dataset.

III. MACHINE LEARNING FEATURES FROM ELECTRIC FIELD AND SPACE CHARGE NUMBER DENSITY OF COAXIAL CYLINDRICAL ELECTRODE

A. CONFIGURATION OF COAXIAL CYLINDRICAL ELECTRODE
The prediction algorithm of ion mobility is primarily based on GA-SVR. The feature vectors (also called descriptors) of the data are generated by calculation: the coaxial cylinder electrode model is used to compute the electric field features and the space charge features, which together form the features of the data. These features are closely related to the atmospheric ion mobility. The diameter of the inner conductor is $r_0$ and the diameter of the outer electrode, which is grounded, is $r_R$.
In the governing equations of the coaxial cylinder electrode model, $n_p$, $n_n$ and $n_e$ are the positive ion, negative ion and electron number densities, respectively, in cm$^{-3}$; $u_p$, $u_n$ and $u_e$ are the positive ion, negative ion and electron velocities, respectively, in cm/s; $\alpha$ and $\eta$ are the ionization and attachment coefficients, respectively, in cm$^{-1}$; $\beta$ is the recombination coefficient, in cm$^3$/s; $D_e$ is the diffusion coefficient, in cm$^2$/s; $E$ is the electric field, in V/cm; $\varepsilon$ is the permittivity of air; $\phi$ is the potential; and $S$ is the photoionization term. The upper sign of ∓ and ± corresponds to positive polarity. The electron ionization and attachment coefficients are determined by the reduced electric field and are expressed as in [26]-[28]. The boundary conditions cover the conductor surface, the plasma boundary and the outer electrode boundary. The positive or negative ion density on the conductor surface is taken to be zero. The positive or negative ion density on the outer plasma boundary is assumed to be zero for the negative and positive corona mode. The corona current is then evaluated from the solution. In solving the coaxial cylinder electrode model, the Runge-Kutta technique and the central finite difference method are used. The relative errors of the potential gradient and of the space charge density serve as the convergence criteria in the numerical iteration. During the calculation, the ion mobility is considered to be constant and independent of the electric field, which is the same hypothesis as used in previous calculations of the ion flow field.

C. ELECTRIC FIELD AND SPACE CHARGE FEATURES EXTRACTION
In the gap of the coaxial cylinder electrode, forty-one features are defined; the feature set of the electric field and space charge is shown in TABLE 1.
The electric field features are defined as follows: maximum value of the electric field, $E_m$; average value of the electric field, $E_a$; electric field distortion, $E_d = (E_m - E_a)/E_a$; and the maximum and average values of the electric field gradient. The second subscript, p or i, denotes the plasma region or the ion flow region. $E_{x\%}$ is the ratio of the area in which the field exceeds x% of the maximum electric field to the total area.
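As an illustration of how such electric field features might be computed from a radial field profile, the sketch below evaluates $E_m$, $E_a$, $E_d$ and $E_{x\%}$ on a grid of radii. Several choices are assumptions made for this example and are not stated in the text: the average is a plain arithmetic mean over the grid nodes, the area of each cell is that of the annulus between neighbouring nodes, and the field of a cell is the mean of its two node values. The function name `field_features` is hypothetical.

```python
import math


def field_features(radii, field, x_percent=90.0):
    """Return E_m, E_a, E_d and E_x% for a radial field profile.

    radii: increasing radial positions; field: field magnitude at each radius.
    """
    if len(radii) != len(field) or len(radii) < 2:
        raise ValueError("need at least two matching samples")
    e_max = max(field)
    e_avg = sum(field) / len(field)       # assumed: unweighted node average
    e_dist = (e_max - e_avg) / e_avg      # distortion E_d = (E_m - E_a) / E_a

    threshold = x_percent / 100.0 * e_max
    total_area = 0.0
    area_above = 0.0
    for i in range(len(radii) - 1):
        # Area of the annular cell between two neighbouring radii.
        area = math.pi * (radii[i + 1] ** 2 - radii[i] ** 2)
        cell_field = 0.5 * (field[i] + field[i + 1])
        total_area += area
        if cell_field > threshold:
            area_above += area
    return {
        "E_m": e_max,
        "E_a": e_avg,
        "E_d": e_dist,
        "E_x": area_above / total_area,
    }
```

The same function could be applied separately to the plasma region and the ion flow region by slicing the profile, which is one way to obtain the region-specific variants that the subscripts p and i refer to.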
The space charge features are defined as follows: maximum and average positive ion density, $n_{pm}$ and $n_{pa}$; maximum and average negative ion density, $n_{nm}$ and $n_{na}$; maximum and average electron density, $n_{em}$ and $n_{ea}$. The third subscript, p or i, denotes the plasma region or the ion flow region.
The current density features are defined as follows: maximum and average positive current density, $J_{pm}$ and $J_{pa}$; maximum and average negative current density, $J_{nm}$ and $J_{na}$; maximum and average electron current density, $J_{em}$ and $J_{ea}$; maximum and average total current density, $J_{mtotal}$ and $J_{atotal}$. The third subscript, p or i, denotes the plasma region or the ion flow region.

IV. PREDICTION OF ION MOBILITY
The training data and testing data of ion mobility are obtained from experimental measurement. The ion mobility is measured with a parallel-plate system.

A. STRUCTURE OF PARALLEL-PLATE SYSTEM
The schematic view of the parallel-plate system, which is recommended in the IEEE Standard [29] for producing a uniform field with space charge and for calibrating electric field strength meters, is shown in Fig. 1 and Fig. 2. The apparatus used in this article has a circular coaxial geometry, unlike the previous apparatus [18], which has a square coaxial configuration; the circular geometry avoids the inhomogeneous electric field effect, and its space electric field is more uniform and isotropic. The apparatus has five layers, from top to bottom: corona ring, corona wire layer, first screen, top plate (second screen) and bottom plate (grounded plate). The corona wire layer consists of equidistant copper wires with a spacing of 5 cm and a wire diameter of 1 mm. The first screen is made of 0.5 × 0.5 cm square grids of 0.2 mm-diameter copper wire and is fixed on a 1 m-diameter copper ring. The top plate (second screen) comprises 1 × 1 mm square grids of 0.1 mm-diameter copper wire and is also fixed on a 1 m-diameter copper ring. The Wilson plate, with a 7.9 cm-diameter current sensing area and a 1 cm-wide guard band, is placed on the grounded metal plate. An electrometer is connected between the Wilson plate and the ground to measure the ion current. A hole in the center of the grounded plate accommodates the field mill.
The corona ring is employed to make the space electric field uniform and to suppress the fringing effect. Voltage is applied to the copper wires of the corona wire layer to generate corona discharge, and the corona-originated ions move downward to the first screen. The ions pass through the first screen and then enter the region between the parallel plates, which is similar to a two-layer metal parallel-plate electrode. The second screen and the grounded metal plate generate an approximately uniform electric field, which drives the ions. As the controlled voltage ($V_T$) increases, the ion current density gradually saturates and the space ions gather at the second screen. The electric field at the second screen then becomes zero because of the space charge of the same polarity, and the ion mobility is obtained under this saturation condition. Fig. 3 shows the potential distribution between the parallel plates. The potential along the Y axis gradually becomes uniform. The effect of the top plate grids on the potential in most of the space between the parallel plates is rather limited. Therefore, a one-dimensional model can be used and only the vertical component needs to be considered.
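To make the saturation condition concrete, the sketch below converts a measured saturation current density into a mobility. It assumes the standard one-dimensional space-charge-limited relation $J_s = 9\varepsilon_0 \mu V^2 / (8 d^3)$, which follows from unipolar drift with zero field at the injecting screen; the text does not state which expression the authors used, so this relation, the function names and the numerical values in the usage note are assumptions for illustration only.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def mobility_from_saturation(j_sat, voltage, gap):
    """Ion mobility in m^2/(V*s) from saturation current density in A/m^2.

    Assumes the 1-D space-charge-limited relation J = 9*eps0*mu*V^2 / (8*d^3),
    with voltage in V across a plate gap of `gap` metres.
    """
    if voltage == 0 or gap <= 0:
        raise ValueError("voltage must be non-zero and gap positive")
    return 8.0 * j_sat * gap ** 3 / (9.0 * EPSILON_0 * voltage ** 2)


def saturation_current_density(mobility, voltage, gap):
    """Inverse relation: saturation current density for a given mobility."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    return 9.0 * EPSILON_0 * mobility * voltage ** 2 / (8.0 * gap ** 3)
```

For instance, `mobility_from_saturation(j, 1000.0, 0.1)` would treat `j` as the saturation current density measured at the Wilson plate for 1000 V across a hypothetical 0.1 m gap. Because the voltage enters squared, the same expression applies to either polarity.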

B. TRAINING DATA AND TESTING DATA
The experiment was carried out in the electromagnetic environment laboratory, which is relatively isolated from the outside environment. The temperature and atmospheric pressure are held constant. The parallel-plate ion generator is placed in a sealed PMMA (polymethyl methacrylate) chamber with adjustable moisture content. The relative humidity (RH) is used to describe the degree of moisture at a given temperature because it is more intuitive than the absolute humidity. The positive or negative ion mobility was obtained with the applied voltage between ±200 V and ±3000 V. TABLE 2 shows a small portion of the training data and testing data of ion mobility. TABLE 3 shows a small portion of the training data and testing data of ion mobility under different relative humidity (RH).
After the model training is completed, the prediction function can be applied to predict the values of the testing data. The prediction results are evaluated by the following error measures.
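Mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2$$
Squared correlation coefficient:
$$r^2 = \frac{\left(n\sum_{i=1}^{n} f(x_i)\,y_i - \sum_{i=1}^{n} f(x_i)\sum_{i=1}^{n} y_i\right)^2}{\left(n\sum_{i=1}^{n} f(x_i)^2 - \left(\sum_{i=1}^{n} f(x_i)\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)}$$
When $0 < |r| < 1$, there is a certain degree of linear correlation between the two variables; the closer $|r|$ is to 1, the closer the linear relationship between them.
Mean relative error (MRE):
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{f(x_i) - y_i}{y_i}\right| \times 100\%$$
where $f(x_i)$ is the prediction, $y_i$ is the measurement and $n$ is the number of samples.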
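The three criteria can be computed with a few lines of plain Python. The sketch below uses the usual textbook definitions, with the squared Pearson correlation for the squared correlation coefficient and the MRE expressed in percent; the function names are hypothetical and the code is an illustration, not the authors' evaluation script.

```python
def mean_squared_error(pred, meas):
    """Average of squared differences between predictions and measurements."""
    return sum((f - y) ** 2 for f, y in zip(pred, meas)) / len(meas)


def mean_relative_error_percent(pred, meas):
    """Average of |f - y| / |y|, in percent; measurements must be non-zero."""
    return 100.0 * sum(abs((f - y) / y) for f, y in zip(pred, meas)) / len(meas)


def squared_correlation(pred, meas):
    """Squared Pearson correlation between predictions and measurements."""
    n = len(meas)
    sum_f = sum(pred)
    sum_y = sum(meas)
    sum_fy = sum(f * y for f, y in zip(pred, meas))
    sum_ff = sum(f * f for f in pred)
    sum_yy = sum(y * y for y in meas)
    numerator = (n * sum_fy - sum_f * sum_y) ** 2
    denominator = (n * sum_ff - sum_f ** 2) * (n * sum_yy - sum_y ** 2)
    return numerator / denominator
```

Reporting MSE alongside MRE is useful here because MSE is dominated by the largest absolute deviations, while MRE weights every sample by its own magnitude.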

V. PREDICTION OF ION MOBILITY UNDER DIFFERENT HUMIDITY
In order to validate the effectiveness of the proposed model, the ion mobility under different humidity and air pressure is predicted and compared with reference data. Fig. 4 and Fig. 5 compare the measured and predicted positive and negative ion mobility under different relative humidity and illustrate the prediction results intuitively. The GA-SVR-kernel PCA model is applied to predict the values under different relative humidity at 20 °C, with Gaussian kernel coefficient σ = 0.9003 and penalty coefficient C = 88.7001 for positive polarity, and with Gaussian kernel coefficient σ = 0.6113 and penalty coefficient C = 68.8821 for negative polarity. The measured data of positive and negative ion mobility are from reference [30].
The predicted values are consistent with the measured values. Compared with the measured data, the maximum deviation is 0.4015 m$^2$/(V·s) for positive ion mobility and 0.4231 m$^2$/(V·s) for negative ion mobility. For the positive ion mobility, the maximum mean relative errors at the different RH values (20.9%-82.8%) are 2.7661%, 2.984%, 3.235%, 4.441%, 4.041% and 4.031%, respectively, as shown in Fig. 6. For the negative ion mobility, the maximum mean relative errors at the different RH values (24.6%-73.7%) are 2.128%, 2.433%, 3.199%, 3.221%, 3.098% and 2.877%, respectively. These results indicate that the prediction model estimates the ion mobility with high accuracy and that GA-SVR combined with kernel PCA is effective and numerically stable. The prediction results show that the feature extraction and parameter selection are appropriate. The prediction model presents a satisfactory generalization ability.
VOLUME 8, 2020

VI. CONCLUSION
In this article, an ion mobility prediction algorithm based on genetic algorithm (GA)-support vector regression (SVR) combined with kernel PCA is proposed. The electric field features and space charge features of the coaxial cylinder electrode model are extracted as the input of the model, so the complicated physical process need not be considered. The kernel PCA technique is used for dimensionality reduction. In addition, a GA with adaptive probability parameters is an effective strategy for searching for the optimal penalty parameter and kernel parameter of the prediction model.
An improved parallel-plate ion current generator has been developed for measuring the atmospheric ion mobility under different humidity. The prediction and measurement results show that the proposed model offers higher prediction accuracy for ion mobility. Comparison between the predicted values and published data also indicates that the proposed prediction model performs well. The MRE of the predicted values is below 4.441% and 3.221% for positive and negative ion mobility, respectively. The prediction results for ion mobility show satisfactory generalization and effectiveness.
ZHENGYING CHEN was born in Fujian, China, in 1993. He received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 2015. His research interests include electromagnetic environment and corona discharges of HVDC systems.
LIMING WANG (Senior Member, IEEE) was born in Zhejiang, China, in November 1963. He received the B.S., M.S., and Ph.D. degrees in high voltage engineering from the Department of Electrical Engineering, Tsinghua University, Beijing, China, in 1987, 1990, and 1993, respectively. He has been working at Tsinghua University since 1993. His major research interests include high-voltage insulation and electrical discharges, flashover mechanisms on contaminated insulators, and applications of pulsed electric fields.