Novel Lithium-Ion Battery State-of-Health Estimation Method Using a Genetic Programming Model

State-of-health (SOH) is a health index (HI) that directly reflects the performance degradation of lithium-ion (Li-ion) batteries in engineering practice, but it is difficult to measure directly. In this paper, a novel data-driven method is proposed to estimate the SOH of Li-ion batteries accurately and to explore a mechanism-like relationship between battery features and SOH. First, features of the battery are extracted from the performance data. Next, by using the evolution of genetic programming to reflect the change in SOH, a mathematical model describing the relationship between the features and the SOH is constructed from the data. Because the formula models are generated with strong randomness, the search can cover most of the structural space relating SOH and features. An illustrative example is presented in which the SOH of two batches of Li-ion batteries from the NASA database is estimated using the proposed method. One batch was used for testing and comparison, and the other was used to verify the test results. Experimental comparison and verification demonstrate that the proposed method is useful and accurate.


I. INTRODUCTION
Currently, prognostics and health management (PHM) is widely studied in industrial management to improve production efficiency and ensure production safety [1], [2]. As energy storage systems, Li-ion batteries provide reliable power for many kinds of important equipment in different fields, such as satellites, aerospace systems and electrified vehicles. Therefore, it is extremely important to track the health status of Li-ion batteries accurately and in real time to ensure the reliability of the equipment [3].
With the development of monitoring technology, using Li-ion battery monitoring data to study PHM has become one of the most feasible and effective approaches. The SOH reflects the current capability of a battery to store and supply energy relative to that at the beginning of its life, so it is an indicator of the degradation level of batteries [4]. Unfortunately, achieving an accurate estimate of the SOH is a challenging task. Due to the limitations of monitoring technology, it is difficult to measure the SOH of Li-ion batteries directly. To solve this problem and enable further management of Li-ion batteries, researchers have made many contributions. In general, existing methods fall into model-based methods and data-driven methods [5]. Model-based methods attempt to establish physical models or use mathematical representations to describe the SOH degradation of Li-ion batteries. Although experimental methods require a large number of experiments to analyse battery ageing behaviour, they can be used to study ageing mechanisms, providing a theoretical basis for model-based methods such as the extended Kalman filter (EKF) [6]-[9], multiscale EKF [10], Wiener process [11], and particle filter (PF) [12]-[15]. Although Li-ion battery life can be accurately predicted by these methods, it is difficult to track the SOH degradation process of the battery in detail.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiajie Fan.
One of the most widely used data-driven techniques is incremental capacity/differential voltage (IC/DV) analysis. IC/DV analysis has been proven to be a powerful tool for battery capacity estimation [16]. Based on this method, the voltage plateaus on charging/discharging curves can be transformed into clearly identifiable peaks on IC/DV curves. Each peak of the curve represents a specific electrochemical process taking place in the cell and can be characterized by features such as the intensity and position. These peak features are closely related to battery capacity fade and can be used as indicators for SOH estimation. Weng et al. [17] estimated the battery SOH by relating it to the peak intensity of IC curves. Li et al. [18] established a linear regression relationship between battery capacity and the peak position on IC curves. However, IC/DV curves are sensitive to measurement noise inherent in battery systems [18], [19]. Accordingly, proper smoothing methods have to be proposed for obtaining smooth curves that facilitate the identification and evaluation of IC/DV curve features.
The fuzzy logic model is also a data-driven method that is used to describe the performance degradation and calculate the SOH of Li-ion batteries [20]. In this model, the number of cycles is taken as the feature parameter of Li-ion batteries, and the relationship between the feature parameter and the SOH is described by a fitting curve in which x represents the number of cycles and y is the value of SOH. The fuzzy logic model computes the HI from this fitting curve with an error between 5% and 10%. Because many Li-ion battery parameters change as the capacity decreases, it is difficult to reflect the SOH of Li-ion batteries using such a performance degradation model. Moreover, only the overall trend of the Li-ion battery state can be given, which makes the model inaccurate.
To improve estimation accuracy, many machine learning methods have been used to estimate the SOH of Li-ion batteries and have achieved good results, such as neural networks (NNs) [21], support vector machines (SVM) [17], [22], and Gaussian process regression (GPR) [23], [24]. The artificial neural network is a data-driven method widely used in research on battery performance degradation. Hussein [25] used an artificial neural network to estimate capacity fade in Li-ion batteries for electric vehicles. You et al. [26] designed a recurrent neural network-based model to estimate the SOH in a more realistic environment. However, neural networks have the shortcomings of poor generalization ability, discretization of structure, and low convergence speed. SVM and the relevance vector machine (RVM) are well suited to problems with small samples, nonlinearity and local minima. Therefore, they have achieved good results in the SOH estimation and prediction of Li-ion batteries and have been studied deeply in PHM. Zhang [27] denoised the data by empirical mode decomposition and then used a multi-kernel relevance vector machine to predict battery capacity. Dong et al. [28] estimated the SOH of a battery and predicted its remaining service life with a support vector regression particle filter. Li et al. [29] established a multistep prediction model based on average entropy to predict SOH and remaining useful life (RUL).
However, the disadvantages of SVM and RVM cannot be ignored: tuning the soft margin parameter C through cross-validation is usually time-consuming, sparsity cannot always be achieved, and a large number of support vectors is thus obtained. GPR is a nonparametric model that uses a Gaussian process prior for regression analysis of data. Zhou et al. [30] realized the online estimation of Li-ion battery capacity by combining EKF with the GPR model. However, the performance of the GPR model depends on the chosen covariance function and its parameters, and the long-term prediction error is large. Although these methods do not require a deep understanding of battery ageing and its associated degradation mechanisms, a large amount of data from previous experiments must be collected to train the estimation models, which limits their wide use. Specifically, these methods need an appropriate kernel function and parameters chosen according to the actual situation, and it is not easy to select the kernel function automatically. Furthermore, the above data-driven methods cannot clearly reveal the potential relationship between the features and the SOH.
The remainder of this paper is organized as follows. Section II describes some of the current problems in the SOH estimation of Li-ion batteries. In Section III, the process of obtaining an SOH calculation formula with a genetic programming model is elaborated. Section IV gives experimental verification and a comparison with BP neural networks. Finally, this paper is concluded in Section V.

II. PROBLEM STATEMENT
An extensive literature review shows that battery SOH is reflected in many of the monitoring parameters of a battery. The SOH can be determined directly only through the underlying degradation mechanisms, which are hard to monitor in practical applications. Especially when the mechanism is poorly understood, it is meaningful to choose the parameters that best reflect the change in battery SOH and to construct a corresponding, mechanism-like SOH estimation formula.
The genetic programming (GP) model is a powerful supervised machine learning algorithm that has been successfully applied to classification and regression in many different fields [31], such as network identification [32], fault detection [33] and task scheduling planning [34]. This method has a good ability to approximate nonlinear relationships and is robust to outliers. Although it has good regression performance and has been used successfully in other areas, such as strategy optimization and feature selection, no similar work has been done to date in estimating the SOH of batteries. Reference [35] used multi-objective GP to synthesize human-understandable HIs from sequences of voltages, currents and temperatures streamed via on-vehicle sensors. In [36], GP was used to address the challenge of automatically discovering advanced features that capture fault progression well. However, these studies did not give a concise formula for SOH estimation. An accurate multiple regression model is sought for battery SOH estimation, constructed from various function transformations and different formulas. The purpose is to obtain an accurate and simple battery SOH estimation model. Motivated by this, this paper proposes a method of autonomously determining the regression formula based on GP regression to diagnose Li-ion battery SOH. The proposed work aims to fill this gap by using the GP model to estimate the SOH of Li-ion batteries accurately and to explore the relationship between SOH and monitoring parameters such as current, voltage and temperature.
This method has several notable features for SOH estimation. First, the formula model is generated with strong randomness, so it can cover most of the structural space relating SOH and features. Next, it is robust to significant noise and to input-independent features present in the predictor variables. In addition, compared with other machine learning models, it is only necessary to determine the parameters, without having to train the model multiple times.
The SOH is an important index for battery health management. To evaluate the estimation results, the SOH of Li-ion batteries must first be defined clearly. It can be generally defined as [1]:
$$\mathrm{SOH} = \frac{C_{actual}}{C_{norm}} \tag{2}$$
where $C_{actual}$ represents the maximum practical capacity as measured from the operating battery at the current time, which may fade over time due to battery ageing, and $C_{norm}$ is the rated capacity given by the battery manufacturer. We assume that the SOH of Li-ion batteries can be represented by an unknown function:
$$F_{soh} = f(g_1, g_2, \cdots, g_n) \tag{3}$$
where $F_{soh}$ is the estimated SOH of the Li-ion battery and $g_i$ ($i = 1, 2, \cdots, n$) are the features extracted from the Li-ion battery monitoring data. Using the genetic programming model, Eq. (3) can be constructed. This has the obvious advantage of avoiding a fixed-function model framework, so that the model can independently explore the functional relationship between the features and the SOH of Li-ion batteries.
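As a small worked example of the capacity-based definition, the following sketch computes SOH as the ratio of the measured maximum capacity to the rated capacity. It assumes the plain ratio form (not a percentage); the function name and the capacity values are hypothetical and chosen only for illustration.

```python
def state_of_health(c_actual: float, c_norm: float) -> float:
    """Return SOH as measured maximum capacity divided by rated capacity."""
    if c_norm <= 0:
        raise ValueError("rated capacity must be positive")
    return c_actual / c_norm


# Hypothetical capacities in Ah, not taken from the NASA data.
soh = state_of_health(c_actual=1.6, c_norm=2.0)
print(f"SOH = {soh:.2f}")
```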

III. METHODOLOGY
A. GENETIC PROGRAMMING (GP)
The genetic programming flowchart is shown in Fig. 1. The extracted features and the SOH of Li-ion batteries are input into the genetic programming model. Like the genetic algorithm (GA), GP is a machine learning method. GA usually optimizes a value, whereas in GP the optimal individual is a strategy or function. First, an initial population of calculation formulas is randomly generated, and individuals are selected according to the fitness function; crossover and mutation are then performed. Finally, the evolution is terminated according to the stopping condition, yielding the optimal formula individual. GP shares the following important elements with GA [36]:
• Randomly generate an initial genetic population;
• Require a training set and a fitness function;
• Evaluate each individual's viability in the population according to the fitness function, and then screen individuals;
• Individuals in the population undergo gene-like manipulations to achieve crossover and mutation;
• Termination is controllable.
VOLUME 8, 2020
According to the GP model and Eq. (3), a function consists of independent variables, dependent variables, operators and coefficients. Thus, the extracted features can be linked by operators and coefficients to obtain the calculation formula of SOH for Li-ion batteries. First, a large number of first-generation individuals are randomly generated. Each individual in genetic programming represents a function. In addition to the independent variables, an individual in the GP model also includes a series of mathematical operations, such as addition, subtraction, multiplication, division, square, square root, exponential and logarithmic operations. An individual can be encoded as a tree structure, and an illustrative individual is shown in Fig. 2. For this individual, the node C is a constant, and the node $G_i$ represents a factor that is a function of the extracted feature $g_i$, formed from an operation (logarithm, square, square root, etc.) and a constant $a$. The tree structure in Fig. 2 thus represents one specific formula for an illustrative individual. The depth of the tree structure is defined as the number of factors in an individual. It should be noted that for each individual in the GP model, the tree depth is set to 3, and each node is randomly generated in this paper.
Genetic operations such as crossover and mutation are performed on the nodes of the tree structure of each individual to determine the model of SOH. After the selection operation, whether crossover is performed is decided according to the crossover probability. If there is no crossover, the individual does not change; if there is, the operation is as shown in Fig. 3, using individuals F1 and F2 as examples. Individuals F1 and F2 are randomly selected, and a random number of nodes is selected from F1 and F2 and exchanged to obtain new individuals. As shown in Fig. 3, nodes G3 and G4 in the two individuals are randomly selected and interchanged to obtain two new individuals F'1 and F'2.
Alongside crossover, mutation is performed on the population according to the genetic programming process shown in Fig. 1. The individual F1 is randomly selected, and whether the mutation operation is performed depends on the mutation probability. If mutation is not performed, the individual F1 remains the same. Otherwise, a node in the individual is randomly selected, and the whole node or a part of it is mutated. In the example in Fig. 4, the logarithmic operation in node G3 of the selected individual F1 is mutated to a square operation, resulting in a new individual F'1. Individuals that have undergone crossover and mutation can be crossed and mutated again.
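The individual encoding, crossover and mutation described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: an individual is stored as a constant plus a flat list of factors, the factors are combined additively, and the helper names (`random_individual`, `evaluate`, `crossover`, `mutate`) are hypothetical. The operation set and the coefficient range (−2, 2) follow the settings reported later in Section IV.

```python
import math
import random

# Unary operations a factor may apply to a feature (assumed set).
OPS = {
    "none": lambda x: x,
    "square": lambda x: x * x,
    "sqrt": lambda x: math.sqrt(abs(x)),
    "log": lambda x: math.log(abs(x) + 1e-9),
}


def random_factor(n_features, rng):
    """A factor G = a * op(g_k), stored as the tuple (a, op_name, k)."""
    return (rng.uniform(-2, 2), rng.choice(list(OPS)), rng.randrange(n_features))


def random_individual(n_features, depth, rng):
    """An individual is a constant C plus `depth` randomly generated factors."""
    return {"C": rng.uniform(-2, 2),
            "factors": [random_factor(n_features, rng) for _ in range(depth)]}


def evaluate(ind, g):
    """Evaluate F = C + sum of a * op(g_k); the additive form is an assumption."""
    return ind["C"] + sum(a * OPS[op](g[k]) for a, op, k in ind["factors"])


def crossover(f1, f2, rng):
    """Swap one randomly chosen factor between copies of two individuals."""
    c1 = {"C": f1["C"], "factors": list(f1["factors"])}
    c2 = {"C": f2["C"], "factors": list(f2["factors"])}
    i = rng.randrange(len(c1["factors"]))
    j = rng.randrange(len(c2["factors"]))
    c1["factors"][i], c2["factors"][j] = c2["factors"][j], c1["factors"][i]
    return c1, c2


def mutate(ind, rng):
    """Replace the operation of one randomly chosen factor with a different one."""
    child = {"C": ind["C"], "factors": list(ind["factors"])}
    i = rng.randrange(len(child["factors"]))
    a, op, k = child["factors"][i]
    new_op = rng.choice([name for name in OPS if name != op])
    child["factors"][i] = (a, new_op, k)
    return child


rng = random.Random(0)
f1 = random_individual(n_features=5, depth=3, rng=rng)
f2 = random_individual(n_features=5, depth=3, rng=rng)
child1, child2 = crossover(f1, f2, rng)
mutant = mutate(f1, rng)
print(evaluate(mutant, [0.5, 0.4, 0.3, 0.2, 0.1]))
```

Copying the factor lists before modifying them keeps the parents intact, so the same individual can take part in further crossover and mutation, as the text describes.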

B. FITNESS FUNCTION DESIGN
To minimize the error between the estimated and true values of the SOH, the root mean square error (RMSE) is used to measure an individual's fitness:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(SOH_i - F_i\right)^2}$$
where $SOH_i$ and $F_i$ represent the true value and the estimate of the SOH of the Li-ion battery at the $i$-th cycle, respectively, and $i = 1, 2, \cdots, N$. The individuals in the population repeatedly undergo selection, crossover and mutation. When the fitness of an individual falls below the threshold or the number of generations reaches the maximum iteration number, the evolution is terminated and the corresponding individual is taken as the optimal one.
When the optimal individual is obtained, the model $F = f_o(g_1, g_2, \cdots, g_n)$ can be determined to present the formula between the performance feature parameters and the SOH of Li-ion batteries. According to the determined formula, the SOH of Li-ion batteries can be estimated and predicted from newly monitored performance features $g_{new}$.
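The fitness measure and the two stopping conditions of this subsection can be sketched as below. The function names and the numerical values in the example are hypothetical; only the RMSE definition and the "threshold or maximum iterations" rule come from the text.

```python
import math


def rmse(true_soh, estimated_soh):
    """Root mean square error between true and estimated SOH sequences."""
    if not true_soh or len(true_soh) != len(estimated_soh):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(true_soh)
    return math.sqrt(sum((s - f) ** 2 for s, f in zip(true_soh, estimated_soh)) / n)


def should_stop(best_fitness, generation, threshold, max_generations):
    """Stop when the best RMSE is below the threshold or the budget is used up."""
    return best_fitness < threshold or generation >= max_generations


# Hypothetical SOH values for three cycles.
fitness = rmse([1.00, 0.95, 0.90], [0.99, 0.96, 0.88])
print(fitness, should_stop(fitness, generation=10, threshold=0.01, max_generations=500))
```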

IV. EXPERIMENTS AND VERIFICATION
In this section, the source and feature extraction of the performance degradation test data of Li-ion batteries is first introduced to ensure the repeatability and integrity of the algorithm and model. Then, the SOH estimation result of the Li-ion battery obtained by the genetic programming model is given, which is compared with the BP neural network model to verify the applicability and efficiency of the GP model.

A. EXPERIMENT DATA AND VERIFICATION DESIGN
In this paper, the test results of commercially available 18650-sized rechargeable Li-ion batteries from the NASA open-source database are used for verification. Three Li-ion battery operating test conditions, namely charge, discharge and impedance measurement, are specified as follows [37]:
Charging process: the Li-ion battery is charged in constant current (CC) mode at 1.5 A. After that, it continues to charge in constant voltage (CV) mode until the charging current drops to 20 mA, which indicates the end of charging.
Discharge process: the Li-ion battery is discharged in CC mode until the voltage reaches a set cut-off voltage.
Impedance measurement: the impedance of the Li-ion battery is measured by an EIS frequency sweep in the range of 0.1 Hz to 5 kHz.
The performance of the Li-ion batteries degraded under cyclic charge and discharge, which may be continuous or discontinuous because of the impedance measurements taken during the capacity fade test. In this paper, two batches of Li-ion batteries under different experimental conditions are considered:
(1) Batteries #5, #6 and #7 are used as a group to train the GP model. As is common in machine learning, 50%-70% of the data is selected for training, and the rest is used for preliminary validation of the formula.
(2) Batteries #33, #34 and #36 are used to verify the formula. Twenty percent of the data is selected for parameter optimization, and the formula is verified by checking that it yields an accurate estimate of the SOH of the Li-ion batteries.
All experiments were performed at room temperature (24 °C). The specific battery situation is shown in Table 1.

B. FEATURE EXTRACTION
The charge was performed at a CC of 1.5 A until the voltage reached 4.2 V, and then continued at a CV until the charge current dropped to 20 mA. Five performance features can be extracted from the Li-ion battery data; more details on feature extraction can be found in [15] and [38]. The equal voltage rise charging interval (g1) is the time it takes for the voltage to rise from a lower value to a higher value during CC charging. The equal current drop charging interval (g2) is the time elapsed during CV charging while the current drops from a higher value to a lower value. The equal voltage drop discharge interval (g3) is extracted from the discharge voltage curve of the battery. The average charge temperature (g4) is the average battery temperature between the start time of g1 and the end time of g2. The average discharge temperature (g5) is the average battery temperature during the g3 period. In this paper, g1 is set to the time interval in which the CC charging voltage rises from 2.7 V to 4.2 V; g2 is set to the time interval in which the CV charging current falls from 1.5 A to 0.3 A; and g3 is set to the time interval in which the discharge voltage drops from 3.7 V to 2.7 V. To keep the performance feature data within the range 0-100, the values of g1, g2 and g3 are each divided by 100. Finally, the performance features (g1, g2, g3, g4, g5) of the Li-ion battery are obtained.
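The interval features g1, g2 and g3 are all "time between two threshold crossings" quantities, which can be sketched as below. This is an illustrative sketch only: the use of linear interpolation between samples, the function names and the sample curve are assumptions, and the extraction procedures actually used are those of [15] and [38].

```python
def crossing_time(times, values, level, rising=True):
    """First time at which `values` reaches `level`, using linear interpolation."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        if v0 == level:
            return t0
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("level is never reached")


def interval_feature(times, values, start_level, end_level, scale=100.0):
    """Time taken to move from start_level to end_level, divided by `scale`."""
    rising = end_level > start_level
    t_start = crossing_time(times, values, start_level, rising)
    t_end = crossing_time(times, values, end_level, rising)
    return (t_end - t_start) / scale


# Hypothetical CC charging curve: time in seconds, voltage in volts.
t = [0, 100, 200, 300]
v = [2.5, 3.0, 4.0, 4.4]
g1 = interval_feature(t, v, start_level=2.7, end_level=4.2)
print(f"g1 = {g1:.2f}")
```

The same helper covers the falling cases (current drop for g2, discharge voltage drop for g3) because the direction is inferred from the two levels.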

C. SOH ESTIMATION RESULTS AND FORMULAS
The five performance features of batteries #5, #6, and #7 and the real SOH are shown in Fig. 6. Because more than one function may be optimal, the space of possible function forms is unbounded. To reduce the computational complexity and time consumption, the individual tree depth in this paper is set to 4 to 6. The factor $G_i$ of the function only considers the square, square root, logarithm and no-operation (identity) cases. The range of the factor $G_i$ and of the constant coefficient of the function is (−2, 2). The performance feature data of the Li-ion battery cover 168 cycles. The feature data and real SOH values of the first 120 cycles are used as the training set, and the feature data of the remaining 48 cycles are used as the test set. The performance feature data are put into the genetic programming model for training. The population size of the model is set to 1000, the crossover probability is 0.75, the mutation probability is 0.05, and the number of iterations is 500. The BP network is a feedforward neural network with 10 hidden neurons trained with the Levenberg-Marquardt algorithm. The maximum number of training iterations is 1000, the target training accuracy is 0.001 and the learning rate is 0.01. Figs. 7(a-c) show the optimal estimation results for batteries #5, #6, and #7 evolved through the genetic programming model, together with the BP neural network estimates obtained by training on the same data. Figs. 7(d-f) depict the fitness optimization process for the optimal individual of batteries #5, #6, and #7.
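For reference, the reported GP settings and the chronological 120/48 split can be collected in a small configuration sketch. The class and function names are hypothetical; the numerical values are the ones stated in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPConfig:
    """GP hyperparameters as reported in the text (names are illustrative)."""
    population_size: int = 1000
    crossover_prob: float = 0.75
    mutation_prob: float = 0.05
    max_generations: int = 500
    min_depth: int = 4
    max_depth: int = 6
    coeff_range: tuple = (-2.0, 2.0)


def chronological_split(samples, n_train):
    """Use the first n_train cycles for training and the rest for testing."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must be between 1 and len(samples) - 1")
    return samples[:n_train], samples[n_train:]


config = GPConfig()
cycles = list(range(1, 169))          # 168 cycles of feature data
train, test = chronological_split(cycles, n_train=120)
print(len(train), len(test))          # prints "120 48"
```

A chronological split, rather than a shuffled one, matches the setting in which later cycles are estimated from a formula fitted on earlier cycles.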
It can be seen from Fig. 7 that, after training on 120 cycles, the GP model's search for the optimal function converges very quickly, and the optimal individual is found in fewer than 100 iterations. In addition, the GP model fits the SOH of the #5, #6, and #7 Li-ion batteries well. Compared with the BP neural network estimation results, the GP model estimates the SOH of the remaining 48 cycles more accurately. Using the formulas represented by the optimal individuals for the three Li-ion batteries, the RMSE with respect to the real SOH is calculated, as shown in Table 2.
FIGURE 7. The SOH estimation of (a) battery #5 (b) battery #6 (c) battery #7, and the process of fitness optimization of (d) battery #5 (e) battery #6 (f) battery #7.
Under this training condition, the RMSE of the SOH estimated by the GP model for batteries #5, #6 and #7 is below 1%, while the RMSE of the SOH estimated by the BP neural network is above 1%. The SOH estimation formulas of Li-ion batteries #5, #6 and #7 obtained by the GP model are all different. The number of performance features and the arithmetic function of the factor $G_i$ in the formula differ, which shows that the formula obtained by the GP model is not deterministic. This follows from the principle by which the GP model finds the best individual: the population evolves gradually through crossover and mutation between individuals. The obtained optimal formula is a feasible solution for estimating the SOH of Li-ion batteries.
To further verify the accuracy of the GP model, the optimal formula was also obtained by training on 90, 100 and 110 cycles of the #5, #6 and #7 Li-ion batteries, and the SOH of the remaining cycles was then estimated.
The three Li-ion batteries #5, #6 and #7 were trained on 90, 100, 110, and 120 cycles, and the optimal formula was obtained under each condition, as shown in Table 3. For batteries #5 and #7, the RMSE of the GP model is below 1%, while the RMSE of the BP neural network model is approximately 2%, so the RMSE of the SOH estimated by the GP model is much smaller than that of the BP neural network. In the #5 battery estimation results, the GP model is significantly better than the BP neural network model. For battery #6, the RMSE of the GP model is also smaller than that of the BP neural network for every number of training cycles. Therefore, we can conclude that the GP model is generally better than the BP neural network model for estimating the SOH of this group of Li-ion batteries. Except for battery #6 trained on 90 cycles, the RMSE between the SOH estimated by the GP model and the true value is below 1%. Occasionally, results obtained with fewer training cycles are more accurate than those obtained with more cycles. The reason for this phenomenon is the randomness of the crossover and mutation process, but the estimation accuracy generally improves as the number of training cycles increases. Based on the above discussion, the GP model can accurately estimate the SOH of Li-ion batteries.
The optimal formulas obtained for the #5, #6 and #7 Li-ion batteries with different numbers of training cycles were analysed. Feature selection is realized automatically in the iterative optimization process of the GP model, and the performance features contained in the final optimal formula have a certain correlation with the SOH of Li-ion batteries. Numerically, features $g_1$, $g_2$ and $g_3$ appear more frequently in the formulas than features $g_4$ and $g_5$. Thus, their contribution to the SOH formula is greater than that of $g_4$ and $g_5$, and the features $g_1$, $g_2$ and $g_3$ are better able to reflect the change in the SOH of Li-ion batteries. In particular, the optimal formula obtained by training on 110 cycles of battery #7 has a very small RMSE.
By analysing the optimal formula, the GP model can help analyse the influence of performance features on the SOH of Li-ion batteries and help summarize the functional relationship between the SOH and the performance features. It is shown that the performance feature $g_3$ has a certain functional relationship with the SOH of Li-ion batteries, and a function form relating $g_3$ to the SOH can be obtained. On this basis, the SOH can be represented by the model in Eq. (8). The selected #33, #34 and #36 batteries were used to verify Eq. (8). The first 20% of the data (40 cycles) were used by a particle swarm algorithm to optimize the parameters $a_0$, $a_1$ and $c$.
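The parameter optimization step can be illustrated with a basic particle swarm optimizer. Everything specific in this sketch is an assumption made for illustration: the model form (a 3/2-power term and a first-power term in $g_3$ plus a constant, with hypothetical coefficient roles), the swarm settings, the search bounds and the synthetic data. It is not the authors' formula or implementation.

```python
import random


def model(g3, a0, a1, c):
    """Assumed model form: 3/2-power term, first-power term and a constant."""
    return a0 * g3 ** 1.5 + a1 * g3 + c


def mse(params, g3_values, soh_values):
    """Mean squared error of the assumed model for a parameter vector."""
    a0, a1, c = params
    return sum((model(g, a0, a1, c) - s) ** 2
               for g, s in zip(g3_values, soh_values)) / len(g3_values)


def pso(loss, dim, lo, hi, n_particles=30, n_iter=200,
        inertia=0.7, c_cog=1.5, c_soc=1.5, seed=0):
    """Minimal global-best PSO with positions clamped to [lo, hi]."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Synthetic, noise-free data from hypothetical parameters (40 "cycles").
g3 = [0.2 + 0.02 * k for k in range(40)]
soh = [model(g, -0.05, 0.6, 0.4) for g in g3]
best, best_val = pso(lambda p: mse(p, g3, soh), dim=3, lo=-2.0, hi=2.0)
print(best, best_val)
```

Because the two power terms are strongly correlated over a narrow feature range, different parameter vectors can give almost the same error, so the fitted error is a more meaningful check than the recovered parameters.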
The SOH estimation for batteries #33 and #34 is very good according to Fig. 11 and Table 4: both the RMSE and the MAE of the estimated results are less than 0.2%. For battery #36, the RMSE is close to 3%, while the MAE is smaller. According to Table 4, the estimation errors of #33 and #34 are small, and the estimation error of #36 is relatively large. However, the overall trend of the SOH of battery #36 is still estimated well, as confirmed by MAE = 0.81%. There are two large estimation errors for battery #36, which lead to a significant increase in the RMSE. It can be seen from Fig. 11(c) that the actual SOH value at these points is greater than 1, so it is reasonable to suspect that the monitoring data at these two points are abnormal. It can be concluded that Eq. (8) can achieve accurate SOH estimates for batteries #33, #34 and #36. It is apparent from Table 4 that the coefficient of the 3/2 power term is much smaller than that of the first power term, so the relationship with SOH is approximately linear. This can also be confirmed by Fig. 12.
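The observation that two large errors raise the RMSE far more than the MAE can be checked with a short numerical sketch. The residuals below are hypothetical and are not the battery #36 data; they only illustrate how squaring makes RMSE sensitive to a few outliers.

```python
import math


def rmse(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute value of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical residuals: 98 small errors plus two large outliers.
clean = [0.005] * 100
with_outliers = [0.005] * 98 + [0.2, 0.2]

print(f"clean:    RMSE={rmse(clean):.4f}  MAE={mae(clean):.4f}")
print(f"outliers: RMSE={rmse(with_outliers):.4f}  MAE={mae(with_outliers):.4f}")
```

With identical small residuals the two measures coincide, whereas the two outliers move the RMSE several times further than the MAE.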
To further verify the proposed method, battery #5 was trained with the same 84 training cycles used in the multi-kernel RVM (MK-RVM) SOH estimation test in [38]. The accuracy of the two methods is not significantly different, and both can accurately estimate the SOH of the battery, while the RMSE and $R^2$ of the GP model are better than those of MK-RVM according to Table 4. Because the RMSE of the GP model is smaller and its $R^2$ is closer to 1, the SOH estimate obtained with the GP model is more stable and fits better. Furthermore, the proposed method yields an explicit expression, which makes the result easier to visualize.

V. CONCLUSION
In this paper, a GP model is presented for estimating the SOH of Li-ion batteries based on feature data. Data were collected from the data repository of the NASA Ames Prognostics Center of Excellence (PCoE). Two batches of Li-ion battery data are selected, and features of the Li-ion batteries are extracted from each charge and discharge cycle. The feature data and the SOH of the battery are input to the GP model for training, and a formula is then obtained for estimating the SOH of the Li-ion battery. Because the GP model searches for the optimal formula iteratively, the obtained SOH estimation formula is not unique, and the search can explore the space of candidate formulas extensively. When the training cycles or batteries are different, the estimation formula will be different. To verify that the SOH of the Li-ion battery can be well estimated by the obtained formula, the SOHs of batteries #5, #6 and #7 are estimated with different numbers of training cycles and compared with the BP neural network estimation results. Then, batteries #33, #34 and #36 are used to verify the estimation formula, and the relationship between features and SOH is explored. It can be concluded that the GP model can trace the change in SOH well through the extracted features, and the proposed method has good robustness.
The SOH estimation model is derived from training data, as in traditional machine learning. The difference is that the method focuses on obtaining an explicit function for estimating SOH, so that the relationship between the performance features and the SOH of Li-ion batteries is constructed. Although traditional machine learning models can accurately estimate the Li-ion battery SOH, it is difficult to infer from them the possible relationship between the extracted features and the SOH of batteries. From the optimal formula obtained by the GP model, we can summarize the influence of features on the SOH of Li-ion batteries and even obtain an expression relating the features and the SOH.
This method can also be applied in electronics fields. In particular, it is usable for performance degradation studies when product degradation does not follow a typical degradation process and the products have measurable features and health indexes. When a functional relationship between the HI and the features is desired, the model formula can be optimized autonomously with this method to estimate the HI.
QIAN ZHAO received the master's degree in management science and engineering from the National University of Defense Technology (NUDT), China. He is currently a Lecturer with the College of Information and Communication, NUDT. His research interests include reliability and remaining useful life prediction.
ZHI-JUN CHENG received the Ph.D. degree in management science and engineering from the National University of Defense Technology (NUDT), China. She is currently an Associate Professor with the College of Systems Engineering, NUDT. Her research interest mainly includes prognostics health management.
BO GUO received the B.S. degree in mathematics from the Huazhong Institute of Technology, the M.S. degree in system engineering from the National University of Defense Technology (NUDT), China, and the Ph.D. degree in engineering management from the Tokyo University of Science. He is currently a Professor with the College of Systems Engineering, NUDT. His research interests mainly include system reliability and project management.