Long Short-Term Memory With Attention Mechanism for State of Charge Estimation of Lithium-Ion Batteries

Evaluating the state of charge (SOC) of the battery's current cycle is one of the major tasks in the charge management of rechargeable batteries. We propose a long short-term memory (LSTM) model with an attention mechanism to estimate the charging status of two lithium-ion batteries. Data from three dynamic tests, namely the dynamic stress test (DST), the supplemental federal test procedure driving schedule (US06), and the federal urban driving schedule (FUDS), are used to evaluate the proposed model at different temperatures.

INDEX TERMS Attention mechanism, lithium-ion battery, long short-term memory, state-of-charge.

I. INTRODUCTION
Lithium-ion batteries are used not only in portable electronics and electric vehicles (EVs), but also in smart grid technology for load balancing and in short- and medium-range passenger drones [1]. Li-ion batteries have many advantages, including light weight, fast charging, high energy density, low self-discharge rate, and long service life [2]. Therefore, monitoring battery status is very important to ensure the safe and reliable operation of EVs [3]. The state of charge (SOC) quantifies the remaining charge in the battery's current cycle and shows how long the battery can last before it needs to be recharged [3]. SOC is defined as the percentage of the remaining capacity relative to the maximum available capacity, and its function is similar to that of a fuel gauge in an EV [4]. Due to complex battery dynamics and operating conditions, such as ambient temperature, self-discharge rate, hysteresis, capacity regeneration, and battery aging, accurate SOC estimation remains a difficult task.

SOC estimation methods are classified as the look-up table method, ampere-hour integration method, model-based estimation method, and data-driven estimation method [2]. The data-driven approach has attracted considerable attention. When using the signals measured from a Li-ion battery to estimate SOC, the relationship between the measured variables and SOC is nonlinear and changes with temperature, charge/discharge current, and voltage. Anton et al. [5] used support vector machines to estimate SOC from battery current, voltage, and temperature measurements. Kang et al. [6] proposed an artificial neural network (ANN) model to estimate SOC at different battery aging levels. Recently, with the improvement in computing power provided by graphics-processing units, ANN-based methods have increasingly drawn attention from the research community.
An ANN model with an unscented Kalman filter [7], recurrent neural network (RNN) models [8]-[10], and a nonlinear autoregressive with exogenous input-based neural network [11] have been used to estimate SOC. Experimental results show that the RNN model performs well under battery aging, hysteresis, dynamic current curves, nonlinear dynamic characteristics, and parameter uncertainty. However, due to the gradient vanishing problem that occurs during traditional back-propagation training, RNNs themselves cannot capture long-term dependencies [12]. Long short-term memory (LSTM) solves the problems of gradient vanishing and long-term dependence by adding threshold gates that adjust the balance between memorizing and forgetting [13]. Yang et al. [14] proposed a stacked LSTM model for Li-ion battery SOC estimation and showed that it provides better SOC estimation performance. LSTM is therefore regarded as one of the latest methods for time-series prediction problems and SOC estimation. However, LSTM cannot avoid the defect of long-term forgetting, which means that the network cannot remember long-term information or states and communicate them to the current LSTM unit [15]. Therefore, using only LSTM for long-term prediction cannot achieve better accuracy.
Recently, some researchers have introduced attention mechanisms to improve information-processing capabilities. In analyzing image and time-series data, better results can be obtained by incorporating an attention layer into the LSTM model compared to other ordinary deep learning models [13], [16]-[18]. The attention layer helps the model select the outputs of earlier layers that are critical to each subsequent stage. It allows the network to selectively focus on specific information and determine which part of the information is more valuable for the current task. Therefore, we propose an LSTM model with an attention mechanism to estimate the SOC of two Li-ion batteries under three different operating conditions, where the differential evolution (DE) algorithm determines the optimal parameters of the model. Section II introduces the attention-based LSTM model and the DE algorithm. Experimental datasets and analysis results are provided in Sections III and IV, respectively. Finally, we draw conclusions and make suggestions for future research.

II. ATTENTION-BASED LSTM MODEL
Traditional RNNs have great difficulty in capturing long-term dependencies. If the network has no chance to reset its internal state when learning a long input sequence, simple RNNs often degrade when the most critical information in the sequence lies many time steps away from the current time window [19]. The output of the RNN was therefore limited until the first long-term memory unit was proposed by Hochreiter and Schmidhuber in 1997, which stores long-term information in the states of its units and uses gates to control the input and output of information [20]. Since then, LSTMs or gated recurrent units (GRUs) have often replaced traditional RNNs. LSTM is a type of RNN that implements an efficient mechanism for determining which encoded state elements are transmitted to the next unit at each time step and used to predict the target variable. It includes input, output, and forget gates so that the network can learn longer sequences, manage longer dependencies, and converge to specific solutions. The storage cell enables the network to know when to learn new information and when to forget old information. The attention mechanism was originally developed to improve the accuracy of machine translation. The idea of an attention-based LSTM model is to add an attention layer to the basic LSTM network. This not only enables the LSTM network to handle the long-term dependence of the driving sequence on historical time steps, but also allows importance-based sampling. In [21], attention weights obtained from a competitive random search are transferred to an evolutionary-attention-based LSTM network for time-series prediction, and prediction errors are sent back as feedback to guide the search process.
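The gate structure described above can be illustrated with a toy, scalar LSTM step in plain Python. This is a minimal sketch, not the paper's Keras implementation: the shared 0.5 weights are arbitrary placeholders, not trained values, and the hidden state is one-dimensional for readability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # w holds one (input weight, recurrent weight, bias) triple per gate.
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    g = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])  # candidate memory
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    c = f * c_prev + i * g   # cell state: keep part of old memory, add new
    h = o * math.tanh(c)     # hidden state exposed to the next time step
    return h, c

# Arbitrary placeholder weights (a trained network would learn these).
weights = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                            "wc", "uc", "bc", "wo", "uo", "bo")}
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2]:   # e.g. a normalized current/voltage signal
    h, c = lstm_step(x, h, c, weights)
```

Because the output gate multiplies a tanh of the cell state, the hidden state stays bounded in (-1, 1) regardless of how long the sequence runs, which is part of why LSTM tolerates long inputs better than a simple RNN.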
As a result, attention mechanisms soon spread to various fields, including time-series prediction [22]-[24]. In this study, the attention mechanism is used to address two shortcomings of LSTM: it replaces the traditional method of recursively increasing LSTM depth, and it is placed on the output layer of the LSTM to model long-term dependencies.
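The idea of attention over LSTM outputs can be sketched as a weighted sum of hidden states. The additive (tanh) scoring below and the scalar parameters `w` and `v` are assumptions for illustration; the paper does not spell out its scoring function.

```python
import math

def attention(hidden_states, w, v):
    # Score each hidden state, then normalize with a numerically
    # stable softmax; w and v are illustrative scalar parameters.
    scores = [v * math.tanh(w * h) for h in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]               # attention weights
    context = sum(a * h for a, h in zip(alphas, hidden_states))
    return context, alphas

# Hidden states from three time steps; with these positive parameters
# the largest state receives the largest weight.
context, alphas = attention([0.1, 0.5, 0.9], w=1.0, v=1.0)
```

The weights sum to one, so the context vector is a convex combination of the hidden states: the network "attends" more to the time steps with higher scores instead of relying only on the final LSTM output.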
An attention-based LSTM model for SOC estimation includes two hidden layers: an LSTM layer and an attention mechanism. We define the input sequence as x_t = [I_t, V_t, T_t], where I represents current, V represents voltage, and T represents temperature. Then, the LSTM calculates the hidden state sequence and the cell state sequence. The components of the LSTM unit are:
1. The input gate (i) controls the amount of new memory content added to the memory as $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$.
2. The forget gate (f) determines the amount of memory to be forgotten as $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$.
3. The cell activation vector (c) is derived from the old memory $c_{t-1}$ and the new memory after modulation $\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$ as $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$.
The attention mechanism is located on the output layer of each LSTM unit to model long-term dependencies and predict the final SOC, as shown in Figure 1. The DE algorithm obtains the optimal parameters of the LSTM model, such as the lookback, batch size, and number of neurons. Details of the DE algorithm can be found in the literature [25]-[27]. The steps to obtain the optimal parameters of the model are as follows.
Step 1. Define and normalize the target and input features.
Step 2. Determine the fitness function. The mean absolute percentage error (MAPE) is selected as the objective function and is calculated as $\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{SOC_t - \widehat{SOC}_t}{SOC_t}\right|$, where $SOC_t$ is the actual SOC at time $t$, $\widehat{SOC}_t$ is the predicted SOC at time $t$, and $n$ is the total number of test data.
Step 3. Select the parameters of the DE algorithm, such as the lower limits, upper limits, population size NP, scaling factor F, and crossover rate CR, whose values are [5, 10, 20], [30, 100, 140], 50, 0.8, and 0.9, respectively, in vector or constant form.
Step 4. Output the optimal value for each parameter.
Step 5. Obtain the predicted SOC using the optimal parameters. The Keras and DEoptim R libraries were used to perform all calculations.
VOLUME 8, 2020
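The five steps above can be sketched with a minimal differential evolution (rand/1/bin) loop. The study itself uses the DEoptim R library with the MAPE of a trained Keras model as the fitness; since training a network per candidate is expensive, `surrogate` below is a hypothetical stand-in whose optimum is placed at the lookback/batch-size/neuron values reported in Section IV (22, 99, 104) purely for illustration.

```python
import random

def de_optimize(fitness, bounds, np_=20, F=0.8, CR=0.9, gens=60, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 3: initialize the population uniformly within the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_)]
    cost = [fitness(ind) for ind in pop]
    for _ in range(gens):
        for i in range(np_):
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:   # binomial crossover
                    lo, hi = bounds[j]
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # mutation
                    trial.append(min(max(v, lo), hi))            # clip to bounds
                else:
                    trial.append(pop[i][j])
            tc = fitness(trial)
            if tc <= cost[i]:                          # greedy selection
                pop[i], cost[i] = trial, tc
    best = min(range(np_), key=lambda k: cost[k])
    # Step 4: round to integers, since these are network hyperparameters.
    return [round(v) for v in pop[best]], cost[best]

# Stand-in fitness (Step 2 would be the MAPE of a trained model).
def surrogate(params):
    lookback, batch_size, neurons = params
    return ((lookback - 22) ** 2 + (batch_size - 99) ** 2
            + (neurons - 104) ** 2)

params, err = de_optimize(surrogate, bounds=[(5, 30), (10, 100), (20, 140)])
```

A smaller population and generation count are used here than the NP = 50 of the paper so the sketch runs quickly; DE's greedy selection guarantees the best fitness never worsens between generations.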

III. DATASETS
The first dataset, for an 18650 LiNiMnCoO2/graphite Li-ion cell, was obtained from an experiment conducted by the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland [7], [28]-[30]. The test platform includes the test samples, a thermal chamber, an Arbin BT2000 battery test device, and a computer equipped with Arbin software for charging/discharging and data monitoring of the test device [28]. Three separate test plans were executed on the battery test bench at low, room, and high temperatures in the thermal chamber. The battery open-circuit voltage (OCV) test under a low current of C/20 was performed at temperatures of 0 °C, 25 °C, and 45 °C. The ambient temperature was controlled and all test data were recorded at one-second intervals. Due to the steady growth of the global EV industry over the past decade, EV battery packs must operate under a variety of dynamic loads and temperature conditions [31]. Therefore, maintaining the accuracy of the monitoring system has become a major challenge. SOC estimation under dynamic load profiles can be found in the literature [31]-[36]. The battery cell was tested with four drive cycles, namely DST, US06, FUDS, and BJDST, at different discrete temperatures. These drive cycles were derived from the time-velocity profiles of industry-standard automobile tests. In this study, we used the DST, US06, and FUDS datasets at 0 °C, 25 °C, and 45 °C to validate our proposed model. The DST profile, consisting of a series of current steps with different lengths and amplitudes, accounts for battery capacity regeneration. The FUDS profile simulates city driving with fast speed fluctuations, and the US06 profile simulates highway driving with high acceleration and rapid speed fluctuations. The second dataset, for a Panasonic NCR18650PF cell, was obtained from Mendeley Data [37]. The test platform is described in [38].
For each test, the battery was first fully charged, and the driving-cycle power profile was then applied to the battery during discharge until it reached the cutoff voltage of 2.5 V. Several driving cycle tests were collected at three temperatures of 0 °C, 10 °C, and 25 °C. Cycles 1∼4 and NN consist of a random mix of UDDS, LA92, US06, and HWFET. Some specifications of the two batteries are shown in Table 1.

IV. ANALYSIS RESULTS AND DISCUSSION
The proposed model is trained and tested using the above two datasets at different temperatures and is compared to the LSTM model without attention under the same conditions. During model training, the parameters are obtained by the DE algorithm. Three input variables x_t = [I_t, V_t, T_t] and one output variable y_t = [SOC_t] are used in the SOC estimation. The true SOC is obtained by SOC(t) = 100 - DOD(t), where the depth of discharge (DOD) is defined as the capacity discharged from a fully charged battery divided by its rated capacity [39].
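The reference SOC can be computed by simple Coulomb counting from the current profile. The sign convention (positive current = discharge), the 1 s sampling interval, and the 2 Ah rated capacity below are assumptions for the example, not values from the paper.

```python
def true_soc(currents_a, dt_s, rated_capacity_ah, soc0=100.0):
    """Reference SOC via Coulomb counting: SOC(t) = 100 - DOD(t), where
    DOD(t) is the discharged charge divided by rated capacity, in percent."""
    soc, out = soc0, []
    for i in currents_a:                   # discharge current in amperes
        # Convert A*s to Ah (divide by 3600) and express as % of capacity.
        soc -= i * dt_s / 3600.0 / rated_capacity_ah * 100.0
        out.append(soc)
    return out

# A 2.0 Ah cell discharged at a constant 1 A for one hour: half the
# rated capacity is removed, so SOC falls from 100% to 50%.
profile = [1.0] * 3600
soc = true_soc(profile, dt_s=1.0, rated_capacity_ah=2.0)
```

For the dynamic drive-cycle profiles (DST, US06, FUDS) the same accumulation applies step by step, which is why the data must be logged at a fixed, fine interval.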
The number of training datasets may affect the estimation results. In the first experiment, the model was trained using one dataset and the remaining datasets were used as test data. Table 2 summarizes the optimal parameters of the LSTM model with and without the attention mechanism obtained by the DE algorithm. Table 3 compares the SOC estimation performance using the RMSE criterion, where $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(SOC_t - \widehat{SOC}_t)^2}$, $SOC_t$ is the actual SOC at time $t$, $\widehat{SOC}_t$ is the predicted SOC at time $t$, and $n$ is the total number of test data. For the US06 dataset, the RMSE of the SOC estimation using the LSTM model with attention is 1.1373 (0 °C), 1.2202 (25 °C), and 1.0928 (45 °C), while that of the LSTM model is 1.2526 (0 °C), 1.2698 (25 °C), and 1.1401 (45 °C). In all cases, the RMSE of the SOC estimation using the LSTM model with an attention mechanism is lower than that of the LSTM model, as clearly shown in Figure 2.
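The RMSE criterion (and the MAPE fitness from Section II) are straightforward to implement. The actual/predicted values below are made-up numbers chosen so the results are easy to check by hand, not values from the paper's tables.

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error over all test points.
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    # Mean absolute percentage error, in percent.
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

actual = [90.0, 80.0, 70.0, 60.0]       # illustrative true SOC values (%)
predicted = [91.0, 79.0, 71.0, 59.0]    # illustrative model outputs (%)
```

With every prediction off by exactly 1%, the RMSE is 1.0, which matches the scale of the errors reported in Table 3.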
In the second experiment, the model was trained using two datasets and the remaining dataset was used as test data. For example, the DST and US06 datasets are used as training data to predict the FUDS dataset. The optimal parameters of the LSTM model with an attention mechanism (lookback, batch size, number of neurons, steps per epoch, and epochs at the three temperatures) are 22, 99, 104, 100, and 100, respectively. Table 4 summarizes the optimal parameters of the LSTM model with and without the attention mechanism using two datasets as training data. The comparison of the SOC estimation results is shown in Figure 3. For the US06 dataset, the RMSE of the SOC estimation using the LSTM model with attention is 0.89 (0 °C), 0.99 (25 °C), and 0.88 (45 °C), while that of the LSTM model is 0.92 (0 °C), 1.24 (25 °C), and 0.99 (45 °C). The results show that the LSTM with an attention mechanism estimates SOC more accurately than the LSTM model. When the model is trained with two datasets, its estimation performance is better than when using one dataset, as shown in Figures 4-5.
The performance of the proposed model is compared with some published methods in Zhang et al. [28], namely SVM, standard RNN, LSTM, SVM-PF, standard RNN-PF, and LSTM-PF. Table 5 shows that the LSTM with attention performs best when the FUDS dataset is used as the test set. For example, at 0 °C, the RMSE values of SVM, standard RNN, LSTM, SVM-PF, standard RNN-PF, LSTM-PF, LSTM using optimal parameters, and LSTM with attention are 5.9630, 5.7220, 3.2970, 2.9730, 2.9620, 1.4170, 1.1177, and 0.9593, respectively. Therefore, we conclude that the LSTM model with an attention mechanism and optimal parameters provides better SOC estimation accuracy. The RMSE values are shown in Figure 6 to clearly present model performance.
The second dataset was adopted to evaluate the performance of the LSTM with attention, LSTM, and bidirectional LSTM (BDLSTM) models. These models are trained on seven datasets (Cycle 1, Cycle 2, Cycle 3, Cycle 4, LA92, UDDS, and NN) and tested on US06 and HWFET. The optimal parameters (lookback, batch size, number of neurons, steps per epoch, and epochs) for the LSTM models without and with attention are [5, 96, 64, 50, 100] and [3, 76, 64, 50, 100] in vector form, respectively. The estimation accuracy of the BDLSTM model is taken from the literature [2]. The mean absolute error (MAE) and maximum error (MAX) are chosen to assess the estimation performance of the models, which are calculated as $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}|SOC_t - \widehat{SOC}_t|$ and $\mathrm{MAX} = \max_{t}|SOC_t - \widehat{SOC}_t|$, where $SOC_t$ is the actual SOC at time $t$, $\widehat{SOC}_t$ is the predicted SOC at time $t$, and $n$ is the total number of test data. The MAE and MAX of the models are shown in Figures 7-8. For example, using the US06 dataset at 25 °C, the MAE values of BDLSTM, LSTM, and LSTM with attention are 0.74, 0.57, and 0.24, respectively. The performance of the LSTM model with attention on the US06 and HWFET datasets is better than that of the other two models. Moreover, Figures 9-10 show the estimated SOC for the US06 and HWFET datasets at 0 °C and 10 °C. Therefore, we can conclude that the LSTM with an attention mechanism performs better than the LSTM without attention.
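The two additional criteria can be sketched in the same way, again on made-up values rather than the paper's measurements:

```python
def mae(actual, predicted):
    # Mean absolute error over all test points.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def max_error(actual, predicted):
    # Worst-case absolute deviation over the whole test set.
    return max(abs(a - p) for a, p in zip(actual, predicted))

actual = [90.0, 80.0, 70.0]       # illustrative true SOC values (%)
predicted = [90.5, 79.0, 70.2]    # illustrative model outputs (%)
```

MAE summarizes the average tracking quality, while MAX exposes isolated large excursions (such as the start-of-sequence errors discussed below) that an average would hide.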
The robustness of the proposed model is evaluated by constructing prediction intervals and examining the SOC estimation error. We used the Monte Carlo dropout technique to obtain the variance and bias of the proposed model. The model is run 100 times with random dropout, producing 100 predicted output values at each time step. The empirical mean and variance of these outputs are then used to obtain the prediction interval for each time step. The 95% prediction interval and SOC estimation error for the FUDS dataset are shown in Figure 11 for one training dataset and Figure 12 for two training datasets, where LB and UB are the lower and upper bounds, respectively. The narrow width of the prediction interval indicates the reliability and robustness of the proposed model. The SOC estimation error is less than 1.2% for both training conditions, as shown in Figure 11(b) and Figure 12(b). However, the estimation error is worst at the starting point for both training conditions and at the end point for the two-training-dataset case, which indicates a limitation of the model.
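The interval construction can be sketched as follows. Here `noisy_model` is a hypothetical stand-in for the Keras model run with dropout active at inference time, and the 75% center value and 0.5% noise level are arbitrary; the real model would be called once per sample with a fresh dropout mask.

```python
import random
import statistics

def mc_dropout_interval(forward, x, n_samples=100, seed=7):
    # Run the stochastic forward pass n_samples times, then build an
    # empirical 95% interval as mean +/- 1.96 * standard deviation.
    rng = random.Random(seed)
    samples = [forward(x, rng) for _ in range(n_samples)]
    mean = sum(samples) / len(samples)
    std = statistics.stdev(samples)
    return mean, mean - 1.96 * std, mean + 1.96 * std

# Hypothetical stand-in: a dropout-perturbed SOC estimate scattering
# around 75% with 0.5% noise (both numbers are arbitrary).
def noisy_model(x, rng):
    return 75.0 + rng.gauss(0.0, 0.5)

mean, lb, ub = mc_dropout_interval(noisy_model, x=None)
```

Repeating this at every time step of the test sequence yields the LB/UB bands of Figures 11-12; the narrower the band, the less the prediction depends on which units dropout happened to silence.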

V. CONCLUSIONS
This paper proposes a long short-term memory model with an attention mechanism to estimate battery SOC under dynamic operating conditions. The effect of the attention mechanism in the LSTM model is studied, and the DE algorithm obtains the optimal parameters of the model. Data collected from different charge-discharge profiles at different temperatures are used to evaluate the proposed model. The SOC estimation performance of the different models is then evaluated using one dataset and two datasets for training, respectively. The results show that the LSTM model with an attention mechanism provides better estimation accuracy and stability, and that training with two datasets further improves the SOC estimation. The estimation accuracy of the LSTM model using the optimal parameters is also considered. It provides better prediction accuracy than the published models in [28], namely SVM, standard RNN, LSTM, SVM-PF, standard RNN-PF, and LSTM-PF. Therefore, it can be concluded that the LSTM model with attention, based on the optimal parameters, provides better SOC estimation accuracy.
Using the Panasonic NCR18650PF dataset, the BDLSTM, LSTM, and LSTM with attention models were compared. The results show that the attention-based LSTM model has better SOC estimation accuracy on the US06 and HWFET test sets. As a result, the attention-based LSTM model can provide better SOC estimation accuracy under dynamic conditions. In the future, we will work on other advanced deep learning models for SOC estimation of Li-ion batteries.
TADELE MAMO received the M.Sc. degree in industrial engineering from Addis Ababa University, Ethiopia. He is currently pursuing the Ph.D. degree with the Department of Industrial Management, National Taiwan University of Science and Technology, Taiwan. His fields of interest include remaining useful life prediction and data analytics.
FU-KWUN WANG received the Ph.D. degree in industrial engineering from Arizona State University, Tempe, USA. He is currently a Distinguished Professor with the Department of Industrial Management, National Taiwan University of Science and Technology, Taiwan. His fields of interest are reliability engineering, quality control, and predictive analytics.