Combined CNN-LSTM Network for State-of-Charge Estimation of Lithium-Ion Batteries

State-of-charge (SOC), which indicates the remaining capacity within the current cycle, is key to predicting the driving range of electric vehicles and to the optimal charge control of rechargeable batteries. In this paper, we propose a combined convolutional neural network (CNN) and long short-term memory (LSTM) network to infer battery SOC from measurable data such as current, voltage, and temperature. The proposed network combines the merits of the CNN and LSTM networks and can extract both spatial and temporal features from the input data. It is trained using data collected from different discharge profiles, including the dynamic stress test, the federal urban driving schedule, and the US06 test. Its performance is evaluated, in terms of estimation accuracy and robustness against unknown initial states, using data collected from a new combined dynamic loading profile. The experimental results show that the proposed CNN-LSTM network captures the nonlinear relationships between SOC and the measurable variables well and tracks SOC better than the LSTM and CNN networks. In the case of unknown initial SOCs, the proposed network quickly converges to the true SOC and then produces smooth and accurate results, with a maximum mean absolute error under 1% and a maximum root mean square error under 2%. Moreover, the proposed network learns the influence of ambient temperature and can estimate battery SOC under varying temperatures with a maximum mean absolute error under 1.5% and a maximum root mean square error under 2%.


I. INTRODUCTION
Lithium-ion batteries have gradually become the dominant power source of electric vehicles (EVs) owing to their high energy density, high power density, long lifetime, and environmental friendliness [1]. Because the EV driving environment is usually complicated and the battery degrades over repeated charge and discharge, a battery management system (BMS) is required to monitor the battery health status and protect the battery from over-charge and over-discharge, keeping it within a safe operating window [2]. State-of-charge (SOC), which reflects the remaining battery charge during one charge-discharge cycle [3], is one of the key states in the BMS. Accurate SOC information is necessary for estimating EV range and preventing battery failure caused by over-charge or over-discharge. However, because SOC cannot be measured directly, it must be estimated from current, voltage, temperature, and other measurable variables.
The associate editor coordinating the review of this manuscript and approving it for publication was Yanbo Chen.
Currently, the Ampere-Hour integral method, the open-circuit voltage (OCV) method, model-based filtering methods, and machine learning methods are widely investigated for SOC estimation [4].
The Ampere-Hour integral method estimates battery SOC directly by accumulating the battery current over time [5]. This method requires the initial SOC to be known in advance and relies on the precision of the current sensor. In addition, the underlying numerical integration scheme also plays an important role. In contrast, the OCV method estimates battery SOC via a look-up table based on the monotonic relationship between OCV and SOC [6]. The OCV method is simple, but it cannot be applied on board because obtaining a precise OCV value requires the battery to rest long enough to return to an electrochemically stable condition.
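As a minimal sketch of the Ampere-Hour integral method, assuming that discharge current is positive, that samples are taken at a fixed interval, and that the initial SOC is known, the function below accumulates current with either a rectangular (left Riemann) rule or a trapezoidal rule. The function name and the `method` parameter are hypothetical; comparing the two rules shows why the numerical integration scheme matters when the current changes between samples.

```python
def coulomb_count(currents_a, dt_s, soc0, capacity_ah, method="trapezoidal"):
    """Return the SOC trajectory (as fractions) obtained by integrating current.

    Assumption: discharge current is positive and samples are dt_s seconds apart.
    """
    if method not in ("rectangular", "trapezoidal"):
        raise ValueError("method must be 'rectangular' or 'trapezoidal'")
    capacity_as = capacity_ah * 3600.0  # convert Ah to ampere-seconds (coulombs)
    soc = [soc0]
    for k in range(1, len(currents_a)):
        if method == "rectangular":
            # Hold the previous sample constant over the whole interval.
            charge = currents_a[k - 1] * dt_s
        else:
            # Average the two end-point samples (trapezoid rule).
            charge = 0.5 * (currents_a[k - 1] + currents_a[k]) * dt_s
        soc.append(soc[-1] - charge / capacity_as)
    return soc
```

For a constant 1.1 A discharge of a 1.1 Ah cell sampled every second, both rules reach an SOC of about 0 after 3600 s; for a current that steps between samples, as in dynamic profiles, they diverge, and any current-sensor offset accumulates in the same way.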
Later, model-based methods combined the Ampere-Hour integral method and the OCV method with mature filtering techniques, such as variants of the Kalman filter and the particle filter, to update the "best" estimate of SOC recursively [7]. Plett et al. introduced an extended Kalman filter [8] and an unscented Kalman filter [9] to estimate the SOC of lithium-ion polymer battery packs. Gao et al. employed a particle filter to estimate the SOC of lithium-ion batteries [10]. Model-based filtering methods are fast and hence suitable for real-time applications, but their performance relies heavily on the quality of the battery model [11]. Many models, such as the simple model, the combined model, the one-state hysteresis model, the enhanced self-correcting model, and resistance-capacitance network based equivalent circuit models, have been proposed to estimate the SOC of lithium-ion batteries [12]. Most of these models work only under strict conditions, such as a constant ambient temperature and a specified battery type. New models must be established when other factors, such as temperature, degradation level, and humidity, are considered.
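The recursive predict/update cycle of model-based filtering can be sketched with a scalar Kalman filter. This is an illustrative assumption, not the filters of [8], [9], or [10]: the state is SOC alone, and the battery model is a hypothetical linear OCV curve plus an ohmic drop `r0 * I`, so the extended Kalman filter's Jacobian reduces to a constant and the filter is effectively linear. All names and parameter values are hypothetical.

```python
# Hypothetical battery model: linear OCV(SOC) plus an ohmic drop r0 * I.
OCV_V0, OCV_SLOPE = 3.0, 0.6  # volts and volts per unit SOC (illustrative only)


def ocv(soc):
    return OCV_V0 + OCV_SLOPE * soc


def ocv_slope(soc):
    # dOCV/dSOC: the EKF measurement Jacobian (constant for this linear curve).
    return OCV_SLOPE


def ekf_soc(currents, voltages, soc0, p0, capacity_ah, r0, q, r, dt=1.0):
    """Scalar (extended) Kalman filter for SOC; discharge current is positive."""
    soc, p = soc0, p0
    estimates = []
    for i_k, v_k in zip(currents, voltages):
        # Predict: Ampere-Hour integration serves as the process model.
        soc = soc - i_k * dt / (3600.0 * capacity_ah)
        p = p + q
        # Update: correct the prediction with the terminal-voltage residual.
        h = ocv_slope(soc)
        innovation = v_k - (ocv(soc) - r0 * i_k)
        gain = p * h / (h * p * h + r)
        soc = soc + gain * innovation
        p = (1.0 - gain * h) * p
        estimates.append(soc)
    return estimates
```

Started from a wrong initial SOC (for example 0.5 when the true value is 0.9), the voltage residual pulls the estimate toward the true SOC within a few steps. With a real, nonlinear OCV curve the slope would be evaluated at each predicted SOC, and the accuracy of that curve and of `r0` then governs the result, which is the model dependence noted above.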
In contrast, machine learning methods directly model the nonlinear relationships between battery SOC and measured variables from large amounts of collected data [7]. Commonly used machine learning methods include artificial neural networks [13], fuzzy logic [14], and support vector machines [15]. While no specific battery model is required, their estimation performance strongly depends on the quality and quantity of the training data. Moreover, training takes a long time when large amounts of data are involved.
In recent years, neural network-based deep learning methods have drawn much attention from the research community. On the one hand, with the boom in computing power brought by graphics processing units (GPUs) and the advent of mature machine learning frameworks such as TensorFlow, building and training neural networks has become much easier and faster than before [16]. On the other hand, large-scale field data can be gathered and stored via online BMSs and then uploaded to remote data servers [4]. Additionally, battery data can be generated from laboratory tests with dynamic driving regimes. Sahinoglu et al. [17] proposed a recurrent neural network (RNN) to estimate the SOC of lithium-ion batteries. Yang et al. [4] introduced a long short-term memory (LSTM) network to estimate battery SOC from measured voltage, current, and temperature. Yang et al. [16] employed a gated recurrent unit (GRU) network to estimate battery SOC at varying temperatures and evaluated its performance on two mainstream lithium-ion batteries. Unlike traditional feedforward neural networks, an RNN uses hidden nodes to store information about past inputs, allowing SOC estimation to incorporate past information. LSTM and GRU are two variants of the RNN that further extend the original RNN's ability to capture long-term dependencies.
The convolutional neural network (CNN) [18] is another successful architecture in deep learning research. While the LSTM characterizes long-term dependencies and is good at handling time-series information, the CNN uses convolutional filters to extract interrelations within the input data.
In this paper, a combined CNN-LSTM network is proposed to model the complex battery dynamics. Specifically, the CNN is used to extract advanced spatial features from the original data, and the LSTM is used to model the relationships between the current SOC and historical inputs. The proposed network takes advantage of both the CNN and LSTM networks and captures both spatial and temporal features of battery data.
The contributions of this paper are summarized as follows. 1) A combined CNN-LSTM network is proposed to capture the nonlinear dynamics inside the lithium-ion battery and estimate battery SOC from voltage, current, and temperature measurements. 2) Data collected from several well-known dynamic loading profiles, including the dynamic stress test (DST), the federal urban driving schedule (FUDS), and US06, are employed to train the proposed network, and data collected from a combined dynamic loading profile are used to evaluate its SOC estimation performance. 3) The robustness of the proposed network against unknown initial states is investigated, and its SOC estimation performance is compared with that of the LSTM and CNN networks. 4) The proposed network is trained to learn the influence of ambient temperature, and its SOC estimation performance is evaluated under varying temperatures. The rest of this paper is organized as follows. Section II introduces the experiment design and data collection. Section III describes the proposed network for SOC estimation in detail. The estimation results are presented in Section IV, and conclusions are drawn in Section V.

Fig. 1 shows our test bench in the Shenzhen Research Institute lab. The experiments were conducted on an Arbin BT2000 battery tester with cylindrical A123 18650 battery samples (cathode: lithium iron phosphate (LFP); anode: graphite; nominal capacity: 1.1 Ah; cut-off voltages: 3.6 V/2 V; end-of-charge current: 0.011 A). The battery charge/discharge profiles were controlled with Arbin's Mits Pro software. The ambient temperature of the battery samples was regulated using a temperature chamber from Votsch.

A. NETWORK TRAINING TEST
VOLUME 7, 2019
To simulate different battery loading behaviors in real-world applications, a set of well-known dynamic loading profiles designed by the US Advanced Battery Consortium [19], namely DST, FUDS, and US06, was applied to discharge the battery under varying temperatures.
Specifically, the FUDS and US06 driving profiles simulate EV battery usage in city and highway driving conditions, respectively. The DST profile is a simplified profile resembling the characteristics of the FUDS profile. Fig. 2(a) plots the current profiles of DST, FUDS, and US06. In each test, the battery was first fully charged using the standard constant-current/constant-voltage mode. During discharge, one of the above profiles was applied repeatedly until the battery was fully discharged. After the DST, FUDS, and US06 tests, a constant-current test, in which the battery was discharged at a constant current of 1.1 A, was also conducted. The cumulative capacity calculated during the discharge process served as the nominal capacity of the battery sample.
Finally, to take ambient temperature into account, the above tests were repeated at 0 °C, 10 °C, 20 °C, 30 °C, 40 °C, 50 °C, and room temperature (RT, around 27 °C). The voltage, current, and temperature data were sampled every second. Fig. 2(b) shows the discharge voltage measurements at room temperature for the DST, US06, and FUDS tests.

B. NETWORK EVALUATION TEST
To simulate complex real-world EV battery loading behaviors, the DST-FUDS-US06 (DFU) profile, which combines the DST, FUDS, and US06 profiles, was used to evaluate the SOC estimation performance of the proposed network. During discharge, the DFU profile was applied repeatedly until the battery was fully discharged. The DFU test was conducted at 0 °C, 10 °C, 20 °C, 30 °C, 40 °C, 50 °C, and RT to construct the testing data sets for SOC estimation. Fig. 3 shows the measured current and voltage, together with the SOC calculated by the Ampere-Hour integral method, during a DFU test at room temperature.

III. STATE-OF-CHARGE ESTIMATION BASED ON THE COMBINED CNN-LSTM NETWORK
In this section, a combined CNN-LSTM network is proposed to model the highly nonlinear dynamics of lithium-ion batteries and estimate battery SOC from measurable voltage, current, and temperature variables. The CNN layer focuses on the current input, extracts the spatial features of the battery data, and combines them into high-level features, whereas the LSTM uses hidden cell memories to store information about past inputs, which makes it more suitable for processing time-series data. The details of the CNN and LSTM networks are described below.

A. CONVOLUTIONAL NEURAL NETWORK
The CNN, proposed by LeCun et al. [18], is a feedforward neural network that is effective for pattern recognition and feature extraction. As shown in Fig. 4, a typical CNN consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. Using a set of filters, the CNN extracts the topological features hidden in the data through layer-by-layer convolution and pooling operations. The CNN can use few parameters to capture the spatial features of the input and combine them into high-level features. These features are then fed into the fully connected layer for further classification or regression.
Although the CNN is best known for its success with 2D images, the same idea applies readily to 1D data [20]. In this paper, 1D convolution is adopted to capture the spatial features of battery variables. A convolutional layer is added such that the input information passes through a convolution operation and an activation function before flowing to the next layer:
$$h^{cnn}_k = \sigma_{cnn}\left(W_{cnn} * x_k + b_{cnn}\right),$$
where $h^{cnn}_k$ is the output of the convolutional layer, $*$ denotes the discrete convolution between the input signal $x_k$ and the filter weight $W_{cnn}$, $b_{cnn}$ is a bias parameter learned during training, and $\sigma_{cnn}$ is the activation function.
To capture different features, several filters of the same size are adopted in one convolutional layer. The input signal is convolved with each filter, and the results are stacked together as the output, as illustrated in Fig. 5, where one convolutional layer with two filters is present. The convolution operation can be visualized as a window of the filter's size sliding along the input with a certain stride; at each position of the window, the inner product between the filter and the covered portion of the input is computed as one element of the output. For example, using the filter (−1, 0, 1) with stride two and no bias, the first output is 2 × (−1) + 3 × 0 + 5 × 1 = 3, and the second output is computed in the same way after the window moves two positions along the input. Since the spatial dimension of battery data for SOC estimation is limited, pooling layers are not employed in this work.
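The sliding-window operation and the worked example can be reproduced in plain Python. As is usual in CNN libraries, the "convolution" below does not flip the kernel (it is a cross-correlation), which matches the inner-product description above. Only the first three input values (2, 3, 5) come from the text; the remaining input values, the helper names, and the tanh default activation are assumptions for illustration.

```python
import math


def conv1d(x, kernel, stride=1, bias=0.0):
    """Valid (no padding) 1D convolution as used in CNNs, i.e. cross-correlation:
    each output is the inner product of the kernel with one window of x."""
    n, m = len(x), len(kernel)
    return [sum(w * x[s + j] for j, w in enumerate(kernel)) + bias
            for s in range(0, n - m + 1, stride)]


def conv_layer(x, kernels, biases, stride=1, activation=math.tanh):
    """One convolutional layer: sigma_cnn(W_cnn * x + b_cnn) for each filter.

    Each filter yields one output channel; channels are stacked as a list of lists.
    The tanh default is an assumption; the text does not name sigma_cnn.
    """
    return [[activation(v) for v in conv1d(x, w, stride, b)]
            for w, b in zip(kernels, biases)]
```

For the hypothetical input [2, 3, 5, 4, 1], `conv1d([2, 3, 5, 4, 1], [-1, 0, 1], stride=2)` gives [3.0, -4.0]; the first element reproduces the worked example, and `conv_layer` simply stacks one such output per filter, as in Fig. 5.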

B. LONG SHORT-TERM MEMORY
LSTM, proposed by Hochreiter et al. [21], is one of the most popular variants of the RNN [22]. Because of vanishing or exploding gradients, a standard RNN trained within the classic gradient-based framework cannot capture long-term dependencies [23]. The LSTM network, in contrast, uses hidden cell memory instead of ordinary hidden nodes to avoid this drawback. Fig. 6 shows the structure of an LSTM unit, which contains three types of gates: the input gate i, which determines what proportion of the current input is merged into the cell memory; the forget gate f, which sets the forget rate of the cell memory given the current input; and the output gate o, which controls how the cell memory influences the unit output. At time k, the forward pass of an LSTM unit proceeds as follows:
$$
\begin{aligned}
i_k &= \sigma_g\left(W_i x_k + U_i h_{k-1} + b_i\right),\\
f_k &= \sigma_g\left(W_f x_k + U_f h_{k-1} + b_f\right),\\
o_k &= \sigma_g\left(W_o x_k + U_o h_{k-1} + b_o\right),\\
c_k &= f_k \circ c_{k-1} + i_k \circ \sigma_c\left(W_c x_k + U_c h_{k-1} + b_c\right),\\
h_k &= o_k \circ \sigma_h\left(c_k\right),
\end{aligned}
$$
where $\circ$ denotes the Hadamard product; $x_k$ is the unit input at time k; $h_k$ is the corresponding unit output; $c_k$ is the hidden unit memory; $i_k$, $f_k$, and $o_k$ are the activation vectors of the input gate, the forget gate, and the output gate, respectively; $\sigma_g$, $\sigma_c$, and $\sigma_h$ are activation functions, where $\sigma_g$ is the logistic sigmoid function and $\sigma_c$ and $\sigma_h$ are both hyperbolic tangent functions; and $W$, $U$, and $b$ are the weight matrices and bias parameters learned during training. To see how the gating works, take the forget gate as an example: the gating factor $f_k$ is the output of a sigmoid function, so each of its elements lies between 0 and 1. After the gating operation, the old cell memory fades out where elements of $f_k$ approach 0 and is preserved where they approach 1. In other words, $f_k$ can be interpreted as an effectiveness factor determining how much of the old memory is retained when new input becomes available. The input gate and the output gate work in the same way.
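To make the gate equations concrete, the sketch below runs one forward step of a single LSTM unit in plain Python. It is a simplification for illustration: the hidden state and cell memory are scalars (the proposed network uses 300 hidden nodes, where W and U become matrices), and the parameter layout `p` and the helper names are hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def lstm_step(x, h_prev, c_prev, p):
    """One forward step of a single LSTM unit with scalar h and c.

    p maps each gate name ('i', 'f', 'o', 'c') to (W, U, b): W is a list of
    input weights, U a scalar recurrent weight, and b a scalar bias.
    """
    def pre(gate):
        W, U, b = p[gate]
        return dot(W, x) + U * h_prev + b

    i = sigmoid(pre("i"))                       # input gate (sigma_g)
    f = sigmoid(pre("f"))                       # forget gate (sigma_g)
    o = sigmoid(pre("o"))                       # output gate (sigma_g)
    c = f * c_prev + i * math.tanh(pre("c"))    # cell memory, sigma_c = tanh
    h = o * math.tanh(c)                        # unit output, sigma_h = tanh
    return h, c
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate memory is tanh(0) = 0, so the cell memory halves; pushing the forget-gate bias to a large positive value drives the forget gate toward 1 and preserves the old memory, which is the behaviour described above.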

C. PROPOSED CONVOLUTIONAL LSTM NETWORK
When inferring SOC, two kinds of features exist: spatial features, which are the interrelations within the current input, and temporal features, which are the correlations between the current SOC and past inputs. To attend to both the spatial and temporal features of battery data, we propose a combined CNN-LSTM network for accurate and robust battery SOC estimation. Specifically, the CNN is used to extract more advanced spatial features from the original data, and the LSTM is used to model the relationships between the current SOC and historical inputs. First, the measured current, voltage, and temperature, together with the average current and average voltage, $x_k = [I_k, V_k, T_k, I_{avg,k}, V_{avg,k}]$, are fed into the network. The selection of average current and voltage signals as inputs follows [24], in which better performance was achieved when the average current and voltage were included. In this work, the average current and voltage are calculated over the 20 preceding time steps. Next, one convolutional layer with six filters of length three extracts the spatial features of the battery input parameters. Then one LSTM layer with 300 hidden nodes learns the temporal features of the battery's dynamic evolution. According to [16], [24], [25], one LSTM layer is sufficient to depict the temporal nonlinearity inside the battery. Moreover, tests with networks having different numbers of LSTM nodes indicate that 200 to 500 nodes are suitable for SOC estimation of LFP batteries. Finally, a fully connected layer with 80 nodes is used as a regression layer that outputs the final SOC estimate.
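The network input can be assembled as sketched below. The text states only that the averages are taken over 20 preceding time steps; this sketch assumes a trailing window that includes the current sample and averages over fewer samples at the start of a record. Both choices, and the function name, are assumptions.

```python
def build_inputs(current, voltage, temperature, window=20):
    """Return x_k = [I_k, V_k, T_k, I_avg,k, V_avg,k] for every time step k.

    Assumption: the averages cover the `window` most recent samples up to and
    including k; near the start of a record, fewer samples are averaged.
    """
    xs = []
    for k in range(len(current)):
        lo = max(0, k - window + 1)      # first index inside the trailing window
        n = k - lo + 1                   # number of samples actually averaged
        i_avg = sum(current[lo:k + 1]) / n
        v_avg = sum(voltage[lo:k + 1]) / n
        xs.append([current[k], voltage[k], temperature[k], i_avg, v_avg])
    return xs
```

Each resulting five-element vector is what the convolutional layer slides its length-three filters over; a different averaging convention would change only the last two entries.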
The role of the 1D convolutional layer is as follows. By choosing the weights of the convolution kernel and the window width, different data features can be extracted to better serve as the input of the LSTM layer. From a signal processing point of view, 1D convolution with a kernel is a linear filtering operation, the same operation that underlies the wavelet transform (convolution with scaled wavelets), and it can therefore extract characteristics of the raw data in the frequency domain. While the LSTM network explores the correlations between the current output and past inputs, introducing the CNN forces the network to also exploit the relationships within the current input. Such relationships may be vague or unintuitive, but can generally be understood as how current, voltage, temperature, mean current, and mean voltage interrelate with each other. These features are learned by training the CNN to reduce the estimation error.
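The frequency-domain view of convolution can be checked numerically. For a kernel w applied CNN-style, y[n] = Σ_j w[j] x[n+j], a complex sinusoid e^{jωn} is scaled by H(ω) = Σ_j w[j] e^{jωj}; the helper below (a hypothetical name) evaluates |H(ω)|. Applied to a sampled sequence, the filter (−1, 0, 1) from Fig. 5 rejects a constant signal and responds most strongly at ω = π/2, whereas an averaging kernel passes the constant component unchanged.

```python
import cmath
import math


def frequency_response(kernel, omega):
    """|H(omega)| for a CNN-style (cross-correlation) kernel w[0..m-1].

    Feeding x[n] = exp(j*omega*n) into y[n] = sum_j w[j] * x[n + j] gives
    y[n] = x[n] * sum_j w[j] * exp(j*omega*j), so that sum is the gain H(omega).
    """
    return abs(sum(w * cmath.exp(1j * omega * j) for j, w in enumerate(kernel)))


for name, kernel in [("difference (-1, 0, 1)", [-1, 0, 1]),
                     ("moving average", [1 / 3, 1 / 3, 1 / 3])]:
    gains = [round(frequency_response(kernel, w), 3) for w in (0.0, math.pi / 2, math.pi)]
    print(name, gains)
```

Which frequency bands a learned kernel emphasizes is decided by training, not fixed in advance; the sketch only shows that each kernel acts as a filter with its own frequency response.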
During training, the mean square error (MSE) is chosen as the overall loss function evaluated at the end of each forward pass:
$$L_{MSE} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2,$$
where $y_k$ is the true SOC value, $\hat{y}_k$ is the output of the proposed network at time k, and N is the number of samples. The Adam optimizer [26] is selected to minimize the total loss; it updates the network weights and biases based on the gradient of the loss function. The initial learning rate is set to 0.01, and the exponential decay rates of the first- and second-moment estimates are set to 0.9 and 0.999, respectively. To prevent overfitting during training, a dropout rate of 20% is used in the LSTM layer and the fully connected layer [27].
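The loss and the optimizer update can be sketched without a deep learning framework. The code below implements the MSE loss described above and the standard Adam update of Kingma and Ba with the stated settings (learning rate 0.01, decay rates 0.9 and 0.999); the epsilon value, the class layout, and the flat parameter list are assumptions, and dropout is omitted.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between reference SOC values and network outputs."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


class Adam:
    """Minimal Adam optimizer for a flat list of scalar parameters (sketch)."""

    def __init__(self, n_params, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0                 # step counter used for bias correction

    def step(self, params, grads):
        """Return the updated parameters given the loss gradients."""
        self.t += 1
        new_params = []
        for j, (w, g) in enumerate(zip(params, grads)):
            self.m[j] = self.beta1 * self.m[j] + (1.0 - self.beta1) * g
            self.v[j] = self.beta2 * self.v[j] + (1.0 - self.beta2) * g * g
            m_hat = self.m[j] / (1.0 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[j] / (1.0 - self.beta2 ** self.t)  # bias-corrected variance
            new_params.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params
```

On the first step the bias-corrected moments equal g and g², so each parameter moves by about the learning rate against the sign of its gradient (1.0 becomes about 0.99 for a gradient of 2.0), which is why Adam's step size is largely insensitive to the scale of the gradients.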
In the testing phase, the root mean square error (RMSE) and the mean absolute error (MAE) are used to evaluate the performance of the proposed network:
$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2}, \qquad MAE = \frac{1}{N}\sum_{k=1}^{N}\left|y_k - \hat{y}_k\right|.$$
The MAE measures how close the estimates are to the true values regardless of sign, whereas the RMSE is more sensitive to large errors and characterizes the variation of the errors.
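The two test metrics follow directly from their definitions. The toy values below are illustrative, not results from this paper; they show that two error sequences with the same MAE can have different RMSEs when one of them contains a single large error.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true, y_pred):
    _check(y_true, y_pred)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    _check(y_true, y_pred)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Same MAE (1.0), different RMSE: one large error versus four unit errors.
y_ref = [1.0, 2.0, 3.0, 4.0]
print(mae(y_ref, [1.0, 2.0, 3.0, 8.0]), rmse(y_ref, [1.0, 2.0, 3.0, 8.0]))  # 1.0 2.0
print(mae(y_ref, [2.0, 3.0, 4.0, 5.0]), rmse(y_ref, [2.0, 3.0, 4.0, 5.0]))  # 1.0 1.0
```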

IV. RESULTS
The proposed CNN-LSTM network described in Section III is trained with data collected from the DST, FUDS, and US06 tests, and its online SOC estimation performance is evaluated with data collected from the DFU test. The input of the network is $x_k = [I_k, V_k, T_k, I_{avg,k}, V_{avg,k}]$, and the output is the corresponding SOC estimate, $y_k = [SOC_k]$. Section IV-A presents the estimation results at room temperature, and Section IV-B presents the results under varying temperatures. All training is carried out on a server with two GeForce GTX 1080 Ti GPUs.

A. SOC ESTIMATION AT ROOM TEMPERATURE
In this section, the proposed CNN-LSTM network is trained with the DST data (8438 samples), the FUDS data (8390 samples), and the US06 data (7987 samples) at room temperature, and its online SOC estimation performance is evaluated with the DFU data (8350 samples) at room temperature. While more training epochs generally improve model accuracy, the training time grows accordingly. To determine an appropriate number of training epochs, the RMSEs of the training and testing data sets are plotted against the number of training epochs in Fig. 8, with the epoch count increasing in steps of 200 up to 15000. As shown in Fig. 8, the RMSE quickly drops below 4% after 2000 epochs and then stays almost entirely within 2% after 6200 epochs. Fluctuations are observed around epochs 6000∼8000 and 11000∼12000, where the RMSEs increase abruptly but then quickly stabilize, indicating that the optimization algorithm hops from one local optimum to another. The training and testing errors reach their lowest values between epochs 8000 and 11000; hence, 10000 epochs is a justified choice. The performance of the proposed network is compared with that of the LSTM and CNN networks. The LSTM network is the proposed network without the convolutional layer. The CNN network has three convolutional layers with six filters in each layer, where each hidden layer has the same size as the input layer, and zero padding is used so that the data length is preserved in subsequent layers. All networks are trained for 10000 epochs. The training times of the proposed network, the LSTM network, and the CNN network are 161 minutes, 231 minutes, and 102 minutes, respectively. For SOC estimation on our laptop, the average computation times per time step are 0.098 ms, 0.116 ms, and 0.082 ms, respectively. Fig. 9 shows the SOC estimation results with SOC starting from 100%. Compared with the LSTM network and the proposed network, the CNN network, which is independent of past inputs, yields a strongly fluctuating estimate.
In contrast, the SOCs estimated by the LSTM network and the proposed network are much smoother and more accurate. The estimation errors of the proposed network and the LSTM network are plotted in Fig. 9(b): the errors of the proposed network stay within 2%, while the worst error of the LSTM network exceeds 4%. In this case, both the proposed network and the LSTM network yield satisfactory results, with the proposed network being slightly better. In practice, the initial battery SOC is not always known a priori, so the proposed network must be robust against unknown initial SOCs. Rather than fixing the initial SOC at 80% and then performing the discharge test, data with an initial SOC of 80% are generated by removing the data with SOC greater than 80% from the data in Fig. 9. Data with other initial SOCs are generated in the same way.
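The truncation used to emulate an unknown initial SOC can be sketched as follows. The text says that samples with SOC above the chosen level are removed; this sketch assumes that SOC decreases (nearly) monotonically during discharge and therefore starts the sequence at the first sample whose reference SOC is at or below that level, so short recharging pulses later in the record are kept. The function name is hypothetical.

```python
def start_at_soc(inputs, soc, soc_start):
    """Drop leading samples until the reference SOC first reaches soc_start.

    inputs and soc are parallel sequences; the truncated pair is returned.
    Assumes SOC decreases (nearly) monotonically during the discharge test.
    """
    if len(inputs) != len(soc):
        raise ValueError("inputs and soc must have the same length")
    for k, s in enumerate(soc):
        if s <= soc_start:
            return inputs[k:], soc[k:]
    return inputs[:0], soc[:0]  # level never reached: empty sequences
```

An estimator run on the truncated sequence starts from its usual initial state, so it is not told the true starting SOC and must infer it from the measurements.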
When SOC starts from 80%, as shown in Fig. 10, the CNN network, being a feedforward neural network, produces almost the same estimation results as in Fig. 9. In comparison, the performance of the LSTM network and the proposed network unfolds in two stages. In the first stage, the performance of both networks is dominated by the unknown initial state, and the proposed network takes longer than the LSTM network to track the true SOC. After this period, the proposed network presents smaller and more consistent estimation errors, similar to Fig. 9, as can be seen from the estimation errors plotted in Fig. 10(b). Statistically, the overall RMSE and MAE of the proposed network are 1.35% and 0.87%, respectively, slightly smaller than those of the LSTM network (RMSE: 1.43%, MAE: 0.95%).
Additionally, Fig. 11 presents the estimation results with initial SOC at 60%. As expected, the estimation results of the CNN network resemble those in Fig. 9. This time, the proposed network converges to the true SOC much faster than the LSTM network. In the second stage, the proposed network is again more stable and accurate. The RMSE and MAE of the proposed network are 0.92% and 0.48%, respectively, while those of the LSTM network are 2.97% and 2.03%, respectively.
While the CNN network still yields the worst performance, it is least influenced by unknown initial states. For the LSTM network and the proposed network, the SOC estimation is first dominated by unknown initial SOC. Once the networks converge to the true SOC, the proposed network presents better performance in terms of consistency and accuracy.
More statistical results are listed in Table 1, where the initial SOC decreases from 100% to 20% in steps of 20%. In all cases, the proposed network yields smaller RMSEs and MAEs than the LSTM and CNN networks. The estimation RMSEs and MAEs are larger for initial SOCs between 40% and 80%, rather than increasing monotonically with the initial SOC bias. This can be explained by the flat region in the OCV-SOC curve of LFP batteries. Fig. 12 presents a typical OCV-SOC curve of the LFP battery at room temperature, where the 40%∼80% SOC region is relatively flat, meaning that the measurable battery states are quite stable in this range, which is desirable for a battery used as a power source. On the other hand, this property makes inferring the initial SOC much harder, since a tiny deviation in OCV corresponds to a large deviation in the SOC estimate.
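The effect of a flat OCV-SOC region on SOC inference can be illustrated with an OCV look-up inverted by linear interpolation. The table below is hypothetical (chosen only to mimic a flat middle region, not the measured curve in Fig. 12), as are the function names; with it, a 1 mV OCV error shifts the inferred SOC by 2 percentage points in the flat segment but by well under 0.1 percentage point in a steep one.

```python
from bisect import bisect_left

# Hypothetical OCV-SOC table (OCV strictly increasing), for illustration only.
SOC_TABLE = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
OCV_TABLE = [2.90, 3.20, 3.28, 3.29, 3.30, 3.45]  # volts; flat between 0.4 and 0.8


def soc_from_ocv(v, ocv_table=OCV_TABLE, soc_table=SOC_TABLE):
    """Invert the OCV-SOC table by piecewise-linear interpolation (clamped)."""
    if v <= ocv_table[0]:
        return soc_table[0]
    if v >= ocv_table[-1]:
        return soc_table[-1]
    j = bisect_left(ocv_table, v)  # ocv_table[j - 1] < v <= ocv_table[j]
    v0, v1 = ocv_table[j - 1], ocv_table[j]
    s0, s1 = soc_table[j - 1], soc_table[j]
    return s0 + (s1 - s0) * (v - v0) / (v1 - v0)


def soc_error_from_ocv_error(v, dv):
    """SOC deviation caused by an OCV measurement error dv at voltage v."""
    return abs(soc_from_ocv(v + dv) - soc_from_ocv(v))


print(round(soc_error_from_ocv_error(3.285, 0.001), 4))  # flat region: 0.02
print(round(soc_error_from_ocv_error(3.05, 0.001), 4))   # steep region: 0.0007
```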

B. SOC ESTIMATION AT VARYING TEMPERATURES
To capture the effects of ambient temperature, in this section the proposed network is trained to learn the battery dynamics under varying temperatures. The proposed network is trained using the DST, FUDS, and US06 data at 0 °C, 10 °C, 20 °C, 30 °C, 40 °C, and 50 °C, and then tested using the DFU data at 0 °C, 10 °C, 20 °C, 30 °C, 40 °C, 50 °C, and RT. Fig. 13 shows the estimation results at 0 °C, RT, and 50 °C; the proposed network produces satisfactory estimates, with RMSEs of 1.46%, 1.31%, and 0.82%, respectively. The RMSE and MAE results of all cases are listed in Table 2, where all RMSEs are within 2% and all MAEs are within 1.5%. Therefore, the proposed network can capture the influence of ambient temperature and provide good SOC estimation under varying temperatures.

V. CONCLUSION
In this paper, we proposed a combined CNN-LSTM network for the SOC estimation of lithium iron phosphate batteries. The network was trained using data collected from different discharge profiles, including the DST, US06 and FUDS profiles. Data collected from a new combined DFU profile were used to evaluate the performance of the proposed network on SOC estimation.
Experimental results showed that the proposed network successfully captures the nonlinear correlations between SOC and the network input variables, namely current, voltage, temperature, average current, and average voltage. In the case of unknown initial SOCs, the network quickly converged to the true SOC and then produced smooth and accurate estimates, with overall RMSEs under 2% and MAEs under 1%. Compared with the LSTM and CNN networks, the proposed network produced smoother estimation results and better tracking accuracy in all test cases. In addition, the proposed network learned the influence of ambient temperature and provided satisfactory SOC estimation under varying temperatures, with RMSEs within 2% and MAEs within 1.5%.
Finally, battery dynamics are influenced by aging during repeated use. To account for battery aging, we propose updating the network parameters regularly. Based on our experience, an update interval of two months or even longer is acceptable because the battery aging process is slow.
XIANGBAO SONG received the M.Phil. degree in electronic and computer engineering from The Hong Kong University of Science and Technology, in 2016. He is currently a Researcher with the Shenzhen Research Institute, Googol Technology. His research interests include robotics, motion planning, signal processing, prognostics and health management, and deep learning.

KWOK-LEUNG TSUI was a Professor with the School of Industrial and Systems Engineering, Georgia Institute of Technology. He is currently the Chair Professor of industrial engineering with the School of Data Science, City University of Hong Kong, and the Founder and the Director of Center for Systems Informatics Engineering. His current research interests include data mining, surveillance in healthcare and public health, prognostics and systems health management, calibration and validation of computer models, process control and monitoring, and robust design and Taguchi methods. He is a fellow of the American Statistical Association, the American Society for Quality, and the International Society of Engineering Asset Management, and a U.S. representative to the ISO Technical Committee on Statistical Methods. He was a recipient of the National Science Foundation Young Investigator Award. He was the Chair of the INFORMS Section on Quality, Statistics, and Reliability and the Founding Chair of the INFORMS Section on Data Mining.