ARNS: A Data-Driven Approach for SoH Estimation of Lithium-Ion Battery Using Nested Sequence Models With Considering Relaxation Effect

In recent years, lithium-ion batteries (LIB) have been widely used in portable electronic devices thanks to their durability, stability, high capacity, low cost, light weight, and small size. These advantages have also led to the deployment of LIB in various complex systems, where efficient prediction of battery data, especially state-of-health (SoH), becomes crucial to ensure that the systems operate stably without risk of power interruption. With recent advances in Artificial Intelligence (AI), many works have applied deep learning (DL) models to this problem, since such models can potentially improve their performance with more training data. This is also the direction of this research, which introduces a novel data-driven approach called Autoregression Nested Sequence (ARNS). On the one hand, we propose a nested sequence model to efficiently aggregate channel-wise and cycle-wise information, both of which are closely related to the operation of LIB. On the other hand, we incorporate relaxation effects into the model to handle peak prediction. To the best of our knowledge, ARNS is the first deep learning model that combines all of these features into a single predictive system. The experimental results obtained using the NASA and CALCE datasets confirm a significant improvement by ARNS, especially when dealing with peak periods in the SoH of multiple cycles.


I. INTRODUCTION
In recent years, lithium-ion batteries (from now on referred to as LIB) have come to dominate portable electronics and are penetrating the electric vehicle market due to an unmatched combination of high energy density, high voltage, and long cycle life [1]. As noted in [2], the high energy efficiency of LIB may also allow their use in electric grid energy storage and other Internet of Things (IoT) applications, including improving the quality of energy harvested from wind, solar, geothermal, and other renewable sources, thus contributing to their more widespread use and building an energy-sustainable economy. In particular, as shown in [3], grid energy storage applications relying on LIB have continued to increase.
However, LIB performance declines over time (calendar aging) and use (cycle aging), which can lead to degraded performance, costly replacement, operational impairment, or even catastrophic consequences [4]. Hence, it is necessary to take measures to improve the reliability and safety of lithium-ion batteries. Engineering practice shows that the Battery Management System (BMS) [5] is essential for ensuring the safe, effective and reliable operation of LIB. BMS can also provide the best performance management and life extension solution for energy storage. The main tasks of the BMS include online monitoring of the battery's voltage and current, as well as providing real-time State-of-Health (SoH) and State-of-Charge (SoC) [6].
In this paper, we focus on the SoH aspect. Generally, SoH estimation is a critical task in a BMS to quantify the extent of degradation. Basically, SoH reflects the condition of a battery compared to its ideal condition. The indicator is the battery capacity: SoH is defined as the ratio of the current maximum available capacity C_max over the rated value given by the manufacturer C_rated:

SoH = (C_max / C_rated) × 100%

Therefore, the issue of SoH estimation can be converted to capacity estimation. In actual use, the C_max of LIB declines over time, which is the overall trend of the data. However, due to time-varying external environment variables and the complexity of the internal electrochemical processes in practical applications, the battery's capacity degrades irregularly (Figure 1a and Figure 1b), which increases the difficulty of SoH estimation.
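As a concrete illustration, the capacity ratio above maps directly to a one-line function. The numeric values below are illustrative only, not taken from the datasets:

```python
def soh_percent(c_max_ah: float, c_rated_ah: float) -> float:
    """SoH as the ratio of current maximum capacity to rated capacity, in %."""
    return 100.0 * c_max_ah / c_rated_ah

# Example: a cell rated at 2.0 Ah whose measured maximum capacity is now 1.4 Ah
print(soh_percent(1.4, 2.0))  # -> 70.0
```

Estimating SoH thus reduces to estimating C_max for the coming cycles, which is exactly what the models discussed later attempt to do.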
Many studies have been conducted on accurate SoH estimation for lithium-ion batteries. These methods include impedance spectroscopy and electrochemical techniques, or model-based methods that rely heavily on estimating or identifying the characteristic parameters of LIB [9], [10], [11]. More recently, with the advancement of Artificial Intelligence (AI), in particular machine learning, data-driven methods, which rely mainly on analytical, statistical, and machine learning models without prior knowledge of the battery's inner workings [12], have gained more attention. In particular, Park et al. [13] proposed a deep learning model based on Long Short-Term Memory (LSTM) that achieved significant results in capacity and Remaining Useful Life (RUL) prediction and is still considered state-of-the-art in this area. In that research, Park et al. leveraged internal information of the batteries, or channels, including voltage, current, and temperature throughout the batteries' working life cycles, known as multi-channel information. Nevertheless, apart from such channel-wise data, that work did not leverage aggregated cycle inputs (cycle-wise information). In our work, we aim to exploit both channel-wise and cycle-wise information about the battery, presenting a nested sequence model as subsequently discussed. The method proposed by Park et al. [13] is therefore adopted as the baseline of this paper.
One of the main difficulties of SoH estimation is that the LIB capacity degrades after each cycle but occasionally spikes up. This may be a result of the LIB relaxation effect, which can be expressed as follows: if the battery rests for some time longer than the regular break between two consecutive cycles, the battery partially recovers, which can increase the available capacity in the next cycle [6]. The relaxation effect is a typical feature of the LIB degradation process and should be considered in SoH estimation.
On popular datasets in this area, such as the NASA, CALCE, or Oxford datasets, deep learning architectures have been shown to provide high-accuracy SoH trend prediction. However, only a few models utilize the multi-channel information extracted from practical LIB usage. Meanwhile, the relaxation effect may be very helpful for predicting peak cycles. To the best of our knowledge, no existing research combines these two approaches. In this paper, we leverage a deep learning model for the SoH estimation of LIB affected by the relaxation effect. The main contribution of this paper is an AI-based data-driven SoH estimation method that also considers the relaxation effect, called the Autoregression Nested Sequence (ARNS) model. Compared to other approaches, ARNS introduces the following novel contributions: • We introduce a nested sequence model to seamlessly handle both channel-wise and cycle-wise data of LIB during their charge cycles.
• We leverage the incorporation of relaxation effect (RE) information into the model to specifically handle the peak prediction.
• On top of that, to predict the trend of sequences, we adopt an Autoregression branch instead of an additional LSTM layer to reduce model computation.
To the best of our knowledge, ARNS is the first work that incorporates the RE into a deep learning model. Our experiments also confirmed that ARNS outperforms other models, especially when predicting SoH at the peak cycles.
The rest of this paper is organized as follows. Section II describes the battery data analyzed in our problem. Next, Section III discusses related works that should be understood prior to tackling SoH prediction. In Section IV, preliminary knowledge of sequence-based deep learning models is presented. Section V then proposes and discusses our ARNS model in detail. In Section VI, the experimental results obtained on benchmark datasets are presented, together with quantitative data and comparisons. Finally, Section VII concludes the paper.

II. BATTERY DATA
In this study, we use two LIB datasets known as NASA and CALCE. Data from the NASA repository were collected from a custom-built battery prognostics testbed at the NASA Ames Prognostics Center of Excellence (PCoE) [7]. This dataset contains the test results of commercially available 18650-sized rechargeable lithium-ion batteries [14], of which we selected the 4 batteries #5, #6, #7, and #18 that are widely used in SoH estimation [6], [15]. Similarly, the CALCE data are provided for battery state estimation, remaining useful life prediction, accelerated battery degradation modeling, and reliability analysis. This battery type has a rated capacity of 1.1Ah with a discharging current of 1.1A, and the cycling test of the batteries is conducted at ambient temperature. The 4 batteries #35, #36, #37, and #38 are chosen for comparison in this paper, following similar works in this area [16]. Figures 1a and 1b summarize the capacity degradation of the batteries over their working cycles.

A. BATTERY DEGRADATION DATA
Typically, the lithium-ion batteries were run through 3 different operational profiles (Charge cycle, Discharge cycle, and Impedance Measurement), in which Impedance Measurement can be used by some typical experimental methods to predict SoH by analyzing the battery's electrochemical impedance spectroscopy [11].
In the NASA dataset, the Charge cycle was carried out in Constant Current (CC) mode at 1.5A until the battery voltage reached 4.2V and then continued in Constant Voltage (CV) mode until the charge current dropped to 20mA. The Discharge cycle was carried out at a constant current (CC) level of 2A until the battery voltage fell to 2.7V, 2.5V, 2.2V, and 2.5V for batteries #5, #6, #7, and #18, respectively. In the CALCE dataset, on the other hand, degradation tests were performed on a set of cells with a LiCoO2 cathode and a 1.1 Ah capacity rating.
The cells were repeatedly charged and discharged using a CC-CV protocol at ambient temperature (1 °C). The current was maintained at 1A until the voltage reached 4.2V. The voltage was then held at 4.2V until the charging current dropped to 0.05A. The failure threshold for the cells was set to 0.88Ah. The specifications of each battery's data are presented in Table 1 and Table 2.

B. REST PERIODS
Normally, LIB data include two time scales: calendar time and the number of cycles [6]. From now on, we denote by t_s and t_c the calendar time and the cycle index, respectively. Thus, for each cycle, the time scale t_c can be mapped to a specific calendar time t_s. From Figure 2, it can be observed that capacity regeneration is often caused by the calendar-time gap between two cycles exceeding some threshold [17]. Therefore, an effective SoH estimation model should try to encode this information.
In the past, various works have studied the correlation between rest periods and the capacity regeneration of lithium-ion batteries, in other words the relaxation effect. Paper [18] showed that rest durations provided to a Li-ion cell after every discharge and charge normalize the gradients of concentration and potential in the electrolyte and electrodes. This leads to better utilization of cyclable lithium and therefore regeneration of the battery's capacity. Moreover, that research also showed that a sufficiently long relaxation of the cell at the end of discharge could result in (a) a higher concentration of lithium in the solid matrix of the negative electrode; and (b) a lower concentration of lithium in the positive electrode, both leading to a higher cell potential, which has a significant influence on battery performance in the subsequent cycle.
Furthermore, [19] proved that the amount of capacity regenerated is not linear in the length of the preceding rest period. Moreover, experiments with various breaks showed that the duration of the relaxation time matters more than the sum of rest periods in determining the cycle capacity. In this research, we aim to incorporate the relaxation effect into modern deep learning models to improve the SoH prediction of LIB, especially at peak periods.

III. RELATED WORKS
In this paper, we focus on data-driven methods for SoH prediction. The advantage of these methods is that they do not require specific knowledge about the battery's working principles, as such knowledge may be hard to obtain in some situations. Instead, these methods depend only on the collected aging data and thus can be applied automatically without much expert knowledge [6]. Figure 3 presents the basic steps commonly found in data-driven approaches for SoH estimation [20], which generally produce a prediction model as output. Such models are normally initialized with random values. To train the model, historical data from recorded cycles are extracted. The training process proceeds by repeatedly performing model-specific calculations based on the difference between the intermediate predictions and the measured ground-truth values.
The model is then adjusted gradually (i.e., fine-tuning its parameters) until it converges and can be used to provide estimations for new SoH data.
Based on insight into the model training process, one can classify data-driven methods into two major categories: (i) classical machine learning methods, which include statistical approaches and other shallow learning methods; and (ii) deep learning methods, which were developed recently based on the advancement of this field.

A. STATISTICAL AND SHALLOW LEARNING APPROACHES
Generally, typical approaches for SoH estimation attempt to fit the degradation law of capacity through machine learning methods based on LIB degradation data, and estimate the SoH by extrapolating the capacity to the next cycle.
Statistical methods can be applied throughout the SoH prediction process. On the one hand, these methods are effective for extracting and reproducing useful signals and waveforms from noisy data. Hence, statistical approaches can be adopted to perform noise filtering, obtaining clean waveform data on which models are generated for further prediction. On the other hand, statistical methods can be used directly for SoH prediction as a machine learning approach. It should also be noted that there is currently no clear distinction between statistical and machine learning methods in the literature. However, statistical and classical/conventional machine learning methods, e.g., Naive Bayes classification or Support Vector Machines, share the following common points: (i) they produce output from an analysis of training/input data (the learning process); and (ii) their learning processes do not generate a large number of hidden (or latent) features as deep learning models do, and therefore they also do not use as many trainable parameters as deep learning models. Hence, we regard them as the same group of shallow learning.
Commonly used statistical filtering methods mainly include Kalman filtering (KF) [21], particle filtering (PF) [22], and the Single Particle model [23]. To achieve higher accuracy and robustness, an Adaptive Hybrid model [24] and Constraint Propagation for LIB data [25] were proposed for the optimal enhancement of design-parameter identifiability and state observability. Some approaches in this family take the relaxation effect into account; one such approach is proposed in [17].
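As a minimal sketch of the filtering idea, a scalar Kalman filter with a random-walk state model can smooth a noisy capacity series before further modeling. The noise variances `q` and `r` below are illustrative placeholders, not values from the cited works:

```python
def kalman_smooth(measurements, q=1e-4, r=1e-2):
    """Scalar Kalman filter assuming capacity follows a random walk.

    q: process-noise variance, r: measurement-noise variance
    (both illustrative, to be tuned on real data).
    """
    x, p = measurements[0], 1.0   # initial state estimate and its variance
    filtered = [x]
    for z in measurements[1:]:
        p = p + q                 # predict: capacity ~ previous capacity
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # correct with the new measurement
        p = (1.0 - k) * p
        filtered.append(x)
    return filtered
```

Each filtered value is a convex combination of the prediction and the new measurement, so the output tracks the degradation trend while damping measurement noise.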
After the noise filtering steps, some process modeling methods may be further applied using probability theory and stochastic processes to analyze the variation law of historical monitoring data [6], [26]. This type of method often estimates the SoH by establishing a stochastic degradation model of battery capacity [26]. Using stochastic process modeling methods, Xu et al. [6] also introduced a method based on the Wiener Process with modeling the relaxation effect.
Recently, many studies have used classical machine learning methods (known as shallow learning) for SoH estimation. Examples include the Bayesian predictive model proposed by Hu et al. [27], the Hidden Markov model by Piao et al. [28], Gaussian process regression by Liu et al. [29], and the relevance vector machine [30]. The internal computations of these shallow learning approaches are still based on statistics, but their fine-tuning processes are performed step-wise until final convergence is reached.
Even though some remarkable results have been reported, shallow learning methods encounter a common issue when dealing with massive datasets, not only for LIB data but in all other major domains: they can hardly take full advantage of large-volume, high-dimensional information. To be precise, it is commonly observed that shallow learning methods saturate after handling a certain volume of training data; beyond this point, the models hardly improve even when more training data are provided. In addition, recent surveys across various domains show that shallow learning models hardly scale, due to overfitting [31], [32]. With the advancement of deep learning methods, deep learning models are expected to be capable of working with far more training data before reaching their saturation points, and thus of potentially achieving better performance [33]. Hence, many recent works have been reported in this direction, which are discussed subsequently.

B. DEEP LEARNING METHODS
Deep learning (DL) is a machine learning concept based on Artificial Neural Networks (ANN) and other advanced variants, such as RNN, CNN, GNN, etc., that typically consist of more than one hidden layer, organized in deeply nested network architectures [34].
An advantage of deep learning models is that they are particularly more helpful than shallow learning in domains with large and high-dimensional data [34]. With recent advancements of deep learning, many researchers have also considered leveraging these models in data-driven approaches to SoH estimation. Khumprom and Yodo [14] presented a method using deep neural networks to predict the SoH and the RUL of the lithium-ion battery. Xia and Abu Qahouq [35] proposed an adaptive SoH estimation method utilizing a Feedforward Neural Network (FNN) and online AC complex impedance. She et al. [36] proposed a method for battery aging assessment using a Radial Basis Function (RBF) neural network. Eddahech et al. [37] showed that Recurrent Neural Networks (RNN) can be used to predict performance decline in batteries. Tian et al. [38] proposed a deep neural network to estimate the charging curves of batteries, from which SoH can be computed. Shen et al. [39] proposed a method to predict the usable capacity of a battery using a deep Convolutional Neural Network (CNN) with current and voltage measurement data.
Perhaps the most noteworthy approach in this direction is the use of recurrent networks like LSTM [13], [40], because the nature of these networks makes them very suitable for handling sequence or time-series data. Thanks to this time-series processing capability, LSTM-based approaches have proven to be among the most effective methods for SoH prediction.
Traditionally, in LSTM-based approaches, the previous information of the battery capacity over some recent cycles is used to predict the corresponding capacity in the next cycles. However, additional information from other channels, such as voltage, current, and temperature, can be used complementarily for an insightful analysis of the battery status. Thus, the collected channel-wise data of a battery become a multivariate time series [13]. In [13], a multi-channel charging model is proposed for the prediction task. The paper also incorporates the relationship between capacity and charging profiles, achieved by trainable neural networks. Similarly, other partial data are used in [41] through statistical feature selection, engaging both charge and discharge cycles. These deep learning studies provide an alternative approach that can potentially improve the accuracy of SoH prediction, since deep learning may scale better with large datasets, as previously discussed. Deep learning should thus enjoy further benefits in the future, as more data are gathered and computational infrastructure is upgraded to provide more processing power.
In particular, the work of Park et al. [13] introduced an approach that combines multi-channel charging profiles and an LSTM model to improve the accuracy of the SoH prediction task. So far, this work is considered the state-of-the-art result in this area and is also adopted as the baseline of this paper, hereafter referred to as the BaseLSTM model.
It is also noted that recent studies in this area have introduced some advanced DL models. In [42], the authors introduced a seq2seq model which can foresee the SoH for one cycle. The seq2seq (or sequence-to-sequence) is a DL model comprising an encoder and a decoder, each of which is a sequence-based deep learning architecture. Normally, the encoder encodes the input as a context vector, based on which the decoder infers the corresponding output. In [43], a stacked CNN model was presented for the aging prediction of lithium-ion batteries. However, neither of them considered the contribution of the relaxation effect, which is the focus of our work.

IV. SEQUENCE-BASED DEEP LEARNING MODELS

A. RECURRENT NEURAL NETWORKS (RNN)
Recurrent Neural Network (RNN) is a deep learning model commonly used for solving time-series problems and handling other kinds of sequence data [44]. In an RNN model, past information is preserved through the hidden layer and then passed through each cell of the RNN. The RNN is considered one of the most common and earliest models for sequence-based prediction. Depending on the number of inputs and outputs, RNN is classified into different types, such as one-to-one, one-to-many, many-to-one, and many-to-many.
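The recurrence just described, in which the hidden state carries past information from step to step, can be sketched in a few lines of NumPy. The dimensions and random weights below are toy values chosen for illustration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman-RNN step: the new hidden state mixes the current input
    with the previous hidden state, preserving past information."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions: 3 input features (e.g. a voltage/current/temperature triple),
# 4 hidden units, unrolled over a 5-step sequence
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))
W_hh = rng.normal(size=(4, 4))
b_h = np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

After the loop, `h` summarizes the whole sequence; a many-to-one RNN would feed this final state to an output layer.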
RNN has various practical applications such as speech-to-text, sentiment classification, machine translation, video recognition, etc. However, RNN performance is limited by several constraints, namely sequential and slow training, vanishing and exploding gradients, and difficulties with long-sequence processing. Therefore, a successor of RNN, the LSTM, was introduced to address these limitations.

B. LONG SHORT-TERM MEMORY (LSTM)
Long Short-Term Memory (LSTM) is an upgraded version of RNN. In the standard LSTM model, information processing is more elaborate: modules containing computational blocks are repeated over many timesteps and selectively interact with each other to determine what information is added or removed, by means of a memory cell. This process is controlled by three gates, namely the input gate i, output gate o, and forget gate f. A cell of an LSTM is depicted in Figure 4. Firstly, the forget gate determines the level of information rejection in the cell:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

Secondly, the input gate decides how the current memory cell is updated:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

Thirdly, the cell state C_t of the current memory cell is calculated:

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t

Finally, the output gate and hidden state of the current memory cell are computed:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)

In the above equations, i, f, o, C, and h denote the input gate, forget gate, output gate, internal (cell) state, and hidden state, respectively; W_i, W_f, W_o, W_C and b_i, b_f, b_o, b_C represent the weights and biases of the three gates and the cell, in that order; σ denotes the sigmoid function and ⊙ element-wise multiplication. Concretely, the sigmoid activation function helps an LSTM model control the flow of information because its range varies from 0 to 1: if the value is 0, all information is cut off; if it is 1, the entire flow of information passes through. Similarly, the output gate reveals information appropriately via its sigmoid activation, and the hidden state is obtained as the element-wise product of the output gate and the internal state passed through the tanh nonlinearity. With the memory cell and its three gates (input, forget, and output) as the pivotal component, LSTM overcomes the limitations of RNN by enhancing the ability to remember values over arbitrary time intervals through regulating the flow of information inside the memory cell.
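A minimal NumPy sketch of one LSTM step follows the gate equations above. Stacking the four gate blocks into a single weight matrix, as done here, is an implementation choice of this sketch, not of the original works:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the stacked pre-activations
    of the forget, input, candidate, and output blocks (4H rows)."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:H])                 # forget gate f_t
    i = sigmoid(z[H:2 * H])            # input gate i_t
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])             # output gate o_t
    c = f * c_prev + i * c_tilde       # new cell state C_t
    h = o * np.tanh(c)                 # new hidden state h_t
    return h, c

# Toy run: hidden size 4, input size 3 (e.g. one sampled V/I/T triple)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 7)), np.zeros(16)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the gates are sigmoid-bounded and the output passes through tanh, every component of `h` stays strictly inside (−1, 1), which is what lets the memory cell regulate information flow stably over long intervals.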

C. BIDIRECTIONAL LONG SHORT-TERM MEMORY
Bidirectional Long Short-Term Memory (Bi-LSTM) is an upgraded version of LSTM. Bi-LSTM is proven especially helpful when the inputs are sequences that can be processed in both forward and backward directions to infer more information [45]. In a Bi-LSTM network, there are two unidirectional LSTMs in opposite directions, as shown in Figure 5. Information flows both forward and backward to enrich the information encoded at each step and better capture the context. The model combines input data from the past and the future to produce the output of the current time frame. First, the input goes through a forward LSTM layer:

h_f_t = LSTM_forward(x_t, h_f_{t−1})

Then the same input goes through the backward LSTM layer:

h_b_t = LSTM_backward(x_t, h_b_{t+1})

Finally, concatenating these outputs gives the final output:

h_t = [h_f_t ; h_b_t]

In the above equations, h_f_t and h_b_t are the outputs of the forward and backward hidden layers for the input at time t. Bi-LSTM is useful for sentiment analysis, PoS tagging, and text classification problems [46].
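The forward/backward pass and concatenation can be sketched generically. For brevity, the sketch below plugs in a plain tanh recurrence as a stand-in for the two LSTM directions; a real Bi-LSTM would substitute LSTM cells with their own parameters:

```python
import numpy as np

def bidirectional(seq, step, h0):
    """Run `step` over `seq` forward and backward, then concatenate the
    hidden states at each position: h_t = [h_f_t ; h_b_t]."""
    fwd, h = [], h0
    for x in seq:                      # forward direction
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(seq):            # backward direction
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                      # align backward states with positions
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

# Stand-in recurrence with toy dimensions (3 inputs, 4 hidden units)
rng = np.random.default_rng(1)
W_x, W_h = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
step = lambda x, h: np.tanh(x @ W_x + h @ W_h)
out = bidirectional(list(rng.normal(size=(5, 3))), step, np.zeros(4))
```

Each output position now has twice the hidden width, since it encodes context from both the past and the future of the sequence.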

V. AUTOREGRESSION NESTED SEQUENCE MODEL
This section introduces a new architecture to process both the multi-channel charging profile and the rest-period information of LIB. The architecture, named the Autoregression Nested Sequence model (ARNS), combines a model handling nested sequences with an Autoregression model. Figure 6 gives the overall architecture of the ARNS model. Generally, the processing flow in the architecture comprises two major branches, as follows.

A. OVERVIEW ARCHITECTURE OF ARNS
• Linear Branch employs the Autoregression model to learn information about the trend of the sequence, i.e., to predict whether the trend will ascend or descend in the near future with an appropriate gradient.
• Nonlinear Branch is used to handle spike, or peak, prediction. Even though some works have reported using linear approaches to effectively predict this information on some datasets [47], we believe that a nonlinear approach suits general datasets. This branch is realized by incorporating relaxation information into a nested sequence model, which consists of 2 Bi-LSTM-based sequence networks, called the local sequence network and the global sequence network.

1) Local sequence network is intended to process channel-wise information, which is internal information of the battery occurring within a charge cycle.
2) Global sequence network, in the meantime, aims to handle intra-cycle information, which generally includes the output of the local sequence network and the cycle-wise information of the battery.
• In addition, our ARNS model also introduces Final Dense Layers that combine the outputs of the Linear Branch and the Nonlinear Branch.
We also note that the details of the above-mentioned channel-wise information and cycle-wise information will be given in the subsequent sections.
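To illustrate the idea behind the Linear Branch, a least-squares autoregression of order p over past capacities captures the monotone trend cheaply. This is a generic AR sketch, not the exact formulation used inside ARNS:

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares AR(p) with bias: c_hat[k] = w . c[k-p:k] + b."""
    X = np.array([series[k - order:k] for k in range(order, len(series))])
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column
    y = np.array(series[order:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ar_predict(series, coef, order):
    """One-step-ahead trend prediction from the last `order` capacities."""
    x = np.append(series[-order:], 1.0)
    return float(x @ coef)

# A linearly fading capacity (2.0 Ah, losing 1 mAh per cycle) is tracked
# almost exactly by the AR branch
fade = [2.0 - 0.001 * k for k in range(100)]
coef = fit_ar(fade, order=4)
pred = ar_predict(fade, coef, 4)   # close to 1.900
```

Because this branch is a single linear layer over a short history, it adds very little computation compared with an extra LSTM layer, which matches the motivation stated above.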

B. FEATURE EXTRACTION FROM BATTERY DATA
In the common working mode, the LIB in the NASA [7] and CALCE [8] datasets stretch over approximately 1-3 years, equivalent to 100-2000 charge and discharge cycles throughout the working lifespan. During each cycle, LIB internal channels such as current, voltage, temperature, or internal resistance are measured, forming the multi-channel charging profile information. It is also noted that battery capacity can be quantified in the discharge stage by measuring stored coulombs [48]. Data collected during these cycles are adopted for the later task of battery SoH prediction. We use this information to extract features of the battery data for further processing. The battery data can be classified into two types as follows.
• Channel-wise data: they are data collected from the above-mentioned multi-channel charging profile of the battery, represented as a channel-wise feature vector.
• Cycle-wise data: they are useful aggregated data of the battery in a cycle, represented as a cycle-wise feature vector.

1) CHANNEL-WISE DATA
Channel-wise data are extracted from the multi-channel charging profile of the battery. Figure 7 shows the charging profiles of voltage, current, and temperature for different states-of-health in the NASA dataset (90%, 80%, and 70%) [13]. Clearly, the charging profiles of voltage, current, and temperature change with each SoH value. It should also be noted that the data come from an identical source, which may have been pre-processed for outlier and noise removal; such datasets can thus be used for research like ours. Under real-world conditions, this approach might not hold, because cycles may be incomplete and the relationship between input and output proposed in this paper may change. Hence, our approach works with the current setting of the aforementioned existing datasets.
However, as the multi-channel information of the charging profiles is continuous, we perform sampling to extract the channel-wise feature vectors. As channel-wise data are only used as the input of our Local Sequence Network in the nonlinear branch, we discuss the sampling process and the channel-wise feature vectors in this section. The sampled channel-wise data D^chan_k are represented as three vectors of length S, as described in Eq. (11):

D^chan_k = { (V^s_k, I^s_k, T^s_k or R^s_k) | s = 1, ..., S }  (11)

where V^s_k and I^s_k are the s-th sample points of voltage and current at the k-th cycle, respectively; T^s_k or R^s_k is the corresponding temperature (for the NASA dataset) or internal resistance (for the CALCE dataset); and S is the number of sample points.
It is noted that when developing predictive models on the CALCE dataset, we also use internal resistance as an input parameter, as proposed in studies such as [49] and [50]. In this paper, we do not use the current cycle's internal resistance as input. Instead, the values from previous cycles are used for SoH prediction of the current cycle, similar to the way the SoH of previous cycles is used as part of the input to infer the SoH of the current cycle [13].

2) CYCLE-WISE DATA
Apart from the timeline-based information collected from each channel, aggregated voltage information has also proved useful for SoH prediction. As discussed in [51], during the CC-CV process there are 5 significant cycle-wise features, illustrated in Figure 8. However, as confirmed in our experiments, we found that x_1 and x_4, the voltages measured at the starting and ending periods of the cycle, are not stable and often noisy. Moreover, x_2 is found to be correlated with the SoH of the current cycle. Thus, we exclude x_1, x_2, and x_4 from our ultimate cycle-wise feature vectors.
Additionally, besides the SoH at the current cycle C_k, in order to take the relaxation effect into account in our cycle-wise feature vectors, we also introduce a new feature T_rest, which represents the LIB rest period before each charging cycle. The rest period of a battery at the k-th cycle (T^rest_k) is calculated by Eq. (12):

T^rest_k = t^s_k − t^s_{k−1}  (12)

where t^s_k is the calendar timestamp of the k-th cycle.
This is how rest-period information is incorporated into the overall architecture, as mentioned again in the next section. In addition, we also use P, a boolean peak indicator, to represent the peak status of the battery capacity. P is calculated by comparing the current capacity value with the mean of the five previous battery capacities, to decide whether the current capacity is a peak.
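The two cycle-wise indicators just introduced can be computed directly from calendar timestamps and the capacity history. The helper names and the first-cycle convention (a rest period of 0) are choices of this sketch:

```python
def rest_periods(timestamps):
    """T_rest[k] = calendar gap between cycle k and cycle k-1 (Eq. 12);
    the first cycle has no predecessor, so it gets 0 by convention."""
    return [0.0] + [t - s for s, t in zip(timestamps, timestamps[1:])]

def peak_flags(capacity, window=5):
    """P[k] = 1 if capacity[k] exceeds the mean of the previous `window`
    capacities, marking a relaxation-induced regeneration peak."""
    flags = [0] * len(capacity)
    for k in range(window, len(capacity)):
        if capacity[k] > sum(capacity[k - window:k]) / window:
            flags[k] = 1
    return flags
```

For example, a long idle gap in the timestamps produces a large T_rest entry, and a capacity value jumping above its recent mean raises the corresponding P flag.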
Thus, our cycle-wise feature vector of the k-th cycle (D_cycle_k) can ultimately be represented as a 5-tuple vector, as described in Eq. (13).
Then we use a sliding window of size L (from k + 1 to k + L) to predict the battery capacity at cycle k + L + J, where J is the prediction interval. To predict the battery capacity accurately, the rest time of the (k + L + J)-th cycle is also considered. Because the starting time of the predicted charge cycle is difficult to obtain in a real application, we choose a prediction interval of 1 (J = 1). Ultimately, as shown in Eq. 14 and Eq. 15, our model takes as input a sequence of battery data, each element represented by a tuple of channel-wise and cycle-wise data, along with the rest time of the predicted cycle.

1) LOCAL SEQUENCE NETWORK
As discussed, we use the Local Sequence Network to handle channel-wise data. It is implemented as a Bi-LSTM network, each unit of which processes the channel-wise data of one cycle, represented as a channel-wise feature vector. As previously mentioned, raw channel-wise data is continuous information collected from three channels: voltage, current, and temperature. To build a channel-wise feature vector, we sample from those data. Our sampling points are derived from the work of [52], which introduced a sampling method of S points that represent the voltage curve during the CC process of the charging cycle. In this paper we choose S = 11; the details of the sampling method are given in Table 3 and further illustrated in Figure 10.
As observed in Figure 10, there exists a non-sampling interval: in actual use the battery is usually not fully discharged when charging begins, so we leave sampling headroom over the first 0-30% of the charging process. The closer to the end of the CC charging process, the faster the voltage curve changes, so its significance for that charging cycle is greater. Thus, we place more sampling points toward the end of the CC process (i.e., at 35%, 52%, 70%, 88%, 90%, 92%, 94%, 96%, 98%, 99%, and 100% of the overall CC charge phase). At each sampling point we record not only the voltage but also the corresponding current and temperature (for the NASA dataset) or internal resistance (for the CALCE dataset).
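The sampling scheme above can be sketched as follows; the index rounding is an assumption of this sketch, since the exact mapping from percentages to time steps follows Table 3:

```python
# Sampling positions as fractions of the CC charge phase, denser toward
# the end where the voltage curve changes fastest (S = 11 points).
SAMPLE_FRACTIONS = [0.35, 0.52, 0.70, 0.88, 0.90, 0.92,
                    0.94, 0.96, 0.98, 0.99, 1.00]

def sample_channel(series, fractions=SAMPLE_FRACTIONS):
    """Pick values from a CC-phase time series at the given fractional
    positions (0 = start of the CC phase, 1 = end)."""
    n = len(series)
    return [series[min(n - 1, round(f * (n - 1)))] for f in fractions]
```

Applying this to the voltage, current, and temperature (or internal resistance) series of one cycle yields the S = 11 sample points per channel.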
Thus, the Local Sequence Network comprises 11 Bi-LSTM units, each corresponding to a sampling point. Finally, the output vector of the last of the 11 Bi-LSTM units, from now on referred to as the charge profile features for convenience, is taken as the output of the Local Sequence Network.

2) GLOBAL SEQUENCE NETWORK
After the charge profile features vector has been extracted and learned, we concatenate it with the cycle-wise features vector, which includes the battery capacity data and the rest time of the charge cycle, to create a single flattened vector as the input to our Global Sequence Network. We then use a Bi-LSTM many-to-one model, similar to the Local Sequence Network, to model the contribution of external changes (charging profile and rest time) to the non-linear changes of the battery capacity. As demonstrated in Figure 6, the input of our Global Sequence Network consists of L Bi-LSTM units, each corresponding to a charge cycle features vector, and the output of the last Bi-LSTM unit is selected as the output of our nested sequence model, from now on referred to as C_non-linear.
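The nested local/global aggregation can be sketched shape-wise as below, substituting a plain tanh RNN cell for the Bi-LSTM units for brevity; all dimensions and weights are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_last(seq, w_in, w_h):
    """Many-to-one recurrence: feed the sequence through a tanh RNN cell
    (stand-in for a Bi-LSTM) and return only the final hidden state."""
    h = np.zeros(w_h.shape[0])
    for step in seq:
        h = np.tanh(step @ w_in + h @ w_h)
    return h

# Illustrative shapes: L = 4 cycles per window, S = 11 sample points per
# cycle, 3 channels, hidden size H = 8, and 5 cycle-wise features.
L, S, C, H = 4, 11, 3, 8
cycles_channel = rng.normal(size=(L, S, C))    # channel-wise data
cycles_features = rng.normal(size=(L, 5))      # cycle-wise 5-tuples

w_in_local = 0.1 * rng.normal(size=(C, H))
w_h_local = 0.1 * rng.normal(size=(H, H))
w_in_global = 0.1 * rng.normal(size=(H + 5, H))
w_h_global = 0.1 * rng.normal(size=(H, H))

# Local network: one charge-profile feature vector per cycle ...
profiles = np.stack([rnn_last(c, w_in_local, w_h_local)
                     for c in cycles_channel])
# ... concatenated with the cycle-wise features and fed to the global
# network, whose last hidden state plays the role of C_non-linear.
c_non_linear = rnn_last(np.concatenate([profiles, cycles_features], axis=1),
                        w_in_global, w_h_global)
```

The point of the nesting is visible in the shapes: the local pass collapses each (S, 3) cycle to an H-vector, and the global pass collapses the L-cycle window to a single representation.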

D. LINEAR BRANCH
In theory, the nested sequence model is capable of learning linear trends. However, since our nested sequence model is fairly complex, learning these trends can be time-consuming. To train our model better, we therefore use a simple autoregression (AR) model, a linear model that uses observations from previous time steps as input to a regression equation to predict the value at the next step. The inputs of this branch are the capacity data of L consecutive charge cycles (as in the non-linear branch), and the output of this branch is a scalar value, from now on referred to as C_linear (Eq. 16).

FIGURE 11. The cross-validation method to separate training, validation, and testing sets for the NASA dataset.
where φ_1, ..., φ_L are the parameters of the model, C_{k+i} is the capacity measured at cycle k + i, and C_linear_{k+L+J} is the predicted capacity with a prediction interval of J.
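A minimal sketch of the linear branch, assuming Eq. (16) is a plain weighted sum of the L window capacities without an intercept term:

```python
def ar_predict(capacities, phi):
    """Linear branch (assumed form of Eq. (16)):
    C_linear = sum_i phi_i * C_{k+i} over the L-cycle window."""
    assert len(capacities) == len(phi)
    return sum(p * c for p, c in zip(phi, capacities))
```

In practice the φ_i would be fitted jointly with the rest of the model; fixed values are used here purely for illustration.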

E. FINAL DENSE LAYERS
Finally, we concatenate the linear change (C_linear_{k+L+J}), the non-linear change (C_non-linear_{k+L+J}), and the rest time of the predicted cycle (T_rest_{k+L+J}) to create a 3 × 1 vector as the input to an artificial neural network consisting of three dense layers with the tanh activation function in the hidden layers. The structure of those three layers is similar to a common multilayer artificial neural network architecture, and each node value is described by Eq. (17), where:
• Z_i denotes the value of a node in the i-th dense layer.
• W_i^k are the weights corresponding to that node.
• X_i^k are the independent variables, i.e., the inputs.

This network aims to combine the linear and non-linear changes to estimate the capacity at the next step. The network also considers the rest time T_rest of that step to produce a more accurate result.
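A shape-level sketch of the final dense head, with hidden widths chosen arbitrarily for illustration (the paper's exact layer sizes are given in Table 10):

```python
import numpy as np

def dense(x, w, b, activation=np.tanh):
    """One dense layer: Z = activation(W x + b), per Eq. (17)."""
    return activation(w @ x + b)

rng = np.random.default_rng(1)
# 3 x 1 input vector: [C_linear, C_non-linear, T_rest] of the predicted cycle.
x = np.array([1.80, 1.78, 0.25])
h1 = dense(x, 0.1 * rng.normal(size=(8, 3)), np.zeros(8))       # hidden, tanh
h2 = dense(h1, 0.1 * rng.normal(size=(8, 8)), np.zeros(8))      # hidden, tanh
capacity = dense(h2, 0.1 * rng.normal(size=(1, 8)), np.zeros(1),
                 activation=lambda z: z)  # linear output for regression
```

The random weights stand in for trained parameters; only the shapes and the tanh-hidden/linear-output pattern reflect the description above.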

A. DATASETS
To verify the proposed methods, we use two datasets: NASA [7] and CALCE [8]. The first dataset was gathered in accelerated tests under controlled laboratory conditions with regular cycling patterns. The second dataset likewise comes from laboratory tests with regular cycling patterns. Nevertheless, both datasets exhibit the varied, natural characteristics of lithium-ion data distributions. In particular, both contain peaking movements in SoH degradation, which create challenges for prediction.
The specifications of the training and test sets are presented in Table 4.

B. DATA PREPROCESSING
Before training the model, data cleaning is performed using the same steps as in [13]. To obtain the capacity data after each charging phase, the datasets are sorted chronologically, and for each charging phase we select the capacity measured in the immediately following discharge phase.
Any missing data point is replaced with the average of the previous n (n = 5) values, for both charge and discharge data.
After data cleansing, we performed standard feature normalization so that every model feature has the same range and gradient descent converges faster. In this research, we used min-max normalization as described by Eq. 18. The normalization is performed before splitting the data into training, validation, and test sets.
where x is the collection of selected features from the datasets and x_k^s is the s-th sampled feature of the k-th charge cycle. The final capacity results are denormalized for an accurate and fair assessment and are presented in Ampere-hours (Ah).
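Min-max normalization (Eq. 18) and the corresponding denormalization can be sketched as:

```python
def min_max_normalize(values):
    """Min-max normalization (Eq. 18): scale a feature to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(scaled, lo, hi):
    """Map normalized predictions back to the original scale (Ah)."""
    return [s * (hi - lo) + lo for s in scaled]
```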

C. OUTLIER REMOVAL
To stabilize the input data before feeding it into the models, we use a Z-score [53] outlier removal technique. First, we calculate the Z-score as Z = (x − µ) / σ, where µ and σ are the mean and standard deviation of the population. Second, we compare the Z-score with the threshold range (−threshold, +threshold) to decide whether a data point is an outlier.
In our experiments, the thresholds Z_NASA = 3.0 and Z_CALCE = 2.5 are used for the NASA and CALCE datasets, respectively. Statistically, approximately 5% of the data points are excluded from the datasets. This setting produces stable output in our experiments.
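The two steps above amount to the following filter; this sketch uses the population standard deviation, matching the Z-score formula:

```python
import statistics

def remove_outliers(values, threshold):
    """Drop points whose Z-score |x - mu| / sigma exceeds the threshold
    (3.0 for NASA, 2.5 for CALCE in the setting above)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) / sigma <= threshold]
```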

D. EXPERIMENT METHOD
To ensure a fair result, we adopted the experiment method of the baseline paper [13], in which each battery's data is used for training, validation, and testing separately. For example, with the NASA dataset, when battery #5 is in the test set, the other batteries (#6, #7, and #18) are used for training and validation. Similarly, for the CALCE dataset, when battery CS2_35 (#35) is in the test set, the other batteries CS2_36, CS2_37, and CS2_38 (#36, #37, and #38) are used for training and validation. This process is repeated for each of the four batteries. The specification of each training, validation, and testing set is presented in Figure 11, and the number of samples for each testing scenario is summarized in Table 4. We apply the same separation to the CALCE dataset in cross-validation.
For the performance metric, we mainly use the root mean square error (RMSE), while the mean absolute percentage error (MAPE) is also considered:

RMSE = sqrt( (1/K) Σ_{k=1}^{K} (C_k − Ĉ_k)^2 )   (20)

MAPE = (100/K) Σ_{k=1}^{K} |C_k − Ĉ_k| / C_k   (21)

where C_k is the ground-truth capacity, Ĉ_k is the estimated capacity of the k-th cycle, and K is the number of cycles.
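Both metrics can be computed directly from their definitions:

```python
import math

def rmse(truth, pred):
    """Root mean square error over K cycles."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(truth, pred))
                     / len(truth))

def mape(truth, pred):
    """Mean absolute percentage error (in %) over K cycles."""
    return 100.0 * sum(abs((c - e) / c)
                       for c, e in zip(truth, pred)) / len(truth)
```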

E. SoH ESTIMATION
In this section, we compare the proposed model with the Multi-Channel LSTM (MC-LSTM) proposed in [13] with a prediction interval of 30 cycles. We also include our test result using the same MC-LSTM as described in [13] with the same prediction interval as our model (1 cycle), as done in [54]. In all four test sets, we use the last 30 cycles to evaluate the models, as in [13]. Table 5 shows that our models consistently outperform the baseline on almost every battery test set, especially on batteries #5 and #18, where capacity regeneration happens irregularly. Furthermore, inspection shows that our model performs notably better at the data points where the battery's capacity spikes or increases. Moreover, continuing on the NASA dataset, Table 6 compares the accuracy performance over all battery cycles. In this table, we separate peak and non-peak testing sets to test our hypothesis. It is evident that the ARNS architecture yields losses on non-peak sections similar to methods such as Gaussian Fitting (GF), Multi-Layer Perceptron (MLP), Support Vector Regression (SVR), and Multi-Channel LSTM [13] (MC-LSTM), but achieves stable improvements on their peak counterparts. Table 7 reports the average accuracy of a list of conventional methods (GF, MLP) as well as the MC-LSTM [54] and ARNS. Table 7a averages this evaluation over all four batteries of the NASA dataset. Similarly, Table 7b describes our experiment on the CALCE dataset, which was recorded at ambient temperature over more than 800 cycles of the four batteries #35, #36, #37, and #38. Table 7b shows that ARNS also stably outperforms the above-mentioned methods in all settings.

F. COMPARISON WITH OTHER NESTED RNN-BASED MODELS
A natural question is whether the Bi-LSTM networks in our ARNS model can be replaced by other RNN-based models. In this section, we therefore compare the proposed ARNS model using various sequential networks, namely RNN, LSTM, and GRU, in the local and global sequence networks. The structure and hyperparameters are kept unchanged from those used in the ARNS model with Bi-LSTM. We ran the experiment on the four batteries. The results are summarized in Table 8 and further described in Figure 14.
As can be seen in Table 8 and Table 9, the Bi-LSTM model achieves the most accurate performance overall. Being the most stable, Bi-LSTM ranks first and second on the test sets of batteries #5 and #18. In particular, for battery #5, the model with Bi-LSTM networks outperforms the other methods by a significant margin. The ARNS models using RNN and Gated Recurrent Unit (GRU) in turn lead on the test sets of batteries #6 and #7. In Table 9, the averages show that ARNS with Bi-LSTM provides the best accuracy over all four batteries of the NASA dataset. Table 9 (parameters column) and Figure 16 show the number of parameters and the estimation error (RMSE) of our ARNS model using different sequence networks. The estimation error is obtained by averaging the RMSE and MAPE (Eq. 20 and Eq. 21) over all four NASA batteries. In addition, we include the test result of the MC-LSTM published by Park in [54], which uses the same calculation methods and prediction interval. As seen, our models outperform the baseline Multi-Channel LSTM (MC-LSTM) model on both metrics while also being less complex.

G. MODEL COMPLEXITY AND RUL PREDICTION
This improvement confirms the efficiency of our contributions mentioned above: the local and global sequence networks for data extraction and the T_rest data encoding. However, since RNN-based models have to be trained sequentially, the training process of our ARNS models also follows a similar sequential manner. In addition, our model uses the rest time feature of the predicted cycle (k + L + J), which prevents immediate SoH estimation when the charging cycle begins. Thus, predicting the Remaining Useful Life (RUL) with this approach is challenging, because it requires rest time data for every charging cycle over the remaining battery lifetime.
The final configuration after training is summarized in Table 10. Model tuning via grid search is used to find the most suitable combination of hyperparameters. The Adam optimizer [55] and a learning rate α = 0.001 are used in both settings (NASA and CALCE datasets). Figure 12 visualizes the data depicted in Table 5, showing the predictions for the last 30 cycles of the NASA dataset; Figure 13 shows the same information for the last 30 cycles of the CALCE dataset. As seen, ARNS follows the true data closely in most cases. From another view, Figure 14 and Figure 15 visualize the predictions for all cycles, while Table 6 (NASA) and Table 7b (CALCE) summarize the quantitative output of these figures, which clearly shows the significant improvement of our method over the baseline. Figure 16 compares various models (including ours) on two metrics: accuracy and model complexity. Obviously, the lower the model complexity (i.e., the fewer parameters), the better. Other metrics, such as computational time or required memory capacity, could also be considered; however, more parameters also require more training time and consume more memory, so comparing the number of parameters is reasonably sufficient for evaluating model complexity. In Figure 16, the x-axis refers to the number of parameters and the y-axis to the accuracy. On the x-axis, ARNS-RNN is the only model with fewer than 500 parameters, while all other models require more. The MC-LSTM has more than 2500 parameters and is the most complex model in this set. On the y-axis, ARNS-Bi-LSTM achieves the best RMSE loss, less than 0.019 on average, while all other models incur higher RMSE losses.

VII. CONCLUSION
This paper has introduced an improvement in LIB SoH prediction in terms of accuracy. The improvements are especially visible when the prediction is carried out in peak-centered cycles. Such improvements come from leveraging multi-channel information incorporated into a suitable deep learning model. In particular, we focus on exploiting the relaxation effect for peak prediction, which had not yet been implemented in an end-to-end model. Our two main contributions are: (1) we introduce a nested sequence learning model utilizing the battery's multi-channel charging profiles of voltage, current, and temperature; and (2) we extract a crucial feature, the LIB relaxation effect, by engaging the T_rest data, which plays a key role in the peak periods of the SoH. As a result, we achieved significant improvement compared to the baseline. Through experiments on NASA's LIB datasets, our model performed approximately 23.17% (RMSE) and 17.47% (MAPE) better than the baseline. On the CALCE dataset, we achieved improvements of 21.91% (RMSE) and 6.49% (MAPE), respectively, as reported in Table 7a and Table 7b. Especially when considering the estimation at capacity peaks, the ARNS loss reductions are (28.34%, 17.44%) and (27.67%, 11.74%) in (RMSE, MAPE) on the two reported datasets, which is stable and significant.
These promising results support the data-driven approach to deep learning, specifically for SoH estimation. Moreover, a better understanding of LIB data enables us to select the proper features and efficiently optimize the prediction model. We also note that, up to now, our experiments have only been conducted on existing common benchmarking datasets, as we have not yet had access to unseen battery datasets. This also opens room for transfer learning: we intend to fine-tune our current model by continuing to train it with new data from additional unseen batteries as we obtain them. This is a direction we would like to pursue in future work.
However, in contrast to the advantages of the method, it is worth mentioning the disadvantages of the DL approaches used in it. These models generally suffer from higher memory consumption and longer execution times, as also illustrated in Figure 16. Moreover, when the dataset is relatively small, DL models hardly outperform other machine learning methods; they only prove their capability when dealing with suitably sized datasets, which vary across domains [56]. In our case, other works [13] showed that the NASA dataset sizes fit deep learning approaches and are thus suitable for our experiments.
A visible limitation lies in the splitting of the training and test sets, known as data leakage [57]. This problem was caused by applying min-max normalization before splitting the dataset, which may lead to over-optimistic results. However, since this separation is inherited from the baseline, we will address this limitation in future work. Moreover, we also plan to include further experiments on real-condition datasets, because the current approach may not work with real LIB data containing noise and incomplete cycles.