Transfer Learning for Adapting Battery State-of-Health Estimation from Laboratory to Field Operation

Accurate estimation of the state-of-health (SOH) of Lithium-ion (Li-ion) batteries will become increasingly important as Li-ion batteries become more integrated into daily life. As the reliance on Li-ion batteries increases, so does the need for battery pack size optimisation and the extension of battery lifetime. Data-driven methods for estimating the SOH of Li-ion batteries have been shown to perform well under laboratory conditions, but often fail to achieve similar performance in real-life applications. This is a consequence of the field data seldom matching the laboratory data, which is a necessary condition for most data-driven methods. Transfer learning is a method which aims to account for discrepancies between laboratory and field data. This paper shows how the transfer learning algorithm kernel mean matching can be used to transfer both multiple linear regression (MLR) and bootstrapped random vector functional link (BRVFL) models from the laboratory to the field. It is shown that these methods can achieve mean absolute percentage errors (MAPEs) smaller than 1% on laboratory and field data simultaneously.


I. INTRODUCTION
There has been an increase in the use of Lithium-ion (Li-ion) batteries in daily life, both through the deployment of more electric vehicles and in grid-connected residential energy storage systems. It is, therefore, in the best interest of manufacturers and end users to optimise the size of the battery packs and the lifetime of the battery from both an environmental and economic perspective. In order to achieve this goal, it is important to accurately ascertain the health of the battery at every moment during its operation. Nevertheless, accurately estimating the battery state-of-health (SOH) often requires extensive and expensive laboratory experiments, which can quickly become obsolete.
Estimation of the SOH of a battery usually falls into one of two categories: (1) physics-driven methods, or (2) data-driven methods. The physics-driven methods aim to model the internal states of the battery using the theory of physics, chemistry, and electrical circuits. Among the most popular approaches are the electrochemical models consisting of partial differential equations, which model the internal battery components and their interactions [1], [2], and equivalent electric circuit (EEC) models, which relate the battery current to the voltage through a series of simple circuit elements (e.g., resistors, capacitors, voltage sources, etc.) [3]-[7]. While these methods can be very accurate, they require very large full factorial experimental designs to determine the necessary parameters [8]. Full factorial experimental designs are necessary to account for all possible dependencies of the battery parameters (e.g., capacity, resistance, etc.) on factors such as temperature, state of charge (SOC), and level of degradation. As an alternative, complex Kalman filters have had some success in estimating the battery SOH [9]-[17]. However, due to their intrinsic properties, a single solution cannot always be guaranteed to exist, and even if recursive estimation is used to account for this potential problem, they require constant monitoring of the battery. In recent years, there has been an increasing interest in more data-driven methods, as they require little to no expert knowledge and usually depend only on the input and the output of the system. Among data-driven methods, three of the more popular choices are: (1) support vector machines (SVM) [18]-[21], (2) Gaussian process regression (GPR) [22]-[28], and (3) artificial neural networks (ANN) [29]-[33]. While these are black-box methods offering little to no insight into how or why the degradation occurs, they can achieve errors as small as 0.5%. However, a lot of data is required to ensure small estimation errors.
Therefore, alternative methods have been proposed which extract more relevant features from the raw measurements, and use simpler models like multiple linear regression (MLR) [34], random vector functional link (RVFL) neural networks [35]-[39], and extreme learning machines [40], [41]. These methods require much less data, while still achieving errors smaller than 2%. Lastly, it has been shown that incremental capacity (IC) can be related to the SOH by extracting relevant features from the IC curves and modelling the relationship between features and SOH using MLR. However, this requires the IC curves to be known. The IC curves can be found using data-driven methods like SVM and ANN [42]-[48], and methods based on the reconstructed IC curves can achieve errors as low as 0.5%, even in real-life applications [47]. Furthermore, very recent advancements have shown that a hybrid EEC and ANN approach could achieve errors smaller than 1% [48]. However, the performance comes at the cost of a more complicated feature extraction due to the IC curve reconstruction. A general disadvantage of the data-driven methods (including the hybrid methods) is that the laboratory data on which the models are trained needs to resemble the intended application. If the application deviates even slightly (dependent on the method) from the laboratory experiments, then the predictions of the model cannot be trusted. That is, if the usage pattern in the application changes, then the laboratory experiments need to be re-performed using this new pattern. An important question is: Can this be avoided? A possible solution is transductive transfer learning.
Transfer learning aims to reduce the amount of data recollection by accounting for the fact that the model is going to be used in a different context than where it was trained [49]. That is, when training the model, transfer learning tries to account for the differences between the features used to train the model, and the features observed in the application. While some researchers have considered transfer learning for battery SOH and remaining useful life (RUL) estimation [50], [51], they have focused on very complicated recurrent neural network models necessitating large training sets. Furthermore, the type of transfer learning used still requires knowledge of the SOH of the field operated batteries (in this context called the target domain). Therefore, the aim of this paper is to show that much simpler SOH models, built and tested using laboratory ageing experiments, can be transferred to field operated batteries without the need for SOH measurements in the target domain. This was achieved by transferring the models using a type of (transductive) transfer learning called kernel mean matching [52].
The remainder of the paper is structured as follows: First, the experimental setup and the results of the laboratory experiments are presented in Section II-A. Three strategies for extracting features are then presented in Section II-B. These features are used to estimate the SOH in Section II-C. Section II-D shows how the proposed models can be transferred from the domain in which they are trained (the laboratory) to the domain where they are to be applied (the field). The results of the transferred models can be found in Section III, and a discussion of the approach follows in Section IV. Lastly, while SOH can be measured on two fronts, capacity and power, the focus in this paper will be on capacity degradation. That is, from this point forward SOH estimation will refer to capacity estimation (though the ideas outlined in this paper also extend to SOH modelling in terms of power or resistance).

A. EXPERIMENTAL SETUP

1) Battery and forklift operation
In this work, Li-ion battery cells with a nominal capacity of 180 Ah and a nominal voltage of 3.3 V were considered. The cells are based on a graphite anode and a lithium iron phosphate cathode. Battery packs composed of these cells had been deployed in the field in three forklifts, which were placed in the back of trucks around Europe, used to move heavy pallets throughout the day, and charged every few days. A representative one-week operation profile for the three forklifts is presented in Figs. 1 and 2. The figures show that the operation of the forklifts leads to mostly short and shallow cycles. Furthermore, throughout the operation of these forklifts there have only been a few deep cycles (i.e. with a depth of discharge larger than 80%) followed by constant current charging, allowing for an approximation of the battery charging capacity in only very few cases. Fig. 3 shows the approximate forklift battery charging capacity against the full equivalent cycles (FECs). As one can observe in the figure, the amount of degradation experienced by the batteries in the forklifts is minimal: between 0.5 and 1% of degradation during the entire analysed operation period of approximately 17 months. Lastly, the operation of the battery allowed for the calculation of an approximate capacity at only four points in time for Forklift 2 and six for Forklift 3.

2) Laboratory Ageing Tests
Due to the nature of the usage of the batteries, with irregular deep discharges, creating a comprehensive battery degradation model would be difficult. Therefore, a total of six accelerated ageing tests were conducted: three concerning calendar ageing, and three concerning cycle ageing. For both calendar and cycle ageing, the batteries were aged at 35, 40, and 45 °C to capture the effect of temperature on the degradation. The batteries used to analyse the effects of calendar ageing were stored at 90% SOC, as this was the average SOC the forklifts were subjected to when they were in idling mode. Every two weeks a reference performance test was performed to measure the capacity of the batteries and to quantify their incremental degradation. Fig. 4 presents the capacity decrease of the cells during calendar ageing at the three ageing temperatures and 90% SOC. From these results, it is seen that the increase in the idling temperature from 35 °C to 45 °C does not have a large influence on the capacity fade behaviour of the cells (i.e., a maximum 5% difference after 15 months of idling between the considered temperatures).
The batteries used for cycle ageing were subjected to a load profile created using the first six months of battery operation in the forklifts. The profile was created by removing all idling periods (which account for more than 90% of the total operation) from the first six months of the battery operation in the forklifts, resulting in a profile of approximately 12 days. However, while the average current applied to the battery during forklift operation was 22 A, the current has peaks above 350 A; due to current limitations of the laboratory battery test station, the current had to be kept below 50 A. This creates a possible discrepancy between not only the currents of the ageing profile and the actual forklift profile, but also their SOCs. In order to overcome this issue (i.e., the SOC mismatch between the two profiles), whenever the current in the forklift profile exceeded 50 A (mainly during discharging), the ageing profile was limited to 50 A until the same SOC value was reached for both the forklift and the laboratory ageing profile. This ensured that the SOCs of the two profiles were identical. The aforementioned procedure distilled the six months of forklift operation into the two-week profile used for cycle ageing. Fig. 7 shows the evolution of the capacity degradation of the tested battery cells during cycle ageing at the three ageing temperatures. It can be observed that the degradation behaviour of the three batteries is almost identical despite the 10 °C difference in the ageing temperature, which is similar to the results obtained for the calendar ageing. Furthermore, Fig. 7 illustrates that during the considered cycle ageing experiment, the batteries were subjected to nearly 600 FECs, which resulted in approximately 10% capacity fade.
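The current limitation described above can be illustrated with a minimal sketch. It assumes that charge which cannot be delivered within one sample is carried over and applied at the limit in the following samples until the backlog is cleared, so that the cumulative charge (and hence the SOC) of the two profiles coincides again. The function name `limit_profile`, the carry-over rule, and the example samples are illustrative assumptions; the exact procedure used on the test station is not specified in the text.

```python
import numpy as np

def limit_profile(current_a, dt_s, limit_a=50.0):
    """Clip a current profile to +/- limit_a while preserving the charge throughput.

    Assumption: charge that cannot be delivered in one sample is carried over and
    applied at the limit in the following samples until the backlog is cleared.
    """
    limited = []
    owed_as = 0.0                                # undelivered charge in ampere-seconds
    for value in current_a:
        wanted = value + owed_as / dt_s          # requested current plus the backlog
        applied = float(np.clip(wanted, -limit_a, limit_a))
        owed_as = (wanted - applied) * dt_s      # charge that still has to be delivered
        limited.append(applied)
    while abs(owed_as) > 1e-9:                   # flush any backlog left at the end
        applied = float(np.clip(owed_as / dt_s, -limit_a, limit_a))
        owed_as -= applied * dt_s
        limited.append(applied)
    return np.array(limited)

# Made-up example: a single 350 A discharge peak sampled at 1 s, followed by rest
forklift = np.array([-350.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
lab = limit_profile(forklift, dt_s=1.0)
```

In this example the 350 A peak is spread over seven samples at 50 A, after which both profiles have transferred the same charge.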

B. STATE-OF-HEALTH FEATURES
The aim of feature extraction is to take raw measurement data and distill this information into a set of variables, commonly called features, which are still able to accurately represent the raw measurements. A model is then created to establish a relationship between these features and the battery capacity (i.e. the SOH). Furthermore, the feature extraction methods, in this paper, are window based (similar to the methodology presented in [34]), i.e. a period (or window-size) will be specified and the features will be extracted using the raw measurements within this period. The extracted features will be used to model the capacity measured during the reference performance tests performed at the end of every round of ageing of the battery cells. As every round of ageing will only have a single capacity measurement, it is natural to extract the features based on these rounds of ageing, i.e. the features on the ageing data will be extracted on a two-week basis.
The exact length of the period used to extract the features from the forklift data is less important, as long as the period is large enough to yield consistent results. Preliminary extraction showed that an extraction period of one week yielded consistent results. Furthermore, due to the nature of the capacity measurements performed in the laboratory, the feature extraction windows on the forklift data have to be disjoint (i.e. they cannot overlap).
The features extracted on both the ageing and forklift data will be based on three slightly different techniques: (1) simple descriptive statistics [34], (2) partial voltage charging [53], [54], and (3) online resistance extraction [55]. These methods were chosen because they can be performed in an "online" fashion using relatively little computing power.

1) Simple descriptive statistics
The descriptive statistics will be extracted from the raw voltage, current, and temperature profiles in every window in both the laboratory ageing and forklift data. The descriptive features give insight into the distribution of the voltage, current, and temperature. In the following sections the aim will be to link the change in these distributions to the degradation of the battery. The voltage, current, and temperature of window $w$ will be denoted by $V_w$, $I_w$, and $T_w$, respectively, and assumed to have length $N$ (i.e. $V_w$, $I_w$, and $T_w$ are vectors in $\mathbb{R}^N$). Furthermore, it will be assumed that the features for all previous windows, $1, 2, \ldots, w-1$, have already been extracted from the raw measurements.
In the following, a short description of each of the descriptive features, as calculated for each window, is given:
• Average of the voltage, current, and temperature ($X_w$ is used to represent either $V_w$, $I_w$, or $T_w$): a measure of the centre of a distribution.
• Standard deviation of the voltage, current, and temperature ($X_w$ is used to represent either $V_w$, $I_w$, or $T_w$): a measure of deviation around the average (the square root of the average squared distance of every point from the centre).
• Skewness of the voltage and current ($X_w$ is used to represent either $V_w$ or $I_w$): a measure of the asymmetry of the distribution (if it is negative/positive, the distribution has a larger left/right tail).
• Kurtosis of the voltage and current ($X_w$ is used to represent either $V_w$ or $I_w$): a measure of how large the tails of the distribution are when compared to a normal distribution (if it is larger/smaller than 0, the tails are larger/smaller than those of a normal distribution).
• Maximum change in the voltage and current ($X_w$ is used to represent either $V_w$ or $I_w$): a measure of the largest change in the sequence (in the case of the voltage this will be related to the ohmic resistance, while for the current it is related to the workload).
• Cumulative full equivalent cycles (FEC): a measure of the throughput normalised by the capacity of the battery cell. It is accumulated from $FEC_{w-1}$, the FEC of the previous window (with $FEC_0 = 0$), and normalised using $Q_{nominal}$, the nominal capacity of the battery (in this case 180 Ah).
Fig. 8 shows two examples of the features extracted from the ageing data against FEC. The left panel shows the average voltage, which clearly decreases as a function of the FEC, while the right panel shows that the ageing has nearly no effect on the skewness of the voltage distribution. Furthermore, the panels show that the temperature does not seem to have an effect on the shape of these trend lines, only shifting them up or down.
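The descriptive features of a single window can be sketched with NumPy as follows. The estimators are assumptions based on the textual definitions: the sample ($N-1$) standard deviation, moment-based skewness, excess kurtosis (zero for a normal distribution), the largest absolute difference between consecutive samples, and an FEC increment equal to the absolute charge throughput divided by twice the nominal capacity. The function name `window_features` and the example values are hypothetical.

```python
import numpy as np

def _moments(x):
    mean = x.mean()
    std = x.std(ddof=1)                   # assumption: sample (N - 1) estimator
    z = (x - mean) / x.std(ddof=0)        # standardised values for the higher moments
    return mean, std, np.mean(z ** 3), np.mean(z ** 4) - 3.0   # last: excess kurtosis

def window_features(v, i, t, dt_s, fec_prev=0.0, q_nominal_ah=180.0):
    """Descriptive features of one window of voltage, current, and temperature."""
    v, i, t = (np.asarray(x, dtype=float) for x in (v, i, t))
    feats = {}
    for name, x in (("V", v), ("I", i)):
        mean, std, skew, kurt = _moments(x)
        feats[f"mean_{name}"], feats[f"std_{name}"] = mean, std
        feats[f"skew_{name}"], feats[f"kurt_{name}"] = skew, kurt
        feats[f"max_change_{name}"] = np.max(np.abs(np.diff(x)))
    feats["mean_T"], feats["std_T"] = t.mean(), t.std(ddof=1)
    # Absolute throughput in Ah; one FEC is assumed to equal 2 * Q_nominal
    throughput_ah = np.sum(np.abs(i)) * dt_s / 3600.0
    feats["FEC"] = fec_prev + throughput_ah / (2.0 * q_nominal_ah)
    return feats

# Made-up window: eight one-hour samples alternating between charge and discharge
f = window_features(v=[3.2, 3.4] * 4, i=[90.0, -90.0] * 4, t=[25.0, 35.0] * 4, dt_s=3600.0)
```

Passing the returned `FEC` value as `fec_prev` for the next window gives the cumulative FEC.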

2) Partial voltage charging
As the battery cell degrades, the time it takes for the cell to completely charge from empty to full will naturally decrease, i.e. the time it takes for the cell to go from its lower to its upper voltage limit will decrease, as depicted in Fig. 10. Furthermore, it has been shown in [53], [54] that it is not necessary to observe the entire voltage curve from its absolute lower limit to its absolute upper limit: the throughput calculated in a restricted voltage window from $V_{low}$ to $V_{high}$ (shown as the red horizontal lines in Fig. 10) will be proportional to the capacity calculated across the entire voltage curve.
If the extraction of these reduced capacity measurements, $Q_w$, is to be performed from a dynamic profile, the current needs to be consistent as the voltage passes through the defined voltage limits, as it is well known that the capacity is heavily dependent on the current. In real-life applications, the main difficulty of this method is to identify periods of time where the current is consistent, i.e. periods where the current profile repeats at different moments during the battery operation through its life. Luckily, the charging procedure of the forklifts and, therefore, the ageing profile, is consistent. This consistency allows for the extraction of $Q_w$ every time the battery is fully charged, yielding multiple extracted $Q_w$ values for every window in both the ageing and forklift data. After these features have been extracted, they will be summarised within each window by taking the average and standard deviation, denoted $\bar{Q}_w$ and $s(Q_w)$, respectively.
FIGURE 10. Exemplification of the partial voltage charging for a fresh and aged cell; the partial voltage interval lies between $V_{low}$ and $V_{high}$. Axis label: Charged capacity [Ah].
VOLUME Y, 2021
The average and standard deviation of the extracted $Q_w$ values for the ageing profile can be seen in the left- and right-hand panels of Fig. 9. As can be observed, unlike the descriptive features, the evolution of these features as a function of FEC is dependent on the temperature (at least for $\bar{Q}_w$). Furthermore, two things are worth mentioning: (1) the curve of $\bar{Q}_w$ seems to be flattening, which could become a problem if used for prediction (unless the measured capacity behaves in a similar fashion), and (2) the standard deviation, seen in the right panel of Fig. 9, is increasing slowly over time. That is, even though the voltage is passing through the same voltage limits, the time it takes to pass through these limits becomes more inconsistent as the battery degrades.
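A minimal sketch of the partial voltage extraction is given below. It assumes uniformly sampled voltage and current, counts a segment only if the cell is charging the whole way from $V_{low}$ to $V_{high}$, and integrates the current by a simple rectangle rule. The function name `partial_charge_capacities`, the voltage limits, and the synthetic profile are illustrative assumptions rather than the settings used in this work.

```python
import numpy as np

def partial_charge_capacities(v, i, dt_s, v_low, v_high):
    """Ah charged while the voltage climbs from v_low to v_high during charging."""
    capacities, charge_ah, active = [], 0.0, False
    for k in range(1, len(v)):
        if i[k] <= 0.0:                        # not charging: abandon any open segment
            active = False
            continue
        if not active and v[k - 1] < v_low <= v[k]:
            active, charge_ah = True, 0.0      # voltage just crossed the lower limit
            continue
        if active:
            charge_ah += i[k] * dt_s / 3600.0  # rectangle rule, converted to Ah
            if v[k] >= v_high:                 # upper limit reached: store the segment
                capacities.append(charge_ah)
                active = False
    return np.array(capacities)

# Made-up window with two identical charging ramps separated by one discharge sample
ramp = np.linspace(3.0, 3.5, 11)
v = np.concatenate([ramp, ramp])
i = np.full(22, 36.0)
i[11] = -36.0
q = partial_charge_capacities(v, i, dt_s=100.0, v_low=3.12, v_high=3.38)
q_mean, q_std = q.mean(), q.std(ddof=1)        # window summaries of the extracted values
```

The two summaries correspond to the average and standard deviation of the extracted values within one window.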

3) Resistance
It has been shown that both the ohmic and internal resistance can be extracted, to within a reasonable accuracy, from dynamic profiles [55]. The battery resistance can be extracted from a dynamic profile by keeping very careful track of the following: (1) Changes to the current: $\Delta I$. Requiring a relaxation period at least as long as the previous pulse, the resistance can be extracted using these five variables, as sketched in Fig. 11. To be more specific, requiring $\Delta T_{relax} \geq \Delta T_{previous}$, the ohmic and internal resistances can be calculated using $V_{0.1s}$ and $V_{18s}$, the voltages 0.1 seconds and 18 seconds after the initiation of the pulse, respectively. As in the case of the partial voltage method, the ohmic and internal resistances can be extracted multiple times during every period (for both the ageing and forklift data). Therefore, they are summarised using the average and standard deviation to track the change in the distribution of the extracted features instead of the raw extracted values. These will be denoted as $\bar{R}_0$, $s(R_0)$, $\bar{R}_i$, and $s(R_i)$.
In Fig. 12 the averages of the extracted ohmic and internal resistances are shown against FEC for every window of the ageing data. As would be expected, it shows that the average resistance (both ohmic and internal) increases as the battery degrades. Furthermore, the figures show that the overall trend of both the ohmic and internal resistance is not affected by temperature. Lastly, it seems that the average extracted ohmic resistance is more stable than the average extracted internal resistance. It may be possible to stabilise the extracted internal resistance by adding further restrictions on $\Delta T_{relax}$, such as requiring it to be at least 15 seconds (i.e. $\Delta T_{relax} \geq \max\{\Delta T_{previous}, 15\}$). However, as $\bar{R}_i$ is just one of many SOH estimation features, this is not deemed necessary in the context of this paper.
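The resistance extraction can be sketched as follows, under several assumptions that are not taken from the text: the resistance is computed as the absolute voltage change relative to the last rest sample divided by the current step $\Delta I$, a rest is any sample with an absolute current below a small threshold, and the data is sampled every 0.1 s. The function name `extract_resistances`, the threshold `rest_a`, and the synthetic pulse train are hypothetical.

```python
import numpy as np

def extract_resistances(v, i, dt_s=0.1, rest_a=0.5):
    """Ohmic (0.1 s) and internal (18 s) resistance estimates in ohm, one per valid pulse."""
    v, i = np.asarray(v, dtype=float), np.asarray(i, dtype=float)
    n01, n18 = int(round(0.1 / dt_s)), int(round(18.0 / dt_s))
    resting = np.abs(i) < rest_a
    # Start and end index of every run of consecutive rest or pulse samples
    starts = np.flatnonzero(np.r_[True, resting[1:] != resting[:-1]])
    ends = np.r_[starts[1:], len(i)]
    r0, ri, prev_pulse = [], [], 0
    for idx, (s, e) in enumerate(zip(starts, ends)):
        if resting[s]:
            continue
        relax = s - starts[idx - 1] if idx > 0 else 0   # length of the preceding rest
        # Require a rest at least as long as the previous pulse and a pulse of >= 18 s
        if relax > 0 and relax >= prev_pulse and e - s >= n18:
            d_i = i[s] - i[s - 1]                       # current step of the pulse
            r0.append(abs((v[s - 1 + n01] - v[s - 1]) / d_i))
            ri.append(abs((v[s - 1 + n18] - v[s - 1]) / d_i))
        prev_pulse = e - s
    return np.array(r0), np.array(ri)

# Made-up data: 20 s rest, 20 s pulse, 5 s rest (too short), 20 s pulse
pulse = np.full(200, -50.0)
i = np.concatenate([np.zeros(200), pulse, np.zeros(50), pulse])
v_pulse = np.full(200, 3.25)
v_pulse[179:] = 3.20
v = np.concatenate([np.full(200, 3.30), v_pulse, np.full(50, 3.30), v_pulse])
r0, ri = extract_resistances(v, i)
```

Only the first pulse is accepted here, since the second one follows a rest that is shorter than the pulse before it.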

C. STATE-OF-HEALTH MODELLING
As the aim of this paper is to transfer a model trained on the ageing data obtained in the laboratory to the forklift data measured in the field, the modelling of the state-of-health (SOH) is not the focus. Therefore, the methods presented in this section are very simple, but offer reasonably high accuracy.
It is assumed that the general SOH can be decomposed into two parts: the loss of capacity due to idling (calendar ageing), and the loss of capacity due to cycling (cycle ageing).
Furthermore, it is assumed that this effect is additive. That is, the capacity in window $w$, denoted $Q_w$, can be written as:
$$Q_w = Q_0 - \Delta Q^{(cy)}_w - \Delta Q^{(ca)}_w, \tag{1}$$
where $Q_0$ is the initial capacity, $\Delta Q^{(cy)}_w$ is the loss in capacity due to cycling, and $\Delta Q^{(ca)}_w$ is the loss in capacity due to calendar ageing.
The two components will be modelled separately, and the capacity is then predicted using Eq. (1). Lastly, the training and validation sets were created by making a random 70/30 split of the ageing data, where the 70% will be used to train the models, and the 30% will be used to compare them.

1) Calendar model
Calendar ageing is mainly dependent on two factors: (1) the storage temperature, and (2) the SOC at which the battery is stored. As the storage SOC is going to be very consistent in the intended application (i.e., the forklifts), mostly between 90 and 100%, the storage SOC is ignored as a variable. It has been shown in [56] that the relationship between storage time, temperature, and degradation should follow a power law. Accordingly, the logarithm of the loss in capacity due to calendar ageing, $\Delta Q^{(ca)}_w$, is modelled by Eq. (2), where $w$ is the (accumulated) time in storage measured in weeks, and $T$ is the temperature measured in degrees Celsius. Using the calendar aged laboratory data, presented in Fig. 4, the parameters were found by simple least squares estimation (see Table 1), and the mean absolute percentage error (MAPE) on the validation set was calculated as ≈ 0.4%. The results of the model described by Eq. (2) using the trained parameters of Table 1 can be seen in Fig. 13.
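The least squares step can be sketched as below. The functional form is an assumption made for illustration only: the logarithm of the capacity loss is taken to be linear in the logarithm of the storage time and in the temperature, which gives a power law in time. The parameter values, the helper names, and the synthetic data are made up and do not correspond to Table 1.

```python
import numpy as np

def fit_calendar_model(weeks, temp_c, loss):
    """Least squares fit of log(loss) = b0 + b1 * log(weeks) + b2 * temp (assumed form)."""
    weeks, temp_c = np.asarray(weeks, dtype=float), np.asarray(temp_c, dtype=float)
    X = np.column_stack([np.ones(len(weeks)), np.log(weeks), temp_c])
    beta, *_ = np.linalg.lstsq(X, np.log(loss), rcond=None)
    return beta

def predict_calendar_loss(beta, weeks, temp_c):
    return np.exp(beta[0] + beta[1] * np.log(weeks) + beta[2] * np.asarray(temp_c))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Synthetic, noise-free data: measurements every two weeks at three temperatures
weeks = np.tile(np.arange(2.0, 62.0, 2.0), 3)
temp_c = np.repeat([35.0, 40.0, 45.0], 30)
true_beta = np.array([-3.0, 0.5, 0.02])               # made-up parameters
loss = predict_calendar_loss(true_beta, weeks, temp_c)
beta = fit_calendar_model(weeks, temp_c, loss)
error = mape(loss, predict_calendar_loss(beta, weeks, temp_c))
```

Fitting on the log scale keeps the estimation linear, so no iterative optimiser is needed.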

2) Cycling model
Two methods will be compared when modelling the change in capacity due to cycling, $\Delta Q^{(cy)}$. The first method is a multiple linear regression (MLR) model [34], while the second is a bootstrap aggregated random vector functional link neural network (BRVFL) [41]. Before the models are trained, feature reduction will be performed using principal component analysis (PCA) [57], [58]. Lastly, the two methods will be compared using cross-validation for each of the specified PCA thresholds.
Principal component analysis
PCA can be thought of as a linear transformation of the features, specifically a translation to the origin, followed by a rotation such that the new first coordinate explains most of the variation, the second explains the second most variation, and so on. A simple 2-dimensional example can be seen in Fig. 14: the left-hand panel shows the original features (simulated from a multivariate normal distribution with correlation 0.8), and the right-hand panel shows the PCA rotated features. The PCA coordinate axes are shown in both panels as the red and blue unit vectors. If the features are stored in a matrix $X$, then the principal components can be found by diagonalisation of the matrix $C = X^T X$. That is, by identifying the eigenvectors, $V$, and eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_M$, such that:
$$C = V \Lambda V^T,$$
where $\Lambda$ is a diagonal matrix containing the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_M$. The principal components correspond to the eigenvectors of $C$ (i.e. $V$).
Using the matrix of principal components, the feature matrix can be rotated by simple matrix multiplication:
$$S = X V.$$
The elements of the diagonal matrix $\Lambda$ are related to the amount of variation explained in the direction of the corresponding eigenvector, and are arranged in numerically descending order, i.e. $|\lambda_1| > |\lambda_2| > \ldots > |\lambda_M|$. Thus, the features can be reduced by selecting the number of columns included in $V$ when making the rotation.
It follows that to reduce the features, it becomes necessary to calculate the amount of variance explained by each of the principal components. If $\Sigma$ is the covariance matrix of $S$, and $\sigma_{mm}$ is the $m$'th diagonal element of $\Sigma$, then the proportion of the variation explained by the $m$'th principal component is:
$$\frac{\sigma_{mm}}{\sigma_{+}},$$
where $\sigma_{+} = \sum_{m=1}^{M} \sigma_{mm}$. As the principal components are arranged in descending order of the variance they explain in the features, the cumulative sum of the proportions can be used to identify an index $i$ such that the first $i$ principal components explain at least as much variance as some specified lower limit $t$. That is, given $t$ and the cumulative sum of the proportion of explained variance:
$$c_i = \sum_{m=1}^{i} \frac{\sigma_{mm}}{\sigma_{+}},$$
it is of interest to find the index, $i$, such that $c_{i-1} < t$, but $c_i \geq t$. Given this index $i$, the number of features is reduced as:
$$S^{(i)} = X V_{1:i},$$
where $V_{1:i}$ is the matrix of the first $i$ columns of $V$. Note: from this point the superscript in $S^{(i)}$ will generally be dropped to simplify the notation.
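The PCA reduction with a variance threshold $t$ can be sketched with NumPy as follows. The function name `pca_reduce` and the simulated features are illustrative assumptions; the eigenvalues of the centred $X^T X$ are used for the proportions, which gives the same ratios as the diagonal of the covariance matrix of the rotated features.

```python
import numpy as np

def pca_reduce(X, t=0.95):
    """Rotate X onto its principal components and keep the first i with c_i >= t."""
    Xc = X - X.mean(axis=0)                   # translate the features to the origin
    eigval, V = np.linalg.eigh(Xc.T @ Xc)     # C = X^T X is symmetric, so eigh applies
    order = np.argsort(eigval)[::-1]          # eigh returns ascending order; reverse it
    eigval, V = eigval[order], V[:, order]
    c = np.cumsum(eigval) / eigval.sum()      # cumulative proportion of explained variance
    i = min(int(np.searchsorted(c, t)) + 1, len(c))   # smallest i such that c_i >= t
    return Xc @ V[:, :i], V[:, :i], c

# Simulated example: two features with a correlation of roughly 0.8
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X = np.column_stack([x1, 0.8 * x1 + 0.6 * rng.normal(size=500)])
S1, V1, c = pca_reduce(X, t=0.5)              # the first component is sufficient
S2, V2, _ = pca_reduce(X, t=0.999)            # both components are retained
```

The returned matrix of eigenvectors has to be stored, as the same rotation must be applied to any new features before prediction.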

Multiple linear regression
Let $\Delta Q^{(cy)}_w$ be the change in capacity measured during the reference performance test, and $S_w$ be the PCA reduced features corresponding to window $w$ (this is equivalent to the $w$'th round of ageing). A multiple linear regression (MLR) model assumes that the capacity can be modelled by a linear combination of the features, i.e.
$$\Delta Q^{(cy)}_w = \beta_0 + \sum_{j=1}^{i} \beta_j S_{wj} + \varepsilon,$$
where $\varepsilon$ is assumed to follow a normal distribution with mean zero and standard deviation $\sigma$, $\beta_0$ is a common intercept, and $\beta_j$ is the slope of feature $j$. That is, if all $S_{wk}$ with $k \neq j$ are kept fixed and $S_{wj}$ is increased by 1, then the response $\Delta Q^{(cy)}_w$ changes by $\beta_j$. The parameters are estimated by minimising the sum of squared residuals. The solution to this optimisation problem can be found in closed form; using matrix notation, with $S$ including a column of ones for the intercept, the solution takes the form:
$$\hat{\beta} = (S^T S)^{-1} S^T \Delta Q^{(cy)}. \tag{3}$$
Fig. 15 shows the change in capacity against the FEC for each of the three temperatures used in the accelerated ageing tests. The black dots correspond to the measured change in capacity in the training set, while the black crosses are the measured change in capacity in the validation set. The solid and dashed lines correspond to the estimated change in capacity using a PCA threshold of 95% and 100% (i.e. a retention of 95% and 100% of the variation in the original features), respectively. The figure shows that there is nearly no difference between the two reduction thresholds, with one very clear outlier seen when trying to predict the capacity around 200 FEC at a temperature of 45 °C. This can also be seen when comparing the mean absolute error (MAE) and MAPE on the validation sets in Table 2. The largest validation error was found at 45 °C with a value of 0.43% with a PCA threshold at 95% (with the second largest MAPE at 0.37%). However, as there is little to no difference between the two thresholds, the end user can choose the smaller threshold, yielding a larger reduction in the number of features used in the MLR. Either threshold showed good performance with errors less than 0.5%.
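A minimal sketch of the closed-form least squares fit, together with a random 70/30 split and the validation MAE, is given below. The features and the response are made up and noise-free, and the helper names are hypothetical; the sketch only illustrates the normal equations and is not the code used to produce Table 2.

```python
import numpy as np

def fit_mlr(S, y):
    """Closed-form least squares estimate with an intercept column prepended."""
    D = np.column_stack([np.ones(len(S)), S])
    return np.linalg.solve(D.T @ D, D.T @ y)    # solves the normal equations

def predict_mlr(beta, S):
    return beta[0] + S @ beta[1:]

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

rng = np.random.default_rng(1)
S = rng.normal(size=(40, 3))                    # made-up PCA reduced features
y = 5.0 + S @ np.array([1.0, -2.0, 0.5])        # made-up noise-free response
idx = rng.permutation(len(S))
train, valid = idx[:28], idx[28:]               # random 70/30 split
beta = fit_mlr(S[train], y[train])
valid_mae = mae(y[valid], predict_mlr(beta, S[valid]))
```

Solving the normal equations with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is numerically preferable.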
where λ is a regularisation constant, which should be chosen such that it minimises the out-of-sample error (this can be accomplished using k-fold cross-validation during training).
If $\lambda > 0$ the optimisation problem is a variant of ridge regression, and the solution can be found in a similar fashion to what was described for the MLR:
$$\hat{\beta} = (D^T D + \lambda I_{i+j})^{-1} D^T \Delta Q^{(cy)},$$
where $I_{i+j}$ is the identity matrix of size $i + j$. However, if $\lambda$ is set to zero the solution will have to be found using the Moore-Penrose pseudoinverse, $D^{+}$, as:
$$\hat{\beta} = D^{+} \Delta Q^{(cy)}.$$
Due to the random nature of the RVFL method, various extensions have been proposed to stabilise the random assignment of weights. Among the more promising variants are the sparse pre-trained RVFL (SP-RVFL), which uses a sparse auto-encoder to learn the hidden weights in an unsupervised fashion [38], the ensemble deep RVFL (edRVFL), which uses an RVFL with multiple hidden layers, each layer predicting the outcome [39], and the bootstrap aggregated RVFL (BRVFL), which combines the random nature of the RVFL with bootstrap aggregation [41].
The BRVFL is chosen as it is a simple extension offering more stability to the modelling process than the RVFL. When training the BRVFL, $B$ bootstrap samples of the training set are created; bootstrap samples are samples of the same size as the training set, where each element has an equal probability of being chosen with replacement (i.e. the element is not removed if it is chosen and can, thus, be chosen again). A regular RVFL is then trained on each of the $B$ bootstrap samples using Eq. (4). When predicting the capacity, each of the $B$ trained RVFL models first makes a prediction, $\hat{Q}^{(cy)}_{(1)}, \hat{Q}^{(cy)}_{(2)}, \ldots, \hat{Q}^{(cy)}_{(B)}$, and the final prediction of the BRVFL model is then the average of these predictions:
$$\hat{Q}^{(cy)} = \frac{1}{B} \sum_{b=1}^{B} \hat{Q}^{(cy)}_{(b)}.$$
Fig. 17 shows the result of a trained BRVFL using 2500 bootstrap samples, a hidden layer with 200 neurons, and $\lambda = 0.02$. The figure shows the change in capacity against the FEC for each of the three temperatures used in the accelerated ageing tests, where the black dots correspond to the measured change in capacity in the training set, and the black crosses are the measured change in capacity in the validation set. The solid and dashed lines correspond to the estimated change in capacity of the trained BRVFL using PCA thresholds of 95% and 100%, respectively. The figure shows very similar behaviour to the estimated capacities of the MLR for both thresholds. This is further supported by the validation errors seen in Table 3, showing very similar results to those of the MLR (though the MAPEs tend to be slightly smaller for the BRVFL).
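The bootstrap aggregation can be sketched as below. The RVFL used here is a generic single hidden layer variant and rests on assumptions not confirmed by the text: hidden weights and biases drawn uniformly from $[-1, 1]$, a tanh activation, direct links from the features to the output, and no separate intercept. The class names and the synthetic data are hypothetical, and the example uses far fewer bootstrap samples and neurons than reported above to keep it fast.

```python
import numpy as np

class RVFL:
    """Generic RVFL sketch: random hidden layer plus direct links, ridge output weights."""
    def __init__(self, n_hidden, lam, rng):
        self.n_hidden, self.lam, self.rng = n_hidden, lam, rng

    def _design(self, X):
        H = np.tanh(X @ self.W + self.b)          # assumed tanh activation
        return np.hstack([X, H])                  # direct links and hidden outputs

    def fit(self, X, y):
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        D = self._design(X)
        A = D.T @ D + self.lam * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ y)   # ridge regression solution
        return self

    def predict(self, X):
        return self._design(X) @ self.beta

class BRVFL:
    """Average of RVFL models, each trained on its own bootstrap sample."""
    def __init__(self, n_boot=2500, n_hidden=200, lam=0.02, seed=0):
        self.n_boot, self.n_hidden, self.lam = n_boot, n_hidden, lam
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n = len(X)
        self.models = []
        for _ in range(self.n_boot):
            idx = self.rng.integers(0, n, size=n)  # sample n rows with replacement
            self.models.append(RVFL(self.n_hidden, self.lam, self.rng).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        return np.mean([m.predict(X) for m in self.models], axis=0)

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(60, 2))           # made-up features
y = 2.0 * X[:, 0] - X[:, 1]                        # made-up response
model = BRVFL(n_boot=20, n_hidden=20).fit(X, y)
y_hat = model.predict(X)
```

Averaging over the bootstrap models reduces the dependence of the prediction on any single random draw of hidden weights.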

D. TRANSFER LEARNING
The aim of the paper is to take the SOH estimation models, which were parameterised using the laboratory ageing data, presented in Section II-C, and transfer these models to the field (i.e. the forklifts). These models cannot be transferred directly, because the distribution of the features extracted from the forklift data will not match that of the laboratory ageing data, as seen in Section II-A. This problem falls into a class of machine learning methods called transductive transfer learning (TTL). In the context of TTL, the laboratory (where the ageing data is sampled from) is called the source domain, denoted $\mathcal{S}$, and the field (where the forklift data is sampled from) is called the target domain, denoted $\mathcal{T}$. Restating the problem more mathematically, with $S$ and $Q^{(cy)}$ denoting the features and capacity, the joint distributions of the source and target domains are not equal:
$$P_{\mathcal{S}}(S, Q^{(cy)}) \neq P_{\mathcal{T}}(S, Q^{(cy)}). \tag{6}$$
TTL assumes that the conditional distributions in the source and target domains of the capacity given the features are (approximately) equal, i.e.
$$P_{\mathcal{S}}(Q^{(cy)} \mid S) = P_{\mathcal{T}}(Q^{(cy)} \mid S). \tag{7}$$
Because any joint distribution can be written as:
$$P(S, Q^{(cy)}) = P(Q^{(cy)} \mid S) \, P(S),$$
the assumption of equal conditional distributions, Eq. (7), implies that the difference between the joint distributions, Eq. (6), must be due to a difference in the marginal distributions, i.e. $P_{\mathcal{S}}(S) \neq P_{\mathcal{T}}(S)$.
This particular type of TTL is, therefore, often called feature shifting (or more traditionally covariate shifting).
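Why a difference in the marginal distributions can be corrected by re-weighting source samples follows from a short derivation. The sketch below is a standard covariate-shift argument rather than an equation from the paper: the loss $\ell$ and the model $f$ are hypothetical symbols introduced only for illustration, and it is assumed that $P_{\mathcal{S}}(S) > 0$ wherever $P_{\mathcal{T}}(S) > 0$.

```latex
\begin{align*}
\mathbb{E}_{(S,Q^{(cy)}) \sim P_{\mathcal{T}}}\!\left[\ell\big(f(S), Q^{(cy)}\big)\right]
  &= \iint \ell\big(f(S), Q^{(cy)}\big)\, P_{\mathcal{T}}(Q^{(cy)} \mid S)\, P_{\mathcal{T}}(S)\, \mathrm{d}Q^{(cy)}\, \mathrm{d}S \\
  &= \iint \ell\big(f(S), Q^{(cy)}\big)\, P_{\mathcal{S}}(Q^{(cy)} \mid S)\,
     \frac{P_{\mathcal{T}}(S)}{P_{\mathcal{S}}(S)}\, P_{\mathcal{S}}(S)\, \mathrm{d}Q^{(cy)}\, \mathrm{d}S
     && \text{(by Eq. (7))} \\
  &= \mathbb{E}_{(S,Q^{(cy)}) \sim P_{\mathcal{S}}}\!\left[\frac{P_{\mathcal{T}}(S)}{P_{\mathcal{S}}(S)}\,
     \ell\big(f(S), Q^{(cy)}\big)\right].
\end{align*}
```

Under these assumptions, the expected loss in the target domain equals the expected loss in the source domain with each sample weighted by the ratio of the marginal densities, so only that ratio needs to be estimated.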
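It can be shown that the difference in the marginal distributions can be accounted for by calculating importance weights for each sample in the source domain. That is, it is possible to find $\alpha(S)$ such that:

$$P_{\mathcal{T}}(S) = \alpha(S)\, P_{\mathcal{S}}(S).$$

Once the importance weights are found, they are used to up-weight or down-weight the influence of the samples in the source domain when training a model. Training on a weighted source sample is almost identical to what was presented in Section II-C. Therefore, all that remains is to find the importance weights. However, most TTL methods require some knowledge of the marginal distributions, which may be very difficult to ascertain. A method for finding these importance weights without needing to know anything about the marginal distributions is kernel mean matching (KMM) [52]. KMM chooses the weights such that the weighted mean of the source features matches the mean of the target features in the feature space of a kernel $k$ with feature map $\Phi$:

$$\min_{\alpha} \left\| \mathbb{E}_{S \sim P_{\mathcal{T}}}\left[\Phi(S)\right] - \mathbb{E}_{S \sim P_{\mathcal{S}}}\left[\alpha(S)\,\Phi(S)\right] \right\|_2 \quad \text{subject to} \quad \alpha(S) \geq 0, \quad \mathbb{E}_{S \sim P_{\mathcal{S}}}\left[\alpha(S)\right] = 1,$$

where $\|\cdot\|_2$ is the 2-norm, and $\mathbb{E}_{X \sim P_X}$ is the expected value taken w.r.t. variable $X$ and distribution $P_X$. Replacing the expectations by sample means leads to a quadratic programme in the vector of sample weights $\hat{\alpha}$:

$$\min_{\hat{\alpha}} \; \tfrac{1}{2}\, \hat{\alpha}^{\top} K \hat{\alpha} - \kappa^{\top} \hat{\alpha},$$

subject to the bound and normalisation constraints described in [52], where $K$ is the kernel matrix of the source samples, $K_{ij} = k(S_i^{\mathcal{S}}, S_j^{\mathcal{S}})$, and $\kappa$ is a vector where:

$$\kappa_i = \frac{n_{\mathcal{S}}}{n_{\mathcal{T}}} \sum_{j=1}^{n_{\mathcal{T}}} k(S_i^{\mathcal{S}}, S_j^{\mathcal{T}}),$$

with $n_{\mathcal{S}}$ and $n_{\mathcal{T}}$ the numbers of source and target samples. It follows that a large value of $\kappa_i$ implies that the corresponding source observation is important, leading to a large value of $\hat{\alpha}_i$. In this formulation, nothing is assumed about the marginal distributions $P_{\mathcal{S}}(S)$ and $P_{\mathcal{T}}(S)$. In fact, the only assumptions necessary to show convergence of $\hat{\alpha}$ to the 'true' $\alpha$ (in this context 'true' is used in the statistical sense, i.e. the sample ratio $\hat{\alpha}$ converges to the ratio $\alpha$ between the target and source marginal distributions) are that $k$ needs to be universal (or equivalently strictly positive definite), and that $P_{\mathcal{T}}(S)$ needs to be absolutely continuous with respect to $P_{\mathcal{S}}(S)$ (this ensures that $P_{\mathcal{T}}(S) = 0$ when $P_{\mathcal{S}}(S) = 0$).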
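The weighting idea can be illustrated with a small numerical sketch. It assumes the empirical KMM problem in its usual quadratic form, minimising $\tfrac{1}{2}\alpha^{\top}K\alpha - \kappa^{\top}\alpha$ over the source-sample weights, with a Gaussian kernel. For brevity the sketch keeps only box constraints on the weights and solves the problem by projected gradient descent; the normalisation constraint used in KMM is omitted, and a dedicated quadratic-programming solver would normally be preferred. The function names, the kernel width `sigma`, the bound `upper`, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kmm_weights(Xs, Xt, sigma=1.0, upper=10.0, n_iter=3000):
    """Simplified KMM: minimise 0.5 a'Ka - kappa'a subject to 0 <= a_i <= upper."""
    ns, nt = len(Xs), len(Xt)
    K = gaussian_kernel(Xs, Xs, sigma)
    kappa = (ns / nt) * gaussian_kernel(Xs, Xt, sigma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    alpha = np.ones(ns)                     # start from uniform weights
    for _ in range(n_iter):
        grad = K @ alpha - kappa            # gradient of the quadratic objective
        alpha = np.clip(alpha - step * grad, 0.0, upper)  # project onto the box
    return alpha

def weighted_least_squares(X, y, w):
    """Linear regression with an intercept, each sample weighted by w."""
    A = np.hstack([np.ones((len(X), 1)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical toy data: the target features are shifted relative to the source.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 1))  # source (laboratory-like) features
Xt = rng.normal(1.0, 0.5, size=(80, 1))   # target (field-like) features
ys = 1.0 - 0.2 * Xs[:, 0] ** 2            # source labels; target labels are unknown

alpha = kmm_weights(Xs, Xt)
coef_plain = weighted_least_squares(Xs, ys, np.ones(len(Xs)))
coef_kmm = weighted_least_squares(Xs, ys, alpha)
print("unweighted coefficients:  ", coef_plain)
print("KMM-weighted coefficients:", coef_kmm)
```

In this toy setting, source samples lying where the target data are dense should receive larger weights than samples far from the target data, so the weighted fit concentrates on the region of the feature space that matters in the field. The weighted least-squares step is meant to mirror the remark that training on a weighted source sample is almost identical to the unweighted case.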

III. RESULTS
The transferred models are evaluated in two ways: (1) by their performance on the source domain, and (2) by their performance on the target domain. The first evaluation is included as a sanity check: after a model is transferred from the source to the target domain, it should still perform well on the source domain.

A. SOURCE DOMAIN
Because there are three target domains (the three forklifts), a total of nine combinations of sources and targets need to be considered when evaluating the performance of the transferred models on the source domain. The results of the domain-transferred models are shown in Fig. 18, and the corresponding errors are listed in Table 4. Focusing on the 95% PCA reduction threshold, the MAPEs are less than 3% in all but three cases. Furthermore, it is clear from the figure and table that the BRVFL is much more closely fitted to the estimated capacity of the source domain. In fact, the errors exhibited by the transferred BRVFL models are comparable to those of the non-transferred BRVFL models.

B. TARGET DOMAIN
The estimated SOH on the target domain was calculated in a similar fashion to the source domain, with the additional dependence on calendar ageing. Under the assumptions outlined in the beginning of Section II-C and given the results of the calendar ageing model in Section II-C1, it was only deemed necessary to transfer the methods modelling the change in capacity due to cycling ageing. Given the estimated change in capacity due to calendar ageing and cycling, the capacity at time w is found using Eq. (1). The estimated capacity of the two methods on the target domain is shown against the FEC in Fig. 19. The predictions made by the transferred MLR and BRVFL models are shown as crosses and triangles, respectively. In addition, a smoothed curve is fitted to better visualise the trend of each method, shown as a dashed line for the MLR and a solid line for the BRVFL. Furthermore, the capacity measurements performed during the operation of the forklifts are shown as black dots. The figure shows very similar estimation results for the two methods on Forklifts 2 and 3, but the MLR method has some stability issues on Forklift 1. As a consequence, the estimated capacities of the MLR method are very far from the measured capacities on Forklift 1, while the BRVFL method is consistent across all three forklifts. These results are also supported by the MAE and MAPE between each measured capacity and the corresponding estimated capacity for the three forklifts, which can be seen in Table 5. The table shows that the BRVFL method generally outperforms the MLR method; however, if Forklift 1 is disregarded, the MLR still achieves MAPEs of less than 1%.
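For reference, the two error measures reported in Table 5 can be computed as in the sketch below. It assumes that each measured capacity has already been paired with its corresponding estimate; how the pairing is done in the paper is not specified in this section, and the numbers are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def mae(measured, estimated):
    """Mean absolute error, in the same unit as the capacities."""
    return sum(abs(m - e) for m, e in zip(measured, estimated)) / len(measured)

def mape(measured, estimated):
    """Mean absolute percentage error, in percent of the measured value."""
    return 100.0 * sum(abs(m - e) / abs(m) for m, e in zip(measured, estimated)) / len(measured)

# Hypothetical paired capacities (not data from the paper).
measured = [100.0, 200.0]
estimated = [99.0, 202.0]

print("MAE: ", mae(measured, estimated))   # (1 + 2) / 2 = 1.5
print("MAPE:", mape(measured, estimated))  # 100 * (1/100 + 2/200) / 2, i.e. about 1 %
```

Because the MAPE is normalised by the measured capacity, it can be compared across forklifts whose absolute capacities differ, whereas the MAE cannot.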

IV. CONCLUSION AND DISCUSSION
The paper outlines and implements a paradigm for extracting different types of features, estimating battery SOH using cycle and calendar laboratory ageing tests, and transferring the SOH estimation models to a real-life application. The methods used to parameterise the SOH estimation models based on the laboratory data were chosen to be as simple as possible while still performing well, which narrowed the choice to multiple linear regression (MLR) and a bootstrapped variant of random vector functional link neural networks (BRVFL).
The analyses performed in the paper show the ease of use and implementation of transfer learning for both the MLR and BRVFL methods. The transferred models showed good performance in both the source and target domains (i.e. the laboratory and field), achieving mean absolute percentage errors (MAPEs) smaller than 1% with the exception of the MLR method on a single forklift (Forklift 1).
A deviation worth pointing out is the sudden decrease in the capacity estimated by the BRVFL method for Forklift 3 in Fig. 19. Although peculiar, this decrease cannot be verified, as no capacity measurements of the forklift exist in this period; such behaviour is, however, not unheard of in the literature. What further complicates matters is that the decrease is not predicted by the MLR (though the variance of the MLR predictions also increases during this period). Two scenarios could explain this sharp decrease: (1) it is an actual decrease in capacity not accounted for by the MLR, which is a real possibility as the BRVFL has better performance on the source domain, or (2) it is not an actual decrease in capacity, which would imply that the BRVFL method is overfitting to the source domain. However, without any capacity measurements it is impossible to judge whether these predictions can be trusted.
Lastly, it is worth pointing out that this approach could be extended to involve databases of laboratory experiments with different current, SOC, and temperature profiles, as the estimated weights are used to up-weight or down-weight an observation depending on the distance between the observation in the database and each observation of the field data.

He was a Visiting Researcher with RWTH Aachen, Germany, in 2013. He has co-authored over 150 journal and conference papers in various battery-related topics. His current research interests are in the area of energy storage systems for grid and e-mobility, Lithium-based batteries testing, modelling, diagnostics and their lifetime estimation.

VOLUME Y, 2021