Bias Suppression Framework for Detrending Mean of Multi-Output Gaussian Process Regression in LED Remaining Storage Life Prognosis

Advancements in storage prognosis tend to be limited by the inherent challenge to collect sufficient significant degradation data over an extensive period. Using only sparse fleet data, multi-output Gaussian process regression (MOGPR) is one of the few techniques that offers a practical data driven approach to model non-monotonic degradation profiles with low mean absolute percentage error (MAPE). This accuracy in the storage prognosis context is, however, sensitive to the choice of the detrending mean. Working with light emitting diodes (LED) sparse lumen degradation data under storage conditions in this study, the MAPE is observed to be highly correlated to the detrending bias – the difference between the detrending mean and the test mean. We explore various approaches to suppress this bias and advocate a generic framework for fleet storage prognosis. The approaches include detrending using (A) static training data mean, (B) dynamic observed test data mean, (C) static bounded training data set pairs, (D) dynamic weighted mean of unbounded training data set pairs and (E) moving average of weighted mean of unbounded training data set pairs. Our analysis shows that the moving average approach (Method E) of computing weighted mean of unbounded training data set pairs results in the most stable detrending mean to suppress detrending bias and helps achieve an MAPE lower than 1%.


I. INTRODUCTION
Storage has been a conventional approach to buffer the uncertainty in supply chain, be it to fulfill surge in demand or to address surplus in supply [1,2]. The equipment or commodity in storage does not perform its designated functional roles, hence, contributes the least value in the system lifecycle. As such, it is also intuitive that the designers prioritize design requirements for operational use over storage. With in-transit inventory optimization, manufacturers can minimize inventory holding and storage duration which further reduces the manufacturers' vested interest in storage.
More disruptive "Black Swan" events have recently been observed, be it a pandemic like COVID-19 or environmental disasters like earthquakes, fires, floods, and super typhoons. These have introduced disruptive demands for emergency response equipment on one hand and led to long-term storage of high value assets like aircrafts in aviation industries during the pandemic on the other hand [3,4]. Due to the lack of degradation models for such storage condition profiles, the affected industries have to suffer the uncertainty in the reliability of equipment or commodity placed under longterm storage or commit costly resources to inspect and service these unused equipment or commodities in order to preserve their reliability till activation [5] - [7]. Sabouri et al. [8] propose an algorithm to develop the inspection policy that minimizes the infinite horizon discounted expected penalty, replacement, and inspection costs.
In addition to the lack of financial motivation for the manufacturers to develop these degradation models for storage profile conditions, the manufacturers are in general not positioned within the equipment life cycle to collect appropriate storage data as well [9]. The relevant data can only be collected by the user or by an assigned warehousing operator, who often does not possess the technical knowledge, toolkits and infrastructure to collect data, analyze it appropriately and develop useful degradation models for storage life prognosis [9]. Furthermore, meaningful data has to be collected over an extensive period of time, due to the relatively slower rate of degradation while in storage, leading to both lower velocity and volume for "good data" [10,11].
Physics-based approaches, data-driven approaches and a hybrid of both are conventional prognostic approaches to develop degradation models. An et al. [12] provide a review for selecting data-driven or physics-of-failure prognostic approaches and conclude that Gaussian process regression (GPR) is an easy and fast option for implementation; however, it suffers from cubic computational complexity. Liu et al. [13] highlight that this complexity of O(n 3 ), due to inversion and determinant computation of the n × n kernel matrix, limits the use of GPR for large datasets. As our storage fleet data is normally sparse, this limitation becomes less of a constraint. Neural networks (NN) provide advantages to handle cases with large noise and complex models, however, they demand extensive and diverse training sets. Elforjani et al. [14] demonstrate the advantage of NN with back propagation (BP), over support vector machine (SVM) regression and GPR for bearing remaining useful life (RUL) prediction with acoustic emission signals. However, given the storage context with far lower data velocity and volume, adoption of NN is a challenge for us to estimate the remaining storage life (RSL). Bishnoi et al. [15], on the other hand, demonstrate the utility of GPR over NN, with a small dataset, to provide rigorous property estimates, without the issue of overfitting, which is a common problem associated with NN.
For the physics-based approaches, particle filter (PF) and Bayesian method (BM) provide more superior performance to data-driven approaches like GPR and NN; however, they require a prior established underlying physical model governing the degradation process along with the constraint of constant loading conditions [11]. In the context of storage prognosis, such degradation models may not be available as the physics of degradation under storage is hard to decipher and is often, unknown. This poses challenges to adopting PF and BM approaches for RSL prediction.
Focusing on readiness of aerospace products, Lu et al. [16] introduce the development of storage life assessment and evaluation with accelerated life test, life evaluation process and storage life assessment method. Mense et al. [17] provide a data driven approach with binary logistic regression (BLR) model to assess recent system failures during non-operating storage and transportation. However, the reliability predictions are sensitive to the accuracy and completeness of the information provided to perform the prediction. Li et al. [18], on the other hand, proposed a method to estimate the storage life distribution parameters by fitting to exponential component and the storage period can subsequently be determine with test results. However, this approach requires the storage degradation profile to follow the exponential distribution.
There are several good case studies of GPR being used for prognostic applications in the recent past. Yang et al. [19] use GPR to present a novel approach to estimate the state-ofhealth (SoH) for lithium-ion batteries, based on four specific parameters from the charging curve, instead of typical charging cycles. Richardson et al. [20] used the GPR to forecast battery state of health to suggest the relative advantage of data-driven approach over physics-of-failure approach when modeling complex battery degradation trends. Extending the typical GPR (single output), they explored the multi-output Gaussian process regression (MOGPR) to exploit the correlation of degradation timeseries trend between different battery cells under the same loading environment.
Liu et al. [21] presented a comprehensive review on the performance of MOGPR under asymmetric and symmetric scenarios. Under the asymmetric or multi-fidelity scenarios, there are differences in the number of training points across outputs and they attempt to improve the predictions of more costly high-fidelity output by transferring information from the less costly low fidelity outputs. Under the symmetric scenarios, the outputs have the same number of training points, and the authors attempt to improve predictions with all related outputs. The symmetric scenario discussed in their work has some synergy and synchrony with our intent to RSL prediction for fleet storage scenario where the prediction of an equipment degradation trend can be made and improved with degradation trend from other equipment, within the fleet under the same environmental conditions. Duong et al. [22,23] used MOGPR to predict the lumen maintenance life for light emitting diode (LEDs) over a long horizon of their active service life. While similar in concept, the training data sets in that work are large and the degradation of active components are more prominent than that in the storage context. To this end, we have previously explored the basic MOGPR to obtain residual storage life (RSL) prediction and achieved 80-90% estimation accuracy for Li-ion batteries in storage with sparse degradation data points, in the absence of an established degradation model (Ref. [24]). In order to further improve the accuracy of the predictions, we attempt to refine the basic MOGPR approach.
As part of the MOGPR procedure, it involves a step of detrending the degradation time series prior to performing the training. Typically, the arithmetic mean of the training data is used to detrend the training data. However, for the test data, the arithmetic mean of the test data will not be available until all the test data has been observed, and as such, the arithmetic mean of the training data itself is often used, as an estimate, to detrend the test data. It is observed that this choice of the detrending mean may introduce bias into the MOGPR training, due to inherent differences between the means of the training and test data sets. These differences may arise from manufacturing differences within specification tolerances or differing usage history prior to storage. As the magnitude of the degradation in storage is typically small, these inherent differences, while seemingly small, may induce a relatively significant bias that can result in unacceptable distortions in the degradation time series predictions, compromising on the RSL estimation accuracy quite a lot.
To minimize this bias, there is a need to explore approaches to detrend the test data, possibly with the mean of the incremental test data observed up to current time of test. A better prediction with lower mean absolute percentage error (MAPE) was obtained by trying this approach in our work [25]. However, this approach was unstable when the number of test data points is low. Subsequently, we also explored possibility of having bounded training sets with an equally weighted mean of both training data sets to detrend the test data. While this further lowered the bias and MAPE, resulting in improved RSL [25], it greatly constrains the selection of the training sets to be the upper and lower bound of all collected data (which most often may not be the case in practical circumstances) and enforces stringency on the test data sets which have to strictly stay within the bound of the training data sets in order to achieve a lower MAPE. This paper investigates into the effects of bias in the detrending mean on the MAPE (or RSL prediction) while performing MOGPR and proposes more versatile approaches to reduce MAPE in the degradation profile. Section II provides an overview of MOGPR and the concept of detrending mean in MOGPR. Sections III describes the data, performance evaluation metric and simulation set-up. Section IV examines the various detrending approaches and discusses the key results and observations. Finally, Section V presents a conclusion of our study with specific recommendations for future exploration. MOGPR models the degradation time series, f(t) with a Gaussian probability distribution over the function [26]: where m(t) is the function mean, estimated by the arithmetic mean of the test time series, and is the function covariance, derived as the product of the label covariance and temporal covariance: where ( , , ) is the temporal covariance matrix for test time series, ( , , ) is the label covariance matrix, t denotes a specific time within the time series, t' denotes the remaining times within the time series, denotes the label input for a specific times series, ′ denotes the label inputs for remaining times series, and and are the vector of hyper-parameters for the label and temporal covariance, respectively. In our context, to model the degradation time series, the intuition for the function mean (m(t)) can be perceived as the trend. The function covariance ( ) can be perceived as the relative changes from the trend. In a basic GPR, also termed as single-output GPR (SOGPR), for time series, we describe the relationship from one data point to the other data points in the time series, with the temporal covariance. For simplicity, we have adopted the common square exponential (SE) covariance for ( , , ): where denotes the scaling hyper-parameter for f. The hyperparameters are optimized through maximizing the log marginal likelihood expression: where Y = [( ( ), … , ( ( )] denotes the array of data points from the degradation time series, n denotes the number of training points (observations).
Extending SOGPR to MOGPR with multiple outputs, we leverage on the relationship of the observed time series test data set with one or more similar time series training data sets to enhance our prediction. This relationship is described by the label covariance matrix. The term "label" is merely used to index the training and test data sets. ( , , ) is a positive semi-definite covariance matrix, obtained by the free form parameterization by Cholesky decomposition to provide the elements of the lower triangular matrix [6]: where k = m×(m+1)/2 is the number of correlation hyperparameters. The diagonal elements represent the correlation of the times series, whereas the non-diagonal elements represent the correlation between multiple time series.

B. DETRENDING MEAN AND BIAS
Different approaches to preprocess the degradation time series can have a significant impact on the prediction performance. Ahmed et al. [27] listed some of the common approaches, including but not limited to deseasonalisation, log transformation and detrending. The basic MOGPR can be specified completely by just the mean function and covariance function. It is common and practical, though not necessary, to use a Gaussian Process (GP) with zero mean function. For simplicity, we start with a fixed (deterministic) value as the GP mean function.
For the training data set, this fixed mean function value can be easily obtained by computing the arithmetic mean of the training data set. As illustrated in Figure 1, by taking the difference between each data point and this fixed mean function value, we are effectively detrending the training data set into an adjusted training set with zero mean. We can then apply the common zero mean GP to this adjusted training set. However, in the case of the test data set, the test data points are collected incrementally, and hence, the true arithmetic mean can only be computed after the complete test data set has been collected. It will, however, be meaningless by then for practical application of MOGPR prediction. As such, the arithmetic mean of the training data set is typically used as the best estimate to detrend the test data before subjecting the adjusted test data to MOGPR based predictions. The inherent differences between the test and training data mean are known to introduce bias, as seen in Figure 2. This bias leads to distortions in the MOGPR predictions. Based on our earlier works (Ref. [24] & [25]), we observed a strong negative correlation between the detrending bias and the prediction MAPE. To understand the effects of this bias, various approaches to detrend the test data will be considered in our study here, as summarized in Table 1. In doing so, we hope to develop a generic and practical framework that can be widely used for bias-suppressed prognosis.

A. CHOICE OF LED TECHNOLOGY FOR THE STUDY
In this study, we have selected LED as the technology of focus to illustrate the utility of MOGPR in storage prognosis and develop our modeling framework, due to the increased proliferation of LED and associated prognostic challenges, as elaborated in the ensuing paragraphs.
Since 2011, LED-based products have steadily penetrated the market compared to conventional incandescent, halogen, florescent and high-intensity discharge (HID) lamps. Market penetration for LED-based products is projected to reach 75-85% by 2030 [28]. With such proliferation, there have been significant research significant on the reliability of LEDs. This is an upheaval undertaking. LEDs typically have long operating lifetime with few observable degradations. It is costly and time consuming for manufacturers to collect failure or degradation data to comprehensively assess and improve the reliability of LED.

B. CURRENT STATUS OF LED PROGNOSIS STUDIES
Furthermore, failures can occur outside laboratory conditions. Trivellin et al [29] provided a comprehensive review of the failures of LEDs in real-world applications, beyond laboratory environment where stress parameters are carefully controlled or isolated. Several real-world LED degradation mechanisms like electrical overstress, assembly issues, mishandling and chemical contamination were reported to be crucial in the upstream manufacturing process to prevent downstream catastrophic service failures. Driel et al. [28] reviewed that there have been a whopping 88 failure unique modes for LED documented since 2011 and projected, based on the Goel-Okumoto maturity model, and that another 42 unique failure modes remain to be discovered. The possible interplay of these large number of multiple failure modes presents a big challenge to develop a comprehensive physics-of-failure model for LED.
Nayak et al. [30] attempt to predict lumen degradation with traditional statistical models like Normal, Weibull and Lognormal models. They conclude that lognormal provides the best fit with Akaike Information Criterion (AIC). However, such models are deterministic and may not be useful for non-monotonic degradation scenarios, as observed for LEDs under storage.
Recent advances in artificial intelligence and machine learning have provided additional options for reliability analysis. As the LED lamp system comprises of sub-systems like LED driver, LED module, diffuser, reflector and interconnects, the system can be decomposed into subsystems for analysis. Instead of the traditional deterministic reliability block diagram (RBD) approach for LED system reliability assessment, Ibrahim et al. [31] leverage on fault tree analysis (FTA) results and expert knowledge to develop a Bayesian network (BN) to model the complex system reliability to address the uncertainties and interaction between sub-systems. The proposed BN approach provides an accurate lifetime prediction with shorter testing time.
While this approach provides better reliability insights for LED manufacturers to assess and improve their product design, it requires the further modelling of sub-systems which is often untenable for the storage agency.
Jing et al. [32] implemented long short-term memory (LSTM) and recurrent neural network (RNN) to predict the lifetime of ultra-violet (UV) LEDs against the nonlinear least squares ( algorithm. This approach did provided better prediction results against IESNA TM-28. TM-28 is the existing standard employed by the industry to project long term lumen maintenance for LED light source. All of these approaches, however, are still data intensive and focus on the degradation of LED under active usage conditions. Purwanto et al [35] presented an aging study of LED with counter-intuitive results. LED samples exposed to least level of radiation energy (simulated sunlight) displayed the highest physical degradation in conversion efficiency and transparency of LED, under ambient temperature and short exposure duration of 500 hours. They suggest that there may be other more dominating degradation factors, namely oxygen and humidity exposure, that are responsible for this observation. Degradation may be significant even under storage. This re-iterates the complexity of LED degradation process when different physics-of-failure mechanisms can have counter-intuitive interactions and effects and dominate the degradation under operating and/or storage conditions.

C. LED STORAGE DATA SET
The LED data sets are obtained from the work of Singh et al. [36] at Chang Gung University (CGU) from the Center for Reliability Science and Technology (CReST ® ) in their proposed moisture-electrical-temperature (MET) test to evaluate the outdoor reliability of high power LEDs. In order to simulate the morning condition when the streetlamp is "OFF", a subset of 20 blue LED samples is maintained in the "OFF" condition, with a constant temperature of 85°C and 85% relative humidity, as the accelerating environmental factor. The "OFF" condition is akin to LEDs placed under long-term storage, hence, is applicable for our study. In our previous work [25], we have explored some basic flavors of MOGPR to predict remaining storage life for these LEDs and the same data sets have been employed here for this study.  Each LED sample is a time series that consists of 12 lumen degradation data points, collected every 24 hours, for optical and electrical measurements. 5 LED sample data sets are selected, as illustrated in Figure 3, where 1 LED sample can be considered as the test data set, while the required training sets are selected from the remaining 4 LED samples to simulated different scenarios. The test data can also be subsequently reused as training sample in other simulations. Samples 11 and 13 serve as the upper and lower bound, respectively. Samples 3 and 5 follow the general degradation profiles of the bounded training samples, and they are distributed around the 50% (mid-point) and 30% (nearer to the lower bound), respectively, of the bounding pair. Sample 19 displays more profile irregularities with reference to the other four samples and is distributed around 30% (nearer to the lower bound), like Sample 5. Sample 19 is also selected as it intertwines with Sample 3 and Sample 5; hence, it can be used to simulate scenarios where the test data transcends the training pairs.

D. TEST REGIME
The Multi-task Gaussian Process (MTGP) Toolbox [37] for MATLAB ® is used to implement the MOGPR simulation to generate the projected degradation profile for the test data set. We have performed the MOGPR simulation over the various approaches of detrending on the test data, as listed in Table 1, referred to here on as Methods A to E.

E. ERROR METRICS
To visualize the typical remaining useful life (RUL) or RSL prediction, the predicted degradation profile is plotted together with test data point with a user defined condition failure threshold. The time when the predicted degradation profile cuts the failure threshold defines the predicted RUL/RSL, as illustrated in Figure 4. This can be validated against the true RUL/RSL determined with actual failure data. The prediction error is defined as: where denotes the true RSL based on the observed time series data, denotes the predicted RSL computed based on the remaining time from the time that j th test data is collected to the time where median generated by the MOGP regression intersects the defined failure threshold, j refers to the number of test data used for MOGP regression.
While this standard threshold failure-based approach works for monotonic degradation profile, it is not useful for nonmonotonic degradation profile as there may be multiple points of intersection of the degradation trace with the threshold. For non-monotonic degradation profiles, we have adopted the mean absolute percentage error (MAPE) as our measure of prediction performance: where n is the number of data points used, is the luminous flux observed at data point i, and is the predicted luminous flux at data point i, obtained from the median of the MOGPR prediction. MAPE effectively measures the performance of the approach to retrace or replicate the degradation trend. For this paper, only the predicted points, i.e., the remaining points that are yet to be observed at the time of simulation, are subjected to the MAPE computation. This will prevent the dilution of the errors with the increasing number of observed data points.

METHOD A: DETRENDING USING STATIC TRAINING MEAN
The ideal approach for MOGPR is to detrend the test data set with its true mean (as illustrated in Figure 1). However, the test data is observed incrementally, and the true mean can only be computed after all test data are observed. The true mean, hence, cannot be determined or used, in advance, at each iteration step. In lieu of the true test mean, the mean of the training data set is often used as an estimate for the true test mean (as illustrated in Figure 2). This premise is on the assumption that the training set for MOGPR should be largely similar in characteristics to the test set. In this case, both the training and test data sets are detrended by the same training mean, prior to training and prediction, where: where ̂ = estimator of test mean to be used as detrending mean for test sample, = mean of training sample.
As mentioned in Section II.B, this method introduces a detrending bias when ≠ . The training mean is used to detrend Test Sample 3 and Test Sample 5, with detrending bias of 0.67 and 0.82, to yield an MAPE of 2.9% and 3.8%, respectively. To establish the ideal MAPE baseline reference, the true means of the respectively test data are used to detrend the test data (which is the ideal scenario we refer to in Table I), prior to subjecting the test data to the MOGPR with single training data set (Sample 13). The detrending bias in this case is zero. For Test Samples 3 and 5, the ideal MAPE turns out to be 0.5% and 0.4%, respectively. From Figure 5, the MAPE obtained by detrending with training mean shows a significantly higher MAPE than that obtained with true mean.   Figure 6 shows the plot of the MAPE against the detrending bias. Since the training mean is fixed, the detrending bias is consistent throughout each iteration for Sample 3 and Sample 5, hence, the plot appears as 2 vertical lines of plots. It should be noted that Sample 5 has a larger bias than Sample 3, and therefore, the larger MAPE is also observed for Sample 5. Observing the loss of prediction performance due to the detrending bias between the training mean and the true test mean, we previously proposed an incremental observed test mean to be used (Ref. [10]), instead of the training function mean, to detrend the test data, as illustrated in Figure 7. This observed test mean is derived by averaging only the observed test data points, available at the respective measurement time intervals.
( ) = (∑ )/ where ( ) = estimator of true test mean to be used as detrending mean for test sample, = luminous flux at observation data point i, k = number of observation data points available, thus far in the LED unit under test. Based on Equation (9), it should be noted that the detrending mean is dynamic with every new test data observed.
In our simulation, detrending Samples 3 and 5 with their observed test means as compared to detrending with the training Sample 13, the MAPE is reduced by more than 40%. As the number of sampling points increases, this observed test mean will tend towards the true mean of the test data. It should naturally serve as a better estimate candidate for the true mean of the test data. While this approach offers lower MAPE, hence, better prediction performance, than the training mean approach, the region of instability (orange dashed box), exhibited as high MAPE, is observed when number of sampling points are small, as observed in Figure 8.     This instability reduces rapidly with the increase in number of data points. It usually offers a more stable MAPE only after 5 data points when there are high fluctuations in the initial actual degradation profile. This shortcoming can be explained as when the number of observed test data points are small, there are insufficient data to represent the true test data set. Consequentially, these observed test means computed (when data points are small) cannot achieve a good estimator of the true mean, leading to high detrending bias. This leads to erratic MAPE performance during the initial phase of prediction when the number of observed test data points is small. Figures 9 and 10 show the MAPE against detrending bias plot. Similarly, it is observed that the larger the detrending bias, the higher the MAPE. Here, a more obvious positive correlation between the MAPE and detrending mean can be clearly observed.
METHOD C: DETRENDING USING STATIC BOUNDED TRAINING PAIRS FIGURE 11. Degradation profile of test sample 3 bounded completely by training sample 11 (lower bound) and training sample 13 (upper bound).
To alleviate the initial shortcoming presented by detrending with the dynamic observed test mean, our group further proposed to use the equally weighted average of the training means of the upper and lower bound training data series, to detrend the test data series, as illustrated in Figure 11 (Ref [11]). The detrending mean is estimated by the mid-point of the upper and lower bound: where ̂ = estimator of test mean to be used as detrending mean for test sample, = = weight of mean of lower bound and upper bound training samples, respectively, = mean of upper bound training sample, = mean of lower bound training sample.
As both training data sets are completely available, they do not suffer from the inadequacy to represent the test data set, as in the previous case of the incremental test mean approach. By estimating the test mean with the equally weighted average of the upper and lower bound training data series, we help reduce the unilateral bias, which exists when performing detrending with either one of the training means alone. By restricting the training sets to be within the upper and lower bound, we restrict the detrending bias to be no greater than half of the difference between the mean of upper and lower bound training sets, as illustrated in Figure 12. This also implies that the smaller the difference between the means of the bounded training pair, the smaller the possible range of detrending bias. In this specific case, Sample 3 is situated close to the middle (50%) of the training pair, as such, it can be observed that their equally weighted mean (22.74 lumen) becomes a very good estimator for the true test mean (22.67 lumen). It provides better stability over the entire range of data points (entire lifecycle span of the prognosis) as compared to Method B, as shown in Figure 13 The detrending bias is greatly suppressed to 0.07 in this specific case, as seen in Figure 14. The MAPEdetrending bias plot reveals that the performance of Method C to be comparable to that derived by the true test mean, with average MAPE of 0.9%.   Figure 15 and Figure 16, respectively. As elaborated previously, Sample 3 lies around the mid-point between the bounded training pair, leading to low detrending bias, whereas Sample 5 and Sample 19 reside around the 30% (nearer to the lower bound) between the bounded training pair. The 30% offset, while within the constraint of half of the difference between the mean of upper and lower bound training sets, leads to the higher detrending bias (in blue and green) than that for Sample 3 (in red), as seen in Figure 17. As such, the resultant average MAPE from Sample 5 & 19 are also observed to be larger, at 1.4% and 2%, respectively, as observed in Figure 18.   It should be noted that while the equally weighted bounded training pair approach yields a more stable MAPE, it does not guarantee a better performance than the observed test mean, as the detrending bias is constrained to half the difference between the means of the bounded training pair. When the difference between the means of the bounding pair is large, it provides opportunity for larger detrending bias; and hence, higher MAPE. The observed test mean approach, on the other hand, tends towards the MAPE performance of the true mean approach as the number of data points increases. Furthermore, while equally weighted bounded training pair approach eliminates the challenges presented by the earlier approaches, it assumes that the upper and lower training data sets can be determined upfront. While in most practical cases, this may be possible, however, it is challenging to obtain the optimal bounding pair for each test sample and we cannot exclude the possibility that the test data series may become a new upper or lower bound, or it may intermittently transcend the upper and lower bound training data, hence, invalidating the basis to lead to the reduction of the bias from the true test mean.  Luminous Flux (lumen) To analyze the performance of equally weighted means of the bounded pair approach when the test sample transcends the bounded pair, Sample 19 is used as the test sample with Sample 3 and Sample 5 replacing Sample 11, as the lower bound, as seen in Figure 19 and Figure 20, respectively. It should be noted that Sample 19 intersects Sample 3 and Sample 5 as the "lower bound", at several points. Figure 21 shows Between the two runs with the "degraded" Method C where the test sample transcends the bounded pair, training sample 5-13 pair provides better prediction with lower MAPE, compared to training sample 3-13 pair. This is coherent with our postulation of the detrending bias-MAPE correlation, where bias arising from the training sample 5-13 pair (magenta) and training sample 3-13 (cyan) pair are 0.42 and 0.50, respectively, as illustrated in Figure 22. This analysis illustrates that the performance of the equally weighted means of the bounded pair approach degrades when the test data series transcend the bounded training pair.

METHOD D: DETRENDING WITH DYNAMIC WEIGHTED MEAN OF UNBOUNDED TRAINING PAIR
To relax the constraint to be able to determine upper and lower bound data sets as the training pair, prior to the complete observation of all test data points, a detrending mean obtained by dynamically computing the weighted average of any two training data set pairs is proposed here. In this case, the weight of the mean of each training sets is assigned based on the proximity to the test data. This allows the test data to permanently or intermittently transcend the bounds of either of the training data sets, while maintaining a lower bias from the true test mean.
where ( + 1) = estimator of test mean to be used as detrending mean for the test sample for k+1 runs, As the test sample is no longer completely bounded by the training sample, it is intuitive to reduce the bias of the detrending mean by estimating a mean by allocating dynamic weightage to the means of training samples, based on the proximity to observed test sample, as seen in Eqns. (12)- (14). Method D is illustrated in Figure 23. To contrast the effects of Method C and Method D, Method D is repeated for sample 3-13 and 5-13 training pair sets, using test sample 19 for our analysis. From Figure 24, it can be observed that the dynamic weighted approach provides a lower MAPE in general, than the equally weighted training pair mean approach when the test data transcends the training pair. The MAPE performance by the dynamic weighted mean approach does not have the restriction for the test sample to remain completely bounded by the training sample pair. We attribute the lower MAPE for the prediction to the lower detrending bias by the dynamic weighted mean approach, as seen in Figure 25. Similarly, the detrending biases are distributed laterally due to the dynamic nature of the detrending mean, as compared to the equally weighted mean approach where the static detrending biases are lined vertically. The dynamic weighted mean approach also suffers from the similar challenge of initial instability as the observed test mean approach. This is all the more apparent when there are high fluctuations of the degradation profile in the initial stage. The dynamic weighted approach will also result in highly fluctuating detrending means, and hence, detrending bias, in tandem with the degradation profile. Scatter plot MAPE against the detrending bias for unbounded Test Sample 19, detrended by the equally weighted and dynamic weighted mean, with Training Sample 3-13 pair and 5-13 pair.

METHOD E: DETRENDING WITH DYNAMIC WEIGHTED MEAN OF UNBOUNDED TRAINING PAIRS USING MOVING AVERAGES
While the dynamic weighted training pair mean approach removes the hard constraint for the test data sets to remain within the bounds of the training data sets, the approach produces highly fluctuating detrending means as the proposed detrending mean is a function of the individual observed test data that fluctuates around the static true test mean, at the time of observation. This is especially so when the degradation profile is undulating or non-monotonic. To smoothen this fluctuation, the previous dynamic weighted mean approach is further refined with the moving average (MA) preprocessing methods, with different sliding windows, considered in Ref.

MAPE(%)
Training S3 & S13, Equally Weighted Mean Training S5 & S13, Equally Weighted Mean Training S3 & S13, Dynamically Weighted Mean Training S5 & S13, Dynamically Weighted Mean [14]. Method E flattens the fluctuations of the detrending means, and generally reduces the bias from the true test mean, as illustrated in Figure 26. The generalized form of the dynamically weighted training pair mean, with moving average of sliding window, n is defined as: where ( + 1, ) = estimator of test mean to be used as detrending mean for test sample for the (k+1) th  The dynamic weighted means of any training sample pair with moving averages are extended to the same three test sets, i.e., test sample 19 with training sample 11-13, 3-13 and 5-13 pair sets, with varying moving average windows of window size, n = 2 to 5. It should also be noted that MA(1) is effectively the dynamic weighted training pair mean as discussed in the Section IV (Method D). Figures 27-29 illustrate the relative reduction of the MAPE from the equally weighted bounded pair approach (denoted by "zero" size of moving average), as discussed in Section IV (Method C), to the dynamic weighted mean of training pair by moving averages with varying window sizes. Figure 27 illustrates the case where the test sample 19 is completely bounded by the training pair. The new approach shows only slight improvement from the equally weighted bounded pair approach ("zero" case) as the previous approach has already yielded relatively good performance, given that it complies with our completely bounded constraint for the approach.    Figure 28 and Figure 29 illustrate the cases where the test sample 19 is not bounded by the training pair. Significant improvement in the MAPE, compared to the equally weighted bounded pair approach, can be observed. By using the dynamic weighted mean of any training sample pair with moving averages, the MAPE can be reduced to about 1%, regardless of whether the test sample is bounded, or not, by the training sample pair across various test samples. This approach provides flexibility to select training samples without compromising the performance of the prediction of the degradation profile.
The corresponding detrending bias are also plotted in Figure  30-32. The boxplots in Figures 30-32 show that this moving average approach suppresses the bias and the fluctuation or dispersion of the bias. This contributes to a lower MAPE during each iteration of the MOGPR. Larger windows of moving averages (n) will smoothen the fluctuations encountered in the previous approach, thereby providing a more consistent estimation of the detrending mean. This will lead to a lower detrending bias. However, the larger moving average windows will also be a limiting factor in that the very first prediction can only be made after n+1 test data points have been observed. This implies that prediction will be deferred until sufficient data points have been observed and fed as input to derive the moving averages, MA(n). This may be a challenge in the situation when there are sparse or little data points collected, as in our case for RSL prediction.  It will be necessary to probe deeper into the MA window size to trade-off the expected prediction performance and sufficient remaining life to make meaningful predictions for prognosis of equipment in storage. From our experience, MA(3) seems to provide an intuitive window to reap the effects of the moving average approach, while leaving sufficient remaining life to make meaningful predictions. The studied range, from MA(1) to MA(5), provides coverage for the analysis over our intuitive window of n = 3. It is observed that the degree of suppression of bias and improvement to MAPE is not strictly increasing with the increase of moving average windows, as seen in Figure 30 and Figure 31, where the detrending bias from MA (2) is observed to be lower than that from MA(3). As such, the choice of the ideal moving average window (which very much depends on the time series pattern and evolution of the data) remains inconclusive and will require further investigation.
The effects of bias suppression and improvement to MAPE could also be visualized with a scatterplot. Figure 33 and Figure 34 show the scatterplots of MAPE against detrending bias for test sample 19 with training samples 3-13 pair and 5-13 pair, respectively. A "V" shape clusters of data points in the scatterplot suggests the strong correlation between the detrending bias and MAPE.

F. SUMMARY OF METHODS
The approaches we present in the sequence of Methods A to E above gradually suppress the detrending bias and achieve a better MAPE performance. Method A uses the training mean as a simple and practical estimate for the detrending mean. However, due to the inherent bias between the training and test data, it introduces detrending bias which results in significant MAPE compared to the ideal case of detrending with true test mean. Method B uses the observed test data to incrementally compute the detrending mean, which tends to the true test mean in the long run. While it eventually tends to the performance of the ideal case, it may suffer from initial instability. Method C uses an equally weighted mean of a pair of bounding training samples to overcome the initial instability of Method B, however, it imposes the restriction for the test sample to be completely bounded by the training samples, to yield good detrending bias and MAPE. To relax the "completely bounded" constraint and to cater for flexibility in training samples selection, while maintaining comparable MAPE performance to the equally weighted mean of training pair approach, a dynamic weighted mean of unbounded training pair is proposed. This approach, however, suffers from similar initial instabilities as in the observed test mean approach. Eventually, we propose a moving average pre-processing of the detrending means from the weighted mean of training pair, to smoothen the initial instability, especially when the test data is nonmonotonic and fluctuates across the degradation profile. The summary of the characteristics and performance of the various detrending approaches, with their respective pros and cons, is presented in Table 2. Given the sparse data used for the simulations, the computation load is not demanding. The computation time ranges from around 10 sec for the single training sample approaches, to approximately 20 sec for the training sample pair approaches. The regression is performed on a MacBook Air (Retina, 2018) with 1.6 GHz Dual-Core Intel i5 and 16GB 2133 MHz LPDDR3 RAM.

V. CONCLUSION AND FUTURE WORK
In the absence of extensive datasets and established physicsof-failure models, MOGPR is one of the few data-driven techniques that offers a practical approach to model nonmonotonic degradation profiles for prognosis in fleet storage scenario. We observe that the introduction of detrending bias reduces the prediction MAPE performance. As such, we were motivated to develop a framework to suppress this detrending bias further to achieve lower prediction MAPE. The bounded training pair approach provides a good MAPE but it requires the test sample to remain completed bounded by the training sample pair. In practical applications where the test sample data points are incrementally collected, it will not be possible to ensure, in advance, that the entire test sample time series remains completely bounded by the selected training sample pair. In this study, we extend the approach with dynamically evolving weights derived from the proximity of the test data to the training data pair, to address scenarios where the test sample transcends the training sample pair. This relaxes the bounding constraint while achieving similar MAPE performance, thus, yielding improved flexibility for the training samples selection. The technique, however, suffers from initial instability. As such, we further refined the approach with the moving average sliding windows to smoothen the fluctuation of the dynamic detrending mean and bias. This eventually resulted in a drastic reduction of MAPE to about 1%, comparable to that obtained with the ideal true test mean, regardless of whether the test sample is bounded, or not, by the training sample pair. However, the choice of the ideal moving average window remains inconclusive and will require further investigation.
Note that past research works that dealt with GPR mostly focused on active use loading condition data sets, wherein the extent of degradation is far higher than the detrending bias. In such scenarios, the use of the standard training mean approach (Method A) already yields very good outcomes. However, in the storage prognosis scenario, as the extent of degradation becomes comparable to the detrending bias, we are required to more carefully analyze the data using the Methods B-E discussed in this work to achieve similar accuracy standards of MAPE. As such, it is worth mentioning that the approach presented in this work is mostly suited only for storage prognosis scenarios and such intricacies are not relevant to prognosis under active use conditions.
In summary, this paper has laid the foundation to shift the fleet storage industry from traditional periodic inspection approach to condition-based maintenance by offering simple and practical MOGPR modeling framework. It dispels the misconception on the need for Big Data to leverage artificial intelligence and machine learning for fleet storage prognosis with the use of only sparse degradation data patterns. Based on the correlation observed between the detrending bias and MAPE, we have developed the framework to suppress detrending bias and achieve much better degradation prediction with an improvement in MAPE accuracy for MOGPR by 70% (from 3.3% to 0.87%) through detrending with dynamic weighted mean of unbounded training data sets using the moving average sliding window approach.
Future works may include investigations into the behaviors of the various windows of moving averages on MAPE for different degradation profiles, vis-à-vis different sampling rate. The optimal moving average window may depend on the extent of non-monotonicity in the degradation profile and the full width half maximum (FWHM) of any humps in the time series trend, as well as the density of data points. Alternatively, an operational perspective can be studied wherein the adoption of an incremental window size n that increases with the number of data points is feasible, given that the expectation on accuracy in the initial storage life is not as critical, and a progressive reduction in the MAPE in tandem with the increase in storage life may be acceptable. Finally, beyond detrending bias, there may be other factors, for instance, the (1) number of observations, (2) degree of correlation between the multiple outputs, (3) degree of noise, (4) number of iterative steps for hyper-parameter optimization, introduced during MOGPR that may affect MAPE. A systematic analysis to understand these potential error sources may support further suppression of the MAPE.