Hierarchical Load Forecast Aggregation for Distribution Transformers Using Minimum Trace Optimal Reconciliation and AMI Data

Overloading and load imbalance have a significant impact on the health of distribution transformers. The load of a distribution transformer can be considered in a hierarchical way: individual single-phase customers connected directly to the transformer (the bottom level), the load at each phase (the middle level), and the total load among three phases (the top level). Load at each hierarchical level can be predicted individually, known as “base forecast”, through a state-of-the-art forecasting method, but this practice often leads to incoherency and bias, i.e., forecasts at a lower hierarchical level are not aggregated correctly to the forecast at a higher-hierarchical-level. In this paper, a novel load aggregation technique based on minimum trace (MinT)-based optimal reconciliation is proposed to improve the accuracy of prediction models. Base forecasts at each hierarchical level are firstly determined using independent autoregressive integrated moving average (ARIMA) models; MinT is then used to optimally reconcile base forecasts to ensure higher accuracy. The proposed method is validated by case studies. Advanced metering infrastructure (AMI) data recorded by Saskatoon Light and Power in Saskatoon; Canada is used as historical data in this study.


I. INTRODUCTION
In distribution systems, distribution transformers must be highly reliable to ensure continuous power supply to consumers [1] as transformer failures may result in huge financial losses to utilities and consumers [2]. Although transformers normally have a long lifespan (exceeding 50 years), the health of a distribution transformer may deteriorate due to oil leakage, overloading, unbalanced loading, and harmonics [3], and their failure rate increases due to demand increase and certain load types, for example, electric vehicle charging can dramatically lower a transformer's lifespan by 200% to 300% [1], [4]. Transformer load forecasts are crucial for health monitoring and failure prevention caused by overloading and phase imbalance. A distribution transformer's load can be considered in a hierarchical way: the amount of electricity consumed by each customer is aggregated to determine the load of each phase (Phase A, B or C); and the load at each phase is then aggregated to determine the total load among three phases of the transformer. Additional levels may be added similarly for a substation transformer. Load forecast at the top level determines the total load among three phases, which can assist in the evaluation of a transformer's health and potential failures due to overloading and winding overheating. Load forecast at the middle level determines the load at each phase, which assists in evaluating the phase imbalance of a transformer, and corrective actions may be required by distribution system operators. Load forecast at the bottom level determines the consumption of individual customers.
Hierarchical Load Forecasting (HLF) has been applied in the context of power generation to improve prediction accuracy through forecast reconciliation. Several HLF methods were proposed, including the game-theoretic optimal projection (GTOP)-based reconciliation [5], the least squares-based optimal reconciliation [6], and a vector autoregression framework (VAR)-based probabilistic forecasting [7]. These methods were tested on wind and solar power generation data with varying degrees of success. For example, a HLF method using the Alternating Direction Method of Multipliers (ADMM) for wind power is proposed in [8] to reconcile base forecasts and ensure disaggregated forecasts are aggregated correctly, resulting in improved prediction accuracy. A review of cutting-edge techniques and their pertinence to day-ahead wind power generation prediction is conducted in [5]. In [9], hourly generation data in Brazil is utilized to prove the accuracy of the aggregated generation prediction at the top hierarchy level using top-down (TD), bottom-up (BU), and optimal reconciliation methods. Optimal reconciliation as a post-forecast technique produces an unbiased and coherent reconciled forecast [10].
Among very limited HLF research in the literature, most existing HLF methods focus on the top hierarchy level. In [11], a closed-loop clustering (CLC) method for HLF is employed for substation load using smart meters and photovoltaic (PV) power generation data, showing a better performance than traditional BU and TD approaches, but it only focuses on the top hierarchy level. A multiplicative autoregressive integrated moving average (m-ARIMA) method in [12], the ensemble methods in [13], and a multi-scale expert aggregation method based on random forest (RF) for BU in [14] also only focus on the top-level hierarchy and ignore forecasts at the lower hierarchy, which are important for transformer health monitoring as well. A generic short-term load forecast is proposed in [15] using load pattern similarity-based child node classification, and wavelet neural networks (WNNs), and it can be used for distribution substations, feeders, and transformers. To improve HLF, a reduced model approach (RMA) is employed for electricity demand forecasting [16], along with smart-meter data with high resolution and dimension. VOLUME 11, 2023 93473 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. Very few research is reported on HLF in transformers. Although independent models have been used for base forecasts at different levels of a transformer's load, if not aggregated properly, this practice can lead to poor forecast accuracy. The literature review on HLF is presented in Fig. 1.
To overcome the limitations of HLF methods, a novel hierarchical load forecast aggregation method is proposed in this paper through the minimum trace sample (MinTSa) and minimum trace shrinkage (MinTSh)-based reconciliation algorithm, where power consumption forecasts at all levels of distribution and substation transformers can be reconciled and aggregated accurately. In this study, the base forecast at each hierarchical level is firstly conducted using an ARIMA model (in fact, any existing forecasting method can be employed for base forecasts); the base forecast is then optimally reconciled using MinT to ensure coherent forecasts with high accuracy. The proposed method incorporates information about the correlation structure of time-series data, resulting in a minimum reconciled forecast error and great computational efficiency in a large time-series dataset. The proposed method is validated through several case studies and shows superior performance compared to four benchmark methods.
The main contributions of this paper include.
• A novel transformer load forecast aggregation technique is proposed by a MinTSa and MinTSh-based postforecast algorithm to optimally reconcile base forecasts obtained independently at each hierarchy level and to increase the accuracy of reconciled forecasts.
• The proposed method is independent of the base forecast method at each hierarchy level, hierarchical structure and forecast horizon. The paper is organized as follows: in Section II, the loading impact on a transformer's health and the purpose of transformer load forecast aggregation are introduced; in Section III, the theory of HLF reconciliation is provided; in Section IV, the proposed method is explained; in Section V, simulation results of several case studies are presented; and the conclusion is drawn in Section VI.

II. LOADING IMPACT ON TRANSFORMER HEALTH AND THE PURPOSE OF LOAD FORECAST AGGREGATION
Loading history is one important pointer for the transformer health index (HI) along with dissolved gas analysis (DGA), oil quality analysis (OQA), furanic content analysis (FA), and degree of polymerization (DP) [17], [18]. Overloading and unbalanced loads are major contributors to distribution transformer failures and degradation [2], [19].
The load increase leads to the hotspot temperature (HST) increase, reducing the insulation life of a transformer. A transformer's loading and environmental conditions have considerable impacts on its aging process. Loading in particular is directly related to the current in the winding, overloading raises temperature in the winding and oil and shortens its service life [17], [20].
Unbalanced loads are responsible for unbalanced winding losses, which in turn impact the top-oil temperature and HST per phase [21]. In recent years, the introduction of electric vehicles (EVs), and increasing distributed energy resources (DERs) in distribution systems have exacerbated the unpredictability of load behaviours in distribution transformers [22]. Therefore, the load forecast of a distribution transformer is essential for its health monitoring. Loading behaviours of domestic consumers directly influence the reliability risk of distribution transformers [23], and thus, load forecast for individual customers is also critical.
Load forecasting is crucial in establishing the health state of a transformer, which allows utilities to proactively address potential problems. It may be used to determine future loadings of a transformer, an important aspect to calculate its health index, HI, which can be modelled as a multivariant regression model as follows: where L is the loading factor, T is the temperature factor, IR is the insulation resistance factor, t is the time elapsed since the start of the measurement, and β 0 , β 1 , β 2 , β 3 , . . . , β k are regression coefficients estimated from the data. Additional benefits of load forecasting for distribution transformers include: • Help identifying anomalies: Accurate load forecasting may assist in the transformer anomaly detections, such as unanticipated changes in loading patterns or abrupt spikes in load demands. These anomalies may signal a possible transformer malfunction, such as winding failures or insulation deterioration, and may be utilized as a trigger for additional examination and maintenance.
• Support optimal maintenance scheduling: Based on the expected loading patterns of a transformer, accurate load forecasting may be used to plan maintenance operations at optimal times, which can limit the maintenance effect on electrical systems and reduce risks of unplanned outages or equipment failures.
• Improve reliability and resilience: Accurate load forecasting and transformer health monitoring may contribute to improved reliability and resilience of power systems by allowing early diagnosis of faults and proactive repair efforts. This may lower the likelihood of equipment failure, reduce downtime, and enhance the overall system performance.

III. THEORY OF HIERARCHICAL LOAD FORECAST RECONCILIATION A. NOTATION
Let P t be m-dimensional vector representing observations of all power consumption at time t, and b t be n-dimensional vector representing observations at the bottom hierarchical level, i.e., the advanced metering infrastructure (AMI) data. The general matrix representation of P t is [10] where S is the order m×n ''summing matrix'' that is used to aggregate the power consumption at the bottom-level hierarchy to the higher level. m represents the overall quantity of observations and n indicates the quantity of observations at the bottom level only. S contains hierarchical information, which can be obtained from the utility's geographical information system (GIS). Fig. 2 shows a hierarchical time series structure of the transformer loading. Each higher hierarchy (parent) is the sum of the lower hierarchy (children). Let P t represent the total transformer power at time t; P A,t , P B,t and P C,t be transformer per phase power, and P A1,t , P A2,t ,. . . P C3,t are power consumption (AMI data) of customers. In Fig. 2, n = 8 and m = 12. The hierarchical structure can be represented in a matrix form below, where I n is an identity matrix representing the bottom level with a dimension, n = 8.
Let's defineP T (h) as a vector in the same order as P t , which represents the h-step-ahead base forecast of each time series derived from historical data extended for time T . It should be noted that any base machine learning or statistical forecasting methods can be used to get the base forecast. Linear reconciliation methods can be represented by [10].
where G is the carefully selected reconciliation matrix of order n×m, andP T (h) is a collection of reconciled forecasts that are now constructively coherent. Forecast reconciliation techniques are based on the idea that a set of base forecasts can be mapped linearly to a set of reconciled forecasts. The objective of the G matrix is to convert the base forecasts to bottom-level disaggregated forecasts that are subsequently summed up by the S matrix. The two commonly used methods for HLF forecasting are BU and TD. For BU, we set G = [0 n×(m−n) |I n ], where 0 i×j is i×j null matrix with i = n, j = m − n, and I n is the identity matrix with n×n dimensions. The base forecast at the bottom-level hierarchy yielded fromP T (h) are aggregated using S to get BU reconciled forecasts. Similarly, we can set G= [∂|0 n×(m−1) ], where ∂= [∂ 1 , ∂ 2 . . . ∂ n ] is a proportion vector that disaggregates the base forecast at the top level to the bottom level, and then S is used to obtain TD reconciled forecasts [10]. As an example, the G matrices for BU and TD are illustrated in a matrix form for n= 4 and m= 6 as follows: whereê T (h)is the base forecast error for h-step-ahead,P T (h) is the base forecast for h-step-ahead, and P T +h is the current true value. A reconciled forecast with the minimum error variance is obtained such that SGS = S is satisfied [10]. For every G matrix that satisfies SGS = S, the covariance matrix of the reconciled h-step-ahead forecast errors can be obtained by whereP t (h) is determined by (3), and W h represents base forecast errors variance-covariance matrix for h-step-ahead.
Assuming that forecast errors are normally distributed, Eqs. (5) and (6) can be utilized to build forecast intervals for all forecasting methods that generate unbiased and coherent forecasts, assuming that original forecasts are unbiased. The quality of the reconciled forecast depends on the effectiveness of the estimator for W h . Residuals of the reconciled forecast are located at the diagonal of the V h matrix and the summation of values located at the diagonal of a matrix is known as a ''trace''. Therefore, the final target is to obtain the G matrix that will result in a minimum trace V h in (5) that satisfies SGS = S. (6) be the positive definite matrix. The optimal reconciliation matrix G that has the minimum trace of (5) and satisfies SGS = S, is given by (7).
Therefore, the reconciled forecast using the MinT method is given bỹ Using MinT optimal reconciliation, the accuracy of reconciled forecasts is equal to or greater than the accuracy of base forecasts.
Letε h (h) be the error of the bottom-level base forecasts. Its variance-covariance matrix is W h as in (6). Assuming that the error is additive,ê h (h) = Sε h (h). This results in W h = SV h S T to be singular. Eq. (5) can be further expressed by [10].
Eq. (9) highlights that V h is independent of the reconciliation matrix G. Considering the assumption of error-additivity, G that satisfies SGS = S is the solution of MinT with the minimum variance.
Eq. (7) provides the solution for the ordinary least square (OLS) as follows: The state-of-the-art methods to compute the covariance matrix W h are discussed below. These estimators are used as a benchmark to validate the proposed method.

C. THE BENCHMARK COVARIANCE MATRIX ESTIMATORS
In this section, state-of-the-art methods used to estimate W h are discussed.

1) ORDINARY LEAST SQUARES (OLS)
Set W h = k h I , ∀h, where k h > 0, W h is the variance matrix of every base forecast error at the bottom level of the hierarchy, I is the n×n identity matrix. Referring (3) and (10), the reconciled forecast using OLS is estimated by (11).
This method is only suitable in some situations, such as base forecast errors remain stochastic and equivariant. It has the most straightforward assumption, known as ''homoscedasticity'', and is applicable only if the identical method is applied to obtain base forecasts for each power consumption observation in all hierarchies of the transformer loading. The OLS estimator does not require assumptions when estimating the covariance matrix.

2) WEIGHTED LEAST SQUARES -VARIANCE SCALING
Set W h = k h diag Ŵ 1 , ∀h, where k h > 0, and W 1 is a sample covariance estimator of the base forecast error at h = 1. As expressed in (4) [10],ê T is a n-dimensional vector of residuals of the base forecast model, andê T (1) =P T +1 −P T (1). The weighted least squares (WLS) estimator utilizing variance scaling (VS) adjusts the base forecasts based on the variance of errors. The expression to computeP T (h) in (8) can be modified as follows to represent WLS-VS estimator: WLS-VS can handle heteroscedasticity, when there is a different variance of base forecast errors at different hierarchical time series levels caused by using different methods to obtain base forecasts. This is an important feature of transformer load forecast aggregation as it is not always possible to use the same forecasting method to get the best base forecasts due to the variety of loads, such as residential, industrial, EVs, and DERs. For this reason, the WLS-VS estimator provides better accuracy compared to OLS.

3) WEIGHTED LEAST SQUARES -STRUCTURAL SCALING
Set W h = k h , ∀h; k h > 0, = diag(S1), and 1 is a n-dimensional unit vector. The expression to computeP T (h) in (8) can be adapted as follows to represent the WLS-SS estimator: It is assumed that every bottom-level hierarchy base forecast error has a variance of k h and that the errors are not attributed to each other. As a result, each element of encompasses the quantity of forecast error variations attributable to the higher hierarchy level [24]. This estimator, known as the WLS estimator that uses structural scaling (SS), depends on the hierarchical structure. In contrast to OLS, this method is more realistic as it only requires that the variance of forecast errors is equal at the bottom level of the hierarchy. In addition, it is independent of historical data and can handle heteroscedasticity in hierarchical load forecast aggregation. It is mostly helpful when the base forecast is unavailable, for instance, at a very early stage of smart meters deployment by utilities, no sufficient historical samples are available. Table 1 shows the comparison among state-of-the-art benchmark covariance matrix estimators.

IV. THE PROPOSED METHOD
In the proposed method, MinTSa for smaller distribution networks and MinTSh for large distribution networks are used by considering information on the hierarchical structure and covariance of errors at the base forecast; and ARIMA [25] is chosen to simulate base forecasts due to its simplicity (Note: base forecasts can be done through any existing forecasting method, and it is not the focus of this study; the optimal reconciliation for base forecasts is the main focus).

A. THE PROPOSED COVARIANCE MATRIX ESTIMATORS 1) MIN-T SAMPLE (MINTSA)
The covariance matrix estimated by the MinT estimator is a n × n without zero elements. For MinTSa, we set W h = k hŴ1 , ∀h, where k h > 0, andŴ 1 is a sample covariance estimator of the base forecast error at h = 1. Given that only a sample ofŴ h at h = 1 is utilized, MinTSa is relatively simple compared to MinTSh. It works best when m < T , i.e., the total quantity of observations is lower than the quantity of historical data used. Practically, the number of customers served by a particular distribution transformer is not greater than 50 in most cases. For T to be greater than 50, historical data must at least cover 12.5 hours in a 15-minute resolution. Similarly, 250 hours (10.42 days) of data is required for a substation transformer with 1,000 customers.
The expression to computeP T (h) in (8) can be further expanded as in (15) to reflect the MinTSa estimator.
whereŴ 1,D is a diagonal of the matrixŴ 1 , and λ D is the shrinkage parameter. The off-diagonal components ofŴ 1 are substantially reduced toward zero, but the diagonal values (variances) remain unchanged. Assuming variances are constant, λ D can be computed by wherer ij is the (i, j) th component of the correlation matrix when h = 1 (one step ahead) shrinks it to become an identity matrix. In the case of MinTSa, λ D = 0, leading to W h = k hŴ1 . The MinTSh estimator is used when a large bottom-level series is present. For a higher hierarchy level at a substation transformer, MinTSh has a better performance than MinTSa. The expression to computeP T (h) in (8) can be further extended as follows to represent the MinTSh estimator: The main innovation of the proposed method compared to existing hierarchical load forecast methods is the new and efficient approach to compute the covariance matrix W h and to consider the covariance of the base forecast errors of two time series data in the hierarchy.

B. AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA)
ARIMA is a linear forecasting approach that works well with static time series data. In the first step, a static time series is built through differencing d times and performing different nonlinear operations like logging [26]. The resulting data is characterized as a linear transformation of previous p datasets and q errors as indicated in (19).
where P t is the true value at a given time t, ε t is an error, which is Gaussian distributed (0,σ 2 ) characterized white noise, i is autoregressive (AR) coefficients for (i= 1, 2, . . . ,p), and j are moving average (MA) coefficients for (j= 1, 2, . . . ,q). The integers p and q represent orders of AR and MA, respectively. The important parameters to model ARIMA are p, q and d.

C. ALGORITHM OF THE PROPOSED METHOD
The proposed optimal reconciliation method is MinT, which has two variations: MinTSa and MinTSh. The base forecast and real values are used to calculate errors, then the hierarchical tree information is used along with the reconciliation algorithm to obtain the reconciled forecast, and a lower error is ensured by comparing them to the base forecast. The ultimate task of the reconciliation algorithm is to reduce the error of the reconciled forecast, in turn improving its performance. The base forecast errors of transformer loading in the hierarchy are correlated, resulting in the elements outside the diagonal of the covariance matrix W h to have non-zero values. The proposed method can better handle this situation than the benchmark covariance matrix estimators. The proposed method is shown in Fig. 3 and Table 2.

D. FORECAST EVALUATION METRICS
Error metrics to evaluate the proposed method include the mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root-mean-square error (RMSE). MAE measures the average absolute differences, MAPE measures the average percentage differences, and MSE calculates the average squared differences between predicted and actual values. RMSE is the square root of MSE, providing a more interpretable measure. Lower values of these metrics indicate better model performance, and thus, they offer a quantitative comparison and evaluation  of different models' accuracy. They can be mathematically expressed by (20) - (23).
where P i is the true value,P i is the forecasted value, n refers to the number of samples, and i is the index.

V. CASE STUDIES
To validate the proposed method, five case studies are conducted using three test grids.

A. TEST GRIDS
Test systems A, B and C are used to validate the proposed method.

FIGURE 4. Test system A.
Test system A is composed of one 50 kVA-15/ 0.21 kV-Delta/Wye distribution transformer, 10 domestic customers and 3 industrial customers. The grid parameters are obtained from [27]. The details of Test System A are illustrated in Fig. 4. Test system B is the IEEE European network [28], [29] and is modified to represent a North American grid. It operates as an unbalanced grid, delivering single-phase connections to 55 domestic and industrial customers, with 21, 19 and 15 connected to phases A, B and C, respectively. Fig. 5 shows a detailed geographical single-phase diagram of Test System B. Fig. 6 shows Test system C, a more complicated distribution grid composed of one 2 MVA 132/14.4 kV substation transformer, twenty 14.4/0.21 kV distribution transformers, and 1,000 customers (18% are industrial customers and the rest are domestic customers). Power ratings of distribution transformers range from 25 to 200 kVA, and the number of customers connected to each transformer ranges from 13 to 62. Test system C has one more hierarchy than Test systems A and B. 93478 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

B. AMI DATASET DESCRIPTION
The electricity power consumption data is provided by Saskatoon Light and Power, a distribution utility in Saskatoon, SK, Canada. The data has a resolution of 15 minutes for over 65,000 customers in the industrial, commercial, and residential sectors. A dataset for 1,000 customers is selected for this paper, which was recorded from 0:15 AM on March 1, 2017, to 11:45 PM on March 1, 2018. With a time resolution of 15 minutes, there are 35,136 steps in the dataset and a one-time horizon h = 15 minutes. In the proposed method, to get base forecasts using ARIMA, 80% of the data is used for training the model, and 20% is used for testing the performance of the model. In Cases 1-4, optimal reconciliation is performed at all three hierarchical levels of the transformer loading: 1) the total load -top level (Level 0), 2) the three-phase load -middle level (Level 1), and 3) customer consumption -bottom level (Level 2). In Case 5, one higher hierarchy level is added to incorporate a load for a substation transformer.

1) CASE 1 -SMALL NETWORK SIMULATION USING ONE MONTH DATA
In Case 1, one-month historical data is used for base forecasts, optimal reconciliation is then conducted using the proposed method with such limited data using Test System A. In this case, the accuracy improvement of the reconciled forecast vs. base forecasts evaluated by MAE is shown in Table 3 for all hierarchical levels and six forecast horizons. MinTSa outperforms other methods, followed by MinTSh. At the forecast horizon h = 2 and hierarchy level 1, there is a 7.41% improved accuracy. The proposed method (MinTSa and MinTSh) performs well for load forecasts at the middle level, but MinTSa performs better than MinTSh for a lower number of customer load forecasts at the bottom level. Among the four benchmark methods, BU performs the worst with negative values, indicating lower accuracy than base forecasts. BU provides 0% improved accuracy at the bottom level, so it is only suitable for a higher-level load forecast. Table 4 shows improved accuracy evaluated by MAPE. MinTSa performs better than others in Levels 0 and 1 for all six horizons, but MinTSh performs better at Level 2. Bold numbers in the table show the best accuracy improvement. Although benchmark methods (except BU) have acceptable results, they are less accurate than the proposed method.
In Table 5, MinTSa has the highest improved accuracy of 9.82% for the forecast horizon h = 2at level 1 evaluated by MSE. As the forecast horizon increases, the performance of MinT tends to decrease at all levels. However, the proposed method is still suitable for short/medium horizon prediction in distribution systems.  Table 6 shows the accuracy improvement of the proposed reconciled forecast vs. base forecasts evaluated by RMSE. MinTSa shows the best performance, followed by MinTSh. MinTSa has the highest improved accuracy among all linear reconciliation methods, including OLS and WLS, with the most accurate and coherent forecasts at all hierarchy levels.
The accuracy improvement of the overall hierarchical tree structure for Case 1 is shown in Fig. 7. The proposed reconciliation models based on trace minimization yield accurate and consistent predictions at all hierarchical levels. Fig. 8 shows a comparison between the true, predicted, and reconciled values for the first 100 samples. The reconciled lines (black and blue solid lines) are closest to the true line (dashed light blue line) with the lowest errors. MinTSa and MinTSh show consistent performance across all prediction horizons.

2) CASE 2 -SMALL NETWORK RANDOM-ORIGIN SIMULATION USING ONE MONTH DATA
In Case 2, one-month historical data is used for base forecasts in Test System A. The simulation for the forecast horizon, h = 1, is conducted for one particular time snap; and again for 30 random time snaps, and their average is taken. The objective of Case 2 is to show that the proposed method for VOLUME 11, 2023   Table 7. ''h= 1 (30)'' means the average of 30 simulations, and ''h = 1'' means the result of one simulation. In fact, ''h= 1 (30)'' shows even better accuracy than ''h = 1'' for most hierarchy levels and accuracy metrics for both MinTSa and MinTSh. 93480 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.  A comparison of the true, predicted, and reconciled forecasts for phases A, B, C and the total power of three phases for h= 1(30) is presented in Fig. 9. The superiority of the proposed method is confirmed as the MinTSa and MinTSh reconciled lines are closest towards the true line, showing the tendency of the proposed reconciliation method to minimize the error.

3) CASE 3 -SMALL NETWORK SIMULATION USING ONE YEAR DATA
In Case 3, one-year historical data is used for base forecasts in Test system A, and the proposed method is then applied for optimal reconciliation to determine if it can handle high data variance in one year due to seasonal changes, which causes dramatic changes of electricity consumption behaviours of customers using Test system A. In Case 3, RMSE is used for evaluation. In Table 8, the average accuracy improvement of MinTSa is the highest at all levels followed by MinTSh, indicating the robustness of the proposed method subjected to large variances in the dataset. The accuracy improvement of the overall hierarchical tree structure of the proposed method vs. benchmark methods for Case 3 is shown in Fig. 10, and the proposed method shows superior performance over benchmark methods. MinTSa (the green line) is always on the top, mostly followed by MinTSh (the blue dark line).
The comparison of the true, predicted, and reconciled forecasts using the proposed method and four benchmark methods for the first 100 samples in Case 3 is shown in Fig. 11.   Reconciliation algorithms push the predicted line, which is the result of ARIMA, toward the true line to minimize the error. The error is a vertical distance between the true and Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

4) CASE 4 -UNBALANCED MEDIUM NETWORK SIMULATION
Test system B is used in Case 4 to simulate load forecast aggregation using the proposed method for an unbalanced medium size network with 55 customers. The simulation is done for two forecast horizons, h = 1, and 32, and one-month historical data is used to generate base forecasts. Table 9 shows the accuracy improvement of reconciled forecasts compared to base forecasts according to error indices, MAE, MAPE, MSE, and RMSE, for all levels of the hierarchy. MinTSa outperforms other methods, followed by MinTSh. The proposed methods (MinTSa and MinTSh) have the highest improved forecast accuracy at the middle level. The bold numbers in Table 9 indicate the best accuracy improvement.
Although the benchmark methods (except BU) show some improvements, they are less accurate than the proposed method. The highest accuracy improvements scored by the proposed methods are 8.51%, 7.27%, 13.47%, and 6.88% for MAE, MAPE, MSE, and RMSE, respectively. Among the benchmark methods, BU performs the worst with negative values, indicating lower accuracy than base forecasts. BU provides 0% improved accuracy at the bottom level, so it is only suitable for higher-level load forecast aggregation. MinTSa and MinTSh show consistent performance across both horizons. Fig. 12 shows the comparison of the true, predicted, and reconciled forecasts for the first 100 samples of Case 4 (h = 1 and 32 forecast horizons for phases A, B, C, and for the total power among three phases). MinTSa (blue solid line) and MinTSh (black solid line) are the closest to the true line (dashed blue line) compared to the lines of the benchmark methods. The superiority of the proposed method is confirmed as the MinTSa reconciled line is the closest to the true line, showing the tendency of the proposed reconciliation method to minimize the error (a vertical distance between the true and reconciled line). Case 4 demonstrates that the proposed method performs well in a medium-sized distribution grid.
(a four-hierarchical-level system). The simulation is conducted for two forecast horizons,h = 1 and 32, and one-month historical data is used to generate base forecasts. The average accuracy improvements using the proposed method for the two forecast horizons at each hierarchical level are evaluated by RMSE and MAPE as shown in Tables 10 and  11, respectively.
In Case 5, MinTSh has the highest MAPE average accuracy improvement of 9.65% at the top level (Level 0) for h = 1. MinTSh shows a similar or better performance than MinTSa mainly because MinTSh has a better strategy to deal with data variance when the number of the bottom level's observations (b t ) is high. MinTSa and MinTSh have the highest accuracy improvement compared to the benchmark methods.
OLS and WLS-SS do not require parameters to estimate the covariance matrix, W h , and only use the information contained in the hierarchy structure; while WLS-VS, MinTSa and MinTSh depend on errors in the historical data. Only MinTSa and MinTSh consider the covariance of the base forecast errors of two time series data in the hierarchy [6]; while OLS, WLS-VS, and WLS-SS are based on the assumption that the base forecast errors between two time series are stochastic. This is the main reason that the proposed MinT-based hierarchical load forecasting performs better than the benchmark methods.

VI. CONCLUSION
In this paper, a novel transformer load forecast aggregation method through the minimum trace optimal reconciliation and AMI data is proposed, along with an independent stateof-the-art base load forecast at each hierarchical level. The proposed method aims to exploit the information contained in the hierarchical structure and improve the accuracy of traditional load forecasting practices for distribution and substation transformers. It is proven through several case studies that the proposed method significantly improves the load forecast accuracy and has superior performance compared to the four chosen benchmark methods (BU, OLS, WLS-VS, and WLS-SS) in simple and complex distribution networks. Although the primary purpose of this work is for transformer health monitoring, the proposed method is also applicable for distribution grid rebalancing and renewable power generation aggregation.
The recommended future work includes: • Incorporating the Grid Topology Information: Integration of the grid topology information into the hierarchical load forecasting process can be done by considering spatial relationships among different grid components, such as transformers, substations and feeders, to enhance the forecasting accuracy.
• Integrating Demand-Side Management: Integration of demand-side management strategies into the hierarchical load forecasting model allows for load shedding, demand response, and peak shaving to be incorporated as part of the forecasting process.
• Uncertainties Propagation in Hierarchical Forecasts: Techniques can be developed to propagate uncertainties across different levels of the hierarchy. The uncertainty at a higher level may affect forecasts at lower levels, and accounting for these uncertainties can improve the overall reliability of load forecasting.
• Dynamic Hierarchical Structure: Dynamic adjustment of the hierarchical structure based on changing load characteristics or grid conditions involves automatically restructuring the hierarchy to adapt to load variations and system changes.