Building Marginal Pattern Library With Unbiased Training Dataset for Enhancing Model-Free Load-ED Mapping

Input-output mapping for a given power system problem, such as loads versus economic dispatch (ED) results, has been demonstrated to be learnable through artificial intelligence (AI) techniques, including neural networks. However, identifying and constructing a comprehensive dataset for training such an input-output mapping remains an open challenge. Conventionally, load samples are generated by a pre-defined distribution

The 2016 victory of AlphaGo, a computer program that defeated one of the world's strongest human Go players, demonstrated the potential of artificial intelligence (AI) for solving complex decision-making problems [1]. The continuous development of AI is profoundly affecting everyday life and industry. The increasing digitalization of the power grid and rapid gains in computing capability are making AI-enhanced power systems possible [2], and AI techniques are reshaping traditional power system planning and operations. Reference [3] compared power system AI with the AlphaGo program and sketched promising prospects for implementing AI techniques in power systems. Reference [4] analyzed the opportunities and challenges of adapting and developing AI techniques for transmission, distribution, microgrids, and multi-energy systems.
One of the most recent applications of AI in power systems is economic dispatch (ED), which is essentially a security-constrained optimal power flow (OPF) problem. Once the ED is solved, some results, such as generation dispatches, unsupplied loads, and total system cost, are directly available, while indirect results such as reliability indices and locational marginal prices (LMPs) can be readily derived.
The ED problem must be solved repeatedly within short time frames during daily operations. Therefore, recent research has attempted to predict the results of the OPF-based ED problem directly through neural networks without solving the optimization model. In ref. [5], various regression models, including a support vector regression model and a fully connected neural network, were applied to predict optimal dispatch results from load and contingency data. Similarly, ref. [6] built a neural network to learn the mapping between loads and generation dispatches. In ref. [7], a graph neural network was constructed to predict the optimal dispatch results from loads. In ref. [8], a deep neural network was combined with the Lagrangian dual method to improve the accuracy of optimal dispatch predictions.
Most recently, ref. [9] developed the DeepOPF approach, which applies a deep neural network to predict the dispatch result of a linearized OPF problem. With a linearized power flow, the OPF-based ED problem becomes convex. Although the linearized ED problem can generally be solved efficiently, its computational burden can still be significant, depending on the application [10], [11]. Some studies have applied data-driven learning techniques to identify specific patterns of the linearized ED problem instead of predicting ED outputs directly. In ref. [12], a neural network was proposed to predict the umbrella constraints that form the feasible region of the ED problem. In ref. [13], a neural network classifier was proposed to learn the binding constraints of the ED problem. In ref. [14], statistical learning was applied to learn the mapping between the optimal basis of ED and uncertainties.
In summary, previous research has applied different types of neural networks to predict four types of direct or indirect outputs of the ED problem: (1) the optimal dispatch results; (2) the optimal cost; (3) the reliability index/status; and (4) the characteristics of the optimal solutions. However, the training dataset (i.e., loads versus the above four outputs) is generally produced by a pre-defined distribution without considering the intrinsic characteristics of the ED problem. For example, refs. [8] and [9] apply a uniform distribution to generate load samples, whereas refs. [13] and [14] use a normal distribution. In general, the most straightforward way to generate a large set of load samples is to sample directly from a given distribution. However, this paper demonstrates that randomly generated load samples are biased with respect to ED outputs, such as generator dispatches and LMPs. To address this, three algorithms are proposed to construct a marginal pattern library, and a fourth algorithm is proposed to enhance the dataset for model-free applications in ED and LMP calculations. Overall, this paper provides a more systematic way to generate load samples for the training dataset of load-ED mapping.
The main contributions of this paper are two-fold:
• This work identifies that a randomly generated dataset is biased for ED output prediction, even when the dataset is large. The loading intervals of different marginal patterns differ significantly in size, so a randomly generated dataset may overfill the large intervals and underfill the small ones. Notably, a large loading interval is not necessarily more important than a small one: load vs. ED outputs may vary considerably within a small loading interval during intra-day operation, so it is important to understand the behavior of load vs. ED in such intervals.
• This work proposes three algorithms to construct a marginal pattern library and examine the dataset: (1) a comprehensive enumeration algorithm; (2) an iterative search algorithm; and (3) a fast screening algorithm. An enhancement algorithm is also proposed to enhance the training dataset according to the marginal pattern library. The characteristics of the proposed algorithms are illustrated with several examples, and a comparative study demonstrates the effectiveness of the enhancement algorithm.
VOLUME 9, 2022
The rest of this paper is organized as follows. Section II briefly reviews the formulation of the ED problem. Section III discusses why a randomly generated dataset is biased for predicting ED-based problems. In Section IV, the three proposed algorithms for marginal pattern collection and the dataset enhancement algorithm are presented with illustrative cases. Section V describes a comparative study demonstrating the effectiveness of the enhanced dataset. Finally, Section VI concludes the paper.

II. PRELIMINARIES ON ED AND LMPs
The ED problem is formulated as a linearized OPF problem by most ISOs for computational reasons [15]. A general ED problem with line flow limits and unit capacity constraints is given in (1a)-(1d). Other technical constraints, such as N-1 contingency scenarios and reserves, could be added by modifying the constraint set of problem (1), but they are omitted here for simplicity of illustration.
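For reference, a lossless DC-OPF-based ED of this kind is commonly written as below. This is a sketch in our own assumed notation, not a reproduction of (1a)-(1d): c_i and P_i are the cost and output of unit i, G_k is the set of units at bus k, d_k is the load at bus k, GSF_{l,k} is the generation shift factor of line l with respect to a reference bus, F_l is the line limit, and λ, μ_l^-, μ_l^+ are the multipliers of the balance and line constraints.

```latex
\begin{aligned}
\min_{P}\quad & \sum_{i} c_i P_i \\
\text{s.t.}\quad & \sum_{i} P_i = \sum_{k} d_k && (\lambda)\\
& -F_l \le \sum_{k} GSF_{l,k}\Big(\sum_{i \in \mathcal{G}_k} P_i - d_k\Big) \le F_l \quad \forall l && (\mu_l^{-},\ \mu_l^{+})\\
& 0 \le P_i \le P_i^{\max} \quad \forall i
\end{aligned}
```

Under this sign convention, the LMP defined in (2) evaluates to LMP_k = λ - Σ_l GSF_{l,k}(μ_l^+ - μ_l^-), so the LMP at the reference bus equals λ.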
The LMP is calculated after the ED result in (1a)-(1d) is obtained, so it is an indirect result of the ED problem. The LMP pricing scheme has been widely adopted in U.S. electricity markets to provide economic signals to market participants. The LMP at a particular bus is defined as the marginal increase in total dispatch cost per marginal increase in load at that bus, as given in (2) [16].
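To make definition (2) concrete, the minimal Python sketch below solves a lossless DC-OPF-based ED with SciPy's `scipy.optimize.linprog` (HiGHS solver) and estimates each LMP as the cost increase per extra MW of load at one bus, i.e., a forward difference. The 3-bus system, its shift factors, costs, and limits are hypothetical illustration data, not taken from the paper, and `solve_ed` and `lmp_by_perturbation` are our own helper names.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-bus triangle (not from the paper): lines 1-2, 1-3, 2-3 with equal
# reactances; bus 1 is the reference bus, so its GSF column is zero.
GSF = np.array([[0.0, -2/3, -1/3],    # line 1-2
                [0.0, -1/3, -2/3],    # line 1-3
                [0.0,  1/3, -1/3]])   # line 2-3
F_MAX = np.array([1000.0, 60.0, 1000.0])  # MW; only line 1-3 can realistically congest
GEN_BUS = np.array([0, 1])                # unit 1 at bus 1, unit 2 at bus 2
COST = np.array([10.0, 30.0])             # $/MWh
P_MAX = np.array([200.0, 200.0])          # MW


def solve_ed(d):
    """Lossless DC-OPF-based ED: min c^T P s.t. balance, line limits, unit limits."""
    H = GSF[:, GEN_BUS]                   # line-flow sensitivity to unit outputs
    load_flow = GSF @ d                   # flow term caused by the nodal loads
    # Line flows are H @ P - GSF @ d and must stay within [-F_MAX, F_MAX].
    A_ub = np.vstack([H, -H])
    b_ub = np.concatenate([F_MAX + load_flow, F_MAX - load_flow])
    res = linprog(COST, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, len(COST))), b_eq=[d.sum()],
                  bounds=[(0.0, p) for p in P_MAX], method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x, res.fun


def lmp_by_perturbation(d, eps=0.1):
    """Forward-difference version of (2): extra cost per extra MW of load at each bus."""
    _, base_cost = solve_ed(d)
    lmp = np.empty(len(d))
    for k in range(len(d)):
        d_k = d.copy()
        d_k[k] += eps
        lmp[k] = (solve_ed(d_k)[1] - base_cost) / eps
    return lmp


for d3 in (80.0, 100.0):                  # bus-3 load below and above line 1-3 congestion
    d = np.array([0.0, 0.0, d3])
    P, cost = solve_ed(d)
    print(d3, P.round(3), round(cost, 3), lmp_by_perturbation(d).round(3))
```

With these numbers, line 1-3 congests once the bus-3 load exceeds 90 MW, so the bus-3 LMP jumps from 10 to 50 $/MWh between the two cases, which is the kind of step change discussed in Section III. The forward difference is only meaningful when the marginal pattern does not change within `eps`; exactly at a CLL it returns a one-sided value.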

III. THE BIASED TRAINING DATASET FOR DATA-DRIVEN ED MODEL
ED results, such as generation dispatches and LMPs, exhibit a distinctive characteristic called "step change" [17] at specific system load levels. The load level at which a step change of LMP occurs is referred to as a critical load level (CLL). As discussed in [17], when the loading level varies within a certain interval, the marginal unit outputs and line flows also vary with the loads according to a fixed pattern. In this paper, loading level refers to the sum of the loads in the system, i.e., the system loading level. This phenomenon is also identified as "system pattern regions" in [18] and [19]. When the loading level leaves the interval, the pattern changes abruptly, accompanied by a step change of LMP [20].
If the loads at all buses vary according to a set of participation factors, each marginal pattern corresponds to a continuous loading interval. If the load at each bus changes individually, each marginal pattern corresponds to a multi-dimensional region (also referred to as a loading interval in this paper). In either case, some marginal patterns correspond to large loading intervals, while others correspond to small ones. Fig. 1 shows an illustrative example of marginal patterns and the CLLs at which price step changes occur. The loading intervals between adjacent CLLs for MP1, MP5, and MP6 are larger than those for MP2, MP3, and MP4. The key observations on marginal patterns (MPs) and loading levels in Fig. 1 are as follows:
• If load samples are generated according to pre-defined distributions (e.g., uniform, normal, or Weibull), a marginal pattern with a large interval will receive many more training samples than one with a small interval. For example, if the dataset is generated from a uniform distribution, the probabilities of samples landing in the different marginal patterns are those shown in Table 1, where the percentages are rounded to the nearest integer. Most training samples fall in MP1, MP5, and MP6, and only a small portion fall in MP2, MP3, and MP4. At first glance this seems reasonable, since larger intervals receive more samples. However, it may leave a small interval with too few training samples to achieve good results. In plain language, a large interval may receive an unnecessarily large number of training samples, while a small interval may not receive enough, possibly zero samples in an extreme case.
• A small interval is not necessarily less important than a large one. Load vs. ED outputs may vary considerably within a small loading interval during intra-day operation, so it is important to understand the behavior of load vs. ED in such an interval.
• Ideally, every marginal pattern should have a sufficiently large number of training samples; preferably, each interval should have the same number of samples, regardless of its width. In summary, if the training and test datasets are generated together by a pre-determined distribution, the test dataset also contains few samples in marginal patterns with small intervals, and the data-driven prediction in these patterns is less accurate. In other words, a biased training dataset eventually leads to a biased neural network. A detailed example is provided in Section V for comparative case studies.
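A quick numerical illustration of the first observation, sketched with made-up CLLs (they are not the values behind Fig. 1 or Table 1): sampling the system loading level from a uniform distribution and counting the samples per interval shows that the narrow patterns MP2-MP4 receive only about 1% of the samples each. The helper `samples_per_pattern` is hypothetical.

```python
import numpy as np

# Hypothetical CLLs (MW) bounding six marginal patterns MP1..MP6; MP2-MP4 are narrow.
CLL = np.array([0.0, 300.0, 310.0, 320.0, 330.0, 600.0, 900.0])


def samples_per_pattern(loads, cll=CLL):
    """Count how many system loading levels fall into each CLL interval."""
    idx = np.searchsorted(cll, loads, side="right") - 1   # interval index of each sample
    idx = np.clip(idx, 0, len(cll) - 2)                   # a load equal to cll[-1] joins the last interval
    return np.bincount(idx, minlength=len(cll) - 1)


rng = np.random.default_rng(0)
loads = rng.uniform(CLL[0], CLL[-1], size=10_000)   # "pre-defined" uniform distribution
counts = samples_per_pattern(loads)
for mp, (n, w) in enumerate(zip(counts, np.diff(CLL)), start=1):
    print(f"MP{mp}: width {w:5.0f} MW, {int(n):5d} samples ({100 * n / counts.sum():.1f}%)")
```

Each interval's share simply tracks its width (10 MW out of 900 MW is about 1.1%), which is the bias that the enhancement in Section IV targets.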

IV. TRAINING DATASETS EXAMINATION AND ENHANCEMENT ALGORITHMS
The lack of training samples in marginal patterns with small loading intervals calls for efficient examination and enhancement methods. This section consists of two parts: (1) three algorithms are proposed to collect marginal patterns and build a marginal pattern library against which the training dataset is examined; (2) if the library contains patterns that are missing from the training dataset, an enhancement algorithm generates training data for those missing marginal patterns to eliminate the bias.

Algorithm CE (Comprehensive Enumeration):
An optimal solution of the linear program (1) can always be found at an extreme point of the constraint set [23], i.e., at an intersection of binding constraints. Therefore, solving problem (1) is equivalent to solving a system of linear equations, so the optimum of problem (1) can be written in matrix form as in (3). The generation variables are divided into the outputs of marginal units (MG) and non-marginal units (NG). The binding (congested) and non-binding (uncongested) line flow constraints are indicated by CL and UL, respectively. Equation (3) holds at any such optimal solution of problem (1).
For a given network, the GSF values are constant. Thus, a sensitivity matrix W between the loads and the basic variables (i.e., P_MG and S_UL) can be obtained as in (4); W gives the changes in marginal unit outputs and in uncongested line flows caused by a load increase at each bus.
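To make (3) and (4) concrete, one possible way to write them is sketched below in our own assumed notation, not as the paper's exact equations. Let H be the GSF matrix restricted to the columns of buses hosting units, with H_{CL,MG} denoting its congested-line rows and marginal-unit columns (and similarly for the other blocks). With non-marginal outputs P_NG fixed at their bounds, congested flows fixed at their limits ±F_CL (the sign giving the congestion direction), and x = [P_MG; S_UL] as unknowns, the power balance, the congested-line equalities, and the definitions of the uncongested flows form a linear system whose derivative with respect to the nodal loads d is W:

```latex
\underbrace{\begin{bmatrix}
\mathbf{1}^{\top} & \mathbf{0}\\
H_{CL,MG} & \mathbf{0}\\
-H_{UL,MG} & I
\end{bmatrix}}_{A}
\begin{bmatrix} P_{MG}\\ S_{UL} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{1}^{\top}d-\mathbf{1}^{\top}P_{NG}\\
\pm F_{CL}-H_{CL,NG}\,P_{NG}+GSF_{CL}\,d\\
H_{UL,NG}\,P_{NG}-GSF_{UL}\,d
\end{bmatrix},
\qquad
W=\frac{\partial}{\partial d}\begin{bmatrix} P_{MG}\\ S_{UL} \end{bmatrix}
=A^{-1}\begin{bmatrix}\mathbf{1}^{\top}\\ GSF_{CL}\\ -GSF_{UL}\end{bmatrix}.
```

Here line flows are taken as S = H P - GSF d, i.e., injections are measured against the GSF reference bus. A is square exactly when the number of congested lines equals the number of marginal units minus one, which matches the condition in (9).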
The selection of MG and UL in (4) determines the marginal pattern. A marginal pattern is uniquely labeled by its LMPs, as shown in (5), and each marginal pattern corresponds to a loading interval bounded by CLLs. Note that the outputs of the ED problem, such as optimal dispatches, marginal patterns, and LMPs, are consistent, which means that identifying one of them is equivalent to identifying all of them [21]. The following discussion therefore focuses on identifying LMPs.
Once the binding constraints of the market-clearing model (1) are determined, the LMPs are also determined. LMPs can be represented as the cost of serving the next increment of load with the marginal units, as shown in (6).
By substituting (4) into (6), LMPs can be formulated as in (7), where W_MG denotes the rows of W that correspond to the marginal units. The values of W_MG are determined by the set of marginal units and congested lines.
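As a hypothetical numerical check of (7), not taken from the paper, consider a 3-bus triangle with equal line reactances and bus 1 as the GSF reference, a 10 $/MWh unit at bus 1, a 30 $/MWh unit at bus 2, load at bus 3, and line 1-3 congested. Both units are marginal, and solving (4) for the marginal-unit rows gives (columns: loads at buses 1, 2, and 3):

```latex
W_{MG}=\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & 2 \end{bmatrix},
\qquad
\mathrm{LMP}^{\top}=c_{MG}^{\top}W_{MG}
=\begin{bmatrix} 10 & 30 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & 2 \end{bmatrix}
=\begin{bmatrix} 10 & 30 & 50 \end{bmatrix}\ \$/\mathrm{MWh}.
```

The third column says that serving 1 MW more at bus 3 takes 1 MW less from the cheap unit and 2 MW more from the expensive one. With shift factors of line 1-3 (measured from bus 1 to bus 3) equal to 0, -1/3, and -2/3 for buses 1, 2, and 3, this redispatch leaves the congested flow unchanged: (-1/3)(2) - (-2/3)(1) = 0.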
From (7), any combination of congestion patterns and marginal unit patterns (i.e., any potential marginal pattern) produces LMPs. However, some of the resulting LMPs are invalid, because the corresponding combinations cannot occur. For a specific system, the numbers of units and transmission lines are both limited, so the numbers of potential marginal patterns and LMPs are also finite. Enumerating the potential marginal patterns therefore gives all possible LMP values. The number of combinations is given in (8).
However, (3) can be solved only if its first matrix is invertible, which requires the number of binding line flow constraints to equal the number of marginal units minus 1, as shown in (9). Therefore, the value of N_co can be reduced as in (10).
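Under two assumptions of ours (each congested line may bind in either direction, and each non-marginal unit sits at either zero or its maximum output), imposing (9) on a direct count gives one possible closed form for N_G units and N_L candidate lines. It is a sketch and not necessarily the expression used in (10):

```latex
N_{co}=\sum_{m=1}^{N_G}\binom{N_G}{m}\binom{N_L}{m-1}\,2^{m-1}\,2^{N_G-m}
=2^{N_G-1}\binom{N_G+N_L}{N_L+1},
```

where m is the number of marginal units, and the second form follows from Vandermonde's identity. For example, N_G = 2 and N_L = 3 give 2 × C(5,4) = 10 candidates; the count grows quickly with N_L, which is why restricting the candidate lines matters.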
The above steps give all potential LMP values by enumerating all potential marginal patterns. However, some of these patterns cannot occur under any load.
Problem (1) can be equivalently represented by a system of constrained equations, i.e., its Karush-Kuhn-Tucker (KKT) conditions [22]. Traditionally, the load d_i is known, and solving the KKT system yields the Lagrangian multipliers, from which the LMPs are constructed. Here, however, the marginal pattern and LMPs are given, and multiple load patterns may be consistent with them. If at least one load pattern leads to the given marginal pattern and LMPs, the pattern and LMPs are valid. Therefore, the load d_i at each bus is treated as a variable, and the existence of a solution to (11) and (12) is examined.
[KKT system of problem (1).]
The KKT system is a necessary and sufficient optimality condition for the convex problem (1). Therefore, if the KKT system is solvable, its solution is an optimal solution of problem (1). In (11), the LMP values are specified. If (12) has any solution, then a load pattern exists that produces the specified LMPs and marginal pattern. Thus, for each combination from (10), equations (7), (11), and (12) are solved to remove invalid marginal patterns. Although the set of potential combinations is generally large, the number of lines that can actually congest is generally much smaller than the number of branches, which further reduces N_co. For example, the ISO New England system has 2,771 branches, but in January 2020, its winter peak month, active transmission constraints involved only 142 branches on average [11].
The detailed procedure of this comprehensive enumeration is given in Algorithm CE, where CE stands for "comprehensive enumeration."
Algorithm CE
…
    Obtain the potential LMPs with (7).
    Solve equation set (11) and (12).
    If (11) and (12) are solvable do
        Record the marginal pattern and LMPs.
    Else
        Continue.
    End if
End for
Return the marginal pattern and LMPs library.
The proposed Algorithm CE is demonstrated on the PJM 5-bus system [24], [25]. The marginal pattern library for this test system is constructed by Algorithm CE, as shown in Table 2. Any load sample in this test system corresponds to one of the marginal patterns in Table 2, so future research may validate implementations of Algorithm CE by comparing their results with Table 2. Algorithm CE provides a comprehensive enumeration method for collecting marginal patterns and LMPs. This test system has 85 potential combinations, 70 of which are invalid and removed; for example, unit 1, unit 4, and unit 5 cannot be marginal units simultaneously under any load pattern. The whole process takes 89.81 s.
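The Python sketch below runs an Algorithm CE-style enumeration on a hypothetical 3-bus triangle (equal reactances, bus 1 as GSF reference, a 10 $/MWh unit at bus 1, a 30 $/MWh unit at bus 2, a 60 MW limit on line 1-3, and 1000 MW on the other lines), not on the PJM 5-bus system. It enumerates candidate patterns with |CL| = |MG| - 1, computes LMPs from W as in (4) and (7), and tests validity with our own version of the KKT check: dual sign conditions for non-marginal units and congested lines, plus a feasibility LP in which the loads are free variables. The enumeration rule, the margin `EPS`, and all function names are assumptions for illustration.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-bus triangle (not from the paper): lines 1-2, 1-3, 2-3, equal reactances,
# bus 1 is the GSF reference bus. Unit 1 (10 $/MWh) at bus 1, unit 2 (30 $/MWh) at bus 2.
GSF = np.array([[0.0, -2/3, -1/3], [0.0, -1/3, -2/3], [0.0, 1/3, -1/3]])
F_MAX = np.array([1000.0, 60.0, 1000.0])
GEN_BUS, COST = np.array([0, 1]), np.array([10.0, 30.0])
P_MAX = np.array([200.0, 200.0])
EPS = 1e-3   # margin that keeps marginal units and uncongested flows strictly inside limits


def candidate_patterns(n_gen, n_line):
    """Yield (marginal units, {non-marginal unit: at max?}, {congested line: direction})."""
    for m in range(1, n_gen + 1):
        for mg in itertools.combinations(range(n_gen), m):
            ng = [g for g in range(n_gen) if g not in mg]
            for cl in itertools.combinations(range(n_line), m - 1):   # |CL| = |MG| - 1
                for signs in itertools.product((1, -1), repeat=m - 1):
                    for at_max in itertools.product((False, True), repeat=len(ng)):
                        yield mg, dict(zip(ng, at_max)), dict(zip(cl, signs))


def pattern_lmps(mg, cl):
    """LMPs from (4) and (7), plus the multipliers of the congested lines."""
    n_line, n_bus = GSF.shape
    ul = [l for l in range(n_line) if l not in cl]
    m, k, u = len(mg), len(cl), len(ul)
    H = GSF[:, GEN_BUS[list(mg)]]              # flow sensitivity to marginal-unit outputs
    A = np.zeros((1 + k + u, m + u))           # unknowns x = [P_MG, S_UL]
    A[0, :m] = 1.0                             # power balance
    A[1:1 + k, :m] = H[cl]                     # congested flows held at their limits
    A[1 + k:, :m] = -H[ul]                     # uncongested flows S_UL = H P - GSF d
    A[1 + k:, m:] = np.eye(u)
    B = np.vstack([np.ones((1, n_bus)), GSF[cl], -GSF[ul]])   # right-hand side vs. d
    W = np.linalg.solve(A, B)                  # sensitivity matrix W of (4)
    lmp = COST[list(mg)] @ W[:m]               # (7): c_MG^T W_MG
    v = np.linalg.solve(A.T, np.concatenate([COST[list(mg)], np.zeros(u)]))
    return lmp, -v[1:1 + k]                    # LMP = v[0] - GSF_CL^T mu


def is_valid(mg, ng_at_max, cl, tol=1e-9):
    """KKT-style validity check with the nodal loads treated as free variables."""
    try:
        lmp, mu = pattern_lmps(mg, list(cl))
    except np.linalg.LinAlgError:              # singular system: pattern cannot be pinned down
        return False, None
    for g, at_max in ng_at_max.items():        # units at a bound must be priced correctly
        gap = COST[g] - lmp[GEN_BUS[g]]
        if (gap > tol) if at_max else (gap < -tol):
            return False, None
    if any(sign * mu_l < -tol for sign, mu_l in zip(cl.values(), mu)):
        return False, None                     # multiplier sign contradicts the direction
    # Primal part: is there any z = [P, d], d >= 0, that realizes the pattern?
    n_gen, n_bus = len(COST), GSF.shape[1]
    flow = np.hstack([GSF[:, GEN_BUS], -GSF])  # line flows as a linear function of z
    A_eq, b_eq = [np.concatenate([np.ones(n_gen), -np.ones(n_bus)])], [0.0]
    for l, sign in cl.items():
        A_eq.append(flow[l])
        b_eq.append(sign * F_MAX[l])
    ul = [l for l in range(len(F_MAX)) if l not in cl]
    A_ub = np.vstack([flow[ul], -flow[ul]]) if ul else None
    b_ub = np.concatenate([F_MAX[ul] - EPS] * 2) if ul else None
    bounds = [(EPS, P_MAX[g] - EPS) if g in mg else
              ((P_MAX[g],) * 2 if ng_at_max[g] else (0.0, 0.0)) for g in range(n_gen)]
    res = linprog(np.zeros(n_gen + n_bus), A_ub=A_ub, b_ub=b_ub, A_eq=np.array(A_eq),
                  b_eq=b_eq, bounds=bounds + [(0.0, None)] * n_bus, method="highs")
    return res.status == 0, lmp


library = []
for mg, ng_at_max, cl in candidate_patterns(len(COST), len(F_MAX)):
    ok, lmp = is_valid(mg, ng_at_max, cl)
    if ok:
        library.append(((mg, ng_at_max, cl), lmp))
for pattern, lmp in library:
    print(pattern, lmp.round(3))
```

For this toy system, 3 of the 10 candidates survive: unit 1 marginal alone, unit 2 marginal with unit 1 at its maximum, and both units marginal with line 1-3 congested in the positive direction (LMPs of 10, 30, and 50 $/MWh).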

Algorithm IS (An Iterative Search Method):
Algorithm CE enumerates all potential marginal patterns and then removes the invalid ones. As the system grows, the number of potential marginal patterns grows combinatorially, making the validation process computationally expensive. Algorithm IS therefore links one valid marginal pattern to another, so that marginal patterns and LMPs can be collected iteratively.
Equation (4) gives the incremental changes in unit outputs and line flows with respect to incremental changes in loads. When these incremental changes equal the distances between the current values and the corresponding limits (i.e., the constraints become binding), the required load increase at each bus can be represented as a matrix d, as shown in (13). Each element in matrix Dis indicates a constraint that is one binding constraint away from the current marginal pattern (the resulting patterns are denoted as "surrounding" marginal patterns). If a new binding constraint is identified, a new marginal pattern is found. Thus, by changing the loads as indicated by the columns of matrix d, all surrounding marginal patterns can be obtained iteratively.
However, the sensitivity matrix W is valid only under the current marginal pattern. Therefore, although a column of matrix d may lead to a new marginal pattern, that pattern does not necessarily correspond to the constraint indicated in matrix Dis. Under constant load participation factors, the values in matrix d are deterministic, so the constraint requiring the smallest load increase is always reached first (i.e., it is the next binding constraint). Under varying load participation factors, each element of matrix d becomes a variable, and different participation factors lead to different next binding constraints. Nevertheless, (13) remains valid for linking the current marginal pattern to other marginal patterns.
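Under constant participation factors, the next binding constraint can be found with a simple ratio test on W. The sketch below assumes that nodal loads change as t·β, that each basic variable has a box [lower, upper], and that W is known; the numbers describe a hypothetical 3-bus state (unit 1 at the reference bus marginal, unit 2 at zero output, no congestion, 50 MW of load at bus 3), and the function name is ours.

```python
import numpy as np


def next_binding_constraint(x, lower, upper, W, beta):
    """Ratio test: smallest loading-level increase t >= 0 at which a basic variable hits a limit.

    x     : current basic variables [P_MG, S_UL]
    W     : their sensitivity to the nodal loads, as in (4)
    beta  : constant load participation factors (nodal loads change by t * beta)
    Returns (row of x, "lower" or "upper", t), or None if no limit is ever reached.
    """
    rate = W @ beta                           # dx/dt along the participation direction
    steps = np.full((len(x), 2), np.inf)      # column 0: to lower limit, column 1: to upper
    down, up = rate < 0, rate > 0
    steps[down, 0] = (lower[down] - x[down]) / rate[down]
    steps[up, 1] = (upper[up] - x[up]) / rate[up]
    row, col = np.unravel_index(np.argmin(steps), steps.shape)
    if np.isinf(steps[row, col]):
        return None
    return int(row), ("lower", "upper")[col], float(steps[row, col])


# Hypothetical 3-bus state: unit 1 (reference bus) marginal, unit 2 at 0 MW, no congestion,
# loads d = [0, 0, 50] MW; rows of x and W are [P_1, S_12, S_13, S_23].
W = np.array([[1.0, 1.0, 1.0],
              [0.0, 2/3, 1/3],
              [0.0, 1/3, 2/3],
              [0.0, -1/3, 1/3]])
x = np.array([50.0, 50/3, 100/3, 50/3])
lower = np.array([0.0, -1000.0, -60.0, -1000.0])
upper = np.array([200.0, 1000.0, 60.0, 1000.0])
print(next_binding_constraint(x, lower, upper, W, beta=np.array([0.0, 0.0, 1.0])))
```

Here the flow on line 1-3 reaches its 60 MW limit after a 40 MW increase at bus 3, before unit 1 reaches its 200 MW capacity (150 MW away), so line 1-3 is the next binding constraint.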
Thus, a bilevel optimization model can be constructed to determine whether a surrounding marginal pattern is valid. In problem (14), the upper level seeks a valid load increase d under the current marginal pattern that makes the corresponding constraint in matrix Dis binding, as shown in (14d)-(14f). The lower level is the original ED model (1) with the load increase d. Solving problem (14) for each element of matrix Dis gives all valid surrounding marginal patterns of the current marginal pattern. If problem (14) is infeasible for an element, the corresponding surrounding marginal pattern is invalid.
[Problem (14). Upper level: separate constraint cases for an element of Dis that corresponds to a capacity constraint, a negative line limit, or a positive line limit. Lower level: ED problem (1) with the load increase d.]
The obtained surrounding marginal patterns are recorded in the library. Next, one of the surrounding marginal patterns is selected as the next step. To collect as many marginal patterns as possible, the closest marginal pattern, i.e., the one with the smallest loading level increase, is selected, and (13) is recalculated at the new marginal pattern. The search continues until problem (14) is infeasible for all elements of matrix Dis or all elements of matrix Dis have been stepped. A new starting pattern is then selected from the library, and the process repeats until all patterns in the library have been stepped. Because this algorithm searches only around the current marginal pattern, it may miss some marginal patterns. Therefore, Algorithm IS can be repeated from different initial marginal patterns until the library is sufficiently complete. The detailed procedure is given in Algorithm IS, where IS stands for "iterative search."
Algorithm IS
…
        If all elements in matrix Dis have been stepped do
            Break
        End if
        For each element in matrix Dis do
            Solve optimization problem (14).
            Record the marginal pattern and LMPs in the library.
        End for
        Identify the least load increase.
        Step to the identified marginal pattern and denote it as stepped.
    End while
End for
Return the marginal pattern and LMPs library.
Algorithm IS is demonstrated on the 89-bus European transmission network (PEGASE) [26]. The marginal patterns collected by Algorithm CE, by Algorithm IS, and by random generation from a uniform distribution are shown in Table 3. The computation times of Algorithm CE and Algorithm IS are 133,237 s and 31,352 s, respectively.
Algorithm IS collects 79% of the marginal patterns with only about 24% of the computation time of Algorithm CE. The randomly generated dataset contains 5 million load samples; by contrast, it collects only 12% of the marginal patterns while requiring 274% of the computation time of Algorithm IS. Compared with Algorithm CE, Algorithm IS significantly reduces the computation time while still collecting most marginal patterns.

Algorithm FS (A Fast Screening Method):
The training dataset is usually generated offline, so the computation time of dataset generation may be a minor concern. However, a fast screening method is preferable for collecting marginal patterns when the system operates under complicated conditions. For example, solving the bilevel model in Algorithm IS becomes computationally expensive when the set of potentially congested lines is large.
Algorithm FS is a fast screening variant of Algorithm IS. Its iterative search procedure is similar to that of Algorithm IS, but instead of solving the bilevel model (14), Algorithm FS only solves for the load increase d at the most sensitive bus for each element in matrix Dis, as shown in (15); the load increase at the most sensitive bus is the smallest one. All surrounding marginal patterns are scanned by solving the linear equation (15), which is much faster than solving the bilevel optimization model (14). Both Algorithm IS and Algorithm FS miss a few marginal patterns during the iterative collection process. However, Algorithm FS is an "incomplete" local search, meaning that some surrounding marginal patterns may also be missed, whereas Algorithm IS is a "complete" local search that obtains all surrounding marginal patterns. Note that marginal patterns missed at the current step may still be collected in later steps.
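The single-bus screening idea can be sketched as follows: for each limit of each basic variable, only the bus whose load pushes that variable fastest toward the limit is used, which gives the smallest required load increase. This is our reading of (15), not its exact form. The data describe a hypothetical 3-bus state (unit 1 at the reference bus marginal, unit 2 at zero output, no congestion, 50 MW of load at bus 3), and `fast_screen` is our own name.

```python
import numpy as np


def fast_screen(x, lower, upper, W):
    """Single-bus load increase that drives each basic variable to each of its limits.

    For every (variable, limit) pair, only the most sensitive bus is used, i.e., the bus
    whose load moves x fastest toward that limit; this gives the smallest load increase.
    Returns (row of x, "lower"/"upper", bus, load increase) tuples, smallest first.
    """
    found = []
    for c in range(len(x)):
        for side, limit, direction in (("lower", lower[c], -1.0), ("upper", upper[c], 1.0)):
            toward = direction * W[c]          # > 0 where a load increase moves x toward limit
            k = int(np.argmax(toward))
            if toward[k] <= 0:                 # no single-bus load increase reaches this limit
                continue
            found.append((c, side, k, float((limit - x[c]) / W[c, k])))
    return sorted(found, key=lambda item: item[3])


# Hypothetical 3-bus state; rows of x and W are [P_1, S_12, S_13, S_23].
W = np.array([[1.0, 1.0, 1.0],
              [0.0, 2/3, 1/3],
              [0.0, 1/3, 2/3],
              [0.0, -1/3, 1/3]])
x = np.array([50.0, 50/3, 100/3, 50/3])
lower = np.array([0.0, -1000.0, -60.0, -1000.0])
upper = np.array([200.0, 1000.0, 60.0, 1000.0])
for entry in fast_screen(x, lower, upper, W)[:2]:
    print(entry)
```

The first entry corresponds to line 1-3 reaching its limit after 40 MW more load at bus 3, and the second to unit 1 reaching its capacity. Only one linear division per limit is needed, which is why this screening is much cheaper than solving a bilevel problem per element.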
Algorithm FS
…
        End if
        For each element in matrix Dis do
            Solve equation (15).
            Record the marginal pattern and LMPs in the library.
        End for
        Identify the least load increase.
        Step to the identified marginal pattern and denote it as stepped.
    End while
End for
Return the marginal pattern and LMPs library.
Algorithm FS is also performed on the 89-bus system for comparison with Algorithm CE and Algorithm IS. Algorithm FS collects 465 marginal patterns in 4,088 s of computation time, as shown in Table 4.
Compared with Algorithm IS, Algorithm FS further reduces the computation time, although some marginal patterns may be missed. Compared with random dataset generation, it collects more marginal patterns in much less computation time. Algorithm FS is therefore preferred when the computation time of dataset generation is critical.

B. DATASET ENHANCEMENT
Section A constructs the marginal pattern library using three different algorithms to examine the completeness of the training dataset. However, identifying the marginal patterns is not sufficient by itself. As previously mentioned, even if many possible marginal patterns are identified, the generated dataset may contain no samples in some CLL intervals, typically the small ones, so the samples can be biased. To fix this problem, this section proposes Algorithm DE, where DE stands for "dataset enhancement," which generates an unbiased dataset.
The three algorithms in Section A (i.e., Algorithms CE, IS, and FS) construct a marginal pattern library. The marginal patterns present in the original (unenhanced) training dataset are compared with this library. For each marginal pattern in the library that does not appear in the training dataset, bilevel optimization model (16) is solved iteratively to generate extra samples that enhance the training dataset.
[Problem (16). Upper level: (16a)-(16g). Lower level: ED dispatch problem.]
The upper-level problem (16a)-(16g) seeks the minimal loading level that leads to a specified marginal pattern. Constraint (16b) restricts the values of β_i, which yields a different load sample at each iteration. Constraint (16c) restricts the output of each non-marginal unit to either 0 or its maximum. In constraint (16d), ε is a small positive value that keeps the output of each marginal unit strictly above 0 and strictly below its maximum. Similarly, the line flow pattern is enforced through (16e)-(16g). Thus, the upper-level problem searches for a load sample for which the lower-level dispatch problem produces the marginal pattern specified by the upper-level constraints. Multiple load samples are obtained by solving problem (16) iteratively, and they are added to the original training dataset. The enhanced training dataset then has enough load samples in every marginal pattern, yielding an unbiased dataset for learning the load-ED mapping.
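Because (16) is bilevel, a quick way to experiment is a single-level stand-in: if a pattern has already passed KKT dual checks such as those in Algorithm CE, it suffices to find loads that satisfy the pattern's primal conditions. The sketch below does that with a linear program that minimizes the loading level t along a random participation vector β (drawn from a Dirichlet distribution, loosely mimicking the role of (16b)) and then confirms each sample with an ordinary ED solve. This substitution, the 3-bus data (a 10 $/MWh unit at bus 1, a 30 $/MWh unit at bus 2, line 1-3 limited to 60 MW), the margin `eps`, and all names are our assumptions, not the paper's method.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-bus data (not from the paper); bus 1 is the GSF reference bus.
GSF = np.array([[0.0, -2/3, -1/3], [0.0, -1/3, -2/3], [0.0, 1/3, -1/3]])
F_MAX = np.array([1000.0, 60.0, 1000.0])
GEN_BUS, COST = np.array([0, 1]), np.array([10.0, 30.0])
P_MAX = np.array([200.0, 200.0])
H = GSF[:, GEN_BUS]                         # line-flow sensitivity to unit outputs


def sample_for_pattern(mg, ng_at_max, cl, beta, eps=1.0):
    """Loads d = t * beta with the smallest t that realize the given marginal pattern.

    Single-level stand-in for (16): assumes the pattern already passed the KKT (dual)
    checks, so only the primal pattern conditions are imposed. Variables: z = [P, t].
    """
    n_gen = len(COST)
    flow = np.hstack([H, -(GSF @ beta)[:, None]])          # line flows as a function of z
    A_eq, b_eq = [np.append(np.ones(n_gen), -beta.sum())], [0.0]   # power balance
    for l, sign in cl.items():                             # congested lines at their limits
        A_eq.append(flow[l])
        b_eq.append(sign * F_MAX[l])
    ul = [l for l in range(len(F_MAX)) if l not in cl]     # other lines strictly inside
    A_ub = np.vstack([flow[ul], -flow[ul]]) if ul else None
    b_ub = np.concatenate([F_MAX[ul] - eps] * 2) if ul else None
    bounds = [(eps, P_MAX[g] - eps) if g in mg else
              ((P_MAX[g],) * 2 if ng_at_max[g] else (0.0, 0.0)) for g in range(n_gen)]
    res = linprog(np.append(np.zeros(n_gen), 1.0),         # minimize the loading level t
                  A_ub=A_ub, b_ub=b_ub, A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=bounds + [(0.0, None)], method="highs")
    return res.x[-1] * beta if res.status == 0 else None


def ed_dispatch(d):
    """Ordinary ED, used only to confirm that a generated sample lands in the pattern."""
    A_ub = np.vstack([H, -H])
    b_ub = np.concatenate([F_MAX + GSF @ d, F_MAX - GSF @ d])
    res = linprog(COST, A_ub=A_ub, b_ub=b_ub, A_eq=np.ones((1, len(COST))),
                  b_eq=[d.sum()], bounds=[(0.0, p) for p in P_MAX], method="highs")
    return res.x


rng = np.random.default_rng(1)
pattern = ((0, 1), {}, {1: 1})              # both units marginal, line 1-3 at +60 MW
samples = [d for d in (sample_for_pattern(*pattern, beta=rng.dirichlet(np.ones(3)))
                       for _ in range(20)) if d is not None]
for d in samples[:3]:
    P = ed_dispatch(d)
    print(d.round(2), P.round(2), round(float(H[1] @ P - GSF[1] @ d), 2))
```

Some β vectors cannot reach the pattern and return `None`; they are simply skipped. A larger `eps` keeps samples further from the pattern boundaries (the CLLs), at the cost of not sampling right at them.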
Algorithm DE
…
        Record the load and the desired outputs.
        ν = ν − 1.
        …
    End while
End for
Return the recorded samples.

V. COMPARATIVE CASE STUDIES
The marginal pattern collection algorithms (i.e., Algorithm CE, Algorithm IS, and Algorithm FS) and the dataset enhancement algorithm (Algorithm DE) are discussed in the previous section. This section will present comparative case studies to exemplify the biased dataset and demonstrate the superiority of the enhanced (unbiased) dataset obtained by Algorithm DE. Simulation runs were performed in MATLAB 2017 on a laptop with an Intel i7-8650U processor and 16 GB RAM.

A. INSUFFICIENCY OF THE BIASED DATASET
Three distributions (uniform, normal, and Weibull) are considered here for sampling each nodal load, to demonstrate that model-free approaches based on randomly generated datasets are biased for ED-based problems.
All marginal patterns of the modified PJM 5-bus system were collected by Algorithm CE, as listed in Table 2. One hundred thousand load samples are generated from each of the above three distributions, and the corresponding marginal patterns are shown in Fig. 2. Note that a few marginal patterns contain most of the samples in these datasets. For example, marginal pattern 13 contains more than half of the samples in the dataset generated by the normal distribution. In contrast, most marginal patterns have insufficient samples. For example, marginal patterns 6, 10, 12, and 14 each contain fewer than 400 of the 100,000 training samples in every one of the three datasets.
Three neural networks (NN1, NN2, and NN3) with the same settings are trained on the above three datasets, respectively. The load-LMP mapping is selected as a representative ED-based mapping problem. Each neural network has two layers and 20 neurons and is trained with the Levenberg-Marquardt algorithm. One hundred illustrative test samples are generated for each of marginal pattern 10 (small loading interval) and marginal pattern 13 (large loading interval). Fig. 3 illustrates the prediction errors of the three networks for patterns 10 and 13, with the test samples sorted along the x-axis from the smallest to the largest error. The average prediction errors for pattern 13 are 6.2% for NN1, 4.1% for NN2, and 6.1% for NN3, whereas those for pattern 10 are 29.2% for NN1, 34.1% for NN2, and 33.4% for NN3. This difference occurs because marginal pattern 10 contains considerably fewer training samples than marginal pattern 13, as shown in Fig. 2, so the prediction for pattern 10 is much less accurate. Note that Algorithm DE has not yet been applied at this point: Fig. 3 shows the poor performance caused by the small number of training samples in a small loading interval (pattern 10) under a biased training dataset.
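The text reports average errors per marginal pattern and sorted error curves without spelling out the error metric. The sketch below assumes that the error of one test sample is the mean relative LMP error over buses and groups test samples by their marginal pattern label; the function name and the toy numbers are illustrative only.

```python
import numpy as np


def pattern_error_summary(y_true, y_pred, patterns):
    """Mean and median prediction error (%) per marginal pattern, plus the sorted errors.

    Assumed metric (not defined in the text): the error of one test sample is the
    mean over buses of |predicted LMP - true LMP| / |true LMP|.
    """
    per_sample = 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true), axis=1)
    summary = {}
    for p in np.unique(patterns):
        errors = np.sort(per_sample[patterns == p])   # ascending, like the x-axis of Fig. 3
        summary[int(p)] = (float(errors.mean()), float(np.median(errors)), errors)
    return summary


# Toy example: three test samples with two buses each, labeled with patterns 10 and 13.
y_true = np.array([[10.0, 20.0], [10.0, 20.0], [30.0, 30.0]])
y_pred = np.array([[11.0, 22.0], [9.0, 18.0], [30.0, 33.0]])
summary = pattern_error_summary(y_true, y_pred, np.array([10, 10, 13]))
print({p: s[:2] for p, s in summary.items()})
```

Reporting the median next to the mean, as done for pattern 13 later in this section, helps separate a uniformly poor fit from a few outlier samples.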
If a marginal pattern has insufficient training samples, any input-output mapping within that pattern will be inaccurate, because marginal patterns determine the optimal solution of ED-based problems. This insufficiency is exacerbated in larger systems that have more marginal patterns. In general, this phenomenon arises in any model-free application for ED-based problems because of the step-change nature shown in Fig. 1; this paper uses neural networks for load-LMP mapping as an example.
The next subsection compares the enhanced (unbiased) dataset with the randomly generated (biased) dataset of this subsection on the modified PJM 5-bus system and the 89-bus PEGASE system.

B. COMPARISON OF THE ENHANCED (UNBIASED) TRAINING DATASET AND BIASED TRAINING DATASET
1) MODIFIED PJM 5-BUS SYSTEM
A neural network is trained with the enhanced (unbiased) training dataset generated by Algorithm DE. For the enhanced dataset, 666 new samples are generated for each marginal pattern, for a total of 9,990 new samples, which are added to the same 100,000 initial (biased) training samples used in the previous subsection. As an example, the neural network trained on the enhanced dataset is compared with the network trained on the biased dataset from uniform distribution-based sampling (i.e., NN1 in Fig. 3).
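These sample counts are consistent with the 15 valid marginal patterns that Algorithm CE finds for this system (85 candidate combinations, 70 of them invalid):

```latex
666 \times (85 - 70) = 666 \times 15 = 9{,}990,
\qquad
\frac{9{,}990}{100{,}000} = 9.99\% < 10\%.
```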
The prediction errors for marginal pattern 10 (small loading interval) and marginal pattern 13 (large loading interval) are shown in Fig. 4, with the test samples sorted along the x-axis from the smallest to the largest error. With the enhanced training dataset, the average prediction error for marginal pattern 10 is 5.0%, compared with 29.2% for NN1 in the previous subsection. The prediction error for marginal pattern 10 is thus significantly reduced, because the enhanced dataset adds samples to this pattern, which has a narrow loading interval. The only cost is the 9,990 new training samples generated by Algorithm DE, fewer than 10% of the initial 100,000 samples in the biased training dataset, so Algorithm DE is a worthwhile effort.
By contrast, the prediction accuracy for marginal pattern 13 with the enhanced (unbiased) dataset is very close to that with the biased dataset: the errors are 3.9% vs. 5.5% on average and 4.2% vs. 4.3% at the median, respectively. The improvement is minor because marginal pattern 13 has a large loading interval that already contains ample training samples in the biased dataset, so the extra samples provided by Algorithm DE offer little additional benefit.
Although the comparison is carried out for NN1, similar conclusions hold for NN2 and NN3, since they have similar prediction accuracy, as shown in Fig. 3.

2) 89-BUS PEGASE SYSTEM
Next, a similar prediction accuracy comparison between the enhanced (unbiased) and biased training datasets is performed on the 89-bus PEGASE system. The biased training dataset is the same as in the example in Section IV and contains 5 million samples. This dataset is enhanced by Algorithm DE using the marginal pattern library obtained in Section IV. Two marginal patterns with small loading intervals (205 and 339) are selected as illustrative examples, as shown in Fig. 5, where the test samples are sorted along the x-axis from the smallest to the largest error. In the biased training dataset, patterns 205 and 339 contain fewer than 10 samples, and their average prediction errors are extremely high, at 41.7% and 37.8%, respectively. In contrast, the enhanced dataset adds 1,500 extra training samples to each marginal pattern, which reduces the average prediction errors to 9.3% and 4.2%, respectively.
In both the 5-bus and the 89-bus systems, the enhanced (unbiased) dataset significantly improves the prediction accuracy for load-ED mapping in marginal patterns with small intervals (i.e., insufficient training samples).
Advanced learning techniques with the enhanced dataset will be investigated in future work, since this paper focuses on enhancing dataset generation to provide unbiased training samples.

VI. CONCLUSION
In this paper, we have shown that training datasets generated by pre-determined distributions are biased for load-ED mapping. Marginal patterns characterize the optimal solution of ED-based problems, and different marginal patterns differ significantly in size, which causes the dataset to overfill patterns with large sizes while underfilling patterns with small sizes. To address this, this paper proposes three marginal pattern collection algorithms to construct a marginal pattern library, together with a dataset enhancement algorithm that generates unbiased samples for each marginal pattern in the library. The proposed algorithms and the enhanced training dataset are illustrated and examined on the modified PJM 5-bus system and the 89-bus PEGASE system. The case studies demonstrate that the proposed approach significantly improves the prediction accuracy of data-driven load-ED mapping.
Our future work will combine the enhanced dataset with advanced learning techniques to provide a comprehensive model-free load-ED mapping platform.