A Fast Penalty-Based Gauss-Seidel Method for Stochastic Unit Commitment With Uncertain Load and Wind Generation

Stochastic network-constrained unit commitment (S-NCUC) can effectively manage the uncertainty introduced by an increasing penetration of renewable energy. However, a drawback of progressive hedging algorithm (PHA) based solutions is that they are not provably convergent due to the non-convexity of S-NCUC. Additionally, the solution obtained is usually fractional-valued (non-binary) and therefore not readily implementable. In this paper, we apply a novel Penalty-Based Gauss-Seidel (PBGS) algorithm to solve S-NCUC using an exact augmented Lagrangian representation with proven convergence. To improve the computational efficiency of PBGS, we further propose an accelerating technique, with rigorous proof, that skips solving scenarios when certain conditions are met during iterations. We numerically validate the proposed algorithms on the IEEE 118-bus system and a practically sized ERCOT-like system, both with variable wind generation. Numerical results demonstrate the efficacy of the proposed algorithms in yielding high-quality S-NCUC solutions. The merits of the proposed algorithms are also revealed in comparison with PHA and extensive-form (EF) based mixed-integer programming solutions.

The exponential growth of variable renewable energy (VRE), such as wind and solar generation, brings grand challenges to the operational planning of power systems. The instantaneous penetration of VRE reaches over 50% in certain balancing areas in the United States. VRE generation is characterized by large uncertainty and variability. Consequently, power system operators, planners, and researchers have made substantial efforts to manage VRE uncertainties in power system scheduling, such as network-constrained unit commitment (NCUC) [1].
In order to account for the impact of VRE uncertainties, there are several noteworthy NCUC approaches in the literature, each with distinctive objectives, theories, computational requirements and economic outcomes [2]. The first approach is the use of dynamic operating reserve requirements, whereby exogenous reserve requirements are determined using the information on the distribution of possible forecast conditions [3], [4]. The second approach is the application of interval optimization, which uses confidence intervals in terms of upper and lower bounds to represent the uncertainty [5]. The third significant approach is the utilization of robust optimization, which solves the NCUC problem with respect to a worst-case scenario regarding uncertain parameters [6], [7]. While the robust NCUC guarantees reliability under any realization of scenarios considered, it may not be cost-effective as the worst case is rarely realized. The fourth vital approach is related to the chance-constrained optimization method, in which certain prevailing NCUC constraints are modeled as ''chance constraints'' to be satisfied at a predetermined level of probability [8].
Finally, a common approach presented in the literature is the use of stochastic programming, namely stochastic NCUC (S-NCUC), in which the expected system operating cost is minimized across a number of scenarios, each representing a possible realization of uncertainties. S-NCUC is typically a large-scale, non-convex, mixed-integer programming (MIP) problem. It is modeled as a two-stage stochastic problem in which the first-stage unit commitment decisions (here-and-now decisions) are the same for all scenarios. One pioneering work on S-NCUC was carried out in [9], where uncertainties from generator and transmission line outages were considered. Uncertainties due to significant wind penetration were examined under rolling planning with scenario trees [10]. A constrained ordinal optimization technique was used to solve S-NCUC [11]. The synergy between VRE and storage was studied in the context of a multi-stage stochastic model [12]. A multi-stage model using stochastic dual dynamic integer programming was proposed in [13]. S-NCUC involves an indispensable process, scenario generation, to portray possible uncertainty realizations. A model-free scenario generation technique based on generative adversarial networks was proposed in [14]. A parallel-computing-based optimization technique was proposed in [15] to solve S-NCUC asynchronously within an acceptable operational time frame. A comprehensive review of S-NCUC is provided in [16].
The work in [9]-[15] uses various methods to show the improvements of stochastic models in decreasing costs and improving reliability compared to traditional scheduling models, or to show improvements over previous algorithms. Generally, S-NCUC solution approaches fall into two main categories. The first and most straightforward approach is to use a commercially available off-the-shelf solver on an extensive form (EF) of S-NCUC. However, for any realistic system with a reasonable number of scenarios, the resulting EF of S-NCUC may become computationally intractable [11].
The other pivotal approach to solving S-NCUC is the scenario-based decomposition method, which solves each individual scenario separately, usually in parallel, and generates a final solution by coordinating all individual scenario solutions. The progressive hedging algorithm (PHA), proposed by Rockafellar and Wets [17] for the scenario-based decomposition of linear programming (LP) problems, is one such method for solving stochastic MIPs [18].
A detailed analysis of the PHA applied to stochastic unit commitment (UC) is given in [19]. Obtaining a lower bound for MIP, with an application to stochastic UC, is discussed in [20]. Nevertheless, PHA was originally devised for continuous convex programs and is not provably convergent for the non-convex S-NCUC problem. The solution to the dual problem is generally primal infeasible, and the relaxed system-wide constraints may not be satisfied, so an additional algorithm is required to restore primal feasibility from a Lagrangian dual solution [21]. It is therefore desirable to obtain a primal feasible solution directly from the Lagrangian dual iterations. This gives rise to the exact augmented Lagrangian, a class of exact penalty methods [22], [23] whose objective is to solve a constrained optimization (primal) problem through an unconstrained optimization problem that has the same local (global) solutions as the primal problem. In a non-convex problem like S-NCUC, often only a local minimum is found. This local minimum is an exact augmented Lagrangian solution and is referred to as an exact solution hereinafter.
VOLUME 8, 2021

Once the local minimum is obtained, it is imperative to measure the quality of that local minimum. Nevertheless, the Lagrangian dual solution obtained from PHA is very sensitive to the chosen penalty factor [24]. This sensitivity adversely affects the convergence rate of PHA and may even result in cyclic behaviors [18], [24]. Heuristic methods were developed within PHA to choose an effective penalty factor, to detect and break cyclic behaviors, and to select terminating criteria [18]. A Frank-Wolfe algorithm combined with PHA (FW-PHA) has recently been applied in [25] to obtain better Lagrangian dual bounds for S-NCUC. However, the following critical questions remain unresolved for S-NCUC: 1) How can we devise an exact augmented Lagrangian function to acquire an exact solution? 2) If an exact solution is attained, how can we find a high-quality lower bound capable of efficiently measuring the quality of the exact solution?
To address the above questions, this paper applies a novel Penalty-Based Gauss-Seidel (PBGS) algorithm [26] with an exact augmented Lagrangian representation to solve S-NCUC within a scenario-based decomposition framework. To improve the computational efficiency of PBGS, we propose an accelerating technique that skips solving scenarios as long as certain conditions are satisfied. We numerically validate these algorithms on the IEEE 118-bus system and a large-scale ERCOT-like system. Numerical results demonstrate the high quality of the PBGS solution and the efficacy of the proposed algorithms. Additionally, we compare the proposed algorithms with other prevailing S-NCUC methods, such as PHA and extensive-form-based MIP solutions. The main contributions of the paper are three-fold.
• We apply the PBGS algorithm for the first time to S-NCUC, including a positive-basis exact penalty representation and a binary implementable calculation to produce an exact solution. The exactness guarantees the primal feasibility of a Lagrangian dual solution.
• We propose a novel accelerating technique with mathematical proof to skip solving scenarios when certain conditions are met during the PBGS iteration.
• The proposed method is validated and compared with other algorithms, especially on a practically sized ERCOT-like system.
The remainder of the paper is organized as follows. In Section II, we briefly describe the S-NCUC formulation, followed by a detailed discussion of the PBGS algorithm and its accelerated variant, Fast PBGS. We validate the proposed algorithms and present the numerical test results in Section III. Conclusions and future work are provided in Section IV.

II. PROBLEM FORMULATION AND PROPOSED SOLUTIONS
In this section, we first lay out an S-NCUC formulation and then introduce the PBGS method, where a definition of an exact penalty representation is given. Finally, we discuss our proposed acceleration feature for PBGS which we call Fast PBGS.

A. STOCHASTIC-NETWORK CONSTRAINED UNIT COMMITMENT
We use an S-NCUC formulation that originated from the Flexible Energy Scheduling Tool for Integrating Variable Generation (FESTIV) [27], [28]. The objective of the two-stage S-NCUC is to minimize the expected system operating cost across scenarios. The first row of the objective function (1) represents the first-stage no-load cost and unit startup cost. The second row of (1) gives the second-stage expected production cost over all scenarios considered, i.e., the sum of each scenario's production cost multiplied by that scenario's probability. Slack variables on the power balance constraint (2) and power flow constraint (3) are also considered in (1) to maintain the feasibility of S-NCUC solutions; they are penalized in the third and fourth rows of (1), respectively. The objective function (1) is subject to the individual scenario constraints (2)-(8), which include the power balance constraint, the power flow constraints, and the power flow equation.
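The displayed objective (1) did not survive extraction; the following LaTeX sketch of its four rows is a reconstruction under assumed notation (no-load cost $C^{NL}_i$, startup cost $C^{SU}_i$, production cost $c_i(\cdot)$, scenario probability $Pr^s$, slack variables $sl$, $sf$ with penalty weights $\sigma$; none of these symbol names are from the original):

```latex
\min \;
  \sum_{i,t} \left( C^{NL}_{i} I_{i,t} + C^{SU}_{i} IU_{i,t} \right)   % row 1: first-stage commitment cost
+ \sum_{s \in S} Pr^{s} \sum_{i,t} c_{i}\left( p^{s}_{i,t} \right)     % row 2: expected production cost
+ \sigma^{PB} \sum_{s \in S} Pr^{s} \sum_{t} sl^{s}_{t}                % row 3: power-balance slack penalty
+ \sigma^{PF} \sum_{s \in S} Pr^{s} \sum_{\ell,t} sf^{s}_{\ell,t}      % row 4: power-flow slack penalty
```

The row structure follows the verbal description above: first-stage costs, probability-weighted production cost, and the two slack penalty terms.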

The individual unit constraints include, among others, the generator minimum up and down time constraints (6).
Constraints (2)-(4) are system-wide constraints, whereas constraints (5)-(7) are individual unit constraints in each scenario. Other prevailing constraints, such as segment generation limits, are also considered here. In addition, contingency and reserve constraints can be incorporated into the S-NCUC formulation [28]. The feasible region of the decision variables, given in (8), is determined by all the above constraints. Problem (1)-(8) is also known as the extensive form (EF) of the two-stage S-NCUC, in which even a moderate number of scenarios can result in a computational burden that quickly exceeds the capability of any state-of-the-art MIP solver. In addition, the computational burden increases exponentially with the size of the problem under the branch-and-cut method [26]. This is why scenario-based decomposition frameworks such as PHA are used to solve large-scale S-NCUC problems [21]. It should be noted that we restrict the scope of this paper to the scenario-based decomposition method, one of the widely used approaches for solving large-scale S-NCUC problems.

B. PROPOSED FAST PENALTY-BASED GAUSS-SEIDEL ALGORITHM
The scenario-based decomposition framework separates the first-stage decisions into each individual scenario (e.g., I^s_{i,t}, IU^s_{i,t}) and introduces a coupling constraint that enforces the first-stage S-NCUC decisions of non-quick-start generators (NQGs) to be the same across all scenarios. This constraint for NQGs, termed the non-anticipativity constraint (NAC), is given below:

I^s_{i,t} = Z_{i,t},  ∀ i ∈ NQG, ∀ t, ∀ s ∈ S.  (9)
As the UCs of quick-start generators (QSGs) can vary across the scenarios considered, constraint (9) is not imposed on quick-start units. By relaxing constraint (9), the augmented Lagrangian function (10) is defined with dual variables ω and penalty factors ρ ∈ R^{NI×NT×S}, where ψ^s_ρ is an augmenting function associated with NAC (9) and the penalty factor ρ, and α is a scalar. In (10), the terms for the unit no-load cost, startup cost, and expected production cost across all scenarios are similar to (1), except that the first-stage costs are written per individual scenario using the split first-stage decision variables I^s_{i,t} and IU^s_{i,t}. In addition, the NAC constraint (9) is relaxed through the augmented Lagrangian function.
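The displayed expression for (10) was lost in extraction; a hedged LaTeX reconstruction of its commonly used scenario-decomposable shape, using the split first-stage variables, implementable Z, duals ω, augmenting function ψ^s_ρ, and scalar α described above (the exact row layout and cost symbols are assumptions), might read:

```latex
L_{\rho,\alpha}(I, IU, p, Z; \omega)
 = \sum_{s \in S} Pr^{s} \Bigg[
     \sum_{i,t} \Big( C^{NL}_{i} I^{s}_{i,t} + C^{SU}_{i} IU^{s}_{i,t}
                      + c_{i}\big(p^{s}_{i,t}\big) \Big)
   + \sum_{i \in NQG} \sum_{t} \omega^{s}_{i,t} \big( I^{s}_{i,t} - Z_{i,t} \big)
   + \alpha\, \psi^{s}_{\rho}\big( I^{s} - Z \big)
 \Bigg]
```

This matches the verbal description: scenario-split first-stage costs, production cost, the relaxed NAC term, and the α-scaled augmenting function.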
We enforce Σ_{s∈S} Pr^s ω^s = 0 in the augmented Lagrangian function (10). The probability-weighted sum of the dual variables equaling zero is a necessary condition [16] that keeps the problem bounded from below. Under this condition, the term involving Z_{i,t} and ω^s_{i,t} in the third line of (10) vanishes, and the augmented Lagrangian function (10) decomposes into S subproblems, each an individual-scenario NCUC problem. Note that the proposed PBGS and the widely used PHA solve the augmented Lagrangian function (10) with different augmenting functions ψ^s_ρ.

Definition 1 (Exact Penalty Representation [29]): A given augmenting function ψ^s_ρ in (10) is an exact penalty representation if there exists a dual vector ω such that, for all ρ sufficiently large, the optimal value of (10) equals that of the extensive form (1) and the argmin set of (10) coincides with the (local) optimal solution set of (1).

Definition 1 indicates that, with an exact augmenting function ψ^s_ρ, an optimal solution to the augmented Lagrangian (10) is also a (local) optimal solution to the extensive form (1) of S-NCUC. The exactness implies that the optimal Lagrangian dual solution to (10) is directly primal feasible, with all the relaxed NAC constraints satisfied. Hence, no additional algorithm is required to restore primal feasibility from a Lagrangian dual solution. It is worth mentioning that Definition 1 emphasizes the feasibility rather than the optimality of a solution to the augmented Lagrangian (10). Specifically, an optimal solution to (10) is a local minimum of the extensive form (1) due to the non-convexity of S-NCUC [23]. Therefore, a quantitative assessment of the quality of a solution to (10) is needed.
An important question is how to construct an exact augmenting function ψ^s_ρ. We employ an l-1-norm-like function based on a semi-Lagrangian approach [26]. Unlike the semi-Lagrangian approach, wherein an equality constraint is reformulated as a pair of inequality constraints and Lagrangian relaxation is applied to either one of the pair [30], here we reformulate NAC (9) as two inequality constraints and then relax both of them using two penalty factors (ρ̄, ρ). The operator [·]− in the resulting augmenting function (13) denotes the positive basis (positive part), and because the UC variables are binary, (13) admits a direct implementation in (14). Augmenting function (13) is an exact penalty representation since it meets three criteria, stated for some open neighborhood V of 0 and positive scalars δ, ν > 0. The exactness proof of augmenting function (13) is given in Theorem 5 of [29].
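The displayed penalty (13) was garbled in extraction; under the reading above, a plausible LaTeX sketch (the per-direction weighting by the two penalty factors is an assumption) is:

```latex
\psi^{s}_{\rho}\big(I^{s} - Z\big)
 = \sum_{i \in NQG} \sum_{t}
   \Big( \bar{\rho}^{s}_{i,t} \,\max\!\big(I^{s}_{i,t} - Z_{i,t},\, 0\big)
       + \underline{\rho}^{s}_{i,t} \,\max\!\big(Z_{i,t} - I^{s}_{i,t},\, 0\big) \Big)
```

Since I^s_{i,t}, Z_{i,t} ∈ {0, 1}, each max term can be implemented linearly, e.g., max(I − Z, 0) = I(1 − Z), which is consistent with the statement that (13) can be implemented directly for binary variables.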
It is worth noting that a common choice for the augmenting function ψ^s_ρ in the literature is a norm. For example, the PHA algorithm [17] uses the square of the l-2 norm as the augmenting function. As indicated in the third criterion above, the value of the augmenting function must be greater than or equal to an infinity norm in a neighborhood of zero. The PHA augmenting function does not satisfy this criterion and is therefore not an exact penalty representation. For any norm used as the penalty function, or any function satisfying the exact penalty representation criteria listed above, and for any ω ∈ R^{NI×NT×S}, there exists a finite ρ* such that ϕ+(ω) = ζ, which allows us to choose ω = 0. This is detailed in Proposition 8 of [29] and Definition 11.60 of [22]. Therefore, we set ω = 0 and omit ω from subsequent discussions for simplicity of representation.
The second important consideration is the calculation of the implementable Z_{i,t}. Since UC takes a binary value, it is imperative to keep Z_{i,t} binary. Z_{i,t} can be calculated by minimizing (14) with respect to Z_{i,t} with I^s_{i,t} fixed over all scenarios. Per [26], if the problem is restricted to binary values, we can calculate Z_{i,t} in closed form at every iteration, as in (15). As shown, the implementable takes the UC consensus of the majority of scenarios, weighted by the penalty factors, and is therefore binary. In contrast, the PHA algorithm calculates the implementable as a weighted average of UCs, resulting in a fractional value that requires further conversion to a binary value.
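As a concrete illustration of this penalty-weighted majority vote, the sketch below computes a binary Z for one unit-period pair; the function name, argument layout, and exact weighting are our own assumptions rather than the paper's equation (15):

```python
def implementable(I, rho_up, rho_dn):
    """Penalty-weighted majority vote over scenarios for one (i, t) pair.

    I      : list of binary commitment decisions I^s, one per scenario
    rho_up : penalties on I^s exceeding Z, one per scenario (assumed layout)
    rho_dn : penalties on Z exceeding I^s, one per scenario (assumed layout)
    Returns the binary Z minimizing the penalty-weighted disagreement with
    the scenario commitments, per one plausible reading of (15).
    """
    # Cost of choosing Z = 0: pay rho_up for every scenario committed (I^s = 1).
    cost0 = sum(ru for ru, x in zip(rho_up, I) if x == 1)
    # Cost of choosing Z = 1: pay rho_dn for every scenario offline (I^s = 0).
    cost1 = sum(rd for rd, x in zip(rho_dn, I) if x == 0)
    return 1 if cost1 <= cost0 else 0
```

Because both candidate values of Z are costed and the cheaper one returned, the result is binary by construction, unlike PHA's fractional weighted average.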
After Z_{i,t} is obtained, ρ at the k-th iteration is updated as in (16). In (16), we update the two penalty factors separately based on the direction in which NAC (9) is violated. Unlike the PHA algorithm, which uses the square of the l-2 norm as the augmenting function and penalizes NAC violations uniformly, the penalty factor update in (16) offers more granular adjustments to satisfy the NAC constraints. Note that the scalar α in (10) is increased exponentially over iterations to improve convergence, per (17), where β ∈ (1, 2]. A detailed explanation of this acceleration factor is given in Chapter 4 of [26].

The third critical consideration is how to accelerate the PBGS solution, given that an individual scenario may have to be solved many times, once per PBGS iteration. The idea behind the Fast PBGS algorithm is to skip an individual scenario that is deemed unnecessary to solve because its optimal solution has already been procured in the previous iteration. We propose the following corollary to facilitate the discussion. Let (I^{s*}_k, p^{s*}_k) denote the optimal solution obtained in scenario s at the k-th iteration of PBGS.

Corollary 1: For scenario s in PBGS, (I^{s*}_k, p^{s*}_k) is also the optimal solution at the (k+1)-th iteration if the following two conditions are satisfied at the k-th iteration: (1) the optimal commitment decisions obtained are the same as the pre-update implementable, per (18); and (2) the post-update implementable remains unchanged, per (19).

The proof of Corollary 1 is provided in the Appendix. Note that Corollary 1 is made possible in the Fast PBGS by how the implementable is calculated in (15): in the first condition (18), both the UC decision variables and the implementable always take binary values in each PBGS iteration. This is not the case in PHA, as its implementable is fractional-valued during iterations.
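The two conditions of Corollary 1 reduce to a cheap equality check; a minimal sketch follows (the function name and flat-vector representation are our own, not the paper's):

```python
def can_skip(I_opt, Z_pre, Z_post):
    """Scenario-skipping test per Corollary 1 (a sketch, not the paper's code).

    I_opt : scenario's optimal commitment vector at iteration k
    Z_pre : implementable before the update at iteration k
    Z_post: implementable after the update at iteration k
    The scenario need not be re-solved at k+1 if (18) the scenario's optimal
    commitments already equal the pre-update implementable and (19) the
    implementable did not change after the update.
    """
    return I_opt == Z_pre and Z_post == Z_pre
```

Because PBGS keeps both I and Z binary, these comparisons are exact, which is precisely why the same test cannot be applied under PHA's fractional implementable.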
We further propose the following proposition to generalize Corollary 1 to the entire iterative process of the Fast PBGS.
Proposition 1: Scenario s does not need to be solved as long as equations (18) and (19) hold in successive iterations.
Proposition 1 can be easily proven by induction. Based on the above discussion, we show the proposed Fast PBGS algorithm in A1, in which indices i and t are omitted for brevity.
It is worthwhile to discuss the initialization of the Fast PBGS, as shown in Line 5 of A1. The PBGS algorithm in [26] suggests initializing Z as in (21), or by rounding the weighted average of all unit statuses across scenarios for the respective unit and time period, as in (22). Here, we propose a third initialization method: use the unit statuses of the scenario ŝ ∈ S from Line 3 of A1 that has the maximum number of online unit statuses. According to our numerical experience, the third initialization method can yield results faster than the first two.
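The third initialization, picking the scenario with the most online unit statuses, can be sketched as follows (the function name and the assumption that each scenario's commitments are a flat binary list are ours):

```python
def init_implementable(I_by_scenario):
    """Initialize Z from the scenario with the most online unit-hours.

    I_by_scenario : list over scenarios, each a flat binary commitment list
    Returns a copy of the commitments of scenario s_hat, the scenario whose
    UC solution has the maximum number of online unit statuses.
    """
    # argmax over scenarios of the count of online (==1) statuses
    s_hat = max(range(len(I_by_scenario)), key=lambda s: sum(I_by_scenario[s]))
    return list(I_by_scenario[s_hat])
```

Ties go to the lowest-index scenario here; the original does not specify a tie-breaking rule.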

III. NUMERICAL RESULTS
In this section, we validate the proposed algorithms on two systems. (1) The IEEE 118-bus system [31]: the modified IEEE 118-bus system has 118 buses, 186 branches, and 54 generators, including 10 WTGs. The peak load of the system is 3,974 MW. We generate 10 scenarios, and each scenario contains 57,048 constraints, 51,577 continuous variables, and 4,032 integer variables. (2) The ERCOT-like system: the ERCOT-like system has 7,226 buses, 8,853 branches, and 725 generators, including 178 WTGs. The total installed capacity is over 80,000 MW, including 21,582 MW of wind. We generate 30 scenarios on this system. Each scenario has 1,595,953 constraints, 1,837,921 variables, and 51,240 first-stage UC decisions. After applying the line-screening technique, the number of non-zero coefficients in CPLEX is reduced from 1,013,282,556 to 10,439,695. An autoregressive moving average (ARMA) model is used to generate scenarios with uncertain wind and load in each system. Specifically, a standard deviation (STD) of 3% in load and 6% in wind is used on the IEEE 118-bus system (this scenario set is called IEEE118-10-s1), whereas STDs of 6% and 10% for load and wind, respectively, are used on the ERCOT-like system. Note that the focus of this work is to propose and validate a novel PBGS solution for S-NCUC. The proposed solution can accommodate any state-of-the-art forecasting and scenario generation techniques.
All algorithms, including EF, PBGS, and FW-PHA are implemented in MATLAB within the FESTIV framework [27]. The MIP solver used is CPLEX 12.9 in GAMS 28.1. We set the CPLEX MIP optimality gap to 0.1% on the IEEE 118-bus systems and 1% on the ERCOT-like system. The computing platform is Intel Xeon CPU E5-2640 dual processors with 256 GB RAM.
[Algorithm A1 (Fast PBGS), excerpt: for each s in S, the scenario subproblem (I^s, p^s, φ^{s,l}) ← argmin is solved only when the skip test on Z^l, Z^{l−1}, and I^s in Line 9 is not met; otherwise the scenario is skipped.]

A. EVALUATION OF FAST PBGS RESULTS
In this subsection, we evaluate the quality of the proposed Fast PBGS solutions using the following methods for comparison: (1) EF: a primal method, which solves the EF of S-NCUC directly with the CPLEX MIP solver; (2) FW-PHA: a lower-bound method [25], which is initialized with a Fast PBGS solution as a hot start. All gap and difference calculations are based on the following equation [32]:

gap (%) = (ϕ_PBGS − ϕ_x) / ϕ_x × 100,  (24)

where ϕ_x represents any one of EF, FW-PHA, or PHA.
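The gap and difference metric can be expressed compactly; the sketch below reflects our reading of the sign convention (a negative value means the PBGS operating cost is below the reference), which matches how negative differences are interpreted in the results:

```python
def gap_percent(phi_pbgs, phi_x):
    """Percentage gap of the PBGS objective relative to a reference phi_x
    (EF, FW-PHA, or PHA). Negative means PBGS is cheaper than the reference.
    The exact formula is our reconstruction, not copied from the paper.
    """
    return 100.0 * (phi_pbgs - phi_x) / phi_x
```

For example, a PBGS cost 1.48% above the FW-PHA bound yields a gap of 1.48, while a PBGS cost slightly below EF yields a small negative value.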

1) THE IEEE 118-BUS SYSTEM
The operating cost by PBGS under different penalty factor ρ, and that of EF and FW-PHA are listed in Table 1.
In Table 1, the PBGS solution under all ρ values except 10,000 is very close to the EF solution, with a gap of less than 0.05%. When ρ = 1000, the PBGS difference with respect to EF is −0.01%, signifying that its operating cost is slightly lower (better) than that of EF. The negative gap can be attributed to the MIP optimality gap of 0.1% set in CPLEX: the EF solution is not truly optimal because of this optimality gap, which is why it is marginally higher than the PBGS solution. For all ρ, the PBGS gap with respect to EF is less than 0.4%. The EF solution, based on a state-of-the-art commercial MIP solver, is one of the most successful approaches for solving small- to medium-scale S-NCUC problems. Here, we use the EF solution as a benchmark for the PBGS solution wherever the EF solution is available. It is observed that the difference in operating cost between the proposed Fast PBGS method and EF is very small on the IEEE 118-bus system. The results demonstrate the effectiveness of the proposed PBGS solution on the medium-scale S-NCUC problem.
The last column in Table 1 provides two values: one for the Fast PBGS and the other, in parentheses, for the PBGS without the proposed accelerating technique. Both methods yield the same solution but with different solution times. It is seen that the Fast PBGS is roughly 15% to 45% faster than the PBGS without the accelerating technique. Simulations for different ρ values were carried out while fixing γ = 1.0 and β = 1.1.

2) THE ERCOT-LIKE SYSTEM
It is important to note that EF is not implemented on the ERCOT-like system due to its enormous computational burden (computationally infeasible). Therefore, only FW-PHA is executed as the lower-bound method. Three simulations are carried out with different penalty factors ρ. Table 2 shows the PBGS operating cost and the gap with respect to FW-PHA. It is seen that the PBGS yields an exact solution with a gap of less than 0.5% for the first two penalty factors chosen (ρ = 5,000 and ρ = 10,000). For ρ = 50,000, the PBGS solution results in a higher operating cost and a larger gap (i.e., 1.48%) than those with a smaller ρ. However, a duality gap of 1.48% is acceptable on such a large-scale S-NCUC.
It is instructive to discuss the computational performance of PBGS, as this is a common concern in large-scale applications. In Table 2, the column labeled ''PBGS solution time in sequence'' shows the solution time of stochastic DA-NCUC on the ERCOT-like system with 30 scenarios, where we solve each scenario sequentially without parallelization. Analogous to Table 1, two values, one for the Fast PBGS and the other, in parentheses, for the PBGS without the accelerating technique, are shown in the last column of Table 2. It is seen that the Fast PBGS is 34% to 39% faster than PBGS. In addition, we find that the time saving is more significant as the number of scenarios increases.
With an increase in ρ, we observe a similar trend to that on the IEEE 118-bus system. Clearly, if one were to implement the proposed algorithm in day-ahead operation, the time taken to solve each scenario of the ERCOT-like system sequentially is far from practical. The solution time can be drastically shortened using parallel computing, as shown in the column labeled ''estimated PBGS solution time in parallel'' in Table 2. The time estimate is based on the longest time taken by a scenario in each iteration, plus the time for the MATLAB-GAMS interaction through file I/O activities. The FESTIV environment we use is in an academic setting and currently does not support parallel execution. We understand that a parallel computing environment is not standard in ISOs or utilities and may require hardware upgrades and adjustments to the existing day-ahead market timeline if the stochastic method is implemented.
Computational efficiency is a grand challenge in the power system community when stochastic NCUC is applied to an extremely large-scale system such as the ERCOT-like system. Since the EF is not likely to be solvable for a reasonable number of scenarios, decomposition techniques are required. To the best of our knowledge, the state-of-the-art solutions for such large-scale real-world applications are heuristic, such as terminating the algorithm prematurely to save iterations and using a rule-based method to coordinate the implementable afterward. This paper is not intended to solve this grand challenge completely. Instead, we provide an alternative method to obtain the exact (primal) solution of S-NCUC. Our simulation results demonstrate that Fast PBGS is a worthy alternative to the PHA, since it results in a lower objective value with faster convergence. In tandem with the proposed scenario-skipping technique, the superior convergence leads to significant savings in computational time.
Fast PBGS can still be used for short-term (e.g., week-ahead) stochastic planning studies and for DA NCUC on systems smaller than the ERCOT-like system. The Fast PBGS algorithm also lends itself to parallel computing, especially when the number of scenarios exceeds the number of available computing cores.

B. COMPARISON OF FAST PBGS WITH PHA
Since the scope of this paper is restricted to scenario-based decomposition methods, we also confine the comparison to this approach. The literature also contains stage-wise decomposition approaches, such as Benders decomposition. However, a stage-wise decomposition approach may eventually grow into a computationally intractable problem as iterations progress, for the problem sizes considered in this paper. Therefore, a cross-comparison between scenario-based and stage-based decomposition methods is out of this paper's scope and will be systematically conducted in future work. Figure 1 shows a comparison of the Fast PBGS with the PHA for the IEEE 118-bus system. In Fig. 1, the second y-axis is the percentage difference in the objective value per (24). A negative percentage difference indicates that the Fast PBGS leads to a better solution in terms of the objective value, and vice versa. Table 3 lists the data values of Fig. 1. The comparison between the Fast PBGS and PHA is carried out with five different ρ values ranging from 500 to 20,000. The Fast PBGS has one additional tuning parameter, β, used in the calculation of the convergence accelerator α in (17). For each of these five ρ values, we use two β values, 1.11 and 1.25. It is seen in Fig. 1 that when β is 1.25, the Fast PBGS solves faster than when β is 1.11, but with a slightly higher objective value. The difference in solution time between the Fast PBGS and the PHA is much more apparent at a lower ρ. Overall, the Fast PBGS yields a much faster solution than the PHA. Using the Fast PBGS, we observe an average of 42% and 61% savings in solution time with β equal to 1.11 and 1.25, respectively.

1) THE IEEE 118-BUS SYSTEM
In Fig. 1, all Fast PBGS simulations converge with zero violations. In contrast, none of the PHA solutions converges except for the case when ρ = 10000. The convergence tolerance for PHA is set to 0.01. At a lower ρ, we set a solution time limit of 180 minutes when PHA does not converge. At a higher ρ (i.e., 5000 and 20000), the PHA algorithm is terminated if no progress in violation is made for five consecutive iterations. Additionally, a terminated PHA algorithm requires rounding-off the fractional implementable values obtained, followed by UC to ensure its primal feasibility.
In Fig. 1, for a comparatively small ρ, i.e., ρ = 500 and 1000, the Fast PBGS outperforms the PHA by 2% to 4% in terms of the objective value. This difference is attributed to the rounding of the PHA implementable values, which results in the overcommitment of generators. As ρ increases, the difference in the objective values decreases. When ρ = 10000, the PHA converges and no rounding is required; the PHA then performs slightly better, by around 0.5%, than the Fast PBGS. The above results in Fig. 1 suggest that the significant improvement in computational efficiency makes the Fast PBGS a better choice than the PHA on the medium-scale system.

2) THE ERCOT-LIKE SYSTEM
For the ERCOT-like large-scale system, we set β = 1.11 and focus exclusively on sensitivity studies with respect to ρ. Comparative results between the Fast PBGS and the PHA are shown in Fig. 2. It is seen that the Fast PBGS outperforms the PHA in terms of solution time, whereas the objective value of the PHA is slightly better than that of the Fast PBGS in all three cases. The average execution time saved by the Fast PBGS is 50%, at an average cost increase of 0.48% in the objective value across the three cases. None of the PHA solutions converges; the closest is ρ = 50000, where the violation reaches 0.06 before triggering the stopping criterion. Note that the convergence tolerance for PHA is set to 0.01, and the PHA simulation is terminated if no progress in violation is made for five consecutive iterations. Additionally, a terminated PHA simulation requires rounding off the fractional implementable obtained, followed by a UC run to ensure primal feasibility. In contrast, these additional heuristics and steps are not needed at all for the Fast PBGS, since its convergence is guaranteed and the solution obtained is directly feasible. It is worth mentioning that in the case studies of this paper, we never experienced the cyclic behaviors of the PHA reported in [33]; the performance of the PHA may therefore appear better than it would if potential cyclic behaviors were encountered. Furthermore, some heuristic enhancements [33], such as using a large MIP gap for initial iterations and initializing successive iterations from the previous solution, can be applied to both the Fast PBGS and the PHA to further improve their computational efficiency. The results in Fig. 2 show that the significant improvement in computational efficiency also makes the Fast PBGS a better choice on the practically sized large-scale system.
As mentioned before, the Fast PBGS skips solving a scenario when certain conditions are met. Fig. 3 shows the actual number of scenarios solved at each iteration. All 30 scenarios are solved in the first three iterations. Any change in the implementable solution between iterations requires solving all 30 scenarios, as shown at Iterations 7, 9, 11, 16, 22, 24, and 40.
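The skip logic can be sketched as follows. This is a minimal illustration, not the paper's exact conditions: it assumes a scenario subproblem may be skipped when the implementable (first-stage) commitment is unchanged since the last iteration and the scenario's own commitment already agrees with it; the function names and data layout are hypothetical.

```python
def scenarios_to_solve(z_new, z_old, scenario_solutions):
    """Return the scenarios that must be re-solved this iteration.

    Hypothetical skip rule: a scenario is skipped when the implementable
    solution is unchanged since the previous iteration AND the scenario's
    own commitment already matches it.
    z_new, z_old: dicts mapping (unit, hour) -> 0/1 commitment.
    scenario_solutions: dict mapping scenario name -> commitment dict.
    """
    if z_new != z_old:
        # Any change in the implementable forces re-solving all scenarios.
        return list(scenario_solutions)
    # Otherwise, only scenarios that disagree with the implementable
    # need another subproblem solve.
    return [s for s, x in scenario_solutions.items() if x != z_new]
```

With 30 scenarios, this reproduces the behavior described above: iterations where the implementable changes solve all 30 subproblems, while other iterations solve only the disagreeing subset.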

C. OUT-OF-SAMPLE TESTING
For the out-of-sample testing, we create 25 sets, each containing 50 scenarios, and conduct simulations using the EF, the Fast PBGS, and the PHA. The scenarios are generated using ARMA models with a 4% STD for load and an 8% STD for wind. Unlike the simulations in the previous section, which used (21) to initialize the implementable solution, here we use the initialization method in (22). For both the Fast PBGS and the PHA, we do not include any of the heuristic-based improvements suggested in the literature [33]. All the Fast PBGS solutions converge to zero violations, whereas only 10 out of 25 sets converge under the PHA. The PHA is terminated at a violation threshold of 0.1 or at 60 iterations, whichever comes first. Fig. 4 compares the objective values of the three methods using box-and-whisker plots. As seen, the median production cost obtained by the EF method is $885,667, the lowest among the three methods. In contrast, the median production costs of the Fast PBGS and the PHA are $893,280 and $906,628, respectively. The standard deviations for the EF and the Fast PBGS are $20,339 and $23,367, respectively, whereas that of the PHA is $29,836, higher than the other two methods. The results in Figs. 4 and 5 demonstrate that the Fast PBGS is a worthy alternative to the PHA since it yields a better production cost with much superior computational efficiency. When comparing the Fast PBGS with the EF at 50 scenarios, the Fast PBGS leads to a slightly higher production cost but better computational efficiency. The advantage in computational efficiency becomes even more apparent with a larger number of scenarios, because the computation time of the EF increases exponentially with the number of scenarios, whereas that of the Fast PBGS increases linearly [21].
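The out-of-sample scenario generation can be sketched as below. This is an illustrative ARMA(1,1)-based sampler, not the paper's exact model: the coefficients `phi` and `theta` are assumed values, and the relative STD argument plays the role of the 4% (load) and 8% (wind) settings described above.

```python
import numpy as np

def arma_scenarios(forecast, rel_std, n_scenarios, phi=0.7, theta=0.2, seed=0):
    """Generate scenarios around a forecast via an ARMA(1,1) error model.

    Hypothetical parameters: phi and theta are illustrative; rel_std scales
    the error series so deviations are on the order of rel_std * forecast
    (e.g. 0.04 for load, 0.08 for wind).
    """
    rng = np.random.default_rng(seed)
    T = len(forecast)
    out = np.empty((n_scenarios, T))
    for s in range(n_scenarios):
        e = rng.normal(0.0, 1.0, T)     # white noise driving the ARMA process
        err = np.zeros(T)
        for t in range(1, T):
            err[t] = phi * err[t - 1] + e[t] + theta * e[t - 1]
        out[s] = forecast * (1.0 + rel_std * err)
    # Load and wind realizations cannot be negative.
    return np.clip(out, 0.0, None)
```

Calling this once per uncertainty source (load and wind) with the respective relative STD produces one 50-scenario set; repeating with 25 seeds gives the 25 out-of-sample sets.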

D. DISCUSSION ON PARAMETERS
The Fast PBGS uses parameters β, ρ, and γ, and it is therefore necessary to select proper values for them. The parameter β is used in (17) to accelerate the convergence. The impact of increasing β from 1.11 to 1.25 on the IEEE 118-bus system is shown in Fig. 1. A high value of β makes the convergence faster at the cost of a decrease in solution quality. As the iteration count grows, α in (17) increases exponentially and inflates the penalty term. This may result in loss of load or additional load slack. This situation can be mitigated by increasing the VOLL, or by increasing α adaptively, based on the progress toward convergence, only when needed instead of at every iteration.
We have conducted simulations with a wide range of ρ values for the IEEE 118-bus system and the ERCOT-like system. Like β, a high value of ρ makes the convergence faster at a compromise in solution quality. While ρ is used as the initial penalty, the step size γ is applied at every iteration in (16). Although we keep ρ and γ equal in the results of this paper, we also experimented with different values: a large ρ with a small γ took more iterations to eliminate the violation, whereas a small ρ with a large γ made little difference compared with setting ρ and γ equal. Based on our simulations, the suggested initial value of ρ is 5,000 to 10,000 for a medium-sized system and 10,000 to 50,000 for a large system. For the step size γ, we recommend setting it equal to ρ, and the acceleration factor β can be set between 1.11 and 1.25. However, the above parameter selection is empirical and not conclusive; more theoretical analyses will be performed in our future work.

IV. CONCLUSION
In this paper, we propose a Fast PBGS algorithm that uses an exact augmented Lagrangian function to obtain a primal-feasible solution to S-NCUC. For a quantitative evaluation of the PBGS, we use an FW-PHA technique, initialized by the PBGS solution, to calculate a computationally efficient, high-quality lower bound. Numerical results on the IEEE 118-bus system show that the Fast PBGS yields high-quality exact solutions as good as those obtained by solving the EF of S-NCUC with a commercial MIP solver. Numerical results on the ERCOT-like large-scale system validate that the proposed method can effectively solve a real-world S-NCUC problem with a small optimality gap. In both systems, an increase in the chosen penalty factor produces a faster solution but a higher objective value. Therefore, fine-tuning of the penalty factor is needed to achieve a trade-off between solution quality and computational efficiency.
Through a comprehensive comparison between the Fast PBGS and the PHA, it is shown that the proposed PBGS algorithm saves a significant amount of computational time while yielding comparably good solutions. As opposed to the PHA, the Fast PBGS requires no additional heuristics or post-processing steps, since its convergence is guaranteed and the solution obtained is directly feasible. This is a salient feature of the Fast PBGS algorithm, which is shown to be a worthy alternative to the PHA. Additionally, the proposed Fast PBGS algorithm is easy to implement as a wrapper over the existing deterministic UC in the ISO's current practice. Our future work will focus on the cross-comparison between scenario-based and stage-based decomposition methods. Furthermore, we will improve the computational efficiency of the Fast PBGS algorithm by using the expediting techniques in [33] and [34].