Oil-Price Based Long-Term Hourly System Marginal Electricity Price Scenario Generation

We synthesize scenarios of the hourly electricity price, known as the system marginal price (SMP), for thirty years based on the oil price. Hourly SMP scenarios are very important when planning new generators because the revenue and cost of new capacity margins are determined by the SMP. Because the SMP contains both short-term and long-term periodic patterns, designing a single model that captures all of these patterns is difficult. Although the oil price affects the SMP, it cannot be used directly in the forecasting model because the SMP has an hourly resolution, whereas the oil price has a yearly resolution. To overcome these problems, we decompose the SMP into annual, monthly, and daily components and predict each component with a different model. The model for the annual component (AC) is designed to predict the long-term trend based on fuel price scenarios. The model for the monthly component (MC) is designed to predict the seasonal trends based on the long short-term memory (LSTM) model. The model for the daily component (DC) is designed to predict the daily SMP fluctuation. Finally, we synthesize SMP scenarios by aggregating the three components. We generate three types of SMP scenarios (high, reference, and low) and test their performance against two years of observed data on the basis of the mean absolute error (MAE). Because of the global COVID-19 pandemic, the low SMP scenario is the most accurate. We also verify that the reliability of long-term scenarios can be secured by using the oil price while maintaining monthly and daily patterns.


I. INTRODUCTION
The entry of new generators depends on predicted electricity price, which is known as the system marginal price (SMP). The SMP is the electricity price in electricity networks without any binding constraints of transmission limits. The SMP represents the most expensive variable cost of all on-line generators [1]. When planning a new generator, a long-term prediction of hourly SMP is indispensable because it directly relates to their profits. However, the SMP is highly volatile and affected by many factors, such as electricity demand, weather, fuel prices, economic conditions, and date [2]. Thus, predicting hourly SMPs over several years is difficult because those factors fluctuate for several years.
Forecasting long-term electricity prices has received less attention because of uncertain factors [3]. Furthermore, the increasing penetration of renewable energy sources (RESs) has brought variability to electricity prices because of their uncertain and variable characteristics [4]. Therefore, the importance of predicting the SMP has increased.
The associate editor coordinating the review of this manuscript and approving it for publication was Fabio Massaro.
In this study, our goal is to predict hourly SMP scenarios for 30 years based on oil prices. However, we cannot expect high accuracy when predicting values over such a long horizon, so we focus on synthesizing several scenarios of electricity prices. There have been studies on mid- and long-term forecasting of the hourly electricity price. In [5], Hossam et al. predicted the hourly electricity price in Australia for a month using hourly load, hourly weather conditions, and hourly natural gas prices. However, it is very difficult to obtain hourly load, weather conditions, and fuel prices for 30 years. Moreover, the prediction intervals of their forecasts were one year. In [6], Florian et al. predicted hourly electricity prices in Germany and Austria for 3 years in the day-ahead electricity market by simulating the equilibrium between supply and demand curves daily. However, their model is not appropriate for our goal either because they did not consider fuel prices and focused on detecting price spikes.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
When designing the forecasting model, input variables are divided into two categories: historical price and exogenous variables [7]. A forecasting model using the historical price learns the inherent patterns of the price and anticipates the price based on recent trends. A forecasting model using exogenous variables learns the relationship between the price and the exogenous variables and predicts the price from the future values of those variables. The former is reliable for short-term horizons, but its error grows as the forecasting horizon increases [8]. For short-term horizons, the latter is less reliable than the former because of the uncertainty inherent in the future exogenous variables. However, for long-term horizons, the reliability of the latter is constant regardless of the forecasting horizon. Therefore, we design the model based on exogenous variables to obtain long-term reliability. Among the many factors affecting electricity prices, we consider gas and oil prices as candidate exogenous variables because the SMP is generally determined by gas and oil prices [9]. By comparing the performance of gas and oil prices, we choose the oil price as the exogenous variable. In addition, hourly gas and oil prices for 30 years would be required to predict the hourly SMP directly, but gas and oil prices are measured annually, so we introduce a decomposition strategy based on temporal granularity. In this study, we synthesize electricity price scenarios for 30 years while reflecting the statistical characteristics of electricity prices based on the decomposition strategy. To apply the decomposition strategy, we use the fact that the electricity price consists of a short-term seasonal component (STSC) and a long-term seasonal component (LTSC) [10]. 
The STSC accounts for the daily seasonality, which is due to the business activities of consumers. The LTSC is dominated by an irregular cyclic component that depends on macro-economic variables such as gas and oil prices. In addition, there is a discernible pattern in the electricity price between summer and winter [11]. Therefore, we decompose the SMP into three parts: annual component (AC), monthly component (MC), and daily component (DC) to account for the LTSC, annual seasonality, and STSC, respectively. The AC is predicted using oil prices to reflect long-term patterns. The MC and DC are predicted to have monthly and intra-day patterns, and we assume that those patterns do not change over the 30 years. By summing all the components, we forecast the SMP.
The contributions of our research are as follows:
• We establish a long-term hourly SMP scenario generation strategy that reflects future oil prices while maintaining the periodic patterns in the SMP.
• We analyze the effect of gas and oil prices on the SMP in terms of long-term trends.
• We devise the LSTM structure to generate the seasonal patterns of SMP.
This paper is organized as follows. Section II explains our forecasting framework, decomposition strategy, and the meaning of each decomposed component. In Section III, we introduce several algorithms for our research. Section IV contains the experimental set-up and shows the experimental results. The last section concludes this study.

II. FORECASTING STRATEGY
In this section, the overall forecasting process and decomposition strategy are described. In addition, the effect of gas and oil price on the SMP is analyzed.

A. OVERALL PROCESS
The overall SMP scenario generation strategy is illustrated in Fig. 1. The SMP is decomposed into AC, MC, and DC. The AC accounts for the long-term trends of SMP, the MC accounts for the seasonal trends of SMP, and the DC accounts for the daily patterns of consumer behaviour. We extract the AC from SMP, and the residual is defined as the first residual (FR). Then, we extract the MC from the FR, and the residual is defined as the second residual (SR).
After the decomposition, we predict the three components individually. First, we verify the correlation between the SMP and fuel prices and predict the AC based on the fuel prices. Second, we predict the MC using various algorithms to account for seasonal patterns and then select the best one. We use long short-term memory (LSTM), the gated recurrent unit (GRU), the autoregressive model (AR), and support vector regression (SVR) to predict the MC. LSTM and GRU are designed to have a double layer and to use the output as the next input to generate long-term seasonal trends. Third, we observe that the variability of the SR depends on the AC magnitude and that the SR has a periodic pattern that repeats every day. We model the variability and periodic pattern of the SR using the standard deviation (STD) and mean values for each hour. We predict the mean and STD based on the AC magnitude and construct 24 Gaussian distributions from the predicted mean and STD values. Then, the DC is generated from these distributions. Finally, the SMP scenarios are synthesized by aggregating the three components.
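The final aggregation step can be sketched as follows. This is a minimal illustration, assuming one yearly AC value, twelve monthly MC values, and an hourly DC series for a single year; the function name `aggregate_smp` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def aggregate_smp(ac, mc, dc, hours_per_month):
    """Sum the three components at hourly resolution for one year.

    ac: scalar annual component (piece-wise constant over the year)
    mc: 12 monthly component values
    dc: hourly daily-component values for the whole year
    hours_per_month: 12 integers, number of hours in each month
    """
    dc = np.asarray(dc, dtype=float)
    # Expand each monthly value over the hours of its month.
    mc_hourly = np.repeat(np.asarray(mc, dtype=float), hours_per_month)
    if mc_hourly.shape != dc.shape:
        raise ValueError("hours_per_month does not match the length of dc")
    return ac + mc_hourly + dc
```

Repeating this for each forecast year and each oil price scenario would yield the high, reference, and low hourly series.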

B. DATA SET
The hourly SMP data is collected from the electricity market of South Korea from 2002 to 2020. To investigate the correlation between SMP and fuel prices, we collect two crude oil and two natural gas price indices. The crude oil prices are West Texas Intermediate (WTI) and Brent Crude price indices. The natural gas prices are from the Henry Hub (HH) and National Balancing Point (NBP) price indices. Unlike the hourly SMP data, the fuel prices are given daily. The overall SMP and fuel price data are presented in Fig. 2. The units are as follows: SMP is in South Korean Won per kilowatt hour (KRW/kWh), the crude oil prices are in dollars per barrel ($/bbl), and the natural gas prices are in dollars per million British thermal units ($/mmBtu). The SMP and fuel prices from 2002 to 2018 are used to train the forecasting model, and we predict the SMP from 2019 to 2050. The SMP from 2019 to 2020 is used to measure the performance of the forecasting model. The overall dataset is provided in [12] or it is available on: https://github.com/ByoungryulOh/ForPaper-Oil-Price-Based-Long-Term-Hourly-System-Marginal-Electricity-Price-Scenario-Generation.git.

C. ANNUAL COMPONENT
The AC is calculated by averaging the SMP year by year, so the AC can be expressed as a function of the year. The AC of year $y$, $AC(y)$, is obtained by averaging $SMP(y, m, d, h)$, the SMP at the $h$-th hour of the $d$-th day of the $m$-th month of the $y$-th year, over the year:
$$AC(y) = \frac{1}{24 D_y} \sum_{m=1}^{12} \sum_{d=1}^{D_{y,m}} \sum_{h=1}^{24} SMP(y, m, d, h) \tag{1}$$
where $D_y$ is the number of days in the $y$-th year, and $D_{y,m}$ is the number of days in the $m$-th month of the $y$-th year. Then, the first residual, $FR(y, m, d, h)$, is obtained by subtracting the AC from the SMP, which is given by:
$$FR(y, m, d, h) = SMP(y, m, d, h) - AC(y) \tag{2}$$
where $AC(y)$ is piece-wise constant at hourly resolution because $AC(y)$ takes a single value for each year. The AC of the SMP and fuel prices are presented in Fig. 3. This component reflects the long-term trend of fuel prices, and we predict the AC based on them because the patterns of the SMP and fuel prices are similar only in the AC. The FR denotes the SMP excluding the AC, and we then extract the MC from the FR.

D. MONTHLY COMPONENT
The MC represents seasonal trends. This component is calculated by averaging $FR(y, m, d, h)$ over every month:
$$MC(y, m) = \frac{1}{24 D_{y,m}} \sum_{d=1}^{D_{y,m}} \sum_{h=1}^{24} FR(y, m, d, h) \tag{3}$$
The MC is also piece-wise constant at hourly resolution because the monthly average is held fixed over the $D_{y,m} \times 24$ hours of each month. The MC of the SMP and fuel prices at monthly resolution is presented in Fig. 4. Empirically, the SMP is lower in summer than in winter due to electrical heating demand in cold weather. Fig. 5 represents the annual pattern of the MC. We can observe that the MC is low in August and September and high in December and January. Then, we define the remaining SMP as the SR. The second residual $SR(y, m, d, h)$ is the remaining hourly SMP after extracting the AC and MC, calculated as:
$$SR(y, m, d, h) = FR(y, m, d, h) - MC(y, m) \tag{4}$$
The SR accounts for intra-day fluctuations.
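The AC, FR, MC, and SR definitions above can be sketched for a single year as follows. This is a minimal illustration, assuming the hourly SMP of one year is available as a flat array; the function name `decompose_year` and the constant `DAYS_PER_MONTH_2019` are hypothetical.

```python
import numpy as np

# Days per month for a non-leap year such as 2019 (illustrative constant).
DAYS_PER_MONTH_2019 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def decompose_year(smp_hourly, days_per_month):
    """Split one year of hourly SMP into AC (scalar), MC (12 values), and SR (hourly)."""
    smp_hourly = np.asarray(smp_hourly, dtype=float)
    hours = 24 * np.asarray(days_per_month)
    if hours.sum() != smp_hourly.size:
        raise ValueError("days_per_month does not match the number of hourly samples")
    ac = smp_hourly.mean()                      # annual average of the SMP
    fr = smp_hourly - ac                        # first residual
    month_edges = np.cumsum(hours)[:-1]         # indices where each new month starts
    mc = np.array([seg.mean() for seg in np.split(fr, month_edges)])  # monthly average of FR
    sr = fr - np.repeat(mc, hours)              # second residual
    return ac, mc, sr
```

By construction, the SR of every month averages to zero, and adding the AC, the hourly-expanded MC, and the SR recovers the original SMP.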

E. DAILY COMPONENT
Then, we extract the DC from the SR. The original SMP and its decomposed results are presented in Fig. 8. The SR reflects the short periodicity in the SMP and is highly volatile because of the intra-day load fluctuations. When the load fluctuates intensely, the marginal generator also changes frequently. Thus, the daily load pattern affects the fluctuation frequency of the SMP because the marginal generator determines the SMP. In addition, when the marginal generator's fuel type changes due to a sharp load fluctuation, the difference between fuel prices affects the fluctuation magnitude of the SMP. Because the granularity of the load pattern is hourly, the effect of load fluctuation remains in the SR. However, we cannot predict the intra-day load pattern over 30 years due to the variability and lack of data, so it is hard to predict the SR using load patterns. Therefore, we model the statistical characteristics of the SR and generate the future SR from them. Based on Fig. 8, we can see that when the AC is high, the SR fluctuates intensely, and when the AC is low, the SR is relatively stable. We estimate the magnitude of the SR fluctuation using the standard deviation (STD), and the STD versus the AC magnitude is illustrated in Fig. 6. We can see that as the AC magnitude increases, the STD also increases. Thus, we model the statistical characteristics of the SR as functions of the AC magnitude. Because the AC is determined every year, we need to analyze the SR year by year. Furthermore, since the SR has a pattern that repeats every day, we present the SR by hour in Fig. 7. Although the amplitude is different every year, the SR has a standard daily pattern: a decrease in the early morning and an increase in the daytime. To reflect the daily pattern, we consider averages for each hour.
Consequently, when we predict the SR, we need to model the STD and mean values for each hour as the AC magnitude changes. We assume that the SR at each hour follows a Gaussian distribution because a Gaussian distribution is fully determined by its mean and STD. We predict the future STD and mean values of each hour based on the AC magnitude and construct twenty-four Gaussian distributions using the predicted STD and mean values. Then, we generate the future SR from these distributions and define it as the DC because this component reflects the intra-day fluctuation of the SMP.
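Sampling the DC from twenty-four hourly Gaussian distributions can be sketched as follows. This is a minimal illustration, assuming the 24 predicted means and STDs are already available; the function name `sample_daily_component` and the fixed random seed are hypothetical choices, and independence between hours is an assumption of this sketch.

```python
import numpy as np

def sample_daily_component(hourly_mean, hourly_std, n_days, seed=0):
    """Draw an hourly DC series of length 24 * n_days from 24 Gaussian distributions."""
    hourly_mean = np.asarray(hourly_mean, dtype=float)
    hourly_std = np.asarray(hourly_std, dtype=float)
    if hourly_mean.shape != (24,) or hourly_std.shape != (24,):
        raise ValueError("expected 24 means and 24 STDs, one per hour of the day")
    rng = np.random.default_rng(seed)
    # Each column (hour of day) uses its own mean and STD; rows are days.
    dc = rng.normal(loc=hourly_mean, scale=hourly_std, size=(n_days, 24))
    return dc.reshape(-1)  # flatten to day-major hourly order
```

For a 365-day year, `sample_daily_component(mean, std, 365)` returns 8,760 hourly values.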

F. EFFECT OF FUEL PRICES ON SMP
In this subsection, we analyze the effect of fuel prices on the SMP. To estimate the effect, we use correlation analysis at different time-scales. A higher absolute correlation means that the movement of the SMP becomes more predictable as the fuel price changes. If the correlation index is one, the SMP increases linearly with fuel prices, and if the correlation index is minus one, the SMP decreases linearly as fuel prices increase. If the correlation index is zero, the SMP has no linear relationship with fuel prices. First, we calculate the correlation between the SMP and fuel prices at daily resolution because the original fuel prices are given daily. To match the time-scale, we calculate the average daily SMP. The correlation is then calculated for the daily values, the AC, and the MC of the SMP and fuel prices (Table 1). Among the time-scales, the correlations between AC values are the highest. Among the fuel prices, Brent Crude shows the highest correlation. Therefore, we can say that the SMP follows the Brent Crude oil price at yearly resolution. As a result, we can predict the AC of the SMP given future Brent Crude prices.
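The correlation analysis at matched time-scales can be sketched as follows. This is a minimal illustration, assuming the Pearson coefficient is the correlation index used; the helper names `daily_mean` and `pearson` are hypothetical.

```python
import numpy as np

def daily_mean(hourly):
    """Average an hourly series (length divisible by 24) into daily values."""
    hourly = np.asarray(hourly, dtype=float)
    if hourly.size % 24 != 0:
        raise ValueError("hourly series must contain whole days")
    return hourly.reshape(-1, 24).mean(axis=1)

def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])
```

The same `pearson` call can be applied to the yearly AC series or the monthly MC series of the SMP and each fuel price index once both are brought to the same resolution.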

III. FORECASTING ALGORITHMS
In this section, we introduce the forecasting algorithms: LSTM, GRU, AR, and SVR.

A. LSTM
LSTM is a type of recurrent neural network (RNN), which is specialized for chronologically ordered data. Since LSTM was introduced [13], its outstanding performance on time-series data has been demonstrated by many studies [14]. To compare the RNN and LSTM, we introduce the RNN first. In an RNN, the combination of the ordered input data $X_t$ and the ordered output data $h_{t-1}$ enters the neural network (NN) in the exterior box in Fig. 9 [15]. In Fig. 9, $h_t$ is delivered to the next NN recursively. The delivered output $h_t$ and the next input $X_{t+1}$ constitute the next combination. The neurons in the input layer are multiplied by their respective weights, and their sum enters the neurons in the output layer. The activation function in the output layer is a non-linear function, such as the tanh or sigmoid function. The output of the sigmoid is between 0 and 1 and represents the proportion of the effect that the input signal has on the output signal. The output of tanh is between -1 and 1 and represents the analytic relationship between input and output. Generally, an RNN uses tanh as the activation function. The optimal weights are obtained by the gradient descent method, in which the weights are updated along the negative gradient of the error to minimize the error. In this process, the gradients are calculated through back propagation, which multiplies the local gradients from the output back to the input using the chain rule. During back propagation in an RNN, the local gradients of all former states are multiplied together. When the input of the activation function is a large positive or negative value, the derivative of the activation function approaches zero, so the product of local gradients also approaches zero. Therefore, early inputs to the RNN have less effect on the gradient during the training process. This is called the vanishing gradient problem. The deeper the neural network is, the more frequently this problem occurs. 
LSTM is devised to fix the vanishing gradient problem by introducing the cell state $c$ that memorizes the long-term input [16].
The structure of the LSTM is shown in Fig. 10. To utilize $c$, LSTM introduces forget, input, and output gates. In the forget gate, the output $f_t$ is determined as:
$$f_t = \sigma(W_f x_t + U_f h_{t-1}) \tag{5}$$
where $\sigma$ is the sigmoid function, and $W$ and $U$ are the parameters that set up the relationship between inputs and outputs. Based on the new input $x_t$ and the past output $h_{t-1}$, the output $f_t$ determines the portion of $c_{t-1}$ that is retained.
In the input gate, the output consists of $i_t$ and $\tilde{c}_t$ as:
$$i_t = \sigma(W_i x_t + U_i h_{t-1}) \tag{6}$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1}) \tag{7}$$
In (6), $i_t$ decides the portion of the new input $x_t$ and past $h_{t-1}$ entering into $c_t$. In (7), $\tilde{c}_t$ decides the candidates organizing the new cell state $c_t$. Then, the new cell state $c_t$ is determined as:
$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t \tag{8}$$
where $\otimes$ is the element-wise multiplication. Eq. (8) means that the new cell state $c_t$ is updated based on the previous cell state $c_{t-1}$ and the candidates $\tilde{c}_t$. In the output gate, the outputs $o_t$ and $h_t$ are determined as:
$$o_t = \sigma(W_o x_t + U_o h_{t-1}) \tag{9}$$
$$h_t = o_t \otimes \tanh(c_t) \tag{10}$$
In (9), $o_t$ decides the portion of the new input $x_t$ and previous output $h_{t-1}$. In (10), $\tanh(c_t)$ decides the candidates for calculating the new output $h_t$, and $h_t$ is determined by $o_t$ and $\tanh(c_t)$. Owing to the cell state $c$, during back propagation, the multiplication of local gradients is transformed into a sum of derivatives with respect to $\tilde{c}$ and the three gates. Thus, the number of multiplication steps is the same in all states, which is how LSTM mitigates the vanishing gradient problem.
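One forward step through the forget, input, and output gates of Eqs. (5) to (10) can be sketched as follows. This is a minimal illustration of the standard LSTM cell, assuming no bias terms because the text names only the parameters W and U; the function name `lstm_step` and the dictionary layout of the weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM step. W[k] has shape (units, input_dim), U[k] has shape (units, units)."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)      # forget gate, Eq. (5)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)      # input gate, Eq. (6)
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev)  # candidate cell state, Eq. (7)
    c_t = f_t * c_prev + i_t * c_tilde                 # new cell state, Eq. (8)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)      # output gate, Eq. (9)
    h_t = o_t * np.tanh(c_t)                           # new output, Eq. (10)
    return h_t, c_t
```

Because the cell state is updated additively in the `c_t` line, the gradient with respect to `c_prev` passes through `f_t` rather than through a chain of squashing activations.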

B. GRU
GRU is a simplified variant of LSTM. GRU mitigates the vanishing gradient problem and shows similar performance to LSTM while accelerating the training by reducing the number of parameters [17]. In the same way that LSTM addresses the vanishing gradient problem, GRU also transforms the multiplication of local gradients into a sum of derivatives using gates. GRU has reset and update gates instead of the output, forget, and input gates of LSTM, as shown in Fig. 11. The reset gate works like the output gate of LSTM, and the update gate works like the forget and input gates of LSTM. Fewer gates lead to fewer weights, thereby enabling GRU to train faster.
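The reduction in weights can be made concrete with a parameter count. This is a minimal sketch for the textbook formulations, in which LSTM has four weight blocks (three gates plus the candidate state) and GRU has three (two gates plus the candidate state); the function names are hypothetical, and specific library implementations can differ slightly in how they count bias terms.

```python
def _block_params(input_dim, units, use_bias=True):
    # One block maps [x_t, h_{t-1}] to `units` outputs.
    return units * (input_dim + units) + (units if use_bias else 0)

def lstm_param_count(input_dim, units, use_bias=True):
    return 4 * _block_params(input_dim, units, use_bias)

def gru_param_count(input_dim, units, use_bias=True):
    return 3 * _block_params(input_dim, units, use_bias)
```

For a layer with one input and 50 units, this gives 10,400 parameters for LSTM and 7,800 for GRU, that is, a 25% reduction.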

C. AR
AR is a time-series model that considers the input data as the sum of the autoregressive and noise terms [18]. In particular, AR only utilizes the autoregressive term to establish the relationship between inputs and outputs. AR is formulated as:
$$x_t = \sum_{i=1}^{p} \theta_i x_{t-i} + \varepsilon_t \tag{11}$$
where $\theta_i$ is the weight, $p$ is the order, and $\varepsilon_t$ is the noise term. In (11), $x_t$, which is a scalar value at time $t$, is the weighted sum of past values of $x$ plus $\varepsilon_t$. The order $p$, which establishes the relationship among the values of $x$, must be determined using the Akaike information criterion (AIC) [19]. The AIC is defined as:
$$AIC = N \ln(\sigma^2) + 2p \tag{12}$$
where $\sigma^2$ is the variance of the prediction errors, and $N$ is the total number of data. The optimal $p$ is the order that minimizes the AIC on the training samples. Then, $\theta$ is estimated from the training samples for the given $p$.
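Order selection by AIC can be sketched with ordinary least squares as follows. This is a minimal illustration, assuming a zero-mean series (no intercept term) and the common Gaussian form of the AIC, N ln(σ²) + 2p; the function names `fit_ar` and `select_order` are hypothetical, and the authors' statsmodels-based implementation may differ.

```python
import numpy as np

def fit_ar(x, p, start=None):
    """Least-squares AR(p) fit; returns (theta, aic). Targets are x[start:], start >= p."""
    x = np.asarray(x, dtype=float)
    start = p if start is None else start
    if start < p:
        raise ValueError("start must be at least p")
    # Column i holds the lag-i values x[t-i] for every target x[t].
    lags = np.column_stack([x[start - i: len(x) - i] for i in range(1, p + 1)])
    target = x[start:]
    theta, *_ = np.linalg.lstsq(lags, target, rcond=None)
    sigma2 = np.mean((target - lags @ theta) ** 2)   # variance of prediction errors
    aic = len(target) * np.log(sigma2) + 2 * p
    return theta, aic

def select_order(x, max_p):
    """Return the order in 1..max_p with the smallest AIC, using a common sample."""
    aics = {p: fit_ar(x, p, start=max_p)[1] for p in range(1, max_p + 1)}
    return min(aics, key=aics.get)
```

Using the same `start` for every candidate order keeps the AIC values comparable, because each model is then scored on the same set of targets.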

D. SUPPORT VECTOR REGRESSION
Support vector regression (SVR) is a non-linear regression algorithm. SVR transforms a non-linear regression problem into a linear regression problem using a kernel function and then solves it as an optimization problem. The kernel function maps the input data into a higher-dimensional space, which is called a kernel space. In the kernel space, the input data can be modeled by a linear regressor. In this paper, we use the radial basis function (RBF) as the kernel function [20]. The RBF is given as:
$$K(x_i, X_c) = \exp\left(-\gamma \|x_i - X_c\|_2^2\right) \tag{13}$$
where $X_c$ is a constant value, $x_i$ is the input data, $\|\cdot\|_2$ is the $l_2$ norm, and $\gamma$ determines the distribution of mapped data. Therefore, the RBF is a function of the distance between the constant value and the input data. Owing to the RBF, the input data $x_i$ is modelled by a linear regression in the kernel space. SVR must find the optimal regression line, which is called the optimal hyperplane, by minimizing the loss between the produced and actual values. To calculate the loss, SVR introduces the ε-insensitive loss function, which ignores losses smaller than ε:
$$L_\varepsilon(y, x) = \begin{cases} 0, & |y - (w^T x + b)| \le \varepsilon \\ |y - (w^T x + b)| - \varepsilon, & \text{otherwise} \end{cases} \tag{14}$$
where $y$ is the observed value, $x$ is the input, $w$ is the weight, $b$ is the bias, and ε is the threshold. SVR also limits overfitting by penalizing model complexity, which is measured by the $l_2$ norm of the parameters. Therefore, SVR formulates the objective function as:
$$\min_{w, b} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i} L_\varepsilon(y_i, x_i) \tag{15}$$
where $C$ determines the weight between the ε-insensitive loss and model complexity. In addition to the objective function, SVR utilizes the slack variables $\xi$ and $\xi^*$ to deal with outliers. $\xi$ is positive when $y - (w^T x + b)$ is less than $-\varepsilon$, and $\xi^*$ is positive when $y - (w^T x + b)$ is larger than ε. Therefore, the loss is between $\varepsilon + \xi^*$ and $-\varepsilon - \xi$. By introducing the slack variables, the optimization problem can be rewritten as:
$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \tag{16}$$
where $N$ is the total number of data [21]. Eq. (16) is subject to the constraints:
$$-\varepsilon - \xi_i \le y_i - (w^T x_i + b) \le \varepsilon + \xi_i^*, \quad \xi_i \ge 0, \quad \xi_i^* \ge 0, \quad i = 1, \ldots, N \tag{17}$$
The optimal weight $w$ and bias $b$ are determined by this optimization.
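The building blocks of this formulation (the RBF kernel, the ε-insensitive loss, and the regularized objective) can be sketched as follows. This is a minimal illustration for a linear model in the mapped space; the function names are hypothetical, and the sketch evaluates the objective for given parameters rather than solving the optimization, which the paper delegates to scikit-learn.

```python
import numpy as np

def rbf_kernel(x_i, x_c, gamma):
    """RBF value exp(-gamma * ||x_i - x_c||^2) between an input and a center."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_c, dtype=float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

def eps_insensitive(y, y_hat, eps):
    """Element-wise loss that ignores absolute errors smaller than eps."""
    return np.maximum(0.0, np.abs(np.asarray(y, float) - np.asarray(y_hat, float)) - eps)

def svr_objective(w, b, X, y, C, eps):
    """0.5 * ||w||^2 plus C times the summed eps-insensitive loss."""
    w = np.asarray(w, dtype=float)
    y_hat = np.asarray(X, dtype=float) @ w + b
    return 0.5 * float(w @ w) + C * float(eps_insensitive(y, y_hat, eps).sum())
```

A larger `C` makes the objective penalize errors outside the ε-tube more heavily relative to the norm of `w`.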

IV. SIMULATION AND RESULTS
In this section, we introduce the simulation steps and set-up for the three components and give the results for each component and their aggregation. First, we predict the AC using the Brent Crude oil price. Second, we predict the MC using different algorithms and select the one that has the highest accuracy while maintaining seasonal patterns. Third, we model the SR at every hour by using Gaussian distributions and generate the DC from them. Finally, we aggregate the three components.

A. SIMULATION STEPS
In this subsection, we introduce simulation steps for each component. Furthermore, we also explain simulation conditions.

1) ANNUAL COMPONENT
We predict the AC of SMP using the AC of Brent Crude oil price. We use the oil price scenarios from the Baringa report published in 2018 [22]. The scenarios consist of high, reference, and low scenarios as described in Fig. 12.
To predict the AC of the SMP, we use linear regression because the SMP and the oil price have a clear linear relationship owing to their high correlation.
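The AC regression can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares line with an intercept fitted to historical yearly averages; the function names are hypothetical, and the authors implemented this step in MATLAB.

```python
import numpy as np

def fit_ac_regression(oil_ac, smp_ac):
    """Fit smp_ac ≈ slope * oil_ac + intercept by least squares."""
    slope, intercept = np.polyfit(np.asarray(oil_ac, float), np.asarray(smp_ac, float), deg=1)
    return float(slope), float(intercept)

def predict_ac(oil_scenario, slope, intercept):
    """Map a yearly oil price scenario to a yearly AC scenario."""
    return slope * np.asarray(oil_scenario, dtype=float) + intercept
```

Applying `predict_ac` to the high, reference, and low oil price scenarios in turn would give the three AC scenarios.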

2) MONTHLY COMPONENT
We predict the MC using LSTM, GRU, AR, and SVR, and then select the best model on the basis of accuracy and whether the results have seasonal patterns. Finally, we select the LSTM as the best model. The forecasting procedure of our LSTM-based model is shown in Fig. 13. We build a double-layered LSTM-based model to learn more complex relationships between inputs and outputs. This model receives the MC of the Y-th year and produces the MC of the (Y+1)-th year. Each layer has 12 states to learn the pattern for the 12 months of a year. To determine the number of neurons, we split the data into training and test sets at a ratio of 7 to 3. Then, we find that the training error does not improve and the test error increases beyond 50 neurons. Therefore, we use 50 neurons. Each state of the first layer receives one input and produces 50 outputs. Each state of the second layer receives 50 outputs from the first layer and also produces 50 outputs. The final outputs pass through an additional NN with 12 neurons to produce the MC. We increase the training samples by reusing predicted values. After 2019, the predicted values are used to train the next forecasting model, and this forecasting model generates the following forecast values. Then, the Y-th and (Y+1)-th values are also used as training data. For example, the predicted values for 2019 and 2020 are also used to train the model. By iterating this procedure 30 times, we obtain the MC for 30 years. For the GRU-based model, the forecasting procedure is the same as for the LSTM-based model.
To compare with the results of the LSTM-based model, we also train the AR-based model. For the AR-based model, we compare the AIC while changing the order p in Eq. (11) from 1 to 20. We find that 5 is the best order. Then, the AR-based model is trained to predict the value at time t when the values at times t − 1 to t − 5 are given as input data. After training, the AR-based model predicts the MC for January 2019 based on the MC of August to December 2018. Then, the predicted MC of January 2019 is reused to train the AR-based model. By iterating this procedure 360 times, the MC for 30 years is obtained.
The structure of the SVR-based model is the same as that of the AR-based model. To determine the optimal order p, the model accuracy is tested by changing the order p from 1 to 50. After comparing the accuracy of the models based on the root mean squared error (RMSE), we choose the order p = 38, which shows the smallest RMSE. Finally, the SVR-based model is trained to predict the value at time t when the values at times t − 1 to t − 38 are given as input data. After training, the SVR-based model predicts the MC for January 2019 based on the MC from November 2015 to December 2018. Then, the predicted MC of January 2019 is reused to train the model. By iterating this procedure 360 times, the MC for 30 years is obtained.
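The recursive use of predicted values as inputs for the next step can be sketched as follows. This is a simplified illustration: it only feeds each prediction back into the input window and does not retrain the model after every step, as the procedure above does; the function name `recursive_forecast` and the callable `predict_next` are hypothetical.

```python
import numpy as np

def recursive_forecast(history, predict_next, order, steps):
    """Roll a one-step predictor forward, feeding each prediction back as input.

    history: past values, oldest first (at least `order` values)
    predict_next: callable mapping an array of the last `order` values to the next value
    """
    window = [float(v) for v in history[-order:]]
    if len(window) < order:
        raise ValueError("history is shorter than the model order")
    forecasts = []
    for _ in range(steps):
        nxt = float(predict_next(np.array(window)))
        forecasts.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the prediction
    return np.array(forecasts)
```

With monthly data, `steps=360` covers the 30-year horizon, and `order` would be 5 for the AR-based model and 38 for the SVR-based model.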

3) DAILY COMPONENT
We model the DC based on the AC magnitude and on the statistical characteristics (mean and STD values) of the hourly SR. First, we predict the mean and STD values based on the AC magnitude. Then, we generate the DC using 24 Gaussian distributions whose mean and STD values are predicted from the AC magnitude. The mean and STD values of the SR as functions of the AC magnitude are presented in Fig. 14a and 14b. As the AC magnitude increases, the mean of the SR decreases at night (22:00 to 8:00) and increases during the day (9:00 to 20:00). On the other hand, as the AC magnitude increases, the STD of the SR increases from 9:00 to 24:00 but fluctuates between 1:00 and 8:00. The fitted curves are the results of second-order polynomial regression. Based on Fig. 14a and 14b, we predict the mean and STD values, and the DC is produced from the Gaussian distributions with those mean and STD values.
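The second-order polynomial regression of the hourly statistics on the AC magnitude can be sketched as follows. This is a minimal illustration, assuming one row of 24 hourly means (or STDs) per historical year; the function names are hypothetical, and the authors implemented this step in MATLAB.

```python
import numpy as np

def fit_hourly_quadratics(ac_values, hourly_stats):
    """Fit one quadratic per hour of day.

    ac_values: AC magnitude of each historical year, shape (n_years,)
    hourly_stats: hourly mean or STD of the SR per year, shape (n_years, 24)
    Returns coefficients of shape (3, 24), highest power first.
    """
    return np.polyfit(np.asarray(ac_values, float), np.asarray(hourly_stats, float), deg=2)

def predict_hourly(coefs, ac):
    """Evaluate the 24 fitted quadratics at a future AC magnitude."""
    return coefs[0] * ac ** 2 + coefs[1] * ac + coefs[2]
```

The same pair of functions is applied twice, once to the hourly means and once to the hourly STDs; in this sketch, a predicted STD would still need to be kept non-negative before it is used to parameterize a Gaussian distribution.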

4) SIMULATION CONDITIONS
The models for the AC and DC are coded in MATLAB, and all models for the MC are coded in Python using the TensorFlow, scikit-learn, and statsmodels modules. TensorFlow is used for LSTM and GRU, scikit-learn is used for SVR, and statsmodels is used for the AR-based model. The hyperparameters for LSTM and GRU are described in Table 2.

B. SIMULATION RESULTS
In this subsection, we present the simulation results for the three components.

1) ANNUAL COMPONENT
The SMP scenarios synthesized based on the oil price scenarios are shown in Fig. 15. The green points represent the observed SMP and oil price, and we can see that the green points are spread linearly. The mean absolute error (MAE) values of the AC forecasts are in Table 3. In 2019, the oil price follows the reference scenario, so the MAE value for the reference case is 1.29, and the MAE values for the high and low cases are 9.34 and 6.75, respectively. In 2020, the oil price follows the low scenario, so the MAE value for the low case is 1.02, and the MAE values for the high and reference cases are 35.81 and 18.41, respectively. The overall MAE values for the high, reference, and low cases are 22.58, 9.85, and 3.88, respectively, so the low scenario has the smallest error.
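The MAE-based comparison of scenarios can be sketched as follows. This is a minimal illustration; the function names `mae` and `best_scenario` and the dictionary layout of the scenarios are hypothetical.

```python
import numpy as np

def mae(observed, predicted):
    """Mean absolute error between two equally long series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))

def best_scenario(observed, scenarios):
    """Return the name of the scenario with the smallest MAE against the observations."""
    return min(scenarios, key=lambda name: mae(observed, scenarios[name]))
```

The same functions apply unchanged to the yearly AC, the monthly MC, and the hourly DC or aggregated SMP series.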

2) MONTHLY COMPONENT
The results of the MC forecasts are shown in Fig. 16. Fig. 16a shows the MC forecasts of the LSTM-based and GRU-based models, and Fig. 16b shows the MC forecasts of the AR-based and SVR-based models. Among the four models, the best model is selected based on the accuracy and persistence of seasonality. The accuracy is measured based on the MAE values in Table 4. In 2019 and 2020, the AR-based and GRU-based models are the most accurate models. However, the LSTM-based model is the most accurate model over the entire test period. Furthermore, the persistence of seasonality is examined in Fig. 16. In Fig. 16a, the results of the LSTM-based and GRU-based models have apparent periodic patterns over all forecasting horizons because the SMP values are high in winter (beginning and end of the year) and low in summer (middle of the year). On the other hand, in Fig. 16b, the results of the AR-based and SVR-based models do not show the same periodic patterns. The AR-based model converges to zero because of its mean-reverting attribute [23], and the SVR-based model fails to learn the annual cyclic patterns. Therefore, only the LSTM-based and GRU-based models persistently have seasonal patterns regardless of the forecasting horizon. We score the models in Table 5. According to the table, we choose the LSTM-based model to predict the MC because of its accuracy and persistence of seasonality over long-term horizons.

3) DAILY COMPONENT
The DC also consists of three scenarios, as shown in Fig. 18. The accuracy of the DC is described in Table 6. In 2019, the MAE values for the three scenarios are 6.01, 5.68, and 5.43. In 2020, the MAE values for the three scenarios are 6.55, 5.77, and 5.56. Over the entire test period, the MAE values for the three scenarios are 6.28, 5.72, and 5.45. As with the AC, the low scenario gives the smallest MAE.

C. AGGREGATION
The predicted AC, MC, and DC are aggregated, and the results are shown in Fig. 17. The estimated accuracy is in

D. DISCUSSION
A recent study on day-ahead electricity price forecasting reported MAE values from about 1 to about 7 for various machine-learning algorithms [24]; by comparison, our long-term SMP forecasting algorithm is less accurate than short-term forecasting algorithms. However, we confirm that our scenarios have long-term reliability by testing the accuracy for 17,520 data points (24 hours × 365 days × 2 years), and they do not deviate as the forecasting horizon increases. Furthermore, we can estimate the economic robustness of new generators. In 2020, there was a shock to fuel prices and the SMP due to the global COVID-19 pandemic. If new generators had been planned in 2018, many of them would have suffered a deficit because of the crash in the SMP. However, if they had been planned based on the low SMP scenario, we could have evaluated their profitability and installed them accordingly.

V. CONCLUSION
In this paper, we synthesize long-term hourly SMP scenarios based on fuel prices. We decompose the SMP into annual, monthly, and daily components. Then, we forecast the three components based on fuel price scenarios while maintaining their periodic patterns. First, we analyze the effect of fuel prices on the SMP and verify that the Brent Crude oil price has the greatest effect on the SMP fluctuation. Based on three oil price scenarios, we predict the AC in three scenarios. Second, we create a novel double-layered LSTM structure to predict the MC. The performance of LSTM is verified by comparisons with GRU, AR, and SVR. Third, the DC is produced from 24 Gaussian distributions, one for each hour of the day. Empirically, the DC has a variability trend that follows the AC, and the DC also has a periodic pattern that repeats every day. The variability trend and the periodic pattern of the DC are modeled as STD and mean values for each of the 24 hours individually. We predict the STD and mean values based on the AC scenarios, and the Gaussian distributions are constructed from the STD and mean values. Then, we produce the future DC from the distributions. Finally, the SMP scenarios are synthesized by aggregating the predicted AC, MC, and DC. The long-term reliability of the SMP scenarios is secured by using fuel prices, and the seasonal and intra-day patterns of the SMP are also preserved. Therefore, our SMP scenarios can support better-informed decisions on new generators.