Multi-Convolution Feature Extraction and Recurrent Neural Network Dependent Model for Short-Term Load Forecasting

Load forecasting is critical for power system operation and market planning. With the increased penetration of renewable energy and the massive consumption of electric energy, improving load forecasting accuracy has become a difficult task. Recently, deep learning models have been shown to perform well for short-term load forecasting (STLF), and prior research has further demonstrated that hybrid deep learning models outperform single models. In this article, we propose a hybrid neural network that combines a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory network (LSTM) in a novel way. In the proposed hybrid model, multiple independent 1D-CNNs extract load, calendar, and weather features, while the LSTM learns temporal patterns. This architecture is referred to as a multi-head CNN-LSTM network (MCNN-LSTM). To demonstrate the proposed model's superior performance, the method is applied to Ireland's load data for single-step and multi-step load forecasting. In comparison to the widely used CNN-LSTM hybrid model, the proposed model improved single-step prediction by 16.73% and 24-step load prediction by 20.33%. Additionally, we use the Maine dataset to verify the proposed model's generalizability.


I. INTRODUCTION
Electricity is a significant secondary energy source that has a sizable impact on both the national economy and daily life. Connecting large-scale renewable energy to the power grid expands the use of renewable resources, which has a number of benefits, but it also complicates power distribution and dispatch. Short-term load forecasting (STLF) produces detailed forecasts for a time horizon that is critical for supply unit optimization, economic dispatch, and market transactions [1]. A detailed and reliable STLF enables the power system to operate safely and efficiently.
The associate editor coordinating the review of this manuscript and approving it for publication was Seifedine Kadry.
Due to the complexity of the factors affecting STLF, such as historical load curves, weather conditions, and calendar effects, accurate forecasting is more difficult than ever. Many studies have examined four weather variables that are critical to STLF: temperature, humidity, wind speed, and air pressure. Mukhopadhyay et al. [2] incorporated temperature and relative humidity into the load forecasting model. Friedrich and Afshari [3] included wind speed in their load forecasting model. Because the weather is influenced by both climate and geography, some weather parameters may have little correlation with the load in a given region. The correlations between weather parameters and load data must therefore be analyzed in order to determine which parameters (variables) to use [4].
The calendar effect is used to characterize the cyclical and seasonal variation in electricity consumption [5]. Changes in people's production activities and daily lifestyle affect their energy consumption patterns, resulting in daily or weekly changes in the power load. Numerous calendar-dependent input features have therefore been incorporated into load forecasting models through coding techniques to reflect the importance of calendar variables.
Historically, a variety of load forecasting algorithms have been applied to historical loads and load-related characteristics and have produced accurate prediction results. Recently, deep learning models based on long short-term memory networks (LSTMs) and convolutional neural networks (CNNs) have been applied to wind energy generation forecasting, photovoltaic energy generation forecasting, and load forecasting. Their predictive abilities have been demonstrated to be superior to those of conventional shallow neural networks and machine learning techniques. The LSTM model captures the pattern information of a time series, whereas the CNN model extracts valuable features from the time series without requiring domain expertise. LSTMs are excellent at extracting temporal features, while CNNs are excellent at extracting spatial features [6]. Thus, by integrating the advantages of the two models, forecasting accuracy can be increased.
This paper considers the periodicity of electric load and the effect of weather parameters on electric load. Periodic calendar coding and weather parameters (temperature and humidity) are used as input features. Different from other hybrid models of CNN and LSTM [7], [8], we propose a novel combined architecture of CNN and LSTM for STLF. We use three independent one-dimensional convolution heads to extract historical load features, calendar features and weather features, and fuse the features into an LSTM layer. This hybrid model is called multi-head CNN-LSTM (MCNN-LSTM). Each independent convolutional head concentrates on its own input data, which aids feature extraction, and the parameters of each head can be finely tuned to suit the features it extracts. At the same time, this design avoids the slow fitting caused by an increase in the input feature dimension, thus reducing the training time. Currently, most deep learning methods focus on single-step load forecasting, while power market participants rely on multi-step forecasting. For multi-step load forecasting, we also use the proposed model in a multi-input multi-output configuration. The main contributions of this paper are as follows:
1. We propose a novel MCNN-LSTM hybrid deep learning model for short-term load forecasting.
2. We compare the proposed model's performance to that of other machine learning models and deep learning models, demonstrating the model's validity.
3. We use polar coordinates coding to encode calendar variables, and analyze the impact of calendar coding on load forecasting at different time steps.
The rest of this paper is organized as follows. Section II describes related forecasting techniques. Section III details the basic framework of the MCNN-LSTM hybrid deep learning model. Section IV gives the evaluation indicators of the STLF model. Section V contains the data processing and simulation analysis of the experiments. Section VI presents the conclusions.

II. RELATED WORK
STLF methods can be divided into two categories: statistical methods and artificial intelligence (AI) methods.
The main statistical methods are linear regression (LR) [9], multiple linear regression (MLR) [10], and the autoregressive integrated moving average (ARIMA) model [11]. Statistical methods aim to establish a mathematical relationship between current and past load series. Although they take time series correlation into account, their inherent limitations make it difficult to attain the requisite forecasting accuracy when working with non-stationary load series [1]. With the introduction of random and intermittent energy sources into the grid, the traditional statistical approach does not work well for the resulting complex load curves.
Machine learning models that overcome the shortcomings of statistical methods have been widely applied to STLF, including support vector machine (SVM) [12], [13], light gradient boosting machine (LightGBM) [14] and artificial neural network (ANN) [15]-[17]. SVM is based on the principle of structural risk minimization. Although SVM performs well on time series, its performance depends on how accurately its parameters are chosen.
LightGBM is a boosting model built on gradient boosted decision trees (GBDTs) that improves both speed and accuracy. Owing to its nonlinear processing, an ANN can readily capture the relationship between input and output variables. The ANN in the form of a multi-layer perceptron (MLP) has become one of the most widely used load forecasting algorithms. However, an ANN is sensitive to its initial conditions, and its degrees of freedom increase as the model becomes more complex, which can result in overfitting or underfitting [17].
As a newer branch of machine learning, deep learning can solve the training problems of traditional neural networks. Compared with shallow networks, its ability to exploit big data and unsupervised learning has garnered considerable interest. Deep learning, particularly convolutional neural networks (CNN) [18], [19] and long short-term memory networks (LSTM) [20], [21], has advanced the study of STLF. The local connection and weight sharing of a CNN significantly reduce the model's training parameters and training time, and several studies have used CNNs for load prediction. Dong et al. [22] used CNN and K-means clustering to process large-scale power load data. Cai et al. [23] used a gated CNN to forecast the short-term power consumption of commercial buildings, which performed better than a gated recurrent neural network and seasonal ARIMAX. LSTM is a recurrent neural network that is suitable for processing time series. Rahman and Zubair [24] presented a method that applies LSTM with a data construction method for hourly electric load forecasting. Bedi and Toshniwal [25] proposed a deep learning framework based on LSTM to forecast electricity demand by learning dependencies in the historical data.
Sometimes a single model is not competent for all tasks, and deep learning is no exception. In order to take advantage of different models, some researchers have combined different deep learning models into a hybrid network model [7], [14]. Alhussein et al. [26] proposed a hybrid CNN-LSTM model that uses a CNN layer to extract features from the input data and an LSTM layer for sequence learning to forecast household load. Kim et al. [27] proposed a recurrent inception convolution neural network (RICNN) that combines an RNN and a one-dimensional CNN (1-D CNN) to forecast the electricity load of three distribution industrial complexes in South Korea.
The hybrid CNN and LSTM model performs well in small-scale load forecasting for commercial buildings and individual households, while STLF at the large-scale regional level still needs to be studied.
To capitalize on the advantages of combining CNN and LSTM, many CNN and LSTM hybrid deep learning models have been developed in a variety of domains. Gao et al. [6] compared the forecasting performance of five different CNN-LSTM structures. They proposed decomposing the solar irradiance features using complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and fusing the historical solar irradiance features obtained by multiple CNNs into an LSTM for optimal performance. Farsi et al. [28] introduced a parallel deep LSTM-CNN (PLCNet) model that relies on a parallel CNN layer and LSTM as the upper layer to extract the features of load data, followed by an LSTM layer and a Dense layer to predict the final load. Hourly load forecasting accuracy was 98.23% on the Malaysian dataset and 91.18% on the German dataset. The hybrid model of CNN and LSTM also enables the incorporation of additional data features. Li et al. [29] introduced a cascaded CNN-LSTM model for PM2.5 prediction and demonstrated that the multivariate CNN-LSTM model is more accurate than the univariate CNN-LSTM model. Wang et al. [8] proposed the LSTM-Convolutional Network forecasting model for photovoltaic power generation. An LSTM layer is first used to extract the temporal features affecting photovoltaic power, followed by a CNN layer to extract the spatial features. The model is more accurate than a single CNN model or LSTM model. Our proposed model incorporates historical load and load-related characteristics and makes use of the multivariate feature processing capabilities of the CNN and LSTM hybrid model, while also addressing the training cost associated with the increased data dimension generated by multivariate features. It can serve as a reference for load forecasting tasks with different input features.

A. 1-DIMENSIONAL CONVOLUTION NEURAL NETWORK
Convolutional neural networks (CNNs) were originally developed for image classification and have been used in the field of computer vision with great success. More recently, CNNs have been applied not only to natural language processing but also to energy-related problems such as power quality issues [30], wind power prediction [31], solar irradiation prediction [32] and load prediction [23]. In comparison to a traditional fully connected network, a CNN has two main characteristics: local connection and weight sharing. The structure of a CNN mainly includes convolution layers, pooling layers and fully connected layers. Figure 1(a) shows the basic framework of a one-dimensional CNN. Unlike a two-dimensional image-processing convolution kernel, which moves in two dimensions (from left to right and from top to bottom), a one-dimensional convolution kernel moves along a single dimension (the time dimension). A time series can be treated as a one-dimensional vector, so a one-dimensional CNN is suitable for learning the characteristics of time series. In this study, the CNN is applied for feature extraction in load forecasting. An example of the one-dimensional convolution operation is shown in Figure 1(b). Convolution is the core concept of a CNN; it combines two sets of information into one and thereby transforms the input data. In the convolution formula, w, n, x and b denote the weighting factors in the kernels, the number of kernels, the input series and the bias, respectively; the symbol ⊗ signifies the convolution operation, and f is the activation function. The Rectified Linear Unit (ReLU), f(x) = max(0, x), effectively mitigates the vanishing gradient problem and makes the network more trainable, so it is adopted as the activation function in this paper. The pooling layer is mainly used to reduce the number of parameters. In this paper, maximum pooling is adopted.
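To make the convolution, ReLU and max pooling steps concrete, the following is a minimal pure-Python sketch of one convolution head operating on a short series. It assumes the common per-kernel form y = f(w ⊗ x + b) with a "valid" sliding window (implemented as cross-correlation, as most deep learning libraries do); the function names, the kernel values and the toy series are illustrative assumptions, not the configuration used in the experiments.

```python
def conv1d(x, w, b=0.0):
    """Slide kernel w along series x ('valid' mode) and add bias b."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


def relu(values):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling with window and stride equal to size."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Hypothetical toy load series and difference-like kernel (assumptions).
load = [3.0, 1.0, 2.0, 5.0, 4.0, 0.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(load, kernel)           # one value per kernel position
features = max_pool1d(relu(feature_map))     # activated and down-sampled
print(feature_map, features)
```

The kernel moves only along the time axis, so a series of length 6 and a kernel of length 3 yield 4 positions, which pooling then halves.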

B. LONG SHORT-TERM MEMORY NETWORK
In the traditional feedforward neural network, information flows in one direction from the input nodes to the hidden layer and then to the output layer, so the network cannot remember inputs from earlier time steps. A recurrent neural network (RNN) uses internal state storage to realize a memory function. The long short-term memory (LSTM) network is a special RNN that can learn long-term dependencies.
It was proposed by Hochreiter and Schmidhuber in 1997 [33] and has been improved and popularized in many subsequent works. It uses cells with a more complex internal structure in place of the simple neurons of the original RNN to address the gradient explosion and gradient vanishing problems. The basic unit of LSTM is shown in Figure 2. There are four important elements in the cell structure of LSTM: the input gate, forget gate, output gate and cell state. The input gate, forget gate and output gate control how the information contained in the cell state is updated, maintained and deleted. In the equations by which the LSTM updates the cell state and calculates the output, σ is the sigmoid activation function; i_t, f_t, c_t and o_t are the output values of the input gate, forget gate, cell state and output gate; and h_{t−1} is the hidden state of the former time step.
Figure 3 demonstrates the architecture of MCNN-LSTM.
The CNN layers extract the characteristics of historical load, time and weather factors separately through three independent one-dimensional CNN heads. Each independent one-dimensional CNN captures the characteristics of its own time series and can accurately identify the key features of specific periods (such as weekend and working-day information) and the weather changes that affect the load. The lower layer of MCNN-LSTM is the LSTM. The feature information related to power demand is extracted by the CNN layers and then processed by the LSTM layer, which is responsible for finding hidden temporal patterns in the extracted features. LSTM has the ability to remember past events. Data passes through the unit neuron, which has two inputs: past historical data and current data. It can thus form an internal memory that captures behavior across the whole sequence.
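The gate updates and the two inputs of the LSTM unit (the previous state and the current data) can be sketched as follows. This is the widely used standard LSTM formulation, written for a scalar cell (hidden size 1) to keep it readable; the parameter names (`wi`, `ui`, `bi`, and so on) and the toy values are assumptions introduced for illustration and are not taken from the trained model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    x_t    : current input
    h_prev : hidden state of the former time step, h_{t-1}
    c_prev : cell state of the former time step, c_{t-1}
    p      : dict of weights (w* for input, u* for hidden) and biases (b*)
    """
    i_t = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])    # input gate
    f_t = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])    # forget gate
    o_t = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])    # output gate
    g_t = math.tanh(p["wc"] * x_t + p["uc"] * h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * g_t      # keep part of the old state, add new information
    h_t = o_t * math.tanh(c_t)          # expose a gated view of the cell state
    return h_t, c_t


# Hypothetical parameters and inputs (assumptions, for illustration only).
names = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wc", "uc", "bc"]
params = {name: 0.1 for name in names}

h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:      # the state is carried from one step to the next
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is updated additively through the forget and input gates, information from early time steps can persist across the whole sequence.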

D. ARCHITECTURE OF THE PROPOSED MODEL
The parameters of each layer of the proposed MCNN-LSTM hybrid model can be adjusted according to the composition of the network. In the convolution layers, the number of filters, the kernel size and the stride can be adjusted; in the LSTM layer, the number of hidden units can be adjusted. Tuning these parameters changes the model's learning speed and learning effect, and the best settings differ from model to model. Dropout is used to avoid over-fitting. The parameters of the CNN layers and LSTM layer of MCNN-LSTM are shown in Table 1.

E. SIMULATION SOFTWARE AND HARDWARE
The proposed model is implemented in Python using libraries such as Keras and TensorFlow. The experiments are run on a personal computer equipped with an Intel Core i7-4720HQ processor, an NVIDIA GeForce GTX 950M graphics card, and 16.0 GB of random access memory (RAM).

IV. PREDICTION EVALUATION
To evaluate the forecasting performance of the model proposed in this paper, five error metrics are adopted: the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), R-squared and the raw error. In these metrics, y_i is the true load data, ŷ_i is the forecasting value, ȳ is the average of the true load data, and N is the number of y_i.
Maine's electrical power and weather data are provided by ISO New England. The data has hourly resolution. The datasets from these two regions are divided into three components in this paper: training set, validation set, and testing set. The proportions of these three components are 6:2:2.
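The error metrics of Section IV can be computed as in the sketch below, which uses their standard textbook definitions with y_i, ŷ_i, ȳ and N as defined above. The sample values are hypothetical, and the sign convention of the raw error (true minus forecast) is an assumption.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and forecast loads (not taken from the datasets).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
errors = [a - b for a, b in zip(y_true, y_pred)]   # assumed sign: true - forecast

print(mae(y_true, y_pred), rmse(y_true, y_pred),
      mape(y_true, y_pred), r_squared(y_true, y_pred))
```

MAE and RMSE are in the units of the load, MAPE is scale-free, and R-squared approaches 1 as the forecast explains more of the variance of the true load.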

B. DATA PREPROCESSING AND FEATURE ENGINEERING
Data preprocessing includes outlier analysis and missing value filling. Because load data is continuous, abnormal load values that enter the data calculation and analysis process will have a negative impact on the load forecast result. A box plot is a graphical representation of numerical data through its quartiles, and it is a straightforward and effective method for visualizing outliers. Using the upper and lower whiskers as the boundaries of the distribution, any data point above the upper whisker or below the lower whisker can be considered an outlier.
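The box-plot rule described above can be sketched with the Python standard library. The whisker length of 1.5 times the interquartile range is the conventional box-plot default and is an assumption here, as are the function name and the sample values.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return (index, value) pairs lying outside the box-plot whiskers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr      # lower whisker boundary
    upper = q3 + whisker * iqr      # upper whisker boundary
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical hourly load readings with one abnormal spike.
hourly_load = [10, 12, 11, 13, 12, 11, 50, 12]
outliers = boxplot_outliers(hourly_load)
print(outliers)
```

Flagged points can then be treated like missing values and handled by the same filling procedure.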
In terms of missing value filling, our data has sparse missing values, which we fill using interpolation. The purpose of this study is to forecast future load at the large regional level using historical load data, meteorological data, and calendar variables.
Weather conditions are the primary exogenous factors affecting STLF. Seasonal weather has an effect on electricity consumption. For example, the increased use of air conditioning and heating systems during the summer and winter seasons affects the trend of the load curve. We determine the correlation between load and four weather variables (temperature, humidity, air pressure, and wind speed) using the Pearson correlation coefficient, and identify temperature and humidity as the most relevant input features.
Load at the large regional level aggregates the large-scale energy consumption of the power system, so compared with a stochastic microgrid, the periodicity of its load changes is more apparent [4]. STLF benefits from a mechanism for encoding calendar information that accurately reflects this periodic pattern. The most common traditional methods of encoding calendar variables are natural encoding and one-hot encoding [11]. However, because these two coding methods do not take into account the periodicity of electricity consumption behavior, they are less effective for load forecasting. For instance, if natural coding is used, 23:00 on the first day and 0:00 on the second day are adjacent, but their encoded values differ by 23. When one-hot encoding is used, the Euclidean distance between any two distinct time points is always √2 and the distance between two identical time points is 0, so the encoding cannot express how close two time points are. We therefore use polar coordinates to encode calendar effect features in order to reflect the periodicity of temporal data.
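The two distance claims above can be checked directly. The short sketch below encodes the hour of the day with natural and one-hot encoding and compares the resulting distances; the helper names are illustrative.

```python
import math


def one_hot(hour, n=24):
    """One-hot vector of length n with a 1 at position hour."""
    return [1.0 if i == hour else 0.0 for i in range(n)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Natural encoding: adjacent hours across midnight look far apart.
natural_gap = abs(23 - 0)

# One-hot encoding: every pair of distinct hours is equally far apart.
d_adjacent = euclidean(one_hot(23), one_hot(0))   # 23:00 vs 0:00
d_far = euclidean(one_hot(12), one_hot(0))        # 12:00 vs 0:00

print(natural_gap, d_adjacent, d_far)
```

Neither encoding places 23:00 close to 0:00 while keeping 12:00 far from 0:00, which is the property a periodic encoding needs.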
The polar coordinate coding method uses the sin and cos functions to locate calendar variables within a given period. Electricity consumption is influenced by people's production and lifestyle choices and exhibits varying trends over time. For instance, the load on weekdays differs from the load on weekends, as does the load at various times of the day. As a result, we perform periodic coding for the hour of the day and the day of the week or year. The encoding process is given by (14)-(19).
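A minimal sketch of sin/cos (polar coordinate) calendar coding is given below. It assumes the usual form sin(2πv/P), cos(2πv/P) for a value v with period P, and assumes periods of 24 for the hour of the day, 7 for the day of the week and 365 for the day of the year; the function names are illustrative.

```python
import math


def cyclic_encode(value, period):
    """Map a calendar value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hour of the day (assumed period 24).
h23, h0, h1, h12 = (cyclic_encode(h, 24) for h in (23, 0, 1, 12))
print(distance(h23, h0))   # small: 23:00 and 0:00 are neighbors on the circle
print(distance(h12, h0))   # largest possible: opposite sides of the circle

# Day of the week (assumed period 7) and day of the year (assumed period 365).
p_wsin, p_wcos = cyclic_encode(6, 7)
p_dsin, p_dcos = cyclic_encode(200, 365)
```

With this coding, the distance between 23:00 and 0:00 equals the distance between 0:00 and 1:00, so the midnight boundary is no longer a discontinuity.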

1) IRELAND SINGLE-STEP LOAD FORECASTING
The length of the input series of the proposed model is crucial since it determines the amount of historical data used in the training process. If the series is too short, it is impossible to extract sufficient information to forecast the load. If it is too long, the presence of a significant amount of weakly correlated data increases the training cost and decreases the accuracy. We determine the length with the strongest correlation by computing the autocorrelation and partial autocorrelation coefficients of the load series. The autocorrelation coefficient measures the correlation between a load point and the load point lagging it by t time steps, whereas the partial autocorrelation coefficient measures that correlation after removing the influence of the intermediate lags. As illustrated in Figure 5, we use Python's statsmodels tool to plot the autocorrelation and partial autocorrelation of the Ireland load series. According to Figure 5, we design five models with 24-, 48-, 72-, 144-, and 168-step series lengths.
The input set of the proposed model consists of the historical load series and the corresponding length of historical temperature (T), humidity (H) and calendar variables (p_hsin, p_hcos, p_wsin, p_wcos, p_dsin, p_dcos). Table 2 compares the performance of the MCNN-LSTM model with varying input set lengths. As the length of the input set increases, the training time increases as well, but the values of MAE, RMSE and MAPE first decrease and then increase. Although LSTM can alleviate the vanishing gradient problem of long series to a certain extent, an overly long series contains more weakly correlated information, which limits the accuracy. When the first 72 steps were used as the input set of the Ireland dataset, the prediction accuracy was the highest. For Ireland's testing set, the results of all models' single-step load forecasting are calculated using the performance indicators MAE, RMSE, and MAPE.
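The role of the autocorrelation coefficient in choosing candidate input lengths can be illustrated with a small pure-Python estimator. This is a simplified stand-in for the statsmodels plots used in the paper, and the synthetic 24-hour sinusoid below is an assumption standing in for a real load series.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance


# Synthetic "load" with a daily cycle: 10 days of hourly values (assumption).
series = [math.sin(2.0 * math.pi * t / 24.0) for t in range(240)]

for lag in (12, 24, 48):
    print(lag, round(autocorrelation(series, lag), 3))
```

Lags at multiples of the daily period show strong positive correlation, which is why input lengths such as 24, 48 and 72 steps are natural candidates to compare.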
The evaluation indicators for the seven algorithms in Table 3 indicate that: (i) Deep learning models outperform traditional machine learning models in terms of prediction accuracy, while hybrid models outperform single models. (ii) Our proposed hybrid model has the smallest MAE, RMSE, and MAPE and thus the best predictive performance. (iii) Compared with the SVM model, which has the lowest accuracy, the proposed model's MAE is decreased by 42.83, RMSE is decreased by 42.58, MAPE is decreased by 1.35% and R-squared is increased by 2.12%. (iv) Compared with the CNN-LSTM model, the proposed model's MAE is decreased by 8.32, RMSE is decreased by 5.07, MAPE is decreased by 0.28%, and R-squared is increased by 0.7%. Figure 6 illustrates the error distributions using box plots. The proposed model's error is concentrated near the zero-valued baseline, indicating that the predicted load value is closer to the actual value. Figure 7 shows the simulation results of the load forecasting curves of different models. All models produce predictions that follow the actual values. To illustrate the predictive performance of the various models, the lower and peak load regions of Figure 7 are enlarged in Figure 8 (a) and (b). In comparison to other models, the proposed model's lower and upper peak loads are closer to the actual values in terms of trend shape and degree of fit.
2) MAINE SINGLE-STEP LOAD FORECASTING
Figure 9 depicts the autocorrelation coefficients and partial autocorrelation coefficients of the Maine dataset. The prediction evaluation of load series of different lengths in the Maine dataset is shown in Table 4, and the findings indicate that the proposed model performs best when a 48-step input set length is used. Because load curves vary, load datasets from different locations must select appropriate sequence lengths to obtain the highest prediction performance.
Additionally, we forecast single-step load on Maine's testing set. Table 5 lists the evaluation indicators for various models. The results also demonstrate that the error distribution of the proposed model is the smallest overall. Figure 10 illustrates the load forecasting simulation curves for various models. Figure 12 (a) and (b) are enlarged versions of the lower and peak load curves from Figure 11. The findings indicate that the proposed model is capable of accurately forecasting the electricity load and has superior fitting performance for lower and upper peak loads, which have a higher degree of randomness.

D. MULTI-STEP PREDICTION ANALYSIS
We compare the proposed MCNN-LSTM model to other models for 24-step load forecasting. A sliding-window multi-input multi-output forecasting method is used for the MLP, CNN, LSTM, CNN-LSTM, and proposed MCNN-LSTM models. The previous time steps of data are used as input, and the predicted load for the next 24 time steps is the output. Because the SVM and LightGBM models do not support multiple outputs, multi-step load forecasting is accomplished by creating 24 models corresponding to the 24 time steps.
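The sliding-window multi-input multi-output scheme can be sketched as follows: each sample pairs a window of past values with the next 24 values as the target. The function name is illustrative, the integer series is a placeholder for the load series, and the 72-step input length is the value found best for the Ireland data in the single-step experiments.

```python
def make_windows(series, n_in, n_out):
    """Build (input window, output window) pairs with a stride of one step."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]                    # past n_in steps
        y = series[start + n_in:start + n_in + n_out]     # next n_out steps
        samples.append((x, y))
    return samples


# Placeholder series: hour indices stand in for load values.
series = list(range(100))
samples = make_windows(series, n_in=72, n_out=24)

first_x, first_y = samples[0]
print(len(samples), first_x[0], first_x[-1], first_y[0], first_y[-1])
```

A multi-output network is trained once on all such pairs, whereas a single-output model such as SVM or LightGBM needs one model per position in the 24-step target.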

1) IRELAND 24-STEP LOAD FORECASTING
We conduct 24-step load forecasting on Ireland's testing set, forecasting the load for the next 24 hours. Table 6 summarizes the error evaluation metrics for the 24-hour-ahead forecasts, including MAE, RMSE, and MAPE. In comparison to single-step load forecasting, the forecast errors of each model increase as the output dimension increases. However, the proposed MCNN-LSTM model still has the best predictive performance: MAE is 67.31, RMSE is 105.12, MAPE is 2.10% and R-squared is 96.87%. The performance improvement of the CNN-LSTM hybrid model over the single CNN and single LSTM models is not immediately apparent.
To demonstrate the forecast results concretely, we evaluate each day of the week separately. It can be observed from Table 7 that the proposed model has smaller MAE, RMSE and MAPE than the other models, indicating that it has strong load forecasting capacity for both weekdays and weekends. The simulation of a 24-hour-ahead prediction curve for the Ireland testing set is shown in Figure 13. The findings indicate that deep learning-based methods such as CNN, LSTM, CNN-LSTM, and the proposed MCNN-LSTM are capable of accurately simulating the power consumption trend of the next 24 hours. The power consumption trends of MLP, SVM, and LightGBM, on the other hand, are poorly fitted. Additionally, the proposed MCNN-LSTM model predicts the lower and upper peak loads of the 24-hour forecast more accurately.

2) MAINE 24-STEP LOAD FORECASTING
The weekly MAE, RMSE and MAPE are shown in Table 8; the proposed model has obvious advantages over the other models. The overall evaluation indices for the Maine test set are shown in Table 9, and the prediction curves are shown in Figure 14. The results demonstrate that our proposed model provides the best 24-hour load forecasting performance and validate its generalizability.

E. COMPARISON OF TRAINING TIME
We compare the training times of the four deep learning models with the best prediction performance: CNN, LSTM, CNN-LSTM, and the proposed MCNN-LSTM. The CNN model is composed of two stacked one-dimensional convolutions with a total of 32 or 64 convolution kernels. The LSTM model is composed of two stacked LSTM layers, with each hidden layer having 128 neural units. The CNN-LSTM hybrid model is constructed by stacking a two-layer CNN and a two-layer LSTM in series. The CNN layer contains between 32 and 64 convolution kernels, while the LSTM layer contains 128 hidden layer neurons. The MCNN-LSTM model's structural parameters are described in Section III-D. To compensate for randomness, we run each model ten times for up to 100 epochs and employ early stopping to avoid overfitting. Table 10 summarizes the training time of the various deep learning models averaged over the ten runs.
According to the results in Table 10, the CNN model requires the least amount of training time, while the LSTM model requires the most. The CNN-LSTM and proposed MCNN-LSTM models have a shorter training time than the LSTM model, owing to the use of the CNN layer to extract features and the convolution operation, which reduces the number of parameters and improves training efficiency. The training time for MCNN-LSTM is less than that for CNN-LSTM, because multiple convolution heads extract features independently, avoiding the slow fitting caused by the increased dimensionality of the input features.
The Ireland test set has 5256 data points. Single-step load forecasting over the test set takes 12.08 s, an average of 0.0023 s per hourly forecast. The 24-step load forecasting takes 10.23 s, an average of 0.047 s per 24-hour forecast.
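The reported average forecasting times follow from the totals by simple division, as the short check below shows. It assumes that the 24-step forecasts are counted once per non-overlapping 24-hour block of the test set, an assumption that reproduces the reported 0.047 s; the variable names are illustrative.

```python
test_points = 5256            # hourly points in the Ireland test set
single_step_total_s = 12.08   # total single-step test time, in seconds
multi_step_total_s = 10.23    # total 24-step test time, in seconds
horizon = 24

# Average time per single-step (hourly) forecast.
per_hour_s = single_step_total_s / test_points

# Assumed: one 24-step forecast per non-overlapping day of the test set.
n_days = test_points // horizon
per_day_s = multi_step_total_s / n_days

print(round(per_hour_s, 4), n_days, round(per_day_s, 3))
```

Both averages are far below the hourly resolution of the data, so the inference cost does not constrain online use at this forecasting interval.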

F. THE IMPACT OF PERIODIC CODING
To demonstrate the effect of periodic coding, we compare the proposed MCNN-LSTM model to a model that does not include a convolution head for extracting calendar variables. The forecasting of single-step, six-step, twelve-step, and twenty-four-step loads is carried out using Ireland's testing set, and the error evaluation indicators are listed in Table 11.
As shown in Table 11, as the output dimension increases, the MAE, RMSE, and MAPE values increase as well. The proposed MCNN-LSTM model performs better than the MCNN-LSTM model without calendar variables. According to the MAPE values, the performance of single-step load forecasting is improved by 1.39%, 6-step load forecasting by 6.47%, 12-step load forecasting by 10.23%, and 24-step load forecasting by 20.45%. The experimental results indicate that periodic coding of calendar variables is more advantageous for forecasting loads over long horizons.

VI. CONCLUSION
We propose a multi-head CNN-LSTM (MCNN-LSTM) short-term load forecasting model in this article. Two datasets (the Ireland dataset and the Maine dataset) are used to evaluate the proposed MCNN-LSTM model's performance. On these two datasets, experiments with single-step and 24-step load forecasting are conducted. When compared to other machine learning and deep learning models, the proposed model achieves the highest accuracy for the evaluation indicators MAE, RMSE, and MAPE. The training time of four deep learning models is also recorded in this experiment. Although the proposed model requires significantly more training time than the CNN model, it is more accurate. In comparison to the CNN-LSTM model, the proposed model can reduce the number of parameters and accelerate feature extraction through the use of multiple convolutional heads, while maintaining high accuracy and training speed. To investigate the effect of calendar encoding, we remove the convolution head for calendar variables from the proposed MCNN-LSTM model. Comparative experiments indicate that the addition of calendar variables improves load forecasting performance slightly with short output steps but significantly with long output steps.
In general, the MCNN-LSTM short-term load forecasting model proposed in this paper is capable of producing acceptable results for both single-step and multi-step load forecasting. For future research, a variety of optimization algorithms could be considered to optimize the proposed model's parameters. Holidays are not treated differently in the proposed model, and necessary holiday elements can be introduced into future model upgrades.