Multi-Scale Convolutional Neural Network With Time-Cognition for Multi-Step Short-Term Load Forecasting

Electric load forecasting has always been a key component of power grid operation. Many countries have opened up electricity markets and facilitated the participation of multiple agents, which creates a competitive environment and reduces costs to consumers. In the electricity market, multi-step short-term load forecasting is increasingly important for market bidding and spot price calculation, but traditional algorithms are neither robust nor accurate enough. In recent years, the rise of deep learning has offered an opportunity to further improve the accuracy of multi-step forecasting. In this paper, we propose a novel model, the multi-scale convolutional neural network with time-cognition (TCMS-CNN). First, a deep convolutional neural network based on multi-scale convolutions (MS-CNN) extracts features at different levels and fuses them within the network. In addition, we design a time coding strategy, called periodic coding, that strengthens the time cognition of the sequential model. Finally, we integrate MS-CNN and periodic coding into the proposed TCMS-CNN model with an end-to-end training and inference process. In ablation experiments, MS-CNN and periodic coding each outperformed the most popular current methods. Specifically, for 48-step point load forecasting, TCMS-CNN improved MAPE by 34.73%, 14.22%, and 19.05% over the state-of-the-art methods recursive multi-step LSTM (RM-LSTM), direct multi-step MS-CNN (DM-MS-CNN), and direct multi-step GCNN (DM-GCNN), respectively. For 48-step probabilistic load forecasting, TCMS-CNN improved the average pinball score by 3.54% and 6.77% over DM-MS-CNN and DM-GCNN, respectively. These results indicate promising potential for practical application.


I. INTRODUCTION
Load forecasting plays an essential role in energy management and distribution management in power grids. It is necessary for ensuring the balance between generation and demand. Power grid operators need high-accuracy load forecasting to maintain the safety and stability of the power supply. Accurate load forecasting is becoming more challenging because of the continuous development of power grids and the increasing complexity of grid management [1], [2]. Many countries have opened up electricity markets and facilitated the participation of multiple agents, which creates a competitive environment and reduces costs to consumers. In the electricity market, load forecasting has become one of the most important tasks for market entities. Electricity load forecasting can be divided into four categories: very short-term load forecasting (VSTLF), short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF). The cut-off horizons separating these four categories are one day, two weeks, and three years, respectively [3]. STLF is of great significance to the power system for providing strategies, reliability analysis, interchange evaluation, security assessment, and spot price calculation [4], [5], all of which demand higher accuracy.
The associate editor coordinating the review of this manuscript and approving it for publication was Hui Ma.
In recent decades, short-term load forecasting algorithms have been widely studied. The major forecasting models fall into three categories: traditional, machine-learning, and artificial intelligence models. Traditional models are studied and used frequently because of their fast computing speed and robustness [6]. Common ones include linear regression [7], [8] and auto regressive integrated moving average (ARIMA) [9], [10]. Others, such as exponential smoothing [11] and multiple linear regression [12], [13], have also attracted the interest of researchers. These models are effective on linear forecasting problems because they capture the qualitative relationship between electric load and its influencing factors. However, linear regression methods cannot handle nonlinear problems such as the relationship between weather and load. There are other difficulties as well. For example, with ARIMA it is hard to select the appropriate orders of the auto regressive (AR) and moving average (MA) processes.
Machine-learning models mainly include support vector regression (SVR) [14], [15], regression trees [16], random forest (RF) [17], gradient boosting regression trees (GBRT) [18], Kalman filtering [19], [20], and gray forecasting [21]. Gray forecasting can only deal with data that follow an exponentially increasing trend [22]. SVR can effectively extract non-linear features and avoid over-fitting, so it performs well in STLF. However, SVR is not robust to outliers, and setting its training parameters requires considerable skill, which makes the training process difficult. RF and GBRT are derived from decision trees and offer better robustness to outliers, fewer parameter settings, and higher forecasting accuracy. However, decision-tree algorithms do not perform acceptably when the load grows substantially.
Then, more intelligent and automated artificial neural network (ANN) [23] methods were developed to capture the non-linear and complex relationships in electric time series, such as the back propagation neural network (BPNN), extreme learning machine (ELM), and radial basis function (RBF) network. Recently, deep learning has become one of the preferred technologies in many research fields. Deep learning borrows ideas from ANNs and boosts their power by deepening their layers and exploiting their structures. These methods have been widely applied to natural language processing and speech/image recognition problems. A deep belief network (DBN), made up of multiple layers of restricted Boltzmann machines, was used for 24-hour-ahead electricity consumption forecasting and applied to historical real data describing electricity consumption in the Republic of Macedonia [24]. Its layer-by-layer unsupervised training procedure is followed by fine-tuning of the parameters with a supervised back-propagation training method. The two most widely adopted models are recurrent neural networks (RNN) and convolutional neural networks (CNN). Long short-term memory (LSTM) [25], an improved gate-based RNN, is suitable for processing and predicting important events with relatively long intervals and delays in time series [26]. LSTM not only resolves the vanishing gradient problem of the RNN structure, but also enhances long-term memory, achieving state-of-the-art precision in STLF [27], [28]. Nevertheless, LSTM may lose important information during training. In addition, it cannot be computed in parallel, which makes it time-consuming and wasteful of computing resources.
The basic architecture of CNN was designed for image recognition, but with careful structural design, CNN-based networks can also achieve top-level accuracy in sequence processing [29]-[31]. Gated CNN (GCNN) and temporal CNN have been reported in load forecasting practice [32].
However, most studies pay attention only to one-step load forecasting, whereas multi-step load forecasting contributes more to practical applications such as electricity market bidding and spot price calculation. Bonetto and Rossi [33] proposed an original forecasting technique based on non-linear autoregressive (NAR) neural networks. This architecture allows parallel and efficient training and is also lightweight at runtime. Yan et al. [34] proposed a multi-step forecasting strategy with a one-dimensional convolutional neural network to predict load over a relatively longer span for electricity market bidding. Although their work improves parallel computing and saves time and resources, the accuracy of the model is not satisfactory. Cai et al. [35] presented a direct multi-step model based on GCNN for multi-step load forecasting, which achieves state-of-the-art performance. It also avoids the accumulated errors of the recursive multi-step process, since the forecast is obtained directly from the preceding neighboring data. Nonetheless, these studies ignore the time feature, which is indispensable in multi-step load forecasting.
To solve the problems mentioned above, this paper proposes the multi-scale convolutional neural network with time-cognition (TCMS-CNN) for multi-step forecasting of aggregated power load. With multi-scale convolution, the ability of the CNN is improved by extracting complex and significant features of power load sequences. In addition, the temporal cognition of our deep model is strengthened by a periodic representation of a special time mark. Finally, we propose a novel deep learning framework that combines sufficient and discriminative features to extract the underlying patterns in the dataset and provide an excellent result. The main contributions of the paper can be summarized as follows.
• A one-dimensional multi-scale convolution is introduced, which extracts the intrinsic relationships of the load sequence from different locations. In addition, MS-CNN is built from multiple residual blocks to increase the depth and improve the ability of feature representation. In comparison experiments with state-of-the-art methods, MS-CNN alone already improves the forecasting accuracy.
• A novel periodic coding strategy is proposed that gives our deep model a better capability for time-cognition. Experiments demonstrate that our network achieves considerable accuracy compared with currently popular algorithms.
• The TCMS-CNN model unifies the two innovations above, combining multi-scale and time-cognition features to form an end-to-end multi-step predictive deep learning model. Complete experiments, including point load forecasting and probabilistic load forecasting, were performed to demonstrate the effectiveness of the proposed method. In comparison with state-of-the-art models, we find that TCMS-CNN provides more accurate results and shows excellent stability in multi-step forecasting, giving strong generalization for electricity market bidding and spot price calculation.
To the best of our knowledge, this is the first paper to present a periodically coded deep neural network that combines multi-scale convolution for multi-step load forecasting. It is also the first paper to carry out multi-step probabilistic load forecasting. These innovations increase the ability of our deep learning network to learn discriminative features and enable end-to-end training and prediction. The purpose of our work is to optimize a multi-step load forecasting model based on a deep neural network in order to improve accuracy and facilitate production practice.
This paper is structured as follows: Section II defines the problem and describes the details of our proposed method. Section III reports the experimental results. Section IV further discusses some insights as well as problems of the proposed method. The conclusions are drawn in Section V.
VOLUME 7, 2019

A. PROBLEM FORMULATION
This paper focuses on STLF, which is a time series prediction problem. Historical load ($x^L$) and holiday data ($x^H$) are selected as input because, in practice, they are highly relevant to load forecasting and easy to acquire. The problem is to construct a mapping $f$ between historical load sequences and future load sequences, i.e.,

$$Y = f(X) \tag{1}$$

where $X = \{x_1, x_2, \ldots, x_{N_i}\}$ is the input sequence, $Y = \{y_{N_i+1}, y_{N_i+2}, \ldots, y_{N_i+N_o}\}$ represents the output prediction, and $x_t = (x_t^L, x_t^H)$. $N_i$ is the length of the input sequence and $N_o$ the length of the output sequence. When $N_o$ is equal to 1, the problem is single-step forecasting; when $N_o$ is greater than 1, it is multi-step forecasting. In point load forecasting, $y_t$ is a scalar, while in probabilistic load forecasting, $y_t$ is a vector of length $q$ holding the $q$ estimated quantiles at step $t$. Based on the dataset adopted in this paper, the forecast horizon is set to 48 steps (hours), covering the hourly load forecast for the next 2 days.
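The mapping from $X$ to $Y$ can be made concrete with a small sketch of how training pairs might be cut from the raw series. This is an illustration only: the function name `make_windows`, the stride of one step between consecutive windows, and the toy lengths are assumptions, not details stated in the paper.

```python
from typing import List, Sequence, Tuple

Step = Tuple[float, int]  # x_t = (load at t, holiday flag at t)


def make_windows(
    load: Sequence[float],
    holiday: Sequence[int],
    n_in: int,
    n_out: int,
) -> List[Tuple[List[Step], List[float]]]:
    """Slice aligned load/holiday series into (X, Y) pairs.

    X holds n_in consecutive steps x_t = (load_t, holiday_t);
    Y holds the n_out load values that immediately follow X.
    """
    if len(load) != len(holiday):
        raise ValueError("load and holiday must have the same length")
    samples = []
    last_start = len(load) - n_in - n_out
    for start in range(last_start + 1):  # assumed stride of one step
        x = [(load[t], holiday[t]) for t in range(start, start + n_in)]
        y = list(load[start + n_in : start + n_in + n_out])
        samples.append((x, y))
    return samples


# Toy usage: 10 hourly values, 4-step input, 2-step output.
# The paper itself uses n_out = 48.
toy_load = [float(v) for v in range(10)]
toy_holiday = [0] * 10
pairs = make_windows(toy_load, toy_holiday, n_in=4, n_out=2)
```

With `n_out=1` the same function yields single-step samples, which matches the distinction drawn above between single-step and multi-step forecasting.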

B. MS-CNN
Recently, CNN-based models have begun to receive attention in sequence processing and have achieved state-of-the-art results in sequence tasks such as speech synthesis, language modeling, and machine translation. A simple convolutional neural network can only extract features within local neighborhoods, which captures limited relationships and does not suit load forecasting. Consequently, we introduce multi-scale convolution, which fuses information from extensive receptive fields by stacking multiple layers of dilated convolution with various scales [36]. Dilated convolution not only reduces the number of learnable parameters in a deep network, but also extracts multi-level, significant relationships between different positions, which are fused into our neural network. As shown in Fig. 1, we define a series of blocks, each of which contains a sequence of $L$ convolutional layers. The activations in the $l$-th layer of the $j$-th block are given by $S^{(j,l)} \in \mathbb{R}^{F_w \times T}$, where $T$ is the length of the input sequence, which stays the same in every layer. The number of filters $F_w$ is also kept constant, which enables us to combine activations from different layers later. Each layer consists of a set of dilated convolutions with a specific rate parameter $s$ and related non-linear operations. We denote the non-linear activation function, the normalization, and the dropout operation together as $g(\cdot)$. In our work, we do not use causal convolutions, because they do not fit this time series problem well. Convolutions are applied over three time steps, $t-s$, $t$, and $t+s$, with filters parameterized by $W = \{W^{(1)}, \ldots\}$.

FIGURE 3. In the prediction phase, the result of the previous forecasting step is integrated into the input of the next step. For example, to obtain $\hat{y}_2$, we must first predict $\hat{y}_1$ and integrate $\hat{y}_1$ into the input of the next step.
Let $V \in \mathbb{R}^{F_w \times F_w}$ and $e \in \mathbb{R}^{F_w}$ be the weights and biases of the residual connection, where the parameters $\{W, b, V, e\}$ are different for each layer. The dilation rate increases for consecutive layers within a block such that $s_l = 2^l$. This enables the structure to extract relationships at more scales than just the local neighborhood and to increase the receptive field substantially without drastically increasing the number of parameters. Fig. 2 describes our proposed multi-scale convolutional neural network (MS-CNN) architecture, which has 8 blocks, each with 3 dilated convolution layers of different dilation rates $s$. This mechanism combines multi-scale features and yields a deeper network, which is shown to be an effective model backbone for load forecasting in Section III.
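To make the layer structure tangible, the following dependency-free sketch implements one dilated residual layer under stated assumptions. It assumes the update has the form $\hat{S}_t = g(W^{(1)} S_{t-s} + W^{(2)} S_t + W^{(3)} S_{t+s} + b)$ followed by the residual step $S_t + V \hat{S}_t + e$, with zero padding at the sequence boundaries. The intermediate symbol $\hat{S}$, the use of three weight matrices, the padding scheme, and the plain ReLU standing in for $g(\cdot)$ are all assumptions made for illustration; the function and variable names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vs)]


def relu(v: Vector) -> Vector:
    # Stand-in for g(.); in the paper g also bundles normalization and dropout.
    return [max(0.0, x) for x in v]


def dilated_residual_layer(
    s_prev: List[Vector],
    w1: Matrix, w2: Matrix, w3: Matrix, b: Vector,
    v: Matrix, e: Vector,
    rate: int,
) -> List[Vector]:
    """Apply one non-causal dilated convolution with a residual connection.

    s_prev is a list of T activation vectors of width F_w, one per time step.
    """
    n_steps, width = len(s_prev), len(b)
    zero = [0.0] * width  # assumed zero padding keeps the length T unchanged
    out = []
    for t in range(n_steps):
        left = s_prev[t - rate] if t - rate >= 0 else zero
        right = s_prev[t + rate] if t + rate < n_steps else zero
        hidden = relu(add(matvec(w1, left), matvec(w2, s_prev[t]),
                          matvec(w3, right), b))
        out.append(add(s_prev[t], matvec(v, hidden), e))
    return out


# Toy usage: F_w = 1, T = 5, all weights 1, all biases 0, dilation rate 1.
one = [[1.0]]
toy_in = [[1.0], [2.0], [3.0], [4.0], [5.0]]
toy_out = dilated_residual_layer(toy_in, one, one, one, [0.0], one, [0.0], rate=1)
```

Stacking three such layers with rates 2, 4, and 8 (reading $s_l = 2^l$ with $l = 1, 2, 3$, which is itself an assumption about the indexing) lets each output step see inputs up to 14 steps away on either side while the sequence length stays fixed.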

C. CONVOLUTIONAL NEURAL NETWORK WITH TIME-COGNITION
Currently, multi-step load forecasting methods can be divided into two categories: direct multi-step and recursive multi-step [35]. The recursive multi-step method first forecasts the first step and then integrates the predicted value into the input sequence to forecast the next step, as illustrated in Fig. 3. The prediction error of every step therefore accumulates and severely affects the following steps. In contrast, the direct multi-step method feeds the input sequence into the model once and obtains all forecasted steps together.
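The difference between the two strategies can be sketched in a few lines. The predictors below are toy stand-ins (a window mean and a persistence rule), chosen only so the example runs; they are not the LSTM or CNN models evaluated in this paper, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    one_step: Callable[[List[float]], float],
    history: Sequence[float],
    n_steps: int,
) -> List[float]:
    """Forecast step by step, feeding each prediction back as input."""
    window = list(history)
    preds = []
    for _ in range(n_steps):
        y_hat = one_step(window)
        preds.append(y_hat)
        # Slide the window: the prediction replaces the oldest observation,
        # so any error in y_hat is carried into every later step.
        window = window[1:] + [y_hat]
    return preds


def direct_forecast(
    multi_step: Callable[[List[float]], List[float]],
    history: Sequence[float],
) -> List[float]:
    """Forecast all steps in one pass from the observed history only."""
    return list(multi_step(list(history)))


def mean_one_step(window: List[float]) -> float:
    """Toy one-step model: predict the mean of the window."""
    return sum(window) / len(window)


def persistence_three_step(window: List[float]) -> List[float]:
    """Toy direct model: repeat the last observation for three steps."""
    return [window[-1]] * 3


recursive_preds = recursive_forecast(mean_one_step, [1.0, 2.0, 3.0], n_steps=2)
direct_preds = direct_forecast(persistence_three_step, [1.0, 2.0, 3.0])
```

In the recursive case the second prediction is computed from a window that already contains the first prediction, which is the mechanism behind the accumulated error described above.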
However, the current direct multi-step method does not take into account the temporal relationships between the multiple forecasted load points, so the neural network model loses prior knowledge and its precision leaves room for improvement. Traditionally, there are two ways to encode time: natural coding and one-hot coding [37]. Both ignore the cyclic character of electricity consumption behavior and are not suitable for load forecasting. For example, under natural coding, Sunday (0) and Saturday (6) appear far apart although they are actually adjacent. Therefore, we propose periodic coding to mark each step of the input sequence and of the predicted sequence. Periodic coding preserves the uniqueness of each moment within one period while also describing periodicity. In detail, periodic coding produces a unique mark through the sin and cos functions. For example, if the natural encoding of a day in the week is $n_{dw}$, then periodic encoding converts $n_{dw}$ to a vector $[p_{dwsin}, p_{dwcos}]$ by the following formula:

$$p_{dwsin} = \sin\left(\frac{2\pi n_{dw}}{T_{dw}}\right), \qquad p_{dwcos} = \cos\left(\frac{2\pi n_{dw}}{T_{dw}}\right) \tag{4}$$

where $T_{dw}$ is the length of a period, here the 7 days of a week. In load forecasting with time-cognition, let $n_{hd}$, $n_{dw}$, and $n_{dy}$ denote the hour of the day, the day of the week, and the day of the year, encoded as $[p_{hdsin}, p_{hdcos}]$, $[p_{dwsin}, p_{dwcos}]$, and $[p_{dysin}, p_{dycos}]$, respectively.

FIGURE 4. Three coding methods are illustrated for comparison by coding the seven days of one week. Natural coding uses the integers 0 to 6 to represent the days of a week, while one-hot coding converts the natural code into seven one-hot vectors. Periodic coding uses the sin and cos functions to convert the natural coding into 7 vectors, each of which contains 2 values. Natural coding marks time with a semantic description that lacks periodicity. The same problem exists in one-hot coding. With our proposed periodic coding, uniqueness and periodicity are both preserved, which equips the neural network with well-trained time-cognition.
The complete periodic coding for one point in the year is given by (5), and Fig. 4 compares the three coding approaches with an example.

$$[p_{hdsin},\ p_{hdcos},\ p_{dwsin},\ p_{dwcos},\ p_{dysin},\ p_{dycos}] \tag{5}$$

Meanwhile, a fully connected network learns the periodic codes of the forecasted multi-step positions to provide extra features for prediction. Both strategies strengthen the hybrid network's capability for periodic time-cognition. Then, the outputs of the two subnetworks are fused and fed to another fully connected layer to learn and predict the final result. The entire framework of TCMS-CNN is described in detail in Fig. 5. TCMS-CNN reveals the temporal characteristics of the input sequence and uses multi-scale convolutions to extract more latent features, which provides an advanced prediction for electricity market bidding. The implementation can be roughly divided into three stages: data preparation, model training, and forecasting, which are shown in Fig. 6.
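A minimal sketch of periodic coding for a timestamp is given below. The period of 366 for the day of the year is an assumption taken from the 366 one-hot features mentioned in the experiments, and the zero-based numbering with Sunday as 0 follows the example in the text; the function names are hypothetical.

```python
import math
from datetime import datetime
from typing import List, Tuple


def periodic_pair(position: float, period: float) -> Tuple[float, float]:
    """Map a position within a cycle to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * position / period
    return math.sin(angle), math.cos(angle)


def periodic_code(ts: datetime) -> List[float]:
    """Six-value periodic code: hour of day, day of week, day of year."""
    hour = ts.hour                            # n_hd in 0..23
    day_of_week = (ts.weekday() + 1) % 7      # n_dw with 0 = Sunday, 6 = Saturday
    day_of_year = ts.timetuple().tm_yday - 1  # n_dy in 0..365
    code: List[float] = []
    # Periods: 24 hours, 7 days, and an assumed 366 days per year.
    for position, period in ((hour, 24), (day_of_week, 7), (day_of_year, 366)):
        code.extend(periodic_pair(position, period))
    return code


def chord(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance between two (sin, cos) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Saturday (6) and Sunday (0) are as close as Sunday (0) and Monday (1),
# whereas their natural codes differ by 6 and 1 respectively.
sat_sun = chord(periodic_pair(6, 7), periodic_pair(0, 7))
sun_mon = chord(periodic_pair(0, 7), periodic_pair(1, 7))

sample_code = periodic_code(datetime(2018, 1, 7, 0))  # a Sunday at 0:00
```

Because every position maps to a distinct point on the unit circle, the code stays unique within a period while neighboring positions, including the wrap-around from the last to the first, remain equally close.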

A. DATASET
We use Ireland's 2014-2018 load dataset with hourly granularity [38]. The load data are the total electricity consumption of the country. It is widely known that there is a strong nonlinear correlation between load and weather characteristics such as temperature and humidity. In the experiments, we compared our proposed MS-CNN and TCMS-CNN with a wide range of state-of-the-art models, some of which are linear models. For the sake of fairness and clear contrast, only load, holiday, and time data are used in all experiments. Fig. 7 shows a two-week hourly load profile starting at 0:00 on a Tuesday; the daily and weekly loads have significant periodicity, which is a common characteristic of most power system loads. Fig. 8 shows a box plot of the electric load distribution for each of the 24 hours of a day (0:00 to 24:00) over our dataset. We observe that the load at each hour has different maximums, minimums, medians, and quartiles, which shows that the large fluctuation associated with the dataset's periodicity makes it difficult to learn the rules of load forecasting. Our dataset contains 31,128 samples, which are divided into training, validation, and testing sets in proportions of 60%, 20%, and 20%.

B. EXPERIMENT SETUP
All experiments were conducted on a cloud server with two NVIDIA P4 computing cards and an 8-core CPU. The implementations of SARIMAX and SVR are based on the StatsModels and scikit-learn packages, respectively. The neural network-based models are implemented in the Keras framework with the TensorFlow [39] backend.
The experiments consist of three parts. First, MS-CNN is compared with SARIMAX, DBN, SVR, LSTM, and residual neural networks (ResNet) on single-step forecasting. Second, based on the multi-step MS-CNN, the time coding methods are compared, including natural coding, one-hot coding, and our proposed periodic coding. Finally, we compare the performance of our proposed TCMS-CNN with state-of-the-art methods: recursive multi-step LSTM (RM-LSTM), direct multi-step MS-CNN (DM-MS-CNN), and direct multi-step GCNN (DM-GCNN).

FIGURE 5. The subnetworks on the right hold two fully connected layers, whose inputs are the periodic encodings of the predicted steps. The representation vectors output by the two sub-networks are concatenated as the input of the top-level fully connected layer, which generates the loads at the predicted steps. This framework ensures that the model obtains sufficient characteristics, which enhances its understanding of the dataset.
The global parameter settings are as follows. The Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX) model is an extension of ARIMA that adds periodicity and uses external information to enhance predictive ability. The standard SARIMAX model follows the notation SARIMAX(p, d, q)(P, D, Q)S, where p is the non-seasonal auto-regressive (AR) order, d the non-seasonal differencing, q the non-seasonal moving average (MA) order, P the seasonal AR order, D the seasonal differencing, Q the seasonal MA order, and S the time span of the repeating seasonal pattern. SVR is an important application branch of the Support Vector Machine (SVM) and has been widely used in regression problems in recent years. Among the SVR parameters, 'kernel' specifies the kernel type to be used in the algorithm, 'degree' is the degree of the polynomial kernel function, 'gamma' is the kernel coefficient, and 'coef0' is the independent term in the kernel function. 'tol' and 'C' are the tolerance of the stopping criterion and the penalty parameter of the error term, respectively. 'shrinking' indicates whether to use the shrinking heuristic. The parameters of SARIMAX and SVR are shown in Table 1. We use grid search over all combinations of parameter values within a predefined range to obtain the parameters of SARIMAX, with S set to 24, the number of hours in a day. For SVR, most parameters keep their defaults, except that C and input_length are chosen by cross validation.
Several base neural networks are used, including DBN, LSTM, ResNet, MS-CNN, and GCNN, and each keeps the same parameters across the different experiments. The parameters of the DBN consist of two parts: the parameters of the unsupervised training phase and the parameters of the supervised fine-tuning phase. The parameters of the DBN are shown in Table 2, and those of the other networks in Table 3.

C. EVALUATION METRICS
The point forecasting performance is evaluated using three metrics [40]: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Smaller values mean higher forecasting accuracy. The three metrics are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{7}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{8}$$

where $y_i$ is the ground truth of a testing sample, $\hat{y}_i$ is the prediction, and $N$ is the number of testing samples. For probabilistic load forecasting evaluation, there are three commonly used attributes: reliability, sharpness, and resolution. Reliability refers to how close the predicted distribution is to the actual one. Sharpness refers to how tightly the predicted distribution covers the actual one. Resolution refers to how much the predicted interval varies over time. Measures such as the Kolmogorov-Smirnov statistic, the Cramer-von Mises statistic, and the Anderson-Darling statistic can only be used to assess the unconditional coverage of a probabilistic forecast and cannot evaluate its sharpness or resolution. In this paper, the performance of the probabilistic forecasts is evaluated by the average of the total pinball score, which is a comprehensive measure that considers not only reliability but also sharpness and resolution. The pinball score for one quantile is calculated as follows:

$$L\left(q, y_t, \hat{y}_t^q\right) = \begin{cases} (1-q)\left(\hat{y}_t^q - y_t\right), & y_t < \hat{y}_t^q \\ q\left(y_t - \hat{y}_t^q\right), & y_t \ge \hat{y}_t^q \end{cases} \tag{9}$$

where $q$ denotes the targeted quantile, $y_t$ the actual load at step $t$, and $\hat{y}_t^q$ the forecast at the $q$-th quantile at step $t$.
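As a reference point, the three point metrics and the pinball score can be computed with the short sketch below. It uses the standard textbook definitions, which are assumed to match those intended here; MAPE is returned in percent, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / n


def pinball(q: float, y: float, y_hat: float) -> float:
    """Pinball score of a forecast y_hat for quantile level q."""
    if y < y_hat:
        return (1.0 - q) * (y_hat - y)
    return q * (y - y_hat)


actual = [100.0, 200.0]
forecast = [110.0, 180.0]
scores = (rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

For a high quantile such as 0.9, the pinball score penalizes a forecast below the actual value nine times more heavily than one equally far above it, which is what pushes the fitted quantile upward.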

D. PERFORMANCE COMPARISON WITH ABLATION STUDY AT SINGLE STEP
To assess the predictive performance of our proposed MS-CNN model, we compared its single-step forecasting accuracy with that of state-of-the-art models such as SARIMAX, SVR, LSTM, and ResNet. In this ablation study, the input sequences are not marked with time stamps. The results of the comparison are shown in Table 4 and Fig. 9.
From the results, we observe that models based on neural networks generally achieve higher accuracy than SARIMAX. SVR achieves a better MAPE, but its RMSE is inferior to that of the other models. Owing to the powerful learning and fitting ability of a deeper network, vanilla ResNet outperforms LSTM on all three metrics, showing great potential for load forecasting. Overall, our proposed MS-CNN achieves the best accuracy; compared with LSTM, it improves RMSE, MAE, and MAPE by 7.2%, 11.58%, and 10.09%, respectively. This demonstrates that multi-scale convolution helps MS-CNN extract more significant features from the input sequence and mine valuable relationships in the dataset, as illustrated in Fig. 9.

E. EVALUATIONS ON TIME CODING METHODS
In this subsection, we compare our proposed time coding strategy, periodic coding, with the common methods, natural coding and one-hot coding, on multi-step forecasting performance. All experiments use MS-CNN for the sake of fairness. For the three time dimensions discussed in Section II (hour of the day, day of the week, and day of the year), natural coding has 3 (1 + 1 + 1) features, one-hot coding has 397 (24 + 7 + 366) features, and our periodic coding has 6 (2 + 2 + 2) features. The multi-step forecasting target is the next 48 hours, and the results are listed in Table 5. Table 5 shows that natural coding has the lowest accuracy: the neural network cannot use its time features to produce an acceptable prediction. Natural coding provides only a simple representation of time, although time exhibits a significant periodic relationship. One-hot coding removes the linear temporal features of natural coding and thus improves the accuracy of load forecasting. However, it still lacks the periodic correlation between time and load, which makes generalization difficult in our problem. Periodic coding achieved the best accuracy, indicating that it effectively learned periodic features during training and addressed the problems of the other two codings. In the experiments, the RMSE, MAE, and MAPE of periodic coding were 19.69%, 30.1%, and 38.59% lower than those of one-hot coding, respectively.

F. COMPARISON OF THE STATE-OF-THE-ART MULTI-STEP FORECASTING MODELS
We evaluated several state-of-the-art multi-step models, including RM-LSTM, DM-MS-CNN, and DM-GCNN, against our proposed TCMS-CNN for 48-hour forecasting on 5976 testing samples. RM-LSTM is a recursive multi-step model. DM-MS-CNN and DM-GCNN are direct multi-step models without any time coding. MAPE was chosen as the only metric to reflect performance. The results are listed in Table 6, where each entry is the average MAPE over the 5976 test samples at the given step. RM-LSTM yielded the best results before the fourth step but fell increasingly behind the others afterwards. RM-LSTM has the advantage in single-step forecasting, whereas accumulated errors lead to poor performance as the number of steps grows. DM-MS-CNN is built on our proposed MS-CNN for direct multi-step forecasting. DM-GCNN is another direct multi-step model, based on the gated CNN, which outperforms RNNs on sequential modeling tasks. Their MAPE grows more slowly than that of RM-LSTM, indicating improved performance for multi-step load forecasting. Our proposed TCMS-CNN achieved the best accuracy averaged over all steps, improving average MAPE by 34.73%, 14.22%, and 19.05% relative to RM-LSTM, DM-MS-CNN, and DM-GCNN, respectively. Fig. 10 gives 10 examples of the four models' predictions for different 48-hour windows, in which the TCMS-CNN results are generally stable and closest to the ground truth. Fig. 11 shows the MAPE curves of the compared methods. The MAPE curve of TCMS-CNN grows slowly with lower and steadier errors, showing remarkable robustness in 48-hour load forecasting. Its success is owed to two points: 1) the convolutional backbone MS-CNN, which plays a critical role in extracting and fusing multi-level features; 2) an effective periodic coding that integrates sequential features into our model. In conclusion, TCMS-CNN shows high precision and stable performance, and it holds promise for practical application.
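The per-step averaging behind Table 6 and the reported percentage improvements can be sketched as follows. The relative-improvement formula, (baseline - proposed) / baseline, is an assumption about how the percentages were derived, and the toy numbers are invented purely for illustration; they are not results from this paper.

```python
from typing import List, Sequence


def per_step_mape(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
) -> List[float]:
    """Average MAPE (in percent) over all samples, separately for each step.

    Both inputs are indexed as [sample][step].
    """
    n_samples = len(y_true)
    n_steps = len(y_true[0])
    return [
        100.0 / n_samples * sum(
            abs((y_true[i][k] - y_pred[i][k]) / y_true[i][k])
            for i in range(n_samples)
        )
        for k in range(n_steps)
    ]


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Toy data: two samples, two forecast steps each.
toy_true = [[100.0, 200.0], [100.0, 200.0]]
toy_pred = [[90.0, 220.0], [110.0, 200.0]]
step_mape = per_step_mape(toy_true, toy_pred)
gain = relative_improvement(baseline=10.0, proposed=8.0)
```

Plotting `step_mape` against the step index gives a curve of the kind compared in Fig. 11.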
We adopt quantile regression with pinball loss to produce quantiles [41]. Recursive methods are not suitable for this experiment, because quantile outputs cannot be fed back into the input recursively. The application of AMSGrad requires the loss function to be differentiable so that the neural network can be trained using gradient descent. The common pinball loss is not differentiable everywhere, so we introduce the Huber norm into the loss function, which makes it differentiable everywhere with the least change. The Huber norm can be viewed as a combination of the L1 and L2 norms and can be written as

$$h(a) = \begin{cases} \dfrac{a^2}{2\varepsilon}, & |a| \le \varepsilon \\ |a| - \dfrac{\varepsilon}{2}, & |a| > \varepsilon \end{cases} \tag{10}$$

where $\varepsilon$ denotes the threshold magnitude between the L1 and L2 norms. To minimize the error caused by this approximation, we set $\varepsilon$ to a very small value of 0.001. Then, we obtain the approximated pinball loss:

$$L_\varepsilon\left(q, y_t, \hat{y}_t^q\right) = \begin{cases} (1-q)\, h\left(\hat{y}_t^q - y_t\right), & y_t < \hat{y}_t^q \\ q\, h\left(y_t - \hat{y}_t^q\right), & y_t \ge \hat{y}_t^q \end{cases} \tag{11}$$

As in the point prediction experiment, we forecast the 48-step probabilistic load for each of the 5976 test samples. Each predicted step produces 9 quantiles, from 0.1 to 0.9, so a total of 5976 × 48 × 9 quantiles are generated in the forecasting phase. The average pinball scores for each step and for all 48 steps are given in Table 7. As shown, the average pinball loss of TCMS-CNN on the entire test set is 300.11 MW, which is 3.54% and 6.77% lower than that of DM-MS-CNN and DM-GCNN, respectively. The change of the pinball score over the steps is shown in Fig. 12. All three average pinball score curves go upward as the steps move forward. Compared with DM-MS-CNN and DM-GCNN, the average pinball score curve of TCMS-CNN fluctuates less, is smoother, and rises more slowly, which shows the superiority of the proposed method.

In the three subgraphs, the red lines of TCMS-CNN are the smoothest and most compact, indicating that TCMS-CNN achieves the best sharpness. Although the red lines of TCMS-CNN are very compact, the black line is still well wrapped within them, showing that the reliability of TCMS-CNN remains high.
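A small sketch of the smoothed loss is shown below. It assumes the Huber form that is quadratic, scaled by 1/(2ε), inside the threshold and linear with an offset of ε/2 outside it, which is the variant that approaches the absolute value as ε shrinks. The function names are hypothetical, and a training framework would express the same arithmetic with its own tensor operations.

```python
def huber(a: float, eps: float = 0.001) -> float:
    """Smooth stand-in for |a|: quadratic near zero, linear elsewhere."""
    a = abs(a)
    if a <= eps:
        return a * a / (2.0 * eps)
    return a - eps / 2.0


def smooth_pinball(q: float, y: float, y_hat: float, eps: float = 0.001) -> float:
    """Pinball loss with the absolute error replaced by the Huber norm."""
    if y < y_hat:
        return (1.0 - q) * huber(y_hat - y, eps)
    return q * huber(y - y_hat, eps)


def exact_pinball(q: float, y: float, y_hat: float) -> float:
    """Unsmoothed pinball loss, for comparison."""
    return (1.0 - q) * (y_hat - y) if y < y_hat else q * (y - y_hat)


# The nine quantile levels 0.1 ... 0.9 used for each forecast step.
quantile_levels = [round(0.1 * k, 1) for k in range(1, 10)]
losses = [smooth_pinball(q, y=10.0, y_hat=12.0) for q in quantile_levels]
gap = max(
    abs(smooth_pinball(q, 10.0, 12.0) - exact_pinball(q, 10.0, 12.0))
    for q in quantile_levels
)
```

With ε = 0.001 the smoothed loss differs from the exact pinball loss by at most ε/2, so the approximation error is negligible at the megawatt scale of the scores reported here.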

IV. DISCUSSION
Multi-step short-term load forecasting is gaining more and more attention in electricity market bidding and spot price calculation. The development of deep learning provides a pathway to improve the accuracy of short-term load forecasting and a growing ability to fit time series data. In this paper, we make several contributions to multi-step load forecasting with deep learning.
First, a multi-scale convolutional network (MS-CNN) is proposed to extract multi-level features with dilated kernels and fuse them into our model. In addition, the shortcomings of traditional time coding approaches are analyzed. To strengthen the periodic description available to the sequential model, we design the periodic coding strategy to encode the load data for improved prediction. Finally, we present the TCMS-CNN model, which integrates multi-scale convolutions and periodic coding into an end-to-end trainable neural network. It optimizes the structure of the CNN and extracts more relationships with periodic characteristics, raising the accuracy of multi-step load forecasting. Several ablation experiments comparing against state-of-the-art methods were carried out. To verify MS-CNN, we compared it with the popular models SARIMAX, SVR, LSTM, and ResNet on single-step prediction. The results demonstrated that MS-CNN has a clear advantage, with improvements of 7.2%, 11.58%, and 10.09% in RMSE, MAE, and MAPE, respectively, over the second-ranked algorithm, so it can be preferred as the baseline for advanced load forecasting networks. In the second experiment, we evaluated our proposed periodic coding in comparison with natural coding and one-hot coding. With the same neural network, the RMSE, MAE, and MAPE statistics showed that periodic coding remarkably outperforms the others, indicating great potential for sequential model prediction. Finally, the most advanced current models, RM-LSTM, DM-MS-CNN, and DM-GCNN, were included in our tests. On 5976 testing samples for 48-step point load forecasting, TCMS-CNN improved MAPE by 34.73%, 14.22%, and 19.05% over RM-LSTM, DM-MS-CNN, and DM-GCNN, respectively. For 48-step probabilistic load forecasting, TCMS-CNN improved the average pinball score by 3.54% and 6.77% over DM-MS-CNN and DM-GCNN, respectively.
However, as a direct multi-step method, TCMS-CNN does not perform satisfactorily at the first steps. In addition, its network structure is somewhat complicated, which makes training and inference slightly time-consuming. Future work includes improving accuracy at the first steps and optimizing the network structure.

V. CONCLUSIONS
This paper proposed TCMS-CNN for short-term multi-step load forecasting. With multi-scale convolution, the ability of the CNN is improved by extracting complex and significant features of load sequences. In addition, the temporal cognition of our proposed model is strengthened by periodic coding. Finally, we propose a novel framework that combines sufficient and discriminative features to extract the underlying patterns in the dataset and provide an excellent result. In comparison with state-of-the-art models, we find that TCMS-CNN provides more accurate results and shows excellent stability in multi-step point and probabilistic forecasting, giving strong generalization for electricity market bidding and spot price calculation.