A Hybrid Approach for Energy Consumption Forecasting With a New Feature Engineering and Optimization Framework in Smart Grid

Electric energy consumption forecasting enables distribution system operators to perform efficient energy management by flexibly engaging energy consumers under the intelligent demand-response program in the smart grid (SG). With this motivation, in this paper, a fast and accurate hybrid electrical energy forecasting (FA-HELF) framework is developed. The proposed framework integrates two modules with a support vector machine (SVM) based forecaster: (i) a data pre-processing and feature engineering module, and (ii) a modified enhanced differential evolution (mEDE) based optimizer. First, feature selection algorithms, namely random forests and relief-F, are combined to devise a hybrid feature selection algorithm that alleviates redundancy. Second, for feature extraction, a radial basis Kernel-based principal component analysis algorithm is employed to reduce dimensionality. Finally, to conduct accurate and fast electrical energy consumption forecasting, the mEDE based optimizer is integrated with the SVM based forecaster. The resulting FA-HELF framework is tested on publicly available independent system operator New England (ISO-NE) control area hourly load data. The results demonstrate that the FA-HELF framework is robust and shows significant improvements when compared to other benchmark frameworks in terms of accuracy and convergence speed.


I. INTRODUCTION
The electrical power system is a network that includes two types of agents: power agents (generation, transmission, and distribution) and consumer agents, where both agents attempt to maximize benefits and minimize expenses. The emergence of advanced metering infrastructure (AMI) has renovated the legacy power grid with smart grids (SGs) [1]-[3]. This renovation has created a platform for both agents to forecast electrical energy consumption for decision making in the SG. Thus, electrical energy consumption forecasting is an indispensable task in planning, operation, and energy management of the SG. Generically, electrical energy consumption forecasting has four types based on time horizon: (i) very short-term forecasting, (ii) short-term forecasting, (iii) medium-term forecasting, and (iv) long-term forecasting. The forecast time horizon for these classes ranges from one minute up to one hour, more than an hour up to a week, more than a week up to a year, and more than a year, respectively.
The associate editor coordinating the review of this manuscript and approving it for publication was Xianming Ye.
Forecasts over the first three horizons support the operation and energy management of the SG, whereas long-term forecasting benefits the infrastructure planning and development of the SG. Thus, it is essential to acquire accurate load forecasts because incorrect forecasting may result in improper operation, planning, and energy management, which leads to imprudent reservoirs, excessive generators, and additional operating costs. Moreover, an optimistic electrical energy consumption forecast may put the power grid at risk or lead to unnecessary, costly energy purchases from operators who have forecasted energy consumption accurately. A few recent papers [4]-[6] have highlighted the problem of electrical energy consumption forecasting. However, some authors focused on net electrical energy consumption forecasting and very few authors focused on sub-metered electrical energy consumption forecasting. Therefore, electric energy consumption forecasting issues remain open for solutions. Realizing the importance of the problem, authors have developed a variety of electrical energy forecasting models during the past few years. Recently, a model based on an enhanced deep neural network (EDNN) was developed to predict week and year ahead electrical energy consumption [7]. In [8], a framework based on a multi-task regressor is proposed to predict electrical energy consumption based on the energy consumption data recorded by smart meters. Sideratos et al. [9] proposed a load prediction model based on a hybrid DNN (HDNN). The proposed HDNN based model utilizes key parameters of ANN and deep learning models to resolve the forecasting issues.
Though these papers are a good start for studying electrical energy consumption forecasting models, the frameworks used in the literature are criticized for issues such as ineffective learning, handcrafted features, inaccurate appraisal, insufficient guiding significance, and limited learning capacity.
Thus, a fast and accurate hybrid electrical energy consumption forecasting (FA-HELF) framework is developed in this work to solve issues related to electrical energy consumption forecasting in real-time for efficient energy management. The proposed FA-HELF framework is a hybrid model having three modules: (i) feature engineering, (ii) forecaster, and (iii) optimizer. The main features and contributions of this research are described as follows:
1) A non-linear integrated forecasting framework is designed that is capable of handling complex real situations and is implementable in the energy management of the SG.
2) To develop a novel feature engineering framework, a hybrid feature selector technique is proposed by fusing two algorithms, relief-F and random forests, to monitor and control the feature selection process. Then, a radial basis Kernel-based principal component analysis (KPCA) is proposed for feature extraction to reduce dimensionality. Finally, the optimizer module based on the modified enhanced differential evolution (mEDE) algorithm is integrated with the forecaster module based on SVM to improve the accuracy by optimizing the hyperparameters.
3) To develop an optimization framework, a novel algorithm, namely mEDE, is proposed by devising modifications in the EDE algorithm to enhance forecast accuracy and convergence speed for efficient energy management.
4) The proposed FA-HELF framework is tested on hourly load data of the ISO New England (ISO-NE) control area. The experiments validate that the proposed FA-HELF framework outperforms other existing frameworks.
The remainder of the paper is structured as follows: First, the recent and relevant background study of electric load forecasting is discussed in section II. In section III, the proposed FA-HELF framework is introduced and explained. The simulation results and discussion are presented in section IV. Finally, in section V, the paper is concluded along with future research directions.

II. RELATED LITERATURE STUDY
In this section, the existing recent and relevant research on electrical energy consumption forecasting for efficient energy management in the SG is discussed. Various electrical energy consumption forecasting strategies like statistical methods, machine learning methods, time series models, dynamic systems, and hybrid forecasting models are presented. Some of them are explained below.
A hybrid model is presented in [10] for load forecasting, which employs the combination of extreme learning machine (ELM), wavelet transform (WT), and least square regression (LSR), to improve the forecast accuracy. An ensemble strategy based on WT is proposed to avoid the trivial process of utilization of the complementary information of wavelet parameters and wavelet parameter selection to forecast accuracy. This model is also extended to electricity price forecasting by altering the input variables. However, the criterion to select decomposition levels and mother wavelet is not precisely defined. Besides this, the objectives are achieved at the cost of increased computation time. In [11], a framework for time-series load forecasting based on random forests and ensemble of ELM is proposed. The training sample sets are generated through a bootstrap sampling technique for random forests-based ELM ensemble model to handle multiregime time series forecasting. However, accurate, stable, and efficient performance is achieved at the cost of high execution time. Authors developed a subsampled support vector regression ensemble (SSVRE) framework in [12] to solve the prediction problem. The accuracy is improved by support vector regression (SVR) and the learning process is enhanced by a subset of small sized subsamples. The proposed framework is validated on publicly available datasets of New South Wales and Jiangxi Province. However, some forecasters in the ensemble are more accurate and the final output should be the average of the single outputs. An integrated framework of wavelet neural network (WNN), empirical mode decomposition (EMD), autoregressive integrated moving average (ARIMA), and fruit fly optimization algorithm (FFOA) is developed in [13] to predict future load. However, the accurate load forecasting is influenced by social and natural parameters that make the prediction process cumbersome, time-consuming, and slow.
An enhanced Gaussian process mixture (EGPM) is presented in [14] to solve the load forecasting problem effectively. The efficacy of the model is evaluated by comparing it with benchmark schemes like Gaussian process mixture (GPM), radial basis function neural network, and SVM classifier, in terms of forecast accuracy. The simulation results indicate that EGPM outperforms existing methods in accuracy. However, the accuracy is enhanced at the cost of increased computation time. In [15], a hybrid model of EMD, general regression neural network, and FFOA is proposed for load forecasting having minimal redundancy and maximum relevancy. However, forecasting is cumbersome due to various influencing parameters that lead to high volatility and computational complexity. A load forecasting model is presented in [16] for efficient energy management. The hybrid model is an integrated framework of least square SVM, auto-correlation, and grey wolf optimization algorithm. However, efficient energy management is performed at the cost of high system complexity.
In [17], a hybrid short-term electric load forecasting framework based on ANN is proposed. The forecasting is achieved by training the multi-layer neural network using the genetic algorithm (GA), general regression network, and Elman neural network. Accurate results are achieved by validating the system model on real data of Victoria and New South Wales states of Australia. In [18], the authors focused on feature selection based on a genetic strategy and forecasting based on the ELM. The forecasting model is validated on the load dataset of the Australian electricity market operator (AEMO). The simulation results validated that the forecasting model reduced the error and improved forecast accuracy. However, the ELM degree of freedom reduces with the increase in network complexity, which leads to the problem of model overfitting. The work in [19] is based on multi-scale modeling for accurate demand forecasting ranging from short-term to medium-term time horizon. The emphasis of the authors is more on data analysis. Moreover, both seasonal and non-seasonal cycles are detected using autoregressive and moving average methods. The proposed model's accuracy is assessed based on the Akaike-Bayesian criterion. However, the accuracy is enhanced at the expense of the rate of convergence.
A deep learning approach is presented in [20] for the household load forecasting. The uncertainty in the household is learned using deep belief recurrent neural network (RNN) model for accurate load forecasting. However, the increasing number of layers in the deep model, data, and volume diversity may sometimes lead to the problem of overfitting. In [21], a data-driven framework based on copula model and DNN is proposed for load forecasting. The data is pre-processed by Box-Cox transformation and the parameters are tuned using the copula model for accurate load forecasting. The proposed model is tested on the power load data of both Texas and Arkansas in the United States. The proposed model performs better than classical neural networks and ELM based models.
A load forecasting framework based on artificial intelligence (AI) and DNN is described in [22] to improve the electric load prediction process for the South African distribution network. The proposed model outperforms both optimally tuned ELM and adaptive neuro-fuzzy inference system (ANFIS) based models. However, the accuracy of the forecasting process for the distribution power system is increased by including the effect of temperature, which in turn results in slow convergence. In [23], an intelligent hybrid model based on Kalman filtering (KF), WNN, and ANN using clustering techniques is presented to forecast the day and week ahead load of the commercial sector of Egypt and Canada. However, the accuracy is achieved at the cost of increased model complexity.
It can be confidently concluded from the literature that tremendous progress has been made in the field of electric load forecasting for energy management. However, the existing methods are weak in processing large data and their control parameters are hard to tune, which results in high computational complexity and an inability to converge quickly because redundancy, irrelevancy, and high dimensionality are not addressed. Moreover, the aforementioned literature does not cater for both forecast accuracy and convergence rate simultaneously. To solve such problems, a fast and accurate framework is needed. Thus, in [24], a framework based on SVM and the gradient descent algorithm is proposed. However, this framework introduces much computational complexity and is unable to converge. Some authors focus on feature selection algorithms combined with traditional classifiers, namely decision trees and artificial neural networks [25]. However, decision trees face the problem of overfitting, which means that a decision tree performs well in training but poorly in prediction, and the artificial neural network has limited generalization capability and has difficulty in controlling its convergence. In [26], the authors proposed a hybrid feature selection, extraction, and classification-based framework for load forecasting. However, this method has high system complexity and is unable to converge.
In this context, a novel hybrid forecasting FA-HELF framework is designed in this study to address the convergence rate and forecast accuracy simultaneously. The objective of this framework is to provide fast and accurate forecasting for efficient energy management to fulfill the energy needs of society. The recent and relevant work discussed in this section is summarized in Table 1.

III. PROPOSED HYBRID FRAMEWORK WITH NEW FEATURE ENGINEERING AND OPTIMIZATION MODULES
The proposed FA-HELF framework has three modules: data pre-processing and feature engineering module, SVM based forecaster, and mEDE based optimizer, as depicted in Figure 1. This work is an extension of our earlier work published in a conference [27]. The feature engineering part is based on Grey correlation analysis (GCA), and radial basis kernel principal component analysis (KPCA). The forecaster module is based on SVM. The optimizer module is based on the proposed mEDE algorithm. The overall step by step procedure of the proposed framework is depicted in Figure 2, and a brief discussion is given below:

A. DATA PRE-PROCESSING AND FEATURE ENGINEERING MODULE
Data from the ISO-NE control area is fed to the data pre-processing and feature engineering module, which is composed of two phases: (i) pre-processing, and (ii) feature engineering, as shown in Figure 2. The two phases are briefly described as follows:

1) DATA PRE-PROCESSING
The first phase is data pre-processing. In this phase, a data cleansing action is conducted on the dataset to recover defective, erroneous, and missing data by taking the average of the previous day's load. Then, the cleansed data is forwarded to the normalization phase to scale the data within the limits of the activation function. After normalization, the normalized data is structured in descending order via a structuring action. Finally, the obtained results are normalized in preparation for the desired energy consumption forecasting. The prepared and cleansed data is fed into the feature engineering phase, as depicted in Figure 2.
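The cleansing and normalization steps can be illustrated with a minimal sketch. It assumes that missing readings are marked with `None`, that "the previous day" means the preceding block of 24 hourly readings, and that normalization is min-max scaling to [0, 1]; none of these details are specified in the text, and the load values are hypothetical.

```python
from statistics import mean

HOURS_PER_DAY = 24  # hourly resolution, as in the ISO-NE data


def fill_missing_with_previous_day_mean(load):
    """Replace None entries with the mean of the previous day's valid readings."""
    cleaned = list(load)
    for i, value in enumerate(cleaned):
        if value is None:
            day_start = (i // HOURS_PER_DAY) * HOURS_PER_DAY
            window = cleaned[max(0, day_start - HOURS_PER_DAY):day_start]
            previous_day = [v for v in window if v is not None]
            if not previous_day:
                raise ValueError("no previous-day data to fill index %d" % i)
            cleaned[i] = mean(previous_day)
    return cleaned


def min_max_normalize(values):
    """Scale values linearly to [0, 1] (assumed activation-function range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two days of hypothetical hourly load (MW); one reading on day 2 is missing.
day1 = [100.0 + h for h in range(24)]
day2 = [110.0 + h for h in range(24)]
day2[5] = None
cleaned = fill_missing_with_previous_day_mean(day1 + day2)
normalized = min_max_normalize(cleaned)
```

A different activation function would call for a different target range, for example [-1, 1] for tanh.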

2) FEATURE ENGINEERING
The second phase is feature engineering. In this phase, the abstractive and key features are selected and extracted from the prepared data, and redundant and irrelevant features are discarded, as depicted in Figure 2. The desired features from the dataset are selected and extracted through GCA and radial basis KPCA, respectively. To control feature selection, the feature selector is based on GCA, which combines relief-F and random forests algorithms to calculate the importance of the features. Furthermore, the feature selector decides based on feature importance whether to reserve or discard a feature. The radial basis KPCA based feature extractor uses the Kernel function to deal with high dimensional non-linear data because linear PCA is not suitable for such data. The purpose of feature extraction is to reduce redundant features. A brief description of the feature engineering phase is given below:

1) Feature selector: The feature selection process is based on GCA, which is developed by combining relief-F and random forests and is controlled by the combined controlling threshold σ. The GCA roughly selects a feature space where the most relevant and desired features are kept and irrelevant features are discarded based on feature importance and the feature selection controlling threshold σ. The electric load data matrix is represented by D and is defined as follows:

$$D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{bmatrix} \tag{1}$$

The columns denote the feature index and rows represent the time-stamps. Furthermore, $d_{mn}$ is the mth component of data, which is nth hour ahead of the electrical energy consumption pattern that is to be forecasted. Equation 1 can also be written in the time sequence form. The different features have various degrees of impact on forecasted electrical energy consumption patterns, and the GCA calculates the importance of each feature and its influence on electrical energy consumption forecasting. The GCA determines the correlation between each feature and the final electrical energy consumption pattern to effectively control the feature selection process. The GCA uses this correlation to measure the closeness between two data signals: the closer the two data signals, the greater the correlation, and vice versa.

FIGURE 2. Step by step working flow chart of the proposed schematic framework for fast and accurate electrical energy consumption forecasting. Red dotted box shows data pre-processing and feature engineering module, green dotted box shows SVM based forecaster module, and the blue dotted box represents mEDE algorithm-based optimization module.
Since each feature has different physical meaning and different dimension in a framework, when the GCA is carried out, non-dimensional data is normalized either via their average value or the maximum value. The original data sequence is normalized as follows: where are the two sequences, n is the count of features, and m is the time sequence length. The Grey correlation coefficient, after normalization, is calculated as: where ρ represents a distinguishing coefficient, which is set as 0.50 by [28]. The grade of Grey correlation is determined as follows: The low correlated features are dropped and the remaining selected features are sorted in descending order and the time sequence becomes as follows: where δ is the dropped features and t i is the time sequence.
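The Grey correlation computation can be sketched as follows. Because the equations of this passage did not survive in the text, the sketch uses the standard Deng formulation of the Grey relational coefficient and grade, with normalization by the sequence average and ρ = 0.5; the function names and the sample sequences are hypothetical and the authors' exact expressions may differ.

```python
def normalize_by_mean(sequence):
    """Make a sequence dimensionless by dividing it by its average value."""
    average = sum(sequence) / len(sequence)
    return [value / average for value in sequence]


def grey_relational_grades(target, features, rho=0.5):
    """Return one Grey relational grade per feature sequence.

    rho is the distinguishing coefficient (0.5 in the text).
    """
    x0 = normalize_by_mean(target)
    deltas = []
    for feature in features:
        xi = normalize_by_mean(feature)
        deltas.append([abs(a - b) for a, b in zip(x0, xi)])
    delta_min = min(min(row) for row in deltas)
    delta_max = max(max(row) for row in deltas)
    if delta_max == 0.0:
        # Every feature coincides with the target after normalization.
        return [1.0] * len(features)
    grades = []
    for row in deltas:
        coefficients = [
            (delta_min + rho * delta_max) / (d + rho * delta_max) for d in row
        ]
        grades.append(sum(coefficients) / len(coefficients))
    return grades


target = [10.0, 12.0, 14.0, 16.0]      # hypothetical target load
features = [
    [20.0, 24.0, 28.0, 32.0],          # proportional to the target
    [5.0, 3.0, 8.0, 1.0],              # unrelated to the target
]
grades = grey_relational_grades(target, features)
selected = [index for index, grade in enumerate(grades) if grade > 0.5]
```

In this toy case the proportional feature receives the maximum grade and is retained, while the unrelated feature falls below the threshold and is dropped.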
After GCA, the data sequence is passed through feature selectors. At this stage, the data sequence is processed by two evaluators: the relief-F evaluator κ and the random forests evaluator γ. The purpose of the evaluators is to evaluate the importance of each feature so that the desired features can be selected. The feature evaluation and selection processes are shown in Algorithm 1. The first evaluator is the random forests evaluator, which operates on bootstrap-bagged samples [29]. The bagged samples are divided into training samples and out of bag samples. For the first evaluator γ, all weights are initialized to zero, and the training of random forests is initiated. Then, the feature's importance is determined by the out of bag data with noise. For the second evaluator κ, weights are updated based on the distance between near hits and near misses. The two evaluators (γ and κ) forward the determined feature importance to the selector to perform feature selection based on the controlling threshold.
In Algorithm 1, r [n] represents the random forests having n number of decision trees. The ω F [τ j ] and ω r [τ j ] represent feature importance that is calculated by relief-F and random forests, respectively. The parameters are updated as follows: where in class C i , d * represents randomly selected item, and function diff (D, r 1 , r 2 ) calculates difference D between r 1 and r 2 . Mathematical definition of the difference function is as follows: The features selection process is based on importance ω F and ω r , respectively. Both are normalized as follows: The features that have combined importance value greater than σ are considered as key features, while those features that have combined importance value less than σ are considered as irrelevant features. The key features are kept and irrelevant features are discarded. This process is mathematically modelled by Equation 11.
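The thresholding step can be sketched as follows, taking the two importance vectors as given. The normalization (min-max scaling) and the combination rule (arithmetic mean of the two normalized scores) are assumptions made for illustration, since the corresponding equations are not recoverable from the text; the importance values are hypothetical.

```python
def min_max(weights):
    """Scale importance scores to [0, 1] (assumed normalization)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [1.0] * len(weights)
    return [(w - lo) / (hi - lo) for w in weights]


def select_features(relief_weights, forest_weights, sigma):
    """Keep features whose combined importance exceeds the threshold sigma.

    The combined importance is taken as the mean of the two normalized
    scores, which is an illustrative assumption.
    """
    normalized_relief = min_max(relief_weights)
    normalized_forest = min_max(forest_weights)
    combined = [
        (a + b) / 2.0 for a, b in zip(normalized_relief, normalized_forest)
    ]
    kept = [index for index, score in enumerate(combined) if score > sigma]
    return combined, kept


# Hypothetical importance scores for four candidate features.
relief_weights = [0.30, 0.05, 0.20, -0.10]
forest_weights = [0.40, 0.10, 0.35, 0.02]
combined, kept = select_features(relief_weights, forest_weights, sigma=0.5)
```

Raising `sigma` discards more features, which shortens training at the risk of removing informative inputs.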
The selected features are sent to the feature extraction phase based on a radial basis KPCA to remove redundancy among the selected features.
2) Feature extractor: In the second phase, the feature extraction operation is performed, which is based on radial basis KPCA. The purpose of this operation is to remove redundant information and thereby reduce dimensionality. The output of the feature selector is given as an input to the radial basis KPCA based feature extractor in order to generate a dimensionally reduced matrix having the desired and most relevant features, which can be modelled as: where s i is the i th variable related to the electric load. The correlation of features and eigenvalues is computed as: where λ represents the eigenvalue, V denotes the covariance matrix of S, and f * denotes the feature space.
Moreover, V f * e is calculated through Equation 13: where ϕ denotes the mapping of the input data into the feature space, and $\langle s, y \rangle$ represents the inner product of s and y. Equation 13 becomes Equation 15 by devising the above modifications: where e for λ ≠ 0 can be determined as: where β i represents coefficients that correspond to s i . Now, the Kernel function defined in [30] is used as: Equations 15 and 16 are combined, and the combined form is defined as follows: where β i represents coefficients that correspond to s i and β = [β 1 , β 2 , . . . , β N ] T , then Equation 15 can be rewritten as: The eigenvectors β and eigenvalues λ are selected to perform dimensionality reduction via normalization. Therefore, we have: The resultant Equation 21 can be obtained by substituting Equation 16 in Equation 20, which is as follows: The principal component extraction can be determined in the following manner: where p shows the principal component. The Kernel functions have the following generic forms:
Linear form: $K(s, y) = \langle s, y \rangle$
Activation basis form: $K(s, y) = \tanh\left(\alpha_0 \langle s, y \rangle^{d} + \alpha_1\right)$
Radial basis form:
The feature extraction phase is shown in Algorithm 1. After the feature engineering phase, the selected and extracted features matrix is fed as an input to the forecaster module based on SVM to forecast the electrical energy consumption pattern.
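The extraction of a principal component with a radial basis Kernel can be sketched in plain Python. The sketch assumes the common Gaussian form K(s, y) = exp(-||s - y||^2 / (2 w^2)) with a hypothetical width parameter w, centers the Kernel matrix in feature space, and uses power iteration to obtain only the leading eigenvector; a full implementation would extract several components with a proper eigen-solver.

```python
import math


def rbf_kernel(s, y, width=1.0):
    """Gaussian radial basis kernel; the width parameterization is assumed."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(s, y))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def centered_kernel_matrix(samples, width=1.0):
    """Kernel matrix of the samples, centered in the feature space."""
    n = len(samples)
    k = [[rbf_kernel(a, b, width) for b in samples] for a in samples]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [
        [k[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]  # deterministic start
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [
            sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in product))
        if norm == 0.0:
            break
        vector = [v / norm for v in product]
        eigenvalue = norm
    return eigenvalue, vector


def first_principal_component(samples, width=1.0):
    """Project every sample onto the first kernel principal component."""
    kc = centered_kernel_matrix(samples, width)
    eigenvalue, beta = leading_eigenpair(kc)
    if eigenvalue <= 0.0:
        raise ValueError("kernel matrix has no positive eigenvalue")
    # Scale beta so the feature-space eigenvector has unit norm.
    beta = [b / math.sqrt(eigenvalue) for b in beta]
    n = len(samples)
    return [sum(beta[j] * kc[i][j] for j in range(n)) for i in range(n)]


# Two well-separated hypothetical clusters of two-dimensional samples.
samples = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [2.0, 2.0], [2.1, 2.0], [2.0, 2.1],
]
projection = first_principal_component(samples)
```

On this toy data the first component separates the two clusters by sign, which illustrates how a single extracted component can replace several correlated inputs.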

B. SVM BASED FORECASTER
The data is cleaned after the data pre-processing and feature engineering phases, and has no redundant and irrelevant features anymore. This module achieves desired electrical energy consumption forecasting through the cleaned and processed data. Various machine learning techniques exist in the literature for electrical energy consumption forecasting. The SVM, among the machine learning models, is chosen as a forecaster for electrical energy consumption forecasting due to its robust and efficient performance to produce accurate results with less computational time. In this section, we formulate and investigate the classification problem. The SVM based forecaster is depicted in Figure 2 and the detailed description of the problem is as follows:

1) PROBLEM FORMULATION
The classification problem is mathematically modeled as: where p ∞ i (i = 1, 2, 3, . . .) are the parameters of the forecaster to be determined, D is the dimensional space, and ∂ depends on data distribution and parameters of the classifier. The objective of SVM is to define a hyperplane in D-dimensional feature space that differentiates the data points. In this work, the hyperplane is defined in Equation 24. After that, the regularized risk function is defined as: where σ represents feature selection controlling threshold, ς denotes insensitive loss function parameter, and L a i represents the target electrical energy consumption pattern. The minimization of this regularized risk function is required to obtain the parameter p. Robust error function can be computed as follows: In Equation 26, a function is used to minimize Equation 25 and can be modeled as: where α * ≥ 0 for all values of i. K * (s, y) is the Kernel function for SVM that shows dot product in the feature space f * of radial basis KPCA as: The Kernel function makes χ i feature not needed to be calculated in an infinite feature space. The α and α * can be achieved by maximizing the quadratic form as: The SVM based forecasting process is indicated in Algorithm 1. The forecasted energy consumption pattern is fed into the optimizer module to improve the accuracy by further minimizing the error.
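The role of the insensitive loss and the cost penalty in the regularized risk can be illustrated with a small sketch. It assumes the standard support vector regression form, that is, the cost penalty times the mean insensitive loss plus half the squared norm of the weight vector, applied to a linear predictor; the function names, the linear predictor, and the sample values are hypothetical, and the authors' exact risk function may differ.

```python
def insensitive_loss(target, predicted, epsilon):
    """Zero inside the tube of half-width epsilon, linear outside it."""
    return max(0.0, abs(target - predicted) - epsilon)


def regularized_risk(weights, bias, samples, targets, cost, epsilon):
    """Standard SVR-style risk: cost * mean loss + 0.5 * ||weights||^2."""
    predictions = [
        sum(w * x for w, x in zip(weights, sample)) + bias
        for sample in samples
    ]
    empirical = sum(
        insensitive_loss(t, p, epsilon) for t, p in zip(targets, predictions)
    ) / len(targets)
    return cost * empirical + 0.5 * sum(w * w for w in weights)


# Hypothetical one-feature example.
samples = [[1.0], [2.0], [3.0]]
targets = [3.0, 5.5, 6.0]
risk = regularized_risk(
    weights=[2.0], bias=1.0, samples=samples, targets=targets,
    cost=3.0, epsilon=0.25,
)
```

Here the predictions are 3, 5, and 7, the losses are 0, 0.25, and 0.75, and the weight term contributes 2, so the risk evaluates to about 3. Increasing `epsilon` widens the tube and lowers the empirical term, while increasing `cost` penalizes deviations outside the tube more heavily.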

C. MODIFIED ENHANCED DIFFERENTIAL EVOLUTION ALGORITHM BASED OPTIMIZER
The goal of this module is to further improve the forecast accuracy by minimizing the regularized risk function. Since the SVM based forecaster already minimizes the regularized risk function as far as its own capabilities allow, the optimizer module is integrated with the forecaster to minimize this function further. Thus, the optimization module takes the minimization of the regularized risk function as its objective function. This function depends on hyperparameters such as the Kernel parameters ξ, the cost penalty ϑ, and the insensitive loss function parameter ς. However, optimizing these hyperparameters for fast, accurate, and efficient load forecasting is still a crucial issue. In this view, scholars have used various methods like the gradient descent algorithm, cross validation, and the back-propagation algorithm to optimize hyperparameters [31]. However, these methods have high computational complexity and are unable to converge. Therefore, DE is chosen among the optimization algorithms for two reasons: (i) it avoids premature convergence, and (ii) it has optimal search capability. Authors in [32] used an improved version of DE (EDE), which is proposed in [33]. In this work, the approach of [32] is enhanced in terms of the accuracy of the trial vector generation and the convergence rate. Therefore, mEDE is used with the SVM model to optimally tune and select control parameters, as shown in Figure 2. The SVM with mEDE is depicted in Figure 3. A brief discussion is given below: In [32], the trial vector V for i th individual in t generation is represented as follows: where u t (i, j) is the mutant vector, and x t (i, j) is the parent vector. In Equation 30, FF () represents the fitness function ranging between 0 and 1, and rand() represents the random number that lies between 0 and 1.
Based on X t (i) and Y t (i), the next generation X t+1 (i) offspring is generated as: From the above Equations 30 and 31, it is obvious that the selection of next generation t + 1 offspring depends on the trial vector of the previous generation, which relies on the rand() and FF() functions. The EDE algorithm in [32] updates load values by comparing the random number rand() with the fitness FF(). This random updating of load values is undesirable. Thus, this issue is resolved by removing the dependence of offspring selection on the randomly generated number. Instead, load values are updated by comparing the fitness function of the candidate load value with that of the previous load value. Thus, the new load values will become optimal and will contribute to the improvement of the forecast accuracy. The devised modifications in Equation 30 are as follows: In this view, the fitness function of parent and mutant vectors is defined in [32] as: In the fitness functions of Equations 33 and 34, it is assumed that each mathematical operator, i.e., division and addition, requires 1 unit of time to execute. Each of Equations 33 and 34 therefore takes 5 units of time per iteration, and, according to [32], the EDE algorithm runs for 100 iterations. Over these 100 iterations, EDE spends 500 units of time on one fitness function and 1000 units of time on two fitness functions. Thus, to alleviate the execution time and improve the convergence speed, the modifications in the fitness functions in Equations 33 and 34 are as follows: Using Equations 35 and 36, the algorithm takes 400 units of time to calculate two fitness functions in 100 iterations, that is, 2 units of time per fitness function per iteration. In this way, the convergence speed of the EDE algorithm used in [32] is enhanced. The pseudo-code of the proposed framework is presented in detail in Algorithm 1.
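The idea of replacing the random acceptance test with a fitness comparison can be sketched with a generic differential evolution loop. This is not a reconstruction of Equations 30-36: the mutation rule (rand/1), the component-wise greedy acceptance, the sphere objective used as a stand-in for the regularized risk, and all parameter values are illustrative assumptions. In the framework itself the decision variables would be the SVM hyperparameters.

```python
import random


def sphere(vector):
    """Stand-in objective; the real objective would be the regularized risk."""
    return sum(v * v for v in vector)


def greedy_differential_evolution(objective, bounds, pop_size=20,
                                  generations=100, scale=0.5, seed=0):
    """DE in which a mutant component is accepted only if it lowers the objective."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)
    ]
    fitness = [objective(individual) for individual in population]
    for _ in range(generations):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            a, b, c = rng.sample(others, 3)
            mutant = []
            for j in range(dim):
                value = population[a][j] + scale * (
                    population[b][j] - population[c][j]
                )
                # Clip the mutant component to the search bounds.
                mutant.append(min(max(value, bounds[j][0]), bounds[j][1]))
            # Deterministic trial construction: no rand() comparison.
            trial, trial_fitness = list(population[i]), fitness[i]
            for j in range(dim):
                candidate = list(trial)
                candidate[j] = mutant[j]
                candidate_fitness = objective(candidate)
                if candidate_fitness < trial_fitness:
                    trial, trial_fitness = candidate, candidate_fitness
            # The trial is never worse than its parent, so it replaces it.
            population[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=lambda k: fitness[k])
    return population[best], fitness[best]


best_vector, best_fitness = greedy_differential_evolution(
    sphere, [(-5.0, 5.0)] * 3
)
```

Because a candidate is accepted only when it improves the objective, the fitness of every individual is non-increasing across generations, which is the property the fitness-based update is meant to secure.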

IV. SIMULATION RESULTS AND DISCUSSION
To evaluate the validity and applicability of the proposed FA-HELF framework, extensive simulations are performed using MATLAB. For this evaluation, the hourly load data of the ISO-NE control area is used. The dataset includes 5-year (from 2008 to 2013) historical load data with hourly resolution, and is publicly available at [34]. The hourly load data is split into three datasets: training, testing, and validation, as depicted in Figure 4. 80% of the load data is used for training, and the remaining 20% is kept for testing and validation purposes. For validation, the FA-HELF framework is compared with benchmark frameworks like F-RBF-CNN [9], SDPSO-ELM [35], and SSA-SVM-CS [36] in terms of convergence speed and accuracy. These frameworks are selected because of their architectural resemblance with the proposed FA-HELF framework, which is needed for a fair comparison. The simulation parameters are listed in Table 2 and are kept the same for the proposed and benchmark schemes. The detailed description of the simulation results is presented as follows:
In feature engineering, GCA is first applied to the selected abstractive features from the hourly load data of the ISO-NE control area over the 01-01-2008 to 31-12-2013 time horizon by calculating the correlation between features and the target load. The purpose is to remove irrelevant features. As the correlation threshold grade (σ) is set to 0.5 by [28], the features having a value less than this threshold σ are discarded and the features having a value greater than σ are reserved. The importance value of the reserved features is evaluated through two evaluators, γ and κ, as described in Algorithm 1. We observed that as the feature selection threshold σ increases, more features are dropped, which leads to faster training but lower forecast accuracy.
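A chronological split consistent with the 80%/20% description can be sketched as follows. The text does not state how the remaining 20% is divided between testing and validation, so the equal 10%/10% division below is an assumption, as are the function name and the placeholder series.

```python
def chronological_split(series, train_pct=80, test_pct=10):
    """Split a time-ordered series without shuffling.

    The first train_pct percent is used for training, the next test_pct
    percent for testing, and the remainder for validation.
    """
    n = len(series)
    train_end = (n * train_pct) // 100
    test_end = train_end + (n * test_pct) // 100
    return series[:train_end], series[train_end:test_end], series[test_end:]


# Ten days of placeholder hourly indices standing in for load readings.
hourly_load = list(range(240))
train, test, validation = chronological_split(hourly_load)
```

Keeping the order intact avoids leaking future load values into the training set, which matters for any forecasting evaluation.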
Then, the radial basis KPCA in feature engineering is used to extract principal components and eliminate redundant information within the reserved features. The comparison of KPCA, PCA, and different Kernels in terms of cumulative contribution is depicted in Figure 5. The radial basis KPCA can extract principal components and the cumulative contribution approaches 95%, as depicted in Figure 6. Thus, among the different Kernels, radial basis function is chosen as the Kernel for KPCA because radial basis KPCA distributes data points along coordinate axes to extract principal components, which contributes in the accurate load forecasting.
A learning curve of error vs the number of epochs for the SVM based forecaster is depicted in Figure 7, which enables us to evaluate whether the selected model is learning or memorizing the data. At the start, when the number of epochs is zero, the error is maximum, indicating that the SVM based forecaster is not yet trained. As the number of epochs increases, the error decreases, indicating that the SVM based forecaster is training. During this training course, a point is reached where the error no longer decreases as the number of epochs increases. That point is known as the saturation point, and the SVM based forecaster is well trained by that time. The simulation results of the well-trained network for hourly, weekly, and monthly electrical energy consumption forecasting are discussed below.

Algorithm 1 Pseudo-Code of the Proposed FA-HELF Framework for Energy Consumption Forecasting
[...]
    Perform optimization on regularization risk function based on mEDE to optimally tune parameters using Equations 30-36
end
Return fast and accurate results
end
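The saturation point of a learning curve, as described above, can be located with a simple rule: find the epoch after which the error stops improving by more than a small tolerance for several consecutive epochs. The sketch below is one such rule; the tolerance, the patience value, and the sample error sequence are hypothetical and are not taken from the paper.

```python
def saturation_epoch(errors, tol=1e-3, patience=3):
    """Return the epoch after which the error improves by less than `tol`
    for `patience` consecutive epochs, or None if no such epoch exists.

    errors[i] is the error measured at epoch i.
    """
    run = 0
    for epoch in range(1, len(errors)):
        improvement = errors[epoch - 1] - errors[epoch]
        run = run + 1 if improvement < tol else 0
        if run == patience:
            # The last epoch that still produced a meaningful improvement.
            return epoch - patience
    return None


# Hypothetical learning curve: the error flattens out at epoch 3.
demo_errors = [1.0, 0.5, 0.25, 0.2, 0.2, 0.2, 0.2]
demo_saturation = saturation_epoch(demo_errors)
```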
The performance of the FA-HELF framework and the benchmark schemes F-RBF-CNN [9], SDPSO-ELM [35], and SSA-SVM-CS [36], compared with the actual electrical energy consumption pattern over an hourly time horizon, is shown in Figure 8. The statistical analysis of the proposed FA-HELF framework and benchmark frameworks in terms of mean absolute percentage error (MAPE) is listed in Table 3. The electrical energy consumption pattern forecasted by the proposed FA-HELF framework closely follows the target electrical energy consumption pattern, which demonstrates that a significant improvement in forecast accuracy is obtained when the optimization module and feature engineering module are integrated with the forecaster module based on SVM. Among the benchmarks, the SSA-SVM-CS framework outperforms both F-RBF-CNN and SDPSO-ELM in terms of MAPE, and F-RBF-CNN outperforms SDPSO-ELM. The superior performance of the proposed FA-HELF framework is due to the integration of the feature engineering and optimization modules with the forecaster module based on SVM. Feature engineering removes irrelevant and redundant features, and the optimization module helps to minimize the error by optimizing hyperparameters.
Table 4 presents the statistical evaluation of forecasted electrical energy consumption against target electrical energy consumption in terms of accuracy for FA-HELF and the benchmark frameworks F-RBF-CNN [9], SDPSO-ELM [35], and SSA-SVM-CS [36]. The table shows that the average MAPE for a specific day (28 May 2013) is significantly reduced in the case of the proposed FA-HELF framework. The MAPE of the proposed FA-HELF framework, SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS is 0.410%, 1.655%, 0.988%, and 0.899%, respectively. Hence, relative to the MAPE of each benchmark, FA-HELF improves accuracy by 75.22%, 58.50%, and 54.39% in comparison to SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS, respectively.
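The reported improvement percentages are consistent with a relative reduction in MAPE, that is, (benchmark MAPE minus FA-HELF MAPE) divided by the benchmark MAPE. The short sketch below reproduces that arithmetic from the MAPE values quoted above; the function names are illustrative, and the assumption that the paper computes improvement this way is inferred from the numbers rather than stated in the text.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_improvement(baseline_mape, proposed_mape):
    """Percentage reduction in MAPE relative to a baseline."""
    return 100.0 * (baseline_mape - proposed_mape) / baseline_mape


# MAPE values (in percent) quoted in the text for 28 May 2013.
benchmark_mape = {"SDPSO-ELM": 1.655, "F-RBF-CNN": 0.988, "SSA-SVM-CS": 0.899}
fa_helf_mape = 0.410

improvements = {
    name: relative_improvement(value, fa_helf_mape)
    for name, value in benchmark_mape.items()
}
for name, gain in improvements.items():
    print(f"{name}: {gain:.2f}% lower MAPE with FA-HELF")
```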
Figure 9 illustrates the evaluation of target electrical energy consumption vs forecasted electrical energy consumption for the proposed FA-HELF framework and benchmark frameworks for the week time horizon of 12/22/2013 to 12/29/2013. The week-ahead results provide electrical energy consumption forecasts for different days, including working days, holidays, and weekends. Results demonstrate that the proposed FA-HELF framework enhanced the forecast accuracy by 76.80%, 58.56%, and 55.39% as compared to SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS, respectively.
The forecasted electrical energy consumption vs target electrical energy consumption evaluation for the month-ahead time horizon (11/01/2013 to 11/30/2013) is illustrated in Figure 10. The monthly forecasted results demonstrate a significant enhancement in forecast accuracy. This improvement is due to the integration of both the feature engineering module, applied before forecasting, and the mEDE algorithm based optimization module, applied after it, with the SVM based forecaster. The simulation results show that the FA-HELF framework improved the forecast accuracy by 77.65%, 58.66%, and 57.39% as compared to SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS, respectively.
The accuracy results in terms of MAPE of the proposed FA-HELF framework and benchmark models like F-RBF-CNN, SSA-SVM-CS, and SDPSO-ELM for different months of the year 2013 are listed in Table 5. The simulation results in Table 5 demonstrate that the proposed FA-HELF framework and benchmark models like SDPSO-ELM, [...]
The performance analysis of FA-HELF and benchmark frameworks like F-RBF-CNN [9], SDPSO-ELM [35], and SSA-SVM-CS [36] in terms of computational time is depicted in Figure 11. The individual models, ELM, CNN, and SVM, without the integration of the feature engineering and optimization modules, have low computational time but the worst error performance. However, when both modules are integrated with these individual models, the computational time increases and the error decreases, reflecting the tradeoff between accuracy and convergence rate. The average values of computational speed and forecast error for both individual (ELM, CNN, and SVM) models and hybrid (F-RBF-CNN, SDPSO-ELM, SSA-SVM-CS, and FA-HELF) models for daily, weekly, and monthly time horizons are listed in Table 6. The first, second, and third rows show the daily, weekly, and monthly evaluations of the models (both individual and hybrid), respectively. The computational speed of individual models without the integration of both feature [...]
Thus, the feature engineering module refines the feature space by removing redundant and irrelevant features, and the optimization module based on the mEDE algorithm tunes the control parameters of the SVM, which ensures accurate electrical energy consumption forecasting.
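To give a feel for how an evolutionary optimizer can tune forecaster parameters, the sketch below implements plain differential evolution (DE/rand/1/bin). It is not the mEDE algorithm of this paper, whose modifications are defined by its own equations. The objective `surrogate_validation_error` is a hypothetical stand-in for the forecast error of an SVM as a function of two log-scaled parameters, and the population size, scale factor, and crossover rate are assumed values.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, generations=100,
                           scale=0.5, crossover=0.9, seed=0):
    """Minimize `cost` over a box with standard DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < crossover or d == forced:
                    value = population[a][d] + scale * (population[b][d] - population[c][d])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = population[i][d]
                trial.append(value)
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_cost <= costs[i]:
                population[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return population[best], costs[best]


def surrogate_validation_error(params):
    """Hypothetical smooth error surface with its minimum at (1.0, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = differential_evolution(
    surrogate_validation_error, bounds=[(-3.0, 3.0), (-3.0, 3.0)]
)
```

In a real setting the surrogate would be replaced by a function that trains the forecaster with the candidate parameters and returns its validation error, which is where most of the added computational time comes from.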
The robustness evaluation of the proposed FA-HELF framework and benchmark frameworks like SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS is depicted in Figure 12. The evaluation is conducted by adding error (noise) to each feature and observing the accuracy of each scheme. As illustrated in Figure 12, the proposed FA-HELF framework is more robust than the benchmark frameworks SSA-SVM-CS, F-RBF-CNN, and SDPSO-ELM. Because less important and irrelevant features are dropped during the feature engineering phase, the noise within features has little influence on accuracy. Thus, the proposed FA-HELF framework is also robust against noise in the features.
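A noise-injection test of the kind described above can be sketched as follows: each feature value is perturbed by multiplicative Gaussian noise and the MAPE is recomputed at every noise level. The noise model, the noise levels, and the toy predictor `toy_predict` are assumptions for illustration; the paper does not specify them in this section.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mape_under_noise(predict, rows, actual, noise_levels, seed=0):
    """MAPE of `predict` when every feature is scaled by (1 + N(0, level))."""
    rng = random.Random(seed)
    results = {}
    for level in noise_levels:
        noisy_rows = [
            [x * (1.0 + rng.gauss(0.0, level)) for x in row] for row in rows
        ]
        results[level] = mape(actual, [predict(row) for row in noisy_rows])
    return results


def toy_predict(row):
    """Hypothetical forecaster: a fixed linear map of two features."""
    return 100.0 + 2.0 * row[0] + 0.5 * row[1]


demo_rows = [[float(hour), 20.0] for hour in range(1, 25)]
demo_actual = [toy_predict(row) for row in demo_rows]
demo_curve = mape_under_noise(toy_predict, demo_rows, demo_actual, [0.0, 0.05, 0.2])
```

A scheme whose MAPE grows slowly across the noise levels would be considered more robust under this kind of test.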

V. CONCLUSION AND FUTURE WORK
In this work, a non-linear framework (FA-HELF) is proposed to perform efficient and accurate energy consumption forecasting for energy management in the smart grid. The proposed framework combines random forests and relief-F with GCA to choose the most appropriate features, and uses radial basis KPCA for feature extraction to reduce the dimensionality of the feature space. The selected and extracted features provide the most relevant information about electrical energy consumption to train the SVM based forecaster. In addition, the mEDE optimization algorithm is used in the proposed framework to optimize the hyperparameters, improving the forecast accuracy while minimizing the computational time.
To analyze the validity of the FA-HELF framework, load data with hourly resolution from the ISO-NE control area is used. We can draw the following conclusions based on our results and evaluation: (i) GCA and radial basis KPCA feed the most relevant and desired features to the SVM based forecaster and remove the redundant and irrelevant features from the input data to improve the forecasting performance; (ii) although feature engineering assists in the improvement of forecast accuracy, the mEDE optimization algorithm contributes to the improvement of both convergence rate and forecast accuracy; (iii) the proposed FA-HELF framework outperforms the benchmark frameworks SDPSO-ELM, F-RBF-CNN, and SSA-SVM-CS. Simulation results show that the developed framework is robust, fast, and accurate in forecasting electrical energy consumption for efficient energy management. Thus, it is believed that the proposed FA-HELF framework is scalable and reliable and can be applied in real-life settings for efficient energy management.
In the future, other advanced heuristic techniques for suitable parameter selection can be integrated with the SVM based forecaster for fast and accurate electrical energy consumption forecasting. Moreover, the advanced heuristic techniques can be integrated with advanced deep learning models, and the work can be extended to medium-term and long-term electrical energy consumption forecasting.

ZAHID WADUD received the B.Sc. and master's degrees in electrical engineering from the University of Engineering and Technology (UET) at Peshawar, Pakistan, in 1999 and 2003, respectively, and the Ph.D. degree from the Capital University of Science and Technology, Islamabad, Pakistan, with a thesis entitled Energy Balancing With Sink Mobility in the Design of Underwater Routing Protocols. He is currently working as an Assistant Professor with the Department of Computer Systems Engineering at UET, Peshawar. He has published over a dozen papers in renowned international journals. His research interests include wireless sensor networks, energy efficient networks and subsystems, the mathematical modeling of wireless channels, embedded systems, and sensor interfaces.