Hybrid ANN and Artificial Cooperative Search Algorithm to Forecast Short-Term Electricity Price in De-Regulated Electricity Market

The smart grid has evolved into a viable platform on which electricity market participants can regulate their bidding strategies using demand-side management (DSM) models, owing to its rapid technological advancement in recent years. Both the reliability of system operation and capital cost investments can improve greatly with the responsiveness of market participants. In this regard, the efficient design, implementation and evaluation of demand response measures and the development of robust short-term price forecasting for day-ahead transactions are of the utmost importance. Achieving accurate and efficient day-ahead price forecasts is a complex challenge in a deregulated electricity market, because the electricity price series is less stable than the load series, which lowers forecasting accuracy. Therefore, this research proposes a hybrid method for electricity price forecasting based on an artificial neural network (ANN) and the artificial cooperative search (ACS) algorithm. In parallel, a feature selection technique combining mutual information (MI) and a neural network (NN) is developed to select the subsets of input variables that have a substantial impact on electricity price forecasting. Actual data sets from the Ontario electricity market for the year 2017 are used to verify the simulation results. The simulation results validate the proposed hybrid method through enhanced accuracy compared with the results of hybrid support vector machine (SVM) and hybrid ANN optimization methods.

Modern electricity grids are expected to maintain high standards of quality to meet the growing diversity in demand while providing a constant and reliable supply. These intricate challenges are the driving force behind the continuous evolution of smart grid technologies. Realizing smart grid technologies involves complex challenges such as optimizing the capacity of distributed generation (DG), transmission and distribution (T&D) systems and efficient energy storage technologies, all of which require extensive research and substantial investment. Therefore, electricity price forecasting plays an instrumental role in today's advanced electricity market as well as in smart grid operation. Such forecasts help each generator to determine its optimal bidding strategy. Furthermore, decisions on joint agreements and on long-run investment in new generation facilities are highly influenced by price forecasts [1]. Forecasting the electricity price is imperative for generation companies and the Independent System Operator (ISO) as well as for customers and investors at different levels. Bidders in a competitive electricity market need future electricity prices to increase their profit. Since recent energy markets are highly deregulated and nonlinear, price forecasting has become more complex than before, and the nonlinearity and instability of the system have lowered the accuracy of price prediction [2]. Moreover, this leads to a volatile electricity market by affecting bidding policies. Owing to the uncertain nature of the electricity market price, both supply-side and demand-side management face numerous difficulties in the day-ahead electricity market [3]. Power suppliers can make better short-term decisions about their rational offers when they know in advance how the electricity market price varies [4]. This knowledge also helps power suppliers to set up bidding strategies that maximize their profit. On the other hand, demand-side management needs knowledge of market price changes and variations to develop short-term operational plans. Therefore, research on price forecasting in electricity markets has become more significant in recent years.
Forecasting techniques can be classified into three categories according to the forecasting framework: statistical models, time series methods and artificial intelligence (AI) based approaches. Among these, AI-based approaches have gained significant traction in recent years because they can maintain a certain level of estimation accuracy, whereas statistical models suffer from the high fluctuation of the independent and dependent variables [5]. For instance, ANN was extensively adopted in [6] to forecast electricity demand and price. However, the accuracy and robustness of ANN-based methods depend heavily on criteria such as convergence speed, the weight adaptation algorithm and the selection of the network architecture. Support vector regression (SVR) was used in [7], [8] for electricity price forecasting because it can adapt to and capture complicated relationships in the input data. In general, AI techniques can handle the nonlinearity of short-term electricity price forecasting: they can remove different discriminators in a complex environment, and they can learn, store and recall information from past experience, which has made them popular in the area of electricity price forecasting.
The development of hybrid techniques has emerged as a feasible way to overcome the nonlinearity involved in short-term forecasting, increasing accuracy and improving reliability. In [9], the ARIMA procedure and the wavelet transform (WT) are combined for power market prediction. More precisely, the historical data are first decomposed by the wavelet transform; the ARIMA technique is then applied, and the inverse wavelet transform is finally employed to obtain the prediction outcomes. In [10], ANN has been used to forecast the electricity price, where the price and quantities are obtained from historical data and estimated future parameters. A three-layer back propagation (BP) neural network (NN) with fuel cost and demand as input data has also been introduced in that study. A combination of the probability neural network (PrNN) and orthogonal experimental design (OED) has been adopted in [11] to forecast the power price; the PrNN is used for classification and the OED for locating the best variable, which eventually increases the forecast precision. The support vector machine (SVM) and the projected assessment of system adequacy (PASA) have been used in [12] to predict the price from the price and load history and from the input data, respectively. That work used regional data of South Wales for its evaluations.
In [13], the fish swarm algorithm (FSA) has been used within a time series forecasting procedure to select the SVM variables, with the power price as input data. The wavelet packet transform (WPT) and feature selection have been combined in [14] with a least square support vector machine (LSSVM) algorithm for prediction. A probabilistic power price prediction method combining support vector regression (SVR) and ARIMA has been proposed in [15], [16]. A combination of three methods, namely WT, the radial basis function neural network (RBFNN) and ARIMA, has also been used in [17]. In [18], another mixed model has been presented based on the interaction of load and price prediction. A model for price-directed demand response has been introduced in [19], in which a Virtual Budget (VB) approach is developed for coupled price and load prediction and enables automated morphing of a consumer's electricity demand. Another hybrid model has been proposed in [20] to predict load and price, applying a hybrid time-series and adaptive wavelet neural network (AWNN) to predict the price. In [21], different states of a multi-block forecast engine are applied to both price and load forecasting; the forecasting mechanism consists of a multi-block neural network (NN) optimized by an intelligent algorithm to improve training time and forecasting capability. In addition, genetic algorithms, particle swarm optimization and the cuckoo search technique have been proposed in [22], [23] to achieve better NN training, since network training is an important element of ANN-based price forecasting.
Although remarkable advances in electricity forecasting have been achieved through ANN, SVR, ANFIS and hybrid techniques, a more accurate method for electricity price forecasting is still needed. The aforementioned studies perform reasonably well, but their forecasting accuracy still leaves room for improvement.
For instance, linearly structured time series models are incapable of capturing the nonlinear patterns and the frequent changes over time in the regularity underlying the data. Moreover, the input parameters for electricity price forecasting are usually chosen by trial and error and engineering experience. A few studies have employed AI techniques for electricity price forecasting because of their excellent nonlinear modeling capability. Another vital aspect that enhances the accuracy and efficiency of forecasting is proper feature selection. However, selecting proper features with the existing feature selection techniques while accounting for price nonlinearity has been found to be a difficult and complex task, which makes it essential to explore enhanced feature selection techniques.
Therefore, to resolve the aforementioned issues, this research proposes a new hybrid approach for short-term electricity price forecasting. The approach combines the artificial cooperative search (ACS) algorithm with an artificial neural network (ANN) to further enhance forecasting precision. ACS is a two-population metaheuristic algorithm. Unlike other metaheuristic methods, it has only a single control parameter, and it is not highly sensitive to the initial value of that parameter. In addition, owing to their unique crossover and mutation processes, the ACS operators are highly capable of both exploiting better results and exploring the search space. Furthermore, mutual information and ANN techniques have been combined to form a robust feature selection technique that enhances the accuracy of the proposed hybrid ACS-ANN price forecasting method. ANN can model data with nonlinear and complex relationships, which makes it preferable in the present study, since the feature selection is based on the nonlinear price signal.
To evaluate the accuracy of the methods proposed in this work, the Ontario electricity market was considered for the case studies. This is because the Ontario electricity market has been recognized as one of the most unstable markets in the world as a result of its single settlement nature [24], [25]. Fig. 1 illustrates the relationship between the hourly Ontario electricity price (HOEP) and the hourly Ontario electricity demand (HOED) within one week. As shown in the figure, the electricity price in the Ontario electricity market is a function of the electricity demand in the deregulated electricity market. Competition on the electricity price is high when the electricity demand is very high and generation is limited. Therefore, given the inherent correlation between electricity price and demand, prediction in a smart grid environment such as the Ontario electricity market is more complex than in conventional power systems. Hence, a novel prediction approach should be applied to this market to provide highly accurate forecasts.
The contributions of this research work are summarized as follows. The main contribution is a hybrid electricity price forecasting technique based on the efficient ACS algorithm together with the ANN method, which enhances the accuracy of price forecasting compared with existing forecasting methods. ACS is used to search for the most suitable bias and weight values of the ANN so as to obtain the least error, which results in improved forecasting accuracy. Although ACS has a simple structure, it has been widely used in various numerical problems owing to its effectiveness in solving multidimensional functions. Furthermore, only one control parameter is required in the ACS algorithm. The second contribution is related to the feature selection problem. To address this issue, a robust hybrid feature selection technique based on the combination of NN and mutual information techniques has been developed. Here, MI is applied to extract input variables with minimum redundancy and maximum relevancy, while the NN is used to choose the best subset of features. By adding a penalty term to the error function of the network, redundant network connections can be distinguished from relevant ones by their small weights once network training has been completed. The pertinence and precision of the proposed hybrid forecasting method are evaluated by comparing its results with those of hybrid SVM and hybrid ANN methods whose parameters are optimized by particle swarm optimization (PSO) and the cuckoo search algorithm (CSA).
Subsequent sections are organized as follows: Sections II and III briefly describe the structure of ANN and the evolution process of ACS, respectively. Section IV presents the development of the short-term EP forecasting models. It also describes the proposed feature selection, which provides the most influential features for short-term EP forecasting by filtering the input variables using MI in the first stage and applying a neural network (NN) technique in the second stage. Section V provides the results and a comprehensive discussion, including a statistical analysis to verify that the proposed approach is suitable and applicable for future electricity price forecasting in a de-regulated electricity market. The final section concludes the work.

II. ARTIFICIAL NEURAL NETWORK (ANN)
Artificial neural networks (ANNs) are inspired by the human brain, which consists of millions of interconnected cells that interpret and process information [26], [27]. A learning process takes place within these interconnected cells, which collectively perform tasks that surpass supercomputers with high-level computational capacity. An ANN is developed accordingly as a highly interconnected network of processing units known as neurons. Fig. 2 shows the architecture of an ANN, which consists of an input layer, a hidden layer and an output layer that are interconnected with each other. Each layer contains a number of neurons. In the input layer, the number of neurons is equal to the number of input data. In the output layer, the number of neurons represents the desired outputs or results. In the hidden layer, there is no limit on the number of neurons; however, this number affects the quality of the learning process, the learning time and the generalization ability. The neurons communicate with each other through weighted connections consisting of weights and biases. In each neuron, the associated transfer function maps the input signal to the output signal. The numerical weights and biases of the weighted connections are optimized so that the network fits and generalizes the relation between the input and output data. The experience and information accumulated during the ANN learning process are then stored in the form of a network file. The saved network is able to respond to new input data that were not involved in the learning process.
Multilayer perceptron (MLP) networks are usually applied to supervised learning tasks, which involve iterative training methods to adjust the connection weights within the network. Generally, a number of passes is required to achieve the preferred level of approximation accuracy. The standard error back-propagation algorithm is used to adjust the connection weights, where the gradient descent method is applied to minimize the total error [28]. Back-propagation is a systematic method for training MLP networks, and its procedure is briefly described in Table 1.

A. INITIALIZE WEIGHTS
The numerical or initial estimates of the linking strength between all neurons ($w_{ij}$) are allocated arbitrarily. Moreover, the initial value of the activation threshold ($w_{0j}$) is also randomly assigned to each neuron. This activation threshold acts as the independent term of the linear combination of the outputs from the previous neurons. It is considered as a weight allotted to a fictitious neuron, known as the bias unit, with an output value of 1. Therefore, the role of the bias input (memory) is to shift the origin of the activation function for better learning.

C. FEED-FORWARD COMPUTATION
Starting from the first hidden layer and propagating toward the output layer, each input unit ($x_i$) is assigned an initial weight ($w_i$) and broadcasts its weighted signal to all neurons in the first hidden layer.
i. Each neuron in the first hidden layer ($n_j$) sums its weighted inputs.
ii. The activation function ($f$) processes the output signal of each neuron ($n_j$) in the hidden layers. Generally, the activation function is one of the linear, logistic sigmoid and bipolar sigmoid (hyperbolic tangent) functions. In MLP networks, if the neurons have linear activation functions, the capability of the network is no better than that of a single-layer network with a linear activation function. Thus, nonlinear activation functions (sigmoid functions) are used, which usually limit the output signal of each neuron to values between two asymptotes. The logistic sigmoid function is the most widely used activation function in MLP. The hyperbolic tangent function, formulated by Eq. (5), is another sigmoidal function used as the activation function for neurons in the hidden layers of MLP networks, and it is closely related to the bipolar sigmoid function.
iii. The output signal of the $j$-th of the $N$ neurons in hidden layer $L$, denoted by $n^{L}_{j}$, is transferred to the next hidden layer.
iv. The output signal of each neuron in the last hidden layer is propagated toward the output layer. Here, $g$ is the activation function of the output layer, known as the transfer function, $h_{ij}$ is the connection strength (weight) between the $i$-th neuron in the last hidden layer and the $j$-th neuron in the output layer, and $h_{0j}$ is the weight assigned to the bias unit of the $j$-th neuron in the output layer.
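The feed-forward computation described above can be illustrated with a minimal sketch. It assumes the standard textbook forms (a weighted sum plus bias followed by an activation function); the function names `logistic` and `forward`, the layer sizes and the random weights are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def logistic(z):
    """Logistic sigmoid, with outputs between the asymptotes 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, hidden_layers, output_layer, f=np.tanh, g=lambda z: z):
    """Feed-forward pass through an MLP (illustrative sketch).

    hidden_layers: list of (W, b) pairs, W has shape (n_in, n_out)
    output_layer:  (H, h0) pair holding output weights and output biases
    f, g:          hidden-layer and output-layer activation functions
    """
    a = np.asarray(x, dtype=float)
    for W, b in hidden_layers:
        a = f(a @ W + b)      # weighted sum plus bias, then activation
    H, h0 = output_layer
    return g(a @ H + h0)      # transfer function of the output layer

# Hypothetical network: 3 inputs, one hidden layer of 4 neurons, 1 output
rng = np.random.default_rng(0)
hidden = [(rng.normal(size=(3, 4)), rng.normal(size=4))]
out = (rng.normal(size=(4, 1)), rng.normal(size=1))
y = forward([0.2, -0.1, 0.5], hidden, out)
```

The relation between the two sigmoidal functions can be checked numerically with the same helpers: `np.tanh(z)` equals `2 * logistic(2 * z) - 1` for any `z`.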

D. CALCULATE THE OUTPUT ERRORS
The error information term of each neuron in output layer is computed as follows:

E. BACKPROPAGATION OF THE ERROR
The error is propagated backward to the input layer through each hidden layer using the error information term. The backward weight correction term from the output layer to the last hidden layer and its bias correction term are computed using the learning rate α. The error information term of each neuron in the last hidden layer is calculated by multiplying the summation of its backward weight corrections by the derivative of its activation function. The backward weight correction term that hidden layer L sends to the hidden layer below (L-1) and its bias correction term are computed next, and the error information term of each neuron in that hidden layer is calculated in the same manner. Finally, the backward weight correction term sent from the first hidden layer to the input layer and its bias correction term are computed.

F. UPDATE WEIGHTS AND BIASES
Each neuron in the output layer, in the hidden layers and in the first hidden layer then updates its bias and weights using the corresponding correction terms. The optimal values of the weights and biases are found by training the network. Generally, different techniques can be used to find appropriate values of the weights and biases of the ANN; in this work, the optimum training of the ANN is achieved by ACS.
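The error calculation, back-propagation and update stages described in the preceding subsections can be sketched for a one-hidden-layer network as follows. This is a minimal illustration under stated assumptions: a tanh hidden layer, a linear output, a mean squared error criterion and batch gradient descent. The function name `train_step`, the learning rate value and the toy data are hypothetical and do not reproduce the paper's setup.

```python
import numpy as np

def train_step(X, t, W, b, H, h0, alpha=0.05):
    """One batch gradient-descent update for a one-hidden-layer MLP.

    Assumed setup: tanh hidden activation, linear output activation.
    Returns the updated parameters and the mean squared error
    measured before the update.
    """
    n = X.shape[0]
    # Feed-forward computation
    a = np.tanh(X @ W + b)                 # hidden outputs, shape (n, n_hidden)
    y = a @ H + h0                         # network outputs, shape (n, 1)
    # Error information term of the output layer (linear output, derivative 1)
    delta_out = (y - t) / n
    # Error propagated back to the hidden layer, times tanh'(z) = 1 - a^2
    delta_hid = (delta_out @ H.T) * (1.0 - a ** 2)
    # Weight and bias updates scaled by the learning rate alpha
    H_new = H - alpha * (a.T @ delta_out)
    h0_new = h0 - alpha * delta_out.sum(axis=0)
    W_new = W - alpha * (X.T @ delta_hid)
    b_new = b - alpha * delta_hid.sum(axis=0)
    mse = float(np.mean((y - t) ** 2))
    return W_new, b_new, H_new, h0_new, mse

# Toy regression problem (synthetic data, not market data)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(64, 2))
t = np.sin(X[:, :1]) + 0.5 * X[:, 1:]
W, b = rng.normal(scale=0.5, size=(2, 5)), np.zeros(5)
H, h0 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)
errors = []
for _ in range(200):
    W, b, H, h0, mse = train_step(X, t, W, b, H, h0)
    errors.append(mse)
```

In the hybrid scheme of this work, the gradient-based update is replaced by a population-based search over the same weights and biases, so a sketch like this mainly serves as the baseline that the metaheuristic training is compared against.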

III. ARTIFICIAL COOPERATIVE SEARCH ALGORITHM
ACS is a two-population search algorithm based on a coevolution process. It overcomes several drawbacks of metaheuristic approaches, e.g. oversensitivity to the initial values of the parameters, too many control parameters, and the long computation time needed to balance the exploitation of better results against the exploration of the problem's search space [29]. ACS is controlled by only one parameter, and it is not sensitive to the initial value of that parameter. Crossover and mutation are used in this algorithm to keep the balance between exploration and exploitation; these operators work differently from the crossover and mutation strategies of other methods such as the genetic algorithm (GA) and differential evolution (DE). A further advantage of ACS is its memorization process for exploring the feeding areas. The algorithm comprises seven stages: initialization, selection of the Predator, selection of the Prey, mutation, crossover, boundary control, and export of the best individual. Table 2 shows the generic structure of ACS.
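The seven stages listed above can be sketched as follows. This is a simplified illustration and not the exact procedure of Table 2: the scale factor `R`, the binary crossover map, the resampling boundary rule and the greedy replacement step are assumptions made for this example, and the function name `acs_minimize` and its arguments are hypothetical.

```python
import numpy as np

def acs_minimize(obj, low, up, pop_size=20, max_iter=200, p=0.1, seed=0):
    """Simplified two-population cooperative search (illustrative only)."""
    rng = np.random.default_rng(seed)
    low, up = np.asarray(low, float), np.asarray(up, float)
    dim = low.size
    # 1) Initialization of the two populations
    pops = [rng.uniform(low, up, size=(pop_size, dim)) for _ in range(2)]
    fits = [np.apply_along_axis(obj, 1, P) for P in pops]
    history = [min(fits[0].min(), fits[1].min())]
    for _ in range(max_iter):
        # 2) Predator selection and 3) Prey selection (other population, shuffled)
        k = int(rng.integers(2))
        predator, pred_fit = pops[k], fits[k]
        prey = pops[1 - k][rng.permutation(pop_size)]
        # 4) Mutation: move predators relative to prey with a random scale (assumed form)
        R = 4.0 * rng.random() * (rng.random() - rng.random())
        trial = predator + R * (prey - predator)
        # 5) Crossover: a binary map keeps most predator coordinates unchanged;
        #    p is the single control parameter
        keep = rng.random((pop_size, dim)) > p * rng.random()
        trial = np.where(keep, predator, trial)
        # 6) Boundary control: resample coordinates that left the search box
        outside = (trial < low) | (trial > up)
        trial = np.where(outside, rng.uniform(low, up, size=trial.shape), trial)
        # Greedy replacement: a trial solution replaces its predator only if better
        trial_fit = np.apply_along_axis(obj, 1, trial)
        better = trial_fit < pred_fit
        predator[better] = trial[better]
        pred_fit[better] = trial_fit[better]
        history.append(min(fits[0].min(), fits[1].min()))
    # 7) Export the best individual found in either population
    k = int(np.argmin([fits[0].min(), fits[1].min()]))
    i = int(np.argmin(fits[k]))
    return pops[k][i].copy(), float(fits[k][i]), history

# Usage on a toy objective (sphere function), not on forecasting data
best_x, best_f, hist = acs_minimize(lambda x: float(np.sum(x ** 2)), [-5, -5], [5, 5])
```

For ANN training, `obj` would map a flattened vector of weights and biases to the forecasting error of the network, so that the search returns the weight vector with the lowest error found.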

IV. DAY-AHEAD SHORT-TERM EP FORECASTING
In a competitive electricity market, the electricity price and demand form time series sampled at hourly intervals, and the price of electricity is a function of the demand for electricity. The price of electricity at the present time t depends on the past values of the electricity price and demand. This relation is expressed in Eq. (30), where EP(t) and ED(t) are the price and demand of electricity at time t, treated as time series, $NL_{EP}$ is the lag order of the electricity price and $NL_{ED}$ is the lag order of the electricity demand.
This work focuses on implementing the ANN-ACS method for hourly Ontario electricity price (HOEP) forecasting. The historical HOED and HOEP data sets for the year 2017 have been acquired from [24]. Because the historical data vary over different ranges, Eq. (31) is used to normalize the independent and dependent variables. Normalization brings data collected on distinct scales to an approximately common scale and is usually applied before data processing.
where $\bar{Z}$ is the normalized data, $Z$ is the data to be normalized and $t$ is the hourly interval.
For electricity price forecasting in this study, one week of hourly lagged values is assumed for each exogenous variable ($NL_{EP} = NL_{ED} = 168$), so that a total of 336 lagged values are available. Overburdening a machine learning algorithm with an excessive number of features results in a sluggish learning process, poor performance and overfitting of the training data. Therefore, only the features that significantly affect the output (the forecasted electricity price) should be passed to the machine learning algorithm.
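The construction of the 336 candidate inputs can be sketched as follows. The sketch assumes min-max scaling as the normalization, since the exact form of Eq. (31) is not reproduced here, and it uses synthetic series in place of the HOEP and HOED data; the function names `min_max` and `lagged_features` are hypothetical.

```python
import numpy as np

def min_max(z):
    """Scale a series to [0, 1] (assumed form of the normalization)."""
    z = np.asarray(z, dtype=float)
    return (z - z.min()) / (z.max() - z.min())

def lagged_features(price, demand, n_lags=168):
    """Return (X, y): each row of X holds n_lags past prices followed by
    n_lags past demands (most recent lag first); y is the price at hour t."""
    price, demand = np.asarray(price, float), np.asarray(demand, float)
    rows = range(n_lags, len(price))
    X = np.array([np.concatenate((price[t - n_lags:t][::-1],
                                  demand[t - n_lags:t][::-1])) for t in rows])
    y = price[n_lags:]
    return X, y

# Four weeks of synthetic hourly data (placeholders, not Ontario market data)
rng = np.random.default_rng(0)
hours = 24 * 28
demand = 15000 + 3000 * np.sin(np.arange(hours) * 2 * np.pi / 24) + rng.normal(0, 200, hours)
price = 20 + 0.002 * (demand - 15000) + rng.normal(0, 2, hours)
X, y = lagged_features(min_max(price), min_max(demand))
```

With one week of lags for each series, every row of `X` has 168 + 168 = 336 columns, which is the candidate set that the feature selection stage then reduces.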
In statistics and machine learning, feature selection, also known as attribute selection, variable selection or variable subset selection, is a method for selecting a subset of relevant features (variables or predictors) for model construction. The objective of using feature selection techniques is threefold [30]:
1. Improving the prediction performance of the predictors by reducing overfitting (formally, reduction of variance).
2. Providing faster and more cost-effective process to construct the model (facilitate learning process).
3. Providing a simplified model that makes it easier to interpret (improving the generalization ability).
For electricity market price forecasting, the mutual information (MI) technique has been broadly employed [31]. However, this technique faces difficulties because the candidate inputs comprise lagged values of price, load demand and other variables provided by the electricity market, so that the individual and joint probability distributions of the candidate inputs are difficult to obtain. Besides, the electricity price is a time-variant signal. A long history of the candidate inputs is therefore not relevant, as market conditions evolve continuously, and using it can mislead the price forecast process or make it inaccurate owing to the lack of informative values [32].
The main purpose of mutual information is to measure the mutual dependence between two arbitrary random variables X and Y, that is, the amount of information obtained about one random variable by observing the other. In other words, the mutual information is zero if variable X does not carry any information about variable Y and vice versa, in which case the two random variables are independent. High mutual information is obtained if variable X is a deterministic function of variable Y and variable Y is a deterministic function of variable X [33]. The link between MI and conditional entropy (CE) is depicted in Fig. 3. X and Y are closely related and dependent on each other when MI is large.
Apart from the entropy, the conditional entropy (CE) is also considered. It measures the average uncertainty that remains in the first random variable after the second random variable is observed. As shown in Fig. 3, the mutual information MI(X, Y) between the random variables X and Y is obtained from their joint probability distribution $P_{XY}(X, Y)$, since the entropy of a random variable is closely related to the concept of mutual information.
In the first stage, all the input features are sorted in descending order of their MI values. A higher MI value represents a stronger dependency between an input variable and the output. Features whose MI value is lower than the threshold TH are removed, as they have a less significant influence on the output, while the remaining input features form a subset SX ⊂ X.
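The MI-based filtering stage can be sketched as follows. The sketch estimates MI from a two-dimensional histogram, which is one common estimator and an assumption made here; the paper's own estimator, the bin count and the scale of its threshold are not specified in this section. The function names `mutual_information` and `mi_filter` and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of MI(X, Y) in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probability P_XY
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_filter(X, y, threshold, bins=10):
    """Sort features by MI with the target (descending) and keep those
    whose MI is not lower than the threshold."""
    scores = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    selected = [int(j) for j in order if scores[j] >= threshold]
    return selected, scores

# Synthetic check: one informative feature and one unrelated feature
rng = np.random.default_rng(0)
target = rng.normal(size=5000)
X = np.column_stack((target + 0.1 * rng.normal(size=5000),   # informative
                     rng.normal(size=5000)))                  # unrelated
selected, scores = mi_filter(X, target, threshold=0.2)
```

The threshold of 0.2 is chosen only for this synthetic example; a histogram estimate depends on the bin count and the sample size, so a threshold value is not transferable between estimators.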
In the second stage, the redundant features are eliminated by a neural network algorithm that focuses on finding and removing them. Fig. 4 shows the three-layer feed-forward neural network used by the applied algorithm. For a redundant input, the connections from the input to the hidden layer and from the hidden layer to the output layer are weak; such an input is therefore eliminated, as it has a less significant effect on the network accuracy. The error function used during the training process is commonly defined in terms of n, t and y, where n represents the number of observations, t represents the output of the network and y represents the real value. In order to detect irrelevant and redundant features, a penalty function is added to the error function as per Eq. (33), where $\alpha_1$, $\alpha_2$ and $\beta$ are coefficients that control the influence of the penalty function, h is the number of hidden units, nf is the number of features selected in the first stage, and the two weight symbols denote, respectively, the weight connecting the l-th attribute to the m-th hidden unit and the weight connecting the m-th hidden unit to the network output. The NN model first evaluates the accuracy of the network N using the set of input features SX = {X_1, ..., X_nf}, SX ⊂ X, nf < ($NL_{EP}$ + $NL_{ED}$). Then, the number of features is sequentially reduced to form a new set of input features, and the accuracy of the resulting network N_k, where k = {1, 2, ..., nf}, is evaluated. The accuracy of the network is computed to determine the total number of features that can be eliminated. The steps of the applied feature selection are outlined as follows:
1. The input vector SX = {X_1, ..., X_nf}, SX ⊂ X, is divided into two data sets: the training set SX_tr and the testing set SX_ts. The network N is trained, and the accuracy of the trained network is calculated for both SX_tr and SX_ts.
In this algorithm, the values of $\alpha_1$, $\alpha_2$ and $\beta$ are set to $10^{-1}$, $10^{-4}$ and 0.03, respectively, as mentioned in [34]. Otherwise, the values will be divided by 1.1. This allows a significant input to have a higher connection magnitude after the network is retrained. It is important to note that an input feature with a weak connection magnitude will be eliminated in this algorithm.
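The role of the penalty term can be sketched as follows. The exact form of Eq. (33) is not reproduced in this section, so the sketch assumes a weight-elimination style penalty that is common in neural network feature selectors: a saturating term scaled by the first coefficient plus a quadratic weight-decay term scaled by the second. The function names `penalty` and `input_strength`, the assumed form and the example weights are hypothetical; only the coefficient values come from the text.

```python
import numpy as np

def penalty(W, v, alpha1=1e-1, alpha2=1e-4, beta=0.03):
    """Penalty added to the network error (assumed form, see lead-in).

    W: input-to-hidden weights, shape (nf, h)
    v: hidden-to-output weights, shape (h,)
    """
    W, v = np.asarray(W, float), np.asarray(v, float)
    saturating = (np.sum(beta * W ** 2 / (1.0 + beta * W ** 2))
                  + np.sum(beta * v ** 2 / (1.0 + beta * v ** 2)))
    quadratic = np.sum(W ** 2) + np.sum(v ** 2)
    return float(alpha1 * saturating + alpha2 * quadratic)

def input_strength(W):
    """Largest absolute input-to-hidden weight per feature; a small value
    marks a weakly connected input that is a candidate for elimination."""
    return np.max(np.abs(np.asarray(W, float)), axis=1)

# Hypothetical trained weights: feature 0 is strongly connected, feature 1 weakly
W = np.array([[1.0, -2.0],
              [0.01, 0.02]])
v = np.array([0.5, -0.5])
strength = input_strength(W)
weakest = int(np.argmin(strength))
total_penalty = penalty(W, v)
```

Under such a penalty, training drives the weights of uninformative inputs toward zero, so ranking the inputs by connection magnitude after training indicates which feature to remove first.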
Feature selection is of the utmost importance for selecting the important input variables. The contributions of the final predictor variables (HOED and HOEP in preceding hours) in the best NN model were evaluated, and these variables were identified after developing and testing several models with different combinations of input variables. A hybrid feature selection is applied in order to reduce the running time. In the first stage of the hybrid feature selection, a relevancy threshold of TH = 0.46 has been chosen for filtering out the less relevant features, after which 60 relevant features are selected. In the second stage, the NN is used to select the subsets of input variables that have a substantial impact on electricity price forecasting; it chooses the 31 most dissimilar and most relevant features among the previously selected 60 candidates, and these are used as inputs for the forecasting process. The implemented feature selection method is described in Fig. 5(a). The subsets of input variables chosen by (MI + NN) are as follows:

V. SIMULATION RESULTS AND DISCUSSION
In this paper, ANN-ACS is employed for electricity price forecasting (EPF) in the Ontario market, which is known as one of the most volatile electricity markets. The significant features determined by the developed feature selection (MI + NN) are used as inputs for the forecasting analysis in this section. Moreover, to evaluate the usefulness of ANN-ACS for short-term EPF, the proposed method is compared with well-known AI techniques, namely ANN-PSO, ANN-CSA, ANN, SVR-ACS, SVR-PSO, SVR-CSA and SVR. Fig. 5(b) shows the methodology used for forecasting the short-term electricity price. The sequential steps used to obtain the AI-based models for short-term EP forecasting are the same for all models and are as follows:
Step 1: The efficacy of the applied methods is evaluated for EPF in different seasons to account for seasonal effects, considering one month in each season: winter (February), spring (May), summer (August) and autumn (November). The Ontario electricity market data (HOEP and HOED) for 2017 are selected, with the lagged HOEP and HOED values as the independent variables and HOEP as the dependent variable. The data of each month are divided into two subsets: the hourly data of the first three weeks are used for training in the design phase, and the data of the last week are used for the testing phase.
Step 2: Designing the training phase entails deriving the algorithms that connect the input variables to the output variables. Eq. (31) is used to normalize the input and output variables to speed up the learning process.
Step 3: To predict the electricity price (EP) precisely, metaheuristic methods are implemented to seek the optimal coefficients of ANN and SVM by minimizing the cost function, in which $EP(t)_{observed}$ and $EP(t)_{forecasted}$ are the actual and predicted electricity prices, respectively, and N represents the number of observations.
The coefficients of ANN and SVM models (w) are determined by Step 4: The purpose of designing a testing phase is to evaluate the model performance on the results of AI approaches applied on datasets having no function in building models. Various assessment criteria are applied to quantify the performance of the prediction models, such as root mean square error (RMSE), mean absolute percentage error (MAPE), and Thiel's inequality coefficient (U -statistic) as:

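$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(EP_{\mathrm{observed}}(t) - EP_{\mathrm{forecasted}}(t)\right)^{2}}$$
$$\mathrm{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{EP_{\mathrm{observed}}(t) - EP_{\mathrm{forecasted}}(t)}{EP_{\mathrm{observed}}(t)}\right|$$
$$U = \frac{\sqrt{\sum_{t=1}^{N}\left(EP_{\mathrm{observed}}(t) - EP_{\mathrm{forecasted}}(t)\right)^{2}}}{\sqrt{\sum_{t=1}^{N}\left(EP_{\mathrm{observed}}(t)\right)^{2}} + \sqrt{\sum_{t=1}^{N}\left(EP_{\mathrm{forecasted}}(t)\right)^{2}}}$$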
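The three criteria named in Step 4 can be computed as in the following minimal sketch. It assumes their standard textbook definitions; in particular, the U1 form of Theil's coefficient is assumed because it is bounded by 0 and 1. The function names and sample values are hypothetical and are not Ontario data.

```python
import math


def rmse(observed, forecasted):
    """Root mean square error over N paired observations."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecasted)) / n)


def mape(observed, forecasted):
    """Mean absolute percentage error in percent.

    Undefined when an observed value is zero, which can occur for
    electricity prices; such hours need separate handling.
    """
    n = len(observed)
    return 100.0 / n * sum(abs((o - f) / o) for o, f in zip(observed, forecasted))


def theil_u(observed, forecasted):
    """Theil's inequality coefficient (assumed U1 form), between 0 and 1."""
    num = math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecasted)))
    den = (math.sqrt(sum(o ** 2 for o in observed))
           + math.sqrt(sum(f ** 2 for f in forecasted)))
    return num / den


# Hypothetical prices, for illustration only.
observed = [20.0, 25.0, 30.0, 40.0]
forecasted = [22.0, 24.0, 27.0, 44.0]
print(rmse(observed, forecasted), mape(observed, forecasted),
      theil_u(observed, forecasted))
```

All three functions return 0 for a perfect forecast, so lower values indicate a more accurate model.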
The U-statistic always takes a value in the range [0, 1], where zero represents the highest forecasting precision and one indicates that the estimation is as inaccurate as a naive guess. Whether the models adequately describe a given data series is checked through the whiteness test, also known as the Durbin-Watson test [26], which is carried out as a confirmatory analysis. The main objective of this confirmatory analysis is to confirm that the estimated residuals (e(t)) are white, that is, uncorrelated with one another. The residual autocorrelation function (RACF) is used for this calculation and is defined by:
$$\mathrm{RACF}(k) = \frac{\sum_{t=k+1}^{N} e(t)\,e(t-k)}{\sum_{t=1}^{N}\left(e(t)\right)^{2}} \qquad (39)$$
where k is the lag. The magnitude of the RACF ranges between 0 and 1. If an RACF value differs substantially from zero, it falls outside the confidence interval, implying that the residuals are correlated (not white) and hinting that a crucial independent variable has been excluded from the investigated model.
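A minimal sketch of this whiteness check is given below. It assumes the usual sample autocorrelation of residuals with approximately zero mean, and it assumes the common approximate 95% band of ±1.96/√N as the confidence interval, since the level used in this study is not stated here. The function names and the default `max_lag` are hypothetical.

```python
import math


def racf(residuals, lag):
    """Residual autocorrelation at `lag`, normalized by the sum of squares."""
    n = len(residuals)
    num = sum(residuals[t] * residuals[t - lag] for t in range(lag, n))
    den = sum(e * e for e in residuals)
    return num / den


def is_white(residuals, max_lag=20, z=1.96):
    """True if every RACF value for lags 1..max_lag lies within +/- z/sqrt(N).

    z = 1.96 corresponds to an approximate 95% band (an assumption here).
    """
    bound = z / math.sqrt(len(residuals))
    return all(abs(racf(residuals, k)) <= bound for k in range(1, max_lag + 1))


# Strongly alternating residuals are clearly correlated, hence not white.
alternating = [1.0, -1.0] * 50
print(racf(alternating, 1), is_white(alternating, max_lag=5))
```

A model whose residuals fail this check leaves structure unexplained, which is the situation the text associates with a missing independent variable.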
Because the parameter settings of AI-based methods are highly problem dependent and there is no consensus regarding their optimal values, the control parameters of the simulated methods in this study are defined with reference to successful methodologies in the electricity price forecasting literature. All parameter settings of the applied methods and the ANN-ACS model are summarized in Table 3.
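As an illustration of the data preparation in Steps 1 and 2, the following minimal sketch splits one month of hourly values into design and testing subsets and then rescales them. The function names are hypothetical. Min-max scaling to [0, 1] is assumed only for illustration and is not taken from Eq. (31). For months longer than 28 days, leaving the days between the third week and the last week unused is likewise an assumption.

```python
HOURS_PER_DAY = 24


def split_month(hourly_values, train_days=21, test_days=7):
    """Return (design, testing) subsets of one month of hourly values.

    Assumption: the first `train_days` days form the design set and the
    last `test_days` days form the testing set; days in between are unused.
    """
    n_train = train_days * HOURS_PER_DAY
    n_test = test_days * HOURS_PER_DAY
    if len(hourly_values) < n_train + n_test:
        raise ValueError("month is too short for the requested split")
    return hourly_values[:n_train], hourly_values[-n_test:]


def min_max_bounds(values):
    """Scaling bounds, taken from the design set only."""
    return min(values), max(values)


def normalize(values, low, high):
    """Min-max scaling (assumed form); design-set values map into [0, 1]."""
    if high == low:
        raise ValueError("constant series cannot be min-max scaled")
    return [(v - low) / (high - low) for v in values]


# Hypothetical 28-day month of hourly values (placeholders, not HOEP data).
month = [float(h % 24) for h in range(28 * HOURS_PER_DAY)]
design, testing = split_month(month)
low, high = min_max_bounds(design)
design_scaled = normalize(design, low, high)
testing_scaled = normalize(testing, low, high)
print(len(design), len(testing))
```

Taking the bounds from the design set alone keeps information from the testing week out of the training phase; testing values may then fall slightly outside [0, 1].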
The performance of the machine learning methods for Ontario EPF in the winter season of 2017 is presented in Table 4. The calculated RACF values, which lie within the confidence range, validate the whiteness of the estimated residuals for all developed models. Moreover, all developed models are able to describe the given set of data sufficiently. Based on multi-criteria decisions using the mean rank of the methods for each indicator (absolute error, RMSE, U-statistic and MAPE), the forecasting accuracy of the methods is ranked as ANN-ACS > ANN-PSO = SVR-ACS > ANN-CSA > SVR-PSO > SVR-CSA > ANN > SVR. The comparison of the developed models with the existing similar models shows that the ANN-ACS approach performs considerably better. Table 4 also shows that the ANN-ACS based method is superior in terms of MAPE = 4.58%, U-statistic = 0.04, RMSE = 0.08 and absolute error = 25.99. Fig. 6 presents the performance of the ANN-ACS method in the winter season of 2017 for both the design (training) and testing phases.
Table 5 presents the performance of the applied machine learning techniques for EPF in the Ontario region in the spring season of 2017. According to the RACF values, the estimated residuals of all obtained models are white at the given confidence level. Based on multi-criteria decisions adopting the mean rank of the methods for each indicator (U-statistic, MAPE, RMSE and absolute error), the accuracy of the methods is ranked as ANN-ACS > ANN-PSO > ANN-CSA > SVR-ACS > SVR-CSA > SVR-PSO > SVR > ANN. Hence, the most efficient model is ANN-ACS, with U-statistic = 0.02, MAPE = 1.2%, RMSE = 0.03 and absolute error = 8.18. Fig. 7 illustrates the performance of ANN-ACS in the spring season for the same region.
The performance results of the different machine learning methods for EPF in the same area in summer 2017 are tabulated in Table 6. As in the preceding seasons, the RACF values confirm the whiteness of the estimated residuals at the given confidence level for all obtained models. The forecasting accuracy of these methods is ranked as ANN-ACS > SVR-ACS > ANN-CSA > ANN-PSO > SVR-CSA > SVR-PSO > SVR > ANN in terms of multi-criteria decisions, where the ranking is extracted from the mean rank of the methods for each indicator (U-statistic, MAPE, RMSE and absolute error). More precisely, the U-statistic, MAPE, RMSE and absolute error of ANN-ACS are 0.03, 2.62%, 0.07 and 19.54, respectively. ANN-ACS can therefore be regarded as more promising than any of the other methods applied to Ontario's short-term electricity price forecasting. The performance of ANN-ACS for EPF in summer 2017 is depicted in Fig. 8 for both the training (design) and testing phases in the Ontario region.
For further examination of the solution methodology, the ANN-ACS performance for EPF in the Ontario region in the autumn season of 2017 is compared with the other AI methods, as shown in Table 7. The RACF values show that the estimated residuals of all models are uncorrelated and that the obtained models sufficiently describe the given set of data. Based on Table 7, the forecasting accuracy of the applied methods is ranked as ANN-ACS > ANN-CSA > ANN-PSO > SVR-ACS > SVR-CSA > ANN > SVR-PSO > SVR. This ranking is obtained from multi-criteria decisions using the mean rank of the methods for each indicator (absolute error, RMSE, U-statistic and MAPE) over the whole set. According to this comparison of the studied methods for the Ontario electricity market in the autumn season, the ANN-ACS approach performs much better than the other methods in electricity price forecasting, since it has higher precision.
The ANN-ACS values of the U-statistic (0.04), MAPE (3.79%), RMSE (0.08) and absolute error (29.73) are presented in Table 7. Fig. 9 depicts the ANN-ACS performance in forecasting the electricity price in the autumn season of 2017 during the training and testing phases for the Ontario region. From the demand-side management perspective, negative prices can be controlled in all seasons by providing incentives for using electricity at particular times and curtailing consumption accordingly when demand is low, in order to stabilize the frequency and voltage of the grid.
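The mean-rank ordering applied in each seasonal comparison can be sketched as follows. This is a minimal illustration under stated assumptions: each indicator is treated as an error to be minimized, ties within an indicator share the average rank, and the final order follows the mean of the per-indicator ranks. The exact procedure of the study is not given here, and the method names and values below are hypothetical placeholders, not results from Tables 4 to 7.

```python
def rank_ascending(scores):
    """Rank methods on one indicator: rank 1 is the lowest error.

    Tied methods share the average of the positions they occupy.
    """
    ordered = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for name in ordered[i:j + 1]:
            ranks[name] = average
        i = j + 1
    return ranks


def mean_rank(indicators):
    """indicators maps indicator name -> {method name -> error value}."""
    per_indicator = [rank_ascending(scores) for scores in indicators.values()]
    methods = per_indicator[0].keys()
    return {m: sum(r[m] for r in per_indicator) / len(per_indicator)
            for m in methods}


# Hypothetical methods and error values, for illustration only.
indicators = {
    "RMSE": {"A": 0.10, "B": 0.20, "C": 0.30},
    "MAPE": {"A": 2.0, "B": 1.0, "C": 3.0},
}
result = mean_rank(indicators)
print(sorted(result, key=result.get), result)
```

Two methods with equal mean rank, as in this example, would be reported with an equals sign in the ranking.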
The validity of the mathematical models developed by ANN-ACS is verified through the application of different statistical methods as external validation. To evaluate the performance of the developed model, the following criteria are recommended [36]-[39]:
I. If a model generates |R| > 0.8, a strong correlation exists between the observed and predicted values.
II. If a model generates 0.2 < |R| < 0.8, a correlation exists between the observed and predicted values.
III. If a model generates |R| < 0.2, a weak correlation exists between the observed and predicted values.
The statistical factors of the studied model are also computed for the different months for the Ontario mainland.
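The three criteria can be applied as in the minimal sketch below. It assumes that R is the Pearson correlation coefficient between observed and predicted values. Because the criteria leave |R| equal to exactly 0.2 or 0.8 unassigned, the sketch places both boundary values in the middle class, which is an assumption. The function names and sample values are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)


def correlation_class(r):
    """Classify |R| according to criteria I to III."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "strong correlation"
    if magnitude < 0.2:
        return "weak correlation"
    return "correlation"  # 0.2 <= |R| <= 0.8 (boundaries assumed here)


# Hypothetical observed and predicted values, for illustration only.
observed = [20.0, 25.0, 30.0, 40.0]
predicted = [22.0, 24.0, 27.0, 44.0]
r = pearson_r(observed, predicted)
print(r, correlation_class(r))
```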
The developed model fulfills all the statistical requirements, as tabulated in Table 8. The findings indicate that the developed model is a promising approach for forecasting the future electricity price in a deregulated electricity market.

VI. CONCLUSION
In this research, a hybrid ANN-ACS method has been developed for day-ahead electricity price forecasting and verified on the Ontario electricity market using hourly electricity demand and hourly electricity price data.
• Feature selection is a fundamental element of machine learning algorithms. High predictive accuracy can be achieved through proper feature selection, which entails selecting the relevant attributes of the data from a vast pool of irrelevant and redundant features. In this work, a hybrid feature selection technique combining ANN and mutual information has been proposed to select the optimum subset of features from a pool of 60 features, to be employed as input for the direct prediction method. The robustness of the proposed technique is evident in its efficient selection of the most suitable features through the removal of irrelevant and redundant attributes.
• The results have shown the robustness of the developed ANN-ACS model in the Ontario electricity market. For electricity price forecasting, it provides higher forecasting precision and simplicity than the other AI methods, with MAPE = 4.58%, 1.2%, 2.62% and 3.79% in winter, spring, summer and autumn, respectively.
• In developing a sustainable smart grid in the coming years, the proposed approach is of clear importance to EP forecasting. The presented approach can therefore serve as a valuable tool for electricity market participants in developing bidding and energy strategies, as well as for EP forecasting researchers.