Locational Marginal Price Forecasting Using Deep Learning Network Optimized by Mapping-Based Genetic Algorithm



I. INTRODUCTION
Electricity pricing has been a crucial indicator of all transactions in the power market since the reformation of the power industry [1], [2]. When prices are high, sellers should sell electricity to the pool market, and buyers should use their own generating facilities. Sellers/buyers can change their bidding schemes to maximize their benefit and protect themselves from financial risk. These sellers (producers) and buyers (consumers) depend greatly on electricity price forecasting for planning and managing price risks. Many factors must be considered in forecasting the price of electricity in the power markets. These factors include demand, supply, weather, and other variables that are related to the fuel markets [3], [4]. The volatility of electricity prices and the large errors of forecasting techniques in other markets should also be considered [5]. In the PJM power market, the locational marginal price (LMP) is used to price bought and sold energy. LMP prices congestion costs into energy transmission within a regional transmission organization (RTO) and accounts for bulk power system losses. Authorized by the federal government, PJM is responsible for electricity transmission systems and the operations of the wholesale electricity market in a particular area. Commonly, LMP aggregates system energy prices, transmission congestion costs, and the cost of marginal losses. Over the last two decades, recommended price forecasting methods have had varying rates of success; they include statistical methods and artificial intelligence methods.
(The associate editor coordinating the review of this manuscript and approving it for publication was Pierluigi Siano. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Statistical methods include autoregressive integrated moving averages (ARIMA) [6]-[8], the similar days method [3], generalized autoregressive conditional heteroskedasticity (GARCH) [8], [9], and other autoregression (AR) methods. These methods are linear predictors, which are only accurate when data vary slowly. The effectiveness of these methods is uncertain because of their limited ability to capture the nonlinear variation of electricity prices and rapid variations in price signals [10]. Artificial intelligence (AI) methods include artificial neural networks (ANN) [11], deep learning networks [12]-[15], evolutionary computing methods, and hybrid models or combinations of at least two methods [14], [16]-[19]. AI methods can approximate any multivariate function to an expected degree of accuracy by adjusting weightings during updates, and they can be used to extract the complex features of electricity prices. Hence, these AI methods have higher forecasting accuracy than statistical methods. Different evolutionary computing methods, such as the genetic algorithm (GA) [20]-[22] and particle swarm optimization (PSO) [4], [19], [22], [23], have been used in conjunction with other algorithms to forecast electricity prices. These combinations represent excellent means of price forecasting because they combine a linear autocorrelation structure with a nonlinear component.
CNN and GA have already been combined in recent studies [24], [25], but those studies focused on using CNNs for image classification, with evolutionary algorithms optimizing their parameters or generating the optimal CNN network. Other combinations, such as a combinatorial neural network trained by a stochastic search method [26], were presented to study datasets from PJM and Spain. The results were compared with those obtained using other methods and proved favorable.
Many researchers have attempted to determine the optimal solution to a particular problem by translating it mathematically into a fitness function of specific parameters. Such mathematical methods have become foundational for the various optimizations that are of interest today. Deep learning is now used to solve difficult problems, yielding results that are similar or superior to those obtained by human experts. However, setting the parameters of a deep learning network can be difficult because their values control the learning process and determine the network's performance.
Grid search [27]- [29] and random search [28], [30], [31] are traditional techniques of hyperparameter optimization. Both techniques are directed by performance metrics that are evaluated by cross-validation on the training and validation sets. Bayesian optimization [15], [17], [32]- [34] is used to optimize parameters by creating a probabilistic model of the functional mapping from the values of the parameters to the objective that is evaluated on a validation set.
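As a concrete illustration of how such searches proceed, the following sketch samples configurations from a toy hyperparameter space and keeps the one with the best validation score. The space and the stand-in scoring function are invented for illustration and are not taken from this work.

```python
import random

# Illustrative random search over a toy hyperparameter space.
# The space and validation_score are stand-ins, not from the paper.
SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "dropout": [0.1, 0.3, 0.5],
}

def validation_score(cfg):
    # Stand-in for a cross-validated error metric; lower is better.
    return abs(cfg["learning_rate"] - 1e-3) + cfg["dropout"]

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        # Sample one value per hyperparameter, then score the configuration.
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        score = validation_score(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(50)
```

Grid search would instead enumerate all 27 combinations of this space; random search trades exhaustiveness for a fixed evaluation budget.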
Classical optimization techniques have difficulty in attaining the global optimum owing to their inability to handle nonlinearities and non-convexity. On the other hand, evolutionary algorithms (EA) are members of the family of population-based algorithms, which were developed to find quasi-optimal solutions in any complex search space. Particle swarm optimization (PSO) [4], [19], [22], [23] and evolution strategies [24], [35] were developed to solve optimization problems with continuous variables and no constraints. Although variants of PSO (such as discrete PSO) can deal with binary or discrete variables, they still have difficulty handling inequality constraints (e.g., maximum and minimum limits of a variable). Consequently, an enhanced GA based on mapping encoding is proposed herein, because GA can deal with binary variables and inequality constraints efficiently.
A novel hybrid electricity price forecasting method [20] was applied to historical data from the New England area. A set of relevance vector machines (RVM) was used for individual ahead-of-time price forecasting. The individual predictions are combined into a linear regression ensemble, whose coefficients are obtained as the solution to a single optimization problem. The solution was found using a micro-genetic algorithm, yielding the optimized ensemble that provided the final price forecast. Another study used GA [21] to optimize the parameters of a support vector machine (SVM) model, which was used to forecast prices in large power systems using data from the National Electricity Market (NEM) of Australia. A further study [36] used a novel model-based demand response control method to control residential air conditioners optimally in response to changing day-ahead electricity prices. GA was used to find the optimal indoor air temperature, formulated as a nonlinear programming problem. Simulation results showed a reduction in electricity costs and peak power demands during demand response hours. This paper concerns multivariate time series forecasting, specifically using LMP spatiotemporal data series. The aim is to forecast the 24h-ahead LMPs at a target location using other related time series by designing a CNN architecture that is optimized by an evolutionary algorithm. Initially, a gene mapping scheme is designed to represent the CNN structure and connectivity. The advantage of this scheme is its flexibility in creating network structures of various lengths. The main contributions of this paper are as follows.
• Implementation of LMP and demand time-series as ''2D images'' applied to CNN
• Development of mapping-encoding chromosomes in GA to reduce the length of a bit string
• Optimization of hyperparameters and structural parameters of the CNN using the proposed mapping-encoding-based GA without trial-and-error
The rest of the paper is organized as follows. Section II reviews CNNs and various optimization techniques.
Section III outlines the proposed price forecasting method. Section IV presents the results of the experiments conducted on two datasets, and Section V concludes.

A. PJM MARKET
Like other products, electricity is bought, sold and traded in wholesale and retail markets. The wholesale market involves the purchasing and selling of power between generators and resellers. Resellers include electric utility enterprises, power providers, and electricity vendors. The Federal Energy Regulatory Commission (FERC) controls the operations and trades of the wholesale market in most regions of the United States. The wholesale market begins with the generators, which connect to the grid and generate electricity after they have obtained the necessary permits. The electricity produced by generators is purchased by entities that resell it to satisfy end-user demand. These resale entities purchase electricity through markets or through contracts with individual sellers. In many cases, utility companies own generation facilities and sell directly to customers. The price of wholesale electricity is agreed between buyer and seller in the wholesale market. The PJM power market, shown in Fig. 1, is one of the most successful power markets in the world. As a regional transmission organization (RTO), PJM manages a wholesale electricity market that spans all or part of Delaware, Illinois, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia and the District of Columbia. Figure 1 shows the map of the 21 zones that comprise the PJM interconnection [37], which are American Transmission Systems, Inc. (ATSI), Atlantic City Electric Company, Baltimore Gas and Electric Company, ComEd (CE), Dayton Power and Light Company (DAY), Duke Energy Ohio and

B. DATASET
The proposed method uses two datasets, which were obtained from the PJM website [38]. The dimension of the first dataset is 22 × 2184, corresponding to the 21 zonal prices of the zones that make up the PJM interconnection plus the LMP of the target location, ''Athenia''. The value of 2184 is the number of hours in a season; the total number of hours from December 1, 2017 to November 30, 2018, is 8736. This dataset was used to construct 2-dimensional spatiotemporal forms as inputs to the 2D CNN, creating a multivariate time-series that provides multiple variables across the zones and the target location at each time; multiple channels per time-series input are thus obtained. Figure 2 displays the correlation heatmap of the target location (Athenia) and the 21 zones of the PJM power market. The LMPs of the 21 zones were considered in Dataset 1 because the target location has strong correlation scores, within [0.84, 1), for all PJM zones. In addition to the aforementioned 22 LMPs, Dataset 2 also includes demands. Demands for only eight zones (AEP, ATSI, APS, CE, DAY, DEOK, DOM, and EKPC) are available on the PJM website.
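The construction of 2D spatiotemporal inputs from such a dataset can be sketched as follows. The sliding-window length, the 24-hour target horizon, and the array shapes are illustrative assumptions, not the paper's exact preprocessing; in practice a trailing channel axis would be added before feeding the windows to a 2D CNN.

```python
# Illustrative sketch: slicing a (n_series x n_hours) multivariate series
# into 2D "image" windows for a CNN. Window and horizon are assumptions.
def make_windows(series, window=24, horizon=24, target_row=0):
    """series: list of rows, each a list of hourly values."""
    X, y = [], []
    n_hours = len(series[0])
    for t in range(n_hours - window - horizon + 1):
        # One 2D input "image": all series over `window` consecutive hours.
        X.append([row[t:t + window] for row in series])
        # Target: the next `horizon` hours of the target location's LMP.
        y.append(series[target_row][t + window:t + window + horizon])
    return X, y

# Toy stand-in for the 22 x 2184 seasonal dataset, shortened to 100 hours.
demo = [[float(r * 100 + c) for c in range(100)] for r in range(22)]
X, y = make_windows(demo)
# 53 samples, each a 22 x 24 "image" paired with a 24-hour target vector
```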
Feature scaling was applied to both datasets to yield values between zero and one. Let x, x_max, x_min and x_new be the datum, maximum, minimum and scaled datum, respectively. The formula for the feature scaling, using min-max normalization, is shown in (1) [39] as follows.

x_new = (x - x_min) / (x_max - x_min)    (1)
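A minimal sketch of the min-max normalization in (1):

```python
# Min-max normalization as in (1): maps [x_min, x_max] onto [0, 1].
def min_max_scale(x, x_min, x_max):
    return (x - x_min) / (x_max - x_min)

scaled = [min_max_scale(v, 10.0, 60.0) for v in (10.0, 35.0, 60.0)]
# scaled == [0.0, 0.5, 1.0]
```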
The PJM power market is very well-developed. Thus, there are neither missing data nor outliers in the studied datasets. In case that missing data or abnormal values occur in the datasets, a general ''data cleansing'' process can be conducted as follows: (i) interpolation, (ii) prior experience (persistence; duplication of prior data), or (iii) usage of the average value from neighboring locations (zones).

C. CONVOLUTIONAL NEURAL NETWORK
Recurrent neural networks (RNNs) [13], particularly those with long short-term memory (LSTM) units [40], [41], are the preferred single time-series forecasters. Their effectiveness derives from recurrent connections, which permit the network to access all historical values of the time series. However, a CNN can have multiple convolution layers in which filters are applied by skipping elements in the input, allowing the receptive field to grow exponentially with depth. This allows the network to access a wide range of historical data, like an RNN. Because of weight sharing in its convolutional structure, a CNN has few trainable weights, making it more efficient in training and forecasting than an RNN.
CNN is a biologically inspired means of deep learning, which has shown promise in solving classification problems, such as image recognition, segmentation, object detection, and time-series classification and prediction. A CNN comprises a sequence of convolutional layers whose outputs are connected to a local region of the input by sliding a filter along the input and computing the dot product of the input and the filter at each point. The convolutions replace the weighted sums of the neural network. This structure enables the model to identify specific patterns in the input data. Accordingly, a CNN discovers filters that define repeating patterns in the series and uses them to forecast future values. The layered structure of the CNN works efficiently on noisy series by removing the noise in each succeeding layer and extracting only the meaningful patterns. A feature map is generated when the input is convolved with the filter in each layer. Unlike typical neural networks, the values in the output feature map share the same weights, so all of the nodes in the output detect the same pattern. This reduces the number of learnable parameters and increases the efficiency of training and learning in every layer.
Although the literature presents many versions of CNN, all use similar algorithms. The building blocks of a CNN are the convolutional layer, the pooling layer, and the fully connected layer. Figure 3 presents an example of CNN architecture. Each convolutional layer comprises a rectangular grid of neurons and takes a rectangular grid (the input or the previous layer) as its input, in which all of the weights over each rectangular section are shared among neurons. The convolutional layer generates an image convolution of the preceding layer, in which the weights denote the convolution filter. The pooling layer receives small rectangular blocks from the convolutional layer and subsamples them, yielding a single output per block. Pooling takes the average or maximum of the neurons in the block. A fully connected layer (the forecasting layer herein) receives all neurons in the preceding layer (which could be a pooling, convolutional or fully connected layer) and connects them to each of its neurons. Fully connected layers are one-dimensional, whereas convolutional layers and pooling layers are generally two-dimensional.
The convolutional layer performs kernel convolution, in which a small matrix of numbers, known as a kernel or filter, is passed over the image or the rectangular grid to convert it based on the values of the filter. This filter outputs new matrices called feature maps. The values of the feature maps are calculated using (2) [42] as follows.

G[m, n] = (f * h)[m, n] = Σ_j Σ_k h[j, k] f[m - j, n - k]    (2)
where f is the input and h is the kernel, and m and n are the indices of the rows and columns of the result matrix. In this work, these hyperparameters (kernel size, number of kernels, pooling size and dropout) and structural parameters (number of layers) of CNN are encoded to the GA to characterize CNN.
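The feature-map computation can be illustrated directly. As in most deep learning frameworks, the sketch below computes the cross-correlation form of the convolution with a "valid" (no padding) output; the input and kernel values are invented for illustration.

```python
# 'Valid' 2D convolution (cross-correlation form, as in DL frameworks):
# each output value is the sum of elementwise products of the kernel h
# with the patch of the input f under it.
def conv2d_valid(f, h):
    kh, kw = len(h), len(h[0])
    out_h = len(f) - kh + 1
    out_w = len(f[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for m in range(out_h):
        for n in range(out_w):
            out[m][n] = sum(f[m + j][n + k] * h[j][k]
                            for j in range(kh) for k in range(kw))
    return out

f = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = [[1.0, 0.0], [0.0, -1.0]]   # simple difference kernel
fmap = conv2d_valid(f, h)
# fmap == [[-4.0, -4.0], [-4.0, -4.0]]
```

A 3 × 3 input convolved with a 2 × 2 kernel yields a 2 × 2 feature map, illustrating how the output size shrinks with the kernel size when no padding is applied.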

III. PROPOSED METHOD
A. IMPLEMENTATION OF GA
Tuning the parameters and hyperparameters of a CNN is laborious if a trial-and-error approach is used. Such tuning was performed in our early experiments, as in other existing papers. However, this brute-force search is time-consuming and does not guarantee the optimum, which motivated the authors to apply GA to find the optimal structure of the CNN.
GA is an optimization technique for solving complex problems by iteratively considering various candidate solutions. In this work, GA is used to find the optimal parameters of the CNN, which is used as a standalone forecaster for 24h-ahead spatiotemporal data. The goal is to demonstrate the efficiency with which GA searches the solution space using the following phases: (a) gene mapping and population initialization, (b) genetic operation, and (c) chromosome evaluation.
Beginning with the initialization of the population size and the number of generations, GA applies a series of evolutionary operators until it obtains the best CNN architecture for LMP forecasting. First, a population of predefined size is randomly initialized using gene encoding, or gene mapping, as will be discussed in Sec. III.B. Throughout evolution, the fitness of each individual, each of which generates a specific CNN architecture, is evaluated on the given dataset.
In the subsequent process, parent individuals are selected according to fitness and generate new offspring through the application of genetic operators. The selection operator selects from the current population the individuals that survive into the next generation; here, the current population comprises the generated offspring and the parent population. The evolution continues until the predefined maximum number of generations is reached.

B. MAPPING-BASED ENCODING
At the beginning of the GA process, an initial population of chromosomes (binary strings; individuals) is encoded in the solution search space. Gene mapping enables the algorithm to locate a binary string in a genome pool. The gene mapping corresponds to the considered parameters of the CNN, and each mapped solution is assessed in terms of fitness. Initially, a chromosome is set to a fixed length of 22 bits (the genotype), as shown in Table 1; its corresponding values are the phenotypes and are measurable. The hyperparameters/parameters to be optimized are the number of convolutions (bits 1 to 3 from the left), the number of filters for layer 1 (bits 4 to 7), the kernel size (bits 8 and 9), the pool size (bits 10 and 11), the number of filters for layer 2 (bits 12 to 15), the number of filters for layer 3 (bits 16 to 19) and the dropout ratio (bits 20 to 22). Each combination of binary bits represents a feasible solution to the problem. The number of convolutions can range from zero to seven, but in the manual experiments without GA, the maximum number of convolutions that yielded satisfying results was only three. The experiments revealed that if GA used more than three convolutions, then a resource-exhausted error occurred, and the updated parameters could not converge within the maximum number of iterations. Accordingly, a return function was set whenever the GA selected zero, four, five, six or seven convolutions, to prevent errors in the program. However, for the filter numbers (with an interval of 32), kernel size, pool size, and dropout ratio, the function returns the equivalent values in the gene mapping.
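The mapping from a 22-bit chromosome to CNN hyperparameters can be sketched as below. The bit layout follows the description above, but the concrete value tables (filter counts in steps of 32, the kernel/pool/dropout values) are illustrative assumptions, not the paper's exact gene map from Table 1.

```python
# Hedged sketch of decoding the 22-bit chromosome into CNN hyperparameters.
# Bit layout follows the text; the value mappings are assumptions.
def decode(bits):
    assert len(bits) == 22
    to_int = lambda s: int(s, 2)
    n_conv = to_int(bits[0:3])                  # bits 1-3: no. of convolutions
    filters1 = 32 * (to_int(bits[3:7]) + 1)     # bits 4-7: filters, step of 32
    kernel = to_int(bits[7:9]) + 2              # bits 8-9: kernel size
    pool = to_int(bits[9:11]) + 2               # bits 10-11: pool size
    filters2 = 32 * (to_int(bits[11:15]) + 1)   # bits 12-15: layer-2 filters
    filters3 = 32 * (to_int(bits[15:19]) + 1)   # bits 16-19: layer-3 filters
    dropout = 0.1 * (to_int(bits[19:22]) + 1)   # bits 20-22: dropout ratio
    if not 1 <= n_conv <= 3:
        # Mirrors the "return function" guard for infeasible convolution counts.
        return None
    return dict(n_conv=n_conv, filters1=filters1, kernel=kernel, pool=pool,
                filters2=filters2, filters3=filters3, dropout=dropout)

cfg = decode("010" "0011" "01" "01" "0011" "0001" "010")
```

Decoding the example string yields two convolution layers with 128 filters in layer 1, a 3 × 3 kernel, and a 3 × 3 pool under these assumed value tables; an all-zero prefix (zero convolutions) is rejected as infeasible.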

C. GENETIC OPERATIONS
The genetic operators that are used in this work are similar to those presented in many other works. They include selection, mutation, and crossover, which are used iteratively until the convergence criteria are satisfied.
The deap.tools module, which is available online [43], [44], is used to execute the selection operation, which performs tournament selection and returns a list containing references to the selected individuals. This operator selects the best individual among the randomly drawn participants of each tournament. The parameters include the number of individuals to be chosen, which is denoted by k and is set to 1 in this study; a list of individuals to select from; the number of individuals that participate in each tournament; and the attribute of the individuals used as the selection criterion.
The crossover operation performs a two-point crossover on individuals in the input sequence. The segments between the two crossover points are exchanged between the individuals, and their original lengths are retained. The parameters specify the two participating individuals, and a tuple of both individuals is returned. The mutation operator applies a Gaussian mutation with mean (mu) and standard deviation (sigma) to the input individual. The parameters mu = 0, sigma = 1, and indpb = 0.2 (independent probability rate) are used herein [43], [44]. These parameters comprise the individual subject to mutation, the mean or sequence of means of the Gaussian mutation, the standard deviation or sequence of standard deviations, and the independent probability that each attribute is mutated. This operation returns a tuple of one individual.
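A pure-Python sketch of two of these operators (tournament selection and two-point crossover) is given below for illustration; the actual implementation uses deap.tools.selTournament, deap.tools.cxTwoPoint, and deap.tools.mutGaussian rather than this code.

```python
import random

# Illustrative re-implementation of the DEAP-style operators named above.
def tournament_select(population, fitness, k, tournsize, rng):
    chosen = []
    for _ in range(k):
        # Draw tournsize random aspirants; keep the fittest (lowest RMSE).
        aspirants = [rng.choice(population) for _ in range(tournsize)]
        chosen.append(min(aspirants, key=fitness))
    return chosen

def two_point_crossover(ind1, ind2, rng):
    # Pick two cut points and exchange the middle segments.
    a, b = sorted(rng.sample(range(1, len(ind1)), 2))
    child1 = ind1[:a] + ind2[a:b] + ind1[b:]
    child2 = ind2[:a] + ind1[a:b] + ind2[b:]
    return child1, child2

rng = random.Random(1)
c1, c2 = two_point_crossover("0000000000", "1111111111", rng)
# lengths are preserved and the middle segments are exchanged
```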

D. OBJECTIVE FUNCTION AND ACCURACY METRICS
The purpose of GA is to find the optimal hyperparameters/parameters of the CNN for LMP forecasting. Accordingly, GA sets a fitness function for the selected individuals in its selection process. The objective function is the RMSE [4], [45], [46], which is obtained from the evaluation of the created CNN; the GA seeks its minimum value.

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (ŷ_i - y_i)² )    (3)

where ŷ is the forecasted value and y is the actual value. The accuracy and error rates are evaluated after the optimal CNN architecture is obtained using the following performance metrics: RMSE, R-squared (coefficient of determination) [2], [47], [48] and mean absolute percentage error (MAPE) [1], [4], [26]; the latter two are defined in (4) and (5).

R² = 1 - Σ_{i=1}^{N} (y_i - ŷ_i)² / Σ_{i=1}^{N} (y_i - ȳ)²    (4)

MAPE = (100/N) Σ_{i=1}^{N} |(y_i - ŷ_i) / y_i|    (5)

where ȳ denotes the mean of the actual values. R-squared quantifies the linear correlation between the measured and forecasted values and trendline reliability. Perfect correlation yields a value of unity; thus, an R-squared value closer to one indicates more accurate forecasting. The objective function is evaluated for each solution obtained by the GA via CNN training. All such solutions are subsequently ranked as the population evolves through operations such as selection, crossover, and mutation, to optimize the fitness function and yield the final optimal solution.
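The three metrics can be computed in plain Python as follows; the small example values are invented for illustration.

```python
import math

# RMSE, R-squared, and MAPE as used for forecast evaluation.
def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r_squared(y, yhat):
    mean_y = sum(y) / len(y)                      # mean of the actual values
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def mape(y, yhat):
    # Percentage error, averaged over all points.
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, yhat))

y, yhat = [20.0, 30.0, 50.0], [22.0, 27.0, 50.0]
# rmse ≈ 2.08, mape ≈ 6.67 %, r_squared ≈ 0.972
```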

E. IMPLEMENTATION OF THE PROPOSED METHOD
In this paper, TensorFlow was used to implement the CNN, importing modules from Keras, including Conv2D, AveragePooling2D, Dropout, Activation, Flatten, Dense, and ZeroPadding2D. The DEAP (Distributed Evolutionary Algorithms in Python) library was used to create the algorithm and to integrate GA and CNN; it facilitates the mutation, selection, and evaluation operations. The algorithmic steps are as follows. The algorithm starts by initializing values such as the number of generations (n), population size (p), hall of fame (hof), crossover rate (cxpb), and mutation rate (mutpb). The initialized population is saved in a variable (pop). The population is trained on the dataset, and the trained architectures are saved in another variable (pop_cnn). Using the validation set, pop_cnn is validated and the architecture with the lowest RMSE is selected; this lowest RMSE is saved in a variable (best_cnn). This process continues until the maximum iteration is reached. The inner loop of the algorithm comprises the mutation and selection processes of the genetic algorithm, in which the 2 CNN architectures with the lowest RMSE are selected from pop_cnn. The 2 selected architectures generate offspring using crossover, and the offspring are saved in a variable (pop_offspring). An architecture is also selected from pop_cnn and mutated, and the mutated architecture is saved in pop_offspring. The mutation and crossover processes continue and update the population. When the maximum iteration is reached, the best CNN architecture is selected and its accuracy and error rates are validated. Figure 4 shows the flowchart of the methodology of the conducted study. The research started with identifying the crucial problem in the power market.
Next, a thorough review of existing papers was performed concerning the problem and possible solutions. After the possible solution was identified, datasets were collected, and an algorithm was developed to avoid the demerits of existing methods and enhance the merits of the proposed method. The next phase was to develop the codes following the presented algorithm. These codes were tested for initial runs of the model. To verify the accuracy of the proposed method, training and evaluation were carried out using 2 datasets. The same datasets were also studied by other methods to compare the performance obtained by all methods. Finally, the proposed method was used to forecast the 24-hour day-ahead LMP for the target location.
The 4 proposed CNNs will be used for forecasting LMPs in different seasons. Because a CNN has the capability of learning, the CNN trained on old data can be retrained periodically (daily, weekly or monthly) using new data in the corresponding season of the next year. Updating the parameters and hyperparameters of the CNN still uses the RMSE in (3) as the objective function, regardless of the sign of the error. Every 24 LMPs and the demands in the specified zones and target location will be stored for subsequent updates of the CNN. The current parameters and hyperparameters of the CNN will serve as initial conditions for the next updates.
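The GA-CNN loop of Sec. III.E can be condensed into the following control-flow sketch. Here build_and_score_cnn is a dummy stand-in that counts ones in the chromosome instead of training a Keras CNN and returning its validation RMSE, and bit-flip mutation with a single offspring per generation replaces DEAP's Gaussian mutation; only the overall flow, not the real implementation, is shown.

```python
import random

# Condensed sketch of the GA loop; build_and_score_cnn is a dummy fitness.
def build_and_score_cnn(chrom):
    # Stand-in "validation RMSE": fraction of ones in the chromosome.
    return sum(int(b) for b in chrom) / len(chrom)

def evolve(n_gen=10, pop_size=20, n_bits=22, cxpb=0.9, mutpb=0.05, seed=0):
    rng = random.Random(seed)
    # Random initial population of bit strings (the chromosomes).
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    best = min(pop, key=build_and_score_cnn)        # hall of fame
    for _ in range(n_gen):
        # Select the two architectures with the lowest "RMSE" as parents.
        p1, p2 = sorted(pop, key=build_and_score_cnn)[:2]
        if rng.random() < cxpb:                     # single-point crossover
            cut = rng.randrange(1, n_bits)
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        # Bit-flip mutation stands in for DEAP's Gaussian mutation.
        child = "".join(b if rng.random() > mutpb else "10"[int(b)]
                        for b in p1)
        pop.append(child)
        # Keep the pop_size fittest individuals; track the overall best.
        pop = sorted(pop, key=build_and_score_cnn)[:pop_size]
        best = min(best, pop[0], key=build_and_score_cnn)
    return best

best = evolve()
```

With the dummy fitness, longer evolution can only improve (never worsen) the best score found, mirroring the hall-of-fame bookkeeping of the real algorithm.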

F. COST-BENEFIT ANALYSIS
There are fixed and variable costs involved in deploying the forecasting system in a real working environment. The fixed cost consists of the expenses for hardware: (1) a computer (e.g., an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz with 256 GB of installed RAM and an NVIDIA GV100GL (Tesla V100 DGXS 32GB) graphics card herein) and (2) the computer network infrastructure for data transmission. The fixed cost also includes the expenses for software: (1) computer code based on the proposed method and (2) the computer network system. The variable cost comprises the cost of maintaining and operating the above hardware/software as well as personnel salaries. The studied data are free of charge, as the PJM power market provides open access to its real-time data. The software may be developed in Python with the Keras library and TensorFlow as the backend. TensorFlow is an end-to-end open-source platform for machine learning with a comprehensive, flexible ecosystem of tools, libraries and community resources. Keras is a high-level neural network interface. Neither has licensing costs.
The forecasted LMP can serve as a crucial reference for buyers/sellers in the power market to develop their bidding strategies. Accordingly, the benefit of implementing a forecasting system is increased profit from selling electricity or decreased payment for purchasing it.
After the above future expected costs and benefits are estimated, they can be converted into present values using a discount rate, and the net present value can be computed to complete the cost-benefit analysis [49], [50].

IV. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed algorithm, a series of experiments on LMP forecasts was performed. Specifically, two datasets are used to demonstrate the effectiveness of the proposed CNN optimized by GA. These two datasets are similar. As described in Sec. II.B, the first dataset consists of 21 zonal prices and target (Athenia) LMP time series. The second dataset comprises the aforementioned 22 price time-series and eight demand time-series.
In all experiments, the following parameters were used: cxpb = 0.9 and mutpb = 0.05, where cxpb is the crossover rate and mutpb is the mutation rate [43], [44]. The number of binary bits is 22, as indicated in Table 1. The population size is set to 110, which is five times the number of binary bits, and the number of generations is 50. For the CNN, the parameters to be optimized, listed in Table 1, are as follows: (1) the number of convolutions, (2) the number of filters for layer 1, (3) the kernel size, (4) the pool size, (5) the number of filters for layer 2, (6) the number of filters for layer 3, and (7) the dropout ratio. Table 2 presents the initial parameters of the CNN that are used in finding the optimal parameters.

A. DATASET 1: LMP ONLY
To demonstrate the ability of GA to find the optimal parameters of the CNN for forecasting LMPs at Athenia, experiments with the first dataset, covering four seasons, were performed. The first dataset comprises a pure time-series of locational marginal prices across all seasons, including holidays. The target location ''Athenia'' is in the PSEG zone of the PJM interconnection. Using the class deap.tools.Logbook, the evolution process generates a list of statistics during the optimization, including the minimum, maximum, mean and standard deviation. Figure 5 presents the minimum fitness statistics for all four seasons, obtained from the optimization process using the first dataset. Notably, the minimum fitness in winter converged to 0.043 after 20 generations and remained there through the 50th generation. The minimum fitness in spring converged as early as the fourth generation to 0.083 and held until the final iteration, and that in summer converged by the fifth generation to 0.029. Finally, the minimum fitness in fall converged after the tenth generation to 0.017 and held until the last iteration.
The average or mean fitness values in all the seasons converged in early iterations, as seen in Fig. 6. In winter and spring, the mean fitness converged to 0.1 and 0.16, respectively, after the third generation; in summer, it converged by the fourth generation to 0.057, and in fall, it converged after the 2nd generation to 0.04. Figures 7-10 plot the LMP forecasts in all seasons using the first dataset; the blue lines represent the actual LMP, and the red dashed lines represent the predicted LMP. The fitness values converge smoothly in 50 iterations. Figures 13-16 plot predictions for all seasons using the second dataset; the accuracy is much better than that achieved with the first dataset. Figure 17 is the scatter plot indicating the relationship between the actual and predicted values, which is strong and positive. Dataset 2 leads to much lower MAPE and RMSE, and higher R-squared values, than Dataset 1. In other words, the predictions based on Dataset 2 are much better than those based on Dataset 1.
The optimal CNN architectures obtained using Datasets 1 and 2 vary with the seasons; they are presented in Table 4 (a) and (b). The RMSE and R-squared values of the optimal CNN architecture obtained with Dataset 2 are plotted over the prediction time horizons. The distribution of error shows that the RMSE values increase while the R-squared values decrease throughout the 24 hours. Therefore, the one-hour-ahead prediction is more accurate than the 24-hour-ahead prediction. Figure 20 presents an example of an optimal CNN network generated by the proposed method for the winter season, as described in Table 4 (b). The figure presents the output shape of each layer of the CNN, which has three convolutional layers and three pooling layers.

D. COMPARISON OF PROPOSED METHOD WITH OTHER METHODS
The proposed method was compared to other forecasting methods in the recent literature. The same datasets were used to test their accuracy and percentage errors in terms of R-squared, RMSE, and MAPE. The compared forecasting methods were long short-term memory (LSTM) [46], support vector machine (SVM) [51], [52], k-nearest neighbor (KNN) [53], Bayesian ridge regression (BR) [54], the decision tree method (DT) [55], multilayer perceptron (MLP) [56], [57] and ARIMA [6], [7]. To ensure fair comparisons, the datasets were prepared similarly to those used in the proposed method: they were scaled between 0 and 1 using the MinMaxScaler in the Scikit-Learn kit. Since the datasets are multivariate time series, they all had to be arranged and reshaped into 3D form as inputs to the different methods. The parameter settings described in each reference were followed to ensure the best configurations.
Specifically, the Adam-optimized LSTM network has a visible layer with one input, a hidden layer with 4 LSTM neurons, and an output layer that makes a single-value prediction [46]. To make it applicable to our dataset, the output layer was designed to have 24 outputs. Different optimizers, such as SGD and RMSProp, were tested in our experiments, but Adam yielded the best results. The network was trained for 100 epochs with a batch size of 1. The result yielded low MAPE and RMSE values and high R-squared values; however, the proposed method is still better.
For the SVM method, the ''linear'' kernel function was applied to determine the best model. The input of the SVM has a 3D shape, with C = 1 and γ = 0.0315, where C is the penalty parameter and the gamma value influences the support vectors to obtain better scores [51], [52]. The values of C and γ can easily be updated using Python's Scikit-Learn. The results of the SVM are poorer than those of the LSTM.
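A minimal sketch of this SVM configuration in Scikit-Learn is shown below, using synthetic flattened windows. Note that `SVR` predicts a single target, so wrapping it in `MultiOutputRegressor` for the 24-hour forecast is an assumption of this example; also, `gamma` has no effect with a linear kernel and is kept here only to mirror the reported settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
# Flatten each 24-step window of 3 features into one row (SVR needs 2D input).
X = rng.uniform(0, 1, size=(150, 24 * 3))
y = rng.uniform(0, 1, size=(150, 24))     # hypothetical 24-h-ahead targets

# C and gamma as reported in the text; gamma only matters for RBF-type kernels.
base = SVR(kernel="linear", C=1, gamma=0.0315)
model = MultiOutputRegressor(base).fit(X, y)  # one SVR per output hour
pred = model.predict(X[:2])
print(pred.shape)  # (2, 24)
```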
For the KNN method, only the value of ''k'' (the number of neighbors) needs to be varied; values of k from 1 to 25 were tested [53]. A grid search for the lowest MAPE was used to determine the best model. The results reveal that the KNN method performs worse than the SVM method.
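The grid search over k can be sketched with Scikit-Learn's `GridSearchCV`; the synthetic data and the use of the built-in `neg_mean_absolute_percentage_error` scorer (negated so that lower MAPE scores higher) are assumptions of this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(120, 24))
y = X.sum(axis=1) + rng.normal(0, 0.1, size=120)   # toy positive target

# Search k = 1..25, scoring by MAPE (neg_ scorer: higher is better).
grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": range(1, 26)},
    scoring="neg_mean_absolute_percentage_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_["n_neighbors"])
```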
For the BR method, the input needs to have a 3D shape and the output should be a 24-h LMP forecast. The Scikit-Learn linear model BayesianRidge was used to study the dataset. To obtain the best result, Bayesian optimization was implemented to tune the hyperparameters of the model and optimize the loss function by exploring the underlying distribution [54]. However, only the winter data obtained good results, while the other seasons yielded 100% MAPE values and negative R-squared values.
For the DT method, the model was trained with the maximum tree depth of the base learners set to 10 [55]. The learning rate, minimum loss reduction, L1 regularization parameter, L2 regularization parameter, and number of boosted trees were set to 0.09, 0.1, 1e-07, 7, and 800, respectively. To obtain the best parameters, the randomized grid search from Scikit-Learn was used. However, the results were worse than those of the KNN method.
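A minimal sketch of the BR setup follows. `BayesianRidge` is a single-output model, so wrapping it for a 24-hour forecast is an assumption of this example, as are the synthetic data; the Bayesian optimization of its hyperparameters mentioned above is omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(150, 24 * 3))   # flattened input windows
y = rng.uniform(0, 1, size=(150, 24))       # hypothetical 24-h LMP targets

# BayesianRidge predicts one value; wrap it to produce the 24-h forecast.
model = MultiOutputRegressor(BayesianRidge()).fit(X, y)
pred = model.predict(X[:1])
print(pred.shape)  # (1, 24)
```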
For the MLP method, the number of hidden layers was set to 2. The parameters in the literature [56] were followed: number of epochs = 500, learning rate = 0.001, number of neurons = 10, batch size = 100, and the ReLU activation function. The experimental results of the MLP are worse than those of the LSTM, SVM, and KNN.
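This MLP configuration maps onto Scikit-Learn's `MLPRegressor` roughly as follows; the synthetic data and the choice of this particular implementation (rather than a Keras model) are assumptions of the sketch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(200, 48))
y = rng.uniform(0, 1, size=(200, 24))   # hypothetical 24-h targets

# Two hidden layers of 10 ReLU neurons, lr = 0.001, 500 epochs, batch size 100.
mlp = MLPRegressor(
    hidden_layer_sizes=(10, 10),
    activation="relu",
    learning_rate_init=0.001,
    max_iter=500,
    batch_size=100,
    random_state=0,
)
mlp.fit(X, y)                 # MLPRegressor handles multi-output natively
print(mlp.predict(X[:2]).shape)  # (2, 24)
```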
Finally, the proposed method was compared with the ARIMA method, where the model order was set to (5, 1, 0), i.e., a lag value of 5 for the autoregression, a difference order of 1, and no moving-average term. The experiments in the reference were done using Matlab and R, applied to univariate time series. The LSTM results are better than those obtained from ARIMA. Figures 21-24 compare the various methods, including the proposed GA-CNN method, for both datasets 1 and 2. The actual LMP values are plotted against the values predicted by the different methods.
Tables 5 (a) to 5 (d) present the percentage errors and coefficients of determination (R²) obtained by the different methods for Dataset 1, while Tables 6 (a) to 6 (d) are for Dataset 2. According to the results shown in Tables 5 and 6, the proposed method outperformed the other methods, with the lowest RMSE and MAPE values and the highest R-squared values in all seasons. Moreover, the results using Dataset 2 are better than those using Dataset 1.

V. CONCLUSION
This work proposes an efficient method for 24-hour-ahead LMP forecasting using a CNN optimized by a novel mapping-based GA. The contributions and findings of this paper can be summarized as follows.
• The LMP and demand time series are preprocessed into 2D (spatiotemporal) data used as inputs to the CNN, thereby allowing the CNN to successfully capture the spatial and temporal dependencies of the datasets. Thus, additional complex digital signal processing techniques can be avoided while the 2D CNN can still be used.
• The traditional way of tuning the hyperparameters and structural parameters of a deep learning network is brute-force trial and error. The proposed method finds the optimal hyperparameters and structural parameters of the CNN through novel mapping-encoded chromosomes in the GA, which reduce the length of the bit-strings and avoid trial and error.
• The proposed method was extensively tested on two datasets from the real-world PJM electricity market in the United States. The simulation results show that hourly demands, in addition to historical LMP data, are crucial inputs to the 2D CNN for improving the accuracy of 24-hour-ahead LMP forecasting.
• The studied data were grouped into 4 subsets corresponding to different seasonal characteristics (different types of real-life situations). Experiments revealed that the proposed method outperforms the other forecasting methods. These results validate the proposed method. In future studies, additional information such as temperature and other factors that affect electricity prices will be considered to further test, validate, and improve the method.
At present, the PJM market provides hourly demands for only eight of its 21 zones. Dataset 2, which contains these eight hourly demand series, was found to improve the accuracy of LMP forecasting. Thus, the administrators of the power market should provide more information for buyers and sellers (participants) to develop their tools (such as LMP forecasters or bidding strategies).