Machine Learning Based Cost Effective Electricity Load Forecasting Model Using Correlated Meteorological Parameters

Electricity, a fundamental commodity, cannot be stored at large scales and must therefore be generated to match the required utilization. The production cost depends heavily on the source, such as hydroelectric power plants, petroleum products, nuclear energy, and wind energy. Besides the risks of overproduction and underproduction, electricity demand is driven by meteorological parameters and by economic and industrial activities. Therefore, accurate region-specific electric load forecasting can help to effectively manage, plan, and schedule appropriate low-cost electricity generation units, decreasing the per-unit cost and providing energy on time for maximum financial benefit. Machine learning (ML) offers different supervised learning algorithms, including multiple linear regression, support vector regressors with different kernels, k-nearest neighbors, Random Forest, and AdaBoost, to forecast time series data, but the performance of these algorithms is data dependent. It is vitally important to consider the correlated meteorological parameters of the specific region for accurate prediction of electricity load demand using ML based forecasting models, in order to minimize the price per unit. In this study, an algorithm is proposed to select the least cost electric load forecasting model (lcELFM) using correlated meteorological parameters. We developed least cost forecasting models by minimizing root mean squared error, mean absolute error, and mean absolute percentage error. For simulations, the recorded electricity demand data is taken from a substation of the water and power development authority in Muzaffarabad city from $1^{\mathrm {st}}$ January 2014 to $31^{\mathrm {st}}$ December 2015. The meteorological time series data are obtained from the substation of the Pakistan meteorological department for the same period and the same region. Empirical results demonstrate the robustness of the proposed algorithm in selecting the lcELFM. Moreover, the SVR (Radial) based electric load forecasting model proves to be the most robust model when built using correlated features (temperature and dew point) for the said region, and in turn can save up to PKR 0.313 million daily.


I. INTRODUCTION
Electricity is a major source of energy and is produced by electricity generating units. These units may use water, petroleum products (oil, natural gas, etc.), nuclear energy, or wind to generate electricity [1]. One of the major issues regarding electricity generation is its storage at large scales [2]. End users, being one of the main stakeholders, demand the minimum per unit cost. The per unit cost of electricity is mainly governed by two factors: the type of fuel used, and electricity production as per requirements for a specific region. Both overproduction and underproduction of electricity cause financial loss to electricity generating and distribution companies (EGDCs). Therefore, EGDCs try to use the least cost production units to produce electricity as per demand for a specific time period to maximize economic benefits [3]. Consequently, automated and accurate electricity load forecasting (AELF) plays a vital role in devising policies for adequate load management, scheduling the electricity generating units, predicting future trends, contract evaluation, tariff adjustments, and ensuring uninterrupted energy supply to the consumers [4], [5]. The more accurate an AELF is, the more financially beneficial it is for EGDCs. These factors motivate research on accurate electricity load forecasting, so that EGDCs avoid unnecessary power generation, prevent increases in per unit cost, and serve consumers with a low cost and uninterrupted power supply.

The associate editor coordinating the review of this manuscript and approving it for publication was Mauro Tucci.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
During the last few decades, various research studies have been conducted for AELF models suggesting that both under and over forecasts greatly affect profitability of EGDCs [6], [7].
Numerous factors such as air temperature, dew point, wind, relative humidity, holidays, working hours, and seasons are critical and predominantly affect electricity demand [8]. These may collectively be called the critical factors affecting electricity load demand. A comprehensive understanding of these factors is necessary for developing accurate load forecasting models. They cause deviations from typical load curves and hence make AELF a challenging task [9].
Irrespective of the approach (short, medium, or long term) being used, meteorological parameters are important in AELF. Temperature, relative humidity, visibility, cloud cover, rainfall, precipitation, dew point, wind speed, and wind chill are commonly used variables (either individually or in combination) in various techniques [10].
In [11], researchers reported 6% additional electricity consumption per annum due to global warming in Greece. The authors in [12] reported that the increase in temperature can increase peak load demand by up to 15% in Thailand by the end of this century. In [13], the authors showed the strong influence of relative humidity and wind speed on electricity consumption in eight states of the USA. In [14], different patterns of load demand during different seasons in New South Wales, Australia, are represented through regression models based on meteorological parameters. It is found that the per capita consumption demand can rise by up to 6.14% and 11.3% in the summer and spring seasons, respectively, by the end of this century. In [15], the authors showed the sensitivity of load consumption to weather change for different cities of Australia. Temperature may also have an inverse relationship with load consumption in some countries. For example, the authors in [16] show that in New Zealand an increase in temperature by 10 °C causes a decrease in average yearly load demand by 1.4%, and a decrease in temperature by 10 °C causes an increase in average yearly load demand by 1.6%. In [13], it was found that fluctuations in the meteorological parameters, especially the air temperature, strongly impact the residential cooling demand in the USA.
In short, the consumption of electricity by households and businesses for heating and cooling purposes is primarily driven by weather phenomena. A change in weather patterns causes a shift in the electricity load demand which, if not foreseen, may result in an imbalance between supply and demand [17]. This imbalance may also trigger an increase in the production cost of electricity due to the use of expensive production units (kerosene oil based units, for example) and the wastage of unutilized electricity due to overproduction.
Recognizing the significance of this issue, studies have been carried out to analyze the impact of weather on load demand in different regions [18], [19]. To the best of our knowledge, the region of Kashmir is completely missing from the existing literature in this dimension. In addition, the existing studies that compute an energy demand equation for Pakistan as a whole do not consider meteorological parameters as descriptive variables [20], [21]. The authors in [22] studied the relationship with other influencing variables, but did not consider the impact of meteorological parameters. Moreover, these studies used only monthly load demand data, which does not provide insight into the load-weather relationship on small time scales, although the problem requires a much more disaggregated analysis. A comprehensive investigation of this relationship is imperative because it has consequences for the capacity building of electricity generating units. Supply bottlenecks can be avoided by devising a strategic plan, which hinges upon accurate estimates of potential power demands. Temperature has been documented as the most influential variable on load demand in numerous research studies [8], [23]. However, setting temperature apart, the information for the region under consideration is completely inconclusive regarding the combination of weather variables that provides meaningful input to AELF, i.e. the variables with the highest correlation with load demand.
The present study has the broader objective of estimating the impact of climate variables on the selection of a suitable, least cost, and accurate electricity load forecasting model. For this purpose, an algorithm is proposed for choosing a robust least cost electric load forecasting model (lcELFM) using meteorological variables and electric load demand. As the first step of the proposed algorithm, correlation analysis is performed to determine the correlation between weather parameters and electric load. This study considers all important meteorological variables that may influence load consumption on an hourly, seasonal, and annual basis in the region under study (Muzaffarabad city). The meteorological and electricity load demand datasets are acquired from local stations of Muzaffarabad, Azad Kashmir. In the proposed algorithm, the projections of the fluctuations in electricity demand under different weather parameters, such as temperature, dew point, relative humidity, cloud cover, and rainfall, are provided on an hourly basis using correlation analysis. This helped in identifying only the weather variables relevant to load forecasting in Muzaffarabad. Parsimony can be achieved in the prediction models by considering only those weather variables which proved to be relevant to electricity load demand in the correlation analysis, whereas using extra exogenous variables adds complexity to the forecasting models with negligible or no improvement in the results. The literature shows that numerous studies used Pearson's correlation analysis to determine the relationship between electric load and meteorological parameters [24]- [30]. However, we used Pearson's correlation, Spearman correlation analysis (a popular non-linear rank based correlation analysis method), and the nlcor function of the R language to compute nonlinear correlation. The results of the Spearman correlation analysis and nlcor are in line with the results of the Pearson correlation analysis. All three correlation analyses show that there is a linear relationship between the meteorological parameters and the electric load variable for the dataset used in this study. The relevant variables are then embedded into forecasting models to determine the accuracy improvement using popular machine learning algorithms. As a next step, the economic significance of the developed forecasting models is computed. Finally, the proposed algorithm selects as lcELFM the forecasting model which has the minimum cost per unit of electricity. Empirical results show the usefulness of the proposed lcELFM algorithm for least cost electricity production.
The paper is organized as follows. The first section comprises the introduction and literature review of the study. Data description is given in the second section. The methodology of the study, including data preprocessing, is detailed in the third section. The proposed algorithm is presented in the fourth section. The simulation design used to investigate the usefulness of the proposed algorithm is discussed in section five. Results of the correlation analysis are presented in section six. Results of electric load forecasting and economic significance are discussed in sections seven and eight, respectively. Finally, the discussion and conclusion are detailed in the last section. In this manuscript, the terms electric load, electricity load, and load are used interchangeably and refer to the same entity, i.e. electricity load demand.

II. DATASETS
Numerous studies from the recent past used one to three years of time series data to build energy load forecasting models. For example, in [8], [31]- [34] the authors used datasets of eighteen months to two years duration for developing forecasting models. Data of one year was used in [6], [35]- [37], while three years of data was used in [38]. In this study, time series data of hourly electricity consumption of Muzaffarabad city in kilowatt-hours (kWh), covering the period from 1st January 2014 to 31st December 2015 (two years), is used. Electricity demand data was recorded by the water and power development authority (WAPDA), which is responsible for the production and supply of electricity across the country. The data was obtained from a local substation in Muzaffarabad city.
The meteorological time series data are obtained from a local meteorological substation of the Pakistan meteorological department (PMD) in Muzaffarabad. The meteorological data are collected for the same period as the electricity consumption data, i.e. 1st January 2014 to 31st December 2015, so that the relationship can be developed over the corresponding period.
The dataset includes hourly air temperature, dew point temperature, relative humidity, cloud cover, wind speed, and rainfall. Table 1 displays the descriptive statistics of electricity load and meteorological parameters. Between the years, the mean values of load demand, dew point, humidity, and cloud cover are higher in 2015 than in 2014, in contrast to temperature and wind speed, which are lower. Overall, a similar weather pattern can be observed between the years.

III. METHODOLOGY
This section discusses the data preprocessing step and the methods adopted to determine the correlation between electric load and meteorological parameters. Next, the machine learning algorithms used to build forecasting models in the present study are presented. Performance evaluation measures and the computation of the forecasting models' economic significance are detailed at the end of this section.

A. DATA PREPROCESSING
Pakistan has been struggling with a severe energy crisis for decades, and so has Muzaffarabad city. This has caused load shedding and sudden power blackouts, resulting in missing values in the load time series. These missing values are interpolated rather than discarded in order to preserve the real time behavior. The mean of nearby points method is used for this purpose according to the following example criterion:

$$x = \frac{x_1 + x_2 + x_3 + x_4}{4} \qquad (1)$$

where $x_1$ and $x_2$ are the two values preceding and $x_3$ and $x_4$ are the two values following the missing value ($x$). The pre-processing helped in removing irregularities and smoothing the load time series. Once the data is pre-processed, it is divided into different categories such as mean daily value, maximum daily value, hour to hour data of each variable, mean of each hour, and seasonal data.
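As an illustration of the mean of nearby points idea, the following minimal Python sketch fills each missing hourly value with the mean of the two observed values before it and the two after it. The function name `fill_missing_nearby_mean`, the use of `None` as the missing-value marker, the choice to skip over other gaps when looking for neighbours, and the sample readings are all assumptions made for this example and are not taken from the study.

```python
from typing import List, Optional


def fill_missing_nearby_mean(series: List[Optional[float]], window: int = 2) -> List[float]:
    """Fill each missing value (None) with the mean of up to `window`
    observed values before it and `window` observed values after it."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Neighbours are taken from the ORIGINAL series, so an imputed
        # value never feeds into a later imputation (an assumption here).
        before = [v for v in reversed(series[:i]) if v is not None][:window]
        after = [v for v in series[i + 1:] if v is not None][:window]
        neighbours = before + after
        if not neighbours:
            raise ValueError("series contains no observed values")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical hourly load readings with one blackout hour.
hourly_load = [410.0, 420.0, None, 440.0, 460.0]
print(fill_missing_nearby_mean(hourly_load))
```

Near the start or end of the series fewer than four neighbours exist; this sketch then averages whatever observed neighbours are available, which is one possible convention among several.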

B. WEATHER LOAD RELATIONSHIP
The effects of weather variables on load demand are determined by scatter plots and correlation analysis using a linear regression trend line. Statistically, the interdependence of two variables is given by the equation:

$$y = a + bx \qquad (2)$$

where $y$ is the dependent variable, $x$ denotes the independent variable, and $a$ and $b$ represent the regression coefficients.
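A linear trend line of this kind is commonly obtained by ordinary least squares. The short Python sketch below estimates an intercept `a` and slope `b` for a line y = a + bx; the function name `fit_trend_line` and the temperature and load values are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence, Tuple


def fit_trend_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) of the least squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical readings: load falls as temperature rises.
temperature = [5.0, 10.0, 15.0, 20.0]
load = [500.0, 450.0, 400.0, 350.0]
a, b = fit_trend_line(temperature, load)
print(a, b)
```

A negative slope `b` corresponds to the inverse load-temperature relationship discussed in the correlation results.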
In statistics, correlation analysis is used to measure the strength of the relationship between two variables. The most widespread and frequently employed method for this purpose is Pearson's correlation analysis. Numerous studies used Pearson's correlation analysis to determine the relationship between electric load and meteorological parameters [24]- [30]. In this study, Pearson's correlation, Spearman correlation analysis (a popular non-linear rank based correlation analysis method), and the nlcor function of the R language are used to investigate the relationship between the meteorological parameters and electric load demand. Higher correlation estimates from the nonlinear correlation analysis (values near 1) indicate that the variables involved are nonlinearly correlated, while lower values (near 0) indicate a linear relationship. The results of all three correlation analyses show that there is a linear relationship between the meteorological parameters and the electric load variable for the dataset used in this study. Pearson's correlation coefficient $r$ is an index that quantifies to what extent the given variables are linearly related. Its value can range from +1 to -1 and is given by the following equation:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad (3)$$

where
$n$ is the number of pairs,
$\sum xy$ is the sum of products of paired scores,
$\sum x$ is the sum of x scores,
$\sum y$ is the sum of y scores,
$\sum x^2$ is the sum of squared x scores,
$\sum y^2$ is the sum of squared y scores.

The strength of the relation is inferred from the value of $r$ according to the following criteria:
r = +1 shows linear and perfect positive correlation.
0.0 < r < 0.09 shows no correlation.
0.1 < r < 0.25 shows small linear correlation.
0.26 < r < 0.40 shows medium linear correlation.
0.41 < r < 1.0 shows strong linear correlation.
r = 0 shows no correlation exists between two variables.
−1 < r < 0 shows negative linear correlation.
r = −1 shows linear and perfect negative correlation.
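For readers who want to reproduce the correlation step outside R, the following Python sketch computes Pearson's r from the raw sums and derives Spearman's rank correlation by applying the same formula to average ranks. The function names and the sample values are assumptions for this example; the nlcor estimate used in the study is not reproduced here.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed from sums of scores, products and squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical temperature and load readings.
temperature = [5, 10, 15, 20]
load = [500, 450, 400, 350]
print(pearson_r(temperature, load), spearman_rho(temperature, load))
```

When Pearson and Spearman estimates are close, as reported for this dataset, a linear description of the relationship is usually adequate.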

C. MACHINE LEARNING METHODS FOR LOAD FORECASTING
Many important mechanisms are involved in the energy generation like policy decisions, management, planning, reliable power operations, meeting future demands, contract evaluation and maintenance [10]. These mechanisms require advance information of future electricity load demands. Several machine learning methods have been used to assist EGDCs in estimating these mechanisms through AELF. These methods provide useful information about future load demand trends by analyzing past data and influencing factors. The EGDCs can make smart decisions regarding power generation and distribution based on the information provided by machine learning methods. As evidenced by the scientific literature, the meteorological variables have significant effects on electricity load demand in many regions and in turn, these may have a role in the accuracy of load forecasting. In order to assess the improvement in the load forecasting due to inclusion of highly correlated weather variables in Muzaffarabad region, a predictive analysis is conducted by employing various well known machine learning methods. These are Multiple Linear Regression (MLR) [39], k-Nearest Neighbors (kNN) [40], Support Vector Regressor (SVR) [41] which is based on the principles of Support Vector Machines (SVMs) [42], Random Forest (RF) [43] and AdaBoost [44]. These supervised learning algorithms have applications in decision support systems [45]- [48], sentiment analysis [49], [50] and time series analysis [51]- [53].
To avoid over-fitting and to tune the predictive parameters, the 10 fold cross validation method is used. MLR based models have been used in load forecasting for decades. They are an advanced form of simple linear regression, used to predict an output variable (Y) based on multiple predictor variables (x) [39]. For example, with four predictor variables (x), the following equation is used to express the output variable (Y):

$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 \qquad (4)$$

where the ''b'' values are called beta coefficients (regression weights). They are used to measure the relationship between the predictors (x) and the output (Y). k-NN has been widely used as a non-parametric method in statistical estimation and classification for decades. It predicts any new numerical target point on the basis of a similarity measure (distance function), i.e. the prediction is calculated by averaging the targets of the k nearest neighbors of the query point in the training set.
SVR [41] is a popular machine learning method for regression and is essentially a modified version of SVMs [42]. SVMs map the data points into an n-dimensional feature space, after which a hyper-plane is estimated to separate those data points into their corresponding classes. An SVR works on the same principles as a support vector machine (SVM). SVMs support many kernel functions. In this study, three popular kernels are employed for electricity load forecasting: SVR with linear kernel (SVRL), SVR with radial kernel (SVRR), and SVR with polynomial kernel (SVRP). The SVR models are developed with the svm function of the 'e1071' package of the R language, which provides an SVM implementation for regression problems, namely SVR. Important parameters of this function are the kernel, the ε of Vapnik's ε-insensitive loss function, the cost of constraint violation (cost), the tolerance of the termination criterion, and γ. Linear, polynomial, and radial are used as values for the kernel parameter. ε is set to 0.1, cost to 1, tolerance to 0.001, and γ = 1/(data dimension). These are the typical values used to develop electric load forecasting models. The R language implementation of the SVR function automatically optimizes the other hyper-parameters for the given problem.
RF [43] is an ensemble learning algorithm based on decision trees for classification and regression tasks. It consists of several individual decision trees operating as an ensemble. These trees are trained individually, and each outputs a class prediction by searching a randomly selected subset of the given input variables. For a classification task, the class label with the highest vote count among the individual decision trees becomes the model's prediction. In regression, the mean of the predictions of the individual decision trees becomes the model's prediction.
Adaptive Boosting [44], in short AdaBoost, is a meta-algorithm used to improve the performance of a model by combining different weak classifiers. The predictions of the weak classifiers are combined into a weighted sum, representing the final outcome of the boosted learner. It is called adaptive because subsequent weak learners are focused on those instances that were misclassified by preceding classifiers. In this article, the AdaBoost.RT algorithm [54] is used, which follows the same boosting procedure as that of [44] but in the context of regression problems.
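The neighbour-averaging idea behind k-NN regression can be sketched in a few lines of Python. The function name `knn_predict`, the Euclidean distance, the feature layout (temperature, dew point), and all numeric values below are assumptions for illustration; the study itself does not specify these details here.

```python
import math
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]],
                train_y: Sequence[float],
                query: Sequence[float],
                k: int = 3) -> float:
    """Predict a numeric target as the mean target of the k training
    points closest to `query` (Euclidean distance)."""
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical training rows: (temperature, dew point) -> load.
train_X = [(5.0, 1.0), (10.0, 4.0), (15.0, 9.0), (25.0, 18.0)]
train_y = [500.0, 450.0, 400.0, 330.0]
print(knn_predict(train_X, train_y, query=(11.0, 5.0), k=2))
```

In practice the features would usually be scaled before computing distances, and k would be tuned, for instance with the 10 fold cross validation mentioned above.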
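To make two of the listed parameters more concrete, the sketch below shows a radial (RBF) kernel with γ defaulting to 1/(data dimension) and an ε-insensitive loss with ε = 0.1. This is a plain Python illustration of the standard definitions, not the e1071 implementation, and the function names are chosen for this example only.

```python
import math
from typing import Optional, Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float],
               gamma: Optional[float] = None) -> float:
    """Radial kernel exp(-gamma * ||u - v||^2).
    gamma defaults to 1 / (number of features), as quoted in the text."""
    if gamma is None:
        gamma = 1.0 / len(u)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(actual: float, predicted: float,
                             epsilon: float = 0.1) -> float:
    """Errors smaller than epsilon cost nothing; larger errors are
    penalised linearly by the amount they exceed epsilon."""
    return max(0.0, abs(actual - predicted) - epsilon)


print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))    # identical points
print(epsilon_insensitive_loss(1.0, 1.05))   # inside the epsilon tube
print(epsilon_insensitive_loss(1.0, 1.5))    # outside the epsilon tube
```

The cost parameter then weights how strongly the losses outside the ε tube are penalised relative to the flatness of the fitted function.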

D. PERFORMANCE EVALUATION MEASURES
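The performance of forecasting models can be evaluated using the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). These are widely used statistical measures for assessing time series based models [51], [52]. They are defined as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{l=1}^{N} \left| y_l - \bar{y}_l \right| \qquad (5)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{l=1}^{N} \left( y_l - \bar{y}_l \right)^2} \qquad (6)$$

$$\mathrm{MAPE} = \frac{100}{N}\sum_{l=1}^{N} \left| \frac{y_l - \bar{y}_l}{y_l} \right| \qquad (7)$$

where $N$ is the size of the test sample, and $\bar{y}_l$ and $y_l$ are the estimated and actual electric load values, respectively.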
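The three measures follow their standard definitions and can be computed as in the Python sketch below. MAPE is returned in percent, which is an assumption about the scaling used; the function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical actual and forecast loads.
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 380.0]
print(mae(actual_load, forecast_load),
      rmse(actual_load, forecast_load),
      mape(actual_load, forecast_load))
```

Because MAPE divides by the actual load, hours with zero recorded load would need to be excluded or handled separately.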

E. COMPUTING ECONOMIC SIGNIFICANCE OF FORECASTING MODELS
MAPE is a commonly used statistical measure to determine the accuracy of ELF models. An improvement in the MAPE can be converted into a financial benefit (e.g. a rupee or dollar amount). It was found in [3] that a 1% improvement in the MAPE can decrease the variable generation cost by around 0.1%-0.3% and that, with a conservative estimate, a 1% decrease in the forecasting error can save approximately 1.6 million dollars annually for a 10,000 MWh utility. The economic significance can be estimated from MAPE using the following equations:

$$MAPE_{redu} = MAPE_{EL} - MAPE_{ELnMP} \qquad (8)$$

$$AvgED_{MW/d} = AvgED_{MW/h} \times H_d \qquad (9)$$

$$AvgDS = \frac{MAPE_{redu}}{100} \times AvgED_{MW/d} \times GC_{MWh} \qquad (10)$$

where $MAPE_{EL}$ is the MAPE of the ELF model built using electric load data only, $MAPE_{ELnMP}$ is the MAPE of the ELF model built using electric load data coupled with meteorological parameters (as described before), $MAPE_{redu}$ is the reduction in forecasting error due to the inclusion of meteorological parameters, $AvgED_{MW/d}$ is the average daily demand in megawatts, $AvgED_{MW/h}$ is the average hourly electricity demand in megawatts, and $H_d$ is the number of hours per day. In equation (10), $AvgDS$ is the average daily saving and $GC_{MWh}$ is the generation cost per MWh of electricity. Once $AvgDS$ has been computed for each of the electricity load forecasting models developed using different machine learning algorithms, the following proposed equation can be used to obtain the least cost ELF model.
$$lcELFM = \underset{TMLA_i}{\arg\max}\; AvgDS(TMLA_i) \qquad (11)$$

where lcELFM represents the least cost electric load forecasting model, and $TMLA_i$ is the $i$-th entry in the list of traditional machine learning algorithms. Equation (11) can be helpful for exploring the least cost ELF model.
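The conversion from MAPE to a monetary saving can be sketched as follows. This Python example rests on stated assumptions: MAPE values are in percent, the daily saving is taken as the MAPE reduction (as a fraction) times the average daily energy demand times the generation cost per MWh, and the least cost model is the one with the largest average daily saving. The function names and every number below are hypothetical and are not results from the study.

```python
from typing import Dict


def average_daily_saving(mape_el: float, mape_elnmp: float,
                         avg_hourly_demand_mw: float,
                         generation_cost_per_mwh: float,
                         hours_per_day: int = 24) -> float:
    """Assumed form: saving = (MAPE reduction / 100) * daily demand * cost."""
    mape_reduction = mape_el - mape_elnmp                  # percentage points
    avg_daily_demand = avg_hourly_demand_mw * hours_per_day
    return (mape_reduction / 100.0) * avg_daily_demand * generation_cost_per_mwh


def select_least_cost_model(daily_savings: Dict[str, float]) -> str:
    """Pick the model name with the largest average daily saving."""
    return max(daily_savings, key=daily_savings.get)


# Hypothetical MAPE pairs (load only, load plus weather) per algorithm.
mape_pairs = {"MLR": (8.0, 7.5), "kNN": (7.0, 6.0), "SVRR": (6.5, 4.5)}
savings = {
    name: average_daily_saving(m_el, m_elnmp,
                               avg_hourly_demand_mw=50.0,
                               generation_cost_per_mwh=10000.0)
    for name, (m_el, m_elnmp) in mape_pairs.items()
}
print(savings)
print(select_least_cost_model(savings))
```

Under these assumptions the selection depends on the MAPE reduction of each algorithm, since the demand and generation cost are the same for all models.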

IV. PROPOSED ALGORITHM
To the best of our knowledge, no algorithm has previously been proposed to select the lcELFM among the available ELF models. This section describes the proposed algorithm to select lcELFM, which is one of the main contributions of this study. The algorithm starts by taking the lists of original EL, MPrams, TMLAs, and $GC_{MWh}$.
Here EL is the hourly electricity load demand, MPrams are the meteorological parameters, TMLAs is the list of traditional machine learning algorithms used to build forecasting models, and $GC_{MWh}$ is the generation cost per MWh of electricity. The algorithm returns lcELFM, the least cost ELF model, which has the minimum $MAPE_{ELnMP}$ value among the ELF models built with the provided TMLAs, using the details listed in the algorithm.

[Algorithm listing: only the following fragment is recoverable]
... (Electric load demand (EL) data split for k-fold CV)
8. Split data χ into k folds, i.e. $ELnMPD_k = \chi_{k=1}^{n}$ (EL and MParms time series data split for k-fold CV)
9. Repeat steps 10 to 13 for each fold k, where k := 1 to n
10. $ELnMPDtr_k$ := ...

V. SIMULATION DESIGN
To investigate the usefulness of the proposed algorithm, the obtained electric load and meteorological datasets are first checked for missing values. Missing values are filled by using equation (1). As the second step of the simulation design, the correlation between the available meteorological parameters and electric load is computed by Pearson's correlation analysis (refer to equation (3)). The correlated dataset is then used in two settings (Setting-I and Setting-II). Setting-I corresponds to the electric load dataset alone, while Setting-II contains the dataset having electric load and the correlated meteorological parameters. For both settings, the datasets are divided into 10 equal folds for cross validation purposes. The divided data is alternately provided to the machine learning algorithms (MLR, kNN, SVRL, SVRR, SVRP, RF, and AdaBoost) to build predictive models following the well-known k-fold cross validation technique. The built models are then tested using the test dataset to obtain predicted values of electric load. The predicted electric load values are used to compute the error measures (MAE, RMSE, and MAPE). Using equations (8), (9), and (10), the economic significance of the predictive models is computed. Equation (11) is used to select the least cost electric load forecasting model. The whole process is illustrated in Fig. 1.
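The cross validation loop of the simulation design can be sketched in Python as follows. The sketch assumes contiguous, unshuffled folds and a user-supplied `fit_predict` callable that trains on one partition and predicts on another; both are illustrative choices and not details confirmed by the study. The mean-value predictor and the data are placeholders standing in for the actual algorithms and datasets.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def cross_validated_mape(X: Sequence, y: Sequence[float],
                         fit_predict: Callable, k: int = 10) -> float:
    """Average MAPE (percent) of `fit_predict` over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        actual = [y[i] for i in test_idx]
        scores.append(100.0 / len(actual)
                      * sum(abs((a - p) / a) for a, p in zip(actual, predictions)))
    return sum(scores) / len(scores)


def mean_predictor(train_X, train_y, test_X):
    """Placeholder model: always predicts the mean training load."""
    return [sum(train_y) / len(train_y)] * len(test_X)


# Placeholder data; the weather features would only be used in Setting-II.
features = [[float(t)] for t in range(20)]
load = [100.0 + t for t in range(20)]
print(cross_validated_mape(features, load, mean_predictor, k=10))
```

Running the same loop once with load-only inputs and once with load plus correlated weather inputs would give the two MAPE values compared in the economic significance step.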

VI. CORRELATION ANALYSIS RESULTS
As discussed before, according to scientific literature, electricity load demand relies on the human reactions to weather conditions. Therefore, in order to quantify the relationship between meteorological variables (temperature, relative humidity, dew point, cloud cover, rainfall and wind speed) and electric load demand, scatter plots are drawn on annual and seasonal bases and corresponding correlation estimates are computed.

A. HOUR TO HOUR LOAD TEMPERATURE RELATIONSHIP
In order to analyze the hour to hour correlation between load and temperature on a yearly and seasonal basis, scatter plots with linear trend lines are generated for the peak hours of the day. Fig. 2 shows the hour to hour correlation on a yearly basis for 1am, 6am, 8am, 12pm, 5pm, and 7pm. The subgraphs in Fig. 2, other than the subgraph for 1am, show the readings at five different peak hours.
Power demand tends to increase in the morning, when people usually start their day, and secondary peaks are similarly observed in the evening, when most people return home. The annual hour to hour scatter plots of the most significant hours (Fig. 2) for Muzaffarabad during the study period reflect this trend. A higher correlation exists between load and temperature in the morning at 6am and 8am (Fig. 2(b), 2(c)) and in the evening at 5pm and 7pm (Fig. 2(e), 2(f)), because most activities take place during these hours. In short, all the times of the day (Fig. 2 (a-f)) clearly indicate a strong inverse linear relationship between load and temperature, which shows a notable response of the residents of Muzaffarabad to temperature changes during prominent hours.

B. SEASONAL HOUR TO HOUR LOAD TEMPERATURE RELATIONSHIP
The relationship between electric load and temperature can be characterized better on seasonal time scales. In Figs. 3-6, the hour to hour relationship between electric load and temperature is shown on a seasonal basis (only for important time slots of the day). Important time slots include before sunrise, morning, noon, afternoon, and evening. For each important time slot of the day, the hours with maximum correlation are plotted, and these can vary from season to season. For instance, plots at 1am, 5am, 9am, 12pm, 5pm, and 7pm are shown for the summer season, whereas plots at 1am, 6am, 8am, 12pm, 5pm, and 7pm are shown for the spring season because the corresponding hours have maximum correlation. (In order to preserve the maximum possible detail in the plots, the x-axis and y-axis scales of the scatter plots are altered.)
Fig. 3 (a-f) shows the correlation of temperature against electric load at 1am, 6am, 8am, 12pm, 5pm, and 7pm for the spring season. An inverse linear correlation is observed for all time slots. The correlation is small during predawn hours, i.e. at 1am and 6am (Fig. 3 (a-b)), but as the majority of citizens wake up and begin a combination of routine activities, the correlation becomes stronger at 8am (Fig. 3 (c)). This is due to the gradual rise of temperature from the uncomfortable levels of late winter to pleasant early spring temperatures, causing a decline in electric load demand.
Fig. 4 (a-f) shows the correlation of temperature and electric load at 1am, 5am, 9am, 12pm, 5pm, and 7pm for the summer season. This season exhibits a different, nonlinear correlation for the corresponding hours compared to other seasons. A small positive correlation is found for all hours in this setting. The reason is that the increase in temperature during summer, and hence the use of electric equipment for cooling purposes, triggers an increase in electric load demand. However, the correlation for this season is not very pronounced because the temperature of Muzaffarabad is not as high as in other parts of the country, and under the pleasant climate conditions citizens do not show abnormal consumption trends in the summer.
Fig. 5 (a-f) shows the correlation between load and temperature at 1am, 5am, 8am, 12pm, 5pm, and 8pm for the fall season. In this figure, the last peak hour in the evening changes from 7pm to 8pm. The possible reasons are: (1) during the fall season, the temperature starts decreasing in the evening as compared to summer; (2) the sun sets later in the evening as compared to winter, and the citizens engage later in their indoor activities. A strong inverse relationship develops during these hours. The correlation is well-defined from 5am (Fig. 5 (b)) in the morning (when most of the residents wake up for their day) to 8pm at night (when most of the residents are involved in a variety of post evening activities after returning home). The highest correlation is found at 12pm (Fig. 5 (d)) because the temperature remains comfortable during midday hours in the fall season, which results in a decreased load demand; however, the fall values of the other prominent hours in the scatter plots mimic those of 12pm. The correlation between pleasant fall temperatures and the corresponding loads of the prominent hours (Fig. 5 (b-f)) is comparable to that of the spring season (Fig. 3 (c-f)).
Fig. 6 (a-f) shows the correlation between load and temperature at 1am, 6am, 8am, 12pm, 5pm, and 8pm for the winter season. Muzaffarabad is known for its frigid temperatures in the winter, so elevated levels of load consumption are seen during this season. An inverse relationship between load and temperature is observed for all hours, which is most noticeable at 8pm. This is because of the combined effect of dropping temperature and the busy post evening routines of the residents. People struggle against the cold temperatures of the winter throughout the day, so a considerable correlation exists between load and temperature. Usage of heating equipment is a major part of the customers' adjustment to uncomfortable temperatures, and hence the load demand increases. This adjustment is clearly reflected in the scatter plots for all hours in Fig. 6 (a-f).

C. LOAD DEW POINT RELATIONSHIP
Dew point, relative humidity and air temperature are reported together because these variables are related to each other, i.e. when the dew point temperature and the air temperature are equal, the relative humidity is 100% and fog, clouds or frost form depending upon the season. However, dew point is a more accurate measure of ''how humid the air is'' because it is an absolute measure, whereas relative humidity is a relative measure and can sometimes give misleading results [8], [9].
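The link between these three variables can be illustrated with the Magnus approximation, a standard empirical formula that is not part of this study. The sketch below assumes one commonly used coefficient set (17.62 and 243.12 °C); the function name is hypothetical.

```python
import math

# Magnus coefficients: one commonly used set (an assumption, not taken from this study).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (deg C) from air temperature (deg C) and relative humidity (%)."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in the interval (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)
```

Under this approximation, a relative humidity of 100% returns a dew point equal to the air temperature, and any lower humidity returns a dew point below it, which is the relationship described above.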
Dew normally appears under a clear sky, when the surface emits heat and becomes cooler because there are no clouds to trap the heat. The emission of radiation from the earth's surface cools the air more rapidly overnight, raising the chance that it drops below the dew point. Heavy dew becomes more common on fall mornings. Fall is considered the peak dew season because the air is normally cool enough to drop below the dew point temperature, but not cool enough to form frost. The fall season of Muzaffarabad is known for its clear and calm nights, which favor surface radiation and cooling and cause the surface temperature to drop below the dew point temperature.
Fig. 7 (a-f) shows the correlation between load and dew point at 1am, 5am, 8am, 12pm, 5pm and 8pm for the fall season. The air temperature normally falls to its lowest value between midnight (12am) and sunrise, so the dew point temperature is most likely to be reached during this time. As a result, a correlation between load and dew point begins to develop at 1am (Fig. 7 (a)), but it is weak because most residents are asleep at that hour and load-consuming activities are rare. As sunrise approaches (Fig. 7 (b)), the correlation becomes progressively stronger because the temperature, having decreased overnight, is nearly equal to the dew point temperature and the relative humidity is at its highest. The relationship reaches its highest value at 8am (Fig. 7 (c)) because the majority of residents have woken and begun their day through a combination of activities, and people respond to the dew point temperature by increasing the load demand at this time. The inverse relationship is the combined effect of customers' post-dawn activities and dew point temperature, which results in an increase in load demand.
As the day proceeds after sunrise, the dew point temperature moves away from the air temperature and the correlation between dew point temperature and load demand decreases (Fig. 7 (d-f)). For example, customer activity at 8pm is as high as at 8am, but the correlation at 8pm is not as high as at 8am.

D. LOAD RELATIVE HUMIDITY RELATIONSHIP
The correlation between electric load and other meteorological variables is also explored. Relative humidity is one of them and has a significant relationship with electric load demand. It is defined as ''the amount of water vapor present in the air expressed as a percentage of the amount needed for saturation at the same temperature''. Higher humidity during the hotter months makes the air feel hotter than the actual temperature. Fig. 8 (a) shows the scatter plot of relative humidity versus electric load demand for the whole year of 2014. The figure reveals a small positive correlation between relative humidity and load demand. Summer is the most relevant season for relative humidity because of its higher temperatures. So, a scatter plot of relative humidity and temperature is shown in Fig. 8 (b), along with a scatter plot of relative humidity and load demand in Fig. 8 (c), for the summer season. The following inferences can be drawn from the figures.
During the summer season, the electric load decreases as the relative humidity increases (Fig. 8 (c)). This is because an inverse relationship exists between temperature and humidity (Fig. 8 (b)) during the summer season: when temperature increases, humidity decreases and, in turn, the load demand increases. In contrast, the electric load demand appears independent of relative humidity during the winter season (Fig. 8 (d)), because in winter wind speed and wind chill are more relevant to load variations than relative humidity. In the spring and fall seasons, relative humidity does not show any relationship with load demand, resembling the winter trend.

E. LOAD CLOUD COVER RELATIONSHIP
Cloud cover is another important meteorological variable, and it is related to solar radiation. It is defined by three aggregated cloud layers, i.e. low clouds, medium clouds and high clouds, depending upon their height above the earth's surface. Measuring cloud cover is a difficult task. Satellite cameras, ground sensors and visual observation are commonly used for this purpose, and each has its own limitations. The Muzaffarabad meteorological department uses a scale from 0 to 8 for cloud determination, where 0/8 means no clouds at all and 8/8 means complete sky cover.
The scatter for the transitional seasons (not shown) is so widespread that the load demand appears nearly independent of cloud cover. However, a small to medium correlation exists between load and cloud cover for some hours of different seasons, as described here.
Cloud cover plays an important role in blocking unpleasant solar radiation striking the earth in the summer season. Solar radiation reaches its maximum at solar noon. In the summer season, the hours from 1pm to 4pm are the most relevant for the load-cloud cover relationship because solar radiation is at its maximum during these hours if there are no clouds. Accordingly, a small inverse correlation between load and cloud cover is found from 2pm to 5pm (r = −0.12312, r = −0.29609, r = −0.22755 and r = −0.12553, respectively). As the clouds disappear, the discomfort due to solar radiation increases and, in turn, the load demand increases.
The highest correlation is found at 1pm during springtime (r = 0.33834), and the two preceding hours show a medium correlation, i.e. r = 0.32204 at 12pm and r = 0.29616 at 11am, whereas from 1pm onwards the correlation ranges from 0.1 to 0.3. Perhaps solar radiation is not strong enough to create an inverse correlation during the pleasant first two thirds of the spring season, even in the absence of clouds; therefore a positive correlation is reported.
A clear sky during the fall season results in the dew point being reached at night, which creates colder conditions than a cloud-covered sky. Therefore, a strong inverse load-cloud cover relationship is found from 8pm to 5am (r values between −0.2 and −0.5), with the highest value occurring at 10pm (r = −0.4679), when residents are involved in the closing activities of the day. The same damp conditions occur in the winter season in the absence of clouds at night, so an inverse relationship exists during the night hours of the winter, but it is not as pronounced as in the fall season and ranges from −0.1 to −0.2.
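The hour-by-hour r values reported in this section can be reproduced in outline as follows. This is a minimal sketch, assuming that r denotes the Pearson correlation coefficient and that each observation of one season is available as an (hour, weather value, load) tuple; the function names and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def hourly_correlation(records):
    """Group (hour, weather_value, load) records by hour and correlate per hour."""
    by_hour = defaultdict(lambda: ([], []))
    for hour, weather_value, load in records:
        by_hour[hour][0].append(weather_value)
        by_hour[hour][1].append(load)
    return {hour: pearson_r(ws, ls) for hour, (ws, ls) in sorted(by_hour.items())}
```

Calling `hourly_correlation` once per season and per weather variable would give one coefficient for each hour of the day, so that the hours with the strongest positive or inverse relationship can be read off directly.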

F. LOAD WIND SPEED: RAIN FALL RELATIONSHIP
The relationships between electric load and wind speed and between electric load and rainfall are not significant during any season. Inconsistent and non-apparent correlations (not shown here) are found between the variables. However, correlations from 0.1 to 0.3 exist for some random hours of the day in all seasons; for example, a correlation of 0.312773 is noticed in the load-wind speed relationship at 5pm during summertime. So, considering wind speed and rainfall independently for electric load forecasting seems to be of no use. However, they may be useful in combination with other variables, such as wind chill.

VII. ELECTRIC LOAD FORECASTING RESULTS
Two model simulations, one with the electric load variable only and the other with meteorological parameters coupled with the electric load variable, are performed in order to test the hypothesis that the ''relationship between weather variables and electricity load can improve the electricity load forecasting for Muzaffarabad city''. Meteorological parameters that showed a strong relationship with the electric load demand in the correlation analysis are integrated into a forecasting model to assess the accuracy improvement. Hourly observations of the corresponding variables are used to create two ELF models by considering the following two settings:
Setting I: Use only the electric load demand variable to build the electric load forecasting model (ELFModel).
Setting II: Use the electric load demand variable coupled with meteorological parameters to build the forecasting model, termed ELnMPFModel.
For both settings, the ML algorithms MLR, kNN, SVRL, SVRR, SVRP, RF and AdaBoost are used to build and test both types of ELF models. The results of the modeling studies are used to determine which combination of variables, i.e. Setting I or Setting II, gives more accurate forecasts.
The performances of both types of models are evaluated using the RMSE, MAE, and MAPE.
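For reference, the three error measures can be written out as below. This is a minimal sketch using their standard definitions, with MAPE expressed in percent; the function names are illustrative and not taken from this study.

```python
import math


def _check(actual, predicted):
    # Both sequences must be non-empty and aligned observation by observation.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(actual, predicted):
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE and MAE are in the units of the load, whereas MAPE is scale-free, which is why it is convenient for translating a forecast improvement into a cost figure.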
Short-term load forecasting (24 hours ahead) is performed with both types of models. The outputs from each model are compared with the original observations to estimate MAPE, RMSE and MAE over the specified forecasting period. The model with the lowest error values is considered better. Previous research studies showed that many exogenous variables, such as day of the week, month, holidays and year, are essential to load forecasting [8]. So, these are used in both settings (I and II) during the training phase to develop forecasting models. Other model specifications are listed in Table 2. This table shows the variables common to both models and the list of variables used to build models for each of the settings (I and II). Table 3 shows the actual values of electric load and the values predicted by ELFModel (referred to as M1 in Table 3) and ELnMPFModel (referred to as M2) used in this study. The tabulated values (for twenty-four-hours-ahead forecasting) show that the ELnMPFModel-based values are in closer agreement with the actual electric load values than those produced by ELFModels.
Table 4 shows the performance indexes of both types of models built using Setting-I and Setting-II. It can be seen from Table 4 that the model errors are reduced in all cases when ELnMPFModels are used (as compared to using load data alone). The results show large differences in the error measures. With load variables alone, the errors are inconsistent and take higher values over the simulated forecasting period. However, two exogenous variables, weekday and IsWIsH (is working day or holiday), did not show statistical significance relative to the other exogenous, load and weather variables, i.e. the load is independent of these two variables.
The simulated results reveal that electricity use in Muzaffarabad responds to temperature and dew point temperature more than to the other two meteorological parameters, because these are closely related to human comfort. In the correlation analysis, the strongest relationships are found between load and these two variables. The model learned the load-temperature and load-dew point temperature relationships more effectively than those with the other weather and exogenous variables. The results demonstrate the benefit of using temperature and dew point in the simulated models. Relative humidity and cloud cover have also appeared beneficial in some studies, and a model equipped with them can still produce an acceptable forecast, but not for the region under consideration in this study.

VIII. RESULTS OF ECONOMIC SIGNIFICANCE
The economic impact of ELnMPFModel is quantified by computing the rupee amount that WAPDA could save if the MAPE of the load forecasts were reduced. These economic benefits should be transferable across Azad Kashmir. The Azad Kashmir region has a load demand of around 400 MWh. Consider that the mean demand of Muzaffarabad reaches 50 MWh, and assume that the mean cost of electricity is $70/MWh (considering the cost of hydropower projects, as Muzaffarabad receives electricity from these resources). Table 4 shows that the best prediction is given by SVRR, with a 2.2% reduction in MAPE for 1-day-ahead forecasting. The total daily savings of WAPDA in Muzaffarabad using ELnMPFModels and one-day-ahead load forecasts are calculated using equations (5) and (6). Reductions in MAPE, with the corresponding reductions in cost, due to ELnMPFModels using other ML algorithms are shown in Table 5 (calculated by equations (5) and (6)).
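The arithmetic behind such a saving can be sketched as follows. This is an illustrative assumption rather than a restatement of equations (5) and (6): it treats the quoted mean demand as energy per hour, takes the daily cost as mean hourly demand × 24 h × unit cost, and takes the saving as proportional to the reduction in MAPE. The function names are hypothetical, and the exchange rate is left as a parameter because no rate is stated here.

```python
def daily_saving_usd(mean_demand_mwh_per_hour: float,
                     cost_usd_per_mwh: float,
                     mape_reduction_pct: float,
                     hours_per_day: int = 24) -> float:
    """Daily saving in USD, assuming the saving scales linearly with the MAPE reduction."""
    # Assumed daily energy cost: hourly demand * hours per day * unit cost.
    daily_cost_usd = mean_demand_mwh_per_hour * hours_per_day * cost_usd_per_mwh
    return daily_cost_usd * mape_reduction_pct / 100.0


def to_rupees(amount_usd: float, rupees_per_usd: float) -> float:
    """Convert USD to rupees with a caller-supplied (hypothetical) exchange rate."""
    if rupees_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return amount_usd * rupees_per_usd
```

With the figures quoted above (50 MWh, $70/MWh and a 2.2% reduction in MAPE), `daily_saving_usd(50, 70, 2.2)` evaluates to about 1,848 USD per day before currency conversion under these assumptions.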
Forecasting models built using different ML algorithms give different reductions in MAPE, and hence different reductions in cost, because of their individual performance limitations and advantages. Nevertheless, every model shows a MAPE reduction when weather parameters are included, which leads to a cost reduction. The cost reduction of the method with the lowest MAPE can be considered economical and effective for the utilities. Although the SVRL-based ELnMPFModel has the maximum reduction in MAPE and the maximum reduction in cost, the proposed algorithm selects the ELnMPFModel built using the SVRR learning algorithm, because Table 5 shows that the SVRR-based ELnMPFModel has the least MAPE value. The least MAPE value means that the forecasting model has the least forecasting error and is hence the best AELF model. In this case, the reduction in cost is PKR 0.313 million per day for the utility providing electricity to the city of Muzaffarabad. Therefore, considering significant meteorological parameters in the electric load forecasting models, owing to their influence on the electric load demand, can save WAPDA around PKR 0.313 million daily in Muzaffarabad. As WAPDA is responsible for supplying electricity to the whole country, and larger cities need far more energy than Muzaffarabad, applying these findings to the operational decisions of WAPDA across Pakistan could bring economic benefits approaching billions of rupees per day.
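The distinction between the model with the largest MAPE reduction and the model with the least MAPE can be made concrete with a short sketch. It assumes that the MAPE of each candidate is available as a mapping from algorithm name to value; the function names are hypothetical, and this is an illustration of the selection rule described above rather than the full proposed algorithm.

```python
from typing import Mapping


def select_least_mape(mape_by_model: Mapping[str, float]) -> str:
    """Return the name of the model whose MAPE is lowest."""
    if not mape_by_model:
        raise ValueError("no candidate models given")
    return min(mape_by_model, key=lambda name: mape_by_model[name])


def largest_mape_reduction(mape_setting_1: Mapping[str, float],
                           mape_setting_2: Mapping[str, float]) -> str:
    """Return the model whose MAPE dropped most from Setting I to Setting II."""
    common = [name for name in mape_setting_1 if name in mape_setting_2]
    if not common:
        raise ValueError("no model appears in both settings")
    return max(common, key=lambda name: mape_setting_1[name] - mape_setting_2[name])
```

The two functions can return different models: a model that starts from a high error may improve the most and still end with a higher MAPE than a model that was already accurate, which is the situation described for SVRL and SVRR.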

IX. CONCLUSION
Electric load demand is associated with uncertainties in load consumption due to human behavior related to weather fluctuations. Moreover, the cost of electricity production mainly depends upon the source, such as water, petroleum products (oil, natural gas etc.), nuclear energy or wind, and electricity cannot be stored at large scales. End users (individuals, industries etc.) are interested in a cheaper price per unit, while EGDCs require maximum profit. Therefore, AELF requires both weather fluctuations and human behavior towards such fluctuations, reflected as historical electric load consumption, to be studied together for a machine learning based AELF model. At the same time, AELF may also be used to reduce the production cost of electricity. This study suggests an algorithm for the selection of the least cost forecasting model and presents the first study of electric load demand driven by correlated meteorological parameters for Muzaffarabad city. An extensive analysis on an annual and seasonal basis reveals that the meteorological parameters, especially air temperature and dew point temperature, clearly influence electricity consumption in Muzaffarabad, though the effects vary with factors such as time of day and season. A strong correlation is found between load and dew point temperature during prominent hours of the fall season. The relationship is not significant during summertime; however, a good inverse correlation exists during the night hours of the winter season. Cloud cover does not show a relationship with electric load demand strong enough to be considered a major predictor of electric load; however, a small to medium correlation is revealed for some hours of different seasons. Inconsistent and non-apparent correlations are found between load and wind speed and between load and rainfall during all seasons.
The forecasting results reveal a significant reduction in the prediction errors for ELnMPFModels, which shows that the use of meteorological parameters can increase forecasting accuracy in Muzaffarabad and that the model learns more effectively from load-weather relationships than from exogenous and load variables alone. The reduction in MAPE is translated into economic value. The computed daily savings are PKR 0.313 million for the day-ahead forecast of Muzaffarabad. If these results are projected to bigger cities that consume far more electricity, WAPDA can reduce its operating expenses by up to billions of rupees per year.
Although this study primarily focused on suggesting an algorithm for the choice of the least cost electric load forecasting model and on the interrelation between load demand and meteorological parameters in Muzaffarabad city, the general methodology and findings can be tested and applied in other cities of Pakistan and in other regions of the globe. Based on the findings of this study, power utilities in Muzaffarabad can make economic decisions regarding their generation operations and can decrease operating costs, allowing profitable sales and, where applicable (i.e. if they buy from the market), a reduction in power purchases. The findings can be replicated nationwide and globally.

ACKNOWLEDGMENT
Special thanks to WAPDA (Muzaffarabad station) and PMD (Muzaffarabad station) for providing electricity load demand and meteorological data of Muzaffarabad region, respectively.