Energy Demand Forecasting and Optimizing Electric Systems for Developing Countries

Currently, developing countries are experiencing a massive shift toward industrialization. However, weak public awareness, regulations, and technology leave them without the technical sophistication and infrastructure needed to encourage low-carbon, sustainable economic growth. Developing countries must therefore plan the industrialization process for maximum energy efficiency of production, thereby reducing their CO₂ emissions significantly. This paper presents a systematic survey of the current pragmatic methods for forecasting future load demands from minutes to years ahead in developing countries, following the Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P). The primary focus of this systematic survey is to provide an optimal forecasting model selection strategy for potential researchers and forecasters. Based on the strengths and weaknesses of the different models, we discuss the most suitable methods and how to tailor them to multiple applications and scenarios of load forecasting. The comparison elements are forecast horizons, spatio-temporal resolutions, factors affecting the load, dimensionality reduction techniques, model complexity analysis, and the MAPE for error analysis. From the results, we found ANN hybridized with meta-heuristic techniques to be superior in most of the analysis cases, largely due to ANN's ability to handle non-linear data, its flexibility, and its robustness. Consumption data aggregated at the national level can capture trends efficiently. Meteorological and calendar features influence short-term forecasting extensively, whereas economic factors shape long-term load patterns. Finally, we identify the trends and research gaps in the existing literature and present relevant technical recommendations for improvement.

alarming environmental crisis with their low-cost, pro-growth policies [2]. Firstly, developing countries have become increasingly influential in the global growth paradigm over the last three decades, as their economies have transformed from agricultural to energy-driven industries and enjoyed higher economic growth [3]. Accordingly, the shares of developing countries in the world GDP, energy consumption, and CO₂ emissions are 19.56, 43.01, and 43.39%, respectively [4]. These figures suggest that developing countries should prioritize reducing CO₂ emissions through energy efficiency. In addition to energy efficiency, improved forecast accuracy contributes to good planning and cost savings.
Many review papers on electrical load forecasting models have been published, using different approaches and methods and focusing on various criteria and sectors. What they lack is a focus on the part of the globe that needs this the most: developing and under-developed countries. The trends in power demand differ between developed and developing economies. In developed countries, demand growth is mostly driven by digitalization and electrification. In developing countries, however, income levels, industrial output, and the services sector have a considerably greater impact on the development of electricity consumption. Therefore, forecasting techniques designed for developed countries are not guaranteed to provide similar outcomes for developing countries. The motivation for this study is to address the above-mentioned gap. This article provides a comprehensive overview of the existing electric load forecasting techniques. Table 1 compares the existing review journals and our manuscript.
This paper includes a broader scope of exploration, including load forecasting models. The proposed review discusses the current realistic modeling techniques for forecasting the future load demands from minutes to years ahead in developing countries and the exogenous variables of the forecasting models utilized by numerous researchers and presents relevant technical recommendations for improvement. Based on the strengths and weaknesses of the different models, we will discuss the most suitable methods tailored to multiple applications and scenarios of load forecasting.
The previous systematic reviews primarily provide a generalized solution for load forecasting on a global scale. Our investigation mainly focuses on modeling techniques for developing countries as they need it the most to plan their industrialization correctly. The major contributions of our paper are: 1) Highlights of the top models used for forecasting the energy demand and outlooks of the analysis cases: Data pre-processing, Spatio-temporal resolution, forecasting horizon, model complexity, and error metrics. 2) Statistical analysis of the state-of-the-art models based on various use cases. This type of analysis contributes by providing deeper insights and patterns.
3) We identified the trends observed in this domain for developing countries over the last decade and outlined the potential research gaps and recommendations. 4) Formulation of a unique framework for selecting the optimal strategy for energy demand forecasting based on our analysis, along with a summary of the strengths and applications of the models analyzed. The rest of the paper is organized as follows: Section II elaborates on the protocols and framework of our systematic review. Section III presents the findings of the review. Afterward, the discussion and recommendations are presented in Sections IV and V. Finally, Section VI concludes the article.

II. METHODOLOGY
We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses Protocol (PRISMA-P), a framework widely used across engineering and the energy sciences for systematic reviews [6], [11]. The quality of every systematic review depends on how well the protocol describes the investigation's hypothesis, the reason for undertaking the investigation, and the planned techniques [11]. However, systematic review studies rarely describe their framework [5]. A detailed, well-described structure facilitates the understanding and evaluation of the methods adopted. Hence, the PRISMA model was adopted in this study (Figure 1). As shown in Figure 1, PRISMA presents the flow of information from one stage to another in a systematic review of the literature, giving the total number of studies identified, excluded, and included, along with the reasons for inclusion and exclusion.

A. STUDY SELECTION STRATEGY
Before identifying the studies to be selected, we need to define the aims, objectives, and research questions of our systematic review. Table 2 defines our study's objectives, scope, and research questions. The next step is to define the inclusion and exclusion criteria on which the selection of relevant studies is based. The inclusion criteria were neither too extensive nor too brief; we set them to fulfill the scope of our study. In Table 3, we define the selection criteria for commencing our investigation.

B. RESEARCH ARTICLE SELECTION
The databases we used for searching the articles are:
• Scopus
• IEEE Xplore
• Wiley online library
• ACM library
• SpringerLink
The search keywords embodying our investigation are: ''electric* load forecast'', ''electric* load predict*'', ''electric* demand forecast*'' and ''electric* demand predict*''. In a search engine, an asterisk (*) is often used as a wildcard character that matches any string of characters. For example, the asterisk after the word ''electric'' matches any continuation of ''electric'', such as ''electricity'' or ''electrical''. This broadens the search to find more relevant results, as it is not limited to the exact term ''electric''. Our search is limited to studies published between 2012 and 2022, and we selected only articles published in the English language. We chose the PRISMA flow diagram for this study because it can be used for systematic reviews of various topics and improves the quality and value of the reviews; we screen articles in accordance with the PRISMA framework. Retrieved articles are exported as Comma Separated Values (CSV) files with their abstracts. A single file is created by combining all the records from the different databases and removing all duplicates. With the unique articles in hand, we perform the first screening by reviewing their titles and abstracts. An article that completely matches the exclusion criteria is dropped at this stage; otherwise, it proceeds to the second screening. The second screening, which considers the full text of each article, includes or excludes it according to the criteria set.
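The wildcard expansion described above can be approximated with a regular expression. The sketch below (with purely illustrative article titles, not drawn from the surveyed databases) mimics how a search engine would expand ''electric*'':

```python
import re

# The database wildcard "electric*" behaves like this regex: the prefix
# "electric" followed by any (possibly empty) run of word characters.
pattern = re.compile(r"\belectric\w*", re.IGNORECASE)

titles = [
    "Electricity demand prediction for smart grids",   # matches via "Electricity"
    "Electrical load forecast with neural networks",   # matches via "Electrical"
    "Hydraulic pump maintenance",                      # no match
]
matched = [t for t in titles if pattern.search(t)]
```

Running this keeps only the first two titles, illustrating why the wildcard form retrieves ''electricity'' and ''electrical'' alongside the bare term ''electric''.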

C. DATA EXTRACTION STRATEGY FROM THE INCLUDED STUDIES
To perform the empirical analysis of the included studies, we formulated a data extraction strategy for drawing out relevant information. The point of doing this is to reduce bias and allow the results to be generalized. Setting pre-defined criteria helps with the extraction process. We aim to systematically evaluate the load forecasting methods for grid-connected sectors in developing countries, so categorizing the information into segments provides flexibility to the whole process. In Table 5, we select the features of the articles and group them for quantitative analysis and for drawing conclusions. We stored the extracted information about the articles in a CSV file for efficient comparison and analysis.

III. RESULTS
In this section, we highlight the findings of our systematic review. First, we provide an overview of the search results from the different databases. Second, we define the analysis cases that form the basis of our analysis and present an overview of them. Finally, we perform a quantitative analysis of each case.

A. SEARCH RESULT
A total of 1841 articles were identified across all the databases using the generalized search string. Scopus yielded the highest article count at 808. The other databases, SpringerLink, IEEE Xplore, Wiley, and the ACM digital library, each returned a similar number of articles, ranging from 200 to 300. Afterward, we removed 116 duplicate articles. Most of the articles that the ACM search engine returned were unrelated, with only 5 passing the initial screening, whereas the Scopus results were highly relevant, with the majority passing the initial screening. In total, 841 articles passed the first screening, with Scopus contributing the majority. Next, following the eligibility criteria in Table 3, we performed a full-text analysis of the initially screened articles. The final number of articles included in our systematic review was 111. Figure 1 illustrates the PRISMA flow diagram of our article selection, and the selection stages for all the databases are provided in Table 6.

B. OVERVIEW OF THE ANALYSIS CASES
For an unbiased and generalized assessment of the forecasting methods, it is necessary to define a set of analysis cases. These cases reveal critical patterns about when and under what circumstances each model should be used. The analysis cases used for quantitatively synthesizing the forecasting models are listed in Table 7. The properties of the models in the 111 included studies are summarized and analyzed using these cases, providing better insight into the model selection process. After the analysis, the articles are sorted into structured tables for easier accessibility.

1) FORECASTING HORIZON
Time frame is one of the most critical characteristics of load forecasting [12]. We categorized electricity load forecasting into four main types based on the forecast horizon:

1) Very Short Term Load Forecasting (VSTLF):
VSTLF predicts electricity load in the next few minutes to hours [13]. Real-time safety analysis and monitoring of power systems are its main applications. It is difficult to accurately predict very short-term loads because electrical loads are heterogeneous and random [14]. Energy Management Systems can perform economic dispatch by monitoring load frequency when dealing with load forecasts over short time horizons.

2) Short Term Load Forecasting (STLF):
STLF is a method of forecasting loads over the next one-day period to one week in advance [15]. Its main uses are economic dispatch, fuel purchase scheduling, fuel allocations, and real-time control. STLF is essential for power systems as it helps make critical decisions like unit commitment and load switching [16].

3) Mid Term Load Forecasting (MTLF):
MTLF is concerned with predicting electricity load in the coming weeks or months [17]. Its purpose is to save operating costs by planning maintenance scheduling and coordinating dispatching loads [18]. MTLF is essential for fixing prices and meeting demand in the upcoming months, which improves supply reliability.

4) Long Term Load Forecasting (LTLF):
The LTLF method is used to plan power systems over the long run [19]. The forecasting time range is from three years up to decades. A good understanding of how the system works and how economic and technological changes can affect the electricity market is essential for long-term forecasting. Figure 2 shows the applications of load forecasting over different time horizons.
From Figure 3, we can infer that most forecasting methods focus on the short and long term; very short-term and mid-term predictions are uncommon. This is because industry primarily needs short-term and long-term forecasts: short-term predictions are often used to respond rapidly to electricity demand, while long-term predictions are used to develop strategies and plans.

2) FORECASTING RESOLUTION
Forecasting resolution is the time interval considered for sequence-to-sequence prediction [8]. Single-step-ahead forecasting predicts the next load value from historical load values, whereas multi-step-ahead forecasting predicts multiple future values of a time series. Power utility planners and demand controllers rely heavily on multi-step forecasting to ensure the necessary generation for the next few hours or even days [20]. Figure 4 illustrates that day-ahead prediction is the most common; this is especially relevant since production depends on intra-day electricity market negotiations. Next come annual consumption prediction and hour-ahead prediction. Hour-ahead predictions are significant for optimal economic scheduling of generating capacity and generator unit commitment. Year-ahead forecasts, on the other hand, are crucial for energy policymakers; system planning and staff recruitment also rely heavily on year-ahead forecasting.
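The difference between single-step and multi-step forecasting comes down to how the series is framed as supervised input/target pairs. The sketch below (the helper name `make_windows` and the toy series are illustrative, not from any surveyed study) shows the framing:

```python
import numpy as np

def make_windows(series, n_lags, n_steps_ahead):
    """Frame a load series as supervised pairs: `n_lags` past values as
    inputs, the next `n_steps_ahead` values as targets."""
    X, y = [], []
    for i in range(len(series) - n_lags - n_steps_ahead + 1):
        X.append(series[i : i + n_lags])
        y.append(series[i + n_lags : i + n_lags + n_steps_ahead])
    return np.array(X), np.array(y)

load = np.arange(10.0)  # stand-in for an hourly load series
X1, y1 = make_windows(load, n_lags=3, n_steps_ahead=1)    # single-step framing
X4, y4 = make_windows(load, n_lags=3, n_steps_ahead=4)    # multi-step framing
```

A day-ahead forecaster on hourly data would simply set `n_steps_ahead=24`; the model then emits a whole sequence per window rather than one value.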

3) SPATIAL RESOLUTION
It is becoming increasingly difficult to predict load profiles due to external factors such as weather, emergencies, and stochastic human activities. Their correlation might be explained by the distribution of load across spatial scales and the historical load curve. To predict future electric behavior on a larger scale, we must consider both time and space [21]. Throughout the included articles, the spatial resolution is determined by the smallest entity whose electrical energy consumption is modeled. In our study, we used three levels to investigate the granularity of the models: national level, province level, and city level. Figure 5 shows the distribution of entity-level consumption data used for forecasting.
The majority of the analyzed literature focuses on consumption at the national/country level. District- and city-level consumption come next, with similar implementation rates. Since our study concentrates on large-scale forecasting, we excluded studies of consumption at the residential, commercial, and appliance levels.

4) PREDICTOR/INPUT VARIABLES
Electricity and power forecasting are not always linear problems, especially in developing countries where energy demand and consumption are highly erratic due to a shortage of resources [22]. Multiple factors come into play in energy forecasting. Models that are independent of external factors cannot capture the effect those factors have on the prediction and thus produce larger errors; therefore, factor-dependent models outperform univariate models most of the time. A forecasting model incorporates different explanatory variables for each application according to its forecast horizon.
People use power productively when they work or study on weekdays; however, due to the growth of the internet and gadget use, power consumption does not decrease significantly on weekends, since people keep in touch with each other through social media. On national holidays, some people go out for the holiday, so power consumption decreases.
The exogenous variables contributing to better prediction accuracy for VSTLF are temperature and historical load [13]. Calendar features like an hour of the day, workday, weekend and holiday, day of the week, Week number in a year, and Month number in a year are sometimes incorporated for VSTLF [1]. In Table 8, the classification of the contributing factors is illustrated.
In the case of STLF, the factors primarily responsible for influencing consumers' electricity consumption patterns are load, calendar, and meteorological variables. Weather parameters have a high correlation with electricity consumption. As noted in a study by Mir et al. [12], air conditioning generally consumes more electricity during hot weather due to increased temperatures, whereas in winter, electricity consumption tends to fall as temperatures drop. Nevertheless, the increase in load demand in summer is more significant than in winter, resulting in higher demand for utilities. Calendar features mainly incorporate the effects of human activities on electricity consumption; weekends, for instance, tend to be less demanding than weekdays. In another study, Han et al. [23] showed that, due to the long-range dependence of the electric load and the occurrence of certain factors once a year, such as festivals, holidays, and seasons, load data from the same events can be used to predict the future load for events like the Spring Festival and Mid-Autumn Festival in China.
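The calendar features mentioned above (hour of day, day of week, weekend/holiday flags, week and month numbers) are straightforward to derive from a timestamp index. A minimal pandas sketch, using a hypothetical hourly index:

```python
import pandas as pd

# Hypothetical hourly timestamps; 2023-01-01 is a Sunday.
idx = pd.date_range("2023-01-01", periods=48, freq="h")

feats = pd.DataFrame(index=idx)
feats["hour"] = idx.hour                              # hour of the day
feats["dayofweek"] = idx.dayofweek                    # Monday = 0 ... Sunday = 6
feats["is_weekend"] = (idx.dayofweek >= 5).astype(int)
feats["week_of_year"] = idx.isocalendar().week.values # ISO week number
feats["month"] = idx.month                            # month number in a year
```

Holiday flags would be joined in from a country-specific calendar in the same way; categorical features like `dayofweek` are then typically one-hot or binary encoded before being fed to a regression model.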

For MTLF and LTLF, economic variables are incorporated. As an economy develops rapidly, electricity demand is expected to increase; for example, a larger population and growing industry will shape the trends in industrial electricity consumption. The electricity price similarly affects the consumption pattern in the long run: consumption can fall significantly when the price exceeds what enterprises can afford [24]. As a second effect, price influences the distribution of intensive electricity consumption in the industrial sector.

5) DATA PRE-PROCESSING TECHNIQUES
Real-world data is always noisy, inconsistent, and incomplete. If raw data is fed into forecasting models without pre-processing, it is likely to give higher prediction errors; it is impossible to produce accurate and good results with dirty data. In essence, pre-processing transforms dirty data into clean data, mainly by dealing with noise, outliers, and missing values. To improve the performance of the forecasting model, data pre-processing consists mainly of four steps: data cleaning, data aggregation, dimensionality reduction, and data transformation [25]. Pre-processing is especially essential for fitting data into machine learning algorithms.

a: TECHNIQUES TO DEAL WITH OUTLIERS, NOISE, AND MISSING VALUES
Data cleaning mainly deals with missing data, outliers, and noisy data. Jawad et al. [26] interpolated the missing data rather than discarding it, which helps in obtaining real-time data patterns; they used the 'Means by nearby points' method. Other approaches fill in missing values with the mean or the last observed value, and in some cases duplicate rows are dropped. A popular method for removing outliers uses the interquartile range, IQR = Q₃ − Q₁: data lying outside the upper and lower edge lines of the IQR are treated as outliers [27]. One study used the extreme value constraint method (EVCM) to deal with outliers [28]. Zhang et al. [21] proposed a k-medoids clustering approach to remove the effect of outliers and noise, and a local regression filtering technique can also be used to remove outliers [29]. Zheng et al. [30], in contrast, replaced the outliers with the average of adjacent points.
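The interpolation and IQR steps above can be sketched in a few lines. The toy load series below is illustrative (one gap, one spike), not from any surveyed dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical load series (MW) with one missing value and one spike.
load = pd.Series([50.0, 52.0, np.nan, 51.0, 49.0, 53.0, 250.0, 48.0])

# Fill the gap by interpolation rather than discarding it (cf. [26]).
load = load.interpolate()

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (cf. [27]).
q1, q3 = load.quantile(0.25), load.quantile(0.75)
iqr = q3 - q1
mask = (load >= q1 - 1.5 * iqr) & (load <= q3 + 1.5 * iqr)
cleaned = load[mask]
```

The 1.5 multiplier is the conventional Tukey fence; a forecaster may instead replace the flagged points with the average of adjacent values, as in [30], rather than dropping them.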

b: TECHNIQUES TO TRANSFORM THE DATA
Data normalization prevents errors and weights from being dispersed too widely during training. In most cases, the data must be scaled to the [0, 1] or [−1, 1] range. In a study by Houimli et al. [31], the Mapminmax normalization method was adopted, while Cai et al. [32] normalized the samples using z-score normalization; in most cases, min-max normalization is used. Normalization can reduce the error percentage by 5.99% to 7.85% [25]. Data normalization also affects the runtime of the models, which is critical for short-term load forecasting. One study examined the influence of various data normalization methods (z-score, min-max, max, decimal, sigmoid, and softmax) on the run time of the models [33]: the decimal method had the shortest run time at 32 seconds, but softmax normalization gave the lowest RMSE. The Copula notion is another pre-processing method adopted for normalization [34], which enhances peak load forecasting precision. Calendar features are mostly categorical rather than continuous variables. When working with a regression model, calendar features need to be recoded before they can be used, because most regression algorithms can only handle numerical data. Zeng et al. used six binary features to encode the calendar index, standing for Monday-Saturday [35].
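The two most common scalings named above, min-max and z-score, reduce to one line each. A minimal sketch on an illustrative load vector:

```python
import numpy as np

load = np.array([310.0, 450.0, 520.0, 480.0, 390.0])  # illustrative MW values

# Min-max normalization rescales the series to the [0, 1] range.
minmax = (load - load.min()) / (load.max() - load.min())

# Z-score normalization centers on the mean with unit standard deviation.
zscore = (load - load.mean()) / load.std()
```

Whichever scaler is fitted on the training window must be reused unchanged on the test window; refitting on test data leaks information into the forecast.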

c: TECHNIQUES TO REDUCE DIMENSIONALITY
Dimensionality reduction is essential for improving model performance and reducing computational time. In machine learning, several algorithms have trouble training effective models when the number of features is huge relative to the number of observations; this phenomenon is known as the ''Curse of Dimensionality''. Features are crucial inputs to the forecasting models for obtaining better predictions. Dimensionality reduction has two main sub-categories [36]:

1) Feature Selection techniques
2) Feature Extraction techniques

d: FEATURE SELECTION TECHNIQUES
Through feature selection, model complexity can be reduced, computational efficiency enhanced, and generalization error lowered by selecting a subset of features from the original set. Feature selection methods fall into three methodological categories: 1) filter, 2) embedded, and 3) wrapper methods. The primary distinctions between these approaches are: 1) whether the feature selection process is handled independently or incorporated into the learning algorithm as a whole; 2) the evaluation metrics; 3) the level of computational complexity; and 4) the ability to identify redundant features and interactions between features.
According to the bar chart in Figure 6, Correlation analysis is mainly prescribed with a percentage of 45.65%. Other techniques are RF-based, GA-based, fuzzy-based, and clustering techniques.
Wrapper methods use out-of-sample prediction accuracy as the quality criterion for evaluating the appropriateness of a feature subset, conducting an exhaustive search over a large pool of candidate features [37]. This contrasts with the filter method, which uses only in-sample data. Meta-heuristic algorithms are particularly valued here for their searching ability, and the genetic algorithm (GA) is one of the most commonly used feature selection techniques. Shiekhan et al. compared GA-ACO with other feature selection methods (ACO, PCA) and found that GA-ACO outperforms them [38]. Sarkar et al. compared two feature selection techniques, the Pearson correlation coefficient and Recursive Feature Elimination (RFE); their findings suggest RFE is more accurate, as it aims to find the best-performing feature subset and ranks the features based on the order of elimination [39]. In another study, HCSA was adopted for the feature selection process: the best input-output relationship is modeled using the HCSA method, and the outputs of all models are combined to forecast the future [40]. ANFIS is another powerful fuzzy-based feature selection technique [41], [42].
Filter methods might be viewed as a more straightforward and expedient alternative to wrappers. To evaluate the appropriateness of each feature, they simply analyze its statistical relationship with the target, using evaluation criteria such as correlation analysis (CA) or mutual information (MI) as proxies for the model's performance metric. The Pearson correlation coefficient is one of the most used methods to remove redundant features by analyzing the feature correlation matrix. Javed et al. explored various Exploratory Data Analysis (EDA) methods for selecting features [43]; auto-correlation analysis is one of the techniques they used for de-trending the time series. To demonstrate the significance of previous data, [44] applied the partial autocorrelation function (PACF) to the relationship between historical and current load. MI is ideal for evaluating the 'information content' of features in complex classification tasks, since it assesses arbitrary dependence between random variables. However, MI suffers from a significant redundancy issue because it does not properly account for the interdependence between candidate variables [45]; Gao et al. proposed a partial mutual information (PMI) based filter to overcome this limitation [45]. It must be noted, however, that statistical regression algorithms are insufficient to capture the spatial and temporal variations and non-linear patterns of electrical loads in the case of short-term load forecasting [43]. Clustering divides input data points into groups based on similarity measures: data points belonging to the same group should be highly similar, while points belonging to different groups should not. Bedi and Toshniwal applied K-means clustering to the electricity data to find similar consumption patterns between groups of months [25].
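A correlation-based filter of the kind described above is easy to sketch. The synthetic data below is illustrative (a load driven mainly by temperature, plus a weak humidity term and pure noise); the 0.3 threshold is an arbitrary choice for the example, not a value from the surveyed studies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
temperature = rng.normal(25, 5, n)
humidity = rng.normal(60, 10, n)
noise_feature = rng.normal(0, 1, n)

# Synthetic load: strongly temperature-driven, weakly humidity-driven.
load = 100 + 8 * temperature + 0.5 * humidity + rng.normal(0, 5, n)

features = {"temperature": temperature, "humidity": humidity, "noise": noise_feature}
# Filter step: keep features whose |Pearson r| with the load exceeds a threshold.
corr = {name: np.corrcoef(x, load)[0, 1] for name, x in features.items()}
selected = [name for name, r in corr.items() if abs(r) > 0.3]
```

On this data only `temperature` survives the filter. Note the limitation the text raises: Pearson correlation scores each feature in isolation and only captures linear dependence, which is exactly what MI/PMI-based filters try to address.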
Embedded methods combine aspects of filter and wrapper methods and are implemented by algorithms that have their own feature selection built in. LASSO, tree-based, and permutation importance (PI)-based feature selection are among the most well-known examples of these techniques. Evaluating different feature subsets with SVR and ANN is challenging, which makes feature selection difficult for those models [27]. The Random Forest (RF) method of feature selection for high-dimensional feature sets reduces the number of iterations, increasing the efficiency of the search strategy [27], [46]. When the original feature set is used to train an RF model, the permutation importance (PI) of each feature for the prediction can be determined during the training process. On this basis, the sequential backward search (SBS) approach can be applied to choose the ideal features; thus, when RF is utilized for load forecasting, complicated feature selection methods are not required [27]. During the model creation process, the authors utilized a feature selection technique in order to create an accurate and reasonably straightforward model. Elastic nets provide the capacity to pick aggregated or pooled variables, a capability that lasso regression lacks [12].
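Permutation importance itself is model-agnostic: shuffle one feature's column and measure how much the prediction error grows. The sketch below uses a plain least-squares fit as a stand-in for the RF model of [27] (to keep the example dependency-free), on synthetic data where only the first column is informative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400
X = rng.normal(size=(n, 3))
y = 10 * X[:, 0] + rng.normal(0, 1, n)   # only column 0 drives the target

# Stand-in model: ordinary least squares (a real study would fit an RF here).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

base_err = mse(X @ coef, y)

# Permutation importance: error increase when one column is shuffled.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(mse(Xp @ coef, y) - base_err)
```

Shuffling the informative column inflates the error dramatically while the noise columns barely move; a sequential backward search would then drop the lowest-ranked features first, as described above.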
A concise summary of the methods is presented in Table 9.

e: FEATURE EXTRACTION TECHNIQUES
Decomposition techniques help estimate the components of a time series separately. Various decomposition techniques exist: discrete wavelet transform (DWT) [47], wavelet packet transform (WPT) [48], empirical mode decomposition (EMD) [13], [25], ensemble empirical mode decomposition (EEMD) [40], [49], complementary ensemble empirical mode decomposition (CEEMD) [50], variational mode decomposition (VMD) [50], and dynamic mode decomposition (DMD) [28]. The wavelet transform (WT) provides a powerful means of managing non-stationary load behavior by converting a time series into a set of constitutive components. Saleh et al. [47] used DWT to decompose the ARIMA residuals; its main advantage is capturing the non-linear factors. Another advantage is its ability to retain both frequency and spatial information at the temporal resolution, giving DWT the edge over Fourier analysis [51]. The application of dynamic mode decomposition (DMD) is limited to a certain extent by the sample dimension [28], as it is based on error correction. EMD, in turn, suffers from mode aliasing, leading to a false time-frequency distribution. EEMD has been proposed as a solution to EMD's mode mixing problem, although its decomposition rate is slow and it cannot eliminate all white noise. A comparison between EMD and wavelet analysis in VSTLF nonetheless shows EMD to be the more effective of the two, with respective MAPE changes of 9.62 and 5.68% [13]. A comparison of all the decomposition techniques for multi-step-ahead forecasting shows that, for one-step, three-step, and five-step forecasts, VMD ensemble models give the best performance [50]. Nevertheless, for specific use cases the result can differ; for example, wavelet packet decomposition (WPD) gives better results than DWT [48]. Though fundamentally the same, in WPD the detail coefficients are decomposed as well as the approximation coefficients. This helps extract relevant high-frequency components, which is crucial for peak demand forecasting.
PCA and sensitivity analysis are performed to overcome the limitations of statistical regression analysis. Cheng et al. [13] verified that applying PCA to the model improved the MAPE performance by 6.9%. A variant of the autoencoder that has gained attention is the variational autoencoder (VAE). VAE adds restrictions to the encoding process, so that the generated latent vector roughly follows a standard normal distribution. Yang et al. [50] compared VAE and AE and observed that, in the region of Nanjing, MAPE decreased by 61.5%, 23.8%, and 32.1% in one-, three-, and five-step-ahead forecasts using VAE. CNN cascades features of different levels, effectively extracting relationships between continuous and discontinuous data and fusing all the levels into a compelling feature vector that can be used to make predictions [52].
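PCA as used above reduces to an SVD of the centered feature matrix: keep the leading components until a chosen share of the variance is explained. A minimal numpy sketch on synthetic, deliberately low-rank data (the 95% threshold is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical feature matrix: 200 observations of 6 correlated inputs
# generated from only 2 latent factors plus tiny noise.
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(200, 6))

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)               # variance ratio per component
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
X_reduced = Xc @ Vt[:k].T                     # project onto top-k components
```

Because the six inputs were built from two latent factors, at most two components are needed to cross the threshold, illustrating how PCA compresses redundant predictors before they enter a forecasting model.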

6) EVALUATION METRICS
Forecasting accuracy is crucial for evaluating a model's performance, and various error functions can quantify it [43]. The following evaluation metrics are used to assess model performance: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Square Error (MSE), and the R-squared value. Figure 8 shows the distribution of the metrics used for evaluating the performance of the models and demonstrates that MAPE is by far the most widely implemented metric in the articles of our investigation [28].
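The three most commonly reported metrics above have simple closed forms; a minimal sketch on illustrative values:

```python
import numpy as np

def mape(actual, pred):
    """Mean Absolute Percentage Error, in percent (undefined if actual == 0)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * float(np.mean(np.abs((actual - pred) / actual)))

def mae(actual, pred):
    """Mean Absolute Error, in the units of the load."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean(np.abs(actual - pred)))

def rmse(actual, pred):
    """Root Mean Square Error; penalizes large errors more than MAE."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

actual = [100.0, 200.0, 400.0]   # illustrative load values
pred = [110.0, 180.0, 400.0]
```

MAPE's scale-independence is why it dominates cross-study comparisons, though it breaks down near zero loads and asymmetrically penalizes over- and under-forecasts.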

7) ENERGY DEMAND FORECASTING MODELS
Of the 111 studies included, nine unique forecasting models have been highlighted. Table 10 illustrates their categorization.
Figure 9 illustrates the distribution of the models. We observed that ANN is the most frequently implemented forecasting model, appearing in roughly 37.5% of the papers. Time series techniques occupy the second position (18.27%), followed closely by SVR models (16.35%) and LSTM techniques (13.46%), with ANFIS coming last (9.62%). The remaining models appear in insignificant proportions. We can conclude that ANN and stochastic time series models are popular among researchers because of their usability and wide applicability in the load forecasting domain, which is why they are considered top models in the field. Despite their lower prevalence, SVR, LSTM, and ANFIS have an established framework on which several studies have built.

a: ARTIFICIAL NEURAL NETWORK (ANN MODELS)
• Single models Traditional modeling techniques have difficulty capturing complex, non-linear relationships. Because ANN can handle such relationships, it is more prevalent than other methods. Multilayer perceptrons (MLP) trained with backpropagation (BP) are used to design the ANN, which is composed of three layers: input, hidden, and output. Creating an ANN model for peak load forecasting has always been a challenge, because choosing the input variables, constructing the model, and obtaining training data are all important considerations [53]. A successful ANN relies heavily on determining the appropriate network structure; primary consideration should be given to the number of hidden neurons and the selection of hidden- and output-layer activation functions [18]. Another study adopted a new method of selecting the hidden-layer neurons of feed-forward neural networks [54]. Azadeh et al. [16] proposed a seasonal ANN model for STLF that can easily handle non-linear and complex relationships, use exogenous variables as inputs, and handle corrupted data more effectively than regression models. One limitation of ANN is slower convergence to the actual value [22]. Using a backpropagation algorithm, an autoencoder aims to make its output values as close as possible to its input values; this technique has some limitations as well. To tackle the over-fitting problem, a denoising autoencoder (DAE) is adopted, which corrupts the input data to prevent over-fitting, but the vanishing-gradient problem remains. Liu et al. [55] proposed a stacked DAE (SDAE) model that uses a greedy algorithm to pretrain and fine-tune the network, fixing the vanishing-gradient problem. BPNN is well known for its strong generalization capability and quick convergence toward actual data; however, for multistage and weather fluctuations, the issue of low accuracy still exists. Wang et al. 
[56] removed this limitation by integrating a maximum deviation similarity criterion (MDSC) of time segments into BPNN. A comparison between SARIMA and ANN shows that, even though the error trends appear similar, an ANN with calendar data and previous loads outperforms SARIMA. For highly accurate prediction, calendar-data and load feature sets work best with ANN [57]. This multivariate forecasting success is again observed in a study [58] comparing ANN with LSTM in LTLF. The usage and applicability of ANN in STLF are significant, as the research community relies heavily on this method. We conclude that ANN is reliable for unit commitment, fuel allocation, economic scheduling of generation, maintenance scheduling from demand-side management, and similar tasks.
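To make the single-model ANN concrete, the sketch below trains a minimal one-hidden-layer network by plain backpropagation on a toy lag-feature task; the synthetic series, layer width, and learning rate are illustrative assumptions rather than settings from any cited study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy task: predict next-hour load from the previous 3 hours (lag features)
series = 100 + 20 * np.sin(2 * np.pi * np.arange(200) / 24)
X = np.stack([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:]

# Normalize inputs and targets before training
Xn = (X - X.mean()) / X.std()
yn = (y - y.mean()) / y.std()

# One hidden layer with tanh activation (input -> hidden -> output)
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.05

def forward(Z):
    h = np.tanh(Z @ W1 + b1)
    return h, (h @ W2 + b2).ravel()

_, pred = forward(Xn)
loss_before = np.mean((pred - yn) ** 2)
for _ in range(500):
    h, pred = forward(Xn)
    err = (pred - yn)[:, None] / len(yn)   # MSE error signal (up to a constant)
    dh = (err @ W2.T) * (1 - h ** 2)       # backpropagate through tanh
    W2 -= lr * h.T @ err; b2 -= lr * err.sum(0)
    W1 -= lr * Xn.T @ dh; b1 -= lr * dh.sum(0)

_, pred = forward(Xn)
loss_after = np.mean((pred - yn) ** 2)
print(loss_after < loss_before)  # True: training reduced the error
```

The slow-convergence limitation noted above is visible here: plain gradient descent needs many epochs, which is exactly what the meta-heuristic hybrids in the next subsection try to shortcut.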
• Hybrid models Computing efficiency, complexity, and high error percentage are often disadvantages of single methods for load forecasting. Hybrid load forecasting methods and models have been developed over the years to achieve greater accuracy with a lower error rate by overcoming the limitations of single models. The purpose of hybrid models is to make forecasting more accurate, efficient, and effective by combining two or more different methods.
In MLP-based ANN architecture, one of the significant disadvantages is that it requires large MLP structures to fit the actual data [9]. Various meta-heuristic methods are integrated into ANN for tuning hyperparameters. In 2021, Golilarz et al. [59] implemented PSO-ANN for STLF. Particle Swarm Optimization (PSO) also has reduced runtime, i.e., faster convergence than other meta-heuristic techniques. Ardakani and Ardehali [19] proposed an improved PSO-ANN for long-term electrical energy consumption, introducing a mutation operator to overcome its limitations. Another case study, in Jordan [1], used a grey wolf optimizer (GWO) to optimize the neural network. GWO can control the optimal convergence rate and find the optimal number of hidden nodes; fNN-GWO outperformed the other meta-heuristic algorithms (fNN-GA, fNN-PSO, fNN-ACO, fNN-ES, and fNN-WOA). A modified GWO-ANN was proposed by Rizk-Allah et al. [60], overcoming the limitations of GWO-ANN; its main advantages are lower convergence time and error, and its run time of 39.5 s is the lowest among the compared hybrid algorithms. For VSTLF applications, the ANN-LM hybrid approach was proposed to determine the optimal structure for neural networks [31]. The MAPE values ranged between 1.1% and 3.4%, which confirms the performance of this algorithm as judged by the evaluation indicators. For forecasting short-term power demand in developing countries such as Ghana, where power stability is a significant concern, the study [61] proposes the DWT-PSO-RBFNN method, which removes the irregularities within a time series through a decomposition technique (DWT) and optimizes the parameters for the prediction stage. It is capable of handling volatile and complex non-linear time series data. A unique approach, ANN-GA-WT, was proposed for day-ahead peak load forecasting for Iran's national grid [62], where the signal is decomposed through WT and GA is then used to tune the neural network's hyperparameters. 
This resulted in a training time of 95 s instead of 50 minutes, with a lower error. Another feature-selection-based hybrid approach was proposed to deal with complex non-linear time series data [38]; MLP is used for hourly load prediction, and GA-ACO is used for feature selection. Several load-influencing factors are considered in this study, including weather and calendar features. Another experiment, validated in Thailand [63], proposed ANN-GA-PSO, where PSO-GA was implemented in response to the limitation of BP that it can get trapped in local minima. The introduction of GA helps PSO escape local minima, even when it gets stuck in them. ANN applications in LTLF are not as rich as in STLF. Yu et al. [24] propose a hybrid PSO-GA-RBF for long-term forecasting. By self-generating a unique RBF model, it eliminates the drawbacks of traditional learning methods, such as the identification and parameter-selection problems of RBFs. Wu et al. [64] proposed an online training algorithm that uses a rolling mechanism to save computational costs. There is no need to select the inputs for the network when choosing the single multiplicative neuron (SMN) structure. The forecasting model's parameters can also be updated when a new set of observed data arrives (being online), so that the model accommodates changes in its parameters over time. The only limitation is that such models work on small datasets.
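The PSO step underlying many of the hybrids above can be sketched in a few lines. Here the objective is a stand-in quadratic surrogate for a network's validation error over two hyperparameters; the swarm size, iteration count, and coefficients (w, c1, c2) are common textbook defaults, not values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(3)

def objective(p):
    """Stand-in for validation error as a function of two hyperparameters
    (e.g. hidden-neuron count and learning rate); minimum at (3, 0.1)."""
    return (p[0] - 3.0) ** 2 + (p[1] - 0.1) ** 2

n, dims, iters = 20, 2, 100
pos = rng.uniform(-5, 5, (n, dims))          # particle positions
vel = np.zeros((n, dims))                    # particle velocities
pbest = pos.copy()                           # per-particle best positions
pbest_val = np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()     # swarm-wide best

w, c1, c2 = 0.7, 1.5, 1.5                    # inertia, cognitive, social weights
for _ in range(iters):
    r1, r2 = rng.random((n, dims)), rng.random((n, dims))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(np.round(gbest, 1))  # converges near [3.  0.1]
```

In a real PSO-ANN, each `objective(p)` call would train a network with the candidate hyperparameters and return its validation MAPE, which is why the hybrids trade extra training runs for better final accuracy.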

b: RECURRENT NEURAL NETWORK (RNN MODELS)
• Single models Recently, the application of RNN has been rising exponentially because it handles long-term dependence effortlessly, which is crucial for time series forecasting. A recurrent neural network (RNN) is a form of neural network suited to sequential input data such as time series. In contrast to feed-forward neural networks (FFNNs), an RNN can retain the memory of what it has processed through temporal feedback loops. However, RNNs have limitations that are addressed in their extensions, such as LSTM, GRU, BiLSTM, and BiGRU. In 1997, Hochreiter and Schmidhuber proposed long short-term memory (LSTM), which many research groups later modified. LSTM is a variant of RNN that acts as a solution to vanishing and exploding gradients; GRU is modeled similarly to deal with the same issue [65]. Another variant is BiLSTM, which utilizes both forward and backward propagation in contrast to conventional LSTM models. When forecasting seven days ahead in Turkey, the algorithm achieved an R² of 0.73, down from 0.94 for one-day-ahead forecasts [65]. However, studies have also shown the limitation of LSTM in LTLF and multivariate forecasting [66]. Also, large-scale power data, which requires many neurons, cannot be predicted using LSTM's step-by-step prediction method [66]. NARX (nonlinear autoregressive neural network with exogenous inputs) is another RNN variant, used for long-term load forecasting [67]. Compared with other techniques under different study periods and forecasting horizons, the NARX technique proved superior to the other econometric approaches in all studied cases. Another comparison of LSTM against ARMA, ARIMA, and SARIMA shows that LSTM has the slightest deviation in prediction, only 3.2%, favoring its applicability in generation and distribution planning [68]. Phyo et al. [29] observed in their experiment that LSTM outperformed the deep belief network (DBN) with a yearly average MAPE of only 3.79%, achieved merely by smoothing the raw data.
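The gating mechanism that lets LSTM escape the vanishing-gradient problem can be shown in a single forward step. The sketch below implements one (untrained) LSTM cell in NumPy with randomly initialized weights and a made-up normalized load sequence; it illustrates the gate equations only, not a trained forecaster.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: gates decide what the cell forgets, stores,
    and exposes, which is what mitigates vanishing gradients."""
    n = len(h)
    z = W @ x + U @ h + b            # all four gate pre-activations, stacked
    i = sigmoid(z[0:n])              # input gate
    f = sigmoid(z[n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])      # candidate cell update
    c_new = f * c + i * g            # long-term memory (cell state)
    h_new = o * np.tanh(c_new)       # short-term (output) state
    return h_new, c_new

rng = np.random.default_rng(4)
n_in, n_hid = 1, 4
W = rng.normal(0, 0.5, (4 * n_hid, n_in))   # input weights (untrained)
U = rng.normal(0, 0.5, (4 * n_hid, n_hid))  # recurrent weights (untrained)
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in [0.2, 0.5, 0.9, 0.4]:            # a toy normalized load sequence
    h, c = lstm_step(np.array([x_t]), h, c, W, U, b)
print(h.shape)  # (4,): hidden state summarizing the whole sequence
```

Because the forget gate `f` multiplies the cell state rather than re-deriving it each step, gradients can flow across many time steps, which is the property the surveyed STLF studies rely on.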
• Hybrid models RNN models are a better choice for time series data but still have limitations, which hybrid models fix. RNN is a viable option for univariate modeling using only its time lags. Validated in Bangladesh, Rafi et al. [69] adopted a hybrid CNN-LSTM model for STLF, which utilizes the advantages of both CNN and LSTM. One study suggested a hybrid CNN-BiLSTM-AM model to overcome the multivariate feature-usage limitations of LSTM [70]. The CNN layer filters out invalid data and, through feature maps with activation functions feeding a max-pooling layer, selects the top features from the captured data before passing them to the BiLSTM. The attention mechanism (AM) then concentrates the limited computation resources on the vital parts. A similar model, CNN-BiGRU, was adopted by Xuan et al. [46]; it gives higher prediction accuracy because GRU has fewer parameters affecting the prediction than LSTM. To deal with the limitations of multivariate load forecasting in LSTM, Javed et al. [71] proposed a hybrid SRDCC-BiLSTM with improved generalization capability for multi-step and multivariate STLF. Compared with CNN-LSTM, the proposed approach exhibits 35% greater accuracy and can capture local trends in electrical load patterns. To model the periodicity and volatility of electricity consumption data, Zheng et al. presented a Time-Frequency Variational Autoencoder (TFVAE) that simultaneously extracts frequency features and inherent temporal features, outperforming other hybrid autoencoder-LSTM methods [30]. For electricity demand time series forecasting, EMD-LSTM uses EMD to capture the trends of non-stationary signals and deal with the non-linear features (weather, location, timing, etc.) of electricity time series data [25]. 
Another similar but improved approach was introduced, namely VAE-VMD-LSTM, where, in addition to reducing the data dimensions, the VAE contributes to extracting adequate information from the components produced by the VMD decomposition, outperforming EMD-LSTM [50]. For MTLF, RNNs do not scale linearly and, given the vanishing-gradient problem that still affects LSTMs, are ineffective when processing large time-span load series. Han et al. [23] proposed a hybrid approach combining TD-CNN and C-LSTM, in which the temporal correlation is extracted directly using the CNN. It overcomes the LSTM limitations mentioned above while retaining most of the original data by rebuilding a new short sequence from the actual load series. This hybrid approach can forecast from 1 week to 1 year ahead with a computation time of 51 minutes, whereas conventional LSTM requires 216 minutes. For LTLF applications and multivariate analysis, LSTM still appears to struggle [58]. However, LSTM remains a viable option for univariate modeling using only its time lags.

c: MACHINE LEARNING MODELS
According to our studies, most machine learning models are kernel-based techniques such as SVR, LS-SVM, boosting, bootstrap, and ensemble approaches. Vapnik (1995) introduced the support vector machine (SVM) algorithm for classification problems; it can also be applied to regression using loss functions [72]. This usability for regression problems gave rise to support vector regression (SVR) models. SVR can easily handle complex and non-linear data, and it avoids becoming trapped in local minima, a common problem in neural networks [72]. Compared with neural networks, SVR is more reliable at reaching the global optimum when working with systems that have a limited amount of historical data [73]. In a study [72] performed on data from the Iranian national grid, a performance comparison between ANN and SVR showed that for hourly demand forecasting, SVR outperforms ANN on different days with different patterns. Zeng et al. succeeded in integrating calendar and weather features into a hybrid SVR model, improving peak load forecasting accuracy [35]. SVR prediction accuracy is, however, affected by the parameter settings. In a study by Zhang and Guo [74], a hybrid SVR-GA model was proposed, adding weather and electricity-price data to the inputs. Tran et al. [33] analyzed the effects of data normalization methods on grid-search-based optimization of SVR. Softmax normalization was selected in that study, based on results showing that it can significantly improve the accuracy of the grid search algorithm for SVR models during training and testing. Kavousi-Fard et al. proposed a modified FA-SVR which outperformed ARMA, ANN, SVR-GA, SVR-HBMO, SVR-PSO, and SVR-FA by increasing the diversity in the population of fireflies [75]. Decomposition techniques can be adapted to further improve the optimization of the meta-heuristic tools. Such an adaptation was proposed by Niu et al. 
[40], where they used EEMD to decompose the signal; the result outperformed ANN, ELM, SVM, SVM-EMD, and SVM-EEMD. The experiment was validated on four grids, proving its robustness and reliability. Boosting models generated higher MAE and MAPE values than bagging models and were thus unsuitable for real-time scenarios. LS-SVM is a hybrid SVM model that uses least squares as the loss function to optimize the SVM. For long-term forecasting, the LS-SVM model achieved results better than the MLR and ANN models by 1.70 and 0.88%, respectively [22]. LS-SVM models have incorporated optimization techniques such as FOA, HSA, CSA, and MFO [76], [77]. Jawad et al. proposed a unique solution for selecting the least-cost electric load forecasting model using correlated weather variables [26]. MLR, SVR with different kernels, kNN, Random Forest, and AdaBoost were implemented, and the least-cost forecasting model was chosen by minimizing RMSE, MAE, and MAPE. SVR-based electric load forecasting proved to have the best performance and can save PKR 0.313 million per day when used for the Lahore region in Pakistan. In one of the studies, by Chafi and Afrakhte [59], a hybrid WSVM is proposed in which wavelet decomposition is performed to improve the data and a sparrow search algorithm is used for optimization. For capturing the long-term trend of the data, the model must be good at non-linear curve fitting; Wang et al. proposed a hybrid DE-SVR model to overcome the curve over-fitting of BPNN [78]. The proposed model captures the trend of the annual load curve pattern quickly, proving its reliability in LTLF applications.
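The normalization-plus-grid-search workflow discussed above (cf. Tran et al.'s analysis) can be sketched with scikit-learn's `SVR` and `GridSearchCV`; the lag features, grid values, and synthetic load series are illustrative assumptions, and min-max scaling stands in for the softmax normalization used in the cited study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(5)

# Toy hourly-load task: predict load from 3 lag features
series = 100 + 20 * np.sin(2 * np.pi * np.arange(300) / 24) + rng.normal(0, 1, 300)
X = np.stack([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:]

# Normalize first: grid search over SVR parameters is sensitive to scaling
X = MinMaxScaler().fit_transform(X)

# Cross-validated grid search over the RBF-kernel hyperparameters C and gamma
grid = GridSearchCV(SVR(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.1, 1.0]},
                    cv=3, scoring="neg_mean_absolute_error")
grid.fit(X, y)
print(grid.best_params_)  # the C/gamma pair with the best validation MAE
```

The firefly, GA, and PSO hybrids in the surveyed papers replace this exhaustive grid with a guided search over the same (C, gamma) space, which matters once the grid becomes too large to enumerate.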

d: FUZZY LOGIC AND CLUSTERING MODELS
In fuzzy logic systems, fuzzy numbers and sets can be defined over semantic variables to address imprecision in input and output variables [9]. In many studies, data clustering is done through fuzzy techniques. Liu et al. [79] proposed an improved version of fuzzy c-means designed to solve the uniform effect of FCM and give a significant performance boost through robust load profiling. For VSTLF applications, Pati et al. [14] developed an incomplete fuzzy decision system for hour-ahead load forecasting, using rough sets to extract relevant domain knowledge from the load data and thereby lowering computation time. To deal with uncertainties in variables such as calendar and weather, a type-2 fuzzy approach has been proposed [80]. Fuzzifying the dataset generates fuzzy sets, with the total number of fuzzy sets equaling the number of optimal partitions; forecasting and de-fuzzification are then accomplished using the weighted-average method. Jain et al. [81] proposed KmFuzz, which removes the limitation of finding the optimal number of partitions for clustering, with an accuracy of 98%. The adaptive neuro-fuzzy inference system (ANFIS) has the upper hand over ANN in handling complex load patterns: ANFIS is easy to design and more robust than a conventional ANN. A case study performed in India showed that ANFIS provided a MAPE of 0.55589%, proving more accurate than other state-of-the-art models [82]. ANFIS is generally more suitable for large datasets. For LTLF applications, to tackle the non-linear behavior of input variables, a hybrid ANFIS-GEP approach has been developed in which GEP and ANFIS effectively combine the abilities to select model structures and features [41]. ANFIS-GA is another hybrid technique, dealing with the dependence of clustering on the cluster radius by minimizing the SRC value [83]. Panda et al. adopted a hybrid EMD-PSO-ANFIS, which utilized feature selection to impact forecasting accuracy positively [84].
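The fuzzy c-means clustering that underlies several of these load-profiling studies can be written out in NumPy. Below is a plain FCM sketch (fuzzifier m = 2, fixed iteration count) run on two synthetic 2-D "load profile" groups; none of the data or settings come from the cited papers.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: every point receives a membership degree in
    each cluster instead of a hard assignment."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]  # weighted centroids
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1))
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, U

# Two hypothetical groups of daily-load profiles (2-D for illustration)
rng = np.random.default_rng(6)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
               rng.normal([5, 5], 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, k=2)
print(np.round(np.sort(centers[:, 0])))  # centers near 0 and 5
```

In a load-profiling pipeline, each row of `X` would be a customer's or day's consumption curve, and the soft memberships in `U` feed the subsequent per-cluster forecasting model.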

e: STOCHASTIC TIME SERIES MODELS
Several time-series analysis techniques are available for analyzing and forecasting time series, including the Auto-Regressive Integrated Moving Average (ARIMA), Auto-Regressive Moving Average (ARMA), and Seasonal Auto-Regressive Integrated Moving Average (SARIMA). Compared with other approaches, their structure is simpler and faster to develop, but they do not consider external factors that strongly influence consumption. As load profiles no longer follow a homogeneous evolution, ARIMA models cease to work well beyond a threshold forecast horizon. The order of an ARIMA model is selected using the Akaike information criterion (AIC) [85]. The ARIMA model is mainly used to forecast the linear part of the load data because of its linear correlation structure. Using probabilistic forecasts, a planner can evaluate whether the system will be able to generate more electricity in the future than it does today, and, in such a case, when the system will exceed its current maximum capacity [86]. A probabilistic (Bayesian structural time series, BSTS) approach is more appropriate for places with unpredictable load trends [86]; if results seem counterintuitive, BSTS modeling enables one to investigate the origin of the problem by determining the contribution of each component. As a solution to the seasonality issue in time series data, SARIMA has become widespread. Since the transmission peak load has a periodic pattern, SARIMA can be used for mid- to long-term forecasting. A hybrid SARIMA-GPR showed acceptable potential for Cambodia's mid-term load forecasting [87]. The main drawback of SARIMA is that it cannot differentiate between working days and holidays; on the other hand, SARIMA appears to recover quickly from holiday effects [57].
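The AIC-based order selection mentioned above can be sketched for the pure autoregressive case with a least-squares fit in NumPy; the simulated AR(2) series and the candidate order range are illustrative assumptions, and the Gaussian-error AIC formula is used.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(2) load-deviation series: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + e_t
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def ar_aic(y, p):
    """Fit AR(p) by least squares and score it with AIC = m*ln(RSS/m) + 2k."""
    X = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ coef) ** 2)
    m = len(target)
    return m * np.log(rss / m) + 2 * (p + 1)

best_p = min(range(1, 6), key=lambda p: ar_aic(y, p))
print(ar_aic(y, 2) < ar_aic(y, 1), best_p)  # True, and best_p is typically 2
```

Full ARIMA/SARIMA fitting additionally handles differencing, moving-average terms, and seasonal lags, but the selection principle — penalize each extra parameter and keep the order with the lowest criterion — is exactly this.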
Grey modeling is another stochastic technique used for predicting future load patterns, mostly adopted for LTLF applications. The doctrinal methods are GM(1,1) and GM(1,2). A significant advantage of grey modeling is that it describes a system's behavior and reveals its continual process of change. Grey forecasting theory requires only tiny amounts of data. Modeling grey differentiation and time-varying differences with GM(1,1) allows the definition of a grey system of ''one order, one variable'' [88]. The main disadvantage of this method is that, when the forecasting horizon is large, it uses only predicted values for predicting the next period. Therefore, Hamzacebi and Es [89] proposed an optimized GM(1,1) model and validated its robustness in Turkey. For complex patterns, Lee et al. integrated dynamic programming into the forecasting [90].
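GM(1,1) is compact enough to implement directly: accumulate the series, fit the development coefficient and grey input by least squares, and extrapolate the exponential response. The five-year consumption figures below are made up purely to illustrate the small-sample fitting the text describes.

```python
import numpy as np

def gm11_forecast(x0, steps):
    """GM(1,1) grey model: fit on a small sample, extrapolate `steps` ahead."""
    x1 = np.cumsum(x0)                        # accumulated generating series
    z = 0.5 * (x1[1:] + x1[:-1])              # background (mean) values
    # Solve x0[k] = -a*z[k] + b for the parameters a (development) and b
    B = np.column_stack([-z, np.ones(len(z))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    # Whitened-equation response for the accumulated series
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    return np.diff(x1_hat)[len(x0) - 1:]      # undo accumulation, keep forecasts

# Hypothetical annual consumption (TWh) growing roughly 5% per year
hist = np.array([50.0, 52.5, 55.1, 57.9, 60.8])
fc = gm11_forecast(hist, 2)
print(np.round(fc, 1))  # roughly 63.8 and 67.0: the growth trend continues
```

The extrapolation is a pure exponential, which is why GM(1,1) works well for smooth long-term growth but degrades at large horizons where, as noted above, each step feeds only on predicted values.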

8) MODEL COMPLEXITY
In a forecasting system, the computational time is the total execution time a model requires [16]. A computer's processing power, the data length, and the forecasting method are the main factors affecting execution time. When choosing among forecasting methods with similar performance, a model's computation time can be the deciding factor, especially for lower-middle-income countries where resources are somewhat scarce. Hybridization achieves better generalization results than plain artificial neural networks, since ANNs suffer from noise, bias, variance, and inefficiency [63]; once these problems are resolved, an ANN can achieve improved results in less time. LSTM is a deep neural network variant with a higher number of layers, which increases model complexity and affects computation time. In one experiment [58], the total computation time of the LSTM model was 2685.22 s; in another study [69], it was 25 minutes, roughly 1500 s. The inefficacy of LSTM in MTLF and LTLF applications was again observed in a study [23] where it took roughly 50-80 minutes to forecast month- and year-ahead values. It is therefore not recommended to use LSTM without sufficient processing power. Time series techniques are computationally inexpensive because of their simplicity: in one study [91], long-term forecasts took 1.35 seconds and daily predictions over four years took 153.35 seconds. SVR models are computationally expensive owing to long convergence times and the selection of optimal parameters; however, when hybridized with optimization algorithms, these limitations can be removed. Integrating DE and FOA with SVR reduced the time to 78 s and 17 s, respectively [77], [78].

C. QUANTITATIVE ANALYSIS OF THE CASES
This section illustrates the statistical analysis of the cases in Table 7 with bar and box plots.

1) RESEARCH TIME FRAME ANALYSIS
In Figure 10, we observe that load forecasting research has boomed since 2017; recent advancements in smart-grid technology may be one reason. A robust global economy and higher heating and cooling demands in some regions drove a two-fold increase in the growth of global energy consumption in 2018. Incorporating renewable energy into the grid reduces carbon emissions, and solar and wind energy open limitless opportunities for clean energy as the world adopts renewable sources. In 2020, every sector of the world economy took a hit from the COVID-19 pandemic, especially GDP. As GDP is an essential factor affecting the energy sector, researchers worldwide began contributing so that the world, and especially the developing countries, could quickly get back on its feet.

2) FORECASTING TECHNIQUE ANALYSIS
Starting with this section, a statistical comparison is conducted based on the number of appearances the forecasting models have had over the last decade. Previously, we selected the top five models by number of appearances for the comparison cases. For further analysis, we created a box plot of MAPE values by technique category, validating our previous result: all five models show a MAPE below 4%. From the bar chart in Figure 11, we can see that all the techniques have STLF applications. ANN models show an overall higher implementation rate across all horizons. LSTM and ANFIS provide robust models for STLF applications only, while SVR shows implementation across all horizons at a low MAPE. ANN, time series techniques, and SVR models are reliable for mid-term load forecasting; for long-term forecasting, ANN and time series techniques provide more accurate models. Finally, ANN and LSTM models are somewhat reliable for very short-term forecasting.

3) FORECASTING TOOLS ANALYSIS
Forecasting tools become an essential factor when it comes to reproducing an experiment. For utility companies, implementing an online forecast system that automatically updates the training dataset with real-time electricity demand observations each day keeps the forecasts current. Table 11 lists the tools used in the reviewed papers. While most articles do not specify the tools used, MATLAB is the most common. A limited number of articles adopted TensorFlow for modeling and training the deep learning models, specifically LSTM.

4) INPUT FACTORS ANALYSIS
In this section, the impact of different input combinations is presented. From Figure 12, it is evident that historical load data appears in every combination. For LTLF applications, economic factors significantly affect forecasting accuracy in ANN and ANFIS models; otherwise, the load variables alone are enough for techniques such as ANN, SVR, and time series. For MTLF applications, meteorological and calendar features are seamlessly incorporated into all the models except ANFIS, which can utilize economic variables at this horizon to some extent.
For STLF applications, the load shows a strong correlation with meteorological and calendar features in both ANN and SVR models. The ANN model especially shows its flexibility by accommodating most of the input combinations. LSTM models forecast using only the historical load, owing to their long-term dependency handling, and exogenous variables are rarely used in time series analysis. For VSTLF applications, all exogenous variables other than economic ones affect the models' forecasting accuracy. Most models are designed to incorporate exogenous variables for better performance, but these are not necessary for the model to function correctly.
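The calendar features that the short-term models above consume can be derived entirely from the date. A stdlib-only sketch is shown below; the holiday set is a hypothetical example, since the actual lists are country-specific.

```python
from datetime import date, timedelta

def calendar_features(d, holidays=frozenset()):
    """Calendar inputs commonly fed to STLF models alongside weather data."""
    return {
        "day_of_week": d.weekday(),           # 0 = Monday ... 6 = Sunday
        "is_weekend": d.weekday() >= 5,
        "month": d.month,
        "day_of_year": d.timetuple().tm_yday, # captures seasonal position
        "is_holiday": d in holidays,
    }

# Hypothetical holiday list (a single illustrative national holiday)
holidays = frozenset({date(2023, 3, 26)})
for i in range(3):
    d = date(2023, 3, 25) + timedelta(days=i)
    print(d, calendar_features(d, holidays))
```

Flags like `is_weekend` and `is_holiday` let a model learn the distinct weekday/weekend and holiday load shapes that SARIMA, as noted earlier, cannot separate on its own.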

5) SPATIO TEMPORAL ANALYSIS
Besides representing the level of detail in forecasting models, spatio-temporal properties are essential criteria for model selection, since the entity level of the consumption data significantly affects the resolution and horizon of a model's outputs. Figure 13 visualizes the influence of spatial detail on the forecasting model and horizon.
Based on the analysis of spatial resolution in Figure 13, it can be concluded that there is a general tendency to model energy consumption on a national scale for all forecast horizons. Except for short-term forecasting, which emphasizes every level of consumption, relatively few approaches focus on regional and city-level consumption. ANN techniques tend strongly to be used at any level, but mainly at the national level, which demonstrates how flexible this technique is. For LTLF applications, ANN models have been scaled to the national level in half of the cases through extensive socio-economic data. Time series methods are likewise widely used nationally, reflecting their ability to capture trends at large geographic scales. Short time horizons are the only timescales at which LSTM models are implemented. Figure 14 indicates that forecasting on a daily and annual basis is more prevalent in the analysis of forecasting time resolution. Daily steps allow the inclusion of the effects of human activities on consumption, such as daily routines, while still managing the amount of data; daily steps likely offer a reasonable compromise between level of detail, availability, and data quantity for most projects. For yearly steps, the socio-economic factors tracked annually can be incorporated for accurate long-term analysis. Sub-hourly to week-ahead forecasting is effective using RNN techniques like LSTM. Additionally, Figure 14 shows that time series techniques are more commonly used as time steps increase, whereas the usage of AI techniques remains more or less the same. This again shows how flexible ANN approaches are, regardless of the time resolution.

6) FORECASTING ACCURACY ANALYSIS
For our particular investigation, we chose MAPE as the performance metric. From Figure 15, we observe that consumption data at the national level consistently produces the lowest error across all horizons, while data aggregated at the city or area level gives a somewhat higher MAPE. Because of their smoothness and tendency to display seasonal or trend-related patterns, loads aggregated at the country level are easier to predict than lower-level aggregated loads. Human behavior influences the load consumption pattern with varying degrees of randomness, which naturally produces lower precision for disaggregated loads. The boxplot in Figure 16 shows that forecasting model performance is more or less the same across increasing time steps. However, the error is lower for yearly time steps because of their smoothness and tendency to show seasonal patterns. For daily or shorter time steps, incorporating the various exogenous variables that affect the load curve produces an overall lower error at these temporal resolutions [70], [92].

7) STUDY VALIDATION COUNTRY ANALYSIS
From Figure 17, we can see an unequal distribution of research performed across countries. China is leading energy-sector forecasting research among the developing countries; in contrast to other middle-income countries, China has already invested heavily in renewable energy and sustainability. Because of this, China boasts exceptional wind power resources and is the world's largest producer of renewable electricity. While countries such as China, Iran, Turkey, and India have made rapid progress in this sector, several countries are represented by only a single study.

IV. DISCUSSION
Large-scale forecasting is not as easy as it seems, owing to the numerous factors influencing a model's complexity, robustness, and accuracy. The analysis in this study revolves around different cases: our investigation thoroughly analyzes the energy demand forecasting models under various scenarios. For ease of analysis, we chose the models most prevalent in the research community for the analysis cases; their frequent adoption is evidence of their flexibility and robustness. An essential aspect of decision-making is prediction accuracy, which allows stakeholders to understand the performance of the methods and makes the choice of model more defensible. We chose to compare model performance using MAPE, which does not necessarily dictate which model is best, as there is no universal metric for evaluating a model's performance. Table 13 outlines the strengths and applications of the forecasting models summarized in our analysis. Tables 12 and 14 provide a comprehensive forecasting-model selection framework for various scenarios in UMICs and LMICs based on our analysis.
As stated before, the whole point of this systematic review is to answer the generic question we formulated. Consider a sample scenario: suppose we want to forecast hourly electricity demand at the national level for a developing country such as Bangladesh. Using Table 12, we can evaluate the scenario step by step:
1) Bangladesh is a lower middle-income country.
2) Before modeling, it is always recommended to handle outliers and missing data and to normalize the data.
3) Referring to Table 12, both univariate and multivariate options are available for hour-ahead forecasting at the national level.
4) The ANN and SVR models perform well when paired with correlation analysis for dimensionality reduction.
5) Two approaches therefore remain: ANN and SVR. Both are computationally expensive, and computation time is crucial for hourly or daily forecasting, so either model can be used given a high-performance processing unit.
The solution to the scenario is thus to use an SVR or ANN model with historical load consumption data as the input features and correlation analysis as the feature-reduction technique. For model details, readers can consult the references in Table 12.
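The scenario steps above can be sketched end to end. The following is a minimal illustration only, not a model from the reviewed studies: the synthetic data, the correlation threshold, and the helper names are all assumptions, and a plain least-squares fit stands in for the SVR/ANN stage.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column to [0, 1] (step 2: normalization)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def select_by_correlation(X, y, threshold=0.3):
    """Keep features whose |Pearson r| with the load exceeds the
    threshold (step 4: correlation-based dimensionality reduction)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(r) >= threshold)[0]

# Synthetic hourly national load: a daily cycle plus noise, with two
# lagged-load features and one irrelevant noise feature.
rng = np.random.default_rng(0)
t = np.arange(1000)
load = 500 + 100 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)
X = np.column_stack([load[:-2], load[1:-1], rng.normal(size=t.size - 2)])
y = load[2:]

Xn = min_max_normalize(X)
keep = select_by_correlation(Xn, y)   # the noise column is dropped
# Least-squares fit on the reduced features (placeholder for SVR/ANN).
A = np.column_stack([Xn[:, keep], np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
mape = np.mean(np.abs((y - pred) / y)) * 100
```

In a real study, the least-squares step would be replaced by the tuned SVR or ANN model from the corresponding reference in Table 12.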

A. KEY FINDINGS
• STLF and LTLF applications are on the rise due to their versatility. Short-term forecasting is necessary for energy purchasing and operations scheduling, whereas long-term forecasting supports financial and distribution planning. Similarly, day-ahead and annual forecasting are increasingly common: day-ahead forecasting is most relevant for optimal generator unit commitment, whereas annual forecasting helps in system and economic planning.
• When it comes to LTLF applications, economic factors play a prominent role in forecasting accuracy. Forecasting accuracy for mid to short-term applications is affected to some degree by meteorological and calendar features. Historical load data alone can produce good results in time series techniques.
• Across all forecast horizons, energy consumption is typically modeled at the national level. Disaggregated loads are less precise because of variable human behavior and finer data granularity; for long-term forecasting, data aggregated at the city level yields higher error. Daily time steps capture the human activities that influence consumption, such as daily routines, while yearly time steps smooth out seasonal patterns and therefore yield lower error.
VOLUME 11, 2023 39769 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
• Data normalization is essential for increasing accuracy and reducing computation time; it is recommended before fitting data into the model. Dimensionality reduction also plays a significant role in improving the accuracy and computation time of forecasting models. Most techniques adopt correlation analysis for feature reduction, while clustering-based grouping helps detect load curve trends.
• For multi-step forecasting, VAE and VMD-based feature extraction techniques significantly reduce the MAPE. Peak load forecasting is performed more efficiently using WPD than DWT.
• Hybrid models have been observed to overcome the limitations of single models, though they sometimes introduce complex structures. Time series techniques are computationally inexpensive compared to other models owing to their simpler structure.
• Among the optimization techniques, PSO is the most widely adopted and yields the lowest MAPE. In addition to its fast computation time and ease of implementation, PSO is superior because it converges more quickly than other optimization techniques.
• Broadly speaking, ANN is the most widely adopted modeling technique due to its flexibility and ability to handle nonlinear data, and it can be coupled with optimization techniques to mitigate its complexity and computational cost. Time series methods are most commonly applied at the national level because they capture trends at large geographic scales.
• Surprisingly, simulation modeling, a commonly used method for load forecasting, is scarcely applied in developing countries. The most probable reason is the computational requirement of simulation models: they often demand significant computing resources, which may not be available in a developing country with limited access to advanced computing equipment.
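The PSO mechanics that these hybrid models rely on can be illustrated with a minimal sketch. The objective below is a toy quadratic standing in for a validation-set MAPE, and the particle count, inertia, and acceleration coefficients are illustrative assumptions, not values taken from the reviewed studies.

```python
import numpy as np

def pso(objective, bounds, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization: minimizes `objective`
    over box `bounds`, returning the global best position and value."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()                # global best
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([objective(p) for p in pos])
        better = val < pbest_val
        pbest[better], pbest_val[better] = pos[better], val[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Toy objective with a known minimum at (3, -2); in a hybrid
# forecaster, p would instead hold hyperparameters (e.g., SVR C/gamma
# or ANN weights) and the objective would be a validation MAPE.
best, best_val = pso(lambda p: (p[0] - 3) ** 2 + (p[1] + 2) ** 2,
                     bounds=[(-10, 10), (-10, 10)])
```

The same loop works unchanged when the objective wraps a model-training-and-scoring routine, which is how the hybrid ANN-PSO schemes reviewed above are typically assembled.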

V. RECOMMENDATIONS
Based on the findings of the previous sections, this section highlights several areas that could be explored in future research.

A. PERFORMANCE METRIC
The lack of a universal evaluation metric is something that should be explored. Every evaluation metric has a different focus: for example, R² determines how well a model fits the dependent variables and can flag overfitting, whereas RMSE measures goodness of fit in the units of the load itself. Since each metric has its unique traits, it is recommended to report a specific set of metrics to compare models fairly.
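A minimal sketch of how such a metric set might be computed side by side; the sample load values are invented purely for illustration.

```python
import numpy as np

def mape(y, yhat):
    """Mean absolute percentage error, in percent (assumes y != 0)."""
    return np.mean(np.abs((y - yhat) / y)) * 100

def rmse(y, yhat):
    """Root mean squared error, in the units of the load."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def r2(y, yhat):
    """Coefficient of determination: fraction of variance explained."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Invented actual vs. forecast loads (e.g., in MW).
y = np.array([100.0, 120.0, 90.0, 110.0])
yhat = np.array([102.0, 118.0, 95.0, 108.0])
scores = {"MAPE": mape(y, yhat), "RMSE": rmse(y, yhat), "R2": r2(y, yhat)}
```

Reporting all three together, rather than MAPE alone, is the kind of metric set this recommendation advocates.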

B. MODEL ADAPTABILITY
So far, minimal literature exists on adaptive models, which automatically adjust as model parameters and input data change. A model should be able to adapt quickly to changes in load behavior, remain suitable for various situations, and stay reliable, free of unreasonable forecasts caused by exceptional circumstances. We advise focusing on effortless adaptability of the model to an energy management system.

C. SPATIAL DISTRIBUTION OF RESEARCH
The majority of the research is conducted in China, Iran, Turkey, and India; most other developing countries remain in the singularity region, with only isolated studies. Forecasting models behave differently across countries because load consumption patterns, climate, and economies differ. It is highly suggested that future researchers from all over the world explore this research gap in their respective countries.

D. MODEL REPRODUCIBILITY
Reproducibility is another major limitation of the current studies. Most of the literature does not specify the experimental setup, including computation time, tools used, source code, and the specifications of the computer on which the experiment was validated. This drawback makes the models harder to reproduce. Researchers should consider making their source code and experimental setup details publicly accessible.

VI. CONCLUSION
The planning cycle of an electric utility starts with forecasting energy demand and consumption. Developing countries face complex challenges in forecasting as the grid changes shape, with technology turning a demand-driven energy system into a generation-driven one. Several factors contribute to the problem, including a high load growth rate, new technologies at the consumer level, and differences in consumption levels and modes between cities. Because of limited safety margins and increased risk levels, forecasting energy demand has become more critical than ever for the future grid. Our aim is to identify the energy demand forecasting methods relevant to various use cases from a developing country's perspective, identify potential research gaps, and offer recommendations to future researchers and forecasters. To this end, we performed a systematic review of energy demand forecasting articles over the last decade following the PRISMA protocol. The prevailing models identified in our investigation were SVM, ANN, LSTM, ANFIS, and time series techniques. To assess the models comprehensively, we constructed various analysis cases designed to generate deeper insights. ANN appears most often across use cases due to its flexibility and ability to handle non-linear data. For a better outlook on model performance in real-time systems, we considered model complexity alongside the evaluation metrics. Based on the findings, a framework was developed to determine the optimal energy forecasting method for different scenarios, with references. For a better understanding, we showcased the strengths and applications of the forecasting models. We expect this article to be of significance to future forecasters and researchers in the domain of energy demand forecasting in developing countries.
For future work, we intend to explore existing adaptive load forecasting models and how well they perform under changing conditions in smart grid environments. As developing countries transition towards smart grid adoption, this research can have a lasting impact and can inform and guide future work in the field.

APPENDIX A
In this section, the works of literature are categorized based on the analysis cases, making it easier for researchers to access the reviewed studies directly. Table 15 groups the literature based on the factors affecting the load. In Tables 16 and 17, the articles are sorted based on spatio-temporal granularity.