A Review of Multitemporal and Multispatial Scales Photovoltaic Forecasting Methods

Reliable photovoltaic (PV) forecasting provides important data support for power system operation and is key to the large-scale consumption of solar energy resources. PV forecasting has therefore become crucial to ensuring the stability and economic operation of power systems. This paper reviews existing research on PV forecasting methods from the perspective of multiple temporal scales and multiple spatial scales. Firstly, the forecasting methods are classified according to forecasting process, forecasting demand, temporal scale and spatial scale, and the evaluation metrics involved in the research are listed. Secondly, based on the temporal scale of PV power generation, the results are organized into three categories: ultra-short-term, short-term, and medium and long-term prediction. Thirdly, at each temporal scale, the results are subdivided into single-site prediction and regional prediction and reviewed in detail. Finally, the results are analyzed on the basis of the prediction temporal scale, spatial scale and input data. Most recent papers highlight the importance of short-term prediction. Machine learning methods show excellent nonlinear description ability in short-term prediction, and their prediction results are satisfactory. The spatial averaging effect of regional prediction reduces the variability of solar energy, so the regional prediction results are reliable.


I. INTRODUCTION
Solar energy engineering is one of the energy alternatives supporting global sustainable development goals, and photovoltaic panels are the biggest driving force behind the rapid growth of solar power generation [1], [2]. According to the "Renewable Energy Installed Capacity Statistics 2020" report released by the International Energy Agency, the total global installed photovoltaic capacity reached 578 GW in 2019, and new global installed photovoltaic capacity was expected to grow by 14% year on year in 2020 [3]. However, photovoltaic systems are easily affected by environmental factors such as sunlight, season, time of day and geographic location, which makes photovoltaic output stochastic, intermittent and variable [4]. These characteristics reduce the efficiency of photovoltaic output and the power quality of grid-connected photovoltaic systems. The reliability and stability of the power supply connected to the main network are also affected [5], [6]. With the penetration of large-scale solar energy resources into the power grid, ensuring the effective consumption of variable and stochastic solar energy while the power system operates normally has become an urgent problem. Reliable photovoltaic prediction provides important data support for power system operation, and it is the basis and key to the large-scale consumption of solar energy resources [7], [8]. Photovoltaic forecasting helps to adjust the scheduling plan in a timely manner and improves the efficiency of solar energy utilization. Improving the accuracy of photovoltaic prediction is therefore of great significance to the safe and stable operation of the system and to the maintenance of power quality [9], [10].
The associate editor coordinating the review of this manuscript and approving it for publication was B. Chitti Babu.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The existing research has analyzed photovoltaic and solar radiation prediction methods from different angles. From the perspective of forecasting scales, some studies consider the temporal and spatial scales. On the temporal scale, Wang et al. [11] reviewed photovoltaic prediction methods based on deep learning and compared the prediction performance of different deep learning models across time scales. Barbieri et al. [12] summarized the research on and application of large-scale grid-connected photovoltaic power generation. According to the grid operation considered, the authors defined the prediction time range and the spatial-temporal resolution. Ellahi et al. [13] investigated photovoltaic prediction technology with respect to the availability of resource supply in different periods, and they considered the supply scheduling mechanism extracted from these resources. At the spatial scale, Antonanzas et al. [14] summarized solar power generation prediction methods in terms of prediction horizons ranging from hours to days or weeks and spatial scales ranging from a single site to a regional group of plants. Furthermore, the prediction accuracy was included in the economic evaluation of the impact on the power grid. Pazikadin et al. [15] summarized the application of artificial neural networks to solar power generation prediction, and they analyzed and discussed the instruments used for measuring solar irradiance. Based on the temporal and spatial scales, the authors evaluated the photovoltaic prediction performance of different types of neural networks. In view of the vulnerability that intermittent solar and wind energy cause in the power grid, Ssekulima et al. [16] introduced the current main physical, statistical and artificial intelligence prediction methods across time and space, and then identified key prediction problems and future development trends.
From the perspective of forecasting methods, conventional methods and ensemble methods are usually considered. For single methods, Ssekulima et al. [16] and Sobri et al. [17] reviewed photovoltaic prediction technology from the perspective of the current main physical models, time series statistics and artificial intelligence. Among the prediction methods compared, artificial intelligence methods, which can handle nonlinear and complex data structures, have been widely used. Digne et al. [18] introduced prediction technology based on cloud images or numerical weather forecast models when they reviewed the current solar radiation intensity prediction methods. In addition, Dliento et al. [19] reviewed recent research progress in photovoltaic system monitoring, diagnosis and artificial neural network prediction methods. Akhter et al. [20] reviewed and compared machine learning and meta-heuristic methods based on historical data, prediction horizon and input parameters, and they summarized the advantages and disadvantages of the various methods. Das et al. [21] and Mellit et al. [22] comprehensively and systematically reviewed the direct prediction of photovoltaic power, and they discussed the importance of input-output data correlation and of preprocessing the model input data. The authors considered the advantages and disadvantages of different prediction models, including hybrid models, as well as the performance metrics used to evaluate the models. Law et al. [23] summarized the prediction accuracy of numerical weather prediction models, time series analysis methods, cloud motion vector methods and hybrid methods.
Other reviews take different perspectives. Viscondi et al. [24] conducted a systematic literature review (SLR) of big data models for solar photovoltaic prediction, mainly covering machine learning and data mining technology. The authors aimed to identify the most applicable and accurate technology for photovoltaic prediction. Yang et al. [25] showed how text mining technology can be used to study solar radiation and photovoltaic power prediction. The proposed text mining technology can be transferred to other solar engineering topics. Van der Meer et al. [26] mainly introduced the latest progress in probabilistic solar energy forecasting and load forecasting, with a focus on uncertainty prediction and probabilistic prediction. Lai et al. [27] analyzed three types of photovoltaic forecasting demand (point forecast, interval forecast, probability forecast), and for each type they categorized, summarized and reviewed the current forecasting methods and the measures of forecasting performance. They also summarized the factors affecting the accuracy of photovoltaic power generation prediction and the typical technical means of improving it.
Distributed grid operations, which include unit commitment, planning, scheduling, load balancing and load following, are performed on different spatial and temporal scales. The photovoltaic forecasting that supports these operations must therefore also be considered on different spatial and temporal scales. Existing review papers analyze photovoltaic forecasting methods from the perspective of either the time scale or the spatial scale, but they do not combine the two. In this paper, photovoltaic prediction methods are classified and compared from a combined temporal and spatial view. The comparison extends to methods at the same spatial scale on different time scales and to methods at different spatial scales on the same time scale. The differences and correlations of PV prediction methods between different time scales and different spatial scales are discussed.
The structure of this paper is as follows. Section II classifies photovoltaic prediction methods according to prediction process, forecast demand, spatial scale and temporal scale, and it lists the evaluation metrics involved in this study. According to the time scale, photovoltaic forecasting methods are divided into three categories: ultra-short-term (Section III), short-term (Section IV), and medium and long-term (Section V). In each section, the prediction methods are subdivided by spatial scale (single-site, regional). Section VI discusses the correlations and differences of prediction methods at the same spatial scale on different temporal scales and at different spatial scales on the same temporal scale. Section VII summarizes and concludes the paper.

A. CLASSIFICATION OF PHOTOVOLTAIC FORECASTING METHODS
With a large number of photovoltaic plants connected to the grid, accurate photovoltaic forecasting can provide information support for grid dispatching plans and reduce the phenomenon of "abandonment of solar energy". Forecasting methods can be divided into several categories according to different criteria. The classification of photovoltaic forecasting methods is shown in Figure 1. Based on the forecasting process, methods can be divided into direct forecasting and indirect forecasting [28]. Based on the forecasting demand, they can be divided into point forecasting, interval forecasting and probability forecasting [29]. Based on the forecasting time scale, they can be divided into ultra-short-term forecasting (0-4h), short-term forecasting (4h-72h), and medium and long-term forecasting (1m-1y) [30]. Based on the forecasting spatial scale, they can be divided into single-site prediction and regional prediction [31]. For any forecasting process and forecasting demand, photovoltaic output can be predicted from the perspective of the temporal-spatial scale. This article focuses on comprehensively organizing, summarizing and reviewing photovoltaic forecasting methods from the perspective of different temporal and spatial scales.
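The time-scale classification above can be expressed as a small lookup. The following is a minimal sketch, not part of the reviewed literature: the function name `classify_horizon`, the treatment of the boundaries at exactly 4 h and 72 h, and the 30-day month and 365-day year are all assumptions made for illustration. The ranges quoted from [30] leave horizons between 72 h and one month unassigned, so the sketch returns `None` there rather than guessing a class.

```python
from typing import Optional

HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month
HOURS_PER_YEAR = 365 * 24   # assumption: a 365-day year


def classify_horizon(hours: float) -> Optional[str]:
    """Map a forecast horizon in hours to the time-scale class used in the text."""
    if hours <= 0:
        raise ValueError("forecast horizon must be positive")
    if hours <= 4:
        return "ultra-short-term"
    if hours <= 72:
        return "short-term"
    if HOURS_PER_MONTH <= hours <= HOURS_PER_YEAR:
        return "medium and long-term"
    return None  # horizon falls outside the ranges quoted in the text


if __name__ == "__main__":
    for h in (0.25, 24, 100, 24 * 90):
        print(h, classify_horizon(h))
```

A caller that needs a class for every horizon would have to decide how to handle the unassigned interval; the text does not prescribe one.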

B. EVALUATION METRICS OF PHOTOVOLTAIC FORECAST METHODS
The performance and accuracy of the photovoltaic prediction model can be evaluated through several metrics [32]. According to different forecast demands, there are different evaluation metrics. The evaluation metrics involved in this research are shown in Table 1.
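Table 1 is not reproduced in this section, so the following sketch only illustrates how point-forecast metrics of this kind are typically computed. It assumes three widely used metrics (MAE, RMSE and MAPE); whether these match the entries of Table 1 is an assumption. The function names are hypothetical, and the choice to skip zero-valued observations in MAPE (photovoltaic output is zero at night) is one possible convention among several.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero observations."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all observations are zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


if __name__ == "__main__":
    measured = [10.0, 20.0, 30.0]
    forecast = [12.0, 18.0, 33.0]
    print(mae(measured, forecast), rmse(measured, forecast), mape(measured, forecast))
```

Because the three metrics weight large errors differently, results reported with different metrics are not directly comparable.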

III. ULTRA-SHORT-TERM PHOTOVOLTAIC FORECAST METHODS
Ultra-short-term forecasting refers to a forecast time range between 0 and 4 hours with a fine time resolution. When sudden large-scale fluctuations in photovoltaic output occur, accurate ultra-short-term prediction captures transient power information. This gives the power sector strong data support for arranging reasonable preparatory measures, which helps to avoid transient problems in the power grid and to reduce the impact of grid connection. The main challenges of ultra-short-term photovoltaic power prediction are as follows: 1) Because the time scale of ultra-short-term prediction is small, the generation, dissipation and movement of clouds are the main causes of photovoltaic fluctuation. Mining the relationship between clouds and photovoltaic output is one way to improve the prediction accuracy. 2) Because photovoltaic power generation is fluctuating and stochastic and appropriate descriptive models are lacking, it is difficult to provide researchers with a reference prediction model. Researchers therefore need to keep identifying the factors that affect photovoltaic output in each prediction situation and then formulate an appropriate ultra-short-term prediction model. The following analyzes ultra-short-term photovoltaic prediction methods from the perspective of spatial scale (single-site, regional).

A. SINGLE-SITE PHOTOVOLTAIC FORECAST METHODS
Photovoltaic generation relies on the photovoltaic effect, so ground irradiance is the decisive factor that directly affects photovoltaic output. Single-site ultra-short-term photovoltaic forecasting methods are mainly divided into two categories: cloud image-based and data-driven.

1) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON CLOUD-IMAGES
Photovoltaic output is easily affected by meteorological factors. When the forecast time range is less than 4 hours, the generation, dissipation and movement of clouds are the fundamental causes of fluctuation in photovoltaic output. At the same time, the solar radiation received by the photovoltaic power station is also influenced by the geographical environment. To study the relationship between cloud cover and ground irradiance, researchers often adopt cloud image-based prediction methods, which mainly use ground-based cloud images, satellite cloud images or radar cloud images. Firstly, the clouds above the photovoltaic power station are observed by image acquisition instruments (all-sky imager, satellite, radar, etc.), and the cloud images are collected at a fixed sampling interval. Then the cloud images are processed and classified by pattern recognition. Finally, the movement trend of the cloud clusters is predicted to provide data support for ground irradiance prediction. The flow chart of the cloud image-based prediction algorithm is shown in Fig. 2.
Hu et al. [33] proposed an ultra-short-term prediction model based on the dynamic characteristics of overcast clouds. Motion vectors were used to predict the trajectory of the clouds, and the clouds that would block the sun were selected. Digital image processing technology was adopted to extract the dynamic characteristics of the target clouds that have a greater impact on photovoltaic power generation. Meteorological data and historical photovoltaic data were then used as inputs to a radial basis function (RBF) neural network model to predict photovoltaic output. Venugopal et al. [34] established a model based on a convolutional neural network (CNN) to predict the power output of photovoltaic panels over the next 15 minutes. The input data were the photovoltaic power output and ground-based sky images from the past 15 minutes. Ai et al. [35] installed a fisheye camera to collect sky images. The authors implemented optical flow and an adaptive threshold scheme for cloudy days to predict cloud movement, and they built a support vector machine (SVM) model to predict solar irradiance. Chu et al. [36] proposed an automatic Smart Adaptive Cloud Identification (SACI) system for sky imaging and solar irradiance prediction. The system used off-the-shelf fisheye cameras for deployment and adopted Smart Image Classification (SIC) algorithms to combine sky images and solar irradiance measurements. The cloud cover retrieved by SACI was used as the input of an artificial neural network (ANN) model to predict the global average irradiance.
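The cloud-tracking step shared by these methods can be illustrated with a deliberately simplified sketch. It is not the algorithm of any cited paper: it assumes binary cloud masks (1 = cloud, 0 = clear sky), a single uniform translation for the whole frame, and an exhaustive block-matching search in place of the motion-vector or optical-flow estimators used in [33] and [35]. The function names `estimate_motion` and `sun_will_be_covered` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # binary cloud mask: 1 = cloud, 0 = clear sky


def estimate_motion(prev: Frame, curr: Frame, max_shift: int = 2) -> Tuple[int, int]:
    """Return the (dy, dx) shift that best maps `prev` onto `curr`.

    Candidate shifts are tried from smallest to largest so that, when several
    shifts match equally well, the smallest displacement is kept.
    """
    rows, cols = len(prev), len(prev[0])
    shifts = [(dy, dx)
              for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    shifts.sort(key=lambda s: abs(s[0]) + abs(s[1]))
    best, best_cost = (0, 0), float("inf")
    for dy, dx in shifts:
        cost, count = 0, 0
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dy, c + dx
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    cost += abs(prev[r][c] - curr[r2][c2])
                    count += 1
        if count and cost / count < best_cost:
            best_cost, best = cost / count, (dy, dx)
    return best


def sun_will_be_covered(curr: Frame, motion: Tuple[int, int],
                        sun: Tuple[int, int], steps: int = 1) -> bool:
    """Check whether a cloud pixel reaches the sun position after `steps` frames."""
    rows, cols = len(curr), len(curr[0])
    r = sun[0] - motion[0] * steps
    c = sun[1] - motion[1] * steps
    return 0 <= r < rows and 0 <= c < cols and curr[r][c] == 1
```

In a full pipeline the occlusion flag (or a predicted cloud fraction near the sun) would be passed, together with meteorological and historical power data, to a regression model such as the RBF network, SVM or ANN mentioned above.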

2) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON DATA-DRIVEN
The stochasticity and variability of photovoltaic output appear mainly over short prediction time scales. The output power of a photovoltaic power station changes with its location, environment, forecast time and season, and these changes are difficult to detect because of noise. Ultra-short-term prediction therefore mainly considers how to mine these characteristics from the data and how to match the current sequence against the mined features to obtain the predicted value. The mined features can also be used to summarize the patterns of error between the predicted and real values, which helps to correct the predicted value. The framework is shown in Fig. 3.
Data-driven ultra-short-term prediction models include decision tree (DT) models, regression models, Markov chain (MC) models, neural network (NN) models and hybrid models. These models can produce multi-step predictions. For decision trees, Ibrahim et al. [37] adopted the Random Forest (RF) technique to predict hourly solar radiation, and they used the firefly algorithm to optimize the number of trees and the number of leaves in each tree. Compared with the predictions of ANN and SVM, the results of the RF model were closer to the real data. In terms of regression models, Yang et al. [38] used the lasso model to automatically shrink and select the appropriate lagged solar radiation time series data for regression. The model predicted solar radiation over 5 minutes at a time interval of 1 s. The experimental results show that the proposed model is significantly better than univariate time series models and least squares (LS) regression, especially when training data are scarce and predictors are numerous. Regarding the MC model, Hocaoglu et al. [39] assumed that patterns in solar radiation data recur in the historical record, and they adopted the Mycielski algorithm to search the historical solar radiation data for the pattern used to predict the next values. To ensure that the sub-pattern selected from the Mycielski search is the most likely one in the history, the authors used the MC, which requires transforming the measured data into states. As for neural network models, Qing et al. [40] proposed a new solution for day-ahead solar irradiance forecasting using weather forecast data. The forecasting problem was formulated as a structured output problem that predicts multiple outputs simultaneously. The scheme used data from numerical weather prediction (NWP) as the input of a long short-term memory (LSTM) prediction model. To improve the prediction accuracy, the correlation between consecutive hours of the same day was taken into account. For hybrid models, Chen et al. [41] used the K-nearest neighbor (K-NN) method to predict the meteorological data of 8 stations near the target station. The authors then applied an ANN to model the relationship between the K-NN predictions and the solar radiation value of the target site in order to predict the solar radiation value of the next hour.
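As an illustration of the state-based idea behind the MC models, the sketch below discretizes an irradiance series into equal-width states, counts the observed state transitions and predicts the most frequent successor of the current state. It is a first-order Markov chain under stated assumptions, not the Mycielski-based method of [39]: the bin width, the function names and the persistence fallback for unseen states are all choices made for this example.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Sequence

Transitions = DefaultDict[int, Counter]


def to_state(value: float, bin_width: float) -> int:
    """Discretize a measurement into an integer state index."""
    return int(value // bin_width)


def fit_transitions(series: Sequence[float], bin_width: float) -> Transitions:
    """Count how often each state is followed by each other state."""
    counts: Transitions = defaultdict(Counter)
    states = [to_state(v, bin_width) for v in series]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return counts


def predict_next_state(counts: Transitions, current_value: float, bin_width: float) -> int:
    """Return the most frequent successor of the current state.

    If the current state never appeared in the history, fall back to
    persistence (assumption: the state stays the same).
    """
    state = to_state(current_value, bin_width)
    if state not in counts:
        return state
    return counts[state].most_common(1)[0][0]


if __name__ == "__main__":
    history = [50, 150, 250, 150, 250, 150, 50]  # hypothetical irradiance in W/m^2
    model = fit_transitions(history, bin_width=100)
    print(predict_next_state(model, 120, bin_width=100))
```

The predicted state index can be converted back to a value, for example the centre of the bin, if a numeric forecast is required.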

B. FORECASTING METHODS OF REGIONAL PHOTOVOLTAIC POWER STATIONS
A regional photovoltaic power station group usually refers to a collection of photovoltaic power stations of a certain scale within a geographical area. These stations are often located in similar or related geographical locations, and the power they generate is usually bundled and connected to the grid together. Because the stations share the same meteorological change processes, the power generation of each station in the group is correlated.
Based on the regional prediction process, the prediction methods can be divided into three categories: 1) Superposition method: the output of every photovoltaic power station in the region is predicted, and the regional output is obtained by simply adding the predicted results of the individual stations. 2) Extrapolation method: firstly, the distributed photovoltaic power stations in the whole region are divided into several sub-regions. Secondly, the historical data most similar to the current irradiance are selected as the matching object [42], and the output of each sub-region is predicted. Finally, the output of the whole region is obtained by summing the outputs of the sub-regions. 3) Statistical upscaling method: firstly, the whole region is divided into sub-regions, and a representative power station is selected in each sub-region. Then the output of each representative power station is predicted. Finally, the weight coefficient of each representative power station is computed by a statistical method, and the predicted output of the regional photovoltaic power station group is calculated as the weighted sum.
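The difference between the superposition method and statistical upscaling can be made concrete with a short sketch. The text does not specify which statistical method produces the weight coefficients, so the ratio estimator below (total historical sub-region output divided by total historical output of its representative station) is an assumption, as are all function names and the sample numbers.

```python
from typing import Dict, List


def superposition(site_forecasts: Dict[str, float]) -> float:
    """Regional forecast as the plain sum of every station's forecast."""
    return sum(site_forecasts.values())


def ratio_weights(rep_history: Dict[str, List[float]],
                  subregion_history: Dict[str, List[float]]) -> Dict[str, float]:
    """Weight of each representative station (assumed ratio estimator).

    The weight is the historical sub-region output divided by the historical
    output of the station chosen to represent that sub-region.
    """
    return {name: sum(subregion_history[name]) / sum(rep_history[name])
            for name in rep_history}


def upscale(rep_forecasts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Regional forecast as the weighted sum of representative-station forecasts."""
    return sum(weights[name] * value for name, value in rep_forecasts.items())


if __name__ == "__main__":
    # Hypothetical data in MW for two sub-regions, A and B.
    weights = ratio_weights(
        rep_history={"A": [1.0, 2.0, 3.0], "B": [2.0, 2.0, 2.0]},
        subregion_history={"A": [10.0, 20.0, 30.0], "B": [6.0, 6.0, 6.0]},
    )
    print(upscale({"A": 2.5, "B": 1.0}, weights))
    print(superposition({"s1": 1.5, "s2": 2.0, "s3": 0.5}))
```

Superposition needs a forecast for every station, whereas upscaling needs forecasts only for the representative stations plus historical data to estimate the weights.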
Researchers use clustering algorithms to divide the whole region into several sub-regions. To obtain the spatial mapping relationship between the target photovoltaic power station and the reference photovoltaic power station, Zhang et al. [43] adopted a hierarchical clustering algorithm to assess and match the spatial correlation between photovoltaic power stations. A back propagation neural network was applied to predict the ultra-short-term power of the reference power station, and the predicted results were used as the input of the spatial mapping relationship to obtain the output power of the target power station. Jiao et al. [44] used cluster analysis to divide the photovoltaic power stations by region for each weather type, and they then built a spatio-temporal photovoltaic prediction from groups of distributed power stations with consistent meteorology. Saint et al. [45] proposed a probabilistic regional photovoltaic model, which uses an average photovoltaic model with a very limited number of inputs (two module orientation angles) to calculate the generation for the most frequent module orientation angles. The generated power values were then weighted by the probability of occurrence of these orientations to estimate the actual photovoltaic output. Saint et al. [46] used the k-means clustering method to divide the region into multiple sub-regions and a Bayesian method to parameterize the regional photovoltaic model. The deviation from the initial guess and the variance matrix were constrained to regularize the linear system. Generally speaking, cluster analysis consists of four basic steps, which are equally important and closely related, together with a feedback pathway indicating that cluster analysis is a series of trials and repetitions rather than a one-shot process (see Fig. 4).
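The sub-region division step can be sketched with a plain k-means implementation. This is a minimal illustration rather than the procedure of [46]: it assumes that each station is described only by two coordinates (for example latitude and longitude), uses the first k stations as initial centroids so that the result is deterministic, and runs a fixed number of iterations. The hierarchical clustering on spatial correlation used in [43] is a different technique and is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int,
           iterations: int = 20) -> Tuple[List[int], List[Point]]:
    """Cluster station coordinates into k sub-regions.

    Returns one cluster label per station and the final centroids.
    Assumption: the first k points serve as initial centroids.
    """
    centroids: List[Point] = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every station to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


if __name__ == "__main__":
    stations = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
    labels, centroids = kmeans(stations, k=2)
    print(labels, centroids)
```

In practice the feature vector would usually include meteorological or power-output similarity in addition to location, and the number of sub-regions would be chosen through the trial-and-repeat loop described above.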

C. SUMMARY OF ULTRA-SHORT-TERM PHOTOVOLTAIC FORECASTING METHODS
This section analyzes ultra-short-term photovoltaic forecasting methods, which are divided by spatial scale into single-site prediction and regional power station group prediction. For single-site prediction, the cloud image-based and data-driven forecasting methods are summarized. For regional power station group prediction, the methods clearly exploit spatial correlation. The relationship between a single photovoltaic station and the whole power station group is analyzed, and the regional ultra-short-term prediction methods are described.
Because the prediction time scale is 0-4h, the fluctuation in single-site prediction is caused by the movement and dissipation of clouds. The cloud image-based prediction method uses cloud amount and historical data as model inputs to improve the prediction accuracy. The data-driven prediction method improves accuracy by mining features from the data, and it selects models with a strong ability to handle nonlinear prediction problems. Because a regional photovoltaic power station group covers a wide area in which weather changes are not uniform, clustering algorithms can be used to divide the entire area into multiple sub-regions. The sub-regions can then be predicted separately so that the output power curve is fitted with higher accuracy. Table 2 compares the prediction methods, input data and prediction errors at different spatial scales for ultra-short-term photovoltaic forecasting. In terms of prediction error, there is no globally uniform standard for evaluating the prediction error of solar radiation and photovoltaic output. In addition, the data sources used differ between studies. Therefore, the reported prediction errors are not directly comparable.

IV. SHORT-TERM PHOTOVOLTAIC FORECAST METHODS
Short-term photovoltaic prediction refers to a prediction time range of 4h-72h. Short-term prediction helps power system dispatching and management departments to formulate the daily generation plan of conventional power sources and to adjust maintenance plans. It also helps to optimize frequency regulation and spinning reserve capacity. Moreover, accurate prediction provides effective data support for microgrid energy management, power grid dispatching management, pricing, load management and other fields. Short-term photovoltaic power forecasting faces two main difficulties: 1) Owing to the influence of meteorology and environment, short-term forecasts exhibit randomness and uncertainty. Exploring the relationship between photovoltaic output, meteorological factors and environmental factors is one of the main difficulties. 2) At present, researchers mostly adopt shallow learning methods for prediction, but the learning ability of this kind of algorithm is limited. Such methods do not perform well when deep features must be learned, so the prediction accuracy is difficult to improve. To improve the prediction accuracy of the model, researchers need to use existing knowledge to solve the problems of data processing, feature extraction and parameter optimization. In this section, the research results on short-term photovoltaic forecasting methods are introduced from the perspective of spatial scale (single-site, regional).

A. SINGLE-SITE PHOTOVOLTAIC FORECAST METHODS
Due to the short prediction time scale, the factors affecting the short-term prediction accuracy of a single photovoltaic station generally include the fluctuation and non-stationarity of the photovoltaic power generation process and the selection of the data interval and reconstruction step size in the prediction process. Finding the variation characteristics of the unstable solar radiation or photovoltaic process is the key to prediction, because it allows the influence of non-stationarity to be eliminated. The research results on single-site short-term photovoltaic forecasting are analyzed from two aspects: methods based on historical data, and methods based on historical data combined with other meteorological data.

1) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON HISTORICAL DATA
For prediction methods based on historical data, the selection of the data interval and the reconstruction step size directly affects the prediction results. References [47]-[50] study the influence of the length of the historical data on the prediction accuracy. In [51]-[55], the historical time series is decomposed by signal decomposition technology. The block diagram of the method that predicts photovoltaic output or solar radiation by using signal decomposition is shown in Fig. 5. First, the non-stationary sequence is decomposed into a stationary component and a random component. The unpredictable random component accounts for a smaller proportion of the overall amplitude, while the stationary component accounts for a larger proportion. Then, for each group of components, a predictor for the component is trained through phase space reconstruction analysis or regression analysis. Finally, the predicted value of short-term photovoltaic power or solar radiation is obtained by using the parameters of the signal decomposition to reconstruct the signal. Behera et al. [56] adopted the empirical mode decomposition (EMD) method to decompose and filter the photovoltaic series at each time interval. To obtain the denoised sequence, they treated local extreme values far from the rest of the sequence as outliers and filtered them out. Li et al. [57] used wavelet packet decomposition (WPD) to decompose the original photovoltaic power series into four subsequences, and they trained an LSTM prediction model for each subsequence. The prediction results of the LSTM networks were reconstructed, and the final prediction was obtained by linear weighting. Based on Variational Mode Decomposition (VMD), Zang et al. [58] decomposed the historical photovoltaic time series into different frequency components. They arranged the frequency components into a two-dimensional data form with correlations on both daily and hourly timescales, which can be extracted by convolution kernels. The VMD residual time series were refined into high-level features by a CNN, which reduces the data size.
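The decompose, predict and reconstruct workflow described above can be sketched in a few lines. The example deliberately swaps in much simpler building blocks than those of the cited studies: a trailing moving average stands in for EMD, WPD or VMD, and the mean of the most recent values stands in for a trained per-component predictor such as an LSTM. The window length, lag count and function names are assumptions.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early points average over the samples available."""
    smooth = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smooth.append(sum(chunk) / len(chunk))
    return smooth


def decompose(series: Sequence[float], window: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth component and a residual component."""
    smooth = moving_average(series, window)
    residual = [s - m for s, m in zip(series, smooth)]
    return smooth, residual


def predict_component(component: Sequence[float], lags: int = 2) -> float:
    """Placeholder predictor: mean of the last `lags` values of one component."""
    recent = component[-lags:]
    return sum(recent) / len(recent)


def forecast_next(series: Sequence[float], window: int = 3, lags: int = 2) -> float:
    """Predict each component separately, then reconstruct by summation."""
    smooth, residual = decompose(series, window)
    return predict_component(smooth, lags) + predict_component(residual, lags)


if __name__ == "__main__":
    power = [2.0, 4.0, 6.0, 8.0]  # hypothetical PV power samples in kW
    print(forecast_next(power))
```

Reconstruction here is a plain sum because the two components add up exactly to the original series; [57] instead combines the component forecasts with linear weights.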
Compared with ultra-short-term prediction, short-term prediction covers a longer time scale, so the uncertainty of the photovoltaic power generation process also increases. At this scale, prediction methods based on state transition and linear regression may no longer be applicable. Instead, methods that search deeply for high-dimensional features hidden in the data, including NN, SVM, DT and other soft computing methods, are used [59]-[65]. These methods take the solar radiation and photovoltaic sequences of the past period as input vectors and the predicted single-step or multi-step solar radiation or photovoltaic output as output variables. A low-cost and more robust black box model is used to simulate this dynamic relationship. Some researchers predicted photovoltaic output through a simulated ANN, which overcomes the limitations of traditional networks, namely that the optimal structure is difficult to obtain and that training tends to fall into a local optimum [66]. In [67], the authors proposed a probabilistic Bayesian learning technique to estimate the parameters of a multilayer perceptron NN. The Bayesian framework makes it possible to obtain confidence intervals and to estimate the error bars of the NN predictions, which helps to simplify the network structure and the learning process. In [68], researchers adopted the periodic graph method to extract the random component of photovoltaic power and the least squares (LS) method to determine the dynamic weight of each NN. The random component was predicted through the combined model and then superimposed on the periodic component to obtain the final photovoltaic prediction result.

2) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON HISTORICAL AND OTHER METEOROLOGICAL DATA
Other meteorological data include measurements of the meteorological environment, such as atmospheric temperature, atmospheric pressure, relative humidity and geographic information around photovoltaic power stations, as well as NWP information derived from these measurements and dynamic equations. In the prediction methods discussed in this subsection, meteorological information is combined with solar radiation information as the model input.
With the addition of meteorological and environmental information such as temperature, humidity, air pressure and geographic location, it becomes more likely that factors relevant to the change process of photovoltaic power generation are found. For data-driven prediction methods, increasing the dimension of the input data helps to reveal the change characteristics of the photovoltaic system. Reference [69] analyzed the relationship between each weather variable and photovoltaic output. It proposed a data-driven recursive arithmetic average integration model, which iteratively improves the weights of the independent sub-models through a recursive process and is suitable for processing large amounts of data. Reference [70] proposed a multi-time-scale data-driven model to predict single-step photovoltaic output based on the spatial and temporal correlation between adjacent solar stations. NWP has a long history: NWP models have been under development since 1904, and gradual improvements in calculation accuracy have produced more accurate weather forecasts [71]. In [72]-[75], the short-term predicted power is obtained from NWP results by using a photoelectric conversion model, in which solar irradiance is the most important meteorological factor affecting photovoltaic power generation.
There are studies using feature analysis methods such as clustering or component analysis to find similar photoelectric change processes from historical data and meteorological data. According to different processes, the prediction models are designed to realize the prediction of photoelectric or solar radiation. In [76], the kernel principal component analysis (PCA) was adopted to preprocess the sample data of weather type, temperature, humidity, atmospheric pressure, wind speed, etc. And the obtained characteristic sequence was used as the input of Elman network to predict photovoltaic. Reference [77] discussed the influence of missing data in photovoltaic prediction. For the training and testing data of SVM prediction model, K-NN method was applied to fill the missing data. The prediction effect is good. In [78], Fuzzy C-Means clustering algorithm (FCM) was used to cluster the original meteorological data according to the membership degree. Fig.6 summarizes the methodology framework for short-term photoelectric / solar radiation prediction using abundant meteorological data. Firstly, different types of data are preprocessed with the same resampling interval to form a high-dimensional input vector based on historical data. Then, in order to determine the parameters of the predictor, the high-dimensional regression characteristics of the vector and its delay matrix are analyzed by using the method of correlation analysis, principal component analysis, empirical mode analysis or spectral analysis. Finally, the predicted results of photovoltaic generation/solar radiation can be obtained by input the feature sequences into the predictor.
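As an illustration of the missing-data step mentioned above, the following sketch fills a missing power value with the mean of the k nearest complete records. It is a plain K-NN imputation written for this example, not the procedure of [77]; the function name `knn_fill`, the column layout and the sample values are assumptions, and in practice the features would normally be scaled before distances are computed.

```python
import math


def knn_fill(rows, target_idx, k=2):
    """Fill None in column target_idx with the mean of that column
    over the k nearest rows that have the value present."""
    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for r in rows:
        if r[target_idx] is not None:
            filled.append(list(r))
            continue

        def dist(other, row=r):
            # Euclidean distance over all columns except the target one.
            return math.sqrt(sum((a - b) ** 2
                                 for i, (a, b) in enumerate(zip(row, other))
                                 if i != target_idx))

        nearest = sorted(complete, key=dist)[:k]
        new_row = list(r)
        new_row[target_idx] = sum(n[target_idx] for n in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Columns: temperature (deg C), relative humidity (fraction), PV power (kW).
# Hypothetical records; the last one has a missing power value.
data = [
    [25.0, 0.40, 3.0],
    [26.0, 0.42, 3.2],
    [15.0, 0.80, 1.0],
    [25.5, 0.41, None],
]
result = knn_fill(data, target_idx=2, k=2)
```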
Short term prediction models such as neural network, support vector machine, time series method, etc. [79]- [92] are widely used in photovoltaic forecasting. By using meteorological data to analyze the randomness and uncertainty of photovoltaic, the generalization ability of the prediction model is improved and the power curve of actual photovoltaic is approximately fitted.

B. FORECASTING METHODS OF REGIONAL PHOTOVOLTAIC POWER STATIONS
The forecast for a photovoltaic station group is an overall estimate of photovoltaic output. Assuming that the generation powers of the N photovoltaic power stations in a group are independent and identically distributed, then, in line with the central limit theorem, the variance of the average generation power is 1/N of the variance of the generation power of a single power station. As a result, the stochasticity and non-stationarity of the generation power of a regional photovoltaic power station group are weaker than those of a single photovoltaic power station.
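The 1/N relation can be checked numerically. The sketch below is a simple Monte Carlo experiment under the idealized assumption that station outputs are independent standard normal fluctuations; the number of stations, the sample count and the random seed are arbitrary choices for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the experiment is repeatable

N_STATIONS = 25
N_SAMPLES = 20000

# Fluctuation of a single station (zero mean, unit variance).
single = [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]

# Fluctuation of the average over N_STATIONS independent stations.
group_mean = [
    statistics.fmean(random.gauss(0.0, 1.0) for _ in range(N_STATIONS))
    for _ in range(N_SAMPLES)
]

# Under the i.i.d. assumption this ratio should be close to 1 / N_STATIONS.
ratio = statistics.pvariance(group_mean) / statistics.pvariance(single)
```

Real stations in the same region are positively correlated, so the actual reduction in variance is smaller than this idealized 1/N bound.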
Photovoltaic output is not only autocorrelated in time to a certain degree, but photovoltaic sequences at nearby geographical locations are also highly similar. The degree of similarity can be described by the spatial correlation of photovoltaic. The uncertainty factors of cloud disturbance and weather change at neighboring photovoltaic power stations are similar to some extent, with a certain delay. Therefore, there is an obvious spatial correlation between photovoltaic power stations. The stronger the correlation, the more synchronized the change trends of the sequences. In [93], the spatio-temporal correlation of the data of a distributed photovoltaic system was studied in depth, and a Bayesian network prediction model based on spatio-temporal correlation was established. Reference [94] conducted spatial clustering of photovoltaic power stations. The power output of each cluster was estimated and predicted by selecting representative points at the regional centroid, and the model outputs were averaged to obtain the regional prediction results. In [95], sub-regions were divided by EMD and hierarchical clustering. The representative power stations were selected based on the minimum redundancy maximum correlation (MRMC) criterion. The Elman NN was used to predict the power generation of the representative power stations in each sub-region to realize the photovoltaic prediction of the whole region. Reference [96] adopted a generalized parametric covariance function to model the correlation between the irradiances at observation points and used the space-time Kriging method to predict solar radiation. In [97], a chain structure of echo state networks (ESN) was proposed. Autocorrelation analysis was applied to the time information of each spatial variable to provide appropriate input for each ESN module. References [98]-[103] analyzed the spatial correlation of the photovoltaic sequences of the power stations in a region and used the similarity of photovoltaic output changes between power stations to obtain the trend of power generation. In [104], a number of solar radiation measurement stations in the northern states of the United States were modeled as a spatial undirected graph. A convolutional graph autoencoder (CGAE) probability model was used to produce spatio-temporal probabilistic predictions of solar irradiance, so that the distribution of the future irradiance at each station could be estimated given the historical radiation observations. Fig. 7 shows the framework of the regional photovoltaic prediction method based on spatial correlation. Firstly, the actual photovoltaic data of each power station are standardized. According to the spatial similarity of the data, the whole region is divided into N sub-regions. Then, the representative power station is selected by calculating the mutual information between the predicted power of each station and the regional power. The photovoltaic prediction for this station is carried out using meteorological data. Finally, the regional power prediction results are obtained by statistical upscaling of the regional power prediction model.
VOLUME 10, 2022
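The representative-station and upscaling steps of this framework can be sketched as follows. Note the substitutions: Pearson correlation is used here in place of mutual information, and upscaling is done by a simple installed-capacity ratio. Both are assumptions made to keep the example short, as are the function names and the station data.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def pick_representative(stations):
    """Return the station whose history correlates best with the
    total output of its sub-region."""
    total = [sum(vals) for vals in zip(*stations.values())]
    return max(stations, key=lambda name: pearson(stations[name], total))


def upscale(rep_forecast, rep_capacity, region_capacity):
    """Scale a representative-station forecast by the capacity ratio."""
    factor = region_capacity / rep_capacity
    return [p * factor for p in rep_forecast]


# Hypothetical sub-region: historical output (kW) and capacity (kW).
history = {
    "A": [1.0, 2.0, 3.0, 2.0],
    "B": [1.2, 2.1, 2.9, 2.2],
    "C": [3.0, 1.0, 2.0, 3.0],
}
capacity = {"A": 4.0, "B": 4.0, "C": 4.0}

rep = pick_representative(history)
regional = upscale([1.5, 2.5], capacity[rep], sum(capacity.values()))
```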

C. SUMMARY OF SHORT-TERM PHOTOVOLTAIC FORECASTING METHODS
At present, the vast majority of photovoltaic forecasting work is aimed at short-term forecasting. At the same time, there is much less research on short-term regional forecasting than on short-term single-site forecasting. Compared with ultra-short-term forecasting, short-term forecasting has a larger time scale, and the stochasticity, variability and non-stationarity of photovoltaic have an obvious influence.
When only solar radiation/photovoltaic output power is used as input, the choice of data and model is very important. The photovoltaic power generation process is so complex that it is difficult to reproduce fully. Generally, the existing data and methods can only approximate the real photovoltaic power generation process. In terms of data, an appropriate data type and scale will reduce the variability of the data series and increase the correlation between the input and the predicted value, which helps to detect the photovoltaic characteristics hidden in the data. In terms of the model, the main considerations are how to find the features in the data and how to find a short-term prediction method that matches these characteristics. Besides trial and error, principal component analysis, autoregressive integrated moving average and other methods can actively detect the hidden features of photovoltaic/solar radiation sequences. With suitably selected data and models, it is possible to simulate the dynamic change process of photovoltaic/solar radiation and achieve high-quality photovoltaic prediction. The prediction methods based on historical data are shown in Table 3.
If other meteorological information is used, the data become much richer and it becomes more likely that data closely related to the change process of photovoltaic can be found. Numerical weather prediction (NWP) based on meteorological characteristics can help to provide more accurate and relevant meteorological data (such as solar radiation, temperature, humidity, wind speed, etc.). By using meteorological data to analyze the randomness and uncertainty of the photovoltaic power generation process, the generalization ability of the prediction model is improved and the actual photovoltaic power curve is approximately fitted. The forecasting methods that combine historical data with other meteorological data are shown in Table 4.
Compared with a single photovoltaic power station, a power station group covers a larger area. The overall power generation process has a spatial averaging effect over the terrain environment, which reduces the variability of solar energy and reduces the error coefficient. Therefore, regional prediction is more reliable than single-site prediction. The regional photovoltaic prediction method based on spatial correlation describes the similarity of photovoltaic sequences at nearby geographic locations. Cluster analysis methods such as k-means and hierarchical clustering are often used to divide the entire area into multiple sub-regions. Representative power stations are used to forecast the photovoltaic power in each sub-region, so that short-term single-site forecasting methods can be applied to the power generation prediction of regional power stations. Table 5 compares the short-term regional forecasting methods.

V. MEDIUM AND LONG-TERM PHOTOVOLTAIC FORECAST METHODS
The medium and long-term prediction time scale of photovoltaic is more than 72 h. Through the statistical analysis of long-term meteorological data, geographic location information and historical solar radiation data in a certain region, the photovoltaic situation for horizons beyond 72 hours or within one year can be obtained, which can be applied to the site selection of new photovoltaic power stations and the economic value assessment of new power plants [105].
The medium and long-term photovoltaic forecasting methods are again analyzed below from the perspective of spatial scale (single site, regional).

A. SINGLE-SITE PHOTOVOLTAIC FORECAST METHODS
Medium and long-term photovoltaic/solar radiation prediction has a significant influence on the siting of photovoltaic facilities. The terrain and meteorological conditions differ between locations and change over time, so accurately estimating the power generation potential of a new location is a challenge. For the medium and long-term photovoltaic/solar radiation prediction of a single site, existing studies mostly apply deterministic models or data-driven models as prediction models. This section summarizes and analyzes the medium and long-term power generation/solar radiation prediction methods based on these two types of prediction models.

1) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON DETERMINISTIC MODEL
The deterministic model can be further divided into empirical parameterized models and physical models according to the mathematical form. At present, the empirical parameterized model is a widely studied method for predicting global solar radiation. Most stations routinely measure meteorological data such as sunshine duration, temperature, water vapor, clouds, pressure and wind speed, so empirical models can be used to estimate solar radiation from these meteorological variables. Empirical model parameters are often specific to a location, and local calibration is always required. Using empirical models to predict medium and long-term solar radiation at designated locations supports solar energy assessment and the siting of photovoltaic facilities. In the past few decades, a large number of empirical models have been developed. These models adopt commonly used meteorological data and geographic information to predict and estimate global solar radiation. They are widely used because of their low computational cost and reasonable prediction accuracy. Empirical models can be divided into sunlight-based models, temperature-based models, other weather models and hybrid models. The duration of sunshine is the most commonly used parameter for estimating global solar radiation. Since most weather stations in the world can measure the duration of sunshine reliably and accurately, the sunlight-based model has become the most widely used empirical model for solar radiation estimation. Angstrom and Prescott proposed the first and most widely used empirical model for estimating monthly average daily global solar radiation. It is referred to as the AP model [106] and is based on radiation on a horizontal surface. The expression is shown in formula (1).

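AP model:
$$\frac{H}{H_0} = a + b\,\frac{S}{S_0} \tag{1}$$
where $H$ is the global solar radiation on a horizontal surface, $H_0$ is the extraterrestrial solar radiation, $H/H_0$ is the clearness index, $S$ is the sunshine duration, $S_0$ is the potential sunshine duration, $S/S_0$ is the sunshine duration ratio, and $a$ and $b$ are empirical coefficients.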
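Because the AP coefficients require local calibration, a minimal sketch of such a calibration may be helpful. It fits the linear AP relation H/H0 = a + b·(S/S0) by ordinary least squares. The function names and the data are assumptions for this example; the data are synthetic values generated from known coefficients, not measurements.

```python
def fit_ap(sunshine_ratio, clearness_index):
    """Ordinary least-squares fit of H/H0 = a + b * (S/S0)."""
    n = len(sunshine_ratio)
    mean_x = sum(sunshine_ratio) / n
    mean_y = sum(clearness_index) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sunshine_ratio, clearness_index))
    sxx = sum((x - mean_x) ** 2 for x in sunshine_ratio)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_h(a, b, s, s0, h0):
    """Estimate global radiation H from sunshine duration with the AP model."""
    return h0 * (a + b * s / s0)


# Synthetic monthly data generated from a = 0.25, b = 0.50 (not measured).
ratios = [0.3, 0.5, 0.7, 0.9]
kt = [0.25 + 0.50 * r for r in ratios]

a, b = fit_ap(ratios, kt)
# s and s0 in hours, h0 and the result in MJ m^-2 day^-1 (assumed units).
h = estimate_h(a, b, s=8.0, s0=12.0, h0=30.0)
```

With noisy station records, the fitted coefficients would of course deviate from any "true" values, and their stability across seasons would need to be checked.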
Many researchers have verified and improved the AP model. Besharat [107], Giwa [108], Chukwujindu [109] and Yao [110] verified the reliability of the AP model in Iran, Nigeria, Africa and China, respectively. Mecibah [111] related the monthly average daily solar radiation of six cities in Algeria to the monthly average sunshine records and temperature data. The AP model was modified, and its performance for the six Algerian cities was optimized with quadratic and cubic regression models. Bayrakci [112] summarized the polynomial (linear, quadratic and cubic), logarithmic, exponential, power, rational and combined function regression models based on the AP model. The summarized models were used to estimate the monthly mean global solar radiation for the city of Muğla in southwest Turkey. Studies [113]-[115] have also shown that modified forms of the AP model (such as quadratic, cubic, logarithmic, exponential, power, rational and combined function forms) may improve global solar radiation estimates. In addition to the most widely used AP model, empirical models based on sunlight include the Rietveld model [116], Ogelman model [117], Bahel model [118], Newland model [119], Jin model [120], etc. Some sunlight-based models are listed as follows.
Jin model : where ϕ is the latitude of the site, and a, b, c and d are empirical coefficients.

b: TEMPERATURE-BASED EMPIRICAL MODELS
Temperature is a commonly used measurement parameter. Due to the common availability of daily maximum and minimum temperatures, researchers have proposed empirical methods to estimate solar radiation based on these variables.
The temperature-based model assumes that the difference between the highest and lowest temperature is directly related to the proportion of extraterrestrial radiation received at the ground. Hargreaves and Samani proposed an empirical equation that uses only the highest and lowest temperature to estimate solar radiation, referred to as the HS model [121]. The specific expression is shown in equation (7), and the coefficient a for arid and semi-arid regions is set to 0.17. In addition, Bristow and Campbell developed the Bristow-Campbell (BC) model [122], whose empirical formula takes the form of an exponential function. The specific expression is shown in equation (8). The coefficient a represents the maximum radiation that can be expected on a clear day, and the coefficients b and c control how quickly a is approached as the temperature difference increases. Apart from the HS and BC models, temperature-based models include the Allen model [123], Hunt model [124], Goodin model [125], Chen model [126], etc. Many researchers have analyzed and studied temperature-based empirical models. Sharifi [127] adopted five temperature-based empirical models together with wavelet regression, artificial neural networks and gene expression programming to estimate solar radiation values and analyze performance. Li et al. [128] divided areas according to solar radiation and then estimated the solar radiation in each radiation area with the best-performing temperature-based empirical model. Hassan et al. [129] proposed a new global solar radiation estimation model based on ambient temperature, which was tested at ten different locations around Egypt.

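HS model:
$$\frac{H}{H_0} = a\,(T_{max} - T_{min})^{0.5} \tag{7}$$
BC model:
$$\frac{H}{H_0} = a\left[1 - \exp\left(-b\,\Delta T^{c}\right)\right] \tag{8}$$
Allen model : Hunt model : Goodin model : Chen model :
where $T_{min}$ is the minimum air temperature, $T_{max}$ is the maximum air temperature, $\Delta T$ is the difference between the maximum and minimum air temperatures, and $a$, $b$, $c$ and $K_{ra}$ are empirical coefficients.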
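For a concrete sense of how the two main temperature-based models behave, the sketch below evaluates the HS and BC models in their commonly cited forms, H = a·(Tmax − Tmin)^0.5·H0 and H = a·[1 − exp(−b·ΔT^c)]·H0. The function names, the BC coefficient values and the input temperatures are assumptions chosen for illustration only; only the HS default a = 0.17 comes from the text.

```python
import math


def hs_model(h0, t_max, t_min, a=0.17):
    """Hargreaves-Samani estimate of global solar radiation.

    a = 0.17 is the value quoted for arid and semi-arid regions.
    """
    return a * math.sqrt(t_max - t_min) * h0


def bc_model(h0, t_max, t_min, a, b, c):
    """Bristow-Campbell estimate; a caps the clear-day transmittance,
    b and c shape how fast it is approached as delta_t grows."""
    delta_t = t_max - t_min
    return a * (1.0 - math.exp(-b * delta_t ** c)) * h0


# Hypothetical day: H0 in MJ m^-2 day^-1, temperatures in deg C.
h_hs = hs_model(h0=30.0, t_max=31.0, t_min=15.0)
# Illustrative BC coefficients; real values need local calibration.
h_bc = bc_model(h0=30.0, t_max=31.0, t_min=15.0, a=0.7, b=0.01, c=2.0)
```

Both functions return radiation in the same units as `h0`, which makes it straightforward to compare them against measured values when calibrating the coefficients.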

c: OTHER WEATHER MODELS AND MIXED EMPIRICAL MODELS
In addition to models based on sunlight and temperature, many researchers have established empirical models based on meteorological parameters such as relative humidity [130], precipitation [131], clouds [132], [133] and water vapor [134]. At the same time, mixed models composed of parameters such as sunlight, temperature, relative humidity, precipitation and altitude have also been extensively studied. Fan et al. [135] introduced precipitation and relative humidity data as inputs to the temperature model for tropical and subtropical regions in China, which can greatly improve the accuracy of a temperature-only model. In order to estimate solar radiation in the Mediterranean, Yildirim et al. [136] added relative humidity to a sunlight-only model. The results showed that the newly proposed model improves the predictive accuracy of solar radiation. Jahani et al. [137] introduced a complex model that takes temperature, precipitation, relative humidity and sunshine duration as inputs, and verified its accuracy. Fan et al. [138] combined sunshine time, overall radiation, temperature and relative humidity to form a hybrid model, and then verified the prediction accuracy of the proposed hybrid model in five different climate regions in China. Pandey et al. [139] built an empirical model by correlating the diffusion ratio with the relative sunshine duration. They estimated the monthly average daily diffuse solar radiation at four main locations in India.

2) SINGLE-SITE PHOTOVOLTAIC FORECASTING METHODS BASED ON DATA-DRIVEN MODEL
In addition to the deterministic models discussed in the previous subsection, data-driven models are also used for medium and long-term prediction. The prediction methods are similar to those described in the previous two sections. Aslam et al. [140] adopted gated recurrent unit (GRU), LSTM, feedforward NN, SVM and RF to predict the annual solar radiation. The results show that GRU predicts better than the other models. To predict the photovoltaic capacity one month ahead, Jung et al. [141] trained an LSTM on data sets (power plant capacity, power transaction data, weather conditions and estimated solar radiation) collected from 164 photovoltaic stations. Citakoglu [142] predicted the monthly solar radiation through ANN, adaptive network fuzzy inference system (ANFIS), multiple linear regression (MLR) and an empirical model. When estimating the solar radiation in Turkey, the ANN model performed better than the other models. Ozoegwu [143] proposed a hybrid prediction model combining nonlinear autoregression and ANN, and verified the prediction accuracy of the hybrid method based on various statistical criteria. Lin et al. [144] developed the evolutionary seasonal decomposition LS-SVM to predict the monthly photovoltaic. Seasonal decomposition can be used to reduce the complexity of seasonal time series data. Comparison with ARIMA, SARIMA, RNN and LS-SVR models shows the effectiveness of this method for long-term prediction.
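The idea of reducing the complexity of a seasonal series by decomposition can be illustrated with a very simple additive split into a per-period mean and a residual. This is a basic mean-based decomposition chosen for the example, not the evolutionary method of [144]; the function name and the short synthetic series (with period 4 instead of 12 to keep it compact) are assumptions.

```python
def seasonal_decompose(series, period=12):
    """Additive split: seasonal[i] is the mean of all values at
    position i within the period; residual is what remains."""
    seasonal = []
    for m in range(period):
        vals = series[m::period]
        seasonal.append(sum(vals) / len(vals))
    residual = [v - seasonal[i % period] for i, v in enumerate(series)]
    return seasonal, residual


# Two hypothetical cycles of a series with period 4.
series = [1.0, 3.0, 5.0, 3.0, 2.0, 4.0, 6.0, 4.0]
seasonal, residual = seasonal_decompose(series, period=4)
```

A predictor such as LS-SVM would then be trained on the smoother residual, and the seasonal component would be added back to obtain the final forecast.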

B. FORECASTING METHODS OF REGIONAL PHOTOVOLTAIC POWER STATIONS
When building a new photovoltaic power plant, it is necessary to evaluate the long-term solar energy resources of the region through long-term prediction. This assessment needs to predict more than a few years in the future, so it needs historical data with roughly the same time range as reference.
The available data include the observations of existing meteorological observation points nearby (including weather stations, satellites, etc.). Heo et al. [145] proposed a CNN model based on a digital elevation map, which is used to predict the annual solar radiation under clear sky conditions. The map data and a spatial information set are used as model input. The model can recognize and learn complex terrain features, which provides data support for determining where solar panels should be installed. Eom et al. [146] provided a framework for feature selection through correlation analysis and backward elimination. A CNN-based feature selection ensemble prediction method was adopted, which can use different prediction sources for medium and long-term prediction.
Li et al. [147] developed a rolling prediction model based on EMD and ANN to predict the long-term solar radiation in Gonghe County, Qinghai Province. Ghimire et al. [148] constructed the extreme learning machine (ELM) model to predict the monthly average daily solar radiation by combining the medium resolution imaging spectrometer satellite and numerical weather prediction data.

C. SUMMARY OF MEDIUM AND LONG-TERM PHOTOVOLTAIC FORECASTING METHODS
The medium and long-term forecast time scale is above 72 hours. The evaluation of solar energy resources in a designated area helps with the site selection of new photovoltaic facilities. For single-site medium and long-term forecasting, the empirical model offers a simple process and reasonable accuracy for the resource evaluation of the designated location. It only needs meteorological data such as sunshine duration, temperature, relative humidity and precipitation as input to produce solar radiation estimates. Due to the difficulty of modeling and the relative lack of data, there are few studies on regional medium and long-term forecasts, although the empirical model is of great significance for the rational use of solar energy resources and the safe operation of the power grid. The regional power station group has an averaging effect compared with a single photovoltaic station. Its variability is smaller than that of a single photovoltaic station and its stability is better. After data of an appropriate scale are selected, the prediction method for a single site can be used directly for the power generation forecast of the regional power station group. Real measurement data can help to modify the prediction methods based on the stacking of single photovoltaic sites, model extrapolation and upscaling, so that the single-site generation prediction method can be applied to the prediction of regional power stations. Table 6 compares the medium and long-term photovoltaic forecasting methods.

VI. DISCUSSION OF MULTI-TEMPORAL AND MULTI-SPATIAL SCALES FORECASTING METHODS
In essence, photovoltaic prediction seeks the law governing the photovoltaic power generation process, which is approximately described by mathematical methods in order to deduce the change process of the deterministic photovoltaic output. It is necessary to select appropriate data and methods to analyze the characteristics of the change process of photovoltaic output at different spatial and time scales, and then to select an appropriate mathematical model based on these characteristics. The ultra-short-term prediction time scale is less than 4 h. The generation, disappearance and movement of clouds are the fundamental factors that produce the variability of photovoltaic. Cloud image-based methods can predict the movement trend of clouds and provide data support for surface photovoltaic. The variability of photovoltaic can be captured by methods such as Markov chains, random forests, Bayesian inference and artificial neural networks. When the time scale increases, the variability and non-stationarity of PV/solar radiation are weaker than those at the ultra-short-term scale. Due to the increase in data complexity, it is necessary to use prediction models with a strong ability to describe nonlinearity to predict photovoltaic power generation. Some studies seek relevant variables or predictable sequences through principal component analysis, wavelet analysis, empirical mode decomposition and other methods to improve the predictability of the input data under the selected model. In long-term prediction, if the data time scale is large and the volatility of photovoltaic/solar radiation decreases, an empirical model for a specific location can be selected to predict the solar radiation at the photovoltaic power station. The calculation process is simple and the prediction accuracy is reasonable.
At present, most research focuses on short-term prediction. The reason is that most energy is traded in day-ahead markets, when planning and unit commitment take place. With the development of the energy market, intraday trading is becoming more important. At this time scale, feature extraction methods combined with machine learning prediction models with strong nonlinear description ability (such as ANN, SVM, RF, etc.) can achieve excellent prediction accuracy.

B. DISCUSSION OF FORECASTING METHODS IN DIFFERENT SPATIAL SCALES
Different spatial scales have different spatial characteristics, including geographical and climatic characteristics. For a single photovoltaic power station, the variation characteristics of power generation and solar radiation are greatly affected by its specific geographical location or environment, so the detailed variation patterns of solar radiation at different photovoltaic power stations in the same region are not completely the same and may even be quite different. Finding the variation law of the photovoltaic power generation process is the key to improving the accuracy of single-site prediction. Numerical Weather Prediction (NWP) provides more accurate and relevant meteorological data, which makes it more likely that data closely related to the change process of single-site photovoltaic can be found. By selecting a machine learning model with a strong ability to handle nonlinear problems to predict photovoltaic, a curve that approximately fits the actual photovoltaic power generation can be obtained.
Stochasticity and variability are the key factors affecting the establishment of photovoltaic power generation prediction models. Regional power stations have a spatial averaging effect, which reduces the variability of solar energy. As a result, the fluctuation of their photovoltaic power generation process is smaller than that of a single photovoltaic station, and their stability is better. After data of an appropriate scale are selected, the prediction model for a single station can be used directly for the prediction of regional power stations, and the results of regional prediction will be more reliable. Regional prediction can be approached in several ways (superposition method, extrapolation method, statistical scaling method), depending on the information available. Although the bottom-up strategy (superposition methods) outperforms the others, satisfactory results can be obtained with other, less computationally demanding techniques.

VII. CONCLUSION
This paper classifies the recent research results on photovoltaic prediction and solar radiation prediction according to the temporal scale, spatial scale and input data. It summarizes the scientific issues and technical means that the prediction methods focus on under different spatial and temporal scales and input data. For the temporal scale, short-term prediction is a hot topic. With the development of the energy market, day-ahead prediction is becoming more important. Machine learning methods show excellent nonlinear description ability in day-ahead prediction, and the prediction results are satisfactory. On the spatial scale, the spatial averaging effect of regional power stations reduces the variability of solar energy. The power generation process of the whole region has less fluctuation and better stability. Therefore, the results of regional prediction are more reliable. The purpose of this research is to support the absorption of solar resources and to provide forecasting methods for reference when researchers are faced with different photovoltaic forecasting tasks.