A Hybrid Predicting Model for the Daily Photovoltaic Output Based on Fuzzy Clustering of Meteorological Data and Joint Algorithm of GAPS and RBF Neural Network

Photovoltaic (PV) output is strongly affected by meteorological factors, and prediction accuracy is low when the relevant factors are not taken into account. Although the Radial Basis Function (RBF) network is already widely used in photovoltaic prediction, its prediction error is still large. To increase the prediction accuracy, this paper proposes a short-term PV output forecasting algorithm based on fuzzy clustering of meteorological data and a joint algorithm of the Genetic Algorithm Programming System (GAPS) and the RBF neural network. Three main types of meteorological data (atmospheric turbidity, relative humidity, and solar irradiance) are selected as clustering feature vectors, and the historical PV output data are clustered into three groups by an improved fuzzy c-means clustering (IFCM) method. Finally, a computational simulation of a real case is carried out. The results show that the proposed model and algorithm work well, reduce the dimension of the model, and improve the prediction accuracy.


I. INTRODUCTION
With the rapid development of the social economy, fossil energy pollution and energy shortages are becoming increasingly severe [1]. The sustainable utilization and development of renewable and clean energy, mainly wind power and photovoltaics, is an efficient, reasonable, and feasible way to address this problem [2]. After wind power generation, photovoltaic generation has become a new growth point in the field of renewable energy [3]. Photovoltaic generation is intermittent because of the alternation between day and night illumination.
The associate editor coordinating the review of this manuscript and approving it for publication was Shuo Sun.
At the same time, owing to the influence of meteorological factors such as cloud cover, temperature [4], and aerosol [5], photovoltaic generation is also highly uncertain [6]. For these two reasons, grid-connected photovoltaic power generation disturbs the grid [7]. Consequently, accurate and timely prediction of the PV output power is the key to power grid dispatching and regulation and to the stable operation of a PV power station [8].
In photovoltaic power prediction, digital images are collected and processed from the GMS-5 satellite, and the processing usually covers four channels of data: the IR1 channel, the water vapor channel, the IR2 channel, and the visible channel [9]. These geostationary satellite images have a spatial resolution of 5 km at hourly time intervals. Figure 1 shows a specific GMS-5 IR1 image and a cloud/none cloud image [10]. The full name of GMS-5 (also known as Himawari-5) is Geostationary Meteorological Satellite-5 [11].
There are many classification methods for the prediction of PV output. In terms of the forecast principle, there are two main classes: statistical methods and physical methods [12]. The physical method usually depends on the equations of the PV module, the solar radiation transfer, and other physical equations. It needs specific meteorological and geographic information about the area of the photovoltaic station, as well as photovoltaic module information [13].
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The physical method does not require historical data on the operation of the PV station. The statistical method performs a statistical analysis of historical operation data to derive the underlying relationship between the photovoltaic output and the factors that affect it. Such methods include the Support Vector Machine (SVM), grey prediction, and the Artificial Neural Network (ANN). Paper [14] constructed a forecast model for solar radiation based on uncertainty theory that included the influence of cloud cover on photovoltaic power.
However, it is hard for a meteorological station to obtain precise cloud cover data under current conditions. The literature above considered the effect of these meteorological factors on the photovoltaic output. Although the methods mentioned above achieve high prediction accuracy in non-abrupt weather, their accuracy is low in abrupt weather. Figure 2 shows the differences in the daily PV output curves under rainy, cloudy, and sunny conditions. Rainy (worst) and sunny (best) weather are the extreme cases for photovoltaic generation, so, given the generation characteristics and uncertainty involved, much more attention should be paid to photovoltaic prediction in cloudy conditions. This means that cloud cover is the most significant factor affecting photovoltaic generation. Further theoretical reasons are given at the beginning of the next part, ''Influencing factors of photovoltaic output''. Meteorological factors in the atmosphere cause the diversity mentioned above, which lowers the training efficiency of the neural network and therefore tends to decrease the prediction accuracy.
The spectral bands of an optical sensor are usually affected by clouds and cloud shadows. According to the ISCCP-FD (International Satellite Cloud Climatology Project-Flux Data) dataset, the annual average global cloud cover is close to 66% [15]. When clouds and cloud shadows cover the Earth's surface, satellite images cannot present it correctly, which in turn affects many types of studies, including land cover [16], atmospheric correction [17], feature extraction, and change detection [18]. Consequently, cloud and cloud shadow detection is an essential preprocessing step before satellite images are used in different applications.
As the review above shows, methods for automatically screening clouds and cloud shadows have been developed over the past two decades and are widely used on different satellite images. Most of the articles above adopted spatial matching to refine the initial cloud shadow pixels and achieve high prediction accuracy. These matching methods fall into two classes. The first class predicts the location of a given cloud shadow from the projection law [19] and a geometric relationship [20]. However, although most meteorological parameters are easy to obtain from the metadata files, the cloud height is usually unknown. Because different clouds have different altitudes, each cloud object may require its own iterations, so the computational cost can be high. The second class regards a cloud shadow as correct if there are cloud pixels in its neighborhood and removes it otherwise. Although methods of the second class obtain cloud shadow masks quickly, factors such as the search direction and the size of the local window still affect the prediction accuracy. This research combines the two categories of matching schemes: the search direction is determined by the former method, and the cloud shadow pixels are found by applying the latter. Furthermore, the study also proposes a new cloud shadow detection strategy to improve the detection accuracy in regions surrounded by clouds.
This article proposes a hybrid prediction model for daily forecasting of the output of a PV power station. It is based on an improved fuzzy c-means clustering (IFCM) of meteorological factors and uses a Genetic Algorithm Programming System (GAPS) to optimize the initial thresholds and weights of the RBF neural network. The method proceeds as follows. First, the Pearson correlation coefficient [21] is used to analyze the correlation between the photovoltaic output and meteorological factors. Atmospheric turbidity [22], relative humidity [23], and solar irradiance [24] are chosen as clustering feature variables, which is especially important for cloudy days. Second, a new dataset is built from the data of the cloudy day to be forecasted and the historical meteorological data. Third, the IFCM algorithm is applied to this dataset to detect the cloud cover, characterized through these three meteorological factors, which can significantly affect photovoltaic power. Finally, the historical PV output data and the IFCM clustering data preceding the prediction day are selected to predict the output of the PV power station via the GAPS-RBF algorithm.
This paper is organized into seven parts. Part I introduces the sustainable development of renewable energy, especially photovoltaic power output, and reviews existing solutions. Part II discusses the factors affecting photovoltaic output and applies the Pearson correlation coefficient to analyze the correlation between the photovoltaic output power and the meteorological factors that can influence it. Part III proposes an IFCM algorithm to detect the cloud cover using the dataset created in Part II. To predict the output of the PV power station, Part IV presents the RBF neural network in detail, and Part V combines it with the GAPS method. Part VI gives the computational simulation results and their analysis. Finally, Part VII summarizes the paper.

II. INFLUENCING FACTORS OF PHOTOVOLTAIC OUTPUT
Photovoltaic generation is a process that converts solar energy into electrical energy directly by applying solar cells [25] based on the specific principle of the photovoltaic effect [26]. The photovoltaic output depends on the radiation intensity from solar radiation and the efficiency of the conversion from solar cell modules to a great extent [27].
Daily PV forecasting is a type of short-term prediction, and the photovoltaic grid-connected inverter usually operates in maximum power point tracking mode, which gives a relatively stable power conversion rate [28]. Accordingly, for an established photovoltaic power generation system whose PV array information is known, the installation angle of the PV panel and the conversion efficiency of the inverter can be treated as constants, and the temperature of the photovoltaic modules and the surface radiation intensity as variables [29]. On the one hand, because the temperature of photovoltaic modules is difficult to measure, the ambient temperature of the photovoltaic power station can be used instead. On the other hand, the surface solar radiation is the solar irradiance that reaches the earth's surface after being reflected, scattered, and absorbed by the atmosphere, and it is mainly affected by meteorological factors. Aerosols directly reflect, absorb, and scatter solar radiation. Water vapor and gas molecules in the atmosphere absorb and reflect solar radiation. The analysis above shows that many meteorological factors, including temperature, wind speed, aerosol, solar irradiance, humidity, air pressure, and cloud cover, can affect the photovoltaic power output. Generally, the effect of aerosol on solar radiation is measured by the atmospheric turbidity ∂, the ratio of the scattered radiation D to the direct radiation S:
∂ = D / S (1)
The influence of each factor mentioned above on the PV output is different. To improve the clustering accuracy and reduce the complexity of the model, the Pearson correlation coefficient is used to quantify the correlation between the photovoltaic output power and these meteorological factors [29].
The Pearson correlation coefficient measures the strength of the linear relationship between two variables, that is, how closely two data sets lie along a straight line. Table 1 gives the Pearson correlation coefficients between photovoltaic output and meteorological factors. The data in this table are statistical data recorded at a domestic photovoltaic power station in August and September 2017 [30]. According to the computed correlation coefficients, the meteorological factors most closely correlated with PV power generation are solar irradiance, ambient temperature, relative humidity, and atmospheric turbidity. The ambient temperature affects the photoelectric conversion of the photovoltaic module. The other three factors influence the PV output by affecting the solar radiation that passes through the atmosphere (i.e., through cloud cover). This again supports the opinion stated in Part I that ''cloud cover is the most significant factor influencing photovoltaic generation'', which is examined in the following section to obtain an IFCM-based method for detecting the cloud cover.
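As an illustration of the correlation analysis described above, the following minimal sketch computes the standard Pearson coefficient between one meteorological factor and the PV output. The function name `pearson_r` and the sample values are hypothetical and are not taken from the station data behind Table 1.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the sample mean
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical daily values, for illustration only (not the paper's data).
irradiance = [5.1, 3.2, 6.4, 2.0, 4.8]
pv_output = [40.5, 26.0, 50.1, 15.9, 38.2]
r = pearson_r(irradiance, pv_output)
```

A value of `r` close to 1 or −1 indicates a strong linear relationship, which is the criterion used above to rank the meteorological factors.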

III. IFCM ALGORITHM FOR CLOUD AND CLOUD SHADOW DETECTION
A. METHODOLOGY AND PRINCIPLE
As reviewed and analyzed above, cloud cover may be the most significant factor affecting the PV output. Detecting clouds and cloud shadows is therefore an essential preprocessing step. Figure 3 shows this detection process in detail.
Three types of cloud characteristics (statistical, texture, and spectral) are used in this study to represent atmospheric turbidity, relative humidity, and solar irradiance, respectively. Because land cover objects and clouds exhibit a wide range of reflectance values, it is hard to separate clouds and cloud shadows precisely from cloud-free observations using a single spectral band [31]. Therefore, at least two distinctive bands are combined, and suitable cloud and cloud shadow characteristics are built from the spectral information to emphasize clouds and cloud shadows while reducing the influence of other land cover objects.
Consequently, to better extract clouds that were missed or ignored, a secondary cloud detection that involves other features (e.g., the texture features in this paper) is essential.
As shown in Figure 3, cloud detection and cloud shadow detection based on IFCM comprise four and three main steps, respectively. The two detection processes are explained step by step below.
Cloud detection consists of four steps:
(1) The three cloud characteristics (spectral, statistical, and texture) are calculated from the blue, green, red, and near-infrared (NIR) bands to highlight cloud pixels.
(2) The statistical and spectral features are used to detect the original clouds with the IFCM.
(3) After the first IFCM classification, all of the cloud features of the pixels assigned to the non-cloud class are used for the secondary cloud detection.
(4) The cloud pixels obtained in the secondary cloud detection are checked to decide whether this secondary detection is essential.
Cloud shadow detection consists of three main steps:
(1) A water test separates the water pixels, and the NIR band is used to compute the cloud shadow index for the non-water pixels.
(2) The IFCM is applied to the cloud shadow index to obtain the original cloud shadow pixels.
(3) A fast cloud and cloud shadow matching algorithm yields the final cloud shadow pixels.
Therefore, the IFCM algorithm is suitable for both cloud detection and cloud shadow detection. Nevertheless, a spatial matching method is usually required to rectify the original cloud shadow pixels.

B. IMPROVED FUZZY C-MEANS METHOD (IFCM)
The FCM proposed by Dunn [32] is a classical fuzzy clustering algorithm that allows a data point to belong to more than one cluster. Its purpose is to minimize the objective function of Equation (3) by iteratively optimizing the cluster centers c_b and the memberships µ_ab through formulas (4) and (5). The parameter p governs the amount of fuzzy overlap among the clusters; it is commonly greater than 1, and a smaller value indicates a lower degree of overlap. In this article, p = 2. µ_ab is the degree of membership of the a-th pixel (a multi-dimensional data point) in the b-th cluster, and c and n denote the number of classes and the number of pixels in a given image, respectively. d_ab is the distance from the a-th data point x_a to the center c_b of the b-th cluster. The iteration stops when the improvement ε in the objective function between two consecutive iterations becomes negligible; its threshold was set to 10^−5. To save computing time and prevent an endless loop [33], the maximum number of iterations was set to 100. Based on the same statistical data, recorded at a Chinese domestic PV power station in August and September 2017 [34] and also used in Figure 2 and Table 1, Figure 4 illustrates the daily membership degree values for the 61 consecutive days of these two months.
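The iteration described above can be sketched as follows. This is a minimal sketch of the textbook fuzzy c-means update (membership-weighted centers, inverse-distance memberships) with the settings stated in the text (p = 2, tolerance 10^−5, at most 100 iterations). It is not the improved variant (IFCM) of this paper, and the random initialization, the Euclidean distance, and the function name `fcm` are assumptions.

```python
import numpy as np

def fcm(X, n_clusters, p=2.0, eps=1e-5, max_iter=100, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships, objective)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: random memberships, each row summing to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    prev_obj = np.inf
    for _ in range(max_iter):
        W = U ** p
        # Cluster centers: membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        obj = float((W * d2).sum())
        # Membership update: proportional to d^(-2 / (p - 1)).
        inv = d2 ** (-1.0 / (p - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        if abs(prev_obj - obj) < eps:  # objective no longer improves
            break
        prev_obj = obj
    return centers, U, obj
```

Each row of the returned membership matrix sums to 1, so a sample can be assigned to the cluster with the largest membership or compared against a threshold.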

C. CLOUD CHARACTERISTICS
As described before, the three types of cloud characteristics of statistical, texture, and spectral are utilized in this paper to represent atmospheric turbidity, relative humidity, and solar irradiance, respectively [35]. In this section, the three cloud characteristics, including spectral, statistical, and texture factors, are mathematically defined and explained in theory successively.

1) SPECTRAL CLOUD CHARACTERISTICS
Clouds generally have a much larger reflectance than the land, so they appear bright and white in the RGB space. The HOT index [36] is used as the first cloud spectral feature and is calculated by expression (6), where HOT denotes the value of the HOT index, and B_red and B_blue represent the reflectance values of the red and blue bands, respectively.
The brightness of the pixels is chosen as the second cloud spectral feature because clouds are usually opaque and white in the RGB [37] space. In expression (7), Brilliant is the brightness value, and B_green represents the reflectance value of the green band, defined in the same way as B_red and B_blue. The third cloud spectral feature is based on the dark channel, which addresses the color problem of cloud shadows. The dark channel can be used to remove haze, which makes it efficient and feasible for cloud detection. In expression (8), DARK denotes the dark channel value. Together, the three factors {HOT, Brilliant, DARK} are taken as the cloud spectral features.

2) STATISTICAL CLOUD CHARACTERISTICS
In this article, local means and variances are used to describe the intensity and detail of the initial image.
In expression (9), σ_a^2 and M_a are the variance and mean value at the a-th pixel, respectively; B_r is the r-th pixel of a given spectral band within a local window surrounding the a-th pixel; N is the number of pixels in this local window; and the size of the window is set to 3-5. Accordingly, approximately twelve statistical characteristics are obtained in total because all the visible bands are included in this paper.
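The local statistics defined above can be sketched as follows: for every pixel, the mean and variance of the band values inside a square window centered on that pixel. The population variance (division by N), the edge padding at the image border, and the function name `local_mean_variance` are assumptions, since expression (9) is not reproduced here.

```python
import numpy as np

def local_mean_variance(band, size=3):
    """Mean and variance of each pixel's size x size neighbourhood."""
    band = np.asarray(band, dtype=float)
    pad = size // 2
    # Assumption: border pixels are handled by repeating the edge values.
    padded = np.pad(band, pad, mode="edge")
    # windows[i, j] is the size x size neighbourhood centred on pixel (i, j).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    mean = windows.mean(axis=(2, 3))
    var = windows.var(axis=(2, 3))  # population variance: divide by N
    return mean, var
```

Applying this to each visible band with window sizes 3 and 5 would give 3 × 2 × 2 = 12 feature maps, which is one reading of the twelve statistical characteristics mentioned above.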

3) TEXTURE CLOUD CHARACTERISTICS
Based on a principal component analysis [38], the first and second principal components (PCs) of the image, which carry more than 98% of the information of the initial image, are chosen to compute the cloud texture features. The Gabor filter is a well-known and efficient model for identifying texture. In this research, λ represents the wavelength of the sinusoidal function and is set to 4 and 3. ψ is the orientation and takes the values 0°, 45°, 90°, and 135°. σ is the standard deviation of the Gaussian envelope, and its value is limited to [0, 1] (note that, for convenience, the value of the Gaussian envelope is set within [−1, 1]).
Parameter i is a variable related to the wavelength and bandwidth; γ is the aspect ratio, which governs the ellipticity of the Gaussian envelope, and is fixed at 0.5. Each PC produces eight texture characteristics, so 16 cloud texture features are used in total.
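To make the filter bank concrete, the sketch below builds the real part of a standard Gabor kernel for the two wavelengths and four orientations listed above, which gives eight kernels per principal component. The zero phase offset, the value σ = 1.0, the 7 × 7 kernel size, and the function name `gabor_kernel` are assumptions made for illustration; the exact form of the filter expression used in this paper may differ.

```python
import numpy as np

def gabor_kernel(wavelength, orientation_deg, sigma, gamma=0.5, half_size=3):
    """Real part of a standard Gabor kernel (phase offset assumed to be 0)."""
    theta = np.deg2rad(orientation_deg)
    y, x = np.mgrid[-half_size:half_size + 1,
                    -half_size:half_size + 1].astype(float)
    # Rotate the coordinates to the requested orientation.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    # Elliptical Gaussian envelope; gamma controls its ellipticity.
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    # Sinusoidal carrier with the given wavelength.
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier

# Two wavelengths x four orientations = eight kernels per principal component.
bank = [gabor_kernel(lam, ang, sigma=1.0)
        for lam in (4, 3) for ang in (0, 45, 90, 135)]
```

Convolving each principal component with the eight kernels would yield eight texture maps per PC, matching the count of 16 features for two PCs.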

4) FEATURE FUSION
After the characteristics above are computed, they need to be synthesized into basic feature sets that serve as the inputs of the IFCM classification. The fusion process includes two steps: first, all the cloud features obtained above are normalized; then, appropriate characteristics are selected from the normalized features and combined into the best feature subset. The features are normalized to [0, 1] as follows:
f_nor = (f − f_min) / (f_max − f_min) (11)
In formula (11), f_nor is the normalized feature data, f is the original feature data, and f_min and f_max are the minimum and maximum values of the original feature data, respectively.
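The normalization step can be sketched in a few lines. The handling of a constant feature (mapped to zeros to avoid division by zero) and the function name `min_max_normalize` are assumptions that the text does not specify.

```python
import numpy as np

def min_max_normalize(f):
    """Scale a feature array linearly to the range [0, 1]."""
    f = np.asarray(f, dtype=float)
    f_min, f_max = f.min(), f.max()
    if f_max == f_min:
        # Assumption: a constant feature carries no information, map it to 0.
        return np.zeros_like(f)
    return (f - f_min) / (f_max - f_min)
```

For example, the values 2, 4, and 6 are mapped to 0, 0.5, and 1, so features with different physical ranges become comparable before clustering.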

D. FLOWCHART AND ANALYSIS OF THE CLOUD DETECTION
The IFCM algorithm is applied to detect clouds after these subsets are merged. As discussed above, IFCM is used twice in the cloud detection process. Each time, the target pixels are sorted into two classes: cloud pixels and non-cloud pixels. The membership grades of a pixel in the two clusters sum to 1, and the membership grade of the pixel in each cluster is used as the decision factor for assigning it to a cluster. In the original cloud detection, pixels with higher degrees of membership in the cloud cluster are considered cloud pixels. Because the difference between cloud and non-cloud pixels is distinctly large, the classification threshold can be set to 0.5. The clouds remaining after the first detection are harder to detect than those found initially. Consequently, an adaptive threshold is obtained at this stage from expression (12), where TH_res is the threshold and U is the set of membership degrees, with respect to the cloud cluster, of the non-cloud pixels left after the original Cloud Detection in Cloud Clusters (CDCC). σ^2{·} and Mean{·} denote the standard deviation and mean values, respectively.
A cloud verification step then judges the potential cloud pixels. When the difference between the non-cloud and cloud clusters is large, it can be concluded that the non-cloud and cloud pixels are well separated; otherwise, the classification of cloud and non-cloud pixels is regarded as false. In Figure 5, for the initial Cloud Detection in Cloud Clusters (CDCC), the non-cloud and cloud cluster centers are denoted CDCC1_L and CDCC1_H, respectively, and CDCC2_H and CDCC2_L denote the cloud and non-cloud cluster centers for the secondary CDCC. The length of a cluster center equals the number of characteristics used in the corresponding IFCM classification. Because the cloud feature subsets differ between the original and the secondary cloud detection, the distance is computed from the overlapping features of these cluster centers.
Conversely, when the difference is small, the non-cloud and cloud pixels obtained from the secondary cloud cluster detection are not reliable, and only the cloud pixels obtained from the original cloud cluster detection are regarded as clouds. The distance between CDCC2_L and CDCC2_H is expressed as the normalized distance of expression (13), in which Distance_2nd represents the normalized distance between the non-cloud and cloud cluster centers in the secondary cloud cluster detection. The secondary cloud cluster detection is considered essential once this normalized distance exceeds a threshold, which was set to 0.25 in this paper.
The Fixed region Cloud Cover (FrCC) is defined as the number of cloudy pixels in a cloud/non-cloud image divided by the total number of pixels in the same image:
FrCC = A_cloud / N_total (14)
In equation (14), A_cloud denotes the number of cloudy pixels in the cloud/non-cloud image, N_total is the total number of pixels in the same image, and FrCC ∈ [0, 1]. For instance, if there are 55 cloud pixels in a 10 × 10 cloud/non-cloud image, then the FrCC of this image is 0.55.
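The FrCC definition and the worked example above can be reproduced with a short sketch. The boolean mask representation (True for a cloudy pixel) and the function name `fixed_region_cloud_cover` are assumptions made for illustration.

```python
import numpy as np

def fixed_region_cloud_cover(cloud_mask):
    """Fraction of cloudy pixels in a binary cloud/non-cloud image."""
    mask = np.asarray(cloud_mask, dtype=bool)
    return mask.sum() / mask.size

# A 10 x 10 region with 55 cloudy pixels, as in the example in the text.
mask = np.zeros((10, 10), dtype=bool)
mask.flat[:55] = True
frcc = fixed_region_cloud_cover(mask)  # 55 / 100
```

Computing this value for every hourly image produces the FrCC time series that serves as the input of the RBF neural network.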

IV. RBF NEURAL NETWORK
Through the analysis of the cloud and non-cloud features, Part III defined the cloud cover, which is directly related to photovoltaic power. Because the cloud cover is random and uncertain [39], it can be forecast with the RBF neural network by taking FrCC as the network input, so that the input vector X(t) of the RBF neural network is the FrCC at time t.

A. RBF NEURAL NETWORK
Powell first proposed the Radial Basis Function (RBF) method for multivariable interpolation in 1985 [40]. The RBF neural network uses a radial basis function as its activation function and imitates human neurons, which respond locally to external stimulation. Figure 6 shows the structure of the RBF neural network, a feed-forward network that generally has three layers: an input layer, a hidden layer with a nonlinear RBF activation function, and a linear output layer.
As seen from Figure 6, the input X = (x_1, x_2, ···, x_n)^T is an n-dimensional vector, and the output Y = (y_1, y_2, ···, y_m)^T is an m-dimensional vector. X(t) is a function of FrCC with the time parameter t as the independent variable, and q_i denotes the output of hidden-layer neuron i. In its expression, c_i is the center of hidden-layer neuron i, an n-dimensional vector with i = 1, 2, 3, ...; ||·|| is generally taken as the Euclidean distance; and the function applied to this distance is the radial basis function, which is the transfer function of the hidden layer.
It is a non-negative, nonlinear function with a local response: it is radially symmetric about its center and decays with distance from it.
This function can take many forms and reflects the nonlinear mapping ability of the RBF neural network. This study adopts a Gaussian as the radial basis function. If node k is an output-layer neuron, its output y_k is a linear combination of the outputs of the hidden-layer neurons. In equation (17), w_ki represents the connection weight between neuron k of the output layer and neuron i of the hidden layer, and θ is the threshold of neuron k of the output layer.
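The forward pass described above can be sketched as follows: Gaussian hidden units centered at c_i, followed by a linear output layer. The per-neuron width parameter, the convention of subtracting the threshold θ from the weighted sum, and the function name `rbf_forward` are assumptions, since the corresponding equations are not reproduced here.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights, thresholds):
    """Forward pass of an RBF network with Gaussian hidden units.

    centers: (h, n) hidden centers, widths: (h,) Gaussian widths,
    weights: (m, h) output weights, thresholds: (m,) output thresholds.
    """
    x = np.asarray(x, dtype=float)
    # Squared Euclidean distance from the input to every hidden center.
    dist2 = ((centers - x) ** 2).sum(axis=1)
    # Gaussian radial basis function: 1 at the center, decaying with distance.
    q = np.exp(-dist2 / (2.0 * widths ** 2))
    # Linear output layer; subtracting the threshold is an assumed convention.
    y = weights @ q - thresholds
    return q, y
```

With the 24-15-22 structure reported in the experimental settings, `centers` would have shape (15, 24) and `weights` shape (22, 15).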

B. PREDICTING STEPS BASED ON RBF NEURAL NETWORK
1) PREPARATION OF TRAINING SAMPLES
To meet the needs of FrCC prediction, it is important to obtain sufficient training samples to train the neural network [40]. The FrCC forecast is a cloud cover detection algorithm based on an RBF neural network that uses the values at k consecutive times starting from time i to forecast the value at time i + k.
The training samples are obtained as follows. {x(t) | t = 1, 2, 3, ···} denotes the time series of FrCC changes, and W denotes a sliding window of width k + 1 (i.e., the window covers k + 1 consecutive times). The window W slides along the time series with a step of 1 and collects k + 1 values at each position, where i is the position of W in the time series and W_i = {x(i), x(i + 1), ···, x(i + k)} is the set of k + 1 FrCC values starting at time i. W_i is split into two parts: the first k values (the measured values) and the (k + 1)-th value (the target value). Table 2 illustrates the {measured value, target value} pairs obtained as training samples by repeatedly moving the sliding window W.
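The sliding-window construction above can be sketched as follows. The FrCC values in the example and the function name `make_training_samples` are hypothetical and serve only to show the shape of the resulting {measured value, target value} pairs.

```python
import numpy as np

def make_training_samples(series, k):
    """Split a time series into (k measured values, 1 target value) pairs."""
    series = np.asarray(series, dtype=float)
    # Every window holds k + 1 consecutive values; the window moves by 1 step.
    windows = np.lib.stride_tricks.sliding_window_view(series, k + 1)
    measured = windows[:, :k].copy()  # first k values of each window
    target = windows[:, k].copy()     # (k + 1)-th value of each window
    return measured, target

# Hypothetical FrCC series, for illustration only.
frcc_series = [0.10, 0.20, 0.35, 0.50, 0.40]
inputs, targets = make_training_samples(frcc_series, k=2)
```

A series of length T yields T − k training pairs, so the five values above give three pairs for k = 2.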

2) TRAINING THE NEURAL NETWORK OF RBF
To train the RBF neural network, the training samples obtained in step (1) are fed into the network, with the measured values as the input and the target values as the output. The network weights are adjusted until all training samples have been learned one after another.

3) PREDICTION BY APPLYING THE NEURAL NETWORK OF THE RBF ALREADY TRAINED
To predict FrCC at time j, the k actual values before time j (i.e., {x(j − k), x(j − (k − 1)), ···, x(j − 1)}) must be obtained. Feeding these k actual values into the network then yields the predicted value x(j).

4) EXPERIMENTAL PROCESS AND ITS PREDICTION RESULTS USING RBF
There are four main steps in the prediction procedure. Step 1: choose distinct values of the parameter k (i.e., k = 1, 2, 3, 4, 5, 6) to prepare the training samples. The smallest APAE is then used to determine the ideal value of the parameter k. Figure 7 illustrates the prediction results, simulated in MATLAB, for the different values of k. Table 3 lists the predicted FrCCs and their corresponding APAE. As seen from Figure 7 and Table 3, the APAE between the actual and predicted values is comparatively small when k is 1, 2, 3, 4, or 5, but not when k is 6.
In particular, when k = 2, the APAE reaches its minimum value of 0.0192, and the curve of the predicted values is closest to that of the actual values. The results illustrate that the prediction accuracy of the RBF neural network is quite good. Furthermore, as the parameter k increases, the APAE also increases gradually, which indicates that the predicted values deviate slightly from the actual values when k = 4, 5, 6.

V. PREDICTION MODEL BASED ON GAPS-RBF NEURAL NETWORK
The theory of membrane computing (also known as the P system) simulates the function and structure of living cells and extracts a computing model from them. In the Genetic Algorithm Programming System (GAPS), an Adaptive Genetic Algorithm (AGA) is introduced into membrane computing. On top of the genetic operations, the procedure includes rules for communication between membranes. This enriches the evolution rules and solution object sets of the algorithm and addresses the ''premature'' convergence problem of the GA. Every object denotes one solution, and the initial objects are produced in the distinct membranes of the membrane system. Considering the three types of cloud features discussed before (statistical, texture, and spectral), a membrane structure of degree 3 is selected as the genetic membrane in this study, and its multi-groups are formulated by expression (20). The crossover probability and mutation probability of the genetic algorithm are defined adaptively by equation (21), in which P_c and P_m are the crossover probability and mutation probability. F_a and F_b are the fitness values of the individuals to be crossed and mutated. The parameters K_i (i = 1, 2, 3, 4) are random numbers in (0, 1) with K_1 > K_2 and K_3 > K_4. F_avg and F_max are the average and maximum fitness values of the current population.
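Because equation (21) is not reproduced here, the following is only a sketch of one common adaptive scheme that is consistent with the constraints K_1 > K_2 and K_3 > K_4: individuals below the average fitness receive the larger probability, and the probability decreases linearly to the smaller value as the fitness approaches F_max. The linear form, the example values of K_i, and the function name `adaptive_probability` are assumptions and may differ from the authors' formula.

```python
def adaptive_probability(fitness, f_avg, f_max, k_high, k_low):
    """Adaptive operator probability (assumed linear scheme, see lead-in).

    Below-average individuals get k_high; above-average individuals get a
    value that falls linearly from k_high at f_avg to k_low at f_max.
    """
    if fitness < f_avg or f_max == f_avg:
        return k_high
    return k_high - (k_high - k_low) * (fitness - f_avg) / (f_max - f_avg)

# Hypothetical settings: K_1 = 0.9, K_2 = 0.6 for crossover,
# K_3 = 0.1, K_4 = 0.01 for mutation.
p_c = adaptive_probability(3.0, f_avg=2.0, f_max=4.0, k_high=0.9, k_low=0.6)
p_m = adaptive_probability(3.0, f_avg=2.0, f_max=4.0, k_high=0.1, k_low=0.01)
```

The intent of such a scheme is to protect good individuals from disruption while keeping poor individuals exploratory, which is one way to counter premature convergence.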
In every iteration, the genetic operations and the transfer rule select the individuals with the best fitness value inside each membrane. At the same time, the same genetic operators are applied outside the membranes to obtain the optimal individuals, which are then sent into the membranes. These operations realize communication among membranes and increase the efficiency of membrane computation. Figure 8 shows the flow chart of the GAPS-RBF algorithm. The first module, initialization, has four parts: determining the RBF neural network structure, initializing the parameters of the RBF neural network, setting the population number and optimization objectives, and coding the initial weights and thresholds of the neurons as real numbers. The second module, the GAPS-RBF processing procedure, covers the fitness computation for each group, the selection operation, the crossover operation, the mutation operation, and so on. The third module, output and evaluation, consists of obtaining the optimal initial weights and thresholds of the neurons, calculating the error, and updating the initial weights and thresholds.
In addition to these processing parts, two conditions link the modules and procedures. The first decision condition, whether the optimal object has been achieved, connects the second and third modules. The second condition, whether the ending requirements are met, links the two procedures of outputting the results and updating the initial weights and thresholds. Together, these modules and conditions define the GAPS-RBF prediction algorithm.
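The membrane communication described in this section can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes a toy fitness function in place of the RBF prediction error, fixed crossover and mutation probabilities instead of adaptive ones, and a simple communication rule in which the globally best individual replaces the worst individual of every membrane. All function names and default values are hypothetical.

```python
import random


def fitness(x):
    # Toy objective (assumption): higher is better, optimum at the origin.
    return -sum(v * v for v in x)


def evolve_membrane(pop, rng, p_c=0.8, p_m=0.1):
    """One generation of a plain real-coded GA inside a single membrane."""
    pop = sorted(pop, key=fitness, reverse=True)
    elite = pop[0]                          # best individual survives unchanged
    parents = pop[: max(2, len(pop) // 2)]  # truncation selection
    children = [elite]
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        if rng.random() < p_c:              # arithmetic crossover
            w = rng.random()
            child = [w * u + (1 - w) * v for u, v in zip(a, b)]
        else:
            child = list(a)
        if rng.random() < p_m:              # Gaussian mutation of one gene
            i = rng.randrange(len(child))
            child[i] += rng.gauss(0.0, 0.1)
        children.append(child)
    return children


def run(n_membranes=3, pop_size=10, dim=4, generations=50, seed=0):
    rng = random.Random(seed)
    membranes = [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
        for _ in range(n_membranes)
    ]
    initial_best = max((ind for m in membranes for ind in m), key=fitness)
    for _ in range(generations):
        membranes = [evolve_membrane(m, rng) for m in membranes]
        # Communication rule (assumption): copy the global best into every
        # membrane, replacing that membrane's worst individual.
        best = max((ind for m in membranes for ind in m), key=fitness)
        for m in membranes:
            worst = min(range(len(m)), key=lambda i: fitness(m[i]))
            m[worst] = list(best)
    final_best = max((ind for m in membranes for ind in m), key=fitness)
    return fitness(initial_best), fitness(final_best)


initial_fitness, final_fitness = run()
```

Because each membrane keeps its elite and only its worst member is overwritten, the best fitness never decreases from one generation to the next. In the setting of this paper, each individual would instead encode the initial weights and thresholds of the RBF neural network, and the fitness would be derived from the prediction error.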

VI. RESULTS ANALYSIS AND DISCUSSION FOR EXPERIMENTAL SIMULATION

A. EXPERIMENTAL SAMPLES
In the experiment, the selected database consists of 235 cloud/non-cloud images with a 1-hour temporal and 5 km spatial resolution, recorded from 1 March 2011 at 00 UTC to 16 August 2011 at 1600 UTC. The chosen study area is a region of size 10 × 10.
The other settings are as follows. The iteration number of the IFCM is 100, the number of clustering centers is C = 3, and the power weight is m = 2. The RBF neural network uses 1000 iterations and a 24-15-22 structure, which means that the input layer, hidden layer, and output layer have 24, 15, and 22 nodes, respectively. Furthermore, the iteration number of the genetic membrane is 2000, and the iteration number of the genetic algorithm used for comparison is also 2000. All samples were normalized using expression (22), where X_i and X_i' are the original and normalized data, respectively, and X_max and X_min are the maximum and minimum values, respectively, over all samples.
As shown in Table 4, although the average PAR (Producer Agreement Rate) obtained from the FCM is a little lower than that from the FMASK method (the Function of MASK method proposed by Woodcock), the average NAR (Non-Agreement Rate) of the FMASK method is higher. Furthermore, the NAR and average PAR of the FCM are better than those of the SVM method. Consequently, among these three algorithms, the best method is the FCM, followed by the SVM. The FMASK algorithm has the highest average NAR and PAR, meaning that although FMASK can catch most of the clouds, it also wrongly classifies some clear-sky pixels as clouds. In contrast with the FCM method, the SVM method has a slightly lower average PAR but a higher average UAR (User Agreement Rate), suggesting that the SVM method may wrongly detect or miss some pixels of the actual cloud. Similarly, the lowest average UAR, obtained from the FMASK method, indicates that its results may contain some pseudo-cloud pixels.
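The sample normalization of expression (22) can be sketched in Python as follows. The sketch assumes the standard min-max form, X_i' = (X_i - X_min) / (X_max - X_min), which is consistent with the quantities X_i, X_max, and X_min named in the text; the function name and the handling of a constant series are illustrative choices, not taken from the paper.

```python
def min_max_normalize(samples):
    """Scale every value to [0, 1] using the global minimum and maximum."""
    x_min, x_max = min(samples), max(samples)
    if x_max == x_min:
        # Degenerate case (assumption): map a constant series to zeros.
        return [0.0 for _ in samples]
    return [(x - x_min) / (x_max - x_min) for x in samples]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Scaling all inputs to a common range keeps features with large numeric values, such as irradiance, from dominating the distance computations in the RBF hidden layer.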

2) PREDICTION RESULTS AND ANALYSIS
In the experimental simulation, we predicted the photovoltaic output at 30-minute intervals from 7:30 to 17:30 in one day. Figure 9 shows the forecasted data, and Table 5 lists the corresponding original data. According to the original data in Table 5 and the predicted results for the three weather classes shown in Figure 10, the forecast curve for the sunny day is essentially consistent with the measured output and has only a small prediction error. The prediction curve for cloudy weather usually shows a deviation, and its forecasting error is the largest among the three weather classes because of the uncertainty in cloud position and cloud amount.
To assess the effectiveness of the FCM quantitatively, four indexes are used in this paper: PAR (Producer Agreement Rate), NAR (Non-Agreement Rate), UAR (User Agreement Rate), and RER (the ratio of PAR to NAR). With or without the secondary detection (as shown in Figure 10), the proposed method has the best accuracy. The statistical solution is better than the physical one.

3) SIMULATION AND ANALYSIS OF THE ERROR
To compare the predictions for the three weather types quantitatively, the authors used the Root Mean Square Error (RMSE) as the error measure. The results are shown in Table 5. In the RMSE expression, P_pi and P_mi are the actual power and the predicted output of the PV, respectively, and N is the total number of output powers in the prediction system. According to the error data shown in Table 5 and Fig. 11, under all three weather types, the predicted results of models 1 and 2 are largely consistent with the actual PV output of the day to be forecasted.
However, the forecasted values of model 3 are far from the actual values because the original data used in this model were not clustered via IFCM. This result shows that IFCM clustering can improve the prediction. Meanwhile, the prediction error of model 1 is lower than that of model 2 because the genetic membrane computation in the GAPS-RBF algorithm can better choose the optimal solution in the population.
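The RMSE used in this error analysis can be computed with a few lines of Python. The sketch assumes the standard unnormalized definition, the square root of the mean squared difference between actual and predicted power over the N time points; whether the paper additionally scales the error (for example by installed capacity) is not stated here, so treat the function as illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long power series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical values, only to show the call; not data from Table 5.
error = rmse([0.0, 0.0], [3.0, 4.0])
```

A lower RMSE for one model than another on the same day and the same time points indicates a closer fit to the measured output, which is how the three models are ranked in this section.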

1) CONTRIBUTIONS OF THE INITIAL AND SECONDARY CLOUD DETECTIONS
For most scenes, the initial cloud detection achieves much higher BERs and PARs and lower NARs than the secondary cloud detection, which means that the initial cloud detection has better detection accuracy. This is because the initial cloud detection mostly screens the obvious clouds, whereas the secondary cloud detection may detect the thin clouds surrounding thick clouds, which are hard to find. Furthermore, the secondary cloud detection appears to be much more important in scenes with a large amount of thin cloud, such as the Landsat 7 ETM+ scene. Meanwhile, even though the secondary cloud detection mislabels some clear-sky pixels as clouds, it can also improve the PARs.

2) NECESSITY OF THE SECONDARY CLOUD DETECTION
The secondary cloud detection is not suitable for eight Landsat 7 ETM+ images because the cluster centers obtained from this detection are relatively close to each other. Additionally, although the secondary cloud detection is not a stable way to screen these eight images, it can improve the average BER. Therefore, it is better to exclude these eight images from the secondary cloud detection and then set a reasonable threshold to determine whether this detection algorithm is necessary.

3) SEARCHING WINDOW SIZE FOR CLOUD SHADOW DETECTION
From the results and analysis above, the searching window size can be determined visually and is easy to fine-tune. In addition, the searching window size should be set neither too small nor too large. If the window is too big, it may include some pixels of pseudo cloud shadow; if it is too small, it may exclude some pixels of the actual cloud cover. Either case could decrease the UAR.

4) SUMMARY OF CONTRIBUTIONS
The experimental results lead to the following conclusions: (1) it is reasonable to use RBF neural networks to predict changes in the cloud cover of specific regions; (2) choosing an appropriate value of k can improve the prediction accuracy for cloud cover; (3) the proposed algorithm is an intelligent forecasting method with good generalization ability and better robustness.

VII. CONCLUSION
To forecast the daily photovoltaic output of a photovoltaic station precisely, this paper proposed a hybrid prediction model based on an improved fuzzy c-means clustering (IFCM) of meteorological data and a radial basis function (RBF) neural network optimized by GAPS. First, the Pearson correlation coefficient is used to choose the three most significant factors among the many factors affecting the photovoltaic output. Second, based on these three types of data, the IFCM algorithm is used to detect the cloud cover. Finally, a detailed RBF neural network is given and coordinated with the GAPS method to predict the PV output of the PV power station, and the genetic membrane optimization algorithm is applied to optimize the initial weights and thresholds of the RBF neural network model. Using the RBF neural network to predict the PV output is feasible, and the prediction accuracy can be improved by choosing k. The results show that the proposed method is efficient and suitable, reduces the prediction error, and has good generalization ability and better robustness.