Quality Analysis of Photovoltaic System Using Descriptive Statistics of Power Performance Index

The performance evaluation (PE) of a photovoltaic (PV) system yields an index that represents the efficiency and reliability of the system. Most PE indicators evaluate the ratio of the actually measured power generation to the theoretically calculated power generation. The closer this ratio is to 1, the closer the PV system is to ideal. PV output varies with weather conditions and regional characteristics, and its evaluation depends in particular on the types of sensors and measured variables. Floating photovoltaics (FPVs) and marine photovoltaics (MPVs) are even more sensitive to environmental variables because they are installed on water and at sea. In this paper, in contrast to existing PE methods, the most accurate regression model, which considers ambient temperature, relative humidity, and wind speed, was used to predict power and thereby improve accuracy. An optimal PE method that detects failures of the PV system easily and accurately is proposed. Data from three FPVs in the same environment were analyzed over 1 year. Quality diagnosis was performed with the improved PE, through which the impact of various events can be represented. The distribution of the resulting Power Performance Index (PPI) values was analyzed using descriptive statistics, and indicators for quality control are presented. The PV power generation state can be determined from the positions of the average and median values in the box plot, and fluctuations in PV power generation were identified from changes in the size of the Inter Quartile Range (IQR), which represents the degree of data scattering. As a result, it was confirmed that a failure occurred in the system when the IQR value was more than twice the normal IQR value.


I. INTRODUCTION
Global PV capacity is expected to reach 2.2 TW by 2030, which means that the operation and maintenance (O&M) of PV systems will grow rapidly over the next decade and that these systems will need to be managed [1]. O&M is necessary to maintain the independence and sustainability of a PV system, and an economic analysis based on the levelized cost of electricity (LCOE), the unit cost of electricity production, is also necessary over the lifetime of the PV system. Because a PV system lasts 20 to 30 years, O&M technology that maximizes power generation over that lifetime is needed [2]. As installed PV capacity increases worldwide, O&M must increasingly account for climate and environmental variables such as solar irradiance, temperature, relative humidity, and wind speed, which strongly affect PV systems. The level of the monitoring system must be determined by the size and type of the PV power plant, and the effect of environmental variables must be considered as well [3]. Visual inspection is usually used to detect shading and contamination of PV modules; it is inexpensive, but its accuracy in detecting failures is very low. Another method applies digital image processing to thermal images taken by drones or other aerial vehicles [4]. A low-cost, open-source Supervisory Control and Data Acquisition (SCADA) system based on the Internet of Things (IoT) and web services has been used for remote control and monitoring of the electrical and environmental data of PV systems [5]. Based on the collected data, the failure mode and effects analysis (FMEA) approach achieves moderate fault detection accuracy, and a lifetime prediction model of the PV system can be used to calculate the degradation rate and to identify or classify potential defects [6]. O&M based on machine-learning prediction models is also used: with the collected data, PV power is predicted by time-series analysis such as the ARIMA model, or climate patterns are analyzed [7].
The PV system can be divided into the DC part and the AC part. The DC part is then divided into a PV module that generates power and a junction box that integrates PV modules connected in series and parallel. The AC part includes inverters and power distribution facilities [8].
In the DC part, the PV module is the main cause of PV power system failures. Because complete breakdown is rare, a faulty PV module usually keeps generating power at a reduced level. However, such a module can lead to short-term and long-term failures: a short-term failure is usually caused by power degradation, and a long-term failure is usually caused by mismatch between PV modules. A broken PV module operates as a load and causes permanent failures. Silicon PV modules typically have an annual degradation rate of 0.5%, and 80% of their power output is guaranteed for 20 to 25 years [9]. In the case of the inverter, when the voltage of the PV array or string does not reach the Maximum Power Point Tracking (MPPT) operating voltage, the overall power generation stops; the life span of an inverter is 7 to 10 years [10], [11]. When PV power generation is connected to a power system, power quality problems such as harmonics must be considered, because harmonics are generated in the process of converting DC to AC. In addition, PV output varies with the weather, which causes voltage fluctuations. According to IEEE, the voltage unbalance, which is based on the asymmetry of phase angles between the voltages of different phases, is recommended to be within 1 to 4% [12].
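Two numbers in the paragraph above can be checked with a short calculation. The sketch below assumes that the 0.5% annual degradation compounds year over year (the text does not say whether it is linear or compound), and it computes voltage unbalance as the negative- to positive-sequence ratio |V2|/|V1| from symmetrical components. That is one common definition (used, for example, in IEC power-quality standards); the IEEE reference [12] may use a different formula, so treat the unbalance function as an illustrative assumption.

```python
import cmath
import math


def retained_output(annual_rate: float, years: int) -> float:
    """Fraction of initial output left after `years` of compounding degradation."""
    return (1.0 - annual_rate) ** years


def phasor(magnitude: float, angle_deg: float) -> complex:
    """Build a voltage phasor from a magnitude and an angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def voltage_unbalance_factor(va: complex, vb: complex, vc: complex) -> float:
    """|V2| / |V1| in percent, using symmetrical components (assumed definition)."""
    a = cmath.exp(2j * math.pi / 3)      # 120-degree rotation operator
    v1 = (va + a * vb + a * a * vc) / 3  # positive-sequence component
    v2 = (va + a * a * vb + a * vc) / 3  # negative-sequence component
    return 100.0 * abs(v2) / abs(v1)


# 0.5 %/year over 25 years stays above the 80 % output level.
print(f"retained after 25 years: {retained_output(0.005, 25):.3f}")  # 0.882

balanced = voltage_unbalance_factor(phasor(1, 0), phasor(1, -120), phasor(1, 120))
skewed = voltage_unbalance_factor(phasor(1, 0), phasor(1, -115), phasor(1, 120))
print(f"balanced VUF = {balanced:.2f} %, skewed VUF = {skewed:.2f} %")
```

A perfectly balanced set gives a negative-sequence component of zero, so only the phase-angle error of the skewed set produces a non-zero unbalance.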
This paper presents an optimal power performance index (PPI) based on the optimal PV power generation prediction model and aims to classify events such as failures using the distribution of these PPIs. The PPI distribution quantitatively represents the overall operating state of the PV power generation system and is visualized so that its central tendency, variation, and distribution are easy to interpret. The remainder of this paper is organized as follows. Section II introduces the types of PV system performance indices. Section III describes the experimental site. Section IV validates the prediction models of the PV system. Section V discusses the results of the PPI model, and Section VI concludes the paper.

II. PV SYSTEM PERFORMANCE INDICES
A. CAPACITY UTILITY FACTOR (CUF)
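The performance of power plants can be expressed with a capacity utility factor (CUF). It is generally expressed as a percentage and is calculated by dividing the measured generated energy by the product of the installed capacity of the plant and the length of the period, as shown in (1).
$$CUF = \frac{E_{meas.}}{P_{inst} \times T} \times 100\,\% \quad (1)$$
where $E_{meas.}$ is the measured generated energy, $P_{inst}$ is the installed capacity, and $T$ is the length of the period in hours.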
The CUF depends on the characteristics of each type of plant, because operating times differ. According to the U.S. Energy Information Administration (EIA), the CUF of coal thermal power plants decreased steadily from 56.2% in 2012 to 40.5% in 2020, whereas the CUF of oil thermal power plants increased from 13.7% in 2012 to 16.4% in 2021. The average CUF of natural gas power plants was 53.6%. The CUF of nuclear power plants is much higher than that of other plants: it was 93.4% in 2019, and the average was 91.7%, close to the ideal value, because nuclear plants operate for long periods at high capacity. Hydroelectric power plants also have high capacity, but their operating time is shorter than that of nuclear plants, so their average CUF from 2012 to 2021 was 39.3%. The CUF of wind power plants increased steadily from 31.8% in 2012 to 34.6% in 2021. The CUF of PV power plants was 24.5%, lower than that of most other power plants [13], because PV plants operate for fewer hours than other plants. Since PV output depends so strongly on the weather, a performance index suited to PV is required.
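Equation (1) is a single division, but the units are easy to mix up. The sketch below computes the CUF for a hypothetical 100 kW plant over one year (8,760 h). The plant and its energy figure are invented for illustration and were chosen so that the result matches the 24.5% PV value quoted above.

```python
HOURS_PER_YEAR = 8760


def capacity_utility_factor(energy_kwh: float, capacity_kw: float, hours: float) -> float:
    """CUF in percent: measured energy / (installed capacity x period length)."""
    if capacity_kw <= 0 or hours <= 0:
        raise ValueError("capacity and hours must be positive")
    return 100.0 * energy_kwh / (capacity_kw * hours)


# Hypothetical 100 kW plant producing 214,620 kWh in one year.
cuf = capacity_utility_factor(214_620, 100, HOURS_PER_YEAR)
print(f"CUF = {cuf:.1f} %")  # CUF = 24.5 %
```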

B. YIELD
Yield expresses the energy at each stage of the PV system, from the sunlight incident on the PV module to the AC output of the inverter, as an equivalent operating time in hours. It can be evaluated for each section of the PV system and expressed as a quantified value. The reference yield $Y_r$ in (2) describes the solar energy entering the PV module; it is the ratio of the measured solar radiation ($G_{a,meas.}$) to the reference irradiance of 1 kW/m² ($G_{a,ref}$). The array yield $Y_a$ in (3) is the ratio of the actually measured generated energy of the array ($P_{a,meas.}$) to the installed DC output capacity of the PV system ($P_{as}$). Lastly, the final yield $Y_f$ in (4) is the ratio of the measured AC output ($P_{f,meas.}$) to the DC output capacity ($P_{as}$). Equation (5) is the performance ratio (PR) calculated from the yields [14].
$$Y_r = \frac{G_{a,meas.}}{G_{a,ref}} \quad (2)$$
$$Y_a = \frac{P_{a,meas.}}{P_{as}} \quad (3)$$
$$Y_f = \frac{P_{f,meas.}}{P_{as}} \quad (4)$$
$$PR = \frac{Y_f}{Y_r} \quad (5)$$
PR is used to compare the performance of PV systems and to monitor a PV system over a long period. It is the ratio of the actually measured power to the theoretical power, and IEC 61724 defines it as the final yield divided by the reference yield. There are two types of PR: one that considers only solar radiation and one that also considers the module temperature coefficient.
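The chain of yields (2) to (5) can be made concrete with one day of invented numbers. The sketch below assumes daily in-plane irradiation in kWh/m², daily DC and AC energy in kWh, and a DC capacity $P_{as}$ in kW, so that every yield comes out in hours. The dataclass and function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

G_REF_KW_PER_M2 = 1.0  # reference irradiance G_a,ref = 1 kW/m^2


@dataclass
class Yields:
    reference: float  # Y_r in hours, Eq. (2)
    array: float      # Y_a in hours, Eq. (3)
    final: float      # Y_f in hours, Eq. (4)

    @property
    def performance_ratio(self) -> float:
        """PR = Y_f / Y_r, Eq. (5)."""
        return self.final / self.reference


def compute_yields(irradiation_kwh_m2: float, dc_energy_kwh: float,
                   ac_energy_kwh: float, p_as_kw: float) -> Yields:
    return Yields(
        reference=irradiation_kwh_m2 / G_REF_KW_PER_M2,
        array=dc_energy_kwh / p_as_kw,
        final=ac_energy_kwh / p_as_kw,
    )


# Hypothetical day: 5 kWh/m^2 of sunlight on a 100 kW array.
y = compute_yields(5.0, 400.0, 390.0, 100.0)
print(y.reference, y.array, y.final, round(y.performance_ratio, 3))  # 5.0 4.0 3.9 0.78
```

The gap between $Y_r$ and $Y_a$ reflects array-side losses, and the gap between $Y_a$ and $Y_f$ reflects conversion losses, which is why yields can be evaluated section by section.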

1) UNCORRECTED PR
Equation (6) is the uncorrected PR, which considers only solar radiation. It is the ratio of the actually measured power ($P_{meas.}$) to the rated capacity of the PV system ($P_{max,ref}$) scaled by the ratio of the plane-of-array irradiance to the reference irradiance ($G_{POA}/G_{ref}$) [15].
$$PR = \frac{P_{meas.}}{P_{max,ref} \cdot \dfrac{G_{POA}}{G_{ref}}} \quad (6)$$
2) TEMPERATURE-CORRECTED PR
Equation (7) is the temperature-corrected PR, which also considers the temperature coefficient of the PV module. In summer, when the ambient temperature is high, the uncorrected PR in (6) drops sharply, and in winter, when the ambient temperature is low, it is higher than at any other time. This seasonal bias comes from the characteristics of the solar cell, whose output falls as its temperature rises. Even after correction, the deviation of PR depends on how the expected output in the denominator of (7) is calculated [16].
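Equation (7) itself is not reproduced here. The sketch below therefore assumes one common form of temperature correction: the expected power in the denominator of (6) is derated by $1 + \gamma (T_{mod} - T_{ref})$, where $\gamma$ is the power temperature coefficient and $T_{ref} = 25$ °C. The coefficient $\gamma = -0.4\,\%/°C$ and the hourly data are hypothetical. The example reproduces the summer effect described above: the uncorrected PR looks poor on a hot day, while the corrected PR does not.

```python
from typing import Sequence

G_REF = 1000.0  # reference irradiance G_ref in W/m^2 (STC)
T_REF = 25.0    # reference module temperature in deg C (assumed STC value)


def pr_uncorrected(p_meas: Sequence[float], g_poa: Sequence[float],
                   p_max_ref: float) -> float:
    """Eq. (6) summed over intervals: measured power / irradiance-scaled rating."""
    expected = sum(p_max_ref * g / G_REF for g in g_poa)
    return sum(p_meas) / expected


def pr_temperature_corrected(p_meas: Sequence[float], g_poa: Sequence[float],
                             t_mod: Sequence[float], p_max_ref: float,
                             gamma: float) -> float:
    """Assumed form of (7): expected power is also derated by gamma per deg C."""
    expected = sum(
        p_max_ref * (g / G_REF) * (1.0 + gamma * (t - T_REF))
        for g, t in zip(g_poa, t_mod)
    )
    return sum(p_meas) / expected


# Hypothetical hot summer hours: 100 kW array, module temperatures of 50-55 deg C.
p_meas = [70.0, 77.0]   # kW
g_poa = [800.0, 900.0]  # W/m^2
t_mod = [50.0, 55.0]    # deg C
gamma = -0.004          # -0.4 %/deg C, a typical crystalline-Si value (assumed)

print(round(pr_uncorrected(p_meas, g_poa, 100.0), 3))                         # 0.865
print(round(pr_temperature_corrected(p_meas, g_poa, t_mod, 100.0, gamma), 3))  # 0.972
```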
The Performance Index (PI) is the ratio of the actually measured power ($P_{meas.}$) to the expected power ($P_{expected}$) calculated from a power prediction model, $PI = P_{meas.}/P_{expected}$. As with PR, the closer PI is to 1, the better the system. PI is detailed in SunSpec and in PV performance evaluations from San Jose University [17].
PPI is the ratio of the actual instantaneous power to the predicted instantaneous power. Its error is small on clear days, because the expected power is close to the actual power, but large on cloudy days. To reduce this uncertainty, PPI is analyzed only within a time window chosen according to solar altitude, or only when solar radiation exceeds a minimum threshold.
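The PPI calculation with an irradiance threshold can be sketched as follows. The 200 W/m² cutoff is a hypothetical value chosen for illustration, since the text does not give the minimum irradiance used. The predicted power would come from whichever regression model is selected, and here it is simply passed in as a list.

```python
from typing import List, Sequence


def power_performance_index(p_meas: Sequence[float], p_pred: Sequence[float],
                            g_poa: Sequence[float],
                            g_min: float = 200.0) -> List[float]:
    """Instantaneous PPI = P_meas / P_pred, kept only when G_POA >= g_min."""
    ppi = []
    for measured, predicted, irradiance in zip(p_meas, p_pred, g_poa):
        # Skip low-irradiance samples, where the relative prediction error is large.
        if irradiance >= g_min and predicted > 0:
            ppi.append(measured / predicted)
    return ppi


print(power_performance_index([5, 48, 90, 60], [10, 50, 90, 50], [80, 400, 850, 500]))
# [0.96, 1.0, 1.2]
```

Without the filter, the first sample (PPI = 0.5 at 80 W/m²) would pull the distribution down even though it mostly reflects prediction error at low light.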

2) ENERGY PERFORMANCE INDEX (EPI)
EPI is the ratio of the actual energy to the predicted energy over a given period. Adding correction factors such as temperature to the PR formula yields an index similar to PPI, and expressing that index in terms of energy rather than power gives EPI.
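EPI and PPI use the same measured and predicted series but aggregate them differently. The sketch below assumes equally spaced 5-minute power samples, matching the measurement interval of the test site, and uses invented values. It shows that EPI weights each sample by its energy, so a poor low-power sample affects EPI much less than it affects the mean of the instantaneous PPIs.

```python
from statistics import mean
from typing import Sequence

INTERVAL_MIN = 5.0  # sampling interval used at the test site


def energy_kwh(power_kw: Sequence[float], interval_min: float = INTERVAL_MIN) -> float:
    """Integrate equally spaced power samples (kW) into energy (kWh)."""
    return sum(power_kw) * interval_min / 60.0


def energy_performance_index(p_meas: Sequence[float], p_pred: Sequence[float],
                             interval_min: float = INTERVAL_MIN) -> float:
    """EPI = measured energy / predicted energy over the same period."""
    return energy_kwh(p_meas, interval_min) / energy_kwh(p_pred, interval_min)


p_meas = [5.0, 48.0, 90.0]
p_pred = [10.0, 50.0, 90.0]
epi = energy_performance_index(p_meas, p_pred)
mean_ppi = mean(m / p for m, p in zip(p_meas, p_pred))
print(f"EPI = {epi:.3f}, mean PPI = {mean_ppi:.3f}")  # EPI = 0.953, mean PPI = 0.820
```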
EPI can be analyzed with simulation programs such as SAM and PVsyst or with regression models. Simulation models such as SAM and PVsyst rely on accurate PV power models and show low error when used to track trends over time. In particular, SAM uses TMY3 data, which are used in a wide variety of meteorological fields. These data include many meteorological factors, but PV modeling needs only solar radiation and temperature, with wind speed sometimes added to increase accuracy. The regression analysis model depends on the types of sensors installed in the PV system. The PPI and EPI evaluation methods can be used to set a reference line for evaluation and future performance measurement after the trial run of a PV system or after O&M. Table 1 summarizes the types of performance evaluation indicators for PV systems. In this paper, the regression model considering solar radiation, temperature, wind speed, and humidity is used to calculate EPI, and for detailed analysis the result is expressed as PPI, an evaluation method for instantaneous power changes [17].
Fig. 1 and Table 2 … is 24. The monitoring system includes a pyranometer, thermometer, anemometer, hygrometer, and module temperature sensors. Monitoring was conducted for 1 year, and power and environmental data were measured every 5 minutes. Because the capacity of Unit 3 differed from that of the other units, the Unit 3 data were normalized by multiplying them by 1.125. Fig. 2 shows the cumulative power of Units 1 to 3. Overall, the cumulative power was low in winter (November to January) and gradually increased from February, and Units 2 and 3 show lower cumulative power than Unit 1. Fig. 3 shows the monthly CUF of Units 1 to 3, which follows a trend similar to Fig. 2. The highest CUF was 24% in May, and the lowest was 3 to 6% from November to January. … Fig. 2 and Fig. 3. The highest yield was 5.7 hours in May and the lowest was 0.7 hours in December.
Units 2 and 3 show clearly lower values than Unit 1 in the months when cumulative power, CUF, and yield are lowest, so these units were inspected and analyzed in this paper.
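The monthly CUF and yield figures are aggregated from 5-minute samples, with Unit 3 scaled by 1.125. The sketch below shows that aggregation for a daily final yield. The 100 kW capacity is hypothetical, because the site capacity is cut off in the text, and the helper name `daily_final_yield` is illustrative.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

INTERVAL_H = 5 / 60   # 5-minute sampling interval, in hours
UNIT3_SCALE = 1.125   # normalization factor applied to Unit 3 in the text


def daily_final_yield(samples: Iterable[Tuple[datetime, float]],
                      capacity_kw: float, scale: float = 1.0) -> Dict[date, float]:
    """Sum 5-minute power samples (kW) into daily energy and divide by capacity.

    Returns the daily final yield Y_f in hours (kWh per kW of capacity).
    """
    energy = defaultdict(float)
    for timestamp, power_kw in samples:
        energy[timestamp.date()] += scale * power_kw * INTERVAL_H
    return {day: kwh / capacity_kw for day, kwh in sorted(energy.items())}


# Hypothetical hour of constant 60 kW output on a 100 kW unit.
samples = [(datetime(2022, 7, 1, 12, 5 * i), 60.0) for i in range(12)]
print(daily_final_yield(samples, capacity_kw=100.0))
print(daily_final_yield(samples, capacity_kw=100.0, scale=UNIT3_SCALE))
```

Summing these daily values per month and dividing by the hours in the month would give the monthly CUF in the same way.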

IV. EXPECTED MODEL VALIDATION OF PV SYSTEM
The accuracy of the power prediction model is important because the predicted power is the denominator of the PI. The PI therefore reflects not only faults of the PV system but also prediction error, so the PV system can be analyzed more accurately by using a model with little error. The prediction models are listed in Table 3, and Tables 4, 5, and 6 show the sMAPE of each type of prediction model. MAPE is generally used in error analysis, but it is biased because it penalizes over- and under-prediction unequally. Unlike MAPE, sMAPE treats them symmetrically and has an upper limit of 200%. The mathematical expressions of MAPE and sMAPE are as follows [24], [25], where $A_t$ is the actual value, $F_t$ is the predicted value, and $n$ is the number of samples.
$$MAPE = \frac{100\,\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$
$$sMAPE = \frac{100\,\%}{n}\sum_{t=1}^{n}\frac{\left|F_t - A_t\right|}{\left(\left|A_t\right| + \left|F_t\right|\right)/2}$$
… are installed, has the lowest error. Unit 3 has the highest error among the three units.
VOLUME 11, 2023
Therefore, the Kim et al. (P5) model was used to calculate PPI, since it has the lowest error. This model was more accurate than the other regression models, which use irradiation and temperature only, especially in low-irradiation periods, because it includes humidity [32]. Its accuracy is therefore higher in low power generation periods than in high power generation periods, and Units 1, 2, and 3 all show smaller errors during the fault-detected period.
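The asymmetry that motivates sMAPE is easy to demonstrate. The sketch below implements both metrics as written above and compares an over-prediction and an under-prediction of the same 50 kW. The zero-denominator guard in `smape` is an implementation choice, not something specified in the text. Note that `mape` is undefined when any actual value is 0.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (undefined if any actual is 0)."""
    pairs = list(zip(actual, forecast))
    return 100.0 / len(pairs) * sum(abs((a - f) / a) for a, f in pairs)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent, bounded above by 200 %."""
    pairs = list(zip(actual, forecast))
    total = 0.0
    for a, f in pairs:
        denom = (abs(a) + abs(f)) / 2.0
        # A perfect forecast of zero contributes zero error instead of 0/0.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 / len(pairs) * total


# Over- and under-prediction by the same 50 kW:
print(mape([100.0], [150.0]), mape([150.0], [100.0]))    # MAPE differs
print(smape([100.0], [150.0]), smape([150.0], [100.0]))  # sMAPE is equal
print(smape([0.0], [10.0]))                              # upper bound: 200.0
```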

V. RESULTS
The PPI was calculated using the regression model with the lowest error, and its distribution was analyzed with a descriptive statistical method, the box plot. Fig. 5 and Table 7 show the concept of a box plot. It is represented by the maximum value, the minimum value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the box length (IQR = Q3 − Q1). The maximum and minimum values are usually taken within 1.5 times the IQR of the box, and data outside this range are considered outliers. Q1 is the median of the bottom 50% of the data, and Q3 is the median of the top 50%. Because a box plot shows the overall distribution of the data at a glance, it is useful for comparing data sets. For normally distributed data, the mean, median, and mode coincide, so the average and median lie at the same position in the box plot. In this study, an indicator for quality management is presented by analyzing the PPI distribution with box plots [33], [34]. Fig. 6 shows the box plot of PPI for the entire period. The IQR is similar in Units 1 and 2, but that of Unit 3 is almost twice as long. Also, the average and median of Units 1 and 2 are close to 1, but those of Unit 3 are greater than 1. Fig. 7 shows the box plot of PPI for the normal period, which excludes October to January. The IQR was reduced by nearly half, and accordingly the range between the minimum and maximum values was also reduced. The IQR of Unit 3 also decreased, but its maximum value did not change significantly, and only its minimum value increased slightly. Fig. 8 shows the PPI for the fault-detected period, from October to January. Overall, the IQR tends to increase in all units, and the positions of the median and average values change considerably.
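The box-plot quantities described above can be computed with the Python standard library. The sketch below follows the text's definition of Q1 and Q3 as medians of the bottom and top halves (excluding the overall median when the count is odd) and uses the 1.5 × IQR rule for whiskers and outliers. Other quartile conventions, such as the interpolation methods in `statistics.quantiles`, can give slightly different values. The sample PPI values are invented.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def box_plot_stats(values: Sequence[float]) -> Dict[str, object]:
    """Box-plot summary with quartiles taken as medians of the lower/upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 4:
        raise ValueError("need at least 4 values")
    # The overall median is excluded from both halves when n is odd.
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    q1, q2, q3 = median(lower), median(s), median(upper)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside: List[float] = [v for v in s if lo_fence <= v <= hi_fence]
    return {
        "mean": mean(s), "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in s if v < lo_fence or v > hi_fence],
    }


stats = box_plot_stats([0.90, 0.95, 1.00, 1.02, 1.05, 1.10, 2.00])
print(stats["median"], stats["outliers"])  # 1.02 [2.0]
```

The single outlier at 2.0 pulls the mean (about 1.15) above the median (1.02). That gap between the average and the median is exactly what the box-plot analysis reads as a sign of a skewed PPI distribution.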
For a detailed analysis of the PV power system, the PPI box plot was analyzed on a monthly basis. Figs. 9 to 11 show the monthly distribution of PPI. Fig. 9 shows Unit 1, whose average and median values are close to 1 over the entire period. For Unit 2 in Fig. 10, however, they are lower than 1 during the fault-detected period, and Unit 3 in Fig. 11 has lower values than Units 1 and 2 during the fault-detected period.
Although Unit 1 has a relatively large cumulative power, its box is long during the fault-detected period. Therefore, PVsyst was used to simulate the expected cumulative generated power of Units 1, 2, and 3. Fig. 12 compares the actually measured cumulative power of each unit with the Kim et al. (P5) model simulation and the PVsyst simulation. Overall, the simulations followed the same tendency as Units 1, 2, and 3 and were almost identical in the failure section. Because the PVsyst simulation used past environmental data, its absolute amount of generated power differs from the measurements, so it was used only to compare trends over time. As shown in Fig. 12, the power of Units 1 to 3 is significantly lower than the PVsyst simulation during the fault-detected period. The average and median of Unit 1 are close to 1, but its box is long. For Unit 2, the average and median are lower than 1, but the box length is the same as for Unit 1. For Unit 3, the average and median are lower than those of Units 1 and 2, and the box is longer. This shows that Unit 3 has a problem compared with Units 1 and 2. However, a comparison with the simulation shows that all the units have a problem during the fault-detected period. During this period, the low winter water level of the reservoir and the low solar altitude caused trees in front of Units 1, 2, and 3 to shade them, which is why power generation decreased rapidly. Unit 3 had the largest shaded area and therefore the lowest power generation, followed by Unit 2 and Unit 1. Except in winter, power generation showed the same trend as the PVsyst simulation. Since power generation decreases in winter because of the low solar altitude, the results were also compared with another PV system located within 7 km of the test site to isolate the effect of the season.
Figs. 13, 14, and 15 show the daily PPI box plots of Units 1 to 3 for September 2021, December 2021, and July 2022, respectively. According to Fig. 13, the average values of the daily PPI box plots are 0.80∼1.08 for Units 1 and 2 and 0.57∼0.82 for Unit 3. The median values are 0.82∼1.09 for Unit 1, 0.92∼1.09 for Unit 2, and 0.60∼0.79 for Unit 3. The IQR is 0.04∼0.32 for Unit 1, 0.04∼0.29 for Unit 2, and 0.04∼0.32 for Unit 3. This period is classified as a normal section. The IQR ranges of Units 1 to 3 are similar, but the average and median values of Unit 3 are lower than those of the other units. A PPI below 1 means that the predicted power is larger than the actual power. Therefore, Unit 3 is suspected to have failures, because its median and average values are lower than those of Units 1 and 2 even though its IQR is small.
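The ranges reported for Figs. 13 to 15 (for example, daily averages of 0.80∼1.08) are the minimum and maximum of per-day box-plot statistics. The sketch below groups PPI samples by day, computes each day's mean, median, and IQR using the same half-median quartiles as the text, and then reports the range of each statistic across days. The threshold of four samples per day and the example data are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def daily_ppi_summary(records: Iterable[Tuple[date, float]]) -> Dict[date, Tuple[float, float, float]]:
    """Group PPI samples by day and return {day: (mean, median, IQR)}."""
    by_day = defaultdict(list)
    for day, ppi in records:
        by_day[day].append(ppi)
    summary = {}
    for day in sorted(by_day):
        s = sorted(by_day[day])
        n = len(s)
        if n < 4:
            continue  # too few samples for meaningful quartiles
        q1, q3 = median(s[: n // 2]), median(s[(n + 1) // 2 :])
        summary[day] = (mean(s), median(s), q3 - q1)
    return summary


def ranges(summary: Dict[date, Tuple[float, float, float]]) -> Dict[str, Tuple[float, float]]:
    """Min-max range of each daily statistic across all days."""
    columns = list(zip(*summary.values()))
    return {name: (min(col), max(col)) for name, col in zip(("mean", "median", "iqr"), columns)}


records = [(date(2021, 12, 1), v) for v in (0.9, 1.0, 1.0, 1.1)]
records += [(date(2021, 12, 2), v) for v in (0.5, 0.7, 0.8, 1.2)]
print(ranges(daily_ppi_summary(records)))
```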
According to Fig. 14, the average values of the daily PPI box plots are 0.38∼1.59 for Unit 1, 0.05∼1.22 for Unit 2, and 0.01∼1.23 for Unit 3. The median values are 0.13∼1.29 for Unit 1, 0.03∼1.23 for Unit 2, and 0.00∼1.25 for Unit 3. The IQR is 0.04∼0.18 for Unit 1, 0.03∼0.72 for Unit 2, and 0.02∼1.32 for Unit 3. In December 2021, the median and average values are low and the IQR is large, which identifies this period as the fault-detected period. The IQR of Unit 2 is smaller than that of Unit 1 because the pyranometer is located at Unit 2, so the error between the actual and predicted values for Unit 2 is small.
According to Fig. 15, the average values of the daily PPI box plots are 0.90∼1.14 for Unit 1, 0.92∼1.11 for Unit 2, and 1.08∼1.26 for Unit 3. The median values are 0.94∼1.16 for Unit 1, 0.96∼1.12 for Unit 2, and 1.07∼1.27 for Unit 3. The IQR is 0.04∼0.24 for Unit 1, 0.03∼0.27 for Unit 2, and 0.06∼0.36 for Unit 3. In July 2022, the median and average values were close to 1 and the IQR was small, so the system can be regarded as operating normally. Unit 3 has a higher average, median, and IQR than the other units because its PPI was calculated using the predicted value from the regression analysis, which was lower than the actual value. Fig. 16 shows a rooftop PV system near the FPV test site, installed on the rooftop of Chungbuk Technopark, Korea (36°54′08.5″N 127°32′25.2″E). PPI was calculated for this system over the same period as for the FPVs. Fig. 17 shows the PPI box plot of the rooftop PV system. The average and median values were near 1 and the IQR was generally small, but the IQR was large in September 2021 and June 2022. As a result, it was confirmed that fault detection can be conducted using changes in the average, median, and IQR regardless of the season. The procedure of the method described above can be summarized as a diagram. The first step is to choose the best regression model through model validation. The second step is to calculate the PPI and summarize it as a box plot of the median, average, and IQR. The third step is fault detection. The ideal value of the median and average is 1, so a median or average that deviates from 1 indicates a lack of accuracy. The IQR indicates precision: if the IQR is more than twice the minimum IQR, the system can be said to have a precision fault.
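The third step of the procedure can be expressed as a small decision rule. The sketch below flags an accuracy fault when the average or median deviates from 1, and a precision fault when the IQR exceeds twice the minimum IQR among the compared periods, as stated above. The ±0.1 tolerance for "deviates from 1" is a hypothetical parameter, because the text does not give one. The period statistics are invented examples, not the paper's results.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class BoxStats:
    label: str
    mean: float
    median: float
    iqr: float


def classify(periods: Sequence[BoxStats], center_tol: float = 0.1,
             iqr_ratio: float = 2.0) -> Dict[str, List[str]]:
    """Flag accuracy faults (center far from 1) and precision faults (wide IQR)."""
    min_iqr = min(p.iqr for p in periods)
    results = {}
    for p in periods:
        flags = []
        if abs(p.mean - 1.0) > center_tol or abs(p.median - 1.0) > center_tol:
            flags.append("accuracy fault")
        if p.iqr > iqr_ratio * min_iqr:
            flags.append("precision fault")
        results[p.label] = flags or ["normal"]
    return results


periods = [
    BoxStats("2022-07", mean=1.02, median=1.01, iqr=0.10),
    BoxStats("2021-09", mean=0.98, median=0.99, iqr=0.25),
    BoxStats("2021-12", mean=0.60, median=0.55, iqr=0.45),
]
for label, flags in classify(periods).items():
    print(label, flags)
```

Using the minimum IQR as the baseline assumes that at least one of the compared periods operated normally. If every period is faulty, the ratio test has no valid reference.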

VI. CONCLUSION
In this paper, in contrast to existing PE methods, the most accurate regression model, which considers ambient temperature, relative humidity, and wind speed, was used to predict power and thereby improve accuracy. The ratio of the actually measured values to the predicted values of the PV system was calculated as the PPI, and the distribution of this index was presented as a quality management method using descriptive statistics. The closer the average and median of the PPI are to 1 and the smaller the IQR, the closer the PV system is to ideal. The average and median of Unit 1 are close to 1, and its IQR was the smallest; during the fault-detected period, its IQR tends to increase. Unit 2 has an IQR similar to that of Unit 1, but unlike Unit 1, its average and median values are lower than 1 during the fault-detected period. Unit 3 has the longest box (IQR), and its average was close to 1 but its median was lower than 1. During the normal period, its IQR is smaller than in the fault-detected period, but its average and median values were higher than 1.
The cumulative power of the three units was compared with the PVsyst simulation data, which showed that there was a poorly operating period in which the average and median were not close to 1 and the IQR was high. During the normal period, the average and median of Units 1 and 2 are close to 1, and the IQR is low. In contrast, the average and median of Unit 3 are higher than 1, and its IQR is also higher than those of Units 1 and 2. This happened because the predicted value was lower than the actual value, which made the PPI greater than 1.
By comparing the fault-detected period with the normal period through changes in the average, median, and IQR of the PPI, the PV system could be evaluated in terms of the quality of these indicators. The performance of a PV system evaluated through regression analysis can therefore be expressed as a quality indicator, and the moment a problem occurs can be quantitatively analyzed through the average, median, and IQR size.
As a result, the performance of a PV system can be quality-managed using the descriptive statistics of PPI, and the moment a problem occurs can be analyzed quantitatively. The PPI values and the distributions of the IQR, average, and median vary with the capacity and installation environment of the PV power plant, so further research on environmental data from various environments is needed and is left for future work. This paper could contribute to the O&M of PV systems, especially FPVs and marine photovoltaics (MPVs), whose generated power varies strongly with environmental variables.