Improved PV Forecasts for Capacity Firming

Some balancing authorities give owners of medium to large photovoltaic (PV) generation plants a choice between firming the production of their plants using battery energy storage or paying the balancing authority for the cost that these imbalances impose on the system. If the owner of a PV plant decides to do capacity firming, the net production of the PV plant and the battery must match a forecast value. A more accurate forecast of the PV production reduces the energy throughput of the battery and hence its degradation. This article compares capacity firming using persistence forecasts with predictions based on long short-term memory recurrent neural networks (LSTM-RNN), encoder-decoder LSTM-RNN and multi-layer perceptrons. This article also proposes using the type-of-day (e.g., sunny or cloudy), which can be generated by clustering historical PV generation data according to the total daily PV generation, as a feature of the PV forecasting model. Results based on the Snohomish County Public Utility District's Arlington Microgrid show that the machine learning techniques perform significantly better than the persistence method in forecasting PV generation. In particular, encoder-decoder LSTM-RNN would reduce the yearly battery energy throughput by 29% and the number of battery cycles with a greater than 10% depth-of-discharge (DoD) by 51%. Including the day-type as a feature in PV forecasting reduces the battery energy throughput by 5.3% and the number of cycles with a DoD larger than 10% by 5.9%.


I. INTRODUCTION
Providing the resources needed to balance the increasing amount of naturally variable and uncertain generation from solar and wind can be very costly for the balancing authorities (BA) that are responsible for maintaining the balance between load and generation within their territory. Some have therefore given the owners of these resources a choice between paying a balancing fee and firming up the output of their plants using their own resources. This work is motivated by the capacity firming requirements of medium to large photovoltaic (PV) generation plants in the balancing area of the Bonneville Power Administration (BPA) and presents results based on the 500 kWac PV system of the Arlington Microgrid owned by Snohomish County Public Utility District (Snohomish PUD) [1].
Snohomish PUD can either pay a fee for capacity firming to BPA or compensate deviations between the actual and forecast PV generation using the 1 MW/1.385 MWh battery energy storage system (BESS) of the microgrid.
Capacity firming is based on a one-hour-ahead target PV generation determined considering the PV forecast, the maximum allowed ramp rate of the PV profile and the battery state-of-charge (SoC) that should be achieved at the next time-step. The BESS is charged when the PV generation is above the target value or discharged when the PV generation is below this value. In most cases the BESS has enough power and energy capacity to achieve this goal. However, the more inaccurate the forecast is, the more the battery has to compensate with deeper cycles. Since deeper cycles cause more battery degradation [2], [3], it may be more economical in the long run to pay the fee for capacity firming to BPA rather than firming the capacity using the battery. Since more accurate forecasts extend the life of the battery, a careful analysis of the benefits of improved accuracy is thus required.
This article compares state-of-the-art PV forecasting techniques, such as long short-term memory recurrent neural networks (LSTM-RNN), encoder-decoder LSTM-RNN, multi-layer perceptrons (MLP), and the persistence method suggested by BPA. The paper also proposes to use the type-of-day as a feature in the PV forecasting model. The type-of-day is generated by clustering historical PV generation data according to the total daily PV output. The cluster with the highest PV generation is classified as sunny while the cluster with the least PV generation is classified as cloudy. These forecasting methods are compared using not only the root-mean-square error (RMSE), mean-absolute error (MAE) and mean-bias error (MBE), but also the resulting number and depth of the battery cycles.
The remainder of this article is structured as follows: Section II explains capacity firming; Section III reviews the existing PV forecasting techniques and discusses the machine learning based forecasting techniques; Sections IV and V describe the model, while Section VI presents and discusses the simulation results. Section VII concludes the paper.

II. CAPACITY FIRMING
This section explains the incentive scheme for capacity firming (Section II-A), formulates this problem mathematically (Section II-B) and describes the PV forecasting technique proposed by BPA (Section II-C).
A. INCENTIVE SCHEME FOR PV CAPACITY FIRMING
BPA, the balancing authority for the Snohomish PUD service region, has decided to charge $5400 per year as the cost of capacity firming for the 500 kWac PV system of the Arlington Microgrid. Snohomish PUD could save this cost by ensuring that the actual PV production matches their one-hour-ahead prediction. If Snohomish PUD chooses this option, BPA has proposed to use the persistence approach for PV forecasting described in Section II-C.

B. FORMULATION OF THE CAPACITY FIRMING PROBLEM
The one-hour-ahead PV generation can be firmed using a BESS. The battery is charged when the PV generation is above the target value and discharged when the PV generation is below this value. Fig. 1 shows the energy flow diagram for this capacity firming application of the BESS.
The decision to do capacity firming or not depends on a cost/benefit analysis. While Snohomish PUD could avoid paying this fee to BPA if they did capacity firming, the extra battery cycling required would increase battery degradation and carry a long term cost. More accurate PV forecasts would require fewer battery cycles, cause less battery degradation costs and hence improve the cost/benefit ratio from capacity firming.
The capacity firming problem can be formalized as follows, where K is the total number of time-steps in a day and k denotes a particular time-step. We choose 30-minute intervals because this is the duration that BPA uses to assess capacity firming.

The battery charge/discharge rate p^b_k is the difference between the target output p^t_k and the actual PV generation p^{pv,actual}_k:

    p^b_k = p^t_k - p^{pv,actual}_k    (1)

where a positive value of p^b_k means that the battery is discharging and a negative value that it is charging.

The target output p^t_k is estimated considering the PV forecast, the maximum allowed ramp rate of the PV profile and the desired battery SoC at the next time-step. The target PV generation is calculated as follows:

    p^t_k = p^{pv,forecast}_k + 2 C (B^SoC_k - B^{SoC,50%})    (2)

where B^SoC_k is the battery SoC at time-step k in kWh, B^{SoC,50%} is the 50% battery SoC in kWh, and C determines how close to the 50% level the SoC should remain. A higher value of C keeps the SoC closer to 50%. The factor 2 is needed to convert energy into power because we use 30-minute intervals. The BESS is assumed to have enough power and energy capacity to correct the expected range of deviations in PV generation. Finally, the target PV generation submitted to BPA cannot violate the ramp rate limits:

    |p^t_k - p^t_{k-1}| <= γ_r    (3)

where γ_r is the ramp rate limit. The battery SoC, B^SoC_k ∈ [s^{SoC,min}, s^{SoC,max}], evolves according to:

    B^SoC_{k+1} = B^SoC_k - p^b_k / (2 η_b)    if p^b_k >= 0 (discharging)
    B^SoC_{k+1} = B^SoC_k - η_b p^b_k / 2      if p^b_k < 0  (charging)    (4)

where η_b is the battery efficiency. The number of battery cycles that a lithium-ion battery can undergo before its capacity falls below an acceptable threshold depends on the depth-of-discharge (DoD) of these cycles. The number of battery cycles and their magnitudes are calculated using the rainflow cycle counting algorithm, and these numbers are then used to quantify the battery degradation.
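The dispatch rules above can be sketched in a few lines of Python. The variable names, the sign convention (positive battery power means discharging) and the use of a single efficiency applied on the appropriate side of each transfer are our reading of the formulation, not code from the paper:

```python
def target_output(pv_forecast_kw, soc_kwh, soc_50_kwh, c=1.2):
    """Target PV output: the forecast plus a correction that steers the
    SoC back toward its 50% level. The factor 2 converts kWh over a
    30-minute interval into kW; a larger c corrects more aggressively."""
    return pv_forecast_kw + 2.0 * c * (soc_kwh - soc_50_kwh)

def clamp_ramp(target_kw, prev_target_kw, ramp_limit_kw):
    """Enforce the ramp-rate limit on consecutive submitted targets."""
    lo = prev_target_kw - ramp_limit_kw
    hi = prev_target_kw + ramp_limit_kw
    return min(max(target_kw, lo), hi)

def step_soc(soc_kwh, p_batt_kw, eff=0.95):
    """Advance the SoC over one 30-minute step. Positive p_batt_kw
    discharges the battery; a single efficiency is assumed here (the
    paper uses separate charging/discharging efficiencies)."""
    if p_batt_kw >= 0:                       # discharging
        return soc_kwh - p_batt_kw / (2.0 * eff)
    return soc_kwh - eff * p_batt_kw / 2.0   # charging (p_batt_kw < 0)
```

With eff=1.0, discharging at 100 kW for half an hour removes exactly 50 kWh, which is a quick way to sanity-check the factor of two.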
The net present value (NPV) of capacity firming is calculated by subtracting the present value of the battery replacement cost at the end of the project lifetime from the present value of all the revenue over the lifetime of the battery. The battery replacement cost is the expected battery capacity degraded over the lifetime multiplied by the battery replacement cost per kWh.

C. PERSISTENCE APPROACH TO FORECASTING
Because of its simplicity, BPA has suggested that Snohomish PUD use the persistence forecasting technique for one-hour-ahead PV forecasting. For example, under 30/30 persistence forecasting, the net generation for the 2:00 PM to 2:30 PM interval is calculated by taking the average of the generation output from 1:00 PM to 1:30 PM. Similarly, the schedule for the 3:00 PM to 3:30 PM interval is calculated by taking the average of the generation output from 2:00 PM to 2:30 PM.
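A minimal sketch of the 30/30 persistence rule follows. The 5-minute metering interval is an assumption (the text does not specify the sampling rate); the returned average becomes the schedule for the interval that starts 30 minutes after the last measurement, e.g. the 1:00-1:30 PM average becomes the 2:00-2:30 PM schedule:

```python
def persistence_30_30(samples_kw, samples_per_interval=6):
    """30/30 persistence forecast: the schedule for an upcoming
    30-minute interval equals the average output over the most recent
    completed 30-minute interval. With 5-minute samples, six samples
    make up one interval."""
    window = samples_kw[-samples_per_interval:]
    return sum(window) / len(window)
```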

III. PV FORECASTING USING MACHINE LEARNING
This section first presents a brief review of existing PV forecasting techniques and then describes the state-of-the-art techniques that have been shown to provide the best PV forecasts.

A. BRIEF REVIEW OF EXISTING SOLUTION TECHNIQUES
A range of techniques have been proposed to optimize capacity firming [4]-[10], but these approaches did not include the state-of-the-art machine learning techniques discussed in this article.
The underlying PV forecasting problem is a sequence-to-sequence (seq2seq) time series prediction because the PV generation has to be predicted two time-steps ahead. PV forecasting techniques can be categorized into physical models and data-driven models. Physical models use numerical weather prediction, which shows good performance for forecast horizons from several hours up to six days [11], [12]. Data-driven models can be further divided into statistical and machine learning models. Statistical models include auto-regressive integrated moving average, auto-regressive moving average, coupled auto-regressive and dynamic system, Lasso, and Markov models. Machine learning models include support vector machines, feed-forward neural networks, and RNNs such as LSTM networks [13]-[28]. These approaches can be further subdivided according to the type of input features that are used to train the model. A forecasting model that uses only a target time-series as an input feature (solar irradiance in this case) is referred to as a nonlinear auto-regressive (NAR) model. On the other hand, if a model uses additional exogenous inputs, such as temperature and humidity, it is referred to as a nonlinear auto-regressive with exogenous inputs (NARX) model. According to [13], [18], a vector output LSTM-RNN performs the best at forecasting day-ahead PV generation.
Section III-B describes feedforward neural networks, also known as multi-layer perceptrons, while Section III-C is devoted to LSTM-RNN and encoder-decoder LSTM-RNN. The encoder-decoder LSTM-RNN was first proposed and applied to speech recognition problems [29], [30] in order to effectively solve seq2seq time-series problems with multiple outputs. Our encoder-decoder LSTM-RNN implementation is based on [18].

B. MULTI-LAYER PERCEPTRONS
In general, MLPs are feedforward neural networks with multiple layers of perceptrons. A perceptron is a single-neuron model that applies an activation function to a weighted sum of its input signals to produce an output signal. An MLP consists of at least three layers: an input layer, one or more hidden layers and an output layer, as depicted in Fig. 2. The input, hidden and output nodes are denoted x_d, z_m and y_r, where the total numbers of input, hidden and output nodes are D, M and R, while d, m and r denote a particular node. The output value y_r is calculated using:

    y_r = σ( Σ_{m=1}^{M} w^{(2)}_{rm} H( Σ_{d=1}^{D} w^{(1)}_{md} x_d + b^{(1)}_m ) + b^{(2)}_r )    (5)

where σ and H are the activation functions of the output and hidden layers, respectively. The biases b^{(1)}_m and b^{(2)}_r are those of hidden node m and output node r, respectively. The hidden-input and output-hidden weight matrices are w^{(1)}_{md} and w^{(2)}_{rm}, respectively.
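The forward pass just described can be written compactly in NumPy. The tanh hidden activation and linear output activation are assumptions (common choices for regression; the paper does not state them):

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """Single-hidden-layer MLP forward pass: the output is
    sigma(w2 @ H(w1 @ x + b1) + b2), with H = tanh and a linear
    output sigma. Shapes: x (D,), w1 (M, D), b1 (M,), w2 (R, M),
    b2 (R,); returns an (R,) output vector."""
    hidden = np.tanh(w1 @ x + b1)   # hidden-layer activations, shape (M,)
    return w2 @ hidden + b2         # linear output layer, shape (R,)
```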
MLPs utilize a supervised learning technique called backpropagation for training, and the training data needs to be arranged in a 2-dimensional (2D) matrix of samples and features. Therefore, using MLPs for time-series prediction requires encoding the time sequence data as features (more details in Section IV). The other option is to use an RNN, which is explained in the next section.

C. LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
LSTM networks are a special type of RNN capable of learning long-term dependencies. Fig. 3 illustrates the RNN and Fig. 4 the LSTM cell [13]. An RNN has a feedback connection: its output depends on the current input to the network as well as on the previous inputs, outputs, and/or hidden states of the network, as depicted in Fig. 3. Given an input sequence x = (x_1 ... x_t ... x_T), an RNN computes the hidden vector sequence h = (h_1 ... h_t ... h_T) and the output vector sequence y = (y_1 ... y_t ... y_T) by iterating (6) and (7):

    h_t = H(W_xh x_t + W_hh h_{t-1} + b_h)    (6)
    y_t = W_hy h_t + b_y                      (7)

where H is the activation function of the hidden layer. W_xh, W_hh and W_hy are the weight matrices of the input-hidden, hidden-hidden, and hidden-output connections, respectively. The hidden and output bias vectors are b_h and b_y.
The error gradients of RNNs trained using backpropagation through time can accumulate during an update and result in very large gradients. These large gradients cause large updates to the network weights and, in turn, an unstable network. At the extreme, the weights can become so large as to overflow and result in non-computable values. LSTM networks were developed to overcome the exploding back-propagated gradients of RNNs by providing explicit memory to the network. LSTM units are the building blocks of the layers of an RNN. A typical LSTM unit consists of an input gate, a forget gate, an output gate, and a cell unit.
The operations of the LSTM unit can be described as follows. The most important component is the cell state c_t, which serves as a memory and retains values over arbitrary time intervals. The input gate i_t, forget gate f_t, and output gate o_t control the flow of information into and out of the cell and have the same size as the hidden vector h. The forget gate f_t outputs a value between 0 and 1 for each number in the cell state to decide what information should be kept from the previous cell state c_{t-1}, according to:

    f_t = σ(W_f [h_{t-1}, x_t] + b_f)    (8)

where σ is the sigmoid activation function. W_f and b_f are the weight matrix and bias of the forget gate, respectively. Similarly, the input gate output i_t decides the new input information that should accumulate in the memory cell:

    i_t = σ(W_i [h_{t-1}, x_t] + b_i)    (9)

where W_i and b_i are the weight matrix and bias of the input gate, respectively. The LSTM cell state is then updated as follows, with conditional self-loop weights W_c and b_c:

    c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c [h_{t-1}, x_t] + b_c)    (10)

The output hidden state h_t of the LSTM cell depends on the cell state c_t and the output gate o_t:

    h_t = o_t ⊙ tanh(c_t)    (11)

The output gate o_t is calculated using:

    o_t = σ(W_o [h_{t-1}, x_t] + b_o)    (12)

where W_o and b_o are the weight matrix and bias of the output gate. Note that the hidden state h_t can be shut off via the output gate o_t, which uses a sigmoid activation function.
Encoder-decoder LSTM-RNNs further improve the solution quality. This architecture, depicted in Fig. 5, involves two models: one reads the input sequence and encodes it into a fixed-length vector, while the second decodes the fixed-length vector and outputs the predicted sequence. The next section models the PV forecasting problem.
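The gate operations described above can be traced in a few lines of NumPy. Stacking the weights over the concatenated [h_{t-1}, x_t] is one common packing convention, assumed here for compactness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time-step. W and b are dicts with keys 'f', 'i', 'c',
    'o' holding the forget-gate, input-gate, cell and output-gate
    weights/biases, each acting on the concatenated [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])                        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])                        # input gate
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ z + b["c"])   # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])                        # output gate
    h_t = o_t * np.tanh(c_t)                                  # hidden state
    return h_t, c_t
```

With all-zero weights, every gate outputs 0.5, so the cell state simply decays by half each step, which is an easy sanity check of the update rule.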
IV. MODELING OF THE PV FORECASTING PROBLEM
The PV forecasting model involves three steps: data preparation, training and testing. Steps one and two are done day-ahead while step three is done in real-time.

A. DATA PREPARATION
We used two datasets for the analysis: Dataset A contains the actual PV generation of the Arlington Microgrid from June 5, 2019 to June 4, 2020, and Dataset B contains PV generation data generated using the empirical formulas from [31] at the approximate location of the Arlington Microgrid (latitude 48.17 and longitude -122.14), based on National Renewable Energy Laboratory (NREL) historical solar insolation and meteorological data over six years (2013 to 2018). Historical data are separated into training and testing sets and arranged in a 3D matrix (samples, time sequence, features) for RNNs and a 2D matrix (samples, features) for MLPs. We use PV output, temperature, time-of-day, season and day-type as features. Days are classified into sunny, less sunny, less cloudy and cloudy day-types by clustering historical PV generation data according to the total daily PV output using a k-means algorithm. Fig. 6 shows the median PV profiles for the four day-types over a year. When predicting, the day-type is estimated a day ahead using a weather forecast or a day-ahead PV forecast. Preliminary simulations showed that the optimum length of the input time-sequence is 47 time-steps (i.e., one day). The outputs are the PV generation for the next two time-steps.
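The day-type clustering step can be sketched with scikit-learn's k-means (the library choice is ours; the text only specifies a k-means algorithm on total daily PV output). The relabeling so that 0 = cloudy and 3 = sunny is our convention for readability:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_day_types(daily_totals_kwh, n_types=4):
    """Cluster days by total daily PV energy into n_types day-types and
    relabel the clusters in order of increasing energy, so that label 0
    is the cloudiest group and label n_types-1 the sunniest."""
    x = np.asarray(daily_totals_kwh, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_types, n_init=10, random_state=0)
    raw = km.fit_predict(x)
    order = np.argsort(km.cluster_centers_.ravel())   # low -> high energy
    remap = {int(old): new for new, old in enumerate(order)}
    return np.array([remap[int(r)] for r in raw])
```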

B. TRAINING
Training the non-parametric model maps its inputs to its outputs, as shown in Fig. 7. The parameters of the LSTM-RNN are given in Table 1.
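As a concrete illustration, an encoder-decoder LSTM-RNN of the kind trained here can be assembled in Keras, the library used in this work; the layer sizes below are illustrative and are not the exact hyperparameters of Table 1:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_encoder_decoder(n_steps_in=47, n_features=5, n_steps_out=2, units=50):
    """Encoder-decoder LSTM sketch: the encoder compresses the input
    sequence into a fixed-length vector, which is repeated once per
    forecast step and decoded into the two 30-minute-ahead outputs."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps_in, n_features)),
        layers.LSTM(units),                            # encoder -> fixed-length vector
        layers.RepeatVector(n_steps_out),              # repeat for each output step
        layers.LSTM(units, return_sequences=True),     # decoder
        layers.TimeDistributed(layers.Dense(1)),       # one PV value per step
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training then reduces to a single `model.fit(X, y)` call on the 3D (samples, time sequence, features) arrays described in Section IV-A.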
Since Dataset A covers only a year, we cluster the PV generation data into four day-types according to the total daily PV generation using a k-means algorithm. The aim here is to identify the performance of the machine learning techniques on different types of days. Each of these clusters is trained separately and, because of the limited sample size, only the PV generation was used as a feature (i.e. we implemented a NAR model). On the other hand with the NREL dataset (Dataset B) we were able to use all the features mentioned in Fig. 7 (i.e. we implemented a NARX model). Therefore, we used the Dataset B for the economic analysis.

C. TESTING
The testing dataset was used to make real-time predictions for the next two time-steps, i.e., one-hour-ahead forecasts. For Dataset A, 28 days in each cluster are used for testing and the remainder is used for training, as shown in Table 1. On the other hand, for the NREL dataset, one year was used for testing and five years for training. The machine learning models were trained and tested using Keras, a high-level neural network application programming interface written in Python, on a computer with a 2.8 GHz Intel Core i7 processor and 16 GB of 2133 MHz LPDDR3 memory.

V. BATTERY MODELING
Operation of the 1 MW/1.385 MWh lithium-ion battery of the Arlington Microgrid is limited to the 10% to 90% SoC range. Fig. 8 shows the expected degradation of this battery over ten years assuming one energy throughput cycle per day, i.e., if there are multiple cycles, the total daily discharge equals 800 kWh. The average charge and discharge rate is assumed to be 1 C. Note that, after ten years of degradation, this BESS is expected to retain 1 MWh of capacity. To compare the battery degradation resulting from the four PV forecasting methods, we generated battery SoC profiles for given actual and forecast PV generation using (1)-(3). The battery charging, battery discharging and inverter efficiencies are assumed to be 98%, 92% and 92%, respectively. The value of C is set to 1.2. Based on these SoC profiles, we use the rainflow algorithm to calculate the battery cycles and their magnitudes. Figures 9, 10, and 11 show the PV generation forecasts and the battery SoC for typical cloudy, less sunny and sunny days, respectively. Tables 2 (Dataset B) and 3 (Dataset A) compare the RMSE, MAE, MBE and the resulting battery cycles of the LSTM-RNN, encoder-decoder LSTM-RNN, MLPs and the 30/30 persistence forecasting techniques. In all cases, the machine learning forecasts are significantly better than the 30/30 persistence forecasts. The RMSE and MAE over a year from the encoder-decoder are 35.7% and 42.6% better than those of the persistence method, respectively (Table 2). The accuracy of the PV forecast varies with the type of day and the machine learning technique, as shown in Table 3. For example, LSTM-RNN produced the best PV forecasts on sunny and less sunny days while encoder-decoder LSTM-RNN performed the best on less cloudy days.

VI. SIMULATION RESULTS AND DISCUSSION
The number of yearly energy throughput cycles can be reduced by 29.1% (66 cycles per year) using the encoder-decoder LSTM-RNN forecasts. Battery cycles involving more than a 10% depth-of-discharge (DoD) can be reduced by 51%. Such deep cycles have a disproportionate effect on lithium-ion battery degradation. Unfortunately, due to the limited amount of data available about battery degradation, it has not been possible to quantify this effect more accurately. The benefit of using machine learning based forecasts is much higher during sunny days because incorrect forecasts result in higher battery energy throughput, as shown in Fig. 11.

TABLE 2. Simulation results based on NREL data over a year using five years of training data. The improvements over the persistence method are given as percentages in parentheses. The ''e'', ''b'' and ''h'' in the training time column are the epochs, batch size and number of hidden units, respectively. A positive value for MBE indicates an under-prediction while a negative value indicates an over-prediction.
The PV forecast from any of the machine learning techniques are significantly improved by using the type-of-day as a feature, as shown in Table 2. The battery energy throughput, the number of cycles above 10% DoD and the RMSE are improved by 5.3%, 5.9%, and 5.3%, respectively, by using day-type as feature in the encoder-decoder LSTM-RNN. Similar improvements are seen with the MLPs and LSTM-RNN.
In general, over-forecasting and under-forecasting have different effects on the battery usage because the PV output can be curtailed instead of charging the battery to meet the target generation. However, since the PV array of the Arlington Microgrid is part of a community solar project where each panel is owned by a different individual, the PV array is expected to always operate at maximum power (i.e., PV curtailment is not allowed). Given this, the only effect that over-forecasting and under-forecasting have on our capacity firming problem stems from the different battery charging and discharging efficiencies. For the sake of completeness, we compare the MBE of the PV forecasts in Tables 2 and 3. The small MBE values in Table 2 mean that our yearly forecasts consist of equal amounts of over-forecasts and under-forecasts. According to Table 3, all the machine learning techniques over-forecast on sunny and less sunny days. This could be because our Dataset A is limited in size.
Tables 2 and 3 show that the time required for training MLPs is significantly smaller than for LSTM-RNN and encoder-decoder LSTM-RNN. However, the testing time is similar for all three techniques. Given that the offline training is done on a fast computer, the training time should not be considered when deciding on the best PV forecasting method.
Fig. 12 compares how using the best machine learning forecast and the persistence forecast affects the net present value (NPV) of capacity firming. Since it is difficult to know what the battery replacement cost will be in ten years, these values have been calculated for a range of replacement costs.

FIGURE 12. Net present value of capacity firming using encoder-decoder LSTM-RNN and 30/30 persistence after 10 years vs. battery replacement cost in 10 years. The revenue is fixed at $5400 per year, the discount rate is assumed to be 5% and the battery degradation per cycle is based on Fig. 8.
This figure shows that encoder-decoder LSTM-RNN based PV forecasting makes capacity firming more profitable than persistence forecasting. If the battery replacement cost is high, capacity firming is not profitable when persistence forecasting is used. In these NPV calculations, the revenue from capacity firming is assumed to be $5400 per year (i.e., the amount that Snohomish PUD would not have to pay to BPA), the discount rate is assumed to be 5%, the battery degradation is assumed to be 0.00687945% per cycle based on Fig. 8, and the yearly battery cycles are taken from Table 2.
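The NPV calculation behind Fig. 12 can be reproduced as follows. The $5400/year revenue, 5% discount rate and 0.00687945% per-cycle degradation (6.87945e-5 as a fraction) come from the text; the cycles-per-year and replacement-cost-per-kWh values are placeholders to be swapped for the Table 2 results and the assumed future cost:

```python
def npv_capacity_firming(annual_revenue=5400.0, discount_rate=0.05,
                         years=10, cycles_per_year=160,
                         degradation_per_cycle=6.87945e-5,
                         capacity_kwh=1385.0,
                         replacement_cost_per_kwh=200.0):
    """NPV of capacity firming: the discounted stream of avoided BPA
    fees, minus the discounted cost of replacing the battery capacity
    degraded over the project lifetime."""
    pv_revenue = sum(annual_revenue / (1.0 + discount_rate) ** t
                     for t in range(1, years + 1))
    degraded_kwh = capacity_kwh * degradation_per_cycle * cycles_per_year * years
    pv_replacement = (degraded_kwh * replacement_cost_per_kwh
                      / (1.0 + discount_rate) ** years)
    return pv_revenue - pv_replacement
```

Sweeping `replacement_cost_per_kwh` over a plausible range reproduces the shape of Fig. 12: the NPV falls linearly as the assumed replacement cost rises, and falls faster the more cycles the forecast method induces.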
Our analysis shows that it is beneficial to do capacity firming if the PV forecasts are based on a state-of-the-art machine learning technique, and if capacity firming is already implemented in the BESS or the cost of implementing it is low.

VII. CONCLUSION
This article compared capacity firming using photovoltaic (PV) forecasts based on long short-term memory recurrent neural networks (LSTM-RNN), encoder-decoder LSTM-RNN, multi-layer perceptrons and the 30/30 persistence approaches. The results showed that the encoder-decoder LSTM-RNN performs significantly better than the persistence method in forecasting PV generation and therefore significantly reduces the battery degradation cost of capacity firming.

DANIEL S. KIRSCHEN (Fellow, IEEE) received the Ph.D. degree from the University of Wisconsin-Madison and the Electromechanical Engineering degree from the Free University of Brussels, Belgium. He is currently the Donald W. and Ruth Mary Close Professor of electrical engineering with the University of Washington. His research interests include the integration of renewable energy sources in the grid, power system economics, and power system resilience. Prior to joining the University of Washington, he taught for 16 years at The University of Manchester, U.K. Before becoming an academic, he worked at Control Data Corporation and Siemens on the development of application software for utility control centers. He is the author of two books.
NATHAN SHIH (Student Member, IEEE) is currently pursuing the bachelor's degree in electrical and computer engineering with the University of Washington. He was an Undergraduate Researcher with the University of Washington's Renewable Energy Analysis Laboratory and has previously interned at the National Renewable Energy Laboratory and the NASA Ames Research Center. His research interests include renewable energy, energy poverty, and sustainability.
SCOTT GIBSON (Member, IEEE) received the B.S.E.E. degree from the University of Wyoming, in 1987. He is currently a Principal Engineer with Snohomish County PUD. After graduation, he worked at Boeing and then with a consulting company designing building electrical systems. In 2000, he joined the PUD, where he works on new electrical generation projects. He worked on the development of a tidal generation project and helped to design and construct two run-of-the-river hydroelectric projects. He is the Project Manager of the PUD's latest energy storage project, the Arlington Microgrid.