Statistical Evaluation of Sentinel-3 OLCI Ocean Color Data Retrievals

We employ a previously developed statistical method to evaluate the performance of the Sentinel-3 Ocean and Land Colour Instrument (OLCI) global ocean color data relying on the temporal stability of the retrievals. We analyze the normalized water-leaving reflectance <inline-formula> <tex-math notation="LaTeX">$\rho _{\mathrm {wN}}(\lambda)$ </tex-math></inline-formula> spectra generated by the National Oceanic and Atmospheric Administration (NOAA) Multi-Sensor Level-1 to Level-2 (MSL12) ocean color data processing system from the OLCI measurements and the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT)-Instrument Processing Facility for OLCI Level-2 (IPF-OL-2) OLCI reflectance <inline-formula> <tex-math notation="LaTeX">$\rho _{\mathrm {wN}}(\lambda)$ </tex-math></inline-formula> spectra. The deviations in <inline-formula> <tex-math notation="LaTeX">$\rho _{\mathrm {wN}}(\lambda)$ </tex-math></inline-formula> spectra from temporally and spatially averaged baseline data are statistically evaluated corresponding to various parameters, including the solar-sensor geometry, various ancillary data (i.e., surface wind speed, sea-level atmospheric pressure, water vapor amount, and ozone concentration), and other related parameters. Our results show that, under most conditions, both NOAA-MSL12 and EUMETSAT-IPF-OL-2 data processing systems produce statistically consistent ocean color products in the open ocean with respect to all corresponding parameters analyzed but with some underestimates of <inline-formula> <tex-math notation="LaTeX">$\rho _{\mathrm {wN}}(\lambda)$ </tex-math></inline-formula> spectra by EUMETSAT retrievals in moderate sun glint conditions being the notable exception.


I. INTRODUCTION
A N INCREASED awareness of global environmental changes and interconnections has prompted growing interest and investments in global Earth observation research programs, which increasingly rely on satellite observations. The EU Copernicus program (https://www.copernicus.eu), including the Sentinel satellite missions, is designed to provide comprehensive multidecadal global atmosphere, marine, and land observations. Sentinel-3 is part of the Copernicus program focused on the global ocean and land surface observations [1]. Sentinel-3A was launched on February 16, 2016, and carries an instrument suite, including the Ocean and Land Colour Instrument (OLCI) [2]. OLCI is the next-generation successor of the Medium Resolution Spectrometer (MERIS) on the ENVISAT mission [3]. Sentinel-3A is flying in a polar Sunsynchronous orbit with morning local overpass time, and the OLCI field-of-view is tilted slightly westward to reduce the sun glint in the observations. Sentinel-3B, carrying the same set of instruments, was launched on April 24, 2018, and is flying in the same orbital plane with a phase shift of 140 • . Two more Sentinel-3 satellites are expected to launch within this decade. With sufficiently high spatial resolution for ocean observations, important additional spectral bands, full global coverage in two days by two satellite sensors, and long-term mission continuity ensured by future launches, the Sentinel-3 OLCI ocean color observations are bound to be indispensable for monitoring and research of global aquatic ecosystems.
With the Sentinel-3 mission well underway, work continues on OLCI sensor calibration, satellite ocean color measurement validation, and retrieval algorithm evaluation and further development [4], [5], [6], [7]. Indeed, a recent study validates the most recently reprocessed OLCI ocean color radiometric data (the Operational Baseline Collection-3) over a wide range of coastal water types [8]. The process of deriving the ocean color properties from satellite-measured top-of-atmosphere (TOA) radiance spectra is first managed by the atmospheric correction, which involves accounting for various optical scattering and absorption processes in the atmosphere, water, and at the water surface, and retrieves multispectral water-leaving reflectance [9], [10], [11], [12]. Instrument calibration and atmospheric correction biases are rectified by an on-orbit system vicarious calibration with in situ optical measurements, such as those recorded using the Marine Optical Buoy (MOBY) in the waters off Hawaii [13], [14], [15], [16], [17], [18]. The water-leaving reflectance products are then converted to bio-optical products, such This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ as chlorophyll-a (Chl-a) concentrations. Due to the overall complexity of ocean color processing, independent means of product and retrieval algorithm validation are crucial to data quality and accuracy assurance.
Nevertheless, in situ measurements are comparatively scarce. There are relatively few locations with continuous observations (usually in coastal and inland environments); thus, most locations are never directly sampled. In addition, in situ data are also subject to measurement uncertainties and data quality requirements [16], [19], [20], [21]. In this situation, the intersensor comparisons are useful [22], [23], [24] but may not always be conclusive in attributing data quality issues to a particular sensor.
Recently, Mikelsons et al. [25] introduced a statistical evaluation approach of satellite ocean color data that relies on gradual temporal changes of water properties in most parts of the global open ocean. The utility of this method was demonstrated by analyzing the ocean color data derived from the Visible Infrared Imaging Radiometer Suite (VIIRS) measurements and searching for potential biases in data dependence on various parameters, including the solar-sensor geometry, as well as atmospheric and other local environmental conditions [25].
The Multi-Sensor Level-1 to Level-2 (MSL12) software package is the enterprise ocean color data product retrieval system used at NOAA. The MSL12 has been used for deriving global ocean color products for several satellite sensors [23], beginning with the global ocean color data production for the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) [35]. Recently, it has been adopted to derive ocean color data from OLCI observations and has been used to routinely generate OLCI global ocean color products. In fact, the OLCIderived ocean color data have been used to generate improved global gap-free ocean color products from multisatellite measurements [36].
In this work, we employ the methodology designed for VIIRS ocean color data statistical analysis and apply it to the Sentinel-3A OLCI ocean color observations, derived by the two distinct retrieval algorithms (or ocean color data processing suites), namely, the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) Instrument Processing Facility for OLCI Level-2 (IPF-OL-2) ocean color data products (the most recently reprocessed data, i.e., the Operational Baseline Collection-3) and those from NOAA-MSL12. The only changes in the statistical methodology stem from differences in sensor design between OLCI and VIIRS.
Like MERIS, OLCI is a push-broom imaging spectrometer, whereas MODIS and VIIRS are whisk-broom scanning spectroradiometers. This difference has some significance for ocean color retrievals. For example, the along-track striping artifacts that are common in MODIS and VIIRS data [37] are absent in OLCI retrievals. However, OLCI data may contain cross-track camera discontinuities. Each OLCI cross-track retrieval is captured by five cameras arranged in a fan shape configuration, and every camera's CCD response is sometimes referenced as cross-track detectors. In general, each of the detectors has a slightly different spectral response function. On the absolute scale, these differences are rather small and accounted for in the Level-2 data processing by the so-called smile correction [38]. However, the residual cross-track effects can be amplified during the atmospheric correction process, especially for ocean color retrievals where the water-leaving reflectance signal is quite low compared to a more reflective land surface [11], [39]. In practice, the differences between the OLCI detectors from adjacent cameras are the most prominent [38], [39]. OLCI cross-track coverage is resampled on an Earth grid from 740 detectors per camera and, from here on, are called cross-swath "samples." The remainder of this article is structured as follows. We briefly review the two OLCI ocean color data processors and datasets analyzed in this study and explain the main steps of the methodology in Section II. We then show the results of analysis for both OLCI datasets from the NOAA-MSL12 and EUMETSAT-IPF-OL-2 in Section III, following with discussion in Section IV before presenting the conclusions in Section V.

A. MSL12 Ocean Color Data Processing System
The MSL12 is based on the NASA SeaWiFS Data Analysis System (SeaDAS) [40] version 4.6 with several improvements and modifications, and is updated to be able to process data from a range of more recent satellite-based ocean color instruments. MSL12 has been used to routinely process and generate the global ocean color data products from VIIRS sensors onboard the SNPP and NOAA-20 satellites since the mission started [31]. Recent updates have extended MSL12 capabilities to also process data from the OLCI sensor onboard the Sentinel-3 satellite series. As the name suggests, MSL12 is a rather general data processing system and uses the same thoroughly tested algorithms for many spaceborne sensors. The MSL12 data analyzed in this study were produced using the near-infrared (NIR)-based atmospheric correction with a specific correction algorithm for the NIR water reflectance contributions [28], [31]. However, it is noted that the NIR water reflectance correction is not essential for open oceans due to the overall low NIR reflectance in clear waters. In addition, the improved land mask has been used in MSL12 for all satellite data processing [41].
In addition to the sensor-measured TOA radiances, atmospheric correction also requires complete geolocation records (latitude and longitude), the relevant retrieval geometry angles (solar-and sensor-zenith angles, and relative azimuth angle), as well as information about the atmospheric and ocean surface conditions, such as the total column ozone amount, the sea-level atmospheric pressure, the sea surface wind speed, and the total column water vapor amount [64].

B. EUMETSAT Retrieval Algorithms for OLCI
As OLCI follows the earlier MERIS sensor, the design of the atmospheric correction algorithm relies on the heritage approaches developed for MERIS [62], [65], [66], [67] and the updates introduced with Collection-3 (https://www. eumetsat.int/media/47794). The Level-2 preprocessing includes cloud flagging, a gaseous correction (https://www. eumetsat.int/media/40899), and corrections for sun glint (https://www.eumetsat.int/media/38633) and white cap (https://www.eumetsat.int/media/47794) contributions [48], [68], [69]. After the molecular Rayleigh scattering pressure adjustment, the smile correction is performed to eliminate varying pixel spectral responses across each OLCI camera field of view (https://www.eumetsat. int/media/38634). System vicarious calibration gains are then applied (https://www.eumetsat.int/ocean-colour-systemvicarious-calibration-tool), followed by the bright pixel correction, which eliminates potential contributions of water reflectance in the NIR bands (https://www.eumetsat.int/ OC-BPC). The follow-on atmospheric correction assumes that all remaining signal in the NIR comes exclusively from the atmosphere, and it uses a combination of bands at 778 and 865 nm to select the aerosol models and the approach from Antoine and Morel [65] to remove the scattering and absorption by aerosols and the Rayleigh scattering by air molecules; see ATBDs at https://www.eumetsat.int/oceancolour-resources and Collection-3 report at https://www. eumetsat.int/media/47794. Similarly, as for the NOAA-MSL12, the EUMETSAT-IPF-OL-2 OLCI primary ocean color data products are ρ wN (λ) spectra, yet no correction is included for the bidirectional reflectance distribution function (BRDF) effects both in water surface (i.e., the factor) and in-water (i.e., the f /Q factor) [11], [51], [52], [53]. In fact, we have directly used EUMETSAT-IPF-OL-2 produced Level-2 ocean color data and generated Level-3 data without applying the BRDF corrections. On the other hand, the MSL12-derived ρ wN (λ) spectra have applied corrections for both ocean/water surface (i.e., , valid for global waters) and in-water (i.e., f /Q factor, valid for open oceans) BRDF effects [11], [51], [52], [53]. The reason for not applying the full BRDF correction in IPF-OL-2 ρ wN (λ) products is that most ocean color data users have been historically interested in coastal complex waters for which the current BRDF correction is not suitable. To retrieve the bio-optical products, EUMETSAT-IPF-OL-2 does, however, apply the complete BRDF correction in the internal processing. The products derived from ρ wN (λ) include Chl-a concentration using the OC4ME algorithm [70], [71] combined with the Chl-a index (CI) method [55], the water diffuse attenuation coefficient at 490 nm K d (490), and others. In addition, a set of data quality flags is provided, which indicates various retrieval conditions and can be used to narrow down on the more reliable ocean color retrievals.
An overview and a comparison of the two processing systems are shown in Table I, which lists brief descriptions and references for the algorithms and datasets used in the major parts of the retrieval process.

C. Evaluation Approach and Criteria
We use the MSL12 to process the Sentinel-3A OLCI Level-1B data and generate global daily ocean color product data for the entire 2019 year. We also perform identical analysis using the EUMETSAT-IPF-OL-2 Sentinel-3A OLCI Level-2 daily ocean color data for the same time period.
We follow the method established by Mikelsons et al. [25] and briefly summarize the main steps of the analysis. In the first step, the Level-2 ρ wN (λ) data are spatially averaged to Level-3 products [72] for each day of retrievals. A bin size of 9 km is used as it provides sufficient resolution to capture the gradual changes in the open ocean.
When binning the MSL12-produced ρ wN (λ) data, Level-2 data quality flags are used to avoid retrievals affected by various adverse conditions, e.g., cloud shadow/stray light [46], large sun glint contamination [49], [50], and very high aerosol concentrations. The EUMETSAT-produced ocean color data also include a set of quality flags, which we use to discard retrievals with adverse conditions. Specifically, we use the EUMETSAT recommended set of flags in our analysis [73]. For the analysis of the most recent Collection-3 data, the following flags were applied: CLOUD, CLOUD_ AMBIGUOUS, CLOUD_MARGIN, INVALID, COSMETIC,  I   COMPARISON OF THE NOAA-MSL12 AND THE EUMETSAT IPF-OL-2 OCEAN COLOR DATA PROCESSING SYSTEMS SATURATED, SUSPECT, HISOLZEN, HIGHGLINT, SNOW_ICE, AC_FAIL, WHITECAPS, ADJAC, RWNEG_02, RWNEG_03, RWNEG_04, RWNEG_05, RWNEG_06, RWNEG_07, RWNEG_08, and OC4ME_FAIL. However, we retain the Level-2 retrievals with a high solar-zenith angle to evaluate the data quality with respect to this parameter but apply HISOLZEN in the Level-3 binning for the rest of the analysis.
We note that conditions under which various quality flags are set may be different between the MSL12 and IPF-OL-2 retrieval algorithms. In fact, even the meanings of some quality flags may differ between the two retrieval algorithms. This can lead to some differences in ocean color data coverage between the two datasets. However, it is noted that we apply the most optimal flags in the two data processing systems recommended by the NOAA and EUMETSAT ocean color teams.
Once all Level-2 ρ wN (λ) data are spatially binned into the daily Level-3 data, we calculate a weighted temporal average for each of the spatial data bins. We use the period of time average T a = 27 days, including 13 days before and 13 days after a specific day. In fact, the time interval T a is chosen to reflect the OLCI orbital revisit cycle. The weights for the temporal average are chosen as w(t) = cos[πt/(T a + 1)], where t ∈ [−13, . . . , 13] is the number of days between a day within the 27-day interval and a particular day for which the moving average is calculated.
The averaged spatial-temporal Level-3 data are then used as the baseline reference data for intercomparisons with the in (1)]. Importantly, the baseline reference data are derived for each of the two datasets separately. Due to a narrower OLCI swath width (compared to VIIRS), the sensor-zenith angle does not exceed around 56 • . Thus, excessively large (>60 • ) values of sensorzenith angle, associated with degraded retrieval accuracy [25], are never encountered with OLCI.
To analyze the consistency of retrievals from different locations and times, we calculate the ρ wN (λ) anomaly, ρ wN (λ), i.e., where ρ wN (λ|t, x) are the daily Level-2 ocean color retrievals and ρ (REF) wN (λ|t, x) denotes the corresponding values of the Level-3 baseline reference data at the same spatial location x on the same day (t). For brevity, we drop the temporal and spatial labels (t, x) in further discussion. Since any instrument and retrieval artifacts are averaged out in the Level-3 baseline reference data, while retaining the long temporal and spatial scale natural variability of ρ wN (λ), the anomaly ρ wN (λ) essentially represents the retrieval artifacts and short temporal and spatial scale variation of ρ wN (λ) versus its mean. The anomaly ρ wN (λ) should, therefore, be treated as a relative quantity because the actual absolute value of ρ wN (λ) is not known. It is again underlined that ρ wN (λ) from the EUMETSAT-IPF-OL-2 processing is not fully BRDF corrected, while the correction is included in NOAA-MSL12 products.
Again, the same set of data quality flags is applied to exclude poor quality Level-2 ρ wN (λ) data. As in the previous work [25], we also restrict ρ wN (λ) data to open oceans with water depths larger than 1 km. The ρ wN (λ) anomaly data are then aggregated into histograms over the relevant ranges of different geometrical, physical, and ancillary parameters p i . The mean ρ wN (λ, p i ) for each of these parameters is computed for all data points within the same value range of the parameter p i . The entire range of each parameter p i is discretized into 50-200 equal size bins to capture changes in mean ρ wN (λ, p i ) while also ensuring a sufficient amount of data for statistics within each bin. Other statistical measures (such as median) describing the distribution of ρ wN (λ) anomalies within the same parameter p i bin can also be used in the analysis. We also study how the number of data points is distributed across the range of each of the dependent parameters p i .
Finally, the consistency and the quality of the ocean color retrievals can be gauged by evaluating the dependence of the mean anomaly ρ wN (λ, p i ) [from here on shortened to ρ wN (λ)] with respect to all parameters p i . High absolute values of the mean anomaly ρ wN (λ) (relative to the relevant accuracy requirements) imply a systematic bias of ρ wN (λ) spectra for the particular values of the dependent variable p i . For ocean color remote sensing, the absolute value of the anomaly should be compared to the accuracy requirement of sensor-measured ρ wN (λ) for the open ocean, which states that ρ wN (λ) at the blue band (443 nm) should be less than ∼0.001 (or 5%) [9], [10], [11], considering that ρ (REF) wN is not a ground truth but only a relative reference.

III. RESULTS
As in the previous study for VIIRS retrievals [25], we divide the relevant parameters into three groups, corresponding to the geometrical parameters, ancillary parameters characterizing atmospheric and ocean surface conditions, and other intermediate parameters.

A. Solar-Sensor Geometry
The data consistency of OLCI retrievals is first evaluated with respect to the solar and sensor geometry. The results obtained with NOAA-MSL12 and EUMETSAT-IPF-OL-2 retrieval algorithms are shown in Figs. 1 and 2, respectively. The dependence on the sensor-zenith angle in MSL12 retrievals [see Fig. 1(a)] reveals nonmonotonic behavior associated with differences among the five cameras spanning the OLCI swath. This behavior is more evident in the dependence on the cross-swath sample number [see Fig. 1(b)], where the discontinuities between the samples from adjacent cameras are more obvious. The samples are numbered from the west (sample 0) to east (sample 4860) directions. Due to the OLCI sensor tilt away from high sun glint conditions on the eastern side, the nadir roughly corresponds to sample 3620. Thus, most of the swath is pointing west of the nadir, including the samples corresponding to larger values of sensor-zenith angle. Unsurprisingly, samples with larger sensor-zenith angles in the west part of the swath also deviate more from the average. The overall qualitative behavior and the differences between samples from different cameras are also evident in the IPF-OL-2 retrievals [see Fig. 2 For a better quantitative comparison, we also calculate the mean absolute deviation of ρ wN (λ) for low and medium sensor-zenith angles (less than/equal to and larger than 20 • ) for both NOAA-MSL12 and EUMETSAT-IPF-OL-2 retrievals in Table II. We note again that, due to the narrower swath of OLCI (compared to VIIRS), the sensor-zenith angle only reaches up to approximately 55 • [see Figs. 1(a) and 2(a)], less than the 60 • threshold value used to identify the questionable retrievals [25]. While the magnitude of the deviations is the same for both retrieval algorithms, NOAA-MSL12 results appear to have somewhat lower absolute ρ wN (λ) deviations from the average. In Fig. 2, beyond the camera patterns, especially in the westernmost camera 1, IPF-OL-2 results also demonstrate broader underlying solar-and sensor-zenith angle trends and a marked relative azimuth angle dependence. These results point to the IPF-OL-2 geometric uncertainties, also documented in the Collection-3 report (https://www.eumetsat.int/media/47794) and by Zibordi et al. [8]; nevertheless, the missing BRDF correction may also contribute to the trends. Overall, absolute ρ wN (λ) is higher for lower wavelengths (blue bands), where the reflectance ρ wN (λ) is usually also higher over open oceans [9], [10], [11].
Since the sample number more completely describes the behavior of retrievals across the entire swath, we also show the quantitative comparisons in ρ wN (λ) for the five regions across the swath, roughly corresponding to the five cameras of the OLCI sensor. We also see some underestimates relative to the average ρ wN (λ) spectra for low values of solar-zenith angle (less than 30 • ), especially notable in the IPF-OL-2 retrievals, possibly exacerbated by the missing BRDF correction. For OLCI, the values less than 30 • of solar-zenith angle occur almost always on the eastern side of the swath for relatively fewer retrievals in the tropics that are often subject to medium or high sun glint conditions. We note that, due to the Sentinel-3A satellite flying in an orbit with a morning local equatorial overpass time, and due to the OLCI camera tilt westwards away from the sun glint reflection, values of the solar-zenith angle for OLCI retrievals are always greater than ∼22 • . We summarize the quantitative comparisons in ρ wN (λ) for low (less than/equal to 30 • ), intermediate (30 • -70 • ), and high (larger than 70 • ) values of solar-zenith angle in Table IV. The results for ρ wN (λ) dependence on the relative-azimuth angle φ are shown in Fig. 1(d) for MSL12 and in Fig. 2(d) for IPF-OL-2. The angle φ is defined as ranging from   satellite retrievals usually associated with large solar-zenith angles and data quality are affected by a longer optical path, contributing to higher variability of results. The same goes for VIIRS results [25] since the number of retrievals strongly depends on the relative azimuth angle, and there are a significantly reduced number of retrievals for φ around ±90 • , which also results in more noisy retrievals for φ values. However, the IPF-OL-2 OLCI retrievals [see Fig. 2(d)] also show a significantly reduced ρ wN (λ) compared to the baseline reference data when φ is close to zero (likely sun glint conditions), consistent with a similar reduction for low values of solar-zenith angle [see Fig. 2(c)]. It is noted that, for moderate sun glint contamination regions, both the MSL12 and IPF-OL-2 data processing have used the sun glint correction algorithm to improve satellite-derived ρ wN (λ) spectra [50]. For a more complete picture, we also investigate the combined dependence of retrievals corresponding to both the solar-zenith angle and the number of samples across the swath. Fig. 3(a) shows ρ wN (λ) at the short blue band 400 nm, ρ wN (400), as a function of both parameters from MSL12. Recall that, for OLCI, low solar-zenith angles may only occur near the easternmost part of the swath. As seen in Figs. 1(c) and 2(c), large values of solar-zenith angle (above 70 • ) result in significant ρ wN (400) deviations. The discontinuities in ρ wN (400) retrievals corresponding to the five OLCI cameras are clearly visible as vertical bands (related to sensor/detector performance), indicating that the dependence on cross-swath sample number is more prominent than the dependence on the solar-zenith angle, except for cases with solar-zenith angles above 70 • . The results are quite similar for ρ wN (λ) at other spectral bands, i.e., ρ wN (443), ρ wN (510), and ρ wN (560), as shown in Fig. 3(b)-(d), respectively, with overall decreasing amplitude of the deviation from the baseline ρ wN (λ) for larger wavelengths λ, confirming correlation of ρ wN (λ) anomalies for different spectral bands [11].
The IPF-OL-2 retrievals show qualitatively similar results in the combined solar-zenith angle and cross-swath sample number dependence (see Fig. 4). Here, retrievals corresponding to the westernmost camera are significantly overestimated relative to the baseline reference data [consistent with Fig. 2(b)], while retrievals near nadir are underestimated (in relative terms). In addition, there are fewer retrievals for low solarzenith angles along the eastern part of the swath than those from MSL12. It should be noted that large sun glint contaminations often have a significant impact on OLCI retrievals for cases with the low solar-zenith angle for the eastward side of the swath (i.e., region with highlighted by dashed ovals in Fig. 4).

B. Dependence on the Ancillary Data Inputs
Next, we examine the impact of various physical parameters on the stability and consistency of ocean color retrievals. Since we have already demonstrated that both NOAA and EUMETSAT retrieval algorithms produce more biased ocean color data when the solar-zenith angle is >70 • , we restrict the subsequent analysis to ≤70 • to match the standard flag recommendation.
The effect of the ocean surface wind speed is shown in Figs. 5(a) and 6(a) for NOAA-MSL12 and EUMETSAT-IPF-OL-2 retrievals, respectively. Both retrieval algorithms tend to produce slightly elevated ρ wN (λ) spectra in windy conditions. This is also consistent with earlier findings for VIIRS-SNPP [25]. The origin of this increase is not firmly established. Whitecaps, the most obvious culprit, have been shown to have a negligible effect on MSL12-retrieved ρ wN (λ) spectra [47], [74]. Other possibilities include contributions from the correction of sun glint [50] and computation of the Rayleigh radiance (scattering by air molecules) contribution [75]. Both of these rely on the model proposed by Cox and Munk [49] to account for the sun glint and depend on the sea surface roughness (which is directly related to the surface wind speed). We also note that, while both algorithms produce a comparable number of retrievals in low wind speed conditions, IPF-OL-2 retrievals are masked out by the WHITECAPS flag above wind speeds of 12 m/s and are not recommended to be used under such conditions. NOAA-MSL12 provides retrievals for all wind speeds, including values above 12 m/s. The quantitative values of mean absolute deviation of ρ wN (λ) are summarized in Table V for both data processing systems for values of   [25], certifying that MSL12 provides consistent, high-quality ocean color retrievals from multiple satellite data.
The dependence of the retrievals on the integrated mass of water vapor in the atmospheric column shows almost no deviation from the average for both retrieval algorithms, indicating a consistent performance, as demonstrated in Figs. 5(b) and 6(b) for MSL12 and IPF-OL-2, respectively. Likewise, the sea-level atmospheric pressure has no significant effect on the ρ wN (λ) retrievals [see Figs. 5(c) and 6(c)] although minuscule ρ wN (λ) increases are seen for very low atmospheric pressures in both datasets, which occurs for the two bluest bands for a small fraction of the total number of retrievals. No significant deviation from the average values is found with varying ozone amounts in either NOAA-MSL12 or EUMETSAT-IPF-OL-2 data in Figs. 5(d) and 6(d), respectively, except for slightly elevated biases at the extremes of the ozone range, particularly for IPF-OL-2. We conclude that both ocean color data processing systems produce statistically consistent ρ wN (λ) products with respect to the ancillary data of the sea-level atmospheric pressure, water vapor amount, and ozone concentration, while NOAA-MSL12 provides satisfactory and consistent retrievals in very windy conditions >12 m/s, yet the high wind retrievals are masked in EUMETSAT-IPF-OL-2.

C. Dependence on the Intermediate Retrieval Parameters
Finally, we examine the dependence of the retrieved ρ wN (λ) spectra on various other parameters characterizing the retrieval conditions. The sun glint coefficient, derived from the Cox and Munk [49] model, encompasses the solar-and sensorzenith angles, as well as the wind speed, which determines the surface roughness [50]. All of these factors influence the intensity of the solar reflection from the water surface, as seen by the satellite. The NOAA-MSL12 retrievals show a very slight increase of ρ wN (λ) in moderate sun glint conditions [see Fig. 7(a)]. We note that the vast majority of retrievals for both MSL12 and IPF-OL-2 occur in near-zero glint conditions (see dashed-dotted lines in Figs. 7(a) and 8(a), respectively). In fact, for MSL12, 74% or retrievals have a glint coefficient less than 0.0001, whereas, for IPF-OL-2, the fraction is even higher at 83%. Thus, IPF-OL-2 has relatively fewer retrievals at medium glint conditions. The negative ρ wN (λ) seen in IPF-OL-2 results in these conditions are countered by a small positive ρ wN (λ) for the vast majority of retrievals with near zero glint coefficient [see Fig. 8(a)]. This is consistent with the results for low solar-zenith angle [see Fig. 2(c)] and low values of the relative azimuth angle [see Fig. 2(d)]. Since glint coefficient data are not included in the EUMETSAT-IPF-OL-2 product data, we have used the corresponding glint coefficient data from NOAA-MSL12 to derive the results in Fig. 8(a). The glint coefficient is computed using the Cox and Munk [49] model, which primarily depends on the solar-sensor geometry, as well as sea surface wind speed. Thus, it is reasonable to use the same glint coefficient data for the analysis of both datasets. However, some uncertainty may be introduced in the analysis in Fig. 8(a) due to slight differences in the ancillary sea surface wind speed data used to estimate the glint coefficient by NOAA-MSL12 and EUMETSAT-IPF-OL-2.
The AOD quantifies the aerosol density in the atmosphere above the retrieval location and is derived as an internal parameter within the atmospheric correction process [9], [10], [60]. The denser aerosol cover makes ocean color data retrievals more challenging [9], [10], [11], or even impossible, if the aerosols are sufficiently dense (such as thick smoke and dust). The results for ρ wN (λ) dependence on AOD are displayed in Figs. 7(b) and 8(b) for NOAA-MSL12 and EUMETSAT-IPF-OL-2, respectively. MSL12 produces slightly reduced ρ wN (λ) at the blue bands with higher AOD (>∼0.2) and with very  low AOD (<∼0.03). IPF-OL-2 shows relatively consistent ρ wN (λ) at most AOD levels except for a sudden positive bias in all bands for AOD < ∼0.015 [see Fig. 8(b)]. The number of retrievals for MSL12 is relatively stable through the low AOD range and becomes significantly reduced closer to AOD∼0.3. For IPF-OL-2, the number of retrievals is very stable throughout and only slightly smaller closer to the lowest AODs. The results in ρ wN (λ) for low (≤0.1), medium (0.1-0.2), and relatively high AODs (>0.2) are summarized in Table VI.
The effects of cloud shadow/stray light have been shown to have a considerable impact on ocean color retrievals [34], [46]. We estimate these effects by looking at how the proximity to clouds affects satellite-derived ρ wN (λ). Figs. 7(c) (NOAA-MSL12) and 8(c) (EUMETSAT-IPF-OL-2) show somewhat higher reflectance anomaly ρ wN (λ) spectra for retrievals within ∼5 km from clouds, consistent with the earlier results for VIIRS [25]. Thus, more conservative stray light masking may lead to slightly more accurate (but often insignificant) ρ wN (λ) spectra, however, at a cost of significantly reduced number of retrievals [34].
Finally, we also investigate if the deviations of ρ wN (λ) spectra at individual spectral bands are correlated with the overall quality of the retrieved ρ wN (λ) spectra, as evaluated by the data quality assurance (QA) score [76]. The QA score evaluates the ρ wN (λ) spectra by comparing them against a library of known ρ wN (λ) spectra characterizing the majority of global waters [76]. Values of the QA score used in this analysis are calculated separately for NOAA-MSL12-and EUMETSAT-IPF-OL-2-produced ρ wN (λ) spectra. The results of this analysis are shown in Fig. 7(d) for NOAA-MSL12 and Fig. 8(d) for EUMETSAT-IPF-OL-2 retrievals. Unsurprisingly, both algorithms show that ρ wN (λ) spectra are significantly reduced to below-average values when the QA score is low (e.g., QA score < ∼0.6), with blue bands again having the largest deviations by the absolute value, especially in the EUMETSAT-IPF-OL-2 retrievals. The results in Figs. 7(d) and 8(d) show that the QA score is an effective data quality measure for satellite ocean color products.

D. Impact on Derived Chl-a Concentration
The ρ wN (λ) spectra are used to derive a number of ocean/water property products, e.g., Chl-a concentrations [29], [54], [55], which could be impacted by any systematic biases in ρ wN (λ) spectra. However, the algorithms for derived quantities usually rely on reflectance differences or ratios in different bands; therefore, the effect of biases at individual bands is not obvious. We choose to focus on Chl-a concentration as the most commonly used parameter quantifying global phytoplankton productivity. Due to a wide range of Chl-a, spanning several orders of magnitude, and skewness of frequency distribution, we opt for median anomaly instead of mean anomaly, as a more appropriate measure of statistical consistency [25]. Chl-a derived by the two processing systems is produced by different algorithms. NOAA-MSL12 has used the OC3V algorithm [54] and, more recently, the ocean color index (OCI) algorithm [29], while the EUMETSAT-IPF-OL-2 data include Chl-a produced by the OC4ME algorithm [70], [71] combined with the CI Chl-a algorithm, introduced in the Operational Baseline Collection-3 (Collection-3 Report, EUMETSAT, 2021). Fig. 9(a) shows the dependence of the three Chl-a products on the sample number across the swath. Here, the differences among the five OLCI cameras are much less obvious compared to ρ wN (λ) spectra in Figs. 1(a) and 2(a). Nevertheless, IPF-OL-2-derived Chl-a (using OC4ME) is very slightly (but systematically) lower (by about 0.01 mg/m 3 ) on the eastern part of the swath (and the number of retrievals is also lower there).
The dependence of Chl-a products on the solar-zenith angle, as shown in Fig. 9(b), reveals a somewhat different behavior. While all three Chl-a algorithms are practically unaffected by Both NOAA-MSL12 and EUMETSAT-IPF-OL-2 retrieved ρ wN (λ) show slightly increased ρ wN (λ) in windy conditions [see Figs. 5(a) and 6(a)]. However, the impact on the derived Chl-a is quite minimal, as results shown in Fig. 9(c) for all three Chl-a algorithms. While the EUMETSAT-IPF-OL-2 ρ wN (λ) has no retrievals for high wind speeds (>12 m/s, due to the application of WHITECAPS flag), it has slightly more retrievals than NOAA-MSL12 in less windy conditions (mainly due to differences in cloud and cloud shadow/stray light mask/flags).
Finally, in Fig. 9(d), we show that ρ wN (λ) retrievals with poor spectral consistency (as indicated by low values of the QA score; see Figs. 7(d) and 8(d)) do not necessarily result in excessive biases in the derived Chl-a for the OCI algorithm, indicating that the OCI algorithm has a much better performance for noisy ρ wN (λ) spectra (i.e., less sensitive to ρ wN (λ) errors) over clear oceans [29]. There are obviously degraded Chl-a retrievals from both OC3V and OC4ME algorithms for cases with low QA score values [see Fig. 9(d)], consistent with results from the previous study [29].

IV. DISCUSSION
The analysis of NOAA-MSL12 processed Sentinel-3A OLCI data shows that results are similar to the earlier analysis of MSL12 VIIRS data [25], at least qualitatively. Slight increases in ρ wN (λ) for cases of high wind speeds and pixels close to the clouds are not surprising. Also, degradation of retrievals for high solar-zenith angles is expected. In fact, the most prominent difference from earlier MSL12 VIIRS result analysis directly relates to the variations between the sensor designs: potential striping artifacts in VIIRS data [37] are replaced by differences among the OLCI cameras and detectors within the cameras across the swath width [38], [39]. Based on our analysis, differences among the OLCI cameras and detectors are not completely accounted for by either of the two ocean color data processing systems and point to the need for further refinement to reduce or eliminate this effect.
Overall, both NOAA-MSL12 and EUMETSAT-IPF-OL-2 algorithms show reasonable and similar consistency with NOAA-MSL12 producing smaller deviations from the average ρ wN (λ) values. EUMETSAT-IPF-OL-2 results, however, demonstrate underlying dependence on solar and view geometries, which have been documented in the Collection-3 report [https://www.eumetsat.int/media/47794] and a recent study [8]. In addition, EUMETSAT-IPF-OL-2 results are also subject to the missing BRDF correction. An additional , and retrievals on the eastern part of the OLCI swath. Both the MSL12 and IPF-OL-2 data processing systems use the Wang and Bailey [50] sun glint correction algorithm, which, with MSL12, has also been successfully applied to SeaWiFS, MODIS, and VIIRS global ocean color product retrievals. It is also noted that some biases from EUMETSAT-IPF-OL-2 results related to the glint Sunsensor geometry may be partly due to the lack of correction for the BRDF effect.
To evaluate the effects of BRDF correction, we produced global ocean color Level-2 data from four days of Sentinel-3A OLCI Level-1B data (January 1, April 1, July 1, and October 1, 2019) using the NOAA MSL12 with and without the BRDF correction (for both and f /Q factors) and compared the results. Overall, skipping the BRDF correction results in higher ρ wN (λ), especially in the blue part of the spectrum. However, the effect is not spatially uniform-it is larger in the tropics and away from high glint areas (the western part of the swath). We performed a quantitative evaluation for the four days of data using the regular MSL12 Level-2 data with the BRDF correction included as the baseline. The results are shown in Fig. 10, where the effect of skipping the BRDF correction is plotted with respect to three retrieval parameters. Since skipping the BRDF correction produces higher reflectance, we subtracted the average differences for this figure to better visualize the trends and compare them with the EUMETSAT IPF-OL-2 results from Sections III-A and III-C. The average differences in ρ wN (λ) are given as follows: 0.0381 (Oa01, 400 nm), 0.0351 (Oa02, 413 nm), 0.0276 (Oa03, 443 nm), 0.0185 (Oa04, 490 nm), 0.0105 (Oa05, 510 nm), 0.0047 (Oa06, 560 nm), 0.0008 (Oa07, 620 nm), and 0.0005 (O08, 665 nm). From Fig. 10(a), we see that skipping the BRDF correction, on average, lowers ρ wN (λ) on the eastern side of the swath, relative to higher ρ wN (λ) in the western part, somewhat similar to the EUMETSAT IPF-OL-2 results in Fig. 2(b). Since the BRDF effect in general is stronger in the tropics, it also appears as elevated ρ wN (λ) for lower values of the solar-zenith angle [see Fig. 10(b)] compared to the baseline data. This is in contrast to the IPF-OL-2 results in Fig. 2(c). However, the results in Fig. 2 were obtained by using the IPF-OL-2 baseline data, which also lacked the BRDF correction. Finally, Fig. 10(c) shows that skipping the BRDF correction can lead to overestimated ρ wN (λ) in zero sun glint conditions and underestimates in moderate to high glint conditions, similar to IPF-OL-2 results seen in Fig. 8(a).
Overall, it appears that at least some of the deviations seen in the IPF-OL-2 data are very likely due to the missing BRDF correction. This also confirms the requirement to apply the full BRDF correction in validation analyses.
Results also show that EUMETSAT-IPF-OL-2 generally has somewhat more retrievals (∼25%) compared with those from NOAA-MSL12 (see Tables II-VI). However, more numerous EUMETSAT-IPF-OL-2 Level-2 retrievals do not translate into a wider area coverage in the daily Level-3 9-km binned data. In fact, both datasets produce a nearly identical number of filled 9-km bins in the daily global Level-3 data though their coverages do not completely overlap. NOAA-MSL12 has better coverage of moderate sun glint areas, while EUMETSAT-IPF-OL-2 has more retrievals in areas with heavy aerosol presence and in higher latitudes. The difference in the total number of Level-2 retrievals between the two datasets stems mainly from the density of Level-2 retrievals in the areas with sparse cloud cover: NOAA-MSL12 has a somewhat more conservative cloud masking and cloud shadow/stray light flags excluding more Level-2 data compared to the EUMETSAT-IPF-OL-2. This enables it to provide a slightly better statistical consistency in the results while retaining the coverage in the global 9-km binned Level-3 data.
We also analyzed an earlier version of the EUMETSAT-IPF-OL-2 derived data (Operational Baseline Collection-2; https://www.eumetsat.int/media/43298) using the same methodology and found that, over the global open ocean, results from the Collection-2 were generally very similar to those from the analysis of the latest version (Collection 3). This is different from a recent evaluation study over coastal regions [8].
As noted in the previous study [25], our analysis also has certain limitations. First, it only gives comparisons against the same processor-averaged retrievals. The baseline reference ρ (REF) wN (λ) values cannot be ascertained as a "ground truth," i.e., the results need to be analyzed in relative terms of trends away from the average. Second, it only gives correlations of systematic biases to some relevant retrieval parameters, which, sometimes, do not necessarily identify the exact source of such biases. Third, the set of relevant retrieval parameters used is not exhaustive. Therefore, it is possible that there are hidden biases with respect to other parameters or parameter combinations. Fourth, any systematic ρ wN (λ) biases affecting all parameters equally or any gradual ρ wN (λ) changes (such as changes due to sensor degradation or imperfect calibration) will not be revealed by this approach.
Nevertheless, our analysis represents a significant test for ocean color data quality and highlights any systematic ρ wN (λ) biases in the ocean color retrieval algorithms with respect to a wide range of retrieval parameters. While this analysis covers one year (2019) of global OLCI ocean color data, we do not expect the results to be significantly altered if data from other time periods were analyzed, so long as the same retrieval algorithms are used. In fact, we used VIIRS data in 2016 for our previous study [25]. Finally, we note that the ocean color retrieval algorithms are being continuously refined and improved. Therefore, this study represents a snapshot analysis of the available data at a certain point in time. Whenever the Sentinel-3 OLCI missionlong data are reprocessed with updated retrieval algorithms, improvements in data consistency are expected. Nevertheless, this study provides a comprehensive test for systematic biases in the two datasets analyzed and is, therefore, expected to help with further improvements of the corresponding retrieval algorithms.
V. CONCLUSION We have employed an established methodology to analyze the statistical consistency of two distinct ocean color datasets, obtained from Sentinel-3A OLCI data for the entire 2019 year by two different data retrieval systems-NOAA-MSL12 and the corresponding ocean color data retrieval system by EUMETSAT-IPF-OL-2 (the most recently reprocessed data, i.e., the Operational Baseline Collection-3). We analyzed the deviations from average ocean color product values or anomalies and searched for their correlations to various retrieval parameters. Overall, the deviations are small and generally within the acceptable accuracy range. Both retrieval algorithms produce a somewhat elevated ρ wN (λ) spectra for large wind speeds (>10 m/s) affecting all bands though EUMETSAT-IPF-OL-2 retrievals are masked out by the WHITECAPS flag for wind speeds greater than 12 m/s. Increased ρ wN (λ) is also seen in the regions that are close to clouds, likely due to stray light contaminations. In addition, the NOAA-MSL12 data processing system underestimates ρ wN (λ) spectra at the blue bands for cases with dense aerosols, and EUMETSAT-IPF-OL-2 produces more retrievals in such conditions. Likewise, some differences among cameras spanning OLCI swath width are seen in both datasets.
Two notable differences between the two processors, likely intertwined, lie in EUMETSAT-IPF-OL-2's underlying dependence on solar and view geometries, and its biased retrievals in moderate sun glint. In sun glint areas, where ρ wN (λ) spectra are somewhat underestimated in EUMETSAT-IPF-OL-2 results, significantly fewer retrievals are also recorded. This effect is also seen for small values of solar-zenith angle (occurring only in the tropics) and cases with low relative azimuth angle values (directly related to sun glint conditions), which affects retrievals on the eastern part of the swath (which is the part of swath exposed to sun glint conditions in the tropics due to Sentinel-3A flying in an orbit with morning local overpass time). Nevertheless, some of these apparent biases in the EUMETSAT-IPF-OL-2 results are likely to be partly due to the BRDF effects. The full BRDF correction is not applied to EUMETSAT-IPF-OL-2 ρ wN (λ) products because MERIS/OLCI applications have historically focused on regional users and the existing BRDF correction is not suitable for complex turbid waters. For the global open ocean, however, our results have clearly demonstrated that it is important to include full BRDF correction. Activities are now ongoing to develop a BRDF correction applicable to clear and complex waters. Finally, any systematic deviations in ρ wN (λ) spectra relative to the baseline values usually translate into relatively small deviations in Chl-a results with the OCI Chla algorithm for clear open oceans. For low-quality ρ wN (λ) spectra, Chl-a data are somewhat degraded with both OC3V and OC4ME Chl-a algorithms.
We have conducted a thorough statistical evaluation of two distinct global OLCI ocean color datasets over the global open ocean, searching for systematic biases with respect to the most relevant parameters that are used in ocean color data processing, confirmed the overall statistical consistency of both datasets, and identified the regions of these parameters where improvements are possible. We anticipate that our analysis will be helpful in the design and improvement of satellite ocean color retrieval algorithms for OLCI specifically. Finally, the methodology used in this study is generally applicable to the statistical evaluation of other ocean color retrieval algorithms and corresponding datasets.