Using Saildrones to Assess the SMAP Sea Surface Salinity Retrieval in the Coastal Regions

Remote sensing of sea surface salinity (SSS) near land is difficult due to land contamination. In this article, we assess SSS retrieved from the Soil Moisture Active Passive (SMAP) mission in coastal regions. SMAP SSS products from the Jet Propulsion Laboratory (JPL) and from Remote Sensing Systems (RSS) are collocated with in situ data collected by saildrones during the North American West Coast Survey. Satellite and saildrone salinity measurements reveal consistent large-scale features: the fresh water (low SSS) associated with the Columbia River discharge, and the relatively salty water (high SSS) near Baja California associated with regional upwelling. The standard deviation of the difference for collocations with SMAP Level 3 (eight-day average) between 40 and 100 km from land is 0.51 (0.56) psu for JPL V5 (RSS V4 70 km). This is encouraging for the potential application of SMAP SSS in monitoring coastal zone freshwater, particularly where large freshwater variance exists. We analyze the different land correction approaches independently developed at JPL and RSS using SMAP level 2 matchups. We find that JPL's land correction method is more promising for pushing SMAP SSS retrieval toward land. For future improvement, we suggest implementing dynamic land correction in place of the current climatology-based static land correction to reduce uncertainty in estimating the land contribution. In level 2 to level 3 processing, more rigorous quality control may help to eliminate outliers and deliver reliable level 3 products without over-smoothing, which is important for resolving coastal processes such as fronts or upwelling.


I. INTRODUCTION
In coastal oceans, satellite remote sensing of sea surface salinity (SSS) provides a unique capability for studying the terrestrial-ocean connection within global water and biogeochemical cycles. Seasonal and interannual variations of freshwater input from river discharge are reflected in coastal SSS, which regulates (along with ocean temperature and pressure) the density of upper-layer seawater and drives the dynamics of various coastal processes, e.g., upwelling, fronts, and hurricane landfall [1], [2]. Previous studies have demonstrated the capability of satellite SSS for monitoring river influence in the Gulf of Mexico [3], [4] and the Bay of Bengal [5], and for depicting the seasonal and interannual variation at the world's major river mouths [6]. However, the uncertainty of satellite SSS increases near land (exceeding 1 psu within 100 km of the coast), and data coverage in coastal regions is inconsistent between SSS products, even those based on the same satellite measurements [4]. With growing scientific and public interest in coastal SSS, it is critical to improve the accuracy of satellite retrieval as close to land as possible to resolve coastal processes [1]. This article uses in situ data collected by saildrones to assess the performance of two Soil Moisture Active Passive (SMAP) [7] SSS data sets, produced at the Jet Propulsion Laboratory (JPL) [8] and Remote Sensing Systems (RSS) [9], and to identify possible issues in the retrieval algorithms, particularly the land correction, for future improvement.
The accuracy of retrieved SSS degrades near land due to land contamination, that is, the intrusion into the radiometer receiver of land surface emission, which is much higher than sea surface emission at L-band. Even when the main lobe of the SMAP antenna pattern is over water, a portion of the energy received in the antenna sidelobes could originate from land surfaces. Depending on the observing geometry of the spaceborne instrument at any particular moment, radiometer measurements could be affected by the presence of a landmass up to a thousand kilometers away [10]. To mitigate the effect of land contamination, JPL and RSS have independently developed SMAP land correction algorithms, which remove an estimate of the land contribution to the radiometer footprint from the measured brightness temperature (TB) prior to SSS retrieval. Over- or underestimation of the land correction term results in biases in the retrieved SSS (level-2). Most previous validation studies compared in situ salinity with level-3 satellite SSS, which is created on uniform grids by averaging multiple days of level-2 retrievals. Although level-3 validations provide useful guidance for scientific data applications, their implications for remaining issues in the land correction algorithm could be blurred by the different filtering or smoothing performed in level-2 to level-3 processing. In this article, we use both level-2 and level-3 SMAP data products collocated with in situ saildrone measurements to assess the performance of the JPL and RSS land correction algorithms and to identify areas for future improvement.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A saildrone uncrewed surface vehicle is a steerable platform designed based on wind-powered propulsion technology, carrying a suite of solar-powered meteorological and oceanographic sensors to perform autonomous long-range data collection missions in harsh ocean environments [11], [12], [13]. Using two saildrone deployments in the Arctic Ocean, [14] demonstrated that SMAP SSS observations resolved the runoff signal associated with the Yukon River with high correlation between SMAP products and saildrone measurements. In this article, we use saildrone data collected during the North American West Coast Survey (NAWCS) in 2018 and 2019 [15]. In addition to in situ salinity for satellite SSS validation, the sea surface temperature (SST) and surface wind speed simultaneously collected by saildrones also provide useful information to determine whether the biases of coastal SSS retrieval were caused by possible deficiency in the land correction algorithm or related to other error sources [16].
Data sources and the collocation method are given in Section II. Section III presents the validation of SMAP level-3 SSS products and a diagnostic analysis of the bias of level-2 matchups, which are the direct output of the SMAP SSS retrieval algorithms. In Section IV, we discuss potential issues in the JPL and RSS land correction algorithms and in level-2 to level-3 processing. Conclusions are given in Section V.

A. Satellite Salinity Products
We use JPL version-5 (V5) [8] and RSS version-4 (V4) [9] level-2 and level-3 SMAP SSS data sets. All data can be accessed from the Physical Oceanography Distributed Active Archive Center (PO.DAAC) at https://podaac.jpl.nasa.gov. Here we briefly summarize the basics of each product; further details of the retrieval algorithms and data processing are given in [8] for JPL V5 and [9] for RSS V4 products.
1) SMAP SSS Level-2 Products: Level-2 (L2) data are the direct output from the retrieval algorithm for each satellite orbit.
The JPL combined active-passive [17], [18] retrieval algorithm is run for each salinity-wind cell (SWC) on a swath grid (L2B) posted at approximately 25 km spacing, although the intrinsic resolution is slightly larger than 40 km due to edge overlap in grouping level-1 TB measurements into SWCs [19]. Level-1 measurements collected in the SWC (excluding pixels flagged as ice or land) are averaged for the H-pol and V-pol TB, separately for the fore and aft looks, to create up to four values for each SWC, which are the inputs to the L2B retrieval algorithm.
RSS provides two L2 SSS data sets, at 40 and 70 km resolution, respectively (named RSS40km and RSS70km). The RSS retrieval algorithm is run on a fixed 0.25° Earth grid at approximately 40 km spatial resolution (L2C) after resampling the level-1 SMAP measurements onto the same grid using a Backus-Gilbert type optimum interpolation [20], [21]. The resulting salinity product is called RSS40km. RSS70km is the equal-weighted average of RSS40km over the nearest neighbors, including the center pixel. Both RSS40km and RSS70km give two SSS values at each grid point, corresponding to the fore- and aft-looks, respectively (separately retrieved), which are averaged to produce one SSS value at each grid point for this article (named RSS40km_L2 and RSS70km_L2).
2) SMAP SSS Level-3 Products: All level-3 (L3) data are created from L2 products on a uniform 0.25° × 0.25° latitude-longitude grid, as monthly and daily maps. The monthly maps are created by averaging all valid data within the calendar month, and the daily maps are eight-day running means. JPL and RSS processing differ, particularly in data filtering.
JPL L2 to L3 processing uses Gaussian weighting to interpolate L2 SSS onto the map grid, with a search radius of approximately 45 km and a half-power radius of 30 km. L2 data are filtered before aggregation into the level-3 map product. To increase data inclusion, the quality checks for the level-3 data product are somewhat relaxed, excluding only land, ice, and high ancillary winds (bits 5, 7, and 8 of the quality flag).
RSS provides two L3 products, at 40 and 70 km resolution, which are eight-day running means of level-2 data, computed after averaging, for a given day, the two SSS values retrieved from the fore- and aft-looks. Note that RSS creates the 70 km L3 maps directly from L2 40 km SSS (instead of from L3 40 km), with more rigorous filtering.
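The Gaussian-weighted gridding described above for the JPL L3 product can be sketched as follows. This is a minimal illustration, not the JPL processing code: the function names are hypothetical, the half-power radius is assumed to be the distance at which the weight falls to 0.5, bit 0 is assumed to be the least significant bit of the quality flag, and the inputs are assumed to be already restricted to the eight-day window.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km; inputs in degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def gaussian_weight(dist_km, half_power_km=30.0):
    """Gaussian weight equal to 1 at zero distance and 0.5 at half_power_km."""
    return np.exp(-np.log(2.0) * (np.asarray(dist_km, dtype=float) / half_power_km) ** 2)


def gaussian_grid_value(grid_lat, grid_lon, lat, lon, sss, flags,
                        search_km=45.0, half_power_km=30.0,
                        exclude_bits=(5, 7, 8)):
    """Weighted mean of L2 SSS around one grid point (NaN if nothing is usable)."""
    lat, lon, sss = (np.asarray(v, dtype=float) for v in (lat, lon, sss))
    flags = np.asarray(flags, dtype=np.int64)

    # Build a mask from the bits that disqualify a retrieval (assumed LSB = bit 0).
    reject_mask = 0
    for bit in exclude_bits:
        reject_mask |= 1 << bit

    dist = great_circle_km(grid_lat, grid_lon, lat, lon)
    keep = ((flags & reject_mask) == 0) & (dist <= search_km) & np.isfinite(sss)
    if not keep.any():
        return float("nan")

    w = gaussian_weight(dist[keep], half_power_km)
    return float(np.sum(w * sss[keep]) / np.sum(w))
```

Separating the weight function from the aggregation makes it easy to try other kernel widths when studying how far valid retrievals are spread toward land.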

B. Saildrone Data
We use saildrone data from the NAWCS in 2018 and 2019. Sponsored by NOAA, NAWCS aimed to augment ship-based fish stock assessment to improve the effectiveness and efficiency of fisheries management. The NAWCS saildrone fleet collected data over a latitude range of nearly 20°, from northern Baja California to north of the Columbia River mouth, extending 300 km seaward (with more frequent sampling within 100 km of land). Fig. 1 shows the salinity measured by the conductivity, temperature, and depth (CTD) sensor onboard the saildrones at a depth of 0.6 m, along with the times and latitudes at which the NAWCS cruises collected the data.

C. SMAP-Saildrone Matchups
There is a large discrepancy between SMAP and saildrone measurements in terms of sampling frequency and spatial scale. SMAP's footprint is ∼25 km with a revisit time of eight days, while a saildrone collects data every minute. Our collocation strategy was designed so that the final matchups represent the intrinsic spatial scale of the SMAP data.
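A minimal sketch of this kind of matchup, using the 25 km and 24 h windows applied to the L2 products in this section, is given here. It relies only on the SciPy kd-tree rather than Pyresample, interprets the time window as ±24 h (an assumption), and uses hypothetical function and argument names; times are in hours on a common reference.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Cartesian coordinates in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return EARTH_RADIUS_KM * np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)))


def collocate(smap_lat, smap_lon, smap_time_h,
              sd_lat, sd_lon, sd_time_h, sd_sal,
              max_km=25.0, max_hours=24.0):
    """Return {SMAP index: mean saildrone salinity} for unique matchup pairs."""
    smap_xyz = to_xyz(smap_lat, smap_lon)
    sd_xyz = to_xyz(sd_lat, sd_lon)
    smap_time_h = np.asarray(smap_time_h, dtype=float)
    sd_time_h = np.asarray(sd_time_h, dtype=float)
    sd_sal = np.asarray(sd_sal, dtype=float)

    tree = cKDTree(smap_xyz)
    # The tree measures straight-line (chord) distance, so convert the
    # great-circle search radius to the equivalent chord length.
    chord_km = 2 * EARTH_RADIUS_KM * np.sin(max_km / (2 * EARTH_RADIUS_KM))

    sums, counts = {}, {}
    for i, candidates in enumerate(tree.query_ball_point(sd_xyz, r=chord_km)):
        candidates = np.asarray(candidates, dtype=int)
        if candidates.size == 0:
            continue
        in_time = candidates[np.abs(smap_time_h[candidates] - sd_time_h[i]) <= max_hours]
        if in_time.size == 0:
            continue
        # Assign the saildrone sample to its nearest SMAP point in space.
        dist = np.linalg.norm(smap_xyz[in_time] - sd_xyz[i], axis=1)
        j = int(in_time[np.argmin(dist)])
        sums[j] = sums.get(j, 0.0) + sd_sal[i]
        counts[j] = counts.get(j, 0) + 1

    # Equal-weighted average of all saildrone samples per SMAP observation.
    return {j: sums[j] / counts[j] for j in sums}
```

Averaging per SMAP index is what reduces the 1-min saildrone sampling to one value per satellite observation.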
The L2 SMAP products were collocated with the saildrone data within 25 km and 24 h, using the Pyresample kd-tree resample_nearest method and the SciPy spatial kd-tree method for quick nearest-neighbor lookup [22], [23]. For L3 SMAP products, all saildrone measurements collected within a given L3 daily time stamp (centered at noon of the day) were matched with the nearest L3 grid points. Since saildrone data are sampled at 1-min intervals, multiple saildrone observations match the same SMAP data point, for both L2 and L3 products. Therefore, we average all saildrone observations that match the same SMAP observation into a single saildrone data point (equal-weighted), providing a unique matchup pair.
Fig. 2 shows salinity measured at the NAWCS locations by the saildrone CTD and by three collocated SMAP Level-3 SSS products. The large-scale salinity features depicted by the saildrones and all satellite products are quite consistent: low SSS is observed near the Columbia River mouth under the influence of river discharge, and relatively high SSS is observed near Baja California, likely associated with coastal upwelling. However, there is a large discrepancy in data coverage near land. The data gap near the coast in the RSS70km_L3 product is much wider than that of the JPL_L3 product, and there are fewer matchups away from land (near 40°N). We included RSS40km_L3 [see Fig. 2(c)], which is the result of additional filtering implemented in RSS L2 to L3 processing (see Section III-B). Fig. 3 shows the bias (dSSS = SMAP L3 minus CTD) as a function of the distance to land for all three satellite products. The statistics given in Table I are the bias, standard deviation, and number of collocations for each SMAP product in three coastal zones. Away from land, at distances from 100 to 300 km, all three SMAP products have a small bias against the saildrones, with a standard deviation of less than 0.5 psu.
Most SMAP/saildrone collocations were found in the zone 40 to 100 km from land, where the standard deviation of the difference (StdD) over nearly 1000 pairs was 0.51 psu for JPL_L3 and 0.56 psu for RSS70km_L3. In the zone closest to land (within 40 km), StdD increased to 1.03 psu for JPL_L3 and 0.67 psu for RSS70km_L3, while the number of matchups for JPL_L3 (630 pairs) was more than double that for RSS70km_L3 (255 pairs). This statistical result is very encouraging, particularly in the coastal zone 40 to 100 km from land, since satellite SSS with an uncertainty of about 0.5 psu could provide useful information for studying processes with large salinity variations.
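The per-zone statistics reported here (bias, StdD, and number of collocations) could be computed along the following lines. This is a sketch under stated assumptions: the zone edges of 0, 40, 100, and 300 km follow the text, but whether each boundary is inclusive and whether the sample or population standard deviation was used are not stated in the article, so the choices below (lower edge inclusive, `ddof=1`) and the function name are illustrative only.

```python
import numpy as np


def zone_statistics(smap_sss, saildrone_sss, dist_to_land_km,
                    edges_km=(0.0, 40.0, 100.0, 300.0)):
    """Bias, standard deviation, and count of dSSS = SMAP - saildrone per zone."""
    d_sss = np.asarray(smap_sss, dtype=float) - np.asarray(saildrone_sss, dtype=float)
    dist = np.asarray(dist_to_land_km, dtype=float)

    stats = {}
    for lo, hi in zip(edges_km[:-1], edges_km[1:]):
        # Assumed convention: lower edge inclusive, upper edge exclusive.
        sel = (dist >= lo) & (dist < hi) & np.isfinite(d_sss)
        n = int(sel.sum())
        bias = float(d_sss[sel].mean()) if n else float("nan")
        # Sample standard deviation needs at least two matchups.
        std = float(d_sss[sel].std(ddof=1)) if n > 1 else float("nan")
        stats[(lo, hi)] = {"bias": bias, "std": std, "n": n}
    return stats
```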

B. Validation of Level 2 SSS
The validation based on L3 matchups (see Section III-A) provides important uncertainty analysis of SMAP SSS products in the coastal region. However, to identify possible deficiencies in the retrieval algorithm, specifically the land correction for this article, we need to look at level 2 data which is the direct output from the retrieval algorithm. Examining the differences of L2 and saildrone matchups will allow us to understand exactly where and how well the land correction algorithm works for SSS retrieval. The quality flag associated with each L2 retrieval can be used to identify information useful for algorithm improvements. Such information might be concealed in the L3 matchups, after filtering and spatiotemporal averaging involved in the L2 to L3 process.
We created matchup databases of SMAP L2 and saildrone data for JPL_L2, RSS40km_L2, and RSS70km_L2. We include the analysis for RSS70km_L2 here for completeness, keeping in mind that RSS40km_L2 is the only output of the RSS retrieval, while RSS70km_L2 is the spatial average of the nine nearest neighbors [9]. We then selected the matchups used for validation using the SMAP quality flags provided in the L2 data files. Since applying the whole set of RSS quality flags (bits 0 to 15) would eliminate all RSS40km_L2 matchups, we used a subset of the quality flags for each SMAP product. Because there is no one-to-one correspondence between the quality flags of the JPL and RSS products, care was taken to ensure that the selected L2 matchups were obtained under similar conditions. Specifically, for JPL_L2 we excluded matchups if any of bits 0, 1, 2, and 4 of the JPL quality flag was set [Fore et al., 2020], while ignoring bits 5 (wind speed > 20 m/s), 6 (SST < 5°C), and 7 (land detected in SWC). For RSS40km_L2 we excluded matchups if any of bits 0-10 of the RSS quality flag was set [9], while ignoring bits 11 (SST < 5°C), 12 (wind speed > 15 m/s), 13 (light land contamination, $G_{\mathrm{land}}$ > 0.001), 14 (light sea ice contamination, ice fraction > 0.001), and 15 (rain > 0.1 mm/h). Fig. 4 shows the distribution of Level-2 matchups with the quality control described above, for JPL_L2, RSS40km_L2, and RSS70km_L2. Statistics of the L2 SSS validation with saildrones are also given in Table I. Compared with the L3 matchups, the StdD for L2 increased in all three distance zones, which is expected because SMAP L2 to L3 processing reduces noise. We noticed that, compared with the L3 matchups, the number of L2 matchups is reduced to different degrees for the JPL and RSS products in different distance zones. Particularly puzzling was that only 13 pairs of JPL_L2 matchups [see Fig. 4(a)] were found within 40 km of land, while 630 pairs were found for JPL_L3 [see Fig. 3(a)].
This is because, during JPL L2 to L3 processing, the value at a specific L3 grid point is the Gaussian-weighted average of all valid L2 retrievals within a 45 km radius over eight days. This procedure is performed globally in JPL L2 to L3 processing and has the effect of propagating valid retrievals toward land. RSS, on the other hand, interpolates SMAP measurements to a fixed grid that is exactly the same for the RSS L2 and L3 products. Specifically, RSS40km_L3 is the average of nine RSS40km_L2 values (4 days before and after), and RSS70km_L3 is the average of the nine nearest RSS40km_L2 neighbors based on data from each of the eight days, with rigorous filtering. We point out that the interpolation RSS implements before L2 retrieval may also propagate SMAP observations toward the coastline.
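The flag-based selection of L2 matchups described in this section can be sketched with bit masks. The bit lists follow the text; the assumption that bit 0 is the least significant bit, and the names `bits_to_mask`, `JPL_REJECT`, `RSS_REJECT`, and `passes_quality`, are illustrative and not taken from either product's documentation.

```python
import numpy as np


def bits_to_mask(bits):
    """Combine bit positions (assumed LSB = bit 0) into a single integer mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


# Bits whose presence excludes a matchup, as listed in the text.
JPL_REJECT = bits_to_mask((0, 1, 2, 4))   # bits 5, 6, and 7 are ignored
RSS_REJECT = bits_to_mask(range(0, 11))   # bits 0-10; bits 11-15 are ignored


def passes_quality(flags, reject_mask):
    """True where none of the rejecting bits is set in the quality flag."""
    flags = np.asarray(flags, dtype=np.int64)
    return (flags & reject_mask) == 0
```

For example, a JPL retrieval with only bit 7 (land detected in SWC) set would be kept under this selection, while one with bit 4 set would be dropped.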

C. Diagnostic Analysis Based on L2 Matchups
We examine the distribution of the difference between SMAP L2 SSS and collocated saildrone salinity (dSSS) in relation to ancillary parameters in the SSS retrieval. Any systematic pattern in dSSS may reveal problems in the retrieval algorithm. The bias against saildrone measurements is generally small, with most |dSSS_JPL_L2| < 2 psu (light color). The few points with large biases (dark red or dark blue) are randomly distributed. Particularly encouraging is the similarity between the 40-100 km and >100 km zones, with no systematic dependence on the distance to land or SST. Within 40 km of land, however, the retrieved SSS is noisy [see Fig. 5(a)], with positive and negative biases mixed for SST below 18°C and positive dSSS more dominant above 18°C. Indeed, the majority of the points with large biases seen in Fig. 5(a) are effectively eliminated by the JPL quality flags applied, resulting in the rather clean pattern of Fig. 5(d).
In contrast, RSS40km_L2 [see Fig. 5(b)] shows a systematic pattern of large biases in the coastal zone: large positive dSSS_RSS40km_L2 (> 2 psu) clustered around 40 km from land, and large negative dSSS_RSS40km_L2 (< −2 psu) from 50 to 80 km from land. This systematic pattern remained when we excluded points according to the RSS quality flags as described in Section III-B [see Fig. 5(e)]. However, when we applied the additional RSS land filtering criteria (excluding RSS40km_L2 where $F_{\mathrm{land}}$ ≥ 0.0001 or $G_{\mathrm{land}}$ > 0.04), the negative dSSS_RSS40km_L2 cluster disappeared [see Fig. 5(c) and (f), right column], regardless of quality control. It is still puzzling to us why the RSS land filtering would eliminate points farther away from land.
Similar dSSS patterns are found with respect to surface wind speed (see Fig. S1) and latitude (see Fig. S2). In summary, the comparison of saildrone and SMAP L2 matchups indicates potential issues in the land correction algorithms. Positive dSSS (SMAP SSS retrieval too low) may result from under-correction (corrected TB too high), and conversely, negative dSSS (SMAP SSS retrieval too high) may result from over-correction (corrected TB too low).

IV. DISCUSSION
The discrepancies between SMAP SSS products in coastal regions (see Section III) are rooted in the different approaches to mitigating land contamination in the JPL and RSS retrieval algorithms. Radiometer-measured TB represents the integrated energy received from the entire visible disk of the Earth, weighted by the antenna gain. Even when the main lobe of the SMAP antenna pattern is over water, a portion of the energy received could come from land and can significantly bias the retrieved SSS, since emissivity from the land surface is much higher than that from the water surface at L-band. As illustrated in a simplified sketch (see Fig. 6), JPL's land correction algorithm attempts to remove the emissivity received from the land portion within the main lobe in a mixed land-water footprint, while RSS's method is limited to a sidelobe correction, which removes emissivity from outside the main lobe. Whenever SMAP's footprint touches land, RSS's method breaks down. This causes a data gap at the coast at least 40 km wide (approximately the diameter of the SMAP footprint). Moreover, collocation of SMAP L2 with saildrone measurements indicates that RSS's coastal data void extends beyond the SMAP footprint size, up to around 80 km from land. In contrast, JPL's land correction delivered encouraging results in the coastal zone beyond 40 km from land. With respect to saildrone measurements, JPL SSS retrieved 40-100 km from land has biases of similar magnitude to those farther from land, with no systematic error in terms of distance to land or ancillary parameters (SST, wind speed, or latitude) (see Fig. 5 and Figs. S1 and S2). However, data within 40 km of land are very noisy, and most L2 retrievals are flagged and not used in JPL L2 to L3 processing. For the goal of pushing SSS retrieval as close to land as possible, JPL's land correction method therefore appears more promising.
Next, we briefly review JPL's land correction method and identify areas for potential future improvements.
JPL's land correction algorithm was developed and implemented for SMAP SSS retrieval in earlier releases [8]. Basically, it removes the contribution due to land from the observed TB ($TB_{\mathrm{obs}}$) and uses the corrected TB, which represents emissivity from the water portion ($TB_{\mathrm{water}}$), to retrieve SSS. Under the assumption that $TB_{\mathrm{water}}$ and $TB_{\mathrm{land}}$ are uniform over the water and land portions, respectively, we have
$$TB_{\mathrm{obs}} = (1 - f_{\mathrm{land}})\, TB_{\mathrm{water}} + f_{\mathrm{land}}\, TB_{\mathrm{land}} \tag{1}$$
where $f_{\mathrm{land}}$ is the land fraction given by
$$f_{\mathrm{land}} = \frac{\int_{\mathrm{FOV}} G\, F(x,y)\, d\Omega}{\int_{\mathrm{FOV}} G\, d\Omega}. \tag{2}$$
Here, $G$ is the SMAP antenna gain pattern, $F(x,y)$ is 1 (over land) or 0 (over water) at the antenna sampling location projected onto the Earth surface location $(x,y)$, $d\Omega$ is the solid angle of integration, and the integration domain, the field of view (FOV), is the entire visible disk of the Earth, including sidelobes and the main lobe. To identify land and sea components, we used the 24-category Land Cover and Land Use maps from the United States Geological Survey, which are posted at 30 arcsec resolution [24]. Note that $TB_{\mathrm{water}}$ is equivalent to the land-corrected TB in the relevant documentation [Fore et al. 2020]. In theory, one can integrate over the antenna pattern using a high-resolution land mask and climatological land TB to compute the land contamination explicitly for every footprint. However, this is not feasible, as it would require excessive computing time. Therefore, a lookup-table (LUT) approach was developed. Land TB climatology maps were first generated from SMAP measurements for V and H polarizations for each month. Then, for all ocean points within 1000 km of land, a value called $TB_{\mathrm{land,near}}$ is computed by averaging the TB values of all land points within 1000 km of that ocean point. This climatology map of $TB_{\mathrm{land,near}}$ represents the expected TB of the land that contributes to the observation over the ocean for that particular location and time.
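As a rough illustration of the land fraction defined above, the gain-weighted integral can be approximated by a discrete sum over a gridded field of view. The sketch below assumes a toy circular Gaussian gain pattern with a 40 km full width at half maximum and a straight north-south coastline. Neither is SMAP's actual antenna pattern or land mask (a real pattern has sidelobes that a Gaussian lacks), and the function name `land_fraction` is hypothetical.

```python
import numpy as np


def land_fraction(gain, land_mask, solid_angle=None):
    """Gain-weighted fraction of the field of view that falls on land."""
    gain = np.asarray(gain, dtype=float)
    land = np.asarray(land_mask, dtype=float)   # 1 over land, 0 over water
    if solid_angle is None:
        # Equal-area cells: the solid-angle element cancels in the ratio.
        solid_angle = np.ones_like(gain)
    weights = gain * solid_angle
    return float(np.sum(weights * land) / np.sum(weights))


# Toy field of view: 1 km cells on a 400 km x 400 km plane centered on boresight.
x_km = np.linspace(-200.0, 200.0, 401)
xx, yy = np.meshgrid(x_km, x_km)

# Toy Gaussian gain with 40 km FWHM (illustrative only, no sidelobes).
sigma_km = 40.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
gain = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_km ** 2))

# Land occupies everything east of the coastline.
f_on_coast = land_fraction(gain, xx >= 0.0)    # boresight on the coastline
f_offshore = land_fraction(gain, xx >= 60.0)   # boresight 60 km offshore
```

With this toy pattern the land fraction is close to one half when the boresight sits on the coastline and drops to a very small value 60 km offshore; with realistic sidelobes the offshore value would decay more slowly.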
To correct a given TB observation, the pre-generated LUTs for $f_{\mathrm{land}}$ and $TB_{\mathrm{land,near}}$ are interpolated to the measurement location and time and substituted in (1), replacing $TB_{\mathrm{land}}$ by $TB_{\mathrm{land,near}}$, to obtain
$$TB_{\mathrm{water}} = \frac{TB_{\mathrm{obs}} - f_{\mathrm{land}}\, TB_{\mathrm{land,near}}}{1 - f_{\mathrm{land}}}. \tag{3}$$
The uncertainty of $TB_{\mathrm{water}}$ given by (3) can be derived from error propagation laws as
$$\delta TB_{\mathrm{water}} = \frac{\delta TB_{\mathrm{obs}}}{(1-f_{\mathrm{land}})^2} + \frac{(TB_{\mathrm{obs}} - TB_{\mathrm{land,near}})^2}{(1-f_{\mathrm{land}})^4}\,\delta f_{\mathrm{land}} + \frac{f_{\mathrm{land}}^2}{(1-f_{\mathrm{land}})^2}\,\delta TB_{\mathrm{land,near}} \tag{4}$$
where $\delta$ indicates the variance of a variable. The first term of (4) is associated with the observation noise, the second term is the noise associated with the uncertainty of $f_{\mathrm{land}}$, and the third term is the noise associated with the estimation error of the land surface TB. We show (see Fig. 7) the relative contribution of each noise term to the total variance of $TB_{\mathrm{water}}$, using rough estimates of $\delta TB_{\mathrm{obs}} = (1\,\mathrm{K})^2$, $\delta f_{\mathrm{land}} = (f_{\mathrm{land}}/4)^2$, $\delta TB_{\mathrm{land,near}} = (10\,\mathrm{K})^2$, and $TB_{\mathrm{obs}} - TB_{\mathrm{land,near}} = 20\,\mathrm{K}$ [Fore et al., 2020]. The observation noise dominates the total variance for small land fractions up to $f_{\mathrm{land}} = 0.1$; beyond that, the noise term due to the variance of the land TB estimate increases dramatically, causing $\delta TB_{\mathrm{water}}$ to be too large for meaningful SSS retrieval.
Based on the above analysis, we consider the following for future improvements of JPL's SMAP SSS product in coastal regions. Items 1) and 2) concern the development of a dynamical approach for land correction, and item 3) concerns improving the L2 to L3 processing.
1) Reduce the Uncertainty of Land TB: As described earlier, JPL produced the monthly $TB_{\mathrm{land}}$ climatology LUT offline, based on SMAP measurements, in consideration of the operational latency of SSS retrieval. This approach could introduce errors in $TB_{\mathrm{land}}$ because it ignores anomalies of land surface emissivity associated with interannual variability or synoptic weather systems. One possible option for a future product is to develop a dynamical approach, that is, to replace the current static climatological $TB_{\mathrm{land}}$ with simultaneously measured SMAP TB over land. A similar approach has been developed to improve SMAP TB in coastal regions for soil moisture retrieval [25] and SMAP SSS retrieval near the ice edge [26].
2) Optimize the Estimation of $TB_{\mathrm{land,near}}$: Currently, the LUT for $TB_{\mathrm{land,near}}$ is computed by averaging the TB values of all land points within 1000 km of an ocean point. The appropriate choice of averaging area for estimating $TB_{\mathrm{land,near}}$ is still an open question. If the area is too large, the estimated $TB_{\mathrm{land,near}}$ may not be representative of the land portion within the footprint, and the variance will increase due to large-scale land surface variation. On the other hand, a search area that is too small may not contain measurements completely over land. Taking advantage of the recently released SMAP TB (version 5), which is already corrected for water body contribution [25], it is no longer necessary to require that the footprint be completely over land, and the search area can therefore be reduced to find $TB_{\mathrm{land,near}}$ closer to the coastal zone.
3) Eliminate Outliers in L2 to L3 Processing: Although implementing the land correction delivers more L2 SSS retrievals closer to land, we cannot ignore the fact that some of those retrievals are very noisy. A procedure to eliminate extreme outliers in L2 to L3 processing without sacrificing SMAP's intrinsic resolution is needed, which is critical for resolving features in coastal processes such as upwelling plumes or fronts.
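The land correction and its error budget discussed in this section can be explored numerically with a short sketch. It assumes the linear mixing model in which the observed TB is the land-fraction-weighted sum of water and land TB, and first-order error propagation with independent error sources; the default numbers are the rough estimates quoted in the text, and the function names are hypothetical.

```python
import numpy as np


def correct_tb(tb_obs, f_land, tb_land_near):
    """Land-corrected (water) TB from the assumed linear mixing model."""
    return (tb_obs - f_land * tb_land_near) / (1.0 - f_land)


def variance_terms(f_land, var_tb_obs=1.0 ** 2, var_tb_land=10.0 ** 2,
                   delta_tb=20.0, rel_f_err=0.25):
    """Three contributions to the variance of the corrected TB (in K^2).

    Returns (observation noise, land-fraction term, land-TB term), assuming
    the variance of f_land is (rel_f_err * f_land)^2 and delta_tb is the
    difference between observed TB and nearby land TB.
    """
    f = np.asarray(f_land, dtype=float)
    var_f = (rel_f_err * f) ** 2
    obs_term = var_tb_obs / (1.0 - f) ** 2
    frac_term = delta_tb ** 2 * var_f / (1.0 - f) ** 4
    land_term = f ** 2 * var_tb_land / (1.0 - f) ** 2
    return obs_term, frac_term, land_term


# Example: contributions across a range of land fractions.
f_values = np.array([0.01, 0.05, 0.1, 0.2, 0.3])
obs_term, frac_term, land_term = variance_terms(f_values)
total_variance = obs_term + frac_term + land_term
```

Under these assumed inputs the observation-noise and land-TB terms are equal at a land fraction of 0.1, and the land-TB term grows quickly beyond that, which is in line with the behavior described for Fig. 7.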

V. CONCLUSION
Using saildrone salinity measurements along the west coast of North America, we conclude that in the coastal zone 40 to 100 km from land, the uncertainty of SMAP Level 3 SSS is 0.51 psu for JPL_L3 and 0.56 psu for RSS70km_L3. This is encouraging for the potential application of SMAP SSS in this coastal zone, particularly for monitoring processes associated with large freshwater variance (exceeding 0.5 psu), for example, river plumes, ocean fronts, and coastal upwelling. Within 40 km, the uncertainty increases to 1.03 psu for JPL_L3 and 0.67 psu for RSS70km_L3. While JPL_L3 data extend all the way to the coast, there is a data gap more than 30 km wide in RSS70km_L3.
The discrepancy between the JPL and RSS products results from their different approaches to mitigating land contamination. JPL's land correction algorithm works well in the coastal area beyond 40 km from land, where the land fraction within the footprint is generally small. On the other hand, RSS's sidelobe correction breaks down whenever the satellite footprint touches land, which not only creates a data gap within 40 km of land but also severely over-corrects at distances of 50 to 80 km from land.
We believe that JPL's land correction method is promising for pushing SMAP SSS retrieval toward land. For future improvement, we suggest implementing dynamic land correction in place of the current climatology-based static land correction to reduce uncertainty in estimating the land contribution. For L2 to L3 processing, our goal is to maintain the instrument's intrinsic resolution as much as possible, which is important for resolving coastal processes such as fronts or upwelling plumes. A procedure with more rigorous quality control in conjunction with median filtering may help to eliminate outliers without over-smoothing. This article lays the groundwork for future improvements of SMAP-derived salinity in the critical coastal regions that are linked to societal benefits.