A Spatial PWV Retrieval Model Over Land for GCOM-W/AMSR2 Using Neural Network Method: A Case in the Western United States

Precipitable water vapor (PWV) is an important and active part of the atmosphere. As known, microwave PWV retrieval is well applied for the ocean but challenging over land. In this article, we creatively established a microwave PWV retrieval spatial model over land using the backpropagation neural network to combine the high precision ground-based global navigation satellite system (GNSS) data and high spatial continuity satellite-borne data. Three-year data from 167 GNSS stations located in western America were utilized for training the network, and 40 untrained sites were selected as the test set. Root-mean-square error (RMSE) of the test set can reach 3.90 and 3.88 mm in the ascending (As) and descending (De) orbits, respectively. Then, we analyzed the influence of land cover types on the model over land. As a result, we found that stations located in the area with a single large-scale continuous land cover type had higher retrieval accuracy. In contrast, stations with diversified land cover types had lower precision. Furthermore, after fully considering the impact of land cover type, we performed an improved model with 61 stations on the single large-scale continuous grasslands, and the results of 8 stations as the test set showed that the RMSE could reach 3.41 and 3.31 mm in the As and De orbits, respectively. Compared with the spatial model established previously, the accuracy had been improved by about 13%. We think this is due to the stable physical properties (such as microwave emissivity) of the single large-scale continuous land cover type.

A Spatial PWV Retrieval Model Over Land for GCOM-W/AMSR2 Using Neural Network Method: A Case in the Western United States

I. INTRODUCTION
W ATER vapor accounts for only about 0.25% of the total mass of the atmosphere, but it acts as the main source of precipitation and a major absorbing gas of atmospheric radiation [1], [2], [3]. It has significant variability, and the temporal and spatial changes of water vapor greatly affect the global hydrological cycle, climate change, and other environmental issues [3], [4], [5]. In the field of geodesy, water vapor is also the main influencing factor of troposphere wet delay in many space geodetic techniques, such as global navigation satellite system (GNSS) [6], [7] and very long baseline interferometry [8], [9]. In this article, we used precipitable water vapor (PWV) to define the total water vapor content in the atmosphere. PWV retrieval with high spatial resolution and accuracy can help us in environmental monitoring and further improve space geodesy accuracy.
There are two methods commonly utilized to retrieve PWV, which are ground-based and space-based [10]. Ground-based observations usually have higher accuracy. The commonly used ground observation methods include GNSS observations, ground-based microwave radiometer observations, and radiosonde observations (RAOBs) [11], [12]. Among these methods, RAOBs can only detect the atmosphere twice daily, which has a poor time resolution. Differently, GNSS observations and ground-based microwave radiometers can provide continuous observations with relatively high time resolution. However, ground-based observation can only be regarded as a technology in the point domain. It is challenging to apply on a global scale due to the high cost of constructing ground-based observation stations [12]. In contrast, space-based observation, such as polar-orbiting satellites, can observe global data [13], [14]. Thus, space-based observation can be regarded as a measurement in the surface domain. According to the measurement band of the satellite-borne sensor, space-based observation can be divided into two categories: optical and microwave remote sensing. Visible sensors, such as the Global Ozone Monitoring Experience 2, measuring on the water vapor absorption lines in the blue spectral band [15] and near-infrared and thermal infrared sensors, such as the Moderate Resolution Imaging This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Spectroradiometer (MODIS), measuring on the water vapor absorption lines near 0.94 and 11 µm are the standard tools of optical remote sensing used to retrieve PWV [13], [16], [17]. Due to the relatively short waveband of optical remote sensing measurement, the spatial resolution of water vapor retrieval in the optical band is relatively high. However, they have their own defects. Visible light remote sensing can only obtain data during the daytime, and infrared remote sensing can only be used for retrieval under clear sky conditions due to the weak penetrating ability [13], [16]. In contrast, although the spatial resolution of microwave remote sensing is lower than that of optical remote sensing, the microwave has a relatively strong penetration ability, so it can be used to retrieve PWV under nearly all conditions [18]. Water vapor has a strong absorption line near 22 GHz, and passive microwave sensors measuring at microwave frequencies of 18 and 23 GHz of the earth's surface-atmosphere system are commonly used to retrieve PWV [19], [20]. Among these sensors, Advanced Microwave Scanning Radiometer 2 (AMSR2) is a widely used sensor.
Considering the characteristics of the high accuracy and time resolution of ground-based measurement and high spatial resolution of the space-based measurement, many algorithms have been proposed to integrate ground-based and space-based measurements. Among these algorithms, the neural network method is novel and reliable. Zhang and Yao [21] proposed a neural network method to use the high-quality GNSS PWV to calibrate and optimize the MODIS and European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) PWV based on the dense GNSS stations in North America. Compared to the original MODIS and ERA5 PWV, the total improvements in root-mean-square error (RMSE) were 37.1% and 15.8%. In the spatial domain, most GNSS stations have been well-calibrated, but there were still some stations that have not been well-calibrated. Stations with large RMSE were mainly distributed in the western coastal areas. Ma et al. [22] proposed a novel NIR retrieval algorithm based on MODIS products and GNSS stations located in North America, which had an excellent accuracy improvement in the entire spatial domain. Also, this research indicated that the land cover types in the spatial domain greatly influenced the retrieval model and analyzed the model performance on different land cover types. Xiong et al. [23] utilized a similar method to fuse GNSS, MODIS, and REA5 PWV data in China at different timescales, and validation results demonstrated that modifying the MODIS and ERA5 PWV at the monthly timescale results had the best accuracy. Jiang et al. [24] applied the backpropagation (BP) neural network method to combine AMSR2 and GNSS data to retrieve PWV on the global scale. Data from 2012-2016 and 2017 at 82 GNSS stations distributed globally were utilized as training and test sets. The validation results showed that the best mean RMSE of the test set was 3.53 mm. In the aforementioned studies, they only utilized the data of different time periods at the fixed GNSS stations as training and test sets. These algorithms can only predict the PWV values in the time domain on the trained stations, but they cannot predict the untrained stations. Therefore, these algorithms were still made in the time domain without extending to the spatial domain. In this article, we aimed to establish a spatial retrieval model to complete the prediction in the time and spatial domains over land using a neural network. Ground-based GNSS and satellite-borne microwave sensor AMSR2 data were utilized to build the model. Compared with infrared or visible light frequencies, microwave technology can get data under nearly all conditions and has been successfully applied over the oceans [25]. But microwave retrieval is challenging over land due to the variable emissivity of the complex land surface. And the accuracy of microwave PWV retrieval in the spatial domain is seriously affected by the complexity of land surface parameters [18]. Thus, for the spatial model over the land we established, here we will further improve the model by introducing the land cover types.
In this article, we selected the western region of the United States to establish a PWV retrieval spatial model using the neural network method to combine the AMSR2 and GNSS data. The GNSS-derived PWV (GNSS-PWV) data and other related parameters of 207 GNSS stations from the SuomiNet network of 2015-2017 and corresponding brightness temperature (Tbs) data of AMSR2 were selected as the dataset to train and test the neural network. Then, we further introduced the influence of land cover type on the built model over land in the spatial domain, and we proposed an improved model considering the land cover type in the end.
This article consists of five sections. Section I is the introduction, Section II includes the data used in this article, Section III is the theory, and Section IV consists of the results and discussion. Finally, Section V concludes this article.

A. AMSR2 Brightness Temperature Data
AMSR2 is the next generation sensor to replace the AMSR-E, which completed its mission in October 2011 [26]. Similar to AMSR-E, the AMSR2 sensor is a six-frequency dual-polarized microwave radiometer. It can observe horizontally and vertically polarized Tbs at 6.925, 10.65, 18.7, 23.8, 36.5, and 89.0 GHz, and there is an additional channel at 7.3 GHz in the design to mitigate radio frequency interference [27]. It was carried on the sun-synchronous satellite Global Change Observation Mission 1st -Water "SHIZUKU" (GCOM-W1) launched by the Japan Aerospace Exploration Agency on 18 May 2012 [28]. The nominal passing time of AMSR2 is 1:30 A.M./P.M. over the equator, and this time will shift in each orbit. Also, the pixels in other latitude areas have different acquisition times [29]. The launch of AMSR2 and AMSR-E can help us better detect the earth's surface and atmosphere. In recent years, AMSR2 and AMSR-E have been widely applied to retrieve PWV and other products (such as land surface air, temperature, and soil moisture) related to earth's surface and atmosphere [10], [24], [29], [33].
The Tbs products of AMSR2 were utilized to retrieve PWV, they are available in three levels: Level-1 (L1) is original observation, Level-2 (L2) is swath product of Tbs, and Level-3 (L3) is global gridded Tbs product [30]. The products used in this article were global gridded products at 0.1°spatial resolution.

B. SuomiNet GNSS-PWV Data
SuomiNet is a GNSS stations network mainly in North America providing high accuracy and time-resolution GNSS-PWV data. The GNSS-PWV was estimated by the Zenith Tropospheric Delay and a conversion factor [12]. The estimated RMSE of the GNSS-PWV data of SuomiNet is less than 2 mm, and the time resolution is 30 min [12]. The SuomiNet network includes approximately 360 GNSS stations. Except for GNSS-PWV data, every GNSS station can provide spatial parameters, such as elevation, longitude, and latitude, as meteorological information, such as air pressure and surface temperature. We chose a square area of the SuomiNet North American network in this article with a denser GNSS station distribution to build the spatial model. The range of the experimental area is 30°-50°N and 100°-130°W. There were about 207 GNSS stations selected in the experimental area, which are shown in Fig. 1. The 167 stations in the blue color were set as the training set, and 40 stations in the blue-red mixed color were as the test set. The GNSS-PWV was utilized as the output of the neural network. As for the inputs, we selected the longitude, latitude, elevation, and surface temperature provided by the stations. The time period was 2015-2017.

C. MODIS Global Land Cover Type Product
MODIS is a key instrument aboard the Aqua and Terra satellites. It has a total of 36 spectral channels, and these spectral channels can achieve full-range spectral coverage of 0.4-14.2 µm [31]. Many products related to the earth can be derived from the MODIS measurement. These products can be divided into three categories: land, atmosphere, and ocean.
The global land cover type product MCD12C1 used in this article belongs to the land category. It describes land cover characteristics derived from observations of Terra and Aqua MODIS data and provides an annual image of the global land cover type. Each product file is a global grid product with a resolution of 0.05°, including 17 land cover classes defined by the International Geosphere-Biosphere Programme [32]. We downloaded the products for the three years 2015 to 2017 and calculated the three-year average land cover type. The 17 land cover classes of the three-year average land cover type are shown in the color bar in Fig. 1. They are expressed by number 1-17 in the grid. The average land cover type of 2015-2017 was used here. The base map of Fig. 1 demonstrates the distribution of average land cover type in the experimental area.
In the experimental area, we can find that the land cover type is complex. In the northwest coastal area, there are dense forests. In the southwest coastal area, there are mixed land cover types of grasslands, croplands, and urban. The land cover type in the central and northern parts is relatively simple, with grasslands playing a main role in this area. The south is a large area of shrublands, and Permanent is mixed in the shrublands. Northeast and southeast areas have some scattered croplands. On these complex land cover types, dense GNSS stations are utilized to establish the spatial model.

A. Spatio-Temporal Match
The global gridded product of Tbs of AMSR2 and land cover type product MCD12C1 are 0.1°× 0.1°and 0.05°× 0.05°grid data. GNSS stations of the SuomiNet network have a discrete distribution with specific latitude and longitude. For the spatial matching of AMSR2, GNSS, and MCD12C1 products, we first found out the four grid points around the location of the site, and then determined the Tbs and land cover type at the site using the bilinear interpolation method. For the time matching, we first found out the two closest observation times of GNSS according to the scan time of AMSR2 and then unified the time of GNSS-PWV with AMSR2 using linear interpolation. In this way, we had completed the spatio-temporal matching of these three types of data.

B. Theory of BP Neural Network
We applied the BP neural network to our research, which is one of the most mature and commonly used neural networks [33]. Compared with other types of the neural network, BP neural network has a relatively simple structure so that it is easier to implement through computer programs [34]. And, it is widely applied to complicated nonlinear fitting [35].
There are commonly one input layer and one output layer of BP neural network and they can contain multiple inputs and outputs. But, it generally has one or more hidden layers, and each hidden layer can have multiple neurons. Fig. 2 shows a simple BP neural network structure with a hidden layer, where the number of inputs and outputs is p and q, respectively. In the hidden layer, the number of neurons is s, the output and the threshold are a j and b j , respectively, and the transfer function is f 1 . In the output layer, the threshold and the transfer function are b k and f 2 , respectively. Additional, w ij and w jk are the weights. After layer-by-layer transmission and calculation, the output y k of the network can be obtained. The "real" output is assumed to be d k . From the input layer to the hidden layer, the output a j of the j th neuron is 1, 2, 3, . . . , p; j = 1, 2, 3, . . . , s) . (1) Then, we can calculate the output y k Error function e of the neural network is defined as Before neural network training, we first need to set the number of hidden layers and the number of neurons of one hidden layer in the neural network. Then, we need to set the maximum number of training iterations and the minimum error as the threshold for stopping training. When the neural network error exceeds the minimum preset mistake, the neural network will backpropagate to adjust the weights and thresholds. In the end, the neural network training stops when the neural network error reaches the preset minimum value or reaches the maximum training iterations.
Due to the complex parameters over land related to microwave PWV retrieval, it is difficult to reflect the exact relationship of PWV to each nonlinear parameter using the atmospheric radiative transfer equation. Therefore, the neural network is utilized here as a reliable method to deal with the relationships between PWV and each related parameter. Due to the strong water vapor absorption band near 22 GHz, 18-and 23-GHz channels of AMSR2 were widely used for PWV retrieval [36], [37]. Then, the absorption frequencies of CLW are 89 and 36 GHz, and the polarization difference of 89 and 36 GHz can be used to indicate the impact of CLW [38]. Land surface temperature and elevation can be obtained from SuomiNet GNSS network and expressed as T s and H. T s and H are other parameters related to PWV retrieval [37], [38]. These related parameters will be applied as inputs in the neural network. Fig. 3 shows the BP neural network structure designed in this article. According to the previous research and our analysis of the atmospheric radiation transfer theory as mentioned earlier, we finally set the Tbs polarization difference ΔT b (18) , ΔT b (23) , ΔT b (36) , and ΔT b(89) of AMSR2 and H, T S of Suominet GNSS stations as inputs, and GNSS-PWV values as output. Aim to spatialize the created model and predict the untrained GNSS stations, the longitude, and latitude of the GNSS stations (spatial parameters) were also utilized as inputs.

A. Accuracy Verification of the Spatial Model by GNSS-PWV Data
The point density figures and the RMSE spatial distribution of individual GNSS stations were used to show the accuracy of the spatial model verified by GNSS-PWV data. Data from 2015 to 2017 of 167 stations in blue color shown in Fig. 1 were utilized as the training set. And, the whole 2015-2017 data of the 40 untrained stations in blue-red color shown in Fig. 1 were selected as the test set to predict. For the spatio-temporal matching for AMSR2 and GNSS data, the bilinear interpolation method was applied to perform spatial matching, and linear interpolation was used to unify the time of GNSS-PWV with AMSR2. The R values are all greater than 0.85, showing that the PWV from the BP neural network and GNSS PWV have a strong correlation. Then, RMSE was utilized to compare the accuracy. We can find that the prediction accuracy of the test set is about 0.2 mm lower than the training set, which is small. This proved that the built model had excellent performance in predicting the PWV of the untrained stations in the time and spatial domains. Additionally, the results show that De orbit can match more points, and the accuracy of De orbit is slightly higher. We think that the lower accuracy of As orbit is attributed to the interference of sunlight. According to the microwave radiation transmission model, microwave sensors receive radiation emitted by the atmosphere and land surface [28], [37]. Thus, solar radiation can interfere with the measurement. As known,  the As orbit data are mainly collected during the daytime, and the De orbit data are observed at night. Therefore, the As orbit can get more interference from sunlight than the descending orbit, which causes that As orbit has slightly lower accuracy.
Furthermore, we calculated the RMSE on individual stations. Fig. 5 shows the RMSE histogram of stations as the test set. From Fig. 5, we can find that the RMSEs of different stations are uneven and have a big difference. There are about five stations in As orbit and four stations in De orbit with RMSEs larger than 5 mm. Then, we completed the spatial distribution of RMSE on individual GNSS stations, and Fig. 6 shows the results. The results of Fig. 6 show that the selected area has an uneven retrieval accuracy distribution. And the stations with high accuracy (blue points) are primarily located in the northern and central regions, whereas the stations with low accuracy (red and yellow points) are mainly distributed in the southwest and southeast area. We think it may be caused by the complex land cover types in the selected area.

B. Study on the Relationship Between the RMSE Spatial Distribution and the Types of Land Cover
Microwave radiative transfer was easily affected by complex land surface physical properties (such as microwave emissivity parameters) [36], [37], [38]. On land, different types of land cover have different physical properties, which means the complexity of land cover types can cause the variability of land surface physical parameters. Thus, we think that the complex land cover types in the selected test area led to uneven spatial PWV retrieval accuracy.  To verify the aforementioned analysis, we need to simplify the land cover types to find the laws. First, we performed the experiment on the single land cover type. According to the land cover type on the station, we divided the GNSS stations into eight categories. As shown in Table I, in the selected area, most stations belong to the type of grasslands, with 94 stations, shrublands type are second, with 27 stations, fewer stations are located in the rest of the land cover types, which are less than 20 stations. Then, as shown in Fig. 1, the distributions of grasslands and shrublands are relatively clustered, and there were enough stations to establish the spatial model, whereas the distributions of other land cover types are relatively scattered. Therefore, we selected grasslands and shrublands as examples to establish a spatial model on the single land cover type. Fig. 7 shows the RMSE spatial distribution on individual GNSS stations of the spatial model on grasslands and shrublands. It can be observed from Fig. 7 that stations located on continuous grasslands and shrublands have higher accuracy, whereas stations with lower accuracy are located in the areas of grasslands and shrublands mixed with other land cover types. Take Fig. 7(a) and (c) as examples, the stations in the southwest and southeast are located on the grasslands, but they are scattered and surrounded by croplands and other land cover types. Therefore, the variability of the physical properties and physical parameters happens in these mixed land cover types, which will further cause low PWV retrieval accuracy. On the contrary, stations located on the entire grasslands areas without mixing with other land cover types, such as the center of the test area, can get a high retrieval accuracy due to the stable physical properties. Fig. 7(b) and (d) shows a similar phenomenon, stations located on entire shrublands have higher accuracy, whereas stations with lower accuracy are located on mixed land cover types. According to the results, we can conclude that the proposed spatial model can achieve relatively higher accuracy on a single large-scale continuous land cover type, but lower accuracy on mixed land cover types. This is because the single large-scale continuous land cover type has stable physical properties, but the mixed land cover types have variable physical properties.
According to the aforementioned analysis, removing the stations located on the mixed land cover types and building the model on the single large-scale continuous land cover type will improve the model. Thus, we only selected the GNSS stations located in the central single large-scale continuous grasslands area to build an improved spatial model. There were 61 stations selected for the improved model, 8 stations were utilized as the test set, and the remaining stations were utilized as the training set. After spatio-temporal matching, 23 462 and 3 323 pairs of matched points were selected as the training and test sets in the  From the results in Fig. 8, the BIASs are all close to 0, and the STDs are close to the RMSEs, which shows a low system difference. In the As orbit [see Fig. 8(a) and (b) Fig. 4(a)-(d), the accuracy of the improved model had increased by about 13%. From the results in Fig. 9(a) and (b), we also can see that the RMSEs in most points are small with blue color, which means the RMSE distribution is relatively uniform in the selected central area. This is because the improved model was built on a single large-scale continuous grasslands area. In this area, grasslands play a leading role without other land cover types. Thus, this area has single physical properties. The 13% accuracy improvement benefits from the switch in physical properties from variability to stability. This is also why microwave is well applied to the ocean.
In summary, the improved model established on the single large-scale continuous grasslands had a great performance. This example also verified the theoretical analysis in Fig. 7 that the microwave spatial model had high accuracy on the single largescale continuous land cover type but low accuracy on the mixed land cover types.

V. CONCLUSION
In this article, there are two main innovations. First, we successfully applied BP neural network to establish a spatial PWV model combined with the ground-based GNSS data and AMSR2 space-borne remote sensing data. The untrained stations were utilized as the test set to predict the PWV in the time and spatial domains. In the comparison of GNSS-PWV, the RMSEs of the test sets can reach 3.90 mm of As orbit and 3.88 mm of De orbit, and this proved that the spatial model we built had an excellent prediction performance. Furthermore, we found that the RMSE on individual GNSS stations was distributed unevenly. Second, we proposed that the great variability of physical properties (such as microwave emissivity parameter) happened on the complex land cover types and led to the uneven distribution of RMSE on individual GNSS stations. From the examples of grasslands and shrublands, we found that the spatial model had relatively higher accuracy on a single large-scale continuous land cover type, but had lower accuracy on mixed land cover types. This is because the single large-scale continuous land cover type has stable physical properties, and the mixed land cover types have variable ones. To further verify the aforementioned analysis, we proved an improved model on the single large-scale continuous grasslands in the central region. From the results, RMSEs are 3.41 mm of As orbit and 3.31 mm of De orbit for the test set. Compared with the spatial model established on all land cover types, the accuracy had been improved by about 13%, and the retrieval accuracy also had a more uniform spatial distribution. This accuracy improvement benefits from the switch in physical properties from variability to stability.
In this article, based on the characteristics of GNSS and remote sensing observations, we used BP neural network to execute the combined operation with the point domain observations of GNSS stations and surface domain observations of remote sensing. Then, the PWV values of the untrained stations were well predicted in the time and spatial domains. And we analyzed the uneven spatial distribution of retrieval accuracy on the individual station. Additionally, the relationships between retrieval accuracy and land cover types were discussed in depth. This study will probably make a contribution to microwave PWV retrieval and interdisciplinary research on remote sensing and GNSS.