Digital Soil Mapping Based on Fine Temporal Resolution Landsat Data Produced by Spatiotemporal Fusion

Multitemporal Landsat-8 satellite images with fine spatial resolution (i.e., 30 m) are crucial for modern digital soil mapping (DSM). Generally, cloud-free images covering bare topsoil are common choices for DSM. However, the number of effective Landsat-8 data is greatly limited due to cloud contamination coupled with the coarse temporal resolution, and interference of material covering topsoil in most of the months, hindering the development of accurate DSM. To address this issue, temporally dense Landsat images were predicted using a spatiotemporal fusion method to improve DSM. Specifically, the recently developed virtual image pair-based spatiotemporal fusion method was adopted to produce simulated Landsat-8 time-series, by fusing with 500-m moderate resolution imaging spectroradiometer time-series with frequent observations. Subsequently, the simulated Landsat-8 data were used for distinguishing different soil classes via a random forest model. Training and validation samples of soil classes were collected from legacy soil data. Our results indicate that the simulated data were beneficial for improving DSM owing to the increase in class separability. More precisely, after combining the observed and simulated data, the overall accuracy and kappa coefficient were increased by 3.099% and 0.047, respectively. This article explored the potential of the spatiotemporal fusion method for DSM, providing a new solution for remote-sensing-based DSM.


I. INTRODUCTION
S OIL is critical for ecosystems cycling and plays a considerable role in carbon cycling, food security, biodiversity, environmental quality, and human activity [1], [2], [3], [4]. To sustainably utilize the soil resources, it is necessary to accurately distinguish soil classes on the Earth's surface. Traditional field surveys based on soil profiles are laborious, resulting in spatially sparse measurements of soil profiles. Moreover, spatial Manuscript  variations of the soil profile cannot be accurately portrayed. However, the results of traditional field surveys are important data sources for legacy soil data, which have been used as key data in many fields. With the continuous development of information technology, a number of digital soil mapping (DSM) methods have been developed [5], [6]. Reliable DSM can be achieved accurately by coupling the spatial differences between soil and environmental factors [7], [8], [9]. As significant environmental factors, legacy soil data are always used for DSM, which can provide an approximate description of the soil information. Pahlavan-Rad et al. [10] demonstrated that the use of legacy soil data can improve the classification accuracy. Kempen et al. [11] accomplished DSM in The Netherlands using legacy soil data based on a multinomial logistic regression approach. An accuracy improvement of approximately 6% indicates that the use of the legacy soil data is valid. Legacy soil data can also be used as prior information. Specifically, multiple samples collected from legacy soil data are commonly used for training and validation purposes. Collard et al. [12] updated the soil map of France by using soil samples acquired from legacy soil data. It should be noted that fuzzy boundaries in legacy soil data can provide unavoidable uncertainties, which can affect the DSM accuracy. To address this issue, Yang et al. [13] extracted training and validation samples from the core areas of legacy soil data to ensure accurate DSM. In addition to legacy soil data, optical satellite images are also important environmental factors for DSM, as they provide multiple reflectance bands with different wavelengths. Considering that different wavelengths of the reflectance bands are sensitive to disparate information for topsoil, the difference in topsoil can be characterized by satellite images [14], [15]. This type of data can provide continuous spatial observation, frequent updates, and convenient downloads [16], [17]. Accordingly, Kornblau et al. [18] used multispectral satellite images to generate soil classification maps, and Dobos et al. [19] supplied a normalized difference vegetation index (calculated from satellite images) in DSM to characterize vegetation phenology [20]. With topsoil information being the primary DSM target, Flynn et al. [21] developed a soil adjusted vegetative index for predicting soil properties at a farm-scale to intensify topsoil information.
Monotemporal data are common choices for DSM; however, the accuracy of using monotemporal data is generally limited, since many inevitable factors can amplify the uncertainty, such as abnormal climatic conditions and artificial land cover This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ changes. Multitemporal data have been widely applied in DSM to increase the accuracy. By extracting common information and capturing subtle spectral difference, the uncertainty of monotemporal data can be effectively reduced [22], [23]. Both Maynard et al. [24] and Yang et al. [13] demonstrated that numerous optical satellite data can improve the reliability of DSM. Dematte et al. [25] developed a procedure to improve DSM accuracy by composing a representative image based on multitemporal satellite images. Therefore, the use of multitemporal data is a preferable choice in DSM. It should be noted that, however, the interference of dense land coverings (e.g., vegetation, crops, and snow) is the main obstacle for DSM using satellite images, as they contaminate the spectral reflectance of topsoil. Specifically, to acquire the topsoil spectral reflectance directly, the satellite images for DSM are always selected in a period when the topsoil is bare (called the bare soil period) [26], [27], [28]. However, the bare soil period is always short for most of the areas. This means that it can be very difficult to collect sufficient usable satellite images for DSM in the required period.
It has been acknowledged widely that Landsat-8 satellite images are common choices for DSM [29], [30], [31], as the 30-m spatial resolution can clearly describe the spatial texture at a large scale [32], [33], [34]. Moreover, the stable data quality and long on-board periods are beneficial for DSM using multitemporal data. However, there are still three problems constraining the development of DSM using multitemporal Landsat-8 data. First, optical satellite images of the study area should be cloud-free, but optical satellite images are easily contaminated by atmospheric conditions, resulting in the gaps in the data [35], [36]. Second, Landsat-8 satellite images have a coarse temporal resolution (i.e., 16-day), leading to limited number of effective images. This can be further exacerbated by the limited usable data in the short bare soil period, as mention earlier. Third, for distinguishing soil classes, the class separability is small using images at limited time points. This means that the available optical satellite data are not sufficient for DSM. Therefore, it is worthwhile to increase the number of usable multitemporal Landsat-8 satellite images to improve DSM, especially in the bare soil period of study area.
In this article, a strategy for improving DSM was proposed. Currently, spatiotemporal fusion has shown its applicability in increasing the temporal of Landsat time-series. Specifically, by fusing 500 m, daily moderate resolution imaging spectroradiometer (MODIS) data with 30 m, 16-day Landsat data, 30 m, daily Landsat time-series can be simulated [37], [38]. Based on spatiotemporal fusion, this article simulated Landsat-8 time-series data with fine temporal resolution in the bare soil period, which were applied jointly with the observed Landsat-8 data for DSM. We adopted a random forest (RF) model to classify different soil classes and demonstrated that the developed spatiotemporal fusion method is beneficial for improving DSM. By introducing simulated Landsat-8 images produced by spatiotemporal fusion, the separability between different soil classes can be increased, improving the accuracy of DSM. Consequently, the proposed strategy provides new insights for future DSM research by increasing the temporal resolution of Landsat-8 data.

A. Study Area
As shown in Fig. 1, the study area (125°34 -126°21 E, 47°01 -47°16 N) is located in Heilongjiang Province, China, which covers approximately 2360 km 2 . The annual average temperature and precipitation are approximately 2.9°C and 480 mm, respectively. The climate type is monsoon climate of medium latitudes (i.e., Dwa in the Köppen-Geiger climate classification system). Hence, it is rainy and warm in the summer but dry and cold in the winter. This area is called the black soil region by the local inhabitants because the soil surface appears black. In China, the soil in the black soil region is a valuable resource because of its abundant nutrients, which is suitable for cultivation [39]. According to the World Reference Base for Soil Resources, there are three primary soil classes in the study area, including Phaeozems, Chernozems, and Cambisols ( Fig. 1). It should be noted that the three soil classes have similar spectral curves, since the surface of the Phaeozems and Chernozems is covered by black or dark humus, and the spectral curves of Cambisols are similar to those of the adjacent soil classes [40].
Based on basic land cover data and visual interpretation, land use types were classified as farming and construction land. Only farming land was extracted for DSM to ensure the reliability of classification. Most of the extracted farming land is used for cultivation, in addition to land that is easy to waterlog and salinize. Therefore, more than 80% of the farming land is used for cultivation, and the remainder is grassland. Annual crops, such as corns and soybeans, play a dominant role due to climatic conditions. According to the cultivation habit, almost all crop residues are used for burning and feeding livestock by local farmers. Meanwhile, cultivated land is ploughed from the end of March to the beginning of April every year, indicating that the topsoil can be directly exposed without a large area of vegetation and snow cover (Fig. 2). Generally, the crop growing period is from June to September, and the coldest period of the year is from November to February. As a result, the period after ploughing and before crop growth is considered the bare soil period for cultivated land (i.e., April and May).

B. Legacy Soil Data
The legacy soil data in China were completed in the 1980 s based on the second national soil survey of China, as shown in Fig. 1. Although the data are still widely used in studies, the  uncertainty of the legacy soil data is unavoidable due to backward field survey technology and slow update (approximately 40-year temporal interval).

C. Satellite Images
The study period of interest was from 2017 to 2019. By screening, only three cloud-free observed Landsat-8 images in the bare soil period (April and May) were collected for the DSM, as the remaining images were contaminated by atmospheric conditions. Meanwhile, 32 scenes of cloud-free MODIS satellite images in this period were employed in this article. The detailed image dates are listed in Table I. Six surface reflectance bands in both Landsat-8 and MODIS data were considered in this article, including blue, green, red, near infrared (NIR), shortwave infrared 1 (SWIR1), and shortwave infrared 2 (SWIR2). The selection, processing, and download of the satellite data were performed using the Google Earth Engine [41], [42].

D. Sample Collection
We collected training and validation samples from legacy soil data, following three principles. First, the samples were collected in the core area to prevent original errors due to the fuzzy boundaries of the soil classes. Second, for the same soil class, the samples were collected separately in different areas. This is because the same soil class can exhibit disparate spectral reflectance in different regions. Third, to ensure that the samples were spatially uniform, the number of samples for each soil class is affected by the corresponding cover area [43]. Based on these three principles, 100 samples in Phaeozems, 80 samples in Cambisols, and 60 samples in Chernozems were randomly collected (Fig. 1). To minimize the effects of abnormal spectra extracted from a single pixel, all collected samples were executed outwards for buffer processing, at a distance of 50 m. Processed samples were randomly divided into training and validation samples, with a training to validation sample ratio of 1:1 for each soil class. The final training and validation samples include 636 and 638 pixels of Phaeozems, 513 and 524 pixels of Cambisols, and 390 and 387 pixels of Chernozems.

A. Spatiotemporal Fusion Method
The virtual image pair-based spatiotemporal fusion (VIPSTF) method proposed by Wang et al. [38] has shown a flexible and stable performance in blending MODIS and Landsat data. It should be emphasized that the VIPSTF method requires known image pairs as input, that is, the MODIS and Landsat-8 images at the same time (on 04-28-2018, 05-30-2018, and 04-15-2019 in this article). Based on the known image pairs, this method was used to generate simulated Landsat-8 images by fusing the available MODIS images. Specifically, each simulated Landsat-8 image is expressed as follows: whereL_T t is the simulated Landsat image at time T t and ΔL is the 30-m increment Landsat-8 image of the model. L VIP is the virtual Landsat-8 image, which is produced with a linear combination of the multiple known Landsat-8 images, as follows: where a i is a coefficient for the ith image L_T i , b is a constant, and n refers to the number of known Landsat-8 images (n = 3 in this article). According to the assumption of scale invariance [44], the optimal coefficients a i and b are predicted by where M_T t and M_T i refer to known MODIS images at the corresponding time of Landsat-8 images. ΔM denotes the residual image of the regression model in (3). The increment image ΔL in (1) is predicted based on the following spatial weighting scheme: where (x k , y k ) is the spatial location of the similar pixels surrounding the center pixel (x 0 , y 0 ), w k represents a weight acquired from the distance between the center pixel and the kth surrounding similar pixels, and s refers to the number of surrounding similar pixels. In this article, s was set to 30. To match the spatial resolution of Landsat-8, the ΔM image was interpolated to 30 m in advance using the bicubic method.

B. Classification Model
The RF method was used for DSM based on the observed and simulated Landsat-8 images. The RF model is an ensemble learning method consisting of multiple decision trees [45], [46]. Owing to the random and bootstrap-based sampling technique, the RF model can effectively hinder overfitting. Therefore, many articles have adopted the RF model for regression and classification [47], [48], [49]. As a key parameter, the number of trees (n_tree) should be optimized by testing the out-of-bag (OOB) error in the RF model. Specifically, n_tree was set to 150 in this article. In addition, the importance of each independent variable is provided by the RF model according to the OOB error, which is always used to analyze the contribution degree of different independent variables to the model. For the classification model, the soil classes in the training samples (extracted from Section II-D) were used as dependent variables. The values of multiple spectral reflectance bands corresponding to the location of the training samples were regarded as independent variables, which are extracted from different Landsat-8 data. Specifically, the classification model can be constructed using the dependent and independent variables. Subsequently, complete reflectance bands corresponding to the independent variables were injected into the model to acquire the results of the soil classes.

C. Class Separability
To illustrate that the increase of usable data can improve the DSM, the divergence between soil classes α and β (i.e., D αβ ) was calculated, which is a widely used method for measuring the class separability in remote sensing images [50], [51]. The detailed calculation of D αβ is as follows: where μ α and C α are the mean vector and covariance matrix of the normal distribution for soil class α, respectively. Tr represents a trace function. D αβ varies from 0 to ∞, and a higher D αβ means a stronger separability between classes α and β.

D. Validation Strategy
The validation contains two aspects in this article, that is, validating the spatiotemporal fusion method and DSM accuracy.  TABLE II  EVALUATED RESULTS OF THREE SIMULATED LANDSAT IMAGES (OBSERVED  LANDSAT-8 IMAGES ON 05-30-2018 AS REFERENCE) evaluation indices were utilized to evaluate the accuracy of the simulated images, including the correlation coefficient (CC), root mean squared error (RMSE), unbiased RMSE, and bias (Bias). For the DSM accuracy, the validation samples extracted from Section II-D were adopted to evaluate the mapping. Based on a confusion matrix, overall accuracy (OA) and kappa coefficient (Kappa) were calculated to describe the accuracy of the classification. In addition, the user's accuracy and producer's accuracies were used to evaluate the accuracy of each soil class.

A. Experimental Design
In this article, we constructed three image pairs (MODIS-Landsat-8) based on available data. Based on the 29 acquired scenes of MODIS time-series images, 29 scenes of simulated Landsat-8 time-series images were generated using the VIPSTF method, which were subsequently employed for DSM. To illustrate the advantages of the simulated images, we executed three DSM tests using different data, including three scenes of observed images (i.e., Observed), 29 scenes of simulated images (i.e., Simulated), and a combination of observed and simulated data (i.e., Observed + Simulated). Both mono and multitemporal data were adopted for the DSM. The detailed flowchart is presented in Fig. 3.

B. Evaluation of the VIPSTF Method
Each of the three observed Landsat-8 images was simulated (e.g., the data on 05-30-2018), in turn, with the other two image pairs as known data for fusion (e.g., the data on 04-28-2018 and 04-15-2019). The three types of satellite data, including MODIS, observed Landsat-8, and simulated Landsat-8 images, were exhibited in Fig. 4. We randomly selected three subregions to display the spatial details of the data. It can be seen that the Landsat-8 images have finer spatial resolution than the MODIS data. Thus, the Landsat-8 images are more suitable for DSM than MODIS images. Moreover, we found that the observed and simulated Landsat-8 images have similar spatial texture.
The quantitative evaluation results for the simulated Landsat-8 images on 05-30-2018 are depicted in Table II. It is seen that the mean CC and RMSE of the simulated Landsat-8 images are 0.698 and 0.025, respectively. According to CC, the SWIR bands (i.e., SWIR1 and SWIR2) show greater accuracy than the visible bands (i.e., blue, green, and red). In addition, the blue band has  To demonstrate that the simulated Landsat-8 images are similar to the observed Landsat-8 images, we used the training samples to extract spectral reflectance curves from the observed and simulated Landsat-8 images for the three soil classes. The spectral reflectance curves are acquired from the image on 05-30-2018, as shown in Fig. 5. With respect to the three soil  [40]. However, sparse vegetation in grasslands affects the spectra of Cambisols, resulting in an increase in the NIR value and a decrease in the SWIR value.

C. Comparison Between DSM Results Produced Using Different Inputs
Based on the RF model, the classification results of the three tests are portrayed in Fig. 6 (for using monotemporal data, only the optimal results are shown). It is seen that using multitemporal data is more accurate than using monotemporal data. Moreover, the classification results for the simulated images are more satisfactory than those for the observed images. This indicates that the numerous simulated images, which were generated by the spatiotemporal fusion strategy (e.g., VIPSTF), are more effective for DSM than insufficient observed images. Additionally, compared with using the observed or simulated images alone, the combination of the observed and simulated images can reduce the classification errors and image noise, further eliminating classification uncertainty.  The results of the quantitative assessment are presented in Table IV. For monotemporal DSM, we found that using observed images has a higher accuracy than using simulated images. The optimal OA and Kappa of monotemporal observed data (on 05-30-2018) are 79.342% and 0.684, which are 1.614% and 0.024 larger than those of monotemporal simulated data (on 05-02-2019). Meanwhile, the use of multitemporal data can produce more accurate results than optimal monotemporal data. The OAs for observed and simulated multitemporal data are 81.536% and 83.925%, which are 2.194% and 6.197% larger than that for observed and simulated monotemporal data, respectively. Furthermore, the accuracies of DSM can be further increased by using both observed and simulated images, with an OA of 84.635% and a Kappa of 0.766, which are 0.710% and 0.011 larger than those obtained using the simulated images, respectively.

D. Evaluation Based on Separability Between Soil Classes
We calculated the D αβ (the separability between soil classes α and β) of using different images. Fig. 7 shows that using the observed and simulated data jointly yields the optimal D αβ for classification, and the separability is the largest in all cases. We also found that the D αβ of the simulated data is higher than that of the observed data. This is because sufficient common information can be extracted (including 29 scenes of simulated data). However, there are only three scenes of the observed data, which are difficult to provide sufficient common information because of the low divergence. In addition, the D αβ between Phaeozems and Chernozems is distinctly smaller than that between other soil classes due to their similar spectral reflectance curves. Conversely, since the spectral reflectance curve of Cambisols is different from that of Phaeozems and Chernozems, Cambisols can be easily distinguished than Phaeozems and Chernozems.
To analyze the relationship between the separability and accuracy of the DSM, we gradually increased the number of simulated data in the independent variable based on the observed satellite images, and calculated the D αβ and accuracy. Since subtle changes in the independent variable cannot improve the DSM, the increased step of the simulated images was set to 5. For instance, the independent variable contains the observed images and increasing simulated images (e.g., Observed + 5, Observed + 10, etc.). As shown in Fig. 8, the increase of the simulated data gradually improves the accuracy of the DSM. Meanwhile, the separability between disparate soil classes also increases. Hence, separability is positively correlated with the accuracy of DSM.

E. Accuracy of Each Soil Class
To analyze the accuracy increase for each soil class, we calculated the user's and producer's accuracies based on the three tests. It can be seen from Fig. 9 that the three soil classes show greater accuracies through the use of multitemporal data, which are larger than 70%. With respect to Phaeozems and Chernozems, the user's and producer's accuracies based on the observed images are smaller than those using the simulated images.
The results indicate that the use of the simulated data is more effective for improving the accuracy of these two soil classes. However, it is difficult to distinctly increase the accuracy of Cambisols by using simulated data alone. When the observed and simulated data are coupled, both the user's and producer's accuracies of the three soil classes are further increased, which are larger than 80%. Thus, the incorporation of simulated data is significant for improving the accuracy of most soil classes in DSM.

A. Contribution of Different Inputs in the RF Model
To evaluate the contribution of different independent variables for DSM, all independent variables (i.e., including observed and simulated images) were used for analysis based on the RF model. The importance was normalized before comparison. The importance of the six bands is depicted in Fig. 10(a). We found  that the contribution of the blue, SWIR1, and SWIR2 bands are higher than that of the remaining bands. Meanwhile, the three wavelength ranges of the blue, SWIR1, and SWIR2 bands are similar to those of spectral characteristics extracted by Zhang et al. [40] when allocating soil individuals to soil classes in laboratory. This means that the blue, SWIR1, and SWIR2 bands are more suitable for distinguishing soil classes in the black soil region than the other bands in the Landsat-8 image. Fig. 10(b) refers to the data importance on different dates, which displays the top 15 of all the data. It is seen that many of the simulated data can provide a more important contribution to the optimal DSM than the observed data. For example, the simulated data on 04-22-2018 have the highest importance among all the data. For the observed data, the highest contribution is provided by the data on 05-30-2018, ranking the 10th highest contributor overall.
These results indicate that the information acquired from spatiotemporal fusion plays a significant role in the DSM process.

B. Spatial Difference Between the DSM Result and Legacy Soil Data
The proposed spatiotemporal fusion strategy can improve the accuracy of DSM. However, spatial difference still exists between the produced DSM result and legacy soil data. To analyze the reasons, we portrayed the spatial difference between the optimal DSM result and legacy soil data. As shown in Fig. 11, we found that the difference is always located at the boundaries of the different soil classes. There are three primary reasons for the difference. First, the spectral reflectance of topsoil for the three soil classes is similar, which increases the difficulty in the detection of the boundaries between different soil classes. Second, land use change and land degeneration in some areas change the reflectance of the topsoil spectra, leading to the difference in classification. Third, complex terrain can affect DSM; for example, the mapping in the western area is more accurate than that in the central and eastern areas. This is because that terrain can determine the soil classes [52]. In the study area, the western terrain is relatively flat, whereas the terrain in the central and eastern regions is relatively complex.

C. Uncertainty and Future Prospect in DSM
Although the proposed method can effectively improve the accuracy of DSM, there still exist inevitable uncertainties in DSM. First, the legacy soil data were generated in the 1980 s. A temporal gap of approximately 40-year can influence the reliability of the legacy soil data. Although we collected samples from the nonboundary area, these errors cannot be ignored. Second, considering the different ploughing methods for disparate crop types, residual straw and vegetation can still disturb the acquisition of topsoil spectral information in some areas. It is difficult to ensure that the topsoil spectral information is acquired from pure pixels, leading to the uncertainty in the samples.
As a reconstruction method for remote sensing data in the temporal domain, the spatiotemporal fusion method provides simulated data to improve the DSM. Similarly, a reconstruction method for remote sensing data in the spatial domain is inspiring. Specifically, the satellite images contaminated by clouds and haze also contain partly effective information. By removing clouds and haze, more cloud-free images can be simulated [35], [53]. Therefore, DSM can further be enhanced based on the cloud removed results. Additionally, only optical satellite images were employed for accurate DSM in this article. It should be noted that, however, the data from various sources can also provide useful information (e.g., terrain, moisture, and geomorphic information). Theoretically, a combination of optical satellite images and data from various sources has the potential to further improve DSM.

VI. CONCLUSION
In this article, we explored the potential of the spatiotemporal fusion method for improving DSM, solving the insufficient data of Landsat-8 satellite images for DSM. Specifically, the advanced VIPSTF method was adopted to generate temporally more frequent Landsat-8 images by fusing the MODIS data with fine temporal resolution. The RF classifier was employed for distinguishing different soil classes. The results indicated that by introducing simulated Landsat-8 data, the class separability is greatly increased, resulting in the improvement of DSM in terms of accuracy. More precisely, the OA and Kappa of DSM were increased from 81.536% and 0.719 (using only observed, temporally sparse Landsat-8 data) to 84.635% and 0.766 (using observed and simulated, temporally fine Landsat-8 data), indicating that the simulated Landsat-8 data produced from spatiotemporal fusion can be used for enhancing DSM. Thus, the developed method can overcome the limitation of insufficient observed Landsat data in DSM, especially in the short bare soil period.