Wavelength Selection for Estimating Soil Organic Matter Contents Through the Radiative Transfer Model

Large quantities of soil hyperspectral data contain redundant and overlapping spectral information. Selecting feature wavelengths can effectively mitigate these problems and improve the accuracy and stability of the soil organic matter (SOM) content retrieval model. Traditional methods of wavelength selection mainly attempt to establish an empirical relationship between reflectance and SOM content, and their performance is directly related to the quality and representativeness of the “training data”. This study first distinguished the sensitive wavelength interval of SOM through a sensitivity analysis (SA) of SOM to soil reflectance in a radiative transfer model. The sensitive wavelength points of SOM were then ascertained using the successive projection algorithm (SPA): 468, 476, 496, 599, 775 and 900 nm. Results show that SOM content can be estimated with high accuracy (root-mean-square error of prediction (RMSEP) < 0.234%, coefficient of determination ($R^{2}$) > 82.9%) using the six selected wavelengths. The accuracy of SOM content estimation is highest at 599 nm (RMSEP: 0.176%, $R^{2}$: 90.4%). Compared with traditional empirical wavelength selection methods, wavelength selection based on SA-SPA with the SOM radiative transfer model improves the generalizability and accuracy of the result. The research provides a theoretical basis and technical support for the remote sensing retrieval of SOM, the development of rapid spectral instruments, and the band setting of sensor instruments.


I. INTRODUCTION
Soil organic matter (SOM) is an important component of soil; its content is generally regarded as a criterion for assessing soil fertility and an important indicator of soil degradation [1], [2]. Quickly and accurately capturing the spatial variation of SOM content is of great significance for precision agriculture. Owing to its high spectral resolution, multiple bands and strong continuity, hyperspectral remote sensing technology will gradually replace the traditional monitoring methods based on chemical analysis [3]-[7]. It can obtain subtle spectral information of ground objects and provides a powerful tool for quantitative prediction of SOM content. In practical applications, however, the spectral information overlaps severely. Selecting the feature wavelengths of SOM is the key to improving the predictive capability of the model [8]-[10].
The associate editor coordinating the review of this manuscript and approving it for publication was Qiangqiang Yuan.
Based on the principles and characteristics of the various algorithms, common SOM wavelength selection algorithms can be roughly divided into five categories: (1) wavelength selection algorithms based on partial least squares (PLS) parameters, such as uninformative variables elimination (UVE) and competitive adaptive reweighted sampling (CARS); (2) intelligent optimization algorithms, such as the genetic algorithm (GA), particle swarm optimization (PSO) and ant colony optimization (ACO); (3) the continuous projection strategy, such as the successive projection algorithm (SPA); (4) the model cluster analysis strategy, such as the variable iterative space shrinkage approach (VISSA); (5) wavelength interval selection, such as interval PLS (iPLS), moving windows PLS (MWPLS) and interval combination optimization (ICO) [11]-[27].
However, these wavelength selection methods mainly try to establish an empirical relationship between large quantities of observed reflectance and SOM contents. These statistical methods entail extensive field observations, and their performance is directly related to the quality and representativeness of the “training data”. In addition, the wavelengths selected using these methods lack a strict physical foundation [28].
In order to solve these problems, this study firstly built the SOM radiative transfer model based on the Kubelka-Munk (KM) theory. Then the sensitivity of the SOM to soil reflectance in radiative transfer model was analyzed. According to the result of sensitivity analysis (SA), the wavelengths in the 450-2500 nm spectral range were classified to distinguish the sensitive wavelength interval of SOM. Finally, the sensitive wavelength points of SOM were ascertained using the SPA. The validation set was used to estimate SOM content at the selected wavelengths, which verifies the effectiveness of the method.
The rest of this paper is organized as follows. Section II provides the description of the SOM radiative transfer model, the details of experimental datasets and the method of wavelength selection with SA-SPA. The results and performance of wavelength selection are discussed in Section III. Section IV presents the conclusions of this paper.

A. EXPERIMENTAL DATASETS
The dataset used in this study is the same as in [29]. Three-quarters of the whole dataset was selected by the sample set partitioning based on joint x-y distance (SPXY) method [30] and used as the calibration set (n = 82). The remainder was used as the validation set (n = 26). The sets were used as follows: (1) inverting the unknown parameters $a_1$ and $a_2$ of the SOM radiative transfer model with the calibration set; (2) selecting wavelengths using SA-SPA with the calibration set; (3) validating the results of wavelength selection using SA-SPA with the validation set. The summary statistics of SOM for the whole, calibration and validation sets are provided in Table 1. The mean, standard deviation (SD) and coefficient of variation (CV) of the three sets are similar, indicating that the calibration and validation sets are well divided to represent the whole set.
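The partitioning step can be illustrated with a minimal Python sketch of an SPXY-style split. It assumes the usual formulation in which the Euclidean distances in spectral space and in SOM space are each divided by their maximum and summed, followed by a Kennard-Stone style max-min selection. The function name `spxy_split` and the random data are hypothetical and are not the dataset or code of this study.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance between every pair of rows of A."""
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.sqrt(np.maximum(d2, 0.0))

def spxy_split(X, y, n_cal):
    """Return (calibration_idx, validation_idx) from an SPXY-style selection."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    dx = pairwise_dist(X)
    dy = pairwise_dist(y)
    # Joint x-y distance: each part is scaled by its own maximum.
    d = dx / dx.max() + dy / dy.max()
    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_cal:
        # Distance from each candidate to its nearest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Hypothetical data with the same set sizes as in this study (82 + 26).
rng = np.random.default_rng(0)
X = rng.random((108, 20))   # stand-in spectra
y = rng.random(108)         # stand-in SOM contents
cal, val = spxy_split(X, y, n_cal=82)
print(len(cal), len(val))   # 82 26
```

Because the selection is driven by distances in both the spectra and the SOM values, the calibration set tends to span the range of both, which is consistent with the similar statistics reported for the three sets.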

B. SOM RADIATIVE TRANSFER MODEL
According to [29], the relationship between the transformed reflectance $r$ and SOM content $\theta$ is expressed in terms of the absorption coefficient $k$ and the scattering coefficient $s$ of the soil, the reflectance $R_1$ of the soil when the SOM content is $\theta_1$, and two unknown wavelength-dependent parameters $a_1$ and $a_2$. According to the KM theory, the relationship between the infinite reflectance $R_\infty$ and SOM content $\theta$ is then derived. For dry soil, the reflectance related to SOM depends mainly on the Fresnel reflectance $R_i$ and the diffuse scattering $R_d$, where $n_{soil}$ is the refractive index of soil (≈1.5) and $n_{air}$ is the refractive index of air (≈1). The unknown parameters $a_1$ and $a_2$ need to be acquired from the calibration set using the least-squares algorithm.
VOLUME 8, 2020
The best criterion for model parameter selection is to minimize the residual sum of squares between the simulated and the measured values. The optimization objective function is constructed as follows:
$$\min \sum \left(R_{measure} - R_{model}\right)^{2}$$
where $R_{measure}$ is the value measured in the laboratory and $R_{model}$ is the theoretical value of the model. All data analyses were carried out in Matlab R2014b (The MathWorks Inc., Natick, MA, USA).
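As background for the KM step, the following Python sketch shows the generic Kubelka-Munk remission function and its inverse. It assumes that the transformed reflectance corresponds to the ratio $k/s$; it does not reproduce the parametrization in $a_1$ and $a_2$ used in this study, which is given in [29]. The function names and reflectance values are illustrative only.

```python
import numpy as np

def km_transform(R_inf):
    """Kubelka-Munk remission function: k/s = (1 - R)^2 / (2 R)."""
    R_inf = np.asarray(R_inf, dtype=float)
    return (1.0 - R_inf) ** 2 / (2.0 * R_inf)

def km_inverse(r):
    """Infinite reflectance recovered from r = k/s (root with 0 < R <= 1)."""
    r = np.asarray(r, dtype=float)
    return 1.0 + r - np.sqrt(r ** 2 + 2.0 * r)

R = np.array([0.1, 0.3, 0.5])   # illustrative reflectance values
r = km_transform(R)
print(r)                        # k/s grows as reflectance falls
print(km_inverse(r))            # recovers the input reflectance values
```

The inverse is the physically meaningful root of the quadratic in $R_\infty$, so a round trip through both functions returns the original reflectance.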

C. SENSITIVITY ANALYSIS USING THE SOBOL METHOD
SA calculates the fractional contribution of a given input variable to the variance of an output variable. In this study, Sobol's global SA was performed using Matlab R2014b (The MathWorks Inc., Natick, MA, USA). Within the wavelength range of 450-2500 nm, the sensitivity of SOM to soil reflectance in the radiative transfer model was calculated wavelength by wavelength to determine the sensitive wavelength interval of SOM. Sobol's method is a global sensitivity analysis method based on variance decomposition; it quantitatively evaluates the influence of each input parameter, and of the interactions between parameters, on the output variable by decomposing the variance of the output variable. If $y = f(X_1, X_2, \cdots, X_m)$ represents the model structure, $X_1, X_2, \cdots, X_m$ represent the model parameters, and $m$ represents the number of model parameters, the variance decomposition formula can be expressed as:
$$V(y) = \sum_{i=1}^{m} V_i + \sum_{i<j} V_{ij} + \sum_{i<j<k} V_{ijk} + \cdots + V_{1,2,\ldots,m}$$
where $V(y)$ is the total variance of the model output $y$; $V_i$ is the variance produced by the parameter $X_i$; $V_{ij}$ is the variance produced by the interaction of parameters $X_i$ and $X_j$; $V_{ijk}$ is the variance produced by the interaction of parameters $X_i$, $X_j$ and $X_k$; and $V_{1,2,\ldots,m}$ is the variance produced by the combined action of the $m$ parameters.
For parameter $X_i$, the first-order sensitivity index $S_i$ expresses the direct contribution of parameter $X_i$ to the total variance of the model simulation results. The total-order sensitivity index $S_{Ti}$ represents the combined influence of parameter $X_i$ and its interactions with all other parameters. The specific formulas can be expressed as:
$$S_i = \frac{V_i}{V(y)} \tag{8}$$
$$S_{Ti} = 1 - \frac{V_{\sim i}}{V(y)} \tag{9}$$
where $V_{\sim i}$ is the variance produced by the other parameters and their interactions, excluding parameter $X_i$.
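The two indices can be estimated numerically by Monte Carlo sampling. The Python sketch below is a minimal illustration, not the Matlab implementation used in this study. It assumes independent inputs uniformly distributed on [0, 1], uses a Saltelli-type estimator for the first-order index and the Jansen estimator for the total-order index, and applies them to a hypothetical additive toy function rather than the SOM radiative transfer model.

```python
import numpy as np

def sobol_indices(f, m, n=40000, seed=0):
    """Estimate first-order (S) and total-order (ST) Sobol indices on [0, 1]^m."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, m))
    B = rng.random((n, m))
    fA, fB = f(A), f(B)
    both = np.concatenate([fA, fB])
    mean_y, var_y = both.mean(), both.var()
    S = np.empty(m)
    ST = np.empty(m)
    for i in range(m):
        AB = A.copy()
        AB[:, i] = B[:, i]              # A with column i taken from B
        fAB = f(AB)
        # Saltelli-type first-order estimator; centering fB only lowers the noise.
        S[i] = np.mean((fB - mean_y) * (fAB - fA)) / var_y
        # Jansen total-order estimator.
        ST[i] = 0.5 * np.mean((fA - fAB) ** 2) / var_y
    return S, ST

def toy_model(X):
    """Hypothetical additive model: y = X1 + 2 * X2."""
    return X[:, 0] + 2.0 * X[:, 1]

S, ST = sobol_indices(toy_model, m=2)
print(np.round(S, 2), np.round(ST, 2))
```

For this toy model the analytical first-order indices are 0.2 and 0.8, and because the model has no interaction terms the total-order indices coincide with them, so the estimates should fall close to those values. Applied wavelength by wavelength, the same procedure yields index curves as a function of wavelength.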

D. SPA
Since, for efficiency, the number of wavelengths set by an instrument should be as small as possible, the SPA was used to select wavelength points within the determined sensitive wavelength interval of SOM, which also effectively eliminates the collinearity between wavelengths.
The SPA is a forward selection method for wavelengths [32]. In generating a wavelength combination, the SPA starts from one wavelength point, calculates its projection on each remaining wavelength, and adds the wavelength with the largest projection value to the combination. The projection step is then repeated to select the next wavelength until a given number of wavelengths have been selected to form a wavelength combination. Since the correlation between each newly selected wavelength and the previous wavelength is the lowest in each wavelength combination generated by the SPA, the SPA can generally eliminate the collinearity between wavelengths effectively.
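The projection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the columns of `X` hold the reflectance at each wavelength, and the starting wavelength and the number of wavelengths are supplied by the caller, whereas a full SPA usually chooses them by comparing validation errors. The function name `spa_select` and the small matrix are hypothetical.

```python
import numpy as np

def spa_select(X, start, n_select):
    """Select n_select column indices of X by successive projections."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0          # never re-select a chosen column
        selected.append(int(np.argmax(norms)))
    return selected

# Column 1 is collinear with column 0, so it is never chosen.
X = np.array([[1.0, 0.9, 0.0, 0.2],
              [0.0, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.1]])
print(spa_select(X, start=0, n_select=3))   # [0, 2, 3]
```

Each new column is the one with the largest component outside the space spanned by the columns already chosen, which is why strongly collinear wavelengths are skipped.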

E. VALIDATION
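The root-mean-square error of prediction (RMSEP) and the coefficient of determination ($R^{2}$) between the predicted and measured SOM were selected to evaluate the model performance:
$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$
where $y_i$ and $\hat{y}_i$ are the observed and predicted values, respectively; $\bar{y}$ is the mean of the observed data; and $n$ is the number of samples, with $i = 1, 2, \ldots, n$.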
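A short Python sketch of the two evaluation metrics is given here for illustration. It assumes the usual definitions, namely the root of the mean squared residual for RMSEP and one minus the ratio of the residual to the total sum of squares for $R^{2}$. The sample values are hypothetical SOM contents, not data from this study.

```python
import numpy as np

def rmsep(y_obs, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_obs = [2.0, 3.0, 4.0]     # hypothetical measured SOM (%)
y_pred = [2.5, 3.0, 3.5]    # hypothetical predicted SOM (%)
print(rmsep(y_obs, y_pred))      # about 0.408
print(r_squared(y_obs, y_pred))  # 0.75
```

RMSEP is expressed in the same unit as SOM content, while $R^{2}$ is dimensionless and is reported as a percentage in this paper.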

III. RESULTS AND DISCUSSION
A. TESTING OF SOM RADIATIVE TRANSFER MODEL
In this study, $\theta_1$ is 2.95%. The unknown parameters $a_1$ and $a_2$ were acquired by the least-squares algorithm from the calibration set, wavelength by wavelength, in the range of 450-2500 nm. The reflectance related to SOM content was then estimated for the validation set using the model in (4), and the RMSEPs between estimated and measured reflectance were computed wavelength by wavelength in the range of 450-2500 nm. Fig. 1 shows that the accuracy of the model is high: RMSEPs are generally less than 0.03, and in the range of 450-815 nm they are less than 0.023. This provides a theoretical basis for wavelength selection with SA-SPA using the SOM radiative transfer model.

B. SENSITIVE WAVELENGTH INTERVAL SELECTION USING SA
To determine the sensitive wavelength interval of each input parameter, the first-order and total-order sensitivity indices of the input parameters to soil reflectance in the radiative transfer model were calculated by (8) and (9), respectively, wavelength by wavelength within the range of 450-2500 nm. The contribution of the input parameters to soil reflectance varies across spectral regions. In Fig. 2 and Fig. 3, the contribution of each input parameter is marked by a unique color on the basis of the SA results. Fig. 2 and Fig. 3 show that the variation of the total-order sensitivity index with wavelength is consistent with that of the first-order sensitivity index. SOM has a unique sensitive wavelength interval in which its influence is significantly stronger than that of the other parameters; this interval ranges from 450 nm to 1020 nm. This reveals that the VNIR bands are the optimal bands in the solar domain (i.e., wavelengths between 350 and 2500 nm) for remote sensing of SOM, in accordance with previous findings. Yuan et al. found that the SOM retrieval model has the highest accuracy and the best predictive ability in the range of 552-950 nm [29]. Liu et al., using the typical black earth area in Heilongjiang Province as the study area, showed that the sensitive bands were 445-1380 nm and that the significantly correlated spectral range was 545-1250 nm [33]. Luan et al. found that SOM in saline-alkali soil was highly correlated with spectral reflectance at 560-750 nm and 760-1000 nm [34]. Ji et al. found that although the SOM feature bands differ among soil types and regions, most of them are concentrated around 600-800 nm, which shows that the 600-800 nm band is universal for SOM content analysis of various soils [35].
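One simple way to turn wavelength-by-wavelength sensitivity indices into an interval is sketched below in Python. The criterion is an assumption made for illustration: a wavelength counts as sensitive to a parameter when that parameter has the largest index among all parameters. The function name `dominant_intervals` and the index values are hypothetical and are not the results shown in Fig. 2 and Fig. 3.

```python
import numpy as np

def dominant_intervals(wavelengths, S, target=0):
    """Return (start, end) wavelength pairs where row `target` of S is largest."""
    wavelengths = np.asarray(wavelengths)
    S = np.asarray(S, dtype=float)      # shape: (n_parameters, n_wavelengths)
    dominant = np.argmax(S, axis=0) == target
    intervals = []
    start = None
    prev = None
    for wl, flag in zip(wavelengths.tolist(), dominant.tolist()):
        if flag and start is None:
            start = wl                  # a dominant run begins here
        if not flag and start is not None:
            intervals.append((start, prev))
            start = None                # the run ended at the previous wavelength
        prev = wl
    if start is not None:
        intervals.append((start, prev))
    return intervals

# Hypothetical indices: row 0 stands for SOM, row 1 for another parameter.
wavelengths = [450, 700, 1020, 1500, 2500]
S = [[0.8, 0.7, 0.6, 0.2, 0.1],
     [0.1, 0.2, 0.3, 0.7, 0.8]]
print(dominant_intervals(wavelengths, S))   # [(450, 1020)]
```

A stricter rule, for example requiring the target index to exceed the others by a margin, would only change the line that builds `dominant`.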

C. WAVELENGTH SELECTION USING THE SA-SPA
The aim of this study is to use as few wavelengths as possible while providing accurate retrieval of SOM content. To select the most efficient wavelengths for the retrieval of SOM content, the SPA was applied to the calibration set in the range of 450-1020 nm. As shown in Table 2, the number of wavelengths was reduced from the original 2051 to 6, and the root-mean-square error of cross validation (RMSECV) decreased from 0.391 to 0.387. Thus the number of wavelengths was greatly reduced while the accuracy was slightly improved.
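The RMSECV used to compare wavelength subsets can be illustrated with the Python sketch below. The cross-validation scheme and regression model behind Table 2 are not stated in this section, so the sketch assumes leave-one-out cross validation with an ordinary least-squares model on the selected wavelengths. The function name `rmsecv_loo` and the synthetic data are hypothetical.

```python
import numpy as np

def rmsecv_loo(X, y):
    """Leave-one-out RMSECV of a linear least-squares model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])     # add the intercept column
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i              # hold out sample i
        coef, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        errors[i] = y[i] - Xd[i] @ coef
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic example: 82 samples, 6 selected wavelengths, noisy linear response.
rng = np.random.default_rng(0)
X = rng.random((82, 6))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8, -0.2]) + 0.1 * rng.standard_normal(82)
print(rmsecv_loo(X, y))
```

Running the same function on the full set of columns and on the reduced set gives two comparable error values, which is the kind of comparison summarized by the two RMSECV figures above.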
The selected wavelengths 468, 476 and 496 nm correspond to the specific absorption peak of SOM around 400-500 nm. The selected wavelengths 599 nm and 775 nm match the SOM specific absorption peak around 620-700 nm. The selected wavelength 900 nm corresponds to the specific absorption peak of soil iron at 900 nm [36].

D. VALIDATION USING THE EXPERIMENTAL VALIDATION SET
The optimal wavelength combination for detecting SOM content was determined using the calibration set. However, the performance of these wavelengths needs further validation using the validation set. SOM content was estimated at each of the six selected wavelengths by the SOM content retrieval model, which is the inversion of the SOM radiative transfer model, and the RMSEP between retrieved and measured SOM content was computed at each wavelength. Fig. 4 shows that SOM content can be estimated with high accuracy (RMSEP < 0.234%, $R^{2}$ > 82.9%) using the validation set at the six selected wavelengths. The accuracy of SOM estimation is highest at 599 nm (RMSEP: 0.176%, $R^{2}$: 90.4%). In addition, the RMSEP obtained using PLS with the combination of the six selected wavelengths was computed. Compared with the SOM radiative transfer model inversion, the RMSEP with PLS is larger, at 0.219%.
It is worth noting that the prediction accuracy of SOM content with the SOM radiative transfer model inversion is calculated at each of the six selected wavelengths separately, whereas the prediction accuracy with PLS is calculated using the combination of the six selected wavelengths. Therefore, the SOM radiative transfer model inversion can be applied to the prediction of SOM content with higher accuracy and fewer wavelengths. Whether the selected wavelengths are used to estimate SOM content by the SOM radiative transfer model inversion or by the statistical PLS method, the accuracy is high, which verifies the validity of the selected wavelengths.

E. COMPARISON WITH TRADITIONAL EMPIRICAL WAVELENGTH SELECTION METHODS
To further verify the effectiveness of the SA-SPA method proposed in this study, it was compared with the ICO-SPA, CARS-SPA and GA-SPA methods for selecting the optimal wavelength combination to detect SOM content. The performances of the wavelengths selected by these four methods were compared using the validation set (Table 3). The numbers of wavelengths selected by the four methods do not differ much. The comparison shows that the wavelengths selected by the SA-SPA method (RMSEP < 0.234%) perform better than those selected by the other three methods (RMSEP values of 0.309%, 0.328% and 0.357%, respectively). Given that the selection was made based on a physical model and validated using experimental samples, these wavelengths are useful for predicting SOM content on a large scale. Additionally, compared with traditional empirical wavelength selection methods, wavelength selection based on the SOM radiative transfer model improves the generalizability and accuracy of the result. However, this method has the following limitations:
(1). The experimental samples in this study include black soil, chernozem soil, and meadow soil. Wavelength combinations other than those presented in this study may be more effective for detecting other specific types of soil.
(2). The SOM radiative transfer model contains two unknown parameters and thus requires a priori soil information to be solved (i.e., calibration).

IV. CONCLUSION
Six wavelengths were selected in this study through the SOM radiative transfer model to estimate SOM content. This method avoids the problem that statistical methods require a large amount of measured data and that their performance is directly related to the quality and representativeness of the “training data”. The main conclusions of this study are summarized below:
(1). This study first built the SOM radiative transfer model based on the KM theory. The sensitivity of SOM to soil reflectance in the radiative transfer model was then analyzed. According to the result of the sensitivity analysis, the wavelengths in the 450-2500 nm spectral range were classified. The sensitive wavelength interval of SOM is 450 nm to 1020 nm. Compared with traditional empirical wavelength selection methods, wavelength selection based on the SOM radiative transfer model improves the generalizability and accuracy of the result.
(2). The sensitive wavelength points of SOM were determined using SA-SPA: 468, 476, 496, 599, 775 and 900 nm. The validation set was used to estimate SOM content at the selected wavelengths, which verifies the effectiveness of the method. Compared with the ICO-SPA, CARS-SPA and GA-SPA methods for selecting the optimal wavelength combination to detect SOM content, the wavelengths selected by the SA-SPA method (RMSEP < 0.234%) perform better than those selected by the other three methods (RMSEP values of 0.309%, 0.328% and 0.357%, respectively).
The research results provide a theoretical basis and technical support for the remote sensing retrieval of SOM, the development of rapid spectral measurement instruments, and the setting of sensor instrument bands. The radiative transfer model in this study only takes the influence of SOM on reflectance into account, ignoring soil moisture, mineral composition, organic matter, nutrients, etc. Further studies are underway to improve the radiative transfer model by jointly considering the influence of SOM, soil moisture, etc. on reflectance. The contribution of the different parameters to soil reflectance in the radiative transfer model needs to be thoroughly investigated in order to obtain the sensitive wavelengths of SOM, soil moisture, etc.