Application of Hyperspectral Technology Combined With Bat Algorithm-AdaBoost Model in Field Soil Nutrient Prediction

This paper proposes a hyperspectral soil nutrient estimation method based on the bat algorithm (BA)-AdaBoost model. Based on 800 field soil samples and their hyperspectral data, four spectral forms are analyzed: the spectral reflectance, the first derivative of the reflectance, the reciprocal logarithm of the reflectance, and the first derivative of the reciprocal logarithm of the reflectance. Sensitive bands were extracted using the correlation coefficient method, and their correlation with the content of soil organic matter, phosphorus, and potassium was analyzed. The BA is used to optimize the two core parameters of the AdaBoost model, namely the maximum number of iterations $(n)$ and the weight reduction coefficient $(v)$ of the weak learner. The classification and regression tree (CART) is selected as the weak regression learner of the model, and the coefficient of determination is used as the objective function for parameter optimization. On this basis, a BA-AdaBoost model was constructed to estimate soil organic matter, phosphorus, and potassium contents. The results show that the BA-AdaBoost combined model can better search for globally optimal parameters, and that the AdaBoost model optimized by the BA is significantly more accurate and reliable. Among the three elements, soil organic matter is estimated most accurately, with a coefficient of determination of 0.867 and a root mean square error of 0.151 g·kg⁻¹. Compared with the model before optimization, the model accuracy and reliability improved by 29.0% and 24.1%, respectively. The results indicate that hyperspectral technology combined with the BA-AdaBoost model has application prospects in field soil nutrient estimation.

estimate K element in soil [25]. He B. et al. combined a support vector machine and a chaotic whale optimization algorithm to estimate the soil moisture of corn [26].

At present, establishing an inversion model between soil hyperspectral characteristics and element content with CNN, SVR, and similar algorithms is the main method for estimating soil element content [27]. These algorithms usually require some model parameter values to be set in advance [28]. Because the predefined parameter values may not include the globally optimal values, such models cannot achieve their best performance [29]. To overcome this difficulty in finding the best model parameters, classical optimization algorithms such as the genetic algorithm and the particle swarm optimization (PSO) algorithm are used to optimize the internal parameters of machine learning models [30]. However, classical optimization algorithms such as PSO are sensitive to the initial parameter settings and easily fall into locally optimal solutions during optimization, resulting in slow convergence in the later stages of the algorithms [31]. The bat algorithm (BA) is a new swarm intelligence method. During optimization, it imitates the adaptive adjustment of bat acoustic pulse loudness and frequency to switch freely between global and local optimization, thereby balancing the global and local search abilities of the algorithm, and it performs well in the optimization of model parameters [32]. AdaBoost makes good use of cascaded weak classifiers and has high prediction accuracy. However, the maximum number of iterations (n) and the weight reduction coefficient (v) of the weak learners in the AdaBoost model are often not set well, and training is time-consuming.

This paper studied the applicability of the hybrid BA-AdaBoost model in estimating the contents of organic matter, phosphorus, and potassium in agricultural landscape soils. Specifically, the objectives are to develop a hybrid BA-AdaBoost model using 800 collected field soil samples and their hyperspectral data, and to compare the prediction performance of the hybrid BA-AdaBoost model with that of the AdaBoost model, MLR, CNN, and SVR methods.

First, 800 field soil samples and their hyperspectral data were collected, and the content of OM, P, and K was determined in the corresponding samples. Through the hyperspectral characteristic transformation of soil nutrients, the spectral reflectance, the first-order differential of reflectance, the reciprocal logarithm of reflectance, and the first-order differential of the reciprocal logarithm of reflectance were solved. Sensitive bands were extracted using the correlation coefficient method, and the correlation of OM, P, and K content was analyzed. The bat algorithm is used to optimize the two core parameters of the AdaBoost model: the maximum number of iterations (n) and the weight reduction coefficient (v) of the weak learner. The CART decision tree is selected [...]

VOLUME 10, 2022

The remainder of this paper is organized as follows: Section 2 discusses related work, Section 3 describes the proposed method in detail, Section 4 presents the results and discussion, and Section 5 draws conclusions.

The traditional method of detecting soil nutrients is field sampling followed by laboratory analysis [33], [34]. [...] monitor soil nitrate-nitrogen [35]. Moreover, J. C. V. Puno et al. used genetic algorithms for the qualitative level classification of soil nutrients [36]. Many scholars have also installed sensor devices in fields to measure soil moisture, electrical conductivity, and pH in order to invert soil nutrient content [37], [38]. [...] to reveal the soil reflectance spectroscopy mechanism to estimate total nitrogen content [39]. In addition, Guigue et al. studied the hyperspectral inversion method of organic matter and total phosphorus and concluded that the influence of different land use methods on the accuracy of organic matter inversion can be ignored but that the total nitrogen needs to be differentiated and modeled [40]. Moreover, Greenberg et al. [...] of inputting existing knowledge into computers and exploring new knowledge discovery. The introduction of machine learning theory in this field has become a research hotspot. Support vector machines [43], manifold learning [44], firefly algorithms [45], neural networks [46], random forests [47], decision trees [48], and other machine learning methods are used in soil classification, content estimation, model optimization, automatic interpretation, and feature recognition. The conventional method matches spectral data or their variations with laboratory data one by one and inputs them into a computer to obtain an information discovery model, which is then used to calculate the soil composition content represented by an unknown spectrum [49].

Balancing local and global optima is difficult because machine learning model parameter settings involve considerable randomness [50]. This paper analyzes in detail the spectral characteristics of OM, P, and K in field soils, selects the optimal spectral transformation form for each element and the sensitive bands with high correlation, and combines the bat algorithm with the AdaBoost machine learning model to build a soil nutrient content prediction model. The bat algorithm is used to solve for the key parameters of the AdaBoost model, and the prediction accuracies before and after the optimization of the model parameters are compared, providing an efficient new method for the hyperspectral prediction of soil nutrient content in fields.

The soil samples were collected in the Xinjiang Autonomous Region, China, and the sampling locations are shown in Figure 1. A 5 km grid was set as the sampling unit. So that the sampling points would represent the soil properties of each sampling unit, 800 soil samples were collected in the experimental area. During sample collection, sampling points were kept at least 150 m from the road. With each sampling point as the center, samples were collected within a 5 × 5 m area. Five black soil samples were collected at each sampling point, at a sampling depth within the top 15 cm of the soil. The samples were fully mixed and then placed in sampling bags.

Because the sample collection area is far from the laboratory, the temperature and moisture of the soil are inevitably affected during transportation, so sample handling during transportation was given careful attention. After collection, soil samples should be placed in a dark, low-temperature (4 °C), closed environment as soon as possible to keep the soil water content stable. The dark environment prevents the growth of algae in the soil under light, and the low temperature reduces bacterial reproduction and keeps the microbial flora stable. Polyethylene bags, loosely tied, should be used as packing material. In addition, sample bags should be laid out flat rather than stacked during storage, to avoid damaging the original aggregate structure of the soil and leaving the underlying samples in an anaerobic environment. [...]

The process comprised removing plant tissue, gravel, and other debris from the sample, followed by air drying, grinding, and sieving so that the particle size of the soil was <0.25 mm. The samples were divided into two parts: one for the determination of soil element content and the other for indoor hyperspectral measurement. The potassium dichromate volumetric method was used to determine the content of soil organic matter. A Delta Professional 4050 portable X-ray fluorescence spectrometer (Olympus, USA) was used to measure the content of soil phosphorus and potassium. The equipment was equipped with an Au-target micro X-ray exciter and an SDD detector cooled by Peltier semiconductors. The detector area was 25 mm² and the energy resolution was 128 eV; the instrument could simultaneously measure the content of 38 elements (including soil phosphorus and potassium). Thereafter, the measured element contents were statistically analyzed according to [51], and the results are presented in Table 1. The 800 soil samples were divided into 200 groups according to element content from low to high. One sample was randomly selected from each group and put into [...] data of the soil samples.

The data below 400 nm and above 2,450 nm were excluded because the spectral data in the ranges of 350-399 nm and 2,451-2,500 nm are noisy and have a low signal-to-noise ratio, which interferes with the analysis of the relationship between soil elements and reflectance. The sampling interval of the spectrometer was 1 nm (i.e., 2,051 bands were obtained in the range of 400-2,450 nm). Because of the high spectral resolution and the large number of bands, information may overlap between adjacent bands, which makes the data more susceptible to noise. Therefore, the spectral data were resampled with the sampling interval set to 10 nm. After denoising and resampling, the original spectral reflectance was transformed using the first-order differential, the reciprocal logarithm, and so on. The different transformation forms help to locate peaks and valleys accurately and quickly, and the wavelengths corresponding to these peaks and valleys are used to determine the sensitive bands. [...]

The bat algorithm is a heuristic search algorithm that simulates bats using sonar to detect prey and avoid obstacles; the optimization search process simulates bats flying to find prey. During the calculation, the fitness value of the problem is used to select the positions of the bats, and an evolutionary process of survival of the fittest simulates the iterative search in which better feasible solutions replace poorer ones [52]. Because the BA is fast and has few parameters, it has attracted wide attention in many optimization problems and has shown good performance. According to the basic BA principle, after the parameters of the algorithm are initialized, the heuristic search starts at a random position ($Z_i$) in the $d$-dimensional search space. Each bat searches for prey at a fixed frequency, with varying wavelengths and sound intensities. The flowchart of the bat algorithm is shown in Figure 3. During the search process, a bat automatically adjusts the wavelength according to its distance from the prey. After the global search, the flight speed and spatial position of each bat are updated and the fitness value of the objective function is calculated. The update formula for the speed and spatial position is Formula (1): [...]

[...] unified core initializer are used. At the same time, a max pooling layer is applied with a pooling size of 1, which takes the maximum value of each neuron cluster in the previous layer. After the pooling layer of the CNN, a dropout layer with a rate of 0.001 is used to limit overfitting; the pooling layer also helps to overcome overfitting. A fully connected layer, called 'dense', is applied as the output layer, where a linear activation function is used. Finally, the model was compiled using the MSE loss function and the Adagrad optimizer with a learning rate of 0.01.
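Returning to the bat algorithm described earlier in this section: since Formula (1) did not survive in this text, the following is only a minimal sketch of the commonly published form of the basic bat algorithm, not necessarily the authors' exact formulation. It updates a frequency, a velocity, and a position for each bat, performs a local random walk around the current best solution, and accepts improvements with a probability tied to loudness. The function name `bat_search`, all default parameter values, the range-scaled local walk, and the toy objective are assumptions made for illustration. In the setting of this paper, the objective would be the coefficient of determination of an AdaBoost model trained with a given (n, v) pair.

```python
import math
import random


def bat_search(objective, bounds, n_bats=20, n_iter=100,
               f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Maximize `objective` over the box `bounds` with a basic bat algorithm.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    spans = [hi - lo for lo, hi in bounds]

    def clip(x):
        # Keep every coordinate inside its search interval.
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    loud = [1.0] * n_bats   # loudness A_i, decreases after each acceptance
    rate = [r0] * n_bats    # pulse emission rate r_i

    i_best = max(range(n_bats), key=lambda i: fit[i])
    best, best_fit = list(pos[i_best]), fit[i_best]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency, velocity, and position update (published sign convention).
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]

            # Local random walk around the best solution, scaled by the
            # parameter range (an assumption, since n and v differ in scale).
            if rng.random() > rate[i]:
                cand = [b + 0.01 * s * rng.uniform(-1.0, 1.0) * mean_loud
                        for b, s in zip(best, spans)]

            cand = clip(cand)
            cand_fit = objective(cand)

            # Accept improvements with probability tied to loudness.
            if cand_fit >= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))

            if cand_fit > best_fit:
                best, best_fit = list(cand), cand_fit

    return best, best_fit


def toy_objective(params):
    """Hypothetical stand-in for the model score as a function of (n, v)."""
    n, v = params
    return -(((n - 200.0) / 490.0) ** 2 + (v - 0.3) ** 2)


# Hypothetical search space: n in [10, 500], v in [0.01, 1.0].
best_params, best_score = bat_search(toy_objective, [(10.0, 500.0), (0.01, 1.0)])
```

Because the search is written as a maximizer, a score such as the coefficient of determination can be passed in directly. In practice the first coordinate would be rounded to an integer before being used as the number of iterations.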

The structure of the CNN model is: [...]

The selection of characteristic wavelengths is very important in establishing a stable spectral model. When full-band spectral data are used to establish the model, not only is the computational workload heavy, but the prediction accuracy of the calibration model also struggles to reach its optimal value. Therefore, it is necessary to select the characteristic wavelengths before establishing the calibration model. At present, commonly used characteristic wavelength selection methods include the correlation coefficient method, analysis of variance, stepwise multiple linear regression, particle swarm optimization, the inverse interval partial least squares method, and the successive projections algorithm (SPA). Among them, the SPA is a relatively new wavelength selection method that is increasingly widely used. This study also uses this algorithm to extract the characteristic bands. The SPA finds the group of variables with the minimum redundant information in the spectral data, so as to minimize the collinearity between the variables. It is a deterministic search method, and its variable selection results are reproducible, which will be more robust [...]

Step 1: Let $i = 1$ and assign the $k$-th column of the spectral matrix ($X$) to $x_{k(1)}$, that is, $k(1) = k$, $x_{k(1)} = x_k$, and let [...]

Step 2: The position set of the wavelength vectors ($x_j$) that have not been selected is marked as [...]

Step 3: Construct the orthogonal projection operator [...]

Step 4: Calculate the orthogonal projection vector [...]

Step 5: [...]

[...] (2) and (3): [...]

The lower the RMSE calculated by the prediction model and the closer $R^2$ is to 1, the higher the accuracy and stability of the prediction model.
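Formulas (2) and (3) are not legible in this text, so the following sketch assumes the standard definitions of the two evaluation metrics: the root mean square error of the residuals, and the coefficient of determination as one minus the ratio of the residual sum of squares to the total sum of squares. The function names are chosen for illustration only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under these definitions, a model that always predicts the mean of the measured values has an $R^2$ of 0, and a perfect model has an $R^2$ of 1 and an RMSE of 0.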
[...] Figures 3(b)-(d) are the spectra of the original reflectance transformed by the first-order differential (R′), the reciprocal logarithm (lg(1/R)), and the first-order differential of the reciprocal logarithm ([lg(1/R)]′), respectively. The first-order differential transformation amplifies the changes in the original spectrum, and the reflectance fluctuates more strongly at 1,400 nm, 1,900 nm, and 2,200 nm after transformation. To keep the figure legible, only 30 randomly selected groups of soil sample spectra and their transformation curves are presented in Figure 3. [...]

SPA was used to calculate the correlation coefficient of soil organic matter, phosphorus, and potassium content and soil reflectance, and the correlation coefficient curve was drawn (Figure 4). From the correlation coefficients of the original spectra in Figure 4(a), soil organic matter and phosphorus element contents are observed to be negatively correlated with spectral reflectance, while potassium element content is inversely correlated. Compared with the original spectral reflectance, the transformed spectral data correlate more strongly with OM, P, and K content, and the correlation coefficient between the first-order differential transformation and OM, P, and K content takes both positive and negative values. More peaks and troughs appear, and the highest correlation coefficient of each element is significantly improved after the first-order differential transformation (Figure 4(b), (c), and (d)). So that a certain number of characteristic bands could be selected from the original reflectance, R′, lg(1/R), and [lg(1/R)]′ transformation curves, 0.4, the midpoint of the medium correlation range (0.3-0.5), was taken as the selection threshold. The sensitive bands whose correlation coefficient had an absolute value >0.4 were selected as the sample input of the prediction model (Table 2).
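The transformation and thresholding procedure described above can be sketched as follows. This is an illustrative outline only: it assumes a forward difference for the first-order differential, a base-10 logarithm for lg(1/R), and the Pearson correlation coefficient, and all function names and the tiny example data are hypothetical.

```python
import math


def first_derivative(spectrum, step_nm=10.0):
    """Forward-difference first-order differential of one spectrum."""
    return [(spectrum[i + 1] - spectrum[i]) / step_nm
            for i in range(len(spectrum) - 1)]


def reciprocal_log(spectrum):
    """lg(1/R) for reflectance values R in (0, 1]."""
    return [math.log10(1.0 / r) for r in spectrum]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sensitive_bands(spectra, content, threshold=0.4):
    """Return (band_index, r) for every band with |r| above the threshold."""
    selected = []
    for j in range(len(spectra[0])):
        column = [sample[j] for sample in spectra]
        r = pearson(column, content)
        if abs(r) > threshold:
            selected.append((j, r))
    return selected


# Hypothetical data: 4 samples x 3 bands, with one element content per sample.
spectra = [[0.4, 0.5, 0.2],
           [0.3, 0.5, 0.4],
           [0.2, 0.5, 0.1],
           [0.1, 0.5, 0.3]]
content = [1.0, 2.0, 3.0, 4.0]
bands = sensitive_bands(spectra, content)

# The first-order differential of the reciprocal logarithm chains both transforms.
transformed = [first_derivative(reciprocal_log(s)) for s in spectra]
```

The same `sensitive_bands` call can be applied to each transformed data set in turn, so that the transformation with the highest absolute correlation is chosen separately for OM, P, and K.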
The statistical results show that the correlation coefficient between soil element content and spectral reflectance improved after the different spectral feature transformations (Figure 5(b)). The absolute value of the coefficient increased to 0.788 at a wavelength of ∼1,376 nm. The best transformation form for soil phosphorus is the first-order differential of the reciprocal logarithm, with the absolute value of the correlation coefficient reaching 0.590 at a wavelength of around 541 nm. For soil potassium, the optimal transformation form is also the first-order differential of the reciprocal logarithm, with the absolute value of the correlation coefficient reaching 0.634 at a wavelength of around 580 nm (Figure 5(d)). The model input curve therefore changes with the prediction object (OM, P, or K), and the best transformation form should be used as the model input. [...]

The efficiency of the model was examined using the Taylor diagram (Fig. 7). For the prediction of OM, the BA-AdaBoost model has a correlation coefficient of 0.982 and a normalized standard deviation of 0.523, which is the best result. The SVR model has a correlation coefficient of 0.783 and a normalized standard deviation of 1.0, which is the worst result. Similarly, for the prediction of P, the BA-AdaBoost model has a correlation coefficient of 0.960 and a normalized standard deviation of 0.407, which is the best result, whereas the SVR model has a correlation coefficient of 0.715 and a normalized standard deviation of 1.0, which is the worst result. For the prediction of K, the BA-AdaBoost model again gives the best result, with a correlation coefficient of 0.923 and a normalized standard deviation of 0.619. The correlation coefficient of the SVR model is 0.736, and the normalized standard deviation is 1.0.
The result is the worst. To sum up, [...]

It can be seen from the box plot comparison of the models (Fig. 8) that, for the predictions of OM, P, and K, the median value generated by the BA-AdaBoost model is the closest to the observed value. The model differs in the lower quartile, 25th percentile (Q25), and data range (maximum and minimum), but it is better than the prediction effect [...]

[...] between spectral reflectance and element content. The correlation coefficients of OM, P, and K reach their maxima at 1,376 nm, 541 nm, and 580 nm, respectively.

(2) The BA-AdaBoost soil content prediction model is constructed by combining the BA and AdaBoost models. The combined model only requires the search space to be set and then automatically searches for the optimal parameter values of the model. Comparing the prediction accuracy before and after optimization by the BA, the $R^2$ of the BA-AdaBoost model increased, the RMSE decreased, and the prediction accuracy improved significantly, which shows that the BA-AdaBoost model is applicable to the hyperspectral prediction of soil element content and expands the application of machine learning models in the prediction of soil composition.

(3) The comparison of the Taylor diagram, box plot, and violin plot shows that the predictions of the hybrid BA-AdaBoost model are very close to the observed values, with a good distribution and a good probability density fit. However, it is still necessary to further study the relationship among OM, P, and K, verify the robustness of the BA-AdaBoost model under different soil types, and verify its performance under different soil moisture contents.