Prediction Dynamics in Cotton Aphid Using Unmanned Aerial Vehicle Multispectral Images and Vegetation Indices

Cotton yield can be increased by having real-time information on the state of cotton aphid populations. However, traditional cotton aphid monitoring relies on ground sampling supported by models such as linear regression, which yields low forecast accuracy. Therefore, this paper aims to improve the precision of remote sensing prediction models by investigating how cotton aphid prediction models are constructed. We explored the effectiveness of the XGBoost algorithm combined with the grey wolf optimizer (GWO) and support vector regression (SVR) for predicting cotton aphids from vegetation indices derived from UAV multispectral imagery. First, 12 indices related to cotton aphids were calculated from UAV multispectral reflectance. Next, the optimal index combination for pest prediction was determined using correlation analysis and two-way ANOVA combined with the XGBoost algorithm. Then, a cotton aphid prevalence prediction model was constructed with SVR using the optimal index combination, and the model was optimized with the GWO algorithm. Compared with the six other algorithms, the experimental results show that the MSE and MAE of the XGBoost-GWO-SVR model are reduced by 90.20% and 70.36% (SVR), 90.14% and 70.26% (XGBoost-SVR), 7.47% and 0.14% (XGBoost-GA-SVR), 5.80% and 0.11% (XGBoost-PSO-SVR), 12.06% and 58.95% (LR), and 84.77% and 89.22% (BPNN), whereas R² is increased by 22.5% (SVR and XGBoost-SVR), 0.3% (LR), and 12.51% (BPNN). The R² values of the XGBoost-SVR prediction models optimized with GWO, PSO, and GA do not differ significantly. Among these models, XGBoost-GWO-SVR obtained the highest R² of 0.980 and the lowest MAE of 2.838.


I. INTRODUCTION
Xinjiang is the largest cotton-producing region in China. Pests inhibit cotton development throughout the year, causing a 10-15% yield loss, and pest pressure continues to grow because of global climate change and insufficient ecological and natural management of cotton fields [1]. Cotton aphids are common pests in the cotton industry. Aphid damage results in early leaf development, slow growth, and a delayed growth cycle, all of which reduce the quality and yield of cotton [2]. Three aphid species damage cotton plants in Xinjiang: Aphis gossypii Glover (cotton aphid), Aphis atrata Zhang, and Acyrthosiphon gossypii Mordvilko. Of these, the cotton aphid causes the greatest damage and has the highest incidence, and its strong migratory capacity leads to rapid pest outbreaks [2], [3].
Cotton aphids are migratory and mobile, break out and spread rapidly, and are sensitive to environmental changes; their population development is unstable, and their distribution in cotton fields is random. Furthermore, they are affected by many factors with extensive coupling relationships. Thus, it is challenging to construct a general model for reliable prediction. Existing assessments and forecasts of the degree of cotton aphid infestation rely on manual ground surveys; prediction models built from field survey data are more accurate but difficult to scale up to large areas [4]. Intelligent information processing techniques are crucial for accurately predicting the dynamics of cotton aphid occurrence, understanding how cotton aphid populations in cotton fields change over time, recognizing abnormal changes, and providing early warnings [5], [6], [7]. Remote sensing detects insect activity through the damage the insects inflict, such as sooty deposits, defoliation, color changes, and geometric deformation of plant tissues [8], [9], [10]. Field pests can be recognized, studied, and forecasted using precise spectral information from airborne and satellite sensors together with vegetation indices [4], [11], [12]. As UAV data and pest monitoring have improved, signal processing techniques, machine learning methods, and pattern recognition algorithms have been increasingly applied to continuous monitoring, and machine learning models are now widely used in pest detection [13], [14], [15]. In recent years, several researchers have used hybrid models that combine several methodologies, indicating that merging different models may improve prediction performance [16], [17]. Despite tremendous progress, some limitations remain, such as weak model generalization, convergence to local optima, and underfitting or overfitting.
The use of UAV multispectral data, vegetation indices, and machine learning to target cotton aphids has received relatively little attention in prior research. In this paper, we employ a support vector regression (SVR) model to predict the cotton aphid population status (Table 8 presents a cross-reference of the phrases and abbreviations used in the text), integrating the benefits of XGBoost and several optimization techniques, which may address problems such as the low accuracy of traditional cotton aphid prediction. SVR is a small-sample learning approach that does not rely on the law of large numbers. Compared with other statistical approaches, SVR fits representative samples better, and it is better suited to cotton aphid data, which span a long period but contain few samples [18]. SVR has therefore been widely used in many fields, such as predicting vegetation coverage in desert areas [19], estimating cotton leaf nitrogen [20], forecasting temperature in waterfowl breeding [21], predicting chlorophyll-a in water bodies [22], predicting surface temperature [23], estimating farmland surface soil moisture [24], and predicting lithium battery life [25]. However, SVR is prone to overfitting when the input contains many features, and its accuracy depends on the penalty coefficient C and the tolerance coefficient ϵ, which therefore must be optimized. XGBoost can be used to screen features; it has effectively reduced overfitting and computational effort in feature screening for wheat stripe rust prediction and in parthenium weed classification [26]. Algorithms used to optimize SVR include the genetic algorithm (GA), particle swarm optimization (PSO), and the grey wolf optimizer (GWO); among them, GWO can quickly find the optimal penalty coefficient C and tolerance coefficient ϵ. GWO has been applied to ship identification, data filtering, and path planning optimization [27], [28], [29].
The main contributions of this paper are as follows:
• An improved XGBoost-GWO-SVR cotton aphid prediction model is proposed. It can estimate cotton aphid populations accurately and opens new avenues for information-based cotton aphid monitoring.
• Based on the spectral characteristics of vegetation after pest and disease occurrence, several improved vegetation indices are proposed, such as the MSAVI-SAVI ratio index and the simple ratio green/blue vegetation index.
• Compared with other algorithms, the experimental results demonstrate that the XGBoost-GWO-SVR model achieves better prediction accuracy, and that combining the GWO-SVR model with the XGBoost algorithm enhances the prediction accuracy for cotton aphids.
VOLUME 11, 2023
The rest of this paper is organized as follows. The materials collection and research methods are described in Section II. Experimental results are provided in Section III. The experimental results are discussed in Section IV. Finally, we conclude our work in Section V.

A. FIELD PREPARATION
The study was conducted in a 63-acre research field (41°11′25.1091″ N, 82°51′37.9317″ E; elevation: 896 m) located in Gülbargh Town, Shayar County, Xinjiang Uygur Autonomous Region, China. To keep field management uniform, no potassium, phosphorus, or plant growth regulators were applied to the cotton fields. Cotton was planted at approximately 16,700 seeds per acre on April 19, 2021 (Fig. 1). Twelve plots (≈ 68 m long × 36 m wide [12 columns]) were arranged in a randomized complete block design (three replicates per treatment) with resident populations of cotton aphids (Aphis gossypii Glover). Cotton aphids are recurring cotton pests in Gülbargh, and their populations typically begin to develop in late May. The number of cotton aphids peaked in late June and early July (Table 8).
Arthropod populations were monitored from May 20 to July 16, 2021. Starting from the arrival of cotton aphids in the fields, we monitored the cotton aphid population at 288 sample points in the 12 plots without applying any chemical insecticides. Under these natural conditions, natural enemies of aphids, such as ladybugs, lacewings, and syrphid flies, suppress aphid reproduction.
Ground-truth data were collected by sampling aphids at 5-day intervals (Table 8). The GPS locations and aphid populations were recorded for all spots showing symptoms of aphid infestation. At each sample point, five cotton plant sites were randomly selected and a sample of 10 leaves (the third main-stem node leaf from the top of the plant) was examined; aphids were counted directly in the field (without destroying the cotton) using 4× magnification. Cumulative aphid-days for the period after treatment were estimated from the aphid counts.
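The text does not state how cumulative aphid-days were computed. A common approach, assumed here, is trapezoidal integration of aphid density over the sampling dates: each interval contributes the mean of its two endpoint counts multiplied by the interval length in days. The sketch below implements that assumed formula; the counts are hypothetical and are not data from this study.

```python
from typing import Sequence

def cumulative_aphid_days(days: Sequence[float], counts: Sequence[float]) -> float:
    """Trapezoidal cumulative aphid-days: sum of (mean count in interval) x (interval length)."""
    if len(days) != len(counts):
        raise ValueError("days and counts must have the same length")
    total = 0.0
    for i in range(1, len(days)):
        interval = days[i] - days[i - 1]
        if interval <= 0:
            raise ValueError("sampling days must be strictly increasing")
        total += (counts[i - 1] + counts[i]) / 2.0 * interval
    return total

# Hypothetical aphids-per-leaf counts taken at 5-day intervals.
print(cumulative_aphid_days([0, 5, 10, 15], [2.0, 6.0, 10.0, 4.0]))  # 95.0
```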

2) UAV IMAGERY ACQUISITION
The low-altitude UAV camera platform consisted of an M300 multirotor aircraft (DJI, Shenzhen, China) and a RedEdge-MX airborne multispectral imager (MicaSense, Seattle, WA, USA). The camera provides a spatial resolution of 8 cm at a hover altitude of 120 m and has five channels in the 400-900 nm range: blue (475 nm, 40 nm bandwidth), green (560 nm, 20 nm bandwidth), red (668 nm, 10 nm bandwidth), red edge (717 nm, 10 nm bandwidth), and near-infrared (840 nm, 40 nm bandwidth). The horizontal field of view was 47.2°. Flights were conducted concurrently with ground sampling from June 8 to July 16, 2021. Table 8 lists the cotton seeding date and the date aphids were first detected, and records the drone data collection times on days 50, 66, 73, and 88 after seeding. The flights thus covered a range of arthropod populations and cotton growth stages. Images were acquired under clear, cloud-free conditions, and the sensor was calibrated using calibration panels before and after each flight. During the flights, the UAV was kept at 70 m above the ground, giving an image resolution of 5 cm. The flight data were preprocessed in Pix4DMapper, and the resulting orthoimages were georeferenced and indexed in ArcGIS Pro before being exported as single-band grayscale TIFF images. The research area is located in Shayar County. Of the sampled data, 70% were used for modeling and 30% for validation.
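The flight schedule and image resolution above can be cross-checked with simple arithmetic. This sketch assumes that ground sample distance (GSD) scales linearly with altitude for a fixed camera viewing flat terrain at nadir; under that assumption, 8 cm at 120 m becomes about 4.7 cm at 70 m, consistent with the stated 5 cm. Days after seeding are converted to calendar dates from the April 19, 2021 seeding date; the function names are illustrative.

```python
from datetime import date, timedelta

SEEDING_DATE = date(2021, 4, 19)  # seeding date given in the field-preparation subsection

def flight_date(days_after_seeding: int) -> date:
    """Calendar date of a flight given as days after seeding."""
    return SEEDING_DATE + timedelta(days=days_after_seeding)

def scaled_gsd_cm(ref_gsd_cm: float, ref_altitude_m: float, altitude_m: float) -> float:
    """GSD at a new altitude, assuming GSD is proportional to altitude
    (same camera, nadir view, flat terrain)."""
    return ref_gsd_cm * altitude_m / ref_altitude_m

for day in (50, 66, 73, 88):
    print(day, flight_date(day).isoformat())
print(round(scaled_gsd_cm(8.0, 120.0, 70.0), 2))  # 4.67
```

Days 50 and 88 fall on 2021-06-08 and 2021-07-16, matching the June 8 to July 16 flight window, while days 66 and 73 fall on 2021-06-24 and 2021-07-01.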
In addition, a linear resampling method was applied to the UAV multispectral data and the cotton aphid survey data, and the resampled data were used in subsequent analyses.
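The text does not specify how raster values were matched to the GPS-located sample points. One plausible approach, assumed here, is bilinear interpolation of each band at the sample location. The sketch assumes a hypothetical north-up raster whose origin is the top-left corner of the top-left pixel, with square pixels; world_to_pixel and bilinear_sample are illustrative names, not functions of Pix4DMapper or ArcGIS Pro.

```python
from typing import Sequence, Tuple

def world_to_pixel(x: float, y: float, x_origin: float, y_origin: float,
                   pixel_size: float) -> Tuple[float, float]:
    """Map projected coordinates to fractional (row, col) measured from pixel centres."""
    col = (x - x_origin) / pixel_size - 0.5
    row = (y_origin - y) / pixel_size - 0.5
    return row, col

def bilinear_sample(grid: Sequence[Sequence[float]], row: float, col: float) -> float:
    """Bilinearly interpolate a 2-D band array at fractional pixel coordinates."""
    n_rows, n_cols = len(grid), len(grid[0])
    if not (0 <= row <= n_rows - 1 and 0 <= col <= n_cols - 1):
        raise ValueError("sample point falls outside the interpolation area")
    r0, c0 = int(row), int(col)
    r1, c1 = min(r0 + 1, n_rows - 1), min(c0 + 1, n_cols - 1)
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c1] * dc        # interpolate along the upper row
    bottom = grid[r1][c0] * (1 - dc) + grid[r1][c1] * dc     # interpolate along the lower row
    return top * (1 - dr) + bottom * dr                      # then between the two rows

# Hypothetical 2 x 2 band, sampled halfway between all four pixel centres.
band = [[10.0, 20.0],
        [30.0, 40.0]]
print(bilinear_sample(band, 0.5, 0.5))  # 25.0
```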

C. VEGETATION INDEX EXTRACTION
Vegetation indices computed from spectrally sensitive bands can facilitate the prediction of crop pests. In this paper, based on the characteristics of cotton aphid damage, 12 vegetation indices sensitive to cotton aphids were computed as the initial features of the model so that they could be applied to cotton aphid prediction [4]. Table 8 lists the index names, formulae, and references.
The reflectances of the green (G), blue (B), red (R), near-infrared (NIR), and red edge (RE) bands were chosen to create the vegetation indices based on the characteristics of the RedEdge-MX sensor. Several of the original indices, such as ARI, TCARI, and SAVI, were proposed for hyperspectral or MERIS remote sensing data and require reflectance at central wavelengths, such as 700 nm and 800 nm, that are not available in UAV multispectral images; the nearest available bands were therefore substituted to generate modified indices.
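The exact index formulas are given in Table 8 and are not reproduced in this text. The sketch below shows how such indices can be computed per pixel from the five RedEdge-MX reflectances, using common literature definitions with the 717 nm red-edge band standing in for 700 nm, as the paragraph describes. These definitions, and the readings of GBI as G/B and MSAVI_SAVI as MSAVI/SAVI, are assumptions that may differ from the paper's Table 8; indices whose definition is not clear from the text (such as RRI) are omitted.

```python
import math
from typing import Dict

def vegetation_indices(b: float, g: float, r: float, re: float, nir: float) -> Dict[str, float]:
    """Per-pixel indices from reflectances on a 0-1 scale.

    Literature forms with the red-edge band (re) substituted for 700 nm;
    GBI = G/B and MSAVI_SAVI = MSAVI/SAVI are assumed readings of the index names.
    """
    savi = 1.5 * (nir - r) / (nir + r + 0.5)                                   # soil factor L = 0.5
    msavi = (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - r))) / 2
    sr_re = nir / re                                                          # red-edge simple ratio
    return {
        "GNDVI": (nir - g) / (nir + g),
        "SAVI": savi,
        "MSAVI": msavi,
        "MSAVI_SAVI": msavi / savi,
        "ARVI": (nir - (2 * r - b)) / (nir + (2 * r - b)),                    # gamma = 1
        "GLI": (2 * g - r - b) / (2 * g + r + b),
        "GBI": g / b,
        "MSR_re": (sr_re - 1) / math.sqrt(sr_re + 1),
        "ARI": 1 / g - 1 / re,                                                # 1/R550 - 1/R700 analogue
        "TCARI": 3 * ((re - r) - 0.2 * (re - g) * (re / r)),
    }

# Hypothetical reflectances for one vegetated pixel.
print(vegetation_indices(b=0.04, g=0.08, r=0.05, re=0.20, nir=0.45))
```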

1) THEORETICAL BASIS OF SUPPORT VECTOR REGRESSION
SVR extends the support vector machine to regression problems [41]. Compared with other machine learning approaches, SVR requires fewer samples, obtains a global optimum, and performs better on multidimensional nonlinear problems [27], [42].
For the nonlinear case, consider an aphid training sample set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i ∈ ℝ^d are the input parameters and y_i ∈ ℝ are the output parameters. The SVR optimization problem is

$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right) \tag{1}$$

$$\text{s.t.}\quad y_i-w^{\top}\varphi(x_i)-b\le\epsilon+\xi_i,\qquad w^{\top}\varphi(x_i)+b-y_i\le\epsilon+\xi_i^{*},\qquad \xi_i,\ \xi_i^{*}\ge 0,\qquad i=1,\dots,n \tag{2}$$

where w and b are the weights and bias, respectively, φ(·) maps the inputs into a high-dimensional feature space, C > 0 is the penalty factor, ξ_i and ξ_i* are non-negative slack variables, and ϵ is the parameter of the ϵ-insensitive loss function. Formula (1) improves the generalization ability, and formula (2) bounds the error. Introducing the Lagrange function yields the regression function

$$f(x)=\sum_{i=1}^{n}\left(a_i-a_i^{*}\right)K(x_i,x)+b \tag{3}$$

where K(x_i, x) = φ(x_i)^⊤φ(x) is the kernel function, and a_i ≥ 0 and a_i* ≥ 0 are Lagrange multipliers. The kernel function computes the inner product of the original low-dimensional vectors after they are mapped into a high-dimensional space. Common kernel functions include the polynomial, radial basis function (RBF), and sigmoid kernels.

2) IMPROVED XGBOOST-GWO-SVR MODEL CONSTRUCTION AND EVALUATION
We adopted SVR because cotton aphid data have a long time span, a small sample size, marked randomness, and nonlinearity; given its suitability for small samples, high accuracy, strong generalization capacity, and robustness, SVR is well suited to analyzing and predicting such data. However, SVR is hampered when the dataset contains many features, which leads to overfitting or low fitting accuracy. Consequently, we used appropriate optimization approaches to address these weaknesses of SVR.
First, the classic SVR technique is not ideal for large datasets: it performs poorly when the number of features per data point exceeds the number of training samples or when the dataset is noisy. The natural habitat of the cotton aphids in this study contained a variety of information, and entering all of the collected data into SVR would compromise the precision of the model. Therefore, we used XGBoost to screen the input data, reducing the number of features in the training samples and improving the accuracy and execution efficiency of the SVR model. XGBoost is a decision-tree ensemble method based on gradient boosting: each new tree is built to minimize the loss remaining from the previously created trees, which makes the final prediction reliable [43], [44]. In each iteration, XGBoost provides an importance score for each feature, indicating how significant the feature is in the model being trained and guiding the construction of the next tree [45].
Accordingly, following previous work that used the radial basis function (RBF) kernel [46], we build the SVR with an RBF kernel. However, the penalty coefficient C and tolerance coefficient ϵ of the RBF-kernel SVR can significantly affect prediction accuracy. GWO is a swarm intelligence optimization approach that searches for optimal solutions while requiring only a few control parameters [47]. The optimal penalty coefficient C and tolerance coefficient ϵ obtained by GWO were substituted into the SVR to construct the XGBoost-GWO-SVR prediction model.
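To make the roles of C and ϵ concrete, the sketch below evaluates an RBF-kernel SVR decision function of the form f(x) = Σ(a_i − a_i*)K(x_i, x) + b together with the ϵ-insensitive loss. It assumes the dual coefficients (a_i − a_i*) have already been fitted; in a fitted model each |a_i − a_i*| is at most C, and ϵ sets the width of the error-free tube. The kernel width gamma is a hypothetical fixed value, since the text tunes only C and ϵ.

```python
import math
from typing import Sequence

def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Errors inside the +/- eps tube cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                dual_coefs: Sequence[float], bias: float, gamma: float) -> float:
    """f(x) = sum_i (a_i - a_i*) K(x_i, x) + b, with dual_coefs holding (a_i - a_i*)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical one-support-vector model evaluated at the support vector itself.
print(svr_predict([0.0, 0.0], [[0.0, 0.0]], [2.0], bias=1.0, gamma=0.1))  # 3.0
```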
The cotton aphid prediction model works as follows. First, XGBoost serves as a preprocessing stage for SVR: features are screened from the target dataset (UAV multispectral reflectance data, vegetation index data, and cotton aphid survey data), and the selected feature data are normalized. Second, the SVR penalty coefficient C and tolerance coefficient ϵ are tuned with GWO, and the tuned parameters are fed into the SVR algorithm. Fig. 2 shows the construction process of the XGBoost-GWO-SVR prediction model.
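The text states that the screened features were normalized but not which scheme was used. A minimal sketch assuming min-max scaling is shown below. The scaling bounds are learned from the training subset only and then reused for the test subset, so that no test information leaks into model fitting; the feature values are hypothetical.

```python
from typing import List, Sequence, Tuple

def fit_min_max(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Learn per-feature (min, max) bounds from training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows: Sequence[Sequence[float]],
                  bounds: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Scale each feature with the training bounds; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
            for row in rows]

# Hypothetical two-feature example: test values outside the training range fall outside [0, 1].
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
bounds = fit_min_max(train)
print(apply_min_max([[15.0, 10.0]], bounds))  # [[1.5, 0.0]]
```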
To evaluate the advantages of the XGBoost-GWO-SVR algorithm, two classical algorithms (LR and BPNN) and four SVR-based algorithms (SVR, XGBoost-GA-SVR, XGBoost-PSO-SVR, and XGBoost-SVR) were compared with it. Seven cotton aphid prediction models were thus constructed by combining two feature sets with seven method combinations. A cross-validation method was used to assess the accuracy and stability of the predictive models: the cotton aphid samples were divided into 10 mutually exclusive subsets of similar size, three of which were used as the test set and the remaining seven as the training set. Finally, the reliability of each prediction model was assessed using the MSE, MAE, and coefficient of determination (R²) of the prediction results [48].
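$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-g(x_i)\right)^{2}$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-g(x_i)\right|$$

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_i-g(x_i)\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}$$

where y_i is the actual value, g(x_i) is the predicted value, ȳ is the mean of the actual values, and n is the number of samples.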
According to these equations, the MSE is the mean squared difference between the predicted and actual values, the MAE is the mean absolute error of the predicted values, and R² measures the goodness of fit of the model.
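The three evaluation criteria can be written directly from their definitions. The sketch also includes a relative-reduction helper, (baseline − model) / baseline × 100, which is our assumed reading of how the Results section's statements such as "MSE reduced by X%" are computed; the example values are hypothetical.

```python
from typing import Sequence

def mse(y: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

def mae(y: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, pred)) / len(y)

def r2(y: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def relative_reduction_pct(baseline: float, model: float) -> float:
    """Percentage by which `model` is lower than `baseline` (e.g., an MSE reduction)."""
    return (baseline - model) / baseline * 100.0

y_true = [10.0, 20.0, 30.0, 40.0]  # hypothetical aphid counts
y_pred = [12.0, 18.0, 33.0, 39.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), round(r2(y_true, y_pred), 3))  # 4.5 2.0 0.964
```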

A. OPTIMAL FEATURE SELECTION AND ANALYSIS
Previous research has relied on vegetation indices to identify the presence of vegetation pests. Fig. 3 shows the coefficients of determination (squared correlation coefficients) between cotton aphids and the vegetation indices and multispectral reflectance data. To establish significance levels (p) between healthy and infested samples for the 12 vegetation indices and five band reflectances, a t-test for each sample and two-way ANOVA were used. Cotton aphids were significantly associated with every vegetation index in the figure (p < 0.001). The data in Fig. 3 show that the R_Blue band has the lowest coefficient of determination, indicating a significant relationship between R_Blue and the prevalence of cotton aphids, and that the vegetation indices GNDVI and ARI can indicate aphid occurrence. Based on the initial values of the vegetation indices and UAV multispectral reflectance data, the 12 vegetation indices and five reflectance bands were found to reflect cotton aphid pest incidence well (p < 0.001), and these indicators were used as the candidate factors for subsequent analysis.
As shown in Fig. 4, the aphid data, UAV multispectral reflectance data, and vegetation indices are randomly distributed; therefore, we used XGBoost to compute error values and rank the importance of the original feature variables. The higher the importance score, the more significant the feature [49]. Fig. 4 shows that different indicators characterize aphids differently, with ARI having the highest importance score among the 19 indicators, owing to the aphids' ability to drain nutrients from leaves, which results in abnormally high anthocyanin concentrations [39]. Because selecting too many or too few features affects the model's accuracy and stability, we tested various values to find the optimal number of features. Table 8 presents the feature importance assessment combined with MAE as the evaluation index, showing that MAE worsens when N > 12. This study found that the best feature combination was ARI, GBI, GLI, TCARI, R_NIR, RRI, MSR_re, MSAVI_SAVI, ARVI, GNDVI, and SAVI. Table 8 presents the results of each model, including the MSE and MAE of the cotton aphid prediction models combined with the feature selection methods. According to Table 8, XGBoost-SVR reduces MSE and MAE by 0.078% and 0.35%, respectively, compared with a single SVR, and model efficiency is improved by reducing the number of features from 20 to 12. Thus, XGBoost can increase the efficiency of the algorithm by reducing the amount of data supplied to SVR, and it can lessen the influence of irrelevant data on SVR accuracy.
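Choosing the feature count N can be sketched as a forward sweep over the importance ranking: keep the top-N features, evaluate the model's MAE, and retain the N with the lowest MAE. In practice, the scores would come from a fitted XGBoost model (for example, the feature_importances_ attribute of xgboost.XGBRegressor) and evaluate_mae would train and score the SVR; both are hypothetical stand-ins here, and the scores below are not the values in Fig. 4.

```python
from typing import Callable, Dict, List, Tuple

def select_top_n(importance: Dict[str, float],
                 evaluate_mae: Callable[[List[str]], float],
                 max_n: int) -> Tuple[List[str], float]:
    """Return the top-N feature subset (by importance) with the lowest MAE."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_features: List[str] = []
    best_mae = float("inf")
    for n in range(1, min(max_n, len(ranked)) + 1):
        candidate = ranked[:n]
        score = evaluate_mae(candidate)
        if score < best_mae:  # strict '<' keeps the smaller subset on ties
            best_features, best_mae = candidate, score
    return best_features, best_mae

# Hypothetical importance scores and a toy MAE curve that is lowest at N = 3.
scores = {"ARI": 0.30, "GBI": 0.20, "GLI": 0.15, "R_Blue": 0.05, "R_NIR": 0.02}

def toy_mae(features: List[str]) -> float:
    return abs(len(features) - 3) + 1.0

print(select_top_n(scores, toy_mae, max_n=5))  # (['ARI', 'GBI', 'GLI'], 1.0)
```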

B. MODEL ANALYSIS AND VALIDATION
Although XGBoost can improve SVR efficiency to some extent, it cannot solve the problem of insufficient SVR accuracy, which requires an optimization technique. GWO is a heuristic optimization algorithm inspired by the hunting behavior of grey wolves; it has good convergence, few parameters, and a simple implementation [47]. GWO was used to optimize the penalty coefficient C and tolerance coefficient ϵ of SVR, improving the prediction accuracy and speed of SVR by selecting globally optimal parameters over a total of 50 rounds with 10 rounds each; the search ranges of C and ϵ were (0, 1000] and (0, 10], respectively. The implementation results of XGBoost-GWO-SVR are given in Table 8, and the optimal parameter values obtained with XGBoost-GWO-SVR are listed in Table 8. The data analysis and algorithm programming described above were conducted in Python 3.7. Table 8 presents the outcomes of each model, including the performance indices of the proposed XGBoost-GWO-SVR model and the six comparison models. The cotton aphid prediction model combining the XGBoost algorithm, the GWO algorithm, and the SVR method had the lowest MSE and MAE and the highest R². These findings show that using XGBoost-screened features as input variables to the GWO-SVR method can successfully forecast the incidence of cotton aphids, whereas the SVR prediction model had the lowest accuracy and stability, and BPNN had the worst MAE. Compared with LR, the combined XGBoost-GWO-SVR prediction model reduced MSE and MAE by 12.06% and 58.93%, respectively, and improved R² by 0.3%. Compared with BPNN, it reduced MSE and MAE by 84.76% and 89.21%, respectively, and increased R² by 12.5%.
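A minimal pure-Python sketch of the basic grey wolf optimizer update is shown below; it is not the authors' code. The three best positions found so far (alpha, beta, and delta) guide every wolf, and the coefficient a shrinks linearly from 2 toward 0, shifting the search from exploration to exploitation. We read "50 rounds with 10 rounds each" as 50 iterations with 10 wolves, which is an assumption, and replace the open lower bounds of (0, 1000] and (0, 10] with a small positive value. In the real pipeline the objective would return the cross-validated MSE of an RBF-kernel SVR trained with the candidate (C, ϵ), for example scikit-learn's SVR(kernel="rbf", C=..., epsilon=...); the quadratic toy objective is a hypothetical stand-in so the sketch runs without third-party packages.

```python
import random
from typing import Callable, List, Sequence, Tuple

def gwo_minimize(f: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_wolves: int = 10, n_iter: int = 50,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over a box with the basic grey wolf optimizer update rules."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    # Leaders: the three best (fitness, position) pairs found so far.
    leaders = sorted(((f(w), w[:]) for w in wolves), key=lambda p: p[0])[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                guided = []
                for _, leader in leaders:
                    A_k = 2.0 * a * rng.random() - a
                    C_k = 2.0 * rng.random()
                    guided.append(leader[d] - A_k * abs(C_k * leader[d] - w[d]))
                # New coordinate: mean of the three leader-guided moves, clipped to the range.
                w[d] = min(max(sum(guided) / len(guided), lo), hi)
        # Keep the best three seen so far, so the best value never gets worse.
        pool = leaders + [(f(w), w[:]) for w in wolves]
        leaders = sorted(pool, key=lambda p: p[0])[:3]
    best_value, best_position = leaders[0]
    return best_position, best_value

# Hypothetical stand-in for "cross-validated MSE of SVR(C, epsilon)", minimal at C = 250, epsilon = 2.
def toy_objective(p: List[float]) -> float:
    c, eps = p
    return ((c - 250.0) / 1000.0) ** 2 + ((eps - 2.0) / 10.0) ** 2

best, value = gwo_minimize(toy_objective, [(1e-3, 1000.0), (1e-3, 10.0)])
print(best, value)
```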
Moreover, for the same variables, the XGBoost-GWO-SVR model had the highest accuracy, the LR model the second highest, and the BPNN model the lowest. Therefore, the strategy used to construct a prediction model substantially affects the accuracy of cotton aphid prediction. Unlike LR and BPNN, XGBoost-GWO-SVR uses a radial basis kernel function that handles the small sample size, randomness, and nonlinearity of cotton aphid monitoring data. Furthermore, optimizing the penalty coefficient C and tolerance coefficient ϵ with GWO significantly improved the accuracy of the prediction model. Overall, the XGBoost-GWO-SVR model outperformed the cotton aphid prediction models developed with the LR, BPNN, SVR, XGBoost-SVR, XGBoost-PSO-SVR, and XGBoost-GA-SVR algorithms. This work also compared the performance of GWO, GA, and PSO in optimizing the SVR cotton aphid prediction algorithm using XGBoost-filtered features. All three optimization algorithms significantly improve model accuracy and efficiency compared with SVR and XGBoost-SVR. The R² of the models obtained with all three optimization algorithms reached 0.980, while their MSE and MAE were significantly reduced, indicating that the optimization algorithms can improve the accuracy of SVR. Compared with XGBoost-SVR, the XGBoost-GWO-SVR model reduced MSE and MAE by 90.18% and 70.25%, respectively, and improved R² by 22.5%. Table 8 shows that GWO achieved lower MSE and MAE than the other two optimization procedures, indicating that GWO is more suitable for optimizing the SVR cotton aphid prediction model. Meanwhile, Fig. 6 shows that GA is less stable than GWO and PSO.
To confirm the performance of the model described in this study, the combined XGBoost-GWO-SVR aphid population prediction model was compared with LR, BPNN, SVR, XGBoost-SVR, XGBoost-GA-SVR, and XGBoost-PSO-SVR. A cross-validation procedure was used to evaluate the precision and robustness of the prediction models: the cotton aphid samples were split into ten mutually exclusive subsets of similar size, with seven subsets used for training and three for testing. The performance of each cotton aphid prediction model was then evaluated using MSE, MAE, and R². Fig. 7 shows the experimental results: the XGBoost-GWO-SVR model is more accurate than the LR and BPNN strategies, SVR and XGBoost-SVR have the lowest accuracy, and the XGBoost-GWO-SVR curve lies closest to the true values. Therefore, the XGBoost-GWO-SVR model has an excellent prediction effect and can reliably estimate the aphid population from the given parameters.
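The subset-based validation described in this paragraph can be sketched as follows, assuming the 288 sample points are shuffled once with a fixed seed and dealt into ten disjoint subsets; three subsets (about 30% of the samples) form the test set and the other seven the training set. The function name and seed are hypothetical. Rotating which subsets serve as the test set would give repeated estimates; the text does not say how many such repetitions were used.

```python
import random
from typing import List, Sequence, Tuple

def subset_split(n_samples: int, test_subsets: Sequence[int] = (0, 1, 2),
                 n_subsets: int = 10, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Deal shuffled sample indices into n_subsets disjoint subsets and split train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    subsets = [indices[k::n_subsets] for k in range(n_subsets)]  # sizes differ by at most 1
    test_ids = set(test_subsets)
    test = sorted(i for k in test_ids for i in subsets[k])
    train = sorted(i for k in range(n_subsets) if k not in test_ids for i in subsets[k])
    return train, test

train_idx, test_idx = subset_split(288)
print(len(train_idx), len(test_idx))  # 201 87
```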
In conclusion, we found that the improved XGBoost-GWO-SVR approach may overcome shortcomings of the standard support vector regression methodology, such as overfitting, low accuracy, and slow response.
The experimental findings show that the 11 index factors screened by XGBoost improve SVR efficiency to some extent but cannot solve the problem of insufficient SVR accuracy. Optimization procedures are therefore required to increase the accuracy and robustness of the SVR model. Among the three optimization algorithms used in this study, GWO has the best stability and GA the poorest. Compared with the other prediction models, the XGBoost-GWO-SVR model has excellent stability and high accuracy, and it can predict the dynamic distribution of cotton aphid populations in near real time from UAV multispectral data.

IV. DISCUSSION
The goal of this research is to develop a generic technique for predicting cotton aphid infestation that is independent of ground surveys, exploiting the advantages of remote sensing for monitoring broad regions while reducing the human and financial investment in ground surveys. In this study, several improved vegetation indices are proposed based on UAV multispectral data and the characteristics of cotton aphids. First, suitable remote sensing index features are screened using feature screening methods; next, the SVR model is algorithmically optimized; finally, the XGBoost-GWO-SVR model for cotton aphid prediction is obtained. The experimental results show that the 11 index factors screened by XGBoost improve the efficiency of SVR to a certain extent, but they cannot solve the problem of insufficient SVR accuracy. Optimization algorithms can significantly increase the accuracy and robustness of the SVR model; among the three optimization algorithms used in this study, GWO and PSO have similar stability, whereas GA is less stable. Compared with LR and BPNN, the proposed XGBoost-GWO-SVR model has strong stability and high accuracy, so it is more suitable for predicting small-sample, nonlinear data such as cotton aphid infestation. The XGBoost-GWO-SVR cotton aphid prediction model, which combines the benefits of XGBoost, GWO, and SVR, has strong predictive ability and may be used in cotton field production management.
Most of the vegetation indices used in this study are broadband vegetation indices, which are less constrained by atmospheric conditions and sensor payloads than narrowband vegetation indices and can be obtained from multispectral aerial or satellite imagery. However, the small-scale UAV multispectral data used in this study cover only the visible to NIR wavelength range and omit the SWIR band, which makes identifying water-stress signs in cotton difficult. Vegetation indices that include the short-wave infrared band could be considered in future studies to obtain better results. Furthermore, the applicability of large-scale remote sensing data and the indices derived from them to the XGBoost-GWO-SVR model should be investigated in future research.

V. CONCLUSION
In this work, an XGBoost-GWO-SVR model based on UAV multispectral data was implemented to predict the occurrence level of cotton aphids. Vegetation indicators were selected with the XGBoost feature selection algorithm, and the GWO and SVR algorithms were combined to build the predictive model. The following conclusions can be drawn.
1) Models built on features selected by XGBoost are more precise than models built without feature selection, demonstrating that incorporating XGBoost improves pest prediction.
2) With XGBoost used for feature selection, the GWO-SVR model achieved an MSE, MAE, and R² of 196.567, 2.838, and 0.980, respectively, for cotton aphid prediction, outperforming SVR, LR, BPNN, XGBoost-SVR, XGBoost-GA-SVR, and XGBoost-PSO-SVR. This shows that combining the GWO-SVR model with the XGBoost algorithm effectively enhances the prediction accuracy for cotton aphids and can provide a methodological and technical reference for early cotton aphid control. Furthermore, among the three SVR optimization approaches, the GWO-optimized prediction model has the highest accuracy and the GA-optimized model the lowest. The findings show that the XGBoost-GWO-SVR model has excellent accuracy and performance, which can be useful in aphid prediction tasks that require high precision and efficiency, as well as in cotton pest control.
CONG ZHANG was born in Zhangjiakou, Hebei, China, in 1995. He received the Bachelor of Engineering degree in information management and information systems from the Taiyuan University of Science and Technology in 2018. He is currently pursuing the master's degree in agricultural resources and environment with Xinjiang Agricultural University. His research interests include agricultural ecology, environmental protection, and remote sensing.
YAN LI was born in Liulang, Yunnan, China, in 1987. She received the M.Sc. degree from Guangzhou University in 2011. She is currently pursuing the Ph.D. degree with the Kunming University of Science and Technology. Her main research interests include photogrammetry and remote sensing.
SHUANGYIN LIU received the Ph.D. degree from the College of Information and Electrical Engineering, China Agricultural University, in 2014. He is currently a Professor with the College of Information Science and Technology, Zhongkai University of Agriculture and Engineering. His current research interests include intelligent information systems for agriculture, artificial intelligence, big data, software engineering, and computational intelligence.