Metaheuristic Optimization Algorithms Hybridized With Artificial Intelligence Model for Soil Temperature Prediction: Novel Model

An enhanced hybrid artificial intelligence model was developed for soil temperature (ST) prediction. Among soil characteristics, soil temperature is one of the essential elements affecting the biological, physical and chemical processes of terrestrial ecosystems. Reliable ST prediction is significant for multiple geoscience and agricultural applications. The proposed model hybridizes the adaptive neuro-fuzzy inference system with an optimization method that combines a mutation-based Salp Swarm Algorithm and the Grasshopper Optimization Algorithm (ANFIS-mSG). Daily weather and soil temperature data for nine years (1 January 2010 to 31 December 2018) from five meteorological stations (Baker, Beach, Cando, Crary and Fingal) in North Dakota, USA, were used for modeling. For validation, the proposed ANFIS-mSG model was compared with seven models: the classical ANFIS and ANFIS hybridized with the grasshopper optimization algorithm (ANFIS-GOA), salp swarm algorithm (ANFIS-SSA), grey wolf optimizer (ANFIS-GWO), particle swarm optimization (ANFIS-PSO), genetic algorithm (ANFIS-GA) and dragonfly algorithm (ANFIS-DA). ST prediction was based on maximum, mean and minimum air temperature (AT). The modeling results demonstrated the capability of optimization algorithms for building ANFIS models that simulate soil temperature. In the statistical evaluation, for instance, the root mean square error (RMSE) during the testing phase was reduced by 73%, 74.4%, 71.2%, 76.7% and 80.7% at the Baker, Beach, Cando, Crary and Fingal meteorological stations, respectively, when ANFIS-mSG was used instead of the standalone ANFIS model. In conclusion, ANFIS-mSG proved to be an effective and simple hybrid artificial intelligence model for predicting soil temperature under a univariate air temperature scenario.

thermal characteristics, govern ST [4], [5]. Attempts have focused on finding the relation between soil temperature and other variables for prediction of ST [6].
The relationship between AT and ST is important for determining the soil's heat regime. In some regions ST is not measured, so the AT-ST relationship can be used to construct a predictive model. Direct measurement of ST is labour-intensive and logistically expensive; soft computing models that can predict ST for any future time horizon without such measurements therefore offer a significant advantage for soil health and agriculture. Given the predictive performance of the proposed hybrid intelligence model under the univariate air temperature scenario, decision-makers can rely on inexpensive data-driven approaches to better understand soil temperature dynamics and guide their practices.
About 54% of the total energy from the sun is reflected by the earth, while the remaining 46% is retained [7]. The ground retains this heat because of its slow heat transport, its high heat storage capacity, and its slow temperature changes over long periods, which depend on the measurement depth. Owing to its low thermal conductivity, the soil retains heat during the heating period and releases it during the cooling period. The heat retained in the soil during the hot season is released back to the air during the cold season, and this heat exchange continues throughout the year, making the soil a good thermal reservoir [8]. In winter, ST is higher than the ambient air temperature, whereas the reverse holds in summer. ST is therefore an important meteorological parameter for different applications, including solar energy, frost prediction, agriculture, and ground source heat pumps [9].
Several recent studies on short- to mid-term ST prediction fall into two categories [6], [10]. The first category uses statistical techniques, such as numerical weather prediction methods, which assume that the statistical properties of the ST data series will remain similar to those of the past [11]. For long-term prediction, these models require large amounts of data, which are usually not available [12]-[14]. The second category uses artificial intelligence (AI) models, such as support vector machine (SVM), artificial neural network (ANN), genetic programming (GP), gene expression programming (GEP), adaptive neuro-fuzzy inference system (ANFIS), decision tree (DT) and M5 Tree [10], [15]-[22]. Many studies have modeled ST as a nonlinear physical phenomenon [19], [20], [23]-[26].
One of the earliest studies used linear regression (LR) and ANN models to predict soil temperature from several hydrometeorological variables, including AT, atmospheric pressure, solar radiation, wind speed, relative humidity and sunshine, at Adana, Turkey [7]; the ANN model showed better predictive performance. Xing et al. (2018) predicted daily ST for different seasons in different climatic zones in the United States using an SVM model with AT and solar radiation as predictors, and the results demonstrated the capacity of the SVM model. Samadianfard et al. (2018) integrated wavelet transformation with ANN and GEP models for ST prediction at different depths in Tabriz, Iran, using sunshine, radiation and AT as predictors. Sanikhani et al. (2018) used the extreme learning machine (ELM) as a non-tuned predictive model for simulating ST at different soil depths at Mersin and Adana, Turkey, and validated it against ANN and M5 Tree models. Different climate variables, including AT, solar radiation, relative humidity and wind speed, were used as predictors.
Soil temperature is a stochastic variable of great importance in various morphological and engineering fields [29]. Various AI models have been developed without tuning their internal parameters [15]. Selecting the best model parameters with an optimization tool ensures better model prediction [30]. Hybridized AI models have become one of the most successful nonlinear time series prediction methods in signal and time-series analysis [31], [32]. Many studies have applied hybrid AI models, but few have addressed ST at both the surface and subsurface levels. The major aim of this study is to develop a univariate machine learning (ML) model for ST prediction using correlated independent variables.
Among ML models, ANFIS has proven to be an excellent predictive model for diverse engineering applications [33]-[39]. The literature shows that ANFIS is a reliable intelligent model for simulating soil temperature, owing to its ability to account for data uncertainty [40]. However, the model suffers from the problem of optimizing its internal membership function parameters [41]. Hence, in this study the ANFIS model is hybridized with two bio-inspired optimization algorithms, the Grasshopper Optimization Algorithm (GOA) and the Salp Swarm Algorithm (SSA), and several standard hybrid models are developed as benchmarks for the proposed ANFIS-mSG model. The SSA has shown promising results on a variety of optimization problems [42]-[45] because of advantages such as low computational cost, ease of implementation and few parameters to tune. However, like other metaheuristic methods, it has limitations, including slow convergence and weak exploitation, and it can get trapped in local optima. The proposed method therefore adds a mutation strategy to the SSA to prevent entrapment in local minima, evolutionary stagnation and premature convergence. A mutation strategy can guide a metaheuristic's population toward the global optimum rather than local optima, and it increases search diversity, which can speed up convergence. Hence, the first phase of the proposed method improves the original SSA, and this improved SSA is then used as a local search for the GOA algorithm.
However, GOA, a recent metaheuristic algorithm, has drawbacks, including high computation time and premature convergence [46], [47]. Therefore, the second phase of the proposed method improves the exploration and exploitation of the original GOA using the first phase (i.e., the mutation SSA phase). These two phases are referred to in this manuscript as mutation-SSA-GOA (mSG).
The final phase of the proposed method applies mSG to train the ANFIS model in order to improve the predictive performance of the original ANFIS. The entire proposed method is called ANFIS-mSG. Thus, this study hybridizes the ANFIS model with a nature-inspired optimization algorithm (ANFIS-mSG) to develop a hybrid ST prediction model.
The literature shows that hybrid models are receiving attention, as evidenced by the number of studies integrating nature-inspired optimization algorithms with standalone ML models to solve hyperparameter tuning problems. So far, limited attention has been given to the use of metaheuristic optimization algorithms with the ANFIS model for ST modeling. Therefore, this study implements a suite of standalone and hybrid ANFIS models for ST modeling at different meteorological stations in North Dakota, United States. The study uses a univariate modeling scheme, as only AT variability is used to model ST. Such a modeling process based on univariate weather information is essential for regions where climatological or geoscience information is limited. In addition, an intelligent predictive model that needs fewer input variables to predict soil temperature is important for multiple geoscience applications and contributes to basic knowledge of various aspects of soil engineering. The proposed ANFIS-mSG was validated against ANFIS and ANFIS hybridized with the grasshopper optimization algorithm (ANFIS-GOA), salp swarm algorithm (ANFIS-SSA), grey wolf optimizer (ANFIS-GWO), particle swarm optimization (ANFIS-PSO), genetic algorithm (ANFIS-GA) and dragonfly algorithm (ANFIS-DA). The main contribution of this research is the newly developed hybrid ANFIS model for ST prediction, which is important for soil health, agriculture and the ecosystem.

II. CASE STUDY AND DATA EXPLANATION
North Dakota (ND), located in the center of North America, has a climate with cold winters and hot summers. Its climate is characterized by large temperature variations, which produce distinct weather conditions in each of the four seasons. The eastern and western parts of ND have different climates: according to the Köppen-Geiger classification system, the eastern part has a humid continental climate, while the western part has a semi-arid climate [48]. As noted by the U.S. Global Change Research Program (2000), the average temperature in ND has increased by about 3 °C. Figure 1 presents the locations of the stations. Figure 2 illustrates the variations of soil temperature at 10 cm depth over 2010-2018 for all five stations (Baker, Beach, Cando, Crary and Fingal), and Table 1 gives the statistical summary for all meteorological stations. For Crary station, the soil temperature variations over the nine consecutive years were persistent and more extreme than at …

A. ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
ANFIS, developed by Jang [49], combines fuzzy logic and neural networks, drawing on the advantages of both. ANFIS applies a Takagi-Sugeno inference method that generates a nonlinear mapping from the input domain to the output domain through fuzzy IF-THEN rules. It uses five layers to perform its tasks. Figure 3a summarizes these layers, and the following steps explain their working sequence.
Layer 1 passes the inputs x and y to the nodes, which compute their outputs using the generalized Gaussian membership function µ [50]:
$$O_{1,i} = \mu_{A_i}(x), \qquad O_{1,i+2} = \mu_{B_i}(y), \qquad i = 1, 2$$
$$\mu(x) = \exp\left[-\left(\frac{x - \rho_i}{\sigma_i}\right)^{2}\right]$$
where $A_i$ and $B_i$ denote the fuzzy sets whose membership values are given by µ, and $\sigma_i$ and $\rho_i$ denote the parameter set. Layer 2 applies Eq. 3 to calculate the output of each node (the firing strength of a rule):
$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y)$$
The results are then normalized in Layer 3 by Eq. 4:
$$\bar{w}_i = \frac{w_i}{\sum_{k} w_k}$$
In Layer 4, the adaptive nodes are computed by Eq. 5:
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i)$$
where $r_i$, $q_i$ and $p_i$ denote the parameters of the i-th node. Layer 5 computes the output as:
$$O_5 = \sum_{i} \bar{w}_i f_i$$
In several cases, the search domain of the ANFIS model becomes wider but convergence becomes slower, so the model can get trapped in local optima [51]. Consequently, training the weights of the ANFIS model with an optimizer is a valuable step to overcome this problem.
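The five layers can be traced in a short forward pass for a first-order Sugeno ANFIS with two inputs and two rules. This is a minimal sketch, not the authors' implementation: it assumes an ordinary Gaussian membership function exp(-((x - ρ)/σ)²), pairs rule i with the sets A_i and B_i, and uses hypothetical names (`gaussian_mf`, `anfis_forward`, the `premise` dictionary keys).

```python
import numpy as np


def gaussian_mf(x, rho, sigma):
    """Gaussian membership degree with centre rho and width sigma (assumed MF shape)."""
    return np.exp(-((x - rho) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise:    dict with arrays 'rho_A', 'sigma_A', 'rho_B', 'sigma_B' (one entry per rule)
    consequent: array of shape (n_rules, 3) holding (p_i, q_i, r_i) for each rule
    """
    mu_A = gaussian_mf(x, premise["rho_A"], premise["sigma_A"])  # Layer 1, input x
    mu_B = gaussian_mf(y, premise["rho_B"], premise["sigma_B"])  # Layer 1, input y
    w = mu_A * mu_B                                              # Layer 2: firing strengths
    w_bar = w / w.sum()                                          # Layer 3: normalisation
    p, q, r = consequent[:, 0], consequent[:, 1], consequent[:, 2]
    f = p * x + q * y + r                                        # rule outputs f_i
    return float(np.sum(w_bar * f))                              # Layers 4 and 5
```

The premise parameters (ρ, σ) shape the fuzzy partition, while the consequent parameters (p, q, r) enter the output linearly, which is why they are a natural target for an external optimizer.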

B. SALP SWARM ALGORITHM (SSA)
Salp swarm algorithm (SSA), proposed by [42], is an optimization technique that mimics the behaviour of salp chains in nature. This is a swarm behaviour whose target is a food source; salps use it to forage and to move rapidly in a coordinated way [52]. SSA models this behaviour mathematically for use in computation. After the initial population of SSA is generated, it is divided into two groups: the salp at the front of the chain is the leader, and the rest are followers. The search space of a given problem is represented in n dimensions, where n is the number of variables. The leader's position is updated using the following equation:
$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$
where $x_j^1$, $ub_j$ and $lb_j$ denote the position, upper bound and lower bound in the j-th dimension, respectively; $F_j$ denotes the food source; $c_2$ and $c_3$ are random numbers in the range [0, 1]; and the coefficient $c_1$ balances the exploration and exploitation stages and is computed by the following equation:
$$c_1 = 2\,e^{-\left(4p/P\right)^{2}}$$
where P and p denote the maximum number of loops and the current loop, respectively.
VOLUME 8, 2020
The followers' positions are updated by the following equation:
$$x_j^i = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$
where i > 1 and $x_j^i$ denotes the position of the i-th follower in the j-th dimension.
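One SSA iteration, leader move plus follower chain, can be sketched as below. This is an illustrative sketch under stated assumptions: row 0 of the population is taken as the leader, the branch threshold on c3 is 0.5, followers are updated sequentially using the already-moved salp ahead of them, and out-of-range positions are clipped to the bounds; the function names are hypothetical.

```python
import numpy as np


def ssa_c1(p, P):
    """c1 = 2 * exp(-(4p/P)^2): large early (exploration), small late (exploitation)."""
    return 2.0 * np.exp(-((4.0 * p / P) ** 2))


def ssa_step(X, food, lb, ub, p, P, rng):
    """One SSA position update. Row 0 is the leader; rows 1..N-1 are followers."""
    X = X.copy()
    n_dim = X.shape[1]
    c1 = ssa_c1(p, P)
    c2 = rng.random(n_dim)  # uniform in [0, 1)
    c3 = rng.random(n_dim)
    step = c1 * ((ub - lb) * c2 + lb)
    # Leader moves around the food source F (best solution found so far)
    X[0] = np.where(c3 >= 0.5, food + step, food - step)
    # Each follower moves to the midpoint between itself and the salp ahead of it
    for i in range(1, X.shape[0]):
        X[i] = 0.5 * (X[i] + X[i - 1])
    # Assumption: positions outside the search domain are clipped back to the bounds
    return np.clip(X, lb, ub)
```

Because c1 shrinks quickly with p/P, the leader's steps around F become small late in the run, which is the exploitation behaviour the text attributes to c1.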

C. GRASSHOPPER OPTIMIZATION ALGORITHM (GOA)
GOA is an optimization technique that mimics the behaviour of grasshopper swarms in nature. The grasshopper is a pest that affects agriculture and crop production. Its life cycle includes three stages: egg, nymph and adulthood [54].
In the nymph stage, the grasshopper moves slowly, like a rolling cylinder, in small jumps, eating the vegetation in its path. In the adulthood stage, it uses swarm behaviour to migrate over long distances with long-range, abrupt movements. This behaviour can be expressed mathematically as follows [47]:
$$x_i = S_i + G_i + A_i$$
where $x_i$ is the position of the i-th grasshopper and $S_i$ is its social interaction, defined as:
$$S_i = \sum_{j=1,\, j\neq i}^{N} s(d_{ij})\,\hat{d}_{ij}$$
where $d_{ij} = |x_j - x_i|$ denotes the distance between the i-th and j-th grasshoppers, $\hat{d}_{ij} = (x_j - x_i)/d_{ij}$ is the unit vector from the i-th to the j-th grasshopper, and s denotes the strength of the social forces, defined as:
$$s(r) = f\,e^{-r/l} - e^{-r}$$
where l and f denote the attractive length scale and the intensity of attraction, respectively. Wind direction is a main factor in the movement of nymph grasshoppers because they have no wings.
In Eq. (13), $A_i$ and $G_i$ denote the wind advection and the gravity force acting on the i-th grasshopper, respectively:
$$G_i = -g\,\hat{e}_g, \qquad A_i = u\,\hat{e}_w$$
where u and g denote a constant drift and the gravitational constant, respectively, and $\hat{e}_w$ and $\hat{e}_g$ denote the unit vectors in the direction of the wind and towards the earth's center, respectively.
Taking these factors into account, Eq. (13) is rewritten as Eq. (14), which is more suitable for searching for the solution of a given problem:
$$x_i^d = c\left(\sum_{j=1,\, j\neq i}^{N} c\,\frac{ub_d - lb_d}{2}\, s\!\left(\left|x_j^d - x_i^d\right|\right)\frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d$$
where $ub_d$ and $lb_d$ denote the upper and lower bounds of the search domain in the d-th dimension, respectively; $\hat{T}_d$ denotes the current best solution in the d-th dimension; N is the population size; D denotes the dimension of the problem; and c is a decreasing coefficient that balances the exploration and exploitation phases, calculated by the following equation:
$$c = c_{max} - t\,\frac{c_{max} - c_{min}}{t_{max}}$$
where $c_{max}$ and $c_{min}$ are the maximum and minimum values, namely 1 and 0.0001, respectively; t denotes the current iteration; and $t_{max}$ is the maximum number of iterations.
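The GOA update of Eq. (14) can be sketched directly. In this minimal version the values f = 0.5 and l = 1.5 are assumed defaults (the settings actually used are listed in Table 2), gravity and wind are dropped as in Eq. (14), the distance normalisation used in some GOA implementations is omitted, and positions are clipped to the bounds; all names are hypothetical.

```python
import numpy as np


def s_func(r, f=0.5, l=1.5):
    """Social force strength s(r) = f*exp(-r/l) - exp(-r) (f, l are assumed defaults)."""
    return f * np.exp(-r / l) - np.exp(-r)


def goa_c(t, t_max, c_max=1.0, c_min=1e-4):
    """Coefficient c, decreasing linearly from c_max to c_min over the run."""
    return c_max - t * (c_max - c_min) / t_max


def goa_step(X, target, lb, ub, t, t_max):
    """One GOA position update in the form of Eq. (14)."""
    n_pop, n_dim = X.shape
    c = goa_c(t, t_max)
    X_new = np.empty_like(X)
    for i in range(n_pop):
        social = np.zeros(n_dim)
        for j in range(n_pop):
            if j == i:
                continue
            diff = X[j] - X[i]
            d_ij = np.linalg.norm(diff)
            unit = diff / (d_ij + 1e-12)  # (x_j - x_i) / d_ij, guarded against d_ij = 0
            social += c * (ub - lb) / 2.0 * s_func(np.abs(diff)) * unit
        X_new[i] = c * social + target    # move relative to the best solution T_d
    # Assumption: positions outside the search domain are clipped back to the bounds
    return np.clip(X_new, lb, ub)
```

The double loop makes each iteration O(N²) in the population size, which is consistent with the comparatively long run times the results section reports for ANFIS-GOA.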

D. PROPOSED METHOD (ANFIS-mSG)
The proposed method contains two improvement stages. The first stage improves the basic SSA with a mutation phase to enhance its exploration. The improved SSA is then used as a local search for the GOA algorithm; this stage produces the mSG algorithm. The second stage uses the mSG algorithm to improve and train the basic ANFIS model. These stages are explained in the following subsections.

1) FIRST STAGE (mSG)
In this stage, the basic SSA is improved by adding a mutation phase to its structure. This phase generates a mutation vector $x_{mu,i}$ as in the following equation:
$$x_{mu,i} = x_q + \delta\,(x_w - x_r)$$
where i = 1, 2, ..., N; $x_q$, $x_w$ and $x_r$ are randomly selected members of the population; and δ ∈ [0, 2] is a constant that controls the differential variation. The vector $x_{mu,i}$ is then evaluated by the fitness function; if its fitness value is better than that of the original vector $x_i$, it replaces $x_i$; otherwise, $x_i$ and its fitness value are retained. The improved SSA is then used as a local search for the basic GOA to help explore more regions of the search space.
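The mutation phase with greedy replacement can be sketched as follows. The assumptions are that the three donors q, w, r are distinct and differ from i (so N ≥ 4), that the donors are taken from the population as it was before this pass, and that fitness is minimised (as with MSE); `mutate_and_select` is a hypothetical name.

```python
import numpy as np


def mutate_and_select(X, fitness, fit_fn, delta, rng):
    """Mutation x_mu = x_q + delta * (x_w - x_r) with greedy replacement (minimisation).

    Assumption: q, w, r are distinct and differ from i, so the population needs N >= 4.
    """
    n_pop = X.shape[0]
    X_out, fit_out = X.copy(), np.array(fitness, dtype=float)
    for i in range(n_pop):
        others = [k for k in range(n_pop) if k != i]
        q, w, r = rng.choice(others, size=3, replace=False)
        x_mu = X[q] + delta * (X[w] - X[r])
        f_mu = fit_fn(x_mu)
        if f_mu < fit_out[i]:  # keep the mutant only if it improves the fitness
            X_out[i], fit_out[i] = x_mu, f_mu
    return X_out, fit_out
```

Because a mutant is accepted only when it improves the fitness, the step can add diversity without ever worsening an individual solution.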

2) SECOND STAGE (ANFIS-mSG)
In this stage, the mSG algorithm is used to adjust the ANFIS parameters by passing the optimal weights between layers 4 and 5. ANFIS-mSG starts by receiving the predictors, splitting them into training and test sets, and setting all parameter values. The membership functions are determined using the fuzzy c-means method. Next, the mSG algorithm is applied to optimize and adapt the ANFIS weight values: mSG searches for the weights that give the best solution by exploring various regions of the search space. The candidate parameters are used to train the ANFIS, and their fitness values are evaluated by Eq. (17) to check whether they are better than the previous ones or whether the algorithm should continue searching:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - p_i\right)^{2}$$
where a denotes the actual values, p denotes the output (predicted) values, and n is the number of samples. MSE is a well-established fitness function in the literature [55], [56]. It measures how close the predicted results are to the actual values, with smaller values indicating better results. Because the errors are squared, MSE gives relatively high weight to large errors. It is also a simple and common metric for regression models and adds little computational cost. Therefore, the best parameters are those that give the smallest error between actual and predicted values. ANFIS-mSG runs until the maximum number of iterations, which serves as the stop criterion, is reached. The selected parameters are then fed to the ANFIS to produce the predicted results in the test phase. The entire process of the proposed ANFIS-mSG model is illustrated in Figure 3b. The values of the internal tuning parameters of each model are reported in Table 2.
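A fitness evaluation of one candidate can be sketched as follows. It assumes, as one reading of "passing the optimal weights between layers 4 and 5", that the optimizer tunes only the consequent parameters (p_i, q_i, r_i) while the normalised firing strengths from the fuzzy c-means-based membership functions are held fixed; the flat-vector encoding and the function names are hypothetical.

```python
import numpy as np


def mse(actual, predicted):
    """Eq. (17): mean of squared differences between actual a and predicted p."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean((a - p) ** 2))


def candidate_fitness(candidate, w_bar, X, target):
    """Fitness of one candidate vector of consequent parameters (lower is better).

    candidate: flat vector of length 3 * n_rules, read as (p_i, q_i, r_i) per rule
    w_bar:     normalised firing strengths, shape (n_samples, n_rules)
    X:         inputs, shape (n_samples, 2), columns x and y
    """
    theta = np.asarray(candidate, dtype=float).reshape(-1, 3)
    rule_out = X @ theta[:, :2].T + theta[:, 2]  # p_i*x + q_i*y + r_i, shape (n_samples, n_rules)
    pred = np.sum(w_bar * rule_out, axis=1)      # Layers 4 and 5
    return mse(target, pred)
```

Each metaheuristic individual is simply such a flat vector, so SSA, GOA and the mutation step can all operate on it without knowing the ANFIS structure.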

E. MODEL DEVELOPMENT
This study investigates the ability of the ANFIS-mSG algorithm to predict soil temperature. To this end, the experimental dataset is divided into two parts: the first is used to train the proposed method, and the second is used for the testing phase.
In the training phase, ANFIS-mSG begins by generating a random population x, where each $x_i$ (i = 1, 2, ..., N) represents a solution.
This population is updated by the improved SSA and the GOA, using a probability pb to switch between them as in Eq. (18), where $f_i$ is the current fitness value. If $pb_i$ > rand(), the improved SSA updates the solution; otherwise, the GOA is used.
The purpose of this probability is to overcome the limitations of the original GOA, such as high computation time, premature convergence and entrapment in local minima. Each solution is then evaluated using the fitness function, i.e., the prediction error calculated by Eq. (17). The result is used to check whether the current solution is better than the best solution so far; if so, it is saved for comparison in the next iteration. This sequence is repeated until the stop condition, the maximum number of iterations, is met. The best candidate parameters are then passed to the ANFIS model to start the testing phase.
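The switching loop can be sketched as a generic driver. Because Eq. (18) is not reproduced here, the form pb_i = f_i / Σ_j f_j is an assumption, as are the names `switch_probabilities` and `hybrid_search`; the two move functions are passed in so that, in the full method, `ssa_move` would be the mutation-enhanced SSA update and `goa_move` the Eq. (14) update.

```python
import numpy as np


def switch_probabilities(fitness):
    """Assumed form of Eq. (18): pb_i = f_i / sum_j f_j (tiny epsilon avoids 0/0)."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / (fitness.sum() + 1e-12)


def hybrid_search(fit_fn, lb, ub, n_pop, max_iter, ssa_move, goa_move, seed=0):
    """Skeleton of the mSG loop: each solution is moved by ssa_move or goa_move."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_pop, len(lb)))
    fit = np.array([fit_fn(x) for x in X])
    best_idx = int(np.argmin(fit))
    best_x, best_f = X[best_idx].copy(), fit[best_idx]
    for t in range(max_iter):                # stop criterion: maximum number of iterations
        pb = switch_probabilities(fit)
        for i in range(n_pop):
            if pb[i] > rng.random():         # improved SSA branch
                X[i] = ssa_move(X, i, best_x, t, max_iter, rng)
            else:                            # GOA branch
                X[i] = goa_move(X, i, best_x, t, max_iter, rng)
            X[i] = np.clip(X[i], lb, ub)
            fit[i] = fit_fn(X[i])
            if fit[i] < best_f:              # keep the best solution found so far
                best_x, best_f = X[i].copy(), fit[i]
    return best_x, best_f
```

Under this assumed form, solutions with a larger error get a higher chance of being handed to the improved SSA, while the rest follow the GOA update.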
In the testing phase, ANFIS-mSG receives the test data to evaluate the candidate parameters of ANFIS. The output of this phase is evaluated using the seven performance measures described below.

F. PERFORMANCE METRICS
The seven performance measures used to evaluate the proposed predictive models [57] are presented below.
Root mean square error (RMSE):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a_i - p_i\right)^{2}}$$
where a denotes the actual values, p denotes the predicted values, and n is the total number of items.
Standard deviation (STD), where µ is the mean value of the RMSE.
Mean absolute error (MAE):
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|a_i - p_i\right|$$
Root mean squared relative error (RMSRE).
Average absolute percent relative error (AAPRE).
Coefficient of determination (R²).
Nash-Sutcliffe efficiency index (NSE):
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(a_i - p_i\right)^{2}}{\sum_{i=1}^{n}\left(a_i - \mu_a\right)^{2}}$$
where $\mu_a$ is the mean value of the a values.
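The metrics above can be computed in one helper. This is a sketch using common textbook forms: RMSRE and AAPRE are assumed to divide each error by the actual value, R² is assumed to be the squared Pearson correlation, and STD is omitted because its exact definition is not recoverable here; the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(actual, predicted):
    """Common forms of the error metrics; a = actual, p = predicted.

    RMSRE and AAPRE divide by a, so they are undefined when any actual value is 0.
    """
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "RMSRE": float(np.sqrt(np.mean((err / a) ** 2))),
        "AAPRE": float(100.0 * np.mean(np.abs(err / a))),   # percent
        "R2": float(np.corrcoef(a, p)[0, 1] ** 2),          # squared Pearson r (assumed form)
        "NSE": float(1.0 - np.sum(err ** 2) / np.sum((a - a.mean()) ** 2)),
    }
```

Since daily soil temperatures in North Dakota pass through 0 °C, the relative-error metrics can become very large or undefined on such days, which is worth keeping in mind when reading them alongside RMSE and MAE.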

IV. RESULTS AND ANALYSIS
The main motivation of this research was to investigate the feasibility of new hybrid intelligent models that integrate nature-inspired algorithms to optimize the internal parameters of the ANFIS model, and to use them to model soil temperature at five meteorological stations in North Dakota, USA. The models were trained using a univariate modeling procedure based on AT data, including maximum, mean and minimum values. The proposed hybrid predictive model and the competing models were evaluated using statistical performance indicators. Figure 4 shows two statistical performance metrics, MAE and RMSE, for the applied predictive models (ANFIS-mSG, ANFIS, ANFIS-DA, ANFIS-GA, ANFIS-GOA, ANFIS-GWO, ANFIS-PSO and ANFIS-SSA) at all five meteorological stations (Baker, Beach, Cando, Crary and Fingal). In terms of MAE and RMSE, the ANFIS-mSG model had the lowest values, showing the best performance at all stations except Fingal, where the proposed ANFIS-mSG and ANFIS-GA models were both at the top, with 81% and 82% improvement in RMSE and MAE, respectively, compared with ANFIS. The largest performance improvement was achieved by the proposed ANFIS-mSG (82% in RMSE and 79% in MAE) at Cando station, relative to the worst-performing model there (ANFIS-GOA). ANFIS-mSG had its lowest MAE and RMSE values at Crary, almost equal values at Baker and Fingal, and slightly higher values, its highest, at Beach and Cando. In contrast, the standalone ANFIS model had the highest MAE and RMSE values at all stations except Cando, where ANFIS-GOA had the highest values (see Tables 3-7). This may be explained by the soil and air temperatures at Cando being the most unstable among all stations throughout 2010-2018 (Fig. 2), a variability to which this model appears sensitive.
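The percentage improvements quoted above can be reproduced from two error values, assuming they are relative reductions with respect to the reference model's error, i.e., 100 × (baseline − model) / baseline. The helper name and the RMSE values in the usage line are hypothetical, not taken from the study.

```python
def percent_reduction(baseline_error, model_error):
    """Relative reduction (%) of an error metric versus a baseline model.

    Assumed definition: 100 * (baseline - model) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical RMSE values (not from the study), in degrees C
print(percent_reduction(4.0, 1.0))  # 75.0
```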
The second-best performing model was ANFIS-GA at all stations except Crary, where ANFIS-PSO scored better than ANFIS-GA on the same performance metrics. This can be attributed to the capacity of the genetic algorithm and particle swarm optimization as robust nature-inspired algorithms for tuning the internal parameters of AI models [50], [58]. In addition, it could reflect the extreme within-year ST variation, the largest among all meteorological stations, and the larger number of low AT observations over the years.
The second least-performing predictive model was ANFIS-GOA, followed by ANFIS-DA, according to the MAE and RMSE metrics. ANFIS-SSA ranked third, with higher MAE and RMSE values, and was closely matched by ANFIS-GWO, except at Beach station, where ANFIS-SSA had a higher RMSE and a nearly similar MAE. This difference is apparently due to the geographical location of Beach relative to the other meteorological stations (Fig. 1), which may cause differences in various meteorological parameters; in the case of AT, Beach exhibited low variability within a year. Figure 5a shows scatter plots of actual and predicted values for the training and testing phases of all models at Baker station. ANFIS-mSG and ANFIS-GA performed best (R² = 0.977) at Baker station, and these two models, followed by ANFIS-PSO, also led at all other stations. In contrast, ANFIS-GOA had the poorest statistical scores, followed by ANFIS-DA and ANFIS, whereas ANFIS-SSA and ANFIS-GWO remained at an average level.
ANFIS-GOA had the longest convergence time among all models, followed by ANFIS-mSG. The standalone ANFIS model required the least time, whereas the remaining models took intermediate times. Figure 5b shows the agreement between actual and predicted ST values for the training and testing phases of the eight models at Beach station. The best performance was achieved by ANFIS-mSG (R² = 0.967), followed by ANFIS-GA and ANFIS-PSO, despite the high variance in AT and the smaller number of negative ST values each year. ANFIS-SSA and ANFIS-GWO showed typical prediction performance. ANFIS-GOA and ANFIS had lower predictability but were better than ANFIS-DA, which performed worst of all, with R² = 0.722. The longest run time was recorded for ANFIS-GOA (T = 39.694 s), followed by ANFIS-mSG (T = 29.912 s), while ANFIS-PSO (T = 6.581 s) was the least time-consuming model. Figure 5c illustrates the training and testing phases of the proposed and competing models for Cando station. The highest agreement between observed and predicted values was achieved by ANFIS-mSG (R² = 0.967), almost the same as ANFIS-GA (R² = 0.966) and ANFIS-PSO (R² = 0.965). These models performed well even for negative AT values, which occurred more frequently, and most AT and ST values were close to the mean. ANFIS-GOA performed distinctly worst, with R² = 0.537 and T = 40.131 s. The remaining models (ANFIS-SSA, ANFIS, ANFIS-GWO and ANFIS-DA) performed at an average level. In terms of run time, ANFIS required the least, while ANFIS-GOA required the most, followed by ANFIS-mSG. Figure 5d shows the scatter plots of the training and testing phases of each model for Crary station. As at the other stations, ANFIS-mSG performed best (R² = 0.976).
Another observation was that all models performed well at Crary, with R² ranging from 0.953 (ANFIS) to 0.975 (ANFIS-GA). ST and AT, including their peak values, were almost stable throughout the years. The ANFIS model required the least time (T = 2.952 s), whereas ANFIS-GOA took the longest (T = 39.515 s), followed by ANFIS-mSG (T = 33.999 s). The remaining models did not differ significantly, with times in the range of 6.5-11.4 s. Figure 5e presents the scatter diagrams of the training and testing phases, showing the relationship between actual and predicted values at Fingal station. The ANFIS-mSG model showed superior predictability. At Fingal station, the least-performing model was again ANFIS-DA. Unlike at the other stations, ANFIS performed at an average level, comparable to ANFIS-SSA, ANFIS-GWO and ANFIS-GOA. Interestingly, the differences in run time among the models were noticeable. As at the other stations, ANFIS-GOA required the most time, followed by ANFIS-mSG, whereas ANFIS required the least (2.824 s) and the remaining models took 6-10 s. Figure 6 presents the distribution of the relative errors of all models in the testing phase for all stations. The boxplots show a consistent interquartile range (IQR) for ANFIS-mSG. In contrast, ANFIS-DA and ANFIS had unstable relative errors in the testing phase, with a notable number of outliers. For instance, at Baker station, the median of ANFIS lay towards the first quartile (25th percentile of relative error), with the most extreme minimum and maximum values, whereas for ANFIS-DA the median lay towards the third quartile (75th percentile of relative error). At this station, the boxplots of all models were positioned slightly higher than at the other stations.
The relative errors at Beach station showed no significant difference in IQR between ANFIS-mSG and ANFIS-GOA. In contrast, ANFIS-DA, ANFIS-SSA and ANFIS-GWO showed a clear gap, with a considerable number of outliers, and ANFIS-DA had the smallest IQR. The ANFIS model had the most extreme relative error, with part of its IQR above +1, the highest among all models; the same was observed at Crary and Fingal stations. There was no notable difference in IQR between ANFIS-PSO and ANFIS-GA, except for a higher minimum value for ANFIS-PSO.
Based on the relative errors in the testing phase for Cando station, the boxplots showed no significant IQR difference among ANFIS-mSG, ANFIS-GOA, ANFIS-PSO and ANFIS-GA, and their medians leaned slightly towards the first quartile (Figure 6c). However, for ANFIS-SSA, ANFIS-GWO and ANFIS-DA, the medians lay towards the third quartile, with outliers towards the minimum.
Figure 6e (Fingal station) shows a noteworthy gap between the IQR boxes of ANFIS-GWO, ANFIS-DA and ANFIS. ANFIS-DA had the largest number of outliers at both the minimum and maximum ends, followed by ANFIS-GWO, whose outliers were limited to the minimum error values. The medians of ANFIS and ANFIS-SSA leaned slightly towards the third quartile, while ANFIS had the largest IQR among all models, with the least relative error. There was no substantial difference in IQR among ANFIS-mSG, ANFIS-GOA, ANFIS-PSO and ANFIS-GA, and notably their medians lay exactly midway between the first and third quartiles.
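The boxplot quantities discussed above (quartiles, median, IQR, outliers) can be computed from a model's test-phase errors as sketched below. The relative-error definition (a − p)/a and the 1.5 × IQR outlier rule (Tukey's convention, also matplotlib's default `whis=1.5`) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def relative_error(actual, predicted):
    """Assumed definition: (a - p) / a; undefined where a == 0."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return (a - p) / a


def boxplot_summary(values, whisker=1.5):
    """Quartiles, IQR and Tukey-style outlier count, as drawn in a standard boxplot."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    n_out = int(np.sum((v < lower) | (v > upper)))
    return {"Q1": q1, "median": median, "Q3": q3, "IQR": iqr, "n_outliers": n_out}
```

A median near Q1 (as reported for ANFIS at Baker) means that more than half of the errors sit in the short lower part of the box, with the upper half of the box stretched by larger errors.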
It is worth highlighting that soil physical properties, such as bulk density, moisture content, organic matter and mineral type, can affect the specific heat capacity and thermal conductivity of the soil and, in turn, its thermal diffusivity. The coefficient of thermal diffusivity decreases with depth in soil. Furthermore, the surface temperature signal reaches deeper layers with a time lag. Thus, with increasing depth, the heat flux changes the temperature of a larger volume of soil, and ST fluctuations are damped further.
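The damping and lag of temperature waves with depth can be illustrated with the classical sinusoidal heat-conduction solution, T(z, t) = T̄ + A₀ e^(−z/d) sin(ωt − z/d) with damping depth d = √(2D/ω). This is only a sketch: it assumes a homogeneous soil with constant diffusivity D, which simplifies the depth-dependent diffusivity described above, and the parameter values are hypothetical. With D = 5 × 10⁻⁷ m²/s and a daily cycle, the amplitude at 10 cm is roughly 43% of the surface amplitude, with a lag of about 3.3 hours.

```python
import numpy as np


def damping_depth(D, period):
    """Damping depth d = sqrt(2D / omega) for diffusivity D (m^2/s) and period (s)."""
    omega = 2.0 * np.pi / period
    return np.sqrt(2.0 * D / omega)


def soil_temperature(z, t, T_mean, A0, D, period):
    """Classical solution for a homogeneous soil (constant D) with sinusoidal surface forcing."""
    omega = 2.0 * np.pi / period
    d = damping_depth(D, period)
    return T_mean + A0 * np.exp(-z / d) * np.sin(omega * t - z / d)


def amplitude_and_lag(z, D, period, A0=1.0):
    """Amplitude of the temperature wave at depth z (m) and its time lag (s)."""
    omega = 2.0 * np.pi / period
    d = damping_depth(D, period)
    return A0 * np.exp(-z / d), (z / d) / omega


# Hypothetical values: D = 5e-7 m^2/s, daily cycle, depth 0.1 m
amp, lag = amplitude_and_lag(0.1, 5e-7, 86400.0)
```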
Soil thermal conductivity and soil moisture content (related to soil texture) strongly influence soil temperature gradients. Many hydrological, biogeochemical and biological processes, as well as the atmospheric water cycle, are influenced by soil temperature and moisture. By definition, soil moisture is the total water volume (including water vapour) in unsaturated soil. Many microbes depend on water for their activity and survival. Soil moisture dynamics determine the biogeochemical environment for microorganisms, as they affect the availability of dissolved nutrients, including organic carbon, ammonium and nitrate. Soil moisture is therefore highly important in regulating microbial activity and diversity. Despite its importance, soil moisture has not been considered as a parameter for weather prediction because of the complexity of measuring it routinely over large areas. Accurate prediction of surface soil temperature can be achieved using analytical models as long as all the necessary input parameters are available, although this process is not always efficient. The developed hybrid intelligence model, ANFIS-mSG, can therefore be an effective and simple technique for estimating soil temperature, since it does not require many input parameters.

V. CONCLUSION
This study aimed to predict ST using a univariate modeling scheme that incorporates only air temperature information. It evaluated the ability of the proposed hybrid intelligence model (i.e., ANFIS-mSG) to predict ST at five meteorological stations (i.e., Baker, Beach, Cando, Crary and Fingal) in North Dakota (ND), USA. The proposed model was validated against several benchmark models (i.e., ANFIS, ANFIS-DA, ANFIS-GA, ANFIS-GOA, ANFIS-GWO, ANFIS-PSO and ANFIS-SSA). The training and testing phases were based on daily records of maximum, mean and minimum air temperature over nine years (1 January 2010 to 31 December 2018). Multiple statistical and graphical evaluation techniques were used to assess the predictive models. The modeling results confirmed the feasibility of hybridizing the ANFIS model with optimization algorithms (mSG) to build a robust and efficient predictive model for soil temperature. The convergence time of the proposed ANFIS-mSG was acceptable, whereas the standalone ANFIS required the least time but produced a high level of prediction error. At Crary station, the predictive performance differed only slightly among the developed hybrid models. The RMSE was reduced by 73%, 74.4%, 71.2%, 76.7% and 80.7% using the developed ANFIS-mSG instead of the standalone ANFIS model during the testing phase at the Baker, Beach, Cando, Crary and Fingal meteorological stations, respectively. In conclusion, the proposed hybrid artificial intelligence model (i.e., ANFIS-mSG) was found to be an efficient predictive model for soil temperature based on a univariate air temperature scenario.

His publications include more than 424 articles in international/national journals, chapters in books, and 13 books. He has one patent on physical methods for the separation of iron oxides.
He supervised more than 66 postgraduate students at universities in Iraq, Jordan, the U.K., and Australia. He executed more than 60 major research projects in Iraq, Jordan, and the U.K. His research interests are mainly in geology, water resources, and the environment. He was a member of several scientific societies, e.g., the International Association of Hydrological Sciences, the Chartered Institution of Water and Environment Management, and the Network of Iraqi Scientists Abroad, and was the Founder and President of the Iraqi Scientific Society for Water Resources. He was a member of the Editorial Board of ten international journals. He has received several scientific and educational awards; among them, the British Council, on its 70th anniversary, named him one of the top five scientists in Cultural Relations.
SURAJ KUMAR BHAGAT received the master's degree (Hons.) in environmental engineering from Rajasthan Technical University, India, in 2014. He is currently pursuing the Ph.D. degree in civil engineering at Ton Duc Thang University, Ho Chi Minh City, Vietnam. He also received a research fellowship at IIT Delhi, India, in 2013. In 2014, he joined the Department of Civil Engineering, Institute of Technology, Ambo University, Ethiopia, as a Lecturer, where he also led a funded project that received the Energy Globe Award, founded by the Austrian energy pioneer Wolfgang Neumann. He is currently an Operations Director (Volunteer) of Agua International Water Relief (NGO), USA.
ZAHER MUNDHER YASEEN received the master's and Ph.D. degrees in hydrology, water resources engineering, hydrological processes modeling, environmental engineering, and climate from the National University of Malaysia (UKM), Malaysia. He is currently a Senior Lecturer and a Senior Researcher in civil engineering with Ton Duc Thang University. He also has excellent expertise in machine learning and advanced data analytics. He has published more than 100 articles in international journals, with a Google Scholar h-index of 24 and a total of 1724 citations.
VIJAY P. SINGH received the B.S., M.S., Ph.D., and D.Sc. degrees in engineering. He is currently a Distinguished Professor, a Regents Professor, and the Caroline and William N. Lehrer Distinguished Chair in water engineering with Texas A&M University. He is also a Registered Professional Engineer, a Registered Professional Hydrologist, and an Honorary Diplomate of ASCE-AAWRE. He has published extensively in the area of hydrology and water resources, including 30 textbooks; Handbook of Applied Hydrology; Encyclopedia of Snow, Ice and Glaciers; 71 edited books; 114 book chapters; 1270 refereed journal articles; 330 conference proceedings articles; 13 special edited journal issues; 50 book reviews; and 72 technical publications and reports. For his seminal contributions, he has received more than 92 national and international awards and three honorary doctorates. He is a member of 11 international science/engineering academies. He has served or is serving as Editor-in-Chief of three journals and two book series, and serves on the editorial boards of more than 25 journals and three book series. He has served as the President of the American Institute of Hydrology (AIH), the Chair of the Watershed Council of the American Society of Civil Engineers, and the Vice President of the Indian Association of Hydrologists and of the Association of Global Groundwater Scientists. He is currently the President of the American Academy of Water Resources Engineers.