Adaptive Artificial Neural Network Surrogate Model of Nonlinear Hydraulic Adjustable Damper for Automotive Semi-Active Suspension System

Artificial neural network surrogate models are often used in the design optimization of automotive semi-active suspension systems. To realize the desired damping force, a surrogate model must be constructed to approximate the regulating mechanism of the hydraulic adjustable damper. However, very few existing studies discuss how to guarantee the modeling accuracy. To this end, this work constructs a novel surrogate model using a radial basis function neural network. Meanwhile, an adaptive modeling method based on modified hyperband and trust-region-based mode pursuing sampling is presented. Concretely, modified hyperband is used to quickly select a seed model via early stopping and dynamic resource allocation. Mode pursuing sampling is then performed in the neighborhood of the seed model to systematically generate more sample points while statistically covering the entire neighborhood (i.e., the trust region). In particular, in the mode pursuing sampling procedure, quadratic regression is performed around the current optimum as a second detection step. Moreover, as the position or size of the trust region changes, the sampling and detection process iterates until the accuracy of the model no longer improves. To avoid falling into a local optimum, the seed model selection and mode pursuing sampling process iterate until the termination criterion is met. The experimental results show that compared with the benchmarks, the modeling accuracy for the hydraulic adjustable damper is improved by up to 48%, and the iteration resources are reduced by up to 84%.

reliable performance, and low cost [13]-[15]. Hence, this research focuses on the SAS system. A controllable SAS system usually means that the damper is adjustable. Among such dampers, the monotube hydraulic adjustable damper (HAD), which uses the hydraulic flow regulation method, is the most popular in the automotive industry [16]-[18]. To facilitate the practical application of the monotube HAD, its regulating mechanism should be modeled accurately, which means that the characteristics of the monotube HAD should be available at the design stage. In this regard, the analytical model and thermal effect equations of the monotube HAD were established in Ref. [19], and their feasibility was verified by the integral shock absorber testing (ISAT) system. On this basis, HAD data was further collected from the ISAT system to construct a gray neural network (GNN) surrogate model of the unknown regulating mechanism. To ensure accuracy and decrease the computational cost, further research using a fuzzy neural network (FNN) was conducted in Ref. [6], and an improvement in modeling performance was obtained.
In theory, an artificial neural network (ANN) can approximate any complex nonlinear function arbitrarily well and can serve as a prediction or surrogate model in many fields, such as performance prediction and decision optimization [19]-[22], fault diagnosis and fault-tolerant control [23]-[25], and image recognition and classification [26]-[28]. However, the approximation performance of different ANNs in engineering applications varies [6], [22], and in existing research only a few ANNs have been applied to the HAD problem [6], [19]. Therefore, a new neural network should be investigated to accurately realize the desired damping force. To this end, this work constructs a novel surrogate model using a radial basis function neural network (RBFNN) [21], [22]. RBFNN is chosen because it converges quickly and has strong anti-noise and repair capabilities, which better matches the requirements of SAS systems.
For the practical application of ANN surrogates, the hyperparameters need to be carefully tuned so that the model accurately approximates the training data. Hyperparameter optimization is a time-consuming task. To reduce the computational cost, existing HAD studies [6], [12] optimize only a few hyperparameters, which may prevent a high-precision model from being obtained. To solve this problem, an adaptive modeling method based on a hyperparameter optimization algorithm is presented in this work. Here, ''adaptive'' refers to automatic selection, in contrast to the manual [6] and semi-adaptive [12] methods.
Hyperparameter optimization algorithms have been widely used in the field of deep learning, including random search (RS) [28], [29], Bayesian optimization (BO) [30]-[35], and hyperband [36]. However, RS cannot guarantee the global optimum and is inefficient in high-dimensional problems. BO, which relies on a probabilistic model, requires the calculation of complex statistics; as the number of evaluated points increases, its computational cost grows sharply, resulting in low efficiency. Hyperband integrates dynamic resource allocation and early-stopping strategies into RS to quickly eliminate hyperparameter configurations with poor early performance, and has been shown to obtain better results than BO; however, its performance is limited by its RS component. Therefore, to achieve efficient adaptive modeling, a new hyperparameter optimization method that combines a modified hyperband (MH) with mode pursuing sampling (MPS) [37] through a trust region (TR) strategy is proposed, referred to as MH-TRMPS.
In MH-TRMPS, the TR indicates a region around the current optimum, within which MPS is performed to obtain the minimizer of this region. Unlike existing gradient-driven trust-region-based optimization methods [38]-[41], the MPS search in MH-TRMPS is independent, and there is no need to calculate a step size or direction. Concretely, MH is used to quickly select the current optimum. Meanwhile, a hypercube around the optimum is constructed and defined as the TR. MPS is then performed in the TR to pursue a more accurate model. According to the performance of the MPS search during previous iterations, the size of the TR is either kept the same or reduced by a preset ratio, but its center always falls on the optimum. If the candidate solution does not sufficiently improve the modeling accuracy after multiple reductions of the TR size, the MH search is conducted again to determine whether a new TR needs to be constructed.
The contributions of this work can be summarized as follows: (1) An efficient adaptive ANN surrogate modeling method based on MH-TRMPS is presented for the HAD problem. Numerical examples confirm that the proposed approach obtains better performance than the manual method, the semi-adaptive method, and other adaptive methods. To the best of our knowledge, this is the first investigation of an adaptive ANN surrogate for the HAD problem.
(2) A novel surrogate model is constructed for the HAD problem using RBFNN. Applied to the HAD simulation data, the modeling accuracy of RBFNN is better than those of MLP and FNN. To our knowledge, RBFNN has not previously been applied to the HAD problem.
The rest of this paper is organized as follows. Section 2 presents the related materials of HAD. Section 3 introduces the MLP, FNN, and RBFNN models of the nonlinear HAD. In Section 4, the MH-TRMPS method is presented in detail. Section 5 describes the numerical experiments. The conclusion is given in Section 6.

II. REGULATION MECHANISM OF THE NONLINEAR HAD
Fig. 1 shows a commercial SAS system with HAD: (a) is the basic setup of a SAS system [42], and (b) is the structural diagram of a nonlinear HAD. In Fig. 1(b), the outermost side is the damper cylinder block, and the upper side is the damper end cap. Above and below the piston are oil reservoirs, referred to as the upper and lower hydraulic oil chambers, respectively. At the bottom of the damper is a gas chamber, and between the gas chamber and the lower hydraulic oil chamber is a floating piston.
The working principle of the nonlinear HAD consists of two parts: the compression stroke and the rebound stroke. During the compression stroke, as the piston moves downward, the hydraulic oil in the lower hydraulic oil chamber flows into the upper hydraulic oil chamber through the adjustable valve inside the piston, while pushing the floating piston downward to compress the gas in the gas chamber. During the rebound stroke, as the piston moves upward, the hydraulic oil in the upper hydraulic oil chamber flows into the lower hydraulic oil chamber through the rebound valve in the piston assembly and the adjustable valve inside the piston. To compensate for the volume difference caused by the piston leaving the cylinder, the floating piston moves upward.
The process of HAD data collection is shown in Fig. 2. From the ISAT, each group of data contains five variables: piston displacement x (m), cylinder surface temperature T (°C), motor angle θ (degree), damping force F (N), and piston speed v (m/s). The first four variables are taken directly from the tester, and the piston speed v is obtained in the LabView program by differentiating the piston displacement with respect to time. In this work, the hydraulic oil temperature is replaced by the cylinder surface temperature T, because the hydraulic oil is sealed in the cylinder and it is difficult to measure its temperature. The regulating mechanism of the damping force for the nonlinear HAD can then be expressed as a function of these variables:
$$F = f(x, v, T, \theta)$$
The values of the data collected directly from the ISAT differ widely; for example, the force ranges from −1000 to 3500, and the displacement from −45 to 45. Consequently, the method in Ref. [6] is used to normalize the data:
$$\bar{x}_i = \frac{x_i - x_{i\_min}}{x_{i\_max} - x_{i\_min}}$$
where x_i_min and x_i_max represent the minimum and maximum values of x_i, respectively, and x̄_i are the variables after normalization.
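As a concrete illustration, the following minimal Python sketch applies this normalization to the collected data. It assumes the common min-max form that maps each variable to [0, 1]; Ref. [6] may use a different target range.

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max normalization of the ISAT data.

    X: (n_samples, n_vars) array, e.g. columns [x, T, theta, F, v].
    Returns the normalized array together with the per-column
    minima/maxima, which are needed later to de-normalize predictions.
    """
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    X_bar = (X - x_min) / (x_max - x_min)
    return X_bar, x_min, x_max
```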

III. ANN MODELS OF THE NONLINEAR HAD
In this section, three ANN surrogate models, i.e., MLP, FNN, and RBFNN, are designed for the nonlinear HAD. Besides, the learning algorithm for training the models is also introduced.

A. MODEL DESIGN
1) MULTILAYER PERCEPTRON (MLP) MODEL
MLP is a fully connected network that has been widely used due to its low dependence on prior knowledge. As shown in Fig. 3, the general structure of the MLP model includes an input layer, an output layer, and n hidden layers. To construct the MLP model, the number of hidden layers n and the number of neurons m_l in each hidden layer (1 ≤ l ≤ n) must be determined first. n and m_l are the so-called structure hyperparameters of the MLP, while w^l_ij and b^l are its parameters. Generally, the larger the training data set, the larger n and m_l should be chosen to ensure an accurate model. In the MLP model, the same activation function is usually used in all hidden neurons to map the features of the training data. In this work, the rectified linear unit (ReLU) [43] is chosen as the activation function, because it does not suffer from vanishing gradients when the input is positive, and it is much faster to compute than Sigmoid and Tanh. The ReLU function is defined as:
$$\mathrm{ReLU}(z) = \max(0, z)$$
and y^l_j is calculated by
$$y_j^l = \mathrm{ReLU}\left(\sum_i w_{ji}^l x_i^{l-1} + b^l\right)$$
where x^{l−1}_i is the ith output of the (l−1)th layer, y^l_j is the jth output of the lth layer, w^l_ji is the weight connecting the jth neuron of the lth layer and the ith neuron of the (l−1)th layer, and b^l is the bias of the lth layer.
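To make the forward computation concrete, the following sketch implements the MLP forward pass described above in plain NumPy. The linear output layer is an assumption (a common choice for regression), since the paper does not state its output activation.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass through an MLP with n hidden layers.

    weights: list of (m_l, m_{l-1}) matrices W^l
    biases:  list of bias vectors b^l
    Hidden layers use ReLU; the last layer is assumed linear.
    """
    y = x
    for W, b in zip(weights[:-1], biases[:-1]):
        y = relu(W @ y + b)   # y_j^l = ReLU(sum_i w_ji^l * y_i^{l-1} + b^l)
    return weights[-1] @ y + biases[-1]
```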

2) FNN MODEL
FNN is a fixed four-layer neural network that combines the advantages of fuzzy theory and neural network methods. As shown in Fig. 4, the FNN consists of an input layer, an output layer, and two hidden layers called the fuzzy layer and the rule layer. The output of the fuzzy layer is defined as:
$$\mu_{ij} = \exp\left(-\frac{(x_i - u_{ij})^2}{2\sigma^2}\right)$$
where x_i are the input variables (1 ≤ i ≤ n), u_ij is the clustered center of the jth neuron for the ith input, obtained by the k-means method (1 ≤ j ≤ m), σ is the center width or standard deviation, and m and n are the numbers of neurons and inputs, respectively. For the FNN, σ and m are structure hyperparameters given by the user. The output of the rule layer is then calculated as:
$$h_j = \prod_{i=1}^{n} \mu_{ij}$$
Finally, the output of the model is calculated by
$$y = \sum_{j=1}^{m} w_j h_j$$
where w_j are the weights of the output layer.
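A minimal sketch of the FNN forward pass follows, assuming Gaussian membership functions in the fuzzy layer and a product rule layer, which are the usual choices for this four-layer architecture (the exact membership form is not reproduced in the source).

```python
import numpy as np

def fnn_forward(x, u, sigma, w):
    """Sketch of the four-layer FNN forward pass.

    x:     (n,) input vector
    u:     (n, m) cluster centers u_ij obtained by k-means
    sigma: scalar center width
    w:     (m,) output-layer weights
    """
    # Fuzzy layer: membership of each input to each of the m rules
    mu = np.exp(-((x[:, None] - u) ** 2) / (2.0 * sigma ** 2))  # (n, m)
    # Rule layer: product of memberships across the n inputs
    h = mu.prod(axis=0)                                         # (m,)
    # Output layer: weighted sum
    return w @ h
```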

3) RBFNN MODEL
The basic form of RBFNN is shown in Fig. 5. The input layer consists of a number of source nodes (sensing units) that connect the network to the external environment. The second layer is the only hidden layer in the network, also known as the radial base layer; it performs a nonlinear transformation from the input space to the hidden space. The output layer is a linear layer. In the radial base layer, the Gaussian function is used as the activation function of the neurons for the nonlinear transformation, defined as:
$$\phi_j(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2\sigma^2}\right)$$
where x_i are the input variables (i = 1, …, n), μ_j is the center of the jth node of the radial base layer, usually calculated by the k-means method (j = 1, …, m), and σ is the width parameter. σ and m are user-defined parameters. In this article, m is set to the size of the training data by default; therefore, only σ needs to be determined in the RBFNN model. Finally, the output of the model is calculated by
$$y = \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$$
where w_j are the weights of the output layer.
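For comparison with the MLP and FNN sketches, a minimal RBFNN forward pass under the same assumptions (Gaussian activation with a shared width σ) looks like this:

```python
import numpy as np

def rbfnn_forward(x, mu, sigma, w):
    """Sketch of the RBFNN forward pass.

    x:     (n,) input vector
    mu:    (m, n) radial-base-layer centers (here m = training size)
    sigma: width parameter, the only remaining structure hyperparameter
    w:     (m,) linear output-layer weights
    """
    # Gaussian activation of each hidden node
    phi = np.exp(-np.sum((mu - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return w @ phi
```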

B. MODEL TRAINING
The parameters of the ANN surrogate model, such as the weight w, the bias b, the center μ, and the width σ, need to be tuned by a gradient-based optimization algorithm to fit the training data accurately. This study uses the adaptive moment estimation (Adam) algorithm [44] to optimize the parameter matrices of the designed models, owing to its high computational efficiency, low memory requirements, and invariance to diagonal rescaling of the gradients. The steps of the Adam algorithm are summarized as follows: (1) At the beginning of the algorithm, initialize the objective function f(θ) (θ denotes the parameter matrices of the designed models), the initial learning rate η, the initial parameter vector θ_0, and the other parameters: exponential decay rates β_1, β_2 ∈ [0, 1), timestep t = 0, first moment vector m_0 = 0, second moment vector v_0 = 0, and ε = 10^{−8}.
(2) Get the gradients with respect to the objective function at timestep t: g_t = ∇_θ f_t(θ_{t−1}). (3) Update the biased first moment and second raw moment estimates:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
where g_t^2 indicates the elementwise square g_t ⊙ g_t. (4) Compute the bias-corrected first moment and second raw moment:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
(5) Update the parameters:
$$\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
(6) If the convergence conditions are satisfied, return the resulting parameters θ_t; otherwise, increase t by 1 and go to step (2).
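A single Adam update corresponding to steps (2)-(5) can be written compactly as follows. The default β_1, β_2, and η values are those suggested in Ref. [44], as the paper does not report its exact settings.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (t starts at 1 so the bias correction is valid).

    theta: current parameters; grad: gradient of f at theta;
    m, v:  running first and second moment estimates.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # biased first moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # biased second raw moment
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```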
The process of tuning parameters by a gradient-based algorithm is also called error back-propagation, i.e., the dotted path of e in Figs. 3, 4, and 5. To fine-train the model, the training data is usually divided into small batches, and the batch size is a user-defined parameter. In this paper, the parameters of the Adam algorithm, together with the batch size, are collectively referred to as the learning algorithm hyperparameters.

IV. ADAPTIVE MODELING OF ANN
In this section, the proposed MH-TRMPS hyperparameter optimization method and the adaptive ANN surrogate modeling process based on MH-TRMPS will be introduced in detail.

A. MH-TRMPS METHOD
1) MH ALGORITHM
Hyperband [36] is a general-purpose hyperparameter optimization algorithm that randomly samples configurations and adaptively allocates resources among them. The algorithm only requires knowledge of R and η, where R represents the maximum amount of resources that can be allocated to any given hyperparameter configuration, and η is the down-sampling rate; together they yield ≈ 5 brackets (with a minimum of 3) via s_max = ⌊log_η R⌋. The so-called bracket s (0 ≤ s ≤ s_max) also indexes the rounds of global sampling. Concretely, for each s, n = ⌈(s_max + 1)η^s/(s + 1)⌉ points are sampled randomly, and each point is allocated r = Rη^{−s} resources to evaluate its function value. Then, a down-sampling process is performed: in each down-sampling, the best 1/η fraction of the n points is retained after the running round. Hyperband was designed with multiple down-samplings in each bracket to gradually eliminate configurations with poor early performance. However, configurations with poor early performance do not necessarily perform poorly in the end; in this situation, multiple down-sampling is futile, as many promising configurations may be dropped at the first down-sampling. Moreover, multiple down-sampling means that the black-box function must be called frequently, which is time-consuming. Consequently, a modified hyperband is proposed.
Similar to hyperband, MH cannot guarantee finding the global optimum. In this situation, more 'intelligence' needs to be built in. A natural method is to construct a small region centered on the current optimum found by MH, and then to search this region for the global optimum using a model-based method, e.g., MPS. In previous studies, such a small region is often referred to as the trust region [40], [41]. Since MPS requires a certain number of initial starting points to construct a response surface, and the evaluation of these points is expensive, each bracket in MH is designed to have only one down-sampling; after down-sampling, the s best configurations in each bracket are retained, instead of one.
As shown in Fig. 6, the maximal bracket s_max and the resource budget B are initialized at the beginning of the MH algorithm. Then, uniform design (UD) [45], [46] is executed in the hyperparameter definition domain to obtain n configurations; note that this is a global random sampling process, and UD is used because it has better uniformity than other sampling methods. After the early performance is obtained via the function run_then_return_obj_val(x, r), the best s configurations are selected using top_k(X, L, n_i/η) and retained in X_R. Finally, the full-resource performance of each configuration x in X_R is evaluated by run_then_return_obj_val(x, R).
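A simplified sketch of the MH loop is given below. `run_then_return_obj_val` follows the name used above; `sample_config` stands in for the UD sampler (plain random sampling is used as a placeholder), the top-k retention is implemented inline via a sort, and keeping max(s, 1) configurations per bracket is an assumption, since the behavior for the bracket s = 0 is not specified in the text.

```python
import math, random

def modified_hyperband(run_then_return_obj_val, sample_config, R=250, eta=3):
    """Sketch of MH: one down-sampling per bracket, keep the s best.

    run_then_return_obj_val(x, r): trains configuration x with r
        resources and returns the validation loss (lower is better).
    sample_config(n): draws n configurations (UD in the paper).
    Returns the incumbent x# and the retained [X_R, F_R] that seed MPS.
    """
    s_max = int(math.log(R, eta))
    X_R, F_R = [], []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        r = R * eta ** (-s)
        X = sample_config(n)
        L = [run_then_return_obj_val(x, r) for x in X]   # early performance
        k = max(s, 1)                                    # keep s best (>= 1 assumed)
        keep = sorted(zip(L, X), key=lambda p: p[0])[:k] # single down-sampling
        for _, x in keep:
            X_R.append(x)
            F_R.append(run_then_return_obj_val(x, R))    # full-resource run
    f_best, x_best = min(zip(F_R, X_R), key=lambda p: p[0])
    return x_best, X_R, F_R
```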

2) MH-TRMPS
Before the start of MH-TRMPS, some parameters need to be defined. Let hypercube H correspond to the hyperparameter definition space Λ, and hypercube S correspond to the trust region, a subregion of H. The region between S and H corresponds to hypercube T, where T = H − S. The initial size of S is set to R_S,initial = 0.5, and the size of H is set to R_H = 1. To avoid over-exploitation, the minimal size of S is set to R_S,min = 0.01, and the reduction factor is defined as α ∈ [0.5, 0.9] (default α = 0.7). The details of the MH-TRMPS algorithm are as follows:

Step 1. Initial trust region construction. As shown in Fig. 6, generate x# and [X_R, F_R] by MH in the definition space H, and construct the initial trust region S with the current optimum x# as its center. Append [X_R, F_R] to [X, F].

Step 2. MPS optimization
Step 2.1. UD sampling. For each x ∈ X that lies within S, place x in X_0. Let n_0 be the number of points in X_0; if n_0 < 10, then 10 − n_0 expensive points are sampled from S by UD, evaluated, and appended to [X_0, F_0].
Step 2.2. Global model construction. The linear spline function is used to fit [X_0, F_0] to obtain a global model, defined as:
$$\hat{f}(x) = \sum_{i=1}^{m} \alpha_i \left\lVert x - x_i \right\rVert$$
where i = 1, 2, …, m, ‖·‖ is the Euclidean norm, m is the number of expensive points x_i, and α_i are the weights.
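One way to obtain the weights α_i is to solve the interpolation system Φα = F_0 with Φ_ij = ‖x_i − x_j‖, as in the following sketch; a least-squares solve guards against a near-singular Φ, although the MPS implementation may use a different fitting procedure.

```python
import numpy as np

def fit_linear_spline(X0, F0):
    """Fit the global model f_hat(x) = sum_i alpha_i * ||x - x_i||.

    X0: (m, n_v) expensive points; F0: (m,) their function values.
    """
    diff = X0[:, None, :] - X0[None, :, :]
    Phi = np.linalg.norm(diff, axis=2)               # (m, m) distance matrix
    alpha, *_ = np.linalg.lstsq(Phi, F0, rcond=None)

    def f_hat(x):
        """Cheap prediction at a new point x."""
        return alpha @ np.linalg.norm(X0 - x, axis=1)
    return f_hat
```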
Step 2.3. Mode pursuing sampling. Define g(x) = c_0 − f̂(x) as the mode function, where c_0 is any constant that ensures g(x) ≥ 0. Based on the mode function, the random-discretization-based Monte Carlo sampling method [47] is performed to systematically generate more sample points in the neighborhood of the function mode while statistically covering the entire search space. In each iteration, n_mc = n_v/2 expensive points are selected from n_1 (usually a large number) cheap points and added to [X_0, F_0], where n_v is the number of variables. After each iteration, a speed control factor r is used to balance exploration and exploitation in the sampling process.
Step 2.4. Local model construction. Construct the quadratic regression (QR) model using [X_q, F_q] from [X_0, F_0], where [X_q, F_q] is the neighborhood closest to the current optimum. The mathematical expression of an n-dimensional generic quadratic model is:
$$\hat{y}(x) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \beta_{ij} x_i x_j$$
where β_i, β_ii, and β_ij are regression coefficients, and x_i are the components of the expensive points in X_q.
To assess the performance of the model fit, the coefficient of determination R² is used, defined as:
$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$
where ŷ_i represents the QR function values, y_i represents the black-box function values, and ȳ is the mean of the y_i.
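The QR fit and its R² can be computed by least squares over the full quadratic feature set, as sketched below:

```python
import numpy as np

def quadratic_features(X):
    """Build [1, x_i, x_i^2, x_i*x_j] features of the generic quadratic model."""
    m, n = X.shape
    cols = [np.ones(m)]
    cols += [X[:, i] for i in range(n)]
    cols += [X[:, i] ** 2 for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i + 1, n)]
    return np.column_stack(cols)

def fit_qr_and_r2(Xq, Fq):
    """Least-squares QR fit plus its coefficient of determination."""
    A = quadratic_features(Xq)
    beta, *_ = np.linalg.lstsq(A, Fq, rcond=None)
    y_hat = A @ beta
    ss_res = np.sum((Fq - y_hat) ** 2)
    ss_tot = np.sum((Fq - Fq.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot
```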
Step 2.5. Local sampling and local model reconstruction. If R² ≈ 1, n_nb = n_v/3 expensive points are sampled randomly in the neighborhood, evaluated, and added to [X_q, F_q] and [X_0, F_0]. Reconstruct the QR model using [X_q, F_q] and recalculate R². Then calculate Diff, which measures the change between the current and previous minimum function values f_min.

Step 2.6. Optimization based on the local model. If R² ≈ 1 and Diff < ε_d, perform local optimization based on the QR model to generate a global optimal candidate x_0, and evaluate its black-box function value. Add (x_0, f_0) to [X_0, F_0].
Step 3. Adaptive trust region strategy
Step 3.1. Let f_min = min(F). For each x ∈ X_0 with x ∉ X, place (x, f) in [X, F]. If the new minimum of F_0 is less than f_min, move the center of S to the current optimum, and go to Step 2.
Step 3.2. If the MPS search does not improve for n uimp = 2 consecutive iterations, reduce R s to αR s , and go to Step 2.
Step 3.3. If the radius of S has been reduced n_reduce = 2 consecutive times and the MPS search still shows no improvement, search T using MH to generate x# and [X_R, F_R]; a new trust region S is then constructed with x# as its center, and go to Step 2. Otherwise, reduce R_S to αR_S, and go to Step 2.
Step 3.4. If R_S reaches the preset minimum radius R_S,min or the maximal number of function evaluations is reached, stop the iteration and return the current optimum x* with the smallest intermediate loss seen so far. Otherwise, go to Step 2.
Simply put, if the MPS search is consistently successful, the location of TR_i moves so that its center always falls on the current optimum, and the search tries again; if the MPS search fails repeatedly, the size of TR_i is reduced and the search tries again; if the candidate solution does not sufficiently improve the modeling accuracy after multiple reductions of the size of TR_i, set j = j + 1 and conduct the MH_j search in the untrusted region; if the MH_j search succeeds, set i = i + 1, where i is the number of times an initial trust region is constructed, and j is the number of times MH is executed. In theory, the global optimum can be obtained through the iteration of MH and MPS, because MH has global sampling characteristics while MPS has global convergence [37].

B. ADAPTIVE MODELING USING MH-TRMPS
Fig. 7 shows the simplified diagram of ANN surrogate adaptive modeling based on MH-TRMPS. There are three parts: input data, hyperparameter selection and model training, and output of the optimal model. Put another way, adaptive modeling is essentially a nested optimization process that involves: (1) for each given hyperparameter configuration, using a gradient descent algorithm (e.g., Adam) to optimize the parameters of the ANN to obtain a well-trained surrogate model (inner optimization); (2) using a hyperparameter optimization algorithm (e.g., MH-TRMPS) to optimize the hyperparameters and generate a new configuration (outer optimization); and (3) iterating steps (1) and (2). The hyperparameter optimization problem is defined as:
$$\lambda^* = \arg\min_{\lambda \in \Lambda} L\left(A_\lambda, X_{train}, X_{valid}\right)$$
where A_λ is the ANN surrogate with the given hyperparameter configuration λ, Λ is the hyperparameter definition domain, L is the validation error, X_train is the training data, and X_valid is the validation data.
Obviously, the evaluation of a hyperparameter configuration is time-consuming, because a well-trained model requires many iterations to converge. It can also be seen from Fig. 7 that in this work, the structural hyperparameters (λ_1) of the ANN surrogate model are determined first, and then the hyperparameters (λ_2) of the learning algorithm are selected to train the model. Moreover, the difficulty of selecting the optimal configuration may increase exponentially with the number of hyperparameters. Take MLP as an example: during the cross-validation process, once the termination condition is satisfied, multiple ''λ → L'' mappings have been obtained, and the current optimal hyperparameter configuration λ* with network parameters w and b is output as the best model A_λ*.
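The overall nested loop can be sketched as follows, with all five arguments standing in for the paper's actual components (the MH-TRMPS outer optimizer, the Adam-based trainer, and the validation loss); the function names are placeholders.

```python
def adaptive_modeling(hyperopt_step, train_with_adam, validation_loss,
                      X_train, X_valid, budget):
    """Sketch of the nested optimization behind Fig. 7.

    hyperopt_step:   outer optimizer (MH-TRMPS here) proposing a new
                     configuration lambda from the history of trials.
    train_with_adam: inner optimizer returning a trained model A_lambda.
    """
    history, best = [], (float("inf"), None)
    for _ in range(budget):
        lam = hyperopt_step(history)              # outer: new configuration
        model = train_with_adam(lam, X_train)     # inner: fit parameters
        loss = validation_loss(model, X_valid)    # L(A_lambda, ...)
        history.append((lam, loss))
        if loss < best[0]:
            best = (loss, model)                  # keep lambda* / A_lambda*
    return best[1]
```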

V. NUMERICAL EXPERIMENTS
In this section, all the compared methods will be used to model the unknown regulating mechanism of the nonlinear HAD for the SAS system, and the modeling performance will be given and discussed.

A. EXPERIMENT SETTINGS
The historically collected data from the data collection system are organized into three data sets. Each data set is split into a training set and a validation set: (1) case-1 has 1400 and 87 groups, (2) case-2 has 5000 and 200 groups, and (3) case-3 has 20000 and 3000 groups. Concretely, for case-1, the expert method (EXP) [6] is used as the benchmark; for case-2, grid search (GS) [12] is used for comparison; for case-3, RS [28], GP-EI [30], and hyperband [36] are used as the benchmarks. RS and hyperband are direct stochastic optimization methods, while GP-EI is a well-known model-based Bayesian optimization method.
The down-sampling rate η is set to 3 for hyperband and MH, and the number of initial start points for the model-based methods is set to 10. In this work, 50R is used as the total iteration resource budget to constrain the compared methods, where R is set to 250. The root mean squared error (RMSE) is used to assess the generalization capability of the ANN surrogate model:
$$RMSE = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_p(k) - y_a(k)\right)^2}$$
where y_p(k) are the actual outputs, y_a(k) are the normalized expected outputs at the kth test point, and n is the number of test points.
All experiments are repeated 10 times. Concretely, a search space is defined for each hyperparameter, from which configurations are generated one by one to construct ANN surrogate models for cross-validation. Under the constraint of total iteration resources, only 50 hyperparameter configurations are verified in one experiment, and the best result is retained. After all 10 experiments have been performed, the average value is calculated and compared.
To ensure that the experiment settings are consistent with those of the original literature: (1) for EXP, the learning rate and the number of hidden-layer neurons are set as adjustable hyperparameters; (2) for GS, the learning rate, batch size, and number of hidden-layer neurons are set as adjustable hyperparameters. More detailed experiment settings are shown in Table 1.
The settings of MH-TRMPS are listed in the two rightmost columns of Table 2, because it is an adaptive method. Note that the learning algorithm hyperparameters are the same for all three models because Adam is used to train them, but the structure hyperparameters usually differ between models, as shown in Table 2. Generally, the definition of the hyperparameter search space affects the optimization efficiency, so expert experience is still relied on to avoid an overly large definition domain.
Code for all compared algorithms can be obtained from shared resources: MPS from the Product Design and Optimization Laboratory (PDOL) 1 ; GP-EI 2 and hyperband 3 from the open-source project hosting platform GitHub. Random search and grid search are straightforward to implement. To ensure the independence of MPS and hyperband, MH-TRMPS is implemented in MATLAB and Python together, with Python as the main runtime environment.

1) THE MODELING RESULTS OF CASE-1
The average damping forces predicted by the MLP and FNN surrogate models of the nonlinear HAD are plotted in Figs. 8 and 9. The results of the EXP and MH-TRMPS methods are shown in green and red, respectively; blue represents the expected value. The results show that the model constructed by MH-TRMPS has higher accuracy than the model constructed by the EXP method. The reason may be that in adaptive modeling, more hyperparameters are optimized so that the model can accurately approximate the training data.
Apart from the prediction results, the comparison of modeling using EXP and MH-TRMPS is listed in Table 3. Moreover, the errors between the command signal and the average predicted value are plotted in Fig. 10. To account for the offset between positive and negative errors, the average error is calculated using the absolute values of the predicted results.
Compared with the EXP method, the MH-TRMPS algorithm realizes the desired damping force much more accurately, reducing the average RMSE by at least 36.6%. Moreover, the average RMSE of the MLP model using MH-TRMPS is about 42 N, which is even less than that of the FNN model using the EXP method. As shown in Fig. 10, the average prediction error of the model using MH-TRMPS is closer to the 0 axis than that of the EXP method. Therefore, the adaptive ANN surrogate model can confidently be integrated with any SAS control algorithm to optimize vehicle performance.

2) THE MODELING RESULTS OF CASE-2
The comparison results are listed in Table 4 and shown in Fig. 12, which respectively describe the average RMSE and the error between the command signal and the average predicted value. The results show that compared with the GS algorithm, the HAD modeling accuracy of the MH-TRMPS algorithm is higher, with the RMSE reduced by 14.5%. The main reason is that GS needs to divide the search space of each hyperparameter and then evaluate the combinations of the divisions. With limited resources, fine division and exhaustive evaluation are difficult to achieve, especially for high-dimensional problems. In this case, five variables with five divisions each are used for testing, which leads to a coarse assessment.

3) THE MODELING RESULTS OF CASE-3
The mean best prediction metrics of adaptive modeling found so far, as the multiple of R varies, are plotted in Fig. 13. It shows that compared with the benchmarks, MH-TRMPS obtains better modeling performance: the average RMSEs of the MLP, FNN, and RBFNN models are about 43 N, 38 N, and under 8 N, respectively. This means that in this case, RBFNN is more suitable for modeling the regulation mechanism of the nonlinear HAD with high accuracy.
From the multiples of R marked on the horizontal axis of Fig. 13, it can also be seen that after 12R, 8R, and 12R, the mean minimum RMSE of MH-TRMPS is less than those of all the benchmarks after they exhausted 50R. This means that the total iteration resources for function evaluations of MH-TRMPS are reduced by 76% ((50 − 12)/50 × 100%), 84%, and 76% compared to the best benchmark. The best adaptive modeling results of MH-TRMPS and the compared algorithms are listed in Table 5. The average RMSEs of the MLP, FNN, and RBFNN models constructed by MH-TRMPS are about 43, 38, and 7.8, while those of the best benchmarks are about 46, 40, and 9.1, i.e., reductions of about 6% ((46 − 43)/46 × 100%), 5%, and 14.6%, respectively.
The experiments also show that because RBFNN has only one hidden layer, its training speed is very fast. For FNN, the training speed is affected by the number of neurons in the rule layer, so a trade-off between speed and accuracy is necessary. MLP is a deep neural network; its training is also fast because the ReLU activation function is very simple to compute. However, the modeling accuracy of MLP is not stable enough, because a large number of weights and biases must be initialized randomly.

As shown in Table 1, the adaptive method can greatly improve the accuracy of the ANN surrogate model of HAD. The main reason may be that in the adaptive modeling process, more hyperparameters are adaptively adjusted, so that the model can be fine-tuned to accurately approximate the training data. For example, the learning rate is carefully selected in Ref. [6], but the batch size is not given. Furthermore, Table 1 also shows that the prediction error reduction of the MLP model is 48%, which is larger than that of the FNN model. This is because the number of layers in the FNN model is fixed, while the number of layers in the MLP model can be adaptively selected.
For the case-2 test, compared with semi-adaptive modeling using GS, adaptive modeling using MH-TRMPS has higher accuracy. The reason may be that in Ref. [12], (1) the number of hidden layers is fixed at 3, and (2) each hyperparameter is searched separately to avoid the curse of dimensionality; neither setting is sufficiently adaptive.
Case-3 tests adaptive modeling using different methods, and MH-TRMPS obtains state-of-the-art performance. The methods differ only slightly in accuracy but dramatically in iteration resource consumption. For example, at the same validation level, the average reduction in iteration resources for function evaluation of MH-TRMPS is about 78.7% compared with the best benchmark, while the final accuracy improves by only 8.4% on average. This is because only about six hyperparameters need to be tuned in this case, which is a low-dimensional optimization problem; hence, a good set of hyperparameters can be found after each compared algorithm exhausts its 50R resources.

VI. CONCLUSION
This work presents an adaptive ANN surrogate modeling method based on MH-TRMPS for the HAD problem. Three ANN algorithms are tested on three data sets. Among them, RBFNN is used for the first time to model the nonlinear HAD of an SAS system. According to the experimental results, the following conclusions can be drawn: (1) The RBFNN algorithm can be used to model the regulating mechanism of HAD with much higher accuracy than MLP and FNN.
(2) The accuracy of the adaptive method for modeling the nonlinear HAD is higher than that of the manual and semi-adaptive methods in the previous literature.
(3) Compared with other adaptive methods, MH-TRMPS not only achieves higher modeling accuracy, but also consumes less iteration resources.
Extensive experiments show that adaptive ANN surrogate modeling based on MH-TRMPS achieves excellent performance. However, if the early-stopping strategy is ineffective for a practical problem, MH degenerates into random search, which reduces the optimization efficiency. Future work will parallelize MH-TRMPS for potential extension to distributed computing, because: (1) MH can be parallelized, since the global sampling is independent; the most straightforward scheme is to distribute individual brackets to different machines asynchronously, launching new brackets with different sets of samples as machines free up; and (2) the processes of MH and MPS are independent and thus also suitable for parallelization. Moreover, the adaptive model will be applied to real vehicles in future work.

From May 2005 to September 2007, he conducted postdoctoral research in the Department of Mechanical, Materials, and Aerospace Engineering, Illinois Institute of Technology. He is currently a Professor with the Department of Mechanical and Electrical Engineering, Guangdong University of Technology. His research interests include big-data-driven product design methods, complex product multidisciplinary integrated design optimization, artificial intelligence deep learning, and machine vision inspection. He has presided over three National Natural Science Fund projects, two National 863 projects, and more than ten Provincial Science and Technology projects, and has published more than 40 articles.
ZEYING HUANG received the B.E. degree in mechanical design and manufacturing and automation from the Guangdong University of Technology, Guangzhou, China, in 2019, where he is currently pursuing the master's degree with the College of Mechanical and Electrical Engineering.
His research interests include the application of deep learning and transfer learning in the field of mechanical engineering.
ZHIQIAN LUO received the master's degree from the University of Macau, Macau.
He is currently working with the Automotive Engineering Institute, Guangzhou Automobile Group Company Ltd. His research interests are semi-active and active suspension control.