Hierarchically Reorganized Multi-Layer Fuzzy Neural Networks Architecture Driven With the Aid of Node Selection Strategies and Structural Network Optimization

In this study, a design methodology based on fuzzy set inference and the polynomial neural network (PNN) is introduced for a hierarchically reorganized self-organizing network architecture, in order to cope with the over-fitting and multi-collinearity problems that generally appear in conventional fuzzy neural networks. The proposed design method constructs the hierarchically reorganized multi-layer fuzzy neural networks (HRmFNN) architecture through a synergy of techniques: L2-norm regularization, probability theory, and multi-optimization. The overall network is realized with a parallel network structure with newly added inputs and with an effective neuron selection method, the exponential-based roulette selection technique, applied in each layer of HRmFNN. The least square error estimation (LSE)-based learning method with L2-norm regularization is used to construct a stabilized network architecture. Together, these design methodologies alleviate the over-fitting phenomenon and enhance the generalization ability. The performance of HRmFNN is directly affected by parameters such as the number of input variables, the collocation of the specific subset of input variables, the number of membership functions per variable, and the order of the polynomial in the consequent parts of the fuzzy rules; multi-particle swarm optimization (MPSO) is therefore exploited for the structural as well as parametric optimization of the proposed network.
That is, the multi-optimization helps achieve a compromise between better generalization performance and reduced over-fitting, which stabilizes the proposed multi-layered self-organizing network structure with the aid of synergistic techniques: a) L2-norm regularization-based LSE learning, b) probability theory for effective neuron selection, and c) a novel parallel network structure including newly added inputs and a neuron selection method. The performance of the proposed network structure is quantified by comprehensive experiments and comparative analysis. It is also demonstrated through an application to cement compressive strength.


I. INTRODUCTION
In recent years, a lot of research has been devoted to machine learning and multi-layered networks. Research on the stabilization of multi-layer neural networks is on the rise as well [1]. Novel regression algorithms and neural networks have received increasing attention for stabilized model prediction [2]-[4]. Many researchers have provided persuasive theories combined with real-world applications, especially in time series prediction by regression algorithms [3]-[6].
As one of the classical neural networks, the Group Method of Data Handling (GMDH) [2] has been applied to a great variety of areas including multilayer networks and knowledge discovery, prediction and data mining, optimization, and pattern recognition. The GMDH algorithm can automatically find the relationship among the data, select the optimal structure of the network, and improve the accuracy of the existing algorithm [3]-[5]. The Polynomial Neural Network (PNN) [6], extended with the aid of the GMDH algorithm, comes with a flexible architecture whose potential can be utilized by regression models. In PNN, every node exhibits a high level of flexibility and realizes a polynomial type of mapping (viz. linear, quadratic, and cubic) between input and output variables. Fuzzy Polynomial Neural Networks (FPNN) [7], based on PNN and the Fuzzy Relation-based Polynomial Neuron (FrPN) with different types of polynomials, were expected to offer a more flexible generalization capability; as in PNN, every node of FPNN realizes a polynomial type of mapping between input and output variables. In the case of a probabilistic or regression model, an over-fitting problem as well as a multi-collinearity problem occurs depending on the structure of the model and the characteristics of the data [10]. In order to alleviate these over-fitting and multi-collinearity problems from several points of view, we propose in this study a novel hierarchically reorganized network architecture as well as a learning technique for designing multi-layer fuzzy neural networks.
The key issues and advantages of constructing HRmFNN are highlighted in the following: First, in the generation process of each layer of the overall network architecture, the inputs of each layer are reorganized through a newly added parallel layer structure fed by the original inputs and the outputs of the preceding layer. Compared with the fuzzy relation, partitioning the fuzzy space by fuzzy sets reduces the number of fuzzy rules, which greatly reduces the number of parameters required to form the model. This reduces the complexity of the model and alleviates the over-fitting problem that emerges as the number of layers increases.
Second, the HRmFNN structure that takes a parallel connection network is introduced. As the number of layers of the self-organizing network increases, the similarity of nodes in each layer is very high, and overfitting caused by multicollinearity will occur. Therefore, in the existing network structure, each layer adds the original input, increasing the diversity of the input nodes of each layer, allowing the self-organizing network to fully train while alleviating the overfitting phenomenon caused by multicollinearity.
Third, a new criterion for node selection that increases the diversity of nodes and reduces the complexity of the model is introduced in order to alleviate the over-fitting caused by multi-collinearity problems. In order to reduce the possibility that potentially useful nodes are discarded on the ground that they do not perform well in the current layer, we use the exponential-based roulette selection, a statistical selection method whose selection probability is defined from an exponential-based performance index.
Fourth, we estimate the coefficients by considering a regularization factor in order to reduce the complexity of the proposed model and to alleviate the over-fitting caused by the large deviation between coefficients. To this end, the least square error (LSE) method with L2-norm regularization is used as the learning method for constructing HRmFNN. By doing this, we expect that the variance between the coefficients in each node will decrease and that the generalization ability will thereby be enhanced.
Fifth, the performance of HRmFNN is directly affected by hyperparameters such as the number of input variables, the collocation of the specific subset of input variables, the number of rules, and the orders of the polynomials in the consequent parts of the rules. The structure and parameters of HRmFNN are optimized by MPSO to achieve better performance. Three objective functions are used to evaluate the accuracy, complexity, and interpretability of HRmFNN: the performance of the model, the entropy of partition, and the sum of squared coefficients to be estimated in HRmFNN. Used together in the optimization process, these three objective functions form a sound tradeoff between better generalization performance and alleviation of the over-fitting problem. The proposed multi-optimization method carries out the structural and parametric optimization of the hierarchically reorganized network architecture, enhancing the generalization capability through the minimization of complexity and the maximization of accuracy.
Finally, structural design techniques such as the parallel network structure and MPSO, algorithmic design methodologies such as the exponential-based roulette selection technique and LSE-based learning with L2-norm regularization, and their ensuing synthesis in the proposed HRmFNN lead to the realization of a stabilized multi-layered network architecture through the alleviation of over-fitting and multi-collinearity.
As a self-organizing network, the HRmFNN architecture is not fixed in advance but becomes organized through the growth process in which the layers and nodes (neurons) of the network are generated. Along with the use of synergistic design methodologies such as the exponential-based roulette selection technique, LSE-based learning, and the two types of parallel network structures shown in Section 2, the structural and parametric multi-optimization of the network is carried out more effectively for constructing the stabilized and enhanced multi-layered network of HRmFNN. First, in the design process of HRmFNN, the structure of the fuzzy set-based polynomial neuron (FsPN), which serves as a node (neuron) of each layer of the network, is optimized. The four kinds of structural parameters of FsPN are a) the number of input variables, b) the collocation of the specific subset of input variables, c) the number of membership functions, and d) the type of polynomial. Next, after obtaining the optimized FsPN, the optimization design of both the nodes selected in the current layer and their ensuing layer (next layer) leads to the optimized HRmFNN structure [41]-[44].
VOLUME 10, 2022
In the sequel, the main contributions of our work can be summarized concisely as follows: First, an HRmFNN structure consisting of the original network based on fuzzy sets as well as a newly added parallel network is proposed in order to obtain better performance in a stabilized multi-layer network structure. Second, in order to alleviate the over-fitting caused by multi-collinearity problems, the exponential-based roulette selection technique for node selection and LSE-based learning with L2-norm regularization are proposed. Third, the synergistic effect of a) structural design techniques such as the parallel network structure and MPSO and b) algorithmic design methodologies such as the exponential-based roulette selection technique and LSE-based learning with L2-norm regularization leads to a stabilized multi-layered network architecture through the alleviation of over-fitting and multi-collinearity.
The structure of this study is organized as follows. Section 2 elaborates on the architectural framework of hierarchically reorganized multi-layer fuzzy neural networks. Section 3 provides a multi-optimization of the overall framework of HRmFNN with the aid of MPSO. Section 4 reports a comprehensive set of experiments. Finally, concluding remarks are shown in Section 5.

II. ARCHITECTURAL FRAMEWORK OF HIERARCHICALLY REORGANIZED MULTI-LAYER FUZZY NEURAL NETWORKS
This section elaborates on the design of HRmFNN based on polynomial reasoning. The main differences between the previous works (FPNN) and the proposed HRmFNN are shown in Table 1. The differences and the improvements brought by the proposed techniques are summarized as follows.
1) Division of fuzzy space: Partitioning the fuzzy space by fuzzy sets reduces the number of fuzzy rules, thus reducing the complexity of the model and alleviating the over-fitting problem that emerges as the number of layers increases.
2) Structure: Parallel network structure including newly added inputs. See Section 2.B and Figure 2.
When compared with the previous studies, the proposed network structure stabilizes the network, whose instability is caused by the over-fitting and multicollinearity arising from the high similarity between the input nodes of the multi-layer network structure.
3) Performance criteria for neurons to be selected in each layer: Probability theory for effective neuron selection through the exponential-based roulette selection technique. See Section 2.C.
The nodes in FPNN are selected based on the performance index (PI) of the training dataset. Thus, the model is over-fitted toward the training dataset, and this design methodology results in the degradation of generalization ability.
In this study, we consider probability mechanisms for effective neuron selection through the exponential-based roulette selection. As a compromise technique, roulette node selection based on probability theory enables the proposed network structure to alleviate the over-fitting phenomenon and to obtain a better EPI in the nodes of each layer.
4) Learning method: L2-norm regularization-based LSE learning. See Section 2.D.
In the existing FPNN, the coefficients of the consequent part of a fuzzy rule are trained by LSE-based learning. As the network grows deeper, the patterns of the input variables in a node become very similar, and this leads to over-fitting caused by the multicollinearity problem in LSE learning. The singular matrix produced by the multicollinearity interrupts the training of the coefficients, and this phenomenon occurs more frequently in the deep layers.
To overcome this problem, in this study, we apply LSE-based learning with the aid of L2-norm regularization. The learning process is the same as in FPNN, but the problem caused by multicollinearity is alleviated by the L2-norm regularization; as a result, the proposed model can have a deeper architecture than the FPNN.
5) Optimization: Optimization of the node's structure by multi-particle swarm optimization. See Section 3 and Figures 5, 6, and 7.
The structure of each node is determined by particle swarm optimization. Because the node consists of a fuzzy model, we require the proper selection of structural factors such as the number of input variables, the selected input variables, the number of membership functions, and the polynomial types. The previous models (FPNN) use a genetic algorithm with a single objective function, namely the performance index (PI) determined by the training dataset. As a result, the optimization technique using the PI-based objective function carries a risk of over-fitting.
In this study, we consider multi-particle swarm optimization (MPSO), which uses multiple objective functions rather than a single objective function to prevent the over-fitting of the model. In the proposed model, three objective functions, namely the weighted Performance Index (MPI), the Sum of Squared Coefficients (SSC), and the Entropy (H), are used to consider the performance, the deviation of coefficients, and the structural complexity, respectively.

A. HIERARCHICALLY REORGANIZED MULTI-LAYER FUZZY NEURAL NETWORKS BASED ON FUZZY SET-BASED POLYNOMIAL NEURONS
The hierarchically reorganized multi-layer fuzzy neural network (HRmFNN) is an "if-then" rule-based fuzzy network with an extended structure of the premise and the consequence parts of the fuzzy rules. Each layer consists of fuzzy set-based polynomial neurons (FsPN) generated by the newly added parallel layer structure. These neurons fully reflect the regularity (inherent pattern) involved in numeric data, which are granulated with the aid of fuzzy rules and fuzzy set inference. The polynomial neuron dwells on the concepts of a collection of fuzzy membership functions and nonlinear polynomial processing. The number of input variables and the membership functions defined on them determine the partition of the input space, which can be constructed by considering some relationships between inputs and output [17]. The FsPN encapsulates a family of nonlinear "if-then" rules. When arranged together, FsPNs form a neural network architecture. This neuron, which is regarded as a generic type of processing unit, dwells on the concepts of fuzzy sets and neural networks. As visualized in Figure 1, the FsPN consists of two basic functional modules. The first one is the fuzzy part. Fuzzy modeling based on input and output experimental data has become a standard method for dealing with uncertain nonlinear dynamic systems [1], [7]. The fuzzy reasoning usually discussed in the literature decomposes the input space into fuzzy subspaces and then approximates the system in each subspace by linear regression models [12], [22]. The rules of each node of HRmFNN are constructed by means of a fuzzy partitioning of the input space, implemented based on the fuzzy granularity of that space. The approach taken to obtain the shape of any particular membership function is usually oriented to a given application. Here, we usually consider triangular and Gaussian membership functions. The second part refers to the function-based polynomial processing that involves some input variables.
The activation levels of the individual rules contribute to the output of the FsPN, which is computed as a weighted average of the individual consequent polynomials $P_j(x)$. In the fuzzy rules of the FsPN, $R_j$ is the j-th fuzzy rule, $x_k$ is the k-th input variable, $u_j$ is the membership value of the j-th fuzzy rule, and $P_j(x)$ stands for the polynomial of the consequence part. The polynomial types used in the proposed model are shown in Table 2. The notation used in Figure 1 requires some clarification. The "circles" denote units of the FsPN, "N" refers to a normalization procedure that is applied to the membership grades, and "Π" and "Σ" denote the product and the summation of all incoming signals, respectively. The output z of the FsPN is determined as follows:
$$z = \sum_{j=1}^{K} \hat{u}_j P_j(x)$$
where K stands for the number of fuzzy rules and $\hat{u}_j$ is the normalized fuzzy membership value of the j-th fuzzy rule. Compared with the fuzzy relation, partitioning the fuzzy space by fuzzy sets reduces the number of fuzzy rules and therefore the number of parameters required to form the model, which reduces the complexity of the model and alleviates the over-fitting problem that emerges as the number of layers increases.
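The output computation of a single FsPN can be illustrated with a minimal Python sketch. The function name `fspn_output` and the example values are hypothetical and not taken from the paper; the sketch assumes that the membership values and the consequent polynomial values of all rules have already been evaluated for one input pattern.

```python
import numpy as np

def fspn_output(u, consequents):
    """Output of one FsPN for a single input pattern (illustrative sketch).

    u           : raw membership values u_j of the K fuzzy rules
    consequents : values P_j(x) of the K consequent polynomials
    """
    u = np.asarray(u, dtype=float)
    p = np.asarray(consequents, dtype=float)
    u_hat = u / u.sum()              # "N" unit: normalize the membership grades
    return float(np.dot(u_hat, p))   # product and summation of incoming signals

# Two rules with membership values 0.25 and 0.75 and consequent values 2.0 and 4.0
z = fspn_output([0.25, 0.75], [2.0, 4.0])
```

In this example the normalized membership values equal the raw ones because they already sum to one, so the output is the weighted average 0.25 * 2.0 + 0.75 * 4.0.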

B. ARCHITECTURAL DESIGN OF HIERARCHICALLY REORGANIZED MULTI-LAYER FUZZY NEURAL NETWORKS
In contrast to the typical architectures encountered in FPNN, the main challenges in this study are targeted at stabilizing the multilayer network structure designed with the aid of learning method and structural changes for improving the performance of HRmFNN.
As shown in (1) of Figure 2, the conventional FsPN-based multi-layered structure and the design methodology of the generic FPNN used in previous works lead to an unstable network structure, caused by the over-fitting and multi-collinearity that occasionally occur in multi-layered network structures [21]. The bias and variance of the model also affect over-fitting and under-fitting. The bias denotes the degree of deviation between the expected output of the learning algorithm and the target output, that is, the fitting ability of the algorithm (model). A high bias indicates that the prediction result (model output) is very different from the real result (target output) [10]. Variance represents the difference between training and test results in the same model (viz. a model constructed by the N-fold cross-validation method). Highly complex models become unstable under changes in the training and testing data. Low bias and low variance produce a low error, but they are often not compatible. If we want to reduce the bias of the model, the variance of the model increases to some extent, and vice versa [22].
As shown in (2) and (3) of Figure 2, the proposed HRmFNN structure additionally considers two types of parallel connection networks: the previous layer-based parallel network structure and the 1st layer (with original input variables)-based parallel network structure. They are intended to stabilize (viz. to reduce both the bias and the variance of) the unstable network structure caused by over-fitting and multicollinearity. As shown in (4) of Figure 2, Figure 2(a) shows an under-fitting model with high deviation and low variance, Figure 2(b) shows the low deviation and low variance of the nearly optimized network structure of the proposed model, and Figure 2(c) shows an over-fitting model with low deviation and high variance. Ideally, the smaller the bias and variance are, the better the performance of the model becomes, but this is nearly impossible because most models are exposed to the problem called the bias-variance dilemma.
A summary of the characteristics of the conventional model (FPNN) and the proposed HRmFNN, evaluated vis-à-vis the relationship between bias and variance, is presented as follows:
a) In the case of Bias: High and Variance: Low
In the early stage of training (up to the first three layers, regarded as shallow layers), the model's ability to fit is not strong enough, and thus the bias is relatively large. This is the case of under-fitting with low variance and high bias, as shown in Figure 2(a).
b) In the case of Bias: Low and Variance: Low
Figure 2(b) shows the bias-variance tradeoff of the optimized model. Low bias and low variance yield a low error; however, they are not compatible in most cases. If we want to reduce the bias of the model by increasing its complexity, the variance of the model increases to some extent, and vice versa. Therefore, we need to determine proper structural design parameters, such as the number of layers and rules, as well as methodological design parameters related to both the proposed performance criterion and the learning technique, in order to obtain performance close to low bias and low variance in HRmFNN.
In particular, HRmFNN takes into consideration a parallel architecture with newly added layers. When constructing the proposed multi-layered network architecture, two types of parallel network structures are considered, as shown in (2) and (3) of Figure 2.
c) In the case of Bias: Low and Variance: High
As training deepens (through an increase in the number of layers as well as fuzzy rules), the fitting ability of the model gradually improves, and even the noise and outliers in the training data are learned by the network model. After sufficient training (up to deep layers), the model's ability to fit is very strong, and a slight disturbance of the training data, such as noise or an outlier, leads to significant changes such as an increase in the variance of the model. Such changes result in over-fitting, which is visible in the test results as shown in Figure 2(c); the proposed design methodologies alleviate this over-fitting and decrease the variance of the proposed model.
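The reorganization of layer inputs can be illustrated with a simplified Python sketch. It is an assumption-laden reading of (2) and (3) of Figure 2, not the authors' implementation: the helper `next_layer_inputs` is hypothetical, and it simply appends the outputs of the previous layer and/or the original input variables to the outputs of the nodes selected in the current layer, so that the candidate inputs of the next layer are less similar to each other.

```python
import numpy as np

def next_layer_inputs(selected_outputs, previous_outputs=None, original_inputs=None):
    """Assemble candidate inputs for the next layer (illustrative sketch).

    selected_outputs : (m, s) outputs of the nodes selected in the current layer
    previous_outputs : optional (m, p) outputs of the previous layer
                       (assumed reading of the previous layer-based parallel structure)
    original_inputs  : optional (m, d) original input variables
                       (assumed reading of the 1st layer-based parallel structure)
    """
    blocks = [np.asarray(selected_outputs, dtype=float)]
    if previous_outputs is not None:
        blocks.append(np.asarray(previous_outputs, dtype=float))
    if original_inputs is not None:
        blocks.append(np.asarray(original_inputs, dtype=float))
    return np.hstack(blocks)  # columns are the candidate input variables

# 5 data points, 2 selected node outputs, 3 original input variables
Z = np.zeros((5, 2))
X = np.ones((5, 3))
candidates = next_layer_inputs(Z, original_inputs=X)
```

Each column of `candidates` would then be available to the nodes of the next layer as a possible input variable.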

C. EXPONENTIAL-BASED ROULETTE SELECTION TECHNIQUE
The better the modeling performance of a node is, the higher the possibility that the associated node can be chosen becomes. This means that a node, which is inferior to other better nodes in the current state and may have the potential to enhance the performance of the whole networks later, cannot be selected to compose the current layer. In other words, the selection after sorting in terms of the modeling performance may discard the useful nodes which can contribute to the improvement of the performance.
In order to reduce the possibility that several potential nodes are discarded on the ground that they do not show good ability in the current layer, we use the roulette wheel selection which is a statistical selection method based on the probability which is defined based on the performance index. In this paper, we use RMSE as the performance index.
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ is the target output, $\hat{y}_i$ is the output of the model, and m is the number of data. The probability of each node being selected among all nodes decreases as its performance index (RMSE) increases. In other words, the better the modeling performance of a node, the smaller its error and the higher the probability of this node being selected. Here we use an exponential function as the fitness function for the probability of each node being selected; it is shown in Figure 3. The fitness value $f_k$ of the k-th node is obtained by applying this exponential function to the RMSE of the node. The probability $P_k$ for each individual is defined in the form
$$P_k = \frac{f_k}{\sum_{j} f_j}$$
where $f_k$ means the fitness value of the k-th node and the sum runs over all candidate nodes. A series of k random numbers is generated and compared against the cumulative probability $Q_i = \sum_{j=1}^{i} P_j$ of the nodes. The appropriate individual k is selected and applied as a new input variable, which can be used as an input variable of a node in the next layer, if the following condition is satisfied:
$$Q_{k-1} < U(0,1) \le Q_k, \quad Q_0 = 0$$
where U(0,1) denotes a random number drawn from the uniform distribution over [0,1]. Algorithm 1 describes each step of the exponential-based roulette selection (ERS) technique in detail, and Figure 4 shows the entire ERS process.
The goal of most models is to enhance the generalization ability learned from training data in order to predict unseen data well. Over-fitting happens when the model learns too much from a training dataset that includes noise and outliers, so that the constructed model cannot generalize well and performs poorly on the testing dataset. It is a very common problem when the dataset is too small compared with the number of model parameters that need to be learned. This problem is particularly acute in the multi-layered FPNN, which has many parameters.

Algorithm 1 Exponential-Based Roulette Selection Technique
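The steps of Algorithm 1 are not reproduced in this text, so the following Python sketch only illustrates the idea under stated assumptions. The fitness form `exp(-RMSE)` is an assumed example of an exponential fitness function (the exact function is the one shown in Figure 3), nodes are drawn without repetition, and the function name `ers_select` is hypothetical.

```python
import numpy as np

def ers_select(rmse, n_select, rng):
    """Exponential-based roulette selection of nodes (illustrative sketch)."""
    rmse = np.asarray(rmse, dtype=float)
    if n_select > rmse.size:
        raise ValueError("cannot select more nodes than there are candidates")
    fitness = np.exp(-rmse)            # assumed fitness: smaller RMSE -> larger fitness
    prob = fitness / fitness.sum()     # selection probability P_k
    cum = np.cumsum(prob)              # cumulative probability Q_k
    chosen = []
    while len(chosen) < n_select:
        r = rng.uniform(0.0, 1.0)      # random number from U(0, 1)
        k = int(np.searchsorted(cum, r))   # first k with Q_{k-1} < r <= Q_k
        k = min(k, cum.size - 1)       # guard against round-off in the last Q_k
        if k not in chosen:            # assumption: a node is selected at most once
            chosen.append(k)
    return chosen, prob

rng = np.random.default_rng(0)
chosen, prob = ers_select([0.1, 0.5, 0.9, 1.3], n_select=2, rng=rng)
```

Unlike selection after sorting, a node with a larger RMSE keeps a nonzero probability of entering the next layer, which is the property the technique relies on to preserve node diversity.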
Regularization is an important method in alleviating overfitting [20]. Furthermore, some techniques of regularization can be used to reduce model capacity while maintaining accuracy, for example, to drive some values of the parameters to zero.
In regularization, the weight penalty is a standard approach that is widely used for training models. The penalty keeps the weights (coefficients) small, or at zero, unless large error gradients counteract it, which makes models more interpretable. An alternative name for the weight penalty is "weight decay".
The penalty term is specified as the sum of the squared coefficients of a regression model, which is called L2-norm regularization in machine learning, shrinkage in statistics, and weight decay in neural networks. λ is called the regularization coefficient and controls how much we value fitting the data well versus keeping a simple hypothesis.
In fact, many fuzzy neural networks suffer from the over-fitting problem. As the most common and simplest kind of parameter norm penalty, L2-norm regularization is one of the effective methods to alleviate this problem. During the design process, the L2 penalty term is added to the objective function. This reduces the variation between coefficients that leads to multi-collinearity and prevents the degradation of generalization ability.
$$E = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 + \lambda\sum_{p=1}^{n}\left(a_p\right)^2 \quad (5)$$
Here, λ stands for the regularization parameter and $a_p$ are the coefficients of the polynomial. Assume that $(x_1, x_2, \ldots, x_k)$ and y represent the input and output of the training data; then the least square error estimation (LSE)-based learning method with L2-norm regularization can be summarized as shown in Algorithm 2.
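As a worked step, the closed-form estimate used in step 4 of Algorithm 2 follows from setting the gradient of the penalized error to zero. The derivation below is the standard ridge regression argument written in matrix form; it assumes the error function is the sum of squared errors plus the L2 penalty, with X the design matrix, Y the target vector, and A the coefficient vector.

```latex
\begin{aligned}
E(A) &= (Y - XA)^{T}(Y - XA) + \lambda A^{T}A \\
\frac{\partial E}{\partial A} &= -2X^{T}(Y - XA) + 2\lambda A = 0 \\
(X^{T}X + \lambda I)\,A &= X^{T}Y \\
A &= (X^{T}X + \lambda I)^{-1}X^{T}Y
\end{aligned}
```

For λ > 0 the matrix $X^{T}X + \lambda I$ is positive definite and therefore invertible, even when $X^{T}X$ itself is singular.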

Algorithm 2 Least Square Error Estimation (LSE)-Based Learning Method With L2-Norm Regularization
Input: $(x_1, x_2, \ldots, x_k)$, Y
Output: Coefficient matrix A
1. Set the regularization factor λ.
2. Set $X = (1, x_1, x_2, \ldots, x_k)$ and Y.
3. Take the partial derivative of the error function, including the penalty term $\lambda\sum_p (a_p)^2$, with respect to the coefficients and set it to zero.
4. Estimate the coefficients A by using the formula $A = (X^T X + \lambda I)^{-1} X^T Y$.
For convenience, we consider a linear format of polynomial in the following way:
[Step1] Determine the combination of input variables for the first layer.
Calculate all combinations of input variables to build up the polynomial neuron in the next layer.
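[Step2] Calculate the value of the membership function by a triangular membership function.
$$\mu(x) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right)$$
where a, b, and c denote the centers of membership functions.
[Step3] Normalize the firing strength of each fuzzy rule.
$$\hat{u}_j = \frac{u_j}{\sum_{l} u_l}$$
where $u_j$ is the fuzzy membership value of the j-th rule, $\hat{u}_j$ is the fuzzy membership value of the j-th rule after normalization, and the sum runs over all rules of the node.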
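Steps 2 and 3 can be sketched in Python as follows. The sketch assumes the standard triangular membership function in which b is the peak and a and c are the points where the membership drops to zero; the function names `triangular` and `normalize` and the example values are hypothetical.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership value with peak b and zero membership at a and c (a < b < c)."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)     # rising edge
    right = (c - x) / (c - b)    # falling edge
    return np.maximum(0.0, np.minimum(left, right))

def normalize(u):
    """Normalize membership values over the rules (rows) for every data point (column).

    Assumes at least one rule fires for each data point, so no column sums to zero.
    """
    u = np.asarray(u, dtype=float)
    return u / u.sum(axis=0, keepdims=True)

# One input value covered by two overlapping fuzzy sets
x = np.array([0.25])
u = np.vstack([triangular(x, -1.0, 0.0, 1.0),
               triangular(x, 0.0, 1.0, 2.0)])
u_hat = normalize(u)
```

With evenly spaced, overlapping triangles as in this example, the raw membership values already sum to one, so normalization leaves them unchanged; it matters when the fuzzy sets overlap unevenly or when Gaussian membership functions are used.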
[Step4] For convenience, the expression can be described as follows:
$$Y = X_i A_i$$
where $A_i$ is the vector of coefficients of the i-th consequent polynomial and $X_i$ is a matrix that includes the input data weighted by the normalized membership values. In case the consequent polynomial is linear, $X_i$ and $A_i$ read as follows:
$$X_i = \begin{bmatrix} \hat{u}_{11} & \cdots & \hat{u}_{n1} & x_{11}\hat{u}_{11} & \cdots & x_{11}\hat{u}_{n1} & \cdots & x_{k1}\hat{u}_{11} & \cdots & x_{k1}\hat{u}_{n1} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ \hat{u}_{1m} & \cdots & \hat{u}_{nm} & x_{1m}\hat{u}_{1m} & \cdots & x_{1m}\hat{u}_{nm} & \cdots & x_{km}\hat{u}_{1m} & \cdots & x_{km}\hat{u}_{nm} \end{bmatrix}$$
$$A_i = \begin{bmatrix} a_{10} & \cdots & a_{n0} & a_{11} & \cdots & a_{n1} & \cdots & a_{1k} & \cdots & a_{nk} \end{bmatrix}^T$$
According to the LSE method, the coefficients can be calculated by the following expression:
$$A_i = (X_i^T X_i)^{-1} X_i^T Y \quad (10)$$
The HRmFNN sometimes causes an over-fitting problem as the number of layers increases. To alleviate the over-fitting problem, we use the L2-norm regularization method, which adds the penalty term to the cost function in the following way:
$$A_i = (X_i^T X_i + \lambda I)^{-1} X_i^T Y \quad (11)$$
where I is the identity matrix and λ denotes the regularization parameter. In this study, λ is fixed at 0.01. The coefficients of the polynomial are estimated by the least square error estimation (LSE)-based learning method with L2-norm regularization. L2-norm regularization adds the L2-norm penalty to the objective function, which is known as ridge regression or Tikhonov regularization. In most learning algorithms, regularization plays an important role in improving the performance of the regression model. In the case of ridge regression, the objective function is regularized by adding a penalty term ($\lambda\|a\|^2$) on the parameters in order to limit the parameters of the solution. The simple penalty term takes the form of the sum of the squares of all the coefficients, leading to the objective function in the form presented in [21].
The coefficient λ governs the relative importance of the regularization term ($\lambda\|a\|^2$) compared with the sum of squared errors. In order to alleviate the over-fitting problem of the conventional FPNN, L2-norm regularization is incorporated into the LSE learning method, which estimates the coefficients while keeping their magnitude small by adding a penalty term to the cost function. It is a representative method of reducing the influence of noise by flattening the solution space and lessening the size of the coefficients. In the regularization approach, the penalty term ($\lambda\|a\|^2$) with a reasonable value of λ is included in the associated cost function. The coefficient vector $A_i$ in (10) cannot be obtained if $X_i^T X_i$ is a singular or nearly singular matrix. Therefore, the L2-norm regularization technique adds the penalty term ($\lambda\sum_{p=1}^{n}(a_p)^2$ in (5)), which contributes the additional diagonal elements ($\lambda I$) that make the singular matrix invertible, as shown by $(X_i^T X_i + \lambda I)$ in (11). The over-fitting problem is then alleviated.
Multi-collinearity yields a high-variance model as the correlation between inputs increases. By dealing with the numerical instability of the matrix inversion, L2-norm regularization leads to lower variance and a better prediction model. In the sequel, the combination of this method and the exponential-based roulette selection technique enables the proposed multi-layered network structure to reduce both the over-fitting and the multi-collinearity problems.
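The effect of the added diagonal term can be checked numerically with a small Python sketch. The data are made up for illustration and the function name `ridge_lse` is hypothetical; the two input columns are exact copies of each other, so $X^T X$ is singular and the plain LSE formula cannot be evaluated, whereas the regularized formula with λ = 0.01 has a unique solution.

```python
import numpy as np

def ridge_lse(X, Y, lam=0.01):
    """Estimate A = (X^T X + lam * I)^(-1) X^T Y without forming the inverse explicitly."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_coef = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_coef), X.T @ Y)

# Perfectly collinear inputs: the second column duplicates the first
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([x, x])
Y = 2.0 * x

rank_plain = np.linalg.matrix_rank(X.T @ X)   # rank deficient, so X^T X is not invertible
A = ridge_lse(X, Y, lam=0.01)                 # regularized estimate exists and is unique
```

The regularized solution splits the weight evenly between the two identical columns, with each coefficient close to 1, instead of producing arbitrarily large coefficients of opposite sign.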

III. MULTI-OPTIMIZATION OF OVERALL FRAMEWORK OF ENHANCED ENSEMBLE FUZZY SET-BASED POLYNOMIAL NEURAL NETWORK
As a self-organizing network, the HRmFNN architecture is not fixed in advance but becomes organized through the growth process in which the layers and nodes (neurons) of the network are generated. Along with the use of synergistic design methodologies such as the exponential-based roulette selection technique, LSE-based learning, and the two types of parallel network structures shown in Section 2, the structural and parametric multi-optimization of the network is carried out more effectively for constructing the stabilized and enhanced multi-layered network of HRmFNN [45]. In the training process, each layer of the proposed model uses an optimization algorithm, so training the model takes a lot of computing time; this time is proportional to the number of generations of PSO and to the number of layers of the model. For this reason, we have decided to apply particle swarm optimization to the proposed model, because PSO is conceptually simple, easy to implement, and computationally efficient. Unlike many other heuristic techniques, PSO has a flexible and well-balanced mechanism to enhance global and local exploration abilities.
First, in the design process of HRmFNN, the structure of fuzzy set-based polynomial neuron (FsPN) as a node (neuron) of each layer of the network is optimized. Four kinds of structural parameters of FsPN contain a) the number of input variables, b) a collocation of the specific subset of input variables, c) the number of fuzzy membership functions, and d) the type of polynomial.
Next, after obtaining the optimized FsPN, the optimization design of both nodes selected in the current layer and their ensuing layer (next layer) leads to the optimized HRmFNN structure. The overall optimization process of HRmFNN is schematically displayed in figure 5.
The design procedure for the optimization of HRmFNN comprises the following steps.
A. STEP 1) CONSTRUCT TRAINING DATA, VALIDATION DATA, AND TESTING DATA
Determine the input variables and divide the original dataset into three parts: training data, validation data, and test data. The training and validation data are used to design the HRmFNN, and the test data is used to evaluate the performance of the model.
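A three-way split of this kind can be sketched as follows. The 50%/30%/20% ratios are those stated in the experimental section; the function name `split_dataset`, the shuffling, and the fixed seed are illustrative assumptions.

```python
import random


def split_dataset(n_samples, ratios=(0.5, 0.3, 0.2), seed=0):
    """Return shuffled index lists for training, validation, and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test


# Example with the size of the Boston housing dataset (506 patterns).
train_idx, val_idx, test_idx = split_dataset(506)
```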

B. STEP 2) SPECIFY INITIAL DESIGN PARAMETERS
Several basic design parameters have to be decided: 1) Use prior domain knowledge that affects the proposed network topology. 2) Select a stopping criterion. 3) Set the maximal number of input variables for each node in the corresponding layer. 4) Determine the total number (N) of nodes passed from the current layer to the next layer of the network. 5) Select the network depth of the HRmFNN to reduce the possible conflict between over-fitting and the generalization capability of the network. 6) Determine the depth and width of the network as a tradeoff between the accuracy and the complexity of the overall model.

C. STEP 3) DESIGN FsPN AS NODE(NEURON) OF EACH LAYER
In the design of FsPN, membership functions are used to construct the premise part, while least square error estimation (LSE)-based learning with $L_2$-norm regularization is used to estimate the parameters of the polynomial in the consequence part. Figure 5 visualizes an example of how a particle is interpreted in the optimization of FsPN, that is, how it encodes the number of input variables, the collocation of the specific subset of input variables, the number of membership functions per variable, and the order of the polynomial. The values of these FsPN parameters are determined by PSO, and the parameters to be optimized are shown in Figure 5. The particle values in the example are chosen arbitrarily to explain the optimization process of the FsPN, and the random values shown in the four parts of the particle are rounded to the first decimal digit. The first part of the particle selects the number of input variables, whose range is a real number between 2 and 4; the second part selects the non-repeated input variables, whose range is a real number from 1 to the maximal number of input variables; the third part decides the number of membership functions, which is selected from 2 to 3; and the fourth part decides the polynomial type, which is selected from the four available types (one to four).
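The particle interpretation can be sketched as a small decoding routine. This is an illustration under stated assumptions rather than the encoding of Figure 5: it assumes the four parts encode the number of inputs (2 to 4), the input subset, the number of membership functions (2 to 3), and the polynomial type (1 to 4), and that the input subset is obtained by ranking one real-valued score per candidate input. The names `round_clip` and `decode_particle` are hypothetical.

```python
import math


def round_clip(value, low, high):
    """Round half up, then clip to the integer range [low, high]."""
    return min(max(int(math.floor(value + 0.5)), low), high)


def decode_particle(particle, n_candidates):
    """Decode a real-valued particle into FsPN structural parameters."""
    n_inputs = round_clip(particle[0], 2, min(4, n_candidates))
    scores = particle[1:1 + n_candidates]
    # Ranking the scores yields a subset without repeated inputs.
    ranked = sorted(range(n_candidates), key=lambda i: scores[i], reverse=True)
    selected = sorted(i + 1 for i in ranked[:n_inputs])   # 1-based indices
    n_mf = round_clip(particle[1 + n_candidates], 2, 3)
    poly_type = round_clip(particle[2 + n_candidates], 1, 4)
    return {"n_inputs": n_inputs, "inputs": selected,
            "n_mf": n_mf, "poly_type": poly_type}


# Arbitrary particle for a problem with five candidate input variables.
structure = decode_particle([3.2, 0.7, 0.1, 0.9, 0.4, 0.3, 2.6, 1.8], n_candidates=5)
```

For this particle the decoder selects three inputs (variables 1, 3, and 4), three membership functions per variable, and polynomial type 2.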
The objective function is divided into two parts: the MPI, which is used as the objective function, covers both the training data and the validation data and comes as a convex combination of these two components:

$$\mathrm{MPI} = \theta \cdot \mathrm{PI} + (1 - \theta) \cdot \mathrm{VPI} \qquad (12)$$

where MPI is the weighted performance index obtained by considering both the training and validation datasets, and θ is a weighting factor that helps form a sound balance between training data and validation data. Regarding the choice of θ, we generally consider the following two situations: 1) The value of θ is set to 1. The HRmFNN is optimized only on the basis of the training data, regardless of the validation data. 2) The value of θ is set below 1. In this case, both the training data and the validation data are considered. The choice of θ establishes a trade-off between the approximation and prediction capabilities of the HRmFNN.
Assume that an input-output dataset is denoted as $(\mathbf{x}_k, y_k) = (x_{1k}, x_{2k}, \ldots, x_{nk}, y_k)$, $k = 1, 2, \ldots, m$, where m is the number of data patterns. The performance index (PI) of a model is then defined as in (13), where $\hat{y}$ is the output of the model and m is the overall number of data; VPI and EPI stand for the performance index on the validation and testing datasets, respectively. Most optimization problems encountered in the real world involve more than a single objective. In the multi-optimization part, three objective functions, including MPI, are used to evaluate the accuracy, complexity, and interpretability of HRmFNN: the MPI as PI, the entropy of partition, and the sum of squared coefficients to be estimated in HRmFNN [33].
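The weighted performance index described above can be sketched as follows. The convex-combination form follows from the text (θ = 1 uses the training data only); the use of the root-mean-square error as the performance index is an assumption made for this illustration, and the names `rmse` and `mpi` are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error; an assumed form of the performance index."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mpi(pi_train, pi_val, theta):
    """Convex combination of the training PI and the validation PI."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * pi_train + (1.0 - theta) * pi_val


pi_train = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # one error of size 2
pi_val = rmse([1.0, 2.0], [2.0, 3.0])               # two errors of size 1

only_training = mpi(pi_train, pi_val, theta=1.0)    # ignores validation data
balanced = mpi(pi_train, pi_val, theta=0.5)         # weighs both equally
```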
1) The PI in (13) is the accuracy criterion of the HRmFNN.
2) As a measure for evaluating the coefficient limitation of the fuzzy model, we consider the sum of squared coefficients (SSC) of HRmFNN.
3) As a measure for evaluating the structural complexity of a model, we consider the entropy of the partition (H) [17]. The entropy of partition reflects the degree of overlap between the regions of fuzzy relations for every fuzzy rule. It is computed on the training dataset from $\mu_{jk}$, the fuzzy membership value of the j-th rule for the k-th data pattern.
To optimize the structure selection of FsPN, we carry out the following steps:
1) Select the number of input variables and collocation of the specific subset of input variables based on multi-PSO.
2) Determine the number of membership functions, and the order of polynomial based on the selected input variables with the aid of multi-PSO.
3) Calculate the output of the FsPN by LSE-based learning with $L_2$-norm regularization.
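The two complexity-related objectives named above, the sum of squared coefficients and the entropy of the fuzzy partition, can be sketched as follows. The SSC follows directly from its name. The entropy is written here as a Shannon-type entropy of the normalized rule activations averaged over the data, which is an assumption and may differ from the exact definition used in [17]; both function names are hypothetical.

```python
import math


def sum_of_squared_coefficients(coefficients):
    """SSC: sum of squared consequent coefficients over all rules."""
    return sum(a * a for rule in coefficients for a in rule)


def partition_entropy(memberships):
    """Assumed Shannon-type entropy of the normalized memberships.

    memberships[k][j] is the activation of rule j for data pattern k;
    every row is assumed to have a positive sum.
    """
    total = 0.0
    for row in memberships:
        s = sum(row)
        for mu in row:
            p = mu / s
            if p > 0.0:
                total -= p * math.log(p)
    return total / len(memberships)


ssc = sum_of_squared_coefficients([[1.0, 2.0], [0.5, -0.5]])
crisp = partition_entropy([[1.0, 0.0], [0.0, 1.0]])        # no overlap
overlapping = partition_entropy([[0.5, 0.5], [0.5, 0.5]])  # full overlap
```

Under this assumed form, a crisp partition gives zero entropy and fully overlapping rules give the maximum value, which matches the interpretation of entropy as a degree of overlap.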

D. STEP 4) SELECT NODES(FsPN) AND CONSTRUCT THEIR CORRESPONDING LAYER
Select nodes by PSO and construct the HRmFNN using the training data and validation data, as shown in Figure 6. The overall optimization procedure of the HRmFNN designed with the use of PSO yields the MPSO-optimized HRmFNN.
Through the exponential-based roulette selection technique, node selection is conducted by multi-objective optimization in each layer, which combines the generic network and the added parallel network. As shown in Figure 7, we find the Pareto optimal set A by minimizing the objective functions {MPI, SSC, H}. The Pareto optimal set A is used to select the input neurons entering the next layer.
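The two ingredients of this step, Pareto filtering and roulette selection, can be sketched as follows. The Pareto filter is the standard non-dominated test for minimized objectives. The roulette wheel assumes that a node is drawn with probability proportional to exp(-error / scale), which is one plausible reading of "exponential-based" and not necessarily the rule defined in Section II; the function names, the `scale` parameter, and the candidate values are hypothetical.

```python
import math
import random


def pareto_front(points):
    """Indices of non-dominated points when every objective is minimized."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def exponential_roulette(errors, n_select, scale=1.0, seed=0):
    """Draw n_select distinct indices with weights exp(-error / scale)."""
    rng = random.Random(seed)
    remaining = list(range(len(errors)))
    chosen = []
    while remaining and len(chosen) < n_select:
        weights = [math.exp(-errors[i] / scale) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)   # sampling without replacement
    return chosen


# Hypothetical (MPI, SSC, H) triples for four candidate nodes.
candidates = [(0.30, 5.0, 0.6), (0.25, 7.0, 0.5), (0.40, 9.0, 0.9), (0.28, 4.0, 0.7)]
front = pareto_front(candidates)
selected = exponential_roulette([c[0] for c in candidates], n_select=2)
```

In this example the third candidate is dominated by the first in all three objectives, so the Pareto set contains the other three nodes. Unlike a purely greedy choice, the roulette wheel still gives weaker nodes a nonzero chance of entering the next layer.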

E. STEP 5) MODEL EVALUATION
Use the testing data to calculate the performance index (EPI) of each FsPN. The M nodes (neurons) with the best performance (EPI) are selected from all FsPNs of each layer, and one of these M nodes becomes the final result.
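The evaluation step can be sketched in a few lines, assuming that EPI is an error measure (lower is better) and that the final result is the best of the M retained nodes. The function name `select_best_nodes` and the EPI values are hypothetical.

```python
def select_best_nodes(epi_values, m):
    """Return the indices of the m nodes with the lowest EPI, best first."""
    order = sorted(range(len(epi_values)), key=lambda i: epi_values[i])
    return order[:m]


epi = [3.8, 2.9, 4.1, 3.1, 3.5]           # hypothetical EPI of five nodes
best_nodes = select_best_nodes(epi, m=3)  # nodes kept in this layer
final_node = best_nodes[0]                # node reported as the final result
```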

IV. EXPERIMENTAL STUDIES
In this section, a series of experiments is performed to evaluate the performance of the proposed neural network. We use benchmark machine learning datasets for statistical analysis as well as performance comparison. The experiments are carried out through five-fold cross-validation, and the entire dataset is divided into three parts: training data (50%), validation data (30%), and test data (20%). In the experiments, we evaluate the performance index (PI) of HRmFNN with the aid of $L_2$-norm regularization for each neuron. The objective functions of MPSO are the three measures MPI, SSC, and H. In MPSO, the parameters $c_1$, $c_2$, $w_{min}$, $w_{max}$, $v_{max}$, and the mutation ratio are set to 2.0, 2.0, 0.4%, 0.6%, 20%, and 0.5, respectively. These values were selected by trial and error in the experiments. The parameters used in the experiments are shown in Table 3.

A. BOSTON HOUSING(BH) DATASET
This dataset concerns real estate in the Boston area (ftp://ftp.ics.uci.edu/pub/machine-learning-databases/housing/housing.data). The median price of the house (MEDV) is considered as the output variable. The dataset includes 506 input-output pairs with 13 input variables. Table 4 shows the performance index of FPNN and the proposed HRmFNN. The performance value is reported as the mean and its standard deviation, and boldface indicates the best performance index on the testing dataset for each model. The conventional model based on FrPN has its best performance in the second layer, because the sharp increase in the number of parameters causes over-fitting from the third layer onward. The proposed HRmFNN based on FsPN performs poorly in the first layer but remains more stable up to the fourth layer and achieves better performance than the conventional FPNN. Table 5 shows the performance comparison for the parallel network structure: the parallel network structure has higher performance and exhibits a higher level of stabilization. Table 6 compares two node selection methods in the HRmFNN structure, PI-based node selection and the exponential-based roulette selection technique. The HRmFNN using the exponential-based roulette selection technique has higher performance as well as a higher level of stability. Figure 8 compares the exponential-based roulette selection technique and the PI-based node selection method in the proposed HRmFNN. Figures 8(a) and 8(c) respectively show the optimal topology generated by PI-based node selection and the topology generated by exponential-based roulette selection under the same conditions in HRmFNN. The number in each node (neuron) is the index of the node that builds the current layer. Figure 8(b) shows that 26 identical nodes are selected by the two node selection methods, and the remaining 15 nodes are different (because the maximum number of nodes in each layer is 30).
Although the current structure constructed by exponential-based roulette selection (ERS) is not yet optimal (7-layer topology), the network composed of its selected nodes performs better in the 7th layer than the network constructed by PI-based node selection (PNS).
As shown in Table 7 and the experimental results, λ = 0 leads to a high possibility of overfitting on the testing dataset, which results from the large values of the polynomial coefficients of the fuzzy rules. When the polynomial coefficients are instead estimated with a nonzero λ, the possibility of overfitting is reduced, which leads to stabilization (viz. the alleviation of overfitting on the testing dataset) as well as higher performance of the model, depending on the value of λ. Table 8 shows the performance index of the optimized HRmFNN; the performance is reported in terms of its mean and standard deviation, and boldface entries denote the best performance of each model. As the number of layers increases, the performance obtained on the testing dataset improves without overfitting. The performance of the proposed HRmFNN using MPSO is much better than that of FPNN. Figure 9 illustrates the details of the optimized topology of the 10th layer of HRmFNN with MPSO; it can be seen that the node with the best performance is generated by the parallel structure with newly added layers (the 29th and 31st nodes of the 8th layer are selected).
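The link between λ and the size of the polynomial coefficients discussed for Table 7 can be illustrated on synthetic data. The sketch assumes the ridge form of the regularized LSE estimate; the data, the grid of λ values, and the function name `ridge_coefficients` are hypothetical and unrelated to the reported experiments.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) a = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
# Nearly collinear inputs: x and a slightly perturbed copy of x.
X = np.column_stack([np.ones_like(x), x, x + 1e-3 * rng.normal(size=30)])
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=30)

lambdas = [1e-6, 1e-3, 1e-1, 1.0]
# Sum of squared coefficients for each value of lambda.
ssc = [float(np.sum(ridge_coefficients(X, y, lam) ** 2)) for lam in lambdas]
```

The sum of squared coefficients does not increase as λ grows. With nearly collinear inputs, a very small λ allows large coefficients of opposite sign that fit the noise, which is the mechanism behind the overfitting observed at λ = 0.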
As shown in Figure 10, the output performance of each layer during layer generation is depicted to contrast the conventional models and the proposed models in terms of the network instability caused by overfitting. In particular, Figure 10 shows that the HRmFNN based on multi-optimization maintains a stabilized multi-layered network structure throughout the growth (generation) of the layers. When multi-optimization is used, the optimization procedure is carried out on the basis of 3-dimensional values (MPI, SSC, and H) to enhance the generalization ability of the network layers.
As shown in Figure 10, the use of multi-optimization makes a clear difference in the stabilization and generalization ability of the deeper network structure during layer growth. As the number of network layers increases and a deeper network architecture is constructed, the absence of optimization causes the output performance to diverge because of the over-fitting problem, while the use of multi-optimization leads to the preferred output performance with a stable network structure. Figures 11 and 12 illustrate these characteristics; Figure 12 shows the results of accuracy versus SSC and accuracy versus entropy in 2-dimensional space, respectively. Table 9 offers a comparative summary of the proposed model in contrast with other models. The approximation and generalization abilities of the proposed model are largely improved in comparison with those of the other models. In particular, when FPNN and HRmFNN are compared as deep multi-layered networks, HRmFNN is a much more stabilized network structure, as shown in Figure 10.

B. AUTOMOBILE MILES PER GALLON(MPG) DATASET
The output variable is the automobile's fuel consumption expressed in miles per gallon. This dataset includes 392 input-output pairs (after the removal of incomplete data points). There are seven input variables: cylinder, displacement, horsepower, weight, acceleration, model year, and origin. Table 10 shows the performance index of the FPNN and the proposed HRmFNN. The performance value is reported as the mean and its standard deviation, and boldface indicates the best performance index on the testing dataset for each model. We compare the output of the proposed HRmFNN with that of the FPNN, and the performance is depicted for the validation and testing datasets during the generation of the layers. HRmFNN achieves a better EPI, both in terms of layer generation and in terms of overfitting of the multi-layered network structure.
Table 11 shows the performance comparison for the parallel network structure. The parallel network structure has higher performance and stabilizes better. Table 12 compares two node selection methods in the HRmFNN structure, PI-based node selection (PNS) and the exponential-based roulette selection (ERS) technique. The HRmFNN using the exponential-based roulette selection technique has higher performance as well as better stabilization. Figure 13 compares the exponential-based roulette selection technique and the PI-based node selection method in the proposed HRmFNN. Figures 13(a) and 13(c) respectively show the optimal topology generated by PI-based node selection and the topology generated by exponential-based roulette selection under the same conditions in HRmFNN. The number in each node (neuron) is the index of the node that builds the current layer. Figure 13(b) shows that 30 identical nodes are selected by the two node selection methods, and the remaining 18 nodes are different (because the maximum number of nodes in each layer is 30). Although the current structure constructed by ERS is not yet optimal (6-layer topology), the network composed of its selected nodes performs better in the 6th layer than the network constructed by PNS.
As shown in Table 13, λ = 0 leads to a high possibility of overfitting on the testing dataset because of the larger values of the polynomial coefficients of the fuzzy rules. When the polynomial coefficients are instead estimated with a nonzero λ, the possibility of overfitting is reduced, which leads to stabilization (viz. the alleviation of overfitting on the testing dataset) as well as higher performance of the model, depending on the value of λ. Table 14 shows the performance index of the optimized HRmFNN; the performance is reported in terms of its mean and standard deviation, and boldface entries denote the best performance of each model. As the number of layers increases, the performance obtained on the testing dataset improves without overfitting. The performance of the proposed HRmFNN using MPSO is much better than that of FPNN. Figure 14 shows a series of performance values for the testing dataset as the number of layers increases. The conventional FPNN, the HRmFNN based on FsPN, and the HRmFNN with the ERS node selection technique all overfit in the third layer and higher. Although there is no overfitting tendency in the HRmFNN with parallel structure, its performance seems to be unstable during the growth of the layers. On the other hand, the HRmFNN based on MPSO keeps stable performance without overfitting until the 10th layer. Figures 15 and 16 depict the Pareto fronts generated using MPSO; we can notice that there is an interesting accuracy-interpretability tradeoff. As the entropy (H) of fuzzy partition

C. OTHER DATASETS
To evaluate the performance of the proposed models and the effect of the composite kernel function, different algorithms are compared on 14 well-known benchmark datasets. These datasets are obtained from the University of California at Irvine (UCI) Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html). Table 16 presents a summary of the datasets. Table 17 shows the performance index of the conventional FPNN and the proposed HRmFNN. The testing results of HRmFNN are better in terms of both the complexity and the performance of the model. To further analyze whether the proposed model is statistically significantly better than the other comparative models, we use the Bonferroni-Dunn test in Table 18, which fits situations where all models are compared only to the control model and not among themselves [40]. The performance of two models is significantly different if their average ranks differ by at least the critical difference (CD). At p = 0.10 (significance level), the CD value is 1.50. Table 18 covers the differences in average rank between the five comparative models (Weka) and the proposed models, as well as the comparison results with CD. Since the difference between the average rank of all comparative models and the proposed HRmFNN with MPSO is greater than CD (2.
In this application, we used the proposed model to estimate cement compressive strength (CCS). Cement is one of the most widely used building materials in the world, and its physical characteristics greatly affect the safety of a building. Among these physical properties, CCS is the most important physical and mechanical property reflecting the quality of cement. The extracted 3D microstructure image data of cement is shown in Figure 17. First, cylindrical cement samples were scanned by micro-computed tomography (µ-CT) to obtain their 3D microstructure images [38].
Second, by capturing the cubic volume of interest (VOI) to generate an image dataset, the influence of the air film outside the sample area can be eliminated. Third, the 3D microstructure image features are extracted. The detailed extraction process, described in [39], uses the gray-level histogram (GLH) and the gray-level co-occurrence matrix (GLCM) to describe the characteristics of 3D images. For further statistical analysis, the feature values of GLH and GLCM are calculated as 3D microstructure image features [39]. The CCS dataset has 56 input dimensions, and the output is the compressive strength of cement. Table 19 shows the performance index of HRmFNN with MPSO, and the comparison of performance is summarized in Table 20. Across the experiments, the proposed HRmFNN performs better than the other models.
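To make the GLH and GLCM descriptors concrete, the following sketch computes both for a tiny synthetic volume. It is not the feature extraction pipeline of [39]: the single voxel offset, the two statistics (contrast and energy), the function names, and the toy volume are assumptions chosen for illustration.

```python
import numpy as np


def gray_level_histogram(volume, levels):
    """Normalized histogram of integer gray levels in a 3D volume."""
    counts = np.bincount(volume.ravel(), minlength=levels)
    return counts / counts.sum()


def glcm_3d(volume, levels, offset=(0, 0, 1)):
    """Normalized co-occurrence matrix for one non-negative voxel offset."""
    dz, dy, dx = offset
    nz, ny, nx = volume.shape
    # Pair each voxel with its neighbour at the given offset.
    src = volume[:nz - dz, :ny - dy, :nx - dx].ravel()
    dst = volume[dz:, dy:, dx:].ravel()
    matrix = np.zeros((levels, levels), dtype=float)
    np.add.at(matrix, (src, dst), 1.0)   # accumulate repeated index pairs
    return matrix / matrix.sum()


def glcm_features(p):
    """Contrast and energy, two common GLCM statistics."""
    i, j = np.indices(p.shape)
    return {"contrast": float(np.sum(p * (i - j) ** 2)),
            "energy": float(np.sum(p ** 2))}


# Tiny synthetic volume with 2 gray levels (a stand-in for a cement VOI).
volume = np.array([[[0, 0, 1, 1],
                    [0, 0, 1, 1]]])
hist = gray_level_histogram(volume, levels=2)
p = glcm_3d(volume, levels=2)
features = glcm_features(p)
```

In practice such statistics would be computed for several offsets and gray-level quantizations and then concatenated into the input feature vector of the model.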

V. CONCLUSION
In this study, the hierarchically reorganized multi-layer fuzzy neural networks (HRmFNN) architecture is developed as a multi-layered self-organizing network constructed through a hierarchical layer generation process that includes newly added layers. The proposed multi-layered self-organizing network structure is realized with the aid of several synergistic techniques: multi-PSO, additional parallel structure design, LSE-based learning with $L_2$-norm regularization, and the exponential-based roulette selection technique. In the deep multi-layered network structure, the multi-PSO based design combined with these synergistic techniques can effectively produce the preferred and stabilized network structure through the growth of layers by considering the data characteristics, the data dimensionality, the size between input variables, and other factors.
When the dataset is complex and more sensitive to overfitting, the multi-PSO design based on these synergistic techniques can, to a certain extent, enhance the diversity as well as the selection of the layer and node structure in the multi-layered network. This leads to a deeply stabilized multi-layered network architecture through the alleviation of the over-fitting phenomenon caused by multi-collinearity. In the series of experimental studies, the proposed HRmFNN is stabilized much more effectively than the conventional models from the viewpoint of deeply layered structure as well as performance.
Possible future studies might focus on exploring structural design methodologies that make the proposed network more stable, better performing, and more deeply layered, so that it can cope with classification as well as regression problems.