Robust T-S Fuzzy Model Identification Approach Based on FCRM Algorithm and L1-Norm Loss Function

Takagi-Sugeno (T-S) fuzzy model identification is a powerful tool for modelling complicated nonlinear systems. However, the traditional T-S fuzzy model typically uses the L2-norm loss function, which is very sensitive to outliers and noise, so an unreliable model may be obtained when the data are contaminated. In this paper, an outlier- and noise-robust T-S fuzzy model identification method based on fuzzy c-regression model (FCRM) clustering and the L1-norm loss function is proposed. Hyper-plane-shaped clustering algorithms have been shown to be more effective than hyper-sphere-shaped clustering algorithms in T-S fuzzy model identification. Therefore, the FCRM clustering algorithm is used for structure identification in the antecedent part. Many related studies have pointed out that the L1-norm loss function is more robust to outliers and noise than the L2-norm loss function. To reduce the negative influence of outliers and noise, the L1-norm loss function is employed instead of the L2-norm loss function in the consequent part to enhance the robustness of the T-S fuzzy model. Regression and classification applications are used to demonstrate the validity of the proposed method. The experimental results show that the proposed method significantly improves modelling accuracy on data contaminated by outliers and noise compared with other algorithms.


I. INTRODUCTION
With the rapid development of artificial intelligence techniques, data-driven modelling methods [1]-[7] are playing an important role in modelling complicated nonlinear systems in the age of big data, and the performance of these methods relies crucially on the quality of the given training data. However, training data are inevitably contaminated by outliers or noise because of sampling errors, instrument errors and modelling errors, which can significantly reduce the accuracy and reliability of the established model. Therefore, it is necessary to develop outlier- and noise-robust data-driven modelling methods. The Takagi-Sugeno (T-S) fuzzy model proposed by Takagi and Sugeno is a very effective data-driven modelling method [8]. It has the following merits: (1) easy implementation [9], [10]; (2) high interpretability [11], [12]; (3) universal approximation capability [13]-[15]. Therefore, the T-S fuzzy model has been successfully and widely applied in high-precision modelling and in many control problems [16]-[19].
The associate editor coordinating the review of this manuscript and approving it for publication was Ligang Wu.
Although the T-S fuzzy model has many advantages, its performance deteriorates heavily in the presence of outliers or noise. The traditional T-S fuzzy model identification approach is very sensitive to outliers and noise because its L2-norm loss function is easily dominated by them [20]. An unreliable T-S fuzzy model will be obtained when handling data with outliers or noise, and its performance cannot be guaranteed. However, outlier- and noise-robust T-S fuzzy model identification has not yet been widely researched, so it is worth studying.
Although few studies focus on the robustness of the T-S fuzzy model to outliers and noise, related work on other data-driven modelling methods can provide a reference for building a robust T-S fuzzy model. Several outlier- and noise-robust methods have been proposed for regression, classification, fault diagnosis [21], [22] and so on. Deng et al. [23] introduced a weighted least squares scheme and a regularization term to improve the robustness of the extreme learning machine. Zhang et al. [24] used a weighting function to enhance the robustness of the least squares support vector machine to outliers. Although the weighting strategy can improve robustness to some extent, it has defects that limit its practical performance. Appropriate weight estimation relies heavily on a previously established model (here, an initial T-S fuzzy model) and has an important influence on the final output, but there is no guarantee that a good initial model can be obtained.
Many related studies have shown that the L1-norm loss function is more robust to outliers and noise due to its sparsity [26]-[28]. For the L2-norm loss function, the loss grows quadratically with the error. When the training data contain outliers or noise, the regression errors at the outliers dominate the value of the entire loss function, which biases the trained model towards the outliers. In contrast, the L1-norm loss grows only linearly with the error, so large errors carry far less weight than under the L2-norm loss. For this reason, the L1-norm has been widely applied in handling outliers and noise.
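To make the contrast concrete, the following minimal sketch (an illustration only, not part of the proposed method) fits a single constant to a small sample. The L2-optimal constant is the sample mean and the L1-optimal constant is the sample median; the data values are hypothetical.

```python
import statistics

def l2_loss(c, ys):
    """Sum of squared errors of the constant model c."""
    return sum((y - c) ** 2 for y in ys)

def l1_loss(c, ys):
    """Sum of absolute errors of the constant model c."""
    return sum(abs(y - c) for y in ys)

clean = [1.0, 1.1, 0.9, 1.0, 1.05]
contaminated = clean[:-1] + [11.05]  # hypothetical: last sample replaced by an outlier

# The mean minimizes the L2 loss; the median minimizes the L1 loss.
l2_fit_clean = statistics.mean(clean)
l2_fit_bad = statistics.mean(contaminated)
l1_fit_clean = statistics.median(clean)
l1_fit_bad = statistics.median(contaminated)

print("L2 fit moves by", abs(l2_fit_bad - l2_fit_clean))
print("L1 fit moves by", abs(l1_fit_bad - l1_fit_clean))
```

In this toy case a single outlier drags the L2 fit far from the bulk of the data, while the L1 fit stays where it was.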
The fuzzy c-regression model (FCRM) clustering algorithm, a hyper-plane-shaped clustering algorithm, has been proposed for structure identification of the T-S fuzzy model [29]-[38]. The FCRM still has certain limitations when facing complex datasets with various shapes, sizes and densities, so clustering algorithms capable of handling clusters of arbitrary shape have been proposed [39]-[47]. Although these algorithms perform better than FCRM on such data, the FCRM, which describes the data by several linear regression models, may be more suitable for T-S fuzzy model identification because the final T-S fuzzy model consists of multiple linear regression models. Furthermore, this paper mainly studies the robustness of the T-S fuzzy model to outliers and noise, and its major innovation lies in the identification of the consequent parameters, not in structure identification. Therefore, the most commonly used FCRM clustering algorithm is adopted for T-S fuzzy model identification in this paper.
The hyper-plane-shaped clustering algorithm FCRM has been shown to be more effective than hyper-sphere-shaped clustering algorithms in T-S fuzzy model identification. Thus, a robust T-S fuzzy model identification approach based on the FCRM clustering algorithm and the L1-norm loss function is proposed. The FCRM is used to partition the input-output data space to obtain the optimal data structure of the T-S fuzzy model in the antecedent part. With the obtained antecedent parameters, the L1-norm loss function, which is more robust to outliers and noise, is used in place of the traditional L2-norm loss function to measure the output error in the consequent part. Four regression applications and one classification application are used to verify its effectiveness. The experimental results show that the proposed method significantly improves modelling accuracy on data with outliers and noise.
The rest of this paper is organized as follows. In Section II, the background of the T-S fuzzy model and the FCRM clustering algorithm is introduced. In Section III, the novel robust T-S fuzzy model identification approach is proposed. Section IV presents the experimental results and analysis. Section V presents the conclusion and future work.

II. TAKAGI-SUGENO FUZZY MODEL IDENTIFICATION

A. T-S FUZZY MODEL
Assume that a multiple-input-single-output (MISO) system $P(x, y)$ needs to be identified, in which $x \in \mathbb{R}^M$ is the system input and $y \in \mathbb{R}$ is the system output. The T-S fuzzy model of this system consists of several IF-THEN fuzzy rules:

$$R^i:\ \text{IF } x_1 \text{ is } A_1^i \text{ and } \cdots \text{ and } x_M \text{ is } A_M^i \ \text{ THEN } y^i = \theta_0^i + \theta_1^i x_1 + \cdots + \theta_M^i x_M \tag{1}$$

where $R^i$ ($i = 1, 2, \ldots, c$) is the $i$th fuzzy rule, $c$ is the number of fuzzy rules, $M$ is the dimension of the input vector, and $x = [x_1, x_2, \ldots, x_M]^T$ is the input vector of the fuzzy model, which is composed of system input and output variables. The superscript $T$ denotes matrix transposition. $y^i$ is the output of the $i$th sub-model and $\{\theta_j^i,\ j = 0, \ldots, M\}$ are the consequent parameters of the $i$th sub-model. The final output of the T-S fuzzy model combines the sub-model outputs by weighted-mean defuzzification:

$$y = \frac{\sum_{i=1}^{c} w^i y^i}{\sum_{i=1}^{c} w^i} \tag{2}$$

where the weight $w^i$ denotes the overall membership grade of the input $x$ in the $i$th sub-model. It can be calculated as:

$$w^i = \prod_{j=1}^{M} \mu_{A_j^i}(x_j) \tag{3}$$

where $\mu_{A_j^i}(x_j)$ is the fuzzy membership grade of $x_j$ in the fuzzy set $A_j^i$. Triangular, trapezoidal, Gaussian and other membership functions can be chosen according to the specific problem and experimental results. The Gaussian membership function, which is the most commonly used in T-S fuzzy models, is chosen in this paper (Eq. (4)), where $v_j^i$ and $\sigma_j^i$ represent the center and width of the membership function, respectively.
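The inference procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Gaussian membership is taken as exp(-(x - v)^2 / (2 sigma^2)), which may differ from the paper's Eq. (4) by a scaling constant, the overall weight is taken as the product of the per-input memberships, and the names `gauss` and `ts_output` and all parameter values are hypothetical.

```python
import math

def gauss(x, v, sigma):
    # Assumed Gaussian form; the exact scaling used in Eq. (4) may differ.
    return math.exp(-((x - v) ** 2) / (2.0 * sigma ** 2))

def ts_output(x, centers, widths, thetas):
    """Weighted-mean T-S output for one input vector x (list of M values).

    centers, widths: c lists of M membership centers / widths.
    thetas: c lists of M + 1 consequent parameters [theta_0, ..., theta_M].
    """
    weights, sub_outputs = [], []
    for v_i, s_i, th_i in zip(centers, widths, thetas):
        w = 1.0
        for xj, vj, sj in zip(x, v_i, s_i):
            w *= gauss(xj, vj, sj)  # product of per-input memberships (assumption)
        weights.append(w)
        sub_outputs.append(th_i[0] + sum(t * xj for t, xj in zip(th_i[1:], x)))
    total = sum(weights)  # can underflow to 0 far from every center
    return sum(w * y for w, y in zip(weights, sub_outputs)) / total

# Hypothetical two-rule, single-input model.
centers = [[-1.0], [1.0]]
widths = [[0.5], [0.5]]
thetas = [[0.0, 1.0], [2.0, -1.0]]
y_mid = ts_output([0.0], centers, widths, thetas)
y_left = ts_output([-1.0], centers, widths, thetas)
```

Midway between the two centers both rules fire equally, so the output is the plain average of the two sub-model outputs; near one center the corresponding sub-model dominates.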

B. FUZZY C-REGRESSION MODEL CLUSTERING ALGORITHM
To obtain the center and width of the membership function, clustering algorithms are employed for fuzzy space partition. The FCRM has been used to partition the input-output data.
Given $N$ input-output data pairs $(x_k, y_k)$ ($k = 1, \ldots, N$), the FCRM divides them into $c$ subclasses, in which the data samples of the $i$th subclass are described by a linear regression model (Eq. (5)), which is a hyper-plane function. Here $x_k = [x_{k1}, \cdots, x_{kM}]^T$ is the $k$th input vector, ''$\cdot$'' denotes the product of two vectors, and $\omega_i$ is the parameter vector of the $i$th linear regression model. The distance between the $k$th sample and the $i$th linear regression model is defined in Eq. (6), and the objective function of the FCRM in Eq. (7), where $m \in (1, \infty)$ is the fuzzy weighting exponent, often set to 2, and $u_{ik} \in [0, 1]$ is the fuzzy membership degree of the $k$th sample in the $i$th cluster. The fuzzy membership degree $u_{ik}$ and the linear regression model parameter $\omega_i$ are obtained by minimizing the objective function in Eq. (7) [29], which yields the update equations Eq. (8) and Eq. (9), where $y = [y_1, \ldots, y_N]^T$ is the actual system output. By iterating Eq. (8) and Eq. (9), the fuzzy membership degree $u_{ik}$ can be obtained.
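The alternating scheme can be sketched for the simplest case of a scalar input. The sketch below follows the usual fuzzy c-regression recipe and should be read as an assumption rather than a transcription of Eqs. (5)-(9): the distance is taken as the squared residual, the membership update uses the exponent 1/(m - 1), each regression model is refitted by weighted least squares with weights u^m, and the function name `fcrm_1d`, the data and the initial memberships are hypothetical.

```python
def fcrm_1d(xs, ys, u, m=2.0, iters=20, eps=1e-12):
    """Alternate weighted line fits and membership updates (scalar input)."""
    c, n = len(u), len(xs)
    lines = []
    for _ in range(iters):
        lines = []
        for i in range(c):
            # Weighted least squares for y ~ slope * x + intercept, weights u^m.
            w = [u[i][k] ** m for k in range(n)]
            sw = sum(w)
            sx = sum(wk * x for wk, x in zip(w, xs))
            sy = sum(wk * y for wk, y in zip(w, ys))
            sxx = sum(wk * x * x for wk, x in zip(w, xs))
            sxy = sum(wk * x * y for wk, x, y in zip(w, xs, ys))
            slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
            lines.append((slope, (sy - slope * sx) / sw))
        for k in range(n):
            # Squared residual to each line, floored to avoid division by zero.
            d = [max((ys[k] - (a * xs[k] + b)) ** 2, eps) for a, b in lines]
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / dj) ** (1.0 / (m - 1.0)) for dj in d)
    return lines, u

# Hypothetical data: ten points on y = 2x + 1 followed by ten on y = -x + 4.
xs = [float(x) for x in range(10)] * 2
ys = [2.0 * x + 1.0 for x in xs[:10]] + [-x + 4.0 for x in xs[10:]]
u0 = [[0.8] * 10 + [0.2] * 10, [0.2] * 10 + [0.8] * 10]  # rough initial guess
lines, u = fcrm_1d(xs, ys, u0)
```

Like other fuzzy clustering schemes, the result depends on the initial memberships; the rough guess used here is chosen so that each cluster starts closer to one of the two lines.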

C. IDENTIFICATION OF ANTECEDENT PARAMETERS
The main task in identifying the antecedent parameters is to obtain a reasonable center and width of the membership function for each sub-model. Once the fuzzy membership matrix has been obtained, the antecedent parameters, i.e. the center $v_j^i$ and width $\sigma_j^i$ of each fuzzy membership function, can be calculated by Eqs. (10)-(11). The consequent parameters can then be calculated from the following matrix equation:

$$y = A\theta \tag{12}$$

where $\theta$ is the vector of consequent parameters of the T-S fuzzy model and $A$ is the coefficient matrix (Eq. (13)), which can be calculated according to Eqs. (2)-(4) from the quantities $\lambda_k^i$ defined in Eq. (14). The least squares method is used to solve the matrix equation (12) for the consequent parameter vector $\theta$:

$$\theta = (A^T A)^{-1} A^T y \tag{15}$$

where the superscript ''$-1$'' denotes the matrix inverse.
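Because the defuzzified output is linear in the consequent parameters once the rule weights are fixed, each sample contributes one row to the coefficient matrix. The sketch below shows one plausible row layout, assuming that $\lambda_k^i$ is the normalized rule weight and that the parameters are ordered rule by rule with the bias term first; the ordering, the function name `design_row` and the numbers are assumptions for illustration.

```python
def design_row(x, weights):
    """One row of the coefficient matrix for input x and rule weights w^1..w^c."""
    total = sum(weights)
    row = []
    for w in weights:
        lam = w / total                    # normalized rule weight (assumed lambda)
        row.append(lam)                    # multiplies theta_0 of this rule
        row.extend(lam * xj for xj in x)   # multiplies theta_1..theta_M of this rule
    return row

# Hypothetical check with two rules and a single input.
x = [2.0]
weights = [1.0, 3.0]
theta = [0.0, 1.0, 2.0, -1.0]              # [theta_0^1, theta_1^1, theta_0^2, theta_1^2]
row = design_row(x, weights)
y_linear = sum(a * t for a, t in zip(row, theta))

# The same output computed directly as a weighted mean of the sub-model outputs.
y1 = theta[0] + theta[1] * x[0]
y2 = theta[2] + theta[3] * x[0]
y_direct = (weights[0] * y1 + weights[1] * y2) / sum(weights)
```

Stacking such rows for all $N$ samples gives a matrix with $c(M+1)$ columns, so any linear solver can be used for the consequent parameters.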

III. ROBUST T-S FUZZY MODEL IDENTIFICATION
In order to improve the robustness of T-S fuzzy model to outliers and noises, the robust T-S fuzzy model identification approach based on the FCRM algorithm and L1-norm loss function (RTS-L1) is proposed in this paper.

A. WEIGHTED T-S FUZZY MODEL
The weighted T-S fuzzy model (WTS) identification method is used for comparison with RTS-L1 in this paper. It mainly includes three stages. The unweighted least-squares objective function $J(\theta)$ underlying Eq. (15) is given in Eq. (16), where $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$ is the $L_p$ norm. By minimizing this loss function, the optimal solution $\theta$ can be obtained. The objective function in Eq. (16) can be rewritten in matrix form (Eq. (17)). For this unconstrained optimization problem, the optimal solution can be obtained explicitly by setting the gradient of the objective function $J$ (Eq. (18)) to zero: setting $\partial J(\theta) / \partial \theta = 0$ yields the $\theta$ given in Eq. (15).
For the objective function in Eq. (16), the weights of all samples are equal to 1. The main idea of the WTS, however, is to assign weights to the samples based on their training errors. The weighted objective function of WTS is given in Eq. (19), where $W = \mathrm{diag}\{\delta_1, \delta_2, \ldots, \delta_N\}$ is the diagonal matrix of sample weights. Similar to Eq. (16), the objective function in Eq. (19) can be rewritten in matrix form (Eq. (20)), and the optimal solution can be obtained explicitly by setting the gradient of the objective function $J_w$ to zero (Eq. (21)). Similar to Eq. (15), the optimal solution $\theta$ has the following form:

$$\theta = (A^T W A)^{-1} A^T W y \tag{22}$$

The main task is now to determine the sample weights; reasonable weights lead to a more reliable robust T-S fuzzy model. Several methods estimate the weights from the model training errors. Suykens et al. [48] proposed a classical weight estimation method (Eq. (23)), where $\hat{s}$ is a robust estimate of the standard deviation of the training errors (Eq. (24)) and IQR is the interquartile range of the sample errors, i.e. the difference between the 75th and 25th percentiles of the sample errors $e$. The values of $c_1$ and $c_2$ are typically set to 2.5 and 3.
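A minimal sketch of this weighting step is given below. It assumes the commonly cited form of the scheme in [48]: weight 1 when $|e_k/\hat{s}| \le c_1$, a linear taper $(c_2 - |e_k/\hat{s}|)/(c_2 - c_1)$ between $c_1$ and $c_2$, a small constant $10^{-4}$ beyond $c_2$, and $\hat{s} = \mathrm{IQR}/(2 \times 0.6745)$. These details are stated here as assumptions rather than taken from Eqs. (23)-(24), and the percentile convention and function names are likewise illustrative.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (one common convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def robust_scale(errors):
    """IQR-based estimate of the error standard deviation (assumed constant 0.6745)."""
    iqr = percentile(errors, 75) - percentile(errors, 25)
    return iqr / (2 * 0.6745)

def sample_weights(errors, c1=2.5, c2=3.0):
    """Down-weight samples whose standardized training error is large."""
    s_hat = robust_scale(errors)  # zero if the middle half of the errors is constant
    weights = []
    for e in errors:
        r = abs(e / s_hat)
        if r <= c1:
            weights.append(1.0)
        elif r <= c2:
            weights.append((c2 - r) / (c2 - c1))
        else:
            weights.append(1e-4)
    return weights

errors = [-0.2, -0.1, 0.0, 0.1, 0.2, 5.0]  # hypothetical training errors
weights = sample_weights(errors)
```

The resulting values would form the diagonal of $W$; the quality of the weights depends on how good the initial unweighted fit is, which is the limitation of weighting strategies noted in the Introduction.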

B. ROBUST T-S FUZZY MODEL WITH FCRM AND L1-NORM LOSS FUNCTION
Many studies have pointed out that the L1-norm loss function is more robust to outliers and noise than the L2-norm loss function because of its sparsity. Therefore, the L1-norm loss function is employed for the identification of the consequent parameters, and the robust T-S fuzzy model based on FCRM and the L1-norm loss function (RTS-L1) is proposed in this paper. The L1-norm loss function of RTS-L1 has the following form:

$$\min_{e,\,\theta} \|e\|_1 \quad \text{s.t.} \quad y - A\theta = e \tag{25}$$

Eq. (25) is a constrained convex optimization problem, and it can be solved by the Augmented Lagrange Multiplier (ALM) method [49]. The corresponding augmented Lagrangian function is defined as [50]:

$$L_\mu(e, \theta, \lambda) = \|e\|_1 + \lambda^T (y - A\theta - e) + \frac{\mu}{2} \|y - A\theta - e\|_2^2 \tag{26}$$

where $\lambda$ is the vector of Lagrange multipliers and $\mu$ is a constant penalty factor. The ALM algorithm obtains the optimal values of $(e, \theta, \lambda)$ by iteratively updating the three variables (Eq. (27)). The function $S_\alpha$ is the shrinkage operator, $S_\alpha(x) = \mathrm{sign}(x)\max(|x| - \alpha, 0)$, applied elementwise. Based on the above analysis, the detailed process of the proposed robust T-S fuzzy model identification approach with FCRM and the L1-norm loss function is as follows:
Step 1: Data preprocessing. To make sure the sample features are on a similar scale, min-max normalization is applied.
Step 2: Identification of antecedent parameters. Partition the input-output space of the T-S fuzzy model and calculate the fuzzy membership matrix based on the FCRM clustering algorithm. Identify the antecedent parameters according to Eqs. (10)-(11).
Step 5: The consequent parameters vector θ can be obtained.
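The consequent-parameter step can be illustrated with a small ALM loop for an L1-norm line fit. The updates below are derived from the augmented Lagrangian of Eq. (26) by minimizing over $e$ and $\theta$ in turn and then updating the multiplier; they are a sketch under that derivation and are not claimed to match the paper's Eq. (27) line by line. The scalar-input setting, the fixed penalty $\mu = 1$, the iteration count, the function names and the data are all assumptions.

```python
def shrink(x, alpha):
    """Soft-thresholding (shrinkage) operator S_alpha for a scalar."""
    if x > alpha:
        return x - alpha
    if x < -alpha:
        return x + alpha
    return 0.0

def fit_line_ls(xs, ts):
    """Ordinary least squares for t ~ b0 + b1 * x (closed form, scalar input)."""
    n = len(xs)
    sx, st = sum(xs), sum(ts)
    sxx = sum(x * x for x in xs)
    sxt = sum(x * t for x, t in zip(xs, ts))
    b1 = (n * sxt - sx * st) / (n * sxx - sx * sx)
    return (st - b1 * sx) / n, b1

def l1_line_alm(xs, ys, mu=1.0, iters=1000):
    """Minimize sum |y - (b0 + b1 x)| with augmented-Lagrangian updates."""
    b0, b1 = 0.0, 0.0
    lam = [0.0] * len(xs)
    for _ in range(iters):
        r = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        # e-update: shrink the multiplier-shifted residual by 1 / mu.
        e = [shrink(rk + lk / mu, 1.0 / mu) for rk, lk in zip(r, lam)]
        # theta-update: least squares against the shifted targets.
        targets = [y - ek + lk / mu for y, ek, lk in zip(ys, e, lam)]
        b0, b1 = fit_line_ls(xs, targets)
        # multiplier update on the constraint y - A theta - e = 0.
        lam = [lk + mu * (y - (b0 + b1 * x) - ek)
               for lk, x, y, ek in zip(lam, xs, ys, e)]
    return b0, b1

# Hypothetical data: y = 2x + 1 with two gross outliers.
xs = [float(k) for k in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[3] += 10.0
ys[7] -= 8.0
ls_fit = fit_line_ls(xs, ys)   # (intercept, slope) under the L2 loss
l1_fit = l1_line_alm(xs, ys)   # (intercept, slope) under the L1 loss
```

In the full method the line fit would be replaced by the least squares solve with the coefficient matrix $A$; the structure of the loop is otherwise unchanged.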

IV. EXPERIMENTS AND DISCUSSION
In this section, regression and binary classification applications are used to verify the performance of the proposed RTS-L1 against the traditional T-S fuzzy model algorithm (T-S), the extreme learning machine (ELM), the regularized extreme learning machine (RELM) [51] and the weighted T-S fuzzy model (WTS). Outliers and noise are added to the training data to test the robustness of the proposed algorithm; they are added only to the training data, not to the testing data. In this paper, the Root Mean Square Error (RMSE) is used as the performance index, which is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2}$$

where $y_k$ is the $k$th original system output, $\hat{y}_k$ is the $k$th model output, and $N$ is the number of samples.
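For reference, the performance index can be computed with a few lines of Python; the function name `rmse` and the sample values are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between original outputs and model outputs."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
example = rmse([0.0, 0.0], [3.0, 4.0])
```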

A. ROBUSTNESS EVALUATION VIA REGRESSION APPLICATION
In this part, the proposed RTS-L1 is compared with T-S, ELM, RELM and WTS on four regression problems: the SinC function, a nonlinear difference equation, the Box-Jenkins system and the Mackey-Glass chaotic time series. The experiments are carried out in Matlab 9.2.0.538062 (R2017a).

1) SINC FUNCTION
The SinC regression problem is often used to test the approximation ability of an algorithm. The function has the following mathematical expression [52]:

$$y(x) = \begin{cases} \sin(x)/x, & x \neq 0 \\ 1, & x = 0 \end{cases}$$

A total of 200 samples and 10001 samples are generated as the training set and the testing set, respectively, in which the inputs x used for training and testing are drawn from the uniform distribution on [−10, 10]. The input variable of the fuzzy model is x(k) and the output variable is y(k). The number of fuzzy rules is set to 10. First, contamination is added to the training set: outliers randomly selected from the set {−1, 1}, and noise drawn from the interval [−1, 1]. To test the robustness of the proposed algorithm as the number of contaminated points increases, contamination rates of 10%, 20%, 30% and 40% are considered. Fig. 1 shows the regression results on the testing data using T-S, ELM, RELM, WTS and RTS-L1 at the 10%, 20%, 30% and 40% {−1, 1} outlier levels. T-S and ELM are badly affected by outliers, especially T-S. RELM and WTS fit the testing data better than T-S and ELM at low outlier levels (10%-30%), but WTS exhibits fluctuations at the 40% outlier level. The proposed RTS-L1 fits the testing data better than the other four algorithms.
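The data generation and contamination described above can be sketched as follows. The text does not spell out whether a contaminated output is replaced or perturbed; this sketch assumes the outlier or noise value is added to the clean output of a randomly chosen subset of training samples. The function names, the seed and the `kind` flag are hypothetical.

```python
import math
import random

def sinc(x):
    """SinC function with the removable singularity at 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def contaminate(ys, rate, kind, rng):
    """Perturb a fraction `rate` of the outputs (assumed additive contamination).

    kind="outlier" adds a value from {-1, 1}; kind="noise" adds a uniform
    draw from [-1, 1]. Returns the new outputs and the affected indices.
    """
    out = list(ys)
    idx = rng.sample(range(len(ys)), int(round(rate * len(ys))))
    for k in idx:
        if kind == "outlier":
            out[k] += rng.choice([-1.0, 1.0])
        else:
            out[k] += rng.uniform(-1.0, 1.0)
    return out, sorted(idx)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x_train = [rng.uniform(-10.0, 10.0) for _ in range(200)]
y_train = [sinc(x) for x in x_train]
y_bad, idx = contaminate(y_train, 0.10, "outlier", rng)
```

The testing set would be generated the same way but left uncontaminated, in line with the experimental protocol.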
The experiment is repeated 30 times and the average training and testing RMSE are given in Table 1. To compare the performance of the five algorithms intuitively, the bar chart of the testing RMSE is shown in Fig. 2.
From Table 1 and Fig. 2, the traditional T-S makes no distinction between contaminated and normal data, so it achieves a low training error but fails in the testing process. ELM and RELM perform better than T-S, except for ELM at the 40% outlier level. RELM obtains a smaller testing RMSE than ELM owing to its regularization term. Although WTS improves significantly on T-S and ELM, and on RELM at the lower outlier levels, it does not obtain a better result than RELM at the 30% and 40% {−1, 1} outlier levels. The proposed RTS-L1, however, obtains the best result at all outlier levels.
Furthermore, noise drawn from the uniform distribution on [−1, 1] is added to the training data to evaluate the noise robustness of the algorithms. The experimental results for the [−1, 1] noise are shown in Table 2 and Fig. 2.
From Table 2 and Fig. 2, T-S, ELM and RELM obtain similar testing results, while WTS and RTS-L1 achieve significant improvements. Moreover, the testing RMSE of the proposed RTS-L1 is at least one order of magnitude lower than that of WTS.
These experimental results show that RTS-L1 significantly improves the robustness of the T-S fuzzy model to outliers and noise.

2) NONLINEAR DIFFERENCE EQUATION
The nonlinear system is described by a second-order highly nonlinear difference equation [53]. A total of 500 data points are generated for training and 500 for testing, where the input u(k) follows the uniform distribution on the interval [−2, 2] for the training data and is the sinusoidal signal u(k) = sin(2k/25) for the testing data. The number of fuzzy rules is set to four for the T-S, WTS and RTS-L1 algorithms. The fuzzy model has three inputs, u(k), y(k−1) and y(k−2), and one output, y(k). Tables 3 and 4 show the average training and testing RMSE of the five algorithms; see also Fig. 3.

3) BOX-JENKINS SYSTEM
The Box-Jenkins gas furnace data set, consisting of 296 data pairs, describes the nonlinear relationship between the system input (gas flow rate) and the system output (CO2 concentration) [54]. The first 204 data pairs are used as the training set and the remaining data as the testing set. The variable y(k) is taken as the fuzzy model output and the five variables u(k), u(k−1), y(k−1), y(k−2) and y(k−3) are taken as the fuzzy model inputs. Different proportions of {−1, 1} outliers and [−1, 1] noise are added to the training set, and the corresponding experimental results are shown in Tables 5-6 and Fig. 4. Table 5 compares the training and testing RMSE of the different algorithms on the Box-Jenkins data with {−1, 1} outliers. ELM and RELM, which have similar testing RMSE, are better than T-S except at 10% outliers. The result of WTS is slightly better than those of T-S, ELM and RELM at 10% and 20% outliers, but worse at 30% and 40% outliers. Table 6 compares the training and testing RMSE of the different algorithms on the Box-Jenkins data with [−1, 1] noise. T-S, ELM and RELM obtain similar results, and T-S obtains the better result only at low contamination levels. However, RTS-L1 obtains the best result at all outlier and noise levels from 10% to 40% and is superior to the other methods listed in the tables.
FIGURE 7. Training and testing outputs of five algorithms (contamination rate: 10%, 20%, 30% and 40%).

4) MACKEY-GLASS CHAOTIC TIME SERIES
As one of the well-known time series prediction problems, the Mackey-Glass chaotic time series has been widely used to test the learning and generalization ability of different algorithms; it is defined in Eq. (32) [55]. There are 1000 input-output data pairs generated by Eq. (32), of which the first 500 are taken as the training set and the remaining 500 as the testing set. The four values x(t−18), x(t−12), x(t−6) and x(t) are used to predict the future value x(t+6) in the fuzzy model. All experiments are repeated 30 times and the average training and testing RMSE are shown in Tables 7-8. Table 7 shows that WTS and RTS-L1 obtain smaller testing RMSE than T-S, ELM and RELM. WTS and RTS-L1 have similar results at the 10%, 20% and 30% outlier levels, but RTS-L1 performs significantly better than WTS when the outlier contamination rate is high (40%). From Table 8, the proposed RTS-L1 outperforms the other algorithms at all noise levels, especially at high noise levels (20%, 30% and 40%). The bar chart of the testing RMSE of the five algorithms is shown in Fig. 5.
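The construction of the input-output pairs from a time series can be sketched as below. Only the lag structure (x(t−18), x(t−12), x(t−6), x(t) predicting x(t+6)) comes from the text; the function name `make_lagged_pairs` is hypothetical and the series used here is a simple placeholder, not Mackey-Glass data.

```python
def make_lagged_pairs(series, lags=(18, 12, 6, 0), horizon=6):
    """Build inputs [x(t-18), x(t-12), x(t-6), x(t)] and targets x(t+6)."""
    inputs, targets = [], []
    start = max(lags)
    for t in range(start, len(series) - horizon):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t + horizon])
    return inputs, targets

series = [float(t) for t in range(40)]  # placeholder series for illustration
X, y = make_lagged_pairs(series)
```

With the placeholder series each value equals its own time index, which makes it easy to check that the lags and the prediction horizon are aligned.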

B. ROBUSTNESS EVALUATION VIA BINARY CLASSIFICATION APPLICATION
The experimental results for the regression applications have been analyzed above. In this part, a binary classification application is employed to verify the robustness of the proposed method. The Breast Cancer Wisconsin (Original) Data Set is used, which consists of 699 samples; each sample has 10 attributes and one of 2 possible classes, benign or malignant. In this data set, benign is denoted by 1 and malignant by −1. Unlike in the regression applications, the outliers are added to the training data set by changing the class of training samples. Different proportions of outliers are added to the training data, and the training and testing classification accuracy of the five algorithms is shown in Table 9. Fig. 6 shows the corresponding classification accuracy on the testing data for the five algorithms.
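Label contamination of this kind can be sketched as follows, assuming that a contaminated sample simply has its class label flipped between 1 and −1 and that the affected samples are chosen uniformly at random. The function name `flip_labels`, the seed and the toy label vector are hypothetical.

```python
import random

def flip_labels(labels, rate, rng):
    """Flip the class (1 <-> -1) of a random fraction `rate` of the labels."""
    flipped = list(labels)
    idx = rng.sample(range(len(labels)), int(round(rate * len(labels))))
    for k in idx:
        flipped[k] = -flipped[k]
    return flipped, sorted(idx)

rng = random.Random(1)            # fixed seed so the sketch is repeatable
labels = [1] * 60 + [-1] * 40     # toy labels, not the Wisconsin data
noisy, idx = flip_labels(labels, 0.2, rng)
```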
From Table 9 and Fig. 6, the proposed RTS-L1 method obtains higher classification accuracy at all outlier levels, whereas WTS does not obtain the expected results when the contamination rate is 40%. The training and testing outputs of T-S, ELM, RELM, WTS and RTS-L1 at the 10%, 20%, 30% and 40% contamination rates are shown in Fig. 7 (a)-(d). As can be seen from Fig. 7, the majority of the RTS-L1 outputs lie close to the desired output, while the outputs of the other algorithms are more dispersed, especially those of the original T-S. Therefore, RTS-L1 shows more advantages than the other algorithms.
Comparison experiments have been conducted on regression problems and a binary classification problem, including the SinC function, the nonlinear difference equation, the Box-Jenkins system, the Mackey-Glass chaotic time series and the Breast Cancer Wisconsin (Original) Data Set. T-S, ELM, RELM and WTS all use the L2-norm loss function as their objective function, which is very sensitive to outliers and noise. The L1-norm loss function used in this paper is more robust to outliers and noise, as the experimental results demonstrate. The results show that the proposed RTS-L1 is superior to T-S, ELM, RELM and WTS in most cases and shows strong robustness on data with outliers and noise.
However, compared with T-S, ELM, RELM and WTS, RTS-L1 has more points located at ''output = −1'' for some positive data in Fig. 7, and WTS shows a similar phenomenon. Although most of the RTS-L1 outputs lie close to the desired outputs, its misclassified outputs tend to take crisp values in {−1, 1} rather than fuzzy values. The proposed method therefore still has certain limitations when applied to binary classification, and the L1-norm loss function may be more suitable for regression problems with outliers or noise. A robust classification algorithm will be studied in our future work.

V. CONCLUSION AND FUTURE WORKS
In this paper, we have proposed an outlier- and noise-robust T-S fuzzy model identification method based on the FCRM clustering algorithm and the L1-norm loss function. The FCRM, a hyper-plane-shaped clustering algorithm, is used to obtain a more reasonable fuzzy space structure. The L1-norm loss function, which has been shown to be more robust to outliers and noise than the L2-norm loss function, is employed in T-S fuzzy model identification in place of the L2-norm loss function. The augmented Lagrange multiplier iteration algorithm is used to solve the L1-norm problem effectively and efficiently. Experimental results on several test problems, including four regression applications and a classification application, show that, compared with the traditional T-S, ELM, RELM and WTS, the proposed RTS-L1 shows more advantages on all datasets in dealing with data contaminated by outliers and noise.
Although the proposed RTS-L1 is superior to the other algorithms considered in this paper, it still has a limitation: when applied to binary classification, its misclassified outputs tend to take crisp values in {−1, 1} rather than fuzzy values. Thus, our future work will focus on the following aspects:
(a) A robust classification algorithm will be studied.
(b) An intelligent optimization algorithm can be used to determine the optimal number of fuzzy rules [56]-[58].

He is currently a Lecturer with the Faculty of Mechanical and Material Engineering, Huaiyin Institute of Technology. His research interests include vibration signal processing, intelligent fault diagnosis, and modeling of power generation system.

XIN XIA received the Ph.D. degree in water conservancy and hydropower engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2015. He is currently a Lecturer with the Faculty of Automation, Huaiyin Institute of Technology, Huai'an, China. His research interests include modeling and fault diagnosis of hydro generator unit and system identification.

TIAN PENG received the B.S. degree in information and calculation science from Chongqing University, Chongqing, China, in 2013, and the Ph.D. degree in system analysis and integration from the Huazhong University of Science and Technology, in 2018. She is currently a Lecturer with the College of Automation, Huaiyin Institute of Technology, Huaian, Jiangsu, China. Her main research interests include water resource management and optimization, energy forecasting, intelligence algorithms, and decision support system.

LIPING SHI is currently pursuing the M.Sc. degree with the Huaiyin Institute of Technology. Her research interests include control theory, system identification, and modeling of power generation system.

CHAOSHUN LI (Member, IEEE) received the B.S. degree in thermal energy and power engineering from Wuhan University, Wuhan, China, in 2005, and the Ph.D. degree in water conservancy and hydropower engineering from HUST, in 2010. He is currently an Associate Professor with the School of Hydropower and Information Engineering, HUST. His research interests include fuzzy application system, fuzzy modeling, and system identification.
JIANZHONG ZHOU was born in Wuhan, China, in December 1959. He received the B.S. degree in automatic control from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1982. He is currently a Professor with the School of Hydropower and Information Engineering, HUST. His research interest includes modeling, control, and operation theory in hydraulic power plants. VOLUME 8, 2020