Takagi-Sugeno Modeling of Incomplete Data for Missing Value Imputation With the Use of Alternate Learning

Missing values often occur in real-world datasets, which undermines data integrity and reduces the reliability of data mining. In this paper, a method of Takagi-Sugeno (TS) fuzzy modeling for incomplete data is proposed and utilized to estimate missing values. Considering the differences of attribute relationships across clusters, this method performs regression analysis on the subsets obtained by fuzzy clustering and constructs the global model as the weighted sum of the regression models, which describes the relationships between attributes more precisely than traditional regression imputation. Meanwhile, focusing on the problem of incomplete model input caused by missing values, we propose an alternate learning strategy to train the model parameters and imputations, which treats missing values as variables to drive the advance of incomplete data modeling and updates the imputations with the adjustment of the model parameters. Through the alternate learning strategy, not only is the problem of incomplete model input solved, but the accuracy of the model and the performance of imputation are also improved together in a collaborative way. Experimental results on several UCI datasets with different missing ratios and missing data mechanisms demonstrate the effectiveness of the proposed method and strategy.


I. INTRODUCTION
In real-world datasets, the problem of missing values is almost inevitable, often caused by factors such as equipment failure, limitations of data collection and human error during storage [1], [2]. These missing values undermine data integrity and have become a major obstacle in data mining. Therefore, how to deal with missing values is a crucial issue in the analysis of incomplete data. Generally speaking, the simplest way is to discard incomplete records directly and analyze the remaining complete records, but this only works for datasets with a small number of incomplete records [3]. In practice, the incomplete records usually cannot be overlooked, because discarding them directly may lead to misleading conclusions due to the loss of information. By contrast, missing value imputation is an effective way to further improve data quality.
Missing value imputation aims to replace missing values with reasonable ones derived from present data, and many imputation methods have been proposed to make the results of data mining more effective and valuable [4], [5].
In the past few decades, commonly utilized methods include mean imputation, median imputation, hot-deck imputation, k nearest neighbor (kNN) imputation, class center-based imputation, expectation maximization imputation (EMI) and regression-based imputation. The first two methods take the mean and median values of the present data in each incomplete attribute as the replacements. The hot-deck imputation method selects the complete record nearest to the current incomplete one and regards its corresponding attribute values as the expected imputations [6]. Similar to hot-deck imputation, the kNN imputation method performs replacement with the mean values of the corresponding attributes in the k nearest neighbors [7], [8]. To improve the imputation accuracy, Song et al. took similarity neighbors into consideration when searching for the nearest neighbors [9]. The class center-based imputation method defines a threshold by the distances between each class center and the present values for missing value imputation [10]. EMI is a parametric model-based imputation method, which is realized through the iteration of an E-step and an M-step: the E-step estimates the conditional expectations of the missing values and takes them as imputations, and the M-step calculates the parameters that maximize the expectation of the log-likelihood function based on the imputed dataset [11]. Considering that instances in the same cluster are very similar to each other, Rahman et al. first performed a fuzzy clustering of the dataset to find similar records, and then applied a fuzzy EM algorithm to impute the missing values [12]. The methods mentioned above are widely adopted, but sometimes deliver limited imputation performance because they ignore the relationships between attributes [13].
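As an illustration of the neighbor-based family above, the following is a minimal sketch of kNN imputation (a simplified variant of [7], [8]: distances are computed only on the attributes present in the incomplete record, and neighbors are restricted to complete records; the function name is ours):

```python
import numpy as np

def knn_impute(X, k=3):
    """Impute NaNs in each incomplete row with the mean of the
    corresponding attributes over its k nearest complete rows.
    Distances use only the attributes present in the incomplete row."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for r, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # Euclidean distance on the observed attributes only
        d = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nn = complete[np.argsort(d)[:k]]
        out[r, miss] = nn[:, miss].mean(axis=0)
    return out
```

More elaborate variants weight the neighbors by distance or, as in [9], restrict the search to similarity neighbors.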
Taking attribute relationships into consideration, the regression-based imputation method establishes several regression models, each with one missing attribute as the output variable, and has received wide attention [14]-[17]. In one such approach, a support vector regression model is trained on the complete records before being utilized for imputation, which attempts to make the output values approximate their corresponding inputs [23]. Sefidian et al. imputed missing values by a novel grey-based fuzzy c-means, mutual information-based feature selection, and regression model, which achieved good imputation performance through the construction of a specialized regression model for each cluster [24].
From the above, we can conclude that analysis aimed at each subset rather than the whole dataset is more capable of describing the relationships between attributes during the regression modeling of incomplete data, and thus obtains a better imputation performance. Hence, partition-based models have been widely employed in data mining, especially rule-based fuzzy models. As the most studied rule-based fuzzy model, the Takagi-Sugeno (TS) model performs regression analysis on the premise of a fuzzy partition and establishes linear regression models for the relationships in each subset [25]-[27]. Due to this feature, the TS model can be utilized as a universal approximator to handle nonlinear problems, and captures these relationships better than a traditional regression model [28], [29]. On the other hand, pre-imputation of missing values is commonly adopted to deal with the problem of incomplete input during modeling. However, the perturbation caused by different pre-imputed values can easily lead to variations of the model parameters, which has a great influence on the model accuracy. Hence, in the process of incomplete data modeling, the way the incomplete model input is handled deserves great attention.
In this paper, we use the TS model to model incomplete data and realize missing value imputation in collaboration with the dynamic modeling. Aiming to describe the relationships between attributes more precisely, the method first divides the dataset into several subsets and performs regression analysis for each subset, and then constructs the global model as the weighted sum of the regression models, which improves the model accuracy over traditional regression modeling. Meanwhile, owing to the problem of incomplete model input caused by missing values, an alternate learning strategy for training the parameters of the incomplete data-based model together with the imputations is presented. In this strategy, missing values are treated as variables to drive the advance of incomplete data modeling, and the imputations are updated dynamically together with the adjustment of the model parameters. Therefore, a collaborative improvement of model accuracy and imputation performance is realized as the problem of incomplete model input is resolved.
The rest of this paper is organized as follows. Section II introduces the basic structure of TS model. Section III describes the proposed method of incomplete data modeling and missing value imputation. Meanwhile, the alternate learning strategy that treats missing values as variables is carried out to train model parameters together with imputations. Section IV demonstrates the effectiveness of the proposed method by several UCI datasets with different missing ratios and missing data mechanisms. Section V concludes the paper.

II. TAKAGI-SUGENO FUZZY MODEL
The Takagi-Sugeno fuzzy model was proposed by Takagi and Sugeno in 1985, whose basic idea is to divide a nonlinear problem into several linear sub-problems and describe them separately with ''IF-THEN'' rules [30]. It obtains the premise parts by fuzzy partition, and then linear regression models are established as the corresponding consequence parts to describe the relationships between the input and output variables. Given a dataset composed of N records in s-dimensional real space, i.e. X = {x_1, ..., x_N} with x_k = [x_k1, ..., x_ks], and taking the jth attribute as the output variable, the fuzzy rules have the form

R^(i): IF x_k1 is A_1^(i) and ... and x_k(j-1) is A_(j-1)^(i) and x_k(j+1) is A_(j+1)^(i) and ... and x_ks is A_s^(i),
THEN x̂_kj^(i) = θ_0^(i) + θ_1^(i) x_k1 + ... + θ_(j-1)^(i) x_k(j-1) + θ_(j+1)^(i) x_k(j+1) + ... + θ_s^(i) x_ks, (1)

where R^(i) is the ith fuzzy rule for i = 1, ..., c, and c is the number of fuzzy rules; x_k1, ..., x_k(j-1), x_k(j+1), ..., x_ks are the input variables of R^(i); A_1^(i), ..., A_(j-1)^(i), A_(j+1)^(i), ..., A_s^(i) represent their corresponding fuzzy sets, also known as premise parameters; x̂_kj^(i) represents the output of R^(i); and θ_0^(i), ..., θ_s^(i) are the consequence parameters. The global output of the TS model is the weighted summation of the rule outputs, as shown in (2):

x̂_kj = Σ_{i=1}^{c} w^(i)(x_k) x̂_kj^(i) / Σ_{i=1}^{c} w^(i)(x_k), (2)

where w^(i)(x_k) = Π_{t≠j} u_{A_t^(i)}(x_kt) is the firing strength of rule R^(i), and u_{A_t^(i)}(x_kt) denotes the membership degree of x_kt in the fuzzy set A_t^(i).
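The rule evaluation and the weighted summation in (2) can be sketched as follows, assuming Gaussian membership functions and product firing strengths (the parameter layout and function name are illustrative, not the paper's implementation):

```python
import numpy as np

def ts_output(x, centers, sigmas, thetas):
    """Global TS model output for one input vector x of length p.
    centers, sigmas: (c, p) Gaussian premise parameters per rule.
    thetas: (c, p+1) linear consequent parameters [bias, coefficients]."""
    # firing strength of each rule: product of Gaussian memberships
    w = np.exp(-((x - centers) ** 2) / (2 * sigmas ** 2)).prod(axis=1)
    # output of each rule: a linear model of the inputs
    y = thetas[:, 0] + thetas[:, 1:] @ x
    # weighted summation of the rule outputs, per (2)
    return (w * y).sum() / w.sum()
```

With a single rule the weights cancel and the output reduces to the rule's linear model, which is a quick sanity check on an implementation.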

III. TS MODELING, MISSING VALUE IMPUTATION, AND ALTERNATE LEARNING
Given a dataset, the attribute relationships among each record are diverse, which can be similar or different. Therefore, it is more appropriate to divide the dataset into several subsets and carry out regression analysis for each subset separately.
To make a more precise analysis of incomplete data and thus obtain more reasonable imputations, a dynamic TS modeling-based method on the premise of fuzzy partition is proposed in this section. Meanwhile, focusing on the problem of incomplete model input caused by missing values, we present an alternate learning strategy that regards the missing values as variables to promote the training of the incomplete data model. Through this strategy, the model parameters can be adjusted with the feedback of the updated variables, and the variables can in turn be better adapted to the model after the parameter adjustment. The framework of the proposed method is shown in Fig. 1, where x_k^(i) represents the kth sample of the ith cluster. As shown in Fig. 1, given an incomplete dataset, the method first divides it into c subsets by a fuzzy clustering algorithm, and thus realizes premise parameter identification along with the fuzzy partition. After the premise parameters of the incomplete data model are determined, consequence parameter identification is worked out through training with the alternate learning strategy, and missing value imputation is achieved along with the training. In this process, the consequence parameters are adjusted with the updating of the imputations, and the imputations are updated in turn with the adjustment of the consequence parameters. Through repeated adjustments and updates, the consequence parameters and imputations are learned alternately and tend to be accurate, which means that the problem of incomplete model input caused by missing values is resolved effectively.

A. TS MODELING OF INCOMPLETE DATA
Similar to complete data-based TS modeling, the realization for incomplete data modeling can also be divided into premise parameter identification and consequence parameter identification. To make a more precise analysis, we add variable selection to regression modeling of each subset after fuzzy partition. Considering the occurrence of missing values, premise parameters are obtained through fuzzy C-means clustering with partial distance strategy (FCM-PDS) and consequence parameters are estimated using the least square method by treating missing values as variables. Besides, a stepwise regression algorithm is utilized for variable selection to describe the regression relationships between attributes in each subset.

1) PREMISE PARAMETER IDENTIFICATION
The FCM-PDS [31] algorithm divides the incomplete dataset into c subsets by minimizing the objective function

J = Σ_{i=1}^{c} Σ_{k=1}^{N} (u_{A^(i)}(x_k))^m d_ik^2, s.t. Σ_{i=1}^{c} u_{A^(i)}(x_k) = 1, (3)

where u_{A^(i)}(x_k) represents the membership degree of x_k belonging to the fuzzy set A^(i); m represents the fuzzification parameter, m ∈ (1, ∞); and d_ik represents the partial distance from the kth record to the ith cluster center, which is calculated by

d_ik^2 = (s / |X_P,k|) Σ_{j: x_kj ∈ X_P} (x_kj - v_ij)^2, (4)

where X_P = {x_kj | the value of x_kj is present} and X_M = {x_kj | the value of x_kj is missing} represent the set of present values and the set of missing values, |X_P,k| denotes the number of present values in x_k, and v_ij represents the jth attribute of the ith cluster center. After the fuzzy partition, the premise parameters composed of the membership degrees u_{A_j^(i)}(x_kj) can be obtained by projecting u_{A^(i)}(x_k) onto each axis of the input variables [32]. In this paper, we use the Gaussian membership function

u_{A_j^(i)}(x_kj) = exp(-(x_kj - a_ij)^2 / (2σ_ij^2)), (5)

where a_ij represents the center and σ_ij represents the standard deviation.
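The partial distance and the resulting membership update can be sketched as follows (a minimal illustration of FCM-PDS-style computations; zero-distance handling and the cluster-center update are omitted, and the function names are ours):

```python
import numpy as np

def partial_distance_sq(x, v):
    """Squared partial distance: use only the present components of x
    and rescale by s / (number of present components)."""
    present = ~np.isnan(x)
    return (x.size / present.sum()) * ((x[present] - v[present]) ** 2).sum()

def memberships(X, V, m=2.0):
    """FCM-style membership update u_ik from the partial distances
    (assumes all distances are nonzero)."""
    D = np.array([[partial_distance_sq(x, v) for v in V] for x in X])
    inv = D ** (-1.0 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)
```

Note that the rescaling by s / |X_P,k| makes the partial distance comparable across records with different numbers of missing components.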

2) INPUT VARIABLE SELECTION
The stepwise regression algorithm introduces the variables with a significant impact on the output into the regression model one by one, so that the established model contains only the significant variables [33]-[35]. Therefore, we use it to select the input variables of each regression model in the fuzzy rules, so as to improve the model precision while reducing complexity.
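The idea can be illustrated with a greedy forward-selection sketch. Classical stepwise regression uses F-tests for variable entry and removal; this simplified stand-in instead stops when the relative reduction of the residual sum of squares becomes negligible (the function name and stopping rule are ours):

```python
import numpy as np

def forward_select(X, y, tol=1e-4):
    """Greedily add the variable that most reduces the residual sum of
    squares of a least-squares fit, stopping when the relative
    improvement falls below tol. Returns the selected column indices."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))

    def rss(cols):
        # least-squares fit with intercept on the given columns
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    best = rss([])
    while remaining:
        cand = min(remaining, key=lambda j: rss(chosen + [j]))
        new = rss(chosen + [cand])
        if best - new <= tol * best:
            break  # no significant improvement: stop adding variables
        chosen.append(cand)
        remaining.remove(cand)
        best = new
    return chosen
```

On data where the output depends on only one attribute, such a procedure selects that attribute and stops, keeping the rule's regression model compact.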

3) CONSEQUENCE PARAMETER IDENTIFICATION
The least square method has been widely utilized in the parameter calculation of regression models, as it obtains the optimal fitting function by minimizing the sum of the squared errors. However, the method fails when the dataset contains missing values, so we propose to treat the missing values as variables during estimation. Subsequently, the estimated parameters are refined to be more appropriate through an alternate learning strategy. The detailed realization steps are shown in Section III.B.
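For a single rule on fully observed inputs, the consequence parameters are commonly estimated by least squares weighted by the membership degrees; the following is a minimal sketch under that assumption (it does not include the paper's treatment of missing values as variables, which is handled by the alternate learning in Section III.B):

```python
import numpy as np

def consequence_wls(X, y, w):
    """Estimate one rule's consequent theta = [bias, coefficients] by
    membership-weighted least squares:
    minimize sum_k w_k * (y_k - [1, x_k] @ theta)^2."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
    return theta
```

With uniform weights this reduces to ordinary least squares, which makes the routine easy to validate.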

B. TS MODELING-BASED MISSING VALUE IMPUTATION
1) THE OVERALL STRUCTURE FOR IMPUTATION
In this paper, several incomplete data-based TS models with a multiple-input-single-output (MISO) structure are established, considering that more than one attribute generally suffers from missing values and that a multiple-input-multiple-output (MIMO) problem can be divided into several MISO problems. In each model, one incomplete attribute is taken as the output variable and the other attributes are selected as input variables for modeling. For example, taking the jth incomplete attribute as the output variable, the process of modeling-based imputation is shown in Fig. 2, where x_k*^(i) (i = 1, ..., c) represents the vector of input variables in R^(i), A_*^(i) represents the set of corresponding premise parameters, and θ^(i) represents the vector of consequence parameters.
As depicted in Fig. 2, an incomplete data model is established by regarding the jth incomplete attribute as the output variable, with the input variables selected from the other attributes by the stepwise regression algorithm. To obtain the imputations represented by the corresponding model outputs, the method first assigns a value to each variable in the input vector and then feeds this reconstructed input to the established model for calculation.

2) ALTERNATE LEARNING OF MODEL PARAMETERS AND IMPUTATIONS
After identifying the model parameters and obtaining the input variables for each TS model, we can calculate the output values of these models through the estimated parameters. However, since data integrity is undermined by missing values, the incomplete data-based TS model usually cannot describe the regression relationships between attributes in each subset well, and the output values derived from the model are then not appropriate as imputations. To improve the precision of the incomplete data model, and thereby enhance the appropriateness of the imputations derived from it, an alternate learning strategy is proposed in this section for training the parameters together with the imputations. The strategy is shown in Fig. 3, where X̂_P represents the set of reconstructed values corresponding to the present ones in X_P, X̂_M represents the set of imputations corresponding to the missing values in X_M, ε represents the threshold for terminating iterations, and Δf_e = |f_e^(l) - f_e^(l-1)|, where f_e^(l) represents the reconstruction error in the lth iteration and f_e^(l-1) represents that in the previous one. The f_e in each iteration is calculated from X̂_P and X_P by (8):

f_e = sqrt( (1/|X_P|) Σ_{x_kj ∈ X_P} (x̂_kj - x_kj)^2 ), (8)

where |X_P| represents the number of values in X_P. As shown in Fig. 3, by treating missing values as variables, the model parameters and the model output values under these parameters can be estimated by the method in Section III.A. However, instead of finishing the imputation task immediately after replacing the missing values with the corresponding estimated values, the strategy decides whether the reconstructed dataset should be output according to the change of the reconstruction error f_e. If the error changes within a limited range compared with the previous iteration, i.e. Δf_e = |f_e^(l) - f_e^(l-1)| ≤ ε, the fitting ability of the incomplete data model will no longer change, and the imputation task can be accomplished by outputting the reconstructed dataset.
Otherwise, the dataset should give feedback to the model for the parameter adjustment, so as to update the fitting model and its corresponding output values. In turn, new imputations and reconstruction error can be obtained in response to the model adjustment. Through the alternate learning of model parameters and imputations, the reconstruction error tends to be stable and the imputation task can be finally accomplished with the output of the updated dataset.
In summary, given an incomplete dataset, the alternate learning strategy can be realized through the following steps.
Step 1: Initialize the missing values in X_M.
Step 2: Estimate the model parameters based on the reconstructed dataset.
Step 3: Update the imputations according to the estimated parameters.
Step 4: Evaluate the change of the reconstruction error. If it is greater than the given threshold, return to Step 2, and the current incomplete data model is adjusted according to the updated dataset. Otherwise, end the training and output the updated dataset.
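The four steps can be condensed into the following skeleton, in which the model fitting of Steps 2-3 is abstracted into a user-supplied `fit_predict` callback (an illustrative simplification of the strategy, initialized here with attribute means; the function names are ours):

```python
import numpy as np

def alternate_learning(X, fit_predict, eps=1e-3, max_iter=50):
    """Alternate learning skeleton. X contains NaNs for missing values.
    fit_predict(filled) refits the model on the reconstructed data and
    returns a prediction for every entry of X. Iterate until the change
    of the reconstruction error over the present values is <= eps."""
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)   # Step 1: initialize
    prev_err = np.inf
    for _ in range(max_iter):
        pred = fit_predict(filled)                      # Step 2: refit model
        filled[miss] = pred[miss]                       # Step 3: update imputations
        # reconstruction error over the present values, per (8)
        err = np.sqrt(((pred[~miss] - X[~miss]) ** 2).mean())
        if abs(err - prev_err) <= eps:                  # Step 4: convergence check
            break
        prev_err = err
    return filled
```

In the paper's method, `fit_predict` corresponds to re-identifying the consequence parameters of the TS models and recomputing their outputs; any model that can be refit on a completed dataset fits this skeleton.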

IV. EXPERIMENTS
A. EXPERIMENTAL SETUP
1) DATASETS
In this subsection, we select 12 complete benchmark datasets from the UCI Machine Learning Repository [?] for the experiments. Their brief descriptions are shown in Table 1. The UCI repository is released by the University of California, Irvine; it currently maintains 497 datasets as a service to the machine learning community and has become a popular resource for researchers. To observe the imputation performance of the proposed method under different missing scales, we set 10 missing ratios uniformly in the range of 5% to 50% for each benchmark dataset. Under the constraint of the missing ratios, some attribute values are deleted from each benchmark dataset based on the Missing Completely At Random (MCAR) and Missing Not At Random (MNAR) mechanisms, while keeping the dimension of attributes and the number of records unchanged. In the MCAR mechanism, values are removed uniformly at random; in the MNAR mechanism, only values higher than the median of the attribute can be removed randomly. These two mechanisms are applied in turn to produce missing data, that is, incomplete datasets under the missing ratios 5%, 15%, 25%, 35% and 45% are generated based on MCAR, and those under the missing ratios 10%, 20%, 30%, 40% and 50% are produced based on MNAR. Hence, there are 10 combinations (2 missingness mechanisms and 5 missing ratios per mechanism) of missing types. Moreover, 10 incomplete datasets are generated randomly under each combination for each benchmark dataset, which means that a total of 1200 (12 * 10 * 5 * 2) incomplete datasets are utilized for the experiments.
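The deletion procedure described above can be sketched as follows (an illustrative implementation of the MCAR and MNAR mechanisms as defined in this section; the function name is ours):

```python
import numpy as np

def make_missing(X, ratio, mechanism="MCAR", rng=None):
    """Delete a fraction `ratio` of the entries of a complete matrix X.
    MCAR: entries are removed uniformly at random.
    MNAR: only entries above their attribute's median are candidates."""
    rng = np.random.default_rng(rng)
    X = X.astype(float).copy()
    if mechanism == "MCAR":
        cand = np.argwhere(np.ones_like(X, dtype=bool))
    else:  # MNAR
        cand = np.argwhere(X > np.median(X, axis=0))
    n_del = int(round(ratio * X.size))
    pick = cand[rng.choice(len(cand), size=min(n_del, len(cand)), replace=False)]
    X[pick[:, 0], pick[:, 1]] = np.nan
    return X
```

Under MNAR the missingness depends on the (unobserved) value itself, which is what makes it the harder setting for imputation methods.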

2) EVALUATION CRITERION
In this paper, we take the root mean square error (RMSE), calculated by

RMSE = sqrt( (1/|X̂_M|) Σ_{x̂_kj ∈ X̂_M} (x̂_kj - r_kj)^2 ), (10)

to evaluate the performance of imputation, where x̂_kj ∈ X̂_M represents the imputation for a missing value x_kj, r_kj represents its corresponding actual value, and |X̂_M| denotes the number of imputed values.
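Computed over the imputed entries only, the RMSE in (10) amounts to the following sketch (mask and variable names are ours):

```python
import numpy as np

def imputation_rmse(X_imputed, X_true, miss_mask):
    """RMSE over the imputed entries only: compare imputations against
    the ground-truth values at the positions that were deleted."""
    d = X_imputed[miss_mask] - X_true[miss_mask]
    return np.sqrt((d ** 2).mean())
```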

3) COMPARISON METHODS
In order to verify the effectiveness of clustering-based TS model in missing value imputation and the feasibility of alternate learning strategy in modeling with incomplete data, the following nine comparison methods are designed to carry out experiments.
(1) The k nearest neighbor imputation (KNNI). Select k nearest neighbors for each incomplete record, then impute the missing values with the mean values of the corresponding attributes in the nearest neighbors [7]. (2) Expectation maximization imputation (EMI). Iterate the E-step and M-step to estimate the missing values and calculate the parameters [11].

(3) Fuzzy expectation maximization imputation (FEMI).
Make a fuzzy clustering of the dataset and then perform the EMI algorithm in each cluster [12]. The remaining six comparison methods are the regression-based ones: traditional regression imputation (REGI) and TS modeling-based imputation (TSI); their counterparts REGIf and TSIf, which make full use of the present values for modeling; and REGIf-AL and the proposed TSIf-AL, where the suffix ''-AL'' denotes training with the alternate learning strategy. In the TSI, TSIf and TSIf-AL methods, fuzzy rules are generated separately for each dataset by means of TS modeling. Generally, the generation of fuzzy rules is equivalent to the construction of the TS model, which contains three steps: premise parameter identification, input variable selection, and consequence parameter identification. For each incomplete dataset, these three steps are conducted to generate the fuzzy rules automatically.

B. EXPERIMENTAL RESULTS
In order to make the conclusions more reliable, the average imputation performance of each comparison method over the ten incomplete datasets with the same missing ratio, benchmark dataset and missingness mechanism is taken as one set of results. In other words, a benchmark dataset corresponds to only one group of results under each combination of missing ratio and mechanism, as shown in Tables 2 to 13. Moreover, entries in boldface are obviously better than all the other entries in the same column. After obtaining the experimental results for all the methods, we adopt a t-test with significance level p = 0.05 to determine whether two results in the same column of a table are significantly different from each other. The minimum result is underlined only when it is significantly different from all the other results. Based on the distribution of bolded results, we can evaluate the imputation performance from the perspective of statistical significance tests.

C. RESULT ANALYSIS
By observing the experimental results in Tables 2 to 13, it is clear that TSIf-AL achieves the best results by a significant margin, which indicates that its imputation performance is better than that of the other methods. Furthermore, we can also find that the RMSE and MAPE values obtained from REGI and TSI are larger than those obtained from the other regression-based methods. This phenomenon indicates that making full use of the present values for incomplete data modeling enables the relationships between attributes to be described more effectively, and thus the imputation performance is enhanced correspondingly. In the following analysis, we discuss the superiority of clustering-based modeling over overall modeling, the advantage of the alternate learning strategy for the methods that make full use of the present values, and the comparison between TSIf-AL and the non-regression-based methods.

1) THE EFFECT OF CLUSTERING-BASED MODELING ON IMPUTATION PERFORMANCE
As illustrated in Tables 2 to 13, TSIf outperforms REGIf in most of the combinations of missing ratios and mechanisms, while REGIf performs better in the remaining 42 combinations. For another set of comparisons, between TSIf-AL and REGIf-AL, a similar pattern can be seen: the former has better performance in most cases. According to the above descriptions, analyzing with the TS model is more effective in missing value imputation than with the traditional regression model. The primary reason why the TS modeling-based methods achieve better performance is that the TS model is realized on the premise of fuzzy partition, which accounts for the differences of attribute relationships between subsets while carrying out regression analysis. Besides, the TS model is a nonlinear model in essence, which obtains the model output by weighting and summing the values derived from each fuzzy rule. In general, these fuzzy rules reflect the local characteristics of the incomplete data, and thus mine the distribution of associations among attributes in different partitions to some extent. Therefore, it is more capable of data estimation than the traditional regression model, and thus performs better in missing value imputation.

2) THE EFFECT OF ALTERNATE LEARNING ON IMPUTATION PERFORMANCE
According to the RMSE values in Tables 2 to 13, TSIf-AL performs better than TSIf, and the performance of REGIf-AL is also superior to that of REGIf, which indicates that the estimated values derived from the incomplete data model are closer to their actual values after using the alternate learning strategy. Taking the Abalone datasets with different missing ratios as examples, when applying the proposed strategy, the accuracy of regression modeling-based imputation is increased by more than 15% in most cases, and the performance of TS modeling-based imputation is also further improved. Therefore, the feasibility and effectiveness of the alternate learning strategy are verified.
This feasibility and effectiveness lie in the following two aspects. On the one hand, the model output values approximate their actual values better with the adjustment of the model parameters, which means that the imputations become more reasonable with the optimization of incomplete data modeling. On the other hand, the model parameters reflect the real attribute relationships better as the data quality improves, which further enhances the reliability of the imputations. In summary, the accuracy of incomplete data modeling and the effectiveness of missing value imputation are enhanced in a collaborative way.

3) THE COMPARISON BETWEEN TSIF-AL AND NON-REGRESSION-BASED METHODS
Comparing the imputation performance of TSIf-AL, KNNI, EMI and FEMI in Tables 2 to 13, it can be seen that TSIf-AL has a higher imputation accuracy in most cases. In a total of 240 sets of results, TSIf-AL outperforms the other methods in 197 sets. Moreover, TSIf-AL achieves even better imputation results when the missing ratio of the incomplete data is large. Specifically, when the missing ratio is not less than 35%, the results of TSIf-AL are better than those of KNNI, EMI and FEMI on all datasets except the Iris, Abalone and Dow datasets, and TSIf-AL has a better imputation performance in 92 out of 96 sets. The above analysis shows that TSIf-AL can impute missing values effectively. Furthermore, taking the relationships among attributes into account contributes to the imputation accuracy, especially in the case of high missing ratios.

D. FURTHER EVALUATION 1) CONVERGENCE OF ALTERNATE LEARNING
Convergence is one of the key concerns for iterative algorithms. In this subsection, the convergence of the alternate learning strategy is verified using the present values, considering that only those values are available in real-world datasets. Taking the Segment datasets with different missing ratios as an example, the RMSE values obtained in each iteration are shown in Fig. 4. As shown in Fig. 4, all the curves present the same general trend: they drop rapidly at the beginning and then tend to be stable. Specifically, the RMSE falls sharply to a small value within the first 3 iterations for each missing ratio, after which the convergence rate decreases gradually and the RMSE remains almost unchanged. Therefore, it can be concluded that the alternate learning strategy has a fast convergence speed and good stability.

2) TIME EFFICIENCY
The average execution time over the 10 incomplete datasets (2 mechanisms and 5 incomplete datasets per mechanism) for each benchmark dataset is shown in Fig. 5. In order to make the results more reliable, we use the same machine to carry out all the experiments. As shown in Fig. 5, TSIf-AL generally takes more time than KNNI, EMI and FEMI, which is the cost of its apparently better imputation accuracy. We can also find that, for the Wireless and Abalone datasets, the time consumption of KNNI is obviously larger than that of the other methods due to the increase in data size, and for the Segment dataset the gap in execution time between TSIf-AL and FEMI is not obvious. These results indicate that the extra time consumption of TSIf-AL can be neglected to some extent as the data size gets larger, given that TSIf-AL obtains the best imputation performance among the compared methods.
Next, we analyze the time complexity theoretically. TSIf-AL is realized by premise parameter identification, input variable selection and the alternate learning of model parameters with imputations. Let N, c, s and l represent the numbers of records, clusters, attributes and alternate learning iterations, respectively. We take the FCM-PDS algorithm, with complexity O(Nc^2 s), to identify the premise parameters for all the TS models, and the stepwise regression algorithm, with complexity O(Ns^2), to select the input variables for each of the s TS models, so the complexity of these steps is O(Nc^2 s + Ns^3). In each iteration of alternate learning, the consequence parameters of each TS model are obtained through the least square method with complexity O(c^3 s^3). Therefore, the complexity of TSIf-AL is O(Nc^2 s + Ns^3 + lc^3 s^4). Generally, l and c are chosen to be significantly smaller than N [12], and thus the complexity of TSIf-AL can be simplified to O(Ns^3 + s^4). Additionally, the complexities of KNNI, EMI and FEMI are O(Ns), O(Ns^2 + s^3) and O(Ns^2 + s^3), respectively. Although TSIf-AL needs more computation time than the other methods, imputation accuracy generally has a higher priority in missing value imputation, especially when the difference in computation time is not obvious.

V. CONCLUSION
Taking the differences in regression relationships among subsets into consideration, we propose a method of incomplete data modeling based on the TS model for imputing missing values. The method performs regression analysis on each subset obtained by a fuzzy clustering algorithm and takes the weighted sum of the regression models to build the global model, which has higher precision and better imputation performance than the traditional regression model. Meanwhile, concentrating on the problem of incomplete model input caused by missing values, this paper carries out an alternate learning strategy for training the model parameters together with the imputations, in which missing values are treated as variables to promote training. Through this strategy, a collaborative improvement of model accuracy and imputation accuracy is realized as the problem of incomplete model input is resolved. Experiments on 12 UCI datasets with different missing ratios and mechanisms demonstrate that precise TS modeling with consideration of the differences among subsets is effective for missing value imputation, and derives more appropriate estimated values than the traditional regression model. Furthermore, the effectiveness of incomplete data modeling is enhanced by engaging all the present values in modeling, and the performance of missing value imputation is further improved when training with the alternate learning strategy.
In addition, the variation of the RMSE values in the alternate learning process indicates the ideal convergence of TSIf-AL, and the comparison of TSIf-AL with KNNI, EMI and FEMI on time complexity shows that TSIf-AL requires more computation time but obtains obviously better imputation performance. From the perspective of execution time, the gaps in time consumption between the proposed method and the comparison methods are not obvious when the dataset gets large, and can be neglected to some extent since imputation accuracy generally has a higher priority in missing value imputation.
XIAOCHEN LAI received the B.S., M.S., and Ph.D. degrees from the Dalian University of Technology, Dalian, China, in 1999, 2003, and 2016, respectively. He is currently working as an Associate Professor with the School of Technology, Dalian. His major research directions involve deep learning, data modeling, control theory and engineering, body sensor networks, and wireless sensor networks.
LIYONG ZHANG received the B.S. degree in automation, the M.S. degree in control theory and control engineering, and the Ph.D. degree in control theory and control engineering from the Dalian University of Technology, Dalian, China, in 1999, 2002, and 2018, respectively. He is currently working as a Lecturer with the School of Control Science and Engineering, Dalian University of Technology. He has published more than 50 research articles, and has five Chinese invention patents issued. He is a coauthor of three books. His current research interests include data modeling, fuzzy clustering, feature selection, and granular computing.
XIN LIU received the M.S. degree from the Dalian University of Technology, in 2019. She is currently pursuing the Ph.D. degree with Central South University. Her main research topic is devoted to data mining.