Common and Special Knowledge-Driven TSK Fuzzy System and Its Modeling and Application for Epileptic EEG Signals Recognition

Takagi-Sugeno-Kang (TSK) fuzzy systems are well known for their good balance between approximation accuracy and interpretability. Most existing TSK fuzzy systems are driven by special knowledge, since the learned parameters of each fuzzy rule are totally different. However, common knowledge is equally important and useful in practice, and hence a TSK fuzzy system embedded with common knowledge should be more intuitive and interpretable when tackling real-world problems. In this paper, we propose a common and special knowledge-driven TSK fuzzy system (CSK-TSK-FS), in which the parameters corresponding to each feature in the then-parts of fuzzy rules are kept invariant across rules, and these parameters are viewed as common knowledge. As for its modeling, in addition to gradient descent techniques and other existing training algorithms, we can obtain a trained CSK-TSK-FS from a trained Gaussian mixture model (GMM) or a trained functional link neural network (FLNN), because the proposed fuzzy system is mathematically equivalent to a special GMM and to an FLNN. CSK-TSK-FS has three characteristics: (1) with the classical centroid defuzzification strategy, the involved common knowledge can be separated from the fuzzy rules such that the interpretability of CSK-TSK-FS is enhanced; (2) it can be trained quickly by the proposed least learning machine (LLM)-based training algorithm; (3) the equivalence relationships among CSK-TSK-FS, GMM and FLNN allow them to share training algorithms, such that the proposed LLM-based algorithm also provides a novel fast training tool for GMM and FLNN. Experimental results on UCI, KEEL and epileptic EEG datasets demonstrate the promising classification performance of CSK-TSK-FS.


I. INTRODUCTION
Epilepsy is a finite episode of brain dysfunction caused by abnormal discharge of cerebral neurons. In the clinical diagnosis of epilepsy, electroencephalogram (EEG) signals are often employed to decide its presence and type [3]. Many machine learning approaches, including SVM, fuzzy systems, KNN, and decision trees [1]-[3], [39], have been developed and successfully used for epileptic EEG signal recognition. Among these approaches, the Takagi-Sugeno-Kang (TSK) fuzzy system is a fuzzy rule-based inference system [1]-[3] that has been widely used for EEG signal recognition and other applications [46]-[48] because of its strong approximation capability and good interpretability. Generally speaking, a TSK fuzzy system, e.g., zero-order TSK [4] or one-order TSK [4], can be taken as a knowledge-driven model in which the knowledge is scattered across the fuzzy rules. This knowledge is the cornerstone of the strong approximation capability and good interpretability of TSK fuzzy systems. More specifically, we can consider the knowledge to be the parameters learned in each fuzzy rule, and hence the types of knowledge are decided by the way the parameters are presented in the fuzzy rules. If a TSK fuzzy system is considered as an expert system, then each fuzzy rule can be taken as an expert with different (special) knowledge. Based on this special knowledge, the expert system can output an effective decision for an input problem in most cases. However, although special knowledge is effective in driving an expert system, the common knowledge shared between experts is sometimes also useful for the deduction of an expert system. In clinical diagnosis, the common knowledge shared between medical experts can help them make more accurate decisions. For instance, the common knowledge "diabetic eye disease is most likely caused by retinopathy" can help medical experts in the clinical diagnosis of eye disease. Therefore, from the perspective of applications, constructing a TSK fuzzy system that combines special knowledge with common knowledge is very significant.

The associate editor coordinating the review of this manuscript and approving it for publication was Yongqiang Cheng.
As stated above, knowledge is represented by the parameters learned in each fuzzy rule. That is to say, a TSK fuzzy system with only special knowledge and a TSK fuzzy system with both special and common knowledge differ in how they combine the input features. In [5], the authors evaluate existing simple regression models on about 60 real-world datasets, and their conclusion suggests that the mode of knowledge presentation (the combination mode of input features) in a simple regression model can be flexible and varied.
Therefore, in this paper, inspired by the conclusion in [5] and considering the requirements of real-world applications, we re-design the mode of knowledge presentation in the classical one-order TSK fuzzy system and propose a novel TSK fuzzy system, termed CSK-TSK-FS, that is driven by common and special knowledge. CSK-TSK-FS differs from the classical one-order TSK fuzzy system. In the classical one-order TSK fuzzy system, the learned parameters in each fuzzy rule are special, and hence we also regard one-order TSK fuzzy systems as special knowledge-driven fuzzy systems. In CSK-TSK-FS, in addition to special knowledge, common knowledge is embedded through the parameters of the one-order parts of the then-parts of the fuzzy rules. In other words, the parameters of the one-order parts are kept invariant across all fuzzy rules.
With the embedded common knowledge, CSK-TSK-FS becomes more interpretable because its fuzzy rules are implicitly shortened. More importantly, its modeling is no longer limited to traditional algorithms: CSK-TSK-FS is mathematically equivalent to a special Gaussian mixture model (GMM) [6] and to a functional link neural network (FLNN) [7], such that the algorithms for modeling GMM and FLNN can also be transferred to CSK-TSK-FS.
The contributions of this paper can be summarized in the following three aspects:
1) A novel TSK fuzzy system embedded with common knowledge and special knowledge is proposed. Compared with the classical one-order TSK fuzzy system, the proposed one is more interpretable because the common knowledge in the then-parts of the fuzzy rules implicitly shortens the rules, at least to a certain extent. Besides, the performance of the proposed TSK fuzzy system is supported by the conclusion in [5] that the combination mode of input features in a simple regression model is flexible.
2) We reveal a relationship between CSK-TSK-FS and a GMM with a certain constraint. Thus, the proposed fuzzy system can be obtained from a trained GMM; in other words, we find a new training algorithm for the proposed fuzzy system. In addition, we find that the Gaussian FLNN is mathematically equivalent to the proposed fuzzy system, so through the proposed fuzzy system we establish a relationship between GMM and FLNN and accordingly extend their respective training algorithms.
3) An LLM-based fast learning algorithm for CSK-TSK-FS is proposed. Because of the equivalence relations, this is also a new learning algorithm for GMM and FLNN.
The remaining sections are organized as follows: Section II gives some preliminaries; Section III gives the details of CSK-TSK-FS and reveals its relationship to GMM and FLNN; Section IV reports the experimental results; and Section V concludes our work.

II. PRELIMINARY
Since the proposed fuzzy system CSK-TSK-FS can be viewed as a special GMM [6] and a Gaussian FLNN [7], we prepare some preliminaries about the TSK fuzzy system, GMM and FLNN in this section.

A. TSK FUZZY SYSTEM
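TSK fuzzy systems are fuzzy rule-driven inference systems. The most commonly used fuzzy rules consist of an if-part and a then-part; the kth rule can be written as

$$\text{IF } \omega_1 \text{ is } \Omega_1^k \wedge \omega_2 \text{ is } \Omega_2^k \wedge \cdots \wedge \omega_d \text{ is } \Omega_d^k, \ \text{THEN } \varphi_k(\omega), \quad k = 1, 2, \ldots, K, \tag{1}$$

where $\Omega_i^k$ is a fuzzy subset associated with feature $\omega_i$ of the feature vector $\omega = [\omega_1, \omega_2, \ldots, \omega_d]^T$, $K$ is the total number of fuzzy rules, and $\wedge$ represents a fuzzy conjunction operator. Each fuzzy rule is premised on the feature space and maps the fuzzy sets $\Omega^k = [\Omega_1^k, \Omega_2^k, \ldots, \Omega_d^k]$ from the feature space $\Omega^k \subset R^d$ to a linear function or a constant, here represented by $\varphi_k(\omega)$. Usually, the defuzzification process is achieved by a straightforward weighted summation. Therefore, the output $\gamma_o$ for a new sample $\omega$ can be formulated as

$$\gamma_o = \sum_{k=1}^{K} \frac{\nu_k(\omega)}{\sum_{k'=1}^{K} \nu_{k'}(\omega)} \varphi_k(\omega) = \sum_{k=1}^{K} \tilde{\nu}_k(\omega)\, \varphi_k(\omega). \tag{2}$$

In (2), $\nu_k(\omega)$ and $\tilde{\nu}_k(\omega)$ denote the compatibility and the normalized compatibility of $\omega$ with the fuzzy set $\Omega^k$ of the kth fuzzy rule, respectively, which can be computed as

$$\nu_k(\omega) = \prod_{i=1}^{d} \nu_{\Omega_i^k}(\omega_i), \qquad \tilde{\nu}_k(\omega) = \frac{\nu_k(\omega)}{\sum_{k'=1}^{K} \nu_{k'}(\omega)}. \tag{3}$$

The Gaussian membership function is often used as the fuzzy membership function in (3); it can be formulated as

$$\nu_{\Omega_i^k}(\omega_i) = \exp\left(-\frac{(\omega_i - c_i^k)^2}{2\delta_i^k}\right), \tag{4}$$

where $c^k = [c_1^k, c_2^k, \ldots, c_d^k]^T$ and $\delta^k = [\delta_1^k, \delta_2^k, \ldots, \delta_d^k]^T$ in each rule represent the kernel center vector and the kernel width vector to be learned in the if-parts.

Fig. 1 shows the framework of a knowledge-driven TSK fuzzy system, in which the parts in the shaded rectangle can be considered as knowledge. Generally speaking, the parameters involved in each fuzzy rule are different from those of the other rules; hence we call the parameters of each fuzzy rule special knowledge, and accordingly the classical TSK fuzzy system shown in Fig. 1 is a special knowledge-driven TSK fuzzy system. However, as in the application scenarios described in Section I, a special knowledge-driven TSK fuzzy system sometimes cannot solve a real-world problem interpretably.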
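The inference described by (2)-(4) can be illustrated with a minimal sketch. It assumes a product conjunction, the Gaussian membership form exp(-(ω_i - c_i^k)^2 / (2δ_i^k)), and one-order consequents stored with the bias in column 0; the function names and array layout are illustrative choices, not part of the original method description.

```python
import numpy as np

def gaussian_memberships(x, centers, widths):
    """Membership of every feature of x in every rule's fuzzy set.

    x: (d,) sample; centers, widths: (K, d) if-part parameters.
    Returns a (K, d) array of membership degrees.
    """
    return np.exp(-((x - centers) ** 2) / (2.0 * widths))

def tsk_output(x, centers, widths, consequents):
    """Weighted-sum defuzzification of a one-order TSK system.

    consequents: (K, d + 1); column 0 is the bias, columns 1..d the slopes.
    """
    firing = gaussian_memberships(x, centers, widths).prod(axis=1)  # (K,)
    normalized = firing / firing.sum()                              # sums to 1
    rule_outputs = consequents[:, 0] + consequents[:, 1:] @ x       # (K,)
    return float(normalized @ rule_outputs)
```

With two rules whose centers are symmetric around the input, both rules fire equally and the output is the plain average of the two rule outputs.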
Usually, the learning process of the classical TSK fuzzy system is divided into two parts, if-part learning and then-part learning, which are often carried out separately. For if-part learning, clustering algorithms [10]-[13] are usually adopted. For example, by introducing FCM [11], $c_i^k$ in $c^k$ and $\delta_i^k$ in $\delta^k$ can be computed by

$$c_i^k = \frac{\sum_{j=1}^{N} u_{jk}\, \omega_{ji}}{\sum_{j=1}^{N} u_{jk}}, \tag{5}$$

$$\delta_i^k = \frac{h \sum_{j=1}^{N} u_{jk}\, (\omega_{ji} - c_i^k)^2}{\sum_{j=1}^{N} u_{jk}}, \tag{6}$$

where $u_{jk}$ denotes the fuzzy membership degree of $\omega_j$ belonging to the kth cluster, $\omega_{ji}$ is the ith feature of $\omega_j$, and $h$ is a scale parameter that can be set manually. For then-part learning, the commonly used optimization strategy is quadratic programming (QP) with different criteria, e.g., the least square criterion [14], the ε-insensitive criterion [15] and L1-norm penalty [15], the ε-insensitive criterion and L2-norm penalty [16], and so on.
In addition to QP, gradient descent-based approaches are sometimes used. However, QP, gradient descent, and FCM all become inefficient on large-scale datasets. Therefore, a high-efficiency optimization algorithm is desired in TSK fuzzy system modeling.
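As a hedged illustration of the FCM-based if-part estimation in (5) and (6), the sketch below computes kernel centers and widths from a given membership matrix. It assumes the memberships `U` have already been produced by a clustering step (not shown) and that the centers and widths are membership-weighted means and scaled weighted variances; the function name is hypothetical.

```python
import numpy as np

def if_part_from_memberships(X, U, h=1.0):
    """Estimate if-part parameters from fuzzy memberships.

    X: (N, d) training samples; U: (N, K) membership degrees u_jk;
    h: manually chosen scale parameter.
    Returns centers (K, d) and widths (K, d).
    """
    weights = U / U.sum(axis=0, keepdims=True)             # each column sums to 1
    centers = weights.T @ X                                # weighted means, (K, d)
    sq_dev = (X[:, None, :] - centers[None, :, :]) ** 2    # (N, K, d)
    widths = h * np.einsum("nk,nkd->kd", weights, sq_dev)  # scaled weighted variances
    return centers, widths
```

With crisp memberships the result reduces to per-cluster means and (scaled) per-cluster variances, which makes the role of the scale parameter h easy to check by hand.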

B. GAUSSIAN MIXTURE MODEL
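A Gaussian mixture model (GMM) is a mixture distribution in which each component is a normal distribution. For an arbitrary random variable $\omega$ in a d-dimensional feature space, the Gaussian mixture probability density function (PDF) [6] can be formulated as

$$p(\omega) = \sum_{c=1}^{C} \kappa_c\, \Upsilon(\omega; \theta_c, \Sigma_c). \tag{7}$$

In (7), $C$ is the total number of components and $\kappa = [\kappa_1, \kappa_2, \ldots, \kappa_C]$ is a weight vector in which each element represents the weight of one component, where $0 \le \kappa_c \le 1$ and $\sum_{c=1}^{C} \kappa_c = 1$. $\Upsilon(\omega; \theta_c, \Sigma_c)$ is the normal density of component $c$, which can be expressed as

$$\Upsilon(\omega; \theta_c, \Sigma_c) = \frac{1}{(2\pi)^{d/2} |\Sigma_c|^{1/2}} \exp\left(-\frac{1}{2} (\omega - \theta_c)^T \Sigma_c^{-1} (\omega - \theta_c)\right), \tag{8}$$

where $|\Sigma_c|$ and $\Sigma_c^{-1}$ are the determinant and the inverse of the covariance matrix $\Sigma_c$, respectively.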
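The mixture density in (7) and the component density in (8) can be evaluated with a few lines of NumPy. This is only a sketch of the standard formulas; the function names are illustrative, and no numerical safeguards (e.g., log-domain evaluation) are included.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density at x for one component."""
    d = mean.shape[0]
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    # solve(cov, diff) avoids forming the explicit inverse of cov
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def gmm_pdf(x, weights, means, covs):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf(x, m, S) for w, m, S in zip(weights, means, covs))
```

A mixture of identical components collapses to a single Gaussian, which gives a simple sanity check for the implementation.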
In [17], the authors demonstrate that radial basis function (RBF) networks are universal approximators. In fact, an RBF network is merely a linear superposition of RBFs, of which Gaussian functions are a particular type. In [18], the authors further prove the approximation ability of RBF networks built from a superposition of Gaussian functions when the functions to be approximated are subject to some constraints, such as non-continuity. Combining these theoretical results, it follows that a GMM can approximate any probability distribution with arbitrary accuracy if its parameters are set appropriately.

C. FUNCTIONAL LINK NEURAL NETWORK
A functional link neural network (FLNN) is a special single-layer neural network in which the hidden layer is replaced by higher-order representations of the input features. FLNN overcomes some disadvantages that are common in multilayer networks, such as dependence on initial weights, weight interference, saturation and overfitting. Moreover, although FLNN is a single-layer neural network, it is still able to handle classification tasks that are not linearly separable. Basically, the architecture of FLNN is a flat network without any hidden layer, which makes the parameter learning algorithm less complicated. Many simple learning algorithms, e.g., BP [19], artificial bee colony [20], adaptive learning [21], and pseudoinverse [22], have been proposed for learning FLNN and its variants.
When the Gaussian function is taken as the higher-order representation in FLNN, Fig. 2(a) illustrates the resulting Gaussian FLNN, and the corresponding flat structure is shown in Fig. 2(b).
The output of the Gaussian FLNN can be formulated as

III. CSK-TSK-FS: THE PROPOSED COMMON AND SPECIAL KNOWLEDGE-DRIVEN TSK FUZZY SYSTEM
In this section, we incorporate common knowledge into the classical one-order TSK fuzzy system and accordingly propose the new common and special knowledge-driven TSK fuzzy system CSK-TSK-FS. We then mathematically analyze its equivalence to GMM and FLNN. Lastly, we present a fast training algorithm for CSK-TSK-FS.

A. ARCHITECTURE OF CSK-TSK-FS
The structure of the proposed fuzzy system CSK-TSK-FS is illustrated in Fig. 3, where an input sample in the d-dimensional feature space is expressed as $\omega = [\omega_1, \omega_2, \ldots, \omega_d]^T$, and $c^k$ and $\delta^k$, $k = 1, 2, \ldots, K$, are the kernel center vector and the kernel width vector to be learned in the if-part of each fuzzy rule. Comparing Fig. 3 with Fig. 1, the distinctive characteristic of CSK-TSK-FS is that there exists a common part (i.e., the parameters $\rho_1, \rho_2, \ldots, \rho_d$) in all fuzzy rules.
Based on the structure illustrated in Fig. 3, the kth fuzzy rule of CSK-TSK-FS can be formulated as

$$\text{IF } \omega_1 \text{ is } \Omega_1^k \wedge \omega_2 \text{ is } \Omega_2^k \wedge \cdots \wedge \omega_d \text{ is } \Omega_d^k, \ \text{THEN } \varphi_k(\omega) = \rho_0^k + \rho_1 \omega_1 + \rho_2 \omega_2 + \cdots + \rho_d \omega_d, \quad k = 1, 2, \ldots, K. \tag{10}$$

In (10), the parameters $\rho_1, \rho_2, \ldots, \rho_d$ are kept invariant across all fuzzy rules. We call this common part the common knowledge of CSK-TSK-FS. Therefore, compared with the classical TSK fuzzy system shown in Fig. 1, which is driven only by special knowledge, CSK-TSK-FS is driven by both special and common knowledge and accordingly becomes more suitable for modeling such application scenarios. Also, if $\rho_1, \rho_2, \ldots, \rho_d$ in the then-parts are set to zero, CSK-TSK-FS degenerates into a classical zero-order TSK fuzzy system. Therefore, CSK-TSK-FS can be considered as a generalized zero-order TSK fuzzy system; in other words, a zero-order TSK fuzzy system is a special case of CSK-TSK-FS.
It is well known that the interpretability of a TSK fuzzy system can be quantitatively measured by the number of parameters the system needs to learn [23], [24]. During the training of CSK-TSK-FS, 2Kd parameters in the if-parts and K + d parameters in the then-parts need to be learned. Hence, the interpretability of CSK-TSK-FS can be quantitatively measured by 2Kd + K + d. The classical one-order TSK fuzzy system also needs to learn 2Kd parameters in the if-parts, but it needs to learn Kd + K parameters in the then-parts. In our application scenarios, K and d are integers that are usually greater than 1, so K + d < Kd + K; therefore, compared with the classical one-order TSK fuzzy system, the interpretability of CSK-TSK-FS is improved.
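The parameter counts discussed above are easy to tabulate. The following sketch simply encodes the two formulas, 2Kd + (K + d) for CSK-TSK-FS and 2Kd + (Kd + K) for the classical TSK system whose then-parts carry Kd + K parameters; the function names are illustrative.

```python
def csk_tsk_param_count(K, d):
    """2Kd if-part parameters plus K rule-specific biases and d shared slopes."""
    return 2 * K * d + (K + d)

def classical_tsk_param_count(K, d):
    """2Kd if-part parameters plus K biases and K*d rule-specific slopes."""
    return 2 * K * d + (K + K * d)
```

For instance, with d = 4 features, five CSK-TSK-FS rules need 49 parameters, whereas four classical rules with rule-specific slopes already need 52.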
When the if-parts of CSK-TSK-FS are determined and the notation of (11) is introduced, where $\tilde{\nu}_k(\omega)$ has been defined in (3), $k = 1, 2, \ldots, K$, the classical centroid defuzzification strategy [4] gives the output of CSK-TSK-FS as

$$\gamma_o = \sum_{k=1}^{K} \tilde{\nu}_k(\omega)\, \rho_0^k + \sum_{i=1}^{d} \rho_i\, \omega_i, \tag{12}$$

which follows from (10) because the normalized compatibilities sum to one. The output defined in (12) reveals a notable merit: the common knowledge is independent of the individual fuzzy rules. That is to say, with the classical centroid defuzzification strategy, the consequent of each fuzzy rule can be implicitly shortened to $\varphi_k(\omega) = \rho_0^k$, $k = 1, 2, \ldots, K$. Therefore, the interpretability of CSK-TSK-FS can be further enhanced from the perspective of rule length [49], [50]. By comparing the expressions in (12) and (9), we can easily find that CSK-TSK-FS is in fact equivalent to FLNN. Conversely, FLNN can be considered as a special fuzzy system, such that it no longer works as a black box. To the best of our knowledge, this is the first attempt to reveal the relationship between fuzzy systems and FLNN. The common knowledge, denoted as $\rho_g$, contributes the linear approximator $\rho_g^T \omega_g$, i.e., the second term of the output of CSK-TSK-FS. The relationship between CSK-TSK-FS and GMM is analyzed theoretically in the next subsection.
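The separation of common knowledge from the rules can be checked numerically. The sketch below, under the assumption of Gaussian memberships with product conjunction, computes the CSK-TSK-FS output in two ways: rule by rule, with the shared slopes inside every consequent, and in separated form, with the shared linear term added once. The names `rho0` (rule-specific biases) and `rho` (shared slopes) are illustrative.

```python
import numpy as np

def normalized_firing(x, centers, widths):
    """Normalized compatibilities of x with each of the K rules."""
    firing = np.exp(-((x - centers) ** 2) / (2.0 * widths)).prod(axis=1)
    return firing / firing.sum()

def csk_output_rulewise(x, centers, widths, rho0, rho):
    """Every rule carries its own bias plus the shared linear term."""
    nu = normalized_firing(x, centers, widths)
    return float(nu @ (rho0 + rho @ x))

def csk_output_separated(x, centers, widths, rho0, rho):
    """Shared linear term pulled out of the rules and added once."""
    nu = normalized_firing(x, centers, widths)
    return float(nu @ rho0 + rho @ x)
```

Both functions return the same value because the normalized compatibilities sum to one; setting `rho` to zero recovers a zero-order TSK output.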

B. FROM GMM TO CSK-TSK-FS
Theoretically, a GMM can approximate any probability distribution to arbitrary accuracy [17], [18], so it can be taken as a high-efficiency approximator. Suppose $\chi = \{(\omega_j, \gamma_j) \mid j = 1, 2, \ldots, N\}$ is a training dataset for a GMM with $C$ components, whose component means and covariance matrices $\Sigma_c$ are obtained by a training algorithm such as EM [30], and write $\Sigma_c$ and its inverse $\Sigma_c^{-1}$ in block form as in (13), where $i, j = 1, 2, \ldots, d$. $\Sigma_c$ is a symmetric matrix, hence $\tau_{c\omega\gamma}$ is equal to $\tau_{c\gamma\omega}^T$. Thus, $\Sigma_c^{-1}$ can be expressed in the form (14). The block $\tau_{c\omega\gamma}$ reveals the degree of correlation between $\omega$ and $\gamma$ in component $c$, and $\tau_{c\gamma\gamma}$ reveals the degree of correlation of $\gamma$ with itself in component $c$. In many cases, we have no prior information about each component; hence, a mild assumption that $\tau_{c\omega\gamma}/\tau_{c\gamma\gamma}$ is invariant across components may be considered. That is to say, $\tau_{c\omega\gamma}/\tau_{c\gamma\gamma}$ is the same constant d-dimensional vector for every component. Therefore, the joint PDF of $(\omega, \gamma)$ can be approximated by a special GMM approximator, given in (15).

With the approximator trained on $\chi$, the output $\gamma_o$ for an unseen sample $\omega$ can be formulated as in (16), which can be re-organized as (17). In (17), component $c$ of the GMM approximator can be formulated as (18). Similarly, following the derivation in the Appendix, the marginal PDF of $\omega$ for component $c$ can be obtained, and from it the marginal PDF of $\omega$ in (21). By substituting (18) and (21) into (17), the expected output $\gamma_o$ for the unseen sample $\omega$ can be re-organized as (22). In (22), with the assumption that the features in $\omega$ are mutually independent, the output $\gamma_o$ takes the form (23).

After comparing the output $\gamma_o$ in (23) with that in (9) or (12), we can easily find that the GMM approximator with the invariance assumption on $\tau_{c\omega\gamma}/\tau_{c\gamma\gamma}$ is equivalent to CSK-TSK-FS (and also to FLNN): each component of the GMM can be taken as a fuzzy rule of CSK-TSK-FS, in which $\Upsilon_j(\omega_j; \theta_{cj}, \tau_{cjj})$ is considered as the fuzzy membership function. Here, please note that $\kappa_c$ in the GMM should be uniform, i.e., $\kappa_c$ should be set to $1/K$. The common knowledge is identified in (23).
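To make the idea of predicting γ from a joint GMM over (ω, γ) concrete, the following sketch computes the conditional expectation E[γ | ω] using the standard Gaussian conditioning formulas written in terms of covariance blocks. This is a generic Gaussian mixture regression sketch and does not impose the paper's invariance constraint or its precision-matrix notation; the function name and the convention that γ is the last coordinate are assumptions made for illustration.

```python
import numpy as np

def gmr_predict(x, weights, means, covs):
    """E[y | x] for a GMM over the joint vector (x, y), with scalar y last.

    weights: C mixing weights; means: C vectors of length d + 1;
    covs: C covariance matrices of shape (d + 1, d + 1).
    """
    d = x.shape[0]
    resp, cond_means = [], []
    for w, m, S in zip(weights, means, covs):
        m_x, m_y = m[:d], m[d]
        S_xx, S_yx = S[:d, :d], S[d, :d]
        diff = x - m_x
        sol = np.linalg.solve(S_xx, diff)
        # marginal density of x under this component
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S_xx))
        resp.append(w * np.exp(-0.5 * diff @ sol) / norm)
        # conditional mean of y given x under this component
        cond_means.append(m_y + S_yx @ sol)
    resp = np.array(resp)
    resp = resp / resp.sum()  # plays the role of a normalized firing strength
    return float(resp @ np.array(cond_means))
```

The normalized responsibilities act like normalized rule compatibilities and the per-component conditional means act like rule consequents, which is the structural analogy the equivalence argument builds on.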
Based on the above analysis, the relationship between CSK-TSK-FS and the special GMM approximator indicates the following three results:
1) The training of CSK-TSK-FS can be achieved by using a density estimation algorithm, e.g., EM [30], to train a special Gaussian mixture model.
2) With the relationship between CSK-TSK-FS and GMM, CSK-TSK-FS can be interpreted from the perspective of probability and statistics. Therefore, some statistical tools for GMM can also be applied to CSK-TSK-FS. For example, since the number of fuzzy rules in CSK-TSK-FS is equal to the number of components in the GMM, the many useful tools for selecting the optimal number of components can be used for selecting the optimal number of fuzzy rules.
3) The promising approximation ability of CSK-TSK-FS can be ensured since GMM is a global approximator.

Given a training set $\chi = \{(\omega_j, \gamma_j) \mid j = 1, 2, \ldots, N\}$, the training of CSK-TSK-FS can be achieved with many criteria [14]-[16]. For example, with the if-parts determined, we can employ the gradient descent algorithm [26] to minimize an error criterion, or the then-parts can be solved by QP-based learning [14]-[16]. Although gradient descent-based and QP-based algorithms are easy to implement, both consume many CPU seconds on large-scale datasets. Moreover, the clustering techniques used in if-part learning also consume many CPU seconds on large-scale datasets. Therefore, a fast training algorithm for CSK-TSK-FS is desired. Owing to the equivalence relationships among CSK-TSK-FS, GMM and FLNN, some of the training algorithms for GMM and FLNN, e.g., EM [30] for GMM, and BP [19], artificial bee colony [20], adaptive learning [21], and pseudoinverse for FLNN, can also be used for CSK-TSK-FS. However, in this study, we propose a new fast training algorithm for CSK-TSK-FS, which can also be used for GMM and FLNN.
In [35], the authors demonstrate that the modeling of a TSK fuzzy system can be replaced by the modeling of a fuzzy neural network. Since CSK-TSK-FS is indeed a TSK fuzzy system, it can also be considered as a fuzzy neural network; see Fig. 4.
Essentially, CSK-TSK-FS is a special TSK fuzzy system. Our previous work [35] reveals that a TSK fuzzy system can be equivalent to a fuzzy neural network, so CSK-TSK-FS can also be considered as the special neural network shown in Fig. 4, where only one feature is involved in each yellow node of the hidden layer. In [8], [9], the authors demonstrate that the optimization of a single-layer feedforward neural network (SLFN) is equivalent to solving a ridge regression problem, which can be solved quickly by the least learning machine (LLM). Therefore, the neural network in Fig. 4 can also be solved quickly by LLM, simply by augmenting the hidden layer with the original input features.
Different from BP-like learning algorithms [32]-[34], only the parameters in the output layer need to be trained in LLM. Therefore, LLM can achieve fast learning of the SLFN illustrated in Fig. 4.
The detailed steps of CSK-TSK-FS training are listed in Algorithm 1.

Algorithm 1 Fast Training of CSK-TSK-FS
Step 3: Compute the output weights by using LLM, where I is a (K + d) × (K + d) identity matrix.
When the weight vector of the output layer is determined, CSK-TSK-FS can make a prediction for an unseen sample. Next, we give some remarks about the fast learning algorithm listed in Algorithm 1.
Remark 1: In this fast training algorithm, the parameters in the if-parts are randomly assigned rather than obtained by clustering techniques. The effectiveness of this randomness strategy has been demonstrated in ELM [35]. Compared with clustering techniques, the randomness strategy significantly reduces the CPU time consumed. Moreover, with the high-efficiency analytical solution for then-part learning in (25), the CPU time is also reduced compared with BP-like algorithms, in which all parameters in the network need to be iteratively adjusted by backward gradient descent.
Remark 2: It is well known that matrix inversion becomes very time-consuming when the matrix is very large, so the solution in (25) still becomes impractical for large-scale datasets. However, with the second property of LLM in [8], the analytical solution of LLM in (25) can be re-organized in another form, given in (26). With the new analytical solution, the time complexity is independent of the number of samples N; it depends only on the number of fuzzy rules K and the number of features d. For large-scale datasets, K and d are much smaller than N. Hence, the time complexity is significantly reduced.
Remark 3: In (26), C is a user-defined parameter. According to our previous work [35], it is set to a comparatively large value, e.g., 200 in the following experiments.
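The overall idea of the fast training procedure can be sketched as follows: if-part parameters are assigned at random, the normalized rule compatibilities are augmented with the original features, and the output weights are obtained from a closed-form ridge regression whose system matrix has size (K + d) × (K + d). This is a minimal sketch under stated assumptions, not the authors' exact Algorithm 1: drawing centers from the training samples, drawing widths uniformly from [0.5, 1.5], and the specific form beta = (I/C + HᵀH)⁻¹Hᵀy (the standard ridge solution) are choices made here for illustration, and all function names are hypothetical.

```python
import numpy as np

def hidden_matrix(X, centers, widths):
    """Stack normalized rule compatibilities (N, K) with raw features (N, d)."""
    sq = (X[:, None, :] - centers[None, :, :]) ** 2                # (N, K, d)
    firing = np.exp(-(sq / (2.0 * widths[None, :, :])).sum(axis=2))  # (N, K)
    firing = firing / firing.sum(axis=1, keepdims=True)
    return np.hstack([firing, X])                                  # (N, K + d)

def fit_csk_tsk(X, y, K, C=200.0, seed=0):
    """Random if-parts, then a ridge solve for the K biases and d shared slopes."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    centers = X[rng.choice(len(X), size=K, replace=False)]  # assumed sampling scheme
    widths = rng.uniform(0.5, 1.5, size=(K, d))             # assumed width range
    H = hidden_matrix(X, centers, widths)
    A = np.eye(K + d) / C + H.T @ H                         # (K + d) x (K + d)
    beta = np.linalg.solve(A, H.T @ y)
    return centers, widths, beta

def predict_csk_tsk(X, centers, widths, beta):
    """Predicted outputs for the rows of X."""
    return hidden_matrix(X, centers, widths) @ beta
```

Because the linear system is only (K + d) × (K + d), the cost of the solve does not grow with the number of samples once HᵀH has been accumulated, which matches the motivation given in Remark 2.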

IV. EXPERIMENTAL RESULTS
In this section, CSK-TSK-FS is mainly evaluated from two aspects: its classification ability on UCI and KEEL datasets and its application for epileptic EEG signals recognition.In addition, in order to highlight the performance of CSK-TSK-FS, several benchmarking approaches including SVM (with the linear kernel and the Gaussian kernel, respectively) [39], FS-FCSVM [24], zero-order-TSK-FS [4], GFS-AdaBoost-C [38], FH-GBML-C [36], [37] and L2-TSK-FS [4] are introduced for comparison studies.
The experiments are organized as follows: subsection IV.A gives the experimental setup, subsection IV.B shows the experimental results on UCI and KEEL datasets, and subsection IV.C gives an application to epileptic EEG signal recognition.

A. EXPERIMENTAL SETUP
Among the benchmarking approaches, SVM, FS-FCSVM, zero-order-TSK-FS and L2-TSK-FS are implemented in MATLAB, while FH-GBML-C and GFS-AdaBoost-C are provided by the KEEL toolbox [40]. L2-TSK-FS, zero-order-TSK-FS, and the proposed approach CSK-TSK-FS are originally designed for regression problems. In our experiments, following [41], all of them are trained for classification tasks by treating the class labels of the training set as regression targets. For an unseen object, each approach predicts the label that is nearest to its output.
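The label-decoding step for regression-style classifiers can be sketched in a few lines. The snippet assumes numeric class labels and maps each real-valued output to the closest label; the function name is illustrative.

```python
import numpy as np

def nearest_label(outputs, labels):
    """Map each regression output to the numerically closest class label."""
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels)
    idx = np.abs(outputs[:, None] - labels[None, :]).argmin(axis=1)
    return labels[idx]
```

Outputs that fall outside the label range are simply assigned to the nearest extreme label.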
We use the default parameters provided by the KEEL toolbox for FH-GBML-C and GFS-AdaBoost-C. For the remaining approaches, 20% of the objects are used as validation objects to find the optimal parameters by a 10-fold CV strategy. Table 1 gives the trial intervals for CV in the corresponding approaches. After the optimal parameters of each approach are obtained, 70% of the objects are selected for training and 10% of the objects are selected for testing. The results are reported in terms of the average testing accuracy (with the corresponding standard deviation) and the maximum testing accuracy over 30 trials.
The experiments are conducted on a personal computer with 4 cores of I5-7200U with 64G Bytes of memory.

B. ON UCI AND KEEL DATASETS
Since UCI [45] and KEEL [40] are two commonly used repositories for verifying machine learning approaches, we select sixteen real-life datasets, including binary-class and multi-class ones, to verify the classification performance of CSK-TSK-FS. Table 2 shows the detailed information of the selected datasets, in which some medium-scale (i.e., Adult, Magic04) and large-scale (i.e., Skin-Segmentation, Kddcup99) datasets are used to observe the CPU seconds consumed by CSK-TSK-FS.
Table 3 reports the classification performance of all approaches in terms of different criteria: "Max" represents the maximum accuracy over 30 trials, "Rules" represents the optimal number of fuzzy rules obtained by CV, "Mean" represents the average accuracy over 30 trials, and "Std" represents the standard deviation. Although GFS-AdaBoost-C is also a fuzzy rule-based classifier, its number of fuzzy rules is not provided by the KEEL toolbox; therefore, we use "-" to represent its number of fuzzy rules in Table 3. Next, we comparatively analyze the results from the perspectives of classification performance and interpretability.
1) CSK-TSK-FS achieves the best average accuracy and the best maximum accuracy on 7 and 8 of the 16 UCI and KEEL datasets, respectively. On some datasets, CSK-TSK-FS performs slightly worse than other benchmarking approaches, e.g., Gaussian kernel-based SVM on Adult, Sonar, Seismic-bumps and Kddcup99, and linear kernel-based SVM on Musk, Skin-Segmentation, Balance and Page_blocks. However, we should keep in mind that CSK-TSK-FS is interpretable while SVMs work in a black-box way. Moreover, by comparing CSK-TSK-FS with zero-order-TSK-FC, L2-TSK-FC and FS-FCSVM, we find that CSK-TSK-FS often achieves the best classification performance, which indicates that CSK-TSK-FS generalizes better than these similar approaches. In fact, the promising performance of CSK-TSK-FS on most datasets indicates that it indeed inherits the good approximation ability of GMM.
2) The number of fuzzy rules is related to the interpretability of a fuzzy system. From Table 3, we can see that, in some cases, CSK-TSK-FS uses more fuzzy rules than zero-order-TSK-FC, L2-TSK-FC or FS-FCSVM. For example, on the dataset Balance, 5 fuzzy rules are identified by CSK-TSK-FS to reach its best performance, whereas L2-TSK-FC and FS-FCSVM each need 4 fuzzy rules to reach their best performance. Therefore, it seems that, compared with L2-TSK-FC and FS-FCSVM, the interpretability of CSK-TSK-FS is reduced. However, interpretability is also highly related to the number of parameters involved in the fuzzy rules. On Balance, L2-TSK-FC and FS-FCSVM need to train 2 × 4 × 4 + (4 + 4 × 4) = 52 parameters, while CSK-TSK-FS needs only 2 × 5 × 4 + (5 + 4) = 49 because of the common knowledge involved. This indicates that the common knowledge in CSK-TSK-FS can improve interpretability because it implicitly shortens the fuzzy rules.
In Table 4, we report the CPU seconds each approach consumes during the training and testing procedures. From Table 4, we find that CSK-TSK-FS performs more efficiently than the other benchmarking approaches, especially on medium-scale or large-scale datasets (e.g., Magic04, Adult, Skin-Segmentation, and Kddcup99).
From the experimental results on UCI and KEEL datasets, we can draw the following conclusions.
1) With the common knowledge, CSK-TSK-FS becomes an FLNN and hence a single-layer fuzzy neural network. Thus, its training can be cast as a ridge regression problem that can be solved quickly by LLM.
2) With the common knowledge, the length of the fuzzy rules can be implicitly reduced such that the interpretability is improved.
Moreover, since the common knowledge does not depend on any individual fuzzy rule, the consequent of each fuzzy rule can be implicitly shortened, e.g., to $\varphi_1(\omega) = 6.9810$ in the first fuzzy rule. Therefore, the interpretability of CSK-TSK-FS is enhanced compared with TSK fuzzy systems driven only by special knowledge.
With the trained CSK-TSK-FS, it is very easy to present the corresponding FLNN based on the equivalence between them; see Fig. 6. Conversely, with a trained FLNN, we can immediately write out all fuzzy rules of CSK-TSK-FS. In addition, owing to this equivalence, FLNN is no longer a black box; it can be interpreted from the perspective of fuzzy rules.
Similarly, from the trained CSK-TSK-FS, we can also deduce the corresponding GMM; see the mean vector and covariance matrix of each component in Table 7.
In Table 7, $\theta_{c\gamma}$ in $\theta_c$ can be calculated from $\rho_0^k$ and $\theta_{c\omega}$, where $c = k$ and $\kappa_c = 1/K$. Also, $\tau_{cij}$ in $\Sigma_c$ can be obtained from the trained CSK-TSK-FS. Under the assumption that $\tau_{c\omega\gamma}/\tau_{c\gamma\gamma}$ is the same constant vector for every component, only this ratio is fixed, so the values of $\tau_{c\omega\gamma}$ and $\tau_{c\gamma\gamma}$ admit many choices. That is to say, multiple equivalent GMMs can be derived from one CSK-TSK-FS, which is interesting and may be beneficial for practical requirements. Table 8 gives the training accuracy of all approaches in terms of "Max", "Mean", "Std" and "Rules". CSK-TSK-FS achieves the best performance in terms of "Mean" and "Rules".

V. CONCLUSION
In this paper, a novel fuzzy system CSK-TSK-FS driven by special and common knowledge is proposed, in which the common knowledge is defined as the parts shared by all fuzzy rules while the special knowledge corresponds to the parts that differ. More specifically, the parameters assigned to the one-order part of the then-parts are kept invariant across rules. When the classical centroid defuzzification method is adopted, the involved common knowledge can be separated from the fuzzy rules such that the interpretability is enhanced and the model complexity is reduced. In addition, for the modeling of CSK-TSK-FS, besides traditional gradient descent-based and QP-based approaches, we demonstrate that CSK-TSK-FS is mathematically equivalent to a special Gaussian mixture model and a functional link neural network, such that it can also be determined from a trained GMM or a trained FLNN. In other words, traditional training algorithms of GMM and FLNN, such as EM and BP, can also be applied to CSK-TSK-FS. Furthermore, since CSK-TSK-FS is a special neural network, we develop a fast LLM-based algorithm for its modeling in this study; that is to say, we also find a new fast training algorithm for GMM and FLNN. In our experiments, UCI and KEEL datasets are first used to demonstrate the classification ability of CSK-TSK-FS, and then an application dataset, i.e., the epileptic EEG data, is introduced for abnormal signal recognition.
In future work, we are interested in developing a deep TSK fuzzy system in which common knowledge is embedded among different layers.

FIGURE 1. Framework of a special knowledge-driven TSK fuzzy system.
$\chi = \{(\omega_j, \gamma_j) \mid j = 1, 2, \ldots, N\}$ is a training dataset for a Gaussian mixture model that contains $C$ components. With the training dataset $\chi$, the means $(\theta_1, \theta_2, \ldots, \theta_C)$ and covariance matrices $(\Sigma_1, \Sigma_2, \ldots, \Sigma_C)$ of all components can be obtained by a training algorithm, e.g., EM [30], where $\theta_c = [\theta_{c\omega}, \theta_{c\gamma}]$. If we use $\tau_{cab}$ to denote the elements of $\Sigma_c$, with a corresponding notation for the elements of its inverse $\Sigma_c^{-1}$, then $\Sigma_c$ and $\Sigma_c^{-1}$ can be expressed in block form as in (13).

TABLE 8. Each component of GMM based on CSK-TSK-FS.

TABLE 1. The trial intervals for CV in the corresponding approaches.

TABLE 2. Detailed information of selected UCI and KEEL datasets.

TABLE 3. The classification performance on UCI and KEEL datasets.

TABLE 4. The CPU seconds each approach consumes on UCI and KEEL datasets.

TABLE 5. Detailed information about the epileptic EEG data.

TABLE 6. Training results of CSK-TSK-FS on the epileptic EEG data.

FIGURE 5. Original signals in five groups.

TABLE 7. Each component of GMM based on CSK-TSK-FS.