An Efficient LS-SVM-Based Method for Fuzzy System Construction

This paper proposes an efficient learning mechanism for building fuzzy rule-based systems through the construction of sparse least-squares support vector machines (LS-SVMs). Besides significantly reducing the computational cost of model training, the resulting LS-SVM-based fuzzy system is sparser while offering satisfactory generalization over unseen data. LS-SVMs are well known to have a computational advantage over conventional SVMs in model training; however, they lose model sparseness, which is their main drawback and remains an open problem. To tackle the nonsparseness issue, a new regression alternative to the Lagrangian solution of the LS-SVM is first presented. A novel efficient learning mechanism is then proposed to extract a sparse set of support vectors for generating fuzzy IF-THEN rules. This mechanism works in a stepwise subset selection manner, with a forward expansion phase and a backward exclusion phase in each selection step. The algorithm is computationally very efficient because it introduces a few key techniques that avoid matrix inverse operations and thereby accelerate training; a detailed computational complexity analysis confirms this efficiency. As a result, the proposed approach not only achieves sparseness of the resulting LS-SVM-based fuzzy systems but also significantly reduces the computational effort of model training. Three experimental examples demonstrate the effectiveness and efficiency of the proposed learning mechanism and the sparseness of the obtained LS-SVM-based fuzzy systems, in comparison with other SVM-based learning techniques.

linguistic model interpretable to the users. The key stage in constructing fuzzy systems usually involves rule extraction and the associated parameter learning. It is desirable to find a sparse set of fuzzy rules that provides a concise, interpretable explanation of the behavior of the system under investigation. As a result, a variety of rule extraction methods have been proposed in the literature, including heuristic, adaptive, evolutionary, and statistical learning methods.
Among the various rule extraction methods, the grid partition method divides the input space into rectangular subspaces based on a uniform partitioning of each input variable into fuzzy sets [5]. To cope with the curse of dimensionality caused by grid partitioning, various clustering methods were devised for fuzzy rule generation [6]-[8], where the number of fuzzy sets employed for each input variable equals the number of fuzzy rules in the whole fuzzy system. Moreover, rank-revealing methods such as SVD-QR and pivoted QR decomposition [9]-[11] determine the effective rank of the matrix constructed from all the rule premises (i.e., the normalized rule firing strength matrix) according to its singular values. However, these methods work only in the input space, so the selected rules are not necessarily related to the output, and the final model performance may fall short of expectations. Orthogonal least squares (OLS) is another well-researched method [12], [13], which performs rule base reduction using both the input and output spaces. The fast recursive algorithm (FRA) recently developed by Li et al. [14] is a useful alternative to OLS that avoids any matrix decomposition during subset selection. Gradient descent and evolutionary optimization are also used in fuzzy rule extraction and parameter learning to find better global solutions [15]-[18], but they remain very time-consuming. Recently, using support vector machine (SVM) methodologies to extract support vectors (SVs) for generating IF-THEN rules, and thus to describe the fuzzy system in terms of kernel functions, has attracted considerable research interest in rule extraction, and this constitutes the main topic of this paper.
SVMs [19] are techniques for solving pattern classification problems that are based on the principle of structural risk minimization rather than mean squared-error minimization, and thus minimize an upper bound on the model's generalization error. Building on this, fuzzy rule extraction incorporating SVM or support vector regression (SVR) has attracted much interest [20]-[23]. Chiang and Hao [20] first introduced fuzzy model construction using SVM techniques, relating the kernel function of an SVM to the fuzzy basis function (FBF) so as to fuse the two mechanisms into a fuzzy rule-based modeling method. The fuzzy rules are generated by the SV extraction mechanism, so the number of fuzzy rules equals the number of SVs. To further decrease the number of fuzzy rules, a Takagi-Sugeno (T-S) fuzzy system based on support vector regression (TSFS-SVR) was proposed [23]. In the TSFS-SVR, the number of fuzzy rules is determined by a one-pass clustering algorithm, and a new T-S kernel corresponding to a T-S-type fuzzy rule is constructed from the product of a cluster output and a linear combination of the input variables.

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
However, apart from the large number of SVs that the SVM learning mechanism may generate, another issue is the high computational complexity of solving a dual quadratic programming (QP) problem, which led to the development of least-squares SVMs (LS-SVMs). LS-SVMs replace the inequality constraints of two-norm SVMs with equality constraints, so that training requires solving a linear Karush-Kuhn-Tucker (KKT) system rather than the QP problem of the traditional SVM. Unfortunately, a major drawback of an LS-SVM model is its nonsparseness [24]: all the training patterns are used as SVs in the final classifier, so the complexity of the learned classifier is extremely high. Therefore, despite the computational advantage of LS-SVMs, nonsparseness still restricts the development of LS-SVM-based fuzzy systems, since the final rule base can be extremely large, with as many fuzzy rules as training patterns. A conventional strategy to overcome this drawback is to impose sparseness by pruning [25], where a series of LS-SVMs is trained repeatedly and, each time, a small fraction (for example, 5%) of the training instances with the smallest support values is discarded. However, this procedure inevitably increases the computational burden, and the resulting model performance cannot be guaranteed. Two fast sparse approximation schemes for training LS-SVMs (FSALS-SVM and PFSALS-SVM) have also been proposed [26]. They are greedy algorithms based on viewing the Wolfe dual problem of LS-SVMs as a regularized loss function induced by a reproducing kernel Hilbert space (RKHS). Based on these observations, this paper addresses both the sparseness issue and the computational demand associated with the development of LS-SVM-based fuzzy systems.
The main contribution of this paper is an efficient learning mechanism for constructing sparse LS-SVM-based fuzzy systems with significantly reduced computational demand. The novel techniques employed are summarized as follows. First, the LS-SVM learning mechanism provides a framework for extracting SVs to generate fuzzy IF-THEN rules and for formulating the fuzzy rule-based system as a series expansion of FBFs. To deal with the nonsparseness of a conventional LS-SVM, a new regression alternative to the Lagrangian solution of the LS-SVM is presented. This regression solution optimizes the same objective function defined in the LS-SVM and attains a better objective value than the conventional one. Second, a novel learning mechanism is proposed to extract a sparse set of SVs from the training instances for generating fuzzy IF-THEN rules. The mechanism works in a stepwise subset selection manner: each step includes a forward expansion phase that selects the most significant SV and a backward exclusion phase that reevaluates the previously selected SVs to identify the least significant one, both phases working in a regularized least-squares sense. Finally, a few key techniques are proposed to avoid matrix inverse operations completely and to accelerate training, yielding an efficient learning algorithm with low computational complexity. It is also worth mentioning that the second-stage technique [27] for refining a subset of fixed size has been shown to be extremely effective in improving the results produced by stepwise forward subset selection. However, its computational demand is still high, and the original second-stage algorithm selects a subset of terms of a fixed size. In this paper, the second-stage idea is also incorporated into the proposed algorithm, and our method is shown to achieve outstanding performance. With all these key techniques, the proposed approach achieves both computational reduction and model sparseness in developing LS-SVM-based fuzzy systems, and each of these two advantages surpasses the corresponding strength of conventional SVMs or LS-SVMs. Three simulated and real-world examples on modeling, prediction, and classification problems, respectively, demonstrate the efficiency of the novel learning mechanism and the sparseness of the constructed LS-SVM-based fuzzy systems.
This paper is organized as follows. Section II gives a brief description of fuzzy rule-based systems. The mathematical formulation of the LS-SVMs and the new regression solution are then presented in Section III. Section IV proposes the efficient learning mechanism for the construction of sparse LS-SVM-based fuzzy systems. Results from three applications, namely nonlinear system modeling, melt pressure prediction in polymer extrusion, and mammographic mass diagnosis, are presented in Section V. Finally, Section VI concludes this paper.

II. FUZZY RULE-BASED SYSTEMS
This section describes the mathematical formulation of fuzzy rule-based systems. As indicated in [10] and [20], fuzzy rule-based systems follow a "divide and conquer" strategy: using a number of interpretable fuzzy rules, the rule premises first partition the original input space into a set of small fuzzy input regions, and the rule consequents then describe the system behavior within each small fuzzy region via various constituents. The most common fuzzy rule-based system therefore consists of a set of linguistic fuzzy rules, the ith rule being represented by

$$R_i:\ \text{IF } x_1(t) \text{ is } A_{i,1} \text{ and } \cdots \text{ and } x_n(t) \text{ is } A_{i,n},\ \text{THEN } \hat{y}_i(t) = \theta_i \tag{1}$$

where $t$ denotes the sampling instant, $i$ is the rule index with a total of $m$ fuzzy rules, $x(t) = [x_1(t), \ldots, x_n(t)] \in \Re^n$ is the $n$-dimensional input vector of the system of interest, $A_{i,j}$ is the fuzzy set associated with the ith rule for the input variable $x_j(t)$, $\theta_i$ is the constant constituent of the ith rule consequent, and $\hat{y}_i(t)$ is the output of the ith rule. The Gaussian membership function

$$\mu_{A_{i,j}}(x_j(t)) = \exp\left(-\frac{(x_j(t) - c_{i,j})^2}{2\sigma_{i,j}^2}\right) \tag{2}$$

is commonly employed for the fuzzy set $A_{i,j}$ in the input space, where $c_{i,j}$ and $\sigma_{i,j}$ denote, respectively, the center and standard deviation of the ith membership function for the jth input ($j = 1, \ldots, n$). To infer the fuzzy system output, a T-norm operator, here the product, is applied to compute the ith rule firing strength

$$f_i(x(t)) = \prod_{j=1}^{n} \mu_{A_{i,j}}(x_j(t)). \tag{3}$$

The degree of fulfillment (normalized firing strength) of the ith rule is then given by

$$N_i(x(t); W) = \frac{f_i(x(t))}{\sum_{l=1}^{m} f_l(x(t))} \tag{4}$$

where $W$ denotes the premise parameter vector collecting the centers $c_{i,j}$ and widths $\sigma_{i,j}$. The weighted-average defuzzification method can then be employed to calculate the overall output of the fuzzy rule-based system:

$$\hat{y}(t) = \sum_{i=1}^{m} N_i(x(t); W)\,\theta_i \tag{5}$$

where $\Theta = [\theta_1, \ldots, \theta_m]^T$ denotes the consequent parameter vector. $N_i(x(t); W)$ is also called the FBF. In this setting, the fuzzy rule-based system can be viewed as a series expansion of FBFs, and this linear combination of FBFs can approximate any continuous nonlinear function on a compact set to arbitrary accuracy, provided that sufficiently many fuzzy rules are available.
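To make (2)-(5) concrete, the following Python/NumPy sketch evaluates a small rule base for one input vector. It assumes the Gaussian form (2) with the factor 1/2 in the exponent and the product T-norm in (3); the function names and the two-rule example are hypothetical and only illustrate the computation.

```python
import numpy as np

def gaussian_membership(x, centers, sigmas):
    """Membership degrees mu_{A_ij}(x_j) for all rules i and inputs j, eq. (2).

    x: (n,) input vector; centers, sigmas: (m, n) premise parameters.
    """
    return np.exp(-0.5 * ((x - centers) / sigmas) ** 2)

def fuzzy_inference(x, centers, sigmas, theta):
    """Output of an m-rule system with constant consequents theta, eqs. (3)-(5)."""
    memberships = gaussian_membership(x, centers, sigmas)  # (m, n)
    firing = memberships.prod(axis=1)                      # product T-norm, eq. (3)
    fbf = firing / firing.sum()                            # normalized strengths (FBFs), eq. (4)
    return float(fbf @ theta), fbf                         # weighted average, eq. (5)

# two hypothetical rules on a single input, centred at 0 and 2
centers = np.array([[0.0], [2.0]])
sigmas = np.ones((2, 1))
theta = np.array([1.0, 3.0])
y_hat, fbf = fuzzy_inference(np.array([1.0]), centers, sigmas, theta)
# at the midpoint both rules fire equally, so y_hat == 2.0
```

Because the FBFs sum to one, the output is a convex combination of the consequents $\theta_i$, weighted by how strongly each rule fires.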

III. LEAST-SQUARES SUPPORT VECTOR MACHINE AND ITS NEW REGRESSION SOLUTION
SVM [19], [28] is a technique for solving pattern classification problems: it finds a hyperplane $h \cdot x$ (where $h$ is the vector of unknown parameters) that separates two classes of patterns with the maximum margin, because maximizing the two-class margin is equivalent to minimizing an upper bound on the model's generalization error (i.e., structural risk minimization). Due to the high computational complexity of solving the dual QP problem in SVM, the LS-SVM was proposed by replacing the inequality constraints of a conventional two-norm SVM with equality constraints. The LS-SVM takes the form $h \cdot \varphi(x(t))$, where the nonlinear function $\varphi(x(t))$ maps the original input data into a high-dimensional feature space, i.e., $x(t) \in \Re^n \rightarrow \varphi(x(t)) \in \Re^H$, in order to cope with problems that are not linearly separable. Given a set of training patterns $\{x(t), y(t)\}_{t=1}^{N} \in \Re^n \times \{\pm 1\}$, the LS-SVM classification problem is defined by the primal optimization problem (6), where $\mu$ is a regularization parameter that determines the bias-variance tradeoff. Its solution is obtained by introducing the Lagrangian (7), where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N) \in \Re^N$ is the vector of Lagrange multipliers. The minimum with respect to $h$, $\varepsilon(t)$, and $\alpha_t$ is obtained by solving the well-known KKT system (8), $\forall t \in \{1, 2, \ldots, N\}$.
These equations can be rewritten concisely in the matrix form (9), where $M = K + \mu I$ is a symmetric positive definite matrix (for $\mu > 0$) and $K_{i,j} = K(x(i), x(j)) = \varphi(x(i)) \cdot \varphi(x(j))$ is known as the kernel function. Using (8), the LS-SVM classifier can then be written as (10). It is observed from (9) and (10) that the mapping function $\varphi(\cdot)$ involved in solving the KKT system and in producing the final model output need not be known explicitly; only the kernel function values are required, which is the well-known kernel trick. The linear KKT system in (9) can therefore be solved efficiently by direct methods, such as Cholesky decomposition, since $M$ is positive definite. However, a major drawback of an LS-SVM model lies in its nonsparseness [24]. The second equation of (8) shows that the values $\alpha_t$ ($t = 1, \ldots, N$) are in general nonzero, because the errors $\varepsilon(t)$ ($t = 1, \ldots, N$) are in general nonzero. All training patterns therefore contribute to the final model, the importance of each being indicated by its support value. As a result, the LS-SVM loses sparseness, and the resulting model can be extremely large. This is perhaps the main reason limiting the development of LS-SVM-based fuzzy systems. In this paper, a sparse LS-SVM learning mechanism is proposed and integrated into compact fuzzy rule extraction.
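The nonsparseness argument can be checked numerically. The sketch below assumes a regression-form equality constraint $y(t) = h \cdot \varphi(x(t)) + \varepsilon(t)$ with objective $\frac{\mu}{2}h^T h + \frac{1}{2}\sum_t \varepsilon(t)^2$; under that assumption the KKT conditions reduce to $(K + \mu I)\beta = y$ with $\beta = \alpha/\mu$ and $\varepsilon(t) = \mu\beta_t$. This is not claimed to be the exact form of (6)-(10), and the Gaussian kernel and data are hypothetical choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), an assumed kernel choice."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lssvm_train(X, y, mu, sigma):
    """Solve M beta = y with M = K + mu I through a Cholesky factorization."""
    K = gaussian_kernel(X, X, sigma)
    M = K + mu * np.eye(len(y))        # symmetric positive definite for mu > 0
    L = np.linalg.cholesky(M)          # M = L L^T; raises LinAlgError if M is not PD
    z = np.linalg.solve(L, y)          # L z = y (generic solver used for brevity)
    beta = np.linalg.solve(L.T, z)     # L^T beta = z
    return beta, K

def lssvm_classify(X_train, beta, X_new, sigma):
    """Sign of sum_t beta_t K(x, x(t)): every training pattern takes part."""
    return np.sign(gaussian_kernel(X_new, X_train, sigma) @ beta)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + X[:, 1])         # hypothetical two-class labels in {-1, +1}
beta, K = lssvm_train(X, y, mu=0.1, sigma=1.0)
# count of training patterns with a nonzero support value (no sparseness)
n_support = int(np.sum(np.abs(beta) > 1e-10))
```

Since $(K + \mu I)\beta = y$ implies $y - K\beta = \mu\beta$, a support value can vanish only if its pattern is fitted exactly, which is why essentially all $N$ patterns remain in the model.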
To deal with the nonsparseness issue, a new regression alternative to the Lagrangian solution of the LS-SVM is first given. In the conventional solution presented in (9) and (10), the kernel trick is adopted to handle linearly inseparable cases in classification, so the exact mapping function that maps the input data into a high-dimensional feature space need not be known. The authors have recently proposed a method [29] that instead assumes that the mapping function $\varphi(x(t))$ is known and given by

$$\varphi(x(t)) = [\phi_1(x(t)), \ldots, \phi_m(x(t))], \qquad \phi_i(x(t)) = K(x(t), s_i) \tag{11}$$

where $s_i \in \Re^n$ ($i = 1, 2, \ldots, m$) are data vectors from the input space, which can be chosen from the training patterns or otherwise. In this way, the original input space $F^n$ is transformed into another high-dimensional feature space $F^m$. Accordingly, the primal optimization problem (6) of the LS-SVM can be reformulated as

$$\min_{h} J(h) = (y - \Phi h)^T (y - \Phi h) + \mu\, h^T h. \tag{12}$$

This is a regularized least-squares problem, also called ridge regression in statistics. The primal optimization problem of the LS-SVM is thus transformed into a regularized least-squares problem, avoiding the KKT system (8). Setting the gradient of the cost function (12) with respect to the parameter vector $h$ to zero gives the estimated optimal parameter vector

$$\hat{h} = (\Phi^T \Phi + \mu I)^{-1} \Phi^T y \tag{13}$$

where $y = [y(1), \ldots, y(N)]^T$ and $\Phi = [\varphi(x(1))^T, \ldots, \varphi(x(N))^T]^T \in \Re^{N \times m}$. Each row of the mapping matrix $\Phi$ is the high-dimensional mapping of one input vector, while each column is one dimension of the mapped feature space evaluated over all the input data. The LS-SVM classifier for a new test vector $x$ from the input space can thus be written as

$$\hat{y}(x) = \operatorname{sgn}\left(\sum_{i=1}^{m} \hat{h}_i\, \phi_i(x)\right). \tag{14}$$

As with the definition of SVs in an SVM and in the conventional LS-SVM solution, the vectors $s_i$ whose parameters $\hat{h}_i$ are nonzero, and which therefore contribute to the final model output, are the SVs. When, as in the conventional LS-SVM solution, all the training patterns act as SVs, the regression matrix $\Phi \in \Re^{N \times m}$ ($m = N$) in our solution becomes

$$\Phi = \begin{bmatrix} K(x(1), x(1)) & \cdots & K(x(1), x(N)) \\ \vdots & \ddots & \vdots \\ K(x(N), x(1)) & \cdots & K(x(N), x(N)) \end{bmatrix} \tag{15}$$

which is identical to the kernel matrix $K(x(i), x(j))$ of the conventional LS-SVM solution. Assuming that all the training patterns are viewed as SVs, the objective values of the primal problem (6) can be computed for both the conventional solution and our new solution; the new regression solution was compared with the conventional one in [29], which showed its superiority. The kernel matrix $K \in \Re^{N \times N}$ of the conventional solution is thus a special case of the regression matrix $\Phi \in \Re^{N \times m}$ of our solution. However, at this stage neither solution possesses the sparseness property, which is the main drawback of LS-SVM models. Notably, the square shape required of the kernel matrix in the KKT system (8) is no longer required of our regression matrix. The value of $m$ determines how many SVs are included in the final LS-SVM classifier and, in turn, the sparseness and size of the classifier. This characteristic is central to the learning mechanism presented in the next section: since every column of $\Phi$ corresponds to one dimension of the mapped high-dimensional space, a subset of the training patterns can be chosen as the SVs of the LS-SVM.
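A minimal sketch of the regression solution (13) and the classifier (14) follows. It assumes a Gaussian kernel for $\phi_i(x) = K(x, s_i)$ and uses hypothetical data; the point is that $\Phi$ may be non-square when only $m < N$ vectors $s_i$ are used, and that it becomes the symmetric kernel matrix (15) when every training pattern is used as an $s_i$.

```python
import numpy as np

def kernel_mapping(X, S, sigma):
    """Phi[t, i] = phi_i(x(t)) = K(x(t), s_i) with an assumed Gaussian kernel."""
    sq = ((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def regression_solution(X, y, S, mu, sigma):
    """Ridge solution (13): h = (Phi^T Phi + mu I)^{-1} Phi^T y, solved without an explicit inverse."""
    Phi = kernel_mapping(X, S, sigma)              # N x m, need not be square
    m = Phi.shape[1]
    h = np.linalg.solve(Phi.T @ Phi + mu * np.eye(m), Phi.T @ y)
    return h, Phi

def classify(X_new, S, h, sigma):
    """Classifier (14): sign of the expansion over the chosen vectors s_i."""
    return np.sign(kernel_mapping(X_new, S, sigma) @ h)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = np.sign(X[:, 0] - 0.5 * X[:, 1])               # hypothetical labels
S = X[:8]                                          # hypothetical choice: 8 patterns as SVs
h, Phi = regression_solution(X, y, S, mu=0.1, sigma=1.0)
```

Only the $m$ columns of $\Phi$ enter the final model, so choosing which training patterns serve as $s_i$ is exactly the subset selection problem addressed in Section IV.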

IV. NOVEL EFFICIENT LEARNING MECHANISM
The aim of this paper is to develop a new fuzzy rule-based system based on a sparse LS-SVM learning mechanism, with the model structure shown in Fig. 1. As in SVM-based fuzzy systems [20] (where the kernel function of the SVM is related to the FBF), the FBF (4) is chosen as the mapping function (11) in our LS-SVM solution, i.e., $\phi_i(x(t)) = N_i(x(t); W)$, fusing the two into a new LS-SVM-based fuzzy rule-based system. As usual, the denominator of the FBF is removed, since the number of fuzzy rules is not known in advance. This does not violate the spirit of a fuzzy inference system as described in [20], where the rule premises determine the confidence values of all rules, while the rule consequents assign the consequence of the inference system with the confidence values of the corresponding rules. As a result, the SVs extracted by the LS-SVM learning mechanism can be used to generate the fuzzy IF-THEN rules corresponding to the FBFs. In this manner, the resulting fuzzy systems can provide satisfactory generalization over unseen data, as the LS-SVM does. Unlike the conventional LS-SVM, where all training patterns serve as SVs (causing nonsparseness), a novel sparse LS-SVM learning mechanism is proposed here to perform rule selection in the fuzzy rule-based system.
Global optimization based on the new regression solution (13) still yields nonsparse results, as does the conventional solution (9). To tackle this problem, an efficient learning mechanism based on subset selection is proposed to find a small subset of SVs. Optimal subset selection is, however, an NP-hard problem, widely acknowledged to be extremely difficult to solve in terms of both algorithm performance and running time. It is generally impractical to find the globally optimal subset by exhaustive search because of the huge computational burden (all possible subsets of the $N$ candidate SVs would have to be evaluated), as is also reflected in the experiment section. The proposed learning mechanism works in a stepwise subset selection manner, with a forward expansion phase and a backward exclusion phase in each selection step. The fast recursive algorithm presented in [14] is essentially a fast and stable version of forward stepwise subset selection working in the least-squares sense. It performs conditional optimization at each step given the regressors already included in the subset, so the resulting models are usually suboptimal. Unlike the fast recursive algorithm, the proposed mechanism includes not only a forward expansion phase but also a backward exclusion phase at each subset selection step, both working in a regularized least-squares sense. It also differs from the previously proposed second-stage algorithm [27], [30], which targets a subset of fixed size. The forward expansion phase at each step works in the same way as the fast recursive algorithm, but within a regularized least-squares framework instead of a plain least-squares one: each time, the most significant item in the candidate pool is added to the selected pool in an efficient manner. The backward exclusion phase assesses the least significant of the previously selected items and then determines whether to remove it from the current selected subset and return it to the candidate pool, so that the final subset contains the most significant items.
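The gap between exhaustive and stepwise search can be quantified with simple counting. The sketch below compares the number of fixed-size subsets an exhaustive search would evaluate with the number of candidate evaluations made by one pass of greedy forward selection; the problem sizes are hypothetical, and the count ignores the extra work of the backward exclusion phase.

```python
from math import comb

def exhaustive_count(n_candidates: int, subset_size: int) -> int:
    """Subsets of a fixed size that an exhaustive search would have to evaluate."""
    return comb(n_candidates, subset_size)

def stepwise_forward_count(n_candidates: int, subset_size: int) -> int:
    """Candidate evaluations made by one pass of greedy forward selection."""
    return sum(n_candidates - k for k in range(subset_size))

# hypothetical problem sizes: N candidate SVs, subset of k rules
for N, k in [(100, 5), (500, 10), (1000, 20)]:
    print(f"N={N}, k={k}: exhaustive {exhaustive_count(N, k):.2e}, "
          f"stepwise {stepwise_forward_count(N, k)}")
```

The exhaustive count grows combinatorially in $N$, whereas a stepwise pass grows only roughly linearly in $N$ for each selected rule.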
For notational convenience, a residue matrix similar to that in [14] is first defined as

$$R_k = I - \Phi_k (\Phi_k^T \Phi_k + \mu I)^{-1} \Phi_k^T \tag{16}$$

where $\Phi_k = [p_1, \ldots, p_k]$ is the selected pool, a subset of the columns of the regression matrix $\Phi$, and $R_0 = I \in \Re^{N \times N}$. If there is no prior knowledge about the system of interest, the number of initial regressors (equivalently, SVs or fuzzy rules) can be set to $m = N$. It is easy to verify that $R_k = R_k^T$ and that reordering the selected regressors $p_1, \ldots, p_k$ (i.e., the columns of $\Phi_k$) does not change $R_k$. Based on the way in which the forward expansion and backward exclusion phases are performed, two basic theorems on the residue matrix $R_k$ are given below to facilitate the sparse learning of LS-SVM-based fuzzy systems.
Theorem 1: If a regressor $\phi_i$ from the candidate pool $\Psi_k$ is added to the selected pool, i.e., $\Phi_{k+1} = [\Phi_k; +\phi_i]$, then

$$R_{k+1} = R_k - \frac{R_k \phi_i \phi_i^T R_k}{\phi_i^T R_k \phi_i + \mu}. \tag{17}$$

Theorem 2: If a selected regressor $p_i$ is removed from the selected pool $\Phi_{k+1}$, i.e., $\Phi_k = [\Phi_{k+1}; -p_i]$, then

$$R_k = R_{k+1} + \frac{R_{k+1} p_i p_i^T R_{k+1}}{\mu - p_i^T R_{k+1} p_i}. \tag{18}$$

In these two theorems, $[\Phi_k; +\phi_i]$ denotes adding a new regressor $\phi_i$ from the candidate pool $\Psi_k$ to the selected pool $\Phi_k$, and $[\Phi_{k+1}; -p_i]$ denotes removing a selected regressor $p_i$ from the selected pool $\Phi_{k+1}$. The proofs of the two theorems are given in Appendix A. In addition, the initial candidate pool is set as $\Psi_0 = [\phi_1, \ldots, \phi_m]$. According to the solution (13), the optimal value of the objective function (12) for the LS-SVM is

$$J = y^T y - y^T \Phi (\Phi^T \Phi + \mu I)^{-1} \Phi^T y.$$

Using the residue matrix (16), the optimal objective value obtained with $\Phi_k$ becomes

$$J_k = y^T R_k y.$$

Thus, on adding a new regressor, say $\phi_i$ ($i = k+1, \ldots, m$), to the selected pool in the forward expansion phase of the $(k+1)$th subset selection step, the objective value decreases by

$$\overrightarrow{\Delta J}_{k+1}(\phi_i) = J_k - J_{k+1}(\phi_i) = \frac{(y^T R_k \phi_i)^2}{\phi_i^T R_k \phi_i + \mu}. \tag{21}$$

Conversely, on deleting a regressor, say $p_i$ ($i = 1, \ldots, k+1$), from the selected pool in the backward exclusion phase of the $(k+1)$th subset selection step, the objective value increases by

$$\overleftarrow{\Delta J}_{k+1}(p_i) = J_k(p_i) - J_{k+1} = \frac{(y^T R_{k+1} p_i)^2}{\mu - p_i^T R_{k+1} p_i} \tag{22}$$

where $J_k(p_i)$ denotes the objective value after $p_i$ has been removed. In summary, at the $(k+1)$th subset selection step, the forward expansion phase is executed first: the regressor producing the largest objective reduction is chosen as the $(k+1)$th regressor and added to the selected pool, i.e.,

$$p_{k+1} = \arg\max_{k+1 \le i \le m} \overrightarrow{\Delta J}_{k+1}(\phi_i).$$

When the forward expansion phase is complete, the backward exclusion phase reviews the contribution of all previously selected regressors. The regressor with the smallest contribution, $p_r = \arg\min_{1 \le i \le k+1} \overleftarrow{\Delta J}_{k+1}(p_i)$, is the candidate for exclusion from the selected pool and return to the candidate pool. If $p_r$ is exactly the regressor $p_{k+1}$ just selected in the forward expansion phase, the backward exclusion phase is skipped: all the regressors in the current selected pool are significant, and no backward exclusion is needed. Otherwise, a criterion based on $\overleftarrow{\Delta J}_{k+1}(p_r)$ determines whether the regressor is actually removed from the selected pool. To compute the regressor contributions (21) and (22) efficiently, the following sections present the efficient learning mechanism for producing sparse LS-SVM-based fuzzy systems.
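The two theorems and the contributions (21) and (22) can be checked numerically. The following sketch builds residue matrices directly from (16) (dense $N \times N$ matrices, purely for verification, not the efficient scheme of this section) and compares them with the rank-one updates; all function names and test data are hypothetical.

```python
import numpy as np

def residue(P, mu):
    """R_k = I - P (P^T P + mu I)^{-1} P^T, eq. (16); an empty P gives R_0 = I."""
    N, k = P.shape
    if k == 0:
        return np.eye(N)
    return np.eye(N) - P @ np.linalg.solve(P.T @ P + mu * np.eye(k), P.T)

def add_regressor(R, phi, mu):
    """Theorem 1, eq. (17): residue matrix after appending column phi."""
    Rphi = R @ phi
    return R - np.outer(Rphi, Rphi) / (phi @ Rphi + mu)

def remove_regressor(R, p, mu):
    """Theorem 2, eq. (18): residue matrix after deleting selected column p."""
    Rp = R @ p
    return R + np.outer(Rp, Rp) / (mu - p @ Rp)

def forward_gain(R, phi, y, mu):
    """Eq. (21): objective decrease when phi joins the selected pool."""
    Rphi = R @ phi
    return float((y @ Rphi) ** 2 / (phi @ Rphi + mu))

def backward_loss(R, p, y, mu):
    """Eq. (22): objective increase when selected column p is deleted."""
    Rp = R @ p
    return float((y @ Rp) ** 2 / (mu - p @ Rp))

def ridge_objective(P, y, mu):
    """Minimum of (12) computed directly from (13), to cross-check J_k = y^T R_k y."""
    h = np.linalg.solve(P.T @ P + mu * np.eye(P.shape[1]), P.T @ y)
    r = y - P @ h
    return float(r @ r + mu * h @ h)
```

Because $R_k$ is symmetric, $R_k \phi \phi^T R_k$ is the outer product of $R_k \phi$ with itself, which is how both updates are written.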

A. Forward Expansion Phase
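In each forward expansion phase, the net contribution of a candidate regressor to the objective function is given by (21). Suppose that the kth regressor has just been added to the selected pool; an intermediate matrix $A \in \Re^{k \times m}$ is then generated, whose kth row has entries

$$a_{k,i} = \begin{cases} p_k^T R_{k-1} p_i, & i = 1, \ldots, k \\ p_k^T R_{k-1} \phi_i, & i = k+1, \ldots, m. \end{cases}$$

The first $k-1$ rows were generated in the same way each time a new regressor was included in the model, so the number of rows of $A$ grows by one as the selection proceeds. By successively applying (17), these entries can be computed efficiently as

$$a_{k,i} = \begin{cases} p_k^T p_i - \displaystyle\sum_{j=1}^{k-1} \frac{a_{j,k}\, a_{j,i}}{a_{j,j} + \mu}, & i = 1, \ldots, k \\ p_k^T \phi_i - \displaystyle\sum_{j=1}^{k-1} \frac{a_{j,k}\, a_{j,i}}{a_{j,j} + \mu}, & i = k+1, \ldots, m. \end{cases} \tag{25}$$

The computational cost of the left-hand entries of the kth row can be reduced further by reusing previously computed entries of $A$. Further define a vector $b^{(k+1)} \in \Re^m$ whose entries at the $(k+1)$th step are

$$b_i^{(k+1)} = y^T R_k \phi_i, \qquad i = k+1, \ldots, m$$

and another vector $d^{(k+1)} \in \Re^m$ with

$$d_i^{(k+1)} = \phi_i^T R_k \phi_i, \qquad i = k+1, \ldots, m.$$

The entries for $i = 1, \ldots, k$ are kept unchanged from previous selection steps, and then, using (17),

$$b_i^{(k+2)} = b_i^{(k+1)} - \frac{a_{k+1,i}\, b_{k+1}^{(k+1)}}{a_{k+1,k+1} + \mu}, \qquad i = k+2, \ldots, m \tag{28}$$

$$d_i^{(k+2)} = d_i^{(k+1)} - \frac{a_{k+1,i}^2}{a_{k+1,k+1} + \mu}, \qquad i = k+2, \ldots, m. \tag{29}$$

With the aid of the matrix $A$ and the vectors $b$ and $d$, the contribution of a regressor $\phi_i$ ($i = k+1, \ldots, m$) from the candidate pool at the $(k+1)$th step can be re-expressed as

$$\overrightarrow{\Delta J}_{k+1}(\phi_i) = \frac{\left(b_i^{(k+1)}\right)^2}{d_i^{(k+1)} + \mu}. \tag{30}$$

As a result, the regressor with the largest objective reduction is selected as the $(k+1)$th regressor, i.e.,

$$p_{k+1} = \arg\max_{k+1 \le i \le m} \overrightarrow{\Delta J}_{k+1}(\phi_i).$$

Once this new regressor is included in the selected pool, the next phase reviews the significance of all the previously selected regressors.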
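The forward expansion phase can be sketched without any matrix inverse, using only the rows of $A$ and the vectors $b$ and $d$ as reconstructed above. This is a minimal illustration, not the authors' implementation: instead of physically reordering columns, the hypothetical function below stores each row of $A$ indexed by the original column number, and it simply masks (rather than freezes) the $b$ and $d$ entries of columns that are already selected.

```python
import numpy as np

def forward_expansion(Phi, y, mu, n_select):
    """Regularized forward selection using A, b, d only (no matrix inverse).

    Phi: (N, m) candidate regressors; y: (N,) targets; mu > 0.
    Returns the selected column indices and the objective decrease of each step.
    """
    m = Phi.shape[1]
    b = Phi.T @ y                          # b_i = y^T R_0 phi_i with R_0 = I
    d = np.einsum("ij,ij->j", Phi, Phi)    # d_i = phi_i^T R_0 phi_i
    A = np.zeros((n_select, m))            # row k: a_{k,i} = p_k^T R_{k-1} phi_i
    selected, gains = [], []
    is_candidate = np.ones(m, dtype=bool)
    for k in range(n_select):
        score = np.full(m, -np.inf)
        score[is_candidate] = b[is_candidate] ** 2 / (d[is_candidate] + mu)  # eq. (30)
        s = int(np.argmax(score))
        # recursion (25): inner products minus corrections from earlier rows
        row = Phi[:, s] @ Phi
        for j in range(k):
            row -= A[j, s] * A[j] / (A[j, selected[j]] + mu)
        A[k] = row
        denom = row[s] + mu                # a_{k,k} + mu (a_{k,k} equals d_s here)
        b_s = b[s]
        b = b - row * b_s / denom          # eq. (28); selected entries are masked out
        d = d - row ** 2 / denom           # eq. (29)
        selected.append(s)
        gains.append(float(score[s]))
        is_candidate[s] = False
    return selected, gains
```

Each step costs $O(Nm + km)$ operations for the new row of $A$ and the vector updates, instead of solving a regularized least-squares problem for every candidate.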

B. Backward Exclusion Phase
1) Model Review: Continuing from the forward expansion phase, a total of $k+1$ regressors have now been included in the selected pool, and the intermediate matrix and vectors $A \in \Re^{(k+1) \times m}$, $b^{(k+2)} \in \Re^m$, and $d^{(k+2)} \in \Re^m$ have been updated accordingly. A backward exclusion phase is then performed, in which the significance of each selected regressor in terms of the objective function is reevaluated as in (22). Two vectors $c^{(k+1)} \in \Re^m$ and $h^{(k+1)} \in \Re^m$ are first defined, with entries at the $(k+1)$th step given by

$$c_i^{(k+1)} = y^T R_{k+1} p_i, \qquad h_i^{(k+1)} = p_i^T R_{k+1} p_i, \qquad i = 1, \ldots, k+1.$$

Using (17) and comparing with the entries of $b^{(k+2)}$ and $d^{(k+2)}$, these entries can be obtained from the regression context as in (33) and (34), while the remaining entries are inherited from $b^{(k+2)}$ and $d^{(k+2)}$. In this way, the significance (22) of a selected regressor $p_i$ can be computed as

$$\overleftarrow{\Delta J}_{k+1}(p_i) = \frac{\left(c_i^{(k+1)}\right)^2}{\mu - h_i^{(k+1)}}, \qquad i = 1, \ldots, k+1.$$

If the exclusion criterion is satisfied, the previously selected regressor with the least significance to the objective function, say $p_r$, is excluded from the selected pool and returned to the candidate pool. An efficient procedure for removing this regressor from the selected pool is detailed as follows.
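The model review step can be illustrated as follows, under the reconstructed definitions $c_i = y^T R_{k+1} p_i$ and $h_i = p_i^T R_{k+1} p_i$. For clarity the sketch forms $R_{k+1}$ densely instead of using the regression context, and it only identifies $p_r$ and whether it differs from the newest regressor; the actual exclusion criterion of the paper is not reproduced. The function names are hypothetical.

```python
import numpy as np

def residue(P, mu):
    """R = I - P (P^T P + mu I)^{-1} P^T, eq. (16)."""
    N, k = P.shape
    return np.eye(N) - P @ np.linalg.solve(P.T @ P + mu * np.eye(k), P.T)

def model_review(P_sel, y, mu):
    """Significance (22) of each selected column; the last column is the newest one."""
    R = residue(P_sel, mu)                 # R_{k+1}
    RP = R @ P_sel
    c = RP.T @ y                           # c_i = y^T R_{k+1} p_i
    h = np.einsum("ij,ij->j", P_sel, RP)   # vector h of the text (not the parameters); h_i < mu
    loss = c ** 2 / (mu - h)               # increase of J if p_i were removed
    r = int(np.argmin(loss))
    # the phase is skipped when the least significant regressor is the one just added
    may_exclude = r != P_sel.shape[1] - 1
    return loss, r, may_exclude
```

The denominator $\mu - h_i$ is always positive for a selected column, so the significance is well defined without any safeguard.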
2) Regression Context Reconstruction: The intermediate matrix and vectors used in the forward expansion and backward exclusion phases, namely $A$, $b^{(k+2)}$, $d^{(k+2)}$, $c^{(k+1)} \in \Re^m$, and $h^{(k+1)} \in \Re^m$, are key to the proposed algorithm and are referred to as the regression context, as in [27]. If a selected regressor has to be removed from the selected pool, as described in Section IV-B1, the regression context must be updated. The new regression context could be obtained simply by performing the forward expansion procedure again with the current order of the selected regressors, but this is computationally inefficient. Based on the techniques introduced in [27], a computationally more efficient algorithm for reordering the selected regressors is presented. Specifically, suppose that a previously selected regressor $p_r$ is to be removed from the current selected pool. We first shift $p_r$ to the $(k+1)$th (last) position of the regression matrix $\Phi_{k+1}$, as if it were the last selected regressor, and then delete it from that last position.
3) Regressor Exclusion: Once the regressor $p_r$ in the last position of the regression matrix has been marked for removal, the new regression context requires only small changes. First, the selected pool is temporarily updated to $\bar{\Phi}_k = [p_1, \ldots, p_k]$ and the candidate pool to $\bar{\Psi}_k$, which now also contains $\phi_{k+1}$ (equivalently $\bar{p}_{k+1}$ or $p_r$), the regressor removed from $\Phi_{k+1}$. By Theorem 2, the residue matrix $\bar{R}_k$ after removing $\bar{p}_{k+1}$ is

$$\bar{R}_k = R_{k+1} + \frac{R_{k+1} \bar{p}_{k+1} \bar{p}_{k+1}^T R_{k+1}}{\mu - \bar{p}_{k+1}^T R_{k+1} \bar{p}_{k+1}}.$$

Using this formula, the vector $b^{(k+1)}$ is updated by

$$\bar{b}_i^{(k+1)} = b_i^{(k+2)} + \frac{(y^T R_{k+1} \bar{p}_{k+1})(\bar{p}_{k+1}^T R_{k+1} \phi_i)}{\mu - \bar{p}_{k+1}^T R_{k+1} \bar{p}_{k+1}}$$

and the vector $d^{(k+1)}$ is updated by

$$\bar{d}_i^{(k+1)} = d_i^{(k+2)} + \frac{(\bar{p}_{k+1}^T R_{k+1} \phi_i)^2}{\mu - \bar{p}_{k+1}^T R_{k+1} \bar{p}_{k+1}}.$$

The removed regressor $\bar{p}_{k+1}$ may, however, be selected again in the forward expansion phase of the next selection step; in that case it should not be excluded from the selected pool, and no change to the regression context obtained in Section IV-B2 is needed. To further reduce computation time, the values $\bar{b}_i^{(k+1)}$ and $\bar{d}_i^{(k+1)}$ together with (30) are used to determine in advance which regressor would become the new $(k+1)$th regressor in the next forward expansion phase. In this way, a criterion based on $\overleftarrow{\Delta J}$ is applied to avoid removing a previously selected regressor that should be kept. If the regressor $\bar{p}_{k+1}$ is confirmed for removal from the selected pool, the vector $b^{(k+1)}$ is assigned the entries $\bar{b}_i^{(k+1)}$ and the vector $d^{(k+1)}$ the entries $\bar{d}_i^{(k+1)}$ ($i = k+1, \ldots, m$), and the two vectors $c^{(k)}$ and $h^{(k)}$ are updated accordingly. For the matrix $A$, only the $(k+1)$th row is removed, and the other rows remain unchanged. The selected pool $\Phi_k$ and the candidate pool $\Psi_k$ are updated using $p_i = \bar{p}_i$ ($i = 1, \ldots, k$) and $\phi_i = \bar{\phi}_i$ ($i = k+1, \ldots, m$). The regression context, $A$, $b^{(k+1)}$, $d^{(k+1)}$, $c^{(k)}$, and $h^{(k)} \in \Re^m$, is now ready for use in the forward expansion phase of the next selection step.
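The exclusion update can be verified with a short sketch. Assuming the removed regressor already sits in the last position, the hypothetical function below applies Theorem 2 to refresh $b$ and $d$ for every candidate (including the removed one) and then checks, via (30), whether the removed regressor would immediately be reselected. As before, $R_{k+1}$ is formed densely only to keep the illustration short.

```python
import numpy as np

def residue(P, mu):
    """R = I - P (P^T P + mu I)^{-1} P^T, eq. (16)."""
    N, k = P.shape
    return np.eye(N) - P @ np.linalg.solve(P.T @ P + mu * np.eye(k), P.T)

def exclusion_update(R_next, p_out, Phi_cand, y, mu):
    """b and d for the candidates after p_out leaves the selected pool (Theorem 2)."""
    Rp = R_next @ p_out                    # R_{k+1} p
    denom = mu - p_out @ Rp                # mu - p^T R_{k+1} p  (> 0)
    g = Phi_cand.T @ Rp                    # p^T R_{k+1} phi_i for every candidate
    b = Phi_cand.T @ (R_next @ y)          # b_i = y^T R_{k+1} phi_i
    d = np.einsum("ij,ij->j", Phi_cand, R_next @ Phi_cand)  # d_i = phi_i^T R_{k+1} phi_i
    b_bar = b + (y @ Rp) * g / denom
    d_bar = d + g ** 2 / denom
    return b_bar, d_bar

def would_be_reselected(b_bar, d_bar, mu, idx_out):
    """True if eq. (30) would pick the removed regressor again right away."""
    return int(np.argmax(b_bar ** 2 / (d_bar + mu))) == idx_out
```

If `would_be_reselected` returns True, removing the regressor would only undo itself in the next step, which is the situation the text avoids by leaving the regression context unchanged.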

C. Computation of Model Parameters
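Assuming that a total of $M$ rules have finally been selected by the proposed method, the definition of $R_k$ in (16) and the solution (13) give the model parameters

$$\hat{h}_M = \frac{1}{\mu}\left[c_1^{(M)}, \ldots, c_M^{(M)}\right]^T$$

where $c_i^{(M)}$, $i = 1, \ldots, M$, are the first $M$ entries of the final vector $c^{(M)} \in \Re^m$. This follows from $\Phi_M^T R_M y = \Phi_M^T (y - \Phi_M \hat{h}_M) = \mu \hat{h}_M$, a rearrangement of the normal equations behind (13). If only the forward expansion phase is used in the selection procedure, the model parameters $\hat{h}_M = [\hat{h}_{M,1}, \ldots, \hat{h}_{M,M}]^T$ can instead be obtained by back substitution, which follows from (17) and (50):

$$\hat{h}_{M,i} = \frac{b_i^{(i)} - \sum_{j=i+1}^{M} a_{i,j}\, \hat{h}_{M,j}}{a_{i,i} + \mu}, \qquad i = M, M-1, \ldots, 1$$

where $b_i^{(i)} = y^T R_{i-1} p_i$ is the entry of $b$ kept unchanged since $p_i$ was selected.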
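Both ways of recovering the parameters can be cross-checked against a direct solve of (13). In the sketch below (hypothetical names, dense matrices for brevity), `params_from_c` uses $\hat{h}_M = c^{(M)}/\mu$, while `params_back_substitution` builds the upper-triangular entries $a_{k,j}$ and $b_k^{(k)}$ with Theorem 1 and then back-substitutes, as in the forward-only formula above.

```python
import numpy as np

def params_from_c(P, y, mu):
    """h_M = c^(M) / mu with c_i = y^T R_M p_i."""
    N, M = P.shape
    R = np.eye(N) - P @ np.linalg.solve(P.T @ P + mu * np.eye(M), P.T)  # R_M, eq. (16)
    return (P.T @ (R @ y)) / mu

def params_back_substitution(P, y, mu):
    """Forward-only case: triangular system in a_{k,j} and b_k^(k)."""
    N, M = P.shape
    A = np.zeros((M, M))
    b = np.zeros(M)
    R = np.eye(N)                               # R_0
    for k in range(M):
        Rp = R @ P[:, k]
        A[k, k:] = Rp @ P[:, k:]                # a_{k,j} = p_k^T R_{k-1} p_j, j >= k
        b[k] = Rp @ y                           # b_k^(k) = y^T R_{k-1} p_k
        R = R - np.outer(Rp, Rp) / (A[k, k] + mu)   # Theorem 1
    h = np.zeros(M)
    for k in range(M - 1, -1, -1):              # back substitution, last parameter first
        h[k] = (b[k] - A[k, k + 1:] @ h[k + 1:]) / (A[k, k] + mu)
    return h
```

The back-substitution route works because $\Phi^T \Phi + \mu I$ factorizes as $U^T D^{-1} U$, where $U$ holds $a_{k,j}$ above the diagonal and $a_{k,k} + \mu$ on it, and $D$ is the diagonal of $U$.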

D. Algorithm: Construction of Sparse Least-Squares Support-Vector-Machine-Based Fuzzy Systems
The efficient learning mechanism for sparse LS-SVM-based fuzzy systems is shown in the flowchart in Fig. 2 and is detailed as follows.
Step 1) Initialization: To start the learning process, the candidate pool Ψ 0 =[ϕ 1 ,...,ϕ m ] is first generated by using all the training patterns as the potential rules/SVs.Note that the initially selected pool Φ 0 is an empty matrix.The number of selected regressors is set to k =0, and the two vectors b 1) =[ϕ T 1 y,...,ϕ T m y] and d 1) =[ϕ T 1 ϕ 1 ,...,ϕ T m ϕ m ] are initialized.Step 2) Forward expansion phase: The main task here is to select the most significant regressor from the candidate pool and to update the corresponding variables for the operations ahead.
1) According to the contribution of each candidate regressor computed from (30), the one with the largest objective reduction is selected as the next regressor to be added into the regression matrix Φ k +1 =[p 1 ,...,p k +1 ], i.e., The corresponding regressor p k +1 is then removed from the candidate pool and 2) The (k +1)th row of matrix A is calculated using (25), while all the previous k rows remain unchanged.
3) The two vectors b k +2) and d k +2) are updated with entries from k +2to m by using ( 28) and ( 29) and are employed for selecting the (k +2)th regressor from the candidate pool.
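The recursions of Step 2 can be sketched as follows. The sketch assumes $J_k = y^T R_k y$ (consistent with the Appendix and with $b^{(1)}$ and $d^{(1)}$ in Step 1), so that adding candidate $\phi_i$ lowers $J$ by $(b_i^{(k+1)})^2/(\mu + d_i^{(k+1)})$ with $b_i^{(k+1)} = \phi_i^T R_k y$ and $d_i^{(k+1)} = \phi_i^T R_k \phi_i$. These expressions are our reading, not a transcription of (25) and (28)-(30), which are not reproduced here. The function `forward_expansion` is hypothetical; it forms no matrix inverse and adds one row of $A$ per step.

```python
import numpy as np

def forward_expansion(Psi, y, mu, n_select):
    """Greedy forward selection of columns of Psi without any matrix inverse.

    Assumed objective: J_k = y^T R_k y with R_k = (I + Phi_k Phi_k^T / mu)^{-1}.
    Adding candidate phi_i lowers J by b_i**2 / (mu + d_i), where
    b_i = phi_i^T R_k y and d_i = phi_i^T R_k phi_i.
    """
    m = Psi.shape[1]
    G = Psi.T @ Psi                   # inner products phi_i^T phi_j, computed once
    b = Psi.T @ y                     # b^(1) = [phi_i^T y]
    d = np.diag(G).copy()             # d^(1) = [phi_i^T phi_i]
    A = np.zeros((n_select, m))       # row l stores p_l^T R_{l-1} phi_j for all j
    J = float(y @ y)                  # J_0: empty selected pool
    selected, remaining = [], list(range(m))
    for k in range(n_select):
        gains = b[remaining] ** 2 / (mu + d[remaining])
        j = remaining[int(np.argmax(gains))]       # largest objective reduction
        # New row of A: p^T R_k phi_i = p^T phi_i - sum_l a_{l,p} a_{l,i} / (mu + a_{l,l})
        row = G[j].copy()
        for l, s in enumerate(selected):
            row -= A[l, j] * A[l] / (mu + A[l, s])
        A[k] = row                                 # row[j] equals d[j]
        scale = mu + d[j]
        J -= b[j] ** 2 / scale
        # Rank-one updates of b and d for the next selection step
        b = b - row * (b[j] / scale)
        d = d - row ** 2 / scale
        selected.append(j)
        remaining.remove(j)
    return selected, J

rng = np.random.default_rng(2)
Psi = rng.standard_normal((40, 15))   # 15 candidate regressors (illustrative)
y = rng.standard_normal(40)
mu = 0.5
selected, J = forward_expansion(Psi, y, mu, 4)

# Validation only: compare with a directly inverted residue matrix.
Phi = Psi[:, selected]
R = np.linalg.inv(np.eye(40) + Phi @ Phi.T / mu)
print(np.isclose(J, y @ R @ y))  # True
```

The explicit inverse at the end is only a correctness check; inside the loop, each step touches one new row of $A$ and two length-$m$ vectors.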
Step 3) Backward exclusion phase: The main purpose of this phase is to reevaluate the contribution of each of the previously selected regressors.
1) Entries 1 to $k+1$ of the two vectors $c^{(k+1)}$ and $h^{(k+1)}$ are updated using (33) and (34), while the remaining entries of the two vectors are inherited from $b^{(k+2)}$ and $d^{(k+2)}$.
2) A criterion based on the contributions $\overleftarrow{\Delta}J_{k+1}(p_i)$ defined in (35) is used to decide whether to remove a regressor from the selected pool and, if so, which one (denoted $p_r$). If the criterion is not met, set $k = k+1$ and go to Step 4. Otherwise, move to the next substep.
3) The regressor $p_r$ is shifted to the last column of $\Phi_{k+1}$ by a total of $k - r + 1$ interchanges between adjacent selected regressors. A new regression context, including $c^{(k+1)} \in \Re^m$ and $h^{(k+1)} \in \Re^m$, is thus produced as if $p_r$ were the last selected regressor in the regression matrix.
4) The exclusion criterion is then checked with the vectors $b^{(k+1)}$ and $d^{(k+1)}$ and (30), as described in Section IV-B3, to decide whether a regressor should be removed from the selected pool. If none has to be removed, set $k = k+1$ and move to Step 4. Otherwise, go to the next substep.
5) The regressor $p_r$ is removed from the selected pool and returned to the candidate pool, i.e., $\Phi_k = [p_1, \ldots, p_k]$ and $\Psi_k = [\phi_{k+1}, \ldots, \phi_m]$. The regression context, including $c^{(k)} \in \Re^m$ and $h^{(k)} \in \Re^m$, is then updated, and the index $k$ is set to $k - 1$, as described in Section IV-B3.
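What substeps 1) and 2) decide can be sketched compactly under the same assumed objective $J = y^T R y$. Removing a selected regressor $p_i$ then raises $J$ by $c_i^2/(\mu - h_i)$, with $c_i = p_i^T R_{k+1} y$ and $h_i = p_i^T R_{k+1} p_i$, which is our reading of (35). For clarity the sketch forms $R_{k+1}$ directly, whereas the paper updates $c$ and $h$ recursively and shifts $p_r$ to the last column. The removal test (drop $p_r$ only if its cost is below the gain of the latest addition) is likewise an assumption, and `objective` and `backward_check` are hypothetical names.

```python
import numpy as np

def objective(Phi, y, mu):
    """Assumed objective J = y^T (I + Phi Phi^T / mu)^{-1} y (reference only)."""
    N = Phi.shape[0]
    return float(y @ np.linalg.solve(np.eye(N) + Phi @ Phi.T / mu, y))

def backward_check(Phi_sel, y, mu, gain_added):
    """Pick the least significant earlier regressor, if removing it is worthwhile.

    Phi_sel holds the k+1 selected regressors, the last one just added with
    objective reduction gain_added. Removing p_i raises J by c_i**2 / (mu - h_i).
    Returns (r, increases): r is the column to drop, or None.
    """
    N = Phi_sel.shape[0]
    R = np.linalg.inv(np.eye(N) + Phi_sel @ Phi_sel.T / mu)  # formed directly for clarity
    RPhi = R @ Phi_sel
    c = RPhi.T @ y                               # c_i = p_i^T R y
    h = np.einsum("ij,ij->j", Phi_sel, RPhi)     # h_i = p_i^T R p_i
    increase = c ** 2 / (mu - h)
    if Phi_sel.shape[1] < 2:
        return None, increase
    r = int(np.argmin(increase[:-1]))            # only earlier regressors are candidates
    return (r if increase[r] < gain_added else None), increase

rng = np.random.default_rng(3)
Phi_sel = rng.standard_normal((25, 5))   # k + 1 = 5 selected regressors (illustrative)
y = rng.standard_normal(25)
mu = 0.5
J_now = objective(Phi_sel, y, mu)
gain_added = objective(Phi_sel[:, :-1], y, mu) - J_now   # reduction from the last addition
r, increase = backward_check(Phi_sel, y, mu, gain_added)
```

Removing the newest regressor would simply undo its own gain, which is why only the earlier columns are considered for $p_r$.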
Step 4) The learning process terminates when a stopping criterion is met, for example, when a given number of regressors has been selected or a tolerance value has been reached. Similar to the stopping criteria commonly used in training neural networks and SVMs [13], [26], a tolerance on the maximum relative reduction of the objective value is used here: if the ratio $(J_k - \min_{i=k+1,\ldots,m} J_{k+1}(\phi_i))/J_k$ is less than a very small positive tolerance $\rho$, adding a new regressor will not greatly improve the generalization performance of the fuzzy system. This stopping criterion is an important means of trading off the training accuracy (performance) against the model complexity (sparseness and interpretability) of the obtained fuzzy systems. If the stopping criterion is not met, the algorithm returns to Step 2.
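The Step 4 test reduces to a single comparison. A minimal sketch, using the hypothetical helper `should_stop`, takes the current objective $J_k$ and the best achievable next value $\min_i J_{k+1}(\phi_i)$; the tolerances 0.02 and 0.002 in the usage lines are the values reported for the examples in Section V.

```python
def should_stop(J_k, best_next_J, rho):
    """Stop when the best relative reduction (J_k - min_i J_{k+1}(phi_i)) / J_k < rho."""
    if J_k <= 0:
        return True          # nothing left to reduce
    return (J_k - best_next_J) / J_k < rho

print(should_stop(100.0, 99.0, 0.02))    # True  (ratio 0.01 < 0.02)
print(should_stop(100.0, 95.0, 0.02))    # False (ratio 0.05 >= 0.02)
print(should_stop(100.0, 99.0, 0.002))   # False (ratio 0.01 >= 0.002)
```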

E. Convergence and Computational Complexity
For convergence, the objective value clearly decreases each time a new regressor is added to the selected pool (i.e., when only the forward expansion phase is applied): if $\phi_i$ ($i = k+1, \ldots, m$) is added at the $(k+1)$th subset selection step, the decrement is $\overrightarrow{\Delta}J_{k+1}(\phi_i)$, as defined in (21) and (30). To reassess the contribution of all the previously selected regressors, the backward exclusion phase removes the least significant regressor, i.e., the one with the smallest contribution to the objective function, from the selected pool. This can increase the objective value by a small amount $\overleftarrow{\Delta}J_{k+1}(p_i)$, defined in (22) and (35), if a selected regressor $p_i$ ($i = 1, \ldots, k+1$) is removed at the $(k+1)$th subset selection step. However, a regressor is removed only when this increase is smaller than the decrease achieved by the preceding forward expansion, i.e., $\overleftarrow{\Delta}J_{k+1}(p_r) < \overrightarrow{\Delta}J_{k+1}(p_{k+1})$. Hence, if the objective value at the $k$th subset selection step is $J_k$, the new value obtained after a forward expansion followed by a backward exclusion is $J'_k = J_k - \overrightarrow{\Delta}J_{k+1}(p_{k+1}) + \overleftarrow{\Delta}J_{k+1}(p_r) < J_k$. The objective value is therefore reduced each time a new subset of $k$ regressors is selected; because it strictly decreases across subsets of a given size and there are only finitely many such subsets, no subset can be revisited, and the exchanges must stop after finitely many steps. In the extreme case, the nonsparse fuzzy system corresponding to the solution of (13) is obtained if all the regressors are selected as SVs with a tolerance value $\rho = 0$. In summary, the convergence of the proposed method, composed of iterative forward expansion and backward exclusion phases, is guaranteed.
With respect to the computational complexity, the basic arithmetic operations involved in constructing sparse LS-SVM-based fuzzy systems are additions/subtractions and multiplications/divisions. Assuming that a total of $N$ data samples are used for training and that a total of $M$ rules have been extracted by the proposed learning mechanism, the numbers of additions/subtractions and multiplications/divisions, and the overall total of operations, when only the forward expansion phase is used are listed in the first row of Table I. When the backward exclusion phase is introduced, the overall computational complexity varies with the number of regressors removed at each selection step and with the position of the removed regressor in the selected pool.
The details of the computational complexity, including both the constant part and the variable part (shifting and removing operations), are listed in the last three rows of Table I. The constant part, which involves (33)-(35) and the forward expansion phase, is listed in the second row of Table I. Suppose that the forward expansion at the $(k+1)$th step has just been completed and that a previously selected regressor at the $n_k$th position in $\Phi_{k+1}$ is to be removed from the current selected pool; the operations involved in shifting this regressor to the last position in $\Phi_{k+1}$ and removing it are given in the third and fourth rows of Table I. Since $N \gg M$, the computation mainly comes from the term $2MN^2$. In practice, the proposed method is usually dominated by the forward expansion phase, while the backward exclusion phase revises the selected regression pool. Thus, since $M > k \ge n_k$, the computational demand of the proposed algorithm does not increase much compared with that of the forward expansion phase alone. In contrast, as described in Section III, solving the KKT system (9) for a nonsparse LS-SVM with the efficient Cholesky decomposition generally requires $N^3/3 + O(N^2)$ operations. The computational advantage of our learning mechanism is therefore significant, especially when the training dataset contains a large number of patterns. If the pruning method [25] discussed in Section I is used to impose sparseness on the conventional LS-SVM, its computational cost can also be extremely large. The proposed learning mechanism thus substantially reduces the computational demand while also achieving model sparseness, as the following experimental examples further demonstrate.
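To give a sense of scale, the two leading terms quoted above can be compared for the sizes of Example 1 ($N = 200$ training samples, $M = 8$ rules). The sketch keeps only $2MN^2$ and $N^3/3$, so it indicates operation counts rather than running times; lower-order terms and the variable backward-exclusion cost are ignored. `leading_costs` is a hypothetical helper.

```python
def leading_costs(N, M):
    """Leading-order operation counts only (lower-order terms dropped)."""
    proposed = 2 * M * N ** 2    # dominant term of the proposed method when N >> M
    cholesky = N ** 3 / 3        # Cholesky solution of the full KKT system (9)
    return proposed, cholesky

proposed, cholesky = leading_costs(N=200, M=8)   # sizes from Example 1
print(proposed, round(cholesky), round(cholesky / proposed, 2))  # 640000 2666667 4.17
```

The ratio of the two leading terms is $N/(6M)$, so for a fixed number of rules the advantage grows linearly with the number of training patterns.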

V. NUMERICAL EXAMPLES
Three problems, one simulated and two real-world, are investigated to validate the efficiency and effectiveness of the proposed learning mechanism and the sparseness of the constructed LS-SVM-based fuzzy systems. The resulting performance is also compared with that of other SVM-based fuzzy learning approaches in terms of model sparseness, running time, and model accuracy. The first example is a nonlinear dynamic system identification problem [31], the second involves melt pressure prediction in a polymer extrusion process [32], and the third is to diagnose the severity of mammographic masses [33]. All the experiments were conducted on an Intel Core 2 Duo E8135 processor (2.40 GHz) running the Windows 7 operating system, with programs implemented in MATLAB.

A. Identification of the Nonlinear Dynamic System
The first example [31] involves identifying the nonlinear dynamic system (52), where $\varepsilon(t) \sim N(0, 0.01^2)$ represents a noise sequence. A total of 400 data points were simulated. The first 200 samples, used as training data, were obtained by exciting the system with a random input signal $u(t)$ uniformly distributed in $[-1, 1]$, while the remaining 200 test samples were produced using a sinusoidal input signal $u(t) = \sin(2\pi t/25)$. Thus, $[u(t-1), y(t-1), y(t-2)]$ and $y(t)$ constitute the input and output variables of the LS-SVM-based fuzzy models to be developed.
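The arrangement of inputs and outputs can be made concrete with a short sketch that turns input and output sequences into regressor vectors $[u(t-1), y(t-1), y(t-2)]$ and targets $y(t)$. Because the system (52) is not reproduced in this copy, the output sequence below is a placeholder rather than a simulated response, and the time index of the sinusoidal test input is assumed to start at $t = 0$. `lagged_dataset` is a hypothetical helper.

```python
import math
import random

def lagged_dataset(u, y):
    """Pair inputs [u(t-1), y(t-1), y(t-2)] with targets y(t), for t = 2, ..., len(y) - 1."""
    X = [[u[t - 1], y[t - 1], y[t - 2]] for t in range(2, len(y))]
    targets = [y[t] for t in range(2, len(y))]
    return X, targets

random.seed(0)
u_train = [random.uniform(-1.0, 1.0) for _ in range(200)]      # training excitation
u_test = [math.sin(2 * math.pi * t / 25) for t in range(200)]  # test excitation
noise = [random.gauss(0.0, 0.01) for _ in range(200)]          # eps(t) ~ N(0, 0.01^2), would enter (52)

# Placeholder outputs: NOT the response of system (52), which is not reproduced here.
y_demo = [0.1 * t for t in range(6)]
X, targets = lagged_dataset(u_train[:6], y_demo)
print(len(X), X[0][1:], targets[0])  # 4 [0.1, 0.0] 0.2
```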
The Gaussian width $\sigma$ was set to 3, and the regularization parameter $\mu$ was set to $1/(2 \times 1000)$, as is common. To assess the effectiveness of the proposed algorithm in finding better values of the objective function, several experiments were carried out with the same number of fuzzy rules selected, as shown in Table II. Different subset sizes were also tested, and the results are listed in the corresponding rows. The first column lists the objective values and running times obtained using only the forward expansion algorithm, while the second column gives the results of combining forward expansion and backward exclusion at each subset selection step. As the first row shows, the two approaches produced the same objective value when only one rule was included in the rule base, since this is the global optimum and the backward exclusion phase is not needed. The advantage of the combined approach over forward expansion alone becomes evident as the selection process continues. It can also be seen that once a certain number of fuzzy rules had been selected, the objective value did not decrease significantly further, which means that the rules included later were redundant and contributed little to the final fuzzy system. To further test the proposed algorithm, the second-stage algorithm proposed in [27], which refines a fixed-size subset of regressors, was also applied to the results of the proposed algorithm, as shown in the third column of Table II.

Note to Table II: because the exhaustive search method cannot be carried out when the number of fuzzy rules becomes larger, the objective values marked with ">" were computed by including all the candidate fuzzy rules in the rule base, and their running times were estimated by executing the method for a small number of possible combinations. The same convention is used in Tables VI and VIII.
Here, the model size was kept unchanged during the second-stage refinement, and the contribution of each fuzzy rule selected in the first stage was reviewed. There was no substantial improvement after introducing the second-stage optimization, which in turn indicates that the proposed algorithm already performs well. Efforts were also made to find the global optimum, which can in theory be obtained by exhaustive search. However, this turned out to be impractical when the number of fuzzy rules exceeded five in this example, owing to the huge running time required. For instance, if seven rules are to be included in the rule base, there are $200!/(7!(200-7)!) \approx 2.28 \times 10^{12}$ possible combinations (! denotes the factorial), which would take approximately 24.92 years to evaluate. Repeating this for different numbers of fuzzy rules would be even more prohibitive. Since including all the candidate fuzzy rules in the rule base produces the minimum objective value (75.9960 in this example), the global optimum found by exhaustive search for any subset size must be greater than this value. In conclusion, the proposed learning mechanism is able to select a small subset of fuzzy rules with an acceptable objective value in a short running time.
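The number of combinations quoted above can be reproduced directly. In the sketch below, the per-subset evaluation time is back-calculated from the reported 24.92 years assuming a 365.25-day year; it is an assumption, not a measured value. `exhaustive_cost` is a hypothetical helper.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_cost(n_candidates, n_rules, seconds_per_subset):
    """Number of rule subsets and the estimated years needed to evaluate all of them."""
    n_subsets = math.comb(n_candidates, n_rules)
    return n_subsets, n_subsets * seconds_per_subset / SECONDS_PER_YEAR

# seconds_per_subset is back-calculated from the quoted 24.92 years, not measured.
n_subsets, years = exhaustive_cost(200, 7, seconds_per_subset=3.443e-4)
print(n_subsets)  # 2283896214600, i.e., about 2.28e+12
print(round(years, 2))
```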
The objective values changed very little once a certain number of fuzzy rules had been selected, so a stopping tolerance of 0.02 was used in this example to terminate learning. The resulting number of rules, number of model parameters, training and test root-mean-squared errors (RMSEs), and running times obtained using only the forward expansion phase, and using both the forward expansion and backward exclusion phases, are shown in the first two columns of Table III. The former found a total of nine rules with a test RMSE of 1.76e-02, while the latter produced a better result, with eight rules and a smaller test RMSE of 1.42e-02, because it is better at finding smaller objective values.
For comparison, Table IV also lists the results of various SVM-based fuzzy models. The ε-insensitive value used in the SVM-based fuzzy model and the TSFS-SVR was set to 0.03. The conventional LS-SVM-based learning mechanism was also used directly to construct a corresponding fuzzy model, in which the KKT system defined in (9) was solved efficiently by Cholesky decomposition as usual. For the TSFS-SVR, the Gaussian width is determined by the aligned clustering algorithm, whose initial width, threshold, and overlap coefficient were set to 0.3, 0.78, and 1.6, respectively; all the remaining parameters used in determining its weighting parameters were the same as in the other SVM-based fuzzy models. As is common in model training [20], [23], [26], different values of the regularization parameter were examined with the Gaussian width defined above, which gave good generalization performance. The LS-SVM-based fuzzy model trained by our proposed learning mechanism required the least running time for all $\mu$ values, while its test performance was comparable with that of the other SVM-based techniques. The sparseness of the fuzzy models is reflected in the number of model parameters, shown in the third row of Table IV; a sparser model uses fewer parameters. As mentioned earlier, the conventional LS-SVM-based learning mechanism uses all the training instances as SVs and therefore always produces an extremely complex fuzzy model. While achieving acceptable performance, our LS-SVM-based fuzzy model produced a significantly sparser solution for all $\mu$ values, as expected. The best result was obtained with $1/(2\mu) = 1000$, with a total of eight rules. In general, decreasing $\mu$ improves the training accuracy; however, given the tradeoff between training accuracy and regularization in the objective function (6), this could cause overfitting. Fortunately, overfitting did not occur here, and the test results are quite acceptable. The sparse fuzzy systems obtained by our method, which consist of fewer fuzzy rules, also help to avoid overfitting. As in [20], the nonlinear system in this example is thus represented as a series expansion of FBFs (determined by the SVs), which corresponds to the set of fuzzy IF-THEN rules shown in Table V ($x_1$, $x_2$, $x_3$, and $y_i$ denote the three input variables $u(t-1)$, $y(t-1)$, $y(t-2)$ and the output of the $i$th rule, respectively). Because the next two examples involve more system inputs and fuzzy rules, their rule sets are not listed in this paper. Fig. 3 shows the training and test outputs of the LS-SVM-based fuzzy model learned by our algorithm. In conclusion, the proposed learning mechanism produces sparser fuzzy models in terms of both the number of rules and the number of parameters, while achieving comparable model performance within a short running time.

Table V rules (Example 1):
R1: If $x_1$ is close to 0.3560 and $x_2$ is close to −0.3775 and $x_3$ is close to 0.2831, then $y_1$ is close to 1.6518
R2: If $x_1$ is close to −0.4326 and $x_2$ is close to −0.0889 and $x_3$ is close to −0.0902, then $y_2$ is close to −1.7856
R3: If $x_1$ is close to −0.4244 and $x_2$ is close to −0.0983 and $x_3$ is close to −0.0163, then $y_3$ is close to −1.7375
R4: If $x_1$ is close to 0.2143 and $x_2$ is close to −0.3717 and $x_3$ is close to 0.9183, then $y_4$ is close to 1.7617
R5: If $x_1$ is close to −0.1834 and $x_2$ is close to 0.5619 and $x_3$ is close to −0.9942, then $y_5$ is close to −2.0003
R6: If $x_1$ is close to 0.7123 and $x_2$ is close to 0.2143 and $x_3$ is close to −0.6880, then $y_6$ is close to 2.1157
R7: If $x_1$ is close to −0.1218 and $x_2$ is close to 0.5619 and $x_3$ is close to −0.8827, then $y_7$ is close to −1.9111
R8: If $x_1$ is close to 0.4892 and $x_2$ is close to −0.1725 and $x_3$ is close to 0.2676, then $y_8$ is close to 1.8918
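The link between the rules in Table V and a kernel expansion can be illustrated as follows. Assuming each condition "$x_j$ is close to $c_j$" is a Gaussian membership $\exp(-(x_j - c_j)^2/(2\sigma^2))$ combined by the product operator, a rule's firing strength equals the Gaussian kernel centered at its SV. The width convention ($2\sigma^2$ versus $\sigma^2$), the absence of normalization and of a bias term, and the helper names are assumptions; the centers and consequents are those of rules R1 and R2 in Table V.

```python
import numpy as np

SIGMA = 3.0  # Gaussian width used in Example 1

def membership(x, c, sigma=SIGMA):
    """Assumed Gaussian membership degree of 'x is close to c' (elementwise)."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def firing_strength(x, center, sigma=SIGMA):
    """Product of the antecedent memberships of one rule."""
    return float(np.prod(membership(x, center, sigma)))

def gaussian_kernel(x, center, sigma=SIGMA):
    """Gaussian kernel exp(-||x - center||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - center) ** 2) / (2 * sigma ** 2)))

# Antecedent centers and consequents of rules R1 and R2 in Table V.
centers = np.array([[0.3560, -0.3775, 0.2831],
                    [-0.4326, -0.0889, -0.0902]])
consequents = np.array([1.6518, -1.7856])

x = np.array([0.1, 0.0, -0.2])  # an arbitrary input [u(t-1), y(t-1), y(t-2)]
firing = np.array([firing_strength(x, c) for c in centers])
kernel = np.array([gaussian_kernel(x, c) for c in centers])
print(np.allclose(firing, kernel))  # True
y_partial = float(firing @ consequents)  # unnormalized weighted sum over these two rules only
```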

B. Melt Pressure Prediction in Polymer Extrusion
Polymer processing is a major manufacturing sector. Extrusion is used to melt raw polymeric materials and then form them into continuous profiles [32]. In more detail, polymer material in the form of pellets is first fed into a fixed hopper. The material is then conveyed forward by a rotating screw and discharged through a die, forming a continuous polymer product, which is cooled by blown air or in a water bath as it emerges from the die. The melt pressure at the end of the extruder is one of the most important process parameters and is closely related to the quality of the polymer product. Understanding the processing behavior is useful, and the melt pressure is affected by many factors, such as screw speed, motor current, process operating conditions, machine geometry, and material properties. Of these, predicting the effects of screw speed, motor current, and process operating conditions on the melt pressure is important for a given machine, screw geometry, and polymer material.
The experiment was conducted on a Killion KTS-100 single-screw extruder at Queen's University Belfast. A total of seven heaters were located along the barrel, each controlled by a Eurotherm 808 PID temperature controller; their locations are shown in Fig. 4. The temperatures related to the melt pressure are those of zones 1, 2, and 3, where four heating bands were mounted in the first two zones and three in the last one, as illustrated in Fig. 5. The experimental trials were conducted using virgin low-density polyethylene (Dow LD150R; density 0.921 g/cm³; MFI 0.25 g/10 min). With a down-sampled frequency of 0.2 Hz, a total of 1154 data points were collected, of which 600 were used for training and the remaining 554 as the prediction dataset. These data points were processed by a second-order low-pass digital Butterworth filter with a normalized cutoff frequency of 0.01. The input vector to the melt pressure fuzzy models was $[V_s, I_m, T_1, T_2, T_3]$.
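The prefiltering step can be sketched in a few lines. The sketch assumes that the normalized cutoff follows the usual MATLAB/SciPy convention (a fraction of the Nyquist frequency), which at the 0.2-Hz sampling rate would place the cutoff at 0.01 × 0.1 Hz = 0.001 Hz; whether the authors filtered causally or in zero-phase (forward-backward) fashion is not stated, and the causal version below is an assumption. The coefficients come from the bilinear transform of the analog second-order Butterworth prototype; `butter2_lowpass` and `lfilter2` are hypothetical helpers.

```python
import math

def butter2_lowpass(wn):
    """Second-order Butterworth low-pass coefficients via the bilinear transform.

    wn is the cutoff as a fraction of the Nyquist frequency (assumed convention).
    Returns (b, a) with a[0] == 1.
    """
    k = math.tan(math.pi * wn / 2)          # prewarped analog cutoff
    norm = 1 + math.sqrt(2) * k + k * k
    b0 = k * k / norm
    b = [b0, 2 * b0, b0]
    a = [1.0, 2 * (k * k - 1) / norm, (1 - math.sqrt(2) * k + k * k) / norm]
    return b, a

def lfilter2(b, a, x):
    """Causal direct-form I filtering with zero initial conditions."""
    y = []
    for n, xn in enumerate(x):
        yn = b[0] * xn
        if n >= 1:
            yn += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            yn += b[2] * x[n - 2] - a[2] * y[n - 2]
        y.append(yn)
    return y

b, a = butter2_lowpass(0.01)
print(round(sum(b) / sum(a), 9))  # 1.0 (unit gain at DC)
```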
In this example, the Gaussian width for all the SVM-based fuzzy models was set to 30, and the regularization parameters used in model training were the same as in Example 1. A series of experiments were conducted to verify the effectiveness and efficiency of the proposed algorithm with different numbers of fuzzy rules included in the fuzzy systems. Table VI confirms that the proposed learning mechanism finds better objective values than the forward expansion algorithm. Comparing the objective values in the second and third columns also shows that, after some iterations, performing an additional second-stage algorithm produced no significant improvement. This, in turn, means that it is unnecessary to refine the fuzzy rules produced by our algorithm with a second-stage algorithm. The rate of decrease of the objective value became negligible once a certain number of fuzzy rules had been selected, since the rules added subsequently were redundant. As in Example 1, the results of the exhaustive search method are listed in the last column of Table VI; running it to completion is impossible in this case, since constructing a fuzzy system with 15 rules would take about 1.06e+19 years. The stopping tolerance was set to $\rho = 0.02$ in this example. The final results obtained by the proposed learning mechanism are shown in the middle two columns of Table III, where a small number of fuzzy rules is obtained by combining the backward exclusion and forward expansion algorithms.
For comparison, the number of rules, number of model parameters, training and prediction errors, and running times of the SVM-based, conventional LS-SVM-based, TSFS-SVR, and our fuzzy models are listed in Table VII. The ε-insensitive value used in the SVM-based model and the TSFS-SVR was set to 0.15. For the TSFS-SVR, the initial width, threshold, and overlap coefficient of the aligned clustering algorithm were set to 3, 0.97, and 3.8, respectively. The other parameters were the same as in Example 1. The TSFS-SVR consistently failed to produce good generalization results and also had a large number of model parameters. The SVM-based and conventional LS-SVM-based learning mechanisms both generally produced accurate fuzzy models with acceptable prediction RMSEs for all $\mu$ values, but the LS-SVM model was extremely complex, whereas the SVM model used fewer fuzzy rules. Apart from sparseness, this example also confirmed that the LS-SVM-based learning mechanism requires less running time than the SVM-based one. As expected, the proposed approach further reduced the running time and provided the smallest number of fuzzy rules for all $\mu$ values, with comparable generalization performance. The performance of the final sparse LS-SVM-based fuzzy model constructed by the novel learning mechanism on the training and prediction datasets is illustrated in Fig. 6.

C. Mammographic Masses Diagnosis
Mammography is the most effective of the various breast cancer screening techniques. However, because of the low positive predictive value of breast biopsy resulting from mammogram interpretation, about 70% of the biopsies performed have benign outcomes and are therefore unnecessary. To reduce the high number of unnecessary breast biopsies, it is important to develop a diagnosis system that can help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up examination instead.

The Gaussian width for all the SVM-based fuzzy classifiers was set to 10 in this example, while the remaining parameters were the same as in the previous two examples. Table VIII again demonstrates that the proposed algorithm finds better values of the objective function defined in (12). Combining backward exclusion with forward expansion further decreases the objective value once a number of fuzzy rules have been included in the fuzzy system, while the second-stage refinement algorithm does not produce significant improvements. In fact, in some cases the results of the proposed algorithm were very close to the global optimum value computed with all candidate fuzzy rules selected. The stopping tolerance was set to 0.002, and the final results produced by the proposed learning mechanism are shown in the last two columns of Table III, with a small number of fuzzy rules found by combining backward exclusion and forward expansion. As in the previous two examples, the results produced by our approach are listed in the last three columns of Table IX and compared with those of the SVM-based, conventional LS-SVM-based, and TSFS-SVR models. Both the SVM-based and conventional LS-SVM-based classifiers produced good test accuracies for all $\mu$ values; the former is much sparser but has a longer running time, whereas the opposite holds for the latter. The TSFS-SVR produced the worst test performance in this example, again with a highly complex model. In contrast, the LS-SVM-based fuzzy classifiers trained by the proposed learning mechanism provided the sparsest models together with the shortest running times, while achieving comparable test accuracies.

VI. CONCLUSION
This paper has investigated the construction of fuzzy rule-based systems by building sparse LS-SVMs. To achieve a sparse solution, a new regression solution to the primal optimization problem of the LS-SVM has first been presented, which avoids

APPENDIX PROOF OF THE TWO THEOREMS
According to the definition of $R_k$ in (16), the result follows by applying the well-known matrix inverse equality, where $R_k = (I + \Phi_k\Phi_k^T/\mu)^{-1}$ is obtained by using (16) and (54). Thus, Theorem 1 has been proved. In addition, it follows from (16), with $R_{k+1} = (I + \Phi_{k+1}\Phi_{k+1}^T/\mu)^{-1}$, that Theorem 2 also holds.
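The "well-known matrix inverse equality" is not written out above. Assuming it is the Sherman-Morrison-Woodbury identity, the relations below show how it applies to the residue matrices; they are a derivation added for readability and may differ in form from the omitted equations.

$$
(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}
$$

With $A = I$, $U = \Phi_k$, $C = \mu^{-1}I$, and $V = \Phi_k^T$:

$$
R_k = \left(I + \frac{\Phi_k\Phi_k^{T}}{\mu}\right)^{-1} = I - \Phi_k\left(\mu I + \Phi_k^{T}\Phi_k\right)^{-1}\Phi_k^{T}
$$

With $A = R_k^{-1}$, $U = p_{k+1}$, $C = \mu^{-1}$, and $V = p_{k+1}^T$ (a rank-one term):

$$
R_{k+1} = R_k - \frac{R_k p_{k+1} p_{k+1}^{T} R_k}{\mu + p_{k+1}^{T} R_k p_{k+1}}
$$

The second relation is what allows $R_{k+1}$ to be obtained from $R_k$ without a fresh $N \times N$ inversion.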

Fig. 2. Flowchart of the proposed efficient learning mechanism for constructing sparse LS-SVM-based fuzzy systems.

Fig. 3. Training and test outputs of the fuzzy model with inputs u(t − 1), y(t − 1), y(t − 2) and output y(t), obtained using our approach in Example 1. (Markers denote the model output, the solid line is the original data, and the bottom curve is the error between the two upper curves.) (a) Training output. (b) Test output.

Fig. 6. Training and prediction performance of our LS-SVM-based fuzzy model for melt pressure. (Markers denote the model output, the solid line is the original data, and the bottom curve is the corresponding error between the two upper curves.)

TABLE II OBJECTIVE VALUES FOUND BY USING DIFFERENT SUBSET SELECTION ALGORITHMS IN EXAMPLE 1
* 1.73e-02 (2.00e+02) denotes running time (all possible combinations).

TABLE III RESULTS OF THE PROPOSED LS-SVM-BASED LEARNING MECHANISM IN CONSTRUCTING FUZZY SYSTEMS FOR THE THREE EXAMPLES
Here, "F" denotes the results obtained using only forward expansion, and "F"+"B" denotes the results obtained using both forward expansion and backward exclusion.

TABLE IV COMPARISON RESULTS OF VARIOUS SVM-BASED LEARNING TECHNIQUES IN CONSTRUCTING FUZZY SYSTEMS IN EXAMPLE 1

TABLE V FUZZY RULES OBTAINED FROM THE PROPOSED LEARNING APPROACH IN EXAMPLE 1

TABLE VI OBJECTIVE VALUES FOUND BY USING DIFFERENT SUBSET SELECTION ALGORITHMS IN EXAMPLE 2

TABLE VII COMPARISON RESULTS OF VARIOUS SVM-BASED LEARNING TECHNIQUES IN CONSTRUCTING FUZZY SYSTEMS IN EXAMPLE 2

TABLE VIII OBJECTIVE VALUES FOUND BY USING DIFFERENT SUBSET SELECTION ALGORITHMS IN EXAMPLE 3

TABLE IX COMPARISON RESULTS OF VARIOUS SVM-BASED LEARNING TECHNIQUES IN CONSTRUCTING FUZZY SYSTEMS IN EXAMPLE 3

solving the KKT system in its conventional solution, which may result in all the training patterns being used as SVs. A novel learning mechanism has then been proposed, which works efficiently as a stepwise subset selection approach consisting of a forward expansion phase and a backward exclusion phase at each selection step. The algorithm executes very quickly because a few key techniques have been introduced to avoid matrix inverse operations and to accelerate the training process, as confirmed by the detailed computational complexity analysis. As a result, a sparse set of SVs for generating the fuzzy IF-THEN rules can easily be obtained from the training instances. Three examples, covering nonlinear dynamic system modeling, melt pressure prediction, and mammographic masses diagnosis, have been presented to demonstrate the efficiency and effectiveness of the proposed learning mechanism. The advantages of the LS-SVM-based fuzzy systems developed by the proposed method over other SVM-based learning techniques, in terms of model sparseness and computational demand, have been demonstrated and verified.