Inverse-Free Incremental Learning Algorithms With Reduced Complexity for Regularized Extreme Learning Machine

The existing inverse-free incremental learning algorithm for the regularized extreme learning machine (ELM) was based on an inverse-free algorithm to update the regularized pseudo-inverse, which was deduced from an inverse-free recursive algorithm to update the inverse of a Hermitian matrix. Before that recursive algorithm was applied in the existing inverse-free ELM, its improved version had been utilized in the previous literature. Then from the improved recursive algorithm to update the inverse, we deduce a more efficient inverse-free algorithm to update the regularized pseudo-inverse, from which we propose the inverse-free incremental ELM algorithm based on the regularized pseudo-inverse. Usually the above-mentioned inverse is smaller than the pseudo-inverse, while in processor units with limited precision, the recursive algorithm to update the inverse may introduce numerical instabilities. Then to further reduce the computational complexity, we also propose the inverse-free incremental ELM algorithm based on the $\mathrm{LDL}^T$ factors of the inverse, where the $\mathrm{LDL}^T$ factors are updated iteratively by the inverse $\mathrm{LDL}^T$ factorization. With respect to the existing inverse-free ELM, the proposed ELM based on the regularized pseudo-inverse and that based on the $\mathrm{LDL}^T$ factors are expected to require only $\frac{3}{8+M}$ and $\frac{1}{8+M}$ of the complexity, respectively, where $M$ is the output node number. The numerical experiments show that both the proposed ELM algorithms significantly accelerate the existing inverse-free ELM, and the speedup in training time is not less than 1.41. On the Modified National Institute of Standards and Technology (MNIST) dataset, usually the proposed algorithm based on the $\mathrm{LDL}^T$ factors is much faster than that based on the regularized pseudo-inverse. On the other hand, in the numerical experiments, the original ELM, the existing inverse-free ELM and the two proposed ELM algorithms achieve the same performance in regression and classification, and result in the same solutions, which include the output weights and the output sequence for the same input sequence.


I. INTRODUCTION
The extreme learning machine (ELM) [1] is an effective solution for single-hidden-layer feedforward networks (SLFNs) due to its unique characteristics, i.e., extremely fast learning speed, good generalization performance, and universal approximation capability [2]. Thus ELM has been widely applied in classification and regression [3].
The incremental learning algorithm for ELM proposed in [2] achieves the universal approximation capability by adding hidden nodes one by one. However, it only updates the output weights for the newly added hidden node, and freezes the output weights of the existing hidden nodes. Accordingly, those output weights are no longer the optimal least-squares solution of the standard ELM algorithm. Then the inverse-free incremental ELM algorithm based on the regularized pseudo-inverse was proposed in [4] to update the output weights of the added node and the existing nodes simultaneously, and the updated weights are identical to the optimal solution of the standard ELM algorithm. The ELM algorithm proposed in [4] was based on an inverse-free algorithm to update the regularized pseudo-inverse of the hidden layer output matrix. On the other hand, the incremental ELM algorithm based on the generalized inverse (i.e., the pseudo-inverse) was proposed in [5], which can only update the pseudo-inverse of the hidden layer output matrix recursively, and cannot be applied to the regularized ELM.
Recently, the incremental ELM has attracted more and more attention among researchers [6]-[10]. For food image classification, an adaptive reduced class incremental kernel ELM (ARCIKELM) was proposed in [6]. For multi-label learning with emerging new labels, an incremental kernel ELM was presented in [7], which achieves superior performance. For the ELM suffering from the coexistence of node fault and node noise, two fault tolerant incremental ELM algorithms were developed in [8] for the regression problem, i.e., the node fault tolerant incremental ELM (NFTI-ELM) and the node fault tolerant convex incremental ELM (NFTCI-ELM). Moreover, for the multilayer ELM (MELM), the Multilayer Incremental Hybrid Cost Sensitive ELM with Multiple Hidden Output Matrix and Subnetwork Hidden Nodes was proposed in [9] to achieve better performance, and an incremental MELM with random search enhancement was proposed in [10], which selects the optimal node from multiple randomly generated nodes to obtain a more compact network structure.
In [4], the inverse-free algorithm to update the regularized pseudo-inverse was deduced from an inverse-free recursive algorithm to update the inverse of a Hermitian matrix. Before the recursive algorithm to update the inverse was utilized in [4], it had been mentioned in the previous literature [11]-[15], while its improved version had been utilized in [15], [16]. Then in this article, from the improved recursive algorithm [15], [16] to update the inverse, we deduce a more efficient inverse-free algorithm to update the regularized pseudo-inverse, from which we propose the inverse-free incremental ELM algorithm based on the regularized pseudo-inverse. Usually the above-mentioned inverse is smaller than the pseudo-inverse, while in processor units with limited precision, the recursive algorithm to update the inverse may introduce numerical instabilities after a very large number of iterations [17]. Then to further reduce the computational complexity, we also propose the inverse-free incremental ELM algorithm based on the $\mathrm{LDL}^T$ factors of the inverse, where the $\mathrm{LDL}^T$ factors are updated iteratively by the inverse $\mathrm{LDL}^T$ factorization proposed in [18].
This article is organized as follows. Section II describes the ELM model. In Section III, we introduce the existing inverse-free incremental ELM algorithm [4]. Then Section IV and Section V propose the inverse-free incremental ELM algorithm based on the regularized pseudo-inverse and that based on the $\mathrm{LDL}^T$ factors, respectively. In Section VI, we compare the expected computational complexities of the existing and proposed inverse-free incremental ELM algorithms, and evaluate them by numerical experiments. Finally, we draw conclusions in Section VII.
In the following sections, the superscripts $(\cdot)^T$ and $(\cdot)^{-1}$ denote the transpose and inverse operations of a matrix, respectively. Moreover, $\mathbf{0}_l$ is the $l \times 1$ zero column vector, while $\mathbf{I}_l$ is the $l \times l$ identity matrix.

II. ARCHITECTURE OF THE ELM
In the ELM model, the $n$-th input node, the $i$-th hidden node, and the $m$-th output node can be denoted as $x_n$, $h_i$, and $z_m$, respectively, while all the $N$ input nodes, $l$ hidden nodes, and $M$ output nodes can be denoted as $\mathbf{x} = [x_1\ x_2\ \cdots\ x_N]^T$, $\mathbf{h} = [h_1\ h_2\ \cdots\ h_l]^T$, and $\mathbf{z} = [z_1\ z_2\ \cdots\ z_M]^T$, respectively. We can substitute (3) into (1) to obtain the hidden layer output matrix
$$\mathbf{H} = f\left(\mathbf{A}\mathbf{X} + \mathbf{d} \otimes \mathbf{1}_K^T\right), \tag{5}$$
where $\mathbf{H} = [\mathbf{h}_1\ \mathbf{h}_2\ \cdots\ \mathbf{h}_K] \in \mathbb{R}^{l \times K}$ is the value sequence of all $l$ hidden nodes, $\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_K] \in \mathbb{R}^{N \times K}$ is the input sequence of $K$ training samples, $\mathbf{A} \in \mathbb{R}^{l \times N}$ and $\mathbf{d} \in \mathbb{R}^{l}$ are the input weights and biases, $f(\cdot)$ is the activation function, and $\otimes$ is the Kronecker product [4].

Then we can substitute (5) and (4) into (2) to obtain the actual training output sequence
$$\mathbf{Z} = \mathbf{W}\mathbf{H} \in \mathbb{R}^{M \times K},$$
where $\mathbf{W} \in \mathbb{R}^{M \times l}$ is the output weight matrix. In ELM, only the output weight matrix $\mathbf{W}$ is adjustable, while $\mathbf{A}$ and $\mathbf{d}$ are randomly fixed. Denote the training output sequence (i.e., the desired output) as $\mathbf{Y} \in \mathbb{R}^{M \times K}$. Then an ELM simply minimizes the estimation error by finding a least-squares solution $\mathbf{W}$ for the problem
$$\min_{\mathbf{W}} \left\| \mathbf{W}\mathbf{H} - \mathbf{Y} \right\|_F, \tag{8}$$
where $\|\cdot\|_F$ denotes the Frobenius norm. For the problem (8), the unique minimum norm least-squares solution is [1]
$$\mathbf{W} = \mathbf{Y}\mathbf{H}^T \left( \mathbf{H}\mathbf{H}^T \right)^{-1}. \tag{9}$$
To avoid over-fitting, the popular Tikhonov regularization [19], [20] can be utilized to modify (9) into
$$\mathbf{W} = \mathbf{Y}\mathbf{H}^T \left( \mathbf{H}\mathbf{H}^T + k_0^2 \mathbf{I}_l \right)^{-1}, \tag{10}$$
where $k_0^2 > 0$ denotes the regularization factor. Obviously (9) is just the special case of (10) with $k_0^2 = 0$. Thus in what follows, we only consider (10) for the ELM with Tikhonov regularization.
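To make the batch baseline concrete, the following minimal sketch implements the hidden layer output (5) and the regularized solution (10) in NumPy. It is illustrative only: the function names, the tanh activation, and the Gaussian initialization are our assumptions rather than code from the original implementation.

```python
import numpy as np

def elm_train(X, Y, l, k0_sq=0.1, seed=0):
    """Batch regularized ELM: X is the N x K input sequence,
    Y is the M x K desired output sequence, l is the hidden node number."""
    rng = np.random.default_rng(seed)
    N, K = X.shape
    A = rng.standard_normal((l, N))           # random input weights (fixed)
    d = rng.standard_normal((l, 1))           # random biases (fixed)
    H = np.tanh(A @ X + d)                    # hidden layer output matrix (5), l x K
    R = H @ H.T + k0_sq * np.eye(l)           # R = H H^T + k0^2 I_l
    W = np.linalg.solve(R.T, (Y @ H.T).T).T   # W = Y H^T R^{-1}, the solution (10)
    return A, d, W

def elm_predict(A, d, W, X):
    """Actual output sequence Z = W H for the input sequence X."""
    return W @ np.tanh(A @ X + d)
```

Here np.linalg.solve is preferred over an explicit inverse for numerical robustness; the incremental algorithms in the following sections avoid recomputing even this one factorization whenever a hidden node is added.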

III. THE EXISTING INVERSE-FREE INCREMENTAL LEARNING ALGORITHM FOR THE REGULARIZED ELM
In machine learning, it is a common strategy to increase the hidden node number gradually until the desired accuracy is achieved. However, when this strategy is applied in ELM directly, the matrix inverse operation in (10) for the original ELM will be required whenever a few or only one extra hidden node is introduced, and then the algorithm will be computationally prohibitive. Accordingly, an inverse-free strategy was proposed in [4], to update the output weights incrementally with the increase of the hidden nodes. In each step, the output weights obtained by the inverse-free algorithm are identical to the solution of the original ELM algorithm with the inverse operation.
Assume that in the ELM with $l$ hidden nodes, we add one extra hidden node, i.e., the hidden node $l+1$, which has the input weight row vector $\bar{\mathbf{a}}_{l+1}^T = [a_{(l+1)1}\ a_{(l+1)2}\ \cdots\ a_{(l+1)N}] \in \mathbb{R}^{1 \times N}$ and the bias $\bar{d}_{l+1}$. Then from (5), it can be seen that the extra row $\bar{\mathbf{h}}_{l+1}^T = f\left(\bar{\mathbf{a}}_{l+1}^T \mathbf{X} + \bar{d}_{l+1}\mathbf{1}^T\right)$ needs to be added to $\mathbf{H}$, i.e.,
$$\mathbf{H}_{l+1} = \begin{bmatrix} \mathbf{H}_l \\ \bar{\mathbf{h}}_{l+1}^T \end{bmatrix}, \tag{11}$$
where $\mathbf{H}_i$ ($i = l, l+1$) denotes $\mathbf{H}$ for the ELM with $i$ hidden nodes. In $\bar{\mathbf{a}}_{l+1}$, $\bar{\mathbf{h}}_{l+1}$, $\bar{d}_{l+1}$ and what follows, we add the overline to emphasize the extra vector or scalar, which is added to the matrix or vector for the ELM with $l$ hidden nodes. After $\mathbf{H}$ is updated by (11), the original ELM updates the output weights by (10), which involves an inverse operation. To avoid that inverse operation, the incremental learning algorithm for ELM in [4] utilizes an inverse-free algorithm to update the regularized pseudo-inverse
$$\mathbf{B} = \mathbf{H}^T \left( \mathbf{H}\mathbf{H}^T + k_0^2 \mathbf{I}_l \right)^{-1}, \tag{12}$$
and then substitutes (12) into (10) to compute the output weights by
$$\mathbf{W} = \mathbf{Y}\mathbf{B}. \tag{13}$$
In [4], $\mathbf{B}_{l+1}$ (i.e., $\mathbf{B}$ for the ELM with $l+1$ hidden nodes) is computed from $\mathbf{B}_l$ iteratively. Let
$$\mathbf{R} = \mathbf{H}\mathbf{H}^T + k_0^2 \mathbf{I}_l \tag{17}$$
and
$$\mathbf{Q} = \mathbf{R}^{-1}. \tag{18}$$
Then we can write (12) as
$$\mathbf{B} = \mathbf{H}^T \mathbf{Q}. \tag{19}$$
From (17) and (11),
$$\mathbf{R}_{l+1} = \begin{bmatrix} \mathbf{R}_l & \mathbf{p}_l \\ \mathbf{p}_l^T & \rho_l \end{bmatrix}, \tag{20}$$
where $\mathbf{p}_l$, a column vector with $l$ entries, satisfies
$$\mathbf{p}_l = \mathbf{H}_l \bar{\mathbf{h}}_{l+1}, \tag{21}$$
and the scalar $\rho_l = \bar{\mathbf{h}}_{l+1}^T \bar{\mathbf{h}}_{l+1} + k_0^2$. The inverse-free recursive algorithm computes $\mathbf{Q}_{l+1} = (\mathbf{R}_{l+1})^{-1}$ by equations (16), (13), (14) and (11) in [4], which can be written as
$$\mathbf{t}_l = \mathbf{Q}_l \mathbf{p}_l, \qquad \tau_l = \left( \rho_l - \mathbf{p}_l^T \mathbf{t}_l \right)^{-1} \tag{22}$$
and
$$\mathbf{Q}_{l+1} = \begin{bmatrix} \mathbf{Q}_l + \tau_l \mathbf{t}_l \mathbf{t}_l^T & -\tau_l \mathbf{t}_l \\ -\tau_l \mathbf{t}_l^T & \tau_l \end{bmatrix}, \tag{23}$$
respectively. Notice that in (22) and (23), $\mathbf{t}_l$ is a column vector with $l$ entries, and $\tau_l$ is a scalar. The existing inverse-free incremental learning algorithm for ELM with Tikhonov regularization has been summarized in [4, Algorithm 2], and is described in the following Algorithm 1.
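The recursion (21)-(23) only involves matrix-vector products, so one node addition costs $O(Kl + l^2)$ flops instead of the $O(l^3)$ of a fresh inversion. The following NumPy sketch summarizes the update; the helper name update_Q and the variable names are ours, not those of [4].

```python
import numpy as np

def update_Q(Q, H, h_bar, k0_sq):
    """Grow Q_l = (H_l H_l^T + k0^2 I)^{-1} to Q_{l+1} inverse-free,
    where h_bar (length K) is the new hidden-node row appended to H_l."""
    p = H @ h_bar                          # (21): p_l = H_l h_bar
    rho = h_bar @ h_bar + k0_sq            # new diagonal entry of R_{l+1}
    t = Q @ p                              # (22): t_l = Q_l p_l
    tau = 1.0 / (rho - p @ t)              # (22): reciprocal of the Schur complement
    Q_new = np.block([[Q + tau * np.outer(t, t), -tau * t[:, None]],
                      [-tau * t[None, :],        np.array([[tau]])]])  # (23)
    return Q_new, t, tau
```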

Algorithm 1 The Existing Inverse-Free Incremental Learning Algorithm in [4] for the Increment of Hidden Nodes in Regularized ELM
Require: Desired approximation error $\eta^*$, regularization factor $k_0^2$, the input sequence $\mathbf{X} \in \mathbb{R}^{N \times K}$, the training output sequence $\mathbf{Y} \in \mathbb{R}^{M \times K}$, the initial ELM model with $l_0$ hidden nodes (input weights $\mathbf{A}_{l_0} \in \mathbb{R}^{l_0 \times N}$, biases $\mathbf{d}_{l_0} \in \mathbb{R}^{l_0}$, regularized pseudo-inverse $\mathbf{B}_{l_0} \in \mathbb{R}^{K \times l_0}$, output weights $\mathbf{W}_{l_0} \in \mathbb{R}^{M \times l_0}$).

Ensure:
An ELM with $l$ hidden nodes $(\mathbf{A}, \mathbf{d}, \mathbf{W})$ to reach the desired approximation error $\eta^*$.

IV. THE PROPOSED INCREMENTAL ELM ALGORITHM BASED ON REGULARIZED PSEUDO-INVERSE
Actually, the inverse-free recursive algorithm given by (22) and (23) had been mentioned in the previous literature [11]-[15] before it was deduced in [4] by utilizing the Sherman-Morrison formula and the Schur complement. That inverse-free recursive algorithm can be regarded as an application of the block matrix inversion lemma [11, p. 30], and was called the lemma for inversion of a block-partitioned matrix [12, Ch. 14.12], [13, eq. (16)]. To develop multiple-input multiple-output (MIMO) detectors, the inverse-free recursive algorithm was applied in [13], [14], and its improved version was utilized in [15], [16].
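Multiplying $\mathbf{H}_{l+1}^T$ by the block form (23) shows that $\mathbf{B}$, and hence $\mathbf{W} = \mathbf{Y}\mathbf{B}$, each admit a rank-one update plus one appended column. The NumPy sketch below illustrates the resulting node-addition step; it is algebraically equivalent to, though not necessarily line-by-line identical with, the updates (24)-(29) used in Algorithm 2, and all function and variable names are ours. In particular, it uses the identities $\mathbf{t}_l = \mathbf{B}_l^T \bar{\mathbf{h}}_{l+1}$ (cf. (27)) and $\bar{\mathbf{b}}_{l+1} = \tau_l (\bar{\mathbf{h}}_{l+1} - \mathbf{B}_l \mathbf{p}_l)$.

```python
import numpy as np

def add_node_B(B, W, H, Y, h_bar, k0_sq):
    """One node addition for the B-based incremental ELM.
    B: K x l regularized pseudo-inverse, W: M x l output weights,
    H: l x K hidden outputs, Y: M x K targets, h_bar: length-K new row."""
    p = H @ h_bar                                  # (21): p_l = H_l h_bar, 2Kl flops
    t = B.T @ h_bar                                # t_l = Q_l p_l, since B^T = Q H
    tau = 1.0 / (h_bar @ h_bar + k0_sq - p @ t)    # Schur-complement reciprocal
    b_bar = tau * (h_bar - B @ p)                  # new column of B_{l+1}
    w_bar = Y @ b_bar                              # new output-weight column, 2KM flops
    B_new = np.hstack([B - np.outer(b_bar, t), b_bar[:, None]])
    W_new = np.hstack([W - np.outer(w_bar, t), w_bar[:, None]])
    H_new = np.vstack([H, h_bar[None, :]])
    return B_new, W_new, H_new
```

Every step is a matrix-vector product, which is consistent with the dominant per-step costs of $2Kl$ and $2KM$ flops reported in Section VI.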

Algorithm 2 The Proposed Incremental ELM Based on Regularized Pseudo-Inverse
Require: Desired approximation error $\eta^*$, regularization factor $k_0^2$, the input sequence $\mathbf{X} \in \mathbb{R}^{N \times K}$, the training output sequence $\mathbf{Y} \in \mathbb{R}^{M \times K}$, the initial ELM model with $l_0$ hidden nodes (input weights $\mathbf{A}_{l_0} \in \mathbb{R}^{l_0 \times N}$, biases $\mathbf{d}_{l_0} \in \mathbb{R}^{l_0}$, regularized pseudo-inverse $\mathbf{B}_{l_0} \in \mathbb{R}^{K \times l_0}$, output weights $\mathbf{W}_{l_0} \in \mathbb{R}^{M \times l_0}$).

Ensure:
An ELM with $l$ hidden nodes $(\mathbf{A}, \mathbf{d}, \mathbf{W})$ to reach the desired approximation error $\eta^*$.

V. THE PROPOSED INCREMENTAL ELM ALGORITHM BASED ON $\mathrm{LDL}^T$ FACTORS
The pseudo-inverse $\mathbf{B}$ defined by (19) is computed from the unique inverse $\mathbf{Q}$, while usually $\mathbf{Q}$ is smaller than $\mathbf{B}$. Then we update $\mathbf{Q}$ instead of $\mathbf{B}$, to further reduce the computational complexity. Moreover, instead of the unique inverse $\mathbf{Q}$ itself, we update the $\mathrm{LDL}^T$ factors of $\mathbf{Q}$, to improve numerical stability. Finally, in this section, we propose the incremental ELM based on the $\mathrm{LDL}^T$ factors (of the unique inverse).

A. THE ELM ALGORITHM UPDATING THE UNIQUE INVERSE INSTEAD OF THE PSEUDO-INVERSE
Since $\mathbf{H} \in \mathbb{R}^{l \times K}$, the pseudo-inverse $\mathbf{B}$ defined by (12) is $K \times l$, and the unique inverse $\mathbf{Q}$ defined by (18) is $l \times l$. In SLFNs, usually there are more training samples than hidden nodes, i.e., $K > l$. Accordingly, usually the $l \times l$ matrix $\mathbf{Q}$ is smaller than the $K \times l$ matrix $\mathbf{B}$. Then to further reduce the computational complexity, we can update $\mathbf{Q}$ instead of $\mathbf{B}$, and utilize $\mathbf{Q}$ to compute the output weights $\mathbf{W}$. We update $\mathbf{Q}$ by (21), (24) and (23), and update $\mathbf{W}$ by (28), where $\widetilde{\mathbf{W}}_l$ and $\bar{\mathbf{w}}_{l+1}$ are computed from the entries in the updated $\mathbf{Q}$. A sketch of this node-addition step is given below.
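The following hedged NumPy sketch assumes the block update (23) and the identity $\bar{\mathbf{w}}_{l+1} = \tau_l(\mathbf{Y}\bar{\mathbf{h}}_{l+1} - \mathbf{W}_l \mathbf{p}_l)$, which follows from $\mathbf{W} = \mathbf{Y}\mathbf{B}$ and (23); the function and variable names are ours.

```python
import numpy as np

def add_node_Q(Q, W, H, Y, h_bar, k0_sq):
    """One node addition that carries the small l x l inverse Q
    instead of the K x l pseudo-inverse B."""
    p = H @ h_bar                                  # (21): p_l = H_l h_bar
    t = Q @ p                                      # t_l = Q_l p_l
    tau = 1.0 / (h_bar @ h_bar + k0_sq - p @ t)
    w_bar = tau * (Y @ h_bar - W @ p)              # new output-weight column
    W_new = np.hstack([W - np.outer(w_bar, t), w_bar[:, None]])
    Q_new = np.block([[Q + tau * np.outer(t, t), -tau * t[:, None]],
                      [-tau * t[None, :],        np.array([[tau]])]])  # (23)
    H_new = np.vstack([H, h_bar[None, :]])
    return Q_new, W_new, H_new
```

Since $\mathbf{Q}$ is only $l \times l$, the per-step cost drops to the single $2Kl$ product (21) plus a $2KM$ product for the output-weight column.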

B. THE PROPOSED INCREMENTAL ELM UPDATING $\mathrm{LDL}^T$ FACTORS OF THE UNIQUE INVERSE
Since the processor units are limited in precision, the recursive algorithm utilized to update $\mathbf{Q}$ may introduce numerical instabilities after a very large number of iterations [17]. Thus instead of $\mathbf{Q}$, we update the $\mathrm{LDL}^T$ factors of $\mathbf{Q}$, since usually the $\mathrm{LDL}^T$ factorization is numerically stable [21]. The $\mathrm{LDL}^T$ factors include the upper-triangular $\mathbf{L}$ and the diagonal $\mathbf{D}$, which satisfy
$$\mathbf{Q} = \mathbf{L}\mathbf{D}\mathbf{L}^T. \tag{44}$$
From (44), we can deduce
$$\mathbf{R} = \mathbf{Q}^{-1} = \mathbf{L}^{-T}\mathbf{D}^{-1}\mathbf{L}^{-1}, \tag{45}$$
where the lower-triangular $\mathbf{L}^{-T}$ and the diagonal $\mathbf{D}^{-1}$ are the conventional $\mathrm{LDL}^T$ factors [21] of $\mathbf{R}$. It can be seen from (44) that $\mathbf{L}$ and $\mathbf{D}$ are the inverse $\mathrm{LDL}^T$ factors [18] of $\mathbf{R}$. Then $\mathbf{L}$ and $\mathbf{D}$ can be updated iteratively by the inverse $\mathrm{LDL}^T$ factorization proposed in [18], i.e.,
$$\mathbf{L}_{l+1} = \begin{bmatrix} \mathbf{L}_l & \tilde{\mathbf{t}}_l \\ \mathbf{0}_l^T & 1 \end{bmatrix}, \qquad \mathbf{D}_{l+1} = \begin{bmatrix} \mathbf{D}_l & \mathbf{0}_l \\ \mathbf{0}_l^T & \tilde{\tau}_l \end{bmatrix}, \tag{46}$$
where
$$\tilde{\mathbf{t}}_l = -\mathbf{L}_l \mathbf{D}_l \mathbf{L}_l^T \mathbf{p}_l \tag{47a}$$
and
$$\tilde{\tau}_l = \left( \rho_l + \mathbf{p}_l^T \tilde{\mathbf{t}}_l \right)^{-1}. \tag{47b}$$
We can substitute (44) into (47a) and (47b), respectively, to show that $\tilde{\mathbf{t}}_l$ in (47a) and $\mathbf{t}_l$ in (24b) satisfy
$$\tilde{\mathbf{t}}_l = -\mathbf{t}_l, \tag{48}$$
and that $\tilde{\tau}_l$ in (47b) is equal to $\tau_l$ in (24a). Then we can substitute (48) into (41) to obtain (49). After updating $\mathbf{L}$ and $\mathbf{D}$, we can compute the output weights $\mathbf{W}$ by (43), (49) and (28). Substitute (21) into (43) and (47) to obtain (50) and (51), respectively. Also substitute (49) into (28) to obtain (52). Lastly, let us utilize (51), (46), (50) and (52) to summarize the proposed inverse-free incremental learning algorithm for the regularized ELM in the following Algorithm 3.
Notice that in Algorithm 3, we can utilize the conventional $\mathrm{LDL}^T$ factorization [21] to compute $(\mathbf{L}_{l_0})^{-T}$ and $(\mathbf{D}_{l_0})^{-1}$ from $\mathbf{R}_{l_0}$ by (45), and then compute the initial $\mathbf{L}_{l_0}$ and $\mathbf{D}_{l_0}$ from $(\mathbf{L}_{l_0})^{-T}$ and $(\mathbf{D}_{l_0})^{-1}$, respectively. On the other hand, we can also utilize the inverse $\mathrm{LDL}^T$ factorization in [18] to compute $\mathbf{L}_{l_0}$ and $\mathbf{D}_{l_0}$ from $\mathbf{R}_{l_0}$ directly.
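The following NumPy sketch combines the conventional-factorization initialization with the factor update, storing $\mathbf{D}$ as a vector and assuming the forms (46)-(48) reconstructed above; the helper names, and the use of scipy.linalg.ldl for the conventional factorization, are our choices rather than the article's code.

```python
import numpy as np
from scipy.linalg import ldl

def init_LD(R):
    """Initial inverse LDL^T factors of R via the conventional factorization:
    R = L' D' L'^T  =>  L = (L')^{-T}, D = 1 / diag(D')  (R is SPD here)."""
    Lp, Dp, _ = ldl(R, lower=True)               # conventional factors of R
    L = np.linalg.inv(Lp).T                      # upper-triangular L
    D = 1.0 / np.diag(Dp)                        # diagonal of D stored as a vector
    return L, D

def add_node_LDL(L, D, W, H, Y, h_bar, k0_sq):
    """One node addition updating the inverse LDL^T factors (Q = L diag(D) L^T)."""
    p = H @ h_bar                                # (21): p_l = H_l h_bar
    g = L.T @ p                                  # intermediate L^T p
    t_tilde = -L @ (D * g)                       # (47a): -L D L^T p
    tau = 1.0 / (h_bar @ h_bar + k0_sq - g @ (D * g))   # (47b)
    l = L.shape[0]
    L_new = np.block([[L, t_tilde[:, None]],
                      [np.zeros((1, l)), np.ones((1, 1))]])   # (46)
    D_new = np.append(D, tau)                                 # (46)
    t = -t_tilde                                 # (48): t_l = -t_tilde
    w_bar = tau * (Y @ h_bar - W @ p)            # new output-weight column
    W_new = np.hstack([W - np.outer(w_bar, t), w_bar[:, None]])
    H_new = np.vstack([H, h_bar[None, :]])
    return L_new, D_new, W_new, H_new
```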

Algorithm 3 The Proposed Incremental ELM Based on $\mathrm{LDL}^T$ Factors
Require: Desired approximation error $\eta^*$, regularization factor $k_0^2$, the input sequence $\mathbf{X} \in \mathbb{R}^{N \times K}$, the training output sequence $\mathbf{Y} \in \mathbb{R}^{M \times K}$, the initial ELM model with $l_0$ hidden nodes (input weights $\mathbf{A}_{l_0} \in \mathbb{R}^{l_0 \times N}$, biases $\mathbf{d}_{l_0} \in \mathbb{R}^{l_0}$, regularized pseudo-inverse $\mathbf{B}_{l_0} \in \mathbb{R}^{K \times l_0}$, output weights $\mathbf{W}_{l_0} \in \mathbb{R}^{M \times l_0}$).

Ensure:
An ELM with $l$ hidden nodes $(\mathbf{A}, \mathbf{d}, \mathbf{W})$ to reach the desired approximation error $\eta^*$.

VI. COMPLEXITY ANALYSIS AND NUMERICAL EXPERIMENTS
In this section, we analyze the computational complexities of the presented inverse-free ELM algorithms. Then we carry out numerical experiments to show the performance of the presented inverse-free ELM algorithms, and compare their complexities. In the rest of this section, the proposed incremental ELM algorithm based on the regularized pseudo-inverse (i.e., Algorithm 2) and that based on the $\mathrm{LDL}^T$ factors (i.e., Algorithm 3) will be denoted as the proposed ELM algorithms 1 and 2, respectively, for the sake of simplicity.

A. COMPLEXITY ANALYSIS OF THE PRESENTED INVERSE-FREE ELM ALGORITHMS
In this subsection, we compare the expected flops (floating-point operations) of the existing ELM algorithm in [4] and the proposed ELM algorithms 1 and 2. Obviously $l_1 l_3 (2 l_2 - 1) \approx 2 l_1 l_2 l_3$ flops are required to multiply an $l_1 \times l_2$ matrix by an $l_2 \times l_3$ matrix, and $l_1 l_2$ flops are required to sum two matrices of size $l_1 \times l_2$ [4]. The example below works through these counts for one matrix product.
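For instance (an illustrative check of these counts, not a case taken from the article), multiplying a $100 \times 200$ matrix by a $200 \times 50$ matrix requires

```latex
\[
l_1 l_3 (2 l_2 - 1) = 100 \times 50 \times (2 \times 200 - 1) = 1\,995\,000
\approx 2 l_1 l_2 l_3 = 2 \times 100 \times 200 \times 50 = 2\,000\,000
\ \text{flops},
\]
```

so the $2 l_1 l_2 l_3$ approximation is accurate to within $0.25\%$ here.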
In Table 1, we compare the flops of the existing ELM algorithm [4] and the proposed ELM algorithms 1 and 2. As in [4], the flops of the existing ELM algorithm do not include the $O(lK)$ entries for simplicity, since usually the ELM has large $K$ (the number of training examples) and $l$ (the number of hidden nodes). The flops of the proposed ELM algorithms do not include the entries that are $O(lK)$ or $O(MK)$. Since usually $M/l \approx 0$, it can easily be seen from Table 1 that with respect to the existing ELM algorithm, the proposed ELM algorithms 1 and 2 only require about $\frac{3}{8+M}$ and $\frac{1}{8+M}$ of the flops, respectively. Notice that in the proposed ELM algorithm 1, $\bar{\mathbf{h}}_{l+1}^T \mathbf{B}_l$ computed in (27) can be utilized in (26) and (29a). The dominant computational load of the proposed ELM algorithm 1 comes from (21), (27), (25) and (29b), of which the flops are $2Kl$, $2Kl$, $2Kl$ and $2KM$, respectively. Moreover, in the proposed ELM algorithm 2, the dominant computational load comes from (21) and (43), of which the flops are $2Kl$ and $2KM$, respectively.

B. NUMERICAL EXPERIMENTS
We follow the simulations in [4] to compare the existing inverse-free ELM algorithm and the proposed inverse-free ELM algorithms 1 and 2 on the MATLAB software platform under a Microsoft Windows server with 128 GB of RAM. We utilize a fivefold cross validation to partition the datasets into training and testing sets. To measure the performance, we employ the mean squared error (MSE) for regression problems, and employ four commonly used indices for classification problems, i.e., the prediction accuracy (ACC), the sensitivity (SN), the precision (PE) and the Matthews correlation coefficient (MCC). Moreover, the regularization factor is set to $k_0^2 = 0.1$ to avoid over-fitting. For the regression problem, we consider the energy efficiency dataset [22], the housing dataset [23], the airfoil self-noise dataset [24], and the physicochemical properties of protein dataset [25]. As with Fig. 1 in [4], Fig. 1 in this article shows the MSEs for the original ELM algorithm (that computes the output weights by (10)), the existing inverse-free ELM algorithm, and the proposed inverse-free ELM algorithms 1 and 2. Moreover, as with Table 4 in [4], Table 2 shows the regression performance, where different activation functions are chosen for different datasets, which include Gaussian, sigmoid, sine and triangular. In Table 2, the weight error and the output error are defined as $\|\mathbf{W}_1 - \mathbf{W}_2\|_F$ and $\|\mathbf{Z}_1 - \mathbf{Z}_2\|_F$, respectively, where $\mathbf{W}_1$ and $\mathbf{Z}_1$ are computed by an inverse-free ELM algorithm, and $\mathbf{W}_2$ and $\mathbf{Z}_2$ are computed by the original ELM algorithm. We set the initial hidden node number to 2, and utilize the existing and proposed inverse-free ELM algorithms to add the hidden nodes one by one till the hidden node number reaches 500. Table 2 includes the simulation results for the hidden node numbers 3, 100 and 500.
From Fig. 1 and the last column of Table 2, it can be seen that the proposed inverse-free ELM algorithms 1 and 2 always achieve the same accuracy as the existing inverse-free ELM in [4] and the original ELM based on matrix inversion. On the other hand, Table 2 shows that the weight error and the output error are less than $10^{-13}$ after 1 iteration (i.e., the node number 3), less than $10^{-10}$ after 98 iterations (i.e., the node number 100), and not greater than $2 \times 10^{-9}$ after 498 iterations (i.e., the node number 500). Then we can conclude that the original ELM, the existing inverse-free ELM in [4], and the proposed inverse-free ELM algorithms 1 and 2 all result in the same solutions, which include the output weight matrix and the output sequence for the same input sequence.

FIGURE 1. MSEs for the proposed inverse-free ELM, the existing inverse-free ELM in [4], and the original ELM based on matrix inversion.
The speedups in training time of the proposed inverse-free ELM algorithms 1 and 2 over the existing inverse-free ELM algorithm are shown in Table 3, where we add just one node to reach 100 and 500 nodes, respectively, and we run 1000 simulations to compute the average training time. The speedups are computed by $T_{\mathrm{existing}}/T_{\mathrm{proposed}}$, i.e., the ratio between the training time of the existing ELM algorithm and that of the proposed ELM algorithm (1 or 2). As observed from Table 3, both the proposed algorithms 1 and 2 significantly accelerate the existing inverse-free ELM algorithm.
For the classification problem, we consider the MAGIC gamma telescope dataset [26], the musk dataset [27], the adult dataset [28] and the diabetes dataset [25]. For each dataset, five activation functions are simulated, i.e., Gaussian, sigmoid, hardlim, triangular and sine. In the simulations, the original ELM, the existing inverse-free ELM, and the proposed inverse-free ELM algorithms 1 and 2 all achieve the same performance, as listed in Table 4.
Lastly, in Table 5 we simulate the existing and proposed inverse-free ELM algorithms on the Modified National Institute of Standards and Technology (MNIST) dataset [29] with 60000 training images and 10000 testing images, to show the performance on big data. To obtain the testing accuracy, we set the initial hidden node number to 2000, and utilize the existing and proposed ELM algorithms to add hidden nodes one by one till the hidden node number reaches 2200. To obtain the speedups of the proposed inverse-free ELM algorithms 1 and 2 over the existing inverse-free ELM algorithm, we compare the training time to reach 2200 nodes by adding one node, and run 500 simulations to compute the average training time.
As observed from Table 5, the existing and proposed inverse-free ELM algorithms achieve the same testing accuracy, and usually the proposed algorithm 2 is much faster than the proposed algorithm 1 on the MNIST dataset. Moreover, from Table 3 and Table 5, we can conclude that both the proposed algorithms 1 and 2 significantly accelerate the existing inverse-free ELM algorithm, and the speedup in training time of the proposed inverse-free ELM algorithm (1 or 2) over the existing inverse-free ELM algorithm is not less than 1.41.

VII. CONCLUSION
In this article, we utilize the improved recursive algorithm [15], [16] that updates the inverse, to deduce an inverse-free algorithm to update the regularized pseudo-inverse, which is more efficient than the corresponding algorithm utilized in the existing inverse-free incremental learning algorithm [4] for the regularized ELM. Accordingly, we propose the inverse-free incremental ELM algorithm based on the regularized pseudo-inverse, which reduces the computational complexity of the existing inverse-free incremental ELM algorithm in [4].
Usually the unique inverse is smaller than the pseudo-inverse, while for processor units with limited precision, the recursive algorithm to update the unique inverse may introduce numerical instabilities after a very large number of iterations [17]. Then to further reduce the computational complexity, we update the $\mathrm{LDL}^T$ factors of the unique inverse instead of the pseudo-inverse, by the inverse $\mathrm{LDL}^T$ factorization proposed in [18]. Accordingly, we propose the inverse-free incremental ELM algorithm based on the $\mathrm{LDL}^T$ factors (of the unique inverse).
With respect to the existing inverse-free ELM algorithm, the proposed ELM algorithm based on the regularized pseudo-inverse and that based on the $\mathrm{LDL}^T$ factors are expected to require only $\frac{3}{8+M}$ and $\frac{1}{8+M}$ of the flops, respectively, where $M$ is the output node number. The numerical experiments show that both the proposed ELM algorithms significantly accelerate the existing inverse-free ELM, and the speedup in training time is not less than 1.41. On the MNIST dataset, usually the proposed algorithm based on the $\mathrm{LDL}^T$ factors is much faster than that based on the regularized pseudo-inverse. On the other hand, in the numerical experiments, the original ELM, the existing inverse-free ELM and the two proposed ELM algorithms achieve the same performance in regression and classification, and result in the same solutions, which include the output weights and the output sequence for the same input sequence.
In this article, we focus on incremental learning for the regularized ELM, and propose two efficient inverse-free algorithms to add hidden nodes. However, usually it is also required to prune redundant hidden nodes [20], [30]-[33] by decremental learning algorithms in machine learning. Accordingly, our future work will include an efficient algorithm to remove hidden nodes in the regularized ELM.