Fault and Noise Tolerance in the Incremental Extreme Learning Machine

The extreme learning machine (ELM) is an efficient way to build single-hidden-layer feedforward networks (SLFNs). However, its fault tolerant ability is very weak. When node noise or node failure exists in a network trained by the ELM concept, the performance of the network is greatly degraded if no countermeasure is taken. Such countermeasures for the ELM or incremental learning are seldom reported. This paper considers the situation in which a trained SLFN suffers from the coexistence of node fault and node noise. We develop two fault tolerant incremental ELM algorithms for the regression problem, namely node fault tolerant incremental ELM (NFTI-ELM) and node fault tolerant convex incremental ELM (NFTCI-ELM). The NFTI-ELM determines the output weight of the newly inserted node only. We prove that, in terms of the training set mean squared error (MSE) of faulty SLFNs, the NFTI-ELM converges. Our numerical results show that the NFTI-ELM is superior to the conventional ELM and incremental ELM algorithms under faulty situations. To further improve the performance, we propose the NFTCI-ELM algorithm. It not only determines the output weight of the newly inserted node, but also updates all previously trained output weights. In terms of the training set MSE of faulty SLFNs, the NFTCI-ELM converges, and it is superior to the NFTI-ELM.


I. INTRODUCTION
A single-hidden-layer feedforward network (SLFN) [1], [2] is able to act as a universal approximator. In the traditional training approach [3], [4], we need to determine all the connection weights, including the input biases, the input weights, and the output weights of hidden nodes. The traditional approach may trigger some well-known problems. For example, when there are many hidden nodes, the computational complexity is very high.
Instead of training all the weights, Huang et al. [5], [6] proposed the extreme learning machine (ELM) concept, in which the parameters of hidden nodes are generated randomly. For the SLFN case, only the output weights need to be trained. In [5], Huang et al. formally proved that an SLFN with randomly generated hidden nodes can act as a universal approximator too. Recently, a number of studies on the ELM abilities were reported [7]-[9]. Also, many applications make use of the ELM concept. For instance, Pan et al. [10] proposed an ELM model for simulating a visual neuron system and for extracting the leukocyte from images. Minhas et al. [11] developed a human action recognition framework, based on the ELM concept, to assign an action label to a video. Wang et al. [12] combined the ELM mapping with a multi-view framework to extract features with good representation for training, which could also be feasible for multi-view discriminant analysis [13]. The ELM concept [14] can work with an autoencoder for clustering and subspace clustering [15]. In [16]-[18], other ELM applications were described in detail. However, the ELM algorithms used in these applications require the size of the hidden layer to be fixed during training. Hence, they are less flexible than an incremental algorithm, which adds hidden nodes incrementally into a neural network during training.

The associate editor coordinating the review of this manuscript and approving it for publication was Xi Peng.
In [5], [19], Huang et al. proposed two incremental ELM algorithms for the SLFN model, namely incremental ELM (I-ELM) and convex incremental ELM (CI-ELM). The concept of incremental learning is that we add hidden nodes incrementally into an existing network until a predefined condition is reached. They also mentioned that some weights between the input nodes and hidden nodes in the ELM could be disconnected in some situations [20]. If the disconnection occurs in an uncontrolled manner, it can be described as a faulty situation. However, these two incremental algorithms were designed for fault-free situations only. We believe that fault and noise could greatly degrade the performance of the I-ELM and CI-ELM if special procedures are not considered [21], [22].
In the realization of a neural network, fault and noise are unavoidable due to some practical issues. For instance, in the hardware implementation of a neural network [23], various kinds of fault may happen [24], [25], such as fault in weights and fault in an activation function. If fault happens in the activation functions, we can model it as node fault [26]-[28]. Moreover, noise may happen in either a digital or an analog implementation. For the digital implementation, when the floating-point format is used, the round-off error of a number is proportional to its magnitude. Hence, we can use the multiplicative noise model [29], [30] to describe the error. For the analog implementation, noise always exists in the amplifier output [31]. In addition, transient noise and fault may happen when the implementation is at nanoscale [32].
In the last decades, many fault tolerant approaches for neural networks were developed. For instance, the authors in [28], [33]-[36] proposed a failure/chaos injection approach, which is suitable for online mode training. However, this approach is unable to capture the failure behavior when the number of training iterations is not sufficient. Another approach is to formulate the training process as a constrained optimization problem [37]-[39], in which the constraints are defined by the fault tolerant level. This approach is not guaranteed to have a feasible solution if the constraints are too strict. The above approaches are not suitable for the incremental learning mode.
The weight decay concept can improve the fault tolerant ability [40], [41]. However, the objective function of the weight decay concept is not identical to the training set error of faulty networks. Hence, even for fault tolerant radial basis function (RBF) networks [22], [42], an optimal fault tolerant solution cannot be obtained, regardless of how the weight decay parameter is tuned. To the best of our knowledge, only a few results on fault tolerant incremental learning have been reported [43].
This paper focuses on the regression problem and the incremental learning mode, in which we incrementally insert hidden nodes into an SLFN. We propose two ELM based fault tolerant incremental learning algorithms. They are called node fault tolerant I-ELM (NFTI-ELM) and node fault tolerant CI-ELM (NFTCI-ELM), respectively. They are able to handle the coexistence of node fault and multiplicative node noise. We first define a fault tolerant objective function for SLFNs. By considering the change of the fault tolerant objective values, we derive the way to determine the output weight of the newly inserted hidden node. For the NFTI-ELM, the previously trained output weights remain unchanged. Simulation results confirm that the fault tolerant performance of the NFTI-ELM is better than that of the CI-ELM and I-ELM. To boost the fault tolerant ability, the NFTCI-ELM is developed based on the CI-ELM concept. Unlike the NFTI-ELM, the NFTCI-ELM updates all previously trained output weights after determining the output weight of the newly inserted node. We prove that, in terms of the faulty training set MSE, the two proposed algorithms converge. Simulation results show that the fault tolerant performance of the NFTCI-ELM is the best among all the compared algorithms. The compared algorithms include the I-ELM, the CI-ELM and the batch mode ELM. The batch mode ELM gives us a baseline since it is the best under fault-free situations. We use a paired t-test to verify that the improvement of our proposed algorithms is statistically significant. Also, even when an incorrect fault level is used for training, the performance of the NFTI-ELM and NFTCI-ELM is still better than that of the I-ELM and CI-ELM.
Our major contributions are summarized as follows.
• Two fault tolerant incremental learning algorithms are proposed, namely the NFTI-ELM and NFTCI-ELM, for SLFNs.
• The convergence of the two algorithms is proved in terms of the training set MSE.

The rest of this paper is organized as follows. Section II provides the background on the basic ELM model. The effect of node failure and node noise on the SLFN model is presented in Section III. Section IV describes the details of the two proposed algorithms. Section V presents the simulation results. Section VI concludes the paper.

II. ELM AND INCREMENTAL LEARNING
This paper considers the use of SLFNs for nonlinear regression. The training set is denoted as $\mathcal{D}_t = \{(\mathbf{x}_l, y_l) : l = 1, \ldots, N\}$, where $\mathbf{x}_l \in \mathbb{R}^d$ is the $l$-th training input vector, $d$ is the dimension of the data, and $y_l \in \mathbb{R}$ is its associated training output. Similarly, the test set is denoted as $\mathcal{D}_f = \{(\mathbf{x}'_l, y'_l) : l = 1, \ldots, N'\}$, where $\mathbf{x}'_l \in \mathbb{R}^d$ is the input vector of the $l$-th test sample, and $y'_l \in \mathbb{R}$ is its associated output.
The output of an SLFN [1], [2], [44] is expressed as

$$f(\mathbf{x}) = \sum_{i=1}^{n} \beta_i h_i(\mathbf{x}), \tag{1}$$

where $n$ is the number of hidden nodes, $h_i(\mathbf{x})$ is the output of the $i$-th hidden node, and $\beta_i$ is the output weight of the $i$-th hidden node. In this paper, we use a sigmoid function as the activation function. The output of the $i$-th hidden node is given by

$$h_i(\mathbf{x}) = \frac{1}{1 + \exp\{-(\mathbf{w}_i^{\mathrm{T}} \mathbf{x} + b_i)\}}, \tag{2}$$

where $\mathbf{w}_i$ and $b_i$ denote the input weight vector and input bias term of the $i$-th hidden node, respectively. It should be noticed that other activation functions [5], [19] are also applicable to our proposed algorithms.
In the ELM model, the values of $\mathbf{w}_i$ and $b_i$ are selected randomly. Under fault-free situations, the training set error is given by

$$E = \Big\| \mathbf{y} - \sum_{i=1}^{n} \beta_i \mathbf{h}_i \Big\|_2^2, \tag{3}$$

where $\mathbf{h}_i = [h_i(\mathbf{x}_1), \ldots, h_i(\mathbf{x}_N)]^{\mathrm{T}}$ is the collection of outputs of the $i$-th hidden node for all training samples, and $\mathbf{y} = [y_1, \ldots, y_N]^{\mathrm{T}}$ is the collection of the training outputs. In [45], based on (3), a batch mode ELM algorithm was proposed. By minimizing $E$, the optimal output weights of hidden nodes $\boldsymbol{\beta}^*$ can be obtained by

$$\boldsymbol{\beta}^* = (\mathbf{H}^{\mathrm{T}} \mathbf{H})^{-1} \mathbf{H}^{\mathrm{T}} \mathbf{y}, \tag{4}$$

where $\boldsymbol{\beta}^* = [\beta_1, \ldots, \beta_n]^{\mathrm{T}}$ is the collection of the output weights of hidden nodes, and $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_n]$ is the output matrix of the hidden layer. The total complexity of the batch mode ELM is listed in Table 1.

In [5], [19], two incremental training algorithms were developed, i.e., the I-ELM and the CI-ELM. For the I-ELM, a randomly generated hidden node is inserted into the network at each training iteration. Only the output weight of the newly inserted node is tuned, while the previously trained weights remain unchanged. The CI-ELM uses a similar training scheme, except that it uses a simple rule, stated in (26), to update the previously trained weights. Therefore, the computational complexities of these two algorithms are low. In Table 1, the total complexity of adding $n$ hidden nodes for the I-ELM is $O(n \times d \times N)$. The total complexity of adding $n$ hidden nodes for the CI-ELM is also listed in Table 1. The experimental results showed that both the I-ELM and CI-ELM perform well [5], [19] under fault-free situations. However, under faulty situations, the I-ELM and CI-ELM result in poor performance, as (3) does not take fault tolerance into account.
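The incremental scheme described above can be illustrated with a minimal sketch. It assumes the standard fault-free I-ELM rule, in which the output weight of the new node is the projection of the current residual onto the new hidden output vector. The function names, the uniform sampling range for the random parameters, and the toy data are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid_node(w, b):
    """Return h(x) = 1 / (1 + exp(-(w.x + b))) for fixed random w and b."""
    def h(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return h


def i_elm(X, y, n_nodes, rng):
    """Fault-free I-ELM sketch: insert random nodes one at a time."""
    d = len(X[0])
    residual = list(y)  # e_0 = y
    nodes, betas, sse_history = [], [], []
    for _ in range(n_nodes):
        # Random hidden-node parameters (the range [-1, 1] is an assumption).
        w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        b = rng.uniform(-1.0, 1.0)
        node = sigmoid_node(w, b)
        h = [node(x) for x in X]
        # Only the new output weight is trained: beta = e^T h / h^T h.
        beta = sum(e * hi for e, hi in zip(residual, h)) / sum(hi * hi for hi in h)
        residual = [e - beta * hi for e, hi in zip(residual, h)]
        nodes.append(node)
        betas.append(beta)
        sse_history.append(sum(e * e for e in residual))
    return nodes, betas, sse_history


rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0)] for _ in range(50)]
y = [0.5 + 0.4 * math.sin(3.0 * x[0]) for x in X]
nodes, betas, sse_history = i_elm(X, y, n_nodes=30, rng=rng)
```

Each iteration costs one pass over the $N$ samples and $d$ inputs, which is in line with the $O(n \times d \times N)$ total quoted for the I-ELM. The recorded training error never increases, because each step removes the component of the residual along the new hidden output vector.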

III. FAULTY SLFNS

A. NODE NOISE AND NODE FAULT
When we use finite precision technology [29], [30] to implement hidden nodes, the hidden node outputs may deviate from their original values. The deviations are usually proportional to the magnitude of their original output values [29], [30].

In addition, in analog circuits, the deviations from the original output values are usually specified in terms of percentage error. The deviations can be modelled as multiplicative noise [29], [46]. In the multiplicative noise model, the hidden node outputs are described as

$$\tilde{h}_i(\mathbf{x}) = (1 + \delta_i)\, h_i(\mathbf{x}), \tag{5}$$

where the $\delta_i$'s are the normalized noise factors. This paper assumes that the normalized noise factors are independent and identically distributed random variables. Their means are equal to zero, and their variances are equal to $\sigma^2$.

In some situations, physical fault may happen. For example, when a communication link between a hidden node and an output node is broken, the output signal of the hidden node cannot be transmitted to the output node. In this case, we use the open node fault model to describe the outputs [21], [28], [47]. The hidden node outputs are given by

$$\tilde{h}_i(\mathbf{x}) = \alpha_i\, h_i(\mathbf{x}), \tag{6}$$

where the $\alpha_i$'s are fault factors. They express whether the outputs of hidden nodes are tied to zero or not. When $\alpha_i = 0$, the output of the $i$-th node is tied to zero. Otherwise, the $i$-th hidden node operates correctly. This paper assumes that the fault factors $\alpha_i$ are independent and identically distributed binary random variables. The probability mass function of $\alpha_i$ is given by

$$\mathrm{Prob}(\alpha_i = 0) = p, \quad \text{and} \quad \mathrm{Prob}(\alpha_i = 1) = 1 - p. \tag{7}$$

When multiplicative node noise and open node fault coexist, the hidden node outputs are given by

$$\tilde{h}_i(\mathbf{x}) = \alpha_i (1 + \delta_i)\, h_i(\mathbf{x}). \tag{8}$$

In (8), once a hidden node is opened, its output is tied to zero, regardless of the multiplicative noise level. When a hidden node is not opened, its output is influenced by the multiplicative noise only.

From the statistical properties of $\alpha_i$ and $\delta_i$, we obtain the following statistical properties of the hidden node outputs:

$$\begin{aligned}
\langle \tilde{h}_i(\mathbf{x}) \rangle &= (1 - p)\, h_i(\mathbf{x}), \\
\langle \tilde{h}_i^2(\mathbf{x}) \rangle &= (1 - p)(1 + \sigma^2)\, h_i^2(\mathbf{x}), \\
\langle \tilde{h}_i(\mathbf{x})\, \tilde{h}_j(\mathbf{x}') \rangle &= (1 - p)^2\, h_i(\mathbf{x})\, h_j(\mathbf{x}'), \quad \forall\, i \neq j,
\end{aligned} \tag{9}$$

where $\langle \cdot \rangle$ is the expectation operator over $\alpha_i$ and $\delta_i$.
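A quick numerical check can make the coexisting fault and noise model more concrete. The sketch below assumes the faulty output takes the form $\alpha_i (1 + \delta_i) h_i$, with $\alpha_i = 0$ with probability $p$ and $\delta_i$ zero-mean with variance $\sigma^2$. Drawing $\delta_i$ from a Gaussian distribution is an assumption made here for illustration (only its mean and variance matter for the moments), and the function name and parameter values are hypothetical.

```python
import random


def faulty_output(h, p, sigma, rng):
    """Draw one faulty hidden-node output alpha * (1 + delta) * h."""
    alpha = 0.0 if rng.random() < p else 1.0  # open fault with probability p
    delta = rng.gauss(0.0, sigma)             # zero mean, variance sigma^2 (Gaussian assumed)
    return alpha * (1.0 + delta) * h


rng = random.Random(1)
h, p, sigma = 0.8, 0.1, 0.4
draws = [faulty_output(h, p, sigma, rng) for _ in range(200_000)]

# Empirical first and second moments of the faulty output.
mean_est = sum(draws) / len(draws)
second_est = sum(v * v for v in draws) / len(draws)

# Moments implied by the fault model.
mean_theory = (1.0 - p) * h
second_theory = (1.0 - p) * (1.0 + sigma ** 2) * h ** 2
```

With a large number of draws, the empirical moments should lie close to $(1 - p) h$ and $(1 - p)(1 + \sigma^2) h^2$, which are the quantities the fault tolerant objective is built from.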

B. TRAINING SET ERROR AND TRAINING OBJECTIVE
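For a faulty network, the training set error for a particular fault pattern is given by

$$\tilde{E} = \Big\| \mathbf{y} - \sum_{i=1}^{n} \beta_i \tilde{\mathbf{h}}_i \Big\|_2^2, \tag{10}$$

where $\tilde{\mathbf{h}}_i = \alpha_i (1 + \delta_i)\, \mathbf{h}_i$. From (9)-(10), the average training set MSE over all possible fault patterns can be expressed as

$$\bar{E} = \|\mathbf{y}\|_2^2 - 2(1 - p) \sum_{i=1}^{n} \beta_i\, \mathbf{y}^{\mathrm{T}} \mathbf{h}_i + (1 - p)^2 \sum_{i=1}^{n} \sum_{j \neq i} \beta_i \beta_j\, \mathbf{h}_i^{\mathrm{T}} \mathbf{h}_j + (1 - p)(1 + \sigma^2) \sum_{i=1}^{n} \beta_i^2\, \mathbf{h}_i^{\mathrm{T}} \mathbf{h}_i. \tag{11}$$

Defining

$$\mathbf{f} = \sum_{i=1}^{n} \beta_i \mathbf{h}_i \quad \text{and} \quad v = \sum_{i=1}^{n} \beta_i^2\, \mathbf{h}_i^{\mathrm{T}} \mathbf{h}_i, \tag{12}$$

we can rewrite (11) as

$$\bar{E} = p\, \|\mathbf{y}\|_2^2 + (1 - p) \left\{ \|\mathbf{y} - \mathbf{f}\|_2^2 - p\, \|\mathbf{f}\|_2^2 + (p + \sigma^2)\, v \right\}. \tag{13}$$

The expression stated in (13) gives us a direct way to compute the training set MSE of faulty SLFNs, which is lower bounded by zero. The advantage of using (13) is that we do not need to generate a large number of faulty networks. Since the term $p\, \|\mathbf{y}\|_2^2$ in (13) is not a function of the $\beta_i$'s and the term $(1 - p)$ is a constant, the training objective can be simplified to

$$L = \|\mathbf{y} - \mathbf{f}\|_2^2 - p\, \|\mathbf{f}\|_2^2 + (p + \sigma^2)\, v. \tag{14}$$

As the training objective in (14) is derived from the training set MSE in (13), the training objective is lower bounded by $-\frac{p}{1 - p} \|\mathbf{y}\|_2^2$. Based on (14), we can develop the fault tolerant incremental algorithms for SLFNs.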
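The claim that the average error over fault patterns can be computed without generating faulty networks can be checked on a small case. The sketch below assumes the closed form $p\|\mathbf{y}\|^2 + (1-p)\{\|\mathbf{y}-\mathbf{f}\|^2 - p\|\mathbf{f}\|^2 + (p+\sigma^2)v\}$, which follows from the stated moments of $\alpha_i(1+\delta_i)$, and compares it with an exhaustive enumeration. For the enumeration, $\delta_i$ is given a two-point distribution $\pm\sigma$, which is an assumption chosen only because it has zero mean and variance $\sigma^2$ and makes the check exact. All names and numbers are hypothetical.

```python
import itertools


def expected_faulty_sse(y, H, beta, p, sigma2):
    """Closed-form average error; H is a list of hidden output vectors h_i."""
    n_samples = len(y)
    f = [sum(b * h[l] for b, h in zip(beta, H)) for l in range(n_samples)]
    v = sum(b * b * sum(x * x for x in h) for b, h in zip(beta, H))
    y_sq = sum(t * t for t in y)
    e_sq = sum((t - fl) ** 2 for t, fl in zip(y, f))
    f_sq = sum(fl * fl for fl in f)
    objective = e_sq - p * f_sq + (p + sigma2) * v
    return p * y_sq + (1.0 - p) * objective


def brute_force_faulty_sse(y, H, beta, p, sigma2):
    """Exact average over all fault/noise patterns (two-point noise assumed)."""
    sigma = sigma2 ** 0.5
    # Per-node states: (gain alpha * (1 + delta), probability).
    states = [(0.0, p),
              (1.0 + sigma, (1.0 - p) / 2.0),
              (1.0 - sigma, (1.0 - p) / 2.0)]
    total = 0.0
    for combo in itertools.product(states, repeat=len(H)):
        prob = 1.0
        for _, q in combo:
            prob *= q
        sse = 0.0
        for l in range(len(y)):
            out = sum(b * g * h[l] for b, (g, _), h in zip(beta, combo, H))
            sse += (y[l] - out) ** 2
        total += prob * sse
    return total


y = [1.0, 0.5, -0.2]
H = [[0.9, 0.4, 0.1], [0.2, 0.7, 0.6]]
beta = [0.8, -0.3]
closed = expected_faulty_sse(y, H, beta, p=0.1, sigma2=0.04)
brute = brute_force_faulty_sse(y, H, beta, p=0.1, sigma2=0.04)
```

The enumeration grows as $3^n$ in the number of hidden nodes, whereas the closed form needs only the quantities $\mathbf{f}$ and $v$, which is what makes it usable as a training objective.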

IV. FAULT TOLERANT ELM ALGORITHMS
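The concept of incremental learning is that we add hidden nodes one-by-one to the network. For ease of description, we define some notations for the incremental learning, given by

$$\mathbf{f}_n = \sum_{i=1}^{n} \beta_i \mathbf{h}_i, \tag{15}$$

$$\mathbf{e}_n = \mathbf{y} - \mathbf{f}_n, \tag{16}$$

$$v_n = \sum_{i=1}^{n} \beta_i^2\, \mathbf{h}_i^{\mathrm{T}} \mathbf{h}_i. \tag{17}$$

The vector $\mathbf{f}_n$ represents the collection of the network outputs for all training samples when $n$ hidden nodes are used. The vector $\mathbf{e}_n$ represents the collection of the errors for all training samples when $n$ hidden nodes are used. The value $v_n$ represents the regularizer term. With (15), (16) and (17), for an SLFN with $n$ hidden nodes, the objective function is given by

$$L_n = \|\mathbf{e}_n\|_2^2 - p\, \|\mathbf{f}_n\|_2^2 + (p + \sigma^2)\, v_n. \tag{18}$$

In (18), the term $(p + \sigma^2)\, v_n - p\, \|\mathbf{f}_n\|_2^2$ is the effect of the node fault and node noise.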

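A. NODE FAULT TOLERANT I-ELM

1) ALGORITHM
In the NFTI-ELM, when a node is newly inserted to the network at the $n$-th iteration, we determine the output weight of the newly inserted node and keep all previously trained weights unchanged, so that

$$\mathbf{f}_n = \mathbf{f}_{n-1} + \beta_n \mathbf{h}_n, \quad \mathbf{e}_n = \mathbf{e}_{n-1} - \beta_n \mathbf{h}_n, \quad v_n = v_{n-1} + \beta_n^2\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n.$$

At the $n$-th iteration, the objective function, stated in (18), can be expressed as

$$L_n = \|\mathbf{e}_{n-1} - \beta_n \mathbf{h}_n\|_2^2 - p\, \|\mathbf{f}_{n-1} + \beta_n \mathbf{h}_n\|_2^2 + (p + \sigma^2)\left(v_{n-1} + \beta_n^2\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n\right).$$

The change of the objective values between two consecutive iterations is given by

$$\Delta_n = L_n - L_{n-1} = (1 + \sigma^2)\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n\, \beta_n^2 - 2\, (\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} \mathbf{h}_n\, \beta_n. \tag{22}$$

It should be noticed that $\Delta_n$ is a quadratic function of $\beta_n$ whose minimum value is non-positive. To maximize the reduction of the objective value, we consider the partial derivative of $\Delta_n$, given by

$$\frac{\partial \Delta_n}{\partial \beta_n} = 2 (1 + \sigma^2)\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n\, \beta_n - 2\, (\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} \mathbf{h}_n. \tag{23}$$

By setting $\partial \Delta_n / \partial \beta_n = 0$, we obtain

$$\beta_n = \frac{(\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} \mathbf{h}_n}{(1 + \sigma^2)\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n}. \tag{24}$$

2) CONVERGENCE AND COMPLEXITY
By substituting the optimal $\beta_n$ into (22), the change of the objective value $\Delta_n$ becomes

$$\Delta_n = -\frac{\left[(\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} \mathbf{h}_n\right]^2}{(1 + \sigma^2)\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n}. \tag{25}$$

5: Increment $n$ by 1.
6: Insert a new node to the SLFN. Its input bias $b_n$ and input weights $\mathbf{w}_n$ are randomly generated.
7: Compute the corresponding output vector $\mathbf{h}_n$ of this hidden node.
8: Compute the output weight of the newly inserted node: $\beta_n = \dfrac{(\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} \mathbf{h}_n}{(1 + \sigma^2)\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n}$.
9: $\mathbf{f}_n = \mathbf{f}_{n-1} + \beta_n \mathbf{h}_n$.
10: $\mathbf{e}_n = \mathbf{y} - \mathbf{f}_n$.
11: end while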
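The NFTI-ELM loop can be sketched in a few lines. The sketch assumes the fault tolerant objective $L_n = \|\mathbf{e}_n\|^2 - p\|\mathbf{f}_n\|^2 + (p+\sigma^2)v_n$ and the weight rule $\beta_n = (\mathbf{e}_{n-1} + p\,\mathbf{f}_{n-1})^{\mathrm{T}}\mathbf{h}_n / ((1+\sigma^2)\,\mathbf{h}_n^{\mathrm{T}}\mathbf{h}_n)$ obtained by minimizing the change of that objective with respect to the new weight only. The function names, the sampling range of the random node parameters, and the toy data are hypothetical.

```python
import math
import random


def hidden_output(X, w, b):
    """Sigmoid outputs of one random hidden node for all training inputs."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in X]


def objective(y, f, v, p, sigma2):
    """Assumed objective L = ||y - f||^2 - p ||f||^2 + (p + sigma^2) v."""
    e_sq = sum((t - fl) ** 2 for t, fl in zip(y, f))
    f_sq = sum(fl * fl for fl in f)
    return e_sq - p * f_sq + (p + sigma2) * v


def nfti_elm(X, y, n_max, p, sigma2, rng):
    """Insert n_max random nodes; earlier output weights are never changed."""
    d = len(X[0])
    f = [0.0] * len(y)  # f_0 = 0
    e = list(y)         # e_0 = y
    v = 0.0             # running sum of beta_i^2 * h_i^T h_i (for monitoring L)
    betas, history = [], [objective(y, f, v, p, sigma2)]
    for _ in range(n_max):
        w = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # range is an assumption
        b = rng.uniform(-1.0, 1.0)
        h = hidden_output(X, w, b)
        hh = sum(hl * hl for hl in h)
        # Fault tolerant output weight of the new node.
        num = sum((el + p * fl) * hl for el, fl, hl in zip(e, f, h))
        beta = num / ((1.0 + sigma2) * hh)
        f = [fl + beta * hl for fl, hl in zip(f, h)]
        e = [t - fl for t, fl in zip(y, f)]
        v += beta * beta * hh
        betas.append(beta)
        history.append(objective(y, f, v, p, sigma2))
    return betas, history


rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(40)]
y = [0.5 + 0.3 * math.sin(2.0 * a) * b for a, b in X]
betas, history = nfti_elm(X, y, n_max=50, p=0.05, sigma2=0.09, rng=rng)
```

Setting `p = 0` and `sigma2 = 0` reduces the weight rule to the fault-free I-ELM projection. The list `history` records the objective after each insertion and should be non-increasing, which mirrors the convergence argument.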

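B. NODE FAULT TOLERANT CI-ELM

1) ALGORITHM
Under the fault-free situation [19], when we are allowed to update the previously trained weights, the performance can be enhanced. However, as shown in Fig. 1(a)-(i), the original CI-ELM cannot handle the faulty situation. Hence, it is necessary to develop a fault tolerant version of the CI-ELM.

At the $n$-th iteration, after we determine the output weight $\beta_n$ of the newly inserted node, we update all previously trained weights by

$$\beta_i \leftarrow (1 - \beta_n)\, \beta_i \tag{26}$$

for $i = 1$ to $n - 1$. With this update scheme for the $\beta_i$'s, the recursive definitions for $\mathbf{f}_n$, $\mathbf{e}_n$ and $v_n$ become

$$\mathbf{f}_n = (1 - \beta_n)\, \mathbf{f}_{n-1} + \beta_n \mathbf{h}_n, \tag{27}$$

$$\mathbf{e}_n = \mathbf{y} - \mathbf{f}_n = (1 - \beta_n)\, \mathbf{e}_{n-1} + \beta_n (\mathbf{y} - \mathbf{h}_n), \tag{28}$$

$$v_n = (1 - \beta_n)^2\, v_{n-1} + \beta_n^2\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n, \tag{29}$$

where $\mathbf{f}_0 = \mathbf{0}$, $\mathbf{e}_0 = \mathbf{y}$ and $v_0 = 0$. From (26)-(29), the objective value at the $n$-th iteration can be expressed as

$$L_n = \|\mathbf{e}_{n-1} - \beta_n (\mathbf{h}_n - \mathbf{f}_{n-1})\|_2^2 - p\, \|\mathbf{f}_{n-1} + \beta_n (\mathbf{h}_n - \mathbf{f}_{n-1})\|_2^2 + (p + \sigma^2) \left[ (1 - \beta_n)^2\, v_{n-1} + \beta_n^2\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n \right]. \tag{30}$$

The change of the objective value between two consecutive iterations is then given by

$$\Delta_n = L_n - L_{n-1} = A_n\, \beta_n^2 - 2 B_n\, \beta_n, \tag{32}$$

where

$$\begin{aligned}
A_n &= (1 - p)\, \|\mathbf{h}_n - \mathbf{f}_{n-1}\|_2^2 + (p + \sigma^2) \left( v_{n-1} + \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n \right), \\
B_n &= (\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} (\mathbf{h}_n - \mathbf{f}_{n-1}) + (p + \sigma^2)\, v_{n-1}.
\end{aligned} \tag{33}$$

Again, for the NFTCI-ELM, $\Delta_n$ is a quadratic function of $\beta_n$ whose minimum value is non-positive. To maximize the reduction of the objective value, we should set

$$\beta_n = \frac{B_n}{A_n}. \tag{34}$$

Algorithm 2 NFTCI-ELM
1: Set $n$ equal to zero, $n = 0$.
2: Set the initial residue error to $\mathbf{y}$, $\mathbf{e}_0 = \mathbf{y}$.
3: Set the initial network output to a zero vector, $\mathbf{f}_0 = \mathbf{0}$.
4: Set the initial regularizer term to zero, $v_0 = 0$.
5: while $n \leq n_{\max}$ do
6: Increment $n$ by 1.
7: Insert a new hidden node to the SLFN. Its input bias $b_n$ and input weights $\mathbf{w}_n$ are randomly generated.
8: Compute the corresponding output vector $\mathbf{h}_n$ of this hidden node.
9: Compute $A_n$ and $B_n$ from (33).
10: Compute the new weight: $\beta_n = B_n / A_n$.
11: $\mathbf{f}_n = (1 - \beta_n)\, \mathbf{f}_{n-1} + \beta_n \mathbf{h}_n$.
12: $\mathbf{e}_n = \mathbf{y} - \mathbf{f}_n$.
13: $v_n = (1 - \beta_n)^2\, v_{n-1} + \beta_n^2\, \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n$.
14: Update the previously trained weights: $\beta_i \leftarrow (1 - \beta_n)\, \beta_i$ for $i = 1$ to $n - 1$.
15: end while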

2) CONVERGENCE AND COMPLEXITY
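By substituting the optimal $\beta_n$ into (32), the change of the objective value $\Delta_n$ becomes

$$\Delta_n = -\frac{\left[ (\mathbf{e}_{n-1} + p\, \mathbf{f}_{n-1})^{\mathrm{T}} (\mathbf{h}_n - \mathbf{f}_{n-1}) + (p + \sigma^2)\, v_{n-1} \right]^2}{(1 - p)\, \|\mathbf{h}_n - \mathbf{f}_{n-1}\|_2^2 + (p + \sigma^2) \left( v_{n-1} + \mathbf{h}_n^{\mathrm{T}} \mathbf{h}_n \right)}.$$

This change is non-positive, which means $L_n \leq L_{n-1}$. As the training set MSE (13) is lower bounded by zero, the objective value is also lower bounded. Since the training objective $L$ is non-increasing and lower bounded, the proposed NFTCI-ELM converges in terms of $L$.

The procedures of the NFTCI-ELM are summarized in Algorithm 2. At the $n$-th iteration, the computational complexity is $O(d \times N) + O(n)$. Hence, the total complexity of adding $n$ hidden nodes is given by $O(n \times d \times N) + O(n^2)$, as shown in Table 1. Compared with the NFTI-ELM, the update of the $\beta_i$'s in the NFTCI-ELM increases the computational complexity.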
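A compact sketch of the NFTCI-ELM loop is given next. It assumes the objective $L_n = \|\mathbf{e}_n\|^2 - p\|\mathbf{f}_n\|^2 + (p+\sigma^2)v_n$ together with the convex update, in which all previous weights are scaled by $(1 - \beta_n)$. Under these assumptions, the change of the objective is a quadratic $A\beta_n^2 - 2B\beta_n$, and the sketch uses $\beta_n = B/A$. The names `A`, `B`, `nftci_elm`, the sampling range of the node parameters, and the toy data are hypothetical.

```python
import math
import random


def hidden_output(X, w, b):
    """Sigmoid outputs of one random hidden node for all training inputs."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in X]


def nftci_elm(X, y, n_max, p, sigma2, rng):
    """Insert n_max random nodes and rescale earlier weights at each step."""
    d = len(X[0])
    f = [0.0] * len(y)  # f_0 = 0
    e = list(y)         # e_0 = y
    v = 0.0             # v_0 = 0
    betas, H = [], []
    history = [sum(t * t for t in y)]  # L_0 = ||y||^2
    for _ in range(n_max):
        w = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # range is an assumption
        b = rng.uniform(-1.0, 1.0)
        h = hidden_output(X, w, b)
        hh = sum(hl * hl for hl in h)
        g = [hl - fl for hl, fl in zip(h, f)]  # h_n - f_{n-1}
        # Coefficients of the quadratic change A * beta^2 - 2 * B * beta.
        A = (1.0 - p) * sum(gl * gl for gl in g) + (p + sigma2) * (v + hh)
        B = sum((el + p * fl) * gl for el, fl, gl in zip(e, f, g)) + (p + sigma2) * v
        beta = B / A
        # Convex update: scale the old weights, then append the new one (O(n)).
        betas = [(1.0 - beta) * bi for bi in betas] + [beta]
        H.append(h)
        f = [(1.0 - beta) * fl + beta * hl for fl, hl in zip(f, h)]
        e = [t - fl for t, fl in zip(y, f)]
        v = (1.0 - beta) ** 2 * v + beta ** 2 * hh
        e_sq = sum(el * el for el in e)
        f_sq = sum(fl * fl for fl in f)
        history.append(e_sq - p * f_sq + (p + sigma2) * v)
    return betas, H, f, v, history


rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(40)]
y = [0.5 + 0.3 * math.sin(2.0 * a) * b for a, b in X]
betas, H, f, v, history = nftci_elm(X, y, n_max=50, p=0.05, sigma2=0.09, rng=rng)
```

The rescaling of the stored weights is the $O(n)$ term in the per-iteration cost, while computing `h`, `A` and `B` accounts for the $O(d \times N)$ term. Because $\mathbf{f}_n$ and $v_n$ are updated recursively, the network output never has to be recomputed from all hidden nodes.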

V. SIMULATION RESULTS
The performance of the proposed NFTI-ELM and NFTCI-ELM is verified by comparing them against other ELM algorithms, i.e., the I-ELM, the CI-ELM and the batch mode ELM, under faulty situations. For the batch mode ELM, the output weights of hidden nodes $\boldsymbol{\beta}$ are obtained by the least squares solution of the SLFN, i.e., $\boldsymbol{\beta} = (\mathbf{H}^{\mathrm{T}} \mathbf{H})^{-1} \mathbf{H}^{\mathrm{T}} \mathbf{y}$. It gives us a baseline for comparison since, given a set of hidden nodes, it provides the best solution under fault-free situations. For the simulation, ten well-known benchmark regression datasets are used.

A. DATASETS AND SETTINGS
Ten commonly used regression datasets are selected from the UCI and KEEL dataset repositories [48]-[57]. Table 2 summarizes the properties of the ten datasets. Abalone [49] aims at predicting the age of abalones by using 8 features as input. Concrete [50] aims at predicting the concrete compressive strength by using 8 features as input. Boston Housing [51] aims at predicting the housing value in Boston by using 13 features as input. Wine Quality [52] aims at predicting the quality of wine with a score between 0 and 10 by using 11 features as input. Airfoil Self-Noise (ASN) [53] aims at predicting the sound pressure level in an anechoic wind tunnel by using 5 features as input. Auto MPG [54] aims at predicting the fuel consumption of cars in miles per gallon by using 7 features as input. Mortgage [55] aims at predicting the 30-Year Conventional Mortgage Rate in the USA by using 15 features as input. Weather Ankara [57] aims at predicting the mean temperature in Ankara by using 9 features as input. Parkinsons Telemonitoring [56] aims at predicting the clinician's Parkinson's disease symptom score on the UPDRS scale by using 20 features as input. Computer Activity aims at predicting the portion of time that CPUs run in user mode by using 21 features as input.

We adopt the validation method mentioned in [5] with these 10 datasets. For each dataset, the samples are randomly split into the training and test sets for 20 trials; hence, we have 20 partitions. For each partition, we run the simulation with 20 sets of random hidden nodes. Therefore, the total number of trials is 400. The training inputs and outputs are normalized to the range of [−1, 1] and [0, 1], respectively.
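The data preparation described above can be sketched as follows. The sketch assumes plain min-max scaling for the normalization and a simple shuffled split for each partition. The helper names, the handling of constant features, and the split ratio are hypothetical, since the text does not specify them.

```python
import random


def min_max_scale(values, low, high):
    """Linearly map values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0.0:
        # Constant feature: mapping to the midpoint is an assumption.
        return [0.5 * (low + high)] * len(values)
    return [low + (high - low) * (v - v_min) / span for v in values]


def random_partition(n_samples, train_fraction, rng):
    """Return shuffled index lists for one training/test partition."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


rng = random.Random(0)
scaled_input = min_max_scale([0.0, 5.0, 10.0], -1.0, 1.0)  # inputs to [-1, 1]
scaled_output = min_max_scale([2.0, 4.0, 6.0], 0.0, 1.0)   # outputs to [0, 1]
# 20 partitions; the 0.5 training fraction is a placeholder value.
partitions = [random_partition(100, 0.5, rng) for _ in range(20)]
```

Running 20 sets of random hidden nodes on each of the 20 partitions gives the 400 trials per dataset and fault level.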

B. TEST SET MSE VERSUS NUMBER OF HIDDEN NODES
We demonstrate how the test set MSE changes with respect to the number of hidden nodes using three datasets. The datasets are Abalone [49], Concrete [50], and Boston Housing [51]. We consider three different fault levels. They are $\{p = 0.01, \sigma^2 = 0.04\}$, $\{p = 0.05, \sigma^2 = 0.09\}$ and $\{p = 0.1, \sigma^2 = 0.16\}$. Fig. 1 shows the test set MSE versus the number of nodes under faulty situations. The test set MSE values of the CI-ELM and batch mode ELM are much higher than those of the other three algorithms. In other words, the fault tolerant ability of the CI-ELM and batch mode ELM is very weak. For the I-ELM, NFTI-ELM and NFTCI-ELM, when the number of hidden nodes is more than 500, the decrease of the test set MSE slows down and the MSE reaches a plateau.

As shown in Fig. 1(c),(f),(i), the improvements of the NFTI-ELM and NFTCI-ELM are more significant under high fault levels. For instance, in the Abalone dataset, when the fault level $\{p = 0.01, \sigma^2 = 0.04\}$ and 500 hidden nodes are considered, the test set MSE values of the CI-ELM and the batch mode ELM are very large. When we use the I-ELM, the MSE value is equal to 0.01480. For the NFTI-ELM, the MSE value is equal to 0.01437. Compared to the I-ELM, the improvement of the NFTI-ELM is 0.00043. For the NFTCI-ELM, the MSE value is equal to 0.00822. Compared to the I-ELM, the improvement of the NFTCI-ELM is 0.00658.

When the fault level rises to $\{p = 0.1, \sigma^2 = 0.16\}$, the MSE value of the I-ELM is equal to 0.03577. For the NFTI-ELM, the MSE value is reduced to 0.02908. Compared to the I-ELM, the improvement of the NFTI-ELM is 0.00669. For the NFTCI-ELM, the MSE value is further reduced to 0.00911. Compared to the I-ELM, the improvement of the NFTCI-ELM is 0.02666.

C. COMPARISON WITH THE I-ELM, CI-ELM AND BATCH MODE ELM
We compare the performance of our NFTI-ELM and NFTCI-ELM with the I-ELM, CI-ELM and batch mode ELM in terms of the test set MSE using 500 hidden nodes. We consider three different fault levels. They are $\{p = 0.01, \sigma^2 = 0.04\}$, $\{p = 0.05, \sigma^2 = 0.09\}$ and $\{p = 0.1, \sigma^2 = 0.16\}$. As mentioned, we partition the dataset into the training set and test set 20 times. For each partition, we generate 20 sets of random hidden nodes. Hence, we run the simulation 400 times for each dataset and each fault level. Table 3 shows the average test set MSE values of the algorithms over the 400 trials under the three fault levels. The results indicate that the performance of our NFTI-ELM and NFTCI-ELM is much better than that of the other algorithms. For instance, in the Abalone dataset with the fault level $\{p = 0.05, \sigma^2 = 0.09\}$, the MSE values of the batch mode ELM, I-ELM and CI-ELM are 0.65088, 0.02291 and 0.16337, respectively. On the other hand, the MSE values of our NFTI-ELM and NFTCI-ELM are 0.02063 and 0.00852, respectively. The NFTCI-ELM has the best performance, with the lowest MSE values among all the algorithms.

Moreover, compared to the other algorithms, the NFTCI-ELM is relatively insensitive to the fault level. For instance, in the Abalone dataset, when the fault level is equal to $\{p = 0.01, \sigma^2 = 0.04\}$, the MSE value of the NFTI-ELM is 0.01336 from Table 3. When the fault level is equal to $\{p = 0.1, \sigma^2 = 0.16\}$, the MSE value of the NFTI-ELM increases to 0.02806. For the NFTCI-ELM, when the fault level is equal to $\{p = 0.01, \sigma^2 = 0.04\}$, the MSE value is 0.00795. Even when we greatly increase the fault level to $\{p = 0.1, \sigma^2 = 0.16\}$, the MSE value increases only slightly, to 0.00887. This phenomenon also happens in the other datasets.

D. STATISTICAL TEST ANALYSIS
To verify the superiority of our algorithms, we run a statistical test, i.e., the paired t-test. The paired t-test shows whether the improvements of our algorithms are statistically significant or not. As the fault tolerant performance of the batch mode ELM and CI-ELM is poor, we only run the t-test for the NFTI-ELM against the I-ELM and for the NFTCI-ELM against the I-ELM, reported in Table 4 and Table 5, respectively. For the paired t-test with 400 trials and 95% confidence level, the critical t-value is 1.6486.

In Table 4, all obtained t-values are far beyond the critical t-value, i.e., 1.6486. Also, all confidence intervals of the improvements exclude zero. For instance, in the Abalone dataset with the fault level $\{p = 0.01, \sigma^2 = 0.04\}$, the obtained t-value is 346.10792, and the confidence interval is [0.000378, 0.000382]. Similarly, all obtained t-values are much greater than the critical t-value in Table 5. Both paired t-tests conclude that the improvements of our NFTI-ELM and NFTCI-ELM are statistically significant.
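The paired t-test used here can be sketched with the standard library. The function below computes the usual paired t-statistic, i.e., the mean of the per-trial differences divided by its standard error. The function name and the three example values are hypothetical and are not taken from the tables.

```python
import math


def paired_t_statistic(baseline, proposed):
    """Return (t, mean difference, standard error) for paired samples."""
    diffs = [a - b for a, b in zip(baseline, proposed)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    std_err = math.sqrt(var_diff / n)
    return mean_diff / std_err, mean_diff, std_err


# Hypothetical per-trial MSE values for a baseline and a proposed algorithm.
baseline_mse = [2.0, 4.0, 6.0]
proposed_mse = [1.0, 2.0, 3.0]
t_value, mean_diff, std_err = paired_t_statistic(baseline_mse, proposed_mse)
```

In the setting of this section, each list would hold the 400 per-trial test set MSE values of one algorithm, and the improvement is judged significant when the t-value exceeds the critical value of 1.6486. A confidence interval for the mean improvement can be formed as `mean_diff` plus or minus a critical value times `std_err`.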

E. INCONSISTENCE IN FAULT LEVEL BETWEEN OPERATION AND TRAINING
In our proposed algorithms, we assume that the fault level $\{p, \sigma^2\}$ used for training is the true fault level during operation. However, this true fault level may be unknown in some cases. In other words, the fault level used for training may differ from the true fault level during operation. Hence, we investigate the situation in which we use an incorrect fault level for training. Fig. 2 and Fig. 3 illustrate the impact of using an incorrect fault level on the performance of our algorithms. Given the true fault level $\{p = 0.05, \sigma^2 = 0.09\}$, the performance of the NFTI-ELM and NFTCI-ELM with different training fault levels, i.e., $\{p = 0.01, \sigma^2 = 0.04\}$, $\{p = 0.05, \sigma^2 = 0.09\}$ and $\{p = 0.1, \sigma^2 = 0.16\}$, is shown in Fig. 2.

Fig. 2(a)-(c) shows the performance of the NFTI-ELM. Even when an incorrect fault level is used for training, the performance of the NFTI-ELM is still better than that of the I-ELM. For instance, in the Abalone dataset, the MSE values of the NFTI-ELM with three training fault levels, i.e., $\{p = 0.01, \sigma^2 = 0.04\}$, $\{p = 0.05, \sigma^2 = 0.09\}$ and $\{p = 0.1, \sigma^2 = 0.16\}$, are 0.02171, 0.02045 and 0.01963, respectively. These MSE values are all smaller than the MSE value of the I-ELM, i.e., 0.02286.

Under the fault-free situation, in the Abalone dataset, the MSE values of the NFTI-ELM with the training fault levels $\{p = 0.01, \sigma^2 = 0.04\}$ and $\{p = 0.05, \sigma^2 = 0.09\}$ are 0.007992 and 0.008222, respectively. These MSE values are similar to that of the I-ELM, i.e., 0.007977. Meanwhile, the NFTCI-ELM shows a similar phenomenon, as shown in Fig. 3(d)-(f). Therefore, our NFTI-ELM and NFTCI-ELM still perform well even though the true fault level is unknown and an incorrect fault level is used for training.

F. DISTRIBUTION ANALYSIS OF β
The CI-ELM performs poorly under faulty situations. We believe that this behaviour is related to the distribution of the trained output weights $\beta$'s. When the weights' magnitudes are large, the network output is very sensitive to weight perturbation, i.e., fault and noise. This results in the poor performance of the CI-ELM under faulty situations. Fig. 4(a)-(m) shows the histograms of the output weights' magnitudes, i.e., the normalized frequencies versus the weight magnitudes. The results from the three datasets with the fault level $\{p = 0.1, \sigma^2 = 0.16\}$ are displayed. In Fig. 4(h)-(j), the CI-ELM contains many weights with large magnitudes. Thus, the trained networks of the CI-ELM are sensitive to fault and noise. This behaviour results in poor fault tolerant performance. On the other hand, the NFTCI-ELM has few weights with large magnitudes, as shown in Fig. 4(k)-(m). Hence, the trained networks of the NFTCI-ELM are insensitive to fault and noise.

VI. CONCLUSION
In this paper, we propose two node fault tolerant incremental algorithms for SLFNs, namely the NFTI-ELM and NFTCI-ELM. The two proposed algorithms aim to maximize the reduction of the training set MSE of faulty networks between two incremental steps.

For the NFTI-ELM, we train the output weight of the newly inserted node, whereas the weights of the other nodes remain unchanged. The simulation results show that the fault tolerant performance of the NFTI-ELM is better than that of other ELM algorithms, including the I-ELM, CI-ELM and batch mode ELM. In order to boost the performance, the NFTCI-ELM is then developed. The idea is to train the output weight of the newly inserted node, and to update the previously trained weights. The simulation results show that the performance of the NFTCI-ELM is better than that of the NFTI-ELM.

Meanwhile, the statistical test confirms that our NFTI-ELM and NFTCI-ELM achieve a statistically significant improvement under faulty situations. Since the results show that the NFTCI-ELM is the best among all the algorithms, one may argue that we do not need to consider the NFTI-ELM. However, the computational complexity of the NFTI-ELM is lower than that of the NFTCI-ELM. For those concerned with training speed, the NFTI-ELM might be another choice. In general, we believe that the NFTI-ELM and NFTCI-ELM have the potential to be applied to the hardware implementation of neural networks in the future.

This paper focuses on regression problems. Our fault tolerant incremental algorithms, as well as the original I-ELM and CI-ELM, aim at maximizing the reduction in MSE between two consecutive incremental steps. As classification problems focus on the recognition error, it may not be appropriate to directly extend our algorithms to classification problems. Hence, one of our future works is to develop fault tolerant incremental algorithms for classification problems.

Algorithm 1 summarizes the proposed NFTI-ELM. It can be seen that the computational complexity for Steps 7-10 at each iteration is $O(d \times N)$. Hence, the total complexity for adding $n$ nodes is equal to $O(n \times d \times N)$, as shown in Table 1. It should be noticed that the computational complexity of the I-ELM at each iteration is equal to $O(d \times N)$ too.

Algorithm 1 NFTI-ELM
1: Set $n$ equal to zero, $n = 0$.
2: Set the initial residue error to $\mathbf{y}$, $\mathbf{e}_0 = \mathbf{y}$.
3: Set the initial network output to a zero vector, $\mathbf{f}_0 = \mathbf{0}$.
4: while $n \leq n_{\max}$ do

FIGURE 1. The performance of the algorithms versus the number of additive nodes. The multiplicative noise level and the open fault probability are $\sigma^2$ and $p$. The datasets are Abalone, Concrete, and Boston Housing.

FIGURE 4. The histograms of the $\beta$'s, i.e., normalized frequency versus weight magnitude. The fault levels are $p = 0.1$ and $\sigma^2 = 0.16$. The datasets are Abalone, Concrete, Boston Housing, Wine Quality and ASN.

Fig. 2(d)-(f) shows the performance of the NFTCI-ELM. Similarly, the performance of the NFTCI-ELM is much better than that of the CI-ELM when an incorrect fault level is used for training. For instance, in the Abalone dataset, the MSE values of the NFTCI-ELM with three training fault levels, i.e., $\{p = 0.01, \sigma^2 = 0.04\}$, $\{p = 0.05, \sigma^2 = 0.09\}$ and $\{p = 0.1, \sigma^2 = 0.16\}$, are 0.008093, 0.007507 and 0.007793, respectively. Again, these MSE values are much smaller than the MSE value of the CI-ELM, i.e., 0.179. Also, the NFTCI-ELM achieves the smallest MSE value, i.e., 0.007507, when the correct training fault level $\{p = 0.05, \sigma^2 = 0.09\}$ is used. This phenomenon also happens in the Concrete and Boston Housing datasets.

Moreover, given the fault-free situation, i.e., the true fault level $\{p = 0, \sigma^2 = 0\}$, the performance of the NFTI-ELM and NFTCI-ELM with different training fault levels is shown in Fig. 3. Fig. 3(a)-(c) shows the performance of the NFTI-ELM. Even when the NFTI-ELM is used in the fault-free situation, it performs as well as the I-ELM. For instance, in the Abalone dataset, the MSE values of the NFTI-ELM obtained with different training fault levels are similar to that of the I-ELM.

CHI SING LEUNG (M'05-SM'15) received the Ph.D. degree in computer science from The Chinese University of Hong Kong, in 1995. He is currently a Professor with the Department of Electronic Engineering, City University of Hong Kong. He has published over 120 journal articles in the areas of digital signal processing, neural networks, and computer graphics. His research interests include neural computing and computer graphics. He was a member of the Organizing Committee of ICONIP2006. He is a governing board member and the Vice President of the Asian Pacific Neural Network Assembly (APNNA). In 2005, he received the 2005 IEEE Transactions on Multimedia Prize Paper Award for his article titled The Plenoptic Illumination Function. He was the Program Chair of ICONIP2009 and ICONIP2012. He is/was a Guest Editor of several journals, including Neural Computing and Applications, Neurocomputing, and Neural Processing Letters.

ERIC WING MING WONG (S'87-M'90-SM'00) received the B.Sc. and M.Phil. degrees in electronic engineering from the Chinese University of Hong Kong, Hong Kong, in 1988 and 1990, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Massachusetts, Amherst, MA, USA, in 1994. He is currently an Associate Professor with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His research interests include analysis and design of telecommunications and computer networks, energy-efficient data center design, green cellular networks, and optical networking.

VOLUME 7, 2019
The change $\Delta_n$ in (25) is non-positive, which means $L_n \leq L_{n-1}$. In other words, when a hidden node is inserted to the SLFN, the sequence of objective values $\{L_1, L_2, \ldots, L_n\}$ is non-increasing. As mentioned, the training set MSE (13) is lower bounded by zero, and the objective value is derived from the training set MSE. Hence, the objective value is lower bounded too. As the training objective $L$ is non-increasing and lower bounded, our proposed NFTI-ELM converges in terms of $L$.

TABLE 2. Properties of the ten data sets.

TABLE 4. Paired t-test between I-ELM and NFTI-ELM. The number of trials is 400.

TABLE 5. Paired t-test between I-ELM and NFTCI-ELM. The number of trials is 400.