Flatness Prediction of Cold Rolled Strip Based on EM-TELM

Flatness of cold rolled strip is an extremely important quality indicator, and flatness control is a key technology of the modern high-accuracy rolling mill. An efficient and accurate flatness prediction model helps to improve flatness accuracy and to realize effective flatness control. Inspired by the error-minimization principle, the paper proposes an error minimized extreme learning machine with two hidden layers (EM-TELM), which automatically determines the optimal number of hidden nodes and is applied to establish a flatness prediction model for cold rolled strip. EM-TELM uses block matrices to solve the output matrix of the second hidden layer, and it adds one hidden node or a group of hidden nodes to the current network at a time. As the network structure grows, the weight matrix connecting the hidden layer and the output layer is updated incrementally. Since EM-TELM does not require an analytic process model, it can be used for prediction problems in complex systems that are difficult to model. The experimental results indicate that the accuracy of EM-TELM is higher than that of EM-ELM, and that EM-TELM reduces the computational complexity and the training time compared with TELM, which recalculates the parameters between the hidden layers whenever the network structure changes.


I. INTRODUCTION
As the most important steel products in the world, plate and strip are among the most widely used rolled products and serve all aspects of the national economy, such as food packaging, household appliances, precision instruments, automobile manufacturing, aviation, shipbuilding, civil construction, and other industries. They play an important role in national defense modernization and economic construction, and their production level is an important indicator of the development level of a country's steel industry. With the rapid development of the world's economies, the demand for strip keeps increasing; at the same time, the rapid development of the iron and steel industry has made competition in the strip market ever fiercer. Driven by the market, customers' requirements on the quality, type, and performance of strip products have gradually risen. To meet these requirements and improve the competitiveness of enterprises, transforming and upgrading the equipment and technology of each strip rolling production line and increasing investment in new technologies and processes have become important tasks. Therefore, strip rolling technology can move along a path of rapid development toward high precision, high speed, and automation [1]-[3]. (The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso.)
Flatness [4], [5] refers to the degree of buckling of the strip and includes dimensional indexes in both the transverse and the longitudinal direction. The transverse aspect is the sectional flatness (the thickness distribution across the width of the strip), that is, the strip crown. The longitudinal aspect is the flatness along the length of the strip, that is, the straightness, commonly known as the wave shape. Flatness control is the core of the strip cold rolling production process. During rolling deformation, the setting and calculation of flatness are closely related to the rolling force and the roll bending force. By studying the factors influencing the exit flatness and building a model from the existing data, the exit flatness can be controlled effectively. With the progress of computer application technology, the strip production process has been equipped with complete sensor measurement devices, which acquire a large amount of process data online, such as measured bending forces, rolling forces, and tensions. These process data contain useful information about the running state of the production process, which can be used to predict the final flatness quality. However, for lack of effective data processing and information extraction methods, traditional flatness prediction methods have not made effective use of this large amount of readily available measurement data. In recent years, big data and machine learning technologies have emerged, and in many fields such as agriculture, medicine, science, and industry [6], neural networks modeled on large amounts of data have achieved high accuracy. Neural networks can likewise be introduced into the steel industry, where modeling on existing data achieves the desired prediction effect and supports decision-making.
Therefore, the paper combines a large amount of flatness data from the cold rolling process with neural network technology, explores the influence of the work-roll and intermediate-roll bending forces on the final exit flatness during cold rolling, and establishes a prediction model that predicts the flatness effectively. A neural network model built from a large amount of data encodes the whole process in its hidden units: it learns autonomously from the data and captures much hidden, complex knowledge and many patterns. At the same time, the value of the large amount of data accumulated on the production line is exploited, so that data drive production. Forecasting before the production process starts makes it possible to adjust the value of each control variable in advance according to the target flatness; that is, by controlling and changing the inputs, the output flatness can approach or reach the target flatness, which also reduces adjustments during production and saves costs. In recent years, many scholars have established flatness prediction models based on intelligent methods. However, in actual tests it has been found that the traditional back propagation (BP) network flatness prediction model has a long training process and easily falls into local minima, while the radial basis function (RBF) flatness prediction model often fails when the data are insufficient.
Extreme learning machine (ELM) [7] was proposed by Huang et al. for single hidden layer feedforward neural networks to overcome the disadvantages of gradient-based algorithms. ELM randomly generates the connection weight matrix between the input layer and the hidden layer and the bias vector of the hidden layer, and no adjustment of them is needed during training [8], [9]. ELM has been favored by many scholars for its fast learning speed, good generalization performance, and other advantages [10]-[12], and it has been applied to the rolling field in recent years. Wang et al. [13] applied it to rolling force prediction for hot rolled sheets, and the experimental results show that, compared with traditional modeling methods such as BP and RBF, ELM significantly improves the modeling accuracy and the generalization ability of the model. Li et al. [14] applied ELM to flatness prediction, and the results show that ELM has higher prediction accuracy regardless of the sample size and also avoids the tendency of traditional artificial neural networks to fall into local optima. Although ELM has shown superior performance in many respects, how to determine the number of hidden nodes and further improve the prediction accuracy of the model remains an urgent problem. Feng et al. [15] proposed the error minimized extreme learning machine (EM-ELM) to determine the number of hidden nodes dynamically and update the output weights incrementally; extensive simulations show that the algorithm reduces the computational complexity of ELM. To further improve the prediction accuracy, Qu et al. [16] proposed the two-hidden-layer extreme learning machine (TELM) and Xiao et al. [17] proposed the multiple hidden layers extreme learning machine (MELM). The experimental results show that the average accuracy and generalization performance of TELM and MELM are greatly improved compared with ELM.
Based on the above, the paper establishes a cold rolling flatness prediction model based on the error minimized extreme learning machine with two hidden layers (EM-TELM). EM-TELM uses the block matrices to solve the output matrix of the second hidden layer, which differs from TELM's use of the generalized inverse for this step. Moreover, EM-TELM has one more hidden layer than EM-ELM, which improves the calculation accuracy. EM-TELM adds hidden nodes one by one or group by group while keeping the structural parameters of the original hidden nodes unchanged, and it incrementally updates the connection weights between the first and the second hidden layer and the bias vector of the second hidden layer. Finally, EM-TELM is validated with strip steel production data from the cold rolling mill. The experimental results show that EM-TELM predicts the flatness more accurately than EM-ELM, and that EM-TELM reduces the computational complexity and the training time compared with TELM.

II. BRIEF REVIEW OF ELM AND TELM

A. ELM
Extreme learning machine (ELM) randomly generates the weights between the input layer and the hidden layer and the biases of the hidden nodes, so only the number of hidden nodes needs to be set during training to obtain the unique optimal solution. The advantage of ELM is that it improves the generalization performance of the network while avoiding time-consuming iterative training and local minima.
VOLUME 9, 2021
Suppose there are N independent samples (x_i, t_i) (i = 1, 2, ..., N) consisting of the inputs X = [x_1, x_2, ..., x_N]^T and the expected outputs T = [t_1, t_2, ..., t_N]^T, where t_i = [t_i1, t_i2, ..., t_im]^T ∈ R^m. (The transpose of a vector or matrix is denoted by the superscript T throughout the paper.) Assume the number of hidden nodes is L and the activation function of the hidden layer is g(x). ELM randomly selects the weight matrix W = [W_1, W_2, ..., W_L]^T ∈ R^{L×n} connecting the input layer and the hidden layer and the bias matrix B ∈ R^{L×N} of the hidden nodes (the bias vector [b_1, ..., b_L]^T repeated for each of the N samples). Once W and B are determined, their values are not changed in the training stage. The next step transforms the nonlinear system into the linear system

Hβ = T, (1)

where β = [β_1, β_2, ..., β_L]^T ∈ R^{L×m} is the weight matrix connecting the hidden layer and the output layer, whose row β_j = [β_j1, β_j2, ..., β_jm]^T (j = 1, 2, ..., L) represents the connection weights between the jth hidden node and the m output nodes, and H = g(WX + B) ∈ R^{N×L} is the output matrix of the hidden layer,

H = [h_ij]_{N×L}, h_ij = g(W_j x_i + b_j) (i = 1, 2, ..., N; j = 1, 2, ..., L), (2)

where h_ij represents the output of the jth hidden node for x_i, W_j = [W_j1, W_j2, ..., W_jn]^T is the weight vector connecting the n input nodes and the jth hidden node, b_j is the bias of the jth hidden node, and W_j x_i denotes the inner product of W_j and x_i.
Then ELM uses the least-squares method to obtain the output matrix β:

β = H^+ T, (3)

where H^+ is the Moore-Penrose generalized inverse [18] of the matrix H, which can be calculated by the orthogonal projection method; in other words, H^+ = (H^T H)^{-1} H^T when H has full column rank.
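To make formulas (1)-(3) concrete, the following minimal NumPy sketch (our own illustration, not the authors' code; the sine-regression data are purely synthetic) trains a basic ELM by drawing W and B at random and solving Hβ = T with the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L):
    """Basic ELM: random input weights/biases, least-squares output weights."""
    n = X.shape[1]
    W = rng.standard_normal((L, n))   # input-to-hidden weights, fixed after init
    b = rng.standard_normal(L)        # hidden biases, fixed after init
    H = sigmoid(X @ W.T + b)          # N x L hidden-layer output matrix, formula (2)
    beta = np.linalg.pinv(H) @ T      # beta = H^+ T, formula (3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta

# Toy regression: learn t = sin(3x) on [-1, 1]
X = np.linspace(-1, 1, 200).reshape(-1, 1)
T = np.sin(3 * X)
W, b, beta = elm_train(X, T, L=50)
mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

With 50 random sigmoid features the least-squares fit drives the training error very low; only the hidden-layer size L is tuned, exactly as the text describes.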

B. TELM
Tamura and Tateishi [19] pointed out that the advantage of two-hidden-layer feedforward networks (TLFNs) is that fewer hidden nodes are needed to achieve the desired performance: a TLFN with (N/2 + 3) hidden nodes can learn N samples with arbitrarily small error. Huang [20] further proved that the number of hidden nodes can be reduced to 2√((m + 2)N). On this basis, Qu et al. [16] proposed TELM, whose training process is as follows.

Suppose there are N samples (x_i, t_i). First, the weight matrix connecting the input layer and the first hidden layer and the bias vector of the first hidden layer are initialized with randomly generated values. Then the two hidden layers are regarded as one hidden layer, and the connection matrix β between the second hidden layer and the output layer is calculated according to formula (3). In this way, the expected output of the second hidden layer is obtained as

H_1^* = T β^+, (4)

where β^+ is the Moore-Penrose generalized inverse of the matrix β, defined and calculated in the same way as H^+. The theoretical output of the second hidden layer is

H_1 = g(W_H H + B_1), (5)

where W_H is the connection matrix between the first and the second hidden layer, B_1 is the bias vector of the second hidden layer, and H is the output of the first hidden layer.
To solve W_H and B_1, combine them into the augmented matrix W_HE = [B_1 W_H] and define H_E = [1 H]^T, where 1 denotes a vector of ones. Then

W_HE = g^{-1}(H_1^*) H_E^+, (6)

where H_E^+ is the Moore-Penrose generalized inverse of H_E = [1 H]^T, and g^{-1}(x) is the inverse function of g(x). Therefore, the actual output of the second hidden layer is

H_2 = g(W_HE H_E). (7)

Finally, the output weights of the network are updated as

β_new = H_2^+ T, (8)

where H_2^+ is the Moore-Penrose generalized inverse of H_2. Then the output of TELM is

f(x) = H_2 β_new. (9)
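The TELM steps (3)-(9) can be written out as a short NumPy sketch (our own simplified illustration, not the authors' code). One practical safeguard we add: T β^+ need not lie in the sigmoid's range, so the sketch rescales H_1^* into (0, 1) before applying g^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
logit = lambda p: np.log(p / (1.0 - p))        # g^{-1} for the sigmoid

def telm_train(X, T, L):
    N, n = X.shape
    W = rng.standard_normal((L, n))            # input -> first hidden layer
    B = rng.standard_normal(L)
    H = sigmoid(X @ W.T + B)                   # output of the first hidden layer
    beta = np.linalg.pinv(H) @ T               # both layers merged, formula (3)
    H1_star = T @ np.linalg.pinv(beta)         # expected 2nd-layer output, formula (4)
    lo, hi = H1_star.min(), H1_star.max()      # rescale into (0, 1): our safeguard,
    H1_star = 0.05 + 0.90 * (H1_star - lo) / (hi - lo + 1e-12)  # keeps g^{-1} defined
    HE = np.hstack([np.ones((N, 1)), H])       # augmented [1 H]
    W_HE = np.linalg.pinv(HE) @ logit(H1_star) # [B_1; W_H], formula (6)
    H2 = sigmoid(HE @ W_HE)                    # actual 2nd-layer output, formula (7)
    beta_new = np.linalg.pinv(H2) @ T          # formula (8)
    return W, B, W_HE, beta_new

def telm_predict(X, W, B, W_HE, beta_new):
    H = sigmoid(X @ W.T + B)
    HE = np.hstack([np.ones((X.shape[0], 1)), H])
    return sigmoid(HE @ W_HE) @ beta_new       # formula (9)

X = np.linspace(-1, 1, 200).reshape(-1, 1)
T = np.sin(3 * X)
params = telm_train(X, T, L=30)
mse = np.mean((telm_predict(X, *params) - T) ** 2)
```

The second least-squares pass (formula (8)) corrects whatever the inverse-activation step could not represent exactly, which is why TELM can reach a given accuracy with fewer nodes than a single-layer ELM.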

III. PROPOSED LEARNING ALGORITHM
The number of hidden nodes in ELM has always been an important research issue. It has been proved in [15] and [21] that the prediction error of ELM becomes smaller and smaller as the number of hidden nodes increases. EM-ELM [15] dynamically determines the network structure based on the principle of error minimization and allows hidden nodes to be added to the network one by one or group by group. EM-ELM solves the generalized inverse in the calculation process with block matrices. Therefore, the paper proposes the error minimized extreme learning machine with two hidden layers (EM-TELM), which integrates the advantages of EM-ELM and TELM. EM-TELM uses the block-matrix method to determine the parameters of the second hidden layer.

A. CONVERGENCE ANALYSIS OF TELM
Before introducing EM-TELM formally, we briefly state a lemma that establishes the convergence of TELM. The derivation is similar to that of the convergence of ELM.

Lemma 1 (Convergence Lemma): Given a TELM network whose hidden layers each contain L_0 nodes, let H_{1,1} = H(a_1, ..., a_{L_0}, b_1, ..., b_{L_0}, x_1, ..., x_N) and H_{1,2} denote the output matrix of the first hidden layer and the output matrix of the second hidden layer, respectively. If δL_0 = L_1 − L_0 new hidden nodes {(a_i, b_i)}_{i=L_0+1}^{L_1} are added to each hidden layer, the new output matrices of the first and the second hidden layer become H_{2,1} = H(a_1, ..., a_{L_1}, b_1, ..., b_{L_1}, x_1, ..., x_N) and H_{2,2}, respectively. Then

‖H_{2,2} β_{2,2} − T‖ ≤ ‖H_{1,2} β_{1,2} − T‖,

where the output corresponding to the new nodes in the first hidden layer is δH_1 = [g(a_i · x_j + b_i)] (j = 1, ..., N; i = L_0 + 1, ..., L_1), so that H_{2,1} = [H_{1,1} δH_1]. The inequality holds because β_{2,2} is the least-squares solution of min_β ‖H_{2,2} β − T‖ and, according to formula (7), the enlarged second hidden layer can reproduce the outputs of the original one as a special case, so the training error cannot increase as nodes are added.

B. EM-ELM
The paper first introduces EM-ELM [15] and then extends it to EM-TELM. First, the initial number L_0 and the maximum number L_max of hidden nodes are given, the expected prediction error is ε, and H_1 ∈ R^{N×L_0} is the output matrix of the hidden layer with L_0 hidden nodes. Suppose δL_0 = L_1 − L_0 hidden nodes are added; the output matrix of the enlarged hidden layer becomes H_2 = [H_1 δH_1], where the output corresponding to the new nodes is

δH_1 =
[ g(a_{L_0+1}·x_1 + b_{L_0+1})  ···  g(a_{L_1}·x_1 + b_{L_1}) ]
[             ⋮                 ⋱              ⋮              ]
[ g(a_{L_0+1}·x_N + b_{L_0+1})  ···  g(a_{L_1}·x_N + b_{L_1}) ] ∈ R^{N×δL_0}.
Let E(H) = ‖Hβ − T‖ denote the prediction error of the network. If E(H_1) = ‖H_1 β_1 − T‖ < ε, there is no need to add hidden nodes to the network and training is complete. Otherwise, according to the introduction of the extreme learning machine, the new output weights are β_2 = H_2^+ T. As described in [15], H_2 has full column rank when N ≥ L_1, so the Schur complement M = δH_1^T (I − H_1 H_1^+) δH_1 of H_1^T H_1 in H_2^T H_2 is a nonsingular matrix. According to the method for computing the inverse of a 2 × 2 block matrix, H_2^+ can be partitioned into two stacked row blocks, H_2^+ = [U_1; D_1], with D_1 = M^{-1} δH_1^T (I − H_1 H_1^+).

Because I − H_1 H_1^+ is a symmetric orthogonal-projection matrix, this simplifies to D_1 = ((I − H_1 H_1^+) δH_1)^+, and in the same way U_1 = H_1^+ (I − δH_1 D_1). So far, the generalized inverse H_2^+ of EM-ELM has been obtained, and EM-ELM then solves the parameter β in the same way as ELM, β_2 = H_2^+ T = [U_1 T; D_1 T].
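The recursive update above can be checked numerically. The sketch below (our own NumPy verification, not from the paper) computes D_1 = ((I − H_1 H_1^+) δH_1)^+ and U_1 = H_1^+ (I − δH_1 D_1) and confirms that stacking them reproduces the directly recomputed pseudoinverse H_2^+:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_pinv(H1, H1_pinv, dH):
    """Update the Moore-Penrose inverse when columns dH are appended to H1."""
    N = H1.shape[0]
    P = np.eye(N) - H1 @ H1_pinv       # orthogonal projector I - H1 H1^+
    D = np.linalg.pinv(P @ dH)         # D_1 = ((I - H1 H1^+) dH)^+
    U = H1_pinv @ (np.eye(N) - dH @ D) # U_1 = H1^+ (I - dH D_1)
    return np.vstack([U, D])           # H2^+ = [U_1; D_1]

N, L0, dL = 40, 6, 3
H1 = rng.standard_normal((N, L0))
dH = rng.standard_normal((N, dL))
H1_pinv = np.linalg.pinv(H1)

H2_pinv_rec = grow_pinv(H1, H1_pinv, dH)           # recursive block update
H2_pinv_dir = np.linalg.pinv(np.hstack([H1, dH]))  # direct recomputation
```

The recursive route only inverts a δL_0 × δL_0-sized quantity instead of the full L_1 × L_1 one, which is the source of EM-ELM's speedup.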

C. EM-TELM
EM-ELM can be transformed into EM-TELM by adding a second hidden layer. EM-TELM uses the block matrices to solve the output matrix of the second hidden layer, in a process similar to the way EM-ELM solves the output matrix of its single hidden layer; TELM, by contrast, uses the generalized inverse for this step. In addition, EM-TELM gradually increases the number of hidden nodes, and the other settings are the same as in TELM. Guided by the error-minimization principle, successively adding hidden nodes makes the error between the actual and the expected output of the model smaller and smaller. Replacing the generalized inverse with block matrices (which does not involve the generalized inverse of β) reduces the computational complexity and improves the operating efficiency of the model. It is worth noting that the parameters of the original nodes remain unchanged while nodes are added, which further reduces the computational complexity. In summary, the defining feature of EM-TELM is that it solves the output matrix of the second hidden layer with block matrices, so the network updates quickly and cheaply after hidden nodes are added.
The parameter settings of EM-TELM are the same as those of EM-ELM in Part B, and the derivation of EM-TELM follows TELM; the difference lies in how the output matrix of the hidden layer is solved. While EM-TELM has only a single hidden layer, its derivation is identical to that of EM-ELM. After the output H_2 = [H_1 δH_1] of the enlarged first hidden layer is found, the merged output weights follow as β_{1,1} = H_2^+ T. Then the expected output of the second hidden layer is H*_{1,2} = T β_{1,1}^+. According to formula (5), the actual output of the second hidden layer is H_{1,2} = g(W_H H_2 + B_1). According to formula (6), the parameters of the second hidden layer are W_{HE2} = g^{-1}(H*_{1,2}) H_{E2}^+. According to the introduction of the extreme learning machine, the paper then uses the block-matrix method to obtain H_{E2}^+: since H_{E2} is H_{E1} extended by the columns δH_1, its generalized inverse is updated from H_{E1}^+ with the same block formulas used for H_2^+. Therefore W_{HE2} and H_{1,2} can be obtained, and β_new can be obtained by formulas (8) and (9).
The specific steps of the proposed EM-TELM algorithm are as follows.
1) The parameters of the first hidden layer are randomly initialized; the network has 2L_0 hidden nodes in total, L_0 in each hidden layer, where L_0 is a small positive integer specified by the user. At the same time, let k = 0.
2) The two hidden layers are regarded as one hidden layer, and the connection weight matrix β between the second hidden layer and the output layer is obtained from formula (3).
3) The connection weight and bias between the second hidden layer and the first hidden layer can be obtained from formulas (4) to (6).
4) The output matrix H_2 of the second hidden layer is calculated by formula (7), and the prediction error E(H_2) = ‖H_2 H_2^+ T − T‖ of the model is calculated.
5) k = k + 1.
6) EM-TELM randomly initializes the δL_{k−1} hidden nodes newly added to the first hidden layer, and both hidden layers are enlarged by δL_{k−1} hidden nodes, so the number of nodes in each hidden layer becomes L_k = L_{k−1} + δL_{k−1}. Meanwhile, the output matrix of the first hidden layer in the (k + 1)th iteration is H_{k+1} = [H_k δH_k], where the increment matrix δH_k can be expressed concretely as

δH_k =
[ g(a_{L_{k−1}+1}·x_1 + b_{L_{k−1}+1})  ···  g(a_{L_k}·x_1 + b_{L_k}) ]
[                 ⋮                     ⋱              ⋮              ]
[ g(a_{L_{k−1}+1}·x_N + b_{L_{k−1}+1})  ···  g(a_{L_k}·x_N + b_{L_k}) ]. (29)

7) The two hidden layers are regarded as one hidden layer. According to formula (3), the output weight matrix between the second hidden layer and the output layer is updated in a fast recursive way as

β_{k+1} = H_{k+1}^+ T = [U_k T; D_k T], with D_k = ((I − H_k H_k^+) δH_k)^+ and U_k = H_k^+ (I − δH_k D_k). (30)

8) According to formula (6), the connection weight matrix between the second and the first hidden layer and the bias vector of the second hidden layer are obtained recursively. Writing H_{E,k+1} = [H_{E,k} δH_k], EM-TELM gets H_{E,k+1}^+ in the same recursive way, (32)
and the connection matrix W_{HE,k+1} containing the weight matrix and bias vector between the first and the second hidden layer is solved as

W_{HE,k+1} = g^{-1}(H*_{1,k}) H_{E,k+1}^+, (33)

where H*_{1,k} = T β_{k+1}^+.
9) The output matrix H_{2,k+1} of the second hidden layer is calculated by formula (7), and the prediction error E(H_{2,k+1}) = ‖H_{2,k+1} H_{2,k+1}^+ T − T‖ of the model is obtained.
10) If L_k < L_max and E(H_{2,k+1}) > ε, return to step 5; otherwise, the training process is complete.
Among them, steps 1 to 4 belong to the initialization phase, and steps 5 to 10 belong to the recursively growing phase. As a special case, hidden nodes can be added to the existing EM-TELM one by one, i.e., δL_0 = δL_1 = δL_2 = ··· = δL_k = 1. Then δH_k is a column vector,

δH_k = [g(a_{L_k}·x_1 + b_{L_k}), ..., g(a_{L_k}·x_N + b_{L_k})]^T,

and the calculation formulas of U_k and D_k become

D_k = ((I − H_k H_k^+) δH_k)^T / ‖(I − H_k H_k^+) δH_k‖², U_k = H_k^+ (I − δH_k D_k),

since the pseudoinverse of a nonzero column vector v is v^T/‖v‖². According to the specific steps of EM-TELM, the pseudocode of EM-TELM can be obtained as shown below.
Algorithm: EM-TELM
Input: the input variables X and the expected outputs T.
Output: the structural parameters and the performance indicators of EM-TELM.
Set the structural parameters of EM-TELM as W, B, W_H, B_1, and β; the initial number of nodes in each hidden layer is L_0; the maximum number of nodes in each hidden layer is L_max; the expected error of EM-TELM is ε; and k = 0.
The initialization phase:
    With L_0 nodes in each hidden layer, solve the initial structural parameters of EM-TELM using formulas (3) to (7).
The recursively growing phase:
    while L_k < L_max and E(H_{2,k+1}) > ε
        k = k + 1;
        EM-TELM adds δL_{k−1} new hidden nodes to each hidden layer;
        according to formula (29), the output H_{k+1} of the first hidden layer is calculated using the block matrices;
        according to formula (32), H_{E,k+1}^+ is calculated using the block matrices;
        according to formula (33) and step 9, the prediction error E(H_{2,k+1}) = ‖H_{2,k+1} H_{2,k+1}^+ T − T‖ of the model is calculated.
    end
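The one-by-one growth of the recursive phase can be sketched numerically. The code below (our own illustrative NumPy code covering only the first-hidden-layer growth and the recursive pseudoinverse, not the full two-layer EM-TELM; all data are synthetic) grows the layer node by node with the vector forms of D_k and U_k, then checks the maintained H_k^+ against a direct pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

N, n = 50, 4
X = rng.standard_normal((N, n))
T = rng.standard_normal((N, 2))

# Initial layer with L_0 = 3 nodes
A = rng.standard_normal((3, n)); b = rng.standard_normal(3)
H = sigmoid(X @ A.T + b)
H_pinv = np.linalg.pinv(H)

for _ in range(10):                        # add 10 nodes one by one
    a_new = rng.standard_normal(n); b_new = rng.standard_normal()
    dH = sigmoid(X @ a_new + b_new).reshape(-1, 1)  # delta H_k, a column vector
    v = dH - H @ (H_pinv @ dH)             # (I - H_k H_k^+) dH without forming I
    D = v.T / (v.ravel() @ v.ravel())      # pseudoinverse of a column vector
    U = H_pinv - (H_pinv @ dH) @ D         # U_k = H_k^+ (I - dH D_k)
    H = np.hstack([H, dH])                 # H_{k+1} = [H_k dH_k]
    H_pinv = np.vstack([U, D])             # H_{k+1}^+ = [U_k; D_k]

beta = H_pinv @ T                          # output weights after growth
err = np.linalg.norm(H @ beta - T)
```

Each growth step costs only matrix-vector products against the current H_k^+, never a fresh pseudoinverse, which is exactly the saving steps 5-10 exploit.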

D. CONVERGENCE ANALYSIS OF EM-TELM
Since TELM and EM-ELM are both convergent [15], the paper can state and prove that EM-TELM is convergent. The derivation of the convergence of EM-TELM is similar to that of the convergence of EM-ELM, so the proof is not repeated here.
Theorem 1 (Convergence Theorem): For a given set of distinct samples ℘ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N} and any given positive value ε, there exists a positive integer k such that E(H_{2,k}) = ‖H_{2,k} β_{2,k} − T‖ ≤ ε.

IV. PERFORMANCE VERIFICATION

A. EXPERIMENT OBJECT: 1740 MM PRODUCTION LINE
In this section, we apply the proposed algorithm to the prediction of flatness. The dataset comes from the actual production data collected by the 1740 mm production line of the steel mill, and the exit flatness of the fifth frame of the strip steel is predicted through the proposed algorithm.
The 1740 mm production line was completed in 2015 and uses a combined pickling and rolling mill. Shape (flatness) measuring rolls are installed behind the first and the fifth frame, and the fifth frame controls the flatness of the strip by means of roll bending, roll shifting, and segmented cooling. The specification of the raw material is (1.5-6.0) × (700-1600) mm, and the specification of the finished product is (0.2-2.5) × (700-1600) mm (thickness × width).
The data used are the actual sensor measurements of the 5 frames in the first production campaign of the cold rolling mill line. After consulting the relevant literature and data, the following characteristic variables were chosen as inputs: the work-roll bending forces of the 5 frames, the intermediate-roll bending forces of the 5 frames, the rolling forces of the 5 frames, the entry tensions of the 5 frames, the exit tension, the front tension of the first frame, the coiling tension, and the exit flatness measured in each sensor area of the first frame. The exit flatness measured in each sensor area of the fifth frame is taken as the target, that is, the output variable.
The flatness refers to the degree of warpage of the strip and is represented by the elongation of each longitudinal fiber [22]:

λ = ΔL / L̄, (36)

where λ represents the elongation of a longitudinal fiber in the length direction of the strip, ΔL represents the difference between the length of that fiber and the reference length, and L̄, the average length of the fibers, is taken as the reference length of the strip.
Since the elongation calculated according to formula (36) is a small value, the unit I is used to characterize the flatness so that flatness defects can be seen intuitively. The relationship between I and λ is

I = λ × 10^5, (37)

i.e., one I-unit corresponds to a relative elongation of 10^−5. Therefore, the unit of the industrially measured exit flatness data is I. If the data value is greater than 0, the elongation at the measuring point is positive, meaning the fiber is longer than the reference length and that zone of the strip is loose. If the data value is less than 0, the elongation is negative, meaning the fiber is shorter than the reference length and that zone is tight. If the data value is close to 0, the fiber length is close to the reference length and the flatness is good.
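As a quick numeric illustration of formulas (36) and (37) (the fiber lengths below are invented for the example, not production data):

```python
# Convert a fiber's excess length to flatness I-units: I = lambda * 1e5,
# where lambda = dL / L_ref is the relative elongation of the fiber.
def flatness_I(dL, L_ref):
    return (dL / L_ref) * 1e5

# A fiber 0.1 mm longer than a 5000 mm reference length:
# relative elongation 0.1/5000 = 2e-5, i.e. 2 I-units (positive -> loose zone).
I = flatness_I(0.1, 5000.0)
```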
In the 1740 mm production line, 32 sets of sensors are installed at the exit of the first frame to measure the exit flatness, which is divided into 32 areas, and 54 sets of sensors are installed at the exit of the fifth frame, whose exit flatness is divided into 54 areas. Analysis of the data shows that the roll bending force does not change over certain periods while the flatness continues to change; thus time is also an important factor influencing the prediction of flatness. The sample points in the dataset are generated every 0.08 seconds, so a time column is added to the input variables to reflect the sampling instant of each point. Since the data used in the experiment were taken from 8:56 to 9:24 on a certain day, the added time column is time = (0.08, 0.16, ..., 18762.24)^T. In summary, the number of input variables is 27 + 1 + 32 = 60, and the number of output variables is 54. Because the measured flatness values are only close to 0 and never exactly 0, areas whose recorded flatness is 0 are rejected. Therefore, the exit flatness measured in areas 1 to 9 of the first and the fifth frame is excluded, as is that of areas 46 to 54 of the fifth frame. The final number of input variables is 51, and the final number of output variables is 36.
At the same time, the paper selects 8840 groups as the training set and the remaining 9614 groups as the test set. Since the units of the rolling force and the frame tensions differ greatly in magnitude, which would affect the prediction accuracy of the model, the input data are standardized to a mean of 0 and a variance of 1 before modeling. The experimental environment is MATLAB 2016b.
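The standardization step can be sketched as a generic z-score transform, fitted on the training set only (a NumPy sketch with illustrative synthetic data; the paper's experiments use MATLAB):

```python
import numpy as np

def fit_standardizer(X_train):
    """Column-wise mean/std estimated on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant columns
    return mu, sigma

def standardize(X, mu, sigma):
    return (X - mu) / sigma          # zero mean, unit variance per column

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(8840, 60))  # stand-in for the 60 inputs
mu, sigma = fit_standardizer(X_train)
Z = standardize(X_train, mu, sigma)
```

Reusing the training-set mu and sigma on the test set avoids leaking test statistics into the model.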

B. ANALYSIS OF ACCURACY
For the parameter settings, the EM-TELM algorithm increases the hidden nodes one by one, the activation function of the hidden layer is the sigmoid, the initial network size is 5 (that is, each hidden layer starts with 5 hidden nodes), the maximum number of hidden nodes for each hidden layer is 30, and the expected prediction error of the model is 1.0. The parameter settings of EM-ELM are the same as those of EM-TELM.
The accuracy of the model is evaluated with the mean absolute deviation, the average of the absolute values of the deviations between the predicted and the actual outputs. The mean absolute deviation avoids the mutual cancellation of positive and negative errors and accurately reflects the size of the actual prediction error. Let the actual output of the model be Tsim, the expected output be Ttest, and the number of samples be num; then

Mean = (1/num) Σ_{i=1}^{num} |Tsim_i − Ttest_i|.

EM-TELM has one more hidden layer than EM-ELM, on which it is based. The comparison of the mean absolute deviation (Table 1 and Figure 3) shows that the accuracy of EM-TELM is better than that of EM-ELM as the number of hidden nodes increases; hence the additional hidden layer improves the accuracy of the model. It should be pointed out that the mean absolute deviation used in the paper refers to the mean absolute deviation on the test set.
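The evaluation metric is straightforward to compute (a sketch; the Tsim/Ttest arrays below are placeholder values, not the paper's data):

```python
import numpy as np

def mean_absolute_deviation(Tsim, Ttest):
    """Mean = (1/num) * sum(|Tsim - Ttest|), averaged over all entries."""
    return np.mean(np.abs(np.asarray(Tsim) - np.asarray(Ttest)))

# Deviations are 0.5, 0.0, 1.0 -> Mean = (0.5 + 0.0 + 1.0) / 3 = 0.5
Mean = mean_absolute_deviation([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```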

C. ANALYSIS OF MODEL TRAINING TIME
The parameter settings of TELM in this section are the same as those of EM-TELM. When the number of hidden nodes of TELM changes, TELM must recalculate the structural parameters of all nodes. When the number of hidden nodes of EM-TELM increases, EM-TELM only solves the structural parameters of the newly added nodes using the block matrices, which helps to reduce the training time. As shown in Table 2 and Figure 4, the training time of EM-TELM remains much lower than that of TELM as the number of hidden nodes increases. The training time of EM-TELM is less than that of TELM because EM-TELM replaces the generalized inverse with the block matrices and does not change the structural parameters of the existing nodes; the same relationship holds between EM-ELM and ELM.

FIGURE 3. The mean absolute deviation of EM-ELM and EM-TELM with different numbers of hidden nodes. When the number of hidden nodes is less than 16, the mean absolute deviation of EM-TELM is basically the same as that of EM-ELM; when it is greater than 16, the mean absolute deviation of EM-TELM is smaller than that of EM-ELM.

D. THREE-DIMENSIONAL DISTRIBUTION MAP OF THE EXIT FLATNESS BASED ON EM-TELM
Figure 3 shows that the accuracy of the flatness prediction model based on EM-TELM is higher than that of the model based on EM-ELM. Figure 4 shows that, as the number of hidden nodes grows, updating the connection weight matrix and bias vector between the hidden layers by incremental learning is much faster than the traditional TELM method. Finally, the established model is used to forecast the data collected between 8:55:32 and 9:26:57, and a three-dimensional distribution map of the exit flatness based on EM-TELM is obtained.

V. CONCLUSION
The paper proposes EM-TELM based on EM-ELM and TELM. EM-TELM uses the block matrices to solve the output matrix of the second hidden layer, and it allows hidden nodes to be added to the network one by one or group by group. The experimental results show that the accuracy of EM-TELM is higher than that of EM-ELM. Compared with TELM, which recalculates the parameters between the hidden layers from the entire new output matrix of the first hidden layer whenever the network architecture changes, EM-TELM reduces the computational complexity by updating the parameters between the hidden layers only incrementally.