
The sliding window kernel recursive least squares (SW-KRLS) algorithm has been widely used in system identification because of its simple structure, low computational complexity and high prediction accuracy. However, as the input data grow, the rising computational complexity degrades performance, and the algorithm has difficulty adapting to systems with abrupt changes. In view of this, we propose a variable sliding window sparse kernel recursive least squares (VSWS-KRLS) algorithm. In order to obtain a parsimonious kernel matrix with satisfactory prediction accuracy, a basic pruning technique is applied to the traditional sliding window method. In addition, a window size adjustment mechanism is added to adapt the sliding window size to system changes. Finally, the novelty criterion (NC), a dictionary with a sliding window, the variable sliding window technique and a change detection mechanism are combined with KRLS to form our improved KRLS algorithm. The improved algorithm reduces computational complexity, improves convergence performance, and better tracks systems with abrupt changes. System identification experiments on a Wiener nonlinear system demonstrate the effectiveness of the improved algorithm.

INDEX TERMS Sparsification, variable sliding window method, system identification, kernel recursive least squares algorithm.


I. INTRODUCTION
System identification is a mathematical modeling method that uses known feature information to construct an equivalent model, and it is used in many fields, such as signal processing, machine learning and control engineering [1][2][3][4][5][6]. Classic system identification methods are relatively mature, including the impulse response method, the spectrum analysis method, the least squares (LS) method [7,8], the maximum likelihood method [9] and deep learning [10,11]. Among these, the LS method is the most basic and the most widely used. However, the LS algorithm has certain limitations, so scholars have proposed many adaptive algorithms to overcome its shortcomings. In 1950, Plackett first proposed the recursive least squares (RLS) algorithm [12], which has fast convergence and small prediction error. In 1960, Widrow and Hoff proposed the least mean square (LMS) algorithm [13], which is widely used because of its simple computation and good convergence performance. However, traditional filtering algorithms have limited effectiveness in dealing with nonlinear problems.
In recent years, kernel methods [14] have been widely used to deal with nonlinear problems, such as kernel support vector machines [15,16], kernel principal component analysis [17,18] and kernel adaptive filters [19]. As a generalization of linear adaptive filters, the kernel recursive least squares (KRLS) algorithm [20] was proposed by Engel in 2004. This algorithm solves nonlinearly inseparable problems by projecting the input data into a reproducing kernel Hilbert space (RKHS) and using a positive definite kernel function that satisfies the Mercer condition [21] to compute inner products. This process involves a nonlinear mapping from a low-dimensional feature space to a high-dimensional feature space, as shown in Figure 1.
After the KRLS algorithm was proposed, scholars proposed the kernel least mean square (KLMS) algorithm and the kernel affine projection algorithm (KAPA). These three types of models are compared in terms of prediction efficiency, prediction accuracy and convergence speed in Table 1. From Table 1, it can be seen that compared with KAPA, KRLS has lower prediction efficiency but more advantages in terms of convergence speed and prediction accuracy. Because the KRLS algorithm has strong tracking ability in solving nonlinear problems, it has been successfully applied in data mining, machine learning and other fields [22,23]. However, the computational complexity of KRLS increases with the recursive calculations, so most studies have concentrated on sparsification of the KRLS algorithm. In 2006, Vaerenbergh et al. proposed the sliding window kernel recursive least squares (SW-KRLS) algorithm [24], which uses a sliding window to limit the growing kernel matrix. This algorithm achieves lower computational complexity while retaining the advantages of KRLS. In 2010, Vaerenbergh et al. proposed the fixed-budget kernel recursive least squares (FB-KRLS) algorithm [25], which uses the distance between a sample and the dictionary as a constraint for sample sparsification. In 2013, Chen et al. proposed the quantized kernel recursive least squares (QKRLS) algorithm [26], which uses a quantization method to reduce the input dimension. In 2018, Han et al. proposed the adaptive dynamic adjustment kernel recursive least squares (ADA-KRLS) algorithm [27], which performs online sparsification by combining a fixed budget with dynamic adaptation. The parsimonious kernel recursive least squares (PKRLS) algorithm was proposed in [28]; a pruning approach that restricts the network size to a fixed value is applied to the KRLS algorithm, which curbs network growth and improves learning efficiency. In 2019, Han et al.
proposed the adaptive normalized sparse quantized kernel recursive least squares (ANS-QKRLS) algorithm [29], which integrates dynamic adjustment, the coherence criterion, the approximate linear dependency (ALD) criterion and the QKRLS algorithm for online sparsification. Zhong et al. proposed a dynamic adaptive sparse kernel recursive least squares (DAS-KRLS) algorithm in 2020 [30]. It uses the ALD and online vector projection criteria to sparsify the data, combined with regularized maximum correntropy to handle the impact of noise. The literature [31] proposed an initial framework with a forgetting factor on the basis of KRLS (FFIKRLS), which combines the ALD-KRLS algorithm with the QKRLS algorithm and introduces a forgetting factor to track strongly changing dynamic characteristics. The literature [32] used the Nyström method and k-means sampling to form the novel Nyström kernel recursive least squares with k-means sampling (NysKRLS-KM).
In summary, research on the KRLS algorithm in system identification has made good progress [33,34]. However, in the face of complex system environments, system identification still faces enormous challenges [35]. In addition, KRLS is based on the mean square error (MSE) criterion, which has difficulty accommodating abrupt changes. For the above problems, the proposed variable sliding window sparse kernel recursive least squares (VSWS-KRLS) algorithm offers a suitable solution. This paper presents three major contributions:
• We limit the size of the novelty criterion (NC) dictionary by using a sliding window. As the data size increases, the computational cost of maintaining the dictionary rises. Therefore, in order to reduce the storage space of the dictionary, we use a sliding window to fix the dictionary size. In addition, this dictionary can replace the KRLS dictionary when computing the kernel matrix.
• We reduce the kernel matrix size by combining the sliding window and the NC with KRLS. Weakly correlated data in the sliding window can harm convergence. Therefore, to improve calculation efficiency, we add the NC, which raises the threshold for entering the kernel matrix.
• We propose a new variable sliding window technique to enhance the algorithm's ability to track system changes. This method adaptively adjusts the window size according to the system environment.
The rest of this paper is organized as follows: Section 2 gives a brief review of RLS and KRLS. Section 3 introduces several common sparsification methods. Section 4 proposes the VSWS-KRLS algorithm. Section 5 conducts system identification experiments to evaluate the performance of the proposed algorithm. Finally, conclusions and prospects are drawn in Section 6.

II. REVIEW OF RLS AND KRLS

A. RECURSIVE LEAST SQUARES ALGORITHM
The LS algorithm needs no assumptions about the statistical characteristics of the input signal. The RLS algorithm is a recursive extension of the LS algorithm, which can recursively update the estimate by reusing old data. For the training data, the RLS algorithm estimates the filter coefficients ω_{i-1} by minimizing the cost function

min_ω Σ_{j=1}^{i-1} (d_j − ω^T u_j)²

whose solution is

ω_{i-1} = (U_{i-1} U_{i-1}^T)^{-1} U_{i-1} d_{i-1}

where U_{i-1}=[u_1,u_2,…,u_{i-1}] is L×(i−1), u_j is the input vector at time j, d_{i-1}=[d_1,d_2,…,d_{i-1}]^T, and d_j is the expected response at time j. By the matrix inversion lemma (Woodbury identity), with P_i = (U_i U_i^T)^{-1}, the RLS algorithm is updated recursively:

e_i = d_i − ω_{i-1}^T u_i,
P_i = P_{i-1} − (P_{i-1} u_i u_i^T P_{i-1}) / (1 + u_i^T P_{i-1} u_i),
ω_i = ω_{i-1} + P_i u_i e_i.

The RLS algorithm uses the inverse of the correlation matrix to whiten the input data, which improves the convergence performance of the filter. Figure 2 shows the RLS system identification model.
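As a concrete sketch of these recursions (illustrative Python rather than the paper's MATLAB, with hypothetical function and variable names), the inverse correlation matrix P_i can be propagated with the Woodbury identity so that no explicit matrix inverse is needed per step:

```python
import numpy as np

def rls_identify(U, d, lam=1e-2):
    """Recursive least squares over columns u_i of U with targets d_i.
    P tracks the inverse correlation matrix via the matrix inversion lemma."""
    L = U.shape[0]
    w = np.zeros(L)
    P = np.eye(L) / lam                  # regularized initial inverse
    for u, di in zip(U.T, d):
        e = di - w @ u                   # a priori error
        g = P @ u / (1.0 + u @ P @ u)    # gain vector
        w = w + g * e                    # coefficient update
        P = P - np.outer(g, u @ P)       # rank-1 downdate of the inverse
    return w

# identify a known FIR channel from noisy observations
rng = np.random.default_rng(0)
w_true = np.array([1.0, 0.8362, -0.7732, -0.4484])
U = rng.standard_normal((4, 400))
d = w_true @ U + 0.01 * rng.standard_normal(400)
w_hat = rls_identify(U, d)
```

With enough samples and low noise, w_hat converges to the true channel coefficients.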

B. KERNEL RECURSIVE LEAST SQUARES
The KRLS algorithm introduces a kernel function into the RLS algorithm, projecting the input data into a reproducing kernel Hilbert space (RKHS). The biggest advantage of this algorithm is that nonlinear relationships can be handled without knowing the specific form of the mapping. First, suppose X represents the original space, X=[x_1,x_2,…,x_n] ⊂ R^N, and the mapping φ is expressed as

φ: X → ℍ, x ↦ φ(x)

where ℍ represents the Hilbert space and φ(x) represents the projection of x into the feature space. Through the kernel function, the nonlinear mapping is defined by

κ(x, x′) = ⟨φ(x), φ(x′)⟩

where ⟨·,·⟩ denotes the inner product. The kernel function implicitly maps the input data into a high-dimensional space, where linear operations are then performed. The Gaussian kernel function used in this paper is

κ(x, x′) = exp(−‖x − x′‖² / (2σ²))

where σ is the Gaussian kernel parameter. In the KRLS algorithm, given the expected sequence {d_1,d_2,…} and the input sequence {φ_1,φ_2,…}, the cost function is

min_ω Σ_j (d_j − ω^T φ_j)²

where d_j and φ_j are the expected and input sequences. We assume Φ_i=[φ(x_1),…,φ(x_i)], and the i×i-dimensional kernel matrix is defined by

K_i = Φ_i^T Φ_i, with (K_i)_{jk} = κ(x_j, x_k).

When ω_i = Φ_i a_i, the cost function of the KRLS algorithm at time i becomes

min_{a_i} ‖d_i − K_i a_i‖²

where a_i=[a_1,a_2,…,a_i]^T is the weight vector and d_i=[d_1,d_2,…,d_i]^T is the expected output vector. The estimate of a_i is

a_i = (K_i + λI)^{-1} d_i

so the kernel matrix is adapted as K_i ← K_i + λI. Here, λ is a positive number called the regularization parameter, introduced to prevent overfitting.
Unlike the RLS algorithm, the KRLS algorithm focuses on updating the kernel matrix. With the widespread application of the KRLS algorithm in various fields [36,37], its deficiencies have gradually emerged. As the input data increase, the kernel matrix keeps expanding, so the KRLS optimization problem becomes a kernel matrix sparsification problem.
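For reference, the regularized solution a_i = (K_i + λI)^{-1} d_i can be sketched as follows (illustrative Python with hypothetical function names; it solves the system in one shot for clarity, whereas the recursive KRLS of [20] grows this inverse sample by sample):

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix: kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krls_fit(X, d, lam=0.1, sigma=1.0):
    # a_i = (K_i + lam * I)^{-1} d_i, computed directly for illustration
    K = gauss_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), d)

def krls_predict(a, X, Xtest, sigma=1.0):
    # y(u*) = sum_j a_j kappa(x_j, u*)
    return gauss_kernel(Xtest, X, sigma) @ a

# fit a smooth nonlinearity from noisy samples
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
d = np.tanh(X[:, 0]) + 0.01 * rng.standard_normal(200)
a = krls_fit(X, d)
mse = np.mean((krls_predict(a, X, X) - d) ** 2)
```

The direct solve costs O(i³); the recursive form brings the per-step cost down to O(i²), which is precisely what motivates the sparsification methods of Section III.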

III. SPARSIFICATION
The previous section introduced and compared the RLS and KRLS algorithms. In this section we introduce some common sparsification methods.

A. NOVELTY CRITERION AND APPROXIMATE LINEAR DEPENDENCY
Pruning techniques can reduce redundant information and improve computational performance [38]. In the KRLS algorithm, the size of the kernel matrix grows linearly with the updates, which poses challenges for online prediction. Therefore, many sparsification methods have been proposed. The NC [39] is a simple sparse method that checks whether the latest data is useful. In 2004, Engel et al. proposed the ALD criterion to solve this problem for the KRLS algorithm [20]. Besides, Richard et al. studied another similar method, called the coherence criterion [40]. The above methods can effectively optimize the KRLS algorithm through online sparsification.
This paper mainly introduces the NC and ALD online sparsification methods. In the NC, online sparsification starts from an empty set and gradually adds samples to the center set of the dictionary according to a judgment rule. Assume the current dictionary is

C_i = {c_j}, j = 1, …, m_i

where c_j is the center at iteration j and m_i is the size of the set. When a new data pair {u_{i+1}, d_{i+1}} arrives, the algorithm decides whether to add u_{i+1} to the dictionary as a new center, based on its distance to the dictionary

dis = min_{c_j ∈ C_i} ‖u_{i+1} − c_j‖.
If the distance is less than the threshold δ1, u_{i+1} is not added to the dictionary. Otherwise, the algorithm compares the prediction error e_{i+1} with the threshold δ2. Only if e_{i+1} is greater than δ2 is u_{i+1} added to the dictionary as a new center. In fact, in the NC, increasing the threshold δ1 decreases the kernel matrix size, but the convergence performance of the algorithm may deteriorate. Similarly, increasing δ2 decreases the kernel matrix size at a possible cost in performance. Therefore, in this sparsification method, an appropriate threshold can be selected according to different requirements.

The ALD criterion is an approximate linear dependency sparsification method, defined as

dis₂ = min_α ‖ Σ_j α_j φ(c_j) − φ(u_{i+1}) ‖² > ν.

Here, α represents the coefficient vector and ν represents the threshold. When new input data arrive, the ALD criterion computes the linear dependence between the input and the dictionary data. If the value is greater than the preset threshold, the input is added to the dictionary; otherwise it is discarded. Similar to the NC, by comparing the dictionary with the input data, the ALD criterion performs sequential sparsification to reduce the size of the kernel matrix.
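A minimal sketch of the NC admission test (illustrative Python; the function name and the threshold values δ1, δ2 are assumptions for demonstration):

```python
import numpy as np

def nc_admit(u_new, e_new, centers, delta1=0.1, delta2=0.05):
    """Novelty criterion: admit u_new to the dictionary only if it is far
    from every existing center (distance > delta1) AND its prediction
    error exceeds delta2."""
    if len(centers) == 0:
        return True
    dist = min(np.linalg.norm(u_new - c) for c in centers)
    return bool(dist > delta1 and abs(e_new) > delta2)

centers = [np.array([0.0, 0.0])]
# near an existing center -> rejected regardless of error
near = nc_admit(np.array([0.05, 0.0]), 1.0, centers)
# far away but error too small -> rejected
small_err = nc_admit(np.array([1.0, 1.0]), 0.01, centers)
# far away with a large error -> admitted
novel = nc_admit(np.array([1.0, 1.0]), 0.5, centers)
```

Only the last candidate passes both thresholds and would be appended to the dictionary.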
In the KRLS-NC and KRLS-ALD algorithms, the weight vector a_i needs to be updated whenever the dictionary changes; this update is computed via the matrix inversion lemma. Finally, the weights are substituted into the output model to obtain the output. At iteration i, given a test input u⁎, the output of the system is

y(u⁎) = Σ_{j=1}^{m_i} a_j κ(c_j, u⁎).

Similar to the NC, the selection of the threshold also affects the performance of the algorithm.

B. SLIDING WINDOW METHOD
To limit the size of the kernel matrix, Vaerenbergh et al. applied the sliding window method to the KRLS algorithm [24]. The SW-KRLS algorithm combines a sliding window with traditional L2-norm regularization, keeping the kernel matrix dimension fixed. Meanwhile, the L2-norm regularization improves the generalization ability of the model. This algorithm shows good tracking performance under sudden changes and is mostly used in memoryless nonlinear systems. In this method, suppose the window size is M; the observation matrix is then Φ_i=[φ(x_{i-M+1}),…,φ(x_i)], and the regularized kernel matrix can be expressed as

K_i = Φ_i^T Φ_i + cI, with (K_i)_{jk} = κ(x_j, x_k) + c δ_{jk}

where κ_ii=κ(x_i, x_i) and c is the regularization factor. To keep the size of the kernel matrix unchanged, the new sample is used to append a new row and column to the kernel matrix by (21)-(23), with the new inverse obtained from the old one by block inversion.
Here, A denotes the kernel matrix before expansion, which is nonsingular. After expansion, the expanded kernel matrix must be compressed again: the oldest row and column are removed by (24)-(25), with the inverse of the reduced matrix obtained directly from the partitioned inverse.
These equations keep the size of the kernel matrix fixed; Figure 3 illustrates the principle of the sliding window.
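The grow/shrink steps can be sketched as follows (illustrative Python, assuming a Gaussian kernel and a small regularization factor c; both updates work directly on the inverse, so the full matrix is never re-inverted):

```python
import numpy as np

def grow_inverse(Kinv, b, d):
    """Append one row/column [b; d] to K and update K^{-1} by block inversion."""
    Kb = Kinv @ b
    s = d - b @ Kb                       # Schur complement (scalar)
    top = Kinv + np.outer(Kb, Kb) / s
    return np.block([[top, -Kb[:, None] / s],
                     [-Kb[None, :] / s, np.array([[1.0 / s]])]])

def shrink_inverse(Kinv):
    """Remove the oldest (first) row/column of K directly from K^{-1}."""
    e, f, G = Kinv[0, 0], Kinv[1:, 0], Kinv[1:, 1:]
    return G - np.outer(f, f) / e

# slide a window over 5 samples: grow with x_5, then drop x_1
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 2))
c = 0.1                                  # regularization on the diagonal
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 2) + c * np.eye(5)
Kinv = np.linalg.inv(K[:4, :4])
grown = grow_inverse(Kinv, K[:4, 4], K[4, 4])
slid = shrink_inverse(grown)
```

Each slide of the window therefore costs O(M²) instead of the O(M³) of a full inversion.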
In summary, the SW-KRLS algorithm uses regularization to address overfitting, and combines the sliding window with the matrix inversion lemma to bound the computational complexity, which not only saves memory and simplifies computation but also improves prediction accuracy.

IV. IMPROVED KERNEL RECURSIVE LEAST SQUARES ALGORITHM
Having introduced the sparsification methods, in this section we first optimize the NC and the sliding window method for online sparsification. Then, to replace the fixed-size sliding window, a method that resizes the kernel matrix according to the system environment is proposed. Finally, we combine the three optimizations to form the improved KRLS algorithm.

A. SLIDING WINDOW AND NOVELTY CRITERION
The SW-KRLS algorithm has low computational complexity and good tracking ability, but it adds input data directly to the kernel matrix without judging their usefulness, so weakly correlated data in the sliding window may harm the convergence performance of the algorithm. The problem remains of finding a sparse way to judge the input data. Therefore, by applying the improved NC to the SW-KRLS algorithm, we propose the sliding window sparse kernel recursive least squares (SWS-KRLS) algorithm.
First and foremost, in the improved algorithm, the NC will determine whether to add input data to the sliding window at iteration i.
Correspondingly, given a test input u⁎, the error of the system is

e_i = d_i − y_{i-1}(u⁎)

where y_{i-1} is the output of the current model. The NC then judges the two conditions dis > δ1 and |e_i| > δ2:
• If and only if both conditions are satisfied, the input has a significant influence on the algorithm, so it is allowed to enter the dictionary.
• Otherwise, if either condition is not satisfied, the new input has little effect on the algorithm. The system does not add the input to the sliding window, and the algorithm error remains equal to the previous error.
With this sparsification, the proportion of useful inputs in the sliding window is increased. Meanwhile, as iterations accumulate, the sparsification also reduces the computational complexity of the SW-KRLS algorithm.
In addition, although the NC can eliminate less relevant inputs, the dictionary size in the NC still grows linearly with the number of training data. Therefore, we use the sliding window method to limit the dictionary size. If the dictionary size exceeds M, the sliding window deletes the (i−M)-th entry to keep the size unchanged. Figure 4 shows the principle of the dictionary update. Supposing the fixed NC dictionary size is M, the dictionary C is defined as

C_i = {c_{i-M+1}, …, c_i}.

This method has two advantages. On the one hand, computing the kernel matrix from the windowed dictionary yields a better steady-state error, because deleting old entries increases the proportion of recent data. On the other hand, the windowed dictionary reduces memory usage.
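A minimal sketch of the windowed dictionary (illustrative Python; a deque with maxlen M drops the oldest center automatically, which is exactly the pruning rule described above):

```python
from collections import deque
import numpy as np

def make_dictionary(M):
    # deque with maxlen M: appending when full silently discards the
    # oldest center, i.e. the (i - M)-th entry of the NC dictionary
    return deque(maxlen=M)

C = make_dictionary(3)
for t in range(5):
    C.append(np.array([float(t)]))   # centers arrive one per iteration
kept = [c[0] for c in C]             # only the M most recent survive
```

After five insertions into a window of size 3, only the last three centers remain.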
Finally, we apply these two methods to the SW-KRLS algorithm and propose the SWS-KRLS algorithm; the procedure is shown in Algorithm 1.

B. VARIABLE SLIDING WINDOW METHOD
In a sense, using the sliding window to limit the size of the kernel matrix is an effective sparsification method. However, with a fixed window size it is difficult to achieve good parameter tracking when a change occurs. If the system changes, the larger the window, the worse the tracking performance of the algorithm; conversely, the smaller the window, the worse the convergence performance. The trade-off between tracking performance and convergence performance is thus governed by the size of the sliding window. Our strategy is therefore clear: adaptively adjusting the window size can better balance the two. Julian [41] first applied the variable sliding window method to KRLS, and the method achieved good results. In this paper, we improve this method by adding a window size adjustment mechanism and a change detection mechanism.

First, in the window size adjustment mechanism, for the algorithm to operate online it is necessary to account for the time required to upsize and downsize the kernel matrix. For a kernel matrix of size m, let the upsizing time be U_m and the downsizing time be D_m. Upsizing requires m kernel evaluations plus an O(m²) block update of the inverse, while downsizing needs only the O(m²) downdate:

U_m = O(m²) + m·T_κ, D_m = O(m²)

where T_κ is the cost of one kernel function evaluation. Thus, the total calculation time required to upsize and downsize the kernel matrix is

T_m = U_m + D_m.

During upsizing and downsizing, we find that if the kernel matrix size is smaller than M, the total calculation time T_m is less than the allowable per-step calculation time. It is this "residual" computation time that lets the algorithm adjust the kernel matrix size online. Suppose the kernel matrix size is m, where 1 < m < M. When the kernel matrix is upsized, the admissible growth is bounded by the residual time. The kernel matrix can likewise be reduced, but enough time must be reserved for later upsizing, which bounds the downsizing range.

  
In general, when the kernel matrix size is m, the resizing range R_m can be expressed as the set of admissible next sizes, where the element 1 of R_m represents the option of discarding the regularized kernel matrix and restarting, an operation that takes O(1) time. The kernel matrix can be upsized or downsized within this range.
Subsequently, to adjust the window size according to system changes, the algorithm needs a change detection mechanism. Literature [42] introduced several methods for detecting parameter changes in adaptive filtering algorithms, which can be divided into parameter detection methods and error detection methods. However, because the filter coefficients ω in the KRLS algorithm are replaced by α, the parameter detection methods cannot be used. Therefore, this paper uses error detection to detect system changes. The mean square error difference between adjacent times is defined by

Δe_i = MSE_i − MSE_{i-1}.

To reduce the false alarm rate, we use a synchronous superposition averaging algorithm to update the system error difference; this method is verified in Section V. Assuming the window size of the averaging algorithm is L, the change detection function is defined by

k_i = (1/L) Σ_{j=0}^{L-1} Δe_{i-j}

where θ is the threshold for detecting changes; the selection of θ trades off the probability of false negatives against the probability of false positives, so the threshold value is important for system performance. Proceeding in a fashion similar to Julian, we adopt θ=3σ, where σ is the standard deviation of the background noise, so that if k_i > θ the algorithm judges that the system has changed. In addition, to improve the parameter tracking performance of the algorithm, this paper proposes a method of adjusting the window size, whose implementation is shown in Figure 5.
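A minimal sketch of the averaged error-difference detector (illustrative Python; the averaging window L, the noise level σ and the synthetic error traces are all assumptions for demonstration):

```python
import numpy as np

def detect_change(mse_history, L=10, sigma=0.05):
    """Average the last L adjacent MSE differences and flag a change
    when the average exceeds theta = 3 * sigma."""
    theta = 3.0 * sigma
    diffs = np.diff(np.asarray(mse_history))
    if len(diffs) < L:
        return False                     # not enough history yet
    return bool(np.mean(diffs[-L:]) > theta)

rng = np.random.default_rng(3)
steady = list(0.01 + 0.001 * rng.standard_normal(50))
calm = detect_change(steady)                      # stationary error trace
jumped = detect_change(steady + [2.0, 2.5, 3.0])  # abrupt error explosion
```

Averaging over L steps suppresses the step-to-step jitter of the raw difference, which is the false-alarm reduction argued for above.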
In stage I, the initial stage of the algorithm, the sliding window size gradually increases. In stage II, when no change is detected, the SW-KRLS algorithm updates with the fixed window size M. In stage III, when a system parameter change is detected, the window size is quickly reduced; with a smaller window, the algorithm tracks parameter changes more sensitively. In stage IV, the window size gradually increases until it reaches the maximum allowable size L. Finally, in stage V, the algorithm updates with the larger window size L, so that it attains higher convergence accuracy. The procedure of the variable sliding window kernel recursive least squares (VSW-KRLS) algorithm is shown in Algorithm 2.
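The five-stage schedule can be sketched as follows (illustrative Python; the shrink size M1, the cap L and the single injected change are hypothetical values):

```python
def next_window_size(m, change_detected, M1=20, L=150):
    """Five-stage window schedule: shrink to M1 on a detected change,
    otherwise grow by one until the maximum allowable size L is reached."""
    if change_detected:
        return M1              # stage III: shrink for fast tracking
    return min(m + 1, L)       # stages I/IV grow; stages II/V hold the cap

# simulate 300 steps with one abrupt change at t = 150
sizes, m = [], 1
for t in range(300):
    m = next_window_size(m, change_detected=(t == 150))
    sizes.append(m)
```

The trace grows to the cap, collapses to M1 at the change, then regrows, reproducing the stage I-V profile of Figure 5.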

C. IMPROVED KERNEL RECURSIVE LEAST SQUARE ALGORITHM
From the above analysis, the prediction accuracy of the SWS-KRLS algorithm can be improved by adding the NC. Nevertheless, because the sparsification disturbs the correlation structure of the data, the prediction efficiency of the algorithm degrades. As pointed out, the variable sliding window method enables the algorithm to track in nonstationary scenarios. To obtain an efficient kernel method, we combine the two techniques and propose the variable sliding window sparse kernel recursive least squares (VSWS-KRLS) algorithm.
In the initial stage of the algorithm, a decision is immediately made on whether an input should be added to the dictionary. Only if both the distance and the error exceed the preset thresholds is the input added to the dictionary and used in the kernel matrix computation. Thus, more useful information is used to update the algorithm at low computational complexity.
In the update stage of the algorithm, if the system environment changes, the window size adjustment mechanism solves the problem that SW-KRLS is insensitive to outliers or system changes. The VSWS-KRLS algorithm flow chart is shown in Figure 6.

D. COMPUTATIONAL COMPLEXITY ANALYSIS
At present, the KRLS algorithm has good application prospects because of its high prediction accuracy and fast convergence. However, as iterations increase, the model size keeps growing, and the resulting computational complexity greatly limits performance. To show that the proposed VSWS-KRLS algorithm has lower computational complexity, this paper compares it with common KRLS algorithms (KRLS, KRLS-ALD, FB-KRLS, EX-KRLS, SW-KRLS). The comparison covers the weight coefficient vector a_i, the autocorrelation matrix P_i, the dictionary D_i and the sparsification method.
Suppose that at iteration i the input size is i, the dictionary size is l and the sliding window size is L. We use big-O notation to represent the computational complexity of the algorithms. The results are shown in Table 2.
In the KRLS algorithm, because no sparsification is adopted, the computational complexity of the weight coefficient vector and the autocorrelation matrix is O(i²) each. In the KRLS-ALD algorithm, the complexity of the ALD sparsification, the weight coefficient vector and the autocorrelation matrix is O(l²) each (l is the size of the sparsified kernel matrix, l ≤ i). Similar to KRLS, the complexity of the weight coefficient vector and the autocorrelation matrix is O(i²) each in the EX-KRLS algorithm. Both the FB-KRLS and SW-KRLS algorithms limit kernel matrix growth with a fixed-size window, so the complexity of the weight coefficient vector and the autocorrelation matrix is O(L²) each (L is the sliding window size, L ≤ i).
In the VSWS-KRLS algorithm, two sparsification methods (NC and the sliding window) are added to reduce the computational complexity, and the complexity of the weight coefficient vector, the autocorrelation matrix, the sparsification method and the dictionary is O(j²) each (j is the NC dictionary size; since the dictionary size is controlled by the variable sliding window, j ≤ L). To summarize, compared with the other algorithms, the VSWS-KRLS algorithm has the smallest computational complexity at iteration i, and as iterations increase, this advantage becomes more obvious.

V. SIMULATION
In this section, the KRLS-ALD and KRLS-NC algorithms are first evaluated to show the reason for choosing the NC as the sparsification method. In addition, to verify the effectiveness of the proposed VSWS-KRLS algorithm, we test the SWS-KRLS, VSW-KRLS and VSWS-KRLS algorithms on nonlinear system identification.

A. SIMULATION SETTINGS
This paper chooses the nonlinear Wiener system as the test system; the model is shown in Figure 7. A binary signal x_n is sent through the linear channel, a nonlinear function v=tanh(x) is applied to it, and v_n is the channel output. Finally, additive white Gaussian noise (AWGN) at 20 dB is added. The linear channel coefficients change at a given moment to compare the tracking ability of the algorithms. This paper uses four channels for simulation. In the first part of the simulation, the linear channel is H1(z)=1+0.8362z⁻¹−0.7732z⁻²−0.4484z⁻³; after 500 data points it changes to H2(z)=1−0.8045z⁻¹+0.9962z⁻²+0.4678z⁻³. The performance of the model is shown through MATLAB simulation.
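The data generation for this Wiener setup can be sketched as follows (illustrative Python rather than the paper's MATLAB; the exact input statistics, seeds and helper name are assumptions):

```python
import numpy as np

def wiener_data(n, h, snr_db=20, seed=0):
    """Wiener model of Figure 7: binary input -> linear channel h
    -> tanh nonlinearity -> additive white Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)       # binary source signal
    v = np.tanh(np.convolve(x, h)[:n])        # channel + static nonlinearity
    noise_power = np.mean(v ** 2) / 10 ** (snr_db / 10)
    d = v + np.sqrt(noise_power) * rng.standard_normal(n)
    return x, d

h1 = [1.0, 0.8362, -0.7732, -0.4484]
h2 = [1.0, -0.8045, 0.9962, 0.4678]
x1, d1 = wiener_data(500, h1)                 # before the abrupt change
x2, d2 = wiener_data(1000, h2, seed=1)        # after the channel switches
```

Concatenating d1 and d2 reproduces the abrupt channel change at sample 500 used in the tracking experiments.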
In the experiments, the mean square error (MSE) is selected as the evaluation index of prediction accuracy:

MSE = (1/L) Σ_{i=1}^{L} (y_i − ŷ_i)²

where i denotes time, y_i the target output, ŷ_i the predicted output, and L the number of samples. A smaller MSE means better prediction performance. This paper also uses training time as an indicator of computational complexity. In the system identification experiments, the time embedding length of the input sequence is 4, the training sequence length is 1,500, the test sequence length is 100, and the Gaussian kernel parameter is 1.
The experiments were run on the Windows 10 operating system with an Intel Core i3-9100 CPU at 3.60 GHz and 8.00 GB of RAM; the code was run in MATLAB R2016a, and the results were obtained over 100 Monte Carlo runs. For each threshold, we train KRLS-NC and KRLS-ALD and record the kernel matrix size and MSE values. Figure 8 shows the MSE values of the two sparsification methods for different kernel matrix sizes.

B. COMPARISON OF NC AND ALD
The results show that when the kernel matrix size is less than 70, the ALD criterion performs better than the NC, while the NC performs better elsewhere. Because the sliding window size in the SW-KRLS algorithm is concentrated between 100 and 200, this paper applies the NC to the SW-KRLS algorithm to address the sparsification problem.

C. SLIDING WINDOW SPARSE KERNEL RECURSIVE LEAST SQUARES ALGORITHM
To further test the performance of the SWS-KRLS algorithm, this section shows the influence of different NC thresholds on algorithm performance. We set the sliding window size to 150, the threshold δ1 ∈ {0.2, 0.5, 0.7}, and the threshold δ2=0.05. The simulation result is shown in Figure 9. It shows that the steady-state error is improved by adding the improved NC to the SW-KRLS algorithm. As for the thresholds, the larger δ1 is, the better the steady-state error of the system; but the sparsification disturbs the complex relationships among the data, which reduces the convergence speed of the algorithm.

D. IMPROVED KERNEL RECURSIVE LEAST SQUARES ALGORITHM
This paper introduces a change detection mechanism into the variable sliding window method. Different from the literature [41], the error difference after synchronous superposition averaging detects changes better. For a single experiment, the mean square error difference and the MSE value after synchronous superposition averaging are shown in Figure 10. The unprocessed error difference is unstable at each iteration step, so using it to observe the system gives inaccurate judgments. By comparison, the error difference with synchronous superposition averaging better reflects the state of the system, because averaging over adjacent times decreases the influence of instability. To evaluate the impact of different window sizes on performance, we set the window size M of the SW-KRLS algorithm to 50, 100 and 150, and implement the variable sliding window method in the algorithm with M=150. The simulation results are shown in Figures 11 and 12. Clearly, when a system change occurs during identification, the fixed window size affects the performance of the SW-KRLS algorithm: its steady-state MSE decreases but its tracking ability worsens as the sliding window grows. The trade-off between efficiency and accuracy in this algorithm can thus be tuned by adjusting the sliding window size. The experimental results demonstrate that the VSW-KRLS algorithm has a faster convergence rate and a smaller steady-state misalignment than the fixed-size SW-KRLS algorithm. As shown in Figure 13, the VSWS-KRLS algorithm is compared with the NORMA, KRLS-T, SW-KRLS and SWS-KRLS algorithms. A Gaussian kernel with σ=1 is used in all algorithms, the regularization is set to 0.01, and the KRLS-T algorithm uses a forgetting factor λ=0.999.
The simulation results show that, compared with the other algorithms, the VSWS-KRLS algorithm achieves not only a fast convergence rate but also a small steady-state error. Meanwhile, the VSWS-KRLS algorithm attains the same convergence accuracy as the SWS-KRLS algorithm, while the variable sliding window method greatly improves its convergence rate.
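The window-size adaptation behind this improvement can be sketched as a simple rule (the parameter names and values are assumptions, not the paper's settings): collapse the window when a change is detected, so stale pre-change samples are discarded, and let it grow back toward its maximum otherwise.

```python
def adjust_window(current_size, change_detected, m_min=20, m_max=150, step=10):
    """Illustrative variable-window rule: shrink to m_min on a detected
    abrupt change, otherwise grow by `step` per iteration up to m_max."""
    if change_detected:
        return m_min            # forget pre-change data quickly
    return min(current_size + step, m_max)
```

A small window right after a change gives fast tracking; the regrown large window restores low steady-state error, matching the trade-off seen in Figures 11 and 12.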
The dynamic change of the dictionary size is plotted in Figure 14. It can be observed that in the KRLS algorithm the network size grows quickly with the number of iterations.
With the sliding window method, the size of the dictionary is limited: it stops growing once it reaches the preset size. In fact, since the MSE no longer changes after convergence, we can stop adding new centers at that point. In the VSWS-KRLS algorithm, the variable sliding window method and the novelty criterion make the dictionary size change more slowly and also reduce the usage rate of the dictionary. Moreover, Figure 15 shows a more gradual linear channel transition from H1 to H2; when the system varies linearly, the VSWS-KRLS algorithm achieves better performance than the other algorithms.
Table 3 shows the simulation results of the different algorithms on a nonlinear system, where M is the size of the sliding window, δ1, δ2 and θ are the sparsification and change detection thresholds, μ is the learning rate, λ is the forgetting factor, MSE is the mean square error, and t is the running time of a single iteration. As can be seen, in the SWS-KRLS algorithm increasing the NC threshold not only improves the convergence accuracy of the algorithm but also reduces its computational complexity. Compared with the SW-KRLS algorithm, the convergence, steady-state and computational performance of the VSWS-KRLS algorithm are all significantly improved.
Finally, Figure 16 shows the actual value of the model together with the values predicted by the SW-KRLS and VSWS-KRLS algorithms. For different amounts of data, the proposed algorithm estimates the model value more accurately; as Table 3 and Figure 16 show, the VSWS-KRLS algorithm estimates the model value better than the SW-KRLS algorithm. In general, the VSWS-KRLS algorithm can play a valuable role in system identification and data processing.
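A minimal sketch of the sliding-window dictionary cap, using a fixed-capacity queue so that admitting a new center beyond the preset size M evicts the oldest one (the data-structure choice is ours, not the paper's):

```python
from collections import deque

import numpy as np

M = 150                              # preset window size, as in the experiments
dictionary = deque(maxlen=M)         # oldest center is evicted automatically

for k in range(200):
    x_k = np.array([float(k)])       # stand-in for an admitted input sample
    dictionary.append(x_k)           # beyond M entries, the oldest is dropped

# The dictionary stops growing at the preset size M, as in Figure 14.
```

This is exactly the growth curve Figure 14 describes: linear growth up to M, then a flat line, with the window contents sliding forward as new centers replace the oldest.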

VI. CONCLUSIONS AND FUTURE WORK
This paper proposes an improved sparse kernel RLS algorithm for nonlinear system identification. The novelty criterion, a dictionary with a sliding window, the variable sliding window technique and a mutation detection mechanism are combined with the KRLS algorithm to form the improved algorithm. Compared with the other algorithms, the VSWS-KRLS algorithm achieves better prediction performance and lower computational complexity in the identification of nonlinear systems.
Although the VSWS-KRLS algorithm realizes online prediction and limits the dimension of the kernel matrix, its online prediction accuracy is affected by the NC threshold. If the threshold is too small, the amount of stored data becomes large and the efficiency low; if it is too large, some important data may be discarded by mistake, degrading the accuracy. This trade-off needs to be addressed in future work.
In future work, we will focus on the selection of other sparsification methods, which affect performance and deserve further study. Furthermore, we will study the influence of adding a forgetting factor and quantization on the performance of the algorithm.
XINYU GUO received the bachelor's degree in internet of things engineering from Yantai University, Yantai, China, in 2019, where he is currently pursuing the master's degree. His research interests include adaptive filtering algorithms, sparse signal processing and machine learning.
SHIFENG OU received the Ph.D. degree from Jilin University, Jilin, China, in 2008. He is currently the deputy dean, a professor, and a master's tutor with the School of Optoelectronic Information Science and Technology, Yantai University. His main research interests are speech signal processing, adaptive filtering algorithms and blind signal processing.
MENGHUA JIANG received the bachelor's degree from Shandong Normal University, Jinan, China, in 2020. She is currently pursuing the master's degree at Yantai University, Yantai, China. Her research interests include adaptive filtering algorithms, sparse signal processing and machine learning.