Fractional Stochastic Gradient Algorithm for Time-Delayed Models With Piece-Wise Linear Input Using Self-Organizing Maps Method

Although the stochastic gradient algorithm can identify linear systems efficiently, it is inefficient for nonlinear systems because of the difficulty of designing the step-size. To overcome this dilemma, this paper proposes a fractional stochastic gradient algorithm for systems with piece-wise linear input. First, the nonlinear system is transformed into a polynomial nonlinear model; then the parameters and time-delay are estimated iteratively based on the fractional stochastic gradient algorithm and the self-organizing maps method. In addition, to increase the convergence rates of the fractional stochastic gradient algorithm, a multi-innovation fractional stochastic gradient algorithm is developed. Convergence analysis and simulation examples are provided to show the effectiveness of the proposed algorithms.


I. INTRODUCTION
Parameter estimation plays an important role in control theory and application [1], [2], [3]. A robust controller or an accurate predictive model is designed based on accurate parameters of the dynamic system [4], [5], [6], [7]. The Stochastic Gradient (SG) and Least Squares (LS) algorithms are two classical identification algorithms. The LS algorithm updates the parameters by solving a derivative function; it has faster convergence rates, but at the cost of heavier computational effort compared with the SG algorithm [8], [9]. In addition, if the considered model has a complex structure, the derivative function may not have an analytic solution. Therefore, the LS algorithm can be inefficient for systems with complex nonlinear structures.
The SG algorithm is a natural alternative to the LS algorithm: it does not require solving the derivative function and demands less computational effort. The basic idea of the SG algorithm is first to design a search direction, usually called the negative gradient direction, and then to compute a suitable step-size along that direction [10], [11], [12]. The direction and the step-size are the two main factors in SG algorithm design. To improve the efficiency of the SG algorithm, one can choose a better direction, e.g., the conjugate SG algorithm [13] and the momentum SG algorithm [14], or design a more suitable step-size, e.g., the forgetting factor SG algorithm [15] and the modified SG algorithm [16]. However, the SG algorithm has slow convergence rates due to its zigzagging nature [17]. (The associate editor coordinating the review of this manuscript and approving it for publication was Wentao Fan.)
The Fractional SG (FSG) algorithm, proposed in [18] and [19], is an outstanding modification of the SG algorithm: it introduces a fractional gradient direction into the SG update. Thanks to this additional direction, the convergence rates of the FSG algorithm are faster than those of the SG algorithm. For example, Muhammad et al. proposed a two-stage Fractional Least Mean Square (FLMS) identification algorithm for CARMA systems, where the systems are decomposed into two parts and each part is identified by the FLMS algorithm [20]. Naveed et al. developed two modified fractional SG algorithms for power signal modeling; these two algorithms require less computational effort than the FSG algorithm [21]. Recently, Xu et al. provided a momentum-based FSG algorithm and an adaptive-based FSG algorithm for time-delayed ARX models; these two algorithms have fast convergence rates and low computational cost [22]. However, all of the above work assumed that the models are linear. In engineering practice, a linear model usually cannot capture the dynamics of the system well, while a nonlinear model can [23], [24], [25]. Therefore, nonlinear model identification is more important. In the past few decades, a plethora of SG algorithms have been proposed for different kinds of nonlinear models, e.g., the polynomial nonlinear model [26], [27], the separable nonlinear model [28], [29], and the hard nonlinear model [30], [31]. Among them, hard nonlinear model identification is the most difficult, because the special structures of hard nonlinear models make designing a suitable step-size challenging. (Volume 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.)
This paper proposes an FSG algorithm for time-delayed nonlinear models with piece-wise linear input. The time-delay is estimated using the self-organizing maps method; then, based on the estimated time-delay, the parameters are updated using the FSG algorithm. To improve the convergence rates, a multi-innovation FSG algorithm is also proposed. Compared with previous work, the method in this paper makes the following contributions: (1) it does not need to decompose the model into two sub-models and thus has a simpler structure than that in [20]; (2) it involves more directions at each sampling instant and thus has faster convergence rates than the method in [21]; (3) the considered model is more challenging than the model in [22].
The paper is organized as follows: Section II describes the time-delayed nonlinear model and the fractional calculus. Section III proposes the FSG Self-Organizing Maps (FSG-SOM) algorithm. Section IV gives the properties of the FSG-SOM algorithm. Two simulation examples are provided in Section V. Finally, conclusions are presented in Section VI.

II. PROBLEM STATEMENT

A. NONLINEAR MODEL WITH PIECE-WISE LINEAR INPUT
Consider the following time-delayed model with piece-wise linear input [32], where y t is the system output, u t is the system input, with bounded values, that is persistently exciting, v t is Gaussian white noise with zero mean, and B(z) is a polynomial in the shift operator, where z −1−τ y t = y t−1−τ and τ is the time delay. f (u t ) is the piece-wise linear input shown in FIGURE 1.

Define a switching function; based on Equations (2) and (3), the piece-wise input can be expressed in a unified form. The time-delayed model is then written as a polynomial nonlinear model, and applying the SG algorithm to this model gives the standard SG update.

Two challenges exist when identifying the time-delayed model: (1) the information vector contains unknown variables because of the unknown time-delay τ; (2) the convergence rates of the SG algorithm are slow due to its zigzagging nature.

Remark 1: Once we obtain the estimates of the parameter vector ϑ, we can recover the parameters b i from them.
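As an illustration, a two-segment piece-wise linear input of the kind sketched in FIGURE 1 can be written with a switching function. The slopes k1 and k2 and the segmentation at zero are assumptions for this sketch (they match the parameters k 1 , k 2 recovered in Example 1), not the paper's exact Equations (2) and (3):

```python
import numpy as np

def switching(u):
    """Switching function h(u): 1 where u >= 0, 0 otherwise (assumed form)."""
    return (u >= 0).astype(float)

def piecewise_linear(u, k1=1.0, k2=0.5):
    """Two-segment piece-wise linear input f(u): slope k1 for u >= 0, k2 for u < 0."""
    h = switching(u)
    return h * k1 * u + (1.0 - h) * k2 * u

u = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
f = piecewise_linear(u)   # slope k2 on the negative side, k1 on the positive side
```

The switching function lets the two branches be combined into a single expression that is linear in the unknown slopes, which is what allows the model to be rewritten as a polynomial nonlinear model.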

B. FRACTIONAL DERIVATIVE
To increase the convergence rates, the Newton iterative algorithm is a natural choice. However, the Newton method involves a matrix inverse in each iteration, which may lead to heavy computational effort or even make the algorithm diverge for an ill-conditioned information matrix. We therefore use the FSG algorithm to establish a linkage between the SG and Newton algorithms. The Riemann-Liouville (RL) fractional derivative, the Caputo (CAP) fractional derivative, and the Grünwald-Letnikov (GL) fractional derivative are three classical definitions of the fractional derivative.
The three classical fractional derivation methods are defined as
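For reference, the standard forms of these three definitions for order α with n − 1 < α < n are:

```latex
% Riemann-Liouville
{}_{a}D_t^{\alpha} f(t) = \frac{1}{\Gamma(n-\alpha)} \frac{d^n}{dt^n}
    \int_a^t \frac{f(s)}{(t-s)^{\alpha-n+1}}\,ds,
% Caputo
{}_{a}^{C}D_t^{\alpha} f(t) = \frac{1}{\Gamma(n-\alpha)}
    \int_a^t \frac{f^{(n)}(s)}{(t-s)^{\alpha-n+1}}\,ds,
% Grunwald-Letnikov
{}_{a}D_t^{\alpha} f(t) = \lim_{h \to 0} \frac{1}{h^{\alpha}}
    \sum_{j=0}^{\lfloor (t-a)/h \rfloor} (-1)^{j} \binom{\alpha}{j} f(t-jh).
```

The GL form is the one most directly useful for discrete-time algorithms, since its summation is already a discrete approximation.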

III. FSG BASED SOM ALGORITHM FOR TIME-DELAYED NONLINEAR MODEL
In this section, we use the SOM method to obtain the time-delay and then apply the FSG algorithm to estimate the parameters.

A. SOM METHOD
Since we have no prior knowledge of the time-delay, let its upper bound be M; then we can construct M sub-models. Assume that the parameter estimate at sampling instant t − 1 is ϑ t−1 ; the M residual errors of the sub-models are then computed, and choosing the smallest among all the residual errors yields the time-delay estimate. For example, if the minimizing index is 4 at sampling instant t − 1, then the time-delay estimate is τ̂ t−1 = 4.
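A minimal sketch of this residual-based delay selection, assuming a first-order sub-model for illustration (the paper's information vector is richer):

```python
import numpy as np

def estimate_delay(y, u_f, theta, t, M):
    """SOM hard choice: return the delay whose sub-model residual at instant t
    is smallest. Illustrative sub-models y_hat = theta * u_f[t - 1 - tau]."""
    errors = [(y[t] - theta * u_f[t - 1 - tau]) ** 2 for tau in range(M)]
    return int(np.argmin(errors))

# toy data generated with true delay tau = 2
u_f = np.array([0.5, -1.0, 2.0, 0.3, 1.5, -0.7, 0.9, 1.1])
theta = 0.8
y = np.zeros_like(u_f)
for t in range(3, len(u_f)):
    y[t] = theta * u_f[t - 3]          # y_t = theta * u_f[t - 1 - 2]
tau_hat = estimate_delay(y, u_f, theta, t=6, M=5)
```

With noise-free data and the true parameter, the residual of the correct sub-model is exactly zero, so the argmin recovers the delay.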
Remark 2: The SOM method updates the time-delay with a binary choice: the weight of each sub-model is either 1 or 0. This differs from the expectation maximization (EM) algorithm, which assigns a different weight to each sub-model. For example, in the EM algorithm, each sub-model j has its own cost function, and the cost function used to obtain the parameters at sampling instant t is a weighted sum over the sub-models. By comparison, the cost function of the SOM method involves only the selected sub-model.

Remark 3: Equations (8) and (9) show that the EM algorithm has heavier computational effort than the SOM method. However, because the EM algorithm assigns a weight to every sub-model, it is more stable than the SOM method.
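The contrast between the hard SOM assignment and an EM-style soft assignment can be sketched as follows; the Gaussian-likelihood weighting is an assumed illustrative form, not the exact cost in Equations (8) and (9):

```python
import numpy as np

def som_weights(residuals):
    """Hard assignment: weight 1 for the best sub-model, 0 elsewhere."""
    w = np.zeros(len(residuals))
    w[np.argmin(residuals)] = 1.0
    return w

def em_style_weights(residuals, sigma2=1.0):
    """Soft assignment: normalized Gaussian likelihoods of the residuals
    (illustrative EM-type weighting)."""
    lik = np.exp(-np.asarray(residuals) / (2.0 * sigma2))
    return lik / lik.sum()

res = [0.9, 0.1, 2.3, 1.4, 0.6]
w_hard = som_weights(res)        # exactly one nonzero entry
w_soft = em_style_weights(res)   # all entries positive, summing to one
```

The hard choice costs one argmin per instant, while the soft scheme must evaluate and normalize a weight for every sub-model, which is the computational difference noted in Remark 3.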

B. FSG ALGORITHM
Once the time-delay at sampling instant t − 1 is obtained, the parameters are estimated next. The parameter estimates of the FSG algorithm at sampling instant t are updated by the following recursion, where

Remark 4: The FSG algorithm adds a fractional gradient direction to the SG update, which involves more information; thus it has faster convergence rates than the SG algorithm. According to Equation (7), Equation (10) can be transformed into Equation (11), which is then simplified by a substitution.

Remark 5: If β is regarded as the step-size and the remaining term as the negative gradient direction, then the FSG algorithm is equivalent in form to the SG algorithm.
The difference between the FSG and SG algorithms lies in how the step-size is chosen. In the SG algorithm, the step-size 1/r t is computed recursively, while the step-size in the FSG algorithm is β.

Remark 6: In the SG algorithm, the step-sizes become smaller and smaller as t increases, which may lead to slower convergence rates compared with the FSG algorithm.
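The two step-size rules can be illustrated with a minimal sketch. The fractional direction is taken in the fractional-LMS form |ϑ|^(1−α)/Γ(2−α) following [18], [19], with a constant step-size β; these are assumed concrete forms, not necessarily this paper's exact expressions:

```python
import math
import numpy as np

def sg_update(theta, phi, y, r):
    """SG step with the diminishing step-size 1/r_t, r_t = r_{t-1} + ||phi_t||^2."""
    r = r + phi @ phi
    e = y - phi @ theta
    return theta + phi * e / r, r

def fsg_update(theta, phi, y, beta, alpha=0.9):
    """FSG step with constant beta: SG direction plus a fractional direction
    scaled by |theta|^(1-alpha) / Gamma(2-alpha) (assumed fractional-LMS form)."""
    e = y - phi @ theta
    frac = np.abs(theta) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return theta + beta * (phi * e + phi * e * frac)

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.5])
theta_sg, r = np.full(2, 0.1), 1.0
theta_fsg = np.full(2, 0.1)
for _ in range(2000):
    phi = rng.normal(size=2)
    y = phi @ theta_true                 # noise-free output
    theta_sg, r = sg_update(theta_sg, phi, y, r)
    theta_fsg = fsg_update(theta_fsg, phi, y, beta=0.05)
```

Because r t grows with t, the SG step shrinks like 1/t, while the FSG step stays at β; on this toy problem the FSG estimate settles much closer to the true parameters in the same number of samples.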

C. FSG BASED SOM ALGORITHM
In the first step, we obtain the time-delay based on the estimated parameters using the SOM method (Equations (13) and (14)). In the second step, we update the parameters based on the estimated time-delay using the FSG algorithm. The steps of the FSG-SOM algorithm are listed as follows.
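The two-step FSG-SOM loop can be sketched for a first-order model; the fractional term and regressor layout are assumed forms for illustration, not the authors' exact Equations (13)-(16):

```python
import math
import numpy as np

def fsg_som(y, u_f, M=5, beta=0.05, alpha=0.9):
    """Alternate the SOM delay choice and an FSG parameter step (first-order sketch)."""
    theta = 0.1
    tau = 0
    for t in range(M + 1, len(y)):
        # Step 1 (SOM): pick the delay with the smallest residual under current theta
        errs = [(y[t] - theta * u_f[t - 1 - d]) ** 2 for d in range(M)]
        tau = int(np.argmin(errs))
        # Step 2 (FSG): SG direction plus fractional direction for the chosen delay
        phi = u_f[t - 1 - tau]
        e = y[t] - theta * phi
        frac = abs(theta) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
        theta += beta * (phi * e + phi * e * frac)
    return theta, tau

# toy data: true parameter 0.7, true delay tau = 2
rng = np.random.default_rng(1)
u_f = rng.normal(size=2000)
y = np.zeros(2000)
y[3:] = 0.7 * u_f[:-3]                 # y_t = 0.7 * u_f[t - 1 - 2]
theta_hat, tau_hat = fsg_som(y, u_f)
```

Early on, the delay choice is noisy because the parameter estimate is poor; as theta improves, the correct sub-model's residual shrinks toward zero and the hard SOM choice locks onto the true delay, which is the interplay Theorem 1 formalizes.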

IV. PROPERTIES OF THE FSG-SOM ALGORITHM
Some properties of the FSG-SOM algorithm are given in this section to help the reader follow the algorithm.

A. CONVERGENCE ANALYSIS OF THE FSG-SOM ALGORITHM
Theorem 1: For the FSG-SOM algorithm proposed in (13)-(16), let the cost functions at sampling instants t − 1 and t be J(ϑ t−1 , τ̂ t−1 ) and J(ϑ t , τ̂ t ), respectively. Then the following inequality holds:

J(ϑ t , τ̂ t ) ≤ J(ϑ t−1 , τ̂ t−1 ).

Proof: Consider the cost function at sampling instant t − 1, J(ϑ t−1 , τ̂ t−1 ). For a fixed time-delay τ̂ t−1 , if we choose a suitable step-size β for the FSG algorithm, we can guarantee that

J(ϑ t , τ̂ t−1 ) ≤ J(ϑ t−1 , τ̂ t−1 ).

According to the SOM method, the updated time-delay τ̂ t minimizes the residual, so

J(ϑ t , τ̂ t ) ≤ J(ϑ t , τ̂ t−1 ).

It follows that

J(ϑ t , τ̂ t ) ≤ J(ϑ t , τ̂ t−1 ) ≤ J(ϑ t−1 , τ̂ t−1 ).

The proof is completed.

Remark 7: Theorem 1 can only guarantee that the estimates converge to a stationary point. If J(ϑ, τ) is a strictly convex function, the estimates are ensured to converge to the global optimum.

B. THE RANGE OF THE STEP-SIZE β
In Theorem 1, we must choose a suitable step-size β to keep the FSG step descending. Subtracting the true value ϑ from both sides of Equation (15) yields the parameter-error recursion. To keep the algorithm convergent, one should choose a step-size β that makes this error recursion contractive.

Remark 8: To keep the FSG algorithm convergent, the step-size β should keep the convergence factor within the required range; otherwise there is little freedom in the choice of β. If γ satisfies the stated condition, there exist many methods for choosing a suitable step-size β.
In this case, the FSG algorithm can be modified by stacking several innovations. Clearly, the result is the same as the multi-innovation algorithm. It also demonstrates that the Multi-innovation FSG (M-FSG) algorithm has faster convergence rates than the FSG algorithm, at the cost of heavier computational effort.

Remark 9: The M-FSG algorithm involves more directions in each iteration than the FSG algorithm; thus, it has faster convergence rates at the cost of more computational effort.
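A minimal multi-innovation sketch, stacking l regressors per step; the fractional direction is omitted for brevity, since the stacking itself is the point:

```python
import numpy as np

def mfsg_step(theta, Phi, Y, beta):
    """Multi-innovation step with l stacked regressors Phi (l x n) and outputs Y (l,).

    The stacked innovation vector is E = Y - Phi @ theta; all l directions are
    used in a single update (fractional term omitted in this sketch)."""
    E = Y - Phi @ theta
    return theta + beta * Phi.T @ E

rng = np.random.default_rng(2)
theta_true = np.array([0.6, -0.3, 0.2])
theta = np.zeros(3)
for t in range(400):
    Phi = rng.normal(size=(4, 3))      # l = 4 innovations per step
    Y = Phi @ theta_true               # noise-free outputs
    theta = mfsg_step(theta, Phi, Y, beta=0.05)
```

Each step corrects the estimate along l directions at once instead of one, which is why the convergence rate improves while the per-step cost grows with l.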
The steps of the M-FSG-SOM algorithm are listed as follows.
Theorem 2: Assume that l = 2n + 2, that the stacked information matrix is nonsingular, and that the step-size β is chosen accordingly. Then the best estimates can be obtained in only one iteration.

Proof: Subtracting the true value ϑ from both sides of Equation (18) yields the error recursion.

M-FSG-SOM Algorithm
Initialize ϑ 0 = 1/p 0 , with 1 being a column vector whose entries are all unity and p 0 = 10^6 ; set y t = 0 and u t = 0 for t ≤ 0.
Repeat for t = 1, 2, · · · :
    Collect u(t) and y(t).
    Compute τ̂ t−1 based on Equations (13) and (14).
    Form the stacked information matrix and innovation vector.
    Update ϑ t by the M-FSG step.

Clearly, if l = 2n + 2 and the stacked information matrix is nonsingular, we have e t = 0, which means that the best estimates are obtained in only one iteration.

Remark 10: If the number of information vectors satisfies l > 2n + 2 and the step-size is chosen accordingly, then the M-FSG algorithm coincides with the LS algorithm in which the number of collected data is l.
Remark 11: Theorem 2 shows that the larger the number of directions, the faster the convergence rates of the M-FSG algorithm. However, a larger number of directions leads to heavier computational effort.
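Theorem 2's one-iteration property can be checked numerically: if the matrix step-size is taken as the inverse of the stacked information matrix product (an assumed concrete form of the expression in the theorem), a single multi-innovation step from an arbitrary initial value lands exactly on the least-squares estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3                                  # parameter dimension
l = 2 * n + 2                          # number of stacked innovations, as in Theorem 2
theta_true = np.array([0.5, -1.0, 0.25])
Phi = rng.normal(size=(l, n))          # stacked information matrix (full column rank)
Y = Phi @ theta_true                   # noise-free outputs

theta0 = np.zeros(n)                   # arbitrary initial estimate
beta = np.linalg.inv(Phi.T @ Phi)      # matrix step-size (assumed form)
theta1 = theta0 + beta @ Phi.T @ (Y - Phi @ theta0)   # one M-FSG-type step
```

Expanding the update shows theta1 = (Phi^T Phi)^{-1} Phi^T Y regardless of theta0, which is exactly the batch least-squares estimate, matching Remark 10.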

V. SIMULATION EXAMPLES

A. EXAMPLE 1
Consider a nonlinear model with piece-wise linear input; the nonlinear model is simplified into the polynomial form of Section II. Assume that the true time delay is τ = 2 and assign M = 5.
In the simulation, we collect 1000 sets of input and output data and set α = 1.2. The FSG and SG algorithms are applied to this nonlinear model. The parameter estimates and their estimation errors δ = ‖ϑ t − ϑ‖/‖ϑ‖ are shown in Table 1.

From this simulation, we can draw the following findings: (1) the parameter estimates of the FSG and SG algorithms asymptotically converge to the true values; (2) the convergence rates of the FSG algorithm are faster than those of the SG algorithm; (3) from the estimates of ϑ 1 and ϑ 2 we can get k 1 and k 2 , and from ϑ 1 and ϑ 3 we can compute b 1 ; (4) FIGURE 4 shows that the FSG algorithm is robust to the noise.

B. AN OPEN CHANNEL SYSTEM
An open channel system, shown in FIGURE 5, is considered in this subsection, where R is the radius, x is the length of the channel, u t is the discharge at the upstream end, y t is the discharge at the downstream end, and β is the slope. In this example, two slopes, β = 10 and β = 15 degrees, are assigned to the open channel system, which leads to two different inputs [33]. We collect 1000 sets of input-output data using Matlab software, where the sequence {u(t)} is generated so that the data from t = 1 : 500 belong to β = 10 and those from t = 501 : 1000 belong to β = 15. Let ū = (mean(u(1 : 500)) + mean(u(501 : 1000)))/2, and take the true input as u = u − ū. The piece-wise input and the open channel system are then written in the form of the model in Section II. In the simulation, the true time-delay is set to τ = 1, with M = 5 and α = 0.9. The SG, FSG, and M-FSG algorithms are applied to the considered nonlinear model. The parameter estimates and their estimation errors δ = ‖ϑ t − ϑ‖/‖ϑ‖ are shown in FIGURE 6 and Table 2. The time-delay estimates are shown in FIGURE 7. The numbers of iterations needed to reach almost the same estimation errors using the three algorithms are shown in Table 3.
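The centering of the two-regime input described above can be sketched as follows; the discharge values are synthetic stand-ins for the open-channel data:

```python
import numpy as np

rng = np.random.default_rng(4)
u_raw = np.concatenate([rng.normal(2.0, 0.1, 500),    # discharge under slope beta = 10
                        rng.normal(3.0, 0.1, 500)])   # discharge under slope beta = 15
u_bar = 0.5 * (u_raw[:500].mean() + u_raw[500:].mean())
u = u_raw - u_bar                                     # centered input
```

After subtracting ū, the two slope regimes fall on opposite sides of zero, so the centered input fits the piece-wise linear form with the switching point at the origin.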
This simulation example shows that all three algorithms can catch the true values. Among the three, the M-FSG algorithm has the fastest convergence rates, followed by the FSG algorithm, while the SG algorithm has the slowest.

VI. CONCLUSION
An FSG algorithm has been proposed for piece-wise linear models in this paper. The algorithm has faster convergence rates than the traditional SG algorithm because it introduces a fractional gradient into the SG update. The analysis demonstrates that the multi-innovation FSG (M-FSG) algorithm is more efficient than the FSG algorithm. In applications, the number of innovations can be chosen on a case-by-case basis.
Although the FSG algorithm has several advantages over the SG algorithm, some interesting topics remain to be discussed: for example, how to choose a suitable fractional order α, and whether the algorithm can be extended to systems with other kinds of nonlinear structures. These topics remain open problems for future work.
Data Availability Statement: All data generated or analyzed during this study are included in this article.