
Improved Redundant Rule-Based Stochastic Gradient Algorithm for Time-Delayed Models Using Lasso Regression




Abstract:

This paper proposes an improved redundant rule based lasso regression stochastic gradient (RR-LR-SG) algorithm for time-delayed models. The improved SG algorithm can update the parameter elements with different step-sizes and directions and is thus more adaptive, while the lasso regression method can pick the small weights out of the redundant parameter vector, so the time-delay can be obtained easily. To show the effectiveness of the proposed algorithm, a convergence analysis is also given. The simulated numerical results are consistent with the analytically derived results for the proposed algorithm.
Published in: IEEE Access (Volume: 10)
Page(s): 3336-3342
Date of Publication: 24 December 2021
Electronic ISSN: 2169-3536

SECTION I.

Introduction

In this paper, we consider the following time-delayed model,
\begin{equation*} y(t)=\boldsymbol{\alpha}(z)y(t-\tau)+\boldsymbol{\beta}(z)u(t)+v(t),\tag{1}\end{equation*}
where the polynomials \boldsymbol{\alpha}(z) and \boldsymbol{\beta}(z) are expressed as
\begin{align*} \boldsymbol{\alpha}(z)=&\alpha_{1}z^{-1}+\cdots+\alpha_{n}z^{-n}, \\ \boldsymbol{\beta}(z)=&\beta_{1}z^{-1}+\cdots+\beta_{m}z^{-m},\end{align*}
and the time-delay \tau is unknown. The focus of this paper is to use an improved stochastic gradient algorithm to identify the unknown parameters and time-delay simultaneously through lasso regression.

The SG algorithm, a special kind of gradient descent (GD) algorithm, is usually regarded as a worthy addition to the least squares (LS) algorithm [1]–[6]. It consists of two steps: designing the direction and calculating the step-size [7], [8]. Although it does not require computing a matrix inverse or an analytic function, its convergence rate is quite slow. To improve the convergence rate, several improved SG algorithms have been developed. These algorithms, e.g., the multi-innovation SG algorithm [9], [10], the forgetting factor SG algorithm and the momentum SG algorithm [11]–[14], improve the efficiency of the traditional SG (T-SG) algorithm in two ways: choosing a better direction and calculating an optimal step-size. Note that all the elements in the parameter vector share the same step-size in the T-SG algorithm, which is unreasonable because each element in the parameter vector has its own order of magnitude. Therefore, assigning a different step-size to each element is a better choice.

Time-delayed systems widely exist in engineering practice [15]–[17]. The identification of time-delayed systems is more challenging because an unknown time-delay \tau leads to an unknown information vector, which makes the standard identification algorithms inapplicable. To overcome this difficulty, iterative algorithms are often used. The basic idea of an iterative algorithm is to obtain the time-delay estimates and the parameter estimates alternately [18], [19]. For example, Zhao et al. developed a variational Bayesian (VB) approach for ARX models with Markov chain time-varying time-delays, in which the time-delay estimates and the parameters are obtained in the VB-E step and the VB-M step, respectively [20]. Chen et al. proposed an expectation maximization identification algorithm for time-delayed two-dimensional systems, where the time-delays and the parameters are updated iteratively [21]. Since the parameter estimates depend on the time-delay estimates and vice versa, once one kind of estimate has poor accuracy, the other will also perform poorly or may even diverge.

In this paper, an improved SG algorithm is proposed for a time-delayed model using lasso regression. First, the redundant rule method is introduced to transform the time-delayed model into an augmented model whose parameter vector contains two zero sub-vectors and a parameter sub-vector. Then, the improved SG algorithm combined with lasso regression can distinguish these two kinds of sub-vectors and obtain the parameter and time-delay estimates simultaneously. The improved SG algorithm updates the parameter elements adaptively with different step-sizes and directions; therefore, it is more effective than the traditional SG algorithm.

Briefly, the paper is organized as follows. Section II describes the time-delayed model and the traditional SG algorithm. Section III introduces the framework of the redundant rule based SG algorithm. Section IV studies the RR-LR-SG algorithm. Section V provides an illustrative example. Finally, concluding remarks are given in Section VI.

SECTION II.

The Time-Delayed Model and Traditional SG Algorithm

Rewrite the time-delayed model as a regression model
\begin{align*} y(t)=&\boldsymbol{\phi}^{\mathrm T}(t-\tau)\boldsymbol{\vartheta}+v(t), \\ \boldsymbol{\phi}(t-\tau)=&[y(t-\tau-1),\cdots,y(t-\tau-n),u(t-1),\cdots,u(t-m)]^{\mathrm T}\in\mathbb{R}^{n+m}, \\ \boldsymbol{\vartheta}=&[\alpha_{1},\cdots,\alpha_{n},\beta_{1},\cdots,\beta_{m}]^{\mathrm T}\in\mathbb{R}^{n+m}.\end{align*}

Define the cost function
\begin{equation*} J(\boldsymbol{\vartheta})=\frac{1}{2}[y(t)-\boldsymbol{\phi}^{\mathrm T}(t-\tau)\boldsymbol{\vartheta}]^{2}.\end{equation*}

The parameter estimates given by the traditional SG (T-SG) algorithm are written as
\begin{equation*} \hat{\boldsymbol{\vartheta}}(t)=\hat{\boldsymbol{\vartheta}}(t-1)+\frac{1}{r(t)}\boldsymbol{\phi}(t-\tau)[y(t)-\boldsymbol{\phi}^{\mathrm T}(t-\tau)\hat{\boldsymbol{\vartheta}}(t-1)],\end{equation*}
where \frac{1}{r(t)} is the step-size and r(t) can be computed by
\begin{equation*} r(t)=r(t-1)+\boldsymbol{\phi}^{\mathrm T}(t-\tau)\boldsymbol{\phi}(t-\tau).\end{equation*}
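For reference, a minimal Python sketch of this T-SG recursion (assuming the delay \tau is known and using hypothetical variable names) is given below.

```python
import numpy as np

def tsg_step(theta, r, phi, y_t):
    """One traditional SG (T-SG) update for a known delay tau.

    theta : current estimate of [alpha_1..alpha_n, beta_1..beta_m]
    phi   : information vector phi(t - tau)
    """
    r = r + phi @ phi                     # r(t) = r(t-1) + phi^T phi
    innovation = y_t - phi @ theta        # y(t) - phi^T(t-tau) theta(t-1)
    theta = theta + (phi / r) * innovation
    return theta, r
```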
However, the T-SG algorithm is invalid here because the information vector \boldsymbol{\phi}(t-\tau) is unknown. To obtain the parameter estimates, one must assume that the time-delay \tau is known a priori. Existing methods often use an iterative algorithm to update the parameters and the time-delay [22], [23]. For example, at the sampling instant t, first assume that the time-delay lies in the interval [0,M]. Substitute each candidate time-delay \tau=i, i=0,1,\cdots,M, into the following function
\begin{equation*} [y(t)-\boldsymbol{\phi}^{\mathrm T}(t-i)\hat{\boldsymbol{\vartheta}}(t-1)]^{2}.\end{equation*}
Assume that
\begin{equation*} j=\arg\min_{i}\{[y(t)-\boldsymbol{\phi}^{\mathrm T}(t-i)\hat{\boldsymbol{\vartheta}}(t-1)]^{2},\ i=0,1,\cdots,M\}.\end{equation*}
Then the time-delay \tau=j is the most likely value. Next, using the following SG algorithm to update the parameters yields
\begin{equation*} \hat{\boldsymbol{\vartheta}}(t)=\hat{\boldsymbol{\vartheta}}(t-1)+\frac{1}{r(t)}\boldsymbol{\phi}(t-j)[y(t)-\boldsymbol{\phi}^{\mathrm T}(t-j)\hat{\boldsymbol{\vartheta}}(t-1)].\end{equation*}
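For concreteness, this two-step procedure can be sketched as follows (a hypothetical Python sketch, illustrated for n = m = 1).

```python
import numpy as np

def build_phi(y, u, t, i, n=1, m=1):
    """phi(t - i) = [y(t-i-1),...,y(t-i-n), u(t-1),...,u(t-m)]."""
    return np.array([y[t - i - k] for k in range(1, n + 1)] +
                    [u[t - k] for k in range(1, m + 1)])

def iterative_delay_sg(y, u, M=4, n=1, m=1):
    """Alternate the delay grid search over [0, M] with a T-SG parameter update."""
    theta, r, j = np.zeros(n + m), 1.0, 0
    for t in range(M + n + 1, len(y)):
        # Step 1: pick the delay j that minimizes the squared prediction error.
        errors = [(y[t] - build_phi(y, u, t, i, n, m) @ theta) ** 2 for i in range(M + 1)]
        j = int(np.argmin(errors))
        # Step 2: SG update of the parameters with the selected delay.
        phi = build_phi(y, u, t, j, n, m)
        r += phi @ phi
        theta = theta + (phi / r) * (y[t] - phi @ theta)
    return theta, j
```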
These two steps run iteratively until the parameters and the time-delay converge to their true values. Since the two steps are coupled with each other, poor estimates of one kind can make the other kind of estimates poor or even divergent. A question naturally arises: can we develop a method that estimates the parameters and the time-delay simultaneously?

SECTION III.

The Framework of the Redundant Rule Based SG Algorithm

The redundant rule method is an effective approach for dealing with systems with a time-invariant time-delay [24], [25]. This method can obtain the parameter estimates and the time-delay estimate simultaneously. In the redundant rule method, assume that the upper bound of the time-delay \tau is M. Then, the time-delayed model can be transformed into an augmented model
\begin{equation*} y(t)=\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}+v(t),\end{equation*}
where
\begin{align*} \bar{\boldsymbol{\phi}}(t)=&[\overbrace{y(t-1),\cdots,y(t-\tau)}^{\text{redundant vector}}, \underbrace{y(t-\tau-1),\cdots,y(t-\tau-n)}_{\text{original vector}}, \\ &\overbrace{y(t-\tau-n-1),\cdots,y(t-M-n)}^{\text{redundant vector}}, u(t-1),\cdots,u(t-m)]^{\mathrm T}\in\mathbb{R}^{M+n+m}, \\ \bar{\boldsymbol{\vartheta}}=&[\overbrace{\bar{\alpha}_{1},\cdots,\bar{\alpha}_{\tau}}^{\text{zero vector}}, \underbrace{\bar{\alpha}_{\tau+1},\cdots,\bar{\alpha}_{\tau+n}}_{\text{parameter vector}}, \overbrace{\bar{\alpha}_{\tau+n+1},\cdots,\bar{\alpha}_{M+n}}^{\text{zero vector}}, \beta_{1},\cdots,\beta_{m}]^{\mathrm T}\in\mathbb{R}^{M+n+m}.\end{align*}
Since the redundant entries play no role in the output y(t), their corresponding parameter sub-vectors are equal to zero vectors.

Using the redundant rule based SG algorithm to update the parameter vector \bar{\boldsymbol{\vartheta}} yields
\begin{align*} \bar{\boldsymbol{\vartheta}}(t)=&\bar{\boldsymbol{\vartheta}}(t-1)+\frac{1}{\bar{r}(t)}\bar{\boldsymbol{\phi}}(t)[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}(t-1)], \\ \bar{r}(t)=&\bar{r}(t-1)+\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\phi}}(t).\end{align*}
Let
\begin{align*} \bar{\boldsymbol{\vartheta}}(t)=&[\bar{\vartheta}_{1}(t),\cdots,\bar{\vartheta}_{M+n+m}(t)]^{\mathrm T}\in\mathbb{R}^{M+n+m}, \\ \bar{\boldsymbol{\phi}}(t)=&[\bar{\phi}_{1}(t),\cdots,\bar{\phi}_{M+n+m}(t)]^{\mathrm T}\in\mathbb{R}^{M+n+m}.\end{align*}
When the estimate \bar{\boldsymbol{\vartheta}}(t) has been obtained, we compare each element \bar{\vartheta}_{i}(t) with a threshold \zeta. If |\bar{\vartheta}_{i}(t)|<\zeta, this element is regarded as a redundant element and is picked out from the vector \bar{\boldsymbol{\vartheta}}(t). Once all the redundant elements have been picked out from the redundant parameter vector, the time-delay and the true parameter estimates can be obtained simultaneously.
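A minimal sketch of the redundant rule based SG recursion with the threshold \zeta might look as follows (hypothetical names; the augmented regressor stacks M+n delayed outputs and m delayed inputs).

```python
import numpy as np

def augmented_phi(y, u, t, M, n, m):
    """Augmented regressor [y(t-1),...,y(t-M-n), u(t-1),...,u(t-m)]."""
    return np.array([y[t - k] for k in range(1, M + n + 1)] +
                    [u[t - k] for k in range(1, m + 1)])

def rr_sg(y, u, M, n, m, zeta=0.05):
    """Redundant rule based SG: estimate the augmented vector, then threshold."""
    theta_bar, r_bar = np.zeros(M + n + m), 1.0
    for t in range(M + n + 1, len(y)):
        phi_bar = augmented_phi(y, u, t, M, n, m)
        r_bar += phi_bar @ phi_bar
        theta_bar = theta_bar + (phi_bar / r_bar) * (y[t] - phi_bar @ theta_bar)
    # Elements with magnitude below zeta are treated as redundant (zero) entries;
    # the estimated delay is the number of leading redundant output coefficients.
    redundant = np.abs(theta_bar) < zeta
    tau_hat = int(np.argmax(~redundant[:M + n]))
    return theta_bar, tau_hat
```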

Remark 1:

The threshold \zeta plays an important role in the redundant rule based SG algorithm. If the threshold \zeta is too small, some redundant elements will not be picked out from the parameter vector \bar{\boldsymbol{\vartheta}}; on the other hand, a large threshold \zeta would cause some true elements to be mistaken for redundant ones. In applications, we usually choose the threshold on a case-by-case basis.

SECTION IV.

The Redundant Rule Based Lasso Regression SG Algorithm

The lasso regression method is usually applied to overcome overfitting, which arises when some secondary factors are involved in the structure of the system. In this paper, the lasso regression method is extended to the time-delayed model to pick the zero sub-vectors out of the parameter vector.

A. Algorithm Designing

To use the lasso regression method for the time-delayed model, first let us introduce the following lemma.

Lemma 1:

For the function y=|x|, its first derivative (understood as the subdifferential at x=0) is
\begin{align*} |x|'=\begin{cases} 1, & \text{if } x>0, \\ [-1,1], & \text{if } x=0, \\ -1, & \text{if } x<0. \end{cases}\end{align*}

Define the cost function as
\begin{equation*} J(\bar{\boldsymbol{\vartheta}})=J_{1}(\bar{\boldsymbol{\vartheta}})+J_{2}(\bar{\boldsymbol{\vartheta}})=\frac{1}{2}[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}]^{2}+\frac{\lambda}{2}\|\bar{\boldsymbol{\vartheta}}\|_{1},\end{equation*}
where
\begin{align*} J_{1}(\bar{\boldsymbol{\vartheta}})=&\frac{1}{2}[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}]^{2}=\frac{1}{2}[y(t)-\bar{\phi}_{1}(t)\bar{\vartheta}_{1}-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}]^{2}, \\ J_{2}(\bar{\boldsymbol{\vartheta}})=&\frac{\lambda}{2}|\bar{\vartheta}_{1}|+\cdots+\frac{\lambda}{2}|\bar{\vartheta}_{M+n+m}|.\end{align*}
By using the regularization term \frac{\lambda}{2}\|\bar{\boldsymbol{\vartheta}}\|_{1}, some redundant parameters can be picked out from the augmented parameter vector, and then the time-delay can be estimated from the structure of the resulting parameter vector.

Taking the derivative of J_{1}(\bar{\boldsymbol{\vartheta}}) with respect to \bar{\vartheta}_{1} yields
\begin{align*} \frac{\partial J_{1}(\bar{\boldsymbol{\vartheta}})}{\partial\bar{\vartheta}_{1}}\Big|_{\bar{\boldsymbol{\vartheta}}(t-1)}=-\bar{\phi}_{1}(t)[y(t)-\bar{\phi}_{1}(t)\bar{\vartheta}_{1}(t-1)-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}(t-1)].\tag{2}\end{align*}
Taking the derivative of J_{2}(\bar{\boldsymbol{\vartheta}}) with respect to \bar{\vartheta}_{1} yields
\begin{align*} \frac{\partial J_{2}(\bar{\boldsymbol{\vartheta}})}{\partial\bar{\vartheta}_{1}}\Big|_{\bar{\boldsymbol{\vartheta}}(t-1)}=\begin{cases} \dfrac{\lambda}{2}, & \text{if } \bar{\vartheta}_{1}>0, \\ [-a,a], & \text{if } \bar{\vartheta}_{1}=0, \\ -\dfrac{\lambda}{2}, & \text{if } \bar{\vartheta}_{1}<0, \end{cases}\tag{3}\end{align*}
where a=\frac{\lambda}{2}.

Then, the parameter element \bar{\vartheta}_{1} at the sampling instant t can be computed by
\begin{align*} \bar{\vartheta}_{1}(t)=&\bar{\vartheta}_{1}(t-1)-r\left[\frac{\partial J_{1}(\bar{\boldsymbol{\vartheta}})}{\partial\bar{\vartheta}_{1}}\Big|_{\bar{\boldsymbol{\vartheta}}(t-1)}+\frac{\partial J_{2}(\bar{\boldsymbol{\vartheta}})}{\partial\bar{\vartheta}_{1}}\Big|_{\bar{\boldsymbol{\vartheta}}(t-1)}\right] \\ =&\bar{\vartheta}_{1}(t-1)+r\bar{\phi}_{1}(t)[y(t)-\bar{\phi}_{1}(t)\bar{\vartheta}_{1}(t-1)-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}(t-1)]-r\frac{\partial J_{2}(\bar{\boldsymbol{\vartheta}})}{\partial\bar{\vartheta}_{1}}\Big|_{\bar{\boldsymbol{\vartheta}}(t-1)}.\end{align*}
Substituting Equation (3) into the above equation yields
\begin{align*} \bar{\vartheta}_{1}(t)=\begin{cases} \bar{\vartheta}_{1}(t-1)+r\bar{\phi}_{1}(t)[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}(t-1)]-r\dfrac{\lambda}{2}, & \text{if } m_{1}(t)>\dfrac{\lambda}{2}, \\ 0, & \text{if } m_{1}(t)\in\left[-\dfrac{\lambda}{2},\dfrac{\lambda}{2}\right], \\ \bar{\vartheta}_{1}(t-1)+r\bar{\phi}_{1}(t)[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}(t-1)]+r\dfrac{\lambda}{2}, & \text{if } m_{1}(t)<-\dfrac{\lambda}{2}, \end{cases}\end{align*}
where
\begin{align*} m_{1}(t)=\bar{\phi}_{1}(t)[y(t)-\bar{\phi}_{2}(t)\bar{\vartheta}_{2}(t-1)-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}(t-1)].\tag{4}\end{align*}

Therefore, we can obtain the redundant rule based lasso regression SG (RR-LR-SG) algorithm
\begin{align*} \bar{\vartheta}_{i}(t)=&\begin{cases} \bar{\vartheta}_{i}(t-1)+r_{i}\bar{\phi}_{i}(t)[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}(t-1)]-r_{i}\dfrac{\lambda}{2}, & \text{if } m_{i}(t)>\dfrac{\lambda}{2}, \\ 0, & \text{if } m_{i}(t)\in\left[-\dfrac{\lambda}{2},\dfrac{\lambda}{2}\right], \\ \bar{\vartheta}_{i}(t-1)+r_{i}\bar{\phi}_{i}(t)[y(t)-\bar{\boldsymbol{\phi}}^{\mathrm T}(t)\bar{\boldsymbol{\vartheta}}(t-1)]+r_{i}\dfrac{\lambda}{2}, & \text{if } m_{i}(t)<-\dfrac{\lambda}{2}, \end{cases} \\ m_{i}(t)=&\bar{\phi}_{i}(t)[y(t)-\bar{\phi}_{1}(t)\bar{\vartheta}_{1}(t-1)-\cdots-\bar{\phi}_{i-1}(t)\bar{\vartheta}_{i-1}(t-1) \\ &-\,\bar{\phi}_{i+1}(t)\bar{\vartheta}_{i+1}(t-1)-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}(t-1)].\end{align*}
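In effect, the rule above is an element-wise, soft-thresholded SG step. A compact Python sketch of one sampling instant (hypothetical names; r is a vector of per-element step-sizes supplied by one of the methods in the next subsection) could be:

```python
import numpy as np

def rr_lr_sg_step(theta_bar, phi_bar, y_t, r, lam):
    """One RR-LR-SG update: element-wise soft-thresholded stochastic gradient step.

    theta_bar : previous estimate theta_bar(t-1), shape (M+n+m,)
    phi_bar   : augmented regressor phi_bar(t),   shape (M+n+m,)
    r         : per-element step-sizes r_i,       shape (M+n+m,)
    lam       : lasso weight lambda
    """
    innovation = y_t - phi_bar @ theta_bar        # y(t) - phi_bar^T(t) theta_bar(t-1)
    theta_new = theta_bar.copy()
    for i in range(len(theta_bar)):
        # m_i(t): regressor-weighted residual that excludes the i-th term.
        m_i = phi_bar[i] * (innovation + phi_bar[i] * theta_bar[i])
        if m_i > lam / 2:
            theta_new[i] = theta_bar[i] + r[i] * (phi_bar[i] * innovation - lam / 2)
        elif m_i < -lam / 2:
            theta_new[i] = theta_bar[i] + r[i] * (phi_bar[i] * innovation + lam / 2)
        else:
            theta_new[i] = 0.0
    return theta_new
```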

Remark 2:

In the improved lasso SG algorithm, when the residual error m_{i}(t) lies in \left[-\frac{\lambda}{2},\frac{\lambda}{2}\right], its corresponding parameter element \bar{\vartheta}_{i}(t) is set to zero, which means that the redundant parameter elements are picked out from the vector.

Remark 3:

In the improved lasso SG algorithm, each parameter element has its own direction and step-size. Thus it can update the parameters adaptively when the elements have different orders of magnitude.

Remark 4:

The improved lasso SG algorithm has a faster convergence rate than the traditional SG algorithm owing to its adaptive property. However, its computational effort is heavier than that of the traditional SG algorithm because it must compute M+n+m step-sizes at each sampling instant.

B. Step-Size Choosing

The step-size is important in SG algorithm design: a small step-size leads to a slow convergence rate, while a large one may give divergent results. Next, three step-size designing methods are given (a computational sketch of Methods 2 and 3 follows Method 3).

  • Method 1:

    For simplicity, let
    \begin{align*} n_{i}(t)=\bar{\phi}_{i}(t)[y(t)-\bar{\phi}_{1}(t)\bar{\vartheta}_{1}(t-1)-\cdots-\bar{\phi}_{M+n+m}(t)\bar{\vartheta}_{M+n+m}(t-1)].\tag{5}\end{align*}
    Rewrite the parameter element as
    \begin{align*} \bar{\vartheta}_{i}(t)=\begin{cases} \bar{\vartheta}_{i}(t-1)+r_{i}\left(n_{i}(t)-\dfrac{\lambda}{2}\right), & \text{if } m_{i}(t)>\dfrac{\lambda}{2}, \\ 0, & \text{if } m_{i}(t)\in\left[-\dfrac{\lambda}{2},\dfrac{\lambda}{2}\right], \\ \bar{\vartheta}_{i}(t-1)+r_{i}\left(n_{i}(t)+\dfrac{\lambda}{2}\right), & \text{if } m_{i}(t)<-\dfrac{\lambda}{2}. \end{cases}\end{align*}

    Substituting the above equation into the cost function J(\bar{\boldsymbol{\vartheta}}) and taking the derivative of J(r_{i}) with respect to r_{i} yield
    \begin{align*} r_{i}(t)=\begin{cases} \dfrac{0.5\lambda-\bar{\phi}_{i}(t)n_{i}(t)}{\bar{\phi}^{2}_{i}(t)(n_{i}(t)-\lambda)}, & \text{if } m_{i}(t)>\dfrac{\lambda}{2}, \\ \dfrac{\bar{\phi}_{i}(t)n_{i}(t)-0.5\lambda}{\bar{\phi}^{2}_{i}(t)(n_{i}(t)+\lambda)}, & \text{if } m_{i}(t)<-\dfrac{\lambda}{2}. \end{cases}\end{align*}

    Remark 5:

    Method 1 computes the step-size by solving the derivative function of J(r_{i}); it is the same idea as the steepest descent SG algorithm in [26].

  • Method 2:

    Note that the cost function J_{2}(\bar{\boldsymbol{\vartheta}}) is used to pick the small weights out of the parameter vector; when computing the step-size, its influence can be neglected.

    Substituting the parameter estimates into the cost function and taking the derivative of J(r_{i}) with respect to r_{i} yield
    \begin{equation*} r_{i}(t)=\frac{1}{\bar{\phi}^{2}_{i}(t)}.\end{equation*}
    Since a smaller \bar{\phi}_{i}(t) yields a larger step-size, the algorithm may oscillate intensely. To overcome this difficulty, we introduce a small constant \rho. Then the step-size can be written as
    \begin{equation*} r_{i}(t)=\frac{1}{\rho+\bar{\phi}^{2}_{i}(t)}.\end{equation*}
    Noting that the step-size should become smaller and smaller as the estimates approach the true values, another kind of step-size is
    \begin{equation*} r_{i}(t)=\frac{1}{\sum\limits_{j=1}^{t-1}\bar{\phi}^{2}_{i}(j)}.\end{equation*}
    In this case, the algorithm is more stable.

  • Method 3:

    Inspired by the intelligent search method, we can try many step-sizes to obtain a better parameter estimate.

    For example, when computing the step-size r_{i}(t), we first give an interval [0,r^{\max}_{i}(t)] and choose N different step-sizes r^{j}_{i}(t), j=1,2,\cdots,N, from this interval. Substituting these N step-sizes into the parameter estimates gives
    \begin{align*} \bar{\vartheta}^{j}_{i}(t)=\begin{cases} \bar{\vartheta}_{i}(t-1)+r^{j}_{i}\left(n_{i}(t)-\dfrac{\lambda}{2}\right), & \text{if } m_{i}(t)>\dfrac{\lambda}{2}, \\ 0, & \text{if } m_{i}(t)\in\left[-\dfrac{\lambda}{2},\dfrac{\lambda}{2}\right], \\ \bar{\vartheta}_{i}(t-1)+r^{j}_{i}\left(n_{i}(t)+\dfrac{\lambda}{2}\right), & \text{if } m_{i}(t)<-\dfrac{\lambda}{2}. \end{cases}\end{align*}
    Their corresponding N cost functions J(\bar{\vartheta}^{j}_{i}(t)) are then evaluated, and the best estimate is the one that satisfies
    \begin{equation*} \bar{\vartheta}_{i}(t)=\arg\min_{\bar{\vartheta}^{j}_{i}(t)}[J(\bar{\vartheta}^{j}_{i}(t)),\ j=1,\cdots,N].\end{equation*}
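To make these options concrete, the Python sketch below (hypothetical names) implements the Method 2 step-sizes and the grid search of Method 3; Method 1 would replace these with its closed-form expression.

```python
import numpy as np

def lasso_cost(theta_bar, phi_bar, y_t, lam):
    """J(theta) = 0.5*[y(t) - phi_bar^T theta]^2 + (lambda/2)*||theta||_1."""
    return 0.5 * (y_t - phi_bar @ theta_bar) ** 2 + 0.5 * lam * np.sum(np.abs(theta_bar))

def step_sizes_method2(phi_bar, rho=1e-3):
    """Method 2: r_i(t) = 1/(rho + phi_i(t)^2); rho guards against tiny regressors."""
    return 1.0 / (rho + phi_bar ** 2)

def step_sizes_method3(theta_bar, phi_bar, y_t, lam, r_max=1.0, N=20):
    """Method 3: for each element, try N candidate step-sizes in [0, r_max] and keep
    the one minimizing the lasso cost (only the m_i > lambda/2 branch is shown)."""
    candidates = np.linspace(0.0, r_max, N)
    innovation = y_t - phi_bar @ theta_bar
    r_best = np.zeros_like(theta_bar)
    for i in range(len(theta_bar)):
        best_J = np.inf
        for r in candidates:
            trial = theta_bar.copy()
            trial[i] = theta_bar[i] + r * (phi_bar[i] * innovation - lam / 2)
            J = lasso_cost(trial, phi_bar, y_t, lam)
            if J < best_J:
                best_J, r_best[i] = J, r
    return r_best
```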

Remark 6:

In Method 3, the choice of the upper bound r^{\max}_{i}(t) is challenging. A small upper bound may lead to slow convergence rates, e.g., all the candidate cost functions satisfy J(\bar{\vartheta}^{j}_{i}(t))<J(\bar{\vartheta}_{i}(t-1)); on the other hand, a large upper bound may make the results diverge, e.g., all the candidate cost functions satisfy J(\bar{\vartheta}^{j}_{i}(t))>J(\bar{\vartheta}_{i}(t-1)).

Remark 7:

If the upper bound is too small, we usually reassign r^{\max}_{i,new}(t)=2r^{\max}_{i,old}(t); if it is too large, we can set r^{\max}_{i,new}(t)=\frac{1}{2}r^{\max}_{i,old}(t).

Remark 8:

Since the intelligent search method must evaluate several cost functions at each iteration, its computational effort is heavy.

C. Properties of the Algorithm

Define
\begin{equation*} e_{i}(t)=\bar{\vartheta}_{i}(t)-\bar{\vartheta}_{i}, \qquad E(t)=[e_{1}(t),\cdots,e_{M+n+m}(t)]^{\mathrm T}\in\mathbb{R}^{M+n+m}.\end{equation*}
Subtracting \bar{\vartheta}_{i} from both sides of the parameter estimates yields
\begin{align*} e_{i}(t)=\begin{cases} e_{i}(t-1)-r_{i}\bar{\phi}_{i}(t)[\bar{\boldsymbol{\phi}}^{\mathrm T}(t)E(t-1)-v(t)]-r_{i}\dfrac{\lambda}{2}, & \text{if } m_{i}(t)>\dfrac{\lambda}{2}, \\ 0, & \text{if } m_{i}(t)\in\left[-\dfrac{\lambda}{2},\dfrac{\lambda}{2}\right], \\ e_{i}(t-1)-r_{i}\bar{\phi}_{i}(t)[\bar{\boldsymbol{\phi}}^{\mathrm T}(t)E(t-1)-v(t)]+r_{i}\dfrac{\lambda}{2}, & \text{if } m_{i}(t)<-\dfrac{\lambda}{2}. \end{cases}\tag{6}\end{align*}

For simplicity, we only consider the first equation in (6). The parameter error vector can be written as
\begin{align*} E(t)=E(t-1)-R(t-1)\bar{\boldsymbol{\phi}}(t)[\bar{\boldsymbol{\phi}}^{\mathrm T}(t)E(t-1)-v(t)]-R(t-1)\boldsymbol{\lambda},\tag{7}\end{align*}
where
\begin{align*} R(t-1)=&\mathrm{diag}[r_{1}(t-1),r_{2}(t-1),\cdots,r_{M+n+m}(t-1)], \\ \boldsymbol{\lambda}=&\left[\frac{\lambda}{2},\frac{\lambda}{2},\cdots,\frac{\lambda}{2}\right]^{\mathrm T}\in\mathbb{R}^{M+n+m}.\end{align*}
Since v(t) is Gaussian white noise and is independent of the inputs and outputs \{u(1),\cdots,u(t),y(1),\cdots,y(t-1)\}, Equation (7) can be simplified (dropping the zero-mean noise term) as
\begin{equation*} E(t)=[I-R(t-1)\bar{\boldsymbol{\phi}}(t)\bar{\boldsymbol{\phi}}^{\mathrm T}(t)]E(t-1)-R(t-1)\boldsymbol{\lambda}.\tag{8}\end{equation*}
The step-size matrix R(t-1) is usually chosen to ensure
\begin{equation*} \varrho_{\max}[I-R(t-1)\bar{\boldsymbol{\phi}}(t)\bar{\boldsymbol{\phi}}^{\mathrm T}(t)]<1,\end{equation*}
where \varrho_{\max}[M] denotes the spectral radius of the matrix M.
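As a quick numerical check of this condition, one can compute the spectral radius directly; a small sketch (hypothetical names) follows.

```python
import numpy as np

def contraction_radius(r_diag, phi_bar):
    """Spectral radius of I - R(t-1) phi_bar(t) phi_bar(t)^T; a value below 1
    indicates that the homogeneous part of the error recursion is contracting."""
    A = np.eye(len(phi_bar)) - np.diag(r_diag) @ np.outer(phi_bar, phi_bar)
    return np.max(np.abs(np.linalg.eigvals(A)))
```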

Taking the expectation on both sides of Equation (7), and noting that the first term vanishes as the homogeneous part of the recursion contracts, gives
\begin{align*} \boldsymbol{E}[E(t)]=&\boldsymbol{E}\{[I-R(t-1)\bar{\boldsymbol{\phi}}(t)\bar{\boldsymbol{\phi}}^{\mathrm T}(t)]E(t-1)\}-\boldsymbol{E}[R(t-1)\boldsymbol{\lambda}] \\ \to&-R(t-1)\boldsymbol{\lambda}\neq\mathbf{0},\end{align*}
where \mathbf{0}\in\mathbb{R}^{M+n+m} is a zero vector.

Remark 9:

The above equation shows that the redundant rule based lasso SG algorithm is a biased algorithm.

SECTION V.

Example

Consider the following time-delayed model,
\begin{align*} y(t)=&0.12z^{-1}y(t-\tau)+[1.2z^{-1}+0.85z^{-2}]u(t)+v(t), \\ u(t)\sim&N(0,1), \quad v(t)\sim N(0,0.1^{2}).\end{align*}
Assume that the time-delay is \tau=1 and the upper bound of the time-delay is M=4. Define
\begin{align*} \bar{\boldsymbol{\vartheta}}=&[\bar{\vartheta}_{1},\bar{\vartheta}_{2},\bar{\vartheta}_{3},\bar{\vartheta}_{4},\bar{\vartheta}_{5},\bar{\vartheta}_{6},\bar{\vartheta}_{7}]^{\mathrm T}=[0,0.12,0,0,0,1.2,0.85]^{\mathrm T}, \\ \bar{\boldsymbol{\phi}}(t)=&[y(t-1),y(t-2),y(t-3),y(t-4),y(t-5),u(t-1),u(t-2)]^{\mathrm T}.\end{align*}
Apply the redundant rule based SG (RR-SG) algorithm and the redundant rule based lasso regression SG (RR-LR-SG, \lambda=0.8) algorithm to this time-delayed model. The parameter estimates and their errors are shown in Table 1, and the parameter estimation errors \delta:=\|\hat{\boldsymbol{\vartheta}}-\boldsymbol{\vartheta}\|/\|\boldsymbol{\vartheta}\| versus t are shown in Figure 1.
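The simulation setup can be reproduced in outline with the short Python sketch below (hypothetical names; it inlines the element-wise RR-LR-SG update from Section IV with cumulative Method 2 step-sizes). The numbers obtained depend on the noise realization and the step-size choice.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, M, n, m, lam, T = 1, 4, 1, 2, 0.8, 3000
theta_true = np.array([0.0, 0.12, 0.0, 0.0, 0.0, 1.2, 0.85])   # augmented true vector

# Simulate y(t) = 0.12 y(t-tau-1) + 1.2 u(t-1) + 0.85 u(t-2) + v(t).
u, v, y = rng.normal(0, 1, T), rng.normal(0, 0.1, T), np.zeros(T)
for t in range(M + n + 1, T):
    y[t] = 0.12 * y[t - tau - 1] + 1.2 * u[t - 1] + 0.85 * u[t - 2] + v[t]

d = M + n + m
theta_hat, phi_sq_sum = np.zeros(d), np.full(d, 1e-3)
for t in range(M + n + 1, T):
    phi = np.array([y[t - k] for k in range(1, M + n + 1)] +
                   [u[t - k] for k in range(1, m + 1)])
    phi_sq_sum += phi ** 2
    r = 1.0 / phi_sq_sum                       # cumulative Method 2 step-sizes
    e = y[t] - phi @ theta_hat                 # innovation
    theta_old = theta_hat.copy()
    for i in range(d):                         # element-wise soft-thresholded update
        m_i = phi[i] * (e + phi[i] * theta_old[i])
        if m_i > lam / 2:
            theta_hat[i] = theta_old[i] + r[i] * (phi[i] * e - lam / 2)
        elif m_i < -lam / 2:
            theta_hat[i] = theta_old[i] + r[i] * (phi[i] * e + lam / 2)
        else:
            theta_hat[i] = 0.0

delta = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
print("theta_hat =", np.round(theta_hat, 3), " delta =", round(delta, 4))
```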

TABLE 1. The Parameter Estimates and Errors.

FIGURE 1. The parameter estimation errors \delta versus t.

To show the role of the constant \lambda, we use the RR-LR-SG algorithm with different values of \lambda for the time-delayed model; the parameter estimates and their errors are shown in Table 2.

TABLE 2. The Parameter Estimates and Errors (RR-LR-SG).

In addition, we apply the RR-LR-SG algorithm to the considered model with different signal-to-noise ratios (SNRs); the parameter estimation errors are shown in Figure 2. Finally, the step-size choosing methods proposed in Section IV are applied to the time-delayed model, and the parameter estimation errors are shown in Figure 3.

FIGURE 2. The parameter estimation errors \delta versus t (under different SNRs).

FIGURE 3. The parameter estimation errors \delta versus t (with the three step-size choosing methods).

From this example, the following findings can be obtained:

  1. The parameter estimates given by the SG algorithm asymptotically converge to a stable point, but not to the optimal point, because the step-sizes approach zero as the sampling instant t increases, as shown in Table 1 and Figure 1.

  2. The parameter estimates given by the RR-LR-SG algorithm are more accurate than those of the SG algorithms, because the RR-LR-SG algorithm updates the parameter elements one by one with adaptive step-sizes and directions, as shown in Table 1 and Figure 1.

  3. The RR-LR-SG algorithm can pick out the redundant elements from the parameter vector, while the SG algorithm cannot, as shown in Table 1.

  4. A larger \lambda leads to a more accurate time-delay estimate but less accurate parameter estimates, as demonstrated in Table 2.

  5. When using the RR-LR-SG algorithm for the proposed time-delayed model, a higher SNR leads to more accurate parameter estimates; see Figure 2.

  6. Figure 3 shows that the intelligent search method (Method 3) gives the best step-size, followed by Method 2, while Method 1 gives the poorest step-size.

SECTION VI.

Conclusion

An improved redundant rule based lasso regression stochastic gradient algorithm is proposed for time-delayed models in this study. The proposed algorithm has two advantages over the traditional SG algorithm: (1) each element in the parameter vector is updated adaptively, with its own step-size and direction; (2) the parameter and time-delay estimates are obtained simultaneously. The convergence analysis and a simulation example are given to show that the proposed algorithm is effective.

Although the proposed algorithm can obtain the estimates adaptively, some challenging and interesting problems remain, for example, the choice of the constant \lambda and of the step-size in the improved SG algorithm. These topics need to be investigated further.
