Introduction
In this paper, we consider the following time-delayed model, \begin{equation*} y(t)= {\boldsymbol {\alpha }}(z)y(t-\tau)+ {\boldsymbol {\beta }}(z) u(t)+v(t),\tag{1}\end{equation*}
where $z^{-1}$ is the unit backward shift operator ($z^{-1}y(t)=y(t-1)$), $\tau$ is the unknown time-delay, $v(t)$ is a zero-mean stochastic noise, and \begin{align*} {\boldsymbol {\alpha }}(z)=&\alpha _{1}z^{-1}+\cdots +\alpha _{n}z^{-n}, \\ {\boldsymbol {\beta }}(z)=&\beta _{1}z^{-1}+\cdots +\beta _{m}z^{-m}.\end{align*}
The stochastic gradient (SG) algorithm, a special kind of gradient descent (GD) algorithm, is usually regarded as a worthy alternative to the least squares (LS) algorithm [1]–[6]. It consists of two steps: designing the direction and calculating the step-size [7], [8]. Although it does not require computing a matrix inverse or an analytic function, its convergence rate is quite slow. To improve the convergence rate, several improved SG algorithms have been developed. These algorithms, e.g., the multi-innovation SG algorithm [9], [10], the forgetting factor SG algorithm and the momentum SG algorithm [11]–[14], improve the efficiency of the traditional SG algorithm in two ways: choosing a better direction and calculating an optimal step-size. Note that all the elements in the parameter vector share the same step-size in the traditional SG algorithm, which is unreasonable because each element in the parameter vector has its own order of magnitude. Therefore, assigning a different step-size to each element is a better choice.
Time-delayed systems widely exist in engineering practice [15]–[17]. The identification of time-delayed systems is more challenging because an unknown time-delay must be estimated in addition to the unknown parameters.
In this paper, an improved SG algorithm is proposed for a time-delayed model using lasso regression. First, the redundant rule method is introduced to transform the time-delayed model into an augmented model whose parameter vector contains two zero sub-vectors and a parameter sub-vector. Then, the improved SG algorithm combined with lasso regression can distinguish these two kinds of sub-vectors and obtain the parameter and time-delay estimates simultaneously. The improved SG algorithm updates the parameter elements adaptively with different step-sizes and directions; therefore, it is more effective than the traditional SG algorithm.
Briefly, the paper is organized as follows. Section II describes the time-delayed model and the traditional SG algorithm. Section III introduces the framework of the redundant rule based SG algorithm. Section IV studies the RR-LR-SG algorithm. Section V provides an illustrative example. Finally, concluding remarks are given in Section VI.
The Time-Delayed Model and Traditional SG Algorithm
Rewrite the time-delayed model as a regression model \begin{align*} y(t)=&{\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-\tau) {\boldsymbol {\vartheta }}+v(t), \\ {\boldsymbol {\phi }}(t-\tau)=&[y(t-\tau -1),\cdots,y(t-\tau -n),u(t-1),\cdots, \\&~u(t-m)]^{ {\rm\text { T}}}\in {\mathbb R} ^{n+m}, \\ {\boldsymbol {\vartheta }}=&[\alpha _{1},\cdots,\alpha _{n},\beta _{1},\cdots,\beta _{m}]^{ {\rm\text { T}}}\in {\mathbb R} ^{n+m}.\end{align*}
Define the cost function \begin{equation*} J({\boldsymbol {\vartheta }})=\frac {1}{2}[y(t)- {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-\tau) {\boldsymbol {\vartheta }}]^{2}.\end{equation*}
The parameter estimates by using the traditional SG (T-SG) algorithm are computed by \begin{equation*} \hat { {\boldsymbol {\vartheta }}}(t)=\hat { {\boldsymbol {\vartheta }}}(t-1)+\frac { {\boldsymbol {\phi }}(t-\tau)}{r(t)}[y(t)- {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-\tau)\hat { {\boldsymbol {\vartheta }}}(t-1)],\end{equation*}
\begin{equation*} r(t)=r(t-1)+ {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-\tau) {\boldsymbol {\phi }}(t-\tau).\end{equation*}
Since the time-delay $\tau$ is unknown, each candidate delay $i$ can be evaluated through its squared prediction error \begin{equation*} [y(t)- {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-i)\hat { {\boldsymbol {\vartheta }}}(t-1)]^{2}.\end{equation*}
The delay estimate at instant $t$ is chosen as \begin{equation*} j=\arg \min _{i} \{[y(t)- {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-i)\hat { {\boldsymbol {\vartheta }}}(t-1)]^{2}, i=0,1,\cdots,M\},\end{equation*}
where $M$ is the upper bound of the time-delay, and the parameter vector is then updated by \begin{equation*} \hat { {\boldsymbol {\vartheta }}}(t)=\hat { {\boldsymbol {\vartheta }}}(t-1)+\frac { {\boldsymbol {\phi }}(t-j)}{r(t)}[y(t)- {\boldsymbol {\phi }}^{ {\rm\text { T}}}(t-j)\hat { {\boldsymbol {\vartheta }}}(t-1)].\end{equation*}
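For concreteness, a minimal sketch of this delay-searching T-SG recursion is shown below; the data arrays y and u, the orders n and m, and the delay bound M are illustrative assumptions, not part of the original algorithm statement.

```python
import numpy as np

def tsg_delay_search_step(theta, r, y, u, t, n, m, M):
    """One recursion of the T-SG algorithm with a search over the
    candidate delays i = 0, 1, ..., M (a sketch)."""
    def phi(i):
        # information vector phi(t - i): delayed outputs and inputs
        past_y = [y[t - i - k] for k in range(1, n + 1)]
        past_u = [u[t - k] for k in range(1, m + 1)]
        return np.array(past_y + past_u)

    # pick the candidate delay j that minimizes the squared prediction error
    errors = [(y[t] - phi(i) @ theta) ** 2 for i in range(M + 1)]
    j = int(np.argmin(errors))

    # SG update along phi(t - j) with step-size 1 / r(t)
    phi_j = phi(j)
    r = r + phi_j @ phi_j
    theta = theta + phi_j / r * (y[t] - phi_j @ theta)
    return theta, r, j
```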
The Framework of the Redundant Rule Based SG Algorithm
The redundant rule method is an effective approach for systems with a time-invariant time-delay [24], [25]; it can obtain the parameter estimates and the time-delay estimate simultaneously. In the redundant rule method, assume that the upper bound of the time-delay is $M$; the model can then be written as the augmented regression \begin{equation*} y(t)=\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}+v(t),\end{equation*}
where \begin{align*} \bar { {\boldsymbol {\phi }}}(t)=&[\overbrace {y(t-1), \cdots,y(t-\tau)}^{\text {redundant vector}}, \underbrace {y(t-\tau -1), \cdots, y(t-\tau -n)}_{\text {original vector}}, \\&\overbrace {y(t-\tau -n-1), \cdots, y(t-M-n)}^{\text {redundant vector}}, u(t-1),\cdots,u(t-m)]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m}, \\ \bar { {\boldsymbol {\vartheta }}}=&[\overbrace {\bar {\alpha }_{1},\cdots,\bar {\alpha }_{\tau }}^{\text {zero vector}}, \underbrace {\bar {\alpha }_{\tau +1},\cdots, \bar {\alpha }_{\tau +n}}_{\text {parameter vector}}, \overbrace {\bar {\alpha }_{\tau +n+1},\cdots,\bar {\alpha }_{M+n}}^{\text {zero vector}}, \beta _{1},\cdots,\beta _{m}]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m},\end{align*}
with $\bar {\alpha }_{\tau +k}=\alpha _{k}$ for $k=1,\cdots,n$ and the remaining $\bar {\alpha }_{i}$ equal to zero.
Using the redundant rule based SG algorithm, the parameter vector $\bar { {\boldsymbol {\vartheta }}}$ is updated by \begin{align*} \bar { {\boldsymbol {\vartheta }}}(t)=&\bar { {\boldsymbol {\vartheta }}}(t-1)+\frac {\bar { {\boldsymbol {\phi }}}(t)}{\bar {r}(t)}[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}(t-1)], \\ \bar {r}(t)=&\bar {r}(t-1)+\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\phi }}}(t),\end{align*}
where \begin{align*} \bar { {\boldsymbol {\vartheta }}}(t)=&[\bar {\vartheta }_{1}(t),\cdots,\bar {\vartheta }_{M+n+m}(t)]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m}, \\ \bar { {\boldsymbol {\phi }}}(t)=&[\bar {\phi }_{1}(t),\cdots,\bar {\phi }_{M+n+m}(t)]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m}.\end{align*}
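As a concrete illustration, a minimal sketch of how the augmented information vector and the redundant rule SG recursion above could be implemented is given below; the arrays y and u, the orders n and m, and the delay bound M are assumptions used only for illustration.

```python
import numpy as np

def phi_bar(y, u, t, n, m, M):
    """Augmented information vector phi_bar(t) of dimension M + n + m:
    the M + n past outputs y(t-1), ..., y(t-M-n) followed by the
    m past inputs u(t-1), ..., u(t-m) (a sketch)."""
    past_y = [y[t - k] for k in range(1, M + n + 1)]
    past_u = [u[t - k] for k in range(1, m + 1)]
    return np.array(past_y + past_u)

def rr_sg_step(theta_bar, r_bar, y, u, t, n, m, M):
    """One recursion of the redundant rule based SG algorithm (a sketch)."""
    pb = phi_bar(y, u, t, n, m, M)
    r_bar = r_bar + pb @ pb
    theta_bar = theta_bar + pb / r_bar * (y[t] - pb @ theta_bar)
    return theta_bar, r_bar
```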
Remark 1:
The threshold that separates the near-zero (redundant) elements from the true parameter elements in $\bar { {\boldsymbol {\vartheta }}}(t)$ determines the time-delay estimate; since the redundant rule based SG algorithm does not drive these elements exactly to zero, the choice of this threshold is not trivial.
The Redundant Rule Based Lasso Regression SG Algorithm
The lasso regression method is usually applied to overcome overfitting when some secondary factors are involved in the model structure. In this paper, the lasso regression method is extended to the time-delayed model to pick out the zero vectors from the parameter vector.
A. Algorithm Designing
To use the lasso regression method for the time-delayed model, first let us introduce the following lemma.
Lemma 1:
For the function $f(x)=|x|$, its derivative (subdifferential) is \begin{align*} |x|'=\left \{{\begin{array}{ll} 1, &{\mathrm{ if}} ~x>0, \\ {}[-1,1], &{\mathrm{ if}} ~x=0, \\ -1, &{\mathrm{ if}} ~x < 0. \end{array}}\right.\end{align*}
Define the cost function as \begin{equation*} J(\bar { {\boldsymbol {\vartheta }}})=J_{1}(\bar { {\boldsymbol {\vartheta }}})+J_{2}(\bar { {\boldsymbol {\vartheta }}})=\frac {1}{2}[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}]^{2}+\frac {\lambda }{2}\|\bar { {\boldsymbol {\vartheta }}}\|_{1},\end{equation*}
\begin{align*} J_{1}(\bar { {\boldsymbol {\vartheta }}})=&\frac {1}{2}[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}]^{2} \\=&\frac {1}{2}[y(t)-\bar {\phi }_{1}(t)\bar {\vartheta }_{1}-\cdots -\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}]^{2}, \\ J_{2}(\bar { {\boldsymbol {\vartheta }}})=&\frac {\lambda }{2}|\bar {\vartheta }_{1}|+\cdots +\frac {\lambda }{2}|\bar {\vartheta }_{M+n+m}|.\end{align*}
Taking the derivative of $J_{1}(\bar { {\boldsymbol {\vartheta }}})$ with respect to $\bar {\vartheta }_{1}$ at $\bar { {\boldsymbol {\vartheta }}}(t-1)$ gives \begin{align*}&\hspace {-0.5pc}\frac {\partial J_{1}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}=-\bar {\phi }_{1}(t)[y(t)-\bar {\phi }_{1}(t)\bar {\vartheta }_{1}(t-1)-\cdots \\&-\,\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}(t-1)].\tag{2}\end{align*}
By Lemma 1, the subgradient of $J_{2}(\bar { {\boldsymbol {\vartheta }}})$ with respect to $\bar {\vartheta }_{1}$ at $\bar { {\boldsymbol {\vartheta }}}(t-1)$ is \begin{align*} \frac {\partial J_{2}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}=\left \{{\begin{array}{ll} \dfrac {\lambda }{2}, &{\mathrm{ if}} ~\bar {\vartheta }_{1}>0, \\ \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], &{\mathrm{ if}} ~\bar {\vartheta }_{1}=0, \\ -\dfrac {\lambda }{2}, &{\mathrm{ if}} ~\bar {\vartheta }_{1} < 0. \end{array}}\right.\tag{3}\end{align*}
Then, the parameter element $\bar {\vartheta }_{1}$ can be updated along the subgradient direction by \begin{align*} \bar {\vartheta }_{1}(t)=&\bar {\vartheta }_{1}(t-1)-r\left [{\frac {\partial J_{1}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}+\frac {\partial J_{2}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}}\right] \\=&\bar {\vartheta }_{1}(t-1)+r\bar {\phi }_{1}(t)[y(t)-\bar {\phi }_{1}(t)\bar {\vartheta }_{1}(t-1)-\cdots \\&-\,\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}(t-1)]-r\frac {\partial J_{2}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)},\end{align*}
where $r$ is the step-size.
That is, \begin{align*}&\hspace {-1.5pc}\bar {\vartheta }_{1}(t) \\=&\bar {\vartheta }_{1}(t-1)-r\left [{\frac {\partial J_{1}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}+\frac {\partial J_{2}(\bar { {\boldsymbol {\vartheta }}})}{\partial \bar {\vartheta }_{1}}\Big |_{\bar { {\boldsymbol {\vartheta }}}(t-1)}}\right] \\=&\left \{{\begin{array}{l} \bar {\vartheta }_{1}(t-1)+r\bar {\phi }_{1}(t)[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}(t-1)]-r\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{1}(t)>\dfrac {\lambda }{2}, \\ 0, ~{\mathrm{ if}} ~m_{1}(t)\in \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], \\ \bar {\vartheta }_{1}(t-1)+r\bar {\phi }_{1}(t)[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}(t-1)]+r\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{1}(t) < -\dfrac {\lambda }{2}, \end{array}}\right.\end{align*}
where \begin{align*}&\hspace {-0.5pc}m_{1}(t)=\bar {\phi }_{1}(t)[y(t)-\bar {\phi }_{2}(t)\bar {\vartheta }_{2}(t-1)-\cdots \\&-\,\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}(t-1)].\tag{4}\end{align*}
Therefore, we can get the redundant rule based lasso regression SG (RR-LR-SG) algorithm \begin{align*} \bar {\vartheta }_{i}(t)=&\left \{{\!\begin{array}{l} \bar {\vartheta }_{i}(t-1)+r_{i}\bar {\phi }_{i}(t)[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}(t-1)]-r_{i}\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{i}(t)>\dfrac {\lambda }{2}, \\ 0, ~{\mathrm{ if}} ~m_{i}(t)\in \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], \\ \bar {\vartheta }_{i}(t-1)+r_{i}\bar {\phi }_{i}(t)[y(t)-\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}(t-1)]+r_{i}\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{i}(t) < -\dfrac {\lambda }{2}, \end{array}\!}\right. \\ m_{i}(t)=&\bar {\phi }_{i}(t)[y(t)-\bar {\phi }_{1}(t)\bar {\vartheta }_{1}(t-1)-\cdots -\bar {\phi }_{i-1}(t)\bar {\vartheta }_{i-1}(t-1) \\&-\,\bar {\phi }_{i+1}(t)\bar {\vartheta }_{i+1}(t-1)-\cdots -\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}(t-1)],\end{align*}
where $r_{i}$ is the step-size of the $i$-th element, $i=1,\cdots,M+n+m$.
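A minimal sketch of one RR-LR-SG recursion is given below; the per-element step-sizes are passed in as a vector r, since Section IV discusses several ways to compute them (the function and argument names are assumptions, not part of the original algorithm statement).

```python
import numpy as np

def rr_lr_sg_step(theta_bar, phi_t, y_t, r, lam):
    """One RR-LR-SG recursion: every element gets its own step-size r[i]
    and direction, or is set exactly to zero (a sketch)."""
    residual = y_t - phi_t @ theta_bar            # y(t) - phi_bar^T(t) theta_bar(t-1)
    theta_new = theta_bar.copy()
    for i in range(len(theta_bar)):
        # m_i(t): residual excluding the i-th element, scaled by phi_i(t)
        m_i = phi_t[i] * (residual + phi_t[i] * theta_bar[i])
        if m_i > lam / 2:
            theta_new[i] = theta_bar[i] + r[i] * phi_t[i] * residual - r[i] * lam / 2
        elif m_i < -lam / 2:
            theta_new[i] = theta_bar[i] + r[i] * phi_t[i] * residual + r[i] * lam / 2
        else:
            theta_new[i] = 0.0                    # redundant (zero) element picked out
    return theta_new
```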
Remark 2:
In the improved lasso SG algorithm, when the residual error $m_{i}(t)$ falls into the interval $[-\frac {\lambda }{2},\frac {\lambda }{2}]$, the corresponding element $\bar {\vartheta }_{i}(t)$ is set exactly to zero. This is how the zero sub-vectors are picked out from $\bar { {\boldsymbol {\vartheta }}}(t)$, and the time-delay estimate follows from the positions of these zero elements.
Remark 3:
In the improved lasso SG algorithm, each parameter element has its own direction and step-size. Thus it can update the parameters adaptively when the elements have different orders of magnitude.
Remark 4:
The improved lasso SG algorithm has faster convergence rates than the traditional SG algorithm because of its adaptive property. However, its computational efforts are heavier than those of the traditional SG algorithm because it should compute the residual $m_{i}(t)$ and an individual step-size for each of the $M+n+m$ elements at every sampling instant.
B. Step-Size Choosing
The step-size is important in SG algorithm design: a small step-size leads to a slow convergence rate, while a large one may yield divergent results. Next, three step-size design methods are given.
Method 1:
For simplicity, let \begin{align*}&\hspace {-0.5pc}n_{i}(t)=\bar {\phi }_{i}(t)[y(t)-\bar {\phi }_{1}(t)\bar {\vartheta }_{1}(t-1)-\cdots \\&-\,\bar {\phi }_{M+n+m}(t)\bar {\vartheta }_{M+n+m}(t-1)].\tag{5}\end{align*}
Rewrite the parameter element update as
\begin{align*} \bar {\vartheta }_{i}(t)=\left \{{\begin{array}{l} \bar {\vartheta }_{i}(t-1)+r_{i}\left({n_{i}(t)-\dfrac {\lambda }{2}}\right), ~{\mathrm{ if}} ~m_{i}(t)>\dfrac {\lambda }{2}, \\ 0, ~{\mathrm{ if}} ~m_{i}(t)\in \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], \\ \bar {\vartheta }_{i}(t-1)+r_{i}\left({n_{i}(t)+\dfrac {\lambda }{2}}\right), ~{\mathrm{ if}} ~m_{i}(t) < -\dfrac {\lambda }{2}. \end{array}}\right.\end{align*}
Substituting the above equation into the cost function $J(\bar { {\boldsymbol {\vartheta }}})$ and taking the derivative of $J(r_{i})$ with respect to $r_{i}$ yields \begin{align*} r_{i}(t)=\left \{{\!\begin{array}{l}\dfrac {0.5\lambda -\bar {\phi }_{i}(t)n_{i}(t)}{\bar {\phi }^{2}_{i}(t)(n_{i}(t)-\lambda)}, ~{\mathrm{ if}} ~m_{i}(t)>\dfrac {\lambda }{2}, \\ \dfrac {\bar {\phi }_{i}(t)n_{i}(t)-0.5\lambda }{\bar {\phi }^{2}_{i}(t)(n_{i}(t)+\lambda)}, ~{\mathrm{ if}} ~m_{i}(t) < -\dfrac {\lambda }{2}.\end{array}\!}\right.\end{align*}
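A direct transcription of these two expressions into code (a sketch; phi_i, n_i, m_i and lam stand for $\bar {\phi }_{i}(t)$, $n_{i}(t)$, $m_{i}(t)$ and $\lambda$) could read:

```python
def method1_step_size(phi_i, n_i, m_i, lam):
    """Step-size r_i(t) of Method 1, transcribed from the closed-form
    expressions above (a sketch; the zero branch needs no step-size)."""
    if m_i > lam / 2:
        return (0.5 * lam - phi_i * n_i) / (phi_i ** 2 * (n_i - lam))
    if m_i < -lam / 2:
        return (phi_i * n_i - 0.5 * lam) / (phi_i ** 2 * (n_i + lam))
    return 0.0
```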
Remark 5:
Method 1 computes the step-size by solving the derivative function of $J(r_{i})$; it is the same as the steepest descent SG algorithm in [26].
Method 2:
Consider that the cost function $J_{2}(\bar { {\boldsymbol {\vartheta }}})$ is used to choose the small weights from the parameter vector; when computing the step-size, its influence can be neglected. Substituting the parameter estimates into the cost function and taking the derivative of $J(r_{i})$ with respect to $r_{i}$ yields \begin{equation*} r_{i}(t)=\frac {1}{\bar {\phi }^{2}_{i}(t)}.\end{equation*}
Since a smaller $\bar {\phi }_{i}(t)$ may yield a larger step-size, the algorithm can oscillate intensively. To overcome this difficulty, we introduce a small constant $\rho$. Then the step-size can be written as \begin{equation*} r_{i}(t)=\frac {1}{\rho +\bar {\phi }^{2}_{i}(t)}.\end{equation*}
Noting that the step-size should become smaller and smaller as the estimates approach the true values, another kind of step-size is \begin{equation*} r_{i}(t)=\frac {1}{\sum \limits _{j=1}^{t-1}\bar {\phi }^{2}_{i}(j)}.\end{equation*}
In this case, the algorithm is more stable.
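A sketch of the three step-size choices of Method 2 is given below; phi_t stands for the current vector $\bar { {\boldsymbol {\phi }}}(t)$, phi_history stacks the past information vectors, rho is the small constant $\rho$, and the default value 1e-6 is only illustrative.

```python
import numpy as np

def method2_step_sizes(phi_t, phi_history, rho=1e-6):
    """The three per-element step-size choices of Method 2 (a sketch):
    plain inverse, rho-damped inverse, and accumulated-sum inverse."""
    plain = 1.0 / phi_t ** 2                          # may oscillate for small phi_i(t)
    damped = 1.0 / (rho + phi_t ** 2)                 # small rho avoids the blow-up
    summed = 1.0 / np.sum(phi_history ** 2, axis=0)   # sum over instants j = 1, ..., t-1
    return plain, damped, summed
```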
Method 3:
Inspired by the intelligent search method, we can use many step-sizes to get a better parameter estimate.
For example, when computing the step-size $r_{i}(t)$, we first give an interval $[0,r^{max}_{i}(t)]$ and choose $N$ different step-sizes $r^{j}_{i}(t)$, $j=1,2,\cdots,N$, from this interval. Substituting these $N$ step-sizes into the parameter estimates gets \begin{align*} \bar {\vartheta }^{j}_{i}(t)=\left \{{\begin{array}{l} \bar {\vartheta }_{i}(t-1)+r^{j}_{i}\left({n_{i}(t)-\dfrac {\lambda }{2}}\right), ~{\mathrm{ if}} ~m_{i}(t)>\dfrac {\lambda }{2}, \\ 0, ~{\mathrm{ if}} ~m_{i}(t)\in \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], \\ \bar {\vartheta }_{i}(t-1)+r^{j}_{i}\left({n_{i}(t)+\dfrac {\lambda }{2}}\right), ~{\mathrm{ if}} ~m_{i}(t) < -\dfrac {\lambda }{2}. \end{array}}\right.\end{align*}
Then their corresponding $N$ cost functions $J(\bar {\vartheta }^{j}_{i}(t))$ are obtained, and the best estimate is the one that satisfies \begin{equation*} \bar {\vartheta }_{i}(t)=\arg \min _{\bar {\vartheta }^{j}_{i}(t)}[J(\bar {\vartheta }^{j}_{i}(t)), j=1,\cdots,N].\end{equation*}
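A sketch of this search for a single element, assuming a user-supplied evaluate_cost helper that returns the cost $J$ for a candidate value of the element (all names are illustrative):

```python
import numpy as np

def method3_search(theta_i_prev, n_i, m_i, lam, r_max, N, evaluate_cost):
    """Intelligent search of Method 3 for one element: try N candidate
    step-sizes in [0, r_max] and keep the estimate with the smallest cost."""
    if -lam / 2 <= m_i <= lam / 2:
        return 0.0                                   # the zero branch needs no search
    shift = -lam / 2 if m_i > lam / 2 else lam / 2
    candidates = np.linspace(0.0, r_max, N)
    estimates = [theta_i_prev + r * (n_i + shift) for r in candidates]
    costs = [evaluate_cost(v) for v in estimates]    # J for each candidate estimate
    return estimates[int(np.argmin(costs))]
```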
Remark 6:
In Method 3, the choice of the upper bound $r^{max}_{i}(t)$ involves a trade-off: a larger upper bound enlarges the search space, while a finer search over that space requires a larger $N$ and hence more computation.
Remark 7:
For a small upper bound, we usually assign
Remark 8:
Since the intelligent search method should compute several cost functions in each iteration, its computational efforts are heavy.
C. Properties of the Algorithm
Define the parameter estimation error of each element as \begin{equation*} e_{i}(t)=\bar {\vartheta }_{i}(t)-\bar {\vartheta }_{i},\end{equation*} and collect these errors in the vector $E(t)=[e_{1}(t),\cdots,e_{M+n+m}(t)]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m}$.
Substituting $y(t)=\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)\bar { {\boldsymbol {\vartheta }}}+v(t)$ into the RR-LR-SG algorithm and subtracting $\bar {\vartheta }_{i}$ from both sides gives \begin{align*}&\hspace {-1.2pc}e_{i}(t) \\=&\left \{{\begin{array}{l} e_{i}(t-1)-r_{i}\bar {\phi }_{i}(t)[\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)E(t-1)-v(t)]-r_{i}\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{i}(t)>\dfrac {\lambda }{2}, \\ 0, ~{\mathrm{ if}} ~m_{i}(t)\in \left[{-\dfrac {\lambda }{2},\dfrac {\lambda }{2}}\right], \\ e_{i}(t-1)-r_{i}\bar {\phi }_{i}(t)[\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)E(t-1)-v(t)]+r_{i}\dfrac {\lambda }{2}, \\ \qquad \quad {\mathrm{ if}} ~m_{i}(t) < -\dfrac {\lambda }{2}.\end{array}}\right. \\\tag{6}\end{align*}
For simplicity, we only consider the first equation in (6). The parameter vector error can then be written as \begin{align*} E(t)=E(t-1)-R(t-1)\bar { {\boldsymbol {\phi }}}(t)[\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)E(t-1)-v(t)]-R(t-1) {\boldsymbol {\lambda }},\tag{7}\end{align*}
where \begin{align*} R(t-1)=&{\mathrm{ diag}}[r_{1}(t-1),r_{2}(t-1),\cdots,r_{M+n+m}(t-1)], \\ {\boldsymbol {\lambda }}=&\left[{\frac {\lambda }{2},\frac {\lambda }{2},\cdots,\frac {\lambda }{2}}\right]^{ {\rm\text { T}}}\in {\mathbb R} ^{M+n+m}.\end{align*}
Neglecting the noise term $v(t)$, Equation (7) can be written as \begin{equation*} E(t)= [I-R(t-1)\bar { {\boldsymbol {\phi }}}(t)\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)]E(t-1)-R(t-1) {\boldsymbol {\lambda }}.\tag{8}\end{equation*}
If the step-sizes are chosen such that \begin{equation*} \varrho _{max}[I-R(t-1)\bar { {\boldsymbol {\phi }}}(t)\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)] < 1,\end{equation*} where $\varrho _{max}[\cdot]$ denotes the largest eigenvalue, the homogeneous part of (8) is contractive and the estimation error remains bounded.
Taking the expectation on both sides of Equation (7) obtains \begin{align*} { \boldsymbol E}[E(t)]=&{ \boldsymbol E}\{[I-R(t-1)\bar { {\boldsymbol {\phi }}}(t)\bar { {\boldsymbol {\phi }}}^{ {\rm\text { T}}}(t)]E(t-1)\}-{ \boldsymbol E}[R(t-1) {\boldsymbol {\lambda }}] \\ \neq&{\mathbf{0}},\end{align*}
because the term $R(t-1) {\boldsymbol {\lambda }}$ introduced by the lasso penalty does not vanish.
Remark 9:
The above equation shows that the redundant rule based lasso SG algorithm is a biased algorithm.
Example
Consider the following time-delayed model, \begin{align*} y(t)=&0.12z^{-1}y(t-\tau)+[1.2z^{-1}+0.85z^{-2}]u(t)+v(t), \\ u(t)\sim&N(0,1), \quad v(t)\sim N(0,0.1^{2}),\end{align*}
where the true time-delay is $\tau =1$ and the upper bound of the time-delay is taken as $M=4$.
The corresponding augmented parameter vector and information vector are \begin{align*} \bar { {\boldsymbol {\vartheta }}}=&[\bar {\vartheta }_{1},\bar {\vartheta }_{2},\bar {\vartheta }_{3},\bar {\vartheta }_{4},\bar {\vartheta }_{5},\bar {\vartheta }_{6},\bar {\vartheta }_{7}]^{ {\rm\text { T}}} \\=&[{0,0.12,0,0,0,1.2,0.85}]^{ {\rm\text { T}}}, \\ \bar { {\boldsymbol {\phi }}}(t)=&[y(t-1),y(t-2),y(t-3),y(t-4),y(t-5), \\&~u(t-1),u(t-2)]^{ {\rm\text { T}}}.\end{align*}
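As a sketch, the example data could be generated and the RR-LR-SG recursion run as follows; this reuses the phi_bar and rr_lr_sg_step helpers from the earlier sketches, and the random seed, data length, and λ value are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m, M = 3000, 1, 2, 4               # data length, model orders, delay bound
tau, lam = 1, 0.1                        # true time-delay; lasso weight (illustrative)

u = rng.normal(0.0, 1.0, T)              # u(t) ~ N(0, 1)
v = rng.normal(0.0, 0.1, T)              # v(t) ~ N(0, 0.1^2)
y = np.zeros(T)
for t in range(M + n + 1, T):
    y[t] = 0.12 * y[t - tau - 1] + 1.2 * u[t - 1] + 0.85 * u[t - 2] + v[t]

theta_bar = np.zeros(M + n + m)          # estimate of [0, 0.12, 0, 0, 0, 1.2, 0.85]
acc = np.zeros(M + n + m)                # accumulated phi_i^2 for the step-sizes
for t in range(M + n + 1, T):
    pb = phi_bar(y, u, t, n, m, M)       # augmented information vector (earlier sketch)
    acc += pb ** 2
    r = 1.0 / np.maximum(acc, 1e-12)     # accumulated-sum step-sizes (Method 2)
    theta_bar = rr_lr_sg_step(theta_bar, pb, y[t], r, lam)
```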
To show the roles of the constants in the algorithm, the SG and RR-LR-SG algorithms are applied to this model with different values of $\lambda$; the resulting estimates and estimation errors are reported in Table 1, Table 2, and Figure 1.
In addition, we apply the RR-LR-SG algorithm to the considered model with different signal-to-noise ratios (SNR); the parameter estimation errors are shown in Figure 2. Finally, the step-size choosing methods proposed in Section IV are applied to the time-delayed model, and the parameter estimation errors are shown in Figure 3.
From this example, the following findings can be obtained:
The parameter estimates obtained by the SG algorithm asymptotically converge to a stable point but not to the optimal point, because the step-sizes approach zero as the sampling instant $t$ increases, as shown in Table 1 and Figure 1.
The parameter estimates obtained by the RR-LR-SG algorithm are more accurate than those of the SG algorithm, because the RR-LR-SG algorithm updates the parameter elements one by one with adaptive step-sizes and directions, as shown in Table 1 and Figure 1.
The RR-LR-SG algorithm can pick out the redundant elements from the parameter vector, while the SG algorithm cannot, as shown in Table 1.
A larger $\lambda$ can lead to a more accurate time-delay estimate but less accurate parameter estimates, as demonstrated in Table 2.
When using the RR-LR-SG algorithm for the proposed time-delayed model, a larger SNR leads to more accurate parameter estimates, see Figure 2.
Figure 3 shows that the intelligent search method (Method 3) gives the best step-size, followed by Method 2, while Method 1 gives the poorest step-size.
Conclusion
An improved redundant rule based lasso regression stochastic gradient (RR-LR-SG) algorithm is proposed for the time-delayed model in this study. The proposed algorithm has two advantages over the traditional SG algorithm: (1) each element in the parameter vector is updated adaptively, with its own step-size and direction; (2) the parameter and time-delay estimates are obtained simultaneously. The convergence analysis and an illustrative example show that the proposed algorithm is effective.
Although the proposed algorithm can get the estimates adaptively, some challenging and interesting problems remain. For example, the choice of the values of the constants, such as $\lambda$ and $\rho$, deserves further study.