A Class of Diffusion Zero Attracting Stochastic Gradient Algorithms With Exponentiated Error Cost Functions

In this paper, a class of diffusion zero-attracting stochastic gradient algorithms with exponentiated error cost functions is put forward, which performs well in sparse system identification. Distributed estimation algorithms based on the popular mean-square error criterion perform poorly in sparse system identification with colored noise. To overcome this drawback, a class of stochastic gradient least exponentiated (LE) algorithms with exponentiated error cost functions was proposed, which achieves a lower steady-state error than the least mean square (LMS) algorithm. However, these LE algorithms may suffer from performance deterioration when the system is sparse. For sparse system identification over adaptive networks, a polynomial variable scaling factor improved diffusion least sum of exponentials (PZA-VSIDLSE) algorithm and an $l_{\mathrm{p}}$-norm constraint diffusion least exponentiated square (LP-DLE2) algorithm are proposed in this work. Instead of the $l_{1}$-norm penalty, an $l_{\mathrm{p}}$-norm penalty and a polynomial zero attractor are employed in the cost functions of the LE algorithms. Then, we derive the mean and mean-square behavior models of the LP-DLE2 algorithm under several common assumptions. Moreover, simulations of sparse system identification over distributed networks show that the proposed algorithms achieve a lower steady-state error than the existing algorithms.


I. INTRODUCTION
Distributed adaptive estimation over wireless sensor networks is an emerging research field in adaptive signal processing that has recently attracted significant attention. Distributed adaptive estimation processes the data collected at nodes deployed at different locations, and it has been applied in environment monitoring, precision agriculture, source localization and tracking, and disaster relief management [1]-[3]. Moreover, it estimates parameters through cooperation and information exchange among the interconnected nodes of the adaptive network. Two cooperation strategies have been studied in previous work, namely the incremental strategy and the diffusion strategy. In the incremental strategy, each node communicates only with its adjacent node, so the network is organized as a cycle [4]. However, this cyclic structure is fragile and fails if any one of the nodes breaks down. By contrast, the diffusion strategy is more robust and is widely used in distributed estimation [5]. In the diffusion strategy, each node communicates with all of its neighbors and fuses their local estimates through a combination step. The diffusion strategy therefore consists of a combination stage and an adaptation stage; depending on their order, the combine-then-adapt (CTA) and the adapt-then-combine (ATC) diffusion strategies were put forward [6]. It has been shown that the ATC type outperforms the CTA type [7]-[9].
The associate editor coordinating the review of this manuscript and approving it for publication was Theofanis P. Raptis.
In the family of gradient descent (GD) algorithms, the least mean square (LMS) algorithm was first proposed in [10]. Subsequently, a growing number of GD algorithms with better system identification performance were developed [11], [12]. Accordingly, a class of diffusion GD algorithms was put forward, among which the diffusion least-mean-square (DLMS) algorithm is frequently used. In [13], a class of stochastic gradient least exponentiated (LE) algorithms with exponentiated error cost functions was proposed. Among the LE algorithms, the least exponentiated square (LE2) algorithm achieves a low steady-state error for colored inputs, and the least sum of exponentials (LSE) algorithm exhibits good robustness. However, GD algorithms suffer from performance deterioration when the system is sparse. To overcome this problem, zero attractors have been incorporated into the cost function for sparse system identification.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In practice, sparse systems are commonly encountered, for example, in the acoustic paths of hearing aids and in active noise control systems. Sparse systems are those in which most of the coefficients of the impulse response are zero or close to zero [14]. However, most traditional algorithms do not exploit sparsity. Motivated by compressive sensing methods, a popular class of sparse LMS algorithms employs the concept of zero attraction. When most of the coefficients are zero, the zero attractor forces the small coefficients of the model toward zero. For this purpose, an $l_1$-norm penalty was added to the quadratic LMS cost function. In addition, attempts have been made to employ the $l_p$-norm ($0 \le p \le 1$) instead of the $l_1$-norm. However, zero-attracting sparse algorithms have a limitation: the zero attractor pulls all taps toward zero uniformly [15]. Thus, the reweighted zero-attracting LMS (RZA-LMS) algorithm was proposed, which applies a differently weighted zero attractor to each tap. Advances in reweighted zero-attracting methods have alleviated this limitation in recent years; for example, the polynomial sparse algorithm, whose performance depends on a properly selected set of algorithm parameters, performs similarly to $l_0$-norm methods [16].
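To make the uniform-versus-reweighted distinction concrete, the sketch below compares a ZA-LMS style attractor $\rho\,\mathrm{sgn}(\mathbf{w})$ with an RZA-LMS style attractor $\rho\,\mathrm{sgn}(\mathbf{w})/(1+\varepsilon|\mathbf{w}|)$, in the form common in the sparse-LMS literature. These formulas and the values of ρ and ε are assumptions for illustration, not equations from this paper; in both cases the term would be subtracted from an ordinary LMS update.

```python
import numpy as np

def za_attractor(w, rho):
    """ZA-LMS style attractor: the same pull rho*sgn(w) on every nonzero tap."""
    return rho * np.sign(w)

def rza_attractor(w, rho, eps):
    """RZA-LMS style attractor: the pull shrinks as |w| grows (reweighting)."""
    return rho * np.sign(w) / (1.0 + eps * np.abs(w))

# Hypothetical taps: one exact zero, two tiny taps, one large tap.
w = np.array([0.0, 0.01, -0.01, 1.0])
print("ZA :", za_attractor(w, rho=1e-3))
print("RZA:", rza_attractor(w, rho=1e-3, eps=10.0))
```

With ε = 10, the large tap receives about ten times less pull than the tiny taps under RZA, whereas ZA pulls both equally hard, which is the uniformity limitation cited from [15].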
So far, zero-attracting methods have rarely been used in distributed estimation. Although the zero-attracting DLMS (ZA-DLMS) and reweighted zero-attracting DLMS (RZA-DLMS) algorithms are well known, diffusion zero-attracting algorithms can be improved further. Thus, in this paper, an $l_p$-norm constraint diffusion least exponentiated square (LP-DLE2) algorithm and a polynomial variable scaling factor improved diffusion least sum of exponentials (PZA-VSIDLSE) algorithm are proposed. To exploit the merits of the $l_p$-norm and the polynomial sparse algorithm, we apply them to the diffusion LE2 algorithm and the diffusion improved LSE algorithm, respectively. In addition, a variable scaling factor, which provides a fast convergence rate and a low steady-state error [17], is employed in the improved LSE algorithm. Furthermore, we develop the ATC and CTA variants, namely the ATC-LP-DLE2, ATC-PZA-VSIDLSE, CTA-LP-DLE2 and CTA-PZA-VSIDLSE algorithms. Moreover, we derive the mean and mean-square behavior models of the ATC-type LP-DLE2 algorithm. Simulation results demonstrate that the proposed algorithms achieve a lower steady-state error than several existing algorithms in different environments.
Notation: We use normal letters to denote scalars and boldface letters to denote vectors or matrices. The mathematical notations used in this paper are summarized in Table 1.

II. REVIEW OF THE RELATED ALGORITHMS
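A. THE LEAST EXPONENTIATED SQUARE (LE2) ALGORITHM
Consider the linear data model $d(i) = \mathbf{u}_i^T\mathbf{w}^o + v(i)$, where $d(i)$ denotes the output random scalar, $\mathbf{u}_i = [u(i), u(i-1), \ldots, u(i-L+1)]^T$ is the $L$-dimensional input vector, $\mathbf{w}^o$ is the unknown weight vector, and $v(i)$ is the background noise. We define the error signal $e(i) = d(i) - \mathbf{u}_i^T\mathbf{w}_{i-1}$, where $\mathbf{w}_i$ is the estimate of $\mathbf{w}^o$ at time $i$.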
Given this data model, the LE2 algorithm is derived by minimizing a differentiable cost function of the exponentiated error, $J_{e2}(i)$, as formulated in [13]. Using the gradient vector of $J_{e2}(i)$, the filter weight update formula follows from the steepest-descent method. However, the step size of the LE2 algorithm is restrictive. To overcome this problem, the a posteriori error is minimized at every iteration, which results in a normalized step size $\mu_{e2}(i)$ [13], and the range of the normalized step-size parameter is $0 \le \mu_{e2} \le 2$. A positive multiplicative factor $\alpha$ is introduced to control the steepness of the error surface.
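The steepest-descent step can be sketched as follows, assuming the LE2 cost takes the form $J_{e2}(i) = \exp(\alpha e^2(i))$, which matches the description above (exponentiated squared error, with α controlling steepness) but is not copied from [13]. Its gradient is $-2\alpha e(i)\exp(\alpha e^2(i))\,\mathbf{u}_i$, and the constant $2\alpha$ is absorbed into the step size. The normalized step size of [13] is not reproduced; a small fixed step is used instead, and the regressors are drawn i.i.d. rather than from a tapped delay line. The function name and all parameter values are hypothetical.

```python
import numpy as np

def le2_identify(w_o, n_iter=5000, mu=0.005, alpha=0.1, noise_std=0.03, seed=0):
    """Hypothetical single-filter LE2 sketch: steepest descent on J = exp(alpha*e^2)."""
    rng = np.random.default_rng(seed)
    L = len(w_o)
    w = np.zeros(L)
    for _ in range(n_iter):
        u = rng.standard_normal(L)                      # white Gaussian regressor
        d = u @ w_o + noise_std * rng.standard_normal() # linear data model
        e = d - u @ w                                   # a priori error e(i)
        # -dJ/dw = 2*alpha*e*exp(alpha*e^2)*u; the constant 2*alpha is folded into mu.
        w = w + mu * e * np.exp(alpha * e**2) * u
    return w

w_o = np.array([0.0, 1.0, 0.0, 0.0])                    # sparse toy system
w = le2_identify(w_o)
print("MSD:", np.sum((w - w_o) ** 2))
```

The factor $\exp(\alpha e^2)$ enlarges the step when the error is large; with a fixed step this is also why α and µ must be kept small, which is the restriction the normalized step size is meant to remove.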

B. THE LEAST SUM OF EXPONENTIALS (LSE) ALGORITHM
The LSE algorithm belongs to the class of least exponentiated algorithms and is derived by minimizing a similar cost function: the hyperbolic cosine of the adaptive error. Fig. 1 compares three stochastic cost functions, namely the mean-square cost function, the exponential square cost function and the hyperbolic cosine cost function. Both the exponential square and the hyperbolic cosine cost functions are steeper for larger error values, while the surface of the hyperbolic cosine cost function lies between those of the mean-square and the exponential square cost functions.
The cost function of the LSE algorithm is given in [13]. The hyperbolic cosine is a linear combination of all even powers of the adaptive error, so the LSE cost carries more information about the error. Hence, according to the steepest-descent method, the weight update rule can be expressed with a fixed step size µ. As for the LE2 algorithm, the step size can be normalized, and the range of the normalized step-size parameter $\mu_{se}$ is the same as that of the LE2 algorithm [13]. In addition, a positive factor α is introduced to control the effect of the exponential terms.
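The even-power claim can be checked numerically: $\cosh(x) = \sum_{n\ge 0} x^{2n}/(2n)!$, and a sum of exponentials $e^{\alpha e}+e^{-\alpha e}$ equals $2\cosh(\alpha e)$. The sketch below assumes the LSE cost is $J = \cosh(\alpha e(i))$ (a constant factor away from the sum of exponentials), so the gradient is $-\alpha\sinh(\alpha e)\,\mathbf{u}_i$ with α absorbed into µ. The helper names are hypothetical and the normalization of [13] is omitted.

```python
import math
import numpy as np

def cosh_series(x, n_terms):
    """Truncated Maclaurin series of cosh(x): only even powers of x appear."""
    return sum(x ** (2 * n) / math.factorial(2 * n) for n in range(n_terms))

def lse_step(w, u, d, mu, alpha):
    """One steepest-descent step on the assumed cost J = cosh(alpha*e).

    dJ/dw = -alpha*sinh(alpha*e)*u; the constant alpha is absorbed into mu.
    Returns the updated weights and the a priori error.
    """
    e = d - u @ w
    return w + mu * np.sinh(alpha * e) * u, e

# Usage: the even-power series matches math.cosh.
print(cosh_series(1.5, 12), math.cosh(1.5))

w = np.zeros(2)
u = np.array([1.0, 0.0])
w_new, e = lse_step(w, u, d=1.0, mu=0.1, alpha=1.0)
print("a priori:", e, "a posteriori:", 1.0 - u @ w_new)
```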

III. PROPOSED ALGORITHMS
A. THE $l_p$-NORM CONSTRAINT DIFFUSION LE2 ALGORITHM
Consider the system identification problem in an adaptive network composed of N nodes, where each node k measures data $\{d_k(i), \mathbf{u}_{k,i}\}$ to estimate the unknown weight vector $\mathbf{w}^o$ of length L through the linear model $d_k(i) = \mathbf{u}_{k,i}^T\mathbf{w}^o + v_k(i)$, where $d_k(i)$ is the output observation at node k, $\mathbf{u}_{k,i}$ is the input vector, and $v_k(i)$ represents the observation noise in the adaptive network. The error signal is defined as $e_k(i) = d_k(i) - \mathbf{u}_{k,i}^T\mathbf{w}_{k,i-1}$, where $\mathbf{w}_{k,i-1}$ is the estimate of $\mathbf{w}^o$ at time i−1 and node k.
Let $\mathcal{N}_k$ denote the set of nodes connected to node k (including node k itself); neighboring nodes share information with each other through the network topology. To derive the LP-DLE2 algorithm, the local cost function is formulated as a linear combination of an $l_p$-norm penalty on the coefficients and the weighted least exponentiated costs of the neighbors.
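where $a_{l,k}$ denotes the combination weight, γ is a positive factor that controls the zero attractor, and $\|\cdot\|_p$ is the $l_p$-norm. The combination weights satisfy $a_{l,k} \ge 0$, $\sum_{l\in\mathcal{N}_k} a_{l,k} = 1$, and $a_{l,k} = 0$ if $l \notin \mathcal{N}_k$, where $\mathbf{A}$ is an $N \times N$ matrix with individual entries $a_{l,k}$. Using the steepest-descent method at node k, we obtain a recursion for the local estimate. According to the linear combination assumption, the estimate $\mathbf{w}_{k,i-1}$ can be rewritten as $\mathbf{w}_{k,i-1} = \sum_{l\in\mathcal{N}_k} a_{l,k}\boldsymbol{\phi}_{l,i-1}$, where $\boldsymbol{\phi}_{l,i-1}$ is the local estimate at node l and time i−1.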
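A combination matrix satisfying these conditions can be built from the network adjacency; the sketch below uses the uniform rule $a_{l,k} = 1/|\mathcal{N}_k|$ that the simulations adopt, and stores $\phi_l$ as row l so that the combination $\mathbf{w}_k = \sum_l a_{l,k}\boldsymbol{\phi}_l$ becomes $\mathbf{A}^T\boldsymbol{\Phi}$. The helper names and the 4-node ring are hypothetical examples.

```python
import numpy as np

def uniform_combination_matrix(adjacency):
    """Uniform rule: a_{l,k} = 1/|N_k| if l is in N_k (N_k includes k), else 0.

    Column k holds the weights node k applies to its neighbours, so every
    column sums to one (A^T 1 = 1).
    """
    adj = np.asarray(adjacency, dtype=bool) | np.eye(len(adjacency), dtype=bool)
    n_k = adj.sum(axis=0)        # |N_k| = number of nonzero entries in column k
    return adj / n_k             # broadcasting divides column k by |N_k|

def combine(A, Phi):
    """Combination step w_k = sum_l a_{l,k} phi_l, with phi_l stored as row l of Phi."""
    return A.T @ Phi

# Usage on a hypothetical 4-node ring: node 0 is linked to nodes 1 and 3.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
A = uniform_combination_matrix(ring)
print(A)
print("column sums:", A.sum(axis=0))
```

Because each column is a convex combination, nodes that already agree stay unchanged after combining.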
In addition, $\|\mathbf{w}\|_p$ denotes the $l_p$-norm of the weight vector $\mathbf{w}$, defined as $\|\mathbf{w}\|_p = \left(\sum_{m=1}^{L}|w(m)|^p\right)^{1/p}$. Following [14], [17], $|w(m)|^p$ can be replaced with a smooth parametric function $F_\beta[w(m)]$, where β is a factor that controls the effective value of p in $\|\mathbf{w}\|_p$. In the limits β → ∞ and β → 0, $F_\beta[w(m)]$ approximates $|w(m)|^p$ with p = 0 and p = 1, respectively. Then, substituting (13) and (14) into (12) yields an iterative formula for the intermediate estimate.
where ρ is a positive factor. However, the gradient at node l cannot be evaluated at the linear-combination estimate $\mathbf{w}_{k,i-1}$, because that estimate is not available at node l. Following the approximation in [18], we assume that the difference between $\mathbf{w}_{k,i-1}$ and $\mathbf{w}_{l,i-1}$ is small, since both are linear-combination estimates of $\mathbf{w}^o$ at time i−1. Thus, the update formula of the $l_p$-norm constraint diffusion LE2 (LP-DLE2) algorithm is obtained from the steepest-descent form, where the local estimate $\boldsymbol{\phi}_{k,i-1}$ is replaced by the linear combination $\mathbf{w}_{k,i-1}$. In this way, the update uses more information from neighboring nodes than $\boldsymbol{\phi}_{k,i-1}$ alone. Considering the two-step iteration of the diffusion strategy, the ATC and CTA forms of the algorithm are summarized in Table 2.
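The pieces described above can be strung together into one possible ATC iteration: an LE2 adaptation step plus a zero attractor at every node, followed by combination over neighbors. This is a sketch under stated assumptions, not the update of Table 2: it assumes $J = \exp(\alpha e^2)$ as before and the surrogate $F_\beta(w) = 1 - e^{-\beta|w|}$ from the $l_0$-norm-constrained LMS literature, which tends to the indicator (p = 0) as β → ∞ and to $\beta|w|$ (proportional to p = 1) as β → 0. The constant ρ absorbs µ and γ. Function names and parameter values are hypothetical, and regressors are drawn i.i.d. per node.

```python
import numpy as np

def f_beta(w, beta):
    """Assumed smooth l_p surrogate F_beta(w) = 1 - exp(-beta*|w|), computed via expm1."""
    return -np.expm1(-beta * np.abs(w))

def f_beta_grad(w, beta):
    """Derivative of the assumed surrogate: beta*sgn(w)*exp(-beta*|w|)."""
    return beta * np.sign(w) * np.exp(-beta * np.abs(w))

def atc_lp_dle2(w_o, A, n_iter=3000, mu=0.005, alpha=0.1, rho=1e-5, beta=5.0,
                noise_std=0.03, seed=1):
    """Hypothetical ATC sketch: LE2 adaptation plus zero attraction, then combination."""
    rng = np.random.default_rng(seed)
    N, L = A.shape[0], len(w_o)
    W = np.zeros((N, L))                          # row k: estimate w_{k,i-1}
    msd = []
    for _ in range(n_iter):
        U = rng.standard_normal((N, L))           # row k: regressor u_{k,i}
        d = U @ w_o + noise_std * rng.standard_normal(N)
        e = d - np.sum(U * W, axis=1)             # a priori errors e_k(i)
        # Adapt: LE2 gradient step at w_{k,i-1} minus the l_p zero attractor.
        Phi = W + mu * (e * np.exp(alpha * e**2))[:, None] * U - rho * f_beta_grad(W, beta)
        # Combine: w_{k,i} = sum_l a_{l,k} phi_{l,i}.
        W = A.T @ Phi
        msd.append(np.mean(np.sum((W - w_o) ** 2, axis=1)))
    return W, np.array(msd)

# Usage: 5-node ring with uniform weights (self plus two neighbours), 8-tap sparse system.
N = 5
A = np.zeros((N, N))
for k in range(N):
    for l in (k - 1, k, k + 1):
        A[l % N, k] = 1.0 / 3.0
w_o = np.zeros(8)
w_o[3] = 1.0
W, msd = atc_lp_dle2(w_o, A)
print("final network MSD (dB):", 10 * np.log10(msd[-1]))
```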

B. THE POLYNOMIAL VARIABLE SCALING FACTOR DIFFUSION ILSE ALGORITHM
In [17], a variable scaling factor strategy is employed in the improved LSE (ILSE) algorithm, which provides a faster convergence rate and a smaller steady-state error. Inspired by the variable step-size method, a variable scaling factor $\lambda_k(i)$ is introduced into the diffusion ILSE algorithm. Moreover, the polynomial sparse adaptive algorithm is incorporated into the diffusion ILSE algorithm for sparse system identification. The polynomial zero attractor can be obtained by using an $l_0$-norm in conjunction with an $l_2$-norm in the objective function of the system identification problem, and, as in the RZA-LMS algorithm, its modeling accuracy depends on a properly selected set of algorithm parameters. Thus, a new cost function that incorporates the network topology is proposed, where α ≥ 1 is a power factor that must be selected properly, and the variable scaling factor $\lambda_k(i)$ plays a role similar to that of a variable step size at node k and time i. Its update scheme follows [17], with $\lambda_k(i) \in [\lambda_{\min}, \lambda_{\max}]$. According to [17], when $e_k(i)$ is large, a cost function with a larger λ has a larger gradient, resulting in faster convergence; otherwise, a cost function with a smaller λ has a flatter gradient, which improves filter stability. Thus, $\lambda_k(i)$ should be initialized close to $\lambda_{\max}$ to achieve a fast convergence rate, and 0 < β < 1 is a positive factor. Using the same method as for the LP-DLE2 algorithm, the update formula of the polynomial variable scaling factor improved diffusion LSE (PZA-VSIDLSE) algorithm can be obtained. Since the derivation of the ATC and CTA forms of the PZA-VSIDLSE algorithm parallels that of the LP-DLE2 algorithm, the update equations of the ATC-PZA-VSIDLSE and CTA-PZA-VSIDLSE algorithms are summarized in Table 3.
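The update scheme of [17] is not reproduced here. To illustrate only the qualitative behavior described above (large errors push λ up, small errors let it decay, and λ stays in $[\lambda_{\min}, \lambda_{\max}]$), the sketch below uses a hypothetical rule: exponential smoothing of $e_k^2(i)$ with forgetting factor β, followed by clipping. The rule, the function name and the numeric bounds are assumptions, not the paper's method.

```python
def update_scaling_factor(lam_prev, e, beta=0.9, lam_min=0.5, lam_max=2.0):
    """Hypothetical rule: smooth e^2 with forgetting factor beta, then clip to [lam_min, lam_max]."""
    lam = beta * lam_prev + (1.0 - beta) * e * e
    return min(max(lam, lam_min), lam_max)

# Usage: initialize near lam_max, as recommended for fast initial convergence.
lam = 2.0
for _ in range(50):            # transient: large errors keep lambda saturated at lam_max
    lam = update_scaling_factor(lam, 10.0)
print("after large errors:", lam)
for _ in range(200):           # steady state: small errors let lambda decay to lam_min
    lam = update_scaling_factor(lam, 0.0)
print("after small errors:", lam)
```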

IV. PERFORMANCE ANALYSIS
In this section, we analyze the transient behavior of the LP-DLE2 algorithm. Because the analyses of the proposed LP-DLE2 and PZA-VSIDLSE algorithms are similar, we consider only the LP-DLE2 algorithm; likewise, since the ATC and CTA forms are analyzed in the same way, we provide the analysis only for the ATC-type algorithm. To make the analysis tractable, the LP-DLE2 algorithm is rewritten in a unified model, specialized here to the ATC-LP-DLE2 algorithm. First, some necessary assumptions and approximations are introduced.
Assumption 1: The regressor $\mathbf{u}_{k,i}$ is temporally and spatially independent and identically distributed (i.i.d.) with zero mean [5], [6], [19].
Assumption 2: The observation noise $v_k(i)$ is i.i.d. with zero mean and variance $\sigma^2_{v,k}$. In addition, it is independent of $\mathbf{u}_{k,i}$ [1], [20].
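Assumption 3: Let $\tilde{w}_{k,i}(m)$ denote the m-th entry of the weight error vector $\tilde{\mathbf{w}}_{k,i} = \mathbf{w}^o - \mathbf{w}_{k,i}$ at node k and time i. It follows the Gaussian distribution $\tilde{w}_{k,i}(m) \sim \mathcal{N}(\mu_{k,i}(m), \sigma^2_{k,i}(m))$ with mean $\mu_{k,i}(m)$ and variance $\sigma^2_{k,i}(m)$. Consequently, the m-th entry of the weight vector $\mathbf{w}_{k,i}$ follows the Gaussian distribution [21]-[23], [25], [26]
$$w_{k,i}(m) \sim \mathcal{N}\left(w^o(m) - \mu_{k,i}(m),\ \sigma^2_{k,i}(m)\right).$$
Approximation 1: When m = n, the approximations in [23] can be used.
Approximation 2: When the fluctuation of $w_{k,i}(m)$ from one iteration to the next is small, the approximations in [23], [24] can be made.
Approximation 3: The exponential term $\exp[\mathbf{e}_i^2]$ can be approximated by its truncated Taylor series [25], [26]:
$$\exp[\mathbf{e}_i^2] = \mathbf{I} + \frac{1}{1!}\mathbf{e}_i^2 + \frac{1}{2!}\mathbf{e}_i^4 + o(\cdot) \approx \mathbf{I} + \mathbf{e}_i^2,$$
where $o(\cdot)$ is the Peano remainder.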
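Approximation 3 is a first-order Taylor truncation, so it is accurate only when the errors are small, for example near steady state. The scalar sketch below (applied elementwise, it covers the vector case) compares $\exp(e^2)$ with $1 + e^2$ for a few illustrative error values; the helper name is hypothetical.

```python
import math

def exp_sq_first_order(e):
    """First-order Taylor approximation exp(e^2) ~ 1 + e^2 (scalar form of Approximation 3)."""
    return 1.0 + e * e

def relative_error(e):
    """Relative error of the truncation; the dropped terms start at e^4/2!."""
    exact = math.exp(e * e)
    return (exact - exp_sq_first_order(e)) / exact

for e in (0.05, 0.1, 0.3, 1.0):
    print(f"e={e:4.2f}  rel. error={relative_error(e):.2e}")
```

The relative error is below $10^{-4}$ for $|e| = 0.1$ but exceeds 20% for $|e| = 1$, so the approximation mainly supports the small-error (steady-state) regime.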
Remark 1: Assumptions 1-3 have been used successfully in analyzing adaptive filtering algorithms. Moreover, Approximations 1-3 simplify the calculations required for nonlinear adaptive filtering algorithms, especially under Gaussian input signals. Hence, these approximations can be used to predict the behavior of the proposed LP-DLE2 algorithm.

A. MEAN BEHAVIOR MODEL
In addition, the error vector, the noise vector and the desired vector of the network are defined in stacked form. Further, the step size µ, the positive factor ρ and the combination weights $a_{l,k}$ are collected as $\mathbf{M} = \mathrm{diag}\{\mu, \ldots, \mu\}$, $\boldsymbol{\rho}_s = \mathrm{diag}\{\rho, \ldots, \rho\}$ and the $N \times N$ combination matrix with entries $a_{l,k}$. Using the error vector $\mathbf{e}_i = \mathbf{U}_i^T\tilde{\mathbf{w}}_{i-1} + \mathbf{v}_i$, the adaptation stage in (20) can be rewritten in terms of $\tilde{\boldsymbol{\phi}}_i$, where $\mathbf{Q} = \mathbf{M} \otimes \mathbf{I}_L$ and $\boldsymbol{\rho} = \boldsymbol{\rho}_s \otimes \mathbf{I}_L$. Substituting (30) into (20), the weight vector is rewritten with $\mathbf{P}$ denoting the Kronecker product of the combination matrix with $\mathbf{I}_L$. According to Assumptions 1-2, the expectation of the weight vector can then be obtained, where $E[\mathbf{U}_i\mathbf{U}_i^T] = \mathbf{S} \otimes \mathbf{I}_L$ and $\mathbf{S} = \mathrm{diag}\{\sigma^2_{u,1}, \sigma^2_{u,2}, \ldots, \sigma^2_{u,N}\}$. According to Approximation 3 and the error vector $\mathbf{e}_i$, the expectations of the nonlinear terms can be evaluated, where $\Phi(\cdot)$ denotes the cumulative distribution function (CDF) of the standard normal distribution and $\mathrm{erf}(\cdot)$ is the error function, defined as $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$. In the same way, $E|w_{k,i-1}(m)|$ can be obtained.
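Under Assumption 3, each weight entry is Gaussian, so expectations such as $E[\mathrm{sgn}(w)]$ and $E|w|$ have closed forms in terms of $\Phi(\cdot)$ and $\mathrm{erf}(\cdot)$. The sketch below uses the standard identities for $X \sim \mathcal{N}(m, s^2)$: $E[\mathrm{sgn}X] = 2\Phi(m/s) - 1 = \mathrm{erf}(m/(s\sqrt{2}))$ and the folded-normal mean $E|X| = s\sqrt{2/\pi}\,e^{-m^2/(2s^2)} + m\,\mathrm{erf}(m/(s\sqrt{2}))$, and checks them by Monte Carlo. These are textbook identities, not necessarily the exact expressions of the paper; here $m$ would play the role of $w^o(m) - \mu_{k,i}(m)$, and the helper names and numeric values are hypothetical.

```python
import math
import random

def expected_sign(mean, std):
    """E[sgn(X)] for X ~ N(mean, std^2), equal to 2*Phi(mean/std) - 1."""
    return math.erf(mean / (std * math.sqrt(2.0)))

def expected_abs(mean, std):
    """E|X| for X ~ N(mean, std^2), i.e. the mean of a folded normal distribution."""
    z = mean / (std * math.sqrt(2.0))
    return std * math.sqrt(2.0 / math.pi) * math.exp(-z * z) + mean * math.erf(z)

def monte_carlo(mean, std, n=200_000, seed=0):
    """Sample estimates of E[sgn(X)] and E|X| for comparison."""
    rng = random.Random(seed)
    xs = [rng.gauss(mean, std) for _ in range(n)]
    sgn = sum((x > 0) - (x < 0) for x in xs) / n
    return sgn, sum(abs(x) for x in xs) / n

# Usage with hypothetical mean/std of a weight entry.
print("closed form :", expected_sign(0.3, 0.5), expected_abs(0.3, 0.5))
print("Monte Carlo :", monte_carlo(0.3, 0.5))
```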

B. MEAN SQUARE BEHAVIOR MODEL
Multiplying both sides of (32) by $\tilde{\mathbf{w}}_i^T$ gives (36). Invoking Assumptions 1 and 2 and taking the expectations of both sides of (36), we obtain (37). To complete the analysis, we use the Kronecker-product identity $\mathrm{vec}(\mathbf{X}\mathbf{Y}\mathbf{Z}) = (\mathbf{Z}^T \otimes \mathbf{X})\,\mathrm{vec}(\mathbf{Y})$, where $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ are arbitrary matrices of compatible dimensions. Applying this identity to (37) yields a recursion for $\mathrm{vec}(\mathbf{W}_i)$. Then, under Assumptions 1-3, the remaining task is to calculate several expectations in (40) by using Approximations 1 and 2.
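The vectorization identity holds for column-stacking vec, which in NumPy means reshaping in Fortran (column-major) order. The sketch below checks it on random matrices of compatible (arbitrary, illustrative) sizes.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((4, 5))
Z = rng.standard_normal((5, 2))

lhs = vec(X @ Y @ Z)                  # length 3*2 = 6
rhs = np.kron(Z.T, X) @ vec(Y)        # (2*3) x (5*4) matrix times length-20 vector
print(np.allclose(lhs, rhs))
```

Using the default row-major `reshape` instead would break the identity, so the ordering convention matters when the mean-square recursion is implemented numerically.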

V. COMPUTER SIMULATIONS
In this section, the proposed LP-DLE2 and PZA-VSIDLSE algorithms are simulated for sparse system identification. The adaptive filter and the unknown system are assumed to have the same number of taps. The network mean-square deviation in decibels, $10\log_{10}\mathrm{MSD}_{\mathrm{net},i}$, is used to evaluate the performance of the algorithms in the adaptive network [27], [28]. The uniform combination rule, defined as $a_{l,k} = 1/|\mathcal{N}_k|$ for all $l \in \mathcal{N}_k$, is used in the simulations. All results are obtained by averaging over 100 independent trials.
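A minimal sketch of the evaluation metric, assuming the common definition $\mathrm{MSD}_{\mathrm{net},i} = \frac{1}{N}\sum_k \|\mathbf{w}^o - \mathbf{w}_{k,i}\|^2$, with the linear MSD averaged over independent trials before conversion to decibels. The paper's exact definition follows [27], [28] and is not reproduced here; the helper names are hypothetical.

```python
import numpy as np

def network_msd(W, w_o):
    """Network MSD of one run: mean over nodes of ||w_o - w_k||^2 (row k of W is w_k)."""
    return np.mean(np.sum((w_o - W) ** 2, axis=1))

def averaged_msd_db(W_trials, w_o):
    """Average the linear network MSD over independent trials, then convert to dB."""
    return 10.0 * np.log10(np.mean([network_msd(W, w_o) for W in W_trials]))

# Usage: two nodes, each with squared deviation 0.01, give -20 dB.
w_o = np.zeros(2)
W = np.array([[0.1, 0.0],
              [0.0, 0.1]])
print(averaged_msd_db([W, W], w_o))
```

Averaging in the linear domain before taking the logarithm avoids the downward bias that averaging dB values would introduce.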

A. SYNTHETIC SPARSE SYSTEM
We consider the 20-node network shown in Fig. 2 [27]. The parameter vector $\mathbf{w}^o$ of the unknown time-varying system has L = 16 taps. Initially, only one coefficient of the unknown system is set to 1, and the others are set to 0, so the sparsity ratio of the unknown system is 1/16. The position of the nonzero coefficient is chosen at random. Both Gaussian and colored inputs are considered to examine the algorithms. Different nodes have different inputs and background noises, whose variances are shown in Fig. 3. The colored inputs are generated by passing the Gaussian inputs through the first-order system $H_1(z) = \frac{1}{1 - 0.3z^{-1}}$.
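The test system and the colored input can be generated as sketched below. $H_1(z) = 1/(1 - 0.3z^{-1})$ corresponds to the AR(1) recursion $x(t) = 0.3\,x(t-1) + g(t)$ driven by white Gaussian noise $g(t)$, whose lag-1 correlation coefficient is 0.3. The per-node variances of Fig. 3 are not reproduced, so `sigma_u` is a placeholder; the helper names are hypothetical.

```python
import numpy as np

def sparse_system(L=16, rng=None):
    """One coefficient equal to 1 at a random position, all others 0 (sparsity 1/L)."""
    rng = np.random.default_rng() if rng is None else rng
    w_o = np.zeros(L)
    w_o[rng.integers(L)] = 1.0
    return w_o

def colored_input(n, sigma_u=1.0, pole=0.3, rng=None):
    """Filter white Gaussian noise through H1(z) = 1/(1 - pole*z^-1)."""
    rng = np.random.default_rng() if rng is None else rng
    g = sigma_u * rng.standard_normal(n)
    x = np.empty(n)
    prev = 0.0
    for t in range(n):
        prev = pole * prev + g[t]    # x(t) = pole*x(t-1) + g(t)
        x[t] = prev
    return x

rng = np.random.default_rng(0)
w_o = sparse_system(16, rng)
x = colored_input(200_000, rng=rng)
print("nonzero tap at:", int(np.flatnonzero(w_o)[0]))
print("lag-1 correlation:", np.corrcoef(x[:-1], x[1:])[0, 1])
```

A tapped-delay-line regressor $\mathbf{u}_{k,i} = [x(i), \ldots, x(i-L+1)]^T$ would then be formed from `x` at each node.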

B. SYSTEM IDENTIFICATION
The network shown in Fig. 2 is employed in the following experiments. Two experiments use Gaussian and colored inputs, respectively, and a third experiment with impulsive noise tests the robustness of the proposed algorithms. The unknown system is the same as that defined in Section V-A. Experiment 1. In the first set of simulations, Gaussian inputs are considered, and the background noise is independent white Gaussian noise whose variances are given in Fig. 3. For clarity, the proposed ATC-type and CTA-type algorithms are tested separately. The proposed LP-DLE2 and PZA-VSIDLSE algorithms are compared with the diffusion LSE algorithm and the reweighted ZA-DLMS algorithm. For a fair comparison, all algorithms were configured so that they have the same initial convergence rate. The parameters of the proposed LP-DLE2 algorithm are set as $\mu_{e2} = 0.3$; $\beta = 5$; $\rho = 0.005$; $\alpha = 1.2$, and those of the proposed PZA-VSIDLSE algorithm are set as $\mu_{se} = 1.2$; $\alpha = 1.2$; $\alpha = 6$; $\beta = 0.996$; $\gamma = 0.0001$ and $\rho = 0.005$; $\lambda = 1.5$. The MSD curves are shown in Fig. 4 and Fig. 5. Experiment 2. In this example, computer simulations are conducted to verify the performance of the proposed algorithms for colored AR(1) input signals, generated as described in Section V-A. With the other filter parameters unchanged, the step sizes of the proposed LP-DLE2 and PZA-VSIDLSE algorithms are $\mu_{e2} = 0.18$ and $\mu_{se} = 0.8$. The MSD curves for the ATC-type and CTA-type algorithms are shown in Fig. 6 and Fig. 7. Under colored inputs, the steady-state misalignments of the proposed ATC-LP-DLE2 and ATC-PZA-VSIDLSE algorithms are -35 dB and -34 dB in Fig. 6, and those of the corresponding CTA-type algorithms are -31 dB and -29 dB in Fig. 7.
These results show that the convergence performance of the LP-DLE2 and PZA-VSIDLSE algorithms is not severely degraded by colored inputs and that the proposed LP-DLE2 and PZA-VSIDLSE algorithms achieve improved performance. Experiment 3. In this experiment, impulsive noise is also added to the output, and the desired signal $d_k(i)$ at each node k is disturbed with a signal-to-interference ratio of -10 dB. The impulsive noise is modeled by the Bernoulli-Gaussian (BG) distribution, i.e., $\vartheta_k(n) = c_k(n)A_k(n)$, where $A_k(n)$ is a zero-mean white Gaussian random sequence with variance $\sigma^2_{A_k}$ and $c_k(n)$ is a Bernoulli process with probability mass function $P(c_k(n) = 1) = \mathrm{Pr}_k$ and $P(c_k(n) = 0) = 1 - \mathrm{Pr}_k$. In this test, $\mathrm{Pr}_k$ is set to 0.01. With the other filter parameters unchanged, the step sizes of the proposed LP-DLE2 and PZA-VSIDLSE algorithms are $\mu_{e2} = 0.3$ and $\mu_{se} = 1.15$. The results of the ATC-type and CTA-type algorithms are depicted in Fig. 8 and Fig. 9.
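The Bernoulli-Gaussian model can be simulated as sketched below. The impulse variance is derived from the signal-to-interference ratio under the assumed definition $\mathrm{SIR} = 10\log_{10}(\sigma_y^2/\sigma_A^2)$, so SIR = -10 dB gives $\sigma_A^2 = 10\,\sigma_y^2$; the paper does not state this definition explicitly, and the helper names and signal variance are hypothetical.

```python
import numpy as np

def bernoulli_gaussian(n, pr, sigma_a, rng=None):
    """theta(n) = c(n) * A(n) with c ~ Bernoulli(pr) and A ~ N(0, sigma_a^2)."""
    rng = np.random.default_rng() if rng is None else rng
    c = rng.random(n) < pr                 # impulse occurs with probability pr
    a = sigma_a * rng.standard_normal(n)   # Gaussian impulse amplitudes
    return c * a

def impulse_std_for_sir(signal_var, sir_db):
    """Assumed definition SIR = 10*log10(signal_var / sigma_a^2), solved for sigma_a."""
    return np.sqrt(signal_var / 10.0 ** (sir_db / 10.0))

rng = np.random.default_rng(0)
sigma_a = impulse_std_for_sir(1.0, -10.0)  # hypothetical unit-variance output signal
theta = bernoulli_gaussian(200_000, 0.01, sigma_a, rng)
print("impulse fraction:", np.mean(theta != 0))
print("impulse variance:", np.var(theta[theta != 0]))
```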
In the non-Gaussian environment, the steady-state misalignments of the proposed ATC-LP-DLE2 and ATC-PZA-VSIDLSE algorithms are -33 dB and -31 dB in Fig. 8, and those of the corresponding CTA-type algorithms are -28 dB and -25 dB in Fig. 9. These results indicate that the LP-DLE2 and PZA-VSIDLSE algorithms are more robust than the other algorithms and achieve improved performance.

VI. CONCLUSION
In this paper, a class of diffusion zero-attracting stochastic gradient algorithms with exponentiated error cost functions has been proposed by incorporating zero attractors and the variable step-size method into the conventional diffusion cost function with exponentiated error. Owing to the zero attractors and the variable scaling factor, the proposed algorithms provide improved performance for sparse systems with colored inputs. In addition, the $l_p$-norm penalty and the polynomial zero attractor provide better performance than the existing zero-attracting algorithms. Moreover, the variable step-size method is incorporated into the proposed PZA-VSIDLSE algorithm, which makes it superior to the conventional DLSE algorithm. Simulations of sparse system identification verify the performance of the LP-DLE2 and PZA-VSIDLSE algorithms, and the results show that the proposed algorithms have good convergence performance and robustness.