Distributed Consensus Student-t Filter for Sensor Networks With Heavy-Tailed Process and Measurement Noises

In distributed sensor network estimation, the process noise and measurement noise may contain outliers, which give them heavy-tailed characteristics. To address this problem, this paper proposes a distributed consensus estimation method for sensor networks based on the Student-t distribution. In the state-space model, both the process noise and the measurement noise are modeled as heavy-tailed Student-t distributions. First, under the assumption that the process noise and measurement noise share the same degree-of-freedom parameter, an exact distributed consensus Student-t filtering algorithm is derived. In practical applications this assumption rarely holds, and because the degrees of freedom grow over time, the exact method quickly converges to the traditional distributed consensus Kalman filter. It is therefore necessary to relax the common degree-of-freedom assumption and keep the degrees of freedom within a bounded range. On this basis, an approximate distributed consensus Student-t filtering algorithm is proposed. Simulation results verify the effectiveness of the proposed algorithms.

INDEX TERMS Student-t distribution, distributed consensus filter, distributed sensor networks.

NOMENCLATURE
N(·; m, Λ⁻¹)   Gaussian distribution with mean m and precision matrix Λ
(·)^T          Matrix transpose operation
(·)^(−1)       Matrix inversion operation
ℓ              Iteration index
Γ(·)           Gamma function
x̂_{k|k−1}      Predicted state estimate

I. INTRODUCTION
Distributed state estimation is important in distributed sensor networks [1], [2]. Due to complex environments, the communication, sensing and processing capabilities of a distributed sensor network can be limited, so traditional data sharing and fusion methods are not applicable in this case. There are three major classes of distributed state estimation methods: consensus-based algorithms [3], gossip-based algorithms [4], and diffusion-based algorithms [5]. Consensus-based algorithms are able to provide better estimation accuracy. Information sharing and interaction using the consensus method requires neither a fusion center nor a fully connected network: information is exchanged only between neighboring nodes, and the information of all nodes can eventually be made consistent. This approach is applicable to any network topology and improves both the flexibility and the robustness of the network.
The most direct approach to consensus estimation is to combine consensus theory with the Kalman filter: the consensus mechanism can be applied to the prediction or update step of the Kalman filter without losing the basic characteristics of the Kalman filter. Consensus on estimation (CE) methods, which use a consensus strategy on the state estimate of each node, are proposed in [6]. To deal with the conservative character of the CE estimates, a consensus on measurement (CM) method, which drives the local innovation pairs to consensus, is proposed in [7]; however, this method needs a large number of consensus steps to achieve convergence. A consensus on information (CI) method utilizing uniform local averaging of information matrices and information vectors is proposed in [8]. Many scholars have conducted in-depth research under this framework and achieved fruitful results [9]-[17].
Most consensus filters assume that the process and measurement noises are Gaussian. However, in many practical situations the process and measurement noises may suffer from outliers, which can come from unreliable sensors, unmodeled anomalies, sudden disturbances in the system environment, or target maneuvers [18], [19]. The Gaussian assumption may cause poor performance or system failures in these situations. Thus, the heavy-tailed Student-t distribution is used to model uncertainties exhibiting frequent outliers [20]-[24]. A linear distributed consensus filter with a CI strategy to handle measurement outliers is proposed in [25], where the measurement noise of each sensor node is modeled by the multivariate Student-t distribution and the variational Bayesian (VB) method [26], [27] is used to approximate the joint posterior density; it is extended to nonlinear cases in [28], where a hybrid consensus strategy [9], [11] is used. However, these methods can only handle scenarios with heavy-tailed measurement noise and well-behaved process noise. The particle filter can handle process and measurement noises with arbitrary distributions [29], but it suffers from the curse of dimensionality in high-dimensional problems. The Gaussian sum filter (GSF) [30] can deal with heavy-tailed non-Gaussian noises, but it needs many Gaussian components to model the heavy-tailed process and measurement noises accurately. There are other robust methods, such as the WLAV (weighted least absolute value) filter [31] and the MEAV (maximum exponential absolute value) filter [32], to deal with bad data; however, they need optimization or iteration procedures that can incur considerable extra computation. Recently, robust Student-t filters for heavy-tailed process and measurement noises have been proposed in [19], [33]-[35] for a single sensor. These filters have low computational complexity, are easy to apply, and can deal with high-dimensional problems.
A distributed consensus Student-t filter is presented in this paper to handle both heavy-tailed process and measurement noises for distributed sensor networks. The main contributions of this paper can be highlighted as follows.
1) Both the state and the measurements of each sensor node are modeled by Student-t distributions. Under certain assumptions, the exact distributed consensus Student-t filter is derived: the recursive prediction and update steps for multiple sensors are derived first, and then the CI strategy for the Student-t distribution is derived based on moment matching.
2) Since these assumptions are too restrictive for real scenarios, approximations are introduced to relax them for practical applications. In addition, the degrees of freedom of the Student-t distribution are kept from growing over time, so that the heavy-tailed characteristics are maintained.
3) Simulations are performed in a scenario where both the process noise and the measurement noise are heavy-tailed. The results show that the proposed methods outperform the conventional distributed consensus filter.

The remainder of this paper is organized as follows. Section II describes the sensor network models and the consensus method used in this paper. Section III presents the distributed consensus Student-t filters. Simulation results and analysis are given in Section IV, and conclusions are drawn in Section V.

II. PROBLEM FORMULATION
In this paper, we consider a sensor network represented by (N, A), where N denotes the set of sensor nodes and A ⊆ N × N is the set of connections between nodes such that (i, j) ∈ A if node j can receive data from node i. The set N_i = {j ∈ N : (j, i) ∈ A} of nodes from which node i can receive data is called the neighborhood of node i. Consider the discrete-time linear system and measurement equations of sensor node i ∈ N:

x_k = F_{k−1} x_{k−1} + w_{k−1}, (1)
z^i_k = H^i_k x_k + v^i_k, (2)

where x_k is the n_x-dimensional state vector, F_k is the state transition matrix, w_k is the process noise, z^i_k is the d^i_z-dimensional measurement vector of node i, H^i_k is the measurement matrix of node i, and v^i_k is the measurement noise of node i. For the initial state x_0, the process noise w_k and the measurement noise v^i_k, we make the following assumptions. Assume that the initial state and the noise signals are mutually uncorrelated, with marginal distributions

p(x_0) = St(x_0; x̂_0, P_0, η_0), (3)
p(w_k) = St(w_k; 0, Q_k, γ_k), (4)
p(v^i_k) = St(v^i_k; 0, R^i_k, δ^i_k), (5)

where St(·; m, P, ν) denotes the Student-t distribution with mean m, scale matrix P and degrees of freedom (DOF) ν. Thus η_0, γ_k and δ^i_k are the DOFs, and P_0, Q_k and R^i_k are the scale matrices, of the related densities. It is difficult to obtain a globally closed-form solution by maximizing the likelihood of the Student-t distribution; however, the Student-t distribution can be decomposed into an infinite mixture of Gaussian distributions that share the same mean and precision. The decomposed Student-t distribution can be expressed as

St(x; m, Λ^{−1}, ν) = ∫ N(x; m, (uΛ)^{−1}) G(u; ν/2, ν/2) du, (6)

where N(x; m, Λ^{−1}) is the Gaussian distribution with mean m and precision matrix Λ, and G(u; a, b) is the Gamma distribution with shape parameter a and rate parameter b. The probability density function (PDF) of a d-dimensional Student-t distribution is given by

St(x; m, P, ν) = Γ((ν + d)/2) / [Γ(ν/2) (νπ)^{d/2} |P|^{1/2}] · [1 + (x − m)^T P^{−1} (x − m)/ν]^{−(ν+d)/2}, (7)

where Γ(·) denotes the Gamma function. It should be noted that P is not a covariance matrix in general, while ν/(ν−2) P is the covariance for ν > 2. When ν = 1 the Student-t distribution reduces to the Cauchy distribution, while for ν → ∞ it becomes the Gaussian distribution. Compared with the Gaussian distribution, the Student-t distribution has heavy tails.
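The mixture representation (6) and the covariance relation ν/(ν−2)P can be checked numerically. The sketch below draws Student-t samples by first drawing the Gamma mixing variable; note that NumPy's gamma sampler is parametrized by shape and scale, so rate ν/2 corresponds to scale 2/ν, and the function name is an illustrative assumption:

```python
import numpy as np

def sample_student_t(m, P, nu, size, rng):
    """Draw samples from St(x; m, P, nu) via its Gaussian-Gamma mixture:
    u ~ Gamma(shape=nu/2, rate=nu/2), then x | u ~ N(m, P/u)."""
    d = len(m)
    # numpy's gamma uses shape/scale, so scale = 2/nu gives rate nu/2
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    g = rng.multivariate_normal(np.zeros(d), P, size=size)
    return m + g / np.sqrt(u)[:, None]

rng = np.random.default_rng(0)
nu, P = 5.0, np.diag([1.0, 2.0])
x = sample_student_t(np.zeros(2), P, nu, 200000, rng)
# For nu > 2 the covariance is nu/(nu-2) * P, not P itself.
emp_cov = np.cov(x.T)
print(np.diag(emp_cov))  # close to [5/3, 10/3]
```

The empirical covariance is roughly 5/3 times the scale matrix, confirming that P alone understates the spread of a heavy-tailed density.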
A consensus algorithm is an information exchange rule that ensures the quantity of interest at each node reaches agreement. The weighted Kullback-Leibler average p̄(·) of the PDFs {p_i(·)} is given by [11]

p̄ = arg min_p ∑_{i∈N} π_i KL(p ‖ p_i), (8)

where π_i > 0 is the weight with ∑_{i∈N} π_i = 1, and KL(p‖p_i) is the Kullback-Leibler divergence between the PDFs p(·) and p_i(·). The problem of probability density consensus can be treated as finding a consensus algorithm such that

lim_{ℓ→∞} p_i^{(ℓ)}(·) = p̄(·) for all i ∈ N, (9)

where p̄(·) is the asymptotic PDF. The solution to (8) is

p̄(x) = ⊕_{i∈N} (π_i ⊗ p_i(x)), (10)

where π_i = 1/|N| and the operators ⊕ and ⊗ are given by

(p_1 ⊕ p_2)(x) = p_1(x) p_2(x) / ∫ p_1(x) p_2(x) dx, (11)
(π ⊗ p)(x) = p(x)^π / ∫ p(x)^π dx. (12)

Therefore, the solution can be obtained by iteratively exchanging local data with the neighbors via the convex combination

p_i^{(ℓ+1)}(x) = ⊕_{j∈N_i} (π_{i,j} ⊗ p_j^{(ℓ)}(x)), (13)

where π_{i,j} ≥ 0 is the consensus weight with ∑_{j∈N_i} π_{i,j} = 1, ℓ is the iteration index, and the iterations are initialized with p_i^{(0)}(·) = p_i(·). The purpose of the consensus filter in this paper is to obtain a consensus state estimate for a sensor network with both heavy-tailed process noise and measurement noise.
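For Gaussian densities the consensus iteration above reduces to averaging information pairs, as used later in Section III. A minimal self-contained sketch, with an illustrative fully connected three-node network and uniform weights:

```python
import numpy as np

# Each node holds an information pair (Omega_i, q_i) = (P_i^{-1}, P_i^{-1} x_i).
# One consensus sweep replaces every pair by a convex combination over its
# neighbourhood; repeated sweeps drive all nodes toward the collective average.

def consensus_step(omegas, qs, weights):
    """weights[i][j] is pi_{i,j}; each row sums to 1 over the neighbourhood."""
    n = len(omegas)
    new_omegas = [sum(weights[i][j] * omegas[j] for j in range(n)) for i in range(n)]
    new_qs = [sum(weights[i][j] * qs[j] for j in range(n)) for i in range(n)]
    return new_omegas, new_qs

# Toy fully connected 3-node network with uniform weights.
omegas = [np.eye(2) * w for w in (1.0, 2.0, 3.0)]
qs = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 3.0])]
W = [[1 / 3] * 3 for _ in range(3)]
for _ in range(5):
    omegas, qs = consensus_step(omegas, qs, W)
# All nodes now agree on the average information pair.
print(np.allclose(omegas[0], omegas[2]))  # True
```

In a sparse network each node would only combine the pairs of its neighbors N_i, and more sweeps would be needed to approach the network-wide average.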

III. PROPOSED METHODS
A. THE EXACT DISTRIBUTED CONSENSUS STUDENT-t FILTER
Let Z^k denote the measurements of all sensor nodes up to time k, and Z_k = {z^i_k, i ∈ N} the measurements of all sensor nodes at time k. Similar to the distributed consensus Kalman filter, we divide the filter recursion into a time update and a measurement update.
1) PREDICTION AND UPDATE
Suppose the DOF of the process noise satisfies γ_k = η_k, where η_k is the DOF of x_k | Z^k. Then the predicted density is again a Student-t distribution, and the parameter η_k is not changed by the time update. We assume that all nodes have the same DOF in the measurement noise distribution, δ^i_k = δ_k, and that δ_k = η_{k−1}. Then the joint density of the predicted state and the measurement noise can be obtained, and by a linear transformation the joint density of the state and the measurement follows. Thus, given all measurements Z_k, the conditional density of the state is still a Student-t distribution. The above is the recursion once all sensor measurements are obtained.
VOLUME 8, 2020
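A hedged Python sketch of the single-node prediction and update steps described above, written in the commonly used exact Student-t filter form; the function names are illustrative assumptions, and the rescaling of the scale matrix by the squared innovation distance follows the standard derivation rather than the paper's numbered equations:

```python
import numpy as np

def st_predict(x, P, nu, F, Q):
    """Time update: with gamma_k = eta_k, the predicted density is
    Student-t with the same DOF, mean F x and scale F P F^T + Q."""
    return F @ x, F @ P @ F.T + Q, nu

def st_update(x, P, nu, z, H, R):
    """Measurement update: the posterior is Student-t with DOF nu + d_z;
    the scale matrix is rescaled by the squared innovation distance."""
    d_z = len(z)
    S = H @ P @ H.T + R                        # innovation scale matrix
    K = P @ H.T @ np.linalg.inv(S)             # filter gain
    r = z - H @ x                              # innovation
    delta2 = float(r @ np.linalg.solve(S, r))  # squared Mahalanobis distance
    x_new = x + K @ r
    P_new = (nu + delta2) / (nu + d_z) * (P - K @ S @ K.T)
    return x_new, P_new, nu + d_z

x, P, nu = st_predict(np.zeros(2), np.eye(2), 3.0, np.eye(2), 0.1 * np.eye(2))
x, P, nu = st_update(x, P, nu, np.array([0.5]), np.array([[1.0, 0.0]]),
                     np.array([[1.0]]))
print(nu)  # 4.0: the DOF grows by d_z at every measurement update
```

The growing DOF returned by `st_update` is exactly the effect that Sec. III-B later suppresses.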

2) CONSENSUS FOR STUDENT-t DISTRIBUTION
If the PDF of each sensor node is Gaussian, p_i(x) = N(x; x̂_i, Ω_i^{−1}), then the average PDF can be obtained by averaging the information vectors q_i = Ω_i x̂_i and the information matrices Ω_i, and the following consensus algorithm is derived in [10]:

q_i^{(ℓ+1)} = ∑_{j∈N_i} π_{i,j} q_j^{(ℓ)}, (23)
Ω_i^{(ℓ+1)} = ∑_{j∈N_i} π_{i,j} Ω_j^{(ℓ)}. (24)

For the Student-t distribution, an exact consensus algorithm can hardly be obtained. It can be noted that the Student-t distribution St(x; m, P, ν) converges to the Gaussian distribution as ν → ∞. Therefore, we approximate the PDF p(x) = St(x; m, P, ν) by p′(x) = N(x; m, P̃) ≈ St(x; m, P̃, ν̃) with DOF ν̃ → ∞, so as to take advantage of the Gaussian consensus algorithm (23) and (24). The qualitative features of the density should be retained when adjusting the matrix parameter to the new DOF, so the adjusted scale matrix P̃ should be a scaled version of the original matrix, P̃ = cP. The problem is then to find a scalar c > 0 such that p(x) and p′(x) are close in some respect. Once ν̃ is given, c can be found by the moment matching method: equating the covariances yields the condition

(ν/(ν−2)) P = (ν̃/(ν̃−2)) cP for ν > 2,

and letting ν̃ → ∞ gives the scale factor

c = ν/(ν−2).

With this approximation, the consensus steps (23) and (24) can be used directly. After the consensus steps, inverse operations are performed to change the DOF back to ν; the specific steps are given in Sec. III-A3.
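The moment-matching step can be made concrete with a small helper: the Gaussian approximation uses the scale factor c = ν/(ν−2) (the ν̃ → ∞ limit), while a finite target DOF ν̃ gives c = [ν/(ν−2)]·[(ν̃−2)/ν̃]. The function name is an illustrative assumption:

```python
import numpy as np

def moment_match_scale(P, nu, nu_tilde=np.inf):
    """Return the scale matrix P_tilde = c * P such that
    St(x; m, P_tilde, nu_tilde) has the same covariance as St(x; m, P, nu).
    Requires nu > 2 (otherwise the covariance does not exist)."""
    cov = nu / (nu - 2.0) * P          # covariance of the original density
    if np.isinf(nu_tilde):             # Gaussian limit: scale = covariance
        return cov
    return (nu_tilde - 2.0) / nu_tilde * cov

P = np.eye(2)
print(moment_match_scale(P, 5.0))       # (5/3) * I, the Gaussian approximation
print(moment_match_scale(P, 5.0, 3.0))  # (5/9) * I for target DOF 3
```

The inverse operation mentioned in the text simply divides by the same factor to recover the original scale matrix after the consensus sweeps.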

3) THE RECURSION OF EXACT DISTRIBUTED CONSENSUS STUDENT-t FILTER
Combining the prediction and update steps in Sec. III-A1 with the consensus step in Sec. III-A2, the following exact consensus Student-t filter recursion can be obtained for each local sensor in the distributed sensor network:
(1) Prediction of the local filter.
(2) Update of the local filter.
(3) Consensus on the information matrix and information vector: approximate the initial information matrix and information vector by their moment-matched Gaussian counterparts. For an L-step consensus iteration, the consensus on the posterior information is carried out in the form of (23) and (24), where ℓ = 1, 2, · · · , L is the consensus step and π_{i,j} is the consensus weight; a convex combination is adopted by requiring π_{i,j} ≥ 0 and ∑_{j∈N_i} π_{i,j} = 1.
(4) Recovery of the state and the scale matrix.
It can be noted that with each measurement update the degrees of freedom increase according to (36). In turn, this requires an increase in the degrees of freedom of the noises, making the problem more and more Gaussian.
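The DOF growth can be quantified: with d_z-dimensional measurements, the DOF after k updates is ν_0 + k·d_z, so the tail factor ν/(ν−2) tends to 1 and the density becomes effectively Gaussian. A tiny numerical illustration with arbitrarily chosen values:

```python
# After k measurement updates the DOF is nu_0 + k * d_z; the ratio
# nu / (nu - 2), which inflates the scale matrix into a covariance,
# tends to 1, i.e. the posterior loses its heavy tails over time.
nu0, d_z = 3.0, 2
ratios = {k: (nu0 + k * d_z) / (nu0 + k * d_z - 2.0) for k in (0, 10, 100)}
print(ratios)  # 3.0 at k = 0, already below 1.1 at k = 10
```

This is why the approximate filter of Sec. III-B keeps the DOF bounded.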

B. THE APPROXIMATED DISTRIBUTED CONSENSUS STUDENT-t FILTER
The conditions required in Sec. III-A will hardly be met in practice. Therefore, we introduce some approximations; the resulting filtering algorithm is only slightly more complicated than the exact filter in (28)-(34). In addition, we prevent the degrees of freedom from growing, thereby maintaining heavy-tailed densities for all time.
Under more practical assumptions, we consider the linear models (1) and (2) again, now with arbitrary degrees of freedom γ_k and δ_k in (4) and (5); as a result, the time update and the measurement update cannot be derived in closed form. Suppose at time k we have the posterior density p(x_{k−1} | Z^{k−1}) = St(x_{k−1}; x̂_{k−1}, P_{k−1}, η_{k−1}). In deriving the exact filter, we used a formula for the joint density of x_k and w_k given Z^k. If γ_k ≠ η_k, then p(x_k, w_k | Z^k) cannot be represented in closed form unless w_k is independent; however, for independent noise the joint density is neither a Student-t distribution nor ellipsoidally shaped, so a concise time-update equation cannot be derived. What leads to the convenient expressions in the exact filter (28)-(34) is the joint density in (15) and (16); it is therefore a reasonable choice to force the actual state and noise densities into this format. For the time update, we suggest finding a common degree-of-freedom parameter η̃_k from γ_k and η_k. Similar to Sec. III-A, we use (46) as an approximate Student-t distribution. Since the degrees of freedom have changed, P̃_{k−1} and Q̃_{k−1} replace the matrices P_{k−1} and Q_{k−1}, while the means remain unchanged. The moment matching method for finding P̃_{k−1} and Q̃_{k−1} has already been given in Sec. III-A2, so a time update similar to (28)-(29) can now be applied. For the measurement update, a similar approximation (47) is used. The two density approximations (46) and (47) extend the exact filter of Sec. III-A and provide convenient closed-form solutions for the time update and the measurement update. The approximate density is still a Student-t density, but its DOF may change.
Assuming that the measurements of the sensors are mutually independent, the following approximate consensus Student-t recursion can be obtained for each local sensor in the distributed sensor network:
(1) Prediction of the local filter: the prediction depends on the previous posterior parameters x̂_{k−1}, P_{k−1} and η_{k−1}. First, the approximation (48) is required; the predicted parameters then follow.
(2) Update of the local filter: the approximation (51) is applied first, and the updated parameters then follow.
(3) Consensus on the information matrix and information vector: approximate the initial information matrices and information vectors by their moment-matched Gaussian counterparts. For an L-step consensus iteration, the consensus on the posterior information is carried out as in (23) and (24), where ℓ = 1, 2, · · · , L is the consensus step and π_{i,j} is the consensus weight; a convex combination is adopted by requiring π_{i,j} ≥ 0 and ∑_{j∈N_i} π_{i,j} = 1.
(4) Recovery of the state and the scale matrix.
Compared with the standard CI method, the growth in computational complexity of the proposed methods comes from calculating the squared innovation distance Δ^i_{z,k} and the DOF η^i_k in the exact method; the approximate method needs additional computation for the approximation procedures such as (48) and (51). Therefore, the order of computational complexity of the proposed algorithms is the same as that of the CI method. In addition, no local parameters such as Δ^i_{z,k} and η^i_k are communicated among nodes; only the same information pairs as in the standard CI method are exchanged among neighboring nodes, so the communication cost of the proposed methods equals that of the standard CI method.
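A sketch of the common-DOF density approximation used in the time update: both the posterior and the process-noise densities are moment-matched to a shared DOF so that the exact-filter formulas apply. The helper name and the choice η̃ = min(η, γ), keeping the heavier tail, are illustrative assumptions:

```python
import numpy as np

def match_to_common_dof(P, nu, nu_common):
    """Moment-match St(.; m, P, nu) by St(.; m, P_tilde, nu_common):
    equate covariances nu/(nu-2) * P = nu_common/(nu_common-2) * P_tilde."""
    return (nu / (nu - 2.0)) * ((nu_common - 2.0) / nu_common) * P

# Posterior DOF eta and process-noise DOF gamma disagree:
eta, gamma = 7.0, 4.0
P, Q = np.eye(2), 0.1 * np.eye(2)
eta_t = min(eta, gamma)          # shared DOF (illustrative: keep the heavier tail)
P_t = match_to_common_dof(P, eta, eta_t)
Q_t = match_to_common_dof(Q, gamma, eta_t)
# The covariances are preserved by construction:
print(np.allclose(eta / (eta - 2) * P, eta_t / (eta_t - 2) * P_t))  # True
```

Keeping η̃ at the smaller of the two DOFs (rather than letting it grow) is what preserves the heavy-tailed character over time.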
Gaussian noise is one of the most common distributions in nature: the central limit theorem states that, under appropriate conditions, the properly standardized mean of a large number of independent random variables converges in distribution to the normal distribution. Under normal circumstances the noise is therefore approximately Gaussian, with occasional outliers that give the overall noise distribution a heavy-tailed character. As shown in [36], outliers can also be modeled by the Laplace distribution or the Cauchy distribution. However, the Student-t distribution provides a heavy-tailed alternative to the Gaussian distribution whose shape is more similar to the Gaussian (see Fig. 1); when the DOF tends to 1, the Student-t distribution becomes the Cauchy distribution. Besides, the Student-t distribution leads to closed-form solutions for the proposed filters, and Student-t based filters can also deal with other heavy-tailed noises such as Laplace noise [22]. Therefore, we choose the Student-t distribution here to model the heavy-tailed process and measurement noises.

IV. SIMULATIONS
Here we consider a tracking problem in the two-dimensional plane. The target state is x = [p_x, ṗ_x, p_y, ṗ_y]^T and follows a constant-velocity model with sample time T = 1 s and process noise intensities w²_x = w²_y = 0.1, for a given true initial state. There are 20 sensor nodes in the sensor network (the topology is shown in Fig. 2), and the measurement model is linear, of the form (2). The heavy-tailed measurement noise is generated by a Gaussian mixture with a nominal noise covariance R = diag([(10 m)², (10 m)²]) and outliers with covariance 100R, where p_o is the probability of a measurement outlier. The heavy-tailed process noise is generated in the same manner, which is widely used to evaluate the performance of Student-t based filters. This paper mainly compares the following three methods: (1) the distributed consensus Kalman filter (DCKF) in [10]; (2) the exact distributed consensus Student-t filter, referred to as DCSTF-E; and (3) the approximate distributed consensus Student-t filter, referred to as DCSTF-A. The simulation results were obtained from 100 Monte Carlo runs, and the root mean square error (RMSE) of position and velocity is used to evaluate them.
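The heavy-tailed noise generation described above (nominal covariance R with probability 1 − p_o, covariance 100R with probability p_o) can be sketched as follows; the function name is an illustrative assumption:

```python
import numpy as np

def heavy_tailed_noise(R, p_o, size, rng):
    """Gaussian mixture: nominal N(0, R) with probability 1 - p_o,
    outlier N(0, 100 R) with probability p_o."""
    d = R.shape[0]
    nominal = rng.multivariate_normal(np.zeros(d), R, size=size)
    outlier = rng.multivariate_normal(np.zeros(d), 100.0 * R, size=size)
    mask = rng.random(size) < p_o
    return np.where(mask[:, None], outlier, nominal)

rng = np.random.default_rng(1)
R = np.diag([100.0, 100.0])     # nominal (10 m)^2 variance per axis
v = heavy_tailed_noise(R, p_o=0.1, size=100000, rng=rng)
# Mixture variance: (1 - p_o) * R + p_o * 100 * R = 10.9 * R per axis.
print(np.var(v[:, 0]))          # close to 1090
```

Even a 10% outlier rate inflates the effective variance by more than a factor of ten, which is why a Gaussian-assuming filter degrades in this scenario.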
The number of consensus steps is L = 3, and consensus weights are assigned to the sensor nodes. When no outliers exist (p_o = 0), both the process noise and the measurement noise are Gaussian. Fig. 3 and Fig. 4 show the simulation results of the three methods without outliers. It can be seen from the figures that the RMSEs of the DCKF and DCSTF-E methods are relatively close, while the RMSE of the DCSTF-A method is slightly higher. This means that in the absence of outliers the performance of the DCKF and DCSTF-E methods is comparable, while the DCSTF-A method performs slightly worse. When outliers are present, the RMSE of the DCSTF-E method at the initial stage of the simulation is lower than that of the DCKF method, and the two become closer as the simulation proceeds. This is because the DOF parameter in the DCSTF-E method increases with time, so that it eventually converges to the DCKF method; the earlier theoretical analysis and the simulation results are thus consistent. The RMSE of the DCSTF-A method is much lower than those of the first two methods, which shows that in the presence of outliers the DCSTF-A method performs significantly better than the DCKF and DCSTF-E methods.
Tab. 1 and Tab. 2 give the simulation results of the three methods under different outlier probabilities. It can be seen from the tables that as the outlier probability increases, the RMSEs of all methods increase accordingly. Under different outlier probabilities, the RMSE of the DCSTF-E method is slightly lower than that of the DCKF method, and the RMSE of the DCSTF-A method is significantly lower than those of the first two methods. This shows that the proposed distributed consensus Student-t filtering algorithms perform better than the distributed consensus Kalman filtering algorithm in the presence of outliers.
Tab. 3 and Tab. 4 give the simulation results of the three methods under different numbers of consensus iteration steps. It can be seen from the tables that as the number of consensus iterations increases, the RMSEs of the three methods decrease accordingly. As before, the RMSE of the DCSTF-E method is slightly lower than that of the DCKF method, and the RMSE of the DCSTF-A method is significantly lower than those of the other two methods, which further verifies the effectiveness of the proposed algorithms. To further illustrate this effectiveness, we compared the proposed algorithms with the distributed variational Bayesian Student-t CI (DVBSCI) filter presented in [25]. Fig. 7 and Fig. 8 show the simulation results: the DVBSCI method gives the worst performance, because the DVBSCI algorithm only models outliers in the measurement noise and does not consider outliers in the process noise. This further demonstrates the benefit of the proposed algorithms, which consider outliers in both the process noise and the measurement noise.

V. CONCLUSION
In this paper, a distributed consensus Student-t filter is proposed to handle heavy tails in both the process noise and the measurement noise. First, a system model based on the Student-t distribution is established, and an exact distributed consensus Student-t filter is derived from it. Then, an approximate distributed consensus Student-t filter is proposed to relax the strong assumptions of the exact filter and to prevent its convergence to the standard Kalman filter over time. The simulation results show that the proposed algorithms achieve stable and accurate state estimation when both the process noise and the measurement noise are heavy-tailed. The proposed algorithms can be extended to other consensus strategies and to nonlinear settings in future work.