The Research on Distributed Fusion Estimation Based on Machine Learning

Multi-sensor distributed fusion estimation algorithms based on machine learning are proposed in this paper. First, using local estimates as inputs and the estimates of three classical distributed fusion algorithms (weighted by matrices, by diagonal matrices and by scalars) as training sets, three distributed fusion algorithms based on the BP network (BP net-based fusion weighted by matrices, by diagonal matrices and by scalars) are proposed, and a criterion for selecting the number of hidden-layer nodes is given. Furthermore, using local estimates as inputs and the centralized fusion estimate as the training set, a recurrent net-based distributed fusion algorithm is proposed for the case in which neither the true states nor the cross-covariance matrices are available. This method is not limited to the linear minimum variance (LMV) criterion, so its accuracy is higher than that of the three classical distributed fusion algorithms. A radar tracking simulation verifies the effectiveness of the proposed fusion networks.


I. INTRODUCTION
With the development of sensor technology, sensor performance has continuously improved, and the number of sensors integrated into systems has continuously increased. To some extent, the number of sensors even reflects the accuracy and reliability of a system. As the number of sensors grows, information fusion has become an indispensable part of information processing in multi-sensor systems and has attracted considerable attention from researchers.
Multi-sensor fusion estimation [1] is an important branch of information fusion, and it often works at the lowest level of the fusion framework. For fusion estimation based on the Kalman filter, there are two basic fusion frameworks: centralized fusion and distributed fusion [2]. Centralized fusion transmits the measurement data to the fusion center and then estimates the state using the augmented (stacked) measurement. It gives the globally optimal estimate, but its disadvantages are high communication and computational costs and poor fault tolerance. Distributed fusion estimates the state at each local node from the local measurements and then transmits the local estimates to the fusion center. It gives only a suboptimal estimate [3], [4], but because of its lower communication and computational costs it is more suitable for large-scale sensor network systems. Distributed fusion is widely used in many systems [5], [6], such as the well-known Carlson federated Kalman filter [7], [8].

The associate editor coordinating the review of this manuscript and approving it for publication was Ludovico Minati.
Based on the LMV criterion, an optimal fusion algorithm weighted by matrices was proposed using the weighted least squares method (i.e., Carlson's federated Kalman filter) [7], [8]. Roy and Iltis proposed a decentralized static filter for linear systems with correlated measurement noises [9]. Kim proposed the multi-sensor optimal information fusion estimate in the maximum likelihood sense under the assumption of normal distributions [10]. Based on the LMV criterion, fusion algorithms weighted by matrices, by scalars and by diagonal matrices were proposed by Sun and Deng [11]-[13]. All these distributed fusion algorithms have low communication and computational costs and good fault tolerance, and they are widely used in various fields.
An Artificial Neural Network (ANN) is a mathematical model for distributed parallel information processing that imitates the information transmission and reflex behavior of the human nervous system [14]. Owing to their good approximation performance, ANNs have been widely applied to state estimation and information fusion [15]-[21]. Rao et al. proposed various learning-based estimators for the fusion estimation problem (i.e., ANNs, the Nadaraya-Watson estimator and the Nearest Neighbor Projective Fuser) [22]-[24]. Chowdhury proposed a neural data fusion method performed by a set of artificial neurons whose synaptic weights serve as the weight estimates for optimal data fusion [20]. Liu et al. [25] and Brigham et al. [26] proposed learning-based nonlinear fusion algorithms, and Brigham et al. studied and compared the following cases: ANNs, support vector regression (SVR), the Nadaraya-Watson (NW) estimator and the nearest neighbor (NN) projective fuser [26].
The BP (back propagation) algorithm was first proposed by Werbos in his doctoral thesis; it provides a practical solution for training and implementing multi-layer neural networks [27]-[29]. Rumelhart et al. analyzed the error back-propagation algorithm for multi-layer networks, which further popularized the BP algorithm [30]. The topology of a BP network consists of an input layer, a hidden layer and an output layer, and the network can store a mapping relationship by learning, without knowing the explicit mathematical expression between input and output beforehand. Cybenko et al. analyzed the approximation performance of BP networks and proved that continuous feedforward neural networks with a single hidden layer and sigmoid transfer functions can approximate any continuous mapping with arbitrary precision [31]-[33]. For recurrent networks (Hopfield, Elman, CG, BSB, CHNN, DHNN, etc.), the outputs depend not only on the current input but also on previous inputs of the network, which gives them an iterative structure similar to the Kalman filtering framework [34], [35], [40].
The main work of this paper is to construct multi-sensor distributed fusion frameworks based on ANNs and machine learning. First, using local estimates as the inputs and the estimates of three classical distributed fusion algorithms (weighted by matrices, by diagonal matrices and by scalars) as the training sets, three distributed fusion algorithms based on the BP network (BP net-based fusion weighted by matrices, by diagonal matrices and by scalars) are proposed, and a criterion for selecting the number of hidden-layer nodes is given. Furthermore, the distributed fusion algorithm based on the BP network (BP net-based distributed fusion algorithm) that uses local estimates as network inputs and true states as the training set is analyzed for the case in which the true states are available but the cross-covariance matrices are not. Finally, using local estimates as inputs and the centralized fusion estimate as the training set, a recurrent net-based distributed fusion algorithm is proposed for the case in which neither the true states nor the cross-covariance matrices are available. This method is not limited to the LMV criterion, so its accuracy is higher than that of the three classical distributed fusion algorithms.
Notations: $\mathbb{R}^n$ denotes the n-dimensional Euclidean space; $I_{n\times n}$ is the $n\times n$ identity matrix; $E$ denotes the mathematical expectation; superscripts $T$ and $-1$ denote the transpose and inverse, respectively; $\delta_{tk}$ is the Kronecker delta function (i.e., $\delta_{tt} = 1$ and $\delta_{tk} = 0$ for $t \neq k$); $\mathrm{tr}\,P$ denotes the trace of the matrix $P$; $P^{(ij)}_{k|k-\gamma} = E[\tilde{x}^{(i)}_{k|k-\gamma}\tilde{x}^{(j)T}_{k|k-\gamma}]$ is the estimation error cross-covariance matrix, and $P^{(ii)}_{k|k-\gamma}$ will be abbreviated to $P^{(i)}_{k|k-\gamma}$.

II. PROBLEM FORMULATION
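Consider the discrete linear time-invariant system with multiple sensors
$$x_{k+1} = \Phi x_k + \Gamma w_k \tag{1}$$
$$z^{(j)}_k = H^{(j)} x_k + v^{(j)}_k,\quad j = 1,\dots,L \tag{2}$$
where $x_k \in \mathbb{R}^n$ is the state; $z^{(j)}_k \in \mathbb{R}^{m_j}$ ($j = 1,\dots,L$) are the measurements; $w_k \in \mathbb{R}^r$ is the process noise; $v^{(j)}_k \in \mathbb{R}^{m_j}$ ($j = 1,\dots,L$) are the measurement noises; and $\Phi$, $\Gamma$, $H^{(j)}$ are time-invariant matrices with suitable dimensions.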
Assumption 1: $w_k$ and $v^{(j)}_k$ ($j = 1,\dots,L$) are uncorrelated white noises with zero mean and covariances
$$E\left\{\begin{bmatrix} w_t \\ v^{(j)}_t \end{bmatrix}\begin{bmatrix} w_k^T & v^{(l)T}_k \end{bmatrix}\right\} = \begin{bmatrix} Q_w & 0 \\ 0 & R^{(j)}\delta_{jl} \end{bmatrix}\delta_{tk}. \tag{3}$$
Our aim is to obtain an estimate $\hat{x}_{k|k}$ as close to $x_k$ as possible, based on the measurements $z^{(j)}_0,\dots,z^{(j)}_k$ ($j = 1,\dots,L$). There are two basic structures for this type of fusion estimation: the centralized fusion estimation framework and the distributed fusion estimation framework. When neural networks are used for fusion and estimation, the two frameworks are as shown in FIGURE 1 and FIGURE 2, respectively. In this paper, we choose the distributed fusion estimation framework for the following two reasons:

1) The centralized fusion estimation framework shown in FIGURE 1 needs to transmit the raw measurements. Because of limited transmission bandwidth and energy, the sensor outputs must be simple, which rules out many sensors, such as imaging sensors. In contrast, the communication network in the distributed fusion estimation framework only needs to transmit the states of interest, which makes the network more efficient and faster.
2) The Kalman filter is optimal for systems (1) and (2) [38]. This means that, in the distributed fusion estimation framework shown in FIGURE 2, no information is lost before the neural network, and the neural network only needs to fuse the local estimates. In contrast, the neural network in the centralized fusion estimation framework must perform estimation and fusion simultaneously, which requires an iterative structure (i.e., a neural network with feedback [39]) and more hidden-layer nodes. This complexity makes the network harder to construct and train and increases its computational cost during operation.
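To make the distributed framework of FIGURE 2 concrete, the sketch below runs one standard Kalman filter per sensor on that sensor's own measurements (these are the local estimates a fusion network would receive) and, for comparison, one filter on the stacked measurements of all sensors (the centralized filter of Lemma 4). It is a minimal sketch under stated assumptions, not the authors' implementation: the constant-velocity matrices `Phi`, `Gamma`, `H`, the identity `Qw` and the initial covariance `P0` are hypothetical choices, since the model matrices are not reproduced here; only `T` and the $R^{(j)}$ values are taken from Section V.

```python
import numpy as np

# Hypothetical constant-velocity model; only T and the R^(j) values come from Section V.
T = 0.1
Phi = np.array([[1, T, 0, 0], [0, 1, 0, 0], [0, 0, 1, T], [0, 0, 0, 1]], dtype=float)
Gamma = np.array([[T**2 / 2, 0], [T, 0], [0, T**2 / 2], [0, T]])
Qw = np.eye(2)
H = np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)  # assumed: each sensor observes positions
Rs = [np.diag([0.33**2, 0.38**2]), np.diag([0.35**2, 0.28**2]), np.diag([0.31**2, 0.22**2])]


def kalman_filter(zs, Phi, Gamma, Qw, H, R, x0, P0):
    """Standard Kalman filter; returns filtered estimates x_{k|k} and covariances P_{k|k}."""
    x, P = x0.copy(), P0.copy()
    GQG = Gamma @ Qw @ Gamma.T
    xs, Ps = [], []
    for z in zs:
        x, P = Phi @ x, Phi @ P @ Phi.T + GQG      # time update
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T            # K = P H^T S^{-1}
        x = x + K @ (z - H @ x)                    # measurement update
        P = (np.eye(len(x)) - K @ H) @ P
        xs.append(x)
        Ps.append(P)
    return np.array(xs), np.array(Ps)


def simulate(steps, rng):
    """Simulate x_{k+1} = Phi x_k + Gamma w_k and z^(j)_k = H x_k + v^(j)_k."""
    x = np.zeros(4)
    xs, zs = [], [[] for _ in Rs]
    for _ in range(steps):
        x = Phi @ x + Gamma @ rng.multivariate_normal(np.zeros(2), Qw)
        xs.append(x)
        for j, R in enumerate(Rs):
            zs[j].append(H @ x + rng.multivariate_normal(np.zeros(2), R))
    return np.array(xs), [np.array(z) for z in zs]


rng = np.random.default_rng(0)
x_true, z_local = simulate(200, rng)
x0, P0 = np.zeros(4), np.eye(4)

# Distributed framework: each node filters only its own measurements.
local = [kalman_filter(z, Phi, Gamma, Qw, H, R, x0, P0) for z, R in zip(z_local, Rs)]

# Centralized filter: stacked measurements, stacked H, block-diagonal R.
R_c = np.zeros((6, 6))
for j, R in enumerate(Rs):
    R_c[2 * j:2 * j + 2, 2 * j:2 * j + 2] = R
centralized = kalman_filter(np.hstack(z_local), Phi, Gamma, Qw, np.vstack([H] * 3), R_c, x0, P0)
```

Each `local[j]` holds $\hat{x}^{(j)}_{k|k}$ and $P^{(j)}_{k|k}$; the cross-covariances $P^{(ij)}_{k|k}$ needed by the classical fusion rules require an additional recursion that this sketch omits.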

III. CLASSICAL FUSION ALGORITHMS
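Lemma 1 [11]: (The optimal fusion algorithm weighted by matrices in the sense of LMV) Let $\hat{x}^{(j)}_{k|k}$ ($j = 1,\dots,L$) be unbiased estimators of $x_k \in \mathbb{R}^n$ based on the measurements $z^{(j)}_0,\dots,z^{(j)}_k$, with estimation errors $\tilde{x}^{(j)}_{k|k} = x_k - \hat{x}^{(j)}_{k|k}$ and error cross-covariance matrices $P^{(ij)}_{k|k} = E[\tilde{x}^{(i)}_{k|k}\tilde{x}^{(j)T}_{k|k}]$. Then the optimal fusion estimator $\hat{x}^{(M)}_{k|k}$ weighted by matrices in the sense of LMV is given by
$$\hat{x}^{(M)}_{k|k} = \sum_{j=1}^{L} A^{(j)}_k \hat{x}^{(j)}_{k|k} \tag{4}$$
where the optimal matrix weights $A^{(j)}_k$ are computed by
$$\left[A^{(1)}_k,\dots,A^{(L)}_k\right] = \left(e^T P_{k|k}^{-1} e\right)^{-1} e^T P_{k|k}^{-1}$$
where $P_{k|k} = \left(P^{(ij)}_{k|k}\right)_{i,j=1,\dots,L}$ is an $nL\times nL$ symmetric positive definite matrix, and $e = [I_n,\dots,I_n]^T$ is an $nL\times n$ matrix. The corresponding fusion filtering error covariance matrix $P^{(M)}_{k|k}$ is
$$P^{(M)}_{k|k} = \left(e^T P_{k|k}^{-1} e\right)^{-1}$$
and $\mathrm{tr}\,P^{(M)}_{k|k} \le \mathrm{tr}\,P^{(j)}_{k|k}$, $j = 1,\dots,L$.

Lemma 2 [12]: (The optimal fusion algorithm weighted by scalars in the sense of LMV) Let $\hat{x}^{(j)}_{k|k}$ ($j = 1,\dots,L$) be unbiased estimators of $x_k \in \mathbb{R}^n$ based on the measurements $z^{(j)}_0,\dots,z^{(j)}_k$, with error cross-covariance matrices $P^{(ij)}_{k|k}$. Then the optimal fusion estimator $\hat{x}^{(S)}_{k|k}$ weighted by scalars in the sense of LMV is given by
$$\hat{x}^{(S)}_{k|k} = \sum_{j=1}^{L} a^{(j)}_k \hat{x}^{(j)}_{k|k} \tag{9}$$
where the optimal fusion scalar weights $a_k = [a^{(1)}_k,\dots,a^{(L)}_k]$ are computed by
$$a_k = \left(e^T \left(P^{tr}_{k|k}\right)^{-1} e\right)^{-1} e^T \left(P^{tr}_{k|k}\right)^{-1}$$
where $P^{tr}_{k|k} = \left(\mathrm{tr}\,P^{(ij)}_{k|k}\right)_{i,j=1,\dots,L}$ is an $L\times L$ positive definite matrix, and $e = [1,\dots,1]^T$ is an $L\times 1$ vector. The fusion filtering error covariance matrix $P^{(S)}_{k|k}$ is
$$P^{(S)}_{k|k} = \sum_{i=1}^{L}\sum_{j=1}^{L} a^{(i)}_k a^{(j)}_k P^{(ij)}_{k|k}$$
and $\mathrm{tr}\,P^{(S)}_{k|k} \le \mathrm{tr}\,P^{(j)}_{k|k}$, $j = 1,\dots,L$.

Lemma 3 [13]: (The optimal fusion algorithm weighted by diagonal matrices in the sense of LMV) Let $\hat{x}^{(j)}_{k|k}$ ($j = 1,\dots,L$) be unbiased estimators of $x_k \in \mathbb{R}^n$ as in Lemma 1. Then the optimal fusion estimator $\hat{x}^{(D)}_{k|k}$ weighted by diagonal matrices in the sense of LMV is given by
$$\hat{x}^{(D)}_{k|k} = \sum_{j=1}^{L} A^{(j)}_k \hat{x}^{(j)}_{k|k} \tag{14}$$
where the diagonal matrix weights are
$$A^{(j)}_k = \mathrm{diag}\left(a^{(j)}_{1,k|k},\dots,a^{(j)}_{n,k|k}\right).$$
The state $x_k$ and the local estimators $\hat{x}^{(j)}_{k|k}$ are rewritten componentwise as
$$x_k = [x_{1,k},\dots,x_{n,k}]^T,\qquad \hat{x}^{(j)}_{k|k} = \left[\hat{x}^{(j)}_{1,k|k},\dots,\hat{x}^{(j)}_{n,k|k}\right]^T$$
then Equation (14) can be rewritten as
$$\hat{x}^{(D)}_{i,k|k} = \sum_{j=1}^{L} a^{(j)}_{i,k|k}\,\hat{x}^{(j)}_{i,k|k},\quad i = 1,\dots,n$$
where $a_{i,k|k} = \left[a^{(1)}_{i,k|k},\dots,a^{(L)}_{i,k|k}\right]$ are calculated by
$$a_{i,k|k} = \left(e^T P_{ii,k|k}^{-1} e\right)^{-1} e^T P_{ii,k|k}^{-1}$$
where $e = [1,\dots,1]^T$ is $L\times 1$ and $P_{ii,k|k} = \left(P^{(lm)}_{ii,k|k}\right)_{l,m=1,\dots,L}$ is the $L\times L$ positive definite matrix whose entry $P^{(lm)}_{ii,k|k}$ is the $(i,i)$th diagonal element of $P^{(lm)}_{k|k}$. The fusion filtering error covariance is
$$P^{(D)}_{k|k} = \sum_{l=1}^{L}\sum_{m=1}^{L} A^{(l)}_k P^{(lm)}_{k|k} A^{(m)T}_k$$
and $\mathrm{tr}\,P^{(D)}_{k|k} \le \mathrm{tr}\,P^{(j)}_{k|k}$, $j = 1,\dots,L$.

Lemma 4 [3]: (Centralized fusion) The measurement equation of centralized fusion is
$$z^{(0)}_k = H^{(0)} x_k + v^{(0)}_k \tag{22}$$
where $z^{(0)}_k = \left[z^{(1)T}_k,\dots,z^{(L)T}_k\right]^T$, $H^{(0)} = \left[H^{(1)T},\dots,H^{(L)T}\right]^T$, $v^{(0)}_k = \left[v^{(1)T}_k,\dots,v^{(L)T}_k\right]^T$, and the covariance of $v^{(0)}_k$ is $R^{(0)} = \mathrm{diag}\left(R^{(1)},\dots,R^{(L)}\right)$, where 'diag' denotes a block-diagonal matrix. For the centralized fusion system given by Equations (1) and (22), applying the Kalman filter yields the centralized fusion filter $\hat{x}^{(C)}_{k|k}$.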
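The three weighting rules of Lemmas 1 to 3 can be sketched directly from their formulas. The helper names below (`fuse_matrix`, `fuse_scalar`, `fuse_diagonal`, `split_blocks`) are hypothetical, and the joint covariance in the usage lines is random test data rather than one produced by local Kalman filters; the point is only to show the computation and the expected ordering of the fused error traces.

```python
import numpy as np


def split_blocks(P, n, L):
    """Split an nL x nL joint covariance into the L x L grid of n x n blocks P^(ij)."""
    return [[P[i * n:(i + 1) * n, j * n:(j + 1) * n] for j in range(L)] for i in range(L)]


def fuse_matrix(Pb):
    """Lemma 1: [A^(1),...,A^(L)] = (e^T P^-1 e)^-1 e^T P^-1, P^(M) = (e^T P^-1 e)^-1."""
    L, n = len(Pb), Pb[0][0].shape[0]
    P = np.block(Pb)
    e = np.tile(np.eye(n), (L, 1))                 # nL x n
    Pinv_e = np.linalg.solve(P, e)                 # P^-1 e
    S = e.T @ Pinv_e                               # e^T P^-1 e
    A = np.linalg.solve(S, Pinv_e.T)               # n x nL (uses P symmetric)
    return [A[:, j * n:(j + 1) * n] for j in range(L)], np.linalg.inv(S)


def fuse_scalar(Pb):
    """Lemma 2: scalar weights from P^tr = (tr P^(ij)); P^(S) = sum_ij a_i a_j P^(ij)."""
    L = len(Pb)
    Ptr = np.array([[np.trace(Pb[i][j]) for j in range(L)] for i in range(L)])
    s = np.linalg.solve(Ptr, np.ones(L))
    a = s / s.sum()
    PS = sum(a[i] * a[j] * Pb[i][j] for i in range(L) for j in range(L))
    return a, PS


def fuse_diagonal(Pb):
    """Lemma 3: one scalar-weight problem per state component i, using P_ii = (P^(lm)_ii)."""
    L, n = len(Pb), Pb[0][0].shape[0]
    a = np.zeros((L, n))
    for i in range(n):
        Pii = np.array([[Pb[l][m][i, i] for m in range(L)] for l in range(L)])
        s = np.linalg.solve(Pii, np.ones(L))
        a[:, i] = s / s.sum()
    A = [np.diag(a[j]) for j in range(L)]
    PD = sum(A[l] @ Pb[l][m] @ A[m].T for l in range(L) for m in range(L))
    return A, PD


def fuse(weights, estimates):
    """x_fused = sum_j W_j x_j, where W_j is an n x n matrix or a scalar."""
    return sum(np.dot(W, x) for W, x in zip(weights, estimates))


# Usage with a random (hypothetical) positive definite joint covariance.
rng = np.random.default_rng(1)
n, L = 4, 3
B = rng.normal(size=(n * L, n * L))
Pb = split_blocks(B @ B.T + n * L * np.eye(n * L), n, L)
(AM, PM), (AD, PD), (aS, PS) = fuse_matrix(Pb), fuse_diagonal(Pb), fuse_scalar(Pb)
x_hat = fuse(AM, [rng.normal(size=n) for _ in range(L)])
```

In each case the weights sum to the identity (or to 1), which is what makes the fused estimator unbiased, and the traces satisfy $\mathrm{tr}\,P^{(M)} \le \mathrm{tr}\,P^{(D)} \le \mathrm{tr}\,P^{(S)}$ because each rule optimizes over a larger family than the next.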

IV. FUSION ALGORITHMS BASED ON BP NETWORK
A. BP NET-BASED FUSION ALGORITHMS TRAINED BY CLASSICAL FUSION ESTIMATIONS
The topology of the BP network with a single hidden layer is shown in FIGURE 3 [36]. It can store a mapping relationship by learning, without knowing the explicit mathematical expression between input and output beforehand. In 1989, Cybenko et al. analyzed the nonlinear function approximation performance of BP neural networks and proved that a continuous feedforward neural network with a single hidden layer and sigmoid transfer functions can approximate continuous functions with arbitrary accuracy [31]-[33]. In this paper, a fusion network is proposed based on this approximation performance of the BP network. After training, the proposed fusion networks can reproduce the three classical fusion methods (weighted by matrices, by diagonal matrices and by scalars). A criterion for selecting the number of hidden-layer nodes and their activation functions is also given. As shown in FIGURE 3, $x_j$ ($j = 1,\dots,M$) are the inputs of the input layer; $w^{(1)}_{ij}$ ($i = 1,\dots,q$; $j = 1,\dots,M$) are the weights between the input layer and the hidden layer, and $w^{(2)}_{ki}$ ($k = 1,\dots,N$; $i = 1,\dots,q$) are the weights between the hidden layer and the output layer; $f(\cdot)$ and $g(\cdot)$ are the activation functions of the hidden and output layers, respectively; $d(n) = [d_1, d_2,\dots,d_N]$ and $y(n) = [y_1, y_2,\dots,y_N]$ are the desired and actual outputs, respectively.
Using the approximation capability of the BP network to realize the classical fusion estimators raises two design questions: the number of hidden-layer nodes and the choice of activation functions.
First, since fusion Equations (4), (9) and (14) are linear, linear functions are chosen as the activation functions. This choice allows the network to reproduce the classical fusion estimators and reduces the computational cost of training and operating the network. Next, the number of hidden-layer nodes is discussed.
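A short sketch shows why linear activations suffice: with identity $f(\cdot)$, $g(\cdot)$ and zero thresholds, the single-hidden-layer network of FIGURE 3 collapses to the linear map $y = W^{(2)}W^{(1)}x$, i.e. a fusion rule with one $n\times nL$ weight matrix. The function `bp_forward` is hypothetical, and subtracting the thresholds (rather than adding them) is an assumed sign convention.

```python
import numpy as np


def bp_forward(x, W1, W2, theta, a, f=lambda s: s, g=lambda s: s):
    """Single-hidden-layer network (thresholds subtracted, an assumed convention).

    x: (M,) input, W1: (q, M) weights w^(1)_ij, W2: (N, q) weights w^(2)_ki,
    theta: (q,) hidden thresholds, a: (N,) output thresholds; f, g default to identity.
    """
    hidden = f(W1 @ x - theta)
    return g(W2 @ hidden - a)


rng = np.random.default_rng(2)
n, L, q = 4, 3, 4                        # M = n * L inputs, q hidden nodes, n outputs
W1, W2 = rng.normal(size=(q, n * L)), rng.normal(size=(n, q))
x = rng.normal(size=n * L)               # stacked local estimates
y = bp_forward(x, W1, W2, np.zeros(q), np.zeros(n))
```

Here `y` equals `(W2 @ W1) @ x`; with nonzero thresholds the map is no longer homogeneous (a zero input gives a nonzero output), which is why the thresholds are set to zero for the linear homogeneous fusion equations.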
Theorem 1: If a BP network with a single hidden layer is, after training, equivalent to the optimal fusion weighted by matrices in the sense of LMV, then the number of hidden-layer nodes $q$ must satisfy $q \ge n$ ($n$ is the dimension of the state).
Proof: Let $\hat{x}_i(t|t)$ ($i = 1,\dots,L$) be the local optimal unbiased estimates. According to the distributed fusion estimation framework shown in FIGURE 2 and the BP network topology shown in FIGURE 3, the numbers of nodes in the input, hidden and output layers are $M$ ($M = n\times L$), $q$ and $n$, respectively.
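Both the hidden-layer and the output-layer activation functions are selected as linear transfer functions. The performance index of the network is
$$F(w) = \sum_{\gamma=1}^{Q} e_\gamma^T e_\gamma = e^T e \tag{26}$$
where $w \in \mathbb{R}^{((M+n)\times q)\times 1}$ is the weight vector, $Q$ is the number of training patterns, $e_\gamma = d_\gamma - y_\gamma \in \mathbb{R}^{n}$ ($\gamma = 1,\dots,Q$) is the error for the $\gamma$th input, and $e = [e_{11},\dots,e_{1n},\dots,e_{Q1},\dots,e_{Qn}]^T \in \mathbb{R}^{N\times 1}$ ($N = Q\times n$) stacks the errors of all training patterns, with the desired and actual outputs stacked in the same way.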
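Minimizing the performance index (26) with respect to the weight vector leads to the system of equations
$$J^T(w)\,e(w) = 0 \tag{27}$$
where the Jacobian matrix $J \in \mathbb{R}^{(Q\times n)\times U}$, $U = (M+n)\times q$, and the error vector are defined as
$$J(w) = \frac{\partial e(w)}{\partial w^T},\qquad e(w) = [e_{11},\dots,e_{1n},\dots,e_{Q1},\dots,e_{Qn}]^T. \tag{28}$$
If $J^TJ$ is nonsingular ($\mathrm{Rank}(J^TJ) = (M+n)\times q$), the least squares (Gauss-Newton) solution for the optimal weights $w^*$ is obtained iteratively from the current weights $w$ as
$$w^* = w - \left(J^TJ\right)^{-1}J^T e. \tag{29}$$
To prevent $J^TJ$ in Equation (29) from being singular, the LM (Levenberg-Marquardt) algorithm is used here; that is, Equation (29) is rewritten as
$$w^* = w - \left(J^TJ + \mu I\right)^{-1}J^T e \tag{30}$$
where $\mu > 0$ is the damping parameter. According to the BP network shown in Fig. 3, the output of the $k$th node of the output layer is
$$y_k = g\left(\sum_{i=1}^{q} w^{(2)}_{ki}\, f\left(\sum_{j=1}^{M} w^{(1)}_{ij}x_j - \theta_i\right) - a_k\right) \tag{31}$$
where $\theta_i$ is the threshold of the $i$th node in the hidden layer and $a_k$ is the threshold of the $k$th node in the output layer. Since fusion equations (4), (9) and (14) are linear and homogeneous, the thresholds $\theta_i$ and $a_k$ are set to zero. With linear $f(\cdot)$ and $g(\cdot)$, the output $y_\gamma$ is then
$$y_\gamma = W^{(2)}W^{(1)}x_\gamma \tag{32}$$
where $W^{(1)} = \left(w^{(1)}_{ij}\right) \in \mathbb{R}^{q\times M}$, $W^{(2)} = \left(w^{(2)}_{ki}\right) \in \mathbb{R}^{n\times q}$, and $x_\gamma = \left[\hat{x}^{(1)T}_{k|k},\dots,\hat{x}^{(L)T}_{k|k}\right]^T$ is the stacked vector of local estimates. Rewrite Equation (4) as
$$\hat{x}^{(M)}_{k|k} = A\,x_\gamma,\qquad A = \left[A^{(1)}_k,\dots,A^{(L)}_k\right] \in \mathbb{R}^{n\times nL}. \tag{33}$$
Comparing Equations (32) and (33), if the BP network is equivalent to the optimal fusion estimation weighted by matrices, then
$$W^{(2)}W^{(1)} = A. \tag{34}$$
From the weight formula of Lemma 1, $Ae = \sum_{j=1}^{L}A^{(j)}_k = I_n$, so $\mathrm{Rank}(A) = n$. Since $\mathrm{Rank}\left(W^{(2)}W^{(1)}\right) \le q$, Equation (34) can hold only if $q \ge n$. Conversely, by the full-rank decomposition, for any $q \ge n$ there exist a full-row-rank matrix $W^{(2)} \in \mathbb{R}^{n\times q}$ and a matrix $W^{(1)} \in \mathbb{R}^{q\times M}$ satisfying Equation (34). Hence $q \ge n$. The proof is completed.

38178 VOLUME 8, 2020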
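The LM update of Equation (30) can be sketched for the linear, threshold-free network $y = W^{(2)}W^{(1)}x$. This is a minimal illustration, not the authors' training code: the function name `lm_train_linear`, the random initialization, and the factor-of-10 rule for adapting $\mu$ are assumptions (the text does not state how $\mu$ is updated). The Jacobian is built analytically from $\partial y_k/\partial w^{(1)}_{ij} = w^{(2)}_{ki}x_j$ and $\partial y_k/\partial w^{(2)}_{ki} = h_i$, with $h = W^{(1)}x$.

```python
import numpy as np


def lm_train_linear(X, D, q, iters=200, mu=1e-2, seed=0):
    """Levenberg-Marquardt training of y = W2 W1 x (linear f, g; zero thresholds).

    X: (M, Q) inputs (one column per pattern), D: (n, Q) desired outputs.
    Returns W1 (q, M) and W2 (n, q).
    """
    M, Q = X.shape
    n = D.shape[0]
    w = np.random.default_rng(seed).normal(scale=0.5, size=q * M + n * q)

    def unpack(w):
        return w[:q * M].reshape(q, M), w[q * M:].reshape(n, q)

    def errors(w):
        W1, W2 = unpack(w)
        return (D - W2 @ W1 @ X).T.ravel()        # e = [e_11..e_1n, ..., e_Q1..e_Qn]

    def jacobian(w):
        W1, W2 = unpack(w)
        Hd = W1 @ X                               # hidden-layer outputs h = W1 x
        # dy_k/dW1[i, j] = W2[k, i] x_j and dy_k/dW2[k, i] = h_i; e = d - y flips the sign
        rows = [np.hstack([np.kron(W2, X[:, g]), np.kron(np.eye(n), Hd[:, g])])
                for g in range(Q)]
        return -np.vstack(rows)                   # (Q*n) x U with U = (M + n) * q

    e = errors(w)
    for _ in range(iters):
        if e @ e < 1e-24:
            break
        J = jacobian(w)
        step = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)   # Equation (30)
        e_new = errors(w + step)
        if e_new @ e_new < e @ e:
            w, e, mu = w + step, e_new, max(mu / 10, 1e-12)   # accept step, reduce damping
        else:
            mu *= 10                                          # reject step, increase damping
    return unpack(w)


rng = np.random.default_rng(3)
n, L = 2, 2
A1 = rng.normal(size=(n, n))
A = np.hstack([A1, np.eye(n) - A1])        # hypothetical matrix weights with A e = I_n
X = rng.normal(size=(n * L, 40))           # hypothetical stacked local estimates
W1, W2 = lm_train_linear(X, A @ X, q=n)    # training targets: matrix-weighted fusion
```

With $q = n$ the product `W2 @ W1` recovers the target weights `A`; rerunning with `q=1` cannot, since a rank-1 product cannot equal a rank-$n$ matrix.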
Remark 1: Theorem 1 shows that if a BP network with a single hidden layer is used to realize the optimal fusion weighted by matrices, the number of hidden nodes $q$ must be at least $n$. When $q < n$, the network cannot realize the optimal matrix-weighted fusion for part of the state components. When $q > n$, some weight combinations in the network are linearly dependent, which increases the network complexity without improving the fusion result. Theorem 1 therefore improves the efficiency of the BP network by eliminating unnecessary hidden nodes.
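Remark 1 can be checked numerically with the truncated SVD (Eckart-Young theorem), which gives the closest product $W^{(2)}W^{(1)}$ to a target weight matrix $A$ for a given number of hidden nodes $q$. The function `best_factorization` and the random weights are hypothetical; hidden nodes beyond $\mathrm{Rank}(A)$ simply receive zero weights.

```python
import numpy as np


def best_factorization(A, q):
    """Best W2 (n x q), W1 (q x M) with W2 @ W1 closest to A in Frobenius norm.

    Uses the truncated SVD; hidden nodes beyond rank(A) get zero weights.
    """
    n, M = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(q, len(s))
    W2, W1 = np.zeros((n, q)), np.zeros((q, M))
    W2[:, :r] = U[:, :r] * s[:r]
    W1[:r] = Vt[:r]
    return W1, W2, np.linalg.norm(A - W2 @ W1)


rng = np.random.default_rng(4)
n, L = 3, 2
A1 = rng.normal(size=(n, n))
A = np.hstack([A1, np.eye(n) - A1])       # hypothetical weights; A e = I_n, so rank(A) = n
errors = {q: best_factorization(A, q)[2] for q in range(1, n + 3)}
```

`errors[q]` is strictly positive for $q < n$ (part of the state cannot be fused optimally) and zero up to rounding for every $q \ge n$, so extra hidden nodes add parameters without reducing the error.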
Similarly to Theorem 1, we obtain the following deduction.
Deduction 1: If a BP network with a single hidden layer is, after training, equivalent to the optimal fusion weighted by diagonal matrices in the sense of LMV, then the number of hidden-layer nodes $q$ must satisfy $q \ge n$ ($n$ is the dimension of the state).
Remark 2: In order of decreasing accuracy, the three classical fusion algorithms are fusion weighted by matrices, by diagonal matrices and by scalars. However, the minimum number of hidden-layer nodes of the BP network does not change; only some of the weights in the BP network are reduced to zero. That is, the three BP net-based fusion algorithms (weighted by matrices, by diagonal matrices and by scalars) have the same efficiency but differ in accuracy and in their training sets.
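Remark 2 can be illustrated by writing scalar and diagonal weights in the same $n\times nL$ form $A$ used by the network. The helper names and weight values below are hypothetical. Both structured matrices still satisfy $Ae = I_n$ and so have rank $n$, which means the bound $q \ge n$ is unchanged, but only $nL$ of the $n^2L$ entries are nonzero.

```python
import numpy as np


def scalar_as_matrix(a, n):
    """Scalar weights written as A = [a^(1) I_n, ..., a^(L) I_n] (n x nL)."""
    return np.hstack([aj * np.eye(n) for aj in a])


def diagonal_as_matrix(diags):
    """Diagonal weights written as A = [diag(a^(1)), ..., diag(a^(L))] (n x nL)."""
    return np.hstack([np.diag(d) for d in diags])


n, L = 4, 3
a = np.array([0.5, 0.3, 0.2])                      # hypothetical scalar weights, sum = 1
diags = np.array([[0.6, 0.5, 0.4, 0.2],
                  [0.3, 0.3, 0.4, 0.5],
                  [0.1, 0.2, 0.2, 0.3]])           # hypothetical; each column sums to 1
AS, AD = scalar_as_matrix(a, n), diagonal_as_matrix(diags)
e = np.tile(np.eye(n), (L, 1))                     # e = [I_n, ..., I_n]^T
```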

B. BP NET-BASED FUSION ALGORITHM TRAINED BY TRUE STATE
For BP net-based fusion algorithms trained by classical fusion algorithms, the process and measurement noise covariance matrices are needed to obtain the local estimates used as network inputs, and the cross-covariance matrices are also needed to obtain the fusion estimates used for training. However, in many cases the cross-covariance matrices are difficult or even impossible to obtain [39], so the training set cannot be constructed and the network cannot be trained. In this case, the true state can be used as the training set. Training BP networks with the true state is widely used in multi-sensor information fusion systems. However, there is no established basis for choosing the activation functions and the number of hidden-layer nodes [20]-[26]. The following theorem gives some results on the BP net-based fusion algorithm trained by the true state.
Theorem 2: If the process noise $w_k$ and the measurement noises $v^{(j)}_k$ ($j = 1,\dots,L$) are linearly related, the fusion BP network can reach the true state through training when linear functions are selected as the activation functions.
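Proof: The inputs of the fusion BP network are the local Kalman filters $\hat{x}^{(j)}_{k|k}$ ($j = 1,\dots,L$), which are linear combinations of the initial state $x_0$, the process noises $w_0,\dots,w_k$ and the measurement noises $v^{(j)}_0,\dots,v^{(j)}_k$, that is,
$$\hat{x}^{(j)}_{k|k} = L^{(j)}_k\left(x_0, w_0,\dots,w_k, v^{(j)}_0,\dots,v^{(j)}_k\right)$$
where $L^{(j)}_k(\cdot)$ represents a linear transformation. The state $x_k$ is a linear combination of the initial state $x_0$ and the process noises $w_0,\dots,w_k$, that is,
$$x_k = L^{(x)}_k\left(x_0, w_0,\dots,w_k\right)$$
where $L^{(x)}_k(\cdot)$ represents a linear transformation. Then the output $y_\gamma$ in Equation (32) can be rewritten as
$$y_\gamma = \bar{L}_k\left(x_0, w_0,\dots,w_k\right) + \tilde{L}_k\left(v^{(1)}_0,\dots,v^{(L)}_k\right)$$
where $\bar{L}_k(\cdot)$ and $\tilde{L}_k(\cdot)$ represent linear transformations. Because the measurement noises are random, $y_\gamma$ can be equivalent to $x_k = L^{(x)}_k(\cdot)$ only when $v^{(j)}_k$ ($j = 1,\dots,L$) can be linearly represented by $w_k$, so that the measurement-noise term is absorbed into the process-noise term. Since the inputs of the BP network are produced by linear Kalman filters, the activation functions of the BP network should be linear. The proof is completed.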
Deduction 2: If the process noise $w_k$ and the measurement noises $v^{(j)}_k$ ($j = 1,\dots,L$) have a deterministic nonlinear relationship, the fusion BP network can approximate the true state when nonlinear functions are selected as the activation functions.
Remark 3: If the process noise $w_k$ and the measurement noises $v^{(j)}_k$ ($j = 1,\dots,L$) are independent, the fusion BP network cannot obtain a fixed set of weights. That is, when the measurement noises $v^{(j)}_k$ ($j = 1,\dots,L$) used to generate the training set differ, the weights obtained from training also differ.
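Theorem 2 and Remark 3 can be illustrated with a deliberately simplified toy model: the local Kalman filters are replaced by noisy copies of the state, $\hat{x}^{(j)} = x + \varepsilon^{(j)}$ with independent noises (an assumption made for brevity), and a linear fuser $y = Au$ is fitted to the true states by least squares. The function `fit_true_state` and all numbers are hypothetical. With independent noises the fitted network does not reach the true state (the residual stays well above zero), and two training sets generated with different noise realizations give different weights.

```python
import numpy as np


def fit_true_state(n, Q, sigmas, seed):
    """Least squares fit of y = A u to true states x; u stacks len(sigmas) local estimates.

    Toy assumption: local estimate j is x plus independent noise with std sigmas[j].
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, Q))                                   # true states (targets)
    U = np.vstack([X + s * rng.normal(size=(n, Q)) for s in sigmas])
    At, *_ = np.linalg.lstsq(U.T, X.T, rcond=None)                # solves U^T A^T ~ X^T
    A = At.T
    rms = np.sqrt(np.mean((X - A @ U) ** 2))                      # remaining fusion error
    return A, rms


A_a, rms_a = fit_true_state(n=2, Q=200, sigmas=[0.3, 0.4, 0.5], seed=10)
A_b, rms_b = fit_true_state(n=2, Q=200, sigmas=[0.3, 0.4, 0.5], seed=11)
```

The two training sets differ only in their noise realizations, yet `A_a` and `A_b` are not equal, which is the behavior Remark 3 describes for finite training sets.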

C. RECURRENT NET-BASED FUSION ALGORITHM TRAINED BY CENTRALIZED FUSION ESTIMATIONS
For BP net-based fusion algorithms trained by classical fusion algorithms, much prior knowledge (the noise covariance and cross-covariance matrices) must be available.
For the BP net-based fusion algorithm trained by the true state, the true state must be known, and the network weights differ for different measurement noises. Given the limitations of these two training methods, and in order to learn the optimal result, we choose the centralized fusion result as the training set (centralized fusion is regarded as the globally optimal estimate because no information is lost).
Theorem 3: The recurrent network can approach the centralized fusion through training when linear functions are selected as the activation functions.
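Proof: The centralized fusion estimate $\hat{x}^{(C)}_{k|k}$ is a linear combination of the initial state $x_0$, the process noises $w_0,\dots,w_k$ and the measurement noises $v^{(1)}_0,\dots,v^{(L)}_k$, that is,
$$\hat{x}^{(C)}_{k|k} = L^{(C)}_k\left(x_0, w_0,\dots,w_k, v^{(1)}_0,\dots,v^{(L)}_k\right)$$
where $L^{(C)}_k(\cdot)$ represents a linear transformation. The output $y_\gamma$ of a linear recurrent network is also a linear combination of its current and past inputs (here the inputs are $\hat{x}^{(j)}_{k|k}$, $j = 1,\dots,L$), so it can be written as
$$y_\gamma = L^{(y)}_k\left(\hat{x}^{(1)}_{0|0},\dots,\hat{x}^{(L)}_{k|k}\right)$$
where $L^{(y)}_k(\cdot)$ represents a linear transformation. Since each local estimate is itself linear in $x_0$, the process noises and the measurement noises, $y_\gamma$ is a linear combination of the same quantities as $\hat{x}^{(C)}_{k|k}$. In theory, a linear network can therefore learn this relationship, and because of the similarity between its structure and the Kalman filter, a recurrent network is chosen here. The proof is completed.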
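Theorem 3 can be illustrated on a deliberately small case: a scalar state observed by two sensors, with all numbers hypothetical. Instead of training an Elman network by backpropagation, the sketch swaps in a simpler stand-in, a linear recurrent fuser $y_k = c^T[\hat{x}^{(1)}_{k|k}, \hat{x}^{(2)}_{k|k}, \hat{x}^{(1)}_{k-1|k-1}, \hat{x}^{(2)}_{k-1|k-1}, y_{k-1}]$ fitted by least squares with the centralized estimate fed back as $y_{k-1}$ (teacher forcing). Once the Kalman gains have reached steady state, this recurrent form reproduces the centralized filter to rounding error, while the same fit without the feedback term cannot.

```python
import numpy as np


def kf_scalar_state(z, phi, q, h, R):
    """Kalman filter for a scalar state x_{k+1} = phi x_k + w_k observed by z_k = h x_k + v_k."""
    x, p, out = 0.0, 1.0, []
    for zk in z:
        xp, pp = phi * x, phi * p * phi + q                    # time update
        K = pp * np.linalg.solve(pp * np.outer(h, h) + R, h)   # gain pp h^T S^{-1}
        x = xp + K @ (zk - h * xp)                             # measurement update
        p = (1.0 - K @ h) * pp
        out.append(x)
    return np.array(out)


rng = np.random.default_rng(5)
phi, q, r, steps = 0.95, 1.0, np.array([0.5, 0.8]), 400
x, zs = 0.0, []
for _ in range(steps):
    x = phi * x + rng.normal(scale=np.sqrt(q))
    zs.append(x + rng.normal(size=2) * np.sqrt(r))            # two sensors with h^(j) = 1
zs = np.array(zs)

local = [kf_scalar_state(zs[:, [j]], phi, q, np.ones(1), np.diag(r[[j]])) for j in range(2)]
central = kf_scalar_state(zs, phi, q, np.ones(2), np.diag(r))  # training target

# Linear recurrent fuser fitted after a burn-in, with the target fed back as y_{k-1}.
burn = 50
k = np.arange(burn, steps)
feats = np.column_stack([local[0][k], local[1][k], local[0][k - 1], local[1][k - 1],
                         central[k - 1]])
coef, *_ = np.linalg.lstsq(feats, central[k], rcond=None)
recurrent_residual = np.abs(feats @ coef - central[k]).max()

# Same fit without the feedback term, for comparison.
coef_s, *_ = np.linalg.lstsq(feats[:, :4], central[k], rcond=None)
static_residual = np.abs(feats[:, :4] @ coef_s - central[k]).max()
```

The feedback term is what lets a finite linear map carry the centralized filter's dependence on the entire measurement history, which is the structural similarity to the Kalman filter invoked in the proof.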
Remark 4: Because centralized fusion is not limited to the minimum variance criterion and loses no information, its accuracy is higher than that of the three classical distributed fusion algorithms. Consequently, the recurrent network trained to learn the centralized fusion also achieves higher accuracy than the other networks.

V. SIMULATION EXAMPLE
We consider a tracking system with three sensors.
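The state is $x_k = [x_k\ \dot{x}_k\ y_k\ \dot{y}_k]^T$, where $x_k$, $\dot{x}_k$, $y_k$, $\dot{y}_k$ are the position and velocity along the x-axis and y-axis at time $k$, and $T$ is the sampling period. $w_k$ and $v^{(j)}_k$ are independent Gaussian white noises with zero mean and variances $Q_w$ and $R^{(j)}$, respectively. In the simulation, $T = 0.1$ s, $Q_w = 1\ \mathrm{m^2/s^4}$, $R^{(1)} = \mathrm{diag}(0.33^2\ \mathrm{m^2/s^4}, 0.38^2\ \mathrm{m^2/s^4})$, $R^{(2)} = \mathrm{diag}(0.35^2\ \mathrm{m^2/s^4}, 0.28^2\ \mathrm{m^2/s^4})$, $R^{(3)} = \mathrm{diag}(0.31^2\ \mathrm{m^2/s^4}, 0.22^2\ \mathrm{m^2/s^4})$, and the initial value is $x_0 = [0\ \mathrm{m}\ \ 0\ \mathrm{m}\ \ 0\ \mathrm{m}\ \ 0\ \mathrm{m}]^T$. The estimation performance is measured by the accumulated mean square error (AMSE) in position at time $k$ [41]-[43]:
$$\mathrm{AMSE}(k) = \sum_{t=1}^{k}\frac{1}{N}\sum_{i=1}^{N}\left[\left(x^i_t - \hat{x}^i_{t|t}\right)^2 + \left(y^i_t - \hat{y}^i_{t|t}\right)^2\right]$$
where $(x^i_t, y^i_t)$ and $(\hat{x}^i_{t|t}, \hat{y}^i_{t|t})$ are the true and estimated positions of the $i$th Monte Carlo experiment at time $t$, and $N$ is the number of Monte Carlo experiments.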
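The AMSE criterion can be computed in a few lines once true and estimated positions from all Monte Carlo runs are collected. The function `amse` and the array layout `(N runs, K steps, 2)` are assumptions made for illustration.

```python
import numpy as np


def amse(true_pos, est_pos):
    """AMSE(k) for k = 1..K; arrays have shape (N runs, K steps, 2) holding (x, y) positions."""
    sq_err = ((true_pos - est_pos) ** 2).sum(axis=2)   # (N, K) squared position errors
    return np.cumsum(sq_err.mean(axis=0))              # average over runs, accumulate over t
```

For example, `amse(np.zeros((3, 4, 2)), np.ones((3, 4, 2)))` gives `[2., 4., 6., 8.]`: each step contributes a squared position error of 2 in every run.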
First, according to Theorem 1, $f(\cdot)$ and $g(\cdot)$ are set to be linear, and $\theta_i = 0$ and $a_k = 0$. Then, applying Lemma 1, Lemma 2, Lemma 3 and Lemma 4, respectively, we obtain the estimator $\hat{x}^{(M)}_{k|k}$ weighted by matrices (the fusion …

The network weight matrices of the distributed fusion algorithm based on the BP network, which uses local estimates as inputs and true states as the training set, are given in TABLES 7 and 8. The two tables show that the network weights change significantly for different training sets, i.e., the weights do not converge to a common set across training sets. This verifies Theorem 2.

VI. CONCLUSION
Two types of multi-sensor distributed fusion frameworks based on ANNs and machine learning are proposed: one based on the BP network and the other based on the Elman network. The main contributions are as follows.
(1) Using local estimates as the inputs and the estimates of three classical distributed fusion algorithms as the training sets, three distributed fusion algorithms based on the BP network are proposed. In addition, a criterion for selecting the number of hidden-layer nodes is given, which improves efficiency by eliminating unnecessary hidden nodes. These fusion algorithms require the process and measurement noise covariance matrices and the cross-covariance matrices, which limits their applicability.
(2) The BP net-based fusion algorithm that uses local estimates as inputs and true states as the training set is analyzed. Its limitations are that the true states are often unavailable and that the weights vary across training sets.
(3) Using local estimates as inputs and the centralized fusion estimate as the training set, an Elman net-based distributed fusion algorithm is proposed for the case in which neither the true states nor the cross-covariance matrices are available. This method is not limited to the minimum variance criterion, so its accuracy is higher than that of the three classical distributed fusion algorithms.