Modeling and Mitigating Errors in Belief Propagation for Distributed Detection

We study the behavior of the belief-propagation (BP) algorithm affected by erroneous data exchange in a wireless sensor network (WSN). The WSN conducts a distributed multidimensional hypothesis test over binary random variables. The joint statistical behavior of the sensor observations is modeled by a Markov random field whose parameters are used to build the BP messages exchanged between the sensing nodes. Through linearization of the BP message-update rule, we analyze the behavior of the resulting erroneous decision variables and derive closed-form relationships that describe the impact of stochastic errors on the performance of the BP algorithm. We then develop a decentralized distributed optimization framework to enhance the system performance by mitigating the impact of errors via a distributed linear data-fusion scheme. Finally, we compare the results of the proposed analysis with the existing works and visualize, via computer simulations, the performance gain obtained by the proposed optimization.


I. INTRODUCTION
Dealing with a large collection of random variables and their interactions is a common practice when designing statistical inference systems. Graphical models, a.k.a. factor graphs, which are commonly used to capture the interdependencies between correlated random variables, are known to provide a powerful framework for developing effective low-complexity inference algorithms in various fields such as wireless communications, image processing, combinatorial optimization, and machine learning; see, e.g., [1]-[3]. Belief propagation (BP) [4] is a well-known statistical inference algorithm that works based on parallel message-passing between the nodes in a factor graph. BP is sometimes referred to as the sum-product algorithm.
When working with the BP algorithm, we should bear in mind that digital computation and digital communication are both error-prone processes in general. The messages exchanged between the nodes in a wireless network can always be adversely affected by errors caused by unreliable hardware components, quantization processes, approximate representations, wireless channel impairments, etc. Even though the BP algorithm has been extensively studied in the literature, we have rather limited knowledge about how stochastic errors in messages affect the beliefs obtained and how these erroneous beliefs influence the result of statistical inference schemes implemented by the BP algorithm. This territory is difficult to explore mainly due to the nonlinearities in the BP message-passing iteration.
In [5], we have developed a systematic framework for analyzing the behavior of BP and optimizing its performance in a distributed detection scenario. In particular, we have shown that the decision variables built by the BP algorithm are, approximately, linear combinations of the local likelihoods in the network. Consequently, we have derived in [5] closed-form relationships for the system performance metrics and formulated a distributed optimization scheme to achieve a near-optimal detection performance. Moreover, we have discussed the relationship between the BP and max-product algorithms in [6], where we extend the framework proposed in [5] to optimize the performance of the max-product algorithm in a distributed detection scenario. In this paper, we further extend that framework to gain insight into the impact of computation and communication errors, in a BP iteration, on the resulting decision variables and to effectively mitigate that impact. Examples of BP being used in distributed detection can be found in [7]-[10].
Accumulation of message errors and their adverse effect on the performance of BP is analyzed in [11], where the message errors are modeled as uncorrelated random variables to find probabilistic guarantees on the magnitude of errors affecting the beliefs. Assuming uncorrelated behavior for the errors is inspired in [11] by the behavior and stability of digital filters in the presence of quantization effects, which can be analyzed reliably by assuming uncorrelated behavior in the corresponding random errors [12]. Such a modeling approach is in line with the von Neumann model of noisy circuits [13], which considers transient faults in logic gates and wires as message and node computation noise that is both spatially and temporally independent [14].
The behavior of BP implemented on noisy hardware is investigated in [15], where it is observed that, under the so-called contracting mapping condition [16], the distance between successive messages in a noise-free BP decreases with the number of iterations. Consequently, in the presence of hardware (or computation) noise, the faulty messages that violate this trend can be detected and discarded (censored) from the BP iterations. Such an approach is termed censoring BP in [15] and is shown to perform well when the hardware noise distribution has a large mass at zero and nonnegligible masses at some points sufficiently far from zero. As an alternative approach, the so-called averaging BP (ABP) is also proposed in [15]. In this method, as the name implies, an average of the messages up to the last iteration is saved and then used, instead of the actual messages, to build the beliefs. This method is proposed and its convergence is established for general zero-mean computation noise distributions. Again, the von Neumann model is used in [15] to analyze the behavior of message errors.
In this paper, we use the fact that the BP algorithm and the linear data-fusion scheme are elegantly related to each other in the context of distributed detection. Fortunately, there already exists a rich collection of works in the literature that investigate low-complexity detector structures based on linear fusion in various design scenarios [17]-[21]. In many of these works, the data-exchange process within the sensor network is assumed to be adversely affected by non-idealities in the underlying communication links. Hence, dealing with erroneous data is a familiar challenge when designing wireless sensor networks (WSNs). We use this knowledge to cope with the impact of message errors on distributed detection systems realized by BP.
In particular, we approximate the message structure in BP by a linear expression to study the impact of erroneous data exchange on the BP algorithm and to clarify how it affects the performance of the distributed detection concerned. We derive approximate expressions, in the form of mean-squared error (MSE) levels, that measure the strength of the cumulative errors affecting the BP-based decision variables. We compare the MSE levels obtained with the one in [11] to gain insight into the behavior of BP and to see how computation and communication errors propagate throughout the underlying factor graph. Moreover, based on the proposed linear approximation, we show that ABP is effective in alleviating message errors but falls short of mitigating the impact of erroneous local likelihood ratios (LLRs) on the resulting decision variables.
We also show, under practical assumptions, that the decision variables built by an erroneous BP are disturbed by a sum of independent error components whose collective impact can be modeled, approximately, by Gaussian random variables. Consequently, we establish the probability distribution of the resulting erroneous decision variables, derive the performance metrics of the BP-based distributed detection in closed form, and propose a two-stage optimal linear fusion scheme to cope with the impact of errors on the system performance. We then develop a blind adaptation algorithm to realize the proposed two-stage optimization when the statistics describing the radio environment are not available a priori.
Here is an overview of the paper organization: In Sec. II, we briefly explain the use of linear fusion and BP in distributed detection and provide the related formulations. In Sec. III, we discuss errors in BP and model their impact on the decision variables obtained. In Sec. IV, we view BP as a distributed linear fusion and formulate the proposed optimization framework. In Sec. V, we conduct computer simulations to verify our analysis and to illustrate how effectively the proposed method mitigates the impact of errors in a WSN with faulty devices. Finally, we provide our concluding remarks in Sec. VI.

II. LINEAR FUSION AND BELIEF PROPAGATION FOR DISTRIBUTED DETECTION

We consider $N$ binary random variables, represented by $x = [x_1, ..., x_N]^T$, whose status is estimated based on $N$ observations, denoted $Y = [y_1, ..., y_N]$, made by a network of $N$ sensing nodes. Each node, say node $i$, which intends to estimate the status of $x_i$, collects $K$ observation samples, denoted by $y_i = [y_i(1), ..., y_i(K)]^T$, and exchanges information with the other nodes in the network to realize together a binary hypothesis test as $\hat{x} = \arg\max_{x} p(x|Y) = \arg\max_{x} p(Y|x)p(x)$. This test can be conducted with low implementation complexity in two alternative ways that are explained in the following.

A. Linear Data-Fusion
Linear fusion has been extensively used in the context of spectrum sensing, where the aim is to detect the presence or absence of a target signal by evaluating noisy observations made throughout a wireless sensor network (WSN). For brevity, we explain the univariate case here. In this detection scenario, each node, say node $i$, collects the signal samples $y_i = x s_i + n_i$, where the random variable $x \in \{0, 1\}$ determines the presence or absence of the target signal $s_i$ in the radio environment. In this model, $n_i$ denotes a vector of zero-mean Gaussian white noise samples while $s_i$ denotes an unknown deterministic vector of target signal samples, which, in general, represent a superposition of multiple signals received at node $i$ from different transmitters.
The optimal approach to such a detection problem is known to be the so-called likelihood-ratio test (LRT) [22], which is conducted by evaluating the LLR, i.e., by $\hat{x} = \mathbf{1}\{\lambda_{\mathrm{LRT}}\}$, where $\gamma_i$ is referred to as the local LLR at node $i$. By $\mathbf{1}\{\cdot\}$ we represent the indicator function, which returns one if its argument is positive and zero otherwise. It is clear that the LRT is, in fact, a matched-filtering process, which requires the target signal to be known a priori at the sensing nodes.
In practice, the local sensing process is realized by energy detection due to its ease of implementation and due to the fact that its structure does not depend on the behavior of the target signal. Energy detection is realized by $\gamma_i \triangleq \frac{1}{K}\|y_i\|^2$, and the sensor outcomes are combined linearly to build a global test statistic [17]-[21], i.e., $\lambda_{\mathrm{LF}} = w^T\gamma$, where $w \triangleq [w_1, ..., w_N]^T$ and $\gamma \triangleq [\gamma_1, ..., \gamma_N]^T$. Then, $\lambda_{\mathrm{LF}}$ is compared against a predefined threshold $\tau$ to conduct the hypothesis test, i.e., $\hat{x} = \mathbf{1}\{\lambda_{\mathrm{LF}} > \tau\}$. Assuming that the number of signal samples $K$ is large enough [17]-[21] such that the $\gamma_i$'s in (3) behave as Gaussian random variables, we can model the test summary $\lambda_{\mathrm{LF}}$, given the status of $x$, as a Gaussian random variable as well. Specifically, for $x = b$, the false-alarm and detection probabilities of this detector, denoted $P_{\mathrm{f}}$ and $P_{\mathrm{d}}$ respectively, are derived in closed form in terms of the Q-function $Q(\cdot)$. Note that $P_{\mathrm{f}} = g_0(\tau, w)$ while $P_{\mathrm{d}} = g_1(\tau, w)$. By setting $\tau = Q^{-1}(\alpha)\sqrt{w^T\Sigma_0 w} + w^T\mu_0$, we have $P_{\mathrm{f}} = \alpha$, and then the detection performance can be optimized by maximizing the resulting $P_{\mathrm{d}}$ over $w$. This is the well-known Neyman-Pearson approach [22]. Through some algebraic manipulations, this optimization is formally stated as (5), where $\delta \triangleq \mu_1 - \mu_0$. This problem is solved by quadratic programming in [17], by semidefinite programming in [19], and by invoking the Karush-Kuhn-Tucker conditions in [20]. From these works, we know that the performance of linear fusion is close to the LRT performance. Alternatively, we can maximize the so-called deflection coefficient of the detector. This approach, which has a low computational complexity and leads to a good performance level, is realized by (6), with the deflection coefficient given in (7). Consequently, by using the Rayleigh-Ritz inequality [17], $w^*$ is obtained in closed form as $w^* = \Sigma_0^{-1}\delta / \|\Sigma_0^{-1}\delta\|$. When $\Sigma_1$ is used in (7), the objective function is referred to as the modified deflection coefficient.
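As a minimal numerical sketch of the linear-fusion detector and the deflection-coefficient weights described above, the snippet below (assuming Gaussian statistics; the values of mu0, mu1, Sigma0, Sigma1, and alpha are illustrative assumptions, not taken from the paper) computes $w^*$, the Neyman-Pearson threshold, and the resulting $P_{\mathrm{f}}$ and $P_{\mathrm{d}}$:

```python
import numpy as np
from scipy.stats import norm

# Illustrative (assumed) statistics of the local energy statistics gamma
# under x = 0 and x = 1; these are not values from the paper.
mu0 = np.array([1.0, 1.0, 1.0])
mu1 = np.array([1.6, 1.3, 1.2])
Sigma0 = np.diag([0.04, 0.05, 0.06])
Sigma1 = np.diag([0.08, 0.07, 0.09])

def pf_pd(w, tau):
    """False-alarm and detection probabilities of the linear fusion
    lambda_LF = w^T gamma, with gamma Gaussian given x."""
    q = norm.sf  # Q-function
    pf = q((tau - w @ mu0) / np.sqrt(w @ Sigma0 @ w))
    pd = q((tau - w @ mu1) / np.sqrt(w @ Sigma1 @ w))
    return pf, pd

# Deflection-coefficient weights: w* = Sigma0^{-1} delta / ||Sigma0^{-1} delta||
delta = mu1 - mu0
w_star = np.linalg.solve(Sigma0, delta)
w_star /= np.linalg.norm(w_star)

# Neyman-Pearson threshold fixing P_f = alpha
alpha = 0.05
tau = norm.isf(alpha) * np.sqrt(w_star @ Sigma0 @ w_star) + w_star @ mu0

pf, pd = pf_pd(w_star, tau)
print(f"P_f = {pf:.3f} (target {alpha}), P_d = {pd:.3f}")
```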
Note that both optimizations in (5) and (6) can be realized while taking into account the impact of erroneous $\gamma_i$'s. The so-called reporting errors in [17]-[21] model the impact of erroneous communication links through which the sensing nodes share their observations. The optimal fusion weights obtained by (5) and (6) emphasize the impact of local sensing outcomes generated in high-SNR conditions while suppressing the impact of errors caused by the data communication process between the sensing nodes.
Extension of the linear detection structure in (3) to $N$ variables is discussed in [18], in the context of multiband spectrum sensing, where the detection performance is optimized by the so-called sequential optimization method, which is based on maximizing the deflection coefficient of the system. In the following, we discuss the BP algorithm and show that it can be interpreted as a multivariate linear data-fusion scheme.

B. Belief Propagation
We model the sensor network structure concerned by a Markov random field (MRF) defined on an undirected graph $G = (V, E)$. In this model, the set of vertices $V$ corresponds to the set of network nodes while each edge $(i, j) \in E$ represents a possible connection between nodes $i$ and $j$. Each node, say node $i$, is associated with a random variable $x_i$, and the edge $(i, j)$ models a possible correlation between $x_i$ and $x_j$. This model fits well into the commonly used ad-hoc network configurations in which major network functionalities are conducted through pairwise, i.e., one-hop, links between the nodes located close to each other. This design method is based on the common assumption that nodes located close enough to each other for one-hop communication experience some level of correlation between their sensor outcomes.
By using the MRF, we write $p(x|Y)$ as a product of univariate and bivariate functions, as in (8). Note that $\propto$ in (8) refers to a normalization that ensures $\sum_x p(x|Y) = 1$. When including the bivariate terms in the product, each edge in the factor graph is included only once. This is realized by performing the multiplication over $i < j$ while $i \in N_j$. We use $N_j$ to denote the set of neighbors of node $j$ in the graph, i.e., $N_j \triangleq \{k : (k, j) \in E\}$. By using (8), we formulate the message received at node $j$ from node $k$ as in (9), where by $N_j^k \triangleq N_k \setminus \{j\}$ we denote all nodes connected to node $k$ except for node $j$. We denote by $b_j^{(l)}(x_j)$ the belief, about the status of $x_j$, formed at node $j$, which is obtained by multiplying the potential at node $j$ by the messages received from all its neighbors. The beliefs are used as estimates of the desired marginal distributions, i.e., $b_j(x_j) \approx p(x_j|Y)$. We adopt the commonly used exponential model [4] to represent the a priori probability measure defined on $x$. For a given $x$, we assume the local observations to be mutually independent; consequently, we have (12) [5]. Hence, by using (12), the BP messages are built as in (13) and (14). In the log domain, (13) and (14) convert, respectively, to expressions in terms of $\lambda_j^{(l)}$ and $m_{k\to j}^{(l)}$, which denote, respectively, the estimated likelihood ratio at node $j$ and the message sent to node $j$ from node $k$, while $S(a, b) \triangleq \ln\frac{1+e^{a+b}}{e^a+e^b}$. In this model, $y_k = x_k s_k + n_k$ denotes the signal received at node $k$. Hence, $x_k = 0$ indicates that the target signal $s_k$ is absent, leaving the spectrum free where node $k$ operates. If $x_k = 1$, then the corresponding spectrum band is occupied. The $J_{kj}$'s are calculated as in Eq. (16) of [5] by processing a window of $T$ sensing outcomes. Note that $\theta_k$ in (15) is merged into $\gamma_k$ without having any impact on the rest of the analysis.
After $l^*$ iterations, $\lambda_j^{(l^*)}$ is compared, as a decision variable, against a detection threshold $\tau_j$ at node $j$ to decide the status of $x_j$, i.e., $\hat{x}_j = \mathbf{1}\{\lambda_j^{(l^*)} - \tau_j\}$. By a linear approximation of (15), we obtain the linearized iteration (19) and, consequently, (20), where $c_{jk} \triangleq (e^{2J_{kj}}-1)/(1+e^{J_{kj}})^2$. Therefore, we see that, given enough time, all the local likelihood ratios observed in the network are linearly combined at node $j$ to calculate its decision variable $\lambda_j$. We have shown in [5] that the convergence of this linear message-passing algorithm is guaranteed when $|c_{j,k}| < \frac{1}{\max_n |N_n| - 1}$ for all $(j,k) \in E$ (see also Remark 5). The linear combination in (20) can be expressed as $\lambda_j = \sum_{i=1}^{N} a_{ji}\gamma_i$, which is compactly stated in matrix form as $\lambda = A\gamma$, where $\lambda \triangleq [\lambda_1, ..., \lambda_N]^T$ and $A \triangleq [a_1, ..., a_N]$ while $a_j \triangleq [a_{j1}, ..., a_{jN}]^T$. Through some algebra, we can find the relationship between $A$ and the $c_{jk}$'s in (20). Specifically, we have (22), where $C \triangleq [c_{jk}]_{N\times N}$ and $D(X)$ denotes a diagonal matrix whose main diagonal is equal to that of $X$. The proof is provided in Appendix I.
It is now clear that to have convergence in the message-passing iteration (19), the spectral radius of $C$ has to be less than one. This criterion may be used to impose bounds on the $c_{jk}$'s to guarantee the convergence of the algorithm. Alternatively, the convergence can be guaranteed, without dealing with the complexities of finding the spectral radius, by using the contracting mapping condition as we have discussed in [5]. We use (22) in the following section to derive an estimate of the error strength affecting the decision variables built by an erroneous BP.
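The following minimal sketch illustrates this linear view of BP. It uses a deliberately simplified recursion $\lambda^{(l)} = \gamma + C\lambda^{(l-1)}$ that omits the "exclude the destination node" correction handled by $D(\cdot)$ in (22); the coupling matrix $C$ and the LLR vector below are assumed values chosen only for illustration:

```python
import numpy as np

# Illustrative coupling matrix C = [c_jk]; values are assumptions, chosen so
# that the spectral radius of C is below one (convergence of the linear BP).
C = np.array([
    [0.0, 0.3, 0.0, 0.2],
    [0.3, 0.0, 0.3, 0.0],
    [0.0, 0.3, 0.0, 0.3],
    [0.2, 0.0, 0.3, 0.0],
])
assert np.max(np.abs(np.linalg.eigvals(C))) < 1, "spectral radius must be < 1"

gamma = np.array([1.2, -0.4, 0.8, 0.1])   # local LLRs (assumed values)

# Run the simplified linearized message passing lambda^(l) = gamma + C lambda^(l-1)
lam = gamma.copy()
for _ in range(50):
    lam = gamma + C @ lam

# Fixed point of this simplified recursion: lambda = (I - C)^{-1} gamma,
# i.e. A = I + C + C^2 + ... (diagonal of A close to one, cf. Appendix I)
A = np.linalg.inv(np.eye(len(gamma)) - C)
print(np.allclose(lam, A @ gamma))        # True: decision variables are A @ gamma
```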

III. ERRORS IN BELIEF PROPAGATION
Eq. (15) shows that at each BP iteration each node creates its messages in terms of its local LLR value as well as the messages received from the neighboring nodes at the previous iteration. In our system model, we assume that the local LLRs and the BP messages are erroneous. As in [11], [15], we use the von Neumann approach to modeling the joint statistical behavior of errors.

A. Error Model and Analysis
Since the messages are multiplied together to build the beliefs, we formulate the erroneous messages as multiplicative perturbations affecting the true (i.e., error-free) message values, i.e., $\tilde{\mu}^{(l)}_{k\to j}(x_j) = \mu^{(l)}_{k\to j}(x_j)\,\varepsilon^{(l)}_{k\to j}(x_j)$, where $\tilde{\mu}^{(l)}_{k\to j}(x_j)$ denotes the erroneous message sent to node $j$ from node $k$ at iteration $l$ while $\varepsilon^{(l)}_{k\to j}(x_j)$ denotes the corresponding error, which is considered in this paper as a stochastic process.
Remark 1: Eq. (23) differs from the model used in [11] in the sense that the error model in that work measures the difference between the messages at iteration $l$ and their counterparts at the fixed point of the message-passing iteration. In other words, the error model in [11] measures the deviation of the messages at each iteration from the final values reached by BP after convergence. The stochastic error we discuss here is briefly studied in [11] under the notion of additional error.
By expressing the messages in the log domain, we obtain (24) and (25). Based on the von Neumann model, we assume that if $k \neq n$, then $E[\ln\varepsilon^{(l)}_{k\to j}(x)\,\ln\varepsilon^{(l)}_{n\to j}(x)] = 0$ for all $x$. Consequently, we have $E[\nu_{k\to j}\nu_{n\to j}] = 0$. To measure the collective impact of errors on the belief of node $j$, we use (26), where $\tilde{b}^{(l)}_j(x_j)$ denotes the belief at node $j$ resulting from a BP iteration with erroneous messages as in (23) while $b^{(*)}_j(x_j)$ denotes the belief of node $j$ at a fixed point reached by an error-free BP iteration. We use $(*)$ instead of $(l)$ to indicate the messages and beliefs at a fixed point of the error-free BP.
By assuming uncorrelated stochastic behavior for the message errors, an upper bound on the cumulative errors affecting the beliefs can be obtained. Specifically, assuming $\mathrm{Var}[\nu^{(l)}_{k\to j}] \leq (\ln u)^2$ for all $k$, $j$, $l$, an upper bound on the resulting cumulative strength of errors at node $j$ is derived in [11] as (27). We use the upper bound in (27) in the log domain based on the relation between the log-domain beliefs and the decision variables (see (17) and (26)). Hence, in the detection structure discussed, (27) gives an upper bound on the MSE level observed in the decision variable at node $j$.

B. Linear Approximations
In our analysis, we distinguish between the message errors and the errors in the computation of the local LLRs to gain further insight into the behavior of the BP algorithm. In particular, we model the erroneous local LLRs as $\tilde{\gamma}_k \triangleq \gamma_k + \epsilon_k$ and refer to the $\epsilon_k$'s as likelihood errors (LEs), while assuming that the LEs are uncorrelated as well, i.e., $E[\epsilon_k \epsilon_n] = 0$ for $k \neq n$. We refer to the $\nu_{k\to j}$'s as message errors (MEs) and assume that the LEs and MEs are mutually independent. Moreover, we assume that all MEs and LEs are independent of the messages and of the local LLRs. Note that the bound in (33) does not take LEs into account.
Taking both types of error into account, we express the messages as in (34), which shows that the errors pass through the same nonlinear transformation (i.e., $S$) as the messages do. By using (34), we can analyze the behavior of errors. The proposed linear BP iteration in the presence of message errors is expressed as (35). Consequently, similar to the way (20) is derived, the resulting erroneous decision variable is formed as in (37), which can be reorganized as in (38), with the error term given by (39). Hence, we have the following remark, which we will use in Sec. IV, where we develop an optimization framework for the system.
Remark 2: Eq. (39) shows that the error affecting the decision variable at node $j$ has two distinct components. The first component is built as a linear combination of LEs while the second one is the sum of the MEs received at node $j$ from its one-hop neighbors. The first component is fixed whereas the second one exhibits a new realization at every iteration.
According to (39), the deviation from the error-free decision variables, caused by errors in the BP iterations, can approximately be measured by (40), where $\Sigma_\epsilon \triangleq \mathrm{cov}(\epsilon)$ and $\Sigma_{\nu_j} \triangleq \mathrm{cov}(\nu_j)$.
Eq. (37) shows that when BP is used to realize a distributed detection, the erroneous local likelihoods in the network are combined linearly to build the decision variables. We can evaluate the impact of the errors on the system performance by analyzing the stochastic behavior of the erroneous decision variables $\tilde{\lambda}^{(l)}_j$. Given $x$, the decision variable at node $j$ is obtained as a linear combination of independent random variables. Consequently, its conditional pdf is derived as in (41), where $*$ denotes the convolution operator and $p_{x^{(j)}|x_j}(b|v) \triangleq \Pr\{x^{(j)} = b \,|\, x_j = v\}$. Solving $g_j(\tau_j, 0) = \alpha$ gives a threshold value that fixes the false-alarm rate at $\alpha$. Similarly, $g_j(\tau_j, 1) = \beta$ fixes the detection rate at $\beta$. Recall that the $a_{ji}$'s are found by using the $c_{jk}$'s; see (22).
As a common practical case, when the local LLRs and the errors follow Gaussian distributions [17]-[21], the decision variable $\tilde{\lambda}_j$ follows a Gaussian distribution as well and is fully characterized by its first- and second-order statistics; specifically, we have (44) and (45). In (45) we have assumed, without loss of generality, zero-mean errors. Note that, without the proposed approximation, these performance measures are not available analytically due to the nonlinearity of (15). In the rest of the paper, we assume that the local likelihoods, LEs, and MEs are Gaussian random variables. Eq. (41) shows that, according to the central limit theorem [23], even if the local LLRs and errors are not Gaussian random variables, the stochastic behavior of the decision variables can still be approximately described by Gaussian distributions.
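A minimal sketch of how the false-alarm and detection probabilities of the erroneous decision variable could be evaluated under this Gaussian model is given below; the weights, statistics, and error variances are illustrative assumptions, and the ME contribution is modeled simply as an added variance term, in the spirit of Remark 2:

```python
import numpy as np
from scipy.stats import norm

# Linear-BP combining weights for node j and statistics of the local LLRs
# given x_j = b; all numbers are illustrative assumptions.
a_j    = np.array([1.0, 0.45, 0.30])           # weights on gamma_1..gamma_3
mu     = {0: np.array([-1.0, -0.8, -0.9]),     # E[gamma | x_j = b]
          1: np.array([ 1.2,  0.9,  0.7])}
Sig    = {0: np.diag([0.5, 0.6, 0.6]),         # cov(gamma | x_j = b)
          1: np.diag([0.7, 0.8, 0.8])}
Sig_le = np.diag([0.2, 0.2, 0.2])              # cov of likelihood errors (LEs)
var_me = 3 * 0.1                               # sum of ME variances at node j

def pf_pd(tau):
    """Q-function expressions for the erroneous decision variable
    tilde-lambda_j = a_j^T (gamma + eps) + sum of zero-mean MEs."""
    out = {}
    for b in (0, 1):
        mean = a_j @ mu[b]                               # errors are zero-mean
        var = a_j @ (Sig[b] + Sig_le) @ a_j + var_me     # LE + ME contributions
        out[b] = norm.sf((tau - mean) / np.sqrt(var))
    return out[0], out[1]                                # (P_f, P_d)

print("P_f = %.3f, P_d = %.3f" % pf_pd(0.0))
```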

C. Impact of Averaging
In ABP, the message-passing iteration is the same as in BP. However, instead of the actual message values, an average of the messages is used to build the decision variables. To be more specific, in the log domain and for $l \geq L + 1$, let $\bar{m}^{(l)}_{k\to j}$ denote the averaged message defined in (46). The decision variable at node $j$ is then calculated by (47). Similar to our discussion regarding (20), we can show that when the message-passing iteration is error-free, $\bar{\lambda}^{(*)}_j \triangleq \lim_{l\to\infty}\bar{\lambda}^{(l)}_j = \lambda_j$. Hence, we have the following remark. Remark 3: The averaging process does not alter the fixed points achieved by the error-free linear BP. From an approximation-based point of view, this observation is in line with the convergence analysis provided in [15].
The impact of averaging on LEs and MEs can be clarified by noting that $\bar{\lambda}^{(l)}_j$ can be decomposed as in (49), where, assuming $L$ to be large enough, we have (50). Remark 4: Assuming $L$ to be large enough and the MEs to have zero mean, (50) shows that the resulting decision variable built by ABP in (48) is almost cleared of MEs. However, the averaging process has almost no impact on LEs.
Note that in ABP the message-passing iteration is the same as in BP and the averaging is only performed when computing the decision variables. Moreover, in ABP, instead of storing the messages of past iterations separately, we only need to store the sum of the messages up to the current iteration. As a consequence, the number of additional memory cells required can be kept constant [15]. We will use ABP in Sec. IV-B to build an offline learning-optimization structure for the linear BP in the presence of errors.
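A toy numerical illustration of Remark 4, with a single incoming message and assumed error levels (not values from the paper), could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Averaging the (log-domain) messages over iterations suppresses zero-mean
# message errors (MEs) but leaves the likelihood error (LE) untouched.
true_msg = 0.6            # fixed-point value of one incoming message
gamma    = 1.0            # error-free local LLR
le       = 0.3            # likelihood error (same realization every iteration)
sigma_me = 0.5            # std of the ME (new realization every iteration)
n_iter   = 200

msgs = true_msg + sigma_me * rng.standard_normal(n_iter)   # erroneous messages
lam_bp  = (gamma + le) + msgs[-1]           # BP: uses the last noisy message
lam_abp = (gamma + le) + msgs.mean()        # ABP: uses the running average

print(f"BP  decision variable: {lam_bp:+.3f}")
print(f"ABP decision variable: {lam_abp:+.3f}  (close to {gamma + le + true_msg:+.3f})")
```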

IV. MITIGATING ERRORS BY LINEAR FUSION
In this section, we first propose a two-stage linear fusion scheme to obtain a near-optimal detection performance by suppressing the impact of the errors. Then, we realize the proposed optimization in a blind decentralized setting where the required statistics are not available a priori.

A. Linear Fusion
First, since $|c_{jk}| < 1$, we further approximate the decision variable $\lambda_j$ in (20) as (52). Due to the symmetry of the data-fusion process in (20), the approximation in (52) is an effective approach to building a distributed computing framework for the system performance optimization. In this framework, each node interacts only with its immediate neighbors. We have clarified this symmetry in [5, Sec. III-B]. By taking into account the errors while analyzing the linear BP, (20) and (37) lead to (53). We see that the disturbance on the decision variable caused by LEs is built, approximately, as a linear combination of the $\epsilon_k$'s with the $c_{jk}$'s acting as weights in this combination. Therefore, we use the $c_{jk}$'s as design parameters to mitigate the impact of the $\epsilon_k$'s. Moreover, the MEs are combined in (53) linearly and, in this combination, all weights are one. We propose to extend this combination by using a modified version of (35), as in (54). This modification in the structure of the decision variable does not affect the convergence of the proposed linear BP since it does not alter the message-passing iteration. Now, based on an approximation similar to the one in (53), we have (55). Since $\tilde{\lambda}_j$ is a Gaussian random variable, we only need its mean and variance to characterize its statistical behavior. Specifically, for $b \in \{0, 1\}$, we have (56)-(58), where $v_j \triangleq w_j \circ c_j$, in which $\circ$ denotes the Hadamard product, while $\Sigma_{\gamma_j|b} = \mathrm{cov}(\gamma_j | x_j = b)$ and $\Sigma_{\epsilon_j} = \mathrm{cov}(\epsilon_j)$. Moreover, $w_j$, $c_j$, $\gamma_j$, and $\epsilon_j$ are $|M_j|$-by-1 vectors containing the $w_{ji}$'s, $c_{ji}$'s, $\gamma_i$'s, and $\epsilon_i$'s for $i \in M_j$, respectively. Eq. (56) gives the system false-alarm probability for $b = 0$ and the detection probability for $b = 1$. The false-alarm probability can be fixed at a predefined level and then, by using (56)-(58), $w_j$ and $c_j$ can jointly be optimized in a Neyman-Pearson setting.
In order to avoid the challenges associated with this optimization, we maximize the deflection coefficient of the detector. We already know that the resulting detector performs well when the decision variables follow the Gaussian distribution. In this manner, we mitigate the joint impact of LEs and MEs with low computational complexity.
The proposed optimization is conducted in two consecutive stages based on the fact that we can decompose the construction of $\tilde{\lambda}_j$ into two consecutive fusion processes. That is, we first optimize the $c_{jk}$'s by considering the impact of the $\epsilon_k$'s on the $\gamma_k$'s. Then, we consider the resulting scaled LLRs, i.e., the $c_{jk}\gamma_k$'s, as new statistics to be linearly combined, weighted by the $w_{jk}$'s and distorted by the $\nu_{k\to j}$'s, to make the decision variable at node $j$.
More specifically, we first optimize $c_j$ in a hypothetical linear detector with its decision variable defined as in (59). The coefficients resulting from this optimization scale up the more reliable local LLRs, relative to the ones built under low-SNR regimes, to suppress the effect of LEs. We denote the resulting fusion weights by $c^*_j$. Then, we use $c^*_j$ within the structure of the actual detector to optimize $w_j$ so as to mitigate the impact of MEs. That is, we consider the linear detector at node $j$ given in (60), where the vector $\nu_j$ contains the $\nu_{k\to j}$'s with $k \in M_j$. In this structure, the $\chi_{jk}$'s, i.e., the scaled erroneous local LLRs, are seen as the actual local LLRs that are combined to build the decision variable at node $j$ while the combination takes into account the joint degrading effect of MEs and LEs.
Based on the material provided in Sec. II-A, the first stage of the proposed optimization is formally stated as (62), with the corresponding statistics defined in (63). The resulting $c^*_j$ is then used to realize the second stage of the proposed optimization by solving (64), with the corresponding statistics defined in (65). Having $c^*_j$ and $w^*_j$, the detection threshold $\tau_j$ is derived so as to fix the system false-alarm rate at $\alpha$. Through the proposed two-stage optimization, we enhance the detection performance at node $j$ by suppressing the joint impact of MEs and LEs with low computational complexity. The statistics required in this optimization are collected from the one-hop neighbors of node $j$. This makes the proposed method a viable approach in ad-hoc network configurations where major network functionalities are conducted through one-hop links between the network nodes.
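The sketch below illustrates the two consecutive deflection-coefficient optimizations, assuming Gaussian statistics and diagonal covariances; all numbers and the diagonal structure are assumptions introduced only for illustration, not values or structure taken from the paper:

```python
import numpy as np

def deflection_weights(delta, Sigma):
    """w* = Sigma^{-1} delta / ||Sigma^{-1} delta|| (Rayleigh-Ritz solution)."""
    w = np.linalg.solve(Sigma, delta)
    return w / np.linalg.norm(w)

# --- Stage 1: choose c_j against likelihood errors (LEs) -------------------
mu0   = np.array([-1.0, -0.8, -0.9]);  mu1 = np.array([1.1, 0.8, 0.6])
Sig0  = np.diag([0.5, 0.6, 0.6])                 # cov of LLRs under x_j = 0
Sig_e = np.diag([0.3, 0.1, 0.4])                 # cov of LEs
c_star = deflection_weights(mu1 - mu0, Sig0 + Sig_e)

# --- Stage 2: choose w_j against message errors (MEs) ----------------------
# The scaled LLRs c*_jk * gamma_k are treated as the new statistics, further
# disturbed by MEs with covariance Sig_nu.
Sig_nu = np.diag([0.0, 0.2, 0.2])                # no ME on the node's own LLR
delta2 = c_star * (mu1 - mu0)
Sig2   = np.diag(c_star) @ (Sig0 + Sig_e) @ np.diag(c_star) + Sig_nu
w_star = deflection_weights(delta2, Sig2)

print("c* =", np.round(c_star, 3), " w* =", np.round(w_star, 3))
```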

B. Offline Learning and Adaptation
To realize the proposed optimization, we need the mean and covariance of the local erroneous LLRs. In a blind setting where there is no prior information available regarding the radio environment, we have to estimate those parameters based on the detection outcomes. In particular, the main challenge here is that the state of $x_j$ is required at node $j$ while the only information available in practice is the detection outcome $\hat{x}_j$. Hence, node $j$ has to estimate the conditional statistics required in (62) and (64) based on $\hat{x}_j$. The problem with such an adaptation mechanism is that it makes the detection outcome $\hat{x}_j$ depend on those estimates. This dependence creates an inherent deteriorating loop by feeding the detection errors back into the system structure through erroneous estimates of the required statistics.
To overcome this challenge, we propose an extended version of the blind learning-adaptation loop in [5] that accommodates the proposed error-mitigating structure. The pseudo-code of this adaptation is provided in Algorithm I, where the task of each node is specified in a distributed computing framework. Algorithm I operates on a window of stored sensing outcomes and involves a secondary BP that is run much less frequently than the rate at which the distributed detection is performed. The outcomes of this offline BP are used in the estimation of the required unknown statistics. In this adaptation, the desired optimizations are realized iteratively while each node interacts only with its one-hop neighbors. Consequently, Algorithm I can be well incorporated in a decentralized network configuration.
In the sequel, we propose a blind adaptation structure in which we use $\kappa$ to denote the iteration index. Note that we use $l$ as the iteration index in the main BP through which the distributed detection is realized. The offline adaptation updates the fusion weights in the proposed linear BP by processing $T$ stored samples of $\tilde{\gamma}$. This window of erroneous local likelihoods is denoted by $\tilde{\gamma}_T$ and contains samples of $\tilde{\gamma}(t)$ for $t = 1, 2, ..., T$. Recall that $\tilde{\gamma} = \gamma + \epsilon$, where $\epsilon$ denotes the vector of LEs. The offline detection outcomes at iteration $\kappa$ are denoted by $\hat{x}^{(\kappa)} \triangleq [\hat{x}^{(\kappa)}_1, ..., \hat{x}^{(\kappa)}_N]$ while the resulting fusion weights and detection thresholds are denoted by $c^{(\kappa)}_j$ and $\tau^{(\kappa)}_j$, respectively. $\hat{x}^{(\kappa)}$ also denotes a window of stored sensing outcomes $\hat{x}^{(\kappa)}(t)$ for $t = 1, 2, ..., T$. For simplicity, we do not show the time index when dealing with $\tilde{\gamma}_T$ and $\hat{x}^{(\kappa)}$.
Due to errors caused by the wireless links between the sensing nodes, node $j$ does not have access to $\tilde{\gamma}_k(t)$, $k \in N_j$. Specifically, what node $j$ receives from node $k$ is $\tilde{\gamma}_k(t) + \nu_{k\to j}$, where $\nu_{k\to j}$ denotes the corresponding link error. Without loss of generality, we attribute MEs to wireless link errors. To alleviate the link errors, before starting the adaptation process node $j$ receives $L$ copies of $\tilde{\gamma}_k(t)$ from node $k$ and calculates an average to obtain $\bar{\gamma}_k(t) \triangleq \tilde{\gamma}_k(t) + \bar{\nu}_{k\to j}$, where $\bar{\nu}_{k\to j}$ denotes the average of $L$ independent realizations of $\nu_{k\to j}$. The desired statistics are then calculated by processing the $\bar{\gamma}_k$'s, which approximate the $\tilde{\gamma}_k$'s. We use $\bar{\gamma}_T$ to denote the window containing the samples of $\bar{\gamma}_k(t)$ for $t = 1, 2, ..., T$ and $k = 1, 2, ..., N$.
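A short check of this averaging step (with an assumed link-error level) shows the expected variance reduction by a factor of roughly $L$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Averaging L received copies of gamma~_k(t) reduces the link-error (ME)
# variance roughly by a factor of L.  All values are assumptions.
gamma_tilde = 2.0        # erroneous local LLR sent by node k
sigma_nu    = 0.8        # std of the link error nu_{k->j}
L, trials   = 10, 100_000

copies = gamma_tilde + sigma_nu * rng.standard_normal((trials, L))
gamma_bar = copies.mean(axis=1)          # what node j uses for adaptation

print("var of single copy  :", np.var(copies[:, 0]))    # ~ sigma_nu^2
print("var of averaged copy:", np.var(gamma_bar))       # ~ sigma_nu^2 / L
```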
In a realistic detection scenario, the data exchanged between the nodes in the proposed offline adaptation is impaired by both types of errors. Since in the first linear fusion (62) we take into account the impact of LEs only, we need to isolate this optimization from the MEs. To this end, we estimate the desired statistics by using the linear ABP. As we saw in Sec. III-C, MEs do not affect the ABP outcomes significantly. Therefore, the resulting offline decision variables are almost cleared of MEs and closely realize (62). Note that the offline linear ABP processes $\tilde{\gamma}_T$ (not $\bar{\gamma}_T$) since each node, say node $j$, builds its own messages by using its own local likelihood $\tilde{\gamma}_j$. The outcomes of the linear ABP are then used in processing $\bar{\gamma}_T$, as indicated in line 4 of Algorithm I.
As indicated in line 5, based on the outcomes of the ABP, we suppress the impact of LEs by the first-stage linear fusion. The resulting fusion coefficients enhance the quality of the linear ABP, which, in turn, enhances the quality of the fusion coefficients obtained in the following iteration. By repeating this learning-optimization cycle, we suppress the impact of LEs significantly while this cancellation process is not disturbed by the MEs. See lines 2-10 in Algorithm I.
In this section, we distinguish between the coefficients obtained by Algorithm I and the ones obtained by linearizing the BP algorithm as in (19). Specifically, we use $c^{\mathrm{BP}}_j$ to collect the $c^{\mathrm{BP}}_{jk}$'s for $k \in M_j$, where $c^{\mathrm{BP}}_{jk} \triangleq (e^{2J_{kj}} - 1)/(1 + e^{J_{kj}})^2$. Line 6 indicates what we refer to as the $\eta$-test, which ensures that the system performance does not fall below that of the legacy BP algorithm. The test is as follows: given a predefined value $\eta$ and for $n = 1, 2, ..., N$, if $c^{\mathrm{BP}}_j(n)/c^{(\kappa)}_j(n) > \eta$, then $c^{(\kappa)}_j(n)$ is replaced by $c^{\mathrm{BP}}_j(n)$. That is, we do not use a coefficient obtained by the offline optimization if that coefficient is not large enough with respect to its corresponding coefficient in the main linearized BP. The reason is that, when the primary-user signal received at node $j$ is buried under heavy noise, the local optimization at node $j$ is not able to fully capture the correlations between $x_j$ and its neighbors. Consequently, the resulting $c_{jk}$'s attain values too small to maintain a good detection performance. In such cases, we replace those coefficients with their counterparts in the linearized BP. This technique prevents the resulting coefficients from degrading the performance when the corresponding nodes operate under an SNR regime that is not good enough for a reliable estimation of the desired statistics. We use the legacy BP coefficients for those nodes.

Fig. 1: We have five secondary users cooperating via BP to sense the radio spectrum allocated to a primary network with two transmitters. We use dashed lines to depict the links between the cooperating nodes, through which the BP messages are exchanged. Nodes 1 and 4 act as faulty nodes in the second experiment.
Having the high-quality fusion weights $c^*_j$ and the more reliable detection outcomes $\hat{x}^{(\kappa_{\max})}$, we realize the second stage of the proposed optimization. To this end, node $j$ finds $\mathrm{cov}(\chi_j | x_j = 0)$ in (65) by calculating $\mathrm{cov}(\gamma_j | \hat{x}^{(\kappa_{\max})}_j = 0)$ and then multiplying the result, element-wise, by $c^*_j c^{*T}_j$. Note that $\mathrm{cov}(\tilde{\gamma}_j | x_j) \approx \Sigma_{\gamma_j|x_j} + \Sigma_{\epsilon_j}$, where $\tilde{\gamma}_j$ denotes the $j$th column in $\tilde{\gamma}_T$. The elements of the diagonal matrix $\Sigma_{\nu_j}$ are found at node $j$ by noting that $\bar{\gamma}_k(t) = \tilde{\gamma}_k(t) + \bar{\nu}_{k\to j}$. In case a certain performance level, such as a certain false-alarm rate, is required to be guaranteed, we can use the detector calibration technique in [5]. The thresholds obtained by Algorithm I appear to be too sensitive to errors in the estimated statistics; therefore, we do not use Algorithm I for threshold adaptation. Note that the implementation complexity of the BP algorithm does not increase significantly by using Algorithm I since the channel statistics change slowly compared to the rate at which the spectrum sensing is conducted. In other words, Algorithm I is executed far less frequently than the spectrum sensing, which is performed at every time slot.
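To summarize the adaptation flow in code form, the following schematic sketch mirrors the cycle described above; the three callables, the array shapes, and the exact form of the $\eta$-test are assumptions pieced together from the prose, not the paper's Algorithm I:

```python
import numpy as np

def offline_adaptation(gamma_T, c_bp, eta, kappa_max,
                       run_linear_abp, first_stage_fusion, second_stage_fusion):
    """Schematic sketch of the offline learning-optimization cycle.

    gamma_T : stored window of (erroneous) local LLRs, shape (T, N)
    c_bp    : coefficients of the linearized BP (fallback for the eta-test)
    The three callables stand for the steps described in the text and are
    placeholders, not functions defined in the paper.
    """
    c = c_bp.copy()
    for kappa in range(kappa_max):
        # Offline linear ABP with the current coefficients: MEs are averaged
        # out, so the resulting detection outcomes are almost ME-free.
        x_hat = run_linear_abp(gamma_T, c)

        # First-stage fusion: re-estimate the statistics from (gamma_T, x_hat)
        # and suppress the LEs.
        c_new = first_stage_fusion(gamma_T, x_hat)

        # eta-test: keep an offline coefficient only if it is large enough
        # relative to its linearized-BP counterpart.
        keep = c_bp / np.maximum(c_new, 1e-12) <= eta
        c = np.where(keep, c_new, c_bp)

    # Second-stage fusion: with c fixed, mitigate the MEs via w_j.
    w = second_stage_fusion(gamma_T, c)
    return c, w
```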

V. NUMERICAL RESULTS
Our simulation scenario in this section is an extension of the one considered in [5]. We consider a spectrum sensing scenario typically used in cognitive radio networks [24]. Specifically, we have five sensing nodes, as secondary users, cooperating with each other via BP to find spectral opportunities not in use by the primary users. Fig. 1 depicts our network configuration, where the range of primary transmitter 1 covers nodes 1, 2, and 3 while the range of primary transmitter 2 covers nodes 3, 4, and 5. We use dashed lines to represent the links between the cooperating nodes.
Each node generates its local sensing outcome by using energy detection while processing 100 samples of the received signals. Node 1 and node 5 receive the primary-user signal with an SNR level of -5 dB, node 2 and node 4 experience an SNR level of -8 dB in the received primary-user signal, and node 3 receives signals from both of the primary transmitters at -10 dB each. In our simulations we randomly switch the primary transmitters on and off. We realize these on-off periods by generating correlated binary random variables. Hence, the primary transmitters exhibit correlated random behavior in our simulations. This is an extension of the primary-network behavior assumed in [7]; in that work, one of the primary transmitters is on while the other one is off and they do not change their status. As in [5], [7], we assume that the channel coefficients are fixed during a time slot.

Fig. 2: Impact of errors on the decision variables built by the BP and ABP algorithms.
We conduct two experiments in this section. In the first experiment, whose results are depicted in Fig. 2, we evaluate our analysis and compare its results to those obtained by Ihler et al. in [11]. In particular, we compare the levels of what we define as the decision SNR (DSNR), predicted by our analysis, against the one predicted by the work in [11]. We define the DSNR level at node $j$ as in (66). This parameter measures the ratio of the power of the decision variable built by an error-free BP to the power of the error affecting the same decision variable in an erroneous BP.
We realize the LEs and MEs as uncorrelated zero-mean Gaussian random variables in our simulations. At each node, we measure the strength of the LEs and MEs with respect to the node's local likelihood. Specifically, we define the LE SNR level at node $j$ as in (67), while we define the ME SNR level at node $j$, for $i \in N_j$, as in (68). Consequently, we have a reference level at each node to measure the power of the LEs and MEs injected by that node into the BP iteration.
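In simulation code, the DSNR in (66) can be estimated directly from paired error-free and erroneous decision variables, e.g. (the synthetic numbers below are for illustration only; the LE and ME SNR levels in (67) and (68) can be estimated analogously):

```python
import numpy as np

def dsnr_db(lam_error_free, lam_erroneous):
    """Decision SNR (66): power of the error-free decision variable over the
    power of the error affecting it, averaged over realizations."""
    err = lam_erroneous - lam_error_free
    return 10 * np.log10(np.mean(lam_error_free**2) / np.mean(err**2))

rng = np.random.default_rng(2)
lam   = rng.normal(2.0, 0.5, 20_000)             # error-free decision variables
lam_e = lam + rng.normal(0.0, 0.4, lam.shape)    # erroneous counterparts
print(f"DSNR = {dsnr_db(lam, lam_e):.1f} dB")
```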
Fig. 3: Performance levels of BP, linear BP, and the proposed optimal linear BP in the presence of errors.
To see the impact of different error types separately as well as together, we run three BP algorithms in the first experiment. The first one is affected only by LEs, the second one is affected only by MEs, and the third one is affected by both error types concurrently. Moreover, we evaluate the behavior of ABP in the presence of both error types. In each case, the average DSNR level predicted by the proposed analysis and the one observed in simulations are depicted in Fig. 2. The average DSNR is calculated over all sensing nodes in the network. The dashed curves in Fig. 2 represent the results of our analysis while the solid curves show the average DSNR levels observed in simulations. In this experiment, we consider $\rho^{(j)}_{\mathrm{LE}} = \rho^{(j,i)}_{\mathrm{ME}} = 10$ dB for all $i, j$ and $\zeta = 1$. For each data point in Fig. 2 we have averaged 20,000 realizations of the decision variables. For the adaptation process in Algorithm I we have used 2,000 data samples, i.e., $T = 2{,}000$ in this experiment.
In Fig. 2 we see a close match between the results predicted by our analysis and the ones obtained via simulations. We also see that our analysis provides a better estimate of the DSNR levels than Ihler's bound in [11]. Our analysis provides a better estimate even when we only consider MEs.
Moreover, we see a gap in Fig. 2 between the DSNR levels of the LE-only and ME-only cases. This gap indicates that the impact of MEs is more deteriorating than the impact of LEs when we have the same levels of $\rho_{\mathrm{LE}}$ and $\rho_{\mathrm{ME}}$. This appears to be a reasonable observation in our network since the set of error items injected into the BP iteration by each node, say node $j$, comprises only one LE and $|N_j|$ MEs, with $|N_j| \geq 1$. This observation can be justified based on the linearity of the proposed detection scheme. Specifically, the average number of neighbors in our network is $(4\times 2 + 4)/5 = 2.4$ and $10\log_{10}(2.4) \approx 3.8$ dB, and we see an almost 3.5 dB gap between the two curves. Note that Ihler's bound (27) cannot distinguish between the LEs and MEs. Fig. 2 also confirms our observation in Sec. III-C, where we showed that ABP is quite resilient to the impact of MEs. We now see that, by increasing the number of iterations, the average DSNR level of ABP approaches that of a BP that is affected by LEs only. Note that the ABP is affected by both types of errors and, even when the number of iterations is rather low, the DSNR level of ABP is quite high compared to that of a regular BP affected by both error types. This observation justifies our choice of ABP for the offline learning and optimization cycle in Algorithm I.
In this experiment, for a given number of iterations $N_{\mathrm{iterate}}$, we set $L$ large enough, i.e., $L \geq N_{\mathrm{iterate}}$, to use all the messages generated by BP when realizing the averaging process in the ABP decision variable (47). Hence, by increasing the number of iterations, $L$ is increased and this increase leads to a heavier suppression of MEs. We can clearly see in Fig. 2 the increase in the DSNR level of ABP caused by increasing $N_{\mathrm{iterate}}$. Moreover, the DSNR level of the ABP approaches that of the LE-only case, which, as we predicted in Sec. III-C, indicates that the LEs are not affected by the averaging process in the ABP.
In addition, our analysis, which is based on the von Neumann model, predicts that the DSNR levels of an erroneous BP algorithm do not change significantly with the number of iterations. This is also confirmed by the simulation results in Fig. 2.
In the second experiment, we study the impact of errors on the detection performance of the sensor network depicted in Fig. 1. We assume that nodes 1 and 4 are faulty and inject errors into the BP algorithm. All other nodes operate in a reliable manner, meaning that the errors in their local likelihoods and messages are negligible. The results are depicted in Fig. 3, where for each data point we have averaged 100,000 detection results. For the adaptation of the linear BP we have used a window of 2,500 detection outcomes, i.e., $T = 2{,}500$ in Algorithm I. We have set $L = 10$ in the offline ABP of Algorithm I. In the faulty nodes we have $\rho_{\mathrm{ME}} = 20$ dB. The aim of this experiment is to see whether the proposed method is able to alleviate the impact of those faulty nodes on the overall detection performance.
As for the performance metrics, we use the average of the detection and false-alarm rates observed over all of the sensing nodes. We consider both the BP and linear BP algorithms with error-free and erroneous iterations. Consequently, we see how each detection method is affected by errors. Moreover, we consider the proposed linear BP optimized with and without having the required statistics available a priori. Fig. 3 shows that, in the presence of LEs and MEs, the detection performance of both message-passing algorithms is significantly degraded. This observation clarifies the need for a BP algorithm that better resists the impact of errors. In Fig. 3 we also see that the proposed method significantly improves the detection rate of the system in the presence of errors. Moreover, we can see that the proposed blind adaptation scheme closely achieves the optimal performance level when the required statistics are not available a priori.

VI. CONCLUSION
We studied the impact of computation and communication errors on the behavior of the BP algorithm. We showed that when evaluating the impact of errors on a distributed detection conducted by BP, the detection can effectively be modeled as a distributed linear data-fusion scheme. Consequently, we can analyze its statistical behavior in the presence of errors and obtain closed-form relations for its performance metrics. Moreover, by optimizing the resulting linear data-fusion we can effectively suppress the impact of errors and obtain a better detection performance.

APPENDIX I: PROOF OF (22)

We focus on the non-diagonal elements of $A$ here since it is clear that the diagonal ones are all close to one. It is straightforward to see that (22) holds for $l = 1, 2, 3$. That is, we have $\lambda^{(l)} = A^{(l)}\gamma$, where $A^{(l)} \approx \sum_{n=1}^{l} C^n$ for $l \leq 3$. Based on this observation, we prove (22) by induction. Specifically, we show that if $A^{(k)} \approx \sum_{n=1}^{k} C^n$ for $k \leq l$, then $A^{(l+1)} \approx \sum_{n=1}^{l+1} C^n$ or, equivalently, $A^{(l+1)} \approx A^{(l)} + C^{l+1}$. From (16) and (19), and by setting $c_{jk} = 0$ for $k \notin N_j$, $\forall j$, we obtain the desired relation, in which the neglected terms are bounded by $c^2$. Recall that, to ensure the system convergence, we have $|c_{j,k}| < c$, $\forall(j, k) \in E$. Hence, the proof is complete.

Remark 5: The convergence condition $|c_{j,k}| < \frac{1}{\max_n |N_n| - 1}$, $\forall(j, k) \in E$, can be realized by a simple normalization of the $c^*_{j,k}$'s since the objective function in (62) does not change by normalizing its argument.