Performance of XOR Rule for Decentralized Detection of Deterministic Signals in Bivariate Gaussian Noise

In this paper, we consider the performance of the exclusive-OR (XOR) rule in detecting the presence or absence of a deterministic signal in bivariate Gaussian noise. Signals, when present at the two sensors, are assumed unequal, whereas the noise components have identical marginal distributions but are correlated. The sensors send their one-bit quantized data to a fusion center, which then employs the XOR rule to arrive at the final decision. Here we prove that, in the limit as the correlation coefficient $r$ approaches 1, the optimum fusion rule for both parallel and tandem topologies is XOR with identical, alternating partitions (XORAP) of the observations at the sensors. We further quantify the asymptotic decrease of the Bayes error of XORAP towards zero as a constant multiplied by $\sqrt{1-r}$, as $r$ approaches 1. Compared with the asymptotic Bayes error of the centralized likelihood ratio test (CLRT), which decreases to zero exponentially fast as a function of $1/(1-r)$, the Bayes error of XORAP decreases to zero much more slowly.


I. INTRODUCTION
Collections of sensors are employed in a variety of situations in order to enhance information gathering and processing operations. Depending upon the capability of the sensor network, either condensed information or the raw data from the sensor sites may be transmitted to a fusion center (FC), where a final decision with regard to the presence (hypothesis H 1 ) or absence (hypothesis H 0 ) of a signal (or target) is made. The former case is usually referred to as decentralized detection, whereas the latter case is termed centralized detection [1]. Due to the bandwidth constraint in the reporting channels, decentralized detection is often expected in practice. It is well known that the complexity of finding optimal solutions to decentralized detection problems is, in general, formidable [1]-[3]. The usual performance criterion employed is either the Bayes error (or cost) or the Neyman-Pearson (NP) criterion.
In this paper, we employ the Bayes error criterion and restrict consideration to a two-sensor network, with one-bit quantization of sensor observations. Fig. 1 shows configurations for parallel and tandem topologies in a decentralized detection system. Even in our case of two sensors, with each sensor sending one-bit quantized information to an FC, which then makes a final decision on the true hypothesis, finding the optimal decentralized decision scheme, i.e., the quantization schemes at the two sensors and the decision rule at the FC, is still a difficult problem. In the parallel topology, the optimal quantizers at the two sensors need not be identical, even if the marginal distributions are identical [1], [2], [4]. This applies to both correlated observations and statistically independent observations [1]. In the case of one-bit quantized data from two sensors, the number of possible FC Boolean rules is limited to $2^{2^2} = 16$, out of which the non-trivial ones are the Boolean AND, OR, XOR, and ''ignoring one of the sensors,'' excluding the other equivalent rules. A method to find the optimal solution to the decentralized detection problem is to assume a particular fusion rule, optimize the quantizers at the sensors for that fusion rule, and then pick the best fusion rule along with the corresponding quantizers at the sensors as the optimal solution. In the simplest case of each sensor sending one-bit information, the optimal quantizer turns out to be a partitioning of the real line (i.e., the observation) into quantization intervals, where the intervals represented by bit '1' (and possibly also the intervals represented by bit '0') could be non-contiguous.

The associate editor coordinating the review of this manuscript and approving it for publication was Chengpeng Hao.

FIGURE 1. Diagram of a two-sensor decentralized detection system with the parallel topology (left) and with the tandem topology (right). FC stands for the fusion center. In tandem topology, the second sensor also acts as the FC.
At sensor 1 (2), assume that a known signal s 1 (s 2 ) is present in Gaussian noise when the hypothesis H 1 is true. When H 0 is true, there is no signal at either sensor, and the sensor observation is simply noise. The noise components at the sensors are assumed jointly Gaussian with a correlation coefficient r and identical variance (the case of non-identical variances can be handled simply by dividing each observation at a sensor by the standard deviation of the corresponding noise component). In the seminal paper by Willett et al. [4], for the parallel topology, the authors identified regions for the optimality of the Boolean AND (or OR) rule in the (s 1 , s 2 ) signal plane, which include the region where the quantization intervals for both bits '1' and '0' are contiguous (called simply the 'Good' region), the region where it may be optimal to ignore one sensor's data (the 'Bad' region), and the region where the exclusive-OR (XOR) rule could outperform both the AND and OR rules. However, [4] did not examine the performances of those rules when the correlation coefficient r is close to 1. As will be shown in this paper, the performance of the XOR rule is of significance as r approaches 1.
Reference [5] extended the results in [4] to the two-sensor tandem case and obtained the 'Good' region solution. Reference [6] discussed the detection of Gaussian signals in Gaussian noise for the same tandem case. It was pointed out in [6] that [7] made an incorrect assumption that the first sensor in a tandem network could employ a simple likelihood ratio test without compromising the global optimality. A recent contribution discusses the joint estimation of unknown parameters and detection in a joint copula probability distribution model [8]. All the above results and those in [9]-[12] show the difficulty in finding an optimal solution when the sensors' observations are correlated. In this paper, we consider the detection problem with completely known parameters and the XORAP rule. It must be mentioned that the term XOR refers to the case where the Boolean fusion rule at the FC is exclusive-OR, without any specific reference to the one-bit quantization schemes that the sensors would employ. XORAP specifies, in addition to the XOR fusion rule, identical alternating quantization intervals of length D = |s 1 − s 2 |, representing bits '1' and '0' at both sensors (see Fig. 2).
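The XORAP rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `origin` parameter, and the choice of which intervals are labeled '1' are assumptions made here (Fig. 2 is not reproduced), although the XOR output does not depend on the labeling. The loop mimics the limiting case r = 1, in which both sensors see the same noise sample.

```python
import math

def xorap_bit(x, D, origin=0.0):
    """One-bit quantizer: bits alternate over consecutive intervals of length D."""
    return math.floor((x - origin) / D) % 2

def xorap_decide(x1, x2, D, origin=0.0):
    """Fusion rule: XOR of the two sensor bits (1 is read as a decision for H1)."""
    return xorap_bit(x1, D, origin) ^ xorap_bit(x2, D, origin)

s1, s2 = 0.5, 1.0          # example signal levels
D = abs(s1 - s2)           # interval length used by XORAP

# With r = 1 the two noise components coincide (same sample n at both sensors).
for n in (-1.375, -0.25, 0.125, 0.75, 2.0625):
    under_h0 = xorap_decide(n, n, D)            # identical observations
    under_h1 = xorap_decide(n + s1, n + s2, D)  # observations differ by D
    print(n, under_h0, under_h1)
```

Under these assumptions, identical observations always produce equal bits, while observations that differ by exactly D always fall in adjacent intervals and produce different bits.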

A. MOTIVATION FOR INVESTIGATING THE XOR PERFORMANCE
The first reason for the investigation is the optimality of the XORAP rule. If each sensor makes its decision based on the comparison of its observation against a single threshold, then XOR as a fusion rule is not meaningful in the case of independent observations. This is because the XOR rule is not a monotone fusion rule, a condition needed for optimality in the case of independent observations [1], [13]. Consider the situation when sensors utilize multiple quantization intervals, each interval designated as either a bit '1' or a bit '0'. In that case, for example, a bit '1' received at the FC from a sensor might be the result of its observation falling in any one of the multiple intervals marked as bit '1' by the sensor, and therefore this bit is not a local decision in favor of a hypothesis. The alternating partitioning intervals of span D for XOR decision was first mentioned in [14]. It showed the optimality of XORAP rule for the case of two-sensor parallel topology, in the sense of achieving perfect decision, when r = 1. The partitioning was arrived at heuristically. A theoretical proof that it is optimal for both parallel and tandem topologies is needed.
The second reason for the investigation is the behavior of different fusion rules as the correlation coefficient increases from 0 to 1. As shown in [15], for the case of centralized likelihood ratio test (CLRT), as r increases from 0, the Bayes error keeps increasing up to the point r = min(s 1 , s 2 )/max(s 1 , s 2 ), after which it monotonically decreases towards 0. That is, the increasing correlation is detrimental to the probability of error performance in detection, but that is only up to the point determined by the signal levels at the two sensors. This behavior is seen for the number of sensors exceeding 2 as well, but the point of inflection is not the above simple formula given for two sensors. Any increase in correlation coefficient beyond that point aids in the detection of signals, ultimately leading to zero Bayes error as r → 1. Similar behavior is also exhibited when sensors employ multi-bit quantization, see [14], [16]. Hence, given that XORAP is the best one-bit fusion rule as r → 1, and that the CLRT is the best fusion rule without any quantization of sensor data, it will be of interest to compare the performances of the two for large values of r, knowing well that the performance of any reasonable scheme improves, once the correlation becomes increasingly larger.
Finally, it should be noted that although the correlation coefficient close to 1 might be rarely seen in practice, investigation of this limiting case has its own theoretical importance that contributes to a better understanding of how the local decisions and the fusion rules would change with respect to the change of correlation coefficient, as well as the corresponding performance.

B. CONTRIBUTION
The contribution of this paper is two-fold. First, in Theorem 1, to be presented in the next section, we prove that XORAP is the optimal rule for the two-sensor tandem topology, as r → 1. For a two-sensor network, since the optimal rule in the parallel case can never outperform the optimal rule in the tandem case (see [1], [13]) and XORAP is also a parallel decision rule, other parallel fusion rules with one-bit quantization cannot outperform the XORAP rule in the parallel case, in terms of the rate of decrease of the Bayes error as r approaches 1. This proves that XORAP is the optimal rule for both tandem and parallel cases as r → 1. Second, in Theorem 2 we prove that the Bayes error of the XORAP rule decreases to zero as a constant multiplied by $\sqrt{1-r}$, as r → 1, and then compare it with the case of centralized detection.
The rest of the paper is organized as follows. Section II describes the problem formulation, proof of the optimality of the XORAP rule for the two-sensor tandem topology, and the derivation of the asymptotic (as r → 1) Bayes error for both CLRT and XORAP. Additionally, random number generation to corroborate the theoretical results, graphical explanation for the asymptotic error behavior of XORAP, and results of suboptimal solution obtained from a genetic algorithm based simulation study are also presented in this section. Section III concludes this paper.

II. PROBLEM FORMULATION AND SOLUTION
Consider two sensors monitoring a region of interest to ascertain the presence of a signal (hypothesis H 1 ) or its absence (H 0 ). When a signal is present in the region, the signal components at the two sensors, which are assumed to be deterministic and known, are received in additive Gaussian noise. Otherwise, the observations at the sensors have only noise components. The two hypotheses can be stated as
$$H_0: X = N, \qquad H_1: X = s + N, \tag{1}$$
where $X = [X_1, X_2]^T$, $s = [s_1, s_2]^T$, and $N = [N_1, N_2]^T$. As in [4], N is assumed to be distributed as bivariate Gaussian with zero means, unit variances and correlation coefficient r. Since s 1 and s 2 could assume negative or positive values, without any loss of generality we can restrict r to be in the interval [0, 1]. Assuming prior probabilities for the hypotheses, namely P(H 0 ) = π 0 , P(H 1 ) = π 1 = 1 − π 0 , we consider the Bayes error P e = π 0 P(D 1 |H 0 ) + π 1 P(D 0 |H 1 ) as the performance criterion for a decision rule, where D j , (j = 0, 1), is the decision favoring hypothesis H j . Here, the first conditional probability is the probability of false alarm P f , that is, the probability that the decision rule decides hypothesis H 1 when the true hypothesis is H 0 . Similarly, the second conditional error probability is the probability of miss, P m . As shown below, the optimality of XORAP and its asymptotic error are valid for any prior probabilities. For simplicity, the asymptotic error rate of CLRT is derived for the case of equal priors. Similarly, in the simulation studies of Sections II-D and II-F we assume equal priors. If needed, similar analyses can easily be done for unequal prior probabilities. The notations used in this paper are listed in Table 1.

A. XORAP RULE AND ITS OPTIMALITY
In a tandem decentralized detection system with two sensors, see Fig. 1, assume the decision rule of sensor 2 (FC) is denoted by W 2 and the decision rule of sensor 1 is given by W 1 , with R W 1 as the region in which sensor 1 makes a decision favoring H 1 . Since we consider one-bit sensor reports, both W 1 and W 2 take values from {0, 1}. Referring to Fig. 2, we have the following Theorem.
Theorem 1: For a two-sensor decentralized detection system with tandem topology, as r → 1, the optimum decision rule W 2 can be expressed as the exclusive-OR (XOR) rule between W 1 and I (x 2 ∈ R W 1 ), where I (·) is the indicator function, which takes the value 1 if x 2 ∈ R W 1 , and the value 0 otherwise. The optimum partitions of both sensors' observations, as bits '1' and '0', are the same and are given by alternating segments of fixed length, D = |s 2 − s 1 |, assuming that the length is not zero.
Proof: Appendix A proves in detail that the optimal detection rule for the two-sensor tandem topology, with one-bit quantization, is exactly the XOR rule with alternating partitions as shown in Fig. 2, as r → 1. This result is valid for any prior probability.

B. ASYMPTOTIC ERROR ANALYSIS OF CLRT
Let the covariance matrix of noise N be denoted as $\Sigma$, with $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = r$. Since CLRT is an LRT based on the observation X, it can be easily seen from [15] that the Bayes error $P_{eC}$ for CLRT, with π 0 = π 1 = 1/2, is given by $Q(\sqrt{K}/2)$, where $K = s^T \Sigma^{-1} s = (s_1^2 - 2 r s_1 s_2 + s_2^2)/(1 - r^2)$ is the SNR parameter and Q(·) is the standard Gaussian tail probability. Using the asymptotic expansion for the Q(·) function as the argument becomes infinitely large, we get $\lim_{r \to 1} P_{eC} = e^{-\frac{D^2}{16(1-r)}}$. Hence, the Bayes error decreases to zero exponentially fast, with increasing 1/(1 − r).
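As a numerical illustration of the CLRT error discussed above, the following Python sketch evaluates $Q(\sqrt{K}/2)$ with $K = s^T \Sigma^{-1} s$ for equal priors and compares it with the exponential term $e^{-D^2/(16(1-r))}$. The closed form for K is the standard Gaussian LRT expression and is assumed here rather than copied from [15]; all function names are hypothetical.

```python
import math

def q_function(x):
    """Standard Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def clrt_snr(s1, s2, r):
    """K = s^T Sigma^{-1} s for unit variances and correlation r, with |r| < 1."""
    return (s1 * s1 - 2.0 * r * s1 * s2 + s2 * s2) / (1.0 - r * r)

def clrt_bayes_error(s1, s2, r):
    """Bayes error of the CLRT for equal priors: Q(sqrt(K) / 2)."""
    return q_function(0.5 * math.sqrt(clrt_snr(s1, s2, r)))

def clrt_exponential_term(s1, s2, r):
    """The exponential factor exp(-D^2 / (16 (1 - r))) that dominates as r -> 1."""
    D = abs(s1 - s2)
    return math.exp(-D * D / (16.0 * (1.0 - r)))

s1, s2 = 0.5, 1.0
for r in (0.0, 0.3, 0.5, 0.7, 0.9, 0.99, 0.999):
    print(r, clrt_bayes_error(s1, s2, r), clrt_exponential_term(s1, s2, r))
```

For these signal levels the error is largest at r = min(s 1 , s 2 )/max(s 1 , s 2 ) = 0.5 and falls off on either side, in line with the behavior attributed to [15] in the Introduction.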

C. ASYMPTOTIC ERROR ANALYSIS OF XORAP
We begin by evaluating the probability of correct decision of the XORAP rule when H 0 is true, that is, the probability P c0 = P(XORAP decides H 0 |H 0 ). The contributions to this probability come from two distinct probabilities P 1 and P 2 , which are defined in the following two equations:
$$P_1 = \sum_{i} \int_{iD}^{(i+1)D} \int_{iD}^{(i+1)D} f(x, y; r)\, dy\, dx, \tag{2}$$
$$P_2 = \sum_{i} \sum_{k \neq 0} \int_{iD}^{(i+1)D} \int_{(i+2k)D}^{(i+2k+1)D} f(x, y; r)\, dy\, dx, \tag{3}$$
where f (x, y; r) denotes the bivariate Gaussian density with zero means, unit variances and correlation coefficient r. P 1 is nothing but the probability that both sensor observations fall in the same interval (iD, (i + 1)D) in Fig. 2. When this happens, XORAP will correctly decide H 0 . By looking at Fig. 2, it is plain that, under H 0 , XORAP will also decide correctly when X 1 ∈ (iD, (i + 1)D) and X 2 ∈ (iD ± 2kD, (i + 1)D ± 2kD), where k is a nonzero integer, thereby contributing to the probability P 2 .
Theorem 2: By using the result by G. Pólya in [17], we show below that, as r → 1, 1 − P 1 decreases to zero as a constant multiplied by $\sqrt{1-r}$ (4).
Proof: Based on the partition and the XOR rule shown in Fig. 2, P 1 is a sum over i of terms J i , where J i is the probability that both observations fall in the interval (iD, (i + 1)D). We will be utilizing a result from [17] to evaluate J i , as r → 1. The notation L(·) and its arguments from this reference are restated here for our use.
We will show in Appendix B that P 2 decreases to zero exponentially fast and can be neglected when compared to 1 − P 1 . Hence, the probability of false alarm, P f = 1 − P c0 = 1 − P 1 − P 2 , goes to zero as a constant multiplied by $\sqrt{1-r}$, as stated in (4). The other conditional probability of error, the probability of miss, is exactly the same as P f . This can be seen from the following argument. Since an error in XORAP is caused by the pair of noise samples, whenever XORAP makes an error under H 0 , the same noise pair would also cause an error if H 1 were true. Similar comments apply to non-error-causing events. Hence, the Bayes error of XORAP is P eX = P f , for any prior probability.
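The square-root behavior of P f can be checked numerically without simulation. The Python sketch below is a hedged illustration, not the Pólya-based derivation of [17]: it integrates the standard Gaussian density of X 1 against the conditional probability that X 2 lands in an interval of opposite parity, with the origin assumed to sit on an interval boundary. The function names, the midpoint rule, and the truncation at |x| ≤ 8 are choices made here.

```python
import math

def phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def g(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def xorap_false_alarm(r, D, x_max=8.0, steps_per_interval=400):
    """P(sensor bits differ | H0), by midpoint integration over x1."""
    sigma = math.sqrt(1.0 - r * r)      # std of X2 given X1 = x
    n_int = int(math.ceil(x_max / D))   # intervals on each side of the origin
    h = D / steps_per_interval
    total = 0.0
    for i in range(-n_int, n_int):      # x1 lies in interval (iD, (i+1)D)
        for m in range(steps_per_interval):
            x = i * D + (m + 0.5) * h
            mean = r * x
            j_lo = math.floor((mean - 8.0 * sigma) / D)
            j_hi = math.floor((mean + 8.0 * sigma) / D)
            p_diff = 0.0
            for j in range(j_lo, j_hi + 1):
                if (j - i) % 2 == 1:    # interval j has the opposite bit
                    upper = ((j + 1) * D - mean) / sigma
                    lower = (j * D - mean) / sigma
                    p_diff += phi(upper) - phi(lower)
            total += g(x) * p_diff * h
    return total

D = 0.5
for r in (0.99, 0.999, 0.9999):
    pf = xorap_false_alarm(r, D)
    print(r, pf, pf / math.sqrt(1.0 - r))
```

If the square-root law holds, the last printed column should level off to a constant as r approaches 1, and each tenfold reduction of 1 − r should shrink P f by a factor close to $\sqrt{10}$.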

D. SIMULATION OF XORAP ERROR
In this subsection, we study the probability of error behavior of XORAP, by generating bivariate Gaussian samples with different values of correlation coefficient r, which can be simulated by using the following equation.
$$N_2 = r\,(N_1 + \alpha Z), \tag{11}$$
where N 1 and Z are i.i.d. Gaussian with zero mean and unit variance, i.e., ∼ Gaussian(0, 1), and $\alpha = \sqrt{1-r^2}/r$. Table 2 shows how the Bayes errors P eC and P eX , respectively for CLRT and XORAP, decrease towards zero as r increases towards 1. The Bayes error probabilities for CLRT are based on the exact theoretical expression, assuming equal priors, π 0 = π 1 = 1/2. The P eX values shown in Table 2 for r = (0.99, 0.999, 0.9999, 0.99999) are remarkably close to what one gets from the asymptotic error expression P eX ≈ 1 − P 1 , shown as the last column in Table 2. Since P f = P m for XORAP, P eX is independent of the prior probability.
In this simulation study and the asymptotic error analysis, we assumed that the origin coincided with T 0 in Fig. 2. As can be seen from the proof of Theorem 2, if the origin were offset from T 0 , then the asymptotic error would still decay as $\sqrt{1-r}$, but the proportionality constant in (4) would be slightly altered. Similarly, the P eX values in Table 2 would change slightly.
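A simulation of this kind can be sketched in Python as follows. This is an illustrative sketch rather than the code behind Table 2: the sample size, the seed, the `origin` argument, and the device of alternating the true hypothesis to mimic equal priors are all assumptions. The correlated noise is generated as N 2 = r N 1 + $\sqrt{1-r^2}$ Z, which gives unit variance and correlation r.

```python
import math
import random

def simulate_xorap_error(r, s1, s2, trials=100_000, seed=1, origin=0.0):
    """Monte Carlo estimate of the XORAP Bayes error with equal priors."""
    rng = random.Random(seed)
    D = abs(s1 - s2)
    c = math.sqrt(1.0 - r * r)
    errors = 0
    for t in range(trials):
        n1 = rng.gauss(0.0, 1.0)
        n2 = r * n1 + c * rng.gauss(0.0, 1.0)  # correlated with n1, unit variance
        h1 = t % 2                             # alternate H0 / H1 (equal priors)
        x1 = n1 + (s1 if h1 else 0.0)
        x2 = n2 + (s2 if h1 else 0.0)
        bit1 = math.floor((x1 - origin) / D) % 2
        bit2 = math.floor((x2 - origin) / D) % 2
        decision = bit1 ^ bit2                 # XOR fusion: 1 means H1
        errors += int(decision != h1)
    return errors / trials

for r in (0.9, 0.99, 0.999):
    print(r, simulate_xorap_error(r, 0.5, 1.0))
```

Changing `origin` shifts the partition relative to the noise distribution, which is one way to explore the sensitivity to the offset from T 0 mentioned above.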
The following subsection provides a graphical illustration of why Bayes error of XORAP decreases at a much slower rate when compared to the CLRT rate of decrease.

E. GRAPHICAL EXPLANATION FOR THE ASYMPTOTIC ERROR BEHAVIOR OF XORAP
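The CLRT separation line, for equal priors (η = 1), is given by [4]
$$(s_1 - r s_2) X_1 + (s_2 - r s_1) X_2 = \tfrac{1}{2}\left(s_1^2 - 2 r s_1 s_2 + s_2^2\right). \tag{12}$$
Assume $\epsilon = 1 - r$; as r → 1, $\epsilon \to 0$. Then (12) is reduced to
$$(s_1 - s_2 + \epsilon s_2) X_1 - (s_1 - s_2 - \epsilon s_1) X_2 = \tfrac{1}{2}(s_1 - s_2)^2 + \epsilon s_1 s_2. \tag{13}$$
According to Taylor series expansion, by ignoring the second and higher orders of $\epsilon$, the CLRT separation line can be obtained as
$$X_2 = \left[1 + \epsilon\, \frac{s_1 + s_2}{s_1 - s_2}\right] X_1 - \frac{s_1 - s_2}{2} - \epsilon\, \frac{s_1 (s_1 + s_2)}{2 (s_1 - s_2)}. \tag{14}$$
The CLRT divides the plane into two parts, with those points (X 1 , X 2 ) falling below the straight line (X 2 = X 1 − 0.25, $\epsilon \to 0$) classified as decision H 1 and those falling above the line classified as decision H 0 .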
In Fig. 3, the line above the CLRT separation line is the line through the origin, with slope 1 and the line below the CLRT separation line is represented as X 2 = X 1 − 0.5. At r = 1, the probability distribution of (X 1 , X 2 ) is degenerate along the line X 1 = X 2 , when H 0 is true, and the distribution is degenerate along the line X 2 = X 1 −0.5, when H 1 is true. The shaded green area corresponds to the region where XORAP decides H 1 , whereas CLRT decides H 0 . The opposite is true for the shaded red area. Under H 0 , as r gets closer and closer to 1, the probability distribution keeps increasingly getting concentrated along the X 1 = X 2 line. Hence, the CLRT will make less and less error (error happens when samples fall below the CLRT separation line) as r gets closer and closer to 1. In the area above the X 1 = X 2 line, there are regions labeled as '01' or '10', where the XORAP decides H 1 , resulting in decision errors for XORAP, when H 0 is true. However, the probability of this happening will be negligible and will decrease exponentially to zero as r → 1. Now, consider hypothesis H 0 being true. What differentiates the rates of decrease of the probabilities of false alarm of CLRT and XORAP is the difference in the probability mass over the areas within the band, flanked by the lines, X 2 = X 1 and X 2 = X 1 − 0.5, where the two decision schemes make decision H 1 . The probability mass over the green shaded area, where XORAP decision differs from CLRT, contributes to XORAP making an incorrect decision, H 1 . In the shaded red area, where XORAP decision differs from CLRT, the XORAP makes the correct decision, H 0 . Since the probability mass is increasingly getting concentrated along the X 2 = X 1 line, as r tends to 1, the probability mass over the red area, which is below, and touching at points along the X 2 = X 1 − 0.25 line, decreases to zero very fast. 
Hence, this addition to the probability of correct decision for the XORAP has no impact on the rate at which the XORAP probability of error (P f for XORAP) approaches zero. In contrast, as r tends to 1, the probability mass over the green area, which is closer to, and touching at points along, the X 2 = X 1 line, goes to zero at a slower rate. Although one cannot predict from the graph the rate at which the error would decrease to zero, the analytical expression in Theorem 2 shows that this is of the order $\sqrt{1-r}$. A similar conclusion can be drawn when H 1 is true.

F. SUBOPTIMAL SOLUTIONS FROM THE GENETIC ALGORITHM (GA)
Finding the sensor decision rules and the fusion rule that minimize the Bayes error is a non-linear integer programming problem [14]. In this subsection, the GA is used to find suboptimal solutions for both parallel and tandem topologies. The flowchart of this GA-based method is shown in Fig. 4.

1) GA OPERATIONS
Key operations carried out in the GA are as follows.
Initialization: In the beginning, n p solutions are randomly generated, each of which is a binary sequence that is converted from the sensors' decision rules as follows.
1) Divide the observation range of the k-th (k = 1, 2) sensor into a number of sub-intervals, say S k , with equal spacing.
2) Assume that sensor k has B k reporting bits. Sensor observation X k is quantized by assigning each sub-interval one value from the set {1, 2, . . . , Q k }, where $Q_k = 2^{B_k}$. Quantized values of all sub-intervals are concatenated into a sequence.
3) Represent each value of the above sequence in its binary format to obtain a binary sequence b k with a length of B k S k bits, k = 1, 2.
4) In the parallel topology, the solution is represented as b P = [b 1 b 2 ], which is obtained by concatenating the binary sequences from both sensors and is of length B 1 S 1 + B 2 S 2 . In the tandem topology, b T = b 1 with a length of B 1 S 1 only, because given the decision rule of sensor 1, the rule of sensor 2 can be obtained by utilizing a person-by-person optimal decision, according to the tests in (15) and (16), see Appendix A.
Bayes error calculation: As the fitness criterion, the Bayes error for each of the n p solutions is evaluated using the log-likelihood test at the FC.
Selection: A total of n c /2 pairs of solutions (i.e., parents), as determined by the crossover probability p c , are selected by roulette wheel selection, where solutions with a lower Bayes error are more likely to be selected.
Crossover and mutation: For each pair of the selected solutions, uniform crossover [18] with a crossover probability p c is performed to generate a new pair of solutions (i.e., offspring). In the mutation step, each bit in a solution is flipped with a probability µ c from '0' to '1' or vice versa [18].
As a final step in each iteration, the n c new offspring are merged with the n p solutions from the beginning of the iteration, and the n p solutions with the lowest Bayes error among them are chosen as input for the next iteration. At the end of the final iteration, the solution with the lowest Bayes error is output as the final sub-optimal solution. Furthermore, for one-bit quantization, by checking the result of the log-likelihood test used for this final solution, the corresponding fusion rule, either the AND rule, OR rule, XOR rule, or ''ignoring one of the sensors'' rule, can be identified. A similar algorithm and a discussion of its complexity can be found in [14].
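The GA operations above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the roulette-wheel weights are taken as the inverse of the error, the number of parent pairs is set to p c n p /2 (which matches n c = 400 for n p = 500 and p c = 0.8), and `toy_error` is a hypothetical stand-in for the Bayes error evaluation at the FC.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """Swap each bit position between the two parents with probability 1/2."""
    child_a, child_b = list(parent_a), list(parent_b)
    for idx in range(len(child_a)):
        if rng.random() < 0.5:
            child_a[idx], child_b[idx] = child_b[idx], child_a[idx]
    return child_a, child_b

def mutate(bits, mu_c, rng):
    """Flip each bit independently with probability mu_c."""
    return [bit ^ 1 if rng.random() < mu_c else bit for bit in bits]

def roulette_pick(population, errors, rng):
    """Roulette wheel: lower error gets a larger weight (assumed 1/error)."""
    weights = [1.0 / (err + 1e-9) for err in errors]
    return rng.choices(population, weights=weights, k=1)[0]

def run_ga(error_fn, n_bits, n_p=40, p_c=0.8, mu_c=0.02, iterations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_p)]
    n_pairs = int(p_c * n_p) // 2          # n_c / 2 parent pairs per iteration
    for _ in range(iterations):
        errors = [error_fn(sol) for sol in population]
        offspring = []
        for _ in range(n_pairs):
            pa = roulette_pick(population, errors, rng)
            pb = roulette_pick(population, errors, rng)
            ca, cb = uniform_crossover(pa, pb, rng)
            offspring.append(mutate(ca, mu_c, rng))
            offspring.append(mutate(cb, mu_c, rng))
        # Merge parents and offspring, keep the n_p solutions with lowest error.
        population = sorted(population + offspring, key=error_fn)[:n_p]
    best = min(population, key=error_fn)
    return best, error_fn(best)

def toy_error(bits):
    """Hypothetical fitness: fraction of bits that break an alternating pattern."""
    return sum(bit != (idx % 2) for idx, bit in enumerate(bits)) / len(bits)

best, err = run_ga(toy_error, n_bits=16)
print(best, err)
```

Because the survivors of each iteration are drawn from the merged set of parents and offspring, the best error in the population can never increase from one iteration to the next.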

2) SIMULATION RESULTS
The parameters in this simulation are n p = 500, p c = 0.8, n c = 400, µ c = 0.02, and |I| = 1000.
VOLUME 10, 2022
First, we consider one-bit quantization of the sensor observations, i.e., B 1 = B 2 = 1. For several test values of the correlation coefficient r ∈ (0.93, 0.9999), with s 1 = 0.5, s 2 = 1, σ 2 = 1, and η = 1 (the same parameters used in Table 2), we studied how the one-bit quantization pattern changes when LRT fusion with one-bit quantized data is optimized using the GA in the two-sensor parallel topology, like the one used in [16]. It is interesting to notice that the best rules identified by the GA are as follows: ''ignoring sensor 1'' for r = 0.93, the AND rule for r = 0.95, and XORAP for r ≥ 0.999. For r = 0.99, the GA yields an XOR rule that is slightly different from XORAP: the quantization intervals of the GA alternate over a certain segment around the origin, but at each of the two ends a single interval covers the remaining part of the real line. Although the GA can only claim sub-optimal solutions, it nevertheless also shows that XORAP could be the optimal one-bit fusion rule for r close to 1.
Second, we study the variation of the Bayes error when using both one-bit and two-bit quantizations as r approaches 1 from 0.9, with s 1 = 2, s 2 = 4, σ 2 = 1, and η = 1. From Fig. 5, we can observe that, for two-sensor decentralized detection with quantized data, the Bayes errors decrease to zero very slowly when compared to the exponential decrease exhibited by the CLRT. The coarseness of the one-bit parallel scheme is evident from Fig. 5: its average Bayes error remains high for values of r up to about 0.99, before it starts to drop. In contrast, the tandem schemes with both one-bit and two-bit sensor reports, as well as the parallel scheme with two-bit sensor reports, show decreasing Bayes error trends over the span of r. Hence, the benefits of using two-bit quantization over one-bit quantization can be observed. Moreover, as is to be expected, the tandem scheme exhibits superior performance when compared to the parallel scheme when both schemes utilize the same number of bits for quantization.

G. DISCUSSION
A special case is what happens when s 1 = s 2 and r → 1. Partition interval in the XORAP rule cannot be of zero length.
Since observations at both sensors now will be identical under hypothesis H 1 and identical under H 0 (identical but with only noise) also, the FC essentially has information from any one sensor. The CLRT in such a case will have the performance of a single sensor. Hence, the Bayes error reaches a threshold that is bounded away from zero, determined only by the signal-to-noise ratio (SNR) of a single sensor, as r → 1.
An extension of analysis to the general case of n > 2 sensors would have to address several complex issues. The first issue one has to deal with is the general nature of n × n covariance matrix of the noise components at the n sensors. For analyzing the effect of strong noise correlations, in general, one has to examine the effect of n(n − 1)/2 pairwise correlation parameters on the Bayes error performance. Basically, the covariance matrix is not dependent upon one correlation, but several correlation parameters. One can restrict attention to some structured covariance matrices, such as the one based on an autoregressive model (AR-1) or an equicorrelated model [15]. In both cases, the covariance depends upon a single correlation parameter r. In [14] GA based solution for three sensors has been presented. For the CLRT, the SNR parameter K can be calculated, as was done for the two-sensor case, and its variation with respect to r examined. In general, K grows unbounded, as r → 1, except for the case of all identical signals at the sensors. However, for any decentralized detection rule, finding the rate of convergence of P e , as r approaches 1, is a difficult problem. For one reason, the optimal partitions for quantizers at the n > 2 sensors and the corresponding fusion center rule are difficult to find, even for one-bit quantization. XOR rule can be defined only for n = 2.

III. CONCLUSION
In this paper, we considered the performance of an exclusive-OR fusion rule for detecting the presence or absence of a known deterministic signal in correlated Gaussian noise. The XORAP's performance is perfect when the correlation coefficient r equals 1, but far worse than that of the centralized likelihood ratio test when 0.99 < r < 1. Our contributions are 1) to prove the optimality of the XORAP rule for two-sensor tandem and parallel topologies; and 2) to analytically quantify the asymptotic Bayes error of the XORAP, as r → 1, and observe the pronounced difference in the asymptotic performances between XORAP and the centralized likelihood ratio test. Since XORAP is optimal for both two-sensor parallel and tandem topologies, as r → 1, this error convergence result applies to both network topologies.

APPENDIX A PROOF OF THEOREM 1
Before we prove Theorem 1, we prove the necessary conditions that the decision rules at the two sensors in the tandem configuration need to satisfy for the final decision to be globally optimal. Although these conditions have been stated in [5], [16], we state this as a theorem and provide the proof here. Also, these conditions are for any bivariate distributions, not necessarily restricted to the bivariate Gaussian family.
Theorem 3: For the global optimality of the two-sensor tandem scheme, the decision rules at the two sensors need to satisfy the coupled equations (15)-(17), where η = π 0 /π 1 is the decision threshold, and L 1 and L 0 are the corresponding functions in (15) and (16).
Proof: The decision variables of the sensors, W i , i = 1, 2, are defined in Table 1. Since the final decision is W 2 , the Bayes error can be written as
$$C(W_2) = \pi_0 P(W_2 = 1|H_0) + \pi_1 P(W_2 = 0|H_1) = \pi_1 + \pi_0 P(W_2 = 1|H_0) - \pi_1 P(W_2 = 1|H_1). \tag{18}$$
The person-by-person optimal solution is obtained by first treating one of the sensor rules, say that of sensor 1, as fixed and optimizing the sensor 2 rule for minimum Bayes error, followed by the reverse operation of treating the sensor 2 rule as fixed and optimizing the sensor 1 rule. In the tandem structure, since the sensor 2 decision W 2 depends upon the first sensor decision, which could be either a bit '1' or a bit '0,' the decision regions where W 2 = 1, that is, where sensor 2 decides H 1 , would generally be different for W 1 = 0 and W 1 = 1. Hence, the decision regions where W 2 decides H 1 are denoted as R W 2 |W 1 =i , i = 0, 1. For sensor 1, the decision region for the H 1 decision is denoted as R W 1 . The regions where the sensors decide H 0 are the complements of these regions. First, assuming that the sensor 2 decision rules are fixed, we will minimize C(W 2 ) by optimizing the sensor 1 decision rule. Expanding equation (18), we get equation (19).
Since W 2 depends upon W 1 and x 2 only, and W 1 depends upon x 1 only, equation (19) can be reduced to equation (20). Similarly, we obtain equation (21). Therefore,
$$C(W_2) = \pi_1 + \pi_0 \cdots \tag{22}$$
To minimize C(W 2 ), if the item inside the curly bracket of equation (22) is negative, P(W 1 = 1|x 1 ) = 1; otherwise, P(W 1 = 1|x 1 ) = 0. The condition in (17) is reached. Next, we assume that the sensor 1 rule is fixed so that we can find the sensor 2 decision regions that minimize C(W 2 ). The Bayes error is rewritten as a sum of two integrals over x 2 . Each of these two integrals, which contribute to the Bayes error, can be individually minimized, as they correspond to the cases of W 1 = 1 and W 1 = 0, respectively. The first integral is minimized by assigning to R W 2 |W 1 =1 all points x 2 that make the integrand within the square brackets negative. This leads to equation (15) above. Similarly, the second integral is minimized by assigning to R W 2 |W 1 =0 all points x 2 that make the integrand within the square brackets negative. This leads to equation (16) above.
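Degenerate distributions: As shown below, the bivariate Gaussian distribution becomes degenerate, as r → 1, under both hypotheses. Let $\epsilon = 1 - r$; then $\epsilon \to 0$ as r → 1. Under H 1 , the joint probability density function (PDF) of the bivariate Gaussian can be factored as
$$p(x_1, x_2|H_1) = p(x_1|x_2, H_1)\, p(x_2|H_1),$$
where p(x 2 |H 1 ) is the marginal PDF of sensor 2 under H 1 , and the conditional PDF $p(x_1|x_2, H_1)$ is Gaussian with mean $s_1 + r(x_2 - s_2)$ and variance $1 - r^2 = \epsilon(2 - \epsilon)$. As $\epsilon \to 0$, the conditional variance vanishes and the conditional PDF concentrates at $x_1 = x_2 + s_1 - s_2$.
Similarly, under H 0 , the conditional PDF of x 1 given x 2 concentrates at x 1 = x 2 . Therefore, the joint PDF p(x 1 , x 2 ) is degenerate with support x 1 = x 2 + s 1 − s 2 under H 1 and x 1 = x 2 under H 0 . We now proceed to prove Theorem 1 by showing that the XORAP rule is the unique solution satisfying both (15) and (16) when r → 1, and then showing that this solution also satisfies (17) when r → 1, which establishes the optimality of the XORAP rule in the two-sensor tandem topology.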
We consider W 1 = 1 and W 1 = 0 within each of the following four cases.
is indeterminate and after multiplying both sides with the denominator of the left-hand side of (16), it shows that W 2 can be either a '1' or a '0', without affecting the Bayes error.
When W 1 = 1, η. 2) is indeterminate and after multiplying both sides with the denominator of the left-hand side of (15), it shows that W 2 can be either a '1' or a '0', without affecting the Bayes error. When W 1 = 0, In case 1), W 2 = 1 only when L 1 (x 2 ) ≥ η, with x 2 satisfying condition 1). Hence, such a decision in this case will not lead to the zero probability of error, as r → 1. Contrast this with the XORAP rule, which will have zero probability of error as r → 1 (see section II and [14] when r = 1). Hence, the case 1) x 2 + s 1 − s 2 ∈ R W 1 and x 2 ∈ R W 1 cannot be in the optimal rule as r → 1.
Similarly, in case 4), W 2 = 1 only when L 0 (x 2 ) ≥ η, with x 2 satisfying condition 4). Hence, such a decision in this case will not lead to zero probability of error, as r → 1. Contrast this with the XORAP rule, which will have zero probability of error as r → 1. Hence, the case 4) x 2 + s 1 − s 2 ∉ R W 1 and x 2 ∉ R W 1 cannot be in the optimal rule as r → 1. Therefore, the optimal decision rule of sensor 1 will not encounter the cases of (x 2 + s 1 − s 2 ∈ R W 1 , x 2 ∈ R W 1 ) and (x 2 + s 1 − s 2 ∉ R W 1 , x 2 ∉ R W 1 ). Since x 2 is an arbitrary real number, x 2 + s 1 − s 2 and x 2 cannot simultaneously fall in R W 1 or simultaneously fall in its complement $\bar{R}_{W_1}$. This tells us that the optimum partition of X 1 to obtain R W 1 and $\bar{R}_{W_1}$ has to be alternating segments of length |s 1 − s 2 | as shown in Fig. 2.
Next, we show that this XOR rule with alternating partition also satisfies the other coupling condition given by (17). Applying the degenerate distributions to (17), we obtain the inequality in (32). Now, starting from the XOR rule with alternating partition obtained in equations (30) and (31), and noticing that $R_{W_2|W_1=1} = \bar{R}_{W_1}$ (the complement of R W 1 ) and $R_{W_2|W_1=0} = R_{W_1}$, it is easy to see that the XORAP rule satisfies the inequality in (32). In other words, applying (17) to the original partition and XOR rule obtained from (15) and (16) results in no change to the original XORAP partition of X 1 . Hence, identical XORAP partitions for W 1 and W 2 satisfy the coupled necessary conditions for the optimality of the tandem system, which closes the loop and asserts the global optimality of the rule when r → 1.

APPENDIX B RATE OF DECREASE OF P 2 AS r APPROACHES 1
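Consider the term
$$\int_{iD}^{(i+1)D} \int_{iD \pm 2kD}^{(i+1)D \pm 2kD} f(x, y; r)\, dy\, dx.$$
Since f (x, y; r) = f (x)f (y|x; r), where f (x) ∼ Gaussian(0, 1) and f (y | x; r) ∼ Gaussian(rx, 1 − r 2 ), this term equals
$$\int_{iD}^{(i+1)D} g(x) \left[ \Phi\!\left(\frac{(i+1)D \pm 2kD - rx}{\sqrt{1-r^2}}\right) - \Phi\!\left(\frac{iD \pm 2kD - rx}{\sqrt{1-r^2}}\right) \right] dx,$$
where $\Phi(\cdot)$ is the CDF of a standard Gaussian random variable and g(·) is the standard Gaussian density. Without loss of generality, consider +k(2D) in the above integral. A similar analysis for −k(2D) leads to the same conclusion.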
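The conditional factorization above also gives a simple way to evaluate P 1 and P 2 numerically. The Python sketch below is illustrative only: the midpoint rule, the truncation of the sums (`i_max`, `k_max`), and the function names are choices made here, and the origin is assumed to sit on an interval boundary.

```python
import math

def phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def g(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def rect_prob(a, b, c, d, r, steps=400):
    """P(a < X < b, c < Y < d) using f(x, y; r) = g(x) f(y | x; r), |r| < 1."""
    sigma = math.sqrt(1.0 - r * r)
    h = (b - a) / steps
    total = 0.0
    for m in range(steps):
        x = a + (m + 0.5) * h
        total += g(x) * (phi((d - r * x) / sigma) - phi((c - r * x) / sigma))
    return total * h

def p1_p2(r, D, i_max=16, k_max=2):
    """Truncated sums for P1 (same interval) and P2 (shifted by +/- 2kD)."""
    p1 = 0.0
    p2 = 0.0
    for i in range(-i_max, i_max):
        a, b = i * D, (i + 1) * D
        p1 += rect_prob(a, b, a, b, r)
        for k in range(1, k_max + 1):
            for shift in (2 * k * D, -2 * k * D):
                p2 += rect_prob(a, b, a + shift, b + shift, r)
    return p1, p2

for r in (0.9, 0.99, 0.999):
    p1, p2 = p1_p2(r, 0.5)
    print(r, 1.0 - p1, p2)
```

Comparing the two printed columns for increasing r gives a direct view of how quickly P 2 becomes negligible relative to 1 − P 1 .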