Passive Multi-Channel Detection: A General First-Order Statistical Theory

In this article a general first-order statistical framework is established for passive multi-channel detection and localization of an unknown radiated signal. This radiated signal is written in terms of a finite basis expansion, and the map between these basis coefficients and the measured data on a sensor is a channel that might be known, partially known, or unknown except for the dimension of the signal subspace. The noise at each sensor is assumed to be Gaussian and white; the noise variances at each sensor may be known, unknown and equal, or unknown and possibly unequal. This article develops detectors for all nine of these cases. These detectors are each generalized likelihood ratios, and typically decompose into locally computed detector statistics plus pairwise coherence, or cross-validation, statistics. Of particular note are the scale-invariant detectors that preserve a constant false alarm rate (CFAR) property with respect to noise power.


I. INTRODUCTION
The problem studied here is to detect and localize a source of radiated acoustic or electromagnetic energy, using passively sensed signals at spatially separated sensors. The question to be answered is whether or not the measurements contain a signal common to all sensors, indicating the existence of a radiating source. The statistical framework is based on a first-order statistical model for the multivariate normal (MVN) measurements at each sensor. The underlying measurement model involves a linear channel mapping between basis coefficients of the transmitted signal and the associated time-domain signal on a receiving sensor. Although this underlying measurement model is linear, the resulting detector statistics are decidedly nonlinear functions of the measurements.
The detector statistics derived in this article are generalized likelihood ratios (GLRs), composed as a ratio of likelihoods, each of which is separately maximized with respect to unknown parameters in a measurement model. These GLRs take many forms, depending upon which parameters of the linear measurement model are unknown. The channel models are of a special form, each consisting of a channel map that factors into the product of a complex scalar ("gain") factor generally associated with coherent signal processing, and a basis mapping factor generally associated with subspace methods. We consider three possibilities: 1) The full channel map is known. In this case, both the complex gain and the basis mapping factor are known, and consequently this case allows coherent processing. We refer to this case as the known-channel case.
2) The basis mapping factor is known, but the complex gain factor is unknown. In this case, the knowledge of the basis mapping factor can be exploited in what can be called partially coherent processing. We refer to this case as the partially-known-channel case. 3) Both the complex gain and the basis mapping factor are unknown, but the dimension of the signal subspace is known. Knowledge of this dimension can be exploited in incoherent processing. We refer to this case as the "unknown"-channel case. Generally, the multi-channel detectors of this article decompose into a combination of local detector statistics computed at each sensor plus a coherence statistic computed from measurements on pairs of sensors. For known channels (case 1 above), the local detector statistics are matched-filter statistics, and coherences are computed as real parts of standard complex Euclidean coherence [1]. For partially known channels (case 2 above), the detector statistics are quadratic forms in matched-filter statistics and coherence is magnitude-squared coherence between matched-filtered measurements. For channels that are known only by the dimension of the subspace in which the signal lies (case 3 above), no filtering is involved, and the detector statistic depends only on eigenvalues of a composite correlation matrix that contains local correlations and cross-correlations between sensor measurements.
In each case, the detection statistic, when plotted as a function of relative delay and Doppler shift between data on the sensors, forms what might be called a "likelihood image".

A. CONNECTIONS TO THE LITERATURE
This is an article on Passive Source Localization (PSL) rather than Passive Radar (PR). The distinction is this. In PR, it is assumed that a reference channel receives a signal radiated from a source of opportunity. The question is whether a surveillance channel receives a delayed and Doppler-shifted version of the reference signal. Under the null hypothesis of no signal in the surveillance channel, there remains a signal in the reference channel. Under the alternative hypothesis, there is signal in both channels. This is the model in [9], [10]. In PSL, under the null hypothesis, there is no signal in the surveillance channel and no signal in the reference channel. Under the alternative hypothesis there is signal in both channels. This is the model in [2] and [8]. The two problems are distinguished by their modeling of signals in two or more channels under the null hypothesis.
There are a number of approaches to the PSL problem. For example, detectors based on first-order models are derived in [3], [4], and [7]. Our treatment of multi-sensor detection in a first-order multivariate normal model differs from this work in its factorization of likelihood into sensor-specific and sensor-coupling terms, in its modeling of the radiated signal, and in its treatment of unknown noise powers at each sensor array.
Approaches that assign a prior distribution to the common signal are reported in [11], where the model may be said to be a second-order statistical model, and in [12], where the marginalized measurement densities are not characterized by second-order covariance.
Methods of passive localization may also be categorized as estimation/localization or detection/localization. In the estimation/localization category, estimates of unknown source parameters, including its location, are found by maximizing an objective function (usually a likelihood function of the data). See for example [4] and [13]. The approach of this article is to compute detection statistics for a set of posited source parameters such as range and Doppler. A plot of the detection statistic as a function of these parameters produces an image that may be used to identify parameter values that produce large detection statistics, indicating the presence and determining the location in range and Doppler of a radiator. This approach is also used in some sections of [7].
Table 1 outlines some of the relationships with the existing literature. Our results for the northwest 2 × 2 block of this table generalize existing results by allowing for a finite-dimensional subspace model for the unknown radiated signal. The results of the third row address the important case of unknown and unequal noise variances at two different sensors. The results of the third column address the case of a channel that is known only by the dimension of its subspace.

B. CONTRIBUTIONS OF THIS PAPER
It is shown in this article that multi-channel detectors based on a first-order MVN measurement model often have a common detector structure, where the detector statistic decomposes into a combination of locally computed detectors plus pairwise cross-validation terms. The locally computed detectors use only measurements collected at a single sensor, and these detectors are typically matched subspace detectors. The cross-validation terms are coherence statistics that use data collected at two or more sensors. These terms are the statistics that distinguish bistatic or multistatic detection from simply adding together ("diversity combining") multiple single-sensor detection statistics. This decomposition is maintained for a variety of known or unknown noise and/or signal parameters. The importance of the cross-validation term is quantified by the receiver operating characteristic (ROC) curves and curves of P_D versus SNR shown in Section VI. Without the cross-validation terms, the likelihood ratios are a diversity-combining sum of monostatic likelihood ratios.
The detectors of this article are derived for channels that are known, partially known, and unknown, terms to be precisely defined in due course. For each of these channels, detectors are derived for cases where the noise variance is known or unknown at each of the distributed sensors. See Table 1. The resulting detectors have an important invariance to measurement scaling, meaning that once a detection threshold is set, the false alarm probability is invariant to scale. Such detectors are said to maintain a constant false alarm rate (CFAR). The key results of the paper are contained in Sections V and VI, where detector statistics are derived and then analyzed for their performance using simulated source transmissions and measurements.
Each of the nine detectors derived in this article may be placed in a cell of a 3 × 3 matrix, organized by noise model in rows and channel model in columns. See Table 2. In the first row of the table, where noise variances are assumed known, the detector statistics are not scale-invariant. In the second row, where noise variances are unknown but assumed equal, the detector statistics are scale-invariant and therefore they are CFAR with respect to a common noise variance. In the third row, where noise variances are unknown but not assumed equal, the detector statistics are scale-invariant and therefore they are CFAR with respect to different noise variances.
A very preliminary version of this article was published as [14] and a related but different account was posted to arXiv as [15].

II. SIGNAL MODELS
The detectors derived in this article are applicable to a variety of detection problems, where the problem is to fuse measurements from several sensors into a single detection statistic. However, the results have a particular salience when applied to the problem of forming delay-Doppler images from measurements made at several distributed sensor platforms. The principle for fusing these received signals is based on likelihood in a multivariate normal model for measurements.
A source is assumed to emit an unknown, real-valued, bandlimited waveform

s(t) := Re{e^{i2π f_c t} w(t)}. (1)

The signal w(t) is the unknown baseband signal transmitted by the source. A sensor receives a delayed, scaled, and noisy version of this transmission,

y(t) = G s(t − τ(t) − t_0) + n(t). (2)

Here G > 0 is a real-valued channel gain, which can include sensor gain and any attenuation losses due to propagation. The delay τ(t) represents a time-varying propagation delay between the source and sensor. It depends on the speed of propagation in the medium and the time-varying relative positions of the source and sensor. The constant t_0 denotes any offset between the clock of the sensor and a reference clock. The received waveform is then complex demodulated to obtain the complex baseband signal,

x(t) = G e^{−i2π f_c (τ(t) + t_0)} w(t − τ(t) − t_0) + u(t), (3)

where u(t) denotes the combination of the demodulated passband noise (for example, RF interference) n(t) and any baseband sensor noise. It will be assumed that the noise is wide-sense stationary (WSS) in time. That is,

E[u(t + Δ) ū(t)] = σ² ρ(Δ), (4)

where the bar denotes complex conjugate. An advanced and complex-modulated version of u(t), namely e^{−i2πνt} u(t + τ), would then be correlated as

E[e^{−i2πν(t+Δ)} u(t + Δ + τ) ū(t + τ) e^{i2πνt}] = e^{−i2πνΔ} σ² ρ(Δ). (5)

We assume the time-varying delay can be well approximated by a first-order Taylor series,

τ(t) ≈ τ − (ν/f_c) t. (6)

This amounts to approximating the Doppler shift ν as ν = (V/c) f_c, where V is the relative velocity between the radiating source and the receiving sensor. In addition, we assume that the signal is sufficiently narrowband so that, with this approximation of τ(t), the small terms t_0 and νt/f_c can be neglected in the argument of w, so that (3) is

x(t) = g e^{i2πνt} w(t − τ) + u(t), (7)

where the complex gain g is defined to be g = G e^{−i2π f_c (t_0 + τ)}.
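As a quick numerical sanity check on the Doppler approximation ν = (V/c) f_c, the following sketch uses hypothetical values of our own choosing (a 1 GHz carrier and a 300 m/s relative velocity), not numbers from the paper:

```python
# Hypothetical example values: carrier f_c = 1 GHz, relative velocity
# V = 300 m/s, propagation speed c = 3e8 m/s (radio in free space).
V, c, f_c = 300.0, 3e8, 1e9

# First-order Doppler approximation from the text: nu = (V / c) * f_c
nu = (V / c) * f_c
print(nu)  # about 1000 Hz
```

At these values the Doppler shift is roughly 1 kHz, small compared with the carrier, which is consistent with the narrowband assumption above.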

A. SAMPLING AND SYNCHRONIZING THE DATA
We begin with a hypothesized delay and Doppler shift. The first step is to sample and "synchronize" (remove the delay and Doppler shift of) the data (7). The order of these steps is immaterial; in practice the sampling step is typically done first.
The continuous-time waveform x(t) of (7) is sampled at rate 1/T_S to produce the discrete-time sequence

x[n] = x(nT_S) = g e^{i2πν nT_S} w(nT_S − τ) + u(nT_S). (8)

When the noise is sampled at rate 1/T_S, it is assumed that ρ(kT_S) = δ_k, where δ_k is the Kronecker delta. That is, E[u((k' + k)T_S) ū(k'T_S)] = σ² δ_k; the sampled-data noise is white, and this holds for the advanced and demodulated version of u(t) as well.
We next produce a sampled version of the "synchronized" signal z(t; τ', ν') = e^{−i2πν't} x(t + τ'). This can be done in the discrete domain by advancing (8) by k' samples, where τ' = k'T_S, and Doppler-shifting it with frequency ν' to produce

z[n; θ'] = e^{−i2πν' nT_S} x[n + k']. (9)

Here we are using the "synchronizing parameters" θ' = (τ', ν') = (k'T_S, ν'). Below we denote the actual delay and Doppler parameters of (8) by θ = (τ, ν) = (kT_S, ν). Then:

1) for θ' = θ, z[n; θ'] may be written

z[n; θ] = g e^{i2πν kT_S} w[n] + u[n], (10)

where w[n] = w(nT_S) and u[n] = u(nT_S);

2) for θ' ≠ θ, z[n; θ'] may be written

z[n; θ'] = g e^{i2πν k'T_S} e^{i2π(ν−ν')nT_S} w(nT_S − (τ − k'T_S)) + u((n + k')T_S).

Except for the extra phasing term, which is independent of the sampled time index n, this shows that the effect of mismatch between the actual frequency ν and the synchronizing frequency ν' appears as the frequency modulation e^{i2π(ν−ν')nT_S}, where the frequency difference (ν − ν') is commonly called the frequency difference of arrival (FDOA). The effect of mismatch between the actual delay τ and the synchronizing delay τ' = k'T_S appears as the delay difference τ − τ' = (k − k')T_S, commonly called the time difference of arrival (TDOA).

In this article we study the case in which the delay and complex modulation (Doppler-shifting) have been done so that θ' = θ, or in other words, (ν', τ') = (ν, τ), and consequently z is related to w by (10). We refer to this process as "synchronization". For each choice of θ', detectors are derived under the assumption that θ' = θ. Performance of these detectors is then quantified by a receiver operating characteristic (ROC) curve, as in Section VI. The effects of incorrect synchronization, on the other hand, are quantified by a point spread function, which is studied in a separate paper.
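The synchronization step (9) can be sketched in a few lines of NumPy. The function and variable names below are of our own choosing; the sketch advances the sampled data by k' samples and removes a hypothesized Doppler shift ν':

```python
import numpy as np

def synchronize(x, k_prime, nu_prime, T_s):
    """Form z[n; theta'] = exp(-i 2 pi nu' n T_s) * x[n + k'],
    a sketch of the discrete-domain synchronization in (9)."""
    n = np.arange(len(x) - k_prime)
    return np.exp(-2j * np.pi * nu_prime * n * T_s) * x[n + k_prime]

# Toy check: a pure Doppler tone with no delay (w = 1, g = 1, k = 0)
# is flattened to a constant when the synchronizing parameters match.
T_s, nu = 1e-3, 40.0
n = np.arange(64)
x = np.exp(2j * np.pi * nu * n * T_s)
z = synchronize(x, k_prime=0, nu_prime=nu, T_s=T_s)
```

With matched parameters the residual modulation vanishes; a mismatched ν' would leave the FDOA modulation e^{i2π(ν−ν')nT_S} described above.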
We organize N of the samples (10) into an N-dimensional measurement vector z. Under the assumption that θ' = θ = (kT_S, ν), we write (9) and (10) as

z = g w + u, (11)

where w = (w[0], ..., w[N−1])^T and the known phase e^{i2πνkT_S} has been absorbed into g. In the known-channel case, g is assumed known. In the partially-known- and known-by-dimension-channel cases, g is assumed unknown and given no prior distribution. The noise vector u ∼ CN_N(0, R) is additive Gaussian noise. The N × N noise covariance matrix may be written as R = σ² I_N, since the sampled noise is white.

B. SUBSPACE SIGNAL MODELS AND SUBSPACE SIGNAL PROCESSING
Equation (11) is general. However, in many cases the discrete-time baseband vector w ∈ C^N may be modeled as a subspace signal

w = Ha,

where a ∈ C^K are complex coefficients that fix the location of w in the K-dimensional subspace of C^N spanned by the linearly independent columns of the matrix H ∈ C^{N×K}. These linearly independent columns are a basis for the corresponding subspace, which may be denoted ⟨H⟩. The model H may be a model for the source only, for the propagation channel only, or for a composition of the source and the channel. For example, when H is a K-column slice of an N × N discrete Fourier transform (DFT) matrix, the subspace signal is a linear combination of K DFT modes. When K = N, then the matrix H is the DFT matrix and the signal w is simply expanded in a DFT basis rather than the standard Euclidean basis. The subspace is the space C^N.
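A small numerical sketch of the DFT-slice example, with H the first K columns of the unitary N × N DFT matrix (the construction and names here are ours, for illustration):

```python
import numpy as np

N, K = 16, 8
# Unitary DFT matrix; H is its first K columns, an orthonormal basis
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
H = F[:, :K]

rng = np.random.default_rng(0)
a = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # basis coefficients
w = H @ a               # subspace signal: a combination of K DFT modes
a_back = H.conj().T @ w  # orthonormal columns => coefficients recovered exactly
```

Because H has orthonormal columns, H^H H = I_K and the coefficients a are recovered by projection, which is the property the matched subspace detectors below rely on.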
Another example is the case when H is a K-column slice of an N × N Slepian matrix. In this case, the subspace signal is a linear combination of the dominant K = Nβ eigenvectors (Slepian vectors, or discrete prolate spheroidal sequences) that essentially span the linear space of signals with time-bandwidth product Nβ, where 2πβ is the signal bandwidth in radians per sample. When β = 1, the signal subspace is the space C^N.
The key feature of these examples is that such a model divides the measurement space C N into the direct sum of a K-dimensional signal subspace and an (N − K )-dimensional signal-free subspace, where noise variances may be estimated.This point will be exploited in the sections to follow.
The ratio of signal energy to noise energy in the subspace is

SNR = |g|² a^H a / (K σ²) = (N/K) · (|g|² a^H a) / (N σ²).

The per-sample, or input, signal-to-noise ratio (|g|²/σ²) a^H a / N is scaled by the processing gain N/K that is achieved with resolution of the noisy measurement onto a K-dimensional subspace of C^N.
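The processing gain N/K can be checked numerically: on average only a fraction K/N of white-noise energy survives projection onto a K-dimensional subspace, while subspace signal energy is untouched. A Monte Carlo sketch under our own setup (orthonormal DFT-slice basis, unit-variance complex noise):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, trials = 16, 8, 20000

F = np.fft.fft(np.eye(N)) / np.sqrt(N)
H = F[:, :K]            # orthonormal basis for a K-dimensional subspace
P = H @ H.conj().T      # orthogonal projection onto that subspace

# Unit-variance complex white noise, one column per trial
U = (rng.standard_normal((N, trials)) + 1j * rng.standard_normal((N, trials))) / np.sqrt(2)
full_energy = np.mean(np.sum(np.abs(U) ** 2, axis=0))          # ~ N
subspace_energy = np.mean(np.sum(np.abs(P @ U) ** 2, axis=0))  # ~ K
ratio = subspace_energy / full_energy                          # ~ K / N
```

The empirical ratio concentrates near K/N = 0.5, which is the source of the N/K gain in the SNR expression above.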

III. GENERAL LINEAR MODEL FOR DISTRIBUTED SENSORS
The signal models of the previous section may be used to model the signals sensed in an array of distributed sensors. It is assumed that the measurement at each sensor is a complex-valued time series. This time series might arise as the output of a multi-sensor array that returns a beamformed output. Consider a measurement system with L sensors, each receiving a common transmitted signal propagated through its own propagation channel. For a particular hypothesized source location and velocity, we first synchronize the measurements at each sensor to correspond to that hypothesized source location and velocity. The delays and Doppler shifts needed for synchronization are determined by the physical location of the radiating source, the physical location of the sensor, and the relative velocity between source and sensor. The goal here is to bring each sensor's respective delayed and complex-modulated signal e^{i2πν_ℓ t} w(t − τ_ℓ) into alignment as a scalar multiple of a common w(t).
Once this is done, the measurement model at sensor ℓ is

z_ℓ = g_ℓ H_ℓ a + u_ℓ.

Note that the signal amplitudes a ∈ C^K are the same for all channels. The amplitude vector a is considered unknown and is not described by any probabilistic model or deterministic model that would constrain it to a finite set of symbols.
The composite model for all sensors is

z = Fa + u,

where z = [z_1^T, ..., z_L^T]^T, F = [(g_1 H_1)^T, ..., (g_L H_L)^T]^T, and u = [u_1^T, ..., u_L^T]^T. If M such samples are recorded at each sensor, the resulting linear model is

Z = FA + U. (17)

The notation is this: Z = [z[1], ..., z[M]], A = [a[1], ..., a[M]], and U = [u[1], ..., u[M]]. Importantly, the composite matrix F is assumed to remain constant for all M source transmissions a[m], m = 1, ..., M, and the noise matrix U is assumed to consist of a sequence of independent CN_{N_Z}(0, R) random vectors, where N_Z = N_1 + ··· + N_L.

A. THE CHANNEL MATRICES H
If baseband symbols are transmitted on a carrier through a free-space channel, and then complex demodulated, the baseband signal model is gHa. This may be an explicit model for the composition of the source and the propagation channel. For the detector statistics to follow, the general case is derived for arbitrary M, N, and K ≤ N. This general case is then specialized for M = 1 and then for K = N, in which case the channel matrix H may be taken to be the identity or even a known linear channel transformation.

IV. BISTATIC LIKELIHOOD FORMULAS
Here we specialize (17) to the case of L = 2 sensors. With little loss in generality and significant gain in notational clarity, we shall assume N_1 = N_2 = N.

A. LOG-LIKELIHOOD UNDER H 0
Under H_0, the Gaussian log-likelihood is

L_0 = −M log det[πR] − tr[R^{−1} ZZ^H]. (18)

We have assumed that the noise at distributed sensors is uncorrelated, so that R = blkdiag[σ_1² I_N, σ_2² I_N]. If the variances σ_1² and σ_2² are assumed to be equal but unknown, the estimate of σ² is

σ̂²|_0 = tr[ZZ^H] / (2MN). (19)

The notation σ̂²|_0 indicates that this is the ML estimate under hypothesis H_0. This estimate does require the fusing of the autocorrelations tr[Z_1 Z_1^H] and tr[Z_2 Z_2^H] computed locally at the two sensors. The corresponding compressed log-likelihood is

L_0 = −2MN log(πe σ̂²|_0). (20)

At this estimate, the trace term in the log-likelihood evaluates to −M(2N), and the ML estimate of σ² minimizes log det[R] under this constraint. For σ_1² and σ_2² unknown and unequal, this log-likelihood is maximized with respect to σ_1² and σ_2² to produce the ML estimates

σ̂_ℓ²|_0 = tr[Z_ℓ Z_ℓ^H] / (MN), ℓ = 1, 2. (21)

The corresponding compressed log-likelihood is

L_0 = −MN log(πe σ̂_1²|_0) − MN log(πe σ̂_2²|_0). (22)

Importantly, the terms in the compressed log-likelihood under H_0 are computed locally at each sensor, with no communication of measurements between the two sensors. Moreover, at the ML estimates for σ_1² and σ_2², the trace term in the log-likelihood evaluates to −M(2N). This result follows from a theorem of [11]: because the set of covariance matrices {blkdiag[σ_1² I_N, σ_2² I_N] : σ_ℓ² > 0} is a positive cone, the ML estimate of R maximizes the log-likelihood of (18) under the constraint that the trace term is equal to −2MN.
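The H_0 variance estimates can be sketched numerically. Under the trace constraint described above, the equal-variance ML estimate fuses the two local autocorrelations as tr[ZZ^H]/(2MN), while the unequal-variance estimates tr[Z_ℓ Z_ℓ^H]/(MN) are purely local; this is our reading of the compressed likelihoods, and the simulation values are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 64
sigma1, sigma2 = 1.0, 2.0   # true noise standard deviations (our choice)

def cn(n, m, sigma):
    # complex white Gaussian noise with per-sample variance sigma^2
    return sigma * (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)

Z1, Z2 = cn(N, M, sigma1), cn(N, M, sigma2)

# Local (unequal-variance) ML estimates under H0: tr[Z_l Z_l^H] / (M N)
s1_hat = np.trace(Z1 @ Z1.conj().T).real / (M * N)
s2_hat = np.trace(Z2 @ Z2.conj().T).real / (M * N)

# Equal-variance estimate: fuse the two local autocorrelations
s_hat = (np.trace(Z1 @ Z1.conj().T).real + np.trace(Z2 @ Z2.conj().T).real) / (2 * M * N)
```

The local estimates concentrate near the true σ_ℓ², and the fused estimate is their average, illustrating that only scalar autocorrelations, not raw measurements, need be shared.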

B. LOG-LIKELIHOOD UNDER H 1
The Gaussian log-likelihood under H_1 is

L_1 = −M log det[πR] − tr[R^{−1} (Z − FA)(Z − FA)^H]. (23)

With all parameters known, the ML estimate of A is the pseudoinverse solution

Â = (F^H R^{−1} F)^{−1} F^H R^{−1} Z, (24)

which may be written out as

Â = ((|g_1|²/σ_1²) H_1^H H_1 + (|g_2|²/σ_2²) H_2^H H_2)^{−1} ((ḡ_1/σ_1²) H_1^H Z_1 + (ḡ_2/σ_2²) H_2^H Z_2). (25)

The estimator of A is a linear combination of scaled and complex demodulated measurements at each of the two sensors.
The corresponding compressed log-likelihood is

L_1 = −M log det[πR] − tr[R^{−1} (Z − FÂ)(Z − FÂ)^H]. (26)

When the unknown noise variances σ_1² and σ_2² are assumed to be equal, the ML estimate of σ² is

σ̂²|_1 = tr[(Z − FÂ)(Z − FÂ)^H] / (2MN). (27)

Again, the trace term in the log-likelihood (26) evaluates to −M(2N), and the resulting compressed log-likelihood is

L_1 = −2MN log(πe σ̂²|_1). (28)

When the variances σ_1² and σ_2² are unknown and unequal, they appear both in the leading log terms and in the matrix inverse R^{−1/2} of the trace term. The ML estimates of σ_1² and σ_2² minimize the sum of the two log terms under the constraint that the trace term evaluates to −M(2N). These ML estimates are determined as the zeros of two coupled nonlinear equations. The parameters of these nonlinear equations are functions of measurements at both sensors, and as a consequence the log-likelihood is a complicated function of measurements at both sensors. However, these solutions are well approximated by the "local" estimates

σ̂_ℓ²|_1 ≈ tr[(I_N − P_{H_ℓ}) Z_ℓ Z_ℓ^H] / (MN), ℓ = 1, 2, (29)

where we have used the fact that P_F = P_H. The corresponding compressed log-likelihood (30) consists of terms that depend on Z_1 and Z_2 individually, plus a last term, involving a matrix R̃_1, that is a coherence term depending jointly on Z_1 and Z_2.

V. BISTATIC PASSIVE DETECTORS
There are three cases to be considered for the channel matrices F_ℓ := g_ℓ H_ℓ: 1) Known channel. The channel matrix F_ℓ = g_ℓ H_ℓ is known, and coherent processing is possible; 2) Partially known channel. In the channel matrix F_ℓ = g_ℓ H_ℓ, the basis mapping factor H_ℓ is known, but the complex gain g_ℓ is unknown, and consequently only partially coherent processing can be done; 3) "Unknown" channel. The gain g_ℓ is unknown and the subspace ⟨H_ℓ⟩ is known only by its dimension K; the processing is incoherent.

A. DETECTORS FOR THE KNOWN-CHANNELS CASE
To assume F is known is to assume g 1 H 1 and g 2 H 2 are known.This is idealistic, but the results are illuminating and they establish a framework for deriving more realistic results.
1) NOISE VARIANCES σ_1² AND σ_2² ARE KNOWN

The log-likelihood ratio statistic L = L_1 − L_0 is

L = tr[P_{R^{−1/2}F} R^{−1/2} ZZ^H R^{−1/2}]. (31)

This log-likelihood ratio statistic is invariant to the group of transformations that transform Z as ZQ_M, where here and throughout Q_M denotes an arbitrary M × M unitary matrix. It is not hard to show, using P_F = HH^H and the cyclic property of trace, that this log-likelihood ratio statistic may be re-written as the Rayleigh quotient

L = β^H M(σ_1, σ_2) β / β^H β, (32)

where β = (β_1, β_2)^T with β_1 = g_1/σ_1, β_2 = g_2/σ_2, and where the entries γ_ij of the 2 × 2 matrix M(σ_1, σ_2) are scalar-valued quadratic forms in matched filterings:

γ_ij = tr[H_i^H Z_i Z_j^H H_j] / (σ_i σ_j). (34)

Interpretations: When written out, the log-likelihood ratio of (32) is illuminating:

L = (|β_1|²/|β|²) γ_11 + (|β_2|²/|β|²) γ_22 + (2/|β|²) Re[β̄_1 β_2 γ_12]. (35)

This is, in fact, proportional to tr[ÂÂ^H]. So the GLR L may be read as follows: with all parameters known, except for A, the GLR (35) is a sum of three terms: 1) The term γ_11 is the log-likelihood ratio for sensor 1 only; this matched subspace detector statistic [5], [6] is scaled by the fractional channel gain |β_1|²/|β|² to determine its contribution to the log-likelihood ratio. 2) The term γ_22 is the log-likelihood ratio for sensor 2 only; it is scaled by the fractional channel gain |β_2|²/|β|² to determine its contribution to the log-likelihood ratio.
3) The term γ_12 is a coherence or cross-validating term that measures the correlation between Z_1 and Z_2 after each of these has been matched to its respective channel matrix.

These results generalize to three or more sensors, denoted by their channel maps g_ℓ H_ℓ, noise variances σ_ℓ², and measurements Z_ℓ. Then, for β = (β_1, ..., β_L)^T with β_ℓ = g_ℓ/σ_ℓ, and M the L × L matrix with entries γ_ij, the resulting L-sensor log-likelihood ratio is the Rayleigh quotient

L = β^H M β / β^H β.

These results suggest a sequential implementation where sensors, their log-likelihoods, and their coherences are added in. This strategy is not without its risks, as there is no knowledge, without simulation, of the null distribution of the log-likelihood ratio statistic.

Special Cases: When there is a single snapshot, then M = 1 and Z is replaced by z. The first two terms in the square brackets of the resulting L = 2 log-likelihood ratio are matched subspace detectors and the third is a cross-correlation between the coefficients H_1^H z_1 and H_2^H z_2. When K = N, the first two terms in the square brackets are energy detectors and the third is a cross-correlation between measurements. None of these detector statistics is scale-invariant, and consequently under H_0, where measurement scaling has the effect of scaling noise variance, none is CFAR with respect to noise variances.
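The decomposition into two local matched-subspace statistics plus a cross-coherence term can be illustrated numerically. The scale factors below are our schematic reading of the decomposition, not the paper's exact expressions; H_1 = H_2 is an orthonormal DFT slice and all simulation values are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 16, 8, 64
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
H1 = H2 = F[:, :K]                 # known, orthonormal bases
g1, g2 = 1.0 + 0.5j, 0.8           # known complex gains (our values)
s1 = s2 = 1.0                      # known noise standard deviations
b1, b2 = g1 / s1, g2 / s2          # beta_l = g_l / sigma_l
bsq = abs(b1) ** 2 + abs(b2) ** 2

A = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
noise = lambda: (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
Z1 = g1 * (H1 @ A) + s1 * noise()   # common signal, independent noise
Z2 = g2 * (H2 @ A) + s2 * noise()

C1, C2 = H1.conj().T @ Z1 / s1, H2.conj().T @ Z2 / s2  # matched-filtered data
loc1 = np.trace(C1 @ C1.conj().T).real                 # local statistic, sensor 1
loc2 = np.trace(C2 @ C2.conj().T).real                 # local statistic, sensor 2
cross = 2 * np.real(np.conj(b1) * b2 * np.trace(C1 @ C2.conj().T))  # coherence term

# Schematic combination: fractional-gain-weighted local terms plus cross term
L = (abs(b1) ** 2 * loc1 + abs(b2) ** 2 * loc2 + cross) / bsq
```

When a common signal is present, the cross term is large and positive; dropping it leaves only the diversity-combining sum of local statistics.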

2) NOISE VARIANCES σ 2 1 AND σ 2 2 ARE UNKNOWN BUT ASSUMED EQUAL
When the noise variances are unknown but assumed equal, the ML estimator of σ² under H_0 is given in (19) and the corresponding log-likelihood L_0 is given in (20). Under H_1, the ML estimate of σ² is given in (27) and the corresponding compressed likelihood L_1 is given in (28). The log-likelihood ratio statistic L = L_1 − L_0 is then

L = 2MN log(σ̂²|_0 / σ̂²|_1). (41)

It is a few steps of algebra to show that this log-likelihood ratio is a monotone function of the scale-invariant matched subspace detector statistic [5], [6], which is to say a monotone function of

β^H M(1, 1) β / (β^H β · tr[ZZ^H]),

where we recall that M(1, 1) = M(σ_1 = 1, σ_2 = 1). This statistic is invariant to the group of transformations that transform Z as aZQ_M, where a is an arbitrary complex constant. Consequently the statistic is CFAR with respect to noise variance.

Special Cases: When there is a single snapshot, M = 1, and the log-likelihood ratio may be written with z in place of Z.

3) NOISE VARIANCES σ_1² AND σ_2² ARE UNKNOWN AND UNEQUAL

When the noise variances σ_1² and σ_2² are not assumed to be equal, their ML estimates under H_0 are given in (21) and the corresponding log-likelihood function L_0 is given in (22). Under H_1 the ML estimates of the variances σ_1² and σ_2² are approximated in (29) and the approximate log-likelihood function L_1 is given in (30). The log-likelihood ratio statistic is then approximated as

L ≈ L_1 − L_0. (44)

B. DETECTORS FOR THE PARTIALLY-KNOWN-CHANNELS CASE
Here H_1 and H_2 are assumed known, but g_1 and g_2 are unknown.

1) NOISE VARIANCES σ_1² AND σ_2² ARE KNOWN

We begin with the idealized GLR detection statistic of (32). With the noise variances σ_1² and σ_2² known, the maximum likelihood solution for the unit vector v = (v_1, v_2)^T := β/|β| in (32) is the eigenvector of M(σ_1, σ_2) corresponding to the maximum eigenvalue, and the resulting value of the log-likelihood ratio is this eigenvalue, which we write as

L = λ_max[M(σ_1, σ_2)]. (45)

This statistic is invariant to the group of transformations that transform Z as ZQ_M. The statistic is not invariant to scale, and therefore it is not CFAR with respect to noise variances. Typically, sensor position errors and clock synchronization errors are encoded in the unknown phases of the channel gains g_ℓ. Since arg g_ℓ = arg v_ℓ, the maximum eigenvector v of M(σ_1, σ_2) provides a passive autofocus solution. Moreover, |v_1|/|v_2| = |β_1|/|β_2|, so the relative sensor signal-to-noise ratios are determined.
The maximum eigenvalue of M(σ_1, σ_2) may be written

λ_max = A (1 + √(1 − G(1 − |ρ_12|²))), (46)

where A, G, and |ρ_12|² are defined in terms of (34) and are interpreted as follows: A = (γ_11 + γ_22)/2 is the arithmetic mean of the local detector statistics, G = γ_11 γ_22 / A² is the squared ratio of their geometric mean to their arithmetic mean, and |ρ_12|² = |γ_12|² / (γ_11 γ_22) is the magnitude-squared coherence between the matched-filtered measurements.

Special cases: When M = 1, then Z is replaced by z ∈ C^{2N}, and terms like γ_12 are written as γ_12 = z_2^H H_2 H_1^H z_1 / (σ_1 σ_2). The solution for three or more sensors is the maximum eigenvalue of a matrix M consisting of terms tr[H_i^H Z_i Z_j^H H_j / (σ_i σ_j)], but this maximum eigenvalue has no closed-form characterization as in (46).
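A sketch of the partially-known-channel statistic for two sensors: build the 2 × 2 matrix with entries tr[H^H Z_ℓ Z_ℓ'^H H]/(σ_ℓ σ_ℓ'), take its maximum eigenvalue as the detection statistic, and read relative gain phase from the top eigenvector (the autofocus idea). The simulation values and names are our own:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 16, 8, 64
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
H = F[:, :K]                                  # known basis, shared by both sensors
g1, g2 = np.exp(1j * 0.7), 0.9 * np.exp(-1j * 0.3)  # unknown gains (simulated)
sigma = 1.0                                   # known noise standard deviation

A = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
noise = lambda: (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
Z1, Z2 = g1 * (H @ A) + noise(), g2 * (H @ A) + noise()

C1, C2 = H.conj().T @ Z1, H.conj().T @ Z2     # matched-filtered measurements
Mmat = np.array([[np.trace(C1 @ C1.conj().T), np.trace(C1 @ C2.conj().T)],
                 [np.trace(C2 @ C1.conj().T), np.trace(C2 @ C2.conj().T)]]) / sigma ** 2
evals, evecs = np.linalg.eigh(Mmat)           # Hermitian 2 x 2 eigenproblem
stat = evals[-1]                              # GLR statistic: maximum eigenvalue
v = evecs[:, -1]                              # top eigenvector ~ (beta_1, beta_2)
phase_diff = np.angle(v[0] * np.conj(v[1]))   # estimates arg g1 - arg g2 = 1.0
```

The eigenvector's relative phase recovers arg g_1 − arg g_2 up to estimation error, illustrating the passive autofocus interpretation.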
2) NOISE VARIANCES σ_1² AND σ_2² ARE UNKNOWN BUT ASSUMED EQUAL

When the noise variances are unknown but assumed equal, the ML estimates for σ² under H_0 and H_1 may be used to compress likelihood. This produces the results of equations (19) and (27). Only the log-likelihood under H_1 depends on the complex gains g_1 and g_2. So to maximize log-likelihood under H_1 with respect to these gains is to maximize a monotone function of the log-likelihood ratio, which may be taken to be the log-likelihood ratio of (41). The resulting log-likelihood ratio is then

L = λ_max[M(1, 1)] / tr[ZZ^H]. (47)

This result may also be written as the result of (45) with the common variances replaced by tr[ZZ^H]. The log-likelihood ratio statistic is invariant to the group of transformations that transform Z as aZQ_M, where a is complex. Consequently the statistic is CFAR with respect to noise variance.

Special cases: When M = 1, then Z is replaced by z ∈ C^{2N}, and terms like γ_12 are written as γ_12 = z_2^H H_2 H_1^H z_1.

3) NOISE VARIANCES σ_1² AND σ_2² ARE UNKNOWN AND UNEQUAL

We begin with the compressed log-likelihood ratio of (44), in which σ_1² and σ_2² have been estimated from the data. The only remaining unknowns are the complex gains g_1 and g_2, and these appear only in the last term of (44), which can be written as a Rayleigh quotient in β. To maximize log-likelihood under hypothesis H_1 with respect to the complex gains g_1 and g_2 is to maximize this last term with respect to β. Consequently, the log-likelihood ratio (49) is the maximum eigenvalue of the corresponding coherence matrix. This log-likelihood ratio statistic is invariant to the group of transformations that transform Z as diag[a_1 I_{N_1}, a_2 I_{N_2}] ZQ_M, where a_1 and a_2 are arbitrary complex constants that independently scale Z_1 and Z_2. Consequently the statistic is CFAR with respect to noise variances.

C. DETECTORS FOR THE "UNKNOWN"-CHANNELS CASE
When the gains g_ℓ and matrices H_ℓ are unknown, then the composite matrix F is unknown. This unknown matrix is no more or less general than the matrix R^{−1/2}F. We do, however, know the dimension of the signal subspace, which is the dimension of the subspace ⟨F⟩.
1) NOISE VARIANCES σ_1² AND σ_2² ARE KNOWN

We begin with the log-likelihood ratio of (31). Maximization with respect to the unknown subspace ⟨F⟩ is achieved by choosing P_{R^{−1/2}F} = U_K U_K^H, where U_K contains the K dominant eigenvectors of the whitened composite correlation matrix R^{−1/2} ZZ^H R^{−1/2}. The resulting log-likelihood function is the sum of the K largest eigenvalues of this matrix. So, with the noise variances known, these are used to normalize the measurements Z_1 and Z_2, their composite correlation matrix is computed, and the first K eigenvalues of the normalized composite correlation matrix are summed for the detection statistic. This amounts to measuring the energy in the dominant subspace of the whitened correlation matrix.
The log-likelihood ratio statistic is invariant to the group of transformations that transform Z as ZQ_M. It is not invariant to scale and therefore it is not CFAR with respect to noise variances.

Special Cases: When M = 1, there is only a single nonzero eigenvalue, so the detection statistic is the whitened energy z^H R^{−1} z.

2) NOISE VARIANCES σ_1² AND σ_2² ARE UNKNOWN BUT ASSUMED EQUAL

We begin with the log-likelihood ratio of (47). To maximize with respect to P_F is to choose P_F = U_K U_K^H, where U_K now contains the dominant eigenvectors of ZZ^H. Then the log-likelihood ratio is a monotone function of the sum of the K largest eigenvalues of ZZ^H, divided by tr[ZZ^H]. With the common noise variance unknown, the fraction of the total energy that is contained in the dominant subspace is used for the detection statistic. The statistic is invariant to the group of transformations that transform Z as aZQ_M. Therefore it is CFAR with respect to noise variance.

Special cases: When M = 1, there is no likelihood ratio statistic.
3) NOISE VARIANCES σ_1² AND σ_2² ARE UNKNOWN AND UNEQUAL

For unknown and unequal noise variances, the numerator and denominator of the corresponding statistic are invariant to the same transformation. Therefore the likelihood ratio statistic is invariant to independent scaling of each channel and CFAR with respect to noise variances.
Special Cases: When M = 1, there is no likelihood ratio statistic.
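For the unknown-channel, equal-unknown-variance case, the detection statistic is the fraction of total energy captured by the K dominant eigenvalues of the composite correlation matrix ZZ^H. A noise-only sketch under our own setup:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, M = 16, 8, 64
# Composite measurement: both sensors stacked, here noise only (H0)
Z = (rng.standard_normal((2 * N, M)) + 1j * rng.standard_normal((2 * N, M))) / np.sqrt(2)

S = Z @ Z.conj().T                            # 2N x 2N composite correlation
evals = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues, descending
stat = evals[:K].sum() / np.trace(S).real     # dominant-subspace energy fraction
```

The statistic is a ratio of energies, so scaling Z by any complex constant leaves it unchanged, which is the scale invariance behind the CFAR property claimed above.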

VI. NUMERICAL EXPERIMENTS
We now present some numerical results for the detector statistics we have derived. The cases we analyze provide insights into the performance of detector statistics under various conditions of channel model and noise model. ROC curves and curves of probability of detection (P_D) versus input SNR for each of our cases demonstrate the importance of including cross-validation terms in a detector statistic. Detectors without the cross-validation terms are listed in Table 3.
For all the plots, the number of samples in a collection period is N = 16, the number of collections is M = 64, and the dimension of the signal subspace is K = 8.
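The experimental setup (N = 16, M = 64, K = 8) can be mimicked with a small Monte Carlo. As a stand-in detector we use the dominant-subspace energy fraction from the unknown-channel case, with our own SNR convention and a 0 dB SNR (higher than the paper's −10 dB, so the effect is visible with this weaker stand-in statistic); this illustrates the methodology of thresholding at P_FA = 0.01 and estimating P_D, not the paper's exact experiments:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K, trials = 16, 64, 8, 400
snr = 10 ** (0 / 10)                       # 0 dB per-sample input SNR (our choice)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
H = F[:, :K]

def stat(Z):
    # stand-in detector: dominant-subspace energy fraction of Z Z^H
    evals = np.sort(np.linalg.eigvalsh(Z @ Z.conj().T))[::-1]
    return evals[:K].sum() / evals.sum()

def trial(signal_present):
    noise = (rng.standard_normal((2 * N, M)) + 1j * rng.standard_normal((2 * N, M))) / np.sqrt(2)
    if not signal_present:
        return stat(noise)
    A = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
    w = np.sqrt(snr * N / (2 * K)) * (H @ A)  # per-sample SNR = snr (our convention)
    return stat(np.vstack([w, w]) + noise)    # common signal on both sensors

t0 = np.array([trial(False) for _ in range(trials)])  # H0 statistics
t1 = np.array([trial(True) for _ in range(trials)])   # H1 statistics
thr = np.quantile(t0, 0.99)                 # empirical P_FA = 0.01 threshold
pd = np.mean(t1 > thr)                      # empirical probability of detection
```

Sweeping the threshold over the pooled H_0 statistics, rather than fixing it at the 0.99 quantile, traces out a full empirical ROC curve of the kind shown in Figs. 1 and 3.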

A. NUMERICAL RESULTS FOR THE KNOWN-CHANNELS CASE
The channel matrices g_ℓ H_ℓ are known, and the associated detectors involve coherent processing. The noise variances in each channel may be known, unknown but equal, or unknown and unequal. Therefore there are three detector statistics from Section V-A to be analyzed according to their ROC curves.
In Fig. 1 we show ROC curves for six detector statistics at a fixed input SNR of −10 dB. The simulations were done for data for which the noise variances on both channels were the same. The three solid curves represent the full detector statistics developed in Section V-A, namely (35), (41), and (44), whereas the three dashed curves represent noncoherent summation of detector statistics computed on individual sensors. In other words, the dashed curves are computed from versions of (35), (41), and (44) in which all terms have been dropped that involve cross-correlations of measurements from sensor 1 with measurements from sensor 2. See Table 3. This summing of locally computed detector statistics, without the use of cross-correlations, we refer to as diversity combining.
Fig. 2 shows the probability of detection as a function of input SNR, with the false alarm probability fixed at P_F = 0.01. These curves allow us to compare the full detectors (35), (41), and (44) to the corresponding detectors obtained by dropping the cross-validation terms. For the case when the noise powers are known, dropping the cross-validation term results in a 2 dB loss in performance; when noise powers are assumed equal but unknown, dropping the cross-validation term results in a 4 dB loss; and when noise powers are assumed unknown, dropping the cross-validation term results in a 3 dB loss.

B. NUMERICAL RESULTS FOR THE PARTIALLY-KNOWN-CHANNELS CASE
For the detectors of Section V-B, the basis mapping matrices H are known, but the complex gains g are unknown; this is the partially coherent case. The noise variances in each channel may be known, unknown but equal, or unknown. Therefore there are three detector statistics, namely (46), (47), and (49), to be analyzed according to their ROC curves.
In Fig. 3 we show ROC curves for six detector statistics at a fixed input SNR of −10 dB. The three solid curves represent the full detector statistics (46), (47), and (49), whereas the three dashed curves again correspond to dropping the cross-validation terms (see Table 3). Thus the dashed curves represent simple diversity combining of detector statistics computed on individual sensors. The comparison between solid curves and dashed curves quantifies the loss in detectability from using only diversity combining of local statistics.
Fig. 4 shows probability of detection plotted as a function of input SNR, with the false alarm probability fixed at P_F = 0.01. Again these curves allow us to compare the full detectors (46), (47), and (49) to the corresponding detectors obtained by dropping the cross-validation terms. For the case when the noise powers are known, dropping the cross-validation term results in about a 1 dB loss in performance; when noise powers are assumed equal but unknown, dropping the cross-validation term results in a 4 dB loss; and when noise powers are assumed unknown, dropping the cross-validation term results in a 2 dB loss. It is noteworthy that the drop in performance between each detector and its diversity-combining counterpart is smaller here than in the case of a known channel.

C. NUMERICAL RESULTS FOR THE "UNKNOWN"-CHANNELS CASE
For the detectors of Section V-C, neither the basis mapping matrices H nor the complex gains g are known; however, the dimension of the signal subspace is assumed known. This is the incoherent case. The noise variances in each channel may be known, unknown but equal, or unknown.
In Fig. 5 we show ROC curves for six detector statistics at a fixed input SNR of −10 dB. The three solid curves represent the detector statistics developed in Section V-C, namely (50), (51), and (55), while the three dashed curves represent diversity combining of locally computed detector statistics. The comparison between solid curves and dashed curves quantifies the loss in performance between the joint two-sensor detector statistics (solid curves) and the noncoherent combining of locally computed statistics (dashed curves). The ROC curves near the chance line indicate only that at this choice of SNR the likelihood ratio detector for an unknown channel has very low P_D when the noise variances on the two channels are unknown.
Fig. 6 shows the probability of detection for (50), (51), and (55) as a function of input SNR, with the false alarm probability fixed at P_F = 0.01. Again these curves allow us to compare the full detectors (50), (51), and (55) to the corresponding detectors obtained by dropping the cross-validation terms. When noise powers are assumed equal but unknown, dropping the cross-validation term results in a 3 dB loss. An important observation is that for the cases where the noise variance on each sensor is known, and where the noise variance is unknown and not assumed to be equal, there is no advantage to using our approximate full detector over the diversity-combining sum.
It is important to recall that these generalized likelihood ratio detectors are principled detectors with defensible invariances, but no claims to optimality.

D. CHANNEL COMPARISONS
The previous subsections have addressed detector performance under various assumptions on the channel models and noise models. In this subsection we fix a noise model and compare detector performance for various degrees of prior knowledge about the channels. Fig. 7 shows the probability of detection as a function of input SNR for the case where we assume the noise variance is known at each sensor. We can see that there is about 1 dB of loss between the detectors for known channels and partially-known channels, and then about 3 dB of loss between partially-known channels and unknown ones. For these experiments, the number of samples in a collection period is N = 16, the number of collections is M = 64, and the dimension of the signal subspace is K = 8. These results are generated with a fixed probability of false alarm of P_F = 0.01. These plots include only the full detectors and not the versions without cross-validation.
Fig. 8 shows plots of the probability of detection as a function of input SNR for the case of unknown but assumed equal noise variances. We see about a 1 dB loss between known and partially-known channels, but now observe about a 6 dB loss between partially-known and unknown channels. For these experiments, the number of samples in a collection period is N = 16, the number of collections is M = 64, and the dimension of the signal subspace is K = 8. These results are generated with a fixed probability of false alarm of P_F = 0.01. These plots include only the full detectors and not the versions without cross-validation.
Fig. 9 shows plots of the probability of detection versus the input SNR when we assume that the noise variance is unknown at both sensors. In this case, we notice about a 4 dB loss between known and partially-known channels, and again about a 6 dB loss between partially-known and "unknown" channel cases. These results are generated with a fixed probability of false alarm of P_F = 0.01. These plots include only the full detectors and not the versions without cross-validation.

VII. CONCLUSION
In this article we have developed a general first-order statistical framework for deriving bistatic passive detectors of a radiated signal. The bistatic problem is to test the hypothesis that two sensors measure only noise versus the alternative that both sensors contain a delayed and Doppler-shifted version of a common radiated signal. Typically this radiation is electromagnetic or acoustic. The results generalize to more than two sensors.
Three general channel models are assumed: fully coherent, where a subspace model for the source signal is known and the complex channel gain to each sensor is known; partially coherent, where a subspace signal model is known but the channel gains are unknown; and incoherent, where only the dimension of the signal subspace is known and the channel gains are unknown. Within these three general classes, the additive white Gaussian noise at each sensor may be known, may be unknown but equal, or may be unknown and unequal. Thus this article treats nine different cases.
Many of the detector statistics derived in this article decompose into a sum of statistics of two types. The first type involves single-channel detector statistics computed using only local measurements at a sensor. The second type is a two-channel statistic that measures coherence between the synchronized signals at each of the two sensors. The exact form of this coherence statistic depends on the channel model and on what is known about the noise variances at each sensor. In each case, there is an intuitive interpretation of the resulting detector.
The ROC curves and P_D vs SNR curves generated from numerical experiments quantify the importance of the cross-validation terms in the various detectors. As expected, their importance is high for known channels, medium for partially-known channels, and low for "unknown" channels. These conclusions are drawn for the parameter choices made in the simulations. For some choices of parameters in the radiated signal model and channel noise variances, it may turn out that diversity combining without cross-validation performs about as well as, or better than, diversity combining plus cross-validation.
The degradation of detector performance for unknown noise variances is significant, suggesting that secondary data should be used to estimate noise variance, and perhaps even noise covariance, for use in adaptive versions of these detectors. In first-order models for measurements it is not uncommon to find that ML problems are ill-posed when the unknown covariance matrix is constrained only to be nonnegative definite. However, with secondary data, the problems are well-posed. To accommodate these more general noise models, we continue to explore the extension of single-channel adaptive filters to the multichannel problems of this article, following the single-channel framework of [16], which extends the work of [17], [18], [19], [20].
The detector statistics of this article may be used in any passive imaging problem (e.g., seismology, cosmology, vibration analysis). However, our motivation is passive bistatic radar/sonar. These detectors may be adapted to bistatic synthetic-aperture approaches to source localization by identifying the multiple snapshots of this article with measurements made from a sequence of sensor locations. In this case snapshots are identified with slow time, and the known sensor locations at a sequence of slow times are used to compute phasings. These phasings are used for virtual beamforming, as is usual in the formation of an image from a synthetic aperture. This point will be elaborated in a separate paper.
In Table 3, for each of the nine detectors of Table 2, we give the approximate GLRs that set the cross-validation terms to zero. These detector statistics are called diversity-combining detectors because they combine only locally computed detector statistics. No cross-correlations between measurements at the two sensors are computed or used in the detector statistics.
for symbols a modulating the basis H, or it may be a bandlimited spectral representation of a transmitted signal on a compactly supported observation interval. If a free-space channel is replaced by a more general linear passband channel, then the baseband signal model is gTHa. The matrix T is a baseband representation of the passband linear channel, and gTH is a composition of the source basis H and the channel transformation T. The matrix gTH may be different for each sensor. A model g_ℓ T_ℓ H accounts for this effect, so that g_ℓ H_ℓ = g_ℓ T_ℓ H may differ from g_k H_k = g_k T_k H. When the basis H and the channel transformations T_ℓ are known, then H_ℓ is known and g_ℓ H_ℓ may be known or partially known. If the T_ℓ, ℓ = 1, . . ., L, are unknown and unconstrained, then g_ℓ H_ℓ is unknown and unconstrained. So, to assume the T_ℓ unknown is to assume the subspaces H_ℓ, ℓ = 1, . . ., L, to be known only by their dimension.

B. THE PARAMETERS K, M, AND N
Consider the ℓth channel, for which the measurement model is z_ℓ[m] = g_ℓ H_ℓ a[m] + u_ℓ[m]. This model arises naturally when the N measurements are made in an array of N colocated sensors, in which case the set of measurements z_n[m], n = 1, . . ., N, is said to be the mth array snapshot, and M such snapshots are typically organized into the N × M space-time data matrix Z. It is common to assume the signal component of each snapshot shares the same complex gain and subspace basis if the EM or acoustic field is stationary in time and space. But what if the sensor is a single sensor, or the complex scalar output of a beamformer in a sensor array? Suppress the subscript ℓ and write the model as z[m] = gHa[m] + u[m] ∈ C^N. In this case, assume the measurement is a long sequence of length P. Segment this sequence into M short sequences z[m] ∈ C^N, with MN = P. Further assume that the signal component of each of these short sequences may be modeled as x[m] = gHa[m], with H ∈ C^(N×K). Then the N × M measurement matrix Z = [z[1], . . ., z[M]] = gHA + U, where A = [a[1], . . ., a[M]].
Why not simply model the original long sequence as z = Ha + u, with H ∈ C^(MN×K)? In this case there is a single snapshot of measurement dimension MN = P. One might argue for either of these models, but to model a long transmitted sequence as a subspace signal of subspace dimension K requires more modeling confidence than to model short segments as subspace signals of subspace dimension K. The essential point is that with MN fixed, one has the flexibility to choose M, N, and K for subspace modeling. There are special cases. Choose M = 1, in which case N = P. Then H ∈ C^(N×K) and A = a ∈ C^K. The a are source parameters for the transmission Ha in the basis H. If K = N, then H ∈ C^(N×N) and A = a ∈ C^N. It is then more natural to think of Ha as a linear transformation of an unknown transmitted signal a by a channel H. If the channel is free of filtering, then H = I_N and the model is z = ga + u ∈ C^N.
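The segmentation of a long measurement sequence of length P into M snapshots of length N can be sketched as follows; the deterministic stand-in sequence and the particular choice P = 1024 are only for illustration of the bookkeeping.

```python
import numpy as np

P = 1024                                  # length of the long measurement sequence
z_long = np.arange(P, dtype=complex)      # stand-in for a measured complex sequence

# Segment into M short snapshots of length N, with M * N = P
M, N = 64, 16
assert M * N == P

# N x M data matrix: column m is the mth snapshot z[m]
Z = z_long.reshape(M, N).T

# Special case M = 1: a single snapshot of dimension N = P
Z_single = z_long.reshape(1, P).T         # P x 1 column vector
```

With MN fixed, varying M and N trades snapshot dimension against snapshot count, which is exactly the modeling flexibility discussed above.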

FIGURE 1 .
FIGURE 1. P_D versus P_F for a fully-known channel and three different assumptions about the noise variances. These detectors, namely (35), (41), and (44), use the cross-validation terms (solid curves) or they use only the local terms (dotted curves). Input SNR is fixed at −10 dB.

FIGURE 2 .
FIGURE 2. P_D versus input SNR for the fully-known channel and three different assumptions about the noise variances. These detectors, namely (35), (41), and (44), use the cross-validation term (solid curve) or they use only the local terms (dotted curves). The probability of false alarm is set at P_F = 0.01.

FIGURE 3 .
FIGURE 3. P_D versus P_F for a partially-known channel and three different assumptions about the noise variances. These detectors, namely (46), (47), and (49), use the cross-validation terms (solid curves) or they use only the local terms (dotted curves). Input SNR is fixed at −10 dB.

FIGURE 4 .
FIGURE 4. P_D versus input SNR for the partially-known channel and three different assumptions about the noise variances. These detectors, namely (46), (47), and (49), use the cross-validation term (solid curve) or they use only the local terms (dotted curves). The probability of false alarm is set at P_F = 0.01.

FIGURE 5 .
FIGURE 5. P_D versus P_F for an unknown channel operating on a signal subspace of known dimension, and three different assumptions about the noise variances. These detectors, namely (50), (51), and (55), use the cross-validation term (solid curve) or they use only the local terms (dotted curves). Input SNR is fixed at −10 dB.

FIGURE 6 .
FIGURE 6. P_D versus input SNR for an unknown channel operating on a signal subspace of known dimension and three different assumptions about the noise variances. These detectors, namely (50), (51), and (55), use the cross-validation term (solid curve) or they use only the local terms (dotted curves). The probability of false alarm is set at P_F = 0.01.

FIGURE 7 .
FIGURE 7. P_D vs SNR for three different channel models when the noise variances are known. At P_F = 0.01, there is about 1 dB of loss between the detectors for known channels and partially-known channels, and then about 3 dB of loss between partially-known channels and channels that are unknown except for their subspace dimension.

FIGURE 8 .
FIGURE 8. P_D vs SNR for three different channel models when the noise variances are unknown but assumed equal. At P_F = 0.01, there is about a 1 dB loss between known and partially-known channels, and about a 6 dB loss between partially-known channels and channels that are unknown except for their subspace dimension.

FIGURE 9 .
FIGURE 9. P_D vs SNR for three different channel models when the noise variances are unknown. At P_F = 0.01, there is about a 4 dB loss between known and partially-known channels, and about a 6 dB loss between partially-known channels and channels that are unknown except for their subspace dimension.

Table of Detector Statistics
where we have neglected additive constants. Each of the first two terms is a monotone function of a local scale-invariant matched subspace detector, tr[H_ℓ^H Z_ℓ Z_ℓ^H H_ℓ]/tr[Z_ℓ Z_ℓ^H]. The third term is a cross-validation term that depends jointly on Z_1 and Z_2. The log-likelihood ratio statistic is invariant to the group of transformations that transform Z as diag[a_1 I_N, a_2 I_N] Z Q_M, where a_1 and a_2 are arbitrary complex constants that independently scale Z_1 and Z_2, and Q_M is unitary. Consequently the detector statistic is CFAR with respect to noise variances. If no communication between sensors is possible, then only the first two terms are used, in what might be called a diversity combining of detector statistics. More generally, the GLR detector statistic is a linear combination of local and global terms.
Special Cases: When there is a single snapshot, a trace term like tr[H_1^H Z_1 Z_2^H H_2] is written as z_2^H H_2 H_1^H z_1 in all terms of the likelihood ratio. When K = N and H_1 = H_2 = I, such a trace term reduces to z_2^H z_1.
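The scale invariance of the local matched subspace statistic above is easy to check numerically. The sketch below computes tr[H^H Z Z^H H]/tr[Z Z^H] for one channel and verifies that it is unchanged when Z is scaled by an arbitrary complex constant and right-multiplied by a unitary matrix; the random basis and data are illustrative stand-ins.

```python
import numpy as np

def local_msd(Z, H):
    """Scale-invariant matched subspace statistic tr[H^H Z Z^H H] / tr[Z Z^H]."""
    num = np.trace(H.conj().T @ Z @ Z.conj().T @ H).real
    den = np.trace(Z @ Z.conj().T).real
    return num / den

rng = np.random.default_rng(2)
N, M, K = 16, 64, 8
# Orthonormal basis H (N x K) and arbitrary complex data Z (N x M)
H, _ = np.linalg.qr(rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))
Z = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

# Transformation group: scaling by a complex constant a and a unitary Q_M on the right
a = 3.0 - 2.0j
Q, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
assert np.isclose(local_msd(Z, H), local_msd(a * Z @ Q, H))
```

Because both trace terms scale by |a|^2 and Q Q^H = I, the ratio is unchanged, which is the mechanism behind the CFAR property with respect to the per-channel noise variance.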