Latent Parameter Estimation in Fusion Networks Using Separable Likelihoods

Multi-sensor state space models underpin fusion applications in networks of sensors. Estimation of latent parameters in these models has the potential to provide highly desirable capabilities such as network self-calibration. Conventional solutions to the problem pose difficulties in scaling with the number of sensors due to the joint multi-sensor filtering involved when evaluating the parameter likelihood. In this article, we propose a separable pseudo-likelihood which is a more accurate approximation compared to a previously proposed alternative under typical operating conditions. In addition, we consider using separable likelihoods in the presence of many objects and ambiguity in associating measurements with objects that originated them. To this end, we use a state space model with a hypothesis based parameterisation, and, develop an empirical Bayesian perspective in order to evaluate separable likelihoods on this model using local filtering. Bayesian inference with this likelihood is carried out using belief propagation on the associated pairwise Markov random field. We specify a particle algorithm for latent parameter estimation in a linear Gaussian state space model and demonstrate its efficacy for network self-calibration using measurements from non-cooperative targets in comparison with alternatives.

Abstract-Multi-sensor state space models underpin fusion applications in networks of sensors. Estimation of latent parameters in these models has the potential to provide highly desirable capabilities such as network self-calibration. Conventional solutions to the problem pose difficulties in scaling with the number of sensors due to the joint multi-sensor filtering involved when evaluating the parameter likelihood. In this article, we propose a separable pseudo-likelihood which is a more accurate approximation compared to a previously proposed alternative under typical operating conditions. In addition, we consider using separable likelihoods in the presence of many objects and ambiguity in associating measurements with objects that originated them. To this end, we use a state space model with a hypothesis based parameterisation, and, develop an empirical Bayesian perspective in order to evaluate separable likelihoods on this model using local filtering. Bayesian inference with this likelihood is carried out using belief propagation on the associated pairwise Markov random field. We specify a particle algorithm for latent parameter estimation in a linear Gaussian state space model and demonstrate its efficacy for network self-calibration using measurements from non-cooperative targets in comparison with alternatives.
Index Terms-sensor networks, hidden Markov models, Markov random fields, pseudo-likelihood, simultaneous localisation and tracking, Monte Carlo algorithms, dynamical Markov random fields

I. INTRODUCTION
A wide range of sensing applications including wide area surveillance is underpinned by state space models which are capable of representing a variety of dynamic phenomena such as spatio-temporal (see, e.g., [1]) processes. In fusion (or, object tracking [2]) networks, multi-sensor versions of stochastic state space models, also known as hidden Markov models [3], are used to estimate object trajectories in a surveillance region.
These models, however, are often specified by some latent parameters [4] some of which are unknown in practice and need to be estimated based on measurements from the state processes (or, objects). Examples of this problem setting in fusion networks include estimation of noise parameters [5], sensor biases [6], [7] and localisation/calibration of sensors in a GPS denying environment (e.g., in underwater sensing [8]) using point detections from non-cooperative targets [9], [10]. Another example is the estimation of the orientations and positions of nodes in a camera network based on feature detections [11].
Such problems fall in the domain of parameter estimation in state space models (see, e.g., [12] for a review). The parameter likelihood of the multi-sensor problem, however, does not scale well with the number of sensors which specifies the dimensionality of the unknown, or, the length of the measurement window that will be used for estimation. In the presence of multiple objects, the scalability issue is exacerbated by the measurement origin (or, data association) uncertainties that arise. Exact evaluation of the likelihood in this case has combinatorial complexity with the number of sensors [13], and, in general multiple object models, it is intractable even for a single sensor [14]. Estimation using a maximum likelihood (ML) or a Bayesian approach requires repeated evaluation of this likelihood (see, e.g., [12], [15], [16]) necessitating the use of efficient approximation strategies.
Intractable or computationally prohibitive likelihoods have motivated a number of lines of work in the statistics literature including likelihood free methods, or, approximate Bayesian computation [17], and, composite likelihood/pseudolikelihood approaches [18]. Likelihood free methods can be used for sampling from the parameter posterior in state space models [19] including those capable of modelling multiple objects [20]. The latter approach is based on developing surrogates to replace the original likelihood, e.g., block based approximations in maximum likelihood [21]. The pseudolikelihood perspective has been useful in networked settings in which constraints on i) the availability of parts of data, and/or, ii) scalability in processing with the number of sources arise. Examples include surrogates built upon local functions for estimation of parametric probability measures (e.g., exponential family distributions) from distributedly stored high dimensional samples [22]- [24].
It is not straightforward to find such pseudo-likelihoods for parameter estimation in state space models, however, that can resolve these two issues that arise when there are multiple data sources (or, sensors). It is worthwhile to develop and analyse surrogates that provide scalability with the number of sources, and, are suitable to local computations (e.g., local filtering). In [25], we proposed a pseudo-likelihood which is a product of "dual-term" approximations replacing their intractable exact counterparts. These approximations are separable in that they can be evaluated using single sensor filtering. This feature underpins scalability with the number of sensors. In [26], we have investigated the quality of the dual-term approximation, and, related it to the level of uncertainty in the prediction and estimation of the underlying state process.
In this work, we propose an alternative pseudo-likelihood which is provably a more accurate approximation for parameter estimation in multi-sensor state space models, under typical operating conditions. This approximation is also separable in that it is a scaled product of quadruple terms each of which can be found using single sensor filtering. In order to exploit this quad-term likelihood when there are multiple objects, extra attention should be paid to the handling of the data association uncertainties. We propose to use a hypothesis based parameterisation for the multi-object state space model as detailed in [14]& [27] in order to facilitate the use of the quad-term surrogate in this setting. In the parameterised model, we explicitly point out the combinatorial complexity of exact likelihood evaluation with the number of sensors. Then, we introduce an empirical Bayesian [28] interpretation of local filtering that facilitates the use of separable likelihoods within this model. These modelling aspects detailing the use of separable likelihoods in hypothesis based multi-object models constitute the second contribution of this work.
Separable likelihoods fit well in distributed fusion archictectures in which locally filtered distributions are transmitted in the network, as opposed to sensor measurements [29]. Moreover, they facilitate parameter estimation using a message passing computational structure which is desirable in networked problems. Specifically, the proposed likelihood surrogate together with independent parameter priors lead to a pairwise Markov random field (MRF) posterior model. The marginal distributions of this model approximates posterior marginals of the latent parameters to be estimated. We estimate these marginals iteratively using Belief Propagation (BP) [30] which consists of successive message passings among neighbouring nodes and updating of local marginals based on these messages. This computational structure lends itself to decentralised estimation, as well as scalable computation at fusion centre.
As an indication of the approximation quality, we consider the Kullback-Leibler divergence (KLD) [31] of the quad-term likelihood with respect to the actual pairwise likelihood and relate it to the uncertainties in predicting and estimating the underlying state using individual and joint sensor histories. We show that with more accurate local filters the approximation quality improves and the proposed quad-term separable likelihood has an improved error bound compared to the aforementioned dual-term approximation.
We provide a Monte Carlo algorithm for sensor selfcalibration in this framework for linear Gaussian state space (LGSS) models. The algorithm is based on the nonparametric BP approach [32] and involves sampling from the updated marginals followed by quad-term likelihood evaluations in the message passing stage. As BP iterations converge to a fixed point, the empirical average of the samples from the marginals constitute (an approximate) minimum mean squared error (MMSE) estimate of the latent parameters. The edge potential are evaluated using the entire measurement history within a selected time period in an offline fashion which is a strategy similar to particle Markov chain Monte Carlo (MC) algorithms [15]. As such, we differ from [26] in which windowing of measurements are used for enabling online processing.
Preliminary results of the proposed pseudo-likelihood can be found in [33]. This article provides a complete account of our solution strategy in multiple object models and is structured as follows: Section II provides the probabilistic model and the problem statement. Then, we detail pairwise pseudolikelihoods in parameterised multi-object models, and, relate this perspective to latent parameter estimation via inference over pairwise MRFs, in Section III. The proposed quad-term node-wise separable likelihood approximation is detailed in Section IV. Section V details the structural and computational properties of the quad-term approximation when the unknowns are respective quantities. Based on these results, we propose a distributed sensor localisation algorithm in linear Gaussian multi-object state space models in Section VI. The efficacy of this algorithm is demonstrated in comparison to the approach in [26], in Section VII. Finally, we conclude in Section VIII.

A. Probabilistic model
Let us consider a set of sensors V " t1, . . . , N u networked over communication links listed by E Ă VˆV. The graph G " pV, Eq is undirected (i.e., the links are bi-directional), connected, and, might contain cycles.
Next, let us consider a single object with state evolution modelled as a Markov process X k for time index k ě 1. This process is specified by an initial state distribution and a transition density. The state space model with parameters θ is then specified as follows [12]: The state value x k is a point in the state space X and is generated by the chain X k |pX 1:k´1 " x 1:k´1 q " πpx k |x k´1 ; θq, where .|. denotes conditioning. A measured value z i k P Z i at sensor i P V is generated independently in accordance with the likelihood model where subscript 1 : k indicates a vector concatenation over time.
In fusion scenarios, there are multiple such objects denoted by a multi-object state that induce measurements according to the above state space model resulting with sensors collecting a multitude of mea- where M k is the number of objects and O i k is the number of measurements collected at sensor i at time k. Here, the origin of Z i k,j s are unknown, i.e., the data associations which encode a mapping from these measurement (random) variables to the elements of X k (and, equivalently to the previously collected measurements from the same objects) are not known [2].
In the general multi-object tracking model, M k and O i k are random variables with laws determined by probability models regarding how these objects appear in the surveillance region and disappear (which is often referred to as their birth and death, respectively), the law for the false alarms, etc. For the sake of simplicity and ease of presentation in the limited space especially when relating computational complexity to the number of sensors and objects in the following discussion, we assume that all of the objects that exist at time step k " 1 remain in the scene for the time window considered and there are no missed detections and false alarms in sensor measurements which imply that M k " M and O i k " M k , respectively, for some positive integer M with probability one.
In this simplified "closed world" model the multi-object state transition is given by The likelihood of the measurements collected by sensor i is conditioned not only on the multi-objet state X k , but, also on a (data association) hypothesis τ i k that encodes the association of measurements to the objects within X k : and, the prior on τ i k assigns equal probability to all M ! permutations of r1, . . . , M s that τ i k can take, i.e.,

B. Statement of the problem
We are interested in estimating θ P B using the measurements collected across the network by sensors i P V for a time window of length t. The parameter likelihood of the problem quantifies how well these measurements fit into the state space model with the selected value of the parameter, and, is evaluated via multi-sensor filtering [4, Sec.IV]: where the time updates on the right hand side are given by ppX k |Z 1 1:k´1 , . . . , Z N 1:k´1 ; θqdX k , (9) and, the multi-sensor likelihood inside the integration factorises as where the terms in the product are given by (6).
Here, (8) follows from the chain rule of probabilities. The term in (9) is the contribution at time step k which updates the likelihood of the previous time step and is found using the Markov property that the sensor measurements are mutually independent of the measurement histories, conditioned on the current state for any value of θ. Let us denote this relation by Z j k K KZ j 1:k´1 |X k , θ for i P V (see, e.g., [34], for this notation). (10) follows from that the measurements of different sensors are mutually independent, i.e., Z i k K KZ j k |X k , θ for pi, jq P VˆV.
This likelihood can be used in a MMSE estimator of θ P B, in principle, for a random variable Θ associated with a prior density ppθq. This estimate is given by the expected value of the posterior distribution ppθ|Z 1 1:t , . . . , Z N 1:t q 9 lpZ 1 1:t , . . . , Z N 1:t |θq ppθq, θ " The MSSE estimate can be computed by generating L samples from the posterior distribution in (11) using, for example, Markov Chain Monte Carlo (MCMC) methods [15] and using these samples to find a Monte Carlo estimate of the integral in (12). In both this approach and maximum likelihood (ML) solutions aiming to maximise (8) with iterative optimisation, repeated evaluations of the likelihood are required. The evaluation of this likelihood is intractable, however, not only because of the pM !q N summations in (9), but, also because of the complexity in finding the integrations involved. The integrands here are i) the multi-sensor likelihood in (10), and, ii) the prediction density for X k based on the network's entire measurement history up to time k. In other words, (9) is the scale factor for the posterior density of Bayesian recursions, or, the "centralised" filter given by ppX k , τ 1:N k |Z where we denote by τ 1:N k the concatenation of τ i k s and it has pM !q N different configurations. Here, both the prediction (14) and update (13) are OppM !q N q.
In order to address these challenges, multi-object filtering (or, tracking) algorithms often employ two approximations: First, they aim to find the most probable data association hypothesis in (13) denoted byτ 1:N k , instead of both evaluating this expression for all possible associations and storing them. The benefits of doing so are that i) one can generate tracks (or, object trajectories) as simply marginals of ppX k , τ 1:N k " τ 1:N k | . q for k " 1, ..., t, and, ii) evaluations of the integral in (14) in the next time step can be restricted to this value of the hypothesis variable. Equivalently, the posterior distribution in (13) is factorised as (15) where the association variables appear as model parameters in the first term and the second term is similar to a prior distribution on these models with the difference that it is conditioned on the current measurements. At time k´1, let us select this "empirical prior" as where δ is Kronecker's delta function and Ð denotes assignment. The second approximation follows from the first one: Evaluation of the prediction stage in (14) reduces to evaluation of the Chapman-Kolmogorov equation for only the most likely value of the association parameters. This approach is often referred to as empirical Bayes [28], and is used to facilitate approximate solutions to otherwise intractable problems. A similar approximation can be used when evaluating the parameter posterior in (11). This leads to the following likelihood conditioned on τ 1:N 1:t where the factors are defined by This likelihood evaluated at τ 1:N 1:k "τ 1:N 1:k replaces the one in (8) when the emprical (model) prior is selected as in (16) (see Appendix A for details). We will refer to (17) as the empirical likelihood. Note that (18) is the integral term in (9).
The empirical likelihood update term is computationally more convenient, however, alone it is not sufficient for scalability with the number of sensors N : Findingτ 1:N k is equivalently an N`1-dimensional assignment problem which is NP hard even for N " 2 sensors [35] which partly underlies the local filtering paradigm for multi-sensor processing and our interest in compatible solutions. For a moment, let us consider the problem for a single object, i.e., for M " 1. In this case τ i k for i " 1, . . . , N have only one possible configuration (i.e., there is no data association uncertainty). Because the dimensionality of θ is specified by N and (8) will be evaluated for roughly N L samples (when estimating (12) (see, e.g., [36])) each of which costing -in the simplest linear Gaussian measurements case (9) 1 -at the least OpN 2 tq, the computational cost will be cubic in the number of sensors which can easily become prohibitive for large N .
The networked setting has additional constraints to take into account: The sensors perform local filtering of their measurements and exchange filtered (track) distributions over G as opposed to transmitting their measurements [29]. As a result, the network-wide measurements are not available to evaluate the likelihood of the problem. Instead, local distributions we denote by ppX k , τ j k "τ j k |Z j 1:k q are made available to neighbouring nodes whereτ j k is an approximation to the most probable association configurationτ j k found locally, based on only the local sensor measurements at sensor j. There are computationally efficient algorithms for finding such solutions for the single sensor problem (see, e.g., [35] and the reference therein). Therefore, a viable solution needs to build upon these densities and local data associationsτ i k as opposed to joint multi-sensor filtering in the network.
The problem we address in this work is the design of scalable approximations to (8) for estimating θ in a networked setting based on local filtering results at the nodes. The proposed approach also addresses the aforementioned computational bottleneck at fusion centres in centralised multisensor architectures with a designated node receiving unfiltered sensor measurements.
It is also worth noting that the parameter vector θ P B can be used to represent a wide variety of parameters of the global model some of which can be intrinsic to sensors i P V individually such as parameters pertaining to local noise models. We are particularly interested in a second class of parameters which have dependencies among sensors such as respective parameters such as sensor locations and similar "calibration" parameters. In the former setting, the estimation of local parameters decouple into independent estimation problems which can be solved using a suitable approach (see, e.g., [27], [37], [38]).
In our setting, θ fi rθ 1 , . . . , θ N s where θ i is associated with i P V and its estimation does not decouple and depends on all measurements across the network due to the dependencies of parameters, which, in turn, brings forward the multi-sensor aspects of the problem this work aims to address. Because local filtering is performed, on the other hand, local estimation of data associationτ i k is available independent of θ, which we discuss in detail later in Section V.

III. A PSEUDO-LIKELIHOOD AND A PAIRWISE MRF POSTERIOR FOR DECENTRALISED ESTIMATION
Pseudo-likelihoods are constructed from likelihood like functions which are computationally convenient and defined typically over smaller subsets of the data to overcome difficulties posed by intractable likelihoods over the entire set of data (see, e.g., [18] and the reference therein). Let us denote the network-wide data set by Z fi rZ 1 1:t , . . . , Z N 1:t s. A fairly general form for a pseudo-likelihood is given by [18] where S is an index set, ω s is a positive real number, and, Z ds and Z cs are mutually exlusive subsets of Z (for example, Z d1 " Z i 2 and Z c1 " Z j 1 , etc.). These sets can be selected in various ways ensuring that the factors are computationally convenient functions, for example, marginals and/or conditional densities, and, estimates based onlpZ|θq are sensible. Note that (8) is also in this form, however, with difficult to evaluate factors.
Let us consider θ " rθ 1 , ..., θ N s and a pseudo-likelihood surrogate for the empirical likelihood in (17): " ź pi,jqPE Here, E is essentially the set of sensor pairs whose likelihoods we would like to incorporate into the pseudo-likelihood, and, it is convenient to choose them as those that share a communication link, in a networked setting (Section II-A). The pairwise structure above is beneficial to use with the MMSE estimator in (12). Note that the MMSE estimate is the concatenation of the expected values of posterior marginals, i.e., ppθ i |Zq for i " 1, . . . , N . These distributions can be found using message passing algorithms over G when the surrogate (20) is used in (11) together with independent but arbitrary a priori distributions selected for Θ i s. Specifically, the parameter posterior corresponding to such a selection of prioirs and the pseudo-likelihood (20) is a pairwise Markov random field over G " pV, Eq [34]: where the node potential functions (i.e., ψ i s) are the selected priors (e.g., uniform distributions over bounded sets θ i s take values from) and the edge potentials (i.e., ψ ij s) are the pairwise likelihoods for the pairs pi, jqs. This model is illustrated in Fig. 1. The pairwise MRF model in (22) allows the computation of posterior marginal ppθ i |Zq through iterative local message passings such as Belief Propagation (BP) [30]. In BP, the nodes maintain distributions over their local variables and update them based on messages from their neighbours which summarise the information neighbours have gained on these variables. This is described for all i P V by In BP iterations, nodes simultaneously send messages to their neighbours using (24) (often using constants as the previously received messages during the first step) and update their local "belief" using (25). If G contains no cycles (i.e., G is a tree),p i s are guaranteed to converge to the marginals of (22), in a finite number of steps [30]. For the case in which G contains cycles, iterations of (24) and (25) are known as loopy BP (LBP). For the case, convergence does not have general guarantees, nevertheless LBP has been been very successful in computing approximate marginals in a distributed fashion, in fusion, self-localisation and tracking problems in sensor networks [39]- [41]. In our problem setting, we assume that the models over spanning trees of a loopy G are consistent in that they lead to "similar" marginal parameter distributions, which suggests the existence of LBP fixed points [42] that will be converged when initial beliefs are selected reasonably [43].

IV. QUAD-TERM NODE-WISE SEPARABLE LIKELIHOODS
The pseudo-likelihood introduced in Section III leads to a parameter posterior that admits a pairwise MRF model. This is advantageous in providing a means for decentralised estimation through message passing algorithms in a network. The edge potentials (23) of this model, however, are i) conditioned jointly on two sensors' measurements simultaneous access to which is infeasible in a networked setting, and, ii) conditioned on association variables for two sensors and ideally should be evaluated at its most probable configurationτ i,j 1 , . . . ,τ i,j t each of which is NP-hard to find, as explained in Section II.
In order to overcome these difficulties, we introduce an approximation which factorises into terms local to nodes, i.e., a node-wise separable approximation. Let us consider the "centralised" pairwise likelihood update term in (21) given some configuration τ i,j k for k " 1, . . . , t, and, drop them from the subscript for the sake of simplicity in notation, as well as the i, j subscript in θ, in the following discussion. This term factorises in alternative ways as follows: In the first and second lines above, the chain rule is used. The third equality can be found by taking the geometric mean of the first two expressions. All four factors in Eq.(28) are conditioned on the measurement histories of both sensors to which one cannot have simulatenous access in a networked setting. We would like to aviod this by leaving out the history of sensor i (sensor j) in the first two (last two) terms of (28), i.e., where κ k pθq is the normalisation constant that guarantees q to integrate to unity. Note that κ k is a function of the parameters θ.
The appeal of this quadruple term is that its factors depend on single sensor histories. As such, they require filtering of sensor histories of i and j individually enabling the evaluation of their product in a network. This point is discussed later in this section.

A. Approximation quality
We consider the difference between the original centralised update term in (28) and the quad-term approximation introduced in (29). Because these terms are probability densities over sensor measurements, their "divergence" can be quantified using the KLD [31]: Proposition 4.1: The KLD between the centralised update and the node-wise separable approximation in (29) is bounded by the average of the mutual information (MI) [31] between the current measurement pair and a single sensor's history conditioned on the history of the other sensor, i.e., The proof can be found in Appendix B 2 . The upper bound in (31) measures the departure of the current pair of measurements, and, one of the sensor histories from a state of conditional independence when they are conditioned on the history of the other sensor. Note that these variables, when conditioned on X k , are conditionally independent, i.e., pZ i k , Z j k qK KZ j 1:k´1 |X k , Θ holds and consequently Similarly, the average MI term on the right hand side of (31) is zero if pZ i k , Z j k qK KZ i 1:k´1 |Z j 1:k´1 , Θ and pZ i k , Z j k qK KZ j 1:k´1 |Z i 1:k´1 , Θ hold simultaneously. This condition is satisfied, for example, in the case that either of the measurement histories Z i 1:k´1 and Z j 1:k´1 are sufficient statistics for X k (i.e., it can be predicted by both sensors with probability one). This level of accuracy should not be expected as the transition density of state space models introduce some uncertainty. Therefore, it is instructive to relate the KLD in (31) further to the uncertainty on X k given the sensor histories: Corollary 4.2: The KLD considered in Proposition 4.1 is upper bounded by the weighted sum of uncertainty reductions in the local target prediction and posterior distributions achieved when the other sensor's history is included jointly: where H denotes the Shannon differential entropy [31]. The proof is provided in Appendix C. Corollary 4.2 relates the approximation quality of the quad-term node-wise separable updates to the uncertainties in the target state prediction and posterior distributions when individual node histories and their combinations are considered. The difference terms on the RHS of (32) quantify the difference in uncertainty between estimating the target state X k using only the local measurements, and, also taking into account the other sensor's measurements.
Overall, a better quality of approximation should be expected when the local filtering densities involved concentrate around a single point in the state space.

B. The quad-term pairwise likelihood
The quad-term update in (29) leads to a separable approximate likelihood given bỹ We refer to this term as the quad-term separable likelihood as it can also be expressed as a (scaled) product of four factors each of which are the products of the four factors of (29) over k. Let us define r k ij pZ i k , θq fi ppZ i k |Z j 1:k , θq, s k j pZ j k , θq fi ppZ j k |Z j 1:k´1 , θq. Then, the quad-term update in (33) is given by where the normalisation factor is given in (30), and, equivalently in terms of the four factors above as

Corollary 4.3:
The KLD between the parameter likelihood in (8) and the node-wise separable approximation in (33) is bounded by the terms on the right hand sides of (31) and (32) summed over k " 1, . . . , t as Proof. Eq. (35) can easily be found after expanding the KLD term explicitly and expressing the logarithm of products involved as sums over logarithms of the factors. Boundedness follows from non-negativity of KLDs and summing both sides of (31) and (32) over k " 1, ..., t. As a conclusion, when t is not large -e.g., on the order of tens which is typical in fusion applications-the proposed approximation can be used for parameter estimation via local filtering. Sensors that are more accurate in inferring the underlying state process result with a smaller KLD in (35), which in turn leads to a more favourable estimation performance.
One other approximation based on local filtering distributions was studied in [26] which has the following dual-term product form upZ i k , Z j k |Z i 1:k´1 , Z j 1:k´1 , θq fi ppZ i k |Z j 1:k´1 , θqppZ j k |Z i 1:k´1 , θq. (36) In Appendix D, we shown that Dpp||qq ă Dpp||uq, when sensors are equivalent. The entropy bound given in (32) is also smaller than that for the dual-term approximation. In other words, the quad-term approximation is more accurate compared to the dual-term approximation, under typical operating conditions.
The scaling factor of the dual term approximation is unity regardless of θ, on the other hand, admitting a significant amount of flexibility in the range of the distributions and likelihoods that can be accommodated in the state space model. For example, the dual-term pseudo-likelihood is used with random finite set variables (RFS) in [26], [44], which, in a sense, have the association variables marginalised out making it possible to avoid multi-dimensional assignment problems in the general multi-object tracking model. For RFS distributions, however, it is not straightforward to compute the scaling factor in (30) for the quad-term. In this article, we consider a parametric model instead, which is effectively configured through association variables.

PARAMETERS
The results presented so far are fairly general and do not depend on the nature of Θ. When Θ represents respective parameters such as calibration parameters, there are certain simplifications of the expressions involved which provide computational benefits. In particular, parameters such as respective location and bearing angles relate the local coordinate frames of the sensors which collect measurements in their local frame. The local filtering distributions are hence over the space of state vectors in the local frame. A point x k P X (Section II-A) is implicitly in the Earth coordinate frame (ECF), and, associated with its representation in the jth local frame rx k s j through a coordinate transform T with the following properties rx k s j " T px k ; θ j q, (37) rx k s i " T pT´1prx k s j ; θ j q; θ i q.
As an example, when x k is a location on the Cartesian plane, and, θ j is the position of sensor j, T is given by T pT´1prx k s j ; θ j q; θ i q " rx k s j`θj´θi .
For simplicity in notation, we will denote T pT´1p.; θ j q; θ i q by T θ p.q when semantics is clear from the context.
In this section, it is revealed how local filtering distributions are used in the quad-term update. When θ are respective quantities, these distributions become independent of θ because both the state and the measurement variables are in the same local coordinate frame. In other words, at sensor j holds for the filtering posterior, for any configuration of τ j k . In the prediction stage of filtering, the multi-object transition kernel in (5) also becomes independent of Θ, so, the Chapman-Kolmogorov equation for finding the prediction density at sensor j (together with the empirical Bayes selection of association priors during iterations as explained in Section II) becomes pprX k s j |Z j 1:k´1 q " ż πprX k s j |rX k´1 s j q pprX k´1 s j , τ j k´1 "τ j k´1 |Z j 1:k´1 qdrX k s j . (39) Note also that the entropy terms in (32) that are conditioned on a single sensor's measurements measure the uncertainty of the above densities. Consequently, they also become independent of Θ, i.e., HpX k |Z j 1:k´1 , Θq equals to HpX k |Z j 1:k´1 q for example, which highlights its relevance to the local prediction accuracy.

A. The quad-term update for calibration
Now, let us expand the quad-term time updates in (34), and, explicitly show the aforementioned simplifications. We start with s k j which is the scale factor of the local Bayesian filter at sensor j: where in the last line independence from θ is asserted. Because s k j does not depend on θ, we denote s k j pZ j k , θq by s k j pZ j k q in the rest of the article.
Next, let us consider r k ij which has terms in different coordinate frames: In the third line above, the coordinate transformations are substituted explicitly. The last line follows from that the filtering distribution (38) is in the jth local frame.

B. Evaluation of the quad-term update based on single sensor filtering distributions
Here, we discuss the evaluation of the quad-term likelihood given local filtering distributions and single sensor association configurations which we denote for sensor j by pprX k s j , τ j k "τ j k |Z j 1:k q andτ j k , respectively, as explained in Section II-B.
Instead of considering evaluation for the most probable association hypothesisτ i,j k which is infeasible to find, we propose to use the local resultsτ i,j k fi pτ i k ,τ j k q as a reasonable approximation to this configuration and substitute them in (40)- (41). These approximations can be found regardless of θ as discussed earlier in this section by using one of the well studied algorithms in the literature [2] such as solving a 2´D association problem at each time step [35] to find τ j k . We detail this approach for a linear Gaussian state space model in Section VI. Given these local results and their exchange over the network, one can consider an in-network computation scheme for evaluating (40)- (41). Specifically, these terms (and, the other factors of the quad-term update which are obtained by replacing i and j in these expressions) can be found at the sensor platform where the measurements to be substituted for evaluation are stored, i.e., sensors j and i, respectively, for s k j and r k ij . Substitution of the measurement histories on the conditioning side will have been carried out by local filtering.
More explicitly, s k j in (40) (or, s k i ) becomes a product of similar terms whenτ k j is substituted in (40), and, its computation is carried out during the local filtering of sensor j's (or, sensor i's) measurements using where m 1 "τ j k poq and the density in the integral is the m 1 th marginal of the local prediction density, the mth of which is given by x k,m`2 , . . . , x k,M s ,τ j k "τ j k |Z j 1:k´1 q dx k,1 . . . dx k,m´1 dx k,m`1 . . . dx k,M .
The term r k ij in (41) (or, r k ji ) is also computed based on these local filtering distributions. The integration in the RHS of (41) implicitly assumes that the ordering of individual objects in the local multi-object vectors are the same. In a networked setting, however, this is not necessarily the case and the identities of the fields in the state vector may differ [45]. In order to tackle with this unknown correspondance, we introduce an additional permutation random variable γ k (see, e.g., [46]) for relating the fields of a multi-object vector X k as ordered locally at sensor i and j, such that the mth field of the state vector at sensor i refers to the same object in the γ k pmqth field of the state vector at sensor j. For example, rx k,m s i " T θ prx k,γ k pmq s j q.
Suppose that an estimateγ k of this quantity is provided. After substituting in (41) together withτ i,j k one obtains where m 1 "γ k pτ i k poqq and the density inside the integral is the m 1 th marginal of the filtering distribution local to sensor j. In Appendix E, we show that (43) replaces the likelihood for γ k when m 1 " γ k pτ i k poqq and an ML estimateγ k can be found in a way similar to solving the data association problem in local filtering, which is detailed later in Section VI-A.
Finally, the scale factor (30) is computed. This involves finding the measurement distributions in (30) using the prediction distribution in (39) for both sensors i and j together with their likelihoods. This leads to the following two decomposition: The first term in (30) is found as where ξpoq "τ j´1 k˝τ i k poq maps the oth measurement at sensor i to the corresponding one in sensor j, and, m 1 "τ i k poq in the second line.
The second term in (30) is found as where m 1 "γ k˝τ i k poq in the last line is the object that corresponds to the oth measurement at sensor i.
Consequently, the scale factor is found as using the densities found in (44) and (45). The expressions above describe the evaluation of the quadterm likelihood in terms of single sensor filtering distributions that can be obtained using any filtering algorithm with individual measurement histories. The use of the local filtering distributions provides scalability with the number of sensors for parameter estimation in the state space model in Section II.
These computations can be distributed in the pair pi, jq as follows: Both sensors i and j perform local filtering and exchange the resulting posterior densities at every step, as well as s i and s j , respectively, found using (42). Based on the received densities, sensors i and j evaluate r ij and r ji , respectively, using (43). As part of the filtering process, they realise the Chapman-Kolmogorov equation in (39) with their local posterior, as well as the remote posterior recently received. These densities are then used in (44), (45), and, (46) to compute the scale factor for the next step. The scale factor hence found in the previous step, therefore, is substituted in (34) together with the four terms computed. This is repeated for k " 1, . . . , t, and, the quad-term likelihood (33) is computed, as a result.

VI. A MONTE CARLO LBP ALGORITHM FOR SENSOR CALIBRATION IN LINEAR GAUSSIAN STATE SPACE MODELS
In this section, we consider a linear Gaussian state space (LGSS) model within the probabilistic graphical model in Fig. 1, and, specify an algorithm for estimation of θs that combines the quad-term calibration likelihood evaluation detailed in Section V with BP message passing on the resulting pairwise MRF model (Section III). This algorithm uses Monte Carlo methods for realising BP [32] and facilitates scalability by building upon single sensor filtering as required by the quad-term approximation. As a result, an efficient inference scheme over the model in Fig. 1 is achieved. The state space model we consider is specified by a linear state transition with process noise that is additive and Gaussian, and, linear measurements with independent Gaussian measurement noise, i.e., πpx k |x k´1 q " N px k ; Fx k´1 , Qq g j pz j k |x k ; θ j q " N pz j k ; H j rx k s j , R j q for j " 1, . . . , N , where N p.; µ, Pq is a multi-dimensional Gaussian density with mean vector µ and covariance matrix P.
Here, x k is the concatenation of position and velocity (on a 2´D Euclidean plane, without loss of generality). The matrices F and Q model motion with unknown acceleration (equivalently, manouevres), and, are selected as where I and 0 are the 2ˆ2 identity and zero matrices, respectively. ∆T is the time difference between consecutive steps. Q is positive definite and parameterised with σ 2 , and, 0 ă q 1 ă q 2 ă q 3 ă 1 specifying the magnitude of the uncertainty, and, contributions of higher order terms, respectively 3 . In the measurement model, R j is the measurement noise covariance, and, H j is the observation matrix which we assume forms an observable pair with F (e.g., H j " rI, 0s).

A. Local single sensor filtering
We now focus on filtering and provide explicit formulae that adopts the recursions in (13)-(18) for a single sensor in the LGSS model. We use the empirical Bayes approach explained in Section II-B for scaling with time under data association uncertainties. This approach corresponds to the single frame data association solution in multi-object tracking [35].
First, let us consider the prediction stage at time k in which we are given a filtering density evaluated at the most likely data association hypothesisτ j k´1 of the previous step 4 . The latter is a product of its marginals ppX k´1 , τ j k´1 "τ j k´1 |Z j 1:k´1 q " M ź m"1 p m px k´1,m , τ j k´1 "τ j k´1 |Z j 1:k´1 q where p m px k´1,m , τ j k´1 "τ j k´1 |Z j 1:k´1 q " ppx k´1,m |z j,m 1:k´1 q. Here, z i,m 1:k´1 denotes the measurements induced by object m from step 1 to k´1, i.e., where ρ is the inverse of τ , i.e., ρ˝τ is the identity permutation. Consequently, the mth marginal of the posterior at k´1 is a Gaussian density that can be obtained equivalently by Kalman filtering [36] over z j,m 1:k´1 , i.e., p m px k´1,m |z j,m 1:k´1 q " N px k´1,m ;x j k´1,m , P j k´1,m q, (49) with the mean and covariance matrices over time specifying the mth "track." Hence, the prediction density in (39) (Section V-A) evaluated at τ j k´1 "τ j k´1 for the state transition in (47) is given by N px k,m ;x j k|k´1,m , P j k|k´1,m q (50) x j k|k´1,m " Fx j k´1,m , P j k|k´1,m " FP j k´1,m F T`Q , where the last two lines are Kalman prediction equations with p.q T denoting matrix transpose.
Next, let us consider the update stage in which we use the prediction density (50) with M measurements concatenated in Z j k and the measurement likelihood. This likelihood is found by substituting (48) in (6). First, we use this term within the likelihood for the association variable τ j k which -as it is mutually independent from τ j 1 , . . . , τ j k´1 -is given by The first line above follows from that the joint distribution of Z j 1:k´1 and τ j k is independent of the latter. The prior distribution for τ j k is non-informative as given in (7), so, the ML estimate using (51) coincides with the MAP estimate and it is given bŷ where S M is the set of M -permutations. An equivalent problem is found by taking the logarithm of the objective function in the combinatorial optimisation problem above as follows: for o, m " 1, . . . , M . This cost for the LGSS model is explicitly found using the prediction distribution (50) and the measurements within the KF innovations [36] as The optimisation in (53) is a 2´D assignment problem which can be solved in polynomial time with M (despite that the search space S M has a factorial size) using one of the well known solvers [47] including the auction algorithm [48]. This algorithm operates over a matrix of costs obtained by C " rcpo, mqs to iteratively find the M pairs corresponding to the best permutationτ j k in the ML problem (52). Here, com-putation of the M 2 cost matrix usually has the predominant computational time. Next, we consider the state distribution update (see (15) in Section II-B) and assert the empirical (model) prior in (16). As a result, the filtering density at k becomes a product of its marginals each of which is a Gaussian as in (49) found by the KF update [36], i.e., where, x k,m "x k|k´1,m`Kk,m pz j k,ρ j k pmq´H jxk|k´1,m q P k,m " pI´K k,m H j q P j k|k´1,m K k,m fi P j k|k´1,m H T j S j k´1 ,m . Note that, because the posterior density is now non-zero only for τ j k "τ j k , the marginalisation over τ j k (see, e.g., (14)) in the following prediction stage reduces to (39) and (50).

B. Evaluation of the calibration quad-term in the LGSS Model
Let us consider the evaluation of s k j given by (42) using the formulae for the LGSS model introduced in Section VI-A. By comparison with the cost term in (53) and (54), it can easily be seen that s k j is the exponential of the association cost for τ j k , i.e., Next, let us consider (43) for evaluating r k ij . Evaluation of this term involves finding the object identity correspondancê γ k by solving a 2-D assignment as explained in Appendix E. In the LGSS model, the assignment cost matrix D " rdpo, mqs is found as for o, m " 1, . . . , M . Here, the second order statistics P j k,m is also transformed by applying any rotations involved in T θ to its eigenvectors. The best assignment which here encodeŝ γ k is found using the auction algorithm [48] as well, similar to the assignment in the Bayesian filtering update (Sec. VI-A). Using this estimate, the quad-term factor is computed using In order to evaluate the other factors of the quad-term update, i.e., s i k and r k ji , similar computations are used. It suffices to replace i in the subscripts/superscripts of the expressions above with j, and, vice versa.
Finally, let us consider the scale factor in (46). Let us use the notation introduced in the previous section for expressing the densities inside the integration. Starting with (44), one obtains where m "τ i k´1 poq, and,ẑ i k,m and S i k,m are computed as in (54) with i substituted in place of j.
The second density in (46) conditioned on sensor j's history, i.e., (45), is similarly found as where m "γ k´1˝τ i k´1 poq and,ẑ j k,m and S j k,m are given in (54).
Using the densities above and integration rules for Gaussians, the oth term of the scale factor in (46) is found as 1 µ 1`Σ´1 2 µ 2˘( . (60) In a distributed setting, the scale factor expressions above are computed both at sensors i and j, for which the prediction stage given in (50) is carried out for both the local posterior and the posterior recevied from the other sensor at time k´1.

C. Sampling from the calibration marginals using nonparametric BP
In this section, we introduce particle based representations and Monte Carlo computations [49] for the realisation of (loopy) BP message passings. Note that Sections VI-A and VI-B specify the evaluation of edge potentials given in (33) for pτ i 1:t , τ j 1:t q " pτ i 1:t ,τ j 1:t q. For sampling from the marginal parameter posteriors, we adopt the approach detailed in [26, Sec.VI] for carrying out LBP belief update and messaging in (25) and (24), respectively. Given L equally weighted samples fromp i pθ i q, i.e., for l " 1, . . . , L, the edge potentials are evaluated to obtain Consider the BP message from node j to i in (24). Suppose that independent identically distributed (i.i.d.) samples from the (scaled) product of the jth local belief and the incoming messages from all neighbours except i are given, i.e., θ plq j "p j pθ j q ź i 1 Pnepjq{i m i 1 j pθ j q for l " 1, ..., L. (63) These samples are used with kernel approximations in order to represent the message from node j to i (scaled to one), in the NBP approach [32]. We use Gaussian kernels leading to the approximation given bŷ where the kernel weights are the normalised edge potentials. Λ ji is related to a bandwidth parameter that can be found using Kernel Density Estimation (KDE) techniques. In particular, we use the rule-of-thumb method in [50] and find wherem ji andĈ ji are the empirical mean and covariance of the samples, respectively, and d is the dimensionality of θ ji s. Given these messages, let us consider sampling from the updated marginal in (25). We use the weighted bootstrap (also known as sampling/importance resampling) [51] with samples generated from the (scaled) product of Gaussian densities with mean and covariance found as the empirical mean and covariance of the particle sets, respectively. In other words, givenm ji andĈ ji as above, we generate The particle weights for these samples to represent the updated marginal is given by where p 0,i is the prior density selected for θ i (and, the node potential in (22)). Thus, the local calibration marginal is estimated byP Algorithm 1 Pseudo-code for estimation of θ using the quadterm separable likelihood within Belief Propagation. 1: for all j P V do Ź Local filtering 2: for k " 1, . . . , t do

3:
Find ppX k , τ j k "τ j k |Z j 1:k q in (55) as described in Section VI-A 4: Find s k j pZ j k q in (56) 5: end for 6: end for 7: for all j P V do Ź Sample from priors 8: Sample θ plq i " p 0,i pθ i q for l " 1, . . . , L as in (61)  As the final step of the bootstrap, tθ plq i , ω plq i u M l"1 is resampled (with replacement) leading to equally weighted particles fromp i pθ i q, i.e., tθ plq i u L l"1 . We follow similar bootstrap steps in order to generate the samples in (63).
After nodes iterate the BP computations described above for S times, each node estimates its location by finding the empirical mean of tθ plq i u L l"1 . These steps are summarised in Algorithm 1.

VII. EXAMPLE: SELF-LOCALISATION IN LGSS MODELS
In this example, we demonstrate the quad-term node-wise separable likelihood in sensor self-localisation. The LGSS model given by (47) and (48) is used with process noise parameters selected as σ " 0.5, q 1 " 1{4, q 2 " q 3 " 1{2, q 4 " 1. The measurement model for sensor i is given by H i " rI, 0s and R i " σ 2 n I with σ n " 10 modelling noisy position measurements in the local coordinate frame.
We use Algorithm 1 specified in Section VI for estimating θ. The MRF model we consider is specified by the pairwise graph G in Fig. 2 (blue edges). We use L " 100 points in (61) to represent the local belief densities. We select the sensor data time window length as t " 10 (starting at time 21 until 30 in the scenario in Fig. 2). We follow the steps in Algorithm 1 for S " 16 iterations.
A typical run is illustrated in Fig. 3. Here, the scatter plot of particles from marginal posteriors are given over iterations. Note that the network coordinate system is established by sensor 1 through its informative prior, and, LBP emanates this information towards the outer nodes while learning the edge potentials using node-wise separable likelihood evaluations ψ i,j pθ plq i , θ plq j q given in (23) and (33). For performance assesment, first, we consider the mean squared error forθ output by our algorithm, as we have built our discussion on MMSE estimators in (12). We find this value empirically by taking the average of the squared norm of estimation errors over 100 Monte Carlo simulations. In Fig. 4, we present a semi-log plot of this quantity over iterations (blue line). Note that, convergence occurs in less  than ten iterations which is a favourable feature. We compare this algorithm with the RFS based dual-term pseudo-likelihood proposed in [26]. When evaluating this term, we use a Poisson multi-object model output by using the Gaussian mixture probability hypothesis density (GM-PHD) filter [52] with the LGSS model. The averaged MSE performance for the case that dual-term likelihoods are used as edge potentials is depicted with the green dashed line in Fig. 4. The quad-term approximation is seen to provide faster convergence with both pseudo-likelihoods leading to an on par accuracy in the steady regime, in this example. The edge update time for the quadterm update averages to 0.601 per edge per particle compared to 1.312 for the dual-term update demonstrating its relative efficiency. Note that, the MSE is a network-wide term and the local error norms are smaller. The localisation miss-distance averaged over sensors is given in Fig. 5. The average error (ȏ ne standard deviation) in the final step is 2.60˘0.70m with a maximum value of 4.96 which is less than 0.5% of the edge distances of 1000m. These results demonstrate that the proposed scheme is capable of providing self-localisation with favourable accuracy and small error margins.

VIII. CONCLUSIONS
In this work, we have addressed the prohibitive complexity of latent parameter estimation in state space models when there are many sensors collecting measurements. We proposed a pseudo-likelihood, namely the quad-term node-wise separable likelihood, as an accurate surrogate to the actual likelihood which is extremely costly to evaluate. The separable structure of this quad-term approximation makes it possible to evaluate it using local filtering operations, hence, scale with the number of sensors. It is not straightforward to use this pseudo-likelihood in the case of multiple objects, however, unlike, for example, a dual-term alternative proposed previously in [26]. The latter can easily be used with random finite set state variables and efficient filtering algorithms in the presence of data association ambiguities different from the quad-term approximation. We addressed this challenge by using a parameterised multiobject state space model in which different configurations of the parameter specify different hypothesis of object-tomeasurement and object-to-object associations. We introduced an empirical Bayesian perspective for evaluating separable likelihoods in this model, using only local Bayesian filtering.
The associated posterior distribution is a MRF over which distributed inference is possible using message passing algorithms such as LBP. We specify a particle message passing algorithm for sampling from latent parameter marginals for a linear Gaussian state space model with multiple objects. It is possible to extend this algorithm to handle further uncertainties in multi-object multi-sensor filtering such as false alarms by extending the model hypotheses to encode these possibilities when solving the assignment problems involved (see, for example [35]).