A Solution for Large-scale Multi-object Tracking

A large-scale multi-object tracker based on the generalised labeled multi-Bernoulli (GLMB) filter is proposed. The algorithm is capable of tracking a very large, unknown and time-varying number of objects simultaneously, in the presence of a high number of false alarms, as well as misdetections and measurement origin uncertainty due to closely spaced objects. The algorithm is demonstrated on a simulated large-scale tracking scenario, where the peak number of objects appearing simultaneously exceeds one million. To evaluate the performance of the proposed tracker, we also introduce a new method of applying the optimal sub-pattern assignment (OSPA) metric, and an efficient strategy for its evaluation in large-scale scenarios.


I. INTRODUCTION
Multi-object tracking has a host of applications in diverse areas, and many effective solutions have been developed in the literature [1], [2], [3]. Due to the unknown and time-varying number of objects, clutter, misdetection and data association uncertainty, multi-object tracking is computationally intensive; indeed, the underlying data association problem is NP-hard [1].
The total number of objects/tracks, however, is not indicative of the size or difficulty of the problem since it is straightforward to track an arbitrarily large cumulative number of objects over time in scenarios with only a few objects per frame. A more discerning indicator of problem size is the representative number of objects and measurements per frame, provided all other factors are on equal footing.
Computational requirement is the biggest barrier in large-scale multi-object tracking problems, which arise in applications ranging from space surveillance to cell biology. In space surveillance, tracking orbital space objects, including thousands of satellites and millions of pieces of man-made debris, is the foremost task [18]. Wide area surveillance for large urban environments, important for disaster relief as well as traffic and accident management, requires tracking hundreds of thousands of objects over time, including people in crowded environments [14]. In cell biology, tracking cells is critical to understanding how cells behave in living tissues [4]. Similarly, in wildlife biology, animal tracking is needed to study wildlife behaviour in natural habitats. Such biological tracking applications could involve hundreds of thousands of objects.
To the best of our knowledge, no multi-object trackers capable of handling thousands of objects/measurements per frame have so far been reported. The large-scale air traffic surveillance algorithm proposed in [5], arguably one of the earliest large-scale multi-object trackers, was demonstrated on a total of about 800 tracks (the number of objects per frame is not provided). In [6], the scalability of the linear multi-target integrated probabilistic data association (LMIPDA) filter was demonstrated on scenarios with 50 objects per frame. Multiple hypothesis tracking (MHT) based algorithms developed for cell tracking [4], and for wildlife tracking [13], were demonstrated on thousands of objects in total, with several hundred objects per frame [13]. In [15] it was demonstrated that a one-term approximation of the generalised labeled multi-Bernoulli (GLMB) filter can track over a thousand objects per frame.
Large-scale multi-object tracking, involving tens of thousands up to millions of objects/measurements per frame, is extremely challenging as the computational complexity does not scale gracefully with problem size. There are numerous combinations of events, and this number explodes with more objects/measurements. Thus, innovative theoretical and numerical constructs are necessary to identify the important combinations and process them in the most efficient manner. Given the current status of computing technology, is it possible to solve large-scale multi-object tracking problems in a timely manner, possibly even in real time?
In this work, we present a large-scale multi-object filter capable of tracking in the order of a million objects/measurements per frame, while running on off-the-shelf computing equipment. Our solution rests on the notion of multi-object density (a trademark of the random finite set (RFS) approach) that encapsulates all information on the current set of tracks in a single non-negative function. Processing the massive number of combinations translates to recursive computation of the multi-object filtering density [7], [8], [9]. Tractability hinges on efficient functional approximation/computation of the so-called GLMB filtering recursion, under limited processing/memory resources. At a conceptual level, the key enablers in our proposed large-scale tracker are:
• Adaptive approximation of the GLMB filtering density, at each time, by a product of tractable and approximately independent GLMB densities;
• Efficient parallel computation of these GLMB densities by exploiting conjugacy of the GLMB family.
In essence, this strategy efficiently identifies and processes significant combinations by exploiting structural properties of the GLMB densities and parallelisation, to make the most of the limited computing resources. Consequently, while the focus of this paper is on large-scale problems, our solution also provides significant efficiency gains when applied to smaller scale problems.
Our study would be incomplete without evaluating the tracking performance of the proposed large-scale multi-object tracker, which is a major barrier in itself. The standard application of the optimal sub-pattern assignment (OSPA) metric, originally proposed in [11], does not take into account aspects such as track switching and track fragmentation. What we need is a measure of dissimilarity between two sets of tracks which: (i) is physically meaningful; (ii) satisfies the properties of a metric for mathematical consistency; and (iii) is computable for scenarios involving millions of tracks. Fortunately, the OSPA metric can be adapted for this purpose, yielding what we call OSPA(2) to distinguish it from the standard use of OSPA. To evaluate the performance of the proposed large-scale tracking algorithm on a scenario involving an unknown and time-varying number of objects in excess of a million per frame, we developed an efficient and scalable procedure for computing the OSPA(2) metric.

II. BACKGROUND: GENERALISED LABELED MULTI-BERNOULLI TRACKING FILTER

The multi-object tracking problem is distinct from the multi-object filtering problem, in the sense that we are interested in estimating the trajectories of objects over time, as opposed to the multi-object state at each time instant. We therefore limit our attention to methods that are capable of estimating object trajectories over time, and not just multi-object states.
The generalised labelled multi-Bernoulli (GLMB) filter is an algorithm that is specifically designed to fulfil this requirement. Its ability to estimate trajectories stems from the idea of modeling the multi-object state as a labeled random finite set, which, from an intuitive viewpoint, simply means augmenting the kinematic state space with a discrete space of object labels. A given element of the label space identifies a unique object, and the trajectory of an object can be estimated by extracting the collection of states with the same identifying label across different time instants. In the remainder of this section, we briefly revisit the main points of the GLMB filter; the interested reader is referred to [7], [8], [9], [10] for a more detailed treatment.
We begin by defining the notion of a labeled random finite set. Let X be a single-object state space, L a discrete label space, and L : X × L → L the projection defined by L((x, ℓ)) = ℓ for all points (x, ℓ) ∈ X × L. Consider a finite subset X ⊆ X × L, and its corresponding label set L(X) = {L(x) : x ∈ X}. It follows that the labels of the points in X are distinct if and only if X and its label set L(X) have equal cardinality, i.e. |L(X)| = |X|. This is expressed mathematically by defining the distinct label indicator ∆(X) = δ_{|X|}(|L(X)|), which has value 1 if the labels in X are distinct and 0 otherwise. A labeled RFS is defined as an RFS on the space X × L, with the constraint that any realisation X must satisfy ∆(X) = 1.
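As a concrete illustration, the distinct-label cardinality check above can be sketched in a few lines of Python. This is a hypothetical helper, not part of the paper's implementation; points are represented as (state, label) pairs:

```python
def distinct_label_indicator(X):
    """Return 1 if the labels of all points in X are distinct, else 0.

    X is a finite set of (kinematic_state, label) pairs; the check
    |L(X)| == |X| is exactly the cardinality condition in the text.
    """
    labels = {ell for (_, ell) in X}
    return 1 if len(labels) == len(X) else 0

# Two points sharing label 1 violate the distinct-label constraint.
valid = {((0.0, 1.0), 1), ((2.0, 3.0), 2)}
invalid = {((0.0, 1.0), 1), ((2.0, 3.0), 1)}
```

Only the second set is a valid realisation of a labeled RFS.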
A generalised labeled multi-Bernoulli (GLMB) RFS is a class of labeled RFS that is distributed according to a multi-object density of the form

π(X) = ∆(X) Σ_{ξ∈Ξ} w^{(ξ)}(L(X)) [p^{(ξ)}]^X,    (1)

where each ξ ∈ Ξ indexes a component with non-negative weight w^{(ξ)}(·) over label sets, each p^{(ξ)}(·, ℓ) is a probability density on X, and [h]^X ≜ ∏_{x∈X} h(x) is referred to as a multi-object exponential. As shown in [7], the GLMB is closed under the standard multi-object transition kernel, and is a conjugate prior with respect to the standard multi-object observation likelihood. These properties ensure that the GLMB can be used to construct a recursive Bayesian multi-object tracker.
Let us assume that at time k − 1, the posterior multi-object density is a GLMB of the form (1). Then at time k, the posterior multi-object density is again a GLMB, as defined by equations (2)-(8), where we have used the notation defined in Table I to denote the inputs and model parameters.

TABLE I: Notation for the GLMB filtering recursion
P_{B,k}(ℓ)         Probability that an object with label ℓ is born at time k
p_{B,k}(·, ℓ)      Prior probability density for a new object born at time k with label ℓ
B_k                Label space for new objects born at time k
L_{k−1}            Space of all possible object labels up to time k − 1
Z_k                Set of measurements received at time k
Θ_k                Space of measurement-label mappings at time k
θ(ℓ)               Measurement index assigned to label ℓ under mapping θ ∈ Θ_k, where θ(ℓ) = 0 implies that no measurement is assigned
P_{S,k}(x, ℓ)      Probability that an object with state x and label ℓ at time k − 1 survives to time k
P_{D,k}(x, ℓ)      Probability that an object with state x and label ℓ is detected at time k
f_{k|k−1}(·|x, ℓ)  Prior probability density for a surviving object at time k, which had state x and label ℓ at time k − 1
g_k(·|x, ℓ)        Probability density for an observation generated by an object with state x and label ℓ

III. SCALABLE GLMB FILTERING: THEORETICAL FOUNDATIONS
Due to practical limitations on computational resources, the original implementation of the GLMB filter proposed in [7], [8] cannot accommodate large numbers of objects simultaneously. The main computational bottleneck occurs in the measurement update procedure, which involves processing each component of the predicted GLMB density using Murty's algorithm to rank the most significant posterior components in descending order of weight. Accordingly, if the input data consists of M measurements, and the i-th component of the predicted GLMB density has N unique target labels and is allocated K posterior components, then the complexity of processing the i-th component will be O(K(N + M)^3). Clearly the number of measurements and objects that this algorithm can handle in practice will depend on the available resources. However, in general, the cubic complexity will render this algorithm computationally infeasible for large values of N and M.
An alternative implementation of the GLMB filter based on Gibbs sampling was proposed in [9]. This algorithm replaces the expensive procedure of deterministically generating the highest weighted components with a cheaper stochastic sampling approach. Consequently, this method reduces the computational complexity of processing each component to O(KN^2 M). Note that the quadratic complexity in the number of objects will still render this algorithm infeasible for tracking large numbers of objects simultaneously.
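The flavour of this stochastic approach can be illustrated with a minimal, heavily simplified Gibbs sweep over association vectors. The score matrix `eta`, the two-object example, and the handling of missed detections are all illustrative assumptions; this is a sketch of the sampling idea, not the algorithm of [9]:

```python
import random

def gibbs_associations(eta, num_iters, seed=0):
    """Simplified Gibbs sampler over object-to-measurement associations.

    eta[i][j] is a positive score for assigning object i to measurement j,
    with column 0 reserved for "missed detection" (shareable); columns
    1..M may be used by at most one object.  Returns the set of distinct
    association vectors visited, mimicking how significant components
    are collected stochastically rather than ranked deterministically.
    """
    rng = random.Random(seed)
    n = len(eta)
    gamma = [0] * n                      # start with every object missed
    visited = {tuple(gamma)}
    for _ in range(num_iters):
        for i in range(n):
            # Measurements already claimed by the other objects.
            taken = {gamma[k] for k in range(n) if k != i and gamma[k] > 0}
            choices = [j for j in range(len(eta[i])) if j == 0 or j not in taken]
            weights = [eta[i][j] for j in choices]
            gamma[i] = rng.choices(choices, weights=weights)[0]
            visited.add(tuple(gamma))
    return visited

# Two objects, two measurements: object 0 favours measurement 1,
# object 1 favours measurement 2 (arbitrary illustrative scores).
eta = [[0.1, 5.0, 0.5],
       [0.1, 0.5, 5.0]]
samples = gibbs_associations(eta, num_iters=100)
```

Every visited vector is a valid association (no measurement shared by two objects), and high-scoring vectors are visited with high probability, which is what makes the stochastic approach a cheap substitute for deterministic ranking.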
In a given application, the available computational resources, combined with the requirements on processing speed and tracking accuracy, will impose a practical upper limit on the number of objects that can be feasibly tracked simultaneously. In order to track more objects using this algorithm, the overall tracking problem needs to be decomposed into smaller sub-problems, with an upper bound on the number of objects in any given sub-problem. In labelled multi-object tracking, such a decomposition can be described in terms of a partition of the label space, where each group within the partition represents a sub-problem that is assumed to be statistically independent of all others. For a given partition of the label space, it is possible to approximate the overall GLMB density as a product of smaller GLMB densities, each of which consists only of labels from a single group within the partition. Here we present a method for computing these GLMB densities, which is optimal in the sense that the Kullback-Leibler divergence (KLD) from the original GLMB density is minimised. In the remainder of this section we show that, for a given partition of the label space, a GLMB density can be feasibly and optimally decomposed into a product of smaller GLMB densities.

A. GLMB Optimal Product Approximation
Given a partition G = {L^(1), . . . , L^(N)} of the label space L, the marginal π^(n) of π on F(X × L^(n)) is obtained by integrating out all points whose labels lie outside L^(n), i.e.

π^(n)(X_n) = ∫ π(X_n ∪ W) δW,

where the set integral is taken over all finite W ⊆ X × (L \ L^(n)). We are interested in approximating π as a product of N independent densities

π(X) ≈ ∏_{n=1}^{N} π̂^(n)(X ∩ (X × L^(n))),

where π̂^(n)(·) is a density on the space F(X × L^(n)). First let us consider a partition consisting of two groups, i.e. G = {L^(1), L^(2)}. The Kullback-Leibler divergence between π and its product approximation is given by (9) on the following page. Note that the minimum possible KLD is obtained when D(π^(1); π̂^(1)) = 0 and D(π^(2); π̂^(2)) = 0, which, by the properties of the KLD, occurs only when π̂^(1) = π^(1) and π̂^(2) = π^(2) almost everywhere. Thus, choosing the factors of the product approximation to be the marginal densities yields the minimum possible KLD. The same argument follows for a general partition consisting of N groups, i.e. G = {L^(1), . . . , L^(N)}.
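The marginals-minimise-KLD argument can be checked numerically on a toy discrete joint density standing in for a GLMB over two label groups. The distributions below are arbitrary illustrative choices:

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence D(p||q) for dicts on the same support."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

# Toy joint density over two "groups" (a stand-in for a density over a
# two-group label partition).
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def marginal(p, axis):
    """Marginalise a joint dict density onto one coordinate."""
    m = {}
    for k, v in p.items():
        m[k[axis]] = m.get(k[axis], 0.0) + v
    return m

m1, m2 = marginal(joint, 0), marginal(joint, 1)
prod_marg = {(x, y): m1[x] * m2[y] for (x, y) in joint}

# An arbitrary alternative product approximation, for comparison.
alt1, alt2 = {0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}
prod_alt = {(x, y): alt1[x] * alt2[y] for (x, y) in joint}

d_marg, d_alt = kld(joint, prod_marg), kld(joint, prod_alt)
```

Here `d_marg <= d_alt`, consistent with the claim that the product of marginals is the KLD-optimal product approximation; any residual `d_marg > 0` measures the dependence that the factorisation discards.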

B. Marginalising GLMB Densities
Consider a GLMB density of the form (1). For a partition G = {L^(1), L^(2)} consisting of two groups, the marginalised GLMB density with respect to the first group is given by (12)-(13) on page 6. For a general partition G = {L^(1), . . . , L^(N)}, using the same argument, the marginalised density corresponding to the n-th group is given by the analogous expressions with L^(1) replaced by L^(n).

IV. SCALABLE GLMB FILTERING: IMPLEMENTATION
In this section we discuss the implementation of an efficient and scalable GLMB filter, using the decomposition method presented in the previous section. It is well known that the computational complexity of the multi-object tracking problem rises sharply as the number of objects and/or measurements increases. This leads to the problem becoming infeasible when the number of objects/measurements present at a given time exceeds a certain threshold, which depends on the available computing resources. Adding more memory and processing power to the system may allow it to track more objects, however, this does not solve the fundamental computational problem, and will only serve to delay its onset. A logical approach to tracking a very large number of objects simultaneously is to employ a divide-and-conquer strategy, by decomposing the large tracking problem into many smaller sub-problems, each requiring significantly fewer resources to solve. However, this introduces a new challenge, namely, to find an appropriate decomposition of the overall problem that can be considered "optimal" in some sense. The meaning of "optimal" in this context is difficult to define, since there are multiple competing objectives that can influence the suitability of a solution.
Arguably, the main objective in decomposing a large tracking problem is to divide the set of tracks into smaller groups (i.e. partitioning the discrete set of track labels), such that the number of tracks in any one group does not exceed some upper bound. This is necessary to ensure that the processing of any single group does not consume an excessive or disproportionate share of the available computing resources. The number of tracks in a group determines the amount of memory required to store the parameters of the corresponding GLMB density, and the processing time needed to perform the GLMB update. Therefore, setting an upper bound on the group size requires careful consideration of several factors, such as the amount of available system memory, the CPU speed, and the real-time constraints on processing each frame of incoming data.
Another objective is to minimise the errors incurred in carrying out the decomposition. In this case, we wish to approximate a GLMB density that encompasses all tracks in a scenario, by a product of marginal GLMB densities, each of which encompasses only a subset of the tracks. Ideally, we would like this product of marginal GLMBs to match the original (full) GLMB density as closely as possible. This is equivalent to minimising the Kullback-Leibler divergence between the two representations.
Finding the ideal decomposition thus involves solving a multi-objective optimisation problem over the space of partitions of object labels. In theory, this problem could be cast as a constrained optimisation, where the objective is to find a partition minimising the KLD between the full GLMB and its product approximation, subject to a constraint on the maximum size of any one group. However, this optimisation problem is much harder than the original tracking problem, since the space of label partitions grows super-exponentially as the number of tracks increases. Therefore, to establish a feasible decomposition algorithm, we re-cast the problem into a form that can be solved with existing numerical techniques.
Instead of directly attempting to perform a constrained optimisation of the KLD over the space of partitions (which is clearly intractable), we aim to find a partition of the labels such that any pair of objects that do not appear in the same group are approximately statistically independent. In the standard multi-object tracking model, with point detections and independent object motion, this statistical dependence arises solely via the uncertainty in the unknown association between measurements and objects. That is, if multiple objects could have produced a particular detection, then there is a statistical dependence between those tracks in the posterior multi-object density. It follows that tracks that are well separated in the measurement space will have low posterior statistical dependence, because the probability that these tracks give rise to closely spaced measurements is extremely low. In this context, "well separated" means that the distance between tracks in the measurement space is large compared to the measurement noise and the uncertainty in the object's predicted location. This is the key property that we exploit in order to find a partition of the track labels that minimises the amount of potential "measurement sharing" between objects in different groups.
To achieve this, we first project the marginal probability density corresponding to each object label onto the measurement space, which is used to construct a "bounding region" in the measurement space for each object. We then attempt to find a partition that minimises the amount of overlap between the bounding regions of objects in different groups. This reduces the chances of measurement sharing, and consequently reduces the statistical dependence between groups of objects in the posterior GLMB density. This is based on the premise that if there are statistically independent subsets of track labels present, then the GLMB density over the space of all labels can be expressed as a product of marginal GLMBs on these subsets, without any loss of information.
It should be noted that in the worst case, it may not be possible to find independent subsets of track labels. In such cases, the techniques proposed here cannot effectively reduce the computational complexity. However, in many practical tracking scenarios, such independent subsets do exist, and it is this aspect of the problem structure that we exploit in order to make the most efficient use of the available computing resources.
A naive approach to minimising the overlap between target groups would require computing the overlap between the bounding regions for all pairs of objects. Doing so would lead to a computational complexity of O(N^2), making it computationally infeasible for large-scale tracking problems involving thousands or millions of objects. Fortunately, spatial searching algorithms can be readily applied to this problem in order to reduce the complexity to O(N log N), thus making it suitable for large-scale multi-object tracking. In particular, data structures based on R-trees [19], [20] can be used to efficiently search for overlapping hyper-rectangles in the measurement space. The sizes of these hyper-rectangles are chosen probabilistically, such that the measurement of an object falls inside its rectangle with probability at least P_G. When overlaps between hyper-rectangles are detected, the object labels corresponding to those bounding regions are placed in the same group within the partition. If it is not possible to satisfy the upper bound on the group size for a given value of P_G, the procedure is repeated with a smaller value of P_G. This results in a more aggressive, albeit more approximate, partitioning of the track labels.
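The grouping logic can be sketched as follows, using a brute-force O(N^2) pairwise overlap test with union-find merging; in the large-scale setting the pairwise loop would be replaced by R-tree queries to reach O(N log N). The rectangles below are illustrative, and no group-size bound or P_G shrinking is modelled:

```python
def boxes_overlap(a, b):
    """Axis-aligned rectangles a, b given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def partition_labels(boxes):
    """Union-find grouping of labels whose gating rectangles overlap.

    boxes maps label -> rectangle.  The pairwise check here is O(N^2)
    for clarity; an R-tree overlap query would replace it in practice.
    """
    parent = {l: l for l in boxes}
    def find(l):
        while parent[l] != l:
            parent[l] = parent[parent[l]]   # path compression
            l = parent[l]
        return l
    labels = list(boxes)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if boxes_overlap(boxes[a], boxes[b]):
                parent[find(a)] = find(b)   # merge the two groups
    groups = {}
    for l in labels:
        groups.setdefault(find(l), set()).add(l)
    return sorted(groups.values(), key=min)

# Tracks 1 and 2 share measurement space; track 3 is well separated.
boxes = {1: (0, 0, 2, 2), 2: (1, 1, 3, 3), 3: (10, 10, 12, 12)}
groups = partition_labels(boxes)
```

Tracks whose gating rectangles overlap land in the same group, and the well-separated track forms its own group, which is exactly the structure the factorised GLMB update exploits.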
Once the partition of the labels has been constructed, we proceed to compute a factorised representation of the posterior GLMB density. Note that when processing each new frame of measurements, we must begin with a factorised prior GLMB density, in which each term corresponds to a group of labels in the partition from the previous measurement update. The factorised posterior GLMB is computed from this by the following three-step procedure.
1) Carry out a further factorisation of each term in the existing prior GLMB density, such that no single factor contains labels from multiple groups of the new partition.
2) Multiply subsets of these factorised densities together, so as to reconstruct the marginal prior GLMB for each group of the new partition.
3) Compute the posterior marginal GLMB density for each group of the new partition in parallel, using the joint prediction/update method based on Gibbs sampling as proposed in [9].
Note that in step 3 above, the posterior densities can be computed in parallel because the groups are all approximately independent. The parallelisation of this final step is the key element that leads to a potentially very significant speedup, particularly when the algorithm is deployed on multi-core architectures with many concurrent threads of execution.

A. Application of the OSPA Metric to Evaluate Tracking Performance
In [11], the optimal sub-pattern assignment (OSPA) distance was proposed as a mathematically consistent metric for measuring the distance between two sets of points. This has found widespread application in the literature to evaluate the performance of multi-target filters, where it is usually presented as a plot of the OSPA distance between the estimated and true multi-target states at each time instant. This technique can also provide an indication of the performance of multi-target trackers; however, it does not fully account for errors between the estimated and true sets of tracks. When evaluating tracking performance, the main drawback of using the OSPA distance between multi-target states is that phenomena such as track switching and fragmentation are not penalised in a consistent fashion.
In [12], it was shown that the OSPA metric can be applied in a relatively straightforward manner to facilitate a more rigorous evaluation of multi-target tracking performance. The resulting metric is referred to as OSPA(2), which reflects the fact that it is the same as the original OSPA metric, except that the base distance is itself an OSPA-based distance. In the following, we revisit the main points of the OSPA(2) metric, and demonstrate its application to the evaluation of large-scale multi-target tracking performance. We begin with some notation: a track x is a mapping from its domain D_x (a finite set of time instants) to the state space X, and U denotes the space of all such tracks. Also, recall that for a function d(x, y) to be called a "metric", it must satisfy the following four properties:
P1) d(x, y) ≥ 0 (non-negativity);
P2) d(x, y) = 0 if and only if x = y (identity);
P3) d(x, y) = d(y, x) (symmetry);
P4) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
As defined in [11], let d_p^(c)(φ, ψ) be the OSPA distance between φ, ψ ∈ F(X) with order p and cutoff c. That is, for φ = {φ^(1), φ^(2), . . . , φ^(m)} and ψ = {ψ^(1), ψ^(2), . . . , ψ^(n)}, with m ≤ n,

d_p^(c)(φ, ψ) = [ (1/n) ( min_{π∈Π_n} Σ_{i=1}^{m} (d^(c)(φ^(i), ψ^(π(i))))^p + c^p (n − m) ) ]^{1/p},    (14)

where Π_n denotes the set of permutations of {1, . . . , n}, and d^(c)(x, y) = min(c, d(x, y)). For m > n, we define d_p^(c)(φ, ψ) = d_p^(c)(ψ, φ).

Remark 1. Note that in (14), the factor of 1/n, which normalises the distance by the number of objects, is crucial for the OSPA to have the intuitive interpretation as a per-object error.
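For small sets, (14) can be evaluated directly by enumerating assignments. The following sketch mirrors the definition term by term; the Euclidean base distance and the brute-force minimisation over permutations are illustrative simplifications:

```python
import itertools

def ospa(phi, psi, c, p):
    """OSPA distance of order p and cutoff c between two finite lists of
    points (tuples), evaluating (14) by brute-force enumeration of
    assignments -- adequate only for small sets."""
    if len(phi) > len(psi):          # enforce m <= n as in (14)
        phi, psi = psi, phi
    m, n = len(phi), len(psi)
    if n == 0:
        return 0.0
    def base(x, y):                  # cutoff-saturated Euclidean distance
        return min(c, sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5)
    best = min(sum(base(phi[i], assign[i]) ** p for i in range(m))
               for assign in itertools.permutations(psi, m))
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)

# One matched pair at distance 1; the extra point incurs the cutoff c.
d = ospa([(0.0, 0.0)], [(1.0, 0.0), (5.0, 5.0)], c=2.0, p=1)
```

With c = 2 and p = 1, the example yields (1 + 2)/2 = 1.5, illustrating the per-object normalisation of Remark 1.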

1) Base Distance Between Tracks:
We shall now make use of the OSPA distance (14) to define a metric on the space of tracks U, which shall in turn serve as the base distance for the OSPA(2) metric on the space of finite sets of tracks F(U). Let us define the distance d̃^(c)(x, y) between two tracks x, y ∈ U as the mean OSPA distance between the sets of states defined by x and y, over all times t ∈ D_x ∪ D_y, i.e.

d̃^(c)(x, y) = (1 / |D_x ∪ D_y|) Σ_{t ∈ D_x ∪ D_y} d^(c)({x(t)}, {y(t)}).    (15)

Note that in (15), {x(t)} is a singleton if t ∈ D_x, and {x(t)} = ∅ if t ∉ D_x (and likewise for {y(t)}). In this case, the OSPA distance defined in (14) reduces to

d^(c)({x(t)}, {y(t)}) = min(c, d(x(t), y(t))) if t ∈ D_x ∩ D_y, and d^(c)({x(t)}, {y(t)}) = c otherwise.    (16)

Note that the order parameter p becomes redundant, and is therefore omitted from (16).
In order to use (15) as a base distance between tracks, we need to establish that it defines a metric on U. That is, it must satisfy the properties P1-P4 as previously mentioned.
Proposition 2. Let d (c) (·, ·) be a metric on the finite subsets of X, such that the distance between a singleton and an empty set assumes the maximum attainable value c. Then the distance between two tracks as defined by (15) is also a metric.
The proof of this proposition is given in the appendix, which establishes that (15) is indeed a metric on the space U, when d (c) (·, ·) is defined according to (16). Before proceeding to define the OSPA (2) , we make two important observations regarding the properties of this base distance.
• Since d^(c)(·, ·) ≤ c, the distance between tracks saturates at the value c, i.e. d̃^(c)(·, ·) ≤ c.
• For two tracks x and y such that D_x = D_y, (15) can be interpreted as a mean square error (MSE) between x and y. Hence, the base distance can be regarded as a generalisation of the MSE for tracks of different lengths and/or domains.
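A minimal sketch of the base distance (15)-(16) follows, representing a track as a dictionary from time indices to state tuples (an illustrative data layout, not the paper's implementation):

```python
def track_base_distance(x, y, c):
    """Base distance per (15)-(16): mean over t in the union of the
    domains of the cutoff-saturated distance, with penalty c at times
    where only one of the two tracks exists.

    Tracks are dicts mapping time index -> state tuple.
    """
    times = set(x) | set(y)
    if not times:
        return 0.0
    total = 0.0
    for t in times:
        if t in x and t in y:        # both tracks exist at time t
            dist = sum((a - b) ** 2 for a, b in zip(x[t], y[t])) ** 0.5
            total += min(c, dist)
        else:                        # exactly one track exists: penalty c
            total += c
    return total / len(times)

# Overlap at t=1 with distance 1; y is absent at t=2, costing c=2.
x = {1: (0.0,), 2: (0.0,)}
y = {1: (1.0,)}
d = track_base_distance(x, y, c=2.0)
```

The example gives (1 + 2)/2 = 1.5, and the result can never exceed c, consistent with the saturation property noted above.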
2) OSPA(2) for Tracks: The distance between two tracks as defined in Section V-A1 is both a metric on the space U, and bounded by the value c. It is therefore suitable to serve as a base distance for the OSPA metric on the space of finite sets of tracks F(U). Let X = {x^(1), x^(2), . . . , x^(m)} ∈ F(U) and Y = {y^(1), y^(2), . . . , y^(n)} ∈ F(U) be two sets of tracks, where m ≤ n. We define the distance ď_p^(c)(X, Y) between X and Y as the OSPA with base distance d̃^(c)(·, ·) (the time-averaged OSPA given by (15)), i.e.

ď_p^(c)(X, Y) = [ (1/n) ( min_{π∈Π_n} Σ_{i=1}^{m} (d̃^(c)(x^(i), y^(π(i))))^p + c^p (n − m) ) ]^{1/p},    (17)

where c is the cutoff and p is the order parameter. Note that if m > n, then ď_p^(c)(X, Y) = ď_p^(c)(Y, X). We refer to this as the OSPA(2) distance, which can be interpreted as the time-averaged per-track error.
3) Efficient Evaluation of OSPA(2): Evaluating (17) involves the following three steps:
1) Compute an m × n cost matrix C, where the entry in the i-th row and j-th column is d̃^(c)(x^(i), y^(j)), computed according to (15).
2) Use a 2-D optimal assignment algorithm on the matrix C, to find the minimum cost 1-1 assignment of columns to rows.
3) Use the result of step 2 to evaluate ď_p^(c)(X, Y) according to (17).
A basic implementation of this procedure would require computing the base distance between all pairs of tracks in X and Y, which has complexity O(|w_k| mn).
Step 2 then requires solving a dense optimal assignment problem with complexity O((m + n)^3). This would preclude its use in large-scale tracking scenarios involving millions of objects, as the cost matrix would consume too much memory, and the assignment problem would be infeasible to solve.
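The three-step procedure can be sketched end-to-end for small examples, with a brute-force permutation search standing in for the optimal assignment algorithm of step 2. The track representation (time-indexed dictionaries) and the Euclidean base distance are illustrative assumptions:

```python
import itertools

def track_dist(x, y, c):
    """Cutoff-saturated, time-averaged base distance between two tracks
    (dicts: time -> state tuple); these are the entries of the cost
    matrix built in step 1."""
    times = set(x) | set(y)
    if not times:
        return 0.0
    total = 0.0
    for t in times:
        if t in x and t in y:
            total += min(c, sum((a - b) ** 2 for a, b in zip(x[t], y[t])) ** 0.5)
        else:
            total += c
    return total / len(times)

def ospa2(X, Y, c, p):
    """OSPA(2) between two lists of tracks: the three-step procedure,
    with brute-force assignment in step 2 (small examples only)."""
    if len(X) > len(Y):
        X, Y = Y, X
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    cost = [[track_dist(x, y, c) for y in Y] for x in X]          # step 1
    best = min(sum(cost[i][perm[i]] ** p for i in range(m))       # step 2
               for perm in itertools.permutations(range(n), m))
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)         # step 3

truth = [{1: (0.0,)}]
est = [{1: (0.0,)}, {1: (9.0,)}]   # one perfect track plus one false track
err = ospa2(truth, est, c=2.0, p=1)
```

The false track is penalised at the cutoff, giving (0 + 2)/2 = 1.0; in a large-scale implementation, step 2 would be replaced by a sparse assignment solver as described next.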
Fortunately, in many practical applications, the base distance between most pairs of tracks will saturate at the cutoff value c. This can be exploited to dramatically reduce the computational complexity, making it feasible for large-scale problems. Firstly, recall that the time averaging in the base distance (15) is carried out only over the union of the track domains. Consequently, the base distance between any pair of tracks that have no corresponding states closer than a distance of c must saturate at c. Such pairs can be considered unassignable, and efficient spatial searching algorithms can be applied to extract only the assignable pairs of tracks in O(|w_k| m log n) time. Once these have been found, a sparse optimal assignment algorithm can be used to obtain the lowest-cost matching between the ground truth and estimated trajectories. Such algorithms can solve sparse assignment problems with a much lower average complexity than is possible under the dense optimal assignment formulation.
4) Visualisation of OSPA(2): In practice, it is desirable to examine the tracking performance as a function of time, so that trends in algorithm behaviour can be analysed in response to changing scenario conditions. This can be achieved by plotting ď_p^(c)(X|_{w_k}, Y|_{w_k}) as a function of k, where w_k is a set of time indices that varies with k, and where f|_w denotes the restriction of f to the domain w. Note that the sets X|_{w_k} and Y|_{w_k} only contain those tracks whose domain overlaps with w_k, i.e. any tracks whose domain lies completely outside the set w_k are disregarded. Choosing different values for the set w_k allows us to examine the performance of tracking algorithms over different time scales. For example, a straightforward approach is to set w_k = {k − N + 1, k − N + 2, . . . , k}, so that at time k, the set w_k consists of only the latest N time steps.
In this case, choosing a small value for N will indicate the tracking performance over shorter time periods, while larger values will reveal the longer-term tracking performance. This choice is highly dependent on the application at hand. For example, in real-time surveillance, we may only be interested in tracking objects over a period of a few minutes, as older information may be considered irrelevant. In this case, a small value for N would suffice to capture the important aspects of the tracking performance. However, in an off-line scenario analysis application, we might require accurate trajectory estimates over much longer time periods, in which case a larger value for N would be more appropriate.
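The sliding-window restriction can be sketched as follows, again representing tracks as time-indexed dictionaries (an illustrative assumption):

```python
def sliding_window(k, N):
    """Time indices w_k = {k-N+1, ..., k}."""
    return set(range(k - N + 1, k + 1))

def restrict_tracks(tracks, window):
    """Restrict each track (dict: time -> state) to the window, dropping
    tracks whose domain does not intersect it."""
    out = []
    for tr in tracks:
        sub = {t: s for t, s in tr.items() if t in window}
        if sub:
            out.append(sub)
    return out

tracks = [{1: (0.0,), 2: (1.0,)}, {5: (9.0,)}]
w = sliding_window(3, 2)             # the latest N=2 steps at time k=3
active = restrict_tracks(tracks, w)  # second track lies outside the window
```

The restricted sets are then compared with the OSPA(2) distance at each k, yielding a performance curve over time.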
Remark 3. Note that the OSPA(2) (for sets of tracks) computed on a sliding window as described above converges to the traditional OSPA (for sets of points) as the window length N decreases; for N = 1, the OSPA(2) is identical to the traditional OSPA.
It is important to understand that the OSPA(2) distance has a different interpretation to that of the traditional OSPA distance. Whereas the traditional OSPA distance captures the error between the true and estimated multi-target states at a single instant in time, the OSPA(2) distance captures the error between the true and estimated sets of tracks over a range of time instants, as determined by the choice of w_k. Therefore, careful consideration must be given to the design of w_k, and the user must be mindful of this when interpreting the results.

B. Numerical Results
In this section, we demonstrate the scalability of the proposed GLMB filter implementation, by applying it to a simulated multi-target tracking scenario where the peak number of objects appearing simultaneously exceeds one million. The scenario runs for 1000 time steps, and the surveillance region is a 2D area with width 16 km and height 9 km. New objects are born during specified time intervals, and the peak cardinality of 1,082,974 objects occurs at time 400. At this time, the mean target density is around 7,500 objects per square kilometre; however, since the objects are not uniformly distributed in space, the peak density is approximately 32,700 objects per square kilometre. The survival time for each object is chosen uniformly at random in the interval [400, 700], and the tracker assumes a survival probability of 0.999. The measurement noise standard deviation is 0.15 metres in both the x and y directions, and the detection probability is constant at 0.9. The false alarms are uniformly distributed across the surveillance region, with a Poisson cardinality distribution with a mean of 250,000 per scan. In the label partitioning step within the tracker, the maximum number of labels per group is set to 20.
The tracking algorithm was coded in C++, making use of OpenMP to implement parallel processing wherever possible. We executed the algorithm on a machine with four 16-core AMD Opteron 6376 processors (for a total of 64 physical processor cores) and 256 GB of memory. On this hardware configuration, the peak time taken to process a frame of measurements was approximately 5 minutes when the cardinality was highest, but the algorithm ran considerably faster at times when there were fewer objects in the scene. The peak memory usage of the algorithm was approximately 50 GB. To evaluate the tracking performance, the OSPA^(2) metric was coded in Matlab, and we used the "parfor" construct to parallelise some aspects of the computation. The average time taken to evaluate each point in the OSPA^(2) curve was approximately 1 minute. For a scenario of this scale, it is not practical to show the trajectories of individual objects, so we resort to showing summary statistics. The true and estimated cardinality is shown in Figure 1, and the OSPA^(2) distance is shown in Figure 2. Figures 3 and 4 contain six snapshots of the true and estimated target density at various times during the scenario. For the OSPA^(2) calculation, the cutoff was set to c = 2, the order was p = 1, and a sliding window over the latest 50 time steps was used to evaluate each point in the curve.
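The parallelism exploited above is enabled by the tracker's label partitioning step, which splits the label space into groups that can be processed independently. As a rough illustration of the idea, the following Python sketch groups labels by spatial proximity on a grid and caps the group size; this grid criterion is a simplified, hypothetical stand-in for the KLD-based partitioning actually used by the tracker.

```python
def partition_labels(positions, cell_size, max_group_size):
    """Partition object labels into groups of spatially nearby objects,
    splitting any group that exceeds max_group_size.

    positions: dict mapping label -> (x, y) position.
    Returns a list of label groups that could be filtered independently.
    """
    groups = {}
    for label, (x, y) in positions.items():
        # hypothetical criterion: objects sharing a grid cell are grouped
        key = (int(x // cell_size), int(y // cell_size))
        groups.setdefault(key, []).append(label)
    out = []
    for members in groups.values():
        # enforce the cap on labels per group (20 in the experiment above)
        for i in range(0, len(members), max_group_size):
            out.append(members[i:i + max_group_size])
    return out
```

Each resulting group corresponds to a small independent sub-problem, which is what allows the workload to be spread across the available processor cores.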
From the cardinality plot, it can be seen that the estimated cardinality lags behind the true cardinality during the times when new targets are being born. This is to be expected, as the measurement-driven birth model needs to consider a very large number of potential birth tracks at each scan. To avoid initiating too many false tracks, the filter delays the initiation of tracks until there is more data to confirm the presence of an object. The OSPA^(2) plot shows increased error during the periods in which new targets are appearing, due to the delay in initiating new tracks. At other times, the error stabilises to approximately 0.2 metres per object per unit time.

We have presented an efficient and scalable implementation of the generalised labeled multi-Bernoulli filter, which is capable of estimating the trajectories of a very large number of objects simultaneously, on the order of a million per frame. The proposed method makes efficient use of the available computational resources by decomposing large-scale tracking problems into smaller independent sub-problems. The decomposition is carried out via marginalisation of high-dimensional multi-object densities, using a technique that is shown to be optimal in the sense that it minimises the KLD for a given partition of the object label space. This allows the algorithm to fully exploit the potential of highly parallel processing, as afforded by modern multi-core computing architectures. Owing to its relatively low processing time and memory requirements, the proposed technique is capable of tracking in excess of a million objects simultaneously, as our simulations demonstrate, running on standard off-the-shelf computing equipment. Additionally, we have introduced a new way of using the OSPA metric, which captures more information about multi-object tracking performance, and we have demonstrated that this metric can be feasibly evaluated in large-scale tracking scenarios.
When at least one of the sets D_x ∪ D_y, D_y ∪ D_z, D_z ∪ D_x is empty, the triangle inequality for tracks over time instants t_1, t_2, ..., t_k, t_{k+1} can be easily verified, since d^(c)(·, ·) is a metric. Hence, we consider the case where D_x ∪ D_y, D_y ∪ D_z and D_z ∪ D_x are all non-empty.
Before proceeding with the proof, we must define some notation. Let us denote the cardinalities of the basic sets (N, N_p, N_q, N_r, Ñ_x, Ñ_y, Ñ_z) as illustrated in the diagram below.
Furthermore, we adopt the following abbreviations:

P ≜ |D_x ∪ D_y| = N + N_p + N_q + N_r + Ñ_x + Ñ_y = S + Ñ_x + Ñ_y,
Q ≜ |D_y ∪ D_z| = N + N_p + N_q + N_r + Ñ_y + Ñ_z = S + Ñ_y + Ñ_z,
R ≜ |D_z ∪ D_x| = N + N_p + N_q + N_r + Ñ_z + Ñ_x = S + Ñ_z + Ñ_x,

where S ≜ N + N_p + N_q + N_r. The sum p over (the non-empty) D_x ∪ D_y can be decomposed into: p_N, consisting of the N terms on (D_x ∩ D_y ∩ D_z); p_{N_p}, consisting of the N_p terms on (D_x ∩ D_y − D_z); and (N_q + N_r + Ñ_x + Ñ_y)c, since d({x(k)}, {y(k)}) = c on (D_x ∪ D_y) − (D_x ∩ D_y). Similar decompositions also apply to q and r, i.e.
p ≜ p_N + p_{N_p} + (N_q + N_r + Ñ_x + Ñ_y)c,
q ≜ q_N + q_{N_q} + (N_r + N_p + Ñ_y + Ñ_z)c,
r ≜ r_N + r_{N_r} + (N_p + N_q + Ñ_z + Ñ_x)c.

The following bounds and identities are required for the proof. Note that, so far, all of the variables we have defined are non-negative.
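Since the Venn diagram itself is not reproduced here, the region cardinalities and the identities for P and Q can be checked numerically. The following Python sketch (our own notation, with Nx_, Ny_, Nz_ standing in for Ñ_x, Ñ_y, Ñ_z) verifies the decompositions on a concrete example.

```python
def region_counts(Dx, Dy, Dz):
    """Cardinalities of the seven basic regions of the Venn diagram
    of the existence-time sets Dx, Dy, Dz."""
    N   = len(Dx & Dy & Dz)       # triple intersection
    Np  = len((Dx & Dy) - Dz)
    Nq  = len((Dy & Dz) - Dx)
    Nr  = len((Dz & Dx) - Dy)
    Nx_ = len(Dx - (Dy | Dz))     # exclusive to Dx (Ñ_x)
    Ny_ = len(Dy - (Dx | Dz))     # exclusive to Dy (Ñ_y)
    Nz_ = len(Dz - (Dx | Dy))     # exclusive to Dz (Ñ_z)
    return N, Np, Nq, Nr, Nx_, Ny_, Nz_

# Example: check P = |Dx ∪ Dy| = S + Ñ_x + Ñ_y and Q = |Dy ∪ Dz| = S + Ñ_y + Ñ_z
Dx, Dy, Dz = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7}
N, Np, Nq, Nr, Nx_, Ny_, Nz_ = region_counts(Dx, Dy, Dz)
S = N + Np + Nq + Nr
assert len(Dx | Dy) == S + Nx_ + Ny_   # P
assert len(Dy | Dz) == S + Ny_ + Nz_   # Q
```

The same bookkeeping underlies the decompositions of the sums p, q and r above: each union splits into the regions where both tracks exist (contributing distance terms) and the regions where exactly one exists (each contributing c).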