ORACLE: Occlusion-Resilient and Self-Calibrating mmWave Radar Network for People Tracking

Millimeter wave (mmWave) radar sensors are emerging as valid alternatives to cameras for the pervasive contactless monitoring of people in indoor spaces. However, commercial mmWave radars feature a limited range (up to $6$-$8$ m) and are subject to occlusion, which may constitute a significant drawback in large, crowded rooms characterized by a challenging multipath environment. Thus, covering large indoor spaces requires multiple radars with known relative position and orientation, and algorithms to combine their outputs. In this work, we present ORACLE, an autonomous system that (i) integrates automatic relative position and orientation estimation from multiple radar devices by exploiting the trajectories of people moving freely in the radars' common fields of view, and (ii) fuses the tracking information from multiple radars to obtain unified tracking across all sensors. Our implementation and experimental evaluation of ORACLE yield median errors of $0.12$ m and $0.03^\circ$ for the radars' location and orientation estimates, respectively. Fused tracking improves the mean target tracking accuracy by $27\%$, and the mean tracking error is $23$ cm in the most challenging case of $3$ moving targets. Finally, ORACLE shows no significant performance reduction when the fusion rate is reduced to as little as 1/5 of the frame rate of the single radar sensors, and is thus amenable to a lightweight implementation on a resource-constrained fusion center.

Radars operating in the mmWave frequency band have emerged as valid alternatives to cameras for indoor monitoring, as they are robust to changing and poor lighting conditions and raise fewer privacy concerns [1]-[3]. Their use enables advanced sensing applications spanning contactless people tracking [4], personnel recognition [4]-[6], and movement classification [7]. However, commercial mmWave radars have a limited range [5] (up to 6-8 m) and are subject to occlusion [4], which may constitute a significant drawback in large, crowded rooms containing furniture and walls. Thus, covering large indoor spaces requires multiple radars (i.e., radar networks) with known relative position and orientation, and algorithms to combine their output information.

Marco Canil and Jacopo Pegoraro are with the Department of Information Engineering, University of Padova, Padova 35131, Italy (email: marco.canil@phd.unipd.it; jacopo.pegoraro@unipd.it).
Michele Rossi is with the Department of Information Engineering, University of Padova, Padova 35131, Italy, and with the Department of Mathematics "Tullio Levi-Civita", University of Padova, Padova 35121, Italy (email: michele.rossi@unipd.it).
In this work, we tackle the largely unexplored design of distributed mmWave radar networks to monitor people in indoor spaces. Our aim is to develop automatic calibration and sensor fusion algorithms to enable the quick deployment of multiple, jointly operating radars with no human intervention and no accurate synchronization between the devices. Multistatic radars used in, e.g., [8], [9], require synchronizing the devices' clocks in order to allow coherent processing of the received signals. This is highly impractical, as it mandates setting up clock distribution through an optical connection, jeopardizing the ease and speed of deployment. For this reason, we rather assume that the radar devices operate independently (i.e., they can only receive their own transmitted signals), and communicate the result of the target detection and tracking steps to a fusion center. In this scenario, we need to solve two main issues: (i) automatically obtain the positions and orientations of the radars (self-calibration), as they are often unknown, or it is impractical to measure them at deployment time; and (ii) combine the environment perception capabilities of the multiple radars (sensor fusion), so as to boost their sensing accuracy and mitigate occlusion. The few existing solutions to point (i) present significant practicality and usability limitations in real scenarios [10]-[12]. Point (ii), instead, has not been investigated with indoor mmWave radars, to the best of our knowledge. This aspect is particularly challenging, as we aim at enhancing the tracking accuracy without leveraging coherent processing, thus only assuming coarse synchronization as provided by popular network protocols (e.g., the network time protocol, NTP).
In this paper, we propose ORACLE, a solution to the mmWave radar network deployment and integration problem. Our contribution is twofold. As a first step, ORACLE automatically estimates the location and orientation of multiple radars with respect to a common coordinate system through an improved version of our previous work [13]. For this, ORACLE takes the trajectories of people moving in the environment as a reference. Then, the system fuses the information about moving people tracked by different radars at a fusion center (FC), enhancing the resilience of the tracking process in case of occlusion. ORACLE processes local information, transmitted by the radars to the FC, in a slotted-time fashion, thus handling the high variability in the frame rate of commercial radars. Then, it merges local tracks and provides a global representation of the moving targets in the environment.
The original contributions of this work are:
1) We propose ORACLE, a novel plug-and-play system for the real-time, automatic self-calibration and integration of multiple incoherent mmWave radars for indoor people tracking.
2) As a first component of ORACLE, we present a fully automated method for the self-calibration of multiple mmWave radars. The algorithm extends our previous work [13] by adding a masking phase (see Section IV-A.3) that handles a wider range of cases and provides better calibration results. ORACLE estimates the relative positions and orientations of the radars with a median error of 0.12 m and 0.03°, respectively, when 3 people move in the environment.
3) ORACLE includes a track-to-track radar fusion algorithm that combines information about the same subject collected by different radars. This improves the mean tracking accuracy by up to 27% with respect to single-sensor tracking.
4) We evaluate our method via an extensive measurement campaign through the RadNet platform [14], using 4 commercial mmWave radars deployed in realistic conditions and multiple subjects, including challenging human motion. In the most difficult case of 3 subjects moving concurrently, ORACLE achieves a tracking accuracy of 87% and a mean tracking error of 23 cm.
The remainder of this paper is organized as follows: Section II provides a summary of the related work. In Section III, the challenges of designing a mmWave radar network are introduced. Section IV presents and discusses ORACLE, the proposed method. Section V provides some insights regarding the practical implementation of ORACLE and presents the experimental results on our testbed. Finally, concluding remarks are drawn in Section VI.

II. RELATED WORK
Multistatic radars. Using multiple radar receivers with synchronized clocks enables coherent analysis of the received signals, which yields significant processing gains due to spatial diversity [15]. Existing works have leveraged this principle for drone detection [16] and people tracking [8], [17]-[19].
Despite their potentially superior accuracy and resolution, the main drawback of multistatic radars lies in their impracticality and deployment cost. Indeed, a common clock source needs to be distributed to the receivers, either via optical links or GPS, which is not available indoors. This would prevent the radar sensors from being quickly deployed, used, or relocated. Conversely, we target a scenario where ease of deployment and minimal human intervention are key requirements. For this reason, multistatic radars are not applicable, and ORACLE focuses on track-level sensor fusion from incoherent sensors that are only coarsely synchronized (i.e., not at the clock level) using standard NTP.

Radar networks.
A large body of work has considered the use of incoherent radar networks in airborne and automotive applications to improve the detection and tracking capabilities of the standalone sensors, e.g., [20], [21]. These works tackle the fusion of distributed radar tracks without leveraging the multistatic gain available with precise synchronization. However, the considered radar setups significantly differ from mmWave radar network deployments, which typically take place indoors or in short-range (6-8 m) outdoor scenarios, where the presence of multiple subjects may create crowded scenes and occlusions. The latter occur when people or objects block the line of sight between a radar and the target. Moreover, most existing works on mmWave radar networks rely on very simple offline data association rules based on known sensor positions, with no data fusion to improve the tracking accuracy [22], [23]. To the best of our knowledge, only one work [12] has addressed indoor radar networks for people tracking, although using a lower frequency band (7-10 GHz). A major drawback of [12] is the assumption that only one person is present in the environment, which is unrealistic in general indoor settings. All the above limitations are solved by the proposed solution, which is the first system that (i) combines the information from multiple radars, handling the presence of multiple subjects, (ii) automatically estimates their relative positions and orientations, and (iii) shows robust real-time performance thanks to its low complexity and distributed computation load.

Radar networks self-calibration. To our knowledge, only two works have tackled the problem of self-calibration in mmWave radar networks, i.e., [10], [11]. Both have significant practical limitations: [10] requires that just a single subject, following a linear walking trajectory, appears in the field of view (FoV) of the radars, while [11] can handle multiple subjects, but all of them need to be static (e.g., sitting). Such assumptions considerably limit the application scope of these systems. Conversely, our method completely automates the calibration process, working with movement trajectories of arbitrary shapes and with multiple concurrently moving targets. Indeed, ORACLE benefits from having multiple trajectories of complex and irregular shape that span a large portion of the FoVs of the radars, as they lead to a more accurate calibration.

III. PROBLEM OUTLINE
In this section, we first present an overview of mmWave Multiple-Input Multiple-Output (MIMO) radars. Then, we formalize the problems of combining the information obtained by the different radars in the network at a central fusion entity, and of estimating their relative positions and orientations.

A. mmWave MIMO radars
A MIMO frequency-modulated continuous-wave (FMCW) radar jointly estimates the distance, the radial velocity, and the angular position of the targets with respect to the radar itself [24]. During the sensing process, the radar transmits sequences of linear chirp signals with bandwidth B. A full sequence, or "radar frame", is repeated with a period of T_s seconds. The distance, r, and velocity, v, of the targets are computed from the frequency shift induced by the delay of each reflection, usually by applying discrete Fourier transform (DFT) processing. The FMCW radar distance resolution is related to the bandwidth B by ∆r = c/(2B), where c is the speed of light. This makes mmWave devices accurate to the level of a few centimeters using a bandwidth of 2-4 GHz [4]. Furthermore, using a 2D array of multiple receiving antennas makes it possible to obtain the angle of arrival (AoA) of the reflections along the azimuth (θ) and elevation (φ) domains, by leveraging phase shifts across different antenna elements. The azimuthal AoA resolution depends on the number of antennas N in the array and is given by ∆θ = λ/(N d cos θ), where d is the spacing between the antennas. Due to the high ranging resolution, a human presence in the environment generates a large number of reflecting points, which are detected by the radar. This set of points, usually termed radar point cloud, can be transformed into the 3-dimensional Cartesian space using the distance, azimuth, and elevation angle information of the multiple body parts. Each point is described by a vector [x, y, z]^T including the point's spatial coordinates x, y, z obtained by transforming r, θ, and φ. Movement trajectories can be tracked across time from the point clouds.
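As a quick sanity check of the two resolution formulas above, the following sketch evaluates ∆r and ∆θ; the numeric parameter values (4 GHz bandwidth, 8 antennas at half-wavelength spacing) are illustrative assumptions, not the paper's testbed configuration.

```python
import math

C = 3e8  # speed of light, m/s

def range_resolution(bandwidth_hz):
    """FMCW range resolution: delta_r = c / (2B)."""
    return C / (2 * bandwidth_hz)

def azimuth_resolution(n_antennas, wavelength_m, spacing_m, theta_rad=0.0):
    """Azimuth AoA resolution: delta_theta = lambda / (N d cos(theta)), in radians."""
    return wavelength_m / (n_antennas * spacing_m * math.cos(theta_rad))

# A 4 GHz chirp bandwidth gives a 3.75 cm range resolution.
delta_r = range_resolution(4e9)
# Half-wavelength spacing (d = lambda/2) gives delta_theta = 2/N rad at boresight.
delta_theta = azimuth_resolution(8, 0.0039, 0.00195)
```

Note how the angular resolution degrades as cos θ shrinks towards the edges of the FoV, which is one reason the measurement noise covariance is treated as position-dependent later in this section.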

B. Sensor fusion in mmWave radar networks
Consider a mmWave radar network consisting of S monostatic radar sensors. Each radar has local computational capabilities and a communication interface that enables it to transmit information to a FC. The sensors are identified by indices s = 1, ..., S, while quantities related to the FC are denoted by the superscript c. All radar sensors operate at discrete time steps of duration T_s, indexed by variable k. The FC also operates at discrete time steps that, in general, may have a different duration T_c, and are indexed by variable m.
The people tracking problem amounts to estimating the subjects' movement trajectories in the (x, y) horizontal plane across time, exploiting the measurements of the multiple radar sensors. For this, we define the state of subject u, seen by the FC at time m, as x_m(u) = [x_m(u), y_m(u), ẋ_m(u), ẏ_m(u)]^T, containing u's coordinates and the corresponding velocity components ẋ_m(u) and ẏ_m(u). We assume that the state's evolution obeys a constant-velocity (CV) model [25]. At the FC, the state model for target u is

x_{m+1}(u) = F_{T_c} x_m(u) + w_m(u),    (1)

where F_{T_c} is the state transition matrix that projects the state forward by a time duration T_c, according to the CV model, while w_m(u) is the (global) Gaussian process noise, having zero mean and covariance matrix W [4], [26]. The process noise is here considered to be generated by a random acceleration that is not explicitly accounted for by the CV model [4]. Sensor measurements of the state of target n, at time k, are obtained according to

z^s_k(n) = H x_k(n) + v^s_k(n),    (2)

where z^s_k(n) is the observation obtained from sensor s, H is the observation matrix relating the observation to the state, and v^s_k(n) is the sensor-specific measurement noise having covariance matrix V_k [4]. In our system, all sensors are of the same type and have the same specifications. Therefore, we can safely assume that the measurement error processes have the same zero-mean Gaussian distribution, whose covariance is time-varying due to the dependence of the radars' resolution on the position of the targets in the FoV [4].
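For concreteness, a minimal numeric sketch of the CV state and observation model above, with the process and measurement noise terms omitted; the 0.1 s step duration is an arbitrary example value.

```python
import numpy as np

def cv_transition(T):
    """CV state transition matrix F_T for the state [x, y, vx, vy]."""
    return np.array([[1.0, 0.0,   T, 0.0],
                     [0.0, 1.0, 0.0,   T],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

# Observation matrix H: the radar observes position only, not velocity.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

x = np.array([1.0, 2.0, 0.5, -0.5])   # target at (1, 2) m, moving at (0.5, -0.5) m/s
x_next = cv_transition(0.1) @ x        # noiseless state after T = 0.1 s
z = H @ x_next                         # noiseless position observation
```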
The aim of our system is to estimate x_m over time, exploiting the measurements collected by the sensors. Note that: (i) the correspondence between the targets tracked by each sensor is unknown, and finding a suitable association between them is part of the problem we tackle; (ii) the algorithm can handle the fact that the same target may be tracked simultaneously by one or more sensors; and (iii) the sensors may collect the measurements and obtain state estimates at a different time granularity than that of the FC.
Each sensor locally tracks the targets in the environment. The common approach to people tracking from mmWave radar point clouds [4], [5], [23] includes: (i) a detection phase via density-based clustering algorithms (e.g., DBSCAN [27]) to separate the reflections from multiple subjects; (ii) applying Kalman filtering techniques [28] on each cluster centroid to track the movement trajectory of each subject in space. The KF used at each sensor provides, at each time step, an estimate of the state of the targets in its FoV and the corresponding error covariance. We call N^s_k the number of such targets at time k, x̂^s_{k|k}(n) the estimated state of target n after the KF update step, and C^s_{k|k}(n) the associated error covariance. Note that the above quantities are sensor-dependent, as different sensors provide their own estimates of the state of the same target. We denote by T^s_k(n) = {x̂^s_{k|k}(n), C^s_{k|k}(n)} the track corresponding to target n as estimated by sensor s, expressed with respect to its own reference frame. In addition, we assume that each sensor is able to provide a timestamp, τ^s_k, corresponding to the current time step, according to its local time reference (e.g., internal clock or network time). For the timestamps to match between different sensors, some level of synchronization is needed within the radar network (e.g., using NTP). At the end of each time step, sensor s transmits the set of its tracks, {T^s_k(n)}, n = 1, ..., N^s_k, along with the timestamp τ^s_k, to the FC.

If the same target is tracked by more than one sensor, the FC should maintain a single track for it, which is updated and improved by fusing the information coming from the sensors. Our aim is to develop an algorithm to estimate the position of the targets across time at the FC, in the form of central tracks obtained by combining the sensor information T^s_k, s = 1, ..., S. The above problem is complicated by the correlation between the estimation errors of the tracks obtained at the sensors and at the FC. From Eq.
(1), one can see that some correlation exists between all the tracks that refer to the same target (also in different sensors), as the process noise is the same, but this can be typically neglected if the process noise has low intensity or if the radar measurement rate is high with respect to the subject's motion [26]. Conversely, the error correlation between a central track and a sensor track of the same target cannot be ignored, as the FC obtains its own tracks as a function of the sensor tracks. This is especially true for our real-time application, where the fusion occurs frequently, e.g., from 10 to 15 times per second.

C. Self-calibration of mmWave radar networks
The track sets that the radars transmit to the FC are expressed in the local reference frames of the sensors. Any algorithm that fuses them to improve the tracking accuracy requires knowledge of the sensors' relative position and orientation. However, manually measuring these quantities is impractical and prone to errors; therefore, an automatic self-calibration procedure is highly appealing. Here, we propose to exploit the trajectories of targets of opportunity that move within the radars' FoVs, independently tracked by each sensor. Tracks from different radars that correspond to the same target have almost the same shape, up to a rigid transformation and some noise. Estimating the parameters of such a rigid transformation corresponds to estimating the sensors' relative position and orientation.
Considering the system of S radars deployed in the same area, call F_s, s = 1, ..., S, their reference systems (RSs). Each RS consists of a pair F_s = {t_s, R_s}, where t_s is the 2 × 1 vector with the coordinates of the s-th RS's origin and R_s is the 2 × 2 rotation matrix specifying its orientation. Without loss of generality, in this paper we consider a global RS (that of the FC) which coincides with that of radar 1, for which it holds that t_1 = 0_{2×1} and R_1 = I_2, respectively the 2 × 1 zero vector and the 2 × 2 identity matrix. Self-calibrating the system consists in estimating F_s, s = 2, ..., S. We define the movement trajectory of target n, as seen by sensor s, as the sequence of position estimates of the target, {p̂^s_k(n)}_k, where k is the discrete time index. Note that p̂^s_k(n) contains the first two components of the KF state estimate x̂^s_{k|k}(n). An estimate of the rotation matrix and of the translation vector between radar s and radar 1 (our reference) can be obtained by solving the following least-squares (LS) problem

{R̂_s, t̂_s} = argmin_{R_s ∈ SO(2), t_s} Σ_k || p̂^1_k(n) − (R_s p̂^s_k(n) + t_s) ||²_2,    (3)

where SO(2) denotes the special orthogonal group in dimension 2 (i.e., the set of all possible rotations around a point in a 2-dimensional space) and ||·||_2 is the Euclidean norm. While the translation vector of sensor s with respect to the global RS is directly obtained by solving Eq. (3), the orientation angle, denoted by θ_s, is given by θ_s = cos^{−1}(trace(R_s)/2).
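The LS problem above admits a closed-form solution via SVD (the Procrustes/Kabsch method, in the spirit of the closed-form approach referenced later as [29]); a minimal sketch follows. As an implementation detail, the sketch recovers the orientation angle with atan2 instead of cos⁻¹(trace(R)/2), since atan2 also preserves the sign of the angle.

```python
import numpy as np

def fit_rigid_transform(p_ref, p_s):
    """LS fit of R in SO(2) and t such that p_ref ~ R @ p_s + t.
    Points are stored as rows; solved in closed form via SVD."""
    mu_ref, mu_s = p_ref.mean(axis=0), p_s.mean(axis=0)
    A, B = p_ref - mu_ref, p_s - mu_s          # centered point sets
    U, _, Vt = np.linalg.svd(A.T @ B)          # SVD of the cross-covariance
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # enforce det(R) = +1
    R = U @ D @ Vt
    t = mu_ref - R @ mu_s
    return R, t

# Recover a known 30-degree rotation and (1, 2) m translation from 4 points.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
p_s = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
p_ref = p_s @ R_true.T + np.array([1.0, 2.0])
R_hat, t_hat = fit_rigid_transform(p_ref, p_s)
theta_hat = np.arctan2(R_hat[1, 0], R_hat[0, 0])  # signed orientation angle
```

With noiseless, non-degenerate points the rotation and translation are recovered exactly; with real radar tracks the same estimator returns the LS-optimal transform.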
In Section IV-A, we present the proposed approach to solve the self-calibration problem in the more complex and realistic scenario where: (i) multiple sensors concurrently track multiple targets; (ii) the track-target correspondence among different sensors is unknown, so an association strategy has to be developed; and (iii) the trajectories should be aligned in time before using Eq. (3).

IV. PROPOSED APPROACH
In this section, we first present a high-level overview of the processing blocks of ORACLE and then provide a detailed description of each of them. Fig. 1 presents the workflow of ORACLE.

Self-calibration. In this phase, the relative positions and orientations of the radars are obtained from the trajectories of targets of opportunity (see Section IV-A). The steps are:
1) Time alignment. A time alignment between the trajectories from radar 1 and the trajectories from radar s is sought, by minimizing the distance between the trajectories' timestamps (see Section IV-A.1).
2) Track association. Using the time alignment from point 1), we solve the problem in Eq. (3) for all the trajectory pairs and compute a corresponding association cost matrix. Using the cost matrix, the best associations between track pairs are computed (see Section IV-A.2).
3) Masking. When estimating the roto-translation parameters at point 4), multiple track pairs are used together. To avoid possibly wrong associations spoiling the final results, all possible subsets of the best associations from point 2) are considered through a masking operation, and a new association cost is computed using all the trajectories in each subset. In the end, the subset giving the lowest cost is selected for the final parameter estimation in point 4) (see Section IV-A.3).
4) Radar calibration. All the track pairs from point 3) are stacked together and used to set up a rigid transformation problem as in Eq. (3) that provides the final position and orientation estimates for the radar (see Section IV-A.4).

Multi-radar fusion. Here, the tracks from the radars are fused at the FC to build a set of central tracks associated with the subjects in the environment (see Section IV-B). This includes:
1) Slotted sensor information processing. The track information from the sensors is sent to the FC and processed using a slotted protocol (see Section IV-B.1).
2) Track association. A method to associate (frame-by-frame) those tracks that correspond to the same target, according to their statistical similarity, is used to select pairs of tracks to be fused (see Section IV-B.2).
3) Radar track fusion algorithm. The fusion algorithm combines sensor tracks with the central tracks using different rules depending on the type of fusion event (see Section IV-B.3).
A. Self-calibration

1) Time alignment: For simplicity of notation, call n^1 a trajectory from sensor 1 and n^s a trajectory from sensor s. Each of them contains the sequence of the position estimates of some target; the two targets may, or may not, coincide. Sensors communicate position estimates to the FC along with their timestamps. Note that the trajectories may have different lengths. The time alignment is then performed so that each position estimate of trajectory n^1 is associated with the position estimate of trajectory n^s that minimizes the time difference between the two acquisition instants. Elements of trajectory n^1 that do not have a corresponding element of trajectory n^s within T_c seconds are discarded, and vice versa (recall that T_c is the duration of a FC time step). This operation reduces the trajectories to a common length of K time-aligned positions. Call k_1 and k_s the vectors containing the indices that provide the time-aligned sequences from radars 1 and s, respectively. Note that k_1 and k_s have the same length. With the time alignment operation, we retain only the portions of the trajectories that are sufficiently well synchronized, in order to avoid performing the rigid transformation on wrongly associated points. Once the trajectory association has been established, we define the mean time shift of the pair as

τ(n^1, n^s) = (1/K) Σ_{k=1}^{K} | τ^1_{k_{1,k}} − τ^s_{k_{s,k}} |,

where k_{j,k} denotes the k-th element of vector k_j. The value of τ(n^1, n^s), expressed in seconds, is related to the alignment quality of the two trajectories and will be used in the association step (see Section IV-A.2).
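The alignment rule can be sketched as follows; for brevity the sketch performs only the one-directional pass (matching each timestamp of n^1 against n^s), whereas the full procedure also discards unmatched elements of n^s.

```python
def time_align(ts_1, ts_s, T_c):
    """Return index vectors k1, ks pairing each timestamp of trajectory n1
    with the nearest timestamp of trajectory ns, at most T_c apart."""
    k1, ks = [], []
    for i, t1 in enumerate(ts_1):
        j = min(range(len(ts_s)), key=lambda jj: abs(ts_s[jj] - t1))
        if abs(ts_s[j] - t1) <= T_c:  # drop pairs farther apart than T_c
            k1.append(i)
            ks.append(j)
    return k1, ks

def mean_time_shift(ts_1, ts_s, k1, ks):
    """Mean absolute timestamp difference over the K aligned pairs."""
    return sum(abs(ts_1[i] - ts_s[j]) for i, j in zip(k1, ks)) / len(k1)

# Example: the last sample of trajectory n1 has no counterpart within T_c = 0.05 s.
k1, ks = time_align([0.00, 0.10, 0.20, 0.90], [0.02, 0.12, 0.22], 0.05)
shift = mean_time_shift([0.00, 0.10, 0.20, 0.90], [0.02, 0.12, 0.22], k1, ks)
```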
2) Track association: Our data association strategy consists in computing a cost for each pair {n^1, n^s} and solving the resulting combinatorial cost minimization problem to obtain the best associations. We assume to have N_1 and N_s trajectories available at radars 1 and s, respectively. Our cost function incorporates different aspects: (i) the length of the trajectories, as longer trajectories are assumed to provide a better calibration; (ii) the time alignment of the trajectories, as we should compare position estimates acquired almost simultaneously by the different radars; and (iii) the quality of the rigid transformation, in terms of residual error in superimposing trajectories from the different radars. We define the association cost, A, for the pair {n^1, n^s} as

A(n^1, n^s) = ξ(n^1, n^s) ρ(K, τ),    (4)

where ξ(n^1, n^s) is the sum of the LS residuals, after applying the time alignment and the rigid transformation, while ρ(K, τ) is a factor that favors trajectory pairs with a long overlap and a low mean time shift. The rigid transformation parameters R^{(n^1,n^s)}_s, t^{(n^1,n^s)}_s are computed, using the time-aligned track pairs from point 1), by solving the LS problem in Eq. (3) in closed form through a singular value decomposition (SVD) method [29]. Then, the LS residual sum is computed as

ξ(n^1, n^s) = Σ_{k=1}^{K} || p̂^1_{k_{1,k}}(n^1) − ( R^{(n^1,n^s)}_s p̂^s_{k_{s,k}}(n^s) + t^{(n^1,n^s)}_s ) ||²_2,

where k indexes trajectory positions. Recalling that T_c is the sampling interval of the FC, the corrective term ρ(K, τ) penalizes short overlaps (small K) and mean time shifts τ that are large compared to T_c. The costs are arranged into an N_1 × N_s cost matrix, and the optimal association of trajectories is obtained by minimizing the overall cost through the Hungarian algorithm [30]. This yields N_t = min(N_1, N_s) pairs of associated trajectories, which are possibly the same targets seen by the two radars.
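The combinatorial minimization step can be sketched as below; for self-containment this illustration replaces the Hungarian algorithm with a brute-force search over permutations, which returns the same minimum-cost one-to-one assignment for small N_1, N_s.

```python
import itertools

def min_cost_assignment(cost):
    """Minimum-total-cost one-to-one assignment between the rows and columns
    of a cost matrix (list of lists). Returns (total_cost, [(row, col), ...])."""
    n_rows, n_cols = len(cost), len(cost[0])
    transposed = n_rows > n_cols
    if transposed:  # always iterate over the smaller dimension
        cost = [list(col) for col in zip(*cost)]
        n_rows, n_cols = n_cols, n_rows
    best_total, best_pairs = float("inf"), None
    for perm in itertools.permutations(range(n_cols), n_rows):
        total = sum(cost[i][perm[i]] for i in range(n_rows))
        if total < best_total:
            best_total = total
            best_pairs = [(i, perm[i]) for i in range(n_rows)]
    if transposed:
        best_pairs = [(j, i) for i, j in best_pairs]
    return best_total, best_pairs

# Toy 2x3 cost matrix: two trajectories at radar 1, three at radar s.
total, pairs = min_cost_assignment([[4.0, 1.0, 3.0],
                                    [2.0, 0.0, 5.0]])
```

In practice, `scipy.optimize.linear_sum_assignment` solves the same problem in polynomial time and should be preferred for larger instances.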
Due to the presence of spurious trajectories, ghost targets, and clutter, we select the subset of associated trajectory pairs whose cost is below a threshold A_self, i.e., a confidence value below which a pair is deemed to be a true trajectory pair generated by a human. Denote the set of selected track pairs by Q_1s = {{s^1_1, s^s_1}, ..., {s^1_{N_t}, s^s_{N_t}}}, where s^q_p indicates the p-th selected track from radar q. In our experiments, we empirically adopted A_self = 18.
3) Masking: In phase 4) (see Section IV-A.4), one or more of the N_t track pairs selected during the previous phase are used jointly to compute the final self-calibration parameters. Ideally, each of the selected track pairs should provide two trajectories, one from each radar, corresponding to the same target. However, in practice, there might be wrong associations. To mitigate this shortcoming, in phase 3) all possible subsets of the selected N_t track pairs are considered, through a masking operation on the one-to-one track associations. We call it a masking operation as it corresponds to purposely ignoring (masking) some of the track associations in the computation of the self-calibration parameters. The same association cost of Eq. (4) is computed by stacking together all the trajectories in each subset. Then, the subset providing the lowest cost is used for the final calibration. Formally, let P(Q_1s) be the set of all possible subsets of Q_1s, excluding the empty set. Recall that Q_1s is the set of all selected track pairs after the track association phase. Each element of P(Q_1s) is a set of track pairs from radar 1 and radar s. For each element of P(Q_1s), all the trajectories from sensor 1 are stacked in vector q_1 and all the trajectories from sensor s are stacked in vector q_s, and the same operation is performed with the corresponding timestamp sequences. Then, the cost A(q_1, q_s) is computed as in Eq. (4), and all costs are stored in a vector of dimension (2^{N_t} − 1) × 1. The element of P(Q_1s) providing the lowest cost is selected. The N*_t ≤ N_t trajectory pairs contained in such a minimum-cost element are used in phase 4) to compute the self-calibration parameters. Since the cost of the masking phase is exponential in the number of track-to-track associations, it is possible to limit the maximum number of track pairs retained from Q_1s; in this case, the track pairs with the highest cost are excluded. In our experiments, we used a maximum of 5 track pairs.
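The masking search can be sketched as an exhaustive scan over the non-empty subsets of Q_1s, capped at a maximum number of pairs; the subset cost function used in the example is a hypothetical stand-in for the stacked association cost of Eq. (4).

```python
from itertools import chain, combinations

def best_mask(track_pairs, pair_cost, subset_cost, max_pairs=5):
    """Return the minimum-cost non-empty subset of track pairs.
    pair_cost ranks individual pairs (used only to cap the exponential search);
    subset_cost scores a whole subset, like the stacked association cost."""
    kept = sorted(track_pairs, key=pair_cost)[:max_pairs]
    subsets = chain.from_iterable(
        combinations(kept, r) for r in range(1, len(kept) + 1))
    return min(subsets, key=subset_cost)

# Toy example: three associated pairs with individual costs 1, 2 and 10;
# the illustrative subset cost rewards stacking several low-cost pairs.
costs = {"p1": 1.0, "p2": 2.0, "p3": 10.0}
chosen = best_mask(["p1", "p2", "p3"], costs.get,
                   lambda sub: sum(costs[p] for p in sub) / len(sub) ** 2)
```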
4) Radar calibration: The N*_t trajectory pairs selected during the masking phase are then stacked together and used to set up a rigid transformation problem as in Eq. (3). The problem is solved with the same procedure described in Section IV-A.2 [29], obtaining the final rotation matrix and translation vector to calibrate radar s, namely {R*_s, t*_s}. This step exploits all the available information from multiple subjects, improving the calibration accuracy by increasing the number of useful measurements per time frame. Note that, even though target occlusion events may split a trajectory into multiple components, our algorithm still works by exploiting each resulting sub-trajectory.

Fig. 2: Scheme of the proposed fusion algorithm with 2 radars.

B. Multi-radar fusion
Once the radar network is calibrated, i.e., we have an estimate {R*_s, t*_s} for every s, we can fuse the information coming from the S radars at the FC. In the following, we consider S = 2 for clarity in the algorithm description, but the method works for an arbitrary S. We denote the precision matrix, defined as the inverse of a covariance matrix, by P = C^{−1}. The fusion algorithm, represented in Fig. 2, is described as follows.

1) Slotted sensor information processing: The FC maintains a central time variable, denoted by τ^c_m = τ_0 + mT_c, which is incremented at the end of each central time step, where τ_0 is the time when the FC starts operating. In order to cope with the random variations in the sensor acquisition, processing, and communication times, the FC operates on time slots of duration T_c. Specifically, several track sets from different sensors can be received during time step m due to differences between the FC and the sensors' time steps and the variable communication time. Using the timestamp information contained in the received tracks, at time m the FC filters out all the track sets that are not received within the interval (τ^c_{m−1}, τ^c_m] and retains only the most recent track set from each sensor. Formally, for each s, we select the track set whose timestamp is the solution to argmin_{τ^s_k} |τ^c_m − τ^s_k|. In the following, to highlight that a track set has been selected from sensor s to be processed in time step m, we denote it as T^s_m, using the time index of the FC rather than that of the sensor, and we do the same for all the tracks it contains.
The slotted processing procedure (i) reduces the number of fusion steps the FC carries out, using only the most recent information available from each sensor, and (ii) avoids erroneously fusing outdated tracks.
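The slotted filtering rule can be sketched as follows; track payloads are represented here by opaque placeholder strings.

```python
def select_slot_tracks(received, t_prev, t_now):
    """For each sensor, keep only the most recent track set whose timestamp
    lies in the central slot (t_prev, t_now]; everything else is discarded."""
    selected = {}
    for sensor_id, timestamp, track_set in received:
        if not (t_prev < timestamp <= t_now):
            continue  # outside the current slot
        if sensor_id not in selected or timestamp > selected[sensor_id][0]:
            selected[sensor_id] = (timestamp, track_set)
    return selected

# Sensor 1 reports three times, sensor 2 twice; the central slot is (1.0, 1.1].
received = [(1, 0.95, "T1a"), (1, 1.02, "T1b"), (2, 1.05, "T2a"),
            (2, 1.09, "T2b"), (1, 1.20, "T1c")]
selected = select_slot_tracks(received, 1.0, 1.1)
```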
After selecting the sensor tracks, the FC transforms them to match its own reference system, using the information about the location and orientation of the radar sensors. Then, according to the CV model, the tracks are propagated to the current FC time. We denote by t̄*_s = [t*_s, 0_{1×2}]^T the augmented translation vector and by R̄*_s = blkdiag(R*_s, R*_s) the 4 × 4 augmented rotation matrix of sensor s, where blkdiag(·) returns a block diagonal matrix of its inputs. The transformation and propagation are performed together as

x̄^s_m(n) = F_{τ^c_m − τ^s_k} ( R̄*_s x̂^s_m(n) + t̄*_s ),    (5)
C̄^s_m(n) = F_{τ^c_m − τ^s_k} R̄*_s C^s_m(n) (R̄*_s)^T (F_{τ^c_m − τ^s_k})^T,    (6)

with x̂^s_m(n) and C^s_m(n) being the state and covariance communicated by sensor s that have been selected in the current central slot, expressed in the reference frame of sensor s, while x̄^s_m(n) and C̄^s_m(n) are expressed in the reference frame of the FC. In Eq. (5) and Eq. (6), the state evolution matrix F_{τ^c_m − τ^s_k} projects the sensor state/covariance estimates forward by τ^c_m − τ^s_k, so that they are up to date with the current FC time. Similarly, the FC also performs a prediction step, for a time duration T_c, on all its maintained tracks, by leveraging their motion model. For this, the standard KF prediction equations are used.

2) Track association: The FC has to compute track-to-track associations before being able to fuse the information from the sensors with its own tracks, as it needs to identify which tracks correspond to the same target. There can be (i) sensor-to-center (SC) associations, to verify whether the sensor tracks correspond to any of the maintained central tracks, and (ii) sensor-to-sensor (SS) associations, only for sensor tracks which did not find an SC association, to establish which tracks correspond to the same targets and consequently initialize the correct number of central tracks. The aim is to find a one-to-one association between two sets of tracks, indexed by variables i and j, respectively. Note that i and j may refer to two sensor tracks, in case of an SS association, or to a central track and a
sensor track, in case of an SC association.
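The transformation and propagation steps above can be sketched in Python with NumPy. This is an illustrative implementation, assuming a 2D constant-velocity state layout $[p_x, p_y, v_x, v_y]$; the function name is ours.

```python
import numpy as np

def to_fc_frame(x_s, C_s, R, t, dt):
    """Transform a sensor track (state [px, py, vx, vy] and covariance) into
    the FC reference frame, then propagate it forward by dt seconds (CV model)."""
    # Augmented rotation acts on both the position and the velocity components.
    R_aug = np.block([[R, np.zeros((2, 2))], [np.zeros((2, 2)), R]])
    # Augmented translation shifts the position only.
    t_aug = np.concatenate([t, np.zeros(2)])
    # Constant-velocity state-evolution matrix for a step of dt seconds.
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt
    x_fc = F @ (R_aug @ x_s + t_aug)
    C_fc = F @ R_aug @ C_s @ R_aug.T @ F.T
    return x_fc, C_fc
```

Note that the translation leaves the velocity untouched, while the rotation is applied to both blocks of the state, matching the block-diagonal structure of the augmented rotation matrix.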
Initially, SC associations are considered. As the first step, it is verified whether associations from previous time steps are still valid. To this purpose, unique identifiers associated with each sensor, central track, and sensor track are exploited. Every sensor-to-center track pair that has a correspondence in previous SC associations is examined to verify whether the association still holds. This operation consists in computing the Mahalanobis distance [31] between the two tracks and confirming the association only if the distance is lower than, or equal to, a threshold $A_{th}$. Formally, it is computed as

$d_M(i, j) = (\mathbf{x}(i) - \mathbf{x}(j))^T \mathbf{P}(i, j) (\mathbf{x}(i) - \mathbf{x}(j))$, (9)

where $\mathbf{P}(i, j)$ is the precision matrix inducing the distance. All track pairs for which $d_M(i, j) \leq A_{th}$ are retained as valid SC associations, while the remaining ones undergo the following further association stages. Assume $M$ central tracks and $N$ sensor tracks are left to be associated. An $M \times N$ cost matrix, $\mathbf{\Lambda}$, is obtained, where the value of entry $\Lambda_{ij}$ is computed differently depending on the relationship between central track $i$ and sensor track $j$.
If $i$ and $j$ were previously fused together within a time interval of $T_{th}$ seconds, then $\Lambda_{ij}$ is computed as in Eq. (9), with the difference that each track state estimate $\mathbf{x}(q)$ and each covariance matrix $\mathbf{C}(q)$, $q = i, j$, is replaced by $\mathbf{x}_{dec}(q) = \mathbf{x}(q) - \tilde{\mathbf{x}}(j)$ and $\mathbf{C}_{dec}(q) = (\mathbf{C}(q)^{-1} - \tilde{\mathbf{C}}(j)^{-1})^{-1}$, respectively, where $\tilde{\mathbf{x}}(j)$ and $\tilde{\mathbf{C}}(j)$ are the last communicated state and covariance matrix from track $j$. $\mathbf{x}_{dec}(q)$ and $\mathbf{C}_{dec}(q)$ represent the decorrelated versions of the corresponding quantities, according to the decorrelation principle [26], [32]. The decorrelation operation removes the effect of previous fusion events that would otherwise affect the computation of the association cost [26].
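This decorrelation step can be sketched directly from the formulas above. The sketch assumes the difference of the precision matrices stays positive definite (i.e., the current covariance is tighter than the last communicated one); the function name is illustrative.

```python
import numpy as np

def decorrelate(x, C, x_last, C_last):
    """Decorrelated state and covariance used in the association cost:
    the covariance follows (C^-1 - C_last^-1)^-1 and the state is taken
    relative to the last communicated estimate, as stated in the text."""
    C_dec = np.linalg.inv(np.linalg.inv(C) - np.linalg.inv(C_last))
    x_dec = x - x_last
    return x_dec, C_dec
```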
If $i$ and $j$ were never fused together, or the fusion event happened more than $T_{th}$ seconds before, then $\Lambda_{ij} = d_M(i, j)$, as in Eq. (9), without modifications. Once matrix $\mathbf{\Lambda}$ is available, the minimum total cost association is obtained by using, e.g., the Hungarian algorithm [30], and all associations whose cost does not exceed the threshold $A_{th}$ are considered valid SC associations.
After these operations, all acceptable SC associations have been established and only SS associations are left to be computed. Let $i$ and $j$ be two tracks from different sensors. Then, a similar cost matrix $\mathbf{\Lambda}$ is built, where $\Lambda_{ij} = d_M(i, j)$, as in Eq. (9). Since the sensor tracks originate from different sensors, they are negligibly correlated and there is no need to apply any decorrelation operation to them. Then, the minimum total cost association is obtained, as before, using, e.g., the Hungarian algorithm [30]. In our experiments, we adopted $A_{th} = 18$ and $T_{th} = 1.3 \times T_c$.
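Both association stages reduce to the same gated, minimum-cost assignment, which could be sketched as follows. The choice of $\mathbf{P}(i, j)$ as the inverse of the summed covariances is an assumption of this sketch, as is the function name; the paper only specifies that a precision matrix induces the distance.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks_a, tracks_b, A_th=18.0):
    """Gated minimum-cost track-to-track association (usable for both the
    SC and SS stages). Each track is a (state, covariance) pair."""
    cost = np.zeros((len(tracks_a), len(tracks_b)))
    for i, (xi, Ci) in enumerate(tracks_a):
        for j, (xj, Cj) in enumerate(tracks_b):
            d = xi - xj
            cost[i, j] = d @ np.linalg.inv(Ci + Cj) @ d  # Mahalanobis cost, Eq. (9)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm [30]
    # Keep only the pairs whose cost does not exceed the gate A_th.
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] <= A_th]
```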
3) Radar track fusion algorithm: The track fusion algorithm behaves differently depending on whether it has to combine two sensor tracks (SS fusion) or one sensor track with a central track (SC fusion). If the FC is currently not maintaining any track for a certain subject, but one or more radars are, a new central track needs to be initialized based on the information received from the local sensors. Specifically, two cases may happen: (i) if a target $n_1$ is currently tracked by one sensor only, the corresponding central track is initialized using the state and covariance of the sensor track; (ii) if the FC receives two tracks that can be associated, say $\mathcal{T}^1_m(n_1)$ and $\mathcal{T}^2_m(n_2)$, these are fused into a single, new track associated with the target with central index $u$ (SS fusion). The local tracks from sensors 1 and 2 have uncorrelated (or negligibly correlated) errors, as they are two sensor tracks, so they can be fused with a weighted combination of their states [26]. The states are weighted by the precision matrices associated with the estimation errors at each sensor. The fusion equations used for the initialization of a new central track at time $m$, in case (ii), are

$\mathbf{C}_m(u) = \left[ \mathbf{C}^1_m(n_1)^{-1} + \mathbf{C}^2_m(n_2)^{-1} \right]^{-1}$, (11)

$\hat{\mathbf{x}}_m(u) = \mathbf{C}_m(u) \left[ \mathbf{C}^1_m(n_1)^{-1} \hat{\mathbf{x}}^1_m(n_1) + \mathbf{C}^2_m(n_2)^{-1} \hat{\mathbf{x}}^2_m(n_2) \right]$, (12)

for the couple of associated tracks $\mathcal{T}^1_m(n_1)$ and $\mathcal{T}^2_m(n_2)$. Note that, to detect when an SS fusion has to be performed, our algorithm applies the SS association procedure to all the sensor tracks that have not been associated with any central track in the current slot. If more than 2 sensors are available, the above process is repeated sequentially, using sensors 1 and 2 first, then fusing the resulting track with the information from sensor 3, and so on, until the track sets of all $S$ sensors are used.
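The precision-weighted combination used for SS fusion can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def ss_fuse(x1, C1, x2, C2):
    """Fuse two negligibly correlated sensor tracks: the states are weighted
    by the precision (inverse covariance) matrices of the two sensors."""
    P1, P2 = np.linalg.inv(C1), np.linalg.inv(C2)
    C = np.linalg.inv(P1 + P2)       # fused covariance
    x = C @ (P1 @ x1 + P2 @ x2)      # fused state
    return x, C
```

With equal covariances the fused state is the midpoint of the two estimates; a more confident sensor (smaller covariance) pulls the fused estimate towards its own track.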
On the other hand, if the FC has already initialized the track for a subject, the fusion has to be performed between the central track and a sensor track corresponding to the same subject. Denote by $\mathcal{T}^c_m$ the set of tracks maintained by the FC at the central time step $m$. Upon receiving the local information $\mathcal{T}^s_m$, the FC runs the track-to-track association algorithm to find pairs of corresponding tracks $\{\mathcal{T}^c_m(u), \mathcal{T}^s_m(n)\}$. Once such pairs are available, if the timestamp associated with $\mathcal{T}^s_m$ is older than $T_{th}$, then the same fusion rule of Eq. (11) and Eq. (12) is used, as the two tracks can be considered sufficiently decorrelated; otherwise, each central track is updated with its corresponding sensor track using the information decorrelation method [26], [32] as follows:

$\mathbf{C}'_m(u) = \left[ \mathbf{C}_m(u)^{-1} + \bar{\mathbf{C}}^s_m(n)^{-1} - \tilde{\mathbf{P}}^s(n) \right]^{-1}$, (13)

$\hat{\mathbf{x}}'_m(u) = \mathbf{C}'_m(u) \left[ \mathbf{C}_m(u)^{-1} \hat{\mathbf{x}}_m(u) + \bar{\mathbf{C}}^s_m(n)^{-1} \bar{\mathbf{x}}^s_m(n) - \tilde{\mathbf{P}}^s(n) \tilde{\mathbf{x}}^s(n) \right]$, (14)

where $\hat{\mathbf{x}}'_m(u)$ and $\mathbf{C}'_m(u)$ are the updated central state and covariance, and $\tilde{\mathbf{P}}^s(n)$ and $\tilde{\mathbf{x}}^s(n)$ are the last communicated precision matrix and state estimate of track $\mathcal{T}^s_m(n)$ from sensor $s$ to the FC. Information decorrelation copes with the problem of time-correlated tracks between the FC and the radar sensors by removing the most recently received information about target $n$ (or $u$, from the FC perspective), as, otherwise, this would be accounted for twice.
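In information (inverse covariance) form, the decorrelated SC update amounts to adding the new sensor information and subtracting the last communicated one. A sketch under these assumptions follows; the exact expressions in the paper may differ, and the function name is ours.

```python
import numpy as np

def sc_fuse(x_c, C_c, x_s, C_s, x_comm, P_comm):
    """Update a central track with a sensor track while removing the last
    communicated sensor information (x_comm, P_comm), so that it is not
    counted twice (information-decorrelation sketch)."""
    P_c, P_s = np.linalg.inv(C_c), np.linalg.inv(C_s)
    C_new = np.linalg.inv(P_c + P_s - P_comm)
    x_new = C_new @ (P_c @ x_c + P_s @ x_s - P_comm @ x_comm)
    return x_new, C_new
```

A useful sanity check of the decorrelation: if a sensor re-sends exactly the information it already communicated, the central track is left unchanged.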

4) Track initialization and termination:
To deal with the initialization and termination of central tracks, while keeping the complexity of the system as low as possible, we follow a so-called m/n logic, similar to what is done in the local tracking process of each radar sensor [4]. Specifically, a track is maintained if it is associated with any of the received sensor tracks for at least m out of the last n frames. Similarly, received sensor tracks that are not associated with any existing central track are initialized as new tracks if they are detected for at least m out of the last n frames. As detailed in Section IV-B.3, before initializing a new central track, the received selected tracks from the radars are associated and fused with an SS fusion step, whenever possible. In this way, we avoid multiple initializations of tracks corresponding to the same targets.
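The m/n counting rule can be sketched with a sliding window over the per-frame association outcomes. The class name and parameter values are illustrative; the specific m and n used by ORACLE are not restated here.

```python
from collections import deque

class MNLogic:
    """m-out-of-n rule: a central track is kept (or a new one confirmed)
    only if it was associated with a sensor track in at least m of the
    last n frames."""
    def __init__(self, m, n):
        self.m = m
        self.history = deque(maxlen=n)  # sliding window of the last n outcomes

    def update(self, associated):
        """Record the outcome of the current frame; return True if the
        track should be maintained/confirmed."""
        self.history.append(bool(associated))
        return sum(self.history) >= self.m
```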

V. EXPERIMENTAL RESULTS
We evaluated ORACLE through an implementation on the RadNet platform [14].

A. Implementation notes on numerical stability
In this section, we provide insights into the numerical stability of ORACLE, which are key to implementing the system in practice. Specifically, during our experiments, we observed that ORACLE's operations on covariance matrices (e.g., the information decorrelation or roto-translations) can easily make them (i) non positive definite and (ii) ill-conditioned (with very large condition numbers), causing wrong track associations and fusion results because of the spoiled inverse matrices. As a solution for (i), we reconstruct the matrix from its eigendecomposition after replacing the non-positive eigenvalues with a small positive constant $\epsilon$, obtaining

$\mathbf{P}_{pos} = \mathbf{V} \max(\mathbf{\Lambda}, \epsilon \mathbf{I}) \mathbf{V}^T$, (15)

where $\mathbf{V}$ and $\mathbf{\Lambda}$ contain the eigenvectors and eigenvalues of the original matrix and the $\max$ is applied entry-wise on the diagonal. This approach relies on the fact that a symmetric matrix is positive definite if and only if all its eigenvalues are positive. As a solution for (ii), we use Ridge regularization [33] to limit the condition number of the matrix. Let $\lambda_{min} > 0$ and $\lambda_{max} > 0$ be, respectively, the minimum and maximum eigenvalues of the positive definite matrix $\mathbf{P}_{pos}$. Denote by $\mathrm{cond}(\mathbf{P}_{pos}) = \lambda_{max}/\lambda_{min}$ the condition number of matrix $\mathbf{P}_{pos}$, and by $\delta$ the regularization parameter. The regularized covariance matrix, $\mathbf{P}_{reg}$, is obtained as

$\mathbf{P}_{reg} = \mathbf{P}_{pos} + \delta \mathbf{I}$. (16)

As a result of Eq. (16), the minimum and maximum eigenvalues of $\mathbf{P}_{reg}$ are $\lambda_{min} + \delta$ and $\lambda_{max} + \delta$, respectively. To limit the condition number of $\mathbf{P}_{reg}$, we specify an upper bound for its value, denoted by $c^*$. Then, such bound is enforced by computing $\delta$ from $\mathrm{cond}(\mathbf{P}_{reg}) = (\lambda_{max} + \delta)/(\lambda_{min} + \delta) \leq c^*$, which is solved with equality by

$\delta = \frac{\lambda_{max} - c^* \lambda_{min}}{c^* - 1}$. (17)

In our experiments, we empirically decided to adopt $c^* = 50$. ORACLE applies the correction in Eq. (15) whenever a covariance (or precision) matrix is non positive definite. The regularization in Eq. (16), instead, is used if the condition number of a covariance (or precision) matrix is above $c^*$.
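Both corrections can be sketched as follows. The eigenvalue floor `eps` is an assumption of this sketch; the condition-number bound follows the derivation above.

```python
import numpy as np

def make_pd(P, eps=1e-6):
    """Enforce symmetry and positive definiteness via the eigendecomposition,
    flooring the eigenvalues at a small positive constant."""
    P = 0.5 * (P + P.T)                      # symmetrize first
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.maximum(w, eps)) @ V.T

def ridge_regularize(P_pos, c_star=50.0):
    """Bound the condition number by c_star via Ridge regularization:
    P_reg = P_pos + delta * I, with delta solving
    (l_max + delta) / (l_min + delta) = c_star."""
    w = np.linalg.eigvalsh(P_pos)
    l_min, l_max = w[0], w[-1]
    if l_max / l_min <= c_star:              # already well conditioned
        return P_pos
    delta = (l_max - c_star * l_min) / (c_star - 1.0)
    return P_pos + delta * np.eye(P_pos.shape[0])
```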

B. Measurements setup and Dataset
To assess the performance of the proposed method, we conducted tests in a $7 \times 4$ m$^2$ research laboratory (see Fig. 3a) equipped with a motion tracking system featuring 10 cameras. This provides the ground truth (GT) 3D localization of a set of markers placed on the radars and on the moving subjects with millimeter-level accuracy. We considered 2 different scenarios with 4 radars and 1, 2, and 3 moving targets. Fig. 3 shows the locations and orientations of the radars in the different setups. We also asked the subjects to move according to 6 possible different trajectories: (i) in-line, identifying a movement along a straight line, one subject after the other; (ii) parallel, identifying a movement along parallel lines; (iii) circular, corresponding to the two subjects following parallel and circular trajectories; (iv) free, where all subjects could move freely in the room; (v) paraldiag, identifying a movement on parallel lines but with the subjects spaced apart along the movement directions; and (vi) vs-in-line, where the two subjects moved one towards the other following the same linear trajectory. All trajectories are depicted in Fig. 3. In total, we collected 55 sequences, each 40 s long. In every sequence, subjects were tracked by all 4 radars simultaneously and independently. Then, the tracking information was fused considering all possible combinations of 1 to 4 radars. However, the Jetson Nano DevKit edge computer was not able to properly track more than 2 targets simultaneously, and we noticed this issue only after the experiments. For this reason, in order to provide reliable results, we show results with up to 3 fused radars. After filtering out the corrupted data, and considering all the evaluated combinations, we analyzed a total of 220, 187, and 91 experiments for 1, 2, and 3 fused radars, respectively.

C. Evaluation metrics
To evaluate the self-calibration algorithm performance, we define the orientation error as the absolute value of the difference between the true orientation angle of a radar and the estimated one. The latter is derived from the corresponding rotation matrix, after calibration, as explained in Section III-C. The position error is defined as the Euclidean distance between the estimated position of the radar and its true position. In order to assess the tracking performance, we adopt the Multiple Object Tracking Accuracy (MOTA) metric, which accounts for the number of misses, false positives, and switches in the object detections, and the Multiple Object Tracking Precision (MOTP) metric, which represents the mean position error computed by considering only correctly tracked objects. More details about these metrics can be found in [34].
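Following the standard CLEAR-MOT definitions [34], the two metrics can be computed as in this simplified per-sequence sketch (function names are illustrative):

```python
def mota(misses, false_positives, switches, num_gt_objects):
    """Multiple Object Tracking Accuracy: 1 minus the total error rate,
    where errors are misses, false positives, and identity switches,
    normalized by the number of ground-truth objects over the sequence."""
    return 1.0 - (misses + false_positives + switches) / num_gt_objects

def motp(position_errors):
    """Multiple Object Tracking Precision: mean position error computed
    over correctly tracked (matched) objects only."""
    return sum(position_errors) / len(position_errors)
```

Note that MOTP is conditioned on successful tracking, which explains why a system that tracks targets in harder situations can report a slightly larger MOTP than one that only succeeds in easy cases.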

D. Self-calibration
As previously mentioned, in the first part of this work we present an enhanced version of mmSCALE [13], our self-calibration algorithm. The enhancement, consisting in the addition of the masking phase to the self-calibration procedure, allows handling a wider range of cases, where the previous version was more prone to errors, and achieves a general improvement in the accuracy of the calibration parameters estimation. To show the effect of the enhancement, we focus on the comparison between the old and the new version of the self-calibration algorithm.

Fig. 6: Example of a case where the old algorithm fails while the new one does not. In this case, linear, similar trajectories led to a mistake in the track association phase, resulting in a shift in the estimated location of the radar.
Qualitative results. Fig. 5 shows a qualitative example of the calibration process. Here, after finding the optimal rotation and translation parameters, we applied the rigid transformation to the trajectory seen by radar 2 (blue line, R2), so as to superimpose it on the one of radar 1 (orange line, R1). The transformed trajectory (green line) matches the reference one well, showing a good calibration result. We represent the reference radar with a red square (located at $[0, 0]^T$), while the black triangle and the purple square mark the estimated position of radar 2 and its GT, respectively.
As long as only one target is being tracked, the track association phase is easy and it is likely that no errors occur. As the number of tracked targets increases, the track association phase becomes more challenging. Our track association cost (see Section IV-A.2) leads to a correct association in most situations. However, there are some particular cases where it is not sufficient. Fig. 6 shows an example of one such case, with 3 subjects following a parallel trajectory. The blue lines represent the trajectories of the reference radar in its RS, the orange lines show the trajectories of another radar after transforming them into the reference radar's RS using the old self-calibration algorithm, while the green lines represent the trajectories from the same radar after transforming them using the new self-calibration algorithm. In this scenario, all trajectories are very similar, as the subjects proceed in parallel and at the same speed. Because of the very high correlation between the track positions over time, the association costs are very similar and the final association result depends on subtle numerical variations due to the tiny differences between the trajectory shapes. From the figure, we notice that this causes a wrong association between the tracks from the reference radar and the other one, resulting in a shifted estimate of the radar's RS, when using the old method. The new method, instead, correctly copes with this situation.
Position and orientation errors. Fig. 7 compares the calibration performance of the old and the new version of the self-calibration algorithm versus the number of targets tracked. When only one target is tracked, there is almost no difference between the two algorithms, while a clear improvement is observed as the number of targets increases. In particular, the new algorithm markedly reduces the spread of the box plots, meaning that it increases the number of cases it is able to handle correctly. Tab. 1 shows the numerical results in terms of median and interquartile range (IQR). It also shows the difference between the new and the old algorithm.

E. Fusion center tracking accuracy
We evaluate the performance of the fusion algorithm in two cases: (i) using the roto-translation parameters obtained from the GT; and (ii) using their estimates from the self-calibration algorithm. In order to evaluate ORACLE in a more realistic scenario, when using self-calibration, the transformation parameters are computed only once per setup-trajectory pair, that is, the first time a sequence with that setup-trajectory pair is processed. Then, all sequences with the same setup and trajectory type use the same parameters. Fig. 8 shows the tracking results versus the number of targets tracked and the number of radars used for the fusion. In the figure, xT-SF and xT-GT denote a case where x targets are tracked and, respectively, self-calibration or GT is used. The error bars represent one standard deviation for the self-calibration results. The values are computed as the average over all the sequences with the same number of radars and targets. Numerical results using self-calibration are presented in Tab. 2.
When only one target is tracked, there is almost no difference between using a single radar or multiple fused radars (+2%). As the number of tracked people increases, single sensors experience a remarkable decrease in the MOTA (−35%), while the FC maintains high performance, with a MOTA as high as 87% when 3 targets are tracked, leading to an improvement with respect to single sensors of +27%. This is due to the fact that multiple targets may often create occlusions with respect to single radars, increasing the number of misses and switches in the tracks. Instead, occlusions can be mitigated by fusing data from different points of view. We also note that GT and self-calibration achieve very similar results in terms of MOTA.
In general, MOTP slightly increases in the fused tracking. The reason is twofold. First, noise can be incorporated during the fusion process and slightly affect the localization performance. Second, MOTP is computed only for correctly tracked targets. Because a single radar's tracking capability is limited, successful tracking occurs only in sufficiently simple scenarios, where MOTP is naturally low. On the contrary, multiple radars track targets successfully even in more complicated cases. This extends the set of points for which we compute MOTP to include more challenging and inevitably less precise location and movement estimates. Interestingly, for the 1T and 2T cases, after increasing from 1 to 2 radars, MOTP is almost constant when moving from 2 to 3 fused radars, suggesting that this could also be the case if more radars were fused. Following the 3T lines, instead, MOTP increases from 1 to 2 radars and then decreases when 3 radars are used, reaching the same values as the 2T lines. A possible explanation for this is that 2 radars are not enough for tracking 3 targets, leading to errors in the track associations that cause the MOTP to increase. 3 radars, instead, have better tracking capabilities and can better handle 3 targets. For the most challenging scenario (3 targets), MOTP is 31 cm when 2 radars are used and 23 cm when 3 radars are used. MOTP is slightly lower when using GT rather than self-calibration, because of the more precise knowledge about the sensors' position and orientation.
As a final test, we acquired some sequences fusing all 4 radars on various 2-target trajectories, obtaining a MOTA and MOTP of 95% and 20 cm, respectively, while single radars on the same sequences reach a MOTA and MOTP of 78% and 11 cm, respectively. Since, for this test, we only collected a few sequences, which do not represent a statistically significant set, we present them only as an example. These results show that our self-calibration algorithm works well in combination with the proposed fusion algorithm, and that they can be effectively used together to enable occlusion-resilient people tracking through a self-calibrated radar network, requiring almost no human intervention.

F. Robustness to reduced fusion rate
In certain resource-constrained applications, it may be useful to reduce the rate at which the FC processes the sensors' data, in order to lower the computational burden. However, this requires striking a balance between fusion rate and tracking accuracy, as decreasing the processing rate reduces the capability of the FC to follow the movement of the subjects. In Fig. 9, we show the MOTA and MOTP curves as a function of the ratio $T_c/T_s$. This is varied by fixing $T_s = 66.7$ ms (15 fps) and changing $T_c$ from $0.2 T_s$ to $25 T_s$. Values are obtained by averaging all experiments with 3 radars. We can identify three regions. (i) For $T_c/T_s < 0.8$, the MOTA is very low, as the FC runs significantly faster than the sensors and, therefore, has to rely mostly on the KF predictions, which become inaccurate after a few consecutive steps. The MOTP, instead, is unaffected, as it is computed only on the successfully tracked subjects. (ii) For $0.8 \leq T_c/T_s \leq 5$, our system achieves the best performance in terms of MOTA, i.e., over 90%. At the same time, the MOTP is still low, with errors of less than 29.4 cm when using self-calibration with $T_c/T_s = 5$. This shows that, if necessary, the processing load on the FC can be reduced by 5 times with negligible performance degradation.
(iii) For $T_c/T_s > 5$, MOTA degrades slowly and MOTP increases. This is because the time step of the FC, especially towards the end of this region, is too long to accurately follow human movement using the CV model. Finally, we notice that the MOTA is almost unaffected by using self-calibration in place of the GT sensor locations and orientations. This holds for all regions (i)-(iii). However, as expected, the MOTP is slightly worse when self-calibration is used, as the residual error in the locations of the sensors indirectly affects the FC tracking precision.

G. Effect of radars' location on fused tracking
When tackling the problem of people tracking through multiple radars, it is interesting to explore how different radar deployments affect the results. A full study of this kind is beyond the scope of this paper and would require a denser deployment of radars. For this reason, here we only show some preliminary results, leaving a deeper inspection of the problem to future developments. In Fig. 10, we compare the results for 3 trajectories and some specific combinations of radars for the fusion. In particular, according to setup-1 (see Fig. 3, first row), combination (1, 3) corresponds to two radars facing each other, combination (2, 3) has perpendicular radars, while combination (1, 2, 3) fuses all three. For each combination, results are computed using self-calibration and averaging over all sequences featuring the particular trajectory, with either 1, 2, or 3 targets. Considering MOTA, perpendicular radars are generally better than facing radars, while fusing 3 radars always provides the best results. Regarding MOTP, there is no clear trend common to all trajectories. The median MOTP is always within the [0.20, 0.25] m interval, meaning there are no large differences between the combinations. The only case worth mentioning is the one featuring the free trajectory and the (1, 3) combination, where MOTP values are generally higher than in the other cases. In conclusion, from this brief analysis, it appears that (i) a larger number of radars is to be preferred over a smaller one, and (ii) radars with more diverse points of view provide, in general, better tracking results.

VI. CONCLUSIONS
In this work, we presented ORACLE, a solution to the mmWave radar network deployment and integration problem for human sensing purposes. First, ORACLE automatically estimates the relative position and orientation of the radars with respect to a common reference system. Then, it exploits such estimates to fuse the information about people tracked by different radars at a fusion center, enhancing the resilience of the subject localization in case of occlusions. ORACLE estimates the radars' position and orientation with median errors of 0.12 m and 0.03°, respectively, exploiting the movement trajectories of the tracked people. With respect to existing self-calibration techniques, ORACLE is more robust to multiple subjects concurrently moving in the environment, with no need to follow any predetermined trajectory for the calibration. By fusing the tracking information of multiple radars, ORACLE improves on single sensors by up to 27% in mean tracking accuracy, with a mean precision of 23 cm in the most challenging case of 3 moving targets. Finally, ORACLE handles different time steps for the single sensors and for the FC, keeping the tracking accuracy higher than 90% when the ratio between the central and the sensor time steps is $0.8 \leq T_c/T_s \leq 5$. These results substantiate ORACLE as a key technology enabler for distributed people tracking with radar networks, serving as a base system for a large variety of applications, from personnel recognition to restricted-area monitoring, elderly care, customer profiling, and many others.
Future research includes the extension of ORACLE to multiple disjoint radar networks (where none of the radars of a network shares part of its FoV with any radar of the other networks), and the study of how different points of view of the same scene affect the tracking performance.

Fig. 3: Setup schemes. The black numbered dots represent the radar devices and the arrows identify their pointing direction. The blue dots represent the moving people, while the dashed lines show the traveled trajectories in the direction given by the blue arrows. The first row shows setup-1 deployments, whereas the second row shows setup-2 deployments.

Fig. 4: Experimental setup. The fusion center, not shown in the picture, is connected to the edge computers through a switch.

Fig. 5: Example of self-calibration with a free trajectory.

Fig. 7: Comparison of the self-calibration results when using the old and the new self-calibration algorithm as a function of the number of targets tracked during the calibration phase.

Fig. 8: Average MOTA and MOTP as a function of the number of radars used for the fusion and of the number of targets tracked. Solid and dashed lines identify, respectively, results obtained using self-calibration (SF) and ground truth (GT) to estimate the radars' location and orientation. The error bars represent one standard deviation for the SF case.

Fig. 9: MOTA and MOTP as a function of the ratio $T_c/T_s$.

TABLE 1: Comparison of self-calibration algorithms

TABLE 2: Summary of ORACLE's tracking performance