Localizing Unsynchronized Sensors with Unknown Sources

We propose a method for sensor array self-localization using a set of sources at unknown locations. The sources produce signals whose times of arrival are registered at the sensors. We look at the general case where neither the emission times of the sources nor the reference time frames of the receivers are known. Unlike previous work, our method directly recovers the array geometry, instead of first estimating the timing information. The key component is a new loss function which is insensitive to the unknown timings. We cast the problem as a minimization of a non-convex functional of the Euclidean distance matrix of microphones and sources subject to certain non-convex constraints. After convexification, we obtain a semidefinite relaxation which gives an approximate solution; subsequent refinement on the proposed loss via the Levenberg-Marquardt scheme gives the final locations. Our method achieves state-of-the-art performance in terms of reconstruction accuracy, speed, and the ability to work with a small number of sources and receivers. It can also handle missing measurements and exploit prior geometric and temporal knowledge, for example if either the receiver offsets or the emission times are known, or if the array contains compact subarrays with known geometry.


I. INTRODUCTION
In MIMO radar [1], ultrasound imaging [2], underwater acoustics [3], time-reversal [4], [5], and room acoustics [6], [7] a collection of sources emit signals that are then captured by the receivers. In these applications, we often need an accurate knowledge of the geometry of the source and receiver positions to proceed with common array processing algorithms such as beamforming and source localization. Knowing the sensor array geometry similarly enables the reconstruction of physical fields in environmental monitoring using ad hoc sensor networks [8], [9]. Further, knowing device locations enables a host of location-based services in the context of the Internet of Things [10]. The recurring quintessential problem is thus to efficiently estimate the geometry of a set of nodes.
We study estimating the geometry of the nodes from the times of arrival (TOAs) of the source signals. The setup is illustrated in Figure 1 where receivers register TOAs of source events. Some events can be (near-)collocated with the nodes, such is the case with transceivers. If the sources are anchors with known positions, locating the nodes becomes an exercise in geometry: intersecting spheres or hyperboloids depending In  on whether or not the devices are synchronized. However, the most general and practically appealing setup is when the sources are at arbitrary, unknown locations. This enables one to use signals of opportunity such as speech, transient sounds, or radio signals [11]. But it also means that receivers are no longer synchronized with the sources. To complicate things further, receivers themselves do not have to be mutually synchronized. While in many audio applications, microphones are connected to a common interface, we are also surrounded by ad hoc networks of smartphones and voice-based assistants. Thus, not only do we measure times of arrival as opposed to absolute times of flight, but those registered times of arrival further depend on the unknown time reference frame of each receiver. In this paper, we introduce an algorithm for jointly localizing a set of receivers and a set of sources from measured TOAs in the general case when all nodes are not synchronized: sources go off at unknown times and the reference time frame of each receiver can be different and unknown.

A. Notation
We denote matrices with capital letters (S , R) and vectors with lowercase letters ( s, r). Uppercase subscripts indicate the size, e.g., J L is an L × L matrix, and R M ×K is an M × K matrix. Lowercase subscripts d ij indicate the element at the i th row and j th column of a matrix. A superscript T as in J T denotes the matrix transpose. Lowercase superscripts x (k) denote the k th iteration. The real d-space is denoted R d . The vectorization operator is denoted vec(x). We use the diagonalization operator where diag(G) is a vector of the elements on the diagonal of the matrix. Similarly, diag(σ 1 , . . . , σ N ) is an N × N matrix with σ 1 , . . . , σ N on the diagonal.

B. Related Work
Among the earliest work on self-localization are the papers of Rockah and Schultheiss [12], [13] and Weiss and Friedlander [14] from the 1980s. Plinge et al. [15] provide an indepth overview of microphone array localization techniques. A systematization of existing approaches to self-localization is also given by Wang et al. [16]. In the following, we review some of the major points, related to localization in different setups. An overview is also shown in Table I.
In some cases, the pairwise distance between all the nodes can be estimated. This happens for example when the nodes can both send and receive [35] or by measuring the diffuse noise coherence [36]. Localization then amounts to multidimensional scaling (MDS) [17], [18], [19], [20].  Fig. 1. The problem setup addressed in this paper. Source events (such as sounds in the environment produced by people or loudspeakers, or electromagnetic signals of opportunity) are captured by the receivers. In this illustration, there are three loudspeakers (sources) with emission times {τ k } 3 k=1 . We also have three microphones (receivers) numbered 1, 2, 3 with internal times of reference {σm} 3 m=1 relative to a global reference (not numbered). The time of arrival t mk at each microphone depends on the source's emission time τ k and the microphone's offset σm. Not all source events reach all receivers, thus some entries t mk might be missing.

Approach
Input Unknowns Challenges Multidimensional scaling (MDS) [17], [18], [19], [20] Pairwise distances -Requires full synchronization Multidimensional unfolding [21], [22], ML optimization [14] Distances between nodes/events -Requires full synchronization, Bad local minima SDP relaxations [23], [24], [25], [26], [27], [28], [29] Distances/TDOA/FDOA -Requires anchor nodes or positions of the sensor nodes Majorization [30], Two-stage [31], [32], [33], [34] TDOA Source or receiver offsets Bad local minima, cannot handle near-minimal configurations Two-stage [16] TDOA Source & receiver offsets Slow, cannot handle near-minimal configurations Proposed TOA/TDOA Source & receiver offsets -A more common situation in audio applications is that the nodes can only receive or only send. The "sending" nodes need not be real devices; they can be any acoustic events or signals of opportunity. We can distinguish the case when the sources and the receivers are synchronized so we can estimate the source-receiver distances, or the various cases when sources, receivers, or both sources and receivers are not synchronized. a) Known Source Emission Times and Receiver Offsets: If the emission times happen to be known and the nodes and events all have a common time reference (that is, they are synchronized) 1 , then the TOAs correspond to times of flight and directly give the distances between the nodes and the events. Given these distances, the joint localization problem reduces to multidimensional unfolding [21], [22]. Some methods are based on direct optimization of the maximum likelihood (ML) criterion [14], but these often fail due to the non-convexity of the objective and the existence of bad local minima [16]. Other methods are based on Euclidean distance matrices and semidefinite programming (SDP) [37].
Crocco et al. [22] proceed by constructing a certain lowrank matrix of differences of squared TOAs from which the positions can be recovered by solving a low-dimensional non-convex optimization problem as follows. The low-rank matrix of differences of squared TOAs [22] is related to the (translated) source positions S and receiver positions R as H ≈R TS , where the latter is the matrix of inner products 1 Additionally, the internal delays of the receivers are known.
between centered receiver points and centered source points. The estimated H is decomposed using singular value decomposition as H = U ΣV T . Up to a translation due to centering, the source and receiver locations are thenS = Q −1 ΣV T and R = (U Q) T for some unknown invertible 3 × 3 matrix Q.
Since we have that H =R T QQ −1S , finding the correct relative geometry (between R and S ) amounts to identifying the right Q and the difference vector between the center of R and the center of S which is done via non-convex optimization and can suffer from local optima, especially in the presence of noise. In our approach, we avoid the need to estimate Q to stitch R and S together by working with the full point set X = [R; S ] and the corresponding Gram matrix, instead of splitting X into R and S . The Gram matrix and its constraints then automatically encode the relative geometry and glue together the sources and the receivers (see Section III-A).
b) When One Set of Times is Known: A more realistic scenario is when the source emission times are unknown and different but the microphones are synchronized. The reverse case is also relevant: when the receivers are not synchronized but the source emission times are known. This is the case if we use a distributed sensor array.
The two unknowns (geometry and one set of times) can be estimated by directly optimizing an objective, for example using majorization [30]. Again, this often fails due to the nonconvexity of the problem and cannot work with near-minimal configurations.
An alternative approach is to estimate the unknowns sequentially, first recovering the unknown times, and then using them to recover the geometry [31], [32]. Early work of Pollefeys and Nister [38] exploits the low rank of a certain matrix of squared TOA differences. Their work is a near-field generalization of the work of Thrun [33], which was also adapted to work with missing measurements [34]. Heusdens and Gaubitch propose a more robust scheme based on structured total-least-squares [39] to reconstruct the times. 2 c) Unknown Emission Times and Receiver Offsets: The closest to our work is the method of Wang et al. [16] who treat the most general case with unknown source emission times and receiver offsets. Same as the methods that address the case of one unknown set of offsets [39], Wang et al. proceed in two stages. The timing information is estimated via structured least-squares (Gauss-Newton), noting that with the correct timing estimates a certain matrix computed from the measured T(D)OA measurements must become low rank. This timing estimation procedure may require numerous random restarts. Once the timings are estimated, they proceed as Crocco et al. [22] to recover the geometry. d) Convex Relaxations: In the context of sensor network localization (with fixed anchor nodes), Biswas et al. [23], [24] proposed to use SDP relaxations [28], [29] to handle the non-convexity introduced by the square root in the distance measurements. Similar relaxations have since also been used for source node localization from TDOA measurements (see e.g. Yang et al. [25], Vaghefi and Beuhrer [26]) and mixed TDOA/FDOA measurements (Wang et al. [27]). Yang et al. [25] also proposed a SOCP relaxation; however, it requires that the target node lies within the convex hull of the anchors. Jiang et al. [40] propose to use Truncated Nuclear Norm Regularization (TNNR) to solve the TDOA self-calibration problem. Their optimization scheme consists of solving a sequence of convex surrogate problems based on the (convex) nuclear norm. We propose SDP relaxations for the full selfcalibration problem, where both senders are receivers are unknown and not synchronized. e) Minimal Estimation Problems: A series of work considers minimal problems in TOA, where the goal is to estimate the unknowns from as few measurements as possible. Following up on their work on minimal problems for TOA measurements [41], Kuang et al. [42] propose a stratified approach for estimating the unknown time offsets from TDOA measurements. Once the offsets are recovered, the solvers from their previous work [41] can be applied to recover the sender and receiver positions. Zhayida et al. [43] (and later Farmani et al. [44]) considered the minimal solutions to the special case of dual microphone setups. Burgess et al. [45] proposed solutions for settings where either the sender or the receivers lie in a lower dimensional space. Batstone et al. [46] consider the case of constant offset TDOA self-calibration (i.e. transmissions have a known period but unknown offset); this is also a stratified approach which first solves for the 2 Heusdens and Gaubitch in fact address the case when the microphone delays are unknown and the source is periodic; this is equivalent to only having unknown microphone delays; see Section IV-C. unknown time offset [42]. We show that our approach can also succeed in near-minimal configurations.

C. Contributions
Our work is motivated by the fact that the two-stage procedure is suboptimal. The (valid) reason to adopt sequential estimation are poor local minima in the loss function which involves both the times and the positions. From a statistical point of view, however, joint estimation is preferred. Further, timing estimation can fail or require many random restarts; the same holds for the non-convex optimization to recover the relative configurations fromR TS . The two-stage procedure also makes it challenging to exploit prior geometric information such us known distances or distance bounds. In light of related work and the above discussions, we summarize our contributions as follows: • What we are usually interested in are the sensor positions. We thus formulate a new self-localization loss which is invariant to offsets and delays. The proposed loss uses non-squared distances as opposed to the usual squared; we prove that minimizing this objective yields the maximum likelihood estimate of the positions under Gaussian noise. It enables us to recover the geometry without worrying about the times and without random restarts. • We formulate a non-convex semidefinite optimization problem in terms of this new loss. We work with the full point set X = [R; S ] and the corresponding Gram matrix. The Gram matrix and its constraints intrinsically glue together the sources and the receivers and thus obviate the need for the non-convex optimization step of Crocco et al. [22] (see Section III-A); • Although our method does not require synchronization, it can leverage synchronization among the receivers or the sources if it is fully or partially present [39], [47], [48]. It can further leverage geometric side information such as when the receivers contain subarrays with known relative geometry or some sources and receivers are collocated. Finally, it can handle missing measurements. The proposed method achieves state-of-the art results, not only in terms of localization accuracy, runtime, or exploiting side information, but also in the ability to solve near-minimal problems with few nodes. Finally, we note that as suggested by Ono [30], our method can also be interpreted as a virtual synchronization method.

II. PROBLEM STATEMENT
We wish to localize a set of M receivers with unknown positions r 1 , . . . , r M ∈ R d using a set of K sources with unknown positions s 1 , . . . , s K ∈ R d . We assume that the sources emit pulses whose times of arrival (TOA) can be measured at the receivers. 3 Since we do not assume synchronization between the sources and the receivers, the absolute emission time τ k of the kth source is unknown. Since the sources are not necessarily synchronized among themselves, the differences τ k − τ k are also unknown. Similarly, since we do not assume synchronization among receivers (nor knowing their internal delays), the temporal frames of reference of each receiver are shifted by an unknown σ m with respect to some reference clock. The times of arrival of the kth source signal at the mth receiver thus become where for simplicity we let both σ m and τ k have arbitrary sign. We assume the transmission speed v to be known and w.l.o.g. let v = 1 for the remainder of the paper. Note that if we instead use the time-difference-of-arrival (TDOA), then with the first sensor as the reference, the TDOA ist And so we can still think of the TDOAt mk as a TOA but with modified emission times and offsets.

A. Minimal Number of Sources and Receivers
Common sensor localization scenarios are either in the horizontal plane (d = 2) or in 3D space (d = 3). The number of the degrees of freedom in the positions of sources and receivers is then d(M + K). However, since rigid motions of the entire setup cannot be distinguished from TOA data, we can choose the associated d(d+1)/2 degrees of freedom freely (d translational and d 2 rotational). Each source and receiver come with an additional unknown time, which gives another M + K degrees of freedom. However, we can choose a global time offset arbitrarily, which is another gauge of invariance and subtracts one degree of freedom. The total number of the degrees of freedom is thus As measurements we get M K TOAs. In order to get a dimension-zero solution set (a solution set that is a finite or countable set of points, but not necessarily unique), this number should at least match the number of the degrees of freedom. Solving for the minimal number of sources as a function of the number of receivers, we obtain the following relation implying that the smallest number of receivers that can be localized is M = 5 for which at least K = 13 sources are required to get a dimension-zero solution. Some other minimal cases are: (M = 7, K = 7), (M = 13, K = 5).
As we demonstrate with real and numerical experiments in Section VI, our algorithms can operate in the vicinity of this minimal regime, where previous methods either fail or perform poorly.

III. A LOSS INSENSITIVE TO OFFSETS AND DELAYS
Instead of proceeding sequentially by first estimating the unknown times as in the case of the state-of-the-art methods [39], [16], we note that the primary task is usually position estimation rather than timing estimation. Besides, when the positions are known, estimating the timings {σ m } M m=1 and {τ k } K k=1 boils down to a simple algebraic exercise (see Section III-D). We thus devise a data fidelity metric which is insensitive to the unknown timings, yet is only minimized with the correct positions.
We begin by writing (1) in matrix form as where T . Now we make the key observation: multiplying (2) on both sides by a matrix which has an all-ones vector in the nullspace will annihilate the two terms that depend on the unknown times.
Given an integer L, let J L be a geometric centering matrix of size L, 4 J It is easy to verify that does not depend on the unknown times. In fact, we can say more.
No similar statement can be made here simply because ∆ holds non-squared distances, and because it only corresponds to one off-diagonal block of D.
Proof. The nullspace of the linear operator Note that not all of those M K matrices are linearly independent, but the argument does not hinge on independence. Every matrix in the argument of span correspond to adding a constant offset to either a source or a receiver. As a consequence, any nullspace vector can be written as a linear combination of source / receiver offsets which proves the proposition.
What Proposition 1 says is that J M ∆J K is sufficient to determine the positions in the sense that the only locations such that this transformation matches the measurements are the true ones (if the original problem is solvable).
We can then propose the following timing-invariant objective for localization where · F denotes the Frobenius norm, R ∈ R d×M are the receiver positions, and S ∈ R d×K are the source positions. This objective is justified by the following direct corollary of Proposition 1.
Corollary 1. If the unsynchronized localization problem has a unique solution (up to a rigid transformation), then vanishing loss in (5) implies correct localization.
Further, the loss (5) is in fact equivalent to the ML loss given the offsets σ and τ . Namely, assuming i.i.d. Gaussian noise on the TOA measurements we have To see this note that Inserting into the original cost we get Solving for τ similarly yields f M L (∆, σ , τ ) = f (∆).

A. Characterization in Terms of the Full Gram Matrix
As described in Section I-B, a number of state-of-the-art methods based on low-rank matrix decompositions recover the matrix H =R TS , withR andS being some translated (centered) versions of R and S [22], [39], [16]. Matrix factorizations such as singular value decomposition can only determineR andS up to a multiplication by an arbitrary invertible matrix Q, since for any such Q we have that H = R T S , with R = Q T R, S = Q −1 S . Finding the right Q (which can be assumed to be upper-triangular which fixes the rotational gauge freedom) and the (one) translation vector is then achieved via direct non-convex optimization, though over a lower dimensional space than the original problem.
We sidestep this problem of having to determine Q in order to stitch together R and S by writing the measurements in terms of the full point set where N = M +K, and always working with the entire X . We denote the corresponding full Gram matrix by G = X T X ∈ R N ×N . The off-diagonal blocks of the Gram matrix G encode the relative geometry between R and S -by recovering G, we know that relative geometry. Note that this only resolves the relative ambiguity between R and S which is an artifact of the previous methods. The inherent invariance of localization from distances to Euclidean motions remains.
The matrix of squared distances (the EDM) between the column vectors in X can be written as a linear function of the Gram matrix. Letting D = ( x n − x n 2 ) N n,n =1 , one has with diag(G) = [g 11 , . . . , g N N ] T . This means that the matrix of distances ∆ between sources and receivers can be written as an entrywise square root of a linear function of G, since The constraint (10c) resolves the translation ambiguity; it is equivalent to the point set being centered around the origin 1 N n x n = 0. Note that now the receivers and sources are being translated together, unlike in earlier work where they were split. Further, the estimated locations are directly read out from a factorization of G, without the need for stitching.

B. Semidefinite Relaxation
The formulation (10a) remains nonconvex. Thus, direct approaches such as randomly-initialized first-order methods (e.g., projected gradient descent) are likely to get stuck in a local minimum. We thus propose to instead solve a convex relaxation of the problem.
There are two sources of nonconvexity: the entrywise square root in the objective, and the rank constraint on the Gram matrix. We begin by replacing the entrywise square root • L(G) by a new matrix variable B, and adding the constraint that the entrywise square of B must equal L(G).
Concretely, we write the objective as and ask that B •2 = L(G) to obtain an equivalent formulation. We now proceed by reformulating the added constraint as a rank constraint on some semidefinite matrix (or matrices). Since the entrywise square does not mix the entries of B, we can add such constraints on a per-entry basis. This is computationally more efficient (and empirically works better) than working with the entire vectorization of B at once.
We can equivalently write the constraint b 2 mk = L(G) mn as All the non-convexity is now lumped into the rank constraints, rank(G) = d and rank(L mk ) = 1 for 1 ≤ m, k ≤ M, K. The final step is to relax all of the rank constraints to get L mk 0, for all 1 ≤ m, n ≤ M, K, (14d) We can finally reconstruct the point set from the recovered G using singular value decomposition. We have G = U ΛV , where Λ = diag(σ 1 , . . . , σ N ) with the singular values σ i assumed to be sorted in decreasing order. We reconstruct the point set usingX = [diag(σ 1 , . . . , σ d ), 0 d×(N −d) ]V . If the rank of the estimated G is truly d, as it ideally should be since it describes a ddimensional point set, then the trailing singular values would be zero anyway.

C. Refinement by Levenberg-Marquardt
In general, solving the above semidefinite program gives point configurations which are good coarse approximations to the true geometry, although they are not good enough for most applications. The reason can be tracked down to the fact that after relaxing the rank constraints, the estimated matrices have higher rank than desired. Still, the positive semidefinite constraints constrain geometry sufficiently to get decent estimates in many situations. One strategy would be to employ various rank minimization strategies. Informally, we tried this and it works decently but it is very slow.
Instead, we propose to refine the output of the SDR using the Levenberg-Marquardt (LM) algorithm. It is fast, accurate, and it works better. We have also empirically observed that simple gradient descent is much slower requiring more iterations to converge.
To derive the LM updates, we need to compute the Jacobian of the loss in order to linearize it. The loss can be written as and δ = vec(∆) with ⊗ denoting the Kronecker product. Since ∆ : R d×(M +K) → R M ×K , the Jacobian is a tensor in R M ×K×d×(M +K) . We can compute with the understanding that the first M slices in the last index of D correspond to the receivers and the last K to the sources. Note that the Jacobian is not well defined at r m = s k . In that case, it would be possible to remove the corresponding variables from the optimization and adjust the Jacobian accordingly.
To make things easier to compute with, we rewrite T as a function of a single vector and rearrange the output into a vector so that δ( x) = vec(∆(R, S )). The Jacobian D δ is obtained from DT by collapsing the first two dimensions and the last two dimensions so that D δ ∈ R M K×d(M +K) . Finally the Jacobian of f is computed as and we have an affine approximation of f as (with some abuse of notation) This leads to the LM update, which is solved by The full localization algorithm is summarized in Algorithm 1.

Algorithm 1 Unsynchronized Sensor Localization.
Require: TOA T ∈ R M ×K Ensure: Positions of receivers R ∈ R d×M and sources S ∈ R d×K Initialize R and S using SDR (14). Update R and S using LM.
We use the SeDuMi solver [49] for the optimization problem (14). It has a O( √ n log 1 ) worst-case time complexity per iteration where n = d(M + K) is the number of variables. The LM refinement is O(d 2 (M + K) 2 M K) per iteration.

D. Estimating Offsets and Delays
Once an estimate ∆ of ∆ is computed, we can also compute the unknown offsets τ and delays σ. We start by noting from (2) that where the right hand side is linear in ( σ, τ ). We exploit it by vectorizing both sides: with V := 1 K ⊗I M , W := I K ⊗1 M . From here, estimating the times vector amounts to solving an overdetermined system of linear equations. Note that the system matrix has a nullspace of dimension one (spanned by α1) due to the global offset ambiguity. To eliminate the offset ambiguity we arbitrarily set σ 1 = 0, and let V be V with the first column removed. The least-squares estimate of the unknown timings is then given as with σ = [ σ 2 , . . . , σ M ] T and ( · ) † denoting the Moore-Penrose pseudoinverse.

IV. VARIANTS
In many cases, we have some prior information about the distances or offset times. The proposed method makes it straightforward to leverage this prior knowledge. In the following, we explain how by describing several typical scenarios.

A. Localization of Subarrays
Suppose that some distances are known. It is often the case that the set of receivers consists of several compact subarrays with known geometry (voice-based assistants, smart phones). These compact subarrays are then distributed in the space of interest at unknown positions and with random orientations, together with discrete microphones.
Since the EDM is a linear function of the Gram matrix, this prior knowledge corresponds to simple linear constraints on G. We thus augment the original program with a set of known inter-sensor distances for microphone pairs M = (m 1 , m 2 ) : m 1 < m 2 , distance d m1m2 between r m1 and r m2 is known , Then we add the following set of constraints to our optimization: If the set of receivers is partitioned as with the distances within the jth subarray R j known, these constraints correspond to knowing the diagonal subblocks of the upper-left block of D.
We can again proceed with an LM refinement step. In this case, we consider the following objective where g : R d(M +K) → R |M| is the known-distances residual defined as with edm(X , X ) m1m2 = x m1 − x m2 2 being the entries of the EDM of the point set X . We solve (17) using the Augmented Lagrangian algorithm [50]. The corresponding update is given in Appendix A.
In some cases the distances may not be known exactly, but we might have access to good bounds. We once more leverage the linearity of D in the Gramian by adding the following linear constraints to the program, whered m1m2 and d m1,m2 are upper and lower bounds on the distance d m1,m2 . Finally, we note that priors might be available not only for inter-receiver distances but also for inter-source distances and the distances between the sources and the receivers (we often have a coarse idea about how far the source events are from the microphones). These can be added to the constraints completely analogously.

B. When One Set of Times is Known
When one set of timings is known (either the source emission times or the receiver offsets), it would still be possible to use the general formulation with the derived loss (10a) designed for the case when both sets of times are unknown (2). Though that would require more measurements than necessary given such prior knowledge as we show next.
Both cases (unknown emission times or unknown offsets) are captured by the simplified measurement model up to a transposition as necessary (exchanging the role of sources and receivers). From here, it follows that the influence of the timings can be eliminated simply by one left multiplication by J M so that does not depend on τ . We can thus replace the loss (10a) by To see how this reduces the minimal required number of measurements (sources or sensors), recall the degrees-offreedom count from Section II-A. Concretely, For d = 3, this becomes K ≥ 4M −6 M −3 , so that the minimal number of receivers is now M = 4 and the minimal cases for the dimension-zero solution are M = 4, K = 10, M = 5, K = 7, M = 6, K = 6, and M = 9, K = 5. Another way to see this is by noting that the operator A → J M A has a smaller nullspace than A → J M AJ K .

C. Sources with Known (Constant) Offset
As a last example of prior information that is easily handled, we mention the practically relevant case where the source is for example a smartphone emitting periodic pulses. More generally, the source is emitting signals at known (not necessarily regular) intervals. The phone is not synchronized with the receivers, so the emitting times are strictly speaking unknown, but only up to one unknown offset-the start time of the emissions. This case has been studied in previous work [39], [46].
We show how to incorporate this using our proposed method. We have that with δ k is the known offset of the kth emission with respect to the unknown time τ o . When the emissions are periodic, we have δ k = k · δ, but for simplicity we keep the notation more general.
Note that in general, the vectors σ and τ can never be recovered uniquely, unless one of the times is known. The reason for this is that the global time reference can be decided arbitrarily (we used this fact when counting the degrees of freedom in Section II-A).
It then follows that the case of sources with known offsets is mathematically equivalent to the case when only the receivers' offsets are unknown. Concretely, first write where δ is a known vector. Then note that Since 1 δ is known it can be absorbed in the measurements; the one unknown offset τ o can then be absorbed in receiver offsets.

D. Missing Measurements
One upshot of our formulation is that it allows localization even when some sensor-receiver measurements are unavailable. We denote the set of missing entries by M = (m, k) : distance between m th receiver and k th source is missing .
We then replace the objective (14a) with where e n denotes the canonical basis, and the coefficients α mk account for the missing entries.
The LM refinement then proceeds with θ =   vec(R) vec(S ) vec(α)   ∈ R d(M +K)+|M| . The Jacobian now includes the derivative with respect to α as well Note that this approach can be used in the presence of outliers by first estimating them using RANSAC for instance. Then those outliers can be treated as missing entries.

V. NUMERICAL SIMULATIONS
In this section, we evaluate our approach on synthetic data. We implemented our algorithms in Matlab and used the CVX package for specifying convex programs [51], [52] with the SeDuMi solver. For reverberation simulations, we use the Pyroomacoustics [53] package in Python. In the following, we describe the setup and present a series of results for different scenarios showcasing the accuracy and flexibility of our method.

A. Setup
We consider an equal number of sensors and sources M = K in the range of 7 to 12.  Table II.

B. Evaluation
The reconstructed points are first optimally aligned using Procrustes analysis [54]. The localization error between the true [R, S ] and reconstructed [R,Ŝ ] points is then We present the results using box plots depicting the median, first quartile, and third quartile. The whiskers correspond to 1.5 times the inter-quartile range. Values over or below this are shown as outliers.
To facilitate comparison, we will also present the median and the 95% confidence interval. In all figures, we clip the errors to a minimum of 10 −3 which we assume is sufficient for most applications.

C. Results
As a baseline algorithm, we choose to compare to the twostage approach of Wang et al. [16]. We consider two aspects in our comparison: localization error and runtime. Note that we only report the results of Wang et al.'s approach without the further third-stage refinement using Ono et al.'s [30] algorithm. We had observed that the refinement takes a long time due to the required number of iterations and did not improve the results in the case M = K = 7.
The localization results comparing our approach (SDR+LM) to the two-stage baseline are shown in Figure 2. We also show our SDR results before the LM refinement. Note that while the SDR-only results are poor, they indeed provide a good initialization for the subsequent LM refinement as evidenced by the SDR+LM results. We further note that the outlier errors correspond to geometrically unfavorable source/receiver configurations that have attractive local minima. As can be seen, our timing-invariant approach outperforms the two-stage method, especially in the near-minimal configurations. As for the runtime, we show in Figure 3 a comparison of the average runtime over 20 runs as we increase the number of sensors and receivers for a fixed noise standard deviation σ = 10 −5 . The runtime of our approach is consistently less than 5 seconds, whereas the runtime for the two-stage approach increases significantly with the number of sensors. Thus, our timinginvariant approach avoids both having timing-estimation errors that propagate to the position estimation as well as the need for multiple random initializations that increase the runtime.
We now evaluate the case when the microphones are synchronized but the source emission times are still unknown. The results are shown in Figure 4 along with the results of the fully unsynchronized case. As can be seen, the localization performance with partial synchronization is better, especially for (M = 7, K = 7) and (M = 8, K = 8).
Next, we test the case when the inter-microphone distances are known, but the whole setup is still fully unsynchronized. The localization errors are shown in Figure 4. As would be expected, having side information significantly improves the localization performance.
For the last experiment, we attempt localization when some TOA data is missing. We test a range of missing entries from 4% to 20% of the total. Figure 5 shows the results for M = K = 12 where we also compare the performance to the complete data case. We can see that we are still able to localize when few entries are missing. However, as we have found some intrinsically bad configurations that cannot be accurately localized with complete data, it is expected that the problem would be exacerbated when some TOA entries are further missing.
In summary, our approach (SDR+LM) outperforms the twostage baseline (Wang et al. [16]) in terms of localization error and runtime. Additional side information such as partial synchronization or knowledge of some distances can be easily accommodated in our formulation, and as expected improve the localization results compared to the fully unsynchronized case.

D. Reverberation Results
We now evaluate the localization performance of our approach in the presence of reverberation.
We consider a room of dimensions 10 × 10 × 3. The M = K = 12 sensors and sources are randomly placed in the room. The sources are random Gaussian noise and do not overlap such that the segmentation at each microphone is known. We also test with speech sources from the Speech Commands dataset [55]. The SNR is set to 15 dB.
We vary the reverberation time by adjusting the wall absorption coefficients and maximum order for the imagesource method. For each reverberation time, we simulate 20 realizations. While the desired reverberation times are 0 s, 0.1 s, 0.2 s, 0.3 s, 0.4 s, and 0.5 s, the actual reverberation times obtained in simulation are slightly different but close. The parameters are summarized in Table III.
The TDOA are estimated using the method of Yamaoka et al. [56]. The error between the true t mk and estimatedt mk TDOA is calculated as We then use the estimated TDOA as input to our localization algorithm as well as the two-stage baseline of Wang et al. [16]. The results are shown in Figures 6 and 7. In Figure 6a, we show the median TDOA estimation error. As expected, the errors are larger as the reverberation time increases. Subsequently, this affects the localization errors as shown in Figure 6b. A similar trend is observed with the speech sources in Figure 7a. For both source types, our timing-invariant approach still outperforms the two-stage baseline.

VI. EXPERIMENTS
In this section, we evaluate our approach on real data [46] recorded in an office with most of the furniture removed. A loudspeaker played a chirp from 65 positions. We have access to the ground truth positions for 12 microphones measured using a laser, the 12 × 65 TOA matrix, and a mask indicating the positions of outlier TOA entries. While the TOA Fig. 2. Localization results for an unsynchronized setup comparing our approach (SDR+LM) to the two-stage baseline (Wang et al. [16]). We also show the results before the LM refinement (SDR). Results are shown at different noise levels and for M = K in the range of 7 to 12. Fig. 3. Average runtime of our approach (SDR+LM) compared to the twostage baseline (Wang et al. [16]). Results are for M = K in the range of 7 to 12. The noise standard deviation is fixed σ = 10 −5 . The runtime of our approach is consistently less than 5 seconds. was extracted from the recordings knowing the chirp, our algorithms also work for TDOA matrices which could have been extracted without any assumption on the source as shown in Section V-D.
A. Setup Figure 8 shows the TOA matrix, outlier positions, and the true microphone positions. A total of 23 loudspeakers provide clean data without outliers. Thus, for the first experiments, we will use the 12 × 23 subset of TOAs. Then, for the full 12 × 65 TOAs, we will use our missing data approach to handle the outliers. We report results over 200 runs where we randomly choose a subset of the loudspeakers to localize the 12 microphones.
Since we only have the ground truth for the microphone positions, we calculate the localization error between the true R and reconstructedR microphone positions as

B. Results
We first test our approach on subsets of the 12 × 23 TOA matrix that is outlier-free. We also compare to the Wang et al. [16] two-stage baseline. In Figure 9(a), we show the errors for localizing all 12 microphones using a varying number of loudspeakers from 6 (the minimal case for M = 12) to 11. Once again, our approach significantly outperforms the twostage baseline. Also similar to what we observed in the numerical simulations, while using more loudspeakers improves the average error, for each K, there are a number of large errors corresponding to intrinsically difficult configurations that have attractive non-global minima. However, the minimum error is consistent for all K and is less than 5 cm.
Next, we use subsets of the entire 12×65 TOA that contains outliers but assume knowing where the outliers are. We can thus use the missing data approach described in Section IV-D. Figure 10 shows the errors for localizing the 12 microphones using 6 to 21 sources. On one hand, the average error is larger compared to the outlier-free case. On the other hand, the best case scenarios that result in minimum errors are comparable to the outlier-free case. The smallest localization error is 4 cm and corresponds to K = 16 loudspeakers. In the case without any synchronization, our approach (SDR+LM) outperforms the two-stage baseline (Wang et al. [16]). Side information such as partial synchronization or knowledge of some distances improves the results.

VII. CONCLUSION
We formulated a timing-invariant objective that allows us to localize sensors and sources in an unsynchronized setting. Based on this objective, we proposed an SDP relaxation using the full Gram matrix of the point set and then a subsequent refinement step using the LM algorithm. We have thus eliminated the need for having to first estimate the unknown timings as well as avoided the multiple initializations required when solving non-convex problems. We also proposed an approach to handle missing data and showed how to seamlessly incorporate additional information such as knowledge of some distances or partial synchronization.
Using numerical simulations, we demonstrated the feasibility of the approach in different scenarios and in near-minimal configurations. We compared our approach to a two-stage state-of-the-art method [16]. Not only did our approach outperform the two-stage algorithm in terms of localization error, but also in terms of runtime. We also tested our algorithms on real times of arrival measured in an office environment. We were able to localize the 12 microphones to within 0.04 m average error.
Future work could focus on dealing with outliers which arise in the presence of multipath, for example. The method could be similar to our missing data approach except that the positions of the outliers are not known and need to be determined. The TDOA estimation errors are larger with speech compared to noise sources. The subsequent localization is also worse. Our timing-invariant approach outperforms the two-stage baseline.
Since (27) is nonlinear in θ, the LM algorithm is once again used in which the update is Df (θ (i) ) Df (θ (i) ) + µDg(θ (i) ) Dg(θ (i) ) + λ (i) I −1 Df (θ i ) f (θ (i) ) + µDg(θ (i) ) (g + z/(2µ) where Dg ∈ R |M|×d(M +K) is the Jacobian defined as   9. (Real Data.) Localization errors for localizing 12 microphones using 6 to 11 sources. We compare our approach (SDR+LM) to a two-stage approach (Wang). The minimum error with our approach is consistently less than 5 cm. The input TOA matrix is a subset of the 12 × 23 outlier-free TOA matrix. Fig. 10. (Real Data.) Localization errors for localizing 12 microphones using 6 to 21 sources. We compare using subsets of the available 12 × 65 TOA matrix (Full Data) to using subsets of the 12 × 23 outlier-free TOA matrix (Outlier-Free). For the former, we use the missing data approach where we treat outliers as missing entries.