Single-Look Multi-Master SAR Tomography: An Introduction

This paper addresses the general problem of single-look multi-master SAR tomography. For this purpose, we establish the single-look multi-master data model, analyze its implications for single and double scatterers, and propose a generic inversion framework. The core of this framework is nonconvex sparse recovery, for which we develop two algorithms: one extends the conventional nonlinear least squares (NLS) to the single-look multi-master data model, and the other is based on bi-convex relaxation and alternating minimization (BiCRAM). We provide two theorems for the objective function of the NLS subproblem, which lead to its analytic solution up to a constant phase angle in the one-dimensional case. We also report our findings from the experiments on different acceleration techniques for BiCRAM. The proposed algorithms are applied to a real TerraSAR-X data set, and validated with height ground truth made available via a SAR imaging geodesy and simulation framework. This shows empirically that the \emph{single-master} approach, if applied to a single-look \emph{multi-master} stack, can be insufficient for layover separation, and the \emph{multi-master} approach can indeed perform slightly better (despite being computationally more expensive) even in the case of single scatterers. Besides, this paper also sheds light on the special case of single-look bistatic SAR tomography, which is relevant for current and future SAR missions such as TanDEM-X and Tandem-L.


Introduction
Nan Ge, Richard Bamler, Fellow, IEEE, Danfeng Hong, Member, IEEE, and Xiao Xiang Zhu, Senior Member, IEEE Abstract-This paper addresses the general problem of singlelook multi-master SAR tomography.For this purpose, we establish the single-look multi-master data model, analyze its implications for single and double scatterers, and propose a generic inversion framework.The core of this framework is nonconvex sparse recovery, for which we develop two algorithms: one extends the conventional nonlinear least squares (NLS) to the single-look multi-master data model, and the other is based on bi-convex relaxation and alternating minimization (BiCRAM).We provide two theorems for the objective function of the NLS subproblem, which lead to its analytic solution up to a constant phase angle in the one-dimensional case.We also report our findings from the experiments on different acceleration techniques for BiCRAM.The proposed algorithms are applied to a real TerraSAR-X data set, and validated with height ground truth made available via a SAR imaging geodesy and simulation framework.This shows empirically that the single-master approach, if applied to a single-look multi-master stack, can be insufficient for layover separation, and the multi-master approach can indeed perform slightly better (despite being computationally more expensive) even in the case of single scatterers.Besides, this paper also sheds light on the special case of single-look bistatic SAR tomography, which is relevant for current and future SAR missions such as TanDEM-X and Tandem-L.
• Single-look single-master: Reigber & Moreira (2000) did the pioneering work on airborne SAR tomography by densifying sampling via the integer interferogram combination technique and subsequently employing discrete Fourier transform on an interpolated linear array of baselines [1].Fornaro et al. (2003Fornaro et al. ( , 2005Fornaro et al. ( , 2008) ) paved the way for spaceborne SAR tomography with long-term repeat-pass acquisitions and proposed to use more advanced inversion techniques such as truncated singular value decomposition [3], [5], [21].Zhu & Bamler (2010a) provided the first demonstration of SAR tomography with very high resolution spaceborne SAR data by using Tikhonov regularization and nonlinear least squares (NLS) [22].Budillon et al. (2010) and Zhu & Bamler (2010b) introduced compressive sensing techniques to tomographic inversion under the assumption of a compressible far-field profile.Zhu & Bamler (2011) proposed a generic algorithm (named SL1MMER) that is composed of spectral estimation, model-order selection and debiasing [23].• Single-look multi-master1 : To the best of our knowledge, the publications in this category are rather scarce.Zhu & Bamler (2012) extended the Tikhonov regularization, NLS and compressive sensing approaches to a mixed TerraSAR-X and TanDEM-X stack by using pre-estimated covariance matrix [24].Ge & Zhu (2019) proposed a framework for SAR tomography using only bistatic or pursuit monostatic acquisitions: non-differential SAR tomography for height estimation by using bistatic or pursuit monostatic interferograms, and differential SAR tomography for deformation estimation by using conventional repeat-pass interferograms Etc.† Incorrectly uses the single-look single-master data model and the previous height estimates as deterministic prior [25].However, the single-look single-master data model still underlies the algorithms in both publications.
• Multi-look single-master: Aguilera et al. (2012) exploited the common sparsity pattern among multiple polarimetric channels via distributed compressive sensing [13].Schmitt & Stilla (2012) also employed distributed compressive sensing to jointly reconstruct an adaptively chosen neighborhood [14].Liang et al. (2018) proposed an algorithm for 2-D rangeelevation focusing on azimuth lines via compressive sensing [26].Shi et al. (2019) performed nonlocal InSAR filtering before tomographic reconstruction [20].• Multi-look multi-master: In general, any algorithm estimating the auto-correlation matrix belongs to this category.Note that this is closely related to modern adaptive multi-looking techniques that also exploit all possible interferometric combinations [27]- [31].Gini et al. (2002) investigated the performance of different spectral estimators including Capon, multiple signal classification (MUSIC) and the multi-look relaxation (M-RELAX) algorithm [2].Lombardini (2005) extended SAR tomography to the differential case by formulating it as a multi-dimensional spectral estimation problem and tackled it with higher-order Capon [4].Duque et al. (2009Duque et al. ( , 2010) ) were the first to investigate bistatic SAR tomography by using ground-based receivers and spectral estimators such as Capon and MUSIC [32], [33].Duque et al. (2014) demonstrated the feasibility of SAR tomography using a single pass of alternating bistatic acquisitions, in which the eigendecomposed empirical covariance matrix was exploited for the hypothesis test on the number of scatterers [34].Fornaro et al. (2014) proposed an algorithm (named CAESAR) employing principal component analysis of the eigendecomposed empirical covariance matrix in an adaptively chosen neighborhood [15].This list has a clear focus on urban scenarios.Needless to say, SAR tomography in forested scenarios involving random volume scattering in canopy and double-bounce scattering between ground and trunk (e.g., [35]- [42]) also falls in the multi-look multi-master category.
Let us follow the common conventions and denote the azimuth, range and elevation axes as x, r and s, respectively, where s is perpendicular to the x-r plane.For the sake of argument, suppose for any sample at the x and r positions, the N single-look complex (SLC) SAR measurements are noiseless.After deramping, each phase-calibrated SLC measurement can be modeled as the Fourier transform Γ of the elevation-dependent far-field reflectivity function γ : R → C at the corresponding wavenumber k [21]: where k n := −4πb n /(λr 0 ) is the nth wavenumber determined by the sensor position b n along an axis b s w.r.t. an arbitrary reference, the radar wavelength λ, and the slant-range distance r 0 w.r.t. a ground reference point.Here we consider the nondifferential case.An extension to the differential case, in which scatterers' motion is modeled as linear combination of basis functions, is straightforward.
In the single-look single-master case, one SLC (say the ith, i ∈ [N ]), typically near the center of joint orbital and temporal distribution, is selected as the unique master for generating interferograms, i.e., g n g i /|g i |, ∀n ∈ [N ] \ {i}.This process, which can also be interpreted as a phase calibration step, converts k n into the wavenumber baseline As a result, the zero position of wavenumber baseline is fixed, i.e., ∆k i = 0.The rationale behind this is, e.g., to facilitate 2-D phase unwrapping for atmospheric phase screen (APS) compensation by smoothing out interferometric phase in x-r.
Likewise, the data model of random volume scattering is straightforward in the multi-look multi-master setting.Suppose γ(s) is a white random signal.For any master and slave sampled at k and k +∆k, respectively, the Van Cittert-Zernike theorem implies that the expectation (due to multi-looking) of the interferogram, being the autocorrelation function R ΓΓ of Γ, is the Fourier transform of the elevation-dependent backscatter coefficient function σ 0 : R → R at ∆k: where the property of γ(s) being white, i.e., E[γ(s)γ(s )] = σ 0 (s)δ(s − s ), is utilized.This leads to an inverse problem similar to the one in the single-look single-master case.
This paper, on the other hand, addresses the general problem of SAR tomography using a single-look multi-master stack.Such a stack arises when, e.g., • a stack of bistatic interferograms is used in order to diminish APS, to minimize temporal decorrelation of non-PSs, and to eliminate motion-induced phase for single scatterers [33], [43], [44], • repeat-pass interferograms of small (temporal) baselines are employed so as to limit the corresponding decorrelation effects of non-PSs [45]- [47].While both previously mentioned categories have been intensively studied, it is not the case for single-look multimaster SAR tomography.To the best of our knowledge, all the existing work to date toward single-look multi-master SAR tomography is still incorrectly based on the single-look singlemaster data model [24], [25].As will be demonstrated later with a real SAR data set, this approach can be insufficient for layover separation, even if the elevation distance between two scatterers is significantly larger than the Rayleigh resolution.This motivates us to fill the gap in the literature by revisiting the single-look data model in a multi-master multi-scatterer configuration, and by developing efficient methods for tomographic reconstruction.Naturally, this study is also inspired by prospective SAR missions such as Tandem-L that will deliver high-resolution wide-swath bistatic acquisitions in L-band as operational products [48].
Our main contributions can be summarized as follows.
• We establish the data model of single-look multi-master SAR tomography, by means of which both sparse recovery and model-order selection can be formulated as nonconvex minimization problems.• We develop two algorithms for solving the aforementioned nonconvex sparse recovery problem, namely, 1) NLS: we provide two theorems regarding the critical points of its subproblem's objective function that also underlies model-order selection; 2) bi-convex relaxation and alternating minimization (Bi-CRAM): we propose to sample its solution path for the purpose of automatic regularization parameter tuning, and we show empirically that a simple diagonal preconditioning can effectively improve convergence.
• We propose to correct quantization errors by using (nonconvex) nonlinear optimization.• We validate tomographic height estimates with ground truth generated by SAR simulations and geodetic corrections.The rest of this paper is organized as follows.Sec.II introduces the data model and inversion framework for singlelook multi-master SAR tomography.In Sec.III and IV, two algorithms for solving the nonconvex sparse recovery problem within the aforementioned framework, namely, NLS and BiCRAM, are elucidated and analyzed, respectively.Sec.V reports an experiment with TerraSAR-X data including a validation of tomographic height estimates.This paper is concluded by Sec.VI.

II. SINGLE-LOOK MULTI-MASTER SAR TOMOGRAPHY
In this section, we establish the data model for singlelook multi-master SAR tomography, analyze its implications for two specific cases, and sketch out a generic inversion framework for it.
We start with the mathematical notations that are used throughout this paper.
Notation.We denote scalars as lower-or uppercase letters (e.g., m, N , λ), vectors as bold lowercase letters (e.g., g, γ), matrices, sets and ordered pairs as bold uppercase letters (e.g., R, Ω), and number fields as blackboard bold uppercase letters (e.g., Z, R, C) with the following conventions: • g n denotes the nth entry of g.
• a m and a n denote the mth row and nth column of A, respectively.• Diag(a) denotes a square diagonal matrix whose entries on the main diagonal are equal to a, and Diag(A) denotes a vector whose entries are equal to those on the main diagonal of A. • Supp(x) denotes the index set of nonzero entries or support of x. • A, A T and A H denote the (elementwise) complex conjugate, transpose and conjugate transpose of A, respectively.
• A R and (A) denote the real part of A.
• A I and (A) denote the imaginary part of A.
• A • B denotes the Hadamard product of A and B.
• A 0, B ≺ 0 means that A is positive definite and B is negative definite.• A Ω denotes the matrix formed by extracting the columns of A indexed by Ω.
• A 1,2 denotes the 1,2 norm of A, i.e., the sum of the 2 norms of its rows.• I denotes the identity matrix.
• The nonnegative and positive subsets of a number field F are denoted as F + and F ++ , respectively.

A. Data Model
First of all, we give a definition of "single-master" and "multi-master" by using the language of basic graph theory (e.g., [49, §1]).Let G := (V(G), E(G)) be an acyclic directed graph that is associated with an incidence function ψ G , where V(G) := [N ] is a set of vertices (SLCs), E(G) is a set of edges (interferograms), and for each e ∈ E(G), ∃m, n ∈ V(G) such that ψ G (e) = (m, n).Its adjacency matrix A(G) := (a m,n ) ∈ {0, 1} N ×N is given by Since G is acyclic, a n,n = 0, ∀n ∈ V(G), i.e., the diagonal of A(G) contains only zero entries.Without loss of generality, assume that every vertex is connected to at least another one.
Definition 1.The single-master configuration means that there exists a unique i ∈ [N ] such that a i,n = 1, a m = 0, ∀m, n ∈ [N ] \ {i}.In this case, we refer to {g n g i /|g i |} as the singlemaster stack with a master indexed by i.
Definition 2. The multi-master configuration means that i ∈ In this case, we refer to {g n g m } as the multi-master stack.
That is, "multi-master" is equivalent to "not single-master".Two exemplary configurations are illustrated in Fig. 1.
In the multi-master case, an interferogram is created for each (m, n) ∈ E(G): Hereafter, we focus on the case in which the far field contains only a small number of scatterers such that where γ l ∈ C is the reflectivity of the lth scatterer located at the elevation position s l .The single-look multi-master data model (4) becomes In the next subsection, we analyze the implications of (6) for the single-and double-scatterer cases.

B. Implications
In the single-scatterer case, (6) becomes i.e., the multi-master observation is actually the Fourier transform of the reflectivity power at the wavenumber baseline k n − k m .As a result, the nonnegativity of |γ| 2 should be considered during inversion.Since both the real and imaginary parts of g n g m are parametrized by |γ| 2 , i.e., the inversion problem can be recast as a real-valued one.
For double scatterers, the multi-master observation is In addition to the Fourier transform of the reflectivity power at k n − k m , the right-hand side of (9) contains the second and third "cross-terms" in which the reflectivity values of the two scatterers (and their frequency-time-products) are coupled.This essentially rules out any linear model.
Remark.In the multi-look multi-master setting, the data model under random volume scattering is as already indicated in Eq. ( 2), i.e., no coupling is involved.
Remark.A multi-master bistatic or pursuit monostatic (i.e., 10-second temporal baseline [50]) stack is in general not motion-free for double (or multiple) scatterers.
To see this, consider for example the linear deformation model d(t n ) := vt n , where v and t denote linear deformation rate and temporal baseline, respectively.Observe that (11) In the case of t m = t n , the motion-induced phase in the crossterms vanishes if and only if The next subsection introduces a generic inversion framework for single-look multi-master SAR tomography.

C. Inversion Framework
The data model (6) already indicates a nonlinear system of equations for a single-look multi-master stack.Suppose G is the graph associated with this stack that contains a total of N := |E(G)| multi-master observations, and e 1 , . . ., e N is an ordered sequence of all the edges in E(G).Let M, S : [N ] → [N ] be the mappings to the master and slave image indices, respectively.For each e n , n ∈ Let s 1 , . . ., s L be a discretization of the elevation axis s.The data model in matrix notations is where R, S ∈ C N ×L represent the tomographic observation matrices of the slave and master images, respectively, r n,l := exp(−jk and γ ∈ C L is the unknown reflectivity vector such that γ l is associated with the scatterer (if any) at elevation position s l .
In light of ( 12), we propose the following framework for tomographic inversion.
1) Nonconvex sparse recovery: We consider the problem where K ∈ Z ++ .The objective function measures the model goodness of fit and the constraint enforces γ to be sparse, as is implicitly assumed in (6).If

K l=0
L l is small, ( 13) can be solved heuristically by using the algorithms that will be developed in Sec.III.Sec.IV is dedicated to another algorithm that solves a similar problem based on bi-convex relaxation.
2) Model-order selection: This procedure removes outliers and therefore reduces false positive rate.By using for example the Bayesian information criterion (e.g., [51]), model-order selection can be formulated as the following constrained minimization problem: where δ ∈ C L is an auxiliary variable, and Ω is its support.Since | Supp(γ)| is typically small, ( 14) can be tackled by solving a sequence of subset nonlinear least squares problems in the form of for which two solvers will be introduced in Sec.III.
3) Off-grid correction: The off-grid or quantization problem arises when scatterers are not located on the (regular) grid of discrete elevation positions s 1 , . . ., s L .Ge et al. [17] proposed to oversample γ in the vicinity of selected scatterers in order to circumvent this problem.Here we propose a more elegant approach that is based on nonlinear optimization.
Denote K := | Ω| as the number of scatterers after modelorder selection.Let γ R l and γ I l be the real and imaginary parts of the complex-valued reflectivity γ l of the lth scatterer that is located at s l , respectively, i.e., γ l = γ R l + jγ I l , ∀l ∈ [ K].On the basis of the single-look multi-master data model ( 6), we seek a solution of the following minimization problem: Note that the objective function is differentiable w.r.t.γ R l , γ I l and s l , ∀l ∈ [ K]. Needless to say, the on-grid estimates from ( 14) are used as the initial solution.We will revisit this problem in Sec.III-A.
Thus far the inversion framework has been established.In the next two sections, we will deal with the optimization problems ( 13)-( 16) from the algorithmic point of view.

III. NONLINEAR LEAST SQUARES (NLS)
NLS is a parametric method that breaks down a sparse recovery problem into a series of subset linear least squares subproblems [52, §6.4].Here we extend the concept of NLS to the single-look multi-master data model (12) and address the subproblem (15), or equivalently, where As can be concluded from Sec. II-C, ( 17) is clearly of interest, since it not only solves the nonconvex sparse recovery problem (13), but also underlies model-order selection (14).

A. Algorithm
In this subsection, we develop two algorithms for solving (17).
The first algorithm is based on the alternating direction method of multipliers (ADMM) [53].ADMM solves a minimization problem by alternatively minimizing its augmented Lagrangian [54, p. 509], in which the augmentation term is scaled by a penalty parameter ρ ∈ R ++ .A short recap can be found in Appendix A. It converges under very general conditions with medium accuracy [53, §3.2].
Now we consider (17) in its equivalent form: This is essentially a bi-convex problem with affine constraint [53, §9.2].Applying the ADMM update rules leads to Alg. 1.Note that both xand z-updates boil down to solving linear least squares problems.
Algorithm 1 An ADMM-based algorithm for solving (17) 1: The second algorithm uses the trust-region Newton's method that exploits second-order information for solving general unconstrained nonlinear minimization problems [55, §4].The rationale behind this choice is to circumvent saddle points that cannot be identified by first-order information [56].In each iteration, a norm ball or "trust region" centered at the current iterate is adaptively chosen.If the second-order Taylor polynomial of the objective function is sufficiently good for approximation, a descent direction is found via solving a quadratically constrained quadratic minimization problem.Suppose f : R n → R is the objective function, the subproblem at the iterate x ∈ R n is where ∆x ∈ R n is the search direction, ∇f and ∇ ), it is easy to show that the objective function of ( 17) is not complex-differentiable w.r.t.x.In lieu of using Wirtinger differentiation that does not contain all the second-order information, we exploit the fact that the mapping x → (x R , x I ) is isomorphic and let where f : R n × R n → R is real-differentiable w.r.t.x R and x I .Straightforward computations reveal its gradient as where and its Hessian as Note that d : C n → C n and C, D, E : C n → C n×n are essentially functions of x.Here we drop the parentheses in order to simplify notation.For the same purpose, we adopt the following convention: By using the first-and second-order information of ( 20), ( 17) can be directly tackled by the trust-region Newton's method via solving a sequence of subproblems in the form of (19).For any optimal point x , the KKT condition is Remark.Likewise, the objective function of ( 16) is realdifferentiable w.r.t.γ R l , γ I l and s l , ∀l ∈ [ K].Therefore, the trust-region Newton's method is directly applicable.Alternatively, first-order methods such as Broyden-Fletcher-Goldfarb-Shanno (BFGS, see for example [55, §6.1] and the references therein) can also be used.
Needless to say, it is not guaranteed that these algorithms always converge to a global minimum.We will demonstrate later in Sec.V that the solutions are often good enough.Fig. 2 shows typical convergence curves in the case of double scatterers (#6 in Sec.V-C).In order to generate this plot, we first let one algorithm run non-stop until it converged with very high precision.We then took this solution as an optimal point x and compared the absolute difference of the objective value |f (x) − f (x )|.Both ADMM and the trust-region Newton's method converged to the same solution (up to a constant phase angle, see Sec.III-B), although it only took the latter less than 10 iterations.Still, the former can be interesting due to the simplicity of its update rules (see Alg. 1).In Sec.V, the latter will be used for demonstration purposes.

B. Analysis of the Objective Function
Due to the nonconvexity of the objective function (20), its analysis is not straightforward.We are primarily concerned with the following two questions: 1) Under which circumstances do critical points or local extrema exist?2) If they do exist, how many are they?This subsection shall provide a partial answer to these questions.
First of all, we state the following general observation.
Proposition 1.For any x ∈ C n and φ ∈ R, any eigenvalue of ∇ 2 f (x) is also an eigenvalue of ∇ 2 f (x exp(jφ)) and vice versa.
Proof.See Appendix B.
Informally, this proposition implies that the definiteness of the Hessian is invariant under any rotation with a constant phase angle.Now we state the main theorem for the general case.This theorem implies that if there exists one critical point, then there are an infinite number of them up to a constant phase angle, and each is as good.Furthermore, we conjecture that A H Diag(b)B + B H Diag(b)A 0 is a necessary and sufficient condition (cf.Thm.2(b)), and each nonzero critical point is also a local minimum under some mild conditions.
For the special case n = 1, i.e., A, B ∈ C m , x ∈ C, we have a much stronger result.are infinitely many local minima that are exactly as good.Fig. 3 shows as an example the negative logarithm of ( 20) with a circle of local maxima.Lastly, Thm. 3 implies the following interesting result.
Corollary 4 (n = 1).Each nonzero local minimum (if it exists) is given by for some φ ∈ R.
Proof.See the proof of Thm.3(b).
Now we return to our problem in SAR tomography.For the single-look multi-master data model (12), this corollary motivates the 1-D spectral estimator: ∀l ∈ [L].Note that this also provides the solution for any 1-D NLS subproblem up to a constant phase angle.In the case of multiple scatterers, this estimator does not have any super-resolution power.

IV. BI-CONVEX RELAXATION AND ALTERNATING MINIMIZATION (BICRAM)
This section introduces a second algorithm for solving the nonconvex sparse recovery problem (13).

A. Algorithm
As a starting point, we replace the constraint in (13) with a sparsity-inducing regularization term, e.g., where λ ∈ R ++ trades model goodness of fit for sparsity.
In light of ( 26), the necessary condition for being an optimal point γ is i.e., the right-hand side is a subgradient of the 1 norm at γ .Obviously, 0 always satisfies this condition.
In principle, an ADMM-based algorithm similar to Alg. 1 can be used to solve (29).However, our experience with real SAR tomographic data shows that it often diverges, presumably due to the high mutual coherence of R and S under nonconvexity.For this reason, we consider instead the following relaxed version of (29): where , it is convex in γ with θ fixed, and convex in θ with γ fixed.The first regularization term enforces γ and θ to have similar entries, and the second one promotes the same support.Since (31) is essentially an unconstrained bi-convex problem, it can be solved by using alternating minimization via Alg. 2 (see also [58]- [60]).
Due to the nonconvexity of (31), it is very difficult to establish a convergence guarantee for Alg. 2 from a theoretical point of view.However, our experiments with real SAR tomographic data (see Sec. V) show that it converges empirically.As an example, Fig. 4 depicts a convergence curve in the case of two scatterers that are closely located (#6 in Sec.V-C).
In terms of regularization parameter tuning, we adopt the approach of sampling the solution path (λ 1 , λ 2 ) → x, and selecting the solution with the highest penalized likelihood (14).Last but not least, this procedure can be simplified by performing 1-D search, i.e., fixing one parameter and tuning the other at a time.

B. Implementation
This subsection addresses several implementation aspects that contribute to accelerating Alg. 3 (and therefore Alg. 2).
The exposition is based on an ADMM-based algorithm for solving the 1 -regularized least squares (L1RLS) problem: This is more suitable for demonstrating the power of different acceleration techniques, since each of its subproblems has an analytical solution and does not involve iteratively solving another optimization problem (cf.Alg. 2).Besides, it will also be used as a reference in Sec.V. Now consider (36) in its equivalent form: Applying the ADMM update rules leads to Alg. 4.
Algorithm 4 An ADMM-based algorithm for solving (36) 1: Input: A, b, z (0) , λ, ρ 2: Initialize z ← z (0) 3: Until stopping criterion is satisfied, Do 4: is the proximal operator of the 1 norm scaled by λ (also known as the soft thresholding operator [62]): whose i-th entry is given by [61, §6.5.2] The first technique provides an easier way for the x-update.1) Matrix inversion lemma: In Alg. 3 and 4, an L-by-L matrix needs to be inverted.For large L, a direct exact approach can be tedious.Instead, we exploit the following lemma.
Lemma 5 (Matrix inversion lemma [63]).For any A ∈ C n×m , B ∈ C m×n and nonsingular C ∈ C n×n , we have Lemma 5 suggests a more efficient method if inverting C is straightforward.This is the case for matrices in the form of A H A + ρI since i.e., instead of the original L-by-L matrix, only an N -by-N matrix needs to be inverted.Alternatively, the linear least squares (sub)problems can be solved iteratively in order to deliver an approximate solution [64], which is known as inexact minimization [53, §3.4.4].
The following techniques can be employed to improve convergence.
2) Varying penalty parameter: The penalty parameter ρ can be updated at each iteration.Besides the convergence aspect, this also renders Alg. 4 less dependent on the initial choice of ρ.A common heuristic [53, §3.4.1] is to set at the (k + 1)th iteration, where τ, µ > 1 are parameters, r (k) := x (k) − z (k) is the primal residual, and ) is the dual residual.As k → ∞, r (k) and s (k) both converge to 0. Intuitively, increasing ρ tends to put a larger penalty on the augmenting term (ρ/2) x − z 2 2 in the augmented Lagrangian and consequently decrease r (k) 2 on the one hand, and to increase s (k)  2 by definition on the other and vice versa.The rationale is to balance r (k) and s (k) so that they are approximately of the same order.Naturally, one downside is that (41) needs to be recomputed whenever ρ changes.
3) Diagonal preconditioning: The augmenting term (ρ/2) x − z 2 2 in the augmented Lagrangian can be replaced by where P 0 is a real diagonal matrix.Note that this falls under the category of more general augmenting terms [53, §3.4.2].By means of this, Alg. 4 is deprived of the burden of choosing ρ and the ADMM updates become where p := Diag(P), and Prox 1,w : C L → C L is the proximal operator of the weighted 1 norm with weights w ∈ R L ++ : whose i-th entry is given by In case A H A is ill-conditioned (such as in SAR tomography), P can be interpreted as a preconditioner.Needless to say, Lemma 5 can be also applied to invert A H A + P. Pock and Chambolle (2011) proposed a simple and elegant way to construct diagonal preconditioners for a primal-dual algorithm [65] [54, §15.2] with guaranteed convergence: where α ∈ [0, 2] is a parameter.4) Over-relaxation: This means inserting between the xand z-updates of Alg. 4 the following additional update:  Fig. 5 shows the convergence curve of Alg. 4 using different acceleration techniques, as applied to real SAR tomographic data (#6 in Sec.V).Each technique did contribute to accelerating Alg. 4 in comparison to "baseline", where we set ρ = 1.The number of iterations of this and five other cases are listed in Tab.II.Obviously, the combination of diagonal preconditioning and over-relaxation was the most competitive one and will therefore be adopted for all the ADMM-based algorithms in the following.

V. EXPERIMENT WITH TERRASAR-X DATA
In this section, we report our experimental results with a real SAR data set.

A. Design of Experiment
As a demonstration, we used 31 TerraSAR-X staring spotlight repeat-pass acquisitions of the central Munich area from March 31, 2016 to December 7, 2017.This data set was processed with DLR's Integrated Wide Area Processor [66], [67], as was elaborately described in [17].In addition to side lobe detection (see [17] and the references therein), any nonpeak point inside a main lobe was also removed, since it would otherwise lead to a "ghost" scatterer in the result, as any side lobe point would do too.Our region of interest contains a six-story building ("Nordbau") of the Technical University of Munich (TUM) shown in Fig. 6 (left).The building signature in the SAR intensity image can be observed in Fig. 9, where the regular grid of salient points within the building footprint is a result of triple reflections on three orthogonal surfaces: metal plate (behind window glass), window ledge and brick wall [68].After main and side lobe detection, a total of 594 looks were left, whose azimuth-range positions are shown in Fig. 8

(bottom).
A single-master stack was formed by choosing the acquisition from December 20, 2016 as the one and only master.Its vertical wavenumbers are shown in Fig. 7.A sinusoidal basis function was used for modeling periodical motion induced by temperature change.The vertical Rayleigh resolution at scene center is approximately 12.66 m.The Crámer-Rao lower bound (CRLB) of height estimates given the aforementioned periodical deformation model [25] and a nominal signal-tonoise ratio (SNR) of 2 dB is approximately 1.10 m.NLS and L1RLS were applied to this stack for tomographic reconstruction.The latter was solved by Alg. 4 augmented with diagonal preconditioning and over-relaxation (see Sec. IV-B), where we set β = 1.8 and the choice of α is irrelevant (since A is a Fourier matrix).The solution path of L1RLS was sampled 11 times with the regularization parameter varying logarithmically from λ min := 5 We constructed a multi-master stack of small temporal baselines: suppose 1 , 2 , 3 , 4 , . . . is a chronologically ordered sequence of SLCs, the interferograms (edges) are (1 , 2 ), (3 , 4 ), etc. (see Fig. 1).As a result, this stack consists of 15 interferograms.Due to the small-baseline feature of this stack, we did not employ any deformation model for the sake of simplicity.NLS and BiCRAM were applied to reconstruct the elevation profile, where the latter was solved by Alg. 3 employing diagonal preconditioning and over-relaxation.Likewise, the solution path of BiCRAM was also sampled 11 times, where λ 1 was fixed as one (since it was deemed relatively insignificant as far as our experience went), and λ 2 was set to vary logarithmically from The initial solution was given by γ (0) = (R•S) H g due to its simplicity.Alternatively, (28) could be used.In terms of off-grid correction, forwardmode automatic differentiation [69] was employed in order to circumvent analytically differentiating the objective function of ( 16) for any number of scatterers, and the optimization problem was solved by means of a BFGS implementation [70].
Finally, we built a second small-baseline multi-master stack in the identical way as the previous one.In addition, we normalized each interferogram with the corresponding master amplitude.We will refer to this as the fake single-master stack, since we treated it as if it had been a single-master one.In order to apply the single-master approach, we calculated for each interferogram the difference between slave and master  wavenumbers, and used it as if it had been the wavenumber baseline, i.e., by inadequately assuming for each (m, n) ∈ E(G).Needless to say, NLS and L1RLS were employed exactly the same as in the single-master case.
The next subsection briefly explains how we generated ground truth data.

B. Generation of Height Ground Truth
Height ground truth data was made available via a SAR imaging geodesy and simulation framework [68].The starting point was to create a three-dimensional (3-D) facade model from terrestrial measurements, via (drone-borne) camera, tachymeter, measuring rod and differential global positioning system (GPS), with an overall accuracy better than 2 cm and a very high level of details [68].Ground control points were used for referencing this facade model to an international terrestrial reference frame.A visualization of the 3-D facade model is provided in Fig. 6 (right).The ray-tracingbased RaySAR simulator [71] was employed to simulate dominant scatterers that, as already mentioned in Sec.V-A, correspond to triple reflections on the building facade.With the help of atmospheric and geodynamic corrections from DLR's SAR Geodetic Processor [72], [73] and the newly enhanced TerraSAR-X orbit products [74], their absolute coordinates were converted into azimuth timing, range timing and height that we refer to as Level 0 ground truth data.
Level 1 ground truth data consists of height at 30 simulated PSs that are matched with real ones.The matching was conducted in the azimuth-range geometry, so as not to be affected by any height estimate error [75].Fig. 8 (top) shows the height simulations at the sub-pixel azimuth-range positions of the corresponding 30 PSs.This height is relative to a corner reflector that is located on top of a neighboring TUM building and next to a permanent GPS station [68].
In addition, we performed height interpolation for a total of 594 looks (see Sec. V-A) in the following way.First of all, the height of each simulated PS was converted into interferometric phase.Next, the distance to the polyline representing the nearest-range cross-section of the building facade was used as the independent variable to construct a 1-D interpolator.In the end, the phase was interpolated at the previously mentioned 594 looks and converted back into height.This interpolated height is referred to as the Level 2 ground truth and shown in Fig. 8 (bottom).Needless to say, one assumption is that each scatterer, if it does exist, should lie on the building facade.A cross-validation of this 1-D interpolator was performed in [68], where the standard deviation (SD) and median absolute deviation (MAD) were shown to be 0.004 and 0.002 m, respectively.
In the next subsection, our preliminary results are reported.

C. Experimental Results
The experiments can be divided into three categories: singlemaster, multi-master and fake single-master (see Sec. V-A).In each category, two algorithms were applied for tomographic reconstruction.
As a proof of concept, we selected six looks that are very likely subject to facade-roof layover.These six looks were chosen in a systematic way: we performed tomographic reconstruction on the single-master stack by using Tikhonov regularization (i.e., the 1 norm in the regularization term of  (36) is replaced by the 2 norm), extracted all the seven looks containing double scatterers, and discarded one look whose height distance is almost identical to the one of another look.These six looks are shown in Fig. 9, where the indices increase with decreasing estimated height distance from approximately 1.5 to 0.8 times the vertical Rayleigh resolution.This ordering agrees approximately with intuition under the assumption that the roof is entirely flat: the higher the scatterer is on the facade, the less is its height distance to the roof.
The estimated height profile is shown in Fig. 10 (#1-3) and 11 (#4-6), where we used the vertical Rayleigh resolution of the single-master stack (see Sec. V-A) for normalizing the x-axis.The height estimates are listed in Tab.III.In the single-master setting, NLS and L1RLS produced very similar height profiles, despite the occasional sporadic artifacts in the latter which are known to be an intrinsic problem of 1regularization.Moreover, the height estimates were identical after off-grid correction.In each case, the height estimate of the lower scatterer fits very well the Level 2 RaySAR simulation of facade.Overall, the multi-master results are consistent with the single-master ones, with deviations of height estimates typically of several decimeters.In the fake single-master setting, however, layover separation was only successful in the fifth case, presumably due to the high SNR (see the brightness of the look in Fig. 9).When the height distance is significantly larger than the vertical Rayleigh resolution (#1-2), both NLS and L1RLS could reconstruct double scatterers, but only the one with larger amplitude could pass model-order selection.When the height distance approaches the vertical Rayleigh resolution or becomes even smaller (#3, 4, 6), neither algorithm could reconstruct a second scatterer, and the height estimate of the single scatterer after off-grid correction is also arguably wrong.We are therefore convinced by this simple experiment that the conventional single-master approach, if applied to a multi-master stack, can be insufficient for layover separation.
Naturally, we also performed tomographic reconstruction for all the 594 looks within the building footprint in Fig. 8 (bottom).Tab.IV lists the overall runtime on a desktop with a quad-core Intel processor at 3.40 GHz and 16-GB RAM.Note that the periodical deformation model was only used in the single-master case, and the solution path of L1RLS or BiCRAM was sampled 11 times (see Sec. V-A).The height estimates of single and double scatterers are shown in Fig. 12-14 for the three categories, respectively.In the case of double scatterers, the higher one was plotted.The seemingly messy appearance in the left column is due to the fact that single scatterers originate from both facade and roof.In spite of this, the gradual color transition at the 30 PSs from far-to near-range agrees visually very well with the Level 1 ground truth in Fig. 8 (top).Tab.V lists the number of scatterers in each case.In the single-scatterer setting, NLS detected almost twice as many double scatterers as L1RLS.This is presumably due to a higher false positive rate: at 2 out of 30 PSs (fifth/second row from near range, and fifth/fifth column   from late azimuth on the 6 × 5 regular grid) double scatterers were detected, although there should only be single ones.The number of double scatterers in the multi-master case is in the same order as single-master L1RLS, and the ratio between the number of single and the one of double scatterers is also similar.We attribute the smaller number of single scatterers to the nonconvexity of the optimization problem.In particular, as Thm.2(b) suggests, a certain condition needs to be fulfilled for any nonzero solution of height profile estimate to exist at all, let alone whether an algorithm can provably recover it.In the fake single-master category, many fewer double scatterers were produced.This is presumably due to double scatterers being misdetected as single scatterers, which occurred 5 out of 6 times in the previous experiment (see Fig. 10 and 11).
The next subsection elucidates how we validated the height estimates with Level 1 and 2 ground truth data.

D. Validation
Since the height ground truth is limited to facade only (see Sec. V-B), the validation was focused on single scatterers by following two approaches: the first one uses 30 PSs, and the second one is based on extracted facade scatterers.
As already mentioned in Sec.V-A, the 30 PSs constituting the Level 1 ground truth in Fig. 8 (top) are caused by triple reflections on the building facade, and are located on    a regular grid of salient points.Due to the (almost) identical scattering geometry, these PSs should have similar SNRs and are therefore ideal for height estimate validation.In each of the six cases, single scatterers were correctly detected at all the 30 PSs-with the exception that double scatterers were misdetected by NLS in the single-master setting (see Sec. V-C).For this reason, the height estimate error could be evaluated straightforwardly.Fig. 15 shows the normalized histogram and Tab.VI lists some of its statistical parameters.As a reference, the SD and MAD of height estimate error of the PSI result are about 0.28 and 0.22 m, respectively [68].
In each of the three settings (single-master, multi-master and fake single-master), the respective two algorithms performed similarly and no significant difference is visible.A crosscomparison between the multi-master and fake single-master cases revealed the superiority of the former: its histogram is more centered around zero, and both its SD and MAD are slightly smaller.This is unsurprising since we already analyzed the implications of the single-look multi-master data model for single scatterers in Sec.II-B.At this point, we could confidently assert that it does make a difference in practice, albeit small, despite the longer (by approximately one order considering that the solution path of BiCRAM was sampled 11 times) processing time.Somewhat surprisingly, the multimaster result is also slightly better than the single-master one.We suspect that this is due to the complication of single-master tomographic processing by using the (imperfect) periodical deformation model, and the justified simplification in the multi-master case thanks to the small-baseline configuration (so that deformation-induced phase is mitigated via forming interferograms).
The second approach is based on all facade scatterers (Level 2 ground truth) in Fig. 8 (bottom), given that they

VI. CONCLUSION AND DISCUSSION
The previous sections provided new insights into singlelook multi-master SAR tomography.The single-look multimaster data model was established and two algorithms were developed within a common inversion framework.The first algorithm extends the conventional NLS to the single-look multi-master data model, and the second one uses bi-convex relaxation and alternating minimization.Extensive efforts were devoted to studying the nonconvex objective function of the NLS subproblem, and to experimenting with different acceleration techniques for ADMM-based algorithms.We demonstrated with the help of a real TerraSAR-X data set that the conventional single-master approach, if applied to a multi-master stack, can be insufficient for layover separation, even when the height distance between two scatterers is significantly larger than the vertical Rayleigh resolution.By means of a SAR imaging geodesy and simulation framework, we managed to generate two levels of height ground truth.The height estimates in each of the three settings were validated at either 30 PSs or hundreds of extracted facade scatterers.Overall, the multi-master approach performed slightly better, although it was computationally more demanding.
A special case of the general problem analyzed so far is single-look SAR tomography using only bistatic (or pursuit monostatic) interferograms.On the one hand, the advantages are that bistatic interferograms are (almost) APS-free, and the data model is still linear for any single scatterer whose reflectivity can be estimated up to a constant phase angle (7).On the other, the disadvantages are that, for double or multiple scatterers, bistatic interferograms are not motion-free (see Sec. II-B), and the data model is nonlinear (9).
An alternative way to formulate the problem in the bistatic setting is to parameterize APS without forming any interferogram.Let g and h be the bistatic observations of the master and slave scenes, respectively.We have essentially two data sets:     where φ ∈ R N is the APS vector.Under the sparsity or compressibility assumption of γ, one could consider the following problem: Inspired by PSI, another related problem is SAR tomography on edges.Let γ and θ represent the reflectivity profiles of two neighboring looks, and their phase-calibrated SLC measurements be denoted as g and h, respectively.Consider the following problem: which is bi-convex in γ and θ.Likewise, the rationale of g • h is to mitigate APS for neighboring looks.Alternatively, a parametric approach similar to (51) could also be considered.
in the kth iteration, where ρ ∈ R ++ is a penalty parameter.

APPENDIX B PROOF OF PROPOSITION 1
The proof uses the following minor result.
The rest of the proof follows via straightforward computations.Now we turn our attention to the proposition.
The choice of P can be divided into two cases depending on the value of φ.
Lemma 7.For any x := x R + jx I , the following equalities hold: Proof.The proof follows via straightforward computations.
Proof of Theorem 2.
For any x := x R + jx I = 0, where the second equality is given by Lemma 7. Since E(0) is Hermitian, we have

Theorem 2 .
Properties of the critical points of f (x).(a) 0 is a critical point: it is a local minimum if A H Diag(b)B + B H Diag(b)A ≺ 0, and a local maximum if A H Diag(b)B + B H Diag(b)A 0. (b) If there exists a nonzero critical point, then A H Diag(b)B + B H Diag(b)A ⊀ 0. (c) Suppose there exists a nonzero critical point z.Then (1) ∇ 2 f (z) is rank deficient.(2) There exist an infinite number of critical points in the form of z exp(jφ), φ ∈ R \ {0}.Each has the same objective function value as z, and its Hessian has the same definiteness.Proof.See Appendix C.
There exists a nonzero critical point if and only if (A • B) H b > 0. (c) Suppose there exists a nonzero critical point z.Then (1) ∇ 2 f (z) is positive semi-definite and rank deficient 2 .(2) There exist an infinite number of critical points in the form of z exp(jφ), φ ∈ R \ {0}.Each has the same objective function value as z, and its Hessian has the same definiteness.(3) z is a local minimum.Proof.See Appendix D. As a result, a nonzero local minimum exists if and only if (A • B) H b > 0. If this condition is satisfied, then there 2 Note that ∇ 2 f (z) ∈ R 2×2 by definition.

Fig. 3 .
Fig. 3. Negative logarithm of the NLS objective function (n = 1) with a circle of local maxima at the verge of the "crater".

Fig. 4 .
Fig. 4. Convergence curve of BiCRAM.The horizontal axis refers to the outer iterations in Alg. 2.

Fig. 6 .Fig. 7 .
Fig. 6.Eastern facade of the six-story TUM-Nordbau building in our region of interest [68].Left: in-situ photo.Right: 3-D facade model.The black shape corresponds to a metallic window.

Lemma 6 .
For any F, G ∈ R n×n and c, d ∈ R such that c 2 + d 2 = 1, the following equalities hold: Observe that for any a, b ∈ R such that a 2 + b 2 = 0, im a t e d H e ig h t [m ] im a t e d H e ig h t [m ] im a t e d H e ig h t [m ]

Fig. 17 .
Fig. 17.Scatter plot of simulated and estimated height of single scatterers using Level 2 height ground truth.Black: extracted facade scatterers.Gray: extracted non-facade scatterers.

TABLE I A
CLASSIFICATION OF TOMOGRAPHIC SAR ALGORITHMS (19)enote the gradient and Hessian of f , respectively, and r ∈ R ++ is the current trust region radius.By means of the Karush-Kuhn-Tucker (KKT) conditions for nonconvex problems, Nocedal and Wright [55, §4.3] divided(19)into several cases: in one case a one-dimensional (1-D) root-finding problem w.r.t. the dual variable is solved by using for example the Newton's method, while in the others the solutions are analytic. Snce the technical details are quite overwhelming, we do not intend to provide an exposition here.Interested readers are advised to refer to[55,  §4.3].It can be shown that the trust-region Newton's method converges to a critical point with high accuracy under general conditions[55, p. 92].By verifying the Cauchy-Riemann equations (e.g.,[57,  p. 50]

TABLE VI STATISTICS
OF HEIGHT ESTIMATE ERROR [M]: 30 PSS (LEVEL 1)