SAR Tomography at the Limit: Building Height Reconstruction Using Only 3-5 TanDEM-X Bistatic Interferograms

Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use more than twenty interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data where only a few acquisitions are available in average for each city. This work proposes a new SAR tomographic processing framework to work with those extremely small stacks, which integrates the non-local filtering into SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. Systematic comparison of our result with TanDEM-X raw digital elevation models (DEM) and airborne LiDAR data shows that the relative height accuracy of two third buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm paved the first step towards high quality large-scale 3-D urban mapping.


I. INTRODUCTION A. The TanDEM-X Mission
This work is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no.ERC-2016-StG-714087, acronym: So2Sat, www.so2sat.eu), the Helmholtz Association under the framework of the Young Investigators Group "SiPEO" (VH-NG-1018, www.sipeo.bgu.tum.de),Munich Aerospace e.V. Fakultät für Luft-und Raumfahrt, and the Bavaria California Technology Center (Project: Large-Scale Problems in Earth Observation).The authors thank the Gauss Centre for Supercomputing (GCS) e.V. for funding this project by providing computing time on the GCS Supercomputer SuperMUC at the Leibniz Supercomputing Centre (LRZ).
Y. Shi is with the Chair of Remote Sensing Technology, Technische Universität München (TUM), 80333 Munich, Germany (e-mail: yilei.shi@tum.de)R. Bamler is with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), 82234 Weßling, Germany and the Chair of Remote Sensing Technology, Technische Universität München (TUM), 80333 Munich, Germany (e-mail: richard.bamler@dlr.de)Y. Wang is with Signal Processing in Earth Observation, Technische Universität München (TUM), 80333 Munich, Germany (e-mail: yuanyuan.wang@dlr.de)X.X.Zhu is with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), 82234 Weßling, Germany and Signal Processing in Earth Observation, Technische Universität München (TUM), 80333 Munich, Germany (e-mail: xiaoxiang.zhu@dlr.de)(Correspondence: Xiao Xiang Zhu) T ANDEM-X satellite is a German civil and commercial high-resolution synthetic aperture radar (SAR) satellite which has almost identical configuration as its 'sister' TerraSAR-X satellite.Together with TerraSAR-X, they are aiming to provide a global high-resolution digital elevation model (DEM) [1].Both satellites use a spiral orbit constellation to fly in tight formation in order to acquire the image pair simultaneously, which significantly reduces the temporal decorrelation error and the atmospheric interference.Since its launch in 2010, TanDEM-X has been continuously providing high quality bistatic interferograms that are nearly free from deformation, atmosphere and temporal decorrelation.

B. SAR Tomography Techniques
Tomographic synthetic aperture radar (TomoSAR) is a cutting-edge SAR interferemetric technique that is capable of reconstructing the 3-D information of scatterers and retrieving the elevation profile.Among the many multi-baseline InSAR techniques, TomoSAR is the only one that strictly reconstructs the full reflectivity along the third dimension elevation.SAR tomography and its differential form (D-TomoSAR) have been extensively developed in last two decades [2] [3] [4] [5] [6] [7] [8] [9].They are excellent approaches for reconstructing the urban area and monitoring the deformation, especially when using high resolution data like TerraSAR-X [10] [11] or COSMO-Skymed [12].Compare to the classic multi-baseline InSAR algorithms, compressive sensing (CS) based methods [13] [14] can obtain extraordinary accuracy for TomoSAR reconstruction and show the super-resolution (SR) power, which is very important for urban areas, since layover is dominant.
Although TanDEM-X bistatic data has many advantages, there is only a limited number of acquisitions available for most areas.For a reliable reconstruction, SAR tomography usually requires fairly large interferometric stacks (> 20 images), because the variance of the estimates is asymptotically related to the product of SNR and the number of acquisitions.Therefore, it is not appropriate for the micro-stacks, which have limited number of interferograms [15].

C. The proposed framework
As mentioned above, the accuracy of 3-D reconstruction replies on the product of SNR and the number of measurements N .Since the motivation of our work is the large-scale arXiv:2003.07803v1[eess.IV] 17 Mar 2020 urban mapping, the data we adopted is TanDEM-X stripmap co-registered single look slant range complex (CoSSC), whose resolution is about 3.3 m in azimuth direction and 1.8 m in range direction.The typical number of available interferograms for most areas is 3 to 5 [16].In [17], the pixels with similar height are grouped for the joint sparsity estimation, which leads to an accurate inversion of TomoSAR using only six interferograms.Although the unprecedented result is obtained, the accurate geometric information is usually not available for most areas.Therefore, the feasible way to keep the required precision of the estimates is to increase the SNR.
Recent works [18] [19] [20] show that SNR can be dramatically increased by applying non-local filters to the TomoSAR processing for different sensors, such as airborne E-SAR, COSMO-Skymed and TerraSAR-X.In [18], different nonlocal filters have been adopted to improve the estimation of the covariance matrix for distributed scatterers, which leads to a better height estimation for simulated data and airborne SAR data.Ferraioli et al. [19] introduced the non-local filter and the total variation regularizer to improve the multi-baseline phase unwrapping process.In [20], it is shown that we can achieve a reasonable reconstruction using only seven interferograms and better super-resolution properties when the number of interferograms is relative low.
In this work, we extend the concept of non-local compressive sensing TomoSAR in [20] [21] [22] and propose a new framework of spaceborne multi-baseline SAR tomography with TanDEM-X bistatic micro-stacks, i.e. 3 to 5 interferograms.The framework includes non-local filtering, spectral estimation, model selection and robust height estimation.Since the different spectral estimators have different estimation accuracy and computational cost, we compared the estimation accuracy of different estimators with micro-stacks.
The demonstration of different TomoSAR inversion methods for a large-scale area has been shown in [23] [24] [11].Only a few works on the validation of single buildings were reported in [8] [10] [25].Therefore, the validation of the specified quality of the TomoSAR result at a larger scale, would be of considerable interests for the scientific and commercial users.We choose Munich city as a test site because of a high quality LiDAR reference available to us, and we propose a complete workflow to compare the TomoSAR point cloud [26] generated by the proposed framework, TanDEM-X DEM product, and LiDAR data.

D. Contribution of this work
The major contributions of this work are summarized as follows; • We make possible a new application of bistatic SAR data for global building height reconstruction.• We have pointed out that for pixel-wise multi-master TomoSAR the well-known system equation is no longer valid in the multi-scatterer case.• We have developed a framework for tomographic stacks with only 3 -5 interferograms.A systematic investigation on the estimation accuracy and super-resolution power for the micro-stacks have been carried out, which was never done before.
• We use 5 TanDEM-X bistatic data to demonstrate the proposed framework.A systematic validation for largescale TomoSAR reconstruction have been carried out and a method for comparing with other reference data is established.The results are quantitatively compared with LiDAR reference for more than 34,000 buildings.The paper is organized as follows: In section II, the nonlocal TomoSAR framework is introduced; In section III, the estimation accuracy of TomoSAR with small stacks has been systematically studied; The experiments using real data, is presented in section IV; In section V, the quantitative validation is carried out; Finally, conclusions are given in section V.

II. NON-LOCAL TOMOSAR FOR MULTI-MASTER INSAR
In this section, we introduce the non-local TomoSAR framework for multi-master multi-baseline InSAR configuration.For a fixed azimuth-range position, γ(s) represents the reflectivity profile along elevation s.The measurement g n , i.e. the complex value of the corresponding azimuth-range pixel, in the n th SAR image is then a sample of Γ(k) -the Fourier transform of γ(s), where the elevation wavenumber k is a scaled version of the sensors position b n projected on the cross-range-azimuth axis b||s : with Note that b n are no baselines, but the positions of the sensor w.r.t.some origin.In case of monostatic multi-temporal data stacks, a single master g 0 is chosen with b 0 and its phase is subtracted from all other acquisitions: g n g * 0 /|g 0 |.This operation renders the phase spatially smooth and is a prerequisite for spatial phase unwrapping, averaging and graph (network) processing.It does not reduce information, since the phase of any (master) acquisition is random.Note that the choice of the point b = 0 only defines the x-r-s coordinate system.This point need not necessarily be the master track position b 0 .However, b 0 = 0 is a mathematically convenient choice and is assumed in all conventional TomoSAR system model like in [27].All the equations in [27], however, are actually independent of this particular choice.Only in the special case b 0 = 0 the b n are identical to baselines.
Here we are dealing with stacks of bistatic acquisitions, i.e. with the multi-master case.From each of these acquisitions we get a master g n,m = Γ(k n ) taken at b master = b n and a slave g n,s = Γ(k n + ∆k n ) image taken at b slave = b n + ∆b n , where ∆b n is the bistatic baseline (which takes the effective positions of the transmit-receive phase center into account).If we used a standard, i.e. single-master, TomoSAR inversion algorithm, we would confuse ∆b n and b n .In the case of a single scatterer in γ(s), this misinterpretation would do no harm, because the Fourier transform of a single point has a constant magnitude and a linear phase.In order to determine the slope of the phase ramp we can take any two samples and divide their phase difference by the difference in wavenumbers (= baseline).This is no longer true for two or more scatterers.The example of two symmetric and equally strong scatterers makes this clear: Hence, acquisitions with the same baseline ∆b are different depending on where the two sensors were located along b.Every bistatic acquisition provides three pieces of information: the two magnitudes |Γ(k n )| and |Γ(k n + ∆k n )| as well as the phase difference ∠Γ(k n +∆k n )Γ * (k n ), i.e. we must normalize the phase by the respective master in every acquisition, in order to become unaffected by deformation and atmospheric delay.Spectral estimation based conventional TomoSAR inversion algorithms, however, require complex spectral samples at several wavenumbers, phase-normalized to a single master phase.In a current parallel work by one of the authors [28] it is shown that pixel-wise TomoSAR using multi-master acquisitions is a non-convex hard to solve problem.This is true for pixel-wise tomographic inversion or for point scatterers.The situation becomes different, though, once we talk about averages of pixels, i.e. estimates of expectation values.Let us assume Gaussian distributed scattering with a backscatter coefficient along elevation of Assuming further that γ(s) is white, its power spectral density is stationary and is the autocorrelation function of Γ(k), i.e. the Fourier transform of σ 0 (s) as a function of the baseline wavenumber ∆k : Instead of sampling the Fourier spectrum we sample its autocorrelation function by the bistatic data stack.Since this relationship is independent of k ∝ b because of stationarity, it makes no difference, where the two acquisitions have been taken, only their baseline ∆b n counts.In other words we can use standard TomoSAR inversion algorithms in this case.
In this paper we use nonlocal filtering to improve SNR for micro-stacks.These filters perform ensemble averages with number of looks in the order of tens to hundreds.Hence, we tend to the assumption that we work with reasonably good estimates of E {Γ(k n + ∆k n )Γ * (k)} and can use the bistatic interferograms for TomoSAR reconstruction.
By introducing a noise ε, the matrix notation of TomoSAR model can be formulated as: where g = [g 1 , g 2 , ..., g n ] T is vector notation of the complexvalued measurement with dimension N ×1, and X ∼ σ 0 (s l ) = E{|γ(s l )| 2 } is the expectation value of reflectivity profile along elevation uniformly sampled at s l (l = 1, 2, ..., L).R is a sensing matrix with the dimension N × L, where R nl = exp(−j∆k n s l ).

B. Non-Local Procedure
Since we have only limited number of acquisitions for largescale area, the SNR need to be dramatically increased in order to obtain the required accuracy.As shown in [20], nonlocal procedure is an efficient way to increase the SNR of interferograms without notable resolution distortion.The idea of patch-wise non-local means considers all the pixels s in the search window, when the patch with the central pixel s is similar to the patch with central pixel c, the value of s is selected for calculating the value of pixel c.The value of pixel c is estimated by using a weighted maximum likelihood estimator (WMLE).
where weights w(i s , j s ) can be calculated by using patchwise similarity mesurement [20].Assuming that we have two expressions g = (I 1 , I 2 , φ) and Θ = (ψ, µ, σ 2 ), where g denotes the complex-valued measurement.I 1 and I 2 are the instensity of two SAR images.φ is the interferometric phase.Θ is the true value of the parameters, where ψ is the noise-free interferometric phase, µ is the coherence magnitude, and σ 2 is the variance.The likelihood function p (g s |Θ) = p I 1,s , I 2,s , φ s |ψ, µ, σ 2 is adopted from [29] with following formualtion: N (.) denotes the non-local estimator, where N (g) = f ( Θ). Θ = ( ψ, μ, σ2 ) represents the parameters being estimated, where ψ is the estimated interferometric phase, μ stands for the coherence magnitude, and σ2 stands for the variance.
f ( Θ) is the maximum likelihood estimator and the estimated parameters can be formulated as The patch size and search window size are set to be 7 × 7 and 21 × 21 according to experimental study, which is also reported by other works [30] [31].Each pixel represents 2.17 m in azimuth and 1.36 m in range.

C. Spectral Estimation
After the non-local procedure, spectral estimation is applied.The most relevant spectral estimation algorithms, including singular value decomposition (SVD) [5] [7], compressive sensing (CS) are introduced in the following.
• SVD: • CS: where C εε is the noise covariance matrix, which is defined as: Under the assumption that the model errors are circular Gaussian distributed with zero mean, the noise covariance matrix is formulated as C εε = |σ ε | 2 I and |σ ε | 2 is the noise power level.C XX is the covariance matrix of the prior, if it is assumed to be white, i.e.C XX = I.
The choice of different combinations of spectral estimators depends on the required accuracy, the computational time and others.We follow the procedure proposed in [23].It consists of three parts: (1) an efficient low-order spectral estimation; (2) the discrimination of the number of scatterers; (3) an accurate high-order spectral estimation.The elevation profile is first estimated by an efficient low-order spectral estimator in order to discriminate the number of scatterers in one resolution cell.Then, CS-based approach is adopted for the pixel which has multiple scatterers.This method decreases the amount of pixels that need the L 1 minimization, which leads to reduce the computational cost.Furthermore, the rest of pixels can be efficiently solved by randomized blockwise proximal gradient method [24].

D. Model Selection
The abovementioned spectral estimators retrieve a nonparametric reflectivity profile.Since our data is in urban area, we assume only a few dominant scatterers exist along the reflectivity profile.Therefore, the number of scatterers K is estimated by a model order selection algortihm as well as their elevation in one azimuth-range pixel [7].The estimator can be expressed as follows.
where C(k) is a model complexity penalty term which avoids more complicated model overfitting the observed data.The classical penalized likelihood criteria are the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the minimum description length (MDL) principle [32].
As mentioned in [7], the criteria of model order selection has to be chosen according to the experiments for the particular situation, because it is difficult to remove the bias of the selection.

E. Robust Height Estimation
To tackle the possible remaining outliers in the height estimates, the final height will be fused from the result of multiple neighbouring pixels as a post-processing.But instead of simple averaging, the height will be adjusted robustly using an M-estimator.Instead of minimizing the sum of squared residuals in averaging, M-estimator minimizes the sum of a customized function ρ ( ) of the residuals: where ŝi is the elevation estimates of the ith neighbouring pixel.It is shown that the close-formed solution of Eq. ( 16) is simply a weighted averaging of the heights of the neighbouring pixels [33].The weighting function can be expressed as follows, if the derivative of ρ (x) exists.
The robust estimated height can be written as follows: where ĥ = ŝ • sin θ, and θ is incident angle.The choice of the weighting function depends on the distribution of the heights.Without prior knowledge of the distribution, promising robust weighting functions are Tukey's biweight or tdistributed weighting [33].For instance, the formulation of Tukey's biweight loss function can be written as: and the weighting function can be formulated as:

III. ESTIMATION ACCURACY OF TOMOSAR WITH SMALL
STACKS This section will discuss the theoretical 3-D reconstruction accuracy of a micro-stack with 3-5 interferograms.The estimation accuracy of TomoSAR has been systematically investigated.It is exhaustively shown in [15] that the elevation estimation accuracy and SR power depend asymptotically on the multiplication N • SNR.In this section, we investigate the estimation accuracy of TomoSAR with the extremely small number of interferograms, which is 3 to 5.

A. The Lower Bound for Micro-stacks
In the case of pixel-wise TomoSAR inversion, i.e. without spatial averaging, each of our N bistatic pairs contain three pieces of information, as mentioned before.If we want to reconstruct elevation profiles containing M discrete scatterers we need to infer 3M parameters, i.e. elevation, magnitude and phase for each scatterer.Hence an absolute lower bound of the micro-stack size is N ≥ M .Distributed scatterers, on the other hand, are characterized by only two parameters each: elevation and backscatter coefficient.Likewise each interferogram provides only two parameters, magnitude and phase (difference).Since our goal is 3-D reconstruction based on bistatic data, we disregard motion-induced phase here.Hence, also in this case the absolute lower limit is N ≥ M .This limit is only a necessary condition, however not sufficient from the robustness point of view, because of ambiguities in the inversion cost functions.
For 3-D urban mapping the single and double scattering cases are the dominant ones.We investigate the cases N = 3 − 5 in this paper, because these are close to the mentioned limits and are relevant for TanDEM-X.

B. CRLB
It is demonstrated in [7] that the Cramer-Rao lower bound (CRLB) of the elevation estimates for single scatterer can be expressed as: where σ b is the standard deviation of the baseline distribution.
N is the number of interferograms, and SNR is the signal-tonoise ratio.
For the double scatterers' case, the CRLB can be written as: where σ sq,0 represents the CRLB on the elevation estimation of the qth scatterer without the interference with the others.c 0 is the correction factor of the interference for the scatterers, which are closely located [15].It is nearly free from N and SNR, which can be written as: where ∆ϕ is the phase difference of the two scatterers.κ is the normalized distance between two scatterers (defined in next section).Since ∆ϕ is a random variable, the approximated formulation of c 0 can be calculated by integrating the variances over ∆ϕ.
A note on the baseline distribution is worth mentioning: all the Eqs.( 21)-( 24) are satisfied with large stacks.In the micro-stacks with only 3 -5 acquisitions, the baseline distribution may be unfavorable, even if the baseline spread σ b is acceptable.For example, if two baselines were very similar, the information content would be reduced.It is therefore desirable to have the baselines possibly statistically uniformly distributed.

C. Monte Carlo Simulations
In this section, we compare different spectral estimators using simulated data.Two cases were carried out.The first case considers only a single scatterer in the interest of exploring the effect of N and SNR on the estimation accuracy for micro-stacks and the performance of different estimators.The second case considers double scatterers to investigate the estimation accuracy and the super-resolution power for different estimators.The inherent (Rayleigh) elevation resolution ρ s is inversely proportional to the maximal elevation aperture ∆b [15].
The normalized distance is defined as For the first test case, only one scatterer is placed at s = 0, and the SNR is in the range between 0 and 30 dB.For each N • SNR value, 100 different baseline distributions were generated.We carried out a Monte Carlo simulation for each baseline distribution with 10,000 realizations.Afterwards, the CRLB was evaluated by averaging the value of 100 different baselines.Fig. 3 (a) shows a performance comparison between SVD, and CS on simulated data with five acquisitions for a single scatterer.X-axis presents N • SNR in dB.Y-axis is the normalized CRLB σ s /ρ s .As one can see, both approaches have similar estimation accuracy.They are asymptotically towards the CRLB and collapse it when N • SNR is large.More interestingly is when N • SNR is fixed, leaving N as a variable.Fig. 3 (b) presents the estimation accuracy of SVD with N = 3, 4, 5.It shows that the accuracy when N = 3 is the smallest.This indicates that SNR carries more weight than N on the estimation accuracy when N is very small.
With the 3-D reconstruction accuracy of single scatterer clearly analyzed, we switch to the double scatterers case.In the simulation, the elevation of one scatterer is fixed at 0. the normalized elevation of the other scatterer is increased from 0.1 to 1.5, in order to mimic the layover of a ground layer and a facade layer.The number of acquisitions is set to N = 3−5, same as the first simulation.SNR is set to be 10 dB, since the SNR of TanDEM-X bistatic data is usually higher than this value in urban area [34].
The Monte Carlo simulation result is shown in Fig. 4. The x-axis represents the true normalized elevation distance κ of the simulated facade and ground layers.Y-axis is the estimated normalized elevation distance κ of the simulated facade and ground layers.The two solid lines in the Fig. 4 (a) (b) (c) represent the true position of the building facade and the ground, respectively.The dashed lines imply the true position plus and minus the CRLB.The blue bar and dot imply the standard deviation and the mean of the estimated elevation of the facade scatterers, whereas the red ones represent those of the ground scatterers.The green dot indicates that the detection rate of double scatterers is below 5% and denotes the estimated result of the single scatterer.Fig. 4 (b) (c) show the estimated results by SVD and CS, respectively.
As one can see in Fig. 4 (b) (c), the result of SVD has larger bias and slightly bigger standard deviation than CS.Note that, comparing to SVD, CS can give the better result, not only the accuracy of the estimation but also the super-resolution power.As one can see that the SVD has scarcely no super-resolution power, which can only distinguish double scatterers tile one Rayleigh resolution ρ s .In contrast, CS can achieve until 0.6 ρ s .

IV. PRACTICAL DEMONSTRATION A. Data Description
We make use of a stack of five co-registered TanDEM-X bistatic interferograms to evaluate the proposed algorithm.The dataset is over Munich, Germany, whose slant range resolution is 1.8 m and the azimuth resolution is 3.3 m.The images were acquired from July 2016 to April 2017.The most pertinent parameters of a TanDEM-X bistatic stripe map acquisition of Munich are listed in Table I and Table II.All the preprocessing steps, like deramping, are standard that are known from bistatic forest tomography.For interested readers, please refer to [2].

B. Visual Comparison with TanDEM-X raw DEM
In this work, the TanDEM-X raw DEM is adopted for visual comparison with TomoSAR point clouds of the test area, which is formed by two TanDEM-X bistatic acquisitions using the Integrated TanDEM-X Processor (ITP).
A top view of the reconstructed point cloud of TomoSAR is shown in Fig. 5 (a).The black regions in the figure is where the pixels are not coherent.The corresponding area of TanDEM-X raw DEM is presented in Fig. 5 (b) as a comparison.It is clear that the result of TomoSAR point cloud preserves more detailed building structures.The road layer is also better represented in the TomoSAR result as well.In Fig. 5 (b), the flat ground surface are well reconstructed.But when it comes to complex or high-rise buildings, their accuracy is compromised.For instance, the building of European bureau of the patent in the bottom right (red color) along the Isar river.A close view to this building can be seen in Fig. 6.Due to the complex building structure, as well as the multilooking processing, the TanDEM-X raw DEM merges several buildings together and exhibits lower accuracy on the height of the buildings.
As another example, Fig. 7  Since the TomoSAR point cloud is with respect to a reference point that was chosen during the TomoSAR processing, its location is not with respect to a geo-coordinate system.We coregistrated the point cloud of TomoSAR with the DEM and the LiDAR point cloud.In addition, in order to compare point clouds with DEM, we rasterize the two point clouds.These preparing steps are briefly explained in this section.

A. Geocoding
Since the result of TomoSAR inversion is a 3-D point cloud in the range-azimuth coordinate, the first step is to transform the result to Universal Transverse Mercator (UTM) coordinate with the Range-Doppler approach [36].

B. Coregistration of different point clouds
Consequently, when the TomoSAR point cloud is transformed to a UTM coordinate, its position may differ from the ground truth since the height of the reference point is unknown.Hence, the alignment of different point clouds is necessary.The most popular 3-D point cloud registration algorithm is iterative closest points (ICP) approach [37].
The performacne of ICP depends on the initial alignment.Hence, a coarse alignment is adopted before applying ICP, which includes three steps: (1) the edge image is extracted by an edge detector, such as Sobel algorithm [38]; (2) the horizontal coregistration of two edge images is using crosscorrelation of two edge images; (3) the vertical coregistration is using cross-correlation of the two height histograms.After the coarse alignment, ICP can be applied for the fine alignment [39].

C. Object-based raster data generation
The direct comparison of TomoSAR and LiDAR points is not feasible, as the central position of two corresponding points (one TomoSAR and one LiDAR point) and the footprint of the points are differing.Consequently, for comparing both data, an object-based raster need to be generated by using geographic information system GIS data.

D. Comparison of individual structure
In order to evaluate the estimation accuracy, nine test sites with high average SNR have been chosen for individual quantitative comparison.From Tab.III we can see that the height differences between TomoSAR result and LiDAR data are within one meter and the height differences between TanDEM-X DEM product and LiDAR data vary from 2.5 m to 8.5 m.Similar performance is shown in standard deviation, for NL-TomoSAR is up to 1.4 m and for TanDEM-X DEM is up to 8.4 m.

E. Average accuracy
In order to have an assessment of the overall accuracy in a city scale, we compared all the 36,499 buildings in the area with the LiDAR point cloud.38.7% buildings are within 1 m accuracy.62.8% are within 2 m accuracy.A detailed distribution of accuracy is listed in Tab.IV.However, the two datasets (TanDEM-X CoSSC and LiDAR) were acquired at different time.It is almost certain that changes happened during the period.Therefore, in order to obtain a more realistic assessment, we truncated the distribution of height difference at ±15 m.The truncated histogram can be seen in Fig. 9. 34,054 buildings remains after the truncation.Their overall standard deviation is 1.96 m.

VI. CONCLUSION
A new SAR tomographic inversion framework tailored for very limited number of measurements is proposed in this paper.A systematic investigation of the estimation accuracy of TomoSAR with a micro-stacks is carried out using simulated data.Our experiments show that SVD and CS-based methods have almost identical performance on the estimation of single scatterer and the SNR plays a more important role than N for the estimation accuracy, when N is small.For the estimation  of double scatterers, CS-based approach outperforms the other spectral estimators.Experiments using TanDEM-X bistatic data shows the relative height accuracy of 2 m can be achieved in large scale.Thus it demonstrates the proposed framework being a promising solution for high quality large-scale 3-D urban mapping.

Fig. 4 .
Fig. 4. Monte Carlo simulations of double scatterer with different normalized distances: κ ∈ [0.1, 1.5] and SNR = 10 dB.X-axis represents normalized true distance κ of simulated facade and ground.Y-axis is normalized estimated distance κ of simulated facade and ground.The blue dot marker denotes the estimated location of facade and the error bar indicates the standard deviation of the estimates, whereas the red dot marker represents the estimated location of ground.The green dot suggests that detection rate of double scatterers is below 5% and denotes the estimated result of single scatterer.(a) Illustration (b) SVD (c) CS.

Fig. 5 .
Fig. 5. Visual comparison of NL-TomoSAR point clouds and TanDEM-X DEM over Munich, Germany.Color code: 565 m (blue) 596 m (red), scene size: 15 km × 9 km, north = top.The voilet bounding box indicates the region of interest (ROI) over the area of European bureau of patent and the white bounding box indicates the ROI near Munich central station.(a) Point Clouds generated by NL-TomoSAR with five interferograms.(b) TanDEM-X DEM.

Fig. 6 .Fig. 7 .
Fig. 6.Visual comparison of NL-TomoSAR point clouds and TanDEM-X DEM, close-up 3-D view over the area of European bureau of patent.(a) TomoSAR point clouds.(b) TanDEM-X DEM.
shows the visual comparison over the area around Munich central station.It is clear that NL-TomoSAR result can show more detailed structures, such as the bridge, the central station, and roads.V. QUANTITATIVE VALIDATION In this section, we have quantitatively compared the To-moSAR point clouds with TanDEM-X raw DEM, as well as a much more precise LiDAR reference.The LiDAR dataset of Munich is provided by Bavarian State Office for Survey and Geoinformation with ten centimeter accuracy [35].

Fig. 8
shows the optical images of nine test sites for quantitative comparison of NL-TomoSAR point clouds and TanDEM-X DEM.They are (1) Munich central station, (2) European bureau of patent, (3) Technical University of Munich, (4) A railway signal light stand near Hirschgarten, (5) A train repair garage near Hirschgarten, (6) A residential building between two bridges (Hackerbrücke and Donnersbergebrücke), (7) Munich University of Applied Sciences, (8) A residential building near Lowenbrau beer company and (9) Karstadt (shopping mall).The summary of the results is shown in Tab.III.

Fig. 8 .
Fig. 8. Optical images of nine test sites for quantitative comparison of NL-TomoSAR point clouds and TanDEM-X DEM.(a) Munich central station.(b) European bureau of patent.(c) Technical University of Munich.(d) Railway signal light stand.(e) Train repair garage.(f) Residential building between two bridges.(g) Munich University of Applied Sciences.(h) Residential building near Lowenbrau beer company.(i) Karstadt (shopping mall).

Fig. 9 .
Fig. 9. Histogram of height differences of structures in the whole Munich area.

TABLE III STATISTICS
OF QUANTITATIVE COMPARISON OF NINE TEST STRUCTURES.FIRST COLUMN SHOWS THE NUMBER OF EACH STRUCTURE.SECOND COLUMN IMPLIES THE SOURCE OF EACH RESULT, I.E., T (TOMOSAR), L (LIDAR), D (DEM).THIRD AND FOURTH COLUMNS PRESENT THE STATISTICS (MIN, MAX, MEAN AND STANDARD DEVIATION) OF SAMPLE POINTS AT TOP LAYER AND BOTTOM LAYER.FIFTH COLUMN DEMONSTRATES THE RELATIVE HEIGHT OF EACH STRUCTURE, WHICH IS CALCULATED BY USING THE MEAN VALUE OF TOP LAYER MINUS THE MEAN VALUE OF BOTTOM LAYER.SIXTH COLUMN SHOWS THE RELATIVE HEIGHT DIFFERENCE BETWEEN TOMOSAR POINT CLOUDS AND LIDAR DATA, AS WELL AS BETWEEN TANDEM-X RAW DEM AND LIDAR DATA.