γ -Net: Superresolving SAR Tomographic Inversion via Deep Learning

—(This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.) Synthetic aperture radar tomography (TomoSAR) has been extensively employed in 3-D reconstruction in dense urban areas using high-resolution SAR acquisitions. Compressive sensing (CS)-based algorithms are generally considered as the state of the art in super-resolving TomoSAR, in particular in the single look case. This superior performance comes at the cost of extra computational burdens, because of the sparse reconstruction, which cannot be solved analytically and we need to employ computationally expensive iterative solvers. In this paper, we propose a novel deep learning-based super-resolving TomoSAR inversion approach, γ - Net, to tackle this challenge. γ -Net adopts advanced complex-valued learned iterative shrinkage thresholding algorithm (CV-LISTA) to mimic the iterative optimization step in sparse reconstruction. Simulations show the height estimate from a well-trained γ -Net approaches the Cram´er-Rao lower bound while improving the computational efﬁciency by 1 to 2 orders of magnitude comparing to the ﬁrst-order CS-based methods. It also shows no degradation in the super-resolution power comparing to the state-of-the-art second-order TomoSAR solvers, which are much more computationally expensive than the ﬁrst-order methods. Speciﬁcally, γ -Net reaches more than 90% detection rate in moderate super-resolving cases at 25 measurements at 6dB SNR. Moreover, simulation at limited baselines demonstrates that the proposed algorithm outperforms the second-order CS-based method by a fair margin. Test on real TerraSAR-X data with just 6 interferograms also shows high-quality 3-D reconstruction with high-density detected double scatterers.


I. INTRODUCTION
S YNTHETIC aperture radar (SAR) tomography (To- moSAR) [1] has been widely employed for large-scale 3-D urban mapping.It utilizes a stack of SAR acquisitions to reconstruct the reflectivity profile γ along the elevation direction for every azimuth-range pixel.In urban areas, there are usually only a few significant scatterers overlaid in a resolution cell along the elevation direction.Based on this fact, compressive sensing (CS)-based sparse reconstruction algorithms [2], [3], [4] were introduced to TomoSAR inversion so that we can best unleash the potential of high-resolution SAR data like TerraSAR-X in urban areas.[5], [6] presented the first simulation of CS TomoSAR, [6] presented the first real data example and [7] proved the super-resolution power of CS for TomoSAR inversion.In recent years, different CSbased methods for solving TomoSAR inversion have been extensively studied, such as SL1MMER [8], TSVD-based CS [9] and ADMM-based L 1 algorithm [10].These CS-based algorithms show superiority in super-resolution capability as well as elevation estimate accuracy over the conventional L 2 regularization methods.However, CS-based algorithms usually suffer from high computational expense and are more challenging to be extended to large-scale processing.An efficient approach was proposed in [11] to address this issue, which is an integration of persistent scatterer interferometry (PSI) and "SL1MMER".This approach speeds up the processing by pre-classifying the pixels and reducing the percentage of pixels that require SL1MMER for sparse reconstruction.Nevertheless, it did not boost the TomoSAR inversion fundamentally.The same authors of [11] also proposed a datadriven method [12], which is based on the CAESAR algorithm [13].It applies kernel principle component analysis (KPCA) to separate the contribution of individual scatterers before inversion, thus reducing the computational cost logarithmically.Although these algorithms brings a perspective of data-driven approaches in TomoSAR, they still do not strictly solve the SAR tomographic inversion.Their super-resolution capability are also not investigated.Therefore, there has not been a fully data-driven TomoSAR algorithm to date.Hence, we would like to explore the potential of modern data science algorithm such as deep learning, for TomoSAR in this paper.

A. Related work
Recently, deep learning has rapidly developed and been extensively applied in various fields of remote sensing [14], including SAR data processing [15], thanks to its strong learning power.In particular, a deep neural network can act as an effective nonlinear function and is capable of representing many complicated mathematical models including the CS problems [16].Several recent studies [17] [18] [19] have documented the application of deep neural networks in solving sparse reconstruction related problems in signal processing and remote arXiv:2112.04211v1[eess.SP] 8 Dec 2021 sensing, i.e. 3-D millimeter-wave sparse imaging and 3-D microwave reconstruction.Motivated by this fact, community started to investigate TomoSAR inversion algorithms based on deep learning and their application since a few years ago.[20] proposed a method to utilize neural networks to detect single scatterer and estimate the corresponding elevation.In [20], To-moSAR inversion was treated as a typical classification problem to detect single scatterer with the classes indicating all the possible discretized positions within the elevation extension of the illuminated scene.Because of its problem formulation, this method cannot be employed in true SAR 3-D imaging, i.e., layover separation.An efficient line spectral estimation algorithm based on deep neural networks was proposed in [21] and applied to tackle the TomoSAR inversion.Experiment results in [21] showed that the method can separate overlaid scatterers and achieves moderate reconstruction performance, whereas the super-resolution power of the proposed method was not systematically analyzed.More recently, a novel superresolving TomoSAR imaging framework based on CS and deep neural networks was proposed in [22].It employed CSbased algorithms for preliminary reconstruction and split the elevation range to several subregions with spatial filters.Then a group of deep neural networks based regression models were trained and applied to each subregion to achieve final super-resolution reconstruction.This method was shown to have unprecedented super-resolution capability.However, the drawback of the proposed algorithm is also obvious.First, the computational complexity of the proposed method is of same order of magnitude to other CS-based algorithms, although the authors increased the sampling distance between two neighbor discrete grid point.The second drawback is that the strong super-resolution power is attributed to adequate training samples, whereas it is arduous to simulate data that imitate the real scattering scenario of high fidelity.Hence, strong overfitting to the training data is expected.

B. Contribution of this paper
The aim of this paper is to introduce a computationally efficient and generic TomoSAR algorithm based on deep learning and provide a systematic analysis of its super-resolution power.To this end, we propose a deep learning based approach to address super-resolving TomoSAR inversion.We unroll iterative shrinkage thresholding algorithm (ISTA) as a complex-valued feedforward neural network with side-connection, named as γ-Net.γ-Net could be trained using data simulated by spatial baselines of given stacked SAR acquisitions.Once welltrained, γ-Net can be directly used for further inference.The main contributions of this paper are listed as follows: 1) We are the first to introduce a deep unfolded neural network called γ-Net to solve super-resolving TomoSAR inversion.We improved the conventional soft-thresholding function by the piecewise linear function to mitigate the loss of information caused by the learning architecture.Simulations indicate that the piecewise linear function contributes to improving the convergence rate and reducing reconstruction error.2) We are the first to perform a systematic evaluation of a deep learning-based TomoSAR inversion algorithm.
We investigated the generalization ability w.r.t amplitude ratio, phase difference of the interfering scatterers, superresolution power and elevation estimation accuracy of the proposed algorithm, i.e. γ-Net.Experiments demonstrate that γ-Net not only approaches almost the same performance in nominal condition comparing to the state of the art but also outperforms the state of the art at limited number of measurements.3) We carried out rigorous analysis of algorithm complexity and proved that γ-Net improves the computational efficiency by 1 to 2 orders of magnitude comparing to first-order CS-based methods and shows no degradation in super-resolution power, nor in estimation accuracy comparing to second-order CS-based methods, which are much more computationally expensive than first-order methods.Further time consumption comparison to the second-order CS-based method establishes the superiority of γ-Net in computational efficiency and evidences that γ-Net is able to realize large-scale super-resolving TomoSAR processing, whereas second-order CS-based solvers can only be applied to small laboratory samples.The remainder of the paper is outlined as follows: In section II, the TomoSAR imaging model and inversion are briefly introduced.Section III provides an overview on the formulation of γ-Net.Results of systematic evaluation, using simulated and real data, are presented and discussed in section IV and V. Finally, the conclusion of this paper is drawn in section VI.Firstly, we would like to introduce the TomoSAR imaging model (see Fig. 1).For a single SAR acquisition at aperture position b n , the complex-valued measurement at an azimuthrange pixel for the nth acquisition is the integral of the reflected signal along the elevation direction and can be expressed as follows:

II. SAR IMAGING MODEL AND PROBLEM FORMULATION
where γ(s) depicts the reflectivity profiles along the elevation direction.ξ n = −2b n /(λr) denotes the elevation frequency.λ and r refer to the wavelength and the range, respectively.By discretizing the reflectivity profile along the elevation direction s, the approximated system model reads where L is the number of the discrete elevation indices and δs = ∆s/(L − 1) is the sampling distance with ∆s depicting the whole extent of the reflectivity profile along the elevation direction.In the presence of noise ε, the discrete TomoSAR imaging model can be expressed as follows: where g ∈ C N ×1 is the complex-valued SAR measurement vector, R ∈ C N ×L is the steering matrix with R nl = exp (−j2πξ n s l ), and γ ∈ C L×1 denotes the discrete reflectivity profile vector.TomoSAR inversion is aimed at retrieving the reflectivity profile for each range-azimuth cell, then estimating the corresponding scatterering parameters such as the number of scatterers and their elevation and reflectivity.For TomoSAR reconstruction in urban areas, it is shown in [7] that there are rarely more than a few (0-4) scatterers overlaid along the elevation direction in each resolution unit, namely, the reflected signal along the elevation direction is sufficiently sparse.The ideal sparse solution of γ is obtained by solving (3) with the L 0 -norm regularization, which is, however, a NP-hard problem.For our application, it is shown in [2], [3], [4] [7] that the L 0 -norm minimization can be approximate by the L 1 -norm minimization, which can be expressed as follows: where λ is a regularization parameter balancing the sparsity and data-fitting terms.It should be adjusted according to the noise level as well as the desired sparsity level.The choice of a proper λ is described in great detail in [23].The L 2 -L 1 mixed norm minimization (4) is also known as basis pursuit denoising (BPDN) [23] and can be formulated as least absolute shrinkage and selection operator (LASSO) in some condition.Conventional solvers for (4) are either first-or second-order methods.First-order methods are typically based on linear approximations, such as ISTA [24], ADMM [25].An example for the second-order methods is the primal-dual interior-point method (PDIPM) [26].Second-order methods often suffer from high computational cost, thus impeding their application in large-scale processing.
III. TOMOSAR INVERSION VIA γ-NET A. Background of complex-valued ISTA ISTA [24] is a popular method to solve the L 2 -L 1 mix norm minimization.Each iteration of ISTA is defined by where γ0 = 0, β is the stepsize, η st is the soft-thresholding function applied to each element of γi , and θ is the threshold in the soft-thresholding function.The complex-valued version of the soft-thresholding function η st is defined by where j is the imaginary number.In each iteration of ISTA, the estimate is firstly optimized via gradient decent and then the soft-thresholding function is applied to prune the elements with small magnitude, thus promoting the sparsity of the final estimate.

B. Complex-valued LISTA formulation for TomoSAR
We can rewrite equation ( 5) as the following form: where If we regard the soft-thresholding function in (7) as an activation function, we find that (7) is the basic form of the i th layer of a recurrent neural network (RNN).Therefore, ISTA can be viewed as a RNN illustrated in Fig. 2.
Fig. 2: RNN structure of ISTA by viewing an iteration of ISTA as a layer of the RNN.
Inspired by the connection between ISTA and RNN, a learning based model named Learned ISTA (LISTA) was proposed in [27].Fig 3 demonstrates the learning architecture of a K-layer LISTA, which unrolls the RNN and truncates it into K iterations, thus leading to a K-layer side-connected feedforward neural network.The major difference between ISTA and LISTA is that the weight matrices W i 1 , W i 2 as well as the threshold θ i in each layer of LISTA are not predetermined.Those parameters are learned in the LISTA neural network from training data.
The loss function of LISTA over the training data {(g i , γ i )} T i=1 is the mean square error (MSE) loss described as follows.
where T denotes the number of samples in the training data and Ψ = {W 1 , W 2 , θ} is the set of free parameters to be learned.Many recent works [27], [28], [29], [30], [31] have demonstrated that LISTA is able to achieve the same estimation accuracy within two to three order-of-magnitude fewer iterations than the original ISTA.Moreover, empirical results show that LISTA has better generalization ability.However, to apply LISTA to solve TomoSAR inversion, we need to extend LISTA to complex-valued domain.CV-LISTA shares the same learning architecture as LISTA, except that each neuron in CV-LISTA has two channels, which refer to the real and imaginary part of a complex number, respectively.We applied the following adaptions to the equation ( 7) where with j = 1, 2 and (•) and (•) denote the real and imaginary operators, respectively.

C. γ-Net formulation for TomoSAR
Through our research and experiments, we discovered a few drawbacks of CV-LISTA applying to TomoSAR and proposed several novel improvements in γ-Net.The improvements are mainly three-fold.First, the W 1 and W 2 matrices in the aforementioned CV-LISTA are highly correlated.Hence, we introduced weight coupling structure to reduce redundant trainable parameters in γ-Net.Second, we tried to use the acceleration technique, support selection, which was originally developed for LASSO to boost the convergence in γ-Net.Last, we replaced the conventional soft-thresholding function by the piecewise linear function because the conventional softthresholding function causes information loss, which leads to large reconstruction error and decreases the convergence rate of the model.We will discuss the improvements in detail in the following paragraphs.
Weight coupling: Instead of training γ-Net as pure "blackbox" networks, we simplify the CV-LISTA and propose γ-Net by exploiting the dependency among the trainable weights.Details can be found in [32] that the weights to be learned in each layer asymptotically satisfy the following partial weight coupling structure: By employing the partial weight coupling structure, we can simplify the i th layer of γ-Net to: where ( (W i ), (W i ), θ i ) are the parameters to be learned in the i th layer, and the trainable weight W i is initialized using the system measurement matrix R with W i = βR H .The coupled structure contributes to eliminating the number of free parameters to be trained, thus accelerating the training procedure significantly.Theoretically speaking, (11) can only be satisfied for deep layers.However, extensive simulations in [32] demonstrate that the application of the partial weight coupling structure to every layer will not degrade the theoretical and experimental performance.
Support selection: In addition to the application of the weight coupling structure, we introduce a special thresholding scheme in γ-Net, called support selection, which is inspired by "kicking" in linearized Bregman iteration [33].Meaning, we will select a certain percentage of entries with largest magnitude at each layer of γ-Net before the shrinkage step.Hereafter, the selected part will be trusted as "true support" and directly fed into the following layer, bypassing the shrinkage step.The remaining entries will go through the shrinkage step as usual.Assuming that ρ i percentage of entries are trusted in the i th layer, the support selection can be formally defined as: where S ρ i (γ) contains the entries with the ρ i largest magnitudes.It is worth mentioning that the percentage ρ i is a hyperparameter that requires manual tuning.When we apply the support selection to γ-Net, then ( 12) is modified as: Simulation experiments in [32] support that introducing the support selection on the one hand improves the convergence rate both theoretically and empirically, on the other hand, it contributes to reducing the recovery error, thus improving the estimation accuracy.
Piecewise linear thresholding function: The conventional soft-thresholding function, simply prunes elements with small magnitude to zero, which is very likely to result in information loss.In order to maintain useful information as much as possible and execute the shrinkage step in the meanwhile, we replace the soft-thresholding function by the piecewise linear function η pwl (γ, θ) in γ-Net, which is defined as: To clarify, the symbol j in the equation ( 15) refers to the imaginary number.Fig. 4 compares both functions.As can be seen, instead of pruning elements with small magnitude, the piecewise linear function only down scale them.Hence, it mitigates the information loss.However, it leads to the consequence that the final output of γ-Net being not strictly sparse.Most elements of the final output are not driven to zero strictly but to some extremely small values.Therefore, an additional postprocessing step for cleaning the elements with extremely small magnitude is necessary when we employ the piecewise linear function.
Fig. 5 compares γ-Net performance in term of the normalized mean square error (NMSE) under the two shrinkage functions.To clarify, the performance of γ-Net with different shrinkage functions was verified on a set of noise free data so that the results reflect the ideal performance.The NMSE is defined as: From this figure, it can be seen that γ-Net with the piecewise linear function achieves lower NMSE at the same number of layers.In another word, the piecewise linear function improves the estimation accuracy of γ-Net or the convergence rate.Specifically, γ-Net with the piecewise linear function requires only about 12 layers to achieve convergence.However, it is obvious that much more layers are required when the conventional soft-thresholding function is employed.

D. Algorithm summary
In order to achieve super-resolution ability and high elevation estimation accuracy, it is usually required to sample the elevation range much denser than the elevation resolution unit, thus rendering the steering matrix R severely overcomplete, reducing its restricted isometric property (RIP) and increasing its coherence.The violation of RIP and incoherence introduces outliers to estimates of the reflectivity profile γ [34].Additionally, outliers will be caused by the noise interference as well.Hence, we need further perform model order selection [35] to suppress the undesired outliers and estimate the number and location of scatterers precisely, which are typical steps in TomoSAR.The proposed super-resolving TomoSAR inversion algorithm is a combination of γ-Net and model order selection, and re-estimation.The basic workflow of the proposed algorithm is show in Algorithm 1.The model order selection is conducted based on Bayesian Information Criterion (BIC) [36].

IV. PERFORMANCE EVALUATION A. Data simulation and training
Simulation setup: We simulate the data using a setting similar to [8].Specifically, SAR measurements with 25 spatial baselines are simulated.The spatial baselines are regularly distributed in the range from -135m to 135m, thus leading to a Rayleigh resolution of around 42m.We simulated ca. 4 million training samples, half of which are single scatterer and the others are overlaid double scatterers.The simulation details of single and double scatterers are listed below.We randomized many parameters in the simulation, in order to make the simulation more realistic.
• Single scatterers: for a single scatterer, the scatterering coefficient is a complex number γ = A • exp (jφ), with • Double scatterers: For double scatterers, the generation of the two single scatterers is identical to the previous step, i.e. for each individual scatterer, the phase follows an uniform distribution φ ∼ U (0, 2π) and the amplitude follows an uniform distribution A ∼ U (1, 4), respectively.As a result, different amplitude ratio and phase difference of the simulated double scatterers can be covered.We also vary the elevation distance between the two single scatterers.The elevation distance varies from 0.1 until 1.2 Rayleigh resolution, with a regular sampling of 0.1 Rayleigh resolution.The elevation of the first scatterer follows an uniform distribution in the range of 0m to 200m.In order to avoid the off-grid bias, we assume that all scatterers locate on-grid with 1m sampling.

Training:
The training was carried out using Pytorch [37] and the Adam optimizer [38].The learning rate was initialized at 0.0005 and adjusted adaptively during the training.In the training procedure, we gradually increase the number of the layers from 3 to 20 in order to determine an optimal network structure.We validate the performance of γ-Net with different number of layers on a validation dataset.The validation dataset contains 0.2 million noise-free samples simulated using the same settings mentioned in the simulation setup, so that we can compare the theoretical performance of γ-Net with different number of layers.Fig. 6 illustrates the performance of γ-Net w.r.t the number of its layers.Closer inspection of Fig. 6 shows that the NMSE firstly decreases rapidly and then starts to converge to a minimum at around twelve layers.Simultaneously, the increase of the number of layers leads to heavier computation cost.Therefore, γ-Net employed in this paper contains just twelve layers.On the one hand, γ-Net with twelve layers is able to guarantee the estimation accuracy; on the other hand, it maintains the computational efficiency.Fig. 6: γ-Net performance w.r.t the number of layers.After 12 layers, the performance improvement of γ-Net is marginal with the increase number of layers.Instead, the increase of layers leads to heavier computational burden.

B. Single scatterer analysis
In addition to the simulated training data, we simulated four sets of testing data for the single scatterer analysis with SNR={0, 3, 6, 10}dB.Each set is composed of 0.2 million samples.We use the proposed algorithm to detect the single scatterer and estimate the corresponding elevation.Fig. 7 demonstrates the estimated reflectivity profile using the trained γ-Net and SVD-Wiener [35] (a conventional nonsuperresolving algorithm).As we can see, although both of γ-Net and SVD-Wiener are able to detect the position of the single scatterer, γ-Net reconstructs spectral lines instead of sinc-like point response function, thus mitigating the sidelobe problem.Moreover, from Fig. 7(a)-(d), we can see that the outliers caused by noise interference exist in the reflectivity profile estimate of γ-Net.Therefore, further model order selection step is required Table I provides the results after model order selection.The CRLB, the estimates mean (µ) and standard deviation (σ) in Table I are normalized to the Rayleigh resolution.Since the goal of TomoSAR is to have a good elevation estimate, and also a high detection rate, we define the term effective detection rate.An effective detection of single scatterer should satisfy the following two conditions: (1) only one scatterer is detected; (2) the estimated elevation should not exceed ±3 times CRLB w.r.t the ground truth.It is apparent from this table that the proposed algorithm is able to detect almost all single scatterer at different SNRs.Further statistics on mean value µ and standard deviation σ of the estimation error indicate high estimation accuracy of the proposed algorithm with the bias approaching zero and the standard deviation approaching the CRLB.

C. Double scatterers analysis
For the double scatterer analysis, we simulate two-scatterer mixtures in the experiments and a systematic evaluation was carried out regarding the distance between simulated double scatterers, different scatterers amplitude ratio as well as phase difference between the double scatterers.

Performance with respect to scatterers distance
In this experiment, we performed a well-known TomoSAR benchmark test [7] [35].We simulated double scatterers with increasing elevation distance between the two layovered scatterers, in order to mimic a facade-ground interaction.Since we focus on the super-resolution regime, the elevation distance d s between the two overlaid scatterers is set to be no larger than 1.2 times of the Rayleigh resolution.Two different scenarios were taken into consideration with SNR∈ {0, 6} dB, which represent typical SNR levels in a high-resolution spaceborne SAR image.Fig. 8 demonstrates some examples of the estimated reflectivity profile at the normalized elevation distance α = [0.2,0.5, 1.0].The normalized distance α is defined as the ratio of the elevation distance between the double scatterers and the Rayleigh resolution ρ s , formally expressed as: From Fig. 8, one can see that both the trained γ-Net and SVD-Wiener are able to distinguish the overlaid double scatterers in the non-superresolving case, i.e. last column, when α = 1.0.But comparing to SVD-Wiener, γ-Net provides much higher elevation estimation accuracy.Moreover, when we move the double scatterers closer into the Rayleigh resolution, SVD-Wiener fails to separate them.In the contrast, γ-Net is still capable of detecting the double scatterers in most cases, which exhibits its super-resolution power.
Hereafter, we compare the proposed algorithm with the state-of-the-art SL1MMER algorithm [8] focusing on the detection rate and the estimation accuracy.Similar to the single scatterer case, we use the effective detection rate to fairly evaluate the detection rate.An effective detection of double scatterer is defined as: 1) the hypothesis test correctly decides two scatterers for a double-scatterers signal; 2) the estimated elevation of both detected double scatterers are within ±3 times CRLB w.r.t their true elevation; 3) both elevation estimates are also within ±0.5 d s w.r.t their true elevation.The third criterion is seldom seen in the literature.However, it is necessary, because in extremely super-resolving cases, 3 times CRLB will become much larger than the elevation distance.Hence, it cannot be used as a accountable measure for reasonable estimates.±0.5 d s is a much stricter constraint in such cases, which will reflect the true performance of the algorithm.
Fig. 9 compares the effective detection rate P d of SL1MMER and the proposed algorithm for the case N = 25.For each pair of (SN R, α), 0.2 million Monte Carlo trials for the worst case in TomoSAR inversion, i.e. the double scatterers have the same amplitude and phase, were simulated.The effective detection rate P d is presented as a function of the normalized distance.The red and blue polylines illustrate the results of the proposed algorithm and SL1MMER, respectively.As we can see from Fig. 9, the proposed algorithm has comparable super-resolution power as SL1MMER.
Fig. 10 demonstrates the elevation estimates of simulated facade and ground w.r.t the true normalized elevation distance.In each subplot of Figure 10, the two red line segments represent the true elevation of the simulated facade and ground, respectively, while the dashed lines show the true elevation ±1×CRLB (normalized).Exhaustive details of the derivation of the CRLB can be found in [8].The elevation estimates of simulated facade and ground are plotted with each dot depicting the sample mean of all estimates at the given normalized distance and the error bar indicating the corresponding standard deviation.Points below an effective detection rate of 10% were not plotted in the figure.As it is shown in Fig. 10, the proposed algorithm shows higher elevation estimation accuracy than SL1MMER.To be specific, at 0dB SNR, although both the proposed algorithm and SL1MMER have similar estimate bias, the proposed algorithm leads to much smaller variance.In high SNR case, the proposed algorithm outperforms SL1MMER in super-resolving cases w.r.t the elevation estimation accuracy.As can be seen that, SL1MMER suffers from much larger elevation estimate bias as well as the standard deviation.

Performance with respect to amplitude ratio
This simulation sets out to evaluate the performance of the proposed algorithm w.r.t different amplitude ratio of the double scatterers.Fig 11 illustrates us the effective detection rate of the proposed algorithm at different amplitude ratio.As can be seen, the effective detection rate decreases with the increasing amplitude ratio.Since γ-Net promotes sparsity by shrinking elements with small magnitude layer by layer.With the increase of the amplitude ratio between simulated double scatterers, the darker scatterer becomes less prominent, and hence easier to be ignored.Therefore, at high amplitude ratio, the proposed algorithm tends to detect single scatterer with dominant amplitude.However, from our perspective, it will not affect the application of the proposed algorithm.In realworld processing, if one scatterer is much more prominent than others in a pixel, we can usually judge that this pixel contains only a single scatterer by viewing others as noise.

Performance with respect to phase difference
We varied the phase difference between the simulated double scatterers in this simulation to further verify the generalization ability of the proposed algorithm.us an example of the effective detection rate when N = 25 and SN R = 6dB, with the normalized distance α = 0.6.The double scatterers in the simulation are set to have identical amplitude.As we can see, although the phase difference φ affects the performance, the proposed algorithm is still capable of providing satisfactory super-resolution power even in the worst case, where the phase difference φ = 0.

D. Analysis of false detection
In this section, we will provide a quantitative assessment about false detection.We used the proposed algorithm to detect 0.2 million samples containing 0 scatterer, i.e. pure noise, at different SNR.As it is shown in Table II, the proposed algorithm is able to distinguish almost all samples of noise at different SNR.Less than 5 percent samples are falsely detected as single scatterer and only about 0.1% as double scatterers.The low false alarm attributes to the powerful model order selection with known noise variance in the simulation.However, in real-world application, the noise variance needs to be estimated.Therefore, Table II shows the upper limit.

E. Performance at limited number of measurements
This simulation was carried out to verify the performance of the proposed algorithm at limited number of baselines.From left to right, the normalized elevation distance α = 0.2, 0.
Fig. 9: Detection rate P d as a function of the normalized elevation distance between the simulated facade and ground using the proposed algorithm (dashed star) and SL1MMER (dashed circle) with SNR = 0dB and 6dB, N = 25 and phase difference φ = 0(worst case) under 0.  We simulated data with only 6 baselines according to a real TanDEM-X images stack we have.The baseline ranges from -565.5m to 373.2m.Fig. 13 compares the performance of the two algorithms at limited number of measurements.As one can see, in the noisy case, i.e.SNR = 0dB, the two algorithms have similar performance.However, with the increase of the SNR level, the proposed algorithm outperforms SL1MMER by a fair margin.

F. Performance verification using real data
For a better evaluation of the proposed algorithm, we worked with a stack of six high-resolution TanDEM-X pairs acquired in the bistatic mode mentioned in the previous section.The elevation aperture size of about 940m results in about 12m inherent elevation resolution.An optical image of the test site from Google Earth and the SAR mean intensity  Preprocessing, such as multiple images co-registration and phase calibration were carried out using the DLR's integrated wide area processor (IWAP) [39].Moreover, we manually selected a coherence point on the ground as reference and set its elevation as zero.γ-Net employed for the real data experiment was trained with data simulated using the real baseline distribution.The training data contains 4 million samples generated with the same strategy mentioned in the simulation setup.After training, γ-Net can be directly applied in the upcoming TomoSAR processing on real data.
We use the proposed algorithm to reconstruct the whole test site and demonstrate the super-resolution power by comparing to the results derived by SL1MMER.The complete  Comparing to SL1MMER, the proposed algorithm captures more reflection from the facade.This confirms the finding in Fig. 13 that the proposed algorithm outperforms SL1MMER at low number of measurements.
In terms of detection rate, the proposed algorithm is comparable to that of SL1MMER.Although there is no ground truth, we compare the agreement of the double scatterers detection of the both algorithms (shown in Table III).For the whole test site, 38.97% and 37.76% of pixels are detected as double scatterers by the proposed algorithm and SL1MMER, respectively.36.56% of the pixels were detected as double scactterers by both algorithms.Only 2.4% were only detected double scatterers by the proposed algorithm, and only 1.2% were only detected by SL1MMER.V. DISCUSSION

A. Analysis of computational complexity
We assume O(1) to be the computational complexity of one multiplication.The computational complexity of the proposed algorithm, as well as the original ISTA, are mainly determined by O(K s L 2 ), where K s is the number of layers or the number of iterations.For the proposed algorithm, K s is set as 12. Comparing to the original ISTA, which usually requires hundreds or even thousands of iteration, the computational efficiency of the proposed algorithm is two to three orders of magnitude better.Moreover, other efficient L 1 norm minimization solvers, such as FISTA [40], ADMM [25], RBPG [41], usually need about 100 iterations to converge and achieve reasonable estimation accuracy.Comparing to those efficient solvers, the proposed algorithm is still about one order of magnitude more efficient.
In our experiments, for a single dataset containing 0.2 million Monte Carlo trials simulated using the aforementioned setup, SL1MMER requires about 10 CPU hours for the TomoSAR processing.On the contrary, it takes only a few CPU minutes when a trained γ-Net is employed, despite the fact that about 9 hours are required for training the model with a single NVIDIA RTX 2080 GPU.However, the fixed cost of model training diminishes when we further increase the amount of the data.Fig. 18 provides us an intuitive view of the time consumption of the two methods for TomoSAR processing.As we can see, the training procedure dominates the time consumption of the proposed algorithm and the increment of the amount of data will not burden the time consumption seriously.In the contrast, the time consumption of SL1MMER escalates with the increasing amount of data, especially when limited measurements are available.In real-world TomoSAR processing, the number of pixels is usually tens or even hundreds of million, thus blocking the application of SL1MMER or other second-order CS-based methods.The proposed algorithm is able to complete the processing, including the training procedure, within matters of hours.The great superiority of the proposed algorithm in computational efficiency makes large-scale super-resolving TomoSAR processing feasible and realizable.
In addition, it is worth mentioning that the proposed algorithm maintains the elevation estimation accuracy in the meanwhile.The proposed algorithm employed the neural network with special structure, which can be trained as a more general model and is more likely to reach the global minimum and achieve better results.A detailed investigation about how deep learning improves the estimation efficiency for TomoSAR inversion will be executed in our following study.

B. Parameter selection
Step size in γ-Net: As it is stated in the equation ( 7), a manual selected step size is required for the initialization of the trainable weights in γ-Net.To select the step size, the Lipschitz constant L s is required, which is the largest eigenvalue of R H R. Usually, a proper step size can be taken as 1  Ls .In our experiments, we fix the step size as 1 2Ls to guarantee the convergence of γ-Net.
Percentage ρ in the support selection: An empirical formula is introduced in [32] to choose a proper percentage ρ i for the i th layer of γ-Net for the support selection, where p is a positive constant and p max is the upper bound of the percentage of the support cardinality.Both p and p max can be selected using cross validation.From our experience, we fix the percentage ρ i as 5% for all layers of γ-Net, which leads to satisfactory performance for our application.

C. Piecewise linear function
The learning architecture of γ-Net, where the output of the current layer is generated only directly from the output of the previous layer, leads to an error propagation phenomenon.Specifically, errors in the first few layers of γ-Net will be propagated and further amplified in the following layers.Moreover, the most serious problem is that once useful information is discarded in the previous layers, it is no longer possible for the upcoming layers to utilize the discarded information.The conventional soft-thresholding function, simply prunes elements, whose magnitude is smaller than the threshold, Similar to all other deep learning based models, the training time could also become a burden when the task is to process many stacks with distinct baseline distributions.In our experiments, we simulated training data with the exact baselines as the test data.Ideally in real data processing, we shall train a separate model for each stack, which is very time consuming.However, our models shows moderate tolerance to the baseline discrepancies between the training and the testing data.This is elaborated in more detail in the next section.
Baseline perturbation: The biggest challenge to our deep learning model is the baseline discrepancies between the training and the testing data, because the baseline distribution are rather unique for each SAR interferometric stack.As a preliminary study, we test the proposed algorithm on testing data with slight baseline perturbation and find that the slight perturbation do not degrade the performance significantly.To be specific, we add random perturbation uniformly distributed in the range [−10m, 10m], i.e. about 15% of the baseline standard deviation, to the 25 baselines.Applying the pretrained γ-Net on the data with baselines perturbation shows that the effective detection rate decreases only 3% to 5% and the estimation accuracy as well as the bias retain nearly the same.This shows a good transferability of our trained model.However, further study is required to guarantee the performance of the proposed algorithm for large scale processing.
Application to more complex scenarios: When the proposed algorithm is applied to more complex scenarios, i.e. more than 2 scatterers are overlaid in a single resolution unit, we are not capable of detecting and separating all of them.We tested the proposed algorithm in the three-scatterer case and found that the proposed algorithm tends to detect overlaid triple scatterers as double scatterers locating between the ground truth.Due to the fact that triple scatterers are not considered and covered in the training phase, the poor performance in coping with triple scatterers is explainable.From our perspective, the solution to this problem can be two-fold.First, we can enrich the training data by introducing samples containing more scatterers.Second, we can view samples containing more than two scatterers as out-of-distribution samples, since in real-world processing only a tiny minority of pixels contain more than two scatterers.We can then use Dirichlet Prior Network (DPN) [42] [43] to detect these outof-distribution data and solve it using CS-solvers.

VI. CONCLUSION
In this paper, an advanced super-resolution TomoSAR inversion approach based on deep learning is proposed.We improved the complex-valued learned ISTA algorithm and proposed γ-Net by applying weight coupling structure, introducing support selection and employing the piecewise linear function instead of soft-thresholding.Experiments show that the proposed algorithm is capable of solving the L 2 -L 1 mixed norm minimization efficiently.Rigorous evaluation shows that the proposed approach is able to deliver competitive performance to the state of the art in terms of the super-resolution capability and elevation estimation accuracy.This paper opens a perspective on super-resolving TomoSAR inversion via deep learning and shows great potential of applying deep learning to solve other sparse reconstruction problems.In the future, we aim to extend the deep learning based approach to higher dimensional spectral estimation problems, especially to differential TomoSAR reconstruction.Moreover, we will further exploit the power of deep learning to improve the performance, e.g.introducing long short term memory (LSTM) unit to γ-Net to make use of historic information.

Fig. 1 :
Fig.1:The SAR imaging geometry.The elevation synthetic aperture is built up by SAR data acquired from slightly different viewing angles.Flight direction is orthogonal into the plane.

Fig. 3 :
Fig.3: Unfolded LISTA architecture.A K-layer LISTA unrolls the RNN and truncates it into K iterations, thus leading to a side-connected feedforward neural network.

Fig. 4 :
Fig. 4: Comparison between the piecewise linear function and soft-thresholding function.Instead of pruning the elements with small magnitude, the piecewise linear function just further minifies them, thus possibly avoiding the information loss.

Fig. 5 :
Fig. 5: Performance of γ-Net using different shrinkage function.The piecewise linear function conduces to faster convergence and improves the estimation accuracy.

Algorithm 1
Summary of the proposed algorithm Simulate training data Sampling the elevation extent Generate steering matrix R with R nl = exp (−j2πξ n s l ), where ξ n = −2b n /(λr) Simulate reflectivity profile γ Simulate SAR measurements with g = Rγ + ε Finish the generation of training data {(g i , γ i )} T i=1 Training of γ-Net Over given training samples {(g i , γ i )} Ψ, g) − γ|| 2 2 where Ψ = [ (W), (W), θ] for each pixel in the image: do Preliminary estimate via γ-Net: γ = γ-Net(g) Model order selection to remove outliers: P = argmin P σ −2 ε ||g − Rγ 2 2 + 1.5P ln N Determine the number of scatterers Final estimation of their elevation end for the amplitude being deterministic and the scattering phase following an uniform distribution, i.e. φ ∼ U (0, 2π).In order to randomize the amplitude in the simulation, we simulate it with a uniform distribution as well, i.e.A ∼ U (1, 4), although a real SAR amplitude image shows more Rayleigh or Gamma distribution.The elevation of the simulated scatterer is regularly distributed in the range from 0m to 200m with 1m sampling.Once the location of the scatterer is determined, the echo signal is simulated at 11 different levels of SNR, which is regularly distributed between [0dB, 10dB].

Fig. 10 :
Fig. 10: Estimated elevation of simulated facade and ground, (a) SN R = 0dB with SL1MMER, (b) SN R = 0dB with the proposed algorithm, (c) SN R = 6dB with SL1MMER, (d) SN R = 6dB with the proposed algorithm.Each dot has the sample mean of all estimates as its y value and the correspond standard deviation as error bar.The red line segments represent the true elevation of the simulated facade and ground.The dashed curves denote the true elevation ±1×CRLB normalized w.r.t the Rayleigh resolution.

Fig. 11 :
Fig. 11: Effective detection rate ρ d as a function of amplitude at 6dB SNR.

Fig. 13 :Fig. 14 :
Fig. 13: Effective detection rate P d as a function of the normalized elevation distance between double scatterers simulated with 6 real baselines.The simulated double scatterers are set to have identical phase and amplitude, i.e. the worst case.For each pair of (SNR, α), 0.2 million Monte Carlo trials were simulated.(a) the proposed algorithm, (b) SL1MMER

Fig. 17 :
Fig. 17: Reconstructed and color-coded elevation of detected double scatterers.From top to down: top and bottom layer, respectively.From left to right: the proposed and SL1MMER algorithm, respectively.

Fig. 18 :
Fig. 18: Comparison of time consumption between the proposed algorithm and SL1MMER.(a) on dataset simulated using 25 regularly distributed baselines at 6 dB, (b) on dataset simulated using 6 real baselines at 6 dB.The training time will be affected only by the size of training data and the number of training epochs we set.Different baseline configuration does not affect the training time.On the contrary, the time consumption of SL1MMER is strongly dependent on the number of baselines.When limited images are available, the time consumption of SL1MMER escalates with the increasing number of data, whereas the inference time of the trained γ-Net is negligible.The proposed algorithm shows great computational efficiency in processing regular TomoSAR data, which usually contains tens of million pixels

TABLE I :
Statistics of the estimate of single scatterer using the proposed algorithm.µ and σ denote the sample mean and the corresponding standard deviation,respectively.The proposed algorithm is able to detect the single scatterer in nearly all cases with the standard deviation approaching the CRLB and bias approaching zero.

TABLE II :
Statistics of evaluation on false detection.

TABLE III :
Percentage of scatterers detection for the two algorithms.