An Unsupervised CNN-Based Multichannel Interferometric Phase Denoising Method Applied to TomoSAR Imaging

Tomographic synthetic aperture radar (TomoSAR) is an advanced SAR interferometric technique to retrieve 3-D spatial information. However, the standard deviation in the reconstructed elevation could be high due to the noise in the interferometric phases, which makes the denoising filter crucial before tomographic reconstruction. In this article, we propose an unsupervised multichannel SAR interferometric phase denoising method based on the convolution neural network. It utilizes the weighted least-squares (WLS) regularization combining with the covariance of multichannel interferometric phases to minimize the standard deviation of phase noise, which leads to the accurate and complete TomoSAR reconstruction. This network is trained by real SAR images and the results of both simulated and real observations verify the effectiveness of our proposed method.

interferometric phases [6], [7] and further leads to higher standard deviation in the elevation estimated with tomographic focusing methods [8].
Great efforts have been put into interferometric phase denoising and plenty of filters have appeared [9]. The interferometric phase is traditionally estimated by an operation called multilooking [10] consisting in averaging pixels in the range and/or azimuth directions. Considering that the multilook filter reduces the size of the image, this operation cannot be performed on a large number of pixels. Since then, famous filters, such as Lee [11], [12], maximum-likelihood method [13], have been deployed in the spatial domain. Meanwhile, filters in the frequency domain [14], [15] and in the wavelet domain [16], [17] has also been widely developed, possessing the superior quality of enhancing the phase signatures to be more separable from the noise. However, these methods have not overcome the neighbor-connection limitation. The concept of nonlocal filtering [18], [19] has been migrated to deal with the phase noise in interferograms [20]. The weights are defined from intensity and interferometric phase, and are iteratively refined based both on the similarity between noisy patches and on the similarity of patches from the previous estimate.
The multichannel interferometric phases could carry much more information [21], [22], [23]. For example, TomoSAR exploits a set of interferometric phases to recover the elevation profile inside the pixel. These multichannel interferometric phases are of high interest for a wide range of applications for Earth observation. Therefore, the filtering methods are migrated to deal with multiple interferograms from tomographic data, such as the multilook filtering [24] and the nonlocal filtering [25], [26], [27] using the multidimensional probability density distribution (PDF) or the covariance matrix to achieve the better denoising performance.
Recent advances in deep learning have revolutionized the approaches to SAR processing tasks [28], [29]. The methods in [30], [31] combine convolutional neural networks (CNNs) together with the concept of residual learning for the joint estimation of the interferometric phase and coherence.In [32], the CNN with a multiobjective cost function is used for the task of SAR despeckling. In [33], the deep CNN learns the selfsimilarity function of real interferometric phase from the input interferogram to achieve the purpose of interferometric phase denoising. However, ground-truth datasets are required to train these network and the output of a well-trained network could be close to the ground truth. Nevertheless, there is no noise-free SAR dataset as ground truth in practice. A large number of optical and simulated datasets are used for the training process [34]. However, these reference datasets are quite different from real ones in the imaging mechanism, gray-level distribution, operational wavelength, etc., which results in a greatly reduced reliability of the network. In [35], the proposed network uses the noisy dataset as a reference in the training process. This method is built on the basic principle that the expectation of the noisy reference is theoretically equal to the clean signal. In [36], [37], the unsupervised filters based on CNN generative model have been proposed for single-channel interferometric phase filtering.
Clearly, these networks have been developed for the single interferogram denoising task and the deep learning-based approaches for multichannel interferometric phases denoising are understudied. Considering the application of multichannel interferometric phases in TomoSAR, we propose an unsupervised multichannel SAR interferometric phase denoising method (MS-IPDM) based on CNN in this article. The network is trained to learn the multichannel denoised interferometric phases based on the WLS regularization. Covariance matrix of multichannel interferometric phases is applied to minimize the standard deviation of multichannel interferometric phase noise. Based on the unsupervised training, real acquisitions are fed into the network for the training process. And, the results of simulated and real SAR images ensure the effectiveness of the proposed method.

II. INTERFEROMETRIC PHASE MODEL
For the SAR acquisition, the focused complex-valued measurement g m of an azimuth-range pixel (x 0 , r 0 ) for the mth acquisition at aperture position b m is the tomographic projection of the reflected signal along the elevation direction. Considering the TomoSAR imaging model, we can obtain the reflectivity function γ γ γ(s) along elevation s with an extent of Δs, regardless of the deformation term for simplicity where, the spatial frequency ξ m = −2b m /λr is proportional to the respective aperture position b m (m = 1, 2, . . .M). λ is the wavelength of radar signals, and r denotes the range between radar and the observed object, respectively. After the proper phase calibration, the interferometric phase can be expressed as a function of the elevation information and TomoSAR uses the multichannel interferometric phases to obtain the tomographic reconstruction along elevation direction, which is shown in Fig. 1. However, the phase noise degrades the quality of interferograms severely and leads to errors in the heights estimated with tomographic focusing methods. The multichannel interferometric phase has been completely characterized in the real domain, leading to the following model [11], [38]: where, the ϕ ϕ ϕ is the measured multichannel interferometric phase and the φ φ φ is the ideal interferometric phase without noise, The multichannel interferometric phases are calculated based on the common reference master SAR image. ε ε ε denotes the zero-mean phase noise, whose standard deviation reflects the noise level. According to the principle of TomoSAR imaging, the standard deviation of interferometric phase noise could definitely lead to a higher standard deviation of tomographic reconstruction. And the original phase φ φ φ and the phase noise ε ε ε are assumed to be independent of each other.

A. MS-IPDM Network
The idea of applying deep learning to image processing originated from [39], where the network uses a residual learning strategy to subtract the noisy component. Note that this architecture of network handles the additive-form noise well [39], [40], [41], thus we propose the MS-IPDM to filter the interferometric phase noise based on this network. Fig. 2 shows the architecture of the network.
The residual learning strategy is designed for speeding up the training process and improve denoising performance. In our interferometric phase filtering, we apply the similar idea by creating identity shortcuts for predicting the noise of multichannel interferometric phases. Instead of directly outputting the estimated clean components, the proposed CNN-based method is trained to predict noise. Therefore, the model implicitly filters the latent clean interferometric phases with hidden operations within the network.
The inputs of the network are 128 × 128 phase patches, generated by multichannel interferometric phase patches of SAR raw data and the output are the corresponding multichannel filtered interferometric phase patches. There are (M − 1) number of channels in the input and output layers, i.e., M = 2 when the proposed method deals with the single-channel interferometric phase. The dimension of output and input must be equal, which is achieved by the 3 × 3 convolutions with a stride of 1. The network removes the pooling layers and combines batch normalization (BN) for fast training and better denoising [42]. It sets the depth of the network according to the patch size used in the algorithms [39]. Each layer uses the filters of size 3 × 3 × 64, extracting 64 feature maps, except the first and last layers. The network is oriented to the denoising of multichannel interferometric phases. The configurations of the network are listed in Table I.

B. Loss Criterion Applied in MS-IPDM
To suppress the interferometric phase noise, we need to minimize the standard deviation of multichannel interferometric phase noise. Taking account of the model of measured interferometric phase ϕ ϕ ϕ in (2), we define the loss function based on the WLS regularization without considering the ground truth, i.e., the clean version of corresponding multichannel interferometric phases φ φ φ. Specifically where,φ φ φ denotes the output multichannel interferometric phases of the network. W W W Ψ is the weighting matrix, which is defined by the inverse of the covariance C C C of the multichannel interferometric phases, i.e., W W W Ψ = C C C −1 . The covariance can be expressed as follows: where, σ σ σ 2 i , i = 1, 2, . . ., M − 1 is defined by correlation coefficient in [43], σ σ σ 2 where L is the number of looks and ρ ρ ρ i denotes the corresponding complex coefficient. The loss function is illustrated in Fig. 3. It is not hard to see that σ σ σ 2 i is inversely proportional to the coherence, thus the weighting matrix of this term is proportional to the correlation coefficient. Based on the loss criterion, the output would fit the input to a greater degree in the regions with high coherence, and less well in the regions with poor coherence. By setting the WLS-based loss function, the standard deviation of multichannel interferometric phase noise could be minimized, which means that the zero-mean noise could be effectively suppressed.
During the training process, the parameters of the network could be optimized in order to minimize the loss function. Thus, the output of the network would have the low standard deviation of multichannel interferometric phase noise. Meanwhile, there is no need for ground truth in the training process according to the loss function (3), as we can learn from the input data itself, without requiring its clean version as the training dataset. Therefore, our network is completely unsupervised and can be trained directly on real SAR data, which are of supreme importance to the development of deep learning in the SAR image field. The framework of the proposed CNN method is provided in Fig. 4.

A. Experimental Settings and Data Description
In our experiments, we use the Aerospace Information Research Institute, Chinese Academy of Sciences array data acquired on the city of YunCheng in Shanxi province, as the dataset. All the input interferometric phases for training, which are of the image size of 3100 × 1220, are formed with the first track as a common reference. To ensure the generalization ability of the network and to ease the computational burden, the input is divided into patches of 128 × 128 pixels, with a stride of 64. This dataset has 34 225 image patches, including 24 260 patches for training and the remaining for test. The proposed model is implemented in the PyTorch package and run on an NVIDIA 2060Ti GPU with a 6 GB RAM.
Meanwhile, the number of channels in the input layer and output layer is set to be 7 based on the training dataset for simplicity. And the network is trained for 100 epochs using the Adam optimizer, with a learning rate set as 0.001. A batch size of 24 is used in all experiments. In this section, we describe the test experiments conducted on the simulated and real SAR datasets to reveal the denoising performance of the proposed MS-IPDM.

B. Simulated Data
In this section, we simulate a simple building scene composed of planar facets representing the ground, wall, and roof.  The framework of simulation is shown in Fig. 6. Taking into account the number of channels in the input layer of the network, we assume 15-track complex SAR images to verify the generalization of the proposed method, which has a different elevation resolution from that of the training data. The number of interferometric phases of the simulated data is twice that of the actual data, which are divided into two groups and fed into the network sequentially. The difference in the number of input channels leads to the different elevation resolution to validate the effectiveness of the proposed network. The building roof is 50 m in the height direction. Fig. 5 shows the simulated scene and the corresponding intensity of an image in the stack. There are three areas in the intensity image.
To demonstrate the suppression performance of interferometric phase noise, we simulate the SAR images with 0.6 rad  standard deviation in the multichannel interferometric phases. Due to phase noise, the single-look (unfiltered) interferometric phase has a high standard deviation of the multichannel interferometric phase noise, which can be seen in Fig. 7. The ground truth, the mean, and an interval of standard deviation are represented on the graphics for the three estimated components. Fig. 8 presents the example of the multilook (5 × 5) interferogram, the nonlocal filtered interferogram [20], the CNN-IPDM (interferometric phase denoising method based on CNN-InSAR [37]), the Gen-IPDM (interferometric phase denoising method based on GenInSAR [36]), the adapted SS-IPDM (single-channel interferometric phase denoising method), and the proposed MS-IPDM filtered interferograms. The multilook filter averages 5 × 5 pixels in the range and azimuth directions. The nonlocal filter performing a weighted averaging of similar pixels [20] based on the multichannel statistical model of interferometric phases. The CNN-IPDM is a supervised interferometric phase denoising method adapted from CNN-InSAR [37]. The training dataset is simulated following the original paper. The Gen-IPDM is an unsupervised interferometric phase denoising method adapted from GenIn-SAR [36]. The SS-IPDM is a method achieved by changing the number of channels to be single in the input layer, and the MS-IPDM is the proposed multichannel interferometric phase denoising method. Fig. 9 shows the statistical answer of different filters on the samples from 5 to 35 m along the azimuth direction. The statistics have been measured on one of the denoised interferometric phases. We find that SS-IPDM can reduce the standard deviation of interferometric phase noise partly but it represents a bias sometimes, which is similar to the performances of singlechannel CNN-IPDM and Gen-IPDM. Meanwhile, nonlocal filter is unbiased with a lower standard deviation, which considers the joint PDF of multichannel interferometric phase noise. And the proposed MS-IPDM can do better, which uses the covariance of multichannel interferometric phase to suppress the noise.   Table II shows the filtering performances of the interferometric phases for the adoptive methods. RMSE p wa denotes the root-mean-square error of the restored interferometric phases on the whole area, and PCE p wa [36] describes the phase cosine error of the restored interferometric phases on the whole area. It can be noticed that nonlocal filter and MS-IPDM have small phase errors, which is consistent with the statistical answers on the different filtered interferometric phases in Fig. 8. Figs. 10 and 11 show the 3-D reconstruction of unfiltered and different filtered data, obtained with the iterative shrinkage and thresholding algorithm [44]. Clearly, the 3-D reconstruction of the single-look interferogram is extremely noisy. Filtering allows to strongly mitigate this effect. It is noted that the results obtained from the adoptive methods are able to preserve the    structure of the scene while removing noise on the reconstructed reflectivity profile. However, the better restoration of interferometric phases leads to the better tomographic reconstruction. As expected, the nonlocal filter suppresses the outliers well and MS-IPDM achieves a more accurate and complete result.
The standard deviation of interferometric phases could definitely result in the standard deviation of reconstructed tomographic elevation. To show the filtering performance, we analyze the Cramér-Rao lower bounds (CRLB) for the estimated   elevation of two scatterers in the wall layover area. The true elevation of simulated wall and ground is shown as two solid line segments w.r.t. their normalized elevation distance α in Fig. 12(a), respectively. It is noted that α is the elevation distance between two scatterers normalized w.r.t. the Rayleigh resolution unit. And, the dashed lines mark true elevation ±1 × CRLB with the simulated interferomgrams.
As shown in Figs. 12(b) and 13, the elevation estimatesα of wall and ground are plotted w.r.t. their normalized true elevation difference α. Each dot depicts mean value of all estimates, with error bar indicating its standard deviation. Regardless of the superresolution properties of the tomographic focusing method, we find that the tomographic reconstruction of single-look interferometric phases has a higher standard deviation. After the adoptive filters, the deviation has been suppressed partly, especially with the proposed MS-IPDM.
To provide a quantitative analysis, a comparison of the obtained results with ground truth is also performed based on different metrics in Table III. MAE h ga denotes the mean absolute error of reconstructed height properties in the ground area, STD h ga describes the standard deviation of reconstructed height properties in the ground area, and RMSE h ga describes the root-mean-square error of the reconstructed height properties in ground area. MAE h rl represents the mean absolute error of the reconstructed height properties in roof layover area, STD h rl denotes the standard deviation of the reconstructed height properties in roof layover area, and RMSE h rl describes the rootmean-square error of the reconstructed height properties in roof layover area. It can be noticed that MS-IPDM reduces deviation well while keeping the lower errors, which means that there are   fewer noisy points in the 3-D reconstruction of the proposed method. In this case, the tomographic reconstruction could be precise and accurate.
Meanwhile, we also showed the number of detected scatterers to analyzed the missing points in Fig. 14. Note that the reconstruction after nonlocal filter has impressive detection capability, i.e., many of the scatterers in the ground area are detected. However, the performance of detection in the wall layover decreases accordingly. In contrast, reconstruction after the proposed MS-IPDM has the better detection, especially in the wall layover area, indicating that the missing points are reduced effectively.
Taking into account the outliers and missing points in the 3-D reconstruction, we plotted the curves of accuracy as a function of completeness [45] in Fig. 15. The lower values represent the improved performance. It can be seen that the proposed MS-IPDM leads to a better tradeoff between accuracy and completeness, which is consistent with the suppression of noisy and missing points in the tomographic reconstruction.

C. Real Data
Furthermore, we present the 3-D reconstruction results on the three buildings of experimental array data, which are not existent in the training dataset. The radar system operates at 15 GHz and has eight channels in the cross-track direction. The distance between the adjacent channels is 0.8 m. The height of the radar platform is 972 m and the local incidence angle is 30 • . The corresponding intensity maps of the areas are shown in Fig. 16. As we know, the height of building 1 is 25 m, the height of building 2 is 24 m, and the height of building 3 is 50 m. Fig. 17 shows the unfiltered interferometric phases of building 1 and building 2, corresponding 3-D reconstruction and range-height planes of the two buildings. Here we assume that the ground truth of the buildings could be fitted with vertical lines at the range-height plane, whose height is known. It can be seen that there is much phase noise in the interferometric phase. Correspondingly, noisy and missing points appear in the 3-D reconstruction, which are obvious in the range-height plane of the reconstruction. Fig. 18 shows the different filtered interferometric phases of building 1 and building 2. We find that the proposed MS-IPDM leads to less phase noise in the interferogram and could not be oversmoothing like the multilook filter. For example, the noise is depressed effectively and the details of the building are preserved well. As seen in the red rectangles, they denote the regular shapes of the building. The 3-D reconstructed results are presented in Fig. 19. It can be observed that the buildings are well reconstructed and fewer noisy points make the visual interpretation easy. Meanwhile, fewer points are missing, leading to relatively complete walls. The range-height planes of the 3-D reconstruction results of building 1 and building 2 are reported in Figs. 20 and 21.
On the one hand, most of the noisy points can be removed by the filtering methods, as shown in the red ellipses marked. Visual inspection shows that there are fewer noisy points around the buildings based on the interferograms after the proposed filter, which is especially evident in the reconstructed results of the building 1. On the other hand, the empty walls of the buildings marked by the blue rectangles can be filled by the applied filtering methods, which is evident in the reconstructed results of the building 2. In summary, the nonlocal filter leads   to visually comparable results whereas MS-IPDM results in a lower amount of noisy points and more complete buildings.
Similarly, the unfiltered interferometric phase and 3-D reconstructed result of building 3 are represented in Fig. 22. And, the filtered interferometric phases and 3-D reconstructed results of building 3 are shown in Figs. 23 and 24. The red rectangles are the examples of the effect in denoising and detail-preserving, which represent the areas around the building roof. It is noted that the nonlocal filter handles the noisy points and missing points well, compared with other filters. Particularly, the building is better-reconstructed thanks to the proposed filter. Range-height plane results of the 3-D reconstruction in Fig. 25 show that there are fewer noisy points around the building and the missing points of the wall are filled by the MS-IPDM.
Numerical evaluation of experimental results is difficult due to the lack of reference data. To measure the filtering performance on the 3-D reconstruction, we use the MAE h b and STD h b to evaluate the obtained height of different buildings, as shown in Table IV. According to the quantitative analysis, the proposed MS-IPDM keeps a lower standard deviation than the nonlocal filter, which is consistent with the simulation analysis.
Combining with the small absolute error, we can conclude that     ground, there are fewer noisy points in MS-IPDM thus the number of scatterers should be small. Meanwhile, the reconstruction of wall is dense, thus the number of detected scatterers grows faster than other methods, which indicates the improvement in completeness.

V. CONCLUSION
In this article, we have shown that the interferometric phase noise has a bad effect on the 3-D reconstruction of TomoSAR, resulting in the higher standard deviation of tomographic elevation. Consequently, we have proposed an unsupervised multichannel interferometric phase denoising method for TomoSAR based on CNN, which utilizes WLS regularization combining the covariance matrix of multichannel interferometric phases to minimize the standard deviation of phase noise. The network has been trained by real SAR images and tested by the simulated and real data.
We have simulated a simple scene containing different layover areas to evaluate the influence of the proposed filter generally. Considering the filtered interferograms and the quality of reconstructed 3-D points cloud, we have presented numerical results for nonfiltered and different filtered data. As expected, the proposed method could suppress the standard deviation of interferometric phases effectively and keep a low error, therefore the precision and accuracy of the reconstruction could be improved by suppressing the noisy points. Meanwhile, we found that MS-IPDM could reduce the missing points of the 3-D points cloud, thus lead to the better detection and completeness.
Furthermore, we have chosen three buildings of the real data, which were not existent in the training dataset. According to the corresponding results, noisy and missing points have been suppressed effectively. Based on the known heights of buildings, we found that the standard deviation of the reconstructed heights could be lower, while keeping the small absolute error. Meanwhile, the statistics of reconstructed scatterers have proved the improvement of detection, which were in line with the simulation evaluation.