Interferometric SAR Coherence Magnitude Estimation by Machine Learning

Current interferometric wide-area ground motion services require the coherence magnitude to be estimated as accurately and as computationally efficiently as possible. However, a method that is both precise and computationally efficient is missing. Therefore, the objective of this article is to improve the empirical Bayesian coherence magnitude estimation in terms of accuracy and computational cost. Specifically, this article proposes interferometric coherence magnitude estimation by machine learning (ML), which results in a nonparametric and automated statistical inference. However, applying ML in this estimation context is not straightforward. The number and the domain of possible input processes are infinite, and it is not possible to train on all possible input signals. It is shown that the expected channel amplitudes and the expected interferometric phase cause redundancies in the input signals, which makes it possible to solve this issue. Similar to the empirical Bayesian methods, a single parameter for the maximum underlying coherence is used to model the prior. However, within the ML framework, a prior of any shape, or no prior at all, is easy to implement. The article reports on the bias, standard deviation, and RMSE of the developed estimators. It was found that, compared to the conventional and empirical Bayes estimators, ML estimators improve the coherence estimation RMSE for small samples ($2 \leq N < 30$) and small underlying coherence. The developed ML coherence magnitude estimators are suitable and recommended for operational InSAR systems. For the estimation, the ML model is evaluated extremely fast because no iteration, numeric integration, or bootstrapping is needed.


I. INTRODUCTION
In recent years, SAR interferometry (InSAR) has developed rapidly and now allows continuous monitoring of subtle deformations of the Earth's surface with millimeter accuracy [1], [2], [3]. An increasing number of wide-area operational services, such as the European Ground Motion Service [4], [5], [6] and the Ground Motion Service Germany [7], [8], [9], make the deformation maps freely available and, thus, widely visible. For their production, the coherence magnitude is an essential estimate, since it provides the crucial weighting [10] in all estimation methods based on distributed scatterers. Given the large data volumes and the importance of this parameter, there is a real need to estimate it as accurately and as computationally efficiently as possible. Technically, the task is to estimate the population parameter coherence magnitude from a sample of size $N$. The challenges, however, are the bias and variance of the estimate, which are large for small coherences and small sample sizes, and the high computational cost of more precise methods. This article proposes interferometric coherence magnitude estimation by machine learning (ML). ML has made considerable progress in recent years and has already found numerous applications in radar remote sensing [11], but not for the direct estimation of this parameter.
In practice, coherence magnitude estimation is a special kind of statistical inference. Conventional parametric methods are maximum likelihood estimation (MLE) and Bayesian techniques. Bootstrapping is a nonparametric approach to statistical inference, and ML can be considered another nonparametric and automated form of statistical inference.
Estimating the coherence magnitude has long been an active research topic, because all state-of-the-art InSAR processing methods using distributed scatterers require precise coherence estimates [2], [3], [12], [13]. As pointed out by Zebker and Villasenor [14] as well as Just and Bamler [15], the coherence is a proxy for the signal-to-noise ratio. Fundamental work on the underlying statistical models was contributed by Goodman [16] and Touzi et al. [17], [18] and forms the basis for all conventional parametric methods. The sample estimator for the coherence magnitude is universal and is therefore implemented by default in operational InSAR systems. It has been comprehensively studied by Touzi and Lopes [17], and its characteristics are well known.
Different techniques have been developed to improve the estimation accuracy. Touzi et al. [18] proposed inverting the functional relation between the first moment of the sample coherence magnitude and the true coherence. For this method to be applicable, the authors state that the number of samples must be sufficiently large. Zebker and Chen [19] published a bias correction that fits a polynomial to coherence estimates of simulated data as a function of the true correlation and the number of looks. Another bias mitigation, based on the logarithm of the sample coherence and named second-kind statistics, has been published by Abdelfattah and Nicolas [20]. The first nonparametric approach, double bootstrapping, was published by Jiang et al. [21]. It is computationally demanding, and the double bias correction introduces extra estimation variability, which can be observed as a high estimator root mean square error (RMSE). Recently, the empirical Bayesian method has been published [22], demonstrating the inclusion of prior knowledge for the first time. The empirical Bayesian method measurably improves the coherence magnitude estimation with respect to bias and standard deviation, resulting in a low estimation RMSE. Most improvements relate to small sample sizes and low coherences, which is advantageous in repeat-pass InSAR.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Up to now, coherence magnitude estimation based on ML has not been studied or published. Applying ML in this estimation context is not straightforward. The number and the domain of possible input processes are infinite, and it is not possible to train on all input signals. It has also not yet been shown that prior knowledge can be used to support and improve the ML prediction. Therefore, the specific objectives of the article are as follows: 1) provide the principle and methods of the ML estimation framework; 2) demonstrate coherence magnitude estimation by ML; 3) demonstrate the inclusion of prior information to support the estimation; 4) characterize the estimation for small sample sizes by bias, standard deviation, and RMSE; 5) check whether this technique is suitable, in terms of performance, for operational systems; 6) compare the performance with the sample estimator and with the empirical Bayesian methods; and 7) check whether the implementation is independent of the ML method used. The rest of this article is organized as follows. Section II describes the methods, i.e., the principle and components of the developed ML framework. Simulation results are provided for different types of prior and sample sizes in Section III, which also includes a proof of concept using real Sentinel-1 data. In Section IV, the characteristics of these methods are discussed. Finally, Section V concludes this article.

II. METHODS
The variable $x_{k,i} = a_{k,i} \exp(j\delta_{k,i})$ denotes the single look complex (SLC) SAR scene pixel with index $k = 1$ for the primary and $k = 2$ for the secondary scene. Here, $i$ is the pixel index within a statistically homogeneous area with $N$ independent and identically distributed (i.i.d.) samples. If the scattering surface is rough with respect to the radar wavelength, the data are modeled by a stationary complex circular Gaussian (CCG) process, as stated by Goodman [16] and Just and Bamler [15].
In practice, the sample coherence magnitude $\hat{\gamma}_s$ is the universal coherence estimator for CCG signals and, according to Touzi et al. [18], corresponds to the MLE of the underlying coherence magnitude $\gamma$:
$$\hat{\gamma}_s \exp(j\hat{\phi}_s) = \frac{\sum_{i=1}^{N} x_{1,i}\, x_{2,i}^*}{\sqrt{\sum_{i=1}^{N} |x_{1,i}|^2 \sum_{i=1}^{N} |x_{2,i}|^2}}. \tag{1}$$
The coherence magnitude has the domain $\{\gamma \mid 0 \leq \gamma \leq 1\}$. This article intends to develop a new method for estimating the coherence magnitude $\hat{\gamma}$ based on the random CCG processes $X_1$ and $X_2$ with specific realizations $x_1$ and $x_2$. The ML approach results in a nonparametric method. Because input samples are mapped to a continuous output value, this ML task is a regression problem rather than a classification problem. Fig. 1 visualizes the basic principle and components of the development. Initially, nonparametric estimators $\hat{f}_N(\cdot)$ are automatically generated from simulations and by supervised learning for any practically occurring number of samples $N$. In the operational system, the estimator $\hat{f}_N(\cdot)$ is then applied to $N$ single-look i.i.d. interferometric samples. This implies that the oversampling and spectral weighting typically present in SAR data must be reversed. This does not have to be done individually per statistically homogeneous area; it is better calculated once for each SAR scene. Such preprocessing of the SLC SAR data is not a particular disadvantage of this technique, as it is also necessary for all other coherence magnitude estimation methods. In fact, all known estimators work with i.i.d. samples, where independence implies zero autocorrelation of the samples within the primary and the secondary channel. In case of autocorrelation, the estimators would have to take into account the spatial arrangement of the samples (for InSAR, on the 2-D grid). To illustrate typical effective numbers of looks, Sentinel-1 data acquired in Interferometric Wide swath mode, beam IW2, are chosen as an example. An area of 5 azimuth times 4 range samples corresponds to $N = 9$, and 6 azimuth times 7 range samples reduce to $N = 20$ independent samples.
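As an illustration of the sample estimator (1), the following minimal Python/NumPy sketch computes $\hat{\gamma}_s$ and $\hat{\phi}_s$ from $N$ i.i.d. sample pairs. The function name `sample_coherence` is hypothetical and not taken from any published implementation.

```python
import numpy as np


def sample_coherence(x1, x2):
    """Return the sample coherence magnitude and phase, cf. (1).

    x1, x2: complex arrays of N i.i.d. samples of the primary and secondary SLC.
    """
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    if x1.shape != x2.shape or x1.size < 2:
        raise ValueError("expected two sample vectors of equal length N >= 2")
    cross = np.sum(x1 * np.conj(x2))  # interferometric sum over the N samples
    norm = np.sqrt(np.sum(np.abs(x1) ** 2) * np.sum(np.abs(x2) ** 2))
    return float(np.abs(cross) / norm), float(np.angle(cross))


# Usage: a secondary channel that is a phase-shifted copy of the primary
x1 = np.array([1 + 1j, 2 - 1j, -0.5 + 0.3j])
gamma_s, phi_s = sample_coherence(x1, x1 * np.exp(-0.5j))
```

For this fully correlated example, the estimate is a coherence magnitude of 1 and a phase of 0.5 rad, up to floating-point rounding.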
Both system components are described in the following.

A. Generation of Estimators
For all required sample sizes $N$, ML provides a representative nonparametric model $\hat{f}_N(\cdot)$. That is, no assumption is made about the function shape and the internal dependencies of the extracted features. As a result, an a priori unknown number of internal parameters is required to represent the model, and, accordingly, a lot of training data and computational effort are necessary for the learning. However, this does not pose a problem, since the corresponding data can be simulated in practically any quantity, and the theoretically infinite number of possible input data variants can be reduced.
1) Simulation: As pointed out by Goodman [16] and Just and Bamler [15], we can limit ourselves to CCG signals for medium-resolution SAR. The starting point is the $2 \times 2$ covariance matrix $\Sigma$, which describes the relation of the respective CCG processes $X_1$ and $X_2$. It is defined by the simulation parameters $a_1$ and $a_2$, which are the expected amplitudes of the CCG processes, and the complex coherence $\rho = \gamma e^{j\phi_0}$. This term is substituted into (3), and the coherence magnitude $\gamma_{simu}$, which is also used as the ML label for the respective simulated data, is substituted for $\gamma$:
$$\Sigma = \begin{pmatrix} a_1^2 & a_1 a_2 \gamma_{simu} e^{j\phi_0} \\ a_1 a_2 \gamma_{simu} e^{-j\phi_0} & a_2^2 \end{pmatrix}.$$
The matrix above contains the expected intensities on the diagonal and the covariances on the off-diagonal. $\phi_0$ is the true interferometric phase.
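First of all, the square, positive definite, Hermitian covariance matrix $\Sigma$ is decomposed as $\Sigma = A A^H$. The superscript $H$ denotes the conjugate transpose of the complex matrix. In practice, this operation can be performed using singular value decomposition (SVD), Schur decomposition, or Cholesky decomposition. For the Hermitian positive definite $\Sigma$, the SVD takes the form $\Sigma = V S V^H$ with a unitary matrix $V$ and a diagonal matrix $S$, so that $A = V S^{1/2}$. The Schur decomposition produces a unitary matrix $Q$ and an upper triangular matrix $T$, $(Q, T) = \mathrm{Schur}(\Sigma)$, such that $\Sigma = Q T Q^H$; because $\Sigma$ is Hermitian, $T$ is diagonal with nonnegative entries, and $A = Q T^{1/2}$.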
With the Cholesky decomposition, $\Sigma = L L^H$ is expressed as the product of a lower triangular matrix $L$ and its conjugate transpose $L^H$; in this case, $A = L$. If the library instead provides the upper triangular matrix $U$ such that $\Sigma = U^H U$, then $A = U^H$. Next, a complex matrix $Z \in \mathbb{C}^{2 \times N}$ of independent CCG random variables is created, whose real and imaginary parts are drawn from $\mathcal{N}(0, 1/\sqrt{2})$, the normal distribution with zero mean and standard deviation $1/\sqrt{2}$. The simulated interferometric data pair corresponds to the complex matrix $S \in \mathbb{C}^{2 \times N}$ calculated by $S = A Z$. This principle of transforming the covariance matrix $\Sigma$ into an interferometric data pair $S$ can also be applied to the simulation of InSAR data stacks; the dimensions of the covariance matrix and of the CCG matrix $Z$ then need to be increased accordingly. For the described framework, this is not necessary, and interferograms are simulated from independent $2 \times 2$ covariance matrices. All three decompositions were implemented, and the SVD was finally used for this article.
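The simulation steps above can be sketched in Python/NumPy as follows, assuming a covariance matrix with the expected intensities $a_1^2$, $a_2^2$ on the diagonal and the covariances $a_1 a_2 \gamma_{simu} e^{\pm j\phi_0}$ off the diagonal. The helper names are hypothetical. The sketch follows the SVD route; as a practical side note, `np.linalg.cholesky` requires a positive definite matrix and can therefore raise `LinAlgError` for $\gamma_{simu} = 1$, where $\Sigma$ becomes singular, whereas the SVD factor remains valid. The article does not state whether this was the reason for choosing the SVD.

```python
import numpy as np


def covariance_matrix(a1, a2, gamma_simu, phi0):
    """2x2 covariance of the CCG pair for rho = gamma_simu * exp(j * phi0)."""
    rho = gamma_simu * np.exp(1j * phi0)
    return np.array([[a1 ** 2, a1 * a2 * rho],
                     [a1 * a2 * np.conj(rho), a2 ** 2]], dtype=complex)


def factor_svd(sigma):
    """Return A with A @ A^H == sigma via the SVD.

    For a Hermitian positive (semi)definite sigma, sigma = U S U^H, so A = U S^(1/2).
    """
    u, s, _ = np.linalg.svd(sigma)
    return u * np.sqrt(s)  # scales column k of U by sqrt(s[k])


def simulate_pair(a1, a2, gamma_simu, phi0, n, rng):
    """Simulate S = A Z, an interferometric pair of n i.i.d. CCG samples (2 x n)."""
    a = factor_svd(covariance_matrix(a1, a2, gamma_simu, phi0))
    std = 1.0 / np.sqrt(2.0)  # real and imaginary parts ~ N(0, 1/sqrt(2))
    z = rng.normal(0.0, std, (2, n)) + 1j * rng.normal(0.0, std, (2, n))
    return a @ z


rng = np.random.default_rng(42)
s = simulate_pair(a1=1.2, a2=0.7, gamma_simu=0.6, phi0=1.0, n=9, rng=rng)
x1, x2 = s  # primary and secondary samples
```

Because any factor with $A A^H = \Sigma$ turns unit-covariance CCG samples into CCG samples with covariance $\Sigma$, the choice among SVD, Schur, and Cholesky changes the individual realizations but not their distribution.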
2) Encoder: The encoder transforms the input data and has two preprocessing functions: 1) reduce redundancies and 2) convert the input into an advantageous data representation.
Ideally, the signal entering the ML training includes all appropriate features and recognizable patterns and is a data representation without ambiguity or redundancy. This makes [...] data sets. Generating and training all of this is not realistic. As shown next, the expected channel amplitudes and the expected interferometric phase cause redundancies in the signal representation. Indeed, the correlation coefficient $\rho_{X_1,X_2}$ is invariant to a change of origin, e.g., by real numbers $b$ and $d$, and to a change of scale, e.g., by real numbers $a$ and $c$: $\rho_{aX_1+b,\,cX_2+d} = \rho_{X_1,X_2}$ for $ac > 0$. This means that, by scaling the amplitudes, the data are restricted to a domain without loss of information. Equation (1) shows that the coherence magnitude $\gamma$ and the interferometric phase $\phi_0$, which is optimally estimated by the sample estimator $\hat{\phi}_s$, are independent of each other. Hence, assuming a stationary phase signal, i.e., residual topography, deformation, and atmospheric phase screen are compensated, the expected interferometric phase $\phi_0$ can be estimated from the statistically homogeneous pixels (SHPs) as $\hat{\phi}_s = \arg\left(\sum_{i=1}^{N} x_{1,i}\, x_{2,i}^*\right)$. Since only the interferometric phase difference of each i.i.d. sample is used, the expected value can be compensated in the primary scene in advance, $x_{1,i} \leftarrow x_{1,i} \exp(-j\hat{\phi}_s)$. This transformation eliminates the phase ambiguities $\delta_{k,i} + 2\pi K$ and preserves the Rayleigh PDFs of the primary and secondary amplitudes as well as the statistics of the interferometric phase differences (except for the mean). As a result of the amplitude scaling and the interferometric phase compensation, the number of possible input data is significantly reduced.
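The effect of the two redundancy-reducing steps can be checked numerically. The following sketch assumes, as a hypothetical choice, that each channel is scaled by its root-mean-square amplitude over the $N$ samples (the exact scaling equation is not reproduced in this text), and then compensates $\hat{\phi}_s$ in the primary channel. Two inputs that differ only in channel scale and in the expected interferometric phase are mapped to the same data, while the sample coherence magnitude is unchanged. All function names are illustrative.

```python
import numpy as np


def reduce_redundancy(x1, x2):
    """Hypothetical encoder step: amplitude scaling and phase compensation."""
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    # Assumed scaling: divide each channel by its root-mean-square amplitude.
    x1 = x1 / np.sqrt(np.mean(np.abs(x1) ** 2))
    x2 = x2 / np.sqrt(np.mean(np.abs(x2) ** 2))
    # Sample estimate of the expected interferometric phase over the SHPs.
    phi_s = np.angle(np.sum(x1 * np.conj(x2)))
    # Compensate it in the primary channel; the phase of sum(x1 x2*) becomes zero.
    return x1 * np.exp(-1j * phi_s), x2


def coherence_magnitude(x1, x2):
    """Sample coherence magnitude, used here only to check invariance."""
    cross = np.abs(np.sum(x1 * np.conj(x2)))
    return cross / np.sqrt(np.sum(np.abs(x1) ** 2) * np.sum(np.abs(x2) ** 2))


rng = np.random.default_rng(1)
x1 = rng.normal(size=9) + 1j * rng.normal(size=9)
x2 = 0.5 * x1 + rng.normal(size=9) + 1j * rng.normal(size=9)
# Same data, but different channel scales and an extra expected phase of 0.7 rad.
y1, y2 = 3.0 * x1 * np.exp(0.7j), 0.2 * x2
a1, a2 = reduce_redundancy(x1, x2)
b1, b2 = reduce_redundancy(y1, y2)
```

Here `(a1, a2)` and `(b1, b2)` coincide, so the model never needs to see the scaled and phase-shifted variant as a separate training example.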
b) Data representation: The encoder converts the CCG input data because the data representation has an impact on the performance of the model. This is because there is practically no direct regression from the input variables to the output value: inside the ML model, attributes that are not visible from the outside are calculated. The encoding manually supports this internal automatic generation of features. Two examples of possible CCG data representations are $\{\mathrm{Re}(x_1), \mathrm{Im}(x_1), \mathrm{Re}(x_2), \mathrm{Im}(x_2)\}$ and $\{|x_1|, |x_2|, \arg(x_1 x_2^*)\}$. Tests have shown that the latter representation, consisting of the sample amplitudes and the phase differences after compensation of the expected interferometric phase, is more advantageous than others. The data dimension is reduced from four to three values per sample, and irrelevant information is removed. The data fully represent all required features, and the ML methods can use them directly.
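A sketch of the favored representation $\{|x_1|, |x_2|, \arg(x_1 x_2^*)\}$ follows: each interferogram of $N$ (already scaled and phase-compensated) sample pairs becomes $3 \times N$ real values, compared with $4 \times N$ for the real/imaginary representation. Flattening into one feature vector per interferogram, and the feature order used, are assumptions made here to feed a tabular regressor; they are not specified in the text.

```python
import numpy as np


def encode(x1, x2):
    """3 x N representation: sample amplitudes and phase differences."""
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    return np.stack([np.abs(x1), np.abs(x2), np.angle(x1 * np.conj(x2))])


def encode_re_im(x1, x2):
    """4 x N alternative {Re(x1), Im(x1), Re(x2), Im(x2)}, for comparison."""
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    return np.stack([x1.real, x1.imag, x2.real, x2.imag])


def feature_vector(x1, x2):
    """Flatten to one row [|x1_1..N|, |x2_1..N|, dphi_1..N] (assumed ordering)."""
    return encode(x1, x2).reshape(-1)


features = feature_vector([1j, 2.0], [1.0, 2j])
```

For this two-sample input, the feature vector is $[1, 2, 1, 2, \pi/2, -\pi/2]$.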
3) ML Training: The ML training learns the features with which the internal model is evaluated to return the coherence magnitude estimate. According to James et al. [23], the general form of ML regression is $\gamma = f(x_1, x_2) + \epsilon$, where $\epsilon$ is an inherent random error term named the irreducible error. In this application, it results from the random sampling and the limited sample size $N$, but not from the noise in the data. In practice, each sample is representative to a different degree and corresponds to unmeasured information, which results in bias and variance of the coherence magnitude estimate. Touzi et al. [18] have proven that no unbiased estimator can be found that is a function of the sample coherence magnitude. It follows that the ML estimator will also have a bias and a variance. In other words, $\epsilon$ is independent of the input data $x_1$, $x_2$ and can only be mitigated by increasing the sample size $N$, which reduces the unmeasured information.
ML provides procedures for estimating $f(\cdot)$ from training data and approximately represents it by $\hat{f}(\cdot)$. Depending on the ML method, $\hat{f}(\cdot)$ is represented differently, e.g., as a decision tree, a random forest, or a neural network. According to James et al. [23], the error from the approximation $\hat{f}(\cdot)$ of a particular ML method is termed the reducible error. It can be diminished by choosing an appropriate ML method and, if used, suitable neural network layers, as well as by optimizing the learning parameters, such as the learning rate and the number of learning iterations. In this article, gradient-boosted trees are implemented based on the XGBoost library and its C API developed by Chen and Guestrin [24], [25].
All possible CCG input processes must be simulated for the ML training. To get as close to the real estimation scenario as possible, the amplitudes of the primary and secondary signals and the interferometric phases are modeled such that the encoder works as it will later in operation. In this article, the expected amplitudes of the scenes are simulated with uniform distributions $a_1 \sim U(0, 2)$ and $a_2 \sim U(0, 2)$, and the expected interferometric phase with $\phi_0 \sim U(-\pi, \pi)$. For the training of an estimator, $10^8$ independent interferograms are generated. In the course of ML learning, the parameters of the model are tuned to perform best on the given training data. This suggests adding prior knowledge on the underlying coherence magnitude by adjusting the training data set, thereby exploiting the fact that ML learns the model from the data. Training data are generated with a number of observations corresponding to the prior on the underlying coherence. The assumption is that the tuning of the ML parameters then works better for these frequently observed values than for values that rarely or never occur in the training data. A single parameter $\gamma_{max}$ is used to model the prior. In the following, this parameter is specified as a subscript of the respective method.
a) ML without prior (MLWP): Without prior information, training data are generated in a straightforward way, and $\gamma_{simu}$ is sampled from the uniform distribution $\gamma_{simu} \sim U(0, 1)$. With $10^8$ simulated interferograms, about $10^6$ samples are generated in each interval $\gamma_{simu} \pm 0.005$.
b) ML less strict prior (MLLSP): Fig. 2 shows the distribution of $\gamma_{simu}$ for the less strict prior. The implementation is based on the inverse cumulative distribution function (CDF) sampling method. It provides one random variate $\gamma_{simu} \sim P_{prior}(\gamma_{max})$ from one random sample $u \sim U(0, 1)$. The corresponding CDF is [...]. This leads to the respective inverse CDF [...].
c) ML strict prior (MLSP): Fig. 3 visualizes the distribution of the underlying coherence magnitude $\gamma$ for the strict prior. Accordingly, the respective training data are generated with $\gamma_{simu}$ sampled from the uniform distribution $\gamma_{simu} \sim U(0, \gamma_{max})$.
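The training-data generation described above can be sketched as follows. Labels are drawn from the MLWP prior $U(0, 1)$ or the MLSP prior $U(0, \gamma_{max})$; for MLLSP, only a generic inverse-CDF sampler is shown, because the article's inverse CDF is not reproduced here and has to be supplied by the caller. For brevity, the pair is simulated with the closed-form lower triangular (Cholesky-type) factor of the $2 \times 2$ covariance matrix instead of the SVD used in the article; both produce the same distribution. The RMS amplitude scaling is an assumption, and the training step uses the `xgboost` Python package's scikit-learn-style `XGBRegressor` rather than the C API used in the article, with placeholder hyperparameters. All helper names are hypothetical.

```python
import numpy as np


def sample_labels(prior, size, rng, gamma_max=1.0):
    """Training labels gamma_simu: 'MLWP' ~ U(0, 1), 'MLSP' ~ U(0, gamma_max)."""
    if prior == "MLWP":
        return rng.uniform(0.0, 1.0, size)
    if prior == "MLSP":
        return rng.uniform(0.0, gamma_max, size)
    raise ValueError(f"unknown prior: {prior!r}")


def sample_inverse_cdf(inv_cdf, size, rng):
    """Inverse-CDF sampling: gamma_simu = P^-1(u) with u ~ U(0, 1) (e.g. for MLLSP)."""
    return inv_cdf(rng.uniform(0.0, 1.0, size))


def make_training_set(labels, n, rng):
    """Simulate one n-sample interferogram per label and encode it to 3n features."""
    m = labels.size
    a1 = rng.uniform(0.0, 2.0, (m, 1))         # expected amplitudes ~ U(0, 2)
    a2 = rng.uniform(0.0, 2.0, (m, 1))
    phi0 = rng.uniform(-np.pi, np.pi, (m, 1))  # expected phase ~ U(-pi, pi)
    g = labels[:, None]
    std = 1.0 / np.sqrt(2.0)
    z1 = rng.normal(0.0, std, (m, n)) + 1j * rng.normal(0.0, std, (m, n))
    z2 = rng.normal(0.0, std, (m, n)) + 1j * rng.normal(0.0, std, (m, n))
    # Lower triangular factor, giving E{x1 x2*} = a1 a2 gamma exp(j phi0).
    x1 = a1 * z1
    x2 = a2 * (g * np.exp(-1j * phi0) * z1 + np.sqrt(1.0 - g ** 2) * z2)
    # Encoder: assumed RMS amplitude scaling, then phase compensation in x1.
    x1 = x1 / np.sqrt(np.mean(np.abs(x1) ** 2, axis=1, keepdims=True))
    x2 = x2 / np.sqrt(np.mean(np.abs(x2) ** 2, axis=1, keepdims=True))
    phi_s = np.angle(np.sum(x1 * np.conj(x2), axis=1, keepdims=True))
    x1 = x1 * np.exp(-1j * phi_s)
    return np.hstack([np.abs(x1), np.abs(x2), np.angle(x1 * np.conj(x2))])


def train_estimator(features, labels):
    """Fit a gradient-boosted trees regressor (placeholder hyperparameters)."""
    import xgboost as xgb  # deferred import: only the training step needs it

    model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
    model.fit(features, labels)
    return model


rng = np.random.default_rng(7)
labels = sample_labels("MLSP", 10_000, rng, gamma_max=0.6)
features = make_training_set(labels, n=3, rng=rng)  # shape (10000, 9)
```

Calling `train_estimator(features, labels)` would then return a model whose `predict` method maps encoded interferograms to coherence magnitude estimates. The article uses $10^8$ interferograms per estimator, far more than this toy example.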

B. Estimation of Coherence Magnitude
$N$ interferometric samples are input to the operational coherence magnitude estimation. These data are transformed according to (12) and (14), encoded into $3 \times N$ real values, and passed to the prediction model $\hat{f}_{N,p}(\cdot)$. Once again, all phases are the phase differences after compensation of the expected interferometric phase. The model is evaluated extremely fast because no iteration, numeric integration, or bootstrapping is needed. The estimated coherence magnitude $\hat{\gamma}_p$ with $p \in \{\mathrm{MLWP}, \mathrm{MLLSP}_{\gamma_{max}}, \mathrm{MLSP}_{\gamma_{max}}\}$ is deterministic, i.e., identical input data always result in an identical estimate.

III. RESULTS
In this section, the estimation characteristics obtained with gradient-boosted trees implemented using the XGBoost library [24], [25] are presented. Since estimation from a small sample is the critical problem, priority is given to such test cases, i.e., $N = 2$, $N = 3$, and $N = 9$.
The results show that the intuitively introduced Bayesian principle works. A prior of any shape can be implemented. In contrast to the empirical Bayesian approach [22], no integral without a closed-form solution has to be replaced by computationally expensive numerical integration. In the following, the characteristics of the estimators are compared with each other, and the universally applicable sample estimator (1) is taken as the reference. Generally, the bias $\gamma_{bias} = E\{\gamma^* - \gamma_{true}\}$, the standard deviation $\gamma_\sigma = \sqrt{E\{(\gamma^* - E\{\gamma^*\})^2\}}$, and the RMSE $\gamma_{rmse} = \sqrt{E\{(\gamma^* - \gamma_{true})^2\}}$ for $\gamma^* \in \{\hat{\gamma}_s, \hat{\gamma}_{MLWP}, \hat{\gamma}_{MLLSP}, \hat{\gamma}_{MLSP}\}$ are relevant quality criteria of estimators. For coherence magnitude estimation, these properties are functions of the underlying true coherence magnitude $\gamma_{true}$. For this reason, the parameters above are estimated at 101 values $\gamma_{true} \in \{0, 0.01, \ldots, 1\}$ for each plot. $10^6$ simulations are performed for each data point $\gamma_{true}$, and all methods are applied to one and the same data set per analysis. For the test cases shown below, the prior parameter $\gamma_{max} = 0.6$ is chosen because it is a typical value in InSAR. In the plots below, the MLSP curves end at an underlying coherence of 0.6, because a strict prior assigns zero probability outside this range. However, it should be noted that the MLSP$_{0.6}$ estimator also provides estimates outside this strict range.
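The evaluation protocol can be illustrated with a small Monte Carlo sketch for the reference sample estimator. Since $\hat{\gamma}_s$ does not depend on the channel scales or on $\phi_0$, unit amplitudes and zero phase are used here; the number of simulations and the helper names are illustrative assumptions. The population (not sample) standard deviation is used, so that $\gamma_{rmse}^2 = \gamma_{bias}^2 + \gamma_\sigma^2$ holds exactly for the simulated estimates, and all estimators passed in see the same simulated data per grid point.

```python
import numpy as np


def simulate_batch(gamma_true, n, m, rng):
    """m independent interferograms of n CCG samples, unit amplitudes, zero phase."""
    std = 1.0 / np.sqrt(2.0)
    z1 = rng.normal(0.0, std, (m, n)) + 1j * rng.normal(0.0, std, (m, n))
    z2 = rng.normal(0.0, std, (m, n)) + 1j * rng.normal(0.0, std, (m, n))
    return z1, gamma_true * z1 + np.sqrt(1.0 - gamma_true ** 2) * z2


def sample_estimator(x1, x2):
    """Sample coherence magnitude (1) for each row (interferogram)."""
    cross = np.abs(np.sum(x1 * np.conj(x2), axis=1))
    return cross / np.sqrt(np.sum(np.abs(x1) ** 2, axis=1) * np.sum(np.abs(x2) ** 2, axis=1))


def quality_criteria(estimates, gamma_true):
    """Bias, standard deviation, and RMSE of the estimates for one gamma_true."""
    bias = np.mean(estimates) - gamma_true
    sigma = np.std(estimates)  # ddof=0, so rmse**2 == bias**2 + sigma**2
    rmse = np.sqrt(np.mean((estimates - gamma_true) ** 2))
    return bias, sigma, rmse


def evaluate(estimators, n, m, rng, grid=None):
    """Criteria per estimator over gamma_true in {0, 0.01, ..., 1}; all
    estimators see one and the same simulated data set per grid point."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else np.asarray(grid)
    out = {name: [] for name in estimators}
    for g in grid:
        x1, x2 = simulate_batch(g, n, m, rng)
        for name, est in estimators.items():
            out[name].append(quality_criteria(est(x1, x2), g))
    return {name: np.array(rows) for name, rows in out.items()}  # rows: bias, sigma, rmse


rng = np.random.default_rng(3)
bias0, sigma0, rmse0 = quality_criteria(sample_estimator(*simulate_batch(0.0, 2, 200_000, rng)), 0.0)
```

For $N = 2$ and $\gamma_{true} = 0$, the simulated bias and standard deviation come out close to the reference values of about 0.666 and 0.236 used below.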

A. Test Case N = 2 Samples
The bias compared in Fig. 4(a) is reduced for small coherences by all ML methods. For zero coherence and compared to the sample estimator $\hat{\gamma}_s$, the MLWP reduces the bias from 0.6664 to 0.4374, i.e., by 34.4%; the MLLSP$_{0.6}$ reduces it to 0.3860, i.e., by 42.1%; and the MLSP$_{0.6}$ reduces it to 0.2967, i.e., by 55.5%. For the sample estimator, the bias becomes zero at an underlying coherence of one. Not surprisingly, all newly developed ML estimators are bias-free at much smaller coherences. However, this is achieved at the expense of a larger bias for higher underlying coherence magnitudes.
The standard deviation $\gamma_\sigma$ is visualized in Fig. 4(b). Again, zero coherence is taken as an example. Compared to the sample estimator, the MLWP reduces the standard deviation from 0.2358 to 0.1323, i.e., by 43.9%; the MLLSP$_{0.6}$ reduces it to 0.0954, i.e., by 59.5%; and the MLSP$_{0.6}$ reduces it to 0.0545, i.e., by 76.9%.
The RMSE best describes the estimator performance, as it includes both the bias and the variance of the estimators, $\gamma_{rmse} = \sqrt{\gamma_{bias}^2 + \gamma_\sigma^2}$. The comparison of the RMSE in Fig. 4(c) confirms the observation from the empirical Bayesian coherence magnitude estimation [22] that the more information is used and the stricter the general prior, the more accurate the estimate. Compared to the conventional sample estimator, MLWP is more efficient for all underlying coherence magnitudes up to 0.68, the MLLSP$_{0.6}$ method up to 0.65, and the MLSP$_{0.6}$ estimator up to 0.58.

B. Test Case N = 3 Samples
The properties and principles from the test case $N = 2$ are also confirmed in this configuration. The comparison of the bias ($\gamma_{bias}$) is visualized in Fig. 5(a), of the standard deviation ($\gamma_\sigma$) in Fig. 5(b), and of the RMSE ($\gamma_{rmse}$) in Fig. 5(c). For zero coherence magnitude, the bias relative to the sample estimator improves from 0.5333 to 0.3852, i.e., by 27.8%, for MLWP, to 0.3572, i.e., by 33.0%, for MLLSP$_{0.6}$, and to 0.2905, i.e., by 45.5%, for MLSP$_{0.6}$. The standard deviation is reduced from 0.2211 to 0.1528, i.e., by 30.9%, using MLWP, to 0.1285, i.e., by 41.9%, with MLLSP$_{0.6}$, and to 0.08116, i.e., by 63.3%, with MLSP$_{0.6}$. Compared to the conventional sample estimator, MLWP is more efficient for all underlying coherence magnitudes up to 0.62, the MLLSP$_{0.6}$ method up to 0.61, and the MLSP$_{0.6}$ estimator up to 0.55.

C. Test Case N = 9 Samples
As the performance of the sample estimator improves with the number of samples, the advantages of other methods can be expected to diminish. The visualizations of the bias in Fig. 6(a), the standard deviation in Fig. 6(b), and the RMSE in Fig. 6(c) confirm this expectation. Accordingly, the reduction in bias is less pronounced. At zero coherence magnitude, the sample estimator has a bias of 0.30. The MLWP reduces the estimation bias by 13.4% to 0.26. The prior also has less effect on bias mitigation than in the test cases with fewer samples. The MLLSP$_{0.6}$ improves the bias by 14.0% to 0.2576, and the MLSP$_{0.6}$ by 18.4% to 0.2444. A similar characteristic is observed for the standard deviation. For zero coherence, the standard deviation of MLWP even increases by 5.9% from 0.1463 to 0.1548. A prior helps to mitigate the random variation: the MLLSP$_{0.6}$ lessens the standard deviation by 4.9% to 0.1390, and the MLSP$_{0.6}$ by 26.5% to 0.1075. Nevertheless, in terms of RMSE, the ML algorithms outperform the sample estimator for small coherence magnitude values. MLWP is more efficient for all underlying coherence magnitudes up to 0.43, the MLLSP$_{0.6}$ method up to 0.47, and the MLSP$_{0.6}$ estimator up to 0.48.
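As a cross-check of the zero-coherence reference values quoted in Sections III-A to III-C, the sample estimator has a closed form at $\gamma_{true} = 0$: for CCG data with zero coherence, $\hat{\gamma}_s^2$ follows a Beta$(1, N-1)$ distribution, so that $E\{\hat{\gamma}_s\} = \Gamma(N)\,\Gamma(3/2)/\Gamma(N + 1/2)$ and $E\{\hat{\gamma}_s^2\} = 1/N$. The short Python sketch below evaluates these expressions together with the relative reductions used in the text; the function names are illustrative.

```python
import math


def sample_estimator_moments_at_zero(n):
    """Mean (= bias) and standard deviation of the sample coherence magnitude
    for a true coherence of zero, using gamma_s**2 ~ Beta(1, n - 1)."""
    if n < 2:
        raise ValueError("the sample estimator needs n >= 2")
    # log-gamma keeps the ratio finite for large n
    mean = math.exp(math.lgamma(n) + math.lgamma(1.5) - math.lgamma(n + 0.5))
    second_moment = 1.0 / n
    return mean, math.sqrt(second_moment - mean ** 2)


def reduction_percent(reference, value):
    """Relative reduction of value with respect to reference, in percent."""
    return 100.0 * (reference - value) / reference


for n in (2, 3, 9):
    bias, std = sample_estimator_moments_at_zero(n)
    print(f"N = {n}: bias = {bias:.4f}, std = {std:.4f}")

# Relative reductions quoted for N = 2: MLWP bias and MLSP_0.6 standard deviation
print(round(reduction_percent(0.6664, 0.4374), 1), round(reduction_percent(0.2358, 0.0545), 1))
```

The loop prints biases of 0.6667, 0.5333, and 0.2995 and standard deviations of 0.2357, 0.2211, and 0.1462, which agree with the simulated reference values in the text to within about $10^{-3}$; the last line reproduces the quoted reductions of 34.4% and 76.9%.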

D. Sentinel-1 Application Demonstration
As a proof of concept, the estimator prototype is demonstrated using real Sentinel-1 data in Interferometric Wide swath mode. The primary scene has the orbit number 30 741 and was acquired on January 10, 2020. The secondary scene was recorded 12 days later; its orbit number is 30 916, and the observation geometry is characterized by an effective baseline of about 27 m. Without going into details, the oversampling in the input data is reversed, and the estimation window of 3 × 3 samples in range and azimuth (i.e., $N = 9$) overlaps from sample to sample. Fig. 7(a) visualizes the test case with 512 × 512 i.i.d. samples by the radar backscatter amplitude. The coherence magnitude from the sample estimator is visualized in Fig. 7(b). Using identical estimation windows, the respective ML result is shown in Fig. 7(d). In this example, the ML coherence magnitude is estimated locally adaptively with respect to the prior from MLWP, MLLSP$_{0.6}$, MLLSP$_{0.4}$, or MLSP$_{0.4}$. It can be seen that the estimation performance now depends not only on the window size but mainly on the prior and its strictness.
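The overlapping $3 \times 3$ window estimation can be sketched with NumPy's `sliding_window_view` (available since NumPy 1.20). Any per-window estimator that maps rows of $N$ primary and secondary samples to one coherence magnitude each can be plugged in, e.g., the sample estimator or, hypothetically, a wrapper around a trained model's `predict` applied to encoded windows. Only windows fully inside the image are evaluated in this sketch; how image borders are handled in the article is not stated. The synthetic images below have a true coherence of 0.8 and are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_estimator(x1, x2):
    """Sample coherence magnitude (1) for each row of N samples."""
    cross = np.abs(np.sum(x1 * np.conj(x2), axis=1))
    return cross / np.sqrt(np.sum(np.abs(x1) ** 2, axis=1) * np.sum(np.abs(x2) ** 2, axis=1))


def coherence_map(slc1, slc2, estimator, window=(3, 3)):
    """Estimate coherence in overlapping windows (stride 1), one value per window."""
    w1 = sliding_window_view(slc1, window)  # shape (rows, cols, window[0], window[1])
    w2 = sliding_window_view(slc2, window)
    rows, cols = w1.shape[:2]
    n = window[0] * window[1]
    est = estimator(w1.reshape(-1, n), w2.reshape(-1, n))
    return np.asarray(est).reshape(rows, cols)


rng = np.random.default_rng(11)
shape = (64, 64)
slc1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
slc2 = 0.8 * slc1 + 0.6 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))
gamma_map = coherence_map(slc1, slc2, sample_estimator)  # shape (62, 62)
```

Because the estimator receives all windows as one batch, a vectorized model prediction can replace `sample_estimator` without changing the windowing code.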
To give an intuitive idea of the effect of different priors and parameters, Fig. 7(c) visualizes a composition of coherence estimates. In this figure, from left to right, the results of the sample estimator, MLWP, MLLSP$_{0.6}$, MLLSP$_{0.4}$, MLSP$_{0.6}$, and MLSP$_{0.4}$ can be compared. Similar coherence magnitudes are observed for all but the last two columns. It follows that the less strict prior can robustly cope with an underlying coherence greater than the prior parameter $\gamma_{max}$.

IV. DISCUSSION
In the section above, test cases using gradient-boosted trees are presented. The question arises whether other ML methods provide similar results. For this reason, additional prototypes for coherence magnitude estimation based on neural networks and random forests are assessed. With similar graphs, Fig. 8 demonstrates for the test case with $N = 3$ samples that the developed framework is robust with respect to the particular ML method. However, in the course of development, it turned out that the encoder has a significant influence on the estimation performance. To illustrate this, Fig. 9 compares the RMSE for $N = 3$ samples with the unfavorable encoding $\{\mathrm{Re}(x_1), \mathrm{Im}(x_1), \mathrm{Re}(x_2), \mathrm{Im}(x_2)\}$ between gradient-boosted trees, neural networks, and random forests. The comparison with Fig. 8 shows that the ML methods differ in robustness with respect to the encoding of the input data. For this application, neural networks are able to handle complex input data without encoding.
Unexpectedly, Fig. 5(c) shows that the RMSE characteristics of the ML methods are similar to those of the empirical Bayesian methods [22, Fig. 8(f)]. Some comparative plots show which method performs better in terms of RMSE, using the test case with $N = 3$ as a demonstration. First, Fig. 10 compares the Bayesian and the ML estimation without prior. Second, Fig. 11 benchmarks the Bayesian and the ML estimation with the less strict prior $\gamma_{max} = 0.6$, and third, Fig. 12 visualizes the Bayesian and the ML estimation with the strict prior $\gamma_{max} = 0.6$.
The examples demonstrate that ML improves the coherence magnitude estimation compared to the empirical Bayesian estimation for small coherences and small samples. In the field of InSAR, the precise estimation of small coherence magnitudes is the challenge. This is one of the reasons why the newly developed ML methods are recommended for operational systems. Another is the computational efficiency compared to the empirical Bayesian estimation. On a laptop, the ML prototypes perform $10^5$ estimates in less than 10 s. In contrast, the empirical Bayesian methods need more than half an hour for the same number of estimates, i.e., they are more than two orders of magnitude slower. The computing performance is achieved at the expense of a high training effort: each estimator took three days to train on a small laptop.
Above, the estimation characteristics are demonstrated for small sample sizes. For $N = 15$, the RMSE of the ML methods is still better than that of the sample estimator, as visualized in Fig. 13. MLWP is more efficient for all underlying coherence magnitudes up to 0.40, the MLLSP$_{0.6}$ method up to 0.44, and the MLSP$_{0.6}$ estimator up to 0.46. However, for the test case $N = 30$, Fig. 14 demonstrates that only the MLSP$_{0.6}$ performs better than the sample estimator. As indicated by [22, Fig. 14(c) and (f)], the empirical Bayesian estimators are recommended for this test case. The plots show that a hybrid approach has to be developed in order to obtain the best estimate for an operational application. Such an approach should include the sample estimator, the ML methods, and the Bayesian methods [22]. As pointed out by a reviewer, the resulting coherence magnitude estimate would then always be better than or equal to the sample estimator in terms of RMSE.
One reviewer argued that $N = 2$ is too low a sample count. It is well known that the concept of coherence does not apply to individual samples, requiring $N > 1$. The reason is that the coherence is a statistical quantity, since an expected value has to be calculated, as indicated by [22, eq. (3)]. In practice, two samples are sufficient for the expected value to be meaningful. For this reason, the sample estimator (1) can be evaluated for $N = 2$ samples. Accordingly, if (1) can be evaluated meaningfully for two samples, then it can also be predicted by ML, as demonstrated in the section above. For the Bayesian estimation [22], it was surprising that $N = 2$ works, because the conditional probability density function (pdf) used (see [22, eq. (12)]) was reported by Touzi [17] to be valid for $N > 2$. The newly developed ML method is nonparametric and does not depend on this pdf; consequently, this restriction is not relevant. Interestingly, the practical demonstration of the empirical Bayesian estimator in [22] has shown that the conditional pdf also works for $N = 2$.
As already stated in [22] for the empirical Bayesian techniques, the demonstrated ML methods support typical InSAR scenarios. First, MLWP improves the estimation without prior knowledge and is generally applicable. Second, MLLSP and MLSP include an assumption on the maximum $\gamma_{max}$ of the underlying true coherence magnitude. Such information is available in InSAR based on stacks of interferograms. For example, $\hat{\gamma}_{max}$ can be estimated from an initial coherence matrix [3, Fig. 5], which can straightforwardly be converted into the best possible coherence as a function of the acquisition time difference. Depending on the accuracy of $\hat{\gamma}_{max}$ and the likelihood that the underlying coherence is above $\hat{\gamma}_{max}$, the less strict or the strict prior should be selected. The strict prior limits the estimates to the assumed range, and the less strict prior favors estimates in this range. In principle, any shape of prior can easily be implemented in the developed framework.
In practice, the greatest gain in precision is achieved with the strict prior. Not surprisingly, the more restrictive the prior, the better the estimation performance. Fig. 15 compares the estimators MLSP$_{0.6}$ and MLSP$_{0.4}$ using $N = 9$ as an example. It shows that a strict prior with a small $\gamma_{max}$ should always be preferred. However, a reliable $\hat{\gamma}_{max}$ value for the application is crucial.

V. CONCLUSION
The developed ML coherence magnitude estimators are suitable and recommended for operational InSAR systems. First, they improve the estimation performance compared to the conventional sample estimator and to the empirical Bayesian estimators [22]; in particular, the estimation of small coherence magnitudes from small samples is improved. Second, the framework supports any shape of Bayesian prior on the underlying coherence magnitude. In this manuscript, the Bayesian prior is modeled with a single parameter ($\gamma_{max}$). Less strict and strict assumptions on the range of the underlying coherence magnitude can be modeled and are demonstrated. Both types of prior correspond to typical InSAR scenarios. Third, the estimators are evaluated extremely fast because no iteration, numeric integration, or bootstrapping is needed. Fourth, the implementation is straightforward because many ML libraries are available. In this manuscript, the implementation utilizes XGBoost developed by Chen and Guestrin [24]. If a suitable encoder is used, the estimation results are independent of the ML method used.
A limitation of the newly developed ML methods is that the performance improves only for low coherences and for sample sizes $N < 30$.
The developed estimators are not limited to InSAR, but are generally applicable to coherence estimation problems from CCG processes.

ACKNOWLEDGMENT
The author would like to thank the anonymous reviewers and is grateful for their positive and very thorough comments and suggestions, which helped improve this manuscript. This article contains modified Copernicus Sentinel data 2020, processed by ESA.