Deep Injective Prior for Inverse Scattering

In electromagnetic inverse scattering, the goal is to reconstruct the permittivity of an object from its scattered waves. While deep learning has shown promise as an alternative to iterative solvers, it is primarily used in supervised frameworks that are sensitive to distribution shifts in the scattered fields, which are common in practice. Moreover, these methods typically provide a single estimate of the permittivity pattern, which may be inadequate or misleading due to noise and the ill-posedness of the problem. In this paper, we propose a data-driven framework for inverse scattering based on deep generative models. Our approach learns a low-dimensional manifold that serves as a regularizer for recovering target permittivities. Unlike supervised methods, which require both scattered fields and target permittivities, our method requires only the target permittivities for training; it can then be used with any experimental setup. We also introduce a Bayesian framework for approximating the posterior distribution of the target permittivity, enabling multiple estimates and uncertainty quantification. Extensive experiments with synthetic and experimental data demonstrate that our framework outperforms traditional iterative solvers, particularly for strong scatterers, while achieving reconstruction quality comparable to state-of-the-art supervised learning methods such as the U-Net.

While inverse scattering is well-posed and Lipschitz stable in theory when full-aperture continuous measurements are available [6], it becomes a severely ill-posed inverse problem when only a finite number of measurements is available. This means that even a small perturbation in the scattered fields can result in a significant error in the reconstructed permittivity pattern [7]. Additionally, the nonlinearity of the forward operator, caused by multiple scattering and amplified by higher permittivity contrasts [7], further complicates the inversion. Together, these factors make inverse scattering a challenging problem, especially for strong scatterers (objects with large permittivity) and noisy measurements. To address these challenges, an effective regularization technique is necessary to constrain the search space and achieve accurate recovery.
Several optimization-based methods have been proposed to tackle the nonlinearity and ill-posedness of the inverse scattering problem. These include the Born iterative method [8], the distorted Born iterative method (DBIM) [9], contrast source inversion (CSI) [10], and the subspace-based optimization method (SOM) [11]. While these methods are effective at reconstructing objects with small permittivity variations, they often fail to accurately reconstruct objects with large permittivity contrasts. They typically rely on iterative optimization of a regularized objective with manually designed regularization terms [7].
Deep learning has achieved remarkable success in inverse scattering. Most deep learning models for inverse scattering adopt a supervised learning approach, training a deep neural network to regress the permittivity pattern. Some studies [12]-[14] use the scattered fields as the input of the neural network. Despite yielding satisfactory reconstructions [14], these methods are sensitive to changes in the experimental configuration, such as the frequency, the number of transmitters and receivers, or other real-world factors. Even slight variations in the distribution of the scattered fields at test time can significantly degrade reconstruction quality, requiring costly acquisition of new training data. Back-projections can be used as input to address some of these issues [15]-[17]. While this approach yields good reconstructions for objects with small and moderate permittivity, the quality of the back-projections drops significantly for large permittivity because of the nonlinearity, which in turn degrades the reconstruction [14]. Moreover, supervised learning methods are vulnerable to adversarial attacks [18], which is problematic in medical applications [19]. Importantly, it is not straightforward to incorporate the well-established physics of the scattering problem (i.e., the forward operator) into such deep learning models to improve their generalization [20]-[25].
To tackle these issues, we propose a deep learning approach to inverse scattering using injective generative models. The proposed method adopts an unsupervised learning framework: the training phase uses only the target permittivity patterns, and the physics of scattering is fully incorporated into the solution. Deep generative models such as generative adversarial networks (GANs) [26], [27], variational autoencoders (VAEs) [28], normalizing flows [29]-[31], and diffusion models [32] are unsupervised learning methods that train a deep neural network to transform samples of a simple (Gaussian) distribution into samples that resemble the target data distribution. Recently, deep generative models (DGMs) have been used as priors for solving inverse problems [33]-[38]. Given a generator trained on a dataset of target images (the solutions of a given inverse problem), one can search its latent space for a latent code that yields a solution consistent with the given measurements.
The choice of generative model is of paramount importance for providing effective regularization in ill-posed inverse problems. While GANs have been used as generative priors for inverse problems [33], [39]-[41], they are unstable to train [42], [43] and lead to local minima in iterative approaches [33]. Normalizing flows resolve some of these issues [44]-[46]; however, they are computationally expensive to train and often do not provide sufficient regularization for highly ill-posed inverse problems. Injective normalizing flows [35], [47], [48], specifically designed for solving ill-posed inverse problems, alleviate these issues: their low-dimensional latent space serves as an effective regularizer. In related work, Guo et al. [49] employed VAEs as generative priors for inverse scattering.
In this paper, we use injective flows as generative priors for full-wave inverse scattering. The proposed approach has a significant advantage: it requires training only on the target permittivity patterns and no training data from scattered fields. Once the generator is trained, it can be used to solve inverse scattering problems in arbitrary configurations. This property makes the model robust to distribution shifts in the measurements as well as to adversarial attacks. In contrast to the work of Guo et al. [49], the invertibility of our generator allows us to perform optimization in both the latent and data spaces, providing great flexibility in the choice of scattering solver. Additionally, while Guo et al. [49] require a data-driven initialization, our method can use either back-projection or data-driven initializations (among others), which makes it adaptable to different scenarios and less dependent on a specific starting point. We show that the proposed framework significantly outperforms traditional iterative solvers and yields reconstructions of comparable or better quality than highly successful supervised methods such as the U-Net [50].
All the aforementioned methods reconstruct a single point estimate of the permittivity pattern given the measurements. A point estimate, however, is often insufficient or misleading due to the ill-posedness of the inverse scattering problem. This limitation can be tackled with Bayesian frameworks based on deep neural networks [14], [51] that generate multiple estimates of the permittivity and perform uncertainty quantification (UQ). However, these methods are supervised and suffer from the issues discussed above. Our second contribution is a Bayesian framework built on our pre-trained injective generator that produces multiple estimates of the permittivity pattern and thereby enables uncertainty quantification. Crucially, the proposed method does not rely on scattered fields during training. As we discuss in Section V, this framework requires injectivity and is therefore not applicable to non-injective generators such as GANs or VAEs.

This paper is organized as follows. Section II briefly reviews the forward and inverse scattering problems. Section III gives an overview of normalizing flows and injective flows. Our proposed methods for MAP estimation and posterior modeling in inverse scattering are introduced in Sections IV and V. Computational experiments are presented in Section VI. Section VII discusses the limitations of our approach and directions for future work.

II. FORWARD AND INVERSE SCATTERING
We begin with the equations governing the 2D forward and inverse scattering problems. We focus on the 2D transverse magnetic (TM$_z$) case, where the longitudinal direction is along $\hat{z}$. As depicted in Figure 1, we consider non-magnetic scatterers with relative permittivity $\epsilon_r$ situated in the investigation domain $D_{inv}$, which is a $D \times D$ square. The scatterers are surrounded by a vacuum background with permittivity $\epsilon_0$ and permeability $\mu_0$. They are illuminated by $N_i$ plane waves with equispaced directions, and $N_r$ receivers are uniformly positioned on a circle of radius $R$ to measure the scattered fields. The forward scattering problem can be derived from the time-harmonic formulation of Maxwell's equations and expressed as follows [52]:

$$E_t(\mathbf{r}) = E_i(\mathbf{r}) + k_0^2 \int_{D_{inv}} g(\mathbf{r}, \mathbf{r}')\, J(\mathbf{r}')\, d\mathbf{r}', \tag{1}$$

where $E_t$ represents the total electric field, which has only the $E_z$ component in the TM$_z$ case, $E_i$ is the incident field, and $g(\mathbf{r}, \mathbf{r}')$ is the 2D free-space Green's function. In addition, $k_0 = \omega \sqrt{\mu_0 \epsilon_0}$ denotes the wavenumber of the homogeneous background, and $J$ is the contrast current density. The contrast current density, obtained from the equivalence theorem [53], is given by $J(\mathbf{r}) = \chi(\mathbf{r}) E_t(\mathbf{r})$, where $\chi(\mathbf{r}) = \epsilon_r(\mathbf{r}) - 1$ is referred to as the contrast. Throughout this paper, the time-dependence factor $\exp(i\omega t)$ with angular working frequency $\omega$ is assumed and suppressed for simplicity.
We discretize the investigation domain $D_{inv}$ into $N \times N$ cells. The state equation can then be expressed as

$$E_t = E_i + G_d\, \chi\, E_t, \tag{2}$$

where $G_d \in \mathbb{C}^{N^2 \times N^2}$, and $E_t$ and $E_i$ are the total and incident electric fields, respectively; $\chi$ is a diagonal matrix with elements $\chi(n, n) = \epsilon_r(n) - 1$ that accounts for the contrast in the medium. The data equation is given by

$$E_s = G_s\, \chi\, E_t + \delta, \tag{3}$$

where $G_s \in \mathbb{C}^{N_r \times N^2}$, $E_s$ denotes the scattered electric fields, and $\delta$ is the additive noise in the measurements. Both $G_d$ and $G_s$ have closed-form analytical expressions [7].
Combining (2) and (3) yields a unified expression for the forward model [7],

$$E_s = G_s\, \chi\, (I - G_d\, \chi)^{-1} E_i + \delta, \tag{4}$$

which represents a nonlinear mapping from $\chi$ to $E_s$. For convenience, we define a forward operator $A$ that maps $\chi$ to $E_s$,

$$y = A(x) + \delta, \tag{5}$$

where $A(\cdot)$ is the nonlinear forward scattering operator, $y = E_s$, and $x = \chi$. The objective of inverse scattering is to reconstruct the contrast $\chi$ from the scattered fields $E_s$, assuming that $G_d$, $G_s$, the incident fields $E_i$, and hence the forward operator $A(\cdot)$ are known. In the following section, we briefly review deep generative models, focusing on normalizing flows as prior models for inverse problems.
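To make the structure of the forward operator concrete, here is a minimal NumPy sketch of $A(\chi) = G_s \chi (I - G_d \chi)^{-1} E_i$, obtained by solving the state equation for $E_t$ and substituting it into the noise-free data equation. The Green's matrices below are random complex placeholders rather than the closed-form expressions of [7], and the function name `forward_scattering` and all sizes are illustrative assumptions. Solving the linear system instead of forming the inverse explicitly is the numerically safer choice.

```python
import numpy as np

def forward_scattering(chi, G_d, G_s, E_i):
    """Noise-free nonlinear forward operator: contrast -> scattered fields.

    chi : (n,) contrast values eps_r - 1, one per cell (n = N*N)
    G_d : (n, n) domain Green's matrix
    G_s : (n_r, n) cell-to-receiver Green's matrix
    E_i : (n, n_i) incident fields, one column per transmitter
    """
    X = np.diag(chi)                        # contrast as a diagonal matrix
    n = chi.size
    # State equation E_t = E_i + G_d X E_t  <=>  (I - G_d X) E_t = E_i
    E_t = np.linalg.solve(np.eye(n) - G_d @ X, E_i)
    # Data equation without noise: E_s = G_s X E_t
    return G_s @ X @ E_t

rng = np.random.default_rng(0)
n, n_r, n_i = 16, 12, 12                    # toy sizes: 4x4 grid, 12 receivers, 12 transmitters
# Placeholder complex matrices standing in for the true Green's matrices (assumption).
G_d = 0.05 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
G_s = 0.05 * (rng.standard_normal((n_r, n)) + 1j * rng.standard_normal((n_r, n)))
E_i = np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n_i)))   # unit-amplitude incident fields
chi = rng.uniform(0.0, 1.0, n)              # toy contrast values
E_s = forward_scattering(chi, G_d, G_s, E_i)
```

Doubling the contrast does not double the scattered fields, which reflects the nonlinearity discussed above, while for a very weak contrast the output approaches the Born approximation $G_s \chi E_i$.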

III. NORMALIZING FLOWS
Normalizing flows were introduced by Rezende and Mohamed [54] in the context of variational inference and by Dinh et al. [30] for density estimation. A normalizing flow $f_\theta$ is an invertible deep neural network, parameterized by a vector of weights $\theta$, that transforms a simple base distribution $p_Z$, typically a Gaussian, into the target data distribution $p_X$ or an approximation thereof. By mapping a data sample $x$ back to the latent space, $z = f_\theta^{-1}(x)$, the likelihood of $x$ can be evaluated with the change-of-variables formula

$$\log p_X(x) = \log p_Z\big(f_\theta^{-1}(x)\big) - \log \big|\det J_{f_\theta}\big(f_\theta^{-1}(x)\big)\big|, \tag{7}$$

where $p_Z = \mathcal{N}(0, I)$ and $J_{f_\theta}$ is the Jacobian matrix of $f_\theta$ evaluated at $f_\theta^{-1}(x)$.
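As an illustration of the change-of-variables formula (7), the following sketch evaluates $\log p_X(x)$ for a toy flow $x = a \odot z + b$ with a standard normal base distribution. The class `ElementwiseAffineFlow` is a hypothetical stand-in for a deep invertible network; because its Jacobian is $\mathrm{diag}(a)$, the result must match the log-density of $\mathcal{N}(b, \mathrm{diag}(a^2))$, which gives a direct check.

```python
import numpy as np

def log_standard_normal(z):
    """log N(z; 0, I), summed over the last axis."""
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)

class ElementwiseAffineFlow:
    """Hypothetical toy flow x = f(z) = a * z + b with a > 0 (illustration only)."""

    def __init__(self, a, b):
        self.a = np.asarray(a, dtype=float)
        self.b = np.asarray(b, dtype=float)

    def forward(self, z):
        return self.a * z + self.b

    def inverse(self, x):
        return (x - self.b) / self.a

    def log_abs_det_jacobian(self, z):
        # J_f = diag(a) does not depend on z, so log|det J_f| = sum(log|a|) for every sample.
        return np.full(np.shape(z)[:-1], np.sum(np.log(np.abs(self.a))))

def flow_log_likelihood(flow, x):
    """Eq. (7): log p_X(x) = log p_Z(f^{-1}(x)) - log|det J_f(f^{-1}(x))|."""
    z = flow.inverse(x)
    return log_standard_normal(z) - flow.log_abs_det_jacobian(z)

flow = ElementwiseAffineFlow(a=[2.0, 0.5, 1.5], b=[1.0, -1.0, 0.0])
x = np.array([[0.3, -0.8, 2.0],
              [1.0, -1.0, 0.0]])
log_px = flow_log_likelihood(flow, x)
```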
Numerous studies have focused on designing invertible neural networks that admit a computationally efficient inverse $f_\theta^{-1}$ and log-determinant of the Jacobian. A staple design block that enables these efficient computations is the coupling layer, introduced by Dinh et al. [29] and further developed in [30]. Unlike many other generative models, normalizing flows allow efficient likelihood computation via (7), which enables training by maximum likelihood (ML),

$$\theta^* = \arg\max_{\theta} \sum_{i=1}^{N} \log p_X\big(x^{(i)}; \theta\big),$$

where $\{x^{(i)}\}_{i=1}^{N}$ are the training samples. Normalizing flows also have important limitations. They require bijective neural networks that keep the data-space dimension throughout the model, which results in large networks and slow training. Furthermore, because the range of a bijective network is unconstrained and covers the entire space, they do not inherently provide strong regularization for solving ill-posed inverse problems. In the following subsection, we briefly review injective normalizing flows [35], which were specifically designed for solving ill-posed inverse problems.

A. Injective Normalizing Flows
While regular normalizing flows have the same dimension in the latent and data spaces, injective normalizing flows [35], [47], [48] map a low-dimensional latent space to the high-dimensional data space using a set of invertible layers. Injective flows retain the advantages of regular normalizing flows, including fast inverses and maximum-likelihood training. As shown in Figure 2, an injective network $f_\theta(z) = g_\gamma(h_\eta(z))$ with weights $\theta = (\gamma, \eta)$, called a Trumpet, comprises two subnetworks: a bijective part $h_\eta$ that maps $\mathbb{R}^d$ to $\mathbb{R}^d$ and an injective part $g_\gamma$ (with expansive layers) that maps $\mathbb{R}^d$ to $\mathbb{R}^D$, where $d \ll D$. Both subnetworks are composed of revnet blocks. A bijective (injective) revnet block comprises three components:

1) Activation normalization: a per-channel affine transformation $y = a \odot x + c$ with learned scale $a$ and bias $c$, whose inverse is $x = (y - c) \oslash a$.
2) $1 \times 1$ convolution with a kernel $w$, where $\circledast$ denotes the $1 \times 1$ convolution:
 a) Bijective version: $y = w \circledast x$, with inverse $x = w^{-1} \circledast y$;
 b) Injective version: $y = w \circledast x$, with inverse $x = w^{\dagger} \circledast y$;
 here $w \in \mathbb{R}^{c_{in} \times c_{out}}$ is a $1 \times 1$ convolutional filter, which is simply a matrix multiplication along the channel dimension, and $w^{\dagger}$ is the pseudo-inverse of $w$ (a non-square matrix in the injective, dimension-expanding case).
3) Affine coupling layer, with the input split along the channels as $x = [x_1, x_2]$:
 FORWARD: $y_1 = x_1$, $y_2 = s(x_1) \odot x_2 + b(x_1)$;
 INVERSE: $x_1 = y_1$, $x_2 = \big(y_2 - b(y_1)\big) \oslash s(y_1)$.
 The mappings $s$ and $b$ are the scale and shift networks, respectively, and $\odot$ and $\oslash$ denote element-wise multiplication and division.

[Fig. 2 labels: latent space, intermediate space, data space; bijective part, injective part; MOG; from latent to data space, from data to latent space]
Fig. 2: Injective normalizing flows [35] comprise two submodules, a low-dimensional bijective flow $h_\eta$ and an injective network $g_\gamma$ with expansive layers. The MOG initialization, $z = 0$, is illustrated with a red circle in the latent space.
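A small NumPy sketch of two of these components follows. It assumes channels-last feature maps, so that a $1 \times 1$ convolution becomes a right-multiplication by $w \in \mathbb{R}^{c_{in} \times c_{out}}$, and it replaces the learned scale and shift networks with fixed hypothetical functions (`scale_net`, `shift_net`); activation normalization is omitted. It shows that the affine coupling layer is inverted exactly and that the pseudo-inverse $w^{\dagger}$ undoes the channel-expanding (injective) $1 \times 1$ convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_net(x1):
    # Hypothetical stand-in for the learned scale network s(.); outputs stay strictly positive.
    return np.exp(0.1 * np.tanh(x1))

def shift_net(x1):
    # Hypothetical stand-in for the learned shift network b(.).
    return 0.5 * np.sin(x1)

def coupling_forward(x):
    x1, x2 = np.split(x, 2, axis=-1)              # split along the channel axis
    return np.concatenate([x1, scale_net(x1) * x2 + shift_net(x1)], axis=-1)

def coupling_inverse(y):
    y1, y2 = np.split(y, 2, axis=-1)              # y1 = x1, so s and b can be recomputed
    return np.concatenate([y1, (y2 - shift_net(y1)) / scale_net(y1)], axis=-1)

def conv1x1(x, w):
    # A 1x1 convolution on a channels-last map (H, W, c_in) is a matrix product over channels.
    return x @ w

c_in, c_out = 4, 8                                 # injective case: channels are expanded
w = rng.standard_normal((c_in, c_out))             # full row rank with probability one
w_pinv = np.linalg.pinv(w)                         # w^dagger, shape (c_out, c_in)

x = rng.standard_normal((5, 5, c_in))              # toy feature map
y = conv1x1(x, w)                                  # 4 -> 8 channels
x_rec = conv1x1(y, w_pinv)                         # recovers x because w @ w_pinv = I
```

Since $w$ has more columns than rows, $w w^{\dagger} = I$ but $w^{\dagger} w \neq I$: applying $w^{\dagger}$ and then $w$ to an arbitrary 8-channel input projects it onto the range of $w$ rather than reproducing it.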
For additional details about the network architecture, please refer to Section A in the appendix.
The training process for injective normalizing flows involves two phases, as initially proposed in [47]. In the first phase, we adjust the range of the injective generator by optimizing the weights of the injective subnetwork $g_\gamma$ so that its range fits the training data,

$$\gamma^* = \arg\min_{\gamma} \sum_{i=1}^{N} \big\| x^{(i)} - g_\gamma\big(g_\gamma^{\dagger}(x^{(i)})\big) \big\|_2^2,$$

where $\{x^{(i)}\}_{i=1}^{N}$ represents the training data and $g^{\dagger}$ denotes the layer-wise inverse of the injective subnetwork.
Once the injective subnetwork has been trained for a fixed number of epochs, we move to the second phase, where we train the bijective subnetwork $h_\eta$ by maximizing the likelihood of the projected training samples in the intermediate space (as shown in Figure 2),

$$\eta^* = \arg\max_{\eta} \sum_{i=1}^{N} \Big[\log p_Z\big(h_\eta^{-1}(z'^{(i)})\big) - \log \big|\det J_{h_\eta}\big(h_\eta^{-1}(z'^{(i)})\big)\big|\Big],$$

where $z'^{(i)} = g_\gamma^{\dagger}(x^{(i)})$ and $p_Z = \mathcal{N}(0, I)$. Upon completion of training, we can generate random samples similar to the training data using $x_{gen} = f(z_{gen})$, where $z_{gen} \sim \mathcal{N}(0, I)$. Further investigation of the universality of density and manifold approximation by injective flows can be found in [55].
Injective flows, owing to their low-dimensional latent space, parameterize a low-dimensional manifold embedded in the high-dimensional data space. Through training, this manifold is fitted to the training samples so that it captures plausible contrasts, which makes it an effective regularizer for ill-posed inverse problems. The injective part provides a projection operator onto the range of $g_\gamma$, $P_{g_\gamma}(x) := g_\gamma(g_\gamma^{\dagger}(x))$, which maps a data sample $x$ to the intermediate space by $z' = g_\gamma^{\dagger}(x)$ and maps it back to the data space by $g_\gamma(z')$. Kothari et al. [35] employed this projection operator to project samples onto the manifold in iterative reconstruction schemes. In the next section, we introduce our methodology for solving inverse scattering problems using injective normalizing flows.
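The projection $P_{g_\gamma}$ and the first-phase training objective can be illustrated with a linear stand-in for the injective subnetwork, $g(z') = W z'$ with $g^{\dagger}(x) = W^{\dagger} x$. This linear choice is an assumption made for brevity (the actual $g_\gamma$ is a deep network), but the key property carries over: because $g^{\dagger}$ is a left inverse of $g$, projecting twice gives the same result as projecting once, and samples already on the range of $g$ are left unchanged. The function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 64, 8                                   # data and intermediate dimensions, d << D
W = rng.standard_normal((D, d))                # toy linear "injective network" g(z') = W z'
W_pinv = np.linalg.pinv(W)                     # layer-wise inverse g^dagger (a single layer here)

def g(z):
    return z @ W.T                             # batch of latent rows -> batch of data rows

def g_dagger(x):
    return x @ W_pinv.T

def project(x):
    """P_g(x) = g(g^dagger(x)): map to the intermediate space and back."""
    return g(g_dagger(x))

def phase1_loss(x_batch):
    """Phase-1 objective: mean squared distance between samples and their projections."""
    return np.mean(np.sum((x_batch - project(x_batch))**2, axis=-1))

x_on = g(rng.standard_normal((10, d)))         # samples lying on the range of g
x_off = rng.standard_normal((10, D))           # generic samples off the manifold
```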

IV. MAP INFERENCE WITH INJECTIVE FLOWS FOR INVERSE SCATTERING
Inverse scattering with partial data is a severely ill-posed inverse problem, which means that a small perturbation in the measurements of scattered fields can result in a significant error in the recovered contrast [7].As discussed in Section II, inverse scattering is a nonlinear inverse problem, with the degree of nonlinearity being strongly influenced by the maximum contrast value.Particularly for objects with large contrasts, the problem becomes highly nonlinear, further increasing the difficulty of the inversion.In such cases, the presence of a robust regularizer that effectively constrains the search space becomes crucially important.
We model the contrast $\chi = x \in \mathcal{X}$ and the scattered fields $E_s = y \in \mathcal{Y}$ as random vectors. For simplicity, we assume that the additive noise $\delta$ in (5) is Gaussian, $\delta \sim \mathcal{N}(0, \sigma^2 I)$, although our framework admits other distributions. Under this assumption, the likelihood $p_{Y|X}$ can be expressed as

$$p_{Y|X}(y \mid x) \propto \exp\Big(-\frac{\|y - A(x)\|_2^2}{2\sigma^2}\Big).$$

An effective approach for solving ill-posed inverse problems is to compute the maximum a posteriori (MAP) estimate, that is, the solution $x$ with the highest posterior probability given a measurement $y$,

$$x_{MAP} = \arg\max_{x} \; p_{X|Y}(x \mid y),$$

where $p_{X|Y}(x \mid y)$ denotes the posterior distribution, i.e., the conditional distribution of the image of interest given the measurements $y$. The posterior distribution can be computed using Bayes' theorem,

$$p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\, p_X(x)}{p_Y(y)}, \qquad p_Y(y) = \int_{\mathcal{X}} p_{X,Y}(x, y)\, dx, \tag{16}$$

which leads to the following expression for the MAP estimate:

$$x_{MAP} = \arg\max_{x} \; p_{Y|X}(y \mid x)\, p_X(x).$$

Substituting the Gaussian likelihood and taking the negative logarithm, we get

$$x_{MAP} = \arg\min_{x} \; \|y - A(x)\|_2^2 - \lambda \log p_X(x),$$

where the first term is the data-consistency loss and the prior distribution $p_X(x)$ of the contrast yields a regularization term. We introduce $\lambda$ as a hyperparameter that adjusts the weight of the regularization term, since its value (which would be $2\sigma^2$ under the likelihood above) depends on the unknown noise power. In general, estimating the prior distribution $p_X$ is challenging, and a commonly used approximation is a zero-mean Gaussian distribution, which leads to Tikhonov regularization. However, a Gaussian distribution often deviates significantly from the true prior, resulting in poor reconstructions.
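The Tikhonov special case mentioned above can be checked numerically. The sketch below assumes a *linear* toy forward operator $M$ (the actual $A$ is nonlinear) and a Gaussian prior $\mathcal{N}(0, \tau^2 I)$, for which $-\lambda \log p_X(x) = \frac{\lambda}{2\tau^2}\|x\|_2^2 + \text{const}$. The MAP objective $\|y - Mx\|_2^2 - \lambda \log p_X(x)$ then has the closed-form minimizer $(M^\top M + \alpha I)^{-1} M^\top y$ with $\alpha = \lambda / (2\tau^2)$. All sizes and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 12, 20                                   # fewer measurements than unknowns: ill-posed
M = rng.standard_normal((m, n))                 # toy *linear* forward operator (assumption)
x_true = rng.standard_normal(n)
y = M @ x_true + 0.01 * rng.standard_normal(m)

lam, tau = 0.1, 1.0                             # regularization weight and prior std (assumed values)
alpha = lam / (2 * tau**2)                      # -lam * log N(x; 0, tau^2 I) = alpha * ||x||^2 + const

def map_objective(x):
    return np.sum((y - M @ x)**2) + alpha * np.sum(x**2)

def map_gradient(x):
    return -2 * M.T @ (y - M @ x) + 2 * alpha * x

# Tikhonov closed form: (M^T M + alpha I)^{-1} M^T y
x_tik = np.linalg.solve(M.T @ M + alpha * np.eye(n), M.T @ y)
```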
This paper explores data-driven regularization for inverse scattering based on deep generative models. We leverage a training set of contrast patterns $\{x^{(i)}\}_{i=1}^{N}$ and train a deep generative model $x = f(z)$ to produce samples from (approximately) the same distribution as the training set. By sampling from a Gaussian distribution in the latent space $z \in \mathcal{Z}$, we expect the trained generator $f$ to produce plausible contrast samples. This property makes deep generative models effective regularizers for solving inverse problems [33], [49].
In this paper, we employ injective flows as a generative prior because they are well suited to ill-posed inverse problems [35]. We perform optimization in the latent space to find the latent code that produces a permittivity pattern compatible with the measurements $y$:

$$z_{MAP} = \arg\min_{z} \; \|y - A(f(z))\|_2^2 - \lambda \log p_X(f(z)), \tag{19}$$

where the regularization term $\log p_X$ is approximated via (7). The reconstructed contrast is then obtained as $x_{MAP} = f(z_{MAP})$.
We call this method latent space optimization (LSO).We note that (19) has been previously proposed by [44], [45] for solving compressed sensing inverse problems using regular normalizing flows.
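A minimal sketch of LSO on a toy problem follows. It makes two simplifying assumptions that are not part of the method described here: the generator is linear, $f(z) = Wz$, and the forward operator is a linear complex matrix $M$. It also swaps Adam for plain gradient descent with step size $1/L$. What it does show is the structure of (19) with $\lambda = 0$: the optimization starts from the MOG initialization $z = 0$, fits complex-valued measurements, and updates only the real latent code.

```python
import numpy as np

rng = np.random.default_rng(3)
D, d, m = 64, 4, 24                     # data dim, latent dim, number of complex measurements
W = rng.standard_normal((D, d)) / np.sqrt(D)   # toy linear generator f(z) = W z (assumption)
M = (rng.standard_normal((m, D)) + 1j * rng.standard_normal((m, D))) / np.sqrt(D)  # toy linear forward op

def f(z):
    return W @ z

def A(x):
    return M @ x

z_true = rng.standard_normal(d)
y = A(f(z_true))                        # noise-free complex measurements

B = M @ W                               # composite map z -> y
H = np.real(B.conj().T @ B)             # half the Hessian of ||y - B z||^2 for real z
step = 1.0 / (2 * np.linalg.eigvalsh(H).max())   # 1 / Lipschitz constant of the gradient

def lso(y, n_iter=2000):
    z = np.zeros(d)                     # MOG initialization: the mean of the latent Gaussian
    for _ in range(n_iter):
        r = y - A(f(z))                 # complex residual
        grad = -2 * np.real(B.conj().T @ r)   # gradient w.r.t. the real latent code (lambda = 0)
        z = z - step * grad
    return z

z_map = lso(y)
x_map = f(z_map)
```

For real $z$ and complex $B = MW$, the gradient of $\|y - Bz\|_2^2$ is $-2\,\mathrm{Re}\big(B^H(y - Bz)\big)$, which is what the loop uses.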
Unlike supervised learning methods for inverse scattering [13], [15]-[17], which rely on paired training sets of contrasts and scattered fields $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, our framework is unsupervised and does not need scattered fields during training. This eliminates the need to retrain the model when the distribution of the scattered fields changes due to variations in the experimental configuration. Once the injective generator is trained on the contrast samples, we can directly optimize (19) for new measurements to reconstruct the corresponding contrast. In addition, our method fully leverages the underlying physics of the scattering problem by fitting the complex-valued scattered fields in (19). Kothari et al. [56] have demonstrated that incorporating wave physics into the neural network architecture can significantly enhance reconstruction quality, particularly for out-of-distribution data.
Invertibility of the injective generator allows us to use an alternative to (19), proposed by [35] for linear inverse problems, that performs the optimization directly in the data space. We call this method data space optimization (DSO) and formulate it as follows,

$$x_{MAP} = \arg\min_{x} \; \|y - A(g(g^{\dagger}(x)))\|_2^2 - \lambda \log p_X(x), \tag{20}$$

where $g(g^{\dagger}(x))$ is the projection operator described in Section III-A. As in LSO, the second term $\log p_X$ can be approximated using (7) and acts as an additional regularizer.
In LSO the reconstructed point x = f (z) always lies on the learned manifold; this is not the case for the DSO method, where the reconstructed image may deviate from the manifold.On the other hand, as we discuss next, DSO offers more flexibility in the choice of the initial guess.
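The difference between LSO and DSO can be seen in the same kind of toy setting, again assuming a linear injective map $g(z') = Wz'$ and a linear complex forward operator $M$, with plain gradient descent instead of Adam. Here the variable is the image $x$ itself, the data-consistency term is evaluated at the projection $g(g^{\dagger}(x))$, and the initial guess is a crude back-projection-style image $\mathrm{Re}(M^H y)$, which is our assumption rather than the BP solver used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(4)
D, d, m = 64, 4, 24
W = rng.standard_normal((D, d)) / np.sqrt(D)          # toy linear injective map g(z') = W z' (assumption)
P = W @ np.linalg.pinv(W)                            # projection g(g^dagger(x)) onto range(W)
M = (rng.standard_normal((m, D)) + 1j * rng.standard_normal((m, D))) / np.sqrt(D)

x_true = W @ rng.standard_normal(d)                  # ground truth on the learned "manifold"
y = M @ x_true                                       # noise-free complex measurements

# Step size from an upper bound on the gradient's Lipschitz constant (||P|| = 1).
step = 1.0 / (2 * np.linalg.eigvalsh(np.real(M.conj().T @ M)).max())

def dso(y, x0, n_iter=2000):
    x = x0.copy()
    for _ in range(n_iter):
        r = y - M @ (P @ x)                          # residual of the projected estimate
        grad = -2 * P.T @ np.real(M.conj().T @ r)   # chain rule through the projection
        x = x - step * grad
    return x

x0 = np.real(M.conj().T @ y)                         # crude back-projection-style initial guess (assumed)
x_dso = dso(y, x0)
```

The projected estimate converges to the ground truth, while $x$ itself keeps the component of the initial guess that lies outside the range of $g$, which is the sense in which the DSO iterate may deviate from the manifold.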
The choice of initial guess is important for inverse scattering solvers. Due to the nonlinearity, a poor initialization may lead to convergence to poor local minima, whereas a good initial guess facilitates efficient convergence to good minima. The authors of [9] used the Born approximation as the initialization for the distorted Born iterative method (DBIM). A back-propagation (BP) solution was also used in [10], [57] as the initial guess for the contrast source inversion (CSI) method. Figure 3 shows the ground truth, back-propagation (BP), and Born approximation (BA) for an object with different maximum $\epsilon_r$ values. While BP and BA may yield satisfactory results for objects with small permittivity, their quality drops sharply for large $\epsilon_r$ (especially quantitatively), which makes them poor initializations for strong scatterers.
To circumvent this issue, we adopt a data-driven initialization suggested in [44]: the mean of the Gaussian distribution (MOG) in the latent space, $z = 0$. The MOG initialization, depicted in Figure 2, does not depend on the measurements (scattered fields) and is therefore independent of the maximum contrast value and the problem configuration. This property leads to more robust convergence of both (19) and (20), even for objects with large permittivity. While DSO can be initialized with either BP or MOG, LSO should be initialized only with MOG, because the BP image may lie far from the range of the injective network, making inversion to the latent space infeasible. In Section VI, we show that the MOG initialization significantly improves the quality of the reconstructions compared to BP, especially for strong scatterers.

V. POSTERIOR MODELING AND UNCERTAINTY QUANTIFICATION
Due to ill-posedness, there are an infinite number of contrasts that are consistent with the measurements within the noise level.These diverse solutions can lead to different scientific interpretations, highlighting the need to characterize their distribution.Relying on a single estimate, such as the MAP estimate obtained in the previous section, fails to reflect the inevitable uncertainty and pinpoint features recovered only with low confidence.To address this drawback of point estimates, we adopt a Bayesian perspective.Rather than solely computing the MAP estimate, we approximate the full posterior distribution p X|Y introduced in (16).By doing so, we are able to generate many posterior samples which explore plausible permittivity patterns.
The computation of the posterior distribution in (16) involves the integral $\int_{\mathcal{X}} p_{X,Y}(x, y)\, dx$, which is intractable for high-dimensional imaging problems. Variational inference [58], [59] is a promising framework that approximates the posterior distribution $p_{X|Y}(x \mid y)$ with a class of distributions $q_X(x; \psi)$ parameterized by $\psi$. The goal is to find the optimal $\psi$ that makes $q_X(x; \psi)$ "close" to $p_{X|Y}(x \mid y)$ for a given $y$. Examples of such approximators include Gaussian mixture models and distributions induced by deep generative models.
In variational inference, a commonly used measure of fit is the Kullback-Leibler (KL) divergence,

$$\mathrm{KL}(q \,\|\, p) = \int_{\mathcal{X}} q(x) \log \frac{q(x)}{p(x)}\, dx.$$

We optimize $\psi$ to minimize the KL divergence between $q_X(x; \psi)$ and $p_{X|Y}(x \mid y)$ for a given $y$,

$$\psi^* = \arg\min_{\psi} \; \mathrm{KL}\big(q_X(x; \psi) \,\|\, p_{X|Y}(x \mid y)\big).$$
Sun et al. [60] parameterized $q_X(x; \psi)$ with an untrained normalizing flow through (7) and performed the optimization directly over the network's weights.
We propose to leverage our pre-trained injective flow $f_\theta$ as a prior to approximate the posterior distribution. Our approach relies on the following principle: when an injective mapping is applied to distributions $Q$ and $P$, resulting in new distributions $\tilde{Q}$ and $\tilde{P}$, the KL divergence between $\tilde{Q}$ and $\tilde{P}$ equals the KL divergence between $Q$ and $P$ (for the formal theorem and proof, refer to Section B in the appendix). This property motivates us to approximate the posterior distribution in the latent space instead of the data space. Consequently, we minimize the KL divergence between $q_Z(z; \psi)$ and $p_{Z|Y}(z \mid y)$, which, up to an additive constant and a positive scaling, amounts to

$$\psi^* = \arg\min_{\psi} \; \mathbb{E}_{z \sim q_Z(z; \psi)}\big[\|y - A(f(z))\|_2^2\big] + \beta\, \mathrm{KL}\big(q_Z(z; \psi) \,\|\, p_Z(z)\big), \tag{22}$$

where $p_Z = \mathcal{N}(0, I)$ is the prior distribution introduced in (7). We treat $\beta$ as a hyperparameter that controls the diversity of the posterior samples, since its value depends on the unknown noise power.
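The invariance principle can be checked in a simple special case: an invertible affine map $x \mapsto Tx + c$ (a bijection is in particular injective) applied to two Gaussians. The sketch below uses the standard closed-form KL divergence between multivariate Gaussians; it illustrates, but does not prove, the general statement in Section B of the appendix. The helper `gaussian_kl` and all parameter values are our own illustrative choices.

```python
import numpy as np

def gaussian_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for full-rank covariance matrices."""
    k = mu0.size
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k + logdet1 - logdet0)

rng = np.random.default_rng(5)
k = 3
mu_q, mu_p = rng.standard_normal(k), rng.standard_normal(k)
Lq, Lp = rng.standard_normal((k, k)), rng.standard_normal((k, k))
S_q = Lq @ Lq.T + 0.5 * np.eye(k)                 # random positive-definite covariances
S_p = Lp @ Lp.T + 0.5 * np.eye(k)

T = rng.standard_normal((k, k)) + 3 * np.eye(k)   # invertible linear map (with probability one)
c = rng.standard_normal(k)

# Push both distributions through x -> T x + c and compare the divergences.
kl_before = gaussian_kl(mu_q, S_q, mu_p, S_p)
kl_after = gaussian_kl(T @ mu_q + c, T @ S_q @ T.T, T @ mu_p + c, T @ S_p @ T.T)
```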

[Fig. 5 panels: ground truth; projections on the manifold; generated samples]
Fig. 5: Performance evaluation of the trained injective flow on the ellipses dataset: ground-truth contrasts, their projections on the learned manifold, and generated samples.

Next, we select the posterior approximator $q_Z(z; \psi)$. While previous works [35], [61] used an additional normalizing flow to model $q_Z(z; \psi)$, we use a Gaussian distribution for simplicity and computational efficiency. Specifically, we define $q_Z(z; \psi) = \mathcal{N}(z; \mu_q, \mathrm{diag}(\sigma_q^2))$, where $\psi = (\mu_q, \sigma_q)$ are the variational parameters and the square is taken element-wise. This Gaussian parameterization simplifies the KL term in (22), since the KL divergence between two Gaussian distributions has a closed form,

$$\mathrm{KL}\big(q_Z(z; \psi) \,\|\, p_Z(z)\big) = \frac{1}{2} \sum_{i=1}^{d} \Big(\sigma_q(i)^2 + \mu_q(i)^2 - 1 - \log \sigma_q(i)^2\Big), \tag{23}$$

where $\mu_q(i)$ and $\sigma_q(i)$ denote the $i$th elements of $\mu_q \in \mathbb{R}^d$ and $\sigma_q \in \mathbb{R}^d$, respectively. Furthermore, since we have already obtained the MAP estimate in the latent space through (19), we set $\mu_q = z_{MAP}$ and optimize only $\sigma_q$.
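The closed form (23) is easy to sanity-check against a Monte Carlo estimate of $\mathbb{E}_q[\log q(z) - \log p_Z(z)]$. The values of $\mu_q$ and $\sigma_q$ below are arbitrary, and the function names are illustrative.

```python
import numpy as np

def kl_diag_gaussian_to_standard(mu_q, sigma_q):
    """Closed form (23): KL( N(mu_q, diag(sigma_q^2)) || N(0, I) )."""
    return 0.5 * np.sum(sigma_q**2 + mu_q**2 - 1.0 - np.log(sigma_q**2))

def kl_monte_carlo(mu_q, sigma_q, n_samples, rng):
    """Monte Carlo estimate of E_q[log q(z) - log p(z)] for the same pair of Gaussians."""
    z = mu_q + sigma_q * rng.standard_normal((n_samples, mu_q.size))
    log_q = -0.5 * np.sum(((z - mu_q) / sigma_q)**2 + np.log(2 * np.pi * sigma_q**2), axis=1)
    log_p = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)
    return np.mean(log_q - log_p)

rng = np.random.default_rng(6)
mu_q = np.array([0.5, -1.0, 0.2, 0.0])
sigma_q = np.array([0.3, 1.0, 0.8, 1.5])
kl_exact = kl_diag_gaussian_to_standard(mu_q, sigma_q)
kl_mc = kl_monte_carlo(mu_q, sigma_q, 200_000, rng)
```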
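We cannot directly optimize (22) with gradient-based methods, because the expectation is taken over a distribution that depends on the optimization variables. We therefore use the reparameterization trick [28], [62], letting $z = z_{MAP} + \sigma_q \odot t$, where $t \sim \mathcal{N}(0, I)$ and $\odot$ denotes element-wise multiplication. Substituting (23) into (22), incorporating this reparameterization, and dropping terms that do not depend on $\sigma_q$, we obtain

$$\sigma_q^* = \arg\min_{\sigma_q} \; \mathbb{E}_{t \sim \mathcal{N}(0, I)}\big[\|y - A(f(z_{MAP} + \sigma_q \odot t))\|_2^2\big] + \frac{\beta}{2} \sum_{i=1}^{d} \Big(\sigma_q(i)^2 - \log \sigma_q(i)^2\Big). \tag{24}$$

To evaluate the expectation, we average over $K$ i.i.d. samples $t_k$ drawn from the standard normal distribution,

$$\sigma_q^* = \arg\min_{\sigma_q} \; \frac{1}{K} \sum_{k=1}^{K} \|y - A(f(z_{MAP} + \sigma_q \odot t_k))\|_2^2 + \frac{\beta}{2} \sum_{i=1}^{d} \Big(\sigma_q(i)^2 - \log \sigma_q(i)^2\Big). \tag{25}$$

Once we obtain the optimal $\sigma_q^*$, we can generate posterior samples $x_{post} = f(z_{MAP} + \sigma_q^* \odot t)$, where $t \sim \mathcal{N}(0, I)$.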
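The following sketch runs the reparameterized optimization of $\sigma_q$ in (25) on a toy problem where the map $z \mapsto A(f(z))$ is replaced by a *real* linear matrix $B$. This is an assumption made so that the optimum is known in closed form, $\sigma_q(i)^2 = \beta / (2\|B_{:,i}\|_2^2 + \beta)$, and can be compared with the stochastic result. It uses plain stochastic gradient descent with iterate averaging instead of Adam, $K = 25$ fresh samples per iteration, and the all-ones initialization; the gradient of the data term follows from the chain rule through $z = z_{MAP} + \sigma_q \odot t$.

```python
import numpy as np

rng = np.random.default_rng(7)
m, d = 8, 3
B = rng.standard_normal((m, d))        # toy *real* linear map standing in for z -> A(f(z)) (assumption)
z_true = rng.standard_normal(d)
y = B @ z_true + 0.05 * rng.standard_normal(m)
z_map, *_ = np.linalg.lstsq(B, y, rcond=None)   # MAP estimate for this toy with lambda = 0

beta, K, lr, n_iter = 1.0, 25, 0.005, 4000

def mc_objective_and_grad(sigma_q, t):
    """Monte Carlo objective (25) and its gradient w.r.t. sigma_q, using z = z_map + sigma_q * t."""
    z = z_map + sigma_q * t                     # (K, d) reparameterized latent samples
    r = y - z @ B.T                             # (K, m) residuals y - B z_k
    data = np.mean(np.sum(r**2, axis=1))
    kl = 0.5 * beta * np.sum(sigma_q**2 - np.log(sigma_q**2))   # sigma_q-dependent part of (23)
    grad_data = np.mean(-2.0 * (r @ B) * t, axis=0)              # chain rule through the reparameterization
    grad_kl = beta * (sigma_q - 1.0 / sigma_q)
    return data + kl, grad_data + grad_kl

sigma_q = np.ones(d)                             # all-ones initialization, as in Section VI
late_iterates = []
for it in range(n_iter):
    t = rng.standard_normal((K, d))              # fresh t_k ~ N(0, I) at every iteration
    _, grad = mc_objective_and_grad(sigma_q, t)
    sigma_q = sigma_q - lr * grad                # plain SGD step (the paper uses Adam)
    if it >= n_iter // 2:
        late_iterates.append(sigma_q)
sigma_opt = np.mean(late_iterates, axis=0)       # averaging late iterates damps Monte Carlo noise

# Closed-form optimum for this linear toy: sigma_i^2 = beta / (2 ||B[:, i]||^2 + beta).
sigma_closed = np.sqrt(beta / (2.0 * np.sum(B**2, axis=0) + beta))
```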
Additionally, we can evaluate the empirical minimum mean-squared error (MMSE) estimate and the associated uncertainty by calculating the pixel-wise average and standard deviation over multiple posterior samples.
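Given $z_{MAP}$ and an optimized $\sigma_q$, the posterior samples, the empirical MMSE estimate, and the uncertainty map are straightforward to compute. In the sketch below the generator is a hypothetical linear map producing a $4 \times 4$ image, and $z_{MAP}$ and $\sigma_q$ are placeholder values rather than outputs of (19) and (25).

```python
import numpy as np

def posterior_summary(f, z_map, sigma_q, n_samples, rng):
    """Draw x_post = f(z_map + sigma_q * t) with t ~ N(0, I) and summarize the samples.

    Returns the samples, their pixel-wise mean (empirical MMSE estimate) and their
    pixel-wise standard deviation (uncertainty map)."""
    t = rng.standard_normal((n_samples, z_map.size))
    samples = np.stack([f(z) for z in z_map + sigma_q * t])
    return samples, samples.mean(axis=0), samples.std(axis=0)

rng = np.random.default_rng(8)
W = rng.standard_normal((16, 3))         # hypothetical linear generator for a 4x4 "image"

def f(z):
    return (W @ z).reshape(4, 4)

z_map = np.array([0.5, -1.0, 2.0])       # placeholder MAP latent code
sigma_q = np.array([0.1, 0.3, 0.05])     # placeholder optimized posterior standard deviations
samples, x_mmse, x_uq = posterior_summary(f, z_map, sigma_q, 25, rng)   # 25 samples, as in Section VI
```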

VI. COMPUTATIONAL EXPERIMENTS
We assess the performance of the proposed methods for MAP estimation and posterior modeling on synthetic and experimental data. We train the model on two large-scale synthetic datasets: 1) MNIST [63] with 60000 training samples at resolution N = 32, and 2) a more challenging dataset of overlapping ellipses, used in [14], which we generated with 60000 training samples at resolution N = 64. Figure 5 shows example test contrasts, their projections on the learned manifold, and samples generated by the injective network, verifying that the model produces outputs of good quality. For additional details about the network architecture and training, please refer to Section A in the appendix.

A. Synthetic Data
In experiments with synthetic data, the task is to reconstruct test samples from the MNIST and ellipses datasets that the injective network has not "seen" during training. We use $N_i = 12$ incident plane waves and $N_r = 12$ receivers, uniformly distributed on a circle of radius R = 20 cm around the object, which has maximum permittivity $\epsilon_r$ and dimension D = 20 cm. The working frequency is 3 GHz, and we add noise at a signal-to-noise ratio of 30 dB to the measurements of the scattered fields.

a) MAP estimation: We conduct a comprehensive evaluation of the DSO and LSO methods. We consider the MOG and BP initializations for DSO and use only the MOG initialization for LSO. We compare the performance of our proposed methods with a traditional iterative method, DBIM [9]. Although our approach is unsupervised, so that scattered fields are not used during training, we also compare it with a supervised learning method, the U-Net [50], which has enjoyed tremendous empirical success in a variety of imaging inverse problems, including inverse scattering [16]. The U-Net takes the BP image as input and regresses the corresponding permittivity.
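The noise level above is stated only as 30 dB. A common way to simulate it, sketched below under the assumption that the SNR is the ratio of mean signal power to mean noise power over all receiver-transmitter pairs, is to add circular complex Gaussian noise with the matching variance. The function `add_noise_snr` and the placeholder fields are illustrative.

```python
import numpy as np

def add_noise_snr(E_s, snr_db, rng):
    """Add circular complex Gaussian noise with expected SNR `snr_db` (in dB).

    Assumed SNR definition: 10*log10(mean |E_s|^2 / mean |noise|^2)."""
    signal_power = np.mean(np.abs(E_s)**2)
    noise_power = signal_power / 10**(snr_db / 10)
    # Split the noise power equally between the real and imaginary parts.
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(E_s.shape)
                                        + 1j * rng.standard_normal(E_s.shape))
    return E_s + noise

rng = np.random.default_rng(9)
E_s = rng.standard_normal((12, 12)) + 1j * rng.standard_normal((12, 12))  # placeholder: 12 receivers x 12 transmitters
y = add_noise_snr(E_s, 30.0, rng)
```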
We have fully implemented the forward operator in TensorFlow [64], enabling efficient GPU utilization for parallel reconstruction of multiple samples. This also allows us to use a variety of optimizers provided in TensorFlow, including Adam [65] and L-BFGS [66]. In these experiments, we optimize (19) and (20) using the Adam optimizer with a learning rate of 0.05 for 300 iterations, as it leads to more accurate reconstructions than L-BFGS. We set λ = 0.01 for BP and λ = 0 for MOG: with the MOG initialization, the optimization starts from the high-likelihood region (the mean of the Gaussian), which acts as an implicit regularizer. Figure 4 illustrates the MOG initializations for the ellipses and MNIST datasets.
Figure 6 shows the performance of various methods for $\epsilon_r = 4$ on 5 test samples from the MNIST and ellipses datasets. While DBIM falls short in this challenging setting with high contrast and 30 dB noise, DSO and LSO produce much better reconstructions. Moreover, as expected, the MOG initialization yields superior reconstructions compared to BP. Notably, LSO outperforms DSO, demonstrating the advantage of running the optimization in the latent space, as discussed in Section IV.
Despite not utilizing scattered fields during the training phase, LSO produces reconstructions of comparable or even superior quality to the U-Net.

As discussed in Section IV, the maximum $\epsilon_r$ of the object plays a significant role in the performance of inverse scattering solvers. Figure 7 shows the performance of various methods across different maximum $\epsilon_r$ values on MNIST. This analysis shows that LSO, combined with the MOG initialization, remains effective even for objects with high $\epsilon_r$, which highlights the significance of data-driven initialization and optimization in the latent space.
Regarding computational efficiency, we used a single Tesla V100 GPU both for training and for solving the inverse scattering problem. Each iteration of LSO (or DSO) takes 0.08 seconds at resolution N = 32 and 0.25 seconds at resolution N = 64. Good estimates can be obtained with far fewer iterations, but we empirically found that 300 iterations ensure good convergence.

b) Posterior Sampling and UQ: As explained in Section V, we approximate the posterior distribution of the contrast as the pushforward of a Gaussian centered at the MAP estimate in the latent space. The covariance is chosen to give the best variational approximation of the posterior in the sense of the KL divergence. We use the MAP estimate obtained with the LSO method in the previous section and optimize (25) using the Adam optimizer with a learning rate of 0.01. The initial value of σ_q is an all-ones vector, and in each iteration we draw K = 25 random samples from the standard Gaussian. To compute the MMSE estimate and the UQ map, we take the pixelwise average and standard deviation over 25 posterior samples. Figure 8 shows 4 posterior samples along with the UQ and MMSE estimates for β = 0.01 and β = 0.05. As expected, larger values of β lead to more diverse posterior samples. The UQ map highlights regions of higher uncertainty, shown in red. This indicates which parts of a reconstruction are less reliable and should be interpreted with care. Finally, as expected, the MAP estimate is sharper than the MMSE estimate.

c) Generalization: In this section, we evaluate how well the proposed method generalizes to out-of-distribution changes in the permittivity patterns. We train injective flows exclusively on MNIST digits 0-5 and use the remaining digits for testing. The LSO solver uses the same configuration as in the previous section. Figure 9 shows the posterior samples, UQ, MMSE, and MAP estimates for two test samples, digits 6 and 8, with β = 0.05. This experiment shows that the proposed method handles out-of-distribution data effectively. We note that the dimension of the latent space governs a trade-off between regularization power and generalization. Larger latent dimensions yield better generalization but weaker regularization. The same behavior has been observed in regular normalizing flows, where matching latent and data dimensions results in excellent generalization to out-of-distribution data but less effective regularization [44], [46].
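The MMSE and UQ computation described in b) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes the MAP latent code `z_map` and the optimized standard deviations `sigma_q` are already available, and it does not reproduce the variational fit of (25). `toy_generator` and its random weight matrix are a hypothetical stand-in for the trained injective flow. Taking the UQ map as the population (ddof = 0) pixelwise standard deviation is also an assumption.

```python
import numpy as np


def toy_generator(z, weights):
    """Hypothetical stand-in for the trained injective generator (latent -> image)."""
    return np.tanh(weights @ z)


def posterior_stats(generator, z_map, sigma_q, num_samples=25, seed=0):
    """Sample z ~ N(z_map, diag(sigma_q**2)), push the samples through the generator,
    and return the samples, the pixelwise mean (MMSE) and the pixelwise std (UQ map)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples, z_map.size))
    latents = z_map + sigma_q * eps                      # Gaussian samples around the MAP code
    samples = np.stack([generator(z) for z in latents])  # posterior samples in data space
    mmse = samples.mean(axis=0)                          # pixelwise average
    uq = samples.std(axis=0)                             # pixelwise standard deviation
    return samples, mmse, uq


# Usage with illustrative sizes: latent dimension 64, resolution N = 32.
rng = np.random.default_rng(1)
latent_dim, N = 64, 32
weights = rng.standard_normal((N * N, latent_dim)) / np.sqrt(latent_dim)
z_map = rng.standard_normal(latent_dim)
sigma_q = np.full(latent_dim, 0.1)

samples, mmse, uq = posterior_stats(lambda z: toy_generator(z, weights), z_map, sigma_q)
mmse_img, uq_img = mmse.reshape(N, N), uq.reshape(N, N)
```

Drawing the samples as `z_map + sigma_q * eps` is the usual reparameterization. It is also how a KL objective over `sigma_q` would typically be differentiated with an optimizer such as Adam.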

B. Experimental Data
Finally, we evaluate the proposed model on FoamDielExt and FoamTwinDiel. These are real experimental data for two phantoms provided by the Institut Fresnel in Marseille, France [67]. In these experiments, N_i = 8 transmitters and 241 receivers are located on a circle of radius R = 1.67 m. To make the inversion more challenging, we use only N_r = 20 of these receivers. Additional details about the setup are given in [67]. As shown in Figure 10, FoamDielExt and FoamTwinDiel consist of dielectric cylinders in a vacuum background. We use the measurements at the working frequency of 3 GHz, and the side length of the investigation domain is D = 20 cm.
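As a quick check on the scale of the problem, the working frequency and domain size quoted above fix the electrical size of the investigation domain. The sketch below uses only the numbers stated in the text (3 GHz, D = 20 cm, N = 32 and N = 64) and the speed of light in vacuum.

```python
# Electrical size of the Fresnel investigation domain (numbers taken from the text).
SPEED_OF_LIGHT = 299_792_458.0  # m/s, vacuum

frequency_hz = 3e9      # working frequency
domain_side_m = 0.20    # side length D of the investigation domain

wavelength_m = SPEED_OF_LIGHT / frequency_hz          # about 0.1 m
domain_in_wavelengths = domain_side_m / wavelength_m  # about 2 wavelengths

# Pixel size and pixels per (vacuum) wavelength for the two grids.
pixel_size_m = {N: domain_side_m / N for N in (32, 64)}
pixels_per_wavelength = {N: wavelength_m / size for N, size in pixel_size_m.items()}

print(f"wavelength = {wavelength_m:.4f} m, D = {domain_in_wavelengths:.2f} wavelengths")
for N in (32, 64):
    print(f"N = {N}: pixel = {pixel_size_m[N] * 1e3:.3f} mm, "
          f"{pixels_per_wavelength[N]:.1f} pixels per wavelength")
```

Inside a dielectric with ε_r > 1, the wavelength shrinks by a factor of √ε_r. The effective sampling inside the scatterers is therefore coarser than these vacuum figures suggest.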
We use two injective flows pre-trained on the ellipses dataset, one for each resolution (N = 32 and N = 64). The inverse scattering problem is solved using (19) for MAP estimation and (25) for posterior modeling. To further improve reconstruction quality, we add a total-variation (TV) regularization term to (19) and (25). The TV weights are 0.1 for N = 32 and 0.08 for N = 64. Figure 11 shows posterior samples, UQ, MMSE, and MAP estimates. The forward operator is idealized, and the ground truth (two or three circles) differs substantially from the training data (combinations of four ellipses with random positions and contrasts). Despite this, the proposed framework produces satisfactory reconstructions. This experiment illustrates the robustness of the proposed method to noise and to variations in the experimental configuration. It also shows why posterior modeling matters. In Figure 11a, the MAP and MMSE estimates reconstruct the larger circle incorrectly, but the uncertainty maps clearly signal that this part of the recovered contrast is unreliable.
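The text does not state which TV variant is used. The sketch below assumes the common isotropic form with forward differences and a real-valued contrast image, and shows how the quoted weights (0.1 for N = 32, 0.08 for N = 64) enter the objective. `data_misfit` and `regularized_objective` are hypothetical placeholders. The data-fidelity part of (19) is not reproduced here.

```python
import numpy as np


def tv_norm(image):
    """Isotropic total variation of a real 2D image (forward differences,
    zero difference across the last row/column)."""
    dx = np.diff(image, axis=1, append=image[:, -1:])
    dy = np.diff(image, axis=0, append=image[-1:, :])
    return float(np.sum(np.sqrt(dx**2 + dy**2)))


TV_WEIGHT = {32: 0.1, 64: 0.08}  # multipliers quoted in the text


def regularized_objective(data_misfit, image):
    """Hypothetical helper: data term of (19) plus the weighted TV penalty."""
    n = image.shape[0]
    return data_misfit + TV_WEIGHT[n] * tv_norm(image)


# Usage: a vertical step edge of height 1 across a 32 x 32 image.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
print(tv_norm(step))
print(regularized_objective(0.5, step))
```

For gradient-based optimization, a small constant is usually added inside the square root, because the isotropic TV term is not differentiable where the local gradient vanishes.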
[Figure panel label: ε_r = 1.45 ± 0.15]

VII. LIMITATIONS AND CONCLUSIONS
We proposed a data-driven framework for inverse scattering using an injective prior. The proposed method fully exploits the physics of wave scattering while benefiting from a data-driven initialization, which yields a powerful solver even for objects with large contrast. The invertible generator allows optimization in both the latent space and the data space, initialized with either a data-driven guess or a back-projection. We showed that optimization in the latent space, with the center of the latent Gaussian as the initial guess, significantly outperforms traditional iterative methods. Its reconstructions are even comparable to those of a strong supervised method, the U-Net.

Limitations and Future Work:
The proposed framework has several key limitations. It requires running an iterative method at test time, which is slow and impractical for real-time applications. Moreover, iterative methods can converge to local minima even with a well-chosen initialization. To speed up convergence, one could obtain a more accurate initial guess by combining traditional physics-based back-projection (e.g., BP) with data-driven initialization (e.g., MOG). The L-BFGS optimizer did not improve the convergence rate in our experiments, but other Newton-type optimizers may do so, as shown in [49]. Additionally, constraining the reconstruction to lie within the range of an injective flow can introduce undesired bias and artifacts in certain applications. Recently, Hussein et al. [41] fine-tuned the generator weights with a small learning rate after finding the optimal latent code in (19) to further improve the reconstructions, and this idea might be adapted to our framework. We leave these limitations to future work.

A. Network Architecture and Training Details
The injective subnetwork g_γ is composed of 6 injective revnet blocks, described in Section III-A, each of which increases the dimension by a factor of 2. To enhance the expressiveness of the model, we insert 36 bijective revnet blocks between them. We choose a latent space of dimension 64, which provides a compression rate of 98.5% for resolution N = 64 and 93.7% for resolution N = 32. The bijective subnetwork h_η is constructed from 20 bijective revnet blocks.
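The dimensions quoted above can be checked with a few lines of arithmetic. We read "compression rate" as 1 − (latent dimension)/N². The text does not state this definition explicitly, but it gives 0.984375 and 0.9375 for N = 64 and N = 32, close to the quoted 98.5% and 93.7%. The sketch also shows that six doublings take the 64-dimensional latent space to 4096 = 64 × 64 dimensions.

```python
latent_dim = 64
num_injective_blocks = 6  # each injective revnet block doubles the dimension

# Dimension after the injective subnetwork g_gamma (matches the N = 64 grid).
output_dim = latent_dim * 2 ** num_injective_blocks
print(output_dim, output_dim == 64 * 64)


def compression_rate(latent_dim, resolution):
    """Fraction of the N x N pixel dimensions removed by the latent bottleneck (assumed definition)."""
    return 1.0 - latent_dim / resolution**2


for N in (32, 64):
    print(f"N = {N}: compression rate = {compression_rate(latent_dim, N):.4%}")
```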
We normalize the training data to the range [0, 1] before training the model. Before using the trained network as the generative prior, we multiply its output by the maximum contrast of the dataset. We first train the injective subnetwork g_γ for 150 epochs so that the training samples (contrast patterns) lie close to the range of the generator. We then train the bijective subnetwork h_η for 150 epochs to maximize the likelihood of the training samples in the intermediate space.
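A minimal sketch of the scaling convention described above follows. It assumes the contrasts are non-negative with a zero background, so that dividing by the dataset maximum maps them to [0, 1]. The function names are hypothetical.

```python
import numpy as np


def normalize_contrasts(contrasts):
    """Scale a dataset of non-negative contrast patterns to [0, 1] and return the scale."""
    max_contrast = float(contrasts.max())
    return contrasts / max_contrast, max_contrast


def to_physical_contrast(generator_output, max_contrast):
    """Map a generator output in [0, 1] back to contrast units before using it as the prior."""
    return generator_output * max_contrast


# Usage on a toy dataset of three 4 x 4 patterns with maximum contrast 3.
data = np.zeros((3, 4, 4))
data[0, 1:3, 1:3] = 3.0
data[1, 0, 0] = 1.5
train_data, max_contrast = normalize_contrasts(data)
recovered = to_physical_contrast(train_data, max_contrast)
```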

B. The Invariance of KL Divergence under Injective Mappings
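Lemma. Assume that the probability distributions $q_Z$ and $p_Z$ have the same support. Let $q_X = f_{\#} q_Z$ and $p_X = f_{\#} p_Z$, where $f_{\#} p$ denotes the pushforward of $p$ via the mapping $f$, i.e., for every sample $z$ drawn from $p$, $f(z)$ is a sample from $f_{\#} p$.¹ If $f$ is injective, then
$$\mathrm{KL}(q_X \,\|\, p_X) = \mathrm{KL}(q_Z \,\|\, p_Z).$$

Proof. By the change-of-variables formula for an injective map $f$, the pushforward densities on $\mathrm{Range}(f)$ are
$$q_X(x) = q_Z(z)\,\det\!\big(J_f(z)^\top J_f(z)\big)^{-1/2}, \qquad p_X(x) = p_Z(z)\,\det\!\big(J_f(z)^\top J_f(z)\big)^{-1/2},$$
where $z = f^{\dagger}(x)$ and the expressions are valid for $x \in \mathrm{Range}(f)$. We can now compute the KL divergence in the data space:
$$\mathrm{KL}(q_X \,\|\, p_X) = \mathbb{E}_{x\sim q_X}\big[\log q_X(x) - \log p_X(x)\big] = \mathbb{E}_{z\sim q_Z}\big[\log q_Z(z) - \log p_Z(z)\big] = \mathrm{KL}(q_Z \,\|\, p_Z).$$
The second equality holds because the Jacobian factors cancel and $x = f(z)$ with $z \sim q_Z$. This establishes the lemma.

¹ For simplicity, we lightly abuse notation by identifying a probability measure with its density.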
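The lemma can be checked numerically on a toy example. The sketch below uses a hypothetical injective map $f(z) = (z, \sin z)$ from ℝ to ℝ². Its left inverse on $\mathrm{Range}(f)$ is $f^{\dagger}(x) = x_1$, and its volume factor is $\det(J_f^\top J_f)^{1/2} = (1 + \cos^2 z)^{1/2}$. A Monte Carlo estimate of $\mathrm{KL}(q_X \,\|\, p_X)$, computed with the pushforward densities from the proof, is compared with the closed-form Gaussian $\mathrm{KL}(q_Z \,\|\, p_Z)$. The Gaussian parameters are arbitrary choices.

```python
import numpy as np


def gaussian_logpdf(z, mean, std):
    """Log-density of a 1D Gaussian N(mean, std**2)."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - (z - mean) ** 2 / (2.0 * std**2)


def f(z):
    """Hypothetical injective map R -> R^2: the curve z -> (z, sin z)."""
    return np.stack([z, np.sin(z)], axis=-1)


def f_dagger(x):
    """Left inverse of f on Range(f): the first coordinate recovers z."""
    return x[..., 0]


def log_volume_factor(z):
    """log det(J^T J)^{1/2} for J = [1, cos z]^T, i.e. 0.5 * log(1 + cos^2 z)."""
    return 0.5 * np.log1p(np.cos(z) ** 2)


def pushforward_logpdf(x, mean, std):
    """Log-density of f_# N(mean, std**2) on Range(f), via change of variables."""
    z = f_dagger(x)
    return gaussian_logpdf(z, mean, std) - log_volume_factor(z)


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """Closed-form KL(N(mean_q, std_q^2) || N(mean_p, std_p^2))."""
    return (np.log(std_p / std_q)
            + (std_q**2 + (mean_q - mean_p) ** 2) / (2.0 * std_p**2) - 0.5)


rng = np.random.default_rng(0)
mean_q, std_q, mean_p, std_p = 0.0, 1.0, 0.5, 1.5
x = f(rng.normal(mean_q, std_q, size=200_000))  # samples from q_X = f_# q_Z

kl_data = np.mean(pushforward_logpdf(x, mean_q, std_q) - pushforward_logpdf(x, mean_p, std_p))
kl_latent = gaussian_kl(mean_q, std_q, mean_p, std_p)
print(f"Monte Carlo KL(q_X || p_X) = {kl_data:.4f}, closed-form KL(q_Z || p_Z) = {kl_latent:.4f}")
```

Both log-densities share the same volume factor, so it cancels in each per-sample log ratio exactly as in the proof. The only remaining discrepancy is Monte Carlo error.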

Fig. 1: The setup for the inverse scattering problem. Red arrows show the incident plane waves, and green circles show the receivers.

Fig. 3: Performance analysis of the Back-Propagation (BP) and Born Approximation (BA) methods across objects with different maximum ε_r values. Both BP and BA reconstructions are visually meaningful for small ε_r, but their performance deteriorates significantly for objects with larger ε_r.

Fig. 6: Performance comparison of different methods for objects with maximum ε_r = 4.

Fig. 7: Performance of various methods across objects with different maximum ε_r values on the MNIST dataset.
[Figure panel label (truncated): ε_r = 3.0 ± 0]

Fig. 11: Posterior samples, UQ, MMSE, and MAP estimates for the experimental Fresnel data. The uncertainty maps highlight the importance of posterior modeling by assigning higher uncertainty to incorrectly reconstructed areas (red regions).

TABLE I: Performance of different methods for solving inverse scattering (ε_r = 4), averaged over 5 test samples.
Table I lists the numerical results in PSNR and SSIM averaged over 5 test samples.