Towards Intelligent Electromagnetic Inverse Scattering Using Deep Learning Techniques and Information Metasurfaces

Electromagnetic inverse scattering (EMIS) is uniquely positioned among many inversion methods because it enables to image the scene in a contactless, quantitative and super-resolution way. Although many EMIS approaches have been proposed to date, they usually suffer from two important challenges, i.e., time-consuming data acquisition and computationally -prohibitive data post processing, especially for large-scale objects with high and even moderate contrasts. To tackle the challenges, we here propose a framework of intelligent EMIS with the aid of deep learning techniques and information metasurfaces, enabling to the efficient data acquisition and instant data processing in a smart way. Towards this goal, as illustrative examples, we considerably extend the canonical contrast source inversion (CSI) algorithm, a canonical EMIS method by updating the contrast via the generative adversarial network (GAN), an unsupervised deep learning approach, leading to a novel physics-informed unsupervised deep learning method for EMIS, referred to as CSI-GAN in short. Compared with existing deep learning solutions for EMIS, our method relies on the supervision of physical law instead of the labeled training dataset, beating the bottleneck arising from the collection of labeled training datasets. Furthermore, we propose a scheme of adaptive data acquisition with the use of information metasurface in a cost-efficiency way, remarkably reducing the number of measurements and thus speeding up the data acquisition but maintaining the reconstruction's quality. Illustrative examples are provided to demonstrate the performance gain in terms of reconstruction quality, showing the promising potentials for providing the intelligent scheme for the EMIS problems.


I. INTRODUCTION
Electromagnetic inverse scattering (EMIS) has been shown a powerful contactless super-resolution examination technique, which enables us to "see" clearly the internal structure of scene [1], and has found its valuable applications in nondestructive evaluation [2], [3], [4], geological exploration [5], [6], [7], cancer detection [8], [9], [10], and security check [11], [12], to name a few. By now, various inverse scattering methods have been developed, such as contrast source inversion (CSI) [13], [14], [15], [16], distorted Born/Rytov iterative methods [17], [18], and stochastic methods [19], [20], [21], etc. However, they usually suffer from two formidable challenges, i.e., time-consuming data acquisition and computationally prohibitive data post processing, especially for large-scale objects with high and even moderate contrasts. To deal with the challenges, here we propose a concept of intelligent electromagnetic inverse scattering by exploring the deep learning techniques for the adaptive data postprocessing on digital level and the information metaurface for the adaptive data acquisition on physical level.
Technically, the aforementioned EMIS methods are developed in the context of data postprocessing. Parallelly, the deep learning techniques are gained ever-increasingly attentions mainly in the digital world since the huge success of artificial intelligence (AI) board-game-Go program (AlphaGo) [22], such as in the areas of speech recognition [23], [24], [25], image recognition [26], [27], [28], automatic translation [29], [30], [31], image editing [32], [33], [34], [35], robot control [36], [37], [38], etc. Naturally, the deep learning techniques can be explored to solve the difficulties arising in the current EMIS algorithms. By now, various deep-learning-based EMIS methods have been proposed for pixel-based inverse scattering imaging [39], [40]. In [41], a U-Net structure [42] was used to learn the mapping from the electromagnetic input data to the pixelated contrast images, and three schemes to preprocess the input data were discussed: raw measured data, back propagation (BP), and dominant current schemes, where the latter two schemes were proved able to reconstruct satisfactory images. The effectiveness of the U-Net structure was also verified in the image reconstruction from multiple scattered light measurements [43], where it is proved that modified contrast is more suitable to be the input of the U-Net for full-wave EMIS problems [44]. In [45], three different input schemes to U-Net for solving EMIS with phaseless data (PD) were proposed and discussed, i.e., the raw phaseless measured field data, the dominant induced currents by the Levenberg-Marquardt method, and the contrast source inversion method, where except inputting raw PD, the other two input schemes both performed robust and accurate imaging results. In order to improve the imaging quality in EMIS when directly inputting the raw field data to the deep learning networks, data compression and preprocessing tricks were introduced by a pretrained autoencoder [46] or semiphysics-driven subnet [47] to reduce mapping workloads of the inverse deep learning networks.
Besides the U-Net, some generative adversarial networks (GANs), which are instinctively suitable for the imaging generation tasks, also prove to work well for electromagnetic pixelated imaging. In [48], GAN was trained to generate virtual dielectric anatomical breast phantoms that were similar to real human breasts, which realized data expansion for further researches. In the forward electromagnetic scattering problem, a framework of GAN, named pix2pix [49], was used to immediately predict the graphical induced currents when given permittivity contrast and incident field [50]. In [51], a complex-valued pix2pix was proposed for EMIS, in which a generator composed of multi-layer complex-valued convolutional neural network was trained to learn the mapping from the BP result to the contrast images. Although the generator was complex-valued, the researchers in [51] admitted that their proposed method still lack interpretability, which is a common problem for the deep learning schemes. Further, in [52], an attention-assisted pix2pix with spatial attention mechanism was used to reconstruct scatterers including perfectly electric conductor (PEC) and dielectric objects, which is the first time that the learning-based approach was introduced to solve the EMIS with mixed boundary conditions.
The cost for lack of interpretability is that the training of deep learning networks is a black-box process and a great deal of training data are needed for a satisfactory result. In order to reduce the dependence on training data and increase the generalization ability, physical priori knowledge should be manually added into the design of deep learning schemes. Recently, some physics-informed neural networks driven by partial differential equations have been rapidly developed, which have shown great effectiveness in solving the classic physical problems including fluid mechanics and quantum mechanics [53]. For the inverse scattering, inspired by the similarities between the iterative optimization method of nonlinear inverse scattering and the ANN architecture, a cascade of end-to-end complex-valued residual convolutional neural network (CNN) modules, termed DeepNIS, was proposed in our previous work [54], in which the generalization ability was verified by testing with the experimental data. Another physics-informed work for EMIS was demonstrated in [55], where a cascaded end-to-end CNN was designed to hierarchically fit the induced current from scattered fields by setting multiple training labels in the intermediate layers. As a result, the nonlinearity of the mapping relation that the proposed CNN needed to fit was greatly reduced, which was verified by numerous numerical and experimental tests [55].
Besides these physics-informed end-to-end deep learning networks, the deep learning frameworks can also be iterative by integrated with iterative physical optimization operators, which may not be as fast as the end-to-end frameworks, but the inversion process can be relatively more controllable and interpretable. Such iterative deep learning frameworks mainly work by two sequential stages: offline training and online optimization. As a classical case presented in [56], in the offline training stage, Guo et al. trained two kinds of neural networks named "FWD SOLVER" and "subnetwork" in ISP SOLVER to surrogate the numerical forward solver [57] and cascaded inversion solver respectively. In the online optimization stage, the residual between the observed data and simulated data was input to the subnetwork to guide the iterative update of model reconstruction. Another iterative deep learning framework was introduced in [58], where a pretrained supervised deep learning network was embedded in a CSI method to iteratively map the signal subspace of the contrast source to an estimate of the true contrast source.
Although the prior physical knowledge was introduced to increase the generalization by these abovementioned attempts, it was not enough to get rid of paired training data. In other words, these deep learning schemes are still supervised methods, which need to collect the scattered fields for thousands of training samples. Both simulation and experiment methods for scattered field collection are time consuming and expensive, especially for experiments.
In our daily research in iterative optimization methods [1] for EMIS of intricate scatterers, we found that if the contrast image was initialized as a shape topologically similar to the ground truth, the iterative methods could eventually converge to a satisfactory result. On the contrary, if the contrast image was not well initialized, such as initialized by a fuzzy BP result, the iterative methods were easy to generate blurry results. The main reason for this phenomenon is that the strong nonlinearity and ill-posedness of EMIS make the iterative methods trap in local optimum, which is hard to escape without outside assistances, such as revision with topological features. For solving the EMIS problems, we realized that respecting physical formulas is the inherent characteristic of the iterative optimization methods while adding the topological features is the advantages of deep learning methods especially GANs. Inspired by this idea, in this article, we propose a physics-informed unsupervised deep learning framework for solving EMIS, where a whole iterative optimization method is directly integrated in the framework to provide the physical priori knowledge, while an unsupervised GAN with physical loss function is trained to add the topological features into the optimization process of EMIS. In order to demonstrate the procedure and effectiveness of the proposed framework, we choose CSI as the example of iterative optimization backbones integrated in our deep learning framework, which is termed CSI-GAN in this article. Benefitting from the abundant physical mechanism brought by CSI, the measured scattered fields in training dataset are no longer needed and the contrast images alone are enough to train our CSI-GAN. Time-consuming and laborious full-wave simulations for the training-data preparation could be skipped, which lowers the threshold for the usage of the deep learning methods in the EMIS problems. Several simulation tests are conducted to fully vindicate the effectiveness of our proposed framework. Nevertheless, it must be noted that the unsupervised framework of CSI-GAN is accompanied by higher costs of computational resources and imaging time compared with the supervised deep learning frameworks. From the user's perspective, choosing the supervised or unsupervised framework depends on whether the paired training samples are easy to be collected or not, or the user prefers to save time on the data preparation process or the imaging process.
For the data acquisition of EMIS, there are three schemes of measurement configurations: the real aperture (RA), synthetic aperture (SA), and coding aperture (CA). The RA method uses a great number of elements in a large aperture, which is flexible in controlling the measurement modes, but suffers from its big size, heavy weight, high power, and expensive price of hardware. On the contrary, the SA method works with the mechanical movement of a single sensor to form virtually a large-size aperture in digital world. Apparently, the SA system has relatively low hardware cost compared to the RA system; however, it is inefficient in data acquisition besides the relatively low signal-to-noise ratio (SNR) due to the mechanical movement of the single sensor. The CA method relies on the theory of compressive sensing [59], in which a sequence of random modulators is manipulated to achieve the compressive measurements. It is apparent that information metasurface can play the role of the dynamic modulators. The information metasurface was developed from programmable metasurface [60], [61], [62], [63], which consists of an array of controllable units (called as meta-atoms), where each meta-atom is connected to an external control unit so that its resonance can be damped by application of a bias voltage. Through applying different voltages to the control circuit, the selected subsets of the elements can be switched on to create the desirable radiation patterns. As such, the information metasurface is capable of producing arbitrary radiation patterns in a very flexible and dynamic manner. In this way, spatial information of an imaging domain can be encoded onto this set of radiation patterns, or the measurements, which can be processed to reconstruct the targets in the scene. The readers may refer to [63] for the comprehensive review on the information metasurfaces. Apparently, this kind of system has the advantages of simplified architecture and low hardware cost compared to the phased-array techniques. Here, we would like to point out that the alliance of the information metasurface with machine learning techniques opens a new exciting door for designing smarter systems with lower hardware costs.
We remark that a much shorter conference version of this paper will appear in [64]. Our initial conference paper did not address the detailed structure of the improved CSI-GAN nor the termination strategy of CSI-GAN iteration process. This paper will address these issues and add a comprehensive discussion of how information metasurface is used in solving EMIS.

II. FORMULATIONS OF THE EMIS PROBLEM
In this section, we detailly introduce the formulations describing the EMIS problem, while the formulations of the CSI algorithm that are needed to integrate with our deep learning framework are explained in APPENDIX in details. Some inverse imaging results of three CSI algorithms (CSI [13], ECSI [14] and MCSI [15]) are also illustrated.

A. FORWARD SCATTERING PROBLEM
For convenience, we consider a two-dimensional (2D) EMIS problem illuminated by plane waves with transverse-magnetic (TM) polarization, as illustrated in Fig. 1. A nonmagnetic scatterer with the relative permittivity ε r (r) is located in a domain of interest (DOI). Outside the scatterer is free space with permittivity ε 0 and permeability μ 0 . For measurements, the DOI is illuminated successively by a total number of K in transmitting antennas. For each illumination, the scattered electric fields are simultaneously recorded by a total number of K s receiving antennas. The transmitters and receivers are alternately and uniformly arranged on a circle with radius R sharing the same center with the DOI. We remark that R should be large enough so that the incident fields on the scatterer by the transmitting antennas are approximately plane waves.
Referring to the symbolic representation in [65], we use E t (r) and E i (r) to represent the total and incident electric fields at the field point r, respectively. In domain D (DOI), the total electric field is calculated by: , r ∈ D (1) In free space S outside the domain D, the scattered field E s (r) radiated by scatterer can be written as: where g(r, r ) denotes the 2D Green's function from source point r' to field point r in free space: where H (1) 0 is the first-kind zeroth-order Hankel function. In order to simplify (1) and (2), a normalized contrast current density J (r ) can be defined: where χ (r ) is the contrast defined by: Then (1) and (2) can be simplified as: It is difficult to directly calculate the integral equations in (5) and (6). We use the method of moment (MOM) to transform these equations into discrete versions and refer to the symbolic representation in [55] to rewrite the formulations in their discrete versions. We assume that the numbers of transmitters and receivers are both K. For the convenience of centralized data processing, for each physical quantity, we gather the data measured at the k-th (k = 1,2, …,K) radiation into one complex matrix and provide their computational formulas. Firstly, we discretize the domain D into N×N subunits, and the contrast values at the centers of these N×N subunits are represented by an N×N matrix, which is then flattened into a column vector χ . Then, we denote the incident field vector radiated by the k-th transmitter on χ as E i k ,whose dimension is the same as χ . All of E i k (k = 12, …,K) can be gathered into an N 2 ×K complex matrix E i : Similarly, we define an N 2 ×K complex matrix E t to represent the total electrical field in domain D, whose definition is the same as E i but replacing the incident field in E i with the total electrical field. The normalized contrast current density in domain D under all k-th (k = 12, …,K) radiations can then be represented by an N 2 ×K complex matrix J, which can be calculated by to give the discrete form of (4), where ξ means the diagonalizable matrix of χ. The total electric field E t can be calculated by to give the discrete form of (5), where G D is an N 2 ×N 2 complex matrix whose definition is the same to [55]: Here, a = (s/π ) 1/2 , and s is the area of one subunit, J 1 ( * ) donates the Bessel function of the first order, H (1) 0 and H (1) 1 donate the zeroth and the first orders of the Hankel function of the first kind, respectively. Physically, G D (n, n ) means the propagation coefficient from the n-th subunit to the n'-th subunit of domain D. Similarly, (6) can also be represented in a discrete version: where E s is a K×K complex matrix, and each column represents the measured scattered fields under one radiation. G S is a K×N 2 complex matrix containing the propagation coefficient from the n-th subunit of domain D to the m-th receiver: Lastly, combining (8) and (9), together with (12), we can get the two equations we need to define this EMIS, which are termed as data equation and state equation:

B. NUMERICAL SIMULATIONS OF CSI ALGORITHMS
In the numerical test, the domain of interest D with reference to Fig. 1 is set to be a 5.6λ 0 ×5.6λ 0 square (λ 0 = 7.5 cm is the working wavelength in vacuum), which is uniformly discretized into 64 × 64 sub-squares for the simulations. 36 linearly TM-polarized transmitters are located uniformly over the circle with a radius of 12λ 0 ; and 36 co-polarized receivers alternately distributed with the transmitters are used to simultaneously measure the electrical field scattered from the probed scene. In the numerical forward problem calculations with MoM, the objects are set to be lossless dielectrics with a relative permittivity of ε r = 2 and 2% noise has been added for all simulations throughout this article.
We randomly choose an image from the MNIST data set [66] and binarize it as the imaging object in the domain of interest D. Four kinds of inverse scattering methods (BP, CSI, ECSI and MCSI) are numerically tested to recover the permittivity-distribution of the imaging object. The testing results are demonstrated in Fig. 2. It can be found that all four inverse scattering methods fail to produce a legible image with sharp edge and complete topological structure.

III. THE CSI-GAN METHOD
In this section, we will introduce the methods of the proposed CSI-GAN in detail. First, the overall structure of CSI-GAN would be illustrated and detailly explained. Then, the design strategy for the loss function of GAN part would be interpreted. Last, the tricks of the training process of CSI-GAN would be demonstrated for a stable convergence. The definition of all relevant symbols appeared in the CSI iterations, such as χ n , ξ n , J n , g n and p n , can be found in the Appendix section.

A. OVERALL STRUCTURE OF CSI-GAN
The schematic diagram of the proposed CSI-GAN in the n-th iteration is shown in Fig. 3. At the beginning (n=1), we execute the classic CSI method [13] for 100 iterations to obtain the initial values of the normalized contrast current density J 1 , the contrast ξ 1 , the gradient g 0 and the Polak-Ribière conjugate gradient search direction p 0 . Because ξ n is the diagonalizable matrix of the contrast vector χ n , and the contrast image is the two-dimensional form of χ n , in the following paragraphs we define a bold symbol " χ n " to represent the contrast image corresponding to χ n or ξ n .
The image of χ 1 (represented as χ 1 ) which is directedly initialized by CSI method is blurry in most cases because of the strong nonlinearity and ill-posedness of EMIS. In other words, the restriction imposed on χ 1 is not enough to let it converge to a high-resolution solution. Inspired by the fact that most of the images in real word own semantic information, in other words, they have regular shapes and could be recognized by people, thus we introduce a GAN to continually add topological constraint in the update process of the contrast image.
The schematic diagram of GAN is illustrated in Fig. 4. In the n-th iteration, the generator (Gen.) of GAN receives the image of χ n+1 (represented as χ n+1 ) from CSI, optimizes it towards the direction of adding more topological and semantic information, and outputs a modified image χ R n+1 . Then χ n+1 would be replaced by χ R n+1 in the next optimization iteration of CSI to help it jump out of the local optima and converge to the correct image.
The abstract structure diagram of CSI-GAN displayed in Fig. 3 is shown in Fig. 4. As we can see, after the contrast image is modified by GAN (χ n = χ R n ), the CSI iteration is only executed once to update the normalized contrast current density (J n → J n+1 ), which is not enough to make J n+1 match with χ n from the angle of objective function F (J n+1 , ξ n ) and will bring difficulty to the convergence of CSI-GAN . In order to make the convergence of CSI-GAN more stable, we improve the CSI-GAN workflow, as illustrated in Fig. 6. The improved workflow can be tackled in three steps. Firstly, the normalized contrast current density J n is input into GAN, driving the training process of GAN before χ n is modified by GAN and replaced by χ R n . Secondly, after χ n is replaced by χ R n , we keep its value unchanged while 50 iterations of the modified gradient method [14] are continuously executed to update J n alone (J n → J R n ) to make χ n and J R n match with each other:   Thirdly, a classic CSI procedure initialized by J R n and χ n is executed by 50 iterations to update the normalized contrast current density (J n → J n+1 ) before it is input to the GAN module in the next iteration of CSI-GAN: Thus, the modified gradient iterations, CSI iterations and topological inpainting by GAN are alternately executed in the CSI-GAN pipeline. Meanwhile, all auxiliary variables can flow smoothly in the pipeline.

B. DERAILED TRUCTURE OF GAN
As we know, GANs [32] are able to transform Gaussian noise to target images, in other words, the output of GANs could be totally different from the input. As we just want to revise the contrast image ( χ n ) using GAN, we wish that the output of GAN (χ R n ) could share some common features with its input image, rather than outputting a brand-new image. So, the degree of freedom for GAN should be restricted to some extents. Inspired by this reason, we introduce a special GAN structure named as Cycle-GAN [67], which is widely used in the unpaired image-to-image translation to solve the mismatch problem between the input and output images, into the structure design of GAN in CSI-GAN.
The structure of the generator in the proposed CSI-GAN combined with Cycle-GAN is shown in Fig. 7, in which G R χ is the revised generator module responsible for the topological inpainting, and G χ R is the inverse generator coupled with G R χ to restore χ R n to χ n , insuring that χ R n and χ n share some common topological features and restrict the freedom of G R χ . D R is the discriminator to judge to which extent that χ R n shares the same topological features with the training dataset consisting of congeneric scatterers. The loss function of this Cycle-GAN structure will be provided in the next section.

C. LOSS FUNCTION OF GAN
For the design of CSI-GAN, we refer the error function of MSCI [15], which can be described as: where F R (ξ ) is a topological regularization item indicating the flatness level of image edge. In the loss function of CSI-GAN, we replace F R (ξ ) in (17) with a discriminator error and change the multiplicative regularization to the additive one. As a result, the loss functions of G R χ and G χ R in Fig. 7 can be written as: where a 1 to a 4 are fixed parameters, while CL 1 (χ n ) and CL 2 (χ R n ) are cycle consistency losses represented as: in which * 1 represents the L1-norm.
For the loss function of discriminator D R , we use the loss function of WGAN [68] to simulate the calculation of Wasserstein distance between the generated images and real images: where P g and P t represent the distributions of the generated images from G R χ (fake samples) and real scatterer images (real samples) in the training dataset, respectively.

D. TRAINING PROCESS OF GAN
Similar to [56], we train the GAN in the CSI-GAN by two sequential stages: offline training and online optimization. In the offline training process, we pre-train the generator alone using paired images with randomly smudged images by gaussian noise as input, and the corresponding raw images selected from the training dataset as the output to make it have the preliminary inpainting ability. In other words, the generator is pre-trained as a denoising network. However, a denoising mapping is now powerful enough to modify the contrast image in CSI iterations because it lacks the prior knowledge to guide the mapping from estimated contrast images to real ones in CSI iterations. Thus, in the online optimization process, we further train the revised generator G R χ with the back propagation gradients from the loss function (18) which has a physically-guided error term F D ( * ) to instruct G R χ in fitting the modified mapping we want. It must be noted that for different testing cases, the offline training process is executed ahead (no need to repeat) but the online optimization process needs to be restarted. Different testing cases do not share the network parameters of GAN in their online optimization processes. In other words, the GAN is a specialized image-corrector for each testing case and its network parameters are dynamically adapted in a testing case.
One of the most important things for GAN's training process in online optimization stage is to ensure that the generator is fully trained while preventing the discriminator from falling into overfitting. In the training process of discriminator, there may encounter a problem that fake samples generated from the revised generator G R χ are much less than the real samples from the training dataset, making the discriminator easily trap into overfitting and cause mode collapse. One way to relieve this problem is increasing the diversity of the fake samples. We use G R χ (χ n ), G R χ (χ R n ) and their weighted means with random weighting coefficients as the whole batch of fake samples. Meanwhile, after the CSI-GAN procedure begins, in the n-th iteration, the GAN will be trained with 10 iterations before it is used to modify the contrast image. Another way is restricting the learning progress of discriminator. In one training iteration of GAN, the generator will be trained 3 times after the discriminator is trained once.
Some hyper-parameters need to be carefully designed for stable convergence. We choose the Resnet structure [69] with 34 layers and 4 layers as the generator and discriminator networks, respectively. For the fixed parameters in (18) and  The whole CSI-GAN pipeline is run on the Pytorch deeplearning platform including the modified gradient method iterations and CSI iterations to make full use of the parallel computing capability of the graphics processing unit (GPU). In other words, the modified gradient and CSI methods are integrated in the whole deep learning pipeline as a part of the deep neural network. The GPU device we use is NVIDIA RXT 3090. In our testing cases, the run time for one iteration of CSI-GAN including training the GAN is about 6s.

IV. TESTING RESULTS OF CSI-GAN
The settings of parameters for simulations are the same as those of Fig. 2. The image of the scatterer is chosen from the MNIST dataset of handwritten digits. We use 6000 images from MNIST as the training dataset for the GAN in CSI-GAN to learn the topological features of the potential target scatterer. It should be guaranteed that GAN will not see the exact image of the target scatterer. So, other 1000 images from MNIST are made as the testing dataset, from which a sample image is randomly chosen and then binarized to be used as the ground truth contrast image of the target scatterer. It must be noted that the contrast images in the training dataset need not to be numerically simulated to get their scattering fields.
When given a target scatterer, we get the corresponding scattered field E s and put it into the CSI method to obtain an initial normalized contrast current density and other auxiliary variables to launch our CSI-GAN procedure. As the process goes on, the output of the generator would gradually converge to the correct image of the target scatterer. The imaging results of χ n+1 , χ R n+1 and χ n+2 in the n-th optimization iterations of CSI-GAN for the target scatterer in Fig. 2 are shown in Fig. 8. As we can see, the GAN could help the integrated CSI method jump out of local optimum and repair the image with more topological features. The imaging result of the 36-th iteration  (bottom right corner in Fig. 8) is already very similar to the target image. The graph of data error function F D (J n+1 , ξ n+2 ) in the n-th iteration is shown in Fig. 9. The initial data error at the 0-th iteration in Fig. 9 is the one for the imaging result of the classical CSI method and is proved by experiment that it could not be further improved by the CSI method itself or traps in a local optimum. However, the descending curve of data error in Fig. 9 indicates that the proposed CSI-GAN could haul it out of the quagmire and steadily converge to an ideal physical solution for this inverse scattering problem.
We further conduct a quantitative analysis of the final imaging result for the target scatterer (original image) in Fig. 2 and the results are presented in Fig. 10. The imaging result generated by CSI-GAN (Fig. 10(c)) is very close to the original image with a mean square error (MSE) of 0.0078 and the structural similarity index measure (SSIM) of 0.8775. In order to fully display the recovering capacity of topological details, we plot Fig. 10(d) by binarizing the image in Fig. 10(c) and further improve the SSIM to 0.9586. In Fig. 10(e), the defect and surplus pixels in Fig. 10(d)  FIGURE 11. The comparison results of the inverse scattering imaging between CSI and the proposed CSI-GAN.
compared with the original image are marked in green and red respectively, showing that 95.31% pixels in Fig. 10(d) are identical to those of the original image, which confirms the powerful precision for the detail-recovery ability of our proposed CSI-GAN. Fig. 10(f) shows the most similar sample (under the criterion of SSIM) with the original image in the training dataset, whose topological posture is distinctly different from the original image. This fact indicates the effectiveness of the physical constraint put on our CSI-GAN, which can guide the output image to converge to the exact image of the target scatterer rather than a similar image existing in the MNIST dataset.
Other testing results are shown in Fig. 11. The imaging results of CSI methods are also given for comparison. As we can see in Fig. 11, the proposed CSI-GAN could generate high-resolution imaging results very close to the ground truth with plenty of semantic features, while the CSI method can only converge to blurry and illegible results.
Next, we will discuss the termination strategy of CSI-GAN iteration process. There are mainly two kinds of cases that may occur in the implementation of CSI-GAN. One is that CSI-GAN continually traps in and jumps out of the local optimum and finally converges to an optimal result, as Fig. 12 shows. Fig. 12 displays the graph of data error F D and several local-optimum imaging results in the iteration process of CSI-GAN. In this case, the CSI-GAN could be ended when the data error F D drops to a preset threshold value. Another case is show in Fig. 13 as an example. In this case, the CSI-GAN rapidly converges to an optimal result at the 11-th iteration and  keeps the data error F D stabilized around the optimal value. The difficulty for the judge of termination for this case is that we could not foresee whether a better result will come out in the subsequent iterations. In other words, we should know whether the CSI-GAN procedure deserves to continue to explore a potential better result when an ideal result is rapidly obtained. Weighing the efficiency and effectiveness together, our final termination strategy is firstly executing the CSI-GAN by 100 iterations. If the optimal data error F D is lower than a preset threshold, the CSI-GAN is ended and the optimal result is outputted as the final result. Otherwise, if the optimal data error F D does not meet the requirement, the CSI-GAN will be executed continuously until an acceptable result is obtained. The run time for one iteration of CSI-GAN is about 6s.

V. INFORMATION-METASURFACE-ENABLED INVERSE SCATTERING
It is noticed from the aforementioned discussions that the EMIS requires the time-consuming multi-static multi-view measurements to achieve the acceptable results. However, for most of practical applications, it becomes a challenging issue for them due to the high-dimensional data or "data crisis". Fortunately, it is well known from the Johnson-Lindenstrauss lemma [70] that the high-dimensional data, if they have a structure representation, can be projected into a low-dimensional feature space with nearly neglectable information loss. That is to say, the essential information of the high-dimensional data can be efficiently retrieved from the remarkably reduced measurements. Given a sample x ∈ X, where X is a cloud of Q points in an N-dimensional space, there exists, at least, a projection operator H : y = H(x) such that the M-dimensional projection has as small loss of intrinsic information as possible compared to x, where M<<N. For illustration, we here restrict ourselves into the case of Born approximation [71]; however, the developed methodologies can be generalized for more general cases in a straightforward manner [63], [72], [73], [74]. Then, the operator H is linear, and this problem can be represented as y = H · x + n, where y ∈ C M is the M-length measurements, x ∈ C N denotes the probed scene, and n is the measurement noise. Under the Born approximation, the entry of measurement matrix (H i j ) is simply proportional to the fields radiated by the transmitter and receiver antennas at a given point in the scene r j : , where E T i (r j ) and E R i (r j ) represent the radiation and receiving patterns. Note that a single sensor is assumed for data acquisition, like the setting of computational compressive imager [1]. Each row of the measurement matrix H corresponds to a measurement mode, and hence the number of rows equals the number of measurements. Apparently, the entries of H is determined by the illumination pattern of information metasurface E T i (r j ). In the terminology of machine learning, the matrix H corresponds to the so-called embedding projection operator. We here consider two popular projection operators widely used in the area of machine learning, in particular, the random projection and principal component analysis (PCA) projection. For the random projection, the entries of H are drawn from independent random numbers. Note that the random approach imposes no restrictions on the object to be reconstructed. The PCA is a prior task-aware embedding technique, where each row of H is trained over many training samples available. Thus, when a set of prior knowledge on the scene under investigation are available, PCA enables the design of efficient measurement modes, reducing remarkably the number of measurements. Now, we provide a set of results to demonstrate the performance of the above method [71]. To that end, a two-bit information metasurface is considered here, which is composed of 48 × 48 controllable digital meta-atoms, as shown in Fig. 14(a). More detailed about the metasurface can be found in [71]. In addition, the two-bit information metasurface is trained to get the desired radiation patterns or measurement modes over the MNIST dataset, a widely utilized handwriting dataset in the area of machine learning. Full-wave  simulation results of the PCA-based radiation patterns are shown in Fig. 14(c), and the corresponding coding patterns of the information metasurface are given in Fig. 14(d). These figures illustrate that the information metasurface can be trained by PCA, implying that the machine-learning-driven information metasurface is able to generate the measurement modes needed by PCA. Fig. 15(a) and (b) illustrate the recovered images of nine digit-like objects of the machinelearning imager trained with the random projection and PCA with varying numbers of measurements. Here, Fig. 15(a) and (b) present the retrieved images from the PCA and random projection respectively. In each method, the measurement numbers are 10, 50, 100, 200, and 300 from the left to the right. We also evaluate quantitatively the image qualities with respect to the signal-to-noise ratio (SNR). We clearly observe that the quality of image turns better with the increasing measurements, and PCA behave better than the random projection, especially in the low measurement cases.
To summarize, we can see from the above preliminary results that the information metasurface driven by machine learning techniques is able to produce the task-oriented measurement modes, which are very relevant to the probed object, in a very flexible and dynamic manner. In this way, the number of measurements can be remarkably reduced for the high-quality inversion, which is directly responsible for considerably speeding up the data acquisition, and meanwhile maintaining the very low complexity for the digital data processing.

VI. CONCLUSION
In this work, we propose the framework of intelligent EMIS by integrating deep learning techniques for efficient data processing and information metasurface for adaptive data acquisition into the entire EMIS's pipeline, in order to address the fundamental but important challenges arising from the conventional EMIS's strategies. Towards this goal, we firstly propose an unsupervised EMIS framework, i.e., CSI-GAN, which can be trained on the off-the-shelf dataset without paired training samples, leading to the remarkably reduced simulation time during the collection of training dataset. As opposed to the supervised EMIS's solutions relying on a large amount of labeled training data, our CSI-GAN takes the CSI with physical interpretations as the network's backbone, and explores the unsupervised deep learning technique for the update of the contrast images. Moreover, the CSI-GAN is not exclusionary with the supervised deep-learning-based inversion schemes, and the inversion results can be used as the initial solutions for the CSI-GAN to improve the convergence procedure. Of course, compared with those end-to-end inverse scattering deep learning networks trained by paired samples whose imaging results can be directly obtained when given the input electromagnetic data, our CSI-GAN needs several optimization iterations before obtaining the final imaging results, which is inferior to the former in computing overhead and imaging speed. However, our CSI-GAN can be used as a very ideal alternative method to achieve better inverse scattering imaging results when paired training samples are missing or difficult to obtain. Furthermore, we propose the scheme of adaptive data acquisition with the information metasurface in a cost-efficiency way, remarkably reducing the measurement numbers and thus speeding up the data acquisition but maintaining the reconstruction quality. In the near future, the strategies to improve the performance of the intelligent EMIS could be further explored, for instance, to design more "smart" data acquisition schemes by combing the machine learning method with the information metasurface (reinforcement-learning-driven metasurface) to develop more specialized deep learning networks with more physically meaningful physical properties. We expect that the presented strategy can open a new avenue for the future intelligent EMIS by combining the information measurface and the machine learning techniques.

A. BACK-PROPAGATION METHOD
Back-propagation (BP) is a simple non-iterative method to solve the EMIS problems and gets the estimate value of the contrast, which is widely used as the initial value for the iterative methods such as CSI. In this section, we provide the concise version of BP written by matrix operations.
The two main steps are as follows: 1) Estimate the value of the normalized contrast current density J : where the superscript H means conjugate transpose, and γ is a complex diagonal matrix with dimension K×K.
The value of γ is to minimize the quadratic cost of state equation: The minimization problem (25) has an analytical solution: where E s = G S · G H S · E s . The function sum[A,1] means to calculate the summation of each column in matrix A and form a row vector. The symbol means element-wise multiplication, and superscript " * " means element-wise conjugate. The vector division in (26) is element-wise division. 2) Calculate E t using (9) and get the estimation of the contrast χ by means of (8): where sum[A,2] means to calculate the summation of each row in matrix A and form a column vector. The vector division in (27) is element-wise division.

B. CONTRAST SOURCE INVERSION METHOD
The classical CSI algorithm [13] plays an important role in our physics-informed unsupervised deep learning frame-work. It is used to extract the physical knowledge and directly integrated in the framework. Here, the classic CSI algorithm written as matrix operations is provided.
We define two error functions named as the data error function F D (J, ξ) and the state error function F S (J), which indicate the mismatches in the data equation and the state equation respectively: The CSI method aims to minimize the sum of data error and state error functions by iteratively optimizing the normalized contrast current density matrix J and the contrast matrix ξ: The classic CSI algorithm firstly updates J using the Polak-Ribiére conjugate gradient method, and then updates ξ by orderly calculating (9) and (27). Before giving the details of CSI, we define some auxiliary matrices and values for more succinct expression of CSI formulations: where I represents the identity matrix. The value of ξ remains constant in one iterative update of J, so for the updating process of J, the objective function F (J, ξ) in (30) can be shortened to F (J): The main steps of the classical CSI algorithm are as follows.
1) Use the BP method to calculate the initial values of J and ξ, represented as J 0 and ξ 0 respectively.
2) For the n-th iteration (n≥0), put J n and ξ n into (31) to update the auxiliary matrices and values A 2 , B 2 , and c 2 . ξ n is the diagonalizable matrix of the contrast vector χ n . 3) Calculate the gradient matrix g n of the objective function F (J) with respect to J = J n : g n = c 1 A H 1 · (A 1 · J n − B 1 ) + c 2 A H 2 · (A 2 · J n − B 2 ) (33) 4) Calculate the Polak-Ribière conjugate gradient search direction p n : p n = g n , n = 0 g n + p n−1 · β n , n ≥ 1 (34) where β n is the diagonalizable matrix of β n : β n = Re sum g n (g n − g n−1 ) * , 1 sum g n g * n , 1 In (35), the function Re[ * ] means taking the real value part of a complex value vector or matrix, and the vector division is element-wise operator. 5) Do one-dimensional search at the direction p n to minimize the objective function: argmin α n F J n+1 = F J n + p n · α n (36) where α n is the diagonalizable matrix of the row vector α n , who has analytical solution: The vector division is element-wise operator. 1) Update the normalized contrast current density: 2) Update the contrast: 3) Execute Steps 2-7 repeatedly, until reaching the maximum number of iterations or getting the value of objective function F (J n , ξ n ) less than predetermined threshold.