Multiresolution Preconditioners for Solving Realistic Multi-Scale Complex Problems

In this work, the hierarchic multiresolution (MR) preconditioner is combined with the multilevel fast multipole algorithm-fast Fourier transform (MLFMA-FFT) and efficiently parallelized in multicore computers for computing electromagnetic scattering and radiation from complex problems exhibiting deep multi-scale features. The problem is formulated using the thin-dielectric-sheet (TDS) approximation for thin dielectric materials and the electric and combined field integral equations (EFIE/CFIE) for conducting objects. The parallel MLFMA-FFT is tailored to accommodate the MR hierarchical functions, which provide vast improvement of the matrix system conditioning by accurately handling multi-scale mesh features in different levels of detail. The higher (coarser) level hierarchical functions are treated by an algebraic incomplete LU decomposition preconditioner, which has been efficiently embedded into the parallel framework to further accelerate the solution. Numerical examples are presented to demonstrate the precision and efficiency of the proposed approach for the solution of realistic multi-scale scattering and radiation problems.


I. INTRODUCTION
Numerical modelling of time-harmonic electromagnetic scattering and radiation problems using the surface integral equation (SIE) methods has been a major research topic in computational electromagnetics (CEM) [1]- [10]. SIE methods bring important advantages when compared to other volumetric approaches, as they naturally incorporate radiation conditions and both parameterization and computations are restricted to the boundary surfaces and interfaces. This yields a reduction of the three-dimensional (3D) domain onto two-dimensional (2D) domains, significantly decreasing the required number of unknowns. Combined with its known versatility, numerical stability and fewer restrictions on geometry and material modelling, this has made the SIE The associate editor coordinating the review of this manuscript and approving it for publication was Yi Ren . method a preferred choice for solving many electromagnetic and engineering problems. SIE methods also have some inconveniences. They pose a fully populated dense complex system matrix, which burdens the computational cost and restricts their direct application to small-scale problems, with a few hundred thousand unknowns at most. Fortunately, the last decades have witnessed great achievements in the development of efficient fast solvers to compress this otherwise high computational cost. It is worth mentioning, among others, the fast multipole method (FMM) [11] and its multilevel version, the multilevel fast multipole algorithm (MLFMA) [3], [12], [13], which are based on the iterative resolution of the matrix system. In synergy with concurrent advances in computer technology, efficient parallel variants have also been devised on modern shared and distributed memory computing architectures [14], [15]. Some combine FMM with fast Fourier transform (FFT) seeking better parallel scalability, such as FMM-FFT [16] or its multilevel version, MLFMA-FFT [17]- [19]. These and other advances have pushed the bounds of SIE methods, allowing to increase the problem size from tens of millions to billions of unknowns [18]- [22].
However, as all these advances evolve, bringing computational capabilities closer to industrial needs, the targeted problems are becoming increasingly realistic, posing electromagnetic models whose complexity and size are constantly growing. This is giving rise to new challenges, one of the most important being the multiscale nature of the highfidelity models. The presence of small geometrical details demands discretization with elements that are very small compared to the wavelength. Additionally, they often involve very different spatial scales, which can differ by several orders of magnitude, resulting in disparate mesh sizes. All of this contributes to increasing the condition number of the system matrix, which degrades the convergence rate of the iterative solution, jeopardizing the efficiency gained with fast solvers [9], [19], [23]- [28].
A well-known method to improve the convergence of iterative solvers is applying a preconditioning technique. Preconditioning techniques have been the focus of intense research in the past decades. A recent and extensive review of preconditioning techniques for different applications can be found in [23]. Broadly speaking, they can be divided into two main categories: algebraic and physics-based preconditioners. The algebraic preconditioners, such as the incomplete lower unitriangular upper triangular (ILU), sparse approximate inverse (SPAI), Jacobi (diagonal) or block-Jacobi preconditioners [29]- [31], estimate an inverse of the system matrix, applied then to improve the matrix conditioning. Instead, the physics-based ones act on the discretized operator to regularize it. Some examples of dense-discretization stable physicsbased preconditioners are the Calderón preconditioner [6], [32]- [36], and the Multiresolution (MR) preconditioner [37]. The MR preconditioner introduces a set of multi-level basis functions, to discretize the problem, able to keep the different scales of variation of the solution, improving then the system matrix conditioning in particular in the case of multi-scale structures [8], [38], [39].
In this paper we apply the MR preconditioner for computing electromagnetic scattering and radiation of complex multi-scale problems exhibiting deep multi-scale nature. The problem is formulated using EFIE/CFIE for the conducting surfaces and the thin dielectric sheet (TDS) approximation for thin dielectrics [40], [41]. The integral equations are discretized using the well-known Rao-Wilton-Glisson (RWG) basis functions [42] and the Galerkin's formulation of the method of moments (MoM) [1], and the resolution is accelerated using MLFMA-FFT. The MR preconditioner, alone and combined with LU preconditioner, is then judiciously embedded into the above formulation, rendering a vast improvement of the matrix system conditioning by accurately handling multi-scale mesh features in different levels of detail. Details of the MR integration and subsequent parallelization of the resulting MLFMA-MR approach are reported for the first time, laying the groundwork for a fast-converging highly-scalable methodology in multicore shared-memory computers. Examples are presented to evaluate the precision and efficiency of the MR preconditioner, demonstrating its performance and applicability in different scenarios.

II. FORMULATION
Let us consider a collection of homogeneous penetrable bodies in a homogeneous unbounded background. The homogeneous regions are denoted by R i , with i = 1, . . . , M , the index number of each region. The material properties of the regions are defined by the complex permittivity i = ri 0 and the complex permeability µ i = µ ri µ 0 . We denote with S ij the surface between regions R i and R j , and withn ij the unit vector normal to S ij pointing toward R i .
With the above notation, we can formulate the electric field integral equation (EFIE) and the magnetic field integral equation (MFIE) in region R i by introducing the equivalent electric and magnetic currents J ij (r ) and M ij (r ) on the interfaces S ij that surround R i . The equivalent currents are related to the total electric and magnetic fields in this region, E i (r ) and Then, applying the equivalence principle to the total electric and magnetic fields in R i , we obtain the tangential (T) EFIE and MFIE equations as follows: T-MFIE i : In a similar way, the twisted or normal (N) equations in S ij throughout R i can be obtained as In Eqs. (1)-(4) G i denotes the set of indices of the regions surrounding region R i , η i is the intrinsic impedance in medium R i , E inc i and H inc i are the incident fields due to the sources located inside R i , and the integro-differential operators L and K are defined as where r is the observation point approaching to S ij from the interior of region R i and r ∈ S ij denote the source points at the boundaries surrounding R i . ∇ denotes the divergence in the primed (source) coordinates, PV denotes the principal value of the integral in Eq. (6), k i is the wavenumber in R i and we define which is the homogeneous Green function in R i . The summations in equations (1) to (4) account for all the surfaces S ij surrounding R i , including S ij . We can now combine the above integral equations in region R i , yielding two combined field integral equations (CFIEs) on S ij , one being well-tested for the equivalent electric currents (J-CFIE) and the other being well-tested for the equivalent magnetic currents (M-CFIE), as follows [43]: where a i , b i , c i and d i are the appropriate complex combination coefficients. The next step is to combine Eqs. (8) and (9) for regions R i and R j onto a single integral equation for each interface S i j as follows: posing the so-called JMCFIE formulation.
In the case of perfect electric conduction (PEC) objects surrounded by or in contact with multiple penetrable objects, we can simply derive an appropriate CFIE from the JMCFIE formulation by dropping Eq. (9) and adequately choosing the combination parameters in Eq. (8). The CFIE for a PEC object in contact with a penetrable region R i can be written as: where 0 < α < 1 is the weight controlling the contribution of the EFIE and MFIE equations.

A. THIN DIELECTRIC SHEET (TDS)
In the case of simulating penetrable dielectric sheets that are thin compared to wavelength, the JMCFIE formulation, described above, involves a large count of unknowns, as both sides of the dielectric sheet must be modeled using equivalent electric and magnetic currents. In these particular cases, the computational cost can be greatly alleviated by applying a specific formulation, namely the thin dielectric sheet (TDS) approximation [40], [41]. It is based on the fact that, when the thickness of the TDS is small in terms of wavelength and the sheet dielectric permittivity s = rs 0 is not too low compared with the background, the impinging field will penetrate the sheet almost perpendicularly. Under these conditions, tangential fields will dominate in the dielectric and the normal components can be neglected. Therefore, a TDS of thickness τ embedded in a penetrable region R i can be modeled as where the volumetric sheet is approximately represented by an equivalent surface electric current J i placed on its middle surface S i .
is the equivalent surface impedance of the TDS. The above integral equation is very similar to that for a PEC surface. Only one set of surface equivalent electric currents is required to model the complete TDS, reducing the amount of unknowns with respect to the JMCFIE approach by a factor of four.

B. THE MULTIRESOLUTION PRECONDITIONER (MR)
The multiresolution (MR) preconditioner [8], [37], [44]- [47] improves the spectral properties of the original MoM matrix system by splitting the unknown current into solenoidal and non-solenoidal parts. The procedure is divided into different steps. First, the input triangular mesh, supporting the discretization of the problem in terms of standard RWG basis functions, is rearranged until getting a set of meshes with different mesh-element (cell) sizes. This is done via a multilevel algorithm in which the adjacent cells of the previous level, starting from level-0 triangular facets, are aggregated giving rise to macro-cells. The generalized RWG (gRWG) functions are then defined on each pair of adjacent macro-cells, and the associated unknown current is divided into solenoidal and non-solenoidal parts as detailed in [37], [45]. This process poses the MR basis functions of each level. The above scheme is applied recursively down to the quasi-Nyquist (coarsest) cell-size level, where gRWGs are defined completing the set of multilevel basis functions.
The above MR functions at the intermediate (detail) levels and gRWG functions at the coarsest level can be described as linear combinations of the initial underlying RWG functions. Thereby, the MoM system matrix Z in the space of the initial RWG basis functions can be expressed in the new functions by simply applying a change-of-basis matrix T, as follows: where the matrix T is made up of two different blocks: being T MR the sub-matrix describing the set of N MR basis functions defined at detail levels, and T gRWG the sub-matrix with the set of N gRWG functions at the coarsest level, where N MR + N gRWG = N , i.e. the total number of unknowns. Next, a diagonal preconditioner (DP) D is applied to the MoM matrix in the new multilevel basis, whose elements are given by: The matrix block corresponding to the gRWG functions at the coarsest level (Ẑ gRWG in Eq. (15)) is further preconditioned in terms of its incomplete LU factorization [8], which can be denoted as: where D gRWG is the portion of D corresponding to the gRWG functions at the coarsest level andẐ gRWG is the gRWG matrix block ofẐ. It is worth mentioning that a naive implementation of the above LU preconditioner would compromise the scalability on mixed-memory architectures, burdening the communications between distributed processes and drastically reducing efficiency. In this paper, we propose a very efficient implementation of this operation taking advantage of the highly localized and hierarchic nature of the new set of basis functions. Details of this implementation will be given in the next section.

C. FAST SOLVER
Applying the standard MoM procedure and the Galerkin's testing method to Eqs. (12) and (13), we obtain an N × N dense matrix whose solution are the N unknown coefficients J n of the expansion of the equivalent current density J in terms of RWG basis functions. Here, J stands for the collection of current densities on the PEC and TDS surfaces of the problem under analysis.
To reduce the otherwise prohibitive computational cost of the MoM for the case of large-scale geometries, we apply the MLFMA-FFT. As mentioned earlier, this is an extension of MLFMA to speed-up the matrix-vector-product (MVP) in the framework of an iterative resolution of the problem. Assuming an octree spatial decomposition of the geometry into a set of groups, the MVP in group p can be obtained using FMM as where B p is the set of indexes for the nearby (adjacent) groups of group p, G p is the set of indexes corresponding to the testing functions of group p, G q is the set of indexes corresponding to the basis functions of group q, and S 2 is the Ewald unit sphere. In Eq. (19), the near interactions between basis and testing functions belonging to adjacent groups are calculated using the direct MoM procedure. The far (nonadjacent) group contributions are accounted for using the standard FMM by (i) aggregating the radiation of the basis functions within each group q to the center of their groups where r q is the center of the qth group and withĪ the 3-D unit dyad; (ii) Translating the aggregated radiations between the different groups using the translator operator where h (2) l is the spherical Hankel function of the second kind, P l is the Legendre polynomial of degree l, and L is the number of multipole expansion terms [12]; And (iii), disaggregating the receiving patterns to the testing functions within each receiving group p With this algorithm, the computational cost is reduced from O(N it N 2 ) to O(N it N 1.5 ), with N it the number of iterations required to obtain a prescribed residual error.
Using MLFMA the computational cost of the MVP can be further reduced to O(N it N logN ), using exponential translation, interpolation and adjoint interpolation (or anterpolation) of the fields, in the framework of a multilevel octree decomposition of the geometry. Additionally, to benefit from the availability of multicore distributed computers, the MLFMA-FFT extension is applied. This algorithm implements the translation between groups at the coarsest level of the octree decomposition by performing a 3-D circular convolution per sample of the Ewald sphere. This operation is efficiently done in the transformed domain applying the FFT. MLFMA-FFT avoids inter-process communication and equally distributes the workload among parallel processes, posing a highly scalable parallel implementation. Solutions of surface integral equation with up to one billion unknowns have been obtained using this method [18], [48].
The above algorithms reduce the complexity of the MVP. But the overall cost can also be lowered by accelerating convergence to the solution (i.e., reducing N it ). To this purpose, we apply the MR preconditioner described in the previous section, where the multilevel MR and gRWG basis functions obtained as linear combinations of the original RWG basis functions. This change of bases can be efficiently applied through two sparse matrix vector products (SpMVP) before and after the MLFMA main MVP as where T = D · T is the sparse change-of-basis matrix including the DP of Eq. (17).

III. PARALLEL IMPLEMENTATION
In this section we delve into the details of the integration of the hierarchic MR preconditioner in the parallel MLFMA-FFT. This integration must be addressed carefully, as operating with large sparse matrices, such as those involved in the MR approach, could create a significant computational bottleneck in parallel deployment. Let us start by introducing a simplified notation for the accelerated MVP in (19), as: where the product by Z near on the right hand side (RHS) denotes the near-field contributions (adjacent groups) to the MVP, and the product by Z far denotes the far (non adjacent groups) contributions calculated via MLFMA-FFT as explained in Section II.C. These contributions correspond to the first and second terms of the RHS in (19) respectively. I is the vector with the N unknown coefficients of the expansion of the current density J (in the original RWG space), and y is the vector resulting from the MVP. The above MVP is called in the framework of the parallel iterative solving of the dense matrix system. The iterative solver is the GMRES implementation from the book of SIAM Templates [49], adapting the code for complex arguments and parallelizing it using the OpenMP standard.
Applying the MR preconditioner requires GMRES to operate in the subspace of the new set of MR basis functions, where the MVP in (23) should be applied instead of (19). Considering that MR hierarchical bases can spread beyond the boundaries of individual MLFMA octree groups at the finest level, a naive implementation of (23) could easily burden the parallel numerical computation. The inclusion of the preconditioner was resolved in this work by incorporating two additional sparse MVPs (SpMVPs) in (24), as follows: whereÎ is the vector with the unknown coefficients of the expansion of the current density J in the new set of (MR+gRWG) bases, and wherê is the resulting MVP in the MR subspace, withŷ MR the part corresponding to the MR functions defined at the detail levels andŷ gRWG the part corresponding to the gRWG functions at the coarsest level. Next, the LU preconditioner defined in Eq. (18) is applied to the gRWG part of the MVP resulting from Eq. (25), aŝ Only the gRWG bases contained in the original MLFMA near-field matrix are considered, by applying (15) to Z near , which is calculated for the initial RWG basis functions at the finest level of the MLFMA octree (i.e.,Ẑ near = T·Z near ·T T ). Subsequently, (18) is applied to the gRWG block ofẐ near (Ẑ near gRWG ). This maximizes data locality and the efficiency of the parallel implementation, albeit at the expense of limiting the interactions between gRWG functions to those available in MLFMA, resulting in incomplete LU (ILU) factorization. In our implementation, this implies that there may even be pairs of gRWG functions that are partially computed in the LU preconditioner, since not all the partial contributions of the RWGs that make up these macrobases are available in Z near .
The above formulation has the advantage of applying MLFMA-FFT in the initial RWG subspace, where its parallel performance is optimized, while GMRES operates in the preconditioned MR subspace. Importantly, the two additional SpMVPs involved in Eq. (25) and subsequent application of the ILU preconditioner have been accelerated and judiciously embedded into the MLFMA-FFT parallel implementation. All the required sparse matrices (T , Z near and Z near gRWG ) are stored in compress sparse row (CSR) format. Efficient algorithms have been developed for the SpMVP parallel computation using OpenMP (this also applies to the near-field calculation of the MVP in the RWG subspace in Eq. (25), i.e., Z near ·(T T ·Î). Instead of the usual octree-group parallelization, we have used a SpMVP to avoid memory overlapping, so only the CSR sparse version of this matrix is stored). Additionally,Ẑ near is never fully calculated, as only the diagonal and the gRWG block of this matrix are needed. Effective parallel algorithms have been developed for the calculation of both elements.
The efficient implementation of the ILU preconditioner is, however, somewhat more cumbersome, as the ILU preconditioner is not naturally prone to parallelization. Naive implementations generally suffer from heavy inter-process communication overhead, resulting in very inefficient parallel performance. This lack of scalability can be overcome in the present case by applying the parallel sparse direct and multi-recursive iterative linear solver (PARDISO), available in the Intel Math Kernel Library (MKL) [50]. The pipelining parallelism of this implementation combined with the strong sparsity and local dependencies of the new set of hierarchical functions with respect to the initial RWGs, allow efficient parallel computation of (L · U) −1 in multicore computers, avoiding the bottleneck normally involved when applying the ILU preconditioner in large-scale parallel calculations.

IV. NUMERICAL RESULTS
In this section, we illustrate the effectiveness of the multiresolution preconditioner in solving large-scale scattering and radiation problems with real-life interest. First, we examine the correctness of the proposed method by solving a cylindrical PEC monopole with 0.25 m length and 15 mm diameter on a square ground plane with a side of 1 m. The real and imaginary parts of the input admittance versus frequency calculated through the proposed MR-LU (indicated in the following MLFMA-MR-LU) and the MR alone (indicated in the following MLFMA-MR) are depicted in Fig. 1, compared to the reference solution using MLFMA and RWG basis functions without preconditioning (indicated MLFMA). In both cases the EFIE is applied. A perfect agreement is observed between the three results.
To assess the spectral properties of the proposed MR preconditioner, the singular values of the EFIE impedance matrix at the resonant frequency of 300 MHz are evaluated for the RWG, MR and MR-LU spaces, after the application in all cases of the diagonal preconditioner (DP). In particular, Table 1 reports the corresponding minimum and maximum singular values. It is evident that, with respect to the RWG space, the minimum singular value is around two orders of magnitude higher in the MR space and almost three orders more if LU preconditioner is applied to the gRWG block, reducing so significantly the matrix condition number. Moreover, Fig. 2 shows the eigenvalues distribution of the RWG and MR-LU MoM matrices after DP. Applying the MR-LU preconditoner the eigenvalues are concentrated at 1, as it would happen for a second-kind integral equation: this measures the regularization introduced by the applied preconditioner. Finally, the respective relative residual errors with the number of Krylov iterations are plotted in Fig. 3. It is observed clearly that a dramatic improvement of convergence is obtained using the proposed MR preconditioner.
A second example is considered to evaluate the correctness and effectiveness of the MR preconditioner when applied to the TDS approximation. We consider a spherical shell made of glass material (permittivity r = 5) with a diameter of 1 m and a thickness of 2.5 mm in vacuum, illuminated by a linearly polarized plane wave at a frequency of 900 MHz. Fig. 4(a) shows the bistatic radar cross section (RCS) calculated using the TDS approximation, compared to the reference JMCFIE solution. An average mesh size of λ/10 on both sides (spherical surfaces) of the shell is considered for  the JMCFIE solution, as well as for the equivalent (single surface) sphere in the case of the TDS approximation. In the latter model, a finer mesh of λ/40 is applied around a small region of the surface, as illustrated in the inset of Fig. 4(a), to better illustrate the effectiveness of the MR approach in the case of a TDS including multiscale features. Looking at this figure, a good agreement between the TDS and the reference results is observed, as expected since the shell thickness is small enough compared to the wavelength. Figure 4(b) shows the convergence of the relative residual error, where it can be seen that both MLFMA-MR and MLFMA-MR-LU outperform MLFMA with RWG functions. This reveals the effectiveness of the MR preconditioner in conjunction with the TDS approach.
A challenging structure consisting of a Ferrari Testarossa is considered next, as shown in Fig. 5. This structure exhibits deep multi-scale nature, combining smooth surfaces with different levels of details distributed throughout the structure: on the sides of the doors, front and rear grills, rear cover, wheels, shock absorbers, and the exhaust pipes. To test the   proposed method exhaustively, two different kinds of problems were considered, consisting of the calculation of the scattering for an incident plane wave, and the calculation of a radiation problem for a shark-type antenna located in the ceiling. Additionally, both problems were solved by MLFMA using two different integral equation approaches: first, the EFIE formulation was applied to the entire structure; alternatively, the CFIE formulation was applied to the closed parts (body), leaving EFIE for the open ones (antenna). The car windows and windshields were modeled using the TDS integral equation formulation applied to a 2.5 mm thick glass material with relative permittivity r = 5. The frequency was set to 900 MHz, and an average mesh size of λ/20 was set to the smooth parts, while a fine enough meshing (up to λ/500 in some parts) was applied to properly model the small details, which clearly reveals the multi-scale nature of this example, yielding a total of 1,051,408 unknowns. At the prescribed operating frequency, the thickness of the glass structures is small enough compared to the wavelength within the sheet to give good results using the TDS approximation, without considering the normal current induced in the dielectric.
The proposed MLFMA-MR-LU is compared to MLFMA-MR, the conventional ILU preconditioning applied to the whole problem using RWG basis functions (indicated MLFMA-ILU), and a raw solution using RWG without preconditioning at all (indicated MLFMA). Importantly, given the large size of the problems posed, the gRWG matrix in MR-LU is stored in CSR format and ILU factorization is accelerated and parallelized throughout PARDISO routines, which are available from the Intel Math Kernel Library. Sparse storage and PARDISO factorization and solving are also applied to the conventional ILU preconditioning. This enables these ILU based preconditioners to be used in the large examples posed, which would otherwise be intractable due to computational burden.
Let us focus first on the scattering problem. An incident vertically polarized plane wave is considered, impinging into the car nose with θ i = 90 • and φ i = 0 • . The equivalent electric currents induced on the car surfaces are shown in Fig.6. Figure 7(a) shows the convergence of the relative residual error to reach these currents with the number of Krylov iterations, for the case in which the problem is solved by MLFMA applying EFIE for the complete structure. It can be observed that both the MLFMA-MR-LU and MLFMA-ILU approaches outperform MLFMA and MLFMA-MR, reaching residual errors below 10 −6 in around 1500 iterations. Nevertheless, it must be remarked that the time per iteration is different in each preconditioner, as can be observed from Table 1, which gathers the setup, iteration time and solving time (time to reach 10 −6 residual error) for the set of examples considered in this section. Consequently, a better figure of merit is the wall-clock time, defined as the time to solve  the complete problem, which is shown in Fig. 7(b). From these curves we can observe that MLFMA-MR-LU is by far the preconditioner that provides the fastest convergence, reaching the prescribed relative error in little more than half an hour. This contrasts with the wall-clock time convergence of the other solutions, which remains above a relative error of 10 −3 after one hour and a half. MLFMA-MR-LU is superior to MLFMA-ILU also in terms of peak memory, as shown in Table 2, which lists the peak memory for the different examples and preconditioners. MLFMA-MR-LU requires a slight increase in memory compared to the plain MLFMA, but it is far from the more than four times the memory needed by MLFMA-ILU.
A similar picture can be observed in Fig. 8 (a) and (b) for the case of using CFIE for closed parts and EFIE for open ones. In this case, even though the iteration count is lower for MLFMA-ILU, when we bring time and peak memory up the MLFMA-MR-ILU solver is still the one that provides fastest convergence.  We move next to the radiation problem, where the excitation consists of an extended delta-gap applied to the base of the shark-type antenna. The equivalent electric currents induced on the structure are shown in Fig. 9. Looking at the convergences for the EFIE case, gathered in Fig. 10, a very similar behavior to the scattering problem is observed. Nevertheless, as could be expected, MLFMA without preconditioning performs worse than in the scattering case (for a given example, convergence is usually worse in radiation problems than in scattering problems due to localized excitation). What stands out in this result, however, is the excellent convergence of the MLFMA-MR-LU approach, which even improves the speed-up attained in the scattering counterpart. A residual error below 10 −6 is reached in less than half an hour (1000 iterations), a very fast convergence given the complexity of this problem, which demonstrates the remarkable improvement of matrix-conditioning contributed by MR basis functions.
Similar conclusions are observed in the case of using CFIE/EFIE formulation, shown in Fig. 11. As expected, the convergence of all preconditioned cases is better than using pure EFIE formulation, as can be deduced from Fig. 10. Both the MR and LU based preconditioners greatly facilitate VOLUME 10, 2022  In view of the preceding results, we can summarize that MLFMA-MR-ILU constitutes the best approach of those tested to deal with this kind of multi-scale problems, especially when the EFIE formulation come into play. Finally, for the sake of completeness, Fig. 12 shows some views of the induced currents obtained with MLFMA-MR-ILU, where it can be appreciated the noiseless distribution of current on  the very tiny details in different parts of the car. The current flows perfectly without observable artifacts along the entire structure, including car windows and windshields.
All results have been calculated on an Intel(R) Xeon(R) E7-8867 v3 computing server using 32 cores, providing an average speedup ratio of 24 for the larger examples.

V. CONCLUSION
In this work, a highly scalable parallel implementation of MLFMA-FFT combined with MR-ILU was proposed for the analysis of complex large-scale radiation and scattering problems including TDS and EFIE/CFIE formulations. The highly localized and hierarchic nature of the MR basis functions is taken into account to maximize the efficiency of the parallel implementation in the context of the MLFMA-FFT parallel scheme. The preconditioner is applied through the use of efficient algorithms for the store and calculation of SpMVPs and the efficient application of PARDISO solver.
The efficiency and versatility of the proposed approach has been demonstrated through simulations of a challenging structure, consisting of a Ferrari Testarossa, in different radiation and scattering scenarios. It was shown that the inclusion of the hierarchical MR-ILU scheme within the MLFMA-FFT constitutes a robust preconditioner for the solution of this kind of problems in all cases, and especially when the EFIE formulation come into play.