Tear-and-Interconnect Domain Decomposition Scheme for Solving Multiscale Composite Penetrable Objects

In this work, the tear-and-interconnect (T&I) surface-integral-equation (SIE) domain-decomposition (DD) approach–previously developed for non-penetrable bodies–, is extended to composite piecewise homogeneous penetrable objects including multiple materials and multiscale features. The main advantage of the proposed T&I scheme, with respect to conventional DD approach, is that it obviates the definition of large artificial surfaces (required for splitting the volumetric regions) with additional redundant unknowns, which avoids a significant increase in the computational cost, especially when dealing with large volumetric regions. In this sense, it has been shown that the T&I approach is an efficient and accurate alternative, which besides complements the conventional DD approach, since both are compatible and can be easily combined, which gives the possibility of applying one or the other as appropriate, depending on the specific characteristics of the problems to be solved in each case.


I. INTRODUCTION
Domain decomposition (DD) methods based on transmission conditions have meant a breakaway from conventional surface integral equation (SIE) approaches, enjoying considerable success in the resolution of electromagnetic scattering from complex problems made up of penetrable and non-penetrable homogeneous and piecewise homogeneous components with different electromagnetic properties and multiscale features [1]. The resolution of such strenuous problems is challenging and often unattainable by conventional techniques. By decomposing the original problem domain into a collection of smaller subdomains, in which local sub-problems are solved separately and coupled back together, the DD methods are able to address these The associate editor coordinating the review of this manuscript and approving it for publication was Wei E. I. Sha . challenging problems with a good degree of success. Although initially conceived for the preconditioning of finite element methods [2]- [8], the DD paradigm was finally extended to conventional surface integral equation (SIE) approaches, first applied to non-penetrable bodies [9]- [12] and later on extended to penetrable and composite objects [1], [13]- [19]. In these SIE-DD schemes, the local subdomains are enclosed by closed surfaces prior to be solved individually throughout the appropriate SIE fast algorithms. The transmission conditions are then enforced to guarantee the continuities of tangential electric and magnetic fields on the interfaces between touching subdomains, which provides the Huygens equivalent currents posing well defined local solutions. One drawback of this strategy is that it implies the introduction of redundant unknowns on the artificial interfaces required to close the subdomains, resulting in a significant computational cost overhead in some cases. VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ An alternative to these early DD schemes is the tearand-interconnect (T&I) SIE-DD approach, which has been recently proposed applied to non-penetrable bodies [20]. This alternative does not require artificially closing the surfaces between adjacent (touching) subdomains, which instead can remain open [20]- [24]. The transmission conditions preventing unphysical reflections from the subdomain interfaces, which is essential to achieve good convergence, are then enforced along the tearing contours between subdomains. In some cases using overlapping regions to couple adjacent subdomains [20], [22], and in some other without auxiliar unknowns at all, directly enforcing the current continuity through an interior penalty formulation [23], [24]. The primarily advantage of the T&I scheme is the absence of fictitious surfaces to close the open subdomains, thus preventing the use of large numbers of redundant unknowns. Additionally, this simplifies the algorithm, which indeed can be more easily embedded into existing SIE codes, and it has a good propensity for parallelization over distributed computers.
In this work, we demonstrate that the T&I SIE-DD approach can be further extended to expedite the solution of fully homogeneous as well as composite piecewise homogeneous penetrable objects including multiple materials and multiscale features. In particular, we apply this approach to the accurate and efficient solution of complex, large-scale nanoplasmonic assemblies combined with dielectric substrates, in the context of the most recent advances in biosensing nanosensors. Although at first glance the resolution of open penetrable subdomains, as they arise from the application of the tear-and-interconnect DD, might seem rather counterintuitive, it must be noted that the local solutions generate scattered fields that subsequently impinge on all other subdomains, thus closing each subdomain via the righthand-side (or excitations vector) and posing well-defined local subproblems. By a judicious selection of subdomains, such that different scale and physics problems are isolated and treated separately, the proposed approach constitutes a true alternative for the efficient and accurate solution of different kinds of complex real-life penetrable problems.
A major advantage of the proposed scheme is that it avoids the definition of artificial surfaces and additional redundant unknowns, which would burden the computational cost especially in the case of splitting large volumetric homogeneous regions. Additionally, the tear-and-interconnect alternative can be straightforwardly combined with the more conventional approaches, thus allowing the DD partition method and local solver to be tailored to the characteristics and needs of each local subproblem.

II. FORMULATION
The DD method is outlined next. Let us start with the matrix system of linear equations posed by the application of the SIE formulation and the Galerkin method of moments (MoM) procedure [25] to a piecewise homogeneous composite object involving penetrable materials and material junctions, which can be written as Z · I = V, where Z is the impedance matrix, V is the known excitation vector and I is the unknown (solution) vector with the complex coefficients for the electric and the magnetic equivalent current expansions in terms of a set of known vector basis functions. A nonoverlapping additive Schwarz domain decomposition (DD) preconditioner can be applied to the solution of the above matrix system. We start by applying a partition of the geometry according to the geometrical features, materials and the particularities of the different blocks and subsystems of the overall problem. This results in a DD of the objects boundary surfaces into a collection of non-overlapping touching and/or non-touching subdomains. The matrix equation can be then left-preconditioned throughout the solutions of the individual subdomains, as follows: where P is the DD block diagonal preconditioner, which can be built as: where R i is a rectangular restriction matrix that maps the complete vector of unknowns to the sub-vector of unknowns corresponding to subdomain i (which is denoted as D i ), the transpose R T i is the prolongation matrix that extends the sub-vector in D i to the whole domain, and Z −1 i is the inverse of the impedance matrix governing the local problem enclosed in D i . The total number of subdomains in which the whole problem is split is denoted as p.
The rightmost matrix vector product (MVP) in the left-hand side of (1) corresponds to the global (outer) MVP, coupling the different subdomain solutions with one another. Meanwhile, the leftmost product by the matrix P provides the DD preconditioner individual subdomain solutions. Though formally written with the inverses of the subdomain matrices, the individual solutions can be addressed by the method deemed appropriate in each case.
Different SIE formulations can be applied to derive the matrix system of linear equations. Among the many possibilities, we have selected the electric (J) and magnetic (M) current combined field integral equation (JMCFIE) formulation, as it poses a well-conditioned system of linear equations for both the equivalent electric and magnetic currents. For the discretization of the integral equations we used the so-called multiregion (MR) piecewise vector basis functions, which implicitly satisfy the boundary conditions for the currents on the boundary surfaces and line junctions where different material regions intersect. The global and local MVPs required in (1) have been sped up using a hybrid MPI/OpenMP parallel implementation of the multilevel fast multipole algorithm -fast Fourier transform (MLFMA-FFT), which is applied in synergy with the DD scheme. The interested reader is referred to [20] and [22] for a more detailed explanation of the method.
In order to achieve good convergence with the above preconditioner, it is crucial to enforce the proper transmission conditions among touching subdomains. We apply the formulation of [20], which consists of enlarging the subdomains leading to the so-called augmented subdomains. With this scheme, it is assumed that the near fields along the subdomain enlargements are able to cancel the potentials due to the charge accumulation on the open edges. The preconditioner matrix is defined for the augmented (overlapping) subdomains as follows: where Z i = R i · Z · R T i is the impedance matrix block of the augmented subdomain D i , and R i and R T i are the restriction and prolongation matrices for D i respectively. Once these systems are solved, their solutions are restricted-back to the original non-overlapped subdomains and assembled together to provide the whole solution of currents at the actual stage using (1).
In our implementation, the enlargement to build the augmented subdomains is done by including ''flaps'' of a quarter to a half wavelength width (according to the medium with the longest wavelength touching the interface), belonging to the adjacent (touching) subdomain(s). Note that the use of larger buffer regions would not affect convergence or precision, although this might pose reduced efficiency. Otherwise, the restriction and prolongation matrices are not explicitly computed. Instead, the subdomains are constructed following an orderly sequence from the meshed CAD models, which are generated specifically to facilitate the application of the domain decomposition. The procedure is illustrated in Fig. 1, compared to the conventional approach using artificial closing surfaces. The original (penetrable or non-penetrable) domain, whose boundary surface is denoted as S, is split into two subdomains, D 1 and D 2 , made up as D 1 = S 1 ∪ 12 and D 2 = S 2 ∪ 21 in the conventional DD partition, and D 1 = S 1 ∪ 12 and D 2 = S 2 ∪ 21 in the tear-andinterconnect approach. Using this last approach, D 1 and D 2 are not the subdomains really solved in each global iteration, FIGURE 1. Sketch of the geometry decomposition into subdomains using conventional and T&I approaches.
but the augmented subdomains are solved instead, build up as D 1 = S 1 ∪ 12 ∪ 21 and D 2 = S 2 ∪ 21 ∪ 12 . Once the local solutions of the augmented subdomains are found, these solutions are restricted back to the original subdomains, D 1 and D 2 , and assembled to provide the global solution. The portions of the solutions belonging to the enlargement flaps in each case ( 21 and 12 , respectively) are discarded, thus paving the way to the global iterative solver, which will not have to deal with the cancellation of the electric potentials produced by the charges accumulated along the contours. Remarkably, the size of the quarter to a half wavelength width enlargement flaps, 12 and 21 , as required by the tearand-interconnect approach, is usually much smaller than the auxiliar (artificial) surfaces 12 and 21 required to close the subdomains in the conventional DD approach.

III. NUMERICAL RESULTS
In this section, we illustrate the effectiveness of the SIE tear-and-interconnect DD implementation in solving real-life systems. To begin with, we consider two validation examples showing the ability and accuracy of this procedure to decompose the simulation of fully homogeneous dielectric objects. A dielectric polytetrafluoroethylene (Teflon) cylinder is considered first, with 12.5 m in length, 2.5 m radius, and a relative permittivity and permeability of r = 2.1 − j0.00042 and µ r = 1. Aθ polarized plane wave impinging on the cylinder with θ inc = 45 • and φ inc = 0 • at a frequency of 300 MHz is considered as the excitation. Following the tear-and-interconnect procedure, the cylinder is decomposed into three open subdomains, as illustrated in Fig. 2, where the overlapping flaps to construct the augmented subdomains are also shown (using the notation of Fig. 1). A fairly dense discretization was applied to benefit iterative convergence, posing 1,615,392 unknowns for the equivalent electric and magnetic currents. Fig. 3 shows the equivalent electric and magnetic current distributions on the surface for the DD solution, calculated with the JMCFIE formulation for penetrable bodies. The parallel MLFMA-FFT algorithm was applied to speed-up both the reference (single-domain) solution, as well as the local solutions and the global MVP within the DD method. In both cases the simulations were run in a single computing node with 64 parallel cores and 1 TB RAM. Looking at Fig. 3, it can be seen that no artifacts are observed around the tearing lines, with the current perfectly flowing between subdomains. Additionally, the DD current solution does not show appreciable differences with the reference solution calculated with conventional MLFMA-FFT (not shown as they are identical). Bistatic radar cross sections (RCS) were also calculated for the previous impinging wave, and shown in Fig. 4. An excellent agreement between the DD and the reference MLFMA-FFT solution is observed. Iterative count and time to solve (wall-clock time) convergences are gathered in Fig. 5 for both the DD and the reference solution. A strong reduction of the iteration number required to attain a residual error below 10 −6 is observed for the case of the DD solution, where only 48 iterations were required. This contrasts with the more than 1,400 iterations required for the the reference solution. Nevertheless, despite the sharp reduction of the iterative count, it is important to note that the application of the DD preconditioner poses a significant increase in the cost per iteration. Indeed, we have found that better convergence rates are reached if a high precision is ascribed to the inner solvers making up the preconditioner (consequently, the same iteration tolerances have been prescribed for inner problems as for the external solver). A better figure of merit to compare both solutions in terms of computational cost is the wall-clock time required to   complete the calculations. Looking at Fig. 3(down), it can be observed that the wall-clock convergence is much faster in the reference MFLMA-FFT solution. This is as expected, considering that this is a well-conditioned homogeneous example, without multiscale features or multiple materials, which was entirely solved on a shared-memory computing node. Using DD approaches for this kind of well-conditioned problems does not report any benefit, as the reduction in the number of Krylov iterations does not compensate the much higher cost per iteration. Such well-conditioned and ''medium-sized'' problems can be solved faster using MLFMA or MLFMA-FFT, provided they are properly parallelized to benefit from the availability of many parallel cores.
Next, a dielectric slab is considered with 12 × 12 × 0.25 m 3 and a relative permittivity and permeability of r = 2 and µ r = 1. The excitation is an oblique plane wave with incident angles θ inc = 45 • and φ inc = 0 • andθ polarized at the frequency 300 MHz. A total of 7,381,344 unknowns are applied to model the equivalent electric and magnetic currents on the slab surfaces. The dielectric slab is divided into nine mosaic-like subdomains, as shown in Fig. 6. Also in this case the subdomains are open, which is not an inconvenience to obtain proper local solutions through the JMCFIE formulation. As an example, consider the first subdomain, in the upper left corner, which is represented in more detail in the inset. The subdomain is build up as D 1 = S 1 ∪ 12 ∪ 14 ∪ 15 , while the augmented (actually solved) subdomain is The other subdomains and augmented subdomains are constructed similarly. The induced electric and magnetic current densities are shown in Fig. 7, where no artifacts are observed around the tearing lines, with the current density flowing perfectly across the different subdomains. Regarding convergences, in Fig. 8, conclusions similar to those of the previous example can be drawn: with the DD solution the number of iterations is drastically reduced, although the wall-clock time is higher for these homogeneous, well-conditioned problems. Nevertheless, the above two examples demonstrate the validity of the tearing-and-interconnect approach to subdivide the  solution of homogeneous dielectrics without need of defining artificial surfaces to close the subdomains.
A different example is considered next, consisting of an assembly of Au nanorods (NR) onto a Polystyrene (PS) bead. The PS bead has a diameter of 400 nm and relative permittivity r = 1.5. The NRs have circular section and they are ended with spherical end-caps, with a diameter of 16 nm and a total length of 38 nm. The excitation of this problem is a normally impinging plane wave (laser) at 633 nm. The relative (dispersive) permittivity of gold at this wavelength is interpolated from the measurements of [26], being r = −11.7522 − 1.2598i. This kind of real-life hybrid dielectric/plasmonic system is of interest to surface-enhanced Raman scattering (SERS) chemical sensing of analytes, as they can directly interact with the NR surfaces [27]. The tunable Au NR density can be used to optimize the SERS efficiency of these hybrid nanomaterials. Nevertheless, the presence of multiple materials governed by a very different physics exhibiting rather high dielectric contrast, such as in this case the PS surface (dielectric) and the embedded gold nanorods (plasmonic at visible frequencies), poses a slow convergence of the iterative solver. The above is compounded by the fact that NRs are embedded onto the PS bead. Although the multiple-material junctions can be precisely modeled by the MR piecewise basis/testing functions, its presence hinders the already challenging multiphysics problem.
To overcome the potential lack of convergence, the complete problem is partitioned as sketched in Fig. 9(left). Considering first the large size of this system, it is split into 8 main subdomains, one for each octant. Next, considering the very different physical nature of the PS bead and the Au NRs, each NR (and its respective plasmonic-todielectric junction) is included into a different subdomain. In all, the domain decomposition procedure of this system with 496 Au NRs leads to 504 subdomains, 8 for the PS dielectric sphere and 496 for the NRs. Based on this decomposition, we apply different solvers tailored to the different subdomains. Subdomains 1 to 8 are solved iteratively via MLFMA-FFT, while the remaining subdomains (NRs) are solved by direct MoM factorization. In this particular case, as all the NRs are equal in shape and material, the same impedance matrix block is recycled for the calculations of the 496 NRs, thereby benefiting from the multiple repetitions (which is a usual feature of these nanosystems). Global interactions between subdomains are accelerated through parallel distributed MLFMA-FFT.
Given the presence of localized surface plasmon resonances (LSPRs) supporting very fast (subwavelength) field fluctuations, a high surface mesh density is considered for the Au NRs, with an edge size around 1/400 wavelengths. The size is reduced to around 1/20 wavelengths for the PS sphere. This poses a total of 6,754,224 unknowns for the electric and magnetic equivalent currents. The JMCFIE SIE formulation is applied, and a relative error norm of 10 −6 is considered to halt the global Krylov iterative solver. The calculations for both the DD solution and the MLFMA-FFT reference solutions were run in a computer cluster of 4 computing nodes with 64 parallel cores and 1 TB RAM each. For this, highly scalable distributed parallel implementations based on hybrid programming with the Message Passing Interface (MPI) and OpenMP pragmas were used for both the DD and the MLFMA-FFT methods. Fig. 9 (center and right) shows the equivalent electric and magnetic current distributions on the surface for this example, calculated through the distributed DD implementation. As in the previous examples, no artifacts are observed around the tearing lines between the different PS octants or the NR subdomains. Iterative count and time to solve (wall-clock time) convergences are shown in Fig. 10 for both the distributed DD and MLFMA-FFT solutions. A fast convergence is observed for the DD solution, requiring only 35 outer Krylov iterations to attain a residual error below 10 −6 . In comparison, it can be observed that MLFMA-FFT fails to reach the prescribed residual error within 200 iterations (posing a residual error above 5 · 10 −2 ). A more big picture is obtained by observing the wall-clock time convergence. Unlike in the previous canonical examples, which lacked a multi-scale component and were solved on a single shared-memory node, in this real-life case the proposed DD method brings a definitive advantage both in terms of iteration count and solving time, enabling convergence and providing a solution with a high degree of accuracy in just over 5 hours. A solution of such precision would be very difficult to achieve with MLFMA-FFT (if even possible), as it begins to stagnate at high residual errors.

IV. CONCLUSIONS
In this work we have extended the tear-and-interconnect domain decomposition approach to the decomposition of homogeneous and/or piecewise homogeneous penetrable objects. Importantly, even though the T&I methodology poses local subproblems containing open dielectric surfaces, they belong to larger closed dielectric regions. Therefore, these apparently open local problems are indeed closed by incoming radiation from the other subdomains (as collected in the local excitation vector), and the usual dielectric formulation can be perfectly applied (defined over the closed union of subdomains). The numerical examples illustrate the applicability of the DD approach for solving large-scale electrical problems including multi-scale features. In particular, the last example of the PS bead with embedded NRs is also useful to infer the advantage of the proposed T&I-DD approach compared to the conventional method based on closing subdomains. In the latter, the artificial boundary surfaces required to close the PS-bead octant subdomains would be even larger than the actual dielectric surfaces of the original problem, posing a much larger number of unknowns and burdening the computational cost. Nevertheless, it is also worth mentioning that the conventional DD partition method could be more useful in the case of the NR subdomains, since in this case no artificial surfaces between the rods and the PS-sphere are required (as they are made of different materials). In this sense, an additional advantage of the tear-an-interconnect proposed solution is that it can be easily combined with the conventional DD approach, thus giving the chance to apply one or the other depending on the specific characteristics of the problem to be solved. He became a full professor, in 2019. His current research interests include integral-equation numerical methods in computational electromagnetics, with a focus on fast solvers, domain decomposition techniques and parallel supercomputing, plasmonics, metamaterials, and bio-sensing applications. He is also involved in the evaluation and control of electromagnetic compatibility and the interference of cosite antennas and transceivers. He received the International PRACE Award and the Intel Itanium Solutions Innovation Award, in 2009, for his work on the integration of electromagnetics and supercomputing.
LUIS LANDESA received the M.S. and Ph.D. degrees in electrical engineering from the Universidade de Vigo, in 1995 and 1998, respectively. In 1999, he joined the Universidad de Extremadura as an Associate Professor, and he is responsible for the Computational Electromagnetics Group in this university. He is also responsible for the bachelor's degree in telecommunications engineering with the Universidad de Extremadura, where he teaches courses on electromagnetics. He has more than 35 articles in refereed journals. His current research is focused on fast algorithms in computational electromagnetics, supercomputing, and antenna arrays. He was a member of the Universidade de Vigo and the Spanish Council of Scientific Research. He received the International PRACE Award in supercomputing, in 2009, for his work on integration electromagnetics and supercomputing.
FERNANDO OBELLEIRO was born in Galicia, Spain, in 1968. He received the M.S. and Ph.D. degrees in telecommunication engineering from the Universidade de Vigo, Spain, in 1991 and 1994, respectively. In 1993, he was a Visiting Scholar with the ElectroScience Laboratory, The Ohio State University. In 1991, he joined the Faculty of the Telecommunication Engineering School, Universidade de Vigo. He became an associate professor, in 1994, and a full professor, in 2001. He also cooperates with the Spanish Navy, as a Technical Consultant on electronic warfare, radar signature, and electromagnetic compatibility issues. His current research interests include computational electromagnetics, supercomputing, electromagnetic compatibility, metamaterials, and nanoplasmonic modeling for bio-sensing applications. In 2009, he received the International PRACE Award and the Intel Innovation Award, both for his work on the integration electromagnetics and supercomputing.