Efficient Atomistic Simulation of Heterostructure Field-Effect Transistors

In this paper, atomistic-level quantum mechanical simulations are performed for nanoscale field-effect transistors (FETs) with lateral or vertical heterojunction, within the non-equilibrium Green’s function formalism. For efficient simulation of such heterostructure FETs, a novel approach is developed where the Green’s functions are calculated by complementarily using the two algorithms of the recursive Green’s function and the <inline-formula> <tex-math notation="LaTeX">$R$ </tex-math></inline-formula>-matrix. The <inline-formula> <tex-math notation="LaTeX">$R$ </tex-math></inline-formula>-matrix algorithm is extended to seamlessly combine the two methods on the open system and an algorithm for the electron correlation function based on the extended <inline-formula> <tex-math notation="LaTeX">$R$ </tex-math></inline-formula>-matrix algorithm is also developed. The proposed method significantly reduces simulation time, making rigorous atomistic simulations of heterojunction FETs possible. As an application, device simulations are performed for the germanane/InSe vertical tunneling FET (VTFET) modeled through the first-principles density functional theory. Our simulation results reveal that the germanane/InSe VTFET is a promising candidate for future low power applications.


I. INTRODUCTION
Devices with heterojunctions are being widely considered for the future field-effect transistors (FETs) such as tunneling FETs (TFETs) [1]- [4] and Schottky barrier FETs [5], [6]. By employing the heterojunctions, the band gap and band offsets can be altered, thereby the device performances can be engineered. In order to properly describe such band properties and to naturally capture the interface effects in heterojunctions, it is essential to model the heterostructure devices in the atomistic level. Although the empirical tight-binding method, one of the most popular atomisticlevel approaches, has been widely adopted for nanodevice modeling, it is not adequate for heterojunction [7]. Recently, the first-principles density functional theory (DFT) has been increasingly applied to modeling heterostructure FETs, since the devices with any atomic structures can be rigorously simulated by self-consistently combining DFT with non-equilibrium Green's function (NEGF) that is the most prevailing quantum transport approach.
As a practical approach, recent works [8]- [10] have employed the simulation framework where the DFT and NEGF parts are performed sequentially. Namely, after modeling a device by the DFT method, the Hamiltonians with the basis set of linear combination of atomic orbitals (LCAO) are extracted and imported into a NEGF-Poisson solver. This framework has served a powerful simulation platform for the homogeneous devices and can also be applicable to devices with heterojunctions. For the latter, however, the large size of junction Hamiltonians can become an obstacle to efficient NEGF simulations. Especially, if a junction is modeled in DFT using the supercell approximation [11], to capture the right physics, the supercell should be large enough to prevent the coupling between images [12], [13], resulting in the increase of the size of junction Hamiltonian.
In the recursive Green's function (RGF) method [14]- [16], which is the widely-used NEGF algorithm, the computational burden to handle large-sized junction Hamiltonians can be reduced by regularly slicing the junction region into smaller slabs. It is in principle possible, but it is in practice not efficient when it comes to not-well structured atom arrangement. This kind of problem was tackled in [17], [18] by exploiting parallel computations. Other methods such as hierarchical Shur complement [19], [20] and fast inverse using nested dissection methods [21] are more efficient than RGF, but the usage of these methods is limited to the systems where the matrix decomposition processes (e.g., LDL T , LU) are applicable.
Mil'nikov et al. have developed the R-matrix propagation method for the NEGF simulations of both continuum [22] and atomistic devices [23]. The R-matrix method is not restricted to the shape of device geometry or Hamiltonian because the method not only takes a group of atoms arbitrarily from the device region but also uses only basic matrix operations. The R-matrix algorithm is therefore more suitable for handling junction Hamiltonians than the aforementioned methods.
The aim of this paper is to present a methodology to reduce the computational burden for NEGF simulations of FETs with heterojunctions modeled in the atomistic level. Our main idea is to use the R-matrix and RGF methods complementarily; i.e., the R-matrix method is employed for the junctions usually having large-sized Hamiltonians, and for those of the rest of the device (homogeneous unit cells), the RGF method is used. For this purpose, we extend the R-matrix algorithm so that the algorithm directly applies to the open systems. The elapsed time for calculating retarded Green's functions is reported to assess the proposed approach, and an application to an example device of germanane/InSe vertical TFET (VTFET) is demonstrated.

A. CONCEPT OF THE PROPOSED METHOD
Consider a heterostructure device that consists of a 'device region' and semi-infinite 'leads,' as shown in Fig. 1(a). The device region can be divided into homogeneous unit cells and junction region. The homogeneous unit cells have usually small-sized Hamiltonian, whereas the junction unit cell has relatively large-sized Hamiltonian. In the widely-used RGF method, the unit cell is usually chosen as the slab, and the numerical cost is cubically proportional to the size (N s ) of Hamiltonian for the slab. Thus, the large-sized junction Hamiltonian gives substantial burden to the NEGF simulation employing the RGF. To alleviate this burden, the R-matrix algorithm is applied to the junction region in the proposed method (termed as RGF+R-matrix method) as illustrated in Fig. 1(b). For the junction region, the operation count of the R-matrix algorithm scales as N 3 a n a N 2 c [23], where N a is the size of Hamiltonian for atom cluster a, n a is the number of cluster a in the junction, and N c is the size of Hamiltonian for the boundary atoms of atom cluster C. With a limited interaction range, N a and N c are typically much smaller than N s for the junction region. So, the numerical burden for the junction region can be reduced, thereby the numerical efficiency of whole NEGF simulation can be significantly improved.
Because the RGF method includes the open system while the R-matrix method was originally designed for the closed system [23], to combine the RGF and R-matrix methods seamlessly, the R-matrix method should be extended so that the algorithm applies to open system directly. In Sections II-B-II-D, we discuss that in detail, and the extended R-matrix method is then combined with the RGF method in Section II-E.

B. EXTENSION OF R-MATRIX METHOD TO OPEN SYSTEM
Let us start by writing down a linear equation describing an open system as where H DD and S DD are the Hamiltonian and the overlap matrices for the device region, respectively, DD is the selfenergy matrix which accounts for semi-infinite leads and scattering effect, if any (throughout this work, we neglect the scattering effects, but the method proposed in this work is applicable even if the scattering effects are included), is the wave function for the device region, and is the source term that describes the carrier injection. Since we consider the open system, the energy ε in Eq. (1) is usually given as a complex number having a small imaginary part (typically < 10 −9 ). Let C be a cluster of atoms arbitrarily taken from the device region and B be the rest of the device region. Then, H DD , S DD , and DD can be represented as The R-matrix method is similar to the contact block reduction (CBR) method [24], [25] in that device partitioning is performed, but the partitioning in the two methods is different. In the CBR method, the device region is divided into the boundary regions coupled with the leads and the rest of the device region while in the R-matrix method, the device region is divided into two regions C and B arbitrarily. Now Eq. (1) can be written as with

C.1. FORWARD PROPAGATION
Now we define the R matrix R cc as where c is the boundary atoms of C and P c is the projection operator from C onto c. The definition of R cc in this work is different from that in [23] in that Eq. (5) includes the self-energies. We now pick a group of atoms a ∈ B and add a into C. Note that a can be selected arbitrarily in principle. The wave functions at c and a can be obtained as where the setB = B − a and R aa = A −1 aa = (εS aa − H aa − aa ) −1 . Performing proper matrix operations for Eq. (6) as in [23], we can obtain whereR By taking only boundary atoms from the set c ∪ a (see Fig. 2(a)), we can construct new atom clusterc and its R matrix (Rcc), that is, where Pc is the projection operator from c ∪ a toc, and c and Rcc of an iteration step are used as c and R cc at the next iteration, respectively, and iterations continue. Rcc obtained at the last iteration is equal to the diagonal block of retarded Green's function, corresponding to the set a taken at the last iteration step. Eqs. (7) and (8) are consistent with Eqs. (17) and (18) in [23], respectively. This implies that, with the same recipes used in the closed system, R matrix can be propagated for the open system.

C.2. BACKWARD PROPAGATION
For the device region divided intoC = C ∪ a andB = B − a (see the left side of Fig. 2(b)), the retarded Green's function G can be given from G satisfies the Dyson's equations where The boundary terms of GCC in Eq. (12) can be written as whereb is the boundary atom set ofB. So the retarded Green's function corresponding to a is where Like in [23], ab and b a must be stored in the forward propagation to calculate G aa . To compute the off-diagonal parts of G, we move on to GBC. Performing the boundary projection for Eq. (13), we can obtain G ab and Gb a from GCB and GBC, respectively, and they are The initial Gbb for the backward propagation is set to be the final Rcc in the forward propagation. To calculate the matrix that plays the same role as Gbb in the next backward iteration, we build the retarded Green's function for the set a ∪b as In Eq. (20), by carrying out the boundary projection from the set a ∪b to b (see Fig. 2(b)), where b is the boundary atom set of B, G bb is obtained as where P b is the projection operator from the set a ∪b to b. G bb is used as Gbb of the next backward iteration, and the backward iterations continue.

D. CARRIER AND CURRENT DENSITIES BY EXTENDED R-MATRIX METHOD
The electron density by the NEGF method can be calculated using the electron correlation function G n . At an atom α, the electron density n α is calculated through where g s is the spin degeneracy, v α is the volume of α, and the subscript αα indicates the matrix block corresponding to atom α.
To derive the recipes for G n , let us start with the definition of G n of the device region divided into C and B. That is, where the superscript ' †' is the Hermitian conjugate of matrix and the in-scattering function in is defined as where N lead is the number of leads in the system and j and f j are the broadening matrix and Fermi-Dirac distribution function of lead j, respectively. With the self-energy j , j is given as G n satisfies the Dyson's equations where As in the calculation of G, the necessary parts of G n are calculated by the forward and backward recursive algorithms. In the forward propagation, the boundary parts of G n0 are calculated, and on-and off-diagonal blocks of G n are obtained by the backward recursion. We define R n as the boundary projection of G n0 . Then, from Eq. (28), the R matrix R n cc for the boundary atoms of C can be defined as Choosing a from B and adding it into C, we have Eq. (30) is associated with R n cc via Eq. (8). By performing the boundary projection from c ∪ a ontoc for Eq. (30), we can compute the matrix R ñ cc , which is the new R n cc for the next forward calculation.
Using Eqs. (26) and (27), the necessary parts of G n can be calculated in a similar manner to that for G, and they are The initial G ñ bb is equal to the final R n cc of the forward propagation, and the matrix G n bb (G ñ bb for the next iteration) can be obtained via The hole density is obtained from the hole correlation function G p , using the relation and the net drain current (J) is calculated by where q 0 is the elementary charge, h is the Planck constant, the subscript L indicates the left lead of the system, and the subscript β is the set of all the atoms coupled with the left lead. Eq. (35) is consistent with [16, eq. (154)].

E. RGF+R-MATRIX METHOD
A heterostructure FET can be divided into two parts: junction regions and homogeneous region consisting of identical unit cells. The junction regions play a critical role in determining the characteristics of the device while the homogeneous unit cells occupy most of the device spatially. The R-matrix method is numerically preferable for treating large-sized junction cells because an atom cluster can be constructed by adding a few number of atoms and only boundary atoms of the cluster contribute to calculating the related R matrices. However, for homogeneous unit cells usually represented by small-sized Hamiltonians, taking partial atoms for cluster construction is inefficient because it may increase the number of matrix operations. So, for the homogeneous unit cells, the RGF method, which is a well-established method for layered structures, is preferable. In order to sketch how the RGF and R-matrix methods are combined, let us consider a homogeneous device that has structurally identical N q unit cells, as shown in Fig. 3(a). We assume that all the atoms in each unit cell are coupled with the ones in the nearest unit cells, and we consider the unit cells as the slabs for RGF. Because a unit cell can be considered as an atom group, we can choose all the atoms in the slab q for the atom cluster a. Then, c andb are the slabs p = q − 1 and q + 1, respectively (see Fig. 3(a)), and they are not coupled each other. So the boundary atoms of c ∪ a are equal to a, satisfying Rcc =R aa . Now usingR aa in Eq. (8), we can write Rcc as or using the corresponding slab indices, Here R p has the meaning of the boundary parts of the consecutive slabs 1:p, that is One can see that Eq. (38) is the same as the definition of the left connected Green's function (g L p ) in the RGF [14]- [16]. Now we can write Eq. (37) as For a device region with open boundary conditions, the R matrices play the same role as the left-connected Green's functions in the RGF algorithm. By utilizing the relation between Rcc and g L q , we can combine the R-matrix method and the RGF method, as can be seen in Fig. 3(b). The RGF and R-matrix methods are applied to the homogeneous slabs and the junction region, respectively. For slab p at the left of the junction, g L p is calculated via Eq. (39) and used as R cc in Eq. (36) for the junction. For the final atom cluster of the junction, Rcc is calculated using Eq. (36) and it is treated as g L q−1 to calculate g L q of slab q on the right of the junction. The backward recursion can be carried out in a similar manner. When the process changes from RGF to Rmatrix, G q,q is considered as G bb , and for the reverse step, Gbb is used for the calculation of G q,q .

A. DFT SIMULATIONS AND HAMILTONIAN EXTRACTION
In this work, the SIESTA [26] code, a LCAO-based DFT tool, was utilized to model heterostructures in the atomistic level. The generalized gradient approximation of Perdew-Burke-Ernzerhof [27] was adopted for the exchangecorrelation functional, and double-ζ plus polarization pseudoatomic orbitals were employed for the basis set. To  demonstrate the efficiency of the RGF+R-matrix algorithm, we consider four heterostructures of two-dimensional (2D) materials: MoS 2 1T/2H/1T and silicene/silicane/silicene as the lateral heterojunctions ( Fig. 4(a)), and MoTe 2 /SnS 2 and germanane/InSe for the 2D VTFET structures (Fig. 4(b)).
For each material, the atomic positions in the primitive unit cell were relaxed first, until the maximum force became less than 0.05 eV/Å with 25 × 25 × 1 Monkhorst-Pack k-grids. Using the structurally optimized primitive cells, we then made the heterojunctions. As an example, let us consider MoS 2 1T/2H heterojunction illustrated in Fig. 4(c). First, for each material, the relaxed primitive cells were taken as the rectangular unit cells (RUCs), and with the RUCs, the junction supercell was constructed. To suppress the coupling between the periodic images, two buffer cells were attached to the left and right so that the junction supercell has total sixteen RUCs (see Fig. 4(c)), and the geometry optimization was conducted with the junction supercell. The periodic boundary conditions were imposed in the x-and y-directions while 10 nm vacuum layer were included in the z-direction. Other considered heterostructures were modeled in a similar way, treating the structure of Fig. 4(b) as a lateral junction of A/C/B [28].
For NEGF simulations, we extracted the necessary Hamiltonian blocks from the supercell consisting of the relaxed junctions and additional RUCs. For example, consider MoS 2 1T/2H heterostructure in Fig. 4(d). H A/B and W A/B are the Hamiltonians of the RUC A/B and between the nearest neighbor RUCs, respectively. H J is the on-site Hamiltonian of the junction. An extra RUC (shaded in Fig. 4(d)) was inserted between the junction and the left (right) region so that the coupling Hamiltonian between them was the same as W A (W B ), as in [29]. The overlap matrices were obtained in the same manner. The relaxed lattice parameters of the component materials and the extracted Hamiltonian sizes are summarized on Table 1.

B. COMPUTATIONAL EFFICIENCY
To assess the numerical efficiency of the RGF+R-matrix method, we have measured the elapsed CPU time per energy point for calculating the retarded Green's functions. Three algorithms, RGF-only, R-matrix-only, and RGF+R-matrix are compared. The computation was conducted using 20 cores, and the elapsed CPU time values for 20 cores were averaged. In case of the R-matrix-only and RGF+R-matrix methods, the number of picked atoms (size of the atom cluster a) were varied. As shown in Fig. 5, for all the considered structures, R-matrix-only and RGF+R-matrix exhibit the better performance than RGF-only. Applying the R-matrix algorithm to junction parts, the speed-up of from 1.7 to 4.3 times (41% to 77% reduction of elapsed CPU time) has been achieved, compared with the results of RGF-only. The RGF+R-matrix further improves the efficiency. This is because the RGF method is more efficient than the R-matrix method for handling the homogeneous unit cells of smallsized Hamiltonians as mentioned in the first paragraph of Section II-E.
To verify whether the retarded Green's functions are computed accurately by the RGF+R-matrix method, we have calculated the transmission function using 1000 energy points. As can be seen in Fig. 6, for all the simulated heterostructures, transmission functions of RGF+R-matrix  excellently match those of RGF-only. For the used 1000 energy points, the maximum difference of the calculated transmission was about 10 −5 . Note that the three methods of RGF, R-matrix, and RGF+R-matrix are the exact methods, thus mathematically, the final computation results by the three methods should be the same. This is the reason that we have chosen the widely-used RGF algorithm as the reference to verify the proposed RGF+R-matrix algorithm. However, the three methods are also recursive algorithms, so the round-off errors may accumulate during recursive iterations, leading to the different results numerically.

C. SELF-CONSISTENT DEVICE SIMULATION
As an application, we have performed a device simulation of germanane/InSe 2D VTFET using the RGF+R-matrix method. As can be seen in Figs. 7(a) and (b), germanane/InSe heterolayer has the direct band gap of 0.45 eV and the atomic orbitals of InSe and germanane contribute mainly to the conduction and valence bands, respectively, which implies that the band-to-band tunneling may occur between germanane and InSe. The schematic of the simulated device is illustrated in Fig. 7(c). The length of the overlap region (germanane/InSe heterolayer) is 25 nm and the gate length is 85 nm; the length of gate overlap for each source/drain is 30 nm. The source and drain lengths are 40 nm. Equivalent oxide thickness is 0.5 nm, and the additional thickness of the oxide at the top (bottom) of the source (drain) region is 0.54 nm (0.39 nm), which is equal to the atomic thickness of InSe (germanane) monolayer. The device is periodic in the y-direction, which is accounted for 20 transverse modes in this work. Source (germanane) and drain (InSe) regions are doped to be p-type and n-type, respectively, with the density of 10 20 cm −3 . A low drain voltage V DD of 0.4 V was applied. The threshold voltage (V th ) was defined such that OFF-state current was 10 −6 μA/μm, and we denote V GT as V TG − V th , where V TG is the top gate voltage. The bottom gate voltage (V BG ) was set to −0.8 V to ensure the TFET operation within the V GT window of 0 V and 0.4 V. The size of the extracted Hamiltonian and overlap matrices for the homogeneous unit cells was reduced by a mode space transformation [9] for more efficient implementation. Ballistic transport was assumed. The NEGF-Poisson selfconsistent calculations were carried out until convergence. The charge density was calculated by the algorithm described in Section II-D and the current values (I D ) were computed by both the methods of RGF-only and RGF+R-matrix. All the simulations were conducted assuming the temperature of 300 K.
It can be seen in Fig. 8(a) that the current values computed by RGF-only and RGF+R-matrix agree excellently. The simulated transistor has the ON-state current of 121 μA/μm, and the minimum subthreshold swing of the device is 12 mV/dec, offering low power applications. One can see the operation mechanism of the simulated VTFET in Figs. 8(b) and (c). At V GT = 0 V, the conduction band edge (CBE) of InSe is higher than the valence band edge (VBE) of germanane, so ultra-low tunneling current flows in the device. As the V GT increases, the CBE of InSe moves down in energy faster than the VBE of germanane, until at on-state, CBE of InSe becomes located below VBE of germanane. Then, the electrons in the valence band of germanane move easily to the conduction band of InSe, giving rise to a large current (see Fig. 8(c)). Note that, in the heterostructure TFETs, the resonances by the quasi-bound states in junction region can contribute to the total current when scattering effects are included [30]. So, the current values in Fig. 8(a) may be underestimated, since ballistic transport was assumed in this work.

IV. CONCLUSION
We have presented a method for computationally efficient NEGF simulations of the heterostructure FETs including large-sized junctions modeled in the atomistic level. The main idea is to employ the RGF and R-matrix algorithms for the small-sized homogeneous unit cells and the largesized junctions, respectively. For this purpose, the R-matrix method has been extended so that the algorithm applies to open systems directly. Compared with the usual RGF-only method, the Green's functions are calculated more efficiently using the proposed RGF+R-matrix method. As an application, the device simulations on germanane/InSe 2D VTFET have been demonstrated successfully. In this work, although we have considered the heterostructures modeled by DFT, the RGF+R-matrix method can be applied to the heterostructures modeled using other approaches such as tight-binding method and k·p method. In addition to the 2D heterostructure FETs that were selected for example devices in this work, the proposed method can be useful for the devices having heterojunctions of three-dimensional materials and also for the case where irregularities such as surface roughness, atom vacancies, and dopants are present.