Geometrical Features of Epidermal Growth Factor Receptor-Related Dimers Reveal the Mechanisms of Drug Resistance in Lung Cancer Patients

Epidermal growth factor receptor (EGFR) plays an important role in lung cell proliferation. Dimerization of EGFR family members and other receptor tyrosine kinases (RTKs) act as a vital controller for lung cell life cycle signals. Mutations in the kinase domain of EGFR may disorder the signaling networks and lead to cancer. Drug resistance occurs in several generations of EGFR drugs due to genetic mutations, and there is a very less understanding about the mechanism of EGFR-mutated drug resistance. In this work, we investigate the mechanism of wild type EGFR protein and its drug-sensitive and drug-resistant mutations. We performed molecular dynamics (MD) simulation for the 10-ns EGFR-drug mutant complex and investigated the structures’ geometrical properties. With features extracted by alpha shape modeling, different geometrical properties, such as matching rates of atom solid angles at interaction sites and centroid distances between interfacial atoms, were calculated to characterize the binding intensity. Wilcoxon rank sum test was applied to reveal the differences between mutations based on extracted properties. We have developed a framework that explains the drug resistance mechanism based on geometrical properties and binding free energy. Results revealed that drug-sensitive mutants have tighter interactions with corresponding RTK in the complex for all protein-drug systems, while drug-resistant mutants are bound looser. The extracted geometrical properties of the drug mutant complex help understand the drug response mechanism at atomic level.


I. INTRODUCTION A. BACKGROUND
Lung cancer is the most deadly type of cancer, having the lowest survival rate among all types [1], [2].Non-small cell lung carcinoma (NSCLC) is the primary type of lung cancer that covers about 85% of all the lung cancer cases, and therefore it is an active research problem [3].Epidermal growth factor receptor (EGFR), a member of the ErbB family, is a vital controller on signal pathways of cell proliferation [4], [5].As a transmembrane protein, the extracellular domain of EGFR can be activated by binding to a ligand called a growth factor.A homodimer or a heterodimer of EGFR will The associate editor coordinating the review of this manuscript and approving it for publication was G. R. Sinha .then be formed, leading to specific residues' phosphorylation on intra-cellular tyrosine kinase domain and switch on the downstream signal pathways.Thus, dimerization proves to be one of the essential parts of signaling [6], [7].
Researches have confirmed that EGFR plays a vital role in lung cancer pathology, and overexpression of EGFR can be found in about 40% 80% of NSCLC patients [8], [9].Mutations on the EGFR gene are the root cause of the signaling disturbance and will lead to abnormal cell proliferation [10].
Tyrosine kinase inhibitors (TKIs) like Gefitinib or Erlotinib are designed to target EGFR and stop the over-activated signal pathways.The drug molecules can occupy ATP's binding site, blocking the downstream signaling and reducing the abnormal proliferation.These drugs proved to be effective in therapies and are accepted as first-line treatment for NSCLC patients suffering from EGFR mutations [11].However, in clinical therapies, drug resistance usually appears in about a year [12], [13].Several studies have been conducted in the past decade to understand the drug resistance mechanism due to a secondary point mutation T790M, which accounts for about 50% of all the resistant cases [14]- [16].These mutations decreased the binding affinity in the EGFR-TKI system and weaken the effectiveness of TKIs

B. LITERATURE REVIEW
Previous research have been conducted on molecular dynamics simulation (MD) to understand the drug resistance process in lung cancer patients [17]- [19].Sanders et al. simulated the binding of signal ligands to the extracellular domain of EGFR, studying the affinity of binding with free energy [20].Qureshi et al. investigated EGFR mutants' domains using parametric models and complex networks [21], [22].Principal component analysis can also be used to investigate the structural dynamics [23] and visualization of protein-drug interactions [24].Dixit et al. applied multiscale simulations to study long-range communications in the EGFR kinase domain [25].Ghosh et al. analyzed the hydrogen bonds of the EGFR heterodimer [15].

C. SELECTION OF THE MODELS AND RESEARCH PROCEDURES
This study analyzed the interactions between EGFR mutations and their RTK partners in dimer-drug complexes with computational methods.It is known that apart from ErbB family members, RTKs like Insulin-like growth factor 1 receptor (IGF-1R) [26], [27] can also dimerize with EGFR or its mutants and join the cell signaling network.Thus, we took both EGFR-EGFR homodimers and IGF-1R-EGFR heterodimers for MD simulations.Gefitinib (Iressa, ZD1839) is a first-generation TKI to treat NSCLC patients [28] and was selected to set up the dimer-drug complex.Exon 21 L858R, exon 19 deletion (short as del 19) [29], [30] and G719S [31] mutations are more activate and sensitive to Gefitinib, while secondary mutations T854A, L858R-T790M (short as T790M) [29], [30] and L858R-T790M-C797S (short as C797S) [32], [33] are drug-resistant.It is also shown that the αC-β4 loop is an important part of protein kinases and has a significant influence on protein structures and functions [34].Insertion mutations on αC-β4 loop of EGFR are found to have a relationship with drug resistance of first generation drugs including Gefitinib.The mutations also influence interfaces of protein-protein interactions and are related with asymmetric dimerization of EGFR, which is the main focus of our work [34], [35].The αC-β4 loop exists in exon 20 of the protein kinase, and mutations on exon 20 may have a relationship with drug resistance, such as T790M and C797S.Here, we choose the D770_N771insNPG, which is an exon 20 insertion (short as Ins 20 in this work) and is found in αC-β4 loop of EGFR [35].The crystal structure can be obtained from PDB bank(PDB:4LRM) [36].Wild type (WT) EGFR is also taken as a reference.The WT EGFR is not sensitive to the drug, and former research showed that it behaves similarly to drug-resistant mutants [37], [38].These mutations were dimerized with RTK partners, respectively, for comparing.
To simulate the reactions in dimer-drug complexes, MD simulations were implemented for all the mutations in the same environment.Then we used three-dimensional Alpha Shape modeling [39], [40] to extract the geometrical information from all the MD trajectories.For atoms at interaction sites, matching rates of their solid angles and centroid distances between the interaction atoms on different chains are calculated to describe the binding affinities.Wilcoxon rank sum test [41], [42] is then applied to check the difference of extracted properties between two kinds of mutations.After that, as a mature method, binding free energy was also used to compare geometrical analysis.These methods will reveal the influence of EGFR mutations on dimerization in the dimer-drug complex and provide a better understanding of lung cancer drug resistance mechanisms.The paper is organized as follows: In Section 2, protein system preparation and analysis methods are described.Section 3 presents result and discussion, and finally Section 4 concludes the paper with future prospects.The whole research process is shown in Figure 1.

A. PREPARING MODELS FOR EGFR MUTANTS AND DIMER-DRUG COMPLEXES
EGFR is a multi-domain protein containing a large extracellular domain, transmembrane domain, and kinase domain.The dimerization of the tyrosine kinase (TK) domain is related to our topic.A crystal structure of wild type EGFR kinase domain homodimer (PDB:2GS2, resolution 2.80 Å) [43] is selected as a structural template, and structure of IGF-1R is extracted from PDB:5FXQ (resolution 2.30 Å) [44].We applied Rosetta to generate all the mutations of EGFR based on one monomer from the EGFR dimer [45].Mutations include L858R, exon 19 deletion, G719S, T854A, L858R-T790M, L858R-T790M-C797S and exon 20 insertion, D770_N771insNPG in αC-β4 loop.We first filled the template model gaps to obtain a whole chain structure for the alpha shape algorithm, and all necessary fragments including αC-β4 loop is contained in the model.Rosetta provides a fragment library, Robetta [46], to decompose the protein into 9mer and 3mer fragments.Rosetta comparative modeling (CM) protocol generates different possible structures based on fragment files and rank the energy of the output structures.Extractpdb protocol in Rosetta is designed to select the most stable structure, namely the structure with the lowest energy.Exon19 deletion mutant was generated in the same way.Other mutants of EGFR are mostly point mutations.For example, L858R means a replacement from Leucine (L) to Arginine (R) at the 858th position on the chain of residues.The mutated structures can be derived FIGURE 1. Flow diagram that shows the whole research process.We extract the templates from PDB and perform modeling using Chimera.We prepare EGFR homodimers and heterodimers with Gefitinib drug-molecule.Drug-sensitive and drug resistive types characterize the systems.After that, we perform MD simulation for 10-ns and use alpha shape modeling and binding free energies to evaluate the geometrical properties.Simulation results reveal interesting insights that explain the drug resistance mechanism.by a high-resolution ddgmonomer (HRDM) protocol.In the pre-minimization procedure, we set harmonic distance constraints of 9 Angstroms for all the C-alpha atoms.The minimization procedure is run for three rounds, with a repulsive term set to 100 percent weight and Rosetta parameter file score12 as input parameters.An example of a modeling result is shown in Figures 2A and 2B.
Models of EGFR-RTK dimers were built by the alignment tool of Chimera [47].Our EGFR-EGFR homodimer template has two peptide chains.The research found that TKI-blocked EGFR with an intact C-terminal lobe (C-lobe) face can be re-activated by the N-lobe and ATP-binding site of its RTK partner [48].To avoid this phenomenon's interference, we selected one chain with the N-terminal lobe (N-lobe) face at the dimer interaction area, and EGFR (wild type or its mutants) were aligned to this position.The other chain's C-lobe face is near the ATP binding site and located in the interaction area.RTK partners were aligned to this position to establish the whole dimer.The Gefitinib molecule's structure and position are extracted from a crystal structure of the EGFR-drug complex (PDB:2ITY) [43] and aligned to the same drug docking site method.An example of a dimer-drug complex structure is shown in Figure 2C.

B. MOLECULAR DYNAMICS (MD) SIMULATION
MD simulation is a widely used tool for analyzing the properties and behavior of molecules.With initial data, including coordinates, potential energies, and force fields as input, MD algorithm calculates all the target atoms' motions by Newton's equations [49].In this study, we applied the Amber software suite to execute MD simulations on all the dimer-drug complexes [50] based on the explicit-solvent model.We first solvated the complex into a water-box with Ff99SB and gaff force fields for the simulation procedures.The solvent system was neutralized with Na+ and Cl-ions and the energy of the system was minimized.Procedures  of heating for 50 picoseconds(ps), density equilibrium for 50ps, and constant pressure equilibrium for four nanoseconds (ns) were performed in sequence.After all these steps, MD simulation was performed for 10ns, with a sampling interval of 10ps, resulting in 1000 frames' trajectory for each protein systems.The production MD files are analyzed using computational methods.

C. GEOMETRICAL ANALYSIS OF EGFR-RTK INTERACTION IN DIMER-DRUG COMPLEXES 1) ALPHA SHAPE MODELING
Alpha shape modeling is a convenient tool to extract geometrical features.As a linear approximation method, the alpha shape algorithm facilitates the geometrical data and reconstructs the target object's surface to obtain the properties [51].The alpha shape theory is based on triangulation algorithms.The basic definition of triangulation is that after the triangulation of a particular set of points, no point in the set will be located in the circumcircle of any triangles generated by the algorithm.Figure 3 shows examples on the same point set A, B, C, D. Figure 3A shows a triangulation that does not meet the criterion and needs to be re-generated, while Figure 3B shows a successful triangulation.
Different algorithms are designed to implement the triangulation theory, and Delaunay triangulation is the most popular method.The Delaunay triangulation algorithm aims to maximize the minimum of all angles in the simplexes [52], with criterion as follows: Here, each pair (xi, yi) stands for the coordinates of the corresponding point i.If the determinant's value is positive, The alpha shape algorithm we adopted starts from 3D Delaunay triangulation.Points in dataset l form tetrahedrons instead of 2D triangles, and circumscribed spheres of every tetrahedron are calculated.A value α is defined, and squared radii of the spheres are required to be smaller than or equal to α, while no points in the dataset should appear inside the spheres.The algorithm examines and modifies the tetrahedrons until the requirements are reached, and the corresponding tetrahedrons form the alpha shape at value α.
In the dimer-drug complex, our interest focused on the heavy atoms that composed the molecule's skeleton.As different kinds of atoms have different mass, we implement a weighted alpha shape algorithm, as it considers positions of the points and weights [39].For this method, we set two points, p 1 = (p 1 , p 1 ) and p 2 = (p 2 , p 2 ) as example.p 1 and p 2 are the coordinates of points, while p 1 and p 2 are the values of their weights.Euclidean distance between the two points are set as p 1 , p 2 and the algorithm will compare the Euclidean distance with the sum of weights.The judgement formulas are shown as follows: Here, Equation ( 2) means the two points, p1 and p2, are orthogonal, while Equation ( 3) means the corresponding points are sub-orthogonal.Then, a point can be defined as: where α is a pre-defined real value.
For every tetrahedron composing the resulting weighted alpha shape model, there is a set of relevant points to the vertices of the tetrahedron.There should be an orthogonal point to all these points and is sub-orthogonal to all the rest points.When the condition is met, the algorithm is successfully ended.
In this study, Computational Geometry Algorithm Library (CGAL) was adopted to calculate the weighted Alpha Shape models of all the MD trajectory frames individually [53].The value of α was set as 0 for all the complexes.The atoms' coordinates were set as locations of points, and the square values of Van der Waals radii of the atoms were taken as point weights.Resulting weighted alpha shapes were collected as input data for further analysis.

2) MATCHING RATE OF INTERFACIAL ATOMS
As we mainly focused on the EGFR-RTK interactions in the complexes, states of atoms on the interface between two peptide chains are essential.Alpha shape algorithm can help generate the geometrical surface of a specific structure, and we made use of this function to fetch the target atoms.We collected the dimer's surface atoms based on alpha shape model and named these atoms as point set D. Then, we obtain point set E for surface atoms of monomer EGFR and point set R, which correspond to surface atoms of monomer RTK separately.The point set of interfacial atoms, named I, was obtained by set operations.These interfacial atoms can be further divided into atoms on EGFR chain and atoms on RTK chain, named as point set I E and I R respectively.S Point sets are obtained as follows: Convex or concave degree of alpha shape surface at interfacial atoms can describe the geometrical properties and characterize monomers' interactions.The solid angle is implemented as a quantization method of curvature [54].We set the vertices as A, B, C, and D in a tetrahedron, respectively.The dihedral angle between geometrical surfaces ABC and ACD is named as ϕ AC , dihedral angle between surface ABC and ABD is ϕ AB , and dihedral angle between surface ACD and ABD is ϕ AD .Solid angle of point A is calculated by these three values, and the equation is: where i stands for the index number of any tetrahedron with A as vertex, and the final result of is scaled into the range of [−1, 1].The value characterizes the surface's shape, where a positive result corresponds to a convex curve, and a negative result corresponds to a concave curve.With the values that can describe curvatures, we can indicate the interactions between the dimer's two peptide chains by their geometrical properties.For instance, we select an atom A on one chain of the dimer and atom B on the other chain.If these two atoms are on the same edge of any tetrahedron that connects the two chains, we take them as a pair.Suppose curvature at one atom is convex, and the other is concave.In that case, the shapes fit into each other, implying an intensive interaction between the two atoms, and the pair is recorded as matched.
On the contrary, if both curvatures at atoms are both convex or are both concave, then the interaction is weaker, and the pair is marked as unmatched.A two-dimensional example is shown in Figure 4.In the algorithm, we calculate the solid angle of A and B, named as A and B .Whether they are matched can be decided and recorded by the equation below: For each of the target model, we calculate the matching rate of two chains in the complex based on the following equation: where (i, j) is the atoms in a pair, and N is the total number of all the pairs.The algorithm was implemented on each frame of the MD trajectory, and the results were collected to analyze the interaction properties.angle is calculated based on all the triangles with A as vertex, namely ABM, ABN, etc.As curvature at points A and B are both concave, surface shapes do not fit into each other.Thus, the interaction here is weak, the atoms do not match.(B) The curvature at point C is concave, but D is convex, leading to a better interaction and the atoms match according to the decision formula.

3) WILCOXON RANK SUM TEST
The Wilcoxon rank sum test, also known as Mann-Whitney U test, is designed to judge whether two data groups are taken out from the same population [55].First, we make a null hypothesis that the two datasets, named A and B, are from the same population.Then, we combined all the observed values from the two datasets into one group.The number of elements in subset A is recorded as n A , and number of data in subset B as n B .The total amount of all the data is N, and data in the new group is re-ranked from 1 to N. Next, the rank scores are divided by the group, and the sum of scores is calculated for each group, recorded as T A and T B respectively.The computing method of value U is as following equations: U is discrete or uniform so that we can set a critical value with assigned probability.U's statistic is compared with this vital value, and if U is larger, we will reject the null hypothesis and declare that the two datasets are not from the same population.Outputs of Wilcoxon rank sum test can quantize the difference between distributions of two datasets and check whether the properties of different mutations are similar or different.

4) CENTROID DISTANCES BETWEEN INTERFACIAL ATOMS
The interfacial atoms are derived as methods above.According to which peptide chain of the dimer they are located on, the interfacial atoms can be divided into two groups, and each group has a mass center.We take a group of point objects as an example.The total number of the objects is N, and their masses are set as m 1 , m 2 , m 3 , . . ., m N .All these point objects are supposed to be on a specific axis x, with coordinates x 1 , x 2 , x 3 , . . ., x N .The coordinate of mass center can be calculated as: For a 2D or 3D point object, the algorithm follows similar ideas.In this study, we applied the cpptraj tool in amber to obtain the locations of mass centers [56].Centroid distances are then calculated on each frame among the MD trajectory.These output data can intuitively evaluate the degree of interactions between two peptide chains in the system.

5) BINDING FREE ENERGY IN THE COMPLEX
Free energy is a widely accepted metric for describing the binding affinity between segments of a system.In a solvent environment, it is quite complicated to calculate the binding free energy value directly.Therefore, the computing method is based on the thermodynamic cycle, with the equation as follows [57], [58]: where G sol,bind stands for the difference of energy in the solvent environment when the system changes between bounded and unbounded states.G vac,bind stands for the same energy difference but in the vacuum environment.G sol,complex , G sol,receptor and G sol,ligand are the free energies in the solvent environment for the whole dimer-drug complex, receptor component (here is the RTK monomer), and ligand component (here is the EGFR-drug complex), respectively.For a stable system, binding free energy is always negative, and a more negative value of binding free energy represents a more stable binding.Here we implement Molecular mechanics generalized Born and Surface Area (MM/GBSA) protocol provided by Amber to calculate the binding free energy.The output data will state the intensities of interactions in different systems and validate the results obtained from geometrical properties.

III. RESULTS AND DISCUSSIONS A. ESTABLISHING OF DIMER-DRUG COMPLEXES, MD SIMULATIONS, AND ALPHA SHAPE MODELING
The wild type EGFR, together with all the six mutants, form the target dimers with EGFR or IGF-1R monomer as their partner, respectively.A molecule of Gefitinib is aligned into the 14 binding site residues on the EGFR-Gefitinib crystal structure template.MD simulation of each dimer-drug complexes is executed in an explicit-solvent environment by Amber.The equilibration stage, which lasts for 4ns with a sampling interval of 2ps, is designed to guarantee the prepared system is reliable for further simulation.As an inspection, we applied root-mean-square-deviation (RMSD) on frames of equilibration trajectories, and output data for VOLUME 9, 2021 FIGURE 5. RMSD curves of homodimer-drug complexes, with EGFR as RTK partner, plotted on frames of equilibration trajectories.Wild type complex is colored grey, while complexes with drug-sensitive mutants are colored orange, and complexes with drug-resistant mutants are colored blue.FIGURE 6. RMSD curves of heterodimer-drug complexes, with IGF-1R as RTK partner, plotted on frames of equilibration trajectories.Wild type complex is colored grey, while complexes with drug-sensitive mutants are colored orange, and complexes with drug-resistant mutants are colored blue.
all the complexes are shown in figure 5 and figure 6.It is demonstrated that all the curves are leveling off and are acceptable for subsequent steps.MD simulations are applied to the complexes, and trajectories of 1000 frames are generated for each system.

B. ANALYZING GEOMETRICAL PROPERTIES TO REVEAL THE MECHANISMS OF DRUG RESISTANCE
As dimerization is a vital signaling process, the interactions between EGFR and RTK are essential features that indicate the dimer's stability.Interfacial atoms at the interaction sites between two peptide chains were obtained with a weighted alpha shape algorithm.An example that visually displays the result of alpha shape modeling is showed in Figure 7.For each of the dimer-drug complexes, matching rates of all the 1000 frames are calculated.Results are shown in Figures 8 and 9, where the moving average is applied to smooth the curves.In Figure 8, the curves are matching rates between EGFR-EGFR dimer.shows matching rates of three drug-sensitive mutants (L858R, exon 19 deletion, and G719S) dimerized with EGFR partner, while Figure 8B shows three drug-resistant mutants (T854A, L858R-T790M, L858R-T790M-C797S, and D770_N771insNPG).In Figure 8C, a simple comparison is made.Wild type EGFR is plotted as a reference, and L858R is taken as a representative for drug-resistant dimers.L858R-T790M mutant, with a secondary mutation based on L858R, is selected to represent drug-resistant dimers as they're closely related to each other.Figure 9 shows the curves of IGF-1R-EGFR heterodimer with similar mutants.The high and low order of specific data in the IGF-1R-EGFR results isn't always the same as what appeared in figure 8, but the general regularities are similar.
The average value of frames for each dimer is recorded in table 1. Box plots of the data are shown in Figure 10, providing a more intuitive way to compare and contrast the differences between groups.We can notice from the figures and the table that the drug-sensitive dimers have a higher matching rate than the drug resistance group, which is the same in both homodimer and heterodimer examples.It means that with Gefitinib added to the system, the drugsensitive dimers, which used to be more active, are now more stable, and more interactions exist in the system.Their EGFR-RTK-drug complex is easier to form and continue to exist.As EGFR and RTK are bound to each other, with drug molecules blocking the ATP binding site, these proteins will not join the signal network.Thus, the over-activation of EGFR is reduced, and downstream signaling will return to normal.
On the contrary, for drug-resistant dimers, the interactions are remarkably weakened by changing geometrical properties due to drug-resistant mutations.The systems become more unstable, giving the drug less chance to come into effect and leading to the loss of efficacy.Results of Wild type EGFR are analogous to drug-resistance examples, which are consistent with the clinical information.
As all the data from complexes are independent from each other, the rank sum test can be executed as a tool for quantizing the differences between datasets.Results reveal the degree of similarity and difference in the dimer groups.Here, we calculated the test result for each pair of the dimers with the ranksum function of MATLAB.The function returns a value for each pair, and the larger the value is, the more consistent the two datasets are.As the values are small, we took a log of them and then filled them in a matrix for    and bottom-right corner, while differences are more extensive in other areas.This indicates that the mutants can be divided into two groups according to similar geometrical properties, with L858R, exon 19 deletions, G719S mutants in one  group, and T854A, L858R-T790M, L858R-T790M-C797S, D770_N771insNPG mutants in the other.Wild type EGFR is also not sensitive to the drug and behaves like a drug-resistant mutant.The division meets with the division based on drug resistance, proving that the extracted geometrical features are reliable.The vital information is retained and can distinguish the drug-sensitive mutations from drug-resistance ones.As T790M, C797S and D770_N771insNPG all exist on exon 20 and have similar behavior.Our results demonstrate that the exon 20 mutations have a relationship with the mechanisms of drug resistance.Results of D770_N771insNPG dimers can also show that the αC-β4 loop influences the interactions in both homodimers and heterodimers, indicating that the loop may also play an important role in EGFR dimerization.

C. CENTROID DISTANCE RESULTS
Centroid distances of the interfacial atoms are calculated based on the trajectory frames of all the complexes.The distribution of results is shown in Figure 12. Results show that distances between drug-sensitive mutants and their partners are less than distances in drug-resistant dimers in both homodimer and heterodimer, implying nearer locations and tighter interactions.The results are the same as what we found by matching rates and the previous analysis.

D. BINDING FREE ENERGY OF THE DIMERS
The free energy of binding between the EGFR and RTK partners is calculated by MM/GBSA tool provided by Amber.Distribution of output data corresponding to all the complexes is shown in Figure 13.In both homodimer and heterodimer examples, binding free energy corresponds to drug-sensitive mutants are greater than energy corresponds to drug-resistant ones.The outcomings indicate that binding affinities between drug-sensitive mutants and their partners are tighter, and the complexes are more stable than the drug-resistant ones.These results further confirm our geometrical analysis.

IV. CONCLUSION
In this paper, we focused on interactions between EGFR and its RTK partner, as the dimerization between them plays a crucial role in the cellular signaling network.We established a set of models of drug-sensitive and drug-resistant EGFR mutants, dimerized them with EGFR and IGF-1R, respectively.A TKI, Gefitinib, is also added into each system to form a dimer-drug complex.We applied MD simulations on the complexes to simulate their behaviors for further analysis.
Geometrical properties were extracted with alpha shape modeling.We obtained the solid angles of interface atoms and calculated the matching rate on each phase of MD trajectories to estimate the interactions between partners in the complex.The centroid distance between these interface atoms is also computed as a criterion.It is found that drug-sensitive mutants are bound closely to RTK partners, while binding intensity between drug-resistant mutants and their partners are less.Binding free energy was also taken as a verification method.Results from free energy analysis were similar to results derived from geometrical feathers and supported our findings.
Thus, we can conclude that mutations on EGFR peptide chains result in the changes of geometrical properties on the mutated proteins' surfaces.With the altering of shapes, the interactions between partners are also transformed.Drug-resistant mutations lead to the instability of the TKI blocked dimers, reducing the blocking effect of Gefitinib, and downstream signaling pathways are restored.The three methods are consistent with each other, explaining the mechanism of drug resistance in terms of geometrical properties, distance, and binding free energy.Results showed that mutations on exon 20 are important in drug resistance to Gefitinib, and the αC-β4 loop has a significant association with the asymmetric dimerization of EGFR.Further research can be applied on these topics to provide a deeper understanding of the drug resistance mechanism.
This study provides valuable insights into the EGFR mutation-induced drug resistance, helping to improve the targeted drug therapies for the treatments of NSCLC.The idea and algorithm, we designed to extract features and analyze the geometrical properties are proved to be reliable.These characterization methods have the potential to accomplish relevant tasks in further research.In the future, we will work with newer generation drugs and finding other useful features for drug resistance analysis.

FIGURE 2 .
FIGURE 2. Examples of model structures (A) Structure of L858R mutant (colored orange) and wild type EGFR (colored grey), mutation site 858 is colored blue.(B) Structure of L858R-T790M mutant (colored blue) and wild type EGFR (colored grey), secondary mutation site 790 is colored red.(C) Example of a dimer-drug complex.The wild type EGFR is colored grey, with RTK partner colored green.A molecule of Gefitinib is attached to the wild type EGFR.

FIGURE 3 .
FIGURE 3. Triangulation examples of point set A, B, C, D. The figure is drawn according to fundamental theories [51].(a) Unsuccessful triangulation that does not satisfy the criterion, as point D in the set, is located in triangle ABC's circumcircle.(b) Successful triangulation after changing the way of triangulation.
VOLUME 9, 2021 then point D locates in the circumcircle of the triangle formed by points A, B, and C.

FIGURE 4 .
FIGURE 4. 2D example of solid angle matching.(A) At point A, a solid angle is calculated based on all the triangles with A as vertex, namely ABM, ABN, etc.As curvature at points A and B are both concave, surface shapes do not fit into each other.Thus, the interaction here is weak, the atoms do not match.(B) The curvature at point C is concave, but D is convex, leading to a better interaction and the atoms match according to the decision formula.

FIGURE 7 .
FIGURE 7.An example of alpha shape modeling.(A) A dimer-drug complex with Wild type EGFR.(B) The alpha shape surface we obtained after executing the algorithm on complex in (A).

FIGURE 8 .
FIGURE 8. Matching rate of atom solid angles obtained from the 1000 trajectory frames of homodimer-drug complexes.Dashed lines are the average value the corresponding curves.(A) Complexes with drug-sensitive mutants.(B) Complexes with drug-resistant mutants.(C) Comparison between wild type EGFR, L858R complex, and L858R-T790M complex.

FIGURE 9 .TABLE 1 .
FIGURE 9. Matching rate of atom solid angles obtained from the 1000 trajectory frames of heterodimer-drug complexes.Dashed lines are the average value of the corresponding curves.(A) Complexes with drug-sensitive mutants.(B) Complexes with drug-resistant mutants.(C) Comparison between wild type EGFR, L858R complex, and L858R-T790M complex.

FIGURE 10 .
FIGURE 10.Matching rates of different complexes expressed by box plots.As the area is limited, we use L as a short form for complex with L858R mutant.Del stands for exon 19 deletion mutant, GS, TM, TA, CS and Ins corresponding to G719S, T790M, T854A, C797S and D770_N771insNPG mutants.(A) EGFR-EGFR homodimer complexes.(B) IGF-1R-EGFR heterodimer complexes.
comparison.Heatmap is used to plot the output, as shown in Figure11.The colors and values directly display results.In general, pairs of dimers are more similar to each other in the left-upper

FIGURE 11 .
FIGURE 11.Rank sum test results of matching rates, shown as a heatmap.Yellow color corresponds to a larger value, while blue color means a smaller one.(A) Test results of EGFR-EGFR homodimer complexes.(B) Test results of IGF-1R-EGFR heterodimer complexes.

FIGURE 12 .
FIGURE 12. Centroid distance of each complex expressed by box plots.The length unit is angstrom.Data of wild-type complexes are colored grey, while drug-sensitive complexes are colored orange, and drug-resistant ones colored blue.(A) EGFR-EGFR homodimer complexes.(B) IGF-1R-EGFR heterodimer complexes.

FIGURE 13 .
FIGURE 13.Distributions of binding free energy for all the complexes.In general, groups of output data behave the same as what we discussed.(A) Energy for EGFR-EGFR homodimer complexes.(B) Energy for IGF-1R-EGFR heterodimer complexes.