T-DYNMOGA-Q w : Detecting Community From Dynamic Residue Interaction Energy Network and Its Application in Analyzing Lipase Thermostability

A new type of residue interaction network named residue interaction energy network (RINN) is built. Then, a multi-objective optimization dynamic network community discovery algorithm T-DYNMOGA-Q w has been proposed to detect communities from dynamic RINN. T-DYNMOGA-Q w sets a threshold during the initialization process and optimizes weighted modularity Q w as the objective function. Setting the threshold can better ﬁnd the stable structure in the dynamic RINN. The resolution limit of modularization has been broken by using objective function Q w . After Community detection from dynamic RINN of wild type of lipase (WTL) and its mutant 6B from 300K to 400K, it is found that the communities in 6B network can still maintain a tight structure even at higher temperature. Stable community is beneﬁt to the heat resistance of lipase 6B. The hydrogen bonds between mutated Ser15 and Ser17, and the Glu20 with other residues improved the structure stability. The mutated L114P, M134E, M137P, and S163P enhance the rigidity of the ﬂexible region and tighten the secondary structure, which stabilize the protein structure.


I. INTRODUCTION
Protein function is closely related with its conformation. The change of protein conformation over time can be coded by a dynamic residue interaction network, whose node is residue and edge is interactions between residues at time scale. It is a powerful method to study the relationship between protein structure and function using complex network theory. In recent years, residue interaction networks have important applications in understanding molecular communication, exploring allosteric inhibitors and activators, and determining important residue sites related to activity, and so forth [1]- [7]. In addition, the analysis of residue interaction network is also available to study the thermostability of proteins [8].
It is very crucial to construct a network that can accurately describe protein three-dimensional structure. Unweighted networks are widely used in the early stage of using complex The associate editor coordinating the review of this manuscript and approving it for publication was Tossapon Boongoen . network theory to study the relationship between protein structure and function [9]- [11]. Later, it was found that weighted networks better described the relations between residues [12]. The total number of interactions between all atoms including the main chain and the side chain atoms is often taken as the weight of the edges [13], [14]. In addition, the interactions between all side chain atoms within the specified distance threshold are considered to be the edge weight [15]. As we know, the residue-residue interactions contain strong interactions and weak interactions, for instance, disulfide bridge is a kind of strong interaction (bond energy is 167 kJ/mol), while van der Waals interaction is one weak interaction (bond energy is only 6 kJ/mol). For the relationship between residues, considering both the interaction type and the strength can reflect the true relationship between the residues. The software gRINN [16] developed by Serçinoğlu Onur uses the network construction method proposed by Vijayabaskar [17], [18], which takes the average interaction energy between each residue pair as the edge weight.
The calculation of the interaction energy in this method is the sum of the short-distance Lennard-Jones and Coulomb interactions, averaged over the overall structure generated at a constant temperature (300 K). In fact, all forces and their energy need to be taken into account. In this paper, the sum of interaction energy of residues is used as the network weight to construct the residue interaction energy network (RINN).
For residue interaction network, mining and analyzing the community structure is an important method to find the functional related regions in the three-dimensional structure of protein. ZHANG et al. used ESPRA Algorithm and the Evolving Graph + Fast-Newman algorithm to detect the rigid community from xylanase dynamic weighted residue interaction networks [19]. Sun et al. combined the analysis of community structure with CRIP detection of 116 catalytic proteins and found that community structure and CRIP may be the structural basis of low-frequency motion [20]. Stetz et al. analyzed the network centrality and community, and proved that substrate binding may strengthen the connectivity of local interactions in communities, which leads to a dense interaction network that can promote an efficient allosteric communication [21]. Obviously, exploring the community structure and evolution is an important way to analyze the relation between protein structure and function.
Most methods for detecting communities from dynamic network include incremental clustering and evolutionary clustering. Compared to incremental clustering, evolutionary clustering [22] considers the characteristics of dynamic networks that change slowly over time. In the framework of evolutionary clustering, many novel algorithms have emerged, such as ESC algorithm [23], EMMC algorithm [24], PisCES algorithm [25], Kin-Han algorithm [26], ESPRA algorithm [27], DYNMOGA algorithm [28], and LDMGA algorithm [29]. Most of these algorithms obtain the community structure by maximizing the community modularity. Currently, it has been proved that the optimization of the modularity has a resolution limitation [30], with the result that important community structure on a small scale can not be found. Nandinee proposed weighted modularity Q w [31] to overcome the resolution limit of modularity function by incorporating a weight term in the modularity formulation. For the community detection algorithm based on GA (genetic algorithm), trajectory-based representation methods and graphics-based representation methods have a great impact on the algorithm performance. Different initialization strategies also have a great impact on the algorithm performance. The DYNMOGA algorithm uses a trajectory-based initialization method, so that the randomness of population production is very large. When the population numbers and the iteration times are increased to obtain the clustering quality, it will take more time. In order to reduce the randomness of population generation, Anwar calculated and screened the clustering coefficients of the initial population to obtain a highly modularity community [32]. Niu introduced the node degree-based label propagation algorithm [29] when initializing individuals. Community detection based on the characteristics of different networks helps to detect a more realistic community structure. Cheng [33] proposed a new method based on node vitality to detect overlapping communities in a dynamic network. Here, we consider the characteristic of RINN and introduce the threshold in the initialization stage to reduce the randomness of the population.
In this study, the randomness of population generation and the limitation of resolution for the optimization of modularity is considered at the same time, a community detection algorithm T-DYNMOGA-Q w is proposed for dynamic RINN based on DYNMOGA. The rest of the paper is organized as follows: Section 2 introduces the selection and process of data sets. The T-DYNMOGA-Q w is described in detail in Section 3 and Section 4. In Section 5, the reliability of T-DYNMOGA-Q w is tested, and the lipase network communities detected by TDYNMOGA-Qw is analyzed.
The conformations of lipase and xylanase are obtained by molecular dynamics simulation at 300K, 350K and 400K. The simulation time is 300ns and the conformation is saved 1 frame/ns. Then, each enzyme saved 300 three-dimensional structures in chronological order at each temperature. For every frame, RING2.0 [37] is used to calculate all types of interactions between residues. The distance thresholds are salt bridge 3.5 Å, disulfide bridge 4.0 Å, hydrogen bond 3.5 Å, van der Waals interaction 0.8Å, π-π stacking 7.0Å, and π-cationic 7.0Å. For RINN, the edge weight w ij is determined as follows.
if residue i and j exist interaction 0, others. (1) In the formula E a , E b , . . . , E f are the energy of different type of interaction.
Thus, a dynamic RINN contains 300 RINN in chronological order per temperature as follows:

A. EVOLUTIONARY CLUSTERING
Evolutionary clustering is a method of processing time series data to generate clustering sequences. It needs to consider two conflicting indicators at the same time: snapshot quality (ST) and temporal cost (TC) proposed by Chakrabarti et al [22]. The snapshot quality is used to represent the clustering result of the network G t (the network at time t) at current time. The temporal cost is used to represent the difference between the clustering result C t (the community of network at time t) at current time and the clustering result C t−1 at previous time. The optimal clustering result at each time is to maximize snapshot quality and minimize the temporal cost. The balance factor α is introduced to regulate the influence of the two indicators. The formula is as follows: A dynamic network with t networks in time can be described There are tk communities in the network at time point t, then the network G t can be described Definition 1: For a static network G t , the multi-objective clustering problem can be defined as: . . , f h (C t )) based on a vector of independent objective functions f i (i = 1, 2, . . . , h). f i is the function used to measure the quality of the cluster. The solution of the multi-objective optimization problem is C t where each of the partial functions in the above objective function takes a maximum value.

2) OPTIMAL SOLUTION FOR MULTI-OBJECTIVE PROBLEM
For the multi-objective optimization problem, the mutual constraints between the targets may make the improvement of the performance of one target often at the expense of the performance of other targets. Problem of the solution is usually a set of non-inferior solutions, that is, Pareto solution set.
Definition 2: Pareto optimal: if C * ∈ is Pareto optimal solution, then for ∀C ∈ , the following condition is satisfied: where I = {1,2,. . . ,h}, h is the number of individual objectives and at least there exists j∈i, such that The collection of Pareto solutions is the so-called Pareto Front. All solutions in the Pareto front are not dominated by solutions outside the Pareto Front (and other solutions within the Pareto Front curve).

IV. METHODOLOGY A. MULTI-OBJECTIVE OPTIMIZATION CLUSTERING ALGORITHM BASED ON THRESHOLD SELECTION AND WEIGHTED MODULARITY (T-DYNMOGA-Q w ) 1) FRAMEWORK OF CLUSTERING ALGORITHM BASED ON MULTI-OBJECTIVE OPTIMIZATION
T-DYNMOGA-Q w is developed from the DYNMOGA algorithm [28], which uses multi-objective genetic algorithm (NSGA-II). NSGA-II is a fast non-dominated sorting method. The algorithm constructs a population and each individual in the population represents a possibility which represents the community structure in the network in the algorithm. Individuals in a population are ranked according to non-dominated precedence, and individuals are constantly evolving in successive offspring. The algorithm optimizes two conflicting functions Q w and NMI. Q w is used to measure the clustering result of the network Gt (the network at time t) at current time t. NMI is used to measure the difference between the clustering result C t (the community of network at time t) at current time t and the clustering result C t−1 at previous time t-1. The algorithm process is shown in Fig 1. Nodes represent amino acid residues, edges represent the sum of the energy of different type of interactions between residues. (a) represents the initial input network; (b) represents the different divisions of community; (c) represents the population after calculating the objective function and assigning the rank; (d) represents optimal results after rounds of selection, crossover, and mutation. Examples are as follows in

2) ENCODING AND DECODING PROCEDURE
The genetic representation of the algorithm uses locus-based adjacency representation. An individual is composed of N genes g 1 , g 2 , . . . , g N . It is assumed that there exists an allele j in a gene. j is from 1 to N. Gene and allele represent the VOLUME 8, 2020 residue nodes of RINN. The value j assigned to the gene i indicates that residues i and j are in the same community. If there is no interaction between residues i and j, j will be replaced by a residue that interacts with residue i.
When decoding it, residues in the same group are assigned to a community. The main advantage of this representation is that the number of clusters is automatically determined by the number of components contained in the individual and determined by the decoding step. Examples are as follows in Fig 2.

3) INITIALIZATION BASED ON THRESHOLD SELECTION
The DYNMOGA algorithm has some limitations in community detection of RINN, including the population initialization. The random generation of the population usually results in a low adaptability of the population. It may eventually fall into a local optimum, and it takes longer to reach a global optimum. A better population initialization reduces the chance of falling into a local optimum and guarantees a fast approach to the global optimum. Algorithms that generate better populations may reach the global best state faster with fewer iterations. The RINN is constructed based on the number of inter-residue forces and the energy of various forces. The stability of the edges between network nodes is closely related to the strength of the interaction between residues. The energy of hydrogen bond, salt bridge, π-cation, π-π stack, disulfide bridge, and van der Waals interaction are as follows: 17 kJ/mol, 20 kJ/mol, 9.6 kJ/mol, 9.4 kJ/mol, 167 kJ/mol, 6 kJ/mol. The weak interactions between residues are relatively unstable, and the existence of the number of weak interactions makes the community structure of the network unclear, which increases the difficulty of community detection.
In order to detect dense and stable communities, we set thresholds during the population initialization phase to filter out the more unstable interaction forces between residues. The selection result of the threshold is shown in Supplementary Material.

4) CROSSOVER
The uniform crossover algorithm randomly generates a crossover mask with a length of N. One child is generated from the gene of the first parent corresponding to the mask of 0 and the gene of the second parent corresponding to the mask of 1. The uniform crossover can guarantee the maintenance of the effective connections of the nodes in the network in the child individual. Examples are as follows in Fig 3.

5) MUTATION BASED ON THRESHOLD SELECTION
The mutation operation is that the possible values that the allele can assume are limited to the neighborhood genes that meet the threshold, that is, the gene values at certain loci of individuals in the population are changed.

6) OBJECTIVE FUNCTION
The formula of weighted modularity Q w [31] is: Among them, a i = d i /2L is the fraction of edges with at least one end in community i, l i is the number of edges in the community i, d i is the sum of the degree of vertexes in community i, n i is the total number of nodes in the community i, e ii = l i /L is the fraction of the edges in the community i.
Normalized Mutual Information (NMI): A matrix is used to measure the similarity between the community structure at time t and the community structure at the previous moment. Assume a network with two partitions A = {A 1 , A 2 , . . . , A a } and B = {B 1 , B 2 , . . . , B b }. C is a matrix, where the element C ij is the number of nodes in the communities A i ∈ A and B i ∈ B simultaneously. The formula of NMI as follows: Among them, C A represents number of communities in division A, C B represents number of communities in division B, C i . represents the sum of the rows in matrix C, C .j represents the sum of the columns in matrix C, N is the number of nodes. if A = B, NMI (A, B) = 0. So the second objective function is to maximize NMI (C t , C t−1 ) at time t. 89442 VOLUME 8, 2020

B. T-DYNMOGA-Q w ALGORITHM
The algorithm is based on the framework of a multi-objective genetic algorithm (NSGA-II), it optimizes two complementary objective functions Q w and NMI which have been proven to detect the effectiveness of communities in complex networks. Refer to the parameter setting of the DYN-MOGA algorithm. The experimental setting parameters are: crossover probability 0.8, mutation probability 0.2, population number 300, iteration number 100. The algorithm process and parameter settings are as follows in Fig. 4.

C. COMMUNITY ANALYSIS INDICATORS
The graph G = (V, E) has n vertices and m edges. S is a cluster with n s nodes and m s edges. c s = {(u, v) |u ∈ S, v / ∈ S} is the number of edges in the boundary, {S 1 , S 2 , . . . , S k } are k clusters of G.
The following indicators are used to judge the quality of community structure.
Modularity Q [38]: The e ii and a i is the same as formula (6). Intra-Cluster Density [39]: It measures the inner edge density of a community. The formula is as follows: Inter-Cluster Density [39]: It measures the proportion of all possible edges leaving the community. The formula is as follows: When the modularity is greater than 0.3 [38], it is a reliable result of network community. The larger the intra-cluster density value and the smaller the inter-cluster density value, the better the quality of the community structure obtained.

V. RESULTS AND ANALYSIS A. COMPARISON OF T-DYNMOGA-Q w ALGORITHM AND DYNMOGA ALGORITHM ON COMMUNITY DETECTION
The T-DYNMOGA-Q w algorithm and the DYNMOGA algorithm were used in community detection 1-300 frame networks of xyna_strli, xyna_theau, WTL and 6B at 300K, 350K, 400K. The values of Q, community number, community size, inter-cluster density, intra-cluster density are shown in table 1: In table 1, the community modularity of the networks of xyna_strli, xyna_theau, WTL and 6B at 300K, 350K, and 400K are all around 0.6, and the communities detected by the T-DYNMOGA-Q w algorithm have higher modularity. The number of communities detected by the T-DYNMOGA-Q w algorithm is larger, the community size is smaller. And the detected communities have higher intra-cluster density and lower inter-cluster density. The T-DYNMOGA-Q w algorithm has better quality of the detected community structure.
Next, we analyze the key factors affecting the thermal stability of WTL and 6B in the rigid community detected by the T-DYNMOGA-Q w algorithm.

B. RIGID STRUCTURE ANALYSIS
Rigid structure is an important property of protein stability. After the communities of WTL and 6B networks at 300K, 350K and 400k were detected, we analyzed the residues and interactions in communities in the same frame and further identified the stable communities at each temperature. We set that there is a stable relationship between residues and residues that exist in the same community in 270 frames (total 300 frames). The results are shown in Fig. 5: As can be seen from Fig. 5, at 300K, the rigid structures in WTL and 6B are distributed in most of the α-helix, β-sheet and 3 10 -helix, and only a little part in the loop. As the temperature increases, the rigid structures in WTL and 6B gradually decrease, and the scale of rigid structures becomes smaller. When the temperature reaches 400K, the rigid structures in WTL are mainly distributed in αB, αC, αE, and a small part are distributed in β5, β6, and 3 10 -helix. The residues of the rigid structure in the loop only Asp118-Pro119-Asn120-Gln121. The reduction of rigid structures in 6B is less. Compared to WTL, αB, αF, 3 10 -helix and loop (between β4 and αB) in 6B have rigid structures. There are rigid structures in the loop (Asp43-Lys44-Thr45-Gly46-Thr47) between β4 and αB and the loop (Asp133-Met134-Ile135) between β7 and αE too. Rathi [40] et al. studied the lipase unfolding simulation of Bacillus subtilis based on the rigid theory and found that the loop region in the structure first disintegrated, followed by the α-helix and β-sheet. Comparing the changes in the rigid structure of WTL and 6B with increasing temperature, we found that increasing the stability of the α-helix and β-sheet structures contributes to the improvement of lipase thermostability.  The left in Fig. 5 shows the rigid structure of WTL and 6B at 300K, 350K, and 400K, respectively. The circles in the figure represent residues, and the edges represent the sum of the forces and energies between the residues. The figure on the right shows the secondary structure corresponding to the rigid structure at each temperature.
By Analyzing the rigid communities of 300K, 350K, and 400K, we found the rigid structures that WTL and 6B are stable at all temperatures. The result is shown in Fig. 6: In Fig. 6, the number and scale of 6B rigid structures is much larger than that of WTL. Comparing the rigid structure in WTL and 6B, the rigid structure in WTL exist the corresponding structure among the rigid structures in 6B except Tyr86-Leu90. The corresponding three-dimensional structure of the rigid structure in the Fig. 6 is shown in Fig. 7 (blue): FIGURE 7. Rigid structure of WTL and 6B at all temperatures show in three-dimensional structure (a) shows the rigid structure of WTL at all temperatures, and (b) shows the rigid structure of 6B at all temperatures. Among them, the red structure represents α-helix, the yellow structure represents β-sheet, and the green structure represents loop region. The blue part indicates the rigid structure of WTL and 6B.
We found that most of the rigid structures in WTL also exist in 6B. Therefore, we analyze the unique rigid structure in 6B: (1) Residues Ser15-Ser17-Asn18 in a rigid structure are located in 3 10 -helix. The strenuous exercise of the 3 10 -helix will affect the stability of the structure [41]. The mutated Ser15 forms a strong hydrogen bond with the mutated Ser17, which stabilizes the 3 10 -helix structure in 6B. (2) Active site N18 and mutation site A20E locate near Gly21-Ile22-Tyr25-Leu26-Ser28-Gln29 in the rigid structure [34]. Ala20 is located at the N-terminus of αA. After mutation, the side chain of Glu20 interacts with water molecules, and water molecules interact with Ser24 (located on the αA helix). Glu20 not only forms hydrogen bond, but also forms salt bridge with Lys23, which contributes to the stability between Glu20 and Ser24. Glu20-Lys23 is also one of the unique rigid structures of 6B. Therefore, it can be speculated that the mutation of A20E contributes to the increase in the rigidity of the αA structure and the stability of the lipase structure. (3) The residues Ala113-Pro114-Pro115-Ile123-Leu124-Tyr125-Gln178 in the rigid structure are located in the loop and the partial structure of β7, and L114P is located in the longest loop [42], which is between β6 and β7. By analyzing the interactions related to L114P, we found that there is more VDW between the Pro114 residue and the Tyr125 residue after the mutation, and the Pro residue can stabilize the folded state of the protein. Therefore, the conversion of Leu to Pro helps to stabilize the structure of the protein, so that the relationship between this loop and β7 in 6B is closer than that in WTL. (4) The residues Asp133-Glu134-Ile135, Pro137-Tyr139-Leu140, Ser162-Pro163-Gln164-Val165 in the rigid structure, which are in the αE and αF terminal and loop regions. M134E, M137P and A163P residues are located at the positions where the end of the helix is connected to the loop region. Mutations at these positions of the WTL reduce the hydrophobic region on the surface of the protein molecule, which can prevent the protein from agglomeration at high temperatures and improve the thermostability [34], [43].
In summary, we find that site-directed mutation of residues and the increase of hydrogen bond between residues can improve the stability of lipase secondary structure. The loop, turn, and helix ends of the protein structure are very flexible regions. The introduction of Pro in these unstable regions can reduce structural flexibility and enhance rigidity. Improving the stability of the secondary structure and the rigidity of the unstable region can improve the heat resistance of the protein.

VI. DISCUSSION
In this study, we proposed an evolutionary multiobjective approach T-DYNMOGA-Q w based on DRINN for community discovery in dynamic networks. The threshold is set during the population initialization phase of the algorithm and the weighted modularity Q w is used as the objective function of the algorithm to optimize the resolution limitation of the modularity. Comparing the T-DYNMOGA-Q w algorithm with the properties of communities detected by DYNMOGA on the xylanase and lipase data sets, we find that the T-DYNMOGA-Q w algorithm can correctly divide the nodes and detect a stable community structure. However, it has the disadvantage of a large number of communities and a small community size. Community detection was performed on the lipase WTL and 6B networks at different temperatures. It was found that the communities detected from the 6B network could still maintain a tight structure when the temperature increased, indicating that the stability between the structures is the reason why 6B is thermophilic one. The mutated Ser15 forms a strong hydrogen bond with the mutated Ser17, and forms more force with other residues. Glu20 not only forms hydrogen bond, but also forms a salt bridge with Lys23. It can be speculated that the mutation of A20E can help increase the rigidity of the αA structure. VOLUME 8, 2020 The introduction of Pro in flexible regions such as the loop region, turns and the ends of the helix can reduce structural flexibility and increase rigidity. Therefore, the increase of hydrogen bond and the improvement of the rigidity of the flexible region contribute to the stability of the structure and have an important impact on the improvement of the protein thermostability.
[42] C. Magyar, M. M. Gromiha, Z. Sávoly, and I. Simon, ''The role of stabilization centers in protein thermal stability,'' Biochem. Biophys JINGSI JI is currently pursuing the master's degree with the Digital Media Institute, Jiangnan University, China. Her main research direction is bioinformatics. She has a strong interest in mining and analyzing biological data using complex networks and different algorithms.
YANRUI DING received the Ph.D. degree in bioinformatics from Jiangnan University, China. She is currently a Professor with the School of Science, Jiangnan University and a Visiting Scholar with New York University. She has published over 60 research articles. She has presided and finished several research projects including two projects supported by the National Natural Science Foundation. She applied two authorized patents and one soft-ware copyright. Her research interests include computing intelligence, data mining, and bioinformatics. VOLUME 8, 2020