VD-Analysis: A Dynamic Network Framework for Analyzing Disease Progressions

The progression of a disease associates with changes in genomic activity, but it remains a challenge to screen genetic biomarkers for clinical applications. The disease progression, in dynamic network methods (DNM), can be analogous to an animated film composed of discrete frames, where each frame represents a temporary state of the time-varying gene-gene interaction network. The major shortage therein is that the transition between two neighboring temporary states was beyond investigation. Here, we develop an updated computational methodology named after VD-analysis. Because single-gene biomarkers were not approved capable of representing a complex biological process, we firstly introduce V-structure — a gene module composed of three genes and two interactions among them — and define it as unit module. We then identify the perturbed pathways that mark the disease progression, followed with the V-structures identified which drive the pathway perturbations. Such driver V-structures can be taken as eligible biomarkers for clinical applications. To test the feasibility of this method, we apply it to a time course dataset of gene expression related to mouse type-II diabetes (T2D). Result indicates that the whole process of T2D is exactly divided into 3 stages and that the driver V-structures inferred for each stage are qualified biomarkers. In summary, our method contributes to the description of dynamic disease progression and the V-structure biomarkers facilitate the treatments of disease.


I. INTRODUCTION
Essentially, the progress of any disease is a process directed by microscopic molecular interaction events. Therefore, to identify and trace the molecular events takes a fundamental role for comprehending the state evolution of a disease [1]- [3]. High-throughput techniques provided abundant datasets, which describe the state evolution as a series of discrete states; however, it is unclear how the state transitions occur. On the other hand, such shortage can be compensated in some degree by computational methods, which give insights to the molecular mechanisms [4]- [6].
Dynamic network method (DNM) was widely used to investigate the dynamic and systematic progression of a disease. The changes of gene expression levels during the whole disease course are contained in the dataset on the time course of gene expression; thereby, a series of temporary GGNsfor every sample time point, nodes of the network are the The associate editor coordinating the review of this manuscript and approving it for publication was Liangxiu Han . genes active at that time -were constructed. This series of temporary networks represents the dynamic evolution of the disease at the level of molecular interactions [7]- [10]. It has been accepted that, in each temporary network, some crucial gene modules take primary roles and thus the gene modules were utilized to elucidate the temporary states of disease.
Traditionally, the disease progressions were usually treated as a series of discrete states at sampling time points, with what caused the state transitions not addressed. Aiming at understanding the pathogenesis of heart disease based on time course RNA-seq dataset, Ma et al. proposed the inference of multiple differential modules (iMDM) algorithm to identify gene modules across multiple differential co-expression networks [8]. Based on temporal microarray data of gene expression, Rajat et al. obtained dysregulated gene modules at 10 time points, which constitute paths of the disease evolution [9]. Srihari et al. identified cancer-related protein complexes via analyzing expression level differences for normal and cancer conditions [10]. All of them pay attention to the temporary states of disease at the sampling time points and ignore states transitions. In this work, not only do we try to reveal the temporal states, but also identify underlying molecular drivers causing states transitions.
In this work, we investigate the molecular mechanisms that lead to the state transitions, via studying the communications among those subnetworks corresponding to dysregulated pathways. Overlapped nodes take the communication roles between the two subnetworks, as evidenced by the associativity decline of two pathways when the overlapped genes are deleted. The node genes are thereby taken as biomarkers for uncovering the molecular mechanism of the state transitions. Traditionally, many researchers focused on extracting isolated genes as biomarkers. However, it was uncovered that individual genes are insufficient for the purpose. A promising method is to take several genes together with their mutual interactions as a functional unit that represents a biomarker [11]- [14]. The function of such a biomarker can be inferred or annotated from the datasets where the annotation for the related genes and the interactions are provided.
Here we adopt the V-structure, which contains three genes together with two interactions among them, as gene functional unit. The concept V-structure was firstly proposed by Azad et al. [13], [14]; and as shown latter, it serves as a good biomarker balancing complexity and simplification.
To overcome the difficult in analyzing network characteristics based on the traditional biological networks where nodes represent individual molecules and edges are interactions between two nodes, here we introduce line graph. To date, line graphs have been widely used in social network and protein interaction network [15], [16]. The line graphs are much more expressive, overcoming the difficulties in analyzing network where each V-structure is regarded as a whole, thereby, we introduce the line graph. Given an original graph, the line graph of original graph is produced by substituting the edges and nodes of original graph with the nodes and edges in line graph separately (which means a node-edge-exchange procedure). In this way, each V-structure is denoted by two nodes along with one interaction between them. Compared with original graph, line graph is more structural and able to reflect higher-order local neighborhood of interactions [15].
Based on the line graphs of those temporary GGNs, pathways playing significant roles in directing disease progression can be identified. Such crucial pathways are known as perturbated pathways, since disease progression stems from the loss of control of them. For a temporary network, the set of perturbed pathways describes the temporary state of the disease. Moreover, the differences between two perturbed pathway sets reflect the state difference and state transition of the disease, which is manifested as the appearances, disappearances and transformations of special pathways. That is, the evolution of the collection of the perturbed pathway sets represents development of the disease. Furthermore, V-structures that change the sets act as drivers for disease progressions.
Thereby, we propose a novel framework -diseases progression analysis based on V-structure and dynamic GGN (for the sake of convenience, we call it ''VD-analysis'') -for studying disease progression. The novelty of this framework relies on that, we manage to elucidate the disease progression from both static and dynamic aspects. From the static point of view, based on the time series line graphs, perturbed pathway set at each sampling time point is identified. Such sets reflect the temporary states in disease progression. From the dynamic point of view, the framework defines the network coherence and determines the evolutionary relationship between perturbed pathways. Analyzing the coherence from the perspective of network connection, the driver V-structure can be obtained. We find that, the regulation mechanism described by these V-structures is related to those described by the disturbed pathways, with the latter delineating the static state of disease and the former reflecting the cause of the state transition. In addition, these driver V-structures can also be used as biomarkers to predict the stage of disease development, which is helpful for treatment and prognosis of diseases. In conclusion, the purpose of this paper is to apply V-structure to analyze the process of disease development and reveal the cause of disease state transitions, which is conducive to the acquisition of state transition biomarkers and the research of disease progression.

II. METHODS
In this paper, we propose a novel methodology named after VD-analysis. It includes 4 steps: (1) construct line graphs of active GGNs; (2) extract core modules with the use of a peakattachment algorithm; (3) identify perturbed pathways of all time points for delineating the temporary states; (4) determine evolution relationships among the perturbed pathways and then find driver V-structures that contribute to the occurrences of the relationships.
Step (1)-(3) are similar to DNM. After these steps, perturbed pathways for a GGN is produced. In the absence of clinical information, the stages of the disease are determined by the continuity of the distribution of those perturbed pathways, which reflects the changes of molecular regulation systems during the disease procession from a static perspective. According to step (4), driver V-structures are identified to understand the dynamic state transitions of diseases (See Fig.1).
This method is capable of elucidating the progression of disease from both the delineation of temporary states and the analysis on the inherent molecular mechanism that induces state transitions.

A. COMPOSITION OF V-STRUCTURE IN GGN
Since GGNs focusing on interactions between single genes are not embedding the multi-genes cooperation, we therefore consider a three-gene unit, termed as V-structure, as a unit module to describe complex molecular mechanism. In this paper, a V-structure in GGN is composed of three genes g 1 , g 2 and g 3 as well as two directed interactions among them, i.e. < g 1 , g 2 > and < g 2 , g 3 >, denoted by g 1 −g 2 −g 3 (See Fig. 2). For simple representation, we transform each V-structure to  a two-node-and-one-edge structure following an edge-nodeexchange procedure.
We demonstrate that line graph is useful in analyzing cooperation among functional sub-networks as the potential in describing molecular systems. Such potential is evidenced by the following two points. Firstly, line graph is a pruned version of original graph [15] as shown in Fig.3, which means that it discards leaf nodes and highlights functions of linker nodes (nodes with indegree > 0 as well as outdegree > 0). Secondly, nodes in line graph are more influential. For instance, given a node l of line graph and its corresponding directed interaction < l 1 , l 2 > in line graph, the number of nodes effected by l equals to the sum of the indegree of l 1 and the outdegree of l 2 (See Fig. 3(A)). A simple numerical experiment is also intended to support this point. Technically, we calculate the ratios of nodes required in pathway networks and the line graphs of pathway networks respectively if made them affect 80% nodes of the total. The result (See Fig. 3(B)) indicates that the ratio of nodes required in the latter is significantly smaller than that in GGN (about a third less).

B. CONSTRUCTION OF TIME COURSE LINE GRAPHS OF GGN
DNM models the disease progression by time course molecular regulation networks in T time points. Let a time course network series be denoted by = {W 1 , · · · , W t , · · · , W T }, where W t represents the temporary GGN (TGN) related to the temporary state at time point t (t = 1, . . . ,T). In VD-analysis, W t is constructed by the line graph of TGN at t (TGN t ), denoted byTLG t .
Firstly, we select active genes at t to construct TGN t . We define a gene is an active gene (AG) at t if its expression level at t larger than the threshold which is calculated by a statistical method, called 3-sigma principle [17]. As a supplement, differential expression genes (DEGs) are identified by applying a log fold change analysis of 1.5 into experiment vs control groups. Eventually, nodes in TGN t are composed by the union of AG t and DEG t . In addition, we define the gene activity score (AS) as (1), then the weight of edge d =< g 1 , g 2 >, W d , is evaluated as (2), where the maximum information coefficient (MIC) [18] represents the expression correlation between g 1 and g 2 : Secondly, for gene-gene interactions in TGN t , we gather the directed gene-gene interactions in pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/) database. Due to the excessive number of the directed interactions, we confine that each gene-gene interaction must satisfy a condition where the proteins the genes expressed are still interacted. The protein interaction information is supplied by functional protein association networks (STRING, https://string-db.org/) database.
So far, we have obtained T node-weighted-and-edgeweighted TGNs, we next transform them to their related line graph (TLG t ) through a node-edge-exchange procedure. Eventually, the time course network = {TLG 1 , · · · , TLG t , · · · , TLG T } is obtained. For extracting properties of the temporary states that reflects, we identified core modules in TLG t for further identifying pathways playing significant roles in the deterioration of the disease.

C. EXTRACTING CORE MODULE USING A PEAK-ATTACHMENT ALGORITHM
Exploiting the property that there are fewer nodes powerful in affecting most other nodes in LGNs than in GGNs (See Fig. 3), we present a novel algorithm, named after peak-attachment algorithm on TLGs for extracting core modules. The key processes of the peak-attachment algorithm as follows: Firstly, influences of the nodes are evaluated incorporating both topology importance in the network and AS by directed random walk algorithm. The ultimate probability distribution after random walk algorithm is exploited as the overall score (OS) of each node. Next, peaks are defined as nodes with higher OS than all of its neighbors as well as threshold1. Moreover, we initialize a core module as a single peak and add the node with highest OS among neighbors of all members into the core module. The process will terminate when the mean OS of all members in the module undergoes a sharp decline (threshold2) or the OS of the next node to be added in is lower than a fix proportion of the current mean OS (threshold3). After traversing all peak nodes, a series of core modules are extracted. In addition, modules have common nodes are merged. All the steps above are listed in Algorithm 1. In this work, we perform the peak-attachment algorithm on each ALG t , and the core modules group obtained can be denoted by Module t = {Module t1 , Module t2 , · · · , Module tN t , where N t is the total number of core modules.
Moreover, we utilize a grid-search method to determine the best threshold combination {threshold1, threshold2, threshold3} in Algorithm 1. For setting the best thresholds combination, we discretize these thresholds in empirical VOLUME 8, 2020 Algorithm 1 Peak-attachment algorithm.
Input: Adjacency matrix of a line graph of temporary GGN: L. Overall scores of all nodes in the line graph of temporary GGN: OS. Three thresholds: threshold1, threshold2, threshold3. Peak set: peak. Output: Core module series: Module.
For the a th peak with OS a > thrshold3 in peak: Merge modules in Module having overlapping members. ranges and then generate candidate thresholds combinations through Cartesian product among these scatter sets. These combinations are subsequently tested by applying them into peak-attachment algorithm. For time point t, an index calculated by (3) to measure the performance of each threshold combinations.
where Module tE m indicates the edge set of Module tm . The resultant core modules describe the properties of the temporary state of disease and represent the active part of pathways. We next identify the perturbed pathways according to the associations between pathways and the resultant core modules.

D. IDENTIFYING PERTURBED PATHWAYS VIA A COHERENCE INDEX BETWEEN SUBNETWORKS
For evaluating the correlation between subnetworks of core modules and pathways, we present a novel coherence index (CI) defined as, that is, the ratio of the number of common V-structures to the total number of V-structures in the smaller subnetworks (See (4)).
where L indicates the adjacency matrix of TLG, in which L m,n = 1 represents that node m interacts with nodes n, and otherwise, L m,n = 0. C 1 and C 2 are both subnetworks. The perturbed pathway is such pathway that the CI of its associated line graphs and any core modules is larger than threshold CI . In this work we select threshold CI = 0.79 which is obtained by applying 3-sigma principle to the CI value curve composed of all the CIs between all pathways.
The compositions of perturbed pathways reflect their associated temporary states separately, which further enables an overall description of the disease progression after linking them following the time order. For time point t, the set includes all perturbed pathway can be denoted by P t . Let P t be the set of perturbed pathways at t and AP Then the distribution of perturbed pathways can be recorded by N × T dimensional matrix F, with binary element F j,t = 1 if pathway AP j perturbed at time point t, and otherwise, F j,t = 0.

E. TRACING PATHWAY EVOLUTION RELATIONSHIPS AND IDENTIFYING DRIVER V-STRUCTURES
For understanding the molecular mechanism underneath the state transitions in the progression of disease, pathway evolution relationships for time point t and t − 1 are detected. We calculate the CI across all pathway pairs between pathways in P t−1 and pathways in P t . Then one association of the pathway pair with CI larger than threshold CI is identified as an evolution relationship, indicated by (p s,t−1 , p j,t ), where p s,t−1 ∈ P t−1 and p j,t ∈ P t .
The occurrences of evolution relationships rely on the functions of genes, which makes it possible to find out the underlying molecular mechanisms that drive the disease development. Therefore, we identify driver V-structures for each pair of evolution relationship (p t-1,s , p t,j ) by a deletion strategy. Technically, each V-structure contained in p t−1,s or p t,j is removed stepwise and recalculated CI p t−1,s ,p t,j after this operation. We then rank these V-structures in decreasing according to the decrease of CI p t−1,s ,p t,j . Furthermore V-structures in the top 5% are labeled as driver V-structures corresponding to (p t-1,s , p t,j ). Traversing all evolution relationships at time point t − 1 and t, all the driver V-structures are collected and denoted by DV t . After identifying driver V-structures, the properties of TLG and the biological process promoting the states transition can be traced.

III. RESULTS
In this paper, we applied VD-analysis in a time course gene expression dataset (GEO accession: GSE63175) related to mouse type 2 diabetes (T2D), downloaded from Gene Expression Omnibus database (GEO, https://www.ncbi.nlm. nih.gov/geo/). The dataset is provided by an experiment where some mice are divided into two groups (i.e. control group vs experimental group) by feeding with high fat vs low fat diet. The temporary gene expression microarrays inside are measured in the following time points: Day1, Day 6, Day 10, Day 14, Week 3, Week 6, Week 9, Week 12, Week 15 and Week 18 (Note: the above sampling time points are denoted by T1, T2, . . . , T10 in the following).

A. THE PROGRESSION OF T2D CAN BE DIVIDED INTO THREE STAGES ACCORDING TO THE DISTRIBUTION OF PERTURBED PATHWAYS
The temporary states of T2D can be discriminated in manner of distributions of perturbed pathways. The numbers of the  Table 1. More details are shown in S2 Table. Although physical conditions of mice are time-varying, the whole process of a disease can be divided into several stages. Two important criteria are utilized to conclude whether a disease staging is reasonable. One is distinct perturbed pathway distributions for different stages, and the other is consistent perturbed pathway distributions at the time points involved in the same stage. For detecting the stages of T2D, a hierarchical clustering on the distribution of perturbed pathways is performed and the result indicates that the 10 time points are clearly divided into 2 clusters, that is, (1) T1-T6, and (2) T7-T10 (See Fig. 4(A)).
In addition, based on the result of hierarchical clustering, we investigate whether there are similar perturbed pathway distributions within the same clusters. To assess the extent to the state changes between adjacent time points, we define an overlapping ratio OR t between P t and P t−1 as (5). And the resultant OR t is listed in Table 1.
Obviously, the smaller of OG, the more dramatic of the changes of states. We find that the minimum OG is achieved when t = 7(OR 8 = 0.2500), which indicates a stage transition event between T7 and T8. Furthermore, with OR > 0.2500, T2D is proved to perform similar state characteristics in both T1-T6 and T7-T10. Specially, OR 10 equals to 1, referring the disease has reached the end state at the last time point. In addition, we illustrate the perturbed pathway distributions at all time points in the form of a heatmap (See Fig. 4(B)). We find that there are more pathways perturbed at T1-T6 under the effect of high-fat diet compared with T7-T10. This difference might originate from the underlying changes of characteristics of the molecular mechanism with the development of T2D.

B. T2D PERFORMS DISTINCT STATE PROPERTIES AT DIFFERENT STAGE
Based on the obtained stages, we shed light on the properties of each stage of T2D progression according to the perturbed pathway distributions. In the early-stage, there are 3 pathways are involved in all 6 time points of earlystates, namely, Platelet activation, Dopaminergic synapse, and Estrogen signaling pathway. All of which are associated with insulin regulation and platelet activation. In the later-stage pathway perturbed are associated with diabetes complications. Specially, there are 7 pathways perturbed at the last time point (T10), 4 of which are related to diabetes complications (mainly non-small cell lung cancer and prostate cancer). Therefore, it is convincing to conclude that mice in experiment group eventually suffer cancer which is a diabetic complication. Key perturbed pathways of each stage and their T2D related functions are shown in Table 2. In summary, in early-stage the physical condition of the mice in the experimental group are normal, though the inherit regulation mechanism is disturbed under the high-fat and high-glucose environment; in later-stage, mice eventually have T2D or T2D complications; and in transition-stage, which is a critical VOLUME 8, 2020 stage, there is a period occurs a transition event from normal state to disease. But the molecular mechanisms underlying the transition-stage have not been determined so far, due to the lack of information on the associations between the temporary states at T6 and T7.

C. DRIVER PATHWAYS ARE CONCENTRATED IN 6 PATHWAY CLUSTERS
We then investigate the transition states and further uncover the molecular mechanisms promoting T2D progression. So far, we have obtained the evolution relationships between perturbed pathways as well as their driver V-structures (See Methods). The driver V-structures are investigated into the known pathways in KEGG. For time point t, we then define the promotion power of a pathway by the number of DV t it contained. We evaluate the promotion power of all pathways at all time point and the result is assembled into a matrix F DP ∈ N N P ×(T −1) , where N P indicates the total number of pathways that KEGG includes. F DP is supplied in S4 Table. Moreover, the normalized F DP is shown as heatmaps in Fig. 4 in column direction (Fig. 4(A)) and row direction (Fig. 4(B)) respectively. These heatmaps indicate obvious differences of promotion power between (1) the distinct pathway at the same time point and (2) the identical pathway at different stages. More details are demonstrated as follows.
Firstly, a large number of pathways promote T2D progression at all time points, similar to the traditional cognition related to T2D which most biological processes are disturbed during the progression of T2D (See Fig. 4(A)). However, when we consider a rougher category of pathways provided by KEGG, the driver pathways are concentrated on 6 biological processes, namely, Signal transduction, Immune system, Endocrine system, Nervous system, Cancers: Specific types, and Infectious diseases: Viral. More details about biological processes concentrated are shown in S3 Table. In addition, 9 pathways (pathway in cancer, Phospholipase D signaling pathway, Rap1 signaling pathway, Platelet activation, Chemokine signaling pathway, Relaxion signaling pathway, Estrogen signaling pathway, Cholinergic synapse, Human cytomegalovirus infection) are founded to possess stronger promotion powers at most time points (i.e., T1, T2, T3, T4, T5, T8, T9) occupying both early-stage and later-stage.
Secondly, for all time points, most driver pathways have stronger powers in promoting T2D progress during early-state than later-states, suggesting of more driver V-structures participating T2D development under the condition of high fat and high glucose diet (See Fig. 4(B)). When in later-stage, promotion powers are overall reduced, which indicates that T2D has reached a stable state that characterized by various T2D complications, especially cancer. With aim of investigating the differences of promotion power of pathways between early-stage and later-stage, we select top 5 driver pathways ordered by promotion powers at the two stages (See Table 3) and illustrate their functions with T2D. In early-stage, driver pathways are associated with biological processes aberrant under high-fat and high-glucose diet, such as adipose tissue development, glucose metabolism regulation, or pancreatic β-cell survival; In the transitionstage, driver pathways are related with insulin regulation or T2D complications; In later-stages, almost all of the driver pathways correspond to T2D complications and insulin resistance. Details about driver pathways and their T2D-related functions are listed in Table 3.

D. DRIVER V-STRUCTURE CAN BE GIVEN AS BIOMARKERS
Although driver pathways are capable of uncovering the regulation mechanism involved in disease progressions, they consist too may genes as well as interactions which hampers a concise understanding on the disease driver mechanism. We therefore extract the V-structures playing center roles in the driver pathways as biomarkers for delineating the biological processes. 3 kinds of V-structures are paid attention to corresponding to T2D stages as follows: 3) DV later = t=T 7,··· ,T 10 DV t . Fig.6 shows the line graph s related to DV early , DV transition , and DV later respectively. Apparently, these line graph s are divided into several independent modules. We select the unique V-structure as biomarkers contained in each subnetwork according to 3 network centrality indexes, namely betweenness centrality, degree, and average shortest path length. By doing a literature survey, we found that these V-structures biomarkers perform functions similar to driver pathway in identical stage, which indicates that V-structures biomarkers are able to represent driver pathways to delineate the state of T2D progression. The biological functions of these V-structures are discussed in the following.

1) EARLY-STAGE: DV EARLY
1) Pik3ca-Src-Pik3r3: This V-structure promotes T2D process in early-stage via regulating the adipose tissue development and insulin produce. In details, Pik3ca encodes an important protein which is annotated to insulin receptor substrate binding [29] and play significant roles in lots of T2D related biological process such as adipose tissue development, cell migration, cellular response to glucose stimulus and glucose metabolic process. Src encodes a protein which performs as insulin receptor substrate binding. Moreover, Src family protein tyrosine kinases interact with a variety of receptors to modulate cell growth and cell migration [44]. This indicates Src may function in adipose tissue growth. Rodriguez-Monterrosas et al. has reported that SRC-dependent pathway can promote the expression of insulin-like growth factor one receptor (IGF1R) and insulin receptor, both of which mediate cell responses induced by insulin [45]. Pik3r3 encodes a protein which represents a regulatory subunit of PI3K.
There is an experimentally proven functional link between PIK3R3 and hepatic lipid metabolism [46].  for the other, insulin can induce the phosphorylation of ERK1/2 to promote the β-cell growth [50]. Furthermore, the reintroduction of Erk1 in Erk1-knockout mice enables the activation of some glucose induced proteins such as mitogen and stress-activated kinase1 (MSK1) and cAMP-responsive element-binding protein (CREB). The function of Pik3r3 relies on regulating the process of insulin receptor signaling pathway and cell migration. 2) Cldn18-Cldn23-Cldn4: All of the three genes are included in cell adhesion molecules, pathway. Ku et al. reported that the expression of cell adhesion molecules is significantly increased in high glucose [51]. Moreover, the upregulation of cell adhesion molecules is associated to be associated with cancer. Note that mice eventually suffer cancer, this V-structure can be used to predict cancer caused by T2D. 3) Itpa-Entpd1-Entpd8: This V-structure may modulate nucleotide metabolism. Firstly, all of the three genes are included in Purine metabolism and Purine metabolism, conserved biosystem. Both two pathways present in the last time point and are reported to be associated with cancer. Specifically, Entpd1 is a nucleoside triphosphate diphosphohydrolase that hydrolyzes ATP and ADP. [52], and performs a crucial role in regulating the activity of both purinergic receptors and downstream adenosine receptors [53]. Furthermore, Chakravarthi et al. reported that purine metabolism and pyrimidine metabolism are simulated in cancer cells due to the enhanced cell division [29]. Secondly, all of the three genes regulate nucleotide metabolism. Extracellular nucleotide metabolism has been reported as a critical factor of T2D [53]. Itpa encodes a protein that and participates in nucleotide metabolic process. Via modulating extracellular nucleotide signaling, Entpd1 regulates the process of T2D.
3) LATER-STAGE: DV LATER 1) Gpi-HK3-Hkdc1: This V-structure functions in amino sugar and nucleotide sugar metabolism pathway and regulates glucose related cellular process in later-diabetes stage. Gpi encodes a member of the glucose phosphate isomerase protein family that controls glycolysis and gluconeogenesis. Upregulation of Glycosylphosphatidylinositol-specific phospholipase D (GPI-PLD) may cause impaired hepatic insulin signaling [54]. The protein encoded by HK3 is one of important hexokinase that catalysis glycolysis and regulate glucose consumption [55]. Meanwhile, the role of Hkdc1 for conferring impaired glucose tolerance is demonstrated, which indicates the function in global glucose consumption [56]. 2) Acat1-Ehhadh-Acat2: The role of this V-structure to T2D relies on its effect on insulin resistance that is one of the main causes of T2D. At transcriptional level, insulin promotes ACAT1 gene expression and activates the ACAT1 P1 promoter [57]. Duggirala et al. investigated the chromosomal region related to insulin resistance and the results indicate that Acat2 is one of positional candidate genes harbored in this region [58].

E. RELATIONSHIP AMONG PERTURBED PATHWAY, DRIVER PATHWAY, AND V-STRUCTURE BIOMARKERS
We have excluded the progression of T2D from a further dynamic insight based on DNM. In VD-analysis, perturbed pathways at each time point is identified to highlight the properties of temporary states in T2D. Furthermore, numerous driver V-structures as well as driver pathways are identified for uncovering the state transitions. Considering the strong influence of V-structure, the function of all driver pathways can be represented by several V-structures biomarkers. In this subsection, we summarize the relationships among perturbed pathway, driver pathway, and V-structure biomarkers for providing a clear understanding of T2D progression revealed by our VD-analysis. The progression of T2D cross three stages can be excluded as follows. According to the distribution of perturbed pathways, in the early-stage mice are damaged by the highglucose environment, manifested by glycated albumin, early diabetic complications, and protection on insulin production to delay diabetes. They are mostly caused by protein glycosylation under the influence of glucose dysregulation and adipose tissue development, according to driver pathways and V-structures. While in the later-stage, the normal regulation mechanism of mice has been ruined. Pathways related to T2D and T2D complications, especially cancer, have occupied the dominant position, indicating the complete destruction of T2D on the metabolic system. In this stage, mice are more likely have liver injury and loss the capacity of regulating glycolysis and insulin degradation. The transformation of T2D is completed in transition-stage, mainly achieved by insulin secretion when activated by the regulation glucose and cell adhesion molecules. Associations among perturbed pathways, top 5 driver pathways and V-structure biomarkers in the analysis of T2D progression are summarized in Table 4.

IV. DISCUSSION AND CONCLUSION
Description and understanding on the progression of disease remain a challenge for bioinformatics, where a systematic as well as dynamic insight on molecular regulation system is of vital importance. For understanding the dynamic of molecular mechanisms during disease progression, typical VOLUME 8, 2020 method like DNM encodes the context of disease progression as time course GGNs where nodes are active or differentially expressed at temporary moments. However, DNM do not resolve the challenges of the description for disease progression as they only account for the temporary states at several time points, and the transition between two neighboring temporary states are beyond investigation.
In this study, we expand the DNM by a further study on state transitions, which contributes to the understanding of disease progression with a dynamic perspective. We propose a framework name after VD-analysis, a disease progression mechanisms description method based on dynamic GGNs. The VD-analysis encourages a dynamic insight on the description of disease progression. In other word, our focus is not only on the temporary states during disease progression, but also on the molecular mechanisms promoting the states transition so that drive the disease to deteriorate.
The VD-analysis takes into account a gene functional unit composed of three genes and two interactions as a basis in gene regulation mechanisms. Understanding of molecular regulation systems based on V-structure acting as an inseparable unit is superior than on individual genes in conventional GGN for two reasons. Firstly, V-structure are structural and appropriate for explaining cross-talk between pathways. Secondly, V-structures have more clear function annotations in contrast to individual genes, suggesting they are significant for determining the underlying molecular mechanisms underneath state transition.
In VD-analysis, we first constructed ALG t from AGN t for researching V-structures in the VD-analysis framework. Secondly, we identify core modules at each time point based on a peak-attachment algorithm (See Algorithm 1). Next, based on the resultant core modules, perturbed pathways are identified using a coherence index (CI, (4)). In the end, we recognize evolution relationships between perturbed pathways, and the driver V-structures promoting state transitions are identified by a deletion strategy. We demonstrate that VD-analysis encourages understanding of disease progressions according to distributions of pathways. There are two kinds of pathways identified in VD-analysis: (1) perturbed pathways, encoded by one or zero, for delineating the temporary states of disease; (2) driver pathways, encoded by driver V-structures, for uncovering the molecular systems underlying states transitions. Although pathways are excellent in revealing gene regulation relationships, they are complex in structure, so that V-structures are identified acting as biomarkers.
We apply VD-analysis to a time course gene expression dataset related to mouse T2D. Results demonstrate the efficiency of VD-analysis in description and understanding the process of disease. The notable contributions of VD-analysis are summarized as follows: 1) Three stages of T2D progression identical with experiment reports are extracted according to the distribution of perturbed pathways. In VD-analysis, all sampling time points are categorized into 3 stages, i.e. early-stage, transition-stage and later-stage.
All of the stages are meaningful in clinical. Firstly, early-stage corresponds to normal stage when mouse is healthy though the inherent many biological processes are affected by the high-fat diet. In this stage, quite a few pathways perturbed while the body remains healthy and struggles to maintain Intracellular homeostasis against sugar intake. Secondly, transition-stage corresponds to a critical stage before onset of T2D. In this stage occurs least perturbed pathways which indicates a low-dimensional space for state transition. Finally, the later-stage corresponds to a disease condition. We find that most perturbed pathway here related to T2D complications, especially cancer, indicating a final state of cancer for mice.
2) Driver pathways of T2D concentrate in 6 biological processes. Although T2D is a systematic disease and affect majority of pathways during the progression, we find that driver pathways significantly concentrated in 6 biological processes, namely, Signal transduction, Immune system, Endocrine system, Nervous system, Cancers: Specific types, and Infectious diseases: Viral.
3) Driver V-structures interpret diseases progression and can be given as biomarkers. According to stage differentiated by perturbed pathways, we investigate four kinds of V-structures in the following four periods, namely, early-stage, later-stage and transitionstage. Fig. 6 shows that there are several distinct subnetworks generated in the driver V-structure network corresponding to each period. Therefore, only hub V-structures that play central roles in subnetworks would be selected. For early-stage, we identified Pik3ca-Sr-Pik3r3, a V-structure that functions in regulating the adipose tissue development. We found that there are two hub V-structures play significant roles throughout the development of T2D. In details, Plcβ1-Plcβ2-Plcβ3 consists of three subtypes of Plcβ that affect glucose-induced insulin release. Erk2-p38γ -Erk1 promotes insulin gene transcription in β-cells stimulated by glucose. For later-stage, a total of three hub V-structures are selected in later-stage Specifically, Gpi-HK3-Hkdc1 regulates glucose related cellular processes and are related to impaired hepatic insulin signaling; Acat1-Ehhadh-Acat2 effects on insulin resistance. Furthermore, a total of 3 hub V-structures are selected in laterstage, which can be used as biomarkers of cancer caused by T2D. Specifically, influenced by glucose, Erk1-Hras-Pik3r3 regulates insulin production and insulin receptor signaling pathway; Cldn18-Cldn23-Cldn4 are overexpressed significantly in high glucose and included in a cancer-related pathway, cell adhesion molecules; Itpa-Entpd1-Entpd8 modulates nucleotide metabolism, which is crucial for cancer growth due to the enhanced cell division.
Based on DNM and V-structure, VD-analysis investigates the progression of disease. Not only do the work reveals the temporary state of disease from a static insight, but also uncover the underlying biological processes that promoting the disease process from a dynamic perspective. Furthermore, some V-structures biomarkers are identified and used to predict state transition events. As a gene module with both well structure and definite biological annotation, V-structure has more potential in delineating molecular mechanism. APPENDIX S1