Survival Marker miRNA-Mediated Subpathways of Breast Invasive Carcinoma Derived From Activity Profile

We not only collected 1102 samples form 3 different breast invasive carcinoma (BRCA) datasets, which included a TCGA dataset and 2 GEO datasets, but also collected 23 other cancer datasets, one of which included more than 100 samples. Using these datasets, we topologically inferred miRNA-mediated subpathway activity profiles, which integrated gene expression profiles, prior gene interaction information and target relations between miRNAs and genes, and topological information. Then we constructed the Global Directed Pathway Network (GDPN) with genes as nodes, and from 3 BRCA datasets and other 23 cancer datasets identified a set of miRNA-mediated subpathways that are survival-related risk markers. The results showed large activity values correlated with poor prognosis, such as hsa-miR-107 and hsa-miR-142-3p of BRCA datasets. We assessed the stability and robustness of the miRNA-mediated subpathway survival markers with 2 GEO datasets and 23 external independent datasets. The results showed that the proposed method can significantly reduce noise from sequencing errors and samples heterogeneity by integrating pathway topological information, and can break down the boundary of pathways and provide a new measure for detecting survival-related markers. The top miRNA-mediated subpathways are more reliable in stratifying high risk group and low risk group and selecting therapeutic strategies.


I. INTRODUCTION
Breast invasive carcinoma (BRCA) is not only the most commonly diagnosed malignancy in women around the globe but also the world's first cause of female deaths from cancer [1]. BRCA is a heterogeneous disease since from one patient to another it differs in clinical presentation, molecular characteristics, and prognosis [2]. The heterogeneity of BRCA increases the complexity of pathological diagnosis, deciding subtypes and stratification for clinic treatment. For BRCA patients, personalized medicine is becoming a promising way to overcome large variations between individual's treatment responses [3], [4]. In this study, we need to detect robust risk survival markers related to BRCA and evaluate the The associate editor coordinating the review of this manuscript and approving it for publication was Quan Zou .
performance of our method based on independent datasets. Furthermore, the purpose of our study based on survival data was to find consistent risk miRNA-mediated subpathways (miRNAs), together with facilitation of cancer samples stratification, which could be associated with the prognosis of cancers.
MicroRNAs (miRNAs) are short, endogenous, non-coding RNAs that regulate post-transcription by inhibiting the expression of target genes, thereby affecting the initiation, progression, and prognosis of cancers [5]- [7]. Many studies about high-throughput miRNA-expression profiling have been performed with the aim to identify cancer-related miR-NAs for clinical utility in diagnostic and prognostic applications [8]- [10]. MiRNAs are stable and protected from RNase-mediated degradation, thereby enabling its detection in biological fluids and archival tissues for prognosis VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ biomarker studies [11]. Thus, several methods were proposed to detect miRNA survival-related biomarkers of the cancers, such as mirwalk analysis platform [12], SVR-luad [13] and miMe [14]. Despite these efforts, at present it is quite difficult to find a clear compendium of miRNA markers for BRCA survival. The stratification performance of miRNA biomarkers often descends drastically in other independent datasets for the same disease phenotype. These problems may be caused by cellular heterogeneity within tissues, the racial differentiation of the patients, the measurement errors in microarray platforms and the sample shortages [15]. Previous studies demonstrated that pathways could be used as a crucial feature in identification of cancer-related survival markers [16]. Therefore, pathway topological analysis can also be used to mine candidate markers for prognosis prediction. Therefore, we incorporated multiple information, such as the differentially expressed level of genes (univariate Cox regression p < 0.05), the topological importance of genes in the Global Directed Pathway Network (GDPN), clinical information of samples and target relationships between miRNAs and genes, to identify miRNAmediated subpathways that could be levered as markers of survival and prognosis for BRCA. The purpose of this study is to topologically infer active miRNA-mediated subpathways toward precise identification of risk survival-related markers. To undertake this work, we proposed a machine learning method, which included data preparation, survival markers selection and risk markers evaluation. First, the collection provided a TCGA (The Cancer Genome Atlas) and 2 GEO (the Gene Expression Omnibus) datasets that included 1102 human BRCA samples, and obtained the target relationships between miRNAs and genes from TarBaseV6.0 [17], miRecords [18], miRTarBase [19], and miR2Disease ( [20] databases. We also downloaded other 23 cancer datasets, which included 7086 samples. Second, the GDPN was constructed with genes as nodes. Third, we inferred the expression profiles of miRNA-mediated subpathway activity and extracted the active miRNA-mediated subpathways as risk survival markers by Lasso-Cox model. Finally, we evaluated the performance of our method based on 2 independent datasets and other 23 cancer datasets. After internal and external cross-validation, the top 20 high frequency miRNAmediated subpathways selected as risk survival-related markers were used to construct risk predictors to stratify patients with respect to their relative risk. Figure 1 depicts the method of survival markers identification. The method contains three parts: data preparation, survival markers selection and method evaluation. The data preparation step consists of gene expression matrix and miRNA expression matrix, in which each row represents a gene (or miRNA) and each column represents a sample. Then, the method integrates the gene expression matrix, topology information and target relations between miRNAs and genes into miRNA-mediated subpathway activity profile which is FIGURE 1. The workflow of survival marker identification. The miRNA-mediated subpathways are obtained from the gene profiles by directed random walk. The z(g i ) is a row vector of gene g i expression value across all samples. The a(miR j ) (i.e. miRNA-mediated sub-pathway activity) is also a row vector which is the row j of the miRNA (namely miR j ) expression value across all samples. The middle panel is the overview illustration of miDRW-based miRNA-mediated subpathway activity inference. The GDPN is constructed on 343 KEGG pathways, which include 7159 gene nodes and 39930 directed edges. The dotted line circle is a virtual node which ensures gene weights to flow through the GDPN. P is the initial weights of the genes and P ∞ is the output weight vector. For the miR j , we reverse the edge direction when merging the pathways into the GDPN. The miRNA-mediated subpathway activity a(miR j ) only integrates expression value vector of the univariate Cox regressio p<0.05 genes of miRj into P ∞.

II. MATERIALS AND METHODS
topologically inferred by the directed random walk. We select risk survival-ralated markers by ranking frequency based on the outcome of 500 Lasso-Cox models. Finally, we evaluate the performance of our method based on 2 different independent datasets and other 23 cancer datasets. miRNAs and 19596 genes. We also obtained the other independent dataset from the GEO database with accession number GSE22220 [24]. The platform was Illumina humanRef-8 v1.0 expression beadchip (GPL6098) and Illumina Human v1 microRNA expression beadchip (GPL8178). A total of 207 samples with 303 miRNAs and 10220 genes profiled were included for further analysis. We also downloaded other 23 cancer datasets from UCSC Xena Cncer Browser. All GEO profiles and 23 TCGA datasetes were processed in the similar manner with ''BRCA-TCGA'' profile.

B. miRNA-TARGETS ASSOCIATED WITH BRCA
In this study, the target relationships between miRNAs and genes were derived from TarBase V6.0, miRecords, miRTarBase, and miR2Disease databases. After filtering, a total of 755 226 non-repeated human specific interactions among 1137 miRNAs and 20 263 genes were obtained as follows: 598 pairs from miR2Disease, 1749 pairs from miRecords, 26 388 pairs from TarBase, and 750 381 pairs from miRTarBase. We integrated predicted and experimentally verified miRNA-target relationships in our study.

C. CONSTRUCTING THE PATHWAY GRAPH WITH THE GLOBAL DIRECTION NETWORK (GDPN)
The global pathway graph was constructed based on the interaction data from 343 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways by using the R package ''Subpath-wayMiner''(https://github.com/chunquanlipathway) [25]. Firstly, each of 343 KEGG pathways was converted into a directed graph based on the interaction information of the KEGG. Then the resulting 343 graphs were merged into a global directed pathway graph (GDPN). We merged the genes that overlapped among the multiple pathways and retained the topological structure of each pathway. The GDPN covers 7159 nodes and 39930 edges. Moreover, a virtual node was added to the GDPN. Each node pointed to the virtual node and virtual node pointed to all other nodes in the GDPN [15], [26]. Each node of the graph represents a gene or a metabolite, and each directed edge represents interaction or regulation relationships between genes in the GDPN. The direction of the edge is derived from the type of interaction between two nodes, which is available from KEGG. For example, if gene A inhibits or activates gene B, the direction of edge points to B. The directed random walk [15], [26] is similar to the PageRank algorithm [27], which is used to search important Web Pages. For PageRank algorithm, if there are more other pages pointing to a Web Page, the Web Page is more important. However, a gene is important if it regulates more other genes [28]. This gene is often called hub genes. The GDPN is followed by power law distributions with an R 2 = 0.72, 0.77 and 0.71 for in-degree, out-degree and total degree, respectively. Only a small fraction of genes have higher degrees in the GDPN. It is the most important basic of random walk algorithm [29].

D. miRNA-MEDIATED SUBPATHWAY ACTIVITY INFERENCE IN THE GDPN
We evaluated the topological importance of genes by the directed random walk in the GDPN. The formula is defined as follows: M is the row-normalized adjacency matrix of the GDPN, which was to divide the sum of all elements in the row by each element of the row; r ∈ [0, 1] (r = 0.7 in this study) is restart probability; P t is a vector in which the kth element holds the probability of being at gene k at time step t; P 0 is assigned to each gene with its -log(p-value), which is derived from univariate Cox regression analysis, and normalized to a unit vector; P t will converge to steady state P ∞ after several iterations until |P t+1 − P t | ≤ 10 −10 . P ∞ are topological weights, which give a measure of global topological importance of genes in the GDPN. So, we suppose that if genes (1) have large degree; (2) have significant p-value and (3) are closely associated with many survival genes that also have significant p-value, these genes will obtain larger weights. The larger weights of genes enhance the inference of a more robust miRNA-mediated subpathway activity. The miRNAmediated subpathway activity inference formula is defined as follows: According to the formula (2), the significantly differential target genes (the univariate Cox regression p<0.05, the same as significantly differential target genes below) of each miRNA, could be integrated into a special value, which was called miRNA-mediated subpathway activity. Therefore, the miRNA-mediated subpathway with different sizes are comparable under this activity score function. The expr(g ik ) is the normalized expression value gene k on the ith samples; COXzscore( ) is the z statistics of g k and miR j from a univariate Cox regression analysis on their expression values and the surviving time; sgn( ) returns −1 for negative numbers and +1 for positive numbers; n j represents the number of differentially expressed target genes of each miRNAmediated subpathway in the GDPN; j represents the number of miRNA. The formula (3) is a constraint, which is added to formula (2) to detect the negative regulation between miRNAs and genes (sgn(COXzscore(g k )·COXzscore(miR j )) equal to −1). For example, for upregulated miRNAs, we only integrated downregulated target genes into the gene set and calculated miRNA-mediated subpathway activity, and vice versa. In other words, if COXzscore(g k )·COXzscore(miR j ) is negative and COXzscore(g k ) is positive, the expression values of g k in high risk samples will be larger than those in low VOLUME 8, 2020 risk samples; if COXzscore(g k )·COXzscore(miR j ) is negative and COXzscore(g k ) is negative, the expression values of g k in high risk samples will be smaller than those in low risk samples. Thus, the value of a(miR ji ) will be large in high risk samples and small in low risk samples for all negatively correlated miRNAs and genes. Finally, we obtain an activity profile of miRNA-mediated subpathway, of which each row represents a miRNA and each column represents a sample, and the values represent the activity of the miRNA in BRCA.

E. THE SELECTION OF RISK miRNA-MEDIATED SUBPATHWAY AND THE EVALUATION OF METHOD
For selecting the risk miRNA-mediated subpathway, we sorted the miRNA-mediated subpathways activity by their univariate Cox regression p-value in increasing order and selected miRNA-mediated subpathways with p-value < 0.05 as candidate survival markers. Then we used Lasso-Cox model to select risk survival-related markers, and used C-statistic to estimate the number of survival markers. Firstly, we randomly selected one-fifth of samples for test and the rest for training. For unbiased evaluation, we repeated fivefold cross validation 100 times and calculated the average C-statistics. A total of 500 concordance statistics were generated from five test sets in turn. The R package ''glmnet'' was used to implement Lasso-Cox model based on activity profile of candidate survival markers. The input of Lasso-Cox model is the data (a ji , t ji , σ ji ), where a ji is the jth miRNA-mediated subpathway activity of ith sample; t ji is the the survival time when σ ji = 1, and is the censoring time when σ ji = 0. Thus, we could study the relationship between the miRNA-mediated subpathway activity values and the survival times. A few more important miRNA-mediated subpathways are selected with nonzero coefficients (lambda.min). Then we counted their frequency over 500 Lasso-Cox models. The top ranked miRNA-mediated subpathways in the frequency list are used as candidate risk survival markers. Finally, we use the R package ''survAUC'' to calculate the C-statistic and estimate the number of survival markers.

A. THE METHOD INFERRED SURVIVAL-RELATED miRNA-MEDIATED SUBPATHWAY ACTIVITY
First, the method is applied to analyze the BRCA-TCGA dataset. We calculated the topological importance of each gene in the GPDN with the formula (1), and inferred miRNA-mediated subpathway (miRNAs) activity profiles with the formula (2) and formula (3). In miRNA-mediated subpathway activity profiles, each row represented a miRNAmediated subpathway (miRNAs) and each column represented a sample. For BRCA-TCGA dataset, a total of 153 miRNA-mediated subpathway and 749 samples were contained in the inferred activity profiles. To reveal the survival predictive ability of the miRNA-mediated subpathway activity, we sorted the activity profile of miRNA-mediated subpathways by their univariate Cox regression p-value in increasing order. Then, the candidate miRNA-mediated subpathways (p < 0.05) inputted the Lasoo-Cox model to perform 100 times five-fold cross validation, and calculated the average C-statistics. The range of average C-statistics was from 0.6102 to 0.8125. It was a stable point of the C-statistics when the N equaled to the 20. Therefore, we called the most frequent miRNA-mediated subpathways (the top 20) survival-related risk markers. These miRNAs have been shown to be related to BRCA survival (Table S1: miRNAs). Hierarchical clustering of the risk markers activity profiles was performed using the R package ''gplots'' with complete linkage and Euclidean distance ( Figure 2). In the heatmap, each row represented a miRNA-mediated subpathway, and each column represented a sample. The survival patterns of BRCA-TCGA dataset were shown in the cluster analysis. The blue cluster samples had poor survival (958.34±769.51, Figure 2A). It implied that samples had larger values of the miRNA-mediated subpathway activities in the high risk cluster, and the miRNA-mediated subpathway activities could effectively distinguish different survival pattern of samples. Furthermore, we performed hierarchical cluster analysis based on GSE19783 and GSE22220 using these risk markers except hsa-miR-330 (no hsa-miR-330 in GSE19783 and GSE22220). We obtained the consistent results across these independent datasets ( Figure 2B-2C). The results showed that our method could infer robust miRNA-mediated subpathway activities, which can distinguish samples with more significance. Furthermore, these most frequent miRNA-mediated subpathways not only divided samples into two clusters, but also divided them into more detailed subtypes with different survival times ( Figure S1).

B. THE miRNA-MEDIATED SUBPATHWAY INTEGRATED TOPOLOGICAL IMPORTANCE OF THE TARGET GENES
The risk markers that are selected based on activity profile can give us insight into the biological basis about why breast cancer patients have good and poor outcome. For simplicity, if the target genes of miRNA are univariate Cox regression p < 0.05, we called them differentially expressed target genes in this study. We inferred the activity profile by directed random walk method, which integrated the topological importance of differentially expressed target genes in the GDPN. For evaluating the value of the topology information, we annotated differentially expressed target genes of risk markers into KEGG pathways, respectively. We ranked those pathways with the p < 0.01 in ascending order, and selected the pathways of FDR < 0.05 (Benjamini and Hochberg method). A total of 189 pathways were annotated, and each pathway of the 189 KEGG pathways was selected with the risk markers. Then we counted the frequency occurrences of each pathway and identified the most frequent pathways with over 10 occurrences, of which 82.76% (24/29) were supported by the existing literature (Table S1: Pathways). ''Metabolic pathways'' (hsa01100) and ''HIF-1 signaling pathway'' (hsa04066) were ranked first and second on the most frequent pathways list respectively. The ''Metabolic pathways'' had been the research hotspots in the therapy of breast cancer [30][31][32][33]. Metabolic pathways promoted cancer cell survival and growth [34]. The ''Metabolic pathways'' had been shown to play important roles in early cancer detection such as amino acid metabolism, arachidonic acid (AA) metabolism, fatty acid metabolism, linoleic acid metabolism, and retinol metabolism, showing significant differences between the breast cancer group and the control group in a Korean prospective cohort [30]. Many researches had shown that the effects of hypoxia-inducible factor 1 (HIF-1), a master regulator of the hypoxic response, had been extensively studied during breast cancer progression and metastasis [35]. Many studies showed that HIF-1 signaling pathway played a role in breast cancer progression and metastasis with the PI3K-Akt signaling pathway [36]- [38] (hsa04151, Figure 3). Furthermore, we calculated the topological weights of differentially expressed target genes in the top 20 miRNAmediated subpathways, and sorted them in descending order. 38 of the top 100 topologically important target genes were annotated to the most frequent pathways (over 10 occurrences), such as PGK1 and BTG2. PGK1 was a potential survival biomarker and invasion promoter by regulating the HIF-1α-Mediated epithelial-mesenchymal transition process in breast cancer, and high PGK1 expression predicted poor overall survival (OS) in breast cancer and some other cancers [39], [40]. BTG2 inhibited mTORc1 activity by reducing Raptor-mTOR interaction along with upregulation of tsc1 expression, which led to significant reduction of p70S6K activation as opposed to AKT1S473, but not AKT2, phosphorylation via downregulating PHLPP2 (AKT1-specific phosphatase) in breast cancers [41]. Additionally, high-level BTG2 protein expression correlated with prolonged survival in patients with breast carcinoma. [42]. The result indicated that the topological integration was valuable strategy, which could detected the risk miRNA-mediated subpathways and the key genes of these subpathways (Figure 3, Table S1: Genes). Furthermore, these miRNA-mediated subpathways located smaller pathways region, and broke down the boundary of pathways, which helped us to understand the interaction between molecules and detect survival-related markers from the whole pathway network. One node represents a dysregulated miRNA or target gene, and the color represents the level of differential expression. Some survival-related star genes targeted by hsa-miR-107 (with five-pointed star in Figure 4A), such as PGK1 and BTG2.

C. CASE STUDY
There were a total of 1087 edges between 20 miRNAs and their differentially expressed 292 target genes ( Figure 4A). Each miRNA-mediated subpathway activity was integrated by their differentially expressed target genes, and considered the topological importance of genes. Then miRNA-mediated subpathways were more robust survival markers. For evaluating predictive performance of the survival markers, we further investigated whether the miRNA-mediated subpathway activities of the most frequently selected risk markers could stratified the patients into low-risk group and highrisk group. We ranked the breast cancer patients by activity values of each miRNA-mediated subpathway in descending order, and assigned the top 40% patients as the high risk group and the the bottom 40% patients patients as the low risk group. The activity values of hsa-miR-107, hsa-miR-142-3p, hsa-miR-192, hsa-miR-338-3p, hsa-miR-106b, hsa-miR-542-3p, hsa-miR-125a-5p and hsa-miR-127-3p significantly separated breast cancer patients in independent GSE19783 and GSE2220 datasets (there is no hsa-miR-330-5p in GSE19783 and GSE22220). The hsa-miR-107 and hsa-miR-142-3p ranked the first and the second in the frequency list. Figure 4B showed the network based on hsa-miR-107, which targeted many star genes of breast cancer, such as PGK1, BTG2 and CDK8. Furthermore, we used the R package ''survminer'' to visualize the Kaplan-Meier survival curves and the log rank test of subtypes ( Figure 5). Figure 5B and Figure 5E showed the Kaplan-Meier curves for two risk groups in GSE19783 (log-rank, p = 2e-5 and p = 2e-3), and Figure 5C and Figure 5F showed the Kaplan-Meier curves for two risk groups in GSE22220 (log-rank, p = 5e-3 and p = 3e-2), respectively. It is noted that the follow-up time of GSE22220 was less than 10 years. Thus we removed those samples, of which the follow-up time equaled to 10 years. Many researches had shown hsa-miR-107 and hsa-miR-142-3p were the survival-related markers. The hsa-miR-107 was associated with tumor recurrence and reduced OS in TNBC patients as it was exclusively up-regulated in relapsing TNBC compared with non-relapsing TNBC, healthy subjects or ER + patients [43], [44]. The expression levels of hsa-miR-107 were significantly elevated in the metastatic group compared with the disease-free group [45]. Moreover, 20 differentially expressed genes of hsa-miR-107 were annotated to 39 pathways, and four differentially expressed genes (JAK1, BCL2, YWHAB and EFNA3) of hsa-miR-107 were annotated to the PI3K-Akt signaling pathway. JAK1 expression was significantly lower in breast invasive carcinoma compared with adjacent normal tissues, and JAK1 was a prognostic marker and its correlation with immune infiltrated in breast cancer [46]. BCL2 and YWHAB were significantly associated with overall survival in breast cancer [47], [48]. EFNA3 long noncoding RNAs induced by hypoxia promote metastatic dissemination [49] (Figure 3). On the other hand, The hsa-miR-142-3p played an important role in the therapy of breast cancer. The miR-142-3p overexpression resulted in a strong decline in breast cancer stem cell characteristics with a decrease in CD44, CD133, ALDH1, Bod1, BRCA2, and mammosphere formation as well as reduced survival after irradiation [50]. Re-expressing the hypoxia-repressed miR-142-3p, which targeted HIF1A, LOX and ITGA5, and caused further suppression of the HIF-1α/LOX/ITGA5/FN1 axis [51]. Notably, higher LOX, ITGA5, or FN1, or lower miR-142-3p levels are associated with shorter survival in chemotherapy-treated TNBC patients [51].
The above results indicated that the miRNA-mediated subpathways identified by our method could be served as potential markers for breast cancer prognosis,

D. EXTERNAL VALIDATION OF SURVIVAL MARKERS USING ON ANOTHER 23 CANCERS
The analyses done so far provided a ranked collection of miRNA-mediated subpathways found as robust markers of survival in BRCA. The consistency of the results obtained with the independent validation gave strong support to the risk markers found, but we had to consider the activity values used to evaluate our method in other cancers. We downloaded other 31 cancer datasets from UCSC Xena Cncer Browser (https://xena.ucsc.edu/). After data preparation, 23 cancer datasets with more than 100 samples were reserved for further analysis, which contained sample-matched miRNA expression, gene expression, clinical information. For unbiasedly evaluating the performance of the method, we calculated the activity profiles of 23 datasets, then we also selected top 20 miRNA-mediated subpathways (the most frequent miRNAmediated subpathways) as risk survival markers. We ranked the breast cancer patients by activity values of risk markers in descending order, and assigned the top 40% patients as the high risk group and the the bottom 40% patients patients as the low risk group. In 23 cancers datasets, we identified the most frequent common 89 risk markers, of which 17/89 occurred over 10 times and were supported by the existing literature (Table S1: Other independent datasets). These risk markers could significantly separate cancer patients in their datasets with Kaplan-Meier curves. However, the PRAD showed an unsatisfactory stratification effect because the dataset included multi-subtypes, and PRAD was a highly heterogeneous cancer.
The hsa-miR-107 and hsa-miR-125a-5p ranked first and second in the frequency list. Clinically, the signature of a hsa-miR-107 high, DAPK low, and KLF4 low expression profile correlated with the extent of lymph node and distant metastasis in patients with colorectal cancer and served as a prognostic marker for metastasis recurrence and poor survival [52]. High hsa-miR-107 expression was associated with poor clinicopathological parameters and prognosis in pancreatic ductal adenocarcinoma patients [53]. On the other hand, overexpression of hsa-miR-125a-5p significantly enhanced the ability of cell proliferation, migration and invasion in HNSCC, hsa-miR-125a-5p [54]. The hsa-miR-125a-5p induced apoptosis via a p53-dependent pathway in human lung cancer cells [55]. In other words, hsa-miR-125a-5p, a diagnosis and prognosis marker for multiple cancers, was the star miRNA to regulate important cancer-related pathways and targeted key genes.
Furthermore, we manually retrieved a lot of literatures, whereas there were no similar methods. We implemented five famous pathway-based identification of survival markers methods, which contained the Mean and Median methods [56], the PCA method [57] and the previous DRWPSurv method [16]. We repeated five fold validation experiments 100 times on each dataset and cross datasets, and generated 500 C-statistics and 100 C-statistics, respectively ( Figure S2).
The external validation of survival analysis showed that the method could detect the survival risk markers of cancers and the key target genes of miRNA-mediated subpathway. The results indicated that the method was designed to be independent of diseases, and could detect robust survival-related markers (genes and miRNAs) in other cancers. Meanwhile, our method was capable of maintaining the stable identification for different measurement platforms and patient cohorts.

IV. DISCUSSION
BRCA is a heterogeneous cancer composed of diverse subtypes, which presents multiple clinical symptoms. This heterogeneity causes the molecular stabilization of BRCA to remain deficient, with a lack of clear risk markers related to the prognosis of the BRCA [58]. General risk factors of the breast cancer in females are age, infertility, age of first full time pregnancy, age of menopause, an inherited mutation in the BRCA1/BRCA2 breast cancer gene [59]. In the recent years, considerable progress has been made to understand the mechanisms responsible for aberrant miRNA expression in BRCA and several miRNAs or miRNA families have been found as key regulators of BRCA hallmarks [60].
The recent advance of multi-omics integration technologies applied to the study of clinical samples does open the way to provide a new measure of risk survival-related markers. A clear limitation comes from the fact that, previous studies, focused on genes or miRNAs how to improve the performance of prognostic predictors, ignored effect of the joint impact from genes and miRNAs. It is well known that miR-NAs can disrupt biological pathways and cause diseases by regulating their target genes. Thus, there must be complicated relations among miRNAs, genes and pathways. Another view about risk survival markers is that, disease phenotypes are found to be highly associated with the key local subpathways, rather than entire pathways [25], [70], [71]. We believe that it is a promising way to understand the biological mechanism of cancer survival with deep mining of miRNA-mediated subpathways. If subpathways are to be analyzed effectively, we shall consider the topology structure of pathway network.
The purpose of this study is to detect robust risk markers for precise survival outcome prediction. The method contains three parts: data preparation, survival marker selection and method evaluation. The core of this study is selection of survival markers, which integrate gene expression profiles, prior gene interaction information and target relations between miRNAs and genes, and survival markers are topologically inferred by the directed random walk. Results have been shown that the our method obtained a better predictive performance on breast cancer independent and other cancer datasets.
We integrated weak signals of pathway member genes with mRNA-miRNAs target relations, miRNA-mediated subpathway activities accumulate larger discriminative power and are more robust in predicting survival outcome than individual gene or miRNA markers. The reliable performance of our method could be attributed to the strategy of incorporating multiple information at the step of miRNA-mediated subpathway activity inference. Our method reassigned the topological importance of each gene in a GPDN by directed random walk. The more topologically important the genes were, the larger topology weights they could obtain, and the larger activities they contributed to the miRNA-mediated subpathways. The directed random walk method adjusted gene weights according to the topological importance, the dysregulated signals of key genes were amplified in the GDPN, and vice versa. Finally, the survival-related miRNAs were identified with larger topology weights of target genes. As the hub genes often had subtle expression changes [72], the robustness of the inferred miRNA-mediated subpathway activities might be enhanced, leading to better prognosis prediction.

V. CONCLUSION AND FUTURE WORK
In conclusion, we consider that the results presented in this study provide strong evidence for the prognostic value of a list of miRNA-mediated subpathways in BRCA and for their potential to predict other cancers prognosis. In our method, the set of miRNA-mediated subpathways is ranked with their VOLUME 8, 2020 selection frequency of 500 Lasso-Cox models, being the top ones in the list that provide best prognostic performance. In fact, our results show that the top 20 miRNA-mediated subpathways applied to independent datasets provided very good separation of BRCA datasets and other 23 cancer datasets in two distinct groups of high risk and low risk. The method can significantly reduce noise from sequencing errors and samples heterogeneity by integrating pathway topological information, and can break down the boundary of pathways and provide a new measure to detect survival-related markers. However, our method depends on the data collection and credibility of target relationships. Thus, with the rapid development of human interaction databases and the sequence technology, we believe that the method is a promising way to precisely predict the state of prognosis, and provide a better guide for patient treatment.
[71] C. Li He worked as an Instructor with the Department of Measurement and Control Technology and Instruments, Harbin University of Science and Technology, in 2005, and moved up to an Associate Professor, in 2012, acting as a Master Supervisor later the same year. In 2018, he got a full professorship. He has worked for many years on 3D measurement by structured light and machine vision measurement. In recent years, he has presided over and completed nearly four research projects funded by provinces or Harbin city and been working on another one; published over 20 research articles, over ten of which were included in EI. Besides, he owns more than ten patents of invention and reaps 1 award in science and technology presented by provincial authorities. From 1984 to 1991, from 1991 to 1995, and from 1995 to 1999, he was a Teaching Assistant, a Lecturer, and an Associate Professor with the Harbin University of Science and Technology, Harbin. Since 1999, he has been a Professor with the Harbin University of Science and Technology. He is the author of three books, more than 200 articles, and more than 40 inventions. His research interests include three-dimensional visual inspection, tissue optical characteristic imaging, and image processing. He is an Editorial Board Member of the journal of Electric Machines and Control, the journal of Harbin University of Science and Technology, and the journal of Electrical measurement and instrument.
Dr. Yu is a Vice Chairman in Heilongjiang Province Instrumentation Society and a member of the council in China Instrument and Control Society. He was a recipient of eight Heilongjiang Provincial Science and Technology Awards awarded by Heilongjiang Provincial Government, China.