This paper proposes a new method for the preliminary identification of gene regulatory networks (GRNs) from cancer gene microarray data, based on ridge partial least squares (RPLS) with recursive feature elimination (RFE) and novel Brier and occurrence probability measures. The method facilitates the preliminary identification of meaningful pathways and genes for a specific disease, rather than focusing on selecting a small set of genes for classification purposes as in conventional studies. First, RFE and a novel Brier error measure are incorporated into RPLS to reduce the estimation variance using a two-level nested cross-validation (CV) approach. Second, novel Brier and occurrence probability-based measures are employed to rank genes across different CV subsamples, which helps to detect different GRNs from correlated genes that consistently appear in the ranking lists. Therefore, unlike most conventional approaches, which emphasize the best classification using a small gene set, the proposed approach can simultaneously offer good classification accuracy and identify a more comprehensive set of genes and their associated GRNs. Experimental results on three publicly available cancer data sets (leukemia, colon, and prostate) show that very stable gene sets from different but relevant GRNs can be identified, and most of these genes are found to be of biological significance according to previous findings in biological experiments. These results suggest that the proposed approach may serve as a useful tool for the preliminary identification of genes and their associated GRNs of a particular disease for further biological studies using microarray or similar data.
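As a rough illustration of the two kinds of measure named above, the following minimal sketch computes a standard Brier score and a simple occurrence frequency of genes across CV subsamples. It is not the paper's formulation: the abstract does not define its novel measures, so the textbook Brier score is swapped in, and the function names, the `top_k` cutoff, the 0.75 threshold, and the toy gene rankings are all assumptions made for illustration.

```python
from collections import Counter


def brier_score(probs, labels):
    """Standard Brier score: mean squared difference between predicted
    class-1 probabilities and 0/1 labels (lower is better).
    Assumption: the paper's novel Brier measure may differ from this."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def occurrence_probability(ranked_lists, top_k):
    """Fraction of CV subsamples in which each gene appears among the
    top_k entries of that subsample's ranking (hypothetical definition)."""
    counts = Counter()
    for ranking in ranked_lists:
        counts.update(set(ranking[:top_k]))
    n = len(ranked_lists)
    return {gene: c / n for gene, c in counts.items()}


# Toy gene rankings from four hypothetical CV subsamples (made-up data).
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
    ["g1", "g2", "g4", "g3"],
]

occ = occurrence_probability(rankings, top_k=2)

# Keep genes that recur in at least 75% of subsamples (arbitrary threshold).
stable = sorted(g for g, p in occ.items() if p >= 0.75)

# Brier score for three made-up predictions against their true labels.
score = brier_score([0.9, 0.2, 0.6], [1, 0, 1])
```

In this sketch, genes that recur near the top of the ranking in most subsamples are treated as stable candidates, which mirrors the idea of favoring genes that consistently appear in the ranking lists over a single best-performing subset.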