Abstract:
To understand the biological mechanisms underlying gene expression data, it is important to discover groups of genes that have similar expression patterns under certain subsets of conditions. Biclustering algorithms have been effective in analyzing large-scale gene expression data. Recently, traditional biclustering has been improved by introducing biological knowledge along with the expression data during the biclustering process. In this paper, we propose the Pathway-based Order Preserving Biclustering (POPBic) algorithm, which incorporates Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway information based on the hypothesis that two genes sharing similar pathways are likely to be similar. The basic principle of the POPBic approach is to apply the concept of Longest Common Subsequence to pairs of genes that have a high number of common pathways. The algorithm identifies expression patterns in the data in two major steps: (i) selection of significant seed genes and (ii) extraction of biclusters. We perform exhaustive experiments with the POPBic algorithm on a synthetic dataset to evaluate the bicluster model, assess its robustness in the presence of noise, and test its ability to identify overlapping biclusters. We demonstrate that POPBic is able to discover biologically significant biclusters in four cancer microarray gene expression datasets. POPBic has been found to perform consistently well in comparison to its closest competitors.
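The core idea of applying a Longest Common Subsequence to a pair of pathway-sharing genes can be illustrated with a minimal sketch. This is not the POPBic implementation: the function names, the ranking of conditions by expression value, the toy expression values, and the placeholder pathway labels are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Set


def condition_order(expression: Sequence[float]) -> List[int]:
    """Return condition indices sorted by increasing expression value."""
    return sorted(range(len(expression)), key=lambda i: expression[i])


def longest_common_subsequence(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Standard dynamic-programming LCS; returns one longest common subsequence."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence
    result: List[int] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def shared_pathway_count(g1: str, g2: str, pathways: Dict[str, Set[str]]) -> int:
    """Number of pathways annotated to both genes (hypothetical annotation map)."""
    return len(pathways.get(g1, set()) & pathways.get(g2, set()))


# Toy data (hypothetical): expression of two genes across five conditions
expression = {
    "geneA": [1.0, 3.0, 2.0, 5.0, 4.0],
    "geneB": [0.5, 2.5, 1.5, 3.0, 9.0],
}
# Placeholder pathway labels, not real KEGG annotations
pathways = {
    "geneA": {"pathway_1", "pathway_2", "pathway_3"},
    "geneB": {"pathway_2", "pathway_3", "pathway_4"},
}

common = shared_pathway_count("geneA", "geneB", pathways)
order_a = condition_order(expression["geneA"])
order_b = condition_order(expression["geneB"])
# Conditions over which both genes share the same relative ordering
preserved = longest_common_subsequence(order_a, order_b)
```

In this sketch, `preserved` holds a set of conditions whose relative expression order is the same for both genes, which is the kind of order-preserving pattern a seed pair might contribute to a bicluster. How POPBic actually selects seed genes and grows biclusters is not specified in the abstract.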
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 18, Issue: 6, 01 Nov.-Dec. 2021)