We applied rule induction to a publicly available lung adenocarcinoma gene expression data set. The typical approach to analyzing gene expression data is to cluster the genes, but the resulting clusters can be difficult to interpret. Rules are easier to interpret: for example, (CDKN3>253)→(tumor-stage=3) states that an expression level of CDKN3 above 253 predicts tumor stage 3. Our rule induction tool is HAMB, a new semiautonomous discovery system that we are developing, and we used it to learn rules for survival status, survival time, and tumor stage. When we searched the World Wide Web for publications relating the top 53 genes from our discovered rules to lung cancer, we found that 9 of them are known to be associated with lung cancer, 19 are known to be associated with other types of cancer, and the remaining 25 are not known to be associated with cancer. Our results suggest that the latter two groups of genes should be examined more closely for their association with lung cancer.
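To make the rule form concrete, the following is a minimal sketch of how a single rule such as (CDKN3>253)→(tumor-stage=3) could be checked against labeled samples. It is not HAMB and does not reproduce its rule-learning procedure. The `Rule` class, the `tumor_stage` field name, and the five sample records are hypothetical values made up for illustration; they are not taken from the data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule of the form (gene > threshold) -> (target = value)."""
    gene: str
    threshold: float
    target: str
    value: object

    def covers(self, sample):
        # The rule fires when the gene's expression exceeds the threshold.
        return sample[self.gene] > self.threshold

    def evaluate(self, samples):
        # Count the samples the rule fires on and how many of those it
        # labels correctly; the ratio is the rule's accuracy on this data.
        covered = [s for s in samples if self.covers(s)]
        correct = [s for s in covered if s[self.target] == self.value]
        accuracy = len(correct) / len(covered) if covered else 0.0
        return len(covered), len(correct), accuracy


# Hypothetical expression levels and tumor stages (illustrative only).
samples = [
    {"CDKN3": 310.0, "tumor_stage": 3},
    {"CDKN3": 275.5, "tumor_stage": 3},
    {"CDKN3": 260.0, "tumor_stage": 1},
    {"CDKN3": 120.0, "tumor_stage": 1},
    {"CDKN3": 90.0, "tumor_stage": 3},
]

rule = Rule(gene="CDKN3", threshold=253.0, target="tumor_stage", value=3)
covered, correct, accuracy = rule.evaluate(samples)
print(f"covered={covered} correct={correct} accuracy={accuracy:.2f}")
```

On these made-up records the rule fires on three samples and labels two of them correctly. A rule learner would typically search over genes and thresholds for rules that score well on measures of this kind; how HAMB does so is not described here.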