Skip to Main Content
Due to the complexity of the underlying biological processes, gene expression data obtained from DNA microarray technologies are typically noisy and have very high dimensionality and these make the mining of such data for gene function prediction very difficult. To tackle these difficulties, we propose to use an incremental fuzzy mining technique called incremental fuzzy mining (IFM). By transforming quantitative expression values into linguistic terms, such as highly or lowly expressed, IFM can effectively capture heterogeneity in expression data for pattern discovery. It does so using a fuzzy measure to determine if interesting association patterns exist between the linguistic gene expression levels. Based on these patterns, IFM can make accurate gene function predictions and these predictions can be made in such a way that each gene can be allowed to belong to more than one functional class with different degrees of membership. Gene function prediction problem can be formulated both as classification and clustering problems, and IFM can be used either as a classification technique or together with existing clustering algorithms to improve the cluster groupings discovered for greater prediction accuracies. IFM is characterized also by its being an incremental data mining technique so that the discovered patterns can be continually refined based only on newly collected data without the need for retraining using the whole dataset. For performance evaluation, IFM has been tested with real expression datasets for both classification and clustering tasks. Experimental results show that it can effectively uncover hidden patterns for accurate gene function predictions.