Although DNA microarray technology measures the expression levels of thousands of genes simultaneously, only a few underlying gene features may account for significant data variation in gene classification problems. Selecting features from such a huge data set is difficult, so dimension reduction of gene expression data is essential for determining the important features that play a key role in predicting an outcome. Rough set theory (RST) has recently been used for dimension reduction, but existing methods are inadequate for finding a minimal reduct. This paper proposes an RST-based technique that reduces the dimension of gene expression data by obtaining a single reduct in one pass. The gene expression data are discretized using linguistic terms with proper semantics and represented by fuzzy sets. The discretized values are calculated using Gaussian membership functions with varied means and standard deviations in order to eliminate ambiguity between different linguistic terms. The genes are classified using linguistic decision attribute values based on the frequency of the gene expression data. Discretization and classification of the gene expression data are performed simultaneously, which significantly reduces the time complexity of the procedure. Thus, the proposed framework selects the most significant samples for gene classification, resulting in dimension reduction. Unlike other existing methods, the proposed method produces output that exhibits no variation from the experimental microarray gene information.
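The abstract does not give the membership parameters, the exact frequency-based decision rule, or the one-pass reduct algorithm. The following is therefore only a minimal sketch under stated assumptions, not the paper's method. It discretizes expression values (assumed normalized to [0, 1]) into three hypothetical linguistic terms using Gaussian membership functions. In the same pass it labels each gene with its most frequent term, which is one possible reading of the frequency-based decision attribute. It then selects samples with the standard greedy QuickReduct algorithm, which is swapped in here purely for illustration. The names `TERMS`, `build_table`, and `quick_reduct`, and all parameter values, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical linguistic terms: (mean, std) of each Gaussian membership
# function, for expression values assumed normalized to [0, 1].
TERMS = {"low": (0.0, 0.15), "medium": (0.5, 0.2), "high": (1.0, 0.15)}


def gaussian_membership(x, mean, std):
    """Membership degree of x in a fuzzy set with a Gaussian membership function."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2))


def discretize(x, terms=TERMS):
    """Return the linguistic term in which x has the highest membership degree."""
    return max(terms, key=lambda t: gaussian_membership(x, *terms[t]))


def build_table(data):
    """Discretize each gene (row) and assign its decision value in the same pass.

    Assumption: the decision value is the most frequent linguistic term in the
    gene's row; ties go to the term seen first."""
    rows, decision = [], []
    for gene in data:
        terms = [discretize(x) for x in gene]
        rows.append(terms)
        decision.append(Counter(terms).most_common(1)[0][0])
    return rows, decision


def partition(rows, attrs):
    """Indiscernibility classes: indices of rows that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: fraction of rows in the positive region,
    i.e. in classes whose members all share a single decision value."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({decision[i] for i in b}) == 1)
    return pos / len(rows)


def quick_reduct(rows, n_attrs, decision):
    """Greedy QuickReduct: repeatedly add the attribute (sample) that maximizes
    the dependency degree until it equals that of the full attribute set."""
    full = dependency(rows, list(range(n_attrs)), decision)
    reduct = []
    while dependency(rows, reduct, decision) < full:
        remaining = [a for a in range(n_attrs) if a not in reduct]
        reduct.append(max(remaining,
                          key=lambda a: dependency(rows, reduct + [a], decision)))
    return sorted(reduct)


# Toy data: 4 genes (rows) x 3 samples (columns), already normalized.
data = [
    [0.05, 0.90, 0.10],
    [0.95, 0.85, 0.90],
    [0.10, 0.50, 0.05],
    [0.90, 0.10, 0.95],
]
rows, decision = build_table(data)
print(decision)                         # ['low', 'high', 'low', 'high']
print(quick_reduct(rows, 3, decision))  # [0]
```

On this toy table, sample 0 alone already reaches the full dependency degree of 1.0, so the reduct keeps one of the three samples. Sample 1 alone reaches only 0.5, because genes 0 and 1 are both `high` there but carry different decision values.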