The main aim of this paper is to design a scheme that identifies a species from its genome sequence. Feature descriptors for a genome sequence are extracted using the MapReduce framework. Each feature descriptor is a three-letter keyword formed from the nucleotide bases A, T, C, and G. Genome sequences of related species are then clustered according to their feature descriptor counts. A MapReduce version of a clustering model that uses K-means, Differential Evolution (DE), and Ant Colony Optimization (ACO) is proposed. The model improves accuracy because the entire genome sequence is considered, and the inherent parallelism of MapReduce reduces execution time.
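The descriptor-counting step can be illustrated with a minimal, single-machine sketch in Python. It only mimics the map and reduce phases with ordinary functions and is not the paper's implementation. Several details are assumptions the abstract does not specify: the three-letter keywords are read as overlapping windows, windows containing characters other than A, T, C, and G are skipped, and the names `split_with_overlap`, `map_chunk`, `reduce_counts`, and `descriptor_vector` are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import product

K = 3           # descriptor length stated in the abstract
BASES = "ATCG"  # nucleotide alphabet stated in the abstract


def split_with_overlap(sequence, chunk_size):
    """Split a sequence into chunks that share K-1 trailing bases.

    The overlap is an assumption: it ensures that windows spanning a
    chunk boundary are counted exactly once.
    """
    return [sequence[i:i + chunk_size + K - 1]
            for i in range(0, len(sequence), chunk_size)]


def map_chunk(chunk):
    """Map step: count every overlapping three-letter window in one chunk."""
    counts = Counter()
    for i in range(len(chunk) - K + 1):
        word = chunk[i:i + K]
        # Assumption: skip windows with ambiguous symbols such as N.
        if set(word) <= set(BASES):
            counts[word] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def descriptor_vector(sequence, chunk_size=4):
    """Return the 64 descriptor counts in a fixed order (AAA, AAT, ..., GGG)."""
    chunks = split_with_overlap(sequence.upper(), chunk_size)
    total = reduce(reduce_counts, map(map_chunk, chunks), Counter())
    keys = ["".join(p) for p in product(BASES, repeat=K)]
    return [total[key] for key in keys]


if __name__ == "__main__":
    vector = descriptor_vector("ATCGATCG")
    print(len(vector), sum(vector))
```

Each genome then becomes a fixed-length vector of 64 counts, which is the kind of input a clustering method such as K-means could consume. In a real MapReduce deployment the chunks would be processed on separate workers; the chunk size here is kept tiny only so the boundary handling is easy to check by hand.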