Abstract:
Microorganism samples taken directly from the environment are difficult to assemble because they contain mixtures of microorganisms. When a sample is highly complex and comes from a highly diverse environment, assembling its DNA sequences becomes harder because interspecies chimeras can form. To avoid this problem, we propose composition-based binning using unsupervised learning. We used trinucleotide and tetranucleotide frequencies as features and the GSOM algorithm as the clustering method. GSOM was implemented to map the features into a high-dimensional feature space. We tested our method on a small microbial community dataset. Cluster quality was evaluated using three parameters: topographic error, quantization error, and error percentage. The evaluation results show that the best clusters were obtained using GSOM with tetranucleotide frequencies.
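As a rough illustration of the features and two of the quality measures named above, the sketch below computes k-mer frequency vectors (k = 3 for trinucleotides, k = 4 for tetranucleotides) and the quantization and topographic errors of a trained map. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, frequencies are plain relative counts over the 4^k ACGT k-mers (windows with other characters are skipped and reverse complements are not merged), and two map nodes are assumed to be neighbours when their grid positions differ by exactly one step horizontally or vertically. GSOM training itself is not shown; the metrics apply to any map whose node weight vectors and grid positions are known.

```python
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k ACGT k-mers in seq (hypothetical helper).

    Assumption: windows containing non-ACGT characters are skipped and
    reverse complements are counted separately.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows with N or other symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_best_units(x, weights):
    """Indices of the best and second-best matching map nodes for sample x."""
    order = sorted(range(len(weights)), key=lambda n: _dist(x, weights[n]))
    return order[0], order[1]


def quantization_error(samples, weights):
    """Mean distance from each sample to the weight vector of its best-matching node."""
    return sum(_dist(x, weights[two_best_units(x, weights)[0]]) for x in samples) / len(samples)


def topographic_error(samples, weights, coords):
    """Fraction of samples whose two best-matching nodes are not grid neighbours.

    coords[n] is the (row, col) grid position of node n; neighbours are
    assumed to be nodes at Manhattan distance 1 (4-neighbourhood).
    """
    errors = 0
    for x in samples:
        b1, b2 = two_best_units(x, weights)
        (r1, c1), (r2, c2) = coords[b1], coords[b2]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(samples)
```

For example, `kmer_frequencies("ACGTACGT", 4)` returns a 256-element vector in which ACGT occurs in 2 of the 5 windows. Lower quantization error means samples sit closer to their node weights; lower topographic error means the map better preserves neighbourhood structure.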
Published in: 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
Date of Conference: 28-29 September 2013
Date Added to IEEE Xplore: 13 March 2014
Electronic ISBN: 978-1-4799-4692-1