A Method for Evaluating Quality of Clustering DNA Fragments Encoded in Different Nucleotide Frequencies

4 Author(s)
Chon-Kit Kenneth Chan ; DoMME, Univ. of Melbourne, Melbourne, VIC ; Arthur L. Hsu ; Sen-Lin Tang ; Saman K. Halgamuge

The whole-genome shotgun sequencing technique has been successfully applied to environmental genomes. However, a considerable number of DNA sequences and small contigs generally remain unassembled after shotgun sequencing. Binning is the step of grouping these sequences based on biological and molecular features. The combination of oligonucleotide frequency and the Self-Organising Map (SOM) clustering algorithm shows high potential as a compositional binning tool. Because previous work did not provide methods for assessing results, we propose a systematic quantitative method to evaluate the clustering results specifically for this type of application. We used this method to investigate the suitability of each of di-, tri-, tetra- and pentanucleotide frequencies as the training feature for this binning technique. The results show that dinucleotide frequency is unable to bin Wkb DNA sequence fragments into well-clustered species groups. Furthermore, we noticed that increasing the order of the oligonucleotide frequency may degrade the assignment of DNA sequences to classes in our test, which indicates the possible existence of an optimal species-specific oligonucleotide frequency. The results suggest that using trinucleotide frequency in the combined oligonucleotide-frequency and SOM binning process gives sufficiently good clustering quality in this case.
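
To make the notion of an oligonucleotide frequency feature concrete, the following is a minimal Python sketch of how a DNA fragment might be encoded as a di-, tri-, tetra- or pentanucleotide frequency vector (k = 2 to 5). The function name `kmer_frequencies`, the overlapping-window counting, the normalization to relative frequencies, and the skipping of windows that contain ambiguous bases are all illustrative assumptions; the abstract does not specify the exact encoding used in the paper, and neither the SOM training nor the proposed evaluation method is shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Encode a DNA fragment as a vector of relative k-mer frequencies.

    Hypothetical helper for illustration only. The vector has 4**k entries,
    ordered lexicographically over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    # All possible k-mers in a fixed order, so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows containing other symbols (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# One feature vector per oligonucleotide order for a toy fragment.
fragment = "ACGTACGTTGCA"
features = {k: kmer_frequencies(fragment, k) for k in (2, 3, 4, 5)}
```

The vector length grows as 4^k (16, 64, 256 and 1024 entries for k = 2 to 5), so for a fragment of fixed length the higher-order vectors become increasingly sparse. This is one plausible reason why a higher order need not improve binning, although the abstract itself does not state the cause.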

Published in:

Frontiers in the Convergence of Bioscience and Information Technologies, 2007. FBIT 2007

Date of Conference:

11-13 Oct. 2007