Expressed sequence tags (ESTs) are widely used for gene discovery and transcriptome analysis. They are short, single-read fragments of expressed genes, produced from mRNA extracted from living cells. Clustering is a vital computational step in EST processing; its main goal is to ensure that all ESTs originating from the same mRNA are grouped together. EST clustering algorithms fall into two approaches: alignment-based and alignment-free. The latter has been preferred in recent years because it is faster and gives satisfactory results. In this paper, we propose and implement an alignment-free EST clustering algorithm that introduces a distance measure between ESTs based on a combination of the Burrows-Wheeler transform, window length, and word-tuples. We assessed the proposed method on a dataset downloaded from UniGene. Preliminary results show high clustering quality, with clustering accuracy (evaluated using the F-measure) reaching up to 0.9671.
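The abstract names three ingredients (the Burrows-Wheeler transform, a window length, and word-tuples) but does not state how they are combined into a distance. The Python sketch below only illustrates the building blocks under stated assumptions: `bwt` is the textbook rotation-sort transform, and `windowed_word_distance` is a generic d2-style measure (the minimum squared difference of word-tuple counts over all window pairs). The function names, the `$` terminator, and the default `window` and `k` values are hypothetical choices for illustration, not the paper's formula or parameters.

```python
from collections import Counter


def bwt(s: str, terminator: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Assumes `terminator` does not occur in `s` and sorts before every symbol.
    """
    s = s + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def word_counts(s: str, k: int) -> Counter:
    """Count every overlapping word-tuple (k-mer) of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def word_distance(a: str, b: str, k: int) -> int:
    """Squared difference of word-tuple counts between two strings."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


def windowed_word_distance(x: str, y: str, window: int = 6, k: int = 2) -> int:
    """Minimum word_distance over all pairs of length-`window` substrings.

    Illustrative d2-style measure only; not the distance defined in the paper.
    """
    if len(x) < window or len(y) < window:
        raise ValueError("both sequences must be at least `window` long")
    xs = [x[i:i + window] for i in range(len(x) - window + 1)]
    ys = [y[j:j + window] for j in range(len(y) - window + 1)]
    return min(word_distance(wx, wy, k) for wx in xs for wy in ys)
```

Two ESTs that share an identical window get distance 0 under this sketch, so a clustering step could merge sequences whose distance falls below a chosen threshold. The exhaustive comparison of window pairs is quadratic and is meant for clarity, not speed.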
Published in: 2010 3rd International Conference on Biomedical Engineering and Informatics (BMEI), Volume 5
Date of Conference: 16-18 Oct. 2010