Abstract
DNA matching is a crucial step in sequence alignment. Since sequence alignment is an approximate matching process there is a need for good approximate algorithms. The process of matching in sequence alignment is generally finding longest common subsequences. However, finding a longest common subsequence may not be the best solution for either a database match or an assembly. An optimal alignment of subsequences is based on several factors, such as quality of bases, length of overlap, etc. Factors such as quality indicate if the data is an actual read or an experimental error. Fuzzy logic allows tolerance of inexactness or errors in sub sequence matching. We propose fuzzy logic for approximate matching of subsequences. Fuzzy characteristic functions are derived for parameters that influence a match. We develop a prototype for a fuzzy assembler. The assembler is designed to work with low quality data which is generally rejected by most of the existing techniques. We test the assembler on sequences from two genome projects namely, Drosophila melanogaster and Arabidopsis thaliana. The results are compared with other assemblers. The fuzzy assembler successfully assembled sequences and performed similar and in some cases better than existing techniques
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.