Three improvements to the BLASTP search of genome databases

4 Author(s)
S. Delaney ; Dept. of Comput. Sci., Concordia Univ., Montreal, Que., Canada ; G. Butler ; C. Lam ; L. Thiel

The BLASTP program is a search tool for databases of protein sequences that is widely used by biologists as a first step in investigating new genome sequences. BLASTP finds high-scoring local alignments $(q_i q_{i+1} \ldots q_{i+k} \,\|\, s_j s_{j+1} \ldots s_{j+k})$ without gaps between a query sequence $q$ and sequences $s$ in the database. The score of an alignment is the sum of the scores of the individual alignments $q_{i+t} \,\|\, s_{j+t}$ between the amino acids that make up the proteins. These individual scores come from a scoring matrix modeling the rate of evolutionary mutation. Here we provide a detailed description of the original program and three separate optimisations to it. BLASTP consists of three steps, which we call neighbourhood construction, hit detection, and hit extension. The three optimisations target hit extension, since it accounts for 93% of the execution time. The first optimisation alters the data representation of the query sequence and the related code for indexing the scoring matrix. The second optimisation performs extensions in step-sizes of two rather than one. The third optimisation forestalls the calling of the hit extension step in cases that are unlikely to lead to a high-scoring alignment. Individually, the three optimisations show speedups of 15%, 48%, and 63%, respectively.
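The scoring rule in the abstract, where an ungapped alignment's score is the sum of its per-residue pair scores, can be illustrated with a minimal Python sketch. The function names `toy_score` and `ungapped_score` are hypothetical, the match/mismatch values are a stand-in for a real scoring matrix, and indices are 0-based; none of this is taken from the authors' implementation.

```python
from typing import Callable


def toy_score(a: str, b: str) -> int:
    """Illustrative stand-in for a scoring matrix lookup (assumption):
    +2 for identical residues, -1 otherwise. A real search would look up
    an evolutionary substitution matrix instead."""
    return 2 if a == b else -1


def ungapped_score(
    q: str,
    s: str,
    i: int,
    j: int,
    k: int,
    score: Callable[[str, str], int] = toy_score,
) -> int:
    """Score of the ungapped alignment q[i..i+k] || s[j..j+k] (inclusive),
    computed as the sum of the k+1 individual residue-pair scores."""
    return sum(score(q[i + t], s[j + t]) for t in range(k + 1))


# Example: five aligned residue pairs, four matches and one mismatch.
example = ungapped_score("ACDEF", "ACDQF", 0, 0, 4)
```

With the toy values above, four matches and one mismatch give 4 × 2 − 1 = 7. Swapping `toy_score` for a lookup into a real substitution matrix changes the numbers but not the structure of the sum.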

Published in:

Scientific and Statistical Database Management, 2000. Proceedings. 12th International Conference on

Date of Conference: