Fastfinger: A study into the use of compressed residue pair separation matrices for protein sequence comparison

Author: B. Robson, IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598, USA

Protein sequences vary widely in length and in the content that is meaningful to researchers. They are rich in what seems to be “noise,” that is, aspects of lesser interest that obscure the core features needed to establish true relatedness and function. This paper is part of a larger study exploring the efficient use and storage of “fingers” for protein sequence analysis: matrices of uniform size and shape that can “stand for” protein sequences by making the essential aspects of their sequence pattern information more explicit. The essence of the study relates to data compression. Compression suggests an alternative notion of pattern: the concept of “primeness” from number theory is used to define an irreducible and potentially recurrent pattern element, and this idea is then mapped back onto number theory through the unique factorization theorem to define a novel measure of pattern difference. Other possible approaches are also discussed. Because compression and other approximations involve information loss, this is also a study of performance in the face of such loss. Because of this loss, no claims are made that encourage replacing established sequence comparison methods, but the concept may have value in a number of applications within, and outside, molecular biology.
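The abstract does not spell out how primes and unique factorization yield a difference measure. The Python below is a minimal sketch of one possible reading, under assumptions that are not taken from the paper: (1) the hypothetical pattern elements are residue pairs at a given separation, `(a, b, d)` for `d` up to a chosen `max_sep`, loosely echoing the residue pair separation matrices of the title; (2) each distinct element is assigned its own prime; (3) a sequence is encoded as the product of those primes raised to the element counts, which unique factorization makes lossless; and (4) the difference between two sequences is the number of prime factors, counted with multiplicity, left in each code after dividing out their greatest common divisor. All names (`PrimeCodebook`, `encode`, `pattern_difference`, `max_sep`) are illustrative, and this is not presented as the author's algorithm.

```python
from collections import Counter
from itertools import count
from math import gcd


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for small codebooks)."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def pair_elements(seq, max_sep=3):
    """Hypothetical pattern elements: counts of (residue_i, residue_j, separation)."""
    return Counter(
        (seq[i], seq[i + d], d)
        for d in range(1, max_sep + 1)
        for i in range(len(seq) - d)
    )


class PrimeCodebook:
    """Assigns a distinct prime to each pattern element the first time it is seen."""

    def __init__(self):
        self._primes = primes()
        self._code = {}

    def prime_for(self, element):
        if element not in self._code:
            self._code[element] = next(self._primes)
        return self._code[element]

    def known_primes(self):
        return list(self._code.values())


def encode(seq, book, max_sep=3):
    """Product of prime**count over elements; unique factorization makes it invertible."""
    n = 1
    for element, k in pair_elements(seq, max_sep).items():
        n *= book.prime_for(element) ** k
    return n


def big_omega(n, known_primes):
    """Prime factors of n counted with multiplicity (n is built from known primes)."""
    total = 0
    for p in known_primes:
        while n % p == 0:
            n //= p
            total += 1
    return total


def pattern_difference(seq_a, seq_b, book, max_sep=3):
    """Number of pattern elements not shared by the two sequences (0 = identical)."""
    a = encode(seq_a, book, max_sep)
    b = encode(seq_b, book, max_sep)
    g = gcd(a, b)  # elements common to both, at the smaller multiplicity
    known = book.known_primes()
    return big_omega(a // g, known) + big_omega(b // g, known)


book = PrimeCodebook()
print(pattern_difference("MKTAYIAK", "MKTAYIAK", book))  # → 0
print(pattern_difference("MKTAYIAK", "MKTAHIAK", book))  # → 12 (one substitution)
```

Under these assumptions the measure reduces to the size of the multiset symmetric difference of pattern elements; the integer encoding only makes the unique-factorization connection concrete and performs no compression of its own. A single substitution changes every pair element that touches the substituted position, which is why it scores 12 here with `max_sep=3`.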

Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.  

Published in: IBM Systems Journal, Volume 40, Issue 2