A method is described for the identification and classification of proteins encoded in large DNA sequences. Previously, an automated system was introduced for the general detection of amino acid sequence motifs within diverse protein families. The system generated a database consisting of aligned sequence segments (blocks) that correspond to the most highly conserved regions of proteins. This database of blocks can be searched using protein queries for sensitive detection of homology based on both local and global similarities. We show that this database searching approach can also be used to detect distant relatives encoded in very large DNA sequences. The approach is illustrated by the detection of known and new relationships in the 315 kilobase sequence of yeast chromosome III.
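As a rough illustration of the idea, the sketch below translates a DNA sequence in all six reading frames and slides a block of aligned segments along each translation. The abstract does not say how DNA is converted into queries or how blocks are scored, so both the six-frame translation and the simple column-frequency score are assumptions made here for illustration, and the function names (`six_frame_translations`, `best_block_hit`) are hypothetical rather than part of the described system.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def best_block_hit(protein, block):
    """Slide a block (equal-length aligned segments) along a protein.

    Assumed toy score: for each block column, the fraction of segments whose
    residue matches the protein at that position, summed over columns.
    Returns (best_score, start_index), or (0.0, -1) if nothing scores above 0.
    """
    width = len(block[0])
    columns = [Counter(seg[i] for seg in block) for i in range(width)]
    best = (0.0, -1)
    for start in range(len(protein) - width + 1):
        score = sum(columns[i][protein[start + i]] for i in range(width)) / len(block)
        if score > best[0]:
            best = (score, start)
    return best
```

Under these assumptions, a scan would call `best_block_hit` on every frame returned by `six_frame_translations` and keep the highest-scoring hits. A real search would use a properly weighted scoring scheme and a significance threshold rather than raw column frequencies.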
Published in:
System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference on (Volume: 5)
Date of Conference: 4-7 Jan. 1994