By Topic

A Visual Analytics approach to identifying protein structural constraints

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

1 Author(s)
William C. Ray ; The Research Institute at Nationwide Children's Hospital, The Ohio State University Biophysics Program

Predicting protein structures has long been a grand-challenge problem. Fine-grained computational simulation of folding events from a protein's synthesis to its final stable structure remains computationally intractable. Therefore, methods which derive constraints from other sources are attractive. To date, constraints derived from known structures have proven to be highly successful. However, these cannot be applied to molecules with no identifiable neighbors having already-determined structures. For such molecules, structural constraints must be derived in other ways. One popular approach has been the statistical analysis of large families of proteins, with the hope that residues that “change together” (co-evolve) imply that those residues are in contact. Unfortunately, despite repeated attempts to use this data to deduce structural constraints, this approach has met with minimal success. The consensus of current literature concludes that there is simply too little information contained within the correlated mutations of many protein families to reliably and generally predict structural constraints. Recent work in my laboratory challenges this conclusion. For some time we have been developing methods (MAVL/StickWRLD) to visualize the pattern of co-evolved mutations within sequence families. While our analysis of individual correlations agrees with the literature consensus, we have recently discovered that the visualized pattern of correlations is highly suggestive of structural relationships. In our preliminary test cases, human researchers can unambiguously determine many positive structural constraints by visual analysis of statistical sequence information alone, often with no training on interpretation of the visualization results. Herein we report the visualization design that supports this Visual Analytics approach to identifying high-confidence hypotheses about protein folding from protein sequence, and illustrate preliminary results from th- - is research. Our approach entails a higher-dimensional extension of parallel coordinates which illuminates distant shared sub-tuples of the vectors representing each protein sequence when these sub-tuples occur with an over abundance compared to expectations. It simultaneously eliminates all representations of tuples which occur with frequency near the expected norm. The result is a minimally-occluded representation of outlier, and only outlier co-occurrences within the sequence families.

Published in:

Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on

Date of Conference:

25-26 Oct. 2010