
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 1 • Jan.-Feb. 2012


Displaying Results 1 - 25 of 35
  • [Front cover]

    Publication Year: 2012 , Page(s): c1
    PDF (631 KB) | Freely Available from IEEE
  • [Cover 2]

    Publication Year: 2012 , Page(s): c2
    PDF (106 KB) | Freely Available from IEEE
  • A Memory Efficient Method for Structure-Based RNA Multiple Alignment

    Publication Year: 2012 , Page(s): 1 - 11
    PDF (1685 KB) | HTML

    Structure-based RNA multiple alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for RNA multiple alignment first generate pairwise RNA structure alignments and then build the multiple alignment using only sequence information. Here we present PMFastR, an algorithm that iteratively uses a sequence-structure alignment procedure to build a structure-based RNA multiple alignment from one sequence with known structure and a database of sequences from the same family. PMFastR also has low memory consumption, allowing for the alignment of large sequences such as 16S and 23S rRNA, and it can exploit a multicore environment. We present results on benchmark data sets from BRAliBase, which show that PMFastR performs comparably to other state-of-the-art programs. Finally, we regenerate 607 Rfam seed alignments and show that our automated process creates multiple alignments similar to the manually curated Rfam seed alignments. Thus, the techniques presented in this paper allow for the generation of multiple alignments under sequence-structure guidance while limiting memory consumption; as a result, multiple alignments of long RNA sequences, such as 16S and 23S rRNAs, can easily be generated locally on a personal computer. The software and supplementary data are available at http://genome.ucf.edu/PMFastR.

  • An Efficient Algorithm for Haplotype Inference on Pedigrees with Recombinations and Mutations

    Publication Year: 2012 , Page(s): 12 - 25
    Cited by:  Papers (2)
    PDF (1261 KB)

    Haplotype Inference (HI) is a computational challenge of crucial importance in a range of genetic studies. Pedigrees allow haplotypes to be inferred from genotypes more accurately than population data, since Mendelian inheritance restricts the set of possible solutions. In this work, we define a new HI problem on pedigrees, the Minimum-Change Haplotype Configuration (MCHC) problem, which allows two types of genetic variation events: recombinations and mutations. Our formulation extends the Minimum-Recombinant Haplotype Configuration (MRHC) problem, which was proposed in the literature to overcome the limitations of classic statistical haplotyping methods. Our contribution is twofold. First, we prove that the MCHC problem is APX-hard under several restrictions. Second, we propose an efficient and accurate heuristic algorithm for MCHC based on an L-reduction to a well-known coding problem. Our heuristic can also be used to solve the original MRHC problem and can take advantage of additional knowledge about the input genotypes. Moreover, the L-reduction proves for the first time that MCHC and MRHC are O(nm/log nm)-approximable on general pedigrees, where n is the pedigree size and m is the genotype length. Finally, we present an extensive experimental evaluation and comparison of our heuristic algorithm with several other state-of-the-art methods for HI on pedigrees.
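The Mendelian constraint that makes pedigree data informative can be illustrated at a single locus: each child allele must be contributable by one parent. A minimal sketch follows; the genotype encoding and function name are illustrative, not the paper's MCHC formulation.

```python
def mendel_consistent(child, mother, father):
    # One biallelic locus; genotypes are unordered allele pairs.
    # The child is consistent if one allele can come from the mother
    # and the other from the father, in either assignment.
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

mendel_consistent((0, 1), (0, 0), (1, 1))  # consistent: 0 from mother, 1 from father
mendel_consistent((1, 1), (0, 0), (1, 1))  # inconsistent: mother cannot contribute a 1
```

Formulations such as MRHC/MCHC search for haplotype assignments over the whole pedigree that minimize the number of genetic-variation events needed to explain violations of constraints like this one.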

  • An Efficient Method for Exploring the Space of Gene Tree/Species Tree Reconciliations in a Probabilistic Framework

    Publication Year: 2012 , Page(s): 26 - 39
    PDF (912 KB) | HTML

    Background. Inferring an evolutionary scenario for a gene family is a fundamental problem with applications in both functional and evolutionary genomics. The gene tree/species tree reconciliation approach has been widely used to address this problem, but mostly in a discrete parsimony framework that aims at minimizing the number of gene duplications and/or gene losses. Recently, a probabilistic approach based on the classical birth-and-death process has been developed, including efficient algorithms for computing posterior probabilities of reconciliations and orthology prediction. Results. In previous work, we described an algorithm for exploring the whole space of gene tree/species tree reconciliations, which we adapt here to efficiently compute the posterior probability of such reconciliations. These posterior probabilities can be either computed exactly or approximated, depending on the size of the reconciliation space. We use this algorithm to analyze the probabilistic landscape of the space of reconciliations for a real data set of fungal gene families and several data sets of synthetic gene trees. Conclusion. The results of our simulations suggest that, with exact gene trees obtained by a simple birth-and-death process and realistic gene duplication/loss rates, only a very small subset of all reconciliations needs to be explored in order to closely approximate the posterior probability of the most likely reconciliations. For cases where the posterior probability mass is more evenly dispersed, our method allows the required subspace of reconciliations to be explored efficiently.

  • An Efficient Method for Modeling Kinetic Behavior of Channel Proteins in Cardiomyocytes

    Publication Year: 2012 , Page(s): 40 - 51
    PDF (4764 KB) | HTML

    Characterization of the kinetic and conformational properties of channel proteins is a crucial element in the integrative study of congenital cardiac diseases. The proteins of the ion channels of cardiomyocytes represent an important family of biological components determining the physiology of the heart. Some computational studies aiming to understand the mechanisms of the ion channels of cardiomyocytes have concentrated on Markovian stochastic approaches. Mathematically, these approaches employ Chapman-Kolmogorov equations coupled with partial differential equations. As the scale and complexity of such subcellular and cellular models increase, the balance between efficiency and accuracy of algorithms becomes critical. We have developed a novel two-stage splitting algorithm to address the efficiency and accuracy issues arising in such modeling and simulation scenarios. Numerical experiments were performed by incorporating our newly developed conformational kinetic model for the rapid delayed rectifier potassium channel into dynamic models of human ventricular myocytes. Our results show that the new algorithm significantly outperforms commonly adopted adaptive Runge-Kutta methods. Furthermore, our parallel simulations with coupled algorithms for multicellular cardiac tissue demonstrate nearly linear speedup in large-scale cardiac simulations.

  • An Information Theoretic Approach to Constructing Robust Boolean Gene Regulatory Networks

    Publication Year: 2012 , Page(s): 52 - 65
    Cited by:  Papers (4)
    Multimedia
    PDF (742 KB) | HTML

    We introduce a class of finite-system models of gene regulatory networks exhibiting behavior of the cell cycle. The network is an extension of a Boolean network model. The system spontaneously cycles through a finite set of internal states, tracking the increase of an external factor such as cell mass, and also exhibits checkpoints at which errors in gene expression levels due to cellular noise are automatically corrected. We present a 7-gene network based on Projective Geometry codes that can correct, at any given time, one gene expression error. The topology of the network is highly symmetric and requires only simple Boolean functions that can be synthesized using genes of various organisms. The attractor structure of the Boolean network contains a single cycle attractor, and it is the smallest nontrivial network with such high robustness. The methodology allows the construction of artificial gene regulatory networks with more phases than the natural cell cycle.
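For reference, the attractor of a synchronous Boolean network can be found by iterating the update map until a state repeats. The 3-gene rotation rule below is a toy illustration, not the paper's 7-gene code-based network.

```python
def attractor(update, state):
    # Iterate a synchronous Boolean network until a state repeats,
    # then return the cycle of states that is revisited forever.
    seen, trajectory = {}, []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[seen[state]:]

# Toy update rule: each gene copies the next gene (a pure rotation),
# giving a single cycle attractor of length 3.
rotate = lambda s: (s[1], s[2], s[0])
cycle = attractor(rotate, (1, 0, 0))  # [(1, 0, 0), (0, 0, 1), (0, 1, 0)]
```

The same exhaustive iteration idea underlies checking that a designed network has exactly one cycle attractor, as the abstract claims for its 7-gene construction.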

  • Assortative mixing in directed biological networks

    Publication Year: 2012 , Page(s): 66 - 78
    Cited by:  Papers (5)
    PDF (1158 KB) | HTML

    We analyze assortative mixing patterns of biological networks, which are typically directed. We develop a theoretical framework for analyzing mixing patterns in directed networks and then apply it to specific biological networks. Two new quantities are introduced, the in-assortativity and the out-assortativity, which are shown to be useful in quantifying assortative mixing in directed networks. We also introduce local (node-level) versions of the in- and out-assortativity. Local assortativity profiles are the distributions of these local quantities over node degrees and can be used to analyze both canonical and real-world directed biological networks. Many biological networks that had previously been classified as disassortative are shown to be assortative with respect to these new measures. Finally, we demonstrate the use of local assortativity profiles in analyzing the functionalities of particular nodes and groups of nodes in real-world biological networks.
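The edge-based flavor of such measures can be sketched as a Pearson correlation of endpoint degrees taken over the directed edges. The paper's exact in-/out-assortativity definitions may differ in normalization; this is a generic illustration with a made-up edge list.

```python
from collections import Counter

def degree_correlation(edges, src_degree, dst_degree):
    # Pearson correlation between a chosen degree of the source node
    # and a chosen degree of the target node, over all directed edges.
    xs = [src_degree[u] for u, v in edges]
    ys = [dst_degree[v] for u, v in edges]
    n = len(edges)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
out_deg = Counter(u for u, _ in edges)
in_deg = Counter(v for _, v in edges)
r_out = degree_correlation(edges, out_deg, out_deg)  # out-degree vs out-degree
r_in = degree_correlation(edges, in_deg, in_deg)     # in-degree vs in-degree
```

Positive values indicate assortative mixing (high-degree nodes linking to high-degree nodes), negative values disassortative mixing.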

  • Composition Vector Method Based on Maximum Entropy Principle for Sequence Comparison

    Publication Year: 2012 , Page(s): 79 - 87
    PDF (899 KB) | HTML

    The composition vector (CV) method is an alignment-free method for sequence comparison. Because of its simplicity compared with multiple sequence alignment methods, it has been widely discussed lately, and several formulas based on probabilistic models, such as Hao's and Yu's formulas, have been proposed. In this paper, we improve these formulas by using the maximum entropy principle, which can quantify the nonrandom occurrence of patterns in the sequences. More precisely, existing formulas are used to generate a set of possible formulas, from which we choose the one that maximizes the entropy; we give the closed-form solution to the resulting optimization problem. Hence, from any given CV formula, we can find the corresponding one that maximizes the entropy. In particular, we show that Hao's formula itself maximizes the entropy, and we derive a new entropy-maximizing formula from Yu's formula. We illustrate the accuracy of our new formula using both simulated and experimental data sets. For the simulated data sets, our new formula gives the best consensus and significance values for three different kinds of evolution models. For the data set of tetrapod 18S rRNA sequences, our new formula correctly groups the bird and reptile clades together, where Hao's and Yu's formulas fail. Using real data sets of different sizes, we show that our formula is more accurate than Hao's and Yu's formulas even for small data sets.
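The flavor of such probabilistic-model formulas can be sketched as follows: compare each observed k-mer frequency with a Markov-style expectation built from shorter substring frequencies, in the spirit of Hao's formula. Pseudocounts, normalization, and boundary handling are simplified here, so this is an illustration of the idea rather than any published formula.

```python
from collections import Counter

def cv_components(seq, k=3):
    # Observed k-mer frequency minus a Markov-model expectation built
    # from (k-1)- and (k-2)-mer frequencies, normalized by that expectation.
    def freqs(n):
        c = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
        total = sum(c.values())
        return {w: c[w] / total for w in c}

    f_k, f_k1, f_k2 = freqs(k), freqs(k - 1), freqs(k - 2)
    comp = {}
    for w, f in f_k.items():
        f0 = f_k1[w[:-1]] * f_k1[w[1:]] / f_k2[w[1:-1]]
        comp[w] = (f - f0) / f0 if f0 > 0 else 0.0
    return comp

cv = cv_components("ACGTACGTACGGT", k=3)
```

Two sequences are then compared by the cosine distance between their component vectors, which is what makes the method alignment-free.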

  • Disease Liability Prediction from Large Scale Genotyping Data Using Classifiers with a Reject Option

    Publication Year: 2012 , Page(s): 88 - 97
    PDF (1231 KB) | HTML

    Genome-wide association studies (GWA) try to identify the genetic polymorphisms associated with variation in phenotypes. However, even the most significant genetic variants may have little predictive power to forecast the future development of common diseases. We study the prediction of the risk of developing a disease from genome-wide genotypic data using classifiers with a reject option, which make a prediction only when they are sufficiently certain but in doubtful situations may refrain from classifying. To test the reliability of our proposal, we used the Wellcome Trust Case Control Consortium (WTCCC) data set, comprising 14,000 cases of seven common human diseases and 3,000 shared controls.
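A reject-option classifier can be as simple as Chow's rule: output the most probable class only when its posterior clears a confidence threshold, otherwise abstain. The threshold and labels below are illustrative, not taken from the paper.

```python
def classify_with_reject(posteriors, threshold=0.8):
    # Return the most probable label if we are confident enough,
    # otherwise abstain (None) instead of risking a wrong call.
    label = max(posteriors, key=posteriors.get)
    return label if posteriors[label] >= threshold else None

classify_with_reject({"case": 0.9, "control": 0.1})    # -> "case"
classify_with_reject({"case": 0.55, "control": 0.45})  # -> None (rejected)
```

Raising the threshold trades coverage (fraction of samples classified) for accuracy on the samples that are classified, which is the trade-off such studies evaluate.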

  • Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning

    Publication Year: 2012 , Page(s): 98 - 112
    Cited by:  Papers (5)
    Multimedia
    PDF (1555 KB)

    In studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. Gene expressions captured in these images have been manually annotated with anatomical and developmental ontology terms using a controlled vocabulary (CV); these annotations are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, the process of manual annotation has become increasingly cumbersome, and computational methods to automate this task are urgently needed. However, the automated annotation of embryo images is challenging, because the annotation terms spatially correspond to local expression patterns of images, yet they are assigned collectively to groups of images, and it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the underlying annotation task is naturally a MIML learning problem. Then, we propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) reveal that exploiting the MIML framework leads to significant performance improvements over state-of-the-art approaches.

  • Inferring the Number of Contributors to Mixed DNA Profiles

    Publication Year: 2012 , Page(s): 113 - 122
    PDF (1272 KB) | HTML

    Forensic samples containing DNA from two or more individuals can be difficult to interpret; even ascertaining the number of contributors to the sample can be challenging. These uncertainties can dramatically reduce the statistical weight attached to evidentiary samples. A probabilistic mixture algorithm that takes into account not just the number and magnitude of the alleles at a locus but also their frequency of occurrence allows the determination of likelihood ratios for different hypotheses concerning the number of contributors to a specific mixture. This algorithm can compute the probability of the alleles in a sample being present in a two-person mixture, a three-person mixture, and so on; the ratio of any two of these probabilities then constitutes a likelihood ratio pertaining to the number of contributors to such a mixture.
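The core computation can be sketched with the standard inclusion-exclusion formula for the probability that the 2n alleles carried by n contributors, drawn at population frequencies, produce exactly the observed allele set at a locus. This is a textbook simplification (real casework must also handle drop-out, relatives, and subpopulation structure), and the frequencies below are illustrative.

```python
from itertools import combinations

def mixture_prob(observed_freqs, n_contributors):
    # P(2n independent draws from the population land inside the observed
    # allele set AND every observed allele appears at least once),
    # computed by inclusion-exclusion over subsets of the observed set.
    alleles = list(observed_freqs)
    m = 2 * n_contributors
    total = 0.0
    for r in range(1, len(alleles) + 1):
        for sub in combinations(alleles, r):
            s = sum(observed_freqs[a] for a in sub)
            total += (-1) ** (len(alleles) - r) * s ** m
    return total

p2 = mixture_prob({"A": 0.1, "B": 0.2, "C": 0.3}, 2)  # two-person hypothesis
p3 = mixture_prob({"A": 0.1, "B": 0.2, "C": 0.3}, 3)  # three-person hypothesis
lr = p3 / p2  # likelihood ratio for 3 vs 2 contributors at this locus
```

Multiplying such per-locus ratios across independent loci gives an overall likelihood ratio for the competing contributor-number hypotheses.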

  • Intervention in Gene Regulatory Networks via Phenotypically Constrained Control Policies Based on Long-Run Behavior

    Publication Year: 2012 , Page(s): 123 - 136
    Cited by:  Papers (6)
    Multimedia
    PDF (1588 KB) | HTML

    A salient purpose of studying gene regulatory networks is to derive intervention strategies that identify potential drug targets and design gene-based therapeutic interventions. Optimal and approximate intervention strategies based on the transition probability matrix of the underlying Markov chain have been studied extensively for probabilistic Boolean networks. While the key goal of control is to reduce the steady-state probability mass of undesirable network states, in practice it is important to limit collateral damage, and this constraint should be taken into account when designing intervention strategies with network models. In this paper, we propose two new phenotypically constrained stationary control policies derived by directly investigating their effects on the network's long-run behavior. They are designed to reduce the risk of visiting undesirable states, in conjunction with constraints on the shift of undesirable steady-state mass, so that only limited collateral damage can be introduced. We have studied the performance of the new constrained control policies, together with previous greedy control policies, on randomly generated probabilistic Boolean networks. A preliminary example of intervening in a metastatic melanoma network is also given to show their potential application in designing genetic therapeutics that reduce the risk of entering both aberrant phenotypes and other ambiguous states corresponding to complications or collateral damage. Experiments on both random network ensembles and the melanoma network demonstrate that, in general, the proposed control policies exhibit the desired performance. As shown by intervening in the melanoma network, these control policies can potentially serve as practical gene therapeutic intervention strategies.

  • Iterative Dictionary Construction for Compression of Large DNA Data Sets

    Publication Year: 2012 , Page(s): 137 - 149
    Cited by:  Papers (5)
    Multimedia
    PDF (989 KB) | HTML

    Genomic repositories increasingly include individual as well as reference sequences, which tend to share long identical and near-identical strings of nucleotides. However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. An order-insensitive, disk-based dictionary construction method can detect this repeated content and use it to compress collections of sequences. We explore a dictionary construction method that improves repeat identification in large DNA data sets. Our adaptation, Comrad, of an existing disk-based method identifies exact repeated content in collections of sequences, with similarities within and across the set of input sequences. Comrad compresses the data over multiple passes, an expensive process, but one that allows it to compress large data sets in reasonable time and space. Comrad also allows random access to individual sequences and subsequences without decompressing the whole data set. Comrad has no competitor in terms of the size of data sets that it can compress (extending to many hundreds of gigabytes), and even for smaller data sets its results are competitive with alternatives; as an example, 39 S. cerevisiae genomes compressed to 0.25 bits per base.

  • Matching Split Distance for Unrooted Binary Phylogenetic Trees

    Publication Year: 2012 , Page(s): 150 - 160
    Cited by:  Papers (1)
    Multimedia
    PDF (1358 KB) | HTML

    The reconstruction of evolutionary trees is one of the primary objectives in phylogenetics. Such a tree represents the historical evolutionary relationships between different species or organisms. Tree comparisons are used for multiple purposes, from unveiling the history of species to deciphering evolutionary associations among organisms and geographical areas. In this paper, we propose a new method of defining distances between unrooted binary phylogenetic trees that is especially useful for relatively large trees. We then investigate in detail the properties of one example of these metrics, called the Matching Split distance, and describe how the general method can be extended to nonbinary trees.

  • Memory Efficient Algorithms for Structural Alignment of RNAs with Pseudoknots

    Publication Year: 2012 , Page(s): 161 - 168
    Cited by:  Papers (2)
    PDF (367 KB) | HTML

    In this paper, we consider the problem of structural alignment of a target RNA sequence of length n and a query RNA sequence of length m with known secondary structure that may contain simple pseudoknots or embedded simple pseudoknots. The best known algorithm for this problem runs in O(mn^3) time for simple pseudoknots or O(mn^4) time for embedded simple pseudoknots, with O(mn^3) space for both structures; this memory requirement makes it infeasible to compare noncoding RNAs (ncRNAs) of length several hundred or more. We propose memory-efficient algorithms to solve the same problem. We reduce the space complexity to O(n^3) for simple pseudoknots and O(mn^2 + n^3) for embedded simple pseudoknots while maintaining the same time complexity. We also show how to modify our algorithm to handle a restricted class of recursive simple pseudoknots, which is abundant in real data, with space complexity of O(mn^2 + n^3) and time complexity of O(mn^4). Experimental results show that our algorithms are feasible for comparing ncRNAs of length more than 500.

  • Multiobjective Optimization Based-Approach for Discovering Novel Cancer Therapies

    Publication Year: 2012 , Page(s): 169 - 184
    PDF (3939 KB) | HTML

    Solid tumors must recruit new blood vessels for growth and maintenance. Discovering drugs that block tumor-induced development of new blood vessels (angiogenesis) is an important approach in cancer treatment. The complexity of angiogenesis presents both challenges and opportunities for cancer therapies. Intuitive approaches, such as blocking VEGF activity, have yielded important therapies, but there may be opportunities to alter nonintuitive targets, either alone or in combination. This paper describes the development of a high-fidelity simulation of angiogenesis and uses it as the basis for a parallel search-based approach to discovering novel potential cancer treatments that inhibit blood vessel growth. Discovering new therapies is viewed as a multiobjective combinatorial optimization over two competing objectives: minimizing the estimated cost of practically developing the intervention while minimizing the simulated oxygen provided to the tumor by angiogenesis. Results show the effectiveness of the search process in finding interventions that are currently in use and, more interestingly, in discovering potential new approaches that are nonintuitive yet effective.

  • Parameter Estimation Using Metaheuristics in Systems Biology: A Comprehensive Review

    Publication Year: 2012 , Page(s): 185 - 202
    Cited by:  Papers (2)
    PDF (1132 KB) | HTML

    This paper gives a comprehensive review of the application of metaheuristics to optimization problems in systems biology, focusing mainly on the parameter estimation problem (also called the inverse problem or model calibration). It is intended both for the systems biologist who wishes to learn more about the various optimization techniques available and for the metaheuristics researcher interested in applying such techniques to problems in systems biology. First, the parameter estimation problems arising in different areas of systems biology are described from the point of view of machine learning. Brief descriptions of various metaheuristics developed for these problems follow, along with outlines of their advantages and disadvantages. Several important issues in applying metaheuristics to systems biology modeling problems are addressed, including the reliability and identifiability of model parameters and the optimal design of experiments. Finally, we highlight some possible future research directions in this field.

  • Predicting Metal-Binding Sites from Protein Sequence

    Publication Year: 2012 , Page(s): 203 - 213
    Cited by:  Papers (1)
    Multimedia
    PDF (736 KB) | HTML

    Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous number of alternative candidate configurations. Previous research has considered this prediction problem only when starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available; the sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best-scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from those used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds.
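The matroid observation is what licenses a greedy algorithm: under the "one ion per residue" constraint, candidate bonds form a partition matroid over residues, for which selecting candidates in score order is optimal. A sketch with made-up residues, ions, and scores (the paper's actual scoring model is learned, not hand-assigned):

```python
def greedy_partition_matroid(candidates, score):
    # Pick candidate (residue, ion) bonds in decreasing score order,
    # skipping any bond whose residue is already coordinating an ion.
    chosen, used = [], set()
    for residue, ion in sorted(candidates, key=score, reverse=True):
        if residue not in used:
            chosen.append((residue, ion))
            used.add(residue)
    return chosen

scores = {("C37", "Zn1"): 0.9, ("C37", "Zn2"): 0.6, ("H64", "Zn1"): 0.8}
bonds = greedy_partition_matroid(scores, scores.get)
# -> [("C37", "Zn1"), ("H64", "Zn1")]: C37 binds once, its weaker bond is dropped
```

Greedy selection is provably optimal on matroids, which is why the constraint structure, not just the scores, matters for tractable inference.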

  • Reassortment Networks and the Evolution of Pandemic H1N1 Swine-Origin Influenza

    Publication Year: 2012 , Page(s): 214 - 227
    Multimedia
    PDF (3018 KB) | HTML

    Prior research developed Reassortment Networks to reconstruct the evolution of segmented viruses under both reassortment and mutation. We report their application to the swine-origin pandemic H1N1 virus (S-OIV). A database of all influenza A viruses for which complete genome sequences were available in GenBank by October 2009 was created, and dynamic programming was used to compute distances between all corresponding segments. A reassortment network was then created to obtain the minimum-cost evolutionary paths from all viruses to the exemplar S-OIV A/California/04/2009. This analysis took 35 hours on the Cray Extreme Multithreading (XMT) supercomputer, which has special hardware to permit efficient parallelization. Six specific H1N1/H1N2 bottleneck viruses were identified that almost always lie on minimum-cost paths to S-OIV. We conjecture that these viruses are crucial to S-OIV evolution and worthy of careful study from a molecular biology viewpoint. In phylogenetics, ancestors are typically medians that have no functional constraints; in our method, ancestors are not inferred but rather chosen from previously observed viruses along a path of mutation and reassortment leading to the target virus. This specificity and functional constraint render our results actionable for further experiments in vitro and in vivo.
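The per-segment distances feeding such a network can be computed with classic dynamic programming. A plain edit-distance sketch follows; whether the study used unit costs or a nucleotide scoring matrix is not stated here, so unit costs are an assumption.

```python
def edit_distance(a, b):
    # Levenshtein distance with a rolling one-row DP table:
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + (ca != cb)))  # match/substitute
        prev = cur
    return prev[-1]

d = edit_distance("ACGTT", "AGGTC")  # 2 substitutions
```

Computing such distances between every pair of corresponding segments across thousands of genomes is what makes the all-against-all step expensive enough to motivate the XMT supercomputer run described above.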

  • RENNSH: A Novel alpha-Helix Identification Approach for Intermediate Resolution Electron Density Maps

    Publication Year: 2012 , Page(s): 228 - 239
    PDF (2927 KB) | HTML

    Accurate identification of protein secondary structures is beneficial for understanding the three-dimensional structures of biological macromolecules. In this paper, a novel refined classification framework is proposed that treats α-helix identification as a machine learning problem by representing each voxel in the density map with its Spherical Harmonic Descriptors (SHD). An energy function is defined to provide statistical analysis of identification performance and can be applied to all α-helix identification approaches. Compared with other existing α-helix identification methods for intermediate-resolution electron density maps, the experimental results demonstrate that our approach gives the best identification accuracy and is more robust to noise.

  • Residues with Similar Hexagon Neighborhoods Share Similar Side-Chain Conformations

    Publication Year: 2012 , Page(s): 240 - 248
    Cited by:  Papers (1)
    PDF (1287 KB) | HTML

    We present in this study a new approach to coding protein side-chain conformations into hexagon substructures. Classical side-chain packing methods consist of two steps: first, side-chain conformations, known as rotamers, are extracted from known protein structures as candidates for each residue; second, a search method along with an energy function is used to resolve conflicts among residues and to optimize the combination of side-chain conformations for all residues. These methods benefit from the fact that the number of possible side-chain conformations is limited and the rotamer candidates are readily extracted; however, they also suffer from the inaccuracy of energy functions. Inspired by threading and ab initio approaches to protein structure prediction, we propose to use hexagon substructures to implicitly capture subtle issues of energy functions. Our initial results indicate that even without guidance from an energy function, hexagon structures alone can capture side-chain conformations at an accuracy of 83.8 percent, higher than the 82.6 percent achieved by state-of-the-art side-chain packing methods.

  • Smolign: A Spatial Motifs-Based Protein Multiple Structural Alignment Method

    Publication Year: 2012 , Page(s): 249 - 261
    Cited by:  Papers (2)

    Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for the discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on building a contact-window-based motif library from the protein structural data, discovering and extending common alignment seeds from this library, and optimally superimposing multiple structures according to these alignment seeds by an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to catch alignments globally, and to support flexible alignments yields a sensitive and robust automated algorithm that can expose similarities among protein structures even under low-similarity conditions. Our method produces better alignment results than other popular MSTA methods on several protein structure data sets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed.drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.

  • Stable Gene Selection from Microarray Data via Sample Weighting

    Publication Year: 2012 , Page(s): 262 - 272
    Cited by:  Papers (8)

    Feature selection from gene expression microarray data is a widely used technique for selecting candidate genes in various cancer studies. Besides the predictive ability of the selected genes, an important aspect in evaluating a selection method is the stability of the selected genes. Experts instinctively have high confidence in the result of a selection method that selects similar sets of genes under some variations to the samples. However, a common problem of existing feature selection methods for gene expression data is that the genes selected by the same method often vary significantly with sample variations. In this work, we propose a general framework of sample weighting to improve the stability of feature selection methods under sample variations. The framework first weights each sample in a given training set according to its influence on the estimation of feature relevance, and then provides the weighted training set to a feature selection method. We also develop an efficient margin-based sample weighting algorithm under this framework. Experiments on a set of microarray data sets show that the proposed algorithm significantly improves the stability of representative feature selection algorithms such as SVM-RFE and ReliefF, without sacrificing their classification performance. Moreover, the proposed algorithm also leads to more stable gene signatures than the state-of-the-art ensemble method, particularly for small signature sizes.
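
    The framework's two steps (weight each sample, then hand the weighted training set to a feature selector) can be sketched generically. The toy relevance score below is an illustrative stand-in, not the authors' margin-based weighting algorithm or SVM-RFE/ReliefF:

    ```python
    def weighted_relevance(X, y, w):
        """Score each feature by the weighted difference of class means --
        a toy stand-in for relevance estimates such as ReliefF scores."""
        n_features = len(X[0])
        scores = []
        for j in range(n_features):
            s0 = s1 = w0 = w1 = 0.0
            for xi, yi, wi in zip(X, y, w):
                if yi == 0:
                    s0 += wi * xi[j]; w0 += wi
                else:
                    s1 += wi * xi[j]; w1 += wi
            scores.append(abs(s0 / w0 - s1 / w1))
        return scores

    def select_genes(X, y, w, k):
        """Step 2 of the framework: rank features on the *weighted*
        training set and keep the top k as the gene signature."""
        scores = weighted_relevance(X, y, w)
        return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

    # Four samples, two genes; gene 0 separates the classes, gene 1 does not.
    X = [[5.0, 1.0], [6.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    y = [0, 0, 1, 1]
    w = [1.0, 1.0, 1.0, 1.0]  # uniform here; the paper learns these weights
    print(select_genes(X, y, w, k=1))  # → [0]
    ```

    The point of the framework is that only `w` changes between runs on perturbed samples, so any downstream selector can be stabilized without modifying the selector itself.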

  • The Complexity of Finding Multiple Solutions to Betweenness and Quartet Compatibility

    Publication Year: 2012 , Page(s): 273 - 285

    We show that two important problems with applications in computational biology are ASP-complete, which implies that, given a solution to an instance, it is NP-complete to decide whether another solution exists. We first show that a variation of BETWEENNESS, the underlying problem of questions related to radiation hybrid mapping, is ASP-complete. We then use that result to show that QUARTET COMPATIBILITY, a fundamental problem in phylogenetics that asks whether a set of quartets can be represented by a parent tree, is also ASP-complete. The latter result shows that Steel's QUARTET CHALLENGE, which asks whether a solution to QUARTET COMPATIBILITY is unique, is coNP-complete.
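
    As a concrete illustration of the BETWEENNESS problem underlying the first result (a minimal sketch of the decision problem only; the paper's ASP-completeness reduction is not reproduced here), verifying a candidate ordering against a set of betweenness constraints is straightforward:

    ```python
    def satisfies_betweenness(order, triples):
        """Check whether a total order (a list of items) satisfies a set of
        betweenness constraints: for each triple (a, b, c), item b must lie
        strictly between a and c in the order (in either direction)."""
        pos = {item: i for i, item in enumerate(order)}
        return all(
            min(pos[a], pos[c]) < pos[b] < max(pos[a], pos[c])
            for a, b, c in triples
        )

    constraints = [("a", "b", "c"), ("b", "c", "d")]
    print(satisfies_betweenness(list("abcd"), constraints))  # → True
    print(satisfies_betweenness(list("acbd"), constraints))  # → False
    ```

    Verification is easy, but deciding whether any satisfying order exists is NP-complete, and the ASP-completeness result says that even with one satisfying order in hand, deciding whether a second one exists remains NP-complete.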


Aims & Scope

This bimonthly publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu