
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 5 • Date Sept.-Oct. 2012

  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • Guest Editorial: Application and Development of Bioinformatics

    Page(s): 1265
  • QuickVina: Accelerating AutoDock Vina Using Gradient-Based Heuristics for Global Optimization

    Page(s): 1266 - 1272

    Predicting the binding between a macromolecule and a small molecule is a crucial step in rational drug design. AutoDock Vina, released in 2009 and one of the most widely used docking programs, uses an empirical scoring function to evaluate the binding affinity between the molecules and employs an iterated local search global optimizer, achieving significantly higher speed and better binding-mode prediction accuracy than its predecessor, AutoDock 4. In this paper, we propose a further improvement to Vina's local search algorithm that heuristically prevents some intermediate points from undergoing local search. Our improved version of Vina, dubbed QVina, achieved a maximum acceleration of about 25 times and an average speed-up of 8.34 times over the original Vina when tested on a set of 231 protein-ligand complexes, while keeping the optimal scores mostly identical. With our heuristics, a larger number of different ligands can be quickly screened against a given receptor within the same time frame.

  • Improving X!Tandem on Peptide Identification from Mass Spectrometry by Self-Boosted Percolator

    Page(s): 1273 - 1280

    A critical component of mass spectrometry (MS)-based proteomics is an accurate protein identification procedure. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis, since the proteins present in the sample are inferred from them. A variety of postprocessing algorithms have been proposed to validate and filter PSMs. The most popular include a semi-supervised learning (SSL) approach known as Percolator and an empirical modeling approach known as PeptideProphet. However, these are predominantly designed for commercial database search algorithms, namely SEQUEST and MASCOT. It is therefore highly desirable to extend and optimize these PSM postprocessing algorithms for open-source database search algorithms such as X!Tandem. In this paper, we propose a Self-boosted Percolator for postprocessing X!Tandem search results. We find that the SSL algorithm used by Percolator depends heavily on the initial ranking of PSMs, and that starting from a poor initial ranking may cause Percolator to perform suboptimally. By running Percolator in a cascade learning manner, we progressively improve performance through multiple boost runs, enabling many more PSM identifications without sacrificing the false discovery rate (FDR).
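Percolator and its self-boosted variant report identifications at a fixed FDR, which is usually estimated from a target-decoy search. The sketch below shows the standard target-decoy estimate (decoys above a score threshold divided by targets above it) and the q-values derived from it. It is an illustration under that assumption, not Percolator's code, and the names `target_decoy_qvalues` and `accepted_targets` are hypothetical. In a cascade like the one described above, the scores from one run would seed the next, and a count like `accepted_targets` is one way to compare runs.

```python
from typing import List, Sequence, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); higher score = better match


def target_decoy_qvalues(psms: Sequence[Psm]) -> List[float]:
    """q-value for each PSM, returned in input order.

    The FDR at a score threshold t is estimated as
    (#decoys with score >= t) / (#targets with score >= t).
    Tied scores are processed in input order (a simplification).
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)

    # q-value = smallest FDR of any threshold that still accepts this PSM,
    # i.e. a running minimum taken from the lowest score upward.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def accepted_targets(psms: Sequence[Psm], max_q: float = 0.01) -> int:
    """Number of target PSMs accepted at the given q-value cutoff."""
    q = target_decoy_qvalues(psms)
    return sum(1 for (_, is_decoy), qi in zip(psms, q) if not is_decoy and qi <= max_q)
```

A production implementation would also need a deliberate policy for tied scores and for the small-sample behavior of the decoy count.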

  • CEDER: Accurate Detection of Differentially Expressed Genes by Combining Significance of Exons Using RNA-Seq

    Page(s): 1281 - 1292

    RNA-Seq is widely used in transcriptome studies, and using it to detect differentially expressed genes (DEGs) between two classes of individuals, e.g., cases versus controls, is of fundamental importance. Many statistical methods for DEG detection from RNA-Seq data have been developed, and most of them are based on the read counts mapped to individual genes. However, genes are composed of exons, and the distribution of reads across different exons can be heterogeneous. We hypothesize that the accuracy of DEG detection can be increased by analyzing the individual exons within a gene and then combining the exon-level results. We therefore developed a novel program, termed CEDER, to accurately detect DEGs by combining the significance of the exons. CEDER first tests each exon for differential expression, yielding a p-value for each, and then integrates the p-values of the exons in a gene into a score indicating the potential for that gene to be differentially expressed. We show that CEDER can significantly increase the accuracy of existing DEG detection methods on two benchmark RNA-Seq data sets and on simulated data sets.
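The abstract does not say how CEDER integrates exon-level p-values. One common choice, shown here only as an illustrative assumption, is Fisher's method: X = -2 * sum(ln p_i), which under the null follows a chi-square distribution with 2k degrees of freedom when the k p-values are independent. For even degrees of freedom the tail probability has a closed form, so the sketch needs only the standard library. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine k independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the upper tail is
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total
```

For example, a gene with two exons at p = 0.01 each gets a combined p-value of about 0.001. Exons of the same gene share reads and biology, so the independence assumption behind Fisher's method may not hold; CEDER's own score may handle this differently.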

  • How Little Do We Actually Know? On the Size of Gene Regulatory Networks

    Page(s): 1293 - 1300

    The National Center for Biotechnology Information (NCBI) recently announced the availability of whole-genome sequences for more than 1,000 species, and the number of sequenced individual organisms is growing. Ongoing improvements in DNA sequencing technology will further contribute to this, enabling large-scale evolution and population genetics studies. However, the availability of sequence information is only the first step in understanding how cells survive, reproduce, and adjust their behavior. The genetic control behind the organized development and adaptation of complex organisms remains largely undetermined. One major molecular control mechanism is transcriptional gene regulation. The contrast between the total number of sequenced species and the handful of model organisms with known regulations is surprising. Here, we investigate how little we know even about these model organisms. We aim to predict the sizes of the whole-organism regulatory networks of seven species. In particular, we provide statistical lower bounds for the expected number of regulations. We estimate that at most 37 percent of the expected gene regulatory interactions have already been discovered for Escherichia coli, 24 percent for Bacillus subtilis, and less than 3 percent for human. We conclude that even for our best-researched model organisms we still lack substantial understanding of fundamental molecular control mechanisms, at least on a large scale.

  • A Comparative Assessment of Ranking Accuracies of Conventional and Machine-Learning-Based Scoring Functions for Protein-Ligand Binding Affinity Prediction

    Page(s): 1301 - 1313

    Accurately and efficiently predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are force-field based, knowledge-based, or empirical, their limited ranking accuracy has been a major roadblock toward cost-effective drug discovery. In this work, we therefore explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the ranking accuracies of these new ML-based SFs, as well as those of conventional SFs, on the 2007 and 2010 PDBbind benchmark data sets, using both diverse and protein-family-specific test sets. We also investigate how the size of the training data set and the type and number of features used influence ranking accuracy. Within clusters of protein-ligand complexes in which different ligands are bound to the same target protein, we find that the best ML-based SF ranks the ligands correctly according to their experimentally determined binding affinities 62.5 percent of the time and identifies the top binding ligand 78.1 percent of the time. For this SF, the Spearman correlation coefficient between the ranks of ligands ordered by predicted and by experimentally determined binding affinities is 0.771. Given the challenging nature of the ranking problem and the fact that SFs are used to screen millions of ligands, this represents a significant improvement over the best conventional SF we studied, for which the corresponding values are 57.8 percent, 73.4 percent, and 0.677.
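Two of the ranking metrics quoted above can be computed as sketched below: a Spearman coefficient (the Pearson correlation of ranks, with tied values given their average rank) and a check of whether the top-predicted ligand is also the strongest measured binder. The sketch assumes that larger values mean stronger binding (for example, pKd); the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ranks are constant; correlation undefined")
    return cov / (vx * vy) ** 0.5


def top_ligand_correct(predicted: Sequence[float], measured: Sequence[float]) -> bool:
    """True if the ligand with the highest predicted affinity is also the
    strongest measured binder (larger value = stronger binding, assumed)."""
    best_pred = max(range(len(predicted)), key=lambda i: predicted[i])
    best_meas = max(range(len(measured)), key=lambda i: measured[i])
    return best_pred == best_meas
```

Applied per cluster of ligands sharing a target, the fraction of clusters where `top_ligand_correct` holds corresponds to the "top binding ligand" rate.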

  • Guest Editors' Introduction to the Special Section on Computational Methods in Systems Biology

    Page(s): 1314 - 1315
  • Evaluation of Design Strategies for Time Course Experiments in Genetic Networks: Case Study of the XlnR Regulon in Aspergillus niger

    Page(s): 1316 - 1325

    One of the challenges in genetic network reconstruction is finding experimental designs that maximize the information content of a data set. In this paper, the information value of mRNA transcription time course experiments was used to compare experimental designs. The study concerns the dynamic response of genes in the XlnR regulon of Aspergillus niger, with the goal of finding the best moment to administer an extra pulse of the inducer D-xylose. Low and high D-xylose pulses were used to perturb the XlnR regulon. The experimental designs were evaluated by simulating the regulon, using models that govern the regulation of the target genes in this regulon. Parameter sensitivity analysis, the Fisher Information Matrix (FIM), and the modified E-criterion were used to assess design performance. The results show that the best time to give a second D-xylose pulse is when the D-xylose concentration from the first pulse has not yet completely faded away. Because of a repression effect, the strength of the second pulse must be optimized rather than maximized. The results also suggest that the modified E-criterion is a better metric than the sum of integrals of absolute sensitivity for comparing alternative designs.
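For a sensitivity matrix S (rows: measured time points and genes, columns: model parameters) and independent Gaussian noise with a common standard deviation sigma, the FIM is S^T S / sigma^2, and the modified E-criterion is the ratio of its largest to smallest eigenvalue; values closer to 1 indicate a better-conditioned design. The sketch below assumes S has already been obtained from a simulation of the regulon model, which is not shown; the noise model and function names are assumptions.

```python
import numpy as np


def fisher_information(sensitivities: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """FIM = S^T S / sigma^2 for an (n_measurements x n_parameters) sensitivity
    matrix S, assuming independent Gaussian noise with common std sigma."""
    S = np.asarray(sensitivities, dtype=float)
    return S.T @ S / sigma**2


def modified_e_criterion(fim: np.ndarray) -> float:
    """Largest / smallest eigenvalue of the FIM; 1 is ideal, lower is better."""
    eig = np.linalg.eigvalsh(fim)  # ascending order for symmetric matrices
    if eig[0] <= 0.0:
        return float("inf")  # some parameter combination is not identifiable
    return float(eig[-1] / eig[0])
```

Comparing two candidate pulse times then amounts to computing the criterion for each design's sensitivity matrix and preferring the smaller value. A common sigma rescales the FIM but leaves this ratio unchanged.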

  • The Phosphorylation of the Heat Shock Factor as a Modulator for the Heat Shock Response

    Page(s): 1326 - 1337

    The heat shock response is a well-conserved defence mechanism against the accumulation of misfolded proteins caused by prolonged exposure to elevated temperature. The cell responds to heat shock by raising the levels of heat shock proteins (hsps), which chaperone protein refolding. The synthesis of hsps is tightly regulated at the transcriptional level by specific heat shock (transcription) factors (hsfs). One of the regulatory mechanisms is the phosphorylation of hsfs. Experimental evidence shows a connection between the hyper-phosphorylation of hsfs and the transactivation of the hsp-encoding genes. In this paper, we incorporate several (de)phosphorylation pathways into an existing, well-validated computational model of the heat shock response. We analyze the quantitative control of each of these pathways over the entire process. For each pathway, we create a detailed computational model, which we subject to parameter estimation in order to fit it to existing experimental data. In particular, we find conclusive evidence supporting only one of the analyzed pathways. We also corroborate our results with a set of computational models of reduced size.

  • Time Series Dependent Analysis of Unparametrized Thomas Networks

    Page(s): 1338 - 1351

    This paper is concerned with the analysis of labeled Thomas networks using discrete time series. It focuses on refining the given edge labels and on assessing data quality. The results are intended to be exploitable for experimental design and include predictions of new activatory or inhibitory effects of given interactions and of as yet unobserved oscillations of specific components between specific sampling intervals. On the formal side, we generalize the concept of edge labels and introduce a discrete time series interpretation. This interpretation features two original concepts: 1) incomplete measurements are admissible, and 2) qualitative assumptions about changes in gene expression can be made by means of monotonicity. On the computational side, we provide a Python script, erda.py, that automates the suggested workflow by model checking and constraint satisfaction. We illustrate the workflow by investigating the yeast network IRMA.

  • A Hybrid Factored Frontier Algorithm for Dynamic Bayesian Networks with a Biopathways Application

    Page(s): 1352 - 1365

    Dynamic Bayesian Networks (DBNs) can serve as succinct probabilistic dynamic models of biochemical networks [1]. To analyze these models, one must compute the probability distribution over system states at a given time point. Doing this exactly is infeasible for large models, so approximate algorithms must be used. The Factored Frontier algorithm (FF) is one such algorithm [2]. However, FF, as well as the earlier Boyen-Koller (BK) algorithm [3], can incur large errors. To address this, we present a new approximate algorithm called the Hybrid Factored Frontier (HFF) algorithm. At each time slice, in addition to maintaining probability distributions over local states, as FF does, HFF explicitly maintains the probabilities of a number of global states called spikes. When the number of spikes is 0, HFF reduces to FF, and with all global states as spikes, it becomes the exact inference algorithm. We show that increasing the number of spikes can reduce errors, while the additional computational effort required is only quadratic in the number of spikes. We validated the performance of HFF on large DBN models of biopathways, each with more than 30 species and a corresponding DBN of more than 3,000 nodes. Comparisons with FF and BK show that HFF is a useful and powerful approximate inference algorithm for DBNs.

  • Multilevel Computational Modeling and Quantitative Analysis of Bone Remodeling

    Page(s): 1366 - 1378

    Our work focuses on bone remodeling across scales, from intracellular and intercellular RANK/RANKL signaling to tissue dynamics, by developing a multilevel modeling framework. Several important findings provide clear evidence of the multiscale properties of bone formation and of the links between RANK/RANKL and bone density in healthy and disease conditions. Recent studies indicate that the circulating levels of OPG and RANKL are inversely related to bone turnover and Bone Mineral Density (BMD) and contribute to the development of osteoporosis in postmenopausal women and in thalassemic patients. We use a spatial process algebra, the Shape Calculus, to control stochastic cell agents that continuously remodel the bone. We found that our description is effective for such a multiscale, multilevel process and that small dynamic defects in RANKL signaling concentration are greatly amplified by the continuous alternation of resorption and formation, resulting in large structural bone defects. This work contributes to the computational modeling of complex systems with a multilevel approach connecting formal languages and agent-based simulation tools.

  • A New Measure of Classifier Performance for Gene Expression Data

    Page(s): 1379 - 1386

    One of the major aims of many microarray experiments is to build discriminatory diagnosis and prognosis models. A large number of supervised methods have been proposed in the literature for microarray-based classification for this purpose. Model evaluation and comparison are critical issues and are most often based on the classification cost. This cost depends on the costs of false positives and false negatives, which are generally unknown in diagnostic problems. This uncertainty can strongly affect the evaluation and comparison of classifiers. We propose a new measure of classifier performance that takes into account the uncertainty about the error costs. We represent the available knowledge about the costs by a distribution function defined on the ratio of the costs. The performance of a classifier is then computed over the set of all possible costs, weighted by their probability distribution. Our method is tested on both artificial and real microarray data sets. We show that classifier performance depends strongly on the ratio of the classification costs. In many cases, the best classifier can be identified by our new measure, whereas the classic error measures fail.
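One way to make the idea concrete, under assumptions not taken from the paper: fix a classifier's false-positive and false-negative rates, normalize the two costs so that they sum to 1 for a given ratio r = c_FN / c_FP, and average the resulting expected cost over a discrete distribution of r. The normalization and the discrete distribution are illustrative choices (with unnormalized costs, the average would depend only on the mean of r); the function name `expected_cost` is hypothetical.

```python
from typing import Sequence, Tuple


def expected_cost(fpr: float, fnr: float, pos_rate: float,
                  ratio_dist: Sequence[Tuple[float, float]]) -> float:
    """Expected misclassification cost averaged over uncertain cost ratios.

    ratio_dist holds (r, probability) pairs with r = c_FN / c_FP. For each r the
    costs are normalized so that c_FP + c_FN = 1 (an assumption of this sketch),
    i.e. c_FP = 1 / (1 + r) and c_FN = r / (1 + r).
    """
    if abs(sum(p for _, p in ratio_dist) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    total = 0.0
    for r, prob in ratio_dist:
        c_fp, c_fn = 1.0 / (1.0 + r), r / (1.0 + r)
        cost_at_r = (1.0 - pos_rate) * fpr * c_fp + pos_rate * fnr * c_fn
        total += prob * cost_at_r
    return total
```

For example, with a positive rate of 0.5, a classifier with FPR 0.05 and FNR 0.30 has a lower cost than one with FPR 0.20 and FNR 0.10 when r = 0.2, but a higher cost when r = 1, which is the kind of ranking reversal that makes the cost distribution matter.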

  • An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and Its Application to DNA Splice Site Prediction

    Page(s): 1387 - 1398

    Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the property of interest is not known a priori. It is often left to domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application was chosen because of the complexity of the features needed to obtain high classification accuracy and precision. We test the effectiveness of the obtained features in the context of classification by Support Vector Machines, and our results show significant improvement in accuracy and precision over state-of-the-art approaches.

  • Detecting Phenotype-Specific Interactions between Biological Processes from Microarray Data and Annotations

    Page(s): 1399 - 1409

    High-throughput technologies enable researchers to measure expression levels on a genomic scale. However, the correct and efficient biological interpretation of such voluminous data remains a challenging problem. Many tools have been developed for the analysis of GO terms that are over- or under-represented in a list of differentially expressed genes. A previously unexplored aspect, however, is the identification of changes in the way various biological processes interact in a given condition with respect to a reference. Here, we present a novel approach that aims to identify interactions between biological processes that are significantly different in a given phenotype with respect to normal. The proposed technique uses a vector-space representation, SVD-based dimensionality reduction, differential weighting, and bootstrapping to assess the significance of the interactions under the multiple and complex dependencies expected between biological processes. We illustrate our approach on two real data sets involving breast and lung cancer. More than 88 percent of the interactions found by our approach were deemed correct by an extensive manual review of the literature. An interesting subset of these interactions is discussed in detail and shown to have the potential to open new avenues for research in lung and breast cancer.

  • Finding a Periodic Attractor of a Boolean Network

    Page(s): 1410 - 1421

    In this paper, we study the problem of finding a periodic attractor of a Boolean network (BN), which arises in computational systems biology and is known to be NP-hard. Since the general case is quite hard to solve, we consider special but biologically important subclasses of BNs. For finding an attractor of period 2 of a BN consisting of n OR functions of positive literals, we present a polynomial time algorithm. For finding an attractor of period 2 of a BN consisting of n AND/OR functions of literals, we present an O(1.985^n) time algorithm. For finding an attractor of a fixed period p of a BN consisting of n nested canalyzing functions and having constant treewidth w, we present an O(n^{2p(w+1)} poly(n)) time algorithm.
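As a reference point for the algorithms above, a period-2 attractor of a synchronous BN can always be found by exhaustive search over all 2^n states, looking for a state x with F(x) = y, y != x, and F(y) = x. The sketch below does exactly that; it is exponential and only makes the problem definition concrete, without reproducing the paper's algorithms. Representing each node's update rule as a Python function of the full state, and the function names, are assumptions of the sketch.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, ...]
UpdateFn = Callable[[State], int]  # next value of one node from the full state


def step(state: State, functions: Sequence[UpdateFn]) -> State:
    """Synchronous update: all nodes read the same current state."""
    return tuple(f(state) for f in functions)


def find_period2_attractor(
        functions: Sequence[UpdateFn]) -> Optional[Tuple[State, State]]:
    """Exhaustive search over all 2^n states for x != y with F(x) = y, F(y) = x.

    Uses O(2^n * n) function evaluations; returns None if no such pair exists.
    """
    n = len(functions)
    for x in product((0, 1), repeat=n):
        y = step(x, functions)
        if y != x and step(y, functions) == x:
            return x, y
    return None
```

For instance, the OR network x0' = x1 OR x2, x1' = x0, x2' = x0 has the period-2 attractor (0, 1, 1) <-> (1, 0, 0), which this search returns as its first hit.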

  • Gene Selection Using Iterative Feature Elimination Random Forests for Survival Outcomes

    Page(s): 1422 - 1431

    Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of gene selection approaches for classification are single-gene based. Second, many gene selection procedures are not embedded within the algorithm itself. Random forests have been found to perform well in high-dimensional settings with survival outcomes, and they have an embedded mechanism for identifying important variables. They are therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we show the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes, allowing us to better utilize the information available from microarray data with survival outcomes.

  • Hierarchical Motif Vectors for Prediction of Functional Sites in Amino Acid Sequences Using Quasi-Supervised Learning

    Page(s): 1432 - 1441

    We propose hierarchical motif vectors to represent local amino acid sequence configurations for predicting the functional attributes of amino acid sites on a global scale in a quasi-supervised learning framework. The motif vectors are constructed via wavelet decomposition of the variations of physico-chemical amino acid properties along the sequences. We then formulate a prediction scheme for the functional attributes of amino acid sites in terms of their motif vectors, using the quasi-supervised learning algorithm, which carries out predictions for all sites under consideration using only the experimentally verified sites. We carried out a comparative performance evaluation of the proposed method on the prediction of N-glycosylation for 55,184 sites possessing the consensus N-glycosylation sequon, identified over 15,104 human proteins, of which only 1,939 were experimentally verified N-glycosylation sites. In the experiments, the proposed method achieved better predictive performance than alternative strategies from the literature. In addition, the predicted N-glycosylation sites showed good agreement with existing potential annotations, while the novel predictions belonged to proteins known to be modified by glycosylation.
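To make the construction concrete, the sketch below locates candidate sites with the N-X-S/T sequon (X being any residue except proline) and builds a multilevel Haar wavelet vector from a hydropathy profile around each site. The Kyte-Doolittle scale, the unnormalized Haar transform, the window size, and all function names are assumptions for illustration; the paper's property set, wavelet, and quasi-supervised learner are not reproduced.

```python
import re
from typing import List, Sequence

# Kyte-Doolittle hydropathy scale (one common physico-chemical property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# N-X-S/T with X != P; the zero-width lookahead also finds overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> List[int]:
    """0-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq)]


def haar_coefficients(signal: Sequence[float]) -> List[float]:
    """Multilevel unnormalized Haar decomposition of a length-2^k signal.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    approx = [float(v) for v in signal]
    if not approx or len(approx) & (len(approx) - 1):
        raise ValueError("signal length must be a power of two")
    details: List[float] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details  # coarser go first
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def motif_vector(seq: str, center: int, half_width: int = 4) -> List[float]:
    """Haar coefficients of the hydropathy profile in a window of 2*half_width
    residues around `center` (must be a power of two); off-sequence positions
    and unknown residues contribute 0."""
    start = center - half_width
    profile = [KYTE_DOOLITTLE.get(seq[i], 0.0) if 0 <= i < len(seq) else 0.0
               for i in range(start, start + 2 * half_width)]
    return haar_coefficients(profile)
```

Each sequon position found by `sequon_positions` would then be described by its `motif_vector`, giving one fixed-length input per candidate site for whatever learner follows.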

  • Improving the Prediction of Clinical Outcomes from Genomic Data Using Multiresolution Analysis

    Page(s): 1442 - 1450

    Predicting a patient's future clinical outcome, such as Alzheimer's or cardiac disease, using only genomic information is an open problem. In cases where genome-wide association studies (GWASs) find strong associations between genomic predictors (e.g., SNPs) and disease, pattern recognition methods may be able to predict the disease well. Furthermore, by using signal processing methods, we can capitalize on latent multivariate interactions among genomic predictors. This approach to genomic pattern recognition for predicting clinical outcomes is investigated in this work. In particular, we show how multiresolution transforms can be applied to genomic data to extract cues of multivariate interactions and, in some cases, improve on the predictive performance of standard classification methods. For example, our results show that the area under the ROC curve can be increased by about 6 percent by training logistic regression on multiresolution spaces to predict late-onset Alzheimer's disease (LOAD), compared with applying logistic regression directly to the SNP data.
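The metric quoted above, the area under the ROC curve, equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; it is quadratic in the number of samples, and the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. labels: 1 = case, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Comparing two models, such as logistic regression on raw SNPs versus on multiresolution features, then means computing this value for each model's held-out scores.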

  • LNA: Fast Protein Structural Comparison Using a Laplacian Characterization of Tertiary Structure

    Page(s): 1451 - 1458

    In the last two decades, many protein 3D structures have been discovered, characterized, and made available through the Protein Data Bank (PDB), which continues to grow very quickly. New scalable methods are therefore urgently required to search the PDB efficiently. This paper presents an approach called LNA (Laplacian Norm Alignment) that performs a structural comparison of two proteins with dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated using a Laplacian operator applied to the graph corresponding to the adjacency matrix of the residues. The weighted Laplacian operator we use estimates, at various scales, local deformations of the topology around each residue. On some benchmarks widely shared by the community, we obtain results qualitatively similar to those of competing approaches, but with an algorithm one or two orders of magnitude faster. With a single recent graphics processing unit (GPU), 180,000 protein comparisons can be done within 1 second, which makes our algorithm highly scalable and suitable for real-time database querying across the web.
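A simplified reading of the residue features, under stated assumptions: connect residues whose coordinates (for example, C-alpha atoms) lie closer than a cutoff, form the unweighted Laplacian L = D - A of that contact graph, and use the norm of each row of L X, where X holds the coordinates, as a per-residue scalar. Repeating this for several cutoffs gives a crude multiscale profile. The cutoffs, the unweighted graph, and the function names are hypothetical; the paper uses a weighted Laplacian, and the dynamic-programming alignment of the resulting profiles is not shown.

```python
from typing import Sequence

import numpy as np


def contact_laplacian(coords: np.ndarray, cutoff: float) -> np.ndarray:
    """Unweighted Laplacian L = D - A of the contact graph in which residues
    i != j are adjacent when their distance is below `cutoff`."""
    X = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (dists < cutoff).astype(float)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A


def laplacian_norm_features(coords: np.ndarray,
                            cutoffs: Sequence[float] = (4.0, 8.0, 12.0)) -> np.ndarray:
    """Per-residue ||(L X)_i|| at each cutoff (one column per scale).

    (L X)_i = deg_i * (x_i - centroid of neighbours), so the feature is zero
    when a residue sits at the centroid of its neighbours.
    """
    X = np.asarray(coords, dtype=float)
    cols = [np.linalg.norm(contact_laplacian(X, c) @ X, axis=1) for c in cutoffs]
    return np.stack(cols, axis=1)
```

Because L applied to a constant vector is zero and adjacency depends only on distances, these features are unchanged by translating or rotating the structure, which is what makes them usable for comparing proteins in arbitrary frames.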

  • MORE: Mixed Optimization for Reverse Engineering - An Application to Modeling Biological Networks Response via Sparse Systems of Nonlinear Differential Equations

    Page(s): 1459 - 1471

    Reverse engineering is the problem of inferring the structure of a network of interactions between biological variables from a set of observations. In this paper, we propose an optimization algorithm, called MORE, for the reverse engineering of biological networks from time series data. The model inferred by MORE is a sparse system of nonlinear differential equations, complex enough to realistically describe the dynamics of a biological system. MORE separately tackles the discrete component of the problem, determining the biological network topology, and the continuous component, estimating the strength of the interactions. This approach allows us both to enforce system sparsity, by globally constraining the number of edges, and to integrate a priori information about the structure of the underlying interaction network. Experimental results on simulated and real-world networks show that the mixed discrete/continuous optimization approach of MORE significantly outperforms standard continuous optimization and that MORE is competitive with the state of the art in terms of the accuracy of the inferred networks.

  • Noniterative Convex Optimization Methods for Network Component Analysis

    Page(s): 1472 - 1481

    This work studies the reconstruction of gene regulatory networks by means of network component analysis (NCA). We present a family of convex optimization-based methods for estimating the transcription factor control strengths and the transcription factor activities (TFAs). Our approach decomposes the problem into a network connectivity strength estimation phase and a transcription factor activity estimation phase. In the control strength estimation phase, we formulate a new subspace-based method incorporating a choice of multiple error metrics. For the source (TFA) estimation phase, we propose a total least squares (TLS) formulation that generalizes many existing methods. Both estimation procedures are noniterative and yield optimal estimates according to the various proposed error metrics. We test the performance of the proposed algorithms on simulated data and on experimental gene expression data for the yeast Saccharomyces cerevisiae, and demonstrate that they are more effective than both Bayesian Decomposition (BD) and our previous FastNCA approach, while their computational complexity remains orders of magnitude lower than that of BD.
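In NCA the expression matrix is modeled as E ~ A P, with A holding the connectivity strengths and P the TF activities. Once A has been estimated, each column of P could be obtained from a total least squares fit, which allows for errors in A as well as in E. The sketch below is the classic SVD-based TLS solution for a single right-hand side; the paper's generalized TLS formulation and error metrics are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def total_least_squares(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Classic TLS solution of A x ~ b, allowing errors in both A and b.

    Takes the right singular vector of [A | b] for the smallest singular
    value, v = [v_x, v_b], and returns x = -v_x / v_b.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    _, _, vt = np.linalg.svd(np.hstack([A, b]))
    v = vt[-1]  # singular values are sorted descending, so this is the smallest
    if abs(v[n]) < 1e-12:
        raise np.linalg.LinAlgError("TLS solution does not exist (v[n] ~ 0)")
    return -v[:n] / v[n]
```

Applied column by column, `total_least_squares(A_hat, E[:, j])` would give an estimate of the activities in sample j, where `A_hat` stands for the connectivity matrix from the first phase.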

  • Qualitative Reasoning for Biological Network Inference from Systematic Perturbation Experiments

    Page(s): 1482 - 1491

    The systematic perturbation of the components of a biological system has proven to be among the most informative experimental setups for identifying causal relations between the components. In this paper, we present Systematic Perturbation-Qualitative Reasoning (SPQR), a novel Qualitative Reasoning approach that automates the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, the measured values of the observed variables are modeled as lower than, equal to, or higher than the measurements in the wild-type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables by analyzing the patterns of propagation of the perturbation signals through the biological network, and it is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR exhibits significantly higher precision than the state of the art.
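The qualitative abstraction step can be sketched as mapping each measurement to -1, 0, or +1 relative to the wild type. The relative tolerance that defines "equal" is an assumption, and the naive rule in `candidate_edges` is deliberately not SPQR's rule set: it proposes an edge for every observed change, direct or indirect, which is precisely the source of false positives that SPQR's propagation-aware rules are designed to suppress. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Measurements = Dict[str, Dict[str, float]]  # perturbed var -> observed var -> value


def qualitative(value: float, wild_type: float, rel_tol: float = 0.1) -> int:
    """-1, 0 or +1 for lower than, equal to, or higher than the wild type.

    'Equal' means within rel_tol * |wild_type| (a hypothetical threshold)."""
    if abs(value - wild_type) <= rel_tol * abs(wild_type):
        return 0
    return 1 if value > wild_type else -1


def abstract_experiments(measurements: Measurements, wild_type: Dict[str, float],
                         rel_tol: float = 0.1) -> Dict[str, Dict[str, int]]:
    """Apply the qualitative abstraction to every perturbation experiment."""
    return {
        pert: {obs: qualitative(v, wild_type[obs], rel_tol) for obs, v in values.items()}
        for pert, values in measurements.items()
    }


def candidate_edges(abstracted: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, int]]:
    """Naive illustrative rule, NOT SPQR's rule set: every variable that moves
    away from wild type when X is perturbed yields a candidate X -> Y, tagged
    with the sign of the observed change."""
    return sorted(
        (pert, obs, sign)
        for pert, changes in abstracted.items()
        for obs, sign in changes.items()
        if obs != pert and sign != 0
    )
```

In a cascade A -> B -> C, perturbing A would typically move both B and C, so this naive rule would also propose A -> C; SPQR's rules analyze propagation patterns to avoid such indirect edges.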


Aims & Scope

This bimonthly journal publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology.


Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu