
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 2 • March-April 2015

  • Guest Editorial for Special Section on BIBM 2013

    Publication Year: 2015, Page(s): 252-253
    Freely Available from IEEE
  • Predicting Microbial Interactions Using Vector Autoregressive Model with Graph Regularization

    Publication Year: 2015, Page(s): 254-261
    Cited by: Papers (1)

    Microbial interactions play important roles in the structure and function of complex microbial communities. With the rapid accumulation of high-throughput metagenomic and 16S rRNA sequencing data, it is possible to infer complex microbial interactions. Co-occurrence patterns of microbial species across multiple samples are often used to infer interactions, but few methods consider temporal interaction patterns among microbial species. In this paper, we present a Graph-regularized Vector Autoregressive (GVAR) model to infer causal relationships among microbial entities. The new model has an advantage over the original vector autoregressive (VAR) model: GVAR can incorporate similarity information for microbial interaction inference, i.e., GVAR assumes that if two species are similar in the previous stage, they tend to have a similar influence on the other species in the next stage. We apply the model to a time-series dataset of the human gut microbiome treated with repeated antibiotics. The experimental results indicate that the new approach performs better than several other VAR-based models and demonstrate its capability of extracting relevant microbial interactions.

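The abstract's core idea, a VAR(1) fit augmented with a graph penalty that ties the coefficients of similar species together, can be sketched as follows. This is an illustrative sketch, not the authors' code; the Laplacian penalty form tr(A^T L A) and the weight `lam` are assumptions for illustration.

```python
def var1_residual(A, series):
    """Sum of squared one-step prediction errors for x_{t+1} ~ A x_t."""
    err = 0.0
    for t in range(len(series) - 1):
        for i in range(len(A)):
            pred = sum(A[i][j] * series[t][j] for j in range(len(A)))
            err += (series[t + 1][i] - pred) ** 2
    return err

def graph_penalty(A, L):
    """tr(A^T L A): penalizes dissimilar coefficient rows for similar species,
    where L is the Laplacian of an assumed species-similarity graph."""
    n = len(A)
    return sum(L[i][j] * sum(A[i][k] * A[j][k] for k in range(n))
               for i in range(n) for j in range(n))

def gvar_objective(A, series, L, lam=0.1):
    """Least-squares VAR(1) fit plus graph-smoothness regularization."""
    return var1_residual(A, series) + lam * graph_penalty(A, L)
```

Minimizing this objective over A (e.g. by gradient descent) would yield the regularized interaction coefficients.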
  • Predicting the Pro-Longevity or Anti-Longevity Effect of Model Organism Genes with New Hierarchical Feature Selection Methods

    Publication Year: 2015, Page(s): 262-275
    Multimedia

    Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms' genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.

  • UDoNC: An Algorithm for Identifying Essential Proteins Based on Protein Domains and Protein-Protein Interaction Networks

    Publication Year: 2015, Page(s): 276-288
    Cited by: Papers (1)

    Prediction of essential proteins, which are crucial to an organism's survival, is important for disease analysis and drug design, as well as for the understanding of cellular life. The majority of prediction methods infer the possibility of proteins being essential from network topology. However, these methods are limited by the completeness of available protein-protein interaction (PPI) data and depend on the accuracy of the network. To overcome these limitations, some computational methods have been proposed, but few of them take protein domains into consideration. In this work, we first analyze the correlation between the essentiality of proteins and their domain features based on data from 13 species. We find that proteins containing more protein domain types that rarely occur in other proteins tend to be essential. Accordingly, we propose a new prediction method, named UDoNC, which combines the domain features of proteins with their topological properties in the PPI network. In UDoNC, the essentiality of proteins is decided by the number and frequency of their protein domain types, as well as by the essentiality of their adjacent edges measured by the edge clustering coefficient. The experimental results on S. cerevisiae data show that UDoNC outperforms other existing methods in terms of area under the curve (AUC). Additionally, UDoNC also performs well in predicting essential proteins on E. coli data.

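The edge clustering coefficient used to weight adjacent edges has a commonly used form, sketched here as an illustration (this is the standard definition from the network-centrality literature; the paper's exact variant may differ):

```python
def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = |N(u) & N(v)| / min(deg(u) - 1, deg(v) - 1):
    the number of triangles through edge (u, v), normalized by the
    maximum possible number. adj maps each node to its neighbour set.
    Returns 0 when the denominator is zero."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0
```

An edge inside a dense cluster scores near 1; a bridge edge with no shared neighbours scores 0.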
  • ENISI SDE: A New Web-Based Tool for Modeling Stochastic Processes

    Publication Year: 2015, Page(s): 289-297

    Modeling and simulation approaches have been widely used in computational biology, mathematics, bioinformatics, and engineering to represent complex existing knowledge and to effectively generate novel hypotheses. While deterministic modeling strategies are widely used in computational biology, stochastic modeling techniques are not as popular due to a lack of user-friendly tools. This paper presents ENISI SDE, a novel web-based modeling tool based on stochastic differential equations. ENISI SDE provides user-friendly web interfaces to facilitate adoption by immunologists and computational biologists. This work provides three major contributions: (1) a discussion of SDEs as a generic approach for stochastic modeling in computational biology; (2) the development of ENISI SDE, a web-based, user-friendly SDE modeling tool that closely resembles regular ODE-based modeling; and (3) the application of ENISI SDE in a use case studying stochastic sources of cell heterogeneity in the context of CD4+ T cell differentiation. The CD4+ T cell differentiation ODE model has been published [8] and can be downloaded from biomodels.net. The case study reproduces a biological phenomenon that is not captured by the previously published ODE model, showing the effectiveness of SDEs as a stochastic modeling approach in biology in general and immunology in particular, and the power of ENISI SDE.

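A minimal illustration of the kind of simulation such a tool performs is the Euler-Maruyama scheme for an SDE dX = a(X) dt + b(X) dW. This is generic textbook material, not ENISI SDE's internals:

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW
    with the Euler-Maruyama scheme; returns the terminal value."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
    return x
```

With the diffusion term set to zero the scheme reduces to the deterministic Euler method, which is exactly why SDE tooling can "closely resemble regular ODE-based modeling".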
  • An Integrated Approach to Sequence-Independent Local Alignment of Protein Binding Sites

    Publication Year: 2015, Page(s): 298-308

    Accurate alignment of protein-protein binding sites can aid protein docking studies and the construction of templates for predicting the structure of protein complexes, along with an in-depth understanding of evolutionary and functional relationships. However, over the past three decades, structural alignment algorithms have focused predominantly on global alignments, with little effort devoted to the alignment of local interfaces. In this paper, we introduce the PBSalign (Protein-protein Binding Site alignment) method, which integrates techniques from graph theory, 3D localized shape analysis, and geometric scoring, and utilizes physicochemical and geometrical properties. Computational results demonstrate that PBSalign is capable of accurately identifying similar homologous and analogous binding sites and of producing alignments with better geometric match measures than existing protein-protein interface comparison tools. The proportion of better-quality alignments generated by PBSalign is 46, 56, and 70 percent higher than for iAlign as judged by the average match index (MI), similarity index (SI), and structural alignment score (SAS), respectively. PBSalign provides the life science community with an efficient and accurate solution to binding-site alignment while striking a balance between topological detail and computational complexity.

    Open Access
  • P-Finder: Reconstruction of Signaling Networks from Protein-Protein Interactions and GO Annotations

    Publication Year: 2015, Page(s): 309-321

    Because most complex genetic diseases are caused by defects in cell signaling, illuminating a signaling cascade is essential for understanding their mechanisms. We present three novel computational algorithms to reconstruct signaling networks between a starting protein and an ending protein using genome-wide protein-protein interaction (PPI) networks and gene ontology (GO) annotation data. A signaling network is represented as a directed acyclic graph in a merged form of multiple linear pathways. An advanced semantic similarity metric is applied to weight PPIs as the preprocessing step for all three methods. The first algorithm repeatedly extends the list of nodes based on path frequency towards an ending protein. The second algorithm repeatedly appends edges based on the occurrence of network motifs, i.e., link patterns that appear more frequently in a PPI network than in a random graph. The last algorithm uses an information propagation technique which iteratively updates edge orientations based on path strength and merges the selected directed edges. Our experimental results demonstrate that the proposed algorithms achieve higher accuracy than previous methods when tested on well-studied pathways of S. cerevisiae. Furthermore, we introduce an interactive web application tool, called P-Finder, to visualize reconstructed signaling networks.

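As a toy illustration of reasoning about path frequency in a directed acyclic signaling network (not P-Finder's actual algorithm), the number of distinct directed paths between a start and an end protein can be counted with a memoized depth-first search:

```python
from functools import lru_cache

def count_paths(dag, source, target):
    """Count distinct directed paths from source to target in a DAG.
    dag maps each node to a tuple of its successors."""
    @lru_cache(maxsize=None)
    def go(node):
        if node == target:
            return 1
        # Paths through this node = sum of paths from each successor.
        return sum(go(nxt) for nxt in dag.get(node, ()))
    return go(source)
```

Memoization makes this linear in the number of edges, which matters on genome-wide PPI networks.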
  • A New Semantic Functional Similarity over Gene Ontology

    Publication Year: 2015, Page(s): 322-334

    Identifying functionally similar or closely related genes and gene products has significant impacts on biological and clinical studies as well as drug discovery. In this paper, we propose an effective and practically useful method for measuring both gene and gene product similarity by integrating the topology of the gene ontology, known functional domains, and their functional annotations. The proposed method is comprehensively evaluated through statistical analysis of the similarities derived from sequence, structure, and phylogenetic profiles, and through clustering analysis of disease gene clusters. Our results show that the proposed method clearly outperforms other conventional methods. Furthermore, literature analysis also reveals that the proposed method is both statistically and biologically promising for identifying functionally similar genes or gene products. In particular, we demonstrate that the proposed functional similarity metric is capable of discovering new disease-related genes or gene products.

  • A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction

    Publication Year: 2015, Page(s): 335-347

    Accurately predicting the binding affinities of large diverse sets of protein-ligand complexes efficiently is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify potential drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive accuracy has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the scoring accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 and 2010 PDBbind benchmark datasets on both diverse and protein-family-specific test sets. We also investigate the influence of the size of the training dataset and of the type and number of features used on scoring accuracy. We find that the best performing ML SF has a Pearson correlation coefficient of 0.806 between predicted and measured binding affinities, compared to 0.644 achieved by a state-of-the-art conventional SF. We also find that ML SFs benefit more than their conventional counterparts from increases in the number of features and the size of the training dataset. In addition, they perform better on novel proteins on which they were never trained.

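The Pearson correlation coefficient reported above (0.806 vs. 0.644) is the standard measure of linear agreement between predicted and measured affinities; for reference, it is computed as:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value of 1 means predictions are a perfect increasing linear function of the measurements; 0 means no linear relationship.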
  • A Fractal Dimension and Wavelet Transform Based Method for Protein Sequence Similarity Analysis

    Publication Year: 2015, Page(s): 348-359

    One of the key tasks related to proteins in bioinformatics and molecular biology is the similarity comparison of protein sequences, which helps the prediction and classification of protein structure and function. Finding similar proteins in a large-scale protein database efficiently is a significant and open issue. This paper presents a new distance-based protein similarity analysis using a new encoding method for protein sequences based on fractal dimension. The protein sequences are first represented as one-dimensional feature vectors using their biochemical quantities. A hybrid method involving the discrete wavelet transform and fractal dimension calculation (HWF), with a sliding window, is then applied to form the feature vector. Finally, through similarity calculation, we obtain the distance matrix, from which the phylogenetic tree can be constructed. We apply this approach to the ND5 (NADH dehydrogenase subunit 5) protein cluster dataset. The experimental results show that the proposed model is more accurate than existing ones such as Su's model, Zhang's model, Yao's model, and the MEGA software, and that it is consistent with some known biological facts.

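A one-level Haar discrete wavelet transform is the kind of building block such a sliding-window pipeline applies to each window of the biochemical signal. This sketch is illustrative only; the paper's wavelet choice and its fractal-dimension step are not reproduced here:

```python
import math

def haar_step(signal):
    """One level of the Haar DWT on an even-length signal.
    Returns (approximation, detail) coefficient lists."""
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / math.sqrt(2)
              for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / math.sqrt(2)
              for i in range(half)]
    return approx, detail
```

The approximation coefficients carry the smoothed trend and the detail coefficients carry local fluctuations, which is what makes wavelet features useful for comparing sequences at multiple scales.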
  • A Property-Driven Methodology for Formal Analysis of Synthetic Biology Systems

    Publication Year: 2015, Page(s): 360-371

    This paper proposes a formal methodology to analyse bio-systems, in particular synthetic biology systems. An integrative analysis perspective combining different model checking approaches based on different property categories is provided. The methodology is applied to the synthetic pulse generator system and several verification experiments are carried out to demonstrate the use of our approach to formally analyse various aspects of synthetic biology systems.

  • A Topology Potential-Based Method for Identifying Essential Proteins from PPI Networks

    Publication Year: 2015, Page(s): 372-383

    Essential proteins are indispensable for cellular life. Identifying essential proteins is of great significance, as it can help us understand the minimal requirements for cellular life and is also very important for drug design. However, identification of essential proteins by experimental approaches is typically time-consuming and expensive. With the development of high-throughput technologies in the post-genomic era, more and more protein-protein interaction data can be obtained, which makes it possible to study essential proteins at the network level. A series of computational approaches have been proposed for predicting essential proteins based on network topology, most of which use network centralities. In this paper, we investigate the topological characteristics of essential proteins from a completely new perspective. To our knowledge, this is the first time topology potential has been used to identify essential proteins from a protein-protein interaction (PPI) network. The basic idea is that each protein in the network can be viewed as a material particle which creates a potential field around itself, and the interaction of all proteins forms a topological field over the network. By defining and computing the value of each protein's topology potential, we can obtain a more precise ranking which reflects the importance of proteins in the PPI network. The experimental results show that the topology potential-based methods TP and TP-NC outperform traditional topology measures: degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), subgraph centrality (SC), eigenvector centrality (EC), information centrality (IC), and network centrality (NC) for predicting essential proteins. In addition, the performance of these centrality measures in identifying essential proteins in biological networks improves when they are combined with topology potential.

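The idea of a potential field induced by each protein can be illustrated with a Gaussian kernel over shortest-path distances. The kernel form and the parameter `sigma` are assumptions for illustration; the paper's exact definition may differ:

```python
import math
from collections import deque

def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src via BFS."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def topology_potential(adj, node, sigma=1.0):
    """Potential at a node: sum over other reachable nodes of
    exp(-(d / sigma)^2), so nearby nodes contribute most."""
    d = bfs_distances(adj, node)
    return sum(math.exp(-(dist / sigma) ** 2)
               for n, dist in d.items() if n != node)
```

Ranking proteins by this value favours nodes sitting in dense neighbourhoods, which is the intuition behind a topology-potential ranking.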
  • An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets

    Publication Year: 2015, Page(s): 384-397

    In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, and this superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) We build the motif stem representation more precisely by using regular expressions. (2) We give a method for generating all possible motif stems without redundant wildcards. (3) We propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.

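A motif stem, a shared string with wildcards at the variable positions, can be derived from a set of aligned motif instances as sketched below. This is illustrative only; StemFinder's stem generation takes more care to avoid redundant wildcards:

```python
def stem(instances, wildcard='*'):
    """Collapse aligned, equal-length motif instances into a stem:
    keep the letter where all instances agree, a wildcard elsewhere."""
    return ''.join(col[0] if len(set(col)) == 1 else wildcard
                   for col in zip(*instances))
```

For example, collapsing "ACGT" and "ACCT" yields a stem with a single wildcard at the position where they differ.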
  • An Improved Integral Inequality to Stability Analysis of Genetic Regulatory Networks With Interval Time-Varying Delays

    Publication Year: 2015, Page(s): 398-409

    This paper focuses on stability analysis for a class of genetic regulatory networks with interval time-varying delays. An improved integral inequality concerning double-integral terms is first established. We then use the improved integral inequality to deal with the resulting double-integral terms in the derivative of the involved Lyapunov-Krasovskii functional. As a result, a delay-range-dependent and delay-rate-dependent asymptotic stability criterion is established for genetic regulatory networks with interval time-varying delays. Furthermore, it is theoretically proven that the stability criterion proposed here is less conservative than the corresponding one in [Neurocomputing, 2012, 93: 19-26]. Based on the obtained result, another stability criterion is given for the case where the information on the derivatives of the delays is unknown. Finally, the effectiveness of the proposed approach is illustrated by two numerical examples comparing the stability criteria proposed in this paper with those in the literature.

  • Burial Level Change Defines a High Energetic Relevance for Protein Binding Interfaces

    Publication Year: 2015, Page(s): 410-421

    Protein-protein interfaces defined through atomic contact or solvent accessibility change are widely adopted in structural biology studies. However, these definitions cannot precisely capture energetically important regions at protein interfaces. The burial depth of an atom in a protein is related to the atom's energy. This work investigates how closely the change in burial level of an atom or residue upon complexation is related to binding. Burial level change is different from burial level itself: an atom deeply buried in a monomer with a high burial level may not change its burial level after an interaction, and may thus show little burial level change. We hypothesize that an interface is a region of residues all undergoing burial level changes after interaction. By this definition, an interface can be decomposed into an onion-like structure according to the extent of burial level change. We found that our defined interfaces cover energetically important residues more precisely, and that the binding free energy of an interface is distributed progressively from the outermost layer to the core. These observations are used to predict binding hot spots. Our approach's F-measure performance on a benchmark dataset of alanine mutagenesis residues is superior or comparable to that of complicated energy modeling or machine learning approaches.

  • Data Requirement for Phylogenetic Inference from Multiple Loci: A New Distance Method

    Publication Year: 2015, Page(s): 422-432

    We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.

  • Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory

    Publication Year: 2015, Page(s): 433-444

    Mining knowledge from gene expression data is an active research topic in bioinformatics. Gene selection and sample classification are significant research trends due to the large number of genes and small number of samples in gene expression data. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives without weakening the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three datasets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the selected genes are well interpretable.

  • GPUDePiCt: A Parallel Implementation of a Clustering Algorithm for Computing Degenerate Primers on Graphics Processing Units

    Publication Year: 2015, Page(s): 445-454

    In order to make multiple copies of a target sequence in the laboratory, the technique of Polymerase Chain Reaction (PCR) requires the design of “primers”, short fragments of nucleotides complementary to the flanking regions of the target sequence. If the same primer is to amplify multiple closely related target sequences, then it is necessary to make the primers “degenerate”, which allows them to hybridize to target sequences with a limited amount of variability that may have been caused by mutations. However, the PCR technique can only tolerate a limited amount of degeneracy, and therefore the design of degenerate primers requires the identification of reasonably well-conserved regions in the input sequences. We take an existing algorithm for designing degenerate primers that is based on clustering and parallelize it in a web-accessible software package, GPUDePiCt, using a shared memory model and the computing power of Graphics Processing Units (GPUs). We test our implementation on large sets of aligned sequences from the human genome and show a multi-fold speedup for clustering using our hybrid GPU/CPU implementation over a pure CPU approach for these sequences, which consist of more than 7,500 nucleotides. We also demonstrate that this speedup is consistent over larger numbers and longer lengths of aligned sequences.

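The degeneracy that PCR limits is simply the number of plain primers a degenerate (IUPAC-coded) primer encodes: the product of the per-position alternative counts. A small sketch using the standard IUPAC nucleotide codes:

```python
# Standard IUPAC nucleotide codes and the number of bases each represents.
IUPAC = {'A': 1, 'C': 1, 'G': 1, 'T': 1,
         'R': 2, 'Y': 2, 'S': 2, 'W': 2, 'K': 2, 'M': 2,
         'B': 3, 'D': 3, 'H': 3, 'V': 3,
         'N': 4}

def degeneracy(primer):
    """Number of distinct plain-nucleotide primers a degenerate primer
    encodes: the product of per-position alternatives."""
    d = 1
    for base in primer:
        d *= IUPAC[base]
    return d
```

Primer design tools keep this product below a threshold while still covering the variability seen in the aligned input sequences.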
  • Identification of Protein Complexes from Tandem Affinity Purification/Mass Spectrometry Data via Biased Random Walk

    Publication Year: 2015, Page(s): 455-466

    Systematic identification of protein complexes from protein-protein interaction (PPI) networks is an important application of data mining in the life sciences. Over the past decades, various clustering techniques have been developed that model PPIs as binary relations. Non-binary information about co-complex relations (prey/bait) in PPI data derived from tandem affinity purification/mass spectrometry (TAP-MS) experiments has been largely disregarded. In this paper, we propose a biased random walk based algorithm for detecting protein complexes from TAP-MS data, called random walk with restarting baits (RWRB). RWRB is built on random walk with restart; its main contribution is the incorporation of co-complex relations in TAP-MS PPI networks into the clustering process, through a new restarting strategy during the random walk. Through experimentation on unweighted and weighted TAP-MS datasets, we validated the biological significance of our results by mapping them to manually curated complexes. Results show that, by incorporating non-binary co-membership information, significant improvement has been achieved in terms of both statistical measures and biological relevance. The better accuracy demonstrates that the proposed method outperforms several state-of-the-art clustering algorithms for the detection of protein complexes in TAP-MS data.

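Random walk with restart, the basis that RWRB extends, iterates p ← (1 - r)·W^T p + r·p0, where p0 concentrates restart probability on chosen nodes (here, a uniform restart set stands in for the baits; RWRB's bait-specific restarting strategy is not reproduced):

```python
def rwr(adj, restart_nodes, r=0.3, iters=200):
    """Random walk with restart on an unweighted graph.
    adj maps node -> set of neighbours; returns stationary probabilities."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(restart_nodes) if n in restart_nodes else 0.0)
          for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: r * p0[n] for n in nodes}  # restart mass
        for u in nodes:
            if adj[u]:
                share = (1 - r) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share          # spread walk mass to neighbours
            else:
                nxt[u] += (1 - r) * p[u]     # dangling node keeps its mass
        p = nxt
    return p
```

Nodes with high stationary probability are strongly connected to the restart set, which is what makes the walk useful for grouping co-complex proteins.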
  • Identifying Driver Nodes in the Human Signaling Network Using Structural Controllability Analysis

    Publication Year: 2015, Page(s): 467-472
    Multimedia

    Cell signaling governs the basic cellular activities and coordinates the actions in cell. Abnormal regulations in cell signaling processing are responsible for many human diseases, such as diabetes and cancers. With the accumulation of massive data related to human cell signaling, it is feasible to obtain a human signaling network. Some studies have shown that interesting biological phenomenon and drug-targets could be discovered by applying structural controllability analysis to biological networks. In this work, we apply structural controllability to a human signaling network and detect driver nodes, providing a systematic analysis of the role of different proteins in controlling the human signaling network. We find that the proteins in the upstream of the signaling information flow and the low in-degree proteins play a crucial role in controlling the human signaling network. Interestingly, inputting different control signals on the regulators of the cancer-associated genes could cost less than controlling the cancer-associated genes directly in order to control the whole human signaling network in the sense that less drive nodes are needed. This research provides a fresh perspective for controlling the human cell signaling system. View full abstract»

  • Improving Integration Effectiveness of ID Mapping Based Biological Record Linkage

    Publication Year: 2015 , Page(s): 473 - 486
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1749 KB) |  | HTML iconHTML  

    Traditionally, biological objects such as genes, proteins, and pathways are represented by convenient identifiers, or IDs, which are then used to cross-reference, link, and describe objects in biological databases. Relationships among the objects are often established using non-trivial and computationally complex ID mapping systems or converters, and are stored in authoritative databases such as UniGene, GeneCards, PIR, and BioMart. Despite best efforts, such mappings are largely incomplete and riddled with false negatives. Consequently, data integration using record linkage that relies on these mappings produces poor-quality data, inadvertently leading to erroneous conclusions. In this paper, we discuss this largely ignored dimension of data integration, examine how the ubiquitous use of identifiers in biological databases is a significant barrier to knowledge fusion using distributed computational pipelines, and propose two algorithms for ad hoc, restriction-free ID mapping of arbitrary types using online resources. We also propose two declarative statements for ID conversion and data integration based on on-the-fly ID mapping.
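    The false-negative problem the abstract describes arises whenever partial mapping tables are chained. The sketch below illustrates the failure mode with invented, in-memory tables (the example IDs are well-known gene identifiers, but the tables are deliberately incomplete); it is not the paper's algorithm, which queries online resources on the fly.

    ```python
    # Chain an ID through a sequence of partial mapping tables
    # (e.g. gene symbol -> Entrez -> UniProt), reporting IDs that
    # fall through a gap instead of silently dropping them.
    def chain_map(ids, tables):
        mapped, unmapped = {}, []
        for src in ids:
            cur = src
            for t in tables:
                cur = t.get(cur)
                if cur is None:
                    break
            if cur is None:
                unmapped.append(src)   # a false negative for downstream linkage
            else:
                mapped[src] = cur
        return mapped, unmapped

    # Deliberately incomplete toy tables: BRCA1 maps to Entrez but the
    # second table lacks its UniProt entry, so the chain loses it.
    symbol_to_entrez = {"TP53": "7157", "BRCA1": "672"}
    entrez_to_uniprot = {"7157": "P04637"}
    mapped, unmapped = chain_map(["TP53", "BRCA1", "XYZ1"],
                                 [symbol_to_entrez, entrez_to_uniprot])
    ```

    Record linkage built directly on such chained tables inherits every gap in every table, which is the quality problem the paper's on-the-fly mapping aims to mitigate.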

  • Multiple Break-Points Detection in Array CGH Data via the Cross-Entropy Method

    Publication Year: 2015 , Page(s): 487 - 498

    Array comparative genome hybridization (aCGH) is a widely used methodology for detecting copy number variations in a genome at high resolution. Knowing the number of break-points and their locations in genomic sequences serves different biological needs; primarily, it helps identify disease-causing genes of functional importance in characterizing genome-wide diseases. For human autosomes the normal copy number is two, whereas at the sites of oncogenes it increases (gain of DNA) and at tumour suppressor genes it decreases (loss of DNA). The majority of current detection methods are deterministic in their set-up and use dynamic programming or various smoothing techniques to estimate copy number variations. These approaches limit the search space of the problem through their modelling assumptions and do not represent the true uncertainty associated with the unknown break-points in genomic sequences. We propose the Cross-Entropy method, a model-based stochastic optimization technique, as an exact search method for estimating both the number and the locations of break-points in aCGH data. We model the continuous-scale log-ratio data obtained by the aCGH technique as a multiple break-point problem. The proposed methodology is compared with well-established publicly available methods on both artificially generated and real data. Results show that the proposed procedure is an effective way of estimating the number and, especially, the locations of break-points with a high level of precision. Availability: The methods described in this article are implemented in the R package breakpoint, available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=breakpoint.
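    The cross-entropy idea can be illustrated on the simplest case: one break-point in a noiseless piecewise-constant signal. Candidate locations are sampled from a distribution, scored by goodness of fit, and the distribution is updated toward the best ("elite") samples. Everything below (data, parameters, the Gaussian sampler) is an invented toy; the paper's breakpoint package handles multiple break-points on real log-ratio data.

    ```python
    # Toy cross-entropy search for a single break-point.
    import random

    def sse(data, k):
        """Sum of squared errors of a piecewise-constant fit split at k."""
        total = 0.0
        for seg in (data[:k], data[k:]):
            m = sum(seg) / len(seg)
            total += sum((x - m) ** 2 for x in seg)
        return total

    def ce_breakpoint(data, n_samples=50, elite=10, iters=20, seed=0):
        rng = random.Random(seed)
        mu, sigma = len(data) / 2, len(data) / 4   # sampling distribution
        for _ in range(iters):
            ks = [min(len(data) - 1, max(1, int(rng.gauss(mu, sigma))))
                  for _ in range(n_samples)]
            best = sorted(ks, key=lambda k: sse(data, k))[:elite]
            mu = sum(best) / elite                  # refit toward elites
            sigma = max(0.5, (sum((k - mu) ** 2 for k in best) / elite) ** 0.5)
        return round(mu)

    data = [0.0] * 30 + [1.0] * 20   # mean shift at index 30
    found = ce_breakpoint(data)      # should concentrate near 30
    ```

    Because each iteration only re-scores sampled candidates, the search space is never pruned by smoothing assumptions, which is the property the abstract emphasizes.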


Aims & Scope

This bimonthly publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu