Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Issue 2 • Date April-June 2010

  • [Front cover]

    Publication Year: 2010, Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2010, Page(s): c2
    Freely Available from IEEE
  • EIC Editorial

    Publication Year: 2010, Page(s): 193-194
    Freely Available from IEEE
  • Data Mining in Bioinformatics: Selected Papers from BIOKDD

    Publication Year: 2010, Page(s): 195-196
    Freely Available from IEEE
  • GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics

    Publication Year: 2010, Page(s): 197-207
    Cited by: Papers (1)

    Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called the graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classifiers. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call "pattern diffusion" to label nodes in the graphs. Finally, we designed a graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm using a number of chemical structure data sets. The experimental results demonstrate that our method is significantly better than competing methods such as those kernel functions based on paths, cycles, and subgraphs.
  • Molecular Function Prediction Using Neighborhood Features

    Publication Year: 2010, Page(s): 208-217
    Cited by: Papers (5)

    The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions observe similar annotation patterns in their neighborhood, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a KNN classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides a natural control of the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparse genomes by exploiting features from well-annotated genomes.
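
    The first phase described in this abstract can be illustrated with a minimal sketch of a random walk with restart, computed by power iteration on a small undirected graph. This is not the authors' implementation: the function names, the restart probability of 0.3, the toy graph, and the way annotation terms are summed into a feature vector are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that jumps back to `seed`.

    adj: dict mapping each node to a list of its neighbours
         (assumed undirected, with no isolated nodes).
    """
    nodes = sorted(adj)
    p = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker returns to the seed ...
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def neighborhood_features(adj, annotations, gene, restart=0.3):
    """Hypothetical feature vector: for each annotation term, the total
    visiting probability of the other genes that carry that term."""
    p = random_walk_with_restart(adj, gene, restart)
    features = {}
    for other, terms in annotations.items():
        if other == gene:
            continue
        for term in terms:
            features[term] = features.get(term, 0.0) + p[other]
    return features


# Toy path graph a - b - c with made-up annotations.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_annotations = {"b": {"kinase"}, "c": {"kinase", "transport"}}
toy_features = neighborhood_features(toy_graph, toy_annotations, "a")
```

    Feature vectors of this kind could then be compared with a nearest-neighbour rule, as the second phase of the abstract suggests; the distance measure used for that comparison is not stated in this listing.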
  • A Metric on the Space of Reduced Phylogenetic Networks

    Publication Year: 2010, Page(s): 218-222
    Cited by: Papers (3)

    Phylogenetic networks are leaf-labeled, rooted, acyclic, and directed graphs that are used to model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised, each of which was proven to be a metric on certain restricted classes of phylogenetic networks. A biologically motivated class of phylogenetic networks, namely, reduced phylogenetic networks, was recently introduced. None of the existing measures is a metric on the space of reduced phylogenetic networks. In this paper, we provide a metric on the space of reduced phylogenetic networks that is computable in time polynomial in the size of the networks.
  • Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets

    Publication Year: 2010, Page(s): 223-237
    Cited by: Papers (3)

    A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
  • Automated Isolation of Translational Efficiency Bias That Resists the Confounding Effect of GC(AT)-Content

    Publication Year: 2010, Page(s): 238-250
    Cited by: Papers (2)

    Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons are preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. Presented here is a new technique for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape. This technique successfully isolates expressivity-related codon usage trends, using only genomic sequence information, where other techniques fail due to the presence of GC(AT)-content confounding influences.
  • Gene Association Networks from Microarray Data Using a Regularized Estimation of Partial Correlation Based on PLS Regression

    Publication Year: 2010, Page(s): 251-262
    Cited by: Papers (2)

    Reconstruction of gene-gene interactions from large-scale data such as microarrays is a first step toward better understanding the mechanisms at work in the cell. Two main issues have to be managed in such a context: 1) choosing which measures have to be used to distinguish between direct and indirect interactions from high-dimensional microarray data and 2) constructing networks with a low proportion of false-positive edges. We present an efficient methodology for the reconstruction of gene interaction networks in a small-sample-size setting. The strength of independence of any two genes is measured, in such a "high-dimensional network," by a regularized estimation of partial correlation based on Partial Least Squares Regression. We finally emphasize specific properties of the proposed method. To assess the sensitivity and specificity of the method, we carried out the reconstruction of networks from simulated data. We also tested PLS-based partial correlation networks on static and dynamic real microarray data. An R implementation of the proposed algorithm is available from http://biodev.extra.cea.fr/plspcnetwork/.
  • Identification of Full and Partial Class Relevant Genes

    Publication Year: 2010, Page(s): 263-277
    Cited by: Papers (8)

    Multiclass cancer classification on microarray data has provided the feasibility of cancer diagnosis across all of the common malignancies in parallel. Using multiclass cancer feature selection approaches, it is now possible to identify genes relevant to a set of cancer types. However, besides identifying the relevant genes for the set of all cancer types, it is deemed to be more informative to biologists if the relevance of each gene to a specific cancer or subset of cancer types could be revealed or pinpointed. In this paper, we introduce two new definitions of multiclass relevancy features, i.e., full class relevant (FCR) and partial class relevant (PCR) features. In particular, FCR denotes genes that serve as candidate biomarkers for discriminating all cancer types. PCR, on the other hand, denotes genes that distinguish subsets of cancer types. Subsequently, a Markov blanket embedded memetic algorithm is proposed for the simultaneous identification of both FCR and PCR genes. Results obtained on commonly used synthetic and real-world microarray data sets show that the proposed approach converges to valid FCR and PCR genes that would assist biologists in their research work. The identification of both FCR and PCR genes is found to generate improvement in classification accuracy on many microarray data sets. A further comparison with existing state-of-the-art feature selection algorithms also reveals the effectiveness and efficiency of the proposed approach.
  • Model Composition for Macromolecular Regulatory Networks

    Publication Year: 2010, Page(s): 278-287
    Cited by: Papers (1)

    Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Large models are usually built up from smaller models, representing subsets of reactions within the larger network. To assist modelers in this composition process, we present a formal approach for model composition, a wizard-style program for implementing the approach, and suggested language extensions to the Systems Biology Markup Language to support model composition. To illustrate the features of our approach and how to use the JigCell Composition Wizard, we build up a model of the eukaryotic cell cycle "engine" from smaller pieces.
  • Reassortment Networks for Investigating the Evolution of Segmented Viruses

    Publication Year: 2010, Page(s): 288-298
    Cited by: Papers (1)

    Many viruses of interest, such as influenza A, have distinct segments in their genome. The evolution of these viruses involves mutation and reassortment, where segments are interchanged between viruses that coinfect a host. Phylogenetic trees can be constructed to investigate the mutation-driven evolution of individual viral segments. However, reassortment events among viral genomes are not well depicted in such bifurcating trees. We propose the concept of reassortment networks to analyze the evolution of segmented viruses. These are layered graphs in which the layers represent evolutionary stages such as a temporal series of seasons in which influenza viruses are isolated. Nodes represent viral isolates and reassortment events between pairs of isolates. Edges represent evolutionary steps, while weights on edges represent edit costs of reassortment and mutation events. Paths represent possible transformation series among viruses. The length of each path is the sum edit cost of the events required to transform one virus into another. In order to analyze τ stages of evolution of n viruses with segments of maximum length m, we first compute the pairwise distances between all corresponding segments of all viruses in O(m^2 n^2) time using dynamic programming. The reassortment network, with O(τ n^2) nodes, is then constructed using these distances. The ancestors and descendants of a specific virus can be traced via shortest paths in this network, which can be found in O(τ n^3) time.
  • Signal Quality Measurements for cDNA Microarray Data

    Publication Year: 2010, Page(s): 299-308
    Cited by: Papers (1)

    Concerns about the reliability of expression data from microarrays inspire ongoing research into measurement error in these experiments. Error arises at both the technical level within the laboratory and the experimental level. In this paper, we will focus on estimating the spot-specific error, as there are few currently available models. This paper outlines two different approaches to quantify the reliability of spot-specific intensity estimates. In both cases, the spatial correlation between pixels and its impact on spot quality is accounted for. The first method is a straightforward parametric estimate of within-spot variance that assumes a Gaussian distribution and accounts for spatial correlation via an overdispersion factor. The second method employs a nonparametric quality estimate referred to throughout as the mean square prediction error (MSPE). The MSPE first smooths a pixel region and then measures the difference between actual pixel values and the smoother. Both methods are compared on real and simulated data to assess numerical characteristics and the ability to describe poor spot quality. We conclude that both approaches capture noise in the microarray platform and highlight situations where one method or the other is superior.
  • Alignments of RNA Structures

    Publication Year: 2010, Page(s): 309-322

    We describe a theoretical unifying framework to express the comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not been studied yet. We provide a thorough analysis of the alignment hierarchy, including a new polynomial-time algorithm and an NP-completeness proof. The polynomial-time algorithm involves biologically relevant edit operations such as pairing or unpairing nucleotides. It has been implemented in a software tool called gardenia, which is available at the Web server http://bioinfo.lifl.fr/RNA/gardenia.
  • Approximation Algorithms for Predicting RNA Secondary Structures with Arbitrary Pseudoknots

    Publication Year: 2010, Page(s): 323-332

    We study three closely related problems motivated by the prediction of RNA secondary structures with arbitrary pseudoknots: the problem 2-Interval Pattern proposed by Vialette, the problem Maximum Base Pair Stackings proposed by Leong et al., and the problem Maximum Stacking Base Pairs proposed by Lyngso. For the 2-Interval Pattern, we present polynomial-time approximation algorithms for the problem over the preceding-and-crossing model and on input with the unitary restriction. For Maximum Base Pair Stackings and Maximum Stacking Base Pairs, we present polynomial-time approximation algorithms for the two problems on explicit input of candidate base pairs. We also propose a new problem called Length-Weighted Balanced 2-Interval Pattern, which is natural in the context of RNA secondary structure prediction.
  • Fast Hinge Detection Algorithms for Flexible Protein Structures

    Publication Year: 2010, Page(s): 333-341
    Cited by: Papers (1)

    Analysis of conformational changes is one of the keys to the understanding of protein functions and interactions. For the analysis, we often compare two protein structures, taking flexible regions like hinge regions into consideration. The Root Mean Square Deviation (RMSD) is the most popular measure for comparing two protein structures, but it is only for rigid structures without hinge regions. In this paper, we propose a new measure called RMSD considering hinges (RMSDh) and its variant RMSDh(k) for comparing two flexible proteins with hinge regions. We also propose novel efficient algorithms for computing them, which can detect the hinge positions at the same time. The RMSDh is suitable for cases where there is one small hinge region in each of the two target structures. The new algorithm for computing the RMSDh runs in linear time, which is the same as the time complexity for computing the RMSD and is faster than any previous algorithm for hinge detection. The RMSDh(k) is designed for comparing structures with more than one hinge region. The RMSDh(k) measure considers at most k small hinge regions, i.e., the RMSDh(k) value should be small if the two structures are similar except for at most k hinge regions. To compute the value, we propose an O(kn^2)-time and O(n)-space algorithm based on a new dynamic programming technique. With the same computational time and space, we can enumerate the predicted hinge positions. We also test our algorithms against actual flexible protein structures, and show that the hinge positions can be correctly detected by our algorithms.
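
    As background for the rigid RMSD that this abstract starts from, here is a minimal sketch of the plain formula for two coordinate lists of equal length. It assumes the two structures have already been superimposed (the usual optimal-superposition step is omitted), and it does not implement the RMSDh or RMSDh(k) measures proposed in the paper. The function name and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D points.

    Assumes both lists are in the same residue order and already
    superimposed; no rotation or translation is fitted here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy chain: the first two points coincide, the last two are displaced,
# loosely imitating a domain that has moved about a hinge.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0), (3.0, 2.0, 0.0)]

whole_chain = rmsd(chain_a, chain_b)          # inflated by the moved half
first_half = rmsd(chain_a[:2], chain_b[:2])   # the unmoved half agrees exactly
```

    In this toy case the whole-chain value is nonzero even though half of the chain matches exactly, which is the kind of situation the abstract gives as motivation for a hinge-aware measure.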
  • Fixed-Parameter Tractability of the Maximum Agreement Supertree Problem

    Publication Year: 2010, Page(s): 342-353

    Given a set L of labels and a collection of rooted trees whose leaves are bijectively labeled by some elements of L, the Maximum Agreement Supertree (SMAST) problem is given as follows: find a tree T on a largest label set L' ⊆ L that homeomorphically contains every input tree restricted to L'. The problem has phylogenetic applications to infer supertrees and perform tree congruence analyses. In this paper, we focus on the parameterized complexity of this NP-hard problem, considering different combinations of parameters as well as particular cases. We show that SMAST on k rooted binary trees on a label set of size n can be solved in O((8n)^k) time, which is an improvement with respect to the previously known O(n^(3k^2)) time algorithm. In this case, we also give an O((2k)^p k n^2) time algorithm, where p is an upper bound on the number of leaves of L missing in a SMAST solution. This shows that SMAST can be solved efficiently when the input trees are mostly congruent. Then, for the particular case where any triple of leaves is contained in at least one input tree, we give O(4^p n^3) and O(3.12^p + n^4) time algorithms, obtaining the first fixed-parameter tractable algorithms on a single parameter for this problem. We also obtain intractability results for several combinations of parameters, thus indicating that it is unlikely that fixed-parameter tractable algorithms can be found in these particular cases.
  • Modeling Protein Interacting Groups by Quasi-Bicliques: Complexity, Algorithm, and Application

    Publication Year: 2010, Page(s): 354-364
    Cited by: Papers (2)

    Protein-protein interactions (PPIs) are one of the most important mechanisms in cellular processes. To model protein interaction sites, recent studies have suggested first finding interacting protein group pairs from large PPI networks and then searching for conserved motifs within the protein groups to form interacting motif pairs. To account for noise and the incompleteness of biological data, we propose to use quasi-bicliques for finding interacting protein group pairs. We investigate two new problems that arise from finding interacting protein group pairs: the maximum vertex quasi-biclique problem and the maximum balanced quasi-biclique problem. We prove that both problems are NP-hard. This is a surprising result as the widely known maximum vertex biclique problem is polynomial time solvable [1]. We then propose a heuristic algorithm that uses the greedy method to find the quasi-bicliques from PPI networks. Our experimental results on real data show that this algorithm has a better performance than a benchmark algorithm for identifying highly matched BLOCKS and PRINTS motifs. We also report results of two case studies on interacting motif pairs that map well with two interacting domain pairs in iPfam. Availability: The software and supplementary information are available at http://www.cs.cityu.edu.hk/~lwang/software/ppi/index.html.
  • Sorting Genomes by Reciprocal Translocations, Insertions, and Deletions

    Publication Year: 2010, Page(s): 365-374
    Cited by: Papers (2)

    The problem of sorting by reciprocal translocations (abbreviated as SBT) arises from the field of comparative genomics, which is to find a shortest sequence of reciprocal translocations that transforms one genome Π into another genome Γ, with the restriction that Π and Γ contain the same genes. SBT has been proved to be polynomial-time solvable, and several polynomial algorithms have been developed. In this paper, we show how to extend Bergeron's SBT algorithm to include insertions and deletions, allowing comparison of genomes containing different genes. In particular, if the gene set of Γ is a subset (or superset, respectively) of the gene set of Π, we present an approximation algorithm for transforming Π into Γ by reciprocal translocations and deletions (insertions, respectively), providing a sorting sequence with length at most OPT + 2, where OPT is the minimum number of translocations and deletions (insertions, respectively) needed to transform Π into Γ; if Π and Γ have different genes and neither gene set contains the other, we give a heuristic to transform Π into Γ by a shortest sequence of reciprocal translocations, insertions, and deletions, with bounds for the length of the sorting sequence it outputs. At a conceptual level, there is some similarity between our algorithm and the algorithm developed by El Mabrouk which is used to sort two chromosomes with different gene contents by reversals, insertions, and deletions.
  • Linear Separability of Gene Expression Data Sets

    Publication Year: 2010, Page(s): 375-381
    Cited by: Papers (2)

    We study simple geometric properties of gene expression data sets, where samples are taken from two distinct classes (e.g., two types of cancer). Specifically, the problem of linear separability for pairs of genes is investigated. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated to the phenomenon of the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomenon (e.g., a specific cancer). We developed and implemented novel efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it to an upper bound on the number expected by chance and to the numbers resulting from shuffling the labels of the data at random empirically. Seven out of these 10 data sets are highly separable. Statistically, this phenomenon is highly significant, very unlikely to occur at random. It is therefore reasonable to expect that it manifests a functional association between separating genes and the underlying phenotypic classes.
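
    To make the notion of a separating gene pair concrete, the sketch below treats each sample as a point (expression of gene 1, expression of gene 2) and searches for a separating line with the classical perceptron rule. This is a swapped-in textbook technique, not the computational-geometry tools the abstract refers to. The function name, the epoch cap, and the toy expression values are assumptions. A returned line proves separability; a `None` result only means no line was found within the cap.

```python
def find_separating_line(points, labels, max_epochs=1000):
    """Perceptron search for a line w1*x + w2*y + b = 0 that strictly
    separates two classes of 2D points.

    points: list of (x, y) pairs, one per sample.
    labels: list of +1 / -1 class labels, same order as points.
    Returns (w1, w2, b) on success, or None if the cap is reached.
    """
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), t in zip(points, labels):
            # A point is misclassified if it is on the wrong side or on the line.
            if t * (w1 * x + w2 * y + b) <= 0:
                w1 += t * x
                w2 += t * y
                b += t
                mistakes += 1
        if mistakes == 0:
            return (w1, w2, b)
    return None


# Made-up expression values for one gene pair across four samples.
separable_points = [(1.0, 1.0), (2.0, 1.5), (5.0, 4.0), (6.0, 5.5)]
separable_labels = [-1, -1, 1, 1]
line = find_separating_line(separable_points, separable_labels)

# An XOR-like layout: no single line can split these two classes.
xor_points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
xor_labels = [1, 1, -1, -1]
no_line = find_separating_line(xor_points, xor_labels)
```

    Applying such a check to every pair of genes in a data set would give a count of separating pairs of the kind the abstract compares against chance, although an exact geometric test would be needed to rule out separability with certainty.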
  • IEEE Computer Society Career Center

    Publication Year: 2010, Page(s): 382
    Freely Available from IEEE
  • IEEE Computer Society CSDA Certification [advertisement]

    Publication Year: 2010, Page(s): 383
    Freely Available from IEEE
  • IEEE CS Press Too Soon to Tell

    Publication Year: 2010, Page(s): 384
    Freely Available from IEEE

Aims & Scope

This bimonthly publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu