
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 6 • Nov.-Dec. 2011


Displaying Results 1 - 25 of 33
  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • A Shape Descriptor for Fast Complementarity Matching in Molecular Docking

    Page(s): 1441 - 1457

    This paper presents a novel approach for fast rigid docking of proteins based on geometric complementarity. After extraction of the 3D molecular surface, a set of local surface patches is generated based on the local surface curvature. The shape complementarity between a pair of patches is calculated using an efficient shape descriptor, the Shape Impact Descriptor. The key property of the Shape Impact Descriptor is its rotation invariance, which obviates the need to examine an exhaustive set of rotations for each pair of patches. Thus, complementarity matching between two patches is reduced to simple histogram matching. Finally, a condensed set of nearly complementary pairs of surface patches is supplied as input to the final scoring step, where each pose is evaluated using a 3D distance grid. The experimental results show that the proposed method outperforms other well-known geometry-based rigid-docking approaches.
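The reduction of complementarity matching to histogram matching can be illustrated with a generic histogram distance; the chi-square form below is a common choice and merely stands in for the paper's (unspecified) Shape Impact Descriptor comparison:

```python
def histogram_distance(h1, h2):
    """Chi-square distance between two equal-length histograms.
    A generic stand-in for comparing rotation-invariant shape
    descriptors; the paper's actual descriptor bins are not shown."""
    assert len(h1) == len(h2)
    d = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip empty bin pairs to avoid division by zero
            d += (a - b) ** 2 / (a + b)
    return 0.5 * d
```

Identical histograms give distance 0; the smaller the distance between two patch descriptors, the better the candidate complementarity.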

  • An Algebraic Spline Model of Molecular Surfaces for Energetic Computations

    Page(s): 1458 - 1467

    In this paper, we describe a new method to generate a smooth algebraic spline (AS) approximation of the molecular surface (MS), based on an initial coarse triangulation derived from the atomic coordinates of the biomolecule as deposited in the Protein Data Bank (PDB). Our method first constructs a triangular prism scaffold covering the PDB structure, and then generates a piecewise polynomial F in the Bernstein-Bezier (BB) basis within the scaffold. An ASMS model of the molecular surface is extracted as the zero contour of F, which is nearly C1 and has dual implicit and parametric representations. The dual representations allow us to easily sample points on the ASMS model and apply it to the accurate estimation of the integrals involved in electrostatic solvation energy computations. Moreover, compared with a trivial piecewise linear surface model, fewer sampling points are needed for the ASMS, which effectively reduces the complexity of the energy estimation.
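The Bernstein-Bezier (BB) form mentioned above is built from Bernstein basis polynomials. A minimal univariate sketch (the paper works with the trivariate BB form on a prism scaffold; this only illustrates the basis itself):

```python
from math import comb

def bernstein(i, n, t):
    """Degree-n Bernstein basis polynomial B_{i,n}(t) on [0, 1]."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)

# The basis forms a partition of unity (the B_{i,n}(t) sum to 1 for
# any t), which is what makes BB-form coefficients behave like
# control values for the spline.
```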

  • Analysis of the Free Energy in a Stochastic RNA Secondary Structure Model

    Page(s): 1468 - 1482
    Multimedia

    There are two customary approaches to predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model, and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter, applying the maximum likelihood principle to determine the grammar's probabilities. In this paper, building on such a stochastic model, we analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules. Even though the parameters of our grammar are chosen with respect to structural properties of native molecules only (and are therefore independent of the molecules' free energies), we prove formulae for the expected minimum free energy and the corresponding variance as functions of the molecule's size which perfectly fit the native behavior of free energies. This attests to the high quality of our stochastic model, making it a handy tool for further investigations. In fact, the stochastic model for RNA secondary structures presented in this work has, for example, been used as the basis of a new algorithm for the (nonuniform) generation of random RNA secondary structures.

  • Antibody-Specified B-Cell Epitope Prediction in Line with the Principle of Context-Awareness

    Page(s): 1483 - 1494

    Context-awareness is a characteristic of the recognition between antigens and antibodies, highlighting the reconfiguration of epitope residues when an antigen interacts with a different antibody. A coarse binary classification of antigen regions into epitopes or nonepitopes without specifying antibodies may not accurately reflect this biological reality. Therefore, we study an antibody-specified epitope prediction problem in line with this principle. This problem is new and challenging, as we pinpoint a subset of the antigenic residues of an antigen when it binds to a specific antibody. We introduce two kinds of associations of contextual awareness: 1) residue-residue pairing preference, and 2) the dependence between sets of contact residue pairs. Preference plays a bridging role in linking interacting paratope and epitope residues, while dependence is used to extend the association from one dimension to two dimensions. The paratope/epitope residues' relative composition, cooperativity ratios, and Markov properties are also utilized to enhance our method. A nonredundant data set containing 80 antibody-antigen complexes is compiled and used in the evaluation. The results show that our method yields good performance on antibody-specified epitope prediction. On the traditional antibody-ignored epitope prediction problem, a simplified version of our method produces competitive, sometimes much better, performance in comparison with three structure-based predictors.

  • Classification of GPCRs Using Family Specific Motifs

    Page(s): 1495 - 1508

    The classification of G-Protein Coupled Receptor (GPCR) sequences is an important problem that arises from the need to close the gap between the large number of orphan receptors and the relatively small number of annotated receptors. Equally important is the characterization of GPCR Class A subfamilies and gaining insight into ligand interaction, since GPCR Class A encompasses a very large number of drug-targeted receptors. In this work, we propose a method for Class A subfamily classification using sequence-derived motifs which characterize the subfamilies by discovering receptor-ligand interaction sites. The motifs that best characterize a subfamily are selected by the Distinguishing Power Evaluation (DPE) technique we propose. Experiments performed on GPCR sequence databases show that our method outperforms state-of-the-art classification techniques for GPCR Class A subfamily prediction. An important contribution of our work is the discovery of key receptor-ligand interaction sites, which is very important for drug design.

  • Contour Extraction of Drosophila Embryos

    Page(s): 1509 - 1521

    Contour extraction of Drosophila (fruit fly) embryos is an important step in building a computational system for matching the expression patterns of embryonic images, to assist in discovering the nature of genes. Automatic contour extraction of embryos is challenging due to severe image variations, including 1) the size, orientation, shape, and appearance of an embryo of interest; 2) the neighboring context of an embryo of interest (such as nontouching and touching neighboring embryos); and 3) illumination conditions. In this paper, we propose an automatic framework for contour extraction of the embryo of interest in an embryonic image. The proposed framework contains three components. The first applies a mixture model of quadratic curves, with statistical features, to initialize the contour of the embryo of interest; an efficient method based on imbalanced image points is proposed to compute the model parameters. The second applies an active contour model to refine embryo contours. The third applies eigen-shape modeling to smooth jaggy contours caused by blurred embryo boundaries. We test the proposed framework on a data set of 8,000 embryonic images and achieve promising accuracy (88 percent), substantially higher than state-of-the-art results.

  • Fast Kernel Discriminant Analysis for Classification of Liver Cancer Mass Spectra

    Page(s): 1522 - 1534

    The classification of serum samples based on mass spectrometry (MS) has been increasingly used for monitoring disease progression and for diagnosing early disease. However, the classification task on mass spectrometry data is extremely challenging due to the very large number of peaks (features) on mass spectra. Linear discriminant analysis (LDA) has been widely used for dimension reduction and feature extraction in many applications, but conventional LDA suffers from the singularity problem when dealing with high-dimensional features. Another critical limitation is its linearity, which causes it to fail on classification problems over nonlinearly clustered data sets. To overcome these problems, we develop a new fast kernel discriminant analysis (FKDA) that is very fast in the calculation of optimal discriminant vectors. FKDA is applied to the classification of liver cancer mass spectrometry data consisting of three categories: hepatocellular carcinoma, cirrhosis, and healthy, originally analyzed by Ressom et al. We demonstrate the superiority and effectiveness of FKDA when compared to other classification techniques.

  • Function Annotation for Pseudoknot Using Structure Similarity

    Page(s): 1535 - 1544

    Large amounts of raw biological sequence data have been generated by the Human Genome Project and related efforts. Understanding the structural information encoded by biological sequences is important for acquiring knowledge of their biochemical functions, but remains a fundamental challenge. Recent interest in RNA regulation has resulted in rapid growth of deposited RNA secondary structures in various databases. However, functional classification and characterization of RNA structures have only been partially addressed. This article introduces a novel interval-based distance metric for structure-based RNA function assignment. The characterization of RNA structures relies on distance vectors learned from a collection of predicted structures. The distance measure considers intersection, disjointness, and inclusion relations between intervals. A set of RNA pseudoknotted structures with known function is applied, and the function of a query structure is determined by measuring structure similarity. This not only offers distance criteria for measuring the similarity of secondary structures but also aids the functional classification of RNA structures with pseudoknots.

  • High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA

    Page(s): 1545 - 1556

    Hybrid functional Petri nets are a widespread tool for representing and simulating biological models. Due to their potential for providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times, thus raising the demand for performance acceleration by efficient and inexpensive parallel computation solutions. Recent developments in the field of general-purpose computation on graphics processing units (GPGPU) have enabled the scientific community to port a variety of compute-intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto compute unified device architecture (CUDA) enabled GPUs. GPU-accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating cell boundary formation by Delta-Notch signaling on a CUDA-enabled GPU results in a speedup of approximately 7× for a model containing 1,600 cells.

  • Hilbert-Huang Transform for Analysis of Heart Rate Variability in Cardiac Health

    Page(s): 1557 - 1567
    Multimedia

    This paper introduces a modified technique based on the Hilbert-Huang transform (HHT) to improve spectrum estimates of heart rate variability (HRV). In order to make the beat-to-beat (RR) interval a function of time and produce an evenly sampled time series, we first adopt a preprocessing method to interpolate and resample the original RR intervals. Then the HHT, which is based on the empirical mode decomposition (EMD) approach to decompose the HRV signal into several monocomponent signals that become analytic signals by means of the Hilbert transform, is proposed to extract features of the preprocessed time series and to characterize the dynamic behaviors of the parasympathetic and sympathetic nervous systems of the heart. Finally, the frequency behaviors of the Hilbert spectrum and Hilbert marginal spectrum (HMS) are studied to estimate the spectral traits of HRV signals. In this paper, two kinds of experimental data are used to compare our method with conventional power spectral density (PSD) estimation. The analysis results on the simulated HRV series show that interpolation and resampling are basic requirements for HRV data processing, and that HMS is superior to PSD estimation. Furthermore, to demonstrate the superiority of our approach, real HRV signals were collected from seven young healthy subjects under the condition that the autonomic nervous system (ANS) was blocked by acute selective blocking drugs: atropine and metoprolol. The high-frequency power/total power ratio and low-frequency power/high-frequency power ratio indicate that, compared with the Fourier spectrum based on the principal dynamic mode, our method is more sensitive and effective in identifying the low-frequency and high-frequency bands of HRV.
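The interpolation-and-resampling preprocessing step can be sketched in pure Python. This is a minimal sketch using linear interpolation at a fixed rate; real HRV pipelines typically use cubic splines and artifact rejection:

```python
def resample_rr(rr_intervals, fs=4.0):
    """Resample a beat-to-beat (RR) series (in seconds) onto an
    evenly sampled grid at fs Hz, treating RR as a function of the
    beat time and interpolating linearly between beats."""
    # cumulative time of each beat (s)
    t = []
    acc = 0.0
    for rr in rr_intervals:
        acc += rr
        t.append(acc)
    step = 1.0 / fs
    grid, values = [], []
    tt = t[0]
    while tt <= t[-1] + 1e-12:
        # locate the bracketing beats and interpolate the RR value
        for k in range(1, len(t)):
            if t[k - 1] <= tt <= t[k]:
                w = (tt - t[k - 1]) / (t[k] - t[k - 1])
                values.append((1 - w) * rr_intervals[k - 1] + w * rr_intervals[k])
                break
        grid.append(tt)
        tt += step
    return grid, values
```

The evenly sampled output is what makes a subsequent spectral analysis (PSD, or HMS as in the paper) well defined.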

  • Integrated Analysis of Gene Expression and Copy Number Data on Gene Shaving Using Independent Component Analysis

    Page(s): 1568 - 1579

    DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First, we use ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we use these results to identify patterns of variation in the data and then apply an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

  • Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression

    Page(s): 1580 - 1591

    Due to advancements in computational ability, enhanced technology, and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Since a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.
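Logic regression searches over Boolean, tree-like combinations of binary SNP indicators. A toy predictor of the kind it fits might look like the following (the particular tree is invented purely for illustration):

```python
def logic_tree_predict(s1, s2, s3):
    """Evaluate one hypothetical logic tree over three binary SNP
    indicators: (s1 AND s3) OR (NOT s2). Logic regression searches
    the space of such Boolean trees and uses them as covariates in
    a regression model."""
    return bool((s1 and s3) or (not s2))
```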

  • Molecular Pattern Discovery Based on Penalized Matrix Decomposition

    Page(s): 1592 - 1603
    Multimedia

    A reliable and precise identification of tumor types is crucial to the effective treatment of cancer. With the rapid development of microarray technologies, tumor clustering based on gene expression data is becoming a powerful approach to cancer class discovery. In this paper, we apply penalized matrix decomposition (PMD) to gene expression data to extract metasamples for clustering. The extracted metasamples capture the inherent structures of samples belonging to the same class, and in return the PMD factors of a sample over the metasamples can be used as its class indicator. Compared with conventional methods such as hierarchical clustering (HC), self-organizing maps (SOM), affinity propagation (AP), and nonnegative matrix factorization (NMF), the proposed method can identify samples with complex classes. Moreover, a factor of the PMD can be used as an index to determine the cluster number. The proposed method provides a reasonable explanation of the inconsistent classifications made by the conventional methods. In addition, it is able to discover modules in gene expression data of conterminous developmental stages. Experiments on two representative problems show that the proposed PMD-based method is very promising for discovering biological phenotypes.

  • Nonparametric Clustering for Studying RNA Conformations

    Page(s): 1604 - 1619
    Multimedia

    The local conformation of RNA molecules is an important factor in determining their catalytic and binding properties. The analysis of such conformations is particularly difficult due to the large number of degrees of freedom, such as the measured torsion angles per residue and the interatomic distances among interacting residues. In this work, we use a nearest-neighbor search method based on the statistical-mechanical Potts model to find clusters in the RNA conformational space. The proposed technique is mostly automatic and, in contrast to many other clustering techniques, may be applied to problems where there is no prior knowledge of the structure of the data space. Results are reported both for single-residue conformations, where the parameter set of the data space includes four to seven torsion angles, and for base pair geometries, where the data space is reduced to two dimensions. Moreover, new results are reported for base stacking geometries. For the first two cases, i.e., single-residue conformations and base pair geometries, we get a very good match between the results of the proposed clustering method and the known classifications, with only a few exceptions. For base stacking geometries, we validate our classification with respect to geometrical constraints and describe the content and geometry of the new clusters.

  • On Lattice Protein Structure Prediction Revisited

    Page(s): 1620 - 1632

    Protein structure prediction is regarded as a highly challenging problem for both the biological and the computational communities. In recent years, many approaches have been developed, moving to increasingly complex lattice models and off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face-Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. The flexible framework of this hybrid algorithm allows an adaptation to the Miyazawa-Jernigan contact potential, in place of the HP model, thus suggesting its potential for tertiary structure prediction. Benchmarking statistics are given for our method against the hydrophobic core threading program HPstruct, an exact method which can be viewed as complementary to ours.
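On the FCC lattice each site has 12 neighbors (offsets with exactly two nonzero ±1 coordinates). Counting the H-H contacts that the search maximizes can be sketched as follows; this is a minimal scoring sketch, not the paper's LNS algorithm:

```python
# The 12 FCC nearest-neighbor offsets: exactly two coordinates are +-1.
FCC_NEIGHBORS = [(dx, dy, dz)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
                 if abs(dx) + abs(dy) + abs(dz) == 2]

def hh_contacts(positions, sequence):
    """Count H-H contacts of a walk on the FCC lattice: pairs of
    nonconsecutive H residues sitting on neighboring sites. Each
    unordered pair is counted once (from its smaller chain index)."""
    assert len(positions) == len(sequence)
    index = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, p in enumerate(positions):
        if sequence[i] != 'H':
            continue
        for d in FCC_NEIGHBORS:
            q = (p[0] + d[0], p[1] + d[1], p[2] + d[2])
            j = index.get(q)
            if j is not None and j > i + 1 and sequence[j] == 'H':
                contacts += 1
    return contacts
```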

  • Recipe for Uncovering Predictive Genes Using Support Vector Machines Based on Model Population Analysis

    Page(s): 1633 - 1641
    Multimedia

    Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of a support vector machine is an important factor underlying the generalization performance of SVM models. Briefly, MIA reveals genes which have a statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the former is a nonparametric and robust test without any distribution-related assumptions. Using two publicly available cancer microarray data sets, it is demonstrated that MIA typically selects a small number of margin-influencing genes and achieves classification accuracy comparable to that reported in the literature. These distinguishing features and this performance may make MIA a good alternative for gene selection on high-dimensional microarray data. (The source code in MATLAB, with GNU General Public License Version 2.0, is freely available at http://code.google.com/p/mia2009/.)
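The Mann-Whitney U statistic underlying MIA's significance test can be computed directly from pairwise comparisons. A minimal pure-Python sketch (a real analysis would use scipy.stats.mannwhitneyu, which also supplies p-values and tie corrections):

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: the number of pairs
    (xi, yj) with xi > yj, counting ties as 1/2. No p-value is
    computed here."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

In MIA's setting, x and y would be the margins of model sub-populations that include or exclude a given gene; an extreme U flags that gene as margin-influencing.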

  • Selecting Oligonucleotide Probes for Whole-Genome Tiling Arrays with a Cross-Hybridization Potential

    Page(s): 1642 - 1652
    Multimedia

    For designing oligonucleotide tiling arrays, popular current methods still rely on simple criteria like Hamming distance or longest common factors, neglecting base-stacking effects, which strongly contribute to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method using hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter using weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and quality of the probes selected. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
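The q-gram seeding filter mentioned above can be sketched with an exact (unweighted) q-gram index; the weighted variant the paper describes would score each shared q-gram instead of merely recording it:

```python
def qgram_seeds(probe, target, q=8):
    """Find shared ungapped q-grams between a candidate probe and a
    target sequence, returned as (probe_pos, target_pos) seed pairs.
    Seeds are where a full energy computation would then be run."""
    # index every q-gram of the target by its start positions
    index = {}
    for i in range(len(target) - q + 1):
        index.setdefault(target[i:i + q], []).append(i)
    seeds = []
    for j in range(len(probe) - q + 1):
        for i in index.get(probe[j:j + q], []):
            seeds.append((j, i))
    return seeds
```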

  • Superposition and Alignment of Labeled Point Clouds

    Page(s): 1653 - 1666

    Geometric objects are often represented approximately in terms of a finite set of points in three-dimensional Euclidean space. In this paper, we extend this representation to what we call labeled point clouds. A labeled point cloud is a finite set of points, where each point is not only associated with a position in three-dimensional space, but also with a discrete class label that represents a specific property. This type of model is especially suitable for modeling biomolecules such as proteins and protein binding sites, where a label may represent an atom type or a physico-chemical property. Proceeding from this representation, we address the question of how to compare two labeled point clouds in terms of their similarity. Using fuzzy modeling techniques, we develop a suitable similarity measure as well as an efficient evolutionary algorithm to compute it. Moreover, we consider the problem of establishing an alignment of the structures in the sense of a one-to-one correspondence between their basic constituents. From a biological point of view, alignments of this kind are of great interest, since mutually corresponding molecular constituents offer important information about evolution and heredity, and can also serve as a means to explain a degree of similarity. In this paper, we therefore develop a method for computing pairwise or multiple alignments of labeled point clouds. To this end, we proceed from an optimal superposition of the corresponding point clouds and construct an alignment which agrees as much as possible with the neighborhood structure established by this superposition. We apply our methods to the structural analysis of protein binding sites.

  • Co-evolution Is Incompatible with the Markov Assumption in Phylogenetics

    Page(s): 1667 - 1670

    Markov models are extensively used in the analysis of molecular evolution. A recent line of research suggests that pairs of proteins with functional and physical interactions co-evolve with each other. Here, by analyzing hundreds of orthologous sets from three fungi and their co-evolutionary relations, we demonstrate that the co-evolution assumption may violate the Markov assumption. Our results encourage the development of alternative probabilistic models for cases of extreme co-evolution.

  • Combined Feature Selection and Cancer Prognosis Using Support Vector Machine Regression

    Page(s): 1671 - 1677

    Prognostic prediction is important in the medical domain, because it can be used to select an appropriate treatment for a patient by predicting the patient's clinical outcome. For high-dimensional data, a typical prognostic method proceeds in two steps: feature selection and prognosis analysis. Recently, the L1-L2-norm Support Vector Machine (L1-L2 SVM) has been developed as an effective classification technique and has shown good classification performance with automatic feature selection. In this paper, we extend the L1-L2 SVM to regression analysis with automatic feature selection. We further improve the L1-L2 SVM for prognostic prediction by utilizing the information of censored data as constraints, and we design an efficient solution to the new optimization problem. The proposed method is compared with seven other prognostic prediction methods on three real-world data sets. The experimental results show that the proposed method performs consistently better than the median performance, and is more efficient than other algorithms of similar performance.

  • CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware

    Page(s): 1678 - 1684

    Scanning protein sequence databases is a frequently repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities, we use a compressed deterministic finite-state automaton for hit detection as well as a hybrid parallelization scheme. Our implementation achieves speedups of up to 10.0 on an NVIDIA GeForce GTX 295 GPU compared to the sequential NCBI BLASTP 2.2.22. The CUDA-BLASTP source code is available at https://sites.google.com/site/liuweiguohome/software.
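    The hit-detection step that the compressed automaton accelerates can be shown in miniature: BLASTP seeds its alignments by finding short word matches between the query and each subject sequence. The sketch below uses exact matches of length w = 3 and a hash index rather than the paper's DFA or BLASTP's score-threshold neighborhood; all names are illustrative.

```python
# Illustrative sketch of BLASTP-style hit detection (not the paper's code):
# index every overlapping w-mer of the query, then slide a window over the
# subject sequence and report (query_pos, subject_pos) word matches.
def build_word_index(query, w=3):
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index

def find_hits(index, subject, w=3):
    """Return (query_pos, subject_pos) pairs where w-mers match exactly."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits

idx = build_word_index("MKVLAT")
hits = find_hits(idx, "GGMKVGG")  # "MKV" matches query pos 0, subject pos 2
```

    A DFA formulation replaces the per-window hash lookup with one state transition per residue, which is what makes it attractive on GPU hardware.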

  • From Gene Trees to Species Trees II: Species Tree Inference by Minimizing Deep Coalescence Events

    Page(s): 1685 - 1691

    When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include incomplete lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. Under incomplete lineage sorting, species tree inference seeks the tree that minimizes the number of extra gene lineages that had to coexist along species lineages; under gene duplication, it seeks the tree that minimizes gene duplications and/or losses. In this paper, we present the following results: 1) In the reconciliation of a uniquely leaf-labeled gene tree and a species tree, the deep coalescence cost equals the number of gene losses minus two times the gene duplication cost; the deep coalescence cost can be computed in linear time for an arbitrary gene tree and species tree. 2) The deep coalescence cost is never less than the gene duplication cost in the reconciliation of an arbitrary gene tree and a species tree. 3) Species tree inference by minimizing deep coalescence events is NP-hard.
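    The gene duplication cost that the abstract relates to deep coalescence comes from the standard LCA reconciliation: map each gene-tree node to the lowest species-tree node containing its descendant species, and count nodes that map to the same species as one of their children. A minimal sketch with trees as nested pair-tuples (illustrative code, not the paper's):

```python
# Illustrative LCA reconciliation: count gene duplications when mapping a
# binary gene tree into a binary species tree (leaves are species names).
def postorder(tree):
    if isinstance(tree, tuple):
        for child in tree:
            yield from postorder(child)
    yield tree

def root_path(tree, target, path=()):
    """Path of subtrees from the species-tree root down to `target`."""
    path = path + (tree,)
    if tree == target:
        return path
    if isinstance(tree, tuple):
        for child in tree:
            p = root_path(child, target, path)
            if p:
                return p
    return None

def species_lca(stree, a, b):
    lca = None
    for x, y in zip(root_path(stree, a), root_path(stree, b)):
        if x == y:
            lca = x                      # deepest common point so far
    return lca

def duplication_cost(gtree, stree):
    """A gene-tree node is a duplication iff its LCA mapping equals the
    mapping of one of its children."""
    def mapping(t):
        if isinstance(t, str):
            return t
        return species_lca(stree, mapping(t[0]), mapping(t[1]))
    dups = 0
    for node in postorder(gtree):
        if isinstance(node, tuple):
            if mapping(node) in (mapping(node[0]), mapping(node[1])):
                dups += 1
    return dups

stree = (("A", "B"), "C")
# Two copies of the A/B clade under one parent force a duplication there.
gtree = (("A", "B"), ("A", "B"))
```

    The deep coalescence cost is computed over the same reconciliation, which is why the two criteria can be compared node by node as in result 2).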

  • Haplotype Inference Constrained by Plausible Haplotype Data

    Page(s): 1692 - 1699

    The haplotype inference problem (HIP) asks for a set of haplotypes that resolves a given set of genotypes. This problem is important in practical fields such as the investigation of diseases and other types of genetic mutations. In order to find haplotypes that are as close as possible to the real set of haplotypes comprising the genotypes, two models have been suggested and are by now well studied: the perfect phylogeny model and the pure parsimony model. All haplotype inference algorithms known to date may find haplotypes that are not necessarily plausible, i.e., very rare haplotypes or haplotypes that were never observed in the population. To overcome this disadvantage, we study in this paper a new constrained version of HIP under the above-mentioned models. In this new version, a pool of plausible haplotypes H̃ is given together with the set of genotypes G, and the goal is to find a subset H ⊆ H̃ that resolves G. For constrained perfect phylogeny haplotyping (CPPH), we provide initial insights and polynomial-time algorithms for some restricted cases of the problem. For constrained parsimony haplotyping (CPH), we show that the problem is fixed-parameter tractable when parameterized by the size of the solution set of haplotypes.
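    The constrained setting can be made concrete with a toy resolution check in the usual encoding (0/1 = homozygous sites, 2 = heterozygous): a genotype is resolved by a pair of haplotypes that agree with it at homozygous sites and differ at heterozygous ones. The brute-force pool search below is illustrative only, not the paper's algorithm.

```python
# Illustrative sketch of constrained genotype resolution: explain a
# genotype using only haplotypes drawn from a given plausible pool.
from itertools import combinations_with_replacement

def resolves(h1, h2, g):
    """Do haplotypes h1, h2 explain genotype g at every site?"""
    return all(
        (gi in (0, 1) and h1i == h2i == gi) or (gi == 2 and h1i != h2i)
        for h1i, h2i, gi in zip(h1, h2, g)
    )

def explain_with_pool(genotype, pool):
    """Return the first resolving pair from the pool, or None."""
    for h1, h2 in combinations_with_replacement(pool, 2):
        if resolves(h1, h2, genotype):
            return h1, h2
    return None

pool = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
pair = explain_with_pool((0, 2, 2), pool)   # resolved within the pool
```

    Unconstrained HIP may invent any of the 2^k pairs for a genotype with k heterozygous sites; restricting choices to the pool is what makes the constrained variants (CPPH, CPH) combinatorially different.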


Aims & Scope

This bimonthly journal publishes archival research results on the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu