Notification:
We are currently experiencing intermittent issues impacting performance. We apologize for the inconvenience.
By Topic

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Issue 2 • Date March-April 2013

Filter Results

Displaying Results 1 - 25 of 31
  • [Front Cover]

    Publication Year: 2013 , Page(s): c1
    Save to Project icon | Request Permissions | PDF file iconPDF (1388 KB)  
    Freely Available from IEEE
  • [Front inside cover]

    Publication Year: 2013 , Page(s): c2
    Save to Project icon | Request Permissions | PDF file iconPDF (290 KB)  
    Freely Available from IEEE
  • Guest Editorial: Advanced Algorithms of Bioinformatics

    Publication Year: 2013 , Page(s): 273
    Save to Project icon | Request Permissions | PDF file iconPDF (48 KB)  
    Freely Available from IEEE
  • Chain-RNA: A Comparative ncRNA Search Tool Based on the Two-Dimensional Chain Algorithm

    Publication Year: 2013 , Page(s): 274 - 285
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1805 KB) |  | HTML iconHTML  

    Noncoding RNA (ncRNA) identification is highly important to modern biology. The state-of-the-art method for ncRNA identification is based on comparative genomics, in which evolutionary conservations of sequences and secondary structures provide important evidence for ncRNA search. For ncRNAs with low sequence conservation but high structural similarity, conventional local alignment tools such as BLAST yield low sensitivity. Thus, there is a need for ncRNA search methods that can incorporate both sequence and structural similarities. We introduce chain-RNA, a pairwise structural alignment tool that can effectively locate cross-species conserved RNA elements with low sequence similarity. In chain-RNA, stem-loop structures are extracted from dot plots generated by an efficient local-folding algorithm. Then, we formulate stem alignment as an extended 2D chain problem and employ existing chain algorithms. Chain-RNA is tested on a data set containing annotated ncRNA homologs and is applied to novel ncRNA search in a transcriptomic data set. The experimental results show that chain-RNA has better tradeoff between sensitivity and false positive rate in ncRNA prediction than conventional sequence similarity search tools and is more time efficient than structural alignment tools. The source codes of chain-RNA can be downloaded at http://sourceforge.net/projects/chain-rna/ or at http://www.cse.msu.edu/~leijikai/chain-rna/. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Rough-Fuzzy Clustering for Grouping Functionally Similar Genes from Microarray Data

    Publication Year: 2013 , Page(s): 286 - 299
    Cited by:  Papers (2)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (3316 KB) |  | HTML iconHTML  

    Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as robust rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Computational Reconstruction of Transcriptional Relationships from ChIP-Chip Data

    Publication Year: 2013 , Page(s): 300 - 307
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (551 KB) |  | HTML iconHTML  

    Eukaryotic gene transcription is a complex process, which requires the orchestrated recruitment of a large number of proteins, such as sequence-specific DNA binding factors, chromatin remodelers and modifiers, and general transcription machinery, to regulatory regions. Previous works have shown that these regulatory proteins favor specific organizational theme along promoters. Details about how they cooperatively regulate transcriptional process, however, remain unclear. We developed a method to reconstruct a Bayesian network (BN) model representing functional relationships among various transcriptional components. The positive/negative influence between these components was measured from protein binding and nucleosome occupancy data and embedded into the model. Application on S.cerevisiae ChIP-Chip data showed that the proposed method can recover confirmed relationships, such as Isw1-Pol II, TFIIH-Pol II, TFIIB-TBP, Pol II-H3K36Me3, H3K4Me3-H3K14Ac, etc. Moreover, it can distinguish colocating components from functionally related ones. Novel relationships, e.g., ones between Mediator and chromatin remodeling complexes (CRCs), and the combinatorial regulation of Pol II recruitment and activity by CRCs and general transcription factors (GTFs), were also suggested. Conclusion: protein binding events during transcription positively influence each other. Among contributing components, GTFs and CRCs play pivotal roles in transcriptional regulation. These findings provide insights into the regulatory mechanism. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Guest Editors' Introduction to the Special Section on Computational Methods in Systems Biology

    Publication Year: 2013 , Page(s): 308 - 309
    Save to Project icon | Request Permissions | PDF file iconPDF (62 KB)  
    Freely Available from IEEE
  • The Propagation Approach for Computing Biochemical Reaction Networks

    Publication Year: 2013 , Page(s): 310 - 322
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (498 KB) |  | HTML iconHTML  

    We introduce propagation models (PMs), a formalism able to express several kinds of equations that describe the behavior of biochemical reaction networks. Furthermore, we introduce the propagation abstract data type (PADT), which separates concerns regarding different numerical algorithms for the transient analysis of biochemical reaction networks from concerns regarding their implementation, thus allowing for portable and efficient solutions. The state of a propagation abstract data type is given by a vector that assigns mass values to a set of nodes, and its next operator propagates mass values through this set of nodes. We propose an approximate implementation of the next operator, based on threshold abstraction, which propagates only “significant” mass values and thus achieves a compromise between efficiency and accuracy. Finally, we give three use cases for propagation models: the chemical master equation (CME), the reaction rate equation (RRE), and a hybrid method that combines these two equations. These three applications use propagation models in order to propagate probabilities and/or expected values and variances of the model's variables. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Curvature Analysis of Cardiac Excitation Wavefronts

    Publication Year: 2013 , Page(s): 323 - 336
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2955 KB) |  | HTML iconHTML  

    We present the Spiral Classification Algorithm (SCA), a fast and accurate algorithm for classifying electrical spiral waves and their associated breakup in cardiac tissues. The classification performed by SCA is an essential component of the detection and analysis of various cardiac arrhythmic disorders, including ventricular tachycardia and fibrillation. Given a digitized frame of a propagating wave, SCA constructs a highly accurate representation of the front and the back of the wave, piecewise interpolates this representation with cubic splines, and subjects the result to an accurate curvature analysis. This analysis is more comprehensive than methods based on spiral-tip tracking, as it considers the entire wave front and back. To increase the smoothness of the resulting symbolic representation, the SCA uses weighted overlapping of adjacent segments which increases the smoothness at join points. SCA has been applied to a number of representative types of spiral waves, and, for each type, a distinct curvature evolution in time (signature) has been identified. Distinct signatures have also been identified for spiral breakup. These results represent a significant first step in automatically determining parameter ranges for which a computational cardiac-cell network accurately reproduces a particular kind of cardiac arrhythmia, such as ventricular fibrillation. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Multiscale Modeling and Analysis of Planar Cell Polarity in the Drosophila Wing

    Publication Year: 2013 , Page(s): 337 - 351
    Cited by:  Papers (2)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2186 KB) |  | HTML iconHTML  

    Modeling across multiple scales is a current challenge in Systems Biology, especially when applied to multicellular organisms. In this paper, we present an approach to model at different spatial scales, using the new concept of Hierarchically Colored Petri Nets (HCPN). We apply HCPN to model a tissue comprising multiple cells hexagonally packed in a honeycomb formation in order to describe the phenomenon of Planar Cell Polarity (PCP) signaling in Drosophila wing. We have constructed a family of related models, permitting different hypotheses to be explored regarding the mechanisms underlying PCP. In addition our models include the effect of well-studied genetic mutations. We have applied a set of analytical techniques including clustering and model checking over time series of primary and secondary data. Our models support the interpretation of biological observations reported in the literature. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • A Graph-Theoretical Approach to the Selection of the Minimum Tiling Path from a Physical Map

    Publication Year: 2013 , Page(s): 352 - 360
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (665 KB) |  | HTML iconHTML  

    The problem of computing the minimum tiling path (MTP) from a set of clones arranged in a physical map is a cornerstone of hierarchical (clone-by-clone) genome sequencing projects. We formulate this problem in a graph theoretical framework, and then solve by a combination of minimum hitting set and minimum spanning tree algorithms. The tool implementing this strategy, called FMTP, shows improved performance compared to the widely used software FPC. When we execute FMTP and FPC on the same physical map, the MTP produced by FMTP covers a higher portion of the genome, and uses a smaller number of clones. For instance, on the rice genome the MTP produced by our tool would reduce by about 11 percent the cost of a clone-by-clone sequencing project. Source code, benchmark data sets, and documentation of FMTP are freely available at http://code.google.com/p/fingerprint-basedminimal-tiling-path/ under MIT license. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Evaluation of Breast Cancer Susceptibility Using Improved Genetic Algorithms to Generate Genotype SNP Barcodes

    Publication Year: 2013 , Page(s): 361 - 371
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2560 KB) |  | HTML iconHTML  

    Genetic association is a challenging task for the identification and characterization of genes that increase the susceptibility to common complex multifactorial diseases. To fully execute genetic studies of complex diseases, modern geneticists face the challenge of detecting interactions between loci. A genetic algorithm (GA) is developed to detect the association of genotype frequencies of cancer cases and noncancer cases based on statistical analysis. An improved genetic algorithm (IGA) is proposed to improve the reliability of the GA method for high-dimensional SNP-SNP interactions. The strategy offers the top five results to the random population process, in which they guide the GA toward a significant search course. The IGA increases the likelihood of quickly detecting the maximum ratio difference between cancer cases and noncancer cases. The study systematically evaluates the joint effect of 23 SNP combinations of six steroid hormone metabolisms, and signaling-related genes involved in breast carcinogenesis pathways were systematically evaluated, with IGA successfully detecting significant ratio differences between breast cancer cases and noncancer cases. The possible breast cancer risks were subsequently analyzed by odds-ratio (OR) and risk-ratio analysis. The estimated OR of the best SNP barcode is significantly higher than 1 (between 1.15 and 7.01) for specific combinations of two to 13 SNPs. Analysis results support that the IGA provides higher ratio difference values than the GA between breast cancer cases and noncancer cases over 3-SNP to 13-SNP interactions. A more specific SNP-SNP interaction profile for the risk of breast cancer is also provided. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • FNphasing: A Novel Fast Heuristic Algorithm for Haplotype Phasing Based on Flow Network Model

    Publication Year: 2013 , Page(s): 372 - 382
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (751 KB) |  | HTML iconHTML  

    An enormous amount of sequence data has been generated with the development of new DNA sequencing technologies, which presents great challenges for computational biology problems such as haplotype phasing. Although arduous efforts have been made to address this problem, the current methods still cannot efficiently deal with the incoming flood of large-scale data. In this paper, we propose a flow network model to tackle haplotype phasing problem, and explain some classical haplotype phasing rules based on this model. By incorporating the heuristic knowledge obtained from these classical rules, we design an algorithm FNphasing based on the flow network model. Theoretically, the time complexity of our algorithm is O (n2m+m2), which is better than that of 2SNP, one of the most efficient algorithms currently. After testing the performance of FNphasing with several simulated data sets, the experimental results show that when applied on large-scale data sets, our algorithm is significantly faster than the state-of-the-art Beagle algorithm. FNphasing also achieves an equal or superior accuracy compared with other approaches. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • GENESHIFT: A Nonparametric Approach for Integrating Microarray Gene Expression Data Based on the Inner Product as a Distance Measure between the Distributions of Genes

    Publication Year: 2013 , Page(s): 383 - 392
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (614 KB) |  | HTML iconHTML  

    The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be surmounted by merging or integrating data sets originating from independent MAGE experiments, which are designed to study the same biological problem. However, this process is hindered by batch effects that are study-dependent and result in random data distortion; therefore numerical transformations are needed to render the integration of different data sets accurate and meaningful. Our contribution in this paper is two-fold. First we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions; second we introduce a new validation index of batch effect removal methods based on the observation that samples from two independent studies drawn from a same population should exhibit similar probability density functions. We evaluated and compared the GENESHIFT method with four other state-of-the-art methods for batch effect removal: Batch-mean centering, empirical Bayes or COMBAT, distance-weighted discrimination, and cross-platform normalization. Several validation indices providing complementary information about the efficiency of batch effect removal methods have been employed in our validation framework. The results show that none of the methods clearly outperforms the others. More than that, most of the methods used for comparison perform very well with respect to some validation indices while performing very poor with respect to others. GENESHIFT exhibits robust performances and its average rank is the highest among the average ranks of all methods used for comparison. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Gene Regulation Networks in Early Phase of Duchenne Muscular Dystrophy

    Publication Year: 2013 , Page(s): 393 - 400
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1321 KB) |  | HTML iconHTML  

    The aim of this study was to analyze previously published gene expression data of skeletal muscle biopsies of Duchenne muscular dystrophy (DMD) patients and controls (gene expression omnibus database, accession #GSE6011) using systems biology approaches. We applied an unsupervised method to discriminate patient and control populations, based on principal component analysis, using the gene expressions as units and patients as variables. The genes having the highest absolute scores in the discrimination between the groups, were then analyzed in terms of gene expression networks, on the basis of their mutual correlation in the two groups. The correlation network structures suggest two different modes of gene regulation in the two groups, reminiscent of important aspects of DMD pathogenesis. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters

    Publication Year: 2013 , Page(s): 401 - 414
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1020 KB) |  | HTML iconHTML  

    Clustering validation indexes are intended to assess the goodness of clustering results. Many methods used to estimate the number of clusters rely on a validation index as a key element to find the correct answer. This paper presents a new validation index based on graph concepts, which has been designed to find arbitrary shaped clusters by exploiting the spatial layout of the patterns and their clustering label. This new clustering index is combined with a solid statistical detection framework, the gap statistic. The resulting method is able to find the right number of arbitrary-shaped clusters in diverse situations, as we show with examples where this information is available. A comparison with several relevant validation methods is carried out using artificial and gene expression data sets. The results are very encouraging, showing that the underlying structure in the data can be more accurately detected with the new clustering index. Our gene expression data results also indicate that this new index is stable under perturbation of the input data. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Mining Featured Patterns of MiRNA Interaction Based on Sequence and Structure Similarity

    Publication Year: 2013 , Page(s): 415 - 422
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (595 KB) |  | HTML iconHTML  

    MicroRNA (miRNA) is an endogenous small noncoding RNA that plays an important role in gene expression through the post-transcriptional gene regulation pathways. There are many literature works focusing on predicting miRNA targets and exploring gene regulatory networks of miRNA families. We suggest, however, the study to identify the interaction between miRNAs is insufficient. This paper presents a framework to identify relationships between miRNAs using joint entropy, to investigate the regulatory features of miRNAs. Both the sequence and secondary structure are taken into consideration to make our method more relevant from the biological viewpoint. Further, joint entropy is applied to identify correlated miRNAs, which are more desirable from the perspective of the gene regulatory network. A data set including Drosophila melanogaster and Anopheles gambiae is used in the experiment. The results demonstrate that our approach is able to identify known miRNA interaction and uncover novel patterns of miRNA regulatory network. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Mining Quasi-Bicliques from HIV-1-Human Protein Interaction Network: A Multiobjective Biclustering Approach

    Publication Year: 2013 , Page(s): 423 - 435
    Cited by:  Papers (2)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1530 KB) |  | HTML iconHTML  

    In this work, we model the problem of mining quasi-bicliques from weighted viral-host protein-protein interaction network as a biclustering problem for identifying strong interaction modules. In this regard, a multiobjective genetic algorithm-based biclustering technique is proposed that simultaneously optimizes three objective functions to obtain dense biclusters having high mean interaction strengths. The performance of the proposed technique has been compared with that of other existing biclustering methods on an artificial data. Subsequently, the proposed biclustering method is applied on the records of biologically validated and predicted interactions between a set of HIV-1 proteins and a set of human proteins to identify strong interaction modules. For this, the entire interaction information is realized as a bipartite graph. We have further investigated the biological significance of the obtained biclusters. The human proteins involved in the strong interaction module have been found to share common biological properties and they are identified as the gateways of viral infection leading to various diseases. These human proteins can be potential drug targets for developing anti-HIV drugs. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Multilabel Learning via Random Label Selection for Protein Subcellular Multilocations Prediction

    Publication Year: 2013 , Page(s): 436 - 446
    Cited by:  Papers (1)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (934 KB) |  | HTML iconHTML  

    Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods are only used to deal with the single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they only adopt a simple strategy, that is, transforming the multilocation proteins to multiple proteins with single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named random label selection (RALS) (multilabel learning via RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting original feature space with randomly selected labels as its additional input features. Through the fivefold cross-validation test on a benchmark data set, we demonstrate our proposed method with consideration of label correlations obviously outperforms the baseline BR method without consideration of label correlations, indicating correlations among different subcellular locations really exist and contribute to improvement of prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for the public usage. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Nonnegative Least-Squares Methods for the Classification of High-Dimensional Biological Data

    Publication Year: 2013 , Page(s): 447 - 456
    Cited by:  Papers (1)
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (437 KB) |  | HTML iconHTML  

    Microarray data can be used to detect diseases and predict responses to therapies through classification models. However, the high dimensionality and low sample size of such data result in many computational problems such as reduced prediction accuracy and slow classification speed. In this paper, we propose a novel family of nonnegative least-squares classifiers for high-dimensional microarray gene expression and comparative genomic hybridization data. Our approaches are based on combining the advantages of using local learning, transductive learning, and ensemble learning, for better prediction performance. To study the performances of our methods, we performed computational experiments on 17 well-known data sets with diverse characteristics. We have also performed statistical comparisons with many classification techniques including the well-performing SVM approach and two related but recent methods proposed in literature. Experimental results show that our approaches are faster and achieve generally a better prediction performance over compared methods. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids

    Publication Year: 2013 , Page(s): 457 - 467
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1157 KB) |  | HTML iconHTML  

    Based on all kinds of adjacent amino acids (AAA), we map each protein primary sequence into a 400 by (L-1) matrix M. In addition, we further derive a normalized 400-tuple mathematical descriptors D, which is extracted from the primary protein sequences via singular values decomposition (SVD) of the matrix. The obtained 400-D normalized feature vectors (NFVs) further facilitate our quantitative analysis of protein sequences. Using the normalized representation of the primary protein sequences, we analyze the similarity for different sequences upon two data sets: 1) ND5 sequences from nine species and 2) transferrin sequences of 24 vertebrates. We also compared the results in this study with those from other related works. These two experiments illustrate that our proposed NFV-AAA approach does perform well in the field of similarity analysis of sequence. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • On the Increase in Network Robustness and Decrease in Network Response Ability during the Aging Process: A Systems Biology Approach via Microarray Data

    Publication Year: 2013 , Page(s): 468 - 480
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1999 KB) |  | HTML iconHTML  

    Aging, an extremely complex and system-level process, has attracted much attention in medical research, especially since chronic diseases are quite prevalent in the elderly population. These may be the result of both gene mutations that lead to intrinsic perturbations and environmental changes that may stimulate signaling in the body. Therefore, analysis of network robustness to tolerate intrinsic perturbations and network response ability of gene networks to respond to external stimuli during the aging process may provide insight into the systematic changes caused by aging. We first propose novel methods to estimate network robustness and measure network response ability of gene regulatory networks by using their corresponding microarray data in the aging process. Then, we find that an aging-related gene network is more robust to intrinsic perturbations in the elderly than the young, and therefore is less responsive to external stimuli. Finally, we find that the response abilities of individual genes, especially FOXOs, NF-κB, and p53, are significantly different in the young versus the aged subjects. These observations are consistent with experimental findings in the aged population, e.g., elevated incidence of tumorigenesis and diminished resistance to oxidative stress. The proposed method can also be used for exploring and analyzing the dynamic properties of other biological processes via corresponding microarray data to provide useful information on clinical strategy and drug target selection. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Pareto Optimal Pairwise Sequence Alignment

    Publication Year: 2013 , Page(s): 481 - 493
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1312 KB) |  | HTML iconHTML  

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Profile-Based LC-MS Data Alignment - A Bayesian Approach

    Publication Year: 2013 , Page(s): 494 - 503
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2024 KB) |  | HTML iconHTML  

    A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • RANGI: A Fast List-Colored Graph Motif Finding Algorithm

    Publication Year: 2013 , Page(s): 504 - 513
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (696 KB) |  | HTML iconHTML  

    Given a multiset of colors as the query and a list-colored graph, i.e., an undirected graph with a set of colors assigned to each of its vertices, in the NP-hard list-colored graph motif problem the goal is to find the largest connected subgraph such that one can select a color from the set of colors assigned to each of its vertices to obtain a subset of the query. This problem was introduced to find functional motifs in biological networks. We present a branch-and-bound algorithm named RANGI for finding and enumerating list-colored graph motifs. As our experimental results show, RANGI's pruning methods and heuristics make it quite fast in practice compared to the algorithms presented in the literature. We also present a parallel version of RANGI that achieves acceptable scalability. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.

Aims & Scope

This bimonthly publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu