IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 5 • Sept.-Oct. 2011

Displaying Results 1 - 25 of 32
  • [Front cover]

    Publication Year: 2011 , Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2011 , Page(s): c2
    Freely Available from IEEE
  • A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements

    Publication Year: 2011 , Page(s): 1153 - 1169

    Estimation of pairwise correlation from incomplete and replicated molecular profiling data is a ubiquitous problem in pattern discovery analysis, such as clustering and network inference. However, existing methods solve this problem by ad hoc data imputation followed by correlation coefficient-type approaches, which can annihilate important patterns present in the molecular profiling data. Moreover, these approaches do not consider or exploit the underlying experimental design information that specifies the replication mechanism. We develop an Expectation-Maximization (EM) type algorithm to estimate the correlation structure using incomplete and replicated molecular profiling data with an a priori known replication mechanism. The approach is sufficiently general to be applicable to any known replication mechanism. When the replication mechanism is unknown, it reduces to the parsimonious model introduced previously. The efficacy of our approach was first evaluated by comprehensively comparing various bivariate and multivariate imputation approaches in simulation studies. Results from real-world data analysis further confirmed the superior performance of the proposed approach over commonly used approaches, where we assessed its robustness using data sets with up to 30 percent missing values.

  • A Novel Knowledge-Driven Systems Biology Approach for Phenotype Prediction upon Genetic Intervention

    Publication Year: 2011 , Page(s): 1170 - 1182
    Cited by:  Papers (2)

    Deciphering the biological networks underlying complex phenotypic traits, e.g., human disease, is crucial for understanding the underlying molecular mechanisms and for developing effective therapeutics. Due to network complexity and the relatively small number of available experiments, data-driven modeling is a great challenge for deducing the functions of genes/proteins in the network and in phenotype formation. We propose a novel knowledge-driven systems biology method that utilizes qualitative knowledge to construct a Dynamic Bayesian Network (DBN) representing the biological network underlying a specific phenotype. Edges in this network depict physical interactions between genes and/or proteins. A qualitative knowledge model first translates typical molecular interactions into constraints on the DBN structure and parameters. The uncertainty of the network is therefore restricted to the subset of models consistent with the qualitative knowledge. All models satisfying the constraints are considered candidates for the underlying network, and these consistent models are used to perform quantitative inference. By in silico inference, we can predict phenotypic traits upon genetic interventions and perturbations of the network. We applied our method to analyze the breast cancer cell proliferation network and accurately predicted cancer cell growth rate upon manipulating (anti)cancerous marker genes/proteins.

  • A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

    Publication Year: 2011 , Page(s): 1183 - 1195

    Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, haplotype inference by pure parsimony (HIPP), casts the problem as an optimization problem that has been proved NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes, iteratively searching for and deleting haplotypes that are not helpful in finding the optimal solution. This preprocessing can be coupled with any current HIPP solver that needs to preprocess the genotype data. To test it, we used two state-of-the-art solvers, RTIP and GAHAP, on both simulated and real HapMap data. Due to the reduction in computational time and memory achieved by our preprocessing, problem instances that were previously unaffordable can now be solved efficiently.

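The pure-parsimony objective is easy to state concretely. The sketch below is an illustrative brute-force solver for tiny instances, not the paper's preprocessing procedure: genotypes are coded 0/1/2 (2 = heterozygous), and we search for the smallest haplotype set that resolves every genotype.

```python
from itertools import product, combinations

def resolves(pair, g):
    """True if haplotype pair (h1, h2) explains genotype g
    (0 = homozygous 0, 1 = homozygous 1, 2 = heterozygous)."""
    h1, h2 = pair
    return all((s == 2 and a != b) or (s != 2 and a == b == s)
               for s, a, b in zip(g, h1, h2))

def hipp_bruteforce(genotypes):
    """Smallest haplotype set resolving every genotype (tiny inputs only):
    try all candidate sets in order of increasing size."""
    m = len(genotypes[0])
    haps = list(product((0, 1), repeat=m))
    for k in range(1, len(haps) + 1):
        for cand in combinations(haps, k):
            if all(any(resolves((h1, h2), g)
                       for h1 in cand for h2 in cand)
                   for g in genotypes):
                return cand
```

The exponential enumeration is exactly why HIPP solvers benefit from preprocessing that discards unhelpful haplotypes up front.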
  • An Efficient Algorithm for Approximating Geodesic Distances in Tree Space

    Publication Year: 2011 , Page(s): 1196 - 1207
    Multimedia

    The increasing use of phylogeny in biological studies is limited by the lack of sufficiently efficient tools for computing distances between trees. The geodesic tree distance, introduced by Billera, Holmes, and Vogtmann, combines both tree topology and edge lengths into a single metric. Despite its conceptual simplicity, algorithms that compute the geodesic tree distance do not scale well to large, real-world phylogenetic trees with hundreds or even thousands of leaves. In this paper, we propose the geodesic distance as an effective tool for exploring the likelihood profile in the space of phylogenetic trees, and we give a cubic-time algorithm, GeoHeuristic, for computing an approximation of the distance. We compare it with the GTP algorithm, which calculates the exact distance, and with the cone path length, another approximation, showing that GeoHeuristic achieves a good trade-off between accuracy (relative error always below 0.0001) and efficiency. We also prove the equivalence of the GeoHeuristic, cone path, and Robinson-Foulds distances when all branch lengths are equal to unity, and we show empirically that, under this restriction, these distances are almost always equal to the actual geodesic.

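For intuition on the Robinson-Foulds-style comparison invoked in the equivalence result, here is a minimal sketch. It assumes rooted trees given as nested tuples and compares clade sets; the paper works with unrooted trees and splits, so this is an illustration of the idea, not the exact metric.

```python
def clades(tree):
    """Collect the leaf set of every internal node of a tree given as
    nested tuples, e.g. ((('A', 'B'), 'C'), ('D', 'E'))."""
    out = set()
    def walk(node):
        if isinstance(node, tuple):
            s = frozenset().union(*(walk(c) for c in node))
            out.add(s)
            return s
        return frozenset([node])
    walk(tree)
    return out

def rf_distance(t1, t2):
    """Symmetric difference of the two clade sets: the number of clades
    present in exactly one of the trees."""
    return len(clades(t1) ^ clades(t2))
```

With unit branch lengths, this purely topological count is the quantity the abstract relates to the geodesic and cone-path distances.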
  • Continuous Cotemporal Probabilistic Modeling of Systems Biology Networks from Sparse Data

    Publication Year: 2011 , Page(s): 1208 - 1222
    Cited by:  Papers (2)

    Modeling of biological networks is a difficult endeavor, but exploration of this problem is essential for understanding the systems behavior of biological processes. In this contribution, developed for sparse data, we present a new continuous Bayesian graphical learning algorithm to cotemporally model proteins in signaling networks and genes in transcriptional regulatory networks. In this continuous Bayesian algorithm, the correlation matrix is singular because the number of time points is less than the number of biological entities (genes or proteins). A suitable restriction on the degree of the graph's vertices is applied, and a Metropolis-Hastings algorithm is guided by a BIC-based posterior probability score. Ten independent and diverse runs of the algorithm are conducted so that the probability space is well explored. Diagnostics to test the applicability of the algorithm to specific data sets are developed; this is a major benefit of the methodology. This novel algorithm is applied to two time-course experimental data sets: 1) protein modification data identifying a potential signaling network in chondrocytes, and 2) gene expression data identifying the transcriptional regulatory network underlying dendritic cell maturation. The method gives high estimated posterior probabilities to many of the protein directed edges predicted by the literature; for the gene study, it gives high posterior probabilities to many of the literature-predicted sibling edges. In simulations, the method gives substantially higher estimated posterior probabilities for true edges and true subnetworks than for their false counterparts.

  • Designing Logical Rules to Model the Response of Biomolecular Networks with Complex Interactions: An Application to Cancer Modeling

    Publication Year: 2011 , Page(s): 1223 - 1234

    We discuss the propagation of constraints in eukaryotic interaction networks in relation to model prediction and the identification of critical pathways. In order to cope with posttranslational interactions, we consider two types of nodes in the network, corresponding to proteins and to RNA. Microarray data provide very incomplete information for such networks because protein nodes, although needed in the model, are not observed. Propagation of observations in such networks leads to poor and nonsignificant model predictions, mainly because the rules used to propagate information, usually disjunctive constraints, are weak. Here, we propose a new, stronger type of logical constraint that allows us to strengthen the analysis of the relation between microarray and interaction data. We use these rules to identify the nodes responsible for a phenotype, in particular for cell cycle progression. As a benchmark, we use an interaction network describing the major pathways implicated in Ewing's tumor development. The Python library used to obtain our results is publicly available on our supplementary web page.

  • Efficient Localization of Hot Spots in Proteins Using a Novel S-Transform Based Filtering Approach

    Publication Year: 2011 , Page(s): 1235 - 1246
    Cited by:  Papers (1)

    Protein-protein interactions govern almost all biological processes and the underlying functions of proteins. The interaction sites of a protein depend on its 3D structure, which in turn depends on the amino acid sequence. Hence, prediction of protein function from the primary sequence is an important and challenging task in bioinformatics. Identifying the amino acids (hot spots) that lead to the characteristic frequency signifying a particular biological function is a tedious job in proteomic signal processing. In this paper, we propose a promising new technique for identifying hot spots in proteins using an efficient time-frequency filtering approach known as S-transform filtering. The S-transform is a powerful linear time-frequency representation and is especially useful for filtering in the time-frequency domain. The potential of the new technique is analyzed by identifying hot spots in proteins, and the results obtained are compared with existing methods. The results demonstrate that the proposed method is superior to its counterparts and is consistent with results based on biological methods for identifying hot spots. The proposed method also reveals some new hot spots which need further investigation and validation by the biological community.

  • Fast Flexible Modeling of RNA Structure Using Internal Coordinates

    Publication Year: 2011 , Page(s): 1247 - 1257

    Modeling the structure and dynamics of large macromolecules remains a critical challenge. Molecular dynamics (MD) simulations are expensive because they model every atom independently, and they are difficult to combine with experimentally derived knowledge. Assembly of molecules using fragments from libraries relies on a database of known structures and thus may not work for novel motifs. Coarse-grained modeling methods have yielded good results on large molecules but can suffer from difficulties in creating more detailed, fully atomic realizations. There is therefore a need for molecular modeling algorithms that remain chemically accurate and economical for large molecules, do not rely on fragment libraries, and can incorporate experimental information. RNABuilder works in the internal coordinate space of dihedral angles and thus has time requirements proportional to the number of moving parts rather than the number of atoms. It provides an accurate physics-based response to applied forces, but also allows user-specified forces for incorporating experimental information. A particular strength of RNABuilder is that all Leontis-Westhof basepairs can be specified as primitives by the user, to be satisfied during model construction. We apply RNABuilder to predict the structure of an RNA molecule with 160 bases from its secondary structure and from experimental information. Our model matches the known structure to 10.2 Angstroms RMSD and has low computational expense.

  • Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees

    Publication Year: 2011 , Page(s): 1258 - 1272
    Cited by:  Papers (4)

    A gene team is a set of genes that appear in two or more species, possibly in different orders, yet with the distance between adjacent genes of the team on each chromosome always no more than a certain threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. In this paper, improved algorithms are presented for the problem of finding the gene teams of two chromosomes and the problem of constructing a gene team tree of two chromosomes. For the problem of finding gene teams, Beal et al. had an O(n lg² n)-time algorithm; our improved algorithm requires O(n lg t) time, where t ≤ n is the number of gene teams. For the problem of constructing a gene team tree, Zhang and Leong had an O(n lg² n)-time algorithm; our improved algorithm requires O(n lg n lg lg n) time. As with Beal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to k chromosomes with the time complexities increased only by a factor of k.

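The δ-threshold definition lends itself to a simple divide-and-conquer sketch: repeatedly split the gene set wherever two consecutive genes on either chromosome are more than δ apart. This is illustrative only; the paper's algorithms reach the stated O(n lg t) and O(n lg n lg lg n) bounds with more careful data structures.

```python
def gene_teams(pos1, pos2, delta):
    """Partition the genes common to two chromosomes into gene teams:
    within a team, consecutive team genes are <= delta apart on BOTH
    chromosomes. pos1/pos2 map gene -> position on each chromosome."""
    def split(genes, pos):
        # Break a sorted gene list at every gap larger than delta.
        genes = sorted(genes, key=lambda g: pos[g])
        chunks, cur = [], [genes[0]]
        for a, b in zip(genes, genes[1:]):
            if pos[b] - pos[a] <= delta:
                cur.append(b)
            else:
                chunks.append(cur)
                cur = [b]
        chunks.append(cur)
        return chunks

    def teams(genes):
        for pos in (pos1, pos2):
            chunks = split(genes, pos)
            if len(chunks) > 1:        # a gap on this chromosome: recurse
                return [t for c in chunks for t in teams(c)]
        return [sorted(genes)]         # no gap on either chromosome: a team

    return teams(set(pos1) & set(pos2))
```

Each split can only shrink the candidate sets, so the recursion terminates with the maximal δ-teams.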
  • Metasample-Based Sparse Representation for Tumor Classification

    Publication Year: 2011 , Page(s): 1273 - 1282
    Cited by:  Papers (4)

    Reliable and accurate identification of the type of a tumor is crucial to the proper treatment of cancers. In recent years, it has been shown that sparse representation (SR) by l1-norm minimization is robust to noise, outliers, and even incomplete measurements, and SR has been successfully used for classification. This paper presents a new SR-based method for tumor classification using gene expression data. A set of metasamples is extracted from the training samples, and an input testing sample is then represented as a linear combination of these metasamples by an l1-regularized least squares method. Classification is achieved using a discriminating function defined on the representation coefficients. Since l1-norm minimization leads to a sparse solution, the proposed method is called metasample-based SR classification (MSRC). Extensive experiments on publicly available gene expression data sets show that MSRC is efficient for tumor classification, achieving higher accuracy than many existing representative schemes.

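The l1-regularized least squares step at the heart of such a classifier can be sketched with a basic iterative soft-thresholding (ISTA) solver. The toy metasample matrix, labels, and λ below are illustrative stand-ins, not the paper's actual setup or discriminating function.

```python
import numpy as np

def ista(A, y, lam=0.1, iters=500):
    """Solve min_x 0.5*||Ax - y||^2 + lam*||x||_1 by iterative
    soft thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                  # gradient of the smooth part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft threshold
    return x

def classify(A, labels, y, lam=0.1):
    """Assign y to the class whose columns of A (metasamples) best
    reconstruct it, judged by the residual of the class-restricted
    coefficients (a residual-based stand-in for the paper's rule)."""
    x = ista(A, y, lam)
    best, best_r = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        r = np.linalg.norm(y - A[:, mask] @ x[mask])
        if r < best_r:
            best, best_r = c, r
    return best
```

Because the l1 penalty zeroes out coefficients, most of the representation weight lands on metasamples of the correct class.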
  • Multiple Sequence Assembly from Reads Alignable to a Common Reference Genome

    Publication Year: 2011 , Page(s): 1283 - 1295
    Cited by:  Papers (1)

    We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.

  • Parameterized Algorithmics for Finding Connected Motifs in Biological Networks

    Publication Year: 2011 , Page(s): 1296 - 1308
    Cited by:  Papers (1)

    We study the NP-hard LIST-COLORED GRAPH MOTIF problem which, given an undirected list-colored graph G = (V, E) and a multiset M of colors, asks for maximum-cardinality sets S ⊆ V and M' ⊆ M such that G[S] is connected and contains exactly (with respect to multiplicity) the colors in M'. LIST-COLORED GRAPH MOTIF has applications in the analysis of biological networks. We study LIST-COLORED GRAPH MOTIF with respect to three different parameterizations. For the parameters motif size |M| and solution size |S|, we present fixed-parameter algorithms, whereas for the parameter |V| - |M|, we show W[1]-hardness for general instances and achieve fixed-parameter tractability for a special case of LIST-COLORED GRAPH MOTIF. We implemented the fixed-parameter algorithms for the parameters |M| and |S|, developed further speed-up heuristics for these algorithms, and applied them in the context of querying protein-interaction networks, demonstrating their usefulness for realistic instances. Furthermore, we show that extending the request for motif connectedness to stronger demands, such as biconnectedness or bridge-connectedness, leads to W[1]-hard problems when the parameter is the motif size |M|.

  • Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences

    Publication Year: 2011 , Page(s): 1309 - 1317
    Cited by:  Papers (2)

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs) by searching only for patterns that differentiate two sets (positive and negative) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.

  • SCJ: A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems

    Publication Year: 2011 , Page(s): 1318 - 1329
    Cited by:  Papers (1)

    The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, ours is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable first approximation to distances based on more realistic rearrangement models.

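The SCJ distance itself is simple to compute: represent each genome by its set of gene-extremity adjacencies and count the adjacencies present in exactly one genome (each such adjacency needs one cut or one join). A minimal sketch, with an ad hoc text encoding for signed genes assumed for illustration:

```python
def extremities(gene):
    """Left/right extremities of a signed gene in reading orientation:
    +g reads tail ('t') -> head ('h'), -g reads head -> tail."""
    if gene.startswith('-'):
        name = gene[1:]
        return (name, 'h'), (name, 't')
    return (gene, 't'), (gene, 'h')

def adjacencies(genome):
    """Adjacency set of a genome given as a list of chromosomes, each a
    tuple ('L', genes...) for linear or ('C', genes...) for circular."""
    adj = set()
    for chrom in genome:
        kind, genes = chrom[0], list(chrom[1:])
        exts = [extremities(g) for g in genes]
        pairs = list(zip(exts, exts[1:]))
        if kind == 'C':                      # circular: close the chromosome
            pairs.append((exts[-1], exts[0]))
        for (_, right), (left, _) in pairs:
            adj.add(frozenset([right, left]))
    return adj

def scj_distance(g1, g2):
    """SCJ distance: adjacencies present in exactly one of the genomes."""
    return len(adjacencies(g1) ^ adjacencies(g2))
```

For example, inverting one gene breaks two adjacencies and creates two new ones, giving an SCJ distance of 4.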
  • SEGA: Semiglobal Graph Alignment for Structure-Based Protein Comparison

    Publication Year: 2011 , Page(s): 1330 - 1343
    Cited by:  Papers (1)

    Comparative analysis is a topic of utmost importance in structural bioinformatics. Recently, a structural counterpart to sequence alignment, called multiple graph alignment, was introduced as a tool for the comparison of protein structures in general and protein binding sites in particular. Using approximate graph matching techniques, this method enables the identification of approximately conserved patterns in functionally related structures. In this paper, we introduce a new method for computing graph alignments motivated by two problems of the original approach, a conceptual and a computational one. First, the existing approach is of limited usefulness for structures that only share common substructures. Second, the goal to find a globally optimal alignment leads to an optimization problem that is computationally intractable. To overcome these disadvantages, we propose a semiglobal approach to graph alignment in analogy to semiglobal sequence alignment that combines the advantages of local and global graph matching.

  • SLIDER: A Generic Metaheuristic for the Discovery of Correlated Motifs in Protein-Protein Interaction Networks

    Publication Year: 2011 , Page(s): 1344 - 1357
    Multimedia

    Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach in which the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.

  • Some Mathematical Refinements Concerning Error Minimization in the Genetic Code

    Publication Year: 2011 , Page(s): 1358 - 1372

    The genetic code is known to have a high level of error robustness: it has been shown to be very error robust compared to randomly selected codes, but significantly less error robust than a certain code found by a heuristic algorithm. We formulate this optimization problem as a Quadratic Assignment Problem and use this formulation to formally verify that the code found by the heuristic algorithm is the global optimum. We also argue that it is strongly misleading to compare the genetic code only with codes sampled from the fixed block model, because the real code space is orders of magnitude larger. We thus enlarge the space from which random codes can be sampled from approximately 2.433 × 10¹⁸ codes to approximately 5.908 × 10⁴⁵ codes. We do this by leaving the fixed block model and using the wobble rules to formulate the characteristics acceptable for a genetic code. By relaxing more constraints, three larger spaces are also constructed. Using a modified error function, the genetic code is found to be more error robust compared to a background of randomly generated codes with increasing space size. We point out that these results do not necessarily imply that the code was optimized during evolution for error minimization; other mechanisms could be the reason for this error robustness.

  • Using Kernel Alignment to Select Features of Molecular Descriptors in a QSAR Study

    Publication Year: 2011 , Page(s): 1373 - 1384

    Quantitative structure-activity relationships (QSARs) correlate biological activities of chemical compounds with their physicochemical descriptors. By modeling the observed relationship between molecular descriptors and their corresponding biological activities, we may predict the behavior of other molecules with similar descriptors. In QSAR studies, it has been shown that the quality of the prediction model strongly depends on the selected features within the molecular descriptors. Thus, methods capable of automatic selection of relevant features are very desirable. In this paper, we present a new feature selection algorithm for QSAR studies based on kernel alignment, which has been used as a measure of similarity between two kernel functions. In our algorithm, we deploy kernel alignment as an evaluation tool, using recursive feature elimination to compute a molecular descriptor containing the most important features needed for a classification application. Empirical results show that the algorithm works well for the computation of descriptors for various applications involving different QSAR data sets. The prediction accuracies are substantially increased and are comparable to those from earlier studies.

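Kernel alignment has a compact definition: the Frobenius inner product of two kernel matrices, normalized by their Frobenius norms. The sketch below pairs it with a greedy backward elimination over a linear kernel; the greedy loop is an illustrative stand-in for the paper's recursive feature elimination scheme, not its exact procedure.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Empirical alignment A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def rfe_by_alignment(X, y, keep):
    """Greedy backward elimination: repeatedly drop the feature whose
    removal best preserves alignment with the ideal kernel y y^T."""
    ideal = np.outer(y, y)
    feats = list(range(X.shape[1]))
    while len(feats) > keep:
        scores = []
        for f in feats:
            rest = [g for g in feats if g != f]
            K = X[:, rest] @ X[:, rest].T      # linear kernel on remaining features
            scores.append((kernel_alignment(K, ideal), f))
        _, worst = max(scores)                 # dropping `worst` keeps alignment highest
        feats.remove(worst)
    return feats
```

Features whose removal leaves the kernel well aligned with the label structure are uninformative and are discarded first.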
  • A Mathematical Model for the Validation of Gene Selection Methods

    Publication Year: 2011 , Page(s): 1385 - 1392

    Gene selection methods aim at determining biologically relevant subsets of genes in DNA microarray experiments. However, their assessment and validation represent a major difficulty, since the subset of biologically relevant genes is usually unknown. To solve this problem, a novel procedure for generating biologically plausible synthetic gene expression data is proposed. It is based on a mathematical model representing gene expression signatures and expression profiles through Boolean threshold functions. The results show that the proposed procedure can be successfully adopted to analyze the quality of statistical and machine learning-based gene selection algorithms.

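A generator in the spirit of this validation idea can be sketched as follows: labels come from a Boolean threshold function over a known relevant gene subset, so a gene selection method can afterwards be scored against ground truth. The Gaussian expression values, weights, and threshold below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def synthetic_expression(n_samples, n_genes, relevant, weights, theta, rng=None):
    """Synthetic expression matrix X plus class labels y, where y is a
    Boolean threshold function of the columns listed in `relevant`:
    y = 1 iff sum_i weights[i] * X[:, relevant[i]] >= theta."""
    rng = rng or np.random.default_rng(0)
    X = rng.normal(size=(n_samples, n_genes))          # expression profiles
    y = (X[:, relevant] @ weights >= theta).astype(int)  # ground-truth labels
    return X, y
```

A selection algorithm run on (X, y) can then be judged by how many of the `relevant` columns it recovers.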
  • A SAT-Based Algorithm for Finding Attractors in Synchronous Boolean Networks

    Publication Year: 2011 , Page(s): 1393 - 1399
    Cited by:  Papers (4)

    This paper addresses the problem of finding attractors in synchronous Boolean networks. The existing Boolean decision diagram-based algorithms have limited capacity due to the excessive memory requirements of decision diagrams. The simulation-based algorithms can be applied to larger networks, however, they are incomplete. We present an algorithm, which uses a SAT-based bounded model checking to find all attractors in a Boolean network. The efficiency of the presented algorithm is evaluated by analyzing seven networks models of real biological processes, as well as 150,000 randomly generated Boolean networks of sizes between 100 and 7,000. The results show that our approach has a potential to handle an order of magnitude larger models than currently possible. View full abstract»

  • Fast Exact Algorithms for the Closest String and Substring Problems with Application to the Planted (L,d)-Motif Model

    Publication Year: 2011 , Page(s): 1400 - 1410
    Cited by:  Papers (2)

    We present two parameterized algorithms for the closest string problem. The first runs in O(nL + nd · 17.97^d) time for DNA strings and in O(nL + nd · 61.86^d) time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in O(nL + nd · 13.92^d) time for DNA strings and in O(nL + nd · 47.21^d) time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in O((n - 1)m^2 (L + d · 17.97^d · m^⌈log2(d+1)⌉)) time for DNA strings and in O((n - 1)m^2 (L + d · 61.86^d · m^⌈log2(d+1)⌉)) time for protein strings, where n is the number of input strings, L is the length of the center substring, L - 1 + m is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All the algorithms significantly improve the previous bests. To verify the theoretical improvements in time complexity experimentally, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases as well as for several challenging cases, while also using less memory. View full abstract»
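    The classic starting point for such parameterized bounds is the branching algorithm of Gramm, Niedermeier, and Rossmanith: start with the first input string as the candidate center; while some string is more than d mismatches away, branch over d + 1 of the mismatch positions, moving the center toward that string and spending one unit of the budget. The sketch below implements that classic branching (not the improved 13.92^d-style algorithms of this paper, which refine the branching rules):

    ```python
    def hamming(a, b):
        """Number of mismatched positions between two equal-length strings."""
        return sum(x != y for x, y in zip(a, b))

    def closest_string(strings, d):
        """Decide the closest string problem: return a center string within
        Hamming distance d of every input string, or None if none exists.
        Classic bounded-search-tree algorithm; runs in O(nL + nd*(d+1)^d)."""
        L = len(strings[0])

        def search(center, budget):
            far = next((s for s in strings if hamming(center, s) > d), None)
            if far is None:
                return center                       # all strings are close enough
            if budget == 0 or hamming(center, far) > d + budget:
                return None                         # cannot repair within budget
            mism = [i for i in range(L) if center[i] != far[i]]
            for i in mism[: d + 1]:                 # d+1 positions suffice to branch on
                cand = center[:i] + far[i] + center[i + 1:]
                res = search(cand, budget - 1)
                if res is not None:
                    return res
            return None

        return search(strings[0], d)
    ```

    For instance, `closest_string(["ACGT", "ACGA", "ACCT"], 1)` finds a valid center, while `closest_string(["AAAA", "TTTT"], 1)` correctly reports that none exists.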

  • On the Distribution of the Number of Cycles in the Breakpoint Graph of a Random Signed Permutation

    Publication Year: 2011 , Page(s): 1411 - 1416

    We use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a uniformly random signed permutation. This in turn yields a very good approximation of the distribution of the reversal distance between two random genomes. View full abstract»
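    The statistic in question can be computed directly. Assuming the standard Hannenhalli–Pevzner construction, a signed element +x is encoded as the vertex pair (2x-1, 2x) and -x as (2x, 2x-1), framed by 0 and 2n+1; every vertex then has one black edge (adjacency in the permutation) and one gray edge (adjacency in the identity, v ↔ v XOR 1), so the cycles are just the connected components:

    ```python
    def breakpoint_cycles(perm):
        """Count cycles in the breakpoint graph of a signed permutation
        (a list of nonzero ints, e.g. [2, -1, 3]) relative to the identity."""
        n = len(perm)
        seq = [0]
        for x in perm:
            seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
        seq.append(2 * n + 1)
        black = {}
        for i in range(0, len(seq), 2):     # black edges pair seq[2i], seq[2i+1]
            a, b = seq[i], seq[i + 1]
            black[a], black[b] = b, a
        cycles, visited = 0, set()
        for v in seq:
            if v in visited:
                continue
            cycles += 1
            w = v
            while w not in visited:
                visited.add(w)
                nb = black[w]               # follow the black edge...
                visited.add(nb)
                w = nb ^ 1                  # ...then the gray edge
        return cycles
    ```

    The identity permutation on n elements attains the maximum of n + 1 cycles, and n + 1 minus the cycle count is the usual lower bound on reversal distance; sampling random signed permutations and tallying this count gives an empirical version of the distribution the paper derives exactly.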

  • Probabilistic Mixture Regression Models for Alignment of LC-MS Data

    Publication Year: 2011 , Page(s): 1417 - 1424

    A novel framework of a probabilistic mixture regression model (PMRM) is presented for alignment of liquid chromatography-mass spectrometry (LC-MS) data with respect to retention time (RT) points. The expectation-maximization algorithm is used to estimate the joint parameters of spline-based mixture regression models and prior transformation density models; the latter account for the variability in RT points and peak intensities. The applicability of PMRM for alignment of LC-MS data is demonstrated on three data sets. The performance of PMRM is compared with other alignment approaches, including dynamic time warping, correlation optimized warping, and the continuous profile model, in terms of the coefficient of variation of replicate LC-MS runs and accuracy in detecting differentially abundant peptides/proteins. View full abstract»
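    PMRM itself requires EM over spline mixtures, but the simplest of the baseline methods it is compared against, dynamic time warping, is easy to sketch: it finds the minimum-cost monotone alignment between two signals (e.g., two chromatograms on different retention-time grids) by dynamic programming:

    ```python
    def dtw_distance(a, b):
        """Classic dynamic time warping between two 1-D signals.
        Returns the minimal cumulative |a_i - b_j| cost over all
        monotone alignments of the two sequences."""
        INF = float("inf")
        n, m = len(a), len(b)
        D = [[INF] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                     D[i][j - 1],      # stretch b
                                     D[i - 1][j - 1])  # match step
        return D[n][m]
    ```

    For example, `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is 0, since warping absorbs the repeated samples; probabilistic models such as PMRM aim to do better than such purely geometric warping when replicate runs vary in both RT and intensity.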


Aims & Scope

This bimonthly journal publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu