• A Multi-Layered Screening Method to Identify Plant Regulatory Genes

We used a seven-step process to identify genes involved in glucosinolate biosynthesis and metabolism in the Chinese cabbage (Brassica rapa). We constructed an annotated data set with 34,570 unigenes from B. rapa and predicted 11,526 glucosinolate-related candidate genes using expression profiles generated across nine stages of development on a 47k-gene microarray. Using our multi-layered screening...

• An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace

Most phylogenetic analyses result in a sample of trees, but summarizing and visualizing these samples can be challenging. Consensus trees often provide limited information about a sample, and so methods such as consensus networks, clustering and multidimensional scaling have been developed and applied to tree samples. This paper describes a stochastic algorithm for constructing a principal geodesi...

• Distance-Based Phylogenetic Methods Around a Polytomy

Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-j...

• An Odd Parity Checker Prototype Using DNAzyme Finite State Machine

A finite-state machine (FSM) is an abstract mathematical model of computation used to design both computer programs and sequential logic circuits. As an abstract model of computation, the FSM is weak: it has less computational power than other models of computation such as the Turing machine. This paper discusses finite-state automata based on deoxyribonucleic acid (DNA) and differ...
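An odd parity checker is one of the smallest useful FSMs: two states, "even" and "odd", with every `1` toggling the state and every `0` leaving it unchanged. The sketch below shows only that abstract automaton in software; it is not the paper's DNAzyme implementation, and the state names and the `has_odd_parity` helper are illustrative assumptions.

```python
# Transition table of a two-state parity DFA (illustrative encoding only,
# not the DNAzyme design described in the paper).
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def has_odd_parity(bits: str) -> bool:
    """Return True if `bits` contains an odd number of '1' symbols."""
    state = "even"  # start state: no 1s seen yet
    for symbol in bits:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"unexpected input symbol {symbol!r}")
        state = TRANSITIONS[(state, symbol)]
    return state == "odd"  # "odd" is the only accepting state
```

For example, `"1011"` (three 1s) is accepted and `"1001"` (two 1s) is rejected; the machine needs no memory beyond its current state, which is exactly why an FSM suffices for parity.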

• Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem

In this paper, efficient exact algorithms are proposed for the planted (l, d)-motif search problem. This problem asks for all motifs of length l that are planted in each input string with at most d mismatches. The paper also treats the “quorum” version of the problem, which seeks motifs planted not in all input strings but in at least q of them. The proposed algorithms ar...
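The problem definition above can be made concrete with a naive exhaustive search: enumerate every length-l string over the alphabet and keep those that occur, with at most d mismatches, in at least q input strings (q equal to the number of strings gives the non-quorum version). This is a minimal baseline sketch assuming a DNA alphabet; it is not the paper's improved algorithm, and the function names are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(motif: str, s: str, d: int) -> bool:
    """True if some length-len(motif) window of s is within Hamming distance d."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def planted_motifs(strings, l, d, q=None, alphabet="ACGT"):
    """Brute-force (l, d)-motif search; q=None means 'all strings' (no quorum)."""
    if q is None:
        q = len(strings)
    result = []
    # Enumerate all |alphabet|^l candidate motifs in lexicographic order.
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        hits = sum(occurs_with_mismatches(motif, s, d) for s in strings)
        if hits >= q:
            result.append(motif)
    return result
```

The cost grows as |alphabet|^l times the input size, which is why exact enumerative algorithms for this problem focus on pruning the candidate space.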

• Hierarchical Probabilistic Interaction Modeling for Multiple Gene Expression Replicates

Microarray technology allows for the collection of multiple replicates of gene expression time course data for hundreds of genes at a handful of time points. Developing hypotheses about a gene transcriptional network based on time-course gene expression data is an important and very challenging problem. In many situations there are similarities that suggest a hierarchical structure between the r...

• Hybrid Ant Bee Algorithm for Fuzzy Expert System Based Sample Classification

Accuracy maximization and complexity minimization are the two main goals of fuzzy expert system based microarray data classification. Our previous Genetic Swarm Algorithm (GSA) approach improved the classification accuracy of the fuzzy expert system at the cost of its interpretability. The if-then rules produced by the GSA are lengthy and complex, which makes them difficult for the physician to und...

• Indexing Graphs for Path Queries with Applications in Genome Research

We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications of such extensions. We extend the Burrows-Wheeler transform (BWT) of strings to acyclic directed labeled graphs, to support path queries as an extension to substring searching. We develop, apply, and tailor this technique to a) read alignment on an ex...
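For orientation, here is the string case that the graph extension generalizes: the BWT of a text and substring counting by backward search. This minimal sketch uses a naive rotation sort and a linear `occ` scan (real indexes use suffix arrays and rank structures instead), assumes `$` is a sentinel absent from the text and lexicographically smallest, and does not attempt the paper's graph construction; the function names are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorting all rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column of the sorted matrix


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text using backward search."""
    # C[c]: number of characters in the text that are strictly smaller than c,
    # i.e. the first row of the sorted matrix that starts with c.
    C = {}
    for i, c in enumerate(sorted(bwt_str)):
        C.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i] (naive O(n) rank query).
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of sorted rotations [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` yields `"annb$aa"`, and backward search over that string finds two occurrences of `"ana"`. Each step narrows a contiguous range of sorted rotations, which is the property a graph BWT must preserve to answer path queries.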

• Merging Partially Labelled Trees: Hardness and a Declarative Programming Solution

Intraspecific studies often make use of haplotype networks instead of gene genealogies to represent the evolution of a set of genes. Cassens et al. proposed one such network reconstruction method, based on the global maximum parsimony principle, which was later recast by the first author of the present work as the problem of finding a minimum common supergraph of a set of t partially labelled tree...

• Predicting Essential Proteins Based on Weighted Degree Centrality

Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted by the insufficiency of PPI data, but fortunately the gene expression pr...
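The general idea of weighted degree centrality is to score each protein by the sum of the weights of its PPI edges instead of simply counting them. The abstract is truncated before the weighting is defined, so the sketch below uses an assumed weight, the Pearson correlation of the two proteins' expression profiles clipped at zero; this is an illustration, not the paper's formula, and all names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def weighted_degree(edges, expression):
    """Sum of incident edge weights per protein.

    ASSUMPTION: edge weight = max(Pearson correlation of expression, 0).
    """
    score = {protein: 0.0 for protein in expression}
    for u, v in edges:
        w = max(pearson(expression[u], expression[v]), 0.0)
        score[u] += w
        score[v] += w
    return score


def rank_candidates(edges, expression):
    """Proteins ordered from highest to lowest weighted degree."""
    scores = weighted_degree(edges, expression)
    return sorted(scores, key=scores.get, reverse=True)
```

With this weighting, an interaction between two proteins whose expression profiles are anti-correlated contributes nothing, so the expression data can down-weight PPI edges that are unlikely to be active together.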

• Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search

Large-scale comparison or similarity search of genomic DNA and protein sequences is of fundamental importance in modern molecular biology. To perform DNA and protein sequence similarity search efficiently, seeding (or filtration) methods have been widely used, in which only sequences sharing a common pattern or “seed” are subject to detailed comparison. Therefore, these methods trade search ...
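The seeding idea can be illustrated with a spaced seed, a binary mask in which `1` marks positions that must match and `0` marks don't-care positions. The sketch below indexes the spaced keys of a target sequence and reports exact seed hits from a query; it shows generic spaced-seed filtration only, not the paper's optimized spaced k-mer neighbors, and the mask and function names are hypothetical.

```python
from collections import defaultdict


def spaced_keys(seq, mask):
    """Yield (start, key) for each window of len(mask); key keeps only '1' positions."""
    care = [i for i, ch in enumerate(mask) if ch == "1"]
    for start in range(len(seq) - len(mask) + 1):
        yield start, "".join(seq[start + off] for off in care)


def seed_hits(query, target, mask):
    """Return (query_pos, target_pos) pairs whose spaced keys match exactly."""
    index = defaultdict(list)  # spaced key -> list of start positions in target
    for pos, key in spaced_keys(target, mask):
        index[key].append(pos)
    return [
        (qpos, tpos)
        for qpos, key in spaced_keys(query, mask)
        for tpos in index.get(key, [])  # .get avoids inserting empty lists
    ]
```

For example, `seed_hits("MKTAY", "MKSAY", "11011")` returns `[(0, 0)]` because the single substitution falls on the don't-care position, whereas the contiguous mask `"11111"` finds no hit. Only the hit pairs would then be passed on to detailed alignment.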

• Solving the Secondary Structure Matching Problem in Cryo-EM De Novo Modeling Using a Constrained K-Shortest Path Graph Algorithm

Electron cryomicroscopy is becoming a major experimental technique for solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. In this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question...

• Systematic Approach to Computational Design of Gene Regulatory Networks with Information Processing Capabilities

We present several measures that can be used in de novo computational design of biological systems with information processing capabilities. Their main purpose is to objectively evaluate the behavior and identify the biological information processing structures with the best dynamical properties. They can be used to define constraints that allow one to simplify the design of more complex biologica...

• Thermodynamic Post-Processing versus GC-Content Pre-Processing for DNA Codes Satisfying the Hamming Distance and Reverse-Complement Constraints

Stochastic, meta-heuristic and linear construction algorithms for the design of DNA strands satisfying Hamming distance and reverse-complement constraints often use a GC-content constraint to pre-process the DNA strands. Since GC-content is a poor predictor of DNA strand hybridization strength, the strands can be filtered by post-processing using thermodynamic calculations. An alternative approach ...
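The combinatorial constraints named above can be checked directly. The sketch below assumes the usual formulation: every pair of distinct strands differs in at least d positions, every strand differs in at least d positions from the reverse complement of every strand (itself included), and, optionally, every strand contains exactly `gc` G/C bases. It is a plain validity checker with hypothetical names, not one of the construction algorithms or the thermodynamic filter discussed in the paper.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Watson-Crick reverse complement of a DNA strand."""
    return s.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_content(s: str) -> int:
    """Number of G or C bases in the strand."""
    return sum(ch in "GC" for ch in s)


def satisfies_constraints(code, d, gc=None):
    """Check Hamming distance, reverse-complement and optional GC-content constraints."""
    # Hamming distance constraint: distinct strands differ in >= d positions.
    for x, y in combinations(code, 2):
        if hamming(x, y) < d:
            return False
    # Reverse-complement constraint over all pairs, including x == y.
    for x in code:
        for y in code:
            if hamming(x, reverse_complement(y)) < d:
                return False
    # GC-content constraint (the pre-processing step), if requested.
    if gc is not None and any(gc_content(x) != gc for x in code):
        return False
    return True
```

For instance, `["ACAG", "AGTC"]` passes with `d=2` and `gc=2`, while `["ACGT"]` fails even for `d=1` because it is its own reverse complement.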

