
IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue 6 • Date Nov.-Dec. 2014


Displaying Results 1 - 25 of 35
  • Table of Contents

    Page(s): C1
    PDF (949 KB)
    Freely Available from IEEE
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics Editorial Board

    Page(s): C2
    PDF (320 KB)
    Freely Available from IEEE
  • Selected Articles from the 2012 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS 2012)

    Page(s): 981 - 983
    PDF (110 KB) | HTML
    Freely Available from IEEE
  • Latent Feature Decompositions for Integrative Analysis of Multi-Platform Genomic Data

    Page(s): 984 - 994
    PDF (872 KB) | HTML

    Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently, we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization, as well as sparse counterparts of each, are used to define the latent features, and the performance of these decompositions is compared on both real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme data set from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features.
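The decomposition-plus-interactions idea can be sketched in a few lines. This is an illustrative toy on random matrices, not the authors' implementation: per-platform principal component scores serve as latent features, and their pairwise products model between-platform interactions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matched samples on two mock "platforms" (e.g. expression, methylation):
expr = rng.normal(size=(50, 200))
meth = rng.normal(size=(50, 120))

def latent_features(X, k):
    """First k principal-component scores of the centered matrix X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

L_expr = latent_features(expr, 3)
L_meth = latent_features(meth, 3)

# Between-platform interactions as pairwise products of latent features.
inter = np.einsum('ij,ik->ijk', L_expr, L_meth).reshape(50, -1)

design = np.hstack([L_expr, L_meth, inter])
print(design.shape)   # 3 + 3 latent features plus 9 interaction terms
```

A predictive model (the paper uses Bayesian model averaging; any regression would do for illustration) is then fit on `design` rather than on the thousands of original features.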

  • An Integrated Approach to Anti-Cancer Drug Sensitivity Prediction

    Page(s): 995 - 1008
    Multimedia
    PDF (1399 KB) | HTML

    A framework for the design of personalized cancer therapy requires the ability to predict the sensitivity of a tumor to anti-cancer drugs. The predictive modeling of tumor sensitivity to anti-cancer drugs has primarily focused on generating functions that map gene expressions and genetic mutation profiles to drug sensitivity. In this paper, we present a new approach for drug sensitivity prediction and combination therapy design based on integrated functional and genomic characterizations. The modeling approach, when applied to data from the Cancer Cell Line Encyclopedia, shows a significant gain in prediction accuracy as compared to elastic net and random forest techniques based on genomic characterizations. Utilizing a Mouse Embryonal Rhabdomyosarcoma cell culture and a drug screen of 60 targeted drugs, we show that predictive modeling based on functional data alone can also produce high accuracy predictions. The framework also allows us to generate personalized tumor proliferation circuits to gain further insights into individualized biological pathways.
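The two baselines the paper compares against — elastic net and random forest on genomic characterizations — are easy to reproduce in outline. A toy sketch on synthetic data (all features and responses are made up, not CCLE data), assuming scikit-learn is available:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))                           # mock genomic features
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=120)    # mock drug response

models = {'elastic net': ElasticNet(alpha=0.1),
          'random forest': RandomForestRegressor(random_state=0)}
scores = {name: cross_val_score(m, X, y, cv=5, scoring='r2').mean()
          for name, m in models.items()}
print(scores)
```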

  • Integration of Network Biology and Imaging to Study Cancer Phenotypes and Responses

    Page(s): 1009 - 1019
    PDF (606 KB) | HTML

    Ever-growing “omics” data and continuously accumulated biological knowledge provide an unprecedented opportunity to identify molecular biomarkers and their interactions that are responsible for cancer phenotypes that can be accurately defined by clinical measurements such as in vivo imaging. Since signaling or regulatory networks are dynamic and context-specific, systematic efforts to characterize such structural alterations must effectively distinguish significant network rewiring from random background fluctuations. Here we introduce a novel integration of network biology and imaging to study cancer phenotypes and responses to treatments at the molecular systems level. Specifically, Differential Dependence Network (DDN) analysis was used to detect statistically significant topological rewiring in molecular networks between two phenotypic conditions, and in vivo Magnetic Resonance Imaging (MRI) was used to more accurately define phenotypic sample groups for such differential analysis. We applied DDN to analyze two distinct phenotypic groups of breast cancer and to study how genomic instability affects the molecular network topologies in high-grade ovarian cancer. Further, FDA-approved arsenic trioxide (ATO) and the ND2-SmoA1 mouse model of Medulloblastoma (MB) were used to extend our analyses of combined MRI and Reverse Phase Protein Microarray (RPMA) data to assess tumor responses to ATO and to uncover the complexity of therapeutic molecular biology.

  • Logistic Principal Component Analysis for Rare Variants in Gene-Environment Interaction Analysis

    Page(s): 1020 - 1028
    PDF (732 KB) | HTML

    The characteristics of low minor allele frequency (MAF) and weak individual effects make genome-wide association studies (GWAS) for rare variant single nucleotide polymorphisms (SNPs) more difficult when using conventional statistical methods. By aggregating the rare variant effects belonging to the same gene, collapsing is the most common way to enhance the detection of rare variant effects for association analyses with a given trait. In this paper, we propose a novel framework of MAF-based logistic principal component analysis (MLPCA) to derive aggregated statistics by explicitly modeling the correlation between rare variant SNP data, which is categorical. The aggregated statistics derived by MLPCA can then be tested as a surrogate variable in regression models to detect the gene-environment interaction from rare variants. In addition, MLPCA searches for the optimal linear combination from the best subset of rare variants according to MAF that has the maximum association with the given trait. We compared the power of our MLPCA-based methods with four existing collapsing methods in gene-environment interaction association analysis using both our simulation data set and Genetic Analysis Workshop 17 (GAW17) data. Our experimental results demonstrate that MLPCA on two forms of genotype data representations achieves higher statistical power than those existing methods and can be further improved by introducing an appropriate sparsity penalty. The performance improvement by our MLPCA-based methods results from deriving aggregated statistics by explicitly modeling categorical SNP data and searching for the maximum associated subset of SNPs for collapsing, which helps better capture the combined effect of individual rare variants and the interaction with environmental factors.

  • Network-Based Methods to Identify Highly Discriminating Subsets of Biomarkers

    Page(s): 1029 - 1037
    PDF (631 KB) | HTML

    Complex diseases such as various types of cancer and diabetes are conjectured to be triggered and influenced by a combination of genetic and environmental factors. To integrate potential effects from interplay among underlying candidate factors, we propose a new network-based framework to identify effective biomarkers by searching for groups of synergistic risk factors with high predictive power for disease outcome. An interaction network is constructed with node weights representing the individual predictive power of candidate factors and edge weights capturing pairwise synergistic interactions among factors. We then formulate this network-based biomarker identification problem as a novel graph optimization model to search for multiple cliques with maximum overall weight, which we denote as the Maximum Weighted Multiple Clique Problem (MWMCP). To achieve optimal or near-optimal solutions, both an analytical algorithm based on the column generation method and a fast heuristic for large-scale networks have been derived. Our algorithms for MWMCP have been implemented to analyze two biomedical data sets: a Type 1 Diabetes (T1D) data set from the Diabetes Prevention Trial-Type 1 (DPT-1) study, and a breast cancer genomics data set for metastasis prognosis. The results demonstrate that our network-based methods can identify important biomarkers with better prediction accuracy compared to conventional feature selection that considers only individual effects.
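The MWMCP objective is easy to state in code. A toy sketch with made-up node and edge weights, using exhaustive enumeration in place of the authors' column-generation algorithm — feasible only because the example network has four nodes:

```python
import itertools

# Toy biomarker network: node weights = individual predictive power,
# edge weights = pairwise synergy (all numbers made up).
node_w = {'A': 0.5, 'B': 0.4, 'C': 0.3, 'D': 0.2}
edge_w = {('A', 'B'): 0.6, ('B', 'C'): 0.1, ('A', 'C'): 0.05}

def edge(u, v):
    return edge_w.get((u, v), edge_w.get((v, u)))   # None if no edge

def clique_weight(nodes):
    """Total node + edge weight, or None if nodes do not form a clique."""
    total = sum(node_w[u] for u in nodes)
    for u, v in itertools.combinations(nodes, 2):
        if edge(u, v) is None:
            return None
        total += edge(u, v)
    return total

# Enumerate all cliques and pick the heaviest.
cliques = [s for r in range(1, len(node_w) + 1)
           for s in itertools.combinations(node_w, r)
           if clique_weight(s) is not None]
best = max(cliques, key=clique_weight)
print(best, clique_weight(best))
```

Here {A, B, C} wins: its strong pairwise synergies outweigh the individually weaker node C, which is exactly the effect the paper's framework is after.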

  • BM-SNP: A Bayesian Model for SNP Calling Using High Throughput Sequencing Data

    Page(s): 1038 - 1044
    PDF (670 KB) | HTML

    A single-nucleotide polymorphism (SNP) is a single base change in the DNA sequence and is the most common polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research, as SNPs are believed to play important roles in the manifestation of phenotypic events, such as disease susceptibility. To take full advantage of next-generation sequencing (NGS) technology, we propose a Bayesian approach, BM-SNP, to identify SNPs based on posterior inference using NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. A position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to two cell-line NGS data sets, and the results show a high ratio of overlap (>95 percent) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, with higher quality. The SNPs that are called only by BM-SNP but are not in dbSNP may serve as new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data, and therefore achieves high detection power. BM-SNP is fast, capable of processing whole-genome data at 20-fold average coverage in a short amount of time.
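The core posterior computation can be illustrated with a minimal genotype model at one position. This is a generic three-genotype Bayesian caller sketch, not BM-SNP itself; the read counts, error rate, and flat prior are made up:

```python
from math import comb

# One genomic position with 30 mapped reads, 12 supporting the alternate
# base (toy numbers). Each genotype implies a probability that a read
# shows the alternate base, given per-read error rate eps.
depth, alt, eps = 30, 12, 0.01
alt_prob = {'hom_ref': eps, 'het': 0.5, 'hom_alt': 1 - eps}

# Binomial likelihood of the observed counts under each genotype.
lik = {g: comb(depth, alt) * p ** alt * (1 - p) ** (depth - alt)
       for g, p in alt_prob.items()}
z = sum(lik.values())                    # flat prior cancels out
post = {g: v / z for g, v in lik.items()}

p_variant = post['het'] + post['hom_alt']   # posterior prob. of variation
print(round(p_variant, 6))
```

With 12 of 30 reads supporting the alternate base, the heterozygous genotype dominates and the position would be flagged as a potential SNP.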

  • Unfold High-Dimensional Clouds for Exhaustive Gating of Flow Cytometry Data

    Page(s): 1045 - 1051
    PDF (2066 KB) | HTML

    Flow cytometry is able to measure the expressions of multiple proteins simultaneously at the single-cell level. A flow cytometry experiment on one biological sample provides measurements of several protein markers on or inside a large number of individual cells in that sample. Analysis of such data often aims to identify subpopulations of cells with distinct phenotypes. Currently, the most widely used analytical approach in the flow cytometry community is manual gating on a sequence of nested biaxial plots, which is highly subjective, labor intensive, and not exhaustive. To address those issues, a number of methods have been developed to automate the gating analysis by clustering algorithms. However, completely removing the subjectivity can be quite challenging. This paper describes an alternative approach. Instead of automating the analysis, we develop novel visualizations to facilitate manual gating. The proposed method views single-cell data of one biological sample as a high-dimensional point cloud of cells, derives the skeleton of the cloud, and unfolds the skeleton to generate 2D visualizations. We demonstrate the utility of the proposed visualization using real data, and provide a quantitative comparison to visualizations generated from principal component analysis and multidimensional scaling.

  • A Basic Protein Comparative Three-Dimensional Modeling Methodological Workflow: Theory and Practice

    Page(s): 1052 - 1065
    PDF (1445 KB) | HTML

    When working with proteins and studying their properties, it is crucial to have access to the three-dimensional structure of the molecule. If experimentally solved structures are not available, comparative modeling techniques can be used to generate useful protein models to support structure-based research projects. In recent years, with Bioinformatics becoming the basis for the study of protein structures, there is a growing need for the exposure of details about the algorithms behind the software and servers, as well as a need for protocols to guide in silico predictive experiments. In this article, we explore different steps of the comparative modeling technique, such as template identification, sequence alignment, generation of candidate structures and quality assessment, along with their peculiarities and theoretical description. We then present a practical step-by-step workflow to support the biologist in the in silico generation of protein structures. Finally, we explore further steps in comparative modeling, presenting perspectives for the study of protein structures through Bioinformatics. We trust that this is a thorough guide for beginners who wish to work on the comparative modeling of proteins.

  • A Parameter Estimation Method for Biological Systems Modelled by ODE/DDE Models Using Spline Approximation and Differential Evolution Algorithm

    Page(s): 1066 - 1076
    PDF (1103 KB) | HTML

    This paper treats the inverse problem of identifying, from experimental data, the unknown parameters of dynamical biological systems of known structure that are modelled by ordinary differential equations or delay differential equations. A two-stage approach is adopted: first, combining spline theory and Nonlinear Programming (NLP), the parameter estimation problem is formulated as an optimization problem with only algebraic constraints; then, a new differential evolution (DE) algorithm is proposed to find a feasible solution. The approach is designed to handle problems of realistic size with noisy observation data. Three cases are studied to evaluate the performance of the proposed algorithm: two are based on benchmark models with a priori determined structure and parameters; the third is a particular biological system with unknown model structure, for which only a set of observation data is available, so a nominal model is adopted for the identification. All the test systems were successfully identified using a reasonable amount of experimental data within an acceptable computation time. Experimental evaluation reveals that the proposed method is capable of fast estimation of the unknown parameters with good precision.
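The second stage — differential evolution over the resulting algebraic problem — can be sketched on a one-parameter example. Here a minimal rand/1/bin DE (not the paper's modified DE) recovers the decay rate of dx/dt = -k·x from synthetic samples of the analytic solution:

```python
import math
import random

random.seed(1)

# Synthetic observations of dx/dt = -k*x, x(0) = 1, with true k = 0.7.
true_k = 0.7
ts = [0.5 * i for i in range(10)]
data = [math.exp(-true_k * t) for t in ts]

def sse(k):
    # Sum-of-squares misfit between model prediction and the data.
    return sum((math.exp(-k * t) - y) ** 2 for t, y in zip(ts, data))

def de(obj, lo, hi, npop=20, gens=60, F=0.6, CR=0.9):
    """Minimal rand/1/bin differential evolution on one parameter."""
    pop = [random.uniform(lo, hi) for _ in range(npop)]
    for _ in range(gens):
        for i in range(npop):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = a + F * (b - c) if random.random() < CR else pop[i]
            trial = min(max(trial, lo), hi)      # keep inside the bounds
            if obj(trial) <= obj(pop[i]):        # greedy selection
                pop[i] = trial
    return min(pop, key=obj)

k_hat = de(sse, 0.0, 5.0)
print(round(k_hat, 3))
```

In the paper the objective would come from the spline/NLP reformulation of the full ODE/DDE system rather than a closed-form solution; the DE loop itself is unchanged in spirit.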

  • An Efficient and Very Accurate Method for Calculating Steady-State Sensitivities in Metabolic Reaction Systems

    Page(s): 1077 - 1086
    PDF (1487 KB) | HTML

    Stability and sensitivity analyses of biological systems require the ad hoc writing of computer code, which is highly dependent on the particular model and burdensome for large systems. We propose a very accurate strategy to overcome this challenge. Its core concept is the conversion of the model into the format of biochemical systems theory (BST), which greatly facilitates the computation of sensitivities. First, the steady state of interest is determined by integrating the model equations toward the steady state and then using a Newton-Raphson method to fine-tune the result. The second step of conversion into the BST format requires several instances of numerical differentiation. The accuracy of this task is ensured by the use of a complex-variable Taylor scheme for all differentiation steps. The proposed strategy is implemented in a new software program, COSMOS, which automates the stability and sensitivity analysis of essentially arbitrary ODE models in a quick, yet highly accurate manner. The methods underlying the process are theoretically analyzed and illustrated with four representative examples: a simple metabolic reaction model; a model of aspartate-derived amino acid biosynthesis; a TCA-cycle model; and a modified TCA-cycle model. COSMOS has been deposited at https://github.com/BioprocessdesignLab/COSMOS.
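The complex-variable Taylor scheme mentioned above is the classic complex-step derivative: for analytic f, f'(x) ≈ Im f(x + ih)/h, which has no subtractive cancellation, so h can be made tiny. A sketch on a Michaelis-Menten rate law (example parameter values, not taken from COSMOS):

```python
def f(x):
    # Example rate law: Michaelis-Menten flux v = Vmax*x / (Km + x).
    Vmax, Km = 2.0, 0.5
    return Vmax * x / (Km + x)

def complex_step(f, x, h=1e-20):
    # f'(x) ~ Im f(x + ih) / h: accurate to machine precision because
    # no difference of nearly equal numbers is ever formed.
    return f(x + 1j * h).imag / h

exact = 2.0 * 0.5 / (0.5 + 1.0) ** 2    # analytic Vmax*Km/(Km+x)^2 at x = 1
print(complex_step(f, 1.0), exact)
```

A finite difference (f(x+h) - f(x))/h with h = 1e-20 would return exactly 0 in double precision; the complex step does not, which is why it suits high-accuracy sensitivity pipelines.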

  • Classifying Protein Sequences Using Regularized Multi-Task Learning

    Page(s): 1087 - 1098
    PDF (733 KB) | HTML

    Classification problems in which several learning tasks are organized hierarchically pose a special challenge because the hierarchical structure of the problems needs to be considered. Multi-task learning (MTL) provides a framework for dealing with such interrelated learning tasks. When two different hierarchical sources organize similar information, in principle, this combined knowledge can be exploited to further improve classification performance. We have studied this problem in the context of protein structure classification by integrating the learning process for two hierarchical protein structure classification databases, SCOP and CATH. Our goal is to accurately predict whether a given protein belongs to a particular class in these hierarchies using only the amino acid sequences. We have utilized recent developments in multi-task learning to solve the interrelated classification problems. We have also evaluated how the various relationships between tasks affect the classification performance. Our evaluations show that learning schemes in which both classification databases are used outperform schemes that utilize only one of them.

  • Computing Elementary Flux Modes Involving a Set of Target Reactions

    Page(s): 1099 - 1107
    PDF (263 KB) | HTML

    Elementary flux mode (EM) computation is an important tool in the constraint-based analysis of genome-scale metabolic networks. Due to the combinatorial complexity of these networks, as well as the advances in the level of detail to which they can be reconstructed, an exhaustive enumeration of all EMs is often not practical. Therefore, in recent years interest has shifted towards searching for EMs with specific properties. We present a novel method that allows computing EMs containing a given set of target reactions. This generalizes previous algorithms where the set of target reactions consists of a single reaction. In the one-reaction case, our method compares favorably to the previous approaches. In addition, we present several applications of our algorithm for computing EMs containing two target reactions in genome-scale metabolic networks. A software tool implementing the algorithms described in this paper is available at https://sourceforge.net/projects/caefm.

  • Detection of Replication Origin Sites in Herpesvirus Genomes by Clustering and Scoring of Palindromes with Quadratic Entropy Measures

    Page(s): 1108 - 1118
    PDF (940 KB) | HTML

    Replication in herpesvirus genomes is a major public health concern, as the viruses multiply rapidly during the lytic phase of infection, causing maximum damage to the host cells. Earlier research has established that sites of replication origin are dominated by a high concentration of rare palindrome sequences of DNA. Computational methods based on scoring have been devised to determine the concentration of palindromes. In this paper, we propose both extraction and localization of rare palindromes in an automated manner. The Discrete Cosine Transform (DCT-II), a widely recognized image compression algorithm, is utilized here to extract palindromic sequences based on their reverse-complementary symmetry. We formulate a novel approach to localize the rare palindrome clusters by devising a Minimum Quadratic Entropy (MQE) measure based on Rényi's Quadratic Entropy (RQE) function. Experimental results over a large number of herpesvirus genomes show that the RQE-based scoring of rare palindromes has higher sensitivity and a lower false-alarm rate in detecting concentrations of rare palindromes, and thereby sites of replication origin.
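The objects being scored are DNA palindromes: subsequences equal to their own reverse complement. A minimal extraction sketch on a made-up sequence (the paper's DCT-based extraction and RQE scoring are not reproduced here):

```python
# Complement table for the four DNA bases.
COMP = str.maketrans('ACGT', 'TGCA')

def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMP)[::-1]

def palindromes(genome, length):
    """Start positions of all windows of the given (even) length that
    equal their own reverse complement."""
    return [i for i in range(len(genome) - length + 1)
            if (w := genome[i:i + length]) == revcomp(w)]

genome = "TTGAATTCAGGCGCGCCTA"   # toy sequence, not herpesvirus data
print(palindromes(genome, 6))    # finds GAATTC and GCGCGC
```

Clusters of such positions, weighted by how rare each palindrome is, are what the entropy-based scoring then localizes.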

  • Determining Semantically Related Significant Genes

    Page(s): 1119 - 1130
    PDF (479 KB) | HTML

    A GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of GO term y are usually functionally and semantically related to the genes annotated with the function of GO term x. A large number of gene set enrichment analysis methods have been developed in recent years. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses a microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.

    Open Access
  • GECC: Gene Expression Based Ensemble Classification of Colon Samples

    Page(s): 1131 - 1145
    PDF (930 KB) | HTML

    Gene expression deviates from its normal composition when a patient has cancer. This variation can be used as an effective tool to detect cancer. In this study, we propose a novel gene expression based colon classification scheme (GECC) that exploits the variations in gene expressions for classifying colon gene samples into normal and malignant classes. The novelty of GECC is two-fold. First, to cope with the overwhelmingly large size of gene-based data sets, various feature extraction strategies, such as chi-square, F-score, principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR), have been employed to select discriminative genes from a set of genes. Second, a majority-voting-based ensemble of support vector machines (SVMs) has been proposed to classify the given gene samples. Previously, individual SVM models have been used for colon classification; however, their performance is limited. In this research study, we propose an SVM-ensemble based approach for gene based classification of colon samples, wherein the individual SVM models are constructed through the learning of different SVM kernels: linear, polynomial, radial basis function (RBF), and sigmoid. The predicted results of the individual models are combined through majority voting. In this way, the combined decision space becomes more discriminative. The proposed technique has been tested on four colon and several other binary-class gene expression data sets, and improved performance has been achieved compared to previously reported gene based colon cancer detection techniques. The computational time required for the training and testing of the 208 × 5,851 data set was 591.01 and 0.019 s, respectively.
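The ensemble construction can be sketched with scikit-learn, assuming it is available: one SVM per kernel, combined by hard (majority) voting, on synthetic two-class data rather than the colon sets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for gene expression samples.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Four SVM kernels, combined by hard (majority) voting.
members = [(k, SVC(kernel=k)) for k in ('linear', 'poly', 'rbf', 'sigmoid')]
ensemble = VotingClassifier(members, voting='hard').fit(Xtr, ytr)
print(round(ensemble.score(Xte, yte), 2))
```

In the paper a feature-selection step (chi-square, F-score, PCA or mRMR) would precede the ensemble to cut the gene count down from thousands; that step is omitted in this sketch.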

  • Gene Selection Using Locality Sensitive Laplacian Score

    Page(s): 1146 - 1156
    PDF (497 KB) | HTML

    Gene selection based on microarray data is highly important for classifying tumors accurately. Existing gene selection schemes are mainly based on ranking statistics. From the manifold learning standpoint, local geometrical structure is more essential for characterizing features than global information. In this study, we propose a supervised gene selection method called locality sensitive Laplacian score (LSLS), which incorporates discriminative information into the local geometrical structure by minimizing local within-class information and maximizing local between-class information simultaneously. In addition, variance information is considered in our algorithm framework. Eventually, to find superior gene subsets, which is significant for biomarker discovery, a two-stage feature selection method that combines LSLS and a wrapper method (sequential forward selection or sequential backward selection) is presented. Experimental results on six publicly available gene expression profile data sets demonstrate the effectiveness of the proposed approach compared with a number of state-of-the-art gene selection methods.

  • Identification of Functionally Related Enzymes by Learning-to-Rank Methods

    Page(s): 1157 - 1169
    Multimedia
    PDF (714 KB) | HTML

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

  • Identifying Non-Redundant Gene Markers from Microarray Data: A Multiobjective Variable Length PSO-Based Approach

    Page(s): 1170 - 1183
    PDF (632 KB) | HTML

    Identifying relevant genes which are responsible for various types of cancer is an important problem. In this context, important genes refer to marker genes which change their expression level in correlation with the risk or progression of a disease, or with the susceptibility of the disease to a given treatment. Gene expression profiling by microarray technology has been successfully applied to classification and diagnostic prediction of cancers. However, extracting these marker genes from the huge set of genes contained in the microarray data set is a major problem. Most of the existing methods for identifying marker genes find a set of genes which may be redundant in nature. Motivated by this, a multiobjective optimization method has been proposed which can find a small set of non-redundant disease related genes providing high sensitivity and specificity simultaneously. In this article, the optimization problem has been modeled as a multiobjective one based on the framework of variable length particle swarm optimization. Using some real-life data sets, the performance of the proposed algorithm has been compared with that of other state-of-the-art techniques.

  • Maximizing Protein Translation Rate in the Ribosome Flow Model: The Homogeneous Case

    Page(s): 1184 - 1195

    Gene translation is the process in which intracellular macro-molecules, called ribosomes, decode genetic information in the mRNA chain into the corresponding proteins. Gene translation includes several steps. During the elongation step, ribosomes move along the mRNA in a sequential manner and link amino-acids together in the corresponding order to produce the proteins. The homogeneous ribosome flow model (HRFM) is a deterministic computational model for translation-elongation under the assumption of constant elongation rates along the mRNA chain. The HRFM is described by a set of $n$ first-order nonlinear ordinary differential equations, where $n$ represents the number of sites along the mRNA chain. The HRFM also includes two positive parameters: the ribosomal initiation rate and the (constant) elongation rate. In this paper, we show that the steady-state translation rate in the HRFM is a concave function of its parameters. This means that the problem of determining the parameter values that maximize the translation rate is relatively simple. Our results may contribute to a better understanding of the mechanisms and evolution of translation-elongation. We demonstrate this by using the theoretical results to estimate the initiation rate in M. musculus embryonic stem cells. The underlying assumption is that evolution optimized the translation mechanism. For the infinite-dimensional HRFM, we derive a closed-form solution to the problem of determining the initiation and transition rates that maximize the protein translation rate. We show that these expressions provide good approximations for the optimal values in the $n$-dimensional HRFM already for relatively small values of $n$. These results may have applications for synthetic biology, where an important problem is to re-engineer genomic systems in order to maximize the protein production rate. View full abstract»
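
A minimal sketch of the HRFM dynamics, assuming the standard ribosome-flow-model site equations with a single elongation rate: each site's occupancy grows with the flow in from its predecessor and shrinks with the flow out to its successor. The forward-Euler integrator, step size, and parameter values below are illustrative assumptions, not taken from the paper:

```python
def hrfm_rate(lam_init, lam_c, n=6, dt=0.05, steps=20000):
    """Integrate the HRFM ODEs to (near) steady state; return the translation rate.

    lam_init: ribosomal initiation rate; lam_c: constant elongation rate.
    """
    x = [0.0] * n  # x[i]: ribosome occupancy of site i, stays in [0, 1]
    for _ in range(steps):
        new = list(x)
        for i in range(n):
            # Flow into site i: initiation for site 0, elongation otherwise
            inflow = lam_init * (1 - x[0]) if i == 0 else lam_c * x[i - 1] * (1 - x[i])
            # Flow out of site i: elongation, or exit flow at the last site
            outflow = lam_c * x[i] * (1 - x[i + 1]) if i < n - 1 else lam_c * x[i]
            new[i] = x[i] + dt * (inflow - outflow)
        x = new
    return lam_c * x[-1]  # steady-state protein production rate
```

Sweeping `lam_init` with `lam_c` fixed reproduces the qualitative behavior discussed in the abstract: the translation rate increases with the initiation rate and saturates, consistent with concavity in the parameters.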

  • Molecular Modeling and Evaluation of Novel Dibenzopyrrole Derivatives as Telomerase Inhibitors and Potential Drug for Cancer Therapy

    Page(s): 1196 - 1207

    In recent years, many studies on the synthesis of pyrazole derivatives, as well as on their anti-tumor, anti-inflammatory, and anti-bacterial activities, have been described. Certain pyrazole derivatives exhibit important pharmacological activities and have proved to be useful templates in drug research. Considering the importance of the pyrazole template, in the current work a series of novel inhibitors was designed by replacing the central ring of acridine with a pyrazole ring. These heterocyclic compounds were proposed as a new potential base for telomerase inhibitors. The resulting dibenzopyrrole structure was used as a novel scaffold, and the inhibitors were extended with different functional groups. Docking of the newly designed compounds into the telomerase active site (the telomerase catalytic subunit TERT) was carried out. All dibenzopyrrole derivatives were evaluated with three docking programs: CDOCKER, LigandFit (scoring functions), and AutoDock. Compounds C_9g, C_9k, and C_9l performed best among all designed inhibitors across all docking methods and in the interaction analysis. The introduction of pyrazole and the extension of dibenzopyrrole confirm that such compounds may act as potential telomerase inhibitors. View full abstract»

  • Nonparametric Tikhonov Regularized NMF and Its Application in Cancer Clustering

    Page(s): 1208 - 1217

    The Tikhonov regularized nonnegative matrix factorization (TNMF) is an NMF objective function that enforces smoothness on the computed solutions, and has been successfully applied to many problem domains including text mining, spectral data analysis, and cancer clustering. There is, however, an issue that is still insufficiently addressed in the development of TNMF algorithms: how to develop mechanisms that learn the regularization parameters directly from the data sets. The common approach is to use fixed values based on a priori knowledge about the problem domain. However, from the study of linear inverse problems it is known that the quality of the solutions of Tikhonov regularized least squares problems depends heavily on the choice of appropriate regularization parameters. Since least squares problems are the building blocks of the NMF, it can be expected that a similar situation also applies to the NMF. In this paper, we propose two formulas to automatically learn the regularization parameters from the data set based on the L-curve approach. We also develop a convergent algorithm for the TNMF based on additive update rules. Finally, we demonstrate the use of the proposed algorithm in cancer clustering tasks. View full abstract»
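
For orientation, a minimal sketch of the Tikhonov-regularized NMF objective with standard multiplicative updates. Note the differences from the paper: it develops additive update rules and learns the regularization parameters from the data via the L-curve, whereas here α and β are fixed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((20, 15))       # hypothetical nonnegative data matrix (e.g. genes x samples)
k, alpha, beta = 4, 0.1, 0.1   # factorization rank and Tikhonov parameters (illustrative)
W = rng.random((20, k))
H = rng.random((k, 15))
eps = 1e-9                     # guard against division by zero

def objective(A, W, H):
    # ||A - WH||_F^2 + alpha*||W||_F^2 + beta*||H||_F^2
    return (np.linalg.norm(A - W @ H) ** 2
            + alpha * np.linalg.norm(W) ** 2
            + beta * np.linalg.norm(H) ** 2)

obj_start = objective(A, W, H)
for _ in range(100):
    # Multiplicative updates; the Tikhonov terms appear in the denominators
    H *= (W.T @ A) / (W.T @ W @ H + beta * H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + alpha * W + eps)
obj_end = objective(A, W, H)
```

The multiplicative form keeps W and H elementwise nonnegative by construction, and the regularized objective decreases monotonically over the iterations.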

  • On Representing Protein Folding Patterns Using Non-Linear Parametric Curves

    Page(s): 1218 - 1228

    Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalise, compare, classify, and interpret protein structures. Traditional methods to abstract protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This results in ignoring a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction to support efficient and effective comparison of protein folding patterns on a large scale. View full abstract»
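
The economy-versus-fidelity trade-off can be hinted at with a toy comparison of a line-segment abstraction against a cubic parametric curve fitted to a synthetic Cα-like trace. The data, the shared curve parameter, and the per-coordinate polynomial fit are illustrative assumptions, not the paper's MML machinery:

```python
import numpy as np

# Hypothetical C-alpha trace of a short, curved protein fragment
t = np.linspace(0.0, 1.0, 12)
coords = np.stack([np.cos(2.0 * t), np.sin(2.0 * t), 1.5 * t], axis=1)

def fit_residual(degree):
    # Fit one polynomial per coordinate in the shared parameter t;
    # return the total squared residual (the "loss of structural information")
    total = 0.0
    for d in range(3):
        c = np.polyfit(t, coords[:, d], degree)
        total += float(np.sum((np.polyval(c, t) - coords[:, d]) ** 2))
    return total

line_res = fit_residual(1)   # traditional line-segment abstraction
cubic_res = fit_residual(3)  # non-linear parametric curve
```

The cubic curve captures the bend that the line segment discards; an MML-style criterion would additionally charge each model for the cost of stating its parameters, penalizing needless curve complexity.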


Aims & Scope

This bimonthly publishes archival research results related to the algorithmic, mathematical, statistical, and computational methods that are central in bioinformatics and computational biology.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Ying Xu
University of Georgia
xyn@bmb.uga.edu

Associate Editor-in-Chief
Dong Xu
University of Missouri
xudong@missouri.edu