# IEEE/ACM Transactions on Computational Biology and Bioinformatics

## Issue 6 • Nov.-Dec. 1 2017

• ### A Survey of Software and Hardware Approaches to Performing Read Alignment in Next Generation Sequencing

Publication Year: 2017, Page(s):1202 - 1213
Computational genomics is an emerging field that is enabling us to reveal the origins of life and the genetic basis of diseases such as cancer. Next Generation Sequencing (NGS) technologies have unleashed a wealth of genomic information by producing immense amounts of raw data. Before any functional analysis can be applied to this data, read alignment is applied to find the genomic coordinates of ...

• ### Batch Mode TD(λ) for Controlling Partially Observable Gene Regulatory Networks

Publication Year: 2017, Page(s):1214 - 1227
External control of gene regulatory networks (GRNs) has received much attention in recent years. The aim is to find a series of actions to apply to a gene regulation system making it avoid its diseased states. In this work, we propose a novel method for controlling partially observable GRNs combining batch mode reinforcement learning (Batch RL) and TD(λ) algorithms. Unlike the existing stud...

• ### Benchmark Dataset for Whole Genome Sequence Compression

Publication Year: 2017, Page(s):1228 - 1236
Research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset and proposes a benchmark dataset using a multistage sampling procedure. Considering the genome sequence of organi...

• ### Copy Number Variations Detection: Unravelling the Problem in Tangible Aspects

Publication Year: 2017, Page(s):1237 - 1250
Among the important genomic variants associated with susceptibility and resistance to complex diseases, Copy Number Variations (CNV) have emerged as a prevalent class of structural variation. Following the flood of next-generation sequencing data, numerous publicly available tools have been developed to provide computational strategies to identify CNV at improved accuracy. This review g...

• ### Data Management for Heterogeneous Genomic Datasets

Publication Year: 2017, Page(s):1251 - 1264
Next Generation Sequencing (NGS), a family of technologies for reading DNA and RNA, is changing biological research, and will soon change medical practice, by quickly providing sequencing data and high-level features of numerous individual genomes in different biological and clinical conditions. The availability of millions of whole genome sequences may soon become the biggest and most important ...

• ### Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample Size

Publication Year: 2017, Page(s):1265 - 1275
Aberrant changes to interactions among cellular components have been conjectured to be potential causes of abnormalities in cellular functions. By systematic analysis of high-throughput-omics data, researchers hope to detect potential associations among measured variables for better biomarker identification and phenotype prediction. In this paper, we focus on the methods to measure pairwise intera...

• ### Effect of Aggregation Operators on Network-Based Disease Gene Prioritization: A Case Study on Blood Disorders

Publication Year: 2017, Page(s):1276 - 1287
Owing to the innate noise in biological data sources, a single source or a single measure does not suffice for effective disease gene prioritization. The integration of multiple data sources or the aggregation of multiple measures is therefore needed. The aggregation operators combine multiple related data values to a single value such that the combined value has the effect of all the ind...

• ### Enhancing Protein Conformational Space Sampling Using Distance Profile-Guided Differential Evolution

Publication Year: 2017, Page(s):1288 - 1301
De novo protein structure prediction aims to search for low-energy conformations as it follows the thermodynamics hypothesis that places native conformations at the global minimum of the protein energy surface. However, the native conformation is not necessarily located in the lowest-energy regions owing to the inaccuracies of the energy model. This study presents a differential evolution algorith...

• ### Extending the Applicability of Graphlets to Directed Networks

Publication Year: 2017, Page(s):1302 - 1315
With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data is often modeled as graphs (also called networks) and studying them can lead to new insights into molecule-level organization. A possible way to understand their structure is by analyzing the smaller components that constitute them, namely network motifs and graphlets. Grap...

• ### High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM

Publication Year: 2017, Page(s):1316 - 1326
The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high ...

• ### Improving Biochemical Named Entity Recognition Using PSO Classifier Selection and Bayesian Combination Methods

Publication Year: 2017, Page(s):1327 - 1338
Named Entity Recognition (NER) is a basic step for a large number of subsequent text mining tasks in the biochemical domain. Increasing the performance of such recognition systems is of high importance and always poses a challenge. In this study, a new community-based decision making system is proposed which aims at increasing the efficiency of NER systems in the chemical/drug name context. Particl...

• ### ML-Space: Hybrid Spatial Gillespie and Particle Simulation of Multi-Level Rule-Based Models in Cell Biology

Publication Year: 2017, Page(s):1339 - 1349
Spatio-temporal dynamics of cellular processes can be simulated at different levels of detail, from (deterministic) partial differential equations via the spatial Stochastic Simulation algorithm to tracking Brownian trajectories of individual particles. We present a spatial simulation approach for multi-level rule-based models, which includes dynamically hierarchically nested cellular compartments...

• ### Multi-Block Bipartite Graph for Integrative Genomic Analysis

Publication Year: 2017, Page(s):1350 - 1358
Human diseases involve a sequence of complex interactions between multiple biological processes. In particular, multiple genomic data such as Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV), DNA Methylation (DM), and their interactions simultaneously play an important role in human diseases. However, despite the widely known complex multi-layer biological processes and increased ...

• ### Normalizing Kernels in the Billera-Holmes-Vogtmann Treespace

Publication Year: 2017, Page(s):1359 - 1365
As costs of genome sequencing have dropped precipitously, development of efficient bioinformatic methods to analyze genome structure and evolution has become ever more urgent. For example, most published phylogenomic studies involve either massive concatenation of sequences, or informal comparisons of phylogenies inferred on a small subset of orthologous genes, neither of which provides a compreh...

• ### Novel Methods for Microglia Segmentation, Feature Extraction, and Classification

Publication Year: 2017, Page(s):1366 - 1377
Segmentation and analysis of histological images provide a valuable tool to gain insight into the biology and function of microglial cells in health and disease. Common image segmentation methods are not suitable for inhomogeneous histology image analysis, and accurate classification of microglial activation states has remained a challenge. In this paper, we introduce an automated image analysis f...

• ### Pluribus—Exploring the Limits of Error Correction Using a Suffix Tree

Publication Year: 2017, Page(s):1378 - 1388
Next generation sequencing technologies enable efficient and cost-effective genome sequencing. However, sequencing errors increase the complexity of the de novo assembly process, and reduce the quality of the assembled sequences. Many error correction techniques utilizing substring frequencies have been developed to mitigate this effect. In this paper, we present a novel and effective method calle...

• ### Predicting Protein-DNA Binding Residues by Weightedly Combining Sequence-Based Features and Boosting Multiple SVMs

Publication Year: 2017, Page(s):1389 - 1398
Protein-DNA interactions are ubiquitous in a wide variety of biological processes. Correctly locating DNA-binding residues solely from protein sequences is an important but challenging task for protein function annotations and drug discovery, especially in the post-genomic era where large volumes of protein sequences have quickly accumulated. In this study, we report a new predictor, named TargetD...

• ### Protein Inference from the Integration of Tandem MS Data and Interactome Networks

Publication Year: 2017, Page(s):1399 - 1409
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Different from the methods which first use only tandem MS data to i...

• ### Reframed Genome-Scale Metabolic Model to Facilitate Genetic Design and Integration with Expression Data

Publication Year: 2017, Page(s):1410 - 1418
Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and helped biologists to decipher metabolism. However, due to the complex gene-reaction relationships that exist in model systems, most algorithms have limited capabilities with respect to directly predicting accurate genetic design for metabolic engineering. In particular, meth...

• ### Significance and Functional Similarity for Identification of Disease Genes

Publication Year: 2017, Page(s):1419 - 1433
One of the most significant research issues in functional genomics is in silico identification of disease-related genes. In this regard, the paper presents a new gene selection algorithm, termed SiFS, for identification of disease genes. It integrates the information obtained from the interaction network of proteins and gene expression profiles. The proposed SiFS algorithm culls out a subset of gene...

• ### Strategies for Comparing Metabolic Profiles: Implications for the Inference of Biochemical Mechanisms from Metabolomics Data

Publication Year: 2017, Page(s):1434 - 1445
Background: Large amounts of metabolomics data have been accumulated in recent years and await analysis. Previously, we had developed a systems biology approach to infer biochemical mechanisms underlying metabolic alterations observed in cancers and other diseases. The method utilized the typical Euclidean distance for comparing metabolic profiles. Here, we ask whether any of the numerous alternat...

• ### Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment

Publication Year: 2017, Page(s):1446 - 1458
Network alignment has extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provides a new objective function that maximizes the number...

• ### Unsupervised Binning of Metagenomic Assembled Contigs Using Improved Fuzzy C-Means Method

Publication Year: 2017, Page(s):1459 - 1467
Metagenomic contig binning is a necessary step of metagenome analysis. After assembly, the number of contigs belonging to different genomes is usually unequal. A metagenomic contig dataset is therefore an imbalanced dataset, and the traditional fuzzy c-means method (FCM) fails to handle it well. In this paper, we will introduce an improved version of fuzzy c-means method (IFCM) into metagenomic ...

• ### Collective Prediction of Disease-Associated miRNAs Based on Transduction Learning

Publication Year: 2017, Page(s):1468 - 1475
The discovery of human disease-related miRNA is a challenging problem for complex disease biology research. For existing computational methods, it is difficult to achieve excellent performance with the sparse known miRNA-disease associations verified by biological experiment. Here, we develop CPTL, a Collective Prediction based on Transduction Learning, to systematically prioritize miRNAs related to dis...

• ### Modeling and Identification of Amnioserosa Cell Mechanical Behavior by Using Mass-Spring Lattices

Publication Year: 2017, Page(s):1476 - 1481
Various mechanical models of live amnioserosa cells during Drosophila melanogaster’s dorsal closure are proposed. Such models account for specific biomechanical oscillating behaviors and depend on a different set of parameters. The identification of the parameters for each of the proposed models is accomplished according to a least-squares approach in such a way to best fit...

