<?xml version="1.0" ?>
<rss version="2.0">
	<channel>
		<title><![CDATA[ Computational Biology and Bioinformatics, IEEE/ACM Transactions on - new TOC ]]></title>
		<link>http://ieeexplore.ieee.org</link>
		<description>TOC Alert for Publication# 8857 </description>
		<year>2012</year>
		<month>February </month>
		<day>10</day>
		<item>
			<title><![CDATA[Cover1]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138580]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138580]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>c1</startPage>
			<endPage>c1</endPage>
			<fileSize>1096</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[Cover2]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138581]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138581]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>c2</startPage>
			<endPage>c2</endPage>
			<fileSize>168</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[A Hybrid EKF and Switching PSO Algorithm for Joint State and Parameter Estimation of Lateral Flow Immunoassay Models]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051426]]></link>
			<description><![CDATA[In this paper, a hybrid extended Kalman filter (EKF) and switching particle swarm optimization (SPSO) algorithm is proposed for jointly estimating both the parameters and states of the lateral flow immunoassay model through available short time-series measurements. Our proposed method generalizes the well-known EKF algorithm by imposing physical constraints on the system states. State constraints are encountered very often in practice and give rise to considerable difficulties in system analysis and design. The main purpose of this paper is to handle the dynamic modeling problem with state constraints by combining the extended Kalman filtering and constrained optimization algorithms via the maximization probability method. More specifically, a recently developed SPSO algorithm is used to cope with the constrained optimization problem by converting it into an unconstrained optimization one through adding a penalty term to the objective function. The proposed algorithm is then employed to simultaneously identify the parameters and states of a lateral flow immunoassay model. It is shown that the proposed algorithm gives much improved performance over the traditional EKF method.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051426]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>321</startPage>
			<endPage>329</endPage>
			<fileSize>653</fileSize>
			<authors><![CDATA[Zeng, Nianyin;Wang, Zidong;Li, Yurong;Du, Min;Liu, Xiaohui;]]></authors>
		</item>
		<item>
			<title><![CDATA[A New Efficient Algorithm for the Gene-Team Problem on General Sequences]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5871587]]></link>
			<description><![CDATA[Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A famous model to capture the essential biological features of a conserved gene cluster is called the gene-team model. The problem of finding the gene teams of two general sequences is the focus of this paper. For this problem, He and Goldwasser had an efficient algorithm that requires $O(mn)$ time using $O(m + n)$ working space, where $m$ and $n$ are, respectively, the numbers of genes in the two given sequences. In this paper, a new efficient algorithm is presented. Assume $m \le n$. Let $C = \sum_{\alpha \in \Sigma} o_{1}(\alpha) o_{2}(\alpha)$, where $\Sigma$ is the set of distinct genes, and $o_{1}(\alpha)$ and $o_{2}(\alpha)$ are, respectively, the numbers of copies of $\alpha$ in the two given sequences. Our new algorithm requires $O(\min\{C \lg n, mn\})$ time using $O(m + n)$ working space. As compared with He and Goldwasser's algorithm, our new algorithm is more practical, as $C$ is likely to be much smaller than $mn$ in practice. In addition, our new algorithm is output sensitive. Its running time is $O(\lg n)$ times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of $k$ general sequences in $O(k C \lg(n_{1} n_{2} \ldots n_{k}))$ time, where $n_{i}$ is the number of genes in the $i$th input sequence.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5871587]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>330</startPage>
			<endPage>344</endPage>
			<fileSize>803</fileSize>
			<authors><![CDATA[Wang, Biing-Feng;Kuo, Chung-Chin;Liu, Shang-Ju;Lin, Chien-Hsin;]]></authors>
		</item>
		<item>
			<title><![CDATA[A New Efficient Data Structure for Storage and Retrieval of Multiple Biosequences]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6081847]]></link>
			<description><![CDATA[Today's genome analysis applications require sequence representations allowing for fast access to their contents while also being memory-efficient enough to facilitate analyses of large-scale data. While a wide variety of sequence representations exist, lack of a generic implementation of efficient sequence storage has led to a plethora of poorly reusable or programming language-specific implementations. We present a novel, space-efficient data structure (GtEncseq) for storing multiple biological sequences of variable alphabet size, with customizable character transformations, wildcard support, and an assortment of internal representations optimized for different distributions of wildcards and sequence lengths. For the human genome (3.1 gigabases, including 237 million wildcard characters) our representation requires only $2 + 8 \cdot 10^{-6}$ bits per character. Implemented in C, our portable software implementation provides a variety of methods for random and sequential access to characters and substrings (including different reading directions) using an object-oriented interface. In addition, it includes access to metadata like sequence descriptions or character distributions. The library is extensible to be used from various scripting languages. GtEncseq is much more versatile than previous solutions, adding features that were previously unavailable. Benchmarks show that it is competitive with respect to space and time requirements.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6081847]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>345</startPage>
			<endPage>357</endPage>
			<fileSize>933</fileSize>
			<authors><![CDATA[Steinbiss, Sascha;Kurtz, Stefan;]]></authors>
		</item>
		<item>
			<title><![CDATA[A Swarm Intelligence Framework for Reconstructing Gene Networks: Searching for Biologically Plausible Architectures]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5765940]]></link>
			<description><![CDATA[In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5765940]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>358</startPage>
			<endPage>371</endPage>
			<fileSize>1083</fileSize>
			<authors><![CDATA[Kentzoglanakis, Kyriakos;Poole, Matthew;]]></authors>
		</item>
		<item>
			<title><![CDATA[Algorithms for Reticulate Networks of Multiple Phylogenetic Trees]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051423]]></link>
			<description><![CDATA[A reticulate network N of multiple phylogenetic trees may have nodes with two or more parents (called reticulation nodes). There are two ways to define the reticulation number of N. One way is to define it as the number of reticulation nodes in N; in this case, a reticulate network with the smallest reticulation number is called an optimal type-I reticulate network of the trees. The better way is to define it as the total number of parents of reticulation nodes in N minus the number of reticulation nodes in N; in this case, a reticulate network with the smallest reticulation number is called an optimal type-II reticulate network of the trees. In this paper, we first present a fast fixed-parameter algorithm for constructing one or all optimal type-I reticulate networks of multiple phylogenetic trees. We then use the algorithm together with other ideas to obtain an algorithm for estimating a lower bound on the reticulation number of an optimal type-II reticulate network of the input trees. To our knowledge, these are the first fixed-parameter algorithms for the problems. We have implemented the algorithms in ANSI C, obtaining programs CMPT and MaafB. Our experimental data show that CMPT can construct optimal type-I reticulate networks rapidly and MaafB can compute better lower bounds for optimal type-II reticulate networks within shorter time than the previously best program PIRN designed by Wu.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051423]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>372</startPage>
			<endPage>384</endPage>
			<fileSize>1760</fileSize>
			<authors><![CDATA[Chen, Zhi-Zhong;Wang, Lusheng;]]></authors>
		</item>
		<item>
			<title><![CDATA[Antilope—A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5740842]]></link>
			<description><![CDATA[Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper, we present antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a data set of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM both in terms of runtime and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. antilope will be freely available as part of the open source proteomics library OpenMS.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5740842]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>385</startPage>
			<endPage>394</endPage>
			<fileSize>378</fileSize>
			<authors><![CDATA[Andreotti, Sandro;Klau, Gunnar W.;Reinert, Knut;]]></authors>
		</item>
		<item>
			<title><![CDATA[Constructing and Drawing Regular Planar Split Networks]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989792]]></link>
			<description><![CDATA[Split networks are commonly used to visualize collections of bipartitions, also called splits, of a finite set. Such collections arise, for example, in evolutionary studies. Split networks can be viewed as a generalization of phylogenetic trees and may be generated using the SplitsTree package. Recently, the NeighborNet method for generating split networks has become rather popular, in part because it is guaranteed to always generate a circular split system, which can always be displayed by a planar split network. Even so, labels must be placed on the "outside" of the network, which might be problematic in some applications. To help circumvent this problem, it can be helpful to consider so-called flat split systems, which can be displayed by planar split networks where labels are allowed on the inside of the network too. Here, we present a new algorithm that is guaranteed to compute a minimal planar split network displaying a flat split system in polynomial time, provided the split system is given in a certain format. We will also briefly discuss two heuristics that could be useful for analyzing phylogeographic data and that allow the computation of flat split systems in this format in polynomial time.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989792]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>395</startPage>
			<endPage>407</endPage>
			<fileSize>2566</fileSize>
			<authors><![CDATA[Spillner, Andreas;Nguyen, Binh;Moulton, Vincent;]]></authors>
		</item>
		<item>
			<title><![CDATA[DICLENS: Divisive Clustering Ensemble with Automatic Cluster Number]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035671]]></link>
			<description><![CDATA[Clustering has a long and rich history in a variety of scientific fields. Finding natural groupings of a data set is a hard task as attested by hundreds of clustering algorithms in the literature. Each clustering technique makes some assumptions about the underlying data set. If the assumptions hold, good clusterings can be expected. It is hard, in some cases impossible, to satisfy all the assumptions. Therefore, it is beneficial to apply different clustering methods on the same data set, or the same method with varying input parameters or both. We propose a novel method, DICLENS, which combines a set of clusterings into a final clustering having better overall quality. Our method produces the final clustering automatically and does not take any input parameters, a feature missing in many existing algorithms. Extensive experimental studies on real, artificial, and gene expression data sets demonstrate that DICLENS produces very good quality clusterings in a short amount of time. DICLENS implementation runs on standard personal computers by being scalable, and by consuming very little memory and CPU.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035671]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>408</startPage>
			<endPage>420</endPage>
			<fileSize>3882</fileSize>
			<authors><![CDATA[Mimaroglu, Selim;Aksehirli, Emin;]]></authors>
		</item>
		<item>
			<title><![CDATA[Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035669]]></link>
			<description><![CDATA[Finding repetitive structures in genomes and proteins is important to understand their biological functions. Many data compressors for modern genomic sequences rely heavily on finding repeats in the sequences. Small-scale and local repetitive structures are better understood than large and complex interspersed ones. The notion of maximal repeats captures all the repeats in the data in a space-efficient way. Prior work on maximal repeat finding used either a suffix tree or a suffix array along with other auxiliary data structures. Their space usage is 19-50 times the text size with the best engineering efforts, prohibiting their usability on massive data such as the whole human genome. We focus on finding all the maximal repeats from massive texts in a time- and space-efficient manner. Our technique uses the Burrows-Wheeler Transform and wavelet trees. For data sets consisting of natural language texts and protein data, the space usage of our method is no more than three times the text size. For genomic sequences stored using one byte per base, the space usage of our method is less than double the sequence size. Our space-efficient method keeps the timing performance fast. In fact, our method is orders of magnitude faster than the prior methods for processing massive texts such as the whole human genome, since the prior methods must use external memory. For the first time, our method enables a desktop computer with 8 GB internal memory (actual internal memory usage is less than 6 GB) to find all the maximal repeats in the whole human genome in less than 17 hours. We have implemented our method as general-purpose open-source software for public use.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035669]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>421</startPage>
			<endPage>429</endPage>
			<fileSize>834</fileSize>
			<authors><![CDATA[Kulekci, M. Oguzhan;Vitter, Jeffrey Scott;Xu, Bojian;]]></authors>
		</item>
		<item>
			<title><![CDATA[Eigen-Genomic System Dynamic-Pattern Analysis (ESDA): Modeling mRNA Degradation and Self-Regulation]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6081850]]></link>
			<description><![CDATA[High-throughput methods systematically measure the internal state of the entire cell, but powerful computational tools are needed to infer dynamics from their raw data. Therefore, we have developed a new computational method, Eigen-genomic System Dynamic-pattern Analysis (ESDA), which uses systems theory to infer dynamic parameters from a time series of gene expression measurements. As many genes are measured at a modest number of time points, estimation of the system matrix is underdetermined and traditional approaches for estimating dynamic parameters are ineffective; thus, ESDA uses the principle of dimensionality reduction to overcome the data imbalance. Since degradation rates are naturally confounded by self-regulation, our model estimates an effective degradation rate that is the difference between self-regulation and degradation. We demonstrate that ESDA is able to recover effective degradation rates with reasonable accuracy in simulation. We also apply ESDA to a budding yeast data set, and find that effective degradation rates are normally slower than experimentally measured degradation rates. Our results suggest that either self-regulation is widespread in budding yeast and that self-promotion dominates self-inhibition, or that self-regulation may be rare and that experimental methods for measuring degradation rates based on transcription arrest may severely overestimate true degradation rates in healthy cells.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6081850]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>430</startPage>
			<endPage>437</endPage>
			<fileSize>416</fileSize>
			<authors><![CDATA[Wang, Daifeng;Markey, Mia K.;Wilke, Claus O.;Arapostathis, Ari;]]></authors>
		</item>
		<item>
			<title><![CDATA[GSGS: A Computational Approach to Reconstruct Signaling Pathway Structures from Gene Sets]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051429]]></link>
			<description><![CDATA[Reconstruction of signaling pathway structures is essential to decipher complex regulatory relationships in living cells. The existing computational approaches often rely on unrealistic biological assumptions and do not explicitly consider signal transduction mechanisms. Signal transduction events refer to linear cascades of reactions from the cell surface to the nucleus and characterize a signaling pathway. In this paper, we propose a novel approach, Gene Set Gibbs Sampling (GSGS), to reverse engineer signaling pathway structures from gene sets related to the pathways. We hypothesize that signaling pathways are structurally an ensemble of overlapping linear signal transduction events which we encode as Information Flows (IFs). We infer signaling pathway structures from gene sets, referred to as Information Flow Gene Sets (IFGSs), corresponding to these events. Thus, an IFGS only reflects which genes appear in the underlying IF but not their ordering. GSGS offers a Gibbs sampling like procedure to reconstruct the underlying signaling pathway structure by sequentially inferring IFs from the overlapping IFGSs related to the pathway. In the proof-of-concept studies, our approach is shown to outperform the existing state-of-the-art network inference approaches using both continuous and discrete data generated from benchmark networks in the DREAM initiative. We perform a comprehensive sensitivity analysis to assess the robustness of our approach. Finally, we implement GSGS to reconstruct signaling mechanisms in breast cancer cells.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051429]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>438</startPage>
			<endPage>450</endPage>
			<fileSize>1534</fileSize>
			<authors><![CDATA[Acharya, Lipi;Judeh, Thair;Duan, Zhansheng;Rabbat, Michael;Zhu, Dongxiao;]]></authors>
		</item>
		<item>
			<title><![CDATA[Identification of Differentially Expressed Genes for Time-Course Microarray Data Based on Modified RM ANOVA]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5744082]]></link>
			<description><![CDATA[The regulation of gene expression is a dynamic process; hence, it is of vital interest to identify and characterize changes in gene expression over time. We present here a general statistical method, based on repeated measures (RM) ANOVA, for detecting changes in microarray expression over time within a single biological group. In this method, unlike the classical F-statistic, statistical significance is determined taking into account the time dependency of the microarray data. A correction factor for this RM F-statistic is introduced, leading to a higher sensitivity as well as high specificity. We investigate the two approaches that exist in the literature for calculating the p-values using resampling techniques: gene-wise p-values and pooled p-values. It is shown that the pooled p-values method is more powerful and computationally less expensive than the gene-wise p-values method, and hence it is applied along with the introduced correction factor to various synthetic data sets and a real data set. These results show that the proposed technique outperforms the current methods. The real data set results are consistent with the existing knowledge concerning the presence of the genes. The algorithms presented are implemented in R and are freely available upon request.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5744082]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>451</startPage>
			<endPage>466</endPage>
			<fileSize>3704</fileSize>
			<authors><![CDATA[ElBakry, Ola;Ahmad, M. Omair;Swamy, M.N.S.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Identifying Bacterial Virulent Proteins by Fusing a Set of Classifiers Based on Variants of Chou's Pseudo Amino Acid Composition and on Evolutionary Information]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5999656]]></link>
			<description><![CDATA[The availability of a reliable method for predicting bacterial virulent proteins has several important applications in research efforts aimed at finding novel drug targets, identifying vaccine candidates, and understanding virulence mechanisms in pathogens. In this work, we have studied several feature extraction approaches for representing proteins and propose a novel bacterial virulent protein prediction method, based on an ensemble of classifiers where the features are extracted directly from the amino acid sequence and from the evolutionary information of a given protein. We have evaluated and compared several ensembles obtained by combining six feature extraction methods and several classification approaches based on two general purpose classifiers (i.e., Support Vector Machine and a variant of input decimated ensemble) and their random subspace version. An extensive evaluation was performed according to a blind testing protocol, where the parameters of the system are optimized using the training set and the system is validated on three different independent data sets, allowing selection of the best-performing system and demonstrating the validity of the proposed method. Based on the results obtained using the blind test protocol, it is interesting to note that even though the best-performing stand-alone method is not always the same in each independent data set, the fusion of different methods enhances prediction efficiency in all the tested independent data sets.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5999656]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>467</startPage>
			<endPage>475</endPage>
			<fileSize>943</fileSize>
			<authors><![CDATA[Nanni, Loris;Lumini, Alessandra;Gupta, Dinesh;Garg, Aarti;]]></authors>
		</item>
		<item>
			<title><![CDATA[Molecular Dynamics Trajectory Compression with a Coarse-Grained Model]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051427]]></link>
			<description><![CDATA[Molecular dynamics trajectories are very data intensive, thereby limiting sharing and archival of such data. One possible solution is compression of trajectory data. Here, trajectory compression based on conversion to the coarse-grained model PRIMO is proposed. The compressed data are about one third the size of the original data, and fast decompression is possible with an analytical reconstruction procedure from PRIMO to all-atom representations. This protocol largely preserves structural features and, to a more limited extent, also energetic features of the original trajectory.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051427]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>476</startPage>
			<endPage>486</endPage>
			<fileSize>1754</fileSize>
			<authors><![CDATA[Cheng, Yi-Ming;Gopal, Srinivasa Murthy;Law, Sean M.;Feig, Michael;]]></authors>
		</item>
		<item>
			<title><![CDATA[Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5740845]]></link>
			<description><![CDATA[Network inference algorithms can assist life scientists in unraveling gene-regulatory systems on a molecular level. In recent years, great attention has been drawn to the reconstruction of Boolean networks from time series. These need to be binarized, as such networks model genes as binary variables (either "expressed" or "not expressed"). Common binarization methods often cluster measurements or separate them according to statistical or information theoretic characteristics and may require many data points to determine a robust threshold. Yet, time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches which determine thresholds based on limited numbers of samples and additionally provide a measure of threshold validity. Thus, network reconstruction and further analysis can be restricted to genes with meaningful thresholds. This reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. The new approaches yield considerably improved correct network identification rates compared to other binarization techniques by effectively reducing the number of candidate networks.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5740845]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>487</startPage>
			<endPage>498</endPage>
			<fileSize>569</fileSize>
			<authors><![CDATA[Hopfensitz, Martin;Mussel, Christoph;Wawra, Christian;Maucher, Markus;Kuhl, Michael;Neumann, Heiko;Kestler, Hans A.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Mutation Region Detection for Closely Related Individuals without a Known Pedigree]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051420]]></link>
			<description><![CDATA[Linkage analysis serves as a way of finding locations of genes that cause genetic diseases. Linkage studies have facilitated the identification of several hundreds of human genes that can harbor mutations which by themselves lead to a disease phenotype. The fundamental problem in linkage analysis is to identify regions whose allele is shared by all or almost all affected members but by none or few unaffected members. Almost all the existing methods for linkage analysis are for families with clearly given pedigrees. Little work has been done for the case where the sampled individuals are closely related, but their pedigree is not known. This situation occurs very often when the individuals share a common ancestor at least six generations ago. Solving this case will tremendously extend the use of linkage analysis for finding genes that cause genetic diseases. In this paper, we propose a mathematical model (the shared center problem) for inferring the allele-sharing status of a given set of individuals using a database of confirmed haplotypes as reference. We show the NP-completeness of the shared center problem and present a ratio-2 polynomial-time approximation algorithm for its minimization version (called the closest shared center problem). We then convert the approximation algorithm into a heuristic algorithm for the shared center problem. Based on this heuristic, we finally design a heuristic algorithm for mutation region detection. We further implement the algorithms to obtain a software package. Our experimental data show that the software is both fast and accurate. The package is available at http://www.cs.cityu.edu.hk/~lwang/software/LDWP/ for noncommercial use.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051420]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>499</startPage>
			<endPage>510</endPage>
			<fileSize>1520</fileSize>
			<authors><![CDATA[Ma, Wenji;Yang, Yong;Chen, Zhi-Zhong;Wang, Lusheng;]]></authors>
		</item>
		<item>
			<title><![CDATA[On Complexity of Protein Structure Alignment Problem under Distance Constraint]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051419]]></link>
			<description><![CDATA[We study the well-known Largest Common Point-set (LCP) under Bottleneck Distance Problem. Given two proteins a and b (as sequences of points in three-dimensional space) and a distance cutoff sigma, the goal is to find a spatial superposition and an alignment that maximizes the number of pairs of points from a and b that can be fit under the distance sigma from each other. The best algorithms to date for approximate and exact solutions to this problem run in time O(n^8) and O(n^{32}), respectively, where n represents protein length. This work improves the runtime of the approximation algorithm and the expected runtime of the algorithm for absolute optimum for both order-dependent and order-independent alignments. More specifically, our algorithms for near-optimal and optimal sequential alignments run in time O(n^7 log n) and O(n^{14} log n), respectively. For nonsequential alignments, the corresponding running times are O(n^{7.5}) and O(n^{14.5}).]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051419]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>511</startPage>
			<endPage>516</endPage>
			<fileSize>552</fileSize>
			<authors><![CDATA[Poleksic, Aleksandar;]]></authors>
		</item>
		<item>
			<title><![CDATA[On the Elusiveness of Clusters]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035670]]></link>
			<description><![CDATA[Rooted phylogenetic networks are often used to represent conflicting phylogenetic signals. Given a set of clusters, a network is said to represent these clusters in the softwired sense if, for each cluster in the input set, at least one tree embedded in the network contains that cluster. Motivated by parsimony we might wish to construct such a network using as few reticulations as possible, or minimizing the level of the network, i.e., the maximum number of reticulations used in any "tangled" region of the network. Although these are NP-hard problems, here we prove that, for every fixed k \ge 0, it is polynomial-time solvable to construct a phylogenetic network with level equal to k representing a cluster set, or to determine that no such network exists. However, this algorithm does not lend itself to a practical implementation. We also prove that the comparatively efficient Cass algorithm correctly solves this problem (and also minimizes the reticulation number) when input clusters are obtained from two not necessarily binary gene trees on the same set of taxa but does not always minimize level for general cluster sets. Finally, we describe a new algorithm which generates in polynomial-time all binary phylogenetic networks with exactly r reticulations representing a set of input clusters (for every fixed r \ge 0).]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035670]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>517</startPage>
			<endPage>534</endPage>
			<fileSize>1286</fileSize>
			<authors><![CDATA[Kelk, Steven;Scornavacca, Celine;van Iersel, Leo;]]></authors>
		</item>
		<item>
			<title><![CDATA[Optimizing Phylogenetic Networks for Circular Split Systems]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5963638]]></link>
			<description><![CDATA[We address the problem of realizing a given distance matrix by a planar phylogenetic network with a minimum number of faces. With the help of the popular software SplitsTree4, we start by approximating the distance matrix with a distance metric that is a linear combination of circular splits. The main results of this paper are the necessary and sufficient conditions for the existence of a network with a single face. We show how such a network can be constructed, and we present a heuristic for constructing a network with few faces using the first algorithm as the base case. Experimental results on biological data show that this heuristic algorithm can produce phylogenetic networks with far fewer faces than the ones computed by SplitsTree4, without affecting the approximation of the distance matrix.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5963638]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>535</startPage>
			<endPage>547</endPage>
			<fileSize>964</fileSize>
			<authors><![CDATA[Phipps, Paul;Bereg, Sergey;]]></authors>
		</item>
		<item>
			<title><![CDATA[Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989789]]></link>
			<description><![CDATA[The focus of this paper is the problem of finding all nested common intervals of two general sequences. Depending on the treatment one wants to apply to duplicate genes, Blin et al. introduced three models to define nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all the three models. For the uniqueness and the bijection models, we give O(n + N_{\rm out})-time algorithms, where N_{\rm out} denotes the size of the output. For the free-inclusion model, we give an O(n^{1 + \varepsilon} + N_{\rm out})-time algorithm, where \varepsilon > 0 is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that N_{\rm out} = O(n^{2}). Let C = \sum_{g \in \Gamma} o_{1}(g) o_{2}(g), where \Gamma is the set of distinct genes, and o_{1}(g) and o_{2}(g) are, respectively, the numbers of copies of g in the two given sequences. For the bijection model, we show that N_{\rm out} = O(Cn). In this paper, we also study the problem of finding all approximate nested common intervals of two sequences on the bijection model. An O(\delta n + N_{\rm out})-time algorithm is presented, where \delta denotes the maximum number of allowed gaps. In addition, we show that for this problem N_{\rm out} is O(\delta n^{3}).]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989789]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>548</startPage>
			<endPage>559</endPage>
			<fileSize>315</fileSize>
			<authors><![CDATA[Wang, Biing-Feng;]]></authors>
		</item>
		<item>
			<title><![CDATA[Parallelized Evolutionary Learning for Detection of Biclusters in Gene Expression Data]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5728798]]></link>
			<description><![CDATA[The analysis of gene expression data obtained from microarray experiments is important for discovering the biological process of genes. Biclustering algorithms have been proven to be able to group the genes with similar expression patterns under a number of experimental conditions. In this paper, we propose a new biclustering algorithm based on evolutionary learning. By converting the biclustering problem into a common clustering problem, the algorithm can be applied in a search space constructed by the conditions. To further reduce the size of the search space, we randomly separate the full conditions into a number of condition subsets (subspaces), each of which has a smaller number of conditions. The algorithm is applied to each subspace and is able to discover bicluster seeds within a limited computing time. Finally, an expanding and merging procedure is employed to combine the bicluster seeds into larger biclusters according to a homogeneity criterion. We test the performance of the proposed algorithm using synthetic and real microarray data sets. Compared with several previously developed biclustering algorithms, our algorithm demonstrates a significant improvement in discovering additive biclusters.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5728798]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>560</startPage>
			<endPage>570</endPage>
			<fileSize>1355</fileSize>
			<authors><![CDATA[Huang, Qinghua;Tao, Dacheng;Li, Xuelong;Liew, Alan;]]></authors>
		</item>
		<item>
			<title><![CDATA[Quantum Gate Circuit Model of Signal Integration in Bacterial Quorum Sensing]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5959159]]></link>
			<description><![CDATA[Bacteria evolved cell-to-cell communication processes to gain information about their environment and regulate gene expression. Quorum sensing is such a process in which signaling molecules, called autoinducers, are produced, secreted and detected. In several cases bacteria use more than one autoinducer and integrate the information conveyed by them. It has not yet been explained adequately why bacteria evolved such signal integration circuits and what they can learn about their environments using more than one autoinducer, since all signaling pathways merge into one. Here quantum information theory, which includes classical information theory as a special case, is used to construct a quantum gate circuit that reproduces recent experimental results. Although the conditions in which biosystems exist do not allow for the appearance of quantum mechanical phenomena, the powerful computation tools of quantum information processing can be carefully used to cope with signal and information processing by these complex systems. A simulation algorithm based on this model has been developed and numerical experiments that analyze the dynamical operation of the quorum sensing circuit were performed for various cases of autoinducer variations, which revealed that these variations contain significant information about the environment in which bacteria exist.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5959159]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>571</startPage>
			<endPage>579</endPage>
			<fileSize>867</fileSize>
			<authors><![CDATA[Karafyllidis, Ioannis G.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Robust Classification Method of Tumor Subtype by Using Correlation Filters]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051421]]></link>
			<description><![CDATA[Tumor classification based on Gene Expression Profiles (GEPs), which is of great benefit to the accurate diagnosis and personalized treatment of different types of tumor, has drawn great attention in recent years. This paper proposes a novel tumor classification method based on correlation filters to identify the overall pattern of tumor subtype hidden in differentially expressed genes. Concretely, two correlation filters, i.e., Minimum Average Correlation Energy (MACE) and Optimal Tradeoff Synthetic Discriminant Function (OTSDF), are introduced to determine whether a test sample matches the templates synthesized for each subclass. The experiments on six publicly available data sets indicate that the proposed method is robust to noise and can more effectively avoid the effects of the curse of dimensionality. Compared with many model-based methods, the correlation filter-based method can achieve better performance when balanced training sets are exploited to synthesize the templates. In particular, the proposed method can detect the similarity of the overall pattern while ignoring small mismatches between the test sample and the synthesized template, and it performs well even if only a few training samples are available. More importantly, the experimental results can be visually represented, which is helpful for the further analysis of results.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051421]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>580</startPage>
			<endPage>591</endPage>
			<fileSize>2385</fileSize>
			<authors><![CDATA[Wang, Shu-Lin;Zhu, Yi-Hai;Jia, Wei;Huang, De-Shuang;]]></authors>
		</item>
		<item>
			<title><![CDATA[SimBioNeT: A Simulator of Biological Network Topology]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5999655]]></link>
			<description><![CDATA[Studying biological networks at the topological level is a major issue in computational biology studies, and simulation is often used in this context, either to assess reverse engineering algorithms or to investigate how topological properties depend on network parameters. In both contexts, it is desirable for a topology simulator to reproduce the current knowledge on biological networks, to be able to generate a number of networks with the same properties, and to be flexible enough to mimic networks of different organisms. We propose a biological network topology simulator, SimBioNeT, in which module structures of different types and sizes are replicated at different levels of network organization and interconnected, so as to obtain the desired degree distribution, e.g., scale free, and a clustering coefficient constant with the number of nodes in the network, a typical characteristic of biological networks. Empirical assessment of the ability of the simulator to reproduce characteristic properties of biological networks and comparison with E. coli and S. cerevisiae transcriptional networks demonstrate the effectiveness of our proposal.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5999655]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>592</startPage>
			<endPage>600</endPage>
			<fileSize>923</fileSize>
			<authors><![CDATA[Di Camillo, Barbara;Falda, Marco;Toffolo, Gianna;Cobelli, Claudio;]]></authors>
		</item>
		<item>
			<title><![CDATA[Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989791]]></link>
			<description><![CDATA[One of the major research directions in bioinformatics is that of assigning superfamily classification to a given set of proteins. The classification reflects the structural, evolutionary, and functional relatedness. These relationships are embodied in a hierarchical classification, such as the Structural Classification of Protein (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analyses of proteins. Yet a large number of proteins remain unclassified. In this study, we have proposed an unsupervised machine learning approach to classify and assign a given set of proteins to SCOP superfamilies. In the method, we have constructed a database and similarity matrix using P-values obtained from an all-against-all BLAST run and trained the network with the ART2 unsupervised learning algorithm using the rows of the similarity matrix as input vectors, enabling the trained network to classify the proteins from 0.82 to 0.97 f-measure accuracy. The performance of ART2 has been compared with that of spectral clustering, Random forest, SVM, and HHpred. ART2 performs better than the others except HHpred. HHpred performs better than ART2 and the sum of errors is smaller than that of the other methods evaluated.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=5989791]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>601</startPage>
			<endPage>608</endPage>
			<fileSize>988</fileSize>
			<authors><![CDATA[Angadi, Ulavappa B.;Venkatesulu, M.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Subcellular Localization Prediction through Boosting Association Rules]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035673]]></link>
			<description><![CDATA[Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as the sorting signals or homologues, but uses only protein sequence information. The method divides a protein sequence into short k-mer sequence fragments which can be mapped to word features in document classification. A large number of class association rules are mined from the protein sequence examples that range from the N-terminus to the C-terminus. Then, a boosting algorithm is applied to those rules to build up a final classifier. Experimental results using benchmark data sets show that our method is excellent in terms of both the classification performance and the test coverage. The result also implies that the k-mer sequence features which determine subcellular locations do not necessarily exist in specific positions of a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6035673]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>609</startPage>
			<endPage>618</endPage>
			<fileSize>1840</fileSize>
			<authors><![CDATA[Yoon, Yongwook;Lee, Gary Geunbae;]]></authors>
		</item>
		<item>
			<title><![CDATA[The Impact of Normalization and Phylogenetic Information on Estimating the Distance for Metagenomes]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6007126]]></link>
			<description><![CDATA[Metagenomics enables the study of unculturable microorganisms in different environments directly. Discriminating between the compositional differences of metagenomes is an important and challenging problem. Several distance functions have been proposed to estimate the differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. Initially, we analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions so that we could cluster samples from both real and synthetic data sets. The results indicate significant improvement in sample clustering over that derived by rank-based normalization with phylogenetic information, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6007126]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>619</startPage>
			<endPage>628</endPage>
			<fileSize>648</fileSize>
			<authors><![CDATA[Su, Chien-Hao;Wang, Tse-Yi;Hsu, Ming-Tsung;Weng, Francis Cheng-Hsuan;Kao, Cheng-Yan;Wang, Daryi;Tsai, Huai-Kuang;]]></authors>
		</item>
		<item>
			<title><![CDATA[The LASSO and Sparse Least Squares Regression Methods for SNP Selection in Predicting Quantitative Traits]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051425]]></link>
			<description><![CDATA[Recent work concerning quantitative traits of interest has focused on selecting a small subset of single nucleotide polymorphisms (SNPs) from among the SNPs responsible for the phenotypic variation of the trait. When considered as covariates, the large number of variables (SNPs) and their association with those in close proximity pose challenges for variable selection. The features of sparsity and shrinkage of regression coefficients of the least absolute shrinkage and selection operator (LASSO) method appear attractive for SNP selection. Sparse partial least squares (SPLS) is also appealing as it combines the features of sparsity in subset selection and dimension reduction to handle correlations among SNPs. In this paper, we investigate the application of the LASSO and SPLS methods for selecting SNPs that predict quantitative traits. We evaluate the performance of both methods with different criteria and under different scenarios using simulation studies. Results indicate that these methods can be effective in selecting SNPs that predict quantitative traits but are limited by some conditions. Both methods perform similarly overall, but each exhibits advantages over the other in given situations. Both methods are applied to Canadian Holstein cattle data to compare their performance.]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6051425]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>629</startPage>
			<endPage>636</endPage>
			<fileSize>1162</fileSize>
			<authors><![CDATA[Feng, Zeny Z.;Yang, Xiaojian;Subedi, Sanjeena;McNicholas, Paul D.;]]></authors>
		</item>
		<item>
			<title><![CDATA[What's new in Transactions]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138582]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138582]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>637</startPage>
			<endPage>637</endPage>
			<fileSize>339</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[ePUBs Available in the CSDL]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138583]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138583]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>638</startPage>
			<endPage>638</endPage>
			<fileSize>367</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[Stay Connected to the CSDL]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138584]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138584]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>639</startPage>
			<endPage>639</endPage>
			<fileSize>295</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Computer Society Jobs board]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138585]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138585]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>640</startPage>
			<endPage>640</endPage>
			<fileSize>598</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[Cover3]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138586]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138586]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>c3</startPage>
			<endPage>c3</endPage>
			<fileSize>168</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[Cover4]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138587]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[March-April  2012]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=6138579&arnumber=6138587]]></guid>
			<volume>9</volume>
			<issue>2</issue>
			<startPage>c4</startPage>
			<endPage>c4</endPage>
			<fileSize>1096</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
	</channel>
</rss>
