Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society

Date: 16 Aug. 2002

Displaying Results 1 - 25 of 48
  • Proceedings IEEE Computer Society Bioinformatics Conference

    Publication Year: 2002
    Freely Available from IEEE
  • Author index

    Publication Year: 2002 , Page(s): 347 - 348
    Freely Available from IEEE
  • Designing oscillators in synthetic gene networks based on multi-scale dynamics

    Publication Year: 2002

    Multistability, oscillations, and switching exist at various levels of biological processes and organizations and have been investigated on the basis of many theoretical models, such as circadian oscillations with the period protein (PER) and the timeless protein (TIM) in Drosophila, and multistable dynamics regulated by transcriptional factors. Considerable experimental evidence suggests that cellular processes are intrinsically rhythmic or periodic. Various periodic oscillations with different time scales ranging from less than a second to more than a year, which may allow living organisms to adapt their behaviors to a periodically varying environment, have also been observed experimentally. On the other hand, in synthetic gene networks, both the toggle switch and the repressilator have been theoretically proposed and further confirmed by experiments. All of these works stress the importance of feedback regulation of transcriptional factors, which is key to giving rise to the oscillatory or multistable dynamical behaviors exhibited by biological genetic systems. In addition, it should be noted that many periodic behaviors do not simply oscillate smoothly; rather, they change rapidly or jump at certain states. In gene expression systems, many different time scales characterize the gene regulatory processes. For instance, the transcription and translation processes generally evolve on a time scale that is much slower than that of phosphorylation, dimerization or binding reactions of transcription factors. In genetic networks, the time scale for expression of some genes is much slower than that of others, depending on the length of the genes. We aim to design robust periodic oscillators in synthetic gene-protein systems using simple nonlinear models and to analyze the basic mechanism of limit cycles with jumping behaviors or relaxation oscillations by exploiting multiple time-scale properties [1, 2].
    We show that periodic oscillations are mainly generated by nonlinear feedback loops in gene regulatory systems and that the jumping dynamics are caused by time-scale differences among biochemical reactions. Moreover, effects of time delay are also examined. We show that time delay generally enlarges the stability region of oscillations, thereby making the oscillations more sustainable despite parameter changes or noise [1, 2]. The dynamics of the proposed models are robust, in terms of stability and period length, to parameter perturbations or environmental variations. Although we mainly analyze some specific models, the mechanisms identified in this work are likely to apply to a variety of genetic regulatory systems. Because of this robustness, these simple models may act as basic building blocks, such as genetic oscillators or switches, in synthetic gene-protein networks. Several examples are also provided to demonstrate implementation of synthetic oscillators using genes of λ phage.
  • A literature based method for identifying gene-disease connections

    Publication Year: 2002, Page(s): 109 - 117
    Cited by: Papers (4) | Patents (1)

    We present a statistical method that can swiftly identify, from the literature, sets of genes known to be associated with given diseases. It offers a comprehensive way to treat alias symbols, a statistical method for computing the relevance of the gene to the query, and a novel way to disambiguate gene symbols from other abbreviations. The method is illustrated by finding genes related to breast cancer.
  • MiCA: web-based computational tools for the analysis of microbial community structure and composition based on T-RFLP of 16S rRNA genes

    Publication Year: 2002

    Analyses of microbial community structure based on terminal restriction fragment length polymorphisms (T-RFLP) of 16S rRNA genes are hindered by the lack of computational tools needed to aid experimental design, and to archive and analyze large data sets. The aim of this research was to develop a suite of Web-based tools that would enable researchers to perform several tasks, including: (a) in silico PCR amplification and restriction of 16S rRNA gene sequences found in public databases; (b) automatic retrieval of data and archival storage in an Oracle relational database; (c) comparison of multiple T-RFLP profiles obtained from a single sample using different primer-enzyme combinations; and (d) statistical analysis of T-RFLP data and clustering of samples based on similarities and differences.
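
    Task (a) in the abstract above, in silico restriction of an amplified sequence, can be illustrated with a toy digest. This is only a minimal sketch, assuming exact matching of a single recognition site and a labeled 5' end. The function names, the toy sequences, and the `cut_offset` parameter are hypothetical and are not taken from MiCA.

```python
def terminal_fragment_length(amplicon, site, cut_offset):
    """Length of the 5'-terminal fragment after a simulated digest.

    `site` is the recognition sequence and `cut_offset` is how many bases
    into the site the cut falls (both are hypothetical inputs).
    Returns the full amplicon length when the site is absent.
    """
    position = amplicon.upper().find(site.upper())
    if position == -1:
        return len(amplicon)  # uncut amplicon
    return position + cut_offset


def predicted_profile(amplicons, site, cut_offset):
    """Map each named amplicon to its predicted terminal fragment length."""
    return {
        name: terminal_fragment_length(seq, site, cut_offset)
        for name, seq in amplicons.items()
    }


if __name__ == "__main__":
    toy = {"taxon_a": "ATGGCCTTGGCC", "taxon_b": "ATATAT"}
    print(predicted_profile(toy, "GGCC", 2))
```

    Only the first occurrence of the site matters here, because a terminal fragment ends at the cut nearest the labeled end.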
  • Protein-based analysis of alternative splicing in the human genome

    Publication Year: 2002, Page(s): 118 - 124

    Understanding the functional significance of alternative splicing and other mechanisms that generate RNA transcript diversity is an important challenge facing modern-day molecular biology. Using homology-based, protein sequence analysis methods, it should be possible to investigate how transcript diversity impacts protein structure and function. To test this, a data mining technique ("DiffHit") was developed to identify and catalog genes producing protein isoforms which exhibit distinct profiles of conserved protein motifs. We found that out of a test set of over 1,300 alternatively spliced genes with solved genomic structure, over 30% exhibited a differential profile of conserved InterPro and/or Blocks protein motifs across distinct isoforms. These results suggest that motif databases such as Blocks and InterPro are potentially useful tools for investigating how alternative transcript structure affects gene function.
  • P-quasi complete linkage analysis for gene-expression data

    Publication Year: 2002
    Cited by: Papers (1)

    Hierarchical clustering has generally been used to find the function of genes from gene-expression profiles, but this method has problems: for example, the dendrogram tends to change depending on the data, so it is easily influenced by experimental noise. To cope with these problems, we propose another type of clustering. We formulate the problem of clustering as a graph-covering problem by connected subgraphs, where vertices and edges of the graph denote genes and similarities between genes, respectively. The method is based on the p-quasi complete linkage algorithm for describing clusters. We present the outline of an algorithm for clustering a set of genes into subsets corresponding to p-quasi complete linkage graphs.
  • Constrained multiple sequence alignment tool development and its application to RNase family alignment

    Publication Year: 2002, Page(s): 127 - 137
    Cited by: Papers (2)

    In this paper, we design an algorithm for computing a constrained multiple sequence alignment (CMSA) that guarantees that the generated alignment satisfies the user-specified constraints that some particular residues should be aligned together. If the number of residues that need to be aligned together is a constant α, then the time complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum of the lengths of the sequences. In addition, we have built such a CMSA software system and conducted several experiments on the RNase sequences, which mainly function in catalyzing the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
  • Selective tree growing: a deterministic constant-space linear-time algorithm for pattern discovery and for computing multiple sequence alignment

    Publication Year: 2002

    Summary form only given. Given a set of n sequences, the multiple sequence alignment problem is to align these n sequences, with gaps or otherwise, such that the commonality of the sequences is projected appropriately. If m is the total sum of the lengths of the input sequences, A is the alphabet size of the input sequences, and P is the final number of unique patterns, fixed by the user, that cause an alignment between sequences, then the algorithm runs in time bound O(m(A + P)), which is linear worst-case time. Our algorithm runs on sequences where A is small as well as on sequences where A is large. Our algorithm forms the alignment by first discovering patterns, and thus is also a pattern discovery solution. We support our theoretical conclusions with experimental results obtained from running our algorithm on GenPept sequences and human genome sequences from the GenBank public domain database. Our algorithm uses direct n-wise alignment and constant memory space irrespective of the value of m. What differentiates this algorithm from most others is that it is deterministic; it is guaranteed and theoretically proved that all patterns of any arbitrary length that occur in at least k sequences and that are responsible for multiple sequence alignment are found by the algorithm, where k is specified by the user.
  • Parallelizing a DNA simulation code for the Cray MTA-2

    Publication Year: 2002, Page(s): 291 - 302

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon-based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by field effect transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers.
  • PheGe, the platform for exploring genotype-phenotype relations on cellular and organism level

    Publication Year: 2002, Page(s): 79 - 86

    One major challenge of bioinformatics is to extract biological information into a form that gives access to analyses and predictive models and that sheds new light on cellular and organism function. In order to approach automated network analysis at the organism level, the relational platform PheGe was generated. PheGe enables (a) presentation of cell-specific regulatory and metabolic pathways, (b) sorting and coordination of the various molecules, genes and reactions to their particular signaling systems, (c) visualization of signaling par distance, (d) organization of downstream events on a multicellular level, (e) recording and evaluation of pathologically relevant data, (f) coordination of the aberrant genes and gene products into the various regulatory pathways balancing phenotypic patterns, (g) modeling of cellular differentiation and finally (h) tracing of network components that balance differentiation programs.
  • Electronic polymerase chain reaction (EPCR) search algorithm

    Publication Year: 2002

    We developed an integer-encoding scheme and a search algorithm for in silico PCR (polymerase chain reaction) amplification that identifies sequence homology with the specified primers and enzymes. Unlike the traditional character-based approach, the EPCR algorithm represents DNA sequences as four integer variables. The bit streams in each integer variable reflect the occurrences of nucleotides (A, T, C, G) in the sequence. This approach exploits the fact that there are only four possible nucleotides in either DNA or RNA. A sequence of 32 nucleotides therefore can be reduced to four integers. In addition, since nucleotides are individually represented by four integer variables, ambiguities in the sequence (e.g., "N") can be fully resolved and encoded within the four integers.
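
    The four-integer representation described above can be sketched as one bitmask per nucleotide. This is a minimal illustration, not the authors' EPCR code: the bit order, the treatment of "N" as setting all four masks, and the names `encode` and `matches_at` are assumptions made for the example. Python integers are unbounded, so the 32-nucleotide word size mentioned in the abstract is not enforced here.

```python
NUCLEOTIDES = "ATCG"


def encode(seq):
    """Return four integers, one occurrence bitmask per nucleotide (A, T, C, G)."""
    masks = {n: 0 for n in NUCLEOTIDES}
    for i, ch in enumerate(seq.upper()):
        bit = 1 << i  # position i of the sequence maps to bit i (assumed order)
        if ch == "N":
            # Ambiguous base: mark the position in all four masks.
            for n in NUCLEOTIDES:
                masks[n] |= bit
        elif ch in masks:
            masks[ch] |= bit
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return tuple(masks[n] for n in NUCLEOTIDES)


def matches_at(target, primer, offset, primer_len):
    """True if the encoded primer agrees with the encoded target at `offset`."""
    window = (1 << primer_len) - 1
    agree = 0
    for t_mask, p_mask in zip(target, primer):
        # Positions where target and primer carry the same nucleotide.
        agree |= (t_mask >> offset) & p_mask
    return agree & window == window


if __name__ == "__main__":
    target = encode("GGATCCA")
    print(matches_at(target, encode("ATC"), 2, 3))
    print(matches_at(target, encode("ANC"), 2, 3))
```

    A whole primer is compared with four shift-and-AND operations instead of a character-by-character loop, which is the appeal of the encoding.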
  • Design of genetic switches with only positive feedback loops

    Publication Year: 2002, Page(s): 151 - 162

    We develop a new methodology to design synthetic genetic switch networks with multiple genes and time delays, by using monotone dynamical theory. We show that the networks with only positive feedback loops have no stable oscillation except equilibria whose stability is also independent of the time delays. Such systems have ideal properties for switch networks and can be designed without consideration of time delays, because the systems can be reduced from functional spaces to Euclidean spaces due to their independence from time delays. Specifically, we first prove the basic properties of the genetic networks composed of only positive feedback loops, and then propose a procedure to design the switches, which drastically simplifies analysis of the switches and makes theoretical analysis and design tractable even for large-scale systems. Finally, we demonstrate our theoretical results by designing a biologically plausible synthesized genetic switch with the experimentally well-investigated lacI, tetR, and cI genes.
  • Fast and sensitive algorithm for aligning ESTs to human genome

    Publication Year: 2002, Page(s): 43 - 53
    Cited by: Papers (1)

    There is a pressing need to align the growing set of expressed sequence tags (ESTs) to the newly sequenced human genome. The problem is, however, complicated by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and millions of repetitive sequences in genomic sequences. To solve this, algorithms that use dynamic programming (DP) have been proposed, but in practice these algorithms require an enormous amount of processing time. In an effort to improve the computational efficiency of these classical DP algorithms, we develop software that fully utilizes a lookup table to allow the efficient detection of the start- and end-points of an EST within a given DNA sequence, and subsequently, the prompt identification of exons and introns. In addition, high sensitivity and accuracy must be achieved by calculating the locations of all splice sites correctly for more ESTs while retaining high computational efficiency. This goal is hard to accomplish in practice, owing to misread nucleotides in ESTs and repetitive sequences in the genome, but we present a couple of heuristics effective in settling this issue. Experimental results confirm that our technique improves the overall computation time by orders of magnitude compared with common tools such as sim4 and BLAT and at the same time attains high sensitivity and accuracy against datasets of clean and documented genes.
  • Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network

    Publication Year: 2002, Page(s): 219 - 227
    Cited by: Papers (1) | Patents (1)

    We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains to be solved in selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
  • Towards automatic clustering of protein sequences

    Publication Year: 2002, Page(s): 175 - 186
    Cited by: Papers (2)

    Analyzing protein sequence data has become increasingly important. Most previous work in this area has focused on building classification models. In this paper we investigate the problem of automatic clustering of unlabeled protein sequences. As a widely recognized technique in statistics and computer science, clustering has proven very useful in detecting unknown object categories and revealing hidden correlations among objects. One difficulty that prevents clustering from being performed directly on protein sequences is the lack of an effective similarity measure that can be computed efficiently. Therefore, we propose a novel model for protein sequence clustering by exploring significant statistical properties possessed by the sequences. The concept of imprecise probabilities is introduced to the original probabilistic suffix tree to monitor the convergence of the empirical measurement and to guide the clustering process. It is demonstrated that the proposed method can successfully discover meaningful families without the necessity of learning models of different families from pre-labeled "training data".
  • Genome annotation and protein structure

    Publication Year: 2002

    Summary form only given. Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homologue. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. Solved structures are similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways.
  • A new clustering method for microarray data analysis

    Publication Year: 2002, Page(s): 268 - 275

    A novel clustering approach is introduced to overcome missing data and inconsistency of gene expression levels under different conditions in the stage of clustering. It is based on the so-called smooth score, which measures the deviation of the expression level of a gene from the average expression level of all the genes involved under a condition. We present an efficient greedy algorithm for finding clusters with a smooth score below a threshold after studying its computational complexity. The algorithm was tested intensively on random matrices and yeast data. It was shown to perform well in finding co-regulation patterns in a test with the yeast data.
  • AxML: a fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method

    Publication Year: 2002, Page(s): 21 - 28

    Heuristics for the NP-complete problem of calculating the optimal phylogenetic tree for a set of aligned rRNA sequences based on the maximum likelihood method are computationally expensive. In most existing algorithms, the tree evaluation and branch length optimization functions, calculating the likelihood value for each tree topology examined in the search space, account for the greatest part of the overall computation time. This paper introduces AxML, a program derived from fastDNAml, incorporating a fast topology evaluation function. The algorithmic optimizations introduced represent a general approach for accelerating this function and are applicable to both sequential and parallel phylogeny programs, irrespective of their search space strategy. Therefore, their integration into three existing phylogeny programs rendered encouraging results. Experimental results on conventional processor architectures show a global run time improvement of 35% to 47% for the various test sets and program versions we used.
  • Simultaneous classification and feature clustering using discriminant vector quantization with applications to microarray data analysis

    Publication Year: 2002, Page(s): 246 - 255
    Cited by: Papers (3)

    In many applications of supervised learning, automatic feature clustering is often desirable for a better understanding of the interaction among the various features as well as the interplay between the features and the class labels. In addition, for high dimensional data sets, feature clustering has the potential for improvement in classification accuracy and reduction in computational complexity. In this paper, a method is developed for simultaneous classification and feature clustering by extending discriminant vector quantization (DVQ), a prototype classification method derived from the principle of minimum description length using source coding techniques. The method incorporates feature clustering with classification performed by fusing features in the same clusters. To illustrate its effectiveness, the method has been applied to microarray gene expression data for human lymphoma classification. It is demonstrated that incorporating feature clustering improves classification accuracy, and the clusters generated match well with biologically meaningful gene expression signature groups.
  • Distributions of free energy, melting temperature, and hybridization propensity for genomic DNA oligomers

    Publication Year: 2002

    Many molecular biology techniques such as PCR, Southern blotting, molecular beacon based assays, and DNA microarrays rely on the ability to design oligonucleotide probes possessing specific thermodynamic properties. Thermodynamic parameters for DNA duplex formation (melting temperature: Tm, free energy: ΔG°γ, and hybridization extent: Fb) are accurately predicted using the nearest-neighbor model for a range of physical conditions for oligonucleotides up to about 50 bases in length. The use of thermodynamic quantities is ubiquitous in probe design schemes, but these schemes invariably focus on achieving specific values for the sequences in hand. This fails to provide general insights about how these quantities depend on sequence composition, length, and experimental conditions. Here we present Tm and Fb distributions calculated for genomic DNA samples of 10 to 50 bases.
  • DNA sequence compression using the Burrows-Wheeler Transform

    Publication Year: 2002, Page(s): 303 - 313
    Cited by: Papers (7)

    We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline.
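
    For readers unfamiliar with the transform named above, the textbook Burrows-Wheeler Transform can be sketched in a few lines. This is the naive sort-all-rotations construction, shown only to illustrate the transform itself; it is not the paper's compression method, and the `$` end marker and function names are choices made for this example.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("banana"))
    print(inverse_bwt(bwt("GATTACAGATTACA")))
```

    Repeated substrings tend to produce runs of identical characters in the output, which is what later compression stages exploit. Practical implementations build the transform from a suffix array rather than materializing every rotation.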
  • An efficient branch-and-bound algorithm for the assignment of protein backbone NMR peaks

    Publication Year: 2002, Page(s): 165 - 174
    Cited by: Papers (2)

    NMR resonance assignment is one of the key steps in solving an NMR protein structure. The assignment process links resonance peaks to individual residues of the target protein sequence, providing the prerequisite for establishing intra- and inter-residue spatial relationships between atoms. The assignment process is tedious and time-consuming, and can take many weeks. Though there exist a number of computer programs to assist the assignment process, many NMR labs are still doing the assignments manually to ensure quality. This paper presents a new computational method based on our recent work towards automating the assignment process, particularly the process of backbone resonance peak assignment. We formulate the assignment problem as a constrained weighted bipartite matching problem. While the problem, in the most general situation, is NP-hard, we present an efficient solution based on a branch-and-bound algorithm with effective bounding techniques and a greedy filtering algorithm for reducing the search space. Our experimental results on 70 instances of (pseudo) real NMR data derived from 14 proteins demonstrate that the new solution runs much faster than a recently introduced (exhaustive) two-layer algorithm and recovers more correct peak assignments than the two-layer algorithm.
  • An application of a pathway alignment method to comparative analysis between genome and pathways

    Publication Year: 2002

    We present a method for the comparative analysis of genomes and metabolic pathways based on similarity between gene orders and enzymatic reactions. To measure the reaction similarity, we formalized a scoring system by using the functional hierarchy of the EC numbers of enzymes. We have used an alignment method between given pathways, which is based on the longest common subsequence algorithm with the scoring system. The similarity score between pathways is expressed as the information content of their alignment. By applying our algorithm to the metabolic pathways in Escherichia coli, we have found several common patterns among the purine, lysine and arginine biosynthesis and other amino acid related metabolic pathways. We have also compared the alignments with gene orders on the E. coli genome by using a heuristic graph comparison method. From the comparison, we have found that reaction orders and gene orders are conserved in the histidine and tryptophan biosynthesis pathways.
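
    The combination of a longest-common-subsequence alignment with an EC-hierarchy score, as described above, can be sketched as a weighted LCS. This is a simplified stand-in under stated assumptions: similarity is taken as the number of leading EC levels two enzymes share (0 to 4), whereas the paper expresses its score as information content. The function names and the example EC strings are illustrative only.

```python
def ec_similarity(ec_a, ec_b):
    """Count the leading EC hierarchy levels shared by two EC numbers."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        if level_a != level_b:
            break
        shared += 1
    return shared


def align_score(path_a, path_b):
    """Weighted LCS score between two pathways given as lists of EC numbers."""
    rows, cols = len(path_a), len(path_b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            pair = ec_similarity(path_a[i - 1], path_b[j - 1])
            table[i][j] = max(
                table[i - 1][j],             # skip a reaction in path_a
                table[i][j - 1],             # skip a reaction in path_b
                table[i - 1][j - 1] + pair,  # align the two reactions
            )
    return table[rows][cols]


if __name__ == "__main__":
    a = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
    b = ["2.7.1.2", "2.7.1.11"]
    print(align_score(a, b))
```

    Because partial EC matches still earn a score, two pathways can align on reactions of the same class even when the exact enzymes differ.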
  • Rapid large-scale oligonucleotide selection for microarrays

    Publication Year: 2002, Page(s): 54 - 63
    Cited by: Papers (3) | Patents (1)

    We present the first algorithm that selects oligonucleotide probes (e.g. 25-mers) for microarray experiments on a large scale. For example, oligos for human genes can be found within 50 hours. This becomes possible by using the longest common substring as a specificity measure for candidate oligos. We present an algorithm based on a suffix array with additional information that is efficient both in terms of memory usage and running time to rank all candidate oligos according to their specificity. We also introduce the concept of master sequences to describe the sequences from which oligos are to be selected. Constraints such as oligo length, melting temperature, and self-complementarity are incorporated in the master sequence at a preprocessing stage and thus kept separate from the main selection problem. As a result, custom oligos can now be designed for any sequenced genome, just as the technology for on-site chip synthesis is becoming increasingly mature.
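
    The specificity measure named above, the longest common substring between a candidate oligo and other sequences, can be illustrated on toy data. Note the substitution: this sketch uses a quadratic dynamic-programming table rather than the suffix-array algorithm the abstract describes, so it shows the measure but not the paper's efficiency. The function names and sequences are hypothetical.

```python
def longest_common_substring_len(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        current = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def rank_candidates(candidates, background):
    """Order candidate oligos from most to least specific.

    A candidate is treated as more specific when its longest common
    substring with every background sequence is shorter.
    """
    def worst_overlap(oligo):
        return max(longest_common_substring_len(oligo, seq) for seq in background)

    return sorted(candidates, key=worst_overlap)


if __name__ == "__main__":
    print(rank_candidates(["ACGT", "TTTT"], ["GGACGTGG"]))
```

    A shorter shared substring suggests less chance of cross-hybridization with non-target sequences, which is why the ranking sorts in ascending order of overlap.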