GenoMosaic is a portable database application for on-demand multiple genome comparison. We discuss the methods used to generate a GenoMosaic data set from genome sequence data and present the relational data model used in the application. We define an abstraction of genome sequence data, the feature mosaic, that bridges annotation describing features within single genes and annotation that spans possibly multiple genes and intergenic features over long stretches of genomic sequence. The goal of this project is to support the development of new methods for on-demand multiple genome comparison. Each genome to be compared can be modeled as a string of generic features of any type that can be computationally defined, related by adjacency information within and among genomes. The generic feature abstraction makes it possible to study the arrangement of features in the genome at a level of detail that includes RNA genes, putative regulatory regions, SNPs, overlapping transcripts, intron splice junctions, and alternative polyadenylation signals; in short, it incorporates significant sequence details that are not necessarily within protein-coding regions. This abstraction is amenable to functional implementation as a relational data model upon which novel query capabilities can be built, and it provides objects that can be analyzed using algorithms for comparison of strings and lists. As an initial effort, we have implemented a prototype that uses a representative set of comparative and content-based annotation methods to reduce a collection of prokaryotic genomes to a feature mosaic representation. Entity-relationship modeling was then used to develop a data model capable of storing detailed results, including complete parameters for each instance of analysis.
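To make the feature mosaic idea concrete, the following is a minimal Python sketch, not the GenoMosaic implementation or its schema. It assumes a hypothetical `Feature` record with a `family` label that identifies corresponding features across genomes; the names `mosaic`, `adjacencies`, and `shared_runs` are illustrative only. Each genome is reduced to an ordered list of generic features, adjacency is read off as consecutive pairs, and two genomes are compared as label sequences using the standard library's `difflib.SequenceMatcher` as a stand-in for the string and list comparison algorithms mentioned above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Feature:
    """Hypothetical generic feature; field names are assumptions."""
    genome: str   # identifier of the genome the feature belongs to
    kind: str     # any computationally defined type, e.g. "CDS", "tRNA"
    family: str   # assumed label linking corresponding features across genomes
    start: int    # start coordinate on the genome
    end: int      # end coordinate on the genome


def mosaic(features, genome):
    """Return one genome's features ordered by position (its feature string)."""
    own = [f for f in features if f.genome == genome]
    return sorted(own, key=lambda f: (f.start, f.end))


def adjacencies(ordered):
    """Return consecutive feature pairs, i.e. within-genome adjacency."""
    return list(zip(ordered, ordered[1:]))


def shared_runs(a, b):
    """Return runs of family labels that match between two mosaics.

    Uses difflib's matching blocks; this is one of many possible
    sequence comparison methods, chosen here only for illustration.
    """
    labels_a = [f.family for f in a]
    labels_b = [f.family for f in b]
    matcher = SequenceMatcher(None, labels_a, labels_b, autojunk=False)
    return [
        labels_a[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size > 0
    ]


# Invented toy data, for illustration only.
features = [
    Feature("g1", "CDS", "famA", 100, 400),
    Feature("g1", "tRNA", "famT", 450, 520),
    Feature("g1", "CDS", "famB", 600, 900),
    Feature("g2", "CDS", "famA", 50, 350),
    Feature("g2", "tRNA", "famT", 400, 470),
    Feature("g2", "CDS", "famC", 500, 800),
]

m1 = mosaic(features, "g1")
m2 = mosaic(features, "g2")
runs = shared_runs(m1, m2)
```

Because features are typed only by a free-form `kind`, non-coding elements such as RNA genes or regulatory regions sit in the same ordered list as protein-coding genes. In a relational setting, each `Feature` would plausibly map to a row and each adjacency pair to a row in a linking table, though the actual GenoMosaic data model may differ.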