This work addresses two principles that will be integral to the post-genomic, or proteomic, era (i.e., the period after sequencing). The first is that any analysis of data from or related to the Human Genome Project will need to be designed with high throughput in mind. The sequence information alone will encompass some 3 billion nucleotides, and that figure does not include information about introns, exons, promoters, and many other features of interest. The volume of information that must be synthesized is therefore even larger than the genome itself, and it is diverse in nature: it includes sequence, structural, functional, and localization information for each gene, and each of those categories has its own levels of organization (e.g., functional information for a protein can be obtained at the molecular, cellular, and organismal levels). Computational analysis must be able to handle all of these data in a reasonable amount of time. The second principle, already implicit in the first, is that analysis techniques must incorporate data from a variety of sources. To be maximally useful, archiving and indexing of sequence data, for example, must include sequences from multiple organisms and from both diseased and healthy states. The other levels of information, including structure, function, and localization, will need to be similarly organized.