Textual analysis of microbial genomes reveals footprints of their early evolution. It is shown that the distributions of the frequency of occurrence of words shorter than nine letters in genomes have widths that are many times those of Poisson distributions. This phenomenon suggests a simple, biologically plausible model for the growth of genomes: the genome first grows randomly to an initial length of approximately one thousand nucleotides (1 kb), or about one thousandth of its final length, and thereafter grows mainly by random short segmental duplication. We show that, with duplicated segments averaging around 25 b, sequences generated by this model possess statistical properties characteristic of present-day genomes. Both the initial length and the duplicated-segment length support an RNA world at the time duplication began.
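The growth model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: segment lengths are drawn uniformly so that they average roughly 25 b, each copy is inserted at a uniformly random position, the final length is scaled down to keep the run short, and the variance-to-mean ratio of word counts (equal to 1 for a Poisson distribution) stands in for the "width" comparison. The function names `random_sequence`, `grow_by_duplication`, and `dispersion_ratio` are hypothetical.

```python
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def random_sequence(length, rng):
    """Return a random nucleotide string with equal base probabilities."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def grow_by_duplication(seed, final_length, mean_segment, rng):
    """Grow `seed` to `final_length` by copying random short segments.

    Assumption: segment length is uniform on [1, 2*mean_segment - 1]
    (mean = mean_segment, before truncation near the end), and the copy
    is inserted at a uniformly random position. The source does not
    specify these details.
    """
    if not seed:
        raise ValueError("seed sequence must be non-empty")
    seq = list(seed)
    while len(seq) < final_length:
        seg_len = rng.randint(1, 2 * mean_segment - 1)
        # Never copy more than exists or overshoot the target length.
        seg_len = min(seg_len, len(seq), final_length - len(seq))
        start = rng.randrange(len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]
        insert_at = rng.randrange(len(seq) + 1)
        seq[insert_at:insert_at] = segment
    return "".join(seq)


def dispersion_ratio(seq, k):
    """Variance-to-mean ratio of the counts of all 4**k words of length k.

    Counts are taken over overlapping windows; words that never occur
    are included with a count of zero.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    values = [counts.get("".join(w), 0) for w in product(ALPHABET, repeat=k)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var / mean


if __name__ == "__main__":
    rng = random.Random(1)
    final_length = 50_000  # scaled down from the ~1000x growth in the text
    seed = random_sequence(1000, rng)          # random initial growth to 1 kb
    model = grow_by_duplication(seed, final_length, 25, rng)
    control = random_sequence(final_length, rng)
    for k in (2, 4, 6):
        print(k, round(dispersion_ratio(model, k), 2),
              round(dispersion_ratio(control, k), 2))
```

Under these assumptions, the sequence grown by duplication should show a word-count distribution far broader than that of the purely random control of the same length, whose ratio stays near 1. The exact values depend on the random seed and on the assumed segment-length distribution.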