Dataset Information

How repetitive are genomes?


ABSTRACT:

Background

Genome sequences vary strongly in their repetitiveness, and the causes of this variation are still debated. Here we propose a novel measure of genome repetitiveness, the index of repetitiveness, Ir, which can be computed in time proportional to the length of the sequences analyzed. We apply it to 336 genomes from all three domains of life.

Results

The expected value of Ir is zero for random sequences of any G/C content and greater than zero for sequences with excess repeats. We find that the Ir of archaea is significantly smaller than that of eubacteria, which in turn is smaller than that of eukaryotes. Mouse chromosomes have a significantly higher Ir than human chromosomes, and within each genome the Y chromosome is the most repetitive. A sliding window analysis reveals that the human HOXA cluster and two surrounding genes are characterized by local minima in Ir. A program for calculating Ir is freely available at http://adenine.biz.fh-weihenstephan.de/ir/.

Conclusion

The general measure of DNA repetitiveness proposed in this paper can be efficiently computed on a genomic scale. Applied to diverse genomes, it reveals a broad spectrum of repetitiveness that agrees qualitatively with previous studies of repeat content. A sliding window analysis helps to characterize the intragenomic distribution of repeats.
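
The sliding window idea mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the definition of Ir, so the code below does not compute it. Instead it uses a hypothetical stand-in score, the fraction of k-mer positions whose k-mer occurs more than once within the window, which, unlike Ir, is not corrected for G/C content or for the repeats expected by chance. The function names and the `window`, `step`, and `k` parameters are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def repeat_fraction(seq: str, k: int = 8) -> float:
    """Stand-in score (NOT the paper's Ir): share of k-mer positions whose
    k-mer occurs more than once in seq. 0.0 means no repeated k-mers."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n


def sliding_window(
    seq: str, window: int, step: int, k: int = 8
) -> Iterator[Tuple[int, float]]:
    """Yield (start, score) for every full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, repeat_fraction(seq[start:start + window], k)


def local_minima(scores: List[float]) -> List[int]:
    """Indices of windows scoring strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]
```

Calling `list(sliding_window(seq, window, step))` on a chromosome-sized string gives a profile whose local minima mark comparatively repeat-poor regions under this stand-in score. The authors' program at the URL given in the Results should be used for actual Ir values.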

SUBMITTER: Haubold B 

PROVIDER: S-EPMC1769404 | biostudies-literature | 2006 Dec

REPOSITORIES: biostudies-literature

Publications

How repetitive are genomes?

Bernhard Haubold, Thomas Wiehe

BMC Bioinformatics, 2006-12-22


Similar Datasets

S-EPMC208921 | biostudies-other
S-EPMC310657 | biostudies-literature
S-EPMC5983011 | biostudies-literature
S-EPMC4540582 | biostudies-literature
S-EPMC4241210 | biostudies-literature
S-EPMC5901494 | biostudies-other
S-EPMC6052550 | biostudies-literature
S-EPMC4705657 | biostudies-literature
S-EPMC2670323 | biostudies-literature
S-EPMC3059204 | biostudies-literature