Dataset Information

How repetitive are genomes?

ABSTRACT:

Background

Genome sequences vary strongly in their repetitiveness and the causes for this are still debated. Here we propose a novel measure of genome repetitiveness, the index of repetitiveness, Ir, which can be computed in time proportional to the length of the sequences analyzed. We apply it to 336 genomes from all three domains of life.

Results

The expected value of Ir is zero for random sequences of any G/C content and greater than zero for sequences with excess repeats. We find that the Ir of archaea is significantly smaller than that of eubacteria, which in turn is smaller than that of eukaryotes. Mouse chromosomes have a significantly higher Ir than human chromosomes and within each genome the Y chromosome is most repetitive. A sliding window analysis reveals that the human HOXA cluster and two surrounding genes are characterized by local minima in Ir. A program for calculating the Ir is freely available at http://adenine.biz.fh-weihenstephan.de/ir/.

Conclusion

The general measure of DNA repetitiveness proposed in this paper can be efficiently computed on a genomic scale. This reveals a broad spectrum of repetitiveness among diverse genomes which agrees qualitatively with previous studies of repeat content. A sliding window analysis helps to analyze the intragenomic distribution of repeats.
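The sliding window analysis mentioned above can be sketched roughly as follows. This is only an illustration and not the authors' program: the abstract does not define how Ir is computed, so the sketch substitutes a simple stand-in score (the fraction of k-mer positions in a window whose k-mer occurs more than once in that window). The function names, the window size, the step and the value of k are all hypothetical choices made for this example.

```python
from collections import Counter


def repeated_kmer_fraction(seq: str, k: int = 4) -> float:
    """Stand-in repetitiveness score (NOT the Ir from the paper).

    Returns the fraction of k-mer positions in `seq` whose k-mer
    occurs more than once within `seq`.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n


def sliding_window_scores(seq: str, window: int, step: int, k: int = 4):
    """Yield (window_start, score) for every full window along `seq`."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, repeated_kmer_fraction(seq[start:start + window], k)


def local_minima(scores):
    """Return the start positions of windows scoring strictly below both neighbours."""
    return [
        scores[i][0]
        for i in range(1, len(scores) - 1)
        if scores[i][1] < scores[i - 1][1] and scores[i][1] < scores[i + 1][1]
    ]


if __name__ == "__main__":
    # Toy sequence: a non-repetitive stretch flanked by two homopolymer runs.
    toy = "AAAAAAAA" + "CGTTGCAG" + "AAAAAAAA"
    scores = list(sliding_window_scores(toy, window=8, step=8, k=4))
    print(scores)
    print(local_minima(scores))
```

In this toy input the middle window is the only local minimum, which is the kind of signal the abstract reports for the human HOXA cluster, although with a different measure and at a very different scale.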

SUBMITTER: Haubold B

PROVIDER: S-EPMC1769404 | biostudies-literature | 2006 Dec

REPOSITORIES: biostudies-literature

Publications

How repetitive are genomes?

Haubold B, Wiehe T

BMC Bioinformatics, 2006-12-22

PMID: 17187668

Similar Datasets

Project description: Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding. In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.

Project description: Background: Some clover species, particularly Trifolium subterraneum, have previously been reported to have highly unusual plastomes, relative to closely related legumes, enlarged with many duplications, gene losses and the presence of DNA unique to Trifolium, which may represent horizontal transfer. In order to pinpoint the evolutionary origin of this phenomenon within the genus Trifolium, we sequenced and assembled the plastomes of eight additional Trifolium species widely sampled from across the genus. Results: The Trifolium plastomes fell into two groups: those of Trifolium boissieri, T. strictum and T. glanduliferum (representing subgenus Chronosemium and subg. Trifolium section Paramesus) were tractable, assembled readily and were not unusual in the general context of Fabeae plastomes. The other Trifolium species ("core Trifolium") proved refractory to assembly mainly because of numerous short duplications. These species form a single clade, which we call the "refractory clade" (comprising subg. Trifolium sections Lupinaster, Trifolium, Trichocephalum, Vesicastrum and Trifoliastrum). The characteristics of the refractory clade are the presence of numerous short duplications and 7-15% longer genomes than the tractable species. Molecular dating estimates that the origin of the most recent common ancestor (MRCA) of the refractory clade is approximately 13.1 million years ago (MYA). This is considerably younger than the estimated MRCA ages of Trifolium (c. 18.6 MYA) and Trifolium subg. Trifolium (16.1 MYA). Conclusions: We conclude that the unusual repetitive plastome type previously characterized in Trifolium subterraneum had a single origin within Trifolium and is characteristic of most (but not all) species of subgenus Trifolium. It appears that an ancestral plastome within Trifolium underwent an evolutionary change resulting in plastomes that either actively promoted, were permissive to, or were unable to control, duplications within the genome. The precise mechanism of this important change in the mode and tempo of plastome evolution deserves further investigation.

Project description: Background: The magnitude of noncoding DNA in organelle genomes can vary significantly; it is argued that much of this variation is attributable to the dissemination of selfish DNA. The results of a previous study indicate that the mitochondrial DNA (mtDNA) of the green alga Volvox carteri abounds with palindromic repeats, which appear to be selfish elements. We became interested in the evolution and distribution of these repeats when, during a cursory exploration of the V. carteri nuclear DNA (nucDNA) and plastid DNA (ptDNA) sequences, we found palindromic repeats with similar structural features to those of the mtDNA. Upon this discovery, we decided to investigate the diversity and evolutionary implications of these palindromic elements by sequencing and characterizing large portions of mtDNA and ptDNA and then comparing these data to the V. carteri draft nuclear genome sequence. Results: We sequenced 30 and 420 kilobases (kb) of the mitochondrial and plastid genomes of V. carteri, respectively, resulting in partial assemblies of these genomes. The mitochondrial genome is the most bloated green-algal mtDNA observed to date: ~61% of the sequence is noncoding, most of which consists of short palindromic repeats spread throughout the intergenic and intronic regions. The plastid genome is the largest (>420 kb) and most expanded (>80% noncoding) ptDNA sequence yet discovered, with a myriad of palindromic repeats in the noncoding regions, which have a similar size and secondary structure to those of the mtDNA. We found that 15 kb (~0.01%) of the nuclear genome are homologous to the palindromic elements of the mtDNA, and 50 kb (~0.05%) are homologous to those of the ptDNA. Conclusion: Selfish elements in the form of short palindromic repeats have propagated in the V. carteri mtDNA and ptDNA, resulting in the distension of these genomes. Copies of these same repeats are also found in a small fraction of the nucDNA, but appear to be inert in this compartment. We conclude that the palindromic repeats in V. carteri represent a single class of selfish DNA and speculate that the derivation of this element involved the lateral gene transfer of an organelle intron that first appeared in the mitochondrial genome, spreading to the ptDNA through mitochondrion-to-plastid DNA migrations, and eventually arrived in the nucDNA through organelle-to-nucleus DNA transfer events. The overall implications of palindromic repeats on the evolution of chlorophyte organelle genomes are discussed.

Project description: Eukaryotic genomes are packaged into chromatin structures that play pivotal roles in regulating all DNA-associated processes. Histone posttranslational modifications modulate chromatin structure and function, leading to rapid regulation of gene expression and genome stability, key steps in environmental adaptation. Candida albicans, a prevalent fungal pathogen in humans, can rapidly adapt and thrive in diverse host niches. The contribution of chromatin to C. albicans biology is largely unexplored. Here, we generated the first comprehensive chromatin profile of histone modifications (histone H3 trimethylated on lysine 4 [H3K4me3], histone H3 acetylated on lysine 9 [H3K9Ac], acetylated lysine 16 on histone H4 [H4K16Ac], and γH2A) across the C. albicans genome and investigated its relationship to gene expression by harnessing genome-wide sequencing approaches. We demonstrated that gene-rich nonrepetitive regions are packaged into canonical euchromatin in association with histone modifications that mirror their transcriptional activity. In contrast, repetitive regions are assembled into distinct chromatin states; subtelomeric regions and the ribosomal DNA (rDNA) locus are assembled into heterochromatin, while major repeat sequences and transposons are packaged in chromatin that bears features of euchromatin and heterochromatin. Genome-wide mapping of γH2A, a marker of genome instability, identified potential recombination-prone genomic loci. Finally, we present the first quantitative chromatin profiling in C. albicans to delineate the role of the chromatin modifiers Sir2 and Set1 in controlling chromatin structure and gene expression. This report presents the first genome-wide chromatin profiling of histone modifications associated with the C. albicans genome. These epigenomic maps provide an invaluable resource to understand the contribution of chromatin to C. albicans biology and identify aspects of C. albicans chromatin organization that differ from that of other yeasts.

IMPORTANCE: The fungus Candida albicans is an opportunistic pathogen that normally lives on the human body without causing any harm. However, C. albicans is also a dangerous pathogen responsible for millions of infections annually. C. albicans is such a successful pathogen because it can adapt to and thrive in different environments. Chemical modifications of chromatin, the structure that packages DNA into cells, can allow environmental adaptation by regulating gene expression and genome organization. Surprisingly, the contribution of chromatin modification to C. albicans biology is still largely unknown. For the first time, we analyzed C. albicans chromatin modifications on a genome-wide basis. We demonstrate that specific chromatin states are associated with distinct regions of the C. albicans genome and identify the roles of the chromatin modifiers Sir2 and Set1 in shaping C. albicans chromatin and gene expression.
