Dataset Information

Unique Features of Tandem Repeats in Bacteria.


ABSTRACT: DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins.

IMPORTANCE: We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role. We also provide a list and properties of all satellites in 12,233 genomes, which may be used for further genomic analysis.

SUBMITTER: Subirana JA 

PROVIDER: S-EPMC7549362 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8161180 | biostudies-literature
| S-EPMC10101126 | biostudies-literature
| S-EPMC4053949 | biostudies-literature
| S-EPMC4546671 | biostudies-literature
| S-EPMC3648361 | biostudies-literature
| S-EPMC7302867 | biostudies-literature
| S-EPMC3689786 | biostudies-literature
| S-EPMC3766767 | biostudies-literature
| S-EPMC3975930 | biostudies-literature
| S-EPMC6419916 | biostudies-literature