Dataset Information

riboSeed: leveraging prokaryotic genomic architecture to assemble across ribosomal regions.


ABSTRACT: The vast majority of bacterial genome sequencing has been performed using Illumina short reads. Because of the inherent difficulty of resolving repeated regions with short reads alone, only ∼10% of sequencing projects have resulted in a closed genome. The most common repeated regions are those coding for ribosomal operons (rDNAs), which occur in a bacterial genome between 1 and 15 times, and are typically used as sequence markers to classify and identify bacteria. Here, we exploit the genomic context in which rDNAs occur across taxa to improve assembly of these regions relative to de novo sequencing by using the conserved nature of rDNAs across taxa and the uniqueness of their flanking regions within a genome. We describe a method to construct targeted pseudocontigs generated by iteratively assembling reads that map to a reference genome's rDNAs. These pseudocontigs are then used to more accurately assemble the newly sequenced chromosome. We show that this method, implemented as riboSeed, correctly bridges across adjacent contigs in bacterial genome assembly and, when used in conjunction with other genome polishing tools, can assist in closure of a genome.

SUBMITTER: Waters NR 

PROVIDER: S-EPMC6009695 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

riboSeed: leveraging prokaryotic genomic architecture to assemble across ribosomal regions.

Waters NR, Abram F, Brennan F, Holmes A, Pritchard L

Nucleic Acids Research, 2018-06-01, issue 11


Similar Datasets

S-EPMC2134781 | biostudies-literature
S-EPMC3847771 | biostudies-literature
S-EPMC3353972 | biostudies-literature
S-EPMC6314543 | biostudies-literature
S-EPMC6942260 | biostudies-literature
S-EPMC3431384 | biostudies-literature
S-EPMC554972 | biostudies-literature
S-EPMC7936610 | biostudies-literature
S-EPMC7676328 | biostudies-literature
S-EPMC6110191 | biostudies-literature