Dataset Information

Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements.


ABSTRACT: High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5-18.5 Kbp with an extremely low error rate (~0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.
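The abstract reports assembly quality as an N50 contig size and TE recovery as a fraction of annotated elements. As a rough illustration of how these two figures are conventionally computed, here is a minimal Python sketch. It assumes the common definition of N50 (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length). The function name `n50` and the toy contig lengths are hypothetical and are not taken from the study; only the counts 4,229 and 5,434 come from the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp).

    Assumed definition: walk contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths in bp (illustration only, not study data).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# TE recovery reported in the abstract: 4,229 of 5,434 annotated TEs.
te_recovered_pct = round(100 * 4229 / 5434, 1)

print(toy_n50, te_recovered_pct)
```

For the toy input, the total is 300 bp, so the running sum crosses the halfway point at the second-longest contig. The percentage computed from the abstract's counts matches the 77.8% it reports.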

SUBMITTER: McCoy RC 

PROVIDER: S-EPMC4154752 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements.

McCoy RC, Taylor RW, Blauwkamp TA, Kelley JL, Kertesz M, Pushkarev D, Petrov DA, Fiston-Lavier AS

PLoS ONE, 2014-09-04, 9

Similar Datasets

S-EPMC8279952 | biostudies-literature
PXD025736 | Pride | 2021-06-29
S-EPMC4948706 | biostudies-literature
S-EPMC6545690 | biostudies-literature
S-EPMC4909310 | biostudies-literature
S-EPMC1232128 | biostudies-literature
S-EPMC4471408 | biostudies-literature
S-EPMC3852290 | biostudies-literature
S-EPMC2817418 | biostudies-literature
S-EPMC6042816 | biostudies-other