Dataset Information

Sealer: a scalable gap-closing application for finishing draft genomes.

ABSTRACT:

Background

While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment "gaps": uncharacterized nucleotide (N) stretches of unknown or estimated length. Some of these gaps can be closed by re-processing latent information in the raw reads. Although several gap-closing tools exist, they do not scale easily to genomes of billions of base pairs.
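
To make the notion of a gap concrete, the following minimal sketch locates runs of N in a scaffold sequence. It is only an illustration of the definition above, not part of Sealer; the function name `find_gaps` and the toy scaffold are assumptions introduced here.

```python
import re

def find_gaps(scaffold):
    """Return half-open (start, end) intervals for every run of N/n in a scaffold."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", scaffold)]

# Hypothetical toy scaffold with two gaps (5 Ns and 2 Ns).
scaffold = "ACGTACGTNNNNNGGCTANNTTGA"
gaps = find_gaps(scaffold)
for start, end in gaps:
    print(f"gap at {start}-{end}, length {end - start}")
```

The sequence on either side of each interval is the flanking context a gap-closing tool would try to connect.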

Results

Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively, a feat that is not possible with other leading tools given the breadth of data used in our study.
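
The general idea of navigating a de Bruijn graph stored in a Bloom filter can be sketched as follows. This is a simplified illustration under stated assumptions, not Sealer's actual implementation: the `BloomFilter` class, `build_filter`, `close_gap`, the hash scheme, the breadth-first search, and the toy reads are all hypothetical, and a production tool would also handle reverse complements, false positives, and ambiguous paths.

```python
import hashlib
from collections import deque

class BloomFilter:
    """Toy Bloom filter over strings (assumed design, for illustration only)."""
    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

def build_filter(reads, k):
    """Insert every k-mer of every read; the filter implicitly encodes the graph."""
    bloom = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom

def close_gap(left_flank, right_flank, bloom, k, max_fill=50):
    """Breadth-first search from the last k-mer of the left flank to the first
    k-mer of the right flank. Returns the bases between the two k-mers for the
    first path found, or None. Assumes k >= 2 and flanks at least k long."""
    start, goal = left_flank[-k:], right_flank[:k]
    queue = deque([start])
    seen = {start}
    while queue:
        path = queue.popleft()
        if len(path) >= max_fill + 2 * k:  # any extension would exceed max_fill
            continue
        tail = path[-(k - 1):]
        for base in "ACGT":
            nxt = tail + base
            # Graph edges are implied: a neighbour exists if the filter reports it.
            if nxt in seen or nxt not in bloom:
                continue
            if nxt == goal:
                return (path + base)[k:-k]
            seen.add(nxt)
            queue.append(path + base)
    return None

# Hypothetical toy data: three overlapping reads and a scaffold with one gap.
reads = ["ATGCGTACCT", "GTACCTGAAG", "CTGAAGTCCA"]
bloom = build_filter(reads, k=5)
fill = close_gap("ATGCGT", "AGTCCA", bloom, k=5)
print(fill)
```

Because a Bloom filter stores only bits rather than the k-mers themselves, memory use stays small at the cost of occasional false-positive k-mers, which is the trade-off that makes this representation attractive for very large genomes.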

Conclusion

Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including those of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.

SUBMITTER: Paulino D

PROVIDER: S-EPMC4515008 | biostudies-literature | 2015 Jul

REPOSITORIES: biostudies-literature

Publications

Sealer: a scalable gap-closing application for finishing draft genomes.

Paulino D, Warren RL, Vandervalk BP, Raymond A, Jackman SD, Birol I

BMC Bioinformatics, 2015 Jul 25

PMID: 26209068

Similar Datasets

LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. | S-EPMC4524009 | biostudies-literature

Closing the gap between knowledge and clinical application: challenges for genomic translation. | S-EPMC4342348 | biostudies-other

GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads. | S-EPMC6551238 | biostudies-literature

ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long-Read Genome Assemblies. | S-EPMC9196995 | biostudies-literature

RFfiller: a robust and fast statistical algorithm for gap filling in draft genomes. | S-EPMC9575681 | biostudies-literature

Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences. | S-EPMC7407462 | biostudies-literature

FGAP: an automated gap closing tool. | S-EPMC4091766 | biostudies-literature

Topological phase transition without gap closing. | S-EPMC3784957 | biostudies-literature

A transcript finishing initiative for closing gaps in the human transcriptome. | S-EPMC442158 | biostudies-literature

Closing the gap between professional teaching and practice. | S-EPMC1119888 | biostudies-literature