Dataset Information

ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs.


ABSTRACT:

Summary

The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still do not reach reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and with less memory.
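
To make the idea of a graph built from ordered minimizer sketches more concrete, the following is a toy Python sketch and not ntJoin's implementation. It assumes lexicographic ordering of k-mers (real tools typically use hash values), tiny hypothetical values for `k` and `w`, and a simple rule that keeps only minimizers occurring exactly once in every input. The function names `minimizer_sketch`, `shared_unique` and `build_graph` are illustrative and do not come from the ntJoin code base.

```python
from collections import Counter, defaultdict


def minimizer_sketch(seq, k=3, w=2):
    """Return the ordered (k-mer, position) minimizers of seq.

    Assumption: the minimizer of a window of w consecutive k-mers is the
    lexicographically smallest k-mer (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        offset = min(range(w), key=lambda j: kmers[start + j])
        pos = start + offset
        # Consecutive windows often select the same k-mer; record it once.
        if not sketch or sketch[-1][1] != pos:
            sketch.append((kmers[pos], pos))
    return sketch


def shared_unique(sketches):
    """Minimizers that occur exactly once in every input sketch."""
    unique_sets = []
    for sketch in sketches.values():
        counts = Counter(sketch)
        unique_sets.append({m for m, c in counts.items() if c == 1})
    return set.intersection(*unique_sets)


def build_graph(sketches):
    """Map each undirected edge to the set of inputs that support it.

    sketches: dict of input name -> ordered list of minimizer strings.
    An edge joins two minimizers that are adjacent in an input's order
    after filtering to the shared, unique minimizers.
    """
    keep = shared_unique(sketches)
    edges = defaultdict(set)
    for name, sketch in sketches.items():
        kept = [m for m in sketch if m in keep]
        for a, b in zip(kept, kept[1:]):
            edges[frozenset((a, b))].add(name)
    return dict(edges)


if __name__ == "__main__":
    # Hypothetical, hand-written minimizer orders for two inputs.
    toy = {"draft": ["A", "B", "X", "C"], "ref": ["A", "B", "C", "Y"]}
    for edge, support in build_graph(toy).items():
        print(sorted(edge), sorted(support))
```

In this toy setting, an edge supported by both the draft and the reference marks a region where the two agree on minimizer order, while an edge supported only by the reference hints at where draft contigs might be joined. How ntJoin weights and traverses its graph is not described in this record.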

Availability and implementation

ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Coombe L 

PROVIDER: S-EPMC7320612 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs.

Lauren Coombe, Vladimir Nikolić, Justin Chu, Inanc Birol, René L. Warren

Bioinformatics (Oxford, England), 2020 Jun 1, issue 12


Similar Datasets

| S-EPMC10541625 | biostudies-literature
| S-EPMC8562525 | biostudies-literature
| S-EPMC6816165 | biostudies-literature
| S-EPMC8519820 | biostudies-literature
| S-EPMC5069867 | biostudies-literature
| S-EPMC10463629 | biostudies-literature
| S-EPMC7961889 | biostudies-literature
| S-EPMC8017614 | biostudies-literature
| S-EPMC5411778 | biostudies-literature
| S-EPMC10699202 | biostudies-literature