
Dataset Information


LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.


ABSTRACT: Background: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and contain a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches that use long reads require long runtimes and large amounts of memory. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. Findings: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool uses long reads generated from TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed more gaps, ran faster, and had a lower error rate and much lower memory usage than two existing state-of-the-art tools. The tool filled more gaps with raw reads than with error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and to large and complex genomes. After gap closure with this tool, the contig N50 size of the human CHM1 genome improved from 143 kb to 19 Mb, a 132-fold increase. We also closed gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies that combine the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. The optimal hybrid strategy we propose generated a new human CHM1 genome assembly with markedly improved contiguity. Its contig N50 value was greater than 28 Mb, which is larger than that of previous non-reference assemblies of the diploid human genome. Conclusions: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly strategy that includes this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
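
The abstract reports its results as changes in contig N50. As a rough illustration of that metric only, the sketch below splits scaffolds into contigs at runs of `N` and computes the contig N50 before and after a gap is filled. It is not LR_Gapcloser's code; the function names and the toy sequences are hypothetical, and it assumes gaps are represented as runs of `N`.

```python
import re


def split_contigs(scaffold: str) -> list[str]:
    """Split a scaffold into contigs at runs of N (assumed gap placeholders)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def n50(lengths: list[int]) -> int:
    """Return the length L such that pieces of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def contig_n50(scaffolds: list[str]) -> int:
    """Contig N50 of an assembly given as a list of scaffold sequences."""
    lengths = [len(c) for s in scaffolds for c in split_contigs(s)]
    return n50(lengths)


# Toy assembly (hypothetical data): one scaffold with a 10-base gap, one gapless scaffold.
before = ["ACGT" * 5 + "N" * 10 + "ACGT" * 3, "ACGT" * 2]
# The same assembly after the gap is replaced by 10 bases of filled sequence.
after = ["ACGT" * 5 + "ACGTACGTAC" + "ACGT" * 3, "ACGT" * 2]

print(contig_n50(before), contig_n50(after))
```

Closing the gap merges two contigs into one longer contig, which is why gap closure raises the contig N50 without changing the number of scaffolds.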

SUBMITTER: Xu GC 

PROVIDER: S-EPMC6324547 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.

Xu Gui-Cai, Xu Tian-Jun, Zhu Rui, Zhang Yan, Li Shang-Qi, Wang Hong-Wei, Li Jiong-Tang

GigaScience 20190101 1



Similar Datasets

| S-EPMC7476103 | biostudies-literature
| S-EPMC7573059 | biostudies-literature
| S-EPMC10997618 | biostudies-literature
| S-EPMC7541610 | biostudies-literature
| S-EPMC8281077 | biostudies-literature
| S-EPMC8557608 | biostudies-literature
| S-EPMC4460631 | biostudies-literature
| S-EPMC5411768 | biostudies-literature
| S-EPMC5889714 | biostudies-literature
| S-EPMC4199626 | biostudies-literature