Dataset Information

TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.


ABSTRACT: BACKGROUND: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years, single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited. FINDINGS: We developed a software tool to close sequence gaps in genome assemblies, TGS-GapCloser, that uses low-depth (~10×) long single-molecule reads. The algorithm extracts reads that bridge gap regions between 2 contigs within a scaffold, error-corrects only the candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ~10× coverage of Oxford Nanopore or Pacific Biosciences reads, covering up to 94.8% of gaps with sequence data at 97.7% positive predictive value. These improved assemblies achieve 99.998% (Q46) single-base accuracy, with final inserted sequences having 99.97% (Q35) accuracy, despite the high raw error rate of single-molecule reads, enabling high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as the ginkgo (~12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data. CONCLUSIONS: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
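The two kinds of metric quoted in the abstract, Phred-scaled accuracy (Q46, Q35) and scaftig NG50, can be illustrated with a minimal sketch. It is not taken from TGS-GapCloser: the function names, the toy sequences, and the toy genome size are assumptions made here for illustration, and a scaftig is assumed to be a stretch of scaffold sequence uninterrupted by N characters.

```python
import math
import re


def phred_from_accuracy(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10.0 * math.log10(1.0 - accuracy)


def scaftig_lengths(scaffolds):
    """Split each scaffold at runs of N (gaps) and return the piece lengths."""
    lengths = []
    for seq in scaffolds:
        lengths.extend(len(piece) for piece in re.split(r"[Nn]+", seq) if piece)
    return lengths


def ng50(lengths, genome_size):
    """Largest length L such that pieces of length >= L cover half the genome."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= genome_size:
            return length
    return 0  # the pieces cover less than half of the assumed genome size


# Accuracies quoted in the abstract; rounding down gives Q46 and Q35.
print(int(phred_from_accuracy(0.99998)), int(phred_from_accuracy(0.9997)))

# Hypothetical toy assembly: one scaffold with a 4-base gap, one without.
before = ["ACGTACGTNNNNACGT", "ACGTAC"]
after = ["ACGTACGTTTACGT", "ACGTAC"]  # same scaffold with the gap filled
print(ng50(scaftig_lengths(before), 20), ng50(scaftig_lengths(after), 20))
```

Filling the gap merges two short scaftigs into one longer piece, which is why closing gaps raises the scaftig NG50 without changing the scaffolds themselves.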

SUBMITTER: Xu M 

PROVIDER: S-EPMC7476103 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.

Xu Mengyang, Guo Lidong, Gu Shengqiang, Wang Ou, Zhang Rui, Peters Brock A, Fan Guangyi, Liu Xin, Xu Xun, Deng Li, Zhang Yongwei

GigaScience, 2020-09-01, 9


Similar Datasets

| S-EPMC5661950 | biostudies-literature
| S-EPMC2978382 | biostudies-literature
| S-EPMC6362602 | biostudies-literature
| S-EPMC10718184 | biostudies-literature
| S-EPMC5206522 | biostudies-literature
| S-EPMC9750119 | biostudies-literature
| S-EPMC4403973 | biostudies-literature
| S-EPMC5905663 | biostudies-literature
| S-EPMC6966875 | biostudies-literature
| S-EPMC4615873 | biostudies-literature