
Dataset Information

CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning.


ABSTRACT: Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads. Efficient consensus tools have emerged in the recent past, based on partial-order alignment. In this study, we discovered that the spatial relationship of alignment pileup is crucial to high-quality consensus and developed a deep learning-based consensus tool, CONNET, which outperforms the fastest tools in terms of both accuracy and speed. We tested CONNET using a 90× dataset of E. coli and a 37× human dataset. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. Diploid consensus on the above-mentioned human assembly further reduced 12% of the consensus errors made in the haploid results.

SUBMITTER: Zhang Y 

PROVIDER: S-EPMC7229283 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning.

Yifan Zhang, Chi-Man Liu, Henry C. M. Leung, Ruibang Luo, Tak-Wah Lam

iScience, 2020 May 1, issue 5


Similar Datasets

S-EPMC8191041 | biostudies-literature
S-EPMC8514461 | biostudies-literature
S-EPMC7545146 | biostudies-literature
S-EPMC7434944 | biostudies-literature
PRJEB31789 | ENA
S-EPMC7997805 | biostudies-literature
S-EPMC6129308 | biostudies-literature
S-EPMC4970289 | biostudies-literature
S-EPMC3673215 | biostudies-literature
S-EPMC6776680 | biostudies-literature