Dataset Information

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.


ABSTRACT: The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
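
For readers unfamiliar with the contig N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed from a list of contig lengths. The function name `contig_n50` and the example lengths are illustrative assumptions and are not taken from the study or its software.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downward, the running total first reaches at least half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")

    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2            # half of the assembly size

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = contig_n50(example_lengths)
```

A higher N50 means that more of the assembly sits in long contigs, which is why the abstract uses it as a measure of contiguity.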

SUBMITTER: Wenger AM 

PROVIDER: S-EPMC6776680 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC10777354 | biostudies-literature
S-EPMC11246426 | biostudies-literature
S-EPMC9900919 | biostudies-literature
S-EPMC6788989 | biostudies-literature
S-EPMC7004874 | biostudies-literature
S-EPMC10144670 | biostudies-literature
S-EPMC8590762 | biostudies-literature
S-EPMC10123983 | biostudies-literature
S-EPMC11322167 | biostudies-literature