Dataset Information

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

ABSTRACT: The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
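The NG50 figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of NG50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of an estimated genome size. The function names and the toy contig lengths are hypothetical and are not taken from the study; only the two NG50 values (480.6 kbp and 191.5 kbp) come from the abstract.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly, or None if the contigs
    cover less than half of the estimated genome size."""
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def fold_change(new_value: float, old_value: float) -> float:
    """Ratio of two positive measurements, e.g. two NG50 values."""
    return new_value / old_value


if __name__ == "__main__":
    # Toy contig lengths (hypothetical), with an assumed genome size of 200.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
    # NG50 values within 1 Mbp of the centromere, as reported in the abstract (kbp).
    print(round(fold_change(480.6, 191.5), 1))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so the HiFi and CLR values are directly comparable even though the two assemblies differ in total length.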

SUBMITTER: Vollger MR

PROVIDER: S-EPMC7015760 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

Publications

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

Mitchell R. Vollger, Glennis A. Logsdon, Peter A. Audano, Arvis Sulovari, David Porubsky, Paul Peluso, Aaron M. Wenger, Gregory T. Concepcion, Zev N. Kronenberg, Katherine M. Munson, Carl Baker, Ashley D. Sanders, Diana C. J. Spierings, Peter M. Lansdorp, Urvashi Surti, Michael W. Hunkapiller, Evan E. Eichler

Annals of Human Genetics, 2019 Nov 11, issue 2

PMID: 31711268

Similar Datasets

Benchmarking datasets for assembly-based variant calling using high-fidelity long reads. | S-EPMC10045170 | biostudies-literature

Metagenome assembly of high-fidelity long reads with hifiasm-meta. | S-EPMC9343089 | biostudies-literature

Quantifying the benefit offered by transcript assembly with Scallop-LR on single-molecule long reads. | S-EPMC6918626 | biostudies-literature

Improved assembly of noisy long reads by k-mer validation. | S-EPMC5131822 | biostudies-literature

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. | S-EPMC7545148 | biostudies-literature

Hybrid error correction and de novo assembly of single-molecule sequencing reads. | S-EPMC3707490 | biostudies-literature

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. | S-EPMC5437942 | biostudies-literature

A survey of the sorghum transcriptome using single-molecule long reads. | S-EPMC4931028 | biostudies-literature

Improved transcriptome assembly using a hybrid of long and short reads with StringTie. | S-EPMC9191730 | biostudies-literature

Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly. | S-EPMC7792008 | biostudies-literature