Dataset Information

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.


ABSTRACT: Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

SUBMITTER: Porubsky D

PROVIDER: S-EPMC7954704 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4720449 | biostudies-literature
| S-EPMC5503144 | biostudies-literature
| S-EPMC6836740 | biostudies-literature
| S-EPMC5889714 | biostudies-literature
| S-EPMC3707490 | biostudies-literature
| S-EPMC5221426 | biostudies-literature
| S-EPMC7419660 | biostudies-literature
| S-EPMC7696006 | biostudies-literature
| S-EPMC6918626 | biostudies-literature
| S-EPMC5004134 | biostudies-literature