Dataset Information

Accurate Haplotype-Resolved Assembly Reveals The Origin Of Structural Variants For Human Trios.


ABSTRACT:

Motivation

Achieving a near-complete understanding of how an individual's genome affects that individual's phenotypes requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging.

Results

To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads as maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to co-barcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over the commonly used tool for assembling co-barcoded reads (Supernova), and are comparable to those of a trio-binning-based, third-generation long-read assembly method (TrioCanu), but with a significantly higher single-base accuracy (up to 99.99997%, or Q65). This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies.
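
The trio-binning idea described above can be illustrated with a minimal sketch. This is not HAST's implementation: it assumes a simple scheme in which k-mers found in only one parent vote for that parent, it classifies individual reads rather than co-barcoded read groups, and it ignores reverse complements and sequencing errors. The function names, the value of k, and the toy sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    Assumption: a k-mer seen in exactly one parent marks that
    parent's haplotype; k-mers shared by both are uninformative.
    """
    mat = set().union(*(kmers(s, k) for s in maternal_seqs))
    pat = set().union(*(kmers(s, k) for s in paternal_seqs))
    return mat - pat, pat - mat


def classify_read(read, mat_only, pat_only, k):
    """Assign a read to the parent with more specific k-mer hits."""
    read_kmers = kmers(read, k)
    m_hits = len(read_kmers & mat_only)
    p_hits = len(read_kmers & pat_only)
    if m_hits > p_hits:
        return "maternal"
    if p_hits > m_hits:
        return "paternal"
    return "unassigned"  # no evidence, or a tie


# Toy example with hypothetical sequences.
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTTT"], ["ACGTACGAAA"], K)
bins = {
    read: classify_read(read, mat_only, pat_only, K)
    for read in ["TACGTTT", "TACGAAA", "ACGTACG"]
}
```

In this sketch, reads that carry no parent-specific k-mers stay unassigned; each remaining bin would then be assembled independently to give one parent-specific haplotype.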

Availability

The code of the analysis is available at https://github.com/BGI-Qingdao/HAST.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Xu M 

PROVIDER: S-EPMC8613828 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC7954703 | biostudies-literature
| S-EPMC9077962 | biostudies-literature
| S-EPMC6467913 | biostudies-literature
| S-EPMC8026704 | biostudies-literature
2022-11-15 | GSE192502 | GEO
| S-EPMC7961889 | biostudies-literature
| S-EPMC9464699 | biostudies-literature
2022-11-15 | GSE192499 | GEO
2022-11-15 | GSE192501 | GEO
| S-EPMC6476705 | biostudies-literature