Dataset Information

Long-read sequence and assembly of segmental duplications.


ABSTRACT: We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8% sequence identity) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
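
The graph idea in the abstract can be illustrated with a small toy sketch. This is not SDA's implementation: the input layout, the function names (`build_psv_graph`, `cluster_psvs`, `partition_reads`), and the clustering step (connected components over net-positive edges, swapped in here for simplicity) are all assumptions made for illustration. Nodes are paralogous sequence variants (PSVs); a read that carries two PSVs together adds an attraction edge, and a read that carries one but not the other adds a repulsion edge.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Return {(psv_a, psv_b): net_weight} from per-read PSV calls.

    reads: {read_id: {psv_id: 1 if the read carries the variant, else 0}}
    (hypothetical input layout, not SDA's file format).
    """
    weights = defaultdict(int)
    for calls in reads.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                weights[(a, b)] += 1  # attraction: both variants on one read
            elif calls[a] != calls[b]:
                weights[(a, b)] -= 1  # repulsion: read carries only one of them
    return dict(weights)


def cluster_psvs(weights):
    """Group PSVs joined by net-positive edges (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        root_a, root_b = find(a), find(b)
        if w > 0 and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=min)


def partition_reads(reads, clusters):
    """Assign each read to the cluster whose PSVs it carries most often."""
    assignment = {}
    for read_id, calls in reads.items():
        carried = {psv for psv, present in calls.items() if present}
        scores = [len(carried & cluster) for cluster in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        # Reads carrying no clustered PSV stay unassigned (None).
        assignment[read_id] = best if best is not None and scores[best] > 0 else None
    return assignment
```

Each resulting group of reads would then be assembled separately to recover one paralog. A real implementation would also need to handle sequencing errors and ambiguous reads, which this sketch ignores.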

SUBMITTER: Vollger MR

PROVIDER: S-EPMC6382464 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC7641771 | biostudies-literature
S-EPMC8049597 | biostudies-literature
S-EPMC4920363 | biostudies-literature
S-EPMC154576 | biostudies-literature
S-EPMC7545148 | biostudies-literature
S-EPMC8290290 | biostudies-literature
S-EPMC479105 | biostudies-literature
S-EPMC5751067 | biostudies-literature
S-EPMC2864568 | biostudies-literature
S-EPMC311093 | biostudies-literature