Dataset Information

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.


ABSTRACT: Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.

SUBMITTER: Ebert P 

PROVIDER: S-EPMC8026704 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature

Publications

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Ebert Peter, Audano Peter A, Zhu Qihui, Rodriguez-Martin Bernardo, Porubsky David, Bonder Marc Jan, Sulovari Arvis, Ebler Jana, Zhou Weichen, Serra Mari Rebecca, Yilmaz Feyza, Zhao Xuefang, Hsieh PingHsun, Lee Joyce, Kumar Sushant, Lin Jiadong, Rausch Tobias, Chen Yu, Ren Jingwen, Santamarina Martin, Höps Wolfram, Ashraf Hufsah, Chuang Nelson T, Yang Xiaofei, Munson Katherine M, Lewis Alexandra P, Fairley Susan, Tallon Luke J, Clarke Wayne E, Basile Anna O, Byrska-Bishop Marta, Corvelo André, Evani Uday S, Lu Tsung-Yu, Chaisson Mark J P, Chen Junjie, Li Chong, Brand Harrison, Wenger Aaron M, Ghareghani Maryam, Harvey William T, Raeder Benjamin, Hasenfeld Patrick, Regier Allison A, Abel Haley J, Hall Ira M, Flicek Paul, Stegle Oliver, Gerstein Mark B, Tubio Jose M C, Mu Zepeng, Li Yang I, Shi Xinghua, Hastie Alex R, Ye Kai, Chong Zechen, Sanders Ashley D, Zody Michael C, Talkowski Michael E, Mills Ryan E, Devine Scott E, Lee Charles, Korbel Jan O, Marschall Tobias, Eichler Evan E

Science (New York, N.Y.), 2021 Feb 25, issue 6537

Similar Datasets

| S-EPMC6467913 | biostudies-literature
| S-EPMC7190621 | biostudies-literature
| S-EPMC7954703 | biostudies-literature
| S-EPMC11222905 | biostudies-literature
| S-EPMC9882142 | biostudies-literature
| S-EPMC4617611 | biostudies-literature
| S-EPMC10754456 | biostudies-literature
| S-EPMC9903802 | biostudies-literature
| S-EPMC8101998 | biostudies-literature
| S-EPMC11535549 | biostudies-literature