Genomics

Dataset Information


De novo assembly of 150 Danish genomes reveals rich structural complexity


ABSTRACT: Most known genetic variation in human genomes has been called by comparing short reads to the reference genome, an approach that is biased against finding complex variation. We sequenced 150 individuals from 50 parent-offspring trios to very high coverage using multiple insert-size libraries. We show that each genome could be independently assembled de novo into a small number of high-quality scaffolds (median N50 > 21 Mb), with quality comparable to long-read assemblies while remaining very cost-effective. We show that our variant call set, obtained by comparing de novo assemblies, is far more complete with respect to complex variation than those of previous studies. Importantly, even the complex 4-5 Mb extended MHC region was assembled and resolved into haplotypes, revealing >700 kb of novel sequence in this important region of the genome, and major parts of the Y chromosome, including some palindromes, were assembled with high accuracy. Finally, we show that our variant call set allows many more complex variants to be genotyped when used as a reference panel for imputation into SNP-chip data or into previously resequenced genomes.
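The abstract summarizes assembly contiguity as a median N50 across genomes. As a rough illustration, assuming the standard definition (N50 is the largest length L such that scaffolds of length L or longer together cover at least half of the total assembly length), the Python sketch below computes N50 per assembly and then takes the median across assemblies. The function name `n50` and the toy scaffold lengths are hypothetical and not taken from the study; the sketch also treats lengths as given and ignores gaps (runs of N) inside scaffolds.

```python
from statistics import median


def n50(lengths):
    """Return the N50 of a list of scaffold lengths: the largest length L such
    that scaffolds of length >= L together cover at least half the total."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies (scaffold lengths in Mb), one list per genome.
assemblies = {
    "genome_a": [40, 30, 20, 10],
    "genome_b": [25, 25, 25, 25],
    "genome_c": [50, 5, 5],
}

per_genome = {name: n50(lengths) for name, lengths in assemblies.items()}
print(per_genome)                     # {'genome_a': 30, 'genome_b': 25, 'genome_c': 50}
print(median(per_genome.values()))    # 30
```

Sorting in descending order and stopping at the first scaffold that pushes cumulative length past half the total is the usual way to compute N50; reporting the median over genomes, as the abstract does, keeps one unusually fragmented or unusually contiguous assembly from dominating the summary.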

PROVIDER: EGAS00001002108 | EGA

REPOSITORIES: EGA


Publications

Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Lasse Maretty, Jacob Malte Jensen, Bent Petersen, Jonas Andreas Sibbesen, Siyang Liu, Palle Villesen, Laurits Skov, Kirstine Belling, Christian Theil Have, Jose M. G. Izarzugaza, Marie Grosjean, Jette Bork-Jensen, Jakob Grove, Thomas D. Als, Shujia Huang, Yuqi Chang, Ruiqi Xu, Weijian Ye, Junhua Rao, Xiaosen Guo, Jihua Sun, Hongzhi Cao, Chen Ye, Johan van Beusekom, Thomas Espeseth, Esben Flindt, Rune M. Friborg, Anders E. Halager, Stephanie Le Hellard, Christina M. Hultman, Francesco Lescai, Shengting Li, Ole Lund, Peter Løngren, Thomas Mailund, Maria Luisa Matey-Hernandez, Ole Mors, Christian N. S. Pedersen, Thomas Sicheritz-Pontén, Patrick Sullivan, Ali Syed, David Westergaard, Rachita Yadav, Ning Li, Xun Xu, Torben Hansen, Anders Krogh, Lars Bolund, Thorkild I. A. Sørensen, Oluf Pedersen, Ramneek Gupta, Simon Rasmussen, Søren Besenbacher, Anders D. Børglum, Jun Wang, Hans Eiberg, Karsten Kristiansen, Søren Brunak, Mikkel Heide Schierup

Nature, 2017-07-26, issue 7665


Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly i...

Similar Datasets

Date | Accession | Repository
| PRJNA1162264 | ENA
| PRJEB83624 | ENA
| PRJEB76276 | ENA
2021-10-14 | PXD029123 |
2021-02-01 | GSE165787 | GEO
| PRJNA1065107 | ENA
2018-10-15 | GSE120677 | GEO
2020-09-01 | GSE157183 | GEO
| PRJEB20074 | ENA
| PRJNA435626 | ENA