
Dataset Information


Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.


ABSTRACT: Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

SUBMITTER: Gustafson JA 

PROVIDER: S-EPMC10942501 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature


Publications

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.

Gustafson Jonas A, Gibson Sophia B, Damaraju Nikhita, Zalusky Miranda PG, Hoekzema Kendra, Twesigomwe David, Yang Lei, Snead Anthony A, Richmond Phillip A, De Coster Wouter, Olson Nathan D, Guarracino Andrea, Li Qiuhui, Miller Angela L, Goffena Joy, Anderson Zachery, Storz Sophie HR, Ward Sydney A, Sinha Maisha, Gonzaga-Jauregui Claudia, Clarke Wayne E, Basile Anna O, Corvelo André, Reeves Catherine, Helland Adrienne, Musunuri Rajeeva Lochan, Revsine Mahler, Patterson Karynne E, Paschal Cate R, Zakarian Christina, Goodwin Sara, Jensen Tanner D, Robb Esther, McCombie W Richard, Sedlazeck Fritz J, Zook Justin M, Montgomery Stephen B, Garrison Erik, Kolmogorov Mikhail, Schatz Michael C, McLaughlin Richard N, Dashnow Harriet, Zody Michael C, Loose Matt, Jain Miten, Eichler Evan E, Miller Danny E

medRxiv: the preprint server for health sciences, 2024-03-07



Similar Datasets

| S-EPMC11610458 | biostudies-literature
| S-EPMC3563612 | biostudies-literature
| S-EPMC3167619 | biostudies-literature
| PRJEB56604 | ENA
| S-EPMC11222905 | biostudies-literature
| S-EPMC9882142 | biostudies-literature
2015-07-01 | GSE70188 | GEO
| S-EPMC3277631 | biostudies-literature
| S-EPMC5553676 | biostudies-literature
| S-EPMC11445439 | biostudies-literature