HAPNEST synthetic dataset
ABSTRACT: This synthetic dataset contains genetic data for 1,008,000 individuals and 9 continuous phenotypic traits with various genetic architectures. It covers 6 ancestry groups (AFR, AMR, CSA, EAS, EUR, MID) and over 6.8 million single nucleotide polymorphisms (SNPs) across 22 chromosomes. The data was generated with the HAPNEST software (https://github.com/intervene-EU-H2020/synthetic_data), developed by members of the INTERVENE consortium (https://www.interveneproject.eu/). HAPNEST is designed for efficient, large-scale generation of synthetic data for common genetic variants and complex phenotypic traits. The software is open source, so anyone can generate their own synthetic datasets; see the linked GitHub repository for further details. The reference dataset used to generate this synthetic dataset is the combined 1000 Genomes Project and Human Genome Diversity Project dataset, downloaded from https://gnomad.broadinstitute.org/downloads. The reference data was preprocessed by retaining only SNPs that have a non-zero minor allele frequency (MAF) in all populations and for which rsID numbers could be successfully aligned, which yielded the more than 6.8 million variants across 22 chromosomes.
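The preprocessing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the HAPNEST pipeline itself: the record layout, the `keep_variant` helper, the toy variant IDs, and the allele frequencies are all hypothetical and chosen only to show the filter logic (drop a SNP if it lacks an rsID or if its MAF is zero in any ancestry group).

```python
from typing import Dict, Optional

# Ancestry group labels as listed in the dataset description.
POPULATIONS = ["AFR", "AMR", "CSA", "EAS", "EUR", "MID"]


def minor_allele_freq(alt_freq: float) -> float:
    """MAF is the frequency of the less common of the two alleles."""
    return min(alt_freq, 1.0 - alt_freq)


def keep_variant(rsid: Optional[str], alt_freqs: Dict[str, float]) -> bool:
    """Hypothetical filter: require an rsID and non-zero MAF in every population."""
    if not rsid:
        return False
    # A population missing from the record is treated as frequency 0 (dropped).
    return all(
        minor_allele_freq(alt_freqs.get(pop, 0.0)) > 0.0 for pop in POPULATIONS
    )


# Toy records with made-up IDs and frequencies (assumptions, not real data).
variants = [
    {"rsid": "rsA", "alt_freqs": {p: 0.25 for p in POPULATIONS}},
    # Monomorphic in EAS, so MAF is zero there and the SNP is dropped.
    {"rsid": "rsB", "alt_freqs": {**{p: 0.25 for p in POPULATIONS}, "EAS": 0.0}},
    # No rsID could be aligned, so the SNP is dropped.
    {"rsid": None, "alt_freqs": {p: 0.5 for p in POPULATIONS}},
]

retained = [v["rsid"] for v in variants if keep_variant(v["rsid"], v["alt_freqs"])]
```

In this toy input only the first record passes both conditions. The actual preprocessing operated on the gnomAD-hosted reference files, so the real implementation in the linked repository may differ in detail.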
ORGANISM(S): Homo sapiens (human)
SUBMITTER:
PROVIDER: S-BSST936 | biostudies-other
REPOSITORIES: biostudies-other