Project description:Only a few scattered groups with oral traditions of Khoe-San hunter-gatherer ancestry remain in southeastern Africa. We investigate genomic variation of remaining individuals from two South African groups with oral histories connecting them to eastern San groups, i.e., the San from Lake Chrissie and the Duma San of the uKhahlamba-Drakensberg. Using ~2.2 million genetic markers, combined with comparative published datasets, we show that the Lake Chrissie San have genetic ancestry from both Khoe-San (likely the ||Xegwi San) and Bantu speakers. Specifically, we found that the Lake Chrissie San are closely related to current southern San groups (i.e., the Karretjie People). Duma San individuals, on the other hand, were genetically similar to southeastern Bantu speakers from South Africa. Samples were genotyped on the Illumina Omni2.5M (HumanOmni25-8v1-2_A1) SNP chip. Results were analyzed using GenomeStudio 2011.1, and the data were exported to PLINK format, aligned to human genome build 37.
Project description:The genetic structure of the indigenous hunter-gatherer peoples of Southern Africa, the oldest known lineage of modern man, holds an important key to understanding humanity's early history. Previously sequenced human genomes have been limited to recently diverged populations. Here we present the first complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and of a Bantu from Southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, and 13,146 novel amino-acid variants. These data allow genetic relationships among Southern African foragers and neighboring agriculturalists to be traced more accurately than was previously possible. Adding the described variants to current databases will facilitate inclusion of Southern Africans in medical research efforts.
Project description:The history of click-speaking Khoe-San, and African populations in general, remains poorly understood. We genotyped ~2.3 million SNPs in 220 southern Africans and found that the Khoe-San diverged from other populations at least 100,000 years ago, but structure within the Khoe-San dated back to about 35,000 years ago. Genetic variation in various sub-Saharan populations did not localize the origin of modern humans to a single geographic region within Africa; instead, it indicated a history of admixture and stratification. We found evidence of adaptation targeting muscle function and immune response, potential adaptive introgression of UV-light protection, and selection predating modern human diversification involving skeletal and neurological development. These new findings illustrate the importance of African genomic diversity in understanding human evolutionary history. The 220 samples were analysed with the Illumina HumanOmni2.5-Quad BeadChip and are described herein.
Project description:The genetic structure of the indigenous hunter-gatherer peoples of Southern Africa, the oldest known lineage of modern man, holds an important key to understanding humanity's early history. Previously sequenced human genomes have been limited to recently diverged populations. Here we present the first complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and of a Bantu from Southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, and 13,146 novel amino-acid variants. These data allow genetic relationships among Southern African foragers and neighboring agriculturalists to be traced more accurately than was previously possible. Adding the described variants to current databases will facilitate inclusion of Southern Africans in medical research efforts. Copy number differences between NA18507 and KB1 were predicted from the depth of whole-genome shotgun sequence reads. These predictions were then validated by array-CGH, using a genome-wide design as well as a custom design targeted at specific regions of copy number difference.
Project description:Whole genome sequencing of 92 individuals from 44 African indigenous populations. Sequences were generated on the Illumina HiSeq 2000 sequencing system; data were uploaded in BAM format.
Project description:<p><b>LOCATION CHANGE FOR ALZHEIMER'S DISEASE SEQUENCING PROJECT (ADSP) DATA:</b> Please go to <a href="https://dss.niagads.org/" target="_blank">NIAGADS DSS</a> to apply for build 38 ADSP genetic and phenotypic data. See Background below for more details. For instructions on how to access the additional ADSP data that are shared through <a href="https://dss.niagads.org/" target="_blank">NIAGADS DSS</a>, visit the <a href="https://dss.niagads.org/documentation/" target="_blank">Application Instructions</a> page.</p> <p>Background: Additional sequencing data are continuously being generated by the ADSP. These data are mapped to the latest Genome Reference Consortium human genome build GRCh38 (hg38) and are being shared through the NIA Genetics of Alzheimer's Disease Data Storage Site (<a href="https://www.niagads.org/" target="_blank">NIAGADS</a>) Data Sharing Service (<a href="https://dss.niagads.org/" target="_blank">DSS</a>). As of May 1, 2020 there are 4,789 whole genomes and 19,922 whole exomes available to the research community. Later in 2020 there will be a total of ~17,000 whole genomes and 19,922 whole exomes available through NIAGADS DSS (<a href="https://dss.niagads.org/datasets/ng00067/" target="_blank">ng00067</a>). The total number of genomes from multi-ethnic cohorts is anticipated to exceed 50,000. Please see the <a href="https://www.niagads.org/adsp/content/study-design">ADSP Design</a> page for the complete study description.</p> <p>ADSP whole exome and whole genome sequence data that were shared through dbGaP were mapped to the GRCh37 (build 37). 
These data are from the Discovery Phase of the project (described below) and will continue to be available at this site.</p> <p><b>STUDY DESCRIPTION FOR dbGaP BUILD 37 ADSP DATA: </b>The overarching goals of the Alzheimer's Disease Sequencing Project (ADSP) are to: (1) identify new genomic variants contributing to increased risk of developing Alzheimer's Disease (AD), (2) identify new genomic variants contributing to protection against developing AD, and (3) provide insight as to why individuals with known risk factor variants escape from developing AD. These factors will be studied in multi-ethnic populations in order to identify new pathways for disease prevention. Such a study of human genomic variation and its relationship to health and disease requires examination of a large number of study participants and needs to capture information about common and rare variants (both single nucleotide and copy number) in well phenotyped individuals.</p> <p>Using existing samples from NIH funded and other studies, three NHGRI funded Large Scale Sequencing and Analysis Centers (LSAC) - Broad, Baylor, and Washington University - produced the DNA sequence data. Variant call data are being made available to the scientific community through NIH-approved data repositories. Statistical analysis of the sequence data is anticipated to identify new genetic risk and protective factors. The ADSP will conduct and facilitate analysis of sequence data to extend previous discoveries that may ultimately result in new directions for AD therapeutics. Analysis of ADSP data will be done in two phases.</p> <p>The Discovery Phase analysis (2014-2018) is funded under <a href="http://grants.nih.gov/grants/guide/pa-files/PAR-12-183.html">PAR-12-183</a>. 
The entire Discovery dataset contains whole-genome sequencing data on 584 subjects from 113 families, and pedigree data for > 4000 subjects; whole exome sequencing data on 5096 cases and 4965 controls; and whole exome sequence data on an additional 853 subjects (682 cases [510 non-Hispanic, 172 Hispanic] and 171 Hispanic controls) from families that are multiply affected with AD.</p> <p>The Replication Phase (2016-2021) analysis will be funded under <a href="http://grants.nih.gov/grants/guide/rfa-files/RFA-AG-16-001.html">RFA-AG-16-001</a> and <a href="http://grants.nih.gov/grants/guide/rfa-files/RFA-AG-16-002.html">RFA-AG-16-002</a> and is expected to include a combination of genotyping and sequencing approaches on at least 30,000 subjects. Targeted sequencing will be done by the LSACs.</p> <p><b>GRCh37 Data Releases</b></p> <ul> <li>The <b>first</b> ADSP data release occurred on November 25, 2013. It included the whole-genome sequencing data in BAM file format on 410 individuals.</li> <li>The <b>second</b> ADSP data release occurred on March 31, 2014, and included the whole-genome sequencing data in BAM file format for an additional 168 individuals.</li> <li>The <b>third</b> ADSP data release occurred on November 03, 2014 and included whole-exome sequencing data in BAM file format for 10,939 individuals.</li> <li>The <b>fourth</b> ADSP data release occurred on February 13, 2015 and included revised ethnic data for subjects with whole-exome sequencing data.</li> <li>The <b>fifth</b> ADSP data release occurred on July 13, 2015 and included whole-genome genotypes and updated phenotypes as well as changes to pedigree structures and sample IDs.</li> <li>The <b>sixth</b> ADSP data release occurred on December 8, 2015, and included whole-exome genotypes and updated phenotypes as well as changes to subject IDs.</li> </ul> <p>This <b>seventh ADSP data release on April 12, 2016</b> includes: </p> <ul> <p>(1) WES and WGS SNV VCF files</p> <p>(2) WES and WGS Indel PLINK 
files</p> </ul> <p><b>ADSP Data Available through dbGaP:</b></p> <p> <table border="1"> <tr> <th></th> <th><b>ADSP - Whole Genome Sequencing</b></th> <th><b>ADSP - Whole Exome Sequencing</b></th> <th><b>Comments</b></th> </tr> <tr> <td>DNA-Seq (BAM)</td> <td>n=578</td> <td>n=10913</td> <td>Sequence data available (plus n=38 replications w/out genotype data)</td> </tr> <tr> <td>Concordant SNV Genotypes (PLINK format)</td> <td>N/A</td> <td>n=10913</td> <td>QC'ed genotypes that are concordant between the Atlas (Baylor's) and GATK (Broad's) calling pipelines (a subset of the consensus genotype set)</td> </tr> <tr> <td>Consensus Genotypes (PLINK and VCF format)</td> <td>n=578</td> <td>n=10913</td> <td>QC'ed genotypes that are concordant between Atlas and GATK pipelines as well as those that were called uniquely by Atlas or GATK</td> </tr> <tr> <td>Concordant Indel Genotypes (PLINK format)</td> <td>n=578</td> <td>n=10913</td> <td>QC'ed genotypes that are concordant between the Atlas and GATK calling pipelines</td> </tr> <tr> <td>Phenotype Data</td> <td>n=4735</td> <td>n=10913</td> <td>Data of n=53 phenotype variables available (plus administrative data), including APOE genotype. WGS phenotypes include data of connecting family members.</td> </tr> </table> </p> <p>Please use the <a href="ftp://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000572/phs000572.v7.p4/">release notes</a> provided by dbGaP to obtain detailed information about study release updates. </p> <p>The <a href="https://www.niagads.org/adsp/portal/">ADSP data portal</a> provides a customized interface for users to quickly identify and retrieve files by covariates, phenotypes, and data properties such as sequencing facility or coverage. For more information about the ADSP study and the data portal, please visit <a href="https://www.niagads.org/adsp/">https://www.niagads.org/adsp/</a>.</p>
Project description:Genetic, linguistic, and archaeological studies have demonstrated the existence of strong links between eastern and southern Africa over the past millennia, including the diffusion of the first domesticated sheep and goats. However, the proportions in which they were introduced into past human subsistence strategies in Africa are difficult to assess archaeologically, as caprines share skeletal features with a number of wild bovids. Palaeoproteomics has proven effective at retrieving biological information from archaeological remains in African arid contexts. Using published collagen sequences together with newly generated de novo sequences of wild bovids, we present the molecular (re-)attribution of remains morphologically identified as sheep/goat or unidentifiable bovids from seventeen archaeological sites distributed between eastern and southern Africa and spanning seven millennia. More than 70% of the remains were identified, and the direct radiocarbon dating of domesticate specimens allowed the chronological refinement of the arrival of caprines in both African regions. Our results further substantiate a predominance of sheep in the assemblages along with a similar arrival chronology. Beyond adding substantial biological data to the field of (palaeo-)proteomics, this is the first large-scale palaeoproteomics investigation to include both eastern and southern African sites, opening promising future applications of the method on the continent.