Project description:Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis and public release of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with geographic distance from Africa, as expected under a serial founder effect for an out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected—including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas—the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations. Keywords: High Density SNP array
Project description:Copy number variants (CNVs) can reach appreciable frequencies in the human population, and several of these copy number polymorphisms (CNPs) have recently been associated with human diseases including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in CNP discovery and genotyping. Developing a novel method based on single channel intensity data and benchmarking against copy numbers determined from sequencing read-depth, we successfully obtained CNP genotypes for 1489 CNPs from 487 human DNA samples from diverse ethnic backgrounds. The customized microarray was enriched for segmental duplication-rich regions and novel insertions of sequences not represented in the reference genome assembly or on standard single nucleotide polymorphism (SNP) microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that bi-allelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although bi-allelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multi-copy CNPs do not (only 40% have r > 0.8). We selected a subset of CNPs for further characterization in 1873 additional samples from 62 populations (947 samples analyzed by microarray; 926 samples analyzed with PCR-based assays); this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, which encodes a tight junction protein involved in hepatitis C viral entry. Our new microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that are not in linkage disequilibrium with SNPs) may have contributed disproportionately to human diversity and selection.
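Illustrative note: the correlation between CNP copy number and a flanking SNP genotype mentioned above can be sketched as a plain Pearson correlation across samples. This is a minimal sketch, not the authors' pipeline. The function names, the coding of SNP genotypes as 0/1/2 alternate-allele counts, and the use of the absolute value of r against the 0.8 cutoff are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A monomorphic CNP or SNP carries no information; report no correlation.
        return 0.0
    return sxy / sqrt(sxx * syy)


def is_tagged_by_snp(copy_numbers, snp_genotypes, threshold=0.8):
    """Hypothetical helper: True if |r| exceeds the threshold.

    Assumption: the sign of r depends only on which SNP allele is counted,
    so the absolute value is compared with the cutoff.
    """
    return abs(pearson_r(copy_numbers, snp_genotypes)) > threshold


# Toy data (invented): per-sample copy number and SNP allele count.
bi_allelic_cnp = [0, 1, 2, 1, 0, 2]
flanking_snp = [0, 1, 2, 1, 0, 2]
print(is_tagged_by_snp(bi_allelic_cnp, flanking_snp))
```

A CNP that fails this check for every nearby SNP cannot be imputed from standard SNP arrays, which is why direct genotyping of such loci matters for association studies.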
Project description:Thomas Hunt Morgan and colleagues identified variation in gene copy number in Drosophila in the 1920s and 1930s and linked such variation to phenotypic differences [Bridges, C. B. (1936) Science 83, 210]. Yet the extent of variation in the number of chromosomes, chromosomal regions, or gene copies, and the importance of this variation within species, remain poorly understood. Here, we focus on copy-number variation in Drosophila melanogaster. We characterize copy-number polymorphism (CNP) across genomic regions, and we contrast patterns to infer the evolutionary processes acting on this variation. Copy-number variation in D. melanogaster is non-randomly distributed, presumably due to a mutational bias produced by tandem repeats or other mechanisms. Comparisons of coding and noncoding CNPs, however, reveal a strong effect of purifying selection in the removal of structural variation from functionally constrained regions. Most patterns of CNP in D. melanogaster suggest that negative selection and mutational biases are the primary agents responsible for shaping structural variation. Keywords: comparative genomic hybridization
Project description:Copy number variants (CNVs) affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrate accurate estimation of absolute copy number for duplications as small as 1.9 kbp, ranging from 0 to 48 copies. We identified 4.1 million ‘singly unique nucleotide’ (SUN) positions informative in distinguishing specific copies, and used them to genotype the copy number and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
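Illustrative note: the idea of estimating absolute copy number from short-read mapping depth can be sketched as a ratio against regions of known copy number. This is a simplified sketch under stated assumptions, not the method used in the study: it assumes depth has already been corrected for biases such as GC content, that a set of control regions with two copies is available, and the function names are hypothetical.

```python
def estimate_copy_number(region_depth, control_depth, control_copies=2):
    """Scale depth in a region by depth in control regions of known copy number.

    Assumption: read depth is proportional to copy number after normalization.
    """
    if control_depth <= 0:
        raise ValueError("control depth must be positive")
    if region_depth < 0:
        raise ValueError("region depth cannot be negative")
    return control_copies * region_depth / control_depth


def nearest_integer_copy(estimate):
    """Round a non-negative estimate half-up to an integer copy number."""
    return int(estimate + 0.5)


# Toy values (invented): mean depth of 60x in a duplicated region,
# 30x in diploid control regions.
estimate = estimate_copy_number(60.0, 30.0)
print(estimate, nearest_integer_copy(estimate))
```

Aggregate depth only gives the total count across all paralogs. Assigning copies to a specific paralog would additionally require restricting the depth calculation to positions that distinguish that paralog, which is the role the SUN positions play in the description above.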