Dataset Information

Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia.

ABSTRACT: South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

SUBMITTER: Metspalu M

PROVIDER: S-EPMC3234374 | biostudies-literature | 2011 Dec

REPOSITORIES: biostudies-literature


Publications

Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia.

Metspalu, Mait; Romero, Irene Gallego; Yunusbayev, Bayazit; Chaubey, Gyaneshwer; Mallick, Chandana Basu; Hudjashov, Georgi; Nelis, Mari; Mägi, Reedik; Metspalu, Ene; Remm, Maido; Pitchappan, Ramasamy; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Kivisild, Toomas

American Journal of Human Genetics, 2011 Dec 1, issue 6


PMID: 22152676

Similar Datasets

Project description: Comparative genomics studies investigating the signals of positive selection among groups of closely related species are still rare and limited in taxonomic breadth. Such studies show great promise in advancing our knowledge about the proportion and the identity of genes experiencing diversifying selection. However, methodological challenges have led to high levels of false positives in past studies. Here, we use the well-annotated genome of the purple sea urchin, Strongylocentrotus purpuratus, as a reference to investigate the signals of positive selection at 6520 single-copy orthologs from nine sea urchin species belonging to the family Strongylocentrotidae, paying careful attention to minimizing false positives. We identified 1008 (15.5%) candidate positive selection genes (PSGs). Tests for positive selection along the nine terminal branches of the phylogeny identified 824 genes that showed lineage-specific adaptive diversification (1.67% of branch-sites tests performed). Positively selected codons were not enriched at exon borders or near regions containing missing data, suggesting a limited contribution of false positives caused by alignment or annotation errors. Alignments were validated at 10 loci with re-sequencing using Sanger methods. No differences were observed in the rates of synonymous substitution (dS), GC content, and codon bias between the candidate PSGs and those not showing positive selection. However, the candidate PSGs had 68% higher rates of nonsynonymous substitution (dN) and 33% lower levels of heterozygosity, consistent with selective sweeps and opposite to that expected by a relaxation of selective constraint. Although positive selection was identified at reproductive proteins and innate immunity genes, the strongest signals of adaptive diversification were observed at extracellular matrix proteins, cell adhesion molecules, membrane receptors, and ion channels. Many candidate PSGs have been widely implicated as targets of pathogen binding, inactivation, mimicry, or exploitation in other groups (notably mammals). Our study confirmed the widespread action of positive selection across sea urchin genomes and allowed us to reject the possibility that annotation and alignment errors (including paralogs) were responsible for creating false signals of adaptive molecular divergence. The candidate PSGs identified in our study represent promising targets for future research into the selective agents responsible for their adaptive diversification and their contribution to speciation.

Project description: This review compiles the results of 21 genomic studies of European Bos taurus breeds and thus provides a general picture of the selection signatures in taurine cattle identified by genome-wide selection-mapping scans. By performing a comprehensive summary of the results reported in the literature, we compiled a list of 1049 selection sweeps described across 37 cattle breeds (17 beef breeds, 14 dairy breeds, and 6 dual-purpose breeds), and four different beef-vs.-dairy comparisons, which we subsequently grouped into core selective sweep (CSS) regions, defined as consecutive signals within 1 Mb of each other. We defined a total of 409 CSSs across the 29 bovine autosomes, 232 (57%) of which were associated with a single breed (Single-breed CSSs), 134 CSSs (33%) were associated with a limited number of breeds (Two-to-Four-breed CSSs) and 39 CSSs (9%) were associated with five or more breeds (Multi-breed CSSs). For each CSS, we performed a candidate gene survey that identified 291 genes within the CSS intervals (from the total list of 5183 BioMart-extracted genes) linked to dairy and meat production, stature, and coat color traits. A complementary functional enrichment analysis of the CSS positional candidates highlighted other genes related to pathways underlying behavior, immune response, and reproductive traits. The Single-breed CSSs revealed an over-representation of genes related to dairy and beef production; this was further supported by over-representation of production-related pathway terms in these regions based on a functional enrichment analysis. Overall, this review provides a comparative map of the selection sweeps reported in European cattle breeds and presents for the first time a characterization of the selection sweeps that are found in individual breeds. Based on their uniqueness, these breed-specific signals could be considered as "divergence signals," which may be useful in characterizing and protecting livestock genetic diversity.

