Dataset Information

Multi-locus inference of population structure: a comparison between single nucleotide polymorphisms and microsatellites.

ABSTRACT: Although growing numbers of single nucleotide polymorphisms (SNPs) and microsatellites (short tandem repeat polymorphisms or STRPs) are used to infer population structure, their relative properties in this context remain poorly understood. SNPs and STRPs mutate differently, suggesting multi-locus genotypes at these loci might differ in ability to detect population structure. Here, we use coalescent simulations to measure the power of sets of SNPs and STRPs to identify population structure. To maximize the applicability of our results to empirical studies, we focus on the popular STRUCTURE analysis and evaluate the role of several biological and practical factors in the detection of population structure. We find that: (1) fewer unlinked STRPs than SNPs are needed to detect structure at recent divergence times <0.3 N(e) generations; (2) accurate estimation of the number of populations requires many fewer STRPs than SNPs; (3) for both marker types, declines in power due to modest gene flow (N(e)m=1.0) are largely negated by increasing marker number; (4) variation in the STRP mutational model affects power modestly; (5) SNP haplotypes (θ=1, no recombination) provide power comparable with STRP loci (θ=10); (6) ascertainment schemes that select highly variable STRP or SNP loci increase power to detect structure, though ascertained data may not be suitable to other inference; and (7) when samples are drawn from an admixed population and one of its parent populations, the reduction in power to detect two populations is greater for STRPs than SNPs. These results should assist the design of multi-locus studies to detect population structure in nature.

SUBMITTER: Haasl RJ

PROVIDER: S-EPMC2892635 | biostudies-other | 2011 Jan

REPOSITORIES: biostudies-other

Similar Datasets

POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.

Project description:Statistical approaches for population structure estimation have been predominantly driven by a particular data type, single-nucleotide polymorphisms (SNPs). However, in the presence of weak identifiability in SNPs, population structure estimation can suffer from undesirable accuracy loss. Copy number variations (CNVs) are genomic structural variants with loci that are commonly shared within a specific population and thus provide valuable information for estimation of the ancestry of sampled populations. We develop a Bayesian joint modeling framework of SNPs and CNVs, called POPSTR, to better understand population structure than approaches that use SNPs solely. To deal with the increased data volume, we use the Metropolis Adjusted Langevin algorithm (MALA) that guides the target distribution in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.

| S-EPMC5915226 | biostudies-literature

Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites.

Project description:Despite the theoretical evidence of the utility of single-nucleotide polymorphisms (SNPs) for linkage analysis, no whole-genome scans of a complex disease have yet been published to directly compare SNPs with microsatellites. Here, we describe a whole-genome screen of 157 families with multiple cases of rheumatoid arthritis (RA), performed using 11,245 genomewide SNPs. The results were compared with those from a 10-cM microsatellite scan in the same cohort. The SNP analysis detected HLA*DRB1, the major RA susceptibility locus (P=.00004), with a linkage interval of 31 cM, compared with a 50-cM linkage interval detected by the microsatellite scan. In addition, four loci were detected at a nominal significance level (P<.05) in the SNP linkage analysis; these were not observed in the microsatellite scan. We demonstrate that variation in information content was the main factor contributing to observed differences in the two scans, with the SNPs providing significantly higher information content than the microsatellites. Reducing the number of SNPs in the marker set to 3,300 (1-cM spacing) caused several loci to drop below nominal significance levels, suggesting that decreases in information content can have significant effects on linkage results. In contrast, differences in maps employed in the analysis, the low detectable rate of genotyping error, and the presence of moderate linkage disequilibrium between markers did not significantly affect the results. We have demonstrated the utility of a dense SNP map for performing linkage analysis in a late-age-at-onset disease, where DNA from parents is not always available. The high SNP density allows loci to be defined more precisely and provides a partial scaffold for association studies, substantially reducing the resource requirement for gene-mapping studies.

| S-EPMC1182008 | biostudies-literature

Comparison of microsatellites versus single-nucleotide polymorphisms in a genome linkage screen for prostate cancer-susceptibility Loci.

Project description:Prostate cancer is one of the most common cancers among men and has long been recognized to occur in familial clusters. Brothers and sons of affected men have a 2-3-fold increased risk of developing prostate cancer. However, identification of genetic susceptibility loci for prostate cancer has been extremely difficult. Although the suggestion of linkage has been reported for many chromosomes, the most promising regions have been difficult to replicate. In this study, we compare genome linkage scans using microsatellites with those using single-nucleotide polymorphisms (SNPs), performed in 467 men with prostate cancer from 167 families. For the microsatellites, the ABI Prism Linkage Mapping Set version 2, with 402 microsatellite markers, was used, and, for the SNPs, the Early Access Affymetrix Mapping 10K array was used. Our results show that the presence of linkage disequilibrium (LD) among SNPs can lead to inflated LOD scores, and this seems to be an artifact due to the assumption of linkage equilibrium that is required by the current genetic-linkage software. After excluding SNPs with high LD, we found a number of new LOD-score peaks with values of at least 2.0 that were not found by the microsatellite markers: chromosome 8, with a maximum model-free LOD score of 2.2; chromosome 2, with a LOD score of 2.1; chromosome 6, with a LOD score of 4.2; and chromosome 12, with a LOD score of 3.9. The LOD scores for chromosomes 6 and 12 are difficult to interpret, because they occurred only at the extreme ends of the chromosomes. The greatest gain provided by the SNP markers was a large increase in the linkage information content, with an average information content of 61% for the SNPs, versus an average of 41% for the microsatellite markers. The strengths and weaknesses of microsatellite versus SNP markers are illustrated by the results of our genome linkage scans.

| S-EPMC1182157 | biostudies-literature

Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms.

Project description:Haplotypes have gained increasing attention in the mapping of complex-disease genes, because of the abundance of single-nucleotide polymorphisms (SNPs) and the limited power of conventional single-locus analyses. It has been shown that haplotype-inference methods such as Clark's algorithm, the expectation-maximization algorithm, and a coalescence-based iterative-sampling algorithm are fairly effective and economical alternatives to molecular-haplotyping methods. To contend with some weaknesses of the existing algorithms, we propose a new Monte Carlo approach. In particular, we first partition the whole haplotype into smaller segments. Then, we use the Gibbs sampler both to construct the partial haplotypes of each segment and to assemble all the segments together. Our algorithm can accurately and rapidly infer haplotypes for a large number of linked SNPs. By using a wide variety of real and simulated data sets, we demonstrate the advantages of our Bayesian algorithm, and we show that it is robust to the violation of Hardy-Weinberg equilibrium, to the presence of missing data, and to occurrences of recombination hotspots.

| S-EPMC448439 | biostudies-other

Empirical landscape genetic comparison of single nucleotide polymorphisms and microsatellites in three arid-zone mammals with high dispersal capacity.

Project description:Landscape genetics is increasingly transitioning away from microsatellites, with single nucleotide polymorphisms (SNPs) providing increased resolution for detecting patterns of spatial-genetic structure. This is particularly pertinent for research in arid-zone mammals due to challenges associated with unique life history traits, such as boom-bust population dynamics and long-distance dispersal capacities. Here, we provide a case study comparing SNPs versus microsatellites for testing three explicit landscape genetic hypotheses (isolation-by-distance, isolation-by-barrier, and isolation-by-resistance) in a suite of small, arid-zone mammals in the Pilbara region of Western Australia. Using clustering algorithms, Mantel tests, and linear mixed effects models, we compare functional connectivity between genetic marker types and across species, including one marsupial, Ningaui timealeyi, and two native rodents, Pseudomys chapmani and P. hermannsburgensis. SNPs resolved subtle genetic structuring not detected by microsatellites, particularly for N. timealeyi where two genetic clusters were identified. Furthermore, stronger signatures of isolation-by-distance and isolation-by-resistance were detected when using SNPs, and model selection based on SNPs tended to identify more complex resistance surfaces (i.e., composite surfaces of multiple environmental layers) in the best-performing models. While we found limited evidence for physical barriers to dispersal across the Pilbara for all species, we found that topography, substrate, and soil moisture were the main environmental drivers shaping functional connectivity. Our study demonstrates that new analytical and genetic tools can provide novel ecological insights into arid landscapes, with potential application to conservation management through identifying dispersal corridors to mediate the impacts of ongoing habitat fragmentation in the region.

| S-EPMC10154367 | biostudies-literature

The cardiovascular implication of single nucleotide polymorphisms of chromosome 9p21 locus among Arab population.

Project description:Background: Based on several reports including genome-wide association studies, genetic variability has been linked with higher (nearly half) susceptibility toward coronary artery disease (CAD). We aimed to evaluate the association of chromosome 9p21 single nucleotide polymorphisms (SNPs): rs2383207, rs10757278, and rs10757274 with the risk and severity of CAD among Arab population. Materials and methods: A prospective observational case-control study was conducted between 2011 and 2012, in which 236 patients with CAD were recruited from the Heart Hospital in Qatar. Patients were categorized according to their coronary angiographic findings. Also, 152 healthy volunteers were studied to determine if SNPs are associated with risk of CAD. All subjects were genotyped for SNPs (rs2383207, rs2383206, rs10757274 and rs10757278) using allele-specific real-time polymerase chain reaction. Results: Patients with CAD had a mean age of 57 ± 10; of them 77% were males, 54% diabetics, and 25% had family history of CAD. All SNPs were in Hardy-Weinberg equilibrium except rs2383206, with call rate >97%. After adjusting for age, sex and body mass index, carriers of the GG genotype for rs2383207 had an increased risk of CAD with odds ratio (OR) of 1.52 (95% confidence interval [CI] = 1.01-2.961, P = 0.046). Also, rs2383207 contributed to CAD severity with adjusted OR 1.80 (95% CI = 1.04-3.12, P = 0.035) based on the dominant genetic model. The other SNPs (rs10757274 and rs10757278) showed no significant association with the risk of CAD or its severity. Conclusion: Among Arab population in Qatar, only the G allele of the rs2383207 SNP is significantly associated with risk of CAD and its severity.

| S-EPMC4468449 | biostudies-literature

The efficacy of short tandem repeat polymorphisms versus single-nucleotide polymorphisms for resolving population structure.

Project description:Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) in detecting population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low or high frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results.

| S-EPMC1866696 | biostudies-literature

Genome-wide analysis of single nucleotide polymorphisms uncovers population structure in Northern Europe.

Project description:Background: Genome-wide data provide a powerful tool for inferring patterns of genetic variation and structure of human populations. Principal findings: In this study, we analysed almost 250,000 SNPs from a total of 945 samples from Eastern and Western Finland, Sweden, Northern Germany and Great Britain complemented with HapMap data. Small but statistically significant differences were observed between the European populations (F(ST) = 0.0040, p<10(-4)), also between Eastern and Western Finland (F(ST) = 0.0032, p<10(-3)). The latter indicated the existence of a relatively strong autosomal substructure within the country, similar to that observed earlier with smaller numbers of markers. The Germans and British were less differentiated than the Swedes, Western Finns and especially the Eastern Finns who also showed other signs of genetic drift. This is likely caused by the later founding of the northern populations, together with subsequent founder and bottleneck effects, and a smaller population size. Furthermore, our data suggest a small eastern contribution among the Finns, consistent with the historical and linguistic background of the population. Significance: Our results warn against a priori assumptions of homogeneity among Finns and other seemingly isolated populations. Thus, in association studies in such populations, additional caution for population structure may be necessary. Our results illustrate that population history is often important for patterns of genetic variation, and that the analysis of hundreds of thousands of SNPs provides high resolution also for population genetics.

| S-EPMC2567036 | biostudies-literature

From microsatellites to single nucleotide polymorphisms for the genetic monitoring of a critically endangered sturgeon.

Project description:The use of genetic information is crucial in conservation programs for the establishment of breeding plans and for the evaluation of restocking success. Short tandem repeats (STRs) have been the most widely used molecular markers in such programs, but next-generation sequencing approaches have prompted the transition to genome-wide markers such as single nucleotide polymorphisms (SNPs). Until now, most sturgeon species have been monitored using STRs. The low diversity found in the critically endangered European sturgeon (Acipenser sturio), however, makes its future genetic monitoring challenging, and the current resolution needs to be increased. Here, we describe the discovery of a highly informative set of 79 SNPs using double-digest restriction-associated DNA (ddRAD) sequencing and its validation by genotyping using the MassARRAY system. Compared with STRs, the SNP panel proved to be highly efficient and reproducible, allowing for more accurate parentage and kinship assignments on 192 juveniles of known pedigree and 40 wild-born adults. We explore the effectiveness of both markers to estimate relatedness and inbreeding, using simulated and empirical datasets. Interestingly, we found significant correlations between STRs and SNPs at individual heterozygosity and inbreeding that give support to a reasonable representation of whole genome diversity for both markers. These results are useful for the conservation program of A. sturio in building a comprehensive studbook, which will optimize conservation strategies. This approach also proves suitable for other case studies in which highly discriminatory genetic markers are needed to assess parentage and kinship.

| S-EPMC6662312 | biostudies-literature

Comparison of Risk Allele Frequencies of Psoriasis-Associated Single-Nucleotide Polymorphisms in Different Population Groups.

Project description:Background: The prevalence of psoriasis differs by population, and it appears to be more common among Europeans than in East Asians. Recent genome-wide association studies (GWAS) have identified alleles that increase the risk of psoriasis, and these alleles may present different frequencies in different geographic regions. Objective: We aimed to gain insights into the causes of differences in disease frequencies according to populations and the factors affecting prevalence and pattern differences. Methods: We collected a total of 147 psoriasis-associated single-nucleotide polymorphisms (SNPs) from the GWAS catalog and compared the allele frequency differences in 27 populations using public population frequency in the 1000 Genomes Project phase 3 (n=2,504) and the Korean Reference Genome Database (n=1,722). Additionally, we calculated the composited genetic risk scores across the population groups. Results: There were distinct patterns of allele frequencies in different population groups. In many cases, East Asians exhibited allele frequencies opposite to that of Europeans. The genetic risk score was higher in Europeans (average: 0.487) and Americans (average: 0.492) than in East Asians (average: 0.471). The prevalence of psoriasis correlated with the average genetic risk score of the population. Conclusion: We observed a difference in the allele frequencies of psoriasis-associated SNPs between the studied populations. This result suggests that the difference in the prevalence of psoriasis between population groups can be interpreted to some extent by the genotype.

| S-EPMC9905865 | biostudies-literature
