Dataset Information

Effects of sample selection bias on the accuracy of population structure and ancestry inference.

ABSTRACT: Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that only a small number of individuals from any geographical/ethnic population are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data.

SUBMITTER: Shringarpure S

PROVIDER: S-EPMC4025489 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature

Publications

Effects of sample selection bias on the accuracy of population structure and ancestry inference.

Shringarpure S, Xing EP

G3 (Bethesda, Md.), 2014-03-17, issue 5

PMID: 24637351

Similar Datasets

Selection Bias When Estimating Average Treatment Effects Using One-sample Instrumental Variable Analysis.
S-EPMC6525095 | biostudies-literature

Ancestry inference of 96 population samples using microhaplotypes.
S-EPMC5920014 | biostudies-literature

Struct-f4: a Rcpp package for ancestry profile and population structure inference from F4 statistics.
S-EPMC8963280 | biostudies-literature

Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.
S-EPMC4836868 | biostudies-literature

Genetic Ancestry and Natural Selection Drive Population Differences in Immune Responses to Pathogens
2019-08-29 | GSE136566 | GEO

How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding.
S-EPMC7772221 | biostudies-literature

Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum.
S-EPMC4485089 | biostudies-other

Estimating effective population size using RADseq: Effects of SNP selection and sample size.
S-EPMC7042749 | biostudies-literature

Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification.
S-EPMC9826451 | biostudies-literature

Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories.
S-EPMC7118876 | biostudies-literature