Dataset Information

Selecting SNPs to identify ancestry.


ABSTRACT: An individual's genotypes at a group of single-nucleotide polymorphisms (SNPs) can be used to predict that individual's ethnicity or ancestry. In medical studies, knowledge of a subject's ancestry can minimize possible confounding, and in forensic applications, such knowledge can help direct investigations. Our goal is to select a small subset of SNPs, from the millions already identified in the human genome, that can predict ancestry with a minimal error rate. The general form for this variable selection procedure is to estimate the expected error rates for sets of SNPs using a training dataset and consider those sets with the lowest error rates given their size. The quality of the estimate for the error rate determines the quality of the resulting SNPs. Because the apparent error rate performs poorly when either the number of SNPs or the number of populations is large, we propose a new estimate, the Improved Bayesian Estimate. We demonstrate that selection procedures based on this estimate produce small sets of SNPs that can accurately predict ancestry. We also provide a list of the 100 optimal SNPs for identifying ancestry.
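
The general selection procedure described in the abstract (score candidate SNP sets by an estimated error rate on training data, then keep the best set of a given size) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 0/1/2 genotype coding, the naive Bayes classifier with add-one smoothing and equal priors, the greedy forward search, the toy data, and all function names. It scores sets with the apparent (training-set) error rate, which the abstract notes performs poorly for many SNPs or populations; it does not implement the Improved Bayesian Estimate.

```python
import math

def fit_genotype_freqs(genotypes, labels, snps):
    """Per-population genotype frequencies (codes 0/1/2) with add-one smoothing."""
    freqs = {}
    for pop in sorted(set(labels)):
        rows = [g for g, lab in zip(genotypes, labels) if lab == pop]
        freqs[pop] = {
            s: [(sum(1 for r in rows if r[s] == k) + 1) / (len(rows) + 3)
                for k in range(3)]
            for s in snps
        }
    return freqs

def predict(freqs, row, snps):
    """Pick the population with the highest log-likelihood (equal priors assumed)."""
    def loglik(pop):
        return sum(math.log(freqs[pop][s][row[s]]) for s in snps)
    return max(sorted(freqs), key=loglik)

def apparent_error(genotypes, labels, snps):
    """Fraction of training individuals misclassified using only `snps`."""
    freqs = fit_genotype_freqs(genotypes, labels, snps)
    wrong = sum(predict(freqs, row, snps) != lab
                for row, lab in zip(genotypes, labels))
    return wrong / len(labels)

def greedy_select(genotypes, labels, n_select):
    """Forward selection: repeatedly add the SNP that gives the lowest error."""
    chosen, remaining = [], list(range(len(genotypes[0])))
    for _ in range(n_select):
        best = min(remaining,
                   key=lambda s: apparent_error(genotypes, labels, chosen + [s]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy data: 6 individuals, 3 SNPs, 2 populations.
# SNP 1 separates the populations; SNP 2 carries no information.
genotypes = [
    [0, 0, 1], [1, 0, 1], [0, 0, 1],   # population "A"
    [1, 2, 1], [0, 2, 1], [1, 2, 1],   # population "B"
]
labels = ["A", "A", "A", "B", "B", "B"]

print(greedy_select(genotypes, labels, 1))
print(apparent_error(genotypes, labels, [0]))
```

Swapping `apparent_error` for a better error-rate estimate changes which sets the search prefers, which is the point the abstract makes about the estimate determining the quality of the selected SNPs.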

SUBMITTER: Sampson JN 

PROVIDER: S-EPMC3141729 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature

Publications

Selecting SNPs to identify ancestry.

Sampson Joshua N, Kidd Kenneth K, Kidd Judith R, Zhao Hongyu

Annals of Human Genetics, 2011 Jul 1, issue 4

Similar Datasets

| S-EPMC4855449 | biostudies-literature
| S-EPMC3476707 | biostudies-literature
| S-EPMC1316880 | biostudies-other
| S-EPMC3072384 | biostudies-literature
| S-EPMC3899836 | biostudies-literature
| S-EPMC3677760 | biostudies-literature
| S-EPMC7860026 | biostudies-literature
| S-EPMC6906462 | biostudies-literature
| S-EPMC5495141 | biostudies-literature
| S-EPMC9731996 | biostudies-literature