Unknown

Dataset Information

Genetic differences among ethnic groups.


ABSTRACT: Many differences among ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs. However, the genetic bases of such differences have been under-investigated. Since the HapMap project, large-scale genotype data from Caucasian, African and Asian population samples have been available. The project found that these populations were located in different areas of the PCA (Principal Component Analysis) plot. However, as an unsupervised method, PCA does not measure the differences in each single nucleotide polymorphism (SNP) among populations. We applied an advanced mutual information-based feature selection method to detect associations between SNP status and ethnic groups using the latest HapMap Phase 3 release version 3, which included more sub-populations. A total of 299 SNPs were identified, and they accurately predicted the ethnicity of all HapMap populations. The 10-fold cross-validation accuracy of the SMO (sequential minimal optimization) model on the training dataset was 0.901, and the accuracy on the independent test dataset was 0.895. In-depth functional analysis of these SNPs and their nearby genes revealed the genetic bases of skin and eye color differences among populations.
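
The idea of scoring each SNP by its mutual information with the population label can be illustrated with a minimal sketch. This is plain univariate mutual-information ranking, not the "advanced" method the abstract refers to, whose details are not given here. The function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, of two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of genotype codes
    py = Counter(ys)            # marginal counts of population labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_snps(genotypes, labels):
    """Return (snp_id, score) pairs sorted by mutual information, highest first."""
    scores = {snp: mutual_information(g, labels) for snp, g in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): six samples from two hypothetical populations,
# genotypes coded as the number of minor alleles (0, 1, or 2).
labels = ["A", "A", "A", "B", "B", "B"]
genotypes = {
    "snp_informative": [0, 0, 0, 2, 2, 2],
    "snp_partial": [0, 0, 1, 1, 2, 2],
    "snp_uninformative": [0, 1, 2, 0, 1, 2],
}
ranking = rank_snps(genotypes, labels)
```

In this toy case the SNP whose genotype fully separates the two groups ranks first and the SNP distributed identically in both groups ranks last. Unlike PCA, the score is computed per SNP, which is the property the abstract highlights.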

SUBMITTER: Huang T 

PROVIDER: S-EPMC4687076 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Genetic differences among ethnic groups.

Huang Tao, Shu Yang, Cai Yu-Dong

BMC Genomics, 2015-12-21



Similar Datasets

| S-EPMC3005333 | biostudies-literature
| S-EPMC3071720 | biostudies-other
| S-EPMC2729371 | biostudies-literature
| S-EPMC2515844 | biostudies-other
| S-EPMC3686566 | biostudies-literature
| S-EPMC4351376 | biostudies-literature
| S-EPMC2745423 | biostudies-literature
| S-EPMC5564542 | biostudies-other