
Dataset Information

Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset.


ABSTRACT: In this paper, we apply a gradient-boosting machine predictive model to the rheumatoid arthritis data to predict case-control status. A QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variation and building a separate predictive model in each cluster. This allows us to control for ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only yields groups that are more homogeneous in ethnicity but also, as our results show, increases the accuracy of our model compared with the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, in which the genetic axes of variation are included among the predictors, although this result may be misleading given the confounding effects.
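The per-cluster strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses PCA as a stand-in for the axes of genetic variation and k-means as a stand-in for the clustering step, and runs on synthetic genotypes rather than the Genetic Analysis Workshop 16 data. The function names `fit_per_cluster` and `predict_per_cluster`, and all parameter values, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier


def fit_per_cluster(genotypes, status, n_axes=2, n_clusters=2, seed=0):
    """Cluster subjects on leading principal axes, then fit one GBM per cluster."""
    # Assumption: PCA axes approximate the "axes of genetic variation".
    pca = PCA(n_components=n_axes, random_state=seed).fit(genotypes)
    axes = pca.transform(genotypes)
    # Assumption: k-means is used for clustering; the abstract does not name a method.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(axes)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # A classifier needs both cases and controls; skip degenerate clusters.
        if len(np.unique(status[mask])) < 2:
            continue
        gbm = GradientBoostingClassifier(random_state=seed)
        models[c] = gbm.fit(genotypes[mask], status[mask])
    return pca, km, models


def predict_per_cluster(pca, km, models, genotypes):
    """Assign each subject to a cluster and predict with that cluster's model."""
    clusters = km.predict(pca.transform(genotypes))
    pred = np.full(len(genotypes), -1)  # -1 marks subjects in a cluster with no model
    for c, model in models.items():
        mask = clusters == c
        if mask.any():
            pred[mask] = model.predict(genotypes[mask])
    return pred


# Synthetic example: two subpopulations with different allele frequencies.
rng = np.random.default_rng(0)
n_subjects, n_snps = 200, 50
population = rng.integers(0, 2, size=n_subjects)
allele_freq = np.where(population[:, None] == 0, 0.2, 0.8) * np.ones((n_subjects, n_snps))
X = rng.binomial(2, allele_freq).astype(float)  # genotypes coded 0/1/2
# Hypothetical disease model: SNP 0 drives case status, with noise.
y = (X[:, 0] + rng.normal(0, 0.5, n_subjects) > 1.0).astype(int)

pca, km, models = fit_per_cluster(X, y)
pred = predict_per_cluster(pca, km, models, X)
```

Because case status here tracks an allele whose frequency differs between the two subpopulations, a single pooled model could pick up ancestry rather than the causal SNP; fitting within clusters removes most of that between-group contrast. Predictions on the training subjects, as above, are only a smoke test; any accuracy comparison would need held-out data.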

SUBMITTER: Arshadi N 

PROVIDER: S-EPMC2795961 | biostudies-literature | 2009 Dec

REPOSITORIES: biostudies-literature

Publications

Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset.

Niloofar Arshadi, Billy Chang, Rafal Kustra

BMC Proceedings, 2009-12-15


Similar Datasets

| S-EPMC2795916 | biostudies-literature
| S-EPMC2795938 | biostudies-literature
| S-EPMC4741865 | biostudies-literature
| S-EPMC5820983 | biostudies-literature
| S-EPMC2795929 | biostudies-literature
| S-EPMC7358551 | biostudies-literature
| 2135070 | ecrin-mdr-crc
2023-05-31 | GSE186441 | GEO
| S-EPMC2795975 | biostudies-literature
| S-EPMC4485866 | biostudies-literature