Dataset Information

Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias.


ABSTRACT: Machine learning methods, including Random Forests (RF), are increasingly used for genetic data analysis. However, the standard RF algorithm does not correctly model the effects of X chromosome single nucleotide polymorphisms (SNPs), leading to biased estimates of variable importance. We propose extensions of RF to correctly model X SNPs, including a stratified approach and an approach based on the process of X chromosome inactivation. We applied the new and standard RF approaches to case-control alcohol dependence data from the Study of Addiction: Genes and Environment (SAGE), and compared the performance of the alternative approaches via a simulation study. Standard RF applied to a case-control study of alcohol dependence yielded inflated variable importance estimates for X SNPs, even when sex was included as a variable, but the results of the new RF methods were consistent with univariate regression-based approaches that correctly model X chromosome data. Simulations showed that the new RF methods eliminate the bias in standard RF variable importance for X SNPs when sex is associated with the trait, and are able to detect causal autosomal and X SNPs. Even in the absence of sex effects, the new extensions perform similarly to standard RF. Thus, we provide a powerful multimarker approach for genetic analysis that accommodates X chromosome data in an unbiased way. This method is implemented in the freely available R package "snpRF" (http://www.cran.r-project.org/web/packages/snpRF/).
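The X chromosome inactivation idea mentioned in the abstract can be illustrated with a small genotype recoding sketch. This is not the snpRF implementation; it assumes the commonly used convention in which males, who carry one X copy, are coded 0/2 so that a hemizygous male is treated like a homozygous female. The function name `recode_x_genotypes` and its arguments are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence


def recode_x_genotypes(genotypes: Sequence[int], is_male: Sequence[bool]) -> List[int]:
    """Recode X chromosome SNP genotypes under an X-inactivation-style coding.

    Assumed input coding (hypothetical, for illustration only):
      females: 0, 1, or 2 copies of the minor allele
      males:   0 or 1 copies (hemizygous)
    Output coding: females unchanged; males mapped 0 -> 0 and 1 -> 2,
    so a male carrier is placed on the same scale as a homozygous female.
    """
    if len(genotypes) != len(is_male):
        raise ValueError("genotypes and is_male must have the same length")

    recoded = []
    for genotype, male in zip(genotypes, is_male):
        if male:
            if genotype not in (0, 1):
                raise ValueError(f"invalid male X genotype: {genotype}")
            recoded.append(2 * genotype)
        else:
            if genotype not in (0, 1, 2):
                raise ValueError(f"invalid female X genotype: {genotype}")
            recoded.append(genotype)
    return recoded


if __name__ == "__main__":
    # Two males (0 and 1 copies) followed by two females (2 and 1 copies).
    print(recode_x_genotypes([0, 1, 2, 1], [True, True, False, False]))
```

With this coding, the male and female genotype scales share the same range, which is one way to keep sex from acting as a proxy for X SNP genotype when the recoded values are passed to a tree-based learner.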

SUBMITTER: Winham SJ 

PROVIDER: S-EPMC4724236 | biostudies-literature | 2016 Feb

REPOSITORIES: biostudies-literature

Publications

Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias.

Stacey J Winham, Gregory D Jenkins, Joanna M Biernacka

Genetic Epidemiology, 2015-12-07 (2)

Similar Datasets

S-EPMC8408353 | biostudies-literature
S-EPMC3750505 | biostudies-literature
S-EPMC7794504 | biostudies-literature
S-EPMC6686255 | biostudies-literature
S-EPMC4331277 | biostudies-literature
S-EPMC3287827 | biostudies-other
S-EPMC6598279 | biostudies-literature
S-EPMC5080476 | biostudies-literature
S-EPMC2651179 | biostudies-literature
S-EPMC3018816 | biostudies-other