Dataset Information

Maximal conditional chi-square importance in random forests.

ABSTRACT:

Motivation

High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features such as single nucleotide polymorphisms (SNPs) in GWAS that are associated with a disease. Random forests represent a very useful approach for this purpose, using a variable importance score. This importance score has several shortcomings. We propose an alternative importance measure to overcome those shortcomings.

Results

We characterized the effect of multiple SNPs under various models using our proposed importance measure in random forests, which uses maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. Our method was compared with a univariate test and with permutation tests using the Gini and permutation importance measures. In simulation, the proposed method consistently outperformed the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed to a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs because it uses conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and facilitates the understanding of the etiological relationship between genetic variants and complex diseases.
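The permutation step described above can be illustrated with a minimal sketch. The abstract does not define the MCC measure in detail, so the statistic below is an assumption made only for illustration: the largest Pearson chi-square of a target SNP against the trait, taken over the full sample and over strata defined by each other SNP. It is not the authors' random-forest implementation, and the function names (`chi_square`, `max_conditional_chi_square`, `empirical_p_value`) are hypothetical.

```python
import random
from collections import Counter


def chi_square(genotypes, trait):
    """Pearson chi-square statistic for a genotype-by-trait contingency table."""
    n = len(trait)
    row = Counter(genotypes)
    col = Counter(trait)
    cell = Counter(zip(genotypes, trait))
    stat = 0.0
    for g in row:
        for t in col:
            expected = row[g] * col[t] / n
            stat += (cell[(g, t)] - expected) ** 2 / expected
    return stat


def max_conditional_chi_square(target, others, trait):
    """Illustrative stand-in (an assumption, not the published MCC definition):
    the largest chi-square of `target` vs. trait, over the whole sample and
    over the strata defined by each genotype level of each other SNP."""
    best = chi_square(target, trait)
    for other in others:
        for level in set(other):
            idx = [i for i, g in enumerate(other) if g == level]
            sub_g = [target[i] for i in idx]
            sub_t = [trait[i] for i in idx]
            best = max(best, chi_square(sub_g, sub_t))
    return best


def empirical_p_value(target, others, trait, n_perm=200, seed=0):
    """Permutation test: shuffle the trait labels, recompute the statistic,
    and count how often the permuted value reaches the observed one."""
    rng = random.Random(seed)
    observed = max_conditional_chi_square(target, others, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_conditional_chi_square(target, others, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: the target SNP tracks the trait exactly; the other SNP does not.
target_snp = [0, 1] * 50
other_snp = [0, 0, 1, 1] * 25
case_status = [0, 1] * 50
p_value = empirical_p_value(target_snp, [other_snp], case_status)
```

Shuffling the trait labels breaks any association with the SNPs while leaving the correlation among SNPs intact, which is why the permuted statistics can serve as a null reference for the observed one.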

Contact

heping.zhang@yale.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Wang M

PROVIDER: S-EPMC2832825 | biostudies-literature | 2010 Mar

REPOSITORIES: biostudies-literature

Publications

Maximal conditional chi-square importance in random forests.

Minghui Wang, Xiang Chen, Heping Zhang

Bioinformatics (Oxford, England), 2010-02-03, issue 6

PMID: 20130032

Similar Datasets

Variable importance-weighted Random Forests. | S-EPMC6051549 | biostudies-literature

An AUC-based permutation variable importance measure for random forests. | S-EPMC3626572 | biostudies-literature

The Chi-Square Test of Distance Correlation. | S-EPMC9191842 | biostudies-literature

Surrogate minimal depth as an importance measure for variables in random forests. | S-EPMC6761946 | biostudies-literature

Variable importance for sustaining macrophyte presence via random forests: data imputation and model settings. | S-EPMC6162213 | biostudies-literature

Intervention in prediction measure: a new approach to assessing variable importance for random forests. | S-EPMC5414143 | biostudies-literature

Random Forests Based Group Importance Scores and Their Statistical Interpretation: Application for Alzheimer's Disease. | S-EPMC6034092 | biostudies-literature

OBLIQUE RANDOM SURVIVAL FORESTS | S-EPMC9875945 | biostudies-literature

A Powerful Variant-Set Association Test Based on Chi-Square Distribution. | S-EPMC5669628 | biostudies-literature

Aggregated recommendation through random forests. | S-EPMC4142736 | biostudies-other