
Dataset Information


TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions.


ABSTRACT: Studies have shown that interactions of single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. We propose an integrated method that combines two machine learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to detect a predictive subset of SNPs, and MARS is then used to identify the interaction patterns. We evaluated the performance of TRM under four models. RF variable selection was based on the out-of-bag classification error rate (OOB) and the variable importance spectrum (IS). Our results indicate that RF(OOB) performed better than MARS and RF(IS) in detecting important variables. This study demonstrates that TRM(OOB), which is RF(OOB) followed by MARS, combines the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM(OOB) had a higher true positive rate and a lower false positive rate than MARS, particularly when searching for interactions with a strong association with the outcome. Therefore, TRM(OOB) is favored for exploring SNP-SNP interactions in large-scale genetic variation studies.
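As a rough illustration of why a selection stage before MARS can help, the sketch below shows the standard MARS building block (a hinge function), how a two-way interaction term is formed as a product of two hinges, and how the number of candidate SNP pairs shrinks once a subset has been selected. It is not the authors' implementation: the genotype coding as minor-allele counts 0/1/2, the knot values, and the SNP names are all assumptions made for the example, and the RF stage itself is not implemented here.

```python
from itertools import combinations


def hinge(x, knot, direction=1):
    """MARS hinge basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def interaction_term(genotypes, snp_a, snp_b, knot_a=0, knot_b=0):
    """Two-way interaction basis: the product of one hinge per SNP.

    `genotypes` maps a SNP name to an assumed minor-allele count (0, 1 or 2).
    The term is nonzero only when both SNPs exceed their knots.
    """
    return hinge(genotypes[snp_a], knot_a) * hinge(genotypes[snp_b], knot_b)


def candidate_pairs(selected_snps):
    """All unordered SNP pairs that a second stage would need to consider."""
    return list(combinations(sorted(selected_snps), 2))


# Hypothetical subject and hypothetical SNP names, for illustration only.
subject = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
term_ab = interaction_term(subject, "snp_a", "snp_b")  # both above knot 0
term_ac = interaction_term(subject, "snp_a", "snp_c")  # snp_c is at the knot

# Search space before and after a (hypothetical) first-stage selection.
pairs_all = candidate_pairs(range(100))                    # 100 candidate SNPs
pairs_subset = candidate_pairs(["snp_a", "snp_b", "snp_c"])
print(len(pairs_all), len(pairs_subset))
```

With 100 candidate SNPs there are 4950 possible pairs, so restricting the interaction search to a small selected subset reduces the number of product terms a spline model has to examine.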

SUBMITTER: Lin HY 

PROVIDER: S-EPMC3243917 | biostudies-literature | 2012 Jan

REPOSITORIES: biostudies-literature


Publications

TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions.

Lin Hui-Yi, Chen Y Ann, Tsai Ya-Yu, Qu Xiaotao, Tseng Tung-Sung, Park Jong Y

Annals of Human Genetics, 2011-12-11, 1



Similar Datasets

2023-06-01 | GSE193400 | GEO
| S-EPMC9499949 | biostudies-literature
| PRJNA796028 | ENA
| S-EPMC10601900 | biostudies-literature
| S-EPMC5600667 | biostudies-literature
| S-EPMC3235098 | biostudies-literature
| S-EPMC7396586 | biostudies-literature
| S-EPMC11229760 | biostudies-literature
| S-EPMC9245788 | biostudies-literature
| S-EPMC11371525 | biostudies-literature