Dataset Information

Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests.


ABSTRACT: Random forest is an efficient approach for investigating not only the effects of individual markers on a trait but also the effects of interactions among the markers in genetic association studies. This approach is especially appealing for the analysis of genome-wide data, such as those obtained from gene expression or single-nucleotide polymorphism (SNP) array experiments, in which the number of candidate genes or SNPs is vast. We applied this approach to the Genetic Analysis Workshop 16 Problem 1 data to identify SNPs that contribute to rheumatoid arthritis. The random forest computed a raw importance score for each SNP marker, where a higher importance score suggests a stronger association between the marker and the trait. The significance level of the association was determined empirically by repeatedly reapplying the random forest to randomly generated data under the null hypothesis that no association exists between the markers and the trait. Using random forest, we identified 228 significant SNPs (at the genome-wide significance level of 0.05) across the whole genome, over two-thirds of which are located on chromosome 6, clustered especially in the 6p21 region containing the human leukocyte antigen (HLA) genes, such as HLA-DRB1 and HLA-DRA. Further analysis of this region indicates a strong association with rheumatoid arthritis status.
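The empirical significance step described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: the null data are generated by permuting case/control labels (the abstract only says "randomly generated data under the null hypothesis"), the genome-wide threshold is taken as the (1 - alpha) quantile of the maximum null score, and `mean_diff_importance` is a deliberately simple stand-in for the random forest raw importance score so that the example runs with the standard library alone. All function names and the toy data are hypothetical.

```python
import random


def mean_diff_importance(genotypes, labels):
    """Stand-in importance score (NOT a random forest): for each SNP, the
    absolute difference in mean genotype between cases (1) and controls (0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    scores = []
    for j in range(len(genotypes[0])):
        case_mean = sum(g[j] for g in cases) / len(cases)
        control_mean = sum(g[j] for g in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    return scores


def empirical_threshold(genotypes, labels, importance_fn,
                        n_perm=200, alpha=0.05, seed=0):
    """Assumed null model: shuffle the labels, recompute all scores, and keep
    the largest one. The (1 - alpha) quantile of those maxima serves as a
    genome-wide threshold."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(importance_fn(genotypes, shuffled)))
    null_max.sort()
    idx = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[idx]


def significant_snps(genotypes, labels, importance_fn, **kwargs):
    """Return indices of SNPs whose observed score exceeds the threshold."""
    threshold = empirical_threshold(genotypes, labels, importance_fn, **kwargs)
    observed = importance_fn(genotypes, labels)
    hits = [j for j, s in enumerate(observed) if s > threshold]
    return hits, threshold


# Toy data (hypothetical): 100 individuals, 10 SNPs coded 0/1/2.
# SNP 0 is made perfectly associated with case status; the rest are noise.
data_rng = random.Random(1)
labels = [1] * 50 + [0] * 50
genotypes = [[2 * y] + [data_rng.randint(0, 2) for _ in range(9)]
             for y in labels]

hits, threshold = significant_snps(genotypes, labels, mean_diff_importance)
print(hits, threshold)
```

In a real analysis, `importance_fn` would be replaced by a function that fits a random forest and returns its per-SNP importance scores; the permutation logic would stay the same, although each permutation then requires refitting the forest, which is computationally expensive at genome-wide scale.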

SUBMITTER: Wang M 

PROVIDER: S-EPMC2795970 | biostudies-literature | 2009 Dec

REPOSITORIES: biostudies-literature

Publications

Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests.

Minghui Wang, Xiang Chen, Meizhuo Zhang, Wensheng Zhu, Kelly Cho, Heping Zhang

BMC Proceedings, 2009 Dec 15


Similar Datasets

| S-EPMC2367463 | biostudies-literature
| S-EPMC1526613 | biostudies-literature
| S-EPMC6986563 | biostudies-literature
| S-EPMC2795969 | biostudies-literature
| S-EPMC10419658 | biostudies-literature
| S-EPMC4167828 | biostudies-literature
| S-EPMC2367457 | biostudies-literature
| S-EPMC2911851 | biostudies-literature
| S-EPMC8440805 | biostudies-literature
| S-EPMC5400544 | biostudies-literature