
Dataset Information


Bootstrap aggregating of alternating decision trees to detect sets of SNPs that associate with disease.


ABSTRACT: Complex genetic disorders result from a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify the multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer an algorithm of low computational complexity that is capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including in modern genome-wide SNP scans. However, interpreting the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm, denoted Bagged Alternating Decision Trees (BADTrees), that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is of order nk^2, where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone, and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of the method using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
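The bootstrap aggregating idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' BADTrees implementation: the base learner here is a deliberately simple stand-in (ranking SNPs by the difference in mean genotype between cases and controls) rather than an Alternating Decision Tree, and all function names (`association_score`, `select_snps`, `bagged_selection_frequency`) are hypothetical. The sketch only shows the general pattern of resampling individuals with replacement, fitting a learner to each resample, and recording how often each SNP is selected across resamples.

```python
import random
from collections import Counter


def association_score(genotypes, labels, snp):
    """Absolute difference in mean genotype value between cases (1) and controls (0)."""
    cases = [g[snp] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[snp] for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not controls:
        return 0.0  # a resample with only one class carries no signal
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def select_snps(genotypes, labels, k):
    """Stand-in base learner (NOT an ADTree): return the k highest-scoring SNP indices."""
    n_snps = len(genotypes[0])
    ranked = sorted(
        range(n_snps),
        key=lambda s: association_score(genotypes, labels, s),
        reverse=True,
    )
    return ranked[:k]


def bagged_selection_frequency(genotypes, labels, k=2, n_bootstraps=50, seed=0):
    """Fraction of bootstrap resamples in which each SNP is selected by the base learner."""
    rng = random.Random(seed)
    n = len(genotypes)
    counts = Counter()
    for _ in range(n_bootstraps):
        # Draw n individuals with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        sample_g = [genotypes[i] for i in idx]
        sample_y = [labels[i] for i in idx]
        counts.update(select_snps(sample_g, sample_y, k))
    return {snp: counts[snp] / n_bootstraps for snp in range(len(genotypes[0]))}
```

Under these assumptions, `genotypes` is a list of per-individual lists of genotype codes (for example 0, 1, or 2 copies of the minor allele) and `labels` is a list of 0/1 disease status values. SNPs with a selection frequency near 1 are those the stand-in learner picks consistently across resamples, which is loosely analogous to the "common structural elements" the abstract refers to.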

SUBMITTER: Guy RT 

PROVIDER: S-EPMC3769952 | biostudies-literature | 2012 Feb

REPOSITORIES: biostudies-literature


Publications

Bootstrap aggregating of alternating decision trees to detect sets of SNPs that associate with disease.

Guy RT, Santago P, Langefeld CD

Genetic Epidemiology, 2012 Feb 1, issue 2



Similar Datasets

| S-EPMC2914027 | biostudies-literature
| S-EPMC2825231 | biostudies-literature
| S-EPMC4572492 | biostudies-literature
| S-EPMC3006123 | biostudies-literature
| S-EPMC6526911 | biostudies-literature
| S-EPMC10317128 | biostudies-literature
| S-EPMC7064639 | biostudies-literature
| S-EPMC3021847 | biostudies-literature
| S-EPMC7763457 | biostudies-literature
| S-EPMC3808618 | biostudies-literature