
Dataset Information


Efficient identification of context dependent subgroups of risk from genome-wide association studies.


ABSTRACT: We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. The strategy reaches a computational limit when the number of genomic variations (predictor variables) under study is large (>500), because permutations are used to generate a null distribution for testing the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes of a PRIM analysis. When a permutation-based hypothesis test is employed, these errors are under-estimated and non-existent, respectively. The resultant savings in computational time make it possible to consider larger numbers of genomic variations (an example genome-wide association study is given) when selecting statistically significant terms for PRIM prediction models.
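The permutation-based test that the abstract identifies as the computational bottleneck can be illustrated with a minimal sketch. This is not the authors' PRIM implementation: the function name `permutation_p_value`, its parameters, and the choice of the subgroup's mean outcome as the test statistic are all assumptions made for illustration. It only shows how a null distribution is built by reshuffling outcomes, and why the cost grows with the number of permutations and the number of candidate terms tested.

```python
import random


def permutation_p_value(outcomes, in_subgroup, n_perm=2000, seed=0):
    """One-sided permutation p-value for a subgroup's mean outcome.

    Hypothetical helper: `outcomes` is a list of 0/1 disease indicators and
    `in_subgroup` a parallel list of booleans marking the individuals
    selected by a candidate term.
    """
    k = sum(1 for m in in_subgroup if m)
    if k == 0:
        raise ValueError("subgroup must contain at least one individual")

    # Observed statistic: disease frequency inside the subgroup.
    observed = sum(y for y, m in zip(outcomes, in_subgroup) if m) / k

    rng = random.Random(seed)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the outcomes and taking the first k is one random
        # assignment of k individuals to the subgroup under the null.
        rng.shuffle(shuffled)
        if sum(shuffled[:k]) / k >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Each call costs on the order of `n_perm` passes over the sample, and a peeling or pasting step would repeat it for every candidate term, which is consistent with the abstract's point that the approach becomes impractical for large numbers of predictor variables.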

SUBMITTER: Dyson G 

PROVIDER: S-EPMC4171947 | biostudies-literature | 2014 Apr

REPOSITORIES: biostudies-literature


Publications

Efficient identification of context dependent subgroups of risk from genome-wide association studies.

Greg Dyson, Charles F. Sing

Statistical Applications in Genetics and Molecular Biology, 2014-04-01, issue 2



Similar Datasets

| S-EPMC4211878 | biostudies-other
| S-EPMC3386377 | biostudies-literature
| S-EPMC2964405 | biostudies-literature
| S-EPMC3554627 | biostudies-literature
| S-EPMC7029489 | biostudies-literature
| S-EPMC4866522 | biostudies-literature
| S-EPMC3247851 | biostudies-other
2021-06-01 | GSE172368 | GEO
| S-EPMC4373116 | biostudies-literature
| S-EPMC6018732 | biostudies-literature