
Dataset Information


A decision-theory approach to interpretable set analysis for high-dimensional data.


ABSTRACT: A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
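The decision rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal Python sketch under three loudly stated assumptions. First, an "atom" is taken to be the group of variables that share exactly the same pattern of set memberships, so atoms are disjoint and every original set is a union of atoms. Second, the afdr of an atom is taken to be the mean posterior probability that its variables are null (their local false discovery rates, supplied here as the hypothetical input `lfdr`). Third, the loss is taken as `w` times the number of false discoveries plus `1 - w` times the number of missed discoveries. Under these assumptions, including an atom A has expected loss w·|A|·afdr and excluding it has expected loss (1 − w)·|A|·(1 − afdr), so the optimal rule thresholds the afdr at 1 − w. The function names `build_atoms`, `atomic_fdr` and `select_atoms` and the toy probabilities are hypothetical.

```python
from collections import defaultdict


def build_atoms(gene_sets):
    """Group variables by the exact collection of sets that contain them.

    Assumption: an atom is the set of variables sharing one membership
    pattern, so atoms are disjoint and each original set is a union of atoms.
    Returns a dict mapping frozenset(set names) -> set of variables.
    """
    membership = defaultdict(set)
    for set_name, members in gene_sets.items():
        for var in members:
            membership[var].add(set_name)
    atoms = defaultdict(set)
    for var, sets in membership.items():
        atoms[frozenset(sets)].add(var)
    return dict(atoms)


def atomic_fdr(atom, lfdr):
    """Hypothetical afdr: mean posterior probability that a variable is null."""
    return sum(lfdr[v] for v in atom) / len(atom)


def select_atoms(atoms, lfdr, w):
    """Keep the atoms whose afdr is below 1 - w.

    Under the assumed loss w * (false discoveries) + (1 - w) * (missed
    discoveries), including an atom is optimal exactly when afdr < 1 - w.
    """
    return {key: members for key, members in atoms.items()
            if atomic_fdr(members, lfdr) < 1 - w}


# Toy example with made-up posterior null probabilities.
gene_sets = {"setA": {"g1", "g2", "g3"}, "setB": {"g3", "g4"}}
lfdr = {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.8}
atoms = build_atoms(gene_sets)
selected = select_atoms(atoms, lfdr, w=0.5)
```

In the toy example, the overlapping sets `setA` and `setB` split into three disjoint atoms: {g1, g2}, {g3} and {g4}. With `w=0.5` only the atom {g1, g2} (afdr 0.15) falls below the threshold of 0.5. Lowering `w` penalizes false discoveries less, which raises the threshold and admits more atoms.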

SUBMITTER: Boca SM 

PROVIDER: S-EPMC3927844 | biostudies-literature | 2013 Sep

REPOSITORIES: biostudies-literature


Publications

A decision-theory approach to interpretable set analysis for high-dimensional data.

Boca Simina M SM, Bravo Héctor Corrada HC, Caffo Brian B, Leek Jeffrey T JT, Parmigiani Giovanni G

Biometrics, 2013-08-02, issue 3



Similar Datasets

| S-EPMC6175336 | biostudies-literature
| S-EPMC8575033 | biostudies-literature
| S-EPMC3443674 | biostudies-literature
| S-EPMC7083016 | biostudies-literature
| S-EPMC5862270 | biostudies-literature
| S-EPMC8599814 | biostudies-literature
| S-EPMC8782527 | biostudies-literature
2016-09-01 | E-GEOD-70405 | biostudies-arrayexpress
| S-EPMC4834947 | biostudies-literature
| S-EPMC2682540 | biostudies-literature