
Dataset Information


Gene set bagging for estimating the probability a statistically significant result will replicate.


ABSTRACT: BACKGROUND: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called gene set bagging, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples. RESULTS: Using both simulated and publicly available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. We show that our method estimates the replication probability (R), the probability that a gene set will replicate as a significant result in future studies, and show in simulations that this method reflects replication better than each set's p-value. CONCLUSIONS: Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.
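
The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The per-gene Welch t-statistic, the "top k genes" rule, the hypergeometric enrichment test, the within-group bootstrap, and every function and parameter name (`replication_probability`, `top_k`, `alpha`, `make_toy_data`) are assumptions chosen for illustration. The estimate of R is taken here as the fraction of bootstrap resamples in which a gene set is called significant.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for two lists of values (0.0 if both are constant)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    total = math.comb(N, n)
    upper = min(K, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)) / total


def significant_sets(expr, labels, gene_sets, top_k, alpha):
    """Names of gene sets enriched (p < alpha) among the top_k genes by |t|."""
    cases = [j for j, lab in enumerate(labels) if lab == 1]
    ctrls = [j for j, lab in enumerate(labels) if lab == 0]
    scores = {
        gene: abs(welch_t([row[j] for j in cases], [row[j] for j in ctrls]))
        for gene, row in expr.items()
    }
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    hits = set()
    for name, members in gene_sets.items():
        members = set(members) & set(expr)
        p = hypergeom_tail(len(members & top), len(expr), len(members), len(top))
        if p < alpha:
            hits.add(name)
    return hits


def replication_probability(expr, labels, gene_sets, n_boot=100, top_k=10,
                            alpha=0.05, seed=0):
    """Fraction of bootstrap resamples in which each gene set is significant."""
    rng = random.Random(seed)
    cases = [j for j, lab in enumerate(labels) if lab == 1]
    ctrls = [j for j, lab in enumerate(labels) if lab == 0]
    counts = {name: 0 for name in gene_sets}
    for _ in range(n_boot):
        # Assumption: resample subjects with replacement within each group.
        idx = [rng.choice(cases) for _ in cases] + [rng.choice(ctrls) for _ in ctrls]
        boot_expr = {gene: [row[j] for j in idx] for gene, row in expr.items()}
        boot_labels = [labels[j] for j in idx]
        for name in significant_sets(boot_expr, boot_labels, gene_sets, top_k, alpha):
            counts[name] += 1
    return {name: c / n_boot for name, c in counts.items()}


def make_toy_data(seed=1, n_genes=40, n_per_group=10, shift=3.0):
    """Hypothetical simulated data: the first 10 genes are shifted in cases."""
    rng = random.Random(seed)
    labels = [1] * n_per_group + [0] * n_per_group
    expr = {}
    for g in range(n_genes):
        bump = shift if g < 10 else 0.0
        expr[f"g{g}"] = [rng.gauss(bump if lab == 1 else 0.0, 1.0) for lab in labels]
    gene_sets = {
        "signal_set": [f"g{g}" for g in range(10)],
        "null_set": [f"g{g}" for g in range(20, 30)],
    }
    return expr, labels, gene_sets


if __name__ == "__main__":
    expr, labels, gene_sets = make_toy_data()
    print(replication_probability(expr, labels, gene_sets, n_boot=50))
```

On the toy data, a set built from the shifted genes should be called significant in far more resamples than a set of unshifted genes. Any enrichment test could be substituted for the hypergeometric test without changing the bagging loop.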

SUBMITTER: Jaffe AE 

PROVIDER: S-EPMC3890500 | biostudies-literature | 2013 Dec

REPOSITORIES: biostudies-literature


Publications

Gene set bagging for estimating the probability a statistically significant result will replicate.

Jaffe AE, Storey JD, Ji H, Leek JT

BMC Bioinformatics, 2013-12-12



Similar Datasets

| S-EPMC3179615 | biostudies-literature
| S-EPMC3084717 | biostudies-literature
| S-EPMC2943622 | biostudies-literature
| S-EPMC1200092 | biostudies-literature
| S-EPMC5918465 | biostudies-other
| S-EPMC6691336 | biostudies-literature
| S-EPMC10616627 | biostudies-literature
| S-EPMC3688764 | biostudies-literature
| S-EPMC5181558 | biostudies-literature
| S-EPMC7037884 | biostudies-literature