Dataset Information

Using controls to limit false discovery in the era of big data.

ABSTRACT: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results. We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics. The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting.
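
ILLUSTRATION: The abstract's idea of comparing the empirical distribution of control data with that of test data can be sketched roughly as follows. This is not the authors' procedure, which this record does not spell out; it is a minimal, generic control-based FDR estimate. The function name `empirical_fdr`, the convention that larger statistics are more extreme, and the conservative choice to treat every test unit as potentially null are all assumptions made for the example.

```python
from bisect import bisect_left


def empirical_fdr(test_stats, control_stats, threshold):
    """Estimate the FDR among test statistics >= threshold.

    Hypothetical helper: assumes larger statistics are more extreme and
    that control_stats are draws from the null distribution.
    """
    n_test = len(test_stats)
    n_control = len(control_stats)
    if n_test == 0 or n_control == 0:
        raise ValueError("both samples must be non-empty")

    control_sorted = sorted(control_stats)
    test_sorted = sorted(test_stats)

    # Number of values at or beyond the threshold in each sample.
    control_hits = n_control - bisect_left(control_sorted, threshold)
    test_hits = n_test - bisect_left(test_sorted, threshold)

    if test_hits == 0:
        return 0.0  # no discoveries, so no false discoveries

    # Expected count of null test statistics beyond the threshold,
    # conservatively assuming every test unit could be null.
    expected_false = (control_hits / n_control) * n_test
    return min(1.0, expected_false / test_hits)
```

Called with a list of test statistics, a list of control statistics, and a cutoff, the function returns the estimated fraction of false discoveries among the test statistics at or above that cutoff. No p-values or parametric tail model are involved; the estimate is only as fine-grained as the number of control observations allows.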

SUBMITTER: Parks MM

PROVIDER: S-EPMC6137876 | biostudies-other | 2018 Sep

REPOSITORIES: biostudies-other


Publications

Using controls to limit false discovery in the era of big data.

Matthew M. Parks, Benjamin J. Raphael, Charles E. Lawrence

BMC Bioinformatics, 2018 Sep 14, issue 1


PMID: 30217148

Similar Datasets

SYMPOSIUM - ICPRP 2019-ERA OF BIG DATA | S-EPMC8021051 | biostudies-literature

Studying alcohol use disorder using Drosophila melanogaster in the era of 'Big Data'. | S-EPMC6469124 | biostudies-literature

References for Haplotype Imputation in the Big Data Era. | S-EPMC4888899 | biostudies-literature

Local false discovery rate estimation using feature reliability in LC/MS metabolomics data. | S-EPMC4657040 | biostudies-literature

Storing, combining and analysing turkey experimental data in the Big Data era. | S-EPMC7538337 | biostudies-literature

Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. | S-EPMC5741828 | biostudies-literature

Antibody-Antigen Binding Interface Analysis in the Big Data Era | S-EPMC9329859 | biostudies-literature

Renewing Felsenstein's phylogenetic bootstrap in the era of big data. | S-EPMC6030568 | biostudies-literature

Optimal False Discovery Rate Control for Dependent Data. | S-EPMC3559028 | biostudies-literature

Discovery of preventive drugs for cisplatin-induced acute kidney injury using big data analysis. | S-EPMC9283743 | biostudies-literature