Dataset Information

A non-randomized procedure for large-scale heterogeneous multiple discrete testing based on randomized tests.

ABSTRACT: In the analysis of next-generation sequencing data, massive amounts of discrete data are generated from short-read counts with varying biological coverage. Conducting a conditional hypothesis test, such as Fisher's exact test, at every genomic region of interest thus leads to a heterogeneous multiple discrete testing problem. However, most existing multiple testing procedures for controlling the false discovery rate (FDR) assume that test statistics are continuous, and they become conservative for discrete tests. To overcome this conservativeness, in this article we propose a novel multiple testing procedure for better FDR control on heterogeneous discrete tests. Our procedure makes decisions based on the marginal critical function (MCF) of randomized tests, which yields a powerful, non-randomized multiple testing procedure. We provide upper bounds on the positive FDR (pFDR) and the positive false non-discovery rate (pFNR) of our procedure. We also prove that the set of detections made by our method contains every detection made by a naive application of the widely used q-value method. We further demonstrate the improvement of our method over other existing multiple testing procedures through simulations and a real example of differentially methylated region (DMR) detection using whole-genome bisulfite sequencing (WGBS) data.

SUBMITTER: Dai X

PROVIDER: S-EPMC6565503 | biostudies-literature | 2019 Jun

REPOSITORIES: biostudies-literature


Publications

A non-randomized procedure for large-scale heterogeneous multiple discrete testing based on randomized tests.

Dai Xiaoyu, Lin Nan, Li Daofeng, Wang Ting

Biometrics, 2019-03-09, issue 2

PMID: 30387496

Similar Datasets

UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene.
| S-EPMC6927121 | biostudies-literature

Large-Scale Multiple Testing of Correlations.
| S-EPMC4894362 | biostudies-literature

False Discovery Control in Large-Scale Spatial Multiple Testing.
| S-EPMC4310249 | biostudies-literature

A robust method for large-scale multiple hypotheses testing.
| S-EPMC3960085 | biostudies-literature

Bayesian Hidden Markov Models for Dependent Large-Scale Multiple Testing.
| S-EPMC6818740 | biostudies-literature

Weighted False Discovery Rate Control in Large-Scale Multiple Testing.
| S-EPMC6474384 | biostudies-literature

Post hoc power estimation in large-scale multiple testing problems.
| S-EPMC3500624 | biostudies-literature

Large-scale protein function prediction using heterogeneous ensembles.
| S-EPMC6221071 | biostudies-literature

Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing.
| S-EPMC6668431 | biostudies-literature

A Semiautomated ChIP-Seq Procedure for Large-scale Epigenetic Studies.
| S-EPMC7870284 | biostudies-literature