
Dataset Information


RNASeqDesign: A framework for RNA-Seq genome-wide power calculation and study design issues.


ABSTRACT: Massively parallel sequencing (a.k.a. next-generation sequencing, NGS) technology has emerged as a powerful tool in characterizing genomic profiles. Among many NGS applications, RNA sequencing (RNA-Seq) has gradually become a standard tool for global transcriptomic monitoring. Although the cost of NGS experiments has dropped steadily, the high sequencing cost and bioinformatic complexity are still obstacles for many biomedical projects. Unlike earlier fluorescence-based technologies such as microarrays, modelling of NGS data should consider discrete count data. In addition to sample size, sequencing depth also directly relates to the experimental cost. Consequently, given a total budget and a pre-specified unit experimental cost, the study design issue in RNA-Seq is conceptually a multi-dimensional constrained optimization problem, more complex than the one-dimensional sample size calculation of the traditional hypothesis-testing setting. In this paper, we propose a statistical framework, namely "RNASeqDesign", to utilize pilot data for power calculation and study design of RNA-Seq experiments. The approach is based on mixture model fitting of the p-value distribution from pilot data and a parametric bootstrap procedure based on approximated Wald test statistics to infer genome-wide power for optimal sample size and sequencing depth. We further illustrate five practical study design tasks for practitioners. We perform simulations and three real applications to evaluate the performance and compare it with existing methods.
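The budget-constrained design problem described in the abstract can be illustrated with a small enumeration. This is a minimal sketch, not the RNASeqDesign implementation: the cost model (a fixed cost per sample plus a cost per million reads), the function names `total_cost`, `best_design` and `toy_power`, and all numbers are assumptions made for illustration. In particular, `toy_power` is an arbitrary placeholder with diminishing returns; in the framework above, genome-wide power would instead be estimated from pilot data.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

# A design is (samples per group, millions of reads per sample).
Design = Tuple[int, int]


def total_cost(n_per_group: int, depth_m: int, cost_per_sample: float,
               cost_per_m_reads: float, groups: int = 2) -> float:
    """Assumed cost model: every sample pays a fixed cost plus a per-read cost."""
    return groups * n_per_group * (cost_per_sample + cost_per_m_reads * depth_m)


def best_design(budget: float, cost_per_sample: float, cost_per_m_reads: float,
                depths: Iterable[int], power_fn: Callable[[int, int], float],
                max_n: int = 200) -> Tuple[Optional[Design], float]:
    """Search all affordable (n, depth) pairs and return the one with highest power."""
    best: Optional[Design] = None
    best_power = -1.0
    for depth in depths:
        for n in range(2, max_n + 1):
            if total_cost(n, depth, cost_per_sample, cost_per_m_reads) > budget:
                break  # cost increases with n, so larger n is also unaffordable
            power = power_fn(n, depth)
            if power > best_power:
                best, best_power = (n, depth), power
    return best, best_power


def toy_power(n_per_group: int, depth_m: int) -> float:
    """Hypothetical stand-in for a pilot-data-based power estimate."""
    return 1.0 - math.exp(-0.08 * n_per_group * math.sqrt(depth_m / 10))


design, power = best_design(budget=20000, cost_per_sample=300,
                            cost_per_m_reads=20, depths=[5, 10, 20, 40],
                            power_fn=toy_power)
print(design, round(power, 3))
```

The point of the sketch is only the shape of the problem: sample size and sequencing depth trade off against each other under one budget, so the search runs over two dimensions rather than over sample size alone.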

SUBMITTER: Lin CW 

PROVIDER: S-EPMC7941184 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature


Publications

RNASeqDesign: A framework for RNA-Seq genome-wide power calculation and study design issues.

Chien-Wei Lin, Serena G. Liao, Peng Liu, Yong Seok Park, Mei-Ling Ting Lee, George C. Tseng

Journal of the Royal Statistical Society. Series C, Applied Statistics, 2018-12-09, issue 3



Similar Datasets

| S-EPMC7846147 | biostudies-literature
| S-ECPF-GEOD-40591 | biostudies-other
| S-EPMC3193274 | biostudies-other
| S-EPMC4159408 | biostudies-other
| S-EPMC7885420 | biostudies-literature
| S-ECPF-GEOD-45335 | biostudies-other
| S-ECPF-GEOD-53532 | biostudies-other
| S-EPMC7607860 | biostudies-literature
| S-EPMC5551484 | biostudies-other
| S-EPMC5962340 | biostudies-literature