Dataset Information

A powerful statistical approach for large-scale differential transcription analysis.

ABSTRACT: Next-generation sequencing (NGS) is increasingly used for transcriptome-wide analysis of differential gene expression. NGS data are multidimensional count data, so most statistical methods developed for microarray data analysis are not applicable to transcriptomic data. For this reason, a variety of new statistical methods based on transcript read counts have been proposed. However, because of high costs and limited biological resources, current NGS data are still generated from only a few replicate libraries. Some existing methods do not always perform well on count data. Here we developed a powerful and robust statistical method based on the beta and binomial distributions. Our method (the mBeta t-test) is specifically applicable to sequence count data from small samples. On both simulated and real transcriptomic data, the mBeta t-test significantly outperformed the top existing statistical methods chosen for comparison in all 12 scenarios tested, and it performed with high efficiency and stability. The differentially expressed genes found by our method in real transcriptomic data were validated by qPCR experiments. Our method shows high power in detecting truly differential expression, estimates the FDR conservatively, and is highly stable on RNA sequence count data derived from small samples. It can also be extended to genome-wide detection of differential splicing events.
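The abstract names the ingredients of the mBeta t-test (beta and binomial distributions, a t-test, FDR estimation) but not its formulas. The Python sketch below is a generic illustration under our own assumptions and is NOT the authors' mBeta t-test: each replicate's read count for a gene is treated as Binomial(library size, p), p is given a Jeffreys Beta(0.5, 0.5) prior, the per-replicate posterior mean proportions are compared between two groups with Welch's t statistic, and the Benjamini-Hochberg procedure is swapped in as a stand-in for the paper's FDR estimation. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def beta_posterior_mean(count, library_size, a=0.5, b=0.5):
    """Posterior mean of a gene's expression proportion p, ASSUMING
    count ~ Binomial(library_size, p) with a Beta(a, b) prior on p.
    By conjugacy the posterior is Beta(a + count, b + library_size - count)."""
    return (a + count) / (a + b + library_size)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    Needs at least 2 values per group; raises ZeroDivisionError if both
    groups have zero variance."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def gene_statistic(counts_a, libs_a, counts_b, libs_b):
    """Hypothetical per-gene test: shrink each replicate's count to a
    beta-posterior proportion, then compare the two groups with Welch's t."""
    pa = [beta_posterior_mean(k, n) for k, n in zip(counts_a, libs_a)]
    pb = [beta_posterior_mean(k, n) for k, n in zip(counts_b, libs_b)]
    return welch_t(pa, pb)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control),
    returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Three replicate libraries per condition (illustrative numbers only).
    t, df = gene_statistic([120, 130, 125], [1_000_000] * 3,
                           [40, 45, 38], [1_000_000] * 3)
    print(f"t = {t:.2f}, df = {df:.2f}")
    print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

Converting t and df into p-values requires a Student t CDF (for example `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free. The beta prior keeps zero counts from producing proportions of exactly 0, which matters most when only a few replicates are available.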

SUBMITTER: Tan YD

PROVIDER: S-EPMC4404056 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

A powerful statistical approach for large-scale differential transcription analysis.

Yuan-De Tan, Anita M. Chandler, Arindam Chaudhury, Joel R. Neilson

PLoS ONE, 20 April 2015, issue 4


PMID: 25894390


OmicsDI is part of the ELIXIR infrastructure


Similar Datasets

A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies.
| S-EPMC10634938 | biostudies-literature

Stability SCAD: a powerful approach to detect interactions in large-scale genomic study.
| S-EPMC3984751 | biostudies-literature

MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations.
| S-EPMC8189684 | biostudies-literature

Universal probabilistic programming offers a powerful approach to statistical phylogenetics.
| S-EPMC7904853 | biostudies-literature

Large-scale computational and statistical analyses of high transcription potentialities in 32 prokaryotic genomes.
| S-EPMC2425493 | biostudies-literature

Statistical inference with large-scale trait imputation.
| S-EPMC10848238 | biostudies-literature

Diffacto quantification-centered large-scale differential analysis
2016-06-11 | MSV000079811 | MassIVE

Using Large-Scale Statistical Chinese Brain Template (Chinese2020) in Popular Neuroimage Analysis Toolkits.
| S-EPMC5562686 | biostudies-literature

Cellular arrays for large-scale analysis of transcription factor activity.
| S-EPMC3022829 | biostudies-literature

Large-scale differential proteome analysis in Plasmodium falciparum under drug treatment.
| S-EPMC2605551 | biostudies-literature