Dataset Information

Length bias correction for RNA-seq data in gene set analyses.

ABSTRACT: Next-generation sequencing technologies are being rapidly applied to quantifying transcripts (RNA-seq). However, due to the unique properties of RNA-seq data, the differential expression of longer transcripts is more likely to be identified than that of shorter transcripts with the same effect size. This bias complicates downstream gene set analysis (GSA) because the GSA methods previously developed for microarray data assume that genes with the same effect size have equal probability (power) of being identified as significantly differentially expressed. Since transcript length is not related to gene expression, adjusting for such length dependency in GSA becomes necessary. In this article, we proposed two approaches for transcript-length adjustment for analyses based on Poisson models: (i) At the individual gene level, we adjusted each gene's test statistic using the square root of transcript length, followed by testing for gene sets using the Wilcoxon rank-sum test. (ii) At the gene set level, we adjusted the null distribution for Fisher's exact test by weighting the identification probability of each gene using the square root of its transcript length. We evaluated these two approaches using simulations and a real dataset, and showed that these methods can effectively reduce the transcript-length biases. The top-ranked GO terms obtained from the proposed adjustments show more overlap with the microarray results. R scripts are at http://www.soph.uab.edu/Statgenetics/People/XCui/r-codes/.
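Approach (i) in the abstract can be illustrated with a small sketch. This is not the authors' R code: it assumes the adjustment is a plain division of each gene's statistic by the square root of its transcript length, and it uses a normal approximation to the Wilcoxon rank-sum test without a tie correction for the variance. The function names `length_adjusted`, `average_ranks`, and `rank_sum_test` are hypothetical and exist only for this example.

```python
import math


def length_adjusted(stats, lengths):
    """Divide each gene's statistic by sqrt(transcript length).

    Assumption: the abstract only says the statistic is adjusted "using"
    the square root of length; plain division is one plausible reading.
    """
    return [s / math.sqrt(n) for s, n in zip(stats, lengths)]


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(values, in_set):
    """Two-sided Wilcoxon rank-sum test of gene-set members vs. the rest.

    Uses the normal approximation (no tie correction for the variance).
    Returns (z, p_value).
    """
    ranks = average_ranks(values)
    n1 = sum(in_set)               # genes inside the set
    n2 = len(values) - n1          # genes outside the set
    w = sum(r for r, flag in zip(ranks, in_set) if flag)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

A typical call would first compute `adjusted = length_adjusted(stats, lengths)` over all genes and then pass `adjusted` together with a boolean membership list for one GO term to `rank_sum_test`. For real analyses an established implementation of the rank-sum test is preferable to this approximation.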

SUBMITTER: Gao L

PROVIDER: S-EPMC3042188 | biostudies-literature | 2011 Mar

REPOSITORIES: biostudies-literature

Publications

Length bias correction for RNA-seq data in gene set analyses.

Gao L, Fang Z, Zhang K, Zhi D, Cui X

Bioinformatics (Oxford, England), 2011 Jan 19, issue 5

PMID: 21252076

Similar Datasets

Gene set analysis controlling for length bias in RNA-seq experiments.
| S-EPMC5294840 | biostudies-literature

Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias.
| S-EPMC6850523 | biostudies-literature

A new approach to bias correction in RNA-Seq.
| S-EPMC3315719 | biostudies-literature

Comparative evaluation of gene set analysis approaches for RNA-Seq data.
| S-EPMC4265362 | biostudies-literature

GSVA: gene set variation analysis for microarray and RNA-seq data.
| S-EPMC3618321 | biostudies-literature

BCseq: accurate single cell RNA-seq quantification with bias correction.
| S-EPMC6101504 | biostudies-literature

Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates.
| S-EPMC5102490 | biostudies-literature

GSAASeqSP: a toolset for gene set association analysis of RNA-Seq data.
| S-EPMC4161965 | biostudies-literature

Bias detection and correction in RNA-Sequencing data.
| S-EPMC3149584 | biostudies-literature

Gene set enrichment analysis of RNA-Seq data: integrating differential expression and splicing.
| S-EPMC3622641 | biostudies-literature