Dataset Information

Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data.

ABSTRACT: RNA-seq technology has become an important tool for quantifying gene and transcript expression in transcriptome studies. The two major difficulties in gene and transcript expression quantification are read mapping ambiguity and the overdispersion of the read distribution along the reference sequence. Many approaches have been proposed to deal with these difficulties. A number of existing methods use a Poisson distribution to model the read counts, which makes it easy to split the counts into contributions from multiple transcripts. Meanwhile, various solutions have been put forward to account for overdispersion in the Poisson models. By checking the similarities among the variation patterns of read counts for individual genes, we found that the count variation is exon-specific and follows a pattern that is conserved across samples for each individual gene. We introduce Gamma-distributed latent variables to model the read sequencing preference for each exon. These variables are embedded in the rate parameter of a Poisson model to account for the overdispersion of the read distribution. The model is tractable since the Gamma priors can be integrated out in the maximum likelihood estimation. We evaluate the proposed approach, PGseq, using four real datasets and one simulated dataset, and compare its performance with other popular methods. Results show that PGseq performs competitively with the alternatives in terms of accuracy in gene and transcript expression calculation and in downstream differential expression analysis. In particular, we show the advantage of our method in the analysis of low expression levels.
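
The abstract's remark that the Gamma priors can be integrated out can be illustrated with a minimal sketch. This is not the PGseq implementation. It assumes, for a single exon, a read count y ~ Poisson(b * s) with an exon bias b ~ Gamma(shape alpha, rate beta). The names s, alpha and beta and both function names are hypothetical, and the paper's actual parameterization may differ. Under these assumptions the marginal distribution of y is negative binomial, and the closed form below is compared against brute-force numerical integration over b.

```python
import math


def poisson_gamma_logpmf(y, s, alpha, beta):
    """Closed-form log P(y) when y ~ Poisson(b * s) and
    b ~ Gamma(shape=alpha, rate=beta), with b integrated out.

    Assumed (hypothetical) parameterization:
      P(y) = Gamma(y + alpha) / (Gamma(alpha) * y!)
             * (beta / (beta + s))**alpha * (s / (beta + s))**y
    """
    return (
        math.lgamma(y + alpha)
        - math.lgamma(alpha)
        - math.lgamma(y + 1)
        + alpha * math.log(beta / (beta + s))
        + y * math.log(s / (beta + s))
    )


def numeric_marginal(y, s, alpha, beta, upper=30.0, steps=100_000):
    """Same marginal, obtained by midpoint-rule integration over the
    latent bias b on (0, upper). Used only to check the closed form."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        b = (k + 0.5) * h
        # log Poisson(y | rate = b * s)
        log_pois = y * math.log(b * s) - b * s - math.lgamma(y + 1)
        # log Gamma(b | shape = alpha, rate = beta)
        log_gam = (
            alpha * math.log(beta)
            + (alpha - 1) * math.log(b)
            - beta * b
            - math.lgamma(alpha)
        )
        total += math.exp(log_pois + log_gam) * h
    return total
```

Because the marginal has a closed form in this simplified setting, a maximum likelihood fit only needs to optimise over the remaining parameters and does not have to sample or integrate numerically over b.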

SUBMITTER: Liu X

PROVIDER: S-EPMC4598124 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data.

Liu Xuejun, Zhang Li, Chen Songcan

PLoS ONE, 2015-10-08, 10

PMID: 26448625

Similar Datasets

Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias. | S-EPMC6850523 | biostudies-literature

Quantitative visualization of alternative exon expression from RNA-seq data. | S-EPMC4542614 | biostudies-literature

PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. | S-EPMC3919567 | biostudies-literature

Toward Modeling Context-Specific EMT Regulatory Networks Using Temporal Single Cell RNA-Seq Data. | S-EPMC7190801 | biostudies-literature

Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression. | S-EPMC6022640 | biostudies-literature

Freeze-quenched maize mesophyll and bundle sheath separation uncovers bias in previous tissue-specific RNA-Seq data. | S-EPMC5853576 | biostudies-literature

Correcting 4sU induced quantification bias in nucleotide conversion RNA-seq data. | S-EPMC11039982 | biostudies-literature

Evaluating the bias of circRNA predictions from total RNA-Seq data. | S-EPMC5762294 | biostudies-literature

Length bias correction for RNA-seq data in gene set analyses. | S-EPMC3042188 | biostudies-literature

Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis. | S-EPMC8544431 | biostudies-literature