Dataset Information

Improved data-driven likelihood factorizations for transcript abundance estimation.

ABSTRACT: Many methods for transcript-level abundance estimation reduce the computational burden of the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation.

We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations.

Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations

Contact: rob.patro@cs.stonybrook.edu

Supplementary data are available at Bioinformatics online.
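The contrast between the per-fragment likelihood and compatibility-based factorizations can be illustrated with a toy EM. This is a minimal sketch under stated assumptions, not the Salmon implementation: the fragments, their conditional probabilities P(f | t), the equal-effective-length assumption, and the hypothetical `binned` grouping (which keys each group on binned, per-fragment-normalized conditional probabilities and stores their mean) are all illustrative. Each group contributes count × log(Σ_t α_t w_t) to the log-likelihood, so the per-fragment model is simply the case of one group per fragment.

```python
"""Toy comparison of likelihood factorizations for an abundance-estimation EM.

Illustrative sketch only (not Salmon's code). Each fragment is a dict
{transcript_id: P(fragment | transcript)}; all values are made up.
"""
from collections import defaultdict


def em(groups, n_tx, iters=1000):
    """EM for mixture proportions alpha over (transcript_ids, weights, count) groups.

    A group contributes count * log(sum_t alpha[t] * w[t]) to the log-likelihood.
    """
    alpha = [1.0 / n_tx] * n_tx
    for _ in range(iters):
        expected = [0.0] * n_tx
        for txs, ws, count in groups:
            denom = sum(alpha[t] * w for t, w in zip(txs, ws))
            for t, w in zip(txs, ws):
                # E-step: share of this group's fragments assigned to transcript t.
                expected[t] += count * alpha[t] * w / denom
        total = sum(expected)
        alpha = [e / total for e in expected]  # M-step: renormalize.
    return alpha


def per_fragment(frags):
    """Unsimplified model: every fragment is its own group."""
    out = []
    for f in frags:
        txs = tuple(sorted(f))
        out.append((txs, tuple(f[t] for t in txs), 1))
    return out


def compatibility_only(frags):
    """Group by compatible transcript set and drop per-fragment probabilities.

    Assumes equal effective lengths, so every compatible transcript gets weight 1.
    """
    counts = defaultdict(int)
    for f in frags:
        counts[tuple(sorted(f))] += 1
    return [(txs, (1.0,) * len(txs), n) for txs, n in counts.items()]


def binned(frags, n_bins=4):
    """Hypothetical data-driven grouping: key on the transcript set plus binned,
    per-fragment-normalized conditional probabilities; keep each group's mean
    conditional probabilities as its weights."""
    groups = defaultdict(list)
    for f in frags:
        txs = tuple(sorted(f))
        z = sum(f.values())
        bins = tuple(min(int(f[t] / z * n_bins), n_bins - 1) for t in txs)
        groups[(txs, bins)].append([f[t] for t in txs])
    out = []
    for (txs, _), rows in groups.items():
        mean = tuple(sum(col) / len(rows) for col in zip(*rows))
        out.append((txs, mean, len(rows)))
    return out


# Two highly related transcripts: every fragment is compatible with both,
# but the conditional probabilities differ between fragments.
frags = [{0: 0.9, 1: 0.1}] * 30 + [{0: 0.85, 1: 0.15}] * 30 + [{0: 0.1, 1: 0.9}] * 40

for name, build in [("per-fragment", per_fragment),
                    ("compatibility-only", compatibility_only),
                    ("binned", binned)]:
    groups = build(frags)
    alpha = em(groups, 2)
    print(f"{name:>18}: {len(groups):3d} groups, alpha = {[round(a, 3) for a in alpha]}")
```

In this toy setup every fragment is compatible with both transcripts, so the compatibility-only grouping collapses to a single class whose likelihood is flat in α, and EM never leaves its uniform starting point. Keeping coarse information about the conditional probabilities (two groups instead of 100) should recover an estimate close to the per-fragment one. The number of bins trades the number of groups (and hence per-iteration EM cost) against fidelity to the full likelihood.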

SUBMITTER: Zakeri M

PROVIDER: S-EPMC5870700 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature


Publications

Improved data-driven likelihood factorizations for transcript abundance estimation.

Mohsen Zakeri, Avi Srivastava, Fatemeh Almodaresi, Rob Patro

Bioinformatics (Oxford, England), 2017 Jul 1, issue 14


PMID: 28881996

Similar Datasets

Alignment and mapping methodology influence transcript abundance estimation.
| S-EPMC7487471 | biostudies-literature

EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.
| S-EPMC4559005 | biostudies-literature

Isoform-level ribosome occupancy estimation guided by transcript abundance with Ribomap.
| S-EPMC4908323 | biostudies-literature

Improved pre-test likelihood estimation of coronary artery disease using phonocardiography.
| S-EPMC9779903 | biostudies-literature

Efficient pairwise composite likelihood estimation for spatial-clustered data.
| S-EPMC4431962 | biostudies-literature

A flexible quasi-likelihood model for microbiome abundance count data.
| S-EPMC11045296 | biostudies-literature

Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation.
| S-EPMC5143225 | biostudies-literature

Microbial-Maximum Likelihood Estimation Tool for Microbial Quantification in Food From Left-Censored Data Using Maximum Likelihood Estimation for Microbial Risk Assessment.
| S-EPMC8740018 | biostudies-literature

Maximum Likelihood Estimation for Semiparametric Regression Models With Panel Count Data.
| S-EPMC8691743 | biostudies-literature

Maximum likelihood estimation for semiparametric transformation models with interval-censored data.
| S-EPMC4890294 | biostudies-literature