Dataset Information

Tximeta: Reference sequence checksums for provenance identification in RNA-seq.

ABSTRACT: Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
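
The checksum idea described in the abstract can be illustrated with a minimal sketch. This is not the tximeta implementation or its API. The hash function (SHA-256), the function names `sequence_checksum` and `identify_reference`, and the contents of the lookup table are all assumptions made for illustration. The sketch hashes only the sequence content of a reference transcriptome, so a file with renamed headers or different line wrapping still maps to the same metadata record, while any change to the sequences themselves yields no match.

```python
import hashlib


def sequence_checksum(fasta_text: str) -> str:
    """Hash only the sequence lines of a FASTA string, ignoring headers.

    Assumption: SHA-256 over the concatenated, upper-cased sequence lines.
    """
    digest = hashlib.sha256()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        digest.update(line.upper().encode("ascii"))
    return digest.hexdigest()


def identify_reference(checksum: str, known_references: dict):
    """Return the metadata record for a checksum, or None if unknown."""
    return known_references.get(checksum)


# Hypothetical reference transcriptome and its metadata record.
reference_fasta = ">tx1\nACGTACGT\n>tx2\nGGGTTTAA\n"
known_references = {
    sequence_checksum(reference_fasta): {
        "source": "ExampleDB",  # hypothetical provider name
        "release": "1",         # hypothetical release label
    }
}

# Same sequences, but with different headers and line wrapping.
shared_fasta = ">transcript_a description\nACGT\nACGT\n>transcript_b\nGGGTTTAA\n"
match = identify_reference(sequence_checksum(shared_fasta), known_references)

# One base changed: the checksum no longer matches any known reference.
altered_fasta = ">tx1\nACGTACGA\n>tx2\nGGGTTTAA\n"
no_match = identify_reference(sequence_checksum(altered_fasta), known_references)

print(match)
print(no_match)
```

In this sketch a successful lookup returns the stored metadata and a failed lookup returns `None`, which a caller could treat as "provenance unknown" rather than guessing an annotation.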

SUBMITTER: Love MI

PROVIDER: S-EPMC7059966 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature

Publications

Tximeta: Reference sequence checksums for provenance identification in RNA-seq.

Love MI, Soneson C, Hickey PF, Johnson LK, Pierce NT, Shepherd L, Morgan M, Patro R

PLoS Computational Biology, 2020 Feb 25, issue 2

PMID: 32097405

Similar Datasets

Automated identification of reference genes based on RNA-seq data.
| S-EPMC5568602 | biostudies-literature

SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence.
| S-EPMC5100560 | biostudies-literature

Protein identification using customized protein sequence databases derived from RNA-Seq data.
| S-EPMC3727138 | biostudies-literature

An RNA-Seq-based reference transcriptome for Citrus.
| S-EPMC11388863 | biostudies-literature

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.
| S-EPMC3875132 | biostudies-literature

ISVASE: identification of sequence variant associated with splicing event using RNA-seq data.
| S-EPMC5490186 | biostudies-literature

iMapSplice: Alleviating reference bias through personalized RNA-seq alignment.
| S-EPMC6086400 | biostudies-literature

HLA typing from RNA-Seq sequence reads.
| S-EPMC4064318 | biostudies-literature

Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq.
| S-EPMC4322577 | biostudies-literature

Systematic identification and validation of the reference genes from 60 RNA-Seq libraries in the scallop Mizuhopecten yessoensis.
| S-EPMC6460854 | biostudies-literature