Dataset Information

Vespucci: a system for building annotated databases of nascent transcripts.

ABSTRACT: Global run-on sequencing (GRO-seq) is a recent addition to the series of high-throughput sequencing methods that enables new insights into transcriptional dynamics within a cell. However, GRO-sequencing presents new algorithmic challenges, as existing analysis platforms for ChIP-seq and RNA-seq do not address the unique problem of identifying transcriptional units de novo from short reads located all across the genome. Here, we present a novel algorithm for de novo transcript identification from GRO-sequencing data, along with a system that determines transcript regions, stores them in a relational database and associates them with known reference annotations. We use this method to analyze GRO-sequencing data from primary mouse macrophages and derive novel quantitative insights into the extent and characteristics of non-coding transcription in mammalian cells. In doing so, we demonstrate that Vespucci expands existing annotations for mRNAs and lincRNAs by defining the primary transcript beyond the polyadenylation site. In addition, Vespucci generates assemblies for un-annotated non-coding RNAs such as those transcribed from enhancer-like elements. Vespucci thereby provides a robust system for defining, storing and analyzing diverse classes of primary RNA transcripts that are of increasing biological interest.

SUBMITTER: Allison KA

PROVIDER: S-EPMC3936758 | biostudies-literature | 2014 Feb

REPOSITORIES: biostudies-literature

Publications

Vespucci: a system for building annotated databases of nascent transcripts.

Allison KA, Kaikkonen MU, Gaasterland T, Glass CK

Nucleic Acids Research, 2013-12-04, issue 4

PMID: 24304890

Similar Datasets

Pervasive Targeting of Nascent Transcripts by Hfq. | S-EPMC5990048 | biostudies-literature

Nascent RNA transcripts facilitate the formation of G-quadruplexes. | S-EPMC4066803 | biostudies-literature

Uniclust databases of clustered and deeply annotated protein sequences and alignments. | S-EPMC5614098 | biostudies-literature

Atlas of nascent RNA transcripts reveals enhancer to gene linkages. | S-EPMC10723487 | biostudies-literature

Real-time assembly of ribonucleoprotein complexes on nascent RNA transcripts. | S-EPMC6269517 | biostudies-literature

SON protects nascent transcripts from unproductive degradation by counteracting DIP1. | S-EPMC6881055 | biostudies-literature

Antisense transcription licenses nascent transcripts to mediate transcriptional gene silencing. | S-EPMC5131781 | biostudies-literature

Intergenic RNA mainly derives from nascent transcripts of known genes. | S-EPMC8097831 | biostudies-literature

POINT technology illuminates the processing of polymerase-associated intact nascent transcripts. | S-EPMC8122139 | biostudies-literature

Nascent HIV reverse transcripts | PRJEB22170 | ENA