Dataset Information

The Project for High-Confidence Coding and Noncoding Transcriptome Maps

ABSTRACT: The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps remain incomplete, partly because most were reconstructed from RNA-seq reads that lack orientation information (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE and Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes. This SuperSeries is composed of the SubSeries listed below.

OTHER RELATED OMICS DATASETS IN: PRJNA381216, PRJNA381218

ORGANISM(S): Mus musculus, Homo sapiens

PROVIDER: GSE97212 | GEO | 2017/04/01

SECONDARY ACCESSION(S): PRJNA381216

REPOSITORIES: GEO

Similar Datasets

High-confidence Coding and Noncoding Transcriptome Maps
2017-04-01 | GSE97211 | GEO

Co-assembly of stranded and unstranded RNA-seq data improves coding and noncoding transcriptome maps
2016-07-29 | E-GEOD-84946 | biostudies-arrayexpress

Co-assembly of stranded and unstranded RNA-seq data improves coding and noncoding transcriptome maps
2016-07-29 | GSE84946 | GEO

A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors
2016-12-15 | GSE92429 | GEO

De novo assembly of the desert tree Haloxylon ammodendron (C. A. Mey.) based on RNA-Seq data provides insight into drought response, gene discovery and marker identification
2014-12-20 | GSE63970 | GEO

Integration of transcriptome and proteome annotation in the naive Ixodes ricinus midgut with genome sequencing
2015-11-04 | PXD001796 | Pride

De novo assembly of the desert tree Haloxylon ammodendron (C. A. Mey.) based on RNA-Seq data provides insight into drought response, gene discovery and marker identification
2014-12-20 | E-GEOD-63970 | biostudies-arrayexpress

RNA-seq of 6 tissues from Macaca mulatta to investigate the evolution of gene expression levels in mammalian organs
2011-10-12 | E-MTAB-3717 | biostudies-arrayexpress