Dataset Information

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome.

ABSTRACT: Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as 'TranscriRef'). We then annotated 750,000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ~34,000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
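The categorization step described in the abstract (protein-coding, antisense, intronic or intergenic) can be illustrated with a minimal Python sketch. This is not the DigitagCT implementation: the `Gene` record, the `classify_tag` function, the toy coordinates and the priority order among categories are all assumptions made for illustration, and a real analysis would read tag positions and Ensembl annotation from files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical protein-coding gene record (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_tag(chrom: str, start: int, end: int, strand: str,
                 genes: List[Gene]) -> str:
    """Assign a mapped tag to one category.

    Assumed priority: same-strand exon overlap, then same-strand intron,
    then opposite-strand gene overlap, otherwise intergenic.
    """
    hits = [g for g in genes
            if g.chrom == chrom and overlaps(start, end, g.start, g.end)]
    if not hits:
        return "intergenic"
    same_strand = [g for g in hits if g.strand == strand]
    if not same_strand:
        return "antisense"
    for gene in same_strand:
        if any(overlaps(start, end, e_start, e_end)
               for e_start, e_end in gene.exons):
            return "protein-coding"
    return "intronic"


# Toy annotation: one gene on the forward strand with three exons.
GENES = [Gene("chr1", 1000, 5000, "+",
              ((1000, 1200), (3000, 3300), (4800, 5000)))]

# Toy tags: (chromosome, start, end, strand).
TAGS = [
    ("chr1", 1100, 1120, "+"),
    ("chr1", 2000, 2020, "+"),
    ("chr1", 3100, 3120, "-"),
    ("chr1", 9000, 9020, "+"),
]

if __name__ == "__main__":
    for tag in TAGS:
        print(tag, classify_tag(*tag, GENES))
```

A linear scan over all genes is enough for a toy example; at the scale of the 750,000 tags mentioned above, an interval index or a dedicated interval-intersection tool would be the more practical choice.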

SUBMITTER: Philippe N

PROVIDER: S-EPMC3950697 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature

Publications

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome.

Philippe N, Bou Samra E, Boureux A, Mancheron A, Rufflé F, Bai Q, De Vos J, Rivals E, Commes T

Nucleic Acids Research, 2013-12-18, issue 5

PMID: 24357408

Similar Datasets

Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data. | S-EPMC3439898 | biostudies-other

iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data. | S-EPMC3582448 | biostudies-literature

Paired rRNA-depleted and polyA-selected RNA sequencing data and supporting multi-omics data from human T cells. | S-EPMC7652884 | biostudies-literature

Mining small RNA sequencing data: a new approach to identify small nucleolar RNAs in Arabidopsis. | S-EPMC2685112 | biostudies-literature

Integrated detection of natural antisense transcripts using strand-specific RNA sequencing data. | S-EPMC3787269 | biostudies-literature

A comprehensive inventory of TLX1 controlled long non-coding RNAs in T-cell acute lymphoblastic leukemia through polyA+ and total RNA sequencing. | S-EPMC6269303 | biostudies-literature

Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA-minus RNA sequencing data. | S-EPMC8673554 | biostudies-literature

CoRAL: predicting non-coding RNAs from small RNA-sequencing data. | S-EPMC3737537 | biostudies-literature

A Novel Analytical Strategy to Identify Fusion Transcripts between Repetitive Elements and Protein Coding-Exons Using RNA-Seq. | S-EPMC4945064 | biostudies-literature

Identification of protein coding regions in RNA transcripts. | S-EPMC4499116 | biostudies-literature