Dataset Information

Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries.

ABSTRACT:

Background

The availability of fast alignment-free algorithms has greatly reduced the computational burden of RNA-seq processing, especially for relatively poorly assembled genomes. Using these approaches, previous RNA-seq datasets could potentially be processed and integrated with newly sequenced libraries. Confounding factors in such integration include sequencing depth and methods of RNA extraction and selection. Different selection methods (typically, either polyA-selection or rRNA-depletion) omit different RNAs, resulting in different fractions of the transcriptome being sequenced. In particular, rRNA-depleted libraries sample a broader fraction of the transcriptome than polyA-selected libraries. This study aimed to develop a systematic means of accounting for library type that allows data from these two methods to be compared.

Results

The method was developed by comparing two RNA-seq datasets from ovine macrophages, identical except for RNA selection method. Gene-level expression estimates were obtained using a two-part process centred on the high-speed transcript quantification tool Kallisto. Firstly, a set of reference transcripts was defined that constitute a standardised RNA space, with expression from both datasets quantified against it. Secondly, a simple ratio-based correction was applied to the rRNA-depleted estimates. The outcome is an almost perfect correlation between gene expression estimates, independent of library type and across the full range of levels of expression.

Conclusion

A combination of reference transcriptome filtering and a ratio-based correction can create equivalent expression profiles from both polyA-selected and rRNA-depleted libraries. This approach will allow meta-analysis and integration of existing RNA-seq data into transcriptional atlas projects.

SUBMITTER: Bush SJ

PROVIDER: S-EPMC5470212 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries.

Bush Stephen J SJ McCulloch Mary E B MEB Summers Kim M KM Hume David A DA Clark Emily L EL

BMC bioinformatics 20170613 1

<h4>Background</h4>The availability of fast alignment-free algorithms has greatly reduced the computational burden of RNA-seq processing, especially for relatively poorly assembled genomes. Using these approaches, previous RNA-seq datasets could potentially be processed and integrated with newly sequenced libraries. Confounding factors in such integration include sequencing depth and methods of RNA extraction and selection. Different selection methods (typically, either polyA-selection or rRNA-d ...[more]

PMID: 28610557

Similar Datasets

Project description:BackgroundPlatelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA.Materials and methodsWe used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts.ResultsConsistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq.ConclusionThe present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components compared with other cell and tissue types. The in vivo functional significance of the non-mitochondrial transcripts remains to be shown.

Dataset Information

Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries.

Background

Results

Conclusion

Publications

Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets