Dataset Information

IVT-seq reveals extreme bias in RNA-sequencing

ABSTRACT: Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.

ORGANISM(S): Mus musculus, mixed libraries, Homo sapiens

PROVIDER: GSE50445 | GEO | 2014/05/21

SECONDARY ACCESSION(S): PRJNA217498

REPOSITORIES: GEO


Similar Datasets

Project description: Background: RNA sequencing (RNA-seq) is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: Here we present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of > 1000 in vitro transcribed (IVT) RNAs from a full-length human cDNA library and sequenced them with poly-A and total RNA-seq, the most common protocols. Because each cDNA is full length and we show IVT is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find ~50% of transcripts have > 2-fold and ~10% have > 10-fold differences in within-transcript sequence coverage. Strikingly, we also find > 6% of transcripts have regions of high, unpredictable sequencing coverage, where the same transcript varies dramatically in coverage between samples, confounding accurate determination of their expression. To get at causal factors, we used a combination of experimental and computational approaches to show that rRNA depletion is responsible for the most significant variability in coverage and that several sequence determinants also strongly influence representation. Conclusions: In sum, these results show the utility of IVT-seq in promoting better understanding of bias introduced by RNA-seq and suggest caution in its interpretation. Furthermore, we find that rRNA depletion is responsible for substantial, unappreciated biases in coverage. Perhaps most importantly, these coverage biases introduced during library preparation suggest exon-level expression analysis may be inadvisable.
5 rRNA-depleted samples with duplicates, 1 polyA selected, 1 total RNA, and 1 plasmid library all without replicates.

Project description: Purpose: Ribosome profiling and RNA-Seq were used to map the location and abundance of translating ribosomes on mouse heart and skeletal muscle transcripts. Methods: Tissue was rapidly harvested and snap-frozen to minimize bias to the pool of translating ribosomes. RNA was prepared from a single homogenate for each tissue so that starting RNA populations for both libraries were closely matched. Homogenates were not clarified before RNase digestion to avoid loss of ribosomes associated with large molecular weight complexes, and RNA-Seq libraries were prepared after rRNA subtraction to avoid positional loss of 5’ reads. Trimmed reads from 50 cycles of Illumina single-end sequencing were mapped onto a non-redundant set of 18,499 mouse protein-coding RefSeq transcripts from the nuclear genome. Results: Sequence reads mapped to myosin, actin and the giant protein titin together account for ~20% of the total mRNA-derived ribosome protected fragments (RPFs). We observed large-scale uniformity in the distribution of RPFs on the >30,000 codon titin open reading frame, from which we inferred an in vivo ribosome elongation error rate of ≤10⁻⁵. Ribosome footprints on Ttn mRNA also uncovered a novel 5’ UTR within a phylogenetically conserved intronic element that would produce a ~2.35 MDa titin isoform that corresponds to the titin 'T2' band frequently described as a proteolytic artifact. Local translation efficiency across several >10 kb muscle mRNAs was also uniform, while their global translation efficiencies varied by ~20-fold, suggesting initiation rate plays a major role in the translation efficiency of large mRNAs. Evidence for RPFs on 5’ UTRs was widespread, with particular enrichment for ribosomes positioned at CUG codons.
Comparison of global translation efficiency in cardiac and skeletal muscle revealed novel examples of tissue-specific translational control, including synthesis of the myogenic factor Mef2c and the titin-binding stress response protein Ankrd23. Conclusions: Our study represents the first detailed analysis of translation in an adult mammalian tissue generated by ribosome profiling technology. Current limitations to using ribosomal profiling in tissues include unknown perturbations to the dynamic state of translation despite rapidly harvested and snap-frozen samples. The uniform 5’ to 3’ coverage observed on individual large mRNAs and the ability to observe footprints on the extremely small phospholamban coding sequence suggest that initiation and elongation were halted on similar time scales. More detailed examination of the positional information within the CDS region requires further understanding of the bias introduced during the library preparation steps for both RPF- and RNA-Seq, as well as local biases induced as translation is arrested. Despite these qualifications, this initial view of active translation in muscle tissue highlights the potential for ribosome profiling to monitor the dynamic translation response to exercise, injury or disease pathology in animal models at a level of resolution not easily attainable with other quantitative approaches. Heart and skeletal muscle ribosome-protected fragment and RNA-Seq profiles of 10-week-old C57BL/6J male mice were generated by deep sequencing using the Illumina HiSeq 2000.

