Project description:Unique Molecular Identifiers (UMIs) are random oligonucleotide barcode sequences that are critical for the removal of PCR amplification biases within both bulk and single-cell sequencing experiments. However, the impact that PCR and sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We demonstrate that PCR errors and not sequencing errors are the main source of inaccuracy in sequencing data and that the use of UMIs synthesized with homotrimeric nucleoside building blocks provides a solution to pinpoint and remove errors, allowing absolute counting of sequenced molecules.
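To illustrate how homotrimeric building blocks can expose errors, here is a minimal sketch that assumes each UMI position is encoded as three identical bases (for example `AAA`, `CCC`) and that errors are resolved by a majority vote within each block. The function name `decode_homotrimer_umi` and the use of `N` for unresolvable blocks are illustrative assumptions, not the authors' published pipeline.

```python
from collections import Counter


def decode_homotrimer_umi(umi):
    """Collapse a homotrimer-encoded UMI to one base per block by majority vote.

    Returns (decoded_umi, n_blocks_with_errors). A block whose three bases
    all differ cannot be resolved and is reported as 'N' (assumed convention).
    """
    if len(umi) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")

    decoded = []
    error_blocks = 0
    for start in range(0, len(umi), 3):
        block = umi[start:start + 3]
        base, votes = Counter(block).most_common(1)[0]
        if votes == 3:
            decoded.append(base)      # clean block, no error detected
        elif votes == 2:
            decoded.append(base)      # one mismatch, corrected by majority
            error_blocks += 1
        else:
            decoded.append("N")       # three different bases, unresolvable
            error_blocks += 1
    return "".join(decoded), error_blocks
```

For instance, under these assumptions `decode_homotrimer_umi("AATCCCGGG")` would flag one erroneous block and still recover the UMI `ACG`, whereas a conventional monomeric UMI offers no internal redundancy for detecting the substitution.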
Project description:Counting and correcting errors within unique molecular identifiers to generate absolute numbers of sequencing molecules [scRNA-Seq]
Project description:Motivation: Counting molecules using next-generation sequencing (NGS) suffers from PCR amplification bias, which reduces the accuracy of many quantitative NGS-based experimental methods such as RNA-Seq. This is true even if molecules are made distinguishable using unique molecular identifiers (UMIs) before PCR amplification, and distinct UMIs are counted instead of reads: Molecules that are lost entirely during the sequencing process will still cause underestimation of the molecule count, and amplification artifacts like PCR chimeras create phantom UMIs and thus cause overestimation. Results: We introduce the TRUmiCount algorithm to correct for both types of errors. The TRUmiCount algorithm is based on a mechanistic model of PCR amplification and sequencing, whose two parameters have an immediate physical interpretation as PCR efficiency and sequencing depth and can be estimated from experimental data without requiring calibration experiments or spike-ins. We show that our model captures the main stochastic properties of amplification and sequencing, and that it allows us to filter out phantom UMIs and to estimate the number of molecules lost during the sequencing process. Finally, we demonstrate that the phantom-filtered and loss-corrected molecule counts computed by TRUmiCount measure the true number of molecules with considerably higher accuracy than the raw number of distinct UMIs, even if most UMIs are sequenced only once as is typical for single-cell RNA-Seq. Availability and implementation: TRUmiCount is available at http://www.cibiv.at/software/trumicount and through Bioconda (http://bioconda.github.io). Supplementary information: Supplementary information is available at Bioinformatics online.
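The idea of filtering phantom UMIs and then correcting for lost molecules can be sketched as follows. This is a simplified simulation under stated assumptions, not the TRUmiCount implementation: PCR is modelled as a branching process in which each copy is duplicated with probability `efficiency` per cycle, sequencing as Poisson sampling with mean `depth` reads per average molecule, and phantoms are assumed to fall below a read-count `threshold`. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_family_fraction(efficiency, cycles, rng):
    """Simulate one molecule's PCR family size, relative to the expected size."""
    copies = 1
    for _ in range(cycles):
        # Each existing copy is duplicated with probability `efficiency`.
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies / (1.0 + efficiency) ** cycles


def poisson_sample(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def estimate_loss(efficiency, depth, threshold, cycles=10, trials=1000, seed=0):
    """Estimate the fraction of true molecules seen with fewer than `threshold` reads."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        mean_reads = depth * simulate_family_fraction(efficiency, cycles, rng)
        if poisson_sample(mean_reads, rng) < threshold:
            lost += 1
    return lost / trials


def corrected_count(reads_per_umi, threshold, loss):
    """Drop UMIs below the threshold, then scale up for the estimated loss."""
    kept = sum(1 for reads in reads_per_umi.values() if reads >= threshold)
    return kept / (1.0 - loss)
```

In this sketch, `estimate_loss` is called with given values of the two model parameters and its result is passed to `corrected_count` together with the observed reads per UMI. The simulation takes the parameters as inputs and does not attempt to reproduce how they are estimated from experimental data.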
Project description:Several template DNA molecules with random base molecular barcodes were amplified and sequenced, and the efficacy of the random base barcode for digital counting was shown.
Project description:Molecule counting is central to single-cell sequencing, yet no experimental strategy to evaluate counting performance exists. Here, we introduce RNA spike-ins containing inbuilt unique molecular identifiers (molecular spikes) that we use to monitor single-cell RNA counting performance across methods and to identify experimental steps essential for accurate counting. In this dataset, we add molecular spikes to popular single-cell RNA-seq protocols: SCRB-seq, Smart-seq3 and 10x Genomics (v2). For SCRB-seq and Smart-seq3, we also include variations of the library preparation procedure that are suspected to lead to changes in the UMI counting accuracy.