Counting and correcting errors within unique molecular identifiers to generate absolute numbers of sequencing molecules [RNA-seq]
ABSTRACT: Unique Molecular Identifiers (UMIs) are random oligonucleotide barcode sequences that are critical for the removal of PCR amplification biases within both bulk and single-cell sequencing experiments. However, the impact that PCR and sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We demonstrate that PCR errors, not sequencing errors, are the main source of inaccuracy in sequencing data, and that UMIs synthesized with homotrimeric nucleoside building blocks provide a way to pinpoint and remove errors, allowing absolute counting of sequenced molecules.
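The abstract does not spell out how a homotrimeric UMI is decoded. A minimal sketch follows, assuming each UMI position is synthesized as three identical bases and that a block is collapsed by majority vote. The function name `collapse_homotrimer_umi` and the use of `N` for unresolvable blocks are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def collapse_homotrimer_umi(umi: str) -> str:
    """Collapse a homotrimer-encoded UMI to one base per 3-base block.

    Assumption (not from the source): each block is resolved by majority
    vote, and a block with three different bases is reported as 'N'.
    """
    if len(umi) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")

    collapsed = []
    for start in range(0, len(umi), 3):
        block = umi[start:start + 3]
        base, count = Counter(block).most_common(1)[0]
        # Two or three matching bases give a majority; otherwise ambiguous.
        collapsed.append(base if count >= 2 else "N")
    return "".join(collapsed)


if __name__ == "__main__":
    # A single substitution inside the first block (AAA -> AAT) is corrected.
    print(collapse_homotrimer_umi("AATCCCGGGTTT"))
```

Under this scheme a single substitution inside a block is outvoted by the two intact bases, so the collapsed UMI is unchanged. A conventional monomeric UMI with the same error would look like a new molecule.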
Project description:Counting and correcting errors within unique molecular identifiers to generate absolute numbers of sequencing molecules [scRNA-Seq]
Project description:Single-cell transcriptomics, reliant on the incorporation of barcodes and unique molecular identifiers (UMIs) into captured polyA+ mRNA, faces a significant challenge due to synthesis errors in oligonucleotide capture sequences. These errors, which are especially problematic in long-read sequencing, impair the precise identification of sequences and lead to inaccurate UMI deduplication. To mitigate this issue, we have modified the oligonucleotide capture design to integrate an interposed anchor between the barcode and UMI, and a 'V' base anchor adjacent to the polyA capture region. This configuration is designed to be compatible with both short- and long-read sequencing technologies, facilitating improved UMI recovery and enhanced feature detection, thereby improving the efficacy of droplet-based sequencing methods.
Project description:Reduced representation bisulfite sequencing (RRBS) has proven to be a powerful method for DNA methylome profiling. Since its initial development, the RRBS protocol has been modified to optimize genomic coverage, starting material, and library-construction throughput, resulting in new methods such as enhanced RRBS (ERRBS), double-enzyme RRBS (dRRBS), gel-free and multiplexed RRBS (mRRBS), and single-cell RRBS (scRRBS). However, none of these methods addresses PCR-derived duplication artifacts, which can bias the results of DNA methylation analyses. To overcome this, we developed quantitative RRBS (Q-RRBS), a method in which unique molecular identifiers (UMIs) are used to eliminate PCR-induced duplication. By performing Q-RRBS on varying amounts of starting material, we determined that duplication-induced artifacts were more severe when small quantities of starting material were used. By using the UMIs, however, we successfully eliminated these artifacts. Our results demonstrate that Q-RRBS is an optimal strategy for DNA methylation profiling of single cells or samples containing ultra-trace amounts of cells.
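The description above says only that UMIs are used to eliminate PCR duplicates. As a rough illustration of the general idea, the sketch below assumes reads are PCR duplicates when they share the same mapping coordinates and the same UMI. The record fields (`chrom`, `pos`, `strand`, `umi`) and the function name are hypothetical, and the actual Q-RRBS pipeline may differ.

```python
def deduplicate_by_umi(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) key.

    Assumption (not from the source): two reads with identical mapping
    coordinates and identical UMI derive from the same original molecule.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


if __name__ == "__main__":
    example_reads = [
        {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},
        {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGTAC"},  # PCR duplicate
        {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTGCAA"},  # distinct molecule
    ]
    print(len(deduplicate_by_umi(example_reads)))
```

Without the UMI in the key, all three example reads would collapse to one, which is why coordinate-only deduplication undercounts molecules at restriction-defined RRBS fragment ends.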
Project description:Unique molecular identifiers are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR-associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are a source of inaccuracy in both bulk and single-cell sequencing data, and that synthesizing unique molecular identifiers from homotrimeric nucleotide blocks provides an error-correcting solution that allows absolute counting of sequenced molecules.