Project description:Chemical RNA modifications, collectively referred to as the ‘epitranscriptome’, have been intensively studied in recent years, largely facilitated by next-generation sequencing technologies. Recent efforts have turned towards the nanopore direct RNA sequencing (DRS) platform, as it allows simultaneous detection of diverse RNA modification types in full-length native RNA molecules. While RNA modifications can be identified in the form of systematic basecalling ‘errors’ in DRS datasets, m6A modifications produce very modest ‘errors’, limiting the applicability of this approach to sites that are modified at high stoichiometries. Here, we demonstrate that the use of alternative RNA basecalling models, trained with fully unmodified in vitro synthetic sequences, increases the ‘error’ signal of m6A modifications, leading to enhanced detection of RNA modifications even at lower stoichiometries. We then show that the use of these models enhances the detection of RNA modifications in previously published in vivo human samples, using third-party software for the detection of RNA modifications. Moreover, our work provides a novel RNA basecalling model that shows a median accuracy of 97%, compared with 91% for previously available RNA basecalling models. Notably, this increase in accuracy leads not only to improved detection of RNA modifications but also to enhanced mappability of RNA reads, which is most evident for short RNA reads (50% increase). Altogether, our work stresses the importance of using fully unmodified RNA sequences for training RNA basecalling models, and shows how the use of different basecalling models can significantly affect the detection of RNA modifications and read mappability.
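The basecalling ‘error’ signal mentioned above can be illustrated as a per-position mismatch frequency. The following is a minimal sketch, not the authors' pipeline: it assumes reads have already been aligned to the reference and are given as strings of the same length as the reference, with `-` for a deleted base and `.` for no coverage. The function name `mismatch_frequency` and the input representation are hypothetical; real workflows would typically derive these counts from alignment files.

```python
from typing import List


def mismatch_frequency(reference: str, aligned_reads: List[str]) -> List[float]:
    """Fraction of covering reads whose base differs from the reference, per position.

    Assumed encoding (illustrative only): '-' is a deletion and counts as an
    error; '.' means the read does not cover the position and is skipped.
    """
    freqs = []
    for pos, ref_base in enumerate(reference):
        covered = 0
        errors = 0
        for read in aligned_reads:
            base = read[pos]
            if base == ".":
                continue  # read does not cover this position
            covered += 1
            if base != ref_base:
                errors += 1  # mismatch or deletion
        freqs.append(errors / covered if covered else 0.0)
    return freqs


if __name__ == "__main__":
    reference = "GGACU"
    reads = ["GGACU", "GGGCU", "GG-CU", "GGACU"]
    # Position 2 (the A) is miscalled in two of four reads.
    print(mismatch_frequency(reference, reads))
```

Under this view, a basecalling model that produces a larger error frequency at modified positions, while keeping it low elsewhere, makes sites with lower modification stoichiometry easier to distinguish from background.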
Project description:Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial monoclonal antibodies (mAbs). MEMs extracted over 10000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
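Peptide recall, the metric used above to compare models, can be sketched as the fraction of test spectra whose predicted sequence exactly matches the reference peptide. This is a minimal illustration under stated assumptions, not the study's evaluation code: the function name `peptide_recall` is hypothetical, and treating isoleucine (I) and leucine (L) as equivalent is an assumption made here because the two residues have identical mass.

```python
from typing import Sequence


def peptide_recall(
    predicted: Sequence[str],
    reference: Sequence[str],
    equate_il: bool = True,
) -> float:
    """Fraction of spectra whose predicted peptide matches the reference exactly.

    predicted[i] and reference[i] refer to the same spectrum.
    If equate_il is True, I and L are treated as the same residue
    (an assumption of this sketch, since they are isobaric).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0

    def normalize(seq: str) -> str:
        return seq.replace("I", "L") if equate_il else seq

    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predicted, reference)
    )
    return matches / len(reference)


if __name__ == "__main__":
    preds = ["PEPTIDE", "LLK", "AAA"]
    truth = ["PEPTIDE", "ILK", "AAG"]
    print(peptide_recall(preds, truth))                   # I/L treated as equal
    print(peptide_recall(preds, truth, equate_il=False))  # strict comparison
```

Comparing this value between a model trained on termini-diverse peptides and one trained on termini-restricted peptides, on the same test set, gives the kind of relative generalization figure reported above.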
Project description:The heterogeneous composition of cellular transcriptomes poses a major challenge for detecting weakly expressed RNA classes, as they can be obscured by abundant RNAs. Although biochemical protocols can enrich or deplete specified RNAs, they are time-consuming, expensive and can compromise RNA integrity. Here we introduce RISER, a biochemical-free technology for the real-time enrichment or depletion of RNA classes. RISER performs selective rejection of molecules during direct RNA sequencing by identifying RNA classes directly from nanopore signals with deep learning and communicating with the sequencing hardware in real time. By targeting the dominant messenger and mitochondrial RNA classes for depletion, RISER reduced their respective read counts by more than 85%, resulting in an increase in sequencing depth of up to 93% for long non-coding RNAs. We also applied RISER for the depletion of globin mRNA in whole blood, achieving a decrease in globin reads by more than 90% as well as a significant increase in non-globin reads. Furthermore, using a GPU or a CPU, RISER is faster than GPU-accelerated basecalling and mapping. RISER’s modular and retrainable software and intuitive command-line interface allow easy adaptation to other RNA classes. RISER is available at https://github.com/comprna/riser.
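The selective-rejection idea described above can be sketched as a simple decision rule applied to a classifier's output for each molecule. This is an illustration only and does not reproduce RISER's implementation or interface: the names `Decision` and `decide`, the `threshold` value, and the three-way outcome are all hypothetical, and the probability `p_target` is assumed to come from some signal-level classifier that is not shown.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "reject"      # eject the molecule from the pore
    CONTINUE = "continue"  # sequence the molecule to completion
    WAIT = "wait"          # not confident yet; collect more signal


def decide(p_target: float, mode: str = "deplete", threshold: float = 0.9) -> Decision:
    """Map a target-class probability to a sequencing decision.

    p_target: assumed classifier probability that the molecule belongs to
              the target RNA class (hypothetical input).
    mode:     'deplete' rejects target molecules; 'enrich' rejects the rest.
    """
    if not 0.0 <= p_target <= 1.0:
        raise ValueError("p_target must be in [0, 1]")
    if mode not in ("deplete", "enrich"):
        raise ValueError("mode must be 'deplete' or 'enrich'")

    confident_target = p_target >= threshold
    confident_other = (1.0 - p_target) >= threshold

    if mode == "deplete":
        if confident_target:
            return Decision.REJECT
        if confident_other:
            return Decision.CONTINUE
    else:  # enrich
        if confident_other:
            return Decision.REJECT
        if confident_target:
            return Decision.CONTINUE
    return Decision.WAIT


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.05):
        print(p, decide(p, mode="deplete").value, decide(p, mode="enrich").value)
```

The `WAIT` outcome reflects a design choice of this sketch: an uncertain molecule keeps being sequenced until more signal is available, rather than being rejected on weak evidence.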
Project description:A benchmark set of bottom-up proteomics data for training deep learning networks. It has data from 51 organisms and includes nearly 1 million peptides.
Project description:Myelodysplastic syndrome (MDS) is a clonal myeloid neoplasm characterized by ineffective haematopoiesis and cytopenia with frequent epigenetic modifications. Hypomethylating agents (HMA) such as azacitidine (AZA) and decitabine (DEC) are standard therapeutic options in MDS, but drug resistance is not uncommon. To study RNA modifications, in particular m6A, P39-AZA-S and P39-AZA-R native RNA libraries were prepared using the direct RNA sequencing kit (SQK-RNA002, Oxford Nanopore) following the manufacturer’s protocol. Differentially methylated RNA signals called from raw fast5 files were used as input for basecalling, downstream mapping and methylated transcript detection.