Project description:N6-methyladenosine (m6A) has been one of the most abundant and well-known modifications in mRNA since its discovery in the 1970s. Recent studies have demonstrated that m6A is involved in various biological processes such as alternative splicing and RNA degradation, playing an important role in a variety of diseases. To better understand the role of m6A, transcriptome-wide m6A profiling data are indispensable. In recent years, the Oxford Nanopore Technology Direct RNA Sequencing (DRS) platform has shown promise in RNA modification detection based on current disruptions measured in transcripts. However, decoding current intensity data into modification profiles remains a challenging task. Here, we introduce m6A Transcriptome-wide Mapper (m6ATM), a novel Python-based computational pipeline that applies deep neural networks to predict m6A sites at single-base resolution using DRS data. The m6ATM model architecture incorporates a WaveNet encoder and a dual-stream multiple instance learning model to extract features from specific target sites and characterize the m6A epitranscriptome. For validation, m6ATM achieved an accuracy of 80% to 98% across in vitro transcription datasets containing varying m6A modification ratios and outperformed other tools in benchmarking with human cell-line data. Moreover, we demonstrated the versatility of m6ATM in providing reliable stoichiometric information and used it to pinpoint PEG10 as a potential m6A target transcript in liver cancer cells. In conclusion, we showed that m6ATM is a high-performance m6A detection tool, and our results pave the way for epitranscriptomic precision medicine.
Project description:N6-methyladenosine (m6A) is one of the most abundant and well-known modifications in messenger RNAs since its discovery in the 1970s. Recent studies have demonstrated that m6A is involved in various biological processes, such as alternative splicing and RNA degradation, playing an important role in a variety of diseases. To better understand the role of m6A, transcriptome-wide m6A profiling data are indispensable. In recent years, the Oxford Nanopore Technology Direct RNA Sequencing (DRS) platform has shown promise for RNA modification detection based on current disruptions measured in transcripts. However, decoding current intensity data into modification profiles remains a challenging task. Here, we introduce the m6A Transcriptome-wide Mapper (m6ATM), a novel Python-based computational pipeline that applies deep neural networks to predict m6A sites at a single-base resolution using DRS data. The m6ATM model architecture incorporates a WaveNet encoder and a dual-stream multiple-instance learning model to extract features from specific target sites and characterize the m6A epitranscriptome. For validation, m6ATM achieved an accuracy of 80% to 98% across in vitro transcription datasets containing varying m6A modification ratios and outperformed other tools in benchmarking with human cell line data. Moreover, we demonstrated the versatility of m6ATM in providing reliable stoichiometric information and used it to pinpoint PEG10 as a potential m6A target transcript in liver cancer cells. In conclusion, m6ATM is a high-performance m6A detection tool, and our results pave the way for future advancements in epitranscriptomic research.
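For readers unfamiliar with multiple-instance learning, the sketch below illustrates the general idea of turning per-read scores at one site into a site-level call and a rough modification ratio. It is a simplified max-pooling illustration and not the m6ATM dual-stream model; the function name `aggregate_site`, the 0.5 threshold, and the example probabilities are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_site(read_probs, threshold=0.5):
    """Combine per-read m6A probabilities for one site (hypothetical helper).

    Returns (site_score, modified_fraction).
    """
    if not read_probs:
        raise ValueError("need at least one read for the site")
    # Max pooling: the site score is that of its most confident read.
    site_score = max(read_probs)
    # Fraction of reads at or above the threshold: a crude stoichiometry estimate.
    modified_fraction = mean(1.0 if p >= threshold else 0.0 for p in read_probs)
    return site_score, modified_fraction


# Made-up read-level probabilities for a single candidate site.
score, fraction = aggregate_site([0.9, 0.2, 0.7, 0.1])
print(score, fraction)
```

A learned aggregator would replace the fixed max and threshold with trainable components; the fixed rule here only shows the bag-of-reads structure.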
Project description:RNA internal modifications play a critical role in the development of multicellular organisms and their response to environmental cues. Using nanopore direct RNA sequencing (DRS), we constructed a large in vitro epitranscriptome (IVET) resource from a plant cDNA library labeled with m6A, m1A and m5C, respectively. Furthermore, after transfer learning, the pre-trained model was used to detect additional RNA internal modifications such as m1A, hm5C, m7G and Ψ. Finally, we illustrated a global view of the epitranscriptome with m6A, m1A, m5C, m7G and Ψ modifications in rice seedlings under normal and high-salinity environments. In summary, we provided a strategy for creating an IVET resource from a cDNA library and developed a computational method, termed TandemMod, that uses IVET-based transfer learning to profile the epitranscriptome landscape with co-occupancy of multiple types of RNA modification in plants responding to environmental signals.
Project description:We developed a semi-supervised deep learning framework, called Solo, for the identification of doublets in scRNA-seq analysis. To validate our method, we used MULTI-seq with cholesterol-modified oligos (CMOs) to experimentally identify doublets in mouse kidney, a solid tissue with diverse cell types, and showed that Solo recapitulated the experimentally identified doublets.
Project description:Here we present miR-eCLIP analysis of AGO2 in HEK293 cells to address the small RNA repertoire and uncover their physiological targets. We developed an optimized bioinformatics approach to chimeric read identification to detect high-confidence chimeras, which were used as a biologically validated input for miRBind, a deep learning method and web server that can accurately predict the potential of miRNA:target site binding.
Project description:Long-read sequencing has become a powerful tool for alternative splicing analysis. However, technical and computational challenges have limited our ability to couple long-read sequencing with single-cell and spatial barcoding to explore alternative splicing in the single-cell and spatial setting. Although Nanopore-based long-read sequencing has been widely applied to single-cell and spatially barcoded libraries in recent research, technical issues that hinder accurate isoform-level quantification are not well addressed in such settings. First, the relatively higher sequencing error of Nanopore long reads has limited the accuracy of cell barcode and unique molecular identifier (UMI) recovery, a necessary first step in the analysis of single-cell/spatial sequencing data. Second, read truncation and mapping errors, the latter exacerbated by the higher sequencing error rates, lead to the false detection of spurious new isoforms and degrade quantification accuracy. We show that these technical issues persist despite the recent improvements in long-read sequencing accuracy. Beyond the initial data pre-processing, downstream analysis lacks a statistical framework to quantify splicing variation within and between cells/spots. In light of these multiple challenges, we developed Longcell, a statistical framework and computational pipeline for isoform quantification using single-cell and spatial spot barcoded Nanopore long-read sequencing data. Longcell performs computationally efficient cell/spot barcode extraction, UMI recovery, and UMI-based truncation- and mapping-error correction.
Through a statistical model that accounts for varying read coverage across cells/spots, Longcell rigorously quantifies the level of inter-cell/spot versus intra-cell/spot diversity in exon usage and detects changes in splicing distributions between cell populations. Applying Longcell to single-cell long-read data from multiple contexts, we found that intra-cell splicing heterogeneity, where multiple isoforms co-exist within the same cell, is ubiquitous for highly expressed genes. On matched single-cell and Visium long-read sequencing for a tissue of colorectal cancer metastasis to the liver, Longcell found concordant signals between the single-cell and spatial data modalities. On Visium long-read sequencing data for multiple tissues, Longcell allows accurate identification of spatial isoform switching. Finally, in a perturbation experiment for 9 splicing factors, Longcell identified regulatory targets that were validated by targeted sequencing.
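To make the barcode-recovery problem concrete, the sketch below shows one common, generic way to assign an error-containing observed barcode to a whitelist of known barcodes. It is not Longcell's algorithm: the helper names, the mismatch limit, and the toy barcodes are assumptions, and Hamming distance is a simplification because Nanopore errors also include insertions and deletions.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, whitelist, max_mismatches=1):
    """Return the unique whitelist barcode within max_mismatches, else None.

    Hypothetical helper: ambiguous or unmatched reads are left unassigned.
    """
    hits = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Toy whitelist and observed barcodes, for illustration only.
whitelist = ["ACGTACGT", "TTTTCCCC"]
print(assign_barcode("ACGTACGA", whitelist))  # one mismatch from the first entry
print(assign_barcode("GGGGGGGG", whitelist))  # no whitelist entry is close enough
```

Returning `None` for ambiguous matches trades recall for precision; a production pipeline would typically use an edit-distance search that tolerates indels.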
Project description:Nanopore sequencers can enrich or deplete targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, achieving rapid parallel operation during real-time sequencing requires substantial computational resources. We present a deep learning framework, NanoDeep, that overcomes these limitations by incorporating a convolutional neural network with squeeze-and-excitation. We first showed that the raw squiggle derived from native DNA sequences can be used to determine whether a read originates from a microbial or a human genome. Then, we demonstrated that NanoDeep successfully classified bacterial reads from a pooled library containing human sequences and enriched bacterial sequences compared with the routine nanopore sequencing setting. Further, we showed that NanoDeep improves sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences from gut samples, showing its potential for the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.
Project description:To reliably generate theoretical libraries that can be used in SWATH experiments, we developed a prediction framework, deep-learning for SWATH analysis (dpSWATH), to improve the sensitivity and specificity of data generated by Q-TOF mass spectrometers. The theoretical library built by dpSWATH allowed us to increase the identification rate of proteins and peptides compared to traditional or library-free methods. In particular, the in silico library built at the transcriptome scale identified the most proteins while keeping an FDR similar to that of the DDA library. Based on our analysis, we conclude that dpSWATH is superior in predicting libraries for SWATH-MS measurements compared to other algorithms that are based on Orbitrap data.