Project description:RNA modifications, particularly N6-methyladenosine (m6A), are pivotal regulators of RNA functionality and cellular processes. We analyzed m6A modifications by employing Oxford Nanopore technology and the m6Anet algorithm, focusing on the HepG2 cell line. We identified 3968 potential m6A modification sites in 2851 transcripts, corresponding to 1396 genes. A gene functional analysis revealed the active involvement of m6A-modified genes in ubiquitination, transcription regulation, and protein folding processes, aligning with the known role of m6A modifications in histone ubiquitination in cancer. To ensure data robustness, we assessed reproducibility across technical replicates. This study underscores the importance of evaluating algorithmic reproducibility, especially in supervised learning. Furthermore, we examined correlations between transcriptomic, translatomic, and proteomic levels. A strong transcriptomic-translatomic correlation was observed. In conclusion, our study deepens our understanding of m6A modifications' multifaceted impacts on cellular processes and underscores the importance of addressing reproducibility concerns in analytical approaches.
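One simple way to quantify the reproducibility across technical replicates mentioned above is the Jaccard overlap of the site sets called in each replicate. The sketch below is illustrative only: the tuple layout, the `called_sites` and `jaccard` helpers, the 0.9 threshold, and the toy data are all assumptions, not the procedure used in the study.

```python
def called_sites(predictions, threshold=0.9):
    """Return the (transcript, position) pairs whose score meets the threshold.

    `predictions` is a hypothetical list of (transcript_id, position, score) tuples.
    """
    return {(tx, pos) for tx, pos, score in predictions if score >= threshold}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy per-site scores for two technical replicates (made-up values).
rep1 = [("tx1", 100, 0.95), ("tx1", 250, 0.40), ("tx2", 30, 0.92)]
rep2 = [("tx1", 100, 0.91), ("tx2", 30, 0.55), ("tx3", 12, 0.97)]

sites1 = called_sites(rep1)
sites2 = called_sites(rep2)
overlap = jaccard(sites1, sites2)
print(f"shared sites: {len(sites1 & sites2)}, Jaccard index: {overlap:.3f}")
```

A Jaccard index near 1 would indicate that both replicates call almost the same sites; lower values point to threshold-sensitive or unstable calls.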
Project description:N(6)-methyladenosine (m(6)A) is a prevalent RNA methylation modification involved in the regulation of degradation, subcellular localization, splicing and local conformation changes of RNA transcripts. High-throughput experiments have demonstrated that only a small fraction of the m(6)A consensus motifs in mammalian transcriptomes are modified. Accurate identification of RNA m(6)A sites is therefore urgently needed. To this end, we established SRAMP, a computational predictor of mammalian m(6)A sites. To depict the sequence context around m(6)A sites, SRAMP combines three random forest classifiers that exploit the positional nucleotide sequence pattern, the K-nearest neighbor information and the position-independent nucleotide pair spectrum features, respectively. SRAMP accepts either genomic sequences or cDNA sequences as input. With either kind of input sequence, SRAMP achieves competitive performance in both cross-validation tests and rigorous independent benchmarking tests. Analyses of the informative features and overrepresented rules extracted from the random forest classifiers demonstrate that nucleotide usage preferences at the distal positions, in addition to those at the proximal positions, contribute to the classification. As a public prediction server, SRAMP is freely available at http://www.cuilab.cn/sramp/.
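To make the idea of a position-independent nucleotide pair spectrum concrete, the sketch below counts how often each ordered nucleotide pair occurs at a given spacing, regardless of where in the window it appears. This is one plausible reading of such a feature, not SRAMP's actual implementation; the function name `pair_spectrum`, the `max_gap` parameter, and the normalisation are assumptions.

```python
from itertools import product


def pair_spectrum(seq, max_gap=2, alphabet="ACGU"):
    """Frequency of each ordered nucleotide pair separated by 0..max_gap bases.

    Returns a dict keyed by (gap, pair). Pairs containing characters outside
    the alphabet (for example N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    features = {}
    for gap in range(max_gap + 1):
        counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
        n_pairs = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                n_pairs += 1
        for pair, count in counts.items():
            features[(gap, pair)] = count / n_pairs if n_pairs else 0.0
    return features


spectrum = pair_spectrum("GGACU", max_gap=1)
print(spectrum[(0, "GG")], spectrum[(1, "GA")])
```

Because the counts are pooled over all positions, the resulting vector has a fixed length (16 pairs per gap) whatever the window size, which is what makes it usable alongside position-specific features.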
Project description:N(6)-methyladenosine (m(6)A), the most abundant internal RNA modification, functions in diverse biological processes, including regulation of embryonic stem cell self-renewal and differentiation. As yet, methods to detect m(6)A in the transcriptome rely on the availability and quality of an m(6)A antibody and are often associated with a high rate of false positives. Here, based on our observation that m(6)A interferes with A-T/U pairing, we report a microarray-based technology to map m(6)A sites in mouse embryonic stem cells. We identified 72 unbiased sites exhibiting high m(6)A levels from 66 PolyA RNAs. Bioinformatics analyses suggest identified sites are enriched on developmental regulators and may in some contexts modulate microRNA/mRNA interactions. Overall, we have developed microarray-based technology to capture highly enriched m(6)A sites in the mammalian transcriptome. This method provides an alternative means to identify m(6)A sites for certain applications.
Project description:As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. Because traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method for detecting m6A sites is needed. In this study, we propose DNN-m6A, a novel cross-species computational method based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. Firstly, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Secondly, we use elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation test on the training datasets shows that the proposed DNN-m6A method outperforms the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%-83.38% and an area under the curve (AUC) of 81.39%-91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%-83.04% and an AUC of 80.79%-91.09%, which shows the excellent generalization ability of our proposed method.
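The feature-fusion step described above can be illustrated with two of the listed encodings. The sketch below concatenates binary encoding (BE) and tri-nucleotide composition (TNC) into a single vector; it is a minimal example under stated assumptions, not the DNN-m6A code. The helper names and the A, C, G, U ordering are hypothetical choices.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed ordering for the indicator bits


def binary_encoding(seq):
    """Concatenate a 4-bit indicator vector for every position."""
    vec = []
    for base in seq:
        vec.extend(1 if base == a else 0 for a in ALPHABET)
    return vec


def trinucleotide_composition(seq):
    """Relative frequency of each of the 64 possible 3-mers."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - 2
    for i in range(total):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[k] / total if total > 0 else 0.0 for k in kmers]


def fused_features(seq):
    """Fuse both encodings by simple concatenation."""
    seq = seq.upper().replace("T", "U")
    return binary_encoding(seq) + trinucleotide_composition(seq)


vector = fused_features("GGACU")
print(len(vector))  # 5 positions * 4 bits + 64 tri-nucleotide frequencies
```

Concatenation produces a long, partly redundant vector, which is why a selection step such as the elastic net mentioned in the abstract is applied before training.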
Project description:N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, existing computational models usually predict m6A sites in a single species only, without considering tissue-level differences within a species, and most of them are built on low-confidence data generated by m6A antibody immunoprecipitation (IP)-based sequencing, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred, which uses ensemble deep learning to accurately identify m6A sites based on high-confidence data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms, three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by a five-fold cross-validation test to achieve an effective stacked model. Our model im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that it can learn general methylation rules on RNA bases and generalize to transcriptome-wide m6A identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
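For readers unfamiliar with the AUROC values reported above, the sketch below computes the metric directly from its rank interpretation: the probability that a randomly chosen true site is scored higher than a randomly chosen non-site, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions; the quadratic loop is meant for clarity, not for large datasets.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = m6A site, 0 = non-site) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
value = auroc(labels, scores)
print(f"AUROC = {value:.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported ranges sit well above chance.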
Project description:N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
Project description:Synthetic lethality (SL) has shown great promise for the discovery of novel targets in cancer. CRISPR double-knockout (CDKO) technologies can only screen several hundred genes and their combinations, but not genome-wide. Good SL prediction models are therefore needed to select genes and gene pairs for CDKO experiments. In this paper, we develop a novel multi-layer encoder for individual sample-specific SL prediction (MLEC-iSL). Unlike existing SL prediction models, MLEC-iSL is built to predict SL connectivity first. Because SL connectivity is scalable from existing genes in the training data to new genes in validation data, we hypothesize that MLEC-iSL has better SL prediction performance. MLEC-iSL has three encoders, namely a gene encoder, a graph encoder, and a transformer encoder. MLEC-iSL achieves high performance in K562 (AUPR, 0.73; AUC, 0.72) and Jurkat (AUPR, 0.73; AUC, 0.71) cells, while no existing method exceeds 0.62 AUPR or AUC in either cell line. An MLEC-iSL-guided CDKO experiment in 22Rv1 cells yielded a 46.8% SL ratio amongst its selected gene pairs. Six of the top ten SL connectivity hub genes were validated in 22Rv1 cells. This work reveals SL gene pairs and a dependency between the apoptosis and mitosis cell death pathways.
Project description:N(1)-methyladenosine (m(1)A) is a prominent RNA modification involved in many biological processes. Accurate identification of m(1)A sites is invaluable for better understanding the biological functions of m(1)A. However, limitations in experimental methods hinder progress towards the identification of m(1)A sites. As a complement to experimental methods, a support vector machine-based method called RAMPred is proposed to identify m(1)A sites in H. sapiens, M. musculus and S. cerevisiae genomes for the first time. In this method, RNA sequences are encoded using nucleotide chemical properties and nucleotide compositions. RAMPred achieves promising performance in jackknife tests, cross-cell-line tests and cross-species tests, indicating that RAMPred has high potential to become a useful tool for identifying m(1)A sites. For the convenience of experimental scientists, a web server based on the proposed model was constructed and is freely accessible at http://lin.uestc.edu.cn/server/RAMPred.
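A nucleotide chemical property encoding of the kind mentioned above is commonly written as three binary flags per base (ring structure, functional group, hydrogen-bond strength), often paired with a running nucleotide frequency. The sketch below follows that common convention; whether RAMPred uses exactly this density term for its "nucleotide compositions" is an assumption, and the name `ncp_with_density` is hypothetical.

```python
# (purine ring, amino group, weak hydrogen bonding) flags per nucleotide.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_with_density(seq):
    """Encode each position as its 3 chemical flags plus a cumulative frequency.

    The fourth value is the fraction of the prefix ending at this position
    that consists of the same nucleotide. Raises KeyError on bases outside
    A, C, G, U (T is mapped to U).
    """
    seq = seq.upper().replace("T", "U")
    seen = {}
    features = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        features.append(NCP[base] + (seen[base] / i,))
    return features


encoded = ncp_with_density("GGACU")
for base, row in zip("GGACU", encoded):
    print(base, row)
```

Flattening the rows gives a fixed-length numeric vector for a fixed window size, which is the form a support vector machine expects.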
Project description:The most common post-transcriptional modification, N6-methyladenosine (m6A), is associated with a number of crucial biological processes. The precise detection of m6A sites across the genome is critical for revealing its regulatory function and providing new insights into drug design. Although both experimental and computational models for detecting m6A sites have been introduced, these conventional methods are laborious and expensive. Furthermore, only a handful of these models are capable of detecting m6A sites in various tissues. Therefore, a more generic and optimized computational method for detecting m6A sites in different tissues is required. In this paper, we propose a universal model based on a deep neural network (DNN), named TS-m6A-DL, which can classify m6A sites in several tissues of humans (Homo sapiens), mice (Mus musculus), and rats (Rattus norvegicus). To extract RNA sequence features and convert the input into a numerical format for the network, we use the one-hot encoding method. The model was tested using fivefold cross-validation and its stability was measured using independent datasets. The proposed model, TS-m6A-DL, achieved accuracies in the range of 75-85% using the fivefold cross-validation method and 72-84% on the independent datasets. Finally, to verify the generalization of the model, we performed cross-species testing and demonstrated its generalization ability by achieving state-of-the-art results.
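The fivefold cross-validation protocol referred to above can be sketched as follows: the samples are shuffled once, dealt into five folds, and each fold serves as the held-out set exactly once. This is a generic illustration using only the standard library, not the evaluation code of TS-m6A-DL; the function name `k_fold_indices`, the fixed seed, and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable splits
    folds = [indices[i::k] for i in range(k)]  # deal indices round-robin
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Averaging a metric such as accuracy over the five held-out folds gives the cross-validation figure, while the independent datasets are kept entirely outside this loop.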
Project description:Here, we performed N6-methyladenosine (m6A) RNA sequencing to determine the circRNA m6A methylation changes in the placentas during the pathogenesis of preeclampsia (PE). We verified the expression of the circRNA circPAPPA2 using quantitative reverse transcription-PCR. An invasion assay was carried out to identify the role of circPAPPA2 in the development of PE. Mechanistically, we investigated the cause of the altered m6A modification of circPAPPA2 through overexpression and knockdown cell experiments, RNA immunoprecipitation, fluorescence in situ hybridization and RNA stability experiments. We found that increases in m6A-modified circRNAs are prevalent in PE placentas and that the main changes in methylation occur in the 3'UTR and near the start codon, implicating the involvement of these changes in PE development. We also found that the levels of circPAPPA2 are decreased but that m6A modification is augmented. Furthermore, we discovered that methyltransferase‑like 14 (METTL14) increases the level of circPAPPA2 m6A methylation and that insulin-like growth factor 2 mRNA-binding protein 3 (IGF2BP3) maintains circPAPPA2 stability. Decreases in IGF2BP3 levels lead to declines in circPAPPA2 levels. In summary, we provide a new vision and strategy for the study of PE pathology and report that placental circRNA m6A modification appears to be an important regulatory mechanism.