Staem5: A novel computational approach for accurate prediction of m5C site
ABSTRACT: 5-Methylcytosine (m5C) is an important post-transcriptional modification that has been found in multiple types of RNA. Many studies have shown that m5C plays vital roles in biological functions such as RNA structural stability and metabolism. Computational approaches offer an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanism of this modification. This study proposes a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 employs feature fusion to leverage informative sequence profiles, together with a stacking ensemble learning framework that combines five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. The source code of Staem5 is publicly available at https://github.com/Cxd-626/Staem5.git.
GRAPHICAL ABSTRACT: Chai et al. developed a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 was developed based on feature fusion tactics and a stacking ensemble learning framework. Extensive benchmarking tests demonstrated that Staem5 achieved better predictive performance than state-of-the-art approaches.
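The feature fusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the encodings Staem5 uses, so the two shown here (one-hot encoding and dinucleotide frequencies), the function names, the 11-nt window, and fusion by plain concatenation are all assumptions chosen for illustration, not the authors' implementation. The sketch also assumes the sequence contains only A, C, G and U.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; other symbols (e.g. N) are not handled


def one_hot(seq):
    """Binary encoding: four values per nucleotide, in ALPHABET order."""
    return [1.0 if base == ref else 0.0 for base in seq for ref in ALPHABET]


def kmer_frequencies(seq, k=2):
    """Normalized counts of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping k-mers in seq
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def fuse_features(seq, k=2):
    """Feature fusion by simple concatenation of the two encodings."""
    return one_hot(seq) + kmer_frequencies(seq, k)


# Hypothetical 11-nt window centred on a candidate cytosine (index 5).
window = "AUGGACUUGCA"
vector = fuse_features(window)
print(len(vector))
```

For an 11-nt window the fused vector has 44 one-hot values followed by 16 dinucleotide frequencies. In a stacking setup such as the one the abstract describes, vectors of this kind would be passed to several base learners whose predictions feed a final meta-learner; that stage is not shown here.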
SUBMITTER: Chai D
PROVIDER: S-EPMC8571400 | biostudies-literature
REPOSITORIES: biostudies-literature