Dataset Information

Predicting A-to-I RNA editing by feature selection and random forest.


ABSTRACT: RNA editing is a post-transcriptional RNA process that provides RNA and protein complexity for regulating gene expression in eukaryotes. It is challenging to predict RNA editing by computational methods. In this study, we developed a novel method to predict RNA editing based on a random forest method. A careful feature selection procedure was performed based on the Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS) algorithms. Eighteen optimal features were selected from the 77 features in our dataset and used to construct a final predictor. The accuracy and MCC (Matthews correlation coefficient) values for the training dataset were 0.866 and 0.742, respectively; for the testing dataset, the accuracy and MCC were 0.876 and 0.576, respectively. The performance was higher using 18 features than all 77, suggesting that a small feature set was sufficient to achieve accurate prediction. Analysis of the 18 features was performed and may shed light on the mechanism and dominant factors of RNA editing, providing a basis for future experimental validation.
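The abstract reports both accuracy and MCC, and the two can diverge (0.876 versus 0.576 on the testing dataset). As a minimal sketch of how the two metrics are computed from a binary confusion matrix, the following uses only the standard definitions. The function names and the example counts are hypothetical illustrations and are not taken from the study's data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Because MCC uses all four cells of the confusion matrix, it is generally more sensitive than accuracy to errors concentrated in one class, which is one plausible reason the two figures can differ.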

SUBMITTER: Shu Y 

PROVIDER: S-EPMC4206426 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Predicting A-to-I RNA editing by feature selection and random forest.

Shu Yang, Zhang Ning, Kong Xiangyin, Huang Tao, Cai Yu-Dong

PLoS ONE, 2014-10-22, issue 10


Similar Datasets

| S-EPMC3445488 | biostudies-literature
| S-EPMC5775496 | biostudies-literature
| S-EPMC6251864 | biostudies-literature