
Dataset Information


Improving classification of mature microRNA by solving class imbalance problem.


ABSTRACT: MicroRNAs (miRNAs) are non-coding RNAs of ~20-25 nucleotides that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced problem, which also contributes to the unsatisfactory performance of existing methods. We improved the prediction accuracy of the classifier using balanced datasets and present MatFind, which identifies 5' mature miRNA candidates from their pre-miRNAs using ensemble SVM classifiers built on the idea of AdaBoost. First, a balanced dataset was extracted using the K-nearest neighbor algorithm. Second, multiple SVM classifiers were trained in sequence on the balanced datasets, based on the represented features. Finally, all SVM classifiers were combined to form the ensemble classifier. Our results on an independent testing dataset show that the proposed method is more efficient than one that does not treat the class imbalance problem. Moreover, MatFind achieves much higher classification accuracy than three other approaches. The ensemble SVM classifiers and balanced datasets can solve the class-imbalanced problem and improve classifier performance for mature miRNA identification. MatFind is an accurate and fast method for 5' mature miRNA identification.
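
The abstract does not spell out how the K-nearest neighbor step selects samples or how the SVM outputs are combined. The sketch below shows one plausible reading, not the MatFind implementation. It assumes the balanced dataset is built by keeping, for each positive sample, its nearest unused negative samples, and that the ensemble is a weighted vote over classifiers that return +1 or -1, in the spirit of AdaBoost. The names `knn_balanced_subset` and `weighted_vote` are hypothetical, and the SVM training itself is left out: any callable can stand in for a trained classifier.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_balanced_subset(positives, negatives, k=1):
    """Undersample the majority (negative) class.

    Assumption: for each positive sample, keep its k nearest negatives
    that have not been selected yet, so the result holds roughly
    k * len(positives) negatives.
    """
    chosen = set()
    for p in positives:
        remaining = [i for i in range(len(negatives)) if i not in chosen]
        remaining.sort(key=lambda i: euclidean(p, negatives[i]))
        chosen.update(remaining[:k])
    return [negatives[i] for i in sorted(chosen)]


def weighted_vote(classifiers, weights, x):
    """Combine +1/-1 predictions using per-classifier weights."""
    score = sum(w * clf(x) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1


# Toy data: 2 positives against 5 negatives (hypothetical feature vectors).
positives = [(0.0, 0.0), (1.0, 1.0)]
negatives = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.2), (9.0, 9.0), (4.0, 4.0)]
balanced_negatives = knn_balanced_subset(positives, negatives, k=1)

# Stand-ins for trained SVMs: each callable returns +1 or -1.
classifiers = [lambda x: 1, lambda x: -1, lambda x: -1]
strong_first = weighted_vote(classifiers, [0.9, 0.3, 0.3], (0.5, 0.5))
weak_first = weighted_vote(classifiers, [0.5, 0.3, 0.3], (0.5, 0.5))
```

With `k=1` the selected negative set has the same size as the positive set, which is the balance the abstract refers to. In the vote, a single classifier with a large weight can outvote two classifiers with small weights.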

SUBMITTER: Wang Y 

PROVIDER: S-EPMC4867574 | biostudies-literature | 2016 May

REPOSITORIES: biostudies-literature


Publications

Improving classification of mature microRNA by solving class imbalance problem.

Wang Ying, Li Xiaoye, Tao Bairui

Scientific Reports, 2016-05-16



Similar Datasets

| S-EPMC5492254 | biostudies-other
| S-EPMC7240486 | biostudies-literature
| S-EPMC4349932 | biostudies-literature
| S-EPMC7705099 | biostudies-literature
| S-EPMC5641385 | biostudies-other
| S-EPMC10232287 | biostudies-literature
| S-EPMC4523725 | biostudies-literature
| S-EPMC8606407 | biostudies-literature
| S-EPMC7979646 | biostudies-literature
| PRJEB40949 | ENA