
Dataset Information


Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features.


ABSTRACT:

Background

Piwi-interacting RNAs (piRNAs) are the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help clarify the mechanism of gamete generation.

Methods

In this paper, we use machine learning methods to distinguish transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features. We explore six sequence-derived features (spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition, and local structure-sequence triplet elements) and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate the useful features and build high-accuracy prediction models.
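As an illustration of the simplest of these features, the sketch below computes a spectrum (k-mer) profile for one RNA sequence. It is not the authors' implementation: the function name `spectrum_profile`, the default `k=2`, the conversion of T to U, and the normalization to frequencies are all assumptions made for this example.

```python
from itertools import product


def spectrum_profile(seq, k=2, alphabet="ACGU"):
    """Return k-mer frequencies of seq, ordered lexicographically by k-mer.

    Hypothetical helper: normalizing counts to frequencies is an assumption,
    not something stated in the abstract.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

A sequence of any length maps to a fixed-length vector of 4^k entries, which is what allows such profiles to be concatenated with other features or fed to separate base classifiers.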

Results

We construct three datasets covering three species (Human, Mouse and Drosophila) and evaluate the prediction models by 10-fold cross-validation. In the computational experiments, direct combination models achieve AUCs of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUCs of 0.922, 0.926 and 0.994 on the same three datasets.
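For readers unfamiliar with the evaluation protocol, the following sketch shows how fold assignment and AUC can be computed with the standard library alone. The helper names `k_fold_indices` and `auc_score`, the shuffling seed, and the pairwise (Mann-Whitney) formulation of AUC are assumptions for illustration; the abstract does not say how the authors computed these quantities.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In each of the ten rounds, one fold serves as the test set and the remaining nine as training data; the reported AUC would then typically be the mean over folds or the AUC of the pooled test predictions.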

Conclusions

Compared with other state-of-the-art methods, our methods achieve better performance, making them promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File.

SUBMITTER: Luo L 

PROVIDER: S-EPMC4830532 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature


Publications

Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features.

Longqiang Luo, Dingfang Li, Wen Zhang, Shikui Tu, Xiaopeng Zhu, Gang Tian

PLoS ONE, 2016-04-13, issue 4



Similar Datasets

| S-EPMC4308892 | biostudies-literature
| S-EPMC5613491 | biostudies-literature
| S-EPMC5006569 | biostudies-literature
| S-EPMC4104576 | biostudies-literature
| S-EPMC7488740 | biostudies-literature
| S-EPMC4513867 | biostudies-literature
| S-EPMC8245499 | biostudies-literature
| S-EPMC2805124 | biostudies-literature
| S-EPMC5860356 | biostudies-other
| S-EPMC8016469 | biostudies-literature