
Dataset Information


A framework for improving microRNA prediction in non-human genomes.


ABSTRACT: The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Given that the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, because the number of false positives overwhelms the true predictions. Here we introduce a framework (SMIRP) for creating species-specific miRNA prediction systems that leverages sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems, and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.
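The abstract's point about false positives overwhelming true predictions follows from simple arithmetic on a 1:1000 class ratio. The sketch below is illustrative only: the function name `precision` and the sensitivity and specificity values are assumptions chosen for the example, not figures reported in the paper. Only the 1:1000 ratio comes from the abstract.

```python
def precision(sensitivity, specificity, positives, negatives):
    """Expected precision for a classifier applied to an imbalanced set.

    sensitivity: fraction of true miRNA correctly detected
    specificity: fraction of non-miRNA correctly rejected
    positives / negatives: class counts (only their ratio matters)
    """
    true_pos = sensitivity * positives
    false_pos = (1.0 - specificity) * negatives
    total_predicted = true_pos + false_pos
    if total_predicted == 0:
        return 0.0  # no positive predictions at all
    return true_pos / total_predicted


# 1:1000 ratio of true miRNA to pseudo-miRNA, as stated in the abstract.
# The sensitivity (0.90) and the specificities below are hypothetical values.
for spec in (0.90, 0.99, 0.999):
    p = precision(0.90, spec, positives=1, negatives=1000)
    print(f"specificity={spec:.3f} -> precision={p:.4f}")
```

With sensitivity held fixed, precision depends almost entirely on specificity at this class ratio, which is why a drop in specificity on non-human genomes matters even when sensitivity is sustained.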

SUBMITTER: Peace RJ 

PROVIDER: S-EPMC4787757 | biostudies-literature | 2015 Nov

REPOSITORIES: biostudies-literature


Publications

A framework for improving microRNA prediction in non-human genomes.

Robert J Peace, Kyle K Biggar, Kenneth B Storey, James R Green

Nucleic acids research 20150710 20



Similar Datasets

| S-EPMC2955701 | biostudies-literature
| S-EPMC4869178 | biostudies-literature
| S-EPMC6997536 | biostudies-literature
| S-EPMC8021919 | biostudies-literature
| S-EPMC1847833 | biostudies-other
| S-EPMC10723423 | biostudies-literature
| S-EPMC2773258 | biostudies-literature
| S-EPMC5156920 | biostudies-literature
| S-EPMC2816677 | biostudies-literature
| S-EPMC8153819 | biostudies-literature