Dataset Information

RNA search with decision trees and partial covariance models.


ABSTRACT: The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order in which to apply the partial models and the score thresholds on which to base the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested for selecting the decision tree, since the tree can be quite complex and there is no obvious method for building it in these cases. Experimental results from seven RNA families show execution times of 0.066 to 0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for the other five. Since the full covariance model is run on all sequences accepted by the partial-model decision tree, the false alarm rate is at least as low as that of the full model alone.
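
The filtering idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the tree layout, the names `Node`, `Leaf`, `passes_filter`, and `search`, and the toy scoring functions standing in for partial and full covariance models are all assumptions made for illustration. It only shows how a binary tree of cheap scorers and thresholds might gate access to an expensive final scorer, and how the constraint that every training sequence reaches the full model could be checked.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Scorer = Callable[[str], float]


@dataclass
class Leaf:
    accept: bool  # True: hand the window to the full model; False: discard it


@dataclass
class Node:
    scorer: Scorer                    # stand-in for one partial model (hypothetical)
    threshold: float                  # score threshold for this decision
    below: Union["Node", Leaf]        # branch taken when score < threshold
    at_or_above: Union["Node", Leaf]  # branch taken when score >= threshold


def passes_filter(tree: Union[Node, Leaf], window: str) -> bool:
    """Walk the binary decision tree until a leaf is reached."""
    while isinstance(tree, Node):
        if tree.scorer(window) >= tree.threshold:
            tree = tree.at_or_above
        else:
            tree = tree.below
    return tree.accept


def search(windows: List[str], tree: Union[Node, Leaf],
           full_scorer: Scorer, full_threshold: float) -> List[str]:
    """Run the (expensive) full scorer only on windows the tree accepts."""
    return [w for w in windows
            if passes_filter(tree, w) and full_scorer(w) >= full_threshold]


# Toy scorers: assumptions for illustration only, not covariance models.
def gc_count(window: str) -> float:
    return float(sum(c in "GC" for c in window))


def starts_with_g(window: str) -> float:
    return 1.0 if window.startswith("G") else 0.0


def full_score(window: str) -> float:
    return gc_count(window) + 2.0 * starts_with_g(window)


toy_tree = Node(
    scorer=starts_with_g, threshold=1.0,
    below=Leaf(False),
    at_or_above=Node(gc_count, 3.0, Leaf(False), Leaf(True)),
)

# Constraint from the abstract: every training sequence must reach the full model.
training = ["GGCCA", "GCGCG"]
assert all(passes_filter(toy_tree, s) for s in training)

hits = search(["GGCCA", "AAAAA", "GAAAA", "CCCCC"], toy_tree, full_score, 5.0)
print(hits)
```

Because the full scorer still makes the final call on every accepted window, the tree in this sketch can only remove candidates, never add hits that the full scorer alone would have rejected.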

SUBMITTER: Smith JA 

PROVIDER: S-EPMC3646588 | biostudies-literature | 2009 Jul-Sep

REPOSITORIES: biostudies-literature

Publications

RNA search with decision trees and partial covariance models.

Jennifer A. Smith

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009-07-01, issue 3


