Dataset Information

A new approach for interpreting Random Forest models and its application to the biology of ageing.


ABSTRACT: Motivation: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. Results: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. Availability and implementation: The dataset and source code used in this paper are available as 'Supplementary Material', and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. Supplementary information: Supplementary data are available at Bioinformatics online.
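The abstract's point that a single value of a binary Gene Ontology feature can carry most of the information can be illustrated with a small sketch. This is not the algorithm proposed in the paper (which this record does not describe); it only compares the class entropy among genes annotated with a term against genes not annotated with it. The toy data and the names `entropy`, `per_value_entropy`, `has_term` and `expression` are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def per_value_entropy(feature, labels):
    """Map each feature value to (instance count, class entropy for that value)."""
    result = {}
    for value in sorted(set(feature)):
        subset = [y for x, y in zip(feature, labels) if x == value]
        result[value] = (len(subset), entropy(subset))
    return result


# Hypothetical toy data: 1 = gene annotated with a given GO term, 0 = not annotated.
has_term = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
expression = ["over", "over", "over", "over",
              "over", "under", "none", "under",
              "none", "over", "under", "none"]

stats = per_value_entropy(has_term, expression)
for value, (count, h) in stats.items():
    print(f"value={value}: n={count}, class entropy={h:.3f} bits")
```

In this toy example the annotated genes (value 1) all share one class, while the unannotated genes (value 0) are spread across all three classes, so a measure that scores the feature as a whole would blur a signal that comes almost entirely from one value.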

SUBMITTER: Fabris F 

PROVIDER: S-EPMC6041990 | biostudies-literature | 2018 Jul

REPOSITORIES: biostudies-literature

Publications

A new approach for interpreting Random Forest models and its application to the biology of ageing.

Fabio Fabris, Aoife Doherty, Daniel Palmer, João Pedro de Magalhães, Alex A. Freitas

Bioinformatics (Oxford, England), 2018 Jul 1, issue 14

Similar Datasets

S-EPMC7847145 | biostudies-literature
S-EPMC5851637 | biostudies-literature
S-EPMC4724119 | biostudies-literature
S-EPMC8193767 | biostudies-literature
S-EPMC6823902 | biostudies-literature
S-EPMC5751401 | biostudies-literature
S-EPMC7055778 | biostudies-literature
S-EPMC9304350 | biostudies-literature
S-EPMC8209152 | biostudies-literature
S-EPMC7096517 | biostudies-literature