Dataset Information

R.ROSETTA: an interpretable machine learning framework.

ABSTRACT:

Background

Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, e.g. in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.
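Rough set theory, on which the package is built, rests on a few standard notions. Objects that share identical values on a set of condition attributes are indiscernible. A decision class is then approximated from below (objects certainly in the class) and from above (objects possibly in the class). The Python sketch below illustrates these textbook definitions on a small decision table. The table, attribute names and function names are hypothetical, and the code does not reproduce the ROSETTA or R.ROSETTA implementation.

```python
from collections import defaultdict

# Hypothetical toy decision table: condition attributes plus a "decision" column.
TABLE = [
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "high", "geneB": "low",  "decision": "control"},
    {"geneA": "low",  "geneB": "low",  "decision": "control"},
    {"geneA": "high", "geneB": "high", "decision": "case"},
    {"geneA": "low",  "geneB": "high", "decision": "control"},
]

def indiscernibility_classes(table, attributes):
    """Group object indices that share identical values on all given attributes."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        classes[key].add(i)
    return list(classes.values())

def approximations(table, attributes, decision_value):
    """Return (lower, upper) approximations of the objects with decision_value."""
    target = {i for i, row in enumerate(table) if row["decision"] == decision_value}
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # class lies entirely inside the decision class
            lower |= cls
        if cls & target:    # class overlaps the decision class
            upper |= cls
    return lower, upper

lower, upper = approximations(TABLE, ["geneA", "geneB"], "case")
print(sorted(lower), sorted(upper), sorted(upper - lower))  # [3] [0, 1, 3] [0, 1]
```

Objects 0 and 1 have identical gene values but different decisions. They therefore fall in the boundary region (upper minus lower approximation), which is where a rough-set model expresses uncertainty.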

Results

We present the R.ROSETTA package, an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling to produce accessible and transparent results, well suited for adoption by the wider scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA . To illustrate its usage, we applied it to a transcriptome dataset from an autism case-control study. Our tool generated hypotheses about potential co-predictive mechanisms among features that discerned the phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.
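Rule-based models of this kind consist of IF-THEN rules whose conditions are conjunctions of feature values. Features that appear together in one rule's conditions are the kind of co-predictors the abstract refers to. The sketch below computes three commonly used rule statistics (support, accuracy and coverage) on a hypothetical decision table. The exact definitions, names and data are illustrative assumptions; R.ROSETTA's own rule statistics should be checked against the package documentation.

```python
from dataclasses import dataclass

# Hypothetical decision table: condition attributes plus a "decision" column.
TABLE = [
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "high", "geneB": "low",  "decision": "control"},
    {"geneA": "low",  "geneB": "low",  "decision": "control"},
    {"geneA": "high", "geneB": "high", "decision": "case"},
    {"geneA": "low",  "geneB": "high", "decision": "control"},
]

@dataclass
class Rule:
    conditions: dict  # attribute -> required value; all must hold (conjunction)
    decision: str     # decision class predicted by the rule

    def matches(self, row):
        return all(row[attr] == value for attr, value in self.conditions.items())

def rule_statistics(rule, table):
    """Return (support, accuracy, coverage) under the assumed definitions."""
    matched = [row for row in table if rule.matches(row)]
    correct = [row for row in matched if row["decision"] == rule.decision]
    class_size = sum(1 for row in table if row["decision"] == rule.decision)
    support = len(matched)                                       # objects satisfying the IF part
    accuracy = len(correct) / support if support else 0.0        # share of those in the THEN class
    coverage = len(correct) / class_size if class_size else 0.0  # share of the class the rule explains
    return support, accuracy, coverage

rules = [
    Rule({"geneA": "high", "geneB": "high"}, "case"),  # two co-predicting features
    Rule({"geneA": "high"}, "case"),
    Rule({"geneA": "low"}, "control"),
]
for r in rules:
    print(r.conditions, "=>", r.decision, rule_statistics(r, TABLE))
```

The two-feature rule is fully accurate but covers only half of the case objects. The single-feature rule covers every case object at lower accuracy. This trade-off is one reason rule statistics are reported alongside the rules themselves.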

Conclusions

R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.

SUBMITTER: Garbulowski M

PROVIDER: S-EPMC7937228 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature


Publications

R.ROSETTA: an interpretable machine learning framework.

Mateusz Garbulowski, Klev Diamanti, Karolina Smolińska, Nicholas Baltzer, Patricia Stoll, Susanne Bornelöv, Aleksander Øhrn, Lars Feuk, Jan Komorowski

BMC Bioinformatics, 2021 Mar 6, issue 1


PMID: 33676405

Similar Datasets

Project description: Objective: Postoperative red blood cell (RBC) transfusion is widely used during the perioperative period but is often associated with a high risk of infection and complications. However, prediction models for RBC transfusion in patients undergoing orthopedic surgery have not yet been developed. We aimed to identify predictors and to construct prediction models for RBC transfusion after orthopedic surgery using interpretable machine learning algorithms.

Methods: This retrospective cohort study reviewed a total of 59,605 patients who underwent orthopedic surgery between June 2013 and January 2019 across 7 tertiary hospitals in China. Patients were randomly split into training (80%) and test (20%) subsets. Recursive feature elimination (RFE) was used to identify an optimal feature subset from thirty preoperative variables, and six machine learning algorithms were applied to develop prediction models. Shapley Additive exPlanations (SHAP) values were used to evaluate the contribution of each predictor to the prediction of postoperative RBC transfusion. To simplify clinical use, a risk score system was further established using the top risk factors identified by the machine learning models.

Results: Of the 59,605 patients who underwent orthopedic surgery, 19,921 (33.40%) received postoperative RBC transfusion. The CatBoost model achieved an AUC of 0.831 (95% CI: 0.824-0.836) on the test subset and significantly outperformed the five other prediction models. The risk of RBC transfusion was associated with older age (>60 years) and low RBC count (<4.0 × 10¹²/L), with clear threshold effects. Extremes of BMI, low albumin, prolonged activated partial thromboplastin time, and repair and plastic operations on joint structures were additional top predictors of RBC transfusion. The risk score system derived from six risk factors performed well, with an AUC of 0.801 (95% CI: 0.794-0.807) on the test subset.

Conclusion: By applying an interpretable machine learning framework to a large-scale multicenter retrospective cohort, we identified novel modifiable risk factors and developed prediction models with good performance for postoperative RBC transfusion in patients undergoing orthopedic surgery. Our findings may allow more precise identification of high-risk patients, enabling optimal control of risk factors and personalized RBC transfusion for orthopedic patients.
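The AUC values reported above summarize how well a model's scores rank patients who received a transfusion above those who did not. The minimal Python sketch below assumes binary labels (1 = event, 0 = no event) and one score per patient. It computes the AUC as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting half. The percentile-bootstrap confidence interval is one common approach and is an assumption here, since the description does not state how its 95% CIs were obtained. All function names and data are hypothetical.

```python
import random

def auc(labels, scores):
    """Mann-Whitney AUC: P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUC (an assumed, illustrative method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical test-set labels and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.7, 0.5, 0.1, 0.3, 0.2, 0.8]
print(auc(labels, scores))  # 0.9375
print(bootstrap_auc_ci(labels, scores))
```

The pairwise loop is quadratic in the number of patients. It is used here for readability; for a cohort of tens of thousands, a rank-based formula or a library routine would be the practical choice.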


