Dataset Information

Transformational machine learning: Learning how to learn from many related scientific problems.

ABSTRACT: Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our ∼50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability principles (∼100 Gbytes).
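The transformation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' published code: the function names (`knn_predict`, `to_extrinsic`, `make_task`), the synthetic data, and the choice of a plain k-nearest-neighbors regressor for both the related-task models and the new-task model are all assumptions made to keep the example self-contained.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training examples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def to_extrinsic(related_tasks, example, k=3):
    """Represent one example by the predictions of every related-task model."""
    return [knn_predict(xs, ys, example, k) for xs, ys in related_tasks]


def make_task(weights, n):
    """Hypothetical synthetic task: a weighted sum of 4 intrinsic features."""
    xs = [[random.random() for _ in range(len(weights))] for _ in range(n)]
    ys = [sum(w * v for w, v in zip(weights, x)) for x in xs]
    return xs, ys


random.seed(0)

# Models for the related tasks (for k-NN, the stored data is the model).
related_tasks = [
    make_task(w, n=40)
    for w in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1])
]

# A new task described with the same intrinsic features.
new_x, new_y = make_task([1, 1, 0, 0], n=20)

# Transform intrinsic features into extrinsic features: one column per
# related-task model, holding that model's prediction for the example.
new_x_extrinsic = [to_extrinsic(related_tasks, x) for x in new_x]

# Learn the new task on the extrinsic representation and predict a query.
query = [0.5, 0.5, 0.5, 0.5]
prediction = knn_predict(new_x_extrinsic, new_y, to_extrinsic(related_tasks, query))
print(len(new_x_extrinsic[0]), prediction)
```

Each example of the new task ends up with one extrinsic feature per related task, so the width of the new representation depends on how many related models are available rather than on the number of intrinsic features.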

SUBMITTER: Olier I

PROVIDER: S-EPMC8670494 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature

Publications

Transformational machine learning: Learning how to learn from many related scientific problems.

Olier I, Orhobor OI, Dash T, Davis AM, Soldatova LN, Vanschoren J, King RD

Proceedings of the National Academy of Sciences of the United States of America, 2021 Dec 1, issue 49

PMID: 34845013

Similar Datasets

Machine learning on quantum experimental data toward solving quantum many-body problems.
| S-EPMC11364692 | biostudies-literature

Machine learning and big scientific data.
| S-EPMC7015290 | biostudies-literature

A Bayesian machine scientist to aid in the solution of challenging scientific problems.
| S-EPMC6994216 | biostudies-literature

Designing a multilayer film via machine learning of scientific literature.
| S-EPMC8766440 | biostudies-literature

MLcps: machine learning cumulative performance score for classification problems.
| S-EPMC10716825 | biostudies-literature

Distillation of crop models to learn plant physiology theories using machine learning.
| S-EPMC6541271 | biostudies-literature

Learning to learn: theta oscillations predict new learning, which enhances related learning and neurogenesis.
| S-EPMC3277498 | biostudies-literature

Uncovering heterogeneous associations of disaster-related traumatic experiences with subsequent mental health problems: A machine learning approach.
| S-EPMC9102396 | biostudies-literature

Machine learning approach to identify adverse events in scientific biomedical literature.
| S-EPMC9199879 | biostudies-literature

Predicting mental health problems in adolescence using machine learning techniques.
| S-EPMC7135284 | biostudies-literature