Dataset Information

A community-powered search of machine learning strategy space to find NMR property prediction models


ABSTRACT: The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML to a particular domain, it can be difficult to decide in advance which strategy to adopt from a vast space of possibilities. Here we outline the results of an online, community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition that received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with accuracy comparable to our best previously published 'in-house' efforts. A meta-ensemble model, constructed as a linear combination of the top predictions, achieves a prediction accuracy that exceeds that of any individual model and is 7 to 19 times better than our previous state of the art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
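The abstract describes the meta-ensemble only as "a linear combination of the top predictions"; it does not say how the weights were chosen. The sketch below is a minimal illustration under one assumption: weights are fitted by ordinary least squares (no intercept) on one split of data and evaluated on a held-out split. The function names `fit_linear_ensemble` and `ensemble_predict` are hypothetical, and the three "models" are synthetic noisy copies of a target, not competition data.

```python
import numpy as np


def fit_linear_ensemble(preds: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Hypothetical helper: least-squares weights w minimising ||preds @ w - target||^2.

    preds  -- shape (n_samples, n_models), one column per model's predictions
    target -- shape (n_samples,), reference values
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def ensemble_predict(preds: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical helper: combine model predictions with the fitted weights."""
    return preds @ weights


# Synthetic demo: three "models" whose errors are independent Gaussian noise.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
preds = np.column_stack(
    [y + rng.normal(scale=s, size=y.size) for s in (0.3, 0.4, 0.5)]
)

# Fit the weights on the first 800 samples, evaluate on the remaining 200.
train, test = slice(0, 800), slice(800, None)
w = fit_linear_ensemble(preds[train], y[train])
ens = ensemble_predict(preds[test], w)

mae_models = np.abs(preds[test] - y[test, None]).mean(axis=0)
mae_ens = np.abs(ens - y[test]).mean()
print("per-model MAE:", mae_models)
print("ensemble MAE: ", mae_ens)
```

Because the individual models' errors are uncorrelated here, the weighted combination averages part of the noise away and its held-out error falls below that of the best single model. Real competition entries have correlated errors, so the gain is usually smaller, and the weights should be fitted on predictions the models did not train on.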

SUBMITTER: Bratholm LA 

PROVIDER: S-EPMC8291653 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8587315 | biostudies-literature
| S-EPMC9579493 | biostudies-literature
| S-EPMC10938832 | biostudies-literature
| S-EPMC11233499 | biostudies-literature
| S-EPMC7952555 | biostudies-literature
| S-EPMC3340027 | biostudies-literature
| S-EPMC7351018 | biostudies-literature
2021-03-26 | PXD022280 | Pride
| S-EPMC8214147 | biostudies-literature
| S-EPMC4531788 | biostudies-literature