Dataset Information

Data-efficient machine learning for molecular crystal structure prediction.

ABSTRACT: The combination of modern machine learning (ML) approaches with high-quality data from quantum mechanical (QM) calculations can yield models with an unrivalled accuracy/cost ratio. However, such methods are ultimately limited by the computational effort required to produce the reference data. In particular, reference calculations for periodic systems with many atoms can become prohibitively expensive for higher levels of theory. This trade-off is critical in the context of organic crystal structure prediction (CSP). Here, a data-efficient ML approach would be highly desirable, since screening a huge space of possible polymorphs in a narrow energy range requires the assessment of a large number of trial structures with high accuracy. In this contribution, we present tailored Δ-ML models that allow screening a wide range of crystal candidates while adequately describing the subtle interplay between intermolecular interactions such as H-bonding and many-body dispersion effects. This is achieved by enhancing a physics-based description of long-range interactions at the density functional tight binding (DFTB) level, for which an efficient implementation is available, with a short-range ML model trained on high-quality first-principles reference data. The presented workflow is broadly applicable to different molecular materials, without the need for a single periodic calculation at the reference level of theory. We show that this even allows the use of wavefunction methods in CSP.
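The Δ-ML idea summarized above (a cheap physics-based baseline plus a learned short-range correction fitted to the difference between reference and baseline energies) can be illustrated with a minimal sketch. Everything in it is a hypothetical toy, not the authors' method: `baseline_energy` stands in for DFTB with a pairwise -1/r^6 sum, each structure is reduced to a list of interatomic distances, `CUTOFF` is an arbitrary value, and a two-feature linear least-squares fit replaces the actual ML model.

```python
import math

# Hypothetical short-range cutoff radius (arbitrary units); not a value from the paper.
CUTOFF = 3.0


def baseline_energy(distances):
    """Toy stand-in for the cheap physics-based baseline (DFTB in the paper).

    Illustration only: a pairwise -1/r^6 dispersion-like sum.
    """
    return sum(-1.0 / r**6 for r in distances)


def short_range_features(distances, cutoff=CUTOFF):
    """Two hypothetical radial descriptors per structure, smoothly damped to zero at `cutoff`."""
    f0 = f1 = 0.0
    for r in distances:
        if r < cutoff:
            fc = 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)  # smooth cosine cutoff
            f0 += fc * math.exp(-r)
            f1 += fc * math.exp(-2.0 * r)
    return (f0, f1)


def fit_delta(structures, ref_energies):
    """Ordinary least-squares fit of the correction Δ = E_ref - E_baseline on the two features."""
    s00 = s01 = s11 = b0 = b1 = 0.0
    for dists, e_ref in zip(structures, ref_energies):
        f0, f1 = short_range_features(dists)
        delta = e_ref - baseline_energy(dists)  # the model only learns this residual
        s00 += f0 * f0
        s01 += f0 * f1
        s11 += f1 * f1
        b0 += f0 * delta
        b1 += f1 * delta
    # Solve the 2x2 normal equations [[s00, s01], [s01, s11]] w = [b0, b1] by Cramer's rule.
    det = s00 * s11 - s01 * s01
    if abs(det) < 1e-14:
        raise ValueError("features are linearly dependent; add more diverse structures")
    w0 = (b0 * s11 - s01 * b1) / det
    w1 = (s00 * b1 - s01 * b0) / det
    return (w0, w1)


def predict(distances, weights):
    """Δ-ML prediction: baseline energy plus the learned short-range correction."""
    f0, f1 = short_range_features(distances)
    return baseline_energy(distances) + weights[0] * f0 + weights[1] * f1
```

Because the descriptors vanish beyond `CUTOFF`, the learned correction acts only at short range and `predict` reduces to the baseline for widely separated pairs. This mirrors, in a very simplified form, the division of labor the abstract describes: long-range physics from the baseline, short-range corrections from data.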

SUBMITTER: Wengert S

PROVIDER: S-EPMC8179468 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

Data-efficient machine learning for molecular crystal structure prediction.

Simon Wengert, Gábor Csányi, Karsten Reuter, Johannes T. Margraf

Chemical Science, 2021-02-11, vol. 12

PMID: 34163719

Similar Datasets

A Hybrid Machine Learning Approach for Structure Stability Prediction in Molecular Co-crystal Screenings.
| S-EPMC9281391 | biostudies-literature

Geometric Deep Learning for Molecular Crystal Structure Prediction.
| S-EPMC10373482 | biostudies-literature

Machine-Learned Potentials by Active Learning from Organic Crystal Structure Prediction Landscapes.
| S-EPMC10860135 | biostudies-literature

Structure prediction of cyclic peptides by molecular dynamics + machine learning.
| S-EPMC8597836 | biostudies-literature

Prediction of Breast Cancer Estrogen Receptor Status using Machine Learning
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress

Machine learning for RNA 2D structure prediction benchmarked on experimental data.
| S-EPMC10199776 | biostudies-literature

Efficient clinical data analysis for prediction of coal workers' pneumoconiosis using machine learning algorithms.
| S-EPMC10363790 | biostudies-literature

Algal community structure prediction by machine learning.
| S-EPMC9923192 | biostudies-literature

Machine learning/molecular dynamic protein structure prediction approach to investigate the protein conformational ensemble.
| S-EPMC9200820 | biostudies-literature

Machine learning a model for RNA structure prediction.
| S-EPMC7671377 | biostudies-literature