Dataset Information

Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex.

ABSTRACT: Homogeneous catalysis using transition metal complexes is ubiquitous in organic synthesis and technologically relevant in applications such as water splitting and CO₂ reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans-[Ir(PPh₃)₂(CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride complex cis-[Ir(H)₂(PPh₃)₂(CO)(Cl)]. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H₂ from a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H₂-activation barrier. In contrast to experiments and calculations requiring several days to complete, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol⁻¹, depending on the size of the training set. Accuracy was further enhanced by using a Gaussian process (GP) model trained with a set of selected features, including fingerprints. Remarkably, this GP model reduced the MAE to below 1 kcal mol⁻¹ while using only 20% or less of the available data for training. Gradient boosting (GB) was also used to assess the relevance of the features, which served both feature selection and model interpretation. Features accounting for chemical composition, atom size and electronegativity were found to be the most influential in the predictions. Further, the ligand fragments with the strongest influence on the H₂-activation barrier were identified.
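The abstract mentions features derived from autocorrelation and deltametric functions but does not define them. As a rough illustration only, the sketch below computes metal-centered descriptors on a molecular graph: for each bond distance d from the metal, the "autocorrelation" sums the product of an atomic property on the metal and on each atom at that distance, and the "deltametric" sums the difference. These definitions, the function names, the choice of Pauling electronegativity as the property, and the toy graph (Vaska's complex with each PPh₃ reduced to a bare P atom) are all assumptions made for this example, not the authors' implementation.

```python
from collections import deque

# Pauling electronegativities for the elements in this toy example (assumed property).
ELECTRONEGATIVITY = {"Ir": 2.20, "Cl": 3.16, "C": 2.55, "O": 3.44, "P": 2.19}

# Hypothetical, truncated graph of Vaska's complex: each PPh3 is reduced to a bare P.
ELEMENTS = ["Ir", "Cl", "C", "O", "P", "P"]
BONDS = [(0, 1), (0, 2), (2, 3), (0, 4), (0, 5)]


def graph_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def metal_centered_features(elements, bonds, metal, prop, max_depth):
    """Return (autocorrelation, deltametric) lists indexed by bond distance 0..max_depth."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    dist = graph_distances(adjacency, metal)
    p_metal = prop[elements[metal]]
    auto = [0.0] * (max_depth + 1)
    delta = [0.0] * (max_depth + 1)
    for atom, d in dist.items():
        if d <= max_depth:
            p_atom = prop[elements[atom]]
            auto[d] += p_metal * p_atom   # product-based term
            delta[d] += p_metal - p_atom  # difference-based term
    return auto, delta


if __name__ == "__main__":
    auto, delta = metal_centered_features(ELEMENTS, BONDS, 0, ELECTRONEGATIVITY, 2)
    print("autocorrelation:", auto)
    print("deltametric:", delta)
```

Each complex then maps to a fixed-length vector (one entry per property and depth), which is the kind of input a neural network, Gaussian process, or gradient boosting regressor can consume. A real feature set would use the full ligands and several atomic properties.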

SUBMITTER: Friederich P

PROVIDER: S-EPMC7659707 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex.

Friederich P, Dos Passos Gomes G, De Bin R, Aspuru-Guzik A, Balcells D

Chemical Science, 2020-04-07, issue 18

PMID: 33224459

Similar Datasets

Machine learning in chemical reaction space.
| S-EPMC7603480 | biostudies-literature

Machine Learning Informs RNA-Binding Chemical Space.
| S-EPMC9992102 | biostudies-literature

Comprehensive machine learning based study of the chemical space of herbicides.
| S-EPMC8169684 | biostudies-literature

Exploring the Chemical Space of CYP17A1 Inhibitors Using Cheminformatics and Machine Learning.
| S-EPMC9966999 | biostudies-literature

Exploring the chemical space of protein-protein interaction inhibitors through machine learning.
| S-EPMC8238997 | biostudies-literature

A review on machine learning algorithms for the ionic liquid chemical space.
| S-EPMC8153233 | biostudies-literature

Machine learning meets complex networks via coalescent embedding in the hyperbolic space.
| S-EPMC5694768 | biostudies-literature

Predicting cell-penetrating peptides using machine learning algorithms and navigating in their chemical space.
| S-EPMC8027643 | biostudies-literature

Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning.
| S-EPMC7600738 | biostudies-literature

Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space.
| S-EPMC4476293 | biostudies-literature