Project description:We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.
Project description:Considering recent advancements and successes in the development of efficient quantum algorithms for electronic structure calculations, alongside impressive results using machine learning techniques for computation, hybridizing quantum computing with machine learning to perform electronic structure calculations is a natural progression. Here we report a hybrid quantum algorithm employing a restricted Boltzmann machine to obtain accurate molecular potential energy surfaces. By exploiting a quantum algorithm to help optimize the underlying objective function, we obtained an efficient procedure for the calculation of the electronic ground state energy for a small molecule system. Our approach achieves high accuracy for the ground state energy of H2, LiH, and H2O, each at a specific location on its potential energy surface, with a finite basis set. With the future availability of larger-scale quantum computers, quantum machine learning techniques are set to become powerful tools to obtain accurate values for electronic structures.
Project description:Predicting electronic energies, densities, and related chemical properties can facilitate the discovery of novel catalysts, medicines, and battery materials. However, existing machine learning techniques are challenged by the scarcity of training data when exploring unknown chemical spaces. We overcome this barrier by systematically incorporating knowledge of molecular electronic structure into deep learning. By developing a physics-inspired equivariant neural network, we introduce a method to learn molecular representations based on the electronic interactions among atomic orbitals. Our method, OrbNet-Equi, leverages efficient tight-binding simulations and learned mappings to recover high-fidelity physical quantities. OrbNet-Equi accurately models a wide spectrum of target properties while being several orders of magnitude faster than density functional theory. Despite only using training samples collected from readily available small-molecule libraries, OrbNet-Equi outperforms traditional semiempirical and machine learning-based methods on comprehensive downstream benchmarks that encompass diverse main-group chemical processes. Our method also describes interactions in challenging charge-transfer complexes and open-shell systems. We anticipate that the strategy presented here will help to expand opportunities for studies in chemistry and materials science, where the acquisition of experimental or reference training data is costly.
Project description:Two different types of approaches were used to predict the logP coefficients of 11 molecules as part of the SAMPL6 logP blind prediction challenge: (a) approaches that combine quantitative structure-activity relationships, quantum mechanical electronic structure methods, and machine learning, and (b) electronic structure vertical solvation approaches. Using electronic structures optimized with density functional theory (DFT), several molecular descriptors were calculated for each molecule, including van der Waals areas and volumes, HOMO/LUMO energies, dipole moments, polarizabilities, and electrophilic and nucleophilic superdelocalizabilities. A multilinear regression model and a partial least squares model were trained on a set of 97 molecules. In addition, descriptors were generated using the Molecular Operating Environment (MOE) and used to create additional machine learning models. The electronic structure vertical solvation approaches considered include DFT and the domain-based local pair natural orbital methods combined with the solvated variant of the correlation consistent composite approach.
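A multilinear regression of logP on molecular descriptors can be sketched as below. This is a minimal standard-library illustration under stated assumptions: the function names are hypothetical, the two descriptors and the toy targets are synthetic (generated from a known linear relation), and the fit uses ordinary least squares via the normal equations, which may differ from the procedure actually used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_multilinear(descriptors, targets):
    """Least-squares fit: target ~ intercept + sum_j coef_j * descriptor_j."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, descriptor):
    """Evaluate the fitted linear model for one molecule's descriptors."""
    return coefficients[0] + sum(c * d for c, d in zip(coefficients[1:], descriptor))


# Synthetic data: two hypothetical descriptors per molecule, targets built
# from logP = 0.5 + 2.0 * d1 - 1.0 * d2 so the fit can be checked exactly.
toy_descriptors = [(1.0, 0.5), (2.0, 1.5), (3.0, 0.2), (4.0, 2.0), (0.5, 1.0)]
toy_logp = [0.5 + 2.0 * a - 1.0 * b for a, b in toy_descriptors]
coefficients = fit_multilinear(toy_descriptors, toy_logp)
```

With many correlated descriptors the normal equations become ill-conditioned, which is one reason a partial least squares model is a common companion to plain multilinear regression.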
Project description:The use of data science tools to reveal non-trivial chemical features for catalyst design is an important goal in catalysis science. Additionally, there is currently no general strategy for computational design of homogeneous molecular catalysts. Here, we report the unique combination of an experimentally verified DFT-transition-state model with a random forest machine learning model in a campaign to design new molecular Cr phosphine imine (Cr(P,N)) catalysts for selective ethylene oligomerization, specifically to increase 1-octene selectivity. This involved the calculation of 1-hexene : 1-octene transition-state selectivity for 105 (P,N) ligands and the harvesting of 14 descriptors, which were then used to build a random forest regression model. This model revealed several key design features, such as Cr-N distance, Cr-α distance, and Cr distance out of pocket, which were then used to rapidly design a new generation of Cr(P,N) catalyst ligands that are predicted to give >95% selectivity for 1-octene.
Project description:Simulating the nuclear-electronic quantum dynamics of large-scale molecular systems in the condensed phase is key for studying biologically and chemically important processes such as proton transfer and proton-coupled electron transfer reactions. Herein, the real-time nuclear-electronic orbital time-dependent density functional theory (RT-NEO-TDDFT) approach is combined with a hybrid quantum mechanical/molecular mechanical (QM/MM) strategy to enable the accurate description of coupled nuclear-electronic quantum dynamics in the presence of heterogeneous environments such as solvent or proteins. The densities of the electrons and quantum protons are propagated in real time, while the other nuclei are propagated classically on the instantaneous electron-proton vibronic surface. This approach is applied to phenol bound to lysozyme, intramolecular proton transfer in malonaldehyde, and nonequilibrium excited-state intramolecular proton transfer in o-hydroxybenzaldehyde. These examples illustrate that the RT-NEO-TDDFT framework, coupled with an atomistic representation of the environment, allows the simulation of condensed-phase systems that exhibit significant nuclear quantum effects.
Project description:Context: Rotation about a chemical bond is important in many chemical processes and can be influenced by neighboring substituents on a molecule. Rotational energy barriers can be predicted by density functional theory (DFT) calculations. Here, we specifically explore how substituents influence the barrier to rotation about the C-O bond in symmetrically halogenated aromatic alcohols. A machine learning model was trained on the DFT-calculated rotational energies and was found to predict rotational energy barriers well from the electronegativity, atomic radius, and Hammett constant for each substituent. The machine learning model was found to perform better when it was trained separately on pyrenols, anthranols, or phenols than when it was trained on all classes of compounds together. Even though the models were trained on compounds containing only one kind of substituent, they were found to perform similarly well on compounds containing mixed substituents. Machine learning was able to predict the rotational energy barrier heights better than correlations among parameters that would be expected to be relevant based on chemical intuition. Methods: DFT calculations were done with Gaussian 16 software at the B3LYP/6-311+G(d,p) level of theory. Machine learning was done using the classification and regression training (caret) package in R version 4.4.0.
Project description:Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is characterized by a high level of sugar in the blood over a prolonged period. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney diseases, foot ulcers, retinopathy, and others. Although common, this disease is difficult to detect, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long a patient has had diabetes can have a strong impact on the treatment they can receive. This information, although pivotal, might be absent: for some patients, the year when they received the diabetes diagnosis might be well known, but the year of disease onset might be unknown. In this context, machine learning applied to electronic health records can be an effective tool to predict the past duration of diabetes for a patient. In this study, we applied a regression analysis based on several computational intelligence methods to a dataset of electronic health records of 73 patients with type 1 diabetes with 20 variables and another dataset of records of 400 patients with type 2 diabetes with 49 variables. Among the algorithms applied, Random Forests outperformed the others and efficiently predicted diabetes duration for both cohorts, with the regression performance measured through the coefficient of determination (R²). Afterwards, we applied the same method for feature ranking and identified the factors in the clinical records most strongly correlated with past diabetes duration: age, insulin intake, and body-mass index. Our findings can have a profound impact on clinical practice: when information about the duration of a patient's diabetes is missing, medical doctors can use our tool and focus on age, insulin intake, and body-mass index to infer this important aspect.
Regarding limitations, unfortunately we were unable to find an additional dataset of EHRs of patients with diabetes having the same variables as the two analyzed here, so we could not verify our findings on a validation cohort.
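The coefficient of determination used to score the regressions can be computed as in the sketch below. It assumes the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares; the study may have used a library implementation instead. The function name and the duration values are hypothetical and for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up diabetes durations in years, for illustration only.
observed_years = [2.0, 4.0, 6.0, 8.0]
predicted_years = [3.0, 4.0, 6.0, 7.0]

score = r_squared(observed_years, predicted_years)
# A model that always predicts the mean duration scores exactly zero.
baseline_score = r_squared(observed_years, [5.0, 5.0, 5.0, 5.0])
```

A value of 1 means perfect prediction, 0 means no better than always predicting the mean, and negative values mean worse than that baseline.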
Project description:Objectives: To assess the accuracy of machine learning models in predicting kidney stone composition using variables extracted from the electronic health record (EHR). Materials and Methods: We identified kidney stone patients (n = 1296) with both stone composition and 24-hour (24H) urine testing. We trained machine learning models (XGBoost [XG] and logistic regression [LR]) to predict stone composition using 24H urine data and EHR-derived demographic and comorbidity data. Models predicted either binary (calcium vs noncalcium stone) or multiclass (calcium oxalate, uric acid, hydroxyapatite, or other) stone types. We evaluated performance using area under the receiver operating characteristic curve (ROC-AUC) and accuracy and identified predictors for each task. Results: For discriminating binary stone composition, XG outperformed LR with higher accuracy (91% vs 71%) with ROC-AUC of 0.80 for both models. Top predictors used by these models were supersaturations of uric acid and calcium phosphate, and urinary ammonium. For multiclass classification, LR outperformed XG with higher accuracy (0.64 vs 0.56) and ROC-AUC (0.79 vs 0.59), and urine pH had the highest predictive utility. Overall, 24H urine analyte data contributed more to the models' predictions of stone composition than EHR-derived variables. Conclusion: Machine learning models can predict calcium stone composition. LR outperforms XG in multiclass stone classification. Demographic and comorbidity data are predictive of stone composition; however, including 24H urine data improves performance. Further optimization of performance could lead to earlier directed medical therapy for kidney stone patients.
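The two evaluation metrics for the binary task can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions. ROC-AUC is computed here through its pairwise interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = calcium stone, 0 = noncalcium stone.
toy_labels = [1, 1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]  # hypothetical model outputs

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold and on class balance, whereas ROC-AUC depends only on how the scores rank the cases, so two models can share the same ROC-AUC while differing in accuracy.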
Project description:Quantum machine learning is often highlighted as one of the most promising practical applications for which quantum computers could provide a computational advantage. However, a major obstacle to the widespread use of quantum machine learning models in practice is that these models, even once trained, still require access to a quantum computer in order to be evaluated on new data. To solve this issue, we introduce a class of quantum models where quantum resources are only required during training, while the deployment of the trained model is classical. Specifically, the training phase of our models ends with the generation of a 'shadow model' from which the classical deployment becomes possible. We prove that: (i) this class of models is universal for classically-deployed quantum machine learning; (ii) it does have restricted learning capacities compared to 'fully quantum' models, but nonetheless (iii) it achieves a provable learning advantage over fully classical learners, contingent on widely believed assumptions in complexity theory. These results provide compelling evidence that quantum machine learning can confer learning advantages across a substantially broader range of scenarios, where quantum computers are exclusively employed during the training phase. By enabling classical deployment, our approach facilitates the implementation of quantum machine learning models in various practical contexts.