Dataset Information

Bayesian molecular design with a chemical language model.

ABSTRACT: The aim of computational molecular design is to identify promising hypothetical molecules with a predefined set of desired properties. We address the problem of accelerating material discovery with state-of-the-art machine learning techniques. The method involves two types of prediction: forward and backward. The objective of the forward prediction is to create a set of machine learning models that predict various properties of a given molecule. Inverting the trained forward models through Bayes' law, we derive a posterior distribution for the backward prediction, which is conditioned on a desired property requirement. By exploring high-probability regions of the posterior with a sequential Monte Carlo technique, we can computationally create molecules that exhibit the desired properties. One major difficulty in the computational creation of molecules is excluding chemically unfavorable structures. To circumvent this issue, we derive a chemical language model that acquires commonly occurring patterns of chemical fragments through natural language processing of ASCII strings of existing compounds, which follow the SMILES chemical language notation. In the backward prediction, the trained language model is used to refine chemical strings such that the properties of the resulting structures fall within the desired property region while chemically unfavorable structures are removed. The method is demonstrated through the design of small organic molecules with property requirements on the HOMO-LUMO gap and internal energy. The R package iqspr is available at the CRAN repository.
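The backward prediction described in the abstract can be read as sampling from a posterior proportional to (forward-model likelihood of the desired property) x (language-model prior over strings). The sketch below is a toy illustration of that idea only. It is not the paper's algorithm and not the iqspr API. Everything in it is an assumption made for illustration: the tiny training list, the character-bigram prior, the `toy_property` forward model (a count of O and N characters, not a real chemical property), and the simplified resample-and-regrow loop standing in for sequential Monte Carlo. The strings are not checked for SMILES validity.

```python
import math
import random
from collections import Counter, defaultdict

# Toy stand-in for "existing compounds" (illustrative strings only).
TRAINING = ["CCO", "CCN", "CCCO", "CCOC", "CNC", "CCCC", "OCCO"]
START, END = "^", "$"


def train_bigram(strings):
    """Count character bigrams, including start and end markers."""
    counts = defaultdict(Counter)
    for s in strings:
        padded = START + s + END
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts


def sample_suffix(counts, prefix, rng, max_len=12):
    """Extend a prefix one character at a time using the bigram prior."""
    out = prefix
    prev = out[-1] if out else START
    while len(out) < max_len:
        chars, weights = zip(*counts[prev].items())
        nxt = rng.choices(chars, weights=weights)[0]
        if nxt == END:
            break
        out += nxt
        prev = nxt
    return out


def toy_property(s):
    """Hypothetical forward model: number of O and N characters."""
    return sum(ch in "ON" for ch in s)


def log_likelihood(s, target, sigma=0.5):
    """Gaussian log-likelihood of hitting the target property value."""
    return -0.5 * ((toy_property(s) - target) / sigma) ** 2


def resample(particles, target, rng):
    """Resample particles in proportion to their likelihood weights."""
    logw = [log_likelihood(s, target) for s in particles]
    top = max(logw)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in logw]
    return rng.choices(particles, weights=weights, k=len(particles))


def design(target, n_particles=200, n_steps=10, seed=0):
    """Resample toward the target, then regrow string tails from the prior."""
    rng = random.Random(seed)
    lm = train_bigram(TRAINING)
    particles = [sample_suffix(lm, "", rng) for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = resample(particles, target, rng)
        # Keep a random-length head of each string and regrow the tail.
        particles = [
            sample_suffix(lm, s[: rng.randint(0, len(s))], rng)
            for s in particles
        ]
    return resample(particles, target, rng)


if __name__ == "__main__":
    designed = design(target=2)
    print(Counter(designed).most_common(5))
```

Because new tails are always drawn from the bigram prior, every candidate uses only character transitions seen in the training strings. This is a rough analogue of how a language model steers generation away from unfavorable structures, while the likelihood weights pull the population toward the target property.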

SUBMITTER: Ikebata H

PROVIDER: S-EPMC5393296 | biostudies-literature | 2017 Apr

REPOSITORIES: biostudies-literature


Publications

Bayesian molecular design with a chemical language model.

Ikebata H, Hongo K, Isomura T, Maezono R, Yoshida R

Journal of Computer-Aided Molecular Design, 2017-03-09, issue 4


PMID: 28281211

Similar Datasets

Project description: BACKGROUND: The basket trial evaluates the treatment effect of a targeted therapy in patients with the same genetic or molecular aberration, regardless of their cancer types. Bayesian hierarchical modeling has been proposed to adaptively borrow information across cancer types to improve the statistical power of basket trials. Although conceptually attractive, research has shown that Bayesian hierarchical models cannot appropriately determine the degree of information borrowing and may lead to substantially inflated type I error rates. METHODS: We propose a novel calibrated Bayesian hierarchical model approach to evaluate the treatment effect in basket trials. In our approach, the shrinkage parameter that controls information borrowing is not regarded as an unknown parameter. Instead, it is defined as a function of a similarity measure of the treatment effect across tumor subgroups. The key is that the function is calibrated using simulation such that information is strongly borrowed across subgroups if their treatment effects are similar and barely borrowed if the treatment effects are heterogeneous. RESULTS: The simulation study shows that our method has substantially better controlled type I error rates than the Bayesian hierarchical model. In some scenarios, for example, when the true response rate is between the null and alternative, the type I error rate of the proposed method can be inflated from 10% up to 20%, but is still better than that of the Bayesian hierarchical model. LIMITATION: The proposed design assumes a binary endpoint. Extension of the proposed design to ordinal and time-to-event endpoints is worthy of further investigation. CONCLUSION: The calibrated Bayesian hierarchical model provides a practical approach to design basket trials with more flexibility and better controlled type I error rates than the Bayesian hierarchical model.
The software for implementing the proposed design is available at http://odin.mdacc.tmc.edu/~yyuan/index_code.html.

