ABSTRACT:
SUBMITTER: Arus-Pous J
PROVIDER: S-EPMC6419837 | biostudies-literature | 2019 Mar
REPOSITORIES: biostudies-literature
Josep Arús-Pous, Thomas Blaschke, Silas Ulander, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Journal of Cheminformatics, 2019 Mar 12 (issue 1)
Recent applications of recurrent neural networks (RNNs) enable training models that sample chemical space. In this study we train RNNs on molecular string representations (SMILES) taken from a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained on 1 million structures (0.1% of the database) reproduces 68.9% of the entire database when 2 billion molecules are sampled after training. We also developed a method to assess the quality of the training process u ...
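
As a rough aid to reading the figures in the abstract, the sketch below checks the 0.1% training fraction and estimates what coverage a hypothetical, perfectly uniform sampler would reach with 2 billion draws. The uniform-sampler baseline and the function name `expected_coverage` are assumptions introduced here for illustration; they are not taken from the record above, and the rounded database size of 975 million is used as given.

```python
import math


def expected_coverage(database_size: int, num_samples: int) -> float:
    """Expected fraction of distinct entries drawn at least once when
    sampling uniformly with replacement (illustrative baseline only)."""
    # Probability that one given entry is never drawn: (1 - 1/N)^k.
    # log1p/expm1 keep the arithmetic accurate for very large N.
    log_miss = num_samples * math.log1p(-1.0 / database_size)
    return -math.expm1(log_miss)


GDB13_SIZE = 975_000_000       # rounded size quoted in the abstract
TRAINING_SET = 1_000_000       # structures used for training
SAMPLES = 2_000_000_000        # molecules sampled from the trained model

training_fraction = TRAINING_SET / GDB13_SIZE
ideal = expected_coverage(GDB13_SIZE, SAMPLES)

print(f"training fraction: {training_fraction:.2%}")   # 0.10%
print(f"uniform-sampler coverage: {ideal:.1%}")        # 87.1%
```

Under these assumptions, the reported 68.9% can be read against an upper bound of roughly 87% for the same number of samples, since sampling with replacement inevitably repeats molecules.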