
Dataset Information

SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules.


ABSTRACT: We have made available a database of over 1 billion compounds predicted to be easily synthesizable, called the Synthetically Accessible Virtual Inventory (SAVI). The compounds were generated by a set of transforms written in an adapted and extended version of the CHMTRN/PATRAN programming languages, which encode expert knowledge of chemical synthesis and originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks (enamine.net). Only single-step, two-reactant syntheses were calculated for this database, even though the technology can execute multi-step reactions. The ability to incorporate scoring systems in CHMTRN allowed us to subdivide the full database of 1.75 billion compounds into sets according to predicted synthesizability, with the most-synthesizable class comprising 1.09 billion synthetic products. Properties calculated for all SAVI products indicate that the database should be well suited for drug discovery. It is publicly available for free download from https://doi.org/10.35115/37n9-5738.
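
To put the figures quoted in the abstract in proportion, the short Python sketch below computes the share of products in the most-synthesizable class and compares the database size with a naive count of reactant pairings. The function names are hypothetical, and the pairing count rests on an assumption not made in the source: that every unordered pair of distinct building blocks could be tried with every transform. It is a rough illustration only and does not describe how CACTVS or the CHMTRN transforms actually select reactants.

```python
# Approximate figures as quoted in the abstract.
TOTAL_PRODUCTS = 1.75e9        # all SAVI products
TOP_CLASS_PRODUCTS = 1.09e9    # most-synthesizable class
NUM_TRANSFORMS = 53
NUM_BUILDING_BLOCKS = 150_000  # "about 150,000"


def top_class_fraction(top: float = TOP_CLASS_PRODUCTS,
                       total: float = TOTAL_PRODUCTS) -> float:
    """Share of all products that fall in the most-synthesizable class."""
    return top / total


def naive_pairing_count(blocks: int = NUM_BUILDING_BLOCKS,
                        transforms: int = NUM_TRANSFORMS) -> int:
    """Illustrative assumption (not from the source): every unordered pair of
    distinct building blocks is tried once with every transform."""
    return transforms * blocks * (blocks - 1) // 2


if __name__ == "__main__":
    print(f"Most-synthesizable share: {top_class_fraction():.1%}")
    pairings = naive_pairing_count()
    print(f"Naive pairings (assumed): {pairings:,}")
    print(f"Products per naive pairing: {TOTAL_PRODUCTS / pairings:.4f}")
```

Under these assumptions, the most-synthesizable class accounts for a bit under two thirds of the database, and the number of products is a small fraction of the naive pairing count, which is consistent with transforms applying only to reactants that carry the required functional groups.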

SUBMITTER: Patel H 

PROVIDER: S-EPMC7658252 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules.

Patel H, Ihlenfeldt WD, Judson PN, Moroz YS, Pevzner Y, Peach ML, Delannée V, Tarasova NI, Nicklaus MC

Scientific Data, 11 Nov 2020, issue 1


Similar Datasets

| S-EPMC1115758 | biostudies-literature
| S-EPMC4416469 | biostudies-literature
| S-EPMC10498440 | biostudies-literature
| S-EPMC8906994 | biostudies-literature
| PRJEB9810 | ENA
| S-EPMC4580995 | biostudies-literature
| S-EPMC4373329 | biostudies-literature
| S-EPMC9206064 | biostudies-literature
| S-EPMC3439531 | biostudies-literature
| S-EPMC3371040 | biostudies-literature