
Dataset Information


Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph.


ABSTRACT: With the increasing application of deep-learning-based generative models for de novo molecule design, the quantitative estimation of molecular synthetic accessibility (SA) has become crucial for prioritizing the structures these models generate. SA estimation also helps prioritize hit/lead compounds and guide retrosynthesis analysis. In this study, a chemical reaction network was constructed from the USPTO and Pistachio reaction datasets to identify the shortest reaction path (SRP) needed to synthesize each compound. Different SRP cut-offs were then used as thresholds to classify an organic compound as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Two synthesis accessibility models, a DNN-ECFP model and a graph-based CMPNN model, were built using deep learning/machine learning algorithms. Compared with existing synthesis accessibility scoring schemes such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs marginally better than SYBA (ROC AUC: 0.76) and outperforms SAScore and SCScore. Our prediction models, based on historical reaction knowledge, could serve as a tool for estimating molecular SA.
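The labeling step described in the abstract (compute an SRP for each compound over a reaction network, then apply an SRP cut-off to split ES from HS compounds) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Reactions are modeled as hypothetical (reactants, product) pairs keyed by string identifiers. A user-supplied `building_blocks` set defines the compounds with SRP 0. A reaction counts as usable only when all of its reactants are reachable. SRP is taken as 1 plus the largest SRP among the reactants of the best available reaction. The function names `shortest_reaction_paths` and `label_es_hs` and the cut-off value are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a reaction is (set of reactant IDs, product ID),
# where IDs could be, for example, canonical SMILES strings.
Reaction = Tuple[FrozenSet[str], str]


def shortest_reaction_paths(
    reactions: Iterable[Reaction], building_blocks: Iterable[str]
) -> Dict[str, int]:
    """Return the minimum number of reaction steps needed to reach each molecule.

    Assumed definition (not taken from the source): SRP(m) = 0 for building
    blocks; otherwise SRP(m) = min over reactions producing m of
    1 + max(SRP of that reaction's reactants). Unreachable molecules are omitted.
    """
    rxns: List[Reaction] = list(reactions)
    srp: Dict[str, int] = {m: 0 for m in building_blocks}
    changed = True
    while changed:  # fixpoint iteration: SRP values are only added or lowered
        changed = False
        for reactants, product in rxns:
            # A reaction is usable only if every reactant is already reachable.
            if all(r in srp for r in reactants):
                candidate = 1 + max((srp[r] for r in reactants), default=0)
                if candidate < srp.get(product, candidate + 1):
                    srp[product] = candidate
                    changed = True
    return srp


def label_es_hs(srp: Dict[str, int], molecules: Iterable[str], cutoff: int) -> Dict[str, str]:
    """Label a molecule ES if its SRP <= cutoff, else HS (unreachable molecules are HS)."""
    return {m: "ES" if srp.get(m, cutoff + 1) <= cutoff else "HS" for m in molecules}


if __name__ == "__main__":
    # Toy network: E has a long route (via C and D) and a one-step route from A.
    toy_reactions = [
        (frozenset({"A", "B"}), "C"),
        (frozenset({"C"}), "D"),
        (frozenset({"D", "A"}), "E"),
        (frozenset({"A"}), "E"),
        (frozenset({"X"}), "Y"),  # X is not a building block, so Y is unreachable
    ]
    paths = shortest_reaction_paths(toy_reactions, ["A", "B"])
    print(label_es_hs(paths, ["C", "D", "E", "Y"], cutoff=1))
    # prints {'C': 'ES', 'D': 'HS', 'E': 'ES', 'Y': 'HS'}
```

The fixpoint loop terminates because each update either reaches a new product or lowers an existing non-negative integer SRP. For networks the size of USPTO/Pistachio, a priority-queue variant (Knuth's generalization of Dijkstra's algorithm) would likely scale better than repeated full passes.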

SUBMITTER: Li B 

PROVIDER: S-EPMC8838603 | biostudies-literature | 2022 Feb

REPOSITORIES: biostudies-literature


Publications

Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph.

Li Baiqing, Chen Hongming

Molecules (Basel, Switzerland), 2022-02-03, issue 3



Similar Datasets

| S-EPMC10272164 | biostudies-literature
| S-EPMC9310259 | biostudies-literature
| S-EPMC11920434 | biostudies-literature
| S-EPMC11830800 | biostudies-literature
| S-EPMC11245034 | biostudies-literature
| S-EPMC10731821 | biostudies-literature
2024-09-13 | GSE262953 | GEO
| S-EPMC11602590 | biostudies-literature
| S-EPMC7873847 | biostudies-literature