Dataset Information

A knowledge-guided pre-training framework for improving molecular representation learning.


ABSTRACT: Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.

SUBMITTER: Li H 

PROVIDER: S-EPMC10663446 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

A knowledge-guided pre-training framework for improving molecular representation learning.

Li Han, Zhang Ruotian, Min Yaosen, Ma Dacheng, Zhao Dan, Zeng Jianyang

Nature Communications, 2023 Nov 21, issue 1


Similar Datasets

S-EPMC11001485 | biostudies-literature
S-EPMC10140620 | biostudies-literature
S-EPMC11373321 | biostudies-literature
S-EPMC8996628 | biostudies-literature
S-EPMC11557864 | biostudies-literature
S-EPMC5860058 | biostudies-literature
S-EPMC5573135 | biostudies-literature
S-EPMC9044235 | biostudies-literature
S-EPMC11544137 | biostudies-literature
S-EPMC8945064 | biostudies-literature