Dataset Information

TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions.


ABSTRACT: Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformers and natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP with a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer surges ahead of conventional algorithms and recent deep learning variants, giving rise to exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks on a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.
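The abstract's core idea, converting a 3D structure into a sequence of topological invariants indexed by spatial scale, can be illustrated with a much simpler stand-in for PTHL: 0-dimensional persistent homology of a point cloud, computed by single-linkage merging (Kruskal-style union-find over pairwise distances). The sketch below is purely illustrative and is not the authors' method; the function names (`zeroth_persistence`, `topo_sequence`) and the toy two-cluster point cloud are hypothetical.

```python
import math
from itertools import combinations

def zeroth_persistence(points):
    """0-dimensional persistence of a Vietoris-Rips filtration:
    every point is born at scale 0, and a connected component dies
    when it merges into another (single-linkage via union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            deaths.append(d)  # one component dies at scale d
    # n points yield n-1 finite death times; one component persists forever.
    return deaths

def topo_sequence(points, scales):
    """Toy 'structure-to-sequence' map: the number of surviving
    components (Betti-0) at each filtration scale, read off as a
    sequence of integer tokens a sequence model could consume."""
    deaths = zeroth_persistence(points)
    return [1 + sum(1 for d in deaths if d > s) for s in scales]

# Two well-separated clusters of "atoms" in 3D.
cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (11, 0, 0)]
print(topo_sequence(cloud, scales=[0.5, 2.0, 20.0]))  # → [5, 2, 1]
```

At scale 0.5 all five points are isolated, at 2.0 they have merged into the two clusters, and at 20.0 everything is one component. TopoFormer's PTHL additionally tracks higher-order hyperdigraph Laplacian spectra and element-specific interactions, but the scale-indexed sequence of invariants shown here is the same structural idea.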

SUBMITTER: Chen D 

PROVIDER: S-EPMC10889053 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature

Publications

TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions.

Chen Dong D   Liu Jian J   Wei Guo-Wei GW  

Research square 20240209

Similar Datasets

2023-01-08 | GSE211000 | GEO
2023-01-08 | GSE210999 | GEO
2023-01-08 | GSE210998 | GEO
| S-EPMC9250521 | biostudies-literature
| S-EPMC9945430 | biostudies-literature
| S-EPMC11760531 | biostudies-literature
| S-EPMC3154634 | biostudies-literature
| S-EPMC1988853 | biostudies-literature
| S-EPMC11767038 | biostudies-literature
| S-EPMC5873459 | biostudies-literature