
Dataset Information


Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites.


ABSTRACT: Annotating active sites in enzymes is crucial for advancing multiple fields, including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical application. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with knowledge of enzymatic reactions using a multi-modal cross-attention framework. Compared with BLASTp, EasIFA runs 10 times faster and improves recall, precision, F1 score, and MCC by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses an empirical-rule-based algorithm and another state-of-the-art deep learning annotation method based on PSSM features, running 650 to 1400 times faster while improving annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse, high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.
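The abstract reports gains in recall, precision, F1 score, and MCC. As a reminder of what these quantities measure for active-site annotation, here is a minimal sketch that computes them from binary per-residue labels (1 = active-site residue, 0 = other). This is an assumption for illustration: the abstract does not state the exact evaluation protocol (for example, whether scores are computed per residue or per site type, or how they are averaged across enzymes), and the function name `site_metrics` is hypothetical, not part of EasIFA.

```python
import math
from typing import Dict, Sequence


def site_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and MCC for binary per-residue active-site labels.

    Hypothetical helper for illustration; assumes 1 marks an active-site
    residue and 0 marks any other residue.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")

    # Confusion-matrix counts over all residues.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard each ratio against an empty denominator by returning 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Matthews correlation coefficient, in [-1, 1]; uses all four counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy 10-residue enzyme: true sites at positions 2, 5, 7; predicted 2, 5, 8.
    truth = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
    pred = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
    print(site_metrics(truth, pred))
```

In the toy example there are 2 true positives, 1 false positive, 1 false negative, and 6 true negatives, so precision, recall, and F1 are all 2/3 and MCC is 11/21 (about 0.524). Because MCC also rewards correctly rejected non-site residues, it is less sensitive than F1 to the heavy class imbalance typical of active-site labels.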

SUBMITTER: Wang X 

PROVIDER: S-EPMC11347633 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature


Publications

Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites.

Wang Xiaorui, Yin Xiaodan, Jiang Dejun, Zhao Huifeng, Wu Zhenxing, Zhang Odin, Wang Jike, Li Yuquan, Deng Yafeng, Liu Huanxiang, Luo Pei, Han Yuqiang, Hou Tingjun, Yao Xiaojun, Hsieh Chang-Yu

Nature Communications, 27 August 2024 (issue 1)



Similar Datasets

| S-EPMC5499748 | biostudies-literature
| S-EPMC10780737 | biostudies-literature
| S-EPMC9301586 | biostudies-literature
| S-EPMC11890286 | biostudies-literature
| S-EPMC4889935 | biostudies-literature
| S-EPMC9556968 | biostudies-literature
| S-EPMC8076954 | biostudies-literature
| S-EPMC8560090 | biostudies-literature
| S-EPMC8501087 | biostudies-literature
| S-EPMC4709141 | biostudies-literature