Dataset Information

SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model.


ABSTRACT:

Motivation

Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact-map prediction. However, most methods rely on evolutionary information derived from protein sequences, which may not exist for many proteins because they lack naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor that takes the output of the pre-trained language model ESM-1b as input and combines it with a large training set and an ensemble of residual neural networks.
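For readers unfamiliar with the prediction target, the following is a minimal sketch of how a binary contact map can be derived from known 3D coordinates. It assumes one representative atom per residue and a distance cutoff of 8 Å, which is a common convention in the contact-prediction literature; the abstract does not state the cutoff or atom choice used here, and the function name `contact_map` is hypothetical.

```python
import math
from typing import List, Sequence


def contact_map(coords: Sequence[Sequence[float]], threshold: float = 8.0) -> List[List[int]]:
    """Return a symmetric binary contact map.

    coords    : one (x, y, z) point per residue (assumed representative atom).
    threshold : distance cutoff in angstroms (8.0 is an assumed, conventional value).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two residues are "in contact" if their points are closer than the cutoff.
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy example: three residues placed along the x-axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

In this sketch the diagonal is left at 0, since self-contacts carry no information; a predictor such as the one described above would output a probability for each off-diagonal pair rather than a hard 0/1 label.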

Results

We showed that the proposed method significantly improves on the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods trRosetta and SPOT-Contact, with respective F1-score improvements of 48.7% and 48.5% on proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, which makes it useful for large-scale prediction.
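The F1-score quoted above is the harmonic mean of precision and recall over residue pairs. The sketch below shows one way to compute it for a binarized predicted map against a native map. The minimum sequence separation `min_sep` and the function name `contact_f1` are assumptions for illustration; the abstract does not specify which residue-pair ranges or probability thresholds were used in the evaluation.

```python
from typing import Sequence


def contact_f1(pred: Sequence[Sequence[int]],
               true: Sequence[Sequence[int]],
               min_sep: int = 6) -> float:
    """F1-score over residue pairs (i, j) with j - i >= min_sep.

    pred, true : square binary maps (1 = contact, 0 = no contact).
    min_sep    : assumed minimum sequence separation; pairs closer
                 in sequence than this are ignored.
    """
    n = len(true)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + min_sep, n):  # upper triangle only
            p, t = bool(pred[i][j]), bool(true[i][j])
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 4-residue example: native contacts (0,1) and (0,3); predicted (0,1) and (1,2).
native = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
predicted = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
score = contact_f1(predicted, native, min_sep=1)
```

In the toy example one of the two predicted contacts is correct and one of the two native contacts is recovered, so precision and recall are both 0.5 and the F1-score is 0.5.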

Availability and implementation

A stand-alone version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct predictions can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub repository.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Singh J 

PROVIDER: S-EPMC9113311 | biostudies-literature | 2022 Mar

REPOSITORIES: biostudies-literature

Publications

SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model.

Jaspreet Singh, Thomas Litfin, Jaswinder Singh, Kuldip Paliwal, Yaoqi Zhou

Bioinformatics (Oxford, England), 2022-03-01, issue 7


Similar Datasets

| S-EPMC11520403 | biostudies-literature
| S-EPMC8504630 | biostudies-literature
| S-EPMC3154634 | biostudies-literature
| S-EPMC10990103 | biostudies-literature
| S-EPMC10699601 | biostudies-literature
| S-EPMC7869490 | biostudies-literature
| S-EPMC3463120 | biostudies-literature
| S-EPMC4191875 | biostudies-literature
2024-09-13 | GSE262953 | GEO
| S-EPMC10440047 | biostudies-literature