Dataset Information

DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment.

ABSTRACT:

BACKGROUND: Recently developed methods of protein contact prediction, a crucially important step for protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating to an increasing degree such that abundant sequences to construct an MSA of a target protein are readily obtainable. Nevertheless, many cases present different ends of the number of sequences that can be included in an MSA used for contact prediction. The abundant sequences might degrade prediction results, but opportunities remain for a limited number of sequences to construct an MSA. To resolve these persistent issues, we strove to develop a novel framework using DNNs in an end-to-end manner for contact prediction.

RESULTS: We developed neural network models to improve precision of both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features was useful to increase the prediction accuracy of long-range contacts in our model. Based on these models, we expanded our model to a multi-task model to achieve higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. Moreover, we demonstrated that ensemble averaging of our models can raise accuracy. Using past CASP target protein domains, we tested our models and demonstrated that our final model is superior to or equivalent to existing meta-predictors.

CONCLUSIONS: The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences have become accessible, including metagenomic sequences, which might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, more accurate three-dimensional models of a target protein are obtainable than those from existing ECA methods, starting from its MSA. DeepECA is available from https://github.com/tomiilab/DeepECA.
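The abstract mentions ensemble averaging of models and the accuracy of long-range contacts. As a rough illustration only, the sketch below averages several predicted contact-probability maps and scores the result by precision over the top-ranked long-range pairs. It is not taken from the DeepECA code: the function names, the sequence-separation cutoff of 24 residues, and the default of scoring the top L pairs (L being the sequence length) are assumptions based on common practice in contact-prediction evaluation.

```python
import numpy as np


def ensemble_average(contact_maps):
    """Average several L x L contact-probability maps element-wise.

    Hypothetical helper: a plain arithmetic mean followed by
    symmetrization, which is an assumption, not the paper's scheme.
    """
    stacked = np.stack(contact_maps, axis=0)
    mean_map = stacked.mean(axis=0)
    # Residue pair (i, j) and (j, i) describe the same contact.
    return (mean_map + mean_map.T) / 2.0


def long_range_precision(pred, true_contacts, min_sep=24, top_k=None):
    """Precision of the top_k highest-scoring long-range pairs.

    pred          -- L x L array of predicted contact scores
    true_contacts -- L x L boolean array of observed contacts
    min_sep       -- minimum sequence separation |i - j| (assumed 24)
    top_k         -- number of pairs to score (defaults to L)
    """
    length = pred.shape[0]
    if top_k is None:
        top_k = length
    # Upper-triangle pairs with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return 0.0
    # Rank candidate pairs by descending predicted score.
    order = np.argsort(-pred[i, j], kind="stable")[:top_k]
    hits = true_contacts[i[order], j[order]]
    return float(np.mean(hits))
```

Calling `long_range_precision(ensemble_average([map_a, map_b]), true_contacts)` on maps from two models gives one number that can be compared against the same score for each model alone.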

SUBMITTER: Fukuda H

PROVIDER: S-EPMC6953294 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment.

Fukuda Hiroyuki, Tomii Kentaro

BMC Bioinformatics, 2020 Jan 9, issue 1

PMID: 31918654

Similar Datasets

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.
| S-EPMC5820155 | biostudies-literature

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction.
| S-EPMC8100175 | biostudies-literature

Protein multiple sequence alignment benchmarking through secondary structure prediction.
| S-EPMC5408826 | biostudies-other

Improving protein structure prediction using multiple sequence-based contact predictions.
| S-EPMC3154634 | biostudies-literature

DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins.
| S-EPMC7141871 | biostudies-literature

DeepSol: a deep learning framework for sequence-based protein solubility prediction.
| S-EPMC6355112 | biostudies-literature

Contact-based sequence alignment.
| S-EPMC419454 | biostudies-literature

A layout framework for genome-wide multiple sequence alignment graphs.
| S-EPMC11362851 | biostudies-literature

Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment.
| S-EPMC4065584 | biostudies-literature

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman.
| S-EPMC9805565 | biostudies-literature