Dataset Information

On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins.

ABSTRACT: Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
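The idea of randomizing an alignment so that some statistics survive while couplings are destroyed can be illustrated with a very simple baseline. The sketch below permutes residues independently within each column of a multiple sequence alignment. This keeps every column's residue frequencies (conservation) exactly but removes all correlations between columns. It also destroys phylogenetic relations, so it is not the resampling strategy the abstract proposes, only a hypothetical starting point for comparison. The function names and the toy alignment are assumptions made for illustration.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Permute residues independently within each alignment column.

    Column-wise residue counts are preserved exactly; correlations
    between columns (intrinsic and phylogenetic alike) are removed.
    Illustrative baseline only, not the method of the paper.
    """
    if not msa:
        return []
    n_cols = len(msa[0])
    if any(len(seq) != n_cols for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    # Transpose rows (sequences) into columns (alignment positions).
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)  # in-place permutation of one column
    # Transpose back into sequences.
    return ["".join(row) for row in zip(*columns)]


def column_counts(msa):
    """Residue counts per alignment column, used to check conservation."""
    return [Counter(col) for col in zip(*msa)]


# Toy alignment (made up): five aligned sequences of length four.
toy_msa = ["ACDA", "ACDA", "GCEA", "GTEA", "GTEC"]
shuffled = shuffle_columns(toy_msa, seed=1)
```

Comparing `column_counts(toy_msa)` with `column_counts(shuffled)` confirms that conservation is unchanged. A randomization that also preserves phylogeny, as described in the abstract, would need additional constraints that this sketch does not attempt.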

SUBMITTER: Rodriguez Horta E

PROVIDER: S-EPMC8177639 | biostudies-literature | 2021 May

REPOSITORIES: biostudies-literature


Publications

On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins.

Rodriguez Horta, Edwin; Weigt, Martin

PLoS Computational Biology, 2021-05-24, issue 5


PMID: 34029316

Similar Datasets

Efficient prediction of co-complexed proteins based on coevolution. | S-EPMC3494725 | biostudies-literature

Enhancing coevolution-based contact prediction by imposing structural self-consistency of the contacts. | S-EPMC6057941 | biostudies-literature

FilterDCA: Interpretable supervised contact prediction using inter-domain coevolution. | S-EPMC7577475 | biostudies-literature

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning. | S-EPMC5820155 | biostudies-literature

Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins. | S-EPMC3164537 | biostudies-literature

Phylogenetic analysis of ABCG subfamily proteins in plants: functional clustering and coevolution with ABCGs of pathogens. | S-EPMC8359288 | biostudies-literature

Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. | S-EPMC8616805 | biostudies-literature

Coevolution-based prediction of key allosteric residues for protein function regulation. | S-EPMC9981151 | biostudies-literature

Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. | S-EPMC8527833 | biostudies-literature

Coevolution of interacting fertilization proteins. | S-EPMC2704960 | biostudies-literature