
Dataset Information


Highly accurate protein structure prediction for the human proteome.


ABSTRACT: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure (ref. 1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
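The coverage figures in the abstract are residue-weighted: every residue in the proteome counts once, so long proteins contribute more than short ones, and the "very high" set is nested inside the "confident" set. The sketch below illustrates that arithmetic on per-residue confidence scores. It assumes the AlphaFold per-residue confidence score (pLDDT) with cut-offs of > 70 for "confident" and > 90 for "very high"; the function name, the dictionary input format and the toy values are hypothetical and only illustrate the calculation, not the authors' pipeline.

```python
from typing import Dict, Sequence

# Assumed pLDDT cut-offs (AlphaFold confidence bands), not taken from this record:
# > 70 counts as "confident", > 90 counts as "very high".
CONFIDENT = 70.0
VERY_HIGH = 90.0


def residue_coverage(plddt_by_protein: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: fraction of ALL residues (pooled over proteins) above each cut-off."""
    total = confident = very_high = 0
    for scores in plddt_by_protein.values():
        for s in scores:
            total += 1
            if s > CONFIDENT:
                confident += 1
            if s > VERY_HIGH:  # every "very high" residue is also "confident"
                very_high += 1
    if total == 0:
        raise ValueError("no residues supplied")
    return {"confident": confident / total, "very_high": very_high / total}


# Hypothetical toy input: per-residue pLDDT values for two made-up proteins.
toy = {
    "PROT_A": [95.0, 92.5, 88.0, 71.0, 40.0],
    "PROT_B": [30.0, 55.0, 91.0, 98.0, 65.0],
}
print(residue_coverage(toy))  # → {'confident': 0.6, 'very_high': 0.4}
```

Pooling residues rather than averaging per-protein fractions matches the "% of residues" phrasing; a per-protein average would weight a 50-residue peptide the same as a 5,000-residue protein.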

SUBMITTER: Tunyasuvunakool K 

PROVIDER: S-EPMC8387240 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9367171 | biostudies-literature
| S-EPMC8371605 | biostudies-literature
| S-EPMC8949011 | biostudies-literature
| S-EPMC4575526 | biostudies-literature
| S-EPMC7804018 | biostudies-literature
| S-EPMC3686981 | biostudies-literature
| S-EPMC4747591 | biostudies-literature
| S-EPMC2712621 | biostudies-literature
| S-EPMC8898000 | biostudies-literature
| S-EPMC10719378 | biostudies-literature