Dataset Information

Highly accurate protein structure prediction for the human proteome.

ABSTRACT: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure¹. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold², at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
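The residue-level coverage figures quoted in the abstract (58% confident, 36% very high confidence) are fractions of all residues whose per-residue confidence score clears a threshold. A minimal sketch of that arithmetic follows. It assumes per-residue scores on a 0-100 scale and cut-offs of 70 and 90; neither the scale nor the cut-offs are stated in this record, and the function name and toy data are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed cut-offs (not stated in this record): scores on a 0-100 scale.
CONFIDENT = 70.0   # residue counts as a "confident" prediction
VERY_HIGH = 90.0   # residue counts as "very high confidence"


def residue_coverage(
    scores_per_protein: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (confident_fraction, very_high_fraction) over all residues.

    Fractions are pooled across proteins, so long proteins weigh more
    than short ones, matching a "percentage of residues" statistic.
    """
    total = confident = very_high = 0
    for scores in scores_per_protein:
        for score in scores:
            total += 1
            if score >= CONFIDENT:
                confident += 1
            if score >= VERY_HIGH:
                very_high += 1
    if total == 0:
        raise ValueError("no residues supplied")
    return confident / total, very_high / total


# Toy data: two hypothetical proteins with 3 and 2 residues.
toy_scores = [[95.0, 88.0, 40.0], [72.0, 91.0]]
conf_frac, very_high_frac = residue_coverage(toy_scores)
print(f"confident: {conf_frac:.0%}, very high: {very_high_frac:.0%}")
```

Because the very-high-confidence residues are a subset of the confident ones, the second fraction can never exceed the first, which is consistent with the 36% and 58% figures above.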

SUBMITTER: Tunyasuvunakool K

PROVIDER: S-EPMC8387240 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature

Publications

Highly accurate protein structure prediction for the human proteome.

Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D

Nature, 2021-07-22, issue 7873

PMID: 34293799

Similar Datasets

Highly accurate protein structure prediction and drug screen of monkeypox virus proteome. | S-EPMC9367171 | biostudies-literature

Highly accurate protein structure prediction with AlphaFold. | S-EPMC8371605 | biostudies-literature

Rethinking Protein Drug Design with Highly Accurate Structure Prediction of Anti-CRISPR Proteins. | S-EPMC8949011 | biostudies-literature

Accurate Prediction of Docked Protein Structure Similarity. | S-EPMC4575526 | biostudies-literature

Accurate protein structure prediction with hydroxyl radical protein footprinting data. | S-EPMC7804018 | biostudies-literature

Accurate structure prediction of peptide-MHC complexes for identifying highly immunogenic antigens. | S-EPMC3686981 | biostudies-literature

MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction | S-EPMC11677681 | biostudies-literature

Highly Accurate Structure-Based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism. | S-EPMC4747591 | biostudies-literature

Accurate protein function prediction via graph attention networks with predicted structure information. | S-EPMC8898000 | biostudies-literature

Solvent accessible surface area approximations for rapid and accurate protein structure prediction. | S-EPMC2712621 | biostudies-literature