Project description:AlphaFold2 (AF2), a deep-learning based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Here, we address this question by assessing whether CFold-an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures- predicts alternative conformations of eight fold switchers from six protein families. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Unlike AF2, CFold's training set contains only one of these alternative conformations. Despite sampling 1300-4400 structures/protein with various sequence sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.
Project description:Serine protease inhibitors (serpins) include thousands of structurally conserved proteins playing key roles in many organisms. Mutations affecting serpins may disturb their conformation, leading to inactive forms. Unfortunately, conformational consequences of serpin mutations are difficult to predict. In this study, we integrate experimental data of patients with mutations affecting one serpin with the predictions obtained by AlphaFold and molecular dynamics. Five SERPINC1 mutations causing antithrombin deficiency, the strongest congenital thrombophilia were selected from a cohort of 350 unrelated patients based on functional, biochemical, and crystallographic evidence supporting a folding defect. AlphaFold gave an accurate prediction for the wild-type structure. However, it also produced native structures for all variants, regardless of complexity or conformational consequences in vivo. Similarly, molecular dynamics of up to 1000 ns at temperatures causing conformational transitions did not show significant changes in the native structure of wild-type and variants. In conclusion, AlphaFold and molecular dynamics force predictions into the native conformation at conditions with experimental evidence supporting a conformational change to other structures. It is necessary to improve predictive strategies for serpins that consider the conformational sensitivity of these molecules.
Project description:Although most proteins conform to the classical one-structure/one-function paradigm, an increasing number of proteins with dual structures and functions have been discovered. In response to cellular stimuli, such proteins undergo structural changes sufficiently dramatic to remodel even their secondary structures and domain organization. This "fold-switching" capability fosters protein multi-functionality, enabling cells to establish tight control over various biochemical processes. Accurate predictions of fold-switching proteins could both suggest underlying mechanisms for uncharacterized biological processes and reveal potential drug targets. Recently, we developed a prediction method for fold-switching proteins using structure-based thermodynamic calculations and discrepancies between predicted and experimentally determined protein secondary structure (Porter and Looger, Proc Natl Acad Sci U S A 2018; 115:5968-5973). Here we seek to leverage the negative information found in these secondary structure prediction discrepancies. To do this, we quantified secondary structure prediction accuracies of 192 known fold-switching regions (FSRs) within solved protein structures found in the Protein Data Bank (PDB). We find that the secondary structure prediction accuracies for these FSRs vary widely. Inaccurate secondary structure predictions are strongly associated with fold-switching proteins compared to equally long segments of non-fold-switching proteins selected at random. These inaccurate predictions are enriched in helix-to-strand and strand-to-coil discrepancies. Finally, we find that most proteins with inaccurate secondary structure predictions are underrepresented in the PDB compared with their alternatively folded cognates, suggesting that unequal representation of fold-switching conformers within the PDB could be an important cause of inaccurate secondary structure predictions. These results demonstrate that inconsistent secondary structure predictions can serve as a useful preliminary marker of fold switching.
Project description:Elucidation of the tertiary structure of proteins is an important task for biological and medical studies. AlphaFold, a modern deep-learning algorithm, enables the prediction of protein structure to a high level of accuracy. It has been applied in numerous studies in various areas of biology and medicine. Viruses are biological entities infecting eukaryotic and procaryotic organisms. They can pose a danger for humans and economically significant animals and plants, but they can also be useful for biological control, suppressing populations of pests and pathogens. AlphaFold can be used for studies of molecular mechanisms of viral infection to facilitate several activities, including drug design. Computational prediction and analysis of the structure of bacteriophage receptor-binding proteins can contribute to more efficient phage therapy. In addition, AlphaFold predictions can be used for the discovery of enzymes of bacteriophage origin that are able to degrade the cell wall of bacterial pathogens. The use of AlphaFold can assist fundamental viral research, including evolutionary studies. The ongoing development and improvement of AlphaFold can ensure that its contribution to the study of viral proteins will be significant in the future.
Project description:Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
Project description:Efficient identification of drug mechanisms of action remains a challenge. Computational docking approaches have been widely used to predict drug binding targets; yet, such approaches depend on existing protein structures, and accurate structural predictions have only recently become available from AlphaFold2. Here, we combine AlphaFold2 with molecular docking simulations to predict protein-ligand interactions between 296 proteins spanning Escherichia coli's essential proteome, and 218 active antibacterial compounds and 100 inactive compounds, respectively, pointing to widespread compound and protein promiscuity. We benchmark model performance by measuring enzymatic activity for 12 essential proteins treated with each antibacterial compound. We confirm extensive promiscuity, but find that the average area under the receiver operating characteristic curve (auROC) is 0.48, indicating weak model performance. We demonstrate that rescoring of docking poses using machine learning-based approaches improves model performance, resulting in average auROCs as large as 0.63, and that ensembles of rescoring functions improve prediction accuracy and the ratio of true-positive rate to false-positive rate. This work indicates that advances in modeling protein-ligand interactions, particularly using machine learning-based approaches, are needed to better harness AlphaFold2 for drug discovery.
Project description:Since the release of AlphaFold, researchers have actively refined its predictions and attempted to integrate it into existing pipelines for determining protein structures. These efforts have introduced a number of functionalities and optimisations at the latest Critical Assessment of protein Structure Prediction edition (CASP15), resulting in a marked improvement in the prediction of multimeric protein structures. However, AlphaFold's capability of predicting large protein complexes is still limited and integrating experimental data in the prediction pipeline is not straightforward. In this study, we introduce AF_unmasked to overcome these limitations. Our results demonstrate that AF_unmasked can integrate experimental information to build larger or hard to predict protein assemblies with high confidence. The resulting predictions can help interpret and augment experimental data. This approach generates high quality (DockQ score > 0.8) structures even when little to no evolutionary information is available and imperfect experimental structures are used as a starting point. AF_unmasked is developed and optimised to fill incomplete experimental structures (structural inpainting), which may provide insights into protein dynamics. In summary, AF_unmasked provides an easy-to-use method that efficiently integrates experiments to predict large protein complexes more confidently.
Project description:MotivationWhile the release of AlphaFold (AF) represented a breakthrough for the prediction of protein complex structures, its sensitivity, especially when using full length protein sequences, still remains limited. Modeling success rates might increase if AF predictions were guided by likely interacting protein fragments. This approach requires available sets of highly confident protein-protein interface types. Computational resources, such as 3did, infer interacting globular domain types from observed contacts in protein structures. Assessing the accuracy of these predicted interface types is difficult because we lack hand-curated reference sets of verified domain-domain interface (DDI) types.ResultsTo improve protein complex modeling of DDIs by AF, we manually inspected 80 randomly selected DDI types from the 3did resource to generate a first reference set of DDI types. Identified cases of DDI type nonapproval (40%) primarily resulted from inaccurate Pfam domain matches, crystal contacts, and synthetic protein constructs. Using logistic regression, we predicted a subset of 2411 out of 5724 considered DDI types in 3did to be of high confidence, which we subsequently applied to 53 000 human-protein interactions to predict DDIs followed by AF modeling. We obtained highly confident AF models for 604 out of 1129 predicted DDIs. Of note, for 47% of them no confident AF structural model could be obtained using full length protein sequences.Availability and implementationCode is available at https://github.com/KatjaLuckLab/DDI_manuscript.
Project description:Accurate protein structure predictions, enabled by recent advances in machine learning algorithms, provide an entry point to probing structural mechanisms and to integrating and querying many types of biochemical and biophysical results. Limitations in such protein structure predictions can be reduced and addressed through comparison to experimental Small Angle X-ray Scattering (SAXS) data that provides protein structural information in solution. SAXS data can not only validate computational predictions, but can improve conformational and assembly prediction to produce atomic models that are consistent with solution data and biologically relevant states. Here, we describe how to obtain protein structure predictions, compare them to experimental SAXS data and improve models to reflect experimental information from SAXS data. Furthermore, we consider the potential for such experimentally-validated protein structure predictions to broadly improve functional annotation in proteins identified in metagenomics and to identify functional clustering on conserved sites despite low sequence homology.