Project description: Recent advances in Artificial Intelligence and Machine Learning (e.g., AlphaFold, RoseTTAFold, and ESMFold) enable prediction of three-dimensional (3D) protein structures from amino acid sequences alone, at accuracies comparable to those of lower-resolution experimental methods. These tools have been used to predict structures for entire proteomes and for sequences from large-scale metagenomic studies, yielding an exponential increase in available 3D biomolecular structural information. Given the enormous volume of this newly computed biostructure data, there is an urgent need for robust tools to manage, search, cluster, and visualize large collections of structures. Equally important is the ability to efficiently summarize and visualize metadata, biological/biochemical annotations, and structural features, particularly when working with vast numbers of protein structures, including both experimentally determined structures from the Protein Data Bank (PDB) and computationally predicted models. Researchers also require advanced visualization techniques that support interactive exploration of multiple sequence and structure alignments. This paper introduces a suite of tools, available on the RCSB PDB research-focused web portal RCSB.org, tailor-made for efficient management, search, organization, and visualization of this burgeoning corpus of 3D macromolecular structure data.
Project description: More than half of all structures in the PDB are assemblies of two or more proteins, including both homooligomers and heterooligomers. Structural information on these assemblies comes from X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy (cryo-EM). The biologically relevant assembly in an X-ray structure is often ambiguous, because crystal packing contacts can resemble genuine interfaces, and computational methods have been developed to identify the most likely biological assembly based on the physical properties of assemblies and on sequence conservation in interfaces. Taking advantage of the large number of structures now available, some of the most recent methods rely on the similarity of interfaces and assemblies across structures of homologous proteins.
Project description: The accuracy of B factors in protein crystal structures has been determined by comparing the same atoms in numerous independent crystal structures of Gallus gallus lysozyme. Both absolute B-factor differences and normal probability plots indicate that the estimated B-factor errors are quite large, close to 9 Å² in ambient-temperature structures and 6 Å² in low-temperature structures, and, surprisingly, comparable to values estimated two decades ago. B factors are well known to reflect not only local movements but also several additional factors, such as crystal defects, large-scale disorder, and diffraction data quality. It therefore remains essential to normalize B factors when comparing different crystal structures, even though they have clearly been shown to provide useful information about protein dynamics. Improved quantitative analyses of raw B factors require novel experimental and computational tools that can disentangle local movements from the other features and properties that affect B factors.
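Per-structure normalization is what makes B factors from different crystals comparable. A minimal Python sketch, assuming the common z-score convention B' = (B - <B>) / sigma_B computed over the atoms of one structure (other normalizations exist, and this is not necessarily the one used in the study above). The function names are hypothetical, and the two input lists are assumed to hold B factors for the same matched atoms in the same order.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Z-score normalization within one structure: (B - mean) / population std."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("B factors have zero variance; cannot normalize")
    return [(b - mu) / sigma for b in b_factors]


def mean_absolute_difference(b1, b2):
    """Mean |B1 - B2| over matched atoms (hypothetical helper)."""
    if len(b1) != len(b2):
        raise ValueError("inputs must contain the same matched atoms")
    return mean(abs(x - y) for x, y in zip(b1, b2))


if __name__ == "__main__":
    # Toy data (not real lysozyme values): structure B is a scaled, shifted copy of A.
    b_a = [10.0, 20.0, 30.0]
    b_b = [20.0, 40.0, 60.0]
    print(mean_absolute_difference(b_a, b_b))
    print(mean_absolute_difference(normalize_b_factors(b_a), normalize_b_factors(b_b)))
```

In this toy example the raw mean absolute difference is 20 Å², while the normalized values agree. Z-score normalization removes only differences in global offset and scale between structures, so it reduces, but cannot by itself eliminate, the non-local contributions described above.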
Project description: Compound promiscuity is often attributed to nonspecific binding or assay artifacts. However, it is well known that many pharmaceutically relevant compounds can engage multiple targets in vivo, giving rise to polypharmacology. To explore and better understand the promiscuous binding characteristics of small molecules, we searched X-ray structures (and a very small number of qualifying solution structures) for ligands that bind to multiple distantly related or unrelated target proteins. Experimental structures of a given ligand bound to different targets represent high-confidence data for exploring promiscuous binding events. A total of 192 ligands were identified that formed crystallographic complexes with proteins from different families and for which activity data were available. These "multifamily" compounds included endogenous ligands, and they were often more polar than other bound compounds and active in the submicromolar range. Unexpectedly, many promiscuous ligands displayed conserved or similar binding conformations in different active sites, while others adjusted their conformations to binding sites of different architectures. A comprehensive analysis of ligand-target interactions revealed that multifamily ligands frequently formed different interaction hotspots in binding sites even when their bound conformations were similar, providing a rationale for promiscuous binding at a molecular level of detail. As part of this work, all identified multifamily ligands and their associated activity data are made freely available.
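The selection criterion described above (one ligand, complexes with proteins from several target families, activity data available) can be expressed as a simple grouping step. This is an illustrative Python sketch, not the authors' pipeline: the input format of (ligand ID, target family) pairs, the function name, and the ligand and family labels are all hypothetical, and assigning target proteins to families is assumed to have been done beforehand.

```python
from collections import defaultdict


def find_multifamily_ligands(complexes, ligands_with_activity, min_families=2):
    """Return {ligand_id: sorted target families} for ligands seen in complexes
    with at least `min_families` distinct families and with activity data."""
    families_by_ligand = defaultdict(set)
    for ligand_id, target_family in complexes:
        families_by_ligand[ligand_id].add(target_family)
    return {
        ligand: sorted(families)
        for ligand, families in families_by_ligand.items()
        if len(families) >= min_families and ligand in ligands_with_activity
    }


if __name__ == "__main__":
    # Hypothetical records: one (ligand, family) pair per crystallographic complex.
    complexes = [
        ("LIG_A", "kinase"),
        ("LIG_A", "protease"),
        ("LIG_B", "kinase"),
        ("LIG_B", "kinase"),              # same family twice: not multifamily
        ("LIG_C", "nuclear receptor"),
        ("LIG_C", "phosphodiesterase"),   # multifamily, but no activity data below
    ]
    print(find_multifamily_ligands(complexes, {"LIG_A", "LIG_B"}))
```

Using a set per ligand means repeated structures of the same ligand-family pair are counted once, so only genuinely distinct families contribute to the threshold.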
Project description: In the recent Critical Assessment of Structure Prediction (CASP) competition, AlphaFold2 performed outstandingly. Its worst predictions were for targets whose structures had been determined by nuclear magnetic resonance (NMR), a result with two possible explanations: either the NMR structures were poor, implying that AlphaFold may be more accurate than NMR, or there is a genuine difference between crystal and solution structures. Here, we use the program ANSURR (Accuracy of NMR Structures Using RCI and Rigidity), which measures the accuracy of solution structures, and show that one of the NMR structures was indeed poor. We then compare AlphaFold predictions with NMR structures and show that AlphaFold tends to be more accurate than NMR ensembles. There are, however, some cases in which the NMR ensembles are more accurate; these tend to be dynamic structures for which AlphaFold had low confidence. We suggest that AlphaFold could be used as the model for NMR structure refinement and that AlphaFold structures validated by ANSURR may require no further refinement.
Project description: Energy landscapes have been used to conceptually describe and model protein folding but have been difficult to measure experimentally, in large part because the myriad partly folded protein conformations cannot be isolated and thermodynamically characterized. Here we experimentally determine a detailed energy landscape for protein folding. We generated a series of overlapping constructs containing subsets of the seven ankyrin repeats of the Drosophila Notch receptor, a protein domain whose linear arrangement of modular structural units can be fragmented without disrupting structure. To a good approximation, the stability of each construct can be described as a sum of energy terms associated with each repeat. The magnitudes of these terms indicate that each repeat is intrinsically unstable but is strongly stabilized by interactions with its nearest neighbors. These linear energy terms define an equilibrium free energy landscape, which shows an early free energy barrier and suggests preferred low-energy routes for folding.
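The additive description above can be written as a nearest-neighbor model in which the folding free energy of a construct spanning repeats i to j is ΔG(i, j) = Σ_{k=i..j} g_k + Σ_{k=i..j-1} w_{k,k+1}, with an intrinsic term g_k for each repeat and an interface term w between adjacent repeats. The Python sketch below is a minimal illustration under that assumption. The sign convention (folding free energy, negative means stable) is our choice, the function names are hypothetical, and the numbers are placeholders, not the measured Notch ankyrin energies.

```python
def construct_free_energy(start, end, intrinsic, interface):
    """Folding free energy of a construct spanning repeats start..end (0-based, inclusive).

    intrinsic[k]: folding free energy of repeat k on its own (positive = unfavorable)
    interface[k]: free energy of the interface between repeats k and k+1 (negative = favorable)
    """
    if len(interface) != len(intrinsic) - 1:
        raise ValueError("need one interface term per adjacent pair of repeats")
    if not 0 <= start <= end < len(intrinsic):
        raise ValueError("invalid repeat range")
    repeat_terms = sum(intrinsic[start:end + 1])
    # Interfaces start|start+1 ... end-1|end lie inside the construct.
    interface_terms = sum(interface[start:end])
    return repeat_terms + interface_terms


def landscape(intrinsic, interface):
    """Free energy of every contiguous construct, keyed by (start, end)."""
    n = len(intrinsic)
    return {
        (i, j): construct_free_energy(i, j, intrinsic, interface)
        for i in range(n)
        for j in range(i, n)
    }


# Hypothetical placeholder values in kcal/mol (NOT measured Notch values).
INTRINSIC = [4.0] * 7
INTERFACE = [-6.0] * 6

if __name__ == "__main__":
    for (i, j), dg in sorted(landscape(INTRINSIC, INTERFACE).items()):
        print(f"repeats {i + 1}-{j + 1}: {dg:+.1f} kcal/mol")
```

With these placeholder values a construct of L repeats has ΔG = 6 - 2L kcal/mol, so single repeats and short fragments come out unfavorable and only constructs of four or more repeats are net stable, qualitatively matching the picture of intrinsically unstable repeats stabilized by their neighbors.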
Project description: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), a founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As the wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country and territory. This Database Issue contribution describes upgrades to the RCSB.org web portal that created a one-stop shop for open access to ∼200 000 experimentally determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource': every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data, freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and CSMs are clearly labeled as to their provenance and reliability. Both are fully searchable and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
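Because PDB structures and CSMs are searchable side by side, programmatic queries can request either or both. The Python sketch below builds a full-text query for the public RCSB Search API (v2) using only the standard library. The endpoint URL, the `full_text` service, and the `results_content_type` request option reflect our understanding of that API and should be checked against its current documentation; the helper function names are hypothetical.

```python
import json
import urllib.request

# Assumed public endpoint of the RCSB Search API v2 (verify against current docs).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(text, include_csms=True):
    """Build a full-text search request returning entry IDs.

    include_csms=False restricts results to experimentally determined PDB entries.
    """
    content_types = ["experimental", "computational"] if include_csms else ["experimental"]
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"results_content_type": content_types},
    }


def run_query(payload, timeout=30):
    """POST the query as JSON and return the decoded response (needs network access)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    # The service may answer with an empty body when nothing matches.
    return json.loads(body) if body else {"result_set": []}
```

For example, `run_query(build_full_text_query("lysozyme"))` would search both PDB structures and CSMs; keeping payload construction separate from the HTTP call makes the query easy to inspect or log before it is sent.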
Project description: Determination of RNA structure by NMR is complicated in large part by the lack of a precise parameterization linking observed chemical shifts to the underlying geometric parameters. In contrast to proteins, for which numerous high-resolution crystal structures serve as coordinate templates for this mapping, such models are rarely available for the smaller oligonucleotides accessible to NMR, or they exhibit crystal packing and counter-ion binding artifacts that prevent their use for chemical shift analysis. At the same time, NMR-determined RNA structures are often not solved at the restraint density required to precisely define the variable degrees of freedom. In this study we sidestep the problem of directly parameterizing the RNA chemical shift/structure relationship and instead examine the effects of imposing local fragmental coordinate similarity restraints based on similarities of the experimental secondary ribose ¹³C/¹H chemical shifts. The effect of such chemical shift similarity (CSS) restraints on structural accuracy is assessed by residual dipolar coupling (RDC)-based cross-validation. Improvements in coordinate accuracy are observed for all six RNA constructs considered here as test cases, which argues for routine inclusion of these terms in NMR-based oligonucleotide structure determination. Such accuracy improvements are expected to facilitate derivation of chemical shift/structure relationships for RNA.
Project description: X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods, including structure prediction, molecular modelling, and structure-based drug design, whose success ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model: the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in generating an accurate and precise crystallographic model is interpreting the electron density of the crystal, typically by building an atomic model. The atomic model must then be validated both for fit to the experimental electron density and for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. We discuss fundamental concepts and metrics of protein-ligand model validation and highlight software tools that assist in this process. It is essential that end users select high-quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.
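One widely used metric for ligand fit to the experimental density is the real-space correlation coefficient (RSCC), the Pearson correlation between observed and model-calculated density values over the grid points covering the ligand. A minimal Python sketch, assuming both densities have already been sampled at the same grid points (map calculation and masking around the ligand are not shown, and the function name is hypothetical):

```python
import math


def real_space_correlation(observed, calculated):
    """Pearson correlation between observed and calculated density at matched grid points."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need at least two matched grid points")
    mean_o = sum(observed) / len(observed)
    mean_c = sum(calculated) / len(calculated)
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("density values have zero variance")
    return cov / math.sqrt(var_o * var_c)


if __name__ == "__main__":
    # Toy density samples (arbitrary units), not from a real map.
    rho_obs = [0.1, 0.5, 0.9, 1.3]
    rho_calc = [2 * x for x in rho_obs]  # same shape, different scale
    print(real_space_correlation(rho_obs, rho_calc))
```

An RSCC close to 1 indicates that the modeled ligand reproduces the shape of the observed density. Because the coefficient ignores overall scale and offset, it is best read alongside other fit and geometry metrics rather than on its own.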
Project description: We have found that refinement of protein NMR structures using Rosetta with experimental NMR restraints yields more accurate protein NMR structures than those deposited in the PDB using standard refinement protocols. Using 40 pairs of NMR and X-ray crystal structures determined by the Northeast Structural Genomics Consortium, for proteins ranging in size from 5 to 22 kDa, we found that restrained Rosetta-refined structures fit the raw experimental data better, agree better with their X-ray counterparts, and have better phasing power than conventionally determined NMR structures. For 37 proteins for which NMR ensembles were available and which had similar structures in solution and in the crystal, all of the restrained Rosetta-refined NMR structures were sufficiently accurate to be used to solve the corresponding X-ray crystal structures by molecular replacement. The protocol for restrained refinement of protein NMR structures was also compared with restrained CS-Rosetta calculations. For proteins smaller than 10 kDa, restrained CS-Rosetta, starting from extended conformations, provides slightly more accurate structures, while for proteins in the 10-25 kDa range the less CPU-intensive restrained Rosetta refinement protocols provided equally or more accurate structures. The restrained Rosetta protocols described here can improve the accuracy of protein NMR structures and should find broad and general use in studies of protein structure and function.