Using physical features of protein core packing to distinguish real proteins from decoys.
ABSTRACT: The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Project description:Background: Setting the rules for the identification of a stable conformation of a protein is of utmost importance for the efficient generation of structures in computer simulation. For structure prediction, a considerable number of possible models are generated from which the best model has to be selected. Results: Two scoring functions, Rs and Rp, based on the consideration of packing of residues, which indicate if the conformation of an amino acid sequence is native-like, are presented. These are defined using the solvent accessible surface area (ASA) and the partner number (PN) (the number of other residues that are within 4.5 Å) of a particular residue. The two functions evaluate the deviation from the average packing properties (ASA or PN) of all residues in a polypeptide chain corresponding to a model of its three-dimensional structure. While simple in concept and computationally less intensive, both functions are at least as efficient as any other energy functions in discriminating the native structure from decoys in a large number of standard decoy sets, as well as on models submitted for the targets of CASP7. Rs appears to be slightly more effective than Rp, as determined by the number of times the native structure possesses the minimum value for the function and its separation from the average value for the decoys. Conclusion: Two parameters, Rs and Rp, are discussed that can very efficiently recognize the native fold for a sequence from an ensemble of decoy structures. Unlike many other algorithms that rely on the use of a composite scoring function, these are based on a single parameter, viz., the accessible surface area (or the number of residues in contact), but are still able to capture the essential attribute of the native fold.
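The partner number described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that two residues are partners when any pair of their atoms lies within the 4.5 Å cutoff, and `packing_deviation` is a hypothetical stand-in for Rp/Rs (a root-mean-square deviation from per-residue-type reference averages), since the abstract does not give the exact functional form. The names `partner_numbers`, `packing_deviation`, and `reference_means` are illustrative.

```python
import math

CUTOFF = 4.5  # contact distance in angstroms, as quoted in the abstract


def partner_numbers(residues, cutoff=CUTOFF):
    """Count, for each residue, the other residues in contact with it.

    residues: list with one entry per residue, each a list of (x, y, z)
    atom coordinates in angstroms. Two residues are treated as partners
    if any pair of their atoms is within `cutoff` (an assumption).
    """
    n = len(residues)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            in_contact = any(
                math.dist(a, b) <= cutoff
                for a in residues[i]
                for b in residues[j]
            )
            if in_contact:
                counts[i] += 1
                counts[j] += 1
    return counts


def packing_deviation(values, residue_types, reference_means):
    """Hypothetical score: RMS deviation of per-residue values (PN or ASA)
    from reference averages for each residue type. Not the published Rp/Rs.
    """
    squared = [
        (v - reference_means[t]) ** 2
        for v, t in zip(values, residue_types)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Under this sketch, a lower `packing_deviation` would indicate a model whose packing is closer to the reference averages; the reference averages themselves would have to come from a set of experimentally determined structures.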
Project description:The recent structural elucidation of about one dozen channels (in which we include transporters) has provided further evidence that these membrane proteins typically undergo large movements during their function. However, it is still not well understood how these proteins achieve the necessary trade-off between stability and mobility. To identify specific structural properties of channels, we compared the helix-packing and hydrogen-bonding patterns of channels with those of membrane coils; the latter is a class of membrane proteins whose structures are expected to be more rigid. We describe in detail how in channels, helix pairs are usually arranged in packing motifs with large crossing angles (|τ| ≈ 40°), where the (small) side chains point away from the packing core and the backbones of the two helices are in close contact. We found that this contributes to a significant enrichment of Cα-H···O bonds and to a packing geometry where right-handed parallel (τ = -40° ± 10°) and antiparallel (τ = +140° ± 25°) arrangements are equally preferred. By sharp contrast, the interdigitation and hydrogen bonding of side chains in helix pairs of membrane coils result in narrowly distributed left-handed antiparallel arrangements with crossing angles τ = -160° ± 10° (|τ| ≈ 20°). In addition, we show that these different helix-packing modes of the two types of membrane proteins correspond to specific hydrogen-bonding patterns. In particular, in channels, three times as many of the hydrogen-bonded helix pairs are found in parallel right-handed motifs as are non-hydrogen-bonded helix pairs. Finally, we discuss how the presence of weak hydrogen bonds, water-containing cavities, and right-handed crossing angles may facilitate the required conformational flexibility between helix pairs of channels while maintaining sufficient structural stability.
Project description:Cellular protein interaction networks exhibit sigmoidal input-output relationships with thresholds and steep responses (i.e., ultrasensitivity). Although cooperativity can be a source of ultrasensitivity, we examined whether the presence of "decoy" binding sites that are not coupled to activation could also lead to this effect. To systematically vary key parameters of the system, we designed a synthetic regulatory system consisting of an autoinhibited PDZ domain coupled to an activating SH3 domain binding site. In the absence of a decoy binding site, this system is non-ultrasensitive, as predicted by modeling of this system. Addition of a high-affinity decoy site adds a threshold, but the response is not ultrasensitive. We found that sigmoidal activation profiles can be generated utilizing multiple decoys with mixtures of high and low affinities, where high affinity decoys act to set the threshold and low affinity decoys ensure a sigmoidal response. Placing the synthetic decoy system in a mitotic spindle orientation cell culture system thresholds this physiological activity. Thus, simple combinations of non-activating binding sites can lead to complex regulatory responses in protein interaction networks.
Project description:We compare side chain prediction and packing of core and non-core regions of soluble proteins, protein-protein interfaces, and transmembrane proteins. We first identified or created comparable databases of high-resolution crystal structures of these 3 protein classes. We show that the solvent-inaccessible cores of the 3 classes of proteins are equally densely packed. As a result, the side chains of core residues at protein-protein interfaces and in the membrane-exposed regions of transmembrane proteins can be predicted by the hard-sphere plus stereochemical constraint model with the same high prediction accuracies (>90%) as core residues in soluble proteins. We also find that for all 3 classes of proteins, as one moves away from the solvent-inaccessible core, the packing fraction decreases as the solvent accessibility increases. However, the side chain predictability remains high (80% within 30°) up to a relative solvent accessibility, rSASA ≲ 0.3, for all 3 protein classes. Our results show that ≈40% of the interface regions in protein complexes are "core", that is, densely packed with side chain conformations that can be accurately predicted using the hard-sphere model. We propose packing fraction as a metric that can be used to distinguish real protein-protein interactions from designed, non-binding, decoys. Our results also show that cores of membrane proteins are the same as cores of soluble proteins. Thus, the computational methods we are developing for the analysis of the effect of hydrophobic core mutations in soluble proteins will be equally applicable to analyses of mutations in membrane proteins.
Project description:A major driving force for water-soluble protein folding is the hydrophobic effect, but membrane proteins cannot make use of this stabilizing contribution in the apolar core of the bilayer. It has been proposed that membrane proteins compensate by packing more efficiently. We therefore investigated packing contributions experimentally by observing the energetic and structural consequences of cavity creating mutations in the core of a membrane protein. We observed little difference in the packing energetics of water and membrane soluble proteins. Our results imply that other mechanisms are employed to stabilize the structure of membrane proteins.
Project description:A critical step in the folding pathway of globular proteins is the formation of a tightly packed hydrophobic core. Several mutational studies have addressed the question of whether tight packing interactions are present during the rate-limiting step of folding. In some of these investigations, substituted side chains have been assumed to form native-like interactions in the transition state when the folding rates of mutant proteins correlate with their native-state stabilities. Alternatively, it has been argued that side chains participate in nonspecific hydrophobic collapse when the folding rates of mutant proteins correlate with side-chain hydrophobicity. In a reanalysis of published data, we have found that folding rates often correlate similarly well, or poorly, with both native-state stability and side-chain hydrophobicity, and it is therefore not possible to select an appropriate transition state model based on these one-parameter correlations. We show that this ambiguity can be resolved using a two-parameter model in which side chain burial and the formation of all other native-like interactions can occur asynchronously. Notably, the model agrees well with experimental data, even for positions where the one-parameter correlations are poor. We find that many side chains experience a previously unrecognized type of transition state environment in which specific, native-like interactions are formed, but hydrophobic burial dominates. Implications of these results for the design and analysis of protein folding studies are discussed.
Project description:Background: Prediction of protein solvent accessibility, also called accessible surface area (ASA) prediction, is an important step for tertiary structure prediction directly from one-dimensional sequences. Traditionally, predicting solvent accessibility is regarded as either a two- (exposed or buried) or three-state (exposed, intermediate or buried) classification problem. However, the states of solvent accessibility are not well-defined in real protein structures. Thus, a number of methods have been developed to directly predict the real-value ASA based on evolutionary information such as the position-specific scoring matrix (PSSM). Results: This study enhances the PSSM-based features for real-value ASA prediction by considering the physicochemical properties and solvent propensities of amino acid types. We propose a systematic method for identifying residue groups with respect to protein solvent accessibility. The amino acid columns in the PSSM profile that belong to a certain residue group are merged to generate novel features. Finally, support vector regression (SVR) is adopted to construct a real-value ASA predictor. Experimental results demonstrate that the features produced by the proposed selection process are informative for ASA prediction. Conclusion: Experimental results based on a widely used benchmark reveal that the proposed method performs best among several existing packages for ASA prediction. Furthermore, the feature selection mechanism incorporated in this study can be applied to other regression problems using the PSSM. The program and data are available from the authors upon request.
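The column-merging step described above can be illustrated with a short Python sketch. It rests on several assumptions not stated in the abstract: the PSSM columns follow the common `ARNDCQEGHILKMFPSTWYV` order, "merging" is taken to mean summing the scores of the columns in a group, and the example groups are placeholders, not the residue groups identified in the study. The name `merge_pssm_columns` is illustrative.

```python
# Column order commonly used in PSSM files; assumed here, not taken from the abstract.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def merge_pssm_columns(pssm_rows, groups):
    """Merge PSSM columns by residue group.

    pssm_rows: list of 20-element score lists, one per sequence position.
    groups: dict mapping a group name to a string of one-letter codes.
    Returns one list per position holding the summed score of each group,
    in the order the groups are given. Summation is an assumption; an
    average or another reduction could be substituted.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    merged = []
    for row in pssm_rows:
        merged.append(
            [sum(row[index[aa]] for aa in members) for members in groups.values()]
        )
    return merged


# Placeholder groups for illustration only.
example_groups = {"hydrophobic": "AVLI", "charged": "DEKR"}
```

The merged per-position vectors would then serve as input features for a regression model such as SVR, alongside or in place of the raw 20 PSSM columns.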
Project description:The packing of helices spanning lipid bilayers is crucial for the stability and function of alpha-helical membrane proteins. Using a modified Voronoi procedure, we calculated packing densities for helix-helix contacts in membrane-spanning domains. Our results show that the transmembrane helices of protein channels and transporters are significantly more loosely packed than helices in globular proteins. The observed packing deficiencies of these membrane proteins are also reflected by a larger number of cavities at functionally important sites. The cavities positioned along the gated pores of membrane channels and transporters are noticeably lined by polar amino acids that should be exposed to the aqueous medium when the protein is in the open state. In contrast, nonpolar amino acids surround the cavities in those protein regions where large rearrangements are supposed to take place, as near the hinge regions of transporters or at restriction sites of protein channels. We presume that the observed deficiencies of helix-helix packing are essential for the helical mobility that sustains the function of many membrane protein channels and transporters.
Project description:We present a novel method called RosettaHoles for visual and quantitative assessment of underpacking in the protein core. RosettaHoles generates a set of spherical cavity balls that fill the empty volume between atoms in the protein interior. For visualization, the cavity balls are aggregated into contiguous overlapping clusters and small cavities are discarded, leaving an uncluttered representation of the unfilled regions of space in a structure. For quantitative analysis, the cavity ball data are used to estimate the probability of observing a given cavity in a high-resolution crystal structure. RosettaHoles provides excellent discrimination between real and computationally generated structures, is predictive of incorrect regions in models, identifies problematic structures in the Protein Data Bank, and promises to be a useful validation tool for newly solved experimental structures.
Project description:Structural flexibility is an essential attribute, without which few proteins could carry out their biological functions. Much information about protein flexibility has come from x-ray crystallography, in the form of atomic mean-square displacements (AMSDs) or B factors. Profiles showing the AMSD variation along the polypeptide chain are usually interpreted in dynamical terms but are ultimately governed by the local features of a highly complex energy landscape. Here, we bypass this complexity by showing that the AMSD profile is essentially determined by spatial variations in local packing density. On the basis of elementary statistical mechanics and generic features of atomic distributions in proteins, we predict a direct inverse proportionality between the AMSD and the contact density, i.e., the number of noncovalent neighbor atoms within a local region of approximately 1.5 nm³ volume. Testing this local density model against a set of high-quality crystal structures of 38 nonhomologous proteins, we find that it accurately and consistently reproduces the prominent peaks in the AMSD profile and even captures minor features, such as the periodic AMSD variation within alpha helices. The predicted rigidifying effect of crystal contacts also agrees with experimental data. With regard to accuracy and computational efficiency, the model is clearly superior to its predecessors. The quantitative link between flexibility and packing density found here implies that AMSDs provide little independent information beyond that contained in the mean atomic coordinates.
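The inverse relation between AMSD and contact density lends itself to a small Python sketch. The following is illustrative only and makes assumptions the abstract does not state: the local region is modeled as a sphere whose volume is 1.5 nm³ (radius about 0.71 nm), covalently bonded pairs are supplied by the caller through a hypothetical `excluded` set, and the proportionality constant is left out, so `relative_amsd` returns values on an arbitrary scale.

```python
import math

REGION_VOLUME_NM3 = 1.5  # local region volume quoted in the abstract
# Radius of a sphere with that volume (the spherical shape is an assumption): ~0.71 nm.
RADIUS_NM = (3 * REGION_VOLUME_NM3 / (4 * math.pi)) ** (1 / 3)


def contact_density(coords, radius=RADIUS_NM, excluded=None):
    """Count neighbor atoms within `radius` of each atom.

    coords: list of (x, y, z) atom positions in nanometers.
    excluded: optional set of (i, j) index pairs with i < j that should
    not be counted, e.g. covalently bonded atoms (hypothetical input).
    """
    excluded = excluded or set()
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in excluded:
                continue
            if math.dist(coords[i], coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts


def relative_amsd(counts):
    """AMSD up to an unknown constant, taken as 1 / contact density.

    Atoms with no counted neighbors are assigned infinity.
    """
    return [1.0 / c if c > 0 else math.inf for c in counts]
```

In this form, atoms in densely packed regions receive small relative AMSD values and sparsely surrounded atoms receive large ones; fitting the missing constant would require experimental B factors.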