Project description:HIV type 1 (HIV-1) is characterized by its rapid genetic evolution, leading to challenges in anti-HIV therapy. However, the sequence variations in HIV-1 proteins are not randomly distributed due to a combination of functional constraints and genetic drift. In this study, we examined patterns of sequence variability for evidence of linked sequence changes (termed coevolution or covariation) in 15 HIV-1 proteins. We show that the percentage of charged residues among the coevolving residues is significantly higher than that across all the HIV-1 proteins. Most of the coevolving residues are spatially proximal in the protein structures and tend to form relatively compact and independent units in the tertiary structures, termed "protein sectors". These protein sectors are closely associated with anti-HIV drug resistance, T cell epitopes, and antibody binding sites. Finally, we explored candidate peptide inhibitors based on the protein sectors. Our results can establish an association between the coevolving residues and the molecular functions of HIV-1 proteins, and thereby provide valuable knowledge of HIV-1 pathology and of therapeutics development.
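As a rough illustration of how linked sequence changes between two alignment positions can be scored, the sketch below computes the mutual information between two columns of a multiple sequence alignment. The abstract does not state which covariation measure was used, so mutual information is swapped in here only as one common choice; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Return the residues found at one position of an aligned sequence set."""
    return [sequence[index] for sequence in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Assumption: plain, uncorrected mutual information; no sequence weighting,
    gap handling, or background correction is applied.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to count * n / (count_a * count_b)
        mi += (count / n) * log2(count * n / (freq_a[a] * freq_b[b]))
    return mi


# Hypothetical toy alignment: positions 0 and 2 swap charge together,
# position 1 is invariant.
toy_alignment = ["KAE", "KAE", "EAK", "EAK"]

mi_coupled = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
mi_invariant = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
```

In this toy case the two charge-swapping positions carry one bit of shared information, while the invariant position carries none. Real analyses typically correct for phylogenetic bias and finite sample size, which this sketch deliberately omits.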
Project description:The packing of protein atoms is an indicator of their stability and functionality, and is applied in determining thermostability, in protein design, in ligand binding, and in identifying flexible regions in proteins. Here, we present Voronoia, a database of atomic-scale packing data for protein 3D structures. It is based on an improved Voronoi Cell algorithm using hyperboloid interfaces to construct atomic volumes, and to resolve solvent-accessible and -inaccessible regions of atoms. The database contains atomic volumes, local packing densities and interior cavities calculated for 61 318 biological units from the PDB. A report for each structure summarizes the packing by residue and atom types, and lists the environment of interior cavities. The packing data are compared to a nonredundant set of structures from SCOP superfamilies. Both packing densities and cavities can be visualized in the 3D structures by the Jmol plugin. Additionally, PDB files can be submitted to the Voronoia server for calculation. This service performs calculations for most full-atomic protein structures within a few minutes. For batch jobs, a standalone version of the program with an optional PyMOL plugin is available for download. The database can be freely accessed at: http://bioinformatics.charite.de/voronoia.
Project description:Background: Protein secondary structure can be regarded as an information bridge that links the primary sequence and tertiary structure. Accurate 8-state secondary structure prediction can give significantly more precise, higher-resolution analysis of structure-based properties. Results: We present a novel deep learning architecture which exploits an integrative synergy of prediction by a convolutional neural network, residual network, and bidirectional recurrent neural network to improve the performance of protein secondary structure prediction. A local block composed of convolutional filters and the original input is designed for capturing local sequence features. The subsequent bidirectional recurrent neural network, consisting of gated recurrent units, can capture global context features. Furthermore, the residual network can improve the information flow between the hidden layers and the cascaded recurrent neural network. Our proposed deep network achieved 71.4% accuracy on the benchmark CB513 dataset for 8-state prediction, and ensemble learning with our model achieved 74% accuracy. Our model's generalization capability is also evaluated on three other independent datasets, CASP10, CASP11 and CASP12, for both 8- and 3-state prediction. These prediction performances are superior to those of state-of-the-art methods. Conclusion: Our experiment demonstrates that it is a valuable method for predicting protein secondary structure, and that capturing local and global features concurrently is very useful in deep learning.
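To make the 8-state and 3-state accuracy figures concrete, here is a minimal sketch of per-residue accuracy (often called Q8 and Q3) on a pair of secondary structure strings. The 8-to-3 reduction used below (H, G, I to helix; E, B to strand; everything else to coil) is one widely used convention and is an assumption here, since the abstract does not say which reduction was applied. The example strings are hypothetical.

```python
# Assumed reduction from DSSP-style 8-state labels to 3 states.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helical states
    "E": "E", "B": "E",             # strand states
    "T": "C", "S": "C", "L": "C",   # turn, bend, loop/other
}


def reduce_to_three(states):
    """Map an 8-state label string onto the 3-state alphabet H/E/C."""
    return "".join(EIGHT_TO_THREE[s] for s in states)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels for a ten-residue fragment.
observed = "HHHGTEEBLS"
predicted = "HHHHTEEELL"

q8 = accuracy(predicted, observed)
q3 = accuracy(reduce_to_three(predicted), reduce_to_three(observed))
```

The example shows why 3-state scores are usually higher than 8-state scores for the same prediction: confusions within a class (for instance G predicted as H) count as errors only in the 8-state setting.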
Project description:A comparative analysis of cavities enclosed in the tertiary structure of proteins and in interfaces formed by the interaction of two protein subunits in obligate and non-obligate categories (represented by homodimeric molecules and heterocomplexes, respectively) is presented. The total volume of cavities increases with the size of the protein (or the interface), though the exact relationship may vary in different cases. Likewise, for individual cavities there is also a quantitative dependence of the volume on the number of atoms (or residues) lining the cavity. The larger cavities tend to be less spherical and to be solvated, and the interfaces are enriched in these. On average, 15 Å³ of cavity volume is found to accommodate a single water molecule, with another 40-45 Å³ needed for each additional solvent molecule. Polar atoms/residues have a higher propensity to line solvated cavities. Relative to the frequency of occurrence in the whole structure (or interface), residues in beta-strands are found more often lining the cavities, and those in turns and loops the least. Any depression in one chain not complemented by a protrusion in the other results in a cavity in the protein-protein interface. Through the use of the Voronoi volume, the packing of residues involved in protein-protein interaction has been compared to that in the protein interior. For a comparable number of atoms the interface has about twice the number of cavities relative to the tertiary structure.
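The volume figures quoted above can be turned into a back-of-the-envelope estimate of how many water molecules a cavity might hold. The sketch below is only an arithmetic reading of those averages, not the authors' procedure: the function name is hypothetical, and the default of 42.5 Å³ per additional molecule is simply the midpoint of the reported 40-45 Å³ range.

```python
from math import floor


def estimate_water_count(cavity_volume, first_water=15.0, additional_water=42.5):
    """Estimate how many waters fit in a cavity of the given volume (in cubic angstroms).

    Assumptions: the first water needs `first_water` cubic angstroms and each
    further water needs `additional_water` more; both values are averages
    taken from the abstract, so the result is a coarse estimate only.
    """
    if cavity_volume < 0:
        raise ValueError("cavity volume must be non-negative")
    if cavity_volume < first_water:
        return 0
    return 1 + floor((cavity_volume - first_water) / additional_water)


# Cavities of 10, 15, 57.5 and 100 cubic angstroms.
estimates = [estimate_water_count(v) for v in (10.0, 15.0, 57.5, 100.0)]
```

With these defaults the four example cavities are estimated to hold 0, 1, 2 and 3 waters respectively. Since the abstract also notes that polar lining atoms favour solvation, volume alone should not be read as a prediction that a cavity is actually occupied.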
Project description:Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend (directly coevolving residue pairs that are distant in protein structures) to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5-15 Å range are found to be in contact in at least one homologous structure; these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins, likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them.
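The distance bins used in this analysis (up to 5 Å, 5-15 Å, and beyond 15 Å) can be sketched as a small classifier over pairs of 3D coordinates. This is an illustrative assumption-laden sketch: the abstract does not say which atoms define the inter-residue distance, so the function simply takes two points, and the bin labels and function name are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def classify_pair(coord_a, coord_b, contact_cutoff=5.0, long_range_cutoff=15.0):
    """Assign a residue pair to a distance bin (cutoffs in angstroms).

    Assumption: `coord_a` and `coord_b` are representative points for the two
    residues; the choice of atoms is left to the caller.
    """
    separation = dist(coord_a, coord_b)
    if separation <= contact_cutoff:
        return "contact"
    if separation <= long_range_cutoff:
        return "intermediate"
    return "long-range"


# Hypothetical coordinates for three coevolving pairs.
pairs = [
    ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),    # exactly 5 angstroms apart
    ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0)),   # 10 angstroms apart
    ((0.0, 0.0, 0.0), (0.0, 0.0, 20.0)),   # 20 angstroms apart
]
labels = [classify_pair(a, b) for a, b in pairs]
```

To reproduce the "in contact in at least one homologous structure" criterion, one would take the minimum separation for a pair over all available structures in the family before classifying it.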
Project description:A long-standing problem in molecular biology is the determination of a complete functional conformational landscape of proteins. This includes not only proteins' native structures, but also all their respective functional states, including functionally important intermediates. Here, we reveal a signature of functionally important states in several protein families, using direct coupling analysis, which detects residue pair coevolution of protein sequence composition. This signature is exploited in a protein structure-based model to uncover conformational diversity, including hidden functional configurations. We uncovered, with high resolution (mean ~1.9 Å rmsd for nonapo structures), different functional structural states for medium to large proteins (200-450 aa) belonging to several distinct families. The combination of direct coupling analysis and the structure-based model also predicts several intermediates or hidden states that are of functional importance. This enhanced sampling is broadly applicable and has direct implications in protein structure determination and the design of ligands or drugs to trap intermediate states.
Project description:We introduce a numerical scheme to evolve functional elastic materials that can accomplish a specified mechanical task. In this scheme, the number of solutions, their spatial architectures, and the correlations among them can be computed. As an example, we consider an "allosteric" task, which requires the material to respond specifically to a stimulus at a distant active site. We find that functioning materials evolve a less-constrained trumpet-shaped region connecting the stimulus and active sites, and that the amplitude of the elastic response varies nonmonotonically along the trumpet. As previously shown for some proteins, we find that correlations appearing during evolution alone are sufficient to identify key aspects of this design. Finally, we show that the success of this architecture stems from the emergence of soft edge modes recently found to appear near the surface of marginally connected materials. Overall, our in silico evolution experiment offers a window to study the relationship between structure, function, and correlations emerging during evolution.
Project description:Background: Protein function is determined by many factors, namely its constitution, spatial arrangement, and dynamic behavior. Studying these factors helps biochemists and biologists to better understand protein behavior and to design proteins with modified properties. One of the most common approaches to these studies is to compare the protein structure with other molecules and to reveal similarities and differences in their polypeptide chains. Results: We support the comparison process by proposing a new visualization technique that bridges the gap between traditionally used 1D and 3D representations. By introducing information about the mutual positions of protein chains into the 1D sequential representation, users are able to observe the spatial differences between the proteins without the occlusion commonly present in a 3D view. Our representation is designed primarily for comparison of multiple proteins or of a set of time steps of a molecular dynamics simulation. Conclusions: The novel representation is demonstrated on two usage scenarios. The first scenario aims to compare a set of proteins from the family of cytochromes P450, where the position of the secondary structures has a significant impact on the substrate channeling. The second scenario focuses on protein flexibility, where comparing a set of time steps with our representation helps to reveal the most dynamically changing parts of the protein chain.
Project description:Background: Mapping protein primary sequences to their three-dimensional folds, referred to as the 'second genetic code', remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results: In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three-residue clique, was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions: Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry.
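The three-residue clique mentioned above is, in graph terms, a triangle in the contact network: three residues that are all pairwise in contact. The sketch below enumerates such triangles from a list of contacts. It assumes the contacts have already been derived (the study uses surface complementarity and overlap, which is not reproduced here), and the function name and residue numbers are hypothetical.

```python
from itertools import combinations


def three_residue_cliques(contacts):
    """Return all residue triplets that are mutually in contact.

    `contacts` is an iterable of (residue_i, residue_j) pairs treated as
    undirected edges; self-contacts are ignored.
    """
    neighbours = {}
    for a, b in contacts:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    cliques = set()
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            # Any residue adjacent to both a and b closes a triangle.
            for c in neighbours[a] & neighbours[b]:
                cliques.add(tuple(sorted((a, b, c))))
    return sorted(cliques)


# Hypothetical contact list keyed by residue number.
toy_contacts = [(12, 45), (45, 78), (12, 78), (78, 90), (90, 101), (78, 101)]
triplets = three_residue_cliques(toy_contacts)
```

Here two triangles share residue 78, which mirrors the abstract's observation that small motifs appear to link or coalesce into larger graphs.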
Project description:Many single-domain proteins are not only stable and water-soluble, but they also populate few to no intermediates during folding. This reduces interactions between partially folded proteins, misfolding, and aggregation, and makes the proteins tractable in biotechnological applications. Natural proteins fold thus, not necessarily only because their structures are well-suited for folding, but because their sequences optimize packing and fit their structures well. In contrast, folding experiments on the de novo designed Top7 suggest that it populates several intermediates. Additionally, in de novo protein design, where sequences are designed for natural and new non-natural structures, tens of sequences still need to be tested before success is achieved. Both these issues may be caused by the specific scaffolds used in design, i.e., some protein scaffolds may be more tolerant to packing perturbations and varied sequences. Here, we report a computational method for assessing the response of protein structures to packing perturbations. We then benchmark this method using designed proteins and find that it can identify scaffolds whose folding gets disrupted upon perturbing packing, leading to the population of intermediates. The method can also isolate regions of both natural and designed scaffolds that are sensitive to such perturbations and identify contacts which when present can rescue folding. Overall, this method can be used to identify protein scaffolds that are more amenable to whole protein design as well as to identify protein regions which are sensitive to perturbations and where further mutations should be avoided during protein engineering.