Dataset Information

Sequence variation in ligand binding sites in proteins.

ABSTRACT:

Background

The recent explosion in the availability of complete genome sequences has led to the cataloging of tens of thousands of new proteins and putative proteins. Many of these proteins can be structurally or functionally categorized from sequence conservation alone. In contrast, little attention has been given to the meaning of poorly-conserved sites in families of proteins, which are typically assumed to be of little structural or functional importance.

Results

Recently, using statistical free energy analysis of tetratricopeptide repeat (TPR) domains, we observed that positions in contact with peptide ligands are more variable than surface positions in general. Here we show that statistical analysis of TPRs, ankyrin repeats, Cys2His2 zinc fingers and PDZ domains accurately identifies specificity-determining positions by their sequence variation. Sequence variation is measured as deviation from a neutral reference state, and we present probabilistic and information theory formalisms that improve upon recently suggested methods such as statistical free energies and sequence entropies.
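The idea of scoring each alignment position by its deviation from a neutral reference state can be illustrated with a small sketch. The code below computes two standard information-theoretic quantities for one alignment column: the Shannon entropy and the relative entropy (Kullback-Leibler divergence) against a background distribution. This is not necessarily the formalism used in the paper; the function names, the gap handling, and the uniform background over 20 amino acids are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a uniform background stands in for the "neutral reference state".
# A real analysis would substitute an empirically justified reference distribution.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_counts(alignment, col):
    """Count residues in one alignment column, ignoring gap characters."""
    return Counter(seq[col] for seq in alignment if seq[col] != "-")


def shannon_entropy(alignment, col):
    """Shannon entropy (bits) of the observed residue frequencies in a column."""
    counts = column_counts(alignment, col)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def relative_entropy(alignment, col, background=UNIFORM_BACKGROUND):
    """Relative entropy (bits) of a column with respect to a background.

    Values near zero mean the column looks like the reference state
    (highly variable); large values mean strong deviation from it.
    """
    counts = column_counts(alignment, col)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )
```

Under these assumptions, a fully conserved column has zero Shannon entropy and the largest relative entropy, while a column whose residue frequencies approach the background has relative entropy near zero. Positions of the second kind are the candidates for the "hypervariable" sites discussed here.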

Conclusion

Sequence variation has been used to identify functionally-important residues in four selected protein families. With TPRs and ankyrin repeats, protein families that bind highly diverse ligands, the effect is so pronounced that sequence "hypervariation" alone can be used to predict ligand binding sites.

SUBMITTER: Magliery TJ

PROVIDER: S-EPMC1261162 | biostudies-literature | 2005 Sep

REPOSITORIES: biostudies-literature


Publications

Sequence variation in ligand binding sites in proteins.

Magliery TJ, Regan L

BMC Bioinformatics, 2005 Sep 30


PMID: 16194281

Similar Datasets

Project description:Proteins can sample a broad landscape as they undergo conformational transition between different functional states. At the same time, as key players in almost all cellular processes, proteins are important drug targets. Considering the different conformational states of a protein is therefore central for a successful drug-design strategy. Here we introduce a novel docking protocol, termed extended-ensemble docking, pertaining to proteins that undergo large-scale (global) conformational changes during their function. In its application to multidrug ABC-transporter P-glycoprotein (Pgp), extensive non-equilibrium molecular dynamics simulations employing system-specific collective variables are first used to describe the transition cycle of the transporter. An extended set of conformations (extended ensemble) representing the full transition cycle between the inward- and the outward-facing states is then used to seed high-throughput docking calculations of known substrates, non-substrates, and modulators of the transporter. Large differences are predicted in the binding affinities to different conformations, with compounds showing stronger binding affinities to intermediate conformations compared to the starting crystal structure. Hierarchical clustering of the binding modes shows all ligands preferably bind to the large central cavity of the protein, formed at the apex of the transmembrane domain (TMD), whereas only small binding populations are observed in the previously described R and H sites present within the individual TMD leaflets. Based on the results, the central cavity is further divided into two major subsites, first preferably binding smaller substrates and high-affinity inhibitors, whereas the second one shows preference for larger substrates and low-affinity modulators. 
These central subsites along with the low-affinity interaction sites present within the individual TMD leaflets may respectively correspond to the proposed high- and low-affinity binding sites in Pgp. We propose further an optimization strategy for developing more potent inhibitors of Pgp, based on increasing its specificity to the extended ensemble of the protein, instead of using a single protein structure, as well as its selectivity for the high-affinity binding site. In contrast to earlier in silico studies using single static structures of Pgp, our results show better agreement with experimental studies, pointing to the importance of incorporating the global conformational flexibility of proteins in future drug-discovery endeavors.

Project description:Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions. We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions. Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
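The classifier input described above (the target residue plus 4 sequence neighbors on each side) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `WindowNaiveBayes`, the padding symbol `X`, the add-one smoothing, and the 21-symbol alphabet (20 amino acids plus padding) are all assumptions, and the sequence-entropy input mentioned in the description is not included.

```python
import math

PAD = "X"  # assumption: placeholder for window positions past the sequence ends


def window(seq, i, k=4):
    """Return the residue at index i with k neighbors on each side, padded."""
    return [seq[j] if 0 <= j < len(seq) else PAD for j in range(i - k, i + k + 1)]


class WindowNaiveBayes:
    """Naive Bayes over residue identities at each window offset (illustrative)."""

    def __init__(self, k=4, alpha=1.0, alphabet_size=21):
        self.k = k                      # neighbors on each side of the target
        self.alpha = alpha              # additive (Laplace) smoothing constant
        self.alphabet_size = alphabet_size
        self.class_counts = {}          # label -> number of training residues
        self.feature_counts = {}        # (label, offset, residue) -> count

    def fit(self, seqs, labels):
        """seqs: list of strings; labels: per-residue labels (1 = binding)."""
        for seq, labs in zip(seqs, labels):
            for i, lab in enumerate(labs):
                self.class_counts[lab] = self.class_counts.get(lab, 0) + 1
                for off, aa in enumerate(window(seq, i, self.k)):
                    key = (lab, off, aa)
                    self.feature_counts[key] = self.feature_counts.get(key, 0) + 1

    def log_score(self, seq, i, lab):
        """Log prior plus summed log likelihoods of the window residues."""
        n = self.class_counts[lab]
        score = math.log(n / sum(self.class_counts.values()))
        for off, aa in enumerate(window(seq, i, self.k)):
            c = self.feature_counts.get((lab, off, aa), 0)
            score += math.log((c + self.alpha) / (n + self.alpha * self.alphabet_size))
        return score

    def predict(self, seq, i):
        """Return the label with the highest log score for residue i."""
        return max(self.class_counts, key=lambda lab: self.log_score(seq, i, lab))
```

In use, `fit` would take training sequences with per-residue labels, and `predict(seq, i)` would be called for each residue of a held-out protein, for example inside a leave-one-out loop over proteins as in the evaluation described above.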

Project description:A recent clinical report has linked Streptococcus pyogenes β-lactam antibiotic resistance to mutation in the penicillin binding protein (PBP) PBP2x. To determine whether this is an isolated case or reflects a broader prevalence of mutations that might confer reduced β-lactam susceptibility, we investigated the relative frequency of PBP sequence variation within a global database of 9,667 S. pyogenes isolates. We found that mutations in S. pyogenes PBPs (PBP2x, PBP1a, PBP1b, and PBP2a) occur infrequently across this global database, with fewer than 3 amino acid changes differing between >99% of the global population. Only 4 of the 9,667 strains contained mutations near transpeptidase active sites of PBP2x or PBP1a. The reported PBP2x T553K substitution was not identified. These findings are in contrast to those of 2,520 S. pneumoniae sequences where PBP mutations are relatively frequent and are often located in key β-lactam binding pockets. These data, combined with the general lack of penicillin resistance reported in S. pyogenes worldwide, suggest that extensive, unknown constraints restrict S. pyogenes PBP sequence plasticity. Our findings imply that while heavy antibiotic pressure may select for mutations in the PBPs, there is currently no evidence of such mutations becoming fixed in the S. pyogenes population or that mutations are being sequentially acquired in the PBPs. IMPORTANCE: β-Lactam antibiotics are the first-line therapeutic option for Streptococcus pyogenes infections. Despite the global high prevalence of S. pyogenes infections and widespread use of β-lactams worldwide, reports of resistance to β-lactam antibiotics, such as penicillin, have been incredibly rare. Recently, β-lactam resistance, as defined by clinical breakpoints, was detected in two clinical S. pyogenes isolates with accompanying mutations in the active site of the penicillin binding protein PBP2x, raising concerns that β-lactam resistance will become more widespread. We screened a global database of S. pyogenes genome sequences to investigate the frequency of PBP mutations, identifying that PBP mutations are uncommon relative to those of Streptococcus pneumoniae. These findings support clinical observations that β-lactam resistance is rare in S. pyogenes and suggest that there are considerable constraints on S. pyogenes PBP sequence variation.

