Dataset Information

Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis.

ABSTRACT: BACKGROUND: Protein-protein interaction (PPI) data sets generated by high-throughput experiments are contaminated by large numbers of erroneous PPIs. Therefore, computational methods for PPI validation are necessary to improve the quality of such data sets. Against the background of the theory that most extant PPIs arose as a consequence of gene duplication, the sensitive search for homologous PPIs, i.e. for PPIs descending from a common ancestral PPI, should be a successful strategy for PPI validation. RESULTS: To validate an experimentally observed PPI, we combine FASTA and PSI-BLAST to perform a sensitive sequence-based search for pairs of interacting homologous proteins within a large, integrated PPI database. A novel scoring scheme that incorporates both quality and quantity of all observed matches allows us (1) to consider also tentative paralogs and orthologs in this analysis and (2) to combine search results from more than one homology detection method. ROC curves illustrate the high efficacy of this approach and its improvement over other homology-based validation methods. CONCLUSION: New PPIs are primarily derived from preexisting PPIs and not invented de novo. Thus, the hallmark of true PPIs is the existence of homologous PPIs. The sensitive search for homologous PPIs within a large body of known PPIs is an efficient strategy to separate biologically relevant PPIs from the many spurious PPIs reported by high-throughput experiments.
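The validation idea in the abstract can be illustrated with a minimal sketch. It assumes homolog hits have already been computed (for example by FASTA and PSI-BLAST) and normalized to similarities in [0, 1]. The function names `merge_hits` and `homology_support`, the max-based merging, and the noisy-OR combination are illustrative assumptions, not the scoring scheme published by the authors.

```python
from itertools import product
from math import prod


def merge_hits(*hit_lists):
    """Merge homolog hits from several search methods.

    Each hit list maps a homolog id to a similarity in [0, 1].
    Assumption: the best similarity per homolog is kept.
    """
    merged = {}
    for hits in hit_lists:
        for protein, sim in hits.items():
            merged[protein] = max(sim, merged.get(protein, 0.0))
    return merged


def homology_support(a, b, homologs, known_ppis):
    """Score the candidate PPI (a, b) by known interactions among its homologs.

    homologs:   dict mapping protein id -> {homolog id: similarity}
    known_ppis: set of frozensets, each holding one interacting pair
    Assumption: evidence is combined with a noisy-OR, so both the quality
    (similarity) and the quantity (number) of matches raise the score.
    """
    query = frozenset((a, b))
    evidence = []
    hits_a = homologs.get(a, {}).items()
    hits_b = homologs.get(b, {}).items()
    for (ha, sa), (hb, sb) in product(hits_a, hits_b):
        pair = frozenset((ha, hb))
        # Skip the query pair itself; count only known homologous PPIs.
        if pair != query and pair in known_ppis:
            evidence.append(min(sa, sb))
    return 1.0 - prod(1.0 - e for e in evidence)


# Hypothetical toy data: two search methods for protein A, one for protein B.
fasta_hits = {"A1": 0.9}
psiblast_hits = {"A1": 0.7, "A2": 0.5}
homologs = {
    "A": merge_hits(fasta_hits, psiblast_hits),
    "B": {"B1": 0.8, "B2": 0.5},
}
known_ppis = {frozenset(("A1", "B1")), frozenset(("A2", "B2"))}
score = homology_support("A", "B", homologs, known_ppis)
```

A candidate PPI with no interacting homolog pairs in the database scores 0.0 under this sketch, and each additional homologous PPI pushes the score toward 1.0.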

SUBMITTER: Frech C

PROVIDER: S-EPMC2637843 | biostudies-other | 2009

REPOSITORIES: biostudies-other


Publications

Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis.

Christian Frech, Michael Kommenda, Viktoria Dorfer, Thomas Kern, Helmut Hintner, Johann W. Bauer, Kamil Onder

BMC Bioinformatics, 2009-01-19


PMID: 19152684

Similar Datasets

Project description: BACKGROUND: The Apicomplexa constitute an evolutionarily divergent phylum of protozoan pathogens responsible for widespread parasitic diseases such as malaria and toxoplasmosis. Many cellular functions in these medically important organisms are controlled by protein kinases, which have emerged as promising drug targets for parasitic diseases. However, an incomplete understanding of how apicomplexan kinases structurally and mechanistically differ from their host counterparts has hindered drug development efforts to target parasite kinases. RESULTS: We used the wealth of sequence data recently made available for 15 apicomplexan species to identify the kinome of each species and quantify the evolutionary constraints imposed on each family of apicomplexan kinases. Our analysis revealed lineage-specific adaptations in selected families, namely cyclin-dependent kinase (CDK), calcium-dependent protein kinase (CDPK) and CLK/LAMMER, which have been identified as important in the pathogenesis of these organisms. Bayesian analysis of selective constraints imposed on these families identified the sequence and structural features that most distinguish apicomplexan protein kinases from their homologs in model organisms and other eukaryotes. In particular, in a subfamily of CDKs orthologous to Plasmodium falciparum crk-5, the activation loop contains a novel PTxC motif which is absent from all CDKs outside Apicomplexa. Our analysis also suggests a convergent mode of regulation in a subset of apicomplexan CDPKs and mammalian MAPKs involving a commonly conserved arginine in the αC helix. In all recognized apicomplexan CLKs, we find a set of co-conserved residues involved in substrate recognition and docking that are distinct from metazoan CLKs. CONCLUSIONS: We pinpoint key conserved residues that can be predicted to mediate functional differences from eukaryotic homologs in three identified kinase families. We discuss the structural, functional and evolutionary implications of these lineage-specific variations and propose specific hypotheses for experimental investigation. The apicomplexan-specific kinase features reported in this study can be used in the design of selective kinase inhibitors.

