
Dataset Information


Filtering high-throughput protein-protein interaction data using a combination of genomic features.


ABSTRACT: BACKGROUND: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. RESULTS: In this study, we use a combination of 3 genomic features -- structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology -- as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/. CONCLUSION: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.
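
The likelihood-ratio idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a naive Bayes simplification (the three features are treated as conditionally independent, so their ratios multiply), binary feature values, a pseudocount of 1, and a cutoff of 1.0. The function names and the toy gold-standard sets below are hypothetical and exist only to show the mechanics.

```python
from math import prod

# Binary evidence per interaction, in this order (1 = feature supports it).
FEATURES = ("pfam", "go", "homology")


def feature_likelihood_ratios(positives, negatives, pseudocount=1.0):
    """Return LR = P(value | true) / P(value | false) per feature value.

    The pseudocount is an assumption that keeps every ratio finite.
    """
    ratios = {}
    for i, name in enumerate(FEATURES):
        for value in (0, 1):
            pos_hits = sum(1 for x in positives if x[i] == value)
            neg_hits = sum(1 for x in negatives if x[i] == value)
            p_pos = (pos_hits + pseudocount) / (len(positives) + 2 * pseudocount)
            p_neg = (neg_hits + pseudocount) / (len(negatives) + 2 * pseudocount)
            ratios[(name, value)] = p_pos / p_neg
    return ratios


def combined_lr(evidence, ratios):
    """Multiply per-feature ratios (naive independence assumption)."""
    return prod(ratios[(name, v)] for name, v in zip(FEATURES, evidence))


def sensitivity_specificity(positives, negatives, ratios, cutoff=1.0):
    """Fraction of gold positives above, and gold negatives at or below, the cutoff."""
    tp = sum(combined_lr(x, ratios) > cutoff for x in positives)
    tn = sum(combined_lr(x, ratios) <= cutoff for x in negatives)
    return tp / len(positives), tn / len(negatives)


# Toy gold-standard sets (made up for illustration, not data from the study).
toy_positives = [(1, 1, 0), (1, 1, 1), (0, 1, 0), (0, 0, 0)]
toy_negatives = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]

toy_ratios = feature_likelihood_ratios(toy_positives, toy_negatives)
toy_sens, toy_spec = sensitivity_specificity(toy_positives, toy_negatives, toy_ratios)
```

In this sketch an interaction supported by more features receives a larger combined ratio, which mirrors the abstract's statement that supported interactions are more likely to be real. The sensitivity and specificity on the toy sets are unrelated to the 90% and 63% reported in the abstract.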

SUBMITTER: Patil A 

PROVIDER: S-EPMC1127019 | biostudies-literature | 2005 Apr

REPOSITORIES: biostudies-literature


Publications

Filtering high-throughput protein-protein interaction data using a combination of genomic features.

Patil Ashwini, Nakamura Haruki

BMC Bioinformatics, 2005-04-18



Similar Datasets

S-EPMC4762622 | biostudies-literature
S-EPMC3509493 | biostudies-literature
S-EPMC3740625 | biostudies-literature
S-EPMC2621150 | biostudies-literature
S-EPMC4666383 | biostudies-literature
S-EPMC2982151 | biostudies-literature
S-EPMC1766414 | biostudies-literature
S-EPMC8435229 | biostudies-literature
S-EPMC4335499 | biostudies-literature