Ontology highlight
ABSTRACT: Background
Previous studies have noted that drug targets appear to be associated with higher-degree or higher-centrality proteins in interaction networks. These studies explicitly or tacitly make choices of different source databases, data integration strategies, representation of proteins and complexes, and data reliability assumptions. Here we examined how the use of different data integration and representation techniques, or different notions of reliability, may affect the efficacy of degree and centrality as features in drug target prediction.Results
Fifty percent of drug targets have a degree of less than nine, and ninety-five percent have a degree of less than ninety. We found that drug targets are over-represented in higher degree bins - this relationship is only seen for the consolidated interactome and it is not dependent on n-ary interaction data or its representation. Degree acts as a weak predictive feature for drug-target status and using more reliable subsets of the data does not increase this performance. However, performance does increase if only cancer-related drug targets are considered. We also note that a protein's membership in pathway records can act as a predictive feature that is better than degree and that high-centrality may be an indicator of a drug that is more likely to be withdrawn.Conclusions
These results show that protein interaction data integration and cleaning is an important consideration when incorporating network properties as predictive features for drug-target status. The provided scripts and data sets offer a starting point for further studies and cross-comparison of methods.
SUBMITTER: Mora A
PROVIDER: S-EPMC3534413 | biostudies-literature | 2012 Nov
REPOSITORIES: biostudies-literature
Mora Antonio A Donaldson Ian M IM
BMC bioinformatics 20121112
<h4>Background</h4>Previous studies have noted that drug targets appear to be associated with higher-degree or higher-centrality proteins in interaction networks. These studies explicitly or tacitly make choices of different source databases, data integration strategies, representation of proteins and complexes, and data reliability assumptions. Here we examined how the use of different data integration and representation techniques, or different notions of reliability, may affect the efficacy o ...[more]