Dataset Information

A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny.


ABSTRACT: The Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. In an experiment on 66 randomly selected proteins, the best of the top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved accuracies of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions had average semantic similarity scores of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as at least one EC number is available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCN-alignment-based method achieved a 93.45% similarity score when compared with Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationships among species.
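The abstract does not spell out how the network is built or how neighbor-counting scores are computed, so the following is only a minimal sketch under stated assumptions: two domains are linked when they occur in the same protein, and a target domain's candidate terms are ranked by how many of its network neighbors carry each term. The function names, the toy proteins, and the placeholder term labels (`GO:A`, `GO:B`, `GO:C`) are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dcn(protein_domains):
    """Build adjacency sets: link two domains if they co-occur in one protein.

    Assumption: this is one plausible reading of "domain co-occurrence";
    the paper's exact construction may differ.
    """
    neighbors = defaultdict(set)
    for domains in protein_domains.values():
        for a, b in combinations(sorted(set(domains)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_terms(target, neighbors, domain_terms, top_k=3):
    """Rank terms by the number of neighbors of `target` annotated with them."""
    counts = Counter()
    for nb in neighbors.get(target, ()):
        # Count each term at most once per neighboring domain.
        counts.update(set(domain_terms.get(nb, ())))
    # Sort by descending count, then by term label for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy, made-up data: protein -> domains, and domain -> known terms.
protein_domains = {
    "P1": ["D1", "D2"],
    "P2": ["D1", "D3"],
    "P3": ["D1", "D4"],
    "P4": ["D2", "D3"],
}
domain_terms = {
    "D2": ["GO:A", "GO:B"],
    "D3": ["GO:A"],
    "D4": ["GO:C"],
}

dcn = build_dcn(protein_domains)
top3 = predict_terms("D1", dcn, domain_terms)
print(top3)
```

Here the unannotated domain `D1` inherits candidate terms from `D2`, `D3`, and `D4`. Returning the top 3 mirrors the "best of top 3" evaluation mentioned in the abstract, but the scoring used in the paper (including the aggregated, χ², and SVM-based variants) is not reproduced here.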

SUBMITTER: Wang Z 

PROVIDER: S-EPMC3063783 | biostudies-literature | 2011 Mar

REPOSITORIES: biostudies-literature

Publications

A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny.

Wang Z, Zhang XC, Le MH, Xu D, Stacey G, Cheng J

PLoS ONE, 2011 Mar 24, issue 3


Similar Datasets

S-EPMC4150328 | biostudies-literature
S-EPMC6567618 | biostudies-literature
S-EPMC524420 | biostudies-literature
S-EPMC5766236 | biostudies-literature
S-EPMC5877883 | biostudies-other
S-EPMC4357223 | biostudies-literature
S-EPMC3248385 | biostudies-other
S-EPMC6854734 | biostudies-literature
S-EPMC552930 | biostudies-literature
S-EPMC1794579 | biostudies-literature