
Dataset Information


Prediction of cancer proteins by integrating protein interaction, domain frequency, and domain interaction data using machine learning algorithms.


ABSTRACT: Many proteins are known to be associated with cancer, but their precise functional roles in disease pathogenesis often remain unclear. One strategy for better understanding these proteins is to combine several types of proteomics data. In this study, we extended Aragues's method by using protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performance was benchmarked in three kinds of experiments: (I) using individual algorithms, (II) combining algorithms, and (III) combining algorithms of the same classification type. Compared with Aragues's method, the proposed methods, that is, machine learning algorithms and majority voting, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm achieves a hit ratio of 89.4% on the lung cancer dataset and 72.8% on the lung cancer microarray study. We anticipate that this research could help in understanding disease mechanisms and in diagnosis.
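
The abstract mentions combining classifiers by majority voting. As a rough illustration only, and not the authors' implementation, the idea can be sketched in Python as follows. The function name `majority_vote`, the binary label convention (1 = predicted cancer protein, 0 = not), and the rule that ties go to 0 are all assumptions made for this sketch; the record does not specify them.

```python
def majority_vote(predictions):
    """Combine binary predictions from several classifiers by majority vote.

    predictions: list of lists, one inner list per classifier, each holding
    a 0/1 label per protein (hypothetical convention: 1 = cancer protein).
    Returns one 0/1 label per protein. Ties resolve to 0 (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_proteins = len(predictions[0])
    if any(len(p) != n_proteins for p in predictions):
        raise ValueError("all classifiers must label the same proteins")

    n_classifiers = len(predictions)
    combined = []
    # zip(*predictions) yields, for each protein, the votes of all classifiers.
    for votes in zip(*predictions):
        positive_votes = sum(votes)
        # Predict 1 only when strictly more than half of the classifiers agree.
        combined.append(1 if positive_votes > n_classifiers / 2 else 0)
    return combined


# Three hypothetical classifiers voting on three proteins.
example = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Using an odd number of classifiers avoids ties entirely, so the tie rule only matters when an even number of models is combined.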

SUBMITTER: Huang CH 

PROVIDER: S-EPMC4381656 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Prediction of cancer proteins by integrating protein interaction, domain frequency, and domain interaction data using machine learning algorithms.

Huang Chien-Hung, Peng Huai-Shun, Ng Ka-Lok

BioMed Research International, 2015-03-17



Similar Datasets

| S-EPMC3036623 | biostudies-literature
| S-EPMC10111190 | biostudies-literature
| S-EPMC9085875 | biostudies-literature
| S-EPMC6275787 | biostudies-literature
| S-EPMC5831789 | biostudies-literature
| S-EPMC9831019 | biostudies-literature
| S-EPMC8011785 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC11373136 | biostudies-literature
| S-EPMC4914443 | biostudies-literature