
Dataset Information


Predicting human protein subcellular localization by heterogeneous and comprehensive approaches.


ABSTRACT: Drug development and the investigation of protein function both require an understanding of protein subcellular localization. We developed a system, REALoc, that predicts the subcellular localization of singleplex and multiplex human proteins. The system follows a comprehensive strategy: it consists of two heterogeneous systematic frameworks that integrate one-to-one and many-to-many machine learning methods. It uses sequence-based features (amino acid composition, surface accessibility, weighted sign aa index, and sequence similarity profile) together with gene ontology function-based features. REALoc predicts localization to six subcellular compartments: cell membrane, cytoplasm, endoplasmic reticulum/Golgi, mitochondrion, nucleus, and extracellular. REALoc yielded a 75.3% absolute true success rate in five-fold cross-validation and a 57.1% absolute true success rate in an independent database test, which was >10% higher than six other prediction systems. Lastly, we analyzed the effects of the Vote and GANN models on singleplex and multiplex localization prediction efficacy. REALoc is freely available at http://predictor.nchu.edu.tw/REALoc.
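Two terms in the abstract can be illustrated with a short sketch: the amino acid composition feature and the absolute true success rate. The code below is not REALoc's implementation. It assumes that amino acid composition means the relative frequency of each of the 20 standard residues, and that "absolute true" means the fraction of proteins whose predicted set of compartments exactly matches the annotated set, as the term is commonly used for multi-label prediction. The function names and example labels are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of relative residue frequencies.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def absolute_true_rate(true_sets, predicted_sets):
    """Fraction of proteins whose predicted label set equals the true label set.

    A multiplex protein counts as correct only if every compartment is
    predicted and no extra compartment is added.
    """
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true_sets and predicted_sets must have the same length")
    if not true_sets:
        return 0.0
    exact = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return exact / len(true_sets)


if __name__ == "__main__":
    # Hypothetical labels: one singleplex and one multiplex protein.
    truth = [{"nucleus"}, {"cytoplasm", "nucleus"}]
    predicted = [{"nucleus"}, {"nucleus"}]
    print(absolute_true_rate(truth, predicted))
```

Under this definition a partially correct prediction for a multiplex protein scores zero, which is why absolute true rates are typically lower than per-label accuracy.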

SUBMITTER: Tung CH 

PROVIDER: S-EPMC5489166 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

Predicting human protein subcellular localization by heterogeneous and comprehensive approaches.

Tung Chi-Hua, Chen Chi-Wei, Sun Han-Hao, Chu Yen-Wei

PLoS ONE, 2017-06-28, issue 6



Similar Datasets

| S-EPMC3050600 | biostudies-literature
| S-EPMC524420 | biostudies-literature
| S-EPMC5001230 | biostudies-literature
| S-EPMC3314587 | biostudies-literature
| S-EPMC1289393 | biostudies-literature
| S-EPMC7604748 | biostudies-literature
| S-EPMC2896088 | biostudies-literature
| S-EPMC3374840 | biostudies-literature
| S-EPMC2612013 | biostudies-literature
| S-EPMC2893129 | biostudies-literature