
Dataset Information


A method to improve protein subcellular localization prediction by integrating various biological data sources.


ABSTRACT: BACKGROUND: Protein subcellular localization is crucial information for elucidating protein function. Owing to the need for large-scale genome analysis, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. RESULTS: In this paper we propose a new prediction method that combines two key ideas: 1) information from neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed. CONCLUSION: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and complement, other available prediction methods.
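The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The neighbourhood score (the fraction of a protein's network edge weight going to neighbours annotated with each location), the even splitting of a multi-located neighbour's vote, the 0.3 membership threshold, and the toy network, protein IDs and locations are all hypothetical assumptions. The classifier follows the common inverse-distance fuzzy k-NN rule of Keller et al. (1985), with fuzzifier m > 1.

```python
import math


def neighbour_features(protein, network, labels, locations):
    """Hypothetical neighbourhood feature map (an assumption, not the paper's
    exact definition): for each location, the summed edge weight to network
    neighbours annotated with that location, divided by the protein's total
    edge weight. Proteins with no neighbours get an all-zero vector."""
    scores = dict.fromkeys(locations, 0.0)
    total = 0.0
    for neighbour, weight in network.get(protein, {}).items():
        total += weight
        for loc in labels.get(neighbour, ()):
            if loc in scores:
                scores[loc] += weight
    return [scores[loc] / total if total > 0 else 0.0 for loc in locations]


def fuzzy_knn(query, train_x, train_y, locations, k=3, m=2.0, eps=1e-9):
    """Inverse-distance fuzzy k-NN: each of the k nearest training proteins
    votes for its location(s) with weight 1 / d^(2/(m-1)), m > 1.
    A multi-located neighbour splits its vote evenly (an assumption).
    Every training label must appear in `locations`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(query, pair[0]))[:k]
    membership = dict.fromkeys(locations, 0.0)
    weight_sum = 0.0
    for x, y in nearest:
        w = 1.0 / (math.dist(query, x) ** (2.0 / (m - 1.0)) + eps)
        weight_sum += w
        for loc in y:
            membership[loc] += w / len(y)
    # Normalise so memberships across locations sum to 1.
    return {loc: v / weight_sum for loc, v in membership.items()}


def predict_locations(membership, threshold=0.3):
    """Report every location whose membership reaches the (hypothetical)
    threshold, so one protein can be assigned to several sites."""
    return [loc for loc, v in membership.items() if v >= threshold]


# Toy data (hypothetical): weighted functional links and known locations.
locations = ["nucleus", "mitochondrion"]
labels = {"N1": {"nucleus"}, "N2": {"nucleus"}, "N3": {"nucleus"},
          "M1": {"mitochondrion"}, "M2": {"mitochondrion"}}
network = {
    "N1": {"N2": 0.9, "N3": 0.8},
    "N2": {"N1": 0.9, "N3": 0.7},
    "N3": {"N1": 0.8, "N2": 0.7, "M1": 0.1},
    "M1": {"M2": 0.9, "N3": 0.1},
    "M2": {"M1": 0.9},
    "Q": {"N1": 0.6, "N2": 0.5, "M2": 0.1},  # unlabelled query protein
}

train = sorted(labels)
train_x = [neighbour_features(p, network, labels, locations) for p in train]
train_y = [labels[p] for p in train]
mem = fuzzy_knn(neighbour_features("Q", network, labels, locations),
                train_x, train_y, locations)
print(predict_locations(mem))  # ['nucleus']
```

Because the memberships are graded rather than a single winning class, thresholding them lets one protein be reported at several sites. In practice, scores like these would enrich, not replace, the other prediction features.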

SUBMITTER: Tung TQ 

PROVIDER: S-EPMC2648781 | biostudies-literature | 2009

REPOSITORIES: biostudies-literature


Publications

A method to improve protein subcellular localization prediction by integrating various biological data sources.

Tung Thai Quang, Lee Doheon

BMC Bioinformatics, 2009-01-30



Similar Datasets

| S-EPMC3000424 | biostudies-literature
| S-EPMC2745392 | biostudies-literature
| S-EPMC3584913 | biostudies-literature
| S-EPMC2582614 | biostudies-literature
| S-EPMC2176073 | biostudies-literature
| S-EPMC3236839 | biostudies-literature
| S-EPMC4867227 | biostudies-literature
| S-EPMC7214030 | biostudies-literature
| S-EPMC5496887 | biostudies-other
| S-EPMC7764902 | biostudies-literature