
Dataset Information


Gene function prediction using labeled and unlabeled data.


ABSTRACT: BACKGROUND: Gene function prediction can generally be formulated as a classification problem and solved with machine learning techniques. Usually, both labeled positive and labeled negative samples are needed to train the classifier. For gene function prediction, however, only positive samples are available: we know which genes have the function of interest, but it is generally unclear which genes do not have it, i.e. the negative samples. If all genes outside the target functional family are treated as negative samples, a class imbalance problem arises because only a relatively small number of genes are annotated in each family. Furthermore, false negatives among the heuristically generated negative samples may degrade the classifier. RESULTS: In this paper, we present a new technique, Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction. With the defined negative samples, it is straightforward to predict the functions of unknown genes. In addition, the AGPS algorithm can integrate various kinds of data sources to predict gene functions reliably and accurately. With one-class and two-class Support Vector Machines as the core learning algorithms, the AGPS algorithm performs well for function prediction on yeast genes. CONCLUSION: We propose a new method for defining negative samples in gene function prediction. Experimental results on yeast genes show that AGPS performs well on both training and test sets. In addition, the overlap between the prediction results and GO annotations on unknown genes also demonstrates the effectiveness of the proposed method.
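
The general idea in the abstract, learning from positive and unlabeled samples by first picking reliable negatives, can be illustrated with a minimal sketch. This is not the AGPS algorithm itself, whose details the abstract does not give. As labeled assumptions, a distance-to-centroid score stands in for the one-class SVM, a nearest-centroid rule stands in for the two-class SVM, and the function names, the `fraction` parameter, and the toy 2-D data are all hypothetical.

```python
import math
import random


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_reliable_negatives(positives, unlabeled, fraction=0.3):
    """Step 1 (stand-in for a one-class model): rank unlabeled samples by
    distance from the positive centroid and keep the farthest fraction
    as presumed negatives."""
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: distance(p, center), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


def train_two_class(positives, negatives):
    """Step 2 (stand-in for a two-class model): nearest-centroid classifier.
    Returns a function that is True when a sample looks positive."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(point):
        return distance(point, pos_c) < distance(point, neg_c)

    return predict


# Toy data (hypothetical): annotated genes cluster near (2, 2); the unlabeled
# pool mixes unannotated positives with true negatives near (-2, -2).
rng = random.Random(0)
positives = [[rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(20)]
hidden_pos = [[rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(10)]
hidden_neg = [[rng.gauss(-2.0, 0.3), rng.gauss(-2.0, 0.3)] for _ in range(40)]
unlabeled = hidden_pos + hidden_neg

negatives = select_reliable_negatives(positives, unlabeled, fraction=0.3)
predict = train_two_class(positives, negatives)
```

Selecting only the samples farthest from the positives, rather than treating the whole unlabeled pool as negative, is what keeps unannotated positives out of the negative set and limits the class imbalance the abstract describes.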

SUBMITTER: Zhao XM 

PROVIDER: S-EPMC2275242 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature


Publications

Gene function prediction using labeled and unlabeled data.

Zhao Xing-Ming, Wang Yong, Chen Luonan, Aihara Kazuyuki

BMC Bioinformatics, 2008-01-28



Similar Datasets

| S-EPMC2975410 | biostudies-literature
| S-EPMC3116449 | biostudies-literature
| S-EPMC4739180 | biostudies-literature
| S-EPMC2814200 | biostudies-literature
| S-EPMC7703756 | biostudies-literature
| S-EPMC4992543 | biostudies-literature
| S-EPMC2976573 | biostudies-literature
| S-EPMC7044318 | biostudies-literature
| S-EPMC3669044 | biostudies-literature
| S-EPMC10187222 | biostudies-literature