Unknown

Dataset Information

0

Semi-supervised learning improves gene expression-based prediction of cancer recurrence.


ABSTRACT: Gene expression profiling has shown great potential in outcome prediction for different types of cancers. Nevertheless, small sample size remains a bottleneck in obtaining robust and accurate classifiers. Traditional supervised learning techniques can only work with labeled data. Consequently, a large number of microarray data that do not have sufficient follow-up information are disregarded. To fully leverage all of the precious data in public databases, we turned to a semi-supervised learning technique, low density separation (LDS).Using a clinically important question of predicting recurrence risk in colorectal cancer patients, we demonstrated that (i) semi-supervised classification improved prediction accuracy as compared with the state of the art supervised method SVM, (ii) performance gain increased with the number of unlabeled samples, (iii) unlabeled data from different institutes could be employed after appropriate processing and (iv) the LDS method is robust with regard to the number of input features. To test the general applicability of this semi-supervised method, we further applied LDS on human breast cancer datasets and also observed superior performance. Our results demonstrated great potential of semi-supervised learning in gene expression-based outcome prediction for cancer patients.bing.zhang@vanderbilt.edu.Supplementary data are available at Bioinformatics online.

SUBMITTER: Shi M 

PROVIDER: S-EPMC3198572 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature

altmetric image

Publications

Semi-supervised learning improves gene expression-based prediction of cancer recurrence.

Shi Mingguang M   Zhang Bing B  

Bioinformatics (Oxford, England) 20110904 21


<h4>Motivation</h4>Gene expression profiling has shown great potential in outcome prediction for different types of cancers. Nevertheless, small sample size remains a bottleneck in obtaining robust and accurate classifiers. Traditional supervised learning techniques can only work with labeled data. Consequently, a large number of microarray data that do not have sufficient follow-up information are disregarded. To fully leverage all of the precious data in public databases, we turned to a semi-s  ...[more]

Similar Datasets

| S-EPMC3908883 | biostudies-literature
| S-EPMC10163727 | biostudies-literature
2019-11-13 | GSE140262 | GEO
| S-EPMC8627024 | biostudies-literature
| S-EPMC7703937 | biostudies-literature
| S-EPMC3009533 | biostudies-literature
| S-EPMC9499925 | biostudies-literature
| S-EPMC4671612 | biostudies-literature
| S-EPMC4945015 | biostudies-literature
| S-EPMC4036113 | biostudies-literature