
Dataset Information


Determination of minimum training sample size for microarray-based cancer outcome prediction-an empirical assessment.


ABSTRACT: The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of the statistical parameters involved in classifiers, and these parameters cannot be reliably estimated from only a small number of training samples. It is therefore of vital importance to determine the minimum number of training samples and to ensure the clinical value of microarrays in cancer outcome prediction. We extensively evaluated the impact of training sample size on model performance using 3 large-scale cancer microarray datasets provided by the second phase of the MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for determining the minimum training sample size. External validation on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into routine clinical application, the SSNR-based protocol could make microarray-based cancer outcome prediction more convenient and improve classifier reliability.

SUBMITTER: Shao L 

PROVIDER: S-EPMC3702597 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

Determination of minimum training sample size for microarray-based cancer outcome prediction-an empirical assessment.

Shao Li, Fan Xiaohui, Cheng Ningtao, Wu Leihong, Cheng Yiyu

PLoS One, 2013-07-05, issue 7



Similar Datasets

| S-EPMC10011335 | biostudies-literature
| S-EPMC7034864 | biostudies-literature
| S-EPMC6972819 | biostudies-literature
| S-EPMC4195669 | biostudies-literature
| S-EPMC2569926 | biostudies-literature
| S-EPMC2837028 | biostudies-literature
| S-EPMC5960641 | biostudies-literature
| S-EPMC4521133 | biostudies-literature
| S-EPMC5870539 | biostudies-literature
| S-EPMC10000262 | biostudies-literature