Dataset Information

Biomarker discovery using statistically significant gene sets.


ABSTRACT: Analysis of large gene expression data sets in the presence and absence of a phenotype can lead to the selection of a group of genes serving as biomarkers jointly predicting the phenotype. Among gene selection methods, filter methods derived from ranked individual genes have been widely used in existing products for diagnosis and prognosis. Univariate filter approaches selecting genes individually, although computationally efficient, often ignore gene interactions inherent in the biological data. On the other hand, multivariate approaches selecting gene subsets are known to have a higher risk of selecting spurious gene subsets due to the overfitting of the vast number of gene subsets evaluated. Here we propose a framework of statistical significance tests for multivariate feature selection that can reduce the risk of selecting spurious gene subsets. Using three existing data sets, we show that our proposed approach is an essential step to identify such a gene set that is generated by a significant interaction of its members, even improving classification performance when compared to established approaches. This technique can be applied for the discovery of robust biomarkers for medical diagnosis.
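The abstract does not spell out the proposed tests, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes binarized expression values, scores a gene subset by the training accuracy of a majority-vote lookup table over the joint states of its genes, and estimates significance with a max-statistic permutation test: the whole subset search is repeated on shuffled phenotype labels, so the p-value accounts for the number of subsets evaluated. The function names (`subset_score`, `best_subset`, `permutation_p_value`) and all parameters are hypothetical.

```python
import itertools
import random
from collections import Counter, defaultdict


def subset_score(X, y, genes):
    """Training accuracy of a majority-vote lookup table on the joint states of `genes`."""
    table = defaultdict(Counter)
    for row, label in zip(X, y):
        table[tuple(row[g] for g in genes)][label] += 1
    # Each joint state predicts its majority label; count the samples it gets right.
    correct = sum(max(counts.values()) for counts in table.values())
    return correct / len(y)


def best_subset(X, y, size=2):
    """Exhaustively evaluate every gene subset of `size`; return (score, genes)."""
    n_genes = len(X[0])
    return max(
        (subset_score(X, y, genes), genes)
        for genes in itertools.combinations(range(n_genes), size)
    )


def permutation_p_value(X, y, size=2, n_perm=200, seed=0):
    """Permutation p-value for the best subset, repeating the full search per shuffle."""
    rng = random.Random(seed)
    observed, genes = best_subset(X, y, size)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any gene-phenotype association
        if best_subset(X, shuffled, size)[0] >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return genes, observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (assumed): 5 binary genes in a full factorial design;
    # the phenotype is the XOR of genes 0 and 1, so neither gene is
    # informative on its own but the pair is.
    X = [[(i >> g) & 1 for g in range(5)] for i in range(32)]
    y = [row[0] ^ row[1] for row in X]
    print(permutation_p_value(X, y, size=2, n_perm=200))
```

Comparing against the best score found on each shuffled data set, rather than against the score of one fixed subset, is what guards against spurious subsets: a subset counts as significant only if it beats what an equally exhaustive search finds by chance.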

SUBMITTER: Kim H 

PROVIDER: S-EPMC3179615 | biostudies-literature | 2011 Oct

REPOSITORIES: biostudies-literature

Publications

Biomarker discovery using statistically significant gene sets.

Hoon Kim, John Watkinson, Dimitris Anastassiou

Journal of Computational Biology: a journal of computational molecular cell biology, 2011-04-01, issue 10


Similar Datasets

S-EPMC6691336 | biostudies-literature
S-EPMC9333302 | biostudies-literature
S-EPMC3084717 | biostudies-literature
S-EPMC4280643 | biostudies-literature
S-EPMC3890500 | biostudies-literature
S-EPMC2943622 | biostudies-literature
S-EPMC8601419 | biostudies-literature
S-EPMC1200092 | biostudies-literature
S-EPMC5918465 | biostudies-other
S-EPMC5449640 | biostudies-literature