Dataset Information

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score.


ABSTRACT: BACKGROUND: Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by basing the classification only on selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. RESULTS: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed using the Random Forest, k-Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. CONCLUSIONS: A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene, which allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
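The abstract does not give the POS formula, so the following is only a rough sketch of the general idea of scoring a gene by the proportion of samples that fall where two classes overlap. It is not the published POS definition. The function names `overlap_proportion` and `rank_genes`, the `trim` parameter, and the use of trimmed quantile ranges as a stand-in for outlier-robust intervals are all assumptions made for illustration.

```python
import numpy as np

def overlap_proportion(x, y, trim=0.0):
    """Fraction of samples lying where the two classes' value ranges overlap.

    Illustrative assumption only, NOT the published POS formula.
    x: expression values of one gene, y: two-class labels.
    trim: quantile cut from each end of a class range (hypothetical
    stand-in for an outlier-robust interval; 0.0 uses min/max).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch expects exactly two classes")
    a = x[y == classes[0]]
    b = x[y == classes[1]]
    a_lo, a_hi = np.quantile(a, [trim, 1.0 - trim])
    b_lo, b_hi = np.quantile(b, [trim, 1.0 - trim])
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
    if lo > hi:
        return 0.0  # the class intervals do not intersect
    inside = (x >= lo) & (x <= hi)
    return float(inside.mean())

def rank_genes(X, y, trim=0.0):
    """Return gene (column) indices ordered from least to most overlap."""
    X = np.asarray(X, dtype=float)
    scores = np.array([overlap_proportion(X[:, j], y, trim)
                       for j in range(X.shape[1])])
    return np.argsort(scores, kind="stable")

# Toy data: rows are samples, columns are genes.
X = np.array([[1, 1], [2, 5], [3, 9], [10, 4], [11, 6], [12, 20]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(rank_genes(X, y))  # gene 0 separates the classes, so it ranks first
```

Under this simplified score, a gene whose class ranges do not intersect scores 0, and a lower score suggests a more discriminative gene. The published method additionally builds per-gene masks and a dedicated gene score, which this sketch does not attempt to reproduce.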

SUBMITTER: Mahmoud O 

PROVIDER: S-EPMC4141116 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score.

Osama Mahmoud, Andrew Harrison, Aris Perperoglou, Asma Gul, Zardad Khan, Metodi V. Metodiev, Berthold Lausen

BMC Bioinformatics, 2014-08-11


Similar Datasets

| S-EPMC8176540 | biostudies-literature
| S-EPMC6445890 | biostudies-literature
| S-EPMC4105478 | biostudies-literature
| S-EPMC8233431 | biostudies-literature
| S-EPMC9146727 | biostudies-literature
| S-EPMC5158321 | biostudies-literature
| S-EPMC6101392 | biostudies-literature
| S-EPMC8672607 | biostudies-literature
| S-EPMC1854845 | biostudies-other
| S-EPMC4127204 | biostudies-other