
Dataset Information


Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.


ABSTRACT: Feature selection, which aims to identify the subset of features relevant for predicting a response from a possibly large feature set, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including the potentially temporal character of the data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without first flattening the data, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach that handles multivariate temporal data without prior flattening. In the proposed approach, the relevance of a gene is computed by averaging F-statistic values calculated across individual time steps, and the redundancy between genes is computed using a dynamic time warping approach. The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. The results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method improved accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, calculated as the average F-statistic value across time steps, with redundancy, calculated using dynamic time warping. As our experiments show, incorporating temporal information into the feature selection process leads to the selection of more discriminative features.
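
The two ingredients named in the abstract (relevance as a per-time-step F-statistic averaged over time, redundancy from dynamic time warping) can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: the data layout `data[sample][gene]` holding a list of time-step values, an absolute-difference local cost in the warping, a similarity of `1 / (1 + mean DTW distance)` averaged over samples, and a greedy quotient criterion (relevance divided by mean redundancy). All function names are hypothetical.

```python
from typing import List, Sequence


def f_statistic(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic for one gene at one time step.

    Requires at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items()) / (k - 1)
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def relevance(data, labels, gene: int) -> float:
    """Average the per-time-step F-statistic of one gene over all time steps."""
    n_steps = len(data[0][gene])
    scores = [f_statistic([s[gene][t] for s in data], labels) for t in range(n_steps)]
    return sum(scores) / n_steps


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def redundancy(data, g1: int, g2: int) -> float:
    """ASSUMPTION: similarity in (0, 1] from the DTW distance averaged over samples."""
    mean_dist = sum(dtw_distance(s[g1], s[g2]) for s in data) / len(data)
    return 1.0 / (1.0 + mean_dist)


def select_features(data, labels, n_select: int) -> List[int]:
    """Greedy selection; ASSUMPTION: quotient of relevance and mean redundancy."""
    n_genes = len(data[0])
    rel = [relevance(data, labels, g) for g in range(n_genes)]
    selected = [max(range(n_genes), key=lambda g: rel[g])]
    while len(selected) < min(n_select, n_genes):
        candidates = [g for g in range(n_genes) if g not in selected]

        def score(g: int) -> float:
            red = sum(redundancy(data, g, s) for s in selected) / len(selected)
            return rel[g] / red

        selected.append(max(candidates, key=score))
    return selected
```

Under these assumptions, a gene whose time course duplicates an already selected gene has redundancy 1 and is penalized relative to a gene with a different temporal profile, even when the duplicate has the higher F-statistic.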

SUBMITTER: Radovic M 

PROVIDER: S-EPMC5209828 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature


Publications

Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.

Milos Radovic, Mohamed Ghalwash, Nenad Filipovic, Zoran Obradovic

BMC Bioinformatics, 2017 Jan 3, issue 1



Similar Datasets

| S-EPMC4029432 | biostudies-literature
| S-EPMC5331171 | biostudies-literature
| S-EPMC1569877 | biostudies-literature
| S-EPMC4756144 | biostudies-literature
| S-EPMC6157248 | biostudies-literature
| S-EPMC9122960 | biostudies-literature
| S-EPMC2905366 | biostudies-literature
| S-EPMC10267731 | biostudies-literature
| S-EPMC7397300 | biostudies-literature
| S-EPMC7860207 | biostudies-literature