Dataset Information

Feature Selection for high Dimensional DNA Microarray data using hybrid approaches.


ABSTRACT: Feature selection from DNA microarray data is a major challenge because expression data are high-dimensional: the number of samples in a microarray data set is much smaller than the number of genes. Such data are therefore poorly suited for direct use as the training set of a classifier, and features must be selected before the classifier is trained. Only a small subset of the genes in a data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. There is thus a need for robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for this task. Each approach combines a filter phase (which filters out the best genes from the data set) with a wrapper phase (which finds the best subset of genes from the data set). The methods use information gain (IG) and the Pearson product-moment correlation (PPMC) as filtering criteria and biogeography-based optimization (BBO) as the wrapper approach. The k-nearest neighbour (KNN) algorithm and a back-propagation neural network are used to evaluate the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs well across different data sets, with high accuracy (>90%) and a low error rate.
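
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a PPMC filter that ranks genes by the absolute correlation with a binary class label, and it uses leave-one-out KNN accuracy as the fitness of a gene subset. The BBO search itself is omitted; a wrapper would call the fitness function on each candidate subset. All function names (pearson, filter_genes, knn_loo_accuracy, make_toy_data) and the synthetic data are hypothetical, chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; score it as 0.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def filter_genes(X, y, top_n):
    """Filter phase (assumed form): keep the top_n genes by |correlation with class|."""
    n_genes = len(X[0])
    scores = [abs(pearson([row[g] for row in X], y)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_n]


def knn_loo_accuracy(X, y, genes, k=3):
    """Fitness of a gene subset: leave-one-out accuracy of a k-NN classifier."""
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, on the chosen genes only.
        dists = sorted(
            (sum((xi[g] - xj[g]) ** 2 for g in genes), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def make_toy_data(n_samples=20, n_genes=50, informative=(0, 1), seed=0):
    """Synthetic stand-in for expression data: few samples, many genes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 6.0 * label  # shift the informative genes by class
        X.append(row)
    return X, y


X, y = make_toy_data()
selected = filter_genes(X, y, top_n=5)           # filter phase
fitness = knn_loo_accuracy(X, y, selected, k=3)  # fitness a wrapper would maximise
```

In a full hybrid pipeline, a wrapper such as BBO would propose subsets of the genes that survive the filter and keep the subset with the highest fitness; here the fitness is simply evaluated once on the filtered genes.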

SUBMITTER: Kumar AP 

PROVIDER: S-EPMC3796884 | biostudies-other | 2013

REPOSITORIES: biostudies-other

Publications

Feature Selection for high Dimensional DNA Microarray data using hybrid approaches.

Kumar AP, Valsala P

Bioinformation 20130923 16


Similar Datasets

| S-EPMC3445441 | biostudies-literature
| S-EPMC5738110 | biostudies-literature
| S-EPMC3577111 | biostudies-literature
| S-EPMC7860207 | biostudies-literature
| S-EPMC6101392 | biostudies-literature
| S-EPMC6532608 | biostudies-literature
| S-EPMC7092448 | biostudies-literature
| S-EPMC2639693 | biostudies-literature
| S-EPMC4426954 | biostudies-literature
| S-EPMC2951666 | biostudies-literature