A Deep Neural Network Model using Random Forest to Extract Feature Representation for Gene Expression Data Classification.
ABSTRACT: In predictive model development, gene expression data poses the unique challenge that the number of samples (n) is much smaller than the number of features (p). This "n ≪ p" property has hindered the application of deep learning techniques to the classification of gene expression data, even though these techniques have proved powerful under "n > p" scenarios in other application fields, such as image classification. Further, the sparsity of effective features with unknown correlation structures in gene expression profiles poses additional challenges for classification tasks. To tackle these problems, we propose a newly developed classifier named Forest Deep Neural Network (fDNN), which integrates the deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method is able to learn sparse feature representations and feed the representations into a neural network to mitigate the overfitting problem. Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN's capability. The method is shown to be a useful addition to current predictive models, with better classification performance and more meaningful selected features than ordinary random forests and deep neural networks.
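The two-stage idea in the abstract (a supervised forest produces a compact feature representation, which a neural network then classifies) can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation. It assumes that each tree's class prediction serves as one binary feature, uses depth-1 trees (stumps) in place of full decision trees, and uses a single small hidden layer in place of a deep network. The simulated data and every function name and parameter value below are hypothetical.

```python
import math
import random

def simulate(n, p, n_informative, rng):
    """Toy 'n << p' data: only the first n_informative features carry signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(n_informative):
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y, features):
    """Depth-1 tree: best (feature, threshold, polarity) by training error."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            err = sum((row[j] > thr) != bool(t) for row, t in zip(X, y))
            for polarity, e in ((1, err), (0, len(y) - err)):
                if best is None or e < best[0]:
                    best = (e, j, thr, polarity)
    return best[1:]

def stump_predict(stump, row):
    j, thr, polarity = stump
    return int((row[j] > thr) == bool(polarity))

def fit_forest(X, y, n_trees, mtry, rng):
    """Bagged stumps, each restricted to a random subset of mtry features."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), mtry)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_features(forest, X):
    """Assumed representation: one binary feature per tree (its prediction)."""
    return [[stump_predict(s, row) for s in forest] for row in X]

def forward(params, x):
    """ReLU hidden layer followed by a sigmoid output unit."""
    W1, b1, W2, b2 = params
    h = [max(0.0, sum(w * v for w, v in zip(wk, x)) + bk) for wk, bk in zip(W1, b1)]
    z = sum(w * v for w, v in zip(W2, h)) + b2
    return h, 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def train_net(F, y, hidden, epochs, lr, rng):
    """Plain SGD on cross-entropy loss; a stand-in for the deep network."""
    d = len(F[0])
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(hidden)]
    b1, W2, b2 = [0.0] * hidden, [rng.gauss(0.0, 0.1) for _ in range(hidden)], 0.0
    for _ in range(epochs):
        for x, t in zip(F, y):
            h, out = forward((W1, b1, W2, b2), x)
            g = out - t  # gradient of the loss w.r.t. the output pre-activation
            for k in range(hidden):
                gh = g * W2[k] if h[k] > 0 else 0.0  # backprop through ReLU
                W2[k] -= lr * g * h[k]
                b1[k] -= lr * gh
                for i in range(d):
                    W1[k][i] -= lr * gh * x[i]
            b2 -= lr * g
    return W1, b1, W2, b2

rng = random.Random(0)
X_train, y_train = simulate(40, 200, 5, rng)  # 40 samples, 200 features
X_test, y_test = simulate(20, 200, 5, rng)
forest = fit_forest(X_train, y_train, n_trees=30, mtry=14, rng=rng)
F_train = forest_features(forest, X_train)  # 200 raw features -> 30 forest features
F_test = forest_features(forest, X_test)
params = train_net(F_train, y_train, hidden=4, epochs=100, lr=0.1, rng=rng)
preds = [int(forward(params, f)[1] > 0.5) for f in F_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

In this sketch the network sees only 30 forest-derived inputs rather than the 200 raw features, which illustrates how a supervised feature detector can reduce the number of parameters the network must fit when n is small.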
SUBMITTER: Kong Y
PROVIDER: S-EPMC6220289 | biostudies-literature | 2018 Nov
REPOSITORIES: biostudies-literature