A Sparse Regression Method for Group-Wise Feature Selection with False Discovery Rate Control.
ABSTRACT: The method of Sorted L-One Penalized Estimation, or SLOPE, is a sparse regression method recently introduced by Bogdan et al. [1]. It can be used to identify significant predictor variables in a linear model that may have more unknown parameters than observations. When the correlations between predictor variables are small, the SLOPE method is shown to successfully control the false discovery rate (the expected proportion of irrelevant predictors among all selected predictors) at a user-specified level. However, the requirement for nearly uncorrelated predictors is too restrictive for genomic data, as demonstrated in our recent study [2] by an application of SLOPE to realistic simulated DNA sequence data. A possible solution is to divide the predictor variables into nearly uncorrelated groups, and to modify the procedure to select entire groups with an overall significant group effect, rather than individual predictors. Following this motivation, we extend SLOPE in the spirit of Group LASSO to Group SLOPE, a method that can handle group structures among the predictor variables, which are ubiquitous in real genomic data. Our theoretical results show that Group SLOPE controls the group-wise false discovery rate (gFDR) when groups are orthogonal to each other. For use in non-orthogonal settings, we propose two types of Monte Carlo-based heuristics, which lead to gFDR control with Group SLOPE in simulations based on real SNP data. As an illustration of the merits of this method, an application of Group SLOPE to a dataset from the Framingham Heart Study results in the identification of some known DNA sequence regions associated with bone health, as well as some new candidate regions. The novel methods are implemented in the R package grpSLOPEMC, which is publicly available at https://github.com/agisga/grpSLOPEMC.
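To make the group-wise false discovery rate concrete, the following is a minimal Python sketch of how it could be estimated in a simulation where the truly relevant groups are known. It is an illustration only and is not taken from the grpSLOPEMC package; the function names `group_fdp` and `estimate_gfdr`, the group labels, and the toy selections are all hypothetical. It assumes the common convention that the false discovery proportion is 0 when no group is selected.

```python
def group_fdp(selected_groups, relevant_groups):
    """Proportion of selected groups that are irrelevant (0 if none selected)."""
    selected = set(selected_groups)
    if not selected:
        # Assumed convention: no selections means no false discoveries.
        return 0.0
    false_discoveries = selected - set(relevant_groups)
    return len(false_discoveries) / len(selected)


def estimate_gfdr(selections_per_run, relevant_groups):
    """Average the group-wise false discovery proportion over simulation runs."""
    fdps = [group_fdp(sel, relevant_groups) for sel in selections_per_run]
    return sum(fdps) / len(fdps)


# Toy example with hypothetical group labels: "g1" and "g2" carry true signal.
relevant = {"g1", "g2"}
runs = [["g1", "g2"], ["g1", "g3"], [], ["g4"]]
gfdr_hat = estimate_gfdr(runs, relevant)
print(gfdr_hat)
```

In a simulation study, each entry of `runs` would hold the groups selected by the procedure on one simulated dataset, and the resulting average would be compared with the user-specified target level.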
SUBMITTER: Gossmann A
PROVIDER: S-EPMC6326365 | biostudies-literature | 2018 Jul-Aug
REPOSITORIES: biostudies-literature