Support Vector Machines with Disease-gene-centric Network Penalty for High Dimensional Microarray Data.
Ontology highlight
ABSTRACT: With the availability of genetic pathways or networks and accumulating knowledge on genes with variants predisposing to diseases (disease genes), we propose a disease-gene-centric support vector machine (DGC-SVM) that directly incorporates these two sources of prior information into building microarray-based classifiers for binary classification problems. DGC-SVM aims to detect the genes clustering together and around some key disease genes in a gene network. To achieve this goal, we propose a penalty over suitably defined groups of genes. A hierarchy is imposed on an undirected gene network to facilitate the definition of such gene groups. Our proposed DGC-SVM utilizes the hinge loss penalized by a sum of the L(infinity)-norm being applied to each group. The simulation studies show that DGC-SVM not only detects more disease genes along pathways than the existing standard SVM and SVM with an L(1)-penalty (L1-SVM), but also captures disease genes that potentially affect the outcome only weakly. Two real data applications demonstrate that DGC-SVM improves gene selection with predictive performance comparable to the standard-SVM and L1-SVM. The proposed method has the potential to be an effective classification tool that encourages gene selection along paths to or clustering around known disease genes for microarray data.
SUBMITTER: Zhu Y
PROVIDER: S-EPMC2854644 | biostudies-literature | 2009
REPOSITORIES: biostudies-literature
ACCESS DATA