Dataset Information

IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data.


ABSTRACT: As modern biotechnologies advance, it has become increasingly common for different modalities of high-dimensional molecular data (termed "omics" data in this paper), such as gene expression, methylation, and copy number, to be collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
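
The idea of modality-specific penalty factors can be illustrated with a minimal sketch. It is not the ipflasso implementation: it is a plain coordinate-descent solver for the objective (1/(2n))·||y − Xb||² + λ·Σ_j pf_j·|b_j|, written under the assumptions that the data are centered (no intercept is fitted) and that the outcome is continuous. The function names `soft_threshold`, `weighted_lasso`, and `modality_penalty_factors` are hypothetical, and the cross-validated choice of penalty factors mentioned in the abstract is not shown.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in L1-penalized coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def modality_penalty_factors(block_sizes, block_factors):
    """Expand one penalty factor per modality into one factor per variable."""
    factors = []
    for size, factor in zip(block_sizes, block_factors):
        factors.extend([factor] * size)
    return factors


def weighted_lasso(X, y, lam, penalty_factors, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - Xb||^2 + lam * sum_j pf_j * |b_j|.

    X is a list of rows. Assumption: data are centered, no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            # The variable-specific threshold is lam times its penalty factor.
            new_beta = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Synthetic example: modality 1 has 3 variables, modality 2 has 5.
    random.seed(0)
    n, sizes = 40, [3, 5]
    X = [[random.gauss(0, 1) for _ in range(sum(sizes))] for _ in range(n)]
    y = [row[0] - 2.0 * row[1] + random.gauss(0, 0.1) for row in X]
    # Penalize the second modality four times as strongly as the first.
    pf = modality_penalty_factors(sizes, [1.0, 4.0])
    print(weighted_lasso(X, y, lam=0.1, penalty_factors=pf))
```

A larger penalty factor for a modality raises the soft-thresholding cutoff for all of its variables, so fewer of them enter the model. Setting all factors to 1 reduces the sketch to the standard LASSO.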

SUBMITTER: Boulesteix AL 

PROVIDER: S-EPMC5435977 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data.

Anne-Laure Boulesteix, Riccardo De Bin, Xiaoyu Jiang, Mathias Fuchs

Computational and Mathematical Methods in Medicine, 2017-05-04


Similar Datasets

| S-EPMC7065407 | biostudies-literature
| S-EPMC8796362 | biostudies-literature
| S-EPMC5499546 | biostudies-literature
| S-EPMC8070328 | biostudies-literature
| S-EPMC2732298 | biostudies-literature
| S-EPMC7523642 | biostudies-literature
| S-EPMC4642618 | biostudies-literature
| S-EPMC6499521 | biostudies-literature
| S-EPMC6134797 | biostudies-literature
| S-EPMC9310531 | biostudies-literature