
Dataset Information


Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.


ABSTRACT: Risk modeling with electronic health records (EHR) data is challenging because the disease outcome is not directly observed and the predictors are high-dimensional. In this paper, we develop a surrogate assisted semi-supervised learning approach, leveraging a small labeled data set with annotated outcomes and extensive unlabeled data on outcome surrogates and high-dimensional predictors. We propose to impute the unobserved outcomes by constructing a sparse imputation model with outcome surrogates and high-dimensional predictors. We further conduct a one-step bias correction to enable interval estimation for the risk prediction. Our inference procedure is valid even if both the imputation and risk prediction models are misspecified. Our novel way of utilizing unlabeled data enables high-dimensional statistical inference in the challenging setting of a dense risk prediction model. We present an extensive simulation study to demonstrate the superiority of our approach compared to existing supervised methods. We apply the method to genetic risk prediction of type-2 diabetes mellitus using an EHR biobank cohort.
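
The impute-then-fit idea in the abstract can be pictured with a minimal sketch. This is not the authors' estimator: it runs on simulated data, uses a plain proximal-gradient L1-penalized logistic regression for both models, and leaves out the one-step bias correction and interval estimation entirely. Every name, the simulation design, and the penalty level `lam=0.01` are assumptions made for illustration.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(rows, targets, lam, steps=300, lr=0.5):
    """L1-penalized logistic regression by proximal gradient descent.
    Targets may be 0/1 labels or imputed probabilities in [0, 1]."""
    n, p = len(rows), len(rows[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, grad = 0.0, [0.0] * p
        for x, y in zip(rows, targets):
            r = sigmoid(intercept + sum(b * v for b, v in zip(beta, x))) - y
            g0 += r
            for j in range(p):
                grad[j] += r * x[j]
        intercept -= lr * g0 / n  # the intercept is not penalized
        beta = [soft_threshold(b - lr * g / n, lr * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta

def predict(intercept, beta, x):
    return sigmoid(intercept + sum(b * v for b, v in zip(beta, x)))

def simulate(n, p):
    # Hypothetical data: only the first two predictors drive the outcome,
    # and the surrogate S is a noisy copy of the outcome Y.
    data = []
    for _ in range(n):
        x = [random.gauss(0.0, 1.0) for _ in range(p)]
        y = 1 if random.random() < sigmoid(1.5 * x[0] - 1.5 * x[1]) else 0
        s = y + random.gauss(0.0, 0.5)
        data.append((x, s, y))
    return data

p = 5
labeled = simulate(100, p)
unlabeled = simulate(500, p)  # outcomes here are treated as unobserved

# Step 1: imputation model for Y given (S, X), fit on labeled data only.
imp_rows = [[s] + x for x, s, _ in labeled]
imp_a, imp_b = fit_l1_logistic(imp_rows, [y for _, _, y in labeled], lam=0.01)

# Step 2: impute the unobserved outcomes on the unlabeled data.
imputed = [predict(imp_a, imp_b, [s] + x) for x, s, _ in unlabeled]

# Step 3: fit the risk model for Y given X on the imputed outcomes.
risk_a, risk_beta = fit_l1_logistic(
    [x for x, _, _ in unlabeled], imputed, lam=0.01)
print([round(b, 2) for b in risk_beta])
```

The sketch only shows how labeled data feed the imputation model and how unlabeled data then feed the risk model. The point estimates it prints carry no valid confidence intervals, which is what the bias correction described in the abstract is meant to supply.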

SUBMITTER: Hou J 

PROVIDER: S-EPMC10947223 | biostudies-literature | 2023 Jan-Dec

REPOSITORIES: biostudies-literature


Publications

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.

Jue Hou, Zijian Guo, Tianxi Cai

Journal of machine learning research : JMLR, 2023-01-01



Similar Datasets

| S-EPMC11902906 | biostudies-literature
| S-EPMC3956069 | biostudies-literature
| S-EPMC6455938 | biostudies-literature
| S-EPMC8717470 | biostudies-literature
| S-EPMC3656881 | biostudies-literature
| S-EPMC4074792 | biostudies-literature
| S-EPMC9929447 | biostudies-literature
| S-EPMC6821385 | biostudies-literature
| S-EPMC10423721 | biostudies-literature
| S-EPMC10163727 | biostudies-literature