Dataset Information

Integration of genetic and clinical information to improve imputation of data missing from electronic health records.


ABSTRACT:

OBJECTIVE: Clinical data on patients' measurements and treatment history stored in electronic health record (EHR) systems are starting to be mined for better treatment options and disease associations. A primary challenge in using EHR data is the considerable amount of missing data. Failure to address this issue can introduce significant bias in EHR-based research. Currently, imputation methods rely on correlations among the structured phenotype variables in the EHR. However, genetic studies have shown that many EHR-based phenotypes have a heritable component, suggesting that measured genetic variants might be useful for imputing missing data. In this article, we developed a computational model that incorporates patients' genetic information to perform EHR data imputation.

MATERIALS AND METHODS: We used the associations between individual single nucleotide polymorphisms (SNPs) and phenotype variables in the EHR as input to construct a genetic risk score that quantifies the genetic contribution to the phenotype. Multiple approaches to constructing the genetic risk score were evaluated for optimal performance. The genetic score, along with phenotype correlation, is then used as a predictor to impute the missing values.

RESULTS: To demonstrate the method's performance, we applied our model to impute missing cardiovascular-related measurements, including low-density lipoprotein, heart failure, and aortic aneurysm disease, in the electronic Medical Records and Genomics (eMERGE) data. The integration method improved the area under the curve of imputation for binary phenotypes and decreased the root-mean-square error for continuous phenotypes.

CONCLUSION: Compared with standard imputation approaches, incorporating genetic information offers a novel approach that can use more of the EHR data for better performance in missing data imputation.
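The Materials and Methods summary describes two steps: building a genetic risk score from SNP-phenotype associations, then using that score together with correlated phenotypes to predict missing values. The abstract does not give the scoring formula or the imputation model, so the following is only a minimal sketch under stated assumptions: the score is a plain weighted sum of allele counts (one common form, not necessarily the one the authors selected), the imputation model is ordinary least squares, and all data, effect sizes, and function names (`genetic_risk_score`, `impute_with_grs`) are hypothetical.

```python
import numpy as np


def genetic_risk_score(genotypes, effect_sizes):
    """Weighted sum of allele counts (0/1/2) per patient.

    Assumption: a simple additive score; the source evaluates several
    constructions and does not say which one this corresponds to.
    """
    return np.asarray(genotypes, dtype=float) @ np.asarray(effect_sizes, dtype=float)


def impute_with_grs(target, covariate, grs):
    """Fill NaN entries of `target` from a correlated phenotype and the score.

    Assumption: ordinary least squares stands in for the imputation model.
    """
    target = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(target)), covariate, grs])
    observed = ~np.isnan(target)
    # Fit on patients with an observed value, then predict the missing ones.
    coef, *_ = np.linalg.lstsq(design[observed], target[observed], rcond=None)
    filled = target.copy()
    filled[~observed] = design[~observed] @ coef
    return filled


# Synthetic illustration only; none of these numbers come from the study.
rng = np.random.default_rng(0)
n_patients, n_snps = 200, 5
genotypes = rng.integers(0, 3, size=(n_patients, n_snps))  # allele counts 0, 1, 2
effect_sizes = np.array([0.4, -0.2, 0.3, 0.1, -0.5])       # made-up SNP weights
grs = genetic_risk_score(genotypes, effect_sizes)

covariate = rng.normal(size=n_patients)                    # a correlated phenotype
true_ldl = 100 + 5 * covariate + 10 * grs + rng.normal(scale=1.0, size=n_patients)

ldl = true_ldl.copy()
missing = rng.random(n_patients) < 0.2                     # hide about 20% of values
ldl[missing] = np.nan

imputed = impute_with_grs(ldl, covariate, grs)
rmse = np.sqrt(np.mean((imputed[missing] - true_ldl[missing]) ** 2))
print(f"RMSE on held-out values: {rmse:.2f}")
```

For a binary phenotype such as heart failure, the same design matrix could feed a logistic model and be scored by area under the curve instead of root-mean-square error; that variant is not shown here.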

SUBMITTER: Li R 

PROVIDER: S-EPMC6748821 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Integration of genetic and clinical information to improve imputation of data missing from electronic health records.

Ruowang Li, Yong Chen, Jason H. Moore

Journal of the American Medical Informatics Association : JAMIA, 2019 Oct 1, issue 10


Similar Datasets

| S-EPMC5845101 | biostudies-literature
| S-EPMC5144587 | biostudies-literature
| S-EPMC7744924 | biostudies-literature
| S-EPMC9283251 | biostudies-literature
| S-EPMC4666800 | biostudies-literature
| S-EPMC2432074 | biostudies-literature
| S-EPMC6592487 | biostudies-literature
| S-EPMC7647298 | biostudies-literature
| S-EPMC10283126 | biostudies-literature
| S-EPMC9205685 | biostudies-literature