
Dataset Information


The Impact of Medical Big Data Anonymization on Early Acute Kidney Injury Risk Prediction.


ABSTRACT: Artificial intelligence-enabled analysis of medical big data has the potential to revolutionize medical practice, from the diagnosis and prediction of complex diseases to evidence-based recommendations and resource allocation decisions. However, big data comes with big disclosure risks. To preserve privacy, excessive data anonymization is often necessary, which leads to a significant loss of data utility. In this paper, we develop a systematic data scrubbing procedure for large datasets in which the key variables for re-identification risk assessment are uncertain. We then assess the trade-off between anonymizing electronic health record data for sharing in support of open science and the performance of machine learning models built on those data for early acute kidney injury risk prediction. Results demonstrate that the proposed data scrubbing procedure maintains good feature diversity and moderate data utility, but they raise concerns about its impact on knowledge discovery capability.
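The abstract does not spell out how re-identification risk is measured. One common way to reason about it, shown here only as an illustrative sketch and not as the authors' procedure, is to group records by their quasi-identifier values (the k-anonymity view), assign each record a risk of 1 divided by the size of its group, and suppress records in groups smaller than a chosen k. The field names (`age`, `sex`, `zip3`), the toy records, and the helper `suppress_below_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]


def equivalence_class_sizes(records: Sequence[Record], quasi_ids: Sequence[str]) -> List[int]:
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def reidentification_risk(records: Sequence[Record], quasi_ids: Sequence[str]) -> Dict[str, float]:
    """Summarize risk assuming a per-record risk of 1 / (equivalence class size)."""
    sizes = equivalence_class_sizes(records, quasi_ids)
    per_record = [1.0 / s for s in sizes]
    return {
        "k": min(sizes),                                # smallest group size (k-anonymity level)
        "max_risk": max(per_record),                    # worst-case record
        "avg_risk": sum(per_record) / len(per_record),  # equals (#classes) / (#records)
    }


def suppress_below_k(records: Sequence[Record], quasi_ids: Sequence[str], k: int) -> List[Record]:
    """Hypothetical scrubbing step: drop records whose equivalence class is smaller than k."""
    sizes = equivalence_class_sizes(records, quasi_ids)
    return [r for r, s in zip(records, sizes) if s >= k]


# Toy, made-up records for illustration only.
QUASI_IDS = ["age", "sex", "zip3"]
records = [
    {"age": 34, "sex": "F", "zip3": "660"},
    {"age": 34, "sex": "F", "zip3": "660"},
    {"age": 51, "sex": "M", "zip3": "661"},
    {"age": 51, "sex": "M", "zip3": "661"},
    {"age": 72, "sex": "F", "zip3": "664"},
]

if __name__ == "__main__":
    print(reidentification_risk(records, QUASI_IDS))
    kept = suppress_below_k(records, QUASI_IDS, k=2)
    print(len(kept), reidentification_risk(kept, QUASI_IDS))
```

Suppression lowers the worst-case risk (here the unique record is dropped) at the cost of discarding data, which is the same kind of privacy-versus-utility trade-off the abstract describes. Generalizing values (for example, binning ages) is the usual alternative to outright suppression.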

SUBMITTER: Song X 

PROVIDER: S-EPMC7233037 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

The Impact of Medical Big Data Anonymization on Early Acute Kidney Injury Risk Prediction.

Xing Song, Lemuel R. Waitman, Yong Hu, Bo Luo, Fengjun Li, Mei Liu

AMIA Joint Summits on Translational Science Proceedings, 2020 May 30.



Similar Datasets

| S-EPMC6284146 | biostudies-literature
| S-EPMC10146481 | biostudies-literature
| S-EPMC9283743 | biostudies-literature
| S-EPMC10785491 | biostudies-literature
| S-EPMC8212017 | biostudies-literature
| S-EPMC9193404 | biostudies-literature
| S-EPMC10563357 | biostudies-literature
| S-EPMC8353807 | biostudies-literature
| S-EPMC9429796 | biostudies-literature
| S-EPMC6080076 | biostudies-literature