Dataset Information

Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach.

ABSTRACT:

Background

Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability.

Objective

We aimed to use federated learning, a machine learning technique that avoids locally aggregating raw clinical data across multiple institutions, to predict mortality in hospitalized patients with COVID-19 within 7 days.

Methods

Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization/least absolute shrinkage and selection operator (LASSO) and multilayer perceptron (MLP) models were trained by using local data at each site. We developed a pooled model with combined data from all 5 sites, and a federated model that only shared parameters with a central aggregator.

Results

The LASSO_federated model outperformed the LASSO_local model at 3 hospitals, and the MLP_federated model performed better than the MLP_local model at all 5 hospitals, as determined by the area under the receiver operating characteristic curve. The LASSO_pooled model outperformed the LASSO_federated model at all hospitals, and the MLP_federated model outperformed the MLP_pooled model at 2 hospitals.

Conclusions

The federated learning of COVID-19 electronic health record data shows promise in developing robust predictive models without compromising patient privacy.

SUBMITTER: Vaid A

PROVIDER: S-EPMC7842859 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach.

Vaid Akhil A Jaladanki Suraj K SK Xu Jie J Teng Shelly S Kumar Arvind A Lee Samuel S Somani Sulaiman S Paranjpe Ishan I De Freitas Jessica K JK Wanyan Tingyi T Johnson Kipp W KW Bicak Mesude M Klang Eyal E Kwon Young Joon YJ Costa Anthony A Zhao Shan S Miotto Riccardo R Charney Alexander W AW Böttinger Erwin E Fayad Zahi A ZA Nadkarni Girish N GN Wang Fei F Glicksberg Benjamin S BS

JMIR medical informatics 20210127 1

<h4>Background</h4>Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability.<h4>Objective</h4>We aimed to use federated learning, a machine learning technique that avoids locally aggregating raw clinical data across multiple institutions, to predict mortality in hospitalized patients with COVID-19 within 7 days.<h4>Meth ...[more]

PMID: 33400679

Similar Datasets

Project description:BackgroundRisk scores can be useful in clinical risk stratification and accurate allocations of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other complex models and are now widely used in clinical decision making. However, the development of the risk scoring model is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records.ObjectiveThis study aims to propose AutoScore, a machine learning-based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications.MethodsWe proposed the AutoScore framework comprising 6 modules that included variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared the data with other baseline models using the receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore.ResultsImplemented on the data set with 44,918 individual admission episodes of intensive care, the AutoScore-created scoring models performed comparably well as other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration but required fewer predictors and presented high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95% CI 0.764-0.798), whereas the model of logistic regression with 24 variables had an AUC of 0.778 (95% CI 0.760-0.795). Moreover, the AutoScore framework also drives the clinical research continuum and automation with its integration of all necessary modules.ConclusionsWe developed an easy-to-use, machine learning-based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore will emerge as a potential scoring tool in various medical applications.

Project description:BackgroundPreterm birth (PTB), a common pregnancy complication, is responsible for 35% of the 3.1 million pregnancy-related deaths each year and significantly affects around 15 million children annually worldwide. Conventional approaches to predict PTB lack reliable predictive power, leaving >50% of cases undetected. Recently, machine learning (ML) models have shown potential as an appropriate complementary approach for PTB prediction using health records (HRs).ObjectiveThis study aimed to systematically review the literature concerned with PTB prediction using HR data and the ML approach.MethodsThis systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. A comprehensive search was performed in 7 bibliographic databases until May 15, 2021. The quality of the studies was assessed, and descriptive information, including descriptive characteristics of the data, ML modeling processes, and model performance, was extracted and reported.ResultsA total of 732 papers were screened through title and abstract. Of these 732 studies, 23 (3.1%) were screened by full text, resulting in 13 (1.8%) papers that met the inclusion criteria. The sample size varied from a minimum value of 274 to a maximum of 1,400,000. The time length for which data were extracted varied from 1 to 11 years, and the oldest and newest data were related to 1988 and 2018, respectively. Population, data set, and ML models' characteristics were assessed, and the performance of the model was often reported based on metrics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve.ConclusionsVarious ML models used for different HR data indicated potential for PTB prediction. However, evaluation metrics, software and package used, data size and type, selected features, and importantly data management method often remain unjustified, threatening the reliability, performance, and internal or external validity of the model. To understand the usefulness of ML in covering the existing gap, future studies are also suggested to compare it with a conventional method on the same data set.

Project description:ObjectiveLiver cirrhosis is a leading cause of death and effects millions of people in the United States. Early mortality prediction among patients with cirrhosis might give healthcare providers more opportunity to effectively treat the condition. We hypothesized that laboratory test results and other related diagnoses would be associated with mortality in this population. Our another assumption was that a deep learning model could outperform the current Model for End Stage Liver disease (MELD) score in predicting mortality.Materials and methodsWe utilized electronic health record data from 34,575 patients with a diagnosis of cirrhosis from a large medical center to study associations with mortality. Three time-windows of mortality (365 days, 180 days and 90 days) and two cases with different number of variables (all 41 available variables and 4 variables in MELD-NA) were studied. Missing values were imputed using multiple imputation for continuous variables and mode for categorical variables. Deep learning and machine learning algorithms, i.e., deep neural networks (DNN), random forest (RF) and logistic regression (LR) were employed to study the associations between baseline features such as laboratory measurements and diagnoses for each time window by 5-fold cross validation method. Metrics such as area under the receiver operating curve (AUC), overall accuracy, sensitivity, and specificity were used to evaluate models.ResultsPerformance of models comprising all variables outperformed those with 4 MELD-NA variables for all prediction cases and the DNN model outperformed the LR and RF models. For example, the DNN model achieved an AUC of 0.88, 0.86, and 0.85 for 90, 180, and 365-day mortality respectively as compared to the MELD score, which resulted in corresponding AUCs of 0.81, 0.79, and 0.76 for the same instances. The DNN and LR models had a significantly better f1 score compared to MELD at all time points examined.ConclusionOther variables such as alkaline phosphatase, alanine aminotransferase, and hemoglobin were also top informative features besides the 4 MELD-Na variables. Machine learning and deep learning models outperformed the current standard of risk prediction among patients with cirrhosis. Advanced informatics techniques showed promise for risk prediction in patients with cirrhosis.

Dataset Information

Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach.

Background

Objective

Methods

Results

Conclusions

Publications

Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets