Dataset Information

Neonatal mortality prediction with routinely collected data: a machine learning approach.

ABSTRACT:

Background

Recent decreases in neonatal mortality have been slower than expected for most countries. This study aims to predict the risk of neonatal mortality using only data routinely available from birth records in the largest city of the Americas.

Methods

A probabilistic linkage of every birth record occurring in the municipality of São Paulo, Brazil, between 2012 and 2017 was performed with the death records from 2012 to 2018 (1,202,843 births and 447,687 deaths), and a total of 7282 neonatal deaths were identified (a neonatal mortality rate of 6.46 per 1000 live births). Births from 2012 to 2016 (N = 941,308; or 83.44% of the total) were used to train five different machine learning algorithms, while births occurring in 2017 (N = 186,854; or 16.56% of the total) were used to test their predictive performance on new unseen data.
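The temporal hold-out described above (train on earlier birth years, test on the final year) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and the field names `birth_year` and `neonatal_death` are hypothetical, and only the two counts come from the abstract.

```python
def temporal_split(records, test_year):
    """Hold out every birth in `test_year`; train on all earlier years."""
    train = [r for r in records if r["birth_year"] < test_year]
    test = [r for r in records if r["birth_year"] == test_year]
    return train, test

# Toy records with hypothetical field names (not the study's schema).
records = [
    {"birth_year": 2012, "neonatal_death": 0},
    {"birth_year": 2014, "neonatal_death": 1},
    {"birth_year": 2016, "neonatal_death": 0},
    {"birth_year": 2017, "neonatal_death": 0},
    {"birth_year": 2017, "neonatal_death": 1},
]
train, test = temporal_split(records, test_year=2017)

# Shares implied by the train/test counts reported in the abstract.
n_train, n_test = 941_308, 186_854
train_share = 100 * n_train / (n_train + n_test)
test_share = 100 * n_test / (n_train + n_test)
print(f"train {train_share:.2f}%, test {test_share:.2f}%")  # train 83.44%, test 16.56%
```

Splitting by year rather than at random keeps the test births strictly later than the training births, which is closer to how such a model would be used on future records.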

Results

The best performance was obtained by the extreme gradient boosting trees (XGBoost) algorithm, with a very high AUC of 0.97 and F1-score of 0.55. The 5% of births with the highest predicted risk of neonatal death included more than 90% of the actual neonatal deaths. On the other hand, there were no deaths among the 5% of births with the lowest predicted risk. There were no significant differences in predictive performance for vulnerable subgroups. The use of a smaller number of variables (WHO's five minimum perinatal indicators) decreased overall performance, but it remained high (AUC of 0.91). With the addition of only three more variables, we achieved the same predictive performance (AUC of 0.97) as using all the 23 variables originally available from the Brazilian birth records.
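The "share of deaths captured among the highest-risk births" figure and the F1-score can be computed as in the sketch below. It is an illustrative example on made-up scores, not the study's data or pipeline; the function names, the 0.5 decision threshold, and the toy values are all assumptions.

```python
def capture_rate(y_true, risk, top_fraction):
    """Share of all deaths found among the `top_fraction` of births with the highest predicted risk."""
    n_top = max(1, round(len(risk) * top_fraction))
    ranked = sorted(zip(risk, y_true), key=lambda pair: pair[0], reverse=True)
    deaths_in_top = sum(label for _, label in ranked[:n_top])
    total_deaths = sum(y_true)
    return deaths_in_top / total_deaths if total_deaths else float("nan")

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy example: 20 births, 2 deaths (values are invented for illustration).
risk = [0.90, 0.60, 0.40, 0.10, 0.05] + [0.01] * 15
y_true = [1, 0, 1, 0, 0] + [0] * 15

top10 = capture_rate(y_true, risk, 0.10)  # top 2 births contain 1 of 2 deaths
top25 = capture_rate(y_true, risk, 0.25)  # top 5 births contain both deaths
y_pred = [1 if r >= 0.5 else 0 for r in risk]  # assumed threshold of 0.5
f1 = f1_score(y_true, y_pred)
print(top10, top25, f1)  # 0.5 1.0 0.5
```

With a rare outcome such as neonatal death, a capture rate at a fixed screening fraction is often easier to interpret than F1, which depends on the chosen threshold.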

Conclusion

Machine learning algorithms were able to identify with very high predictive performance the neonatal mortality risk of newborns using only routinely collected data.

SUBMITTER: Batista AFM

PROVIDER: S-EPMC8293479 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Publications

Neonatal mortality prediction with routinely collected data: a machine learning approach.

Batista AFM, Diniz CSG, Bonilha EA, Kawachi I, Chiavegatto Filho ADP

BMC Pediatrics, 21 July 2021, issue 1

PMID: 34289819

Similar Datasets

Project description: Background: Intensive care units (ICUs) face financial, bed management, and staffing constraints. Detailed data covering all aspects of patients' journeys into and through intensive care are now collected and stored in electronic health records: machine learning has been used to analyse such data in order to provide decision support to clinicians. Methods: Systematic review of the applications of machine learning to routinely collected ICU data. Web of Science and MEDLINE databases were searched to identify candidate articles: those on image processing were excluded. The study aim, the type of machine learning used, the size of dataset analysed, whether and how the model was validated, and measures of predictive accuracy were extracted. Results: Of 2450 papers identified, 258 fulfilled eligibility criteria. The most common study aims were predicting complications (77 papers [29.8% of studies]), predicting mortality (70 [27.1%]), improving prognostic models (43 [16.7%]), and classifying sub-populations (29 [11.2%]). Median sample size was 488 (IQR 108-4099): 41 studies analysed data on > 10,000 patients. Analyses focused on 169 (65.5%) papers that used machine learning to predict complications, mortality, length of stay, or improvement of health. Predictions were validated in 161 (95.2%) of these studies: the area under the ROC curve (AUC) was reported by 97 (60.2%) but only 10 (6.2%) validated predictions using independent data. The median AUC was 0.83 in studies of 1000-10,000 patients, rising to 0.94 in studies of > 100,000 patients. The most common machine learning methods were neural networks (72 studies [42.6%]), support vector machines (40 [23.7%]), and classification/decision trees (34 [20.1%]). Since 2015 (125 studies [48.4%]), the most common methods were support vector machines (37 studies [29.6%]) and random forests (29 [23.2%]). Conclusions: The rate of publication of studies using machine learning to analyse routinely collected ICU data is increasing rapidly. The sample sizes used in many published studies are too small to exploit the potential of these methods. Methodological and reporting guidelines are needed, particularly with regard to the choice of method and validation of predictions, to increase confidence in reported findings and aid in translating findings towards routine use in clinical practice.

Project description: Importance: Inpatient violence remains a significant problem despite existing risk assessment methods. The lack of robustness and the high degree of effort needed to use current methods might be mitigated by using routinely registered clinical notes. Objective: To develop and validate a multivariable prediction model for assessing inpatient violence risk based on machine learning techniques applied to clinical notes written in patients' electronic health records. Design, Setting, and Participants: This prognostic study used retrospective clinical notes registered in electronic health records during admission at 2 independent psychiatric health care institutions in the Netherlands. No exclusion criteria for individual patients were defined. At site 1, all adults admitted between January 2013 and August 2018 were included, and at site 2 all adults admitted to general psychiatric wards between June 2016 and August 2018 were included. Data were analyzed between September 2018 and February 2019. Main Outcomes and Measures: Predictive validity and generalizability of prognostic models measured using area under the curve (AUC). Results: Clinical notes recorded during a total of 3189 admissions of 2209 unique individuals at site 1 (mean [SD] age, 34.0 [16.6] years; 1536 [48.2%] male) and 3253 admissions of 1919 unique individuals at site 2 (mean [SD] age, 45.9 [16.6] years; 2097 [64.5%] male) were analyzed. Violent outcome was determined using the Staff Observation Aggression Scale-Revised. Nested cross-validation was used to train and evaluate models that assess violence risk during the first 4 weeks of admission based on clinical notes available after 24 hours. The predictive validity of models was measured at site 1 (AUC = 0.797; 95% CI, 0.771-0.822) and site 2 (AUC = 0.764; 95% CI, 0.732-0.797). The validation of pretrained models in the other site resulted in AUCs of 0.722 (95% CI, 0.690-0.753) at site 1 and 0.643 (95% CI, 0.610-0.675) at site 2; the difference in AUCs between the internally trained model and the model trained on other-site data was significant at site 1 (AUC difference = 0.075; 95% CI, 0.045-0.105; P < .001) and site 2 (AUC difference = 0.121; 95% CI, 0.085-0.156; P < .001). Conclusions and Relevance: Internally validated predictions resulted in AUC values with good predictive validity, suggesting that automatic violence risk assessment using routinely registered clinical notes is possible. The validation of trained models using data from other sites corroborates previous findings that violence risk assessment generalizes modestly to different populations.

Project description: Background: Since the beginning of coronavirus disease 2019 (COVID-19), the development of predictive models has sparked relevant interest due to the initial lack of knowledge about diagnosis, treatment, and prognosis. The present study aimed at developing a model, through a machine learning approach, to predict intensive care unit (ICU) mortality in COVID-19 patients based on predefined clinical parameters. Results: Observational multicenter cohort study. All COVID-19 adult patients admitted to 25 ICUs belonging to the VENETO ICU network (February 28th 2020 to April 4th 2021) were enrolled. Patients admitted to the ICUs before 4th March 2021 were used for model training (“training set”), while patients admitted after the 5th of March 2021 were used for external validation (“test set 1”). A further group of patients (“test set 2”), admitted to the ICU of IRCCS Ca’ Granda Ospedale Maggiore Policlinico of Milan, was used for external validation. A SuperLearner machine learning algorithm was applied for model development, and both internal and external validation was performed. Clinical variables available for the model were (i) age, gender, sequential organ failure assessment score, Charlson Comorbidity Index score (not adjusted for age), Palliative Performance Score; (ii) need of invasive mechanical ventilation, non-invasive mechanical ventilation, O2 therapy, vasoactive agents, extracorporeal membrane oxygenation, continuous venous-venous hemofiltration, tracheostomy, re-intubation, prone position during ICU stay; and (iii) re-admission in ICU. One thousand two hundred ninety-three (80%) patients were included in the “training set”, while 124 (8%) and 199 (12%) patients were included in the “test set 1” and “test set 2,” respectively. Three different predictive models were developed. Each model included different sets of clinical variables. The three models showed similar predictive performances, with a training balanced accuracy that ranged between 0.72 and 0.90, while the cross-validation performance ranged from 0.75 to 0.85. Age was the leading predictor for all the considered models. Conclusions: Our study provides a useful and reliable tool, through a machine learning approach, for predicting ICU mortality in COVID-19 patients. In all the estimated models, age was the variable showing the most important impact on mortality. Supplementary Information: The online version contains supplementary material available at 10.1186/s44158-021-00002-x.

Project description: Background: Intensive care unit (ICU) patients demand continuous monitoring of several clinical and laboratory parameters that directly influence their medical progress and the staff's decision-making. Those data are vital in the assistance of these patients, being already used by several scoring systems. In this context, machine learning approaches have been used for medical predictions based on clinical data, which includes patient outcomes. Aim: To develop a binary classifier for the outcome of death in ICU patients based on clinical and laboratory parameters, a set formed by 1087 instances and 50 variables from ICU patients admitted to the emergency department was obtained in the "WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction" dataset. Methods: For categorical variables, frequencies and risk ratios were calculated. Numerical variables were computed as means and standard deviations and Mann-Whitney U tests were performed. We then divided the data into a training (80%) and test (20%) set. The training set was used to train a predictive model based on the Random Forest algorithm and the test set was used to evaluate the predictive effectiveness of the model. Results: A statistically significant association was identified between need for intubation, as well as predominant systemic cardiovascular involvement, and hospital death. A number of the numerical variables analyzed (for instance Glasgow Coma Scale scores, mean arterial pressure, temperature, pH, and lactate, creatinine, albumin and bilirubin values) were also significantly associated with death outcome. The proposed binary Random Forest classifier obtained on the test set (n = 218) had an accuracy of 80.28%, sensitivity of 81.82%, specificity of 79.43%, positive predictive value of 73.26%, negative predictive value of 84.85%, F1 score of 0.74, and area under the curve score of 0.85. The predictive variables of the greatest importance were the maximum and minimum lactate values, adding up to a predictive importance of 15.54%. Conclusion: We demonstrated the efficacy of a Random Forest machine learning algorithm for handling clinical and laboratory data from patients under intensive monitoring. Therefore, we endorse the emerging notion that machine learning has great potential to provide us support to critically question existing methodologies, allowing improvements that reduce mortality.

Project description: Objectives: Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns with usability have generally constrained models to a preselected small set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and lack the ability to be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables that are already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data-driven, machine learning approach is compared to existing CDRs and traditional analytic methods using the prediction of sepsis in-hospital mortality as the use case. Methods: This was a retrospective study of adult ED visits admitted to the hospital meeting criteria for sepsis from October 2013 to October 2014. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (machine learning approach) was constructed using over 500 clinical variables from data available within the EHRs of four hospitals to predict in-hospital mortality. The machine learning prediction model was then compared to a classification and regression tree (CART) model, logistic regression model, and previously developed prediction tools on the validation data set using area under the receiver operating characteristic curve (AUC) and chi-square statistics. Results: There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. Of the 4,222 patients in the training group, 210 (5.0%) died during hospitalization, and of the 1,056 patients in the validation group, 50 (4.7%) died during hospitalization. The AUCs with 95% confidence intervals (CIs) for the different models were as follows: random forest model, 0.86 (95% CI = 0.82 to 0.90); CART model, 0.69 (95% CI = 0.62 to 0.77); logistic regression model, 0.76 (95% CI = 0.69 to 0.82); CURB-65, 0.73 (95% CI = 0.67 to 0.80); MEDS, 0.71 (95% CI = 0.63 to 0.77); and mREMS, 0.72 (95% CI = 0.65 to 0.79). The random forest model AUC was statistically different from all other models (p ≤ 0.003 for all comparisons). Conclusions: In this proof-of-concept study, a local big data-driven, machine learning approach outperformed existing CDRs as well as traditional analytic techniques for predicting in-hospital mortality of ED patients with sepsis. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes for high-risk sepsis patients. The methods developed serve as an example of a new model for predictive analytics in emergency care that can be automated, applied to other clinical outcomes of interest, and deployed in EHRs to enable locally relevant clinical predictions.
