Dataset Information

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning.


ABSTRACT:

Importance

Childhood lead poisoning causes irreversible neurobehavioral deficits, but current practice is secondary prevention.

Objective

To validate a machine learning (random forest) prediction model of elevated blood lead levels (EBLLs) by comparison with a parsimonious logistic regression.

Design, setting, and participants

This prognostic study for temporal validation of multivariable prediction models used data from the Women, Infants, and Children (WIC) program of the Chicago Department of Public Health. Participants included a development cohort of children born from January 1, 2007, to December 31, 2012, and a validation WIC cohort born from January 1 to December 31, 2013. Blood lead levels were measured until December 31, 2018. Data were analyzed from January 1 to October 31, 2019.
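The temporal split described above can be sketched as a simple rule on birth date. This is a minimal illustration, not the study's code: the cohort boundaries are the ones stated in the abstract, while the function name `assign_cohort` and the example records are hypothetical.

```python
from datetime import date

# Cohort boundaries as stated in the abstract.
DEV_START, DEV_END = date(2007, 1, 1), date(2012, 12, 31)
VAL_START, VAL_END = date(2013, 1, 1), date(2013, 12, 31)


def assign_cohort(birth_date):
    """Return 'development', 'validation', or None for a given birth date."""
    if DEV_START <= birth_date <= DEV_END:
        return "development"
    if VAL_START <= birth_date <= VAL_END:
        return "validation"
    return None  # born outside both windows, so not used in either cohort


# Hypothetical records (child_id, birth_date), for illustration only.
children = [
    ("a", date(2006, 12, 31)),
    ("b", date(2007, 1, 1)),
    ("c", date(2012, 12, 31)),
    ("d", date(2013, 6, 15)),
    ("e", date(2014, 1, 1)),
]
cohorts = {child_id: assign_cohort(born) for child_id, born in children}
```

Splitting by birth year rather than at random means the validation cohort is strictly later than the development cohort, which is what makes the validation temporal.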

Exposures

Blood lead level test results; lead investigation findings; housing characteristics, permits, and violations; and demographic variables.

Main outcomes and measures

Incident EBLL (≥6 μg/dL). Models were assessed using the area under the receiver operating characteristic curve (AUC) and confusion matrix metrics (positive predictive value, sensitivity, and specificity) at various thresholds.
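As a rough sketch of how these measures are defined, the following plain-Python example computes the AUC and the confusion matrix metrics when a fixed fraction of the highest-scoring children is flagged. The function names, the toy labels and scores, and the choice to round the flagged count up are all assumptions made for illustration; the abstract does not describe the study's implementation.

```python
import math


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(n^2) and is meant
    for illustration rather than for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_top_fraction(labels, scores, fraction):
    """Flag the top `fraction` of scores and return PPV, sensitivity, specificity."""
    k = math.ceil(fraction * len(labels))  # assumption: round the flagged count up
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    flagged = order[:k]
    tp = sum(labels[i] for i in flagged)   # flagged children who have an EBLL
    fp = k - tp                            # flagged children who do not
    fn = sum(labels) - tp                  # EBLL cases that were not flagged
    tn = len(labels) - k - fn              # everyone else
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data, not from the study: 1 marks an incident EBLL.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

toy_auc = auc(labels, scores)
toy_metrics = metrics_at_top_fraction(labels, scores, 0.2)
```

Fixing the flagged fraction, rather than a probability cutoff, makes two models comparable at the same intervention capacity: both flag the same number of children.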

Results

Among 6812 children in the WIC validation cohort, 3451 (50.7%) were female, 3057 (44.9%) were Hispanic, 2804 (41.2%) were non-Hispanic Black, 458 (6.7%) were non-Hispanic White, and 442 (6.5%) were Asian (mean [SD] age, 5.5 [0.3] years). The median year of housing construction was 1919 (interquartile range, 1903-1948). Random forest AUC was 0.69 compared with 0.64 for logistic regression (difference, 0.05; 95% CI, 0.02-0.08). When predicting the 5% of children at highest risk to have EBLLs, random forest and logistic regression models had positive predictive values of 15.5% and 7.8%, respectively (difference, 7.7%; 95% CI, 3.7%-11.3%), sensitivity of 16.2% and 8.1%, respectively (difference, 8.1%; 95% CI, 3.9%-11.7%), and specificity of 95.5% and 95.1% (difference, 0.4%; 95% CI, 0.0%-0.7%).
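The reported figures at the 5% threshold can be cross-checked with back-of-the-envelope arithmetic. The sketch below takes only the percentages from the abstract and derives approximate counts. These derived counts are not reported in the source, they inherit rounding error from the published percentages, and the helper name `implied_counts` is hypothetical.

```python
N_CHILDREN = 6812                 # validation cohort size, from the abstract
FLAGGED = round(0.05 * N_CHILDREN)  # top 5% at highest risk, about 341 children


def implied_counts(ppv, sensitivity):
    """Derive approximate counts from a reported PPV and sensitivity.

    Returns (true positives, total EBLL cases, implied specificity).
    All values are approximations, not figures from the study.
    """
    true_pos = ppv * FLAGGED             # PPV = TP / flagged
    cases = true_pos / sensitivity       # sensitivity = TP / all cases
    false_pos = FLAGGED - true_pos
    non_cases = N_CHILDREN - cases
    specificity = (non_cases - false_pos) / non_cases
    return true_pos, cases, specificity


# Percentages as reported in the abstract.
rf_tp, rf_cases, rf_spec = implied_counts(0.155, 0.162)   # random forest
lr_tp, lr_cases, lr_spec = implied_counts(0.078, 0.081)   # logistic regression
```

Under these assumptions, the specificities implied by the reported positive predictive values and sensitivities land close to the reported 95.5% and 95.1%, which suggests the three metrics are mutually consistent. Because both models flag the same number of children, the random forest's roughly doubled positive predictive value translates into roughly twice as many EBLL cases among those flagged.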

Conclusions and relevance

The machine learning model outperformed regression in predicting childhood lead poisoning, especially in identifying children at highest risk. Such a model could be used to target the allocation of lead poisoning prevention resources to these children.

SUBMITTER: Potash E 

PROVIDER: S-EPMC7495240 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Validation of a Machine Learning Model to Predict Childhood Lead Poisoning.

Potash, Eric; Ghani, Rayid; Walsh, Joe; Jorgensen, Emile; Lohff, Cortland; Prachand, Nik; Mansour, Raed

JAMA Network Open. 2020 Sep 1; issue 9


Similar Datasets

| S-EPMC7610191 | biostudies-literature
| S-EPMC11624443 | biostudies-literature
| S-EPMC11722487 | biostudies-literature
| S-EPMC9937004 | biostudies-literature
| S-EPMC9281065 | biostudies-literature
| S-EPMC9897265 | biostudies-literature
| S-EPMC9583033 | biostudies-literature
| S-EPMC11909980 | biostudies-literature
| S-EPMC10101605 | biostudies-literature
| S-EPMC11925937 | biostudies-literature