Dataset Information

Improving Automated Pediatric Bone Age Estimation Using Ensembles of Models from the 2017 RSNA Machine Learning Challenge.

ABSTRACT:

Purpose

To investigate improvements in performance for automatic bone age estimation that can be gained through model ensembling.

Materials and methods

A total of 48 submissions from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge were used. Participants were provided with 12 611 pediatric hand radiographs with bone ages determined by a pediatric radiologist to develop models for bone age determination. The final results were determined using a test set of 200 radiographs labeled with the weighted average of six ratings. The mean pairwise model correlation and performance of all possible model combinations for ensembles of up to 10 models using the mean absolute deviation (MAD) were evaluated. A bootstrap analysis using the 200 test radiographs was conducted to estimate the true generalization MAD.
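The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that an ensemble prediction is the unweighted mean of its members' predictions and that model correlation is computed on prediction errors; the abstract specifies neither. All function names are hypothetical, the data are synthetic, and the bootstrap shown is a plain resampling of test cases, not necessarily the procedure used in the study.

```python
import itertools
import random
import statistics


def mad(pred, truth):
    """Mean absolute deviation (months) between predictions and reference ages."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def ensemble_mean(models):
    """Average the predictions of several models case by case (assumed rule)."""
    return [sum(vals) / len(vals) for vals in zip(*models)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_correlation(models, truth):
    """Mean Pearson correlation of prediction errors over all model pairs."""
    residuals = [[p - t for p, t in zip(m, truth)] for m in models]
    pairs = list(itertools.combinations(residuals, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def best_ensemble(preds, truth, max_size):
    """Exhaustively score every combination of up to max_size models."""
    best = None
    for k in range(1, max_size + 1):
        for names in itertools.combinations(sorted(preds), k):
            score = mad(ensemble_mean([preds[n] for n in names]), truth)
            if best is None or score < best[0]:
                best = (score, names)
    return best


def bootstrap_mad(pred, truth, n_boot=1000, seed=0):
    """Mean MAD over bootstrap resamples (with replacement) of the test cases."""
    rng = random.Random(seed)
    n = len(truth)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(mad([pred[i] for i in idx], [truth[i] for i in idx]))
    return statistics.fmean(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.uniform(12, 216) for _ in range(200)]  # synthetic ages, months
    preds = {f"model_{i}": [t + rng.gauss(0, 6) for t in truth] for i in range(5)}
    score, names = best_ensemble(preds, truth, max_size=3)
    members = [preds[n] for n in names]
    print(names, round(score, 2))
    if len(members) > 1:
        print("mean pairwise correlation:", mean_pairwise_correlation(members, truth))
    print("bootstrap MAD:", bootstrap_mad(ensemble_mean(members), truth))
```

The number of combinations grows quickly with the size of the model pool, so an exhaustive search like this one is only practical for small pools or small ensemble sizes.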

Results

The estimated generalization MAD of a single model was 4.55 months. The best-performing ensemble consisted of four models with an MAD of 3.79 months. The mean pairwise correlation of models within this ensemble was 0.47. In comparison, the lowest achievable MAD by combining the highest-ranking models based on individual scores was 3.93 months using eight models with a mean pairwise model correlation of 0.67.

Conclusion

Combining less-correlated, high-performing models resulted in better performance than naively combining the top-performing models. Machine learning competitions within radiology should be encouraged to spur development of heterogeneous models whose predictions can be combined to achieve optimal performance. © RSNA, 2019. Supplemental material is available for this article. See also the commentary by Siegel in this issue.

SUBMITTER: Pan I

PROVIDER: S-EPMC6884060 | biostudies-literature |

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Background: Excess adiposity in children is strongly correlated with obesity-related metabolic disease in adulthood, including diabetes, cardiovascular disease, and 13 types of cancer. Despite the many long-term health risks of childhood obesity, body mass index (BMI) Z-score is typically the only adiposity marker used in pediatric studies and clinical applications. The effects of regional adiposity are not captured in a single scalar measurement, and their effects on short- and long-term metabolic health are largely unknown. However, clinicians and researchers rarely deploy gold-standard methods for measuring compartmental fat such as magnetic resonance imaging (MRI) and dual X-ray absorptiometry (DXA) on children and adolescents due to cost or radiation concerns. Three-dimensional optical (3DO) scans are relatively inexpensive to obtain and use non-invasive and radiation-free imaging techniques to capture the external surface geometry of a patient's body. This 3D shape contains cues about the body composition that can be learned from a structured correlation between 3D body shape parameters and reference DXA scans obtained on a sample population. Study aim: This study seeks to introduce a radiation-free, automated 3D optical imaging solution for monitoring body shape and composition in children aged 5-17. Methods: We introduce an automated, linear learning method to predict total and regional body composition of children aged 5-17 from 3DO scans. We collected 145 male and 206 female 3DO scans on children between the ages of 5 and 17 with three scanners from independent manufacturers. We used an automated shape templating method first introduced on an adult population to fit a topologically consistent 60,000 vertex (60 k) mesh to 3DO scans of arbitrary scanning source and mesh topology. We constructed a parameterized body shape space using principal component analysis (PCA) and estimated a regression matrix between the shape parameters and their associated DXA measurements. We automatically fit scans of 30 male and 38 female participants from a held-out test set and predicted 12 body composition measurements. Results: The coefficient of determination (R²) between 3DO predicted body composition and DXA measurements was at least 0.85 for all measurements with the exception of visceral fat on 3D scan predictions. Precision error was 1-4 times larger than that of DXA. No predicted variable was significantly different from DXA measurement except for male trunk lean mass. Conclusion: Optical imaging can quickly, safely, and inexpensively estimate regional body composition in children aged 5-17. Frequent repeat measurements can be taken to chart changes in body adiposity over time without risk of radiation overexposure.

Project description: Motivation: Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for non-experts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. Results: We tested our approach on two datasets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using auto-sklearn, surpassed standalone ML algorithms such as SVM and random forest in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers (Non-OC). Auto-sklearn employed a mix of algorithms and ensemble techniques, yielding a superior performance (AUC of 0.97 for RCC and 0.85 for OC). Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science. Availability: https://github.com/obifarin/automl-xai-metabolomics.

Project description: The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. However, a single machine learning model can be outperformed by a "committee" of models (machine learning ensembles) that can reduce prediction bias, variance, or both and is able to better capture the underlying distribution of the data. Yet, there are many aspects to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated as previous studies considered all data available to predict yields. This paper provides a machine learning-based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models with RRMSE of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while other ensemble models also outperformed the base learners in terms of bias. On the contrary, although random k-fold cross-validation is replaced by the blocked sequential procedure, it is shown that stacked ensembles do not perform as well as weighted ensemble models for time series data sets as they require the data to be non-IID to perform favorably. Comparing our proposed model forecasts with the literature demonstrates the acceptable performance of forecasts made by our proposed ensemble model. Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts with RRMSE of 9.2% can be made as early as June 1st. Moreover, it was shown that the proposed model performed better than individual models and benchmark ensembles at agricultural district and state-level scales as well as county-level scale. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that is the basis for finding feature importance for the ensemble model. The findings suggest that weather features corresponding to weather in weeks 18-24 (May 1st to June 1st) are the most important input features.
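The two error measures and the weighted-ensemble idea mentioned above can be sketched as follows. This is an illustration only, not the authors' implementation. It assumes the commonly used definitions (RRMSE as the root mean squared error divided by the mean observed value, in percent; MBE as the mean of prediction minus observation), and it replaces the paper's weight optimization with a simple grid search over two models. All function names are hypothetical.

```python
def rrmse(pred, obs):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    n = len(obs)
    rmse = (sum((p - o) ** 2 for p, o in zip(pred, obs)) / n) ** 0.5
    return 100.0 * rmse / (sum(obs) / n)


def mbe(pred, obs):
    """Mean bias error: positive values indicate over-prediction on average."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def weighted_ensemble(models, weights):
    """Combine model predictions case by case with the given weights."""
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*models)]


def best_two_model_weight(m1, m2, obs, steps=100):
    """Grid-search the weight w on m1 (1 - w on m2) that minimizes squared error.

    In practice obs and the predictions should be held-out (for example
    out-of-bag) values, so the weights are not tuned on training data.
    """
    def sse(w):
        combined = weighted_ensemble([m1, m2], [w, 1.0 - w])
        return sum((p - o) ** 2 for p, o in zip(combined, obs))

    return min((i / steps for i in range(steps + 1)), key=sse)


if __name__ == "__main__":
    # Synthetic yields; units would be kg/ha in the setting described above.
    obs = [9000.0, 10000.0, 11000.0, 10500.0]
    model_a = [9400.0, 10300.0, 11500.0, 10900.0]  # tends to over-predict
    model_b = [8800.0, 9900.0, 10700.0, 10400.0]   # tends to under-predict
    w = best_two_model_weight(model_a, model_b, obs)
    combined = weighted_ensemble([model_a, model_b], [w, 1.0 - w])
    print("weight on model_a:", w)
    print("RRMSE (%):", rrmse(combined, obs), "MBE:", mbe(combined, obs))
```

An average ensemble is the special case in which every weight equals one divided by the number of models.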

Project description: Introduction: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystalised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions. Objectives: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening. Methods: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance. Results: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets. Conclusion: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.