
Dataset Information


Assessing the calibration in toxicological in vitro models with conformal prediction.

ABSTRACT: Machine learning methods are widely used in drug discovery and toxicity prediction. While they show good overall performance in cross-validation studies, their predictive power often drops when the query samples have drifted from the descriptor space of the training data. Thus, the assumption underlying the application of machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, composed of the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the predictions on the external sets, a strategy of exchanging the calibration set with more recent data, such as Tox21Test, was successfully introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, which exchanges only the calibration data, is convenient because it does not require retraining of the underlying model.

SUBMITTER: Morger A

PROVIDER: S-EPMC8082859 | biostudies-literature

REPOSITORIES: biostudies-literature
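The calibration check and the calibration-set exchange described in the abstract can be illustrated with a minimal sketch. It assumes a class-conditional (Mondrian) inductive conformal predictor with nonconformity 1 - P(class | x), a hypothetical fixed model `model_proba`, and synthetic data in which a hypothetical `shift` parameter changes P(y | x) between "releases". None of these names or data come from the study; the sketch is not the authors' implementation. It compares the observed class-wise error rates with the significance level ε, then swaps only the calibration set while leaving the model untouched.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model_proba(x):
    """Hypothetical stand-in for an already-trained classifier: P(y=1 | x) = sigmoid(2x)."""
    p1 = sigmoid(2.0 * x)
    return np.column_stack([1.0 - p1, p1])


def sample(n, shift):
    """Toy data; the hypothetical `shift` changes P(y | x) to mimic drift between releases."""
    x = rng.normal(size=n)
    y = (rng.random(n) < sigmoid(2.0 * x - shift)).astype(int)
    return x, y


def mondrian_p_values(cal_proba, cal_labels, test_proba):
    """Class-conditional (Mondrian) inductive conformal p-values, shape (n_test, n_classes)."""
    p = np.empty(test_proba.shape, dtype=float)
    for c in range(test_proba.shape[1]):
        # Nonconformity = 1 - P(c | x), computed only on calibration examples of class c.
        cal_scores = np.sort(1.0 - cal_proba[cal_labels == c, c])
        test_scores = 1.0 - test_proba[:, c]
        # Count calibration scores that are at least as nonconforming as each test score.
        n_ge = len(cal_scores) - np.searchsorted(cal_scores, test_scores, side="left")
        p[:, c] = (n_ge + 1) / (len(cal_scores) + 1)
    return p


def classwise_error(p, y, eps):
    """Per-class fraction of samples whose true label is excluded from the set {c : p_c > eps}."""
    p_true = p[np.arange(len(y)), y]
    return {int(c): float(np.mean(p_true[y == c] <= eps)) for c in np.unique(y)}


eps = 0.2
x_old, y_old = sample(5000, shift=0.0)   # calibration data from the training-era distribution
x_new, y_new = sample(5000, shift=1.5)   # more recent data, playing the role of Tox21Test
x_ext, y_ext = sample(10000, shift=1.5)  # external data, playing the role of Tox21Score

# The model is never retrained; only the calibration set is exchanged.
err_old = classwise_error(mondrian_p_values(model_proba(x_old), y_old, model_proba(x_ext)), y_ext, eps)
err_new = classwise_error(mondrian_p_values(model_proba(x_new), y_new, model_proba(x_ext)), y_ext, eps)
print("old calibration set:", err_old)
print("new calibration set:", err_new)
```

In this toy setting, the class-0 error on the drifted external data exceeds ε when the old calibration set is used, which signals miscalibration. With the newer calibration set, both class-wise error rates sit close to ε again.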


Similar Datasets

A tutorial on calibration measurements and calibration models for clinical prediction models.

Project description: Our primary objective is to provide the clinical informatics community with an introductory tutorial on calibration measurements and calibration models for predictive models using existing R packages and custom implemented code in R on real and simulated data. Clinical predictive model performance is commonly published based on discrimination measures, but use of models for individualized predictions requires adequate model calibration. This tutorial is intended for clinical researchers who want to evaluate predictive models in terms of their applicability to a particular population. It is also for informaticians and for software engineers who want to understand the role that calibration plays in the evaluation of a clinical predictive model, and to provide them with a solid starting point to consider incorporating calibration evaluation and calibration models in their work. Covered topics include (1) an introduction to the importance of calibration in the clinical setting, (2) an illustration of the distinct roles that discrimination and calibration play in the assessment of clinical predictive models, (3) a tutorial and demonstration of selected calibration measurements, (4) a tutorial and demonstration of selected calibration models, and (5) a brief discussion of limitations of these methods and practical suggestions on how to use them in practice.

| S-EPMC7075534 | biostudies-literature
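One simple calibration measurement is a binned reliability table, which compares the mean predicted risk with the observed event rate in each probability bin, summarized as a count-weighted expected calibration error (ECE). The sketch below assumes binary outcomes and equal-width bins; the function names and binning choice are illustrative assumptions, and the tutorial itself works in R and may cover different measures.

```python
def reliability_table(pred, obs, n_bins=5):
    """Equal-width probability bins: (bin index, count, mean predicted risk, observed event rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(pred, obs):
        # A prediction of exactly 1.0 is placed in the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    table = []
    for i, members in enumerate(bins):
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(o for _, o in members) / len(members)
            table.append((i, len(members), mean_pred, obs_rate))
    return table


def expected_calibration_error(pred, obs, n_bins=5):
    """Count-weighted mean absolute gap between predicted risk and observed rate across bins."""
    n = len(pred)
    return sum(count / n * abs(mean_pred - obs_rate)
               for _, count, mean_pred, obs_rate in reliability_table(pred, obs, n_bins))


pred = [0.1, 0.1, 0.9, 0.9]
obs = [0, 0, 1, 1]
print(reliability_table(pred, obs))
print(expected_calibration_error(pred, obs))
```

A well-calibrated model has mean predicted risk close to the observed rate in every populated bin, so its ECE approaches zero.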

Addressing bias in prediction models by improving subpopulation calibration.

Project description:
Objective: To illustrate the problem of subpopulation miscalibration, to adapt an algorithm for recalibration of the predictions, and to validate its performance.
Materials and methods: In this retrospective cohort study, we evaluated the calibration of predictions based on the Pooled Cohort Equations (PCE) and the fracture risk assessment tool (FRAX) in the overall population and in subpopulations defined by the intersection of age, sex, ethnicity, socioeconomic status, and immigration history. We next applied the recalibration algorithm and assessed the change in calibration metrics, including calibration-in-the-large.
Results: 1 021 041 patients were included in the PCE population, and 1 116 324 patients were included in the FRAX population. Baseline overall model calibration of the 2 tested models was good, but calibration in a substantial portion of the subpopulations was poor. After applying the algorithm, subpopulation calibration statistics were greatly improved, with the variance of the calibration-in-the-large values across all subpopulations reduced by 98.8% and 94.3% in the PCE and FRAX models, respectively.
Discussion: Prediction models in medicine are increasingly common. Calibration, the agreement between predicted and observed risks, is commonly poor for subpopulations that were underrepresented in the development set of the models, resulting in bias and reduced performance for these subpopulations. In this work, we empirically evaluated an adapted version of the fairness algorithm designed by Hebert-Johnson et al. (2017) and demonstrated its use in improving subpopulation miscalibration.
Conclusion: A postprocessing and model-independent fairness algorithm for recalibration of predictive models greatly decreases the bias of subpopulation miscalibration and thus increases fairness and equality.

| S-EPMC7936516 | biostudies-literature

Distributional conformal prediction.

Project description: We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems, including cross-sectional prediction, k-step-ahead forecasts, synthetic controls and counterfactual prediction, and individual treatment effects prediction. Our method exploits the probability integral transform and relies on permuting estimated ranks. Unlike regression residuals, ranks are independent of the predictors, allowing us to construct conditionally valid prediction intervals under heteroskedasticity. We establish approximate conditional validity under consistent estimation and provide approximate unconditional validity under model misspecification, under overfitting, and with time series data. We also propose a simple "shape" adjustment of our baseline method that yields optimal prediction intervals.

| S-EPMC8640792 | biostudies-literature

Assessing the Toxicological Relevance of Nanomaterial Agglomerates and Aggregates Using Realistic Exposure In Vitro.

Project description: Low-dose repeated exposures are considered more relevant and realistic for assessing the health risks of nanomaterials (NM), because human exposure, such as in the workplace, occurs at low doses and in a repeated manner. Thus, in a three-week study, we assessed the biological effects (cell viability, cell proliferation, oxidative stress, pro-inflammatory response, and DNA damage) of titanium dioxide nanoparticle (TiO2 NP) agglomerates and synthetic amorphous silica (SAS) aggregates of different sizes in human bronchial epithelial (HBE), colon epithelial (Caco2), and human monocytic (THP-1) cell lines repeatedly exposed to a non-cytotoxic dose (0.76 µg/cm2). Neither of the two TiO2 NPs, in either agglomeration state, induced any effects (compared to control) in any of the cell lines tested, while SAS aggregates induced some significant effects only in HBE cell cultures. In a second set of experiments, HBE cell cultures were exposed repeatedly to different SAS suspensions for two weeks (first and second exposure cycles) and then allowed to recover without SAS exposure for a week (recovery period). SAS aggregates of larger size (~2.5 µm) significantly affected cell proliferation, IL-6, IL-8, and total glutathione at the end of both exposure cycles, whereas their nanosized counterparts (less than 100 nm) induced more pronounced effects only at the end of the first exposure cycle. As in our previous short-term (24 h) exposure study, large SAS aggregates appeared to be as potent as nanosized aggregates. This study also suggests that SAS aggregates larger than 100 nm are toxicologically relevant and should be considered in risk assessment.

| S-EPMC8308261 | biostudies-literature

An Analysis of Proteochemometric and Conformal Prediction Machine Learning Protein-Ligand Binding Affinity Models.

Project description: Protein-ligand binding affinity is a key pharmacodynamic endpoint in drug discovery. Sole reliance on experimental design, make, and test cycles is costly and time-consuming, providing an opportunity for computational methods to assist. Herein, we compare random forest and feed-forward neural network proteochemometric models for their ability to predict pIC50 measurements for held-out generic Bemis-Murcko scaffolds. In addition, we assess the ability of conformal prediction to provide calibrated prediction intervals in both a retrospective and a semi-prospective test, using the recently released Grand Challenge 4 data set as an external test set. Overall, the random forest and deep neural network proteochemometric models show good retrospective performance but perform worse in the semi-prospective setting. However, the conformal prediction intervals prove to be well calibrated both retrospectively and semi-prospectively, showing that they can be used to guide hit discovery and lead optimization campaigns.

| S-EPMC7328444 | biostudies-literature
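Calibrated prediction intervals of the kind assessed above can be sketched with split conformal regression, using absolute residuals on a held-out calibration set as nonconformity scores. This is a generic textbook construction, assumed here for illustration; the record does not state which conformal variant the study used, and `split_conformal_halfwidth` is a hypothetical helper name.

```python
import math


def split_conformal_halfwidth(cal_y, cal_pred, alpha):
    """Half-width q so that [pred - q, pred + q] has marginal coverage >= 1 - alpha
    under exchangeability (split conformal with absolute-residual scores)."""
    scores = sorted(abs(y - p) for y, p in zip(cal_y, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


cal_y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [0.0] * 9
q = split_conformal_halfwidth(cal_y, cal_pred, alpha=0.5)
print(q)  # 5.0
```

For a new compound with predicted pIC50 `y_hat`, the interval would be `[y_hat - q, y_hat + q]`; coverage holds only if the new compound is exchangeable with the calibration set, which is exactly what a semi-prospective test probes.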

Detection of calibration drift in clinical prediction models to inform model updating.

Project description: Model calibration, critical to the success and safety of clinical prediction models, deteriorates over time in response to the dynamic nature of clinical environments. To support informed, data-driven model updating strategies, we present and evaluate a calibration drift detection system. Methods are developed for maintaining dynamic calibration curves with optimized online stochastic gradient descent and for detecting increasing miscalibration with adaptive sliding windows. These methods are generalizable to support diverse prediction models developed using a variety of learning algorithms and customizable to address the unique needs of clinical use cases. In both simulation and case studies, our system accurately detected calibration drift. When drift is detected, our system further provides actionable alerts by including information on a window of recent data that may be appropriate for model updating. Simulations showed these windows were primarily composed of data accruing after drift onset, supporting the potential utility of the windows for model updating. By promoting model updating as calibration deteriorates rather than on pre-determined schedules, implementations of our drift detection system may minimize interim periods of insufficient model accuracy and focus analytic resources on those models most in need of attention.

| S-EPMC8627243 | biostudies-literature

Toxicological Analysis of Hepatocytes Using FLIM Technique: In Vitro versus Ex Vivo Models.

Project description: The search for new criteria indicating acute or chronic pathological processes resulting from exposure to toxic agents, the testing of drugs for potential hepatotoxicity, and the fundamental study of hepatotoxicity mechanisms at the molecular level remain challenging and require adequate research models and tools. Microfluidic chips (MFCs) offer a promising in vitro model for express analysis and are easy to implement. However, more complex models are needed to obtain comprehensive information. Fluorescence-lifetime imaging microscopy (FLIM) is a fundamentally new, label-free approach for studying liver pathology. We obtained FLIM data on both the free and bound forms of NAD(P)H, which are associated with different metabolic pathways. In clinical cases, liver pathology caused by overdose most often results from acetaminophen (APAP) or alcohol (ethanol). Therefore, we studied and compared the metabolic state of hepatocytes in various experimental models of APAP and ethanol hepatotoxicity. We identified potential diagnostic criteria, including pathologically altered hepatocyte metabolism in the early stages of toxic damage, such as pronounced changes in the contribution of the bound form of NAD(P)H. In contrast to the MFCs, the changes in the metabolic state of hepatocytes in the ex vivo models are, to a greater extent, associated with compensatory processes. Thus, MFCs in combination with FLIM can be applied as an effective tool set for the express modeling and diagnosis of hepatotoxicity in clinics.

| S-EPMC8616382 | biostudies-literature

Assessing quantile prediction with censored quantile regression models.

Project description: An important goal of censored quantile regression is to provide reliable predictions of survival quantiles, which are often reported in practice to offer robust and comprehensive biomedical summaries. However, formal methods for evaluating and comparing working quantile regression models in terms of their performance in predicting survival quantiles have been lacking, especially when the working models are subject to model misspecification. In this article, we propose a sensible and rigorous framework to fill this gap. We introduce and justify a predictive performance measure based on the check loss function. We derive estimators of the proposed measure and study their distributional properties and the corresponding inference procedures. More importantly, we develop model comparison procedures that enable thorough evaluations of predictive performance among nested or non-nested models. Our proposals properly accommodate random censoring of the survival outcome and the realistic complication of model misspecification, and are thus generally applicable. Extensive simulations and a real data example demonstrate the satisfactory performance of the proposed methods in real-life settings.

| S-EPMC5462897 | biostudies-literature
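The check loss behind the predictive performance measure above is ρ_τ(u) = u(τ - 1{u < 0}), evaluated at the residual u = y - q between an outcome and a predicted τ-quantile. The sketch below shows only this uncensored building block. It deliberately ignores censoring, which the described framework explicitly accommodates, and the function names are illustrative assumptions rather than the authors' estimators.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), with residual u = y - q."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_check_loss(y, q_pred, tau):
    """Average check loss of predicted tau-quantiles; does NOT handle censored outcomes."""
    return sum(check_loss(yi - qi, tau) for yi, qi in zip(y, q_pred)) / len(y)


print(mean_check_loss([1.0, 3.0], [2.0, 2.0], tau=0.5))  # 0.5
```

Lower mean check loss indicates better quantile predictions; for τ = 0.5 the loss is half the absolute error, so the median minimizes it.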

Assessing the transportability of clinical prediction models for cognitive impairment using causal models.

Project description:
Background: Machine learning models promise to support diagnostic predictions but may not perform well in new settings. Selecting the best model for a new setting without available data is challenging. We aimed to investigate the transportability, in terms of calibration and discrimination, of prediction models for cognitive impairment in simulated external settings with different distributions of demographic and clinical characteristics.
Methods: We mapped and quantified relationships between variables associated with cognitive impairment using causal graphs, structural equation models, and data from the ADNI study. These estimates were then used to generate datasets and evaluate prediction models with different sets of predictors. We measured transportability to external settings under guided interventions on age, APOE ε4, and tau protein, using performance differences between internal and external settings measured by calibration metrics and the area under the receiver operating characteristic curve (AUC).
Results: Calibration differences indicated that models predicting with causes of the outcome were more transportable than those predicting with consequences. AUC differences indicated inconsistent trends of transportability between the different external settings. Models predicting with consequences tended to show higher AUC in the external settings than in the internal settings, while models predicting with parents or all variables showed similar AUC.
Conclusions: We demonstrated with a practical prediction task that predicting with causes of the outcome results in better transportability than anti-causal prediction when considering calibration differences. We conclude that calibration performance is crucial when assessing model transportability to external settings.

| S-EPMC10439645 | biostudies-literature

Assessing risk model calibration with missing covariates.

Project description: When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case-control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e. bias in model predictions), which is quantified by the ratio of the number of observed events, $\mathcal{O}$, to expected events, $\mathcal{E}$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration that requires the weighted sum of the auxiliary statistics in the complete data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $\mathcal{E}$. We derive analytic variance formulas for $\mathcal{O}/\mathcal{E}$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.

| S-EPMC9608650 | biostudies-literature
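The calibration measure $\mathcal{O}/\mathcal{E}$ above can be computed directly once weights are known. This minimal sketch assumes plain inverse probability weights (for example, the reciprocal of each subject's sampling probability in a case-cohort design); it does not implement the adjusted weights with a pseudo-risk auxiliary statistic or the variance formulas described in the record, and `observed_expected_ratio` is a hypothetical name.

```python
def observed_expected_ratio(events, risks, weights=None):
    """O/E: weighted count of observed events divided by weighted sum of model-predicted risks."""
    if weights is None:
        weights = [1.0] * len(events)
    observed = sum(w * e for w, e in zip(weights, events))
    expected = sum(w * r for w, r in zip(weights, risks))
    return observed / expected


events = [1, 0, 1, 0]
risks = [0.5, 0.5, 0.25, 0.25]
print(observed_expected_ratio(events, risks))
print(observed_expected_ratio(events, risks, weights=[2.0, 2.0, 1.0, 1.0]))
```

A ratio above 1 means the model predicts fewer events than are observed (underprediction); a ratio below 1 indicates overprediction.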
