Dataset Information

The Univariate Flagging Algorithm (UFA): An interpretable approach for predictive modeling.

ABSTRACT: In many data classification problems, a number of methods will give similar accuracy. However, when working with people who are not experts in data science, such as doctors, lawyers, and judges, finding interpretable algorithms can be a critical success factor. Practitioners have a deep understanding of the individual input variables but far less insight into how they interact with each other. For example, there may be ranges of an input variable for which the observed outcome is significantly more or less likely. This paper describes an algorithm for automatic detection of such thresholds, called the Univariate Flagging Algorithm (UFA). The algorithm searches for a separation that optimizes the difference between separated areas while obtaining a high level of support. We evaluate its performance using six sample datasets and demonstrate that thresholds identified by the algorithm align well with published results and known physiological boundaries. We also introduce two classification approaches that use UFA and show that the performance attained on unseen test data is comparable to or better than that of traditional classifiers when confidence intervals are considered. We identify conditions under which UFA performs well, including applications with large amounts of missing or noisy data, applications with a large number of inputs relative to observations, and applications where incidence of the target is low. We argue that ease of explanation of the results, robustness to missing data and noise, and detection of low-incidence adverse outcomes are desirable features for clinical applications that can be achieved with a relatively simple classifier such as UFA.
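
The threshold search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_threshold`, the `min_support` parameter, and the scoring rule (absolute difference in outcome rate between the two sides of a cut) are assumptions chosen for illustration, since the abstract does not specify the exact objective or support criterion. Missing values are simply skipped.

```python
from typing import Optional, Sequence, Tuple


def find_threshold(
    values: Sequence[Optional[float]],
    outcomes: Sequence[int],
    min_support: int = 5,
) -> Optional[Tuple[float, float]]:
    """Hypothetical univariate threshold search (illustrative only).

    Returns (threshold, rate_above - rate_below) for the cut that maximizes
    the absolute difference in outcome rate between the two sides, or None
    if no cut leaves at least `min_support` observations on each side.
    """
    # Drop observations with a missing input value, then sort by value.
    pairs = sorted((v, y) for v, y in zip(values, outcomes) if v is not None)
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)

    best: Optional[Tuple[float, float]] = None
    pos_below = 0
    for i in range(1, n):
        pos_below += pairs[i - 1][1]
        # A cut is only meaningful between two distinct input values.
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        n_below, n_above = i, n - i
        # Assumed support criterion: enough observations on both sides.
        if n_below < min_support or n_above < min_support:
            continue
        rate_below = pos_below / n_below
        rate_above = (total_pos - pos_below) / n_above
        diff = rate_above - rate_below
        if best is None or abs(diff) > abs(best[1]):
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, diff)
    return best


if __name__ == "__main__":
    x = [None, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(find_threshold(x, y, min_support=2))
```

A positive difference means the outcome is more frequent above the threshold, a negative one that it is more frequent below. Raising `min_support` trades sharper separations for thresholds backed by more observations.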

SUBMITTER: Sheth M

PROVIDER: S-EPMC6788700 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature


Publications

The Univariate Flagging Algorithm (UFA): An interpretable approach for predictive modeling.

Sheth M, Gerovitch A, Welsch R, Markuzon N

PLoS ONE, 2019-10-11, issue 10


PMID: 31603902

Similar Datasets

Project description: Background: The use of routine hospital data for understanding patterns of adverse outcomes has been limited in the past by the fact that pre-existing and post-admission conditions have been indistinguishable. The use of a 'Present on Admission' (or POA) indicator to distinguish pre-existing or co-morbid conditions from those arising during the episode of care has been advocated in the US for many years as a tool to support quality assurance activities and improve the accuracy of risk adjustment methodologies. The USA, Australia and Canada now all assign a flag to indicate the timing of onset of diagnoses. For quality improvement purposes, it is the 'not-POA' diagnoses (that is, those acquired in hospital) that are of interest. Methods: Our objective was to develop an algorithm for assessing the validity of assignment of 'not-POA' flags. We undertook expert review of the International Classification of Diseases, 10th Revision, Australian Modification (ICD-10-AM) to identify conditions that could not be plausibly hospital-acquired. The resulting computer algorithm was tested against all diagnoses flagged as complications in the Victorian (Australia) Admitted Episodes Dataset, 2005/06. Measures reported include rates of appropriate assignment of the new Australian 'Condition Onset' flag by ICD chapter, and patterns of invalid flagging. Results: Of 18,418 diagnosis codes reviewed, 93.4% (n = 17,195) reflected agreement on status for flagging by at least 2 of 3 reviewers (including 64.4% unanimous agreement; Fleiss' Kappa: 0.61). In tests of the new algorithm, 96.14% of all hospital-acquired diagnosis codes flagged were found to be valid in the Victorian records analysed. A lower proportion of individual codes was judged to be acceptably flagged (76.2%), but this reflected a high proportion of codes used <5 times in the data set (789/1035 invalid codes). Conclusion: An indicator variable about the timing of occurrence of diagnoses can greatly expand the use of routinely coded data for hospital quality improvement programmes. The data-cleaning instrument developed and tested here can help guide coding practice in those health systems considering this change in hospital coding. The algorithm embodies principles for development of coding standards and coder education that would result in improved data validity for routine use of non-POA information.

Project description: Background: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability. Objective: The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days upon hospitalization based on routinely available clinical and laboratory biomarkers. Methods: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods. Results: Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, c-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods with its sustainable prediction on Omicron cases (AUROC=0.903, 95% CI 0.897-0.910). Conclusions: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk ensures considerably reliable prediction. The use of a nationwide multicenter cohort in the model development and validation implicates generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into a routine clinical practice.

Project description: Seizure prediction might be the solution to tackle the apparent unpredictability of seizures in patients with drug-resistant epilepsy, which comprise about a third of all patients with epilepsy. Designing seizure prediction models involves defining the pre-ictal period, a transition stage between inter-ictal brain activity and the seizure discharge. This period is typically a fixed interval, with some recent studies reporting the evaluation of different patient-specific pre-ictal intervals. Recently, researchers have aimed to determine the pre-ictal period, a transition stage between regular brain activity and a seizure. Authors have been using deep learning models given the ability of such models to automatically perform pre-processing, feature extraction, and classification, and to handle temporal and spatial dependencies. As these approaches create black-box models, clinicians may not have sufficient trust to use them in high-stakes decisions. By considering these problems, we developed an evolutionary seizure prediction model that identifies the best set of features while automatically searching for the pre-ictal period and accounting for patient comfort. This methodology provides patient-specific interpretable insights, which might contribute to a better understanding of seizure generation processes and explain the algorithm's decisions. We tested our methodology on 238 seizures and 3687 h of continuous data, recorded on scalp recordings from 93 patients with several types of focal and generalised epilepsies. We compared the results with a seizure surrogate predictor and obtained a performance above chance for 32% of patients. We also compared our results with a control method based on the standard machine learning pipeline (pre-processing, feature extraction, classifier training, and post-processing), where the control marginally outperformed our approach by validating 35% of the patients. In total, 54 patients performed above chance for at least one method: our methodology or the control one. Of these 54 patients, 21 (≈38%) were solely validated by our methodology, while 24 (≈44%) were only validated by the control method. These findings may evidence the need for different methodologies concerning different patients.
