Dataset Information

Building more accurate decision trees with the additive tree.

ABSTRACT: The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

SUBMITTER: Luna JM

PROVIDER: S-EPMC6778203 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Building more accurate decision trees with the additive tree.

Luna José Marcio JM Gennatas Efstathios D ED Ungar Lyle H LH Eaton Eric E Diffenderfer Eric S ES Jensen Shane T ST Simone Charles B CB Friedman Jerome H JH Solberg Timothy D TD Valdes Gilmer G

Proceedings of the National Academy of Sciences of the United States of America 20190916 40

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision tr ...[more]

PMID: 31527280

Similar Datasets

Project description:Individualized treatment rules can improve health outcomes by recognizing that patients may respond differently to treatment and assigning therapy with the most desirable predicted outcome for each individual. Flexible and efficient prediction models are desired as a basis for such individualized treatment rules to handle potentially complex interactions between patient factors and treatment. Modern Bayesian semiparametric and nonparametric regression models provide an attractive avenue in this regard as these allow natural posterior uncertainty quantification of patient specific treatment decisions as well as the population wide value of the prediction-based individualized treatment rule. In addition, via the use of such models, inference is also available for the value of the optimal individualized treatment rules. We propose such an approach and implement it using Bayesian Additive Regression Trees as this model has been shown to perform well in fitting nonparametric regression functions to continuous and binary responses, even with many covariates. It is also computationally efficient for use in practice. With Bayesian Additive Regression Trees, we investigate a treatment strategy which utilizes individualized predictions of patient outcomes from Bayesian Additive Regression Trees models. Posterior distributions of patient outcomes under each treatment are used to assign the treatment that maximizes the expected posterior utility. We also describe how to approximate such a treatment policy with a clinically interpretable individualized treatment rule, and quantify its expected outcome. The proposed method performs very well in extensive simulation studies in comparison with several existing methods. We illustrate the usage of the proposed method to identify an individualized choice of conditioning regimen for patients undergoing hematopoietic cell transplantation and quantify the value of this method of choice in relation to the optimal individualized treatment rule as well as non-individualized treatment strategies.

Project description:There is today no gold standard method to accurately define the time passed since infection at HIV diagnosis. Infection timing and incidence measurement is however essential to better monitor the dynamics of local epidemics and the effect of prevention initiatives.Three methods for infection timing were evaluated using 237 serial samples from documented seroconversions and 566 cross sectional samples from newly diagnosed patients: identification of antibodies against the HIV p31 protein in INNO-LIA, SediaTM BED CEIA and SediaTM LAg-Avidity EIA. A multi-assay decision tree for infection timing was developed.Clear differences in recency window between BED CEIA, LAg-Avidity EIA and p31 antibody presence were observed with a switch from recent to long term infection a median of 169.5, 108.0 and 64.5 days after collection of the pre-seroconversion sample respectively. BED showed high reliability for identification of long term infections while LAg-Avidity is highly accurate for identification of recent infections. Using BED as initial assay to identify the long term infections and LAg-Avidity as a confirmatory assay for those classified as recent infection by BED, explores the strengths of both while reduces the workload. The short recency window of p31 antibodies allows to discriminate very early from early infections based on this marker. BED recent infection results not confirmed by LAg-Avidity are considered to reflect a period more distant from the infection time. False recency predictions in this group can be minimized by elimination of patients with a CD4 count of less than 100 cells/mm3 or without no p31 antibodies. For 566 cross sectional sample the outcome of the decision tree confirmed the infection timing based on the results of all 3 markers but reduced the overall cost from 13.2 USD to 5.2 USD per sample.A step-wise multi assay decision tree allows accurate timing of the HIV infection at diagnosis at affordable effort and cost and can be an important new tool in studies analyzing the dynamics of local epidemics or the effects of prevention strategies.

Dataset Information

Building more accurate decision trees with the additive tree.

Publications

Building more accurate decision trees with the additive tree.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets