Dataset Information

Assessing the geographic specificity of pH prediction by classification and regression trees.

ABSTRACT: Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York's Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.52%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from those of the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage restricting the application of relevant machine learning models to the regions from which their training data were collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH.
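The abstract compares models by relative root mean squared error (RRMSE) but this record does not state how the metric was normalized. As a minimal sketch, assuming the common definition of RMSE divided by the mean of the observed values and expressed as a percentage, the metric could be computed as follows. The function name and the pH values are hypothetical and are not taken from the study.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent.

    Assumption: RMSE is normalized by the mean of the observed values.
    Other normalizations (for example, by the observed range) also exist.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Mean of squared residuals between observed and predicted values
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical soil pH measurements and model predictions (illustration only)
observed_ph = [5.0, 6.0, 7.0]
predicted_ph = [5.5, 6.0, 6.5]
print(f"RRMSE = {rrmse(observed_ph, predicted_ph):.2f}%")
```

Under this definition a lower value means predictions sit closer to the observations relative to the typical pH level, which is the sense in which the abstract ranks the three models.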

SUBMITTER: Egelberg J

PROVIDER: S-EPMC8357141 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

Assessing the geographic specificity of pH prediction by classification and regression trees.

Egelberg, Jacob; Pena, Nina; Rivera, Rachel; Andruk, Christina

PLoS ONE, 2021-08-11, issue 8


PMID: 34379630

Similar Datasets

Project description:

Objectives: Little research has been done in pharmacoepidemiology on the use of machine learning for exploring medicinal treatment effectiveness in oncology. Therefore, the aim of this study was to explore the added value of machine learning methods to investigate individual treatment responses for glioblastoma patients treated with temozolomide.

Methods: Based on a retrospective observational registry covering 3090 patients with glioblastoma treated with temozolomide, we proposed the use of a two-step iterative exploratory learning process consisting of an initialization phase and a machine learning phase. For initialization, we defined a binary response variable as the target label using one-by-one nearest neighbor propensity score matching. Secondly, a classification tree algorithm was trained and validated for dividing individual patients into treatment response and non-response groups. Theorizing about treatment response was then done by evaluating the tree performance.

Results: The classification tree model has an area under the curve (AUC) classification performance of 67% corresponding to a sensitivity of 0.69 and a specificity of 0.51. This result in predicting patient-level response was slightly better than the logistic regression model featuring an AUC of 64% (0.63 sensitivity and 0.54 specificity). The tree confirms confounding by age and discovers further age-related stratification with chemotherapy-treatment dependency, both not revealed in preceding clinical studies. The model lacked genetic information confounding treatment response.

Conclusions: A classification tree was found to be suitable for understanding patient-level effectiveness for this glioblastoma-temozolomide case because of its high interpretability and capability to deal with covariate interdependencies, essential in a real-world environment. Possible improvements in the model's classification can be achieved by including genetic information and collecting primary data on treatment response. The model can be valuable in clinical practice for predicting personal treatment pathways.


