Dataset Information

Post-Contextual-Bandit Inference.

ABSTRACT: Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. Nonetheless, to support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or the value of new policies. The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage. While this has been addressed in non-contextual settings by using stabilized estimators, the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.

SUBMITTER: Bibaut A

PROVIDER: S-EPMC9249103 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Ecological Momentary Assessments (EMA) deliver insights on how patients perceive tinnitus at different times and how they are affected by it. Moving to the next level, an mHealth app can support users more directly by predicting a user's next EMA and recommending personalized services based on these predictions. In this study, we analyzed the data of 21 users who were exposed to an mHealth app with non-personalized recommendations, and we investigated ways of predicting the next vector of EMA answers. We studied the potential of entity-centric predictors that learn for each user separately and neighborhood-based predictors that learn for each user separately but also take similar users into account, and we compared them to a predictor that learns from all past EMA indiscriminately, without considering which user delivered which data, i.e., to a "global model." Since users were exposed to two versions of the non-personalized recommendations app, we employed a Contextual Multi-Armed Bandit (CMAB), which chooses the best predictor for each user at each time point, taking each user's group into account. Our analysis showed that the combination of predictors into a CMAB achieves good performance throughout, since the global model was chosen at early time points and for users with few data, while the entity-centric, i.e., user-specific, predictors were used whenever the user had delivered enough data; the CMAB itself determined when the data were "enough." This flexible setting delivered insights on how user behavior can be predicted for personalization, as well as insights on the specific mHealth data. Our main findings are that for EMA prediction the entity-centric predictors should be preferred over a user-insensitive global model and that the choice of EMA items should be further investigated because some items are answered more rarely than others. Although our CMAB-based prediction workflow is robust to differences in exposure and interaction intensity, experimenters who design studies with mHealth apps should be prepared to quantify and closely monitor differences in the intensity of user-app interaction, since users with many interactions may have a disproportionate influence on global models.

Project description: Nonpoint source water quality management is challenged with allocating uncertain management actions and monitoring their performance in the absence of state-dependent decision making. This adaptive management context can be expressed as a multiarmed bandit problem. Multiarmed bandit strategies attempt to balance the exploitation of actions that appear to maximize performance with the exploration of uncertain, but potentially better, actions. We performed a test of multiarmed bandit strategies to inform adaptive water quality management in Massachusetts, USA. Conservation and restoration practitioners were tasked with allocating household wastewater treatments to minimize N inputs to impaired waters. We obtained time series of N monitoring data from 3 wastewater treatment types and organized them chronologically and randomly. The chronological data set represented nonstationary performance based on recent monitoring data, whereas the random data set represented stationary performance. We tested 2 multiarmed bandit strategies in hypothetical experiments to sample from the treatment data through 20 sequential decisions. A deterministic probability-matching strategy allocated treatments with the highest probability of success regarding their performance at each decision. A randomized probability-matching strategy randomly allocated treatments according to their probability of success at each decision. The strategies were compared with a nonadaptive strategy that equally allocated treatments at each decision. Results indicated that equal allocation is useful for learning in nonstationary situations but tended to overexplore inferior treatments and thus did not maximize performance when compared with the other strategies. Deterministic probability matching maximized performance in many stationary situations, but the strategy did not adequately explore treatments and converged on inferior treatments in nonstationary situations. Randomized probability matching balanced performance and learning in stationary situations, but the strategy could converge on inferior treatments in nonstationary situations. These findings provide evidence that probability-matching strategies are useful for adaptive management. Integr Environ Assess Manag 2020;16:841-852. © 2020 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals LLC on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
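Illustrative sketch (not from the study): the three allocation strategies named in the description above can be mimicked on a toy problem. Everything specific here is an assumption made for illustration: binary success/failure outcomes, Beta(1, 1) priors, a Monte Carlo estimate of each treatment's probability of being best, and the hypothetical success rates in `true_rates`. The study itself worked with N monitoring time series, so this shows only the decision rules, not the authors' implementation.

```python
import random

def prob_best(successes, failures, rng, draws=2000):
    """Monte Carlo estimate of P(arm has the highest success rate),
    assuming independent Beta(1 + s, 1 + f) posteriors per arm."""
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a])
                   for a in range(k)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

def choose(strategy, t, successes, failures, rng):
    """Pick an arm index for decision number t under the given strategy."""
    k = len(successes)
    if strategy == "equal":
        # Nonadaptive baseline: cycle through the arms in turn.
        return t % k
    p = prob_best(successes, failures, rng)
    if strategy == "deterministic":
        # Always take the arm currently most likely to be best.
        return p.index(max(p))
    if strategy == "randomized":
        # Draw an arm with probability equal to its chance of being best.
        return rng.choices(range(k), weights=p)[0]
    raise ValueError(f"unknown strategy: {strategy}")

def run(strategy, true_rates, decisions=20, seed=0):
    """Simulate sequential decisions; return (pulls per arm, total successes)."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for t in range(decisions):
        arm = choose(strategy, t, successes, failures, rng)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls, sum(successes)

if __name__ == "__main__":
    true_rates = [0.3, 0.5, 0.7]  # hypothetical, stationary success rates
    for strategy in ("equal", "deterministic", "randomized"):
        pulls, total = run(strategy, true_rates)
        print(strategy, pulls, total)
```

Because the simulated rates are fixed, the toy corresponds only to the stationary case; reproducing the nonstationary comparison would require rates that change over the 20 decisions.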

