Dataset Information

Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization.

ABSTRACT: Off-policy learning, the task of evaluating and improving policies using historical data collected from a logging policy, is important because on-policy evaluation is usually expensive and can have adverse impacts. One of the major challenges of off-policy learning is to derive counterfactual estimators that also have low variance and thus low generalization error. In this work, inspired by learning bounds for importance sampling problems, we present a new counterfactual learning principle for off-policy learning with bandit feedback. Our method regularizes the generalization error by minimizing the distribution divergence between the logging policy and the new policy, and removes the need, present in prior work, to iterate through all training samples to compute a sample variance regularizer. With neural network policies, our end-to-end training algorithms using variational divergence minimization show significant improvement over conventional baseline algorithms and are consistent with our theoretical results.
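The idea of penalizing an importance-weighted risk estimate with a divergence between the new policy and the logging policy can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a chi-square divergence and estimates it directly from the logged samples, whereas the abstract describes a variational estimator trained with neural networks. The function names, the penalty weight `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def ips_risk(losses, new_probs, logging_probs):
    """Importance-weighted (IPS) estimate of the new policy's expected loss."""
    weights = new_probs / logging_probs
    return float(np.mean(weights * losses))


def chi2_divergence_estimate(new_probs, logging_probs):
    """Sample estimate of the chi-square divergence between new and logging policy.

    Assumption for illustration: under the logging policy, E[w^2] - 1 equals
    the chi-square divergence, so the sample mean of w^2 minus 1 estimates it.
    On a finite sample this estimate can be negative.
    """
    weights = new_probs / logging_probs
    return float(np.mean(weights ** 2) - 1.0)


def regularized_objective(losses, new_probs, logging_probs, lam=0.1):
    """IPS risk plus a divergence penalty weighted by the hypothetical `lam`."""
    risk = ips_risk(losses, new_probs, logging_probs)
    penalty = chi2_divergence_estimate(new_probs, logging_probs)
    return risk + lam * penalty


# Toy logged bandit feedback: observed losses, plus the probability that each
# policy assigns to the action that was actually logged.
losses = np.array([1.0, 0.0, 1.0, 0.0])
logging_probs = np.array([0.5, 0.5, 0.5, 0.5])
new_probs = np.array([0.25, 0.75, 0.25, 0.75])

objective = regularized_objective(losses, new_probs, logging_probs, lam=0.1)
```

When the new policy matches the logging policy, every importance weight is 1 and the penalty vanishes; the penalty grows as the new policy moves probability mass away from what was logged.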

SUBMITTER: Wu H

PROVIDER: S-EPMC7419136 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization.

Wu Hang, Wang May D

Proceedings of Machine Learning Research, 2018-01-01

PMID: 32789292

Similar Datasets

Variational Principles in Quantum Monte Carlo: The Troubled Story of Variance Minimization.
| S-EPMC7365558 | biostudies-literature

Drug-target interaction prediction using Multi Graph Regularized Nuclear Norm Minimization.
| S-EPMC6964976 | biostudies-literature

Doppler OCT clutter rejection using variance minimization and offset extrapolation.
| S-EPMC6238902 | biostudies-literature

Quadratic divergence regularized SVM for optic disc segmentation.
| S-EPMC5480505 | biostudies-literature

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression.
| S-EPMC6927181 | biostudies-literature

CHD Risk Minimization through Lifestyle Control: Machine Learning Gateway.
| S-EPMC7058059 | biostudies-literature

FairPRS: adjusting for admixed populations in polygenic risk scores using invariant risk minimization.
| S-EPMC10804441 | biostudies-literature

Duration of Time Intervals for Risk Minimization Measure Effectiveness Studies.
| S-EPMC11924164 | biostudies-literature

Fast and accurate Bayesian polygenic risk modeling with variational inference.
| S-EPMC10183379 | biostudies-literature

Counterfactual analysis of differential comorbidity risk factors in Alzheimer's disease and related dementias.
| S-EPMC9931358 | biostudies-literature