Dataset Information

Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes.

ABSTRACT: Dynamic treatment regimes are a set of decision rules and each treatment decision is tailored over time according to patients' responses to previous treatments as well as covariate history. There is a growing interest in development of correct statistical inference for optimal dynamic treatment regimes to handle the challenges of non-regularity problems in the presence of non-respondents who have zero-treatment effects, especially when the dimension of the tailoring variables is high. In this paper, we propose a high-dimensional Q-learning (HQ-learning) to facilitate the inference of optimal values and parameters. The proposed method allows us to simultaneously estimate the optimal dynamic treatment regimes and select the important variables that truly contribute to the individual reward. At the same time, hard thresholding is introduced in the method to eliminate the effects of the non-respondents. The asymptotic properties for the parameter estimators as well as the estimated optimal value function are then established by adjusting the bias due to thresholding. Both simulation studies and real data analysis demonstrate satisfactory performance for obtaining the proper inference for the value function for the optimal dynamic treatment regimes.

SUBMITTER: Zhu W

PROVIDER: S-EPMC6953729 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes.

Zhu Wensheng, Zeng Donglin, Song Rui

Journal of the American Statistical Association, 2018-10-29, issue 527

PMID: 31929664

Similar Datasets

Project description: Precision medicine is currently a topic of great interest in clinical and intervention science. A key component of precision medicine is that it is evidence-based, i.e., data-driven, and consequently there has been tremendous interest in estimation of precision medicine strategies using observational or randomized study data. One way to formalize precision medicine is through a treatment regime, which is a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended treatment. An optimal treatment regime is defined as maximizing the mean of some cumulative clinical outcome if applied to a population of interest. It is well-known that even under simple generative models an optimal treatment regime can be a highly nonlinear function of patient information. Consequently, a focal point of recent methodological research has been the development of flexible models for estimating optimal treatment regimes. However, in many settings, estimation of an optimal treatment regime is an exploratory analysis intended to generate new hypotheses for subsequent research and not to directly dictate treatment to new patients. In such settings, an estimated treatment regime that is interpretable in a domain context may be of greater value than an unintelligible treatment regime built using 'black-box' estimation methods. We propose an estimator of an optimal treatment regime composed of a sequence of decision rules, each expressible as a list of "if-then" statements that can be presented as either a paragraph or as a simple flowchart that is immediately interpretable to domain experts. The discreteness of these lists precludes smooth, i.e., gradient-based, methods of estimation and leads to non-standard asymptotics. Nevertheless, we provide a computationally efficient estimation algorithm, prove consistency of the proposed estimator, and derive rates of convergence. We illustrate the proposed methods using a series of simulation examples and application to data from a sequential clinical trial on bipolar disorder.
