We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then subsequently uses the estimators for optimizing the policy with constraints. We also establish theoretical error bounds for our estimation proced...
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log...
Many multi-agent systems have a single coordinator providing incentives to a large number of agents....
The rapid accumulation of high-dimensional data has opened new opportunities to make informed decisi...
We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tr...
What is the most statistically efficient way to do off-policy optimization with batch data from band...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
This article studies the data-adaptive inference of an optimal treatment rule. A treatment rule is a...
We propose a novel method to predict rational counterfactual demand responses from an observed set o...
We develop a learning principle and an efficient algorithm for batch learning from logged bandit fee...
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy...
This paper develops a theoretical framework for analyzing incentive schemes under bounded rationalit...
Recent work has defined an optimal reward problem (ORP) in which an agent designer, with an objectiv...
textabstractWe propose a novel method to predict rational counterfactual demand responses from an ob...
Sequential decision making is central to a range of marketing problems. Both firms and consumers aim...
I study an optimal design of monetary incentives in experiments where incentives are a treatment var...
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log...
Many multi-agent systems have a single coordinator providing incentives to a large number of agents....
The rapid accumulation of high-dimensional data has opened new opportunities to make informed decisi...
We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tr...
What is the most statistically efficient way to do off-policy optimization with batch data from band...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
This article studies the data-adaptive inference of an optimal treatment rule. A treatment rule is a...
We propose a novel method to predict rational counterfactual demand responses from an observed set o...
We develop a learning principle and an efficient algorithm for batch learning from logged bandit fee...
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy...
This paper develops a theoretical framework for analyzing incentive schemes under bounded rationalit...
Recent work has defined an optimal reward problem (ORP) in which an agent designer, with an objectiv...
textabstractWe propose a novel method to predict rational counterfactual demand responses from an ob...
Sequential decision making is central to a range of marketing problems. Both firms and consumers aim...
I study an optimal design of monetary incentives in experiments where incentives are a treatment var...
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log...
Many multi-agent systems have a single coordinator providing incentives to a large number of agents....
The rapid accumulation of high-dimensional data has opened new opportunities to make informed decisi...