What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators of the expected reward of a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design at a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with greater statistical confidence than a state-of-the-art benchmark.
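The abstract does not spell out the estimators themselves, so as a point of reference here is a minimal sketch of the standard inverse-propensity-weighted (IPW) off-policy value estimator that such work is typically compared against. The function names, the policy interface, and the synthetic logged data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ipw_value_estimate(contexts, actions, rewards, logging_propensities, target_policy):
    """Standard inverse-propensity-weighted (IPW) estimate of the expected
    reward of a counterfactual (target) policy from logged bandit feedback.

    target_policy(x, a) should return the probability that the target policy
    chooses action `a` in context `x` (illustrative interface, not the paper's).
    """
    weights = np.array([
        target_policy(x, a) / p
        for x, a, p in zip(contexts, actions, logging_propensities)
    ])
    # V_hat = (1/n) * sum_i w_i * r_i: unbiased, but often high-variance.
    return float(np.mean(weights * rewards))

# Toy usage with synthetic logs (purely illustrative):
rng = np.random.default_rng(0)
n, n_actions = 1000, 3
contexts = rng.normal(size=(n, 2))
logging_probs = np.full(n_actions, 1.0 / n_actions)   # uniform logging policy
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.2 + 0.1 * actions)        # action 2 has the highest reward rate
propensities = logging_probs[actions]

greedy = lambda x, a: 1.0 if a == 2 else 0.0          # counterfactual policy to evaluate
print(ipw_value_estimate(contexts, actions, rewards, propensities, greedy))
```

The contribution summarized in the abstract is a class of estimators whose variance is provably no larger than that of baselines of this kind; the sketch is only meant to ground the terminology of logged propensities and counterfactual policy value.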