We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. Unlike in supervised learning, where the algorithm receives trainin...
Batch mode reinforcement learning (BMRL) is a field of research which focuses on the inference of hi...
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly ...
This paper identifies a severe problem of the counterfactual risk estimator typically used in batch...
What is the most statistically efficient way to do off-policy optimization with batch data from band...
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log...
We consider a special case of bandit problems, named batched bandits, in which an agent observes bat...
Interactive systems that interact with and learn from user behavior are ubiquitous today. Machine le...
We study the problem of using causal models to improve the rate at which good interventions can be l...
We study the problem of batch learning from bandit feedback in the setting of extremely large action...
This paper explores a new form of the linear bandit problem in which the algorithm receives the usua...