We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem (Bottou et al., 2013) through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. In analogy to the Structural Risk Minimization principle of Wapnik and Tscherwonenkis (1979), these constructive bounds give rise to the Co...
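To make the propensity-weighted risk estimate and its variance term mentioned above concrete, here is a minimal Python sketch. The data layout (per-interaction logging propensities, observed losses, and candidate-policy propensities for the logged actions), the clipping cap, and the penalty weight are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def ips_risk(logging_propensities, losses, candidate_propensities, clip=None):
    """Propensity-weighted (IPS) estimate of a candidate policy's risk.

    logging_propensities: probability the logging policy assigned to the
        action it actually took, one entry per logged interaction.
    losses: loss observed for that action (e.g., 1 - click).
    candidate_propensities: probability the candidate policy assigns to the
        same logged action in the same context.
    clip: optional cap on the importance weights to limit variance.
    """
    weights = candidate_propensities / logging_propensities
    if clip is not None:
        weights = np.minimum(weights, clip)
    weighted_losses = weights * losses
    risk = weighted_losses.mean()
    # Sample variance of the per-interaction weighted losses; a variance-aware
    # objective penalizes the risk estimate with a term of this form.
    variance = weighted_losses.var(ddof=1)
    return risk, variance

# Toy usage on synthetic logs (all values made up for the example).
rng = np.random.default_rng(0)
n = 1000
p_log = rng.uniform(0.1, 0.9, size=n)              # logging propensities
loss = rng.binomial(1, 0.3, size=n).astype(float)  # observed losses
p_new = rng.uniform(0.1, 0.9, size=n)              # candidate propensities

risk, variance = ips_risk(p_log, loss, p_new, clip=10.0)
objective = risk + 0.5 * np.sqrt(variance / n)     # illustrative variance penalty
print(f"IPS risk: {risk:.3f}, variance penalty term: {np.sqrt(variance / n):.3f}")
```

Capping the importance weights is a common way to keep the estimator's variance bounded; the penalty coefficient used here is an arbitrary placeholder rather than a value prescribed by the paper.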
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a ...
We study the problem of decision-making under uncertainty in the bandit setting. This thesis goes be...
This paper identifies a severe problem of the counterfactual risk estimator typically used in batch...
What is the most statistically efficient way to do off-policy optimization with batch data from band...
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log...
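As a concrete instance of the OPE setting, the sketch below predicts a counterfactual (target) policy's value from logged interactions using plain importance sampling; the data layout and the normal-approximation error bar are assumptions of this example, not the estimator studied in the cited work.

```python
import numpy as np

def ips_value(target_propensities, logging_propensities, rewards):
    """Importance-sampling estimate of a target policy's value from logs.

    target_propensities: probability the target (counterfactual) policy
        assigns to each logged action in its logged context.
    logging_propensities: probability the logging policy assigned to it.
    rewards: reward observed for the logged action.
    Returns a point estimate and a rough standard error.
    """
    weights = target_propensities / logging_propensities
    per_example = weights * rewards
    value = per_example.mean()
    stderr = per_example.std(ddof=1) / np.sqrt(len(per_example))
    return value, stderr

# Toy usage on synthetic logs (all values made up for the example).
rng = np.random.default_rng(1)
m = 500
p_log = rng.uniform(0.2, 0.8, size=m)
p_tgt = rng.uniform(0.2, 0.8, size=m)
reward = rng.binomial(1, 0.4, size=m).astype(float)

value, stderr = ips_value(p_tgt, p_log, reward)
print(f"estimated value: {value:.3f} +/- {1.96 * stderr:.3f}")
```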
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
We study the problem of batch learning from bandit feedback in the setting of extremely large action...
Interactive systems that interact with and learn from user behavior are ubiquitous today. Machine le...
Counterfactual reasoning from logged data has become increasingly important for many applications such...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...
We consider a special case of bandit problems, named batched bandits, in which an agent observes bat...